General Information


Crate-O Use Cases
RO-Crate Collection Hierarchy
Schemas, Profiles and Modes
Schema.org Style Schemas (SOSSs) and RO-Crate Profiles and Modes

Crate-O is a browser-based editor that allows you to create and update Research Object Crates (RO-Crates), either using the web interface or with metadata from a spreadsheet. It provides researchers with a relatively simple way to describe their data using best practice in formal metadata description.

RO-Crate is a way of packaging research data that stores the data together with its associated metadata and other component files, such as the data license. It is a flexible, developer-friendly approach to linked-data description and packaging.

Currently, Crate-O works only with Google Chrome and Microsoft Edge. We will be releasing versions that work with online resources directly, which will be compatible with other browsers (see the Roadmap).

The current version of Crate-O is designed for editing self-contained RO-Crates, and works fine with crates containing tens of thousands of entities. Our roadmap includes adding the ability to edit fragments of larger linked-data resources and to integrate with repositories such as the Oni repository and data API, and with archival repositories such as the Language Data Commons of Australia (LDaCA).


Crate-O Use Cases


Crate-O is designed to work with any of the following use cases:

  • Describe data collections and files on a user’s computer, and add contextual information about those files
  • Describe abstract contextual entities, such as in a Cultural Collection or an encyclopaedia
  • Annotate existing resources elsewhere on the web
  • Submit a data collection to the LDaCA Portal
  • Edit a Schema that contains a set of vocabulary terms, such as the terms used by LDaCA

RO-Crate Collection Hierarchy


The diagram below shows the hierarchical relationship between collections, objects and files in a corpus, together with the metadata categories which track these relationships.


Self-contained corpus crate with all resources
Image Source: LDaCA

The metadata is organised according to Schema.org entity types.

Entity | Definition
Class | rdfs:Class is used to classify resources. Classes in the Language Data Commons (LDAC) schema include CollectionEvent, CollectionProtocol, DataDepositLicense, DataLicense and DataReuseLicense (see https://w3id.org/ldac/terms).
Property | rdf:Property is an attribute of an instance of a Class. For example, on an entity that is an instance of the Class Person, the property “name” would be the person’s name, expressed as a text string, while “affiliation” would be a property that references another entity, such as their university.
DefinedTerm | A ‘word, name, acronym, phrase, etc. with a formal definition’, ‘often used in the context of category or subject classification.’ DefinedTerms allow us to a) have accurate definitions of the values we want to give to properties, and b) group such definitions in DefinedTermSets, which can function as controlled vocabularies.

The table below shows an example of the relationship between each of these entities:

Level | Example
Class | Annotation
Property | annotationType
Defined Term Set | AnnotationTypeTerms
Defined Terms | Gestural, Phonemic, Phonetic, Phonological, Prosodic, Semantic, Syntactic, Transcription, Translation
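
The four levels in the table above can be sketched as linked JSON-LD entities. The following is a minimal illustration in Python, not a copy of the published LDAC schema: the `ldac:` prefix, the exact `@id` values and the `#transcription-1` entity are assumptions made for this example. The `schema:hasDefinedTerm` and `schema:inDefinedTermSet` properties are the standard Schema.org links between a DefinedTermSet and its DefinedTerms.

```python
# Illustrative identifiers only: the "ldac:" prefix and @id values below are
# assumptions for this sketch, not taken from the published schema.
annotation_class = {
    "@id": "ldac:Annotation",
    "@type": "rdfs:Class",
    "rdfs:label": "Annotation",
}

annotation_type_property = {
    "@id": "ldac:annotationType",
    "@type": "rdf:Property",
    "rdfs:label": "annotationType",
    "schema:domainIncludes": {"@id": "ldac:Annotation"},
    "schema:rangeIncludes": {"@id": "schema:DefinedTerm"},
}

term_names = [
    "Gestural", "Phonemic", "Phonetic", "Phonological", "Prosodic",
    "Semantic", "Syntactic", "Transcription", "Translation",
]

# The DefinedTermSet acts as the controlled vocabulary for the property.
term_set = {
    "@id": "ldac:AnnotationTypeTerms",
    "@type": "schema:DefinedTermSet",
    "schema:name": "AnnotationTypeTerms",
    "schema:hasDefinedTerm": [{"@id": f"ldac:{name}"} for name in term_names],
}

# Each DefinedTerm points back at the set it belongs to.
terms = [
    {
        "@id": f"ldac:{name}",
        "@type": "schema:DefinedTerm",
        "schema:name": name,
        "schema:inDefinedTermSet": {"@id": term_set["@id"]},
    }
    for name in term_names
]

# A hypothetical entity that uses the property with a value from the set.
example_annotation = {
    "@id": "#transcription-1",
    "@type": "ldac:Annotation",
    "ldac:annotationType": {"@id": "ldac:Transcription"},
}

allowed_values = {term["@id"] for term in terms}
print(example_annotation["ldac:annotationType"]["@id"] in allowed_values)
```

Because the property value is a reference to a DefinedTerm rather than free text, an editor or validator can check it against the term set.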

For more details on these and other metadata entities, see Metadata for Language Data.


Schemas, Profiles and Modes


The diagram below shows the relationship between the three main components used by Crate-O and other tools employed by LDaCA for specifying and validating RO-Crates. This section explains what these components are and how they relate.


The three main components for RO-Crate editing with Crate-O
Image Source: LDaCA

  1. A Schema specifies a metadata vocabulary of Classes and Properties, based on the RO-Crate specification’s use of Schema.org classes.

  2. An RO-Crate Mode is a set of lightweight syntactic rules for combining Schema.org Style Schema (SOSS) Classes, Properties and DefinedTerms, expressed in a JSON file that can be:

    • loaded into an editor such as Crate-O
    • imported into another program and used for RO-Crate validation
    • used to summarise the rules for an RO-Crate Profile.
  3. An RO-Crate Profile has (at least) a document that explains how metadata entities from the Schema are used for a particular purpose.
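
To make the idea of a Mode as a set of lightweight rules more concrete, here is a minimal Python sketch of how a program might import such rules and check an entity against them. The `mode` structure below (a `classes` mapping with `inputs` entries carrying `name` and `required`) is a simplified, hypothetical shape chosen for this example and should not be read as the actual Mode file format. `check_entity` is likewise a hypothetical helper.

```python
# Hypothetical, simplified mode: which properties each class may have,
# and which of them are required. Not the real Mode file format.
mode = {
    "classes": {
        "Person": {
            "inputs": [
                {"name": "name", "required": True},
                {"name": "affiliation", "required": False},
            ]
        }
    }
}


def check_entity(entity, mode):
    """Return a list of human-readable problems found for one entity."""
    problems = []
    raw_type = entity["@type"]
    types = raw_type if isinstance(raw_type, list) else [raw_type]
    for type_name in types:
        rules = mode["classes"].get(type_name)
        if rules is None:
            problems.append(f"{entity['@id']}: no rules for type {type_name}")
            continue
        for rule in rules["inputs"]:
            if rule["required"] and rule["name"] not in entity:
                problems.append(
                    f"{entity['@id']}: missing required property {rule['name']}"
                )
    return problems


alice = {"@id": "#alice", "@type": "Person", "name": "Alice Example"}
bob = {"@id": "#bob", "@type": "Person"}

print(check_entity(alice, mode))
print(check_entity(bob, mode))
```

The same rules could drive an editor form (one input per entry) or be summarised in prose for a Profile document.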

These are all inter-related, and can be developed together or separately using tools.


See the links below to the LDAC schema, profile and modes:


Schema.org Style Schemas (SOSSs) and RO-Crate Profiles and Modes


Schema.org, which provides the basic vocabulary for RO-Crate, has a light-touch approach to describing what it refers to as its schema (with a small s), which might also be thought of as an ontology. Schema.org is defined as a set of Classes and Properties, each of which has an online definition. The example below shows the base class Thing, its subclass Person, and birthDate, which is one of the properties of Person.


Class: Thing → Sub-Class: Person → Property: birthDate


Schema.org specifies which Properties can occur in the domain of which Classes, and the range of Classes that are expected as values for a Property.

While Schema.org has terms for Class and Property, it does not use these for defining the classes and properties in Schema.org itself (possibly as this would be circular). Rather, it uses the equivalent Classes from the rdf: and rdfs: vocabularies.

Here is the definition for Person:

    {
      "@id": "schema:Person",
      "@type": "rdfs:Class",
      "owl:equivalentClass": {
        "@id": "foaf:Person"
      },
      "rdfs:comment": "A person (alive, dead, undead, or fictional).",
      "rdfs:label": "Person",
      "rdfs:subClassOf": {
        "@id": "schema:Thing"
      },
      "schema:source": {
        "@id": "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_rNews"
      }
    }

The Class definition does not have any information about the occurrence of properties – that is found in a Property definition:

    {
      "@id": "schema:sibling",
      "@type": "rdf:Property",
      "rdfs:comment": "A sibling of the person.",
      "rdfs:label": "sibling",
      "schema:domainIncludes": {
        "@id": "schema:Person"
      },
      "schema:rangeIncludes": {
        "@id": "schema:Person"
      }
    }
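
Since the occurrence of properties is recorded on the Property definitions, finding the properties that apply to a Class means scanning those definitions and following `rdfs:subClassOf` upwards. Below is a minimal Python sketch of that lookup over a small hand-written graph. The graph is a trimmed-down assumption for illustration (only a few fields per entity), and `ancestors` and `properties_for` are hypothetical helper names.

```python
def as_list(value):
    """JSON-LD values may be a single object or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# A tiny schema graph in the same style as the definitions above.
graph = [
    {"@id": "schema:Thing", "@type": "rdfs:Class"},
    {
        "@id": "schema:Person",
        "@type": "rdfs:Class",
        "rdfs:subClassOf": {"@id": "schema:Thing"},
    },
    {
        "@id": "schema:name",
        "@type": "rdf:Property",
        "schema:domainIncludes": {"@id": "schema:Thing"},
        "schema:rangeIncludes": {"@id": "schema:Text"},
    },
    {
        "@id": "schema:birthDate",
        "@type": "rdf:Property",
        "schema:domainIncludes": {"@id": "schema:Person"},
        "schema:rangeIncludes": {"@id": "schema:Date"},
    },
    {
        "@id": "schema:sibling",
        "@type": "rdf:Property",
        "schema:domainIncludes": {"@id": "schema:Person"},
        "schema:rangeIncludes": {"@id": "schema:Person"},
    },
]

by_id = {entity["@id"]: entity for entity in graph}


def ancestors(class_id):
    """Return class_id plus every superclass reachable via rdfs:subClassOf."""
    seen = []
    stack = [class_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        for parent in as_list(by_id.get(current, {}).get("rdfs:subClassOf")):
            stack.append(parent["@id"])
    return seen


def properties_for(class_id):
    """List properties whose domain includes the class or one of its ancestors."""
    lineage = set(ancestors(class_id))
    return sorted(
        entity["@id"]
        for entity in graph
        if entity.get("@type") == "rdf:Property"
        and any(d["@id"] in lineage
                for d in as_list(entity.get("schema:domainIncludes")))
    )


print(properties_for("schema:Person"))
# → ['schema:birthDate', 'schema:name', 'schema:sibling']
```

Person picks up `schema:name` from Thing through the subclass link, while `schema:birthDate` and `schema:sibling` are declared directly on Person.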

A SOSS is a flattened JSON-LD graph, just like an RO-Crate. Some members of the RO-Crate community are beginning to define the basic RO-Crate schema and RO-Crate Profiles using the same approach as a SOSS.
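
As an illustration of what "flattened" means here, the following Python sketch builds a minimal RO-Crate metadata document and follows references between entities. The context URL, the `ro-crate-metadata.json` descriptor and the `./` root dataset follow the RO-Crate 1.1 conventions. The file name, person and dataset name are made up for the example, and `resolve` is a hypothetical helper.

```python
import json

# Every entity sits at the top level of @graph; entities refer to each
# other by {"@id": ...} rather than by nesting.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example corpus",          # made-up value
            "author": {"@id": "#alice"},
            "hasPart": [{"@id": "recording.wav"}],
        },
        {"@id": "recording.wav", "@type": "File", "name": "Interview recording"},
        {"@id": "#alice", "@type": "Person", "name": "Alice Example"},
    ],
}

# Index the flat graph once so references can be followed cheaply.
index = {entity["@id"]: entity for entity in crate["@graph"]}


def resolve(reference):
    """Follow a {"@id": ...} reference to the entity it points at."""
    return index[reference["@id"]]


root = resolve(index["ro-crate-metadata.json"]["about"])
author = resolve(root["author"])

print(root["name"], "by", author["name"])
print(json.dumps(crate, indent=2)[:60])
```

A schema graph can be indexed and traversed in exactly the same way, which is why the same tooling can work on both crates and schemas.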

To make an RO-Crate Mode File, we transform the flat graph of a schema into something optimised for driving an editor or a validator: a list of Classes, each with the Properties it may have.


Base Mode File creation, combining the Schema.org schema and RO-Crate additions using the rocsoss script
Image Source: LDaCA
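
The kind of transformation described above, from a flat schema graph to a per-Class list of Properties, can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: it is not the rocsoss script, and the output shape (`subClassOf` and `properties` keys) is hypothetical rather than the real Mode file format. `build_class_index` is a name invented for this example.

```python
def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def build_class_index(graph):
    """Group each property under every class named in its domainIncludes."""
    classes = {}
    # First pass: one entry per class definition.
    for entity in graph:
        if "rdfs:Class" in as_list(entity.get("@type")):
            classes[entity["@id"]] = {
                "subClassOf": [p["@id"] for p in as_list(entity.get("rdfs:subClassOf"))],
                "properties": [],
            }
    # Second pass: attach each property to the classes in its domain.
    for entity in graph:
        if "rdf:Property" not in as_list(entity.get("@type")):
            continue
        for domain in as_list(entity.get("schema:domainIncludes")):
            entry = classes.setdefault(
                domain["@id"], {"subClassOf": [], "properties": []}
            )
            entry["properties"].append({
                "id": entity["@id"],
                "range": [r["@id"] for r in as_list(entity.get("schema:rangeIncludes"))],
            })
    return classes


# The Person and sibling definitions shown earlier, trimmed to the
# fields this sketch uses.
schema_graph = [
    {
        "@id": "schema:Person",
        "@type": "rdfs:Class",
        "rdfs:subClassOf": {"@id": "schema:Thing"},
    },
    {
        "@id": "schema:sibling",
        "@type": "rdf:Property",
        "schema:domainIncludes": {"@id": "schema:Person"},
        "schema:rangeIncludes": {"@id": "schema:Person"},
    },
]

class_index = build_class_index(schema_graph)
print(class_index["schema:Person"])
```

Inverting the graph this way means an editor can look up a Class once and get its candidate Properties directly, instead of scanning every Property definition each time.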