Convert Spreadsheet


Template
Metadata Schemas
Tab Breakdown
Column Breakdown
Convert Spreadsheet to an RO-Crate with Crate-O

Template

For collections with many interconnected objects and files, it may be easier to add the metadata by converting a spreadsheet to an RO-Crate in Crate-O than to add each item manually. Download the RO-Crate metadata spreadsheet template below and populate it with metadata specific to your collection:


ro-crate-metadata-template.xlsx


This template can be edited in Microsoft Excel, LibreOffice Calc or Google Sheets. It is not compatible with Apple Numbers.

The template is based on an example data collection that contains three types of files within each object:

  • Audio files (WAV), the primary material
  • Text files (CSV), transcriptions of the audio files
  • ELAN files (EAF), linguistic annotations of the audio files

Spreadsheet conversion can currently only add new data; it cannot overwrite or edit existing data in your RO-Crate.


Metadata Schemas

The spreadsheet uses a number of standard vocabularies for terms. Namespaces such as ldac and pcdm are prefixed to some metadata terms in the sections below. This is to:

  • indicate that the term is not a part of the Schema.org vocabulary which RO-Crate is based on, but uses another namespace
  • avoid overlaps where multiple schemas have the same term but differing usages and definitions.

In Crate-O, these prefixes are hidden for legibility.


If your collection requires metadata terms that are not present in the template, check if an existing term fits from the schemas below.

If you have terms specific to your collection that aren’t covered by these schemas, see the Custom Terms tab for how to add them.


Tab Breakdown

By default, the spreadsheet has the tabs listed below. Depending on your collection, you may need to add tabs, and some of the default tabs may not apply.


Tab | Description
RootDataset | Metadata about the root or top level of the collection. Unlike the other tabs, the header for the root dataset is vertical rather than horizontal.
@context | Specifies the vocabulary or schema that is intended to be used with language data.
Custom Terms | Specifies any terms used that are specific to the collection and not part of other existing metadata schemas.
Authors | Metadata about the person or organisation responsible for creating this collection.
Publishers | Metadata about the organisation responsible for releasing this collection.
Licenses | Metadata about the license(s) within the collection, both for the objects and files.
Provenance | Metadata about the documented history or chain of custody of materials from their creation to their current location within a collection.
People | Metadata about the people within the collection.
Places | Metadata about the places within the collection.
Localities | Metadata about the geometric location data within the collection.
Objects | Metadata about the entities within the collection that could encompass one or more files.
Files | Metadata about the files in your collection. If the collection has multiple file formats that you prefer to track separately, duplicate this tab and add the formats to the tab names, e.g. CSV_Files, EAF_Files, WAV_Files.
Schemas | Metadata about the schemas within the collection, for example, the set of columns used in tabular CSV files.
Columns | Metadata about the columns within the CSV files in the collection.

ELAN (.eaf) files can have relative or absolute paths to the data they relate to. The ELAN preferences file is generally not needed for the collection and relates to the particular ELAN user only.


Examples in the Template

Below the header, each tab includes at least one example row to illustrate how it can be filled in. The cells in this row are colour-coded according to whether the column:

  • requires the user to manually input data (blue)
  • is pre-filled with a formula or static value and doesn’t require editing (green)
  • is for internal use and doesn’t require editing in most cases (orange).

HINT: Highlight the example row and drag it down to copy all the pre-filled cells. Remember to remove the example rows before you convert your spreadsheet in Crate-O!

The columns provided in the template tabs are illustrative only and may not all apply to your collection; please edit these as needed. Where a column header begins with a full stop ., this indicates that the column will be ignored when the data is loaded into Crate-O and will not appear in the RO-Crate. This can be helpful if you want to retain other information in your spreadsheet that may not be in a format applicable to the RO-Crate.
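As a rough illustration of the full-stop rule described above, the sketch below filters one spreadsheet row so that only the columns that would reach the RO-Crate remain. It is not Crate-O's own code: the function name exportable_columns and the sample row are hypothetical.

```python
def exportable_columns(row: dict) -> dict:
    """Return the row without any column whose header begins with a full stop."""
    return {
        header: value
        for header, value in row.items()
        if not header.startswith(".")
    }


# Hypothetical row from an Objects tab: .pseudonym is kept in the
# spreadsheet for reference but would not appear in the RO-Crate.
row = {"@id": "#object1", "name": "object1", ".pseudonym": "speaker1"}
print(exportable_columns(row))
```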


At a minimum, it’s best practice to include @id and @type columns in each of your spreadsheet tabs, as these appear in Crate-O for each of the entities. The tables in the next sections provide further detail on what constitutes a valid @id and @type in each tab.

HINT: To type a column name beginning with @ in Excel, put an apostrophe before it '@. This will force it to be recognised as a text value rather than a formula.


Column Breakdown

The section below describes each of the columns included in the template, ordered by tab. Please note that the columns provided in the template tabs are illustrative only and should be edited according to the requirements of your collection.


RootDataset

The root dataset tab provides information about the top level of the collection. Unlike the other tabs, the root dataset tab lists properties row by row and has only one value column, so if a property requires more than one value (like @type), duplicate that row.
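To show what duplicating a row means in practice, here is a minimal sketch that reads a vertical list of (property, value) rows and gathers repeated properties into a list. The function name read_root_rows and the sample values are hypothetical, and the way Crate-O itself reads this tab may differ.

```python
def read_root_rows(rows):
    """Collect (property, value) rows; a repeated property becomes a list."""
    entity = {}
    for prop, value in rows:
        if prop not in entity:
            entity[prop] = value
        elif isinstance(entity[prop], list):
            entity[prop].append(value)
        else:
            entity[prop] = [entity[prop], value]
    return entity


# Hypothetical RootDataset rows: @type appears twice, once per required value.
rows = [
    ("@id", "./"),
    ("@type", "Dataset"),
    ("@type", "RepositoryCollection"),
    ("name", "Example collection"),
]
root = read_root_rows(rows)
print(root["@type"])
```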

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, a DOI for a collection. The default for this field is ./, indicating a relative path to your current directory. If you already have a persistent ID for the collection, it can be added in this field instead.
@type | Pre-filled | The type of the collection. Both Dataset and RepositoryCollection are required.
name | Data entry | The name of this collection.
description | Data entry | An abstract of the collection. Include as much detail as possible about the motivation and use of the dataset, including things that we do not yet have properties for.
ldac:doi | Data entry | A Digital Object Identifier, e.g. https://doi.org/10.1000/182.
isRef_author | Pre-filled | Generated from the @id column in the Authors tab.
isRef_publisher | Pre-filled | Generated from the @id column in the Publishers tab.
isRef_license | Pre-filled | Generated from the @id column in the Licenses tab.
datePublished | Data entry | The date the object was published. The date should be in the ISO 8601 format YYYY-MM-DD.
inLanguage | Data entry | The language in which the resource is written. For example, a work about the Italian language as used in Australia (ldac:subjectLanguage) that is written in English (inLanguage).
ldac:subjectLanguage | Data entry | The languages that the materials in the collection are about (not the language that it is in). For example, a work about the Italian language as used in Australia (ldac:subjectLanguage) that is written in English (inLanguage).
ldac:metadataIsPublic | Data entry | Determines whether the collection metadata can be viewed publicly. Requires a Boolean value (TRUE or FALSE).
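Before converting, it can help to check the datePublished and Boolean cells yourself. The sketch below is one way to do that, assuming the cell values have been read as text; the function names are hypothetical and this is not a check that Crate-O is known to perform.

```python
import re
from datetime import date


def is_iso_date(text: str) -> bool:
    """True if text is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        # Right shape, but not a real date (e.g. 30 February).
        return False
    return True


def is_boolean_cell(text: str) -> bool:
    """True if the cell holds one of the two Boolean values the template expects."""
    return text in ("TRUE", "FALSE")


print(is_iso_date("2024-01-31"), is_boolean_cell("TRUE"))
```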

The prefix isRef_ indicates that data in this column should be taken from another @id field in the spreadsheet. For example, isRef_author uses the @id from the Authors tab to link all the author details to the RootDataset tab.
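In RO-Crate metadata, a link from one entity to another is written as an object containing the target's @id. The sketch below shows how an isRef_ column could map onto that form. It is an assumption about the general shape of the result rather than a description of Crate-O's converter, and the function name resolve_refs and the example URL are hypothetical.

```python
def resolve_refs(row: dict) -> dict:
    """Turn isRef_<property> cells into {"@id": ...} references."""
    prefix = "isRef_"
    entity = {}
    for header, value in row.items():
        if header.startswith(prefix):
            # Strip the prefix and point at the referenced entity by its @id.
            entity[header[len(prefix):]] = {"@id": value}
        else:
            entity[header] = value
    return entity


# Hypothetical RootDataset values; the author @id would match a row in the Authors tab.
root_row = {
    "@id": "./",
    "name": "Example collection",
    "isRef_author": "https://example.org/people/author1",
}
print(resolve_refs(root_row))
```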


@context

The context specifies the vocabulary or schema that is intended to be used with the collection. In the case of language data, the Language Data Commons Schema (ldac) is used. This is also the place to specify the schema for Custom Terms if these occur in your collection.

Column | Type | Description
name | Pre-filled | The namespace of the required vocabulary or schema. The template is pre-filled with the ldac schema, which is required for this template. It is also pre-filled with csvw for Schemas and Columns, and custom for Custom Terms.
@id | Pre-filled | Persistent, managed unique ID in URL format of the vocabulary or schema.

Custom Terms

If you have terms specific to your collection that aren’t covered by the existing Metadata Schemas, use the custom terms tab to add them.

Metadata terms are organised according to the following entity types. Use the types that are most useful for your collection.

Entity | Description | Examples
Property | Used for attributes of the thing you are describing, similar to fields you might see on a form. If you want the value of the term to be free text, e.g. name, age, make a property. If you have a set of finite values for the term, like multiple choice, make a property as the overarching term. | motherTongue, register
Defined Term Sets | Used to define a group of terms that can be used under a single property. If you have a set of finite values for a property, like multiple choice, make a defined term set to group these. | RegisterTerms
Defined Terms | Used for the values of a defined term set that are used under a single property. If you have a set of finite values for a property, like multiple choice, make each of these a defined term. | GovernmentEnglish, PrivateWritten, PublicWritten, SpeechBased

In the template, the example contains a property, two defined terms, and a defined term set. The property #textType has the defined term set #TextTypeTerms, which contains the two defined terms #Speech and #Interview.

Properties should start with a lowercase letter, whereas Defined Terms and Defined Term Sets start with uppercase.
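The capitalisation rule above is easy to check mechanically. The following sketch does so for the three @type values used in the Custom Terms tab (rdf:Property, DefinedTerm and DefinedTermSet); the function name follows_case_convention is hypothetical.

```python
def follows_case_convention(term_type: str, term_id: str) -> bool:
    """Check the first letter of a custom term against its type."""
    first = term_id[:1]
    if term_type == "rdf:Property":
        return first.islower()
    if term_type in ("DefinedTerm", "DefinedTermSet"):
        return first.isupper()
    # Any other type is not expected in the Custom Terms tab.
    return False


# The example terms from the template, written without the # prefix.
print(follows_case_convention("rdf:Property", "textType"))
print(follows_case_convention("DefinedTermSet", "TextTypeTerms"))
print(follows_case_convention("DefinedTerm", "Speech"))
```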

Column | Type | Description
@id | Pre-filled | A complete identifier for the term, generated from the @id of the custom row in the @context tab and the .id column.
.id | Data entry | A unique identifier for the term. The prefix # isn’t needed.
@type | Data entry | The type of the term. Select either rdf:Property, DefinedTerm or DefinedTermSet. See the table above for descriptions and examples of these.
name | Data entry | The name of the term.
description | Data entry | A description of the term.
isRef_inDefinedTermSet | Data entry | If one or more of your terms has the @type DefinedTerm, add the @id of the defined term set it is a part of here. If you haven’t created a term with the @type DefinedTermSet, add another row for this. For rows that have the @type rdf:Property or DefinedTermSet, leave this field blank.
sameAs | Data entry | If the term you are defining is the same as a term in another schema (excluding those in Metadata Schemas), add the URL to the term.
rdfs:subClassOf | Data entry | For internal use and doesn’t need editing. See rdfs:subClassOf for more detail.

Using Custom Terms on Other Tabs

Once you’ve listed your custom terms, these can be used throughout the spreadsheet in the following ways:

  • Properties: as column headers in the format custom:yourProperty
  • Defined Terms: as column values under their related custom property in the format custom:YourDefinedTerm

Defined Term Sets are only required in the Custom Terms tab to group a set of Defined Terms and don’t need to be used elsewhere in the spreadsheet.


Authors

An author is a person or organisation responsible for creating the collection. It is possible for collections to have multiple authors.

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, an ROR for an organisation or an ORCID, personal home page URL or email address for a person.
@type | Data entry | The type of the author. Select either Person or Organization.
name | Data entry | The name of the author. Don’t include titles such as Dr/Prof.

Publishers

A publisher is an organisation responsible for releasing the collection. It is possible for collections to have multiple publishers.

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, an ROR for an organisation.
@type | Pre-filled | The type of the publisher. Only Organization is valid.
name | Data entry | The name of the organisation.

Licenses

A license for a collection establishes the conditions for who can access, share and reuse the data, and other conditions as required. It is a legal arrangement between the creator of the data and the end-user specifying what users can do with the data.

Column | Type | Description
@id | Data entry | A URL to a version of the license (if available), for example, a URL of a Creative Commons license. For custom licenses (i.e. those specific to a particular collection), it is recommended that a copy of the license file be included in the repository to ensure that it remains accessible. license.txt or similar should be added as the @id.
@type | Pre-filled | The type of license. ldac:DataReuseLicense is required for all items, and File should also be added for physical licenses in the collection as [ldac:DataReuseLicense, File].
name | Data entry | The name of the license.
description | Data entry | A description of the license.
ldac:allowTextIndex | Data entry | Determines whether the collection text can be indexed for search purposes. Requires a Boolean value (TRUE or FALSE).
isRef_sameAs | Data entry | Indicates that two items are identical versions of the same license. For example, a Creative Commons license that has a URL as well as a local copy contained within the collection.
isRef_isPartOf | Pre-filled | Specifies the collection that the license is a part of, generated from the @id column in the RootDataset tab.

It is possible to leave the Licenses tab blank if these details are still being finalised for the collection. However, the license details will then need to be added later in Crate-O.

If there are any additional usage restrictions or options for use outside of a given license, this information can be included in a usageInfo field, e.g. “For any use not permitted by the CC-BY-ND 4.0 License, please contact the Data Steward”.


Provenance

The provenance for a collection details the documented history from an item’s creation to its current location within a collection, including changes in format and tools required to read the file.

Column | Type | Description
@id | Data entry | A unique identifier for the document change within the collection. Identifiers should be prefixed with #.
@type | Pre-filled | The type of provenance. Only CreateAction is valid.
name | Data entry | The name of the action on the document.
description | Data entry | A description of the changes to the document within the collection.
isRef_object | Data entry | The document upon which the action is carried out, i.e. a file that was used as an input in some way.
isRef_result | Data entry | The resulting document produced in the action, i.e. the output file.
instrument | Data entry | The tool or software app used to create the output file. If a more complete description of the software is required, change the header from instrument to isRef_instrument instead.
isRef_agent | Data entry | The direct performer or driver of the action, for example, an ROR for an organisation or an ORCID, personal home page URL or email address for a person.
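For orientation, the sketch below shows roughly what one Provenance row might look like once expressed as an RO-Crate entity. Only the property names come from the table above; every identifier, file path and value is hypothetical, and the exact output of Crate-O's conversion may differ.

```python
import json

# One provenance row written as a JSON-LD style entity (hypothetical values).
action = {
    "@id": "#annotate_recording1",
    "@type": "CreateAction",
    "name": "Annotation of recording1",
    "description": "Linguistic annotation of the audio recording.",
    "object": {"@id": "audio/recording1.wav"},       # from isRef_object
    "result": {"@id": "annotations/recording1.eaf"},  # from isRef_result
    "instrument": "ELAN",                             # plain text, not a reference
    "agent": {"@id": "https://example.org/people/annotator1"},  # from isRef_agent
}
print(json.dumps(action, indent=2))
```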

People

This tab contains information about the people within the collection.

Column | Type | Description
@id | Pre-filled | A unique identifier for the person, generated from the name column. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Person is valid.
name | Data entry | The name of the person.
gender | Data entry | The gender of the person. An example of an optional metadata field from the source data, using a Schema.org term.
birthDate | Data entry | The birth date (year) of the person. An example of an optional metadata field from the source data, using a Schema.org term.
isRef_prov:specializationOf | Data entry | A reference to another Person entity, used for collections where a person appears more than once with different demographic info (e.g. a different age). In these collections, there should be a ‘canonical’ person for each participant and another Person entity each time they participate, with different ages or other statuses.

Places

This tab contains information about the places within the collection.

Column | Type | Description
@id | Data entry | A unique identifier for the place. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Place is valid.
name | Data entry | The name of the place.
description | Data entry | A description of the place, including its alternative names.
isRef_geo | Data entry | The @id of the location to which this object relates from the Localities tab.

Localities

This tab contains information about the geometric locations within the collection.

Column | Type | Description
@id | Data entry | A unique identifier for the location. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Geometry is valid.
.latitude | Data entry | The latitude of the location in decimal degree format.
.longitude | Data entry | The longitude of the location in decimal degree format.
asWKT | Pre-filled | The WKT serialisation of the geometry, generated from the .latitude and .longitude columns. Note that asWKT format lists longitude first followed by latitude.
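The coordinate order is easy to get wrong, so here is a minimal sketch of building a WKT value from the two helper columns. It assumes each locality is a single point; the function name as_wkt and the sample coordinates are hypothetical, and the template's own formula may format the value slightly differently.

```python
def as_wkt(latitude: float, longitude: float) -> str:
    """Build a WKT point; WKT puts longitude (x) before latitude (y)."""
    return f"POINT({longitude} {latitude})"


# Hypothetical coordinates in decimal degrees.
print(as_wkt(latitude=-33.8688, longitude=151.2093))
```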

Objects

An object is a single resource or a group of tightly related resources in a collection. For example, a work (document) in a written corpus, or the files associated with a dialogue or session in a speech study (recordings, transcriptions, etc.). Some systems, such as PARADISEC, refer to Objects as Items or may use other terms.

Column | Type | Description
@id | Pre-filled | A unique identifier for the object, generated from the name column. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only RepositoryObject is valid.
name | Data entry | The name of the object.
description | Data entry | A description of the object.
isRef_ldac:speaker | Pre-filled | Generated from the .pseudonym column with # prefixed.
.pseudonym | Data entry | An example of a column from a data steward’s source data, so that speakers in the collection are anonymised.
datePublished | Data entry | The date the object was published. The date should be in ISO 8601 format YYYY-MM-DD.
isRef_pcdm:memberOf | Pre-filled | The collection this object is a member of, generated from the @id column in the RootDataset tab or, if the collection contains sub-collections, a reference to another RepositoryCollection @id.
isRef_license | Data entry | The @id of the license to which this object adheres from the Licenses tab.
isRef_ldac:indexableText | Data entry | Identifies which of the files in the given object has content that is indexed for search purposes. For example, in the template, the content of the CSV file would be searchable, whereas the EAF and WAV files would not. If isRef_ldac:indexableText is not included in a collection, search will only run on the metadata and not the transcript file content.
isRef_contentLocation | Data entry | The @id of the place to which this object relates from the Places tab.
inLanguage | Data entry | The language in which the resource is written. For example, a work about the Italian language as used in Australia (ldac:subjectLanguage) that is written in English (inLanguage).
ldac:subjectLanguage | Data entry | The languages that the materials in the collection are about (not the language that it is in). For example, a work about the Italian language as used in Australia (ldac:subjectLanguage) that is written in English (inLanguage).
isRef_custom:textType | Data entry | The @id of the term to which this object relates from the Custom Terms tab. An example of an optional custom term from the source data.

Files

A file is a container for data and can store data in different formats. A single object could have an audio file as well as a text file containing a transcription of the audio. Three examples of file types are included in the template: CSV, EAF and WAV.

Column | Type | Description
@id | Pre-filled | The file path to the given file. Generated from the .folder and .filename columns.
@type | Data entry | The type of file. In the first @type column, File is required for all items. In the second @type column, choose from either ldac:PrimaryMaterial, ldac:Annotation or ldac:DerivedMaterial (see materialType for full term descriptions). If the file is a tabular CSV and you have a schema of the columns used, csvw:Table should also be added, e.g. [ldac:Annotation, csvw:Table]. See Schemas and Columns for more detail.
.folder | Data entry | The folder name in which the given file appears. If the file path has subfolders, use forward slash /, without the slash at the end of the file path, e.g. path/to/folder.
.filename | Data entry | The name of the given file, including the file extension, e.g. filename.txt.
isType_ldac:PrimaryMaterial | Data entry | Indicates whether the given file is the object of study, such as a literary work, film, or recording of natural discourse. Requires the Boolean value TRUE or leave blank if false.
isType_ldac:Annotation | Data entry | Indicates whether the given file is an annotation of another file. Requires the Boolean value TRUE or leave blank if false.
isRef_isPartOf | Data entry | Specifies the object that the file is a part of. Template example uses the @id column of the Objects tab. If entering manually, note that this field is case-sensitive.
isRef_ldac:annotationOf | Data entry | The full filename of the primary material that the given file is an annotation of. Leave this blank if the file is the primary material.
isRef_csvw:tableSchema | Data entry | If the file is a tabular CSV and you have a schema of the columns used, add the schema ID here, otherwise leave this field blank. See Schemas and Columns for more detail.
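The file @id is built from the two helper columns. The sketch below shows one way to reproduce that outside the spreadsheet, assuming the parts are simply joined with a forward slash; the function name file_id is hypothetical and the template's own formula may differ.

```python
import posixpath


def file_id(folder: str, filename: str) -> str:
    """Join .folder and .filename with a forward slash to form the @id."""
    return posixpath.join(folder, filename)


# Using the example values from the table above.
print(file_id("path/to/folder", "filename.txt"))
```

Using posixpath rather than os.path keeps the separator as a forward slash on every operating system, which matches the path format the .folder column asks for.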

Schemas

If you have tabular CSV files in your collection, a schema allows you to define tabular formats for the tables used within the collection, which are further detailed in the Columns tab.

Column | Type | Description
@id | Data entry | A unique identifier for the schema. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the schema. Only csvw:Schema is valid.
name | Data entry | The name of the schema.
isRef_conformsTo | Data entry | A standard that the schema follows. Only tabulatorMapping is valid.

Columns

If you have tabular CSV files in your collection with multiple columns, this tab allows you to identify the columns within your data as well as provide definitions for them. This allows users to see at a glance what is contained within the CSV files through the metadata and HTML preview, rather than having to open the files individually.

Column | Type | Description
@id | Pre-filled | A unique identifier for the column, generated from the name column. Identifiers should be prefixed with #col_.
@type | Pre-filled | The type of the entity. Only csvw:Column is valid.
name | Data entry | The name of the column. Avoid using spaces, as this is also used to generate the @id.
description | Data entry | A definition of the column.
csvw:propertyUrl | Data entry | If any of the columns map directly to another schema term, use this field to provide the unique identifier of that term, otherwise leave this field blank.
isReverse_csvw:columns | Data entry | The @id of the schema to which this column relates from the Schemas tab.
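Because the column @id is derived from the name, a name containing spaces would produce an awkward identifier. The sketch below illustrates the idea, assuming the @id is simply the prefix #col_ followed by the name; the function name column_id and the sample names are hypothetical.

```python
def column_id(name: str) -> str:
    """Build a column @id from its name, rejecting names that contain spaces."""
    if " " in name:
        raise ValueError(f"Column name contains a space: {name!r}")
    return "#col_" + name


print(column_id("speaker"))
```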

Convert Spreadsheet to an RO-Crate with Crate-O

For steps on adding your spreadsheet data to an RO-Crate using Crate-O, see Append Data from Spreadsheet.