LDaCA is basing its data storage on Research Object Crates and the Oxford Common File Layout, which are described below. The overall approach is informed by the Arkisto platform, which takes the view that research data has interest and value that extend beyond funding cycles, and that its long-term preservation and accessibility must continue to be managed. This presentation gives further details of the technical architecture.
Research Object Crates (RO-Crate)
A Research Object (RO) is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. RO-Crate is a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best practice in formal metadata description accessible and practical for use in a wide variety of situations. RO-Crates can be considered general-purpose containers of arbitrary data and open-ended metadata. In practical use within a particular domain, application or framework, however, it is beneficial to constrain RO-Crate further to a specific profile: a set of conventions, types and properties that can be required and expected, at a minimum, to be present in that subset of RO-Crates. LDaCA is developing such a profile to be used for language data.
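To make the packaging idea concrete, the following is a minimal sketch of the JSON-LD that an RO-Crate 1.1 metadata file (`ro-crate-metadata.json`) contains: a metadata descriptor, a root dataset, and one entity per file. The helper name `build_crate_metadata` and all example values are hypothetical, and the sketch does not represent the LDaCA language data profile, which would add its own required types and properties.

```python
import json


def build_crate_metadata(name, description, date_published, license_id, files):
    """Return the JSON-LD for a minimal RO-Crate 1.1 metadata file.

    `files` maps a relative file path to a human-readable label.
    This is an illustrative helper, not part of any RO-Crate library.
    """
    file_entities = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    ]
    # The root dataset describes the crate as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_id},
        "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
    }
    # The metadata descriptor identifies this file and points at the root.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Placeholder values for illustration only.
crate = build_crate_metadata(
    name="Example interview recordings",
    description="A small example collection of language data.",
    date_published="2022-01-01",
    license_id="https://creativecommons.org/licenses/by/4.0/",
    files={"interview-01.wav": "Interview 1 (audio)"},
)
print(json.dumps(crate, indent=2))
```

Writing this dictionary to `ro-crate-metadata.json` in the same directory as the listed files yields a basic crate; a profile would then be expressed as further constraints on which entities and properties must appear in the `@graph`.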
Oxford Common File Layout (OCFL)
OCFL is a specification for laying out digital collections on file or object storage. It is designed with long-term preservation principles in mind and does not rely on specialised software. Amongst the benefits of using OCFL with RO-Crate objects are:
- completeness: a repository can be re-indexed from the files it stores
- versioning: repositories can make changes to objects and still allow their history to persist
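The versioning benefit can be illustrated with a simplified, in-memory sketch of the inventory that OCFL keeps for each object. The field names follow the OCFL 1.0 `inventory.json` structure, but the functions `new_inventory` and `add_version`, the object identifier and the file contents are hypothetical, and the sketch neither writes files nor validates against the specification.

```python
import hashlib


def new_inventory(object_id):
    """Return an empty inventory shaped like an OCFL 1.0 inventory.json."""
    return {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": None,    # set when the first version is added
        "manifest": {},  # digest -> content paths stored in the object
        "versions": {},  # version name -> metadata and state
    }


def add_version(inventory, files, created, message=""):
    """Record a new version; `files` maps logical path -> bytes."""
    version = f"v{len(inventory['versions']) + 1}"
    state = {}
    for logical_path, data in sorted(files.items()):
        digest = hashlib.sha512(data).hexdigest()
        state.setdefault(digest, []).append(logical_path)
        # Content already held by an earlier version is not stored again.
        if digest not in inventory["manifest"]:
            inventory["manifest"][digest] = [f"{version}/content/{logical_path}"]
    inventory["versions"][version] = {
        "created": created,
        "message": message,
        "state": state,
    }
    inventory["head"] = version
    return version


# Hypothetical object: the metadata file changes, the audio does not.
inv = new_inventory("https://example.org/object/1")
add_version(
    inv,
    {"ro-crate-metadata.json": b'{"v": 1}', "audio.wav": b"RIFF"},
    created="2022-01-01T00:00:00Z",
    message="Initial deposit",
)
add_version(
    inv,
    {"ro-crate-metadata.json": b'{"v": 2}', "audio.wav": b"RIFF"},
    created="2022-02-01T00:00:00Z",
    message="Corrected metadata",
)
print(inv["head"], len(inv["manifest"]), sorted(inv["versions"]))
```

In this sketch both versions remain described in `versions`, while `manifest` holds only three stored files because the unchanged audio is referenced rather than copied. Since the inventory and content live together on disk, an index can in principle be rebuilt by walking the storage and reading each object's inventory, which is the completeness property listed above.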