Glossary

Access Conditions	Conditions that specify who can access data and what they can do with that data. A well-governed archival repository has mechanisms in place to administer and implement such conditions, which are specified in a data license.
ADA	Australian Data Archive. A national service for the collection and preservation of digital research data. More information
ADM+S	ARC Centre of Excellence for Automated Decision-Making and Society. It brings together universities, industry, government and the community to support the development of responsible, ethical and inclusive automated decision-making. More information
ADO	Australian Digital Observatory. An ARDC platform working to establish a national infrastructure to support a diverse array of researchers, especially in the humanities, in accessing and working with dynamic digital data. More information
AIATSIS	Australian Institute of Aboriginal and Torres Strait Islander Studies. Australia’s only national institution focused exclusively on the diverse history, culture and heritage of Aboriginal and Torres Strait Islander Australia, with a growing collection of over one million items dedicated to Australian Aboriginal and Torres Strait Islander cultures, histories and contemporary stories. More information
Analogue	Ways of storing information where the variables used for encoding are continuous. This contrasts with digital storage which uses only two discrete values. An example of analogue storage is a hard-copy photograph where the variables of hue and intensity are represented as continuous variables.
API	Application Programming Interface. A way for computer programs to communicate with each other. It is a way for one computer or system to ask another computer or system to do something, like provide a dataset.
ARC	Australian Research Council. Its purpose is to grow knowledge and innovation for the benefit of the Australian community through funding the highest quality research, assessing the quality, engagement and impact of research, and providing advice on research matters. More information
Archival Repository	A location for the storage of data that has an appropriate governance regime in place.
ARCP	Archive and Packaging ID. A globally unique, searchable ID with zero management overhead. It can be used like a URL in linked data systems but does not resolve to content in a browser.
ARDC	Australian Research Data Commons. The ARDC is Australia’s leading research data infrastructure facility accelerating Australian research and innovation by driving excellence in the creation, analysis and retention of high-quality data assets. More information
ARDS	Aboriginal Resource and Development Services (ARDS Aboriginal Corporation). Its work champions the importance of language and culture in developing self-determination for Aboriginal people, and supports Aboriginal communities to increase control and understanding of mainstream services and systems. More information
Arkisto	A scalable, standards-based platform for sustainable data. Data on an Arkisto deployment is always available on disc (or object storage) with a complete description independently of any services such as websites or APIs. Once the data is safe and well-described, Arkisto has a flexible model for how data can be accessed using a variety of services. Built on top of RO-Crate and OCFL. More information See also: Oxford Common File Layout See also: RO-Crate
ASR	Automatic Speech Recognition. ASR enables computers to process human spoken language into readable text, allowing users to operate devices through speech or facilitate translation of that speech into other languages.
ATAP	Australian Text Analytics Platform. An open source environment that provides researchers with tools and training for analysing, processing and exploring text. More information
AustLang	Provides a controlled vocabulary of persistent identifiers, a thesaurus of languages and peoples, and information about Aboriginal and Torres Strait Islander languages which has been assembled from referenced sources. Alphanumeric codes are used as persistent identifiers, while associated text strings are changeable and can reflect community preferences (including alternative names and spellings). In AustLang, Warlpiri has two codes: C15 for the language in general, and C15.1 for the variety named as Wakirti Warlpiri. More information
BI	Batchelor Institute of Indigenous Tertiary Education. The only First Nations dual sector tertiary education provider in Australia. The Institute gives precedence to its philosophy of Both Ways: positioning First Nations peoples as knowledge holders in all educational transactions with Western knowledge systems as well as privileging First Nations ways of learning and teaching to underpin engagement with mainstream education systems and society more broadly. More information
BinderHub	A Kubernetes-based cloud service that allows users to share reproducible interactive computing environments from code repositories. It is the primary technology behind Binder. ATAP notebooks are made available using a Binder instance maintained by AARNet/Nectar. More information
CADRE	Coordinated Access for Data, Researchers and Environments. A shared and distributed sensitive data access management platform for the social sciences and related disciplines. More information
CARE	Four principles developed by the Global Indigenous Data Alliance (GIDA) to ensure that Indigenous communities have control over the application and use of Indigenous data and Indigenous Knowledge for collective benefit. The principles specify four aspects of the respectful use of data: Collective Benefit, Authority to Control, Responsibility, and Ethics. More information
CDL	Community Data Lab. CDL shares tools and datasets for collaborative HASS research projects that use data from archives, libraries and collections. More information
CDU	Charles Darwin University. More information
CLARIN	CLARIN is a digital infrastructure offering data, tools and services to support research based on language resources. It is a European Research Infrastructure Consortium (ERIC). More information
Class	In linked data, a resource that represents a concept or entity. Classes in the LDAC Metadata Schema include CollectionEvent, CollectionProtocol, DataDepositLicense, DataLicense and DataReuseLicense.
CMDI	Component Metadata Infrastructure. Provides a standard for metadata within CLARIN. It draws on the earlier ISLE Metadata Initiative (IMDI), but CMDI adopts a more flexible approach where components are assembled into reusable profiles. More information
Collection	A group of related Objects. Examples of collections include corpora, and sub-corpora, as well as aggregations of cultural objects such as PARADISEC collections, which bring together items collected in a region or a session with consultants.
Confidentiality	The obligation to protect identity and privacy as recognised under Australian Law in the Privacy Act 1988. More information
Controlled Vocabulary	A set of choices which are permitted as values of a data field. For example, the LDaCA metadata schema has a property communicationMode which can describe a resource. This property takes a value from a controlled vocabulary: SpokenLanguage, WrittenLanguage, SignedLanguage, Gesture, Song, WhistledLanguage.
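As a minimal sketch of how a controlled vocabulary constrains a data field, the Python below checks a value against the communicationMode terms listed in the entry above. The function name `is_valid_communication_mode` is a hypothetical helper for illustration and is not part of any LDaCA tooling.

```python
# Permitted terms, copied from the communicationMode example above.
COMMUNICATION_MODES = frozenset({
    "SpokenLanguage", "WrittenLanguage", "SignedLanguage",
    "Gesture", "Song", "WhistledLanguage",
})


def is_valid_communication_mode(value: str) -> bool:
    """Return True if value is one of the permitted terms (hypothetical helper)."""
    return value in COMMUNICATION_MODES


print(is_valid_communication_mode("Song"))     # True
print(is_valid_communication_mode("Singing"))  # False
```

Any value outside the set is rejected, which is what keeps the field consistent across records.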
Copyright	The legal right of the owner of intellectual property. In simpler terms, copyright is the right to copy. This means that the original creators of products and anyone they give authorisation to are the only ones with the exclusive right to reproduce the work.
Copyright Owner	The creator of the work, and the person/institution who has the exclusive right to reproduce, publish, perform, communicate, and adapt or modify the work, for both commercial and non-commercial purposes. The copyright owner may be the same as the Data Steward.
Corpus	A sizable collection of real-life examples of language selected to be a fair representation of the language or a particular linguistic genre. Use of the term generally implies that the material is in a form which can be read and manipulated by a computer.
Crate-O	A browser-based editor that allows you to create and update RO-Crates using a web interface, and with metadata spreadsheets. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description. More information
Creative Commons Licenses	A set of licenses that allow for data reusability under specified conditions regarding attribution, data sharing, commercialisation and data adaptation.
Data Collection	A set of data collected under similar conditions and brought together in a shared framework.
Data Commons	Cloud-based infrastructure coupled with governance strategies and principles that allow a community to use, share, manage and analyse its data. LDaCA is a language data commons serving researchers and community groups that are interested in language data.
Data Custodian	A person to whom authority has been given to exercise (some of) the rights of a data owner.
Data Format	The encoding used to store a file. See also: Format
Data Governance	The policies and processes by which data is managed through its life cycle to ensure the quality, reliability, security, and sustainability of the data.
Data License	A legal arrangement between the creator of the data and the end-user specifying what users can do with the data. More information
Data Management Plan	A document that (1) outlines key information about a research project and its data, including the access conditions and ownership, storage, and future use and (2) sets out roles and responsibilities in its management.
Data Onboarding	The process by which language collections are catalogued in LDaCA, carried out collaboratively by the Data Steward and LDaCA.
Data Owner	A person who holds rights, such as copyright or moral rights, over some data.
Data Packaging	The application of widely used standards, for example, in terms of formats, metadata, and access conditions, to the collection data. See also: Data Transformation
Data Processing	The collection and manipulation of digital data to produce meaningful information.
Data Steward	An individual or organisation with the authority to make decisions regarding the collection.
Data Transformation	The process of converting data from one format or structure into another, including cleansing and structuring it into a usable form. Sometimes used as a synonym for Data Packaging. See also: Data Packaging See also: Data Wrangling
Data Wrangling	The process of transforming messy raw data into tidier, more usable formats for analysis. See also: Data Packaging See also: Data Transformation
Defined Term	In linked data, a metadata category that allows for a) accurate definitions of the values assigned to Properties, and b) grouping such definitions in DefinedTermSets, which can function as controlled vocabularies. DefinedTerms in the LDAC Metadata Schema include DerivedMaterial, PartOfSpeech, SignedLanguage, SpokenLanguage, etc.
Describo	A tool that allows you to create and update RO-Crates. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description. Superseded for project purposes by Crate-O.
DOI	Digital Object Identifier. A type of Persistent Identifier (PID) which is becoming the default identifier for research datasets, as a long-lasting reference to the collection. It comprises a unique number made up of a prefix and a suffix separated by a forward slash, resolvable by displaying it as a link, e.g. https://doi.org/10.1000/182
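The prefix/suffix structure described above can be illustrated with a short Python sketch. The helper `split_doi` is hypothetical; it assumes the DOI is given either bare or as an `https://doi.org/` link, and it splits on the first forward slash only, since a suffix may itself contain slashes.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix); hypothetical helper for illustration."""
    resolver = "https://doi.org/"
    # Accept the link form shown in the entry above as well as a bare DOI.
    if doi.startswith(resolver):
        doi = doi[len(resolver):]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("https://doi.org/10.1000/182")
print(prefix, suffix)  # 10.1000 182
```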
ELAN	A software tool to make time-aligned annotations (which may be transcriptions) of audio and video recordings. The tool is commonly used by linguists and others who work with language. More information
Elpis	A tool to obtain a first-pass transcription of untranscribed audio. It brings cutting-edge speech recognition technology within reach of language workers and researchers who don’t have backgrounds in speech engineering. More information
FAIR	Four key principles developed in 2016 with the aim of supporting the discovery and reuse of research data. The principles encourage us to make data Findable, Accessible, Interoperable and Reusable. More information
Field Notebook/Journal	A collection of fieldnotes compiled while completing fieldwork.
Fieldnotes	Notes taken by a researcher while conducting fieldwork that record their observations and other relevant information.
Fieldwork	The collection of data from an environment where the data is likely to occur naturally or organically without the intervention of researchers. In linguistics, this typically involves studying a language as it is spoken by a community of speakers in a particular location.
FLA	First Languages Australia. A national organisation working to ensure the strength of all Aboriginal and Torres Strait Islander languages. More information
Format	The encoding used to store a file, which is often specific to the nature of the material being stored. For example, .mp3 and .wav are formats for audio, and .jpg and .tiff are formats for images. As these examples illustrate, information about a file’s format is given in the extension of the filename.
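To illustrate reading a format from a filename extension, here is a minimal Python sketch using the standard library's `pathlib`. The mapping covers only the four extensions named in the entry above, and the function name `kind_of` is a hypothetical helper.

```python
from pathlib import Path

# Extension-to-kind mapping taken from the examples above; anything else is unknown.
KINDS = {".mp3": "audio", ".wav": "audio", ".jpg": "image", ".tiff": "image"}


def kind_of(filename: str) -> str:
    """Guess the kind of material from the filename extension (hypothetical helper)."""
    return KINDS.get(Path(filename).suffix.lower(), "unknown")


print(kind_of("session1.wav"))  # audio
print(kind_of("notes.txt"))     # unknown
```

The extension is only a hint; a file can be renamed without its encoding changing.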
GitHub	A developer platform that allows developers to create, store, manage and share their code, using Git software. More information
GLAM	Galleries, Libraries, Archives and Museums.
GLAM Peak	A representative national body that brings together the representative bodies for Australia’s galleries, libraries, archives, museums, historical societies, cultural heritage organisations and research peak bodies. More information
GLAM Workbench	A suite of Jupyter notebooks developed by Tim Sherratt to help with exploring and using data from GLAM institutions. Primarily, the notebooks use data from Trove newspaper and magazine collections, but have some extensions beyond this. More information
Glottolog	An alternative catalogue of the world’s languages, language families and dialects - Glottolog uses the term languoid to cover all of these. Each languoid is assigned a unique identifier consisting of four alphanumeric characters and four digits. For example, (standard) French has the code stan1290, and Warlpiri is warl1254. More information
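The identifier shape described above (four alphanumeric characters followed by four digits) can be checked with a regular expression. This is a sketch only: restricting letters to lowercase is an assumption based on the two examples given, and `looks_like_glottocode` is a hypothetical helper that tests the shape of a code, not whether it exists in Glottolog.

```python
import re

# Four alphanumeric characters, then four digits, as described above.
# Lowercase-only letters are an assumption drawn from the two examples.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def looks_like_glottocode(code: str) -> bool:
    """Return True if code has the shape of a Glottolog identifier."""
    return GLOTTOCODE.fullmatch(code) is not None


print(looks_like_glottocode("stan1290"))  # True
print(looks_like_glottocode("wbp"))       # False
```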
HASS	Humanities, Arts and Social Sciences.
HMI	Human Machine Interface. A user interface that connects a person to a machine, system or device. For example, in-car HMIs allow drivers to interact with their vehicle.
IAD	Institute for Aboriginal Development (Aboriginal Corporation). An Aboriginal community-controlled organisation established as a cross-cultural adult education and training centre serving all Aboriginal people in Central Australia. More information
ICIP	Indigenous Cultural and Intellectual Property is the traditional knowledge, traditional cultural expression, and cultural heritage of Indigenous peoples.
IDIL	International Decade of Indigenous Languages. The United Nations General Assembly has declared the period between 2022 and 2032 as the International Decade of Indigenous Languages, to draw global attention to the critical status of Indigenous languages worldwide and encourage action for their revitalisation, promotion and ongoing use. More information
IDN	Indigenous Data Network. A national network of Aboriginal community-controlled organisations, university research partners, Indigenous businesses and government agencies and departments established to support and coordinate the governance of Indigenous data for Aboriginal and Torres Strait Islander peoples and empower Aboriginal and Torres Strait Islander communities to decide their own local data priorities. More information
IIRC	Improving Indigenous Research Capability. A project supporting the creation of an Aboriginal and Torres Strait Islander Research Data Commons. More information
Intellectual Property	Creative works protected by law via patents, copyright and trademarks.
Interoperability	The ability of computer systems or software to exchange and make use of information. The relevant FAIR principle uses the term specifically in relation to data.
IPA	International Phonetic Alphabet. An alphabetic system of phonetic notation based primarily on the Latin script, designed as a standardised representation of speech sounds in written form.
ISO-639	A standard by the International Organization for Standardization (ISO) concerned with representation of languages and language groups. An earlier version of this system used two-letter codes to identify languages; more recent versions use three-letter codes (referred to as ISO 639-3). The ISO 639-3 code for French is fra, and Warlpiri is wbp. More information
JSON	JavaScript Object Notation. A data-interchange text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages. More information
Jupyter Notebook	Interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media. More information
LADAL	Language Technology and Data Analysis Laboratory. A free, open-source, collaborative support infrastructure for digital and computational humanities assisting anyone interested in working with language data in matters relating to data processing, visualization and analysis, and offering guidance on matters relating to language technology and digital research tools. More information
LDAC	Language Data Commons. LDAC can refer to the schema, the profile or the modes associated with it.
LDaCA	Language Data Commons of Australia. LDaCA is making nationally significant language data available for academic and non-academic use and providing a model for ensuring continued access with appropriate community control. Our preferred pronunciation of the name is el-dakka (and that is why you may find the odd alpaca on this website). More information
Legacy (File) Format	An old, outdated or obsolete file format that is no longer supported by modern hardware and/or software systems.
Lexicon	A list of forms in a language with associated information, such as meanings, pronunciations or word class assignments.
Licensing	A process that allows the copyright owner of a work to share the right to access and use some material from the work without reassigning the ownership of the copyright. License terms establish the conditions for that access and use. A license for a data collection is the legal agreement between the creator of the data and the end-user specifying who can access, share and reuse the data, and other conditions as required.
Linked Data	Structured data that is interlinked with other data and published in a machine-readable way to maximise interoperability and improve the precision of metadata.
Metadata	The information that defines and describes data. It provides data users with information about the purpose, processes, and methods involved in the data collection. (Source: Australian Bureau of Statistics).
Methods	Procedures, processes or techniques which can be systematically and repetitively used to achieve a particular goal.
Mode	Also called a Mode file. An implementation of an RO-Crate Profile consisting of a set of lightweight syntactic rules for combining Schema.org Style Schema (SOSS) Classes, Properties and DefinedTerms in a JSON file. Modes can be loaded to an editor such as Crate-O, used for RO-Crate validation or used to summarise rules for RO-Crate Profiles.
MT	Machine Translation.
NCRIS	National Collaborative Research Infrastructure Strategy. It provides strategic funding for national-scale research infrastructure, driving collaboration to bring economic, environmental, health and social benefits for Australia. More information
NER	Named-Entity Recognition. NER locates and classifies named entities in unstructured text into predefined categories such as person names, organisations and locations.
NFSA	National Film and Sound Archive. Australia’s national audiovisual cultural institution which collects, preserves and shares Australia’s audiovisual culture. More information
Nyingarn	A 3-year Australian Research Council funded project that will provide digital access to early sources of Australia’s Indigenous languages, using various ways to turn images of manuscripts into text, including Optical Character Recognition (OCR), and crowdsourced transcription (using DigiVol). More information
Object	A single resource or a group of tightly related resources; for example, a work (document) in a written corpus, or the files associated with a dialogue or session in a speech study (recordings, transcriptions etc.).
OCFL	Oxford Common File Layout. An application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories. More information
OCR	Optical Character Recognition. The electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text.
OLAC	Open Language Archives Community. An international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. More information
Oni	A portal for discovery of RO-Crated data. It is a web application which provides indexing, searching and access to secure data repositories which follow the Arkisto model. More information
Oral History	The gathering, recording and preserving of historical information, based on interviews about the experiences, memories and opinions of people who participated in or observed past events.
ORCID	Open Researcher and Contributor ID. A registry providing globally unique persistent identifiers (PIDs) for researchers, authors and contributors of scholarly works. More information
Orthographic Transcription	A transcription method that employs the standard spelling system of each target language.
PARADISEC	Pacific and Regional Archive for Digital Sources in Endangered Cultures. A digital archive that works to digitise, preserve and make accessible recordings that are at risk of loss; particularly for languages in the Pacific region. More information
Phonemic Transcription	A representation of speech in terms of the sound contrasts made in a language, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA.
Phonetic Transcription	A representation of speech in terms of the sounds actually produced in specific instances, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA.
PID	Persistent Identifier. A digital identifier that is permanently assigned and provides a long-lasting reference to an object or entity, for example a Digital Object Identifier (DOI).
Profile	Specifies a subset of a metadata standard for a particular use case, such as for describing language resources. LDaCA uses RO-Crate profiles, which are a set of conventions, types and properties that are required in RO-Crates. Specifically, the LDAC RO-Crate Metadata Profile provides the minimum structural metadata for describing language data resources.
Property	In linked data, a metadata category which is an attribute of an instance of a Class. Properties in the LDAC Metadata Schema include author, communicationMode, linguisticGenre, speaker, signer, etc.
Provenance	The documented history or chain of custody of materials from their creation to their current location within a collection. The full history and ownership of an item from the time of its discovery or creation to the present day, through which authenticity and ownership are determined.
Python	A high-level, general-purpose programming language with an emphasis on code readability. More information
QUT	Queensland University of Technology. More information
R	A programming language and environment for statistical computing and graphics. More information
RDC	Research Data Commons. See also: ARDC
Research Data Management	The handling of data during and after a research activity including generating, collecting, organising, accessing, using, analysing, storing, disclosing, documenting, preserving, disposing of, sharing and re-using data.
REMS	Resource Entitlement Management System. A tool to help researchers browse resources such as datasets relevant to their research and to manage the application process for access to those resources.
Research Infrastructure	The facilities, systems, tools, platforms, equipment, instruments and other resources and services that are needed for research communities to conduct research. This can include both tangible assets, like supercomputers, and intangible assets, like data collections.
RIIP	Research Infrastructure Investment Plan (NCRIS). It provides continued support for Australia’s National Research Infrastructure facilities, as well as investment in emerging research priorities.
RO-Crate	Research Object Crate. A way of packaging research data that stores the data together with its associated metadata and other component files, such as the data license. More information
Schema	Specifies a metadata vocabulary of Classes and Properties, based on the RO-Crate specification’s use of Schema.org classes.
Sensitive Data	Data that, as a result of research, contains confidential or other ‘sensitive information’ which is defined in the Privacy Act as information or opinion about an individual’s: racial or ethnic origin; political opinions; membership of a political association; religious beliefs or affiliations; philosophical beliefs; membership of a professional or trade association; membership of a trade union; sexual preferences or practices; criminal record; health information; genetic information; culturally sensitive data or data deemed sensitive by the data provider. More information
Standard	A document that sets out specifications, procedures or guidelines ensuring that products, services, and systems are safe, consistent, and reliable. Standards exist at different levels, such as international standards (like ISO 639), national standards, and those which are accepted within a community of practice.
Takedown Policy	The policy according to which data may be removed, or access may be adjusted in some way, and the steps by which this is implemented.
TK Labels	Traditional Knowledge Labels. An initiative for Indigenous communities and local organisations, allowing communities to express local and specific conditions for sharing and engaging in future research and relationships in ways that are consistent with already existing community rules, governance and protocols for using, sharing and circulating knowledge and data. More information
Tools	Code or software developed in order to support or enhance (language) data accessibility and use. More generally, physical or digital resources which are used to accomplish specific tasks. For example, a hammer is used to drive nails, and the find-replace tool in a word processor is used to consistently change one piece of text to another.
Transcoding	The process of converting one digital encoding format to another, such as from a high-resolution image to a lower-resolution one.
TTS	Text-to-Speech. TTS generates an artificial spoken audio version of a written text and can be used to improve accessibility.
UI	User Interface. The way in which a computer application is presented to the user and which defines how the user can interact with the application.
UoM	The University of Melbourne. More information
UQ	The University of Queensland. More information
USC	University of the Sunshine Coast. More information
USyd	The University of Sydney. More information
UWA	The University of Western Australia. More information
VoIP	Voice over Internet Protocol. A technology allowing phone calls to be made through the Internet using a broadband connection, rather than through a landline or mobile network.
Wangka Maya	Wangka Maya Pilbara Aboriginal Language Centre. It aims to be recognised as a leading Aboriginal language and resource centre in Australia, using expertise, knowledge and sensitivity to record and foster Aboriginal languages, culture and history. More information
Work Plan	An agreement between LDaCA and the Data Steward establishing the terms according to which the data will be onboarded to LDaCA, including the goals and responsibilities of each party, and the steps and timeline for carrying out the onboarding process.
WP	Work Package. The smallest and most manageable unit of a project, a sequence of activities that leads to a deliverable.
X-SAMPA	Extended Speech Assessment Methods Phonetic Alphabet. A phonetic script designed to extend SAMPA to cover the range of characters in the International Phonetic Alphabet (IPA).
XML	Extensible Markup Language. A markup language and file format for storing, transmitting, and reconstructing data. More information
Zenodo	A multi-disciplinary open data repository maintained by CERN. More information
