What is LDaCA?

The Language Data Commons of Australia (LDaCA) is making nationally significant language data available for academic and non-academic use, and is providing a model for ensuring continued access with appropriate community control.

Australia is a massively multilingual country in one of the world’s most linguistically diverse regions. Significant collections of this intangible cultural heritage have been amassed, including collections of Aboriginal and Torres Strait Islander languages, Australian Englishes, and regional languages of the Pacific, as well as collections important for cyber-security and emergency communication. LDaCA is integrating this existing work into a national research infrastructure while also securing at-risk collections and improving access to under-utilised ones. LDaCA is thus ensuring that these invaluable resources will remain available for analysis and reuse, and that they will be managed in a culturally, ethically and legally appropriate manner guided by the FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective benefit, Authority to control, Responsibility, Ethics) principles.

To accomplish these goals, LDaCA is:

  • Developing a comprehensive language data access policy framework

    LDaCA is establishing governance structures, guided by FAIR and CARE principles, that will ensure the legal rights and ethical concerns of data providers are respected.

  • Developing shared technical infrastructure and standards across institutions

    LDaCA is building technical capabilities based on international best-practice standards for sustainable data management. High-quality description of data is a foundation of this approach, as it enables the data to be reused in the future.

  • Building a sustainable long-term repository for curating language data collections of national significance

    The technical infrastructure being built is designed to maximise the security of data in the face of inevitable, ongoing technological change. This work includes storing data for the life of the project and working with partner institutions to find storage solutions that will remain stable into the future.

  • Building portals for discovery and access of language data

    The online interfaces provided by LDaCA will make it easier for researchers and other users of language data to find the materials they need to support their work. Because needs vary across user groups, different portals are being built to serve different communities.

  • Making analytic tools available to a diverse research community

    Through its associated projects, the Australian Text Analytics Platform (ATAP) and the Language Technology and Data Analysis Laboratory (LADAL), LDaCA is making text analysis tools available to researchers across a range of disciplines and skill levels.

  • Contributing to Australia’s emerging digital research culture

    As well as providing data and tools for working with that data, LDaCA assists researchers by offering training in digital research practices that can lead to improved accountability and a strengthened research culture.

The result will be an integrated national technical infrastructure for analysing language collections at scale, which will open up the social and economic possibilities of Australia’s rich linguistic heritage. The project is building connections to other projects in the Humanities and Social Sciences and Indigenous Research Data Commons (HASS&I-RDC) and helping to lay the foundation for that broader Research Data Commons. It is also positioning Australia internationally as a leading contributor of language collections and digital infrastructure.