Indigenous knowledge

  • Protocols for using First Nations Cultural and Intellectual Property in the Arts (Australia Council)
  • Citing Indigenous elders and knowledge keepers
  • TK Labels: Traditional Knowledge (TK) Labels are an initiative for Indigenous communities and local organisations. Developed through sustained partnership and testing within Indigenous communities across multiple countries, the Labels allow communities to set out local and specific conditions for sharing and for engaging in future research and relationships, in ways consistent with existing community rules, governance and protocols for using, sharing and circulating knowledge and data.
  • Decolonising linguistics: Spinning a better yarn
  • Rawlings, V, Flexner, J L, Riley, L (eds.) (2021) Community-Led Research: Walking New Pathways Together. Sydney University Press

Indigenous Languages of Australia

  • First Languages Australia: First Languages Australia encourages communication between communities, the government and key partners whose work can impact Aboriginal and Torres Strait Islander languages.
  • Living Languages: Living Languages provides grassroots training to people, communities and Language Centres in Australia doing language work on the ground in remote, regional and urban areas.
  • Language Centres: The Indigenous Languages and Arts (ILA) program currently supports a number of organisations, including a network of 20 language centres. A list of these centres is available in PDF and DOCX formats.
  • AustLang: AustLang provides a controlled vocabulary of persistent identifiers, a thesaurus of languages and peoples, and information about Aboriginal and Torres Strait Islander languages which has been assembled from referenced sources.
  • Nyingarn: a platform for primary sources in Australian Indigenous languages
  • Digital Daisy Bates: a collection of over 23,000 pages of wordlists of Australian languages, originally recorded by Daisy Bates in the early 1900s, made up of the original questionnaires and around 4,000 pages of typescripts.
  • 50 Words Project: The 50 Words Project showcases words from 64 languages across Australia, with further languages being added regularly as more communities around Australia become involved. For each language you can hear the words spoken via a map that shows the general location of the language.
  • Gambay: a map of Australia’s first languages.
  • Australian Indigenous language collections: a guide to materials held in the National Library of Australia, with links to similar resources at State Libraries.
  • Living Archive of Aboriginal Languages (LAAL): see Language Archives below
  • Contemporary and Historical Reconstruction in the Indigenous Languages of Australia (CHIRILA): CHIRILA is a lexical database (a database of words from different languages). It currently contains about 780,000 words from all over Australia, of which about 20% are publicly available.


Language and linguistics datasets are often not cited, or are cited imprecisely, because of confusion about how to cite them properly. To help researchers working with such datasets, the Tromsø recommendations set out the components of a data citation for language data, for use both in the bibliography and in the text of linguistics publications.

Languages of the world

  • Glottolog: Comprehensive reference information for the world’s languages, especially lesser-known languages. LDaCA uses Glottolog language codes in our metadata.

  • Open Language Archives Community (OLAC) is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. OLAC harvests metadata, and its website provides a search facility for finding resources by language. OLAC metadata recommendations are the basis for some of LDaCA’s metadata.

  • Ethnologue: Ethnologue provides information about the languages of the world but operates on a subscription model. Very little information is available without a subscription: only the first three lines of each language entry, which include the ISO 639-3 code, the classification of the language into a language family, and a link to the language’s OLAC page.

  • The World Atlas of Language Structures Online (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages, gathered from descriptive materials (such as reference grammars) by a team of 55 authors.
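The language codes mentioned above have fixed shapes, which makes a quick sanity check on metadata values easy. The sketch below is a minimal illustration, assuming a Glottocode is four lowercase letters or digits followed by four digits (for example stan1293, the Glottocode for English) and an ISO 639-3 code is three lowercase letters (for example eng). The function names are hypothetical. The check looks at format only and does not confirm that a code actually exists in Glottolog or in the ISO 639-3 registry.

```python
import re

# Assumed shapes (format only, not registry lookups):
#   Glottocode: four lowercase letters or digits, then four digits, e.g. "stan1293"
#   ISO 639-3:  three lowercase letters, e.g. "eng"
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")
ISO639_3_RE = re.compile(r"[a-z]{3}")


def looks_like_glottocode(value: str) -> bool:
    """Return True if value has the shape of a Glottocode."""
    return GLOTTOCODE_RE.fullmatch(value) is not None


def looks_like_iso639_3(value: str) -> bool:
    """Return True if value has the shape of an ISO 639-3 code."""
    return ISO639_3_RE.fullmatch(value) is not None


def classify_language_code(value: str) -> str:
    """Label a code as 'glottocode', 'iso639-3' or 'unknown' by shape alone."""
    code = value.strip()  # tolerate stray whitespace from spreadsheets or forms
    if looks_like_glottocode(code):
        return "glottocode"
    if looks_like_iso639_3(code):
        return "iso639-3"
    return "unknown"


if __name__ == "__main__":
    for code in ["stan1293", "eng", "English", "STAN1293"]:
        print(code, "->", classify_language_code(code))
```

Because the patterns are case-sensitive, an upper-case value such as STAN1293 is reported as unknown; a real metadata pipeline would probably also look codes up against the published Glottolog and ISO 639-3 lists.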

Language Archives

  • LAAL: The Living Archive of Aboriginal Languages is a digital archive of endangered literature in Australian Indigenous languages of the Northern Territory. It contains nearly 4,000 books in 50 languages from 40 communities, which can be read online or downloaded free of charge. It is a living archive, with connections to the people and communities where the books were created, which allows collaborative research with Indigenous authorities and communities.
  • PARADISEC: The Pacific and Regional Archive for Digital Sources in Endangered Cultures holds 14,500 hours of audio recordings and 2,000 hours of video recordings that might otherwise have been lost. These recordings are of performance, narrative, singing, and other oral tradition. This amounts to 150 terabytes, and represents 1,315 languages, mainly from the Pacific region.
  • ELAR: The Endangered Languages Archive is a digital repository for preserving multimedia collections of endangered languages from all over the world, making them available for future generations.
  • The Language Archive: The Language Archive (TLA) is an integral part of the Max Planck Institute for Psycholinguistics in Nijmegen. It contains various types of material, including: audio and video language corpus data from languages around the world; photographs, notes, experimental data and other information needed to document and describe languages and how people use them; records of speech in everyday interactions in families and communities; naturalistic data from adult conversations; and data on endangered and under-studied languages and on linguistic phenomena.
  • Kaipuleohone Language Archive: Kaipuleohone is the digital language archive of the University of Hawaiʻi. Founded in 2008, the archive houses texts, images, audio and video collected from around the world by linguists, anthropologists, ethnomusicologists and others. Its collection includes a wealth of photographs, notes, dictionaries, transcriptions and other materials related to small and endangered languages.
  • DELAMAN: The Digital Endangered Languages and Musics Archives Network is an international network of archives of data on linguistic and cultural diversity, in particular on small languages and cultures under pressure.

Australian Organisations

  • The Australian Linguistic Society (ALS) is the national organisation for linguists and linguistics in Australia. Its primary goal is to further interest in and support for linguistics research and teaching in Australia.

  • The Applied Linguistics Association of Australia (ALAA) is the national professional organisation for applied linguistics in Australia. It welcomes academics, teachers, researchers, students and members of the wider community to join and become part of an active community interested in questions, issues and problems that can be understood and addressed through a focus on language in our world.

  • The Australasian Language Technology Association (ALTA) promotes language technology research and development in Australia and New Zealand.

  • The Australasian Speech Science and Technology Association (ASSTA) is a scientific association which aims to advance the understanding of speech science and its application to speech technology in a way that is appropriate for Australia and New Zealand.

  • The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) is an Indigenous-led, national institute that celebrates, educates and inspires people from all walks of life to connect with the knowledge, heritage and cultures of Australia’s First Peoples.

International Organisations

  • Common Language Resources and Technology Infrastructure (CLARIN) is a research infrastructure founded on the vision that all digital language resources and tools from across Europe and beyond should be accessible through an online environment that supports researchers in the humanities and social sciences.

  • Endangered Languages Documentation Program (ELDP) supports the documentation and preservation of endangered languages through grants, training and outreach activities. The collections compiled with its funding are freely accessible at the Endangered Languages Archive.

  • The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. LDC was formed in 1992 to address the critical data shortage then facing language technology research and development. Initially, LDC’s primary role was as a repository and distribution point for language resources, but with the help of its members, LDC has grown into an organisation that creates and distributes a wide array of language resources.


  • The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller and Lauren B. Collister. MIT Press, 2022.

    “A guide to principles and methods for the management, archiving, sharing, and citing of linguistic research data, especially digital data. ‘Doing language science’ depends on collecting, transcribing, annotating, analyzing, storing, and sharing linguistic research data. This volume offers a guide to linguistic data management, engaging with current trends toward the transformation of linguistics into a more data-driven and reproducible scientific endeavor. It offers both principles and methods, presenting the conceptual foundations of linguistic data management and a series of case studies, each of which demonstrates a concrete application of abstract principles in a current practice.”

    This material is Open Access.

  • Language Documentation and Conservation

    “LD&C publishes papers on all topics related to language documentation and conservation, including, but not limited to, the goals of language documentation, data management, fieldwork methods, ethical issues, orthography design, reference grammar design, lexicography, methods of assessing ethnolinguistic vitality, archiving matters, language planning, areal survey reports, short field reports on endangered or underdocumented languages, reports on language maintenance, preservation, and revitalization efforts, plus software, hardware, and book reviews.”

    LD&C is an Open Access journal.

  • Living languages/Lenguas vivas/Línguas vivas journal

    “Living Languages is an international, multilingual journal dedicated to topics in language revitalization and sustainability. The goal of the journal is to promote scholarly work and experience-sharing in the field. The primary focus is on bringing together language revitalization practitioners from a diversity of backgrounds, whether academic or not, within a peer-reviewed publication venue that is not limited to academic contributions and is inclusive of a diversity of perspectives and forms of expression.”

    Living Languages is an Open Access journal.

  • Pacific Linguistics: a digital archive of many Pacific Linguistics publications up to 2012.


  • Audacity: Audacity is a free, easy-to-use, multi-track audio editor and recorder for Windows, macOS, GNU/Linux and other operating systems.
  • ELAN: a tool for making time-aligned annotations on audio and video recordings.
  • Praat: Free software for doing phonetics by computer.
  • FieldWorks: Software tools for managing linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis, and other publications.
  • Toolbox: Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.
  • AntConc: A freeware corpus analysis toolkit for concordancing and text analysis.