LDaCA Newsletter Quarter 1 2024




Welcome to the first issue for 2024 of this newsletter about the activities of the Language Data Commons of Australia (LDaCA) and the Australian Text Analytics Platform (ATAP). This quarter, we introduce the 2024 Graduate Digital Research Fellowship program, highlight the upcoming Computational Skills Summer School 2024 from the Australian Research Data Commons (ARDC) and report back from the 2023 Australian Linguistics Society (ALS) annual conference. If you have any questions or feedback, please email us at ldaca@uq.edu.au or message us on our new LinkedIn page.


New LinkedIn page

LDaCA is now on LinkedIn! We look forward to sharing more content about the project and connecting with the wider community. Follow our page or send us a message here.

Website update

Our website has a new look and has been reorganised. We hope that you like the design changes and that information is now easier to find. We encourage feedback via email or LinkedIn so that we can continue to make the website more useful.

Online policy documents and advice

New team member

A new team member, Teresa Chan, joined us at the start of this year. Here is her brief introduction:

Hi, my name is Teresa Chan and I’ve joined the LDaCA team as a Communications Officer, based in Sydney on Dharug/Darug Country. My specialty is applied linguistics, most recently managing language data technology projects in the artificial intelligence and machine learning space. I’m looking forward to raising the profile of LDaCA and showcasing the important work being done to provide appropriate, continued access to language data collections, particularly for Aboriginal and Torres Strait Islander and Pacific languages.


Upcoming Events

HASS and Indigenous Research Data Commons Computational Skills Summer School 2024

When: 7–9 February 2024, 8:00am to 3:30pm (AEST), 9:00am to 4:30pm (AEDT)

Where: Naarm/Melbourne

Run by: ARDC

HDR, EMCR, Indigenous and HASS researchers as well as managers of Indigenous data are invited to a free Computational Skills Summer School. At this 3-day, face-to-face event, you will gain useful digital skills for HASS and Indigenous research in an interactive group setting and network with researchers in your field.

Session highlights include:

  • data governance and management, with a focus on Indigenous data

  • integrating social and geospatial data

  • finding and analysing GLAM (galleries, libraries, archives, and museums) data.

For more information and to register, see the registration page here.

Co-design workshops for LDaCA


When:

  • Day 1: 22 February 2024, 1:00pm to 3:00pm (AEST), 2:00pm to 4:00pm (AEDT)

  • Day 2: 7 March 2024, 12:00pm to 2:00pm (AEST), 1:00pm to 3:00pm (AEDT)

Where: Online

Run by: ARDC

The ARDC invites the research community to join two workshops to co-design a national research infrastructure program that will support LDaCA. Through the co-design workshops, the ARDC aims to better understand the current digital research challenges faced by researchers and managers of Indigenous data. The workshops will enable the research community to discuss the investment opportunities that ARDC has identified, with the aim of learning:

  • what outcomes and developments would be of most benefit

  • what will be both valuable and feasible

  • how our investments can align with other activity in the sector.

Who should participate:

  • HASS and Indigenous research community, including academics, researchers and citizen scientists, particularly those involved in data-driven research

  • managers of Indigenous data

  • senior decision makers at research, GLAM and Indigenous institutions, industry and NGOs (non-governmental organisations)

  • those who collect and manage data for use by research

  • research infrastructure providers and digital skills trainers.

For more information and to register, see the registration page here.

Recent Events

Workshop on community language corpora in Australia

Li Nguyen and Catherine Travis convened a workshop on community language corpora in Australia (Canberra, Ngambri and Ngunnawal Country, 9–10 Nov). The participants enjoyed two days of stimulating presentations (including ones by LDaCA team members Li, Catherine and Simon Musgrave) and contributed to the important discussions which followed. The program and abstracts can be found here. This is the second workshop on language corpora in Australia that Li and Catherine organised in 2023. You can access the program and the abstracts for the first workshop here.

2023 Vocabulary Symposium: FAIR Vocabularies for All

Simon Musgrave and Peter Sefton attended the 2023 Vocabulary Symposium with the theme “FAIR Vocabularies for All” (Canberra, Ngambri and Ngunnawal Country, and online, 14–15 Nov). A summary of the event and video recordings can be found on the ARDC website here. A recording of Simon’s talk (“Using GitBook for user-friendly documentation of a vocabulary”) can be viewed here and a copy of the abstract and slides can be accessed here. The abstract and slides for Peter’s talk (“Building domain specific vocabularies for packaging and archiving research and cultural data”) can be accessed here.

Text analytics webinar series

Monika Bednarek gave a seminar called “New Tools for Corpus Linguistics” (online, 9 Oct) in a text analytics webinar series jointly hosted by the ESRC Centre for Corpus Approaches to Social Sciences at Lancaster University and the Sydney Corpus Lab at the University of Sydney. You can watch a recording and access a transcript here.

Corpus Spotlight: Corpus of Australian and New Zealand Spoken English (CoANZSE)

The Corpus of Australian and New Zealand Spoken English (CoANZSE) is a recent 196-million-word corpus of speech transcripts from Australia and New Zealand, including annotation, audio and forced alignment files.

Steven Coats (University of Oulu, Finland) created the corpus from 55,896 Automatic Speech Recognition transcripts from 478 YouTube channels of local councils and other governmental entities across all primary regions and territories of Australia and New Zealand. Annotation includes part-of-speech tagging and individual word timings. Exact latitude-longitude coordinates for the council authorities are also provided in the metadata, supporting geographical analysis of language data.

All the corpus material is scraped from YouTube channels associated with local government entities and is therefore in the public domain. This includes recordings of council meetings, but also interviews, informational and public service videos, vlogs, public readings and other content types.

CoANZSE will likely interest digital humanities and social science researchers in the fields of linguistics and language, communication, cultural studies, geography, political science, sociology, urban studies and planning, government and international relations and public and social policy, among others. You can read more about the corpus, including preliminary results from two exploratory analyses, in this conference paper.

The corpus is freely available for research and education purposes. Transcripts can be found at the Harvard Dataverse here. The CoANZSE Audio v0.2 website offers a searchable online version of the corpus, including audio files and forced alignments in Praat's TextGrid format. To access it, log in via a CLARIN/eduGAIN-affiliated service provider or with a CLARIN account.

Contact Steven Coats at steven.coats@oulu.fi or see the CoANZSE website for more information.

Team Member’s Tip

No Office Hours

The Joint Office Hour run by LDaCA and the Australian Digital Observatory will not take place in 2024. The teams from the two projects are working towards an alternative way to provide targeted advice to researchers. Watch this space!

We welcome any feedback to make future issues more useful for you. If the newsletter was forwarded to you, you can subscribe here.


LDaCA acknowledges Traditional Owners of Country throughout Australia and recognises the continuing connection to lands, waters and communities. We pay our respects to their Ancestors and their descendants, who continue cultural and spiritual connections to Country.


Republishing is encouraged — CC BY text and infographics.

If you have questions about republishing, please contact ldaca@uq.edu.au

©LDaCA — 2024