Workshop on community language corpora in Australia
Australia is a highly multilingual and multicultural society, with more than 490 languages coming from around 300 ancestries and cultural traditions (ABS, 2021, 2022). For decades, the language and cultural maintenance of various immigrant groups have been under investigation by many scholars, not only in linguistics but also in history, sociology, anthropology, and other disciplines. This work has amassed a large body of data reflecting the languages of these communities, providing information about how Australia’s immigration history has contributed to the country today.
The purpose of this workshop is to bring together scholars working with language corpora from across different disciplines. The workshop is being run as part of the Language Data Commons of Australia (LDaCA), which is working to build national research infrastructure for the Humanities and Social Sciences, facilitating sustainable access to and controlled use of digital language corpora for linguists, scholars across the Humanities and Social Sciences, and non-academics.
The workshop will consist of presentations on language data collected from Australian immigrant communities for different research purposes, and will close with a panel discussion on needs and challenges around managing and archiving community language data in a way that is ethical, legal and culturally sensitive, and how LDaCA can help support that.
When: 9-10 November 2023
Where: Engma Room (3.165 HC Coombs Building, ANU)
Organisers: Li Nguyen & Catherine Travis
Program (pdf with abstracts)
Our webinar series is a joint initiative with the Language Technology and Data Analysis Laboratory (LADAL) at the School of Languages and Cultures, The University of Queensland.
October 3 2022 - Paweł Kamocki: European Union Data Protection initiatives and their consequences for research
The European Union, with its large population and GDP, is a leading force in regulatory globalisation. This webinar will discuss recent developments in legal frameworks affecting research data in Europe. Apart from the General Data Protection Regulation which, since its entry into application in 2018, has become an international standard of personal data protection, the recent introduction of statutory copyright exceptions for Text and Data Mining will also be discussed. The webinar will also present the most recent changes in EU law, such as the Data Governance Act and the Artificial Intelligence Act, which are expected to enter into application in the coming years.
Paweł Kamocki is a legal expert at the Leibniz-Institut für Deutsche Sprache, Mannheim. He studied linguistics and law, and in 2017 obtained his doctorate in law from the universities of Paris and Münster for a thesis on legal aspects of data-intensive university research, with a focus on Knowledge Commons. He worked as a research and teaching assistant at Paris Descartes University (now: Université de Paris), then also in the private sector. He is certified to work as an attorney in France. An active member of the CLARIN community since 2012, he currently chairs the CLARIN Legal and Ethical Issues Committee. He has also worked with other projects and initiatives in the field of research data policy (RDA, EUDAT) and co-created several LegalTech tools for researchers. One of his main research interests is legal issues in Machine Translation.
August 1 2022 - Václav Cvrček: The Czech National Corpus
Václav Cvrček is a linguist who works on the description of the Czech language, especially with the use of large electronic corpora and quantitative methods. From 2013 to 2016 he was the director of the Czech National Corpus project; since 2016 he has been its deputy director. Recently, he has been focusing on research on textual variability and corpus-based discourse analysis with a focus on online media.
June 6 2022 - Barbara McGillivray: The Journal of Open Humanities Data
Barbara McGillivray is a Turing Research Fellow at The Alan Turing Institute, and Editor in Chief of the Journal of Open Humanities Data. Since September 2021 she has also been a lecturer in Digital Humanities and Cultural Computation at the Department of Digital Humanities of King’s College London. Before joining the Turing, she was a language technologist in the Dictionary division of Oxford University Press and a data scientist in the Open Research Group of Springer Nature. Her research at the Turing is on how words change meaning over time and how to model this change in computational ways. She works on machine-learning models for the change in meaning of words in historical times (Ancient Greek, Latin, eighteenth-century English) and in contemporary texts (Twitter, web archives, emoji). Her interdisciplinary contribution covers Data Science, Natural Language Processing, Historical Linguistics and other humanistic fields, to push the boundaries of what academic disciplines separately have achieved so far on this topic.
April 4 2022 - Keoni Mahelona: A practical approach to Indigenous data sovereignty
Keoni Mahelona is the Chief Technical Officer of Te Hiku Media, where he is part of the team developing the Kaitiakitanga Licence. This licence seeks to balance the importance of publicly accessible data with the reality that Indigenous peoples may not have access to the resources that enable them to benefit from public data. If access to data and knowledge is simply opened up, Indigenous people could be further colonised and taken advantage of in a digital, modern world. Keoni is therefore committed to devising data governance regimes which enable Indigenous people to reclaim and maintain sovereignty over Indigenous data.
We invite Australian researchers working with linguistics, text analytics, digital and computational methods, social media and web archives, and much more to attend our regular online office hours, jointly hosted with the Digital Observatory. Bring your technical questions, research problems and rough ideas and get advice and feedback from the combined expertise of our ARDC research infrastructure projects. No question is too small, and even if we don’t know the answer we are likely to be able to point you to someone who does.
These sessions run over Zoom from 2-3pm (Australia/Sydney time) every second Tuesday - details.