Collections

LDaCA Portal

LDaCA has begun adding datasets, including:

most of the datasets that were part of the Australian National Corpus collection, now available through our data portal.

A Corpus of Oz Early English (COOEE)

A collection of texts written in Australia between 1788 and 1900. The corpus is divided into four time periods (1788–1825, 1826–1850, 1851–1875 and 1876–1900), each holding about 500,000 words. Four registers were defined for COOEE: the Speech-based Register (SB), the Private Written Register (PrW), the Public Written Register (PcW) and the register of Government English (GE). Within each time period, the registers contain similar numbers of words.

AustLit

The contribution from AustLit provides full-text access to select samples of out-of-copyright poetry, fiction and criticism ranging from 1795 to the 1930s. The collection includes literature intended for popular audiences as well as literature intended for audiences concerned with literary quality or the establishment of a national canon.

Australian Corpus of English

The Australian Corpus of English (ACE) was compiled to match Australian data from 1986 with the American (Brown) and British (LOB) corpora of written English from the 1960s. It includes 500 samples of published texts taken from 15 different categories of nonfiction and fiction, including newspapers (reportage, editorials, reviews); magazines and journals (popular and academic); government and corporate documents; and fiction monographs and short stories (both popular and literary).

Australian Radio Talkback

Australian Radio Talkback (ART) is a set of transcribed recordings sampled from national, regional and commercial Australian talkback radio between 2004 and 2006. It includes 27 audio recordings and transcripts of talkback from ABC National Radio, ABC Radio broadcasts to eastern Australia and ABC Radio broadcasts to southern and western Australia, as well as from commercial stations broadcasting to eastern Australia and to southern and western Australia.

Braided Channels

The Braided Channels research collection is constructed from some 70 hours of oral history interviews with women from Australia’s Channel Country, together with archival film, transcripts, photos and music. It includes both audiovisual recordings and transcripts of interviews.

International Corpus of English (ICE-AUS)

The Australian component of the International Corpus of English (ICE-AUS) is an approximately one-million-word corpus of transcribed spoken and written Australian English from 1992–1995. It consists of 500 samples of Australian English (60% speech, 40% writing) that match the structure of the other corpora in the International Corpus of English.

The La Trobe Corpus of Spoken Australian English

The La Trobe Corpus of Spoken Australian English comprises six recordings and transcriptions of spoken interaction among Australian speakers of English (some in conversation with native French speakers speaking English), made in Melbourne from 2001 to 2002.

The speech of Australian adolescents: research data and recordings collected by A.G. Mitchell and Arthur Delbridge in 1959 and 1960

This dataset comprises 22,187 recordings of Australian English as spoken by 7,736 students at 330 schools across Australia, together with information about the speakers. The recordings were made on reel-to-reel tapes and were used to create the 1965 monograph The speech of Australian adolescents: a survey and the revised 1965 edition of The pronunciation of English in Australia (originally published in 1946). The Australian National Corpus provided access to a sample of this material; the full dataset is now available.

Other datasets not yet available in the portal:

Sydney Speaks: This project seeks to document and explore Australian English, as spoken in Australia’s largest and most ethnically and linguistically diverse city – Sydney.
From Farms to Freeways: This research project sought to analyse the experiences of women who had lived in the Blacktown and Penrith areas since the early 1950s, including their responses to social changes brought about by rapid suburbanisation in the Western Sydney region in the post-war period. Two-hour taped discussions were held with 34 women, aged 60 and over, who were in their early twenties during the Western Sydney region’s population growth.
A collection of government documents in various languages. This is a very small dataset assembled to check that our technology can handle different languages and different scripts; more information about this work is available in this presentation.

Work is underway to make data from other earlier projects accessible through LDaCA:

Datasets from the Australian National Corpus not listed above (Monash Corpus of English, Griffith Corpus of Spoken Australian English)
AusTalk
Corpus of Australian English as a Second Language (AusESL)

Indigenous Language Data (ILD) Portal

Caroline Kelly Papers

The Caroline Kelly Papers collection comprises personal and professional papers of Caroline Kelly, including correspondence; financial and legal papers; unpublished poetry and stories; theatre records and publications; anthropology field notes, reports and articles; photographs and newspaper cuttings.

Elwyn Flint Collection, UQFL173

The Elwyn Flint collection comprises written documents collected by Flint, mostly as part of a long-term research project in the 1960s known as the Queensland Speech Survey, during which Flint recorded traditional Aboriginal languages. This collection also documents the Yuulngu (Gupapuyngu) language. Other parts of the collection include journal articles on languages, correspondence with field linguists and staff from academic institutions, Indonesian material, and documentation of the English spoken by Australian Aboriginal and Torres Strait Islander peoples, Australian and regional pidgins, and the English of “migrants” and “mother-tongue” people from different ages, regions or socio-economic groups.

Fryer Library, The University of Queensland

The Fryer Library manuscript collections consist of unpublished materials, including personal papers, photographs, audio recordings, architectural plans, artworks and more. Manuscripts are generally arranged in collections according to who created or collected the material. The Fryer Library’s published material can also be searched through its Library Search.

UQ Library Collection

A selection of Indigenous language resources held in the University of Queensland Library.