CASS researchers strive to preserve Australia’s multilingual identity

Image Caption: L to R, Dr Li Nguyen, Professor Catherine Travis, Dr Julia Miller and Senior Data Analyst Wolfgang Barth.

30 November, 2023

Researchers from the ANU School of Literature, Language & Linguistics are working on the Language Data Commons of Australia (LDaCA), an inter-university project that will make large language collections in Australia easier to access.

Australia is one of the most linguistically diverse countries in the world, a place where English, Indigenous, migrant, sign and Pacific region languages come together in a rich multilingual convergence.

Over the decades, an enormous amount of language data has been recorded by academics, communities, and individuals, including interviews and oral histories that are filled with countless hours of testimonials from past generations.

From first-hand accounts of war refugees and recollections of incoming migrants and many other social groups to bushfire stories, the narratives collected are living proof of centuries of history and a testament to Australia’s multicultural diversity.

These language collections carry invaluable data for linguists, historians, sociologists and many other investigators. Such is their cultural significance that the National Library of Australia houses many of these records, but many more are still scattered around the country, at risk of being lost.

Losing them permanently would be like erasing a vital chapter of Australia’s history.

A haven for data: The Language Data Commons of Australia

To date, the existing language data has not been fully mapped or integrated into a unified and accessible digital infrastructure. That is precisely where the Language Data Commons of Australia (LDaCA) steps in.

Seeking to fill this long-overlooked gap, LDaCA has a clear primary goal: to create a dedicated digital space where researchers, communities and the general public can easily access a treasure trove of language knowledge.

As a project of national interest, LDaCA took shape as a partnership between five prominent Australian universities.

The University of Queensland is spearheading the project in close cooperation with The Australian National University, The University of Sydney, The University of Melbourne, and Monash University, working alongside partner organisations AARNet and First Languages Australia.

Professor Catherine Travis, a linguistics researcher and one of the chief investigators of the project, is leading the ANU team – which includes postdoctoral researcher Dr Li Nguyen, Senior Data Analyst Wolfgang Barth and Senior Data Manager Dr Julia Miller.

“The goal of LDaCA is to secure these collections and integrate them into a national research infrastructure, ensuring their discoverability and accessibility,” Professor Travis says.

“Working across multiple institutions brings together the diversity of skills, expertise and experience needed to build a project of this scale.”

Initiated in 2021, LDaCA has recently secured an additional tranche of funding from the Australian Research Data Commons (ARDC), Australia’s leading research data infrastructure facility.

The new contribution will extend the project for an additional year and has already allowed Professor Travis to bring her team together in person for the first time.

“After 6 months working with a dispersed team, it is wonderful to now have us all at the ANU campus. This way we can communicate much more easily, and that helps to get the creative juices flowing,” Professor Travis says.

ANU in quest to unearth hidden language treasures

As a dedicated branch of the project, the researchers from the ANU College of Arts and Social Sciences (CASS) are working to track and catalogue collections of Australian English and migrant languages that are underused and, in some cases, unknown.

“Tracking down language collections is a bit like a ‘language dig’, as these are diverse, dispersed, and often quite hidden away,” Professor Travis explains.

“There are many collections in the National Library of Australia, but also in community language centres, historical societies, researchers’ offices and even in people’s garages.”

Through this intricate process of locating and cataloguing data, the ANU researchers are populating what will soon be a nationwide infrastructure, expanding the understanding of Australia’s social, cultural and linguistic history.

“With many decades of data collection behind us and with recent technological developments, now is the perfect time to open this up to create a language data commons,” Professor Travis observes.

LDaCA brings together an exceptional team of cross-disciplinary experts, from data scientists and linguists to computational specialists and community engagement practitioners. It is also partnering with PARADISEC, the long-running archive for Pacific and regional languages and cultures, which is run at ANU by Dr Julia Miller and Dr Rosey Billington.

Collaboratively, the various working groups are striving to create a rich resource that will bring long-lasting benefits for Australian society.

“We are building a strong foundation for the Humanities and Social Sciences in Australia, one that will help preserve our country’s cultural heritage,” Professor Travis stresses.

If you would like more information about the Language Data Commons of Australia (LDaCA), read this article published on the ARDC website.