Evaluating the potential of language-family-specific generative models for low resource data augmentation: a Faroese case study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We investigate GPT-Sw3, a generative language model for the Nordic languages, to assess its understanding of low-resource Faroese. Our aim is to demonstrate the advantages of using language-family-specific generative models to augment data for related languages with fewer resources. We evaluate GPT-Sw3 by prompting it for Faroese-to-English translation in zero-, one-, and few-shot settings. We assess the resulting translations with an ensemble score: the arithmetic mean of the BLEU score and a semantic similarity score computed with SBERT. Moreover, we challenge the model's Faroese language understanding on a small, curated dataset of Faroese trick sentences. There, we compare the model's performance with OpenAI's GPT-3.5 and GPT-4, demonstrating the advantages of a language-family-specific generative model for navigating non-trivial scenarios. We evaluate the pipeline thus created and use it, as a proof of concept, to create an automatically annotated Faroese semantic textual similarity (STS) dataset.
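The ensemble score described in the abstract averages a surface-overlap metric (BLEU) with an embedding-based similarity (SBERT). The sketch below illustrates that averaging. It is a minimal example under stated assumptions, not the authors' implementation:

* Tokenisation is by whitespace.
* Higher-order n-gram precisions use add-one smoothing, which is one common choice among several.
* BLEU is kept on a 0 to 1 scale.
* The SBERT sentence embeddings are passed in as precomputed vectors.
* Negative cosine similarities are clamped to 0.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU in [0, 1].

    Assumptions: whitespace tokenisation and add-one smoothing for
    n-gram orders >= 2 (a simplification of standard tooling).
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no unigram overlap at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)  # add-one smoothing (assumption)
        log_precision_sum += math.log(precision)
    # Brevity penalty: hypotheses shorter than the reference are penalised.
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. SBERT outputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ensemble_score(
    hypothesis: str,
    reference: str,
    hyp_embedding: Sequence[float],
    ref_embedding: Sequence[float],
) -> float:
    """Arithmetic mean of BLEU and semantic similarity, both on [0, 1]."""
    bleu = sentence_bleu(hypothesis, reference)
    # Clamp negative cosine values so both terms share the [0, 1] scale (assumption).
    semantic = max(0.0, cosine_similarity(hyp_embedding, ref_embedding))
    return (bleu + semantic) / 2


# Usage with toy embeddings standing in for real SBERT vectors (hypothetical values):
print(ensemble_score("the cat sat on the mat", "the cat sat on the mat", [1.0, 0.0], [1.0, 0.0]))
```

Averaging the two terms lets a paraphrase with little word overlap still earn credit through the semantic term. Tools such as sacreBLEU report BLEU on a 0 to 100 scale, so such a score would need dividing by 100 before being averaged with a cosine similarity.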
Original language: English
Title of host publication: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Place of publication: Torino
Pages: 6496–6503
Number of pages: 8
Publication status: Published, 22 May 2024
Event: LREC-COLING 2024, Torino, Italy
Duration: 20 May 2024 to 25 May 2024
Internet address: https://lrec-coling-2024.org/

Conference

Conference: LREC-COLING 2024
Country/Territory: Italy
City: Torino
Period: 20/05/24 to 25/05/24

Keywords

  • Semantic Textual Similarity
  • Low-resource language
  • Machine translation
  • Data augmentation
