NoDaLiDa 2023 - May 22-24, 2023
German Research Center for Artificial Intelligence
Towards Digital Language Equality in Europe: An Overview of Recent Developments
Digital Language Equality (DLE) “is the state of affairs in which all languages have the technological support and situational context necessary for them to continue to exist and to prosper as living languages in the digital age”, as we specified in one of our key reports of the EU project European Language Equality (ELE). Our empirical findings suggest that Europe is currently very far from having a situation in which all our languages are supported equally well through technologies. In this presentation, I’ll give an overview of the two ELE projects and their main results and findings with a special focus on the Nordic languages (including insights from the FSTP projects supported through ELE2). This will also include a brief look back into the past, especially discussing the question of whether and where we have seen progress in the last, say, 15 years. Furthermore, I’ll present an overview of our main strategic recommendations towards the European Union in terms of bringing about DLE in Europe by 2030. The presentation will conclude with a look at other relevant activities in Europe, including, critically, the Common European Language Data Space project, which started in early 2023.
MARTA R. COSTA-JUSSÀ
Research scientist at Meta
No-language-left-behind: Scaling Human-Centered Machine Translation and Toxicity at Scale
Machine Translation systems can produce different types of errors, some of which are characterized as critical or catastrophic due to the specific negative impact that they can have on users. In this talk, we focus on one type of critical error: added toxicity. We evaluate and analyze added toxicity in the context of NLLB-200, which open-sources models capable of delivering evaluated, high-quality translations directly between 200 languages. An automatic toxicity evaluation shows that added toxicity across languages varies from 0% to 5%. The output languages with the most added toxicity tend to be low-resource ones, and the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability. Making use of input attributions allows us to further explain toxicity. Our recommendations to reduce added toxicity are to curate training data to avoid mistranslations, mitigate hallucination, and check unstable translations.
HJALMAR P. PETERSEN
Professor in Faroese language
Department of Language and Literature, University of the Faroe Islands
Aspects of the structure of Faroese
Phonological changes and later morphologization have led to different complex alternations in Faroese. Such alternations are argued to emerge especially in small languages with little contact and tight networks. The alternations will be exemplified with 'skerping', palatalization, glide insertion and the quantity shift. There will be a discussion of the morphology-phonology interface, where the suggestion is that Faroese has three strata: stem1, stem2 and a word stratum. Syntactic variation and different constructions will be addressed and illustrated; this includes reflexives and the present reorganization of the case system for complements of prepositions, where speakers use semantic and structural case in a certain way.