NoDaLiDa 2023 - May 22-24, 2023


Extracting Sign Language Articulation from Videos with MediaPipe

Carl Börstell

This paper concerns evaluating methods for extracting phonological information of Swedish Sign Language signs from video data with MediaPipe's pose estimation. The methods involve estimating i) the articulation phase, ii) hand dominance (left vs. right), iii) the number of hands articulating (one- vs. two-handed signs) and iv) the sign's place of articulation. The results show that MediaPipe's tracking of the hands' location and movement in videos can be used to estimate the articulation phase of signs. Whereas the inclusion of transport movements improves the accuracy for the estimation of hand dominance and number of hands, removing transport movements is crucial for estimating a sign's place of articulation. 
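
As a rough illustration of how hand dominance and the number of articulating hands could be estimated from tracked hand positions, the sketch below compares how far each wrist travels over a clip. It is not the paper's implementation: the input format (one list of (x, y) positions per wrist, as might be read from pose-estimation output), the function names, and the 0.5 ratio threshold are all assumptions made for this example.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def estimate_hands(left_wrist, right_wrist, two_handed_ratio=0.5):
    """Guess the dominant hand and the number of articulating hands.

    left_wrist / right_wrist: per-frame (x, y) positions (hypothetical input).
    two_handed_ratio: assumed threshold; the sign counts as two-handed when
    the less active hand travels at least this fraction of the distance
    covered by the more active hand.
    """
    left = path_length(left_wrist)
    right = path_length(right_wrist)
    # The hand that moves more is taken to be the dominant one.
    dominant = "right" if right >= left else "left"
    larger, smaller = max(left, right), min(left, right)
    two_handed = larger > 0 and smaller / larger >= two_handed_ratio
    return dominant, 2 if two_handed else 1
```

Whether transport movements (raising and lowering the hands) are included in the position lists changes the distances being compared, which is the trade-off the abstract describes.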

Who said what? Speaker Identification from Anonymous Minutes of Meetings

Daniel Holmer, Lars Ahrenberg, Julius Monsen, Arne Jönsson, Mikael Apel, Marianna Blix Grimaldi

We study the performance of machine learning techniques on the problem of identifying speakers at meetings from anonymous minutes issued afterwards. The data comes from board meetings of Sveriges Riksbank (Sweden's Central Bank). The data is split in two ways: one where each reported contribution to the discussion is treated as a data point, and another where all contributions from a single speaker have been aggregated. Using interpretable models, we find that lexical features and topic models generated from speeches held by the board members outside of board meetings are good predictors of speaker identity. Combining topic models with other features gives prediction accuracies close to 80% on aggregated data, though there is still a sizeable gap in performance compared to a less easily interpreted BERT-based transformer model that we offer as a benchmark.

NorQuAD: Norwegian Question Answering Dataset

Sardana Ivanova, Fredrik Aas Andreassen, Matias Jentoft, Sondre Wold, Lilja Øvrelid

In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available. 

Question Answering and Question Generation for Finnish

Ilmari Kylliäinen, Roman Yangarber

Recent advances in the field of language modeling have improved the state of the art in question answering (QA) and question generation (QG). However, the development of modern neural models, their benchmarks, and datasets for training them has mainly focused on English. Finnish, like many other languages, faces a shortage of large QA/QG model training resources, which has prevented experimenting with state-of-the-art QA/QG fine-tuning methods. We present the first neural QA and QG models that work with Finnish. To train the models, we automatically translate the SQuAD dataset and then use normalization methods to reduce the amount of problematic data created during the translation. Using the synthetic data, together with the Finnish partition of the TyDi-QA dataset, we fine-tune several transformer-based models for both QA and QG and evaluate their performance. To the best of our knowledge, the resulting dataset is the first large-scale QA/QG resource for Finnish. This paper also sets the initial benchmarks for Finnish-language QA and QG.
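
One kind of problematic data that machine-translating an extractive QA dataset can produce is an answer that no longer appears verbatim in its translated context, for example because the two were inflected differently. The sketch below shows one simple way such cases could be filtered and answer offsets recomputed. It is an assumption made for illustration, not the normalization procedure used in the paper; the dictionary keys and function names are hypothetical.

```python
def realign_answer(context, answer):
    """Return the character offset of answer in context, or None if absent.

    Matching is case-insensitive; this assumes lowercasing does not change
    string length, which holds for the Finnish alphabet.
    """
    start = context.lower().find(answer.lower())
    return start if start >= 0 else None


def filter_translated(examples):
    """Keep only examples whose translated answer occurs in the context.

    examples: list of dicts with hypothetical keys "context" and "answer".
    Each kept example gets a recomputed "answer_start" and the answer text
    as it is actually written in the context.
    """
    kept = []
    for ex in examples:
        start = realign_answer(ex["context"], ex["answer"])
        if start is None:
            continue  # answer span was lost in translation; drop the example
        span = ex["context"][start:start + len(ex["answer"])]
        kept.append({**ex, "answer": span, "answer_start": start})
    return kept
```

For instance, with the context "Helsinki on Suomen pääkaupunki.", the answer "suomen" is kept and realigned to the span "Suomen", while the answer "Suomi" is dropped because that form does not occur in the context.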