Luka Krsnik

Portfolio: Slovene Word Stress Assignment

Data Science

Ensemble Methods

Machine Learning

Master Thesis

Natural Language Processing

Neural Networks

Research

Project Highlights

Role: I developed the algorithm under my mentor's guidance and with the help of a linguistics expert.
Experimentation: We conducted various experiments, testing different neural network architectures and data representations.
Utilization: We used the best model to assign stress automatically to the words in a morphological lexicon.

Project Description

This project was a continuation of my master's thesis. It addressed stress assignment for Slovene words. Unlike many other languages, Slovene has no straightforward algorithm for assigning stress. Traditionally, word accents are learned alongside vocabulary, yet Slovene speakers can still annotate unfamiliar words, likely because they recognize patterns from similar words.

Our primary objective was to identify the most effective approach for stress location and stress type assignment in Slovene. We conducted multiple experiments, exploring various deep artificial neural network architectures, data representations, and ensemble methods.
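As an illustration of the ensemble idea mentioned above, here is a minimal sketch of one common technique, soft voting, in which several models' probabilities for each candidate stress position are averaged and the highest-scoring position is chosen. This is not the project's actual implementation: the function names, the per-vowel probability format, and the example scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def average_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-position probabilities from several models (soft voting).

    Each inner sequence is one hypothetical model's probability that the
    stress falls on each candidate position (for example, each vowel).
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same positions")
    n_models = len(predictions)
    return [sum(p[i] for p in predictions) / n_models for i in range(length)]


def predict_stress_position(predictions: Sequence[Sequence[float]]) -> int:
    """Return the 0-based index of the position with the highest average score."""
    averaged = average_ensemble(predictions)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Made-up scores from three hypothetical models for a word with three vowels.
model_scores = [
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
]
stress_index = predict_stress_position(model_scores)
```

Averaging lets a confident majority outweigh a single model that disagrees, as with the second model here. Other ensemble schemes, such as majority voting or weighted averaging, would fit the same interface.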

Project Outcome

We achieved an accuracy of 87.62% on the tested words. This surpassed other machine learning methods, suggesting that the neural network approach is better suited to stress assignment for Slovene.

Using our best models, we assigned stress type and location to the words in the largest morphological lexicon available for Slovene. This gives linguists and language enthusiasts more insight into the Slovene language.