Portfolio: Enhancing Deep Neural Networks with Morphological Information

BERT
Classification
Data Science
Fasttext
LSTMs
Large Language Models
Machine Learning
Named Entity Recognition
Natural Language Processing
Research


Project Description

Our project investigated how adding morphological information to the input of deep neural networks affects their performance. We evaluated these additions on three downstream tasks, named entity recognition (NER), dependency parsing (DP), and comment filtering (CF), across several languages. We tested two distinct architectures: an adapted BERT model and LSTM models combined with Fasttext embeddings.
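One common way to add morphological data to a model's input is to encode each token's features as a vector and concatenate it with the word embedding. The sketch below illustrates only that general idea. The feature inventory, the Universal Dependencies style feature strings, and the function names `encode_morphology` and `build_input` are assumptions made for illustration, not details taken from the project.

```python
# Hypothetical feature inventory (assumption); a real setup would derive it
# from the annotated training data.
MORPH_FEATURES = ["Case=Nom", "Case=Acc", "Number=Sing", "Number=Plur", "Tense=Past"]
FEATURE_INDEX = {name: i for i, name in enumerate(MORPH_FEATURES)}


def encode_morphology(feats):
    """Turn a string such as 'Case=Nom|Number=Sing' into a multi-hot vector."""
    vector = [0.0] * len(MORPH_FEATURES)
    if feats and feats != "_":  # "_" marks a token with no features
        for feat in feats.split("|"):
            index = FEATURE_INDEX.get(feat)
            if index is not None:  # features outside the inventory are ignored
                vector[index] = 1.0
    return vector


def build_input(word_embedding, feats):
    """Concatenate a word embedding with the token's morphological vector."""
    return list(word_embedding) + encode_morphology(feats)


# Toy 3-dimensional vector standing in for a real word embedding.
token_input = build_input([0.1, -0.2, 0.3], "Case=Acc|Number=Plur")
print(len(token_input))  # 3 embedding dimensions + 5 morphological dimensions
```

In a setup like this, the concatenated vector for each token would be what the sequence model receives in place of the plain embedding, so the model's input size grows by the size of the feature inventory.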

Role

I managed all aspects of testing, data gathering, and experiments related to NER, collaborating with another developer who handled the DP and CF tasks. Our research findings were published in the journal Natural Language Engineering, published by Cambridge University Press. We used Python and PyTorch for the machine learning work, supported by Scikit-Learn for data analysis.

Project Outcome

Our research demonstrated that incorporating morphological features improves the performance of LSTM-based models on the NER and DP tasks. On the CF task, however, these features did not yield noticeable improvements. For BERT-based models, the added morphological features improved performance only on DP, and only when the features were of high quality (manually checked); predicted features did not provide significant improvements.