Portfolio: STARK - Dependency Parsing Statistics Extractor
Data Extraction
Dependency Parsing
Multi-Core Processing
Natural Language Processing
Text Analysis
Text Mining
Tool Development
Project Description
Overview
STARK is a tool that extracts subtrees from dependency trees provided in a CoNLL-U formatted file. It counts the extracted subtrees according to user-defined parameters and calculates various statistics on them. It uses multi-core processing for speed.
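To make the idea concrete, here is a minimal sketch of the simplest case: reading CoNLL-U text and counting two-node subtrees (a head, a relation, and a dependent). It is an illustration only, not STARK's actual code; the function names `parse_conllu` and `count_head_dependent_pairs` and the sample sentence are assumptions made up for this example.

```python
from collections import Counter


def parse_conllu(text):
    """Yield sentences as lists of (id, form, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def count_head_dependent_pairs(sentences):
    """Count (head form, relation, dependent form) triples across sentences."""
    counts = Counter()
    for sent in sentences:
        forms = {tok_id: form for tok_id, form, _, _ in sent}
        for _, form, head, deprel in sent:
            if head == 0:
                continue  # the root has no head token
            counts[(forms[head], deprel, form)] += 1
    return counts


# Hypothetical three-token sample, built with explicit tabs.
SAMPLE_ROWS = [
    ("1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"),
    ("2", "dog", "dog", "NOUN", "_", "_", "3", "nsubj", "_", "_"),
    ("3", "barks", "bark", "VERB", "_", "_", "0", "root", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n\n"

pair_counts = count_head_dependent_pairs(parse_conllu(SAMPLE))
```

Because each sentence is processed independently and `Counter` objects can be added together, work of this kind splits naturally across processes, which is the general reason multi-core processing helps here.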
Key Features
- Query search: This mode allows users to define query trees and efficiently count subtrees that match them.
- Greedy search: This mode greedily creates subtrees of a certain size.
- Multiple parameters: The tool supports about 20 parameters for customizing the extraction.
- Website: The tool is used in a demo website.
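As a rough illustration of what extracting subtrees of a certain size involves, the sketch below lists every connected subtree with exactly `size` nodes in one sentence. It is a plain exhaustive enumeration written for this page, not necessarily the strategy STARK uses; `subtrees_of_size`, `rooted_subtrees`, and the `HEADS` example are hypothetical names and data.

```python
def rooted_subtrees(node, size, children):
    """Return all connected subtrees (as frozensets of token ids) that are
    rooted at `node` and contain exactly `size` nodes."""

    def expand(kids, remaining):
        # Ways to pick exactly `remaining` nodes from the subtrees of `kids`.
        if remaining == 0:
            return [frozenset()]
        if not kids:
            return []
        first, rest = kids[0], kids[1:]
        results = list(expand(rest, remaining))  # leave `first` out entirely
        for k in range(1, remaining + 1):
            for sub in rooted_subtrees(first, k, children):
                for tail in expand(rest, remaining - k):
                    results.append(sub | tail)
        return results

    picked_sets = expand(children.get(node, []), size - 1)
    return [frozenset({node}) | picked for picked in picked_sets]


def subtrees_of_size(heads, size):
    """`heads` maps token id -> head id (0 marks the root)."""
    children = {}
    for tok_id, head in sorted(heads.items()):
        if head != 0:
            children.setdefault(head, []).append(tok_id)
    found = []
    for tok_id in sorted(heads):
        found.extend(rooted_subtrees(tok_id, size, children))
    return found


# Hypothetical sentence "The dog barks loudly": token id -> head id.
HEADS = {1: 2, 2: 3, 3: 0, 4: 3}

size_two = subtrees_of_size(HEADS, 2)
size_three = subtrees_of_size(HEADS, 3)
```

The number of subtrees grows quickly with tree width and subtree size, so a real tool needs parameters that limit what gets extracted and counted.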
Role
As the sole developer, I was in charge of designing the program architecture and developing the tool. I also acted as a technical consultant on the implementation of various features.
Project Outcome
STARK has been used for various linguistic analyses. It is being promoted to reach an even wider audience internationally.