Portfolio: STARK - Dependency Parsing Statistics Extractor

Data Extraction
Dependency Parsing
Multi-Core Processing
Natural Language Processing
Text Analysis
Text Mining
Tool Development


Project Description

Overview

STARK is a tool that extracts subtrees from dependency trees provided in a CoNLL-U formatted file. It counts them according to user-defined parameters, calculates various statistics, and uses multi-core processing for speed.
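
To make the input and the counting step concrete, here is a minimal Python sketch. It is not STARK's actual code: the sample sentences, the function names, and the "HEAD >deprel DEPENDENT" key format are assumptions chosen for illustration. It reads CoNLL-U text and counts the smallest possible subtrees, single head-dependent pairs, by part of speech and relation.

```python
from collections import Counter

# Hypothetical two-sentence sample. CoNLL-U columns are tab-separated:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """\
# text = The dog barks.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

# text = A cat sleeps.
1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_conllu(text):
    """Yield each sentence as a list of (id, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        sentence.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def count_head_dependent_pairs(sentences):
    """Count two-node subtrees, keyed by an illustrative label format."""
    counts = Counter()
    for sentence in sentences:
        upos_by_id = {tid: upos for tid, upos, _, _ in sentence}
        for _, upos, head, deprel in sentence:
            if head == 0:
                continue  # the root has no head inside the sentence
            counts[f"{upos_by_id[head]} >{deprel} {upos}"] += 1
    return counts


pair_counts = count_head_dependent_pairs(parse_conllu(SAMPLE))
```

On the sample, each of the three pair types is counted twice. Because every sentence is counted independently and Counter objects can be added together, this kind of work can be split into chunks and handed to separate worker processes, which is one plausible way a tool like this could benefit from multiple cores.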

Key Features

  • Query search: This mode allows users to define query trees and efficiently count subtrees that match them.
  • Greedy search: This mode greedily creates subtrees of a certain size.
  • Multiple parameters: The tool supports about 20 parameters for customizing the extraction.
  • Website: The tool also powers a demo website.
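
The two search modes can be illustrated with a short Python sketch. This is a sketch under stated assumptions, not STARK's implementation: the toy tree, the function names, and the bracketed label format are hypothetical, and the labels ignore word order. The "greedy" part enumerates every connected subtree of up to a given number of nodes; a query is then reduced to looking up one label in the resulting counts.

```python
from collections import Counter

# Hypothetical toy tree for "The dog barks ." as id -> (UPOS, head id, deprel).
TREE = {
    1: ("DET", 2, "det"),
    2: ("NOUN", 3, "nsubj"),
    3: ("VERB", 0, "root"),
    4: ("PUNCT", 3, "punct"),
}


def children_of(tree):
    """Map each head id to the list of its dependents."""
    children = {}
    for node, (_, head, _) in tree.items():
        children.setdefault(head, []).append(node)
    return children


def subtrees_rooted_at(node, children, max_size):
    """Return all connected node sets rooted at `node` with <= max_size nodes."""
    results = [frozenset([node])]
    if max_size == 1:
        return results
    for child in children.get(node, []):
        options = subtrees_rooted_at(child, children, max_size - 1)
        # Combine every subtree built so far with every subtree of this child.
        extended = [
            base | option
            for base in results
            for option in options
            if len(base) + len(option) <= max_size
        ]
        results.extend(extended)
    return results


def label(node, nodes, tree, children):
    """Build an order-insensitive label such as 'VERB(nsubj:NOUN)'."""
    kids = sorted(
        f"{tree[c][2]}:{label(c, nodes, tree, children)}"
        for c in children.get(node, [])
        if c in nodes
    )
    return tree[node][0] + (f"({' '.join(kids)})" if kids else "")


def count_subtrees(tree, max_size):
    """Count every subtree of up to max_size nodes by its label."""
    children = children_of(tree)
    counts = Counter()
    for node in tree:
        for nodes in subtrees_rooted_at(node, children, max_size):
            counts[label(node, nodes, tree, children)] += 1
    return counts


greedy_counts = count_subtrees(TREE, 3)
# A "query" in this sketch is just a lookup of one label.
query_hits = greedy_counts["VERB(nsubj:NOUN)"]
```

Enumerating everything first is the simplest way to show both modes with one data structure. A dedicated query mode would more plausibly match the query tree directly against each sentence and skip the full enumeration, since the number of subtrees grows quickly with the size limit.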

Role

As the sole developer, I was responsible for designing the program architecture and developing the tool. I also acted as a technical consultant on the implementation of various features.

Project Outcome

STARK has been used for various linguistic analyses. It is being promoted to reach an even wider audience internationally.