Abstract
Semantic analysis of corpora that make heavy use of jargon words and phrases introduces problems not commonly addressed by Natural Language Processing methods. Modern semantic analysis relies on data from sources such as unedited websites or expertly written texts, which do not use jargon words and phrases to a comparable extent. This paper presents a system of semi-supervised lexicon learning algorithms that collate several manually labeled and clustered data sources, such as thesauri. In addition, it demonstrates that applying a boosting method improves the performance of the resulting subjectivity classifiers. Finally, it presents a method for automatically classifying Aviation Safety Reporting System (ASRS) reports by shaping factor, based on the most relevant words from a subjectivity lexicon.