Abstract
Alcohol use is one of the main risk factors for many diseases. However, information about alcohol use is buried in the narrative text of patients' clinical records, and extracting it requires substantial manual labor. This work aims to develop an automated system for detecting alcohol use status from patients' discharge summaries. The system combines machine learning and rule-based techniques to identify alcohol use status in three stages. In the first stage, it detects alcohol-related sentences using a keyword search. In the second stage, it distinguishes negative from positive alcohol-related sentences and identifies their temporal status; different machine learning classifiers were employed in this stage to achieve the best performance. Finally, the sentence-level results are aggregated into a document-level alcohol use status for each patient's record. The proposed system exhibits high performance, achieving F1-scores of up to 0.99 in identifying alcohol-use-related records, 0.96 in detecting negative records, and 0.89 in identifying temporal status.
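
The three-stage flow described above can be illustrated with a minimal sketch. It is not the system evaluated in this work: the keyword list, the negation cues, and the aggregation rule (any positive sentence makes the record positive) are all assumptions made for illustration. A simple rule-based check stands in for the trained machine learning classifiers, and temporal status is left out entirely.

```python
import re
from typing import List

# Hypothetical keyword list; the abstract does not state the actual keywords.
ALCOHOL_KEYWORDS = re.compile(
    r"\b(alcohol|etoh|drinks?|drinking|beer|wine)\b", re.IGNORECASE
)
# Hypothetical negation cues, standing in for the trained classifier of stage two.
NEGATION_CUES = re.compile(
    r"\b(no|denies|denied|never|negative for)\b", re.IGNORECASE
)


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_alcohol_sentences(text: str) -> List[str]:
    """Stage 1: keep only sentences that contain an alcohol-related keyword."""
    return [s for s in split_sentences(text) if ALCOHOL_KEYWORDS.search(s)]


def classify_sentence(sentence: str) -> str:
    """Stage 2 (stand-in): label a sentence 'negative' if it contains a negation cue."""
    return "negative" if NEGATION_CUES.search(sentence) else "positive"


def document_status(text: str) -> str:
    """Stage 3: aggregate sentence labels into one document-level status.

    Assumed rule: any positive sentence makes the document positive;
    a document with no alcohol-related sentence is 'unknown'.
    """
    sentences = find_alcohol_sentences(text)
    if not sentences:
        return "unknown"
    labels = [classify_sentence(s) for s in sentences]
    return "positive" if "positive" in labels else "negative"
```

For example, `document_status("Patient denies alcohol use. No fever.")` follows the negative branch, because the only alcohol-related sentence contains the cue "denies". In a system of the kind described here, `classify_sentence` would be replaced by a trained classifier, with a second one for temporal status.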