Abstract
The quality of automatic app feature extraction from app reviews depends on various aspects, e.g. the feature extraction method, training and evaluation datasets, evaluation method etc. Annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact to the quality of the whole system but it is one of the aspects that is often overlooked. We conducted a study in which we explore the effects of annotation guidelines to the quality of app feature extraction. We propose several changes to the existing annotation guidelines with the goal of making the extracted app features more useful to app developers. We test the proposed changes via simulating the application of the new annotation guidelines and evaluating the performance of the supervised machine learning models trained on datasets annotated with initial and simulated annotation guidelines. While the overall performance of automatic app feature extraction remains the same as compared to the model trained on the dataset with initial annotations, the features extracted by the model trained on the dataset with simulated new annotations are less noisy and more informative to app developers.