Abstract
Conference Title: 2015 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)
Conference Start Date: 2015, Aug. 24
Conference End Date: 2015, Aug. 26
Conference Location: Victoria, BC, Canada

This paper focuses on the use of machine learning techniques to analyze computer programs in order to infer the gender of their authors. Few existing studies address the relationship between linguistics and programming; however, in many areas where language is analyzed, it is possible to mine important information about the users of that language that is associated with a set of attributes or a coding style. In this work we use open source implementations of machine learning algorithms, specifically a nearest neighbor classifier (K*), a decision tree (J48), and a Bayes classifier (Naïve Bayes). These algorithms were applied to C++ programs that were associated with sociolinguistic information about the program authors. Our goal was to classify the programs according to the gender of the author. Our initial results achieve a precision of 72.3%, a recall of 72%, and an F-measure of 71.9%, which indicates that the gender of the authors of C++ programs can be predicted.
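
To make the reported evaluation measures concrete, the following is a minimal Python sketch of how precision, recall, and F-measure can be computed for a two-class gender classification task. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say how the figures were aggregated, so the class-weighted averaging used here is an assumption, and the function names and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def weighted_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[float, float, float]:
    """Average the per-class metrics, weighting each class by its share
    of the true labels (an assumed aggregation scheme)."""
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for label in sorted(set(y_true)):
        weight = sum(1 for t in y_true if t == label) / n
        for i, value in enumerate(per_class_metrics(y_true, y_pred, label)):
            totals[i] += weight * value
    return totals[0], totals[1], totals[2]


if __name__ == "__main__":
    # Hypothetical labels for eight programs; not data from the paper.
    y_true = ["F", "F", "F", "F", "M", "M", "M", "M"]
    y_pred = ["F", "F", "F", "M", "M", "M", "F", "F"]
    precision, recall, f_measure = weighted_metrics(y_true, y_pred)
    print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Under this scheme the averaged F-measure is not the harmonic mean of the averaged precision and recall, which is one way the three reported figures can differ slightly from one another.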