Abstract
Although information retrieval systems have improved tremendously over the years at finding relevant scientific literature, human cognition is still required to locate specific document elements in full-text publications. For instance, pseudocode for algorithms published in scientific publications cannot be reliably matched against user queries, so the process still requires human involvement. AlgorithmSeer, a state-of-the-art algorithm search engine, claims to replace humans in this task, but one of its limitations is that its metadata is simply a textual description of each pseudocode, without any algorithm-specific information. The search is therefore performed merely by matching the user query against the textual metadata and ranking the results with conventional textual similarity techniques. The ability to automatically identify algorithm-specific metadata such as precision, recall, or F-measure would be useful when searching for algorithms. In this article, we propose a set of algorithms to extract further information pertaining to the performance of each algorithm. Specifically, sentences in an article that convey information about the efficiency of the corresponding algorithm are identified and extracted using a recurrent convolutional neural network (RCNN). Furthermore, we propose improving the pseudocode detection task with a multi-layer perceptron (MLP) classifier trained on 15 features, which improves on the classification performance of the state-of-the-art pseudocode detection methods used in AlgorithmSeer by 27%. Finally, we show the advantages of the AI-enabled search engine (based on the RCNN and MLP models) over conventional text-retrieval models.