Abstract
Conference: 2018 IEEE International Conference on Big Data (Big Data), December 10-13, 2018, Seattle, WA, USA.

Program comprehension is an indispensable prerequisite for many software tasks, including testing, maintenance, and evolution. In practice, understanding a software system requires investigating its high-level functionality and mapping that functionality to its low-level implementation, i.e., the source code. The implementation of a software system can be captured by a call graph, which represents the system's functions and their interactions at a single level of granularity. Although call graphs can help developers understand a system's internal behavior, developers must still map the high-level functionality to the call graph manually. This manual mapping is expensive and time-consuming, and it leaves a cognitive gap between the system's high-level functionality and its implementation. In this paper, we present an approach that automatically (1) constructs and visualizes the static call graph of a system written in Python, (2) clusters the execution paths of the call graph into hierarchical abstractions, and (3) labels the clusters according to their major functional behaviors. The goal is to bridge the cognitive gap between a system's high-level functionality and its call graph, thereby facilitating system comprehension. To validate our approach, we conducted four case studies on code2graph, Detectron, Flask, and Keras. The results demonstrated that our approach can feasibly construct call graphs and hierarchically cluster them into abstraction levels with proper labels.
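The abstract does not specify how the static call graph is built. As a rough illustration of step (1), and of the execution paths that step (2) clusters, the following minimal sketch uses Python's standard `ast` module to extract a call graph from a single source string and then enumerate acyclic paths from an entry function. It is not the authors' tool: the names `build_call_graph`, `execution_paths`, and `SAMPLE` are hypothetical, callees are resolved by bare name only, and imports across modules, same-named methods in different classes, and dynamic dispatch are ignored. Clustering and labeling (steps 2 and 3) are not shown.

```python
import ast


class _CallCollector(ast.NodeVisitor):
    """Records, for each enclosing function, the names of the functions it calls."""

    def __init__(self) -> None:
        self.graph: dict[str, set[str]] = {}
        self._stack: list[str] = []  # functions currently being visited (innermost last)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.graph.setdefault(node.name, set())  # keep functions that call nothing
        self._stack.append(node.name)
        self.generic_visit(node)
        self._stack.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node: ast.Call) -> None:
        if self._stack:
            func = node.func
            # Hypothetical simplification: resolve callees by bare name only,
            # f(...) -> "f" and obj.m(...) -> "m".
            if isinstance(func, ast.Name):
                self.graph[self._stack[-1]].add(func.id)
            elif isinstance(func, ast.Attribute):
                self.graph[self._stack[-1]].add(func.attr)
        self.generic_visit(node)


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Static call graph: defined function name -> names it calls directly."""
    collector = _CallCollector()
    collector.visit(ast.parse(source))
    return collector.graph


def execution_paths(graph: dict[str, set[str]], entry: str) -> list[list[str]]:
    """Enumerate acyclic paths from `entry` until no further defined callee is reachable."""
    paths: list[list[str]] = []

    def dfs(node: str, path: list[str]) -> None:
        # Follow only callees defined in the analyzed source; skip cycles.
        callees = [c for c in sorted(graph.get(node, set())) if c in graph and c not in path]
        if not callees:
            paths.append(path)
            return
        for callee in callees:
            dfs(callee, path + [callee])

    dfs(entry, [entry])
    return paths


# Hypothetical input program used only for illustration.
SAMPLE = '''
def tokenize(text):
    return text.split()

def normalize(tokens):
    return [t.lower() for t in tokens]

def preprocess(text):
    return normalize(tokenize(text))

def main():
    print(preprocess("Hello World"))
'''

if __name__ == "__main__":
    graph = build_call_graph(SAMPLE)
    for path in execution_paths(graph, "main"):
        print(" -> ".join(path))
```

On the sample program this prints `main -> preprocess -> normalize` and `main -> preprocess -> tokenize`. Calls to built-ins or library methods such as `print` and `split` remain in the graph as leaf names but are not expanded into paths, since only functions defined in the analyzed source are followed.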