Cited by Lee Sonogan
Abstract by Aviel J. Stein, Daniel Schwartz, Yiwen Shi, Spiros Mancoridis
Source code segmentation is the process of dividing the source code of a program into meaningful pieces, such as in preparation for source code analysis (SCA) tasks. Our goal is to segment code based on the semantics of its content, specifically such that the segments reflect logical locations that are good candidates for the insertion of manually composed or automatically generated comments. Instead of focusing on syntactic boundaries for segmentation, such as function and class declarations, we exploit the semantic content of the code. We use code snippets mined from GitHub as known semantic segments to train an LSTM neural network model. The model is able to infer locations in the code where a programmer would likely insert comments. It can operate on any text and performs well across multiple programming languages for detecting candidate segment boundaries within a program. This semantic code segmentation is especially useful for incomplete code repositories under development, which may also be written in more than one programming language. Additionally, our technique supports a detection threshold parameter so users can adjust the number of suggestions provided by our tool.
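To make the detection threshold idea concrete, here is a minimal sketch in Python. It is not the authors' tool or model: the function name `segment_by_threshold`, the sample code lines, and the per-line scores are all hypothetical, and the scores simply stand in for whatever boundary probabilities a trained model might emit. The sketch only shows how raising or lowering a threshold changes the number of suggested segment boundaries.

```python
from typing import List


def segment_by_threshold(
    lines: List[str], scores: List[float], threshold: float = 0.5
) -> List[List[str]]:
    """Split lines into segments using per-line boundary scores.

    scores[i] is a hypothetical probability that a new segment (and so a
    candidate comment location) starts just before lines[i]. A new segment
    is opened whenever that score reaches the threshold.
    """
    if len(lines) != len(scores):
        raise ValueError("lines and scores must have the same length")

    segments: List[List[str]] = []
    current: List[str] = []
    for line, score in zip(lines, scores):
        # Only split if there is already content in the current segment,
        # so the first line never produces an empty leading segment.
        if current and score >= threshold:
            segments.append(current)
            current = []
        current.append(line)
    if current:
        segments.append(current)
    return segments


# Hypothetical input: five lines of code and made-up boundary scores.
code = [
    "import csv",
    "rows = list(csv.reader(open('data.csv')))",
    "total = sum(float(r[1]) for r in rows)",
    "mean = total / len(rows)",
    "print(mean)",
]
scores = [0.05, 0.30, 0.80, 0.20, 0.60]

strict = segment_by_threshold(code, scores, threshold=0.7)  # fewer suggestions
loose = segment_by_threshold(code, scores, threshold=0.5)   # more suggestions
```

With these made-up scores, the stricter threshold yields two segments and the looser one yields three, which mirrors how a user could tune the number of suggestions.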
Publication: College of Computing and Informatics Drexel University, Philadelphia, Pennsylvania (Peer-Reviewed Journal)
Pub Date: 2021 URL: https://www.cs.drexel.edu/~spiros/papers/ICSC22.pdf
Keywords: Deep Learning, Natural Language, Big Data, Source Code Analysis, Segmentation
https://www.cs.drexel.edu/~spiros/papers/ICSC22.pdf (Plenty more sections and references in this research article)