swMATH ID: 39411
Software Authors: Francisco Ortin, Javier Escalada
Description: Cnerator: A Python application for the controlled stochastic generation of standard C source code. The Big Code and Mining Software Repositories research lines analyze large amounts of source code to improve software engineering practices. Massive codebases are used to train machine learning models aimed at improving the software development process. One example is decompilation, where pairs of C source code and its compiled binaries serve as training data for such models. However, obtaining massive codebases of portable C code is not easy, since most applications depend on particular libraries, operating systems, or language extensions. In this paper, we present Cnerator, a Python application for the stochastic generation of large amounts of standard C code. It is highly configurable, allowing the user to specify the probability distribution of each language construct, properties of the generated code, and post-processing modifications of the output programs. Code generated by Cnerator has been used to train machine learning models, improving the performance of existing decompilers. Cnerator has also been used in the implementation of an infrastructure for the automatic extraction of code patterns.
Homepage: https://www.sciencedirect.com/science/article/pii/S235271102100056X
Source Code: https://github.com/ElsevierSoftwareX/SOFTX-D-21-00022
Keywords: Big code; Mining software repositories; Machine learning; C programming language; Stochastic program generation; Python; SoftwareX; Cnerator
Related Software: YARPGen; Frama-C; Python
Cited in: 0 Publications

Standard Articles

1 Publication describing the Software