By Jurgen Vinju
The assumption is that for software to “be good”, you need to understand the code. However, most software engineers do not have time to read it. So how can you understand it? Automatic analysis can help answer questions about software. In fact, every such task (be it verification, re-engineering, refactoring, or something else) is a question about software. To advance scientifically, we evaluate and validate analysis tools: we demonstrate the value of the tool, show that it works, and verify that it is correct. To advance industrially, we need to innovate and valorize, for example through rapid prototyping.
Software itself is just a bunch of characters, from which you need to extract facts to build a model. Computers are very fast and patient, and can therefore do a lot for us. However, many questions are undecidable, so we need approximations, and we must measure how accurate those approximations are. Code analysis is also hard because of the enormous variety of details, and all the little details matter.
Rascal is a “language-parametric” DSL for meta-programming. It follows the EASY approach: Extract, Analyse, and SYnthesise. Under the hood, Rascal uses syntax trees. A syntax tree holds no more information than the original source (you only lose the layout), but it is much simpler to work with. Next, you annotate the tree with additional things you know, such as types. Many advances come with new intermediate representations, and it would be great if we could get these intermediate representations “for free”. They, too, are all trees! Several front-ends have been built, such as Java-AiR (Analysis in Rascal), and they can be reused, which is highly beneficial but also quite risky.
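To make the EASY steps more concrete, here is a minimal sketch. It is written in Python rather than Rascal and uses Python's standard `ast` module as a stand-in for Rascal's syntax trees. The example program, the functions `extract`, `analyse`, and `synthesise`, and the chosen question (“which functions are never called?”) are hypothetical illustrations, not part of Rascal or of the talk.

```python
import ast

# Hypothetical input program: the analysis only ever sees these characters.
SOURCE = """
def helper(x):
    return x + 1

def unused():
    return 0

def main():
    print(helper(41))
"""


def extract(source: str) -> tuple[set[str], set[str]]:
    """Extract: parse the characters into a syntax tree and collect facts."""
    tree = ast.parse(source)
    defined: set[str] = set()
    called: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined.add(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are recorded; dynamic calls are missed.
            called.add(node.func.id)
    return defined, called


def analyse(defined: set[str], called: set[str]) -> set[str]:
    """Analyse: functions defined but never called by name (entry point excluded)."""
    return defined - called - {"main"}


def synthesise(unused: set[str]) -> str:
    """Synthesise: turn the analysis result into output, here a text report."""
    return "\n".join(f"possibly unused: {name}" for name in sorted(unused))


defined, called = extract(SOURCE)
print(synthesise(analyse(defined, called)))  # prints "possibly unused: unused"
```

The analysis is itself an approximation: a function invoked dynamically (for example via `getattr`) never appears as a direct call by name, so it would be reported even though it is used. That is why the report says “possibly”.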
Because these problems are undecidable, you will always be wrong somewhere: 100% is not possible. This also yields papers: you need to test how good your approximation is, e.g. in terms of accuracy (the “right” answer) and precision (a detailed answer). But quality depends heavily on the question you are asking! A good first step in building an analysis tool is either a safe over-approximation or an efficient under-approximation. However, most tools combine under- and over-approximation, which makes them hard to validate: what is the ground truth? Jurgen's tip: choose one of the two, and grow your tool gradually, adding small advances without confusion. For example, ClAiR (C++ Analysis in Rascal) over-approximates the C++ syntax, which makes the analysis less hard.
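The difference between over-, under-, and mixed approximation can be illustrated with a toy comparison against a ground truth. In this sketch (Python, all data hypothetical) the “analysis results” are hand-written sets of call-graph edges, and the ground truth is assumed to be known, which in real validation it usually is not. A pure over-approximation produces only false positives and a pure under-approximation only false negatives, while a mixed result produces both, so its errors cannot be attributed to a single kind of approximation.

```python
# Hypothetical ground truth: the call edges that can really happen at run time.
GROUND_TRUTH = {("main", "parse"), ("main", "report"), ("parse", "lex")}

# Hypothetical analysis results.
OVER = GROUND_TRUTH | {("main", "lex"), ("report", "parse")}  # safe: a superset
UNDER = {("main", "parse"), ("parse", "lex")}                  # efficient: a subset
MIXED = {("main", "parse"), ("report", "parse")}               # errors of both kinds


def compare(result: set, truth: set) -> tuple[int, int]:
    """Return (false positives, false negatives) of a result against the truth."""
    false_pos = result - truth  # claimed by the analysis, but not real
    false_neg = truth - result  # real, but missed by the analysis
    return len(false_pos), len(false_neg)


for name, result in [("over", OVER), ("under", UNDER), ("mixed", MIXED)]:
    fp, fn = compare(result, GROUND_TRUTH)
    print(f"{name}: false positives={fp}, false negatives={fn}")
# over: false positives=2, false negatives=0
# under: false positives=0, false negatives=1
# mixed: false positives=1, false negatives=2
```

This mirrors the tip in the paragraph above: if a tool only over-approximates, every deviation from the truth is a false positive, so evaluation can focus on one kind of error at a time.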
Bottom line: it is still a wide-open field, and all is fair in love, war, and code analysis. You have nothing to lose, so think outside the box. Software engineers need automated software analyses. “Do or do not, there is no try!” Use the source!