Presented by Manoj Bhat

The idea of the paper is to apply machine learning, in particular classification, to issue management systems. Why look at architecture design decisions (ADDs)? We all know the advantages of documenting them, but the practice is not yet well adopted. According to the author, this is because of a lack of tool support. As a result, ADDs are poorly documented: documenting them takes a lot of time. One of the reasons is the agile mindset: “working software over comprehensive documentation”. So, how can we automatically extract ADDs from textual information, and how can we automatically classify them?

ADDs are made on different levels (high level, detailed design, implementation). Similarly, we can categorize ADDs into several classes: property (constraints, rules), existence (structural, behavioral, ban), and executive. Ban decisions are especially difficult to retrieve: whatever was banned has been removed from the source code and is thus typically absent from the documentation as well.

In this paper, the authors propose a two-phase approach. Starting from Jira issues, it first detects design decisions (phase 1), which are then classified (phase 2). As a case study, they used Apache Spark and the Apache Hadoop Common library, containing 19k and 10k issues, respectively. These projects were chosen because they follow a structured issue process. Manual labeling eventually yielded 781 design decisions and 1358 non-design decisions (the dataset is publicly available!).
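To make the two-phase idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the tiny Naive Bayes classifier, the toy issue summaries, the label names, and the helper `extract_decisions` are all assumptions made up for illustration. It only shows the shape of the pipeline: a detector filters issues first, and a second classifier labels whatever passes.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written issue summaries (hypothetical, not taken from the dataset).
PHASE1_DATA = [
    ("Add a new scheduler module for streaming jobs", "decision"),
    ("Remove dependency on legacy rpc library", "decision"),
    ("Change shuffle service to retry failed fetches", "decision"),
    ("Fix typo in documentation", "other"),
    ("Test fails intermittently on jenkins", "other"),
    ("Update version number in readme", "other"),
]
PHASE2_DATA = [
    ("Add a new scheduler module for streaming jobs", "structural"),
    ("Remove dependency on legacy rpc library", "ban"),
    ("Change shuffle service to retry failed fetches", "behavioral"),
]

# Phase 1 model: decision vs. other. Phase 2 model: category of a decision.
detector = NaiveBayes().fit(*zip(*PHASE1_DATA))
categorizer = NaiveBayes().fit(*zip(*PHASE2_DATA))


def extract_decisions(issues, detector, categorizer):
    """Run phase 1 on every issue, then phase 2 only on detected decisions."""
    results = []
    for issue in issues:
        if detector.predict(issue) == "decision":
            results.append((issue, categorizer.predict(issue)))
    return results


if __name__ == "__main__":
    sample = ["Remove dependency on legacy rpc library", "Fix typo in documentation"]
    for issue, category in extract_decisions(sample, detector, categorizer):
        print(f"{category}: {issue}")
```

Note that an issue rejected in phase 1 never reaches phase 2, so any decision the detector misses is lost for classification as well.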

Next, they trained their system. On the one hand, the results seem promising. Each of the phases has a fairly good F-score, but what happens when you combine them? Is the overall accuracy then not too low? On the other hand, the results raise many new questions. Are the results so good because of the strict issue management regime in these projects? What happens under a less rigid regime? What if a decision can belong to multiple categories? How robust is the trained model (overfitting)? Can it be applied to other projects as well? I also have some questions about the classification: does it score better on high-level decisions or on implementation decisions? Interesting to find out!
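The worry about combining the phases can be illustrated with a back-of-the-envelope calculation. The sketch below is my own simplification, not something from the paper: F-scores do not multiply directly, so it uses phase-1 recall and phase-2 accuracy instead, and it assumes phase 2 performs equally well on the decisions that phase 1 lets through. The function name and the example numbers are hypothetical.

```python
def end_to_end_rate(detection_recall, classification_accuracy):
    """Fraction of true decisions that are both detected and correctly classified.

    Assumes phase-2 accuracy on detected decisions equals its overall accuracy.
    """
    for value in (detection_recall, classification_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    return detection_recall * classification_accuracy


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show how errors compound.
    print(end_to_end_rate(0.9, 0.8))
```

With these made-up rates of 0.9 and 0.8, only about 72% of the real decisions would end up detected and correctly labeled, which is noticeably lower than either phase on its own.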

ECSA 2017: Automatic extraction of design decisions from issue management systems: a machine learning based approach