Crokage: Source code help draws on millions of answers from Stack Overflow

IT researchers have developed software that draws on the developer portal Stack Overflow to provide understandable answers to developer queries. Crokage interprets search strings and returns solutions in the form of code examples and tutorials that explain specific source code passages, methods, classes and functions. The machine-learning system is based on the 18 million questions published on Stack Overflow and the 27 million answers to them.

"One of the most powerful features of Stack Overflow is the accumulation of developer knowledge over time," the developer portal writes in a blog post. Crokage takes advantage of this knowledge. It is meant to bridge the frequent semantic gap between developers' questions and the answers returned by the systems they use, which are often unsatisfactory or only partly satisfactory. "These issues cause developers to search dozens of documents to create a satisfactory solution," the developers write in the project summary.

Still in the test phase

Crokage applies natural language processing to highly rated answers in Stack Overflow threads in order to process them. The basic framework is provided by fastText, an already trained text classifier. The software then returns source code examples and explanations. In initial tests, the program is said to outperform conventional code searches. However, this has so far been tested on only 97 programming examples, half of which Crokage used for further training. 29 developers took part in the test. A production deployment would probably require far more extensive testing.
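The basic idea of matching a query against well-rated answers can be illustrated with a minimal sketch. This is not Crokage's actual pipeline: a simple bag-of-words cosine similarity stands in for the fastText model, and the function names, the vote threshold and the sample answers are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_answers(query, answers, min_score=1):
    """Keep answers with at least min_score votes, most similar to the query first.

    Assumption: word overlap replaces the learned representation a real
    system would use.
    """
    q = Counter(tokenize(query))
    candidates = [a for a in answers if a["score"] >= min_score]
    return sorted(
        candidates,
        key=lambda a: cosine(q, Counter(tokenize(a["text"]))),
        reverse=True,
    )


# Hypothetical sample data, not taken from Stack Overflow.
answers = [
    {"text": "Use Files.readAllLines to read a file into a list of lines", "score": 42},
    {"text": "Sort a list with Collections.sort", "score": 17},
    {"text": "read file lines quickly", "score": 0},
]

best = rank_answers("how to read a file line by line", answers)
print(best[0]["text"])
```

In this sketch the unvoted answer is filtered out before ranking, which mirrors the restriction to highly rated answers. A learned model would additionally match queries and answers that share meaning but not vocabulary.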

A first, limited trial version of Crokage is already available on a dedicated trial website. The software is currently limited to the Java programming language. Users can try it out and submit ratings, which in turn can help the developers improve the results in the future.