OpenAI has published a machine learning (ML) model that translates natural language into source code in different programming languages. Codex is based on the Generative Pre-trained Transformer 3 (GPT-3) language model and forms the basis of GitHub's ML assistant Copilot. Initially it is running as a private beta, and interested parties can register on a waiting list.
At the launch of GitHub Copilot in June, OpenAI announced that it would publish Codex as a standalone model. Since then, the company has evidently expanded the model: it can now convert simple English sentences into source code. The OpenAI site shows videos of various scenarios, from game programming to data science applications.
Space Oddity via machine learning
The videos demonstrate the expanded language understanding: for example, Codex adds three exclamation marks to the obligatory "Hello World" for more emotional expression, and implements the follow-up request "Even louder please" in capital letters as "HELLO WORLD !!!".
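The flavor of that demo can be mimicked in a short Python sketch. The prompts appear as comments; the functions below are hand-written stand-ins for the kind of code Codex might generate, not actual model output:

```python
# Prompt: "Hello World" with more emotional expression
def hello_world() -> str:
    return "Hello World!!!"

# Prompt: "Even louder please" -> the model switches to capital letters
def hello_world_louder() -> str:
    return hello_world().upper()

print(hello_world())         # Hello World!!!
print(hello_world_louder())  # HELLO WORLD!!!
```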
Versatile polyglot model
Codex is not designed for specific scenarios, but as a generally applicable model. Apart from the concrete implementation in a programming language, the most important part of the work remains in human hands: the entered commands break the task down into logical components. In the example of the small game, the human gives dedicated specifications as to which objects appear on the screen, how the program reacts when a key is pressed, and what happens when two objects collide.
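Such a decomposition might look as follows in Python. The prompts and all names are illustrative inventions, not taken from OpenAI's demo; the code is a hand-written stand-in for plausible Codex output:

```python
# "Create a ball and a paddle on the screen"
ball = {"x": 50, "y": 50, "w": 10, "h": 10}
paddle = {"x": 40, "y": 90, "w": 30, "h": 5}

# "Move the paddle left or right when an arrow key is pressed"
def handle_key(obj, key, step=5):
    if key == "left":
        obj["x"] -= step
    elif key == "right":
        obj["x"] += step
    return obj

# "Detect when two objects collide" (axis-aligned bounding-box overlap)
def collides(a, b):
    return (a["x"] < b["x"] + b["w"] and a["x"] + a["w"] > b["x"]
            and a["y"] < b["y"] + b["h"] and a["y"] + a["h"] > b["y"])
```

Each natural-language command maps to one self-contained piece of logic; assembling them into a working game loop is still the human's job.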
According to OpenAI, the company has tried out the model on various tasks beyond translating text into code. These include refactoring, generating explanations, and transpiling, i.e. translating source code into another programming language or into another form in the same language. A sample video shows the conversion of Python into Ruby code.
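To give a flavor of such a Python-to-Ruby conversion, here is an illustrative pair (this example is not taken from OpenAI's video; the Ruby version, shown as a comment, is a hand translation):

```python
# Python input a user might feed to the model:
def fizzbuzz(n):
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# A Ruby equivalent the model could be asked to produce:
#
# def fizzbuzz(n)
#   return "FizzBuzz" if n % 15 == 0
#   return "Fizz" if n % 3 == 0
#   return "Buzz" if n % 5 == 0
#   n.to_s
# end
```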
OpenAI, Codex and Copilot
OpenAI was founded in 2015 as a research project, with Elon Musk among the founders. Initially, the organization published many projects in the field of reinforcement learning. It achieved great fame with the Generative Pre-trained Transformer 3 (GPT-3) language model. In 2019, OpenAI moved away from its purely non-profit structure and now invests in machine-learning start-ups itself, most recently with 100 million US dollars.
Copilot, which is based on Codex, has been in a test phase since the end of June. GitHub calls it an "AI pair programmer": as in pair programming, the virtual assistant is meant to make suggestions for improving or extending the source code. There has also been criticism. The Free Software Foundation, for example, has called for papers on the implications of the service, which the organization describes as unacceptable and unjust.
One of the focal points is the question of potential copyright infringements: GitHub uses code from numerous publicly available repositories as the basis for Copilot, and parts of the training material are under the GPL license.
More details on Codex's beta test can be found in the OpenAI blog. The name does not refer to a code of conduct, but harks back to the ancient Roman codex, the wax or wooden tablets that were a precursor of the book, and of course to "code". Those who want to try Codex can join the waiting list via a web form.
- The cover topic of the current iX 8/2021, "Better Code with AI", highlights ML-based software development with GitHub Copilot, among other tools: How does machine learning help with programming, where are the limits, and where are the risks?