Natural language code: OpenAI opens beta for Codex programming AI


OpenAI has published a machine learning (ML) model that translates natural language into source code for various programming languages. Codex is based on the Generative Pre-trained Transformer 3 (GPT-3) language model and forms the basis for GitHub’s ML assistant Copilot. For now, Codex is available in a private beta test, and interested parties can register on a waiting list.

When GitHub Copilot launched in June, OpenAI announced that it would publish Codex as an independent model. Since then, the company has apparently expanded the model: it can now convert simple English sentences into source code. The OpenAI website has videos covering different scenarios, from game programming to data science applications.

For the game, sentences like “When the rocket is clicked, temporarily display some text saying ‘Firing thrusters!’ in white on the current location – and temporarily speed up by 4x for 0.25 second” form the basis for JavaScript code. The natural-language instruction then appears as a comment above the implementation.

Codex generates JavaScript code based on the instruction in natural language.

(Image: OpenAI)

The videos demonstrate the expanded understanding of language: for example, the model adds three exclamation marks to the obligatory “Hello World” for more emotional expression, and it implements the request “Even louder please” with capital letters as “HELLO WORLD!!!”.
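
The exchange can be pictured with a small hand-written Python sketch in which each request sits as a comment above the code that answers it. This is an illustration only, not output from Codex: the function names and the exact wording of the first two requests are assumptions made for this example.

```python
# Request (wording assumed): "Say Hello World"
def hello() -> str:
    return "Hello World"


# Request for more emotional expression (wording assumed):
# three exclamation marks are appended to the greeting.
def hello_with_feeling() -> str:
    return hello() + "!!!"


# Request: "Even louder please"
# The same text is returned in capital letters.
def hello_louder() -> str:
    return hello_with_feeling().upper()


if __name__ == "__main__":
    print(hello())
    print(hello_with_feeling())
    print(hello_louder())
```

Each step builds on the previous one, which mirrors how the follow-up requests in the videos refer back to what has already been generated.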

According to OpenAI, Python is the programming language that Codex “speaks particularly fluently”. The model can also create code in JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript, among others, and it understands shell code as well. For now, however, English appears to be the only intended source language.

Codex is not designed for specific scenarios but as a generally applicable model. Beyond the concrete implementation in a programming language, the most important part of the work remains in human hands: the commands entered break the task down into logical components. In the example of the small game, a human specifies which objects appear on the screen, how the program reacts when a key is pressed and what happens when two objects collide.
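
How such a breakdown might look can be sketched in Python, with one instruction per logical component placed as a comment above its code. Everything here is a hand-written assumption for illustration: the `Sprite` type, the function names and the wording of the three instructions do not come from OpenAI's demo, which uses JavaScript.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    """A rectangular object on the screen (hypothetical helper type)."""
    name: str
    x: float
    y: float
    width: float = 10.0
    height: float = 10.0


# Instruction 1 (assumed wording): "Put a rocket and an asteroid on the screen."
def create_scene() -> list[Sprite]:
    return [Sprite("rocket", x=0, y=0), Sprite("asteroid", x=50, y=0)]


# Instruction 2 (assumed wording): "When the left or right arrow key is
# pressed, move the rocket in that direction."
def on_key(sprite: Sprite, key: str, step: float = 5.0) -> None:
    if key == "right":
        sprite.x += step
    elif key == "left":
        sprite.x -= step


# Instruction 3 (assumed wording): "When two objects collide, remove the
# second one."
def overlaps(a: Sprite, b: Sprite) -> bool:
    # Axis-aligned bounding-box test: the rectangles overlap on both axes.
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def resolve_collisions(scene: list[Sprite]) -> list[Sprite]:
    # Keep each sprite only if it does not overlap one that was kept earlier.
    survivors: list[Sprite] = []
    for sprite in scene:
        if not any(overlaps(kept, sprite) for kept in survivors):
            survivors.append(sprite)
    return survivors
```

The human decides what the three components are and how they interact; the code below each comment is the part a model like Codex would be asked to fill in.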

OpenAI says it has tried out the model on various tasks beyond translating text into code. These include refactoring, generating explanations and transpiling, i.e. translating source code into another programming language or into another form in the same language. A sample video converts Python code to Ruby.
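
The second kind of translation, into another form in the same language, can be shown with a minimal hand-written Python sketch. It is not taken from OpenAI's videos, and the function names are assumptions for this example; the point is only that both forms must behave identically.

```python
# Original form: an explicit loop that collects the squares of even numbers.
def even_squares_loop(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


# Rewritten form: the same behaviour expressed as a list comprehension.
def even_squares_comprehension(numbers):
    return [n * n for n in numbers if n % 2 == 0]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6]
    # Both forms should agree on every input.
    assert even_squares_loop(sample) == even_squares_comprehension(sample)
```

Checking the rewritten version against the original on sample inputs, as the final assertion does, is a simple way to verify any machine-suggested refactoring.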

OpenAI was founded in 2015 as a research project; Elon Musk was one of the founders. Initially, the organization published many projects in the field of reinforcement learning. It became widely known for the Generative Pre-trained Transformer 3 (GPT-3) language model. In 2019, OpenAI moved away from operating purely as a non-profit and now invests in machine learning start-ups itself, most recently with 100 million US dollars.

Copilot, which is based on Codex, has been in a test phase since the end of June. GitHub calls it an “AI Pair Programmer”: as in pair programming, the virtual partner is meant to suggest improvements and additions to the source code. It has drawn criticism from some quarters. For example, the Free Software Foundation has called for papers on the implications of the service, which the organization describes as unacceptable and unjust.

Copilot creates a function based on the comment.

(Image: GitHub)

One focal point is the question of potential copyright infringement. GitHub uses code from numerous publicly available repositories as the basis for Copilot, and parts of the training material are licensed under the GPL.

More details on the Codex beta test can be found on the OpenAI blog. The name does not refer to a code of conduct; it presumably goes back to the ancient Roman term for the wax or wooden tablets that preceded the book, and certainly to “code”. Anyone who wants to try Codex can join the waiting list via a web form.

  • The cover story of the current iX 8/2021, “Better Code with AI”, examines ML-based software development with tools including GitHub Copilot: how does machine learning help with programming, where are its limits and where are the risks?

