Now Code Without Learning to Code: OpenAI Kicks off Codex Beta


OpenAI Codex, the AI technology that enables GitHub Copilot to synthesize code, is now available for others to generate code from natural language. OpenAI has released a limited private beta of the tool for developers to test out its promising text- and speech-to-code generation capabilities. Is a deal with Microsoft in the offing?

GitHub Copilot made a splash when it was released in technical preview in late June. Just over a month later, the artificial intelligence (AI) technology that powers one of the most exciting software development tools in recent years is getting its own release. OpenAI on Tuesday announced the release of OpenAI Codex in private beta.

OpenAI co-founder and CTO Greg Brockman in June described OpenAI Codex as “a descendant of GPT-3,” the deep learning model capable of generating human-like text. The major difference is that while GPT-3 generates text in English, OpenAI Codex generates code in programming languages.

OpenAI believes Codex is the next big thing in computer programming, which has considerably evolved from the days of physical punch cards. OpenAI co-founder Wojciech Zaremba, who also leads Codex development at the six-year-old company, told The Verge, “What is happening with Codex has happened before a few times. In the early days of computing, programming was done by creating physical punch cards that had to be fed into machines, then people invented the first programming languages and began to refine these.”

“These programming languages, they started to resemble English, using vocabulary like ‘print’ or ‘exit’ and so more people became able to program. Each of these stages represents programming languages becoming more high level,” he adds. In essence, the inclusion of words from human languages is what made programming more relatable. So the use of OpenAI Codex, which is built specifically to convert human language commands into snippets of code, is an interesting prospect.

What Does OpenAI Codex Do?

Natural Language to Code

GPT-3 is a generative pre-trained transformer that uses probability scores to generate, rather than merely predict, human language from natural prompts. Codex works much the same way, generating a result from natural language prompts, except the output is program code rather than sentences.

It is trained on billions of lines of publicly available source code as well as natural language. It can ‘converse’ with humans in a dozen programming languages, including JavaScript, Go, Python, Perl, PHP, Ruby, Swift, TypeScript, and Shell, with the highest proficiency in Python.

OpenAI Codex is trained to listen to and interpret natural language commands such as “print four multiplied by eight on the screen” or “resize the image to fit within the computer screen.” A successful interpretation results in computer code delivered in a user-specified language.
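To make the second command concrete, the snippet below shows the kind of Python Codex might plausibly emit. It is only an illustrative sketch, not actual Codex output; the Pillow library, the file name, and the 1920x1080 screen size are all assumptions.

```python
# Hypothetical code of the sort Codex might generate for
# "resize the image to fit within the computer screen".
# Assumes the Pillow library and a 1920x1080 screen; the file name is made up.
from PIL import Image

img = Image.open("photo.jpg")
img.thumbnail((1920, 1080))   # scales down in place, preserving aspect ratio
img.save("photo_resized.jpg")
```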

Moreover, the generated code follows industry-standard formatting, because the code it was trained on comes from public internet repositories.

Printing “Hello World” with OpenAI Codex | Source: OpenAI

It should be noted that Codex is designed to generate code, not to ideate the program concept itself. That is still the job of the developer. Think of OpenAI Codex as a personal, albeit virtual, coding assistant that delivers program code based on the ideas a developer feeds it.

This is why it is important to find the right command, often with a bit of trial and error. Consider the previously mentioned “print four multiplied by eight on the screen” command. What would Codex write the code for? It could print the expression ‘4×8’ when the developer actually wants to display the result of the expression, which is 32.

So the outcome depends on the ambiguity of the natural language command. Once the developer is clear about what the output needs to be, they just need to break it down into simple terms and see how it can be mapped to existing code libraries, functions, etc. “Sometimes it doesn’t quite know exactly what you’re asking. So you had to think a little about what’s going on but not super deeply,” Brockman said.
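As a rough illustration of that ambiguity, both of the following one-liners are plausible readings of the same command; neither is actual Codex output:

```python
# Two plausible interpretations of "print four multiplied by eight on the screen".
print("4 x 8")   # prints the expression literally
print(4 * 8)     # prints the evaluated result, 32
```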

Code to Code Conversion

Considering OpenAI Codex supports 12 programming languages, it certainly is a good idea to leverage it for converting code from one language to another. Essentially, a well-thought-out and, more importantly, functioning piece of code written in Python can be repurposed as Ruby code or code in any other supported language.

This process is called transpilation.

Converting Python Code to Ruby with OpenAI Codex | Source: OpenAI
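In practice, this kind of translation is driven by a prompt. The snippet below is a minimal sketch of how it might look through OpenAI’s Python client during the beta; the engine name, prompt layout, and sampling parameters are assumptions, not documented specifics.

```python
# A minimal sketch of asking Codex to translate Python into Ruby via the
# OpenAI API. The engine name, prompt format, and parameters are assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

prompt = (
    "# Translate this Python function to Ruby.\n"
    "# Python:\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "# Ruby:\n"
)

response = openai.Completion.create(
    engine="davinci-codex",  # assumed Codex engine name
    prompt=prompt,
    max_tokens=64,
    temperature=0,           # keep the translation as deterministic as possible
)

print(response.choices[0].text)
```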

Control Other Programs

As long as a program or piece of software exposes an API, OpenAI Codex can be used to establish user-program interaction. Once again, all the user needs to do is instruct Codex to generate appropriate code, which is then fed to the program for execution via the API.

Brockman demonstrated this with an example. He commanded Codex to write code instructing Microsoft Word to do the following (a simplified sketch of such code appears after the list):

  • Delete all initial spaces
  • Prepend everything with L, the line number, then a colon
  • Delete the last line in the document
  • Set the font color of every line to a random color chosen from pink, orange, purple, red, blue, and green, among other instructions
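As a rough sketch of the kind of code Codex might produce for the first three instructions, here is a plain-Python version that operates on a list of lines rather than a live Word document; the Word API plumbing from the demo is omitted and the sample lines are made up.

```python
# Hypothetical Python of the sort Codex might generate for the first three
# instructions, applied to a plain list of strings instead of a Word document.
lines = ["  first line", "    second line", "third line", "last line"]

lines = [line.lstrip() for line in lines]                            # delete all initial spaces
lines = [f"L{i}: {line}" for i, line in enumerate(lines, start=1)]   # prepend "L", line number, colon
lines = lines[:-1]                                                   # delete the last line

print("\n".join(lines))
```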

Instructing Computer Software via OpenAI Codex | Source: OpenAI

It is noteworthy that code generation takes place only after Codex self-corrects imprecisions in hearing natural language. For example, Codex self-corrected ‘everyone’ to ‘every line’ in the video.

Limitations of OpenAI Codex

In July this year, OpenAI detailed in a paper how Codex can fall short of expectations. The company noted more than a few limitations, one of which, and probably the most interesting, is Codex’s inability to actually understand programming.

Sure, its machine learning foundation enables Codex to deliver results based on the statistical relationships between the pieces of code it was trained on. But it doesn’t really understand that code; it simply stitches code fragments together based on its comprehension of natural language.

OpenAI Codex also falls short of expectations when it comes to non-specific problems. Its competency in code generation depends heavily on the scope of the task. Generalized problems, as opposed to specific tasks, can prove quite challenging.

The tool is also not sample efficient despite being trained on billions of lines of code. “A strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B,” OpenAI researchers said.

Codex can also be biased and can generate code with a “structure that reflects stereotypes about gender, race, emotion, class, the structure of names, and other characteristics.” Users can also become over-reliant on Codex. The company also mentioned environmental, legal, and security implications stemming, respectively, from its energy footprint, the chance of regenerating training code verbatim (less than 0.1%), and the non-deterministic nature of the tool.

How is OpenAI Codex Different from GitHub Copilot?

Technically, both synthesize code; the difference lies in what each takes as input.

GitHub Copilot is geared toward enhancing developer productivity. It autocompletes code by suggesting what can or should come next, and it also generates code snippets based on developer comments.

OpenAI Codex, on the other hand, actually creates or generates code based on the natural language instructions given by the developer/user.

Closing Thoughts

Besides aiding developers in code generation, OpenAI Codex can prove particularly useful for integrating new components into existing code. It can be used to onboard users to new codebases and help reduce context switching. Budding programmers, and even non-programmers, can leverage it to navigate the space as well. But as of now, Codex looks to be in its nascent stages when it comes to broader industry adoption.

What’s exciting is the possibility of OpenAI Codex being used in more projects besides the Microsoft-owned GitHub Copilot. Microsoft has pumped $1 billion into OpenAI since 2019 and has expressed interest in making language understanding widely available under AI at Scale on its Azure cloud platform. An OpenAI-Microsoft deal may be on the cards, but until it materializes, users can sign up with OpenAI to join the waitlist for the Codex API.
