
Machine Learning Will Do Auto-Programming’s Heavy Lifting

By James Kobielus  /  June 26, 2017

Programming an information system can be strenuous labor. If you’ve ever spent hours intently producing some intricately detailed textual composition, you know what I mean. And if you’re typing out mind-numbing technical verbiage every single work day, the repetitive stresses can wear you down.

If you’re a programmer, your nervous system will scream for anything that can relieve the robotic tedium of it all and possibly even accelerate successful completion of whatever build you’re laboring on. For many years, software developers have eagerly embraced source-code generation tools to lighten the load. Some refer to these as “automatic programming” solutions, though you’d be hard-pressed to name any developer who’s ever automated him or herself out of a job. That’s because expert human judgment is still the core of high-quality software development.

With that in mind, it might be more apt to refer to code generation as an “augmented programming” technique. Throughout the 70+ years since the invention of the stored-program computer, code generation has evolved to encompass a wide range of technical approaches. The foundation of them all is the notion that developers can specify a concise, high-level programming abstraction from which a more verbose executable code implementation may be generated. As implemented in commercial code-generation tools, the most common abstractions include programming templates, domain-specific languages, metamodels, database models, metadata models, graphical models, flowchart models, tree models, and scripts. What they all output is well-formed textual statements in whatever specific programmatic languages and syntaxes the tool vendor supports, such as Java, C#, C++, VB.NET, Python, SQL, JSON, JavaScript, XML, HTML, and so on.
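To make the abstraction-to-code idea concrete, here is a minimal sketch of template-driven generation in Python: a concise, high-level spec (a table model, in this illustration) expands into more verbose SQL DDL. The table and column names are hypothetical examples; real tools layer far richer metamodels on top of this basic pattern.

```python
# Minimal sketch of template-based code generation: a concise,
# high-level spec (here, a table model) expands into verbose SQL.
# The table/column names below are hypothetical examples.

TABLE_TEMPLATE = "CREATE TABLE {name} (\n{columns}\n);"

def generate_ddl(spec):
    """Expand a table spec (dict) into a SQL DDL statement."""
    cols = ",\n".join(
        f"    {col} {ctype}" for col, ctype in spec["columns"].items()
    )
    return TABLE_TEMPLATE.format(name=spec["name"], columns=cols)

spec = {"name": "customers",
        "columns": {"id": "INTEGER PRIMARY KEY", "email": "TEXT NOT NULL"}}
print(generate_ddl(spec))
```

The key point is the asymmetry: the spec is a few lines of declarative intent, while the output is fully formed, syntactically valid code in the target language.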

One of the more interesting recent innovations in this mature segment has been the use of Artificial Intelligence (AI) as a code-generation kickstarter. For example, here’s a recent story on one Denmark-based vendor’s tool that uses Machine Learning (ML) models to automatically translate raw application screenshots into code that can execute on Android, iOS, and other platforms. Considering it only gets roughly 3 out of every 4 lines of code correct, this implementation of ML-driven code-gen won’t make human programmers obsolete. However, as a proof of technology, it’s not too shabby. If nothing else, it’s a forerunner of the more accurate and increasingly versatile code-generation tools that will surely follow, leveraging Machine Learning, Deep Learning (DL), and other data-driven AI techniques.

Actually, I’m surprised it took tool vendors this long to realize that Machine Learning has code-generation potential. After all, natural language generation (NLG) is one of ML’s well-established applications, especially in Web content creation, where such vendors as Arria NLG, Automated Insights, Narrative Science, and Yseop are solid players. What they produce is human-readable textual narratives, not necessarily machine-readable or compilable source code. But it’s not a huge stretch to imagine that the same core Machine Learning technologies can also mill massive volumes of structured program code (a process that the cheeky might call “unnatural language generation”).

We can see the outlines of a more versatile ML-driven auto-programming future in initiatives such as the one that Microsoft Research discusses in this recent article and this research paper. Essentially, what Microsoft’s “DeepCoder” initiative does is use ML models to auto-generate program code through the following procedure:

  • Compile a supervised-learning training-data corpus of code-builds related to a particular application domain of interest;
  • Use that training data to develop Machine Learning models that learn the specific programmatic abstraction-layer inputs (e.g., domain-specific language specifications) most predictive of various programmatic code-build properties;
  • Search the predictive “program space” of likely code-builds, and associated properties, that might be generated from various abstraction-layer inputs consistent with some new programming challenge;
  • Assess the best-fit trade-offs (e.g., performance, efficiency, maintainability) among the most likely code-builds that address those core functional requirements; and
  • Generate the best-fit code-build, from the likely program space, that is consistent with a particular abstraction-layer specification of some specific programming challenge
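The procedure above can be sketched as a toy synthesis loop. Everything here is a hypothetical stand-in: a three-operation DSL and fixed “predicted” scores in place of a trained model. The ML model’s job is to rank which operations are likely for a given problem; the search then tries candidate programs, shortest and most probable first, until one reproduces the supplied input-output examples.

```python
# Toy sketch of a DeepCoder-style loop: predict which DSL operations
# are likely, then search operation sequences (shortest and most
# probable first) until one matches all input-output examples.
# The DSL ops and the score table are hypothetical stand-ins for a
# trained ML model's predictions.
from itertools import product

DSL = {"double":   lambda xs: [x * 2 for x in xs],
       "sort":     lambda xs: sorted(xs),
       "drop_neg": lambda xs: [x for x in xs if x >= 0]}

# Stand-in for learned P(op | examples): here, fixed scores.
predicted_scores = {"sort": 0.7, "double": 0.2, "drop_neg": 0.1}

def run(seq, xs):
    """Apply a sequence of DSL operations to an input list."""
    for op in seq:
        xs = DSL[op](xs)
    return xs

def synthesize(examples, max_len=2):
    """Return the first op sequence consistent with all examples."""
    ops_ranked = sorted(DSL, key=predicted_scores.get, reverse=True)
    for length in range(1, max_len + 1):
        for seq in product(ops_ranked, repeat=length):
            if all(run(seq, inp) == out for inp, out in examples):
                return seq
    return None

examples = [([3, -1, 2], [2, 3]), ([5, 4], [4, 5])]
print(synthesize(examples))  # finds a sort-then-filter program
```

The learned ranking is what makes this tractable at scale: without it, the program space grows combinatorially, and the search would drown in implausible candidates.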

According to one of the paper’s authors, this approach, which they call “Inductive Program Synthesis,” targets “people who can’t or don’t want to code, but can specify what their problem is.” And they boast that their ML-augmented code generation is “able to solve problems of difficulty comparable to the simplest problems on programming competition websites.”

That may be so, but this technology is in its early stages, so it’s nowhere near ready to serve as a power tool for the most complex programming challenges. Nevertheless, its developers hold out the possibility that in the future it might evolve to scan “popular code repositories like StackOverflow or GitHub” to build its corpus of code-builds that address various programming problems. The cited article states that in the future, DeepCoder’s developers “want this system to understand the nuances of complete coding languages, and be able to recognize good code online.” And as the approach evolves, it’s likely that researchers will incorporate other abstraction-layer inputs (e.g., data models, metadata, metamodels, templates, etc.) into their ML-driven code-generator.

My sense is that in the next several years we’ll see this approach—which one might call ML-augmented coding—gain adoption as a rapid application development tool. Rather than “re-invent the wheel” with handcrafted code or repurposed code modules, future developers may simply check off program requirements in a high-level GUI, and then, with a single click, auto-generate the predictively best-fit code-build into the target runtime environment.
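As a rough illustration of that check-off-and-click workflow, here is a hypothetical sketch: each checked requirement maps to a code fragment, and the generated build is their assembly. The fragment names and hook calls are illustrative only; a real tool would draw on the predictive program-space search described earlier rather than a static lookup table.

```python
# Hypothetical sketch of "check off requirements, click, generate":
# each checked requirement box maps to a code fragment, and the
# generated build is the assembly of the selected fragments.
# Fragment names and hook calls are illustrative only.
FRAGMENTS = {
    "rest_api": "app = create_app()  # expose REST endpoints",
    "auth":     "app.enable_auth('oauth2')  # illustrative auth hook",
    "logging":  "app.enable_logging('json')",
}

def generate_build(checked):
    """Assemble a code-build from the checked requirement boxes."""
    lines = [FRAGMENTS[req] for req in checked if req in FRAGMENTS]
    return "\n".join(lines)

print(generate_build(["rest_api", "logging"]))
```

In an ML-augmented version, the mapping from requirements to fragments would itself be predicted and optimized for best fit, not hard-coded.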

Let’s take our imaginations one step further. If future programmers leverage generative adversarial networks to ensure that ML-generated code is at least as good as whatever the best human programmers might have produced, that could hasten the day when most developers never need to touch a single line of executable code ever again. And if the target platforms to which the ML-generated code is being deployed are also running auto-generated Machine Learning algorithms along the lines of what I discuss in this recent Datanami column, that’s highly ironic.

And it’s also highly likely, given the state of current trends. I predict that by 2025, this soup-to-nuts ML-driven “auto-programming” scenario will be mainstream in most enterprise application environments.

About the author

James Kobielus is Wikibon’s Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM’s data science evangelist, managing IBM’s thought leadership, social, and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his five-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.
