AI’s Exponential Leap: What the Next Few Years May Bring

It’s Difficult to Make Predictions, Especially About the Future

The quote has been attributed to various figures—including Niels Bohr and Nostradamus—and applies to everything from the stock market to AI capabilities.

In this blog post I will try to explain what I believe is coming in the next few years of AI, and why I believe it. Even though predicting is, indeed, difficult. But let’s take it from the start, shall we?

The roots of modern AI research can be traced back to the Dartmouth Workshop in 1956 (where the term “Artificial Intelligence” was coined). Since then, one avenue of AI research has been to train increasingly larger neural networks. (Large language models, LLMs, are a specific kind of neural network.) One measure of the scale of these networks is the number of FLOP (floating-point operations) required to train them.
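For a sense of scale, a common rule of thumb from the scaling-law literature approximates the training compute of a dense transformer as roughly 6 FLOP per parameter per training token. A minimal sketch, with purely illustrative numbers rather than figures for any real model:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate using the ~6 * parameters * tokens rule of thumb."""
    return 6 * n_params * n_tokens

# Hypothetical example: a 70-billion-parameter model trained on 2 trillion tokens
print(f"{training_flops(70e9, 2e12):.1e} FLOP")  # ~8.4e+23 FLOP
```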

It’s clear that models are growing exponentially larger—but does increased size actually translate into greater competence?

My current perspective is informed primarily by two data sources: Our World in Data’s analysis “Test scores of AI systems on various capabilities relative to human performance” and a recent METR report titled “Measuring AI Ability to Complete Long Tasks”.

Let’s have a look at each one.

The Our World in Data analysis illustrates how AI performance on various benchmarks has evolved over time, compared against a human baseline (set at 0), with a completely incompetent AI scoring -100. I note that the benchmarks can be divided into three types:

  • Stuff which was “solved” before the invention of the transformer architecture (the architecture used by modern LLMs)

  • Stuff which was “solved” quickly after the invention of the transformer architecture

  • Stuff which is not yet “solved” (as of publication of the data, 2023)

I use “solved” to mean “according to a benchmark, AI systems are better than humans at this particular task”.

We can see that the capabilities in the data set that were tracked before the invention of the transformer architecture progressed from -100 (useless) to 0 (comparable to human performance) over 7-20 years.

After the invention of the transformer architecture (2017), we suddenly see some capabilities being “solved” in just one or two years (mainly metrics related to language understanding, something LLMs excel at).

As of 2023, a few high-value capabilities were rapidly improving but had not yet reached human-level performance. An update to this analysis is planned for May 2025, and it will most probably reflect the significant progress made in general knowledge tests, math problem-solving, and code generation.

While correctly answering single-prompt questions is impressive, solving real-world problems often requires reliably performing multiple tasks in sequence over extended periods. The aptly named METR report, “Measuring AI Ability to Complete Long Tasks,” assesses how well AI systems perform on such extended tasks.

METR defines the length of a task as the time a professional human would take to complete it. According to the report, AI’s ability to reliably complete tasks, measured in equivalent human completion time, is following a scaling law, doubling approximately every seven months.

Older models, such as GPT-2, could only handle tasks that a professional human would complete in seconds. For example, GPT-2 could reliably produce a grammatically correct sentence, but struggled to combine multiple sentences into a coherent story. The latest state-of-the-art models can handle tasks which would take a human one hour to complete (such as writing longer pieces of computer code, or performing multi-step searches on the internet and combining the results).

If this empirical scaling law holds, AI systems will be able to reliably handle tasks equivalent to one month of work for a human in just a few years’ time.
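The arithmetic behind that extrapolation is simple. A minimal sketch, assuming a one-hour current horizon (as described above) and roughly 160 working hours in a month (both round assumptions, not figures from the report):

```python
import math

DOUBLING_TIME_MONTHS = 7     # METR's reported doubling time
current_horizon_hours = 1    # today's frontier models: ~1-hour tasks (see above)
work_month_hours = 160       # assumption: 4 weeks x 40 hours

doublings = math.log2(work_month_hours / current_horizon_hours)
months_needed = doublings * DOUBLING_TIME_MONTHS
print(f"{doublings:.1f} doublings -> {months_needed:.0f} months "
      f"(~{months_needed / 12:.1f} years)")
# 7.3 doublings -> 51 months (~4.3 years)
```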

Combine this prediction (one month’s worth of work being done reliably from a simple prompt) with how capabilities in many domains are evolving (language, coding, math, and complex reasoning), and we should expect AI to be capable of automating several economically valuable parts of the economy.

Maybe the scenario predicted in AI 2027 is not so outlandish?

“We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” (AI 2027)