- OpenAI's central mission is to develop AI that emulates human intelligence.
- Its new o1 models inch it closer to that goal thanks to their new reasoning capabilities.
- Experts are impressed, but note that the AI models continue to face familiar issues.
Each time OpenAI shows off new research, it's worth considering how much closer it inches the company toward its mission statement: achieving artificial general intelligence.
The coming of AGI, a type of AI that can emulate the ingenuity, judgment, and reasoning of humans, has been an industry obsession since Alan Turing proposed his famous test in 1950. Three months after releasing ChatGPT, OpenAI reaffirmed its ambition to deliver AGI.
So, how does its latest release stack up?
On Thursday, after much anticipation, the San Francisco-based company led by Sam Altman finally unveiled OpenAI o1, a new series of AI models that are "designed to spend more time thinking before they respond."
OpenAI's big claims about the models suggest it is entering a new paradigm in the generative AI boom. Some experts agree. But do the new models put the industry on the cusp of AGI? Not yet.
AGI is a distance away
OpenAI has tried to strike a careful balance between managing expectations and generating hype in revealing its new models.
The o1 models lack "many of the features that make ChatGPT useful"; OpenAI has said the current GPT-4o model behind the chatbot remains better for "browsing the web for information," for example. Even so, the company claimed the o1 models represent a "significant advancement" for complex reasoning tasks.
The company is so confident in this claim that it said it was "resetting the counter back to one" with the release of these new models and naming them "o1" as a symbol of the new paradigm they represent. For now, the models are limited to a preview release.
In some ways, the o1 models do enter OpenAI into a new paradigm.
The company said the models emulate the capabilities of Ph.D. students on "challenging benchmark tasks in physics, chemistry, and biology." The models can also excel in tough competitions like the International Mathematical Olympiad and the Codeforces programming contest, OpenAI added.
There seem to be a few reasons for this boost in performance. OpenAI said it "trained these models to spend more time thinking through problems before they respond, much like a person would."
"Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," the company noted in a blog.
Noam Brown, a research scientist at OpenAI, offered a useful way to think about it. The models, he wrote on X, were trained to have a "private chain of thought" before responding, which essentially means they spend more time "thinking" before they speak.
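To make that concrete, here is a minimal sketch of how a "private chain of thought" can be approximated at the prompting level. The generate function and the ANSWER: marker are hypothetical stand-ins for illustration; OpenAI has not disclosed how o1 actually produces or hides its reasoning.

```python
# A minimal sketch of the "private chain of thought" idea: the model reasons
# step by step in a scratchpad, but only the final answer is shown to the user.
# `generate` is a hypothetical stand-in for any text-completion API call, and
# the ANSWER: marker is an illustrative convention, not OpenAI's implementation.

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError

def answer_with_private_reasoning(question: str) -> str:
    # Ask the model to think out loud, flagging the final answer clearly.
    prompt = (
        "Reason through the problem step by step, then write the final "
        "answer on a line starting with 'ANSWER:'.\n\n" + question
    )
    completion = generate(prompt)

    # Keep the chain of thought private: surface only the final answer line.
    for line in completion.splitlines():
        if line.startswith("ANSWER:"):
            return line.removeprefix("ANSWER:").strip()
    return completion  # fall back to the raw output if no marker is found
```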
Where previous AI models were bottlenecked by the data fed to them during the "pre-training" phase, Brown wrote, o1 models showed that "we can now scale inference." Inference is the stage at which a trained model is put to work, generating responses to inputs it hasn't seen before; Brown's point is that letting a model spend more computing time at this stage can keep improving its answers.
Jim Fan, a senior research scientist at Nvidia, pointed to the technical shift that makes this breakthrough possible. As Fan wrote, a huge amount of the computing power once reserved for the training portion of building an AI model has been "shifted to serving inference instead."
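One well-known, generic way to convert extra inference compute into better answers is self-consistency voting: sample several independent answers and keep the most common one. The sketch below reuses the hypothetical answer_with_private_reasoning function from above; it illustrates the general idea of scaling inference, not the specific technique behind o1, which OpenAI has not fully disclosed.

```python
# Self-consistency voting: a generic illustration of trading inference
# compute for accuracy. This is NOT OpenAI's o1 method, which is undisclosed.
from collections import Counter

def answer_by_voting(question: str, n_samples: int = 16) -> str:
    # Each extra sample spends more inference compute; majority answers
    # tend to become more reliable as n_samples grows.
    candidates = [answer_with_private_reasoning(question) for _ in range(n_samples)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer
```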
It's not clear this takes OpenAI much closer to AGI, however.
Following the release of the o1 models, OpenAI boss Altman responded to an X post from Will Depue, an OpenAI staffer who highlighted how far large language models have come in the past four years, by writing, "stochastic parrots can fly so high…"
It was a subtle reference from Altman to a research paper published in 2021, which positioned the kinds of AI models OpenAI works on as technologies that appear to understand the language they generate but do not. Is Altman suggesting the o1 models are stochastic parrots?
Meanwhile, others have pointed out that the new models appear to be stuck with some of the same issues as their predecessors, and uncertainty hovers over how o1 will perform more broadly.
Ethan Mollick, a professor of management at Wharton who spent some time experimenting with the o1 models before their unveiling on Thursday, noted that despite the clear jump in reasoning capabilities, "errors and hallucinations still happen."
Nvidia's Fan also noted that applying o1 to products is "much harder than nailing the academic benchmarks" OpenAI used to showcase the reasoning capabilities of its new models.
How OpenAI and the wider AI industry works toward solving these problems remains to be seen.
While the reasoning capabilities of the o1 models shift OpenAI into a new era of AI development, the company placed its technology at just stage two, "reasoners," on the five-stage scale of intelligence it outlined this summer.
If it's serious about reaching its end goal of AGI, it's got a lot more work to do.