OpenAI o1: Expectations vs Reality

OpenAI released its new o1 models on Thursday, also described as “an AI designed to overthink it.”

ChatGPT users had their first chance to try AI models that pause to “think” before they answer.

There was a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI.

“The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer with the AI startup ReWorkd.

OpenAI’s CEO, however, tried to temper expectations, tweeting that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”

PROs

OpenAI o1 is unique in its ability to think through big ideas: it pauses to reason, breaking big problems down into smaller steps before it answers.

CONs

1. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive.

2. OpenAI o1 is uniquely pricey. It is roughly four times more expensive to use than GPT-4o.

In most models, users pay for input tokens and output tokens. However, o1’s ability to break big problems down into smaller steps adds a hidden process, and a large amount of compute users never fully see. These extra steps are charged in the form of “reasoning tokens.”
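To make the billing effect concrete, here is a minimal Python sketch of how hidden reasoning tokens can multiply the cost of a single request. The prices and token counts are hypothetical placeholders, not OpenAI’s actual rates, and the sketch assumes reasoning tokens are billed at the output-token rate; the point is only that the hidden step is paid for like output the user never sees.

    # Minimal cost sketch. The per-token prices below are placeholders chosen
    # for illustration, not OpenAI's published rates; reasoning tokens are
    # assumed here to be billed at the output-token rate.
    INPUT_PRICE_PER_1M = 15.00    # hypothetical $ per 1M input tokens
    OUTPUT_PRICE_PER_1M = 60.00   # hypothetical $ per 1M output/reasoning tokens

    def estimate_cost(input_tokens, output_tokens, reasoning_tokens):
        """Estimate one request's cost, counting hidden reasoning tokens."""
        billed_output = output_tokens + reasoning_tokens
        return (input_tokens * INPUT_PRICE_PER_1M
                + billed_output * OUTPUT_PRICE_PER_1M) / 1_000_000

    # A short question can still trigger a long hidden chain of reasoning.
    print(estimate_cost(200, 400, 0))       # visible tokens only
    print(estimate_cost(200, 400, 4_000))   # same answer, plus reasoning tokens

In the second call, the hidden reasoning tokens account for most of the bill even though the visible answer is identical.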

3. OpenAI o1 struggles with simpler tasks.

GPT-4o and OpenAI o1 trial examples

In one trial example, helping a family plan Thanksgiving by figuring out whether two ovens would be sufficient to cook a dinner for 11 people, the model performed much better than GPT-4o.

For a simpler question, however, o1 does far too much; it doesn’t know when to stop overthinking.

For example, when asked where to find cedar trees in America, it delivered an 800+ word response outlining every variety of cedar tree in the country, including their scientific names.

GPT-4o did a much better job answering this question, explaining in about three sentences that cedar trees can be found all over the country.

Conclusion 

From their interactions with the o1 model, AI critics, observers, and users have drawn the following conclusions:

1. Compared to GPT-4o, the o1 models feel like one step forward and two steps back.

2. It is not impressive across the board.

Ravid Shwartz Ziv, an NYU professor who studies AI models, said:

“It’s impressive, but I think the improvement is not very significant. It’s better at certain problems, but you don’t have this across-the-board improvement.”

3. OpenAI o1’s reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short, but it is not quite the revolutionary step forward that GPT-4 represented for the industry.

4. For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones.
