OpenAI o1: Expectations vs Reality

OpenAI released its new o1 models on Thursday, also described as “an AI designed to overthink it.”

ChatGPT users had their first chance to try AI models that pause to “think” before they answer.

There was a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI.

“The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer with the AI startup ReWorkd.

OpenAI’s CEO, however, tried to temper expectations, tweeting that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”

PROs

OpenAI o1 is unique in its ability to think through big ideas: it pauses to reason, breaking big problems down into smaller steps before it answers.

CONs

1. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive.

2. OpenAI o1 is uniquely pricey. It is roughly four times more expensive to use than GPT-4o.

In most models, users pay for input tokens and output tokens. However, o1’s ability to break big problems down into smaller steps adds a hidden process, and a large amount of compute users never fully see. These extra steps are charged in the form of “reasoning tokens.”
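To make the billing effect concrete, here is a minimal Python sketch of how hidden reasoning tokens can multiply the cost of a single request. The prices and token counts are hypothetical placeholders, not OpenAI’s actual rates, and the sketch assumes reasoning tokens are billed at the output-token rate; the point is only that the hidden step is paid for like output the user never sees.

    # Minimal cost sketch. The per-token prices below are placeholders chosen
    # for illustration, not OpenAI's published rates; reasoning tokens are
    # assumed here to be billed at the output-token rate.
    INPUT_PRICE_PER_1M = 15.00    # hypothetical $ per 1M input tokens
    OUTPUT_PRICE_PER_1M = 60.00   # hypothetical $ per 1M output/reasoning tokens

    def estimate_cost(input_tokens, output_tokens, reasoning_tokens):
        """Estimate one request's cost, counting hidden reasoning tokens."""
        billed_output = output_tokens + reasoning_tokens
        return (input_tokens * INPUT_PRICE_PER_1M
                + billed_output * OUTPUT_PRICE_PER_1M) / 1_000_000

    # A short question can still trigger a long hidden chain of reasoning.
    print(estimate_cost(200, 400, 0))       # visible tokens only
    print(estimate_cost(200, 400, 4_000))   # same answer, plus reasoning tokens

In the second call, the hidden reasoning tokens account for most of the bill even though the visible answer is identical.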

3. OpenAI o1 struggles with simpler tasks.

GPT-4o and OpenAI o1 trial examples

In one trial example, helping a family plan Thanksgiving by figuring out whether two ovens would be sufficient to cook a dinner for 11 people, the model performed much better than GPT-4o.

For a simpler question, however, o1 does far too much; it doesn’t know when to stop overthinking.

For example, when asked where to find cedar trees in America, it delivered an 800+ word response outlining every variety of cedar tree in the country, including their scientific names.

GPT-4o did a much better job answering this question, explaining in about three sentences that cedar trees can be found all over the country.

Conclusion 

From their interactions with the o1 model, AI critics, observers, and users have drawn the following conclusions:

1. Compared to GPT-4o, the o1 models feel like one step forward and two steps back.

2. It is not impressive across the board.

Ravid Shwartz Ziv, an NYU professor who studies AI models, said:

“It’s impressive, but I think the improvement is not very significant. It’s better at certain problems, but you don’t have this across-the-board improvement.”

3. OpenAI o1’s reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short, but it is not quite the revolutionary step forward that GPT-4 represented for the industry.

4. For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones.
