Have you heard about this incredible new AI system called GPT-4o? It’s like having a personal assistant that can understand and communicate in almost any way imaginable! You can talk to it, show it pictures or videos, and it can respond with text, audio, or even images. It’s crazy fast too – it can respond to your voice in less than a third of a second, just like talking to another person.
And get this – unlike previous models, which were strongest at English text and code, GPT-4o is far better at understanding other languages, images, and audio. It’s like having a translator, personal shopper, and research assistant all rolled into one!
I know AI can be a bit intimidating, but this feels like a major step towards really natural interaction with computers. It’s almost like they’ve become fluent in human communication.
OpenAI has unveiled GPT-4o (the ‘o’ stands for ‘omni’), which promises enhanced quality, speed, and multilingual support across more than 50 languages. The update aims to provide a more inclusive experience, making the technology accessible to a wider global audience.
A notable addition is a desktop version compatible with Mac computers, initially available to paid subscribers. The team highlighted potential use cases, such as university lecturers offering these tools to students or podcasters leveraging AI to create content tailored to their audience’s needs. The real-time capabilities of GPT-4o are impressive, with a response time as low as 232 milliseconds and an average of 320 milliseconds, comparable to human conversation dynamics.
While the new features will be accessible to free users, OpenAI emphasized that paid ChatGPT Plus subscribers will enjoy exclusive benefits, including message limits up to five times higher than those for free users. The improvements also extend to the application programming interface (API), where GPT-4o is twice as fast and 50% cheaper than GPT-4 Turbo.
GPT-4o – solving the latency issue, access to the GPT store, and more…
ChatGPT’s previous Voice Mode suffered from noticeable latency because it chained together three separate steps: transcribing speech to text, processing the text, and converting the reply back to speech. GPT-4o addresses this by reasoning across voice, text, and vision within a single model. It can respond to audio inputs in an average of 320 milliseconds while offering the same level of intelligence as GPT-4. GPT-4o also supports vision inputs, allowing users to upload images and ask specific questions about them. And with support for 50 languages, OpenAI says 97% of the world’s internet population can now use ChatGPT comfortably. Finally, GPT-4o users will have access to the GPT Store, where over a million users have already built custom GPTs for a wide range of tasks.
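The vision support described above is also available to developers through the API. Here is a minimal sketch using OpenAI’s official Python SDK; the image URL is a placeholder, and the snippet assumes an OPENAI_API_KEY environment variable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o a question about an image by mixing text and
# image parts in a single user message.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder URL
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```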
GPT-4o API
OpenAI announced the GPT-4o API, a more efficient and cost-effective model compared to GPT-4 Turbo. GPT-4o offers up to 2x faster speeds than GPT-4 Turbo and is 50% cheaper, costing $5 per million tokens compared to $10 per million tokens for GPT-4 Turbo. It is available for commercial use through OpenAI’s API.
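To give a feel for what that looks like in practice, here is a minimal sketch of calling GPT-4o through the API with streaming enabled, again assuming the official openai Python package and an OPENAI_API_KEY environment variable:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stream tokens as they are generated, which makes the speed
# improvement over GPT-4 Turbo visible in practice.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."}
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Streaming isn’t required, but it makes the latency gains easy to see: tokens start arriving almost immediately rather than after the full response has been generated.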
GPT-4o Going Multimodal
Unveiled at the OpenAI event, GPT-4o showcased its groundbreaking ability to understand and respond to real-time conversational speech, mimicking human-like interactions. In a live demonstration, researchers engaged in a natural dialogue with the AI model, which not only comprehended vocal inputs but also responded contextually, even addressing emotions.
One researcher intentionally tried to throw GPT-4o off-balance by breathing heavily into the microphone, and the model humorously told him he was “not a vacuum cleaner” before coaching him through slower, calmer breaths. This real-time, interruptible conversation marked a significant advancement over the previous “Voice Mode,” eliminating the unnatural pauses common in AI interactions.
Moreover, GPT-4o’s multimodal capabilities extended to recognizing and responding to human emotions, a remarkable leap in Emotion AI technology. This seamless integration of speech recognition, natural language processing, and emotional intelligence showcased OpenAI’s commitment to pushing the boundaries of human-AI interaction.
Generating stories, even lullabies, using just voice input
In a remarkable demonstration, ChatGPT seamlessly crafted a bedtime story about robots and love for an OpenAI researcher struggling with sleep. When prompted to enhance the narration with dramatic flair, the AI model’s voice output evolved naturally, mimicking human vocal inflections. Furthermore, upon request, it effortlessly adopted a robotic voice, exhibiting its versatility in storytelling through vocal modulation.
Interacting with ChatGPT using video
You can now interact with ChatGPT using video. Barret demonstrated this feature during the launch by writing a linear equation on paper and showing it to ChatGPT through the phone’s camera. He asked ChatGPT not to give the solution directly, but to help him solve it step by step.
The chatbot gave a complete walkthrough of how to solve this equation, answering multiple questions from Barret at various stages of solving the problem.
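To give a flavor of the kind of walkthrough ChatGPT provided, here is a step-by-step solution of a simple linear equation of the same sort (the equation below is an illustration, not a transcript of the demo):

```latex
\begin{align*}
3x + 1 &= 4 \\
3x &= 4 - 1 = 3 && \text{subtract 1 from both sides} \\
x &= \frac{3}{3} = 1 && \text{divide both sides by 3}
\end{align*}
```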
But this was just a teaser of things to come. Solving linear equations, while impressive, is a simple ask compared to, say, solving a coding problem.
Solving a coding problem
Barret opened his computer to a complex piece of code he was working on and needed help with. He copied it with “Command+C” and gave ChatGPT a simple voice prompt: “Give me a brief one-sentence description of what is going on in the code.”
Not only was GPT-4o able to describe the code, but it could also explain what would happen if a particular function were added to or removed from it.
It would take even experienced programmers at least a few minutes to produce such a response, but GPT-4o explained the code as if it had written it itself.
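You can approximate this workflow through the API as well: paste the code into the prompt and ask for a one-sentence summary. A minimal sketch, assuming the official openai Python package and a local file named example.py (a hypothetical name):

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Read the code to be explained from a local file (hypothetical name).
code = Path("example.py").read_text()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Give me a brief one-sentence description of "
                       f"what is going on in this code:\n\n{code}",
        }
    ],
)
print(response.choices[0].message.content)
```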
Real-time Language Translation
Google Translate now faces serious competition, as the OpenAI team demonstrated GPT-4o’s translation capabilities. Mira Murati spoke to ChatGPT in Italian, and the chatbot effortlessly translated her sentences from Italian to English and vice versa.
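You can set up a similar two-way translation loop yourself with a system prompt; here is a minimal sketch (the prompt wording below is mine, not OpenAI’s):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Mirror the live demo's setup with a system prompt:
# Italian input comes back in English, and vice versa.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a translator. When you receive Italian, reply with "
                       "the English translation; when you receive English, reply "
                       "with the Italian translation.",
        },
        {"role": "user", "content": "Ciao, come stai?"},
    ],
)
print(response.choices[0].message.content)  # e.g. "Hi, how are you?"
```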
In conclusion, the launch of GPT-4o by OpenAI represents a significant milestone in the evolution of AI technology. With its unprecedented speed, multilingual capabilities, and versatile features like real-time conversation and multimodal interactions, GPT-4o stands at the forefront of innovation. As users explore the diverse applications of this powerful tool, from language translation to storytelling and problem-solving, it becomes evident that GPT-4o has the potential to revolutionize the way we interact with and harness the power of artificial intelligence in our daily lives.