ChatGPT, the viral chatbot from AI research company OpenAI, has taken the next leap towards more natural language interactions. OpenAI has announced that ChatGPT now has the ability to see images, hear spoken questions, and respond with synthesized speech. This ushers in a new phase of conversational AI that promises to make interactions feel more human than ever before.
But what does this actually mean for the average ChatGPT user? In short, conversations become easier and more intuitive, and that opens up new ways to use the chatbot.
Let’s start with the new speech capabilities. ChatGPT can now listen and respond to your voice requests fluently. This allows you to have free-flowing, back-and-forth conversations on the go without typing. Ask ChatGPT to tell you a bedtime story in different voices while getting the kids ready for bed. Settle debates during family dinner by asking for its take. Or get recipe inspiration by listing what’s in your fridge and having ChatGPT walk you through options. The AI generates human-like voices, making it feel like you’re chatting with a real person.
The image abilities take things a step further. Snap a photo of a graph at work and have ChatGPT analyze the data for you. Capture the contents of your pantry to get personalized menu ideas. Having trouble with your bike? Send ChatGPT a picture of the flat tire and it can walk you through how to fix it. The possibilities are endless.
To make image conversations even more intuitive, you can use the drawing tool in the mobile app to circle or point out specific parts of a photo. This allows you to focus ChatGPT on the relevant details. For example, you could circle a particular line on a graph or point to the part of your bike that’s broken.
Under the hood, these new capabilities are powered by major upgrades to ChatGPT’s AI. The image understanding comes from multimodal versions of GPT-3.5 and GPT-4, which can apply language reasoning to photos, documents, screenshots and more. The voice feature relies on two separate pieces: Whisper, OpenAI’s speech recognition system, turns your spoken words into text, and a new text-to-speech model turns ChatGPT’s written reply into audio.
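For readers who like to see the moving parts, here is a rough sketch of how a single voice exchange could be wired together: speech recognition first, then the language model, then text-to-speech. OpenAI has not published how ChatGPT connects these stages, so the chaining below is an assumption, and every name in it (run_voice_turn, VoiceTurn, and the fake_* stand-ins) is hypothetical rather than part of any OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical placeholders for illustration;
# none of them are OpenAI APIs.
Transcriber = Callable[[bytes], str]   # speech recognition (the role Whisper plays)
Responder = Callable[[str], str]       # language model that writes the reply
Synthesizer = Callable[[str], bytes]   # text-to-speech model


@dataclass
class VoiceTurn:
    heard: str          # what the recognizer transcribed
    reply_text: str     # what the language model answered
    reply_audio: bytes  # the spoken version of that answer


def run_voice_turn(audio: bytes, transcribe: Transcriber,
                   respond: Responder, synthesize: Synthesizer) -> VoiceTurn:
    """Chain the three stages: audio in -> text -> reply text -> audio out."""
    heard = transcribe(audio)
    reply_text = respond(heard)
    reply_audio = synthesize(reply_text)
    return VoiceTurn(heard, reply_text, reply_audio)


# Toy stand-ins so the sketch runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes are the spoken words

def fake_respond(prompt: str) -> str:
    return f"You asked: {prompt}"

def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are audio


turn = run_voice_turn(b"what can I cook with eggs?",
                      fake_transcribe, fake_respond, fake_synthesize)
print(turn.reply_text)
```

Passing the three stages in as functions keeps the sketch honest about what it does not know: a real recognizer, language model, or speech synthesizer could be swapped in for the stand-ins without changing the flow.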
While these features are exciting, OpenAI is taking care to roll them out responsibly. The company says it is actively addressing concerns about misuse of synthesized voices and the limitations of image understanding. User feedback will help refine the models’ safety and accuracy over time.
ChatGPT is entering a new phase of intuitive, multimodal conversation. With voice and images, it can help with daily tasks in new ways while inching closer to the holy grail of conversational AI: true natural language understanding by machines. This glimpse of the future is here for early testers now, with wider releases expected soon.