OpenAI Enhances ChatGPT with Voice Conversations and Image Recognition

Users can now hold voice conversations with ChatGPT and ask it questions about images; both features are available to Plus and Enterprise users first

The image-based features are equally intriguing. Users can show the chatbot images for various purposes, such as diagnosing a malfunctioning grill, planning meals based on the contents of a fridge, or solving a math problem from a picture. OpenAI is using GPT-3.5 and GPT-4 to power the image recognition capabilities. To use ChatGPT’s image-based functions, users tap the photo button (on iOS and Android, they must tap the plus button first) to capture a new image or select an existing one. Users can discuss multiple images, and a drawing tool lets them direct the chatbot's attention to specific parts of an image.

In its announcement, OpenAI acknowledged the potential for misuse: bad actors could mimic voices to commit fraud. As a result, OpenAI is initially focusing on voice conversations with ChatGPT and collaborating with select partners on limited use cases.

Regarding images, OpenAI has worked with Be My Eyes, an app that assists blind and low-vision people through volunteer-assisted video calls. ChatGPT can hold general conversations about images, including those with people in the background. To respect privacy, however, OpenAI limits ChatGPT’s ability to analyze people in images and make direct statements about them. The company has also published a paper on the safety properties of its image-based functionality, which it refers to as GPT-4 with vision.