ChatGPT can now access live information on the internet, answer queries based on image content, and interface with users via speech, after a busy week of updates from OpenAI.
Those on a ChatGPT Plus or Business subscription are now able to draw on live data from the internet, with a link to sources, through an integration with Bing search.
This will enable the chatbot to produce more relevant and informed responses and put an end to ChatGPT being limited to training data from 2021.
In addition to its expanded search capabilities, the chatbot is also being given wider access to the multimodal capabilities of GPT-4, the large language model (LLM) that powers its paid tier, to accept image inputs.
The chatbot can also now process user speech using OpenAI’s open source speech recognition model Whisper.
Research is a clear use case for the new search features, with workers now able to draw together facts from across the internet via the central dashboard of ChatGPT.
For example, users could use ChatGPT to summarize information on a rival business’ financial performance based on their publicly available earnings reports, or to aggregate product reviews for a trending product from across a variety of websites.
“Browsing is particularly useful for tasks that require up-to-date information, such as helping you with technical research, trying to choose a bike, or planning a vacation,” OpenAI stated on X (formerly Twitter).
With its new capabilities, ChatGPT can compete more directly with the likes of Bing Chat or Google Bard. Direct comparison with Bing’s AI search option could be of particular interest, as both make heavy use of GPT-4 for generative AI search.
GPT-4 was billed as a multimodal model from its announcement, with the capability to process both text and image inputs. In tests, the model provided real-time guidance to a sight-impaired person via the app Be My Eyes, in place of a human volunteer.
OpenAI stated that its development of the new ChatGPT image input has been shaped by its collaboration with Be My Eyes, including feedback on how to make the service most useful for sight-impaired users.
IT users can now make use of these capabilities within ChatGPT to turn flowcharts into working code or troubleshoot flaws in a circuit diagram. They could also pass in screenshots of a PDF document to produce a summary of its content, or have the service expand on a handwritten note.
Smaller businesses in particular may benefit from image processing capabilities being embedded directly within ChatGPT, rather than having to call GPT-4 via the OpenAI API and develop their own app with which it can interface.
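For teams that do build against the API directly, the request shape is simple: an image is base64-encoded and sent alongside a text prompt within a single chat message. The sketch below only constructs that payload, assuming the image-input message format OpenAI documents for its Chat Completions endpoint; the model name and prompt are illustrative, not prescribed by the article.

```python
import base64


def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a Chat Completions payload pairing a text prompt with a
    base64-encoded image, following OpenAI's documented image-input
    message format. Model name here is an assumption for illustration."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "gpt-4-vision-preview",  # assumed model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Inline the image as a data URL rather than hosting it
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }


# Hypothetical usage: a flowchart screenshot read from disk would replace
# the placeholder bytes here.
payload = build_vision_request(b"\x89PNG-placeholder", "Turn this flowchart into working code.")
```

The resulting dictionary would then be POSTed to the Chat Completions endpoint with an API key; only the payload construction is shown, which is the part a business embedding ChatGPT-style image processing would otherwise have to write itself.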
ChatGPT’s new ability to process voice inputs is another potential boon for accessibility, as it allows users who are unable to use a keyboard to interact with the chatbot.
Accessibility tech is still expensive, from screen readers to accurate transcription models, and ChatGPT’s capabilities could soon provide a more affordable alternative to these dedicated solutions.
Businesses could also soon use OpenAI’s new voice model to automatically translate customer input or their own content. The new voice input features for ChatGPT allow a user to translate speech in one language directly into text in another through the service.
Taking the model a step further, firms could also take audio content such as a podcast or video voice-over and use a combination of Whisper and OpenAI’s new voice model to produce a translated copy of the content in any of the supported languages, in the voice of the original speaker.
Spotify has already trialed this in collaboration with OpenAI for its Voice Translation feature for podcasts, which has seen a select few podcasts on its platform translated to French, German, and Spanish while retaining the voiceprint of the original hosts.
Introducing up-to-date internet access to ChatGPT allows the system to inform answers using a wider range of information, but will not necessarily improve its accuracy.
A recent study found that ChatGPT gave incorrect answers to more than half of the programming questions posed to it, and that users placed undue confidence in poor answers simply because they were produced by ChatGPT.
In the day since the internet search feature was added to ChatGPT, some on the ChatGPT subreddit have expressed their disappointment.
“This model seems to be trained to summarise web pages,” wrote one Redditor.
“It's a challenge to get it to give the full content but you can.”
Another noted that the Bing search instructions seemingly overrode their custom instructions to provide detailed answers, and claimed the new feature is “virtually useless unless you really just want a short answer”.
OpenAI has promised that in the coming weeks, the new features will be added for all users, but did not elaborate on whether this means GPT-4 will be made available for free tiers.
What are the risks of the new features?
There are clear risks posed by AI search engines, which are not free from the hallucinations that have become a typical feature of LLMs, and users will still benefit from double-checking ChatGPT responses using the links it provides.
Similarly, businesses will have to assess the risks posed by relying on AI image processing. Misidentification of an image could lead to a user being given incorrect advice on anything from a programming flowchart to how to fix a piece of hardware, and users will ultimately have to account for the degree of uncertainty that comes with all AI systems.
Image inputs can also provoke the same kinds of hallucinations as text inputs, and OpenAI stated that it has curbed ChatGPT’s ability to describe people in images due to privacy concerns and the potential for inaccurate statements to be made.
This could work to shield ChatGPT from accusations that it is performing retrospective or real-time facial recognition, both of which are subject to thorny legislative debate in the EU, UK, and US.
The privacy implications of image analysis, either in real-time or after the fact via the upload of images into services such as ChatGPT, will form one pillar of the multiple legal hurdles AI faces in the coming years.
Reputational damage is something all enterprises seeking to implement AI systems must consider.
As reported by The Guardian, users in New Zealand recently discovered that supermarket chain Pak ‘n’ Save’s AI meal planner app, intended to produce recipes based on descriptions of ingredients a user has in their cupboards and fridge, would generate recipes containing bleach or ant poison if told to use non-edible items.
Pak ‘n’ Save has stated it is working to fine-tune this potentially dangerous quirk out of its model, and OpenAI will need to put similar mitigations in place to prevent users from producing unwanted or risky outputs.
Similarly, OpenAI will be faced with questions over whether ChatGPT image and voice input is used to train its image-generating model DALL-E, or its new text-to-speech model.
OpenAI has previously stated that it does not use ChatGPT Enterprise data to train its own models, but this has not prevented firms such as Apple from banning their employees from using ChatGPT over fears that sensitive data could leak through the service.
Businesses will seek affirmation that images their employees send through the service, including photos of manufacturing floors or proprietary diagrams, are secure once passed to the black-box AI system.
Microsoft’s generative AI voice model VALL-E stoked fears of potential fraudulent activity and audio deepfakes with its ability to reproduce a speaker’s voice using just a few seconds of recorded audio. OpenAI stated that its new voice technology can produce similar results and that it has subsequently limited the technology to trusted voice actors for the time being.
Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at email@example.com or on LinkedIn.