OpenAI released Whisper, an artificial intelligence (AI) tool that transcribes speech to text, in 2022. However, a report claimed that the tool is prone to hallucinations and inserts fabricated text into its transcriptions. This is concerning because the tool is reportedly used in several high-risk fields such as medicine and accessibility. A particular concern stems from its use in doctor-patient consultations, where a hallucinated transcript could introduce harmful information and put a patient’s life at risk.
OpenAI Whisper Reportedly Prone to Hallucinations
The Associated Press reported that OpenAI’s automatic speech recognition (ASR) system Whisper has a high propensity to generate hallucinated text. Citing interviews with multiple software engineers, developers, and academic researchers, the publication claimed that the fabricated text has included racial commentary, violent language, and imagined medical treatments and medications.
Hallucination, in AI parlance, refers to an AI system generating output that is incorrect or misleading. In Whisper’s case, the model is said to invent text that was never spoken by anyone.
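For context, the open-source Whisper model can be run locally in a few lines of Python. The sketch below is illustrative only, assuming the openly available whisper package is installed and using a placeholder audio file and the small "base" model; it shows that the transcript comes back as plain text, with nothing in the output marking which portions, if any, were invented rather than actually spoken.

```python
# Minimal sketch of transcribing audio with the open-source Whisper package.
# Assumes: pip install openai-whisper, and that "consultation.wav" is a
# placeholder audio file supplied by the user.
import whisper

model = whisper.load_model("base")             # small checkpoint, for illustration
result = model.transcribe("consultation.wav")  # run speech-to-text on the file

# The transcript is returned as plain text; any hallucinated words would
# appear inline, indistinguishable from genuinely spoken ones.
print(result["text"])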
In an example verified by the publication, the speaker’s sentence, “He, the boy, was going to, I’m not sure exactly, take the umbrella,” was transcribed as “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.” In another instance, Whisper reportedly added racial descriptions that were never mentioned in the audio.
While hallucination is not a new problem in the AI space, the stakes here are higher because the open-source technology underpins several tools deployed in high-risk industries. Paris-based Nabla, for instance, has built a Whisper-based tool that is reportedly used by more than 30,000 clinicians and 40 health systems.
Nabla’s tool has been used to transcribe more than seven million medical visits. To maintain data security, the company also deletes the original recordings from its servers. This means that if hallucinated text was generated in any of those seven million transcriptions, it is impossible to verify and correct it.
Another area where the technology is used is accessibility tools for the deaf and hard-of-hearing community, where verifying the accuracy of a transcript is again significantly difficult. Most of the hallucinations are said to be triggered by background noise, abrupt pauses, and other environmental sounds.
The extent of the issue is also concerning. Citing a researcher, the publication claimed that eight out of every 10 audio transcriptions were found to contain hallucinated text. A developer told the publication that hallucination occurred in “every one of the 26,000 transcripts he created with Whisper.”
Notably, at Whisper’s launch, OpenAI said the model offers human-level robustness to accents, background noise, and technical language. A company spokesperson told the publication that the AI firm continuously studies ways to reduce hallucinations and has promised to incorporate this feedback in future model updates.