Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human level robustness and accuracy.”

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.

More concerning, they said, is a rush by medical centers to utilize Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’ s warnings that the tool should not be used in “high-risk domains.”

  • wdx@feddit.org
    link
    fedilink
    English
    arrow-up
    5
    arrow-down
    2
    ·
    20 days ago

    This definition of “better” feels like claiming that a Beeper that’s constantly hooked to power is the perfect alarm because it warns you every time someone is trying to break in - while entirely ignoring that it is just constantly blaring.

    • TheBlackLounge@lemm.ee
      link
      fedilink
      English
      arrow-up
      2
      ·
      20 days ago

      I use it for generating subtitles. It figures out context, it ignores stuttering, it does punctuation etc. It’s really is just better. With clean audio it transcribes like a human does.

      It does better than other techniques with dirty audio, but when it fails it fails weird, which is the big issue here.