This artificial intelligence already translates and transcribes any audio file

Whisper is a new artificial intelligence model from OpenAI that aims to revolutionize speech-to-text and translation technology. According to Ars Technica, it can transcribe and translate interviews, podcasts, conversations and more. More importantly, its accuracy is close to human level.

According to OpenAI, the model was trained on 680,000 hours of audio. Beyond simply listening, Whisper also had to match the spoken words to written text.

Thanks to its neural network, the model can use the context in the input data to learn associations between sound and text, and those associations then shape its output.

How Whisper works, the AI capable of translating and transcribing any audio input


According to OpenAI's official announcement, the input audio is divided into 30-second chunks. Each chunk is then converted into a spectrogram… and passed to the encoder.
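The chunking step can be illustrated with a minimal sketch. It only shows the fixed 30-second windowing described above; the spectrogram conversion is left out. The 16 kHz sample rate, the zero-padding of the last chunk and the function name `split_into_chunks` are assumptions made for illustration and do not come from the article.

```python
SAMPLE_RATE = 16_000   # assumed samples per second (not stated in the article)
CHUNK_SECONDS = 30     # chunk length given in the article


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a sequence of audio samples into fixed-length chunks.

    The last chunk is zero-padded (an assumption) so all chunks share one length.
    """
    chunk_size = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        # Pad the final chunk with silence up to the full chunk length.
        chunk.extend([0.0] * (chunk_size - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of silent audio is split into three 30-second chunks.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[0]))
```

Each chunk produced this way would be the unit that is turned into a spectrogram before it reaches the encoder.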

But that is not all. A decoder is then trained to predict the corresponding text. How is that possible? The text is interleaved with special tokens that tell the single model which task to perform, such as language identification. Other tasks are handled the same way, including phrase-level timestamps, multilingual speech transcription and translation into English.
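The idea of steering one model with special tokens can be sketched as follows. This is a hedged illustration, not OpenAI's implementation: the function name `build_task_prompt` is hypothetical, and the token spellings are assumptions chosen for readability rather than details taken from the article.

```python
def build_task_prompt(language, task="transcribe", timestamps=False):
    """Build the token prefix that tells a multitask speech model what to do.

    All token strings below are illustrative assumptions.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    tokens = [
        "<|startoftranscript|>",  # marks the beginning of the output
        f"<|{language}|>",        # language of the speech
        f"<|{task}|>",            # transcribe as-is, or translate into English
    ]
    if not timestamps:
        # Ask the model to skip timestamp prediction.
        tokens.append("<|notimestamps|>")
    return tokens


# Same model, two different tasks, selected only by the prefix tokens.
print(build_task_prompt("es", task="transcribe"))
print(build_task_prompt("es", task="translate", timestamps=True))
```

The point of the design is that no separate model is needed per task: changing a few prefix tokens is enough to switch between transcription and translation.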

Best of all, Whisper’s work doesn’t end there. OpenAI has decided to publish its code so that it can serve as the basis for future speech processing applications and accessibility tools. As a result, other developers will be able to build on and improve the technology.

The results are impressive

The technology behind this artificial intelligence is as impressive as the results. To test its power, Ars Technica used a podcast episode that included a segment in which the sound was transmitted over a telephone, so the audio quality left a lot to be desired.

Despite this, Whisper did a good job of transcribing the audio while running in Python. However, the technology doesn’t work in real time and, according to Ars Technica, the transcription took a while to finish on a mid-range Intel processor. In the end, the outlet reported that the result was much better than the AI-powered transcription services it had tried in the past.

But beware, there is fine print that comes with the Whisper code. According to its creators, it is a tool that could also be used for evil: for example, to identify the speakers in a conversation, or even to automate surveillance. However, OpenAI hopes that it will be put to good use and allow developers to create much more sophisticated transcription and translation tools.
