Baidu Develops New AI That Can Imitate Your Voice by Listening for One Minute

With technology advancing so rapidly, it is hard to ignore the kinds of mimicry it now makes possible. Researchers previously created a deep learning-based AI that can overlay one person’s face onto another individual’s body. Now the same is happening with voices. Researchers at Baidu, the Chinese Internet search giant, have developed an artificial intelligence (AI) that they claim can learn to mimic your voice accurately after listening to it for less than a minute.

Leo Zou, a member of Baidu’s communications team, discussed the new AI in an interview with Digital Trends. He said the technology marks a breakthrough: it shows that speech synthesis, a difficult generative modeling problem, can be adapted to new cases by learning efficiently from only a few examples. Previously, such a model needed many examples to learn a voice; now it needs only a fraction of that. The result is a clear testament to Baidu’s success in AI research and, more precisely, in speech synthesis technology.

Despite this success, Baidu is not the first to develop a voice-mimicking AI. Last year we looked at a project dubbed Lyrebird, which relied on neural networks to mimic voices from a relatively limited number of samples. That project succeeded in replicating the voices of former U.S. president Barack Obama and current president Donald Trump.

Like Lyrebird’s, Baidu’s voice-replicating technology is not completely convincing. However, it is a remarkable improvement over many of the robotic-sounding AI-powered voice assistants developed in the past.

Baidu’s new AI is built on the company’s text-to-speech system, Deep Voice. The system was trained on more than 800 hours of audio from 2,400 speakers. To sound its best, it needs only 100 five-second clips of a new voice. Nevertheless, a version of Deep Voice trained on just ten five-second samples managed to trick a voice-recognition system more than 95% of the time.
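To put those figures in perspective, the short Python sketch below works out how much audio each setup involves. It uses only the numbers quoted above; the function and variable names are made up for illustration, and the per-speaker average assumes the 800 hours were spread evenly across the 2,400 speakers, which the article does not state.

```python
def total_seconds(num_clips: int, clip_seconds: float) -> float:
    """Total audio length for a set of equally long clips."""
    return num_clips * clip_seconds


# Figures quoted in the article.
best_quality_s = total_seconds(100, 5)  # 100 five-second clips
minimal_s = total_seconds(10, 5)        # ten five-second clips

# Assumption: the 800 training hours are split evenly across 2,400 speakers.
avg_minutes_per_speaker = 800 * 60 / 2400

print(f"Best quality: {best_quality_s:.0f} s ({best_quality_s / 60:.1f} min)")
print(f"Minimal setup: {minimal_s:.0f} s")
print(f"Training audio per speaker: about {avg_minutes_per_speaker:.0f} min")
```

The minimal setup comes to 50 seconds of audio, which is where the "less than a minute" claim comes from, while the best-sounding setup uses 500 seconds, a little over eight minutes.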

According to Zou, Baidu sees powerful applications for its voice-replicating technology. One significant example is helping patients who have lost their voices. Zou offered this example while describing the technology as a significant step toward personalized human-machine interfaces. He also said it could make it easy for a mother to configure an audiobook reader to use her own voice.

Baidu projects that the technology will also enable the production of original digital content. For instance, many video game characters could be given distinctive voices. Zou added that the technology could come in handy in speech-to-speech translation, since the synthesizer can learn to reproduce the speaker’s identity in a different language. Readers who want to delve deeper into the topic can read the paper that describes this work.

Source: Digital Trends
