Well, so much for the “Her” jokes. OpenAI, the company behind the latest multimodal language model, GPT-4o, has announced that it is pausing the use of its new voice, called “Sky.” The decision comes amid a wave of suggestive comments, memes, and comparisons to Scarlett Johansson's AI character in the film “Her,” which prompted the company to deny any ties to the Hollywood actress.
"We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them," OpenAI stated.
We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.
Read more about how we chose these voices: https://t.co/R8wwZjU36L
— OpenAI (@OpenAI) May 20, 2024
OpenAI's Sky voice was unveiled last week as part of the company's efforts to create more human-like and natural conversations with its AI models. The voice was designed to understand and respond to emotional cues, showcasing the advanced capabilities of GPT-4o.
However, the demonstration quickly took a different direction, as users began to test the model's boundaries and flirt with the voice. A flurry of tweets described Sky as “flirty,” “sexy,” and “provocative,” with some users joking about having a new girlfriend or feeling seduced by the AI voice.
<literally any request at all>
gpt4o: https://t.co/2hOwT37uPr pic.twitter.com/GvrNeSzWUr
— bayes (@bayeslord) May 13, 2024
just ended calling with gpt-4o at 5am pic.twitter.com/IdywljzEYq
— echo4eva (@echo4eva) May 13, 2024
“oh yeah, my girlfriend is a model” pic.twitter.com/1Ts6MIOypg
— vittorio (@IterIntellectus) May 14, 2024
Parallels were drawn to the 2013 Spike Jonze film "Her," in which Scarlett Johansson voices an AI assistant that a lonely writer becomes attached to. The situation escalated with comedy sketches referencing Sky's sultry vocals and apparent resemblance to the actress, a similarity even noted by OpenAI co-founder Andrej Karpathy.
I’m guessing this is the joke that is getting Sky booted from @OpenAI’s voices.
(For context: on the season finale of SNL, hosts Michael Che and Colin Jost write jokes for each other that neither has seen before reading them on-air.) pic.twitter.com/3U7fd5M89y
— Theoretically Media (@TheoMediaAI) May 20, 2024
And here’s where things started to get tricky. Johansson has actively opposed the use of her likeness to train AI models, while OpenAI is known for trying to keep its models as inoffensive as possible to avoid controversy, to the point of being criticized for “dumbing down” its models or allegedly fine-tuning them toward a more left-leaning stance. Within days, CEO Sam Altman tweeted about potential changes to Sky, assuring people that the new voice mode was not yet publicly available.
also for clarity: the new voice mode hasn't shipped yet (though the text mode of GPT-4o has). what you can currently use in the app is the old version.
the new one is very much worth the wait!
— Sam Altman (@sama) May 15, 2024
On Sunday, the company apparently put the first nail in Sky’s coffin, announcing that it was pausing the voice's use. The announcement was accompanied by an official blog post explaining how OpenAI worked with voice actors to select and train its voice models.
In the article, the company denied using Johansson's voice as a template for Sky, clarifying that it belonged to an unnamed professional actress who was using her own voice.
"We worked with industry-leading casting and directing professionals to narrow down over 400 submissions before selecting the five voices ... Each of the voices—Breeze, Cove, Ember, Juniper and Sky—are sampled from voice actors we partnered with to create them," OpenAI said.
“We believe that AI voices should not deliberately mimic a celebrity's distinctive voice—Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” the post notes.
OpenAI said it would not reveal the actors’ identities for privacy reasons. The company also said that it plans to introduce additional voices in the future to better match the diverse interests and preferences of users.
Probably none of those will flirt with you, though, given the fallout over Sky.
Edited by Andrew Hayward