Google just announced Gemini, its most powerful suite of AI models yet, and the company has already been accused of lying about its performance.
An op-ed from Bloomberg claims Google misrepresented the power of Gemini in a recent video. Google aired an impressive “what the quack” hands-on video during its announcement earlier this week, and columnist Parmy Olson says it seemed remarkably capable in the video — perhaps too capable.
The six-minute video shows off Gemini’s multimodal capabilities (spoken conversational prompts combined with image recognition, for example). Gemini seemingly recognizes images quickly — even connect-the-dots pictures — responds within seconds, and tracks a wad of paper in a cup and ball game in real time. Sure, humans can do all of that, but this is an AI able to recognize and predict what will happen next.
But click the video description on YouTube, and Google has an important disclaimer:
“For the purposes of this demo, latency has been reduced, and Gemini outputs have been shortened for brevity.”
That’s what Olson takes umbrage at. According to her Bloomberg piece, Google admitted when asked for comment that the video demo didn’t happen in real time with spoken prompts but instead used still image frames from raw footage and written text prompts to which Gemini responded. “That’s quite different from what Google seemed to be suggesting: that a person could have a smooth voice conversation with Gemini as it watched and responded in real-time to the world around it,” Olson writes.
To be fair to Google, companies edit demo videos often, especially as many want to avoid any technical hiccups that live demos bring. It’s common to tweak things a little. But Google has a history of questionable video demos. People wondered if Google’s Duplex demo (remember Duplex, the AI voice assistant that called hair salons and restaurants to book reservations?) was real because there was a distinct lack of ambient noise and too-helpful employees. And prerecorded videos of AI models tend to make people even more suspicious. Remember when Baidu launched its Ernie Bot with edited videos and its shares tanked?
In a situation like this, Olson says, Google is “showboating” to distract people from the fact that Gemini still lags behind OpenAI’s GPT.
Google disagrees. When asked about the validity of the demo, it pointed The Verge to a post from Oriol Vinyals, vice president of research and deep learning lead at Google’s DeepMind (also the co-lead for Gemini), which explains how the team made the video.
“All the user prompts and outputs in the video are real, shortened for brevity,” Vinyals says. “The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”
He added that the team gave Gemini images and texts and asked it to respond by predicting what comes next.
That’s certainly one way to approach this situation, but it might not be the right one for Google — which has already appeared, at least in the public eye, to have been caught flat-footed by OpenAI’s enormous success this year. If it wants to inspire developers, the way to do it isn’t carefully edited sizzle reels that arguably misrepresent the AI’s capabilities. It’s letting journalists and developers actually experience the product. Let people do stupid stuff with Gemini in a small public beta. Show us how powerful it really is.