The tech industry’s generative AI race just got more competitive as Google launched Whisk, a tool designed to create unique images from user-uploaded photos.
Unveiled through Google Labs, Whisk allows users in the US to remix subjects, styles, and settings into new visuals without requiring text prompts.
It builds on two Google DeepMind models: Gemini and Imagen 3.
The move highlights Google’s focus on delivering accessible AI tools while competing against OpenAI’s suite of consumer products, including the text-to-video generator Sora.
What is Whisk and how does it work?
Whisk offers a new take on AI-powered creativity.
Users can upload images representing subjects, settings, or styles.
The platform passes these inputs to Gemini, Google’s AI foundation model launched in December 2023, and Gemini generates a caption for each image.
These captions are then fed into DeepMind’s Imagen 3, a text-to-image generator, which produces the new image.
Unlike traditional photo editors, Whisk focuses on creative exploration rather than pixel-perfect results.
It lets users remix their inputs into new formats, such as a plushie toy, enamel pin, or sticker, by swapping images or adding text to guide specific details.
Google emphasises that the outputs capture the “essence” of a subject, meaning some variations, such as changes to hairstyle or skin tone, may occur.
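For readers who want to picture the flow described above, here is a minimal Python sketch of a caption-then-generate pipeline. It is an illustration only, not Google’s implementation: the names `WhiskInputs`, `caption_image`, `build_prompt`, `generate_image` and `remix` are hypothetical, and the two model steps are stand-in functions rather than real Gemini or Imagen 3 calls.

```python
from dataclasses import dataclass


@dataclass
class WhiskInputs:
    """Hypothetical container for the three image roles plus optional text."""
    subject: str          # path to the subject image
    setting: str          # path to the setting image
    style: str            # path to the style image
    extra_text: str = ""  # optional text to guide specific details


def caption_image(image_path: str) -> str:
    """Stand-in for the captioning step (Gemini in Whisk). Assumption: returns text."""
    return f"a description of {image_path}"


def build_prompt(inputs: WhiskInputs, captioner=caption_image) -> str:
    """Combine one caption per role into a single text prompt."""
    parts = [
        f"Subject: {captioner(inputs.subject)}",
        f"Setting: {captioner(inputs.setting)}",
        f"Style: {captioner(inputs.style)}",
    ]
    if inputs.extra_text:
        parts.append(f"Details: {inputs.extra_text}")
    return ". ".join(parts)


def generate_image(prompt: str) -> dict:
    """Stand-in for the text-to-image step (Imagen 3 in Whisk). No pixels are produced."""
    return {"prompt": prompt, "image": None}


def remix(inputs: WhiskInputs) -> dict:
    """Run the full sketch: caption each input, then generate from the combined prompt."""
    return generate_image(build_prompt(inputs))


if __name__ == "__main__":
    result = remix(WhiskInputs("cat.jpg", "beach.jpg", "sticker.jpg", "wearing a hat"))
    print(result["prompt"])
```

Because only the captions reach the image generator in this sketch, the output depends on what the text preserves rather than on the original pixels, which is consistent with Google’s note that results capture the “essence” of a subject.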
DeepMind’s Nobel Prize-winning expertise underpins Whisk
Whisk leverages cutting-edge developments from DeepMind, the AI division Google acquired in 2014.
Two DeepMind researchers, Demis Hassabis and John Jumper, shared the 2024 Nobel Prize in Chemistry for using AI to predict protein structures.
This underscores the lab’s reputation for pushing technological boundaries, which now extends to creative applications like Whisk.
Whisk also positions Google as a leader in consumer-friendly AI.
While the image generation feature in Google’s Gemini faced criticism for producing historically inaccurate images, Whisk aims to avoid similar pitfalls by focusing on abstract, exploratory outputs rather than exact replicas.
AI innovation spurs rivalry among tech giants
Google’s unveiling of Whisk highlights its broader strategy to dominate AI-driven consumer products.
The competition is fierce, with OpenAI recently introducing Sora, a text-to-video generator.
Google aims to solidify its advantage by integrating Whisk with Gemini’s capabilities and Imagen 3, signalling a shift toward dynamic, multi-modal AI tools.
Dan Ives, an equity analyst at Wedbush Securities, views Whisk as part of Google’s “treasure chest” of 2025 offerings, alongside its collaboration with Samsung and Qualcomm on a new Android operating system.
These initiatives demonstrate Google’s effort to maintain an edge in the highly lucrative and competitive AI landscape.
Generative AI tools like Whisk have captured public imagination but also faced scrutiny.
For instance, Gemini’s earlier issues with historically inaccurate image outputs raised concerns about AI reliability.
Whisk seeks to navigate these challenges by focusing on imaginative, user-directed creations.
As Google continues to refine its offerings, the tool’s initial rollout as a website for US users will provide a critical testbed for future updates and iterations.
Google’s AI ambitions
Whisk’s debut signals a broader evolution in how AI is used for consumer creativity.
By focusing on user-friendly interfaces and integrating advanced technologies like Gemini, Google aims to democratise access to generative AI.
However, the competition remains intense, with rival platforms pushing the boundaries of what AI can achieve.
The post “Google unveils Whisk, a creative image tool powered by Gemini” appeared first on Invezz.