Google’s New Text-to-Image Generator Imagen Is Scary Accurate

Text-to-image generators are a fun way to create uncanny images from a short description using AI. For example, with the AI Art Maker tool from Hotpot.ai, all you have to do is type in a few words or a description of something you want to see (e.g. a dog wearing a hat), and its model produces a picture based on your words. The results are often hilarious and frequently creepy-looking (e.g. the dog looks like it’s melting).

The images produced by these AI tools aren’t perfect. After all, even machine-learning neural networks have their limits. However, that might all change with yesterday’s unveiling of Google’s new AI image generator, Imagen. The powerful text-to-image model can create incredibly photorealistic images from a single-sentence description. The results, which can be found in a paper released by the Google Brain team yesterday, are astounding.

Below are just a few images created by the AI, each captioned with the prompt the model was given to create it. Keep in mind that these are examples cherry-picked by the team. Still, they’re pretty impressive:

A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat.
Courtesy of Google

A brain riding a rocketship heading towards the moon.
Courtesy of Google

A photo of a raccoon wearing an astronaut helmet, looking out of the window at night.
Courtesy of Google

You can see more on Imagen’s website.

The model isn’t available to the public. However, the team at Google claims that its AI is more powerful than other, similar text-to-image generators such as VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2. To compare the quality of Imagen against those models, Google created DrawBench, a “comprehensive and challenging benchmark for text-to-image models.” For this benchmark, human volunteers evaluated and rated the images generated by the different models from a list of roughly 200 text prompts.

These tools are fun, but it should be noted that there’s a dark side to this type of AI image generation. After all, many of these models, including Google’s, are trained on data scraped from the internet, which we all know is filled with a whole lot of racist, sexist, and generally problematic crap (and that’s putting it lightly). As a result, these algorithms often inherit biases that can lead to harmful results. It’s also not hard to imagine bad actors weaponizing them to gin up fake images that harm someone’s reputation or sow discord in the news cycle.

Even the team behind Google’s new generator acknowledges this, writing on the Imagen website that “there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”

So it’s probably for the best that the model has a long way to go before it sees the light of day. Hopefully, when it does, it’ll be used to create more pictures of dogs wearing hats and less fake news.

The model is capable of creating some of the most accurate text-to-image pictures out there.
Courtesy of Google


It's Imagen-ation gone wild, but there's a potential dark side to this tool.

Tony Ho Tran