AI text-to-image processors: Threat to creatives or new tool in the toolbox?
As has been widely reported, an image that a video game designer generated with an AI tool recently won an art competition at the Colorado State Fair. Some artists are alarmed, but should they be?
For several years, AI has been incorporated into tools that artists use every day, from computational photography within the Apple iPhone to image enhancement tools from Topaz Labs and Lightricks, and even open source applications. But because an image generated entirely by an AI tool won a competition, some see this as a tipping point, a sign of an AI catastrophe to come that will lead to widespread job displacement for those in creative fields, including graphic design and illustration, photography, journalism, creative writing and even software development.
The winning image was generated using Midjourney, a cloud-based text-to-image tool developed by a small research lab of the same name that is “exploring new mediums of thought and expanding the imaginative powers of the human species.” The generator is the product of AI neural networks trained on vast numbers of images. The company has not disclosed its technology stack, but CEO David Holz said it uses very large AI models with billions of parameters: “They’re trained over billions of images.” Although Midjourney has only recently emerged from stealth mode, hundreds of thousands of people are already using the service.
There is suddenly a proliferation of similar tools, including DALL-E from OpenAI and Imagen from Google. According to a Vanity Fair story, Imagen provides “photorealistic images [that] are even more indistinguishable from the real thing.” Stable Diffusion from Stability.ai is another new text-to-image tool; it is open source and can run locally on a PC with a good graphics card. Stable Diffusion can also be used via art generator services including Artbreeder, Pixelz.ai and Lightricks.
Using is believing
As an avid hobbyist photographer who displays work in galleries, I have my own concerns that these tools could mark the end of photography. I decided to try Midjourney myself to see what it could output, and to better think through the possible ramifications. The following image was generated by trying variations on these text prompts: “An emerald-green lake backed by steep Canadian Rockies + A few patches of snow on the mountains + Soft morning light + mountains with green conifer forest + Sunrise + 4K UHD.”
This seems like an amazing result for a novice user. Less than 30 minutes passed between my first access to the system and the final image. I must admit to experiencing a childlike wonder as I watched the image materialize in mere seconds from the prompts I supplied. It brought to mind a 60-year-old quote from science fiction writer and futurist Arthur C. Clarke: “Any sufficiently advanced technology is indistinguishable from magic.” It felt like magic.
There are others using Midjourney who display far more sophistication. For example, one user produced an “alien cat” image from more than 30 text prompts including: “cat+alien with rainbow shimmering scales, glowing, hyper-detailed, micro details, ultra-wide angle, octane render, realistic …” It appears that more detailed prompts can lead to more sophisticated and higher-quality images.
These AI text-to-image tools are already good enough for commercial endeavors. Creative artist Karen X. Cheng was engaged to create an AI-produced cover image for Cosmopolitan. To help generate ideas and the final image, she used DALL-E, or more specifically the newest version, DALL-E 2. Cheng describes the process, including the search for the right set of prompts, noting that she generated thousands of images and modified the text prompts hundreds of times over many hours before finding one image that felt right.
Text-to-image: A new tool or threat to a way of life?
In a LinkedIn post, Cheng commented: “I think the natural reaction is to fear that AI will replace human artists. Certainly, that thought crossed my mind, especially in the beginning. But the more I use DALL-E, the less I see this as a replacement for humans, and the more I see it as tool for humans to use — an instrument to play.”
I had the same feeling when using Midjourney. I posted the Canadian Rockies image on Flickr, an image-sharing site for artists — mainly photographers and digital artists — and asked for opinions. Specifically, I wanted to know whether people viewed an AI image generator as an abomination and threat or simply another tool. One professional responded: “I’ve also been playing around with Midjourney. I’m a creative! How can I NOT mess around with it to see what it can do? I am of the opinion that the results are art, even though it is AI-generated. A human imagination creates the prompt, then curates the results or tries to coax a different result from the system. I think it’s wonderful.”
A common refrain in the debate over AI is that it will destroy jobs. The response to this worry is often twofold: first, that many existing jobs will be augmented by AI such that humans and machines working together will produce better output by extending human creativity, not replacing it; second, that AI will also create new jobs, possibly in fields that did not exist before.
Entrepreneur and influencer Rob Lennon predicted recently that AI text and image generators will lead to new career opportunities, specifically citing “prompt engineering.” Prompt engineering, sometimes called prompt craft, is the art of knowing how to write a prompt to get optimal results from an AI. The best prompts are concise while giving the AI enough context to understand the desired outcome. Already, PromptBase has started to market this service. Its platform enables prompt engineers to “sell text descriptions that reliably produce a certain art style or subject on a specific AI platform.”
Megan Paetzhold, a photo editor at New York magazine, put DALL-E to the test with assignments she would normally give to artists on her team. In the end, she called it “a draw” and noted: “DALL-E never gave me a satisfying image on the first try — there was always a workshopping process.” She added: “As I refined my techniques, the process began to feel shockingly collaborative; I was working with DALL-E rather than using it. DALL-E would show me its work, and I’d adjust my prompt until I was satisfied.”
Isn’t there a dark side?
Clearly, these tools can be used to produce high-quality content. While many creative jobs could ultimately be threatened, for now, text-to-image generators are an example of people and machines working together in a new area of artistic exploration. Ethically, the key is to disclose that an image or text was created using an AI generator so people know that the content has been produced by a machine. They may like the output or not, and in that regard, it is no different from any other creative endeavor.
This perspective will not satisfy everyone. Many writers, photographers, illustrators and other creatives — even if they agree that the AI generation tools lack refinement — believe it is only a matter of time until they, the creative professionals, are replaced by machines. Bloomberg technology editor Vlad Savov encapsulated these arguments, seeing these tools as both stifling and ripping off artists. He may ultimately be correct, though as a respondent to my Flickr query noted, “It is another kind of art, which is not necessarily bad and potentially allows for incredible creativity.” Another wrote, “I don’t feel threatened by AI. Everything changes.” It does. I guess we just thought there would be more time.
It is possible that these tools are just one more in the artist’s kit. They will be used to produce images and text that will be enjoyed and sold. As Jesus Diaz writes in Fast Company: “Once you try a text-to-image program, the joy of artificial intelligence seems undeniable despite the many dangers that lie ahead.” This does not automatically mean that more traditional creative pursuits will vanish. Ironically, there may come a time in the not-too-distant future when “human-made” will carry a cachet, and work produced without an AI image or text generator could command a premium.
Gary Grossman is the senior VP of technology practice at Edelman and global lead of the Edelman AI Center of Excellence.