<aside> 👉 Hi everyone! My name is Jack Morris. I’m a PhD student at Cornell studying NLP, and I’m interested in AI in general. To read more of my writing, check out my blog: https://jxmo.io/

</aside>

The world of AI-generated art has exploded over the last twelve months. In January 2021, OpenAI released two models that changed the game: DALL-E and CLIP. These models showed what might be possible by generating visual art from text-based prompts.

“Kowloon City, in the style of Wes Anderson”, from Twitter user @somnai_dreams

The release of DALL-E and CLIP kickstarted a new wave of work in AI art. Digital artists, organizing on Twitter, GitHub, and Discord, developed tools for prototyping and generating art. The technically proficient shared their work in the form of Colab notebooks; those without coding skills could create art through new websites and tools that harness the power of these deep learning models without requiring a line of code.

How much have things really changed? Take a look at some “AI art” generated by pre-2021 techniques:

[Three examples of AI art generated with pre-2021 techniques]

Now consider the following three images, generated in 2021 with CLIP, DALL-E, or related technologies (feel free to zoom):

[Three examples of images generated in 2021 with CLIP, DALL-E, or related techniques]

The reality right now is really, really crazy.

I don’t think the majority of AI researchers would have suspected that images like these could be created with current tools. The rapidity of the past year’s developments has surprised even some of the most bullish technologists.

The AI art we had before 2021 was intriguing, but it tended to be abstract, esoteric, and not all that relatable. The AI art we have now is directly controllable through text: it can be about whatever you want it to be.

What changed? Well, there’s something to be said for the new wave of publicity and interest, which certainly accelerated the pace of our art-generation techniques. But the main development is the rise of multimodal learning.

Multimodal learning, in this case, means learning to match up text and images. Our new models are really good at writing captions for images and, more importantly for artistic purposes, at generating images that correspond to a given caption.
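For intuition, here’s a minimal sketch of the matching side using OpenAI’s open-source `clip` package; the image path and candidate captions are just placeholders:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained CLIP model and its matching image preprocessor.
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate captions into a shared embedding space.
image = preprocess(Image.open("my_image.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a photograph of a city", "an oil painting of a dog"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each caption.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # higher probability = better text-image match
```

The key property is that images and text land in a shared embedding space, so the match between any image and any caption can be scored, and, crucially for art generation, differentiated.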

We already had AI-generated images that were high-resolution (see thispersondoesnotexist.com for an example). What’s changed is that our new joint image-text models give us some control over image generation. And when we combine multimodal models with the really good image generation models we already had, we get results like this:

[An example of AI art created by combining a multimodal model with an image generator]

The rise of multimodal models has jumpstarted research into the best way to create beautiful AI art from a text-based prompt. A whole new generation of deep-learning-researcher-slash-artists has emerged from the depths of the Internet and joined forces to push the science forward. We’ve observed an interesting confluence of empirical science (deep learning works) and engineering (so many hyperparameters to set: smoothness, color range, clipping, optimization learning rate, initialization...). A stripped-down version of the underlying loop is sketched below.
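This is a heavily simplified sketch of that loop, not any particular notebook’s implementation: it nudges raw pixels to maximize CLIP similarity with a prompt. Real pipelines typically optimize the latent code of a generator like VQGAN instead and add augmentations, smoothness penalties, and other tricks; the prompt, step count, and learning rate here are placeholders.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Freeze CLIP; we only optimize the image.
for p in model.parameters():
    p.requires_grad_(False)

# Embed the text prompt once, up front.
prompt = clip.tokenize(["Kowloon City, in the style of Wes Anderson"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(prompt)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Initialization: start from random pixels (one of the many knobs artists tune).
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)

optimizer = torch.optim.Adam([image], lr=0.05)  # learning rate: another knob

for step in range(300):
    optimizer.zero_grad()
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    # Loss: negative cosine similarity between image and prompt embeddings.
    loss = -(image_features * text_features).sum()
    loss.backward()
    optimizer.step()
    # Clipping: keep pixel values in a valid range after each step.
    with torch.no_grad():
        image.clamp_(0, 1)
```

Every one of those engineering choices (how to initialize, how to regularize, how fast to step) visibly changes the resulting image, which is why so much of the community’s effort has gone into tuning them.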

Before we take a look at some art generated with CLIP, DALL-E, and related models, let’s consider some of the art created before text-based prompting was developed. (Or, if you’re not interested in the history, scroll to the bottom to see some cool pictures!)

Early forms of AI art