What Does That Mean
Generative adversarial networks (GANs) are a highly effective and relatively new form of AI that is very good at learning complex tasks, such as producing realistic images. A GAN works by pitting two neural networks against each other: a generator that creates samples and a discriminator that tries to tell those samples apart from real examples. Each network keeps probing the other’s weak spots, so both are pushed to improve. Overall, this competition tends to produce sharper, more convincing results than training a single network on its own.
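To make the tug-of-war concrete, here is a minimal sketch in Python. It is not the code behind any real image model. It assumes a deliberately tiny setup: the “real data” are numbers scattered around 4.0, the generator is just `a*z + b` applied to random noise, and the discriminator is a one-variable logistic classifier. The names `train_toy_gan` and `sigmoid` and all the numbers are made up for illustration.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function: maps any score to a probability in [0, 1]."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=200, batch=64, lr=0.05, real_mean=4.0, real_std=0.5, seed=0):
    """Toy 1-D GAN. Generator: g(z) = a*z + b. Discriminator: D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        real = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        fake = [a * rng.gauss(0, 1) + b for _ in range(batch)]
        dw = dc = 0.0
        for r, f in zip(real, fake):
            d_real, d_fake = sigmoid(w * r + c), sigmoid(w * f + c)
            dw += (d_real - 1.0) * r + d_fake * f
            dc += (d_real - 1.0) + d_fake
        w -= lr * dw / batch
        c -= lr * dc / batch

        # Generator step: minimise -log D(g(z)), i.e. move fakes to where D says "real".
        da = db = 0.0
        for _ in range(batch):
            z = rng.gauss(0, 1)
            d_fake = sigmoid(w * (a * z + b) + c)
            da += (d_fake - 1.0) * w * z
            db += (d_fake - 1.0) * w
        a -= lr * da / batch
        b -= lr * db / batch
    return {"a": a, "b": b, "w": w, "c": c}
```

Calling `train_toy_gan()` returns the four learned parameters. Even in this toy, the very first generator update nudges `b` upward, toward the real data. Over longer runs the two players can keep circling each other rather than settling down, which is a small taste of why GANs are known to be finicky to train.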
Transformers (GPT-3, etc.) are another highly effective and even newer form of AI. Their architecture is radically different from earlier neural nets: it is built around a mechanism called attention, which lets every part of the input weigh its relationship to every other part, and this makes them very good at abstract creativity. For example, you can ask GPT-3 to write a creepy story about strawberry people, and it has no problem. Most of the work I’ve published here was made with GPT-3.
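The ingredient that most sets transformers apart is attention: each position in the input compares itself with every other position and blends in information from the ones that match best. Below is a bare-bones sketch of scaled dot-product attention in plain Python, using lists instead of a real tensor library. The function names and the tiny example vectors are illustrative assumptions; real models like GPT-3 add learned projections, many attention heads, and many stacked layers on top of this.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of the values,
    weighted by how well the query matches each key."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        out.append(mixed)
    return out
```

For example, `attention([[1, 0]], [[1, 0], [1, 0]], [[2, 0], [4, 2]])` returns `[[3.0, 1.0]]`: both keys match the query equally well, so the output is the plain average of the two values.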
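VQGAN (Vector Quantized Generative Adversarial Network) is a newer hybrid of GANs and transformers: the GAN side learns to compress images into a grid of discrete codes and turn those codes back into pictures, while the transformer side learns which codes tend to go together. Together, these pieces can create complex, abstract visual art. OpenAI’s Dall-E belongs to the same broad family of text-to-image models, although it is built differently from VQGAN, and other AI labs have built their own.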
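The “vector quantized” part of the name refers to one specific trick: an encoder turns each patch of an image into a vector, and each vector is then snapped to its nearest entry in a learned “codebook”. The image becomes a grid of code numbers, which a transformer can model much like a sequence of words. Here is a minimal sketch of just that snapping step. The tiny codebook is a made-up stand-in for the learned one, and `quantize`/`dequantize` are illustrative names, not any library’s API.

```python
def quantize(vectors, codebook):
    """Replace each vector with the index of its nearest codebook entry
    (nearest = smallest squared Euclidean distance)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, code)) for code in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


def dequantize(indices, codebook):
    """Look the codes back up; a decoder would turn these vectors into pixels."""
    return [list(codebook[i]) for i in indices]
```

With the codebook `[[0, 0], [1, 1], [5, 5]]`, the vectors `[[0.2, -0.1], [4, 4.5], [0.9, 1.2]]` become the codes `[0, 2, 1]`. Some detail is lost in the snapping, which is the price paid for turning a picture into a short list of tokens.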
Google made one called Imagen. NVIDIA made one called GauGAN. OpenAI even has a newer, slimmer one called GLIDE. One of my favorite free ones is called Hypnogram. Hypnogram uses VQGAN+CLIP, a popular open-source technique in which OpenAI’s CLIP model steers VQGAN’s output toward a text prompt; many people run it on Google’s cloud hardware through Colab notebooks.
How It’s Made
During training, these networks learn about a huge range of things. The GAN portion learns about paintings and photos, but also the styles of particular artists and aesthetic movements. The transformer portion learns about words, text, and ideas from philosophy to art criticism. That way, the transformer can interpret your text prompt and tell the GAN what to draw.
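In VQGAN+CLIP specifically, the “telling” works as a feedback loop: CLIP turns the prompt into a vector (an embedding), scores how well the current image matches it, and the image’s underlying latent is adjusted to raise that score, over and over. The sketch below only shows the shape of that loop, under heavy simplifying assumptions: the “image latent” is compared to the text embedding directly (real CLIP first runs the decoded image through its own image encoder), and it uses random hill-climbing instead of the gradient descent that real implementations use. `cosine` and `guide_image` are hypothetical names.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def guide_image(text_embedding, image_latent, steps=500, step_size=0.1, seed=0):
    """Toy guidance loop: randomly nudge the latent and keep a nudge only if it
    makes the latent match the text embedding better (random hill-climbing)."""
    rng = random.Random(seed)
    best = list(image_latent)
    best_score = cosine(best, text_embedding)
    for _ in range(steps):
        candidate = [x + rng.gauss(0, step_size) for x in best]
        score = cosine(candidate, text_embedding)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because a nudge is kept only when it raises the similarity, the score can only go up from one step to the next, so a latent that starts out pointing away from the prompt is gradually pulled toward it.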
The thing is, these networks learn a lot about the world, but once training is done and they are “born,” so to speak, they stop learning. When they are being used to make pictures, they no longer absorb new information; it’s as though they are frozen in time at the moment their training ended.
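A toy way to see the “frozen” idea in code: below, “training” just counts which word follows which in a small text, and the result is wrapped in a read-only view so nothing can change it afterward. Asking about a word it never saw gets you nothing useful, no matter how many times you ask. This is a hypothetical word-counting model for illustration only; real image models store what they learned as large sets of numeric weights, not word counts.

```python
from collections import Counter, defaultdict
from types import MappingProxyType


def train(corpus):
    """'Training': count which word follows which, then return a read-only view."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    # MappingProxyType blocks adding or replacing entries after training is over.
    return MappingProxyType({w: dict(c) for w, c in follows.items()})


def next_word(model, word):
    """'Inference': only reads the frozen counts; nothing is learned from the query."""
    options = model.get(word.lower())
    if not options:
        return None  # never seen during training, so the model has nothing to go on
    return max(options, key=options.get)
```

With `model = train("goya painted portraits . monet painted water lilies . monet painted haystacks")`, `next_word(model, "monet")` returns `"painted"`, while `next_word(model, "dall-e")` returns `None`, because that word never appeared in the training text.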
Imagine VQGAN is a box with all the world’s classical artists trapped inside, along with a pile of encyclopedias, magazines, and whatever else was fed into the network during training. When training was done, the researchers trapped the box in a time loop. Every time you ask VQGAN to draw a picture, the time loop resets, and everyone inside forgets everything except what they learned during training. They have their books, their artists, and their knowledge from the training period, and nothing else.
Dall-E can tell you everything there is to know about Goya and Monet, but it doesn’t know anything at all about what Dall-E is, because Dall-E didn’t exist until after it was done being trained. At that point, its memory and knowledge were frozen in time. So if you ask Dall-E to draw Dall-E, it has to guess what you’re talking about. And as it turns out, it thinks Dall-E sounds like a brand of drones. 🤣
(Dall-E Mini is a free, open-source model inspired by OpenAI’s Dall-E rather than Dall-E itself, but it lets you easily try this kind of AI online.)
You can try it out for yourself here: https://huggingface.co/spaces/dalle-mini/dalle-mini