Artificial intelligence (AI) in the non-fungible token (NFT) space is becoming increasingly relevant. Generative art (that is, art created by an autonomous system) has quickly emerged as one of the main categories of the NFT market, driving innovative projects and astonishing collections. From the works of AI art legends such as Refik Anadol or Sofia Crespo to Tyler Hobbs’s new QQL project, NFTs have become one of the main vehicles to access AI-powered art.
Generative art has been one of the quintessential machine-learning use cases, but only recently has the space achieved mainstream prominence. The leap has been mostly powered by computational gains and a new generation of techniques that help models learn without requiring large labeled datasets, which are scarce and expensive to build. Even though the gap between the generative art community and AI research has been closing in the last few years, many of the new generative art techniques still haven’t been widely adopted by prominent artists, as it takes a while to experiment with these new methods.
Jesus Rodriguez is the CEO of IntoTheBlock.
The rise of generative AI has come as a surprise even to many of the early AI pioneers who mostly saw this discipline as a relatively obscure area of machine learning. The impressive progress in generative AI can be traced back to three main factors:
Multimodal AI: In the last five years, we have seen an explosion of AI methods that can operate across different domains such as language, image, video or sound. This has enabled the creation of models like DALL-E or Stable Diffusion, which generate images or videos from natural language.
Pretrained language models: The emergence of multimodal AI has been accompanied by remarkable progress in language models with methods like GPT-3. This has enabled the use of language as an input mechanism to produce artistic outputs such as images, sounds or videos. Language has played a paramount role in this new phase of generative AI as it has lowered the barrier for people to interact with generative AI models.
Diffusion methods: Most of the photo-realistic art produced by AI methods that we see today is based on a technique called diffusion models. Before diffusion models came onto the scene, the generative AI space was dominated by methods such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), which have trouble scaling and suffer from a lack of diversity in their generated outputs. Diffusion models address those limitations by following an unconventional approach: gradually destroying the training images until they are pure noise, then learning to reconstruct them. The reasoning is that if a model is able to reconstruct an image from something that is, theoretically, noise, then it should be able to do so from pretty much any representation, including other domains like language. Not surprisingly, diffusion methods have become the foundation of text-to-image generation models like DALL-E and Stable Diffusion.
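The "destroy, then reconstruct" idea can be illustrated with a few lines of code. The sketch below (an illustrative simplification, not any production model's implementation; the function name and noise schedule values are assumptions) shows the forward process: at each step the image keeps a shrinking fraction of its signal and gains Gaussian noise, until almost no signal remains. A diffusion model is trained to reverse this process one step at a time.

```python
import numpy as np

def forward_diffusion(image, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Gradually destroy an image by mixing in Gaussian noise.

    Illustrative sketch of the 'forward' process of a diffusion model.
    Returns snapshots of the image at a few points along the way.
    """
    # Noise schedule: how much signal is destroyed at each step.
    betas = np.linspace(beta_start, beta_end, num_steps)
    # Cumulative fraction of the original signal surviving at step t.
    alphas_cumprod = np.cumprod(1.0 - betas)

    snapshots = []
    for t in [0, num_steps // 4, num_steps // 2, num_steps - 1]:
        noise = np.random.randn(*image.shape)
        # Closed-form jump to step t: scale down the signal, add scaled noise.
        x_t = (np.sqrt(alphas_cumprod[t]) * image
               + np.sqrt(1.0 - alphas_cumprod[t]) * noise)
        snapshots.append(x_t)
    return snapshots

# A toy 8x8 "image": by the final step it is essentially pure noise.
image = np.ones((8, 8))
snapshots = forward_diffusion(image)
```

By the last snapshot the surviving signal fraction is vanishingly small, which is why a model that learns the reverse mapping can start from arbitrary noise and still produce a coherent image.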
The influence of these methods in generative art has coincided with the emergence of another technology trend: NFTs, which have unlocked incredibly important capabilities for digital art such as digital ownership, programmable incentives and more democratized distribution models.
Text to image: Text-to-image (TTI) synthesis has been the most popular area of generative AI within the NFT community. The TTI space has produced AI models that are crossing over into pop culture. OpenAI’s DALL-E has arguably become the best-known example of TTI used to generate artistic images. GLIDE is another TTI model created by OpenAI that has been adopted in many generative art settings. Google has been dabbling in the generative art space, experimenting with different approaches such as Imagen, which is based on diffusion models, and Parti, which is based on a different technique called autoregressive models. Meta has also been cultivating the generative art community with models like Make-A-Scene. AI startups are making inroads in the TTI space as well, with models like Midjourney gaining a vibrant community via its Discord distribution and Stability AI shocking the AI community by open sourcing Stable Diffusion.
From an NFT perspective, TTI models have seen the widest adoption because a disproportionate percentage of digital art collectibles today are represented as static images.
Text-to-video: Text-to-video (TTV) is a more challenging aspect of generative art, but one in which we are seeing major progress. Meta and Google recently published TTV models such as Make-A-Video and Imagen Video, which can generate high-fidelity video clips from natural language. Video is one of the most active areas of research for generative art, and we should expect most image generation models to have video equivalents. Videos are still not as prominent in the NFT space as images, but this is likely to change as TTV models become more widely adopted by generative artists. Video is one of the areas that differentiates digital art from traditional art.
Image-to-image: Image generation via textual inputs feels almost natural but has limitations when it comes to capturing aspects such as positions between different objects, orientation or even very specific details of scenery. Sketches or other images are a better mechanism to convey this information. Several of the top diffusion models, including DALL-E, Stable Diffusion and Imagen, incorporate mechanisms for generating images from sketches. Similarly, these models support techniques such as inpainting and outpainting, which allow images to be edited within, or extended beyond, their original borders.
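At its core, diffusion-based inpainting comes down to masking: at each denoising step, the regions the user wants to keep are overwritten with a (suitably noised) copy of the original image, while the model is free to generate content only inside the hole. The sketch below is a highly simplified illustration of that masking idea; the function name and parameters are hypothetical, not any specific model's API.

```python
import numpy as np

def inpaint_blend(x_t, known_image, mask, noise_level, rng):
    """Blend the model's canvas with the known image at one denoising step.

    mask == 1 marks pixels to keep from the original image;
    mask == 0 marks the hole, where the model's content x_t survives.
    """
    # Noise the known pixels to match the current diffusion step.
    noised_known = (np.sqrt(1.0 - noise_level) * known_image
                    + np.sqrt(noise_level) * rng.standard_normal(known_image.shape))
    # Keep known regions, let the generated content fill the hole.
    return mask * noised_known + (1.0 - mask) * x_t

rng = np.random.default_rng(0)
known = np.ones((4, 4))             # the original image
x_t = rng.standard_normal((4, 4))   # the model's current (noisy) canvas
mask = np.zeros((4, 4))
mask[:2, :] = 1.0                   # keep the top half, regenerate the bottom

blended = inpaint_blend(x_t, known, mask, noise_level=0.1, rng=rng)
```

Outpainting follows the same recipe with the mask inverted in spirit: the entire original image is "known," and the hole is the empty border the model extends into.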
Most of the best-established generative art practices focus on creating images from other images. Not surprisingly, several popular generative art NFT collections are based on variations of image-to-image methods.
Music generation: Automatic music generation has been another common use case in generative AI that has gained prominence over the last few years. OpenAI has also been at the forefront of this revolution with models including MuseNet and, more prominently, Jukebox, which is able to generate music in various styles and genres. Recently, Google entered the space with AudioLM, a model that creates realistic speech and piano music simply by listening to sound snippets. Stability AI-backed Harmonai started pushing the boundaries of the AI music generation space with the release of Dance Diffusion, a set of algorithms and tools that can generate original clips of music.
AI-generated music is one of the biggest areas in which NFTs can deliver unique value. Unlike many other art forms, music is already distributed primarily in digital form. Generative AI can evolve into a natural complement for music producers, and NFTs offer creators unique ways to express ownership of music clips or songs.
Throughout the history of technology there have been several instances in which distinct trends reinforced one another to achieve massive adoption. The most recent example is the social-mobile-cloud revolution, in which each of those trends expanded the market of the other two. Generative AI and NFTs are starting to exhibit a similar dynamic. Both trends have been able to bring a complex technology market to mainstream culture. NFTs complement generative AI with digital ownership and distribution models that would be nearly impossible to implement otherwise. Similarly, generative AI is likely to become one of the most important sources of NFT creation.