Generative AI is a branch of artificial intelligence focused on creating new content, such as text, images, audio, or video, by learning patterns from existing data. It has been one of the most exciting and rapidly evolving fields of AI in recent years, with breakthroughs and applications across many domains. In this article, I review the top 10 generative AI advancements in 2023, based on their impact, novelty, and potential.
To be honest, lists like these are always subjective. Microsoft’s growing stable of Copilots, for example, could easily make the list, or at least earn an honorable mention, but I wanted to focus on the generative AI technologies that really pushed this field into daily, mainstream use.
Again, the list is subjective, but I think you’ll agree with most of my picks. Have others you think should make the list or replace any of mine? Drop me a note here in the comments or on X (formerly Twitter) or LinkedIn.
1. GPT-4: OpenAI’s most advanced language model of 2023
In March 2023, OpenAI released GPT-4, a large multimodal model that can accept text and image inputs and produces text. OpenAI has not disclosed GPT-4’s parameter count (the often-quoted figure of 175 billion parameters belongs to GPT-3), but GPT-4 substantially outperforms its predecessors, reaching human-level performance on many professional and academic benchmarks, such as scoring around the top 10% of test takers on a simulated bar exam. Like earlier GPT models, it is a Transformer-based model pre-trained to predict the next token in a document, using publicly available and licensed data, and then fine-tuned with reinforcement learning from human feedback (RLHF). Given a few words or sentences as input, GPT-4 can generate coherent text on almost any topic and perform tasks such as answering questions, summarizing documents, writing essays, composing emails, and even writing code. It has been widely integrated into platforms and applications such as ChatGPT, Microsoft’s Bing search engine, chatbots, content creation tools, and education software. GPT-4 has also sparked ethical and social debates, because it can generate realistic, convincing text that can be used for misinformation, manipulation, or plagiarism.
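OpenAI has not published GPT-4’s architecture in detail, but the autoregressive loop that GPT-style models share is easy to sketch: the model scores every candidate next token, a temperature-scaled softmax turns those scores into probabilities, one token is sampled and appended, and the loop repeats. This is a minimal illustration, not OpenAI’s code. It assumes a tiny hand-written bigram table (the hypothetical `BIGRAM_LOGITS`) in place of a Transformer that would score the whole context.

```python
import math
import random

# Hypothetical toy "model": scores (logits) for the next token given only the
# previous token. A real GPT model computes these with a Transformer over the
# entire context, not a lookup table.
BIGRAM_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "<end>": 0.5},
    "dog": {"ran": 2.0, "<end>": 0.5},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(max_tokens=10, temperature=1.0, seed=None):
    """Sample one token at a time, feeding each choice back in, until <end>."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = BIGRAM_LOGITS[tokens[-1]]
        words = list(candidates)
        probs = softmax([candidates[w] for w in words], temperature)
        next_token = rng.choices(words, weights=probs, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])


print(generate(temperature=0.7, seed=42))
```

Lowering `temperature` toward 0 makes sampling nearly greedy (almost always the highest-scoring token), while higher values produce more varied, less predictable text.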
2. DALL-E 3: Text-to-image generation that closely follows prompts
OpenAI first unveiled DALL-E in January 2021 as a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions, even for unusual or abstract concepts such as “a snail made of a harp” or “an armchair in the shape of an avocado”. DALL-E 2 followed in April 2022, and in September 2023 OpenAI announced DALL-E 3, built natively on ChatGPT so that users can describe and refine images in conversation. DALL-E 3 follows long, detailed prompts far more faithfully than its predecessors; OpenAI’s accompanying research attributes much of that improvement to training on more detailed, descriptive image captions. It became available to ChatGPT Plus and Enterprise users and through Microsoft’s Bing Image Creator. The DALL-E line has demonstrated the ability of generative AI to understand and combine natural language and visual information, opening up new possibilities for creative expression and communication.
3. Llama 2: A capable large language model, free for research and commercial use
In July 2023, Meta (formerly Facebook) released Llama 2, a family of large language models with 7, 13, and 70 billion parameters, free for both research and commercial use. Llama 2 was not the first openly available language model (EleutherAI’s GPT-NeoX, BigScience’s BLOOM, and Meta’s own original LLaMA came earlier), and its custom license does not meet the Open Source Initiative’s definition of open source, but it was among the most capable openly available models at release and did much to democratize access to generative AI. The models were pre-trained on 2 trillion tokens of publicly available data, predominantly English text, with a 4,096-token context window, and Meta also released Llama 2-Chat variants fine-tuned for dialogue. Many researchers and developers have since adopted and fine-tuned Llama 2 for tasks such as text classification, sentiment analysis, translation, and summarization, contributing to its improvement and innovation.
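Llama 2’s three sizes (7, 13, and 70 billion parameters) differ enormously in what it takes to run them. A back-of-the-envelope estimate is parameters times bytes per parameter; the sketch below applies it at a few common precisions. It counts the weights alone and ignores activations, the KV cache, and framework overhead, so treat the results as lower bounds rather than exact requirements.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

# Nominal Llama 2 model sizes, in parameters.
LLAMA2_SIZES = {"7B": 7e9, "13B": 13e9, "70B": 70e9}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Ignores activations, the KV cache, and framework overhead.
    """
    return num_params * bytes_per_param / 1e9


for name, params in LLAMA2_SIZES.items():
    row = ", ".join(
        f"{precision}: {weight_memory_gb(params, size):.1f} GB"
        for precision, size in BYTES_PER_PARAM.items()
    )
    print(f"Llama 2 {name} -> {row}")
```

For example, the 70B model needs about 140 GB just for its weights in 16-bit precision, more than a single 80 GB GPU can hold, which helps explain the popularity of 8-bit and 4-bit quantization for running Llama 2 locally.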
4. Mistral 7B: A small open-weight language model with outsized performance
In September 2023, Mistral AI, a Paris-based startup founded earlier that year by former Google DeepMind and Meta researchers, launched Mistral 7B, a 7.3-billion-parameter language model released under the permissive Apache 2.0 license, together with an instruction-tuned variant. Mistral AI reported that Mistral 7B outperforms Llama 2 13B on every benchmark it evaluated, despite being roughly half the size. Two architectural choices keep it fast and memory-efficient: grouped-query attention (GQA), in which several query heads share each key/value head, and sliding window attention (SWA), in which each token attends to at most 4,096 preceding tokens in each layer. Because layers are stacked, information can still propagate well beyond the window, and the model supports a context length of 8,192 tokens. Its small size and strong performance quickly made Mistral 7B a popular base for fine-tuned chat and instruction models.
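Mistral 7B’s sliding window attention is easiest to picture as an attention mask. The sketch below builds a causal mask in which query position i may attend only to key positions j with `i - W < j <= i`, that is, itself and the W - 1 positions before it. Treat the exact off-by-one convention as an assumption (implementations differ), and note the toy window of 3 rather than Mistral’s 4,096 so the band pattern is visible. The `receptive_field` helper (a hypothetical name) estimates how far information can travel once layers are stacked.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True if query position i may attend to key position j.

    The mask is causal (j <= i) and local: only the `window` most recent
    positions, including i itself, are visible. The off-by-one convention
    is an assumption; implementations differ.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def receptive_field(num_layers, window):
    """Upper bound on how many positions a token can draw information from.

    Each layer reaches (window - 1) positions further back, and stacking
    layers compounds the reach: k layers cover k * (window - 1) + 1 positions.
    """
    return num_layers * (window - 1) + 1


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

With the 32 layers and 4,096-token window reported for Mistral 7B, `receptive_field(32, 4096)` returns 131,041, consistent with the theoretical attention span of roughly 131K tokens that Mistral AI describes.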
5. StyleGAN3: Alias-free image synthesis
Nvidia Research published StyleGAN3 in 2021 in the paper “Alias-Free Generative Adversarial Networks” (NeurIPS 2021), so it predates 2023, but it remains a landmark among generative adversarial networks (GANs) for high-resolution, photorealistic images of faces, animals, and other subjects. Earlier StyleGAN generators suffered from “texture sticking”: fine details such as hair or wrinkles stayed glued to fixed pixel coordinates instead of moving naturally with the depicted surfaces. The authors traced this to aliasing inside the network and redesigned the generator to treat its feature maps as continuous signals, applying carefully designed low-pass filters, especially around nonlinearities, so that unwanted high frequencies cannot leak in. The resulting generator is equivariant to translation (and, in one configuration, rotation), which makes it far better suited to video and animation. StyleGAN models have been used for many purposes, including art, entertainment, education, and research.
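The aliasing that StyleGAN3’s authors set out to eliminate is the same phenomenon as in basic signal processing: a frequency above half the sampling rate (the Nyquist limit) produces samples indistinguishable from those of a lower frequency. This small demonstration, which is not StyleGAN3 code, samples a sine wave with 7 cycles per unit at 8 samples per unit and checks that its samples equal those of a 1-cycle wave with the sign flipped. StyleGAN3’s low-pass filters exist to keep spurious frequencies like this out of the generator’s feature maps.

```python
import math


def sample_sine(freq, sample_rate, num_samples):
    """Sample sin(2*pi*freq*t) at t = n / sample_rate for n = 0..num_samples-1."""
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(num_samples)]


SAMPLE_RATE = 8  # samples per unit time, so the Nyquist limit is 4 cycles per unit
high = sample_sine(7, SAMPLE_RATE, 16)  # 7 cycles per unit: above the Nyquist limit
low = sample_sine(1, SAMPLE_RATE, 16)   # 1 cycle per unit: below the Nyquist limit

# For integer n: sin(2*pi*7*n/8) = sin(2*pi*n - 2*pi*n/8) = -sin(2*pi*n/8),
# so the sampled 7-cycle wave is indistinguishable from an inverted 1-cycle wave.
aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
print("7-cycle samples equal inverted 1-cycle samples:", aliased)
```

Once the high frequency has been sampled, no later processing can tell the two signals apart, which is why the filtering has to happen before resampling rather than after.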
6. Jukebox: Music generation, including singing, as raw audio
OpenAI released Jukebox in April 2020: a neural network that generates music, including rudimentary singing, as raw audio, conditioned on a genre, an artist, and optionally lyrics. Jukebox first compresses audio into discrete codes with a multi-level VQ-VAE (vector-quantized variational autoencoder), then uses Transformer models to generate new sequences of those codes, which are decoded back into audio. It can also continue a short clip of an existing song. OpenAI was candid about its limitations: generated songs lack larger musical structures such as repeating choruses, and sampling is slow, taking approximately nine hours to render one minute of audio. Even so, Jukebox amazed and entertained many listeners and inspired, and challenged, many musicians.
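The vector-quantization step at the heart of a VQ-VAE, which Jukebox uses to turn audio into discrete codes, is simple: each continuous vector from the encoder is replaced by the index of its nearest entry in a codebook, and the resulting sequence of integers is what the Transformer models learn to generate. This minimal sketch assumes a hypothetical four-entry, two-dimensional codebook and made-up encoder outputs; in Jukebox the codebooks are learned, much larger, and fed by an audio encoder.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def dequantize(codes, codebook):
    """Look the codes back up; a real VQ-VAE decoder would turn these into audio."""
    return [codebook[k] for k in codes]


# Hypothetical 2-D codebook with four entries (learned during training in a real VQ-VAE).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs for four consecutive audio frames.
frames = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.9), (0.2, 0.7)]

codes = quantize(frames, CODEBOOK)
print(codes)  # [0, 1, 3, 2]
print(dequantize(codes, CODEBOOK))
```

Generating music then becomes a sequence-modeling problem over integers like `[0, 1, 3, 2]`, the same kind of problem language models solve over text tokens.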
7. DeepMind’s AlphaFold: Structure prediction beyond single proteins
In October 2023, Google DeepMind and Isomorphic Labs previewed the next generation of AlphaFold, later published as AlphaFold 3. Its predecessor, AlphaFold 2, predicts the 3D shape of a protein from its amino acid sequence, often with accuracy competitive with experimental methods, and it was used to release predicted structures for more than 200 million proteins in the AlphaFold Protein Structure Database. The new model extends prediction beyond single proteins to nearly all molecules in the Protein Data Bank (PDB), including ligands (small molecules), nucleic acids (DNA and RNA), and proteins with post-translational modifications, as well as the complexes they form. Like AlphaFold 2, it reports confidence scores alongside its predictions, such as pLDDT, an estimate of local accuracy on a scale from 0 to 100. AlphaFold predictions complement rather than replace experimental structure determination, and they have accelerated research in structural biology, drug discovery, and biotechnology.
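AlphaFold’s confidence scores are straightforward to use in practice. As a rough sketch, the function below buckets per-residue pLDDT scores (0 to 100) into the four bands shown by the AlphaFold Protein Structure Database: very high (above 90), confident (70 to 90), low (50 to 70), and very low (below 50). The input scores are made up, and how exact boundary values are assigned is an assumption; real scores can be read, for example, from the B-factor column of AlphaFold’s PDB output files.

```python
from collections import Counter


def plddt_band(score):
    """Map a pLDDT score (0-100) to an AlphaFold DB-style confidence band.

    Boundary values (exactly 90, 70, or 50) go to the lower band here;
    that tie-breaking choice is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize(plddt_scores):
    """Count how many residues fall into each confidence band."""
    return Counter(plddt_band(s) for s in plddt_scores)


# Made-up per-residue scores, e.g. a well-folded core followed by a floppy tail.
scores = [95.2, 93.1, 88.4, 76.0, 64.5, 48.9, 35.2]
print(summarize(scores))
```

A summary like this is a quick way to decide which regions of a predicted structure are trustworthy enough to interpret and which (often disordered regions) should be treated with caution.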
8. Deepfakes: Increasingly realistic face swapping
Face swapping, popularly known as “deepfakes”, replaces the face of a person in a video with another person’s face. Open-source tools such as DeepFaceLab and faceswap typically detect and align the face in each frame, then use an autoencoder, sometimes combined with a generative adversarial network (GAN), to render the new identity: a shared encoder learns features common to both faces, a separate decoder for each person learns to reconstruct that person’s face, and encoding person A’s face but decoding it with person B’s decoder produces the swap. The results have become increasingly convincing, although challenging conditions such as low resolution, occlusion, or extreme head poses can still produce visible artifacts. Face swapping has legitimate uses in film, entertainment, and education, but it also raises serious ethical and legal issues, because it can be used for deception, fraud, manipulation, or harassment.
9. Bard: Google’s conversational AI
In February 2023, Google announced Bard, a conversational AI service, and opened it to users in the US and UK in March. Bard was initially powered by a lightweight version of Google’s LaMDA model, moved to the more capable PaLM 2 in May, and was upgraded with Gemini Pro in December 2023, expanding to more than 40 languages along the way. Bard is a general-purpose assistant rather than a poetry-specific model, and it was far from the first AI system to write poems, but it can compose poetry on a requested topic or in a requested style alongside answering questions, summarizing, brainstorming, and writing code. Bard brought Google’s generative AI directly to consumers and demonstrated its ability to produce creative content in natural language.
10. DeepMind’s MuZero: Mastering games by planning with a learned model
DeepMind’s MuZero, first described in 2019 and published in Nature in 2020, is a reinforcement learning algorithm that masters games without being told their rules. Instead of relying on a perfect simulator, MuZero learns three functions: a representation function that encodes observations into a hidden state, a dynamics function that predicts the reward and next hidden state for a given action, and a prediction function that estimates a policy and a value from a hidden state. It then plans ahead with Monte Carlo tree search inside this learned model. MuZero matched AlphaZero’s superhuman performance in Go, chess, and shogi without knowing the rules and set a new state of the art on the Atari benchmark, and in 2022 DeepMind applied it to video compression on YouTube, reporting an average bitrate reduction of about 4% without loss of quality. Although MuZero is a reinforcement learning method rather than a content generator, its learned model of the world shows how AI systems can discover new strategies and cope with uncertainty and complexity.
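MuZero’s central idea, planning inside a learned model instead of a simulator that knows the rules, can be sketched in a few lines. In this toy, three hand-written functions stand in for MuZero’s neural networks: `represent` (observation to hidden state), `dynamics` (hidden state and action to reward and next hidden state), and `predict` (hidden state to value). The number-line environment with a goal at position 3 is hypothetical, and for brevity the sketch replaces Monte Carlo tree search with an exhaustive depth-limited lookahead and omits the policy output, so it shows the structure of the idea rather than MuZero’s actual algorithm.

```python
from itertools import product

ACTIONS = (-1, +1)  # hypothetical toy actions: step left or right on a number line
GOAL = 3
DISCOUNT = 0.9


def represent(observation):
    """h: encode a raw observation into a hidden state (here, just the position)."""
    return observation


def dynamics(state, action):
    """g: predict (reward, next_state) for taking `action` in hidden `state`.

    Stand-in for a learned network; the reward is 1 when the goal is reached.
    """
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def predict(state):
    """f: estimate the value of a hidden state (policy output omitted in this sketch)."""
    return -abs(GOAL - state) * 0.1  # states closer to the goal look better


def plan(observation, depth=3):
    """Return the first action of the best action sequence found in the learned model.

    Exhaustive lookahead over all sequences of length `depth`; MuZero itself
    uses Monte Carlo tree search guided by predicted policies and values.
    """
    root = represent(observation)
    best_value, best_action = float("-inf"), None
    for sequence in product(ACTIONS, repeat=depth):
        state, total, scale = root, 0.0, 1.0
        for action in sequence:
            reward, state = dynamics(state, action)
            total += scale * reward
            scale *= DISCOUNT
        total += scale * predict(state)  # bootstrap with the value estimate at the leaf
        if total > best_value:
            best_value, best_action = total, sequence[0]
    return best_action


print(plan(0))  # 1: step right, toward the goal at 3
print(plan(5))  # -1: step left, toward the goal at 3
```

Note that `plan` never consults the true environment: every step of lookahead goes through `dynamics` and `predict`, which is exactly the part MuZero learns from experience.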
That’s my list. What’s yours?
[Want to discuss this further? Hit me up on Twitter or LinkedIn]