A foundation model can be trained on one data modality, such as text, or several, such as text and images (like ChatGPT), or even sound and video. Two closely related drivers of generative AI's progress are the transformer architecture and the large language models built on it: neural networks involving hundreds of millions, even trillions, of trainable parameters.
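To give a sense of that scale, the sketch below loads a comparatively tiny transformer, GPT-2, and counts its parameters. It assumes the Hugging Face transformers library is installed; the choice of GPT-2 is illustrative, since frontier foundation models are thousands of times larger.

```python
# Counting the trainable parameters of a small transformer model.
# Assumes the Hugging Face `transformers` library; GPT-2 is used here
# because it is tiny by modern standards (~124 million parameters),
# while today's largest foundation models run into the hundreds of
# billions or more.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
num_params = sum(p.numel() for p in model.parameters())
print(f"GPT-2 parameters: {num_params:,}")  # roughly 124 million
```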
What makes these models uniquely powerful, and perhaps endlessly adaptable, is that their capabilities are not task-specific. Because they are broadly trained across one or more data modalities, foundation models can learn new tasks with little to no additional training. As long as a task falls within a model's domain, the model can handle it.
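This "little to no additional training" is often called in-context learning: a handful of examples in the prompt steer a general-purpose model toward a task it was never explicitly fine-tuned for. The sketch below shows the idea, assuming the OpenAI Python SDK (v1.x) and an API key in the environment; the model name and prompt wording are illustrative.

```python
# Minimal sketch of learning a new task with no additional training:
# a few in-context examples teach a general-purpose model a
# sentiment-labeling task on the fly.
# Assumes the OpenAI Python SDK (v1.x) and OPENAI_API_KEY set in the
# environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Label each review as Positive or Negative.

Review: "The battery lasts all day." -> Positive
Review: "It broke after a week." -> Negative
Review: "Setup took five minutes and it just works." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any instruction-following model works here
    messages=[{"role": "user", "content": few_shot_prompt}],
    max_tokens=5,
    temperature=0,
)

print(response.choices[0].message.content.strip())  # expected: "Positive"
```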
What we’re seeing is this learning ability in action, and the race to harness it is on. Google, Microsoft, Baidu, and Meta have all created their own large language models, while others, such as OpenAI and DeepMind, have created large multimodal models. DeepMind’s Gato may be the most advanced multimodal AI model yet: it can complete more than 600 tasks, including chatting, captioning images, playing Atari video games, and stacking blocks with a robotic arm.
For now, most foundation models are fairly limited in the kinds of data they’re trained on: mainly natural language (text) and images. As models take in more varied data, such as video, 3D spatial data, protein structures, and industrial sensor data, their potential uses and value will skyrocket. In fact, 97% of global executives in the Accenture report agree that AI foundation models will enable connections across data types, revolutionizing where and how AI is used.
The New Competitive Differentiator
Not surprisingly, the recent advances in AI foundation models have grabbed the attention of business leaders across the globe.
Companies are now experimenting with these models, adapting them for tasks that range from powering customer service bots to automating coding. And just as quickly as the models advance, organizations are discovering new ways to use them.
Take CarMax, for example. The company recently used OpenAI’s GPT-3 model to read and synthesize more than 100,000 customer reviews for every vehicle the company sells. The model then produced 5,000 summaries—a task the company says would have taken its editorial team 11 years to complete.
This example underscores an important point about how and why generative AI is poised to impact so many industries. Most companies won’t need to build their own foundation models. Instead, they can access existing models as platforms, whether through open-source releases or paid APIs. Just as companies lean on public cloud data centers, they will increasingly tap AI models created and offered by other companies. Thousands of applications have already been built on OpenAI’s GPT-3, including copywriting tools, website builders, and of course, chatbots.
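To make the platform point concrete, here is a minimal sketch of what tapping a hosted model through a paid API might look like for review summarization, in the spirit of the CarMax example above. It assumes the OpenAI Python SDK (v1.x); the model name, prompt, and helper function are illustrative, not CarMax's actual pipeline.

```python
# Sketch: summarizing customer reviews through a hosted foundation
# model's paid API, in the spirit of the CarMax example above.
# Assumes the OpenAI Python SDK (v1.x); the model name and prompt
# wording are illustrative assumptions, not CarMax's real pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_reviews(vehicle: str, reviews: list[str]) -> str:
    """Condense a batch of customer reviews into one short summary."""
    joined = "\n".join(f"- {r}" for r in reviews)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Summarize these customer reviews of the {vehicle} "
                    f"in two sentences:\n{joined}"
                ),
            }
        ],
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()

print(summarize_reviews("2021 Honda Civic", [
    "Great fuel economy and a comfortable ride.",
    "Infotainment is sluggish, but everything else feels solid.",
    "Reliable commuter; I'd buy it again.",
]))
```

The design point is that the company writes only the thin wrapper; the heavy lifting lives in a model it neither trained nor hosts, exactly as it would lean on a public cloud provider for compute.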