Watch the first 10:50 minutes of this deep dive into the foundations of watsonx.ai by Kate Soule, Senior Manager at the MIT-IBM Watson AI Lab, from the Q2 2023 IBM Tech Exchange. She describes IBM's data strategy for training models and introduces the various IBM foundation models to be released in watsonx.ai. We recommend watching only those first 11 minutes; we'll return to Kate's video and watch the section on tuning models later in the Boot Camp.
Here are the key timelines for the release of IBM's various foundation models. You can also read more details about the watsonx.ai model roadmap on Seismic; the broader watsonx roadmap is on Seismic as well.
Want to learn more about the Transformers that everyone's talking about? Read this blog post, which visually illustrates many aspects of the Transformer architecture.
Kate briefly mentions decoder-only models and encoder-decoder models. It's valuable to understand the distinction between these and when to use each type of model. For example, you saw in Kate's presentation that IBM's Granite models are decoder-only (like MPT), while IBM's Sandstone models are encoder-decoder (like flan-ul2). Here's a quick introduction to the differences between encoder-decoder and decoder-only architectures and their capabilities, with a short code sketch below that contrasts the two.
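To make the distinction concrete, here's a minimal sketch using the Hugging Face transformers library. The model choices are stand-in assumptions, not IBM models: gpt2 plays the role of a decoder-only model and google/flan-t5-small plays the role of an encoder-decoder model, but the architectural contrast is the same one Kate describes for Granite and Sandstone.

```python
# Contrast a decoder-only model with an encoder-decoder model using
# Hugging Face transformers. gpt2 and google/flan-t5-small are public
# stand-ins (assumptions), not IBM's Granite or Sandstone models.
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,   # decoder-only (GPT-style)
    AutoModelForSeq2SeqLM,  # encoder-decoder (T5-style)
)

prompt = "Translate to French: Hello, world."

# Decoder-only: the model simply continues the prompt, token by token.
dec_tok = AutoTokenizer.from_pretrained("gpt2")
dec_model = AutoModelForCausalLM.from_pretrained("gpt2")
dec_ids = dec_tok(prompt, return_tensors="pt").input_ids
dec_out = dec_model.generate(dec_ids, max_new_tokens=20)
print(dec_tok.decode(dec_out[0], skip_special_tokens=True))

# Encoder-decoder: the encoder reads the whole input first, then the
# decoder generates a separate output sequence conditioned on it.
seq_tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
seq_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
seq_ids = seq_tok(prompt, return_tensors="pt").input_ids
seq_out = seq_model.generate(seq_ids, max_new_tokens=20)
print(seq_tok.decode(seq_out[0], skip_special_tokens=True))
```

The practical takeaway: a decoder-only model continues its input, while an encoder-decoder model maps an input sequence to a distinct output sequence, which is why encoder-decoder models have historically been a natural fit for tasks like translation and summarization.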
OK, now which of these architectures is used by the GPT family of models?
- Encoder-Only
- Encoder-Decoder
- Decoder-Only