ibm-skills-ai-colab-sessions

IBM Skills Build AI Fundamentals - Colab - Sessions

Objectives

A. Portfolio focused Project Based Learning

B. Self Directed Configuration of VSCode and Python Locally

...

A. Sessions: Project Based Learning

Artefacts from Live Technical Sessions in the form of:

Session 1: Python Fundamentals (for beginners and new to Python). (2024.06.19)
- 🖇️ Session1.ipnyb:
  CoLab Run -> :
  - NB: Was familar with Python Fundamentals from previous software engineering efforts and courses.
    - i) Lists, Tuples, and Dictionaries
    - ii) Basic Python Operations
    - iii) Flow Control Structoures
    - iv) Handling errors
    - v) Functions
  - Recommended Activities
    1. Code with Mosh Complete Python Mastery
    2. Practice Katas, for example, Code Wars, CodeSignal
Session 2: Machine Learning Models and Methodologies Fundamentals. (2024.07.02)
- 🖇️ Session2.ipnyb
  CoLab Run -> :
  - i) Regressions
  - ii) Classifications
  - iii) Clustering
  - iv) Recommender Systems
Session 3: Generative AI Lab (2024.07.16)
- 🖇️ Session3_VAE.ipnyb:
  CoLab Run ->
  - i) Load Datasets
  - ii) Encoders
  - iii) VAE Sampling
  - iv) Decoders
  - v) VAE Model
  - vi) VAE Loss
  - vii) Model Training
  - viii) Display Images (func)
- 🖇️ Session3_Transformers.ipnyb:
  CoLab Run ->
  - i) Setups/Imports
  - ii) Load Datasets
  - iii) Load Transformer Model (BERT)
  - iv) Training Params
  - v) Trainer
  - vi) Model Evaluation
  - vii) Predictions
Session 4: ~~OpenAI~~ Anthropic Text Completions (2024.07.30)
- 🖇️ Session4 Anthropic Text Completion.ipynb:
  CoLab Run -> , Anthropic, ~~not OpenAI~~
  - i) Install
  - ii) Intiatiate API Key
  - iii) Model Functions
    - Original
    - Refactored
  - vi) Examples
  - v) Interactive Prompt

^{Back to Top | To Acknowledgements}

B. Machine Learning Methods & Approaches

Session 2: Unsupervised Learning Models.
Session 3:
- 3.1 GenAI: VAE
- 3.2 GenAI: Tuning Transformers
Session 4 Embeddable AI: ChatBot & Text Completion

OpenAi Issue: #7

The OpenAI API Key was not being issued due to a CORS policy, so Anthropic was switched, and the Session4 notebook was duplicated. All subsequent references will incluude Anthropic, not OpenAI, as alternative LLM provider

Use the Jumpto buttons to launch Google Colab per Sessions' cell

Unsupervised Learning `Session 2`

Objective: Understand the theory and hands-on implementation of:
1️⃣ Regression,
2️⃣ Classification,
3️⃣ Clustering and
4️⃣ Recommender Systems.

1️⃣ Regression

NumPy^A

NumPy, short for "Numerical Python," is a powerful library used in Python programming for numerical and scientific computing.

NumPy like a supercharged version of Python's built-in list data structure, designed to handle large amounts of data more efficiently.

^| ^|

Matplotlib^B

Matplotlib is a powerful library in Python used for creating visualizations, such as graphs and charts.

MatplotLibs is particularly useful for data scientists, engineers, and anyone who needs to visualize data to understand and communicate trends, patterns, and insights

^| ^|

Linear Regression

...

^{Back to Top | To Acknowledgements}

2️⃣ Classification

SciKit_Learn

Scikit-learn is a popular Python library for machine learning, offering simple and efficient tools for data analysis and modeling

SciKit_Learn (sklearn) provides a wide range of algorithms for classification, regression, clustering, and dimensionality reduction.

It integrates well with other scientific libraries like NumPy and pandas
As such, makes it easy to build and evaluate machine learning models.
Is widely used for its
- ease of use,
- comprehensive documentation, and
- versatility in handling different machine learning tasks.

^| ^|

Logistic Regression

Logistic Regression models, as a type of linear models, are used as a common workflow in classification tasks (like binary classification) where you want to estimate the likelihood of a data point belonging to different categories.

^{Back to Top | To Acknowledgements}

3️⃣ Clustering

K-Means Clustering
Hierarchical Clustering
DBSCAN

Are just 3 of the 26 algorithms from sklearn.cluster, and the rest are out of scope for this purpose.

K-Means Clustering

K-Means is an unsupervised machine learning algorithm that partitions a dataset into k distinct clusters based on similarities, aiming to minimize the sum of squared distances between data points and their assigned cluster centroids

It minimizes within-cluster variances (squared Euclidean distances), facilitating partitioning by mean rather than Euclidean distances.

^|

Hierarchical Clustering

Hierarchical Clustering (a la Agglomerative Clustering) is an unsupervised machine learning algorithm that groups unlabeled data points into a hierarchy of clusters based on their similarity. An analytical method that seeks to build a hierarchy of clusters by either merging or splitting them based on data observations.

It builds a cluster hierarchy in the form of a tree-like structure called a dendrogram, where each merge or split is represented by a node

Agglomerative (Bottom-up) - Starting small, think of this as starting with one feature as its own group
Divisive (Top-down) - Starting big, think of this as starting with the whole box of features as one big group

^|

DBSCAN

DBSCAN is an unsupervised clustering algorithm that groups together closely packed data points based on their density, while identifying points in low-density regions as outliers or noise.

DBSCAN is known as Density-Based Spatial Clustering of Applications with Noise.

It operates by defining clusters as areas where a minimum number of points (minPts) exist within a specified radius (epsilon) around each point, allowing it to detect clusters of arbitrary shapes and effectively handle noise in datasets

^|

^{Back to Top | To Acknowledgements}

4️⃣ Recommender Systems

Recommender systems are a type of information filtering system that predict the "rating" or "preference" a user would give to an item. They help users discover items they might like but haven't encountered yet. The algorthmic steps are somewhat as follows:

  i.   Idenitify a target to compare.
  ii. Find similar targets.
  iii. Calculate an average value for similar targets.
  iv. Sort the high ranking values for recommendations.
  v.   Display the recommendation.

Pandas

Pandas aims to be the fundamental high-level building block for doing practical, real world data analysis in Python; designed to make working with "relational" or "labeled" data both easy and intuitive

Pandas, as a python package, has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language, via fast, flexible, and expressive data structures. It is already well on its way towards this goal.

^| ^|

Cosine Similarity (`SKLearn.metrics`)

SKLearn.metrics is part of 3 APIs used for evaluating the quality of a model's predicitions; specifically implementing functions assessing prediction error for targeted purposes.

Score functions, performance metrics, pairwise metrics and distance computations.

SciKit's Metric Pairwise sub module implements utilities to evaluate pairwise distances or affinity of sets of samples.

Pairwise metrics is involved with subset of data transformations Pairwise metrics, affinities and Kernels, specifically covering transforming feature spaces into affinity spaces.
Cosine Similarity is a popular choice for computing the similarity of documents represented as tf-idf vectors.
- tf-idf vectors:
  - TF-IDF stands for: Term Frequency-Inverse Document Frequency.
  - They represent text documents as numerical vectors, where each dimension corresponds to a unique word
- called so as Euclidean (L2) normalization projects the vectors onto the unit sphere.
  - As their dot product is then the cosine of the angle between the points denoted by the vectors.
- accepts scipy.sparse matrices.
- computes the L2-normalized dot product of vectors. That is, if \(x\) and \(y\) are row vectors, their cosine similarity \(k\) as follows for equation display:

Equation Display

The following equation represents the function \( k(x, y) \):

$k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$

^|

^{Back to Top | To Acknowledgements}

Generative AI `Session 3`

These sessions needs to be run on if local system compute are not configured or specified for GPU loads.

Objective: Exploring and implementing basic generative models as well as pre-trained foundational models; along with fine tuning these models for specific tasks:

1️⃣ Generative AI: Variational Autoencoders (VAE) (3.1)
2️⃣ Generative AI: Fine Tuning Transformers (3.2)

1️⃣ Generative AI: VAE: `Session 3.1`

VAE is an unsupervised learning technique where the machine is using and analyzing unlabeled data sets. With this method, the model can learn patterns in the data and learn how to reconstruct the inputs as its outputs after significantly downsizing it.

Autoencoders have four main layers: encoder, bottleneck, decoder, and the reconstruction loss.
- The encoder is the given input with reduced dimensionality.
- The bottleneck is the compressed representation of the encoded data.
- The decoder is the reconstructed version of the original output.
- The reconstruction loss is the difference between the original output and the reconstructed output.

Input ➡️ Encoder ➡️ Bottleneck ➡️ Decoder ➡️ Ouput

TensorFlow

TensorFlow is an end-to-end open source platform for machine learning and it is easy to create ML models that can run in any environment.

It has a comprehensive, flexible ecosystem of tools, libraries, and community resources to build and deploy ML-powered applications.
- Lite lirbaries for mobile and edge devices
- Browser libraries
- ML models & datasets
- Developer tools for model evaluation, performance optimisation and productising ML workflows.

^| ^| ^|

TensorFlow & Keras 3 (Source:^PyPi)

Keras is a multi-backend deep learning framework, with support for JAX, TensorFlow, and PyTorch

It provides an approachable, highly-productive interface for solving machine learning (ML) problems, with a focus on modern deep learning.
Build and train models for computer vision, natural language processing, audio processing, timeseries forecasting, recommender systems, etc.
To use keras, you should also install the backend of choice: tensorflow, jax, or torch.
NB: Note that tensorflow is required for using certain Keras 3 features: certain preprocessing layers as well as tf.data pipelines.
- Keras 3 is intended to work as a drop-in replacement for tf.keras (when using the TensorFlow backend).

^| ^| ^| ^|

^{Back to Top | To Acknowledgements}

2️⃣ Generative AI: Tuning Transformers: `Session 3.2`

Foundational Models are large scale models pre-trained on vast ammount of data, broad and diverse datasets, for adaption to downstream tasks. These can be fine tuned for specific applications by building more specialized models.

BERT, a type of transformer model, is used in this session (3.2).
- Designed to underdstand a words's context in search queries.
- It does this by looking at the words that come before and after it
- It is bidirectional: BERT reads entire sequence of words one, considering the full context of each word.
- Is excellent for understanding text and it's context, thus ideal for deep understanding and analysis of language.
Pre-training models allows models to learn general language patterns, structures, and representations.
Fine-tuning: The process of adapting pre-trained models to specific tasks, using smaller, task specific datasets.
- Customises the model to improve performance in specific applications without needing to train it from scratch.

Accelerate

HuggingFace's 🤗 library that enables the same PyTorch code to be run across any distributed configuration.

It's run your raw PyTorch training script on any kind of device.
Accerlate was created for PyTorch users who like to write the training loop of PyTorch models ..
- ... but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.
Accelerate abstracts exactly and only the boilerplate code related to multi-GPUs/TPU/fp16.

^| ^|

Transformers

HuggingFace provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and share them on HuggingFace's mode; hub.

It provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

^| ^|

UseCases (Source:^PyPi)

These models can be applied on:

📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
🖼️ Images, for tasks like image classification, object detection, and segmentation.
🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on several modalities combined, such as
- Table question answering,
- Ooptical character recognition,
- Information extraction from scanned documents,
- Video classification, and
- Visual question answering.

^{Back to Top | To Acknowledgements}

Embeddable AI | ChatBot & Text Completion `Session 4`

These sessions needs to be run on if local system compute are not configured or specified for GPU loads.

Objective: *Understand the theory and hands-on implementation of *:

1️⃣ Embedded AI- Hands-on Chatbots

Embedded AI- Hands-on Chatbots using Python, Jupyter Notebook.

Integrate the chatbot with OpenAI's GPT-4-o model to give it a high level of intelligence and the ability to understand and respond to user requests

2️⃣ Embedded AI - IBM Watson Speach to Text / Text-Speech

Embedded AI- Hands-on Chatbots using Python, Flask, HTML, CSS, and Javascript.

Implement IBM Watson Speech-to-Text functionality to allow the chatbot to understand voice input from users.

Implement IBM Watson Text-to-Speech functionality to allow the chatbot to communicate with users through voice output.

1️⃣ OpenAI ChatBot on Notebook: `Session 4.1`

ported to (Issue #8).

OpenAI is a leading AI research company that offers powerful models for text completion and chatbot development.

Its GPT models excel at understanding and generating human-like text, enabling applications like:

Text Completion: Predicting and suggesting subsequent words in a sentence or paragraph.

Chatbots: Building interactive conversational agents that can engage in natural, dynamic dialogue with users

OpenAI provides APIs and tools for easy integration of these capabilities into various platforms and applications.
- LLM Provider registration/login and Freeemium Subscription 💳🔐

^| ^| ^| ^|

Issue: #7 | 🐛 [Bug]: External | OpenAI bug with API Key generation
- ➡️➡️ Switch to another LLM Provider/Platform: Anthropic Claude Sonnet 3.5
Request: #8: Update Session4 Notebook or duplicate/mirror OpenAi variant.

^| ^| ^| ^|

https://docs.anthropic.com/en/api/client-sdks

2️⃣ Embedded AI - IBM Watson Speach to Text (STT) / Text-Speech (TTS): `Session 4.2`

For a live session, this was not covered, due to the extant requirements of IBM WatsonX.ai registration, credit card approvals and multiple Entitlements and Trial License; as well as local access/technical configuration of each of these SST and TTS models.
- IBM ID/Cloud account registration/login required - 🔐 (** elective)
This is guided project is optional to the requirments of accrditation completion and will be updated here in an external (private) repository. Access by demand. WIP.

^| ^| ^| ^| ^|

^{Back to Top}

Acknowledgements

References

IBM SkillsBuild

ⁿ
ⁿ
ⁿ

IBM Developer

ⁿ IBM Developer (2023-12-08) "Implement autoencoders using TensorFlow" (Accessed: July 2024); URL https://developer.ibm.com/tutorials/implement-autoencoders-using-tensorflow/
ⁿ
ⁿ
ⁿ

Author

^| ^|

ChangeLog

Date^a	Version	Changed By	Change	Activity
2024-07-16	0.1	Charles J Fowler	Initial version created	Create
2024-07-27	0.2	Charles J Fowler	Draft Portfolio version	Modify
^a: `YYYY-MM-DD`

^{Back to Top}

Files

Sessions.md

Latest commit

History

Sessions.md

File metadata and controls

ibm-skills-ai-colab-sessions

IBM Skills Build AI Fundamentals - Colab - Sessions

Objectives

A. Sessions: Project Based Learning

B. Machine Learning Methods & Approaches

OpenAi Issue: #7

Unsupervised Learning Session 2

Objective: Understand the theory and hands-on implementation of: 1️⃣ Regression, 2️⃣ Classification, 3️⃣ Clustering and 4️⃣ Recommender Systems.

1️⃣ Regression

NumPyA

MatplotlibB

Linear Regression

2️⃣ Classification

SciKit_Learn

Logistic Regression

3️⃣ Clustering

K-Means Clustering

Hierarchical Clustering

DBSCAN

4️⃣ Recommender Systems

Pandas

Cosine Similarity (SKLearn.metrics)

Equation Display

Generative AI Session 3

Objective: Exploring and implementing basic generative models as well as pre-trained foundational models; along with fine tuning these models for specific tasks: 1️⃣ Generative AI: Variational Autoencoders (VAE) (3.1) 2️⃣ Generative AI: Fine Tuning Transformers (3.2)

1️⃣ Generative AI: VAE: Session 3.1

TensorFlow

TensorFlow & Keras 3 (Source:PyPi)

2️⃣ Generative AI: Tuning Transformers: Session 3.2

Accelerate

Transformers

Embeddable AI | ChatBot & Text Completion Session 4

Objective: *Understand the theory and hands-on implementation of *:

1️⃣ OpenAI ChatBot on Notebook: Session 4.1

2️⃣ Embedded AI - IBM Watson Speach to Text (STT) / Text-Speech (TTS): Session 4.2

Acknowledgements

References

IBM SkillsBuild

IBM Developer

Author

ChangeLog

Unsupervised Learning `Session 2`

Objective: Understand the theory and hands-on implementation of:
1️⃣ Regression,
2️⃣ Classification,
3️⃣ Clustering and
4️⃣ Recommender Systems.

NumPy^A

Matplotlib^B

Cosine Similarity (`SKLearn.metrics`)

Generative AI `Session 3`

Objective: Exploring and implementing basic generative models as well as pre-trained foundational models; along with fine tuning these models for specific tasks:

1️⃣ Generative AI: Variational Autoencoders (VAE) (3.1)
2️⃣ Generative AI: Fine Tuning Transformers (3.2)

1️⃣ Generative AI: VAE: `Session 3.1`

TensorFlow & Keras 3 (Source:^PyPi)

2️⃣ Generative AI: Tuning Transformers: `Session 3.2`

Embeddable AI | ChatBot & Text Completion `Session 4`

Objective: Understand the theory and hands-on implementation of :

1️⃣ OpenAI ChatBot on Notebook: `Session 4.1`

2️⃣ Embedded AI - IBM Watson Speach to Text (STT) / Text-Speech (TTS): `Session 4.2`