Artificial Intelligence, Machine Learning, Deep Learning

AI (originated ~1950s): simulated intelligence in machines. Objective: building machines that can think like humans.
ML (originated ~1960s): machines making decisions without being explicitly programmed. Objective: algorithms that can learn from data.
Deep Learning (originated ~1970s): using neural networks to solve complex problems. Objective: neural networks that identify patterns.

Activation Function

Relu?
ReLU is an elementwise operation (shown here on a 2D tensor): if a value is less than 0, take 0; otherwise keep the original value.

# Means max(x, 0), applied elementwise. Pseudocode:
#
# def relu(x):
#     for each element in x:
#         if element < 0: replace it with 0
#         else:           keep the element as it is

import numpy as np

def naive_relu(x):                      # x is a 2D tensor
    assert len(x.shape) == 2

    x = x.copy()                        # copy to avoid changing the input
    for i in range(x.shape[0]):         # x.shape[0] = number of rows (2 below)
        for j in range(x.shape[1]):     # x.shape[1] = number of columns (3 below)
            x[i, j] = max(x[i, j], 0)
    return x

a = np.array([              # ndim=2, shape=(2, 3)
            [0, 1, -2],
            [4, 5, -6],
        ])
b = naive_relu(a)
print(b)
# [[0 1 0]
#  [4 5 0]]
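
In practice, the same elementwise operation is a one-liner with NumPy's vectorized np.maximum:

import numpy as np

a = np.array([[0, 1, -2],
              [4, 5, -6]])
print(np.maximum(a, 0))     # elementwise max with 0 == ReLU
# [[0 1 0]
#  [4 5 0]]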

        
Adding an activation function to a layer introduces non-linearity into the model, allowing it to learn more complex relationships between the input and output data.
Non-linear activation functions such as ReLU, Sigmoid, and Tanh help the model better fit the training data and make more accurate predictions on new data.

Conda

Miniconda is the recommended approach for installing TensorFlow with GPU support.
It creates a separate environment, so nothing already installed on your system is changed. This is also the easiest way to install the required software, especially for the GPU setup.
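
A minimal sketch of that workflow, assuming Miniconda is already installed (the environment name tf_env is just an example):

conda create -n tf_env python=3.9    # create an isolated environment
conda activate tf_env                # switch into it
pip install tensorflow               # installs inside the env only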

Optimizers

An optimizer is an algorithm used to adjust the parameters of a model in order to minimize the error (loss) function.

1. RMSprop (Root Mean Square Propagation): divides the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight.

from keras.optimizers import RMSprop
optimizer = RMSprop(learning_rate=0.001, rho=0.9)

2. Stochastic Gradient Descent (SGD): adjusts the model parameters based on the (average) gradient of the loss.

from keras.optimizers import SGD
optimizer = SGD(learning_rate=0.01, momentum=0.9)
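
The optimizer object is then passed to model.compile(); a minimal sketch with a tiny stand-in model:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Minimal model just to show where the optimizer object goes
model = Sequential([Dense(1, input_shape=(3,))])
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='mse')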
                

Overfitting

Means that the model performs well on the training data but does not generalize well (i.e., does not produce good results on real-world/unseen data), because there is too much unnecessary data (noise) in the training data.
Regularization: constraining a model to make it simpler and reduce the risk of overfitting.
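
A minimal sketch of two common Keras regularizers, an L2 weight penalty and Dropout (the layer sizes here are arbitrary):

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

model = Sequential([
    # L2 penalty constrains the weights; Dropout randomly zeroes activations
    Dense(16, activation='relu', input_shape=(4,),
          kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])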

Bias/Sampling Bias

We should use a training data set that is representative of the cases we want the model to predict.
If the sample is too small, you will have sampling noise (i.e., nonrepresentative data as a result of chance), but even very large samples can be nonrepresentative if the sampling method is flawed. This is called sampling bias.

CNTK

This is the Microsoft Cognitive Toolkit (CNTK), a backend that can be plugged into Keras.

Keras

A Python library that provides functions/APIs to build deep-learning models. Different backends can be plugged into Keras:

            Keras
              |
    TensorFlow / Theano / CNTK
        |            |
      CUDA       BLAS, Eigen
        |            |
       GPU          CPU
        

Large language model

This is a computational model that can perform natural language processing tasks such as classification.
LLMs learn language modeling through self-supervised and semi-supervised training.
LLMs can be used for text generation, a form of generative AI: given an input text, they repeatedly predict the next token.

Llama (Large Language Model Meta AI)

A family of LLMs released by Meta AI starting in February 2023.
Llama models:
Llama 2 (released Jul 2023): trained model sizes of 7, 13, and 70 billion parameters; fine-tuned for dialogue compared to Llama 1.
Llama 3 (released Apr 2024): 8B and 70B parameters; a 400B+ parameter model was still in training at release.

Layer / class or Function

A layer processes input data (a tensor) and produces an output (tensor) in a specific format. A neural network is created by cascading multiple layers.
Types of layers (the Keras sketch below shows all three):
1. Dense / fully connected layer: each neuron is connected to every neuron in the previous layer. Used for image classification, regression, and more.
2. Convolutional layer: uses convolution operations to detect local patterns in the input data. Used for image classification, object detection, image segmentation, spatial hierarchies.
3. Recurrent layer: processes sequential data, where the order of the input matters. Used for natural language processing (NLP), time-series analysis, and speech recognition.
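
In Keras these three layer types map directly onto classes; a minimal sketch (the layer sizes are arbitrary):

from keras.layers import Dense, Conv2D, LSTM

dense = Dense(64, activation='relu')           # fully connected, 64 neurons
conv = Conv2D(32, (3, 3), activation='relu')   # 32 filters of size 3x3
recurrent = LSTM(32)                           # recurrent layer with 32 units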

Learning Types

Summary (details in sections A-E below):
1. Supervised learning: the training data fed to the algorithm includes the desired solutions (called labels). Types: classification (yes/no), regression (a numeric value), logistic regression (yes/no with a probability, e.g. a 20% chance of being spam). Algorithms: k-Nearest Neighbors, Linear Regression, Logistic Regression, Support Vector Machines (SVMs), Decision Trees & Random Forests, neural networks.
2. Unsupervised learning: the dataset does not have labels; the model tries to learn without a teacher. Types: clustering (k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization); visualization and dimensionality reduction (PCA, Kernel PCA, LLE, t-SNE), which plots unlabeled data on a 2-D or 3-D plane; association rule learning (Apriori, Eclat), which outputs relations between attributes.
3. Semisupervised learning: a lot of unlabeled data and a little labeled data.
4. Reinforcement learning: an agent (AI program) observes the environment, selects and performs actions, and gets rewards in return.

A. Supervised Learning

The training data fed to the algorithm includes the desired solutions (called labels). Supervised learning algorithms:
k-Nearest Neighbors,
Linear Regression,
Logistic Regression,
Support Vector Machines (SVMs),
Decision Trees & Random Forests,
Neural networks

Types of Supervised learning

1. Classification (gives yes/no)
a. Spam filtering: the algorithm is trained with many example emails along with their class (spam or not), and it must learn how to classify new emails. Each email has a label.
Types of classification models:
1. Binary classification:
  The model outputs a value from a class that contains only two values, for example rain or no rain.
2. Multiclass classification:
  Outputs a value from a class that contains more than 2 values, e.g. rain, hail, snow, or sleet.
2. Regression (gives a % or numeric value)
Predictors (predict something):
Predict the weather based on inputs; predict the price of a car given some inputs (mileage, age, brand, etc.).
3. Logistic regression (yes/no with %)
Mix of classification & regression. Example: a 20% chance of being spam. (See the sketch below.)
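
A minimal sketch of the yes/no-with-probability idea, assuming scikit-learn is installed (the toy data is made up):

from sklearn.linear_model import LogisticRegression

# Toy labeled data: two features per sample, label 0 or 1
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5, 1.5]]))        # hard class label (yes/no)
print(clf.predict_proba([[1.5, 1.5]]))  # class probabilities ("20% spam")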

B. Unsupervised Learning

The dataset does not have labels. The model has to find structure in the data on its own and create its own rules.

Types of unsupervised learning

1. Clustering
The model finds data points that form natural groupings.
Algorithms used: k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization (see the k-Means sketch after this list)
2. Visualization and dimensionality reduction
With unlabeled data, these algorithms produce output that can be plotted on a 2-D or 3-D plane.
Algorithms used:
Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)
3. Association rule learning
Produces output describing relations between attributes.
Algorithms used: Apriori, Eclat
Examples:
1. Supermarket data analysis: suppose you own a supermarket. Sales logs may reveal that people who purchase sauce and potato chips also buy bread, so you may want to place these items close to each other.
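
A minimal k-Means sketch, assuming scikit-learn is installed (the points are made up):

from sklearn.cluster import KMeans

# Four unlabeled 2-D points forming two obvious groups
X = [[1, 1], [1.5, 2], [8, 8], [8.5, 9]]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)   # e.g. [0 0 1 1]: cluster assignments found without any labels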

C. Semisupervised Learning

A lot of unlabeled data and a little labeled data.
Algorithms: Deep Belief Networks (DBNs)
Examples: 1. Facebook photos: when we upload photos, we label a few and leave the others; the system then identifies people in the remaining photos.

D. Reinforcement learning

An agent (AI program) can observe the environment, select and perform actions, and get rewards/punishments in return.

E. Generative AI

A class of models that creates content from user input. For example, generative AI can create unique images, music, etc.
Example:
Text-to-text
Text-to-image
Text-to-video
Text-to-code
Text-to-speech
Image and text-to-image

Loss Function

In Keras, a loss function is used during the training of a neural network. It measures the difference between the model's predictions and the actual target values. The goal of training is to minimize this loss.

            Loss = f(predicted output, expected output)
            e.g. MSE = mean((predicted - expected)^2)
Types:
1. Categorical crossentropy (categorical_crossentropy): used in multi-class classification problems when the target variable is one-hot encoded.
   model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Binary crossentropy (binary_crossentropy): used for binary classification problems, where the target variable is binary (0 or 1).
   model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
3. Mean squared error (mean_squared_error or mse): measures the average squared difference between the true and predicted values.
   model.compile(optimizer='adam', loss='mean_squared_error')
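
For intuition, mean squared error can be computed by hand with NumPy:

import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
mse = np.mean((y_pred - y_true) ** 2)   # average squared difference
print(mse)                              # (0.25 + 0.0 + 1.0) / 3 = 0.4166...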
to_categorical
Function to convert labels into a one-hot encoded format.
One-hot encoding converts categorical labels into a binary matrix (of 1s and 0s).

from keras.utils import to_categorical
train_labels = [0, 1, 2, 0, 1]
train_labels_one_hot = to_categorical(train_labels) # Convert to one-hot encoding
print(train_labels_one_hot)

array([[1., 0., 0.],                # represents 0
       [0., 1., 0.],                # 1
       [0., 0., 1.],                # 2
       [1., 0., 0.],                # 0
       [0., 1., 0.]], dtype=float32)# 1

        

Matplotlib

A popular plotting library for Python that provides a variety of high-quality 2D and 3D plots and visualizations.
matplotlib.pyplot is a collection of functions that make Matplotlib work like MATLAB, allowing you to create plots and charts.
imshow(digit, cmap=plt.cm.binary) displays the image stored in the digit array; cmap is the colormap for mapping the data values to colors in the plot, and plt.cm.binary means black-and-white colors.
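
A minimal sketch; the random array stands in for a real digit image:

import numpy as np
import matplotlib.pyplot as plt

digit = np.random.randint(0, 256, size=(28, 28))   # stand-in for an MNIST digit
plt.imshow(digit, cmap=plt.cm.binary)              # values mapped to b/w colors
plt.show()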

Metrics

Metrics are quantities monitored during training and testing, e.g. accuracy; unlike the loss, they are not used to update the model's weights.

Neuron / Node / Function or class

A neuron is the basic unit within a layer. It takes input, performs a computation, and produces an output.
Each neuron has weights, a bias, and an activation function.
weight
Neurons receive input signals, and each input is associated with a weight. These weights represent the strength of the connection between the input and the neuron.

import random

def relu(x):
    # ReLU activation: max(x, 0)
    return max(x, 0.0)

class Neuron:
    def __init__(self, num_inputs):
        # Simple initialization: small random weights and a zero bias
        self.weights = [random.uniform(-1, 1) for _ in range(num_inputs)]
        self.bias = 0.0
        self.activation_function = relu

    def forward(self, input_data):
        # Compute the weighted sum of inputs
        weighted_sum = sum(weight * input_value for weight, input_value in zip(self.weights, input_data)) + self.bias
        
        # Apply the activation function
        output = self.activation_function(weighted_sum)
        
        return output        
        

Neural Network

It tries to emulate the human brain, combining computer science and statistics to solve common problems in the field of AI.
It contains an input layer, one or more hidden layers, and an output layer.
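
A minimal Keras sketch of that input / hidden / output structure (the sizes are arbitrary):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(16, activation='relu', input_shape=(4,)),  # hidden layer; input has 4 features
    Dense(8, activation='relu'),                     # second hidden layer
    Dense(1, activation='sigmoid'),                  # output layer
])
model.summary()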

Tensor = n-D Matrix

This is a matrix (as in maths) generalized to n dimensions: multi-dimensional NumPy arrays used to store numbers during computation.

Types of Tensors

Each tensor has a rank (also called dimension, axis count, or ndim), a shape (the number of elements along each axis), a typical meaning, and a Keras layer type that processes it:

Rank 0: Scalar. Example: 0. Shape: ().
Rank 1: Vector. Example: [1, 2, 3, 4]. Shape: (4,), since there are 4 elements in one direction.
Rank 2: Matrix / 2D tensor. Represents samples; processed by Dense layers. Shape: (2, 3), since there are 2 elements in one direction and 3 in the other:

| 1 2 3 |
| 4 5 6 |

Rank 3: 3D tensor. Represents timestamped data; processed by recurrent layers (e.g. an LSTM layer). Shape: (2, 2, 3):

[
    | 1 2 3 |
    | 4 5 6 |,

    | 1 2 3 |
    | 4 5 6 |
]

Rank 4: 4D tensor. 3D tensors packed together (typically image data); processed by 2D convolution layers (Conv2D).

            import numpy as np
            ########## 2-D Tensor ###########
            b = np.array(
                [
                    [0, 1, 2, 3],
                    [4, 5, 6, 7],
                    [8, 9, 10, 11],
                ]
            )
            print("Dimension/Ndim:", b.ndim)        # 2         //2d array
            print("Shape:", b.shape)                # (3, 4)    //(row,col)
            
            ########## 3-D Tensor, Packing 2-D matrices ###########
            c = np.array(
                [
                    [
                        [0, 1, 2],
                        [4, 5, 6],
                        [8, 9, 10],
                    ],
                    [
                        [10, 11, 12],
                        [14, 15, 16],
                        [18, 19, 110],
                    ]
                ]
            )
            print("Dimension/Ndim:", c.ndim)    # 3         //3d array
            print("Shape:", c.shape)            # (2,3,3)   //(2=arrays, 3=row, 3=col)

            ####### Operations ############
                #all,row,col
            d = c[:, 2:, 2:]            # Select all elements 2nd(row), 2nd(col) onwards.
            print(d)                    #   [[[ 10]]  [[110]]]
            

Tensor Operations

Add: vector (ndim=1) + matrix (ndim=2)

[1 2 3] + | 1 2 3 | = | 2 4 6 |
          | 4 5 6 |   | 5 7 9 |

import numpy as np
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2    #Matrix
    assert len(y.shape) == 1    #vector
    assert x.shape[1] == y.shape[0]

    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

x = np.array(
        [
            [1,2,3],
            [4,5,6]
        ]
)
y = np.array([1,2,3])
z = naive_add_matrix_and_vector(x,y)
print(z)
'''
[[2,4,6]
[5,7,9]]
'''
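
In practice NumPy broadcasts automatically, so the explicit loops are unnecessary; reusing x and y from above:

print(x + y)    # NumPy broadcasts the vector across each row
# [[2 4 6]
#  [5 7 9]]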
            
Dot Product (.)
vector(1D) . vector(1D) = scalar
matrix(2D) . vector(1D) = vector

[1,2,3].[2,3,4] = 1*2+2*3+3*4 = 20.0

import numpy as np
def naive_vector_dot(x, y):
    assert len(x.shape) == 1 and len(y.shape) == 1   # both inputs are vectors
    assert x.shape[0] == y.shape[0]                  # of the same length
    z = 0.                                           # float accumulator
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

x = np.array([1,2,3])
y = np.array([2,3,4])
z = naive_vector_dot(x, y)
print(z)        #20.0    
                        

| 1 2 3 | . [1 2 3] = [1*1 + 2*2 + 3*3, 4*1 + 5*2 + 6*3]
| 4 5 6 |           = [14.0, 32.0]
import numpy as np
def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2    #matrix
    assert len(y.shape) == 1    #vector
    assert x.shape[1] == y.shape[0]
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

y = np.array([1,2,3])       #y.shape = (3)
x = np.array(               #x.shape = (2,3) 
                [
                    [1,2,3],
                    [4,5,6]
                ]
            )
print(naive_matrix_vector_dot(x,y))
[14. 32.]
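
NumPy's built-in dot product gives the same result without the loops; reusing x and y from above:

print(np.dot(x, y))   # [14 32]
print(x @ y)          # the @ operator is equivalent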
                        
Tensor Reshaping
Reshaping a tensor means rearranging its rows and columns to match a target shape.
Shape (3, 2) to (6, 1) to (2, 3):

>>> x = np.array([[0, 1],
                  [2, 3],
                  [4, 5]])
>>> print(x.shape)      #See above how shape(3,2)
(3, 2)
>>> x = x.reshape((6, 1))
>>> x
array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])
>>> x = x.reshape((2, 3))
>>> x
array([[ 0, 1, 2],
       [ 3, 4, 5]])
        
Transpose
Changes shape (x, y) to (y, x):

>>> x = np.zeros((300, 20))
>>> x = np.transpose(x)
>>> print(x.shape)
(20, 300)
        

Tensor Terms

Term: Meaning
Data type (dtype): the type of the data stored in the tensor, e.g. float32, uint8, float64.
String tensors don't exist in NumPy (or in most other libraries), because tensors are preallocated contiguous memory segments and strings are variable-length.
Axis/ndim/rank/dimension: the number of axes of the tensor.
Dimension/Axis   Array
    0           np.array(12)                        # a point has no dimensions
    1           np.array([1, 2])                    # shape (2,), 1-dimensional
    2           np.array([[5, 78, 2, 34, 0],        # shape (3, 5), 2-dimensional
                          [6, 79, 3, 35, 1],
                          [7, 80, 4, 36, 2]])

Shape: tells how many elements the tensor has along each axis.
The matrix example above has shape (3, 5); the 3-D tensor example earlier has shape (2, 3, 3).

Tensorflow

An open-source ML library (exposing APIs) for numerical computation and large-scale machine learning; supports CPUs & GPUs.
Python front-end APIs; the backend is written in C++ for high performance.
            //Install conda https://docs.conda.io/projects/miniconda/en/latest/
            C:\Users\amitk\source\repos\Python> mkdir venv_ml1
            C:\Users\amitk\source\repos\Python> cd venv_ml1
            C:\Users\amitk\source\repos\Python\venv_ml1>"c:\Users\amitk\miniconda3\Scripts\activate" venv_ml1
            //Env is created here: C:\Users\amitk\miniconda3\envs
            (venv_ml1) C:\Users\amitk\source\repos\Python\venv_ml1>
            (venv_ml1) C:\Users\amitk\source\repos\Python\venv_ml1>"c:\Users\amitk\miniconda3\condabin\deactivate.bat"  //deactivate
            (venv_ml1) C:\Users\amitk\source\repos\Python\venv_ml1>pip install tensorflow
            Downloading tensorflow-2.6.2-cp36-cp36m-win_amd64.whl (423.3 MB)
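
Once installed, a quick check that TensorFlow imports and sees the hardware:

import tensorflow as tf
print(tf.__version__)                            # e.g. 2.6.2
print(tf.config.list_physical_devices('GPU'))    # empty list means CPU only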
        

Underfitting

The model does not produce good results even on the training data (it is too simple to capture the underlying patterns).

Variance

Variance is the tendency to learn random things unrelated to the real signal.