These functions can be used to define neural networks (NNs) with different configurations for other datasets as well.
- Introduction
- Initialize Parameters (W, b)
- Forward Propagation: [Linear -> ReLU] * (L-1) -> [Linear -> Sigmoid]
- Cost Function
- Backward Propagation
- Parameters Update
- Define NN Configuration
- Import Dataset
- Train NN (Forward Propagation, Backward Propagation, Weights Update)
- Cache Weights
- Test NN on the Test Dataset
To build an L-layer NN, we need a few helper functions. They are general-purpose, so they can be reused with other datasets as well.
- [Linear -> ReLU] for layers 1, 2, ..., L-1
- [Linear -> Sigmoid] for layer L
- Z[i] = W[i] . A[i-1] + b[i]
- A[i] = activation_function(Z[i])
- W[i]: (no. of units in current layer, no. of units in previous layer)
- b[i]: (no. of units in current layer, 1)
- A[i-1]: (no. of units in previous layer, m)
- Z[i] and A[i]: (no. of units in current layer, m)
- X (input): (no. of features, m), where m is the no. of examples (batch size)
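The shape rules above can be checked with a few lines of NumPy. This is a sketch with made-up layer sizes (4 units in the previous layer, 3 in the current one, 5 examples); it only illustrates how the (n, 1) bias vector is broadcast across the m example columns.

```python
import numpy as np

# Hypothetical sizes: 4 units in the previous layer, 3 in the current layer, m = 5 examples
n_prev, n_curr, m = 4, 3, 5
A_prev = np.random.randn(n_prev, m)   # (units in previous layer, m)
W = np.random.randn(n_curr, n_prev)   # (units in current layer, units in previous layer)
b = np.zeros((n_curr, 1))             # (units in current layer, 1)

Z = np.dot(W, A_prev) + b             # b is broadcast across the m columns
A = np.maximum(0, Z)                  # ReLU is element-wise, so A has the same shape as Z
print(Z.shape, A.shape)               # (3, 5) (3, 5)
```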
Initialize the parameters: W with small random values and b with zeros. These are the learnable parameters whose values are adjusted during training.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def initialize_parameters(layerdims):
    '''layerdims: List containing the no. of units in each layer
    Returns:
    parameters: A dict consisting of all learnable parameters (W1, b1, W2, b2, ...)
    '''
    parameters = {}
    L = len(layerdims)
    for i in range(1, L):
        # Small random weights break the symmetry between units
        parameters["W"+str(i)] = np.random.randn(layerdims[i], layerdims[i-1]) * 0.1
        # Biases can start at zero
        parameters["b"+str(i)] = np.zeros((layerdims[i], 1))
    return parameters
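A quick usage check of the initialization scheme: for a hypothetical 3-layer list [5, 4, 1], each W[i] should have shape (layerdims[i], layerdims[i-1]) and each b[i] shape (layerdims[i], 1). The function is restated compactly so the block runs on its own.

```python
import numpy as np

def initialize_parameters(layerdims):
    # Same scheme as above: W ~ N(0, 1) * 0.1, b = 0
    parameters = {}
    for i in range(1, len(layerdims)):
        parameters["W" + str(i)] = np.random.randn(layerdims[i], layerdims[i - 1]) * 0.1
        parameters["b" + str(i)] = np.zeros((layerdims[i], 1))
    return parameters

params = initialize_parameters([5, 4, 1])
for name, value in params.items():
    print(name, value.shape)
# W1 (4, 5)
# b1 (4, 1)
# W2 (1, 4)
# b2 (1, 1)
```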
- For the first L-1 layers (i.e. 1, 2, ..., L-1):
  - Linear -> Activation (ReLU)
- For the L-th layer (i.e. the last layer):
  - Linear -> Activation (sigmoid)
where A[0] = X, the input matrix.
def forward(A_prev, W, b, activation):
    ''' Forward propagation for a single layer
    A_prev: Activation from the previous layer (size of previous layer, batch_size)
            For the first layer, A[0] = X
    W: Weight matrix (size of current layer, size of previous layer)
    b: Bias vector (size of current layer, 1)
    activation: "sigmoid" or "relu"
    Returns:
    A: Output of the single layer
    cache = (A_prev, W, b, Z), used later during backpropagation
    '''
    # Linear step: Z = W . A_prev + b
    Z = np.add(np.dot(W, A_prev), b)
    # Activation function
    if activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    elif activation == "relu":
        A = np.maximum(0, Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    cache = (A_prev, W, b, Z)
    return A, cache
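To see one layer's forward step with concrete numbers, the sketch below uses a hypothetical toy layer (2 inputs, 2 units, 1 example) with hand-picked W and b. Here W . A_prev + b = [-1, 0.5], so ReLU gives [0, 0.5] and the sigmoid gives roughly [0.2689, 0.6225]. The function is a compact restatement of forward above.

```python
import numpy as np

def forward(A_prev, W, b, activation):
    # Compact restatement of the single-layer forward step
    Z = np.dot(W, A_prev) + b
    A = 1 / (1 + np.exp(-Z)) if activation == "sigmoid" else np.maximum(0, Z)
    return A, (A_prev, W, b, Z)

# Hypothetical toy layer: 2 inputs, 2 units, 1 example
A_prev = np.array([[1.0], [2.0]])
W = np.array([[1.0, -1.0], [0.5, 0.5]])
b = np.array([[0.0], [-1.0]])

A_relu, cache = forward(A_prev, W, b, "relu")     # Z = [-1, 0.5] -> ReLU -> [0, 0.5]
A_sig, _ = forward(A_prev, W, b, "sigmoid")       # sigmoid([-1, 0.5]) ≈ [0.2689, 0.6225]
print(A_relu.ravel(), A_sig.ravel())
```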
def L_layer_forward(X, parameters, layerdims):
    ''' Forward propagation for an L-layer network
    X: Input matrix (input size/no. of features, no. of examples/batch size)
    parameters: dict of {W1, b1, W2, b2, ...}
    layerdims: list with the no. of units in each layer, including the input layer
    Returns:
    y_hat: Output of forward propagation
    caches: list of (A_prev, W, b, Z), one per layer 1, 2, ..., L
    '''
    # layerdims[0] is the no. of units in the input layer
    L = len(layerdims) - 1
    A = X
    caches = []
    # [LINEAR -> RELU] * (L-1): forward through layers 1 to L-1
    for l in range(1, L):
        A_prev = A
        A, cache = forward(A_prev, parameters["W"+str(l)], parameters["b"+str(l)], "relu")
        caches.append(cache)
    # [LINEAR -> SIGMOID]: forward through the last layer
    y_hat, cache = forward(A, parameters["W"+str(L)], parameters["b"+str(L)], "sigmoid")
    caches.append(cache)
    return y_hat, caches
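A shape check of the full forward pass, assuming a hypothetical toy network (6 inputs -> 4 -> 3 -> 1) evaluated on 10 random examples. It restates forward and L_layer_forward compactly so it runs on its own; the point is that y_hat has shape (1, m), its values lie strictly between 0 and 1, and there is one cache per layer.

```python
import numpy as np

def forward(A_prev, W, b, activation):
    # Single-layer step: linear, then sigmoid or ReLU
    Z = np.dot(W, A_prev) + b
    A = 1 / (1 + np.exp(-Z)) if activation == "sigmoid" else np.maximum(0, Z)
    return A, (A_prev, W, b, Z)

def L_layer_forward(X, parameters, layerdims):
    # ReLU for layers 1..L-1, sigmoid for layer L; one cache per layer
    L = len(layerdims) - 1
    A, caches = X, []
    for l in range(1, L):
        A, cache = forward(A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    y_hat, cache = forward(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return y_hat, caches

# Hypothetical toy network: 6 inputs -> 4 -> 3 -> 1, evaluated on 10 random examples
layerdims = [6, 4, 3, 1]
rng = np.random.default_rng(1)
parameters = {}
for l in range(1, len(layerdims)):
    parameters["W" + str(l)] = rng.standard_normal((layerdims[l], layerdims[l - 1])) * 0.1
    parameters["b" + str(l)] = np.zeros((layerdims[l], 1))

X = rng.standard_normal((6, 10))
y_hat, caches = L_layer_forward(X, parameters, layerdims)
print(y_hat.shape, len(caches))   # (1, 10) 3
```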
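Compute the cross-entropy cost J over the m examples:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(\hat{y}^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$$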
def compute_cost(y_hat, Y):
    '''Computes the cross-entropy cost between predicted and true labels
    y_hat: Predicted output (1, no. of examples)
    Y: Actual label vector of 0/1 values (1, no. of examples)
    '''
    m = Y.shape[1]
    # Per-example term: y*log(y_hat) + (1-y)*log(1-y_hat)
    costt = np.add(np.multiply(Y, np.log(y_hat)), np.multiply(1-Y, np.log(1-y_hat)))
    # Average over the m examples and negate
    cost = (-1/m) * np.sum(costt, axis=1)
    return cost
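A worked example of the cost with hypothetical labels Y = [1, 0] and predictions y_hat = [0.9, 0.2]: J = -(log 0.9 + log 0.8) / 2 ≈ 0.1643. The function below restates compute_cost so the block runs on its own.

```python
import numpy as np

def compute_cost(y_hat, Y):
    # Binary cross-entropy averaged over the m examples
    m = Y.shape[1]
    costt = np.add(np.multiply(Y, np.log(y_hat)), np.multiply(1 - Y, np.log(1 - y_hat)))
    return (-1 / m) * np.sum(costt, axis=1)

# Hypothetical labels and predictions for m = 2 examples
Y = np.array([[1, 0]])
y_hat = np.array([[0.9, 0.2]])
cost = compute_cost(y_hat, Y)
print(np.squeeze(cost))   # ≈ 0.1643
```

Because `np.sum(..., axis=1)` returns an array of shape (1,), the training loop wraps the cost in `np.squeeze` before printing it.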
def backward(dA, cache, activation):
    '''Backward propagation for a single layer
    dA: Gradient of the cost wrt the activation A of the current layer
    cache: tuple of (A_prev, W, b, Z) stored during forward propagation
    activation: "relu" or "sigmoid"
    Returns:
    dA_prev: Gradient of the cost wrt the previous activation (same shape as A_prev)
    dW: Gradient of the cost wrt W (same shape as W)
    db: Gradient of the cost wrt b (same shape as b)
    '''
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    # Derivative of the cost wrt Z: dZ = dA * g'(Z)
    if activation == "relu":
        # ReLU'(Z) is 1 where Z > 0 and 0 elsewhere
        dZ = np.array(dA, copy=True)
        dZ[Z <= 0] = 0
    elif activation == "sigmoid":
        # sigmoid'(Z) = s * (1 - s)
        s = 1/(1+np.exp(-Z))
        dZ = dA * s * (1-s)
    else:
        raise ValueError("activation must be 'relu' or 'sigmoid'")
    # Derivatives of the cost wrt A_prev, W and b
    dA_prev = np.dot(W.T, dZ)
    dW = (1/m) * np.dot(dZ, A_prev.transpose())
    db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
    return dA_prev, dW, db
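A common way to sanity-check backward is to compare its analytic dW and db against central finite-difference estimates of the cost. The sketch below does this for a single sigmoid layer on random data; the sizes, seed and eps are arbitrary choices, and the helpers cost_fn and backward_sigmoid are hypothetical restatements of compute_cost and the sigmoid branch of backward.

```python
import numpy as np

def cost_fn(W, b, X, Y):
    # Cross-entropy of a single sigmoid layer, same formula as compute_cost
    A = 1 / (1 + np.exp(-(np.dot(W, X) + b)))
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / Y.shape[1]

def backward_sigmoid(dA, cache):
    # Sigmoid branch of backward(): dZ = dA * s * (1 - s)
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)
    dA_prev = np.dot(W.T, dZ)
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    return dA_prev, dW, db

# Hypothetical data: 3 features, 4 examples
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
Y = np.array([[1, 0, 1, 0]])
W = rng.standard_normal((1, 3)) * 0.1
b = np.zeros((1, 1))

# Analytic gradients, starting from the same dAL formula as L_layer_backward
Z = np.dot(W, X) + b
A = 1 / (1 + np.exp(-Z))
dA = -(np.divide(Y, A) - np.divide(1 - Y, 1 - A))
_, dW, db = backward_sigmoid(dA, (X, W, b, Z))

# Central finite differences: (J(w + eps) - J(w - eps)) / (2 * eps)
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(W.shape[1]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    dW_num[0, j] = (cost_fn(W_plus, b, X, Y) - cost_fn(W_minus, b, X, Y)) / (2 * eps)
db_num = (cost_fn(W, b + eps, X, Y) - cost_fn(W, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dW - dW_num)), abs(db[0, 0] - db_num))   # both should be tiny
```

The 1/m factor lives in dW and db rather than in dAL, which is why dAL can be passed in without it and the two gradients still agree.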
def L_layer_backward(y_hat, Y, caches, layerdims):
    ''' Backward propagation from layer L down to layer 1
    y_hat: predicted output
    Y: true labels
    caches: list of caches stored during forward propagation
        (A_prev, W, b, Z) for layers 1, 2, ..., L-1 (relu)
        (A_prev, W, b, Z) for layer L (sigmoid)
    layerdims: list with the no. of units in each layer
    Returns:
    grads: dict of gradients (dA_l, dW_l, db_l), used when updating the parameters.
        grads["dA"+str(l)] is the gradient wrt A_prev of layer l, i.e. wrt A[l-1]
    '''
    grads = {}
    AL = y_hat
    L = len(layerdims) - 1
    # Initializing backpropagation: derivative of the cost wrt AL
    dAL = -(np.divide(Y, AL) - np.divide(1-Y, 1-AL))
    # Lth layer (SIGMOID -> LINEAR) gradients
    # Inputs: (dAL, caches[-1])  Outputs: (grads["dAL"], grads["dWL"], grads["dbL"])
    grads["dA"+str(L)], grads["dW"+str(L)], grads["db"+str(L)] = backward(dAL, caches[-1], activation="sigmoid")
    # Layers L-1, ..., 1 (RELU -> LINEAR) gradients
    # Inputs: (grads["dA"+str(l+1)], caches)  Outputs: (grads["dA_l"], grads["dW_l"], grads["db_l"])
    for i in reversed(range(L-1)):
        current_cache = caches[i]
        dA_prev, dW, db = backward(grads["dA"+str(i+2)], current_cache, activation="relu")
        grads["dA"+str(i+1)] = dA_prev
        grads["dW"+str(i+1)] = dW
        grads["db"+str(i+1)] = db
    return grads
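The index bookkeeping in L_layer_backward is easy to get off by one: layer l uses caches[l-1], and the gradient key it consumes is "dA" + str(l+1). The hypothetical helper below only lists the schedule (layer, cache index, activation, incoming gradient key) for a given L, without doing any math.

```python
def backward_schedule(L):
    # (layer, cache index, activation, incoming gradient key), in the order they run
    steps = [(L, L - 1, "sigmoid", "dAL")]
    for i in reversed(range(L - 1)):
        steps.append((i + 1, i, "relu", "dA" + str(i + 2)))
    return steps

for step in backward_schedule(4):
    print(step)
# (4, 3, 'sigmoid', 'dAL')
# (3, 2, 'relu', 'dA4')
# (2, 1, 'relu', 'dA3')
# (1, 0, 'relu', 'dA2')
```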
These parameters are updated using gradient descent:
$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$
where $\alpha$ is the learning rate.
def update_params(params, grads, learning_rate):
    '''
    params: dict of (W1, b1, W2, b2, ...)
    grads: gradients (dA, dW, db) stored during backpropagation
    learning_rate: step size alpha
    returns: updated parameters
    '''
    # Each layer has 2 parameters (W, b)
    L = len(params) // 2
    for l in range(L):
        params["W"+str(l+1)] = params["W"+str(l+1)] - learning_rate * grads["dW"+str(l+1)]
        params["b"+str(l+1)] = params["b"+str(l+1)] - learning_rate * grads["db"+str(l+1)]
    return params
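One gradient descent step with hypothetical numbers: with alpha = 0.1, W1 = [[1, 2]] and dW1 = [[0.5, -1]] give [[0.95, 2.1]], and b1 = [[0]] with db1 = [[0.2]] gives [[-0.02]]. The function restates update_params so the block runs on its own.

```python
import numpy as np

def update_params(params, grads, learning_rate):
    # One gradient descent step: theta = theta - alpha * d_theta, for every W and b
    L = len(params) // 2
    for l in range(1, L + 1):
        params["W" + str(l)] = params["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        params["b" + str(l)] = params["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return params

# Hypothetical one-layer parameters and gradients
params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.0]])}
grads = {"dW1": np.array([[0.5, -1.0]]), "db1": np.array([[0.2]])}
params = update_params(params, grads, learning_rate=0.1)
print(params["W1"], params["b1"])   # W1 -> [[0.95, 2.1]], b1 -> [[-0.02]]
```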
Classify images as "cat" vs "non-cat":
- Load the dataset
- Define an NN
- Train on the training data and test on the test data
from utils import load_data
# Loading the dataset
train_x, train_Y, test_x, test_Y, classes = load_data()
# Shape: (no. of examples, image height, image width, no. of channels (RGB))
print(f"Training Input Data {train_x.shape}")
# Labels are 0/1: 1 if the image is a cat, else 0
print(f"Training labels{train_Y.shape}")
# Example: display one training image
index = 11
plt.imshow(train_x[index])
Training Input Data (209, 64, 64, 3)
Training labels(1, 209)
# Reshape each image to a vector: X becomes (64*64*3, no. of examples)
# reshape(m, -1) keeps each image's pixels together; .T puts one image per column
train_Xflat = train_x.reshape(train_x.shape[0], -1).T
test_Xflat = test_x.reshape(test_x.shape[0], -1).T
# Scaling pixel values between 0 and 1
train_X = train_Xflat / 255
test_X = test_Xflat / 255
print(f"Training Data{train_X.shape}")
print(f"Test data{test_X.shape}")
Training Data(12288, 209)
Test data(12288, 50)
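Flattening a batch of shape (m, H, W, C) into (H*W*C, m) only works if every column holds the pixels of a single image. The sketch below checks this on a hypothetical mini-batch of three 2x2x1 "images", each filled with its own index: `reshape(m, -1).T` keeps each image in its own column, while `reshape(-1, m)` produces the same shape but interleaves pixels from different images.

```python
import numpy as np

# Hypothetical mini-batch: 3 "images" of shape (2, 2, 1), each filled with its own index
m = 3
x = np.stack([np.full((2, 2, 1), i) for i in range(m)])   # shape (3, 2, 2, 1)

good = x.reshape(m, -1).T   # (4, 3): column j holds only image j's pixels
bad = x.reshape(-1, m)      # (4, 3): same shape, but columns mix different images

print(good[:, 0], bad[:, 0])   # [0 0 0 0] [0 0 1 2]
```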
1. Define the model configuration (no. of layers and units in each layer)
2. Initialize parameters
3. Loop:
   a. Forward propagation
   b. Compute the cost function
   c. Backward propagation
   d. Update parameters (W, b) using the gradients computed from the cached values
4. Use the learned parameters to predict labels
layer[0]: Input layer (12288, no. of examples)
layer[1]: 20 units with ReLU activation
layer[2]: 7 units with ReLU activation
layer[3]: 5 units with ReLU activation
layer[4]: Output layer, 1 unit with sigmoid activation
# Model configuration
# len(layer_dims) is the no. of layers, including the input and output layers
layer_dims = [12288, 20, 7, 5, 1]
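To get a feel for the model size, the number of learnable parameters follows directly from layer_dims: layer l has layer_dims[l] * layer_dims[l-1] weights plus layer_dims[l] biases. The helper count_parameters below is hypothetical, written only for this illustration.

```python
def count_parameters(layer_dims):
    # Sum of W[l] sizes (units_l * units_{l-1}) and b[l] sizes (units_l) over all layers
    return sum(layer_dims[l] * layer_dims[l - 1] + layer_dims[l]
               for l in range(1, len(layer_dims)))

print(count_parameters([12288, 20, 7, 5, 1]))   # 245973
```

Almost all of them (20 × 12288 = 245,760) sit in W1, because the first layer connects every input pixel to every hidden unit.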
def image_classifier(X, Y, layer_dims, learning_rate, num_itr, parameters, initialize=False):
    ''' Implements an L-layer NN:
    [Linear->Relu] * (L-1) times -> [Linear->Sigmoid]
    X: Input data (images) (Height*Width*3, no. of examples)
    Y: True labels of 0/1 values (1, no. of examples)
    layer_dims: list, where each value is the no. of units in a layer
    learning_rate: step size for the parameters update
    num_itr: no. of gradient descent iterations
    parameters: parameters learnt during earlier training (ignored if initialize=True)
    initialize: if True, start from freshly initialized parameters
    Returns the trained parameters
    '''
    if initialize:
        parameters = initialize_parameters(layer_dims)
    costs = []
    # Gradient descent
    for i in range(num_itr):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        y_hat, caches = L_layer_forward(X, parameters, layer_dims)
        # Compute cost
        cost = compute_cost(y_hat, Y)
        # Backward propagation
        grads = L_layer_backward(y_hat, Y, caches, layer_dims)
        # Update parameters
        parameters = update_params(parameters, grads, learning_rate)
        if i % 200 == 0:
            print(f"cost {i}: {np.squeeze(cost)}")
        if i % 100 == 0:
            costs.append(np.squeeze(cost))
    # Plotting the cost
    plt.plot(costs)
    plt.ylabel("cost")
    plt.xlabel("n iteration (per hundred)")
    plt.show()
    return parameters
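The training loop above depends on all the earlier helpers and on the cat dataset. As a self-contained sketch under stated assumptions, the hypothetical `train` function below compresses the same L-layer scheme (ReLU hidden layers, sigmoid output, cross-entropy cost, plain gradient descent) into one function and runs it on synthetic 2-D data. It takes one shortcut the original code does not: for a sigmoid output with cross-entropy cost, dZ_L simplifies to A_L - Y, so dAL and the sigmoid derivative are not computed separately.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def train(X, Y, layer_dims, learning_rate, num_itr, seed=0):
    # [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID, trained with gradient descent
    rng = np.random.default_rng(seed)
    L = len(layer_dims) - 1
    P = {}
    for l in range(1, L + 1):
        P["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.1
        P["b" + str(l)] = np.zeros((layer_dims[l], 1))
    m = X.shape[1]
    costs = []
    for i in range(num_itr):
        # Forward pass, keeping (A_prev, Z) for every layer
        A, caches = X, []
        for l in range(1, L + 1):
            Z = P["W" + str(l)] @ A + P["b" + str(l)]
            caches.append((A, Z))
            A = sigmoid(Z) if l == L else np.maximum(0, Z)
        costs.append(float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))))
        # Backward pass: sigmoid + cross-entropy gives dZ_L = A_L - Y
        dZ = A - Y
        for l in range(L, 0, -1):
            A_prev = caches[l - 1][0]
            dW = dZ @ A_prev.T / m
            db = np.sum(dZ, axis=1, keepdims=True) / m
            if l > 1:
                # Propagate with the pre-update W, then apply ReLU'(Z) of layer l-1
                dZ = (P["W" + str(l)].T @ dZ) * (caches[l - 2][1] > 0)
            P["W" + str(l)] -= learning_rate * dW
            P["b" + str(l)] -= learning_rate * db
    return P, costs

# Hypothetical linearly separable toy data: label is 1 when x1 + x2 > 0
rng = np.random.default_rng(42)
X = rng.standard_normal((2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)
P, costs = train(X, Y, [2, 8, 4, 1], learning_rate=0.1, num_itr=1000)
print(costs[0], costs[-1])
```

With small initial weights the network starts near y_hat = 0.5, so the first cost is close to log 2 ≈ 0.693; the exact final value depends on the seed and learning rate.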
parameters=image_classifier(train_X, train_Y, layer_dims, 0.05, 500, 0, initialize=True)
parameters=image_classifier(train_X, train_Y, layer_dims, 0.009, 500, parameters, initialize=False)
parameters=image_classifier(train_X, train_Y, layer_dims, 0.005, 1000, parameters, initialize=False)
cost 0: 0.6905574738743139
cost 200: 0.6259687089377447
cost 400: 0.5491339571191219
cost 0: 0.5165736771011232
cost 200: 0.44000740507364766
cost 400: 0.41408147269860646
cost 0: 0.41267587248977683
cost 200: 0.3509585569849044
cost 400: 0.34337928409686364
cost 600: 0.33967454342915293
cost 800: 0.3061756591021378
parameters = image_classifier(train_X, train_Y, layer_dims, 0.006, 2000, parameters, initialize=False)
cost 0: 0.09455121331197718
cost 200: 0.08905332813952466
cost 400: 0.08371083394364122
cost 600: 0.0793367319090336
cost 800: 0.07536823660230452
cost 1000: 0.07197433309116627
cost 1200: 0.06872838900925801
cost 1400: 0.06625443634187707
cost 1600: 0.06437765728954029
cost 1800: 0.0620562741762558
# Caching weights
import pickle
with open('parameters.pkl', 'wb') as f:
    pickle.dump(parameters, f)
with open('parameters.pkl', 'rb') as f:
    parameters = pickle.load(f)
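# Defining predict function
def predict(X, Y, parameters):
    '''Returns the accuracy and the 0/1 predictions for inputs X with labels Y'''
    m = X.shape[1]
    # Forward propagation (uses the global layer_dims defined above)
    y_hat, caches = L_layer_forward(X, parameters, layer_dims)
    # Threshold the sigmoid output at 0.5
    predicted = np.where(y_hat > 0.5, 1, 0)
    accuracy = np.sum(predicted == Y) / m
    return accuracy, predicted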
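The thresholding and accuracy steps of predict, shown on hypothetical sigmoid outputs for 4 examples. Note that the comparison is strict, so an output of exactly 0.5 is predicted as 0.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels for 4 examples
y_hat = np.array([[0.2, 0.7, 0.5, 0.9]])
Y = np.array([[0, 1, 1, 1]])

predicted = np.where(y_hat > 0.5, 1, 0)          # strict threshold: 0.5 maps to 0
accuracy = np.sum(predicted == Y) / Y.shape[1]   # 3 of 4 correct
print(predicted, accuracy)                       # [[0 1 0 1]] 0.75
```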
# Find the indices of wrongly predicted images
def mislabel(X, Y, y_hat):
    # Y and y_hat are 0/1, so Y + y_hat == 1 exactly where they disagree
    xx = np.array(Y + y_hat)
    mislabel_index = (np.where(xx == 1))[1]
    return mislabel_index
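A small check of the mislabel logic with hypothetical labels and predictions: Y + y_hat is 2 for a correct 1, 0 for a correct 0, and 1 only where the label and prediction differ, so `np.where(...)[1]` returns the column (example) indices of the mistakes.

```python
import numpy as np

def mislabel(X, Y, y_hat):
    # Y + y_hat is 1 exactly where the 0/1 label and the 0/1 prediction disagree
    xx = np.array(Y + y_hat)
    return np.where(xx == 1)[1]

Y = np.array([[1, 0, 1, 0]])           # hypothetical true labels
predicted = np.array([[1, 1, 0, 0]])   # hypothetical predictions
print(mislabel(None, Y, predicted))    # [1 2]
```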
train_accuracy, train_predictedValues = predict(train_X, train_Y, parameters)
print(f"Train Data Accuracy: {train_accuracy}")
# Higher accuracy may be obtained by training again with a lower learning_rate
# Show a mislabeled image
mislabel_index = mislabel(train_X, train_Y, train_predictedValues)
print("Mislabel Index", mislabel_index)
indexx = mislabel_index[1]
plt.imshow(train_x[indexx])
Train Data Accuracy: 0.9856459330143541
Mislabel Index [105 131 139]
