
🧠 Core AI

Deep Learning

Neural networks, CNNs, RNNs, and Transformers

Deep dive into neural networks, convolutional networks, recurrent networks, and transformer architectures using PyTorch. Build real-world models from scratch.

150 lessons

34 hrs

Intermediate

What you will learn

Build real-world projects from scratch

Write clean, production-ready code

Understand core concepts deeply with diagrams

Follow industry best practices

Get hands-on with code in every lesson

Access lifetime updates as the tech evolves

Curriculum

Module 1: Why Deep Learning (4 lessons, free preview)
Where Classical ML Hits a Ceiling (14 min)
A Brief History — 1958 to Today (16 min)
What Makes a Network 'Deep' (12 min)
Setting Up Your Deep Learning Environment (10 min)

Module 2: The Single Neuron (4 lessons)
Anatomy of a Single Neuron (16 min)
From One Neuron to a Layer (14 min)
Why a Single Neuron Still Can't Solve XOR (12 min)
Implementing a Neuron in PyTorch (14 min)

Module 3: Multi-Layer Networks (4 lessons)
Stacking Layers to Bend the Decision Boundary (14 min)
Building and Solving XOR for Real (12 min)
Hidden Layers and Representation (12 min)
Building a Multi-Layer Network in PyTorch with nn.Sequential (12 min)

Module 4: Activation Functions (6 lessons)
Why Non-Linearity Is Required at All (14 min)
Sigmoid and Its Hidden Problem (12 min)
Tanh — A Centered Alternative (10 min)
ReLU — The Fix That Took Over (12 min)
Softmax — For When There Are Multiple Classes (10 min)
Choosing the Right Activation and Implementing All Four in PyTorch (12 min)

Module 5: Forward Propagation (3 lessons)
The Forward Pass as One Clean Pipeline (12 min)
What Actually Flows Through the Network (10 min)
Scaling to Deeper Networks (12 min)

Module 6: Loss Functions (5 lessons)
What a Loss Function Actually Measures (10 min)
Mean Squared Error — For Regression (12 min)
Binary Cross-Entropy — For Binary Classification (14 min)
Categorical Cross-Entropy — For Multi-Class Classification (14 min)
Computing Loss in PyTorch and Visualizing the Loss Surface (12 min)

Module 7: Backpropagation (6 lessons)
The Problem — Finding the Bottom of the Hill Without Sweeping (10 min)
The Chain Rule, Explained With a Simple Example (12 min)
Backpropagation Through One Neuron (16 min)
Backpropagation Through the Full XOR Network (18 min)
Gradient Descent — Using the Gradient to Actually Update Weights (16 min)
Autograd — Letting PyTorch Compute Gradients Automatically (14 min)

Module 8: Building a Real Training Loop (4 lessons)
The Standard PyTorch Training Loop Pattern (10 min)
Tracking and Interpreting Training Progress (10 min)
A Slightly Harder Problem Than XOR — The Circles Dataset (12 min)
Common Beginner Training Bugs and How to Spot Them (14 min)

Module 9: The Vanishing Gradient Problem, Formally (4 lessons)
Tracing Gradients Backward Through a Genuinely Deep Network (14 min)
Why Multiplying Many Small Slopes Is Catastrophic (10 min)
Comparing Sigmoid and ReLU Gradient Flow Through Depth (12 min)
The Remaining Problem ReLU Doesn't Solve (10 min)

Module 10: Weight Initialization (5 lessons)
Why Initialization Isn't Just "Pick Small Random Numbers" (10 min)
The Variance Problem, Explained Simply (12 min)
Xavier/Glorot Initialization — For Sigmoid and Tanh (14 min)
He Initialization — For ReLU (14 min)
Choosing the Right Initialization in PyTorch (12 min)

Module 11: Better Optimizers (4 lessons)
Why Plain Gradient Descent Struggles (12 min)
Momentum — Remembering Previous Steps (14 min)
Adam — Adapting the Learning Rate Per Weight (16 min)
Choosing an Optimizer in PyTorch (12 min)

Module 12: Regularization (5 lessons)
When Training Loss Lies — Introducing Overfitting (14 min)
Why Overfitting Happens — The Network Memorizing Noise (10 min)
L2 Regularization (Weight Decay) — Penalizing Large Weights (14 min)
Dropout — Randomly Disabling Neurons During Training (14 min)
Putting It Together — A Properly Regularized Training Pipeline (12 min)

Module 13: Why Dense Networks Fail on Images (4 lessons)
Counting the Parameters — Why Dense Networks Explode in Size (10 min)
No Spatial Awareness — A Dense Layer Doesn't Know What "Nearby" Means (12 min)
What a Real Network Needs — Translation Invariance (12 min)
Setting Up for Convolution (8 min)

Module 14: The Convolution Operation (5 lessons)
The Sliding Window — Convolution, By Hand (14 min)
Why This Solves Module 13's Three Problems (14 min)
Padding and Stride — Controlling Output Size (12 min)
Multiple Filters and Channels (12 min)
Implementing Convolution in PyTorch — nn.Conv2d (14 min)

Module 15: CNN Architecture — LeNet (4 lessons)
Pooling — Shrinking Feature Maps Deliberately (12 min)
LeNet's Architecture, Piece by Piece (12 min)
Building LeNet in PyTorch (12 min)
Training LeNet on Real Image Data (16 min)

Module 16: Going Deeper — From LeNet Toward AlexNet (4 lessons)
What Changed Between 1998 and 2012 (10 min)
Deeper CNNs Hit the Same Vanishing Gradient Wall (14 min)
Applying the Earlier Training Fixes to CNNs (14 min)
Building a Deeper CNN and Training on CIFAR-10 (16 min)

Module 17: VGG and the Limits of Plain Stacking (4 lessons)
VGG's Idea: Small Filters, Stacked Deep, Uniformly (14 min)
Pushing Depth Further: Does More Always Mean Better? (16 min)
Why This Isn't the Vanishing Gradient Problem Again (14 min)
Setting Up for Skip Connections (9 min)

Module 18: ResNet and Skip Connections (4 lessons)
Residual Learning: Learning a Correction Instead of a Mapping (13 min)
Building a Residual Block in PyTorch (15 min)
Proving Residual Blocks Fix Module 17's Degradation Problem (14 min)
Building and Training a Small ResNet on CIFAR-10 (17 min)

Module 19: Transfer Learning (4 lessons)
Why Train From Scratch When You Don't Have To (11 min)
Feature Extraction: Freezing the Backbone (15 min)
Fine-Tuning: Unfreezing Carefully with Differential Learning Rates (15 min)
Transfer Learning vs Training From Scratch on CIFAR-10 (16 min)

Module 20: Why CNNs and Dense Networks Fail on Sequences (4 lessons)
Sequences Break the Fixed-Size Input Assumption (10 min)
Order Carries Meaning — And These Architectures Can't Use It (13 min)
Convolution's Local Window Can't Capture Long-Range Sequence Dependencies (12 min)
Setting Up for Recurrent Networks: Memory Instead of a Window (9 min)

Module 21: Recurrent Neural Networks (4 lessons)
The RNN Cell — One Hidden State, Updated One Step at a Time (15 min)
Building an RNN in PyTorch with nn.RNN (14 min)
Backpropagation Through Time and the Vanishing Gradient's Return (16 min)
Training an RNN on a Real Sequence Task (16 min)

Module 22: LSTM — Fixing the Vanishing Gradient Through Time (4 lessons)
The Cell State — An Additive Memory Highway, Not a Multiplicative Chain (15 min)
Building an LSTM in PyTorch with nn.LSTM (13 min)
Proving LSTM Preserves Gradients Across Far More Time Steps (15 min)
Training LSTM vs RNN on a Longer Sequence Task (16 min)

Module 23: GRU — A Simpler Gated Alternative (4 lessons)
GRU's Simplification — One State, Two Gates Instead of Three (14 min)
Building a GRU in PyTorch with nn.GRU (12 min)
Does GRU's Simplification Cost Anything? Measuring Gradient Preservation (14 min)
RNN vs GRU vs LSTM — Training Head-to-Head on the Same Task (15 min)

Module 24: Sequence-to-Sequence Models (4 lessons)
The New Problem: Sequence In, Sequence Out (11 min)
Building the Encoder-Decoder Architecture (16 min)
Teacher Forcing — Training the Decoder Without Compounding Errors (14 min)
Training a Complete Seq2Seq Model on a Toy Translation Task (16 min)

Module 25: The Bottleneck Problem in Sequence-to-Sequence Models (4 lessons)
Squeezing an Entire Sentence Into One Fixed-Size Vector (11 min)
Measuring the Bottleneck — Does Accuracy Actually Degrade With Length? (15 min)
Why a Bigger Hidden State Only Delays the Problem, Rather Than Solving It (13 min)
Setting Up for Attention: Let the Decoder Look Back at the Whole Input (9 min)

Module 26: The Attention Mechanism (4 lessons)
The Alignment Score — How Much Should the Decoder Care About Each Word? (16 min)
From Scores to Weights to a Context Vector (14 min)
The Complete Architecture — Encoder, Attention, and Decoder Working Together (17 min)
Training With Attention — Fixing the Bottleneck and Seeing What It Learned (17 min)

Module 27: Self-Attention (4 lessons)
From Attending Across Sequences to Attending Within One (13 min)
Query, Key, and Value — The Precise Structure Behind Every Attention Layer (16 min)
Scaled Dot-Product Attention — The Complete, Exact Formula (15 min)
Multi-Head Attention — Several Relationships at Once, and a Complete Sentence Example (18 min)

Module 28: The Transformer Architecture (4 lessons)
Positional Encoding — Giving Self-Attention a Sense of Order (15 min)
Add & Norm — Residual Connections and Layer Normalization Around Each Sublayer (14 min)
The Feed-Forward Network — and Assembling One Complete Encoder Block (16 min)
The Decoder, Masked Self-Attention, and the Complete Transformer (20 min)

Module 29: Tokenization Deep Dive (4 lessons)
Why Word-Level Tokenization Breaks at Real Scale (12 min)
Byte Pair Encoding — Building a Subword Vocabulary by Hand (18 min)
Byte-Level BPE — How Real Tokenizers Handle Any Text at All (14 min)
Why Tokenization Choices Ripple Into Everything an LLM Does (15 min)

Module 30: Embeddings Revisited — Static vs Contextual (4 lessons)
What a Learned Embedding Actually Captures — Word2Vec From Scratch (16 min)
Why One Fixed Vector Per Word Isn't Enough (13 min)
Contextual Embeddings — The Fix Was Already Inside Module 28's Transformer (16 min)
Static vs Contextual — Real Tradeoffs, Not a One-Sided Verdict (14 min)

Module 31: Pretraining — Self-Supervised Learning at Scale (4 lessons)
The Self-Supervised Insight — Raw Text Is Its Own Training Signal (13 min)
GPT's Decoder-Only Architecture — Stripping Module 28 Down to What Pretraining Needs (16 min)
Training a Small GPT on Real Text and Generating From It (17 min)
Scale, Emergent Behavior, and the Bridge to Fine-Tuning (12 min)

Module 32: Fine-Tuning — From Raw Prediction to Helpful Assistant (3 lessons)
Supervised Fine-Tuning — Teaching a Pretrained Model to Follow Instructions (16 min)
Reward Modeling — Learning What Humans Prefer (15 min)
Reinforcement Learning From Human Feedback — Closing the Loop (16 min)

Module 33: Efficient Inference — KV Caching and Quantization (3 lessons)
Naive Generation Recomputes the Same Work Over and Over (14 min)
The KV Cache — Storing Once, Reusing Forever (16 min)
Quantization — Storing Weights With Fewer Bits (15 min)

Module 34: Decoding Strategies — Temperature, Top-k, Top-p, and Beam Search (4 lessons)
Greedy Decoding and Pure Random Sampling — Both Extremes Fail (12 min)
Temperature — Reshaping the Distribution Before Sampling (14 min)
Top-k and Top-p (Nucleus) Sampling — Removing the Tail Explicitly (16 min)
Beam Search — Optimizing for the Best Overall Sequence, Not Just the Next Token (17 min)

Module 35: Training at Scale — Mixed Precision and Distributed Training (2 lessons)
Mixed Precision — Training Faster With Fewer Bits, Safely (16 min)
Distributed Data Parallel Training — Multiple GPUs, One Model, Synchronized Gradients (17 min)

Module 36: Generative Models — Autoencoders, VAEs, and GANs (4 lessons)
Autoencoders — Compressing Data Down to Its Essentials and Back (15 min)
Why a Trained Autoencoder Cannot Reliably Generate New Data (13 min)
Variational Autoencoders — Shaping the Latent Space So Sampling Works (17 min)
Generative Adversarial Networks — Two Networks Competing (18 min)

Module 37: Deploying Deep Learning Models — ONNX and Real Inference Serving (3 lessons)
Why a Trained Model Is Not Automatically a Deployable One (12 min)
ONNX — A Portable Model Format, Exported and Verified (16 min)
Wrapping a Model in a Real Inference Server With FastAPI (15 min)

150 total lessons across 37 modules. Preview Module 1 free.


₹999 (regular price ₹1,999)

50% off — limited time

Lifetime access
150 structured lessons
Code snippets and diagrams
Certificate of completion
Weekly content updates