🧠 Core AI
Deep Learning
Neural networks, CNNs, RNNs, and Transformers
A deep dive into neural networks, convolutional networks, recurrent networks, and transformer architectures using PyTorch. Build real-world models from scratch.
150 lessons
28 hrs
Intermediate
What you will learn
Build real-world projects from scratch
Write clean, production-ready code
Understand core concepts deeply with diagrams
Follow industry best practices
Get hands-on with code in every lesson
Access lifetime updates as the tech evolves
Curriculum
Module 1: Why Deep Learning (4 lessons)
Module 2: The Single Neuron (4 lessons)
Anatomy of a Single Neuron (16 min)
From One Neuron to a Layer (14 min)
Why a Single Neuron Still Can't Solve XOR (12 min)
Implementing a Neuron in PyTorch (14 min)
Module 3: Multi-Layer Networks (4 lessons)
Stacking Layers to Bend the Decision Boundary (14 min)
Building and Solving XOR for Real (12 min)
Hidden Layers and Representation (12 min)
Building a Multi-Layer Network in PyTorch with nn.Sequential (12 min)
Module 4: Activation Functions (6 lessons)
Why Non-Linearity Is Required At All (14 min)
Sigmoid and Its Hidden Problem (12 min)
Tanh — A Centered Alternative (10 min)
ReLU — The Fix That Took Over (12 min)
Softmax — For When There Are Multiple Classes (10 min)
Choosing the Right Activation, and Implementing All Four in PyTorch (12 min)
Module 5: Forward Propagation (3 lessons)
The Forward Pass as One Clean Pipeline (12 min)
What Actually Flows Through the Network (10 min)
Scaling to Deeper Networks (12 min)
Module 6: Loss Functions (5 lessons)
What a Loss Function Actually Measures (10 min)
Mean Squared Error — For Regression (12 min)
Binary Cross-Entropy — For Binary Classification (14 min)
Categorical Cross-Entropy — For Multi-Class Classification (14 min)
Computing Loss in PyTorch and Visualizing the Loss Surface (12 min)
Module 7: Backpropagation (6 lessons)
The Problem — Finding the Bottom of the Hill Without Sweeping (10 min)
The Chain Rule, Explained With a Simple Example (12 min)
Backpropagation Through One Neuron (16 min)
Backpropagation Through the Full XOR Network (18 min)
Gradient Descent — Using the Gradient to Actually Update Weights (16 min)
Autograd — Letting PyTorch Compute Gradients Automatically (14 min)
Module 8: Building a Real Training Loop (4 lessons)
The Standard PyTorch Training Loop Pattern (10 min)
Tracking and Interpreting Training Progress (10 min)
A Slightly Harder Problem Than XOR — The Circles Dataset (12 min)
Common Beginner Training Bugs and How to Spot Them (14 min)
Module 9: The Vanishing Gradient Problem, Formally (4 lessons)
Tracing Gradients Backward Through a Genuinely Deep Network (14 min)
Why Multiplying Many Small Slopes Is Catastrophic (10 min)
Comparing Sigmoid vs ReLU's Gradient Flow Through Depth (12 min)
The Remaining Problem ReLU Doesn't Solve (10 min)
Module 10: Weight Initialization (5 lessons)
Why Initialization Isn't Just "Pick Small Random Numbers" (10 min)
The Variance Problem, Explained Simply (12 min)
Xavier/Glorot Initialization — For Sigmoid and Tanh (14 min)
He Initialization — For ReLU (14 min)
Choosing the Right Initialization in PyTorch (12 min)
Module 11: Better Optimizers (4 lessons)
Why Plain Gradient Descent Struggles (12 min)
Momentum — Remembering Previous Steps (14 min)
Adam — Adapting the Learning Rate Per Weight (16 min)
Choosing an Optimizer in PyTorch (12 min)
Module 12: Regularization (5 lessons)
When Training Loss Lies — Introducing Overfitting (14 min)
Why Overfitting Happens — The Network Memorizing Noise (10 min)
L2 Regularization (Weight Decay) — Penalizing Large Weights (14 min)
Dropout — Randomly Disabling Neurons During Training (14 min)
Putting It Together — A Properly Regularized Training Pipeline (12 min)
Module 13: Why Dense Networks Fail on Images (4 lessons)
Counting the Parameters — Why Dense Networks Explode in Size (10 min)
No Spatial Awareness — A Dense Layer Doesn't Know What "Nearby" Means (12 min)
What a Real Network Needs — Translation Invariance (12 min)
Setting Up for Convolution (8 min)
Module 14: The Convolution Operation (5 lessons)
The Sliding Window — Convolution, By Hand (14 min)
Why This Solves Module 13's Three Problems (14 min)
Padding and Stride — Controlling Output Size (12 min)
Multiple Filters and Channels (12 min)
Implementing Convolution in PyTorch — nn.Conv2d (14 min)
Module 15: CNN Architecture — LeNet (4 lessons)
Pooling — Shrinking Feature Maps Deliberately (12 min)
LeNet's Architecture, Piece by Piece (12 min)
Building LeNet in PyTorch (12 min)
Training LeNet on Real Image Data (16 min)
Module 16: Going Deeper — From LeNet Toward AlexNet (4 lessons)
What Changed Between 1998 and 2012 (10 min)
Deeper CNNs Hit the Same Vanishing Gradient Wall (14 min)
Applying Acts 1-2's Fixes to CNNs (14 min)
Building a Deeper CNN and Training on CIFAR-10 (16 min)
Module 17: VGG and the Limits of Plain Stacking (4 lessons)
VGG's Idea: Small Filters, Stacked Deep, Uniformly (14 min)
Pushing Depth Further: Does More Always Mean Better? (16 min)
Why This Isn't the Vanishing Gradient Problem Again (14 min)
Setting Up for Skip Connections (9 min)
Module 18: ResNet and Skip Connections (4 lessons)
Residual Learning: Learning a Correction Instead of a Mapping (13 min)
Building a Residual Block in PyTorch (15 min)
Proving Residual Blocks Fix Module 17's Degradation Problem (14 min)
Building and Training a Small ResNet on CIFAR-10 (17 min)
Module 19: Transfer Learning (4 lessons)
Why Train From Scratch When You Don't Have To (11 min)
Feature Extraction: Freezing the Backbone (15 min)
Fine-Tuning: Unfreezing Carefully with Differential Learning Rates (15 min)
Transfer Learning vs Training From Scratch on CIFAR-10 (16 min)
Module 20: Why CNNs and Dense Networks Fail on Sequences (4 lessons)
Sequences Break the Fixed-Size Input Assumption (10 min)
Order Carries Meaning — And These Architectures Can't Use It (13 min)
Convolution's Local Window Can't Capture Long-Range Sequence Dependencies (12 min)
Setting Up for Recurrent Networks: Memory Instead of a Window (9 min)
Module 21: Recurrent Neural Networks (4 lessons)
The RNN Cell — One Hidden State, Updated One Step at a Time (15 min)
Building an RNN in PyTorch with nn.RNN (14 min)
Backpropagation Through Time and the Vanishing Gradient's Return (16 min)
Training an RNN on a Real Sequence Task (16 min)
Module 22: LSTM — Fixing the Vanishing Gradient Through Time (4 lessons)
The Cell State — An Additive Memory Highway, Not a Multiplicative Chain (15 min)
Building an LSTM in PyTorch with nn.LSTM (13 min)
Proving LSTM Preserves Gradients Across Far More Time Steps (15 min)
Training LSTM vs RNN on a Longer Sequence Task (16 min)
Module 23: GRU — A Simpler Gated Alternative (4 lessons)
GRU's Simplification — One State, Two Gates Instead of Three (14 min)
Building a GRU in PyTorch with nn.GRU (12 min)
Does GRU's Simplification Cost Anything? Measuring Gradient Preservation (14 min)
RNN vs GRU vs LSTM — Training Head-to-Head on the Same Task (15 min)
Module 24: Sequence-to-Sequence Models (4 lessons)
The New Problem: Sequence In, Sequence Out (11 min)
Building the Encoder-Decoder Architecture (16 min)
Teacher Forcing — Training the Decoder Without Compounding Errors (14 min)
Training a Complete Seq2Seq Model on a Toy Translation Task (16 min)
Module 25: The Bottleneck Problem in Sequence-to-Sequence Models (4 lessons)
Squeezing an Entire Sentence Into One Fixed-Size Vector (11 min)
Measuring the Bottleneck — Does Accuracy Actually Degrade With Length? (15 min)
Why a Bigger Hidden State Only Delays the Problem, Rather Than Solving It (13 min)
Setting Up for Attention: Let the Decoder Look Back at the Whole Input (9 min)
Module 26: The Attention Mechanism (4 lessons)
The Alignment Score — How Much Should the Decoder Care About Each Word? (16 min)
From Scores to Weights to a Context Vector (14 min)
The Complete Architecture — Encoder, Attention, and Decoder Working Together (17 min)
Training With Attention — Fixing the Bottleneck and Seeing What It Learned (17 min)
Module 27: Self-Attention (4 lessons)
From Attending Across Sequences to Attending Within One (13 min)
Query, Key, and Value — The Precise Structure Behind Every Attention Layer (16 min)
Scaled Dot-Product Attention — The Complete, Exact Formula (15 min)
Multi-Head Attention — Several Relationships at Once, and a Complete Sentence Example (18 min)
Module 28: The Transformer Architecture (4 lessons)
Positional Encoding — Giving Self-Attention a Sense of Order (15 min)
Add & Norm — Residual Connections and Layer Normalization Around Each Sublayer (14 min)
The Feed-Forward Network — and Assembling One Complete Encoder Block (16 min)
The Decoder, Masked Self-Attention, and the Complete Transformer (20 min)
Module 29: Tokenization Deep Dive (4 lessons)
Why Word-Level Tokenization Breaks at Real Scale (12 min)
Byte Pair Encoding — Building a Subword Vocabulary by Hand (18 min)
Byte-Level BPE — How Real Tokenizers Handle Any Text at All (14 min)
Why Tokenization Choices Ripple Into Everything an LLM Does (15 min)
Module 30: Embeddings Revisited — Static vs Contextual (4 lessons)
What a Learned Embedding Actually Captures — Word2Vec From Scratch (16 min)
Why One Fixed Vector Per Word Isn't Enough (13 min)
Contextual Embeddings — The Fix Was Already Inside Module 28's Transformer (16 min)
Static vs Contextual — Real Tradeoffs, Not a One-Sided Verdict (14 min)
Module 31: Pretraining — Self-Supervised Learning at Scale (4 lessons)
The Self-Supervised Insight — Raw Text Is Its Own Training Signal (13 min)
GPT's Decoder-Only Architecture — Stripping Module 28 Down to What Pretraining Needs (16 min)
Training a Small GPT on Real Text and Generating From It (17 min)
Scale, Emergent Behavior, and the Bridge to Fine-Tuning (12 min)
Module 32: Fine-Tuning — From Raw Prediction to Helpful Assistant (3 lessons)
Supervised Fine-Tuning — Teaching a Pretrained Model to Follow Instructions (16 min)
Reward Modeling — Learning What Humans Prefer (15 min)
Reinforcement Learning From Human Feedback — Closing the Loop (16 min)
Module 33: Efficient Inference — KV Caching and Quantization (3 lessons)
Naive Generation Recomputes the Same Work Over and Over (14 min)
The KV Cache — Storing Once, Reusing Forever (16 min)
Quantization — Storing Weights With Fewer Bits (15 min)
Module 34: Decoding Strategies — Temperature, Top-k, Top-p, and Beam Search (4 lessons)
Greedy Decoding and Pure Random Sampling — Both Extremes Fail (12 min)
Temperature — Reshaping the Distribution Before Sampling (14 min)
Top-k and Top-p (Nucleus) Sampling — Removing the Tail Explicitly (16 min)
Beam Search — Optimizing for the Best Overall Sequence, Not Just the Next Token (17 min)
Module 35: Training at Scale — Mixed Precision and Distributed Training (2 lessons)
Mixed Precision — Training Faster With Fewer Bits, Safely (16 min)
Distributed Data Parallel Training — Multiple GPUs, One Model, Synchronized Gradients (17 min)
Module 36: Generative Models — Autoencoders, VAEs, and GANs (4 lessons)
Autoencoders — Compressing Data Down to Its Essentials and Back (15 min)
Why a Trained Autoencoder Cannot Reliably Generate New Data (13 min)
Variational Autoencoders — Shaping the Latent Space So Sampling Works (17 min)
Generative Adversarial Networks — Two Networks Competing (18 min)
Module 37: Deploying Deep Learning Models — ONNX and Real Inference Serving (3 lessons)
Why a Trained Model Is Not Automatically a Deployable One (12 min)
ONNX — A Portable Model Format, Exported and Verified (16 min)
Wrapping a Model in a Real Inference Server With FastAPI (15 min)
150 total lessons across 37 modules
Preview Module 1 free