Core AI

Deep Learning

Neural networks, CNNs, RNNs, and Transformers

A deep dive into neural networks, convolutional networks, recurrent networks, and transformer architectures using PyTorch. Build real-world models from scratch.

150 lessons
28 hrs
Intermediate

What you will learn

Build real-world projects from scratch
Write clean, production-ready code
Understand core concepts deeply with diagrams
Follow industry best practices
Get hands-on with code in every lesson
Access lifetime updates as the tech evolves

Curriculum

Module 1. Why Deep Learning (4 lessons, free preview)
  • Where Classical ML Hits a Ceiling (14 min)
  • A Brief History: 1958 to Today (16 min)
  • What Makes a Network 'Deep' (12 min)
  • Setting Up Your Deep Learning Environment (10 min)
Module 2. The Single Neuron (4 lessons, enrolled only)
  • Anatomy of a Single Neuron (16 min)
  • From One Neuron to a Layer (14 min)
  • Why a Single Neuron Still Can't Solve XOR (12 min)
  • Implementing a Neuron in PyTorch (14 min)
Module 3. Multi-Layer Networks (4 lessons, enrolled only)
  • Stacking Layers to Bend the Decision Boundary (14 min)
  • Building and Solving XOR for Real (12 min)
  • Hidden Layers and Representation (12 min)
  • Building a Multi-Layer Network in PyTorch with nn.Sequential (12 min)
Module 4. Activation Functions (6 lessons, enrolled only)
  • Why Non-Linearity Is Required At All (14 min)
  • Sigmoid and Its Hidden Problem (12 min)
  • Tanh: A Centered Alternative (10 min)
  • ReLU: The Fix That Took Over (12 min)
  • Softmax: For When There Are Multiple Classes (10 min)
  • Choosing the Right Activation, and Implementing All Four in PyTorch (12 min)
Module 5. Forward Propagation (3 lessons, enrolled only)
  • The Forward Pass as One Clean Pipeline (12 min)
  • What Actually Flows Through the Network (10 min)
  • Scaling to Deeper Networks (12 min)
Module 6. Loss Functions (5 lessons, enrolled only)
  • What a Loss Function Actually Measures (10 min)
  • Mean Squared Error: For Regression (12 min)
  • Binary Cross-Entropy: For Binary Classification (14 min)
  • Categorical Cross-Entropy: For Multi-Class Classification (14 min)
  • Computing Loss in PyTorch and Visualizing the Loss Surface (12 min)
Module 7. Backpropagation (6 lessons, enrolled only)
  • The Problem: Finding the Bottom of the Hill Without Sweeping (10 min)
  • The Chain Rule, Explained With a Simple Example (12 min)
  • Backpropagation Through One Neuron (16 min)
  • Backpropagation Through the Full XOR Network (18 min)
  • Gradient Descent: Using the Gradient to Actually Update Weights (16 min)
  • Autograd: Letting PyTorch Compute Gradients Automatically (14 min)
Module 8. Building a Real Training Loop (4 lessons, enrolled only)
  • The Standard PyTorch Training Loop Pattern (10 min)
  • Tracking and Interpreting Training Progress (10 min)
  • A Slightly Harder Problem Than XOR: The Circles Dataset (12 min)
  • Common Beginner Training Bugs and How to Spot Them (14 min)
Module 9. The Vanishing Gradient Problem, Formally (4 lessons, enrolled only)
  • Tracing Gradients Backward Through a Genuinely Deep Network (14 min)
  • Why Multiplying Many Small Slopes Is Catastrophic (10 min)
  • Comparing Sigmoid vs ReLU's Gradient Flow Through Depth (12 min)
  • The Remaining Problem ReLU Doesn't Solve (10 min)
Module 10. Weight Initialization (5 lessons, enrolled only)
  • Why Initialization Isn't Just "Pick Small Random Numbers" (10 min)
  • The Variance Problem, Explained Simply (12 min)
  • Xavier/Glorot Initialization: For Sigmoid and Tanh (14 min)
  • He Initialization: For ReLU (14 min)
  • Choosing the Right Initialization in PyTorch (12 min)
Module 11. Better Optimizers (4 lessons, enrolled only)
  • Why Plain Gradient Descent Struggles (12 min)
  • Momentum: Remembering Previous Steps (14 min)
  • Adam: Adapting the Learning Rate Per Weight (16 min)
  • Choosing an Optimizer in PyTorch (12 min)
Module 12. Regularization (5 lessons, enrolled only)
  • When Training Loss Lies: Introducing Overfitting (14 min)
  • Why Overfitting Happens: The Network Memorizing Noise (10 min)
  • L2 Regularization (Weight Decay): Penalizing Large Weights (14 min)
  • Dropout: Randomly Disabling Neurons During Training (14 min)
  • Putting It Together: A Properly Regularized Training Pipeline (12 min)
Module 13. Why Dense Networks Fail on Images (4 lessons, enrolled only)
  • Counting the Parameters: Why Dense Networks Explode in Size (10 min)
  • No Spatial Awareness: A Dense Layer Doesn't Know What "Nearby" Means (12 min)
  • What a Real Network Needs: Translation Invariance (12 min)
  • Setting Up for Convolution (8 min)
Module 14. The Convolution Operation (5 lessons, enrolled only)
  • The Sliding Window: Convolution, By Hand (14 min)
  • Why This Solves Module 13's Three Problems (14 min)
  • Padding and Stride: Controlling Output Size (12 min)
  • Multiple Filters and Channels (12 min)
  • Implementing Convolution in PyTorch: nn.Conv2d (14 min)
Module 15. CNN Architecture: LeNet (4 lessons, enrolled only)
  • Pooling: Shrinking Feature Maps Deliberately (12 min)
  • LeNet's Architecture, Piece by Piece (12 min)
  • Building LeNet in PyTorch (12 min)
  • Training LeNet on Real Image Data (16 min)
Module 16. Going Deeper: From LeNet Toward AlexNet (4 lessons, enrolled only)
  • What Changed Between 1998 and 2012 (10 min)
  • Deeper CNNs Hit the Same Vanishing Gradient Wall (14 min)
  • Applying Acts 1-2's Fixes to CNNs (14 min)
  • Building a Deeper CNN and Training on CIFAR-10 (16 min)
Module 17. VGG and the Limits of Plain Stacking (4 lessons, enrolled only)
  • VGG's Idea: Small Filters, Stacked Deep, Uniformly (14 min)
  • Pushing Depth Further: Does More Always Mean Better? (16 min)
  • Why This Isn't the Vanishing Gradient Problem Again (14 min)
  • Setting Up for Skip Connections (9 min)
Module 18. ResNet and Skip Connections (4 lessons, enrolled only)
  • Residual Learning: Learning a Correction Instead of a Mapping (13 min)
  • Building a Residual Block in PyTorch (15 min)
  • Proving Residual Blocks Fix Module 17's Degradation Problem (14 min)
  • Building and Training a Small ResNet on CIFAR-10 (17 min)
Module 19. Transfer Learning (4 lessons, enrolled only)
  • Why Train From Scratch When You Don't Have To (11 min)
  • Feature Extraction: Freezing the Backbone (15 min)
  • Fine-Tuning: Unfreezing Carefully with Differential Learning Rates (15 min)
  • Transfer Learning vs Training From Scratch on CIFAR-10 (16 min)
Module 20. Why CNNs and Dense Networks Fail on Sequences (4 lessons, enrolled only)
  • Sequences Break the Fixed-Size Input Assumption (10 min)
  • Order Carries Meaning, and These Architectures Can't Use It (13 min)
  • Convolution's Local Window Can't Capture Long-Range Sequence Dependencies (12 min)
  • Setting Up for Recurrent Networks: Memory Instead of a Window (9 min)
Module 21. Recurrent Neural Networks (4 lessons, enrolled only)
  • The RNN Cell: One Hidden State, Updated One Step at a Time (15 min)
  • Building an RNN in PyTorch with nn.RNN (14 min)
  • Backpropagation Through Time and the Vanishing Gradient's Return (16 min)
  • Training an RNN on a Real Sequence Task (16 min)
Module 22. LSTM: Fixing the Vanishing Gradient Through Time (4 lessons, enrolled only)
  • The Cell State: An Additive Memory Highway, Not a Multiplicative Chain (15 min)
  • Building an LSTM in PyTorch with nn.LSTM (13 min)
  • Proving LSTM Preserves Gradients Across Far More Time Steps (15 min)
  • Training LSTM vs RNN on a Longer Sequence Task (16 min)
Module 23. GRU: A Simpler Gated Alternative (4 lessons, enrolled only)
  • GRU's Simplification: One State, Two Gates Instead of Three (14 min)
  • Building a GRU in PyTorch with nn.GRU (12 min)
  • Does GRU's Simplification Cost Anything? Measuring Gradient Preservation (14 min)
  • RNN vs GRU vs LSTM: Training Head-to-Head on the Same Task (15 min)
Module 24. Sequence-to-Sequence Models (4 lessons, enrolled only)
  • The New Problem: Sequence In, Sequence Out (11 min)
  • Building the Encoder-Decoder Architecture (16 min)
  • Teacher Forcing: Training the Decoder Without Compounding Errors (14 min)
  • Training a Complete Seq2Seq Model on a Toy Translation Task (16 min)
Module 25. The Bottleneck Problem in Sequence-to-Sequence Models (4 lessons, enrolled only)
  • Squeezing an Entire Sentence Into One Fixed-Size Vector (11 min)
  • Measuring the Bottleneck: Does Accuracy Actually Degrade With Length? (15 min)
  • Why a Bigger Hidden State Only Delays the Problem, Rather Than Solving It (13 min)
  • Setting Up for Attention: Let the Decoder Look Back at the Whole Input (9 min)
Module 26. The Attention Mechanism (4 lessons, enrolled only)
  • The Alignment Score: How Much Should the Decoder Care About Each Word? (16 min)
  • From Scores to Weights to a Context Vector (14 min)
  • The Complete Architecture: Encoder, Attention, and Decoder Working Together (17 min)
  • Training With Attention: Fixing the Bottleneck and Seeing What It Learned (17 min)
Module 27. Self-Attention (4 lessons, enrolled only)
  • From Attending Across Sequences to Attending Within One (13 min)
  • Query, Key, and Value: The Precise Structure Behind Every Attention Layer (16 min)
  • Scaled Dot-Product Attention: The Complete, Exact Formula (15 min)
  • Multi-Head Attention: Several Relationships at Once, and a Complete Sentence Example (18 min)
Module 28. The Transformer Architecture (4 lessons, enrolled only)
  • Positional Encoding: Giving Self-Attention a Sense of Order (15 min)
  • Add & Norm: Residual Connections and Layer Normalization Around Each Sublayer (14 min)
  • The Feed-Forward Network, and Assembling One Complete Encoder Block (16 min)
  • The Decoder, Masked Self-Attention, and the Complete Transformer (20 min)
Module 29. Tokenization Deep Dive (4 lessons, enrolled only)
  • Why Word-Level Tokenization Breaks at Real Scale (12 min)
  • Byte Pair Encoding: Building a Subword Vocabulary by Hand (18 min)
  • Byte-Level BPE: How Real Tokenizers Handle Any Text at All (14 min)
  • Why Tokenization Choices Ripple Into Everything an LLM Does (15 min)
Module 30. Embeddings Revisited: Static vs Contextual (4 lessons, enrolled only)
  • What a Learned Embedding Actually Captures: Word2Vec From Scratch (16 min)
  • Why One Fixed Vector Per Word Isn't Enough (13 min)
  • Contextual Embeddings: The Fix Was Already Inside Module 28's Transformer (16 min)
  • Static vs Contextual: Real Tradeoffs, Not a One-Sided Verdict (14 min)
Module 31. Pretraining: Self-Supervised Learning at Scale (4 lessons, enrolled only)
  • The Self-Supervised Insight: Raw Text Is Its Own Training Signal (13 min)
  • GPT's Decoder-Only Architecture: Stripping Module 28 Down to What Pretraining Needs (16 min)
  • Training a Small GPT on Real Text and Generating From It (17 min)
  • Scale, Emergent Behavior, and the Bridge to Fine-Tuning (12 min)
Module 32. Fine-Tuning: From Raw Prediction to Helpful Assistant (3 lessons, enrolled only)
  • Supervised Fine-Tuning: Teaching a Pretrained Model to Follow Instructions (16 min)
  • Reward Modeling: Learning What Humans Prefer (15 min)
  • Reinforcement Learning From Human Feedback: Closing the Loop (16 min)
Module 33. Efficient Inference: KV Caching and Quantization (3 lessons, enrolled only)
  • Naive Generation Recomputes the Same Work Over and Over (14 min)
  • The KV Cache: Storing Once, Reusing Forever (16 min)
  • Quantization: Storing Weights With Fewer Bits (15 min)
Module 34. Decoding Strategies: Temperature, Top-k, Top-p, and Beam Search (4 lessons, enrolled only)
  • Greedy Decoding and Pure Random Sampling: Both Extremes Fail (12 min)
  • Temperature: Reshaping the Distribution Before Sampling (14 min)
  • Top-k and Top-p (Nucleus) Sampling: Removing the Tail Explicitly (16 min)
  • Beam Search: Optimizing for the Best Overall Sequence, Not Just the Next Token (17 min)
Module 35. Training at Scale: Mixed Precision and Distributed Training (2 lessons, enrolled only)
  • Mixed Precision: Training Faster With Fewer Bits, Safely (16 min)
  • Distributed Data Parallel Training: Multiple GPUs, One Model, Synchronized Gradients (17 min)
Module 36. Generative Models: Autoencoders, VAEs, and GANs (4 lessons, enrolled only)
  • Autoencoders: Compressing Data Down to Its Essentials and Back (15 min)
  • Why a Trained Autoencoder Cannot Reliably Generate New Data (13 min)
  • Variational Autoencoders: Shaping the Latent Space So Sampling Works (17 min)
  • Generative Adversarial Networks: Two Networks Competing (18 min)
Module 37. Deploying Deep Learning Models: ONNX and Real Inference Serving (3 lessons, enrolled only)
  • Why a Trained Model Is Not Automatically a Deployable One (12 min)
  • ONNX: A Portable Model Format, Exported and Verified (16 min)
  • Wrapping a Model in a Real Inference Server With FastAPI (15 min)
150 total lessons across 37 modules. Preview Module 1 free.
999 (regular price 1,999)

50% off for a limited time

  • Lifetime access
  • 150 structured lessons
  • Code snippets and diagrams
  • Certificate of completion
  • Weekly content updates