About This Project
This is an interactive learning platform where I document my journey toward mastering reinforcement learning. Each algorithm pairs theoretical understanding with a hands-on Python implementation that runs directly in your browser.
📚 Algorithm Collection
Multi-Armed Bandit
Completed
The foundation of the exploration vs. exploitation tradeoff. Implements ε-greedy, UCB, and Thompson Sampling strategies.
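To make the exploration vs. exploitation tradeoff concrete, here is a minimal ε-greedy sketch on Bernoulli arms. It is an illustration under stated assumptions, not the implementation used on this site: the function name `epsilon_greedy`, the arm probabilities, and the default parameters are all hypothetical.

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (estimates, pull counts).

    true_probs is a hypothetical list of per-arm success probabilities.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, counts)
```

With a small ε the agent spends most pulls on whichever arm currently looks best, while the occasional random pull keeps the other estimates from going stale.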
Contextual Bandits
Completed
Linear contextual bandits with the LinUCB and LinTS algorithms. Learns which action to take from context features using ridge regression.
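As a rough sketch of how ridge regression drives LinUCB, the class below keeps one ridge model per arm and scores a context by its predicted reward plus a confidence width. It is a pure-Python illustration, not the site's code: `LinUCBArm`, `select_arm`, `ridge`, and `alpha` are hypothetical names, and the inverse matrix is maintained with the Sherman-Morrison rank-one update rather than recomputed from scratch.

```python
import math

class LinUCBArm:
    """One arm of a disjoint LinUCB model (illustrative sketch)."""

    def __init__(self, dim, ridge=1.0):
        # A = ridge * I, so A_inv starts as (1 / ridge) * I.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def score(self, x, alpha):
        theta = self._A_inv_times(self.b)  # ridge regression estimate
        A_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + alpha * width        # upper confidence bound

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def select_arm(arms, x, alpha=1.0):
    """Return the index of the arm with the highest UCB score for context x."""
    return max(range(len(arms)), key=lambda a: arms[a].score(x, alpha))

arm = LinUCBArm(dim=2)
score_before = arm.score([1.0, 0.0], alpha=1.0)
arm.update([1.0, 0.0], reward=1.0)
score_after = arm.score([1.0, 0.0], alpha=1.0)
print(score_before, score_after)
```

After one rewarded observation along the first feature, the predicted reward for that context rises while its confidence width shrinks, which is the behavior the UCB bonus is meant to capture.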
Q-Learning
Completed
Temporal Difference learning for value-based methods. A classic tabular RL algorithm built on the Bellman optimality equation and off-policy learning.
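The tabular update behind this can be sketched in a few lines. The helper below is an assumption-laden illustration (the name `q_learning_update`, the `(state, action)` dictionary keys, and the default `alpha` and `gamma` are hypothetical); it applies the standard rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)].

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Off-policy target: bootstrap from the best next action,
    # regardless of which action the behavior policy takes next.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

# Hypothetical two-state example.
Q = defaultdict(float)
Q[("s1", "right")] = 1.0
new_value = q_learning_update(Q, "s0", "left", reward=0.0, next_state="s1",
                              actions=["left", "right"], alpha=0.5, gamma=0.9)
print(new_value)
```

The `max` over next actions is what makes the method off-policy: the target follows the greedy policy even while the agent explores.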
SARSA
Completed
On-policy TD control algorithm. Learns the value of the actions actually taken, which tends to produce safer behavior during exploration.
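A minimal sketch of the on-policy update, assuming a dictionary-backed table keyed by `(state, action)`. The name `sarsa_update` and the default parameters are hypothetical; the rule applied is Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], where a' is the action the agent actually takes next.

```python
from collections import defaultdict

def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular SARSA update in place and return the new value."""
    # On-policy target: bootstrap from the action actually chosen next,
    # so exploratory (possibly risky) moves are reflected in the values.
    next_value = 0.0 if done else Q[(next_state, next_action)]
    td_target = reward + gamma * next_value
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

# Hypothetical example: the next state has one good and one bad action.
Q = defaultdict(float)
Q[("s1", "right")] = 1.0
Q[("s1", "left")] = 0.0
after_bad_step = sarsa_update(Q, "s0", "go", 0.0, "s1", "left",
                              alpha=0.5, gamma=0.9)
after_good_step = sarsa_update(Q, "s0", "go", 0.0, "s1", "right",
                               alpha=0.5, gamma=0.9)
print(after_bad_step, after_good_step)
```

Because the target depends on the next action taken rather than the best one available, the learned values account for the cost of exploratory moves.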
Policy Gradient
Coming Soon
Direct policy optimization using gradient ascent. The foundation of many modern RL methods.
Deep Q-Network (DQN)
Coming Soon
Combining Q-Learning with deep neural networks. The breakthrough that started deep RL.
Actor-Critic
Coming Soon
Combining value-based and policy-based methods for more stable learning.
💡 Learning Philosophy
🧮 Mathematical Rigor
Building intuition from first principles, drawing on a math and physics background
🔬 Hands-on Experimentation
Every algorithm comes with interactive code you can modify and test
🤔 Question-Driven
Learning through asking deep questions and exploring answers
🤝 AI-Assisted
Leveraging AI tools to accelerate understanding and implementation