Reinforcement Learning Journey

About This Project

This is an interactive learning platform where I document my journey toward mastering Reinforcement Learning (RL). Each algorithm pairs the underlying theory with hands-on Python implementations that run directly in your browser.

PyScript, Prism.js, Interactive Demos

📚 Algorithm Collection

Multi-Armed Bandit

Completed

The foundational setting for the exploration vs. exploitation trade-off. Implements the ε-greedy, UCB, and Thompson Sampling strategies.

Exploration, Exploitation, Regret

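A minimal sketch of the three bandit strategies named above, in plain Python (standard library only; this is not the code from the linked page). It assumes Bernoulli arms with hypothetical success probabilities, and `run_bandit` and its parameters are illustrative names. UCB here is the UCB1 rule with exploration bonus sqrt(2 ln t / N(a)), and Thompson Sampling uses Beta(1, 1) priors.

```python
import math
import random


def run_bandit(strategy, probs, steps=2000, epsilon=0.1, seed=0):
    """Run one strategy on Bernoulli arms (hypothetical success probabilities `probs`).

    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k      # N(a): times arm a was pulled
    values = [0.0] * k    # Q(a): sample-mean reward of arm a
    alpha = [1] * k       # Beta posterior: successes + 1 (Thompson Sampling)
    beta = [1] * k        # Beta posterior: failures + 1
    best = max(probs)
    regret = 0.0
    for t in range(1, steps + 1):
        if strategy == "epsilon_greedy":
            if rng.random() < epsilon:
                arm = rng.randrange(k)                            # explore uniformly
            else:
                arm = max(range(k), key=lambda a: values[a])      # exploit current estimate
        elif strategy == "ucb":
            if 0 in counts:
                arm = counts.index(0)                             # pull every arm once first
            else:                                                 # UCB1 optimism bonus
                arm = max(range(k), key=lambda a: values[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
        elif strategy == "thompson":
            # sample a plausible success rate per arm, then act greedily on the samples
            arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]       # incremental mean
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - probs[arm]                               # expected regret of this pull
    return counts, regret


for name in ("epsilon_greedy", "ucb", "thompson"):
    counts, regret = run_bandit(name, [0.2, 0.5, 0.8])
    print(f"{name:15s} pulls={counts} regret={regret:.1f}")
```

With a clear gap between arms, all three strategies should concentrate most pulls on the last (best) arm; comparing the printed regret is a quick way to see how differently they pay for exploration.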

Contextual Bandits

Completed

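Linear contextual bandits with the LinUCB and LinTS (linear Thompson Sampling) algorithms. Each arm's expected reward is modeled as a linear function of the context features and estimated with ridge regression, so the agent learns which action is best for each context.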

Context, LinUCB, LinTS

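The sketch below illustrates the disjoint form of LinUCB (one ridge-regression model per arm) in plain Python, under a hypothetical environment where each arm's reward is linear in a random 2-D context plus Gaussian noise. `LinUCBArm`, `simulate`, and the parameter values are illustrative assumptions. To avoid a matrix library, it keeps A⁻¹ directly and updates it with the Sherman-Morrison rank-1 formula. LinTS would instead sample θ from a Gaussian centered on the ridge estimate and is not shown.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression model for one arm (disjoint LinUCB).

    A = lam * I + sum(x x^T) is stored only as its inverse, updated with the
    Sherman-Morrison formula; b = sum(r * x); theta_hat = A^{-1} b.
    """

    def __init__(self, dim, lam=1.0):
        self.A_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in self.A_inv]

    def theta(self):
        return self._A_inv_times(self.b)                               # ridge estimate

    def score(self, x, alpha):
        mean = sum(t * xi for t, xi in zip(self.theta(), x))           # theta_hat . x
        var = sum(xi * ai for xi, ai in zip(x, self._A_inv_times(x)))  # x^T A^{-1} x
        return mean + alpha * math.sqrt(max(var, 0.0))                 # optimistic estimate

    def update(self, x, reward):
        u = self._A_inv_times(x)                                       # A^{-1} x
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - u u^T / (1 + x^T u)
        for i, ui in enumerate(u):
            for j, uj in enumerate(u):
                self.A_inv[i][j] -= ui * uj / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def simulate(true_thetas, steps=2000, alpha=1.0, noise=0.1, seed=0):
    """Hypothetical environment: reward = true_theta[arm] . x + Gaussian noise."""
    rng = random.Random(seed)
    dim = len(true_thetas[0])
    arms = [LinUCBArm(dim) for _ in true_thetas]
    correct = 0
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        chosen = max(range(len(arms)), key=lambda a: arms[a].score(x, alpha))
        expected = [sum(t * xi for t, xi in zip(th, x)) for th in true_thetas]
        arms[chosen].update(x, expected[chosen] + rng.gauss(0.0, noise))
        correct += chosen == expected.index(max(expected))
    return arms, correct / steps


arms, accuracy = simulate([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(f"optimal-action rate: {accuracy:.2f}")
print("estimated thetas:", [[round(v, 2) for v in arm.theta()] for arm in arms])
```

Maintaining A⁻¹ incrementally costs O(d²) per update instead of a full O(d³) inversion, which is why the rank-1 update is a common choice for this algorithm.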

Q-Learning

Completed

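Temporal-difference (TD) learning for value-based control. A classic off-policy tabular algorithm: it updates each action value toward the Bellman optimality target r + γ·max_a′ Q(s′, a′), whatever action the exploring policy takes next.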

Bellman Equation, TD Learning, Off-Policy

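A tabular Q-learning sketch on a hypothetical six-state corridor (start in state 0, reward 1 for reaching state 5). The environment, `q_learning`, and the hyperparameters are assumptions for illustration. Note the off-policy structure: actions are chosen ε-greedily, but the update bootstraps from max_a′ Q(s′, a′).

```python
import random

N_STATES = 6          # hypothetical corridor: states 0..5, state 5 is the terminal goal
ACTIONS = (-1, +1)    # action 0 = move left, action 1 = move right


def step(state, action):
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])   # random tie-break


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                                   # cap episode length
            # behavior policy: epsilon-greedy
            a = rng.randrange(2) if rng.random() < epsilon else greedy(Q[s], rng)
            s2, r, done = step(s, a)
            # off-policy target: bootstrap from the greedy action in s2 (Bellman optimality)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


Q = q_learning()
print(["right" if q[1] > q[0] else "left" for q in Q[:-1]])
print(round(Q[0][1], 3))   # should be close to gamma**4 = 0.6561
```

Moving right from the start reaches the goal on the fifth step, so under the optimal policy its reward is discounted by γ⁴; the learned Q[0][1] should approach that value.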

SARSA

Completed

On-policy TD control algorithm. It learns the value of the actions the policy actually takes, exploratory ones included, which tends to yield safer policies during exploration.

On-Policy, TD Control, Expected SARSA

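To make the on-policy/off-policy distinction concrete, here is a sketch of the three bootstrap targets side by side, assuming an ε-greedy policy over a tabular Q. The function names are hypothetical. SARSA uses the next action actually sampled, Expected SARSA averages over the ε-greedy action probabilities, and Q-learning takes the max.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy over one row of Q."""
    n = len(q_row)
    best = max(range(n), key=lambda a: q_row[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def q_learning_target(r, q_next, gamma):
    return r + gamma * max(q_next)                       # off-policy: greedy action


def sarsa_target(r, q_next, a_next, gamma):
    return r + gamma * q_next[a_next]                    # on-policy: action actually taken


def expected_sarsa_target(r, q_next, gamma, epsilon):
    probs = epsilon_greedy_probs(q_next, epsilon)        # on-policy: average over the policy
    return r + gamma * sum(p * q for p, q in zip(probs, q_next))


def td_update(Q, s, a, target, alpha):
    """Shared tabular update: move Q[s][a] a step of size alpha toward the target."""
    Q[s][a] += alpha * (target - Q[s][a])


q_next = [1.0, 2.0]
print(q_learning_target(0.0, q_next, 0.9))
print(expected_sarsa_target(0.0, q_next, 0.9, 0.1))
print(sarsa_target(0.0, q_next, 0, 0.9))
```

With Q(s′, ·) = [1, 2], r = 0, γ = 0.9 and ε = 0.1, the targets are about 1.8 (Q-learning), 1.755 (Expected SARSA), and 0.9 (SARSA, when the exploratory action 0 was sampled). Only the on-policy targets account for the cost of exploring.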

Policy Gradient

Coming Soon

Optimizes a parameterized policy directly by gradient ascent on expected return. The foundation of many modern RL methods.

Policy-based, REINFORCE
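A minimal REINFORCE sketch with a tabular softmax policy on a hypothetical four-state corridor (reward 1 for reaching the last state). Everything here (`reinforce`, `discounted_returns`, the environment, the hyperparameters) is an illustrative assumption. It follows the episodic update θ ← θ + α γ^t G_t ∇ log π(A_t | S_t), using the fact that for a softmax policy the derivative of log π(a|s) with respect to the preference for action b is 1{b = a} − π(b|s). No baseline is used, so the updates are high-variance.

```python
import math
import random

N_STATES = 4   # hypothetical corridor: states 0..3, state 3 is the terminal goal


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """G_t = r_{t+1} + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def run_episode(theta, rng, max_steps=30):
    s, traj = 0, []
    for _ in range(max_steps):
        a = 0 if rng.random() < softmax(theta[s])[0] else 1   # 0 = left, 1 = right
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        traj.append((s, a, r))
        s = s2
        if s == N_STATES - 1:
            break
    return traj


def reinforce(episodes=3000, lr=0.2, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES - 1)]   # tabular softmax preferences
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        returns = discounted_returns([r for _, _, r in traj], gamma)
        # accumulate the whole episode's gradient under the policy that generated it
        deltas = [[0.0, 0.0] for _ in theta]
        for t, ((s, a, _), g) in enumerate(zip(traj, returns)):
            probs = softmax(theta[s])
            for b in range(2):
                grad = (1.0 if b == a else 0.0) - probs[b]   # d log pi(a|s) / d theta[s][b]
                deltas[s][b] += lr * (gamma ** t) * g * grad
        for s, row in enumerate(deltas):
            for b in range(2):
                theta[s][b] += row[b]
    return theta


theta = reinforce()
print([round(softmax(t)[1], 2) for t in theta])   # probability of "right" in states 0..2
```

Subtracting a baseline from G_t (for example a learned V(s)) would reduce the variance without biasing the gradient; that idea leads directly to actor-critic methods.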

Deep Q-Network (DQN)

Coming Soon

Combines Q-Learning with deep neural networks as function approximators. A landmark result that helped launch the deep RL era.

Deep Learning, Experience Replay
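Training a Q-network needs a deep learning library, so this sketch covers only two DQN ingredients that fit in plain Python: an experience replay buffer and the TD target computed with a separate, periodically frozen target network. `ReplayBuffer`, `dqn_targets`, and the `target_q` callable (a stand-in for the target network) are hypothetical names, not a specific library's API.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size FIFO store of transitions. Sampling uniform random minibatches
    breaks the correlation between consecutive environment steps."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)    # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrap on terminal transitions.

    `target_q(state)` is a hypothetical stand-in for the target network and must
    return a list of action values.
    """
    return [t.reward + (0.0 if t.done else gamma * max(target_q(t.next_state)))
            for t in batch]


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push(i, 0, float(i), i + 1, i == 4)   # toy transitions
print([t.state for t in buffer.buffer])           # oldest two were evicted: [2, 3, 4]
print(dqn_targets(buffer.sample(2), lambda s: [0.0, float(s)], gamma=0.5))
```

In a full DQN these targets would serve as regression labels for the online network's Q(s, a), with the target network's weights copied from the online network every fixed number of steps.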

Actor-Critic

Coming Soon

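Combines value-based and policy-based methods: a learned critic (value function) gives the actor (policy) lower-variance learning signals, which makes training more stable.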

Hybrid, Advantage
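A one-step actor-critic sketch on a hypothetical four-state corridor (reward 1 for reaching the last state). The critic learns V(s) by TD(0), and its TD error δ = r + γV(s′) − V(s) serves as a one-sample estimate of the advantage that scales the actor's softmax policy-gradient step. The names and hyperparameters are illustrative assumptions, and the γ^t factor on the actor update is omitted, a common simplification.

```python
import math
import random

N_STATES = 4   # hypothetical corridor: states 0..3, state 3 is the terminal goal


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(episodes=1000, lr_actor=0.1, lr_critic=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES - 1)]   # actor: softmax preferences per state
    V = [0.0] * N_STATES                                # critic: V[goal] is never updated
    for _ in range(episodes):
        s = 0
        for _ in range(50):                             # cap episode length
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1    # 0 = left, 1 = right
            s2 = max(s - 1, 0) if a == 0 else s + 1
            done = s2 == N_STATES - 1
            r = 1.0 if done else 0.0
            # TD error: one-sample estimate of the advantage A(s, a)
            delta = r + (0.0 if done else gamma * V[s2]) - V[s]
            V[s] += lr_critic * delta                   # critic: TD(0) update
            for b in range(2):
                grad = (1.0 if b == a else 0.0) - probs[b]   # d log pi(a|s) / d theta[s][b]
                theta[s][b] += lr_actor * delta * grad       # actor: step scaled by advantage
            s = s2
            if done:
                break
    return theta, V


theta, V = actor_critic()
print([round(softmax(t)[1], 2) for t in theta])   # probability of "right" in states 0..2
print([round(v, 2) for v in V])
```

Compared with plain REINFORCE, the critic's estimate acts as a learned baseline and lets the actor update after every step instead of waiting for the episode's full return.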

💡 Learning Philosophy

🧮 Mathematical Rigor

Building intuition from first principles, drawing on a math and physics background

🔬 Hands-on Experimentation

Every algorithm comes with interactive code you can modify and test

🤔 Question-Driven

Learning by asking deep questions and exploring the answers

🤝 AI-Assisted

Leveraging AI tools to accelerate understanding and implementation