QLearningMouse is a small cat-mouse-cheese game based on Q-Learning. The original version is vmayoral's basic_reinforcement_learning:tutorial1. I restructured his code to make the game more configurable. The biggest difference is that the cat uses breadth-first search (BFS) to chase the AI mouse, so the cat looks much more brutal :P
The cat always chases the mouse along the shortest path, whereas the mouse initially does not know the danger of being eaten.
- The mouse wins when it eats the cheese and earns a reward of 50; a new cheese then appears in a random grid cell.
- The cat wins when it eats the mouse, which receives a reward of -100 when it dies and then respawns in a random grid cell.
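
The cat's shortest-path chase can be illustrated with a minimal BFS sketch. This is not the repository's code: the function name `bfs_next_step`, the `(x, y)` cell tuples, the `walls` set, and the 4-directional movement are all assumptions made for the example.

```python
from collections import deque


def bfs_next_step(start, goal, width, height, walls=frozenset()):
    """Return the first cell on a shortest path from start to goal.

    Hypothetical helper: cells are (x, y) tuples on a width x height grid,
    movement is assumed to be 4-directional, and `walls` holds blocked cells.
    Returns None when the goal cannot be reached.
    """
    if start == goal:
        return start

    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])

    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back along parents until we reach the cell next to start.
            while parent[cell] != start:
                cell = parent[cell]
            return cell
        x, y = cell
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in walls and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Calling this once per turn with the cat's cell as `start` and the mouse's cell as `goal` would move the cat one step along a shortest path, which is the behavior described above.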
The basic update rule of Q-Learning is:
Q(s, a) += alpha * (reward(s, a) + gamma * max(Q(s', a')) - Q(s, a))
- `alpha` is the learning rate.
- `gamma` is the discount factor, i.e. the weight given to future rewards.
The rule uses the best available utility in the next state to update the value of the current state-action pair.
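
As a rough illustration of this update, here is a minimal tabular sketch. It is not taken from the repository: the class name `QTable`, the dictionary-based storage, and the default `alpha` and `gamma` values are assumptions chosen for the example. Only the rewards of 50 and -100 come from the rules above.

```python
from collections import defaultdict


class QTable:
    """Hypothetical tabular Q-function; unseen (state, action) pairs start at 0."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor for future rewards
        self.q = defaultdict(float)

    def update(self, state, action, reward, next_state):
        # Best utility reachable from the next state: max over a' of Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        # Difference between the new estimate and the current one.
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        return self.q[(state, action)]


table = QTable(actions=["up", "down", "left", "right"], alpha=0.5, gamma=0.9)
table.update("next_to_cheese", "right", 50, "respawned")       # mouse eats the cheese
table.update("next_to_cat", "left", -100, "respawned")         # mouse gets eaten
table.update("two_from_cheese", "right", 0, "next_to_cheese")  # value propagates back
```

After these three updates the state next to the cheese has a positive value, the state next to the cat has a negative one, and part of the cheese reward has already flowed back to the state two steps away. That backward flow is how the mouse gradually learns which moves are safe.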
Learn more about Q-Learning:
- The Markov Decision Problem : Value Iteration and Policy Iteration
- ARTIFICIAL INTELLIGENCE FOUNDATIONS OF COMPUTATIONAL AGENTS : 11.3.3 Q-learning
The two GIFs below show the reinforcement learning results:
- blue is the mouse.
- black is the cat.
- orange is the cheese.
After 300 generations:
After 339300 generations:
With Q-learning the mouse gradually becomes smarter, and eventually it reaches a point where the cat can no longer catch it.
git clone https://github.com/fancoo/QLearningMouse
cd QLearningMouse
python greedyMouse.py