Exercises and project for RLD - M2 DAC + M2A, Sorbonne University
Students: Tianwei LAN, Jacques ROUGE
TME1: Upper Confidence Bound (UCB) and Linear Upper Confidence Bound (LinUCB)
TME2: Value Iteration and Policy Iteration
TME3: Q-Learning
TME4: Deep Q-Network (DQN)
TME5: Actor-Critic
TME6: Proximal Policy Optimization (PPO) with Adaptive KL and with Clipped Objective
TME7: Deep Deterministic Policy Gradient (DDPG)
TME8: Generative Adversarial Network (GAN)
TME9: Variational Autoencoder (VAE)
TME10: Multi-Agent Deep Deterministic Policy Gradient (MADDPG)