report.lyx

#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass article
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\use_hyperref false
\papersize default
\use_geometry false
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Title
Reinforcement learning to solve 
\begin_inset CommandInset href
LatexCommand href
name "Agar.io"
target "agar.io"

\end_inset


\end_layout

\begin_layout Author
\begin_inset CommandInset href
LatexCommand href
name "Vishruit"
target "vishruit@gmail.com"
type "mailto:"

\end_inset

, 
\begin_inset CommandInset href
LatexCommand href
name "Suhas"
target "suhas.gundimeda@gmail.com"
type "mailto:"

\end_inset


\end_layout

\begin_layout Abstract
Agar.io is a blob-eat-blob 2D world where the player controls a circular
 blob whose primary objective is to accumulate the largest amount of volume.
 This objective boils down to the amount of food it retains, supply of which
 consists of both static randomly dropped food and other real-time players.
 The available high-level decisions and actions for the blob include eating
 static food, actively avoiding bigger blobs, actively pursuing them.
 In addition, the blob has the additional possible action of splitting its
 mass to project a percentage of itself with a higher velocity in the intended
 direction.
\begin_inset Newline newline
\end_inset

The current report focuses on an exercise to solve a simpler version of
 the problem
\begin_inset CommandInset citation
LatexCommand cite
key "key-1"

\end_inset

, a single player world with limited observability and continuous state
 space.
\begin_inset Foot
status open

\begin_layout Plain Layout
Project repository, including latest report and source code at 
\begin_inset CommandInset href
LatexCommand href
target "https://github.com/snugghash/verbose-spork"

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Section
Problem statement
\end_layout

\begin_layout Standard
The problem space consists of a large grid-world where the agent has a limited
 field of view, and their only primitive actions are forward movements or
 rotations.
 The objective is to keep moving to grids which have an green(positive reward)
 object, and avoid red(negative reward) objects.
 New objects are placed at random locations, keeping the total number of
 each kind same.
 The game isn't episodic, and the objective is to maximize reward/step averaged
 over a thousand steps.
 
\end_layout

\begin_layout Standard
The agent has sensors which output the (continuous dimension) distance to
 green, red and wall objects, along the single direction they sense.
\end_layout

\begin_layout Section
Formulation
\end_layout

\begin_layout Standard
We used parametrized state spaces and function approximation to store and
 represent, as action values, the policy to follow if greedy.
\end_layout

\begin_layout Standard
Exploration was epsilon greedy.
\end_layout

\begin_layout Section
Results
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
Average reward over 100000 steps
\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
key "key-1"

\end_inset

http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
\end_layout

\end_body
\end_document