Release alpha 0.3
Code:
* Rewritten contributions by previous team member.
* Added icon file chooser

Documentation:
* Added book version with some academic requirements
* Date and version automatically get inputted on document
* Added implementation section
* Expanded previous chapters
AlvarBer committed Jan 11, 2017
2 parents 97c39f1 + cdf52cd commit 5374fe2
Showing 21 changed files with 298 additions and 124 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,6 +1,9 @@
# PDF files
*.pdf

# Word files
*.docx

# TeX files
*.tex

17 changes: 11 additions & 6 deletions .travis.yml
@@ -7,6 +7,7 @@ install: true

script: true


after_success:
- sudo apt-get install texlive-latex-extra
- cd docs
@@ -18,6 +19,13 @@ after_success:
- cd .. && mv standalone/* .
- make travis && cd ..

# Run tests on master & dev
branches:
only:
- master
- dev

# Only deploy on master
deploy:
provider: releases
api_key: "$GH_TOKEN"
@@ -28,9 +36,6 @@ deploy:
branch:
- master

# Only run CI on master
branches:
only:
- master
- dev

# Stop bothering me
notifications:
email: false
103 changes: 62 additions & 41 deletions docs/Makefile
@@ -1,48 +1,69 @@
PDF := persimmon.pdf # PDF Main Target
MARKDOWN := introduction.md state_of_the_art.md objectives.md risk_analysis.md \
implementation.md postmortem.md # Markdown files
BODY := body.tex # Markdown files will be converted to this intermediate step
#APPENDICES := appendixX.md # Markdown Appendices
#APPENDIX := appendix.tex # And appendices to this intermediate step
# METADATA := metadata.yaml # Metadata files (Author, Date, Title, etc..)
BIBLIOGRAPHY := persimmon.bib # BibLaTeX bibliography
CSL := emerald-harvard.csl # CSL file used for citations
TEMPLATE := template.tex # LaTeX template for producing PDF
BIBLIOGRAPHY := persimmon.bib # BibLaTeX bibliography
MARKDOWN := chapter0.md chapter1.md chapter2.md chapter3.md # Markdown files
GRAPHS := $(wildcard graphs/*.tex)
IMAGES := $(GRAPHS:.tex=.pdf)
IMAGES += $(wildcard graphs/*.png) # Standalone pictures to be inserted
LATEXHEADERS := latexheaders.tex # Additional LaTeX headers
METADATA := metadata.yaml # Metadata files (Author, Date, Title, etc..)

all: pdf

pdf: $(MARKDOWN) $(BIBLIOGRAPHY) $(CSL) $(TEMPLATE) $(IMAGES) $(METADATA)
pandoc --standalone --smart --latex-engine xelatex --template $(TEMPLATE) \
--top-level-division chapter --bibliography $(BIBLIOGRAPHY) --csl $(CSL) \
--include-in-header $(LATEXHEADERS) $(METADATA) $(MARKDOWN) -o $(PDF)

travis: $(IMAGES)
pandoc --standalone --smart --latex-engine xelatex --template $(TEMPLATE) \
--chapters --bibliography $(BIBLIOGRAPHY) --csl $(CSL) \
--include-in-header $(LATEXHEADERS) $(METADATA) $(MARKDOWN) -o $(PDF)

# For standalone images (Not used)

GRAPHS := $(wildcard graphs/*.tex) # Latex diagrams
IMAGES := $(wildcard graphs/*.png) # .png images
IMAGES += $(GRAPHS:.tex=.pdf) # Generated PDF Images

all: $(PDF)

# Main PDF, travis ci and book to print version
$(PDF): $(BODY) $(TEMPLATE) $(IMAGES) # TODO: Add abstract
pandoc --smart --standalone --latex-engine xelatex --template $(TEMPLATE) \
--metadata author:"Álvaro Bermejo" \
--metadata date:"$(shell date +"%d/%m/%Y") ($(shell git describe --abbrev=0 --tags))" \
--metadata title:"Persimmon" --metadata fontsize:"12pt" --toc \
--metadata subtitle:"A scikitlearn visual programming interface" \
--metadata mainlang:"English" --metadata keywords:"Machine Learning","Visual Programming" \
--metadata papersize:"A4" --metadata sansfont:"Helvetica Neue LT Com" \
--metadata colorlinks --metadata documentclass:"scrreprt" \
--top-level-division chapter $(BODY) -o $@

# Main PDF, travis ci and book to print version
travis: $(BODY) $(APPENDIX) $(TEMPLATE) $(IMAGES)
pandoc --smart --standalone --latex-engine xelatex --template $(TEMPLATE) \
--metadata author:"Álvaro Bermejo" \
--metadata date:"$(shell date +"%d/%m/%Y") ($(shell git describe --abbrev=0 --tags))" \
--metadata title:"Persimmon" --metadata fontsize:"12pt" --toc \
--metadata subtitle:"A sklearn visual programming interface" \
--metadata mainlang:"English" --metadata keywords:"Machine Learning","Visual Programming" \
--metadata papersize:"A4" \
--metadata colorlinks --metadata documentclass:"scrreprt" \
--chapters $(BODY) $(APPENDIX) -o $(PDF)


book_complu: $(BODY) $(APPENDIX) $(TEMPLATE) $(IMAGES)
pandoc --smart --standalone --latex-engine xelatex --template $(TEMPLATE) \
--metadata author:"Álvaro Bermejo" --metadata date:"Director: Pablo Moreno Ger" \
--metadata title:"Persimmon" --metadata fontsize:"12pt" --toc \
--metadata subtitle:"A scikitlearn visual programming interface" \
--metadata mainlang:"English" \
--metadata papersize:"A4" --metadata sansfont:"Helvetica Neue LT Com" \
--metadata documentclass:"scrbook" --metadata institute:"Universidad Complutense" \
--top-level-division chapter $(BODY) $(APPENDIX) -o book_$(PDF)


# For standalone images
graphs/%.pdf: graphs/%.tex
xelatex $<

## Splitted creation (Not currently working)
#CHAPTERS := $(MARKDOWN:.md=.tex) # LaTeX Chapters
#GRAPHS := $(wildcard graphs/*.tex)
#IMAGES += $(GRAPHS:.tex=.pdf)
#
##splitted: $(CHAPTERS) $(BIBLIOGRAPHY) $(CSL) $(TEMPLATE)
# pandoc --standalone --smart --latex-engine xelatex --template $(TEMPLATE) \
# --top-level-division chapter --bibliography $(BIBLIOGRAPHY) --csl $(CSL) \
# $(CHAPTERS) -o $(PDF)
#
#%.tex: metadata.yaml %.md
# pandoc --no-tex-ligatures metadata.yaml $*.md -o $@
#
## For standalone images (Not used)
#graphs/%.pdf: graphs/%.tex
# xelatex $<
#
xelatex $< > /dev/null # TODO: actually output in graphs directory
mv $*.pdf graphs/

# Body and Appendices Middle Steps creation
$(BODY): $(MARKDOWN)
pandoc --no-tex-ligatures --bibliography $(BIBLIOGRAPHY) --csl $(CSL) \
metadata.yaml $(MARKDOWN) -o $@

$(APPENDIX): $(APPENDICES)
pandoc --no-tex-ligatures $(APPENDICES) -o $@

clean:
rm -f *.pdf chapter?.tex *.log *.aux *.png
rm -f $(BODY) $(APPENDIX) graphs/*.pdf *.pdf *.log *.aux

30 changes: 30 additions & 0 deletions docs/appendixX.md
@@ -0,0 +1,30 @@
Appendix X: How was this document made?
=======================================

This document was written in Markdown and converted to PDF
using Pandoc.

Process
-------
The document is written in Pandoc's extended Markdown and can be split across
several files. Images are inserted with the regular Markdown image syntax.
A YAML file with metadata is passed to pandoc, containing things
such as author, title, font, etc. How this information is used depends on
the output we are creating and the template/reference we are using.
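
As a rough illustration of this process, the following shell sketch writes a minimal metadata file and a single stand-in chapter, then hands both to pandoc. The field values, the scratch directory and the output name `persimmon.tex` are assumptions made for the example, not the project's actual build.

```shell
# Sketch under assumptions: runs in a scratch directory, not the real docs/ folder.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Minimal metadata block; field names follow docs/metadata.yaml, values are illustrative.
cat > metadata.yaml <<'EOF'
---
title: Persimmon
author: Álvaro Bermejo
papersize: A4
---

EOF

# Stand-in chapter; the real document is split across several .md files.
printf 'Introduction\n============\n\nSome text.\n' > introduction.md

# Metadata first, then the chapters in reading order. Skipped when pandoc is not installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --standalone metadata.yaml introduction.md -o persimmon.tex
fi
```

Passing the metadata file as the first input keeps author, title and similar fields out of the chapter files themselves.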


Diagrams
--------
Diagrams were created with LaTeX packages such as tikz or pgfgantt. They
can be inserted directly as PDF, but if we want to output to formats other
than LaTeX it is more convenient to convert them to .png files with tools such
as `pdftoppm`.
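
A minimal sketch of that conversion, assuming `pdftoppm` from poppler-utils is installed and the diagrams live in `graphs/`. The helper name `pdfs_to_png` and the 150 DPI resolution are choices made for this example only.

```shell
# Convert every PDF in a directory to a PNG with the same base name.
# pdfs_to_png is a hypothetical helper, not part of the project's Makefile.
pdfs_to_png() {
  dir="$1"
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing: no diagrams to convert
    # -singlefile writes <prefix>.png rather than <prefix>-1.png; -r sets the DPI.
    pdftoppm -png -singlefile -r 150 "$pdf" "${pdf%.pdf}"
  done
}

pdfs_to_png graphs
```

Keeping the PNG next to its PDF means the same image path works for every output format once the extension is swapped.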


References
------------
References are handled by pandoc-citeproc. We can write our bibliography in
a myriad of different formats (bibTeX, bibLaTeX, JSON, YAML, etc.), then
cite entries in our Markdown, and the same citation works for multiple output formats.
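
The citation workflow can be sketched in shell as follows. The bibliography entry `doe2017example` is a made-up placeholder, and the file names and scratch directory are likewise assumptions for illustration.

```shell
# Sketch under assumptions: scratch directory, placeholder bibliography entry.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# doe2017example is a made-up entry used only to show the syntax.
cat > refs.bib <<'EOF'
@book{doe2017example,
  author    = {Doe, Jane},
  title     = {An Example Title},
  publisher = {Example Press},
  year      = {2017}
}
EOF

# A citation in Pandoc Markdown is the entry key prefixed with @, in square brackets.
printf 'See the placeholder source [@doe2017example].\n' > chapter.md

# Older pandoc releases run pandoc-citeproc automatically when --bibliography is given;
# newer releases need --citeproc added to the command line.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --bibliography refs.bib chapter.md -o chapter.html
fi
```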


44 changes: 0 additions & 44 deletions docs/chapter0.md

This file was deleted.

Binary file added docs/graphs/early_interface.png
Binary file added docs/graphs/filechooser.png
2 changes: 1 addition & 1 deletion docs/graphs/objectives.tex
@@ -9,7 +9,7 @@
\draw (-1, 2.8) rectangle (1, 3.5) node[midway, gray] {Parity};
\draw [->, gray, thick] (0, 3.6) -- (0, 4.3);
\draw (-1, 4.4) rectangle (1, 5.1) node[midway, gray] {Compilation};
\draw [red, dashed] (-3, 5.5) -- (3, 5.5) node[below left] {Out of scope};
\draw [red, dashed] (4, 5.35) -- (-4, 5.35) node[above right] {Out of scope};
\draw [->, gray, thick] (-0.1, 5.2) -- (-2, 5.9);
\draw [->, gray, thick] (0.1, 5.2) -- (2, 5.9);
\draw (-3, 6) rectangle (-1, 6.7) node[midway, gray] {Web};
11 changes: 11 additions & 0 deletions docs/implementation.md
@@ -0,0 +1,11 @@
Implementation
==============

First Iteration
---------------

![Early "static" interface](graphs/early_interface.png)

![File chooser](graphs/filechooser.png)


48 changes: 48 additions & 0 deletions docs/introduction.md
@@ -0,0 +1,48 @@
Introduction
============

Description
-----------
Persimmon is a visual programming interface for sklearn.

This project involves a variety of Computer Science topics, such as User
Experience (the main topic, as the project is driven by the users' feedback and
engagement with the project), Machine Learning (we don't write the algorithms,
but need extensive knowledge of them to surface all their options), Software
Engineering (we have to interact with already built software, using interfaces
and organizing code through object-oriented techniques), Compilers (language
parsing and transpilers) and a number of tangentially related topics such as
I/O, preprocessing of data, etc.

Motivation
----------
After learning about Machine Learning at university last year I was able to get
an internship at a company in the algorithmic trading sector.

There, amongst other duties, I helped move the codebase from MATLAB to
Python, and during that process I realised many of my co-workers struggled with
the switch. None of them were computer scientists; instead they came from a
variety of backgrounds such as Maths, Physics, Electrical Engineering,
Statistics or Aerospace Engineering.

Yet they made up the whole department. Their work requires a very high
level of theoretical maths knowledge, and it so happens that experts from
these fields tend not to have a lot of general programming skills. They mostly
work with specialized languages tailored to these tasks, such as MATLAB, R or
Julia, and moving to a general-purpose language such as Python involves
learning about a plethora of additional topics, such as Object Oriented
Programming, custom complex data structures or CPU cache optimization.

The situation is even more complicated for newcomers to Machine Learning, as
they not only face the programming barrier but also have to overcome the
difficulties of the algorithms themselves, something Computer Scientists also
struggle with (in many cases even more, because of their weaker maths skills).

So this project serves a double purpose: it helps with the programming barrier,
and it aids the process of learning Machine Learning, as it allows the learner
to focus on the connections, intuitions and mathematical basis and not on the
implementation details and the quirks of the concrete language.

This hypothesis that visual learning can improve understanding is supported by
numerous sources such as [@fry2007visualizing] and [@principles].

2 changes: 0 additions & 2 deletions docs/latexheaders.tex

This file was deleted.

18 changes: 13 additions & 5 deletions docs/metadata.yaml
@@ -1,10 +1,18 @@

---
author: "Álvaro Bermejo"
author: Álvaro Bermejo
date: 2017-01-01
title: "Persimmon: a visual interface for sklearn"
papersize: "A4"
fontsize: "12pt"
mainlang: "English"
toc: yes
subtitle: A scikit-learn visual programming interface
version: 0.2
sansfont: Helvetica Neue LT Com
colorlinks: True
documentclass: scrreprt
institute: Universidad Complutense and University of Hertfordshire
papersize: A4
fontsize: 12pt
mainlang: English
toc: True
top-level-division: chapter
---

16 changes: 11 additions & 5 deletions docs/chapter2.md → docs/objectives.md
@@ -1,11 +1,12 @@
Objectives
==========
The best way we can describe the project is by dividing the objectives.
and the best way to understand the progression of those and their relation

The best way to describe the project is by dividing the objectives.
And the best way to understand the progression of those and their relation
is with a diagram.


![Objectives Tree](objectives.pdf)
![Objectives Tree](graphs/objectives.pdf)

**Capped** is more than a minimum viable product, an extensive proof-of-concept,
with a few limited algorithms and the ability to input `.csv` files. with a
@@ -16,21 +17,26 @@ buttons.
interaction. We don't really care much about having the same number of
underlying algorithms because that's not the focus of the project.

And the final objective is **compilation**, the ability to get the python
And the final objective is **Compilation**, the ability to get the Python
source code from the visual representation, while also improving the interface
to have a better flow, such as in Unreal Blueprints, which provide a very
intuitive interface [@shah2014mastering].

This milestone would bring Persimmon utility outside just the realm of
learning, as it would be a convenience tool for the exploratory work of any
ML solution (a business case, a Kaggle[^kaggle] competition, etc.).

Out of scope, but further applications of the system, are **Web/Jupyter**
integration, which means the system would be accessible from a website interface,
and script **synthesization**, which is the opposite of compilation, meaning
the ability to visualize a Python source file in Persimmon.

Now that we understand the objectives we can draw a much more detailed Gantt diagram.

![Gantt Diagram](gantt.pdf)
![Gantt Diagram](graphs/gantt.pdf)


We omitted previous months, which included idea refinement but are not
interesting for us.

[^kaggle]: [Kaggle.com](https://www.kaggle.com/)
6 changes: 6 additions & 0 deletions docs/postmortem.md
@@ -0,0 +1,6 @@
Postmortem
==========

Bibliography
============

5 changes: 1 addition & 4 deletions docs/chapter3.md → docs/risk_analysis.md
@@ -22,7 +22,7 @@ Prevention & Mitigation
-----------------------

| Risk Factor | Low Impact | Medium Impact | High Impact |
|--------------|--------------------|----------------------|------------------|
|:------------ |:------------------:|:--------------------:|:----------------:|
| Requirements | Not defined enough | Change at late stage | Unreachable goal |
| Technology | Performance issues | Interoperability | Major errors |

@@ -52,6 +52,3 @@ analysis of the capabilities of the platform must be done before starting the
project, identifying possible faults and providing possible solutions and or
alternatives.


Bibliography
============