Skip to content
This repository has been archived by the owner on Jul 6, 2020. It is now read-only.

Add Features and Organization comparision #245

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
21 changes: 18 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,17 @@ EvalAI is an open source web application that helps researchers, students and da

In recent years, it has become increasingly difficult to compare an algorithm solving a given task with other existing approaches. These comparisons suffer from minor differences in algorithm implementation, use of non-standard dataset splits and different evaluation metrics. By providing a central leaderboard and submission interface, we make it easier for researchers to reproduce the results mentioned in the paper and perform reliable & accurate quantitative analysis. By providing swift and robust backends based on map-reduce frameworks that speed up evaluation on the fly, EvalAI aims to make it easier for researchers to reproduce results from technical papers and perform reliable and accurate analyses.

<p align="center"><img width="65%" src="src/assets/images/kaggle_comparison.png" /></p>

A question we’re often asked is: Doesn’t Kaggle already do this? The central differences are:
| Features | OpenML | TopCoder | Kaggle | AICrowd | ParlAI | Codalab | EvalAI |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICrowd -> AIcrowd

| :------------------------: | :----------------------: | :----------------------: | :----------------------: | :----------------------: | :----------------------: | :----------------------: | :----------------: |
| AI challenge hosting | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: |
| Custom metrics | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Multiple phases/splits | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: |
| Open source | :white_check_mark: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Remote evaluation | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Human evaluation | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :heavy_multiplication_x: | :white_check_mark: |
| Evaluation in Environments | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: |

How is EvalAI different from others? The central differences are:

- **Custom Evaluation Protocols and Phases**: We have designed versatile backend framework that can support user-defined evaluation metrics, various evaluation phases, private and public leaderboard.

Expand All @@ -31,6 +39,13 @@ A question we’re often asked is: Doesn’t Kaggle already do this? The central

Our ultimate goal is to build a centralized platform to host, participate and collaborate in AI challenges organized around the globe and we hope to help in benchmarking progress in AI.

## Features
FirePing32 marked this conversation as resolved.
Show resolved Hide resolved

- **Results compiled natively which decreases network dependence**
- **Easy to export project to Android and IOS with a single codebase**
- **Test cases help identify incorrect settings during development**
- **Docker builds the project automatically - No hustle on workers and services !**

## Performance comparison

Some background: Last year, the [Visual Question Answering Challenge (VQA) 2016](http://www.visualqa.org/vqa_v1_challenge.html) was hosted on some other platform, and on average evaluation would take **~10 minutes**. EvalAI hosted this year's [VQA Challenge 2017](https://evalai.cloudcv.org/featured-challenges/1/overview). This year, the dataset for the [VQA Challenge 2017](http://www.visualqa.org/challenge.html) is twice as large. Despite this, we’ve found that our parallelized backend only takes **~130 seconds** to evaluate on the whole test set VQA 2.0 dataset.
Expand Down