Apache Superset is a modern data exploration and visualization platform that lets you create nice, interactive dashboards.
I wanted to test its capabilities by analyzing an open dataset on road accidents in France (especially those involving kids).
- Version: 0.999.0dev
Clone Superset's repo in your terminal with the following command:
git clone https://github.com/apache/superset.git
Once that command completes successfully, you should see a new superset
folder in your current directory.
cd superset
docker-compose up
Navigate to 127.0.0.1:8088 and log in as admin / admin.
User management is pretty straightforward: in the top-right corner, open Settings / Users.
The dataset is available on data.gouv.fr.
The complete field description is available in French only, sorry :-(
Start MySQL using the docker-compose file in my repo:
docker-compose up
Connect to the container:
docker ps
docker exec -it <CONTAINER ID> /bin/bash
mysql -u root -p
From the MySQL shell:
CREATE USER 'user'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'user'@'%';
FLUSH PRIVILEGES;
I wrote a script to import the data into MySQL:
python src/import.py
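The actual script lives in the repo; the snippet below is only a minimal sketch of the idea, not a copy of `src/import.py`. It assumes a semicolon-delimited CSV with a header row and loads every column as TEXT through any DB-API 2.0 connection. The function name `import_csv`, the table name and the sample columns in the usage note are illustrative assumptions.

```python
import csv


def import_csv(conn, table, csv_file, delimiter=";", placeholder="%s"):
    """Create `table` from the CSV header and insert every data row.

    `conn` is any DB-API 2.0 connection. `placeholder` is the driver's
    parameter marker: "%s" for common MySQL drivers, "?" for sqlite3.
    Table and column names are interpolated as-is, so only use this
    with files you trust.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)  # first line holds the column names

    columns = ", ".join(f"`{name}` TEXT" for name in header)
    marks = ", ".join([placeholder] * len(header))

    cur = conn.cursor()
    cur.execute(f"CREATE TABLE IF NOT EXISTS `{table}` ({columns})")

    # Skip malformed rows whose field count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    cur.executemany(f"INSERT INTO `{table}` VALUES ({marks})", rows)
    conn.commit()
    return len(rows)
```

With a MySQL driver you would pass its connection object and keep the default `%s` placeholder. For a quick local try without MySQL, `import_csv(sqlite3.connect(":memory:"), "accidents", open("file.csv", newline=""), placeholder="?")` works the same way (`file.csv` being whatever CSV you downloaded).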
You might need to install a connection driver if you want to access a particular database. Follow the Superset instructions.
In the menu, go to Data / Databases, click + DATABASE, and fill in the necessary information.
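The main thing the form asks for is a SQLAlchemy URI. Here is a minimal sketch of how such a URI can be assembled for the MySQL user created earlier. The helper name `mysql_uri`, the host `mysql` and the database name `accidents` are assumptions for illustration; use the values from your own setup. Credentials are percent-encoded so that characters like `@` or `/` in a password do not break the URI.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306, driver=None):
    """Build a SQLAlchemy-style MySQL URI.

    `driver` is optional: None gives "mysql://", while e.g. "pymysql"
    gives "mysql+pymysql://".
    """
    scheme = "mysql" if driver is None else f"mysql+{driver}"
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{host}:{port}/{database}"


# Hypothetical host and database names, for illustration only.
print(mysql_uri("user", "password", "mysql", "accidents"))
```

Which scheme to use depends on the driver installed in the Superset container, so check the Superset instructions for your database.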
In the menu, go to Data / Datasets, click + DATASET, and fill in the necessary information.
In the datasets view, click the edit button at the end of the dataset row.
If you need more than one table, you can manually write a SQL query (using the SQL Lab editor).
Then, by clicking Explore, you can save it as a virtual dataset and use it to create reports.
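To give an idea of the kind of query meant here, the sketch below joins two tables on a shared accident id and keeps only the rows involving young people. The table names (`caracteristiques`, `usagers`), the column names and the age threshold of 15 are assumptions for illustration; adapt them to the tables you actually imported. The query is run against an in-memory SQLite database only so the example is self-contained; in Superset you would paste the SQL itself into SQL Lab.

```python
import sqlite3

# Hypothetical join: one row per person under 15 involved in an accident.
QUERY = """
SELECT c.Num_Acc, c.an, u.an_nais
FROM caracteristiques AS c
JOIN usagers AS u ON u.Num_Acc = c.Num_Acc
WHERE c.an - u.an_nais < 15
"""


def run_demo():
    """Run QUERY against a tiny made-up sample and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE caracteristiques (Num_Acc INTEGER, an INTEGER);
        CREATE TABLE usagers (Num_Acc INTEGER, an_nais INTEGER);
        INSERT INTO caracteristiques VALUES (1, 2019), (2, 2019);
        INSERT INTO usagers VALUES (1, 2010), (1, 1980), (2, 1975);
        """
    )
    return conn.execute(QUERY).fetchall()


print(run_demo())
```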
A lot of nice visualizations are available; let's check out some of them.
Before using any MapBox-based visualization, you need to specify your token to access the MapBox API.
Create an account on Mapbox.com and create a token.
Copy the token and add it to your Superset .env file:
cd superset
# append with >> so the existing settings in docker/.env are kept
echo "MAPBOX_API_KEY=<your token>" >> docker/.env
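If you prefer not to touch the file with shell redirection, here is a small sketch that adds or updates a single key in an env file while leaving the other lines untouched. The helper name `set_env_var` is hypothetical, and the sketch assumes simple `KEY=value` lines.

```python
from pathlib import Path


def set_env_var(path, key, value):
    """Set `key=value` in an env file, keeping every other line as-is."""
    env_file = Path(path)
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    entry = f"{key}={value}"

    for i, line in enumerate(lines):
        # Compare only the part before the first "=" with the key.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry  # key already present: replace its value
            break
    else:
        lines.append(entry)  # key missing: append it at the end

    env_file.write_text("\n".join(lines) + "\n")
```

For example, `set_env_var("docker/.env", "MAPBOX_API_KEY", "<your token>")` run from the superset folder.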
Dashboard UI is quite simple:
- Create your layout with components: row, column, tabs ...
- Place charts on it
- Resize elements
Apache Superset is a really nice and easy tool for data visualization.
It's super easy to set up and features some advanced capabilities:
- Support for many databases using SQLAlchemy: MySQL, Postgres, Oracle, MS SQL Server, MariaDB, Redshift ...
- User/roles/permission granularity
- Can use OpenID, OAuth, or LDAP authentication
- Interactive SQL editor giving full control over the exposed data
- Can perform some time series predictive analysis using fbprophet
- Custom visualizations can be developed and added (not tested)
- Can run on Kubernetes and scale with needs
For me, perhaps the least positive point is that the chart creation UI is not always consistent. Depending on the chosen visualization, the visual settings, for example, will be in either the DATA or the CUSTOMIZE tab. You may also find that some grouping options are not available on certain chart types where they would have made perfect sense.
I would say that, even if it is not yet at Tableau's level, it can be an alternative in some use cases, especially given that Superset is open source.