Explore dataset visualization with Google Facets and Streamlit.
We use these visualization tools to explore the characteristics of the patients in the ChestX-ray8 (a.k.a. ChestX-ray14) dataset available here.
For this analysis we are interested in the characteristics of the patients, not the images. We want to understand the representativeness of the dataset across genres and ages. We will explore questions such as:
- What is the overall genre distribution?
- What is the distribution of diseases across the genres?
- What is the distribution of diseases across the ages? Across age groups?
- Is the disease distribution the same in the training and test sets?
- Is the disease distribution the same across genres and ages in the training and test sets?
- What diseases coccur in the same image?
Clone this repository, then:
Note that Google Facets works only with Chrome.
- Google Facets Dive: open Chrome, then choose Open File... in the File and open the file
google-facets/google-facets-dive.html
. - Google Facets Overview: open Chrome, then choose Open File... in the File and open the file
google-facets/google-facets-overview.html
.
- Set up the environment as explained here.
cd streamlit
streamlit run streamlit-viz.py
The dataset is fairly large. It may take several seconds to show the page.
These are the quick instructions to work with the code for the visualizations.
- Install Python 3
- Clone this repository
- Go into the repository directory
- Create a Python environment:
python3 -m venv env
- Activate the environmnet:
source env/bin/activate
(Linux/Mac) or.\env\Scripts\activate
(Windows) - Install the Python packages:
pip install -r requirements.txt
See how Google Facets is used to explore the dataset here.
See how Streamlit is used to explore the dataset here.
The fundamental difference between them is that Google Facets does not require any coding. In fact, you can upload a dataset directly on their site. Look for the "load your data" button. Note that, as it says in the button, you are uploading data to the Google site - think about privacy and intellectual property. (Side note: it is possible to customize the data points to show an image - this requires some coding, as explained here, but it can be used without it).
Not requiring code is an advantage and disadvantage. It is fast to explore the data, as long as you do not need features beyond what Google Facets already has in place.
Customization is Streamlit's strength. You can transform the data on the fly, add filters and visualizations not available in Google Facets. The cost is writing the code for that. Streamlit takes care of the user interface elements and updating the user interface, so you can concentrate on writing the code that filters the data and creates the visualizations.
To summarize:
- Google Facet is good for a quick inspection of the data.
- Sreamlit is good for customized ways to explore the data.