This is the final project of the Udacity Data Analyst Nanodegree.
The dataset, from RITA, contains information on United States flight delays and performance. The purpose of this analysis is to explore flight delays from several angles, such as the relative proportion of different delay causes and how total delay varies over time, and then to inspect delays by origin, moving from the general picture to the specific airport with the longest average delay time.
Here is the link to the first version of the story.
Here is the link to the final version of the story.
The data (2006) was downloaded from stat-computing.org via the following link.
In the first version, my design was more of an exploration along the way. Below are the design decisions I made when crafting the first version.
Find which origins are responsible for most delays.
- Use horizontal bar charts to easily visualize the contributions of delay time.
- Add color for flight distance to gain more insight.
- Add some filters to allow users to navigate through different carriers and months.
Visualize the number of records from each origin.
- Use a bubble chart, with the size indicating the number of records from each origin.
Find which origins have the highest average delay time. In slides 1 and 2, I found that major airports account for more total delay time, since they also handle the most flights, so simply comparing total delay time might not be useful.
- Use the same plotting decisions as in slide 1.
Take a deeper look at Adak Airport (ADK) data, which had the highest average delay time in 2006.
- Use bar charts to visualize the average departure delay in each month to see its distribution.
- Use line charts for the five detailed delay causes to find more insights.
- Include plots of the overall average departure delay and detailed causes for comparison with ADK's data.
After receiving feedback from my mentor at Udacity, I decided to redesign the whole story based on his suggestions, but kept the goal the same: explore the delay data by origin.
Introduce the analysis. The chart in this slide includes geographical information and some delay summaries. After browsing this chart, readers should expect that this analysis is about origins and delays.
- Use a map to display the different origins, and use color to denote the average departure delay.
- Embed a simple pie chart to represent the proportion of different delay causes.
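The proportions behind the pie chart amount to each cause's share of the summed cause minutes. Below is a minimal plain-Python sketch of that calculation (not the code behind the story), assuming the five cause columns are named CarrierDelay, WeatherDelay, NASDelay, SecurityDelay and LateAircraftDelay as in the 2006 CSV, and that missing values have already been parsed as None; the sample rows are made up.

```python
# Assumed column names for the five delay causes (minutes).
CAUSES = ["CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay", "LateAircraftDelay"]

def cause_proportions(rows):
    """Return each delay cause's share of the total minutes across all causes."""
    totals = {c: 0.0 for c in CAUSES}
    for row in rows:
        for c in CAUSES:
            value = row.get(c)
            if value is not None:  # skip missing entries (assumed parsed as None)
                totals[c] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {c: 0.0 for c in CAUSES}
    return {c: totals[c] / grand_total for c in CAUSES}

# Hypothetical sample rows, not real 2006 data.
sample = [
    {"CarrierDelay": 10, "WeatherDelay": 0, "NASDelay": 5, "SecurityDelay": 0, "LateAircraftDelay": 25},
    {"CarrierDelay": 0, "WeatherDelay": 20, "NASDelay": 15, "SecurityDelay": 0, "LateAircraftDelay": None},
]
shares = cause_proportions(sample)
for cause, share in shares.items():
    print(f"{cause}: {share:.1%}")
```

The shares always sum to 1 (when any delay is recorded), which is what a pie chart needs.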
Look at the delay data over the year.
- Use the full names of variables in the plot (avoid abbreviations).
- Plot departure delay as bars, since it is the main variable in the analysis.
- Plot the other delay causes as lines, so readers can easily recognize their relative values.
- Introduce a calculated field for months so the x axis displays month names instead of numbers.
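The month-name calculated field and the monthly bars can be approximated in a few lines. This is a plain-Python sketch under stated assumptions, not the story's actual calculated field: the Month (1 to 12) and DepDelay (minutes) column names are assumed from the 2006 CSV, rows with a missing DepDelay are skipped, and the sample data is hypothetical.

```python
import calendar
from collections import defaultdict

def monthly_average_delay(rows):
    """Mean DepDelay (minutes) per month, keyed by month name."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        delay = row.get("DepDelay")
        if delay is None:  # e.g. cancelled flights carry no departure delay
            continue
        sums[row["Month"]] += delay
        counts[row["Month"]] += 1
    # Map month numbers 1-12 to names, like the calculated field for the x axis.
    return {calendar.month_name[m]: sums[m] / counts[m] for m in sorted(counts)}

# Hypothetical sample rows, not real 2006 data.
sample = [
    {"Month": 1, "DepDelay": 10},
    {"Month": 1, "DepDelay": 20},
    {"Month": 2, "DepDelay": -5},
    {"Month": 2, "DepDelay": None},
]
averages = monthly_average_delay(sample)
print(averages)
```

`calendar.month_name` follows the current locale, so the names are English under the default C locale.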
Find which origins are responsible for most delays, along with their traffic and geographical location.
- Use horizontal bar charts to visualize the contributions of delay time, and display only the top 10 results to avoid distraction.
- Use bubble charts to show the relative traffic of different origins and highlight the top 3.
- Use a map and annotations to include more information about the three major airports.
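Both rankings on this slide reduce to grouping by origin: summing departure delay for the bar chart and counting records for the bubble chart. A minimal stdlib sketch, assuming Origin and DepDelay column names from the 2006 CSV and a made-up sample; it is an illustration, not the code that produced the story.

```python
from collections import Counter

def top_origins_by_total_delay(rows, n=10):
    """Origins ranked by summed DepDelay in minutes, largest first."""
    totals = Counter()
    for row in rows:
        if row.get("DepDelay") is not None:
            totals[row["Origin"]] += row["DepDelay"]
    return totals.most_common(n)

def top_origins_by_traffic(rows, n=3):
    """Origins ranked by number of records (flights), largest first."""
    return Counter(row["Origin"] for row in rows).most_common(n)

# Hypothetical sample: (origin, departure delay in minutes).
sample = [
    {"Origin": o, "DepDelay": d}
    for o, d in [("ORD", 30), ("ORD", 20), ("ORD", 10), ("ORD", 40),
                 ("ATL", 15), ("ATL", 25), ("ATL", 20), ("ADK", 50)]
]
total_ranking = top_origins_by_total_delay(sample)
traffic_ranking = top_origins_by_traffic(sample)
print(total_ranking)
print(traffic_ranking)
```

In this toy sample the busiest origin also has the largest total delay, which is the pattern that makes total delay a weak metric on its own.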
Find which origins have the highest average delay time.
- Use horizontal bar charts to visualize the 10 airports with the highest average departure delay.
- Use a map and annotations to include more information about the top three airports.
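Switching from total to average delay removes the advantage of high-traffic airports. A plain-Python sketch of the top-10 average ranking, assuming the same Origin and DepDelay columns and reusing a hypothetical sample in which a small airport has little total delay but a high average:

```python
from collections import defaultdict

def top_origins_by_average_delay(rows, n=10):
    """Origins ranked by mean DepDelay in minutes, highest first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        delay = row.get("DepDelay")
        if delay is None:  # skip records without a departure delay
            continue
        sums[row["Origin"]] += delay
        counts[row["Origin"]] += 1
    averages = {origin: sums[origin] / counts[origin] for origin in counts}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical sample: (origin, departure delay in minutes).
sample = [
    {"Origin": o, "DepDelay": d}
    for o, d in [("ORD", 30), ("ORD", 20), ("ORD", 10), ("ORD", 40),
                 ("ATL", 15), ("ATL", 25), ("ATL", 20), ("ADK", 50)]
]
ranking = top_origins_by_average_delay(sample)
print(ranking)
```

Here the origin with the smallest total delay ranks first by average, because it has a single, long-delayed record. A median (for example via `statistics.median`) would be a drop-in alternative if outliers dominate.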
Explore Adak Airport (ADK) data.
- Plot departure delay as bars and other delay causes as lines, as in the year-overview slide, but for ADK data.
- Add a detailed view of each delay cause per month as stacked plots, so readers can easily inspect each month's delay causes.
- Add record counts by day of week to show that there are usually only two flights per week.
- Add a filter to select the month of interest.
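The day-of-week counts combined with the month filter can be sketched as a filtered group-by. This stdlib example is illustrative only: it assumes Origin, Month and DayOfWeek columns, with DayOfWeek coded 1 = Monday through 7 = Sunday, and the sample rows are hypothetical.

```python
import calendar
from collections import Counter

def flights_by_weekday(rows, origin="ADK", month=None):
    """Count records per weekday for one origin, optionally for a single month."""
    counts = Counter()
    for row in rows:
        if row["Origin"] != origin:
            continue
        if month is not None and row["Month"] != month:  # the month filter
            continue
        # Assumption: DayOfWeek is coded 1 = Monday ... 7 = Sunday.
        counts[calendar.day_name[row["DayOfWeek"] - 1]] += 1
    return dict(counts)

# Hypothetical sample rows, not real 2006 data.
sample = [
    {"Origin": "ADK", "Month": 1, "DayOfWeek": 3},
    {"Origin": "ADK", "Month": 1, "DayOfWeek": 7},
    {"Origin": "ADK", "Month": 1, "DayOfWeek": 3},
    {"Origin": "ADK", "Month": 1, "DayOfWeek": 7},
    {"Origin": "ADK", "Month": 2, "DayOfWeek": 3},
    {"Origin": "ORD", "Month": 1, "DayOfWeek": 1},
]
january = flights_by_weekday(sample, month=1)
whole_year = flights_by_weekday(sample)
print(january)
print(whole_year)
```

Weekdays with no records simply do not appear, which makes a two-flights-per-week schedule easy to spot.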
I asked my mentor at Udacity for feedback. His suggestions were thorough and helpful, and can be summarized as follows:
- Have an introduction slide that (1) introduces the data set in one line and (2) lists the main findings you are going to present in the story.
- Too much information in slide 1. Filters are best left for later slides, once the reader has some familiarity with the data being presented.
- Avoid using abbreviations.
- Build the story from more general facts to more specific facts.
- Total delay is not a good metric; an average or median is more appropriate.
- Try to limit your results to the top/bottom 10, maybe. Too many bars or circles distract the reader.
- A little confused by the other delays being mixed in.
- Use units of measurement on the axes. If you want to compare the different types of delay, do so on a separate slide.
- Would love to see months as names, not numbers.