AutosRUs’ newest prototype, the MechaCar, is suffering from production troubles that are blocking the manufacturing team’s progress. AutosRUs’ upper management has called on the data analytics team to review the production data for insights that may help the manufacturing team.
- Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes
- Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots
- Run t-tests to determine whether the manufacturing lots are statistically different from the population mean
- Design a statistical study to compare the performance of MechaCar vehicles against vehicles from other manufacturers
For each statistical analysis, a summary interpretation of the findings is provided.
The analysis process (input and output in R) can be found here on GitHub Pages.
Results
- Using R, performed a multiple linear regression analysis to identify which variables in the dataset are statistically significant predictors of the mpg of MechaCar prototypes. The results showed that vehicle_length and ground_clearance (as well as the intercept) provide a non-random amount of variance to the linear model of mpg.
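To illustrate what the regression step computes, here is a minimal pure-Python sketch of ordinary least squares that solves the normal equations. The original analysis was done in R (where `lm()` fits this kind of model); the function name `ols_fit` and the data are hypothetical, and the noise-free toy data only shows that the fit recovers known coefficients. The sketch does not compute p-values, so on its own it cannot tell which predictors are significant.

```python
"""Minimal multiple linear regression (ordinary least squares) in pure Python.

Solves the normal equations (X^T X) b = X^T y with Gaussian elimination.
The data below is hypothetical and noise-free; it is NOT the MechaCar dataset.
"""


def ols_fit(rows, y):
    """Return [intercept, b1, b2, ...] for y regressed on rows (one row per observation)."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(len(X)))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        coef[i] = (A[i][k] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


# Hypothetical predictors (x1, x2) and a response built exactly as y = 2 + 3*x1 - x2.
predictors = [(1, 4), (2, 1), (3, 5), (4, 2), (5, 7), (6, 3)]
response = [2 + 3 * x1 - x2 for x1, x2 in predictors]
coefficients = ols_fit(predictors, response)
print([round(c, 6) for c in coefficients])
```

With real, noisy data the same equations give the least-squares estimates, and a statistics package is then needed for standard errors and p-values.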
According to the results, the multiple linear regression model is:
- mpg = 6.27 * vehicle_length + 1.25e-3 * vehicle_weight + 6.88e-2 * spoiler_angle - 3.41 * AWD + 3.55 * ground_clearance - 1.04e+2
Dropping the vehicle_weight and spoiler_angle terms and rounding the intercept, the model can be approximated as:
- mpg = 6.27 * vehicle_length - 3.41 * AWD + 3.55 * ground_clearance - 104
Since at least some predictors (vehicle_length and ground_clearance) are statistically significant, the slope of the linear model is not considered to be zero.
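The fitted equations can be evaluated directly. The Python sketch below plugs the coefficients reported above into both the full and the approximated model; the dictionary names, the `predict_mpg` helper and the input vehicle are hypothetical, and the inputs are made up rather than taken from the MechaCar data.

```python
# Coefficients as reported for the full multiple linear regression model.
FULL_MODEL = {
    "intercept": -1.04e2,
    "vehicle_length": 6.27,
    "vehicle_weight": 1.25e-3,
    "spoiler_angle": 6.88e-2,
    "AWD": -3.41,
    "ground_clearance": 3.55,
}

# Approximated model: the vehicle_weight and spoiler_angle terms are dropped.
APPROX_MODEL = {k: v for k, v in FULL_MODEL.items()
                if k not in ("vehicle_weight", "spoiler_angle")}


def predict_mpg(model, **features):
    """Evaluate mpg = intercept + sum(coefficient * feature) for one vehicle."""
    return model["intercept"] + sum(
        coef * features[name] for name, coef in model.items() if name != "intercept"
    )


# Hypothetical prototype: these input values are made up for illustration.
car = dict(vehicle_length=15, vehicle_weight=3000, spoiler_angle=50,
           AWD=1, ground_clearance=14)
full = predict_mpg(FULL_MODEL, **car)
approx = predict_mpg(APPROX_MODEL, **car)
print(f"full model: {full:.2f} mpg, approximated model: {approx:.2f} mpg")
```

For this made-up vehicle the two forms differ by about 7 mpg, which is the combined contribution of the vehicle_weight and spoiler_angle terms left out of the approximation.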
The adjusted R-squared is 0.68, so roughly 68% of the variation in mpg can be explained by the vehicle length, the vehicle weight, the spoiler angle, the drivetrain (AWD) and the ground clearance. This linear model can therefore be considered reasonably effective at predicting the mpg of MechaCar prototypes.
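For reference, the adjusted R-squared corrects the ordinary R² for the number of predictors p relative to the number of observations n; in this model p = 5:

$$
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
$$

Because of this penalty, adding a predictor that explains little of the variation in mpg can lower the adjusted R-squared even though R² itself never decreases.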
- Regression line with vehicle length
- Regression line with ground clearance
- Using R, calculated the summary statistics of the suspension coil PSI as follows:
- PSI of all lots together
- T-test of all manufacturing lots against the population mean
Assuming the common significance level of 0.05, our p-value of 0.05734 is greater than 0.05. Therefore, we do not have sufficient evidence to reject the null hypothesis: the mean PSI across all manufacturing lots is not statistically different from the population mean of 1500 PSI.
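For reference, the one-sample t-test compares the sample mean PSI $\bar{x}$ of $n$ coils, with sample standard deviation $s$, against the population mean $\mu_0 = 1500$ PSI:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1, \qquad p = 2\,P\left(T_{n-1} \ge |t|\right)
$$

The null hypothesis is rejected when $p < 0.05$.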
- T-test of each manufacturing lot against the population mean
According to the results above, the p-value for Lot 3 is lower than 0.05, so we can reject the null hypothesis and conclude that the mean PSI of Lot 3 is statistically different from the population mean. In contrast, the p-values for Lot 1 and Lot 2 are both above the significance level, so we cannot reject the null hypothesis: the mean PSI of Lot 1 and Lot 2 is not statistically different from the population mean.
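The summary statistics and one-sample t-tests above can be sketched in Python with only the standard library. This is an illustration, not the R code used in the analysis (R's `t.test(x, mu = 1500)` performs the same two-sided test). The PSI readings and helper names are hypothetical, made up so that the pattern resembles the one described above; they will not reproduce the reported p-values. Because the standard library has no Student t distribution, the p-value is computed from the regularized incomplete beta function using the Numerical Recipes continued-fraction method.

```python
import math
import statistics
from collections import defaultdict

POPULATION_MEAN = 1500  # PSI, the population mean used in the text
ALPHA = 0.05            # significance level used in the text


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (Numerical Recipes form)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_sample_t_test(values, mu):
    """Two-sided one-sample t-test; returns (t, df, p_value)."""
    n = len(values)
    t = (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))
    df = n - 1
    # Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2).
    return t, df, reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# Hypothetical (lot, PSI) readings; NOT the real suspension coil data.
readings = [
    ("Lot1", 1500), ("Lot1", 1501), ("Lot1", 1499), ("Lot1", 1500),
    ("Lot2", 1503), ("Lot2", 1498), ("Lot2", 1501), ("Lot2", 1500),
    ("Lot3", 1495), ("Lot3", 1497), ("Lot3", 1494), ("Lot3", 1496), ("Lot3", 1493),
]
by_lot = defaultdict(list)
for lot, psi in readings:
    by_lot[lot].append(psi)
groups = {"All lots": [psi for _, psi in readings]}
groups.update({lot: by_lot[lot] for lot in sorted(by_lot)})

results = {}
for name, values in groups.items():
    t, df, p = one_sample_t_test(values, POPULATION_MEAN)
    results[name] = {
        "mean": statistics.mean(values), "median": statistics.median(values),
        "variance": statistics.variance(values), "sd": statistics.stdev(values),
        "t": t, "df": df, "p": p,
    }
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{name}: mean={results[name]['mean']:.2f} sd={results[name]['sd']:.2f} "
          f"t={t:.3f} df={df} p={p:.4f} -> {verdict}")
```

The same test function is applied to all lots together and to each lot separately, mirroring the two sets of tests described above.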
To compare the performance of the MechaCar prototype against vehicles from the competition, we will perform a statistical analysis based on the following metrics:
- fuel economy in the city (mpg_city),
- fuel economy on the highway (mpg_highway),
- horsepower,
- wheelbase.
- Our null hypothesis (H0) would be: each performance metric is similar between the MechaCar prototype and vehicles from other manufacturers.
- Our alternative hypothesis (H1) would be: at least one of the performance metrics is statistically different between the MechaCar prototype and vehicles from other manufacturers.
We would use a one-way ANOVA test, which compares the means of a continuous numerical variable across several groups. In this analysis, we would run it once per metric to compare that metric's mean across the different manufacturers.
To perform the test, we would need data for MechaCar vehicles and their competition, gathered in a single dataframe with one column per metric and a column identifying the manufacturer of each vehicle. Example data can be found on Kaggle, such as the car data dataset.
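A one-way ANOVA for this study design could be sketched as follows in Python (in R, the same test is usually run with `aov()`). The rows mimic the proposed layout, one row per vehicle with a `manufacturer` column and one column per metric; the manufacturer names, values and function names are hypothetical. Since the Python standard library has no F distribution, the p-value is computed from the regularized incomplete beta function (Numerical Recipes continued fraction).

```python
import math
from collections import defaultdict


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (Numerical Recipes form)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova(rows, metric, group_key="manufacturer"):
    """One-way ANOVA of `metric` across groups; returns (F, df_between, df_within, p)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[metric]))
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2 for g, vals in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of F(df_between, df_within).
    p = reg_inc_beta(df_within / 2.0, df_between / 2.0,
                     df_within / (df_within + df_between * f_stat))
    return f_stat, df_between, df_within, p


METRICS = ("mpg_city", "mpg_highway", "horsepower", "wheelbase")
# Hypothetical rows: one row per vehicle, one column per metric.
ROWS = [
    {"manufacturer": m, "mpg_city": c, "mpg_highway": h, "horsepower": hp, "wheelbase": wb}
    for m, c, h, hp, wb in [
        ("MechaCar", 24, 30, 180, 106), ("MechaCar", 25, 32, 185, 107),
        ("MechaCar", 23, 31, 178, 105), ("BrandA", 22, 28, 200, 110),
        ("BrandA", 21, 29, 195, 111), ("BrandA", 23, 27, 205, 109),
        ("BrandB", 26, 31, 170, 104), ("BrandB", 24, 30, 175, 103),
        ("BrandB", 25, 32, 172, 105),
    ]
]

for metric in METRICS:
    f_stat, df1, df2, p = one_way_anova(ROWS, metric)
    print(f"{metric}: F({df1}, {df2}) = {f_stat:.2f}, p = {p:.4f}")
```

One ANOVA is run per metric, matching the plan of comparing each metric's mean across manufacturers. A significant result only indicates that at least one manufacturer's mean differs; identifying which one would require a post-hoc comparison.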