Rutgers-Data-Science-Bootcamp/MechaCar_Statistical_Analysis

MechaCar_Statistical_Analysis

Background

AutosRUs’ newest prototype, the MechaCar, is suffering from production troubles that are blocking the manufacturing team’s progress. AutosRUs’ upper management has called on the data analytics team to review the production data for insights that may help the manufacturing team.

Tools (R) and Approaches

    1. Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes
    2. Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots
    3. Run t-tests to determine if the manufacturing lots are statistically different from the population mean
    4. Design a statistical study to compare vehicle performance of the MechaCar vehicles against vehicles from other manufacturers

Each statistical analysis is followed by a summary interpretation of the findings.

Resources

Results

The full analysis (R input and output) is available on GitHub Pages: Results

1. Multiple linear regression model to predict MPG

  • Using R, performed a multiple linear regression analysis to identify which variables in the dataset are statistically significant predictors of the mpg of MechaCar prototypes. The results showed that vehicle_length and ground_clearance (as well as the intercept) provide a non-random amount of variance to the linear model of mpg.

Screen Shot 2022-09-11 at 11 59 25 PM

  • According to the results, the multiple linear regression model is:

    • mpg = 6.27 * vehicle_length + 1.25e-3 * vehicle_weight + 6.88e-2 * spoiler_angle - 3.41 * AWD + 3.55 * ground_clearance - 1.04e+2
  • Approximated (omitting the vehicle_weight and spoiler_angle terms) to:

    • mpg = 6.27 * vehicle_length - 3.41 * AWD + 3.55 * ground_clearance - 104
  • Because at least one coefficient is statistically significant, the slope of the linear model is not considered to be zero.
  • The adjusted R-squared is 0.68, so about 68% of the variation in mpg can be explained by changes in the vehicle length, the vehicle weight, the spoiler angle, the drivetrain and the ground clearance. We can consider this linear model fairly effective at predicting the mpg of MechaCar prototypes.

  • Regression line with vehicle length

Rplot1

  • Regression line with ground clearance

Rplot2
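
As a rough illustration of the regression step described above, the following sketch fits the same kind of model with `lm()`. The data frame `cars` is synthetic and generated on the spot (its values are made up and are not the MechaCar data); only the column names follow the variables named in this README.

```r
# Synthetic stand-in for the MechaCar data; all values are invented for illustration.
set.seed(42)
n <- 50
cars <- data.frame(
  vehicle_length   = rnorm(n, mean = 15, sd = 1),
  vehicle_weight   = rnorm(n, mean = 6000, sd = 500),
  spoiler_angle    = runif(n, min = 0, max = 90),
  ground_clearance = rnorm(n, mean = 13, sd = 1),
  AWD              = rbinom(n, size = 1, prob = 0.5)
)

# Simulated response: only length and clearance carry real signal in this toy data.
cars$mpg <- -104 + 6.27 * cars$vehicle_length +
  3.55 * cars$ground_clearance + rnorm(n, sd = 5)

# Fit the multiple linear regression on all five predictors.
model <- lm(mpg ~ vehicle_length + vehicle_weight + spoiler_angle +
              ground_clearance + AWD, data = cars)
fit <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
print(fit$coefficients)

# Adjusted R-squared of the fitted model.
print(fit$adj.r.squared)
```

Predictors whose `Pr(>|t|)` value falls below the chosen significance level are the ones read as contributing a non-random amount of variance.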

2. Summary Statistics on Suspension Coils

  • Using R, calculated the summary statistics as follows:
    • PSI of all lots together

Screen Shot 2022-09-12 at 12 14 58 AM

    • PSI of each of the three lots separately

Screen Shot 2022-09-12 at 12 15 11 AM

The design specification is met across all manufacturing lots combined, with an overall variance of 62.3 psi². At the lot level, Lot 1 and Lot 2 are within specification, with variances of 0.98 and 7.5 psi² respectively. Lot 3 is out of specification, with a variance of 170.3 psi².
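
A minimal sketch of how such summary statistics could be computed in base R is shown below. The data frame `coils`, its column names `Manufacturing_Lot` and `PSI`, and the helper `summarise_psi` are assumptions made for illustration, and the values are simulated rather than taken from the real suspension coil data.

```r
# Simulated suspension coil data; lot labels and values are illustrative only.
set.seed(1)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, mean = 1500, sd = 1),
          rnorm(50, mean = 1500, sd = 3),
          rnorm(50, mean = 1496, sd = 13))
)

# Hypothetical helper: mean, median, variance and standard deviation of a vector.
summarise_psi <- function(x) {
  c(Mean = mean(x), Median = median(x), Variance = var(x), SD = sd(x))
}

# Summary over all lots together.
total_summary <- summarise_psi(coils$PSI)
print(total_summary)

# Summary per lot: one row per lot, one column per statistic.
lot_summary <- t(sapply(split(coils$PSI, coils$Manufacturing_Lot), summarise_psi))
print(lot_summary)
```

The `Variance` column of `lot_summary` is the value that would be compared against the design specification for each lot.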

3. T-Tests on Suspension Coils

  • T-test of all manufacturing lots against the population mean

Screen Shot 2022-09-12 at 12 22 44 AM

Assuming the common significance level of 0.05 (5 percent), our p-value of 0.05734 is greater than 0.05. Therefore, we do not have sufficient evidence to reject the null hypothesis, and we conclude that the mean PSI across all manufacturing lots is not statistically different from the population mean of 1500 psi.
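
The decision rule above can be sketched with a one-sample `t.test()` against the population mean of 1500 psi. The vector `psi` here is simulated and is only a placeholder for the real measurements, so the resulting p-value will not match the one reported in this README.

```r
# Simulated PSI measurements for all lots combined; values are illustrative only.
set.seed(7)
psi <- rnorm(150, mean = 1499, sd = 8)

# One-sample, two-sided t-test of H0: the true mean PSI equals 1500.
result <- t.test(psi, mu = 1500)
print(result)

# Compare the p-value with the 0.05 significance level.
alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject H0: the mean PSI differs from 1500\n")
} else {
  cat("Fail to reject H0: no significant difference from 1500\n")
}
```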

  • T-test of each manufacturing lot against the population mean

Lot1

Screen Shot 2022-09-12 at 12 26 38 AM

Lot2

Screen Shot 2022-09-12 at 12 28 51 AM

Lot3

Screen Shot 2022-09-12 at 12 29 00 AM

According to the results above, the Lot 3 p-value is lower than 0.05, so we can reject the null hypothesis and conclude that the mean PSI of Lot 3 is statistically different from the population mean. The p-values for Lot 1 and Lot 2 are both above the significance level, so we cannot reject the null hypothesis: there is no statistically significant difference between the mean PSI of Lot 1 or Lot 2 and the population mean.
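
One possible way to run the per-lot tests in a single pass is sketched below. As before, the `coils` data frame and its column names are assumptions and the values are simulated, so the p-values it prints are not the ones shown in the screenshots.

```r
# Simulated suspension coil data; lot labels and values are illustrative only.
set.seed(3)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, mean = 1500, sd = 1),
          rnorm(50, mean = 1500, sd = 3),
          rnorm(50, mean = 1496, sd = 13))
)

# Split PSI by lot and run a one-sample t-test against 1500 for each lot.
p_values <- sapply(
  split(coils$PSI, coils$Manufacturing_Lot),
  function(x) t.test(x, mu = 1500)$p.value
)
print(p_values)

# TRUE marks lots whose mean PSI differs significantly from 1500 at the 0.05 level.
print(p_values < 0.05)
```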

4. Study Design: MechaCar vs Competition

To compare the performance of the MechaCar prototype against the vehicles from the competition, we will perform a statistical analysis based on the following metrics:

  • the city fuel economy (mpg_city),
  • the highway fuel economy (mpg_highway),
  • the horsepower,
  • the wheelbase.

Hypothesis
  • Our null hypothesis (H0) would be: each performance metric is similar between the MechaCar prototype and all vehicles from the other manufacturers.
  • Our alternative hypothesis (H1) would be: at least one of the performance metrics is statistically different between the MechaCar prototype and all vehicles from the other manufacturers.

We would use a one-way ANOVA test. This test is used to compare the means of a continuous numerical variable across a number of groups. So in this analysis we would compare the means for each metric across the different manufacturers.

To perform the test, we would need data for the MechaCar vehicles and their competition, all gathered in a single dataframe where each metric is a column. Example data can be found on Kaggle, such as car data.
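
To make the proposed study more concrete, here is a small sketch of a one-way ANOVA for a single metric using `aov()`. The data frame `perf`, the manufacturer names `CompetitorA` and `CompetitorB`, and all values are hypothetical; in a real study the same call would be repeated for each metric column.

```r
# Hypothetical performance data: one row per vehicle, one column per metric.
set.seed(11)
perf <- data.frame(
  manufacturer = factor(rep(c("MechaCar", "CompetitorA", "CompetitorB"), each = 30)),
  mpg_city = c(rnorm(30, mean = 25, sd = 3),
               rnorm(30, mean = 23, sd = 3),
               rnorm(30, mean = 24, sd = 3))
)

# One-way ANOVA: does mean mpg_city differ across manufacturers?
fit <- aov(mpg_city ~ manufacturer, data = perf)
anova_table <- summary(fit)[[1]]
print(anova_table)

# The p-value of the manufacturer effect is in the first row of the table.
p_value <- anova_table[["Pr(>F)"]][1]
print(p_value)
```

A p-value below the chosen significance level would indicate that at least one manufacturer's mean differs for that metric; it does not by itself say which one.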
