Data Source Uci Rate Of Accident Per Million St

Online BA240 Individual Project Report

  • THIS PROJECT IS PRESENTED IN WORD FORMAT SO YOU CAN USE THE TABLES INCLUDED HERE.
  • The instructor reserves the right to adjust individual scores.
  • Individual or team projects that are just Excel printouts will receive 0 points.
  • Excel instructions are under the “Project” link on Canvas.

INDIVIDUAL Project

Data Options: Discuss with your team members which topic you want to analyze. Your team can either select one of the datasets listed under “Data Options” (under “Project” on Canvas) or find another online dataset. The “Data source” document also lists some websites you may find useful for finding data.

Each team member chooses one of the independent variables (x) in the dataset to analyze along with the dependent variable (y). All team members will have the same dependent variable (y) but different independent variables (x). Review the Excel videos and the linear regression material (Chapter 11) before starting your own analysis.

Number all your answers in your submission. Although you are sharing data, you must complete the analysis and interpretation individually.

  • Introduction:
      ◦ Search for information related to your dataset's topic and write an introduction to the topic.
      ◦ State your assumptions before doing the data analysis (for example, the bigger the house, the higher its price may be).
  • Describing the data:
      ◦ Plot histograms of your x variable and your y variable using reasonable intervals for each set. (There will be two histograms. Note that a histogram has no gaps between its bars.)
      ◦ Comment on the shape of each distribution (for example, whether it is symmetric, right-skewed, or left-skewed).
  • Analyze whether the x and y distributions satisfy the empirical rule (Yes or No; explain why). Show the details: the range within 1 standard deviation, within 2 standard deviations, and within 3 standard deviations of the mean, and the actual percentage of data falling in each range.
  • Identify and list all outliers in each distribution (both x and y) using an appropriate method, and explain why they are outliers. If either distribution (x or y) has more than 10 outliers, you can list just the top 10.
  • Calculate the mean, median, and mode. Complete the table below with the five-number summary (minimum, Q1, median, Q3, maximum) and the z-score of each value.
  • The Regression: Show the output and all the plots from Excel's Simple Linear Regression analysis. You can copy and paste the Excel output and plots.
  • The Regression: Create a scatter plot of your two variables in Excel, with the dependent variable (y) on the vertical axis and the independent variable (x) on the horizontal axis. Write a paragraph about your findings from the scatter plot.
  • The Regression: Display the “Line Fit Plots” from the Simple Linear Regression output. Does the plot show a linear relationship between these two variables? Explain why or why not.
  • The Regression: Is this regression model important/significant? Why or why not?
  • The Regression: Are all parameters important/significant? Why or why not?
  • The Regression: Show the mathematical equation of this model. Then give two examples: select any two meaningful values of x, predict y for each, and interpret the equation in words.
  • The Regression: Is this model a reliable predictor of y? Explain how much of the variation in y the model explains. Do you think there is a strong correlation? Explain why or why not.
  • The Regression: Assumption check
  • Summary: Write at least one paragraph including a summary of your findings from the plots, numerical measurements, and data analysis; what you have learned from the project; and any comments you have or any further improvements to your model.
  • Appendix if needed
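Excel is the required tool for this project, but if you want to cross-check the empirical-rule and outlier items above, the following Python sketch shows the arithmetic. It uses only the standard library, takes the sample standard deviation (like Excel's STDEV.S), and flags outliers with the 1.5 × IQR fence rule, which is one common method rather than the only acceptable one. The function names and the list of values are made up for illustration and do not come from any course dataset.

```python
import statistics

# Reference percentages from the empirical rule for a bell-shaped distribution.
EXPECTED_SHARE = {1: "68%", 2: "95%", 3: "99.7%"}


def empirical_rule_check(data):
    """For k = 1, 2, 3, return (k, low, high, share), where [low, high] is
    mean - k*sd to mean + k*sd and share is the fraction of values inside it."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation, like Excel's STDEV.S
    rows = []
    for k in (1, 2, 3):
        low, high = mean - k * sd, mean + k * sd
        inside = sum(low <= v <= high for v in data)
        rows.append((k, low, high, inside / len(data)))
    return rows


def iqr_outliers(data):
    """Return the values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(v for v in data if v < lower or v > upper)


# Hypothetical values for illustration only; replace with your x or y column.
x = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.4, 9.5]

for k, low, high, share in empirical_rule_check(x):
    print(f"within {k} SD: [{low:.2f}, {high:.2f}] holds {share:.0%} "
          f"(empirical rule: about {EXPECTED_SHARE[k]})")
print("outliers:", iqr_outliers(x))
```

With this made-up list, the single large value (9.5) inflates the standard deviation, so 90% of the values fall within 1 standard deviation instead of about 68%. That is the kind of mismatch to explain when you answer Yes or No for the empirical rule.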

Statistic            | x | z-score (x) | y | z-score (y)
Mean                 |   |             |   |
Median               |   |             |   |
Mode                 |   |             |   |
Standard Deviation   |   | NA          |   | NA
Min                  |   |             |   |
25 percentile (Q1)   |   |             |   |
75 percentile (Q3)   |   |             |   |
Max                  |   |             |   |
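The entries of the table above can be checked the same way. In this sketch each z-score is computed as z = (value - mean) / standard deviation, using the sample standard deviation. The quartiles use Python's `method="inclusive"` option, which is assumed here to follow the same convention as Excel's QUARTILE.INC; compare against your own Excel output, since software packages differ in how they compute quartiles. The function name and the sample values are hypothetical.

```python
import statistics


def summary_table(data):
    """Return {statistic: (value, z-score)} for the rows of the summary table.

    z-score = (value - mean) / sd, with sd the sample standard deviation.
    The z-score of the standard deviation itself is not meaningful (NA -> None).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    # 'inclusive' quartiles; assumed to match Excel's QUARTILE.INC convention.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    rows = [
        ("Mean", mean),
        ("Median", statistics.median(data)),
        ("Mode", statistics.mode(data)),  # first value seen if several values tie
        ("Standard Deviation", sd),
        ("Min", min(data)),
        ("25 percentile", q1),
        ("75 percentile", q3),
        ("Max", max(data)),
    ]
    return {
        name: (value, None if name == "Standard Deviation" else (value - mean) / sd)
        for name, value in rows
    }


# Hypothetical y values for illustration only.
y = [12, 15, 15, 18, 20, 22, 25, 30]

for name, (value, z) in summary_table(y).items():
    z_text = "NA" if z is None else f"{z:.2f}"
    print(f"{name:<20} {value:>8.2f} {z_text:>8}")
```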

Assumption check: Write a paragraph on the four regression assumptions about the errors (1. mean of 0; 2. constant variance; 3. independence; 4. normal distribution) and explain whether your model satisfies or violates each one.