Data Analysis
A linear regression tests whether a relationship exists between one or more independent variables and a dependent variable. If a significant relationship exists, the fitted coefficients show how strongly each demographic factor affects school performance.
A scatterplot displays the relationship between one independent variable and the dependent variable visually, showing whether that relationship is strong or weak. Because scatterplots make a correlation, or the lack of one, easy to see, they are useful for judging whether a given demographic factor affects school performance.
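The strength of the relationship a scatterplot shows can also be summarized as a single number, the Pearson correlation coefficient. The following is a minimal sketch in Python rather than the R used for the study; the function name `pearson_r` and the data values are hypothetical and are not taken from the study's dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (assumption, not study data).
hardship_index = [10, 25, 40, 55, 70, 85]
math_growth = [52, 47, 60, 41, 55, 49]

r = pearson_r(hardship_index, math_growth)
print(f"r = {r:.3f}")
```

A value of r near 0 corresponds to a scattered cloud of points, while a value near 1 or -1 corresponds to points lying close to a line.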
Like the linear regression, a chi-square test for independence checks whether a demographic characteristic is related to school performance. Unlike the linear regression, the chi-square test is run separately for each factor variable and tests only whether a relationship exists.
Unlike the ordinary linear regression described above, this linear regression is trained as a machine learning model on the dataset. It shows whether a relationship exists and predicts school performance from the chosen x-variable.
Linear regression
Variable: Coefficient
Percent above 25 w/o high school diploma: 0.537
Hardship Index: -0.017
Percent Below Poverty: -0.671
Per Capita Income: 0.000176
The linear regression conducted in R shows no significant relationship between any of the demographic variables and math performance in the school. In other words, none of the factors has a statistically significant effect on the math growth percentile.
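To illustrate how a coefficient is judged significant or not, here is a minimal sketch of a single-predictor least-squares fit with a t-statistic for the slope. It is written in Python rather than R and uses one predictor, so it is a simplification and not the model fitted in the study. The function name `fit_simple_ols` and all data values are hypothetical.

```python
import math

def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, t_stat), where t_stat is the slope
    divided by its standard error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used to estimate the error variance.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

# Hypothetical values for illustration only (assumption, not study data).
percent_below_poverty = [5, 12, 18, 22, 27, 33, 41, 48]
math_growth = [55, 42, 61, 48, 53, 39, 58, 47]

slope, intercept, t_stat = fit_simple_ols(percent_below_poverty, math_growth)

# Two-sided critical value of the t distribution for alpha = 0.05
# with n - 2 = 6 degrees of freedom.
T_CRITICAL = 2.447
print(f"slope = {slope:.4f}, t = {t_stat:.2f}")
print("significant" if abs(t_stat) > T_CRITICAL else "not significant")
```

When the absolute t-statistic falls below the critical value, the slope cannot be distinguished from zero at the chosen alpha level, which is the situation the results above describe.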
Chi-Squared tests
Null hypothesis: There exists no relationship
Variable: P-value
Percent above 25 w/o high school diploma: 0.209
Hardship Index: 0.245
Percent Below Poverty: 0.2295
Per Capita Income: 0.245
Alpha level = 0.05
Since every p-value is above the alpha level of 0.05, we fail to reject the null hypothesis in each test. We therefore have no evidence of a relationship between any of the demographic factors and the school's math growth percentile. This agrees with the result of the linear regression model shown above.
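The decision rule above can be sketched for a single factor as follows. This is a minimal Python illustration, not the R code used for the study. It assumes the factor and the outcome have each been split into two categories, which the study does not state, and it applies no continuity correction. The function names and the counts in the table are hypothetical.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_dof1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts (assumption, not study data).
# Rows: low poverty, high poverty. Columns: lower growth, higher growth.
observed = [[18, 22],
            [24, 16]]

ALPHA = 0.05
stat, dof = chi_square_independence(observed)
p_value = p_value_dof1(stat)  # valid here because dof == 1
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

The `p_value_dof1` helper only covers 2x2 tables; a table with more categories would need the general chi-square distribution from a statistics library.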
Scatterplots
[Figure: Hardship Index]
[Figure: Household Below poverty]
[Figure: Per capita income]
[Figure: above 25 w/o high school diploma]

Training set: the linear regression model is fit to this existing data. It uses the data to learn a line that predicts math growth percentile from the percent of households below poverty.
Test set: the model applies the line it learned from the training data to predict math scores from the percent of households below poverty. The points are widely scattered and follow no pattern that would indicate a correlation.
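The train-then-test procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's code: the data are synthetic random values with no built-in relationship, the 80/20 split ratio is an assumption, and the names `fit_line` and `mean_squared_error` are hypothetical helpers.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Synthetic (x, y) pairs: percent below poverty, math growth percentile.
# Both are drawn independently, so no real relationship exists (assumption).
rng = random.Random(0)
data = [(rng.uniform(0, 60), rng.uniform(1, 99)) for _ in range(50)]

# Shuffle, then hold out the last 20% as the test set.
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

train_x = [x for x, _ in train]
train_y = [y for _, y in train]
test_x = [x for x, _ in test]
test_y = [y for _, y in test]

# Learn the line on the training set only.
slope, intercept = fit_line(train_x, train_y)

# Predict on the unseen test set and compare with a mean-only baseline.
predictions = [intercept + slope * x for x in test_x]
test_mse = mean_squared_error(test_y, predictions)
train_mean = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error(test_y, [train_mean] * len(test_y))

print(f"test MSE = {test_mse:.1f}, baseline MSE = {baseline_mse:.1f}")
```

If the fitted line does not beat the baseline that always predicts the training mean, the x-variable adds little predictive value, which matches the scattered test-set points described above.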

