Data Analysis

Linear Regression

A linear regression tests whether a relationship exists between one or more independent variables and a dependent variable. If a significant relationship exists, the fitted coefficients show how strongly each demographic factor is associated with school performance.

Scatterplots

Each scatterplot shows the visual relationship between one independent variable and the dependent variable, which may appear strong or weak. Because scatterplots make a correlation, or the lack of one, easy to see, they are useful for checking whether certain demographic factors are associated with school performance.
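The strength of the pattern a scatterplot shows can be summarized with a Pearson correlation coefficient. The following is a minimal Python sketch of that calculation, not the analysis used for this project; the function name `pearson_r` and the data values are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, NOT the project's data.
hardship_index = [10, 25, 40, 55, 70, 85]
math_growth_percentile = [52, 61, 47, 58, 49, 55]

r = pearson_r(hardship_index, math_growth_percentile)
print(f"r = {r:.3f}")
```

A value of r near 1 or -1 corresponds to points that fall close to a line, while a value near 0 corresponds to a cloud with no linear trend.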

Chi-Squared Tests for Independence

Like the linear regression, a chi-squared test for independence checks whether a demographic characteristic is related to school performance. Unlike the linear regression, the chi-squared test is run separately for each factor variable and tests whether that factor and school performance are independent.

Linear Regression ML

Unlike the ordinary linear regression described above, this linear regression is trained as a predictive model on the dataset. This machine learning approach shows whether a relationship exists and predicts school performance from the chosen x-variable.

Linear regression

Variables and coefficients:

Percent above 25 w/o high school diploma: 0.537
Hardship Index: -0.017
Percent Below Poverty: -0.671
Per Capita Income: 0.000176

From the linear regression conducted in R, we see no statistically significant relationship between any of the demographic variables and math performance at the school. In other words, none of these factors significantly influences the math growth percentile.
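To illustrate what "significant" means for a regression coefficient, here is a minimal Python sketch, assuming a single predictor rather than the four used in the R model. It fits a least-squares line and computes the t statistic for the slope. The helper names `fit_line` and `slope_t_statistic` and all data values are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def slope_t_statistic(xs, ys):
    """t statistic for the null hypothesis that the true slope is zero."""
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    standard_error = sqrt(sse / (n - 2) / sxx)
    return slope / standard_error

# Made-up illustrative values, NOT the project's data.
pct_below_poverty = [8, 15, 22, 30, 37, 45]
math_growth_percentile = [54, 49, 60, 47, 57, 51]

slope, intercept = fit_line(pct_below_poverty, math_growth_percentile)
t_stat = slope_t_statistic(pct_below_poverty, math_growth_percentile)

# Two-sided critical value of the t distribution for alpha = 0.05 and
# n - 2 = 4 degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.776
significant = abs(t_stat) > T_CRITICAL
print(f"slope = {slope:.3f}, t = {t_stat:.2f}, significant: {significant}")
```

A slope whose t statistic stays inside the critical range cannot be distinguished from zero at the chosen alpha level, which is the situation the regression output describes.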

Chi-Squared tests

Null hypothesis: There exists no relationship

Alpha level = 0.05

Variables and p-values:

Percent above 25 w/o high school diploma: 0.209
Hardship Index: 0.245
Percent Below Poverty: 0.2295
Per Capita Income: 0.245

Since all of the p-values are above the alpha level of 0.05, we fail to reject the null hypothesis for every test. This means we found no evidence of a relationship between any of the demographic factors and the school's math growth percentile, which is consistent with the linear regression results above.
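The decision rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's R code: it assumes each variable was binned into two categories so the test reduces to a 2x2 table with 1 degree of freedom, it applies no continuity correction, and the counts are made up. The function names `chi_squared_2x2` and `reject_null` are hypothetical. The four p-values are the ones reported in this analysis.

```python
from math import erfc, sqrt

def chi_squared_2x2(table):
    """Chi-squared statistic and p-value for a 2x2 contingency table.

    No continuity correction is applied. With 1 degree of freedom the
    upper-tail probability equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

def reject_null(p_value, alpha=0.05):
    """True when the p-value is below the alpha level."""
    return p_value < alpha

# Hypothetical counts, NOT the project's data:
# rows = low/high poverty, columns = low/high math growth.
counts = [[18, 22], [21, 19]]
statistic, p_value = chi_squared_2x2(counts)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.3f}")

# p-values reported in this analysis, checked against alpha = 0.05.
reported_p_values = {
    "Percent above 25 w/o high school diploma": 0.209,
    "Hardship Index": 0.245,
    "Percent Below Poverty": 0.2295,
    "Per Capita Income": 0.245,
}
decisions = {name: reject_null(p) for name, p in reported_p_values.items()}
print(decisions)
```

Every entry in `decisions` is `False`, which matches the conclusion that the null hypothesis is not rejected for any variable.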

Scatterplots

[Figure: Hardship Index scatterplot]

[Figure: Household Below Poverty scatterplot]

[Figure: Per Capita Income scatterplot]

[Figure: Percent above 25 w/o high school diploma scatterplot]

[Figure: training data]

This is the training data, on which we fit the linear regression model. The model uses it to learn how to predict math growth percentile from the percent of households below poverty.

This is the test set, on which the model applies what it learned to predict math scores from the percent of households below poverty. As you can see, the points are widely scattered and show no pattern that would indicate a correlation.

[Figure: test set]

[Figure: predicted vs. actual math growth percentile]

The chart compares the math growth percentile that the linear model predicts from percent below poverty with the actual values. The differences between predicted and actual make it clear that percent of households below poverty is not a good predictor.
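The train, predict, and compare workflow described here can be sketched in Python as below. This is a minimal illustration, not the project's code: the data are made up, the split is a fixed slice rather than a shuffled sample, and the helper names `fit_line` and `mean_absolute_error` are hypothetical. It also compares the model against a baseline that always predicts the training mean, which is one way to judge whether a predictor adds anything.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up illustrative values, NOT the project's data.
pct_below_poverty = [5, 12, 18, 22, 27, 31, 36, 40, 44, 50]
math_growth = [55, 48, 62, 51, 58, 45, 60, 52, 49, 57]

# Fixed split for simplicity: first 7 rows train, last 3 rows test.
train_x, test_x = pct_below_poverty[:7], pct_below_poverty[7:]
train_y, test_y = math_growth[:7], math_growth[7:]

# Fit on the training set only, then predict the held-out test set.
slope, intercept = fit_line(train_x, train_y)
predicted = [slope * x + intercept for x in test_x]

# Baseline: ignore the predictor and always guess the training mean.
baseline = sum(train_y) / len(train_y)
model_mae = mean_absolute_error(test_y, predicted)
baseline_mae = mean_absolute_error(test_y, [baseline] * len(test_y))

for x, actual, pred in zip(test_x, test_y, predicted):
    print(f"poverty = {x}%, actual = {actual}, predicted = {pred:.1f}")
print(f"model MAE = {model_mae:.2f}, baseline MAE = {baseline_mae:.2f}")
```

If the model's error on the test set is no smaller than the baseline's, the predictor is not helping, which is the kind of result the predicted versus actual chart shows.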

Linear Regression Machine Learning

PREFACE:

We decided to use only one independent variable because, as the previous analysis showed, no indicator significantly affects the math growth percentile. We therefore saw no need to include every independent variable in the model to demonstrate this.

Contributors: Caruso, Magliente, Palegar, Poyer
