DATA INTEGRATION FINAL PROJECT
Evaluating Chicago Public Schools and Demographics
INTRODUCTION
Our data comes from three sources.
Per Capita Income dataset (Chicago Data Portal): contains six socioeconomic indicators for each Chicago community area.
Chicago Public Schools dataset (Kaggle, originally provided by the Chicago Data Portal): contains school-level information ranging from address and certifications to Math and Reading scores. The assessment scores were the initial focus.
Real Estate For Sale by Chicago Zip-Code (a Chicago real estate website): used because it lists Chicago areas with their corresponding zip codes.
PROBLEM
The central problem addressed in this project is schema heterogeneity.
The datasets of interest came from different sources and were built for different purposes, so the team had to work out how to merge them logically and efficiently. This required a balance between two goals: minimizing the loss of data and information during the merge, and preserving the correctness and integrity of the original data.
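The report does not include its merge code, so the following is only a minimal, standard-library sketch of the trade-off described above, assuming the sources share a community-area name that differs in column name, casing, and spacing. All field names (such as `Community Area Name` and `COMMUNITY_AREA_NAME`), helper names (`map_schema`, `area_key`, `inner_join_on`), and records are hypothetical placeholders, not the project's actual data or method.

```python
# Minimal sketch of schema mapping before a join (standard library only).
# Every field name, helper name, and record here is a hypothetical
# placeholder, not the project's actual data or code.

# Hypothetical source-specific column names -> shared target schema.
SCHOOL_MAP = {"Name of School": "school", "Community Area Name": "area", "Math Score": "math_score"}
INCOME_MAP = {"COMMUNITY_AREA_NAME": "area", "PER_CAPITA_INCOME": "per_capita_income"}


def map_schema(rows, mapping):
    """Rename each row's source fields to the target schema, dropping unmapped fields."""
    return [{target: row[source] for source, target in mapping.items()} for row in rows]


def area_key(name):
    """Normalize an area name for matching: collapse whitespace, ignore case."""
    return " ".join(name.split()).casefold()


def inner_join_on(left, right, key):
    """Inner-join two lists of dicts on a normalized key.

    Returns (merged, unmatched); unmatched holds the left rows that found no
    partner, so the data lost by the join can be counted and inspected.
    """
    index = {area_key(r[key]): r for r in right}
    merged, unmatched = [], []
    for row in left:
        match = index.get(area_key(row[key]))
        if match is None:
            unmatched.append(row)
        else:
            # Right-hand fields win on collisions, so the reference spelling of the key is kept.
            merged.append({**row, **match})
    return merged, unmatched


# Hypothetical toy records with inconsistent casing and spacing in the key.
schools_raw = [
    {"Name of School": "School 1", "Community Area Name": "AREA A", "Math Score": 50.0},
    {"Name of School": "School 2", "Community Area Name": " area  b ", "Math Score": 60.0},
    {"Name of School": "School 3", "Community Area Name": "Area Z", "Math Score": 40.0},
]
income_raw = [
    {"COMMUNITY_AREA_NAME": "Area A", "PER_CAPITA_INCOME": 30000},
    {"COMMUNITY_AREA_NAME": "Area B", "PER_CAPITA_INCOME": 20000},
]

schools = map_schema(schools_raw, SCHOOL_MAP)
income = map_schema(income_raw, INCOME_MAP)
merged, unmatched = inner_join_on(schools, income, "area")

loss_rate = len(unmatched) / len(schools)
print(f"merged={len(merged)} unmatched={len(unmatched)} loss={loss_rate:.0%}")
```

Returning the unmatched rows is the key design choice: instead of letting rows vanish silently, it quantifies the information lost by the join, so a mismatch (for example, an area spelled differently across sources) can be fixed with an extra mapping, such as the zip-code to area lookup from the real estate data, before the final merge.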
GOAL
This project aimed to integrate the data correctly and usefully. The team focused on presenting the integrated data meaningfully through data analysis and visualizations.
This ultimately enabled them to understand the relationships between demographics and school performance across community areas in Chicago.