This project demonstrates Exploratory Data Analysis skills and the use of visualizations and statistical analysis for data mining.
The raw dataset comprises information downloaded from the following Wikipedia pages and the US state datasets from R's datasets package:
- https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_life_expectancy
- https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_educational_attainment
- https://en.wikipedia.org/wiki/Household_income_in_the_United_States
- https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_area
- https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States_by_state
- https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/state.html

The analysis covers the following steps:

- Loading raw data from .csv, .xls, and .txt files
- Cleaning dirty data to account for missing values and duplicate data
- Preprocessing cleaned data for exploration
- Answering data mining questions with the help of:
  - Visualization of distributions for single and multiple variables
  - Statistical analysis
  - Analysis of correlations among variables
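
The loading, cleaning, and correlation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the sample values, and the helper functions `load_rows`, `clean_rows`, and `pearson` are all hypothetical, and dropping incomplete rows is only one of several ways to handle missing values. It uses only the Python standard library so it runs without dependencies; the project itself may rely on R or other tooling.

```python
import csv
import io
import math

# Hypothetical raw table; names and values are made up for illustration only.
RAW_CSV = """state,life_expectancy,median_income
Alpha,78.5,60000
Beta,,52000
Gamma,80.1,71000
Gamma,80.1,71000
Delta,76.9,48000
"""


def load_rows(text):
    """Parse raw CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows, numeric_cols):
    """Drop exact duplicate rows and rows with a missing numeric value,
    then convert the numeric columns from strings to floats."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(row[col].strip() == "" for col in numeric_cols):
            continue  # missing value: this sketch simply drops the row
        converted = dict(row)
        for col in numeric_cols:
            converted[col] = float(row[col])
        cleaned.append(converted)
    return cleaned


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


numeric_cols = ["life_expectancy", "median_income"]
rows = clean_rows(load_rows(RAW_CSV), numeric_cols)
r = pearson(
    [row["life_expectancy"] for row in rows],
    [row["median_income"] for row in rows],
)
print(f"{len(rows)} clean rows, correlation = {r:.3f}")
```

With real data, each source table would be loaded and cleaned this way and then joined on the state name before correlations are computed.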