Salifort Motors - HR Analytics & Employee Turnover

This repository contains my work for the Salifort Motors HR analytics capstone project from the Google Advanced Data Analytics programme. The goal of the project is to use HR data to:

- Understand which factors are most associated with employees leaving the company.
- Build predictive models to estimate the probability of turnover.
- Translate the findings into actionable recommendations for retention.

The analysis is implemented in Python using pandas, NumPy, scikit-learn, and XGBoost, with visualisation in Matplotlib/Seaborn.

Tech stack: Python · pandas · NumPy · scikit-learn · XGBoost · Matplotlib · Seaborn · SciPy · statsmodels

Repository structure

SB_Activity_Salifort_Motors_project_lab.ipynb
Main Jupyter notebook.
Contains:
- Exploratory data analysis (EDA) of the HR dataset.
- Feature engineering and preprocessing.
- Model training and evaluation for:
  - Baseline models (e.g. logistic regression / simple classifiers).
  - Tree-based models (Random Forest, XGBoost).
- Comparison of model performance and interpretation of key features.
HR_capstone_dataset_Salifort.csv
HR dataset used in the project (employee-level records with features such as satisfaction, tenure, salary, etc.).
hr_rf1.pickle, hr_rf2.pickle
Serialized RandomForest models fitted in the notebook.
hr_xgb_model1.json, hr_xgb_model2.json
Serialized XGBoost models.
xgb_model1_metadata.pkl, xgb_model2_metadata.pkl
Python pickles containing metadata for the XGBoost models (e.g. feature lists, preprocessing details).
html_visual_SB_Activity_Salifort_Motors_project_lab.html
HTML export of the main notebook for viewing in a browser without Jupyter.
extract_images_from_ipynb.sh
Helper shell script to extract image outputs from the notebook.
ipynb_to_html.sh
Helper shell script to convert the notebook to HTML.

Note: The model files (*.pickle, *.pkl, *.json) and the HTML export are convenience artefacts.
The notebook can recreate the models from scratch as long as the required libraries and the dataset are available.
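The extract_images_from_ipynb.sh helper listed above is a shell script whose contents are not reproduced here. As a rough Python equivalent, the sketch below reads the notebook as JSON (nbformat v4) and writes every `image/png` output to disk. The function name, the default `images` output folder, and the `cellNNN_outputM.png` naming scheme are assumptions for illustration, not the script's actual behaviour.

```python
import base64
import json
from pathlib import Path


def extract_png_outputs(notebook_path, out_dir="images"):
    """Write every image/png output in a notebook to out_dir and return the written paths.

    Hypothetical helper: a Python sketch of what an image-extraction script might do.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for cell_idx, cell in enumerate(nb.get("cells", [])):
        # Only code cells carry outputs in the nbformat v4 schema.
        if cell.get("cell_type") != "code":
            continue
        for out_idx, output in enumerate(cell.get("outputs", [])):
            # Stream outputs have no "data" key, so default to an empty dict.
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            # PNG data is base64 text; join it in case it was stored as a list of lines.
            if isinstance(png, list):
                png = "".join(png)
            target = out / f"cell{cell_idx:03d}_output{out_idx}.png"
            # b64decode ignores embedded newlines by default (validate=False).
            target.write_bytes(base64.b64decode(png))
            written.append(target)
    return written
```

For example, `extract_png_outputs("SB_Activity_Salifort_Motors_project_lab.ipynb")` would write the notebook's PNG figures into `images/`, provided the notebook was saved with its outputs.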

How to run the notebook

1. Clone the repository

git clone https://github.com/barbavegeta/Google_Advanced_Data_Analytics-Salifort_Motors.git
cd Google_Advanced_Data_Analytics-Salifort_Motors

2. Create and activate an environment

Using conda (recommended):

conda create -n salifort_motors python=3.10 -y
conda activate salifort_motors

3. Install dependencies

pip install numpy pandas matplotlib seaborn scikit-learn xgboost scipy statsmodels jupyter

(If you already have these packages installed, you can skip this step.)
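To confirm which packages are already present before launching Jupyter, a small check such as the one below can help. It is a sketch, not a file in the repository: the package list combines the pip command and the tech stack line above and is an assumption, so adjust it if the notebook imports anything else. It relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# PyPI distribution names (assumed list: pip command plus tech stack line).
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn", "scikit-learn",
    "xgboost", "scipy", "statsmodels", "jupyter",
]


def check_packages(names=REQUIRED):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can then be added with `pip install <name>` inside the activated environment.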

4. Launch Jupyter

jupyter notebook

In the browser interface, open:

SB_Activity_Salifort_Motors_project_lab.ipynb

and run the cells from top to bottom.

Reusing the trained models

The repository includes pre-trained model artefacts:

- RandomForest: hr_rf1.pickle, hr_rf2.pickle
- XGBoost: hr_xgb_model1.json, hr_xgb_model2.json
- XGBoost metadata: xgb_model1_metadata.pkl, xgb_model2_metadata.pkl

These are primarily for convenience and reproducibility.
If you only care about understanding the analysis and the modelling workflow, you can ignore these files and simply retrain the models by running the notebook.

If you want a lighter clone of the repository, you can safely delete the model artefacts from your local copy and re-run the notebook to regenerate them.
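If you do want to reuse the artefacts, a minimal loading sketch follows. It assumes the RandomForest files are standard `pickle` dumps and that the XGBoost JSON files were written with `save_model`; the internal layout of the metadata pickles is not documented here, so inspect them before use (the `feature_names` key below is a hypothetical guess). Two cautions apply: unpickling can execute arbitrary code, so only load files you trust, and a pickled scikit-learn model may warn or fail under a different scikit-learn version than the one that created it.

```python
import pickle
from pathlib import Path

import pandas as pd


def load_pickle(path):
    """Unpickle a file. Only use this on trusted files: unpickling can run arbitrary code."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def load_xgb_classifier(path):
    """Load an XGBoost model saved in JSON format into the scikit-learn style wrapper."""
    import xgboost as xgb  # imported lazily so the other helpers work without XGBoost installed

    model = xgb.XGBClassifier()
    model.load_model(path)
    return model


def align_features(df, feature_names):
    """Reorder columns to match training: missing columns become 0, extra columns are dropped."""
    return df.reindex(columns=list(feature_names), fill_value=0)


# Hypothetical usage (file names from this repository; metadata layout is an assumption):
# rf = load_pickle("hr_rf1.pickle")
# xgb_model = load_xgb_classifier("hr_xgb_model1.json")
# meta = load_pickle("xgb_model1_metadata.pkl")  # inspect first, e.g. print(meta)
# X_new = align_features(new_employees, meta["feature_names"])  # "feature_names" is a guess
# proba_leave = xgb_model.predict_proba(X_new)[:, 1]
```

Aligning columns matters because one-hot encoding only creates columns for categories present in the data being encoded, so new records can otherwise end up with a different column set than the training data.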

Project outline

The main notebook roughly follows these steps:

Load data
- Read HR_capstone_dataset_Salifort.csv into a pandas DataFrame.
- Inspect structure, data types, and missing values.
Exploratory data analysis
- Summary statistics and distributions for key variables.
- Visualisations (e.g. histograms, boxplots, bar charts) to compare employees who stayed vs left.
- Correlations and simple cross-tabs.
Preprocessing and feature engineering
- Handle missing values if present.
- Encode categorical variables (e.g. one-hot encoding).
- Scale or transform features as needed for certain models.
Model training and evaluation
- Split data into training and test sets.
- Train baseline models and tree-based models (RandomForest, XGBoost).
- Evaluate models on the hold-out set using accuracy and other metrics (e.g. precision, recall, ROC-AUC).
- Compare performance and select the preferred model(s).
Interpretation and business recommendations
- Inspect feature importances / model coefficients.
- Identify which HR factors are most strongly associated with turnover.
- Translate findings into suggested actions (e.g. targeting specific risk groups, adjusting workload/compensation, or monitoring key indicators).
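The outline above can be sketched end to end with pandas and scikit-learn. This is not the notebook's code: it assumes the target column is a 0/1 column named `left`, drops rows with missing values, one-hot encodes text columns with `pd.get_dummies`, and uses logistic regression and a random forest as stand-ins for the baseline and tree-based models. An `xgboost.XGBClassifier` could be added to the `models` dict in the same way, since it exposes `fit`, `predict`, and `predict_proba`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "left"  # assumed name of the 0/1 turnover column


def compare_groups(df, target=TARGET):
    """EDA: mean of each numeric column for employees who stayed (0) vs left (1)."""
    return df.groupby(target).mean(numeric_only=True)


def prepare(df, target=TARGET):
    """Drop rows with missing values, one-hot encode text columns, split off the target."""
    df = df.dropna()
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=int)
    y = df[target].astype(int)
    return X, y


def evaluate(model, X_test, y_test):
    """Hold-out metrics for a fitted binary classifier that supports predict_proba."""
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }


def run(df, target=TARGET, seed=42):
    """Split, fit a baseline and a tree-based model, and return metrics plus importances."""
    X, y = prepare(df, target)
    # Split before fitting; stratify so both sets keep the same share of leavers.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # Scaling helps logistic regression; tree models do not need it.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = evaluate(model, X_test, y_test)
    # Impurity-based importances from the forest, highest first.
    importances = pd.Series(
        models["random_forest"].feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return pd.DataFrame(results).T, importances
```

Typical usage would be `metrics, importances = run(pd.read_csv("HR_capstone_dataset_Salifort.csv"))`, giving one row of hold-out metrics per model. Impurity-based importances can favour continuous or high-cardinality features, so `sklearn.inspection.permutation_importance` on the test set is a reasonable cross-check before turning importances into recommendations.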

Notes

This project is designed as a learning and portfolio piece, not a production system.
All modelling is done on a static, de-identified dataset. No live or proprietary company data is used.
The emphasis is on:
- Clear, reproducible analysis in a single notebook.
- Interpretable results that can inform HR decisions at Salifort Motors or similar organisations.
