Deep Learning Homework 1: Classical Machine Learning

Built with: Python, NumPy, Pandas, Scikit-Learn, CVXOPT, Matplotlib

Description

This repository contains the solutions for Homework 1 of the Deep Learning course. It focuses on the mathematical implementation of core machine learning algorithms from scratch. The primary highlight is a Latent Factor Model for music recommendation, alongside a custom Support Vector Machine (SVM) solver and various Regression techniques.

Project Breakdown

🎵 Project 1: Music Recommender System (Matrix Factorization)

This project builds a Collaborative Filtering system using the Last.fm dataset to recommend artists to users based on their listening history.

The Process:

  1. Data Ingestion & Cleaning:

    • Loads user-artist interaction data (user_artists.dat) and artist metadata (artists.dat).
    • Merges datasets to associate Artist IDs with Artist Names.
    • Cleans data by renaming the weight column to playCount and removing redundant IDs.
  2. Exploratory Data Analysis (EDA):

    • Aggregates data to calculate totalArtistPlays (popularity) and totalUniqueUsers (reach).
    • Visualizes the "Long Tail" distribution of artist popularity using Matplotlib/Seaborn.
  3. Feature Engineering (Implicit Ratings):

    • Since the data consists of "play counts" rather than explicit 1-5 star ratings, the system must derive a rating.
    • Min-Max Scaling: Converts raw playCount into a normalized playCountScaled score between 0 and 1. This serves as the implicit confidence level of a user's preference for an artist.
  4. Matrix Construction:

    • Constructs a sparse User-Item Interaction Matrix where rows represent Users and columns represent Artists.
    • Calculates the matrix's fill rate (only ~0.28% of user-artist cells are non-zero) to justify the need for Matrix Factorization.
  5. Model Implementation (Matrix Factorization):

    • Algorithm: Implements a custom Recommender class based on Latent Factorization (similar to FunkSVD).
    • Latent Factors: Decomposes the Interaction Matrix ($R$) into two lower-dimensional matrices:
      • $P$ (User Factors): Represents user preferences for hidden features.
      • $Q$ (Item Factors): Represents how much each artist possesses those hidden features.
    • Training (Stochastic Gradient Descent):
      • Initializes $P$ and $Q$ with random Gaussian noise.
      • Iterates through non-zero ratings for n_epochs (30).
      • Loss Function: Minimizes the squared error between actual rating ($r_{ui}$) and predicted rating ($q_i \cdot p_u$).
      • Update Rule: Updates factors $P$ and $Q$ using a learning rate ($\alpha=0.1$) and regularization parameter ($\lambda=1$) to prevent overfitting.
  6. Inference & Recommendation:

    • Computes the full prediction matrix $\hat{R}$, whose entries are the dot products $\hat{r}_{ui} = q_i \cdot p_u$.
    • For a target user, identifies artists they have not listened to yet.
    • Ranks these unseen artists by their predicted scores and returns the top $N$ recommendations (a minimal sketch of steps 3–6 follows this list).
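
A minimal sketch of the data preparation in steps 3–4, assuming the tab-separated Last.fm file `user_artists.dat` with columns `userID`, `artistID`, and `weight`; the variable names are illustrative, not necessarily those used in the notebook:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Load the Last.fm interaction file (tab-separated columns: userID, artistID, weight)
ratings = pd.read_csv("user_artists.dat", sep="\t")
ratings = ratings.rename(columns={"weight": "playCount"})

# Min-max scale raw play counts into [0, 1] implicit ratings
lo, hi = ratings["playCount"].min(), ratings["playCount"].max()
ratings["playCountScaled"] = (ratings["playCount"] - lo) / (hi - lo)

# Map raw IDs to contiguous indices and build the sparse user-item matrix
user_index = {u: i for i, u in enumerate(ratings["userID"].unique())}
item_index = {a: j for j, a in enumerate(ratings["artistID"].unique())}
R = csr_matrix(
    (ratings["playCountScaled"].to_numpy(),
     (ratings["userID"].map(user_index).to_numpy(),
      ratings["artistID"].map(item_index).to_numpy())),
    shape=(len(user_index), len(item_index)),
)

# Fill rate of the matrix: only a small fraction of cells are non-zero
density = R.nnz / (R.shape[0] * R.shape[1])
```

And a compact SGD-trained latent factor model in the spirit of the custom Recommender class (steps 5–6). The hyperparameters mirror those stated above; the exact class interface, initialization scale, and number of factors are assumptions:

```python
import numpy as np

class Recommender:
    """Latent factor model (FunkSVD-style) trained with SGD on implicit ratings."""

    def __init__(self, n_factors=20, n_epochs=30, lr=0.1, reg=1.0, seed=0):
        self.n_factors, self.n_epochs, self.lr, self.reg = n_factors, n_epochs, lr, reg
        self.rng = np.random.default_rng(seed)

    def fit(self, R):
        """R: scipy.sparse user-item matrix of scaled play counts."""
        n_users, n_items = R.shape
        # Initialize user factors P and item factors Q with small Gaussian noise
        self.P = self.rng.normal(scale=0.1, size=(n_users, self.n_factors))
        self.Q = self.rng.normal(scale=0.1, size=(n_items, self.n_factors))
        users, items = R.nonzero()
        values = np.asarray(R[users, items]).ravel()
        for _ in range(self.n_epochs):
            for u, i, r_ui in zip(users, items, values):
                err = r_ui - self.P[u] @ self.Q[i]     # error on this (user, item) pair
                p_u = self.P[u].copy()
                # Regularized SGD updates for p_u and q_i
                self.P[u] += self.lr * (err * self.Q[i] - self.reg * self.P[u])
                self.Q[i] += self.lr * (err * p_u - self.reg * self.Q[i])
        return self

    def recommend(self, u, R, top_n=10):
        """Rank artists the user has not listened to by predicted score."""
        scores = self.P[u] @ self.Q.T
        seen = set(R[u].indices)                        # artists already played by user u
        ranked = [i for i in np.argsort(-scores) if i not in seen]
        return ranked[:top_n]
```

Typical usage would be `model = Recommender().fit(R)` followed by `model.recommend(user_index[some_user_id], R)`, then mapping the returned column indices back to artist names via `artists.dat`.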

❤️ Project 2: Heart Disease Classification (SVM)

A binary classification task to predict heart disease risk.

  • Preprocessing: Handles outliers using Z-Score (threshold=3) and normalizes numerical features to [0, 1].
  • Sklearn Benchmark: Evaluates Linear, Polynomial, and RBF kernels.
  • Custom SVM Solver:
    • Implements a MySVM class from scratch.
    • Uses the cvxopt library to solve the Quadratic Programming (QP) dual problem for optimization.
    • Kernel Trick: Manually implements Linear, Polynomial, and RBF kernel functions to project data into higher dimensions for non-linear separation (see the sketch below).
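
A minimal sketch of the dual-QP approach behind a class like MySVM, assuming labels encoded as $\{-1, +1\}$; the RBF kernel is shown, and the constructor defaults ($C$, $\gamma$) and exact interface are illustrative rather than the notebook's:

```python
import numpy as np
from cvxopt import matrix, solvers

def rbf_kernel(X1, X2, gamma=0.5):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = (X1**2).sum(1)[:, None] + (X2**2).sum(1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

class MySVM:
    """Soft-margin SVM trained by solving the dual QP with cvxopt."""

    def __init__(self, kernel=rbf_kernel, C=1.0):
        self.kernel, self.C = kernel, C

    def fit(self, X, y):
        n = X.shape[0]
        y = y.astype(float)                           # labels must be in {-1, +1}
        K = self.kernel(X, X)
        # Dual problem: min 1/2 a^T (y y^T * K) a - 1^T a
        #               s.t. 0 <= a_i <= C and y^T a = 0
        P = matrix(np.outer(y, y) * K)
        q = matrix(-np.ones(n))
        G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
        h = matrix(np.hstack([np.zeros(n), self.C * np.ones(n)]))
        A = matrix(y.reshape(1, -1))
        b = matrix(0.0)
        solvers.options["show_progress"] = False
        alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
        sv = alpha > 1e-6                             # keep only the support vectors
        self.alpha, self.sv_X, self.sv_y = alpha[sv], X[sv], y[sv]
        # Bias estimated by averaging over the support vectors
        self.b = np.mean(self.sv_y - (self.alpha * self.sv_y) @ self.kernel(self.sv_X, self.sv_X))
        return self

    def predict(self, X):
        scores = (self.alpha * self.sv_y) @ self.kernel(self.sv_X, X) + self.b
        return np.sign(scores)
```

Swapping in a linear kernel (`X1 @ X2.T`) or a polynomial kernel (`(X1 @ X2.T + 1) ** d`) requires no other change, which is the point of keeping the kernel as a pluggable function.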

📈 Project 3: Regression Analysis

An exploration of regression techniques to predict target variables.

  • Linear Regression: Derives the closed-form solution using the Normal Equation: $\theta = (X^T X)^{-1} X^T y$.
  • Locally Weighted Regression (LWR): Implements weighted least squares with a Gaussian kernel, assigning higher weights to training points closer to the query point for non-linear fits.
  • KNN Regression: Predicts values by averaging the targets of the $k$-nearest neighbors (all three approaches are sketched below).
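
Minimal sketches of the three regressors, assuming a design matrix `X` with a prepended bias column for the linear models; the function names and the default bandwidth $\tau$ and $k$ are illustrative:

```python
import numpy as np

def linear_regression(X, y):
    """Closed-form least squares via the Normal Equation: theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)        # solve() avoids forming an explicit inverse

def locally_weighted_regression(X, y, x_query, tau=0.5):
    """Weighted least squares with a Gaussian kernel centered on the query point."""
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)                                   # nearby training points get higher weight
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

def knn_regression(X, y, x_query, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    dists = np.linalg.norm(X - x_query, axis=1)
    return y[np.argsort(dists)[:k]].mean()
```

For LWR, $\tau$ controls the kernel bandwidth: smaller values produce more local (and more flexible) fits, at the cost of refitting for every query point.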

Installation

To run these notebooks, install the required dependencies:

pip install numpy pandas matplotlib seaborn scikit-learn tqdm cvxopt

Usage

  1. Clone the repository:
    git clone https://github.com/your-username/Classic-ML-Algorithms-Implementation.git
  2. Navigate to the directory and start Jupyter:
    jupyter notebook
  3. Open DL2022-HW1-P1.ipynb to see the Music Recommender system in action.

License

Distributed under the MIT License.
