This repository contains the solutions for Homework 1 of the Deep Learning course. It focuses on the mathematical implementation of core machine learning algorithms from scratch. The primary highlight is a Latent Factor Model for music recommendation, alongside a custom Support Vector Machine (SVM) solver and various Regression techniques.
Music Recommender: This part builds a Collaborative Filtering system on the Last.fm dataset that recommends artists to users based on their listening history.

Data Ingestion & Cleaning:
- Loads user-artist interaction data (`user_artists.dat`) and artist metadata (`artists.dat`).
- Merges the datasets to associate Artist IDs with Artist Names.
- Cleans the data by renaming the `weight` column to `playCount` and removing redundant IDs.
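
The ingestion steps can be sketched with pandas as follows. This is a minimal sketch, not the notebook's code: the function names `load_raw` and `clean_interactions` are hypothetical, and it assumes the tab-separated layout of the public HetRec 2011 Last.fm release (columns `userID`, `artistID`, `weight` in `user_artists.dat`; `id`, `name` in `artists.dat`).

```python
import pandas as pd


def load_raw(data_dir: str = "."):
    """Read the two raw files; assumes the tab-separated HetRec 2011 Last.fm layout."""
    user_artists = pd.read_csv(f"{data_dir}/user_artists.dat", sep="\t")  # userID, artistID, weight
    artists = pd.read_csv(f"{data_dir}/artists.dat", sep="\t", usecols=["id", "name"])
    return user_artists, artists


def clean_interactions(user_artists: pd.DataFrame, artists: pd.DataFrame) -> pd.DataFrame:
    """Attach artist names to each (user, artist) row and rename weight -> playCount."""
    # A left merge keeps every interaction and preserves the original row order.
    df = user_artists.merge(artists, how="left", left_on="artistID", right_on="id")
    # The merge keeps a second copy of the artist ID in "id"; drop it as redundant.
    df = df.drop(columns=["id"]).rename(columns={"weight": "playCount"})
    return df[["userID", "artistID", "name", "playCount"]]
```

Keeping file reading separate from cleaning makes the cleaning step easy to check on small in-memory frames.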

Exploratory Data Analysis (EDA):
- Aggregates the data to calculate `totalArtistPlays` (popularity) and `totalUniqueUsers` (reach).
- Visualizes the "Long Tail" distribution of artist popularity using Matplotlib/Seaborn.
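
A possible shape for the aggregation and the long-tail plot is sketched below. The helper names are hypothetical; the sketch assumes a cleaned frame with `userID`, `artistID`, `name` and `playCount` columns, and uses a log-scaled y-axis, which is one common way (not necessarily the notebook's) to make a long tail readable.

```python
import pandas as pd


def artist_popularity(df: pd.DataFrame) -> pd.DataFrame:
    """Per-artist popularity (total plays) and reach (distinct listeners), most played first."""
    return (
        df.groupby(["artistID", "name"])
        .agg(totalArtistPlays=("playCount", "sum"), totalUniqueUsers=("userID", "nunique"))
        .reset_index()
        .sort_values("totalArtistPlays", ascending=False, ignore_index=True)
    )


def plot_long_tail(stats: pd.DataFrame):
    """Plot total plays against popularity rank; a log y-axis makes the long tail visible."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(range(1, len(stats) + 1), stats["totalArtistPlays"].to_numpy())
    ax.set_yscale("log")
    ax.set_xlabel("Artist rank")
    ax.set_ylabel("Total plays")
    return fig, ax
```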

Feature Engineering (Implicit Ratings):
- Since the data consists of play counts rather than explicit 1-5 star ratings, the system must derive a rating.
- Min-Max Scaling: Converts the raw `playCount` into a normalized `playCountScaled` score between 0 and 1. This serves as the implicit confidence level of a user's preference for an artist.
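
Min-max scaling maps each value $x$ to $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The sketch below assumes one global minimum and maximum over all play counts (per-user scaling would be a different design); `sklearn.preprocessing.MinMaxScaler` on the single column gives the same numbers. The function name is hypothetical.

```python
import pandas as pd


def add_scaled_playcount(df: pd.DataFrame, col: str = "playCount",
                         out_col: str = "playCountScaled") -> pd.DataFrame:
    """Global min-max scaling: x' = (x - min) / (max - min), mapping play counts into [0, 1]."""
    x = df[col].astype(float)
    lo, hi = x.min(), x.max()
    out = df.copy()
    # A constant column would give max == min; map everything to 0 instead of dividing by zero.
    out[out_col] = (x - lo) / (hi - lo) if hi > lo else 0.0
    return out
```

One side effect worth checking: the smallest play count maps to exactly 0, so any later step that treats zeros as "not listened" will silently drop that interaction.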

Matrix Construction:
- Constructs a sparse User-Item Interaction Matrix where rows represent Users and columns represent Artists.
- Calculates the dataset sparsity: only approx. 0.28% of the user-artist entries are observed, which motivates the use of Matrix Factorization.
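
A sketch of building the matrix with SciPy (installed as a scikit-learn dependency) and measuring how much of it is filled. The helper names are hypothetical; `pd.factorize` is used here, as an assumption, to turn raw user and artist IDs into contiguous row and column indices.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def build_interaction_matrix(df: pd.DataFrame, user_col: str = "userID",
                             item_col: str = "artistID", value_col: str = "playCountScaled"):
    """Users x artists CSR matrix; also returns the ID arrays that map indices back to IDs."""
    user_idx, users = pd.factorize(df[user_col])
    item_idx, items = pd.factorize(df[item_col])
    values = df[value_col].to_numpy(dtype=float)
    R = csr_matrix((values, (user_idx, item_idx)), shape=(len(users), len(items)))
    return R, users, items


def density(R) -> float:
    """Fraction of stored (observed) entries; sparsity in the 'mostly empty' sense is 1 - density."""
    return R.nnz / (R.shape[0] * R.shape[1])
```

Storing only observed entries keeps memory proportional to the number of interactions rather than to users times artists.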

Model Implementation (Matrix Factorization):
- Algorithm: Implements a custom `Recommender` class based on Latent Factorization (similar to FunkSVD).
- Latent Factors: Decomposes the Interaction Matrix ($R$) into two lower-dimensional matrices:
  - $P$ (User Factors): Represents user preferences for hidden features.
  - $Q$ (Item Factors): Represents how strongly each artist exhibits those hidden features.
- Training (Stochastic Gradient Descent):
  - Initializes $P$ and $Q$ with random Gaussian noise.
  - Iterates through the non-zero ratings for `n_epochs` (30).
  - Loss Function: Minimizes the squared error between the actual rating ($r_{ui}$) and the predicted rating ($\hat{r}_{ui} = q_i \cdot p_u$).
  - Update Rule: Updates $P$ and $Q$ using a learning rate ($\alpha = 0.1$) and a regularization parameter ($\lambda = 1$) that penalizes large factor values to prevent overfitting.
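
The training loop can be sketched as below. This is a hypothetical `MFRecommender`, not the repository's `Recommender` class. It assumes the standard FunkSVD SGD updates for the loss $\sum_{(u,i)} (r_{ui} - q_i \cdot p_u)^2 + \lambda(\lVert p_u \rVert^2 + \lVert q_i \rVert^2)$, namely $e_{ui} = r_{ui} - q_i \cdot p_u$, $p_u \leftarrow p_u + \alpha(e_{ui} q_i - \lambda p_u)$ and $q_i \leftarrow q_i + \alpha(e_{ui} p_u - \lambda q_i)$, with the constant factor 2 folded into $\alpha$. $P$ is stored as users × k and $Q$ as items × k. The defaults mirror $\alpha = 0.1$, $\lambda = 1$ and 30 epochs; `n_factors=10` and `init_std=0.1` are guesses.

```python
import numpy as np
from scipy.sparse import coo_matrix


class MFRecommender:
    """Hypothetical FunkSVD-style model: r_ui is approximated by the dot product p_u . q_i."""

    def __init__(self, n_factors=10, alpha=0.1, lam=1.0, n_epochs=30, init_std=0.1, seed=0):
        self.n_factors, self.alpha, self.lam = n_factors, alpha, lam
        self.n_epochs, self.init_std = n_epochs, init_std
        self.rng = np.random.default_rng(seed)

    def fit(self, R):
        R = coo_matrix(R)  # accepts a dense array or any scipy sparse matrix
        users, items, ratings = R.row, R.col, R.data.astype(float)
        n_users, n_items = R.shape
        # Gaussian initialisation: P is users x k, Q is items x k.
        self.P = self.rng.normal(0.0, self.init_std, size=(n_users, self.n_factors))
        self.Q = self.rng.normal(0.0, self.init_std, size=(n_items, self.n_factors))
        for _ in range(self.n_epochs):
            # Visit every observed rating once per epoch, in a random order.
            for idx in self.rng.permutation(len(ratings)):
                u, i = users[idx], items[idx]
                err = ratings[idx] - self.P[u] @ self.Q[i]
                p_u = self.P[u].copy()  # keep the old p_u for the q_i update
                self.P[u] += self.alpha * (err * self.Q[i] - self.lam * self.P[u])
                self.Q[i] += self.alpha * (err * p_u - self.lam * self.Q[i])
        return self

    def predict_all(self):
        """Full users x items prediction matrix, R_hat = P Q^T."""
        return self.P @ self.Q.T

    def rmse(self, R):
        """Root-mean-squared error over the stored (observed) entries of R."""
        R = coo_matrix(R)
        pred = np.sum(self.P[R.row] * self.Q[R.col], axis=1)
        return float(np.sqrt(np.mean((R.data - pred) ** 2)))
```

The per-rating Python loop favors readability over speed; on the full dataset it is the slow part, and vectorizing or batching the updates would be the usual next step.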

Inference & Recommendation:
- Computes the full prediction matrix via a matrix product: $\hat{R} = P Q^{\top}$, so each entry is $\hat{r}_{ui} = q_i \cdot p_u$.
- For a target user, identifies the artists they have not listened to yet.
- Ranks these unseen artists by their predicted scores and returns the top $N$ recommendations.
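
The ranking step amounts to masking out already-played artists and sorting the rest. A minimal sketch with a hypothetical `recommend_top_n` helper, taking one user's row of predicted scores and the column indices of the artists that user has already played:

```python
import numpy as np


def recommend_top_n(scores: np.ndarray, seen_items, n: int = 10) -> np.ndarray:
    """Indices of the n highest-scoring items that are not in seen_items."""
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[np.asarray(seen_items, dtype=int)] = False  # never recommend already-played artists
    unseen = np.flatnonzero(mask)
    # Sort unseen items by descending predicted score; stable sort keeps ties in index order.
    ranked = unseen[np.argsort(-scores[unseen], kind="stable")]
    return ranked[:n]
```

With a SciPy CSR interaction matrix `R`, the played artists of user row `u` are `R[u].indices`, and the returned column indices still need to be mapped back to artist IDs and names.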

Heart Disease Classification (SVM): A binary classification task to predict heart disease risk.
- Preprocessing: Handles outliers using the Z-score (threshold = 3) and normalizes numerical features to [0, 1].
- Sklearn Benchmark: Evaluates scikit-learn SVMs with Linear, Polynomial, and RBF kernels as a baseline.
- Custom SVM Solver:
  - Implements a `MySVM` class from scratch.
  - Uses the `cvxopt` library to solve the dual Quadratic Programming (QP) problem.
  - Kernel Trick: Manually implements Linear, Polynomial, and RBF kernel functions, which implicitly map the data into a higher-dimensional space for non-linear separation.

Regression: An exploration of regression techniques to predict continuous target variables.
- Linear Regression: Derives the closed-form solution using the Normal Equation: $\theta = (X^{\top} X)^{-1} X^{\top} y$.
- Locally Weighted Regression (LWR): Implements weighted least squares with a Gaussian kernel, assigning higher weights to training points closer to the query point, which yields non-linear fits.
- KNN Regression: Predicts values by averaging the targets of the $k$ nearest neighbors.
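
The three estimators fit in a few lines of NumPy. This is a sketch with hypothetical helper names. It assumes a bias column of ones for the intercept and LWR weights $w_i = \exp(-\lVert x_i - x \rVert^2 / (2\tau^2))$ with bandwidth $\tau$. Both least-squares fits call `np.linalg.solve` rather than forming the inverse explicitly; this gives the same $\theta$ but is numerically safer.

```python
import numpy as np


def _as_2d(X):
    """Treat a 1-D input as a single feature column."""
    X = np.asarray(X, dtype=float)
    return X.reshape(-1, 1) if X.ndim == 1 else X


def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    X = _as_2d(X)
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, solved as a linear system instead of forming the inverse."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ np.asarray(y, dtype=float))


def predict_lwr(X, y, x_query, tau=1.0):
    """Locally weighted regression at one query point with Gaussian weights of bandwidth tau."""
    X = _as_2d(X)
    xq = np.asarray(x_query, dtype=float).reshape(1, -1)
    w = np.exp(-np.sum((X - xq) ** 2, axis=1) / (2.0 * tau ** 2))
    Xb, W = add_bias(X), np.diag(w)
    # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y, refit for every query.
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ np.asarray(y, dtype=float))
    return float((add_bias(xq) @ theta)[0])


def predict_knn(X, y, x_query, k=3):
    """Average the targets of the k training points closest (Euclidean) to x_query."""
    X = _as_2d(X)
    xq = np.asarray(x_query, dtype=float).reshape(1, -1)
    dist = np.sum((X - xq) ** 2, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return float(np.mean(np.asarray(y, dtype=float)[nearest]))
```

Unlike the normal equation, LWR and KNN are non-parametric: they keep the training set and do their work at query time, which is what lets them follow non-linear trends.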

Getting Started: To run these notebooks, install the required dependencies with `pip install numpy pandas matplotlib seaborn scikit-learn tqdm cvxopt`.
- Clone the repository: `git clone https://github.com/your-username/Classic-ML-Algorithms-Implementation.git`
- Navigate to the directory (`cd Classic-ML-Algorithms-Implementation`) and start Jupyter: `jupyter notebook`
- Open `DL2022-HW1-P1.ipynb` to see the Music Recommender system in action.

Distributed under the MIT License.