🚀 Ecommerce Search Engine

🌐 Live Demo

🔗 https://ecommerce-search.onrender.com/

📂 Project Structure

Visualize the full project structure here

🏗 High-Level System Architecture

The system follows a layered architecture with a React frontend, a Flask backend, a PostgreSQL database, a Redis caching layer, and an ML-powered ranking pipeline.

%%{init: {'theme': 'neutral'}}%%
graph TB

    subgraph Frontend["Frontend (React/Vite)"]
        UI["User Interface (React Components)"]
        useAuth["useAuth Hook"]
        useSearch["useSearch Hook"]
        useCart["useCart Hook"]
        useAnalytics["useAnalytics Hook"]
        API_JS["api.js (HTTP Client)"]
    end

    subgraph Backend["Backend (Flask + Python)"]
        Routes["Routes Layer"]
        Controllers["Controllers Layer"]
        Services["Services Layer"]
        Utils["Utils (sanitize, search, intent, db helpers)"]
    end

    subgraph Cache["Cache Layer"]
        Redis["Redis (Upstash) - TTL 5 min"]
    end

    subgraph Database["PostgreSQL Database"]
        Users["users"]
        Products["products"]
        SearchEvents["search_events"]
        CartItems["cart_items"]
    end

    subgraph ML["ML Pipeline"]
        Retrain["Retrain Trigger"]
        Jobs["RQ Background Jobs"]
        Ranker["LightGBM Ranker"]
        Clustering["KMeans Clustering"]
        Profiles["User Profiles"]
        Vectorizer["TF-IDF Vectorizer"]
        Model["ranking_model.pkl"]
    end

    UI --> API_JS
    useAuth --> API_JS
    useSearch --> API_JS
    useCart --> API_JS
    useAnalytics --> API_JS

    API_JS --> Routes
    Routes --> Controllers
    Controllers --> Services
    Controllers --> Utils

    Services --> Database
    Services --> Redis
    Redis --> Services

    Controllers --> Retrain
    Retrain --> Jobs
    Jobs --> Ranker
    Jobs --> Clustering

    Ranker --> Database
    Clustering --> Database
    Profiles --> Database
    Vectorizer --> Database

    Ranker --> Model

📌 Overview

A production-ready, ML-powered ecommerce search engine designed to simulate real-world search, personalization, and ranking systems used in modern ecommerce platforms.

This system integrates:

🔐 Secure authentication with email verification
📊 Event-driven analytics & A/B testing
🧠 ML-based personalized ranking
👥 User clustering for segmentation
🔎 PostgreSQL tsvector full-text search
⚡ Redis caching for performance
🔄 Background job processing with RQ

It is built to demonstrate scalability, personalization, and system design best practices.

🛠 Tech Stack

Backend

Flask
Flask-CORS
SQLAlchemy
Redis
RQ (Redis Queue)

Database

PostgreSQL (Neon – recommended for production, supports tsvector)
SQLite (local development only)

Frontend

React
Vite
TailwindCSS-inspired UI

Machine Learning

scikit-learn
pandas
NumPy
joblib
LightGBM (ranker, as shown in the architecture diagram)

Infrastructure

Redis (caching + queue system)
Background workers for async jobs

⚙️ Setup Guide

1️⃣ Python Environment (Required)

⚠️ Python 3.11 is recommended.
This project is not compatible with Python 3.13.

Install Python 3.11 (macOS/Homebrew)

brew install python@3.11

Create Virtual Environment

macOS / Linux

python3.11 -m venv venv
source venv/bin/activate

Windows

python -m venv venv
venv\Scripts\activate

2️⃣ Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

3️⃣ Configure Environment Variables

Create a .env file:

cp .env.example .env

Required Variables

DATABASE_URL=
REDIS_URL=
SECRET_KEY=
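
A startup check can catch a missing variable before the backend tries to connect to anything. The following is a minimal sketch, not the project's actual code; the helper name `missing_required_vars` is hypothetical, and only the three variable names come from the list above.

```python
import os

# Names taken from the "Required Variables" list above.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY")


def missing_required_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the
    check easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_required_vars()` early in application startup and aborting when the list is non-empty gives a clearer error than a failed database or Redis connection later on.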

📌 Option A — PostgreSQL (Recommended)

Create a project at https://neon.tech
Copy your connection string
Update .env:

DATABASE_URL=postgresql://user:password@host/dbname?sslmode=require
REDIS_URL=redis://localhost:6379/0

📌 Option B — SQLite (Local Development Only)

DATABASE_URL=sqlite:///data/ecommerce.db
REDIS_URL=redis://localhost:6379/0

4️⃣ Email Verification Configuration (Optional but Recommended)

Email is sent via Brevo (free tier: 300 emails/day, no custom domain needed).
If BREVO_API_KEY is not set, emails are logged to console only (dev fallback).

Setup

Sign up at app.brevo.com
Verify your sender email under Senders & IPs → Senders
Create an API key under SMTP & API → API Keys
Set in your environment:

BREVO_API_KEY=xkeysib-your-api-key
FROM_EMAIL=your@gmail.com
FROM_NAME=Ecommerce Search
FRONTEND_URL=https://your-app.onrender.com
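
The console-only fallback described above could be structured roughly as follows. This is a sketch under assumptions: `deliver_email` and the `send_via_provider` callable are hypothetical names, and the real Brevo request is deliberately not shown.

```python
import logging
import os

logger = logging.getLogger("email")


def deliver_email(to_address, subject, body, send_via_provider, env=None):
    """Send through the provider when BREVO_API_KEY is set, else log only.

    `send_via_provider` is a hypothetical callable taking
    (api_key, to_address, subject, body); it stands in for the real
    Brevo call. Returns "sent" or "logged" so callers can tell which
    path ran.
    """
    env = os.environ if env is None else env
    api_key = env.get("BREVO_API_KEY")
    if not api_key:
        # Dev fallback: nothing leaves the machine, the message is only logged.
        logger.info("Email to %s (not sent): %s\n%s", to_address, subject, body)
        return "logged"
    send_via_provider(api_key, to_address, subject, body)
    return "sent"
```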

5️⃣ Frontend Environment Configuration

Create frontend/.env.local:

VITE_API_BASE_URL=http://localhost:5000/api

This tells the React frontend where the backend API is located.

6️⃣ Admin Dashboard Configuration (Optional)

To enable admin cache management for specific users, set in backend .env:

ADMIN_USER_IDS=your-user-id

Admin users can:

View real-time cache statistics & hit rates
Manually invalidate search caches
Invalidate recommendation caches
Reset cache statistics

Access is controlled entirely by ADMIN_USER_IDS on the backend — no frontend secret needed.
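
A backend-only admin check of this kind can be sketched as below. It assumes ADMIN_USER_IDS holds a comma-separated list when more than one admin is configured, which the text above does not state; `parse_admin_ids` and `is_admin` are hypothetical helper names.

```python
import os


def parse_admin_ids(raw):
    """Split an ADMIN_USER_IDS value into a set of IDs.

    Assumes a comma-separated format; blank entries and surrounding
    whitespace are ignored.
    """
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def is_admin(user_id, env=None):
    """Return True if user_id is listed in ADMIN_USER_IDS."""
    env = os.environ if env is None else env
    return str(user_id) in parse_admin_ids(env.get("ADMIN_USER_IDS"))
```

Because the check reads only server-side configuration, an unset variable simply means no user is treated as an admin.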

7️⃣ Start Required Services

Make sure the following services are running:

PostgreSQL (if you chose Option A)
Redis server

8️⃣ Run the Backend

python -m backend.app

Backend runs at:

http://127.0.0.1:5000

Tables and tsvector columns are created automatically on first run.

9️⃣ Populate Database (Optional for Testing)

Generate Fake Data

python -m ml.generate_fake_data

Or import your own product dataset.

🔟 Start Background Worker (Required for Full Functionality)

python -m backend.worker

Handles:

Model retraining
User clustering
Analytics updates

1️⃣1️⃣ Train ML Models (Optional Manual Trigger)

python -m ml.train_ranker
python -m ml.assign_user_clusters

1️⃣2️⃣ Run the Frontend

cd frontend
npm install
npm run dev

Frontend runs at:

http://localhost:5173

🔎 Full-Text Search (PostgreSQL `tsvector`)

Ranked relevance scoring
Fast fuzzy matching
Indexed for scalability
Handles large catalogs efficiently
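
To illustrate what a ranked tsvector lookup looks like, here is a sketch that builds the SQL and its bound parameters. It is not the project's query: the column name `search_vector`, the `id` column, and the 'english' text search configuration are assumptions, and the `:q` placeholders follow the named-parameter style used by SQLAlchemy's `text()`.

```python
def build_search_query(user_query, limit=20):
    """Return (sql, params) for a ranked PostgreSQL full-text search.

    Assumes a `products` table with a precomputed tsvector column
    called `search_vector` (hypothetical name). `@@` filters to
    matching rows and ts_rank orders them by relevance.
    """
    sql = (
        "SELECT id, ts_rank(search_vector, plainto_tsquery('english', :q)) AS rank "
        "FROM products "
        "WHERE search_vector @@ plainto_tsquery('english', :q) "
        "ORDER BY rank DESC, id "
        "LIMIT :limit"
    )
    return sql, {"q": user_query, "limit": limit}
```

`plainto_tsquery` treats the input as plain words, so user text never has to be escaped into tsquery syntax by hand. Keeping the query fast on a large catalog typically relies on a GIN index over the tsvector column.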

✨ Features

🔐 Authentication & Security

Signup/Login
Email verification
Password hashing (bcrypt)
Password reset
Input validation & sanitization
SQL injection protection (ORM-based)

📊 Event Tracking

Product clicks
Add-to-cart events
Search queries
A/B group tagging
Timestamp logging

🧠 Personalized Ranking

ML ranking model
User profile vectors
Segment-based clustering
Recent activity boost
Popularity weighting
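
One way the signals listed above might be combined into a single score is sketched below. The formula, the weights, and the 24-hour half-life are illustrative assumptions, not values taken from this project.

```python
import math


def blended_score(model_score, popularity, hours_since_last_interaction,
                  popularity_weight=0.2, recency_weight=0.3,
                  half_life_hours=24.0):
    """Blend an ML relevance score with popularity and a recency boost.

    - popularity is damped with log1p so bestsellers do not dominate.
    - the recency boost halves every `half_life_hours`; pass None when
      the user has never interacted with the product.
    All weights are hypothetical defaults for illustration.
    """
    popularity_term = popularity_weight * math.log1p(max(popularity, 0))
    if hours_since_last_interaction is None:
        recency_term = 0.0
    else:
        decay = 0.5 ** (hours_since_last_interaction / half_life_hours)
        recency_term = recency_weight * decay
    return model_score + popularity_term + recency_term
```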

👥 User Clustering

Behavior-based segmentation
Automated updates
Improves recommendation diversity

📈 A/B Testing

Personalized vs Popularity ranking
Performance comparison
CLI-based analytics

Run:

python -m ml.analytics
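
The comparison between the two groups comes down to a per-group click-through rate. The sketch below shows that calculation on in-memory events; the dictionary keys `group` and `event_type` are hypothetical and may not match the real `search_events` schema or the output of `ml.analytics`.

```python
from collections import defaultdict


def ctr_by_group(events):
    """Compute click-through rate (clicks / searches) per A/B group.

    Each event is a dict with hypothetical keys "group" and
    "event_type", where event_type is "search" or "click".
    Groups with no searches are omitted to avoid dividing by zero.
    """
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        if event["event_type"] == "search":
            searches[event["group"]] += 1
        elif event["event_type"] == "click":
            clicks[event["group"]] += 1
    return {group: clicks[group] / count
            for group, count in searches.items() if count > 0}
```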

🛒 Shopping Cart

Add/remove items
Persistent per-user storage
Real-time totals
Cart clearing

📊 Analytics Dashboard

A/B experiment group metrics (CTR, conversion rates)
Top search queries
User cluster distribution
Visible to all logged-in users

🔧 Admin Cache Management

View real-time cache statistics & hit rates
Monitor cache performance
Manually invalidate search caches
Manually invalidate recommendation caches
Reset cache statistics
Admin-only (requires ADMIN_USER_IDS)

⚡ Performance Optimizations

Search: Cursor-Based Pagination

Efficient result set navigation
Stateless pagination (cursor = offset + product_id)
Prevents "skip=999999" performance issues
Ranked result caching per cursor

Usage:

// Frontend
searchProducts(query, userId, { cursor: 0, limit: 20 })
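
On the server side, a cursor that carries both an offset and the last product ID can be handled as in the sketch below. This is one possible reading of "cursor = offset + product_id", not the project's implementation: the token format (base64-encoded JSON) and the function names are assumptions, and the example pages through an already ranked list such as a cached result set.

```python
import base64
import json


def encode_cursor(offset, product_id):
    """Pack the next offset and the last product ID seen into a token."""
    payload = json.dumps({"o": offset, "p": product_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(token):
    """Inverse of encode_cursor; returns (offset, product_id)."""
    payload = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return payload["o"], payload["p"]


def paginate(ranked_ids, cursor=None, limit=20):
    """Return (page, next_cursor) for a ranked list of product IDs.

    If the ranking shifted after the cursor was issued, resume right
    after the last product the client saw instead of trusting the
    raw offset.
    """
    start = 0
    if cursor is not None:
        offset, last_id = decode_cursor(cursor)
        start = offset
        in_sync = 0 < offset <= len(ranked_ids) and ranked_ids[offset - 1] == last_id
        if not in_sync and last_id in ranked_ids:
            start = ranked_ids.index(last_id) + 1
    page = ranked_ids[start:start + limit]
    end = start + len(page)
    has_more = bool(page) and end < len(ranked_ids)
    return page, (encode_cursor(end, page[-1]) if has_more else None)
```

The server keeps no pagination state between requests; everything needed to resume is inside the token the client sends back.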

Caching

Redis query cache (5-minute TTL)
Ranked search result caching
Product attribute cache
Session cache
Cache invalidation on product updates
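
The query cache follows the usual cache-aside pattern, which can be sketched as follows. The key format and the function names are assumptions; `client` is expected to expose redis-py style `get(name)` and `setex(name, time, value)` methods, and `compute` is a hypothetical callable that runs the real search.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 300  # 5 minutes, matching the TTL stated above


def cache_key(query, user_id):
    """Build a stable key from the normalized query and the user ID."""
    raw = f"{user_id}:{query.strip().lower()}"
    return "search:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_search(client, query, user_id, compute):
    """Return cached results if present, else compute and store them.

    Results must be JSON-serializable. A cache miss costs one extra
    write; later identical queries within the TTL skip `compute`.
    """
    key = cache_key(query, user_id)
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    results = compute(query, user_id)
    client.setex(key, CACHE_TTL_SECONDS, json.dumps(results))
    return results
```

Including the user ID in the key keeps personalized rankings from leaking between users, at the cost of a lower hit rate than a query-only key.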

Database Optimization

Indexed columns
Composite indexes
Connection pooling (5 pool / 10 overflow)
Batch operations

Auto-Retrain Triggers

| Component | Trigger |
| --- | --- |
| Ranking Model | 500 events OR 24h |
| Clusters | 200 events OR 6h |
| User Profiles | Every 5 minutes |
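
The "events OR elapsed time" rule for the ranking model and the clusters can be expressed as a small predicate. This is a sketch: the thresholds are copied from the table above, while the dictionary layout and the name `should_retrain` are hypothetical. User profiles are left out because their trigger is purely time-based.

```python
from datetime import datetime, timedelta

# Thresholds copied from the trigger table above.
RETRAIN_RULES = {
    "ranking_model": {"min_events": 500, "max_age": timedelta(hours=24)},
    "clusters": {"min_events": 200, "max_age": timedelta(hours=6)},
}


def should_retrain(component, events_since_last_run, last_run_at, now):
    """Return True when either the event count or the age threshold is met."""
    rule = RETRAIN_RULES[component]
    enough_events = events_since_last_run >= rule["min_events"]
    too_old = now - last_run_at >= rule["max_age"]
    return enough_events or too_old
```

A controller could call this after recording an event and enqueue the matching background job only when it returns True.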

🗂 File Structure

backend/
  models.py
  database.py
  worker.py
  controllers/
  routes/
  services/
  utils/

frontend/
ml/
data/

🚀 Deployment Guide (VPS / PythonAnywhere)

1️⃣ Clone Repository

git clone https://github.com/srbmaury/Ecommerce-Search.git
cd Ecommerce-Search

2️⃣ Configure `.env`

Set:

DATABASE_URL
REDIS_URL
Email config (optional)

3️⃣ Configure WSGI

from backend.app import create_app
application = create_app()

4️⃣ Start Redis & Worker

Make sure the Redis server is running, then start the worker:

python -m backend.worker

5️⃣ Build Frontend

cd frontend
npm install
npm run build

Serve the built frontend/dist directory through Flask's static file configuration.

🔐 Security Notes

Current Protections

bcrypt password hashing
ORM-based SQL injection protection
Foreign key constraints
Environment-based configuration
Background job isolation

Recommended for Production

Rate limiting (flask-limiter)
HTTPS only
Secure cookies
CSRF protection
Monitoring & logging
Credential rotation
Automated backups

🧪 Suggested Workflow

Sign up and verify email
Search products
Click & add to cart
Observe ranking behavior
Run analytics CLI
Retrain models
Iterate on ranking logic

📌 What This Project Demonstrates

End-to-end full-stack architecture
Search engine design
Machine learning integration
Caching & asynchronous processing
A/B experimentation
Scalable backend patterns

This project mirrors how modern ecommerce systems handle:

Search relevance
Personalization
Data-driven iteration
Performance optimization
User segmentation

🏁 Final Note

This is not just a demo app — it is a system design exercise combining ML, backend engineering, search architecture, and scalability patterns.

Built for learning, experimentation, and real-world production thinking.
