🔗 https://ecommerce-search.onrender.com/
The system follows a layered architecture with a React frontend, Flask backend, PostgreSQL database, Redis caching layer and an ML-powered ranking pipeline.
%%{init: {'theme': 'neutral'}}%%
graph TB
subgraph Frontend["Frontend (React/Vite)"]
UI["User Interface (React Components)"]
useAuth["useAuth Hook"]
useSearch["useSearch Hook"]
useCart["useCart Hook"]
useAnalytics["useAnalytics Hook"]
API_JS["api.js (HTTP Client)"]
end
subgraph Backend["Backend (Flask + Python)"]
Routes["Routes Layer"]
Controllers["Controllers Layer"]
Services["Services Layer"]
Utils["Utils (sanitize, search, intent, db helpers)"]
end
subgraph Cache["Cache Layer"]
Redis["Redis (Upstash) - TTL 5 min"]
end
subgraph Database["PostgreSQL Database"]
Users["users"]
Products["products"]
SearchEvents["search_events"]
CartItems["cart_items"]
end
subgraph ML["ML Pipeline"]
Retrain["Retrain Trigger"]
Jobs["RQ Background Jobs"]
Ranker["LightGBM Ranker"]
Clustering["KMeans Clustering"]
Profiles["User Profiles"]
Vectorizer["TF-IDF Vectorizer"]
Model["ranking_model.pkl"]
end
UI --> API_JS
useAuth --> API_JS
useSearch --> API_JS
useCart --> API_JS
useAnalytics --> API_JS
API_JS --> Routes
Routes --> Controllers
Controllers --> Services
Controllers --> Utils
Services --> Database
Services --> Redis
Redis --> Services
Controllers --> Retrain
Retrain --> Jobs
Jobs --> Ranker
Jobs --> Clustering
Ranker --> Database
Clustering --> Database
Profiles --> Database
Vectorizer --> Database
Ranker --> Model
A production-ready, ML-powered ecommerce search engine designed to simulate real-world search, personalization, and ranking systems used in modern ecommerce platforms.
This system integrates:
- 🔐 Secure authentication with email verification
- 📊 Event-driven analytics & A/B testing
- 🧠 ML-based personalized ranking
- 👥 User clustering for segmentation
- 🔎 PostgreSQL tsvector full-text search
- ⚡ Redis caching for performance
- 🔄 Background job processing with RQ
It is built to demonstrate scalability, personalization, and system design best practices.
Backend:
- Flask
- Flask-CORS
- SQLAlchemy
- Redis
- RQ (Redis Queue)
Database:
- PostgreSQL (Neon – recommended for production, supports tsvector)
- SQLite (local development only)
Frontend:
- React
- Vite
- TailwindCSS-inspired UI
Machine learning:
- scikit-learn
- pandas
- NumPy
- joblib
Infrastructure:
- Redis (caching + queue system)
- Background workers for async jobs
This project is not compatible with Python 3.13. The commands below use Python 3.11.
brew install python@3.11
macOS / Linux:
python3.11 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
Create a .env file:
cp .env.example .env
Then set the following variables in .env:
DATABASE_URL=
REDIS_URL=
SECRET_KEY=
PostgreSQL (Neon, recommended for production):
- Create a project at https://neon.tech
- Copy your connection string
- Update .env:
DATABASE_URL=postgresql://user:password@host/dbname?sslmode=require
REDIS_URL=redis://localhost:6379/0
SQLite (local development only):
DATABASE_URL=sqlite:///data/ecommerce.db
REDIS_URL=redis://localhost:6379/0
Email is sent via Brevo (free tier: 300 emails/day, no custom domain needed).
If BREVO_API_KEY is not set, emails are logged to the console only (dev fallback).
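The dev fallback described above can be sketched as follows. This is a minimal illustration, not the project's actual mailer: the function name `send_email`, its `env` parameter, and its return values are hypothetical. The request shape follows Brevo's transactional email endpoint (`POST https://api.brevo.com/v3/smtp/email` with an `api-key` header).

```python
import json
import os
import urllib.request

BREVO_ENDPOINT = "https://api.brevo.com/v3/smtp/email"


def send_email(to_email: str, subject: str, html: str, env=os.environ) -> str:
    """Send via Brevo if BREVO_API_KEY is set; otherwise log to the console.

    Returns "sent" or "logged" (hypothetical return values for illustration).
    """
    api_key = env.get("BREVO_API_KEY")
    if not api_key:
        # Dev fallback: no API key, so print the message instead of sending it.
        print(f"[email:dev] to={to_email} subject={subject!r}\n{html}")
        return "logged"

    payload = {
        "sender": {
            "name": env.get("FROM_NAME", "Ecommerce Search"),
            "email": env["FROM_EMAIL"],
        },
        "to": [{"email": to_email}],
        "subject": subject,
        "htmlContent": html,
    }
    request = urllib.request.Request(
        BREVO_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    # Raises urllib.error.HTTPError if Brevo rejects the request.
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
    return "sent"
```

Passing `env` explicitly keeps the fallback easy to exercise in tests without touching real environment variables.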
- Sign up at app.brevo.com
- Verify your sender email under Senders & IPs → Senders
- Create an API key under SMTP & API → API Keys
- Set in your environment:
BREVO_API_KEY=xkeysib-your-api-key
FROM_EMAIL=your@gmail.com
FROM_NAME=Ecommerce Search
FRONTEND_URL=https://your-app.onrender.com
Create frontend/.env.local:
VITE_API_BASE_URL=http://localhost:5000/api
This tells the React frontend where the backend API is located.
To enable admin cache management for specific users, set in backend .env:
ADMIN_USER_IDS=your-user-id
Admin users can:
- View real-time cache statistics & hit rates
- Manually invalidate search caches
- Invalidate recommendation caches
- Reset cache statistics
Access is controlled entirely by ADMIN_USER_IDS on the backend; no frontend secret is needed.
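A backend-side check along these lines would enforce that rule. It is a sketch under assumptions: the helper names `parse_admin_ids` and `is_admin` are hypothetical, and ADMIN_USER_IDS is assumed to hold a comma-separated list when more than one user is listed.

```python
import os


def parse_admin_ids(raw: str) -> set:
    """Split a comma-separated ADMIN_USER_IDS value into a set of ID strings."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_admin(user_id, env=os.environ) -> bool:
    """Return True if user_id is listed in ADMIN_USER_IDS (unset means no admins)."""
    admin_ids = parse_admin_ids(env.get("ADMIN_USER_IDS", ""))
    return str(user_id) in admin_ids
```

A cache-management route could call `is_admin(current_user_id)` and respond with 403 when it returns False.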
Ensure the following are running:
- PostgreSQL (if using)
- Redis server
Start the backend:
python -m backend.app
Backend runs at:
http://127.0.0.1:5000
Tables and tsvector columns are created automatically on first run.
Generate sample data:
python -m ml.generate_fake_data
Or import your own product dataset.
Start the background worker:
python -m backend.worker
Handles:
- Model retraining
- User clustering
- Analytics updates
Train the models:
python -m ml.train_ranker
python -m ml.assign_user_clusters
Start the frontend:
cd frontend
npm install
npm run dev
Frontend runs at:
http://localhost:5173
Full-text search:
- Ranked relevance scoring
- Fast fuzzy matching
- Indexed for scalability
- Handles large catalogs efficiently
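Ranked relevance scoring over a tsvector column typically looks like the query below. This is a sketch, not the project's actual query: the column names `search_vector` and `title`, the `english` configuration, and the helper `search_params` are assumptions. `ts_rank`, `plainto_tsquery`, and the `@@` match operator are standard PostgreSQL, and the `:q` / `:limit` placeholders use SQLAlchemy `text()` bind syntax.

```python
# Parameterized full-text query; run with something like
#   session.execute(sqlalchemy.text(SEARCH_SQL), search_params("red shoes"))
SEARCH_SQL = """
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM products, plainto_tsquery('english', :q) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT :limit
"""


def search_params(raw_query: str, limit: int = 20) -> dict:
    """Normalize user input into bind parameters for SEARCH_SQL."""
    # Collapse runs of whitespace and clamp the page size to 1..100.
    return {"q": " ".join(raw_query.split()), "limit": max(1, min(int(limit), 100))}
```

Because the user's text only ever travels as a bind parameter, the query stays safe against SQL injection regardless of what is typed into the search box.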
Authentication and security:
- Signup/Login
- Email verification
- Password hashing (bcrypt)
- Password reset
- Input validation & sanitization
- SQL injection protection (ORM-based)
Event tracking:
- Product clicks
- Add-to-cart events
- Search queries
- A/B group tagging
- Timestamp logging
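One way to picture a tracked event with its A/B tag and timestamp is sketched below. The source does not say how users are assigned to groups, so the hash-based assignment, the `SearchEvent` fields, and the event type names are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def assign_ab_group(user_id) -> str:
    """Deterministically map a user to group "A" or "B" using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


@dataclass
class SearchEvent:
    user_id: str
    event_type: str  # assumed names: "search", "click", "add_to_cart"
    query: str
    product_id: Optional[int] = None
    ab_group: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Tag the event with the user's group unless the caller supplied one.
        if not self.ab_group:
            self.ab_group = assign_ab_group(self.user_id)
```

Hashing the user ID keeps the assignment stable across sessions without storing any extra state.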
Personalization:
- ML ranking model
- User profile vectors
- Segment-based clustering
- Recent activity boost
- Popularity weighting
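To make the recent activity boost and popularity weighting concrete, here is one possible scoring scheme. It is purely illustrative: the exponential half-life decay, the log damping, the weights, and the function names are assumptions, not the project's actual formula.

```python
import math


def recency_boost(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for an interaction right now, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


def blended_score(model_score, age_hours, popularity, w_recent=0.3, w_pop=0.1):
    """Combine the ranker's score with a recency boost and log-damped popularity."""
    return (
        model_score
        + w_recent * recency_boost(age_hours)
        + w_pop * math.log1p(max(popularity, 0))
    )
```

The logarithm keeps a handful of very popular products from dominating every result list.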
User clustering:
- Behavior-based segmentation
- Automated updates
- Improves recommendation diversity
A/B testing:
- Personalized vs. popularity ranking
- Performance comparison
- CLI-based analytics
Run:
python -m ml.analytics
Shopping cart:
- Add/remove items
- Persistent per-user storage
- Real-time totals
- Cart clearing
In-app analytics:
- A/B experiment group metrics (CTR, conversion rates)
- Top search queries
- User cluster distribution
- Visible to all logged-in users
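The per-group metrics listed above could be computed roughly like this. The definitions are assumptions for the sketch: CTR is taken as clicks per search and conversion as add-to-cart events per search, and `ab_metrics` and the event type names are hypothetical.

```python
from collections import defaultdict


def ab_metrics(events):
    """Compute CTR and conversion rate per A/B group.

    events: iterable of (ab_group, event_type) pairs, where event_type is
    "search", "click", or "add_to_cart" (assumed names).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for group, event_type in events:
        counts[group][event_type] += 1

    metrics = {}
    for group, c in counts.items():
        searches = c["search"]
        metrics[group] = {
            # Guard against division by zero for groups with no searches.
            "ctr": c["click"] / searches if searches else 0.0,
            "conversion": c["add_to_cart"] / searches if searches else 0.0,
        }
    return metrics
```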
Admin cache management:
- View real-time cache statistics & hit rates
- Monitor cache performance
- Manually invalidate search caches
- Manually invalidate recommendation caches
- Reset cache statistics
- Admin-only (requires ADMIN_USER_IDS)
Cursor-based pagination:
- Efficient result set navigation
- Stateless pagination (cursor = offset + product_id)
- Prevents "skip=999999" performance issues
- Ranked result caching per cursor
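On the backend, one reading of "cursor = offset + product_id" is that the offset locates the next page while the product ID lets the server re-sync if the ranking shifted between requests. The sketch below follows that reading. The function `paginate` and the cursor's dict shape are assumptions, and a plain integer cursor is accepted as a bare offset.

```python
def paginate(ranked_ids, cursor=None, limit=20):
    """Return (page, next_cursor) for a ranked list of product IDs.

    cursor is None, an int offset, or {"offset": int, "product_id": id}, where
    product_id is the last item the client saw on the previous page.
    """
    if isinstance(cursor, int):
        cursor = {"offset": cursor}

    offset = 0
    if cursor:
        offset = max(int(cursor.get("offset", 0)), 0)
        last_id = cursor.get("product_id")
        in_sync = 0 < offset <= len(ranked_ids) and ranked_ids[offset - 1] == last_id
        # If the ranking shifted, resume right after the last item the client saw.
        if last_id is not None and not in_sync and last_id in ranked_ids:
            offset = ranked_ids.index(last_id) + 1

    page = ranked_ids[offset:offset + limit]
    next_offset = offset + len(page)
    if not page or next_offset >= len(ranked_ids):
        return page, None  # no further pages
    return page, {"offset": next_offset, "product_id": page[-1]}
```

The cursor carries everything needed to resume, so the server keeps no per-client paging state.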
Usage:
// Frontend
searchProducts(query, userId, { cursor: 0, limit: 20 })
Caching:
- Redis query cache (5-minute TTL)
- Ranked search result caching
- Product attribute cache
- Session cache
- Cache invalidation on product updates
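The query cache with its 5-minute TTL can be sketched as a read-through cache. This is an illustration under assumptions: `InMemoryCache` is a stand-in that mimics the `get` and `setex` calls of a Redis client so the example runs without a server, and the key format and function names are hypothetical.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 300  # 5 minutes, matching the TTL stated above


class InMemoryCache:
    """Tiny stand-in exposing get/setex like a Redis client, for illustration."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]


def search_cache_key(query, user_id, cursor, limit):
    """Build a stable key from the normalized query and paging parameters."""
    raw = json.dumps([query.strip().lower(), user_id, cursor, limit])
    return "search:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_search(cache, run_search, query, user_id, cursor=0, limit=20):
    """Return cached results when present; otherwise run the search and cache it."""
    key = search_cache_key(query, user_id, cursor, limit)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = run_search(query, user_id, cursor, limit)
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(results))
    return results
```

Including the cursor and limit in the key means each page of ranked results is cached separately.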
Database optimization:
- Indexed columns
- Composite indexes
- Connection pooling (5 pool / 10 overflow)
- Batch operations
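The "5 pool / 10 overflow" setting maps onto the `pool_size` and `max_overflow` arguments of SQLAlchemy's `create_engine`. The helper below is a sketch: the name `engine_options` is hypothetical, and skipping the pool settings for SQLite URLs is an assumption made for local development.

```python
def engine_options(database_url: str) -> dict:
    """Return keyword arguments for sqlalchemy.create_engine().

    pool_size and max_overflow are real create_engine parameters; applying
    them only to non-SQLite URLs is an assumption of this sketch.
    """
    if database_url.startswith("sqlite"):
        return {}
    # Up to 5 pooled connections, plus 10 temporary ones under load.
    return {"pool_size": 5, "max_overflow": 10}


# Intended use (requires SQLAlchemy and a database driver):
#   engine = create_engine(url, **engine_options(url))
```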
| Component | Trigger |
|---|---|
| Ranking Model | 500 events OR 24h |
| Clusters | 200 events OR 6h |
| User Profiles | Every 5 minutes |
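The triggers in the table above amount to an "event count OR elapsed time" check. A minimal sketch follows; the names `TRIGGERS` and `should_retrain` are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# (event threshold or None, maximum age) per component, taken from the table.
TRIGGERS = {
    "ranking_model": (500, timedelta(hours=24)),
    "clusters": (200, timedelta(hours=6)),
    "user_profiles": (None, timedelta(minutes=5)),  # time-based only
}


def should_retrain(component: str, new_events: int, last_run: datetime, now: datetime) -> bool:
    """True when either the event threshold or the time threshold is reached."""
    max_events, max_age = TRIGGERS[component]
    event_due = max_events is not None and new_events >= max_events
    return event_due or (now - last_run) >= max_age
```

A controller could call this after logging each event and enqueue a background job when it returns True.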
backend/
  models.py
  database.py
  worker.py
  controllers/
  routes/
  services/
  utils/
frontend/
ml/
data/
git clone https://github.com/srbmaury/Ecommerce-Search.git
cd Ecommerce-Search
Set:
- DATABASE_URL
- REDIS_URL
- Email config (optional)
WSGI entry point:
from backend.app import create_app
application = create_app()
Background worker:
python -m backend.worker
Frontend build:
cd frontend
npm run build
Serve frontend/dist via Flask static config.
- bcrypt password hashing
- ORM-based SQL injection protection
- Foreign key constraints
- Environment-based configuration
- Background job isolation
- Rate limiting (flask-limiter)
- HTTPS only
- Secure cookies
- CSRF protection
- Monitoring & logging
- Credential rotation
- Automated backups
- Sign up and verify email
- Search products
- Click & add to cart
- Observe ranking behavior
- Run analytics CLI
- Retrain models
- Iterate on ranking logic
- End-to-end full-stack architecture
- Search engine design
- Machine learning integration
- Caching & asynchronous processing
- A/B experimentation
- Scalable backend patterns
This project mirrors how modern ecommerce systems handle:
- Search relevance
- Personalization
- Data-driven iteration
- Performance optimization
- User segmentation
This is not just a demo app: it is a system design exercise combining ML, backend engineering, search architecture, and scalability patterns.
Built for learning, experimentation, and real-world production thinking.