MemCP — Persistent Main Memory Database with MySQL Compatibility

MemCP is a MySQL-compatible database that keeps your data fully in main memory for maximum speed. Your data is safe — it's persisted to disk automatically and survives restarts — but because there's no disk I/O on the query path, reads and aggregations run 10–100× faster than MySQL.

Use your existing MySQL connector, ORMs, and SQL queries. No migration, no rewrite.

Status: Beta. Core SQL (SELECT, INSERT, UPDATE, DELETE, JOIN, GROUP BY, subqueries, triggers) works well in production. Some advanced SQL edge cases are still being improved — check open issues if a specific query doesn't behave as expected.

Why Switch from MySQL? 💡

10–100× Faster on Aggregations and Reports

  • No disk I/O on the query path — data lives in main memory, always ready
  • Fast for writes and reads — handles inserts and complex statistics in the same database
  • Built-in REST API — query your database directly over HTTP, no middleware needed
  • Sub-millisecond response times even on large tables with GROUP BY and aggregations

Why is it so much faster for statistics? MySQL reads entire rows to compute a SUM() or COUNT(). MemCP stores each column separately, so SELECT region, SUM(revenue) FROM orders GROUP BY region only reads the two columns it actually needs — not every field of every row.
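The difference between the two layouts can be sketched in Go (the engine's language). This is an illustrative model of columnar aggregation, not MemCP's actual storage code; the type and function names are made up:

```go
package main

import "fmt"

// In a row store, every row carries all fields, so an aggregation
// scans every field of every row. In a column store, each column is
// a separate slice, and a query only walks the columns it references.
type OrderColumns struct {
	Region  []string
	Revenue []float64
	// other columns omitted — the GROUP BY query never loads them
}

// sumByRegion models SELECT region, SUM(revenue) FROM orders GROUP BY region
// over the columnar layout: exactly two slices are read.
func sumByRegion(c OrderColumns) map[string]float64 {
	totals := make(map[string]float64)
	for i, region := range c.Region {
		totals[region] += c.Revenue[i]
	}
	return totals
}

func main() {
	cols := OrderColumns{
		Region:  []string{"EU", "US", "EU"},
		Revenue: []float64{100, 250, 50},
	}
	fmt.Println(sumByRegion(cols)["EU"]) // prints 150
}
```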

🔌 Drop-in MySQL Compatibility

-- Your existing MySQL queries work immediately
CREATE TABLE users (id INT, name VARCHAR(100), email VARCHAR(255));
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
SELECT * FROM users WHERE id = 1;

🌐 Built-in REST API Server

# Start MemCP with REST API
./memcp --api-port=4321 lib/main.scm

# Query via HTTP instantly
curl -X POST http://localhost:4321/sql/mydb \
  -d "SELECT * FROM users" \
  -H "Authorization: Basic cm9vdDphZG1pbg=="

API Endpoints:

  • /sql/<database> — MySQL-dialect SQL
  • /psql/<database> — PostgreSQL-dialect SQL
  • /rdf/<database> — SPARQL queries
  • /rdf/<database>/load_ttl — load RDF/Turtle data
  • /dashboard — admin dashboard with live CPU/memory/connection gauges, database browser, shard and compression statistics, and user management
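Beyond curl, any HTTP client can talk to these endpoints. A minimal Go client sketch — the endpoint, database name, and default credentials match the curl example above, while `queryMemCP` itself is a hypothetical helper, not part of MemCP:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// basicAuth builds the Authorization header value for HTTP Basic auth.
func basicAuth(user, pass string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+pass))
}

// queryMemCP posts a SQL string to MemCP's /sql/<database> endpoint
// and returns the raw response body.
func queryMemCP(baseURL, db, sql, user, pass string) (string, error) {
	req, err := http.NewRequest("POST", baseURL+"/sql/"+db, strings.NewReader(sql))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", basicAuth(user, pass))
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := queryMemCP("http://localhost:4321", "mydb", "SELECT * FROM users", "root", "admin")
	if err != nil {
		fmt.Println("request failed (is MemCP running?):", err)
		return
	}
	fmt.Println(body)
}
```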

📊 Perfect for Modern Workloads

  • Microservices - Embedded database per service
  • APIs and Web Apps - Ultra-low latency responses
  • Real-time Analytics - Process data as fast as it arrives
  • Development & Testing - Instant setup, no configuration

Architecture & Languages 🏗️

MemCP combines the best of multiple worlds with a carefully chosen tech stack:

Go (Storage Engine & Core)

  • High-performance storage engine built in Go
  • Concurrent request handling with goroutines
  • Memory-efficient data structures
  • Cross-platform compatibility

Scheme (SQL Parser & Compiler)

  • Advanced SQL parser written in Scheme
  • Query optimization and compilation
  • Extensible language for complex transformations
  • Functional programming advantages for parsing

Flexible Scripting Support

  • Command-line argument support for automation
  • Dynamic query generation and processing
  • Easy integration with existing workflows

Key Advantages 🎯

🔥 Ultra-Fast REST APIs

Traditional setup: Client → HTTP Server → Database Connection → Disk I/O
MemCP: Client → HTTP Server → Main Memory

// Response times you'll see
MySQL (with network + disk):  10-50ms
MemCP (main memory):          0.1-1ms  // 50x faster!

⚡ Docker

docker pull carli2/memcp
docker run -it -p 4321:4321 -p 3307:3307 carli2/memcp

🧠 Persistent and Safe

  • Data is written to disk — restarts and crashes don't lose your data
  • S3, MinIO, and Ceph backends for cloud and distributed deployments
  • Automatic compression reduces storage footprint significantly vs. MySQL
  • Configurable data directory — point it at any local or remote path

Memory Management 🧩

MemCP is designed to run alongside other services on the same machine without blowing up your RAM.

Automatic compression — MemCP stores each column in the most compact format that fits the data: small integers get bit-packed, repeated strings become dictionary-encoded, sequential IDs are stored as ranges. A table that takes 10 GB in MySQL often fits in 1–3 GB in MemCP.
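Dictionary encoding, one of the techniques mentioned above, can be sketched in a few lines of Go. This is an illustrative model of the idea, not MemCP's actual encoder:

```go
package main

import "fmt"

// dictEncode replaces each string with an index into a deduplicated
// dictionary — repeated values become small integers, which is why
// repetitive string columns compress so well.
func dictEncode(col []string) (dict []string, codes []int) {
	index := make(map[string]int)
	for _, v := range col {
		code, ok := index[v]
		if !ok {
			code = len(dict)
			index[v] = code
			dict = append(dict, v)
		}
		codes = append(codes, code)
	}
	return dict, codes
}

func main() {
	col := []string{"EU", "US", "EU", "EU", "US"}
	dict, codes := dictEncode(col)
	fmt.Println(dict)  // unique values stored once
	fmt.Println(codes) // small, bit-packable ints instead of repeated strings
}
```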

Configurable memory budget — by default MemCP uses at most 50% of your server's RAM. You can set an exact limit via the dashboard or the settings API:

# Limit to 4 GB total
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamBytes","value":4294967296}'

# Or as a percentage of total RAM (default: 50)
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamPercent","value":40}'

Automatic eviction — when MemCP approaches its memory limit, it automatically unloads the least recently used data from RAM. That data stays safe on disk and is transparently reloaded the next time a query needs it. Frequently accessed hot data stays in memory; cold data steps aside.
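The eviction policy described above amounts to dropping the least recently used data until the budget fits again. A simplified sketch — shard names and sizes are invented, and MemCP's real bookkeeping is more involved:

```go
package main

import (
	"fmt"
	"sort"
)

// shard is a hypothetical unit of evictable column data.
type shard struct {
	name     string
	size     int64
	lastUsed int64 // logical clock; higher = more recently used
}

// evictUntilUnderBudget drops least-recently-used shards until the total
// fits the budget. Evicted shards stay safe on disk and can be reloaded.
func evictUntilUnderBudget(shards []shard, budget int64) (kept []shard, evicted []string) {
	var total int64
	for _, s := range shards {
		total += s.size
	}
	// Oldest (coldest) shards first.
	sort.Slice(shards, func(i, j int) bool { return shards[i].lastUsed < shards[j].lastUsed })
	for _, s := range shards {
		if total > budget {
			total -= s.size
			evicted = append(evicted, s.name)
		} else {
			kept = append(kept, s)
		}
	}
	return kept, evicted
}

func main() {
	shards := []shard{
		{"orders.region", 3, 10},
		{"orders.revenue", 3, 30},
		{"logs.message", 5, 5}, // coldest and largest: first to go
	}
	_, evicted := evictUntilUnderBudget(shards, 7)
	fmt.Println(evicted) // prints [logs.message]
}
```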

System-wide pressure awareness — if the whole server runs low on free RAM (below 10%), MemCP detects this and proactively releases its own cache — even if its own budget isn't exhausted yet. This keeps your application, web server, and OS responsive regardless of load spikes.

Separate budget for persistent data — a second budget (default: 30% of RAM) controls how much space the on-disk data loaded into RAM may occupy, independently of temporary query working memory. Both limits are tunable at runtime without restart.

Storage Engines

MemCP supports several storage engines, selectable per table via CREATE TABLE ... ENGINE=<engine> or ALTER TABLE ... ENGINE=<engine>:

| Engine | Data durability | Evictable | Description |
|--------|-----------------|-----------|-------------|
| safe | Persisted to disk (default) | Yes | All writes are immediately durable; data is evicted from RAM when memory is tight and reloaded on demand. |
| logged | Persisted to disk | Yes | Like safe, but uses a write-ahead log for faster writes with the same durability guarantee. |
| sloppy | Persisted to disk | Yes | Data is written to disk asynchronously; writes are faster, but the most recent writes can be lost on a crash. |
| memory | RAM only | No | Data lives entirely in RAM and is never written to disk. Not registered with the memory manager, so it is never evicted. Lost on restart. |
| cache | Schema only (data in RAM) | Yes | Like memory, data lives in RAM and is never written to disk, but the data is registered with the memory manager and is cleared automatically when MemCP approaches its memory limit. The schema is always persisted; after eviction the table is accessible but empty. Useful for derived or precomputed data that can be regenerated on demand. |

🔧 Developer-Friendly

  • Comprehensive test suite with 2470+ SQL tests across 100+ test suites
  • YAML-based testing framework
  • Extensive error handling and validation
  • Built-in performance monitoring

Quick Start 🚀

# 1. Build MemCP
go mod download
make

# 2. Start with REST API
./memcp --api-port=4321 --mysql-port=3307 lib/main.scm

# Run as a background daemon (use --no-repl to avoid exiting when stdin closes)
./memcp --no-repl --api-port=4321 --mysql-port=3307 lib/main.scm &

# 3. Create your first database
curl -X POST http://localhost:4321/sql/system \
  -d "CREATE DATABASE myapp" \
  -u root:admin

# 4. Start building lightning-fast apps!
curl -X POST http://localhost:4321/sql/myapp \
  -d "CREATE TABLE products (id INT, name VARCHAR(100), price DECIMAL(10,2))" \
  -u root:admin

CLI Flags

| Flag | Default | Description |
|------|---------|-------------|
| --api-port=PORT | 4321 | HTTP API listen port |
| --mysql-port=PORT | 3307 | MySQL protocol listen port |
| --mysql-socket=PATH | /tmp/memcp.sock | MySQL Unix socket path |
| --root-password=PASSWORD | admin | Initial root password (first run only) |
| --disable-api | (none) | Disable the HTTP API server |
| --disable-mysql | (none) | Disable the MySQL protocol server |
| --no-repl | (none) | Disable the interactive REPL (required for daemon/background use) |
| -data DIR | ./data | Data directory |

Authentication

Security note: Never expose MemCP directly to the internet with default credentials. Always set a strong --root-password before any network-accessible deployment.

  • Default credentials: root / admin.
  • Set the initial root password via CLI: --root-password=supersecret at the first run (on a fresh -data folder), or via Docker env ROOT_PASSWORD.
  • Docker Compose example:
services:
  memcp:
    image: carli2/memcp:latest
    environment:
      - ROOT_PASSWORD=supersecret
      - PARAMS=--api-port=4321
    ports:
      - "4321:4321"  # HTTP API
      - "3307:3307"  # MySQL protocol
    volumes:
      - memcp_data:/data
volumes:
  memcp_data: {}
  • Change the credentials with:
curl -X POST http://localhost:4321/sql/system \
  -d "ALTER USER root IDENTIFIED BY 'supersecret'" \
  -u root:admin

Importing Existing Data 📥

MemCP can bulk-import schema and data from MySQL or PostgreSQL with a single Scheme call. The import drops and recreates the target tables on every run, so it is safe to re-run after schema changes.

Import from MySQL

; import all databases (skip system dbs)
(mysql_import nil nil "root" "secret")

; import one specific database
(mysql_import nil nil "root" "secret" "myapp")

; import into a differently-named MemCP database
(mysql_import nil nil "root" "secret" "myapp" "myapp_memcp")

Parameters: host (nil → 127.0.0.1), port (nil → 3306), username, password, sourcedb (nil → all), targetdb (nil → sourcedb), sourcetable (nil → all), targettable (nil → sourcetable).

Import from PostgreSQL

PostgreSQL has an extra hierarchy level — database → schema → table — compared to MySQL. The sourceschema parameter selects which schema(s) within the database to import. All imported schemas land in the same MemCP database (targetdb).

; import the public schema of one PostgreSQL database
(psql_import nil nil "postgres" "secret" "myapp" "public")

; import all non-system schemas of one database
(psql_import nil nil "postgres" "secret" "myapp" nil "myapp")

; import all databases (each becomes a separate MemCP database)
(psql_import nil nil "postgres" "secret")

Parameters: host (nil → 127.0.0.1), port (nil → 5432), username, password, sourcedb (nil → all), sourceschema (nil → all non-system schemas), targetdb (nil → sourcedb), sourcetable (nil → all), targettable (nil → sourcetable).

Both functions print a line for each imported table and return true on success.

Performance Comparison 📈

| Query | MySQL (SSD) | MemCP |
|-------|-------------|-------|
| SELECT * FROM users WHERE id = 1 | 1–5ms | 0.1ms |
| SELECT region, SUM(revenue) FROM orders GROUP BY region | 200–800ms | 2–10ms |
| SELECT COUNT(*), AVG(price) FROM products WHERE category = ? | 50–200ms | 0.5ms |
| INSERT INTO events VALUES (...) | 10–30ms | 0.2ms |
| REST API call (HTTP → query → JSON response) | 20–100ms | 1–10ms |

Measured on standard dev hardware. Aggregation queries show the largest speedup because MemCP only reads the columns a query actually uses.

Use Cases 💼

  • 📊 Dashboards and Reports - GROUP BY queries that take seconds in MySQL run in milliseconds
  • 📡 Realtime Monitoring - Aggregate metrics and counters over millions of rows without slowing down
  • 🛒 E-commerce - Product catalog queries, price calculations, and inventory stats at any scale
  • 🎮 Gaming Backends - Leaderboards, player statistics, and session data with sub-millisecond latency
  • 💰 Financial Applications - Aggregations, risk calculations, and transaction summaries in real time
  • 🧪 Development & Testing - Instant database setup, no configuration, throw it away when done

Contributing 🤝

We'd love your help making MemCP even better!

CI behavior

  • Pull requests to master run the full required test workflow.
  • Direct pushes trigger the test workflow only on master (avoiding duplicate runs for a branch and its open PR).
  • If a PR shows pending/old checks, push one fresh commit so the current workflow config is applied.

🌟 Why Contribute?

  • Work with cutting-edge database technology
  • Learn Go, Scheme, and database internals
  • Help shape an early-stage project where your contributions have real impact
  • Build ultra-high-performance systems

🛠️ Easy Ways to Contribute

  • 📝 Add test cases - Expand our comprehensive test suite
  • 🐛 Fix bugs - Help us squash issues and improve stability
  • ⚡ Performance optimization - Make fast even faster
  • 📚 Documentation - Help other developers get started
  • 🔧 New features - SQL functions, operators, and capabilities

🚀 Getting Started

# 1. Fork the repository
# 2. Clone your fork (replace launix-de with your GitHub username)
git clone https://github.com/launix-de/memcp.git

# 3. Set up development environment
cd memcp
go build -o memcp

# 4. Run the test suite (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml

# 5. Make your changes and add tests
# 6. Submit a pull request!

🎯 Current Contribution Opportunities

  • Vector database features - Advanced similarity search
  • Additional SQL functions - String, math, and date functions
  • Performance benchmarking - Automated performance testing
  • Driver development - Language-specific database drivers
  • Documentation examples - Real-world usage scenarios

Testing 🧪

MemCP includes a comprehensive test framework:

# Run all tests
make test

# Or if you want to contribute, deploy this as a Pre-commit hook:
cp git-pre-commit .git/hooks/pre-commit

# Run specific test suites (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml      # Basic operations
python3 run_sql_tests.py tests/02_functions.yaml      # SQL functions
python3 run_sql_tests.py tests/07_error_cases.yaml    # Error handling

# Connect to an already-running instance (skip startup)
python3 run_sql_tests.py tests/01_basic_sql.yaml 4321 --connect-only

Performance Testing 📊

MemCP includes an auto-calibrating performance test framework that adapts to your machine.

Running Performance Tests

# Run perf tests (uses calibrated baselines)
PERF_TEST=1 make test

# Calibrate for your machine (run ~10 times to reach target time range)
PERF_TEST=1 PERF_CALIBRATE=1 make test

# Freeze row counts for bisecting performance regressions
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# Show query plans for each test
PERF_TEST=1 PERF_EXPLAIN=1 make test

How Calibration Works

  1. Initial run starts with 10,000 rows per test
  2. Each calibration run scales row counts by 30% up/down
  3. Target is 10-20 seconds query time per test
  4. Baselines are stored in .perf_baseline.json
  5. After ~10 runs, row counts stabilize in the target range
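The calibration rule in steps 1–5 can be sketched as a small function. The 30% step, the 10,000-row start, and the 10–20 second window come from the text above; everything else is illustrative:

```go
package main

import "fmt"

// calibrate nudges a test's row count by 30% toward the 10–20 s
// target window, mirroring the calibration steps described above.
func calibrate(rows int, queryTimeSec float64) int {
	switch {
	case queryTimeSec < 10:
		return rows + rows*30/100 // too fast: grow the dataset
	case queryTimeSec > 20:
		return rows - rows*30/100 // too slow: shrink it
	default:
		return rows // in the target range: row count is stable
	}
}

func main() {
	rows := 10000 // initial row count per test
	// Hypothetical measured query times across successive calibration runs:
	for _, t := range []float64{1.2, 3.0, 6.5, 12.0} {
		rows = calibrate(rows, t)
	}
	fmt.Println(rows) // prints 21970
}
```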

Output Format

✅ Perf: COUNT (7.9ms / 8700ms, 20,000 rows, 0.39µs/row, 11.4MB heap)
         │       │        │           │        │           └─ Heap memory after insert
         │       │        │           │        └─ Time per row
         │       │        │           └─ Calibrated row count
         │       │        └─ Threshold (from baseline × 1.1)
         │       └─ Actual query time
         └─ Test name

Performance Debugging Cookbook

Detecting a performance regression:

# 1. Freeze baselines to use consistent row counts
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# 2. If a test fails threshold, you have a regression

Bisecting a performance bug:

# 1. Checkout the known-good commit, run calibration
git checkout good-commit
PERF_TEST=1 PERF_CALIBRATE=1 make test  # run 10x to calibrate

# 2. Save the baseline
cp .perf_baseline.json .perf_baseline_good.json

# 3. Bisect with frozen row counts
git bisect start
git bisect bad HEAD
git bisect good good-commit
git bisect run bash -c 'PERF_TEST=1 PERF_NORECALIBRATE=1 make test'

Analyzing slow queries:

# Show query plans to understand execution
PERF_TEST=1 PERF_EXPLAIN=1 make test

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| PERF_TEST | 0/1 | Enable performance tests |
| PERF_CALIBRATE | 0/1 | Update baselines with new times |
| PERF_NORECALIBRATE | 0/1 | Freeze row counts (for bisecting) |
| PERF_EXPLAIN | 0/1 | Show query plans |

Remote Storage Backends 🗄️

MemCP supports storing databases on remote storage backends instead of the local filesystem. To put a database on a remote backend, place a JSON configuration file named after the database (e.g. data/mydb.json) in the data folder, where that database's directory would otherwise be.

S3 / MinIO Storage

Store your database on Amazon S3 or any S3-compatible storage (MinIO, Ceph RGW, etc.).

Configuration file (data/mydb.json):

{
  "backend": "s3",
  "access_key_id": "your-access-key",
  "secret_access_key": "your-secret-key",
  "region": "us-east-1",
  "bucket": "memcp-data",
  "prefix": "databases"
}

For MinIO or self-hosted S3-compatible storage:

{
  "backend": "s3",
  "access_key_id": "minioadmin",
  "secret_access_key": "minioadmin",
  "endpoint": "http://localhost:9000",
  "bucket": "memcp",
  "prefix": "data",
  "force_path_style": true
}

Quick MinIO setup for testing:

# Start MinIO with Docker
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

# Create a bucket (via MinIO Console at http://localhost:9001)
# Or via mc CLI:
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/memcp

Ceph/RADOS Storage

Store your database directly on Ceph RADOS for high-performance distributed storage.

Why is Ceph optional? The Ceph backend uses CGO to link against librados (the Ceph client library). This requires the C headers and library to be installed at compile time and the shared library at runtime. To keep the default build simple and portable, Ceph support is behind a build tag.

# Install Ceph development libraries (Ubuntu/Debian)
sudo apt-get install librados-dev

# Build MemCP with Ceph support
make ceph
# or: go build -tags=ceph

Configuration file (data/mydb.json):

{
  "backend": "ceph",
  "username": "client.memcp",
  "cluster": "ceph",
  "pool": "memcp",
  "prefix": "databases"
}

Optional fields:

  • conf_file: Path to ceph.conf (defaults to /etc/ceph/ceph.conf)

Setting up a Ceph development cluster with vstart.sh:

# Clone Ceph source
git clone https://github.com/ceph/ceph.git
cd ceph

# Install dependencies and build (only vstart target needed)
./install-deps.sh
pip install cython setuptools
./do_cmake.sh
cd build && ninja vstart

# Start a development cluster
cd ..
MON=1 OSD=3 MDS=0 MGR=1 ./build/bin/vstart.sh -d -n -x

# Create a pool for MemCP
./build/bin/ceph osd pool create memcp 32

# Create a user for MemCP (optional, can also use client.admin)
./build/bin/ceph auth get-or-create client.memcp \
  mon 'allow r' \
  osd 'allow rwx pool=memcp' \
  -o ceph.client.memcp.keyring

Environment variables for vstart cluster:

export CEPH_CONF=/path/to/ceph/build/ceph.conf
export CEPH_KEYRING=/path/to/ceph/build/keyring

Backend Configuration Reference

| Field | Backend | Description |
|-------|---------|-------------|
| backend | all | Backend type: "s3" or "ceph" |
| prefix | all | Object key prefix for database objects |
| access_key_id | S3 | AWS or S3-compatible access key |
| secret_access_key | S3 | AWS or S3-compatible secret key |
| region | S3 | AWS region (e.g., "us-east-1") |
| endpoint | S3 | Custom endpoint URL (for MinIO, etc.) |
| bucket | S3 | S3 bucket name |
| force_path_style | S3 | Use path-style URLs (required for MinIO) |
| username | Ceph | Ceph user (e.g., "client.admin") |
| cluster | Ceph | Cluster name (usually "ceph") |
| conf_file | Ceph | Path to ceph.conf (optional) |
| pool | Ceph | RADOS pool name |

License 📄

MemCP is open source software. See the LICENSE file for details.


Ready to experience database performance like never before? Get Started · Contribute · Join our Community

MemCP: Because your applications deserve better than "good enough" performance.
