MemCP — Persistent Main Memory Database with MySQL Compatibility

MemCP is a MySQL-compatible database that keeps your data fully in main memory for maximum speed. Your data is safe — it's persisted to disk automatically and survives restarts — but because there's no disk I/O on the query path, reads and aggregations run 10–100× faster than MySQL.

Use your existing MySQL connector, ORMs, and SQL queries. No migration, no rewrite.

Status: Beta. Core SQL (SELECT, INSERT, UPDATE, DELETE, JOIN, GROUP BY, subqueries, triggers) works well in production. Some advanced SQL edge cases are still being improved — check open issues if a specific query doesn't behave as expected.

Why Switch from MySQL? 💡

10–100× Faster on Aggregations and Reports

  • No disk I/O on the query path — data lives in main memory, always ready
  • Fast for writes and reads — handles inserts and complex statistics in the same database
  • Built-in REST API — query your database directly over HTTP, no middleware needed
  • Sub-millisecond response times even on large tables with GROUP BY and aggregations

Why is it so much faster for statistics? MySQL reads entire rows to compute a SUM() or COUNT(). MemCP stores each column separately, so SELECT region, SUM(revenue) FROM orders GROUP BY region only reads the two columns it actually needs — not every field of every row.
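The difference between the two layouts can be sketched in Go (the engine's language). This is an illustrative model of columnar aggregation, not MemCP's actual storage code; the type and function names are made up:

```go
package main

import "fmt"

// In a row store, every row carries all fields, so an aggregation
// scans every field of every row. In a column store, each column is
// a separate slice, and a query only walks the columns it references.
type OrderColumns struct {
	Region  []string
	Revenue []float64
	// other columns omitted — the GROUP BY query never loads them
}

// sumByRegion models SELECT region, SUM(revenue) FROM orders GROUP BY region
// over the columnar layout: exactly two slices are read.
func sumByRegion(c OrderColumns) map[string]float64 {
	totals := make(map[string]float64)
	for i, region := range c.Region {
		totals[region] += c.Revenue[i]
	}
	return totals
}

func main() {
	cols := OrderColumns{
		Region:  []string{"EU", "US", "EU"},
		Revenue: []float64{100, 250, 50},
	}
	fmt.Println(sumByRegion(cols)["EU"]) // prints 150
}
```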

🔌 Drop-in MySQL Compatibility

-- Your existing MySQL queries work immediately
CREATE TABLE users (id INT, name VARCHAR(100), email VARCHAR(255));
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
SELECT * FROM users WHERE id = 1;

🌐 Built-in REST API Server

# Start MemCP with REST API
./memcp --api-port=4321 lib/main.scm

# Query via HTTP instantly
curl -X POST http://localhost:4321/sql/mydb \
  -d "SELECT * FROM users" \
  -H "Authorization: Basic cm9vdDphZG1pbg=="

API Endpoints:

  • /sql/<database> — MySQL-dialect SQL
  • /psql/<database> — PostgreSQL-dialect SQL
  • /rdf/<database> — SPARQL queries
  • /rdf/<database>/load_ttl — load RDF/Turtle data
  • /dashboard — admin dashboard with live CPU/memory/connection gauges, database browser, shard and compression statistics, and user management
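Beyond curl, any HTTP client can talk to these endpoints. A minimal Go client sketch — the endpoint, database name, and default credentials match the curl example above, while `queryMemCP` itself is a hypothetical helper, not part of MemCP:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// basicAuth builds the Authorization header value for HTTP Basic auth.
func basicAuth(user, pass string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+pass))
}

// queryMemCP posts a SQL string to MemCP's /sql/<database> endpoint
// and returns the raw response body.
func queryMemCP(baseURL, db, sql, user, pass string) (string, error) {
	req, err := http.NewRequest("POST", baseURL+"/sql/"+db, strings.NewReader(sql))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", basicAuth(user, pass))
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := queryMemCP("http://localhost:4321", "mydb", "SELECT * FROM users", "root", "admin")
	if err != nil {
		fmt.Println("request failed (is MemCP running?):", err)
		return
	}
	fmt.Println(body)
}
```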

📊 Perfect for Modern Workloads

  • Microservices - Embedded database per service
  • APIs and Web Apps - Ultra-low latency responses
  • Real-time Analytics - Process data as fast as it arrives
  • Development & Testing - Instant setup, no configuration

Architecture & Languages 🏗️

MemCP combines the best of multiple worlds with a carefully chosen tech stack:

Go (Storage Engine & Core)

  • High-performance storage engine built in Go
  • Concurrent request handling with goroutines
  • Memory-efficient data structures
  • Cross-platform compatibility

Scheme (SQL Parser & Compiler)

  • Advanced SQL parser written in Scheme
  • Query optimization and compilation
  • Extensible language for complex transformations
  • Functional programming advantages for parsing

Flexible Scripting Support

  • Command-line argument support for automation
  • Dynamic query generation and processing
  • Easy integration with existing workflows

Key Advantages 🎯

🔥 Ultra-Fast REST APIs

Traditional setup: Client → HTTP Server → Database Connection → Disk I/O
MemCP: Client → HTTP Server → Main Memory

// Response times you'll see
MySQL (with network + disk):  10-50ms
MemCP (main memory):          0.1-1ms  // 50x faster!

⚡ Docker

docker pull carli2/memcp
docker run -it -p 4321:4321 -p 3307:3307 carli2/memcp

🧠 Persistent and Safe

  • Data is written to disk — restarts and crashes don't lose your data
  • S3, MinIO, and Ceph backends for cloud and distributed deployments
  • Automatic compression reduces storage footprint significantly vs. MySQL
  • Configurable data directory — point it at any local or remote path

Memory Management 🧩

MemCP is designed to run alongside other services on the same machine without blowing up your RAM.

Automatic compression — MemCP stores each column in the most compact format that fits the data: small integers get bit-packed, repeated strings become dictionary-encoded, sequential IDs are stored as ranges. A table that takes 10 GB in MySQL often fits in 1–3 GB in MemCP.
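Dictionary encoding, one of the techniques mentioned above, can be sketched in a few lines of Go. This is an illustrative model of the idea, not MemCP's actual encoder:

```go
package main

import "fmt"

// dictEncode replaces each string with an index into a deduplicated
// dictionary — repeated values become small integers, which is why
// repetitive string columns compress so well.
func dictEncode(col []string) (dict []string, codes []int) {
	index := make(map[string]int)
	for _, v := range col {
		code, ok := index[v]
		if !ok {
			code = len(dict)
			index[v] = code
			dict = append(dict, v)
		}
		codes = append(codes, code)
	}
	return dict, codes
}

func main() {
	col := []string{"EU", "US", "EU", "EU", "US"}
	dict, codes := dictEncode(col)
	fmt.Println(dict)  // unique values stored once
	fmt.Println(codes) // small, bit-packable ints instead of repeated strings
}
```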

Configurable memory budget — by default MemCP uses at most 50% of your server's RAM. You can set an exact limit via the dashboard or the settings API:

# Limit to 4 GB total
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamBytes","value":4294967296}'

# Or as a percentage of total RAM (default: 50)
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamPercent","value":40}'

Automatic eviction — when MemCP approaches its memory limit, it automatically unloads the least recently used data from RAM. That data stays safe on disk and is transparently reloaded the next time a query needs it. Frequently accessed hot data stays in memory; cold data steps aside.
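The eviction policy described above amounts to dropping the least recently used data until the budget fits again. A simplified sketch — shard names and sizes are invented, and MemCP's real bookkeeping is more involved:

```go
package main

import (
	"fmt"
	"sort"
)

// shard is a hypothetical unit of evictable column data.
type shard struct {
	name     string
	size     int64
	lastUsed int64 // logical clock; higher = more recently used
}

// evictUntilUnderBudget drops least-recently-used shards until the total
// fits the budget. Evicted shards stay safe on disk and can be reloaded.
func evictUntilUnderBudget(shards []shard, budget int64) (kept []shard, evicted []string) {
	var total int64
	for _, s := range shards {
		total += s.size
	}
	// Oldest (coldest) shards first.
	sort.Slice(shards, func(i, j int) bool { return shards[i].lastUsed < shards[j].lastUsed })
	for _, s := range shards {
		if total > budget {
			total -= s.size
			evicted = append(evicted, s.name)
		} else {
			kept = append(kept, s)
		}
	}
	return kept, evicted
}

func main() {
	shards := []shard{
		{"orders.region", 3, 10},
		{"orders.revenue", 3, 30},
		{"logs.message", 5, 5}, // coldest and largest: first to go
	}
	_, evicted := evictUntilUnderBudget(shards, 7)
	fmt.Println(evicted) // prints [logs.message]
}
```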

System-wide pressure awareness — if the whole server runs low on free RAM (below 10%), MemCP detects this and proactively releases its own cache — even if its own budget isn't exhausted yet. This keeps your application, web server, and OS responsive regardless of load spikes.

Separate budget for persistent data — a second budget (default: 30% of RAM) controls how much space the on-disk data loaded into RAM may occupy, independently of temporary query working memory. Both limits are tunable at runtime without restart.

Storage Engines

MemCP supports several storage engines, selectable per table via CREATE TABLE ... ENGINE=<engine> or ALTER TABLE ... ENGINE=<engine>:

| Engine | Data durability | Evictable | Description |
|--------|-----------------|-----------|-------------|
| safe | Persisted to disk (default) | Yes | All writes are immediately durable; data is evicted from RAM when memory is tight and reloaded on demand. |
| logged | Persisted to disk | Yes | Like safe, but uses a write-ahead log for faster writes with the same durability guarantee. |
| sloppy | Persisted to disk | Yes | Data is written to disk asynchronously; writes are faster, but the most recent writes can be lost on a crash. |
| memory | RAM only | No | Data lives entirely in RAM and is never written to disk. Not registered with the memory manager, so it is never evicted. Lost on restart. |
| cache | Schema only (data in RAM) | Yes | Like memory, data lives in RAM and is never written to disk, but the data is registered with the memory manager and is cleared automatically when MemCP approaches its memory limit. The schema is always persisted; after eviction the table is accessible but empty. Useful for derived or precomputed data that can be regenerated on demand. |

🔧 Developer-Friendly

  • Comprehensive test suite with 2470+ SQL tests across 100+ test suites
  • YAML-based testing framework
  • Extensive error handling and validation
  • Built-in performance monitoring

Quick Start 🚀

# 1. Build MemCP
go mod download
make

# 2. Start with REST API
./memcp --api-port=4321 --mysql-port=3307 lib/main.scm

# Run as a background daemon (use --no-repl to avoid exiting when stdin closes)
./memcp --no-repl --api-port=4321 --mysql-port=3307 lib/main.scm &

# 3. Create your first database
curl -X POST http://localhost:4321/sql/system \
  -d "CREATE DATABASE myapp" \
  -u root:admin

# 4. Start building lightning-fast apps!
curl -X POST http://localhost:4321/sql/myapp \
  -d "CREATE TABLE products (id INT, name VARCHAR(100), price DECIMAL(10,2))" \
  -u root:admin

CLI Flags

| Flag | Default | Description |
|------|---------|-------------|
| --api-port=PORT | 4321 | HTTP API listen port |
| --mysql-port=PORT | 3307 | MySQL protocol listen port |
| --mysql-socket=PATH | /tmp/memcp.sock | MySQL Unix socket path |
| --root-password=PASSWORD | admin | Initial root password (first run only) |
| --disable-api | (none) | Disable the HTTP API server |
| --disable-mysql | (none) | Disable the MySQL protocol server |
| --no-repl | (none) | Disable the interactive REPL (required for daemon/background use) |
| -data DIR | ./data | Data directory |

Authentication

Security note: Never expose MemCP directly to the internet with default credentials. Always set a strong --root-password before any network-accessible deployment.

  • Default credentials: root / admin.
  • Set the initial root password via CLI: --root-password=supersecret at the first run (on a fresh -data folder), or via Docker env ROOT_PASSWORD.
  • Docker Compose example:
services:
  memcp:
    image: carli2/memcp:latest
    environment:
      - ROOT_PASSWORD=supersecret
      - PARAMS=--api-port=4321
    ports:
      - "4321:4321"  # HTTP API
      - "3307:3307"  # MySQL protocol
    volumes:
      - memcp_data:/data
volumes:
  memcp_data: {}
  • Change the credentials with:
curl -X POST http://localhost:4321/sql/system \
  -d "ALTER USER root IDENTIFIED BY 'supersecret'" \
  -u root:admin

Importing Existing Data 📥

MemCP can bulk-import schema and data from MySQL or PostgreSQL with a single Scheme call. The import drops and recreates the target tables on every run, so it is safe to re-run after schema changes.

Import from MySQL

; import all databases (skip system dbs)
(mysql_import nil nil "root" "secret")

; import one specific database
(mysql_import nil nil "root" "secret" "myapp")

; import into a differently-named MemCP database
(mysql_import nil nil "root" "secret" "myapp" "myapp_memcp")

Parameters: host (nil → 127.0.0.1), port (nil → 3306), username, password, sourcedb (nil → all), targetdb (nil → sourcedb), sourcetable (nil → all), targettable (nil → sourcetable).

Import from PostgreSQL

PostgreSQL has an extra hierarchy level — database → schema → table — compared to MySQL. The sourceschema parameter selects which schema(s) within the database to import. All imported schemas land in the same MemCP database (targetdb).

; import the public schema of one PostgreSQL database
(psql_import nil nil "postgres" "secret" "myapp" "public")

; import all non-system schemas of one database
(psql_import nil nil "postgres" "secret" "myapp" nil "myapp")

; import all databases (each becomes a separate MemCP database)
(psql_import nil nil "postgres" "secret")

Parameters: host (nil → 127.0.0.1), port (nil → 5432), username, password, sourcedb (nil → all), sourceschema (nil → all non-system schemas), targetdb (nil → sourcedb), sourcetable (nil → all), targettable (nil → sourcetable).

Both functions print a line for each imported table and return true on success.

Performance Comparison 📈

| Query | MySQL (SSD) | MemCP |
|-------|-------------|-------|
| SELECT * FROM users WHERE id = 1 | 1–5ms | 0.1ms |
| SELECT region, SUM(revenue) FROM orders GROUP BY region | 200–800ms | 2–10ms |
| SELECT COUNT(*), AVG(price) FROM products WHERE category = ? | 50–200ms | 0.5ms |
| INSERT INTO events VALUES (...) | 10–30ms | 0.2ms |
| REST API call (HTTP → query → JSON response) | 20–100ms | 1–10ms |

Measured on standard dev hardware. Aggregation queries show the largest speedup because MemCP only reads the columns a query actually uses.

Use Cases 💼

  • 📊 Dashboards and Reports - GROUP BY queries that take seconds in MySQL run in milliseconds
  • 📡 Realtime Monitoring - Aggregate metrics and counters over millions of rows without slowing down
  • 🛒 E-commerce - Product catalog queries, price calculations, and inventory stats at any scale
  • 🎮 Gaming Backends - Leaderboards, player statistics, and session data with sub-millisecond latency
  • 💰 Financial Applications - Aggregations, risk calculations, and transaction summaries in real time
  • 🧪 Development & Testing - Instant database setup, no configuration, throw it away when done

Contributing 🤝

We'd love your help making MemCP even better!

CI behavior

  • Pull requests to master run the full required test workflow.
  • Direct pushes trigger the test workflow only on master (avoiding duplicate runs for a branch and its open PR).
  • If a PR shows pending/old checks, push one fresh commit so the current workflow config is applied.

🌟 Why Contribute?

  • Work with cutting-edge database technology
  • Learn Go, Scheme, and database internals
  • Help shape an early-stage project where your contributions have real impact
  • Build ultra-high-performance systems

🛠️ Easy Ways to Contribute

  • 📝 Add test cases - Expand our comprehensive test suite
  • 🐛 Fix bugs - Help us squash issues and improve stability
  • ⚡ Performance optimization - Make fast even faster
  • 📚 Documentation - Help other developers get started
  • 🔧 New features - SQL functions, operators, and capabilities

🚀 Getting Started

# 1. Fork the repository
# 2. Clone your fork (replace launix-de with your GitHub username)
git clone https://github.com/launix-de/memcp.git

# 3. Set up development environment
cd memcp
go build -o memcp

# 4. Run the test suite (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml

# 5. Make your changes and add tests
# 6. Submit a pull request!

🎯 Current Contribution Opportunities

  • Vector database features - Advanced similarity search
  • Additional SQL functions - String, math, and date functions
  • Performance benchmarking - Automated performance testing
  • Driver development - Language-specific database drivers
  • Documentation examples - Real-world usage scenarios

Testing 🧪

MemCP includes a comprehensive test framework:

# Run all tests
make test

# Or if you want to contribute, deploy this as a Pre-commit hook:
cp git-pre-commit .git/hooks/pre-commit

# Run specific test suites (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml      # Basic operations
python3 run_sql_tests.py tests/02_functions.yaml      # SQL functions
python3 run_sql_tests.py tests/07_error_cases.yaml    # Error handling

# Connect to an already-running instance (skip startup)
python3 run_sql_tests.py tests/01_basic_sql.yaml 4321 --connect-only

Performance Testing 📊

MemCP includes an auto-calibrating performance test framework that adapts to your machine.

Running Performance Tests

# Run perf tests (uses calibrated baselines)
PERF_TEST=1 make test

# Calibrate for your machine (run ~10 times to reach target time range)
PERF_TEST=1 PERF_CALIBRATE=1 make test

# Freeze row counts for bisecting performance regressions
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# Show query plans for each test
PERF_TEST=1 PERF_EXPLAIN=1 make test

How Calibration Works

  1. Initial run starts with 10,000 rows per test
  2. Each calibration run scales row counts by 30% up/down
  3. Target is 10-20 seconds query time per test
  4. Baselines are stored in .perf_baseline.json
  5. After ~10 runs, row counts stabilize in the target range
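The calibration rule in steps 1–5 can be sketched as a small function. The 30% step, the 10,000-row start, and the 10–20 second window come from the text above; everything else is illustrative:

```go
package main

import "fmt"

// calibrate nudges a test's row count by 30% toward the 10–20 s
// target window, mirroring the calibration steps described above.
func calibrate(rows int, queryTimeSec float64) int {
	switch {
	case queryTimeSec < 10:
		return rows + rows*30/100 // too fast: grow the dataset
	case queryTimeSec > 20:
		return rows - rows*30/100 // too slow: shrink it
	default:
		return rows // in the target range: row count is stable
	}
}

func main() {
	rows := 10000 // initial row count per test
	// Hypothetical measured query times across successive calibration runs:
	for _, t := range []float64{1.2, 3.0, 6.5, 12.0} {
		rows = calibrate(rows, t)
	}
	fmt.Println(rows) // prints 21970
}
```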

Output Format

✅ Perf: COUNT (7.9ms / 8700ms, 20,000 rows, 0.39µs/row, 11.4MB heap)
         │       │        │           │        │           └─ Heap memory after insert
         │       │        │           │        └─ Time per row
         │       │        │           └─ Calibrated row count
         │       │        └─ Threshold (from baseline × 1.1)
         │       └─ Actual query time
         └─ Test name

Performance Debugging Cookbook

Detecting a performance regression:

# 1. Freeze baselines to use consistent row counts
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# 2. If a test fails threshold, you have a regression

Bisecting a performance bug:

# 1. Checkout the known-good commit, run calibration
git checkout good-commit
PERF_TEST=1 PERF_CALIBRATE=1 make test  # run 10x to calibrate

# 2. Save the baseline
cp .perf_baseline.json .perf_baseline_good.json

# 3. Bisect with frozen row counts
git bisect start
git bisect bad HEAD
git bisect good good-commit
git bisect run bash -c 'PERF_TEST=1 PERF_NORECALIBRATE=1 make test'

Analyzing slow queries:

# Show query plans to understand execution
PERF_TEST=1 PERF_EXPLAIN=1 make test

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| PERF_TEST | 0/1 | Enable performance tests |
| PERF_CALIBRATE | 0/1 | Update baselines with new times |
| PERF_NORECALIBRATE | 0/1 | Freeze row counts (for bisecting) |
| PERF_EXPLAIN | 0/1 | Show query plans |

Remote Storage Backends 🗄️

MemCP supports storing databases on remote storage backends instead of the local filesystem. To put a database on a remote backend, place a JSON configuration file named after the database (e.g. data/mydb.json) in the data folder, where that database's directory would otherwise be.

S3 / MinIO Storage

Store your database on Amazon S3 or any S3-compatible storage (MinIO, Ceph RGW, etc.).

Configuration file (data/mydb.json):

{
  "backend": "s3",
  "access_key_id": "your-access-key",
  "secret_access_key": "your-secret-key",
  "region": "us-east-1",
  "bucket": "memcp-data",
  "prefix": "databases"
}

For MinIO or self-hosted S3-compatible storage:

{
  "backend": "s3",
  "access_key_id": "minioadmin",
  "secret_access_key": "minioadmin",
  "endpoint": "http://localhost:9000",
  "bucket": "memcp",
  "prefix": "data",
  "force_path_style": true
}

Quick MinIO setup for testing:

# Start MinIO with Docker
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

# Create a bucket (via MinIO Console at http://localhost:9001)
# Or via mc CLI:
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/memcp

Ceph/RADOS Storage

Store your database directly on Ceph RADOS for high-performance distributed storage.

Why is Ceph optional? The Ceph backend uses CGO to link against librados (the Ceph client library). This requires the C headers and library to be installed at compile time and the shared library at runtime. To keep the default build simple and portable, Ceph support is behind a build tag.

# Install Ceph development libraries (Ubuntu/Debian)
sudo apt-get install librados-dev

# Build MemCP with Ceph support
make ceph
# or: go build -tags=ceph

Configuration file (data/mydb.json):

{
  "backend": "ceph",
  "username": "client.memcp",
  "cluster": "ceph",
  "pool": "memcp",
  "prefix": "databases"
}

Optional fields:

  • conf_file: Path to ceph.conf (defaults to /etc/ceph/ceph.conf)

Setting up a Ceph development cluster with vstart.sh:

# Clone Ceph source
git clone https://github.com/ceph/ceph.git
cd ceph

# Install dependencies and build (only vstart target needed)
./install-deps.sh
pip install cython setuptools
./do_cmake.sh
cd build && ninja vstart

# Start a development cluster
cd ..
MON=1 OSD=3 MDS=0 MGR=1 ./build/bin/vstart.sh -d -n -x

# Create a pool for MemCP
./build/bin/ceph osd pool create memcp 32

# Create a user for MemCP (optional, can also use client.admin)
./build/bin/ceph auth get-or-create client.memcp \
  mon 'allow r' \
  osd 'allow rwx pool=memcp' \
  -o ceph.client.memcp.keyring

Environment variables for vstart cluster:

export CEPH_CONF=/path/to/ceph/build/ceph.conf
export CEPH_KEYRING=/path/to/ceph/build/keyring

Backend Configuration Reference

| Field | Backend | Description |
|-------|---------|-------------|
| backend | all | Backend type: "s3" or "ceph" |
| prefix | all | Object key prefix for database objects |
| access_key_id | S3 | AWS or S3-compatible access key |
| secret_access_key | S3 | AWS or S3-compatible secret key |
| region | S3 | AWS region (e.g., "us-east-1") |
| endpoint | S3 | Custom endpoint URL (for MinIO, etc.) |
| bucket | S3 | S3 bucket name |
| force_path_style | S3 | Use path-style URLs (required for MinIO) |
| username | Ceph | Ceph user (e.g., "client.admin") |
| cluster | Ceph | Cluster name (usually "ceph") |
| conf_file | Ceph | Path to ceph.conf (optional) |
| pool | Ceph | RADOS pool name |

License 📄

MemCP is open source software. See the LICENSE file for details.


Ready to experience database performance like never before? Get Started · Contribute · Join our Community

MemCP: Because your applications deserve better than "good enough" performance.
