diff --git a/README.md b/README.md
index 822fa33a..5faf5b70 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,23 @@
 # Actionbase

-**One database for likes, views, and follows — pre-computed, served in real-time.**
+**One database for user–user, user–item, and item–item interactions — precomputed at write time, served as simple lookups.**
 **1M+ req/min in production at Kakao. Built on HBase.**

 [![CI](https://img.shields.io/github/actions/workflow/status/kakao/actionbase/continuous-integration.yml?label=ci&style=flat-square)](https://github.com/kakao/actionbase/actions/workflows/continuous-integration.yml) [![Release](https://img.shields.io/github/v/release/kakao/actionbase?label=release&style=flat-square)](https://github.com/kakao/actionbase/releases) [![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](https://opensource.org/licenses/Apache-2.0) [![Docs](https://img.shields.io/badge/docs-actionbase.io-green?style=flat-square)](https://actionbase.io)

-Likes, recent views, follows—look simple, but get complex as you scale, and end up rebuilt again and again.
+[Documentation](https://actionbase.io) · [Open-sourced](https://actionbase.io/blog/open-source-announcement/) · [Why we built this](https://github.com/kakao/actionbase/issues/358) · [Production stories](https://actionbase.io/stories/kakaotalk-gift-wish/)

-Actionbase solves this by precomputing everything at write time, so reads are just lookups. Currently backed by HBase, handling over a million requests per minute for years at Kakao.
+## Why Actionbase

-[Documentation](https://actionbase.io) · [Open-sourced](https://actionbase.io/blog/open-source-announcement/) · [Why we built this](https://github.com/kakao/actionbase/issues/358) · [Production stories](https://actionbase.io/stories/kakaotalk-gift-wish/)
+Follows, likes, recent views, related content — they are all **interaction data**. They look simple, but at scale, real-time counts, toggle consistency, and ordered relationship reads turn into bottlenecks, and teams end up rebuilding caches, indexes, and dual-write pipelines from scratch every time.
+
+Actionbase consolidates these into **one database for interactions**. Every interaction is expressed in the same model — *source → action → target* — and the combination of source and target naturally yields three axes:
+
+- **User–User (U2U)** — follow/unfollow, follower/following counts, timeline scans
+- **User–Item (U2I)** — likes/bookmarks, view history, bidirectional counters
+- **Item–Item (I2I)** — related products, similar-content graphs, and other precomputed item-to-item relations
+
+All three axes have run in production at Kakao at over 1M req/min since 2024.

 ## Quick Start

@@ -18,14 +26,14 @@ Actionbase solves this by precomputing everything at write time, so reads are ju
 ```bash
 docker run -it ghcr.io/kakao/actionbase:standalone
 ```
-Runs server (port 8080) in background, CLI (`actionbase>`) in foreground.
+Runs server (port 8080) in the background, CLI (`actionbase>`) in the foreground.

 **2. Load sample data**

 ```
 load preset likes
 ```
-Loads metadata and 3 edges:
+Loads metadata and 3 edges (a U2I example):
 ```
 Alice --- likes ----> +--------+
                       | Phone  |
@@ -50,34 +58,30 @@ count --start Phone --direction IN # 2

 Actionbase Quick Start Demo

-See [Quick Start](https://actionbase.io/quick-start/) for more details, or [Build Your Social Media App](https://actionbase.io/guides/build-your-social-media-app/) to go deeper.
+See [Quick Start](https://actionbase.io/quick-start/) for more, or [Build Your Social Media App](https://actionbase.io/guides/build-your-social-media-app/) to go deeper.

 ## How It Works

-Actionbase serves interaction-derived data that powers feeds, product listings, recommendations, and other user-facing surfaces.
-
-Interactions are modeled as: **who** did **what** to which **target**
+- **At write time**: exact counts, consistent toggles, indexes, and aggregations are all precomputed.
+- **At read time**: precomputed results are read as-is. Read-path cost stays flat under load. Multi-hop traversals are expressed by composing the supported operations.

-At write time, Actionbase precomputes everything needed for reads—accurate counts, consistent toggles, and ordering information for sorting and querying. At read time, there's no aggregation or additional computation. You simply read the precomputed results as they are.
-
-Supported operations focus on high-frequency access patterns:
+Supported operations:

 * Edge lookups (GET, multi-get)
 * Edge counts (COUNT)
-* Indexed edge scans (SCAN)
+* Indexed edge scans (SCAN; multi-start variant is SEEK)
+* Real-time aggregations (AGG)

-## When (Not) to Use It
+Planned:

-Use Actionbase when:
-- A single database no longer scales for your workload
-- Interaction features are rebuilt repeatedly across teams
-- You need predictable read latency without read-time computation
-
-If a single database can handle your workload, that's the better choice.
+* Global TopK
+* Per-entity TopK

 ## Architecture

-Actionbase writes to HBase for storage and emits WAL/CDC to Kafka for recovery, replay, and downstream pipelines. HBase provides strong durability and horizontal scalability.
+- **HBase** — durable storage for interactions; strong durability, horizontal scalability.
+- **Kafka** — WAL/CDC publication for recovery, replay, and downstream pipelines.
+- **JDBC metastore** — to be consolidated into HBase.

 ```
 Client
@@ -85,25 +89,33 @@ Client
 (REST API)
   │
 Actionbase
-  ├──> HBase (Storage for user interactions)
+  ├──> HBase
   │
-  ├──> JDBC (Metastore, to be consolidated)
+  ├──> JDBC
   │
-  └──> Kafka (WAL/CDC) ──> Downstream Pipelines
+  └──> Kafka ──> Downstream Pipelines
 ```

 Additional storage backends are planned for small to mid-size deployments.

 ## Codebase Overview

-* **core** — Data model, mutation, query, encoding logic (Java, Kotlin)
+* **core** — data model, mutation, query, encoding logic (Java, Kotlin)
-* **engine** — Storage and messaging bindings (Kotlin)
+* **engine** — storage and messaging bindings (Kotlin)
 * **server** — REST API server (Kotlin, Spring WebFlux)
-* **pipeline** *(planned)* — Bulk loading and CDC processing (Scala, Spark)
+* **pipeline** *(planned)* — bulk loading and CDC processing (Scala, Spark)
+
+## When to Use It
+
+- A single database or cache layer can no longer absorb your U2U/U2I/I2I traffic
+- Multiple teams keep rebuilding the same real-time counters, sync queues, and relationship scans
+- You need predictable read latency without read-time computation
+
+If a single database can handle your workload, that's the better choice.

 ## Current Status

-Early open-source preparation phase. The first release focuses on introducing core concepts and hands-on guides. Production installation, operations guides, and additional components will be released over time.
+Early open-source phase. The first release focuses on introducing core concepts and hands-on guides. Production installation, operations guides, and additional components will be released over time.

 ## Contribute
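
The write-time/read-time split that the updated "How It Works" section describes can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Actionbase's API or storage layout: the class `InteractionStore`, its methods `put`, `delete`, `get`, and `count`, and the sample names `Bob` and `Laptop` are all hypothetical, and plain dictionaries stand in for HBase. It shows the idea that counters are updated once per mutation, so a count read is a single lookup and repeated toggles stay consistent.

```python
from collections import defaultdict


class InteractionStore:
    """Toy model of 'source -> action -> target' edges (hypothetical, in-memory)."""

    def __init__(self):
        self.edges = {}                    # (source, action, target) -> timestamp
        self.out_count = defaultdict(int)  # (source, action) -> number of targets
        self.in_count = defaultdict(int)   # (target, action) -> number of sources

    def put(self, source, action, target, ts):
        """Write path: store the edge and update both counters exactly once."""
        key = (source, action, target)
        is_new = key not in self.edges
        self.edges[key] = ts
        if is_new:  # a repeated 'like' must not inflate the counts
            self.out_count[(source, action)] += 1
            self.in_count[(target, action)] += 1

    def delete(self, source, action, target):
        """Write path: remove the edge; deleting a missing edge is a no-op."""
        key = (source, action, target)
        if key in self.edges:
            del self.edges[key]
            self.out_count[(source, action)] -= 1
            self.in_count[(target, action)] -= 1

    def get(self, source, action, target):
        """Read path: single edge lookup, None if absent."""
        return self.edges.get((source, action, target))

    def count(self, node, action, direction="OUT"):
        """Read path: return the precomputed counter, no aggregation at read time."""
        table = self.out_count if direction == "OUT" else self.in_count
        return table.get((node, action), 0)


store = InteractionStore()
store.put("Alice", "likes", "Phone", ts=1)
store.put("Bob", "likes", "Phone", ts=2)    # 'Bob' is a made-up second user
store.put("Bob", "likes", "Laptop", ts=3)   # 'Laptop' is a made-up second item
print(store.count("Phone", "likes", direction="IN"))  # prints 2
```

The read methods never iterate over edges, which is the property the README attributes to precomputation: read cost does not depend on how many interactions exist.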