From 7f6cbb07b752b80ea699a267cac903faee05e483 Mon Sep 17 00:00:00 2001
From: TaiChi
Date: Thu, 4 Jun 2026 17:29:32 +0700
Subject: [PATCH] Add database visual learning polish

---
 .../systems/db/labs/index.mdx                 | 50 +++++++++++++++++++
 .../systems/db/performance/index.mdx          | 40 +++++++++------
 .../systems/db/reliability/index.mdx          | 32 ++++++++++--
 .../systems/db/schema-design/index.mdx        | 47 ++++++++++++++---
 4 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/docs/(computer_science)/systems/db/labs/index.mdx b/docs/(computer_science)/systems/db/labs/index.mdx
index 4801383..54c2ed5 100644
--- a/docs/(computer_science)/systems/db/labs/index.mdx
+++ b/docs/(computer_science)/systems/db/labs/index.mdx
@@ -22,6 +22,56 @@ In `taichi112.works`, we believe in a tight feedback loop:
 
 These workshop-style tasks are designed to guide you through practical database engineering. No real database migrations are required yet—these are conceptual exercises.
 
+```mermaid
+flowchart LR
+    L1[Data Modeling] --> L2[Schema Review]
+    L2 --> L3[Performance Thinking]
+    L3 --> L4[Agent-Safe Workflow]
+```
+
+### Workshop Overview
+
+| Lab | Theory Focus | Expected Output | Safety Note |
+|---|---|---|---|
+| **1. Data Modeling Basics** | Relationships & Keys | Sketch of tables and columns. | No code execution. |
+| **2. Schema Design Review** | Normalization | Redesigned multi-table layout. | No code execution. |
+| **3. Query Performance** | N+1 Problem & Joins | Mental map of batched queries. | Read-only concepts. |
+| **4. Agent-Safe Workflow** | Human-in-the-loop | Diagram of approval flow. | Prevent destructive queries. |
+
+### Scenario: Project Tracker
+
+To tie all these concepts together, consider a simple **Project Tracker** application.
+
+A user wants to track projects and the tasks within them. As we build this, we encounter every major database concept:
+- We identify the core objects we need to store (User, Project, Task).
+- We design how they connect (a User owns Projects, a Project contains Tasks).
+- We enforce rules (Tasks must have titles, Projects belong to a valid User).
+- We ensure the system doesn't lose data if the server crashes while saving.
+- We make sure the dashboard loads instantly even with thousands of tasks.
+- We design a workflow so an AI assistant can help manage projects without accidentally deleting everything.
+
+| Step | What we do | Knowledge used | Related page | Why it matters |
+|---|---|---|---|---|
+| **1. Ideation** | Identify Entities | Entity, Table, Column | [Overview](../overview) | Decides what data to store. |
+| **2. Design** | Connect Entities | Primary Key, Foreign Key | [Schema Design](../schema-design) | Defines structural relationships. |
+| **3. Validation** | Enforce Rules | Constraint, Normalization | [Foundations](../foundations) | Prevents invalid or duplicate data. |
+| **4. Safety** | Protect Operations | Transaction, Rollback | [Reliability](../reliability) | Ensures data integrity during failures. |
+| **5. Speed** | Optimize Queries | Index, Join, N+1 Query | [Performance](../performance) | Keeps the application fast at scale. |
+| **6. AI Agents** | Safe Automation | Human-in-the-loop | [Agentic Applications](../agentic-applications) | Prevents destructive AI actions. |
+
+### Knowledge Map
+
+This table summarizes the core vocabulary used throughout the database modules.
+
+| Keyword | Used when | Read more | Why it matters |
+|---|---|---|---|
+| **Entity** / **Relationship** | Ideation & Design | [Schema Design](../schema-design) | Defines what data exists and how it connects. |
+| **Primary Key** / **Foreign Key** | Connecting tables | [Schema Design](../schema-design) | Links data together securely. |
+| **Constraint** | Validating data | [Foundations](../foundations) | Enforces rules so bad data never saves. |
+| **Transaction** / **Rollback** | Handling failures | [Reliability](../reliability) | Ensures all-or-nothing data operations. |
+| **Index** / **N+1 Query** / **Pagination** | Speeding up queries | [Performance](../performance) | Keeps applications fast at scale. |
+| **Human Approval** / **Read-only Access** | AI workflows | [Agentic Applications](../agentic-applications) | Protects databases from autonomous destruction. |
+
 ### Lab 1: Data Modeling Basics
 **Goal**: Design a simple relationship between Users and Projects.
 - **Mental Exercise**: Sketch out what tables and columns are needed to track which users own which projects.
diff --git a/docs/(computer_science)/systems/db/performance/index.mdx b/docs/(computer_science)/systems/db/performance/index.mdx
index e233d81..ed6be9e 100644
--- a/docs/(computer_science)/systems/db/performance/index.mdx
+++ b/docs/(computer_science)/systems/db/performance/index.mdx
@@ -17,25 +17,35 @@ As an application grows, tables get larger. A query that takes 10 milliseconds w
 
 ## Core Performance Concepts
 
-### Indexes
-
-An index is like the table of contents in a book. Instead of scanning every single row in a table to find a specific user, the database uses an index to jump directly to the correct record. Indexing commonly searched columns (like an `email` or `user_id`) is the fastest way to speed up a slow query.
-
-### Query Shape and N+1 Queries
-
-How you ask for data matters. A common beginner mistake is the **N+1 query problem**:
-- Querying a list of 100 users (1 query).
-- Looping through that list and querying the database again for each user's profile picture (100 queries).
-- *Result*: 101 queries for a single page load.
-Modern ORMs (like Prisma) help solve this by fetching related data efficiently using **Joins** or optimized batch queries.
-
-### Pagination
-
-Never load the entire database into memory. Whether showing a list of blog posts or an admin dashboard of users, always use pagination (e.g., "Load 20 items at a time") to keep queries fast and predictable.
+Here is a quick overview of performance strategies:
+
+| Concept | What it is | When to use it |
+|---|---|---|
+| **Index** | A "table of contents" for fast lookups. | When searching a column frequently (e.g., `email`). |
+| **Query Shape** | Requesting only the data you need. | When a page loads too much hidden data. |
+| **N+1 Queries** | Looping database calls unnecessarily. | Use Joins or ORM batching instead. |
+| **Pagination** | Loading data in small chunks. | Lists with more than 50 items. |
+| **Join** | Combining related tables in one query. | When you need User and Profile data at once. |
+| **Cache** | Storing slow query results in fast memory. | When data is read constantly but changes rarely. |
+
+  **Used in the Project Tracker scenario ([Labs](../labs))**:
+  When displaying a user's dashboard, we use an **index** to find their projects instantly. We fetch the project and its related tasks in one step to avoid an **N+1 query** loop, keeping the **query shape** optimized. If they have hundreds of tasks, we use **pagination** to only load what's visible. Compare this structural thinking back to [Schema Design](../schema-design).
+
 
 ## The Golden Rule: Measure Before Optimizing
 
 Avoid **premature optimization**. Don't spend days architecting a complex caching layer for a database table that only holds 50 rows.
+
+```mermaid
+flowchart LR
+    Measure([Measure Query Time]) --> Check{Is it slow?}
+    Check -- No --> Done([Done])
+    Check -- Yes --> Inspect[Inspect Query Shape]
+    Inspect --> Fix[Add Index / Rewrite Query]
+    Fix --> Measure
+```
+
 1. Build the feature cleanly.
 2. Measure the query time under realistic conditions.
 3. If it is slow, add an index.
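
The measure-first steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the project's actual stack: the `tasks` table, its columns, the row counts, and the index name `idx_tasks_project_id` are all hypothetical, and SQLite only stands in for whatever database the application uses.

```python
import sqlite3
import time

# Hypothetical query for the Project Tracker dashboard: all tasks of one project.
QUERY = "SELECT id, title FROM tasks WHERE project_id = ?"


def build_db(n_tasks: int = 50_000) -> sqlite3.Connection:
    """Step 1: build the feature cleanly (here: a plain table, no index yet)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY, "
        "project_id INTEGER NOT NULL, "
        "title TEXT NOT NULL)"
    )
    # Spread the tasks evenly over 1000 hypothetical projects.
    conn.executemany(
        "INSERT INTO tasks (project_id, title) VALUES (?, ?)",
        ((i % 1000, f"task {i}") for i in range(n_tasks)),
    )
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, project_id: int) -> str:
    """Inspect the query shape: ask SQLite how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (project_id,)).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


def time_query(conn: sqlite3.Connection, project_id: int, repeats: int = 20) -> float:
    """Step 2: measure the average query time in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(QUERY, (project_id,)).fetchall()
    return (time.perf_counter() - start) / repeats


conn = build_db()
plan_before = query_plan(conn, 42)
seconds_before = time_query(conn, 42)

# Step 3: the measurement and the plan point at a full scan, so add an index.
conn.execute("CREATE INDEX idx_tasks_project_id ON tasks (project_id)")
plan_after = query_plan(conn, 42)
seconds_after = time_query(conn, 42)

rows_found = len(conn.execute(QUERY, (42,)).fetchall())
print(f"before: {plan_before} ({seconds_before:.6f}s per query)")
print(f"after:  {plan_after} ({seconds_after:.6f}s per query)")
```

Comparing the plan text before and after the index is usually a more stable signal than wall-clock time alone, since timings on a small in-memory table vary from run to run.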
diff --git a/docs/(computer_science)/systems/db/reliability/index.mdx b/docs/(computer_science)/systems/db/reliability/index.mdx
index fa537b9..b021955 100644
--- a/docs/(computer_science)/systems/db/reliability/index.mdx
+++ b/docs/(computer_science)/systems/db/reliability/index.mdx
@@ -23,10 +23,34 @@ In practical software terms, database reliability means:
 ### ACID Properties
 
 Reliable relational databases (like PostgreSQL) guarantee **ACID** properties:
-- **Atomicity**: An operation (transaction) is "all or nothing." If a user buys an item, money is deducted AND the inventory drops. If one fails, both fail.
-- **Consistency**: The database only moves from one valid state to another valid state, enforcing all constraints.
-- **Isolation**: Concurrent transactions don't interfere with each other.
-- **Durability**: Once data is saved, it remains saved, even if the power goes out immediately after.
+This ensures that database transactions are processed reliably.
+
+| Property | Meaning | Practical Example |
+|---|---|---|
+| **Atomicity** | "All or nothing." | Money is deducted AND item ships. If one fails, both fail. |
+| **Consistency** | Data must always be valid. | You cannot save an order without a valid user ID. |
+| **Isolation** | Transactions don't mix. | Two users buying the last ticket won't break the system. |
+| **Durability** | Saved means saved forever. | If power goes out right after saving, data remains. |
+
+  **Used in the Project Tracker scenario ([Labs](../labs))**:
+  When a user creates a new project with multiple initial tasks, we use a **transaction** to ensure either everything saves or nothing does. If something fails, a **rollback** prevents partial data. This works alongside the **constraint** rules we defined in [Schema Design](../schema-design) to maintain absolute **data integrity**.
+
+
+### Safe Transaction Flow
+
+When an application processes critical data, it uses a transaction to maintain ACID properties:
+
+```mermaid
+flowchart TD
+    Start([Start Transaction]) --> Val[Validate Data]
+    Val --> Write[Write to Database]
+    Write --> Check{Are there errors?}
+    Check -- Yes --> Rollback[Rollback / Undo]
+    Check -- No --> Commit[Commit / Save]
+    Rollback --> End([End Transaction])
+    Commit --> End
+```
 
 ### Constraints & Data Integrity
 
diff --git a/docs/(computer_science)/systems/db/schema-design/index.mdx b/docs/(computer_science)/systems/db/schema-design/index.mdx
index 980831c..c60eb83 100644
--- a/docs/(computer_science)/systems/db/schema-design/index.mdx
+++ b/docs/(computer_science)/systems/db/schema-design/index.mdx
@@ -15,13 +15,46 @@ Schema design is the process of planning how your data is structured, stored, an
 
 Understanding schema design requires a firm grasp on how data points relate to each other:
 
-- **Entities**: Real-world objects or concepts you need to store (e.g., a `User`, a `Project`, a `Post`). In a database, these become your **Tables**.
-- **Relationships**: How entities connect to one another.
-  - *One-to-One*: A User has one Profile.
-  - *One-to-Many*: A User has many Projects.
-  - *Many-to-Many*: A Project has many Tags, and a Tag belongs to many Projects.
-- **Keys**: Tools to identify and link data. **Primary Keys** uniquely identify a single row, and **Foreign Keys** link a row to a primary key in another table.
-- **Constraints**: Rules that ensure data is valid (e.g., ensuring an email address is unique, or a required field is not empty).
+| Concept | Description | Real-world Example |
+|---|---|---|
+| **Entity / Table** | A real-world object you need to store. | `User`, `Project` |
+| **Row** | A single distinct record. | "Alice's User Profile" |
+| **Column** | A specific piece of information. | `email`, `created_at` |
+| **Primary Key** | Uniquely identifies a single row. | `id: 1` |
+| **Foreign Key** | Links a row to another table's primary key. | `owner_id: 1` |
+| **Constraint** | Rule ensuring data is valid. | `email must be UNIQUE` |
+
+  **Used in the Project Tracker scenario ([Labs](../labs))**:
+  We identify real-world concepts as an **Entity**, and map how they connect via a **Relationship**. We use a **Primary Key** to uniquely identify each row, a **Foreign Key** to link rows together, and a **Constraint** to enforce data validity. Learn the bedrock of these rules in [Foundations](../foundations).
+
+
+### Visualizing Relationships
+
+Here is a simplified Entity-Relationship Diagram (ERD). Note: This is a conceptual learning model, not the exact production schema.
+
+```mermaid
+erDiagram
+    USER ||--o{ PROJECT : "creates"
+    PROJECT ||--o{ TASK : "contains"
+
+    USER {
+        int id PK
+        string email
+        string name
+    }
+    PROJECT {
+        int id PK
+        string title
+        int owner_id FK
+    }
+    TASK {
+        int id PK
+        string description
+        boolean is_completed
+        int project_id FK
+    }
+```
 
 ## Normalization