50 changes: 50 additions & 0 deletions docs/(computer_science)/systems/db/labs/index.mdx
@@ -22,6 +22,56 @@ In `taichi112.works`, we believe in a tight feedback loop:

These workshop-style tasks are designed to guide you through practical database engineering. No real database migrations are required yet—these are conceptual exercises.

```mermaid
flowchart LR
L1[Data Modeling] --> L2[Schema Review]
L2 --> L3[Performance Thinking]
L3 --> L4[Agent-Safe Workflow]
```

### Workshop Overview

| Lab | Theory Focus | Expected Output | Safety Note |
|---|---|---|---|
| **1. Data Modeling Basics** | Relationships & Keys | Sketch of tables and columns. | No code execution. |
| **2. Schema Design Review** | Normalization | Redesigned multi-table layout. | No code execution. |
| **3. Query Performance** | N+1 Problem & Joins | Mental map of batched queries. | Read-only concepts. |
| **4. Agent-Safe Workflow** | Human-in-the-loop | Diagram of approval flow. | Prevent destructive queries. |

### Scenario: Project Tracker

To tie all these concepts together, consider a simple **Project Tracker** application.

A user wants to track projects and the tasks within them. As we build this, we encounter every major database concept:
- We identify the core objects we need to store (User, Project, Task).
- We design how they connect (a User owns Projects, a Project contains Tasks).
- We enforce rules (Tasks must have titles, Projects belong to a valid User).
- We ensure the system doesn't lose data if the server crashes while saving.
- We make sure the dashboard loads quickly even with thousands of tasks.
- We design a workflow so an AI assistant can help manage projects without accidentally deleting everything.

| Step | What we do | Knowledge used | Related page | Why it matters |
|---|---|---|---|---|
| **1. Ideation** | Identify Entities | Entity, Table, Column | [Overview](../overview) | Decides what data to store. |
| **2. Design** | Connect Entities | Primary Key, Foreign Key | [Schema Design](../schema-design) | Defines structural relationships. |
| **3. Validation** | Enforce Rules | Constraint, Normalization | [Foundations](../foundations) | Prevents invalid or duplicate data. |
| **4. Safety** | Protect Operations | Transaction, Rollback | [Reliability](../reliability) | Ensures data integrity during failures. |
| **5. Speed** | Optimize Queries | Index, Join, N+1 Query | [Performance](../performance) | Keeps the application fast at scale. |
| **6. AI Agents** | Safe Automation | Human-in-the-loop | [Agentic Applications](../agentic-applications) | Prevents destructive AI actions. |

### Knowledge Map

This table summarizes the core vocabulary used throughout the database modules.

| Keyword | Used when | Read more | Why it matters |
|---|---|---|---|
| **Entity** / **Relationship** | Ideation & Design | [Schema Design](../schema-design) | Defines what data exists and how it connects. |
| **Primary Key** / **Foreign Key** | Connecting tables | [Schema Design](../schema-design) | Identifies rows uniquely and links them reliably across tables. |
| **Constraint** | Validating data | [Foundations](../foundations) | Enforces rules so bad data never saves. |
| **Transaction** / **Rollback** | Handling failures | [Reliability](../reliability) | Ensures all-or-nothing data operations. |
| **Index** / **N+1 Query** / **Pagination** | Speeding up queries | [Performance](../performance) | Keeps applications fast at scale. |
| **Human Approval** / **Read-only Access** | AI workflows | [Agentic Applications](../agentic-applications) | Protects databases from autonomous destruction. |

### Lab 1: Data Modeling Basics
**Goal**: Design a simple relationship between Users and Projects.
- **Mental Exercise**: Sketch out what tables and columns are needed to track which users own which projects.
40 changes: 25 additions & 15 deletions docs/(computer_science)/systems/db/performance/index.mdx
@@ -17,25 +17,35 @@ As an application grows, tables get larger. A query that takes 10 milliseconds w

## Core Performance Concepts

### Indexes

An index works like the index at the back of a book. Instead of scanning every row in a table to find a specific user, the database uses the index to jump directly to the matching record. Indexing frequently searched columns (such as `email` or `user_id`) is often the most effective first step for speeding up a slow lookup, at the cost of extra storage and slightly slower writes.
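To make the effect visible, here is a minimal sketch using Python's built-in `sqlite3` module rather than PostgreSQL or Prisma. The table, the index name `idx_users_email`, and the sample data are assumptions made up for this illustration. It asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

# Hypothetical table and data, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

plan_before = query_plan(lookup, args)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, args)   # the planner can now search the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should describe a scan of `users` and the second a search that names `idx_users_email`.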

### Query Shape and N+1 Queries

How you ask for data matters. A common beginner mistake is the **N+1 query problem**:
- Querying a list of 100 users (1 query).
- Looping through that list and querying the database again for each user's profile picture (100 queries).
- *Result*: 101 queries for a single page load.

Modern ORMs (like Prisma) help solve this by fetching related data efficiently using **Joins** or optimized batch queries.
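The 101-versus-1 difference can be reproduced with a small sketch. It uses Python's built-in `sqlite3` instead of an ORM. The `users` and `profiles` tables, the `run` helper, and the query counter are all hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles (user_id INTEGER REFERENCES users(id), picture_url TEXT);
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"User {i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO profiles (user_id, picture_url) VALUES (?, ?)",
    [(i, f"/img/{i}.png") for i in range(1, 101)],
)

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a read query and count it (hypothetical helper)."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the list, then one more query per user.
counter["queries"] = 0
n_plus_one_result = []
for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
    picture = run("SELECT picture_url FROM profiles WHERE user_id = ?", (user_id,))
    n_plus_one_result.append((name, picture[0][0]))
n_plus_one_queries = counter["queries"]

# Join: the same data fetched in a single query.
counter["queries"] = 0
joined_result = run("""
    SELECT u.name, p.picture_url
    FROM users u JOIN profiles p ON p.user_id = u.id
    ORDER BY u.id
""")
join_queries = counter["queries"]

print(n_plus_one_queries, join_queries)
```

Both approaches return the same rows. Only the number of round trips to the database differs, and that is the cost that grows with the size of the list.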

### Pagination

Never load an entire table into memory. Whether showing a list of blog posts or an admin dashboard of users, always use pagination (e.g., "Load 20 items at a time") to keep queries fast and predictable.
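One way to load 20 items at a time is keyset pagination, where each page starts after the last `id` seen on the previous page. `LIMIT`/`OFFSET` is the other common approach. The sketch below uses Python's built-in `sqlite3`, and the `tasks` table and `fetch_page` helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tasks (title) VALUES (?)",
    [(f"Task {i}",) for i in range(1, 46)],  # 45 sample rows
)

PAGE_SIZE = 20


def fetch_page(after_id=0, page_size=PAGE_SIZE):
    """Return up to page_size rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, title FROM tasks WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


# Walk through all pages, remembering the last id seen on each one.
pages = []
last_seen_id = 0
while True:
    page = fetch_page(last_seen_id)
    if not page:
        break
    pages.append(page)
    last_seen_id = page[-1][0]

print([len(p) for p in pages])
```

Each call reads at most `PAGE_SIZE` rows, so the cost of a page stays bounded no matter how large the table grows.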
Here is a quick overview of performance strategies:

| Concept | What it is | When to use it |
|---|---|---|
| **Index** | A "table of contents" for fast lookups. | When searching a column frequently (e.g., `email`). |
| **Query Shape** | Requesting only the data you need. | When a page loads too much hidden data. |
| **N+1 Queries** | One query for a list plus one extra query per item. | An anti-pattern to avoid: use Joins or ORM batching instead. |
| **Pagination** | Loading data in small chunks. | Lists with more than 50 items. |
| **Join** | Combining related tables in one query. | When you need User and Profile data at once. |
| **Cache** | Storing slow query results in fast memory. | When data is read constantly but changes rarely. |

<Callout title="Knowledge Links" type="info">
**Used in the Project Tracker scenario ([Labs](../labs))**:
When displaying a user's dashboard, we use an **index** to find their projects quickly. We fetch the project and its related tasks in one step to avoid an **N+1 query** loop, keeping the **query shape** optimized. If they have hundreds of tasks, we use **pagination** to only load what's visible. Compare this structural thinking back to [Schema Design](../schema-design).
</Callout>

## The Golden Rule: Measure Before Optimizing

Avoid **premature optimization**. Don't spend days architecting a complex caching layer for a database table that only holds 50 rows.

```mermaid
flowchart LR
Measure([Measure Query Time]) --> Check{Is it slow?}
Check -- No --> Done([Done])
Check -- Yes --> Inspect[Inspect Query Shape]
Inspect --> Fix[Add Index / Rewrite Query]
Fix --> Measure
```

1. Build the feature cleanly.
2. Measure the query time under realistic conditions.
3. If it is slow, add an index.
32 changes: 28 additions & 4 deletions docs/(computer_science)/systems/db/reliability/index.mdx
@@ -23,10 +23,34 @@ In practical software terms, database reliability means:
### ACID Properties

Reliable relational databases (like PostgreSQL) guarantee **ACID** properties:
- **Atomicity**: An operation (transaction) is "all or nothing." If a user buys an item, money is deducted AND the inventory drops. If one fails, both fail.
- **Consistency**: The database only moves from one valid state to another valid state, enforcing all constraints.
- **Isolation**: Concurrent transactions don't interfere with each other.
- **Durability**: Once data is saved, it remains saved, even if the power goes out immediately after.
Together, these properties ensure that database transactions are processed reliably.

| Property | Meaning | Practical Example |
|---|---|---|
| **Atomicity** | "All or nothing." | Money is deducted AND inventory drops. If one fails, both fail. |
| **Consistency** | Data must always be valid. | You cannot save an order without a valid user ID. |
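| **Isolation** | Concurrent transactions don't interfere with each other. | When two users try to buy the last ticket at once, each transaction runs as if the other were not in progress. |
| **Durability** | Committed data survives crashes. | If power goes out right after a commit, the data is still there on restart. |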

<Callout title="Knowledge Links" type="info">
**Used in the Project Tracker scenario ([Labs](../labs))**:
When a user creates a new project with multiple initial tasks, we use a **transaction** to ensure either everything saves or nothing does. If something fails, a **rollback** prevents partial data. This works alongside the **constraint** rules we defined in [Schema Design](../schema-design) to maintain **data integrity**.
</Callout>

### Safe Transaction Flow

When an application processes critical data, it uses a transaction to maintain ACID properties:

```mermaid
flowchart TD
Start([Start Transaction]) --> Val[Validate Data]
Val --> Write[Write to Database]
Write --> Check{Are there errors?}
Check -- Yes --> Rollback[Rollback / Undo]
Check -- No --> Commit[Commit / Save]
Rollback --> End([End Transaction])
Commit --> End
```
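The flow above can be sketched with Python's built-in `sqlite3`, using the Project Tracker case of saving a project together with its initial tasks. The table layout and the `create_project` helper are assumptions for illustration, not the project's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES projects(id),
        title TEXT NOT NULL
    );
""")


def create_project(title, task_titles):
    """Insert a project and its tasks atomically; return True if committed."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            cur = conn.execute("INSERT INTO projects (title) VALUES (?)", (title,))
            for task_title in task_titles:
                conn.execute(
                    "INSERT INTO tasks (project_id, title) VALUES (?, ?)",
                    (cur.lastrowid, task_title),
                )
        return True
    except sqlite3.IntegrityError:
        return False


def count(table):
    """Count rows in one of the two tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


ok = create_project("Website", ["Design", "Build"])
failed = create_project("Broken", ["Valid task", None])  # None violates NOT NULL

print(ok, failed, count("projects"), count("tasks"))
```

The second call fails on its last task, so the rollback also discards the "Broken" project row and its first task. Only the first project and its two tasks remain.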

### Constraints & Data Integrity

47 changes: 40 additions & 7 deletions docs/(computer_science)/systems/db/schema-design/index.mdx
@@ -15,13 +15,46 @@ Schema design is the process of planning how your data is structured, stored, an

Understanding schema design requires a firm grasp on how data points relate to each other:

- **Entities**: Real-world objects or concepts you need to store (e.g., a `User`, a `Project`, a `Post`). In a database, these become your **Tables**.
- **Relationships**: How entities connect to one another.
  - *One-to-One*: A User has one Profile.
  - *One-to-Many*: A User has many Projects.
  - *Many-to-Many*: A Project has many Tags, and a Tag belongs to many Projects.
- **Keys**: Tools to identify and link data. **Primary Keys** uniquely identify a single row, and **Foreign Keys** link a row to a primary key in another table.
- **Constraints**: Rules that ensure data is valid (e.g., ensuring an email address is unique, or a required field is not empty).
| Concept | Description | Real-world Example |
|---|---|---|
| **Entity / Table** | A real-world object you need to store. | `User`, `Project` |
| **Row** | A single distinct record. | "Alice's User Profile" |
| **Column** | A specific piece of information. | `email`, `created_at` |
| **Primary Key** | Uniquely identifies a single row. | `id: 1` |
| **Foreign Key** | Links a row to another table's primary key. | `owner_id: 1` |
| **Constraint** | Rule ensuring data is valid. | `email must be UNIQUE` |

<Callout title="Knowledge Links" type="info">
**Used in the Project Tracker scenario ([Labs](../labs))**:
We identify real-world concepts as an **Entity**, and map how they connect via a **Relationship**. We use a **Primary Key** to uniquely identify each row, a **Foreign Key** to link rows together, and a **Constraint** to enforce data validity. Learn the bedrock of these rules in [Foundations](../foundations).
</Callout>

### Visualizing Relationships

Here is a simplified Entity-Relationship Diagram (ERD). Note: This is a conceptual learning model, not the exact production schema.

```mermaid
erDiagram
USER ||--o{ PROJECT : "creates"
PROJECT ||--o{ TASK : "contains"

USER {
int id PK
string email
string name
}
PROJECT {
int id PK
string title
int owner_id FK
}
TASK {
int id PK
string description
boolean is_completed
int project_id FK
}
```
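As a rough sketch of how this diagram could translate into tables, the following uses Python's built-in `sqlite3`. It remains a conceptual model like the diagram itself. The `NOT NULL`, `UNIQUE`, and `CHECK` rules and the `try_insert` helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL
    );
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES user(id)
    );
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        -- SQLite has no boolean type, so 0/1 stands in for false/true.
        is_completed INTEGER NOT NULL DEFAULT 0 CHECK (is_completed IN (0, 1)),
        project_id INTEGER NOT NULL REFERENCES project(id)
    );
""")


def try_insert(sql, params):
    """Attempt one insert and report whether a constraint rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


new_user = "INSERT INTO user (email, name) VALUES (?, ?)"
new_project = "INSERT INTO project (title, owner_id) VALUES (?, ?)"

first_user = try_insert(new_user, ("alice@example.com", "Alice"))
duplicate_email = try_insert(new_user, ("alice@example.com", "Alice 2"))  # UNIQUE
valid_project = try_insert(new_project, ("Tracker", 1))
orphan_project = try_insert(new_project, ("Orphan", 999))  # no such user

for result in (first_user, duplicate_email, valid_project, orphan_project):
    print(result)
```

The duplicate email and the project pointing at a missing user are both rejected by the database itself, so these rules hold even if application code forgets to check them.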

## Normalization
