117 changes: 117 additions & 0 deletions AGENTS.md
@@ -0,0 +1,117 @@
# AGENTS.md

## Project Summary

BBOT Server is a database and multiplayer hub for [BBOT](https://github.com/blacklanternsecurity/bbot), an open-source security reconnaissance tool. It ingests BBOT scan events, tracks assets over time, detects changes, and exposes everything through multiple interfaces: a REST API (FastAPI), a Python SDK, a CLI (`bbctl`), and a Terminal UI (Textual).

Key capabilities:
- Ingest scan events in real-time or after the fact
- Track assets with detailed history and change detection
- Multi-user collaboration via shared server
- Query and export assets, findings, technologies, open ports, DNS, etc.
- AI interaction via MCP (Model Context Protocol)

### Architecture

The server is built on FastAPI with PostgreSQL for storage and Redis for message queuing. The codebase is organized into **modules**, each owning its own API endpoints, CLI commands, and data models. Modules are discovered and loaded dynamically at startup.

```
bbot_server/
├── api/ # FastAPI app setup
├── cli/ # bbctl CLI (Typer/Click)
├── db/ # PostgreSQL connection and table definitions
├── models/ # Base Pydantic/SQLModel classes
├── interfaces/ # Python (direct DB) and HTTP (REST client) interfaces
├── modules/ # Feature modules (assets, events, findings, scans, etc.)
│   └── <module>/
│       ├── <module>_api.py # FastAPI applet (BaseApplet)
│       ├── <module>_cli.py # CLI commands (BaseBBCTL)
│       └── <module>_models.py # Data models
├── store/ # Data store abstraction
├── event_store/ # Event storage
├── message_queue/ # Redis-based task queue
├── applets/ # Async task runners
└── watchdog/ # Asset change detection
```

## Tooling

### uv

We use [uv](https://docs.astral.sh/uv/) for dependency management and virtual environments.

```bash
# Install dependencies
uv sync

# Run any command in the venv
uv run <command>
```

Dependencies are declared in `pyproject.toml` and locked in `uv.lock`. BBOT itself is pulled from the `3.0` branch on GitHub (not PyPI).

### Ruff

We use [ruff](https://docs.astral.sh/ruff/) for linting and formatting. Configuration lives in `pyproject.toml`:

- Line length: 119
- Rules: `E` (PEP 8) and `F` (PyFlakes)
- Target: Python 3.10+

```bash
# Lint
uv run ruff check

# Format check
uv run ruff format --check

# Auto-fix
uv run ruff check --fix
uv run ruff format
```

### Running Tests

Tests use pytest with async support. Before running, start the backing services:

```bash
docker run --rm -p 5432:5432 -e POSTGRES_DB=test_bbot_server -e POSTGRES_USER=bbot -e POSTGRES_PASSWORD=bbot postgres:16
docker run --rm -p 6379:6379 redis
```

Then run:

```bash
# All tests
uv run pytest

# Specific test
uv run pytest -k test_applet_scans

# With coverage
uv run pytest --cov=bbot_server .
```

CI runs tests across Python 3.10-3.13 with `--reruns 2` for flaky test resilience.

## Engineering Principles

**No shortcuts. No hardcoding. No hacks.**

- **Build systems, not one-offs.** If you're building one of something and there will eventually be more, first build the proper generic system for it, then implement the specific instance within that system.
- **Modules own their data and code.** Any module-specific data or logic lives ONLY in that module's directory. No matching on module names. No branching on module types. The core system has zero knowledge of individual modules.
- **Generic over specific.** Always implement generic systems that work through interfaces and conventions, not through awareness of what's plugged into them. Modules register themselves; the core discovers and loads them uniformly.
- **Eat our own dogfood.** We use our own interfaces and abstractions. If something is awkward to use internally, it will be awkward for users too. Fix the abstraction. It's okay if we have to take a step back from the current task.
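
The "modules register themselves" principle can be illustrated with a class-level registry. This is only a minimal sketch, not BBOT Server's actual loader: `ModuleBase`, `REGISTRY`, `load_all()`, and the example subclasses are hypothetical names invented for the illustration.

```python
# Hypothetical sketch: subclasses register themselves; the core never names them.
REGISTRY: dict[str, type] = {}


class ModuleBase:
    """Base class that records every subclass declaring a non-empty name."""

    name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            REGISTRY[cls.name] = cls

    def describe(self) -> str:
        return f"module:{self.name}"


class FindingsModule(ModuleBase):
    name = "findings"


class TechnologiesModule(ModuleBase):
    name = "technologies"


def load_all() -> list[ModuleBase]:
    """Instantiate every registered module uniformly, with no per-module branching."""
    return [cls() for cls in REGISTRY.values()]
```

Adding a third module here means adding one subclass; `load_all()` and the registry stay untouched, which is the property the principles above ask for.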

# MONGO TO POSTGRES REFACTOR

This refactor is in progress. Here's our immediate TODO:

- Get assets aggregation working properly. The assets module is meant to aggregate data from each child module recursively into an "Asset": a host with findings, technologies, open ports, etc. Currently `list_assets()` yields bare hosts. We need a generic mechanism, usable by several of the asset endpoints, that pulls data from those disparate tables and joins it on host.
- Port events and watchdog, and get them working. This is an essential step that ensures the testing framework is up and running, so we can finish porting the rest of the modules.
- Note that we're not migrating existing data, so we don't need to worry about that.

Later TODO:
- Finish porting all modules and get their tests passing
- Implement Alembic for migrations
- Make sure all data is stored within a single database, with a reliable mechanism for separating the event store, user store, and asset store. The asset store is particularly important because its tables are dynamic, and we need a programmatic way to delete them (not only clear them) without inadvertently affecting any similarly named tables that may exist.
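
One possible way to delete dynamic tables without touching similarly named ones is to record every table the asset store creates and drop only what is recorded. The sketch below shows that idea with the standard-library `sqlite3` module so it runs anywhere; it is an assumption about a workable mechanism, not the planned design. The names `asset_store_tables`, `create_asset_table()`, and `drop_asset_tables()` are hypothetical. On PostgreSQL, a dedicated schema for the asset store would be another option.

```python
import sqlite3

REGISTRY_TABLE = "asset_store_tables"  # hypothetical bookkeeping table


def create_asset_table(conn, name):
    """Create a dynamic table and record its name so it can be dropped later."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {REGISTRY_TABLE} (name TEXT PRIMARY KEY)")
    conn.execute(f'CREATE TABLE "{name}" (host TEXT)')
    conn.execute(f"INSERT INTO {REGISTRY_TABLE} (name) VALUES (?)", (name,))


def drop_asset_tables(conn):
    """Drop only the tables recorded in the registry, then clear the registry."""
    rows = conn.execute(f"SELECT name FROM {REGISTRY_TABLE} ORDER BY name")
    names = [row[0] for row in rows]
    for name in names:
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.execute(f"DELETE FROM {REGISTRY_TABLE}")
    return names


def table_names(conn):
    """Return the set of tables that currently exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings_archive (host TEXT)")  # similarly named, not ours
create_asset_table(conn, "findings")
create_asset_table(conn, "technologies")
dropped = drop_asset_tables(conn)
```

After the call, `findings_archive` still exists because it was never registered, while both registered tables are gone.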
85 changes: 85 additions & 0 deletions ASSET_ENRICHMENT_PLAN.md
@@ -0,0 +1,85 @@
# Plan: Dynamic asset enrichment via child applets

## Context

`list_assets()` currently yields bare `Host(pk, host)` objects. An "asset" should include data from all child applets (findings, technologies, etc.). The enrichment should be dynamic — any child applet of AssetsApplet with its own model/table should automatically contribute to the asset view, with no hard-coding in the assets module.

We'll use GROUP BY + array_agg for now. If it becomes a bottleneck at scale, we can switch to LATERAL JOIN later.

## Approach

### 1. Each child applet declares how it contributes to the asset view

In `bbot_server/applets/base.py`, add to BaseApplet:

- `asset_field: str = ""` — the key name in the asset dict (e.g. `"findings"`, `"technologies"`). Empty means "don't participate." One applet, one field.
- `def asset_summary(self)` — returns a SQLAlchemy expression describing the per-host summary for this applet (e.g. an aggregated list of finding names). Returns `None` by default (don't participate).
- `def asset_join(self, host_column)` — returns a SQLAlchemy join condition. Default: `self.model.host == host_column`.
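
The three defaults above could look roughly like the following. This is a sketch only: the real `BaseApplet` has more to it, and `FakeColumn` and `FakeFinding` are hypothetical stand-ins for SQLAlchemy columns and models so the example runs without a database.

```python
class FakeColumn:
    """Hypothetical stand-in for a SQLAlchemy column; `==` returns an SQL-like string."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        return f"{self.name} = {other.name}"


class FakeFinding:
    """Hypothetical stand-in for the findings model."""

    host = FakeColumn("findings.host")


class BaseApplet:
    model = None
    asset_field: str = ""  # empty string: this applet does not participate

    def asset_summary(self):
        """Per-host summary expression; None means the applet does not participate."""
        return None

    def asset_join(self, host_column):
        """Default join condition: the child's host column equals the asset host."""
        return self.model.host == host_column


class FindingsApplet(BaseApplet):
    model = FakeFinding
    asset_field = "findings"


join = FindingsApplet().asset_join(FakeColumn("hosts.host"))
```

A child applet that joins on something other than `host` would override `asset_join()`; everything else inherits the default.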

### 2. Build the enriched query dynamically in AssetsApplet

In `bbot_server/modules/assets/assets_api.py`, rewrite `list_assets()`:

```python
from sqlalchemy import select

# Start from the hosts table; each participating child adds one join and one column.
stmt = select(Host.host)

for applet in self.child_applets:
    summary = applet.asset_summary()
    if summary is not None:
        stmt = stmt.outerjoin(applet.model, applet.asset_join(Host.host))
        stmt = stmt.add_columns(summary)

stmt = stmt.group_by(Host.host)

The assets applet doesn't know or care what's inside the expressions. It just asks each child for a join condition and a summary.
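
To see the query shape end to end without PostgreSQL, here is a runnable sketch using the standard-library `sqlite3` module. It swaps in SQLite's `group_concat(DISTINCT ...)` for `array_agg`, and plain tuples for child applets, so it illustrates the pattern rather than the planned implementation. `CHILDREN`, `build_asset_query()`, and this `list_assets()` are hypothetical names.

```python
import sqlite3

# Hypothetical child descriptors: (asset_field, table, summarized column).
# Identifiers come from trusted code, never from user input.
CHILDREN = [("findings", "findings", "name"), ("technologies", "technologies", "name")]


def build_asset_query(children):
    """Add one LEFT JOIN and one aggregate column per child, then group by host."""
    columns = ["hosts.host"]
    joins = []
    for field, table, column in children:
        columns.append(f"group_concat(DISTINCT {table}.{column}) AS {field}")
        joins.append(f"LEFT JOIN {table} ON {table}.host = hosts.host")
    return (
        f"SELECT {', '.join(columns)} FROM hosts {' '.join(joins)} "
        "GROUP BY hosts.host ORDER BY hosts.host"
    )


def list_assets(conn, children):
    """Run the query and turn each row into a dict; NULL aggregates become []."""
    fields = [field for field, _, _ in children]
    assets = []
    for host, *summaries in conn.execute(build_asset_query(children)):
        asset = {"host": host}
        for field, value in zip(fields, summaries):
            asset[field] = sorted(value.split(",")) if value else []
        assets.append(asset)
    return assets


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (host TEXT PRIMARY KEY);
    CREATE TABLE findings (host TEXT, name TEXT);
    CREATE TABLE technologies (host TEXT, name TEXT);
    INSERT INTO hosts VALUES ('www.evilcorp.com'), ('evilcorp.com');
    INSERT INTO findings VALUES ('www.evilcorp.com', 'CVE-2024-12345');
    INSERT INTO findings VALUES ('www.evilcorp.com', 'CVE-2024-12345');
    """
)
assets = list_assets(conn, CHILDREN)
```

Two details carry over to the PostgreSQL version: an aggregate over zero matching rows comes back as NULL and has to be mapped to an empty list somewhere, and `DISTINCT` is what keeps values from being duplicated when several child tables are joined in the same query.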

### 3. FindingsApplet implements asset_summary()

In `bbot_server/modules/findings/findings_api.py`:

```python
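asset_field = "findings"

def asset_summary(self):
    from sqlalchemy import func, distinct

    # Aggregate the distinct finding names per host.
    # Note: for hosts with no findings this aggregate is NULL, not an empty
    # array, so it must be mapped to [] to match the expected output.
    return (
        func.array_agg(distinct(self.model.name))
        .filter(self.model.name.isnot(None))
        .label(self.asset_field)
    )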
```

Future applets (technologies, open_ports) each define their own summary. A technology applet might aggregate differently than findings.

### 4. End result

After scan 1, `list_assets()` yields dicts like:

```json
{"host": "www.evilcorp.com", "findings": ["CVE-2024-12345"]}
{"host": "evilcorp.com", "findings": []}
{"host": "1.2.3.4", "findings": []}
```

If TechnologiesApplet existed, it would automatically add another LEFT JOIN and the output would include `"technologies"` too — no changes to AssetsApplet needed.

### 5. Update the test

In `tests/test_applets/test_applet_assets.py`, update `after_scan_1()`:

- `list_assets()` now yields dicts instead of Host objects
- Each dict has `"host"` + one key per child applet (e.g. `"findings"`)
- After scan 1, `www.evilcorp.com` and `www2.evilcorp.com` have `"findings": ["CVE-2024-12345"]`
- Other hosts have `"findings": []`
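
The assertions listed above might be written along these lines. This is a sketch under assumptions: `check_after_scan_1()` and `sample_assets` are hypothetical, and the real test would iterate over whatever `list_assets()` yields rather than a hard-coded list.

```python
# Hosts expected to carry a finding after scan 1; all others should have none.
EXPECTED_FINDINGS = {
    "www.evilcorp.com": ["CVE-2024-12345"],
    "www2.evilcorp.com": ["CVE-2024-12345"],
}


def check_after_scan_1(assets):
    """Assert that every asset is a dict with a host and the expected findings."""
    for asset in assets:
        assert isinstance(asset, dict), f"expected dict, got {type(asset).__name__}"
        assert "host" in asset
        expected = EXPECTED_FINDINGS.get(asset["host"], [])
        assert asset["findings"] == expected, asset


# Hypothetical sample shaped like the output described in "End result".
sample_assets = [
    {"host": "www.evilcorp.com", "findings": ["CVE-2024-12345"]},
    {"host": "www2.evilcorp.com", "findings": ["CVE-2024-12345"]},
    {"host": "evilcorp.com", "findings": []},
    {"host": "1.2.3.4", "findings": []},
]
check_after_scan_1(sample_assets)
```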

## Files to modify

1. `bbot_server/applets/base.py` — add `asset_field`, `asset_summary()`, `asset_join()` defaults
2. `bbot_server/modules/assets/assets_api.py` — rewrite `list_assets()` to build dynamic JOIN query
3. `bbot_server/modules/findings/findings_api.py` — set `asset_field`, implement `asset_summary()`
4. `tests/test_applets/test_applet_assets.py` — update `after_scan_1()` assertions

## Verification

Run `pytest tests/test_applets/test_applet_assets.py::TestAppletAssets -x -v` and verify `after_scan_1()` passes.