
feat: kbagent pipeline trace — end-to-end data flow mapping #149

@padak

Description

Problem

`lineage show` maps bucket sharing between projects, but agents need a deeper understanding:

  • Which extractor writes to this table?
  • Which transformations read it?
  • What downstream writers consume the output?
  • What's the full execution order?

Today, agents must manually correlate `config list`, `config detail`, and input/output mapping analysis to reconstruct this picture.
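For context, the manual correlation described above amounts to indexing every configuration's input/output mappings by table ID. A minimal sketch, assuming a simplified config shape (`config_id`, `input_tables`, `output_tables`; not the real Storage API payload):

```python
def index_by_table(configs):
    """Build {table_id: {"writers": [...], "readers": [...]}} from configs."""
    index = {}
    for cfg in configs:
        for table in cfg.get("output_tables", []):
            index.setdefault(table, {"writers": [], "readers": []})["writers"].append(cfg["config_id"])
        for table in cfg.get("input_tables", []):
            index.setdefault(table, {"writers": [], "readers": []})["readers"].append(cfg["config_id"])
    return index

# Illustrative configs mirroring the example below.
configs = [
    {"config_id": "123", "component": "keboola.ex-db-mysql",
     "input_tables": [], "output_tables": ["in.c-crm.orders"]},
    {"config_id": "456", "component": "keboola.snowflake-transformation",
     "input_tables": ["in.c-crm.orders"],
     "output_tables": ["out.c-analytics.order-metrics"]},
]

index = index_by_table(configs)
```

The proposed command would perform this indexing server-side (or client-side in one pass) so agents get the graph directly instead of re-deriving it per query.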

Proposal

```
kbagent pipeline trace --project prod --table-id in.c-crm.orders
```

Output: a structured data-flow graph:

```json
{
  "table": "in.c-crm.orders",
  "written_by": {"component": "keboola.ex-db-mysql", "config_id": "123", "schedule": "daily 06:00"},
  "read_by": [
    {"component": "keboola.snowflake-transformation", "config_id": "456", "output_tables": ["out.c-analytics.order-metrics"]}
  ],
  "downstream": [
    {"table": "out.c-analytics.order-metrics", "consumed_by": [{"component": "keboola.wr-google-sheets", "config_id": "789"}]}
  ]
}
```
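The "full execution order" question falls out of the same graph: a config that reads a table depends on the config that writes it, so ordering is a topological sort over those edges. A hedged sketch using the stdlib `graphlib`; the `writes`/`reads` maps are illustrative and mirror the config IDs in the example output:

```python
from graphlib import TopologicalSorter

# table_id -> config_id that writes it (illustrative data).
writes = {"in.c-crm.orders": "123", "out.c-analytics.order-metrics": "456"}
# config_id -> table_ids it reads (illustrative data).
reads = {"456": ["in.c-crm.orders"], "789": ["out.c-analytics.order-metrics"]}

# Predecessor map: each reader depends on the writer of its input tables.
deps = {cfg: {writes[t] for t in tables if t in writes}
        for cfg, tables in reads.items()}

order = list(TopologicalSorter(deps).static_order())
# "123" (extractor) sorts before "456" (transformation) before "789" (writer)
```

A cycle in the mappings would raise `graphlib.CycleError`, which is itself useful signal for the agent to surface.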

Why this matters

Understanding data dependencies is the #1 prerequisite for safe changes. Without it, agents risk breaking downstream consumers when modifying a pipeline.

Context

Discussion from Devil's Advocate analysis of kbagent's agentic capabilities.


Labels: epic/agentic-cli (sub-issue of #152, Modern Agentic CLI)
