Zero-Trust Data Pipeline Verification Framework

Reference implementation of the framework described in:

Mudusu, S. K., & Gentyala, S. (2026). Zero-Trust Data Pipelines for AI Systems: A Framework for Secure, Verifiable, and Auditable Data Engineering. Journal of Recent Trends in Computer Science and Engineering, 14(2), 10–25. https://jrtcse.com/index.php/home/article/view/JRTCSE.2026.14.2.2/JRTCSE.2026.14.2.2

What this implements

The paper proposes a zero-trust approach to data pipelines that feed AI systems: no data source is implicitly trusted, every record must pass verifiable quality gates, and all pipeline actions are logged for audit. This repository translates those concepts into working Python code.

Concretely:

Secure ingestion — checksum every input file before parsing; reject unknown extensions and oversized files
Data validation — detect nulls, duplicates, missing required fields, and invalid date formats
Policy enforcement — evaluate declarative YAML rules (required columns, PII detection, null limits, file type allowlists)
Lineage tracking — record source, timestamp, and transformation steps in SQLite
Audit logging — append-only event log for every pipeline action, exportable to JSONL
Trust scoring — aggregate all stage results into a 0–100 AI-readiness score with letter grade
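
As a rough illustration of the secure ingestion step, the sketch below checks the extension and size of a file before computing its SHA-256 checksum. It is not the code in src/ztdp/ingestion.py: the function name guarded_checksum, the extension allowlist, and the 10 MiB limit are all assumptions made for this example.

```python
import hashlib
from pathlib import Path

# Hypothetical limits for illustration; the repository's actual values may differ.
ALLOWED_EXTENSIONS = {".csv", ".json"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MiB


def guarded_checksum(path: str) -> str:
    """Return the SHA-256 hex digest of a file after format and size checks."""
    p = Path(path)
    # Reject unknown extensions before touching the file contents.
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {p.suffix!r}")
    # Reject oversized files before reading them.
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"file exceeds {MAX_BYTES} bytes")
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Hash in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running both guards before hashing means a rejected file is never read, which keeps the failure path cheap.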

Repository structure

zero-trust-data-pipeline-framework/
├── src/ztdp/
│   ├── ingestion.py        # File loading, checksum, format guard
│   ├── validation.py       # Null counts, duplicates, field checks
│   ├── policy_engine.py    # YAML policy loader and rule evaluator
│   ├── lineage.py          # SQLite-backed lineage tracker
│   ├── audit.py            # Append-only audit logger
│   ├── trust_score.py      # 0–100 weighted trust score
│   ├── config.py           # Configuration dataclasses
│   └── exceptions.py       # Typed exceptions per stage
├── examples/
│   ├── sample_pipeline.py  # End-to-end demonstration
│   ├── sample_input.csv    # 15-row healthcare sample dataset
│   └── policies.yaml       # Example policy definition
├── tests/                  # Pytest test suite (46 tests)
├── docs/                   # Architecture, mapping, audit model, test results
├── .github/workflows/ci.yml
├── Dockerfile
└── pyproject.toml

Installation

git clone https://github.com/reachsunilmudusu-rgb/zero-trust-data-pipeline-framework.git
cd zero-trust-data-pipeline-framework

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate

pip install -e ".[dev]"

Quick start

from ztdp.ingestion import ingest_file
from ztdp.validation import validate
from ztdp.policy_engine import load_policies, enforce
from ztdp.lineage import LineageTracker
from ztdp.audit import AuditLogger
from ztdp.trust_score import calculate_trust_score

ingestion   = ingest_file("examples/sample_input.csv")
validation  = validate(ingestion.records, required_fields=["patient_id", "admission_date"])
policy      = load_policies("examples/policies.yaml")
pol_result  = enforce(ingestion.records, ingestion, validation, policy)

tracker = LineageTracker()
tracker.record(ingestion.dataset_id, ingestion.source_path, ingestion.ingested_at,
               transformation_steps=["validate", "policy_check"])

logger = AuditLogger()
logger.log("pipeline", "ingest", ingestion.dataset_id, "success", f"{ingestion.row_count} rows loaded")

trust = calculate_trust_score(ingestion, validation, pol_result,
                              lineage_recorded=True, audit_recorded=True)
print(trust.summary)
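
The quick start records one audit event through AuditLogger. To give a feel for what an append-only log with JSONL export can look like, here is a minimal in-memory sketch. MiniAuditLog, AuditEvent, and their field names are hypothetical and only mirror the argument order of the logger.log call above; the real event schema is described in the Audit Model document.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    # Field names are assumptions, chosen to match the quick start call order.
    timestamp: str
    stage: str
    action: str
    dataset_id: str
    status: str
    detail: str


class MiniAuditLog:
    """Hypothetical append-only log: events can be added and read, never edited."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def log(self, stage, action, dataset_id, status, detail="") -> AuditEvent:
        now = datetime.now(timezone.utc).isoformat()
        event = AuditEvent(now, stage, action, dataset_id, status, detail)
        self._events.append(event)
        return event

    @property
    def events(self) -> tuple:
        # Expose an immutable snapshot so callers cannot reorder or drop events.
        return tuple(self._events)

    def export_jsonl(self, path: str) -> int:
        """Write one JSON object per line and return the number of events written."""
        with open(path, "w", encoding="utf-8") as fh:
            for event in self._events:
                fh.write(json.dumps(asdict(event)) + "\n")
        return len(self._events)
```

Frozen dataclasses and a tuple snapshot are one simple way to approximate append-only behaviour in memory; a persistent log would need storage-level guarantees as well.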

Run the full end-to-end pipeline:

python examples/sample_pipeline.py

Expected output:

============================================================
Zero-Trust Data Pipeline — Verification Run
============================================================

[1] Ingesting file ...
    dataset_id : <uuid>
    rows       : 15
    checksum   : <sha256>...

[2] Validating data ...
    valid      : True
    null %     : 0.0%
    duplicates : 0

[3] Enforcing policies ...
    [PASS] required_columns: All required columns present
    [PASS] pii_columns: PII columns detected — flag for downstream masking: ['patient_id', 'age']
    [PASS] max_null_percentage: Null percentage 0.0% within limit of 10.0%
    [PASS] allowed_file_types: File type 'csv' is allowed
    [PASS] checksum_required: Checksum present
    [PASS] max_duplicate_percentage: Duplicate rate 0.0% within limit of 5.0%

[4] Recording lineage ...
    lineage_id : <uuid>

[5] Calculating trust score ...
    Trust score 100/100 (grade A) — checksum 20/20, validation 25/25, policy 25/25, lineage 20/20, audit 10/10

============================================================
Verification Summary
============================================================
  Dataset ID     : <uuid>
  Source         : sample_input.csv
  Rows ingested  : 15
  Validation     : PASSED
  Policy         : PASSED
  Trust Score    : 100/100 (Grade A)
  Audit events   : 4
============================================================
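
The component maxima in the trust score line (checksum 20, validation 25, policy 25, lineage 20, audit 10) sum to 100. The sketch below shows one way such a score could be aggregated. It assumes all-or-nothing credit per component and conventional grade cut-offs, neither of which is stated here; sketch_trust_score is a hypothetical name and not the signature of calculate_trust_score.

```python
# Maximum points per component, taken from the sample output above.
WEIGHTS = {"checksum": 20, "validation": 25, "policy": 25, "lineage": 20, "audit": 10}

# Grade cut-offs are an assumption; the sample output only shows 100 mapping to "A".
GRADE_FLOORS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def sketch_trust_score(passed: dict[str, bool]) -> tuple[int, str]:
    """Sum the weights of passing components and map the total to a letter grade."""
    # Assumption: a component earns full points or none; missing keys count as failed.
    score = sum(points for name, points in WEIGHTS.items() if passed.get(name, False))
    # Pick the first grade whose floor the score reaches, falling back to "F".
    grade = next((letter for floor, letter in GRADE_FLOORS if score >= floor), "F")
    return score, grade


all_pass = {name: True for name in WEIGHTS}
print(sketch_trust_score(all_pass))                        # (100, 'A')
print(sketch_trust_score({**all_pass, "policy": False}))   # (75, 'C')
```

Under these assumptions a single failed policy stage costs 25 points, so a run with every other stage passing drops from grade A to grade C.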

Running tests

pytest -q

The suite contains 46 tests that cover all modules with positive and negative cases. See docs/test_results.md for the full expected output.

Docker

docker build -t ztdp .
docker run --rm ztdp

Verification checklist

Documentation

Document	Description
Architecture	Module layout and data flow
Framework Mapping	Paper concept → implementation module
Verification Process	What constitutes a passing pipeline run
Audit Model	Event schema and query examples
Test Results	Expected pytest output and pipeline run

Citation

Mudusu, S. K., & Gentyala, S. (2026). Zero-Trust Data Pipelines for AI Systems:
A Framework for Secure, Verifiable, and Auditable Data Engineering.
Journal of Recent Trends in Computer Science and Engineering, 14(2), 10–25.
