feat: integrate AWS Comprehend as a chainable PII redactor #461

Open
Pearltechie wants to merge 1 commit into arakoodev:ts from Pearltechie:feat/aws-comprehend-pii-redactor-rxjs

Conversation

@Pearltechie

Closes #290

/claim #290

Adds an AWS Comprehend wrapper to @arakoodev/edgechains.js/ai so prompts can be PII-scrubbed before being sent to an Endpoint (OpenAI, GeminiAI, LlamaAI, RetellAI).

The issue asks for "new classes that can be chained with existing Endpoint classes (as observables)". I exposed two surfaces over the same underlying logic so people can pick whichever matches their code:

  • AWSComprehend - promise API (detectPii, containsPii, redact, redactBatch, chain)
  • redact$, redactPii, redactPiiText, redactPiiBatch - RxJS Observable / operator API that drops into a pipe()

Files

JS/edgechains/arakoodev/src/ai/src/

  • lib/aws-comprehend/comprehend.ts - the class
  • lib/aws-comprehend/observables.ts - rxjs operators
  • lib/aws-comprehend/index.ts - barrel
  • tests/awsComprehend.test.ts - 17 vitest tests, AWS SDK mocked
  • index.ts - re-export the new symbols

JS/edgechains/examples/pii-redactor/

  • runnable example with both promise + observable demos against the existing OpenAI Endpoint

New deps in arakoodev/package.json

"@aws-sdk/client-comprehend": "^3.600.0",
"rxjs": "^7.8.1"

Usage

Promise:

import { AWSComprehend, OpenAI } from "@arakoodev/edgechains.js/ai";

const comprehend = new AWSComprehend({ region: "us-east-1" });
const openai     = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const reply = await comprehend.chain(
    userPrompt,
    safe => openai.chat({ prompt: safe }),
    { strategy: "type" }   // -> "[EMAIL]", "[PHONE]", "[SSN]" ...
);

Observable:

import { from } from "rxjs";
import { mergeMap } from "rxjs/operators";
import { AWSComprehend, redactPii, OpenAI } from "@arakoodev/edgechains.js/ai";

const comprehend = new AWSComprehend();
const openai     = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

from(userPrompts)   // userPrompts: an array (or any ObservableInput) of raw prompt strings
  .pipe(
    redactPii(comprehend, { strategy: "type" }),
    mergeMap(r => openai.chat({ prompt: r.redactedText }))
  )
  .subscribe(reply => sendBack(reply));

redact() options: languageCode, piiEntityTypes (filter to specific types), minConfidence (default 0.5), redactionChar, strategy ("char" | "type" | "fixed").
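
To make the options above concrete, here is a minimal sketch of how the three strategies might be applied to detected entities. It is not the code in this PR: `applyRedaction` is a hypothetical helper, the default strategy of "char", the `*` redaction character, and the `[REDACTED]` placeholder for "fixed" are assumptions. The entity fields (`Type`, `Score`, `BeginOffset`, `EndOffset`) follow the shape Comprehend returns from DetectPiiEntities. The sketch also assumes entities do not overlap and that offsets can be used directly as JavaScript string indices.

```typescript
// Field names mirror the PII entity shape returned by Comprehend's DetectPiiEntities.
interface PiiEntity {
  Type: string;        // e.g. "EMAIL", "PHONE"
  Score: number;       // detection confidence in [0, 1]
  BeginOffset: number; // start offset of the entity (inclusive)
  EndOffset: number;   // end offset of the entity (exclusive)
}

type Strategy = "char" | "type" | "fixed";

interface RedactOptions {
  strategy?: Strategy;
  minConfidence?: number;    // the PR states a default of 0.5
  piiEntityTypes?: string[]; // if set, only these entity types are redacted
  redactionChar?: string;    // used by the "char" strategy
}

// Hypothetical helper, for illustration only.
function applyRedaction(
  text: string,
  entities: PiiEntity[],
  opts: RedactOptions = {}
): string {
  // Defaults other than minConfidence are assumptions, not taken from the PR.
  const { strategy = "char", minConfidence = 0.5, piiEntityTypes, redactionChar = "*" } = opts;

  const selected = entities
    .filter(e => e.Score >= minConfidence)
    .filter(e => !piiEntityTypes || piiEntityTypes.includes(e.Type))
    // Work from the end of the string so earlier offsets stay valid.
    .sort((a, b) => b.BeginOffset - a.BeginOffset);

  let out = text;
  for (const e of selected) {
    const length = e.EndOffset - e.BeginOffset;
    const replacement =
      strategy === "type" ? `[${e.Type}]`
      : strategy === "fixed" ? "[REDACTED]" // assumed placeholder
      : redactionChar.repeat(length);       // "char": mask each character
    out = out.slice(0, e.BeginOffset) + replacement + out.slice(e.EndOffset);
  }
  return out;
}
```

Replacing from the last entity backwards avoids recomputing offsets after each substitution, which matters for "type" and "fixed" because the replacement length differs from the original span.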

Credentials are resolved in this order: constructor options -> env (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN) -> default AWS credential chain. Inputs above 100 KB are rejected before the network call (Comprehend's hard limit).
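
The resolution order and the size guard can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `resolveCredentials`, `utf8ByteLength`, and `assertWithinComprehendLimit` are hypothetical names, the environment is passed in as a plain object so the snippet has no Node-specific dependencies, and 100 KB is taken to mean 100 * 1024 bytes of UTF-8.

```typescript
interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

// Hypothetical helper: explicit constructor credentials win, then env vars.
// Returning undefined means "pass no credentials", so the AWS SDK falls back
// to its default credential chain.
function resolveCredentials(
  explicit: Credentials | undefined,
  env: Record<string, string | undefined>
): Credentials | undefined {
  if (explicit) return explicit;
  const accessKeyId = env["AWS_ACCESS_KEY_ID"];
  const secretAccessKey = env["AWS_SECRET_ACCESS_KEY"];
  const sessionToken = env["AWS_SESSION_TOKEN"];
  if (accessKeyId && secretAccessKey) {
    return sessionToken
      ? { accessKeyId, secretAccessKey, sessionToken }
      : { accessKeyId, secretAccessKey };
  }
  return undefined;
}

// Assumption: "100 KB" means 100 * 1024 bytes of UTF-8 encoded text.
const MAX_BYTES = 100 * 1024;

// The limit applies to encoded bytes, not string length, so measure after encoding.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Hypothetical guard: throws before any network call would be made.
function assertWithinComprehendLimit(text: string, maxBytes: number = MAX_BYTES): void {
  const bytes = utf8ByteLength(text);
  if (bytes > maxBytes) {
    throw new RangeError(`Input is ${bytes} bytes; limit is ${maxBytes} bytes`);
  }
}
```

Measuring encoded bytes rather than `text.length` matters for non-ASCII input, where a single character can take two to four bytes.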

Tests

cd JS/edgechains/arakoodev
pnpm install
pnpm test -- awsComprehend
Test Files  1 passed (1)
     Tests  17 passed (17)

Tests mock @aws-sdk/client-comprehend so they run with no AWS account. They cover detection, redaction (all three strategies + entity-type filter + confidence filter), batching, the promise chain() helper, and every observable operator including the from -> redactPii -> mergeMap(endpoint) composition that the issue asks for.
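
For readers who want to see how detection logic can be exercised without an AWS account, here is a dependency-free sketch. It swaps in a different technique from the PR's tests: instead of mocking the `@aws-sdk/client-comprehend` module with vitest, it injects a hand-written stub. `PiiDetector`, `StubDetector`, and this standalone `containsPii` function are hypothetical names for illustration, and the email regex is a deliberately naive stand-in for Comprehend's detection.

```typescript
interface PiiEntity {
  Type: string;
  Score: number;
  BeginOffset: number;
  EndOffset: number;
}

interface DetectResponse {
  Entities: PiiEntity[];
}

// Hypothetical seam: the only capability the logic under test needs.
interface PiiDetector {
  detect(text: string): Promise<DetectResponse>;
}

// Stub that finds email-like tokens locally instead of calling AWS.
class StubDetector implements PiiDetector {
  calls: string[] = []; // records inputs so a test can assert on them

  async detect(text: string): Promise<DetectResponse> {
    this.calls.push(text);
    const entities: PiiEntity[] = [];
    const re = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      entities.push({
        Type: "EMAIL",
        Score: 0.99, // fixed confidence, chosen arbitrarily for the stub
        BeginOffset: m.index,
        EndOffset: m.index + m[0].length,
      });
    }
    return { Entities: entities };
  }
}

// Illustrative logic under test: true if any entity meets the confidence floor.
async function containsPii(
  detector: PiiDetector,
  text: string,
  minConfidence = 0.5
): Promise<boolean> {
  const { Entities } = await detector.detect(text);
  return Entities.some(e => e.Score >= minConfidence);
}
```

Because the detector is passed in, the same `containsPii` could be handed a real client wrapper in production and the stub in tests. Module-level mocking, as used in this PR, reaches the same goal without changing function signatures.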

Demo

cd JS/edgechains/examples/pii-redactor
cp .env.example .env   # add AWS_* + OPENAI_API_KEY
pnpm install
pnpm start             # promise demo
pnpm run stream        # rxjs demo
ORIGINAL : Hi, I'm Sarah Chen (sarah.chen@acme.io). My phone is 415-555-0142 ...
REDACTED : Hi, I'm [NAME] ([EMAIL]). My phone is [PHONE] ...
LLM REPLY: Sure, here's a polite extension request you can send ...

IAM

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["comprehend:DetectPiiEntities", "comprehend:ContainsPiiEntities"],
    "Resource": "*"
  }]
}

Let me know if you'd like a different naming scheme for the operators or a different default for concurrency / minConfidence.

@github-actions

github-actions Bot commented Apr 23, 2026

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@Pearltechie
Author

Quick demo (40s) — tests passing, then both promise and rxjs demos against a stubbed Comprehend showing PII tagged with [TYPE] before going to the LLM.

edgechains-pr461-demo.1.mp4

@Pearltechie
Author

I have read the Arakoo CLA Document and I hereby sign the CLA

@Pearltechie
Author

recheck

@Pearltechie
Author

Hey @sandys, I've submitted PR #461 with the requested video and passed all CI tests. Ready for your review!

Development

Successfully merging this pull request may close these issues.

BOUNTY: integrate AWS Comprehend as a utility to redact data