fix(compliance): correct HIPAA Safe-Harbor citations + drop anonymisation overclaim (CF-18/CF-17) + verify_output tests (CF-14) #1
CF-18 — two factual HIPAA 164.514(b)(2)(i) category errors in the manifest
citations, the kind a reviewing radiologist or DPO would catch:
- verify_output.py: the free-text / quasi-identifier catch-all was
mislabelled "(Q) Any other unique characteristic". (Q) is full-face
photographs; the catch-all is (R) "Any other unique identifying number,
characteristic, or code". Relabelled 11 tags plus the burned-in-pixel
finding to (R).
- DeviceSerialNumber was "(N) Device identifiers". (N) is URLs; device
identifiers and serial numbers are (M). Fixed in verify_output.py and the
regulatory_mapping.py Safe-Harbor example mapping (->M, was ->R).
- Qualified the ENS (RD 311/2022) citation: the tool directly evidences
op.exp.8 (audit log) and mp.info.6 (limpieza de documentos); mp.info.3
(cifrado) is NOT implemented and remains the controller responsibility.
- Gated EU AI Act Art. 10 on applicability: added AI_ACT_APPLICABILITY_NOTE
surfaced as an eu-ai-act manifest disclosure. Art. 10 binds only the
provider of a high-risk Annex III system, not anyone who de-identifies.
- Added CITATIONS_VERIFIED_ON constant; DISCLAIMER now carries the
2026-06-01 re-verification date.
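A minimal sketch of the corrected lettering, with a lookup that a regression guard could assert against. The category letters and wording follow 45 CFR 164.514(b)(2)(i); the `TAG_CATEGORY` table, the `cite` helper and the choice of `ImageComments` as a free-text tag are hypothetical illustrations, not the contents of verify_output.py.

```python
# Category letters per 45 CFR 164.514(b)(2)(i); only the four discussed here.
SAFE_HARBOR_CATEGORIES = {
    "M": "Device identifiers and serial numbers",
    "N": "Web Universal Resource Locators (URLs)",
    "Q": "Full face photographic images and any comparable images",
    "R": "Any other unique identifying number, characteristic, or code",
}

# Hypothetical tag -> category table (assumption: the real one lives in
# verify_output.py and covers more tags than shown here).
TAG_CATEGORY = {
    "DeviceSerialNumber": "M",  # was wrongly cited as (N)
    "ImageComments": "R",       # assumed example of a free-text catch-all tag
}


def cite(tag: str) -> str:
    """Return the Safe-Harbor citation string for a DICOM keyword."""
    letter = TAG_CATEGORY[tag]
    return f"({letter}) {SAFE_HARBOR_CATEGORIES[letter]}"
```

A guard in the style of the CF-14 tests could then pin `cite("DeviceSerialNumber")` to the `(M)` string so the letter cannot drift back to `(N)`.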
CF-17 — dropped the Recital-26 anonymisation overclaim from the Action.D GDPR
clause. It claimed a schema-preserving dummy renders the substituted data
"no longer personal data". That field-level anonymisation claim contradicts
the global PSEUDONYMOUS (not anonymous) classification: salted-hash remapped
UIDs stay reversible with the separately-held salt, so the dataset never
becomes anonymous. Asserting otherwise is exactly the false-anonymisation
overclaim CNIL SAN-2024-013 (Cegedim) sanctioned. Citation narrowed to
Art. 32(1)(a); summary now states the dummy neutralises the field WITHOUT
anonymising the dataset.
CF-14 — new tests/test_verify_output.py (27 tests): metadata residual path,
SQ recursion, value-cleanliness helpers, the happy pixel-OCR path via a fake
pytesseract (no system tesseract binary needed), multiframe mid-slice OCR,
no-pixel objects, VerificationResult status/conclusive/coverage logic,
sampling, empty/garbage dirs, plus regression guards locking the CF-18/CF-17
corrections. verify_output.py coverage ~74 -> 91 percent.
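For readers unfamiliar with the fake-pytesseract approach, here is one way it could look. This is a sketch under assumptions: `ocr_finds_text` is a hypothetical stand-in for the verifier's OCR step, and the real tests may inject the fake differently. The only real API relied on is the name `pytesseract.image_to_string`.

```python
import sys
import types


def make_fake_pytesseract(text: str) -> types.ModuleType:
    """Build a stand-in module whose image_to_string returns canned text."""
    fake = types.ModuleType("pytesseract")
    fake.image_to_string = lambda image, **kwargs: text
    return fake


def ocr_finds_text(image: object) -> bool:
    """Hypothetical OCR step: True if any non-blank text is recognised."""
    import pytesseract  # resolved from sys.modules at call time

    return bool(pytesseract.image_to_string(image).strip())


def run_with_fake(text: str, image: object = None) -> bool:
    """Run the OCR step against the fake, then restore sys.modules."""
    saved = sys.modules.get("pytesseract")
    sys.modules["pytesseract"] = make_fake_pytesseract(text)
    try:
        return ocr_finds_text(image)
    finally:
        if saved is None:
            sys.modules.pop("pytesseract", None)
        else:
            sys.modules["pytesseract"] = saved
```

Inside pytest, `monkeypatch.setitem(sys.modules, "pytesseract", fake)` would do the same injection and handle the restore automatically.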
Full suite 226 passed, ruff + mypy --strict clean, golden completeness proof
exits 0. No public API or manifest-hash-affecting change beyond the added
eu-ai-act disclosure.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fba2054536
    "in this manifest BIND ONLY the provider of a high-risk AI system within "
    "the meaning of AI Act Art. 6(2) + Annex III (e.g. AI intended as a "
    "medical device safety component, or otherwise listed in Annex III). If "
Include the Art. 6(1) high-risk route
For providers of AI that is itself a medical device or a safety component covered by Annex I product legislation, high-risk classification comes from AI Act Art. 6(1), not Art. 6(2) + Annex III. This new disclosure says Art. 10 binds only Art. 6(2)/Annex III providers and then tells non-Annex-III users Art. 10 does not apply, so an EU medical-device AI provider can receive a manifest that incorrectly disclaims mandatory data-governance obligations. Please include the Art. 6(1)/Annex I path or remove the medical-device example from the Annex III-only gate.
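One possible shape for a gate that covers both routes, offered as a sketch rather than a drop-in patch. `HighRiskRoute` and `art10_applies` are hypothetical names that do not exist in the repo, and the caller is assumed to have already determined which route, if any, applies.

```python
from enum import Enum


class HighRiskRoute(Enum):
    """How a system becomes high-risk under AI Act Art. 6 (hypothetical enum)."""

    ART_6_1_ANNEX_I = "Art. 6(1) + Annex I product legislation"
    ART_6_2_ANNEX_III = "Art. 6(2) + Annex III"
    NONE = "not high-risk"


def art10_applies(is_provider: bool, route: HighRiskRoute) -> bool:
    """Art. 10 binds a provider whose system is high-risk via either route."""
    return is_provider and route is not HighRiskRoute.NONE
```

With this shape, a medical-device AI provider classified through `ART_6_1_ANNEX_I` is no longer told that Art. 10 does not apply, which is the failure mode described above.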
Post-v0.6.0 correctness polish on the compliance citations and the independent verifier tests. No public API change; the only manifest-payload change is one added eu-ai-act disclosure.
CF-18 — HIPAA Safe-Harbor citation errors
- Catch-all mislabelled `(Q)` (full-face photos) instead of `(R)` (any other unique identifying number/characteristic/code). Relabelled 11 tags + the burned-in-pixel finding.
- `DeviceSerialNumber` mislabelled `(N)` (URLs) instead of `(M)` (device identifiers and serial numbers). Fixed in `verify_output.py` and the `regulatory_mapping.py` example mapping.
- Added `AI_ACT_APPLICABILITY_NOTE` disclosure (binds only the high-risk Annex III provider).
- `CITATIONS_VERIFIED_ON` (2026-06-01) carried in the DISCLAIMER.
CF-17 — drop Recital-26 anonymisation overclaim
Action.D GDPR clause claimed a dummy renders data "no longer personal data". That field-level anonymisation claim contradicts the global PSEUDONYMOUS classification (salted-hash remapped UIDs stay reversible with the withheld salt) and is the exact overclaim CNIL SAN-2024-013 (Cegedim) sanctioned. Narrowed to Art. 32(1)(a).
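To illustrate why a salted-hash remap stays pseudonymous, the sketch below shows a salt holder re-linking a remapped UID to its original. It rests on assumptions: SHA-256 and the `2.25.` UID root are illustrative choices, and `remap_uid` / `relink` are hypothetical helpers, not the tool's actual scheme.

```python
import hashlib
from typing import Iterable, Optional


def remap_uid(uid: str, salt: bytes) -> str:
    """Deterministically remap a UID under a secret salt (assumed scheme)."""
    digest = hashlib.sha256(salt + uid.encode("ascii")).digest()
    # Keep 128 bits so the result fits the 64-character DICOM UID limit.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def relink(candidates: Iterable[str], pseudonym: str, salt: bytes) -> Optional[str]:
    """With the salt, recompute the mapping and recover the original UID."""
    for uid in candidates:
        if remap_uid(uid, salt) == pseudonym:
            return uid
    return None
```

The hash itself is not inverted; whoever holds the salt and a list of candidate originals can recompute the mapping, which is why the dataset remains personal data in pseudonymised form.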
CF-14 — verify_output tests
New `tests/test_verify_output.py` (27 tests): metadata residual path, SQ recursion, cleanliness helpers, happy pixel-OCR path via fake pytesseract, multiframe mid-slice, no-pixel objects, VerificationResult logic, sampling, empty/garbage dirs, plus regression guards locking CF-18/CF-17. Coverage of `verify_output.py` ~74 -> 91 percent.
Verification
Full suite 226 passed, ruff + mypy --strict clean, `examples/verify_golden.py` exits 0.