fix(compliance): correct HIPAA Safe-Harbor citations + drop anonymisation overclaim (CF-18/CF-17) + verify_output tests (CF-14) #1

Merged: Ces107 merged 1 commit into main from fix/cf-18-cf-14-citation-tests on Jun 1, 2026


Ces107 (Owner) commented Jun 1, 2026

Post-v0.6.0 correctness polish on the compliance citations and the independent verifier tests. No public API change; the only manifest-payload change is one added eu-ai-act disclosure.

CF-18 — HIPAA Safe-Harbor citation errors

  • The free-text / quasi-identifier catch-all was mislabelled (Q) (full-face photographs) instead of (R) (any other unique identifying number, characteristic, or code). Relabelled 11 tags + the burned-in-pixel finding.
  • DeviceSerialNumber was mislabelled (N) (URLs) instead of (M) (device identifiers and serial numbers). Fixed in verify_output.py and in the regulatory_mapping.py Safe-Harbor example mapping (now ->M, was ->R).
  • ENS (RD 311/2022) citation qualified: tool evidences op.exp.8 + mp.info.6; mp.info.3 (cifrado) is NOT implemented (controller responsibility).
  • EU AI Act Art. 10 gated on applicability via a new AI_ACT_APPLICABILITY_NOTE disclosure (binds only the high-risk Annex III provider).
  • Added CITATIONS_VERIFIED_ON (2026-06-01) carried in the DISCLAIMER.
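
For reviewers who want to check the relabelling against the regulation, here is a minimal sketch of the eighteen Safe-Harbor categories as a lookup table. The descriptions are abbreviated paraphrases of 45 CFR 164.514(b)(2)(i)(A)-(R), and `SAFE_HARBOR`, `TAG_CATEGORY`, and `citation` are hypothetical names for illustration, not the structures used in verify_output.py or regulatory_mapping.py. The choice of `PatientComments` as a catch-all example is also an assumption.

```python
# HIPAA Safe Harbor identifier categories, 45 CFR 164.514(b)(2)(i)(A)-(R).
# Descriptions are abbreviated paraphrases, not the regulatory text.
SAFE_HARBOR = {
    "A": "Names",
    "B": "Geographic subdivisions smaller than a state",
    "C": "Date elements (except year) related to an individual",
    "D": "Telephone numbers",
    "E": "Fax numbers",
    "F": "Email addresses",
    "G": "Social security numbers",
    "H": "Medical record numbers",
    "I": "Health plan beneficiary numbers",
    "J": "Account numbers",
    "K": "Certificate/license numbers",
    "L": "Vehicle identifiers and serial numbers",
    "M": "Device identifiers and serial numbers",
    "N": "Web URLs",
    "O": "IP addresses",
    "P": "Biometric identifiers",
    "Q": "Full-face photographs and comparable images",
    "R": "Any other unique identifying number, characteristic, or code",
}

# Hypothetical tag-to-category mapping (illustrative only).
TAG_CATEGORY = {
    "DeviceSerialNumber": "M",  # device identifier, not (N) URLs
    "PatientComments": "R",     # free-text catch-all, not (Q) photographs
}


def citation(tag: str) -> str:
    """Build a citation string for a tag; raises KeyError for unmapped tags."""
    letter = TAG_CATEGORY[tag]
    return f"45 CFR 164.514(b)(2)(i)({letter}) {SAFE_HARBOR[letter]}"


print(citation("DeviceSerialNumber"))
print(citation("PatientComments"))
```

Keeping the letter and the description in one table means a citation string cannot pair the right letter with the wrong description, which is the class of error fixed here.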

CF-17 — drop Recital-26 anonymisation overclaim

The Action.D GDPR clause claimed that a schema-preserving dummy renders the substituted data "no longer personal data". That field-level anonymisation claim contradicts the global PSEUDONYMOUS (not anonymous) classification: salted-hash remapped UIDs stay reversible with the withheld salt, so the dataset never becomes anonymous. It is also the exact overclaim that CNIL SAN-2024-013 (Cegedim) sanctioned. The citation is narrowed to Art. 32(1)(a).
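
To illustrate why salted-hash remapping stays pseudonymous, the sketch below shows how anyone holding the salt can recompute the mapping and re-link a remapped UID to its original. It is an assumption-laden illustration, not the tool's actual remapping scheme: `remap_uid` and `relink` are hypothetical names, and the SHA-256 digest truncated to 128 bits under the `2.25.` UID root is a choice made only for this example.

```python
import hashlib


def remap_uid(original_uid: str, salt: bytes) -> str:
    """Derive a stable replacement UID from the original UID and a secret salt."""
    digest = hashlib.sha256(salt + original_uid.encode("ascii")).digest()
    # Keep 128 bits so "2.25." plus the decimal value fits DICOM's 64-char UID limit.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def relink(candidate_uids, salt: bytes, remapped_uid: str):
    """With the salt, recompute the mapping for each candidate to find the original."""
    for uid in candidate_uids:
        if remap_uid(uid, salt) == remapped_uid:
            return uid
    return None


salt = b"separately-held-secret"          # hypothetical salt value
originals = ["1.2.3.4.1", "1.2.3.4.2"]    # made-up UIDs for illustration
published = remap_uid(originals[1], salt)

# The salt holder recovers the original; a party with the wrong salt does not.
print(relink(originals, salt, published))
print(relink(originals, b"wrong-salt", published))
```

Because the mapping is deterministic given the salt, the output is re-linkable by design, which is why the dataset-level classification remains pseudonymous whatever happens to individual fields.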

CF-14 — verify_output tests

New tests/test_verify_output.py (27 tests) covers the metadata residual path, SQ recursion, cleanliness helpers, the happy pixel-OCR path via a fake pytesseract, multiframe mid-slice OCR, no-pixel objects, VerificationResult logic, sampling, and empty/garbage dirs, plus regression guards locking the CF-18/CF-17 corrections. Coverage of verify_output.py rose from ~74 to 91 percent.
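
The fake-pytesseract approach can be sketched as follows: a stand-in module is registered in `sys.modules` so that `import pytesseract` resolves to it and no tesseract binary is invoked. This is a minimal illustration, not the code in tests/test_verify_output.py. `install_fake_pytesseract` and `pixels_contain_text` are hypothetical names, and only `pytesseract.image_to_string` is mirrored from the real library.

```python
import sys
import types


def install_fake_pytesseract(canned_text: str) -> types.ModuleType:
    """Register a stand-in 'pytesseract' module that returns canned OCR text."""
    fake = types.ModuleType("pytesseract")

    def image_to_string(image, **kwargs):
        # Ignores the image; the real function would run the tesseract binary.
        return canned_text

    fake.image_to_string = image_to_string
    sys.modules["pytesseract"] = fake
    return fake


def pixels_contain_text(pixel_array) -> bool:
    """Hypothetical stand-in for a burned-in-text check in the verifier."""
    import pytesseract  # resolved from sys.modules, so the fake is picked up

    return bool(pytesseract.image_to_string(pixel_array).strip())


install_fake_pytesseract("DOE^JOHN 1970-01-01")
print(pixels_contain_text(object()))  # fake OCR reports text

install_fake_pytesseract("  \n")
print(pixels_contain_text(object()))  # whitespace only counts as clean
```

Inside a pytest test, `monkeypatch.setitem(sys.modules, "pytesseract", fake)` achieves the same registration and restores the previous entry when the test ends.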

Verification

Full suite: 226 passed; ruff and mypy --strict clean; examples/verify_golden.py exits 0.

…tion overclaim (CF-18/CF-17) + verify_output tests (CF-14)

CF-18 — two factual HIPAA 164.514(b)(2)(i) category errors in the manifest
citations, the kind a reviewing radiologist or DPO would catch:
  - verify_output.py: the free-text / quasi-identifier catch-all was
    mislabelled "(Q) Any other unique characteristic". (Q) is full-face
    photographs; the catch-all is (R) "Any other unique identifying number,
    characteristic, or code". Relabelled 11 tags plus the burned-in-pixel
    finding to (R).
  - DeviceSerialNumber was "(N) Device identifiers". (N) is URLs; device
    identifiers and serial numbers are (M). Fixed in verify_output.py and the
    regulatory_mapping.py Safe-Harbor example mapping (->M, was ->R).
  - Qualified the ENS (RD 311/2022) citation: the tool directly evidences
    op.exp.8 (audit log) and mp.info.6 (limpieza de documentos); mp.info.3
    (cifrado) is NOT implemented and remains the controller responsibility.
  - Gated EU AI Act Art. 10 on applicability: added AI_ACT_APPLICABILITY_NOTE
    surfaced as an eu-ai-act manifest disclosure. Art. 10 binds only the
    provider of a high-risk Annex III system, not anyone who de-identifies.
  - Added CITATIONS_VERIFIED_ON constant; DISCLAIMER now carries the
    2026-06-01 re-verification date.

CF-17 — dropped the Recital-26 anonymisation overclaim from the Action.D GDPR
clause. It claimed a schema-preserving dummy renders the substituted data
"no longer personal data". That field-level anonymisation claim contradicts
the global PSEUDONYMOUS (not anonymous) classification: salted-hash remapped
UIDs stay reversible with the separately-held salt, so the dataset never
becomes anonymous. Asserting otherwise is exactly the false-anonymisation
overclaim CNIL SAN-2024-013 (Cegedim) sanctioned. Citation narrowed to
Art. 32(1)(a); summary now states the dummy neutralises the field WITHOUT
anonymising the dataset.

CF-14 — new tests/test_verify_output.py (27 tests): metadata residual path,
SQ recursion, value-cleanliness helpers, the happy pixel-OCR path via a fake
pytesseract (no system tesseract binary needed), multiframe mid-slice OCR,
no-pixel objects, VerificationResult status/conclusive/coverage logic,
sampling, empty/garbage dirs, plus regression guards locking the CF-18/CF-17
corrections. verify_output.py coverage ~74 -> 91 percent.

Full suite 226 passed, ruff + mypy --strict clean, golden completeness proof
exits 0. No public API or manifest-hash-affecting change beyond the added
eu-ai-act disclosure.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fba2054536

Comment on lines +635 to +637
"in this manifest BIND ONLY the provider of a high-risk AI system within "
"the meaning of AI Act Art. 6(2) + Annex III (e.g. AI intended as a "
"medical device safety component, or otherwise listed in Annex III). If "

P1: Include the Art. 6(1) high-risk route

For providers of AI that is itself a medical device or a safety component covered by Annex I product legislation, high-risk classification comes from AI Act Art. 6(1), not Art. 6(2) + Annex III. This new disclosure says Art. 10 binds only Art. 6(2)/Annex III providers and then tells non-Annex-III users Art. 10 does not apply, so an EU medical-device AI provider can receive a manifest that incorrectly disclaims mandatory data-governance obligations. Please include the Art. 6(1)/Annex I path or remove the medical-device example from the Annex III-only gate.
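
One way to express the gate this comment asks for is to model both high-risk routes explicitly, so that the disclosure text is derived from the route rather than hard-coded to Annex III. The sketch below is illustrative only. `HighRiskRoute` and `art10_binds` are hypothetical names that do not come from the PR, and the Art. 6(1) condition is simplified (the third-party conformity assessment requirement is not modelled).

```python
from enum import Enum


class HighRiskRoute(Enum):
    """Simplified classification routes under AI Act Art. 6 (illustrative)."""

    NONE = "not high-risk"
    ART_6_1_ANNEX_I = "Art. 6(1): product or safety component under Annex I legislation"
    ART_6_2_ANNEX_III = "Art. 6(2): use case listed in Annex III"


def art10_binds(is_provider: bool, route: HighRiskRoute) -> bool:
    """Art. 10 applies to a provider of a high-risk system via either route."""
    return is_provider and route is not HighRiskRoute.NONE


# A medical-device AI provider is high-risk through the Annex I route,
# so a gate that checks only Annex III would wrongly disclaim Art. 10.
print(art10_binds(True, HighRiskRoute.ART_6_1_ANNEX_I))
print(art10_binds(True, HighRiskRoute.NONE))
```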

@Ces107 Ces107 merged commit 7e02c52 into main Jun 1, 2026
4 checks passed
@Ces107 Ces107 deleted the fix/cf-18-cf-14-citation-tests branch June 1, 2026 07:24