Add com.writerslogic.text-fingerprint.1 (fingerprint, text) #55

Merged
domguinard merged 6 commits into c2pa-org:main from dcondrey:add-text-fingerprint-alg
Jun 27, 2026

@dcondrey dcondrey commented Jun 22, 2026

Member

Add com.writerslogic.text-fingerprint.1 (fingerprint, text)

This PR adds a new entry (identifier 40) for a fingerprint-type soft
binding for text content.

Algorithm

com.writerslogic.text-fingerprint.1 computes a durable 256-bit fingerprint
from the words of a document (rather than embedding hidden markers into it):

  1. Normalize to a single character stream: Unicode NFC; remove zero-width /
    formatting characters (U+200B, U+200C, U+200D, U+FEFF, U+2060, variation
    selectors U+FE00–U+FE0F); lowercase; collapse whitespace runs to a single
    space; strip punctuation; trim.
  2. Character 4-grams: overlapping character 4-grams over the normalized
    string (sliding window of 4 chars, step 1; a single n-gram of the whole
    string when it is shorter than 4 chars). Character grams keep a single-word
    edit local, so the fingerprint stays stable on short text where word shingles
    would shift too far.
  3. SimHash-256: SHA-256 each 4-gram to a 256-bit vector; for each bit
    position sum +1/−1 across grams; the final bit is 1 when the column sum
    is > 0. Output is 32 bytes, hex-encoded.
  4. Match: Hamming distance ≤ 32 bits (12.5%) indicates the same content
    under light edits.
  5. Windowed fingerprints: the normalized stream is also split into
    overlapping windows of 512 characters with 50% overlap (step 256) and each
    window fingerprinted separately. The soft binding records the whole-document
    fingerprint as block 0 (empty scope) and one block per window
    (scope: {start, length}), so an extracted excerpt or truncated copy can be
    matched against a window block even when its whole-document fingerprint has
    drifted.
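
The five steps above can be sketched in Python as follows. This is an illustrative reading of the description, not the WritersLogic reference implementation. Several details are assumptions that the text does not specify: "punctuation" is taken to mean Unicode general category P*, grams are UTF-8 encoded before SHA-256, bit 0 is the least significant bit of the big-endian digest, the last window may be shorter than 512 characters, and the `value` key in each block is a hypothetical name.

```python
import hashlib
import re
import unicodedata

# Zero-width / formatting characters listed in step 1 (U+FE00..U+FE0F inclusive).
_STRIP = {0x200B, 0x200C, 0x200D, 0xFEFF, 0x2060, *range(0xFE00, 0xFE10)}


def normalize(text: str) -> str:
    """Step 1, applied in the order the description lists the operations."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if ord(ch) not in _STRIP)
    text = text.lower()
    text = re.sub(r"\s+", " ", text)
    # Assumption: punctuation = Unicode general category starting with "P".
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return text.strip()


def char_ngrams(s: str, n: int = 4) -> list:
    """Step 2: overlapping n-grams, or the whole string when it is shorter than n."""
    if len(s) < n:
        return [s]
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def simhash256(s: str) -> bytes:
    """Step 3: per-bit +1/-1 vote over the SHA-256 digest of every gram."""
    sums = [0] * 256
    for gram in char_ngrams(s):
        h = int.from_bytes(hashlib.sha256(gram.encode("utf-8")).digest(), "big")
        for bit in range(256):
            sums[bit] += 1 if (h >> bit) & 1 else -1
    value = 0
    for bit, total in enumerate(sums):
        if total > 0:  # the final bit is 1 only when the column sum is > 0
            value |= 1 << bit
    return value.to_bytes(32, "big")


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length fingerprints."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def is_match(a: bytes, b: bytes, threshold: int = 32) -> bool:
    """Step 4: same content under light edits when the distance is <= 32 bits."""
    return hamming(a, b) <= threshold


def fingerprint_blocks(text: str, window: int = 512, step: int = 256) -> list:
    """Step 5: block 0 covers the whole document, then one block per window."""
    stream = normalize(text)
    blocks = [{"scope": {}, "value": simhash256(stream).hex()}]
    for start in range(0, len(stream), step):
        chunk = stream[start:start + window]
        blocks.append({
            "scope": {"start": start, "length": len(chunk)},
            "value": simhash256(chunk).hex(),
        })
        if start + window >= len(stream):  # assumption: stop once the tail is covered
            break
    return blocks
```

Under this sketch, an excerpt would be normalized and fingerprinted on its own, then compared with `is_match` against every window block rather than only against block 0.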

Why a computed fingerprint (vs. an embedded ZWC watermark)

The fingerprint is derived from the visible words and recorded in the
agent-signed C2PA manifest, so it is non-destructive (nothing is added to
the document), normalization-proof (reformatting, case, whitespace, and
punctuation changes are normalized away before hashing), ZWC-immune (normalization
removes zero-width characters and variation selectors, so injecting them does
not change the fingerprint), and forge / transfer-resistant (the value is
bound by the manifest signature and cannot be lifted onto another document
without re-signing).

Honest limit: the fingerprint is robust to light edits and formatting changes
(including a single-word edit on a one-sentence snippet, thanks to character
4-grams), but not to paraphrase.
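
One way to see why this limit exists is to compare how many character 4-grams survive a single-word edit versus a paraphrase. The sketch below is illustrative only: the Jaccard overlap of gram sets is not part of the algorithm (it is used here as a rough proxy for how many SimHash votes stay the same), and the three sentences are made-up examples that are already in normalized form.

```python
def gram_set(s: str, n: int = 4) -> set:
    """Set of overlapping character n-grams (whole string if shorter than n)."""
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: str, b: str) -> float:
    """Share of distinct 4-grams that two strings have in common."""
    ga, gb = gram_set(a), gram_set(b)
    return len(ga & gb) / len(ga | gb)


# Hypothetical sample sentences, already lowercase and punctuation-free.
original = "the quick brown fox jumps over the lazy dog"
one_word_edit = "the quick brown fox leaps over the lazy dog"
paraphrase = "a speedy russet fox leaps across the sleepy hound"

edit_overlap = jaccard(original, one_word_edit)
paraphrase_overlap = jaccard(original, paraphrase)
print(f"single-word edit overlap: {edit_overlap:.2f}")
print(f"paraphrase overlap:       {paraphrase_overlap:.2f}")
```

The single-word edit only disturbs the few grams that touch the changed letters, while the paraphrase replaces almost all of them, so most per-bit votes feeding the fingerprint change as well.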

This algorithm is part of the WritersLogic CPoE proof-of-effort authorship
attestation system, alongside the existing com.writerslogic.zwc-watermark.1
entry (identifier 29).

Checklist

  • Entry conforms to the schema and includes all mandatory fields.
  • File remains valid JSON (python -m json.tool).
  • Next free integer identifier (40) assigned; no collisions.
  • Submitted by an individual affiliated with WritersLogic.
  • informationalUrl provided.
  • Commit signed off (DCO).
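
The JSON validity and identifier checks in this list can be approximated with a short script. This is a minimal sketch, assuming the algorithm list is a top-level JSON array of objects that each carry an integer `identifier` field; the helper name `check_identifiers` is hypothetical and the sample data in the usage note is invented.

```python
import json
from collections import Counter


def check_identifiers(raw_json: str):
    """Parse the list and report duplicate identifiers and the next free one.

    json.loads raises json.JSONDecodeError if the text is not valid JSON,
    which mirrors what `python -m json.tool` would report.
    """
    entries = json.loads(raw_json)
    # Assumption: every entry is an object with an integer "identifier".
    ids = [entry["identifier"] for entry in entries]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    next_free = max(ids, default=0) + 1
    return duplicates, next_free


# Usage with made-up data: identifier 2 collides, so 3 is the next free value.
sample = '[{"identifier": 1}, {"identifier": 2}, {"identifier": 2}]'
duplicates, next_free = check_identifiers(sample)
print(duplicates, next_free)
```

A local check like this only sees the identifiers already in the file; it cannot detect an identifier that is reserved by another open pull request.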

@dcondrey dcondrey force-pushed the add-text-fingerprint-alg branch from 5cd7750 to 0bfa35e Compare June 22, 2026 19:15
@dcondrey dcondrey requested review from domguinard and mrappard June 22, 2026 19:26
Signed-off-by: David Condrey <david@writerslogic.com>
@dcondrey dcondrey force-pushed the add-text-fingerprint-alg branch from 0bfa35e to f848e74 Compare June 22, 2026 19:29

@mrappard mrappard left a comment

Contributor

I would approve this if (https://writersproof.com/cpoe/text-fingerprint) had more information.

@domguinard domguinard left a comment

Collaborator

Thanks a lot for the entry, we reviewed this during the WM TF meeting and would ask if you could add some information on the reference page as it is currently blank. No need for a great deal of details, a marketing page would do for instance.

dcondrey commented Jun 26, 2026

Member Author

Thanks for the review. The reference page is now published at https://docs.writerslogic.com/soft-binding/text-fingerprint with the algorithm details — normalization, the 256-bit SimHash over character 4-grams, the windowed scoped blocks, the match threshold, and limitations — and the entry's informationalUrl points there. Happy to expand it further if useful.

@domguinard domguinard self-requested a review June 27, 2026 07:58

@domguinard domguinard left a comment

Collaborator

Thanks a lot for the documentation page, please approve the ID change as ID 40 is already in the queue.
With that approval we should be good to go.

Comment thread softbinding-algorithm-list.json Outdated

@domguinard domguinard left a comment

Collaborator

I was able to accept the change myself.

@domguinard domguinard merged commit b9b906b into c2pa-org:main Jun 27, 2026
1 check passed
@domguinard

Collaborator

Merged, thanks a lot for your entry @dcondrey.
