fix: anonymize all personal data from framework codebase #42

Merged: simongonzalezdc merged 1 commit into main from fix/anonymize-personal-data on May 27, 2026
Conversation

@simongonzalezdc simongonzalezdc commented May 27, 2026

Removes all personal identifiers that leaked into the public framework:

Code defaults:

  • cli.py: --owner default was Pastorsimon1798 → now reads from ARCHAEOLOGY_GITHUB_OWNER env var
  • github_fetcher.py: same _DEFAULT_OWNER fix
  • agent_benchmark.py: normalize_author() returned 'Simon' for any human author → now returns 'Human' for all non-AI names; CSS/JS token --simon → --human
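
As a rough illustration of the owner default described above, here is a minimal sketch that resolves the owner from ARCHAEOLOGY_GITHUB_OWNER and falls back to an empty string. The helper name `default_owner` and its `environ` parameter are hypothetical and exist only so the lookup can be exercised in isolation; the PR itself inlines `os.environ.get(...)` as the option default.

```python
import os

# Environment variable name taken from the PR description.
ENV_VAR = "ARCHAEOLOGY_GITHUB_OWNER"


def default_owner(environ=None):
    """Return the configured GitHub owner, or "" when the variable is unset.

    `environ` is a hypothetical injection point for testing; by default the
    real process environment is used.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "")


if __name__ == "__main__":
    # Prints the owner from the environment, or an empty string if unset.
    print(repr(default_owner()))
```

Note that a default computed this way inside a decorator is evaluated once, when the module is imported. Click also accepts an `envvar` parameter on options, which may be worth considering as an alternative.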

References in source:

  • api.py: removed 'The-Factory' from module docstring
  • db/queries.py: removed 'Liminal case study' from comment
  • report.py: removed 'Liminal' from demo README text

Demo deliverables:

  • Both deliverables/archaeology.html and deliverables/visuals/archaeology.html had a callout with 'Jake Van Clief', 'Simon's first-ever PR', and 'mcp-video' — replaced with anonymized generic equivalent


Summary by CodeRabbit

  • Configuration

    • GitHub owner configuration now uses environment variable (ARCHAEOLOGY_GITHUB_OWNER) for improved flexibility instead of hardcoded defaults.
  • Visualization

    • Agent benchmark visualizations simplified with generic "Human" agent labeling and updated color theming.
  • Content Updates

    • Removed product-specific references from generated reports and case studies.
    • Updated privacy exclusion descriptions in exported documentation.
    • Demo project content updated with context methodology insights instead of person-specific credits.
    • Improved database schema compatibility documentation.

- cli.py: replace hardcoded 'Pastorsimon1798' default owner with
  ARCHAEOLOGY_GITHUB_OWNER env var (empty default)
- github_fetcher.py: same env var substitution for _DEFAULT_OWNER
- agent_benchmark.py: normalize_author now returns 'Human' for all
  non-AI authors (no personal names); rename CSS/JS token simon→human
- api.py: remove 'The-Factory' project reference from module docstring
- db/queries.py: remove 'Liminal case study' reference from comment
- report.py: remove 'Liminal' project name from demo README text
- demo deliverables: replace 'Simon', 'Jake Van Clief', 'mcp-video'
  narrative with generic anonymized equivalent
@simongonzalezdc simongonzalezdc merged commit 93267f5 into main May 27, 2026
2 of 12 checks passed
coderabbitai Bot commented May 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1bc7cd87-eee8-4347-93ec-be9707627872

📥 Commits

Reviewing files that changed from the base of the PR and between f8e74ce and 56ca71d.

📒 Files selected for processing (8)
  • archaeology/api.py
  • archaeology/cli.py
  • archaeology/db/queries.py
  • archaeology/report.py
  • archaeology/visualization/agent_benchmark.py
  • archaeology/visualization/github_fetcher.py
  • projects/demo-project/deliverables/archaeology.html
  • projects/demo-project/deliverables/visuals/archaeology.html

Walkthrough

This PR removes hardcoded references to a specific person (Simon) and product (Liminal/The-Factory) across multiple modules. Configuration defaults shift from hardcoded to environment variables, agent visualization logic treats unknown authors as generic "Human" instead of preserving names, and documentation is updated to use neutral language.
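
The author-labeling behavior described here could look roughly like the sketch below. It is not the actual code in agent_benchmark.py: the `AI_MARKERS` tuple and the choice to pass AI names through unchanged are assumptions made for illustration. Only the rule that every non-AI author maps to "Human" comes from the PR.

```python
# Assumed substrings that identify AI or bot authors; the real module may
# detect them differently.
AI_MARKERS = ("claude", "copilot", "codex", "[bot]")


def normalize_author(name: str) -> str:
    """Map any non-AI author to the generic label "Human".

    AI author names are returned unchanged here, which is an assumption
    about the real function's behavior.
    """
    lowered = name.lower()
    for marker in AI_MARKERS:
        if marker in lowered:
            return name
    # No personal names survive normalization.
    return "Human"


if __name__ == "__main__":
    for author in ("Jane Doe", "dependabot[bot]", "Claude"):
        print(author, "->", normalize_author(author))
```

Collapsing every human author into one label keeps personal names out of the generated visualizations, at the cost of no longer distinguishing between individual human contributors.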

Changes

Depersonalization and Generalization

| Layer / File(s) | Summary |
|---|---|
| GitHub owner configuration externalization: archaeology/visualization/github_fetcher.py, archaeology/cli.py | GitHub owner defaults moved from hardcoded "Pastorsimon1798" to environment variable ARCHAEOLOGY_GITHUB_OWNER with empty-string fallback; fetch-github CLI option, dashboard command, and publish-static command owner_labels mappings updated accordingly. |
| Agent benchmark author labeling and visualization: archaeology/visualization/agent_benchmark.py | Author normalization treats non-AI authors as generic "Human" label; CSS theme token changed from --simon to --human; JavaScript color extraction and agent-to-color mapping updated; metrics table rework-rate color logic uses colors.human instead of colors.simon. |
| API and database documentation updates: archaeology/api.py, archaeology/db/queries.py | Module docstring changed to refer to "external consumers" instead of "The-Factory"; get_eras docstring updated to document fallback behavior across schema versions. |
| Case study and project documentation narratives: archaeology/report.py, projects/demo-project/deliverables/archaeology.html, projects/demo-project/deliverables/visuals/archaeology.html | Case study README content and demo project callouts changed from "KEY PERSON — Jake Van Clief" to "KEY INSIGHT — Context Methodology"; "Liminal" and "YouTube export" references removed in favor of generalized descriptions. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


@simongonzalezdc simongonzalezdc deleted the fix/anonymize-personal-data branch May 27, 2026 02:57

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56ca71d3f4

Comment thread archaeology/cli.py

 @main.command("fetch-github")
-@click.option("--owner", default="Pastorsimon1798", help="GitHub username/org")
+@click.option("--owner", default=os.environ.get("ARCHAEOLOGY_GITHUB_OWNER", ""), help="GitHub username/org")
P2: Require an owner before calling gh

When ARCHAEOLOGY_GITHUB_OWNER is unset and the user runs archaeology fetch-github without --owner, this default becomes "", but the fetcher still passes it as a positional owner to gh repo list rather than omitting the optional argument. The GitHub CLI manual documents the syntax as gh repo list [<owner>], so an explicit empty string can make the default command fail or fetch no repos, instead of either falling back to the authenticated user or giving an actionable error. Validate this value, or omit the positional argument when it is blank.
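
One way to act on this suggestion is to build the gh argument list so that a blank owner is never passed as a positional argument. The sketch below is a hypothetical helper, not code from github_fetcher.py: the name `build_repo_list_args` is invented for illustration, and the real fetcher may pass additional flags that are not shown here.

```python
from typing import List


def build_repo_list_args(owner: str) -> List[str]:
    """Build the argv for `gh repo list [<owner>]`.

    The optional owner is appended only when it is non-blank, so an unset
    ARCHAEOLOGY_GITHUB_OWNER never becomes an empty positional argument.
    """
    args = ["gh", "repo", "list"]
    owner = owner.strip()
    if owner:
        args.append(owner)
    return args


if __name__ == "__main__":
    print(build_repo_list_args(""))         # owner omitted
    print(build_repo_list_args("octocat"))  # owner passed through
```

The other option raised in the review, failing early with an actionable error when the owner is blank, would replace the `if owner:` branch with an explicit check; which behavior is preferable depends on whether defaulting to the authenticated user is acceptable for this tool.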
