
feat: robust i18n pipeline with CI integration and nested key support #5

Open
0xPepeSilvia wants to merge 6 commits into tari-project:main from 0xPepeSilvia:feat/i18n-pipeline-v2

@0xPepeSilvia

Closes #1

Summary

  • Replaces four standalone scripts with a proper aiteen Python package (audit, translate, patch, qa, cli, config modules)
  • Adds nested-key support: arbitrarily deep JSON trees are flattened to dotted paths and compared correctly (root-only comparison was the documented failure in PR #2)
  • Adds deep-merge when writing translations back: nested sections are merged key by key instead of being replaced wholesale by a shallow dict.update() (another PR #2 failure)
  • Self-contained: no translation-management SaaS (unlike PR #3 / Tolgee, which requires an account and network egress); the only external service is the OpenAI API
  • Full CLI (aiteen audit | translate | patch | qa | run-all) with --dry-run, --fail-on-missing, --fail-on-issue flags
  • GitHub Actions CI for automatic translation on locale changes + pytest matrix across Python 3.10–3.12
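The nested-key comparison in the summary can be illustrated with a minimal Python sketch. It is not the code in aiteen/audit.py: the names flatten and find_missing only mirror the pointers in the Acceptance Criteria section, and treating lists and empty objects as leaves is an assumption.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested locale dict into {"dotted.path": leaf_value}."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects; their leaves become dotted paths.
            flat.update(flatten(value, path))
        else:
            # Assumption: strings, numbers, lists and empty dicts are all leaves.
            flat[path] = value
    return flat


def find_missing(source: dict[str, Any], target: dict[str, Any]) -> list[str]:
    """Return dotted paths present in the source locale but absent from the target."""
    target_keys = flatten(target).keys()
    return sorted(k for k in flatten(source) if k not in target_keys)


en = {"title": "Wallet", "settings": {"mining": {"start": "Start", "stop": "Stop"}}}
de = {"title": "Geldbörse", "settings": {"mining": {"start": "Starten"}}}
print(find_missing(en, de))  # ['settings.mining.stop']
```

For the same two dicts, a root-only check such as set(en) - set(de) returns an empty set, which is the failure mode attributed to PR #2.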

Acceptance Criteria

| AC | Status | Pointer |
|---|---|---|
| Detects missing top-level keys | PASS | aiteen/audit.py::find_missing, tests/test_audit.py::test_detects_missing_top_level_keys |
| Detects missing nested keys | PASS | aiteen/audit.py::flatten, tests/test_audit.py::test_detects_missing_nested_keys |
| Preserves placeholders ({count}, <b>) in translations | PASS | aiteen/translate.py::PLACEHOLDER_RE, tests/test_translate.py::test_placeholder_* |
| QA catches placeholder stripping | PASS | aiteen/qa.py::qa_locale, tests/test_qa.py::test_catches_placeholder_stripping |
| Deep merge (not shallow) | PASS | aiteen/patch.py::deep_merge, tests/test_patch.py::test_deep_merge_nested |
| Clear errors (no raw tracebacks) | PASS | aiteen/cli.py::_handle_error, adversarial testing |
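The placeholder criteria can be illustrated with a sketch that extracts placeholders from the source and translated strings and compares them as multisets. The regex assumes placeholders are {name} variables or simple HTML tags such as <b>; the real PLACEHOLDER_RE and the issue format produced by aiteen/qa.py may differ, and check_placeholders is a hypothetical name.

```python
import re

# Assumption: placeholders are {name} variables or simple tags like <b>, </b>, <br/>.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z0-9_]+\}|</?[A-Za-z][A-Za-z0-9]*\s*/?>")


def placeholders(text: str) -> list[str]:
    """Return placeholders sorted, so reordering is allowed but drops are not."""
    return sorted(PLACEHOLDER_RE.findall(text))


def check_placeholders(key: str, source: str, translated: str) -> list[dict[str, object]]:
    """Report a mismatch when a translation drops, adds or alters a placeholder."""
    expected, actual = placeholders(source), placeholders(translated)
    if expected == actual:
        return []
    return [{"key": key, "type": "PLACEHOLDER_MISMATCH",
             "expected": expected, "actual": actual}]


print(check_placeholders("inbox.count", "You have <b>{count}</b> messages",
                         "Tienes <b>{count}</b> mensajes"))  # []
print(check_placeholders("inbox.count", "You have <b>{count}</b> messages",
                         "Tienes <b>muchos</b> mensajes"))
```

Comparing sorted lists rather than sets means a translation that duplicates or omits one of two identical placeholders is still flagged.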

Differentiators vs Competitor PRs

PR #2 (ledgerpilot):

  • Uses the deprecated openai.Completion API, which was removed in openai>=1.0, so the first translation call fails
  • Only compares root-level keys; nested JSON objects are silently ignored
  • Uses dict.update() (shallow merge) — overwrites entire nested sections
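The dict.update() problem is easiest to see side by side. The deep_merge below is a hypothetical reimplementation for illustration, not necessarily what aiteen/patch.py::deep_merge does: nested dicts are merged key by key, and any non-dict value from the incoming side wins.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge incoming into a copy of base; neither argument is mutated."""
    merged = copy.deepcopy(base)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf values (or type changes) from incoming replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


existing = {"settings": {"title": "Einstellungen", "mining": {"start": "Starten"}}}
new = {"settings": {"mining": {"stop": "Stoppen"}}}

shallow = dict(existing)
shallow.update(new)
print(shallow)  # {'settings': {'mining': {'stop': 'Stoppen'}}}
# ...title and mining.start are gone.

print(deep_merge(existing, new))
# {'settings': {'title': 'Einstellungen', 'mining': {'start': 'Starten', 'stop': 'Stoppen'}}}
```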

PR #3 (Tolgee):

  • Requires a Tolgee SaaS account, API token, and continuous network egress
  • By contrast, this solution is self-contained apart from the OpenAI API: one pip install and one OPENAI_API_KEY environment variable

PR #4:

  • Net negative: removes more functionality than it adds

Test Results

74 passed in 3.79s

All 74 tests pass across: test_audit.py (18), test_patch.py (16), test_qa.py (15), test_translate.py (25).

CI Workflows

  • .github/workflows/translate.yml — triggers on **/locales/en/**/*.json changes in any PR, runs aiteen run-all, auto-commits translations
  • .github/workflows/test.yml — runs pytest on Python 3.10, 3.11, 3.12 on push and PR to main

Config Examples

tari-project/universe (examples/universe.yaml): 12 languages (ar, de, es, fr, id, ja, ko, pt, ru, tr, vi, zh)

WXTM Bridge (examples/wxtm-bridge.yaml): 4 languages (ar, de, es, fr)

Usage:

pip install aiteen
aiteen --config examples/universe.yaml run-all

Adversarial Testing Passed

  • Malformed JSON → Error: Malformed JSON in de/common.json: line 1 col 2 (exit 1, no stack trace)
  • Missing OPENAI_API_KEY → Error: OPENAI_API_KEY is not set. Provide it via environment variable... (exit 1)
  • Only-nested-keys locale → audit correctly detects all missing keys
  • Non-existent locales dir → Error: Locales directory not found: /path/to/dir (exit 1)
  • QA catches {count} stripped from translation → PLACEHOLDER_MISMATCH issue reported
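The adversarial cases above come down to catching expected failures and printing one line instead of a traceback. This is a minimal sketch with hypothetical names (AiteenError, parse_locale, load_locale, require_api_key, main) that are not taken from the PR. It relies only on the fact that json.JSONDecodeError exposes lineno and colno, which is enough to produce a message in the style shown for de/common.json.

```python
import json
import os
import sys
from pathlib import Path
from typing import Any


class AiteenError(Exception):
    """Hypothetical user-facing error: reported as one line, never as a traceback."""


def parse_locale(text: str, label: str) -> dict[str, Any]:
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        # JSONDecodeError carries lineno/colno, so the message can point at the defect.
        raise AiteenError(f"Malformed JSON in {label}: line {e.lineno} col {e.colno}") from None
    if not isinstance(data, dict):
        raise AiteenError(f"Expected a JSON object at the top level of {label}")
    return data


def load_locale(path: Path) -> dict[str, Any]:
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        raise AiteenError(f"Locale file not found: {path}") from None
    return parse_locale(text, str(path))


def require_api_key() -> str:
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise AiteenError("OPENAI_API_KEY is not set. Provide it via environment variable.")
    return key


def main(paths: list[str]) -> int:
    """Return a process exit code instead of letting exceptions escape."""
    try:
        for name in paths:
            load_locale(Path(name))
    except AiteenError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    return 0
```

A CLI entry point would then call sys.exit(main(...)); only AiteenError is converted to a message, so genuine bugs still surface with a traceback.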

🤖 Generated with Claude Code

0xPepeSilvia and others added 5 commits April 15, 2026 06:02
Replaces legacy flat-file scripts with a proper Python package (aiteen/).
Adds CLI (click), config loading (YAML + dotenv), deep-merge patch logic,
placeholder-aware QA, and full nested-key audit support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
74 tests covering: nested-key detection (the case competitor PR tari-project#2 missed),
placeholder preservation, deep-merge correctness, QA placeholder mismatch
reporting, retry logic, and missing-API-key error handling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
translate.yml: triggers on changes to locales/en/**/*.json, runs the full
aiteen pipeline, and commits the results via git-auto-commit-action.
test.yml: runs pytest on push/PR across Python 3.10, 3.11, 3.12.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
examples/universe.yaml — 12 target languages for tari-project/universe.
examples/wxtm-bridge.yaml — 4 target languages for the WXTM Bridge frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces Aiteen v2, a robust i18n translation pipeline for Tari projects, featuring modules for auditing missing translations, OpenAI-driven translation, locale file patching, and quality assurance. The project structure has been updated to use setuptools, and a comprehensive test suite is included. Feedback identifies a logic flaw in the patching module that could lead to data loss during the unflattening of nested keys, and points out a redundant return statement in the CLI's error handling.

Comment thread aiteen/patch.py Outdated
Comment on lines +48 to +56
# Convert existing to flat, override, then unflatten — preserves order
# of source keys while letting us write nested dotted keys cleanly.
flat_existing = flatten(existing)
for key, val in flat_translations.items():
    flat_existing[key] = val
merged = unflatten(flat_existing)
# Deep-merge against the original existing dict to keep any keys
# `flatten` may have collapsed (e.g. empty-dict leaves).
merged = deep_merge(existing, merged)

high

The current logic for merging translations is flawed and can lead to data loss. By flattening the existing data and the new translations into a single dictionary and then unflattening, you create conflicts when one key is a prefix of another (e.g., a and a.b). The unflatten function is not designed to handle this and will destructively overwrite values depending on key order, causing either existing data or new translations to be lost.

A safer and simpler approach is to unflatten only the new translations and then deep-merge the resulting structure into the existing data. This correctly handles adding nested keys without the risk of data loss from key conflicts during the unflattening process.

Suggested change
-# Convert existing to flat, override, then unflatten — preserves order
-# of source keys while letting us write nested dotted keys cleanly.
-flat_existing = flatten(existing)
-for key, val in flat_translations.items():
-    flat_existing[key] = val
-merged = unflatten(flat_existing)
-# Deep-merge against the original existing dict to keep any keys
-# `flatten` may have collapsed (e.g. empty-dict leaves).
-merged = deep_merge(existing, merged)
+# Unflatten the new translations and deep-merge them into the existing data.
+new_translations = unflatten(flat_translations)
+merged = deep_merge(existing, new_translations)
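The reviewer's suggested fix can be sketched end to end. unflatten, deep_merge and patch_locale below are hypothetical stand-ins for the helpers in aiteen/patch.py. One assumption goes beyond the review: this unflatten raises ValueError on a prefix conflict such as "a" and "a.b" instead of silently overwriting, so the data-loss case the review describes becomes a visible error.

```python
import copy
from typing import Any


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Turn {"a.b": v} into {"a": {"b": v}}, refusing prefix conflicts."""
    tree: dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = tree
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # "a" already holds a non-dict value, so "a.b" cannot go under it.
                raise ValueError(f"Key conflict: {part!r} is a leaf but {dotted!r} needs it as a parent")
            node = child
        if isinstance(node.get(leaf), dict):
            # A subtree already exists here; overwriting it would drop nested keys.
            raise ValueError(f"Key conflict: {dotted!r} would overwrite a nested section")
        node[leaf] = value
    return tree


def deep_merge(base: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge incoming into a copy of base."""
    merged = copy.deepcopy(base)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def patch_locale(existing: dict[str, Any], flat_translations: dict[str, Any]) -> dict[str, Any]:
    # The suggestion: unflatten only the new translations, then deep-merge.
    return deep_merge(existing, unflatten(flat_translations))


existing = {"home": {"title": "Inicio", "subtitle": "Hola"}}
merged = patch_locale(existing, {"home.cta": "Empezar", "footer.legal": "Términos"})
print(merged)
# {'home': {'title': 'Inicio', 'subtitle': 'Hola', 'cta': 'Empezar'}, 'footer': {'legal': 'Términos'}}
```

Because only the new translations are unflattened, existing siblings such as home.title survive. This deep_merge still lets a nested translation replace an existing string stored at the same key; whether that case should be an error is a policy choice the sketch leaves open.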

Comment thread aiteen/cli.py Outdated
    cfg = _make_config(ctx.obj)
except (click.UsageError, FileNotFoundError, ValueError, OSError) as e:
    _handle_error(e)
    return

medium

This return statement is unreachable because _handle_error on the previous line calls sys.exit(1), which terminates the program. This redundant code can be removed.
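The review point rests on sys.exit raising SystemExit, so nothing after the call runs. This is a small sketch with hypothetical stand-ins (handle_error, run_all) for the CLI helpers. Annotating the helper with typing.NoReturn is an optional extra, not something the PR is known to do; it lets a type checker such as mypy (with --warn-unreachable) treat code after the call as dead.

```python
import sys
from typing import NoReturn


def handle_error(err: Exception) -> NoReturn:
    """Print a one-line message and exit; NoReturn tells type checkers it never returns."""
    print(f"Error: {err}", file=sys.stderr)
    sys.exit(1)


def run_all(config_path: str) -> str:
    try:
        if not config_path.endswith(".yaml"):
            raise ValueError(f"Unsupported config file: {config_path}")
        cfg = config_path
    except ValueError as e:
        handle_error(e)
        # No `return` needed: sys.exit raised SystemExit, so control never gets here.
    return f"running with {cfg}"
```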

- patch.py: unflatten only the new translations and deep-merge into
  existing instead of flattening both dicts together; the old approach
  caused data loss when one key was a prefix of another (e.g. 'a' and
  'a.b') because unflatten would destructively overwrite values
- cli.py: remove unreachable `return` after `_handle_error(e)` in
  run_all(); _handle_error calls sys.exit(1) so the return never executes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@0xPepeSilvia
Author

Both gemini issues addressed in 9fa96c7:

HIGH — data loss in patch.py: Removed the flatten-merge-unflatten pattern. Now we unflatten only the new translations and deep_merge them into the existing structure directly. This eliminates the prefix-key conflict (e.g. a vs a.b) that caused unflatten to destructively overwrite values.

MEDIUM — unreachable return in cli.py: Removed the return on line 185; _handle_error calls sys.exit(1) so the statement was never reached.

Development

Successfully merging this pull request may close these issues.

Replace with a supported robust solution

1 participant