Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a31aecb435
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
🚀 Preview DeploymentYour documentation preview is ready! Preview URL: https://pr-126.comapeo-docs.pages.dev 🔄 Content: Regenerated 5 pages from Notion (script changes detected)
This preview will update automatically when you push new commits to this PR. Built with commit fd88bc7 |
- Use Notion pages.retrieve API directly in fetchPage instead of invalid database query filter on non-existent "id" property - Add process tracking to job cancellation so DELETE handler actually kills running child processes and prevents status overwrite - Delete persisted job files during cleanup to prevent old jobs from reappearing after restart
- PRD.md: comprehensive review checklist with 16 tasks covering security, architecture, testing, and production readiness - PRD_DOCKER_IMAGE.md: specification for Docker Hub GitHub Action workflow with multi-platform support, path filtering, and PR preview tags
- API_REVIEW.md: 16-task review plan for PR #126 with complexity levels and priority ordering. Covers security, architecture, reliability, and production readiness concerns - TEST_IMPROVEMENT.md: comprehensive test improvement plan addressing 20 failing tests, code duplication issues, and missing HTTP integration coverage
🐳 Docker Image PublishedYour Docker image has been built and pushed for this PR. Image Reference: Platforms: linux/amd64, linux/arm64 TestingTo test this image: docker pull docker.io/communityfirst/comapeo-docs-api:pr-126
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-126Built with commit fd88bc7 |
✅ PR #129 Merge Readiness Checklist (updated)Review findings / correctness
Docker / env completeness
Final validation
|
|
Too many files changed for review. ( |
Refactor Notion script logic into reusable, API-callable modules that can be invoked from APIs, tests, or other tools without CLI dependencies. Core modules: - fetchPages: Fetch all pages from Notion database - fetchPage: Fetch a single page by ID - generateMarkdown: Generate markdown files from Notion pages - generatePlaceholders: Generate placeholder content for empty pages - validateConfig: Validate Notion API configuration - getHealthStatus: Check health of Notion API service All functions return ApiResult<T> with structured error handling, execution time tracking, and consistent metadata. Includes: - Pure functions with explicit config parameters - Progress callback support for long-running operations - Type-safe interfaces for all operations - Comprehensive test coverage (21 tests) Related to: PRD.md task "Refactor Notion script logic into reusable modules callable from API"
Add comprehensive documentation test suite that verifies and documents module purity across the codebase. This establishes: 1. Purity Categories: - PURE: No side effects, output depends only on inputs - ISOLATED_IMPURE: Side effects are isolated and documented - CONFIG_DEPENDENT: Depends on environment variables 2. Module Classifications: - imageCompressor: ISOLATED_IMPURE (uses spawn for pngquant) - utils.ts: PURE (all utility functions) - notion-api/modules.ts: PURE with dependency injection - notionClient.ts: CONFIG_DEPENDENT (needs refactoring) 3. Guidelines for new modules: - Prefer pure functions with explicit configuration - Isolate external dependencies with documentation - Avoid environment variable dependencies - Use dependency injection for testability The test suite documents current architecture decisions and provides guidance for future development.
- Implement HTTP API server using Bun's native serve() - Add job tracking system with in-memory state management - Support 7 job types: notion:fetch, notion:fetch-all, notion:translate, and 4 status update workflows - Add endpoints: GET /health, GET /jobs/types, GET /jobs, POST /jobs, GET /jobs/:id - Include job progress tracking and status updates - Add comprehensive test suite with 36 passing tests - Configure npm scripts: api:server, api:server:dev, test:api-server
Add comprehensive validation tests to verify API routes match required operations and response shapes per PRD requirement. Tests validate: - All 7 required job types are supported - Correct response shapes for all endpoints (health, jobs/types, jobs) - Job status transitions (pending -> running -> completed/failed) - CORS headers configuration - Error response consistency - Request validation for job types and options - All 5 required endpoints are defined All 53 tests pass (36 existing + 17 new validation tests).
Implement a minimal job queue with: - Configurable concurrency limit to control parallel job execution - Job cancellation support for both queued and running jobs - Automatic queue processing when slots become available - Integration with existing JobTracker for state management Key features: - JobQueue class with registerExecutor, add, cancel, and getStatus methods - createJobQueue factory for pre-configured queues with all job types - AbortSignal-based cancellation for graceful job termination - Comprehensive test coverage including concurrency enforcement and cancellation Co-authored-by: Claude <claude@anthropic.com>
Add comprehensive test suite covering: - Multiple simultaneous job additions (Promise.all batching) - FIFO order preservation under concurrency constraints - Concurrency limit enforcement under rapid concurrent requests - Job additions during active queue processing - Accurate running/queued count tracking during concurrent operations - Race condition handling in processQueue - Concurrent cancellation requests - Queue integrity with mixed add/cancel operations - getStatus() thread safety during concurrent operations - Prevention of job starvation under continuous load - Concurrent getQueuedJobs/getRunningJobs calls These tests verify the queue maintains correctness and integrity when handling concurrent HTTP requests typical of API server workloads.
…rvability - Add file-based job persistence using JSON format in .jobs-data directory - Implement log capture with both file and console output - Integrate persistence into job-tracker (load on startup, save on updates) - Integrate log capture into job-executor for job execution logging - Add comprehensive tests for persistence functionality (28 tests) - Update all test files with proper cleanup for persisted data - Add .jobs-data/ to .gitignore Implements PRD requirement for job status persistence and log capture.
Add comprehensive API endpoints for Notion job lifecycle management: - Add DELETE /jobs/:id endpoint for cancelling pending/running jobs - Add query parameter filtering to GET /jobs (?status=, ?type=) - Update CORS headers to support DELETE method - Add tests for job cancellation and filtering scenarios - Update console help with new endpoints and examples The API now supports complete CRUD operations for job lifecycle: - Create: POST /jobs - Read: GET /jobs, GET /jobs/:id - Update: Job status via execution - Delete: DELETE /jobs/:id (cancel operation) Job filtering allows querying by status (pending, running, completed, failed) and job type (notion:fetch, notion:fetch-all, etc.) with optional combined filters.
Per PRD requirement: "Review: confirm endpoint list is minimal and sufficient" Adds comprehensive test suite validating: - Exactly 6 endpoints exist (no redundancy) - Complete CRUD coverage (sufficiency) - All required job lifecycle operations - Query parameter filtering (not separate endpoints) - REST conventions (GET/POST/DELETE) - No redundant purposes - Discovery endpoints (/health, /jobs/types) - HATEOAS-like response structure All 25 tests pass.
Add comprehensive input validation and error handling for all API endpoints to improve security and provide better error messages. Changes: - Add ValidationError class for typed validation errors - Add isValidJobStatus() function for status validation - Add isValidJobId() function with path traversal prevention - Enhance parseJsonBody() with Content-Type and size validation - Add request body validation for POST /jobs endpoint - Validate type field presence and type - Validate job type against allowed values - Validate options object structure and types - Add query parameter validation for GET /jobs endpoint - Validate status filter against allowed values - Validate type filter against allowed values - Add job ID validation for GET/DELETE /jobs/:id endpoints - Prevent path traversal attacks - Enforce maximum length - Add error response helper with optional details field - Add 29 comprehensive tests for validation logic Security improvements: - Path traversal prevention in job IDs - Request size limits (1MB max) - Content-Type validation for POST requests - Input sanitization for all user-provided values
Add centralized error handling system for consistent, actionable error messages across all scripts. This addresses inconsistent error reporting patterns identified during code review. Changes: - Add scripts/shared/errors.ts with base error classes (AppError, ConfigError, NetworkError, ValidationError, FileSystemError, RateLimitError) - Each error type includes default suggestions and context tracking - Add consistent logging utilities (logError, logWarning, logInfo, logSuccess) - Add withErrorHandling wrapper for async operations - Update scripts/fetchNotionData.ts to use unified error logging - Update scripts/migrate-image-cache.ts to use FileSystemError - Update scripts/notion-placeholders/index.ts to use ConfigError - Update scripts/api-server/index.ts to use extended ValidationError - Add comprehensive test coverage (32 tests in errors.test.ts) Error messages now include: - Clear description of what went wrong - Actionable suggestions for resolution - Relevant context information - Consistent formatting with chalk colors Testing: All 32 tests pass, linting clean
Implement API key authentication and comprehensive request audit logging for the Notion Jobs API server. **Authentication (auth.ts):** - API key validation via Authorization header (Bearer/Api-Key schemes) - Environment variable configuration (API_KEY_<name> format) - Graceful degradation when no keys configured (allows public access) - Key metadata tracking (name, description, active status, creation date) - Support for multiple API keys with independent management - Minimum key length validation (16 characters) **Audit Logging (audit.ts):** - Comprehensive request logging with structured JSON format - Client IP extraction from various proxy headers (X-Forwarded-For, X-Real-IP, CF-Connecting-IP) - Authentication result tracking for all requests - Response time measurement and status code logging - File-based persistence (.audit-data/audit.log) - Public endpoint detection for conditional auth **API Server Integration (index.ts):** - Public endpoints: /health, /jobs/types (no auth required) - Protected endpoints: /jobs, /jobs/:id (require valid API key) - Enhanced startup information showing auth status and configured keys - Updated CORS headers to include Authorization - Comprehensive audit logging for all requests **Tests:** - 32 new tests covering authentication and audit functionality - Tests for API key validation, header parsing, and error handling - Tests for audit entry creation, logging, and configuration - All existing tests remain passing **Usage:** - Set API_KEY_* environment variables to enable authentication - Example: API_KEY_READONLY=sk_123... API_KEY_ADMIN=sk_456... - Use: Authorization: Bearer <api-key> or Authorization: Api-Key <api-key>
…etion - Integrate reportJobCompletion into executeJobAsync's onComplete callback - Pass GitHub context, job duration, and error details to status reporter - Add github-context parameter to executeJobAsync signature - Add comprehensive tests for GitHub status integration - Add tests for github-status module (reportJobCompletion, validation)
Update actions/checkout from v4/v5 to v6 and actions/github-script from v7 to v8 across all workflows. All actions now use semantic version tags instead of commit SHAs.
Actionlint reported that `github.event.inputs.confirm` is undefined since the workflow_dispatch inputs section was commented out. Updated the Slack notification to reflect the hardcoded --confirm=yes flag used in the cleanup script.
… constants, add CORS config - Remove unused JobQueue class and 4 related test files (job-queue.ts, job-queue.test.ts, job-queue-behavior-validation.test.ts, job-persistence-queue-regression.test.ts) - Consolidate VALID_JOB_TYPES to derive from JOB_COMMANDS keys (single source of truth) - Move validation constants to validation.ts, import in validation-schemas.ts - Add configurable CORS via ALLOWED_ORIGINS env var with origin allowlisting - Update all response helpers with requestOrigin parameter for proper CORS - Remove tracked build artifact (assets/index-DlhE0rqZ.css) and update .gitignore - Fix duplicate .gitattributes entry in .dockerignore
…olation - Replace weak djb2 auth hash with SHA-256 + timing-safe comparison (TASK 3) - Add environment variable whitelist for child processes (TASK 4) - Fix docker-publish-workflow test assertions for action version changes (TASK 1) - Fix github-status unhandled rejection in retry test (TASK 1) - Add subpage filtering test coverage for notion-count-pages (TASK 5)
…ity hardening TASK 9: Job execution timeout - Add configurable timeouts per job type (5-60 minutes) - Implement SIGTERM → wait → SIGKILL pattern for graceful shutdown - Support JOB_TIMEOUT_MS env var for global override - Add 13 comprehensive timeout tests TASK 6: Log rotation and size limits - Add log rotation for jobs.log and audit.log (max 10MB, keep 3 rotated files) - Add MAX_LOG_SIZE_MB env var for configuration - Add MAX_STORED_JOBS cap for jobs.json (default 1000) - Add 17 rotation tests covering all scenarios TASK 12: GitHub Actions workflow security - Move secrets from shell exports to env blocks (prevents log exposure) - Replace heredoc JSON with jq for proper escaping - Add notion:count-pages to job_type options - Make Slack notification conditional on webhook existence - Update tests to validate new secure pattern Tests: 117 files, 2869 tests pass
…ure, integration tests
TASK 7: File persistence race condition fix
- Add synchronous write lock (mutex-style) to serialize file writes
- Implement atomic writes (temp file + rename pattern)
- Prevent concurrent job completions from overwriting each other
- Add 6 race condition tests including 100-concurrent stress test
TASK 11: Refactor monolithic index.ts (1513 → 14 lines)
- Extract OpenAPI spec → openapi-spec.ts (648 lines)
- Extract CORS handling → middleware/cors.ts (55 lines)
- Extract route handlers → routes/{health,docs,job-types,jobs}.ts
- Extract routing logic → router.ts (118 lines)
- Extract request handling → request-handler.ts (104 lines)
- Extract server setup → server.ts (106 lines)
- No behavior changes, all existing tests pass
TASK 15: Comprehensive integration test script
- Add test-api-integration.sh with 10 E2E test scenarios
- Auth flow (disabled/enabled, valid/invalid keys)
- Job cancellation, concurrent jobs, dry-run mode
- Error handling (invalid types, malformed JSON, 404s)
Tests: 118 files, 2875 tests pass, 0 failures
Fix critical bug where SIGKILL fallback was being skipped incorrectly during timeout escalation. The code checked childProcess.killed (which indicates kill() was called) instead of whether the process actually exited. Changes: - Add processExited flag to track actual process exit via close event - Change SIGKILL decision from !childProcess.killed to !processExited - Set processExited = true in error handler to prevent race condition Tests: - Add test verifying SIGKILL based on actual exit, not killed property - Add test verifying no SIGKILL if process exits during grace period - Add test verifying no SIGKILL if error fires during grace period
Fix unsafe timeout behavior when JOB_TIMEOUT_MS contains invalid values
(NaN, non-numeric, negative, zero, Infinity). Previously parseInt would
coerce these to dangerous values (0ms, negative, etc).
Changes:
- Add parseTimeoutOverride() helper function
- Accept only finite positive integers
- Fall back to job-specific timeout when invalid
- Log warning with diagnostic info for invalid values
Security fixes:
- NaN from parseInt("invalid") → falls back to job timeout
- Negative numbers → falls back to job timeout + warning
- Zero value → falls back to job timeout + warning
- Infinity → falls back to job timeout + warning
Tests:
- 7 new test cases covering all validation edge cases
- All 23 timeout tests pass
The workflow was setting NODE_ENV=test which caused the server to bind to a random port (port 0) instead of the configured port 3001. The health check expected port 3001, causing a mismatch. Root cause: server.ts treats NODE_ENV=test as test mode, using random port binding for test isolation. But workflow local mode needs deterministic port 3001 for health checks to work. Changes: - Remove `export NODE_ENV=test` from workflow local mode startup - Add comment explaining why (deterministic port required) - Update test to verify NODE_ENV=test is NOT set in local mode
…ract The test expected status 'cancelled' but the API stores cancelled jobs as status 'failed' with error 'Job cancelled by user'. This was a false contract mismatch. API contract clarification: - DELETE /jobs/:id returns immediate response with status 'cancelled' - Job tracker stores persisted state as status='failed' with result.error='Job cancelled by user' - GET /jobs/:id returns the persisted state (failed with error) Changes: - Update test to check status='failed' (not 'cancelled') - Verify error message contains 'cancelled' - Add explanatory comment about API contract Fixes the false contract mismatch while maintaining actual API behavior.
Disallowed origins now receive no CORS headers (browser blocks)
instead of ambiguous empty Access-Control-Allow-Origin.
Changes:
- Add isOriginAllowed() helper for clear logic
- Return empty object {} for disallowed origins (no headers)
- Fix Vary: Origin to only include when Origin header present
- Better handling of empty strings in ALLOWED_ORIGINS
Behavior:
| Scenario | Result |
|-------------------------|---------------------------------|
| Allow-all mode (unset) | Access-Control-Allow-Origin: * |
| Allowed origin | Echo origin + Vary: Origin |
| Disallowed origin | No CORS headers (browser blocks)|
| No Origin header | Access-Control-Allow-Origin: * |
Tests:
- 18 new CORS unit tests (all scenarios)
- Updated HTTP integration tests
Add environment variables needed for Notion script parity between CI and local execution paths. Changes: - Add DEBUG, NOTION_PERF_LOG, NOTION_PERF_OUTPUT for telemetry - Add BASE_URL for production asset URLs (production-critical) - Export CHILD_ENV_WHITELIST and buildChildEnv() for testing - Add comprehensive test suite (20 tests) for env propagation Security: - Least privilege maintained (only required vars) - Sensitive vars (GITHUB_TOKEN, API_KEY_*) remain excluded - Tests verify allowed vs blocked propagation Tests: 20 new env tests, all 98 job-executor tests pass
This commit resolves all 5 remaining issues from the API review: 1. CORS Consistency on Error Responses (HIGH) - Add CORS headers to 401, 404, and 500 error responses - Update request-handler.ts to extract origin early and add CORS - Update router.ts to add CORS to 404 responses - Add comprehensive CORS test assertions 2. JOB_TIMEOUT_MS Strict Parsing (MEDIUM) - Implement strict /d+/ validation to reject decimals - Add MAX_TIMEOUT_MS constant (2 hours) with cap enforcement - Replace decimal truncation test with strict rejection test - Add tests for scientific notation, signed values, and cap behavior 3. SIGKILL Hang Edge Case (MEDIUM) - Add post-SIGKILL hard fail-safe timer (SIGKILL_FAILSAFE_MS) - Ensure fail-safe settles executeJob promise properly - Update error message to "Process unresponsive after timeout" - Add test coverage for fail-safe behavior 4. Workflow Auth Test Strictness (LOW) - Change auth assertions from substring to exact match - Add extractAuthorizationHeader() helper for robust parsing - Update all auth checks to use strict.toBe() assertions - Add notion:count-pages to expected job types 5. Child Env Documentation (LOW) - Add comprehensive child process environment variables section - Document all whitelisted variables by category - Document intentionally blocked variables for security Test Results: - API Server Tests: 1083 passed, 23 skipped - ESLint: All checks pass - Prettier: All checks pass Co-authored-by: Codex CLI <codex@anthropic.com>
- Rename PRD files with descriptive names (remove redundant prefixes) - PRD.md → notion-api-service.md - PRD.completed.md → notion-count-pages-feature.md - PRD_DOCKER_IMAGE.md → docker-hub-workflow.md - task-0-investigation-report.md → page-count-discrepancy-investigation.md - Remove duplicate PRD-REVIEW.completed.md (covered by PRD-REVIEW-MAPPING.md) - Keep DOCKER_HUB_AUTH_PATTERNS.md and PRD-REVIEW-MAPPING.md (already descriptive) Improves discoverability and reduces naming confusion.
- Documents 18 issues across P0-P3 priorities - Critical security vulnerabilities identified in test-fetch.sh - Provides fix packages A-D for different needs - Recommended: Package B (Production Ready) for CI/CD
* fix(ci): restore docker publish workflow contract * fix(ci): stop PR image comment when push is disabled * Update .github/workflows/docker-publish.yml
* test(ci): tighten docker publish workflow validation assertions * fix(ci): restore same-repo PR docker publish gating * fix(ci): keep docker publish image name on api repo * fix(ci): remove unused packages write permission
* fix(notion-fetch): isolate per-page failures in batch processing * fix(api-server): defer content repo init and relax health readiness * fix(api-server): serialize init and honor pre-spawn cancellation * fix(content-repo): initialize repo before acquiring lock The lock acquisition must happen AFTER directory creation to avoid 30-minute timeout when dirname(WORKDIR) doesn't exist. This regression was introduced when serialization logic was added in commit 13ab1e7. The acquireRepoLock() call would fail with ENOENT and retry as if it were lock contention, rather than immediately failing or succeeding. Fixes the order to: 1. initializeContentRepo() - creates dirname(workdir) with mkdir -p 2. acquireRepoLock() - creates lock file in now-existing directory * fix(content-repo): prevent initialization race condition Replace boolean `initialized` flag with promise-based lock to prevent concurrent initialization race conditions in git config operations. - Use `initPromise` to ensure single initialization across concurrent calls - Reset promise on failure to allow retry after transient errors - Addresses code review feedback on lines 203-259 Reviewed-by: OpenAI Codex * Codex-generated pull request (#133) * fix(api-server): address pr review race and path issues * fix(ci): align docker publish workflow with validation and tests * fix(ci): use version tags for docker publish actions * fix(api-server): address pr 133 review regressions * fix(ci-validation): restore docker action pinning policy checks * fix(content-repo): reintroduce safe init singleton * fix(ci): disable docker push when registry secrets are unavailable * test(ci): cover docker publish secret-gating behavior * doc: review findings * fix(api-server): harden content repo lock and script path handling (#134) * test(api-server): complete PR 129 reliability coverage for content repo locking (#135) * test(api-server): add focused content-repo lock and init race coverage * docs(api-server): document content-repo envs for docker runtime * feat(test-docker): add docker-compose fetch-all smoke script
- Add ca-certificates to Dockerfile for GitHub SSL verification - Remove CPU limits from docker-compose.yml (NanoCPUs not supported on this system) - Fix conditional .env loading in test-compose-fetch.sh - Fix API response field: data.id → data.jobId - Increase timeout to 1 hour (1800 polls × 2s) for large Notion fetches
- Fix job executor to only treat stderr as error when exit code is non-zero - Add automatic .env loading to test script - Add cleanup of existing containers before test run - Improve health check to silently retry on connection errors
The conditional lastExitCode !== null && lastExitCode !== 0 was dead code because the catch block only executes when exit code is non-zero or on exception, making the condition always true in practice. Simplified to always include stderr on failure.
- Remove dead code: conditional on lastExitCode was always true in catch block - Add exitCodeKnown to help debug null exit code cases (e.g., spawn errors)
f451b74 to
0d2004b
Compare
solves #120 but using a VPS instead of CloudFlare workers.
Summary
Testing