Merged
131 changes: 131 additions & 0 deletions .github/workflows/npm-token-health.yml
@@ -0,0 +1,131 @@
name: NPM_TOKEN health

# Daily check that the NPM_TOKEN secret still authenticates against the
# npm registry. Opens a GitHub issue if `npm whoami` fails — catches
# token rot, revocation, or scope changes BEFORE the next release fires
# and silently leaves npm behind.
#
# Motivating incident (2026-05-22): the v3.2.1 release on 2026-05-12
# failed at `npm publish` with HTTP 404 because NPM_TOKEN had been
# rotated 7 weeks earlier and agent's secret wasn't updated alongside
# the sibling @askalf/dario repo's. Users running `npm i -g @askalf/agent`
# for 10 days got the stale 3.1.5 build — missing the WS-subprotocol
# crash fix that this very release was shipping. Token rot turns into a
# GH issue within 24h now instead of being invisible until the next
# release attempt.
#
# Pattern mirrors @askalf/dario's npm-token-health.yml, with marker-
# comment dedup so a stale token doesn't open a fresh issue every day,
# and auto-close on the first successful run after rotation.

on:
  schedule:
    - cron: '17 4 * * *'
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  whoami:
    runs-on: ubuntu-latest
    timeout-minutes: 3
    steps:
      - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
        with:
          node-version: 22
          registry-url: https://registry.npmjs.org

      - name: Check NPM_TOKEN
        id: check
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
        run: |
          # `npm whoami` against the configured registry returns the
          # token owner's username when auth works, exits non-zero when
          # it doesn't. Granular access tokens return the org/user that
          # owns the token. Capture both for the issue body.
          set +e
          whoami_out=$(npm whoami --registry=https://registry.npmjs.org 2>&1)
          whoami_exit=$?
          set -e
          echo "exit=$whoami_exit" >> "$GITHUB_OUTPUT"
          echo "$whoami_out" > whoami.txt
          if [ "$whoami_exit" = "0" ]; then
            echo "NPM_TOKEN healthy. whoami: $whoami_out"
          else
            echo "::error::NPM_TOKEN authentication failed (exit $whoami_exit)"
            echo "--- whoami output ---"
            cat whoami.txt
          fi

      - name: Open issue on token failure
        if: steps.check.outputs.exit != '0'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # De-dup: only open one issue per outage. If a previous health
          # check already opened one and nobody's closed it yet (token's
          # still bad), don't pile on with another. Closing the issue
          # signals "rotated, ready for the next check to re-arm".
          marker='<!-- agent-npm-token-rot -->'
          existing=$(gh issue list -R "${{ github.repository }}" --state open --search "in:body $marker" --json number --jq '.[0].number // ""')
          if [ -n "$existing" ]; then
            echo "Existing open token-rot issue #$existing — leaving alone."
            exit 0
          fi

          body_file=$(mktemp)
          {
            echo "$marker"
            echo ""
            echo "### NPM_TOKEN authentication failed"
            echo ""
            echo "Daily \`npm whoami\` check could not authenticate to the npm registry. The token is likely expired, revoked, or has had its scopes changed."
            echo ""
            echo "**Impact:** the next \`@askalf/agent\` release will fail at npm publish. Users running \`npm i -g @askalf/agent\` will continue getting whatever's currently latest on npm until the token is replaced and the failed publish is re-run."
            echo ""
            echo "### To fix"
            echo ""
            echo "1. npmjs.com → Access Tokens → **Generate New Token** → **Granular Access Token**"
            echo "2. Scope: \`@askalf/*\` (or just \`@askalf/agent\`) with **Read and write** permission"
            echo "3. Copy the token (\`npm_xxxxx...\`)"
            echo "4. \`gh secret set NPM_TOKEN -R askalf/agent\` (paste when prompted)"
            echo "5. Verify: \`gh workflow run npm-token-health.yml -R askalf/agent\` — this issue auto-closes when the next health check passes"
            echo "6. Backfill any failed publish:"
            echo " \`\`\`"
            echo " gh run list -R askalf/agent --workflow 'Publish to npm' --status failure --limit 1 --json databaseId"
            echo " gh run rerun <id> -R askalf/agent --failed"
            echo " \`\`\`"
            echo ""
            echo "### whoami output"
            echo ""
            echo "\`\`\`"
            cat whoami.txt
            echo "\`\`\`"
            echo ""
            echo "### Workflow run"
            echo ""
            echo "[Run #${{ github.run_id }}](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})"
          } > "$body_file"

          gh issue create \
            -R "${{ github.repository }}" \
            --title "NPM_TOKEN auth failed — token rot detected" \
            --body-file "$body_file" \
            --label "npm-token-rot"

      - name: Auto-close any open token-rot issue on success
        if: steps.check.outputs.exit == '0'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Re-arm: if a previous failure opened an issue and the operator
          # has since rotated the token, the next successful health check
          # should close the issue automatically. Marker-comment match.
          marker='<!-- agent-npm-token-rot -->'
          open_issues=$(gh issue list -R "${{ github.repository }}" --state open --search "in:body $marker" --json number --jq '.[].number')
          for n in $open_issues; do
            gh issue close "$n" -R "${{ github.repository }}" --comment "Auto-closed: \`npm whoami\` succeeded on [run #${{ github.run_id }}](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}). Token healthy again."
          done
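
Reviewer note: taken together, the two conditional steps above branch on only two inputs, the exit status of `npm whoami` and whether an open issue carrying the marker comment already exists. The sketch below restates that decision table as a standalone shell function so it can be checked without touching npm or GitHub. It is not part of the workflow; the function name `decide_action` and the action labels (`open`, `skip`, `close`, `noop`) are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the workflow file.
#   $1 = exit status of `npm whoami` (0 means the token authenticated)
#   $2 = number of an already-open marker issue, or empty if none exists
decide_action() {
  whoami_exit=$1
  existing=$2
  if [ "$whoami_exit" = "0" ]; then
    # Healthy token: close a leftover issue if one is open, else do nothing.
    if [ -n "$existing" ]; then
      echo "close"
    else
      echo "noop"
    fi
  elif [ -n "$existing" ]; then
    # Token still bad and an issue is already open: do not open another.
    echo "skip"
  else
    # First failing run of this outage: open the issue.
    echo "open"
  fi
}

decide_action 0 ""     # healthy, nothing open
decide_action 0 "42"   # healthy again after rotation
decide_action 1 "42"   # still failing, issue already open
decide_action 1 ""     # first failure
```

Read this way, an outage produces at most one open issue, and the first passing run after rotation clears it. This assumes the marker search returns the issue the workflow opened earlier.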