fix(scrapfly-webhooks): handle binary screenshot bodies and correct extraction payload path#66
Merged
Merged
Conversation
…xtraction payload path Fixes the two bugs reported in #65 (caught while capturing real Scrapfly deliveries for the webhook-samples repo): 1. **Screenshot handler crashed on binary bodies.** Scrapfly's Screenshot API delivers raw image bytes (JPEG / PNG / WebP / GIF) but sets `Content-Type: application/json` — an upstream quirk that makes the header lie about the body. The previous example always ran `JSON.parse(req.body)` after signature verification, so screenshot deliveries verified successfully and then 500'd on parse. Fixed by dispatching on `X-Scrapfly-Webhook-Resource-Type` BEFORE JSON parsing in all three example handlers (Express / Next.js / FastAPI) and the SKILL.md inline code. Screenshot is now handled as a binary path that logs the byte count and exits without parsing; the Next.js handler also switched from `request.text()` to `request.arrayBuffer()` so binary bytes survive intact through the framework. 2. **Extraction field path was wrong.** Real extraction bodies expose the fields at `payload.data`, not `payload.result.data`. Fixed in SKILL.md and all three example handlers. Tests now use the real shape `{ content_type, data: {...} }` captured from live deliveries. Also: - Added a prominent **Prerequisites** section to SKILL.md noting the paid-plan requirement (FREE-tier accounts hide the webhook UI and have a queue size of 0, so no deliveries ever fire). Was already in `references/setup.md`; pulled it to SKILL.md for visibility. - Added an **Alternative: Verify at the Gateway with Hookdeck** section in `references/verification.md`, calling out Hookdeck's built-in `SCRAPFLY` source-type that does edge verification, plus the known May 2026 caveat that the Content-Type / binary-body mismatch causes Hookdeck to reject screenshot deliveries with UNPARSABLE_JSON when the preset is enabled. - Added two tests per framework covering the binary screenshot path (verifies HMAC over raw image bytes + handler doesn't try to JSON- parse) and the corrected extraction shape. All three example suites pass: 24 Express, 17 Next.js, 19 FastAPI. Verification (HMAC-SHA256, uppercase hex, dual-case headers, raw-body capture, no-replay-window, dispatch by resource type, crawler events) is unchanged — those parts of the skill already matched live behaviour. Closes #65. https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB
…board setting, not "always JSON" Follow-up to the previous commit on this PR. The earlier framing said "Scrapfly nevertheless sets Content-Type: application/json on the request" which is technically wrong — Scrapfly's webhook config exposes a Content-Type dropdown with `application/json` (default) and `application/msgpack` options, and the configured value is what gets sent verbatim on every delivery. The actual upstream quirk is narrower: Scrapfly Screenshot deliveries are raw image bytes regardless of which Content-Type you configured (the configured value is sent in the header but doesn't change the body for image deliveries). Updated five places to reframe accordingly: - SKILL.md "Screenshot is binary" bullet and the inline-handler comment - references/verification.md "Screenshot deliveries are binary" gotcha and the Hookdeck-Gateway caveat - references/setup.md gains a Content-Type bullet in the dashboard config steps, calling out that JSON is the default and that msgpack users need to swap the parser in handler scrape/extraction branches - Express, Next.js, FastAPI handler comments Added a new gotcha entry in references/verification.md and a comment in each handler example explaining how to handle msgpack-configured webhooks (the dispatch-before-parse pattern is unchanged; only the parse step needs to swap to a msgpack decoder). No code behaviour changes — only doc/comment text. All three example suites still pass: 24 Express, 17 Next.js, 19 FastAPI. https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #65. Fixes two real bugs in
scrapfly-webhookscaught while capturing live Scrapfly deliveries for the webhook-samples repo:Content-Type: application/json).payload.result.data→payload.data. The example used to silently logundefined.What changed
Handler dispatch reordered (Express, Next.js, FastAPI + SKILL.md inline):
Next.js framework fix: body reader switched from
await request.text()toBuffer.from(await request.arrayBuffer())—request.text()UTF-8-decodes the bytes and corrupts binary screenshot bodies;arrayBuffer()preserves them so the HMAC verifies correctly.SKILL.md additions:
references/setup.md; pulled forward for visibility — FREE-tier accounts hide the webhook UI entirely and have a queue size of 0).SCRAPFLYsource type for edge verification.references/verification.mdadditions:UNPARSABLE_JSONwhen the preset is enabled; route screenshots directly until upstream resolves the Content-Type).arrayBuffer()fix.Tests:
{ content_type, data: {...} }) instead of the bogus{ result: { data: {...} } }.Test plan
cd skills/scrapfly-webhooks/examples/express && npm test— 24 passed, 0 failedcd skills/scrapfly-webhooks/examples/nextjs && npm test— 17 passed, 0 failedcd skills/scrapfly-webhooks/examples/fastapi && pytest test_webhook.py -v— 19 passed, 0 failedNot addressed in this PR
From issue #65, point 3.c (npx install path verification): we already use
npx hookdeck-cli listenas the canonical pattern repo-wide (per AGENTS.md, set in PR #40 + #54) and have verified it works against the publishedhookdeck-clinpm package. Leaving as-is.Unverified strengths preserved
The signature verification code (JS + Python), the full header list, timing-safe comparison, raw-body capture with
express.raw({ type: '*/*' })/request.arrayBuffer()/await request.body(), the no-replay-window note, and routing byX-Scrapfly-Webhook-Resource-Typeall remain unchanged — they matched live behaviour and weren't part of the bug report.https://claude.ai/code/session_01NNTgQRJss1V7gyzzJ9rjnB
Generated by Claude Code