Render HTML-safe JSON strings, especially for use in <script> tags#483

Merged
lambda-fairy merged 5 commits into lambda-fairy:main from allenap:json
Apr 6, 2026
@allenap allenap commented Oct 19, 2025

[Presently there's no issue for this, though #181 might be closest. I use this with Maud in a private codebase of my own, and I thought it could be more widely useful. I also wanted to expose my approach to the light to check if it does actually make sense.]

Reproduced from maud/src/json.rs:

This module provides a [Json] wrapper type which will render its inner value as HTML-safe JSON, and an implementation of [Render] for [serde_json::Value] as a convenience.

This does not follow WHATWG advice for embedding content into <script> tags, nor does it follow the JSON-LD spec's advice.

The WHATWG's advice for <script> elements suggests replacing < characters with \x3C. This works when embedding JSON into a <script> element that is being used for JavaScript because \xNN escapes are allowed in JavaScript, but it does not work when embedding, say, JSON-LD into a <script> element. These \xNN escape sequences are not recognised by JSON parsers, and their use renders the JSON payload invalid.

The JSON-LD spec suggests replacing several characters – <, >, &, and quotes – with HTML entities. It follows that every receiving processor must then reverse this transformation, but this does not seem to happen in practice.

Instead, this module replaces <, >, and & characters in JSON strings (including object keys) with \u003c, \u003e, and \u0026 Unicode escape sequences respectively. This is understood by JSON parsers but is inert as far as HTML <script> tags are concerned, and neutralises misinterpretation of embedded HTML, XML, entities, and other <…> tagged content by HTML processors. Notably, this also prevents HTML comments (i.e. <!-- … -->) from being opened or prematurely closed by JSON within the <script> tag. The JSON content is still entirely valid and able to be parsed as-is, with no pre- or post-processing required; it could be copied and pasted from the source document into a .json file and it would work.
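
A minimal sketch of the replacement described above, using only the standard library. It post-processes an already-serialized JSON string instead of hooking into the serializer, which is an assumption of this sketch and not necessarily how `maud/src/json.rs` does it; the function name `escape_json_for_html` is hypothetical. The post-processing is sound because `<`, `>`, and `&` can only occur inside string literals in valid JSON, so rewriting every occurrence never touches structural syntax.

```rust
/// Hypothetical helper (not part of Maud's API): makes serialized JSON safe
/// to embed in an HTML `<script>` element.
///
/// In valid JSON, `<`, `>`, and `&` only ever occur inside string literals,
/// so replacing them with `\uXXXX` escapes yields equivalent JSON.
fn escape_json_for_html(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    for ch in json.chars() {
        match ch {
            '<' => out.push_str("\\u003c"),
            '>' => out.push_str("\\u003e"),
            '&' => out.push_str("\\u0026"),
            other => out.push(other),
        }
    }
    out
}
```

For instance, `{"k":"</script>"}` becomes `{"k":"\u003c/script\u003e"}`, which any JSON parser decodes back to the original string.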

⚠️ Some of the code in here can panic. That's not ideal. I was wondering about having a TryRender trait, say, that would allow for fallible renderers. Alternatively, I think the expect(...) call could remain in the impl Render for serde_json::Value because Value should be almost/entirely guaranteed to be valid JSON. The Json wrapper could be removed; just ask callers to use serde_json::to_value themselves.
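
To make the `TryRender` idea concrete, here is one possible shape for such a trait. Everything in it is hypothetical: the trait, its method name, and the `JsonNumber` and `NotFinite` types are illustrations only and do not exist in Maud. The sketch shows a renderer returning an error instead of panicking when a value has no JSON representation.

```rust
use std::fmt;

/// Hypothetical fallible counterpart to a `Render`-style trait.
trait TryRender {
    type Error: fmt::Debug;

    /// Appends the rendered form to `buffer`, or returns an error and
    /// leaves `buffer` untouched.
    fn try_render_to(&self, buffer: &mut String) -> Result<(), Self::Error>;
}

/// Illustrative value whose rendering can fail: JSON has no representation
/// for NaN or infinities.
struct JsonNumber(f64);

#[derive(Debug, PartialEq)]
struct NotFinite;

impl TryRender for JsonNumber {
    type Error = NotFinite;

    fn try_render_to(&self, buffer: &mut String) -> Result<(), NotFinite> {
        if self.0.is_finite() {
            buffer.push_str(&self.0.to_string());
            Ok(())
        } else {
            Err(NotFinite)
        }
    }
}
```

With a trait of this shape, the caller decides whether to propagate the error or fall back to a placeholder; the library no longer has to choose a panic on the caller's behalf.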


There are also a few things I fixed in passing, but I can extract those to a separate PR if necessary.

@lambda-fairy lambda-fairy left a comment

Thanks!

I haven't really thought about context-aware escaping yet, but this looks like a reasonable first step. Let's land it first and let it inform what to do next.

Comment thread maud/src/json.rs
/// Writes a string fragment to the specified writer.
///
/// It replaces `<`, `>`, and `&` characters with `\u003c`, `\u003e`, and
/// `\u0026` Unicode escape sequences respectively.

Technically this should also include \u2028 and \u2029, to cover browsers that don't implement ES2019.

https://github.com/tc39/proposal-json-superset

Though I'm ok with ignoring this problem for now, given that we haven't decided on a policy w.r.t. legacy browsers yet.
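
As a sketch of what covering those two code points could look like, the following standard-library function escapes U+2028 and U+2029 alongside `<`, `>`, and `&`. It operates on already-serialized JSON text, and the name `escape_json_for_legacy_script` is hypothetical; neither is taken from the PR. JSON's own whitespace is limited to space, tab, LF, and CR, so these two code points can only appear inside string literals and are safe to rewrite wherever they occur.

```rust
/// Hypothetical variant (not part of Maud's API) that also escapes the line
/// and paragraph separators, which pre-ES2019 JavaScript engines reject when
/// they appear unescaped inside a string literal.
fn escape_json_for_legacy_script(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    for ch in json.chars() {
        match ch {
            '<' => out.push_str("\\u003c"),
            '>' => out.push_str("\\u003e"),
            '&' => out.push_str("\\u0026"),
            '\u{2028}' => out.push_str("\\u2028"),
            '\u{2029}' => out.push_str("\\u2029"),
            other => out.push(other),
        }
    }
    out
}
```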

Comment thread maud/src/json.rs

let mut offset = 0;
for (index, byte) in fragment.bytes().enumerate() {
    if matches!(byte, b'<' | b'>' | b'&') {

For a future PR we can consider using Memchr3 for this (and the other escaping routines in this library).
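
A rough sketch of the search-then-copy structure this suggestion points at, written against the standard library only. The helper `find_special` is a hypothetical stand-in for the search step; with the `memchr` crate as a dependency, `memchr::memchr3(b'<', b'>', b'&', haystack)` has the same shape and could replace it. The function name `write_escaped` is likewise an assumption and not the PR's actual code.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for the search step: returns the index of the first
/// `<`, `>`, or `&` byte, if any.
fn find_special(haystack: &[u8]) -> Option<usize> {
    haystack
        .iter()
        .position(|byte| matches!(byte, b'<' | b'>' | b'&'))
}

/// Writes `fragment` to `writer`, replacing `<`, `>`, and `&` with JSON
/// Unicode escapes and copying everything in between in bulk.
fn write_escaped<W: Write>(writer: &mut W, fragment: &str) -> fmt::Result {
    let mut rest = fragment;
    while let Some(index) = find_special(rest.as_bytes()) {
        // Copy the untouched run before the special byte in one call.
        writer.write_str(&rest[..index])?;
        writer.write_str(match rest.as_bytes()[index] {
            b'<' => "\\u003c",
            b'>' => "\\u003e",
            _ => "\\u0026",
        })?;
        // The matched byte is ASCII, so `index + 1` is a char boundary.
        rest = &rest[index + 1..];
    }
    // Copy whatever remains after the last special byte.
    writer.write_str(rest)
}
```

Searching on bytes is safe for UTF-8 input because the three target bytes are ASCII and never appear inside a multi-byte sequence.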

Comment thread maud/src/json.rs
//! [advice-whatwg]:
//! https://html.spec.whatwg.org/multipage/scripting.html#restrictions-for-contents-of-script-elements
//! [advice-json-ld]:
//! https://www.w3.org/TR/json-ld/#restrictions-for-contents-of-json-ld-script-elements

Thanks for the detailed write-up here.

It might be worth also mentioning prototype pollution attacks (objects with keys named __proto__ or constructor). Embedding JSON with such keys won't cause immediate problems, but could reveal security issues if JavaScript code uses the parsed result naively.

@lambda-fairy lambda-fairy merged commit cd6d51c into lambda-fairy:main Apr 6, 2026
4 checks passed
2 participants