Release Candidate • 2025‑08‑08
OpenChatML is a plain‑text envelope for LLM dialogue that:
- cleanly separates private reasoning (
analysis) from user‑visible output (final), - provides native support for built‑in and user‑defined tools via
commentary, - enables safe, predictable streaming and cancellation,
- remains readable by 1.x parsers by treating legacy messages as
final, - and interoperates with Harmony (gpt‑oss) via HIP‑1.
Every transcript must begin with a YAML header.
version: 2.2 # required – spec major.minor
model: gpt-oss-120b # optional – model identifier
generation_settings: # optional – settings are transcript-global
temperature: 0.7
reasoning_effort: "medium" # low · medium · high
builtin_tools: ["browser","python"]
capabilities: # optional negotiation block
roles: ["system","developer","user","assistant","functions.*","tool"]
channels: ["analysis","commentary","final"]
tokens: ["<|start|>","<|channel|>","<|message|>","<|call|>","<|constrain|>","<|return|>","<|end|>","<|literal|>","<|endliteral|>"]
streaming: ["response.delta","response.cancel","tool.cancel"]
profiles:
harmony:
enabled: true
encoding: o200k_harmony # informative; parsers MUST ignore unknown keys
require_channels: ["analysis","commentary","final"]
tool_calls_in: "commentary" # see HIP‑1 §3
builtins_in: ["analysis","commentary"]Parsers MUST ignore unknown keys.
generation_settings apply to the entire transcript unless a future revision specifies a mechanism for per-message overrides.
(Each SHOULD be a single BPE where possible; see portability clause.)
| Token | Purpose |
|---|---|
<|start|> |
Open a message header |
<|channel|> |
Declare logical channel (analysis · commentary · final) |
<|message|> |
Start of body |
<|call|> |
Terminates JSON of an outgoing tool call (router stop) |
<|constrain|> |
Declares input data type for the subsequent body (e.g., json) |
<|return|> |
Sampling terminator for the final assistant message (stop token) |
<|end|> |
Close the message |
<|literal|> |
Begin a literal block; the body is treated as opaque bytes (no token scanning) |
<|endliteral|> |
End a literal block |
Escaping: Outside a literal block, a literal control token can be written by doubling the leading <: <<|start|> renders as <|start|>.
Portability: If single‑BPE registration is unavailable, parsers MUST recognize the literal ASCII sequences across token boundaries.
| Role | Description |
|---|---|
system |
Ground‑truth constraints (highest authority) — hidden by default |
developer |
Implementation‑time instructions & tool defs — hidden by default |
user |
End‑user input |
assistant |
Model output |
functions.<tool> |
Legacy/Compat role for a tool’s reply (e.g., functions.search). |
tool |
Canonical role for a tool’s reply. Use with name= to carry the specific tool name. |
Instruction hierarchy: system > developer > user > assistant > tool.
Canonical Usage: For tool replies, producers SHOULD use ROLE=tool name={tool_name}. Parsers MUST accept ROLE=functions.{tool_name} for Harmony compatibility (see HIP-1).
Renderer policy: UIs MUST hide system, developer, analysis, and commentary from end users by default. Debug views MAY expose them with explicit opt‑in.
| Channel | Shown to user? | Typical content |
|---|---|---|
analysis |
No | Chain‑of‑thought, self‑critique, internal tool use |
commentary |
No (see below) | Tool call JSON; optional preambles intended for the user when flagged intent=preamble |
final |
Yes | Polished answer |
Rules
- Each message has exactly one channel.
- If
<\|channel\|>is absent, treat the message asfinal(back‑compat). - Renderers MUST NOT display
analysiscontent. commentaryMAY include end‑user‑visible preambles (plans, action lists). To render, the assistant MUST setintent=preamblein the header (see §6) or copy content into afinalmessage.
<|start|>{ROLE}[ SP to={DEST}][ SP call_id={ID}][ SP name={ALIAS}][ SP intent={INTENT}][ SP content_type={CTYPE}]
[<|channel|>{CHANNEL}][ SP intent={INTENT}][ SP content_type={CTYPE}]
[<|constrain|>{CTYPE}]
<|message|>{BODY}<|end|>
{DEST}— Required when routing to a tool. Canonical position is in the<|start|>header. Format:functions.{tool}orbrowser.search. For HIP-1 compatibility, parsers MUST also acceptto=after the channel tag.{ID}— Required, unique identifier for a tool call. It MUST be present in the<|start|>header of the outgoing call and the corresponding tool reply.{ALIAS}— Optional display name. WhenROLE=tool, this MUST be set to the tool's name (e.g.,name=functions.get_current_weather).{INTENT}— Optional, values:preamble(user‑visible commentary),status,debug(non‑display).{CTYPE}— Optional content type hint (e.g.,json,text,markdown). When present in<|constrain|>, it applies to the body. If the body fails to parse as the specified type, runtimes MUST raise anE-BODY-CONSTRAINT-VIOLATIONerror.{BODY}— UTF‑8 text or JSON. For an outgoing tool call, end the message with<\|call\|>.
Canonical tool call envelope: Use the tokenized text path above. A turn MUST NOT mix <\|call\|> and any alternate JSON tool_calls field.
(inside analysis only — never shown to end users)
<|start_reflect|>— self‑critique<|start_introspect|>— latent probes<|start_reason|>— step‑by‑step logic
Semantics: These markers are intended primarily for human-readable debugging and offline model analysis. Automated tools MAY be designed to parse them, but they are not intended to trigger specific runtime behaviors.
Safety: CoT is not moderated to the same standard as final. Do not display. When the previous assistant turn ended in final, subsequent sampling SHOULD drop prior CoT. Exception: retain analysis leading to unresolved tool calls.
1) assistant (analysis) – rationale (optional)
2) assistant to=functions.X call_id=xyz789 (commentary)<|constrain|>json … <|call|>
3) tool name=functions.X call_id=xyz789 to=assistant (commentary, JSON result)
4) assistant (analysis) – post‑processing (optional)
5) assistant (final) – user answer <|return|>
- An assistant turn MAY request N ≥ 1 tool calls.
- Each call MUST carry a unique
call_idin its<|start|>header. This attribute is the canonical identifier for the call. - Tool replies MUST echo the
call_idin their header to correlate request and response. - Runtimes MAY execute calls concurrently; the assistant MUST NOT assume arrival order.
- Optional
deadline_msper call can be specified in the call's JSON body. - New event:
tool.cancel {call_id, reason}; tools SHOULD be idempotent.
Reply envelope (recommended):
{ "ok": true, "content": { ... }, "error": null, "provenance": { "source": "...", "timestamp": "..." } }Note: The call_id is now handled in the message header, not the JSON body.
browserandpythonMAY be invoked fromanalysisorcommentary(Harmony‑style).- Function tools (developer‑defined) must be routed on
commentary. - Tool inputs MUST be limited to the
commentarybody; tools MUST NOT receiveanalysis.
Events
| Event | Contains | UI policy |
|---|---|---|
response.delta |
final token slices |
display |
response.cancel |
cancellation notice | stop rendering |
response.delta.flush |
paragraph/segment boundary hint | display boundary |
tool.cancel |
{call_id, reason} |
internal routing |
- Do not interleave unrelated streams unless each is tagged by a
message_id. - If streaming ends without
…done(i.e., without<|return|>or<|call|>), the message is incomplete.
For API transports that prefer a JSON envelope mirroring the text transcript:
Rule: The plain‑text envelope is canonical. JSON is a faithful projection; do not invent states the text cannot represent.
- Enforce channel boundaries:
analysis/commentaryMUST NOT be shown to end users unless explicitly copied tofinalor flaggedintent=preamble. - Tools MUST return machine‑readable errors and SHOULD include
provenance(source URL, checksum, timestamp) when applicable. - Runtimes SHOULD sanitize/escape user‑supplied strings inside message headers.
The following constraints ensure clean round‑trips with Harmony / gpt‑oss:
- Tokens: Include
<|constrain|>and honor<|return|>as a valid stop token. Harmony’so200k_harmonyencoding is permitted but not required. - Channels: Always include one of
analysis,commentary,final. Harmony commonly emitsanalysismessages before afinalor a<|call|>. - Roles & recipients:
- Tool calls use
assistant … to=functions.{tool}incommentary. For compatibility, parsers MUST acceptto=after the<|channel|>tag, though the<|start|>header is canonical for OCM 2.2. - Tool replies MAY use
ROLE=functions.{tool}. OCM 2.2 producers SHOULD use the canonicalROLE=tool name={functions.{tool}}, but parsers MUST accept both for compatibility.
- Tool calls use
- Constrain token: For JSON tool arguments, emit
<|constrain|>jsonbefore<|message|>. - Built‑ins: Allow
browser.*andpythonfromanalysisorcommentary. (Harmony may preferanalysisfor built‑ins.) - Preambles: If the model emits a plan for the user on
commentary, setintent=preambleso renderers may display it (Harmony calls this a “preamble”). - CoT handling: When a turn ends with
final(<|return|>), drop prioranalysison subsequent sampling. When a turn ends with<|call|>, include prioranalysisand the call when resuming. - Stop conditions: Treat
<|return|>and<|call|>as hard stops for inference.
- Any 1.x transcript (no channels, no
developer, etc.) is valid under 2.x and is read as all‑final. - A 1.x parser will not understand 2.x features and SHOULD fail loudly.
E-PARSE-HEADER: malformed headerE-PARSE-CHANNEL-MISSING: channel absent where required by profileE-BODY-CONSTRAINT-VIOLATION: message body failed to parse against<|constrain|>typeE-CALL-SCHEMA: tool args missing/invalidE-TOOL-TIMEOUT: tool exceededdeadline_msE-TOOL-CANCELLED: tool cancelled by runtimeE-STREAM-TRUNCATED: streaming ended without a stop tokenE-PERM-VISIBILITY: attempt to display hidden channels
Runtimes SHOULD log these and provide user‑safe summaries when appropriate.
WSP = SP / HTAB
ROLE = "system" / "developer" / "user" / "assistant" / "tool" / "functions." 1*(ALPHA / DIGIT / "_" / "-")
CHANNEL = "analysis" / "commentary" / "final"
TOKEN = "<|start|>" / "<|channel|>" / "<|message|>" / "<|call|>" / "<|constrain|>" / "<|return|>" / "<|end|>" / "<|literal|>" / "<|endliteral|>"
HEADER = "<|start|>" ROLE [WSP "to=" 1*VCHAR] [WSP "call_id=" 1*VCHAR] [WSP "name=" 1*VCHAR] [WSP "intent=" 1*VCHAR] [WSP "content_type=" 1*VCHAR]
CHANNELTAG = [ "<|channel|>" CHANNEL [WSP "intent=" 1*VCHAR] [WSP "content_type=" 1*VCHAR] ]
CONSTRAIN = [ "<|constrain|>" 1*VCHAR ]
MESSAGE = "<|message|>" *OCTET
FOOTER = "<|end|>"
FRAME = HEADER [WSP] CHANNELTAG [WSP] CONSTRAIN MESSAGE FOOTERMESSAGE terminates at the next <|end|> matching the current frame.
<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Simple arithmetic; answer directly.<|end|>
<|start|>assistant<|channel|>final<|message|>4.<|return|>
<|start|>system<|message|>You are a helpful AI assistant.
Knowledge cutoff: 2024-06
Current date: 2025-08-08
Reasoning: high
# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|>
<|start|>developer<|message|># Tools
## functions
namespace functions {
// Gets weather for a city.
type get_current_weather = (_: {
location: string,
format?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Call functions.get_current_weather with location Tokyo.<|end|>
<|start|>assistant to=functions.get_current_weather call_id=wx1<|channel|>commentary<|constrain|>json<|message|>{"location":"Tokyo","format":"celsius"}<|call|>
<|start|>tool name=functions.get_current_weather call_id=wx1 to=assistant<|channel|>commentary<|message|>{"ok":true,"content":{"temperature":20,"sunny":true}}<|end|>
<|start|>assistant<|channel|>final<|message|>It’s 20 °C and sunny in Tokyo right now.<|return|>
<|start|>assistant intent=preamble<|channel|>commentary<|message|>**Plan:** 1) Search docs 2) Extract figures 3) Summarize.<|end|>
Renderers MAY show this preamble to the user because intent=preamble is set.
<|start|>user<|message|>Please print these markers exactly:
<|literal|>
<|start|><|channel|><|message|><|end|>
<|endliteral|><|end|>
- 1.x transcript (no channels) → parsed as all‑
final. - Fully‑channeled 2.2 with
<|return|>. - Multi‑tool turn with two concurrent calls and echoed
call_ids in headers. - Tool error path (
ok:falsein body,E-TOOL-TIMEOUTas error code). - Literal block containing
<|start|>. - Message body that violates a
<|constrain|>jsondirective →E-BODY-CONSTRAINT-VIOLATION. - Harmony preamble on
commentarywithintent=preamble. - Tool reply using the legacy
ROLE=functions.x→ parsed successfully.
| Area | Action |
|---|---|
| Tokenizer | Add the nine control tokens; support multi‑token fallback. |
| Parser | Channel optionality; final default; to= accepted in start header (canonical) or after `< |
| Renderer | Hide system/developer by default; hide analysis/commentary unless intent=preamble. |
| Tool Router | Dispatch based on assistant to=...; use call_id from header for tracking; support concurrency + cancellation. |
| Streaming | Implement response.delta, response.cancel, response.delta.flush. |
| Safety | Enforce channel boundaries; never surface CoT; handle constraint violations by raising E-BODY-CONSTRAINT-VIOLATION. |
| Harmony | Enable HIP‑1; honor `< |
| Training | Teach model to end final with `< |
{ "role": "assistant", "channel": "final", "content": "Visible reply", "tool_call": { // absent unless <|call|> was emitted "id": "abc123", // from call_id= header attribute "recipient": "functions.lookup_weather", // from to= header attribute "content_type": "json", "arguments": "{\"location\":\"SF\"}" }, "thinking": "Private reasoning (analysis)", // optional; never deliver to end users "intent": "preamble" // optional; see §5 }