Add callable support for decimal_format option#978

Open
NyanFisher wants to merge 1 commit into jcrist:main from NyanFisher:decimal-quantize

Conversation

@NyanFisher NyanFisher commented Feb 11, 2026

Hello!

Description of the problem solved by this PR

The msgspec library has many useful features, but the current version cannot quantize Decimal values during encoding. In finance and other applications that rely on exact arithmetic, values must be returned rounded to a specified precision. Without this functionality, a complete migration from pydantic to msgspec is not possible.

Changes implemented in this PR

Core Functionality

  • Added DECIMAL_FORMAT_CALLABLE enum value to support callable decimal_format
  • Added decimal_callable field to EncoderState and Encoder structs
  • Implemented callable invocation for both JSON and MessagePack encoders

Validation & Safety

  • Added runtime check preventing callable from returning Decimal (avoids infinite recursion)
  • Updated error messages to reflect new callable option

Type Hints

  • Updated json.pyi and msgpack.pyi stubs to include callable type hints:
decimal_format: Union[
    Literal["string", "number"],
    Callable[[decimal.Decimal], Union[str, float]],
]

Examples

Rounding to 2 Decimal Places

import msgspec
import decimal

enc = msgspec.json.Encoder(
    decimal_format=lambda d: str(d.quantize(decimal.Decimal("0.01")))
)

value = decimal.Decimal("123.456789")
print(enc.encode(value))  # b'"123.46"'

MessagePack with Rounding

import msgspec
import decimal

# MessagePack with custom rounding
enc = msgspec.msgpack.Encoder(
    decimal_format=lambda d: float(d.quantize(decimal.Decimal("0.001")))
)

value = decimal.Decimal("3.14159265")
msg = enc.encode(value)
print(msgspec.msgpack.decode(msg))  # 3.142

Error: Returning Decimal from Callable

import msgspec
import decimal

# INVALID: callable must not return Decimal
enc = msgspec.json.Encoder(
    decimal_format=lambda d: d.quantize(decimal.Decimal("0.01"))  # Error!
)

try:
    enc.encode(decimal.Decimal("1.234"))
except TypeError as e:
    print(e)  # decimal_format callable must not return a Decimal

I would appreciate any comments on improving or restructuring the code, as I don't often write in C.

Fix my issue - Closes #848

@NyanFisher NyanFisher changed the title Implement quantization for Decimal type when encode Draft: Implement quantization for Decimal type when encode Feb 11, 2026
@NyanFisher NyanFisher force-pushed the decimal-quantize branch 2 times, most recently from 2ee9741 to 31effee Compare February 11, 2026 13:51
@NyanFisher NyanFisher changed the title Draft: Implement quantization for Decimal type when encode Implement quantization for Decimal type when encode Feb 11, 2026
@NyanFisher NyanFisher force-pushed the decimal-quantize branch 2 times, most recently from 117001b to b5ff6b7 Compare February 11, 2026 15:32
@NyanFisher
Author

CI failures are unrelated to this change:

All build, test, and wheel jobs pass across all platforms.

@Siyet
Collaborator

Siyet commented Apr 10, 2026

Code looks solid and CI is green across the matrix - nice work, especially the test coverage in test_common.py.

One API design question I'd like to raise before this moves forward: the current shape places decimal_quantize / decimal_rounding on the encoder itself, which means every Decimal field in every struct passing through that encoder gets the same scale and rounding mode. In financial code it's common to have heterogeneous Decimal fields in the same payload (e.g. price at scale 4, quantity at scale 0, tax_rate at scale 6) - with an encoder-level setting you'd need separate encoders per shape, which defeats most of the ergonomic win.
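To make the concern concrete, here is a small stdlib-only sketch (no msgspec involved; the payload values are made up for illustration) of what a single encoder-wide quantum does to fields that want different scales:

```python
from decimal import Decimal

# Illustrative payload; field names and scales follow the example above.
payload = {
    "price": Decimal("19.123456"),
    "quantity": Decimal("3"),
    "tax_rate": Decimal("0.0725127"),
}

# One quantum for every Decimal, as an encoder-level setting would apply it.
single_quantum = Decimal("0.0001")
uniform = {name: value.quantize(single_quantum) for name, value in payload.items()}

# Per-field quanta: price at scale 4, quantity at scale 0, tax_rate at scale 6.
per_field_quanta = {
    "price": Decimal("0.0001"),
    "quantity": Decimal("1"),
    "tax_rate": Decimal("0.000001"),
}
per_field = {
    name: value.quantize(per_field_quanta[name]) for name, value in payload.items()
}

print(uniform)    # quantity becomes 3.0000, tax_rate loses two digits
print(per_field)  # each field keeps its own scale
```

With the uniform quantum, `quantity` is padded to `3.0000` and `tax_rate` is cut to `0.0725`, which is the mismatch described above.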

An alternative would be to attach quantization to the type via Annotated[Decimal, Meta(...)], e.g.

Price = Annotated[Decimal, Meta(decimal_quantize="0.0001", decimal_rounding="ROUND_HALF_EVEN")]

That composes naturally with per-field configuration, lives next to the type where the constraint is logically defined, and matches how gt/ge/pattern etc. already work today. The downside is more plumbing through TypeNode instead of one encoder kwarg.
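A rough sketch of what type-attached quantization could look like, using only the standard library. The `Quantize` marker and the `quantized_fields` helper are hypothetical names invented for this illustration; msgspec's `Meta` does not accept these options today, and the real plumbing would go through `TypeNode` in C.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Quantize:
    """Hypothetical marker standing in for Meta(decimal_quantize=...)."""

    quantum: str
    rounding: str = ROUND_HALF_EVEN


@dataclass
class Order:
    price: Annotated[Decimal, Quantize("0.0001")]
    quantity: Annotated[Decimal, Quantize("1")]
    tax_rate: Annotated[Decimal, Quantize("0.000001")]


def quantized_fields(obj) -> dict:
    """Return obj's fields, rounding each Decimal per its annotation."""
    hints = get_type_hints(type(obj), include_extras=True)
    out = {}
    for name, hint in hints.items():
        value = getattr(obj, name)
        # Annotated[...] exposes its extra arguments via __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Quantize):
                value = value.quantize(Decimal(meta.quantum), rounding=meta.rounding)
        out[name] = value
    return out


order = Order(Decimal("19.123456"), Decimal("3"), Decimal("0.0725127"))
print(quantized_fields(order))
```

The constraint sits next to the type, but the helper needs the class annotations to find it, which is the extra plumbing mentioned above.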

Did you consider the Meta-based approach? If so, what made you land on encoder-level? Both have trade-offs and I'd rather get the API right before merge.

cc @jcrist @ofek — this expands the encoder API surface, so I'd like your read on whether the encoder-kwarg shape is the one we want, or whether Meta-based quantization is preferable.

@jcrist
Owner

jcrist commented Apr 10, 2026

Instead of two new options for quantization, how about adding a single decimal_format option to Encoder? This would take either a string to pass to quantize (something like decimal.quantize(Decimal(decimal_format))), or a callable that takes in the decimal and returns a new value to encode. A few examples:

# Uses default rounding
enc = Encoder(decimal_format="0.0001")

# Custom rounding
enc = Encoder(decimal_format=lambda d: d.quantize(decimal.Decimal("0.001"), "ROUND_DOWN"))

I like this since it's more flexible, and also only adds a single new option. Otherwise I'd worry about other users needing further customization, resulting in a number of decimal_* kwargs.
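For reference, the two shapes proposed here can be modelled in plain Python. `apply_decimal_format` is a hypothetical helper written for this illustration, not msgspec code; it assumes the string form is used directly as the quantum and that "default rounding" means the current decimal context's rounding mode (ROUND_HALF_EVEN unless changed).

```python
from decimal import Decimal, ROUND_DOWN


def apply_decimal_format(value: Decimal, decimal_format):
    """Hypothetical model of the proposed option, not msgspec code."""
    if isinstance(decimal_format, str):
        # String form: treat it as the quantum, using the context's rounding.
        return value.quantize(Decimal(decimal_format))
    # Callable form: the caller decides how to round.
    return decimal_format(value)


value = Decimal("1.23456")
default_rounded = apply_decimal_format(value, "0.001")
truncated = apply_decimal_format(
    value, lambda d: d.quantize(Decimal("0.001"), rounding=ROUND_DOWN)
)
print(default_rounded, truncated)  # 1.235 1.234
```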

I wouldn't expect a callable here to have a perf cost - calling into python here is negligible, most of the time will be in the quantize call itself.

Did you consider the Meta-based approach? If so, what made you land on encoder-level? Both have trade-offs and I'd rather get the API right before merge.

In msgspec, (currently) encoding doesn't have any type-level information, it only has the values. This means customization for encoding cannot rely on information in annotations, it has to rely on the actual object instances themselves. This is admittedly less flexible in cases where you might want to encode different values differently, but keeps the encoder simple and supports values that exist outside of containers with attached annotations (e.g. encode(decimal_object) wouldn't have annotations, but encode(struct_with_a_decimal_field) would).
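A small stdlib illustration of that point: the annotation metadata lives on the class, while the value handed to an encoder is a plain `Decimal`. The `Order` class and the `"scale=4"` metadata string are placeholders chosen for this example only.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Annotated, get_type_hints


@dataclass
class Order:
    # The metadata is attached to the annotation, not to the value.
    price: Annotated[Decimal, "scale=4"]


order = Order(price=Decimal("1.23456"))

# The class can be asked for its metadata...
hints = get_type_hints(Order, include_extras=True)
print(hints["price"].__metadata__)  # ('scale=4',)

# ...but the field value is an ordinary Decimal with nothing attached,
# exactly like a bare Decimal passed straight to encode().
bare = Decimal("1.23456")
print(type(order.price) is type(bare))  # True
print(hasattr(order.price, "__metadata__"))  # False
```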

For now a single setting on an Encoder is both straightforward to implement, and matches the current conventions.

@NyanFisher
Author

@Siyet @jcrist Hello! Thanks for the review!

@Siyet

Did you consider the Meta-based approach? If so, what made you land on encoder-level? Both have trade-offs and I'd rather get the API right before merge.

I hadn't considered using Meta, but I think that approach would result in a large number of TypeNodes. I work at a bank and know that a single Price type isn't enough, since it's too general a concept. But it's a good idea for the future 😃

@jcrist

Instead of two new options for quantization, how about adding a single decimal_format option to Encoder?

I like this idea, but the decimal_format parameter already exists. If you plan to extend the interface with additional types, I don’t think this is the best solution, as it will confuse users. I suggest using a separate additional parameter called decimal_quantize with the types Decimal | Callable[[Decimal], Decimal], which would be responsible exclusively for quantization.
This way, we’ll retain the ability to convert Decimal to “string”/“number”, add quantization, and maintain backward compatibility.

@Siyet
Collaborator

Siyet commented Apr 15, 2026

After thinking it through I'm coming around to @jcrist's single-kwarg shape. One slot for everything is, in my view, the right call here.

Encoder(decimal_format=lambda d: d.quantize(Decimal("0.001"), ROUND_DOWN))  # custom
Encoder(decimal_format="string")                                            # existing
Encoder(decimal_format="number")                                            # existing

We could split it along dataclasses.field(default=..., default_factory=...) lines (value in one kwarg, callable in another), but that split exists specifically to disambiguate "the value is a callable" from "call this to produce the value", and neither "string" nor "number" is callable. Introducing a separate decimal_hook just to satisfy a pattern we do not actually need feels like overcomplicating the interface.

There is also the naming angle: decimal_format reads as a verb just as naturally as it reads as a noun ("how to format the decimal"), which makes "pass a callable that does the formatting" fit the name rather than fight it.

@NyanFisher regarding your concern about overloading an existing kwarg: the three shapes ("string" / "number" / callable) dispatch unambiguously on type (string vs. callable), so the dispatch logic in C stays simple and the user-facing docs just enumerate the three accepted shapes in one place.
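As a sketch of how small that dispatch is, here is a Python model of the three accepted shapes. The enum, the function name, and the error message are assumptions made for this illustration; the actual check lives in C and may differ in wording and exception type.

```python
from enum import Enum, auto


class DecimalFormat(Enum):
    STRING = auto()
    NUMBER = auto()
    CALLABLE = auto()


def resolve_decimal_format(decimal_format) -> DecimalFormat:
    """Hypothetical model of the dispatch on the decimal_format argument."""
    if isinstance(decimal_format, str):
        if decimal_format == "string":
            return DecimalFormat.STRING
        if decimal_format == "number":
            return DecimalFormat.NUMBER
    elif callable(decimal_format):
        # A str is never callable, so the two branches cannot collide.
        return DecimalFormat.CALLABLE
    raise ValueError(
        "`decimal_format` must be 'string', 'number', or a callable, "
        f"got {decimal_format!r}"
    )


print(resolve_decimal_format("string"))
print(resolve_decimal_format("number"))
print(resolve_decimal_format(lambda d: str(d)))
```

Any other string, such as a quantum like `"0.001"`, falls through to the error in this model.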

@NyanFisher NyanFisher closed this Apr 27, 2026
@NyanFisher NyanFisher reopened this Apr 27, 2026
@NyanFisher NyanFisher changed the title Implement quantization for Decimal type when encode Draft: Implement quantization for Decimal type when encode Apr 27, 2026
@NyanFisher NyanFisher changed the title Draft: Implement quantization for Decimal type when encode Add callable support for decimal_format option Apr 28, 2026
@NyanFisher
Author

@Siyet

Please review this PR when you have a moment 🙂 I changed the implementation.

Collaborator

@Siyet Siyet left a comment

Re-checked after the rework. CI is green across the matrix, design matches what we landed on. Ran some scenarios beyond the existing tests locally (WSL Ubuntu 22.04, Python 3.10, build at 4810273), three blockers inline.

Plus docs: docs/supported-types.rst:595-606 only mentions 'string'/'number', would be good to add a callable example covering the use case from #848.

Nit: test_encoder_decimal_callable_raise_error_if_fn_return_decimal should use match="must not return a Decimal" to pin the message rather than any TypeError.

Comment thread src/msgspec/_core.c
else if (PyUnicode_CheckExact(decimal_format)) {
bool ok = false;
if (PyUnicode_CheckExact(decimal_format)) {
if (PyUnicode_CompareWithASCIIString(decimal_format, "string") == 0) {
Collaborator

Indentation broke after flattening the nesting. The inner if/elif here are at 12 spaces instead of 8. Same below: bodies of else if (PyCallable_Check) (lines 9596-9600) and else (lines 9601-9608) are at 8 spaces instead of 4.

Comment thread src/msgspec/json.pyi
@@ -23,15 +24,21 @@ schema_hook_sig = Optional[Callable[[type], dict[str, Any]]]

class Encoder:
enc_hook: enc_hook_sig
Collaborator

Callable[[Decimal], Union[str, float]] is narrower than what the runtime actually accepts. After the callable returns, the value goes through the regular encode path and accepts anything encodable (int, bool, dict, Struct, bytes, etc.). A realistic case: scaled integer for cents (lambda d: int(d * 100)), which mypy rejects with the current stub but works at runtime. Should be Callable[[Decimal], Any]. Same for msgpack.pyi.
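A minimal sketch of the two signatures side by side. The alias names and both formatter functions are illustrative only. One caveat worth checking against your type checker: under PEP 484's numeric promotion, `int` is generally accepted where `float` is expected, so the cents case may already pass the narrower alias, whereas a `dict` return clearly falls outside it.

```python
from decimal import Decimal
from typing import Any, Callable, Union

# Shape in the current stub versus the shape suggested in review.
NarrowFormat = Callable[[Decimal], Union[str, float]]
WideFormat = Callable[[Decimal], Any]


def to_cents(d: Decimal) -> int:
    """Scaled-integer representation, e.g. 12.34 -> 1234."""
    return int(d * 100)


def to_tagged(d: Decimal) -> dict:
    """A structured result that the narrow alias does not describe."""
    return {"value": str(d), "currency": "USD"}


cents_formatter: WideFormat = to_cents
tagged_formatter: WideFormat = to_tagged

print(cents_formatter(Decimal("12.34")))   # 1234
print(tagged_formatter(Decimal("12.34")))  # {'value': '12.34', 'currency': 'USD'}
```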

Comment thread src/msgspec/_core.c
if (type == (PyTypeObject *)(self->mod->DecimalType)) {
PyErr_SetString(
PyExc_TypeError,
"decimal_format callable must not return a Decimal"
Collaborator

The guard only catches a direct Decimal return, not a nested one. Verified locally:

lambda d: [Decimal("0.5")]              -> RecursionError
lambda d: {"v": Decimal("0.5")}         -> RecursionError
lambda d: Struct(inner=Decimal("0.5"))  -> RecursionError

Worse: with sys.setrecursionlimit(10**6) (common in projects with deep graphs) the Python-level safety net is gone and the encoder hits SIGSEGV with a core dump. Cleanest fix: add an in_decimal_callable flag to EncoderState, on re-entry into *_encode_decimal raise TypeError: callable returned a value containing a Decimal. ~5 lines of C, covers all the nested cases. Same applies to _core.c:14028 (json_encode_decimal).
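The re-entry guard can be modelled in a few lines of Python. `ToyEncoder` is a stand-in written for this illustration, not the msgspec encoder; its `_in_decimal_callable` attribute plays the role of the suggested `EncoderState` flag, and the real change would be made in C.

```python
from decimal import Decimal


class ToyEncoder:
    """Python model of the guard; it only walks dicts, lists, and tuples."""

    def __init__(self, decimal_format):
        self.decimal_format = decimal_format
        self._in_decimal_callable = False  # models the suggested flag

    def encode(self, obj):
        if isinstance(obj, Decimal):
            return self._encode_decimal(obj)
        if isinstance(obj, dict):
            return {key: self.encode(val) for key, val in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [self.encode(val) for val in obj]
        return obj

    def _encode_decimal(self, value):
        if self._in_decimal_callable:
            # Re-entry: the callable's result contained a Decimal somewhere.
            raise TypeError("callable returned a value containing a Decimal")
        self._in_decimal_callable = True
        try:
            return self.encode(self.decimal_format(value))
        finally:
            # Always reset, so the encoder stays usable after an error.
            self._in_decimal_callable = False


nested = ToyEncoder(lambda d: [Decimal("0.5")])
try:
    nested.encode(Decimal("1.234"))
except TypeError as exc:
    print(exc)  # callable returned a value containing a Decimal

ok = ToyEncoder(lambda d: str(d.quantize(Decimal("0.01"))))
print(ok.encode({"price": Decimal("1.234")}))  # {'price': '1.23'}
```

Because the check happens on re-entry rather than on the callable's immediate return value, the same flag covers the direct, list, and dict cases without unbounded recursion.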

Development

Successfully merging this pull request may close these issues.

Decimal is a custom type

3 participants