Ordinal output layers and loss functions for PyTorch and TensorFlow/Keras, based on Rennie & Srebro (2005).
DeepOrdinal provides an OrdinalOutput layer that converts a learned logit into ordinal class probabilities via sorted thresholds, plus loss functions designed specifically for ordinal regression.
pip install deepordinalInstall with a specific backend:
pip install "deepordinal[torch]" # PyTorch
pip install "deepordinal[tf]" # TensorFlow/KerasFor development:
pip install -e ".[torch,tf]"import torch
import torch.nn as nn
from deepordinal.torch import OrdinalOutput, ordinal_loss
model = nn.Sequential(
nn.Linear(8, 16),
nn.ReLU(),
OrdinalOutput(input_dim=16, output_dim=4),
)
ordinal_layer = model[2]
optimizer = torch.optim.Adam(model.parameters())
# Training step
h = model[1](model[0](x_batch)) # hidden features
logits = ordinal_layer.linear(h) # raw logit
thresholds = ordinal_layer.interior_thresholds
loss = ordinal_loss(logits, y_batch, thresholds)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Inference
probs = model(x_batch) # (batch, K) class probabilities
preds = probs.argmax(dim=1)import tensorflow as tf
from deepordinal.tf import OrdinalOutput, ordinal_loss
dense = tf.keras.layers.Dense(16, activation="relu")
ordinal = OrdinalOutput(output_dim=4)
# Training step
with tf.GradientTape() as tape:
h = dense(x_batch)
logits = tf.matmul(h, ordinal.kernel) + ordinal.bias
thresholds = ordinal.interior_thresholds[0] # shape is (1, K-1)
loss = ordinal_loss(logits, y_batch, thresholds)
grads = tape.gradient(loss, dense.trainable_variables + ordinal.trainable_variables)
# Inference
probs = ordinal(dense(x_batch)) # (batch, K) class probabilities
preds = tf.argmax(probs, axis=1)Projects an input down to a single logit and converts it into K class probabilities using K-1 learned, sorted thresholds:
P(y = k | x) = sigmoid(t(k+1) - logit) - sigmoid(t(k) - logit)
where t(0) = -inf and t(K) = inf are fixed, and interior thresholds are initialized sorted.
| PyTorch | TensorFlow/Keras | |
|---|---|---|
| Import | from deepordinal.torch import OrdinalOutput |
from deepordinal.tf import OrdinalOutput |
| Constructor | OrdinalOutput(input_dim, output_dim) |
OrdinalOutput(output_dim) |
| Logit access | layer.linear(h) |
tf.matmul(h, layer.kernel) + layer.bias |
| Thresholds | layer.interior_thresholds — shape (K-1,) |
layer.interior_thresholds[0] — shape (1, K-1) |
Threshold-based ordinal loss from Rennie & Srebro (2005). Operates on raw logits and thresholds rather than probability output.
ordinal_loss(logits, targets, thresholds, construction="all", penalty="logistic")Parameters:
- logits —
(batch,)or(batch, 1), raw predictor output - targets —
(batch,), integer labels in[0, K) - thresholds —
(K-1,), sorted interior thresholds - construction —
"all"(default) or"immediate" - penalty —
"logistic"(default),"hinge","smooth_hinge", or"modified_least_squares"
Returns: scalar mean loss over the batch.
- All-threshold (default, eq 13) — penalizes violations of every threshold, weighted by direction. Bounds mean absolute error. Best performer in the paper's experiments.
- Immediate-threshold (eq 12) — only penalizes violations of the two thresholds bounding the correct class segment.
| Name | Formula | Reference |
|---|---|---|
"logistic" |
log(1 + exp(-z)) |
eq 9 |
"hinge" |
max(0, 1-z) |
eq 5 |
"smooth_hinge" |
0 if z>=1, (1-z)^2/2 if 0<z<1, 0.5-z if z<=0 |
eq 6 |
"modified_least_squares" |
0 if z>=1, (1-z)^2 if z<1 |
eq 7 |
The paper recommends all-threshold + logistic as the best-performing combination (the default).
Probabilistic generalization of logistic regression to K-class ordinal problems (Section 4).
ordistic_loss(logits, targets, means, log_priors=None)Parameters:
- logits —
(batch,)or(batch, 1), raw predictor output - targets —
(batch,), integer labels in[0, K) - means —
(K,), class means (convention: mu_1=-1, mu_K=1; interior means learned) - log_priors —
(K,)orNone, optional log-prior terms
Returns: scalar mean negative log-likelihood over the batch.
Complete training loops with synthetic ordinal data:
examples/example_torch.ipynb— PyTorchexamples/example_tf.ipynb— TensorFlow/Keras
pip install -e ".[torch,tf]"
pytest -vMIT
Rennie, J. D. M. & Srebro, N. (2005). Loss Functions for Preference Levels: Regression with Discrete Ordered Labels. Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling.