Key links to docs are:
- https://tskit.dev/tskit/docs/stable/metadata.html#sec-metadata (not to be confused with https://tskit.dev/tskit/docs/stable/provenance.html#sec-provenance)
- https://tskit.dev/tskit/docs/stable/metadata.html#top-level
- https://tskit.dev/tskit/docs/stable/metadata.html#tables
- Working with Metadata tutorial https://tskit.dev/tutorials/metadata.html#sec-tutorial-metadata
- Differences with the Python API https://github.com/tskit-dev/tskit/blob/03ad862aa5830134d840ba150b4005ad7a5a7def/docs/c-api.rst?plain=1#L36C1-L37C1
From the last link we have:
Much of the explanatory material (for example tutorials) about the Python API applies to
the C-equivalent methods as the Python API wraps this API.
The main area of difference is, unlike the Python API, the C API doesn't do any
decoding, encoding or schema validation of :ref:`sec_metadata` fields,
instead only handling the byte string representation of the metadata. Metadata is therefore
never used directly by any tskit C API method, just stored.
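To make the bytes-only point concrete, here is a minimal sketch in Python (chosen for comparison with the Python API, and using only the standard library, not tskit itself). The `schema` and `record` values are hypothetical stand-ins; the sketch assumes a JSON codec and only illustrates that a bytes-only layer leaves both encoding and decoding to the caller.

```python
import json

# Hypothetical values, shaped like what a decoding API would hand to the user.
schema = {"codec": "json", "type": "object"}
record = {"mean_coverage": 200.5}

# What a bytes-only layer would store: plain byte strings, nothing more.
schema_bytes = json.dumps(schema).encode("utf-8")
record_bytes = json.dumps(record).encode("utf-8")

# What the caller has to do to get structured data back:
# read the codec from the schema, then decode the record accordingly.
codec = json.loads(schema_bytes.decode("utf-8"))["codec"]
if codec == "json":
    decoded = json.loads(record_bytes.decode("utf-8"))
else:
    decoded = record_bytes  # unknown codec: leave the bytes untouched

print(type(record_bytes).__name__, decoded)
```

Any R-side decoder would need the same two inputs: the raw metadata bytes and the schema that says how to interpret them.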
When I asked Codex about this, it suggested:
You can decode on the R side similarly to Python: treat the raw bytes as JSON (if the schema says JSON) and run jsonlite::fromJSON().
A simple approach:
1. Read the metadata schema (if present) to decide the codec.
2. Convert raw -> string.
3. Parse based on the codec (JSON is the common case).
Example helper:
decode_metadata <- function(raw_vec, schema_raw = NULL) {
  if (length(raw_vec) == 0) {
    return(NULL)
  }
  # Default to JSON if no schema is provided
  codec <- "json"
  if (!is.null(schema_raw) && length(schema_raw) > 0) {
    schema_txt <- rawToChar(schema_raw)
    schema <- jsonlite::fromJSON(schema_txt)
    if (!is.null(schema$codec)) {
      codec <- schema$codec
    }
  }
  txt <- rawToChar(raw_vec)
  if (codec == "json") {
    return(jsonlite::fromJSON(txt, simplifyVector = TRUE))
  }
  # Fallback: return raw string if codec is unknown
  txt
}
Testing this myself gives:
ts = tskit$load("inst/examples/test2.trees")
decode_metadata(as.raw(ts$metadata))
# Error in nchar(x) : invalid multibyte string, element 1
ts$metadata
# $mean_coverage
# [1] 200.5
is(ts$metadata)
# [1] "list" "vector"
decode_metadata(as.raw(ts$tables$individuals$metadata))
# Error:
# ! lexical error: invalid char in json text.
# SOME CUSTOM BYTES #!@
# (right here) ------^
# Not sure why we get these errors, since the decoding logic looks correct!
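One possible reading of the output above, offered as a guess rather than a diagnosis: `ts$metadata` already arrives as a decoded list, so it is no longer the original byte string and `as.raw()` on it does not recover those bytes; and the individuals metadata appears to hold arbitrary bytes rather than JSON, so defaulting to the JSON codec when no schema is given cannot work for it. The sketch below, in Python for comparison with the Python API, shows a more defensive decoder under those assumptions. `safe_decode` is a hypothetical helper, not part of tskit.

```python
import json
from typing import Any, Optional


def safe_decode(value: Any, schema: Optional[dict] = None) -> Any:
    """Decode metadata only when it is bytes and the schema names a known codec."""
    # Anything that is not bytes is assumed to be decoded already: pass it through.
    if not isinstance(value, (bytes, bytearray)):
        return value
    codec = (schema or {}).get("codec")
    if codec == "json":
        return json.loads(bytes(value).decode("utf-8"))
    # No schema or unknown codec: do not guess, return the raw bytes.
    return bytes(value)


print(safe_decode({"mean_coverage": 200.5}))         # already decoded
print(safe_decode(b"SOME CUSTOM BYTES #!@"))         # no schema, stays bytes
print(safe_decode(b'{"a": 1}', {"codec": "json"}))   # JSON schema, parsed
```

The design choice here is to never assume JSON: without a schema the bytes are returned unchanged, which would avoid the lexical error seen for the individuals table.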