Standardize how to map scalar binning Enums to a corresponding scalar slot #99

@cmungall

Description

A common scenario is to have Enums whose permissible values are bins over ranges of a scalar value.

Each of the above encodes its ranges in the annotations, but in different ways.

Most range-binning enums express all their bins in the same unit, but this is not guaranteed. For example:

name: AgeGroupEnum
description: Standard age groups used in NIH clinical research, particularly NINDS
  CDEs
from_schema: https://w3id.org/linkml/valuesets
rank: 1000
permissible_values:
  NEONATE:
    text: NEONATE
    description: Birth to 28 days
    meaning: NCIT:C16731
    annotations:
      max_age_days: 28
    title: Newborn
  INFANT:
    text: INFANT
    description: 29 days to less than 1 year
    meaning: NCIT:C27956
    annotations:
      min_age_days: 29
      max_age_years: 1

We might even have different units within the same permissible value (PV), as in INFANT above, which uses min_age_days and max_age_years.

Note that these annotations are not strictly necessary: we can use linkml-map to write conditional boolean rules for the mapping. However, this can lead to ugly boilerplate, and it is often better to keep the range metadata in the PV itself.

If this were standardized, we could automatically (a) infer bins from scalar values and (b) perform consistency checking when both a bin and a scalar value are present.
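To make (a) and (b) concrete, here is a minimal Python sketch. It assumes the bounds have already been normalized to a single unit (days), that NEONATE starts at birth (day 0) with an inclusive 28-day maximum, that INFANT's "less than 1 year" upper bound is exclusive, and that 1 year is approximated as 365 days. `Bin`, `infer_bin`, and `check_consistency` are hypothetical helpers, not part of LinkML or linkml-map.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bin:
    """One permissible value of a binning enum, with bounds in one unit (days)."""
    name: str
    minimum: Optional[float] = None  # inclusive lower bound; None = unbounded
    maximum: Optional[float] = None  # upper bound; None = unbounded
    max_inclusive: bool = True       # does the upper bound itself fall in this bin?

    def contains(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is None:
            return True
        return value <= self.maximum if self.max_inclusive else value < self.maximum


# Hypothetical day-based normalization of the AgeGroupEnum annotations above
# (assumption: 1 year approximated as 365 days).
AGE_GROUP_BINS = [
    Bin("NEONATE", minimum=0, maximum=28, max_inclusive=True),
    Bin("INFANT", minimum=29, maximum=365, max_inclusive=False),
]


def infer_bin(value: float, bins: list[Bin]) -> Optional[str]:
    """(a) Return the name of the first bin containing value, or None."""
    for b in bins:
        if b.contains(value):
            return b.name
    return None


def check_consistency(value: float, declared: str, bins: list[Bin]) -> bool:
    """(b) True if the declared bin matches the bin inferred from the scalar."""
    return infer_bin(value, bins) == declared
```

For example, `infer_bin(200, AGE_GROUP_BINS)` gives `"INFANT"`, and `check_consistency(40, "NEONATE", AGE_GROUP_BINS)` is `False`. A fractional value such as 28.5 falls in no bin, which is the kind of gap that an explicit convention for inclusive/exclusive bounds would surface.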

In the AgeGroupEnum example above, we can imagine the corresponding scalar slot being either age_days or age_years (linkml-map already has examples of how unit conversion between these can be automated).
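As an illustration of the unit problem, here is a minimal Python sketch that normalizes the current ad-hoc age annotations (such as `min_age_days` and `max_age_years` on INFANT) to days. This is not linkml-map's conversion mechanism; the key pattern, the hypothetical `normalize_to_days` helper, the `DAYS_PER_UNIT` table, and the 365-day year are all assumptions for illustration.

```python
import re

# Assumed conversion factors to days; a 365-day year is a simplification.
DAYS_PER_UNIT = {"days": 1, "years": 365}

# Hypothetical pattern covering only the AgeGroupEnum-style keys shown above,
# e.g. "min_age_days" or "max_age_years".
KEY_PATTERN = re.compile(r"^(min|max)_age_(days|years)$")


def normalize_to_days(annotations: dict) -> dict:
    """Convert min/max age annotations in any supported unit to days."""
    bounds = {}
    for key, value in annotations.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # not an age-bound annotation; ignore it
        bound, unit = match.groups()
        days = value * DAYS_PER_UNIT[unit]
        # The same bound given in two units must agree after conversion.
        if bound in bounds and bounds[bound] != days:
            raise ValueError(f"conflicting {bound} bounds: {bounds[bound]} vs {days} days")
        bounds[bound] = days
    return bounds
```

With these assumptions, `normalize_to_days({"min_age_days": 29, "max_age_years": 1})` returns `{"min": 29, "max": 365}`. Having to special-case every naming pattern like this is part of the motivation for a single convention.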

We can imagine a few different ways to standardize this.

Nesting

INFANT:
    text: INFANT
    description: 29 days to less than 1 year
    meaning: NCIT:C27956
    annotations:
      min:
        age_days: 29
      max:
        age_years: 1

Flat

INFANT:
    text: INFANT
    description: 29 days to less than 1 year
    meaning: NCIT:C27956
    annotations:
      minimum_age_days: 29
      maximum_age_years: 1
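The nested and flat layouts carry the same information. As a hedged illustration, this small Python sketch rewrites the nested INFANT annotations into flat keys; `nested_to_flat` is a hypothetical helper, and the `min` to `minimum_` / `max` to `maximum_` prefix mapping is an assumption based on the key names in the two examples.

```python
# Assumed mapping from the nested bound keys to flat key prefixes.
PREFIXES = {"min": "minimum_", "max": "maximum_"}


def nested_to_flat(annotations: dict) -> dict:
    """Turn {"min": {"age_days": 29}} into {"minimum_age_days": 29}."""
    flat = {}
    for bound, slots in annotations.items():
        prefix = PREFIXES[bound]  # raises KeyError for anything other than min/max
        for slot, value in slots.items():
            flat[prefix + slot] = value
    return flat
```

For the nested INFANT example, `nested_to_flat({"min": {"age_days": 29}, "max": {"age_years": 1}})` returns `{"minimum_age_days": 29, "maximum_age_years": 1}`.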

My preference is for the latter. This would rely on linkml-map specifications for aggregate operations, which don't yet exist. But even so, simply having a convention:

  • minimum_[scalar_slot] and maximum_[scalar_slot], where scalar_slot is defined using normal linkml mechanisms

would help with human readability and consistency.
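Under such a convention, a tool could recover the scalar slot and bound type directly from each annotation key. Below is a minimal Python sketch, assuming the PV annotations are already loaded into a dict and that the set of known scalar slots would come from the schema (here a hard-coded hypothetical set). `parse_range_annotations` is illustrative, not an existing LinkML function.

```python
# Hypothetical scalar slots; in practice these would be looked up in the schema.
KNOWN_SCALAR_SLOTS = {"age_days", "age_years"}


def parse_range_annotations(annotations: dict) -> dict[str, dict[str, float]]:
    """Group minimum_[slot] / maximum_[slot] annotations by scalar slot."""
    ranges: dict[str, dict[str, float]] = {}
    for key, value in annotations.items():
        for bound in ("minimum", "maximum"):
            prefix = bound + "_"
            if key.startswith(prefix):
                slot = key[len(prefix):]
                # Consistency check: the suffix must name a real scalar slot.
                if slot not in KNOWN_SCALAR_SLOTS:
                    raise ValueError(f"{key!r} refers to unknown slot {slot!r}")
                ranges.setdefault(slot, {})[bound] = value
                break
        # Keys without a minimum_/maximum_ prefix are left alone.
    return ranges
```

For INFANT in the flat layout, `parse_range_annotations({"minimum_age_days": 29, "maximum_age_years": 1})` returns `{"age_days": {"minimum": 29}, "age_years": {"maximum": 1}}`, which makes the mixed units explicit and ready for conversion.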
