Detect Geoid in Dataset #8

Description

@etam4260

Is your feature request related to a problem? Please describe.
NA

Describe the solution you'd like

**MIGHT BE EXPORTED TO A SEPARATE REPO/PACKAGE IF IMPLEMENTATION IS TOO COMPLEX**

The initial goal of this feature is to help the user identify geographic identifiers in their dataset. For example, if there is a column called geoid and the data under it is 21321, 22433, ..., we might assume it refers to a zip or county code, but we can't be sure. If instead the column is called county and contains 32212, 13131, ..., then it is clear what the values refer to.

I think this would require some machine learning (NLP on the column names) combined with the structure of the entries within a column. For now, this is assumed to apply only to US datasets. For example, a 5-digit number would let us narrow the candidates down to zip, county, or cbsa, and the name of the column might also help classify it. (Could we expand this to other countries?) This would require a fair amount of knowledge about geographic identifiers...
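As a rough starting point before any machine learning is involved, the narrowing described above could be sketched with plain rules. Everything here is an assumption for illustration: the function name `guess_geoid_types`, the two lookup tables, and the hint keywords are hypothetical, and the length table covers only a few US identifiers.

```python
# Assumed digit lengths for a few US identifiers (illustrative, not exhaustive).
CANDIDATES_BY_LENGTH = {
    2: ["state"],                  # state FIPS code
    5: ["zip", "county", "cbsa"],  # ZIP code, county FIPS code, CBSA code
    11: ["tract"],                 # census tract GEOID
}

# Hypothetical column-name keywords that hint at each identifier type.
NAME_HINTS = {
    "state": ["state"],
    "zip": ["zip", "zcta", "postal"],
    "county": ["county", "cnty"],
    "cbsa": ["cbsa", "metro"],
    "tract": ["tract"],
}


def guess_geoid_types(column_name, values):
    """Return candidate identifier types for a column, most specific first.

    Values are treated as strings so that leading zeros are preserved.
    """
    cleaned = [str(v).strip() for v in values]
    lengths = {len(v) for v in cleaned}
    # Require all-digit entries that share a single length.
    if len(lengths) != 1 or not all(v.isdigit() for v in cleaned):
        return []
    candidates = CANDIDATES_BY_LENGTH.get(lengths.pop(), [])
    # Use the column name to narrow further when it contains a hint.
    name = column_name.lower()
    hinted = [c for c in candidates if any(h in name for h in NAME_HINTS[c])]
    return hinted or candidates


print(guess_geoid_types("geoid", ["21321", "22433"]))
print(guess_geoid_types("county", ["32212", "13131"]))
```

With a generic column name the sketch can only return the full list of 5-digit candidates; the column name is what disambiguates. A real implementation would probably also check values against known code lists.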

Another worthwhile feature would be to determine the hierarchy of geographic data within a dataset. For example, the state column should rank above the county column, with countysubdivision below that. This should likely be added only after there is a working solution for detecting geoids in a dataset.
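Once columns have been classified, the ordering itself could be a simple lookup. This is a minimal sketch that assumes each column has already been labeled with its identifier type; the `HIERARCHY` list and the `order_by_hierarchy` function are hypothetical names, and only the three levels mentioned above are included.

```python
# Assumed ordering, broadest first, using the three levels named in this issue.
HIERARCHY = ["state", "county", "countysubdivision"]


def order_by_hierarchy(columns):
    """Return the recognized geographic columns, broadest level first.

    Columns that are not in HIERARCHY are dropped from the result.
    """
    rank = {name: i for i, name in enumerate(HIERARCHY)}
    known = [c for c in columns if c.lower() in rank]
    return sorted(known, key=lambda c: rank[c.lower()])


print(order_by_hierarchy(["countysubdivision", "population", "state", "county"]))
```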

Describe alternatives you've considered
NA

Additional context
NA

Labels: enhancement (New feature or request)
