Consolidate config + metadata generated by Data Designer #173

@nabinchha

Description

See comment in this PR

Streamline Data Designer metadata generation. Currently, we generate three files: column_configs.json, model_configs.json, and metadata.json. Ideally, we should collapse these into two: a single metadata.json (sanitized for Hugging Face) and a new sdg.json file (the name is open to change). The latter would capture the serialized version of the entire Data Designer SDG pipeline, allowing anyone to recreate it using the DataDesignerConfigBuilder.from_config(...) API. This approach lets us focus initially on the push_to_hub integration, since "re-hydrating" a DatasetCreationResults object via pull_to_hub seems a bit more complex. I can create a GitHub issue and a PR for this at the beginning of next year.
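For illustration, here is a rough sketch of what the consolidation step could look like. It is not the Data Designer implementation: the function name `consolidate_artifacts`, the `ALLOWED_METADATA_KEYS` allow-list, and the `columns` / `model_configs` layout of sdg.json are all assumptions made for this example, and it uses only the standard library.

```python
import json
from pathlib import Path

# Hypothetical allow-list: which metadata keys are safe to publish.
# The real sanitization rules for Hugging Face are not specified in this issue.
ALLOWED_METADATA_KEYS = {"dataset_name", "num_records", "column_names"}


def consolidate_artifacts(artifact_dir: Path) -> None:
    """Collapse the three generated files into sdg.json + sanitized metadata.json."""
    column_path = artifact_dir / "column_configs.json"
    model_path = artifact_dir / "model_configs.json"
    metadata_path = artifact_dir / "metadata.json"

    column_configs = json.loads(column_path.read_text())
    model_configs = json.loads(model_path.read_text())
    metadata = json.loads(metadata_path.read_text())

    # Assumed sdg.json layout: one document holding the whole pipeline config.
    sdg = {"columns": column_configs, "model_configs": model_configs}
    (artifact_dir / "sdg.json").write_text(json.dumps(sdg, indent=2))

    # Keep only allow-listed keys in the published metadata.
    sanitized = {k: v for k, v in metadata.items() if k in ALLOWED_METADATA_KEYS}
    metadata_path.write_text(json.dumps(sanitized, indent=2))

    # The per-type config files are now redundant.
    column_path.unlink()
    model_path.unlink()
```

The resulting sdg.json would then be the input for DataDesignerConfigBuilder.from_config(...), assuming that API accepts this shape; the actual schema would need to match whatever the builder serializes.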
