-
-
Notifications
You must be signed in to change notification settings - Fork 211
[ENH] V1 → V2 API Migration - datasets #1608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1608 +/- ##
==========================================
+ Coverage 53.02% 53.32% +0.29%
==========================================
Files 36 46 +10
Lines 4326 4645 +319
==========================================
+ Hits 2294 2477 +183
- Misses 2032 2168 +136 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
FYI @geetu040 Currently the
Issues:
Example:current def _get_dataset_features_file(did_cache_dir: str | Path | None, dataset_id: int) -> dict[int, OpenMLDataFeature]:
return _featuresOr by updating the Dataset class to use the underlining interface method from api_context directly. def _load_features(self) -> None:
...
self._features = api_context.backend.datasets.get_features(self.dataset_id)Another option is to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left an intermediate review. This is solid work and well done overall. Nice job. I'll look into the download part now.
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| *, | ||
| data_id: list[int] | None = None, # type: ignore | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not have same signature for all 3 methods: DatasetsAPI.list, DatasetsV1.list, DatasetsV2.list? does it raise pre-commit failures since a few might not be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that v2 signature was experimental, idk how pre-commits did not catch that, Will came them same
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| *, | ||
| data_id: list[int] | None = None, # type: ignore | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make this simple using private helper methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geetu040 I could not find a good common block to make an helper since the filters are passed via url in v1 and via json object in v2 , and both have different parsing, If you have any specific Idea on that please let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at DatasetsV1 I can think of these helper methods: _build_list_call, _parse_and_validate_xml, _parse_dataset
you can do something similar for DatasetsV2 though they can be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do something similar for DatasetsV2 though they can be different.
I see, That opens more options.
openml/_api/resources/datasets.py
Outdated
| bool | ||
| True if the deletion was successful. False otherwise. | ||
| """ | ||
| return openml.utils._delete_entity("data", dataset_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you implement the delete logic yourself instead of openml.utils._delete_entity, how would that look? I think it would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes Sense , It would look like a delete request from client along with exception handling
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, it can use private helper methods
| # Minimalistic check if the XML is useful | ||
| if "oml:data_qualities_list" not in qualities: | ||
| raise ValueError('Error in return XML, does not contain "oml:data_qualities_list"') | ||
| from openml._api import api_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we have this import at the very top? does it create circular import error? if not, should be moved to top from all functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does raise circular import
Thanks for a detailed explanation, I now have good understanding of the download mechanism.
minio can be handled easily, we will use a separate client along with
these are actually different objects in both apis, v1 uses xml and v2 keeps them in json
yes you are right, they are the same files, which are not required to be downloaded again for both versions, but isn't this true for almost all the http objects? they may have different format
I don't understand this point
agreed, should be handled by
agreed, adding in conclusion, I may ask, if we ignore the fact that it downloads the |
Metadata
Reference Issue: [ENH] V1 → V2 API Migration - datasets #1592
Depends on: [ENH] V1 → V2 API Migration - core structure #1576
Change Log Entry:This PR implements Datasets resource, and refactor its existing functions