
[future][discussion] deal with data located in files #196

@mathysgrapotte


Description of feature

The current implementation requires a flat input CSV file in which one line corresponds to one entry (see the README).

However, most datasets are not flat; for example, image datasets require the data to be stored in external files.
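One way to keep a flat CSV while supporting non-flat data is to store a file path per line instead of the data itself. The sketch below is a minimal illustration under that assumption. The column name `image_path` and the `read_entries` helper are hypothetical and are not part of the current input format. Paths are resolved relative to a base directory, but the files themselves are not opened, so parsing the CSV stays cheap regardless of how large the external files are.

```python
from __future__ import annotations

import csv
import io
from pathlib import PurePosixPath


def read_entries(csv_text: str, base_dir: PurePosixPath) -> list[dict[str, str]]:
    """Parse a flat CSV where each line is one entry.

    The hypothetical `image_path` column points to an external file. It is
    resolved relative to `base_dir`, but the file is NOT read here.
    """
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    for row in rows:
        row["image_path"] = str(base_dir / row["image_path"])
    return rows


example = "image_path,label\nimgs/a.png,cat\nimgs/b.png,dog\n"
entries = read_entries(example, PurePosixPath("/data/run1"))
print(entries[0])  # {'image_path': '/data/run1/imgs/a.png', 'label': 'cat'}
```

With this layout, one line still equals one entry, so line-based tools keep working. The open questions below remain, however, whenever a single external file holds many entries.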

We should consider:

  1. The shuffle process shuffles the lines of the input CSV file, but image datasets often consist of a few files that each contain many samples, which would reduce the effectiveness of shuffling. The same applies to the split method.
  2. How to mount those files into the various processes?
  3. An efficient way of parsing those files.
  4. Memory allocation for large datasets.
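Points 1 and 4 could be approached by shuffling and splitting lightweight references to individual samples rather than CSV lines or whole files. The following is a minimal sketch under stated assumptions, not a proposed implementation. It assumes each external file holds several samples and that the number of samples per file is known up front. The file names (`batch_0.h5`, `batch_1.h5`), the `SampleRef` type, and the `build_index`, `shuffle_and_split` and `iter_samples` helpers are all hypothetical. The `load_sample` reader is likewise a placeholder for whatever file parser is eventually chosen (point 3).

```python
from __future__ import annotations

import random
from typing import Any, Callable, Iterator, NamedTuple


class SampleRef(NamedTuple):
    """Points to one sample inside an external file (hypothetical layout)."""
    file: str
    offset: int  # index of the sample within the file


def build_index(samples_per_file: dict[str, int]) -> list[SampleRef]:
    """Expand each file into one reference per sample it contains."""
    return [
        SampleRef(path, i)
        for path, count in samples_per_file.items()
        for i in range(count)
    ]


def shuffle_and_split(
    index: list[SampleRef], train_fraction: float, seed: int
) -> tuple[list[SampleRef], list[SampleRef]]:
    """Shuffle at sample granularity, then split into train/test.

    Because the shuffle acts on sample references rather than on CSV lines
    or whole files, a few large files still yield a well-mixed ordering.
    Only small (file, offset) tuples are held in memory.
    """
    refs = list(index)
    random.Random(seed).shuffle(refs)
    cut = int(len(refs) * train_fraction)
    return refs[:cut], refs[cut:]


def iter_samples(
    refs: list[SampleRef], load_sample: Callable[[str, int], Any]
) -> Iterator[Any]:
    """Yield samples lazily, reading one sample at a time via `load_sample`."""
    for ref in refs:
        yield load_sample(ref.file, ref.offset)


index = build_index({"batch_0.h5": 3, "batch_1.h5": 2})
train, test = shuffle_and_split(index, train_fraction=0.8, seed=0)
print(len(train), len(test))  # 4 1
```

Keeping only references in memory means the index grows with the number of samples, not their size. This leaves the actual reading strategy (point 3) and how files are made available to each process (point 2) open.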

Metadata


Assignees: No one assigned

Labels: enhancement (New feature or request)

Type: No type

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
