feat(datasets): add tasks subcommand with file task support and improved output #875

Merged
BCeZn merged 3 commits into alibaba:master from berstpander:feature/datasets-tasks-list on Apr 22, 2026

@berstpander
Contributor

Summary

  • Add rock datasets tasks CLI command to list task IDs under a dataset split
  • Fix task listing to recognize both directory and file tasks
  • Improve output formatting with visual separation from logs

closes #874

Changes

CLI

  • Add rock datasets tasks with --org, --dataset, --split, --offset, --limit args
  • Add separator lines and #Task name header for readable output

SDK

  • Add list_dataset_tasks to DatasetClient and registry layers
  • Add _extract_tasks_from_split method to merge directory + file tasks
  • Strip file suffix (e.g., task-001.json → task-001)
  • Ignore placeholder objects and nested paths
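
The merging rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name, its parameters, and the assumption that prefix_list and object_list arrive as plain lists of key strings are all hypothetical, and the real _extract_tasks_from_split may differ.

```python
from pathlib import PurePosixPath


def extract_tasks_from_split(split_prefix, prefix_list, object_list):
    """Hypothetical sketch: merge directory tasks and file tasks under a split."""
    base = split_prefix.rstrip("/") + "/"
    tasks = set()  # a set dedupes a task that exists as both a directory and a file

    # Directory tasks: immediate sub-prefixes such as "<split>/task-001/".
    for prefix in prefix_list:
        if not prefix.startswith(base):
            continue
        name = prefix[len(base):].strip("/")
        if name and "/" not in name:
            tasks.add(name)

    # File tasks: objects directly under the split such as "<split>/task-002.json".
    for key in object_list:
        if not key.startswith(base) or key.endswith("/"):
            continue  # outside the split, or a placeholder object
        rel = key[len(base):]
        if not rel or "/" in rel:
            continue  # nested path, not a top-level file task
        tasks.add(PurePosixPath(rel).stem)  # drop the final suffix only

    return sorted(tasks)
```

Collecting into a set and sorting at the end keeps the result stable regardless of the order in which the storage backend returns keys.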

Tests

  • Unit tests for CLI command, client, and OSS registry
  • Tests for directory + file task merging scenarios

Test Plan

  • uv run pytest tests/unit/datasets/ -v passes (15 tests)

Usage Example

rock datasets tasks \
  --bucket my-bucket \
  --org my-org --dataset my-dataset --split train

Output:

================================================================================
Dataset: my-org/my-dataset  Split: train  Total: 5  Shown: 5
================================================================================
#Task name
----------
environment
instruction
solution
task
tests
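
For readers who want to reproduce the layout shown in this example, the following is a rough sketch of how the separator lines, the info line, and the offset/limit slicing could fit together. The function name, its signature, and the 80-character separator width are assumptions made for illustration; the CLI's actual implementation is not shown in this PR description.

```python
def format_task_listing(org, dataset, split, tasks, offset=0, limit=None, width=80):
    """Hypothetical sketch: render a page of task IDs in the layout shown above."""
    # Apply pagination to the already-sorted task list.
    end = None if limit is None else offset + limit
    shown = tasks[offset:end]

    header = "#Task name"
    lines = [
        "",  # empty line separating the result from preceding log output
        "=" * width,
        f"Dataset: {org}/{dataset}  Split: {split}  "
        f"Total: {len(tasks)}  Shown: {len(shown)}",
        "=" * width,
        header,
        "-" * len(header),  # underline matches the header length
        *shown,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    all_tasks = ["environment", "instruction", "solution", "task", "tests"]
    print(format_task_listing("my-org", "my-dataset", "train", all_tasks))
```

Total counts every task in the split, while Shown counts only the tasks left after offset and limit are applied.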

berstpander and others added 3 commits April 21, 2026 19:03
- add `rock datasets tasks` with required org/dataset and default split=test
- support offset/limit pagination for displayed task IDs
- extend DatasetClient and registry layers with list_dataset_tasks
- add unit tests for CLI, client, and OSS registry behavior

Co-Authored-By: Oz <oz-agent@warp.dev>
…ader

Add visual separation between log messages and task list output:
- Empty line and separator line before results
- Consolidated info line with dataset/split/total/shown
- "#Task name" header with underline for task list

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously, task listing only recognized directory tasks from prefix_list.
This fix adds support for file tasks from object_list:

- Add _extract_tasks_from_split method to merge directory and file tasks
- Strip file suffix (e.g., task-001.json -> task-001)
- Ignore placeholder objects (key ending with "/") and nested paths
- Dedupe and sort merged task list

This affects both list_datasets (task count) and list_dataset_tasks (task list).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@berstpander berstpander force-pushed the feature/datasets-tasks-list branch from 9daa2e7 to 5a02133 Compare April 22, 2026 03:05
@BCeZn BCeZn merged commit 3b55efe into alibaba:master Apr 22, 2026
7 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

[Feature] Add dataset tasks listing with file task support

3 participants