Update ls command to recurse all objects if -r flag is provided with no bucket #57
Simon9870 wants to merge 1 commit into rustfs:main
Conversation
This brings rc more in line with mc's behaviour, specifically when using the recursive flag. The only behaviour change is when the -r flag is provided with an alias but no bucket specified: previously this would have listed just the buckets (the same as without the flag); now it lists all objects on the server.
@codex review
Pull request overview
Updates the CLI ls command to match mc behavior by recursing through all buckets when -r/--recursive is provided with an alias but no bucket, and refactors object listing pagination into a shared helper.
Changes:
- Add `ls -r <alias>` behavior to list objects across all buckets instead of listing buckets only.
- Introduce `list_all_objects` to enumerate buckets and then recursively list objects per bucket.
- Extract pagination logic into `list_objects_with_paging` for reuse in both single-bucket and all-buckets listing.
```rust
for (bucket_name, objects) in all_items {
    for item in objects {
        let date = item
            .last_modified
            .map(|d| d.strftime("%Y-%m-%d %H:%M:%S").to_string())
            .unwrap_or_else(|| " ".to_string());
        let styled_date = formatter.style_date(&format!("[{date}]"));

        if item.is_dir {
            let styled_size = formatter.style_size(&format!("{:>10}", "0B"));
            let styled_name =
                formatter.style_dir(&format!("{}/{}", bucket_name, &item.key));
            formatter.println(&format!("{styled_date} {styled_size} {styled_name}"));
        } else {
            let size = item.size_human.clone().unwrap_or_else(|| "0 B".to_string());
            let styled_size = formatter.style_size(&format!("{:>10}", size));
            let styled_name =
                formatter.style_file(&format!("{}/{}", bucket_name, &item.key));
            formatter.println(&format!("{styled_date} {styled_size} {styled_name}"));
        }
    }
}
```
Iterating over `HashMap` in the text output path (`for (bucket_name, objects) in all_items`) produces a non-deterministic bucket order between runs, which can make CLI output harder to read and test. If you keep the bucket grouping, consider using an order-preserving structure (e.g., a `Vec` in `list_buckets()` order, or a `BTreeMap`) so output is stable.
Suggested change:

```rust
// Ensure deterministic bucket order in text output
let mut bucket_names: Vec<&str> = all_items.keys().copied().collect();
bucket_names.sort_unstable();
for bucket_name in bucket_names {
    if let Some(objects) = all_items.get(bucket_name) {
        for item in objects {
            let date = item
                .last_modified
                .map(|d| d.strftime("%Y-%m-%d %H:%M:%S").to_string())
                .unwrap_or_else(|| " ".to_string());
            let styled_date = formatter.style_date(&format!("[{date}]"));
            if item.is_dir {
                let styled_size = formatter.style_size(&format!("{:>10}", "0B"));
                let styled_name =
                    formatter.style_dir(&format!("{}/{}", bucket_name, &item.key));
                formatter.println(&format!("{styled_date} {styled_size} {styled_name}"));
            } else {
                let size = item.size_human.clone().unwrap_or_else(|| "0 B".to_string());
                let styled_size = formatter.style_size(&format!("{:>10}", size));
                let styled_name =
                    formatter.style_file(&format!("{}/{}", bucket_name, &item.key));
                formatter.println(&format!("{styled_date} {styled_size} {styled_name}"));
            }
        }
    }
}
```
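The comment also mentions `BTreeMap` as an option. A minimal, self-contained sketch of that alternative — the bucket and object values below are illustrative stand-ins, not the PR's real types — showing that a `BTreeMap` yields a stable, sorted iteration order with no explicit sort step:

```rust
use std::collections::BTreeMap;

// Group objects per bucket in a BTreeMap; iteration is always in sorted key
// order, so repeated runs print buckets deterministically.
fn grouped_output(entries: &[(&str, &str)]) -> Vec<String> {
    let mut all_items: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(bucket, key) in entries {
        all_items.entry(bucket).or_default().push(key);
    }
    let mut lines = Vec::new();
    for (bucket_name, objects) in &all_items {
        for key in objects {
            lines.push(format!("{bucket_name}/{key}"));
        }
    }
    lines
}

fn main() {
    // Insertion order is zeta-first, but output is alphabetical by bucket.
    for line in grouped_output(&[("zeta", "b.txt"), ("alpha", "a.txt")]) {
        println!("{line}");
    }
}
```

The trade-off versus sorting a `Vec` of keys is that `BTreeMap` imposes lexicographic order rather than the order `list_buckets()` returned.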
```rust
if formatter.is_json() {
    let output = LsOutput {
        items: all_items.into_values().flatten().collect(),
        truncated: is_truncated,
        continuation_token,
        summary: if summarize {
            Some(Summary {
                total_objects,
                total_size_bytes: total_size,
                total_size_human: humansize::format_size(total_size as u64, humansize::BINARY),
            })
        } else {
            None
        },
    };
    formatter.json(&output);
}
```
In `list_all_objects`, the JSON output collects `ObjectInfo` values from multiple buckets without preserving which bucket each item came from (`items: all_items.into_values()...`). Since `ObjectInfo.key` is only the object key (not `bucket/object`), JSON consumers can't disambiguate objects from different buckets, and the output won't match the text-mode behavior that prefixes bucket names. Consider prefixing `ObjectInfo.key` with `"{bucket}/{key}"` when aggregating, or emitting a dedicated JSON item type that includes the bucket separately.
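One possible shape for the first suggestion, as a hedged sketch: `ObjectInfo` below is a minimal stand-in with only a `key` field, and `prefix_with_bucket` is a hypothetical helper, not code from this PR.

```rust
// Minimal stand-in for the real ObjectInfo (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct ObjectInfo {
    key: String,
}

// Hypothetical helper: rewrite each key as "{bucket}/{key}" while aggregating,
// so flattened JSON items stay unambiguous across buckets.
fn prefix_with_bucket(bucket: &str, mut items: Vec<ObjectInfo>) -> Vec<ObjectInfo> {
    for item in &mut items {
        item.key = format!("{bucket}/{}", item.key);
    }
    items
}

fn main() {
    let items = vec![ObjectInfo { key: "foo.txt".to_string() }];
    let prefixed = prefix_with_bucket("docs", items);
    println!("{}", prefixed[0].key); // docs/foo.txt
}
```

This keeps the existing `LsOutput` shape intact at the cost of changing what `key` means in the alias-recursive case.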
```rust
let mut all_items: HashMap<&str, Vec<ObjectInfo>> = HashMap::new();
let mut is_truncated = false;
let mut continuation_token: Option<String> = None;

// List all objects in each bucket
for bucket in &buckets {
    let path = &RemotePath::new(&alias, &bucket.key, "");
    let new_items: Vec<ObjectInfo>;

    (new_items, is_truncated, continuation_token) =
        match list_objects_with_paging(client, path, &options).await {
            Ok(r) => r,
            Err((message, exit_code)) => {
                formatter.error(&message);
                return exit_code;
            }
        };

    all_items.entry(&bucket.key).or_default().extend(new_items);
}
```
`list_all_objects` loads all objects from all buckets into memory (a `HashMap` of vectors) before printing or serializing. On large deployments this can cause very high memory use and long delays before any output. Prefer streaming per-bucket results: print items as they're listed (and accumulate summary counters), and for JSON either (a) still collect but avoid the intermediate `HashMap`, or (b) add a streaming JSON mode if supported by the formatter.
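A sketch of the streaming shape this comment describes — the types and names here are illustrative, not the PR's API; the point is that each bucket's page is printed immediately and only two counters survive across buckets:

```rust
// Hypothetical running summary; only two integers live across buckets.
#[derive(Default)]
struct RunningSummary {
    total_objects: u64,
    total_size: u64,
}

// Print one bucket's listed objects as soon as they arrive, then drop them.
// `objects` stands in for a single page of (key, size) results.
fn stream_bucket(bucket: &str, objects: &[(String, u64)], summary: &mut RunningSummary) {
    for (key, size) in objects {
        println!("{size:>10} {bucket}/{key}"); // emitted immediately, not buffered
        summary.total_objects += 1;
        summary.total_size += size;
    }
}

fn main() {
    let mut summary = RunningSummary::default();
    stream_bucket("photos", &[("a.jpg".to_string(), 100), ("b.jpg".to_string(), 250)], &mut summary);
    stream_bucket("docs", &[("notes.txt".to_string(), 50)], &mut summary);
    println!("objects={} bytes={}", summary.total_objects, summary.total_size);
}
```

Peak memory is then bounded by one page of one bucket rather than the whole server's listing.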
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cc46fa7b4
```rust
let output = LsOutput {
    items: all_items.into_values().flatten().collect(),
    truncated: is_truncated,
    continuation_token,
```
Preserve bucket context in `--json` alias-recursive listings
When `rc ls --recursive <alias> --json` walks the whole server, this branch serializes `all_items.into_values().flatten()` directly. `ObjectInfo` only carries a key, so unlike the human-output path below it never prefixes `bucket_name/`, which means the JSON loses which bucket each object came from. If two buckets both contain `foo.txt`, callers will see two indistinguishable `key: "foo.txt"` entries and cannot reconstruct the full remote path from the new output.
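The second option from the earlier comment — a dedicated JSON item type — sketched with hypothetical names (`LsJsonItem` is not a type in this PR; in real code it would derive the formatter's serialization trait):

```rust
// Hypothetical JSON item that carries the bucket separately from the key,
// letting --json consumers rebuild the full remote path unambiguously.
struct LsJsonItem {
    bucket: String,
    key: String,
}

impl LsJsonItem {
    // Reconstruct "{bucket}/{key}", mirroring the text-mode output.
    fn remote_path(&self) -> String {
        format!("{}/{}", self.bucket, self.key)
    }
}

fn main() {
    let a = LsJsonItem { bucket: "photos".into(), key: "foo.txt".into() };
    let b = LsJsonItem { bucket: "backups".into(), key: "foo.txt".into() };
    // Same key in both buckets, but the bucket field keeps them distinguishable.
    println!("{} vs {}", a.remote_path(), b.remote_path());
}
```

Unlike prefixing `key` in place, this changes the output schema, so it would want a note in the CLI docs.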
```rust
    ..Default::default()
};

let mut all_items: HashMap<&str, Vec<ObjectInfo>> = HashMap::new();
```
Keep bucket order deterministic in alias-recursive `ls`
This new path stores per-bucket results in a `HashMap`, then renders it with `for (bucket_name, objects) in all_items` / `into_values()`. Because Rust hash maps use a randomized iteration order, repeated `rc ls -r <alias>` runs against the same data can emit buckets in a different order than `list_buckets()` returned, which is a user-visible regression for scripts and snapshot-style checks that diff the command output.
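A sketch of the order-preserving fix, keeping results in a `Vec` in whatever order `list_buckets()` returned them (the closure-based `list` parameter is an illustration, not the PR's real signature):

```rust
// Hypothetical: pair each bucket with its objects in a Vec, preserving the
// exact order the server returned the buckets in (no hashing involved).
fn group_in_bucket_order<F>(buckets: &[&str], mut list: F) -> Vec<(String, Vec<String>)>
where
    F: FnMut(&str) -> Vec<String>,
{
    buckets
        .iter()
        .map(|b| (b.to_string(), list(b)))
        .collect()
}

fn main() {
    let grouped = group_in_bucket_order(&["zeta", "alpha"], |b| vec![format!("{b}-obj")]);
    for (bucket, objects) in &grouped {
        println!("{bucket}: {objects:?}"); // zeta first, as the server returned it
    }
}
```

Compared with sorting keys or using a `BTreeMap`, this keeps output order identical to the non-recursive bucket listing.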
Closes #47