Skip to content

Conversation

@Paol0B
Copy link

@Paol0B Paol0B commented Jan 12, 2026

This pull request refactors the output handling in the comm utility to write output directly to a locked stdout handle, improving efficiency and error handling, especially for non-UTF-8 input. It also adds a new test to ensure correct output when processing files with invalid UTF-8 sequences.

Output handling improvements:

  • Changed from using print! macros to writing directly to a locked stdout handle via stdout.write_all, allowing for more efficient and robust output, particularly for non-UTF-8 data. All output operations now properly handle errors using map_err_context. (src/uu/comm/src/comm.rs) [1] [2] [3] [4] [5] [6]

Testing for non-UTF-8 input:

  • Added a new test, test_output_lossy_utf8, to verify that the utility correctly handles and outputs files containing invalid UTF-8 bytes, matching GNU comm's behavior. (tests/by-util/test_comm.rs)

@github-actions
Copy link

GNU testsuite comparison:

Note: The gnu test tests/printf/printf-surprise is now being skipped but was previously passing.

@sylvestre
Copy link
Contributor

please run cargo fmt

@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

@Paol0B
Copy link
Author

Paol0B commented Jan 13, 2026

The current CI failure occurs during package installation on ubuntu-latest, prior to running any project-specific steps.
From the logs it appears related to external APT repositories returning 403.

@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

@ChrisDryden
Copy link
Collaborator

As a side note for other reviewers, I felt that the way that the output was written was repetitive and that we should have made a helper function for that but it turns out that we do not have that implemented yet and there's many utilities that follow that same format. Would be great as a follow up task to make that format into a helper and cleaning up all of the ones that match that pattern.

Wrap stdout in BufWriter to improve performance and avoid
duplicate error messages, matching GNU comm behavior.
@codspeed-hq
Copy link

codspeed-hq bot commented Jan 16, 2026

CodSpeed Performance Report

Merging this PR will improve performance by 8.42%

Comparing Paol0B:fix/10192 (ea47ce4) with main (5fd3459)

Summary

⚡ 2 improved benchmarks
✅ 280 untouched benchmarks
⏩ 38 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Memory du_deep_tree[(100, 3)] 155.8 KB 143.7 KB +8.42%
Memory numfmt_large_numbers_si[10000] 4.9 MB 4.7 MB +4.01%

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants