Conversation

@mattsu2020
Contributor

@mattsu2020 mattsu2020 commented Nov 14, 2025

Implement logic to increment column count in WidthMode::Characters, emitting output when width is reached. This ensures accurate line folding for multi-byte characters, enhancing Unicode support.

Related: #9127

…scii_line

Implement logic to increment column count in WidthMode::Characters, emitting output when width is reached. This ensures accurate line folding for multi-byte characters, enhancing Unicode support.
@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

- Added conditional check in fold_file function to call emit_output when col_count >= width
- Ensures lines are properly wrapped based on byte or character width before final output flush
- Improves handling of incomplete lines that need early breaking to respect the specified width
@codspeed-hq

codspeed-hq bot commented Nov 14, 2025

CodSpeed Performance Report

Merging #9274 will improve performance by 49.22%

Comparing mattsu2020:fold_compatibility (eb7d8c4) with main (2a314c7)

Summary

⚡ 2 improvements
✅ 124 untouched
⏩ 6 skipped [1]

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| fold_custom_width[50000] | 32.9 ms | 22 ms | +49.22% |
| fold_many_lines[100000] | 80.1 ms | 55.8 ms | +43.39% |

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

In character width mode, emit output immediately after segments are added if column count exceeds width, preventing redundant flushes. Simplify the file folding logic by removing unnecessary conditional checks at the end, ensuring clean output writing. This fixes potential issues with extra line breaks or incorrect folding behavior.
…ability

Refactor code in fold.rs to break lengthy if-condition statements across multiple lines in push_ascii_segment, process_utf8_line, and process_non_utf8_line functions. This improves code readability without changing functionality.
@github-actions

GNU testsuite comparison:

GNU test failed: tests/fold/fold-characters. tests/fold/fold-characters is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

…ory usage

Introduce a STREAMING_FLUSH_THRESHOLD constant and helper functions (maybe_flush_unbroken_output, push_byte, push_bytes) to periodically flush the output buffer when it exceeds 8KB and no spaces are being tracked, preventing excessive memory consumption when processing large files. This refactor replaces direct buffer pushes with checks for threshold-based flushing.
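
The threshold-based flushing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `FoldState` struct and its `writer`, `output`, and `last_space` fields are hypothetical, and the constant's value is assumed from the "8KB" in the commit message. Only the names `STREAMING_FLUSH_THRESHOLD`, `maybe_flush_unbroken_output`, `push_byte`, and `push_bytes` come from the commit text.

```rust
use std::io::{self, Write};

// Assumed value: the commit message only says "8KB".
const STREAMING_FLUSH_THRESHOLD: usize = 8 * 1024;

/// Hypothetical stand-in for the fold state; the field names are illustrative.
struct FoldState<W: Write> {
    writer: W,
    /// Bytes of the current output line that have not been written yet.
    output: Vec<u8>,
    /// Index of the last space in `output` while a break position is being
    /// tracked; the buffer must stay intact as long as this is set.
    last_space: Option<usize>,
}

impl<W: Write> FoldState<W> {
    /// Write out the buffer once it has reached the threshold, but only when
    /// no space is being tracked as a possible break position.
    fn maybe_flush_unbroken_output(&mut self) -> io::Result<()> {
        if self.last_space.is_none() && self.output.len() >= STREAMING_FLUSH_THRESHOLD {
            self.writer.write_all(&self.output)?;
            self.output.clear();
        }
        Ok(())
    }

    /// Append one byte, then check the threshold.
    fn push_byte(&mut self, byte: u8) -> io::Result<()> {
        self.output.push(byte);
        self.maybe_flush_unbroken_output()
    }

    /// Append a slice of bytes, then check the threshold.
    fn push_bytes(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.output.extend_from_slice(bytes);
        self.maybe_flush_unbroken_output()
    }
}
```

Routing every push through one helper keeps the memory bound in a single place: a very long input line without break candidates never holds much more than one threshold's worth of bytes in `output`.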
@sylvestre
Contributor

Could you please add tests?
thanks

@github-actions

GNU testsuite comparison:

GNU test failed: tests/fold/fold-characters. tests/fold/fold-characters is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@sylvestre
Contributor

and please fix this regression:
GNU test failed: tests/fold/fold-characters. tests/fold/fold-characters is passing on 'main'. Maybe you have to rebase?

mattsu2020 and others added 5 commits November 15, 2025 08:55
…d tests

Remove conditional checks that incorrectly emitted output when column count reached width in character mode, ensuring proper folding of wide characters and handling of edge cases. Add comprehensive tests for wide characters, invalid UTF-8, zero-width spaces, and buffer boundaries to verify correct behavior. This prevents issues with multi-byte character folding where output was prematurely flushed, improving accuracy for Unicode input.
- Remove trailing empty lines in fold.rs
- Compact multiline variable assignments in test_fold.rs for readability
…racters

Add unicode-width crate to handle zero-width Unicode characters in fold utility. Introduced new test 'test_zero_width_data_line_counts' to verify correct wrapping in --characters mode for zero-width bytes and spaces, ensuring fold behaves consistently with character counts rather than visual width.
- Add bytecount dependency to Cargo.toml and Cargo.lock
- Refactor newline_count function in test_fold.rs to use bytecount::count instead of manual iteration for better performance
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

Modify the fold implementation to process input in buffered chunks rather than line-by-line reading, ensuring correct handling of multi-byte characters split across buffer boundaries. Add process_pending_chunk function and new streaming logic to fold_file for better performance on large files. Update tests accordingly.
Replace loop with early empty check by a while loop conditional on !pending.is_empty()
for clarity. Restructure invalid UTF-8 error handling to first check if valid_up_to == 0,
then process the valid prefix, improving code readability and flow without changing behavior.
Consolidate the assignment of the `valid` variable from multiple lines to a single line for improved code readability and adherence to style guidelines favoring concise declarations.
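
The chunked reading and `valid_up_to` handling described in the commits above can be illustrated with a small sketch. It is an assumption-laden outline, not the function from the PR: the name `process_pending_chunk` and the `pending` buffer come from the commit messages, while the `Piece` enum, the `emit` callback, and the exact signature are hypothetical. The standard-library calls (`std::str::from_utf8`, `Utf8Error::valid_up_to`, `Utf8Error::error_len`) are real.

```rust
use std::str;

/// Hypothetical output unit: decoded text, or raw bytes that are not valid UTF-8.
enum Piece<'a> {
    Text(&'a str),
    Invalid(&'a [u8]),
}

/// Consume as much of `pending` as can be decided now and pass each piece to
/// `emit`. An incomplete multi-byte sequence at the end stays in `pending`
/// so that the next read can complete it.
fn process_pending_chunk(pending: &mut Vec<u8>, mut emit: impl FnMut(Piece<'_>)) {
    while !pending.is_empty() {
        match str::from_utf8(pending.as_slice()) {
            Ok(valid) => {
                emit(Piece::Text(valid));
                pending.clear();
            }
            Err(err) => {
                let valid_up_to = err.valid_up_to();
                if valid_up_to == 0 {
                    match err.error_len() {
                        // Truncated sequence at the end of the buffer:
                        // wait for more input before deciding.
                        None => break,
                        // Bytes that can never form valid UTF-8:
                        // hand them over as raw bytes and move on.
                        Some(len) => {
                            emit(Piece::Invalid(&pending[..len]));
                            pending.drain(..len);
                        }
                    }
                } else {
                    // Emit the valid prefix first; the loop then revisits
                    // whatever follows it.
                    let valid = str::from_utf8(&pending[..valid_up_to])
                        .expect("prefix was validated by from_utf8");
                    emit(Piece::Text(valid));
                    pending.drain(..valid_up_to);
                }
            }
        }
    }
}
```

A caller would append each freshly read buffer to `pending`, call the function, and at end of input treat any bytes still left in `pending` as invalid. That last step is left out of the sketch.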
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/fold/fold-zero-width is no longer failing!

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/fold/fold-zero-width is no longer failing!

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/fold/fold-zero-width is no longer failing!

@mattsu2020
Contributor Author

Please fix this regression: GNU test failed: tests/fold/fold-characters. tests/fold/fold-characters is passing on 'main'. Maybe you have to rebase?

done

@ChrisDryden
Contributor

See #9328. I just saw that this was succeeding; with both of these together, all of the fold tests will pass.

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

… mode

Only coalesce zero-width combining characters into base characters when folding by display columns (WidthMode::Columns). In character-counting mode, treat every scalar value as advancing the counter to match chars().count() semantics, preventing incorrect line breaking for characters with zero-width marks. This ensures consistent behavior across modes as verified by existing tests.
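
The difference between the two modes can be shown with a small sketch. It is illustrative only: `WidthMode::Columns` and `WidthMode::Characters` are named in the commit message, but `is_zero_width`, `advance`, and `line_width` are hypothetical helpers, and the zero-width check covers just two ranges as a stand-in for a real width table such as the one in the unicode-width crate. Wide characters that occupy two columns are ignored here.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum WidthMode {
    Columns,
    Characters,
}

/// Deliberately tiny stand-in for a real width table (assumption): combining
/// diacritical marks (U+0300..=U+036F) and ZERO WIDTH SPACE (U+200B).
fn is_zero_width(ch: char) -> bool {
    matches!(ch, '\u{0300}'..='\u{036F}' | '\u{200B}')
}

/// How far one scalar value moves the fold counter in the given mode.
fn advance(mode: WidthMode, ch: char) -> usize {
    match mode {
        // Character mode: every scalar value counts, as with chars().count().
        WidthMode::Characters => 1,
        // Column mode: zero-width marks attach to the preceding base character.
        WidthMode::Columns if is_zero_width(ch) => 0,
        // Simplification: every other character takes one column.
        WidthMode::Columns => 1,
    }
}

/// Total counter value for a line in the given mode.
fn line_width(mode: WidthMode, line: &str) -> usize {
    line.chars().map(|ch| advance(mode, ch)).sum()
}
```

For the input `"e\u{0301}"` (an `e` followed by a combining acute accent), this sketch counts 2 in character mode and 1 in column mode, which is the distinction the commit describes.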
@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/fold/fold-zero-width is no longer failing!

@mattsu2020
Contributor Author

@sylvestre I added tests, and the GNU coreutils tests now pass.

3 participants