fold: fix GNU test fold-zero-width.sh #9274
base: main
…scii_line Implement logic to increment the column count in WidthMode::Characters and emit output once the width is reached. This makes line folding accurate for multi-byte characters and improves Unicode support.
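A minimal sketch of character-mode counting, assuming each Unicode scalar value (each `char`) advances the counter by one and that the width is positive. `fold_by_chars` is a hypothetical helper, not the fold.rs code: it emits the current segment only when one more character would go past `width`, so a line of exactly `width` characters is not followed by an empty line.

```rust
/// Hypothetical helper: folds `line` so that no output segment holds more
/// than `width` Unicode scalar values (the `chars()` count).
fn fold_by_chars(line: &str, width: usize) -> Vec<String> {
    // Assume a positive width; a width of 0 could never make progress.
    assert!(width > 0, "width must be positive");
    let mut out = Vec::new();
    let mut current = String::new();
    let mut col_count = 0;
    for ch in line.chars() {
        // The width has been reached and another character is arriving:
        // emit the finished segment before counting this one.
        if col_count == width {
            out.push(std::mem::take(&mut current));
            col_count = 0;
        }
        current.push(ch);
        // In character mode every scalar value advances the counter by one,
        // regardless of how many UTF-8 bytes it occupies.
        col_count += 1;
    }
    if !current.is_empty() {
        out.push(current);
    }
    out
}
```

For example, `fold_by_chars("日本語テキスト", 2)` splits after every two characters even though each one is three bytes long in UTF-8.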
GNU testsuite comparison:
- Added a conditional check in the fold_file function that calls emit_output when col_count >= width
- Ensures lines are properly wrapped at the byte or character width before the final output flush
- Improves handling of incomplete lines that need to be broken early to respect the specified width
CodSpeed Performance Report: merging #9274 will improve performance by 49.22%.
In character width mode, emit output immediately after segments are added if the column count exceeds the width, preventing redundant flushes. Simplify the file folding logic by removing the unnecessary conditional checks at the end so that output is written cleanly. This fixes potential extra line breaks and incorrect folding.
…ability Refactor code in fold.rs to break lengthy if-conditions across multiple lines in the push_ascii_segment, process_utf8_line, and process_non_utf8_line functions. This improves readability without changing functionality.
…ory usage Introduce a STREAMING_FLUSH_THRESHOLD constant and helper functions (maybe_flush_unbroken_output, push_byte, push_bytes) to periodically flush the output buffer when it exceeds 8KB and no spaces are being tracked, preventing excessive memory consumption when processing large files. This refactor replaces direct buffer pushes with checks for threshold-based flushing.
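The flushing rule can be sketched as follows. Only the names STREAMING_FLUSH_THRESHOLD, maybe_flush_unbroken_output, push_byte and push_bytes come from the commit message; the `FoldBuffer` struct, its fields, and the method signatures are assumptions made for illustration. The idea is that buffered bytes may be written out early only when no space position is being tracked, because a tracked space may still be needed to re-break the line (as when folding at spaces with `fold -s`).

```rust
use std::io::{self, Write};

/// 8 KiB, per the commit message ("exceeds 8KB").
const STREAMING_FLUSH_THRESHOLD: usize = 8 * 1024;

/// Hypothetical buffering state; field names are illustrative only.
struct FoldBuffer<W: Write> {
    /// Bytes of the current output that have not been written yet.
    output: Vec<u8>,
    /// Index of the last blank in `output` while folding at spaces;
    /// `None` means no space is being tracked.
    last_space: Option<usize>,
    writer: W,
}

impl<W: Write> FoldBuffer<W> {
    /// Write the buffer out early once it is larger than the threshold, but only
    /// when no tracked space could still require re-breaking buffered bytes.
    fn maybe_flush_unbroken_output(&mut self) -> io::Result<()> {
        if self.last_space.is_none() && self.output.len() > STREAMING_FLUSH_THRESHOLD {
            self.writer.write_all(&self.output)?;
            self.output.clear();
        }
        Ok(())
    }

    /// Append one byte, then apply the threshold check.
    fn push_byte(&mut self, byte: u8) -> io::Result<()> {
        self.output.push(byte);
        self.maybe_flush_unbroken_output()
    }

    /// Append a slice, then apply the threshold check.
    fn push_bytes(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.output.extend_from_slice(bytes);
        self.maybe_flush_unbroken_output()
    }
}
```

With this shape, memory use stays bounded for a long line without spaces, while a line that still has a candidate break point is kept whole in the buffer.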
Could you please add tests?
And please fix this regression:
…d tests Remove conditional checks that incorrectly emitted output when the column count reached the width in character mode, ensuring proper folding of wide characters and correct handling of edge cases. Add comprehensive tests for wide characters, invalid UTF-8, zero-width spaces, and buffer boundaries. This prevents premature flushing of output when folding multi-byte characters, improving accuracy for Unicode input.
- Remove trailing empty lines in fold.rs
- Compact multiline variable assignments in test_fold.rs for readability
…racters Add the unicode-width crate to handle zero-width Unicode characters in the fold utility. Introduce a new test, test_zero_width_data_line_counts, to verify correct wrapping in --characters mode for zero-width bytes and spaces, ensuring that fold follows character counts rather than visual width.
- Add the bytecount dependency to Cargo.toml and Cargo.lock
- Refactor the newline_count function in test_fold.rs to use bytecount::count instead of manual iteration, for better performance
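The test helper's job is counting newline bytes in fold's output, which gives the number of output lines. A std-only version might look like the sketch below; the signature is an assumption. The commit replaces this kind of loop with `bytecount::count(data, b'\n')` from the bytecount crate, which counts occurrences of a byte in a slice with an optimized implementation.

```rust
/// Hypothetical test helper: number of `\n` bytes in `data`,
/// i.e. how many complete lines fold wrote.
fn newline_count(data: &[u8]) -> usize {
    // With the bytecount crate this becomes `bytecount::count(data, b'\n')`.
    data.iter().filter(|&&b| b == b'\n').count()
}
```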
Modify the fold implementation to process input in buffered chunks rather than reading line by line, ensuring correct handling of multi-byte characters split across buffer boundaries. Add a process_pending_chunk function and new streaming logic to fold_file for better performance on large files. Update tests accordingly.
Replace the loop with an early empty check by a while loop conditioned on !pending.is_empty(), for clarity. Restructure invalid UTF-8 error handling to first check whether valid_up_to == 0, then process the valid prefix, improving readability and flow without changing behavior.
Consolidate the assignment of the `valid` variable from multiple lines to a single line for improved code readability and adherence to style guidelines favoring concise declarations.
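The chunked decoding loop can be sketched with only the standard library: `std::str::from_utf8` reports, through `Utf8Error::valid_up_to` and `Utf8Error::error_len`, how much of the buffer is valid and whether a failure is a truly invalid sequence (`Some(n)`) or a sequence cut off at the end of the chunk (`None`). This is a hypothetical reconstruction, not the PR's process_pending_chunk: the `Piece` enum, the `eof` flag, and `decode_chunks` are assumptions, and a real implementation would fold text as it goes instead of collecting it.

```rust
use std::str;

/// Output of the decoder: valid text, or raw bytes for the non-UTF-8 path.
#[derive(Debug, PartialEq)]
enum Piece {
    Text(String),
    Invalid(Vec<u8>),
}

/// Decode as much of `pending` as possible. An incomplete multi-byte sequence
/// at the end is kept in `pending` for the next chunk unless `eof` is set.
fn process_pending_chunk(pending: &mut Vec<u8>, eof: bool, out: &mut Vec<Piece>) {
    while !pending.is_empty() {
        match str::from_utf8(pending.as_slice()) {
            Ok(text) => {
                // The whole buffer is valid UTF-8.
                out.push(Piece::Text(text.to_string()));
                pending.clear();
            }
            Err(err) => {
                let valid = err.valid_up_to();
                if valid == 0 {
                    match err.error_len() {
                        // Invalid sequence at the front: pass those bytes on raw.
                        Some(n) => out.push(Piece::Invalid(pending.drain(..n).collect())),
                        // Sequence cut off by the chunk boundary: wait for more input.
                        None if !eof => return,
                        // Truncated sequence at end of input: it can never complete.
                        None => out.push(Piece::Invalid(std::mem::take(pending))),
                    }
                } else {
                    // Process the valid prefix first; the loop then revisits the rest.
                    let prefix: Vec<u8> = pending.drain(..valid).collect();
                    out.push(Piece::Text(String::from_utf8(prefix).expect("prefix validated above")));
                }
            }
        }
    }
}

/// Feed chunks through the pending buffer, as a buffered reader would deliver them.
fn decode_chunks(chunks: &[&[u8]]) -> Vec<Piece> {
    let mut pending: Vec<u8> = Vec::new();
    let mut out: Vec<Piece> = Vec::new();
    for &chunk in chunks {
        pending.extend_from_slice(chunk);
        process_pending_chunk(&mut pending, false, &mut out);
    }
    process_pending_chunk(&mut pending, true, &mut out);
    out
}
```

Keeping the incomplete tail in `pending`, rather than treating it as invalid right away, is what lets a character whose bytes straddle two reads come out intact.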
done
#9328: just saw that this was succeeding. With both of these together, all of the fold tests will pass.
… mode Only coalesce zero-width combining characters into base characters when folding by display columns (WidthMode::Columns). In character-counting mode, treat every scalar value as advancing the counter to match chars().count() semantics, preventing incorrect line breaking for characters with zero-width marks. This ensures consistent behavior across modes as verified by existing tests.
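The difference between the two modes can be sketched as below. `WidthMode` mirrors the variant names from the commit message, but `is_zero_width` is a deliberately tiny hypothetical stand-in for a real width table such as the unicode-width crate (it only knows U+0300 to U+036F and U+200B), and every other character is assumed to occupy one column, ignoring wide characters, tabs and backspaces.

```rust
/// Simplified mirror of the two counting modes named in the commit message.
#[derive(Clone, Copy)]
enum WidthMode {
    Columns,
    Characters,
}

/// Hypothetical, deliberately tiny zero-width table: combining diacritics
/// U+0300..=U+036F and ZERO WIDTH SPACE U+200B.
fn is_zero_width(ch: char) -> bool {
    matches!(ch, '\u{0300}'..='\u{036F}' | '\u{200B}')
}

/// How far `ch` advances the counter in each mode.
fn advance(ch: char, mode: WidthMode) -> usize {
    match mode {
        // Every scalar value counts, like `chars().count()`.
        WidthMode::Characters => 1,
        // Zero-width marks add nothing, so they stay with the preceding base
        // character; everything else is assumed to be one column wide.
        WidthMode::Columns => {
            if is_zero_width(ch) { 0 } else { 1 }
        }
    }
}

/// Break `line` whenever the next character would push the counter past `width`.
fn fold(line: &str, width: usize, mode: WidthMode) -> Vec<String> {
    assert!(width > 0, "width must be positive");
    let mut out = Vec::new();
    let mut current = String::new();
    let mut col = 0;
    for ch in line.chars() {
        let w = advance(ch, mode);
        if col + w > width {
            out.push(std::mem::take(&mut current));
            col = 0;
        }
        current.push(ch);
        col += w;
    }
    if !current.is_empty() {
        out.push(current);
    }
    out
}
```

With a width of 1, an "e" followed by a combining acute accent stays in one segment in Columns mode, but in Characters mode the accent counts as its own character and is moved to the next segment.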
@sylvestre Added tests, and the GNU coreutils tests now pass.
Related: #9127