-
Notifications
You must be signed in to change notification settings - Fork 189
Open
Description
Description of the feature request:
It would be helpful to have a simple processor in genai_processors/contrib that converts all text in incoming ProcessorParts to lowercase. This would make it easier for users to build normalization pipelines.
Proposed API:
Location: genai_processors/contrib/lowercase_text_processor.py
Class: LowercaseTextProcessor
Inherits from: PartProcessor
Logic: If the part is text (is_text(part.mimetype)), convert to lowercase; else, yield unchanged. All metadata is preserved.
What problem are you trying to solve with this feature?
-
Tokenization might use "Hello", "hello", and "HELLO" as different number of tokens. Lowercasing ensures that "Hello", "hello", and "HELLO" are treated the same.
-
Improved search and matching
Any other information you'd like to share?
No response
Metadata
Metadata
Assignees
Labels
No labels