Open
Description
Hi,
I use Augraphy extensively but I've noticed that:
- the default pipeline can be too destructive on my documents, to the point where a human cannot read the text on them (see example below)
- the only way to get a "milder" augmentation pipeline is to create a custom pipeline, which requires listing out all the augmentations and is a bit cumbersome to experiment with (so many options).
It'd be great to provide an option like "mild/strong" for the default pipeline, to give some control over how aggressive it is without needing to deep-dive into the internals of the package.
For instance, this doc is almost unreadable. Training models on unreadable docs can lead to really damaging behaviors, such as hallucinating answers entirely on docs that they can't read.
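
To make the "mild/strong" request a bit more concrete, here is a rough sketch of one way such an option could work: scale each augmentation's probability by a strength factor. Everything in it is hypothetical and for illustration only. `AugmentationSpec`, `STRENGTH_SCALE`, `scale_pipeline`, the step names, and the numbers are my own assumptions, not Augraphy's API or its actual defaults.

```python
from dataclasses import dataclass

# Hypothetical strength presets; these names and factors are assumptions,
# not anything that exists in Augraphy.
STRENGTH_SCALE = {"mild": 0.25, "medium": 0.5, "strong": 1.0}


@dataclass(frozen=True)
class AugmentationSpec:
    name: str           # illustrative label for an augmentation step
    probability: float  # chance the step is applied at "strong" strength


def scale_pipeline(specs, strength="strong"):
    """Return new specs whose probabilities are scaled for the given strength."""
    if strength not in STRENGTH_SCALE:
        raise ValueError(f"unknown strength: {strength!r}")
    factor = STRENGTH_SCALE[strength]
    return [
        AugmentationSpec(spec.name, round(spec.probability * factor, 3))
        for spec in specs
    ]


# Made-up baseline standing in for a default pipeline's steps.
base = [AugmentationSpec("ink_bleed", 0.8), AugmentationSpec("noise", 0.6)]
mild = scale_pipeline(base, "mild")
```

Scaling probabilities is only one possible design. A real implementation might also need to narrow each augmentation's parameter ranges, since a single very aggressive step can make a document unreadable even when it is applied rarely.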
