@aksg87 (Collaborator) commented Nov 2, 2025

Summary

Fixes generator delegation bug causing incorrect document attribution. Changed yield to yield from and streamlined code for clarity. This issue was also comprehensively reproduced and documented in #260 by @vayoa.
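For readers unfamiliar with this class of bug, here is a minimal sketch of how `yield` and `yield from` differ when a generator hands off to a generator helper. The names `_emit`, `annotate_buggy`, and `annotate_fixed` are hypothetical and are not the functions changed in this PR; the exact shape of the original bug is not reproduced here.

```python
from typing import Iterable, Iterator, List


def _emit(docs: List[str]) -> Iterator[str]:
    """Hypothetical helper: yields one result per document in a batch."""
    for doc in docs:
        yield f"annotated:{doc}"


def annotate_buggy(batches: Iterable[List[str]]):
    for batch in batches:
        # Plain `yield` hands the caller the helper's generator object
        # itself, not the per-document results it would produce.
        yield _emit(batch)


def annotate_fixed(batches: Iterable[List[str]]) -> Iterator[str]:
    for batch in batches:
        # `yield from` delegates: every item the helper yields is passed
        # through to the caller, in order.
        yield from _emit(batch)


batches = [["a", "b"], ["c"]]
buggy = list(annotate_buggy(batches))  # two generator objects, one per batch
fixed = list(annotate_fixed(batches))  # three per-document results
```

With plain `yield`, the caller receives one item per batch instead of one per document, so any code that pairs outputs with documents downstream can end up misattributing results.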

Changes

  • Fix: Use yield from to properly delegate to generator helpers
  • Refactor: Improve naming (keep_last_doc, _emit_docs_iter), remove verbose comments
  • Memory: Reduce from O(documents) to O(batch_size)
  • Add: InvalidDocumentError and InternalError exception classes
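The memory change above can be illustrated with a rough sketch of lazy batching, assuming documents arrive as an arbitrary iterable. `iter_batches` and `annotate_stream` are illustrative names, not the helpers introduced in this PR, and the `upper()` call merely stands in for real annotation work.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def annotate_stream(docs: Iterable[str], batch_size: int) -> Iterator[str]:
    """Emit results batch by batch; only one batch is held in memory at a time."""
    for batch in iter_batches(docs, batch_size):
        # Placeholder for real per-batch processing.
        yield from (doc.upper() for doc in batch)


batched = list(iter_batches(range(5), 2))
streamed = list(annotate_stream(iter(["a", "b", "c"]), 2))
```

Because each batch is consumed and emitted before the next one is pulled from the input iterator, peak memory scales with `batch_size` rather than with the total number of documents.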

Testing

  • Regression test catches the delegation bug
  • All 26 tests pass

Fixes #260 #269

@github-actions github-actions bot added the size/M Pull request with 150-600 lines changed label Nov 2, 2025
@aksg87 aksg87 force-pushed the fix/annotation-generator-align branch 2 times, most recently from 54ebb97 to aefe83a Compare November 2, 2025 10:44
aksg87 and others added 3 commits November 2, 2025 10:49

Verify annotate_documents uses 'yield from' to properly delegate to
generators, ensuring correct document attribution across batches.

Co-authored-by: Vayoa <[email protected]>

Stream documents lazily and emit incrementally to reduce memory from
O(documents) to O(batch_size). Improve code clarity with better naming
(keep_last_doc, _emit_docs_iter) and remove verbose comments.

The test was not passing through the real batches, which prevented the
lazy document capture from running. Updated mock to use side_effect to
pass through the iterable while still allowing inspection of call args.
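A small sketch of the mocking pattern described in the last commit message, using `unittest.mock`. The functions `make_batches` and `process` are hypothetical stand-ins, not the project's code; the point is only that a `side_effect` callable lets the mock forward to the real implementation while still recording call arguments.

```python
from unittest import mock


def make_batches(docs, size):
    """Hypothetical real batcher: yields consecutive slices of docs."""
    docs = list(docs)
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def process(docs, batcher):
    """Hypothetical consumer that relies on actually iterating the batches."""
    seen = []
    for batch in batcher(docs, 2):
        seen.extend(batch)
    return seen


# A mock with a fixed return_value would swallow the batches; passing the
# real function as side_effect makes the mock call it and return its result.
spy = mock.Mock(side_effect=make_batches)
result = process(["a", "b", "c"], spy)

# The call is still recorded, so arguments can be inspected afterwards.
spy.assert_called_once_with(["a", "b", "c"], 2)
```

Had the mock returned a canned value instead, the consumer would never iterate real batches, which matches the symptom described above where lazy document capture never ran.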
@aksg87 aksg87 force-pushed the fix/annotation-generator-align branch from aefe83a to f738897 Compare November 2, 2025 10:50
@aksg87 aksg87 merged commit fe18abe into main Nov 2, 2025
18 checks passed

Development

Successfully merging this pull request may close these issues.

Multi-Document extraction bleed (only last result captured)
