[Task] Extract Text from Indico PDFs #29

Open

Labels

bug, description, documentation, inference, ingestion, invalid, question, track issue, wontfix, workflow, working

Opened on Jul 29, 2025

Use PyPDF or LangChain’s PyPDFLoader to extract text from each downloaded PDF document.

Handle possible extraction failures gracefully with logging.
Prepare text data for chunking and ingestion.
Ensure text is mapped with appropriate meeting and document metadata.
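
A minimal sketch of how the steps above might fit together, assuming the `pypdf` package is installed. The names `ExtractedDocument`, `extract_pdf`, and the metadata keys (`meeting_id`, `source`, `num_pages`) are hypothetical and not part of the project. The page reader is passed in as a parameter so the failure handling can be exercised without a real PDF.

```python
import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger("indico_pdf_extract")  # hypothetical logger name


@dataclass
class ExtractedDocument:
    """Text of one PDF plus the metadata needed later for chunking and ingestion."""
    text: str
    metadata: dict = field(default_factory=dict)


def _pypdf_page_texts(path: Path) -> List[str]:
    """Default backend: read each page's text with pypdf (imported lazily)."""
    from pypdf import PdfReader  # requires `pip install pypdf`

    reader = PdfReader(str(path))
    # extract_text() may yield nothing for image-only pages, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def extract_pdf(
    path,
    meeting_meta: dict,
    page_texts: Callable[[Path], List[str]] = _pypdf_page_texts,
) -> Optional[ExtractedDocument]:
    """Return the extracted text with metadata, or None if extraction fails."""
    path = Path(path)
    try:
        pages = page_texts(path)
    except Exception:
        # Broad catch on purpose: a corrupt or missing file should be logged
        # and skipped rather than abort the whole batch.
        logger.exception("Failed to extract text from %s", path)
        return None

    # Drop empty pages and join the rest with blank lines between pages.
    text = "\n\n".join(p.strip() for p in pages if p.strip())
    if not text:
        logger.warning("No extractable text in %s (possibly a scanned PDF)", path)
        return None

    # Metadata keys are illustrative assumptions, not a fixed schema.
    metadata = {**meeting_meta, "source": str(path), "num_pages": len(pages)}
    return ExtractedDocument(text=text, metadata=metadata)
```

Returning `None` on failure lets the caller filter out bad documents and continue with the rest. LangChain's `PyPDFLoader` could serve as an alternative backend by passing a different `page_texts` callable.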

Metadata

Assignees

No one assigned

Projects

RAG4EIC Project

Status

Todo
