This project handles document ingestion and processing for the RAG (Retrieval-Augmented Generation) chatbot. It's separate from the main chatbot deployment to keep the cloud instance clean and focused.
- src/: Python source code for document processing
- data/input/: Place input documents here
- data/output/: Processed data will be stored here
- Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Linux/Mac
      # or
      .venv\Scripts\activate  # On Windows

- Install dependencies:

      pip install -r requirements.txt
- Place documents to be processed in the data/input/ directory
- Run the processing scripts from the src/ directory
- The processed data will be stored in the data/output/ directory, ready for use by the RAG engine
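
As a rough illustration of the workflow above, a processing script might look something like the sketch below. The actual scripts in src/ are not shown here, so everything in it is an assumption made for illustration: the function names `chunk_text` and `process_directory`, the restriction to plain `.txt` files, the character-based chunking with overlap, and the JSON Lines output format are all hypothetical.

```python
import json
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper: the real project may chunk by tokens or sentences.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def process_directory(input_dir: Path, output_dir: Path) -> int:
    """Chunk every .txt file in input_dir and write one .jsonl file per document.

    Returns the number of documents processed. The output layout is an
    assumption, not the format the RAG engine is known to expect.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    for doc in sorted(input_dir.glob("*.txt")):
        text = doc.read_text(encoding="utf-8")
        out_path = output_dir / f"{doc.stem}.jsonl"
        with out_path.open("w", encoding="utf-8") as fh:
            for index, chunk in enumerate(chunk_text(text)):
                record = {"source": doc.name, "chunk": index, "text": chunk}
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        processed += 1
    return processed
```

Calling `process_directory(Path("data/input"), Path("data/output"))` from the project root would then write one `.jsonl` file per input document. If the script is run from inside src/, the paths would need to be adjusted (for example `Path("../data/input")`).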