An advanced RAG (Retrieval-Augmented Generation) system that provides intelligent access to the Cerebras inference documentation, with citations, conversation memory, and multiple interfaces. Built for the Cerebras/OpenRouter Qwen 3 hackathon!
- Advanced RAG Pipeline: Semantic search through Cerebras documentation using Pinecone + Cohere
- Citation Support: Get exact quotes and source references for all answers
- Multiple Interfaces: CLI or programmatic API access
- Conversation Memory: Maintains context across interactions using LangGraph
- Document Reranking: Optional Cohere reranking for improved relevance
- Streaming Responses: Real-time response generation with live citation tracking
- Powered by Cerebras: Uses Qwen3-32B through Cerebras inference for fast, high-quality responses
```
cerebras-openrouter-hackathon/
├── src/cerebras_rag/              # Main package
│   ├── agents/                    # Core RAG agent logic
│   │   ├── rag_agent.py           # Main CerebrasRAGAgent class
│   │   └── models.py              # Pydantic models for structured output
│   ├── interfaces/                # User interfaces
│   │   └── cli.py                 # Professional CLI interface
│   └── utils/                     # Utilities and tools
│       └── populate_vectordb.py   # Vector database population
├── scripts/                       # Entry point scripts
│   ├── run_cli.py                 # Run CLI interface
│   └── populate_vectordb.py       # Populate vector database
├── docs/                          # Comprehensive documentation
├── examples/                      # Usage examples
├── tests/                         # Test suite
├── requirements.txt               # Python dependencies
├── setup.py                       # Package installation
└── .env                           # API keys (create this file)
```
```bash
git clone <repository-url>
cd cerebras-openrouter-hackathon

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Or install as an editable package
pip install -e .
```

Create a `.env` file with your API keys:
```bash
# OpenRouter (for Cerebras inference)
OPENROUTER_API_KEY=your_openrouter_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Pinecone (for vector database)
PINECONE_API_KEY=your_pinecone_key_here

# Cohere (for embeddings and reranking)
COHERE_API_KEY=your_cohere_key_here

# Firecrawl (for enhanced web crawling)
FIRECRAWL_API_KEY=your_firecrawl_key_here
```

Get API Keys:
- OpenRouter - For Cerebras model access
- Pinecone - For vector database
- Cohere - For embeddings and reranking
- Firecrawl - For enhanced web crawling (optional)
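A simple way to fail fast on a misconfigured environment is to check the required keys at startup. This is a minimal sketch (not part of the package itself); the variable names match the `.env` above, and `FIRECRAWL_API_KEY` is treated as optional:

```python
import os

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY", "COHERE_API_KEY")

def missing_keys(env=os.environ) -> list:
    """Return the names of required API keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# An empty environment is missing all three required keys
assert missing_keys({}) == list(REQUIRED_KEYS)
```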
```bash
python scripts/populate_vectordb.py
```

This will:
- Crawl Cerebras inference documentation
- Create embeddings using Cohere
- Store in Pinecone vector database
- Build searchable knowledge base
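The exact chunking logic used by the populate script is not shown here; as an illustration, documents are typically split into overlapping chunks before embedding. The chunk size and overlap values below are placeholders, not the script's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size chunks with overlap, ready for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop a trailing fragment already fully contained in the previous chunk
    if len(chunks) > 1 and len(chunks[-1]) <= overlap:
        chunks.pop()
    return chunks

# 1200 characters with 500/50 chunking yields three overlapping chunks
chunks = chunk_text("a" * 1200)
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.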
```bash
# Run the interactive CLI
python scripts/run_cli.py

# Or, if installed as a package
cerebras-rag-cli
```

CLI Features:
- Interactive question-answering with citations
- Live configuration (enable/disable citations, reranking)
- Conversation history tracking
- Professional terminal interface with color coding
- Session management and command help
CLI Commands:
- `help` - Show command reference
- `citations on/off` - Toggle source citations
- `reranking on/off` - Toggle document reranking
- `status` - Show system status
- `history` - Display conversation history
- `quit` / `exit` - Exit application
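The dispatch behind these commands can be sketched as follows. This is an illustration, not the actual `cli.py` implementation; any input that is not a recognized command is treated as a question for the agent:

```python
def handle_command(line: str, state: dict) -> str:
    """Map a CLI input line to an action name, updating toggles in state."""
    parts = line.strip().lower().split()
    if not parts:
        return "noop"
    cmd = parts[0]
    if cmd in ("quit", "exit"):
        return "exit"
    # Toggle commands: "citations on/off", "reranking on/off"
    if cmd in ("citations", "reranking") and len(parts) == 2 and parts[1] in ("on", "off"):
        state[cmd] = parts[1] == "on"
        return f"{cmd} set to {parts[1]}"
    if cmd in ("help", "status", "history"):
        return cmd
    return "ask"  # anything else is forwarded to the agent as a question

state = {"citations": True, "reranking": False}
```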
```python
from src.cerebras_rag import get_agent

# Get agent instance
agent = get_agent()

# Initialize components
agent.initialize_vector_store()
agent.initialize_graph()

# Ask a question
response = agent.ask_question(
    "How do I authenticate with Cerebras API?",
    use_citations=True,
    use_reranking=False,
)

print(f"Answer: {response.answer}")
for citation in response.citations:
    print(f"Source {citation.source_id}: {citation.quote}")

# Stream responses in real-time
for chunk in agent.stream_response_with_citations(
    question="How do I get started with Cerebras?",
    use_citations=True,
):
    if chunk["type"] == "answer":
        print(chunk["content"])
    elif chunk["type"] == "citation":
        print(f"Source: {chunk['title']}")
```
- CerebrasRAGAgent - The heart of the system
  - Document retrieval and reranking
  - Citation generation and structured output
  - Conversation memory management
  - LangGraph integration
- CLI Interface - Professional command-line experience
  - Interactive question-answering
  - Real-time streaming responses
  - Configuration management
  - Session tracking
- LLM: Qwen3-32B via Cerebras inference (OpenRouter)
- Embeddings: Cohere embed-english-v3.0
- Vector DB: Pinecone with semantic search
- Reranking: Cohere rerank-english-v3.0 (optional)
- Memory: LangGraph with persistent checkpointing
- Citations: Structured output with source tracking
- Question Input → User asks a question
- Document Retrieval → Semantic search in Pinecone
- Optional Reranking → Cohere reranks for relevance
- Context Formation → Documents formatted with source IDs
- LLM Generation → Cerebras model generates cited response
- Memory Storage → Conversation saved to LangGraph
- Response Output → Structured answer with citations
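End to end, the flow above reduces to a short function. This is a stubbed sketch: the `retrieve`, `rerank`, and `generate` callables stand in for the Pinecone, Cohere, and Cerebras calls, which are not reproduced here:

```python
def answer_question(question, retrieve, generate, rerank=None, top_k=4):
    """RAG flow: retrieve -> optional rerank -> format context -> generate."""
    docs = retrieve(question, top_k)            # 2. semantic search
    if rerank is not None:
        docs = rerank(question, docs)           # 3. optional reranking
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(docs, 1)
    )                                           # 4. context with source IDs
    return generate(question, context)          # 5. cited generation

# Placeholder components in place of the real vector store and LLM
def fake_retrieve(question, k):
    return ["doc A", "doc B"][:k]

def fake_generate(question, context):
    n = context.count("[")
    return {"answer": f"Based on {n} sources", "citations": list(range(1, n + 1))}

result = answer_question("How do I authenticate?", fake_retrieve, fake_generate)
```

Steps 1, 6, and 7 (input, memory storage, and output) wrap this core in the real agent.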
Check out the examples/ directory for comprehensive usage examples:
- `basic_usage.py` - Basic agent usage and citation handling
- More examples coming soon!
Comprehensive documentation is available in the docs/ directory:
- Configuration Guide - Setup and configuration options
- API Reference - Detailed API documentation
- Architecture - System architecture and design
- Examples - Usage examples and tutorials
```bash
# Clone repository
git clone <repository-url>
cd cerebras-openrouter-hackathon

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks (optional)
pre-commit install
```

Run the test suite:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/cerebras_rag
```

Check code quality:

```bash
# Format code
black src/ scripts/ examples/

# Sort imports
isort src/ scripts/ examples/

# Lint code
flake8 src/ scripts/ examples/

# Type checking
mypy src/cerebras_rag
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the docs/ directory
- Examples: See examples/ for usage patterns
- Issues: Report bugs and request features via GitHub issues
- Cerebras: For providing fast inference capabilities
- OpenRouter: For API access to Cerebras models
- Pinecone: For vector database services
- Cohere: For embeddings and reranking
- LangChain & LangGraph: For RAG framework and memory management
