An advanced RAG (Retrieval-Augmented Generation) system that provides intelligent access to the Cerebras inference documentation, with citations, conversation memory, and multiple interfaces. Built for the Cerebras/OpenRouter Qwen 3 hackathon!
- Advanced RAG Pipeline: Semantic search through Cerebras documentation using Pinecone + Cohere
- Citation Support: Get exact quotes and source references for all answers
- Multiple Interfaces: CLI or programmatic API access
- Conversation Memory: Maintains context across interactions using LangGraph
- Document Reranking: Optional Cohere reranking for improved relevance
- Streaming Responses: Real-time response generation with live citation tracking
- Powered by Cerebras: Uses Qwen3-32B through Cerebras inference for fast, high-quality responses
```
cerebras-openrouter-hackathon/
├── src/cerebras_rag/              # Main package
│   ├── agents/                    # Core RAG agent logic
│   │   ├── rag_agent.py           # Main CerebrasRAGAgent class
│   │   └── models.py              # Pydantic models for structured output
│   ├── interfaces/                # User interfaces
│   │   └── cli.py                 # Professional CLI interface
│   └── utils/                     # Utilities and tools
│       └── populate_vectordb.py   # Vector database population
├── scripts/                       # Entry point scripts
│   ├── run_cli.py                 # Run CLI interface
│   └── populate_vectordb.py       # Populate vector database
├── docs/                          # Comprehensive documentation
├── examples/                      # Usage examples
├── tests/                         # Test suite
├── requirements.txt               # Python dependencies
├── setup.py                       # Package installation
└── .env                           # API keys (create this file)
```
```bash
git clone <repository-url>
cd cerebras-openrouter-hackathon

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

# Or install as an editable package
pip install -e .
```

Create a `.env` file with your API keys:
```bash
# OpenRouter (for Cerebras inference)
OPENROUTER_API_KEY=your_openrouter_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Pinecone (for vector database)
PINECONE_API_KEY=your_pinecone_key_here

# Cohere (for embeddings and reranking)
COHERE_API_KEY=your_cohere_key_here

# Firecrawl (for enhanced web crawling)
FIRECRAWL_API_KEY=your_firecrawl_key_here
```

Get API Keys:
- OpenRouter - For Cerebras model access
- Pinecone - For vector database
- Cohere - For embeddings and reranking
- Firecrawl - For enhanced web crawling (optional)
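A simple way to fail fast on a misconfigured environment is to check the required keys at startup. This is a minimal sketch (not part of the package itself); the variable names match the `.env` above, and `FIRECRAWL_API_KEY` is treated as optional:

```python
import os

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "PINECONE_API_KEY", "COHERE_API_KEY")

def missing_keys(env=os.environ) -> list:
    """Return the names of required API keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# An empty environment is missing all three required keys
assert missing_keys({}) == list(REQUIRED_KEYS)
```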
```bash
python scripts/populate_vectordb.py
```

This will:
- Crawl Cerebras inference documentation
- Create embeddings using Cohere
- Store in Pinecone vector database
- Build searchable knowledge base
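The exact chunking logic used by the populate script is not shown here; as an illustration, documents are typically split into overlapping chunks before embedding. The chunk size and overlap values below are placeholders, not the script's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size chunks with overlap, ready for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop a trailing fragment already fully contained in the previous chunk
    if len(chunks) > 1 and len(chunks[-1]) <= overlap:
        chunks.pop()
    return chunks

# 1200 characters with 500/50 chunking yields three overlapping chunks
chunks = chunk_text("a" * 1200)
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.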
```bash
# Run the interactive CLI
python scripts/run_cli.py

# Or, if installed as a package
cerebras-rag-cli
```

CLI Features:
- Interactive question-answering with citations
- Live configuration (enable/disable citations, reranking)
- Conversation history tracking
- Professional terminal interface with color coding
- Session management and command help
CLI Commands:
- `help` - Show command reference
- `citations on/off` - Toggle source citations
- `reranking on/off` - Toggle document reranking
- `status` - Show system status
- `history` - Display conversation history
- `quit` / `exit` - Exit application
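The dispatch behind these commands can be sketched as follows. This is an illustration, not the actual `cli.py` implementation; any input that is not a recognized command is treated as a question for the agent:

```python
def handle_command(line: str, state: dict) -> str:
    """Map a CLI input line to an action name, updating toggles in state."""
    parts = line.strip().lower().split()
    if not parts:
        return "noop"
    cmd = parts[0]
    if cmd in ("quit", "exit"):
        return "exit"
    # Toggle commands: "citations on/off", "reranking on/off"
    if cmd in ("citations", "reranking") and len(parts) == 2 and parts[1] in ("on", "off"):
        state[cmd] = parts[1] == "on"
        return f"{cmd} set to {parts[1]}"
    if cmd in ("help", "status", "history"):
        return cmd
    return "ask"  # anything else is forwarded to the agent as a question

state = {"citations": True, "reranking": False}
```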
```python
from src.cerebras_rag import get_agent

# Get agent instance
agent = get_agent()

# Initialize components
agent.initialize_vector_store()
agent.initialize_graph()

# Ask a question
response = agent.ask_question(
    "How do I authenticate with Cerebras API?",
    use_citations=True,
    use_reranking=False,
)

print(f"Answer: {response.answer}")
for citation in response.citations:
    print(f"Source {citation.source_id}: {citation.quote}")

# Stream responses in real-time
for chunk in agent.stream_response_with_citations(
    question="How do I get started with Cerebras?",
    use_citations=True,
):
    if chunk["type"] == "answer":
        print(chunk["content"])
    elif chunk["type"] == "citation":
        print(f"Source: {chunk['title']}")
```
- CerebrasRAGAgent - The heart of the system
  - Document retrieval and reranking
  - Citation generation and structured output
  - Conversation memory management
  - LangGraph integration
- CLI Interface - Professional command-line experience
  - Interactive question-answering
  - Real-time streaming responses
  - Configuration management
  - Session tracking
- LLM: Qwen3-32B via Cerebras inference (OpenRouter)
- Embeddings: Cohere embed-english-v3.0
- Vector DB: Pinecone with semantic search
- Reranking: Cohere rerank-english-v3.0 (optional)
- Memory: LangGraph with persistent checkpointing
- Citations: Structured output with source tracking
- Question Input → User asks a question
- Document Retrieval → Semantic search in Pinecone
- Optional Reranking → Cohere reranks for relevance
- Context Formation → Documents formatted with source IDs
- LLM Generation → Cerebras model generates cited response
- Memory Storage → Conversation saved to LangGraph
- Response Output → Structured answer with citations
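End to end, the flow above reduces to a short function. This is a stubbed sketch: the `retrieve`, `rerank`, and `generate` callables stand in for the Pinecone, Cohere, and Cerebras calls, which are not reproduced here:

```python
def answer_question(question, retrieve, generate, rerank=None, top_k=4):
    """RAG flow: retrieve -> optional rerank -> format context -> generate."""
    docs = retrieve(question, top_k)            # 2. semantic search
    if rerank is not None:
        docs = rerank(question, docs)           # 3. optional reranking
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(docs, 1)
    )                                           # 4. context with source IDs
    return generate(question, context)          # 5. cited generation

# Placeholder components in place of the real vector store and LLM
def fake_retrieve(question, k):
    return ["doc A", "doc B"][:k]

def fake_generate(question, context):
    n = context.count("[")
    return {"answer": f"Based on {n} sources", "citations": list(range(1, n + 1))}

result = answer_question("How do I authenticate?", fake_retrieve, fake_generate)
```

Steps 1, 6, and 7 (input, memory storage, and output) wrap this core in the real agent.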
Check out the examples/ directory for comprehensive usage examples:
- `basic_usage.py` - Basic agent usage and citation handling
- More examples coming soon!
Comprehensive documentation is available in the docs/ directory:
- Configuration Guide - Setup and configuration options
- API Reference - Detailed API documentation
- Architecture - System architecture and design
- Examples - Usage examples and tutorials
```bash
# Clone repository
git clone <repository-url>
cd cerebras-openrouter-hackathon

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks (optional)
pre-commit install
```

Run the test suite:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/cerebras_rag
```

Check code quality:

```bash
# Format code
black src/ scripts/ examples/

# Sort imports
isort src/ scripts/ examples/

# Lint code
flake8 src/ scripts/ examples/

# Type checking
mypy src/cerebras_rag
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the docs/ directory
- Examples: See examples/ for usage patterns
- Issues: Report bugs and request features via GitHub issues
- Cerebras: For providing fast inference capabilities
- OpenRouter: For API access to Cerebras models
- Pinecone: For vector database services
- Cohere: For embeddings and reranking
- LangChain & LangGraph: For RAG framework and memory management
