This document describes how to run Sugar-AI, test recent changes, and troubleshoot common issues.
Sugar-AI provides a Docker-based deployment option for an isolated and reproducible environment.
Open your terminal in the project's root directory and run:
docker build -t sugar-ai .
- With GPU (using the NVIDIA Docker runtime): docker run --gpus all -it --rm sugar-ai
- CPU-only: docker run -it --rm sugar-ai
The container starts by executing main.py. To change the startup behavior, update the Dockerfile accordingly.
The FastAPI server provides endpoints to interact with Sugar-AI.
Install the dependencies and start the server:

pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

Sugar-AI provides three endpoints for different use cases:
| Endpoint | Purpose | Input Format | Features | 
|---|---|---|---|
| /ask | RAG-enabled answers | Query parameter | • Retrieval-Augmented Generation • Sugar/Pygame/GTK documentation • Child-friendly responses | 
| /ask-llm | Direct LLM without RAG | Query parameter | • No document retrieval • Direct model access • Faster responses • Default system prompt and parameters | 
| /ask-llm-prompted | Custom prompt with advanced controls | JSON body | • Custom system prompts • Configurable model parameters | 
- GET endpoint: Access the root URL http://localhost:8000/ to see the welcome message.
- POST endpoint for asking questions: To submit a coding question, send a POST request to /ask with the question parameter (see the Python example after this list). For example:
  curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?"
  The API returns a JSON object with the answer.
- POST endpoint for debugging Python programs: To submit your code, send a POST request to /debug with the code parameter and a context flag. For example:
  curl -X POST "http://localhost:8000/debug?code=How%20do%20I%20create%20a%20Pygame%20window&context=False"
  The API returns a JSON object with the answer.
- Additional POST endpoint (/ask-llm): An alternative endpoint /ask-llm is available in main.py, which provides similar functionality with an enhanced processing pipeline for LLM interactions. To use it, send your coding-related question using:
  curl -X POST "http://localhost:8000/ask-llm?question=How%20do%20I%20create%20a%20Pygame%20window?"
  The response is a JSON object containing the answer generated by the language model.
- Advanced POST endpoint - Custom prompt + generation parameters (/ask-llm-prompted): A powerful endpoint that allows you to use custom prompts and fine-tune generation parameters. Unlike the other endpoints, this one:
  - Uses your own custom system prompt
  - Accepts a JSON request body with configurable model parameters
  - Provides direct LLM access without RAG

  Basic Usage:
  curl -X POST "http://localhost:8000/ask-llm-prompted" \
    -H "X-API-Key: sugarai2024" \
    -H "Content-Type: application/json" \
    -d '{
      "question": "How do I create a Pygame window?",
      "custom_prompt": "You are a Python expert. Provide detailed code examples with explanations."
    }'

  Advanced Usage with Generation Parameters:
  curl -X POST "http://localhost:8000/ask-llm-prompted" \
    -H "X-API-Key: sugarai2024" \
    -H "Content-Type: application/json" \
    -d '{
      "question": "Write a function to calculate fibonacci numbers",
      "custom_prompt": "You are a coding tutor. Explain step-by-step with comments.",
      "max_length": 1024,
      "truncation": true,
      "repetition_penalty": 1.1,
      "temperature": 0.7,
      "top_p": 0.9,
      "top_k": 50
    }'

  Request Parameters:
  - question (required): The question or task to process
  - custom_prompt (required): Your custom system prompt
  - max_length (optional, default: 1024): Maximum length of the generated response
  - truncation (optional, default: true): Whether to truncate long inputs
  - repetition_penalty (optional, default: 1.1): Controls repetition (1.0 = no penalty, >1.0 = less repetition)
  - temperature (optional, default: 0.7): Controls randomness (0.0 = deterministic, 1.0 = very random)
  - top_p (optional, default: 0.9): Nucleus sampling (0.1 = focused, 0.9 = diverse)
  - top_k (optional, default: 50): Limits vocabulary to the K most likely words
  Response Format:
  {
    "answer": "Here's how to create a Pygame window:\n\nimport pygame...",
    "user": "Admin Key",
    "quota": {"remaining": 95, "total": 100},
    "generation_params": {
      "max_length": 1024,
      "truncation": true,
      "repetition_penalty": 1.1,
      "temperature": 0.7,
      "top_p": 0.9,
      "top_k": 50
    }
  }

  Use Cases: Different activities can now use different system prompts and different generation parameters to obtain a model that is personalized to each activity's needs.

  Generation Parameter Guidelines:
  - For Code: temperature: 0.2-0.4, top_p: 0.8, repetition_penalty: 1.1
  - For Creative Content: temperature: 0.7-0.9, top_p: 0.9, repetition_penalty: 1.2
  - For Factual Answers: temperature: 0.3-0.5, top_p: 0.7, repetition_penalty: 1.0
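The same endpoints can be called from Python with the requests library. The following is a minimal sketch, assuming a server on localhost:8000 and the example sugarai2024 key introduced in the authentication section below:

# example_client.py - minimal sketch of calling the API from Python.
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"X-API-Key": "sugarai2024"}

# /ask and /ask-llm take the question as a query parameter.
resp = requests.post(
    f"{BASE_URL}/ask",
    params={"question": "How do I create a Pygame window?"},
    headers=HEADERS,
)
print(resp.json()["answer"])

# /ask-llm-prompted takes a JSON body with a custom prompt and generation parameters.
payload = {
    "question": "Write a function to calculate fibonacci numbers",
    "custom_prompt": "You are a coding tutor. Explain step-by-step with comments.",
    "temperature": 0.3,
    "top_p": 0.8,
    "repetition_penalty": 1.1,
}
resp = requests.post(f"{BASE_URL}/ask-llm-prompted", json=payload, headers=HEADERS)
print(resp.json()["answer"])

Both calls return JSON containing the answer plus the user and quota fields described below.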
 
Sugar-AI implements an API key-based authentication system for secure access to endpoints.
API keys are defined in the .env file with the following format:
API_KEYS={"sugarai2024": {"name": "Admin Key", "can_change_model": true}, "user_key_1": {"name": "User 1", "can_change_model": false}}
Each key has associated user information:
- name: A friendly name for the user (appears in API responses and logs)
- can_change_model: Boolean that controls permission to change the model
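Because the API_KEYS value is itself JSON, it can be parsed with json.loads. The snippet below is only an illustration of how a lookup against that structure might work; the function names load_keys and check_key are hypothetical and not Sugar-AI's actual code.

# Illustrative only: parsing and checking an API_KEYS value like the one above.
import json
import os

def load_keys() -> dict:
    # API_KEYS holds a JSON object mapping each key to its user info.
    return json.loads(os.environ.get("API_KEYS", "{}"))

def check_key(api_key: str):
    info = load_keys().get(api_key)
    # Unknown keys should be rejected (the API responds with an auth error).
    return info  # e.g. {"name": "Admin Key", "can_change_model": True} or None

if __name__ == "__main__":
    os.environ["API_KEYS"] = '{"sugarai2024": {"name": "Admin Key", "can_change_model": true}}'
    print(check_key("sugarai2024"))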
To use the authenticated endpoints, include the API key in your request headers:
curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?" \
  -H "X-API-Key: sugarai2024"The response will include the user name:
{
  "answer": "To create a Pygame window...",
  "user": "Admin Key"
}

Users with can_change_model: true permission can change the model (a Python equivalent of this call is shown after the list below):

curl -X POST "http://localhost:8000/change-model?model=Qwen/Qwen2-1.5B-Instruct&api_key=sugarai2024&password=sugarai2024"

The user name serves several purposes:
- It provides identification in API responses, helping track which user made which request
- It adds context to server logs for monitoring API usage
- It allows for more personalized interaction in multi-user environments
- It helps administrators identify which API key corresponds to which user
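The model change referenced above can also be issued from Python. This sketch simply mirrors the query parameters used in the curl example; the endpoint behaviour itself is defined in main.py.

# Sketch of the /change-model call from Python, mirroring the curl example above.
import requests

resp = requests.post(
    "http://localhost:8000/change-model",
    params={
        "model": "Qwen/Qwen2-1.5B-Instruct",
        "api_key": "sugarai2024",   # this key must have can_change_model: true
        "password": "sugarai2024",
    },
)
print(resp.status_code, resp.text)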
Sugar-AI includes several additional security features to protect the API and manage resources effectively:
Each API key has a daily request limit defined in the .env file:
MAX_DAILY_REQUESTS=100
The system automatically tracks usage and resets quotas daily. When testing:
- Check remaining quota by examining API responses:
  { "answer": "Your answer here...", "user": "User 1", "quota": {"remaining": 95, "total": 100} }
- Test quota enforcement by sending more than the allowed number of requests. The API returns a 429 status code when the quota is exceeded:
  curl -i -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: user_key_1"
  # After exceeding quota:
  # HTTP/1.1 429 Too Many Requests
  # {"detail":"Daily request quota exceeded"}
Sugar-AI implements comprehensive logging for security monitoring:
- All API requests are logged with user information, IP addresses, and timestamps
- Failed authentication attempts are recorded with warning level
- Model change attempts are tracked with detailed information
- All logs are stored in sugar_ai.log for review
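A minimal sketch of how logging to sugar_ai.log with this kind of per-request context might be configured; the exact format and handlers used by Sugar-AI may differ.

# Minimal sketch: writing request events to sugar_ai.log (actual setup in main.py may differ).
import logging

logging.basicConfig(
    filename="sugar_ai.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("sugar_ai")

# Example entries, roughly matching the events listed above.
log.info("request user=%s ip=%s endpoint=%s", "Admin Key", "127.0.0.1", "/ask")
log.warning("failed authentication attempt ip=%s key=%s", "127.0.0.1", "invalid_key")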
To test logging functionality:
# Make a valid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: sugarai2024"
# Make an invalid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: invalid_key"
# Check the logs
tail -f sugar_ai.log

The API implements CORS (Cross-Origin Resource Sharing) and trusted host verification:
- In development mode, API access is allowed from all origins
- For production, consider restricting the allow_origins parameter in main.py
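Tightening this usually means restricting the middleware configuration. The sketch below shows what a production-leaning setup could look like; the origin and host values are placeholders, and the actual settings in main.py may differ.

# Sketch of a production-leaning CORS / trusted-host setup (placeholder values).
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.org"],  # instead of ["*"] in development
    allow_methods=["GET", "POST"],
    allow_headers=["X-API-Key", "Content-Type"],
)
app.add_middleware(TrustedHostMiddleware, allowed_hosts=["api.example.org", "localhost"])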
The Streamlit app should be updated to include API key authentication and support for all three endpoints:
# Updated streamlit.py example
import streamlit as st
import requests
import json
st.title("Sugar-AI Chat Interface")
# Add API key field
api_key = st.sidebar.text_input("API Key", type="password")
# Endpoint selection
endpoint_choice = st.selectbox(
    "Choose endpoint:",
    ["RAG (ask)", "Direct LLM (ask-llm)", "Custom Prompt (ask-llm-prompted)"]
)
st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")
# Custom prompt section for ask-llm-prompted
custom_prompt = ""
generation_params = {}
if endpoint_choice == "Custom Prompt (ask-llm-prompted)":
    custom_prompt = st.text_area(
        "Custom Prompt:", 
        value="You are a helpful assistant. Provide clear and detailed answers.",
        help="This prompt will replace the default system prompt"
    )
    
    # Generation parameters
    with st.expander("Advanced Generation Parameters"):
        col1, col2 = st.columns(2)
        
        with col1:
            max_length = st.number_input("Max Length", value=1024, min_value=100, max_value=2048)
            temperature = st.slider("Temperature", 0.0, 1.0, 0.7, 0.1)
            repetition_penalty = st.slider("Repetition Penalty", 0.5, 2.0, 1.1, 0.1)
        
        with col2:
            top_p = st.slider("Top P", 0.1, 1.0, 0.9, 0.1)
            top_k = st.number_input("Top K", value=50, min_value=1, max_value=100)
            truncation = st.checkbox("Truncation", value=True)
    
    generation_params = {
        "max_length": max_length,
        "truncation": truncation,
        "repetition_penalty": repetition_penalty,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k
    }
if st.button("Submit"):
    if question and api_key:
        headers = {"X-API-Key": api_key}
        
        try:
            if endpoint_choice == "RAG (ask)":
                url = "http://localhost:8000/ask"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
                
            elif endpoint_choice == "Direct LLM (ask-llm)":
                url = "http://localhost:8000/ask-llm"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
                
            elif endpoint_choice == "Custom Prompt (ask-llm-prompted)":
                url = "http://localhost:8000/ask-llm-prompted"
                headers["Content-Type"] = "application/json"
                data = {
                    "question": question,
                    "custom_prompt": custom_prompt,
                    **generation_params
                }
                response = requests.post(url, headers=headers, data=json.dumps(data))
            
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
                st.sidebar.info(f"User: {result.get('user', 'Unknown')}")
                st.sidebar.info(f"Remaining quota: {result['quota']['remaining']}/{result['quota']['total']}")
                
                # Show generation parameters for custom prompt endpoint
                if endpoint_choice == "Custom Prompt (ask-llm-prompted)" and "generation_params" in result:
                    with st.expander("Generation Parameters Used"):
                        st.json(result["generation_params"])
                        
            else:
                st.error(f"Error {response.status_code}: {response.text}")
                
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
            
    elif not question:
        st.warning("Please enter a question.")
    elif not api_key:
        st.warning("Please enter an API key.")Run this updated Streamlit app to test the complete authentication flow and quota visibility.
To test the new RAG Agent directly from the CLI, execute:
python rag_agent.py --quantize

Remove the --quantize flag if you prefer running without 4-bit quantization.
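For context, the --quantize flag typically corresponds to loading the model in 4-bit via bitsandbytes. The sketch below shows what such a load looks like with Hugging Face transformers; it is illustrative only, the model name is a placeholder, and rag_agent.py may do this differently.

# Illustrative 4-bit load via transformers + bitsandbytes; rag_agent.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2-1.5B-Instruct"   # placeholder model name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,   # omit this to run without 4-bit quantization
    device_map="auto",
)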
- Verify Model Setup: Confirm the selected model loads correctly by checking the terminal output for any errors.
 
- Document Retrieval: Place your documents (PDF or text files) in the directory specified in the default parameters, or provide your own paths using the --docs flag. The vector store is rebuilt every time the agent starts, so make sure your documents are in place to retrieve relevant content. (A rough sketch of this pipeline follows this list.)
- Question Handling: After the agent starts, enter a sample coding-related question. The assistant should respond by incorporating context from the loaded documents and answering your query.
 
- API and Docker Route: Optionally, combine these changes by deploying the updated version via Docker and testing the FastAPI endpoints as described above.
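The sketch referenced in the Document Retrieval step above illustrates the general shape of a rebuilt-on-start retrieval pipeline: load text files, embed them, and answer a query by similarity. It uses sentence-transformers purely as an illustration and is not the actual rag_agent.py implementation (PDFs would need a separate loader).

# Rough illustration of a rebuilt-on-start vector store; not rag_agent.py itself.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

def build_store(doc_dir: str = "docs"):
    # The store is rebuilt from scratch on every run, matching the note above.
    texts = [p.read_text() for p in Path(doc_dir).glob("*.txt")]
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = np.asarray(model.encode(texts, normalize_embeddings=True))
    return model, texts, vectors

def retrieve(query: str, model, texts, vectors, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q              # cosine similarity, since embeddings are normalized
    best = np.argsort(scores)[::-1][:k]
    return [texts[i] for i in best]

if __name__ == "__main__":
    model, texts, vectors = build_store("docs")
    print(retrieve("How do I create a Pygame window?", model, texts, vectors))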
 
If you encounter CUDA out-of-memory errors, consider running the agent on CPU or adjusting CUDA settings:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Review the terminal output for further details and error messages.
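In code, the usual fallback is to choose the device at startup and drop to CPU when CUDA is unavailable or memory runs out. A minimal sketch, assuming a transformers-style model object:

# Minimal device-selection sketch for avoiding CUDA out-of-memory crashes.
import torch

def pick_device() -> str:
    return "cuda" if torch.cuda.is_available() else "cpu"

def move_with_fallback(model):
    device = pick_device()
    try:
        return model.to(device), device
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()          # free cached blocks before retrying on CPU
        return model.to("cpu"), "cpu"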
When deploying Sugar-AI in CI/CD pipelines, you'll need to configure environment variables properly. The current CI/CD setup uses GitHub webhooks, so make sure to create a webhook secret and add it to the .env file.
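GitHub signs each webhook delivery with that secret and sends the digest in the X-Hub-Signature-256 header. A minimal verification sketch follows; the /webhook path and WEBHOOK_SECRET variable name are assumptions for illustration, not necessarily Sugar-AI's actual handler.

# Illustrative GitHub webhook signature check; Sugar-AI's actual handler may differ.
import hashlib
import hmac
import os

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "")   # set this in .env

@app.post("/webhook")                                    # path is illustrative
async def github_webhook(request: Request, x_hub_signature_256: str = Header(...)):
    body = await request.body()
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_hub_signature_256):
        raise HTTPException(status_code=401, detail="Invalid webhook signature")
    return {"status": "ok"}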
Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.
- Install Streamlit: If you haven't already, install Streamlit: pip install streamlit
- Make sure the server is running: uvicorn main:app --host 0.0.0.0 --port 8000
- Start the App: Launch the Streamlit app by adding a streamlit.py file:

# ./streamlit.py
import streamlit as st
import requests

st.title("Sugar-AI Chat Interface")

use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

if st.button("Submit"):
    if question:
        if use_rag:
            url = "http://localhost:8000/ask"
        else:
            url = "http://localhost:8000/ask-llm"
        params = {"question": question}
        try:
            response = requests.post(url, params=params)
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
            else:
                st.error(f"Error {response.status_code}: {response.text}")
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
    else:
        st.warning("Please enter a question.")

Then start it with:

streamlit run streamlit.py
- Using the App:
  - The app provides a simple UI to input coding questions and displays Sugar-AI's response.
  - Use the sidebar options to configure settings if available.
  - The app communicates with the FastAPI backend to process and retrieve answers.
 
Enjoy exploring Sugar-AI through both API endpoints and the interactive Streamlit interface!
