A fully functional demo showing Redis LangCache + OpenAI in action, implementing semantic caching with scoped isolation by Company / Business Unit / Person — all in a Gradio web interface.
Main demo file: `main_demo_released.py`
- Demonstrates semantic caching for LLM responses to reduce latency and API cost.
- Scoped reuse of answers by Company / Business Unit / Person — adjustable isolation levels.
- Domain disambiguation: ambiguous questions (“cell”, “network”, “bank”) are automatically interpreted in the correct domain.
- Identity handling:
  - Name → not cached (displayed only when asked).
  - Role/Function → stored under the exact key `[IDENTITY:ROLE]`, with "set" support (e.g., "My role is …").
- Cache management UI: clear cached entries by scope (A, B, or both) — the index is never deleted.
- Real-time KPIs: cache hits, misses, hit rate, estimated tokens saved, and $ savings.
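The savings KPI boils down to simple arithmetic along these lines (a sketch only; the demo's actual pricing constants and token accounting live in `main_demo_released.py` and may differ, and the per-token rate below is an assumption):

```python
# Illustrative KPI math: every cache hit skips one OpenAI call,
# so that response's tokens are credited as "saved".
GPT_4O_MINI_OUTPUT_USD_PER_TOKEN = 0.60 / 1_000_000  # assumed list price

def estimate_savings(hits: int, misses: int, avg_tokens_per_answer: int = 300):
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    tokens_saved = hits * avg_tokens_per_answer
    usd_saved = tokens_saved * GPT_4O_MINI_OUTPUT_USD_PER_TOKEN
    return hit_rate, tokens_saved, usd_saved

print(estimate_savings(hits=7, misses=3))  # (0.7, 2100, 0.00126)
```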
Repository layout:

```
.
├── main_demo_released.py   # Main Gradio app (this demo)
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker build
├── docker-compose.yml      # Example local orchestration
└── .env                    # Environment variables (not committed)
```
The repository also includes additional examples (RAG, attribute-based caching, etc.).
This demo uses `main_demo_released.py` as its entry point.
Create a .env file in the project root with:
```env
# OpenAI
OPENAI_API_KEY=sk-proj-<your-openai-key>
OPENAI_MODEL=gpt-4o-mini

# LangCache (Redis Cloud)
LANGCACHE_SERVICE_KEY=<your-service-key>   # or LANGCACHE_API_KEY
LANGCACHE_CACHE_ID=<your-cache-id>
LANGCACHE_BASE_URL=https://gcp-us-east4.langcache.redis.io

# (Optional) Local Redis or other configs
REDIS_URL=redis://localhost:6379/0

# Embedding model (for RAG examples)
EMBED_MODEL=text-embedding-3-small
EMBED_DIM=1536
```
`LANGCACHE_API_KEY` and `LANGCACHE_SERVICE_KEY` are interchangeable for this app; use one of them.
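Because either name works, reading the key only needs a small fallback. A minimal sketch using python-dotenv (assumed to be among the dependencies in `requirements.txt`):

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # loads .env from the project root

# Accept either variable name; prefer the service key when both are set.
langcache_key = os.getenv("LANGCACHE_SERVICE_KEY") or os.getenv("LANGCACHE_API_KEY")
if not langcache_key:
    raise RuntimeError("Set LANGCACHE_SERVICE_KEY or LANGCACHE_API_KEY in .env")
```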
Run locally:

```bash
python -m venv .venv
source .venv/bin/activate          # Linux/Mac
# .venv\Scripts\activate           # Windows PowerShell
pip install -r requirements.txt

# Ensure your .env is configured
python main_demo_released.py
```

The UI will start at: http://localhost:7860
Or run with Docker:

```bash
docker run -d \
  --name langcache-demo \
  --env-file .env \
  -p 7860:7860 \
  gacerioni/gabs-redis-langcache:1.1.0
```

Apple Silicon (arm64): if needed, add `--platform linux/amd64` when running the image.
Alternatively, with Docker Compose:

```yaml
# docker-compose.yml
version: "3.9"

services:
  langcache-demo:
    image: gacerioni/gabs-redis-langcache:1.1.0
    # platform: linux/amd64   # uncomment on Apple Silicon if needed
    env_file:
      - .env
    ports:
      - "7860:7860"
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

Then bring it up:

```bash
docker compose up -d
```

In the UI:

- Set Company, Business Unit, and Person for both Scenario A and B.
- Ask questions in both panels to observe cache hits/misses and domain-aware disambiguation.
- Use the 🧹 Clear Cache buttons to delete entries by scope (A, B, or both).
⚠️ This clears cached entries only — the index is never deleted.
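Under the hood, clearing a scope amounts to deleting the entries whose attributes match that scope, never the cache itself. A sketch assuming LangCache's delete-entries-by-attributes REST endpoint (verify the exact path and payload shape against the LangCache docs):

```python
import os

import requests

BASE_URL = os.environ["LANGCACHE_BASE_URL"]
CACHE_ID = os.environ["LANGCACHE_CACHE_ID"]
HEADERS = {"Authorization": f"Bearer {os.environ['LANGCACHE_SERVICE_KEY']}"}

def clear_scope(company: str, business_unit: str, person: str) -> None:
    """Delete cached entries for one scope; the cache/index itself survives."""
    resp = requests.delete(
        f"{BASE_URL}/v1/caches/{CACHE_ID}/entries",
        headers=HEADERS,
        json={"attributes": {
            "company": company,
            "business_unit": business_unit,
            "person": person,
        }},
    )
    resp.raise_for_status()
```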
Recommended questions for demonstration:
- “My role is Doctor.” / “My role is Software Engineer.”
- “What is my role in the company?”
- “What is a cell?” (note how the answer differs between the healthcare and software domains)
- “Explain what machine learning is.” / “What is machine learning?”
- “What is my name?”
- Search Redis LangCache for semantically similar prompts.
- If a cache hit (above the similarity threshold) is found, return the cached response.
- If a miss occurs:
  - Query OpenAI.
  - Store a neutral response (no user identity) in the cache.
- Isolation is managed via the attributes `company`, `business_unit`, and `person`.
- Ambiguous prompts are internally rewritten with explicit domain context (e.g., “(in the context of healthcare)”), as sketched below.
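Put together, the request path looks roughly like the sketch below. It assumes LangCache's REST endpoints for search and write plus the OpenAI Python SDK; the `rewrite_if_ambiguous` helper and the response parsing are illustrative, not the demo's actual code:

```python
import os

import requests
from openai import OpenAI

BASE_URL = os.environ["LANGCACHE_BASE_URL"]
CACHE_ID = os.environ["LANGCACHE_CACHE_ID"]
HEADERS = {"Authorization": f"Bearer {os.environ['LANGCACHE_SERVICE_KEY']}"}

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rewrite_if_ambiguous(prompt: str, business_unit: str) -> str:
    """Illustrative: pin ambiguous terms to the caller's domain."""
    if any(term in prompt.lower() for term in ("cell", "network", "bank")):
        return f"{prompt} (in the context of {business_unit})"
    return prompt

def answer(prompt: str, company: str, business_unit: str, person: str) -> str:
    scoped_prompt = rewrite_if_ambiguous(prompt, business_unit)
    attributes = {"company": company, "business_unit": business_unit, "person": person}

    # 1. Semantic search in LangCache, scoped by attributes.
    search = requests.post(
        f"{BASE_URL}/v1/caches/{CACHE_ID}/entries/search",
        headers=HEADERS,
        json={"prompt": scoped_prompt, "attributes": attributes},
    )
    search.raise_for_status()
    entries = search.json().get("data", [])  # response shape assumed
    if entries:
        return entries[0]["response"]  # cache hit: no LLM call

    # 2. Cache miss: ask OpenAI.
    completion = client.chat.completions.create(
        model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        messages=[{"role": "user", "content": scoped_prompt}],
    )
    response = completion.choices[0].message.content

    # 3. Store the neutral answer for future scoped reuse.
    requests.post(
        f"{BASE_URL}/v1/caches/{CACHE_ID}/entries",
        headers=HEADERS,
        json={"prompt": scoped_prompt, "response": response, "attributes": attributes},
    ).raise_for_status()
    return response
```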
You can automate Docker build & release with GitHub Actions.
The existing workflow builds a multi-arch image and publishes it on new tags (vX.Y.Z).
Required repository secrets:
- `DOCKERHUB_USERNAME`
- `DOCKERHUB_TOKEN` (Docker Hub PAT)
- `GITHUB_TOKEN` (provided automatically)
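For reference, a tag-triggered multi-arch build with Docker's official actions might look like the sketch below (the repository's actual workflow, file name, and tag naming may differ):

```yaml
# .github/workflows/release.yml (illustrative sketch)
name: release
on:
  push:
    tags: ["v*.*.*"]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3       # arm64 emulation for multi-arch
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          platforms: linux/amd64,linux/arm64
          tags: gacerioni/gabs-redis-langcache:${{ github.ref_name }}  # tag naming illustrative
```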
- Redis LangCache Documentation: https://redis.io/docs/latest/solutions/semantic-caching/langcache/
- Redis Website: https://redis.io/
- LinkedIn (Gabriel Cerioni): https://www.linkedin.com/in/gabrielcerioni/
MIT — feel free to use, adapt, and share.