
Trying to understand Jan AI Server deployment options to offload GPU inference to an external machine #227

@DPBattaglia

Description


Hi team,

I’m very interested in trying out Jan Server for our company as a replacement for OpenWebUI (OWUI). We currently have OWUI operational, but as a small business (fewer than 5 employees), we’ve found it too complex to fine-tune and maintain. Jan AI’s simplicity — and the major improvements over the past few months — make it look like a fantastic alternative for SMBs like ours.

After reviewing the guides and Docker Compose examples, I’m trying to determine the best deployment architecture for our setup.

🧩 Overview

  • Business type: Small business (<5 employees)
  • Hardware: Mac M3 Ultra (used as the inference machine)
  • Infrastructure: Synology NAS (handles reverse proxy, MS365 Entra ID SSO, file storage, and local data access)

Current setup:

  • Synology manages the reverse proxy + SSO
  • Mac runs an exposed OpenAI-compatible MLX server that handles AI inference requests (see the sketch below)
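For context, the Mac’s current endpoint is roughly the following (a simplified sketch assuming the mlx-lm built-in server; the model name, hostname, and port are placeholders, and the exact payload depends on the server):

```bash
# On the Mac M3 Ultra: start an OpenAI-compatible MLX server (placeholder model and port)
python -m mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --host 0.0.0.0 --port 8080

# Quick reachability check from the NAS or another machine on the LAN
curl http://mac-m3.local:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```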

We’d like to replicate this setup with Jan Server, ideally with a simpler, more maintainable configuration.

⚙️ Deployment Questions

I’ve reviewed your docker directory and would appreciate some guidance on how best to structure our setup:

  1. Should I use jan-server/docker/infrastructure.yml as the main template?
  • I’m unsure whether this file should also include the llm-api service.
  2. On the Mac M3 Ultra, should I run one or both of:
  • llm-api (/service-api.yml), and/or
  • vllm (/inference.yml)?
  3. How should I connect all the components together, e.g. via the .env file? (See the sketch after this list for the kind of wiring I have in mind.)
  4. Do I still need Kong if the Synology NAS is already acting as our reverse proxy (terminating HTTPS connections and forwarding requests)?
  • It looks like Kong might still be required to handle API routing between services.
  5. Would it make more sense to run llm-api on the Synology NAS or on the Mac M3?
  6. Is the vllm inference setup on macOS tuned for Metal to utilize the M3 Ultra’s GPU and CPU efficiently?
  • Or should I continue using an MLX-based OpenAI-compatible server instead?
  7. Do I need Keycloak for OIDC / SSO integration with Microsoft Entra ID, or can Jan Server handle that natively?
  8. Side question: will Jan eventually expose the OpenAI v1/audio/transcriptions API (ASR)?

I’d love to consolidate everything into Jan AI as our SMB AI platform.

🖥️ Desired Architecture

Ideally, we’d like:

  • Synology NAS → runs Jan front-end, SSO integration, DB, chat history, and file management
  • Mac M3 Ultra → handles all GPU-intensive inference tasks

Based on what I’ve read, my assumption is that:

  • Synology would run a combined infrastructure.yml + service-api.yml setup (without inference),
  • and the Mac would run inference.yml for model management and inference hosting.

Does that sound correct?
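Put differently, I’m picturing invocations along these lines (standard multi-file Compose usage; whether the services split cleanly across the two machines like this is exactly what I’d like to confirm):

```bash
# On the Synology NAS: infrastructure + API layer, no local inference
docker compose -f infrastructure.yml -f service-api.yml up -d

# On the Mac M3 Ultra: inference only
docker compose -f inference.yml up -d
```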

Also, would the Jan web GUI (and user sign-in) still be accessible on the default port 8000?

🙏 Closing

Thanks for building such an awesome product — it’s clearly come a long way recently.
I really appreciate your time in helping me confirm the best approach for an SMB setup like ours.

Kind regards,
David
