Inference backends
Audience: platform operators deploying Asgard. End users and agents never read this, they just request an LLM key for their project and get one. Which backend serves it is invisible to them.
Inference is a service module, like any other (see the service-module contract
in plans/SERVICE-MODULE.md). Asgard’s control plane is constant, register →
project token → governed inference → cost + audit, and the backend behind it is
swappable. AI inference is the loudest tenant, not the whole building.
Out of the box (the lightweight floor)
Asgard ships two built-in inference modules. Each activates the moment its master key is present, no other setup:
| Module | Enable by setting | Models |
|---|---|---|
openai | OPENAI_MASTER_KEY | model:default/gpt-5.1, … |
anthropic | ANTHROPIC_MASTER_KEY | model:anthropic/claude-sonnet |
That’s the whole story for one or two providers. A project mints a token, calls the
gateway, and spend is attributed per token, the master key never leaves the
platform. The project id is forwarded downstream as the OpenAI user /
Anthropic metadata.user_id, so the provider’s own logs/spend carry it too.
You do not need LiteLLM for this.
Scaling out: LiteLLM (optional, operator-deployed)
When you want 100+ providers, fallbacks, or routing, enable the litellm
module, a standard module that ships with Asgard. You do not author a service
definition; you stand up LiteLLM and point Asgard at it.
1. Deploy LiteLLM
LiteLLM is its own process, run it under your controls (HA, network, secrets), not bundled inside Asgard. Two common paths:
- docker-compose (local / single host):
services: litellm: image: ghcr.io/berriai/litellm:main-latest command: ["--config", "/app/config.yaml"] ports: ["4000:4000"] volumes: ["./litellm.config.yaml:/app/config.yaml"] - Kubernetes / Helm, deploy the LiteLLM chart as its own Deployment+Service alongside the Asgard release.
Configure LiteLLM’s providers/keys per its own docs: docs.litellm.ai/docs/proxy/configs.
2. Point Asgard at it
Set on the Asgard process:
LITELLM_BASE_URL=http://litellm:4000 # the proxy URL
LITELLM_MASTER_KEY=sk-... # the LiteLLM master key
The litellm module activates on LITELLM_BASE_URL (it’s OpenAI-compatible).
Asgard now routes inference through LiteLLM while keeping all governance, tokens,
budgets, policy, guardrails, audit, per-project cost, exactly the same.
This gated path needs no LiteLLM database. Asgard only proxies chat completions through LiteLLM on the one shared master key, so LiteLLM can run config-only / stateless (no
DATABASE_URLon the proxy). Key issuance, budgets, and cost attribution all live in Asgard. The per-project path below is the one that needs LiteLLM’s own DB.
3. (Optional) Make it the default backend
Point the catalog’s default model ref at the litellm module’s model, or set the
project default-backend rule. Projects keep requesting “an LLM key”; they never
know the backend changed.
4. (Optional) Per-project LiteLLM keys
The litellm module above fronts LiteLLM through Asgard’s gateway on one shared
master key. To instead give each project its own budgeted LiteLLM key, so the
project calls LiteLLM directly and Asgard pulls the key’s spend back, the same two
env vars also enable the litellm-key service:
LITELLM_BASE_URL=http://litellm:4000
LITELLM_MASTER_KEY=sk-...
A project requests litellm-key (with a max_budget_usd) through the normal
governed request_resource flow → human review (it’s a spend-authorizing
credential, and Asgard isn’t in the call path) → Asgard mints a project-tagged
virtual key on the proxy, stores the key value in the secret store, and registers a
litellm cost source that reads each key’s spend back via /key/info. Without
those env vars the service’s connector falls back to stub and no spend is pulled.
This direct path requires LiteLLM to have its own database. Per-project key minting (
/key/generate), budgets, and spend read-back (/key/info) are DB-backed features of the LiteLLM proxy, LiteLLM must be configured with its ownDATABASE_URL(its own Postgres, separate from Asgard’s). The gatedlitellmmodule above does not need this; onlylitellm-keydoes, because here the key lives on the proxy, not in Asgard.
Databricks Model Serving
The shipped databricks module is exactly the same shape as LiteLLM, a plug-in
manifest, not core code. It’s kind: openai-compatible with one extra knob,
chat_path, because Databricks queries a served model at
{host}/serving-endpoints/{endpoint-name}/invocations (the endpoint is in the path,
not OpenAI’s /v1/chat/completions):
DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi...
The module activates on those two env vars (the standard Databricks ones, shared
with the Terraform provider and cost source); edit its models[] so each route is a
serving endpoint name. Asgard’s gateway then sits in front of Databricks Model
Serving (and Databricks’ own Mosaic AI Gateway), same tokens, budgets, policy,
guardrails, audit, per-project cost. See Databricks for the full
picture (provisioning + cost too).
Other backends
vLLM, Azure OpenAI, or any OpenAI-compatible endpoint use the same shape, set
base_url + api_key_env on a module, plus chat_path if its request path isn’t
/v1/chat/completions. Anthropic-native uses the anthropic kind. Adding a
brand-new backend is dropping a services/<id>/service.yaml with an inference
block; no recompile.
What the operator never has to do
- Write a LiteLLM service definition (it’s a standard module).
- Bundle LiteLLM into Asgard’s image (deploy it separately, point at it).
- Expose any of this to end users (they see a capability, not a backend).