BYO endpoints

Route through your own vLLM, Ollama, Together-dedicated, Azure OpenAI, or any OpenAI-compatible inference server.

BYO (bring-your-own) connects Conduix's routing, audit, governance, and spend-cap stack to any inference server you operate. Tokens are not billed against your Conduix credits (you pay your upstream provider instead), and inference runs on a server you control rather than on a public LLM provider.

When to use this. Compliance teams that can't send data to a public LLM provider. Customers running fine-tuned open-weight models. Anyone with a Together dedicated deployment, Azure OpenAI in their VPC, or a self-hosted Llama / Qwen / Mistral server.

Setup

  1. Stand up an OpenAI-compatible server. Conduix talks to anything that implements POST /v1/chat/completions with the OpenAI contract — vLLM, Ollama, Text Generation Inference, Together dedicated, Azure OpenAI (with the right config), or your own shim over Bedrock / Vertex.
  2. Go to /dashboard/byo-endpoints and click Add endpoint.
  3. Fill in the form: a slug (3-40 lowercase characters, used in the model name), a display name, your base URL (the OpenAI-compatible /v1 root), and an optional API key (stored encrypted at rest with AES-256-GCM).
  4. Optionally restrict which model ids are accepted by adding an allowlist.
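Step 1 is the only hard requirement on your side: your server must accept the OpenAI chat-completions request shape. As a rough illustration, the sketch below assembles that request, a POST to {base_url}/chat/completions with a JSON body carrying model and messages. The helper name build_chat_request is hypothetical, and it is an assumption that the stored API key is forwarded as an Authorization: Bearer header (the standard OpenAI convention); confirm how your server expects to authenticate.

```python
import json
from typing import Optional


def build_chat_request(base_url: str, api_key: Optional[str], model: str,
                       messages: list[dict]) -> tuple[str, dict, bytes]:
    """Hypothetical helper: assemble an OpenAI-style chat-completions request.

    Assumption: the API key (if any) travels as a Bearer token.
    """
    # base_url is the /v1 root, e.g. https://your-vllm.internal:8000/v1
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return url, headers, body


url, headers, body = build_chat_request(
    "https://your-vllm.internal:8000/v1/",
    None,  # no key configured, so no Authorization header is added
    "meta-llama/Llama-4-Maverick",
    [{"role": "user", "content": "Hello"}],
)
print(url)  # https://your-vllm.internal:8000/v1/chat/completions
```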

Calling a BYO endpoint

Use the model id byo:<slug>/<upstream-model>:

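# `client` is an OpenAI-compatible client authenticated with your Conduix key (cx_live_/cx_test_)
response = client.chat.completions.create(
    model="byo:my-vllm/meta-llama/Llama-4-Maverick",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)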

Conduix strips the byo:<slug>/ prefix and sends the upstream model name verbatim to your server.
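Upstream model names often contain slashes of their own (meta-llama/Llama-4-Maverick), so only the first slash after byo: can separate the slug. That split is unambiguous because slugs may contain only lowercase letters, digits, and dashes. A minimal sketch of the prefix handling; split_byo_model is a hypothetical name, not part of any Conduix SDK.

```python
def split_byo_model(model_id: str) -> tuple[str, str]:
    """Hypothetical helper: split 'byo:<slug>/<upstream-model>'.

    Only the first '/' ends the slug; everything after it is passed
    to the upstream server verbatim.
    """
    prefix = "byo:"
    if not model_id.startswith(prefix):
        raise ValueError(f"not a BYO model id: {model_id!r}")
    slug, sep, upstream = model_id[len(prefix):].partition("/")
    if not sep or not slug or not upstream:
        raise ValueError(f"expected byo:<slug>/<upstream-model>, got {model_id!r}")
    return slug, upstream


print(split_byo_model("byo:my-vllm/meta-llama/Llama-4-Maverick"))
# ('my-vllm', 'meta-llama/Llama-4-Maverick')
```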

What you get

  • Authentication: customer keys (cx_live_/cx_test_) authorize the call to your endpoint.
  • Audit log: every request lands in your usage logs with provider = "byo:<slug>".
  • Rate limiting: per-key RPM applies as usual.
  • Spend caps: daily/monthly caps are still enforced, counting requests rather than credits.
  • PII redaction: the same redaction rules apply before the request leaves Conduix.
  • Response headers: x-conduix-provider = "byo:<slug>".
  • Token billing: $0; you pay upstream, not Conduix.
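Because BYO responses carry x-conduix-provider = "byo:<slug>", a client can confirm which endpoint served a call. In the openai Python SDK, client.chat.completions.with_raw_response.create(...) returns an object exposing .headers and .parse(). The hypothetical helper below only inspects a headers mapping, so it runs without a network call; the non-BYO header value in any example is illustrative.

```python
from typing import Mapping, Optional


def byo_slug_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: return the BYO slug from x-conduix-provider, else None."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    provider = lowered.get("x-conduix-provider", "")
    if provider.startswith("byo:"):
        return provider[len("byo:"):]
    return None


print(byo_slug_from_headers({"X-Conduix-Provider": "byo:my-vllm"}))  # my-vllm
```

With the SDK, that would look like raw = client.chat.completions.with_raw_response.create(...), then byo_slug_from_headers(raw.headers) for the route and raw.parse() for the completion itself.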

Validation rules

  • Base URL must use HTTPS in production (HTTP allowed for localhost/.local while testing).
  • Slug: 3-40 chars, lowercase letters/digits/dashes, must start with a letter and end with a letter or digit.
  • API keys: stored as AES-256-GCM ciphertext. Only a 4-character preview is shown back in the UI. Plaintext leaves Conduix only when calling your endpoint.
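The slug and base-URL rules can be checked locally before you submit the form. This is a sketch of the rules as written above, not Conduix's validator: the function names are hypothetical, and the testing flag is an assumption standing in for "while testing" (HTTP accepted only for localhost or *.local hosts when it is set).

```python
import re
from urllib.parse import urlsplit

# 3-40 chars: a leading letter, 1-38 middle chars, then a letter or digit.
SLUG_RE = re.compile(r"[a-z][a-z0-9-]{1,38}[a-z0-9]")


def is_valid_slug(slug: str) -> bool:
    return SLUG_RE.fullmatch(slug) is not None


def is_allowed_base_url(url: str, testing: bool = False) -> bool:
    """HTTPS always passes; HTTP only for localhost/*.local when testing.

    Assumption: `testing` models the documented "while testing" exception.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return bool(parts.hostname)
    if parts.scheme == "http" and testing:
        host = parts.hostname or ""  # urlsplit lowercases the hostname
        return host == "localhost" or host.endswith(".local")
    return False


print(is_valid_slug("my-vllm"), is_allowed_base_url("http://localhost:11434/v1"))
# True False
```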

Troubleshooting

  • byo_endpoint_not_found: the slug doesn't exist for this org. Check the spelling.
  • byo_endpoint_disabled: the endpoint exists but is disabled. Re-enable it in the dashboard.
  • byo_model_not_allowed: the model id isn't in the allowlist for that endpoint.
  • upstream_error: your endpoint returned a 4xx/5xx. Check its logs.
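Client code can map these codes to the fixes above. A hedged sketch: it assumes Conduix returns OpenAI-style error bodies such as {"error": {"code": "byo_model_not_allowed", ...}}, which this page does not confirm, and explain_byo_error is a hypothetical name. Feed it whatever JSON your HTTP client decoded from a non-2xx response.

```python
# Assumption: error bodies look like {"error": {"code": "...", "message": "..."}}.
REMEDIATION = {
    "byo_endpoint_not_found": "Slug doesn't exist for this org; check the spelling.",
    "byo_endpoint_disabled": "Endpoint is disabled; re-enable it in the dashboard.",
    "byo_model_not_allowed": "Model id isn't in the endpoint's allowlist.",
    "upstream_error": "Your endpoint returned 4xx/5xx; check its logs.",
}


def explain_byo_error(body: dict) -> str:
    """Hypothetical helper: turn a decoded error body into a remediation hint."""
    error = body.get("error")
    code = error.get("code", "") if isinstance(error, dict) else ""
    return REMEDIATION.get(code, f"Unrecognised error code: {code!r}")


print(explain_byo_error({"error": {"code": "byo_model_not_allowed"}}))
```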

Common backends

vLLM
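# Start vLLM's OpenAI-compatible server (use the exact Hugging Face repo id of your model):
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-4-Maverick \
  --host 0.0.0.0 --port 8000
# HTTPS is required in production: terminate TLS in front of vLLM,
# or pass --ssl-keyfile / --ssl-certfile to the command above.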

# Then in Conduix dashboard, add:
#   base URL: https://your-vllm.internal:8000/v1
#   api_key:  (whatever VLLM_API_KEY you set, or leave blank)
#   slug:     my-vllm
Ollama
# Run Ollama locally:
ollama serve &
ollama pull llama4:maverick

# Add in dashboard:
#   base URL: http://localhost:11434/v1   (only for testing)
#   slug:     my-ollama
Together dedicated
# After Together provisions your dedicated deployment:
#   base URL: https://api.together.xyz/v1   (Together's standard endpoint)
#   api_key:  your Together secret
#   slug:     together-dedicated
#   allowed_models: ["meta-llama/Llama-4-Maverick"]   (optional, scope it)
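Whichever backend you choose, the name after byo:<slug>/ (and any allowlist entry) should match exactly what the server calls the model. vLLM and Ollama expose the OpenAI-style GET /v1/models listing; some providers return a bare JSON array instead, so the sketch accepts either shape. The helper names are hypothetical and only the standard library is used. fetch_models makes a real HTTP call (with an assumed Bearer key); the example at the bottom uses a hand-written sample payload instead.

```python
import json
import urllib.request
from typing import Optional, Union


def fetch_models(base_url: str, api_key: Optional[str] = None,
                 timeout: float = 10.0) -> Union[dict, list]:
    """GET {base_url}/models from an OpenAI-compatible server."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed auth scheme
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(listing: Union[dict, list]) -> list[str]:
    """Extract ids from {"data": [{"id": ...}]} or from a bare list of models."""
    entries = listing if isinstance(listing, list) else listing.get("data", [])
    return [m["id"] for m in entries if isinstance(m, dict) and "id" in m]


def byo_model_ids(slug: str, listing: Union[dict, list]) -> list[str]:
    """Prefix each upstream id with byo:<slug>/ to form a Conduix model id."""
    return [f"byo:{slug}/{mid}" for mid in model_ids(listing)]


sample = {"object": "list",
          "data": [{"id": "meta-llama/Llama-4-Maverick", "object": "model"}]}
print(byo_model_ids("my-vllm", sample))
# ['byo:my-vllm/meta-llama/Llama-4-Maverick']
```

Against a live server you would call byo_model_ids("my-vllm", fetch_models("https://your-vllm.internal:8000/v1")) and copy the resulting ids into your requests or allowlist.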