BYO endpoints
Route through your own vLLM, Ollama, Together-dedicated, Azure OpenAI, or any OpenAI-compatible inference server.
BYO (bring-your-own) lets you connect Conduix's routing, audit, governance, and spend-cap stack to any inference server you operate. Tokens are not billed against your Conduix credits — you're paying upstream — and your data never leaves your infrastructure.
When to use this. Compliance teams that can't send data to a public LLM provider. Customers running fine-tuned open-weight models. Anyone with a Together dedicated deployment, Azure OpenAI in their VPC, or a self-hosted Llama / Qwen / Mistral server.
Setup
- Stand up an OpenAI-compatible server. Conduix talks to anything that implements POST /v1/chat/completions with the OpenAI contract: vLLM, Ollama, Text Generation Inference, Together dedicated, Azure OpenAI (with the right config), or your own shim over Bedrock / Vertex.
- Go to /dashboard/byo-endpoints and click Add endpoint.
- Fill in the form: a slug (3-40 lowercase chars, used in the model name), a display name, your base URL (the OpenAI-compatible /v1 root), and an optional API key (stored encrypted at rest with AES-256-GCM).
- Optionally restrict which model ids are accepted by adding an allowlist.
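Before adding the endpoint, it can help to confirm that the server answers the OpenAI chat-completions contract. The following is a minimal sketch using only the Python standard library. The helper names `build_chat_request` and `smoke_test` are hypothetical (they are not part of any Conduix SDK), and the base URL, model name, and API key are placeholders for your own values.

```python
import json
import urllib.request


def build_chat_request(base_url, model, api_key=None):
    """Build a POST request for <base_url>/chat/completions.

    base_url is the OpenAI-compatible /v1 root, e.g. http://localhost:8000/v1.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,  # keep the probe as cheap as possible
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def smoke_test(base_url, model, api_key=None, timeout=10):
    """Send one tiny completion; return True if the reply carries a 'choices' list."""
    req = build_chat_request(base_url, model, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return isinstance(payload.get("choices"), list)


# Example call (placeholder values, requires a running server):
# smoke_test("http://localhost:8000/v1", "meta-llama/Llama-4-Maverick")
```

If `smoke_test` raises an HTTP error or returns False, the server is probably not exposing the chat-completions route at the /v1 root you entered.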
Calling a BYO endpoint
Use the model id byo:<slug>/<upstream-model>:
client.chat.completions.create(
    model="byo:my-vllm/meta-llama/Llama-4-Maverick",
    messages=[{"role": "user", "content": "Hello"}],
)
Conduix strips the byo:<slug>/ prefix and sends the upstream model name verbatim to your server.
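To make the naming rule concrete, here is a small sketch of how a model id of this shape could be split. It illustrates the documented behavior only and is not Conduix's actual implementation; the function name `split_byo_model` is hypothetical.

```python
def split_byo_model(model_id):
    """Split 'byo:<slug>/<upstream-model>' into (slug, upstream_model)."""
    prefix = "byo:"
    if not model_id.startswith(prefix):
        raise ValueError(f"not a BYO model id: {model_id!r}")
    # Split on the first slash only: upstream names may contain slashes themselves.
    slug, sep, upstream = model_id[len(prefix):].partition("/")
    if not slug or not sep or not upstream:
        raise ValueError(f"expected byo:<slug>/<upstream-model>, got {model_id!r}")
    return slug, upstream


slug, upstream = split_byo_model("byo:my-vllm/meta-llama/Llama-4-Maverick")
print(slug)      # my-vllm
print(upstream)  # meta-llama/Llama-4-Maverick
```

Splitting on the first slash only is what keeps an upstream name such as meta-llama/Llama-4-Maverick intact.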
What you get
Validation rules
- Base URL must use HTTPS in production (HTTP allowed for localhost/.local while testing).
- Slug: 3-40 chars, lowercase letters/digits/dashes, must start with a letter and end with a letter or digit.
- API keys: stored as AES-256-GCM ciphertext. Only a 4-character preview is shown back in the UI. Plaintext leaves Conduix only when calling your endpoint.
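The slug and base URL rules above can be checked locally before you submit the form. This is a sketch of one reading of those rules, not the validator Conduix runs. The function names are hypothetical, and treating ".local" as "hostname ends with .local" is an assumption.

```python
import re
from urllib.parse import urlsplit

# First char: letter. Middle: 1-38 letters/digits/dashes. Last char: letter or digit.
# Total length is therefore 3-40.
SLUG_RE = re.compile(r"[a-z][a-z0-9-]{1,38}[a-z0-9]")


def is_valid_slug(slug):
    """Return True if slug matches the documented 3-40 character rule."""
    return SLUG_RE.fullmatch(slug) is not None


def is_valid_base_url(url):
    """HTTPS is always accepted; plain HTTP only for localhost or *.local hosts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return bool(host)
    if parts.scheme == "http":
        return host == "localhost" or host.endswith(".local")
    return False


print(is_valid_slug("my-vllm"))                        # True
print(is_valid_slug("my-vllm-"))                       # False (ends with a dash)
print(is_valid_base_url("http://localhost:11434/v1"))  # True
print(is_valid_base_url("http://example.com/v1"))      # False
```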
Troubleshooting
Common backends
vLLM
# Start vLLM with OpenAI-compatible server:
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-4-Maverick \
--host 0.0.0.0 --port 8000
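# vLLM serves plain HTTP unless you configure TLS; for the https:// URL below,
# terminate TLS in front of it (for example with a reverse proxy).
# Then in Conduix dashboard, add:
# base URL: https://your-vllm.internal:8000/v1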
# api_key: (whatever VLLM_API_KEY you set, or leave blank)
# slug: my-vllm
Ollama
# Run Ollama locally:
ollama serve &
ollama pull llama4:maverick
# Add in dashboard:
# base URL: http://localhost:11434/v1 (only for testing)
# slug: my-ollama
Together dedicated
# After Together provisions your dedicated deployment:
# base URL: https://api.together.xyz/v1 (Together's standard endpoint)
# api_key: your Together secret
# slug: together-dedicated
# allowed_models: ["meta-llama/Llama-4-Maverick"] (optional, scope it)
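The optional allowlist can be pictured as a simple membership check on the upstream model name. The sketch below rests on two assumptions that the text above does not confirm: that an absent or empty allowlist accepts every model id, and that entries are compared by exact match. The function name `is_model_allowed` is hypothetical.

```python
def is_model_allowed(upstream_model, allowed_models=None):
    """Return True if upstream_model passes the endpoint's allowlist."""
    # Assumption: no allowlist (None or empty) means every model id is accepted.
    if not allowed_models:
        return True
    # Assumption: entries are matched exactly, with no wildcards.
    return upstream_model in allowed_models


allowed = ["meta-llama/Llama-4-Maverick"]
print(is_model_allowed("meta-llama/Llama-4-Maverick", allowed))  # True
print(is_model_allowed("some-other/model", allowed))             # False
print(is_model_allowed("some-other/model"))                      # True
```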