Deploy production GPU inference on your own infrastructure

One-Command Model Deployment

Pick from 30+ pre-configured templates or bring any HuggingFace model. The Deploy Wizard handles GPU selection, health checks, autoscaling, and environment configuration. Your first inference endpoint can be live in under 15 minutes.

Scale to Zero When Idle

KEDA-based autoscaling scales GPU services to zero replicas when traffic stops and spins them back up on the next request. Set per-app budget caps with automatic shutdown. GPU instances cost $1 to $32+ per hour - stop paying when nothing is running.

OpenAI-Compatible Endpoints

22 templates expose OpenAI-compatible API endpoints out of the box. Swap your OpenAI base URL for your Convox endpoint and your existing application code works unchanged. Migrate off managed APIs without rewriting anything.

Full GPU Observability

Real-time dashboards for GPU utilization, VRAM, power draw, and throughput per pod. Built-in DCGM exporter with 20+ metrics. Configurable chart windows from 5 minutes to 24 hours. Grafana deep links for advanced investigation.

Your Cloud, Your Data

Everything runs in your AWS account. No data leaves your VPC. No third-party API calls. No per-request markup. Per-action RBAC with admin gates on budget and infrastructure mutations. Full audit trail with actor attribution on every event.

Bring Any HuggingFace Model

Not in the catalog? Enter any HuggingFace model ID and the Deploy Wizard auto-configures GPU selection, memory allocation, and serving engine. Gated models supported with your HuggingFace access token. Six serving engines: vLLM, SGLang, TGI, ComfyUI, Triton, and NIM.

Don't just take our word for it.

“Convox made it possible for us to distribute dev-ops responsibilities from one individual to the entire team. Their platform makes it super simple for our developers to fully manage their applications in production without the operational overhead of managing Kubernetes.”

Jim Myers — Flipside Crypto

“The Convox advantage is that operations work is reduced to an absolute minimum. We used to have an extra consultant just to keep our servers safe, taking care of updates, logs and backups, whereas now our developers manage the entire infrastructure by themselves.”

Cesare Navarotto — Monrif

“Convox helped us migrate everything to AWS quicker than I ever thought was possible. Unlocking all the advantages of the cloud through Convox is easily one of the best decisions we made.”

Ryan Jackson — Paid Labs
×

Book a Demo