Choose a model, deploy in minutes

Large Language Models

16 templates spanning Llama 3.3, Mistral 7B, Qwen 2.5, DeepSeek R1, Phi-4, Gemma 2, and Mixtral 8x7B. Serving engines include vLLM, SGLang, TGI, and NVIDIA NIM. Models from 1.7B to 70B parameters across T4, A10G, L4, L40S, and A100 GPUs. All LLM templates expose OpenAI-compatible chat completion endpoints.

Image and Video Generation

Stable Diffusion XL, FLUX.1 Schnell, and SD3.5 Medium via ComfyUI for image generation. LTX-Video, CogVideoX, Wan Image-to-Video, and AnimateDiff for video synthesis. Full ComfyUI workflow support for custom pipelines and LoRA loading. Text-to-video and image-to-video with configurable resolution and frame counts.

Speech and Audio

Whisper large-v3-turbo for production speech-to-text. Kokoro 82M for fast TTS across 8 languages. Orpheus 3B for expressive TTS with emotion control. Chatterbox for zero-shot voice cloning from a 10-second sample. All speech templates include OpenAI-compatible endpoints for drop-in integration.

RAG and Embeddings

Full retrieval-augmented generation stack: API gateway, vLLM for generation, BGE embeddings, and Qdrant vector database. Ingest, embed, store, and query documents with citations. BGE Large embedding template (335M params, 1024-dim output) also available standalone for semantic search.

Multi-LoRA and Custom Models

NVIDIA Triton template with Llama 3.1 8B base and hot-swap LoRA adapters for serving multiple fine-tuned variants from a single GPU. Or bring any HuggingFace model ID - the Deploy Wizard auto-configures GPU, memory, and serving engine. Gated models supported with your access token.

Built for Production

Every template ships with GPU-aware health checks, startup probes sized for model loading, and KEDA autoscaling rules. Scale to zero when idle. Set budget caps per app. Track per-service GPU spend in real time. Models run on your AWS account at AWS pricing with no per-request markup.
Note: Model licenses vary. Check each model's license on HuggingFace before production use.

Don't just take our word for it.

“Convox made it possible for us to distribute dev-ops responsibilities from one individual to the entire team. Their platform makes it super simple for our developers to fully manage their applications in production without the operational overhead of managing Kubernetes.”

Jim Myers — Flipside Crypto

“The Convox advantage is that operations work is reduced to an absolute minimum. We used to have an extra consultant just to keep our servers safe, taking care of updates, logs and backups, whereas now our developers manage the entire infrastructure by themselves.”

Cesare Navarotto — Monrif

“Convox helped us migrate everything to AWS quicker than I ever thought was possible. Unlocking all the advantages of the cloud through Convox is easily one of the best decisions we made.”

Ryan Jackson — Paid Labs
×

Book a Demo