If you've been running Convox v2, you know the magic: a single convox deploy turns your code into a running, load-balanced, auto-scaling application. That magic isn't going away. But underneath, everything has changed.
Convox v3 replaces AWS Elastic Container Service (ECS) with Kubernetes — specifically EKS on AWS, with GKE, AKS, and DOKS also supported. This isn't a minor version bump. It's an architectural overhaul that unlocks multi-cloud portability, standard Kubernetes ecosystem compatibility, advanced deployment controls, and granular workload placement. Your convox.yml still works. Your CLI commands still work. But the behaviors you've internalized over years of v2 usage — how logs appear, how resources are provisioned, how costs accumulate — have fundamentally shifted.
This guide is written for the v2 user who loves Convox's simplicity and doesn't want Kubernetes to steal it. We'll be transparent about the hard parts, show you exactly how to handle them, and make sure your migration goes smoothly.
Let's start with the issue that generates the most confusion during migration: your deployment hangs, you run convox logs, and you see absolutely nothing.
In v2, logs from your containers streamed immediately, regardless of health check status. In v3, convox logs only shows output from containers that have passed their Kubernetes readiness probe. If your app crashes on startup, fails its health check, or takes too long to boot, the container is running — but from Convox's perspective, it doesn't exist yet. This is by design in Kubernetes, but it's disorienting if you're used to v2's behavior.
To debug a stuck deployment in v3, you need to go one layer deeper. First, export your Rack's Kubernetes configuration:
$ convox rack kubeconfig -r myorg/production > ~/.kube/config
Now find the pods for your app. Convox namespaces apps as rackName-appName:
$ kubectl get pods -n production-myapp
NAME READY STATUS RESTARTS AGE
web-7f8b4d6c9-x2k4j 0/1 CrashLoopBackOff 4 3m
There's your problem — CrashLoopBackOff. Now pull the logs directly from the container, including previous crashed instances:
$ kubectl logs -n production-myapp web-7f8b4d6c9-x2k4j --previous
For even more detail — like why Kubernetes couldn't schedule the pod at all — use describe:
$ kubectl describe pod -n production-myapp web-7f8b4d6c9-x2k4j
If a deployment is hanging and you need to abort immediately, use:
$ convox apps cancel -a myapp
This triggers an immediate rollback to your last known good release. Bookmark these commands — they'll save you hours during migration.
For a comprehensive reference on how every Convox concept maps to Kubernetes resources, including naming patterns, labels, and kubectl commands for every resource type, check out our Convox to Kubernetes Resource Mapping guide.
In v2, your EC2 instances ran in public subnets. They had direct internet access, and outbound traffic was essentially free beyond standard data transfer charges. In v3, worker nodes run in private subnets by default. This is a security best practice — your compute is no longer directly addressable from the internet — but it introduces a cost that catches many users off guard: NAT Gateways.
Every outbound internet request from your private-subnet nodes (pulling Docker images from ECR, calling external APIs, sending webhooks) must pass through a NAT Gateway. On AWS, each NAT Gateway costs approximately $32/month plus $0.045 per GB processed. In a High Availability setup with 3 Availability Zones, that's a base cost of roughly $100/month before any data transfer.
The real cost danger comes from traffic that leaves your VPC when it doesn't need to. If your app connects to an external managed database like MongoDB Atlas, or calls any third-party service using public endpoints, all of that traffic flows through NAT. For services that live inside your VPC or a peered VPC, the fix is making sure you're using private hostnames or IP addresses so traffic stays on the AWS backbone and never touches NAT at all. AWS-managed services like RDS generally handle this well when accessed via their private endpoints, but self-managed or external databases are where costs can quietly spiral.
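A quick way to audit this is to check whether the endpoints your app talks to resolve to private (RFC 1918) addresses, since traffic to private addresses stays on the VPC network and off the NAT Gateway. Here's a minimal shell sketch of that check; the sample IPs are illustrative:

```shell
# Heuristic check: does an IP fall in the RFC 1918 private ranges?
# Traffic to private addresses stays inside the VPC and avoids NAT charges;
# traffic to public addresses is billed through the NAT Gateway.
is_private_ip() {
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: a private RDS endpoint IP vs. a public SaaS endpoint IP.
is_private_ip "10.0.42.7"  && echo "10.0.42.7 -> stays on the VPC backbone"
is_private_ip "52.44.96.5" || echo "52.44.96.5 -> billed through NAT"
```

If an endpoint you depend on resolves publicly, look for a private endpoint or VPC peering option for that service.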
For internal service-to-service communication within your cluster, Kubernetes DNS is the way to go. In Convox, any service can reach another service in the same app using the pattern:
convoxServiceName.rackName-appName.svc.cluster.local
This routes traffic entirely within the cluster, bypassing NAT, load balancers, and even the AWS network layer. If your services are calling each other over public URLs or external DNS names, switching to internal DNS can meaningfully reduce your NAT bill.
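For example, if a worker talks to the web service of the same app through its public hostname, repointing it at the cluster-internal name keeps that traffic inside the cluster. The hostnames and app names below are illustrative:

```shell
# Before: traffic exits the cluster and comes back in through the public
# load balancer (and NAT, if the hostname resolves to a public address).
convox env set API_URL=https://web.example.com -a myapp

# After: resolves via cluster DNS; traffic never leaves the node network.
# Pattern: serviceName.rackName-appName.svc.cluster.local
convox env set API_URL=http://web.production-myapp.svc.cluster.local:3000 -a myapp
```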
There are several strategies to keep NAT costs under control:
- Use internal Kubernetes DNS (serviceName.rackName-appName.svc.cluster.local) to keep traffic off NAT entirely.
- Set private=false for non-production: for development and staging Racks where compliance isn't a concern, you can set the private rack parameter to false during installation. This places worker nodes in public subnets and eliminates NAT costs entirely.

⚠️ The private parameter is immutable after installation. You cannot change it without reinstalling the Rack. Plan accordingly.
Several v3 rack parameters are either immutable or have outsized impact on cost, stability, and developer experience. Here are the ones worth evaluating before installing your first v3 Rack.
In v2, Convox always provisioned a dedicated build instance. In v3, builds run on your main cluster nodes by default. This means a large Docker build can consume CPU and memory that your production services need, leading to resource contention, OOM kills, and degraded performance.
The fix is straightforward. When installing or updating your Rack, enable a dedicated build node:
$ convox rack params set build_node_enabled=true build_node_min_count=0
Setting build_node_min_count=0 means the build node scales to zero after 30 minutes of inactivity, so you only pay for it during active builds. We strongly recommend enabling this for any Rack running production workloads.
The cidr rack parameter (default 10.0.0.0/16) defines the IP address range for your VPC. Like private, it is immutable after installation. If you plan to use VPC peering to connect your Rack to other VPCs — for example, to reach a shared database or an internal router — you must ensure your CIDR blocks don't overlap. Plan your network topology before you install.
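Both immutable parameters are set at install time. As a sketch (the Rack name and CIDR are illustrative; check the install docs for your provider's exact syntax):

```shell
# cidr and private cannot be changed after installation.
convox rack install aws staging \
  cidr=10.1.0.0/16 \
  private=false   # public subnets: no NAT Gateway cost (non-production only)
```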
v3 introduces Kubernetes Pod Disruption Budgets (PDBs) via the pdb_default_min_available_percentage rack parameter, which defaults to 50. This means at least 50% of your pods must remain available during voluntary disruptions like node drains or cluster autoscaling.
Here's the gotcha: if you have a service running with count: 1, 50% of 1 rounds up to 1. The PDB tells Kubernetes that 1 pod must always be available, which means the pod can never be voluntarily evicted. This blocks the cluster autoscaler from draining and removing empty or underutilized nodes, silently inflating your compute costs.
For development Racks or services that can tolerate brief downtime, consider lowering this value or scaling single-replica services to count: 2 in production.
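For example, on a development Rack you might relax the budget entirely so the autoscaler can reclaim nodes freely. This is a sketch; confirm the accepted values in the rack parameter documentation:

```shell
# Allow all pods to be voluntarily evicted on a dev Rack so empty or
# underutilized nodes can be drained and removed by the autoscaler.
convox rack params set pdb_default_min_available_percentage=0 -r myorg/dev
```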
Your convox.yml will need updates. Most are straightforward, but a few are significant enough to break your app if missed. Here's a complete comparison of the changes that matter most.
This is the migration change most likely to catch you off guard. In v2, defining type: postgres in your resources section provisioned an AWS RDS instance — a managed, durable, backed-up database. In v3, type: postgres provisions a containerized database running inside your cluster. This is great for development but not what you want in production.
To get an actual RDS instance in v3, you must explicitly use type: rds-postgres:
v2 syntax (provisions RDS):
resources:
  database:
    type: postgres
    options:
      storage: 100
v3 syntax (provisions RDS):
resources:
  database:
    type: rds-postgres
    options:
      storage: 100
The same applies to MySQL (rds-mysql), MariaDB (rds-mariadb), and Redis (elasticache-redis). Review your resource definitions carefully before deploying to v3.
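Side by side, a resources section that mixes an in-cluster resource for scratch data with managed AWS resources for production data might look like this (resource names and option values are illustrative):

```yaml
resources:
  scratch-cache:
    type: redis              # v3: containerized Redis inside the cluster
  cache:
    type: elasticache-redis  # v3: managed AWS ElastiCache
  database:
    type: rds-postgres       # v3: managed AWS RDS
    options:
      storage: 100
```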
If you already have an RDS instance running outside of Convox — whether it was created manually, by Terraform, or by your v2 Rack — you don't have to start from scratch. Convox v3 supports importing existing RDS databases (and ElastiCache instances) directly into your app's resource definitions.
To import, use the import option in your resource definition and provide the masterUserPassword as an environment variable reference:
resources:
  mydb:
    type: rds-postgres
    options:
      import: my-existing-rds-instance-identifier
      masterUserPassword: ${MYDBPASS}
services:
  web:
    resources:
      - mydb
Before deploying, set the password via convox env set:
$ convox env set MYDBPASS=my_secure_password -a myapp
Setting MYDBPASS... OK
Release: RABCDEFGHI
While the import option is set, Convox treats the database as a passive linked resource. It won't modify the database's configuration, and no other options in the resource definition will be applied. Convox will inject the connection URL and credential environment variables into your linked services just like any other resource.
When you're ready for Convox to take over full management of the imported database, simply remove the import option (and the masterUserPassword reference) from your convox.yml and redeploy. At that point, Convox will begin managing the database's lifecycle and any configured options will take effect.
One important safety note: if an application is deleted, any RDS databases it created will also be deleted. For imported databases, this does not apply — the database will remain intact and must be manually removed. That said, we strongly recommend enabling deletionProtection on any production database regardless of how it was provisioned.
If you'd rather skip the import process and just point your app at an existing database without Convox managing it at all, you can use a Resource Overlay. Set the resource's URL directly as an environment variable, and Convox will use that instead of provisioning anything:
$ convox env set MAIN_URL=postgres://user:pass@your-rds-host:5432/dbname -a myapp
Setting MAIN_URL... OK
Release: RABCDEFGHI
When a matching environment variable is set (e.g., MAIN_URL for a resource named main), Convox won't start a containerized resource for it. This is a great approach for teams that manage their databases through other tools and just need Convox to connect to them.
In v2, scale.memory set a hard memory limit. If your process exceeded it, ECS killed it. Simple. In v3, the model is different because Kubernetes separates scheduling from enforcement.
The values you set in scale.cpu and scale.memory become Kubernetes resource requests. These are the guaranteed minimums that the scheduler uses to decide which node your pod runs on. A pod requesting 512 MB of memory will only be placed on a node with at least 512 MB available. This is the primary mechanism for ensuring your services have the resources they need, and getting your requests right is the single most important thing to focus on.
Convox also supports limit.cpu and limit.memory, which set hard caps. If a pod exceeds its memory limit, Kubernetes kills it (OOMKill). If it exceeds its CPU limit, it gets throttled. However, in most cases we'd recommend starting without limits and only adding them if you have a specific reason to. Overly tight limits are one of the most common causes of deployment issues on Kubernetes — pods get OOM-killed or CPU-throttled during normal traffic spikes, leading to restarts, failed health checks, and cascading problems that can be hard to trace back to a resource constraint.
Here's a typical configuration with just requests:
services:
  web:
    build: .
    port: 3000
    scale:
      count: 2    # Static replica count
      cpu: 256    # Request: guaranteed 256m CPU
      memory: 512 # Request: guaranteed 512 MB RAM
The scale.count value controls how many replicas of your service are running. Set it as a fixed number for a static deployment, or as a range to enable Kubernetes Horizontal Pod Autoscaling (HPA):
services:
  web:
    build: .
    port: 3000
    scale:
      count: 2-10 # Autoscale between 2 and 10 replicas
      cpu: 256
      memory: 512
      targets:
        cpu: 70   # Scale up when avg CPU exceeds 70%
When you specify a range, Convox creates a HorizontalPodAutoscaler that monitors your pods' actual resource usage against the targets you define. The targets section tells Kubernetes when to scale: a cpu: 70 target means new replicas are added when average CPU utilization across existing pods exceeds 70% of the requested CPU.
Accurate resource requests are critical here. If your requests are too high relative to actual usage, utilization will always appear low and HPA will never scale up. If they're too low, HPA will trigger scaling constantly. A good starting point is to set requests close to your service's steady-state usage and let HPA handle the spikes.
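To build intuition for why requests matter, here is the scaling rule from the Kubernetes HPA documentation, sketched as a shell function: desired replicas = ceil(currentReplicas × currentUtilization ÷ targetUtilization), where utilization is measured as a percentage of the requested CPU.

```shell
# HPA scaling decision: ceil(current_replicas * current_util / target_util).
hpa_desired() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))  # integer ceiling division
}

hpa_desired 2 140 70  # avg CPU at 140% of request, target 70% -> prints 4
hpa_desired 4 35 70   # avg CPU at 35% of request, target 70%  -> prints 2
```

Note how an inflated request skews this: if real usage is 100m but you request 1024m, utilization reads as roughly 10% and never crosses the target, so HPA will not scale up.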
v3 uses Kubernetes rolling deployments by default, which means new pods are brought up before old pods are terminated. The deployment block controls how aggressively this rollout happens:
services:
  web:
    build: .
    port: 3000
    scale:
      count: 4
      cpu: 256
      memory: 512
    deployment:
      minimum: 50  # At least 50% of pods stay available during deploy
      maximum: 200 # Up to 200% of desired count can exist during rollout
With these settings and a count of 4, Kubernetes will keep at least 2 pods running at all times during a deploy and can spin up as many as 8 total while transitioning. This gives you zero-downtime deployments, but keep in mind that during the rollout you temporarily need enough cluster capacity to run both old and new pods.
If you're coming from v2 where deployments were more opaque, the key mental shift is that your scale requests directly affect scheduling, autoscaling behavior, and rollout capacity. Getting them right is more impactful than setting limits.
Beyond the big changes, there are several smaller syntax differences that can trip you up during migration.
v2 used AWS CloudWatch's 6-field cron expression format (including a year field and the ? character). v3 uses standard Kubernetes 5-field cron. Your timers will fail validation if you don't update the syntax.
| Version | Syntax | Example (3 AM daily) |
|---|---|---|
| v2 | 6-field (min hour dom month dow year) | 0 3 * * ? * |
| v3 | 5-field (min hour dom month dow) | 0 3 * * * |
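In practice, that means a v3 timer definition looks like this (the timer name, command, and service are placeholders):

```yaml
timers:
  nightly-cleanup:
    schedule: "0 3 * * *"  # standard 5-field cron: every day at 03:00 UTC
    command: bin/cleanup
    service: web           # runs in a one-off container from this service's image
```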
The drain key is deprecated in v3. Replace it with the termination.grace syntax:
v2:
services:
  web:
    drain: 60
v3:
services:
  web:
    termination:
      grace: 60
v3 introduces startupProbe via the health check configuration. If you've ever had a Rails app that takes 30+ seconds to boot (asset compilation, seed data loading, ML model initialization), you know the frustration of Kubernetes killing it before it finishes starting because the liveness probe timed out.
The startup probe gates the liveness and readiness probes entirely. Until the startup probe passes, Kubernetes won't even begin checking the other probes. This gives slow-booting apps the breathing room they need without weakening your ongoing health checks:
services:
  web:
    build: .
    port: 3000
    health:
      path: /health
      interval: 10
      timeout: 5
    startup:
      path: /health
      timeout: 120 # Give the app up to 2 minutes to boot
If you're migrating a monolithic application, startup probes alone can eliminate 90% of your deployment-related headaches.
This is one of the more confusing syntax changes in v3. In v2, you had a single port: key and a ports: array that handled everything. In v3, they serve entirely different purposes:
- port: 3000 (singular) — Exposes the service via public HTTPS ingress on port 443, routed to your container on port 3000. This is what most web services need.
- ports: (plural) — Exposes ports for internal cluster communication or for use with custom Balancers. These ports are not publicly accessible by default.

v2 syntax:
services:
  web:
    build: .
    port: 3000
v3 syntax (identical for basic web services):
v3 syntax (identical for basic web services):
services:
  web:
    build: .
    port: 3000
For TCP or UDP services (like a game server, MQTT broker, or gRPC endpoint), v3 introduces the balancers: block, which provisions a dedicated Network Load Balancer. Note that the service must still define a port (singular) that is different from the ports listed in ports:. Convox uses this port for standard ingress, health checks, and internal service management:
balancers:
  gameserver:
    service: game
    ports:
      7777: 7777
services:
  game:
    build: .
    port: 3000 # Required: used by Convox for health checks and ingress
    ports:
      - 7777   # Exposed via the NLB balancer above
If you omit the port value or set it to the same value as one of your ports: entries, Convox won't be able to properly manage health checks or route traffic to the service.
One of the most powerful features you gain in v3 is fine-grained control over where your workloads run. In v2, all your services shared the same pool of EC2 instances. In v3, you can create custom node groups with different instance types, capacity modes, and scaling rules, and then direct specific services to specific node groups.
This is useful for a lot of real-world scenarios: running CPU-heavy batch workers on compute-optimized instances, putting your web frontend on on-demand nodes while background jobs run on cheaper spot instances, or isolating sensitive workloads onto dedicated node pools.
You define custom node groups at the Rack level using the additional_node_groups_config parameter. Create a JSON file with your node group definitions:
[
  {
    "id": 101,
    "type": "t3.medium",
    "capacity_type": "ON_DEMAND",
    "min_size": 1,
    "max_size": 5,
    "label": "web-services",
    "tags": "environment=production,team=frontend"
  },
  {
    "id": 102,
    "type": "c5.large",
    "capacity_type": "SPOT",
    "min_size": 0,
    "max_size": 10,
    "label": "batch-workers",
    "tags": "environment=production,team=data"
  }
]
Then apply it to your Rack:
$ convox rack params set additional_node_groups_config=/path/to/node-groups.json -r rackName
Each node group gets a label value that becomes a Kubernetes label (convox.io/label) on those nodes. You can also assign a unique id to each group so it doesn't get recreated during config updates, and use the tags field to apply AWS resource tags for cost tracking.
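After applying the config, you can confirm the groups registered with the expected labels. kubectl's -L flag adds a column showing each node's value for a given label key:

```shell
# List nodes with their Convox node-group label (uses the kubeconfig
# exported earlier with `convox rack kubeconfig`).
kubectl get nodes -L convox.io/label
```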
Once your node groups are in place, use nodeSelectorLabels in your convox.yml to direct services to specific groups:
services:
  web:
    build: .
    port: 3000
    nodeSelectorLabels:
      convox.io/label: web-services
  worker:
    build: ./worker
    nodeSelectorLabels:
      convox.io/label: batch-workers
If you want softer placement preferences rather than hard requirements, you can use nodeAffinityLabels with weights. This tells Kubernetes to prefer certain nodes without failing the deployment if they're not available:
services:
  web:
    nodeSelectorLabels:
      convox.io/label: web-services
    nodeAffinityLabels:
      - weight: 10
        label: node.kubernetes.io/instance-type
        value: t3a.large
      - weight: 1
        label: node.kubernetes.io/instance-type
        value: t3a.medium
In this example, the service will always run on the web-services node group, but Kubernetes will prefer t3a.large instances within that group (weight 10) over t3a.medium instances (weight 1).
Workload placement isn't just for services. You can also create dedicated node groups specifically for builds using additional_build_groups_config, and then target them with the BuildLabels app parameter:
$ convox rack params set additional_build_groups_config='[{"id":201,"type":"c5.xlarge","capacity_type":"SPOT","min_size":0,"max_size":3,"label":"app-build","disk":100}]' -r rackName
$ convox apps params set BuildLabels=convox.io/label=app-build -a myapp
This puts your builds on beefy, cost-effective spot instances that scale to zero when idle, keeping them completely isolated from your production workloads.
For the full set of node group configuration options (including dedicated node pools, AMI overrides, and disk sizing), see the Workload Placement documentation.
Here's a summary of every significant change between v2 and v3, consolidated for quick reference:
| Area | v2 (ECS) | v3 (Kubernetes) |
|---|---|---|
| Network | EC2 in public subnets | Nodes in private subnets (NAT required) |
| Logs | Stream immediately | Hidden until readiness probe passes |
| Builds | Dedicated build instance | Main cluster (enable build_node_enabled) |
| Database Resources | type: postgres → RDS | type: postgres → container. Use rds-postgres for RDS. |
| Timer Cron | 6-field (0 3 * * ? *) | 5-field (0 3 * * *) |
| Public Ports | port: 3000 | port: 3000 (same, but ports: now = internal only) |
| TCP/UDP | Via ports array | Requires balancers: block (NLB) |
| Memory/CPU | scale.memory = hard limit | scale.memory = request (scheduling guarantee). Limits optional. |
| Autoscaling | ECS service auto scaling | scale.count: 2-10 with targets creates HPA |
| Service Discovery | links: injects [SERVICE]_URL | K8s DNS: service.rackName-appName.svc.cluster.local |
| Termination | drain: 30 | termination: grace: 30 |
| IAM | Instance-level or Kiam | EKS Pod Identity (pod_identity_agent_enable=true) |
| Existing DBs | Manual env var or Kiam | import option or Resource Overlay via env var |
| Workload Placement | Shared instance pool | Custom node groups with nodeSelectorLabels and nodeAffinityLabels |
Migrating from v2 to v3 takes some planning, but if you've followed along with this guide, you already have a clear picture of what needs to change and why. Most of the work is mechanical: updating resource types, adjusting cron syntax, replacing drain with termination. Once you're through it, you get standard Kubernetes ecosystem compatibility, multi-cloud portability, startup probes, Pod Disruption Budgets, custom NLBs, granular IAM via Pod Identity, and the entire universe of Kubernetes tooling — all while keeping the convox deploy simplicity that made you choose Convox in the first place.
Start with a staging Rack, use the checklist above, and give yourself a bit of time to get comfortable with the debugging workflow. The syntax changes are the easy part. The mental model shift — from ECS to Kubernetes — is what takes more effort, but this guide has given you the map.
Ready to make the jump? Sign up for a free Convox account and spin up a v3 staging Rack today. Walk through our Getting Started Guide to see how the new architecture works firsthand, or explore our example applications for reference implementations.
If your team is planning a production migration and wants help with architecture review, VPC planning, or cost optimization, reach out to our team — we've helped hundreds of teams navigate this transition successfully.
For enterprise migrations, architecture reviews, or compliance questions: sales@convox.com