Stop Paying for Idle Nodes: How Karpenter Transforms Your AWS Bill

You open the AWS billing console expecting a modest bill. Your startup deploys twice a day, runs a handful of services, and traffic is predictable. Then you see it: $180 for EC2 this month. Your build infrastructure alone accounts for $60 of that, running a t3.large around the clock even though actual builds consume maybe 40 minutes of compute time per day.

This is the hidden tax of traditional Kubernetes node management. Cluster Autoscaler operates at the Auto Scaling Group level, scaling node groups up and down based on pending pods. But it reacts slowly, often keeping nodes running for extended periods "just in case." Build nodes, in particular, tend to linger because the autoscaler cannot predict when your next deployment will happen.

Karpenter changes this equation entirely. It provisions nodes in response to pending pods within seconds, selects the optimal instance type for each workload, and terminates nodes as soon as they are no longer needed. For Convox users, this means build nodes that spin up when you run convox deploy and disappear shortly after the build completes. It means workload nodes that match your actual resource requirements rather than forcing you into predefined node group sizes. And it means a monthly AWS bill that reflects what you actually used, not what you might have needed.

The Cost of Always-On Infrastructure

Traditional Kubernetes deployments use Cluster Autoscaler to manage node capacity. This works by monitoring for pods that cannot be scheduled due to insufficient resources, then incrementing the desired count on an Auto Scaling Group. The feedback loop takes minutes, not seconds, and the autoscaler errs on the side of keeping capacity available.

For build infrastructure, this behavior is particularly wasteful. Consider a typical development team that deploys two to three times per day. Each build might take 5 to 10 minutes. With Cluster Autoscaler, your build node often runs continuously because the scale-down delay defaults to 10 minutes or more, and any pending build pods cause it to scale back up before termination completes.

The math is straightforward. A t3.large instance costs approximately $0.0832 per hour in us-east-1. Running 24/7, that adds up to roughly $60 per month. If your actual build time totals 40 minutes per day, you are paying for over 23 hours of idle capacity every single day. Across a year, you spend $720 on build infrastructure that sits unused 97% of the time.
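The arithmetic above can be checked with a few lines of Python. The rate and usage figures are the article's assumptions (t3.large on-demand in us-east-1), not live AWS pricing:

```python
# Back-of-the-envelope cost of an always-on t3.large build node.
HOURLY_RATE = 0.0832          # t3.large on-demand in us-east-1, USD/hour
HOURS_PER_MONTH = 24 * 30     # ~720 hours

monthly_cost = HOURLY_RATE * HOURS_PER_MONTH    # ~$59.90
annual_cost = monthly_cost * 12                 # ~$718.85

build_minutes_per_day = 40
idle_fraction = 1 - build_minutes_per_day / (24 * 60)   # ~97.2% idle

print(f"monthly: ${monthly_cost:.2f}, annual: ${annual_cost:.2f}, "
      f"idle: {idle_fraction:.1%}")
```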

Workload nodes face a similar challenge. Your application traffic peaks during business hours but drops to a trickle overnight. Cluster Autoscaler eventually scales down, but the delay means you pay for capacity you do not need during quiet periods. And because ASGs use homogeneous instance types, you cannot easily mix smaller instances for light workloads with larger instances for heavy ones.

How Karpenter Solves the Idle Node Problem

Karpenter takes a fundamentally different approach. Instead of managing Auto Scaling Groups, it directly provisions EC2 instances in response to pending pods. When a build pod is created, Karpenter evaluates its resource requirements, selects the optimal instance type from a configurable set, and launches the instance within seconds. When the build completes and the pod terminates, Karpenter begins a consolidation countdown. If no new workloads arrive before the timer expires, the node is terminated.

This architecture enables true scale-to-zero for build infrastructure. Your build nodes exist only during active builds. The cost of running two 10-minute builds per day drops from roughly $60 per month to under a dollar, assuming the same t3.large instance type. That represents a reduction of nearly 99% in build infrastructure costs.
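A rough sketch of the scale-to-zero math, assuming the same t3.large rate, two 10-minute builds per day, and the default 60-second consolidation window:

```python
# Build cost with Karpenter scale-to-zero: the node exists only during
# builds plus the consolidation window. Illustrative figures only.
HOURLY_RATE = 0.0832                 # t3.large on-demand, USD/hour

builds_per_day = 2
build_minutes = 10
consolidate_after_minutes = 1        # karpenter_build_consolidate_after=60s

node_minutes_per_day = builds_per_day * (build_minutes + consolidate_after_minutes)
monthly_hours = node_minutes_per_day * 30 / 60      # 11 hours
monthly_cost = monthly_hours * HOURLY_RATE          # ~$0.92

always_on_cost = HOURLY_RATE * 24 * 30              # ~$59.90
savings = 1 - monthly_cost / always_on_cost         # ~98.5% reduction
print(f"monthly: ${monthly_cost:.2f}, reduction: {savings:.1%}")
```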

For workload nodes, Karpenter provides equally significant benefits. It consolidates underutilized nodes by moving pods to fewer, better-utilized instances and terminating the empty ones. It automatically selects smaller instance types during low-traffic periods and larger ones during peak load. And it replaces nodes before they reach a configurable age, keeping your fleet on current AMIs without manual intervention.

Convox makes Karpenter accessible without requiring you to understand the underlying Kubernetes primitives. You enable Karpenter with a rack parameter, configure your preferences through simple key-value settings, and Convox handles the NodePool definitions, EC2NodeClass configurations, IAM roles, and SQS interruption queues that Karpenter requires.

Enabling Karpenter on Your Convox Rack

Karpenter uses a two-parameter enablement model. The first parameter, karpenter_auth_mode, prepares your EKS cluster by migrating it to the required access mode and applying discovery tags to subnets and security groups. This is a one-way migration that cannot be reverted. The second parameter, karpenter_enabled, deploys the Karpenter controller and creates the NodePools that manage your workload and build nodes.

Most users enable both parameters in a single command:

$ convox rack params set karpenter_auth_mode=true karpenter_enabled=true -r rackName
Setting parameters... OK

Once enabled, Karpenter manages two distinct NodePools. The workload NodePool handles your application services, scaling based on pending pods and consolidating underutilized nodes. The build NodePool, created when build_node_enabled=true, handles build pods exclusively and scales to zero when no builds are running.

After enabling Karpenter, you may want to reduce the size of your managed node group so that application workloads shift to Karpenter-provisioned nodes. System components like the Convox API server, router, and the Karpenter controller itself remain on the managed node group to ensure stability. See the Karpenter documentation for guidance on right-sizing your system nodes after migration.

Configuring Build Node Behavior

The key parameter for controlling build node costs is karpenter_build_consolidate_after. This sets the delay between when the last build pod completes and when Karpenter terminates the build node. The default is 60 seconds.

$ convox rack params set karpenter_build_consolidate_after=60s -r rackName
Setting parameters... OK

A shorter consolidation window means faster cost savings but introduces potential cold starts. If you run a second build within 60 seconds of the first completing, the existing build node handles it immediately. If you run a build 90 seconds later, Karpenter must provision a new node, adding roughly 30 to 60 seconds to your build time.
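The trade-off can be sketched as a simple function. The 30-to-60-second provisioning penalty is the article's estimate; 45 seconds here is an illustrative midpoint, not a measured value:

```python
# Sketch of the consolidation-window trade-off: whether a build gets a
# warm node depends on the gap since the previous build finished.
def build_start_delay(seconds_since_last_build: float,
                      consolidate_after: float = 60.0,
                      cold_start: float = 45.0) -> float:
    """Extra seconds added before a build can start."""
    if seconds_since_last_build <= consolidate_after:
        return 0.0            # node still warm: build starts immediately
    return cold_start         # node was reclaimed: provision a new one

print(build_start_delay(30))   # 0.0  -- lands on the warm node
print(build_start_delay(90))   # 45.0 -- node already gone, cold start
```

Raising karpenter_build_consolidate_after widens the warm window at the cost of a few extra node-minutes per build.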

For teams that deploy in bursts, increasing this value to 5 minutes or more can eliminate cold starts during active development sessions while still saving money overnight and on weekends. For teams with predictable deployment schedules, a shorter window maximizes savings.

The karpenter_build_instance_families and karpenter_build_instance_sizes parameters control which instance types Karpenter considers for build nodes:

$ convox rack params set karpenter_build_instance_families=c5,c6i -r rackName
$ convox rack params set karpenter_build_instance_sizes=large,xlarge,2xlarge -r rackName
Setting parameters... OK

When multiple instance types are allowed, Karpenter can select the optimal configuration based on your build requirements and current EC2 availability. Compute-optimized instances like c5 and c6i often provide better build performance than general-purpose instances at similar cost.

Using Spot Instances for Builds

Build workloads are inherently interruptible. If a spot instance is reclaimed mid-build, you simply restart the build. This makes builds an ideal candidate for spot pricing, which typically offers 60 to 70 percent savings compared to on-demand instances.

Enable spot instances for builds with the karpenter_build_capacity_types parameter:

$ convox rack params set karpenter_build_capacity_types=spot -r rackName
Setting parameters... OK

You can also specify on-demand,spot to allow Karpenter to fall back to on-demand instances when spot capacity is unavailable. This provides the cost benefits of spot pricing with a safety net for reliability.

Convox integrates with AWS SQS for spot interruption handling. When AWS signals that a spot instance will be reclaimed, Karpenter gracefully drains pods from the node before termination. For build pods, this means the build fails and can be retried automatically or manually. The interruption handling is configured automatically when you enable Karpenter.

Combining scale-to-zero with spot pricing produces dramatic cost reductions. A build infrastructure that cost $60 per month with Cluster Autoscaler might cost around $0.25 to $0.35 per month with Karpenter and spot instances. The exact savings depend on your build frequency, build duration, and spot pricing at the time of each build.
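A quick sketch of the combined effect, under the same two-builds-per-day assumption. Actual spot discounts vary by instance type, Availability Zone, and time:

```python
# Combined effect of scale-to-zero and spot pricing on the build node.
# 60-70% is a typical spot discount range, not a guarantee.
ON_DEMAND_RATE = 0.0832                 # t3.large, USD/hour
monthly_hours = 2 * 10 * 30 / 60        # 10 hours of build time per month

for discount in (0.60, 0.70):
    spot_cost = monthly_hours * ON_DEMAND_RATE * (1 - discount)
    print(f"{discount:.0%} spot discount -> ${spot_cost:.2f}/month")
# Roughly $0.25-$0.33/month, versus ~$60 always-on with Cluster Autoscaler.
```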

Optimizing Workload Node Costs

While build nodes provide the most dramatic scale-to-zero savings, Karpenter also reduces costs for your application workloads. The key mechanisms are consolidation, right-sizing, and capacity type selection.

Consolidation is controlled by karpenter_consolidation_enabled (default: true) and karpenter_consolidate_after (default: 30 seconds). When enabled, Karpenter continuously evaluates whether pods can be packed more efficiently onto fewer nodes. If it determines that a node is underutilized, it cordons the node, moves pods to other nodes, and terminates the empty instance.

$ convox rack params set karpenter_consolidation_enabled=true -r rackName
$ convox rack params set karpenter_consolidate_after=30s -r rackName
Setting parameters... OK
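The packing idea behind consolidation can be sketched as a first-fit-decreasing check. Real Karpenter also honors pod disruption budgets, affinity rules, and the node disruption budget, so this is only the core intuition:

```python
# Toy sketch of the consolidation decision: a node is a candidate for
# removal if all of its pods fit into the spare capacity of the
# remaining nodes. CPU values are in vCPU cores.
def can_consolidate(node_pods_cpu: list[float],
                    other_nodes_spare_cpu: list[float]) -> bool:
    """First-fit-decreasing check: do this node's pods fit elsewhere?"""
    spare = sorted(other_nodes_spare_cpu, reverse=True)
    for pod in sorted(node_pods_cpu, reverse=True):
        for i, cap in enumerate(spare):
            if pod <= cap:
                spare[i] -= pod
                break
        else:
            return False      # some pod fits nowhere: keep the node
    return True               # every pod relocated: node can be drained

print(can_consolidate([0.25, 0.5], [1.0, 0.3]))   # True
print(can_consolidate([2.0], [1.0, 1.0]))         # False
```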

Right-sizing happens automatically. When a pending pod requires resources, Karpenter selects the smallest instance type from your configured families and sizes that satisfies the pod requirements. If your service requests 512MB of memory and 250 millicores of CPU, Karpenter might select a t3.small rather than the t3.large that Cluster Autoscaler would provision for an entire node group.

For workload nodes, you can configure the instance selection with parameters like karpenter_instance_families and karpenter_capacity_types:

$ convox rack params set karpenter_instance_families=t3,t3a,m5,m5a -r rackName
$ convox rack params set karpenter_capacity_types=on-demand,spot -r rackName
Setting parameters... OK
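The right-sizing behavior described above amounts to picking the cheapest allowed type that satisfies the pod's requests. The instance specs and prices below are illustrative placeholders, not live AWS data:

```python
# Sketch of right-sizing: choose the cheapest allowed instance type
# that fits a pod's CPU and memory requests.
INSTANCE_TYPES = {
    # name: (vCPU, memory GiB, on-demand USD/hour) -- illustrative only
    "t3.small":  (2, 2, 0.0208),
    "t3.medium": (2, 4, 0.0416),
    "t3.large":  (2, 8, 0.0832),
    "m5.large":  (2, 8, 0.0960),
}

def pick_instance(cpu_request: float, mem_gib_request: float) -> str:
    candidates = [
        (price, name)
        for name, (vcpu, mem, price) in INSTANCE_TYPES.items()
        if vcpu >= cpu_request and mem >= mem_gib_request
    ]
    if not candidates:
        raise ValueError("no allowed instance type fits the request")
    return min(candidates)[1]   # cheapest type that fits

# A pod requesting 250m CPU and 512Mi memory fits the smallest type:
print(pick_instance(0.25, 0.5))   # t3.small
```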

By allowing Karpenter to choose from multiple instance families and mix spot with on-demand capacity, you get the lowest available price for each workload while maintaining availability for production services.

Resource Limits and Safety Controls

Karpenter includes safety limits to prevent runaway scaling. The karpenter_cpu_limit and karpenter_memory_limit_gb parameters cap the total resources Karpenter can provision across all workload nodes:

$ convox rack params set karpenter_cpu_limit=100 -r rackName
$ convox rack params set karpenter_memory_limit_gb=400 -r rackName
Setting parameters... OK

Build nodes have separate limits via karpenter_build_cpu_limit (default: 32) and karpenter_build_memory_limit_gb (default: 256).

The karpenter_node_expiry parameter controls automatic node replacement. By default, nodes are replaced after 720 hours (30 days), ensuring your fleet stays on current AMIs without manual intervention:

$ convox rack params set karpenter_node_expiry=720h -r rackName
Setting parameters... OK

The karpenter_disruption_budget_nodes parameter limits how many nodes can be disrupted simultaneously during consolidation or replacement, protecting your application availability during these operations.

Measuring Your Savings

After enabling Karpenter, you can measure the impact in AWS Cost Explorer. Navigate to Cost Explorer and filter by service (EC2) and tag. Karpenter-provisioned nodes include the tag karpenter.sh/nodepool with values like build or workload.

Compare your EC2 costs month-over-month. For build infrastructure specifically, look at the hours of runtime for instances tagged with the build NodePool. If you previously ran a build node 24/7 and now see intermittent usage, your savings should be proportional to the reduction in hours.

A typical before-and-after comparison might look like this:

Metric                        Cluster Autoscaler       Karpenter
Build node hours/month        720 hours (always on)    ~10 hours (2 builds/day, 10 min each)
Build node cost (t3.large)    $60/month                ~$0.83/month (on-demand)
Build node cost (spot)        N/A                      ~$0.25-$0.35/month
Annual build savings          $0                       $700+ per build node

Workload node savings vary more based on your traffic patterns and scaling behavior, but teams commonly report 20 to 40 percent reductions in overall EC2 costs after enabling Karpenter consolidation.

Convox Makes Karpenter Accessible

Configuring Karpenter directly on Kubernetes requires understanding NodePools, EC2NodeClasses, IAM roles, SQS queues for interruption handling, and EventBridge rules for spot termination notices. A minimal Karpenter setup involves dozens of lines of YAML and careful IAM policy configuration.

Convox abstracts this complexity into rack parameters. You enable Karpenter with two parameters and configure its behavior with simple key-value settings. Convox handles the underlying Kubernetes resources, IAM policies, and AWS integrations automatically.

Karpenter adoption is also reversible in Convox. You can enable it to test the cost savings and disable it to return to Cluster Autoscaler if needed. The karpenter_enabled parameter can be toggled freely; only karpenter_auth_mode is a one-way migration.

This approach lets you adopt Karpenter incrementally. Enable it on a staging rack first, measure the impact, then roll it out to production with confidence.

Get Started

Karpenter represents a fundamental improvement in how Kubernetes manages infrastructure costs. Build nodes that scale to zero, workload nodes that right-size automatically, and consolidation that eliminates idle capacity combine to produce significant monthly savings without changing your deployment workflow.

To get started with Karpenter on your Convox Rack, see the Karpenter documentation for the full configuration reference and migration guidance. The Getting Started Guide covers Rack installation if you are new to Convox.

Create a free account and see how much you can save on your AWS bill. For enterprise deployments or compliance requirements, reach out to our team.

Let your team focus on what matters.