The Tribal Knowledge Trap: Why convox.yml Becomes Your Infrastructure Documentation

The SOC 2 auditor leans forward, pen in hand. "Can you show me how your services are configured and how traffic flows between them?"

You pause. Marcus, your senior engineer who set up the entire production infrastructure eighteen months ago, left for a two-week hiking trip in Patagonia yesterday. There is no cell service where he is going. You open Slack and start scrolling through the #infrastructure channel, looking for that message where he explained how the database connections work. Somewhere in your Notion workspace there is a document titled "Production Architecture" that was last updated eight months ago. You have a Terraform repository, but it only covers about half of what is actually running. The rest lives in a combination of kubectl commands Marcus runs from his laptop and environment variables set directly in the AWS console.

The auditor is still waiting.

The Tribal Knowledge Problem

This scenario plays out at startups every week. Not necessarily during audits. Sometimes it happens when Marcus gives two weeks' notice and suddenly everyone realizes that critical infrastructure knowledge is about to walk out the door. Sometimes it happens at 2 AM when a new engineer is on call for the first time and needs to understand why the worker service is not processing jobs.

The pattern is always the same. Infrastructure knowledge fragments across tools, documents, and people. Terraform handles some resources. Kubernetes manifests live in a different repository. Environment variables are set through a mix of CI/CD pipelines, console clicks, and SSH sessions that nobody documented. Health check configurations are buried in Helm charts. Scaling rules exist in an auto-scaling group that someone configured manually a year ago.

For compliance frameworks like SOC 2, this fragmentation creates real problems. Auditors want to see how your systems are configured and how changes are tracked. They want to know who can access what and how you ensure that production matches your documented configuration. When your infrastructure documentation is spread across five tools and one person's memory, you cannot answer these questions confidently.

For healthcare startups pursuing HIPAA compliance or fintech companies navigating regulatory requirements, the stakes are even higher. You need to demonstrate that protected health information flows only through systems with appropriate safeguards. You need to show that financial data is handled according to your documented policies. Scattered tribal knowledge does not satisfy auditors, and it does not protect your business.

Declarative Infrastructure as Living Documentation

The solution is not more documentation. Nobody wants to maintain a separate wiki page that describes what Terraform already knows, what Kubernetes already knows, and what the CI/CD pipeline already knows. The solution is infrastructure configuration that is human-readable enough to serve as documentation while being machine-executable enough to actually run your systems.

This is the core insight behind manifest-based configuration. When your entire application is defined in a single file that both humans can read and systems can execute, documentation and implementation cannot drift apart. The configuration file is the documentation. Change the configuration, and the documentation updates automatically because they are the same thing.

Convox uses convox.yml as this single source of truth. One file defines your services, their resource allocations, database connections, environment variable requirements, health checks, scaling rules, and scheduled jobs. When someone asks "how is production configured?", you point them to one file that answers every question.

Anatomy of a Complete convox.yml

Let us look at a realistic example for a healthcare SaaS application. This is not a toy example with a single web service. This is what a production convox.yml looks like for a company preparing for compliance audits.

environment:
  - RAILS_ENV=production
  - SECRET_KEY_BASE
  - ENCRYPTION_KEY
  - SENTRY_DSN

resources:
  database:
    type: postgres
    options:
      storage: 100
      version: "16"
  cache:
    type: redis

services:
  web:
    build: .
    command: bundle exec puma -C config/puma.rb
    port: 3000
    health: /health
    environment:
      - ALLOWED_HOSTS
    resources:
      - database
      - cache
    scale:
      count: 2-10
      cpu: 512
      memory: 1024
      targets:
        cpu: 70
    deployment:
      minimum: 50
      maximum: 200

  worker:
    build: .
    command: bundle exec sidekiq
    environment:
      - SIDEKIQ_CONCURRENCY=10
    resources:
      - database
      - cache
    scale:
      count: 2
      cpu: 256
      memory: 512

  scheduler:
    build: .
    command: bundle exec clockwork config/clock.rb
    resources:
      - database
      - cache
    scale:
      count: 1
      cpu: 128
      memory: 256

timers:
  daily-report:
    schedule: "0 6 * * *"
    command: bundle exec rake reports:daily
    service: worker

  cleanup:
    schedule: "0 3 * * *"
    command: bundle exec rake cleanup:old_records
    service: worker

A new engineer reading this file for the first time can understand the entire system in fifteen minutes. Let us walk through what each section documents.

Environment Variables Section

The top-level environment section defines variables available to every service. Variables with values (like RAILS_ENV=production) have defaults. Variables without values (like SECRET_KEY_BASE) must be set before deployment. This makes it immediately clear which secrets the application requires. No more hunting through Kubernetes secrets, AWS Parameter Store, and environment-specific .env files to understand what a service needs to run.

For compliance purposes, this is documentation of your secrets management approach. Auditors can see that sensitive values are not hardcoded in the repository. The values are set separately via convox env set and stored encrypted by the platform. See Environment Variables for the complete reference.

Resources Section

The resources section defines databases and caches. This example shows a PostgreSQL database with 100GB of storage running version 16, plus a Redis cache. The configuration is explicit about versions and capacity. No tribal knowledge required to understand what database the application uses or how much storage is allocated.

When you link a resource to a service, Convox automatically injects connection environment variables. A resource named database provides DATABASE_URL, DATABASE_HOST, DATABASE_USER, and related variables. This eliminates entire categories of misconfiguration where someone manually set the wrong database URL in production. The Resource documentation covers all available types and options.
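As a rough illustration of that linking, the fragment below annotates which variables each linked resource is expected to provide. The DATABASE_URL, DATABASE_HOST, and DATABASE_USER names come from the paragraph above; the remaining names in the comments (and the CACHE_URL name for the cache resource) are assumptions that follow the same naming pattern, so check the Resource documentation before relying on them.

```yaml
resources:
  database:
    type: postgres
  cache:
    type: redis

services:
  web:
    resources:
      # Linking "database" injects DATABASE_URL, DATABASE_HOST, DATABASE_USER.
      # Assumed to follow the same pattern: DATABASE_PASS, DATABASE_PORT, DATABASE_NAME.
      - database
      # Assumed by analogy: linking "cache" injects CACHE_URL and related variables.
      - cache
```

The application reads these variables at startup, so no connection string needs to be typed into a console or copied between environments.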

Services Section

Each service entry documents how that service is built, what command it runs, what resources it can access, and how it should scale. The web service shows CPU and memory allocation, autoscaling rules, and deployment rollout configuration. The count: 2-10 range sets the lower and upper bounds on running processes, and the scale.targets.cpu: 70 line documents that the service adds processes within that range when average CPU usage exceeds 70 percent. No need to dig through an AWS auto-scaling group to find this information.
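For readers who want the scaling and rollout settings spelled out, here is the web service's scale and deployment configuration from the example above with comments added. The values are unchanged from the example; the comments on deployment.minimum and deployment.maximum reflect an assumed reading of those fields as percentages of the desired process count, which the convox.yml reference should confirm.

```yaml
services:
  web:
    scale:
      count: 2-10      # run at least 2 processes and at most 10
      cpu: 512         # CPU reservation per process
      memory: 1024     # memory reservation per process
      targets:
        cpu: 70        # autoscaling target: average CPU utilization, in percent
    deployment:
      minimum: 50      # assumed: keep at least 50% of processes healthy during a rollout
      maximum: 200     # assumed: allow up to 200% of the desired count while old and new overlap
```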

The health: /health line documents the health check endpoint. This tells both the platform and future engineers exactly how service health is verified. The Health Checks documentation explains additional configuration options including grace periods, intervals, and timeouts.
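A minimal sketch of the expanded form, assuming the health attribute also accepts a mapping with path, grace, interval, and timeout keys, as the options named above suggest. The numeric values are illustrative, not recommendations, and are assumed to be in seconds.

```yaml
services:
  web:
    port: 3000
    health:
      path: /health   # same endpoint as the short form "health: /health"
      grace: 10       # illustrative: time to wait after start before the first check
      interval: 5     # illustrative: time between checks
      timeout: 3      # illustrative: how long a check may take before it counts as failed
```

Writing the timing values into the manifest keeps them under version control alongside the endpoint, so a change to a timeout shows up in the same history as every other configuration change.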

Notice that the worker service has no port attribute. This documents that workers process background jobs and do not serve HTTP traffic. The configuration makes architectural decisions explicit.

Timers Section

The timers section documents scheduled jobs. Instead of cron configurations scattered across servers or Lambda functions defined in a separate Terraform repository, scheduled work is declared right alongside the services that run it. The daily-report timer runs at 6 AM UTC every day. The cleanup timer runs at 3 AM UTC.
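The schedule strings use standard five-field cron syntax. The fragment below repeats the two timers from the example with the fields annotated; nothing here is new configuration.

```yaml
timers:
  daily-report:
    # fields: minute hour day-of-month month day-of-week
    schedule: "0 6 * * *"    # minute 0, hour 6, every day: 6:00 AM UTC
    command: bundle exec rake reports:daily
    service: worker          # runs with the worker service's build and environment

  cleanup:
    schedule: "0 3 * * *"    # minute 0, hour 3, every day: 3:00 AM UTC
    command: bundle exec rake cleanup:old_records
    service: worker
```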

For compliance, this documents your data lifecycle practices. You can point an auditor to this file and show exactly when old records are cleaned up. See the Timer documentation for the complete reference.

Version Control as Audit Trail

Because convox.yml lives in your repository, every change is tracked through Git. When the auditor asks "how did your infrastructure change over the past year?", you run git log convox.yml and show them the complete history. Each commit includes who made the change, when, and ideally why (in the commit message).

This is dramatically better than trying to reconstruct change history from CloudTrail logs, Terraform state files, and half-remembered conversations. The audit trail is built into your normal development workflow. No extra tooling required.

When you deploy with convox deploy, the platform creates a Release that captures the exact build and configuration. You can see every release with convox releases and roll back to any previous release instantly. This gives you both the configuration change history in Git and the deployment history in Convox.

Onboarding in Minutes Instead of Weeks

Consider what team onboarding looks like without manifest-based configuration. A new engineer joins the team. Over the next two weeks, they slowly piece together how the system works by reading scattered documentation, asking questions in Slack, and making mistakes in staging. They learn that the worker service needs a specific environment variable that is not documented anywhere. They discover that the health check timeout was changed six months ago but the wiki still shows the old value.

Now consider onboarding with convox.yml. The new engineer reads one file and understands the entire system topology. They see which services exist, how they connect to databases, how they scale, and what health checks they use. The configuration is guaranteed to match production because it is what runs production.

This is not just about efficiency. This is about reducing operational risk. When tribal knowledge is required to operate systems safely, you have a single point of failure. When configuration is declarative and version-controlled, any engineer can understand and operate the system.

Comparison: Traditional Infrastructure vs. Manifest-Based

The difference becomes stark when you compare what an auditor would need to review:

| Documentation Need | Traditional Approach | Convox Approach |
|---|---|---|
| Service topology | Kubernetes manifests across multiple files, possibly multiple repos | services: section in convox.yml |
| Database configuration | Terraform modules, RDS console, connection strings in secrets | resources: section in convox.yml |
| Scaling rules | HPA manifests, AWS auto-scaling groups, possibly Karpenter configs | scale: block per service |
| Environment variables | Kubernetes Secrets, ConfigMaps, CI/CD vars, Parameter Store | environment: declarations |
| Scheduled jobs | Kubernetes CronJobs, Lambda + EventBridge, crontab on EC2 | timers: section in convox.yml |
| Change history | CloudTrail, multiple Git repos, manual documentation | git log convox.yml |

The traditional approach requires expertise in multiple tools to understand how things are configured. The manifest-based approach requires reading one file.

Compliance-Ready by Default

For organizations pursuing SOC 2, HIPAA, or other compliance certifications, convox.yml provides evidence for several common control requirements. Configuration management controls ask how you ensure systems are configured consistently. The answer: convox.yml defines the configuration, and deployment applies it automatically. Change management controls ask how changes are approved and tracked. The answer: changes go through pull request review and are tracked in Git history.

Convox also supports BYOC (Bring Your Own Cloud) deployment, meaning your infrastructure runs in your own AWS, GCP, or Azure account. For healthcare companies, this means protected health information never leaves infrastructure you control. You can point auditors to your own CloudTrail logs, your own VPC configuration, and your own encryption keys. The platform provides developer experience while you maintain infrastructure ownership.

The convox.yml reference documents every available configuration option. Review it to understand the full scope of what you can define declaratively.

From Tribal Knowledge to Shared Understanding

The next time someone asks "how is production configured?", you have a single answer. Not a wiki page that might be outdated. Not a collection of Terraform modules that cover some resources but not others. Not "ask Marcus when he is back from vacation." One file that is version-controlled, human-readable, and actually runs your production systems.

Infrastructure knowledge sharing does not require better documentation practices. It requires infrastructure that documents itself.

Get Started

Ready to replace scattered tribal knowledge with a single source of truth? The Getting Started Guide walks you through installation and deploying your first application. Review the convox.yml reference to see the full range of configuration options available.

Create a free account and deploy your first application in minutes. For compliance requirements or questions about HIPAA-ready deployments, reach out to our team.
