HIPAA-Ready Inference: Model Deploy as Your Compliance Foundation

Your compliance team said no to OpenAI. They said no to Anthropic. They said no to every managed inference API that routes patient data through someone else's infrastructure. And they were right to say no. When protected health information leaves your environment, you inherit the compliance posture of every vendor in the chain. For healthcare organizations building clinical decision support tools, document processing systems, or risk stratification models, that chain of custody problem is not a minor inconvenience. It is a deal-breaker.

But here is the thing: the models powering those managed APIs are often open-weight models you can run yourself. Llama, Mistral, and dozens of specialized medical and document processing models are available for self-hosted deployment. The question is not whether you can avoid third-party APIs. The question is whether you can deploy these models on infrastructure you control, with the audit trail your auditors actually want, without hiring a dedicated MLOps team to make it happen.

That is the problem Convox's Model Deploy solves. It gives you a guided path from model selection to production deployment, running entirely on your own AWS account, with the release tracking, access controls, and audit logging that HIPAA technical safeguards require.

The Data Sovereignty Problem with Managed Inference

When you call a managed inference API, your data leaves your environment. The prompt containing patient information, the clinical note being summarized, the medical image being analyzed: all of it travels to infrastructure you do not control, gets processed by systems you cannot audit, and passes through logging pipelines you cannot inspect.

For healthcare organizations subject to HIPAA, this creates several concrete problems:

Business Associate Agreements are necessary but not sufficient. A BAA from an AI vendor covers what happens if they mishandle your data. It does not give you visibility into how your data is actually processed, who has access to their systems, or whether their subprocessors meet your security requirements.

Audit trails stop at the API boundary. When your auditor asks "show me everywhere this patient's data was accessed," you can show them your application logs. But the processing that happened inside the vendor's inference system? That is a black box. You cannot produce logs you do not have.

Data residency requirements may not align. If your organization operates under state-specific data residency rules or federal requirements for specific AWS regions, managed inference APIs may not offer the geographic controls you need.

Vendor dependency creates compliance risk. When a vendor changes their terms of service, deprecates an API, or gets acquired, your compliance posture changes with them. Self-hosted infrastructure means your compliance controls stay under your control.

The alternative is not to abandon AI capabilities. The alternative is to deploy the same models on infrastructure you own, where data never leaves your VPC and where every interaction is logged in systems you control.

BYOC Architecture: The Foundation for Compliant ML Inference

Convox operates on a Bring Your Own Cloud model. When you install a Convox Rack, you are deploying a Kubernetes-based platform into your own AWS account. The infrastructure lives in your VPC. The container registry lives in your ECR. The logs flow to your CloudWatch. Convox provides the orchestration layer, but the compute, storage, and networking all belong to you.

For HIPAA-compliant ML inference, this architecture provides several advantages that managed services cannot match:

Protected health information never leaves your AWS account. When you deploy a model through Convox, the inference requests and responses stay within your VPC. No PHI traverses the public internet to reach a third-party API. No prompts are stored in vendor logging systems you cannot inspect.

You control the network boundary. Deploy models as Private services, and they are accessible only within your Rack network. No internet-facing endpoints, no public load balancers, no external attack surface. Internal applications can reach the model; nothing else can.

AWS compliance inheritance applies. Your Rack runs on AWS infrastructure that is itself HIPAA-eligible. When you sign a BAA with AWS, your Convox-managed workloads are covered under that agreement. You are not adding a third-party processor to your compliance scope.

GovCloud regions are fully supported. For federal healthcare workloads or organizations with FedRAMP requirements, Convox Racks can be installed in AWS GovCloud regions. The same deployment workflow, the same Model Deploy wizard, the same audit capabilities, running in government-certified infrastructure.

This is the fundamental difference between managed inference and BYOC deployment. With managed services, you trust the vendor to handle your data correctly. With BYOC, you verify compliance through controls you operate directly.

Model Deploy: From Selection to Production in Minutes

The Model Deploy wizard in the Convox Console provides a guided path for deploying inference models. It handles the operational complexity of GPU node provisioning, container image management, and service configuration while creating standard Convox primitives that integrate with your existing compliance workflows.

When you deploy a model through the wizard, here is what actually happens:

GPU readiness verification. Before showing you the model catalog, the wizard checks that your Rack has the required GPU infrastructure. If you need to enable the NVIDIA device plugin or configure Karpenter for GPU node provisioning, the wizard provides the exact commands.

Template or custom model selection. The catalog includes pre-configured templates for popular models across LLM serving, speech recognition, image generation, and embedding categories. For models not in the catalog, you can paste any HuggingFace model ID and the wizard auto-detects the appropriate serving framework, GPU requirements, and configuration.

App creation and image import. The wizard creates a standard Convox App and imports the container image into your Rack's private ECR registry. No external image pulls at runtime, no dependency on public registries for production workloads.

Release creation with full audit trail. Every deployment creates a tracked Release with associated build logs, environment configuration, and deployment timestamps. When your auditor asks "what was running on this date," you can answer with a specific release ID and its complete configuration.

The output is not a special ML deployment type. It is a regular Convox App with services, environment variables, and the same management commands you use for any other application. The wizard generates a convox.yml manifest like this:

services:
  inference:
    image: vllm/vllm-openai:latest
    port: 8000
    internal: true
    scale:
      cpu: 4000
      memory: 32768
      gpu:
        count: 1
        vendor: nvidia
    environment:
      - MODEL_NAME
      - HUGGING_FACE_HUB_TOKEN
    health:
      path: /health
      interval: 30
      timeout: 10

Note the internal: true setting. This is how you deploy models as Private services that are not exposed to the internet.

Private Access: Models That Never Touch the Internet

For healthcare organizations handling PHI, the Private access option is often the only acceptable deployment model. When you configure a model as Private through Model Deploy, the service is accessible only within your Rack's internal network.

Here is what Private access means in practice:

No internet-facing load balancer. The service gets an internal hostname that resolves only within your VPC. External traffic cannot reach the model because there is no public endpoint to reach.

Service-to-service communication within the Rack. Your application services can call the model using the internal service discovery hostname. For a service named inference in an app named clinical-ai, the internal endpoint would be https://inference.clinical-ai.{rack}.local. See Service Discovery for the full hostname format.

Console Playground access without external exposure. The Model Deploy wizard includes a built-in Playground for testing deployed models. When you access the Playground through the Convox Console, requests are proxied through the Console's authenticated session. The model itself remains internal. You can test inference without exposing any endpoint to the internet.

Local development access via convox proxy. Developers who need to test against the deployed model can use convox proxy to tunnel traffic from their local machine through the Rack. The connection is authenticated through the developer's Convox credentials and encrypted in transit.

$ convox proxy inference:8000 -a clinical-ai
Proxying localhost:8000 to inference:8000

This gives you a secure development workflow where PHI never leaves the AWS account, but developers can still iterate on model integration from their local environments.

Mapping to HIPAA Technical Safeguards

HIPAA's Security Rule requires covered entities to implement technical safeguards protecting ePHI. Convox's architecture provides the mechanisms to meet these requirements for ML inference workloads. Here is how the pieces map:

Access Controls

The Security Rule requires unique user identification and procedures for obtaining ePHI access. Convox provides multiple layers of access control:

Console RBAC. The Convox Console supports role-based access control with predefined roles (Administrator, Operator, Developer) and custom role creation. You can restrict who can deploy models, view logs, or access the Playground.

API key authentication for Public models. If you do deploy a model with internet access, Model Deploy supports API key authentication. Requests without a valid key are rejected at the inference endpoint.

Network-level isolation for Private models. The simplest access control is having no public endpoint. Private models cannot be accessed from outside your VPC regardless of credentials.

Audit Controls

The Security Rule requires mechanisms recording and examining activity in systems containing ePHI. Convox provides comprehensive audit logging:

Release history. Every deployment creates a Release with a unique ID, timestamp, build reference, and environment configuration. You can reconstruct exactly what was running at any point in time.

Application logs. All application output is captured and available through convox logs or forwarded to your syslog endpoint. Inference requests, responses, and errors are logged through your application's standard logging.

Console audit log. Administrative actions in the Convox Console are logged with timestamps and user attribution. When someone promotes a release or changes environment variables, that action is recorded.

Transmission Security

The Security Rule requires guarding against unauthorized access to ePHI transmitted over networks. Convox handles encryption in transit automatically:

TLS everywhere. All service endpoints use HTTPS. Convox provisions certificates through Let's Encrypt for public endpoints and self-signed certificates for internal services. Traffic between your application and the inference model is encrypted in transit.

Internal service mesh encryption. Traffic between services within the Rack uses encrypted connections. PHI moving from your application to the inference endpoint is protected even within your VPC.

Integrity Controls

The Security Rule requires policies protecting ePHI from improper alteration or destruction. Convox's release-based deployment model supports integrity:

Immutable releases. Once a release is created, its configuration is fixed. You cannot modify a deployed release; you promote a new one. This creates an immutable audit trail of every configuration that ever ran.

Rollback capability. If a deployment introduces issues, you can roll back to a previous known-good release. The rollback creates a new release referencing the previous configuration, maintaining the audit trail.

Compliance Readiness Checklist

Before deploying ML inference for healthcare workloads, walk through this checklist to ensure your environment meets compliance requirements:

Requirement	How to Verify	Convox Feature
AWS BAA in place	Confirm your AWS account has an active Business Associate Agreement	BYOC architecture
VPC isolation	Rack deployed in dedicated VPC with appropriate security groups	Rack install parameters
Private deployment	`internal: true` in convox.yml	Model Deploy wizard
Encryption at rest	EBS volumes and ECR images use AWS-managed encryption	AWS default encryption
Encryption in transit	All endpoints use HTTPS, internal services use TLS	Automatic TLS provisioning
Access logging	Application logs forwarded to your SIEM or CloudWatch	Syslog integration
User authentication	Console access requires authentication, RBAC configured	Console RBAC
Release tracking	Every deployment creates immutable release record	Release primitives
GovCloud (if required)	Rack installed in AWS GovCloud region	GovCloud support

Starting a Compliant Model Deploy Pilot

If your organization is ready to explore private ML infrastructure for healthcare workloads, here is a practical path to a pilot deployment:

1. Deploy a Rack in your AWS account. If you do not have a Convox Rack yet, the Getting Started Guide walks through installation. For HIPAA workloads, ensure your AWS account has a BAA in place before deploying production workloads.

2. Configure GPU infrastructure. Model Deploy requires GPU nodes for inference. The wizard guides you through enabling the NVIDIA device plugin and configuring node provisioning through Karpenter or managed node groups. See Workload Placement for GPU configuration options.

3. Deploy a test model as Private. Use the Model Deploy wizard to deploy a model from the catalog with Private access. Start with something simple like a text embedding model before moving to larger LLMs.

4. Validate internal access. Use the Console Playground to test the model. Verify that convox proxy works for local development access. Confirm the model is not reachable from outside your VPC.

5. Integrate with a test application. Deploy a simple application that calls the inference model using the internal service discovery hostname. Verify end-to-end functionality without exposing PHI.

6. Document for your compliance team. Walk through the checklist above with your compliance officer. Show them the release history, the access controls, and the network isolation. Give them the audit trail they need to approve production use.

Get Started

Private ML infrastructure is not a future capability you need to wait for. The models are available today. The deployment tooling exists today. The compliance architecture that keeps PHI in your environment while giving you the audit trail your auditors require is ready now.

Create a free Convox account and deploy your first Rack. The Model Deploy documentation walks through the wizard step by step. For organizations with existing compliance requirements or questions about GovCloud deployments, contact our team to discuss your architecture.

Your compliance team said no to external AI APIs for good reasons. Give them a yes they can approve: the same models, running on infrastructure you control, with the audit trail that makes compliance straightforward instead of a constant negotiation.