Your compliance team said no to OpenAI. They said no to Anthropic. They said no to every managed inference API that routes patient data through someone else's infrastructure. And they were right to say no. When protected health information leaves your environment, you inherit the compliance posture of every vendor in the chain. For healthcare organizations building clinical decision support tools, document processing systems, or risk stratification models, that chain of custody problem is not a minor inconvenience. It is a deal-breaker.
But here is the thing: the models powering those managed APIs are often open-weight models you can run yourself. Llama, Mistral, and dozens of specialized medical and document processing models are available for self-hosted deployment. The question is not whether you can avoid third-party APIs. The question is whether you can deploy these models on infrastructure you control, with the audit trail your auditors actually want, without hiring a dedicated MLOps team to make it happen.
That is the problem Convox's Model Deploy solves. It gives you a guided path from model selection to production deployment, running entirely on your own AWS account, with the release tracking, access controls, and audit logging that HIPAA technical safeguards require.
When you call a managed inference API, your data leaves your environment. The prompt containing patient information, the clinical note being summarized, the medical image being analyzed: all of it travels to infrastructure you do not control, gets processed by systems you cannot audit, and passes through logging pipelines you cannot inspect.
For healthcare organizations subject to HIPAA, this creates several concrete problems:
Business Associate Agreements are necessary but not sufficient. A BAA from an AI vendor covers what happens if they mishandle your data. It does not give you visibility into how your data is actually processed, who has access to their systems, or whether their subprocessors meet your security requirements.
Audit trails stop at the API boundary. When your auditor asks "show me everywhere this patient's data was accessed," you can show them your application logs. But the processing that happened inside the vendor's inference system? That is a black box. You cannot produce logs you do not have.
Data residency requirements may not align. If your organization operates under state-specific data residency rules or federal requirements for specific AWS regions, managed inference APIs may not offer the geographic controls you need.
Vendor dependency creates compliance risk. When a vendor changes their terms of service, deprecates an API, or gets acquired, your compliance posture changes with them. Self-hosted infrastructure means your compliance controls stay under your control.
The alternative is not to abandon AI capabilities. The alternative is to deploy the same models on infrastructure you own, where data never leaves your VPC and where every interaction is logged in systems you control.
Convox operates on a Bring Your Own Cloud model. When you install a Convox Rack, you are deploying a Kubernetes-based platform into your own AWS account. The infrastructure lives in your VPC. The container registry lives in your ECR. The logs flow to your CloudWatch. Convox provides the orchestration layer, but the compute, storage, and networking all belong to you.
For HIPAA-compliant ML inference, this architecture provides several advantages that managed services cannot match:
Protected health information never leaves your AWS account. When you deploy a model through Convox, the inference requests and responses stay within your VPC. No PHI traverses the public internet to reach a third-party API. No prompts are stored in vendor logging systems you cannot inspect.
You control the network boundary. Deploy models as Private services, and they are accessible only within your Rack network. No internet-facing endpoints, no public load balancers, no external attack surface. Internal applications can reach the model; nothing else can.
AWS compliance inheritance applies. Your Rack runs on AWS infrastructure that is itself HIPAA-eligible. When you sign a BAA with AWS, your Convox-managed workloads are covered under that agreement. You are not adding a third-party processor to your compliance scope.
GovCloud regions are fully supported. For federal healthcare workloads or organizations with FedRAMP requirements, Convox Racks can be installed in AWS GovCloud regions. The same deployment workflow, the same Model Deploy wizard, the same audit capabilities, running in government-certified infrastructure.
This is the fundamental difference between managed inference and BYOC deployment. With managed services, you trust the vendor to handle your data correctly. With BYOC, you verify compliance through controls you operate directly.
The Model Deploy wizard in the Convox Console provides a guided path for deploying inference models. It handles the operational complexity of GPU node provisioning, container image management, and service configuration while creating standard Convox primitives that integrate with your existing compliance workflows.
When you deploy a model through the wizard, here is what actually happens:
GPU readiness verification. Before showing you the model catalog, the wizard checks that your Rack has the required GPU infrastructure. If you need to enable the NVIDIA device plugin or configure Karpenter for GPU node provisioning, the wizard provides the exact commands.
Template or custom model selection. The catalog includes pre-configured templates for popular models across LLM serving, speech recognition, image generation, and embedding categories. For models not in the catalog, you can paste any HuggingFace model ID and the wizard auto-detects the appropriate serving framework, GPU requirements, and configuration.
App creation and image import. The wizard creates a standard Convox App and imports the container image into your Rack's private ECR registry. No external image pulls at runtime, no dependency on public registries for production workloads.
Release creation with full audit trail. Every deployment creates a tracked Release with associated build logs, environment configuration, and deployment timestamps. When your auditor asks "what was running on this date," you can answer with a specific release ID and its complete configuration.
The output is not a special ML deployment type. It is a regular Convox App with services, environment variables, and the same management commands you use for any other application. The wizard generates a convox.yml manifest like this:
services:
inference:
image: vllm/vllm-openai:latest
port: 8000
internal: true
scale:
cpu: 4000
memory: 32768
gpu:
count: 1
vendor: nvidia
environment:
- MODEL_NAME
- HUGGING_FACE_HUB_TOKEN
health:
path: /health
interval: 30
timeout: 10
Note the internal: true setting. This is how you deploy models as Private services that are not exposed to the internet.
For healthcare organizations handling PHI, the Private access option is often the only acceptable deployment model. When you configure a model as Private through Model Deploy, the service is accessible only within your Rack's internal network.
Here is what Private access means in practice:
No internet-facing load balancer. The service gets an internal hostname that resolves only within your VPC. External traffic cannot reach the model because there is no public endpoint to reach.
Service-to-service communication within the Rack. Your application services can call the model using the internal service discovery hostname. For a service named inference in an app named clinical-ai, the internal endpoint would be https://inference.clinical-ai.{rack}.local. See Service Discovery for the full hostname format.
Console Playground access without external exposure. The Model Deploy wizard includes a built-in Playground for testing deployed models. When you access the Playground through the Convox Console, requests are proxied through the Console's authenticated session. The model itself remains internal. You can test inference without exposing any endpoint to the internet.
Local development access via convox proxy. Developers who need to test against the deployed model can use convox proxy to tunnel traffic from their local machine through the Rack. The connection is authenticated through the developer's Convox credentials and encrypted in transit.
$ convox proxy inference:8000 -a clinical-ai
Proxying localhost:8000 to inference:8000
This gives you a secure development workflow where PHI never leaves the AWS account, but developers can still iterate on model integration from their local environments.
HIPAA's Security Rule requires covered entities to implement technical safeguards protecting ePHI. Convox's architecture provides the mechanisms to meet these requirements for ML inference workloads. Here is how the pieces map:
The Security Rule requires unique user identification and procedures for obtaining ePHI access. Convox provides multiple layers of access control:
Console RBAC. The Convox Console supports role-based access control with predefined roles (Administrator, Operator, Developer) and custom role creation. You can restrict who can deploy models, view logs, or access the Playground.
API key authentication for Public models. If you do deploy a model with internet access, Model Deploy supports API key authentication. Requests without a valid key are rejected at the inference endpoint.
Network-level isolation for Private models. The simplest access control is having no public endpoint. Private models cannot be accessed from outside your VPC regardless of credentials.
The Security Rule requires mechanisms recording and examining activity in systems containing ePHI. Convox provides comprehensive audit logging:
Release history. Every deployment creates a Release with a unique ID, timestamp, build reference, and environment configuration. You can reconstruct exactly what was running at any point in time.
Application logs. All application output is captured and available through convox logs or forwarded to your syslog endpoint. Inference requests, responses, and errors are logged through your application's standard logging.
Console audit log. Administrative actions in the Convox Console are logged with timestamps and user attribution. When someone promotes a release or changes environment variables, that action is recorded.
The Security Rule requires guarding against unauthorized access to ePHI transmitted over networks. Convox handles encryption in transit automatically:
TLS everywhere. All service endpoints use HTTPS. Convox provisions certificates through Let's Encrypt for public endpoints and self-signed certificates for internal services. Traffic between your application and the inference model is encrypted in transit.
Internal service mesh encryption. Traffic between services within the Rack uses encrypted connections. PHI moving from your application to the inference endpoint is protected even within your VPC.
The Security Rule requires policies protecting ePHI from improper alteration or destruction. Convox's release-based deployment model supports integrity:
Immutable releases. Once a release is created, its configuration is fixed. You cannot modify a deployed release; you promote a new one. This creates an immutable audit trail of every configuration that ever ran.
Rollback capability. If a deployment introduces issues, you can roll back to a previous known-good release. The rollback creates a new release referencing the previous configuration, maintaining the audit trail.
Before deploying ML inference for healthcare workloads, walk through this checklist to ensure your environment meets compliance requirements:
| Requirement | How to Verify | Convox Feature |
|---|---|---|
| AWS BAA in place | Confirm your AWS account has an active Business Associate Agreement | BYOC architecture |
| VPC isolation | Rack deployed in dedicated VPC with appropriate security groups | Rack install parameters |
| Private deployment | internal: true in convox.yml |
Model Deploy wizard |
| Encryption at rest | EBS volumes and ECR images use AWS-managed encryption | AWS default encryption |
| Encryption in transit | All endpoints use HTTPS, internal services use TLS | Automatic TLS provisioning |
| Access logging | Application logs forwarded to your SIEM or CloudWatch | Syslog integration |
| User authentication | Console access requires authentication, RBAC configured | Console RBAC |
| Release tracking | Every deployment creates immutable release record | Release primitives |
| GovCloud (if required) | Rack installed in AWS GovCloud region | GovCloud support |
If your organization is ready to explore private ML infrastructure for healthcare workloads, here is a practical path to a pilot deployment:
1. Deploy a Rack in your AWS account. If you do not have a Convox Rack yet, the Getting Started Guide walks through installation. For HIPAA workloads, ensure your AWS account has a BAA in place before deploying production workloads.
2. Configure GPU infrastructure. Model Deploy requires GPU nodes for inference. The wizard guides you through enabling the NVIDIA device plugin and configuring node provisioning through Karpenter or managed node groups. See Workload Placement for GPU configuration options.
3. Deploy a test model as Private. Use the Model Deploy wizard to deploy a model from the catalog with Private access. Start with something simple like a text embedding model before moving to larger LLMs.
4. Validate internal access. Use the Console Playground to test the model. Verify that convox proxy works for local development access. Confirm the model is not reachable from outside your VPC.
5. Integrate with a test application. Deploy a simple application that calls the inference model using the internal service discovery hostname. Verify end-to-end functionality without exposing PHI.
6. Document for your compliance team. Walk through the checklist above with your compliance officer. Show them the release history, the access controls, and the network isolation. Give them the audit trail they need to approve production use.
Private ML infrastructure is not a future capability you need to wait for. The models are available today. The deployment tooling exists today. The compliance architecture that keeps PHI in your environment while giving you the audit trail your auditors require is ready now.
Create a free Convox account and deploy your first Rack. The Model Deploy documentation walks through the wizard step by step. For organizations with existing compliance requirements or questions about GovCloud deployments, contact our team to discuss your architecture.
Your compliance team said no to external AI APIs for good reasons. Give them a yes they can approve: the same models, running on infrastructure you control, with the audit trail that makes compliance straightforward instead of a constant negotiation.