HIPAA Compliant AWS Architecture for Medical Device Software

By Emil Tejeda

Signing the AWS Business Associate Addendum takes about three clicks in the console. That's where most "HIPAA on AWS" guides stop. For ContourCompanion, a Class II AI SaMD processing DICOM CT data to generate radiation therapy contours, the BAA was the starting line. The actual work was building an architecture where every service that touches Protected Health Information is HIPAA-eligible, every data path is encrypted, every access is logged, and every infrastructure decision maps to a specific regulatory requirement.

The FDA has not issued standalone guidance on cloud computing for medical devices. Cloud architecture is addressed indirectly through the September 2023 cybersecurity guidance, the OTS software guidance, and AAMI/CR510:2021 (Appropriate Use of Public Cloud Computing for Quality Systems and Medical Devices). That last one is the closest thing to an industry standard for cloud-hosted medical devices, and FDA's cybersecurity guidance references it directly. The regulatory picture is assembled from pieces, not read from a single document.

This post covers how we assembled those pieces into a HIPAA compliant AWS architecture for a production SaMD.

AWS Multi-Account Strategy with Control Tower

A single AWS account for everything is how teams get into trouble. One overprivileged IAM user, one misconfigured resource policy, and the blast radius is the entire workload: production data, audit logs, infrastructure pipelines, billing. For a medical device, that blast radius includes PHI and FDA submission evidence.

AWS Control Tower provisions a multi-account landing zone with guardrails baked in from day one. We stood up the landing zone with five accounts, each with a distinct security boundary:

| Account | Purpose | Key Controls |
|---|---|---|
| Management | Organizations root, Control Tower, billing | No workloads, no PHI. SCPs applied here propagate to all child accounts |
| Production | ECS workloads (EC2 GPU), RDS, S3 DICOM staging | BAA signed, HIPAA-eligible services only for PHI paths |
| Staging | Pre-production environment, integration testing | Mirrors production config, synthetic data only (no PHI) |
| Log Archive | CloudTrail logs, VPC Flow Logs, Config snapshots | S3 Object Lock Compliance Mode, no human write access |
| Audit | AWS Config aggregator, Security Hub, Audit Manager | Read-only cross-account access to all accounts |

Control Tower deploys Service Control Policies (SCPs) across the organization. SCPs are permission boundaries that no IAM policy can override. We added custom SCPs on top of the Control Tower defaults:

  • Deny non-HIPAA-eligible services in the production account. If an engineer tries to create a Lightsail instance, or call any other service outside the allowlist, in the production account, the API call fails at the organization level. The IAM policy is irrelevant.
  • Deny CloudTrail modifications in all accounts. Nobody, including account administrators, can stop the trail, delete log files, or disable integrity validation.
  • Deny S3 public access organization-wide. The S3 Block Public Access setting is enforced by SCP, not just by bucket policy.
  • Restrict regions to us-east-2 (primary) and us-west-2 (DR). No resources can be created in other regions. This prevents shadow infrastructure and simplifies compliance scope.
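As a rough illustration of the region and CloudTrail guardrails above, here is a minimal sketch that builds an SCP document as a plain Python dict. The statement IDs, the list of exempted global services, and the function name are assumptions made for this example, not the policy we deployed. A real exemption list has to be derived from the services the organization actually uses.

```python
import json

ALLOWED_REGIONS = ["us-east-2", "us-west-2"]

# Global services whose API calls are not tied to a permitted region.
# Illustrative only; a production list needs review against real usage.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "route53:*", "sts:*", "support:*"]


def build_guardrail_scp(allowed_regions):
    """Return an SCP document with region and CloudTrail guardrails."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny every regional API call outside the approved regions.
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            },
            {
                # Deny stopping, deleting, or reconfiguring any trail.
                "Sid": "DenyCloudTrailTampering",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                    "cloudtrail:PutEventSelectors",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_guardrail_scp(ALLOWED_REGIONS), indent=2))
```

Because both statements are explicit denies, no IAM policy attached inside a member account can grant the blocked actions back.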

The separation between the production account and the log archive account is the most important boundary. If the production account is fully compromised, the attacker cannot reach the audit trail. The log archive account has its own root credentials, its own IAM boundaries, and SCPs that deny all write operations except from the CloudTrail service principal. The only humans with access to the log archive are auditors operating through Identity Center with read-only permission sets.

IAM Identity Center (SSO)

There are no IAM users anywhere in the organization. Zero. Every human authenticates through IAM Identity Center (formerly AWS SSO), which provides centralized authentication with MFA enforcement, temporary credentials for every session, and permission sets that map to specific accounts and roles.

Permission sets replace the traditional model of long-lived IAM access keys. When an engineer needs to access the production account, they authenticate through Identity Center, select the account and permission set, and receive temporary credentials that expire after one hour. No access keys stored in ~/.aws/credentials. No shared accounts. Every API call in CloudTrail traces back to a named individual, not a generic service account.

We defined four permission sets:

| Permission Set | Scope | Access Level |
|---|---|---|
| Developer | Staging account | Read/write to ECS, ECR, S3, CloudWatch. No RDS admin, no IAM changes |
| Operator | Production account | Read-only. Can view logs, metrics, task status. Cannot modify resources |
| BreakGlass | Production account | Time-limited admin. Requires MFA re-challenge. Triggers instant alarm |
| Auditor | Log Archive + Audit accounts | Read-only. CloudTrail queries, Config rules, Security Hub findings |

The break-glass permission set deserves its own section (below). The key property of Identity Center is that it eliminates the class of credentials that cause the majority of AWS security incidents: long-lived access keys committed to Git, shared between team members, or left active after an employee departs.

For 21 CFR Part 11 §11.10(d), Identity Center's audit trail of every authentication event, combined with CloudTrail's record of every API call, provides the "limiting system access to authorized individuals" evidence that FDA reviewers expect. The mapping is direct: named person authenticates, receives scoped temporary credentials, performs actions logged with their identity.

The AWS Shared Responsibility Model for Medical Devices

The AWS Shared Responsibility Model divides security obligations between AWS and the customer. AWS secures the cloud itself: physical data centers, network infrastructure, hypervisor, host operating system. The customer secures everything in the cloud: application code, IAM configuration, encryption settings, network rules, and all compliance controls specific to the workload.

For medical devices, this model has a regulatory layer on top. The FDA holds the device manufacturer responsible for the safety and security of the entire system, including the cloud infrastructure. AWS doesn't appear in your 510(k) submission as a responsible party. You do. If a misconfigured security group exposes PHI, the FDA views that as a design defect in your device, not an AWS problem. The GxP Systems on AWS whitepaper frames it as "security and quality OF the cloud" (AWS) versus "security and quality IN the cloud" (customer).

What shifts when the workload is a medical device:

| Layer | AWS Responsibility | Your Responsibility |
|---|---|---|
| Physical infrastructure | Data center security, hardware, networking | None |
| Platform services (RDS, ECS) | OS patching, platform availability | Data encryption, access control, backup config |
| Compute (EC2, Fargate) | Hypervisor, Firecracker isolation | Application security, container image hardening |
| Application | Nothing | Everything: code, DICOM handling, inference, PHI controls |
| Compliance evidence | SOC 2, ISO 27001, HITRUST via Artifact | FDA submission, 21 CFR Part 11, risk management, IQ/OQ |

The compliance evidence row is where teams miscalculate. AWS provides SOC 2 Type II reports, ISO 27001 certificates, and HITRUST CSF certifications through AWS Artifact, all free to download. Those documents prove AWS meets its half of the shared responsibility model. They don't prove your application is compliant. Your FDA submission needs to show how you configured those services, not just that AWS passed its audit. We used the Artifact reports as supplier qualification evidence for the ISO 14971 risk management file, feeding them into the risk assessment of cloud infrastructure as a SOUP item under IEC 62304.

HIPAA Eligible Services: What the BAA Actually Covers

AWS offers 166+ HIPAA-eligible services. The BAA, executed through AWS Artifact, establishes AWS as a Business Associate under HIPAA. But the BAA only covers services on the eligible list. You can use any AWS service in a HIPAA-designated account, but you may only process, store, or transmit PHI through HIPAA-eligible services. The SCP we described above enforces this at the organization level, making it impossible to accidentally provision a non-eligible service in the production account.
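A service allowlist of this kind is usually written as a single Deny statement with NotAction. The sketch below is an assumption-laden illustration: the prefix list is hypothetical and incomplete, and `is_denied` only approximates how one Deny/NotAction statement would treat an action. It is not a substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist of service prefixes for the production account.
# The authoritative source is AWS's published HIPAA-eligible services list.
ALLOWED_SERVICE_PREFIXES = [
    "ecs", "ecr", "ec2", "rds", "s3", "elasticloadbalancing", "wafv2",
    "route53", "kms", "cloudtrail", "cloudwatch", "logs",
    "secretsmanager", "ssm", "iam", "sts",
]


def build_allowlist_scp(prefixes):
    """Deny every action that does not belong to an allowlisted service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyServicesOutsideAllowlist",
                "Effect": "Deny",
                "NotAction": [f"{prefix}:*" for prefix in prefixes],
                "Resource": "*",
            }
        ],
    }


def is_denied(action, scp):
    """Approximate the effect of the single Deny/NotAction statement."""
    patterns = scp["Statement"][0]["NotAction"]
    # IAM action names are case-insensitive, so compare in lowercase.
    return not any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


if __name__ == "__main__":
    scp = build_allowlist_scp(ALLOWED_SERVICE_PREFIXES)
    for action in ("lightsail:CreateInstances", "s3:PutObject"):
        print(action, "denied" if is_denied(action, scp) else "allowed by SCP")
```

An SCP never grants access on its own. An action that passes the allowlist still needs an IAM policy that permits it.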

For ContourCompanion, the core services handling PHI:

| Service | PHI Role | Why This One |
|---|---|---|
| ECS on EC2 (GPU) | Runs inference containers on G4dn/G5 instances | GPU acceleration for DICOM segmentation, coordinator-managed scaling |
| ECS on Fargate | Web UI, upload service, distributor | Firecracker micro-VM isolation, zero OS patching |
| ECR | Stores container images | Immutable tags, image scanning on push |
| RDS (PostgreSQL) | Patient metadata, OAR configurations | KMS encryption, Multi-AZ, automated backups |
| S3 | DICOM staging, audit log archive | SSE-KMS, Object Lock, cross-region replication |
| ALB | TLS termination for inbound DICOM | ACM-managed certificates, TLS 1.2+ enforced |
| WAF | Application-layer protection | OWASP managed rules, rate limiting, geo-restriction |
| Route 53 | DNS with health checks | Failover routing to DR region |
| KMS | Encryption key management | Customer-managed keys for all PHI stores |
| CloudTrail | API audit logging | SHA-256 integrity, org-wide trail to log archive account |
| CloudWatch | Operational metrics and alarms | Break-glass alerts, no PHI in metrics |
| Secrets Manager | Database credentials, API keys | KMS encryption mandatory, auto-rotation |
| Systems Manager | Break-glass session access | Keystroke logging to log archive, no SSH |

[Figure: ContourCompanion HIPAA compliant AWS architecture diagram, showing the multi-account landing zone with Control Tower, Identity Center, production VPC with ECS on EC2 GPU instances and Fargate, RDS Multi-AZ, WAF, Route 53 failover, break-glass SSM access, cross-region DR, and a separate log archive account with immutable CloudTrail storage.]

The compute layer is split by workload type. The web UI, upload service, and distributor run on ECS Fargate. Fargate uses Firecracker micro-VMs with dedicated CPU, memory, and network per task. No shared kernel, no OS patches, no host management. For services that handle request routing and file staging, Fargate's simplicity is the right trade-off.

Inference is a different story. Autocontouring CT volumes requires GPU acceleration, and Fargate does not offer GPU-backed tasks, so inference runs on ECS with EC2 capacity. Leaving GPU instances running around the clock is brutal on cost, so we built a coordinator service that manages an EC2 Auto Scaling group of G4dn/G5 GPU instances. The coordinator watches the inference queue depth and historical load patterns (radiation oncology departments have predictable scheduling: weekday mornings are peak, nights and weekends are near-zero). It scales the GPU fleet up before the morning rush and back down to a minimum capacity overnight. This cut GPU compute costs by roughly 60% compared to keeping instances running 24/7.

The coordinator runs on Fargate itself (it's a lightweight service, no GPU needed) and uses ECS capacity providers to manage the EC2 fleet. When a new DICOM study arrives, the coordinator checks available GPU capacity, spins up additional instances if the queue is backing up, and places the inference task on the next available GPU host. The EC2 instances run Amazon ECS-optimized AMIs with the NVIDIA GPU driver baked in, are hardened against CIS benchmarks, and use encrypted EBS volumes. OS patching on these instances is automated through SSM Patch Manager on a weekly cadence, with patching windows aligned to low-traffic periods.
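The scaling decision described above can be sketched as a pure function that combines queue depth with a schedule-based floor. Every number here (studies per instance, capacity bounds, the weekday 06:00 to 12:00 pre-warm window) is a hypothetical placeholder, and the function name is invented for this example. The real coordinator's logic and thresholds are not shown in this post.

```python
import math
from datetime import datetime


def desired_gpu_instances(
    queue_depth,
    now,
    studies_per_instance=4,   # assumed throughput per GPU host
    min_capacity=2,           # assumed overnight/weekend floor
    peak_floor=6,             # assumed floor held during weekday mornings
    max_capacity=12,          # assumed fleet ceiling
):
    """Return the instance count the Auto Scaling group should target."""
    # Demand-driven component: enough hosts to drain the current queue.
    demand = math.ceil(queue_depth / studies_per_instance)

    # Schedule-driven component: pre-warm ahead of the weekday morning peak.
    is_weekday = now.weekday() < 5
    in_prewarm_window = is_weekday and 6 <= now.hour < 12
    floor = peak_floor if in_prewarm_window else min_capacity

    return max(floor, min(demand, max_capacity))


if __name__ == "__main__":
    saturday_night = datetime(2024, 1, 6, 3, 0)
    monday_morning = datetime(2024, 1, 8, 7, 0)
    print(desired_gpu_instances(0, saturday_night))
    print(desired_gpu_instances(0, monday_morning))
    print(desired_gpu_instances(100, monday_morning))
```

Keeping the decision in a side-effect-free function makes it easy to unit test against historical queue data before it is wired to the Auto Scaling API.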

EKS would have given us more scheduling flexibility for the GPU workloads, but added Kubernetes operational overhead that a startup team doesn't need. ECS capacity providers gave us the same bin-packing and scaling behavior with a fraction of the configuration surface. The regulatory isolation properties were equivalent. The time saved on Kubernetes operations went directly to IEC 62304 compliance work.

Encryption Architecture for PHI

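The HIPAA Security Rule lists encryption of PHI at rest and in transit as addressable implementation specifications: you either implement them or document an equivalent alternative, and for a cloud-hosted device there is no defensible alternative. The rule doesn't specify encryption algorithms, but the FDA's cybersecurity guidance expects current standards. We used AES-256 for data at rest and TLS 1.2+ for data in transit.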

At rest

Every service storing data got a customer-managed KMS key. Not AWS-managed keys, not default encryption. Customer-managed keys give you control over key rotation policy, key deletion scheduling, and CloudTrail logging of every key usage event. When an auditor asks "who accessed the encryption key protecting patient data on this date," the answer is a CloudTrail query, not a shrug.

The encryption configuration:

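  • RDS PostgreSQL: Encrypted at instance creation with a dedicated KMS key. You cannot enable encryption in place on an existing unencrypted RDS instance; the only route is to snapshot it, copy the snapshot with encryption, and restore from the copy. Get this right on day one.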
  • S3 buckets: SSE-KMS with bucket policies that deny any PutObject request missing server-side encryption headers. Belt and suspenders.
  • EBS volumes: Account-wide default encryption enabled. Every volume, every region, no exceptions.
  • Secrets Manager: KMS encryption is mandatory; you can't turn it off. Credentials auto-rotate on a schedule. Database connection strings never appear in environment variables or config files.
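For the S3 item in the list above, the deny-on-missing-header rule is typically a bucket policy statement like the one this sketch generates. The bucket name is a placeholder and the helper names are invented for this example. `would_deny_put` is only a local approximation of how the condition evaluates, since a negated string operator also matches when the header is absent.

```python
import json


def build_sse_kms_bucket_policy(bucket_name):
    """Bucket policy that rejects uploads not requesting SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutSseKms",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    # True when the header is missing or is anything but aws:kms.
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


def would_deny_put(request_headers):
    """Local approximation of the condition for a single PutObject request."""
    return request_headers.get("x-amz-server-side-encryption") != "aws:kms"


if __name__ == "__main__":
    # "example-dicom-staging" is a hypothetical bucket name.
    print(json.dumps(build_sse_kms_bucket_policy("example-dicom-staging"), indent=2))
```

One consequence of this policy is that clients relying on the bucket's default encryption without sending the header are rejected too, so upload code has to set the header explicitly.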

In transit

TLS 1.2 minimum on every network path that carries PHI. The ALB terminates TLS with ACM-managed certificates. Internal service-to-service communication within the VPC uses TLS as well. VPC endpoints provide private connectivity to AWS services (S3, KMS, ECR) without traversing the public internet.

One detail that catches teams: DICOM communication. The DICOM standard predates TLS. Traditional DICOM uses unencrypted TCP on port 104. ContourCompanion's DICOM ingestion pipeline wrapped all DICOM traffic in TLS-encrypted channels. Clinical sites connecting to ContourCompanion never sent unencrypted PHI over the network.
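On the client side, wrapping a DICOM association in TLS comes down to opening the TCP connection through a TLS context with a pinned minimum version. This is a minimal sketch using only the Python standard library. The function names are invented for this example, the host is whatever endpoint a site is given, and port 2762 is the IANA-registered DICOM TLS port rather than a value confirmed by this post. The DICOM protocol exchange itself is out of scope.

```python
import socket
import ssl

DICOM_TLS_PORT = 2762  # IANA-registered port for DICOM over TLS (assumed here)


def build_dicom_tls_context(ca_file=None):
    """Create a client TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_dicom_tls_channel(host, port=DICOM_TLS_PORT, ca_file=None):
    """Open a TLS-wrapped TCP socket to carry a DICOM association."""
    context = build_dicom_tls_context(ca_file)
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # Certificate and hostname verification are on by default.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

The default context already verifies the server certificate and hostname, so the only hardening added here is the explicit version floor.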

Network Architecture: Isolating PHI Data Paths

The VPC design followed a three-tier pattern across two Availability Zones in us-east-2, mapping subnet boundaries to data sensitivity.

Why us-east-2, not us-east-1

us-east-1 (N. Virginia) has the broadest service catalog but also the highest incident frequency of any AWS region. It's the oldest, the largest, and the most likely to experience capacity pressure during major events. For a medical device where uptime directly affects clinical workflow, the marginal service availability advantage of us-east-1 doesn't justify the elevated risk. us-east-2 (Ohio) has every HIPAA-eligible service ContourCompanion requires and a significantly cleaner operational track record. Friends don't let friends deploy medical devices in us-east-1.

Three-tier subnet isolation

The public subnets hold only the ALB and NAT gateways (one per AZ for redundancy). No compute instances, no databases, no PHI storage. AWS WAF sits in front of the ALB with OWASP managed rule groups, rate limiting, and geo-restriction to block traffic from regions where ContourCompanion has no clinical users. The ALB accepts inbound HTTPS, terminates TLS, and forwards to private subnets.

The private subnets run both ECS Fargate tasks (web UI, upload service, distributor, coordinator) and ECS EC2 GPU instances (inference). No internet gateway route. Outbound internet access (for pulling container images, NVD vulnerability checks, SSM Patch Manager) routes through the NAT gateway. Security groups restrict traffic to the minimum required ports between services. Both Fargate tasks and EC2 instances spread across both AZs, so losing an AZ doesn't lose compute capacity.

The isolated subnets contain only the RDS PostgreSQL cluster. No NAT gateway route, no internet path in either direction. Database traffic flows exclusively from the application tier's security group on the PostgreSQL port. RDS runs in Multi-AZ configuration: a synchronous standby replica in the second AZ with automatic failover. If the primary AZ goes down, RDS promotes the standby with no data loss and no manual intervention.
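The three tiers across two AZs amount to six subnets with three different default routes. Here is a small sketch of that layout using Python's `ipaddress` module. The VPC CIDR, the /20 subnet size, and the AZ names are assumptions for illustration, not ContourCompanion's actual addressing plan.

```python
import ipaddress

# Where each tier's default route points. None means no internet path at all.
TIER_DEFAULT_ROUTE = {
    "public": "internet gateway",
    "private": "NAT gateway",
    "isolated": None,
}


def plan_subnets(vpc_cidr, azs, tiers=("public", "private", "isolated"), prefix=20):
    """Carve one subnet per (tier, AZ) pair out of the VPC CIDR, in order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=prefix)
    return {(tier, az): next(blocks) for tier in tiers for az in azs}


if __name__ == "__main__":
    # 10.0.0.0/16 and the AZ names are hypothetical values.
    plan = plan_subnets("10.0.0.0/16", ["us-east-2a", "us-east-2b"])
    for (tier, az), cidr in plan.items():
        print(f"{tier:9} {az}  {cidr}  default route: {TIER_DEFAULT_ROUTE[tier]}")
```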

VPC endpoints and private connectivity

VPC endpoints eliminated internet traversal for all AWS API calls. S3, KMS, ECR, CloudWatch, Secrets Manager, Systems Manager, and CloudTrail all connect through interface or gateway endpoints within the VPC. PHI never leaves the AWS network backbone. This also removes the NAT gateway as a single point of failure for AWS service access, since VPC endpoints route through the AWS private network independently.

Route 53 and edge protection

Route 53 health checks monitor the ALB endpoint. If the primary region becomes unhealthy, failover routing directs traffic to a static maintenance page (or, in a full active-passive setup, to the DR region's ALB). Health checks run every 10 seconds with a failure threshold of 3, so Route 53 marks the endpoint unhealthy roughly 30 seconds (3 × 10 s) after a regional issue begins. Clients follow once their cached DNS record's TTL expires.
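The failover timing is simple arithmetic, but writing it down keeps the RTO budget honest. In this sketch the 10-second interval and threshold of 3 come from the configuration above, while the 60-second DNS TTL is an assumed value that is not stated anywhere in this post.

```python
def failover_detection_seconds(interval_s=10, failure_threshold=3):
    """Time for consecutive failed checks to mark the endpoint unhealthy."""
    return interval_s * failure_threshold


def worst_case_client_cutover_seconds(interval_s=10, failure_threshold=3, dns_ttl_s=60):
    """Detection time plus the longest a resolver may keep serving the old record."""
    return failover_detection_seconds(interval_s, failure_threshold) + dns_ttl_s


if __name__ == "__main__":
    print(failover_detection_seconds())         # 30
    print(worst_case_client_cutover_seconds())  # 90 with the assumed 60 s TTL
```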

Break-Glass Access via Systems Manager

SSH is disabled on all infrastructure. There are no bastion EC2 instances with key pairs. Shell access goes exclusively through AWS Systems Manager Session Manager, accessed via Identity Center's break-glass permission set.

The break-glass flow:

  1. Engineer authenticates through Identity Center with MFA. Selects the BreakGlass permission set for the production account.
  2. Identity Center issues temporary credentials (60-minute maximum session).
  3. The AssumeRole event fires a CloudWatch alarm via EventBridge. The engineering team is notified via SNS within seconds, before the operator's first command.
  4. The operator starts an SSM session to the target EC2 GPU instance or uses ECS exec for Fargate tasks. No inbound security group rules needed because SSM connects outbound through the VPC endpoint.
  5. Every keystroke is logged to the log archive account's S3 bucket with the same encryption and Object Lock immutability as the CloudTrail audit trail.
  6. After 60 minutes, the session and credentials expire automatically. No manual cleanup.
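Step 3 of the flow above depends on recognizing the break-glass role assumption in the CloudTrail event stream. The sketch below shows one way that filter could look inside an alerting function. The event shape, the set of event names, and the helper names are assumptions. The one grounded detail is that roles provisioned by Identity Center carry an `AWSReservedSSO_<PermissionSetName>_` prefix, so check the exact fields against a real event from your own trail before relying on this.

```python
# Identity Center provisions roles named AWSReservedSSO_<PermissionSetName>_<suffix>.
BREAK_GLASS_ROLE_MARKER = "AWSReservedSSO_BreakGlass_"

# Which STS call appears depends on the federation path; both are matched here.
ASSUME_EVENT_NAMES = {"AssumeRole", "AssumeRoleWithSAML"}


def is_break_glass_assumption(event):
    """Return True if a CloudTrail-sourced event is a break-glass role assumption."""
    detail = event.get("detail", {})
    if detail.get("eventSource") != "sts.amazonaws.com":
        return False
    if detail.get("eventName") not in ASSUME_EVENT_NAMES:
        return False
    role_arn = (detail.get("requestParameters") or {}).get("roleArn", "")
    return BREAK_GLASS_ROLE_MARKER in role_arn


def build_alert(event):
    """Summarize the event for the notification sent to the engineering team."""
    detail = event["detail"]
    return {
        "subject": "Break-glass access to production",
        "who": detail.get("userIdentity", {}).get("principalId", "unknown"),
        "when": detail.get("eventTime"),
        "role": detail["requestParameters"]["roleArn"],
    }
```

Filtering on the role name rather than on a user list means the alarm fires for anyone who obtains the permission set, including assignments made after the rule was written.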

The break-glass permission set grants read-only access by default. If the operator needs to modify data or restart a service, they must assume a separate write-scoped role with a second MFA challenge, logged as a distinct CloudTrail event. This separation means an operator can investigate a production issue without the ability to alter data, and the audit trail distinguishes investigation from intervention.

For 21 CFR Part 11, the SSM session log maps directly to §11.10(d) (limiting system access to authorized individuals) and §11.10(e) (secure, computer-generated, time-stamped audit trails). The session transcript, combined with the Identity Center authentication event and CloudTrail API log, forms a complete chain: who authenticated, what role they assumed, what commands they ran, and when the session ended.

Resilience and Disaster Recovery

A medical device that's offline disrupts clinical workflow. Radiation oncology departments plan treatment schedules days in advance. If the autocontouring service is unavailable when a physicist needs it, the fallback is manual contouring, which adds hours per patient. The architecture has to account for AZ failures, regional degradation, and data durability across failure scenarios.

Multi-AZ by default

Every tier, stateful or not, runs across two Availability Zones in us-east-2:

  • RDS PostgreSQL: Multi-AZ with synchronous replication. Automatic failover, zero data loss, ~60 second promotion time.
  • ECS Fargate (web/services): Tasks distributed across AZs by the ECS scheduler. Minimum healthy percentage set to 100%, so capacity is maintained during rolling deployments.
  • ECS EC2 GPU (inference): Auto Scaling group spans both AZs. The coordinator maintains minimum capacity in each AZ, so a single AZ failure doesn't eliminate inference capacity.
  • NAT Gateways: One per AZ. If an AZ loses its NAT gateway, the other AZ's tasks continue reaching external endpoints.
  • ALB: Cross-zone load balancing enabled. A single unhealthy AZ doesn't affect request routing to healthy targets.

Cross-region backup to us-west-2

The DR region (us-west-2) holds two things: an S3 cross-region replica and a promotable RDS read replica.

S3 cross-region replication (CRR) copies every DICOM object and audit log to us-west-2 within 15 minutes. The replica bucket has its own KMS key (KMS keys are regional) and its own Object Lock configuration. If the primary region suffers a sustained outage, the data exists independently in us-west-2.

The RDS cross-region read replica uses asynchronous replication. Under normal operation, replica lag is under 1 second. In a regional disaster, the read replica can be promoted to a standalone primary. This is a manual operation because automatic cross-region failover for RDS requires Aurora Global Database, which was architectural overkill for ContourCompanion's write volume. The RTO for cross-region failover is measured in minutes, not hours (promote replica, update Route 53, verify connectivity).

Recovery objectives

| Scenario | RTO | RPO |
|---|---|---|
| Single AZ failure | ~60 seconds (automatic) | Zero |
| Primary region degradation | ~15 minutes (manual promotion) | < 1 minute (async replication lag) |
| Data corruption | Point-in-time recovery within 35-day window | < 5 minutes |

These targets were documented in the risk management file as design inputs and verified during infrastructure qualification testing.

Audit Trails: 21 CFR Part 11 on AWS

21 CFR Part 11 requires secure, computer-generated, time-stamped audit trails that record the identity of the operator, the action taken, and when it happened. For a cloud-hosted medical device, that means logging every infrastructure change, every data access, and every application event that affects electronic records.

CloudTrail is the foundation. We configured it as an organization-wide multi-region trail with log file integrity validation enabled. Integrity validation uses SHA-256 hashing and SHA-256 with RSA digital signing, making it computationally infeasible to modify, delete, or forge log entries without detection.
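To see why chained hashes make tampering detectable, consider this deliberately simplified model. It is not CloudTrail's digest file format: real digest files are also RSA-signed and carry more metadata. The sketch, with function names invented for this example, only shows the chaining idea, where each digest covers the log file's SHA-256 hash and the previous digest.

```python
import hashlib


def digest_entry(log_bytes, previous_digest_hex):
    """Hash one log file and chain it to the previous digest."""
    log_hash = hashlib.sha256(log_bytes).hexdigest()
    chained = hashlib.sha256((previous_digest_hex + log_hash).encode()).hexdigest()
    return {"log_hash": log_hash, "previous": previous_digest_hex, "digest": chained}


def build_chain(log_files):
    """Produce one digest entry per log file, each linked to its predecessor."""
    chain, previous = [], ""
    for data in log_files:
        entry = digest_entry(data, previous)
        chain.append(entry)
        previous = entry["digest"]
    return chain


def verify_chain(log_files, chain):
    """Recompute every digest; any edit, deletion, or reordering breaks the match."""
    if len(log_files) != len(chain):
        return False
    previous = ""
    for data, entry in zip(log_files, chain):
        if digest_entry(data, previous) != entry:
            return False
        previous = entry["digest"]
    return True
```

Because each digest depends on the one before it, deleting or altering one log file invalidates every later entry. That is what lets a validator detect a gap as well as an edit.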

Logs deliver to the log archive account, the dedicated account in the Control Tower landing zone with no human write access. SCPs prevent anyone in any account from stopping the trail or modifying its delivery configuration. The log bucket uses S3 Object Lock in Compliance Mode, which prevents anyone, including the root account of the log archive, from deleting log files before the retention period expires. The only way to remove a compliance-locked object early is to delete the entire AWS account. That's the level of immutability regulators expect.

The audit account runs AWS Security Hub with the HIPAA and CIS Benchmarks enabled, aggregating findings across all accounts. AWS Config rules in each account feed into a central Config aggregator in the audit account, providing a single-pane view of compliance posture across the organization.

The 21 CFR Part 11 mapping:

| Part 11 Requirement | AWS Implementation |
|---|---|
| §11.10(a) Validation | AWS Config conformance pack for Part 11; IQ/OQ scripts against deployed infrastructure |
| §11.10(d) Access control | Identity Center with MFA, permission sets scoped per account, break-glass role with SSM logging |
| §11.10(e) Audit trails | Org-wide CloudTrail to log archive account, S3 Object Lock, integrity validation |
| §11.10(g) Authority checks | SCPs at organization level, IAM condition keys, permission set boundaries |

AWS Audit Manager provides a prebuilt 21 CFR Part 11 framework that maps Config rules to Part 11 requirements. We used it to generate continuous compliance evidence rather than running periodic manual audits. The compliance posture was always current, not a snapshot from three months ago.

Infrastructure Qualification: Proving the Cloud Works

The 510(k) submission doesn't require you to validate AWS itself. AWS is a SOUP item under IEC 62304, and you manage it through supplier qualification (SOC 2 reports, ISO certificates) plus your own risk assessment. What the FDA does expect is evidence that your specific configuration of AWS services meets your design requirements.

We structured this as an Installation Qualification (IQ) script that verified the deployed infrastructure against the architecture specification. The script checked:

  1. Control Tower landing zone active with all five accounts
  2. SCPs enforcing region restriction, HIPAA-eligible service allowlist, and CloudTrail protection
  3. Identity Center configured with expected permission sets, no IAM users in any account
  4. Encryption enabled on every data store with customer-managed KMS keys
  5. Security groups matching the approved configuration (no 0.0.0.0/0 rules, port-level restrictions)
  6. RDS Multi-AZ active with automated backups and cross-region replica connected
  7. CloudTrail active with integrity validation in all regions, delivering to log archive account
  8. S3 cross-region replication active with Object Lock on both source and replica
  9. WAF rules attached to the ALB with expected managed rule groups
  10. VPC endpoints operational for all required services
  11. Break-glass permission set configured with correct session duration and MFA requirement

This ran after every infrastructure deployment through CDK, producing a timestamped verification record.
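The shape of such an IQ run is a comparison of observed configuration against the specification, recorded with a timestamp. This is a minimal sketch with invented check names and values. In practice the `observed` dict would be populated from AWS API calls, which are left out here so the example stays self-contained.

```python
import json
from datetime import datetime, timezone

# Hypothetical slice of the architecture specification.
EXPECTED = {
    "account_count": 5,
    "iam_user_count": 0,
    "rds_multi_az": True,
    "cloudtrail_integrity_validation": True,
    "break_glass_session_minutes": 60,
}


def run_iq(expected, observed):
    """Compare observed infrastructure state to the spec and build a record."""
    results = []
    for check_id, want in expected.items():
        got = observed.get(check_id)
        results.append(
            {"check": check_id, "expected": want, "observed": got, "passed": got == want}
        )
    return {
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "passed": all(r["passed"] for r in results),
        "results": results,
    }


if __name__ == "__main__":
    # Simulated observation with one deliberate deviation.
    observed = dict(EXPECTED, rds_multi_az=False)
    print(json.dumps(run_iq(EXPECTED, observed), indent=2))
```

A check missing from `observed` is reported as a failure rather than skipped, which is the safer default for qualification evidence.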

The verification evidence fed directly into the risk management traceability chain: infrastructure risk controls traced to verification tests, which traced to IQ results. When FDA reviewers asked "how do you know your cloud infrastructure meets your security requirements," the answer was an automated test suite with continuous evidence, not a manual checklist someone ran once before submission.

Infrastructure as Code made this possible. Because the entire infrastructure was defined in CDK with cdk-nag HIPAA packs catching compliance violations at synthesis time, the gap between "what we designed" and "what's deployed" was always zero. The CDK code was the architecture document. The deployed infrastructure was its verified output. Post 6 in this series covers Infrastructure as Code for regulated environments in detail.

Frequently Asked Questions

Is AWS HIPAA compliant?

AWS itself is not "HIPAA compliant" in any certification sense. HIPAA has no certification program. AWS signs a Business Associate Addendum (BAA) and provides 166+ HIPAA-eligible services that can be used to process PHI. The customer is responsible for configuring those services correctly, implementing access controls, enabling encryption, and maintaining audit trails. AWS provides the tools. Compliance is your responsibility.

What is a BAA with AWS?

A Business Associate Addendum (BAA) is a contract between a covered entity (or another business associate) and AWS that establishes AWS's obligations for handling PHI under HIPAA. You execute it through AWS Artifact in the Management Console. The BAA only covers HIPAA-eligible services. Using a non-eligible service to process PHI is a HIPAA violation regardless of whether you've signed the BAA.

Which AWS services are HIPAA eligible?

AWS maintains an official list of HIPAA-eligible services that includes 166+ services as of 2025. Key services for medical device software: ECS, EKS, Fargate, ECR, S3, RDS, DynamoDB, Lambda, KMS, CloudTrail, CloudWatch, SageMaker, HealthImaging, and HealthLake. The list updates frequently, so check it before making architecture decisions.

Do I need HIPAA compliance for a SaMD?

If your SaMD processes, stores, or transmits Protected Health Information (PHI), such as patient imaging data, demographics, or clinical data, then yes. Most SaMD that integrates with clinical systems handles PHI. Even if you de-identify data before processing, you need HIPAA controls on the ingestion pipeline that receives identified data. ContourCompanion processes DICOM CT data containing patient identifiers, so HIPAA applied to the entire data path from ingestion through inference to result delivery.
