
Building a HIPAA-Ready Health Data Pipeline with FastAPI, PostgreSQL, and Vault

Build a secure, HIPAA-ready health data backend with FastAPI, PostgreSQL, and HashiCorp Vault. This tutorial covers application-level encryption, audit logging, and role-based access control (RBAC) for handling Protected Health Information (PHI).

2025-12-10

Handling Protected Health Information (PHI) is a massive responsibility. A single misstep in security can lead to heavy fines, reputational damage, and a breach of patient trust. For developers building health-tech applications, adhering to the Health Insurance Portability and Accountability Act (HIPAA) isn't just a feature—it's the foundation.

In this tutorial, we will build a HIPAA-ready data pipeline using a modern, powerful tech stack: FastAPI for our high-performance API, PostgreSQL for our reliable database, and HashiCorp Vault for gold-standard secrets management and encryption.

We'll move beyond theoretical discussions and write practical, production-oriented code. Our focus is not on providing medical advice or diagnoses, but on constructing a compliant and secure architectural backbone for a health data application. We will specifically tackle the three technical safeguard pillars of HIPAA: encryption, audit logging, and access control.

Prerequisites

  • Python 3.8+ and an understanding of FastAPI.
  • Docker and Docker Compose installed.
  • Basic knowledge of PostgreSQL and SQL.
  • HashiCorp Vault installed locally or accessible.

Understanding the Problem: The HIPAA Technical Safeguards

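HIPAA's Security Rule outlines technical safeguards required to protect electronic PHI (ePHI). These are obligations rather than suggestions, although some implementation specifications, encryption among them, are classed as "addressable": you either implement them or document why an equivalent measure is reasonable. Our architecture will directly address: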

  1. Access Control: We must ensure that users and systems only have access to the minimum necessary information to perform their functions. We'll implement a Role-Based Access Control (RBAC) system.
  2. Audit Controls: We need mechanisms to record and examine activity in systems that contain or use ePHI. Our FastAPI application will log every significant event.
  3. Integrity: PHI must not be improperly altered or destroyed.
  4. Transmission Security: PHI must be encrypted when it is transmitted over a network.
  5. Encryption: PHI should be encrypted when stored ("at rest").

Our chosen stack maps well onto these demands. FastAPI's dependency injection is a natural fit for RBAC, PostgreSQL supports TLS connections and auditing through extensions such as pgaudit, and Vault provides a centralized, secure way to manage encryption keys and sensitive data.

Initial Setup

We'll use Docker Compose to spin up our PostgreSQL and Vault development environment.

Create a docker-compose.yml file:

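```yaml
# docker-compose.yml
version: '3.8'

services:
  db:
    image: postgres:14-alpine
    restart: always
    environment:
      # Development-only credentials; use a secrets mechanism in production
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=supersecretpassword
      - POSTGRES_DB=health_db
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data/

  vault:
    # The legacy "vault" image on Docker Hub is deprecated; HashiCorp publishes hashicorp/vault
    image: hashicorp/vault:latest
    ports:
      - "8200:8200"
    environment:
      # Dev mode: in-memory storage, auto-unsealed, plain HTTP. Never use in production.
      VAULT_DEV_ROOT_TOKEN_ID: "dev-root-token"
      VAULT_DEV_LISTEN_ADDRESS: "0.0.0.0:8200"
    cap_add:
      - IPC_LOCK

volumes:
  postgres_data:
```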

Run docker-compose up -d to start the services.
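
Before moving on, it can help to confirm that both containers are accepting connections. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper name, and the ports are the ones published in the Compose file. A successful TCP connection only shows that the port is open, not that the service has finished initialising.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # service not ready yet; retry shortly
    return False


# Hypothetical usage after `docker-compose up -d`:
#   wait_for_port("127.0.0.1", 5432)   # PostgreSQL
#   wait_for_port("127.0.0.1", 8200)   # Vault
```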

Now, let's set up our FastAPI application.

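```bash
mkdir hipaa_pipeline && cd hipaa_pipeline
python -m venv venv
source venv/bin/activate
pip install fastapi uvicorn "sqlalchemy[asyncio]" asyncpg pydantic "python-jose[cryptography]" "passlib[bcrypt]" hvac
# The modules below use relative imports, so src must be a package
mkdir src && touch src/__init__.py
```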

Step 1: Centralized Encryption with HashiCorp Vault

A common mistake is storing encryption keys within the application's configuration or code. This is a significant security risk. We'll use Vault's transit secrets engine to provide "Encryption as a Service," ensuring our application never handles raw encryption keys.

What we're doing

We will enable and configure Vault's transit engine to create an encryption key. Our FastAPI application will then send data to Vault to be encrypted before it's stored in PostgreSQL, and send the ciphertext back to be decrypted when it's needed.

Implementation

First, let's configure Vault. Make sure your Vault server is running from Docker Compose. The Compose file starts Vault in dev mode, which keeps all data in memory and serves plain HTTP, so treat it as a local sandbox only.

  1. Set Vault Environment Variables:

    ```bash
    export VAULT_ADDR='http://127.0.0.1:8200'
    export VAULT_TOKEN='dev-root-token'
    ```

  2. Enable the Transit Engine:

    ```bash
    vault secrets enable transit
    ```

    Expected Output: Success! Enabled the transit secrets engine at: transit/

  3. Create an Encryption Key:

    ```bash
    vault write -f transit/keys/phi-key
    ```

    Expected Output: Success! Data written to: transit/keys/phi-key
Now, let's create a service in our FastAPI app to communicate with Vault.

```python
# src/vault_service.py
import base64
import os

import hvac

VAULT_ADDR = os.getenv("VAULT_ADDR", "http://127.0.0.1:8200")
# The dev root token is for local experiments only; production should use a
# short-lived token tied to a narrowly scoped policy.
VAULT_TOKEN = os.getenv("VAULT_TOKEN", "dev-root-token")

client = hvac.Client(url=VAULT_ADDR, token=VAULT_TOKEN)

ENCRYPTION_KEY = "phi-key"

def encrypt_data(plain_text: str) -> str:
    """Encrypts data using Vault's transit engine."""
    if not plain_text:
        return plain_text

    # The transit engine expects base64-encoded plaintext
    encoded_text = base64.b64encode(plain_text.encode("utf-8")).decode("ascii")

    response = client.secrets.transit.encrypt_data(
        name=ENCRYPTION_KEY,
        plaintext=encoded_text,
    )
    return response["data"]["ciphertext"]

def decrypt_data(cipher_text: str) -> str:
    """Decrypts data using Vault's transit engine."""
    if not cipher_text:
        return cipher_text

    response = client.secrets.transit.decrypt_data(
        name=ENCRYPTION_KEY,
        ciphertext=cipher_text,
    )
    # Vault returns base64; decode as UTF-8 so non-ASCII text round-trips
    decoded_text = base64.b64decode(response["data"]["plaintext"]).decode("utf-8")
    return decoded_text
```

How it works

  • We use the hvac library to connect to our Vault instance.
  • The encrypt_data function takes plaintext, base64-encodes it (a requirement for the transit engine), and sends it to the /transit/encrypt/phi-key endpoint in Vault.
  • Vault encrypts the data using the phi-key without ever exposing the key itself. It returns a ciphertext string.
  • decrypt_data does the reverse. It sends the ciphertext, Vault returns the plaintext in base64 form, and the function decodes it back into a string.
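
The base64 step is easy to get subtly wrong, for example by decoding the result as ASCII and then failing on a name such as "Zoë". A minimal sketch of the encoding contract follows. The helper names `to_transit_plaintext` and `from_transit_plaintext` are hypothetical and not part of hvac, and UTF-8 is assumed as the text encoding. No Vault server is involved.

```python
import base64


def to_transit_plaintext(text: str) -> str:
    """Encode text as the base64 string the transit engine expects as `plaintext`."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_transit_plaintext(b64_text: str) -> str:
    """Decode the base64 `plaintext` field of a transit decrypt response."""
    return base64.b64decode(b64_text).decode("utf-8")


# Round trip with a non-ASCII name
encoded = to_transit_plaintext("Zoë Müller")
print(from_transit_plaintext(encoded))  # Zoë Müller
```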

This architecture is powerful because key rotation, management, and access policies are all handled centrally in Vault, reducing the security burden on our application.
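
Key rotation is a good example of that central handling. The sketch below assumes an authenticated hvac client like the one in `vault_service.py` and relies on hvac's `rewrap_data` transit method. The function names `rewrap_ciphertexts` and `key_version_of`, and the idea of passing in a list of stored values, are illustrative assumptions. Rewrapping upgrades a ciphertext to the newest key version inside Vault, so the plaintext never reaches the application.

```python
from typing import Any, List


def rewrap_ciphertexts(client: Any, key_name: str, ciphertexts: List[str]) -> List[str]:
    """Re-encrypt stored ciphertexts under the latest version of `key_name`.

    `client` is expected to be an authenticated hvac.Client (or anything
    exposing the same `secrets.transit` interface).
    """
    rewrapped = []
    for ciphertext in ciphertexts:
        response = client.secrets.transit.rewrap_data(
            name=key_name,
            ciphertext=ciphertext,
        )
        rewrapped.append(response["data"]["ciphertext"])
    return rewrapped


def key_version_of(ciphertext: str) -> int:
    """Read the key version from a transit ciphertext such as 'vault:v2:...'."""
    prefix, version, _payload = ciphertext.split(":", 2)
    if prefix != "vault" or not version.startswith("v"):
        raise ValueError("not a Vault transit ciphertext")
    return int(version[1:])


# Hypothetical usage against a running server:
#   client.secrets.transit.rotate_key(name="phi-key")
#   new_values = rewrap_ciphertexts(client, "phi-key", old_values)
```

Comparing `key_version_of` before and after a rewrap is one way to confirm that stored rows have moved to the new key version.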

Step 2: Tiered Access Control (RBAC) in FastAPI

To comply with the principle of least privilege, we need to ensure users can only access data relevant to their role. A doctor should see patient records, a billing clerk should see financial data, and a researcher should only see anonymized data.

What we're doing

We'll define user roles and create a dependency in FastAPI that checks the user's role and their permissions before allowing access to an endpoint.

Implementation

First, let's define our roles and a Pydantic model for our user.

```python
# src/schemas.py
from enum import Enum
from typing import Optional

from pydantic import BaseModel

class UserRole(str, Enum):
    DOCTOR = "doctor"
    NURSE = "nurse"
    ADMIN = "admin"
    RESEARCHER = "researcher"

class User(BaseModel):
    username: str
    role: UserRole

# Dummy user database for demonstration
DUMMY_USERS = {
    "doctor_davis": User(username="doctor_davis", role=UserRole.DOCTOR),
    "nurse_nancy": User(username="nurse_nancy", role=UserRole.NURSE),
    "admin_arnold": User(username="admin_arnold", role=UserRole.ADMIN),
}

# A simple function to simulate getting the current user from a token
def get_current_user_from_token(token: str) -> Optional[User]:
    # In a real app, you would decode and verify a JWT here.
    # Assumed demo format: a token such as "doctor_davis_token" maps to the
    # username formed by its first two underscore-separated parts.
    parts = token.split("_")
    if len(parts) < 2:
        return None
    username = "_".join(parts[:2])
    return DUMMY_USERS.get(username)
```

Now, we create the RBAC dependency.

```python
# src/security.py
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from typing import List
from .schemas import User, UserRole, get_current_user_from_token

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

class RoleChecker:
    def __init__(self, allowed_roles: List[UserRole]):
        self.allowed_roles = allowed_roles

    def __call__(self, token: str = Depends(oauth2_scheme)):
        user = get_current_user_from_token(token)
        if not user:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="User not found",
            )
        if user.role not in self.allowed_roles:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail="Operation not permitted for this role",
            )
        return user

# Define role-based dependencies
allow_doctors_and_admins = RoleChecker([UserRole.DOCTOR, UserRole.ADMIN])
allow_all_clinical_staff = RoleChecker([UserRole.DOCTOR, UserRole.NURSE, UserRole.ADMIN])
allow_admins_only = RoleChecker([UserRole.ADMIN])
```

How it works

The RoleChecker class is an elegant use of FastAPI's dependency injection.

  1. We initialize it with a list of roles that are permitted to access a specific endpoint.
  2. When used in an endpoint (Depends(allow_admins_only)), FastAPI executes the __call__ method.
  3. It fetches the current user (here, simulated from a dummy token).
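  4. If no user can be resolved from the token, it raises an HTTPException that FastAPI turns into a 401 Unauthorized response. If the user's role is not in the allowed_roles list, it raises one that becomes a 403 Forbidden response. Otherwise it returns the user to the endpoint.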
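
A flat list of roles per endpoint works for a few routes. If roles such as the billing clerk and researcher mentioned at the start of this step are added, one option is to map each role to named permissions and check the permission instead. The sketch below is framework-free and purely illustrative: the `Permission` names, the `BILLING` role, and the `is_allowed` helper are assumptions, not part of the tutorial's code.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(str, Enum):
    DOCTOR = "doctor"
    NURSE = "nurse"
    BILLING = "billing"        # hypothetical role, not in UserRole
    RESEARCHER = "researcher"


class Permission(str, Enum):
    READ_CLINICAL = "read_clinical"
    WRITE_CLINICAL = "write_clinical"
    READ_BILLING = "read_billing"
    READ_DEIDENTIFIED = "read_deidentified"


# Each role gets only what it needs (least privilege).
ROLE_PERMISSIONS: Dict[Role, FrozenSet[Permission]] = {
    Role.DOCTOR: frozenset({Permission.READ_CLINICAL, Permission.WRITE_CLINICAL}),
    Role.NURSE: frozenset({Permission.READ_CLINICAL}),
    Role.BILLING: frozenset({Permission.READ_BILLING}),
    Role.RESEARCHER: frozenset({Permission.READ_DEIDENTIFIED}),
}


def is_allowed(role: Role, permission: Permission) -> bool:
    """Deny by default: unknown roles or unlisted permissions return False."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


print(is_allowed(Role.DOCTOR, Permission.READ_CLINICAL))      # True
print(is_allowed(Role.RESEARCHER, Permission.READ_CLINICAL))  # False
```

A dependency in the style of RoleChecker could then call `is_allowed(user.role, ...)` rather than testing list membership, which keeps the policy in one table.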

Step 3: Creating a Comprehensive Audit Log

HIPAA's audit controls standard requires mechanisms that "record and examine activity in information systems that contain or use electronic protected health information." In practice, this means we need a detailed, tamper-resistant record of who did what, and when.

What we're doing

We will create a decorator that automatically logs critical information for every request that accesses or modifies PHI. The log will include the user, the action, the resource affected, and a timestamp.

Implementation

Let's create an audit logging service and a decorator.

```python
# src/audit_log.py
import logging
from functools import wraps

from fastapi import HTTPException

from .schemas import User

# Configure a specific logger for audit trails
audit_logger = logging.getLogger("audit")
audit_logger.setLevel(logging.INFO)
# In production, this handler would write to a secure, separate file or logging service
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - AUDIT - %(message)s')
handler.setFormatter(formatter)
audit_logger.addHandler(handler)

def log_event(user: User, action: str, resource_id: str, status: str):
    """Formats and logs an audit event."""
    audit_logger.info(
        f"User='{user.username}' Role='{user.role.value}' Action='{action}' "
        f"ResourceID='{resource_id}' Status='{status}'"
    )

def audit(action: str):
    """
    Decorator to wrap endpoint functions and log an audit trail event.

    FastAPI calls endpoints with keyword arguments, so the user and the
    resource ID are read from kwargs.
    """
    def wrapper(func):
        @wraps(func)
        async def decorator(*args, **kwargs):
            # The user comes from a dependency that has already been resolved
            user = kwargs.get("current_user")
            resource_id = str(kwargs.get("patient_id", "n/a"))
            if user is None:
                # This should not happen if endpoints are secured
                audit_logger.info(
                    f"User='anonymous' Role='unknown' Action='{action}' "
                    f"ResourceID='{resource_id}' Status='failed_auth'"
                )
                raise HTTPException(status_code=401, detail="Not authenticated")

            try:
                result = await func(*args, **kwargs)
            except HTTPException as exc:
                # Expected failures such as 404 are logged with their status code
                log_event(user, action, resource_id, f"failure: {exc.status_code}")
                raise
            except Exception as exc:
                # Log only the exception type; messages may contain PHI
                log_event(user, action, resource_id, f"error: {type(exc).__name__}")
                raise
            log_event(user, action, resource_id, "success")
            return result
        return decorator
    return wrapper
```

How it works

  • We create a dedicated audit_logger to separate audit logs from regular application logs. In a production environment, these logs should be sent to a secure, tamper-resistant system like AWS CloudWatch or a dedicated SIEM.
  • The audit decorator takes an action string (e.g., "VIEW_PATIENT_RECORD").
  • It wraps the endpoint function, extracts the user and resource information, executes the function, and then logs the outcome (success, failure, or error).
  • This approach keeps our endpoint logic clean while ensuring that auditing is consistently applied.
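
Free-text log lines are hard to query, and a changed line is hard to spot. One possible refinement, sketched here with the standard library only, is to emit each event as JSON and link entries by hash so that later edits to earlier entries become detectable. The field names, the `build_audit_record` and `verify_chain` helpers, and the SHA-256 chaining are assumptions for illustration. This is not a format HIPAA prescribes, and it complements rather than replaces a write-once log store.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: Dict[str, str]) -> str:
    """Hash the record's fields (excluding its own hash) in a stable key order."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def build_audit_record(username: str, role: str, action: str, resource_id: str,
                       status: str, prev_hash: str = GENESIS_HASH,
                       timestamp: Optional[str] = None) -> Dict[str, str]:
    """Create one audit record linked to the previous record's hash."""
    record = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "user": username,
        "role": role,
        "action": action,
        "resource_id": resource_id,
        "status": status,
        "prev_hash": prev_hash,
    }
    record["hash"] = _digest(record)
    return record


def verify_chain(records: List[Dict[str, str]]) -> bool:
    """Return True if every record's hash is intact and links to its predecessor."""
    prev_hash = GENESIS_HASH
    for record in records:
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(record):
            return False
        prev_hash = record["hash"]
    return True


first = build_audit_record("doctor_davis", "doctor", "VIEW_PATIENT_RECORD", "abc-123", "success")
second = build_audit_record("nurse_nancy", "nurse", "VIEW_PATIENT_RECORD", "abc-123", "success",
                            prev_hash=first["hash"])
print(json.dumps(second))
```

A chain like this does not detect records dropped from the end of the log, so the most recent hash would still need to be anchored somewhere the application cannot rewrite.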

Putting It All Together: The FastAPI Application

Now we'll integrate encryption, access control, and auditing into our FastAPI endpoints for managing patient data.

Database and Models (SQLAlchemy)

```python
# src/database.py
import os

from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker, declarative_base

# Development default; set DATABASE_URL in the environment for real deployments
DATABASE_URL = os.getenv(
    "DATABASE_URL",
    "postgresql+asyncpg://admin:supersecretpassword@localhost/health_db",
)

# echo=True logs every SQL statement; disable it outside development
engine = create_async_engine(DATABASE_URL, echo=True)
async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
Base = declarative_base()

# src/models.py
from sqlalchemy import Column, Integer, String
from .database import Base

class Patient(Base):
    __tablename__ = 'patients'
    id = Column(Integer, primary_key=True, index=True)
    patient_ref_id = Column(String, unique=True, index=True)
    # These fields will store encrypted data
    name = Column(String)
    ssn = Column(String)
    address = Column(String)
```

The Main Application

```python
# main.py
import uuid

import uvicorn
from fastapi import FastAPI, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.future import select
from sqlalchemy.exc import IntegrityError

from src.database import async_session, engine, Base
from src.models import Patient
from src.schemas import User
from src.vault_service import encrypt_data, decrypt_data
from src.security import allow_all_clinical_staff
from src.audit_log import audit

app = FastAPI()

class PatientCreate(BaseModel):
    """Request body for new patients, so PHI travels in the body rather than the URL."""
    name: str
    ssn: str
    address: str

@app.on_event("startup")
async def startup():
    # Create tables on startup (fine for a demo; use migrations in production)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

@app.post("/patients/")
@audit(action="CREATE_PATIENT_RECORD")
async def create_patient(
    payload: PatientCreate,
    current_user: User = Depends(allow_all_clinical_staff)
):
    patient_ref_id = str(uuid.uuid4())

    # Encrypt PHI before storing it
    new_patient = Patient(
        patient_ref_id=patient_ref_id,
        name=encrypt_data(payload.name),
        ssn=encrypt_data(payload.ssn),
        address=encrypt_data(payload.address),
    )

    async with async_session() as session:
        session.add(new_patient)
        try:
            await session.commit()
            await session.refresh(new_patient)
        except IntegrityError:
            raise HTTPException(status_code=400, detail="Patient already exists")

    return {"patient_ref_id": new_patient.patient_ref_id}


@app.get("/patients/{patient_id}")
@audit(action="VIEW_PATIENT_RECORD")
async def get_patient(
    patient_id: str,
    current_user: User = Depends(allow_all_clinical_staff)
):
    async with async_session() as session:
        result = await session.execute(
            select(Patient).where(Patient.patient_ref_id == patient_id)
        )
        patient = result.scalars().first()

        if not patient:
            raise HTTPException(status_code=404, detail="Patient not found")

    # Decrypt data before returning it
    decrypted_name = decrypt_data(patient.name)
    decrypted_ssn = decrypt_data(patient.ssn)
    decrypted_address = decrypt_data(patient.address)

    return {
        "patient_ref_id": patient.patient_ref_id,
        "name": decrypted_name,
        "ssn": "********" + decrypted_ssn[-4:], # Mask sensitive data
        "address": decrypted_address,
    }

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
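
The masking expression in get_patient returns the whole value when the stored string has four characters or fewer, and it does not check that the value looks like an SSN. A slightly more defensive variant might look like the sketch below. `mask_ssn` is a hypothetical helper, and keeping only the last four digits mirrors the endpoint.

```python
import re


def mask_ssn(ssn: str) -> str:
    """Return the SSN with all but the last four digits masked.

    Non-digit separators are ignored. Anything that is not exactly nine
    digits is masked entirely rather than partially revealed.
    """
    digits = re.sub(r"\D", "", ssn or "")
    if len(digits) != 9:
        return "*" * 9
    return "***-**-" + digits[-4:]


print(mask_ssn("123-45-6789"))  # ***-**-6789
print(mask_ssn("6789"))         # *********
```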

Security Best Practices: The Database Layer

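Application-level encryption protects individual PHI fields, but the database itself also needs hardening. PostgreSQL and its hosting environment offer several options.

  • Encryption at Rest: Managed services such as AWS RDS and Google Cloud SQL provide storage-level encryption (on by default in Cloud SQL; on RDS it is selected when the instance is created). If you manage your own PostgreSQL instance, use filesystem-level encryption such as dm-crypt. This protects the database files on stolen or improperly decommissioned disks, but not against an attacker on the running server, which is why the field-level encryption from Step 1 still matters.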
  • Encryption in Transit: Always configure PostgreSQL to require SSL/TLS connections. In your postgresql.conf, set ssl = on, and in pg_hba.conf, change connection types from host to hostssl.
  • Database Auditing: Use extensions like pgaudit to create a second layer of audit logs directly within the database, capturing actions performed outside the application (e.g., by a DBA).
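
On the client side, the application should also insist on TLS and verify the server certificate, so that a misconfigured server cannot quietly accept plaintext connections. The following sketch builds a verifying `ssl.SSLContext` with the standard library. Passing it to asyncpg through SQLAlchemy's `connect_args` is shown only as a comment and should be checked against the driver version in use. The helper name `build_pg_ssl_context` and the CA file path are assumptions.

```python
import ssl
from typing import Optional


def build_pg_ssl_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a TLS context that verifies the server certificate and hostname.

    If `ca_file` is None, the system trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Hypothetical wiring in src/database.py (the path is a placeholder):
#   engine = create_async_engine(
#       DATABASE_URL,
#       connect_args={"ssl": build_pg_ssl_context("/path/to/ca.pem")},
#   )
```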

Conclusion

We have successfully designed and built the core of a HIPAA-ready health data pipeline. By combining the strengths of FastAPI, PostgreSQL, and HashiCorp Vault, we've created a system that is not only functional but also secure by design.

  • Encryption: We implemented application-level encryption for PHI fields using Vault, so the application never handles the raw keys.
  • Access Control: We built a flexible, role-based access control system using FastAPI's dependencies to enforce the principle of least privilege.
  • Auditing: We created a decorator-based audit trail to log every critical interaction with patient data.

This architecture provides a solid foundation. The next steps would be to implement secure authentication (JWTs), de-identification for researchers, and rigorous testing.


WellAlly's core development team, comprised of healthcare professionals, software engineers, and UX designers committed to revolutionizing digital health management.
