Case study 03Healthcare AI

Implementing AWS AI Services

A technical deep-dive into standardizing AI infrastructure using AWS Bedrock and HealthScribe, bridging heterogeneous legacy systems with modern generative capabilities while controlling cost.

AWS BedrockClinical WorkflowsOperational AI

Overview

The brief

We standardized clinical AI infrastructure on AWS Bedrock and HealthScribe, putting one generative layer in front of a set of heterogeneous legacy EHRs.

We bridged those legacy systems to modern generative capabilities and held operational cost down by routing each request to the cheapest model that could handle it.

Clinical workflow

How it works in the clinic

Data Ingestion01

Legacy EHR Integration

Deploys FHIR wrappers around legacy HL7 connections for standardized data flow.
Bulk-streams historical patient records to establish a comprehensive timeline.

Model Orchestration02

Multi-Agent Execution

Uses Step Functions to trigger specialized Bedrock models based on task type.
Routes clinical summarizing to Claude and medical abstraction to fine-tuned models.

Optimization03

Cost Controls

Implements dynamic prompt caching and token-limit enforcement.
Schedules batch operations during off-peak hours for reduced compute cost.

Delivery04

Provider Portal Integration

Embeds contextual AI widgets directly within existing portal environments.
Eliminates the need for practitioners to learn entirely new software.

Technical architecture

What it runs on

A highly modular architecture designed to swap foundational models as capabilities evolve.

Generative Core (Amazon Bedrock)

The abstraction layer simplifying access to leading LLMs:

Knowledge Bases. Ingests internal practice guidelines to ground responses in validated protocols.
Agents for Bedrock. Executes multi-step clinical queries (e.g. retrieving lab results before summarizing).
Cost Allocation. Tags individual API calls by department to allow granular chargebacks.

Integration & API Ecosystem

The connective tissue bringing AI to end-users:

AppSync (GraphQL). Provides a unified data graph combining patient records with real-time AI insights.
Cognito Authentication. Manages strict role-based access controls defining who can execute specific AI tasks.
Lambda Orchestration. Lightweight serverless compute connects API requests to model pipelines.

Operational excellence

Run, monitor, stay compliant

Continuous Optimization

Ensuring model usage stays economical without sacrificing quality:

Token Monitoring. Granular dashboards track input/output tokens per session, flagging anomalous spikes.
Model Swapping. A/B frameworks route small traffic percentages to cheaper models to validate cost hypotheses.

Security Posture

A defense-in-depth approach protecting API endpoints:

WAF Guardrails. Filters malicious prompt-injection attempts before they reach Bedrock endpoints.
VPC Endpoints. Ensures all traffic to AWS AI services remains within the private AWS network.

Prompt engineering

Tuned for clinical relevance

Systematic prompt version control manages iterative improvements across development life cycles.

Templating Engines

Separates prompt structure from injected patient parameters for reusability.

Evaluation Frameworks

Runs offline batches of golden-standard prompts against new models to measure degradation.

Persona Shifting

Adjusts language complexity depending on whether output is for a patient or physician.

Few-Shot Learning

Embeds specific clinical examples in the system prompt to guide expected formatting.

Conclusion

The outcome

In production this standardized clinical AI across legacy EHRs behind one provider portal, and kept cost predictable by routing each request to the cheapest model that could handle it. Clinicians stayed in the software they already knew while the generative layer underneath was swapped and tuned without touching their workflow.

FAQ

Frequently asked questions

How do you standardize AI across legacy EHRs?

FHIR wrappers around legacy HL7 connections feed one generative layer on Amazon Bedrock, surfaced through the providers’ existing portal, so clinicians never have to learn new software.

How is the cost of generative AI controlled?

Each request is routed to the cheapest model that can handle it, with dynamic prompt caching, token-limit enforcement, off-peak batch scheduling, and per-department cost allocation.

How is the AI layer kept secure?

WAF guardrails filter prompt-injection attempts before they reach Bedrock, VPC endpoints keep all traffic on the private AWS network, and Cognito enforces role-based access to specific AI tasks.

More case studies

Case study 01