Implementing AWS AI Services
A technical deep-dive into standardizing AI infrastructure using AWS Bedrock and HealthScribe, bridging heterogeneous legacy systems with modern generative capabilities while controlling cost.
The brief
We standardized clinical AI infrastructure on AWS Bedrock and HealthScribe, putting one generative layer in front of a set of heterogeneous legacy EHRs.
We bridged those legacy systems to modern generative capabilities and held operational cost down by routing each request to the cheapest model that could handle it.
How it works in the clinic
Legacy EHR Integration
- Deploys FHIR wrappers around legacy HL7 connections for standardized data flow.
- Bulk-streams historical patient records to establish a comprehensive timeline.
Multi-Agent Execution
- Uses Step Functions to trigger specialized Bedrock models based on task type.
- Routes clinical summarizing to Claude and medical abstraction to fine-tuned models.
Cost Controls
- Implements dynamic prompt caching and token-limit enforcement.
- Schedules batch operations during off-peak hours for reduced compute cost.
Provider Portal Integration
- Embeds contextual AI widgets directly within existing portal environments.
- Eliminates the need for practitioners to learn entirely new software.
What it runs on
A highly modular architecture designed to swap foundational models as capabilities evolve.
Generative Core (Amazon Bedrock)
The abstraction layer simplifying access to leading LLMs:
- Knowledge Bases. Ingests internal practice guidelines to ground responses in validated protocols.
- Agents for Bedrock. Executes multi-step clinical queries (e.g. retrieving lab results before summarizing).
- Cost Allocation. Tags individual API calls by department to allow granular chargebacks.
Integration & API Ecosystem
The connective tissue bringing AI to end-users:
- AppSync (GraphQL). Provides a unified data graph combining patient records with real-time AI insights.
- Cognito Authentication. Manages strict role-based access controls defining who can execute specific AI tasks.
- Lambda Orchestration. Lightweight serverless compute connects API requests to model pipelines.
Run, monitor, stay compliant
Continuous Optimization
Ensuring model usage stays economical without sacrificing quality:
- Token Monitoring. Granular dashboards track input/output tokens per session, flagging anomalous spikes.
- Model Swapping. A/B frameworks route small traffic percentages to cheaper models to validate cost hypotheses.
Security Posture
A defense-in-depth approach protecting API endpoints:
- WAF Guardrails. Filters malicious prompt-injection attempts before they reach Bedrock endpoints.
- VPC Endpoints. Ensures all traffic to AWS AI services remains within the private AWS network.
Tuned for clinical relevance
Systematic prompt version control manages iterative improvements across development life cycles.
Templating Engines
Separates prompt structure from injected patient parameters for reusability.
Evaluation Frameworks
Runs offline batches of golden-standard prompts against new models to measure degradation.
Persona Shifting
Adjusts language complexity depending on whether output is for a patient or physician.
Few-Shot Learning
Embeds specific clinical examples in the system prompt to guide expected formatting.
The outcome
In production this standardized clinical AI across legacy EHRs behind one provider portal, and kept cost predictable by routing each request to the cheapest model that could handle it. Clinicians stayed in the software they already knew while the generative layer underneath was swapped and tuned without touching their workflow.
Frequently asked questions
How do you standardize AI across legacy EHRs?
FHIR wrappers around legacy HL7 connections feed one generative layer on Amazon Bedrock, surfaced through the providers’ existing portal, so clinicians never have to learn new software.
How is the cost of generative AI controlled?
Each request is routed to the cheapest model that can handle it, with dynamic prompt caching, token-limit enforcement, off-peak batch scheduling, and per-department cost allocation.
How is the AI layer kept secure?
WAF guardrails filter prompt-injection attempts before they reach Bedrock, VPC endpoints keep all traffic on the private AWS network, and Cognito enforces role-based access to specific AI tasks.