Engineering case study · ~15 min read

A production document intelligence pipeline, built on AWS Bedrock & Azure AI.

Behavioral-health and regulatory work generates thousands of documents a day (clinical records, policies, forms), each a wall of unstructured text someone has to read. Here’s the pipeline we built to read them: extract, understand and act at scale, routing each document to the cheapest model that can handle it.

AWS Bedrock · Azure AI · .NET · OCR · LLM Orchestration

Why it matters

Beyond OCR and brittle rules

Traditional document processing means manual review, basic OCR that misses context, and rigid rule-based extraction that breaks on variation. This pipeline addresses those problems by combining services, processing asynchronously, scaling automatically, and adapting to document type and complexity.

Better accuracy

Combine multiple AI services rather than betting on a single solution.

Non-blocking

Asynchronous processing so users never wait on a 50-page contract.

Scales automatically

From dozens to millions of documents on the same architecture.

Adaptive

Routes by document type and complexity to the right tool, and the right cost.

The big picture

A three-stage assembly line

The pipeline mirrors how humans read documents (extract, understand, structure) but at machine scale. Each stage is optimized independently, and we can route around any service that has a bad day.

Ingestion

Azure Blob

Text extraction

Textract · Doc AI

AI reasoning

Bedrock · Claude / Nova

Structured results

JSON output

Foundation

A storage layer that hides the cloud

An abstraction over Azure Blob, S3 or GCS, so application logic never depends on the provider. Each document gets a unique id; input and output use predictable {jobId}_input / {jobId}_output keys.

IDocumentStore.cs

```csharp
public interface IDocumentStore
{
    Task<(string logicalKey, long fileSize)> UploadAsync(
        string key, Stream content, string contentType, CancellationToken ct);

    Task<long?> TryDownloadAsync(string key, Stream destination, CancellationToken ct);

    Uri GetResourceUri(string blobName);
}
```
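One way to see what the abstraction buys is an in-memory implementation for tests. The sketch below is an assumption, not the production store: `InMemoryDocumentStore`, the `DocumentKeys` helper and the `memory://` URI scheme are hypothetical names, and a null return from `TryDownloadAsync` is assumed to mean "key not found".

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Demo: upload under the {jobId}_input key, then read the same bytes back.
var store = new InMemoryDocumentStore();
var jobId = Guid.NewGuid().ToString("N");
using var input = new MemoryStream(new byte[] { 1, 2, 3 });
var (key, size) = await store.UploadAsync(
    DocumentKeys.Input(jobId), input, "application/pdf", CancellationToken.None);
using var copy = new MemoryStream();
var read = await store.TryDownloadAsync(key, copy, CancellationToken.None);
Console.WriteLine($"{size} bytes stored, {read} bytes read");

// Hypothetical helper for the predictable key convention.
public static class DocumentKeys
{
    public static string Input(string jobId) => $"{jobId}_input";
    public static string Output(string jobId) => $"{jobId}_output";
}

public interface IDocumentStore
{
    Task<(string logicalKey, long fileSize)> UploadAsync(
        string key, Stream content, string contentType, CancellationToken ct);
    Task<long?> TryDownloadAsync(string key, Stream destination, CancellationToken ct);
    Uri GetResourceUri(string blobName);
}

// Hypothetical test double: keeps blobs in a dictionary instead of a cloud bucket.
public sealed class InMemoryDocumentStore : IDocumentStore
{
    private readonly ConcurrentDictionary<string, byte[]> _blobs = new();

    public async Task<(string logicalKey, long fileSize)> UploadAsync(
        string key, Stream content, string contentType, CancellationToken ct)
    {
        using var buffer = new MemoryStream();
        await content.CopyToAsync(buffer, ct);
        _blobs[key] = buffer.ToArray();
        return (key, buffer.Length);
    }

    public async Task<long?> TryDownloadAsync(string key, Stream destination, CancellationToken ct)
    {
        // Assumed contract: a missing key yields null rather than an exception.
        if (!_blobs.TryGetValue(key, out var bytes)) return null;
        await destination.WriteAsync(bytes, 0, bytes.Length, ct);
        return bytes.LongLength;
    }

    public Uri GetResourceUri(string blobName) => new($"memory://documents/{blobName}");
}
```

Because handlers depend only on `IDocumentStore`, a double like this can stand in for Azure Blob, S3 or GCS in unit tests without touching application logic.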

Text extraction

Getting the words out, with structure

Different documents need different extraction. A unified interface picks the right method, and returns text as individual lines so spatial relationships survive into the AI stage.

Simple text

Basic OCR is enough.

Complex forms

Layout understanding and field recognition.

Handwriting

Advanced recognition models.

Multi-language

Specialized language models.

DocumentIntelligenceService.cs

```csharp
public interface ITextRecognition
{
    Task<IList<string>> DetectTextAsync(Uri blobPath, CancellationToken ct);
}

public class DocumentIntelligenceService : ITextRecognition
{
    public async Task<IList<string>> DetectTextAsync(Uri blobPath, CancellationToken ct)
    {
        var op = await _client.AnalyzeDocumentAsync(
            WaitUntil.Completed, "prebuilt-read", blobPath, ct);

        // Return individual lines; this preserves structure for the LLM stage
        return op.Value.Pages
            .SelectMany(p => p.Lines)
            .Select(l => l.Content)
            .ToList();
    }
}
```
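The "picks the right method" step can be as small as a lookup from document kind to extraction model. The sketch below is illustrative only: the `DocumentKind` enum and `ExtractionRouter` are hypothetical, and the mapping to the Azure prebuilt model ids `prebuilt-read` and `prebuilt-layout` is an assumption about how the four categories above might be routed.

```csharp
using System;

// Demo: a complex form is sent to the layout model, plain text to the read model.
Console.WriteLine(ExtractionRouter.ModelFor(DocumentKind.ComplexForm));
Console.WriteLine(ExtractionRouter.ModelFor(DocumentKind.SimpleText));

// Hypothetical categories, mirroring the four cards above.
public enum DocumentKind { SimpleText, ComplexForm, Handwriting, MultiLanguage }

public static class ExtractionRouter
{
    // Assumed mapping: forms need layout and field structure; everything else
    // is treated as an OCR problem for the read model.
    public static string ModelFor(DocumentKind kind) => kind switch
    {
        DocumentKind.ComplexForm => "prebuilt-layout",
        DocumentKind.SimpleText or DocumentKind.Handwriting or DocumentKind.MultiLanguage
            => "prebuilt-read",
        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "Unknown document kind")
    };
}
```

Keeping the mapping in one place means a new document kind is a one-line change rather than a new branch in every handler.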

The brain

LLM orchestration: a team of specialists

A model registry routes each task to the right model, tracks token usage and cost, and wraps calls in a retry policy.

LLMOrchestrationClient.cs

```csharp
public class LLMOrchestrationClient
{
    private static readonly Dictionary<AIModel, ModelConfig> ModelRegistry = new()
    {
        { AIModel.Claude35Sonnet, new(Brand.Anthropic, "anthropic.claude-3-5-sonnet-20241022-v2:0",
            supportsCache: true, inputCost: 0.003M, outputCost: 0.015M) },
        { AIModel.NovaLite, new(Brand.Amazon, "us.amazon.nova-lite-v1:0",
            supportsCache: true, inputCost: 0.00006M, outputCost: 0.00024M) },
    };

    public async Task<LLMResponse> ProcessAsync(AIModel model, string prompt,
        string systemPrompt, CancellationToken ct)
    {
        return await _retryPolicy.ExecuteAsync(async (c) =>
        {
            var response = await _client.ConverseAsync(BuildRequest(model, prompt, systemPrompt), c);
            LogTokenUsage(model, response.Usage);
            return new LLMResponse(response.Output.Message.Content[0].Text, response);
        }, ct);
    }
}
```
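The retry policy itself is not shown above. A minimal sketch of what `_retryPolicy` might do, assuming exponential backoff and a hypothetical definition of "transient", looks like this; the `Retry` class, its parameters and the exception list are all assumptions rather than the production policy.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Demo: an operation that fails twice with a transient error, then succeeds.
var calls = 0;
var result = await Retry.ExecuteAsync(async token =>
{
    await Task.Yield();
    if (++calls < 3) throw new TimeoutException("transient");
    return "ok";
}, maxAttempts: 4, baseDelay: TimeSpan.FromMilliseconds(10), ct: CancellationToken.None);
Console.WriteLine($"{result} after {calls} calls");

public static class Retry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts, TimeSpan baseDelay, CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            // Non-transient errors and the final attempt fall through to the caller.
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
            {
                // Exponential backoff: baseDelay, then 2x, 4x, ...
                var delay = TimeSpan.FromMilliseconds(
                    baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
                await Task.Delay(delay, ct);
            }
        }
    }

    // Assumption: timeouts and HTTP-level failures are worth retrying.
    private static bool IsTransient(Exception ex) =>
        ex is TimeoutException or HttpRequestException;
}
```

The exception filter matters: retrying a validation error or a cancelled request only burns tokens and time, so only failures judged transient are retried.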

One registry, three specialists

Model	Best for	Input $/1K	Output $/1K
Nova Lite	Fast and cheap, for simple extraction	$0.00006	$0.00024
Nova Pro	Balanced, for general-purpose tasks	n/a	n/a
Claude 3.5 Sonnet	Complex reasoning & analysis	$0.003	$0.015
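To make the per-1K prices concrete, the sketch below prices one call on both models. The prices come from the table above; the `CostCalculator` class and the 12,000-in / 1,500-out token counts are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Demo: the same call (12,000 input tokens, 1,500 output tokens) on both models.
Console.WriteLine(CostCalculator.Estimate("Claude 3.5 Sonnet", 12_000, 1_500));
Console.WriteLine(CostCalculator.Estimate("Nova Lite", 12_000, 1_500));

public static class CostCalculator
{
    // Prices in USD per 1,000 tokens, copied from the table above.
    private static readonly Dictionary<string, (decimal InputPer1K, decimal OutputPer1K)> Prices = new()
    {
        ["Claude 3.5 Sonnet"] = (0.003M, 0.015M),
        ["Nova Lite"] = (0.00006M, 0.00024M),
    };

    public static decimal Estimate(string model, int inputTokens, int outputTokens)
    {
        var (inputPer1K, outputPer1K) = Prices[model];
        return inputTokens / 1000M * inputPer1K + outputTokens / 1000M * outputPer1K;
    }
}
```

For this hypothetical call the estimate is $0.0585 on Claude 3.5 Sonnet and $0.00108 on Nova Lite, roughly a 54x difference, which is why routing simple work away from the largest model dominates the cost picture.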

Prompt engineering

Prompts as first-class citizens

Many document-AI projects treat prompts as an afterthought. We version, test and compose them: system prompts define the role, user prompts define the task.

system_prompts.yml

```yaml
# system_prompts.yml
document_analysis: >-
  You are a document intelligence specialist trained to extract structured
  information from unstructured text. Identify key entities, relationships,
  and regulatory requirements.

policy_extraction: >-
  You are a policy analyst trained to interpret regulatory documents. Extract
  all verifiable requirements that can be assessed through documentation review.
```

PromptEngine.cs

```csharp
public class PromptEngine
{
    public async Task<PolicyAnalysisResult> AnalyzePolicyDocument(
        string documentText, CancellationToken ct)
    {
        var systemPrompt = GetPrompt("system", "policy_extraction");
        var userPrompt = BuildPrompt("extract_policy_questions", new()
        {
            ["DOCUMENT_TEXT"] = documentText
        });

        var response = await _llmClient.ProcessAsync(
            AIModel.Claude35Sonnet, userPrompt, systemPrompt, ct);

        // ProcessAsync returns the raw LLMResponse; map it to the typed result.
        // ParsePolicyAnalysis is a placeholder name for that mapping step.
        return ParsePolicyAnalysis(response);
    }
}
```
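`BuildPrompt` is not shown above. A minimal sketch of the substitution step, assuming templates mark variables as `{{NAME}}` (the placeholder syntax, the `PromptTemplate` class and the sample template text are all assumptions), could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Demo: fill one placeholder in a hypothetical user-prompt template.
var template = "Extract every verifiable requirement from the document below.\n\n{{DOCUMENT_TEXT}}";
var prompt = PromptTemplate.Render(template, new Dictionary<string, string>
{
    ["DOCUMENT_TEXT"] = "Staff must complete training annually."
});
Console.WriteLine(prompt);

public static class PromptTemplate
{
    private static readonly Regex Placeholder = new(@"\{\{(\w+)\}\}", RegexOptions.Compiled);

    // Replaces each {{NAME}} with its value; a missing value is an error, not an empty string.
    public static string Render(string template, IReadOnlyDictionary<string, string> values) =>
        Placeholder.Replace(template, match =>
        {
            var name = match.Groups[1].Value;
            return values.TryGetValue(name, out var value)
                ? value
                : throw new KeyNotFoundException($"No value supplied for placeholder '{name}'");
        });
}
```

Two choices are worth noting: the template is scanned once, so placeholder-like text inside the document itself is not expanded again, and a missing variable fails loudly instead of silently sending a half-empty prompt to the model.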

Modularity

Role vs task separated into composable parts.

Testability

A/B test prompt versions without code changes.

Maintainability

Non-engineers improve prompts safely.

Version control

Prompts live in Git alongside the code.

Handling scale

Event-driven, asynchronous processing

A 50-page contract might take 30 seconds, and users can’t block on that. An event-driven handler scales with the queue and recovers from transient failures with retries.

DocumentProcessingHandler.cs

```csharp
[MessageHandler(concurrencyLimit: 3)]
public class DocumentProcessingHandler : IMessageConsumer<DocumentUploadedEvent>
{
    public async Task HandleAsync(DocumentUploadedEvent message, CancellationToken ct)
    {
        var jobId = message.DocumentId;
        try
        {
            var documentUri = _documentStore.GetResourceUri($"{jobId}_input");
            var lines = await _textRecognition.DetectTextAsync(documentUri, ct);
            var documentText = string.Join("\n", lines);

            var analysis = await _llmProcessor.AnalyzePolicyDocument(documentText, ct);
            await StoreResults(jobId, analysis, ct);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to process document {JobId}", jobId);
            throw; // rethrow so the queue can redeliver; swallowing would mark the message as handled
        }
    }
}
```
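The `concurrencyLimit: 3` setting belongs to the messaging framework, which is not shown here. As a rough illustration of what such a limit amounts to, the sketch below caps in-flight work with a `SemaphoreSlim`; the `BoundedProcessor` class and the simulated 20 ms workload are assumptions.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Demo: ten documents arrive at once, but at most three are processed concurrently.
var limiter = new BoundedProcessor(maxConcurrency: 3);
var jobs = Enumerable.Range(1, 10).Select(i => limiter.RunAsync(async () =>
{
    await Task.Delay(20);   // stand-in for OCR + LLM work
    return i;
}));
var results = await Task.WhenAll(jobs);
Console.WriteLine($"processed {results.Length} documents, peak concurrency {limiter.PeakConcurrency}");

public sealed class BoundedProcessor
{
    private readonly SemaphoreSlim _gate;
    private readonly object _sync = new();
    private int _running;
    private int _peak;

    public BoundedProcessor(int maxConcurrency) =>
        _gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public int PeakConcurrency { get { lock (_sync) return _peak; } }

    public async Task<T> RunAsync<T>(Func<Task<T>> work)
    {
        await _gate.WaitAsync();            // wait for a free slot
        lock (_sync) { _running++; _peak = Math.Max(_peak, _running); }
        try
        {
            return await work();
        }
        finally
        {
            lock (_sync) { _running--; }
            _gate.Release();                // free the slot even if work() threw
        }
    }
}
```

A small per-instance limit keeps one worker from overwhelming the OCR and LLM services; throughput then scales by adding instances rather than by raising the limit.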

API design

Sync for small, async for large

DocumentsController.cs

```csharp
[HttpPost("/api/v1/documents/analyze")]
public async Task<IActionResult> AnalyzeDocument([FromForm] IFormFile document, CancellationToken ct)
{
    // Reject both a missing file and an empty one.
    if (document is null || document.Length == 0) return BadRequest("No document provided");

    // Prefer() is a project-specific helper that parses the Prefer request header.
    var preferAsync = Request.Headers.Prefer().Return == ReturnPreference.Minimal;
    if (!preferAsync)
    {
        var bytes = await ReadDocumentBytes(document, ct);
        return Ok(await _processor.AnalyzeDocumentAsync(bytes, ct)); // small: immediate
    }

    var jobId = await InitiateAsyncProcessing(document, ct);       // large: async
    Response.Headers["Job-ID"] = jobId;
    // Pass the id as route values so the Location header points at the status endpoint.
    return AcceptedAtAction("GetJobStatus", routeValues: new { id = jobId }, value: new { jobId });
}
```

Small documents (< 1 MB): results returned immediately in the response.
Large documents: a job id to poll, with real-time status updates.
Consistent error format whether processing is sync or async.
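The controller above switches on the client's preference, while the rule of thumb is stated in terms of size. One way to combine the two, sketched under the assumption that 1 MB means 1,048,576 bytes and that either signal is enough to go async, is a small decision helper; `ProcessingMode` and `Mode` are hypothetical names.

```csharp
using System;

// Demo: a small upload stays synchronous unless the client asks otherwise.
Console.WriteLine(ProcessingMode.Choose(200_000, clientPrefersAsync: false));
Console.WriteLine(ProcessingMode.Choose(200_000, clientPrefersAsync: true));
Console.WriteLine(ProcessingMode.Choose(5_000_000, clientPrefersAsync: false));

public enum Mode { Sync, Async }

public static class ProcessingMode
{
    // Assumption: "1 MB" is interpreted as 1,048,576 bytes.
    public const long SyncLimitBytes = 1024 * 1024;

    public static Mode Choose(long lengthBytes, bool clientPrefersAsync) =>
        clientPrefersAsync || lengthBytes >= SyncLimitBytes ? Mode.Async : Mode.Sync;
}
```

Enforcing the size limit on the server protects the synchronous path even when a client forgets to ask for async handling of a large upload.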

Smart optimizations

Fast and cheap, by design

Performance and cost aren’t afterthoughts. They’re built into the architecture.

Intelligent model selection

Not every document needs the most powerful model. Routing by size and complexity cuts AI cost by up to 80% with no quality loss. A simple form doesn’t need Claude’s reasoning.

ModelSelector.cs

```csharp
public AIModel SelectOptimalModel(DocumentAnalysisRequest request)
{
    var size = request.Content.Length;
    var complexity = EstimateComplexity(request.DocumentType);

    return (size, complexity) switch
    {
        (< 10_000, ComplexityLevel.Low)    => AIModel.NovaLite,       // fast, cheap
        (< 50_000, ComplexityLevel.Medium) => AIModel.NovaPro,        // balanced
        (_, ComplexityLevel.High)          => AIModel.Claude35Sonnet, // most capable
        _                                  => AIModel.NovaPro
    };
}
```

Impact

Reserve expensive models for complex reasoning; let Nova Lite handle the long tail of simple forms. Up to 80% lower AI spend at the same quality.
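A back-of-the-envelope calculation shows where a saving of that size can come from. The input prices are the ones from the model registry; the 80/20 traffic split is purely hypothetical and not a measured figure, and `RoutingSavings` is an illustrative name.

```csharp
using System;

// Hypothetical mix: 80% of input tokens go to Nova Lite, 20% to Claude 3.5 Sonnet.
var savings = RoutingSavings.VersusAllSonnet(novaLiteShare: 0.80M);
Console.WriteLine($"Fraction of input-token spend saved: {savings}");

public static class RoutingSavings
{
    // USD per 1,000 input tokens, from the model registry.
    private const decimal SonnetInputPer1K = 0.003M;
    private const decimal NovaLiteInputPer1K = 0.00006M;

    // Fraction of input-token spend saved relative to sending everything to Sonnet.
    public static decimal VersusAllSonnet(decimal novaLiteShare)
    {
        if (novaLiteShare < 0 || novaLiteShare > 1)
            throw new ArgumentOutOfRangeException(nameof(novaLiteShare));

        var routed = novaLiteShare * NovaLiteInputPer1K + (1 - novaLiteShare) * SonnetInputPer1K;
        return 1 - routed / SonnetInputPer1K;
    }
}
```

Under that assumed split the routed spend is 21.6% of the all-Sonnet baseline, a saving of 78.4% on input tokens. The saving is driven almost entirely by the share of traffic that can safely leave the expensive model, so the complexity estimate is the part worth tuning.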

Multi-layer caching

Identical contracts and standard forms shouldn’t be reprocessed. A memory → distributed → process fallback eliminates redundant work.

L1: memory

fastest

L2: distributed

shared

Process & cache

24h TTL

CachingProcessor.cs

```csharp
public async Task<AnalysisResult> ProcessDocumentAsync(string documentHash,
    Func<Task<AnalysisResult>> processor, CancellationToken ct)
{
    // L1: in-memory (fastest)
    if (_cache.TryGetValue(documentHash, out AnalysisResult cached)) return cached;

    // L2: distributed (shared across instances)
    var distributed = await _distributedCache.GetAsync(documentHash, ct);
    if (distributed != null)
    {
        var result = JsonSerializer.Deserialize<AnalysisResult>(distributed);
        _cache.Set(documentHash, result, TimeSpan.FromMinutes(30));
        return result;
    }

    // Miss: process once, then populate both layers for next time
    var processed = await processor();
    await _distributedCache.SetAsync(documentHash,
        JsonSerializer.SerializeToUtf8Bytes(processed),
        new() { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24) }, ct);
    _cache.Set(documentHash, processed, TimeSpan.FromMinutes(30));
    return processed;
}
```
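The cache key above is a `documentHash`, whose derivation is not shown. A minimal sketch, assuming the key is a SHA-256 digest of the raw document bytes (the `DocumentHash` class name is hypothetical), might be:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Demo: identical bytes produce the same key; any change produces a different one.
var first = DocumentHash.Compute(Encoding.UTF8.GetBytes("standard intake form"));
var second = DocumentHash.Compute(Encoding.UTF8.GetBytes("standard intake form"));
var edited = DocumentHash.Compute(Encoding.UTF8.GetBytes("standard intake form v2"));
Console.WriteLine($"{first == second} {first == edited}");

public static class DocumentHash
{
    // Returns the SHA-256 digest of the content as a 64-character hex string.
    public static string Compute(ReadOnlySpan<byte> content) =>
        Convert.ToHexString(SHA256.HashData(content));
}
```

If prompts or models change between releases, folding a prompt or model version into the key keeps stale results from being served for up to the 24-hour TTL.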

Repeated document types

Response time: a cache hit is 95% faster than a full reprocess.

AI cost: per-request when reprocessing, near-zero when cached.

Security

Sensitive data, handled correctly

Financial, medical and legal documents make security fundamental, not optional.

SecureProcessor.cs

```csharp
public async Task<ProcessingResult> ProcessSecureDocumentAsync(
    SecureDocumentRequest request, CancellationToken ct)
{
    // Record who touched which document before any plaintext exists.
    await _auditLogger.LogDocumentAccessAsync(request.UserId, request.DocumentId);

    var decrypted = await _encryption.DecryptAsync(request.EncryptedContent);
    var masked = MaskSensitiveData(decrypted);          // mask PII before AI
    var result = await ProcessDocumentContent(masked, ct);

    result.EncryptedOutput = await _encryption.EncryptAsync(result.Output);
    result.Output = null;                                // drop the plaintext reference
    return result;

    // Dropping references does not zero memory, and GC.Collect() would not either.
    // Where that guarantee matters, hold plaintext in byte[] buffers and wipe them explicitly.
}
```
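`MaskSensitiveData` is not shown above. As a rough idea of the shape such a step can take, the sketch below masks two PII patterns with regular expressions. The `PiiMasker` class, the patterns and the replacement tokens are assumptions; a production masker for clinical records would need far broader coverage (names, dates of birth, record numbers).

```csharp
using System;
using System.Text.RegularExpressions;

// Demo: a US Social Security number and an email address are replaced by tokens.
Console.WriteLine(PiiMasker.Mask("SSN 123-45-6789, contact jane.doe@example.com"));
// prints: SSN [SSN], contact [EMAIL]

public static class PiiMasker
{
    // Hypothetical rule set: each pattern is replaced by a fixed token.
    private static readonly (Regex Pattern, string Token)[] Rules =
    {
        (new Regex(@"\b\d{3}-\d{2}-\d{4}\b", RegexOptions.Compiled), "[SSN]"),
        (new Regex(@"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b", RegexOptions.Compiled), "[EMAIL]"),
    };

    public static string Mask(string text)
    {
        foreach (var (pattern, token) in Rules)
            text = pattern.Replace(text, token);
        return text;
    }
}
```

Typed tokens such as `[SSN]` keep the sentence structure intact, so the model can still reason about what kind of value was present without ever seeing it.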

Encrypt everything

At rest and in transit, with plaintext held only for the duration of processing.

Audit all access

Who accessed which document, and when.

Minimize exposure

Mask PII before sending to AI services.

Clean up

Clear sensitive plaintext from memory immediately.
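Clearing plaintext is only something the code can control when the data lives in a buffer it owns: .NET strings are immutable and cannot be wiped. A minimal sketch of the wipe-on-exit pattern, assuming decrypted content is held as a `byte[]` (the `SensitiveBuffers` helper is hypothetical), uses `CryptographicOperations.ZeroMemory`:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Demo: the buffer is usable inside the callback and zeroed as soon as it returns.
byte[] plaintext = Encoding.UTF8.GetBytes("patient: Jane Doe");
var length = SensitiveBuffers.UseAndWipe(plaintext, bytes => bytes.Length);
Console.WriteLine($"{length} bytes processed, wiped: {Array.TrueForAll(plaintext, b => b == 0)}");

public static class SensitiveBuffers
{
    // Runs work against the buffer, then overwrites it with zeros even if work throws.
    public static T UseAndWipe<T>(byte[] buffer, Func<byte[], T> work)
    {
        try
        {
            return work(buffer);
        }
        finally
        {
            CryptographicOperations.ZeroMemory(buffer);
        }
    }
}
```

This only covers the buffer itself; any copy made inside the callback, including a string decoded from the bytes, is outside its reach.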

Monitoring

You can’t optimize what you can’t measure

Telemetry.cs

```csharp
public async Task<T> TrackProcessingAsync<T>(string operation, Func<Task<T>> processor)
{
    using var timer = _metrics.StartTimer($"document_processing.{operation}.duration");
    try
    {
        var result = await processor();
        _metrics.Increment($"document_processing.{operation}.success");
        return result;
    }
    catch (Exception ex)
    {
        _metrics.Increment($"document_processing.{operation}.error",
            new[] { ("error_type", ex.GetType().Name) });
        throw;
    }
}
```

Processing times per stage; success / failure rates by document type.
Cost metrics: token usage and AI service spend.
Queue depth and error patterns, to catch issues before users do.
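The `_metrics.StartTimer` call above returns something disposable whose implementation is not shown. A minimal sketch of such a timer, assuming it simply reports elapsed time to a callback on dispose (the `MetricTimer` class and its callback signature are hypothetical), could be:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Demo: time a simulated OCR stage and report the duration when the scope ends.
using (new MetricTimer("document_processing.ocr.duration",
    (name, elapsed) => Console.WriteLine($"{name}: {elapsed.TotalMilliseconds:F0} ms")))
{
    await Task.Delay(25);   // stand-in for the real work
}

public sealed class MetricTimer : IDisposable
{
    private readonly string _name;
    private readonly Action<string, TimeSpan> _record;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
    private bool _disposed;

    public MetricTimer(string name, Action<string, TimeSpan> record)
    {
        _name = name;
        _record = record;
    }

    // Reports exactly once, whether the scope exits normally or via an exception.
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _stopwatch.Stop();
        _record(_name, _stopwatch.Elapsed);
    }
}
```

Tying the measurement to `Dispose` means failed operations are timed too, which is often where the slowest requests hide.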

What we learned

Hard-won lessons from production

Start simple, scale smart

One document type working reliably beats every edge case half-done.

Prompt engineering is critical

A well-crafted prompt can lift accuracy ~40% while cutting cost.

Plan for failure

Retries, circuit breakers and graceful degradation from day one.

Monitor everything

Token usage, latency, error rates and cost, from the start.

Common challenges

…and how we solved them

Challenge

Multi-cloud complexity

Different APIs, auth and formats per provider.

Solution

Abstraction layers with unified interfaces that preserve provider-specific optimizations.

Challenge

LLM response reliability

Models occasionally return malformed JSON or unexpected structures.

Solution

Schema validation, multiple fallback parsers, and backup processing paths.
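A minimal sketch of the first two layers, assuming the model was asked for a JSON object with a `requirements` array (that field name, the `LlmJson` class and the sample reply are hypothetical): try a strict parse, fall back to the outermost braces, then check the shape before trusting it.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;

// Demo: the model wrapped its JSON in chatty prose; the fallback still recovers it.
var reply = "Sure! Here is the JSON you asked for: {\"requirements\": [\"Annual training\"]} Let me know if you need more.";
if (LlmJson.TryParse(reply, out var doc))
{
    using (doc)
        Console.WriteLine($"valid: {LlmJson.HasRequirements(doc)}");
}

public static class LlmJson
{
    public static bool TryParse(string reply, [NotNullWhen(true)] out JsonDocument? document)
    {
        // 1) Strict: the whole reply is JSON.
        if (TryStrict(reply, out document)) return true;

        // 2) Fallback: take the span from the first '{' to the last '}'.
        var start = reply.IndexOf('{');
        var end = reply.LastIndexOf('}');
        if (start >= 0 && end > start)
            return TryStrict(reply.Substring(start, end - start + 1), out document);

        return false;
    }

    // Minimal shape check: a root object with a "requirements" array.
    public static bool HasRequirements(JsonDocument document) =>
        document.RootElement.ValueKind == JsonValueKind.Object
        && document.RootElement.TryGetProperty("requirements", out var items)
        && items.ValueKind == JsonValueKind.Array;

    private static bool TryStrict(string text, [NotNullWhen(true)] out JsonDocument? document)
    {
        try
        {
            document = JsonDocument.Parse(text);
            return true;
        }
        catch (JsonException)
        {
            document = null;
            return false;
        }
    }
}
```

When both parsers fail, or the shape check does not pass, the document would go to a backup path such as a retry with a stricter prompt or a different model, rather than storing a partial result.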

Challenge

Cost management

LLM spend escalates fast with large documents and complex prompts.

Solution

Intelligent model selection, cost tracking and automatic alerts.

Looking ahead

Where this goes next

Multi-modal

Images, tables and charts understood inline within documents.

Streaming responses

Results appear as they’re generated, not after the whole analysis.

Federated learning

Improve models from processed docs while preserving privacy.

Semantic chunking

Context-preserving splitting for better long-document accuracy.

Conclusion

Composable services, designed for failure

The key insight is treating each component (OCR, LLM processing, storage) as independent, composable services. That separation lets each part evolve while the system stays coherent. Start with a simple use case, get it working reliably, then expand. Build in monitoring and cost controls from day one. And design for failure: in production, failure isn’t a possibility, it’s a certainty.
