Engineering case study · ~12 min read

Conversation Capture: streaming audio in React Native.

This is the capture pipeline behind our clinical scribe. It records a live patient encounter on a clinician’s phone and streams it to the cloud reliably enough to draft a note, even on hospital Wi-Fi that drops. Here’s how we built low-latency capture, chunked buffering and resumable upload in React Native.

React Native · Real-time Audio · Streaming · AWS / Azure · Healthcare

Introduction

Listening to the world, in real time

A clinical encounter doesn’t pause for the software. Our clinical scribe has to capture a live conversation on a clinician’s phone and stream it to the cloud continuously: low latency, lossless, resilient to the dead zones every hospital has. Here’s the capture-and-streaming pipeline we built in React Native.

Most libraries handle local playback or recording. But continuous streaming, especially low-latency streaming, introduces an entirely new layer of complexity: buffering, threading, and native-bridge synchronization. Here is how we designed the system end to end.

The concept

What conversation capture means

Conversation capture means passively capturing audio from the environment and transmitting it to another endpoint, whether a remote server, a local analyzer, or a visualization module. Unlike simple recording, streaming is a continuous flow: small packets are captured, encoded, transmitted and consumed in near real time.

Pipeline: Microphone (live capture) → PCM frames (16 kHz, mono) → Encode (Base64 / PCM) → Transmit (REST / WebSocket) → Consumer (decode, analyze)

The challenge

Why this is hard in React Native

React Native isn’t designed for continuous audio streams out of the box. Four problems dominated the work.

Real-time capture

Getting microphone data continuously, on both Android and iOS.

Low-latency transport

Sending small audio chunks fast enough for a real-time experience.

Thread synchronization

The JS bridge introduces timing drift if threading is left unmanaged.

Platform inconsistency

iOS background-audio permissions and Android buffer behavior diverge.

Designing the pipeline

The streaming architecture

The goal

Capture audio → append raw base64 chunks to a file → update chunk metadata to trigger upload → once enough data is collected, read from the file and push to the server.

Rather than a WebSocket, we transmit through Azure’s buffered store: individual chunks are updated by index via REST, and committed when the final chunk arrives. The stream exposes two operations: updateChunk to write a chunk at a specific index, and commitChunk to write the final size to the audio header and flag the last chunk.

updateChunk

Idempotently writes an individual chunk at a specific index, so a retried request cannot duplicate data.

commitChunk

Writes final size to the audio header and flags the final chunk of the file.
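
To make "writes final size to the audio header" concrete, here is a minimal sketch that assumes the recording is a canonical 44-byte PCM WAV file. The article does not show the header layout the service actually uses, so `buildWavHeader` and `patchWavSizes` are hypothetical helpers, not the production commit logic.

```typescript
// Assumption: canonical 44-byte PCM WAV header (RIFF/WAVE with one "fmt " and one "data" chunk).
const WAV_HEADER_BYTES = 44;

/** Builds a header with placeholder sizes; the real sizes are unknown while recording. */
function buildWavHeader(sampleRate: number, channels: number, bitsPerSample: number): Uint8Array {
  const header = new Uint8Array(WAV_HEADER_BYTES);
  const view = new DataView(header.buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  const blockAlign = (channels * bitsPerSample) / 8;

  ascii(0, "RIFF");
  view.setUint32(4, 36, true);                       // RIFF chunk size = 36 + data bytes (patched later)
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true);                      // size of the fmt chunk for PCM
  view.setUint16(20, 1, true);                       // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  ascii(36, "data");
  view.setUint32(40, 0, true);                       // data chunk size (patched later)
  return header;
}

/** Returns a copy of the header with both size fields set for `dataBytes` of PCM audio. */
function patchWavSizes(header: Uint8Array, dataBytes: number): Uint8Array {
  const patched = header.slice();
  const view = new DataView(patched.buffer);
  view.setUint32(4, 36 + dataBytes, true);
  view.setUint32(40, dataBytes, true);
  return patched;
}
```

Because the total length is only known when recording stops, a header like this can only be finalized at commit time, which is consistent with the size being written alongside the final chunk.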

UploadBuffer

From a continuous stream to discrete chunks

The UploadBuffer manages the real-time streaming of data into uploadable chunks, turning a continuous source (audio, video, sensor data) into manageable units the queue can ship.

Stream-to-chunk transformation

Bounds memory during long recordings, creates network-optimal chunks, and enables granular progress tracking and partial recovery.

Async write management

A sequential write queue guarantees ordering, never blocks the UI, isolates failures, and handles backpressure.

Dynamic file management

Append-only writes with precise offset tracking, and graceful recovery of files left from previous sessions.
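
The stream-to-chunk idea can be sketched without any file I/O. The `ChunkCutter` class below is a hypothetical illustration, not the article's `UploadBuffer`: it only tracks byte counts and reports which fixed-size byte ranges have become complete, plus a trailing partial range when recording stops.

```typescript
interface ChunkRange {
  index: number;  // sequential chunk number
  offset: number; // byte offset of the chunk within the file
  length: number; // chunk length in bytes
}

class ChunkCutter {
  private readonly chunkSize: number;
  private pendingBytes = 0;
  private nextOffset: number;
  private nextIndex = 0;

  constructor(chunkSize: number, startOffset = 0) {
    this.chunkSize = chunkSize;
    this.nextOffset = startOffset;
  }

  /** Records newly appended bytes and returns every chunk that is now complete. */
  push(byteLength: number): ChunkRange[] {
    this.pendingBytes += byteLength;
    const ready: ChunkRange[] = [];
    // A loop (rather than a single check) also covers writes larger than one chunk.
    while (this.pendingBytes >= this.chunkSize) {
      ready.push({ index: this.nextIndex++, offset: this.nextOffset, length: this.chunkSize });
      this.nextOffset += this.chunkSize;
      this.pendingBytes -= this.chunkSize;
    }
    return ready;
  }

  /** Returns the trailing partial chunk, if any, once the source has ended. */
  flush(): ChunkRange | undefined {
    if (this.pendingBytes === 0) return undefined;
    const tail = { index: this.nextIndex++, offset: this.nextOffset, length: this.pendingBytes };
    this.nextOffset += this.pendingBytes;
    this.pendingBytes = 0;
    return tail;
  }
}
```

Keeping the boundary arithmetic separate from the file writes makes it easy to unit test offsets and indices without touching the file system.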

Diagram: stream in → UploadBuffer (accumulating up to 64 KB) → chunks out (00, 01, 02, 03)

The core write method defers file initialization until the first write, calculates chunk boundaries atomically, and queues every file operation to preserve order:

UploadBuffer.ts

```typescript
public write(base64Data: string): void {
  // Decode only to count bytes; the append below takes the base64 string directly.
  const data = Buffer.from(base64Data, "base64");
  this.currentChunkSize += data.byteLength;

  // Every file operation goes through the write queue so appends stay in order.
  this._writeQueue.queue(Math.random().toString(), async () => {
    // Lazy initialization: resume at an existing file's size, otherwise start at 0.
    if (this.currentFileOffset === undefined) {
      if (await FS.exists(this._uploadQueue.filePath)) {
        const { size } = await FS.stat(this._uploadQueue.filePath);
        this.currentFileOffset = size;
      } else {
        this.currentFileOffset = 0;
      }
    }

    await FS.appendFile(this._uploadQueue.filePath, base64Data, "base64");
    // Once a full chunk has accumulated, hand its byte range to the upload queue.
    if (this.currentChunkSize >= this._chunkSize) {
      const offset = this.currentFileOffset;
      this.currentFileOffset += this._chunkSize;
      this.currentChunkSize -= this._chunkSize;
      this._uploadQueue.queue(offset, this._chunkSize);
    }
  });
}
```

Lazy file initialization

File operations are deferred until the first write, avoiding unnecessary I/O and handling existing files gracefully.

Atomic chunk processing

Byte counts and offsets are updated atomically with chunk creation, then handed off to the upload queue.

Asynchronous coordination

All file operations are queued, preserving strict write ordering while never blocking on slow I/O.
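
The article does not show how the write queue itself is implemented. One common way to get strict ordering without blocking the caller is to chain each task onto the previous task's promise; the `SerialQueue` class below is a hypothetical sketch of that pattern, with a different signature from the `_writeQueue.queue(key, fn)` call above.

```typescript
class SerialQueue {
  // The tail always resolves, so one failed task cannot stall the tasks behind it.
  private tail: Promise<void> = Promise.resolve();

  /** Runs `task` after every previously enqueued task has settled. */
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the internal chain; the caller still sees it via `result`.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

Callers get back a promise per task, so a failed append can be reported or retried individually while later writes still run in their original order.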

UploadQueue

Chunked, resumable, persistent uploads

Mobile uploads face unreliable networks, app backgrounding, and memory constraints. The UploadQueue is a chunked, resumable and persistent upload system built for exactly those conditions.

Chunked strategy

Files break into small chunks: memory-efficient, individually retryable, and progress-trackable.

Persistent state

Upload state survives app sessions: pending chunks, completion metadata, and error state.

Network-aware

Automatic pause when offline, resume on reconnect, and in-flight request cancellation.

The upload-stream abstraction

UploadStream.ts

```typescript
export interface UploadStream<T> {
  upsertChunk(data: string | Uint8Array, chunkIndex: number): Promise<void>;
  commitChunks(completionState: T): Promise<void>;
}
```
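
An in-memory implementation of this interface is a convenient test double and shows why upserting by index makes retries safe. The `InMemoryUploadStream` class below is hypothetical; the gap check in `commitChunks` is an assumption about what a reasonable backend might enforce, not documented behavior of the real service.

```typescript
interface UploadStream<T> {
  upsertChunk(data: string | Uint8Array, chunkIndex: number): Promise<void>;
  commitChunks(completionState: T): Promise<void>;
}

class InMemoryUploadStream<T> implements UploadStream<T> {
  readonly chunks = new Map<number, string | Uint8Array>();
  isCommitted = false;
  completionState: T | undefined = undefined;

  async upsertChunk(data: string | Uint8Array, chunkIndex: number): Promise<void> {
    if (this.isCommitted) throw new Error("stream already committed");
    // Writing the same index twice overwrites the slot, so a retry cannot duplicate audio.
    this.chunks.set(chunkIndex, data);
  }

  async commitChunks(completionState: T): Promise<void> {
    // Assumed rule: refuse to commit if any index below the chunk count is missing.
    for (let i = 0; i < this.chunks.size; i++) {
      if (!this.chunks.has(i)) throw new Error(`missing chunk ${i}`);
    }
    this.completionState = completionState;
    this.isCommitted = true;
  }
}
```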

Initialization recovers pending chunks

UploadQueue.create.ts

```typescript
public static async create<T extends UploadStream<any>>(
  filePath: string,
  userPartition: string,
  targetStream: T,
) {
  const bufferedStorage = await BufferStorage.create<string, QueuedUploadStat>({
    storage: new AsyncStorage<QueuedUploadStat>(`${userPartition}-queuedUploads`),
    debounceMs: 200,
  });
  const queue = new UploadQueue<T>(filePath, bufferedStorage, targetStream);
  await queue.queuePendingChunks();
  return queue;
}
```
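
The body of `queuePendingChunks` is not shown, so the following is only a sketch of what recovery could look like. It assumes the persisted state is JSON holding the pending chunk list plus the next index to assign; `PersistedQueueState`, `saveState` and `loadState` are hypothetical names, and a plain `Map` stands in for AsyncStorage.

```typescript
interface QueuedChunk {
  offset: number;
  length: number;
  index: number;
}

// Assumed shape: the article only shows that the persisted stat has a `chunks` array.
interface PersistedQueueState {
  chunks: QueuedChunk[];
  nextIndex: number;
}

function saveState(store: Map<string, string>, key: string, state: PersistedQueueState): void {
  store.set(key, JSON.stringify(state));
}

/** Loads pending chunks in upload order; an unknown key yields an empty queue. */
function loadState(store: Map<string, string>, key: string): PersistedQueueState {
  const raw = store.get(key);
  if (raw === undefined) return { chunks: [], nextIndex: 0 };
  const parsed = JSON.parse(raw) as PersistedQueueState;
  const chunks = parsed.chunks.slice().sort((a, b) => a.index - b.index);
  return { chunks, nextIndex: parsed.nextIndex };
}
```

Persisting the next index separately from the pending list matters in this sketch: if every queued chunk had already uploaded before a restart, the pending list alone would not reveal which index comes next.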

Queuing persists before uploading

UploadQueue.queue.ts

```typescript
public queue(offset: number, length: number): void {
  const stat = this.stat();
  const index = this.lastIndex === undefined ? 0 : this.lastIndex + 1;
  stat.chunks.push({ offset, length, index });
  this.storage.set(this.storageKey, stat);
  this.lastIndex = index;
  this.queueUpload();
}
```

Execution is a careful state machine

UploadQueue.uploadNext.ts

```typescript
private async uploadNext(): Promise<void> {
  if (!(await this.networkConsumer.isOnline(true)) || this._paused) {
    this.stopUploads();
    return;
  }

  const stat = this.stat();
  const chunk = stat.chunks.shift();
  if (!chunk) {
    this.stopUploads();
    return;
  }

  // read chunk from file, upsert to the stream, retry on failure
}
```
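
The listing stops at the comment describing the read, upsert and retry step. As a rough, standalone illustration of that step (not the article's code), the sketch below drains a pending list one chunk at a time and puts a failed chunk back at the front. `DrainDeps`, `drainQueue` and the `maxAttempts` cap are hypothetical; the "failed" result stands in for a persistent error flag.

```typescript
interface QueuedChunk {
  offset: number;
  length: number;
  index: number;
}

interface DrainDeps {
  isOnline(): Promise<boolean>;
  readChunk(chunk: QueuedChunk): Promise<Uint8Array>;
  upsertChunk(data: Uint8Array, chunkIndex: number): Promise<void>;
}

type DrainResult = "done" | "offline" | "failed";

/** Uploads chunks strictly in order, with a single upload in flight at any time. */
async function drainQueue(
  pending: QueuedChunk[],
  deps: DrainDeps,
  maxAttempts = 3,
): Promise<DrainResult> {
  let consecutiveFailures = 0;
  while (pending.length > 0) {
    if (!(await deps.isOnline())) return "offline";
    const chunk = pending.shift() as QueuedChunk;
    try {
      const data = await deps.readChunk(chunk); // read on demand, never the whole file
      await deps.upsertChunk(data, chunk.index);
      consecutiveFailures = 0;
    } catch {
      pending.unshift(chunk); // return the chunk to the front so order is preserved
      consecutiveFailures += 1;
      if (consecutiveFailures >= maxAttempts) return "failed";
    }
  }
  return "done";
}
```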

Network failure recovery

UploadQueue.network.ts

```typescript
private handleOnlineChange(isOnline: boolean): void {
  if (isOnline) {
    this.queueUpload();
  } else {
    this.uploadQueue.cancelAll();
    this.stopUploads();
  }
}
```

Failed chunks are returned to the front of the queue for immediate retry.
Persistent error flags prevent infinite retry loops; backoff can be layered on.
Sequential indexing guarantees chunks never upload out of order.
A single active upload at a time prevents resource contention; reads stream on demand.
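
The list above notes that backoff can be layered on. One possible shape for that layer is capped exponential backoff with optional jitter; the function names, the 500 ms base and the 30-second cap below are illustrative assumptions, not values from the production system.

```typescript
/** Delay before retry number `attempt` (1-based): doubles each time, capped at `capMs`. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  const exponent = Math.max(0, attempt - 1);
  return Math.min(capMs, baseMs * 2 ** exponent);
}

/** "Full jitter": pick a random delay in [0, delay) so many clients do not retry in lockstep. */
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}
```

With these defaults the first three retries wait 500 ms, 1,000 ms and 2,000 ms, and anything from the seventh retry onward waits the 30-second cap. The `random` parameter is injectable so tests can be deterministic.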

Putting it together

record-and-upload.ts

```typescript
// Create an upload queue for the audio file
const uploadQueue = await UploadQueue.create(
  audioFilePath,
  userPartition,
  new DialogueWav(apolloClients, mediaReference, audioFilePath),
);

uploadQueue.onProgressChanged((progress) => updateProgressBar(progress));
uploadQueue.onUploadingComplete(async () => {
  await processCompletedUpload();
  await cleanupTempFiles();
});

// Queue file chunks
const fileSize = await getFileSize(audioFilePath);
const chunkSize = 64 * 1024; // 64KB chunks
for (let offset = 0; offset < fileSize; offset += chunkSize) {
  const length = Math.min(chunkSize, fileSize - offset);
  uploadQueue.queue(offset, length);
}

// Finalize
uploadQueue.close({ dialogueId, duration: recordingDuration });
```

Audio capture

Recording with react-native-audio-record

The capture layer uses react-native-audio-record for real-time recording, tuned for speech and wired directly into the upload pipeline.

Optimized for speech

recorderOptions.ts

```typescript
export const recorderOptions: RecorderOptions = {
  sampleRate: 16000,  // optimal for speech processing
  channels: 1,        // mono recording for efficiency
  bitsPerSample: 16,  // standard quality for voice
  audioSource: 6,     // Android only: 6 = VOICE_RECOGNITION microphone source
};
```
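
These settings fix the data rate of the whole pipeline, which is worth working out explicitly. The helpers below are a small illustrative calculation, not part of the recorder library; the names are hypothetical.

```typescript
interface PcmFormat {
  sampleRate: number;    // samples per second
  channels: number;
  bitsPerSample: number;
}

const speechFormat: PcmFormat = { sampleRate: 16000, channels: 1, bitsPerSample: 16 };

/** Raw PCM bytes produced per second of audio. */
function bytesPerSecond(format: PcmFormat): number {
  return format.sampleRate * format.channels * (format.bitsPerSample / 8);
}

/** Seconds of audio held by a chunk of `chunkBytes` raw bytes. */
function secondsPerChunk(format: PcmFormat, chunkBytes: number): number {
  return chunkBytes / bytesPerSecond(format);
}

/** Raw PCM bytes in a frame lasting `durationMs` milliseconds. */
function bytesForDuration(format: PcmFormat, durationMs: number): number {
  return (bytesPerSecond(format) * durationMs) / 1000;
}

/** Length of the padded base64 string that encodes `byteLength` raw bytes. */
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}
```

Under these settings the stream runs at 32,000 bytes per second, so a 64 KB chunk holds about 2.05 seconds of audio and a 20 to 40 ms frame is 640 to 1,280 bytes. Base64 adds roughly a third on top: a 65,536-byte chunk becomes an 87,384-character string.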

Recording lifecycle

Lifecycle states: Idle · In progress · Paused · Error

Automatic safety limits: an 84-minute maximum with a 5-minute warning. Pause and resume preserve session state; silent gaps are detected automatically.
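
The lifecycle and the safety limit can be modeled as a small state table plus a pure limit check. This is a sketch under assumptions: the article names the four states but not the transitions between them, and it does not say when the warning fires, so the transition table and the "warn five minutes before the limit" reading below are guesses.

```typescript
type RecordingState = "idle" | "inProgress" | "paused" | "error";
type RecordingEvent = "start" | "pause" | "resume" | "stop" | "fail";

// Assumed transitions; anything not listed leaves the state unchanged.
const transitions: Record<RecordingState, Partial<Record<RecordingEvent, RecordingState>>> = {
  idle: { start: "inProgress" },
  inProgress: { pause: "paused", stop: "idle", fail: "error" },
  paused: { resume: "inProgress", stop: "idle", fail: "error" },
  error: { stop: "idle" },
};

function nextState(state: RecordingState, event: RecordingEvent): RecordingState {
  return transitions[state][event] ?? state;
}

const MAX_RECORDING_MS = 84 * 60 * 1000; // 84-minute maximum from the article
const WARNING_LEAD_MS = 5 * 60 * 1000;   // assumed: warn 5 minutes before the limit

type LimitStatus = "ok" | "warning" | "limitReached";

/** Classifies recorded time (excluding paused time) against the safety limit. */
function limitStatus(recordedMs: number): LimitStatus {
  if (recordedMs >= MAX_RECORDING_MS) return "limitReached";
  if (recordedMs >= MAX_RECORDING_MS - WARNING_LEAD_MS) return "warning";
  return "ok";
}
```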

Audio flows directly from the native recording layer into the upload buffer, enabling real-time cloud streaming without accumulating audio in memory:

AudioRecord.on.ts

```typescript
AudioRecord.on("data", (...props) => {
  uploadBuffer.write(...props);            // stream to the upload system
  throttledUpdateAudioChunk(...props);     // update UI visualization (100ms)
});
```

Throttled UI updates (100ms) prevent Android UI blocking while the waveform renders.
A streaming architecture avoids accumulating audio data in memory.
Keep-awake integration during active recording; cross-platform permission handling with graceful fallback.
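
The article does not show how `throttledUpdateAudioChunk` is built. A simple leading-edge throttle is one way to get the 100 ms behavior described above; the `throttle` helper below is a hypothetical sketch, with an injectable clock so it can be tested without timers.

```typescript
/**
 * Wraps `fn` so it runs at most once per `intervalMs`.
 * Calls that arrive too early are dropped rather than queued, which suits a
 * waveform display where only the most recent frames matter.
 */
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: A) => void {
  let lastCall = Number.NEGATIVE_INFINITY;
  return (...args: A): void => {
    const current = now();
    if (current - lastCall < intervalMs) return; // too soon: skip this update
    lastCall = current;
    fn(...args);
  };
}
```

Only the visualization path should be wrapped this way. The upload path has to see every frame, so it is called directly in the data callback.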

Mobile considerations

iOS and Android pull in different directions

The same pipeline meets two very different runtimes. Designing for both from the start avoids surprises in production.

iOS

Requires UIBackgroundModes: ['audio'] for background recording.
More consistent audio frame delivery.
Fast file writes → minimal timing drift.

Android

Requires the RECORD_AUDIO permission.
Throttle UI updates (waveform ~every 100ms).
Avoid heavy JS work inside the data callback.

Performance

Lessons from timing-sensitive audio

Throttle the JS bridge: send buffer chunks (20-40 ms) instead of per-sample calls.
Binary WebSocket frames outperform base64 for high throughput.
Keep UI updates separate: waveform rendering uses throttled state or off-thread animation.
Use native modules for playback: JS decoding isn’t fast enough for 44.1kHz PCM.
A backpressure strategy (dropping late chunks) keeps real-time latency bounded under load.
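
The last point, dropping late chunks, can be sketched as a bounded queue that discards its oldest entry when full. `BoundedFrameQueue` is a hypothetical illustration; a drop policy like this fits live consumers such as a waveform or a monitor, and would not suit a path that has to deliver every byte.

```typescript
class BoundedFrameQueue<T> {
  private readonly frames: T[] = [];
  private readonly capacity: number;
  droppedCount = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  /** Adds a frame; when the queue is full the oldest frame is discarded first. */
  push(frame: T): void {
    if (this.frames.length >= this.capacity) {
      this.frames.shift();
      this.droppedCount += 1;
    }
    this.frames.push(frame);
  }

  /** Removes and returns the oldest remaining frame, if any. */
  shift(): T | undefined {
    return this.frames.shift();
  }

  size(): number {
    return this.frames.length;
  }
}
```

Because the queue never grows past its capacity, the worst-case delay before a consumer sees fresh audio is bounded by capacity times frame duration, and `droppedCount` gives a direct measure of how often the consumer fell behind.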

Going beyond

Toward a smarter clinical audio layer

Once the pipeline works, it becomes a foundation for clinical audio intelligence: the same architecture that powers real-time note drafting, assistive prompts and richer decision support.

Noise classification

Integrate models that label sounds and environments in real time.

Pattern triggers

Fire actions on sound intensity or recognized patterns.

Full-duplex audio

Use WebRTC for talkback and two-way features.

Cloud intelligence

Stream to speech recognition or anomaly detection services.

Conclusion

Real-time, built entirely in React Native

Building real-time clinical audio streaming pushed the limits of what React Native can do with real-time data. It took native audio APIs, resilient transport, and efficient visualization. But the result was a fully functional real-time capture layer, built entirely in React Native. Whether you’re building a clinical scribe, an AI sound analyzer, or a remote monitor, these same principles can power your architecture.
