Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With advances in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.
Protecting personally identifiable information (PII) is a critical aspect of data security, driven by both ethical obligations and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper foundation model (FM) Whisper Large V3 Turbo, available in Amazon Bedrock Marketplace, which offers access to over 140 models through a dedicated offering, to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.
Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.
In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.
Solution overview
The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.
The workflow consists of the following steps:
- The React application is hosted in an S3 bucket and served to users through CloudFront for fast, global access. API Gateway handles interactions between the frontend and backend services.
- Users upload audio or video files directly from the app. These recordings are stored in a designated S3 bucket for processing.
- An Amazon EventBridge rule detects the S3 upload event and triggers the Step Functions state machine, initiating the AI-powered processing pipeline.
- The state machine performs audio transcription, summarization, and redaction by orchestrating multiple Amazon Bedrock models in sequence. It uses Whisper for transcription, Claude for summarization, and Amazon Bedrock Guardrails to redact sensitive data.
- The redacted summary is returned to the frontend application and displayed to the user.
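The S3-to-state-machine trigger in the third step corresponds to an EventBridge rule that matches S3 Object Created events. A minimal event pattern might look like the following sketch; the bucket name is a placeholder, and EventBridge notifications must be enabled on the bucket:

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["<your-recordings-bucket>"]
    }
  }
}
```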
The following diagram illustrates the state machine workflow.
The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio/video recordings:
- A Lambda function is triggered to gather input details (for example, the Amazon S3 object path and metadata) and prepare the payload for transcription.
- The payload is sent to the OpenAI Whisper Large V3 Turbo model through Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
- The raw transcript is passed to Anthropic's Claude 3.5 Sonnet through Amazon Bedrock, which produces a concise and coherent summary of the conversation or content.
- A second Lambda function validates and forwards the summary to the redaction step.
- The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive data.
- The redacted summary is stored or returned to the frontend application through an API, where it's displayed to the user.
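The first step above can be sketched as a small Lambda handler that extracts the S3 object details from the EventBridge event and shapes the payload for the transcription state. This is a hypothetical illustration; the function and field names are assumptions, not taken from the solution repository:

```python
# Hypothetical sketch of the payload-preparation Lambda. It assumes the
# EventBridge S3 "Object Created" event shape; "contentType" is a placeholder
# piece of metadata for downstream steps.
def prepare_transcription_input(event):
    """Extract bucket and key from an EventBridge S3 event and build the payload."""
    detail = event["detail"]
    return {
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
        "contentType": "audio",  # placeholder metadata
    }

def lambda_handler(event, context):
    # Step Functions passes this return value to the transcription state
    return prepare_transcription_input(event)
```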
Prerequisites
Before you start, make sure that you have the following prerequisites in place:
Create a guardrail in the Amazon Bedrock console
For instructions on creating guardrails in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:
- Enable PII detection and handling
- Set the PII action to Redact
- Add the relevant PII types, such as:
  - Names and identities
  - Phone numbers
  - Email addresses
  - Physical addresses
  - Financial information
  - Other sensitive personal information
After you deploy the guardrail, note the Amazon Resource Name (ARN); you will use this when deploying the model.
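The same guardrail can also be created programmatically. The following is a minimal sketch, assuming the boto3 "bedrock" control-plane client; the guardrail name and blocked-message strings are placeholders, and the ANONYMIZE action corresponds to the Redact setting in the console:

```python
# Sketch of creating a PII-redaction guardrail with the CreateGuardrail API.
# The name and messages below are placeholder values.
def build_pii_policy(pii_types):
    """Build the sensitive-information policy passed to create_guardrail."""
    return {
        "piiEntitiesConfig": [
            # ANONYMIZE replaces detected entities with {TYPE} placeholders
            {"type": pii_type, "action": "ANONYMIZE"} for pii_type in pii_types
        ]
    }

def create_redaction_guardrail(bedrock_client):
    """bedrock_client is expected to be boto3.client('bedrock')."""
    response = bedrock_client.create_guardrail(
        name="audio-summary-pii-guardrail",  # placeholder name
        blockedInputMessaging="This input was blocked by the guardrail.",
        blockedOutputsMessaging="This output was blocked by the guardrail.",
        sensitiveInformationPolicyConfig=build_pii_policy(
            ["NAME", "PHONE", "EMAIL", "ADDRESS"]
        ),
    )
    return response["guardrailArn"]
```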
Deploy the Whisper model
Complete the following steps to deploy the Whisper Large V3 Turbo model:
- On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
- Search for and choose Whisper Large V3 Turbo.
- On the options menu (three dots), choose Deploy.
- Modify the endpoint name, number of instances, and instance type to suit your specific use case. For this post, we use the default settings.
- Modify the Advanced settings section to suit your use case. For this post, we use the default settings.
- Choose Deploy.
This creates a new AWS Identity and Access Management (IAM) role and deploys the model.
You can choose Marketplace deployments in the navigation pane, and in the Managed deployments section, you can see the endpoint status as Creating. Wait for the endpoint to finish deployment and the status to change to In Service, then copy the endpoint name; you will use this when deploying the solution infrastructure.
Deploy the solution infrastructure
In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.
We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:
- React frontend application
- Backend infrastructure
- S3 buckets for storing uploads and processed results
- Step Functions state machine with Lambda functions for audio processing and PII redaction
- API Gateway endpoints for handling requests
- IAM roles and policies for secure access
- CloudFront distribution for hosting the frontend
Implementation deep dive
The backend consists of a series of Lambda functions, each handling a specific stage of the audio processing pipeline:
- Upload handler – Receives audio files and stores them in Amazon S3
- Transcription with Whisper – Converts speech to text using the Whisper model
- Speaker detection – Differentiates and labels individual speakers within the audio
- Summarization using Amazon Bedrock – Extracts and summarizes key points from the transcript
- PII redaction – Uses Amazon Bedrock Guardrails to remove sensitive information for privacy compliance
Let's examine some of the key components.
The transcription Lambda function uses the Whisper model to convert audio files to text:
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio bytes to hex string format
    hex_audio = audio_chunk.hex()

    # Create the payload for the Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }

    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )

    # Parse the transcription response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    transcription_text = response_body['text']
    return transcription_text
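A caller of transcribe_with_whisper might look like the following sketch. The chunk size is an assumption (the endpoint's request-size limit governs the real value), as is reading from a local file rather than Amazon S3:

```python
# Hypothetical caller that splits an audio file into chunks and stitches the
# partial transcripts; transcribe_with_whisper is the function shown above.
def chunk_audio(data: bytes, chunk_size: int = 5 * 1024 * 1024):
    """Split raw audio bytes into fixed-size chunks for sequential invocation."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def transcribe_file(path, endpoint_name):
    with open(path, "rb") as f:
        audio = f.read()
    # Invoke the endpoint once per chunk and join the partial transcripts
    return " ".join(
        transcribe_with_whisper(chunk, endpoint_name)
        for chunk in chunk_audio(audio)
    )
```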
We use Amazon Bedrock to generate concise summaries from the transcriptions:
bedrock_runtime = boto3.client("bedrock-runtime")

def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

    # Call Amazon Bedrock for summarization using the Anthropic Messages API
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    # Extract and return the summary text
    result = json.loads(response.get('body').read())
    return result.get('content')[0].get('text')
A critical component of our solution is the automated redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:
def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to the API requirements
    formatted_content = [{"text": {"text": content}}]

    # Call the ApplyGuardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Treat the summary as model output
        content=formatted_content
    )

    # Extract the redacted text from the response
    if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
        if len(response['outputs']) > 0:
            output = response['outputs'][0]
            if 'text' in output and isinstance(output['text'], str):
                return output['text']

    # Return the original content if the guardrail did not intervene
    return content
When PII is detected, it's replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive data.
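For offline testing of downstream consumers of these summaries, a purely local stand-in can mimic the placeholder format. The following sketch uses regular expressions and is illustrative only; it is not how Amazon Bedrock Guardrails detects PII:

```python
import re

# Local stand-in that mimics the {PHONE}/{EMAIL} placeholder format.
# The patterns are deliberately simple and are an assumption for testing,
# not a faithful reproduction of the Guardrails detectors.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_locally(text: str) -> str:
    """Replace matched spans with {TYPE} placeholders, one type at a time."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub("{%s}" % label, text)
    return text
```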
To manage the complex processing pipeline, we use Step Functions to orchestrate the Lambda functions:
{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}
This workflow makes sure each step completes successfully before proceeding to the next, with automated error handling and retry logic built in.
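The retry and error-handling behavior is declared per state in the Amazon States Language. It is not shown in the trimmed definition above, but it would look along the lines of the following fragment; the interval and attempt values are illustrative, and the "HandleFailure" state is a hypothetical catch-all:

```json
"TranscribeAudio": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Retry": [
    {
      "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "Next": "HandleFailure"
    }
  ],
  "Next": "IdentifySpeakers"
}
```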
Test the solution
After you've successfully completed the deployment, you can use the CloudFront URL to test the solution functionality.
Security considerations
Security is a critical aspect of this solution, and we've implemented several best practices to support data protection and compliance:
- Sensitive data redaction – Automatically redact PII to protect user privacy.
- Fine-grained IAM permissions – Apply the principle of least privilege across AWS services and resources.
- Amazon S3 access controls – Use strict bucket policies to limit access to authorized users and roles.
- API security – Secure API endpoints using Amazon Cognito for user authentication (optional but recommended).
- CloudFront security – Enforce HTTPS and apply modern TLS protocols to facilitate secure content delivery.
- Amazon Bedrock data security – Amazon Bedrock (including Amazon Bedrock Marketplace) protects customer data and doesn't send data to providers or train using customer data. This makes sure your proprietary information stays secure when using AI capabilities.
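As one concrete example of the S3 access controls above, a bucket policy can deny any request made over plain HTTP, so that recordings are only ever transferred over TLS. The bucket name below is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<your-recordings-bucket>",
        "arn:aws:s3:::<your-recordings-bucket>/*"
      ],
      "Condition": {
        "Bool": {"aws:SecureTransport": "false"}
      }
    }
  ]
}
```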
Clean up
To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you're done:
- Delete the Amazon Bedrock guardrail:
  - On the Amazon Bedrock console, in the navigation menu, choose Guardrails.
  - Choose your guardrail, then choose Delete.
- Delete the Whisper Large V3 Turbo model deployed through Amazon Bedrock Marketplace:
  - On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane.
  - In the Managed deployments section, select the deployed endpoint and choose Delete.
- Delete the AWS CDK stack by running the command cdk destroy, which deletes the AWS infrastructure.
Conclusion
This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we've built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.
The automated PII redaction feature helps support compliance with privacy regulations, making this solution well-suited for regulated industries such as healthcare, finance, and legal services where data protection is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.
About the Authors
Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives GenAI adoption to the cloud. Previously, Kaiyin worked in the Smart Home domain, assisting customers in integrating voice and IoT technologies.
Sid Vantair is a Solutions Architect with AWS covering Strategic accounts. He thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.