Meetings play a vital role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.
Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.
In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how to choose the best model for a meeting summarization task.
LLMs to generate meeting insights
Modern LLMs are highly effective for summarization and action item extraction due to their ability to understand context, infer topic relationships, and generate structured outputs. For these use cases, prompt engineering provides a more efficient and scalable approach than traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model's behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without resource-intensive retraining. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It allows prompts to be flexibly adjusted to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.
Amazon Nova models and Amazon Bedrock
Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They are among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.
The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, balance of speed and intelligence, ideal for most enterprise needs), and Nova Premier (multimodal, the most capable Nova model for complex tasks and a teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and more cost-effective model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.
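For reference, the following is a minimal sketch of calling a Nova model through the Converse API with boto3. The Region, model ID, and inference settings are illustrative placeholders, not the exact configuration used in this benchmark.

```python
import boto3

# Bedrock Runtime client; the Region is an illustrative placeholder.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model ID; swap in the Nova tier you want to benchmark.
MODEL_ID = "us.amazon.nova-lite-v1:0"

def generate_insights(prompt: str) -> str:
    """Send a single-turn prompt to the model and return the generated text."""
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```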
Solution overview
This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:
- Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and important updates from the meeting transcript
- Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project
The following diagram illustrates the solution workflow.
Prerequisites
To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For more information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.
Solution components
We developed the two core features of the solution, meeting summarization and action item extraction, using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.
For the meeting summarization task, we used persona assignment, prompting the LLM to generate a summary in a structured output format.
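The following is a minimal sketch of such a persona-based prompt. The persona wording, output tags, and formatting are illustrative assumptions rather than the exact prompt used in our experiments.

```python
# Illustrative persona-based summarization prompt (wording and tag names are
# assumptions, not the exact prompt used in the benchmark).
SUMMARIZATION_PROMPT = """You are an experienced project manager who writes clear,
concise meeting summaries. Read the meeting transcript below and produce a
high-level abstractive summary that captures the key discussion points, the
decisions that were made, and any important updates.

Write the summary inside <summary></summary> tags and do not add any text
outside the tags.

<transcript>
{transcript}
</transcript>"""
```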
For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, the prefix of the expected output tag is provided as prefilling to nudge the model generation in the right direction and to avoid redundant opening and closing sentences.
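The following sketch illustrates how this can look with the Converse API message format. The instructions, tag name, and prefilled assistant text are illustrative assumptions rather than the exact prompt used in our experiments.

```python
# Illustrative action item extraction request (instructions, tag names, and the
# prefilled assistant text are assumptions, not the exact prompt used here).
ACTION_ITEM_PROMPT = """You are an experienced project manager. From the meeting
transcript below, extract the action items that apply to the whole team or project.

Think step by step: first identify who committed to what, then consolidate the
commitments into a deduplicated list of concrete, actionable tasks. Return one
action item per line inside <action_items></action_items> tags.

<transcript>
{transcript}
</transcript>"""

def build_action_item_messages(transcript: str) -> list:
    """Build Converse API messages with a prefilled assistant turn."""
    return [
        {
            "role": "user",
            "content": [{"text": ACTION_ITEM_PROMPT.format(transcript=transcript)}],
        },
        # Prefilling the opening tag nudges the model to start the list directly
        # and skip redundant opening and closing sentences.
        {"role": "assistant", "content": [{"text": "<action_items>"}]},
    ]
```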
Different model families respond to the same prompts differently, and it's important to follow the prompting guidance defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.
Dataset
To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English-language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on generating structured, coherent summaries from complex, multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with approximately 8,600 tokens per transcript on average.
Evaluation framework
Achieving high-quality outputs from LLMs for meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic's Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three main metrics: faithfulness, summarization, and question answering (QA).
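As a simplified illustration of the LLM-as-a-judge pattern, the following sketch asks the judge model for a single faithfulness rating. The judge prompt and single-score rubric are assumptions for illustration only; the metrics described next are computed from finer-grained, statement-level and question-level checks.

```python
import boto3

# Bedrock Runtime client; the Region is an illustrative placeholder.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude 3.5 Sonnet v1 on Amazon Bedrock, used here as the judge model.
JUDGE_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

JUDGE_PROMPT = """You are evaluating a meeting summary against its source transcript.
Rate how faithful the summary is to the transcript on a scale from 0 to 1, where 1
means every statement in the summary is supported by the transcript.
Return only the numeric score.

<transcript>
{transcript}
</transcript>

<summary>
{summary}
</summary>"""

def judge_faithfulness(transcript: str, summary: str) -> float:
    """Ask the judge model for a single faithfulness rating of a summary."""
    response = bedrock_runtime.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{
            "role": "user",
            "content": [{"text": JUDGE_PROMPT.format(transcript=transcript, summary=summary)}],
        }],
        inferenceConfig={"maxTokens": 10, "temperature": 0.0},
    )
    return float(response["output"]["message"]["content"][0]["text"].strip())
```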
The faithfulness score measures the faithfulness of a generated summary as the proportion of parsed statements in the summary that are supported by the given context (for example, a meeting transcript), out of the total number of statements.
The summarization score is the combination of the QA score and the conciseness score with equal weight (0.5 each). The QA score measures the coverage of a generated summary of a meeting transcript. It first generates a list of question-and-answer pairs from the meeting transcript and then measures the proportion of those questions that are answered correctly when the summary is used as context instead of the meeting transcript. The QA score is complementary to the faithfulness score because the faithfulness score doesn't measure the coverage of a generated summary. We only used the QA score to measure the quality of a generated summary because action items aren't supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of a generated summary divided by the length of the full meeting transcript.
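Schematically, these definitions can be written as follows (a restatement of the descriptions above, not the exact reference implementation):

```latex
\[
\text{faithfulness} = \frac{\left|\{\text{statements in the summary supported by the transcript}\}\right|}{\left|\{\text{statements in the summary}\}\right|}
\]
\[
\text{QA} = \frac{\left|\{\text{generated questions answered correctly using only the summary}\}\right|}{\left|\{\text{generated questions}\}\right|},
\qquad
\text{conciseness} = \frac{\mathrm{len}(\text{summary})}{\mathrm{len}(\text{transcript})}
\]
\[
\text{summarization} = 0.5 \cdot \text{QA} + 0.5 \cdot \text{conciseness}
\]
```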
We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.
Results
Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83, respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with a 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) on this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics of the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.
Conclusion
In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.
For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.
About the Authors
Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from the University of South Florida and completed a postdoc at Moffitt Cancer Centre.
Sungmin Hong is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he helps expedite the variety of use cases of AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.
Mengdie (Flora) Wang is a Data Scientist at the AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master's degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.
Anila Joshi has more than a decade of experience building AI solutions. As an AWSI Geo Leader at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services by helping customers ideate, identify, and implement secure generative AI solutions.