{"id":13022,"date":"2026-03-24T00:14:15","date_gmt":"2026-03-24T00:14:15","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=13022"},"modified":"2026-03-24T00:14:16","modified_gmt":"2026-03-24T00:14:16","slug":"construct-a-wise-monetary-assistant-with-llamaparse-and-gemini-3-1","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=13022","title":{"rendered":"Construct a wise monetary assistant with LlamaParse and Gemini 3.1"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p><img decoding=\"async\" class=\"banner-image\" src=\"https:\/\/storage.googleapis.com\/gweb-developer-goog-blog-assets\/images\/llamaindex_gemini-api_1.original.png\" alt=\"llamaindex_gemini-api (1)\"\/>  <\/p>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">Extracting textual content from unstructured paperwork is a traditional developer headache. For many years, conventional Optical Character Recognition (OCR) techniques have struggled with advanced layouts, usually turning multi-column PDFs, embedded photographs, and nested tables into an unreadable mess of plain textual content.<\/p>\n<p data-block-key=\"4plgp\">As we speak, the multimodal capabilities of huge language fashions (LLMs) lastly make dependable doc understanding attainable.<\/p>\n<p data-block-key=\"5548k\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.cloud.llamaindex.ai\/llamaparse\/getting_started\">LlamaParse<\/a> bridges the hole between conventional OCR and vision-language agentic parsing. It delivers state-of-the-art textual content extraction throughout PDFs, displays, and pictures.<\/p>\n<p data-block-key=\"713p4\">On this publish, you&#8217;ll discover ways to use Gemini to energy LlamaParse, extract high-quality textual content and tables from unstructured paperwork, and construct an clever private finance assistant. As a reminder, Gemini fashions might make errors and shouldn&#8217;t be relied upon for skilled recommendation.<\/p>\n<h3 data-block-key=\"ai9ov\" id=\"why-llamaparse\"><b>Why LlamaParse?<\/b><\/h3>\n<p data-block-key=\"4jpk1\">In lots of circumstances, LLMs can already carry out this process successfully, nevertheless, when working with giant doc collections or extremely variable codecs, consistency and reliability can turn out to be more difficult.<\/p>\n<p data-block-key=\"6rrvf\">Devoted instruments like LlamaParse complement LLM capabilities by introducing preprocessing steps and customizable parsing directions, which assist construction advanced components comparable to giant tables or dense textual content. On the whole parsing benchmarks, this strategy has proven round a 13\u201315% enchancment in comparison with processing uncooked paperwork immediately.<\/p>\n<h3 data-block-key=\"mwrr6\" id=\"the-use-case:-parsing-brokerage-statements\"><b>The use case: parsing brokerage statements<\/b><\/h3>\n<p data-block-key=\"4ftb5\">Brokerage statements characterize the final word doc parsing problem. They include dense monetary jargon, advanced nested tables, and dynamic layouts.<\/p>\n<p data-block-key=\"4atar\">To assist customers perceive their monetary scenario, you want a workflow that not solely parses the file, however explicitly extracts the tables and explains the information via an LLM.<\/p>\n<p data-block-key=\"fscio\">Due to these superior reasoning and multimodal necessities, Gemini 3.1 Professional is the right match because the underlying mannequin. It balances an enormous context window with native spatial format comprehension.<\/p>\n<p data-block-key=\"d09ab\">The workflow operates in 4 levels:<\/p>\n<ol>\n<li data-block-key=\"7nkp3\"><b>Ingest:<\/b> You submit a PDF to the LlamaParse engine.<\/li>\n<li data-block-key=\"1h0c7\"><b>Route:<\/b> The engine parses the doc and emits a <code>ParsingDoneEvent<\/code>.<\/li>\n<li data-block-key=\"645tu\"><b>Extract:<\/b> This occasion triggers two parallel duties \u2014 textual content extraction and desk extraction \u2014 that run concurrently to reduce latency.<\/li>\n<li data-block-key=\"bk1gq\"><b>Synthesize:<\/b> As soon as each extractions full, Gemini generates a human-readable abstract.<\/li>\n<\/ol>\n<p data-block-key=\"9febc\">This two-model structure is a deliberate design selection: <b>Gemini 3.1 Professional<\/b> handles the exhausting layout-comprehension throughout parsing, whereas <b>Gemini 3 Flash<\/b> handles the ultimate summarization \u2014 optimizing for each accuracy and value.<\/p>\n<p data-block-key=\"2gt0i\">You&#8217;ll find the entire code for this tutorial within the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\">LlamaParse x Gemini demo GitHub repository<\/a>.<\/p>\n<h3 data-block-key=\"e8job\" id=\"setting-up-the-environment\"><b>Establishing the atmosphere<\/b><\/h3>\n<p data-block-key=\"17tnc\">First, set up the required Python packages for LlamaCloud, LlamaIndex workflows, and the Google GenAI SDK.<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-shell\"># with pip&#13;\npip set up llama-cloud-services llama-index-workflows pandas google-genai&#13;\n&#13;\n# with uv&#13;\nuv add llama-cloud-services llama-index-workflows pandas google-genai<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">Subsequent, export your API keys as atmosphere variables. Get a Gemini API key from <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/ai.studio\/api-keys\">AI Studio<\/a>, and a LlamaCloud API key from <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/cloud.llamaindex.ai\/\">the console<\/a>. <b>Safety Word:<\/b> By no means hardcode your API keys in your software supply code.<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-shell\">export LLAMA_CLOUD_API_KEY=\"your_llama_cloud_key\"&#13;\nexport GEMINI_API_KEY=\"your_google_api_key\"<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<h3 data-block-key=\"nc8e6\" id=\"step-1:-create-and-use-the-parser\"><b>Step 1: Create and use the parser<\/b><\/h3>\n<p data-block-key=\"c2omj\">Step one in your workflow is parsing. You create a LlamaParse shopper backed by Gemini 3.1 Professional and outline it in <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\/blob\/main\/src\/llamaparse_gemini\/resources.py\">sources.py<\/a> so you may inject it into your workflow as a useful resource:<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-python\">def get_llama_parse() -&gt; LlamaParse:&#13;\n    return LlamaParse(&#13;\n        api_key=os.getenv(\"LLAMA_CLOUD_API_KEY\"),&#13;\n        parse_mode=\"parse_page_with_agent\",&#13;\n        mannequin=\"gemini-3.1-pro\",&#13;\n        result_type=ResultType.MD,&#13;\n    )<\/code><\/pre>\n<p>\n        Python\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">The <code>parse_page_with_agent<\/code> mode applies a layer of agentic iteration guided by Gemini to right and format OCR outcomes based mostly on visible context.<\/p>\n<p data-block-key=\"8tnn0\">In <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\/blob\/main\/src\/llamaparse_gemini\/workflow.py\">workflow.py<\/a>, outline the occasions, state, and the parsing step:<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-python\">class BrokerageStatementWorkflow(Workflow):&#13;\n    @step&#13;\n    async def parse_file(&#13;\n        self,&#13;\n        ev: FileEvent,&#13;\n        ctx: Context[WorkflowState],&#13;\n        parser: Annotated[LlamaParse, Resource(get_llama_parse)]&#13;\n    ) -&gt; ParsingDoneEvent | OutputEvent:&#13;\n        consequence = forged(ParsingJobResult, (await parser.aparse(file_path=ev.input_file)))&#13;\n        async with ctx.retailer.edit_state() as state:&#13;\n            state.parsing_job_result = consequence&#13;\n        return ParsingDoneEvent()<\/code><\/pre>\n<p>\n        Python\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">Discover that you don&#8217;t course of parsing outcomes instantly. As an alternative, you retailer them within the world <code>WorkflowState<\/code> so they&#8217;re accessible for the extraction steps that observe.<\/p>\n<h3 data-block-key=\"gsy7s\" id=\"step-2:-extract-the-text-and-tables\"><b>Step 2: Extract the textual content and tables<\/b><\/h3>\n<p data-block-key=\"912bm\">To supply the LLM with the context required to elucidate the monetary assertion, it&#8217;s good to extract the total markdown textual content and the tabular knowledge. Add the extraction steps to your <code>BrokerageStatementWorkflow<\/code> class (see the total implementation in <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\/blob\/main\/src\/llamaparse_gemini\/workflow.py\">workflow.py<\/a>):<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-python\">@step&#13;\nasync def extract_text(self, ev: ParsingDoneEvent, ctx: Context[WorkflowState]) -&gt; TextExtractionDoneEvent:&#13;\n    # Extraction logic omitted for brevity. See repo.&#13;\n&#13;\n@step&#13;\nasync def extract_tables(self, ev: ParsingDoneEvent, ctx: Context[WorkflowState], ...) -&gt; TablesExtractionDoneEvent:&#13;\n    # Extraction logic omitted for brevity. See repo.<\/code><\/pre>\n<p>\n        Python\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">As a result of each steps hear for a similar <code>ParsingDoneEvent<\/code>, LlamaIndex Workflows robotically executes them <b>in parallel<\/b>. This implies your textual content and desk extractions run concurrently \u2014 chopping total pipeline latency and making the structure naturally scalable as you add extra extraction duties.<\/p>\n<h3 data-block-key=\"zmzsz\" id=\"step-3:-generate-the-summary\"><b>Step 3: Generate the abstract<\/b><\/h3>\n<p data-block-key=\"bq4uq\">With the information extracted, you may immediate Gemini 3.1 Professional to generate a abstract in accessible, non-technical language.<\/p>\n<p data-block-key=\"2c2ts\">Configure the LLM shopper and immediate template in <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\/blob\/main\/src\/llamaparse_gemini\/resources.py\">sources.py<\/a>. Right here, you employ <b>Gemini 3 Flash<\/b> for the ultimate summarization, because it presents low latency and value effectivity for textual content aggregation duties.<\/p>\n<p data-block-key=\"cduou\">The ultimate synthesis step makes use of <code>ctx.collect_events<\/code> to attend for each extractions to finish earlier than calling the Gemini API.<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-python\">@step&#13;\nasync def ask_llm(&#13;\n    self,&#13;\n    ev: TablesExtractionDoneEvent | TextExtractionDoneEvent,&#13;\n    ctx: Context[WorkflowState],&#13;\n    llm: Annotated[GenAIClient, Resource(get_llm)],&#13;\n    template: Annotated[Template, Resource(get_prompt_template)]&#13;\n) -&gt; OutputEvent:&#13;\n    if ctx.collect_events(ev, [TablesExtractionDoneEvent, TextExtractionDoneEvent]) is None:&#13;\n        return None&#13;\n    # Full immediate and LLM name accessible in repo.<\/code><\/pre>\n<p>\n        Python\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<h3 data-block-key=\"o4qv8\" id=\"running-the-workflow\"><b>Working the workflow<\/b><\/h3>\n<p data-block-key=\"9fahi\">To tie all of it collectively, the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\/blob\/main\/src\/llamaparse_gemini\/main.py\">essential.py<\/a> entry level creates and runs the workflow:<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-python\">wf = BrokerageStatementWorkflow(timeout=600)&#13;\nconsequence = await wf.run(start_event=FileEvent(input_file=input_file))<\/code><\/pre>\n<p>\n        Python\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">To check the workflow, obtain a pattern assertion from the LlamaIndex datasets:<\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-shell\">curl -L https:\/\/uncooked.githubusercontent.com\/run-llama\/llama-datasets\/essential\/llama_agents\/bank_statements\/brokerage_statement.pdf &gt; brokerage_statement.pdf<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div class=\"inner-block-content code-block line-numbers\">\n<pre><code class=\"language-shell\"># Utilizing pip&#13;\npython3 essential.py brokerage_statement.pdf&#13;\n&#13;\n# Utilizing uv&#13;\nuv run run-workflow brokerage_statement.pdf<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div class=\"inner-block-content rich-content\">\n<p data-block-key=\"szawa\">You now have a completely useful private finance assistant operating in your terminal, able to analyzing advanced monetary PDFs.<\/p>\n<h3 data-block-key=\"kdfmu\" id=\"next-steps\"><b>Subsequent steps<\/b><\/h3>\n<p data-block-key=\"akcgd\">AI pipelines are solely pretty much as good as the information you feed them. By combining Gemini 3.1 Professional&#8217;s multimodal reasoning with LlamaParse&#8217;s agentic ingestion, you guarantee your functions have the total, structured context they want \u2014 not simply flattened textual content.<\/p>\n<p data-block-key=\"79m67\">Once you base your structure on event-driven statefulness, just like the parallel extractions demonstrated right here, you construct techniques which are quick, scalable, and resilient. Double-check outputs earlier than counting on them.<\/p>\n<p data-block-key=\"5bklq\">Able to implement this in manufacturing? Discover <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.cloud.llamaindex.ai\/llamaparse\/getting_started\">LlamaParse<\/a> and the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/ai.google.dev\/docs\">Gemini API documentation<\/a> to experiment with multimodal technology, and dive into the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/run-llama\/llamaparse-gemini-demo\">full code within the GitHub repository<\/a>.<\/p>\n<\/div><\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Extracting textual content from unstructured paperwork is a traditional developer headache. For many years, conventional Optical Character Recognition (OCR) techniques have struggled with advanced layouts, usually turning multi-column PDFs, embedded photographs, and nested tables into an unreadable mess of plain textual content. As we speak, the multimodal capabilities of huge language fashions (LLMs) lastly make [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":13024,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56],"tags":[122,73,94,295,8352,83],"class_list":["post-13022","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software","tag-assistant","tag-build","tag-financial","tag-gemini","tag-llamaparse","tag-smart"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/13022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13022"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/13022\/revisions"}],"predecessor-version":[{"id":13023,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/13022\/revisions\/13023"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/13024"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. Learn more: https://airlift.net. Template:. Learn more: https://airlift.net. Template: 69d9690a190636c2e0989534. Config Timestamp: 2026-04-10 21:18:02 UTC, Cached Timestamp: 2026-05-08 06:34:40 UTC -->