Gemini API I/O updates – Google Developers Blog

The Gemini API gives developers a streamlined way to build innovative applications with cutting-edge generative AI models. Google AI Studio simplifies the process of testing all the API capabilities, allowing for rapid prototyping and experimentation with text, image, and even video prompts. When developers want to test and build at scale, they can leverage all the capabilities available through the Gemini API.

New models available through the API

Gemini 2.5 Flash Preview – We’ve added a new 2.5 Flash preview (gemini-2.5-flash-preview-05-20), which is better than the previous preview at reasoning, code, and long context. This version of 2.5 Flash is currently #2 on the LMarena leaderboard, behind only 2.5 Pro. We’ve also improved Flash cost-efficiency with this latest update, reducing the number of tokens needed for the same performance, resulting in 22% efficiency gains on our evals. Our goal is to keep improving based on your feedback, and to make both generally available soon.

Gemini 2.5 Pro and Flash text-to-speech (TTS) – We also announced 2.5 Pro and Flash previews for text-to-speech (TTS) that support native audio output for both single and multiple speakers, across 24 languages. With these models, you can control TTS expression and style, creating rich audio output. With multispeaker, you can generate conversations with multiple distinct voices for dynamic interactions.
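If the TTS previews hand back raw audio samples rather than a finished file, the bytes need a container before most players will open them. The sketch below wraps PCM bytes in a WAV file using Python's standard wave module. It assumes 16-bit mono PCM at 24 kHz, which this post does not state, so check the model documentation; pcm_to_wav is a hypothetical helper name, and the silent buffer stands in for real model output.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 24000, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(rate)          # assumed output rate, see lead-in
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Stand-in for model output: half a second of silence (12,000 16-bit samples).
silence = b"\x00\x00" * 12000
wav_bytes = pcm_to_wav(silence)
```

Writing wav_bytes to a file with a .wav extension should give something a standard audio player can open.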

Gemini 2.5 Flash native audio dialog – In preview, this model is available via the Live API to generate natural-sounding voices for conversation, in over 30 distinct voices and 24+ languages. We’ve also added proactive audio so the model can distinguish between the speaker and background conversations, so it knows when to respond. In addition, the model responds appropriately to a user’s emotional expression and tone. A separate thinking model enables more complex queries. This now makes it possible for you to build conversational AI agents and experiences that feel more intuitive and natural, like enhancing call center interactions, developing dynamic personas, crafting unique voice characters, and more.

Lyria RealTime – Live music generation is now available in the Gemini API and Google AI Studio to create a continuous stream of instrumental music using text prompts. With Lyria RealTime, we use WebSockets to establish a persistent, real-time communication channel. The model continuously produces music in small, flowing chunks and adapts based on inputs. Imagine adding a responsive soundtrack to your app or designing a new type of musical instrument! Try out Lyria RealTime with the PromptDJ-MIDI app in Google AI Studio.

Gemini 2.5 Pro Deep Think – We’re also testing an experimental reasoning mode for 2.5 Pro. We’ve seen incredible performance with these Deep Thinking capabilities for highly complex math and coding prompts. We look forward to making it broadly available for you to experiment with soon.

Gemma 3n – Gemma 3n is a generative AI open model optimized for use in everyday devices, such as phones, laptops, and tablets. It can handle text, audio, and vision inputs. This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture that provides the flexibility to reduce compute and memory requirements.

New functionality in the API

Thought summaries

To help developers understand and debug model responses, we’ve added thought summaries for 2.5 Pro and Flash in the Gemini API. We take the model’s raw thoughts and synthesize them into a helpful summary with headers, relevant details, and tool calls. The raw chain-of-thoughts in Google AI Studio has also been updated with the new thought summaries.

Thinking budgets

We launched 2.5 Flash with thinking budgets to give developers control over how much models think, so they can balance performance, latency, and cost for the apps they’re building. We will be extending this capability to 2.5 Pro soon.

from google import genai
from google.genai import types

client = genai.Client(api_key="GOOGLE_API_KEY")
prompt = "What is the sum of the first 50 prime numbers?"
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-05-20",
  contents=prompt,
  config=types.GenerateContentConfig(
    # Cap thinking at 1024 tokens and ask for thought summaries in the response.
    thinking_config=types.ThinkingConfig(
      thinking_budget=1024,
      include_thoughts=True
    )
  )
)

# Parts flagged as thoughts carry the summary; the rest carry the answer.
for part in response.candidates[0].content.parts:
  if not part.text:
    continue
  if part.thought:
    print("Thought summary:")
    print(part.text)
    print()
  else:
    print("Answer:")
    print(part.text)
    print()

Python

Sample code to enable and retrieve thought summaries without streaming, returning a final thought summary with the response.
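When the thought summary and the answer need to go to different places (for example, logs versus the user interface), the loop above can be folded into a small helper. This is a minimal sketch: split_thoughts is a hypothetical name, and the SimpleNamespace objects are stand-ins that only mimic the text and thought attributes used in the sample, so the snippet runs without an API key.

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Separate thought-summary text from answer text in a list of parts."""
    thoughts, answers = [], []
    for part in parts:
        if not part.text:
            continue  # skip parts that carry no text
        (thoughts if part.thought else answers).append(part.text)
    return "\n".join(thoughts), "\n".join(answers)


# Stand-in parts; a real call would pass response.candidates[0].content.parts.
fake_parts = [
    SimpleNamespace(text="Listing the first 50 primes, then adding them.", thought=True),
    SimpleNamespace(text=None, thought=False),
    SimpleNamespace(text="The sum is 5117.", thought=False),
]
thought_summary, answer = split_thoughts(fake_parts)
```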

New URL Context tool

We added a new experimental tool, URL context, to retrieve more context from links that you provide. It can be used by itself or in conjunction with other tools such as Grounding with Google Search. This tool is a key building block for developers looking to build their own version of research agents with the Gemini API.

from google import genai
from google.genai.types import GenerateContentConfig, GoogleSearch, Tool, UrlContext

client = genai.Client()
model_id = "gemini-2.5-flash-preview-05-20"

# Enable both URL context and Grounding with Google Search.
tools = []
tools.append(Tool(url_context=UrlContext()))
tools.append(Tool(google_search=GoogleSearch()))

response = client.models.generate_content(
    model=model_id,
    contents="Give me a three-day events schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute.",
    config=GenerateContentConfig(
        tools=tools,
        response_modalities=["TEXT"],
    )
)

for each in response.candidates[0].content.parts:
    print(each.text)
# get URLs retrieved for context
print(response.candidates[0].url_context_metadata)

Python

Sample code for Grounding with Google Search and URL Context
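The sample prints the URL context metadata as a whole. To check which links were actually fetched, something like the sketch below could work. The field names url_metadata, retrieved_url, and url_retrieval_status are assumptions about the shape of that object, not confirmed by this post, and retrieved_urls is a hypothetical helper; the SimpleNamespace objects are stand-ins so the snippet runs offline.

```python
from types import SimpleNamespace


def retrieved_urls(url_context_metadata):
    """Return (url, status) pairs from a url_context_metadata-like object."""
    if url_context_metadata is None:
        return []  # the tool was not used for this response
    entries = getattr(url_context_metadata, "url_metadata", None) or []
    return [(e.retrieved_url, e.url_retrieval_status) for e in entries]


# Stand-in metadata; field names and status string are assumed, see lead-in.
fake_metadata = SimpleNamespace(
    url_metadata=[
        SimpleNamespace(
            retrieved_url="https://example.com/events",
            url_retrieval_status="URL_RETRIEVAL_STATUS_SUCCESS",
        )
    ]
)
pairs = retrieved_urls(fake_metadata)
```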

Computer use tool

We’re bringing Project Mariner’s browser control capabilities to the Gemini API via a new computer use tool. To make it easier for developers to use this tool, we’re enabling the creation of Cloud Run instances optimally configured for running browser control agents via one click from Google AI Studio. We’ve begun early testing with companies like Automation Anywhere, UiPath, and Browserbase. Their valuable feedback will be instrumental in refining its capabilities for a broader experimental developer release this summer.

Enhancements to structured outputs

The Gemini API now has broader support for JSON Schema, including much-requested keywords such as “$ref” (for references) and those enabling the definition of tuple-like structures (e.g., prefixItems).
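As an illustration of the two keywords named above, here is a small schema written as a Python dict: "$ref" reuses one definition in two places, and prefixItems describes a fixed-length, tuple-like array. This is plain JSON Schema; whether the API accepts every other keyword used here (such as $defs, minItems, and maxItems) is an assumption, and the field names and the resolve_ref helper are hypothetical. How the schema is passed to the API is left out; see the developer docs for that.

```python
import json

# "$ref" points both properties at one shared definition;
# "prefixItems" gives each array position its own schema (a tuple).
schema = {
    "$defs": {
        "Coordinate": {
            "type": "array",
            "prefixItems": [{"type": "number"}, {"type": "number"}],
            "minItems": 2,
            "maxItems": 2,
        }
    },
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "origin": {"$ref": "#/$defs/Coordinate"},
        "destination": {"$ref": "#/$defs/Coordinate"},
    },
    "required": ["name", "origin", "destination"],
}


def resolve_ref(root, ref):
    """Follow a local reference such as '#/$defs/Coordinate' inside root."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    node = root
    for key in ref[len("#/"):].split("/"):
        node = node[key]
    return node


coordinate_schema = resolve_ref(schema, schema["properties"]["origin"]["$ref"])
print(json.dumps(coordinate_schema, indent=2))
```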

Video understanding enhancements

The Gemini API now allows YouTube video URLs or video uploads to be added to a prompt, enabling users to summarize, translate, or analyze the video content. With this recent update, the API supports video clipping, enabling flexibility in analyzing specific parts of a video. This is particularly useful for videos longer than 8 hours. We’ve also added support for dynamic frames per second (FPS), allowing 60 FPS for videos like games or sports where speed is critical, and 0.1 FPS for videos where speed is less of a priority. To help users save tokens, we have also introduced support for 3 different video resolutions: high (720p), standard (480p), and low (360p).
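To get a feel for how clipping and FPS interact, note that the number of sampled frames grows linearly with both clip length and sampling rate. The arithmetic can be sketched as follows; sampled_frames is a hypothetical helper that assumes uniform sampling, and it does not model how the API converts frames into tokens.

```python
def sampled_frames(start_s: float, end_s: float, fps: float) -> int:
    """Number of frames sampled from the clip [start_s, end_s) at a given rate."""
    if end_s < start_s or fps <= 0:
        raise ValueError("need end_s >= start_s and fps > 0")
    return round((end_s - start_s) * fps)


# A 10-minute clip (600 s) at the two sampling rates mentioned above.
fast = sampled_frames(0, 600, 60)    # 36000 frames
slow = sampled_frames(0, 600, 0.1)   # 60 frames
```

The same clip yields 600 times as many frames at 60 FPS as at 0.1 FPS, which is why a low rate or a shorter clip is the natural choice when motion detail does not matter.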

Async function calling

The cascaded architecture in the Live API now supports asynchronous function calling, ensuring user conversations remain smooth and uninterrupted. This means your Live agent can continue generating responses even while it’s busy executing functions in the background: simply add the behavior field to the function definition and set it to NON_BLOCKING. Read more about this in the Gemini API developer documentation.
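The idea can be illustrated with a local simulation. The sketch below uses asyncio only and does not call the Live API: the function name, the log messages, and the run_tool and conversation helpers are all hypothetical. The declaration dict shows where the behavior field would sit; its value is written NON_BLOCKING on the assumption that the API's enum uses an underscore.

```python
import asyncio

# Declaration sketch: "behavior" marks the function as non-blocking.
# The function name is hypothetical.
turn_on_the_lights = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}


async def run_tool(log):
    """Stand-in for a slow function call executing in the background."""
    await asyncio.sleep(0.05)
    log.append("tool: lights are on")


async def conversation():
    log = []
    tool_task = asyncio.create_task(run_tool(log))  # start without awaiting
    for i in range(3):
        log.append(f"agent: reply {i}")  # the agent keeps responding meanwhile
        await asyncio.sleep(0)           # yield control to the event loop
    await tool_task                      # the result lands once the tool finishes
    return log


log = asyncio.run(conversation())
```

In this simulation the three agent replies are logged before the tool result, which mirrors the behavior described above: the conversation is not held up while the function runs.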

Batch API

We’re also testing a new API that lets you easily batch up your requests and get them back within a maximum turnaround time of 24 hours. The API will come at half the price of the interactive API and with much higher rate limits. We hope to roll it out more widely later this summer.

Start building

That’s a wrap on I/O for this year! With the Gemini API and Google AI Studio, you can turn your ideas into reality, whether you’re building conversational AI agents with natural-sounding audio or developing tools to analyze and generate code. As always, check out the Gemini API developer docs for all the latest code samples and more.

Discover this announcement and all Google I/O 2025 updates on io.google.