Today, we’re introducing two new service tiers for the Gemini API: Flex and Priority. These options give you granular control over cost and reliability through a single, unified interface.
As AI evolves from simple chat into advanced, autonomous agents, developers often need to handle two distinct kinds of workloads:
- Background tasks: High-volume workflows like data enrichment or “thinking” processes that don’t need instant responses.
- Interactive tasks: User-facing features like chatbots and copilots where high reliability is required.
Until now, supporting both meant splitting your architecture between standard synchronous serving and the asynchronous Batch API. Flex and Priority help bridge this gap. You can now route background jobs to Flex and interactive jobs to Priority, both through standard synchronous endpoints. This eliminates the complexity of async job management while giving you the economic and performance benefits of specialized tiers.
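The routing decision this enables can be sketched as a simple mapping from workload type to tier. This is an illustrative sketch only: the tier values `"flex"` and `"priority"` and the `choose_tier` helper are assumptions, not part of the published API.

```python
def choose_tier(task_kind: str) -> str:
    """Map a workload type to a service tier (hypothetical values).

    Both tiers use the same synchronous endpoint; only the tier
    setting on the request changes.
    """
    return {
        "background": "flex",       # e.g. data enrichment, offline "thinking"
        "interactive": "priority",  # e.g. chatbots, copilots
    }[task_kind]
```

For example, a nightly CRM-enrichment job would call `choose_tier("background")` and get the cheaper Flex tier, while a live copilot request would resolve to Priority.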
Flex Inference: scale innovation for 50% less
Flex Inference is our new cost-optimized tier, designed for latency-tolerant workloads without the overhead of batch processing.
- 50% cost savings: Pay half the price of the Standard API in exchange for lower request criticality (reduced reliability and added latency).
- Synchronous simplicity: Unlike the Batch API, Flex is a synchronous interface. You use the same familiar endpoints without managing input/output files or polling for job completion.
- Ideal use cases: Background CRM updates, large-scale research simulations, and agentic workflows where the model “browses” or “thinks” in the background.
Get started quickly by simply configuring the service_tier parameter in your request:
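A minimal sketch of such a request using only the Python standard library is below. The `service_tier` field name comes from this post, but its exact position in the request body, the accepted value `"flex"`, and the model name used here are assumptions for illustration; consult the API reference for the authoritative shape.

```python
import json
import urllib.request

# Public generateContent REST endpoint; the model name is illustrative.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"
)


def build_request(prompt: str, service_tier: str, api_key: str):
    """Build a generateContent request targeting a service tier.

    The top-level "service_tier" field is a hypothetical placement.
    """
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": service_tier,  # e.g. "flex" for background work
    }
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key,
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode(), headers=headers
    )


if __name__ == "__main__":
    # Sends a real request; requires a valid API key.
    req = build_request("Summarize this CRM record.", "flex", "YOUR_API_KEY")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

Because Flex shares the synchronous endpoint, switching a job between tiers is a one-field change rather than a migration to batch file handling.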