Gemini 2.5 Flash Image is our newest, fastest, and best natively multimodal model. What makes Gemini 2.5 Flash unique is its native multimodal architecture. It was trained from the ground up to process text and images in a single, unified step. This allows for powerful capabilities beyond simple image generation, such as conversational editing, multi-image composition, and logical reasoning about image content.
Here are the key things you can do:
- Text-to-image: Generate high-quality images from simple or complex text descriptions.
- Image + text-to-image (editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust colors.
- Multi-image to image (composition & style transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
- Iterative refinement: Have a conversation to progressively refine your image over multiple turns, making small adjustments.
- Text rendering: Generate images that contain clear and well-placed text, ideal for logos, diagrams, and posters.
This guide will teach you how to write prompts and provide instructions that get better results from Gemini 2.5 Flash. It all starts with one fundamental principle:
Describe the scene, don’t just list keywords. The model’s core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a simple list of disconnected words.
You can try these with code from the official documentation or start creating directly in Google AI Studio.
Creating images from text
The most common way to generate an image is by describing what you want to see.
1. Photorealistic scenes
For realistic images, think like a photographer. Mentioning camera angles, lens types, lighting, and fine details will guide the model toward a photorealistic result.
Template:
A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format.
Example prompt:
A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation.
Example output:
A photorealistic close-up portrait of an elderly Japanese ceramicist…
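If you generate many variations, the bracketed slots in the photorealistic template can be filled in programmatically. The following is a minimal sketch using only the Python standard library. The names `PHOTO_TEMPLATE` and `build_photo_prompt`, and the field names, are illustrative assumptions and not part of any Gemini API. The code only builds the prompt string; sending it to the model is left to whichever client you use.

```python
from string import Template

# Hypothetical template: each $field mirrors one bracketed slot
# in the photorealistic template from this guide.
PHOTO_TEMPLATE = Template(
    "A photorealistic $shot_type of $subject, $action, set in $environment. "
    "The scene is illuminated by $lighting, creating a $mood atmosphere. "
    "Captured with a $camera, emphasizing $details. "
    "The image should be in a $aspect_ratio format."
)


def build_photo_prompt(**fields: str) -> str:
    """Fill every slot in the template; raises KeyError if a slot is missing."""
    return PHOTO_TEMPLATE.substitute(**fields)


prompt = build_photo_prompt(
    shot_type="close-up portrait",
    subject="an elderly Japanese ceramicist",
    action="carefully inspecting a freshly glazed tea bowl",
    environment="his rustic, sun-drenched workshop",
    lighting="soft, golden hour light streaming through a window",
    mood="serene",
    camera="85mm portrait lens",
    details="the fine texture of the clay",
    aspect_ratio="vertical portrait",
)
print(prompt)
```

Because `substitute` fails on a missing field, an incomplete description is caught before the prompt is sent, which fits the guide's advice to describe the whole scene rather than a few keywords.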
2. Stylized illustrations & stickers
To create stickers, icons, or assets for your projects, be explicit about the style and remember to request a white background if you need one.
Template:
A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be white.
Example prompt:
A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It’s munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.
Example output:
A kawaii-style sticker of a happy red panda…
3. Accurate text in images
Gemini 2.5 Flash Image can render text within images. Be clear about the exact text you want, describe the font style, and set the overall design.
Template:
Create a [image type] for [brand/concept] with the text “[text to render]” in a [font style]. The design should be [style description], with a [color scheme].
Example prompt:
Create a modern, minimalist logo for a coffee shop called ‘The Daily Grind’. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.
Example output:
Create a modern, minimalist logo for a coffee shop called ‘The Daily Grind’…
4. Product mockups & commercial photography
Create clean, professional product shots for e-commerce, advertising, or branding.
Template:
A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio].
Example prompt:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image.
Example output:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug…
5. Minimalist & negative space design
Create backgrounds for websites, presentations, or marketing materials where you plan to overlay text.
Template:
A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio].
Example prompt:
A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image.
Example output:
A minimalist composition featuring a single, delicate red maple leaf…
6. Sequential art (comic panel / storyboard)
Create compelling visual narratives panel by panel. Clear scene descriptions make this approach well suited to storyboards, comic strips, or any other form of sequential art.
Template:
A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text “[Text]”. The lighting creates a [mood] mood. [Aspect ratio].
Example prompt:
A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads “The city was a tough place to keep secrets.” The lighting is harsh, creating a dramatic, somber mood. Landscape.
Example output:
A single comic book panel in a gritty, noir art style…
Editing images with text
This is where the multimodality of Gemini 2.5 Flash Image truly shines. You can provide one or more images alongside your text prompts for editing, composition, and style transfer.
1. Image editing: Adding & removing elements
Provide an image and simply describe the change you want. The model will analyze the original image’s style, lighting, and perspective to make the edit look natural and maintain character consistency across a series of images.
Template:
Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate].
Example prompt:
Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it’s sitting comfortably and matches the soft lighting of the photo.
Example input & output:
2. Inpainting: editing a specific area
You can conversationally tell Gemini 2.5 Flash Image to edit only one part of an image while leaving the rest completely untouched.
Template:
Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition.
Example prompt:
Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.
Example input & output:
3. Style transfer
Provide a photo and ask the model to recreate its content in a specific style or art movement.
Template:
Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements].
Example prompt:
Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh’s ‘Starry Night’. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.
Example input & output:
4. Advanced composition: Combining multiple images
Provide multiple images as context to create a brand new, composite scene. This is perfect for product mockups or creative collages.
Template:
Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene].
Example prompt:
Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match an outdoor environment.
Example input & output:
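Composition prompts like the one above refer to inputs by position (first, second), so it helps to keep the order of the images explicit when assembling a request. The sketch below is one possible way to do that in plain Python. `build_contents` and the dictionary layout are hypothetical and do not correspond to a real client library; how each part is encoded for an actual request depends on the SDK you use and is not shown.

```python
def build_contents(image_paths: list[str], instruction: str) -> list[dict]:
    """Return image parts in the given order, followed by the text instruction.

    Hypothetical structure: 'index' is 1-based so it lines up with phrases
    such as "the first image" in the prompt.
    """
    if not image_paths:
        raise ValueError("at least one input image is required")
    parts = [
        {"type": "image", "index": i, "path": path}
        for i, path in enumerate(image_paths, start=1)
    ]
    parts.append({"type": "text", "text": instruction})
    return parts


# Illustrative file names; replace with your own inputs.
contents = build_contents(
    ["dress.png", "model.png"],
    "Take the blue floral dress from the first image and let the woman "
    "from the second image wear it.",
)
for part in contents:
    print(part)
```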
Best practices
As you build, here are a few more tips for working with image generation:
- Be hyper-specific: The more detail you provide, the more control you have. Instead of “fantasy armor,” describe it: “ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings.”
- Fix character consistency drifts: If you notice a character’s features begin to drift after many iterative edits, you can restart a new conversation with a detailed description to retain consistency.
- Provide context and intent: Explain the purpose of the image. For example, “Create a logo for a high-end, minimalist skincare brand” will yield better results than just “Create a logo.”
- Iterate and refine: Don’t expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, “That’s great, but can you make the lighting a bit warmer?” or “Keep everything the same, but change the character’s expression to be more serious.”
- Use “semantic negative prompts”: Instead of saying “no cars,” describe the desired scene positively: “an empty, deserted street with no signs of traffic.”
- Aspect ratios: When editing, Gemini 2.5 Flash Image generally preserves the input image’s aspect ratio. If it doesn’t, be explicit in your prompt:
"Update the input image... Do not change the input aspect ratio."
If you upload multiple images with different aspect ratios, the model will adopt the aspect ratio of the last image provided. If you need a specific ratio for a new image and prompting doesn’t produce it, the best practice is to provide a reference image with the correct dimensions as part of your prompt.
- Control the camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot, macro shot, low-angle perspective, 85mm portrait lens, and Dutch angle give you precise control over the final image.
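For the aspect-ratio tip, preparing a reference image starts with picking pixel dimensions that match the ratio you want. Here is a minimal sketch of that arithmetic. `reference_size` is a hypothetical helper, and the default long side of 1024 pixels is an arbitrary assumption rather than a documented model requirement. Creating the blank image file itself (for example with an imaging library) is not shown.

```python
def reference_size(ratio: str, long_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) for a ratio like '16:9', fixing the longer side."""
    w_str, h_str = ratio.split(":")
    w, h = int(w_str), int(h_str)
    if w <= 0 or h <= 0:
        raise ValueError("ratio terms must be positive")
    if w >= h:
        # Landscape or square: width is the long side.
        return long_side, round(long_side * h / w)
    # Portrait: height is the long side.
    return round(long_side * w / h), long_side


print(reference_size("16:9"))  # (1024, 576)
print(reference_size("1:1"))   # (1024, 1024)
print(reference_size("3:4"))   # (768, 1024)
```

Since the guide notes that the model adopts the aspect ratio of the last image provided, a reference canvas of this size would go last among the inputs.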
Limitations
As we continue to develop and improve our models, we believe in being transparent about areas for improvement.
While Gemini 2.5 Flash Image is a powerful and versatile tool, achieving perfection on the first attempt with highly nuanced requests can require some iteration. You may find that generating complex typography or maintaining absolute consistency of character features across multiple images sometimes needs refinement through follow-up prompts.
We’re actively working to improve these areas and appreciate your creativity as we build the next generation of image tools together.
What’s next? Start creating!
You now have the foundational skills to create and edit incredible images with Gemini 2.5 Flash. The best way to improve is to practice. Here are some resources to help you on your journey:
- Review pricing: Understand the costs associated with using Gemini 2.5 Flash Image generation with the Gemini API for your projects.
- Try the Image Editing Applet: Test AI-powered photo editing, apply creative filters, or make professional adjustments using simple text prompts.