Today, Google DeepMind launched Gemma 4, a family of state-of-the-art open models that redefine what is possible on your own hardware. Now available under the Apache 2.0 license, Gemma 4 gives developers a powerful toolkit for on-device AI development. With Gemma 4, you can now go beyond chatbots to build agents and autonomous AI use cases running directly on-device. Gemma 4 enables multi-step planning, autonomous action, offline code generation, and even audio-visual processing, all without specialized fine-tuning. It’s also built for a global audience, with support for over 140 languages.
Gemma 4 enables visual processing and supports more than 140 languages
We’re excited to announce that you can experience Gemma 4’s expansive capabilities on the edge starting today! Access Android’s built-in Gemma 4 model through the new AICore Developer Preview, or leverage Google AI Edge to build agentic, in-app experiences across mobile, desktop, and edge devices.
In this post, we’ll show you how to get started with Google AI Edge using both Google AI Edge Gallery and LiteRT-LM.
Discover Agent Skills with Gemma 4 in Google AI Edge Gallery
Google AI Edge Gallery, available on iOS and Android, allows you to build and experiment with AI experiences that run entirely on-device. Today, we’re thrilled to announce the launch of Agent Skills, one of the first applications to run multi-step, autonomous agentic workflows entirely on-device. Powered by Gemma 4, Agent Skills can:
- Augment the knowledge base: Gemma 4 can use skills to access information beyond its initial training data, enabling agentic enrichment experiences. For example, you can build a skill to query Wikipedia, allowing the agent to look up and respond to any encyclopedic question.
- Query Wikipedia or other knowledge sources
- Create graphs, flashcards, and other visualizations
- Integrate with other models to synthesize music and understand images
- Build multi-step workflows and end-to-end experiences
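To make the idea of a skill more concrete, here is a minimal Python sketch of how an agent loop might route a model's tool call to a skill. It is not the Google AI Edge Gallery skill format or API: `Skill`, `SkillRegistry`, the JSON call shape, and `lookup_encyclopedia` are hypothetical names chosen for illustration, and the lookup is stubbed with a local dictionary rather than a real Wikipedia request.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Skill:
    """Hypothetical skill: a name, a description for the model, and a handler."""
    name: str
    description: str
    handler: Callable[[dict], str]


class SkillRegistry:
    """Hypothetical registry that maps skill names to their handlers."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def dispatch(self, tool_call_json: str) -> str:
        """Run the skill named in a JSON tool call and return its result."""
        call = json.loads(tool_call_json)
        skill = self._skills.get(call.get("name"))
        if skill is None:
            return f"error: unknown skill {call.get('name')!r}"
        return skill.handler(call.get("arguments", {}))


# Stand-in knowledge source; a real skill would query an external service.
_FAKE_ENCYCLOPEDIA = {"raspberry pi": "A series of small single-board computers."}


def lookup_encyclopedia(args: dict) -> str:
    """Stubbed lookup: normalizes the topic and reads the local dictionary."""
    topic = str(args.get("topic", "")).strip().lower()
    return _FAKE_ENCYCLOPEDIA.get(topic, "no entry found")


registry = SkillRegistry()
registry.register(Skill("encyclopedia", "Look up a topic.", lookup_encyclopedia))

# Assumed shape of a model-emitted tool call (not a documented format).
result = registry.dispatch(
    '{"name": "encyclopedia", "arguments": {"topic": "Raspberry Pi"}}'
)
print(result)
```

In a real agent, the string returned by `dispatch` would be fed back to the model as context for its next step, which is what lets a workflow span several skills.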
To experience the Gemma 4 E2B and E4B models in action, try the Google AI Edge Gallery app today. Within the app, it’s easy to start experimenting and creating your own skills with our guide. We can’t wait to see what you build, so please share your skills in the GitHub Discussion!
Leverage Gemma 4 across devices with LiteRT-LM
For developers who are interested in deploying Gemma 4 in-app or across a broader range of devices, LiteRT-LM provides stellar performance with reach across the entire hardware spectrum. LiteRT-LM adds GenAI-specific libraries on top of LiteRT, which is already trusted by millions of Android and edge developers for its high-performance libraries XNNPack and ML Drift. LiteRT-LM builds on this stack and enhances model performance with the following new features:
- Minimal memory footprint: Run Gemma 4 E2B using less than 1.5 GB of memory on some devices, thanks to LiteRT’s support for 2-bit and 4-bit weights along with memory-mapped per-layer embeddings.
- Constrained decoding: Get structured, predictable outputs every time, ensuring your AI-driven apps and tool-calling scripts remain reliable in production.
- Dynamic context: Flexibility to handle single models across CPUs and GPUs with dynamic context lengths, allowing you to take full advantage of the Gemma 4 128K context window.
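As a rough illustration of the constrained decoding idea in the list above, the following Python sketch restricts a greedy decoder to a fixed set of allowed outputs. It is a generic toy, not LiteRT-LM's implementation or API: tokens are single characters, `toy_scores` is a hypothetical stand-in for model logits, and `allowed_next` and `constrained_greedy_decode` are illustrative names.

```python
from typing import Callable, Dict, List, Set

END = "<end>"  # Hypothetical end-of-sequence token for this toy example.


def allowed_next(prefix: str, choices: List[str]) -> Set[str]:
    """Return the tokens that keep `prefix` consistent with some allowed choice."""
    allowed: Set[str] = set()
    for choice in choices:
        if choice.startswith(prefix):
            if len(choice) == len(prefix):
                allowed.add(END)  # The prefix is already a complete choice.
            else:
                allowed.add(choice[len(prefix)])
    return allowed


def constrained_greedy_decode(
    score_fn: Callable[[str], Dict[str, float]], choices: List[str]
) -> str:
    """Greedy decoding where tokens outside the constraint are masked out."""
    prefix = ""
    while True:
        allowed = allowed_next(prefix, choices)
        if not allowed:
            raise ValueError("no valid continuation for the given choices")
        scores = score_fn(prefix)
        # Only allowed tokens compete; everything else is never considered.
        best = max(sorted(allowed), key=lambda tok: scores.get(tok, float("-inf")))
        if best == END:
            return prefix
        prefix += best


def toy_scores(prefix: str) -> Dict[str, float]:
    """Stand-in for model logits: it prefers to spell 'yep', which is not allowed."""
    scores = {ch: 0.0 for ch in "abcdefghijklmnopqrstuvwxyz"}
    scores[END] = 0.5
    preferred = "yep"
    if len(prefix) < len(preferred):
        scores[preferred[len(prefix)]] = 2.0
    return scores


# The unconstrained preference is "yep"; the constraint forces a valid answer.
print(constrained_greedy_decode(toy_scores, ["yes", "no"]))  # prints "yes"
```

Production systems typically apply the same masking idea at the tokenizer level against a grammar or JSON schema rather than a list of strings, but the principle is the same: invalid tokens are excluded before the next token is chosen.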
To support the extended context lengths required by agentic use cases, LiteRT-LM leverages cutting-edge GPU optimizations to process 4,000 input tokens across 2 distinct skills in under 3 seconds.
LiteRT-LM also brings smaller Gemma 4 models to IoT and edge devices with compelling performance on a variety of platforms. These include the Raspberry Pi 5, where, running on CPU, it reaches 133 prefill and 7.6 decode tokens/s, while NPU acceleration on the Qualcomm Dragonwing IQ8 boosts performance to a more impressive 3,700 prefill and 31 decode tokens/s.
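These throughput figures can be turned into a back-of-the-envelope latency estimate: total time is roughly the prompt length divided by the prefill rate, plus the output length divided by the decode rate. The Python sketch below applies that formula to the two devices quoted above. It assumes constant rates and ignores model load time and other overhead; the 1,000-token prompt and 100-token response are arbitrary values chosen for illustration, and `estimate_latency_s` is a hypothetical helper name.

```python
def estimate_latency_s(
    prompt_tokens: int, output_tokens: int, prefill_tps: float, decode_tps: float
) -> float:
    """Rough latency: prompt processing time plus token generation time."""
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# (prefill tokens/s, decode tokens/s) as quoted in the text above.
DEVICES = {
    "Raspberry Pi 5 (CPU)": (133.0, 7.6),
    "Qualcomm Dragonwing IQ8 (NPU)": (3700.0, 31.0),
}

# Illustrative workload: 1,000 prompt tokens and 100 generated tokens.
for name, (prefill, decode) in DEVICES.items():
    seconds = estimate_latency_s(1000, 100, prefill, decode)
    print(f"{name}: ~{seconds:.1f} s")
```

Under these assumptions the estimate comes to about 20.7 seconds on the Raspberry Pi 5 and about 3.5 seconds on the Dragonwing IQ8, with decode time dominating in both cases.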
Ready to get started? Check out the LiteRT-LM documentation for a complete guide and device-specific performance metrics. You can also view the individual model cards for Gemma 4 E2B and Gemma 4 E4B.
Run on any device
Gemma 4 is available today with support across an unprecedented range of platforms:
- Mobile: Available with CPU/GPU support across both Android and iOS. Developers can also access and deploy Android’s built-in and optimized Gemma 4 model system-wide via Android AICore.
- Desktop and Web: Seamless performance on Windows, Linux, and macOS (via Metal), plus native browser-based execution powered by WebGPU.
- IoT and robotics: We’re bringing Gemma 4 to the edge on the Raspberry Pi 5 and the Qualcomm Dragonwing IQ8 processor with NPU acceleration.
Today, we’re also launching a new Python package and CLI tool to make it easier than ever to experiment with Gemma in the console, and to power Gemma-based Python pipelines for IoT devices. The litert-lm CLI is available on Linux, macOS, and Raspberry Pi, enabling developers to try out the latest Gemma 4 model capabilities without writing any code. The CLI now also supports tool calling, the capability that powers Agent Skills in Google AI Edge Gallery. Python bindings for LiteRT-LM provide the flexibility to deeply customize your on-device LLM pipeline from Python. Getting started with LiteRT-LM in your terminal is easy using our guide.
The era of agentic experiences on-device is here, and we hope you are excited to start building on the edge. Regardless of which device you are building on, get started with our Agent Skills examples in Google AI Edge Gallery and the LiteRT-LM getting started guide. We can’t wait to see what you build!
Acknowledgements
We would like to extend a special thanks to our key contributors for their work on this project:
Advait Jain, Alice Zheng, Amber Heinbockel, Andrew Zhang, Byungchul Kim, Cormac Brick, Daniel Ho, Derek Bekebrede, Dillon Sharlet, Eric Yang, Fengwu Yao, Frank Barchard, Grant Jensen, Hriday Chhabria, Jae Yoo, Jenn Lee, Jing Jin, Jingxiao Zheng, Juhyun Lee, Lu Wang, Lin Chen, Majid Dadashi, Marissa Ikonomidis, Matthew Chan, Matthew Soulanille, Matthias Grundmann, Milen Ferev, Misha Gutman, Mohammadreza Heydary, Pradeep Kuppala, Qidong Zhao, Quentin Khan, Ram Iyengar, Raman Sarokin, Renjie Wu, Rishika Sinha, Rodney Witcher, Ronghui Zhu, Sachin Kotwani, Suleman Shahid, Tenghui Zhu, Terry Heo, Tiffany Hsiao, Wai Hon Law, Weiyi Wang, Xiaoming Hu, Xu Chen, Yishuang Pang, Yi-Chun Kuo, Yu-Hui Chen, Zichuan Wei, and the gTech team.







