Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge

Google DeepMind’s newest open model, Gemma 4 12B, is designed to bring agentic, multimodal intelligence directly to your laptop. By combining the model’s strengths with the Google AI Edge stack, you can get hands-on right away and build and experiment locally on everyday machines (see the model card for spec requirements).

This model-runtime combination unlocks powerful on-device capabilities, from autonomous data processing and generating rich visual insights to building fully functional webpages and executing everyday tool use. You can start interacting with Gemma 4 12B across Google AI Edge right now:

Explore Gemma with Google AI Edge Gallery, our local AI showcase app, now available on macOS. With the 12B model you can generate and execute scripts on the fly for tasks such as data analysis.
The Google AI Edge Eloquent on-device voice dictation app is now available on macOS. We added the ability to interactively polish and rewrite text through voice commands, entirely on-device, powered by the new Gemma 4 12B model.
LiteRT-LM can now serve local, industry-compatible endpoints directly from your terminal via the new serve command in the LiteRT-LM CLI. When used with Gemma 4 12B, it is a highly capable and efficient option for powering fully local agentic tools, harnesses, and workflows.

Coding with Google AI Edge Gallery on macOS

The Google AI Edge Gallery app, now available on macOS, showcases Gemma 4 12B’s coding capability, allowing you to extract meaningful insights from your data right on your device. Through a seamless interface, you can simply describe your analytical goals in natural language. In the example below, we asked the model to “use a python program to render a chart png to compare the top 10 girl names born in 2024 vs 2025” given two text files containing the data. In response, the model dynamically generates Python code, executes it locally, and converts raw data into beautiful, easy-to-understand visualizations and insights.
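
To give a feel for the kind of script such a prompt might produce, here is a minimal sketch. It is not the model's actual output. The input format (one `name,count` pair per line), the function names, and the choice to rank by the 2025 counts are all assumptions made for illustration, and the chart step assumes matplotlib is installed.

```python
from collections import Counter


def parse_counts(text):
    """Parse lines of 'name,count' (an assumed format) into a Counter."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, count = line.split(",")
        counts[name.strip()] += int(count)
    return counts


def compare_top_names(text_2024, text_2025, n=10):
    """Return (name, count_2024, count_2025) rows for the top-n names of 2025."""
    counts_2024 = parse_counts(text_2024)
    counts_2025 = parse_counts(text_2025)
    return [
        (name, counts_2024.get(name, 0), count_2025)
        for name, count_2025 in counts_2025.most_common(n)
    ]


def render_chart(rows, out_path="top_names.png"):
    """Draw a grouped bar chart of the rows and save it as a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is needed
    import matplotlib.pyplot as plt

    positions = list(range(len(rows)))
    width = 0.4
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar([p - width / 2 for p in positions], [r[1] for r in rows], width, label="2024")
    ax.bar([p + width / 2 for p in positions], [r[2] for r in rows], width, label="2025")
    ax.set_xticks(positions)
    ax.set_xticklabels([r[0] for r in rows], rotation=45, ha="right")
    ax.set_ylabel("Births")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

With the two files read into strings, `render_chart(compare_top_names(text_2024, text_2025))` would write `top_names.png` to the working directory.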

When it comes to advanced coding, Gemma 4 12B doesn’t just write scripts. In a complex 3D rendering task, we observed that from a single user prompt the model can specify dependencies, generate code, self-correct, and produce a rubber duck rendering, all in a single turn.

Prompt: “use trimesh to write a python program to render the attached obj file to a png file”

Google AI Edge Eloquent, our AI-powered dictation and editing app, seamlessly transforms your raw, unstructured thoughts into polished text. The new macOS desktop version runs 100% on-device across its entire feature set, ensuring a powerful, fully offline experience. Using a convenient, customizable hotkey, Eloquent lets you use voice dictation across any application on your Mac. Additionally, Eloquent supports fully local transcription of your audio or video files.

Leveraging the advanced reasoning power of Gemma 4 12B, we’re introducing Voice Edit, a new feature that lets you simply dictate voice commands to transform any piece of text in your desktop workflow. For example, you can highlight a paragraph and say, “restructure these notes into an executive summary”, or “translate this into Hindi”. With Gemma 4 12B, we see a huge step up from prior models, with better instruction following, stricter scope adherence, and a 60%+ jump in overall quality.

Download Google AI Edge Eloquent on macOS today and experience the power of Gemma 4 12B as a fully local AI dictation and editing assistant.

Build with LiteRT-LM and Drop-in Local Serving

The LiteRT-LM CLI provides a lightweight, zero-code tool for running language models locally. We are now expanding the tool with the serve command, letting the CLI act as a drop-in local LLM server. Use this functionality with Gemma 4 12B to point any standard tool, SDK, or framework (such as OpenClaw, Hermes, OpenCode, Pi, or popular extensions like Continue and Aider) directly to your local endpoint.

# Import the Gemma 4 12B model as "gemma4-12b"
litert-lm import --from-huggingface-repo=litert-community/gemma-4-12B-it-litert-lm gemma-4-12B-it.litertlm gemma4-12b

# Start the OpenAI-compatible server
litert-lm serve


curl http://localhost:9379/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4-12b,gpu",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
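
The same request can be issued from a script. The sketch below uses only the Python standard library and mirrors the curl call above. The helper names (`build_chat_request`, `extract_reply`, `chat`) are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style chat completion shape (`choices[0].message.content`), which the source implies by calling the server OpenAI-compatible but does not spell out.

```python
import json
import urllib.request

# Host, port, and path are taken from the curl example above.
BASE_URL = "http://localhost:9379/v1"


def build_chat_request(messages, model="gemma4-12b,gpu", base_url=BASE_URL):
    """Build a POST request for the chat completions endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response (assumed shape)."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(messages, **kwargs):
    """Send the messages to the local server and return the reply text."""
    request = build_chat_request(messages, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_reply(response.read())
```

With `litert-lm serve` running, `chat([{"role": "user", "content": "Hello!"}])` should return the model's reply as a string. For a multi-turn exchange, append each reply to the list as an `assistant` message before sending the next `user` message.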



Demonstrating the LiteRT-LM CLI serve command: creating an industry-compatible local endpoint to connect Gemma 4 12B with Open WebUI for a one-shot particle effect demo.

Ready for Use on Everyday Laptops

Running Gemma 4 12B makes on-device AI-powered capabilities broadly accessible on everyday laptops. Check out the LiteRT-LM model card for performance and memory benchmarks. By pairing the powerful capabilities of this new model with the optimized performance and ease of use of Google AI Edge, you can build multi-turn local agents, analyze data in Google AI Edge Gallery, or streamline your writing with Google AI Edge Eloquent. Additionally, your data stays on your device, and you keep reliable responsiveness, utility, and cost efficiency.

Acknowledgements

We’d like to extend a special thanks to our key contributors for their work on this project (in alphabetical order):

Advait Jain, Alice Zheng, Alex Kanaukou, Ami Kubota, Changming Sun, Cormac Brick, Denis Daletski, Fengwu Yao, Hriday Chhabria, Jingxiao Zheng, Jingtao Zhou, Jenn Lee, Jianing Wei, Jing Jin, Lin Chen, Lu Wang, Marius Kintel, Marissa Ikonomidis, Matthias Grundmann, Mogan Shieh, Mohammadreza Heydary, Matthew Soulanille, Na Li, Qidong Zhao, Queenie Zhang, Ram Iyengar, Rishika Sinha, Sachin Kotwani, Suleman Shahid, Suril Shah, Tenghui Zhu, Wai Hon Law, Weiyi Wang, Xiaoming Hu, Xinan Cheng, Yi-Chun Kuo, Yishuang Pang, Yu-hui Chen.