TechTrendFeed

Building real-world on-device AI with LiteRT and NPU

by Admin
April 25, 2026


Users benefit from instant AI features like real-time video effects, ASR, and motion capture in their mobile apps. For developers, however, running sophisticated models on-device often means balancing unique challenges related to managing device thermals, preserving battery life, and preventing frame drops. To deliver fast, responsive AI experiences without compromising performance, LiteRT unlocks Neural Processing Units (NPUs), the hardware built specifically for these workloads.

LiteRT is a cross-platform, production-ready framework for on-device AI, offering CPU, GPU, and NPU acceleration across mobile, desktop, and IoT platforms. Designed for performance and scalability, LiteRT simplifies the deployment of high-speed AI features through a unified API. This abstracts the complexity of integrating with multiple NPU SDKs, allowing developers to target diverse silicon without writing vendor-specific code.
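
The value of that abstraction is easiest to see in miniature. The sketch below is a hypothetical, stdlib-only Python model of the dispatch pattern a unified API implies: the application states a preference and the runtime falls back from NPU to GPU to CPU without the caller writing vendor-specific code. The names (`pick_accelerator`, the preference list) are illustrative, not LiteRT API.

```python
# Hypothetical sketch of accelerator fallback behind a unified API.
# Real LiteRT resolves accelerators internally; names here are illustrative.

PREFERENCE_ORDER = ["NPU", "GPU", "CPU"]  # fastest-to-slowest for these workloads

def pick_accelerator(available: set[str]) -> str:
    """Return the most preferred accelerator the device actually has.

    CPU is assumed always present, so in practice the loop cannot fall through.
    """
    for backend in PREFERENCE_ORDER:
        if backend in available:
            return backend
    raise RuntimeError("no usable accelerator")

# A device with a supported NPU gets it; an older device silently degrades.
print(pick_accelerator({"CPU", "GPU", "NPU"}))  # NPU
print(pick_accelerator({"CPU", "GPU"}))         # GPU
```

The point of the pattern is that application code never branches on the vendor SDK; only the runtime knows which backend it chose.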

Translating NPU performance into meaningful experiences

LiteRT is already hardened across Google products, popular apps, and even SDKs. It is used by industry leaders including Google Meet, Epic Games, and Argmax Inc.; here's what NPU acceleration looks like in real-world production apps.

Google Meet: By leveraging the mobile NPU, Google Meet successfully deployed an Ultra-HD segmentation model 25x larger than earlier versions, without sacrificing inference speed. Crucially, it maintains a consistent power footprint, creating the thermal headroom needed to deliver higher-quality background replacement throughout a typical 20-30 minute session.

Epic Games, Inc: High-fidelity, real-time animation experiences demand exceptional efficiency. Epic's Live Link Face (Beta) app for Android lets creators capture performances from a single camera, then generate and stream real-time MetaHuman facial animation directly from their devices into Unreal Engine.

Real-time facial solving is computationally intensive and requires consistently low latency. By using LiteRT on the NPU, Epic unlocks dedicated on-device acceleration on supported Android devices, enabling up to 30 FPS performance for real-time MetaHuman animation.
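
To make the 30 FPS figure concrete: at 30 frames per second, the whole pipeline (capture, facial solving, streaming) must fit a roughly 33 ms budget per frame. The helper below is a back-of-the-envelope sketch with made-up latencies, not Epic's code.

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds at a target frame rate."""
    return 1000.0 / fps

def sustains_fps(inference_ms: float, other_work_ms: float, fps: float) -> bool:
    """True if inference plus the rest of the pipeline fits the frame budget."""
    return inference_ms + other_work_ms <= frame_budget_ms(fps)

print(round(frame_budget_ms(30), 1))  # 33.3 ms per frame at 30 FPS
print(sustains_fps(15.0, 10.0, 30))   # True: 25 ms fits the 33.3 ms budget
print(sustains_fps(25.0, 10.0, 30))   # False: 35 ms blows the budget
```

This is why "consistently low latency" matters as much as average latency: a single slow inference that misses the budget is a dropped frame.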


Real-time MetaHuman facial animation in Unreal Engine with NPU

Argmax Inc recently launched the Argmax Pro SDK for Android for on-device speech recognition, in collaboration with LiteRT. By using LiteRT and AI Pack feature delivery via Google Play, Argmax was able to bring its top-tier accuracy and real-time speed to Android while respecting app size constraints. Crucially, they leveraged LiteRT's Ahead-Of-Time (AOT) compilation to eliminate costly on-device compilation steps, enabling frontier speech models like NVIDIA Parakeet TDT 0.6B v2 to run with industry-leading latency.

In performance testing across Google Tensor, MediaTek, and Qualcomm Technologies SoCs, the Argmax Pro SDK showed that upgrading from GPU to NPU delivers over a 2x speedup. Beyond the speedups, the power efficiency of NPUs enabled Argmax SDK Enterprise customers like Heidi Health to run reliable on-device live transcription for extended sessions while mitigating the impact on battery life. Finally, by offloading runtime libraries and models to on-demand downloads via Play's AI Packs, the system dynamically obtains the model optimized for the specific NPU.
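
"Over 2x speedup" here is the usual latency ratio between backends. The snippet below shows how such a figure is derived from per-backend latency measurements; the numbers are invented for illustration and are not Argmax's benchmark data.

```python
def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Classic speedup: how many times faster the accelerated path is."""
    return baseline_ms / accelerated_ms

# Illustrative (not measured) per-utterance latencies on one hypothetical SoC:
gpu_ms, npu_ms = 420.0, 180.0
print(f"{speedup(gpu_ms, npu_ms):.2f}x")  # 2.33x
```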


Argmax's Kotlin-first SDK brings top-tier accuracy and real-time speed to Android, with seamless NPU and GPU acceleration by Google LiteRT.

Google AI Edge Gallery App: To help developers test and validate the performance of NPU acceleration, we're happy to announce that the Google AI Edge Gallery app now features NPU support for select Gemma models, along with built-in benchmarking tools. Available on Android, AI Edge Gallery lets you quickly see the true potential of AI performance on mobile hardware. Developers can also access the Google AI Edge Gallery on GitHub to build their own experiences.


Explore various on-device LLM use cases with Google AI Edge Gallery

Scaling performance across the hardware spectrum

While the performance gains in speech, animation, and video are clear, the path to the NPU has historically been difficult for developers to unlock, due to the variety of vendor-specific SDKs and their complexities. By providing a streamlined workflow and cross-platform support, LiteRT lets developers deploy advanced models everywhere from mobile phones to industrial IoT and AI PCs, without sacrificing performance or portability.

Cross-platform NPU support

As highlighted in the recent Google AI Edge Gemma 4 blog post, LiteRT extends NPU acceleration beyond mobile, allowing you to deploy your models across a wide range of hardware using a single framework. For the industrial edge, LiteRT supports platforms like the Qualcomm Dragonwing™ IQ8 Series, which also powers the Arduino UNO Q, enabling high-reliability use cases like robotics and smart manufacturing with models like Gemma 4. For desktop, LiteRT is preparing for AI PCs through OpenVINO™ integration with Intel® Core™ Ultra series 2 and 3 processors, delivering significant power savings and responsiveness for local GenAI workloads.

Performance validation at scale

Google AI Edge Portal provides a benchmarking service across 100+ of the most popular mobile phones, with insights on ML workloads across devices, accelerators, and configurations. Developers can now make data-driven deployment decisions, such as whether to use AOT or JIT, that best suit their use cases and target devices. To use the latest Portal NPU features, sign up for our private preview here.
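
The AOT-vs-JIT decision that such benchmark data supports is largely an amortization question: AOT ships a pre-compiled, per-SoC artifact, while JIT compiles on the device at first run. A toy model of the tradeoff (all numbers hypothetical, purely for illustration):

```python
def total_latency_ms(compile_ms: float, infer_ms: float, runs: int) -> float:
    """Cumulative latency: one-time compile cost plus per-inference cost."""
    return compile_ms + infer_ms * runs

# JIT pays a first-run compile on-device; AOT pre-compiles off-device.
jit = total_latency_ms(compile_ms=3000.0, infer_ms=20.0, runs=100)
aot = total_latency_ms(compile_ms=0.0, infer_ms=20.0, runs=100)
print(jit - aot)  # 3000.0: JIT's extra cost is exactly the compile step
```

In exchange for that first-run cost, JIT avoids shipping per-SoC artifacts; which side wins depends on model size, device mix, and how often the model runs, which is exactly what per-device benchmark data helps answer.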


Google AI Edge Portal benchmarking results

Get started with your NPU journey

With our production-ready NPU integrations, LiteRT provides a unified workflow that abstracts away low-level complexities across both Just-In-Time (JIT) and Ahead-Of-Time (AOT) deployment.

Dive into our documentation and start your journey with NPU acceleration today.

Let us know your feedback and feature requests by opening an issue on our GitHub channel. We can't wait to see what you build!

Acknowledgements

Google: Akshat Sharma, Alice Zheng, Andrew Zhang, Ashley Lin, Byungchul Kim, Changming Sun, Charlie Xu, Chenchen Tang, Chunlei Niu, Cormac Brick, Derek Bekebrede, Fabian Bergmark, Fengwu Yao, Gerardo Carranza, Gregory Karpiak, Jae Yoo, Jing Jin, Jingjiang Li, Julius Kammerl, Jun Jiang, Lu Wang, Maria Lyubimtseva, Mariana Quesada, Marissa Ikonomidis, Matt Kreileder, Matthias Grundmann, Meghna Johar, Na Li, Ping Yu, Renjie Wu, Rishika Sinha, Sachin Kotwani, Salil Tambe, Siargey Pisarchyk, Somdatta Banerjee, Steven Toribio, Suleman Shahid, Terry Heo, Wai Hon Law, Weiyi Wang, Xiaoming Hu

Partners: Alen Huang, Ankit Kapoor, Arda Atahan Ibis, Atila Orhon, Brian Keene, Chen Cen, Cheng-Dao Lee, Cheng-Yen Lin, Chun-Ting Lin (Graham), Code Lin, Deep Yap, Dylan Angus, Felix Baum, HungChun Liu, Jhih-Kuan Lin, Jiun-Kai Yang (Kelvin), Kedar Gharat, Ken Sieger, Laxmi Rayapudi, Lei Chen, Mike Tremaine, Ming-Che Lin (Vincent), Poyuan Jeng, MetaHuman Team, Vinesh Sukumar, Waimun Wong, Yi-Ru Chen, Yu-Ting Wan, Zach Nagengast

Tags: Building, LiteRT, NPU, OnDevice, RealWorld

© 2025 https://techtrendfeed.com/ - All Rights Reserved
