We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a ∼3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global–local attention to deliver high quality at competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines.
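Since mixture-of-experts sparse computation is named as a core ingredient of the PT-MoE server model, a toy sketch may help fix the idea: a gating function scores every expert for each token, only the top-k experts actually run, and their outputs are combined by the normalized gate scores. Everything below (expert count, gate form, top-k value) is an illustrative assumption, not Apple's actual PT-MoE design.

```swift
import Foundation

// Numerically stable softmax over a score vector.
func softmax(_ xs: [Double]) -> [Double] {
    let m = xs.max() ?? 0
    let exps = xs.map { exp($0 - m) }
    let s = exps.reduce(0, +)
    return exps.map { $0 / s }
}

// Toy sparse mixture-of-experts layer: gating weights, expert stubs,
// and a top-k routing rule. Hypothetical names and shapes throughout.
struct MoELayer {
    let gateWeights: [[Double]]           // [numExperts][dim] gating projection
    let experts: [([Double]) -> [Double]] // per-expert feed-forward stubs
    let topK: Int

    func forward(_ token: [Double]) -> [Double] {
        // Score each expert for this token via the gating projection.
        let scores = gateWeights.map { w in zip(w, token).map(*).reduce(0, +) }
        // Keep only the top-k experts: most parameters stay idle per token.
        let ranked = scores.indices.sorted { scores[$0] > scores[$1] }.prefix(topK)
        // Renormalize the surviving gate scores.
        let probs = softmax(ranked.map { scores[$0] })
        // Output is the gate-weighted sum of the selected experts' outputs.
        var out = [Double](repeating: 0, count: token.count)
        for (p, idx) in zip(probs, ranked) {
            let y = experts[idx](token)
            for d in out.indices { out[d] += p * y[d] }
        }
        return out
    }
}
```

With two stub experts and `topK: 1`, a token whose gate score favors the first expert is served by that expert alone, which is the "sparse computation" the abstract refers to: compute scales with k, not with the total expert count.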
A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.
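As a rough illustration of the "few lines of code" claim, a guided-generation call with the Foundation Models framework looks approximately like the sketch below. The macro and type names (`@Generable`, `@Guide`, `LanguageModelSession`, `respond(to:generating:)`) follow Apple's public presentation of the framework, but the struct, prompt, and field names are hypothetical, and the exact API should be checked against current documentation; the code requires an Apple Intelligence-enabled device.

```swift
import FoundationModels

// Guided generation: the on-device model fills a typed Swift value
// rather than returning free-form text. Hypothetical example type.
@Generable
struct Itinerary {
    @Guide(description: "A short, descriptive title")
    var title: String
    var activities: [String]
}

func makeItinerary() async throws -> Itinerary {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Plan a half-day visit to a science museum.",
        generating: Itinerary.self
    )
    return response.content  // a fully typed Itinerary, no manual parsing
}
```

Because the output schema is declared in Swift, the framework constrains decoding to values of that type, which is what makes the integration a few lines rather than a parsing layer.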
This paper provides technical details for Updates to Apple's On-Device and Server Foundation Language Models, announced on June 9, 2025, in this post.







