Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of a pseudo-random generator to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during inference to efficiently generate a random matrix. This matrix is then linearly combined with compressed coefficients to reconstruct the weight block. SeedLM reduces memory access and leverages idle compute cycles during inference, effectively speeding up memory-bound tasks by trading compute for fewer memory accesses. Unlike state-of-the-art methods that rely on calibration data, our approach is data-free and generalizes well across diverse tasks. Our experiments with Llama3 70B, which is particularly challenging, show that zero-shot accuracy retention at 4- and 3-bit compression is on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines. Additionally, FPGA-based tests demonstrate that 4-bit SeedLM, as model size increases, approaches a 4x speed-up over an FP16 Llama 2/3 baseline.
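To make the encode/reconstruct idea concrete, the sketch below illustrates it under stated assumptions; it is not the paper's implementation. The LFSR width and taps, the block size, the brute-force seed search, and the use of unquantized least-squares coefficients are all illustrative choices, and every function name and parameter here is hypothetical.

```python
# Minimal sketch (not the authors' code) of SeedLM-style per-block compression, assuming:
#  - a 16-bit Fibonacci LFSR as the pseudo-random generator (taps chosen for illustration),
#  - each weight block w (length C) is approximated as U @ t, where U is an LFSR-generated
#    C x P matrix and t holds a few coefficients (the real method would also quantize t).
import numpy as np

def lfsr_bits(seed: int, n_bits: int, width: int = 16, taps=(16, 15, 13, 4)):
    """Generate n_bits pseudo-random bits from a Fibonacci LFSR."""
    state = seed & ((1 << width) - 1)
    assert state != 0, "LFSR seed must be non-zero"
    out = np.empty(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        out[i] = state & 1
        fb = 0
        for t in taps:                      # XOR the tapped bits to form the feedback bit
            fb ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (fb << (width - 1))
    return out

def lfsr_matrix(seed: int, rows: int, cols: int, bits_per_entry: int = 3):
    """Map LFSR bits to a rows x cols matrix with small, roughly zero-mean entries."""
    raw = lfsr_bits(seed, rows * cols * bits_per_entry).reshape(rows * cols, bits_per_entry)
    vals = raw @ (1 << np.arange(bits_per_entry))               # pack bits into integers
    centered = vals.astype(np.float32) - (2 ** (bits_per_entry - 1) - 0.5)
    return centered.reshape(rows, cols)

def reconstruct_block(seed: int, coeffs: np.ndarray, block_len: int) -> np.ndarray:
    """Decode: regenerate U from the stored seed and linearly combine with the coefficients."""
    U = lfsr_matrix(seed, block_len, coeffs.shape[0])
    return U @ coeffs

def compress_block(w: np.ndarray, num_coeffs: int = 4, num_seeds: int = 256):
    """Encode: brute-force the best seed, then least-squares fit the coefficients."""
    best = None
    for seed in range(1, num_seeds + 1):
        U = lfsr_matrix(seed, w.shape[0], num_coeffs)
        t, *_ = np.linalg.lstsq(U, w, rcond=None)
        err = np.linalg.norm(U @ t - w)
        if best is None or err < best[0]:
            best = (err, seed, t)
    return best[1], best[2]                 # store only a seed and a few coefficients per block

# Usage: compress and reconstruct one 8-weight block
w = np.random.randn(8).astype(np.float32)
seed, t = compress_block(w)
w_hat = reconstruct_block(seed, t, block_len=8)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

At inference time only `reconstruct_block` is needed, which is why the scheme trades cheap on-the-fly pseudo-random generation for fewer memory accesses: the stored state per block is just a seed plus a handful of coefficients.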