TechTrendFeed

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

By Admin
May 10, 2026


In the rapidly evolving landscape of large language models (LLMs), pre-training is only the first step. To transform a base model into a specialized assistant or a high-performing reasoning engine, post-training is essential. Today, we're excited to announce new features in MaxText that streamline this process: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are now available on single-host TPU configurations (such as v5p-8 and v6e-8).

By leveraging the power of JAX and the efficiency of the Tunix library, MaxText provides a high-performance, scalable path for developers to refine their models using the latest post-training techniques. You can explore the full documentation for SFT and RL to start your post-training journey on TPUs today.

Supervised Fine-Tuning (SFT): Precision Tuning Made Simple

Supervised Fine-Tuning is the primary method for adapting a pre-trained model to follow specific instructions or excel at niche tasks. With the new single-host SFT support, users can now take an existing MaxText or Hugging Face checkpoint and fine-tune it on labeled datasets with minimal setup.

Key Highlights:

  • Seamless Integration: Native support for Hugging Face datasets (e.g., ultrachat_200k).
  • Flexible Checkpoints: Use existing MaxText checkpoints or convert Hugging Face models (like Gemma 3) directly within the ecosystem.
  • Optimized Execution: Powered by Tunix, a JAX-based library specifically designed for post-training efficiency.
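
To make the idea of fine-tuning on labeled data concrete, here is a minimal pure-Python sketch of a typical SFT objective: the mean negative log-likelihood of the target tokens. The function name, the example numbers, and the choice to mask out prompt tokens are illustrative assumptions; this post does not say how MaxText or Tunix compute or mask the loss.

```python
def masked_sft_loss(token_logprobs, loss_mask):
    """Mean negative log-likelihood over tokens whose mask entry is 1.

    token_logprobs: log-probability the model assigned to each target token.
    loss_mask: 1 for tokens that should contribute to the loss, 0 otherwise.
    """
    total = sum(-lp * m for lp, m in zip(token_logprobs, loss_mask))
    count = sum(loss_mask)
    # Guard against an all-zero mask to avoid dividing by zero.
    return total / max(count, 1)


# Hypothetical example: 3 prompt tokens (masked out, an assumption)
# followed by 2 response tokens that are trained on.
token_logprobs = [-2.3, -1.7, -0.9, -0.5, -0.1]
loss_mask = [0, 0, 0, 1, 1]
loss = masked_sft_loss(token_logprobs, loss_mask)
```

Only the two response tokens contribute here, so the loss is the average of their negative log-probabilities.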

Reinforcement Learning (RL): Advancing Reasoning Capabilities

For tasks requiring complex logic and reasoning, such as math or coding, Reinforcement Learning is a game-changer. MaxText now supports several state-of-the-art RL algorithms on single-host TPUs, using vLLM for high-throughput inference during the training loop. For example:

  1. Group Relative Policy Optimization (GRPO): GRPO is a memory-efficient variant of PPO (Proximal Policy Optimization). It eliminates the need for a separate value function model, instead generating multiple responses per prompt and calculating relative advantages within the group. This significantly reduces the hardware footprint, making advanced RL accessible on a single TPU host.
  2. Group Sequence Policy Optimization (GSPO): GSPO focuses on sequence-level importance ratios and clipping. It improves training stability and efficiency by rewarding model behavior at the sequence level, making it particularly effective for improving performance on benchmarks like GSM8K.
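
The two ideas above can be sketched in a few lines of plain Python. This is an illustration of the general techniques as commonly described, not MaxText or Tunix code: the function names, the reward values, the log-probabilities, and the clipping range `clip_eps=0.2` are all assumptions chosen for readability.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: score each response relative to its own group.

    No value model is needed; the group's mean reward acts as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_importance_ratio(new_logps, old_logps):
    """GSPO-style ratio: length-normalized likelihood ratio of a whole sequence.

    new_logps / old_logps are per-token log-probabilities of the same response
    under the current policy and the policy that sampled it.
    """
    n = len(new_logps)
    return math.exp(sum(nl - ol for nl, ol in zip(new_logps, old_logps)) / n)


def clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate, applied once per sequence."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical group: four sampled answers to one prompt, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical per-token log-probs for the first response.
ratio = sequence_importance_ratio([-1.0, -2.0, -0.5], [-1.2, -2.1, -0.9])
objective = clipped_objective(ratio, advantages[0])
```

Correct answers receive positive advantages and incorrect ones negative, and because the ratio is computed once per response rather than per token, clipping limits how far any single sequence can move the update.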

Getting Started

To begin using these new features, ensure you have the latest post-training dependencies installed:

uv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest
install_maxtext_tpu_post_train_extra_deps


Running SFT:

You can launch an SFT run using the train_sft module, specifying your model, dataset, and output directory:

python3 -m maxtext.trainers.post_train.sft.train_sft \
   model_name=${MODEL?} \
   load_parameters_path=${MAXTEXT_CKPT_PATH?} \
   run_name=${RUN_NAME?} \
   base_output_directory=${BASE_OUTPUT_DIRECTORY?}


Running RL (GRPO/GSPO):

For RL, the train_rl module handles the loading of policy and reference models, executes the training, and provides automated evaluation on reasoning benchmarks:

python3 -m maxtext.trainers.post_train.rl.train_rl \
  model_name=${MODEL?} \
  load_parameters_path=${MAXTEXT_CKPT_PATH?} \
  run_name=${RUN_NAME?} \
  base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
  loss_algo=gspo-token \
  chips_per_vm=${CHIPS_PER_VM?}

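
If you prefer to launch the RL run from a script, the sketch below builds the same argument list in Python and fails early when a variable is missing, mirroring what the shell's `${VAR?}` expansion does. The helper name `build_rl_command` and every value in `example_env` are hypothetical placeholders; only the module path and the `key=value` arguments are taken from the command above.

```python
REQUIRED_VARS = [
    "MODEL",
    "MAXTEXT_CKPT_PATH",
    "RUN_NAME",
    "BASE_OUTPUT_DIRECTORY",
    "CHIPS_PER_VM",
]


def build_rl_command(env, loss_algo="gspo-token"):
    """Return the train_rl argument list, raising if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError("unset environment variables: " + ", ".join(missing))
    return [
        "python3", "-m", "maxtext.trainers.post_train.rl.train_rl",
        f"model_name={env['MODEL']}",
        f"load_parameters_path={env['MAXTEXT_CKPT_PATH']}",
        f"run_name={env['RUN_NAME']}",
        f"base_output_directory={env['BASE_OUTPUT_DIRECTORY']}",
        f"loss_algo={loss_algo}",
        f"chips_per_vm={env['CHIPS_PER_VM']}",
    ]


# Placeholder values for illustration only; substitute your own.
example_env = {
    "MODEL": "my-model",
    "MAXTEXT_CKPT_PATH": "/path/to/checkpoint",
    "RUN_NAME": "my-rl-run",
    "BASE_OUTPUT_DIRECTORY": "/path/to/output",
    "CHIPS_PER_VM": "8",
}
command = build_rl_command(example_env)
```

In a real script you would pass `os.environ` instead of `example_env` and hand the resulting list to `subprocess.run(command, check=True)`.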

What's Next?

While single-host support provides a powerful entry point for many developers, MaxText is built for scale. These same workflows are designed to transition seamlessly to multi-host configurations for those training larger models and using massive datasets. Please stay tuned for more updates in this direction from us in the future.
