{"id":14631,"date":"2026-05-10T15:44:17","date_gmt":"2026-05-10T15:44:17","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=14631"},"modified":"2026-05-10T15:44:17","modified_gmt":"2026-05-10T15:44:17","slug":"maxtext-expands-put-up-coaching-capabilities-introducing-sft-and-rl-on-single-host-tpus","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=14631","title":{"rendered":"MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p data-block-key=\"qa03p\">In the rapidly evolving landscape of large language models (LLMs), pre-training is only the first step. To transform a base model into a specialized assistant or a high-performing reasoning engine, post-training is essential. Today, we&#8217;re excited to announce new features in <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/AI-Hypercomputer\/maxtext\">MaxText<\/a> that streamline this process: <b>Supervised Fine-Tuning (SFT)<\/b> and <b>Reinforcement Learning (RL)<\/b>, now available on single-host TPU configurations (such as v5p-8 and v6e-8).<\/p>\n<p data-block-key=\"c3kr5\">By leveraging the power of JAX and the efficiency of the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/google\/tunix\/tree\/main\">Tunix<\/a> library, MaxText provides a high-performance, scalable path for developers to refine their models using the latest post-training techniques. 
You can explore the full documentation for <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/maxtext.readthedocs.io\/en\/maxtext-v0.2.1\/tutorials\/posttraining\/sft.html\">SFT<\/a> and <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/maxtext.readthedocs.io\/en\/maxtext-v0.2.1\/tutorials\/posttraining\/rl.html\">RL<\/a> to start your post-training journey on TPUs today.<\/p>\n<h3 data-block-key=\"ft5uk\" id=\"supervised-fine-tuning-(sft):-precision-tuning-made-simple\"><b>Supervised Fine-Tuning (SFT): Precision Tuning Made Simple<\/b><\/h3>\n<p data-block-key=\"ac7tu\">Supervised Fine-Tuning is the primary technique for adapting a pre-trained model to follow specific instructions or excel at niche tasks. With the new single-host SFT support, users can now take an existing MaxText or Hugging Face checkpoint and fine-tune it on labeled datasets with minimal setup.<\/p>\n<p data-block-key=\"fc6fo\"><b>Key Highlights:<\/b><\/p>\n<ul>\n<li data-block-key=\"e959h\"><b>Seamless Integration:<\/b> Native support for Hugging Face datasets (e.g., ultrachat_200k).<\/li>\n<li data-block-key=\"8ml3h\"><b>Flexible Checkpoints:<\/b> Use existing MaxText checkpoints or convert Hugging Face models (like Gemma 3) directly within the ecosystem.<\/li>\n<li data-block-key=\"9cds9\"><b>Optimized Execution:<\/b> Powered by Tunix, a JAX-based library specifically designed for post-training efficiency.<\/li>\n<\/ul>\n<h3 data-block-key=\"7na1g\" id=\"reinforcement-learning-(rl):-advancing-reasoning-capabilities\"><b>Reinforcement Learning (RL): Advancing Reasoning Capabilities<\/b><\/h3>\n<p data-block-key=\"8cols\">For tasks requiring complex logic and reasoning\u2014such as math or coding\u2014Reinforcement Learning is a game-changer. MaxText now supports several state-of-the-art RL algorithms on single-host TPUs, using <b>vLLM<\/b> for high-throughput inference during the training loop. 
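To make the group-based methods described next concrete, here is a minimal, illustrative Python sketch of their two core computations: the group-relative advantage that lets GRPO drop the separate value model, and the clipped sequence-level importance ratio at the heart of GSPO. The function names are hypothetical and are not the Tunix or MaxText API.

```python
# Illustrative sketch only -- these helpers are hypothetical and NOT part of
# the MaxText/Tunix API; they just show the core arithmetic of the methods.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each response's reward against the
    mean/std of the group sampled for the same prompt (no value model)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_sequence_ratio(ratio, eps=0.2):
    """GSPO-style clipping: a PPO-like clip applied to one importance
    ratio per sequence rather than one ratio per token."""
    return max(1.0 - eps, min(ratio, 1.0 + eps))

# Four responses sampled for one prompt; reward 1.0 = correct, 0.0 = wrong.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Above-mean responses receive positive advantage and below-mean responses
# negative, pushing the policy toward the better responses in each group.
```

The real updates in Tunix operate on token log-probabilities and batched JAX arrays; this sketch only mirrors the normalization and clipping logic described in the text.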
For example,<\/p>\n<ol>\n<li data-block-key=\"9008o\"><b>Group Relative Policy Optimization (GRPO)<\/b> GRPO is a memory-efficient variant of PPO (Proximal Policy Optimization). It eliminates the need for a separate value-function model, instead generating multiple responses per prompt and computing relative advantages within the group. This significantly reduces the hardware footprint, making advanced RL accessible on a single TPU host.<\/li>\n<li data-block-key=\"2868s\"><b>Group Sequence Policy Optimization (GSPO)<\/b> GSPO focuses on sequence-level importance ratios and clipping. It improves training stability and efficiency by rewarding model behavior at the sequence level, making it particularly effective for improving performance on benchmarks like GSM8K.<\/li>\n<\/ol>\n<h3 data-block-key=\"r5tvs\" id=\"getting-started\"><b>Getting Started<\/b><\/h3>\n<p data-block-key=\"42c9\">To begin using these new features, ensure you have the latest post-training dependencies installed:<\/p>\n<\/div>\n<div>\n<pre><code class=\"language-shell\">uv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest&#13;\ninstall_maxtext_tpu_post_train_extra_deps<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div>\n<h4 data-block-key=\"u2bfq\" id=\"running-sft:\"><b>Running SFT:<\/b><\/h4>\n<p data-block-key=\"6hb1l\">You can launch an SFT run using the train_sft module, specifying your model, dataset, and output directory:<\/p>\n<\/div>\n<div>\n<pre><code class=\"language-shell\">python3 -m maxtext.trainers.post_train.sft.train_sft \\\\&#13;\n   model_name=${MODEL?} \\\\&#13;\n   load_parameters_path=${MAXTEXT_CKPT_PATH?} \\\\&#13;\n   run_name=${RUN_NAME?} \\\\&#13;\n   base_output_directory=${BASE_OUTPUT_DIRECTORY?}<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div>\n<h4 data-block-key=\"t6ar5\" id=\"\"><b>Running RL (GRPO\/GSPO):<\/b><\/h4>\n<p data-block-key=\"867d5\">For RL, the train_rl module 
handles loading the policy and reference models, executes the training, and provides automated evaluation on reasoning benchmarks:<\/p>\n<\/div>\n<div>\n<pre><code class=\"language-shell\">python3 -m maxtext.trainers.post_train.rl.train_rl \\\\&#13;\n  model_name=${MODEL?} \\\\&#13;\n  load_parameters_path=${MAXTEXT_CKPT_PATH?} \\\\&#13;\n  run_name=${RUN_NAME?} \\\\&#13;\n  base_output_directory=${BASE_OUTPUT_DIRECTORY?} \\\\&#13;\n  loss_algo=gspo-token \\\\&#13;\n  chips_per_vm=${CHIPS_PER_VM?}<\/code><\/pre>\n<p>\n        Shell\n    <\/p>\n<\/div>\n<div>\n<h3 data-block-key=\"86gcl\" id=\"what's-next\"><b>What\u2019s Next?<\/b><\/h3>\n<p data-block-key=\"a5onj\">While single-host support provides a powerful entry point for many developers, MaxText is built for scale. These same workflows are designed to transition seamlessly to multi-host configurations for those training larger models on massive datasets. Stay tuned for more updates from us in this direction.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>In the rapidly evolving landscape of large language models (LLMs), pre-training is only the first step. To transform a base model into a specialized assistant or a high-performing reasoning engine, post-training is essential. 
Today, we&#8217;re excited to announce new features in MaxText that streamline this process: Supervised Fine-Tuning (SFT) and Reinforcement Learning [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":14633,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56],"tags":[610,3550,979,9025,9026,9027,9028,7308],"class_list":["post-14631","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software","tag-capabilities","tag-expands","tag-introducing","tag-maxtext","tag-posttraining","tag-sft","tag-singlehost","tag-tpus"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14631","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14631"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14631\/revisions"}],"predecessor-version":[{"id":14632,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14631\/revisions\/14632"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/14633"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14631"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14631"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14631"}],"c
uries":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}