{"id":14475,"date":"2026-05-05T19:10:51","date_gmt":"2026-05-05T19:10:51","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=14475"},"modified":"2026-05-05T19:10:51","modified_gmt":"2026-05-05T19:10:51","slug":"portool-significance-conscious-coverage-optimization-with-rewarded-tree-for-multi-instrument-built-in-reasoning","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=14475","title":{"rendered":"PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents\u2019 tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step\u2019s importance by a correctness-dominant signal, i.e., whether descendants of that step can eventually produce a correct final answer, plus an auxiliary term indicating whether the step\u2019s tool calls execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. 
Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.<\/p>\n<ul class=\"links-stacked\">\n<li>\u2020 Purdue University<\/li>\n<li>** Work done while at Apple<\/li>\n<\/ul>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents\u2019 [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":14477,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[8959,8961,1252,1488,8958,616,8960,3580],"class_list":["post-14475","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-importanceaware","tag-multitoolintegrated","tag-optimization","tag-policy","tag-portool","tag-reasoning","tag-rewarded","tag-tree"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14475","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14475"}],"version
-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14475\/revisions"}],"predecessor-version":[{"id":14476,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/14475\/revisions\/14476"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/14477"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14475"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}