{"id":7147,"date":"2025-09-29T00:59:07","date_gmt":"2025-09-29T00:59:07","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=7147"},"modified":"2025-09-29T00:59:07","modified_gmt":"2025-09-29T00:59:07","slug":"personateaming-exploring-how-introducing-personas-can-enhance-automated-ai-crimson-teaming","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=7147","title":{"rendered":"PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>This paper was accepted at the Workshop on Regulatable ML (ReML) at NeurIPS 2025.<\/p>\n<p>Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they&#8217;re likely to uncover. While automated red-teaming approaches promise to complement human red-teaming by enabling larger-scale exploration of model behavior, current approaches do not consider the role of identity. As an initial step towards incorporating people\u2019s backgrounds and identities in automated red-teaming, we develop and evaluate a novel method, PersonaTeaming, that introduces personas in the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. In particular, we first introduce a methodology for mutating prompts based on either \u201cred-teaming expert\u201d personas or \u201cregular AI user\u201d personas. We then develop a dynamic persona-generating algorithm that automatically generates various persona types adaptive to different seed prompts. In addition, we develop a set of new metrics to explicitly measure the \u201cmutation distance\u201d to complement existing diversity measurements of adversarial prompts. 
Our experiments show promising improvements (up to 144.1%) in the attack success rates of adversarial prompts through persona mutation, while maintaining prompt diversity, compared to RainbowPlus, a state-of-the-art automated red-teaming method. We discuss the strengths and limitations of different persona types and mutation methods, shedding light on future opportunities to explore complementarities between automated and human red-teaming approaches.<\/p>\n<ul class=\"links-stacked\">\n<li>\u2020 Carnegie Mellon University<\/li>\n<li>\u2021 Independent Researcher<\/li>\n<li>** Work done while at Apple<\/li>\n<\/ul>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>This paper was accepted at the Workshop on Regulatable ML (ReML) at NeurIPS 2025. Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":7149,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[4042,79,267,979,705,5609,5610],"class_list":["post-7147","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-automated","tag-exploring","tag-improve","tag-introducing","tag-personas","tag-personateaming","tag-redteaming"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7147"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7147\/revisions"}],"predecessor-version":[{"id":7148,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7147\/revisions\/7148"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/7149"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}