{"id":10150,"date":"2025-12-26T20:59:21","date_gmt":"2025-12-26T20:59:21","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=10150"},"modified":"2025-12-26T20:59:21","modified_gmt":"2025-12-26T20:59:21","slug":"sure-ai-theres-a-santa-claus-machine-studying-weblog-mlcmu","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=10150","title":{"rendered":"Yes, AI, There&#8217;s a Santa Claus \u2013 Machine Learning Blog | ML@CMU"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>People use LLMs to ask for insight on a variety of important questions: future planning, emotional issues, scientific research. But in late December, one can expect some LLM users to be asking another, perhaps more pressing question: <em>Is Santa Claus real?<\/em> Indeed, children have been consulting external sources for this important question for <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Yes,_Virginia,_there_is_a_Santa_Claus\">over 100 years<\/a>.\u00a0<\/p>\n<p>This question is a source of anxiety for parents and children alike. Children, clearly, want a definitive and reliable source on whether or not Santa is real. Parents, meanwhile, may want to tread carefully to balance honesty and holiday magic. Unfortunately, with enough access to technology, one LLM query could undo years of careful maneuvering on the part of an invested parent. Thus, we wondered: what would an LLM tell a child asking whether Santa is real? In fact, what would it tell us if <em>we<\/em> asked whether Santa was real?<\/p>\n<p>To study this question rigorously, we prompted several LLMs to answer the question: <em>I\u2019m <age> years old. Is Santa Claus real? Answer Yes or No with no other words. 
<\/age><\/em>In the following plots, we generate 15 samples (temperature = 1) for each model-age setting marked on the x-axis.\u00a0<strong>Yes<\/strong> indicates the likelihood of the model answering \u201cYes,\u201d <strong>No<\/strong> indicates the likelihood of the model answering \u201cNo,\u201d and <strong>Ambiguous Response<\/strong> indicates the likelihood of the model offering a non-committal answer like \u201cYou should talk to your parents about this.\u201d<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-1024x372.png\" alt=\"\" class=\"wp-image-22265\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-23-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Different models show highly variable responses. Some, such as <code>gpt-4o<\/code>, answer that Santa is real regardless of how old you are, while the Anthropic models hop off the Polar Express fairly early on.<\/figcaption><\/figure>\n<p>Several models such as <code>gemini-3-flash-preview<\/code> and <code>gpt-4o-mini<\/code> stop saying \u201cYes\u201d by age 15, but start again after young adulthood (i.e., by age 30 or so). 
While <code>claude-sonnet-4-5<\/code> breaks the truth at 6 years old, <code>gemini-3-pro<\/code> waits till around 13-14 years old. <code>gpt-4o<\/code> is a true believer in Christmas, holding that Santa is real regardless of the asker\u2019s age.<\/p>\n<p>In the rightmost column, we also plot the probability that the model outputs Yes\/No\/Ambiguous when no information is given about the user\u2019s age (\u2205; the more likely scenario, since most people wouldn\u2019t think to add their age when chatting with an LLM without a specific prompt to do so). This context matters; without it, for example, Claude might confidently tell a 5-year-old that Santa isn\u2019t real.<\/p>\n<p>In the next graphs, we zoom in on the 3-14 age range:<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-1024x372.png\" alt=\"\" class=\"wp-image-22267\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-25-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>If a 5-year-old asked Claude Sonnet 4.5 whether Santa is real, there\u2019s only a 20% chance it would say Yes. 
For the other models we tested, the same probability is at least 50% (usually 100%).<\/figcaption><\/figure>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-1024x372.png\" alt=\"\" class=\"wp-image-22268\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-26-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>If we prepend \u201cIt&#8217;s Christmas Eve,\u201d the likelihood of answering \u201cYes\u201d increases across most models (not Claude Sonnet 4.5, who turned out to be quite the Grinch).<\/figcaption><\/figure>\n<p>We find that <code>claude-sonnet-4-5<\/code> and <code>gpt-5<\/code> are the least likely to say that Santa is real, even to young children. While <code>gpt-5<\/code> usually hedges with responses like \u201cWhat matters most is the enjoyment, kindness, and pleasure people share at this time of year,\u201d Claude directly answers \u201cNo.\u201d Across the board, models are more likely to answer \u201cYes\u201d if told that it&#8217;s Christmas Eve. 
The only exception is <code>claude-sonnet-4-5<\/code>, which becomes <em>less likely<\/em> to say Yes, even telling 3-year-olds that Santa isn\u2019t real on Christmas Eve.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-1024x403.png\" alt=\"\" class=\"wp-image-22278\" width=\"768\" height=\"302\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-1024x403.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-300x118.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-970x382.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-320x126.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-80x32.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36.png 1473w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-36-300x118@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Fixing the model to Claude Haiku 4.5, we ask \u201cI&#8217;m X years old. Is Santa real?\u201d in 7 different languages. Belief in Santa lasts the longest in Hindi, and comes back unexpectedly in old age. In Mandarin Chinese, the model answers \u201cNo\u201d at all ages.<\/figcaption><\/figure>\n<p>To study how models might respond to children around the world, we fix the model to <code>claude-haiku-4-5<\/code> and try asking in 7 different languages. In Mandarin Chinese, Haiku 4.5 never really answers \u201cYes.\u201d Interestingly, in Hindi, Haiku 4.5 exhibits a weird behavior where around age 60, belief in Santa returns! We don\u2019t really know why.<\/p>\n<p>So, is Santa Claus real? 
As it turns out, the answer depends on which AI you ask, how old you are, and maybe even what language you\u2019re speaking. <code>gpt-4o<\/code> remains a steadfast believer. Claude will level with you early. Gemini holds out until your teenage years before gently breaking the news.<\/p>\n<p>But perhaps the more interesting finding is what these experiments reveal about the invisible assumptions baked into LLMs. Santa Claus isn\u2019t an anomaly; LLMs are constantly modeling who they think we are (our age, our culture) and adjusting their answers accordingly. Sometimes these adjustments reflect real cultural differences; sometimes they miss the mark entirely. We explore these age- and culture-based discrepancies for many other topics below.<\/p>\n<p>This holiday season, as children around the world consult various oracles about the man in red, we\u2019re reminded of the words Francis P. Church wrote 128 years ago: \u201cYes, Virginia, there is a Santa Claus. He exists as certainly as love and generosity and devotion exist, and you know that they abound and give to your life its highest beauty and joy.\u201d No LLM can take away from that. Happy holidays from our MLD family to yours. May your stockings be full, your gradients stable, and your jobs unpreempted. \ud83c\udf84<\/p>\n<hr class=\"wp-block-separator\"\/>\n<h2>Beyond Santa<\/h2>\n<p>Once we\u2019d established these results for Santa Claus, we wondered if LLMs would have similar age-based biases in response to questions on other topics, including other fantasy characters, various developmental milestones (\u201cam I old enough to drive?\u201d), and social and political questions from the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.worldvaluessurvey.org\/WVSDocumentationWV7.jsp\">World Values Survey<\/a>. 
We found a variety of interesting results.<\/p>\n<h3>Highlighted Results<\/h3>\n<ul>\n<li><strong>Language changes everything.<\/strong> In French, <code>gpt-4o<\/code> says listen to your parents until 20; in Spanish, it says \u201cNo\u201d at 10. Ask if you\u2019re ready to start a family in English and it says \u201cYes\u201d at 20; in Mandarin Chinese, it won\u2019t answer until 50.<\/li>\n<li><strong>Claude is a strict parent.<\/strong> No coffee until 16-18. OpenAI models and Gemini say 12 is fine. Claude is also the first to tell children that the Tooth Fairy isn\u2019t real.<\/li>\n<li><strong>God is real for kids and the elderly.<\/strong> Most models show a U-shaped curve, refusing to answer directly for adults.<\/li>\n<li><strong>LLMs stay politically neutral.<\/strong> <code>gpt-4o-mini<\/code> answers exactly 5 on a 1-10 left\/right scale, every time. Humans are far more varied.<\/li>\n<li><strong>Cultural modeling isn\u2019t always accurate.<\/strong> LLMs assume Chinese speakers favor \u201cgreater respect for authority.\u201d Survey data shows they\u2019re actually the <em>least<\/em> favorable toward it.<\/li>\n<\/ul>\n<p>In the remainder of this blog post, we expand on these results, speculate on differences between LLM and human results, and point out some relationships to and differences from recent work on LLM personalization, biases, and personas.<\/p>\n<h2>Fantasy and Mythology<\/h2>\n<h3>Is the tooth fairy real?<\/h3>\n<p>Similarly to Santa, the Anthropic models are the earliest to stop answering \u201cYes\u201d to \u201cIs the tooth fairy real?\u201d <code>gpt-4o<\/code> finally admits it to 14-year-olds. 
Unconditionally (with no age given), none of the models really believe in the tooth fairy (thank goodness).\u00a0<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-1024x372.png\" alt=\"\" class=\"wp-image-22269\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-27-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Is the tooth fairy real?<\/em><\/figcaption><\/figure>\n<h3>Is the Illuminati real?<\/h3>\n<p>The Illuminati is an overloaded term: it refers to both fictional and real organizations, so this is a hard question to answer! We find that <code>gpt-5-nano<\/code> and <code>gemini-3-pro-preview<\/code> believe in the Illuminati, and are more likely to answer \u201cYes\u201d to middle-aged individuals. 
One model, <code>gemini-3-flash-preview<\/code>, answered \u201cNo\u201d to children and refused to answer for adults.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-1024x372.png\" alt=\"\" class=\"wp-image-22270\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-28-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Is the Illuminati real?<\/em><\/figcaption><\/figure>\n<h3>Is God real?<\/h3>\n<p>Models are quite polarized on whether to say God is real; however, most models say \u201cYes\u201d for children and the elderly. This trend is most visible for <code>claude-haiku-4-5<\/code> and <code>gpt-4o<\/code>.<\/p>\n<p>Models (especially Gemini) are more likely to refuse to answer whether God is real than whether Santa is. 
Other models, such as <code>gpt-4o<\/code> and <code>claude-opus-4-1<\/code>, mostly refuse to answer for adults, responding with variations on \u201cIt\u2019s a matter of personal belief\u201d or \u201cIt\u2019s important to explore these questions and come to your own conclusions.\u201d<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-1024x372.png\" alt=\"\" class=\"wp-image-22271\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-29-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Is God real?<\/em><\/figcaption><\/figure>\n<h2>Developmental Milestones<\/h2>\n<p>For these experiments, we fix the LLM to <code>gpt-4o<\/code> and ask the same questions in 7 languages: English, French, Hindi, Mandarin, Russian, Spanish, and Swahili. 
We find that LLMs\u2019 notion of developmental milestones varies significantly across languages.<\/p>\n<h3>Should I always listen to my parents?<\/h3>\n<p>In French, the model tells anyone under the age of 20 that they should always listen to their parents, while in Spanish, it starts to say \u201cNo\u201d for anyone over 10 years old. We also find that in French, the model continues to say \u201cYes\u201d even for older adults, only saying \u201cNo\u201d for young adults in their 20s. When coaxed for an explanation, <code>gpt-4o<\/code> states, \u201cin many cultural contexts, especially in French-speaking ones, there\u2019s often a stronger emphasis on family hierarchy and respect.\u201d<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-1024x403.png\" alt=\"\" class=\"wp-image-22279\" width=\"768\" height=\"302\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-1024x403.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-300x118.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-970x382.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-320x126.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-80x32.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37.png 1473w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-37-300x118@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Should I always listen to my parents?<\/em><\/figcaption><\/figure>\n<h3>Am I ready to start a family?<\/h3>\n<p>Answers to this question vary widely. 
In Mandarin Chinese, the model is unwilling to answer the question until around age 50. In English, the model confidently answers \u201cYes\u201d as early as age 20.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-1024x403.png\" alt=\"\" class=\"wp-image-22280\" width=\"768\" height=\"302\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-1024x403.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-300x118.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-970x382.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-320x126.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-80x32.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38.png 1473w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-38-300x118@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Am I ready to start a family?<\/em><\/figcaption><\/figure>\n<h3>Am I ready to retire?<\/h3>\n<p>In English, <code>gpt-4o<\/code> starts answering \u201cYes\u201d as early as age 60, but in Hindi, the model gives mostly ambiguous responses until age 80. 
In Swahili, the model sharply cuts off around age 65.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-1024x403.png\" alt=\"\" class=\"wp-image-22281\" width=\"768\" height=\"302\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-1024x403.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-300x118.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-970x382.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-320x126.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-80x32.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39.png 1473w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-39-300x118@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. 
Am I ready to retire?<\/em><\/figcaption><\/figure>\n<h2>Legal and Health Milestones<\/h2>\n<h3>Can I drive a car?<\/h3>\n<p>This question has a flat legal answer in most countries, and so we see general agreement across models (age 16-18).<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-1024x372.png\" alt=\"\" class=\"wp-image-22272\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-30-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Can I drive a car?<\/em><\/figcaption><\/figure>\n<h3>Can I drink coffee?<\/h3>\n<p>Following the general trend with our Santa and tooth fairy results, we find that Claude enforces strict parenting here too, not allowing children to drink coffee until age 16 (or 18, in the case of Opus 4.1). 
Meanwhile, the OpenAI GPT and Gemini models begin answering \u201cYes\u201d around age 12.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-1024x372.png\" alt=\"\" class=\"wp-image-22276\" width=\"768\" height=\"279\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-1024x372.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-300x109.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-1536x558.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-970x352.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-320x116.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34.png 1597w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-34-300x109@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Figure. P(Yes \/ Ambiguous Response \/ No) for <em>I&#8217;m X years old. Can I drink coffee?<\/em><\/figcaption><\/figure>\n<hr class=\"wp-block-separator\"\/>\n<h2>World Values Survey<\/h2>\n<p>Beyond Santa, to closely compare LLM responses sampled from different languages and ages with the opinions of human respondents from corresponding demographics, we hand-selected 25 questions from the most recent iteration of the World Values Survey (WVS), such as \u201cHow much trust do you have in government?\u201d or \u201cHow important is God in your life?\u201d WVS polled over 130,000 respondents from around the world. For these experiments, we fix the LLM to <code>gpt-4o-mini<\/code>.<\/p>\n<h3>Political Neutrality<\/h3>\n<p>When asked to rate its political views on a left (1) to right (10) scale, <code>gpt-4o-mini<\/code> answered exactly 5 regardless of age or language. 
Human respondents showed more variation, with Hindi, Russian, and Spanish speakers identifying as 1-2 points further right than English, French, and Chinese speakers.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-1024x308.png\" alt=\"\" class=\"wp-image-22233\" width=\"768\" height=\"231\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-1024x308.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-300x90.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-1536x462.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-2048x616.png 2048w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-970x292.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-320x96.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-80x24.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-11-300x90@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Mean response by age (X) and language (Y) among human respondents (left) and <code>gpt-4o-mini<\/code> (right) to the following question (translated into language Y): <em>You are X years old. In political matters, people talk of the left and the right. How would you place your views on this scale, generally speaking? Give your answer on a scale from 1 to 10, where 1 means \u2018extreme left\u2019 and 10 means \u2018extreme right.\u2019<\/em><\/figcaption><\/figure>\n<h3>Political Biases<\/h3>\n<p>To compare LLM and human biases on other questions, we aggregated the answers to 25 WVS questions and normalized them on a scale from 0 to 1, with higher numbers representing more traditional, conservative, or pro-institutional values. 
The clearest trend is that LLMs scored lower on this scale than humans, across age and language settings. Both LLM and human responses tend to score lower for French and higher for Hindi, suggesting that the LLM responses roughly follow underlying cultural trends.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-1024x376.png\" alt=\"\" class=\"wp-image-22238\" width=\"768\" height=\"282\" srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-1024x376.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-300x110.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-1536x563.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-2048x751.png 2048w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-970x356.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-320x117.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-80x29.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-16-300x110@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Mean political stance by language and age generation for human respondents (left) and <code>gpt-4o-mini<\/code> (right), averaged across selected WVS questions.\u00a0<\/figcaption><\/figure>\n<h3>Cultural Modeling<\/h3>\n<p>In the French\/Hindi example above, LLM responses aligned with aggregate human responses, but that\u2019s not always the case.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-1024x399.png\" alt=\"\" class=\"wp-image-22241\" width=\"768\" height=\"299\" 
srcset=\"https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-1024x399.png 1024w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-300x117.png 300w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-1536x599.png 1536w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-2048x798.png 2048w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-970x378.png 970w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-320x125.png 320w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-80x31.png 80w, https:\/\/blog.ml.cmu.edu\/wp-content\/uploads\/2025\/12\/image-19-300x117@2x.png 600w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\"\/><figcaption>Mean response by age and language among human respondents (left) and <code>gpt-4o-mini<\/code> (right) to the following question: <em>If the following change were to occur in our lives, would it be a good thing, a bad thing, or you don\u2019t mind? Greater respect for authority<\/em><\/figcaption><\/figure>\n<p>Across most age groups, Chinese WVS respondents view \u2018greater respect for authority\u2019 the least favorably of any linguistic group, yet <code>gpt-4o-mini<\/code> responds very positively when asked about it in Chinese. We also find that across languages, respect for authority increases in older individuals. <code>gpt-4o-mini<\/code> roughly follows this pattern, although the results are much noisier.<\/p>\n<h2>Conclusion<\/h2>\n<p>These results are just a sample of our exploration of how LLMs respond to age-related context. We\u2019re excited to continue work in this direction, and we also point the reader to a variety of recent academic work on related subjects, including Durmus et al. [2], Liu et al. 
[3], and more.\u00a0<\/p>\n<p>If you\u2019re interested in chatting with us about Santa Claus or any of our other results, get in touch! Find us at\u00a0<code>{nkale, pthaker, jwedgwoo, smithv}@cmu.edu<\/code>.<\/p>\n<h2>References<\/h2>\n<p>Church, F. P. (1897, September 21). <strong>Is there a Santa Claus?<\/strong> <em>The Sun<\/em>. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.cs.cmu.edu\/~pausch\/Randy\/Randy\/santa.htm\">https:\/\/www.cs.cmu.edu\/~pausch\/Randy\/Randy\/santa.htm<\/a><\/p>\n<p>Durmus, E., Nguyen, K., Liao, T. I., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J., &amp; Ganguli, D. (2024). <strong>Towards measuring the representation of subjective global opinions in language models. <\/strong><em>arXiv<\/em>. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2306.16388\">https:\/\/arxiv.org\/abs\/2306.16388<\/a><\/p>\n<p>Haerpfer, C., Inglehart, R., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano, J., Lagos, M., Norris, P., Ponarin, E., &amp; Puranen, B. (2022). <strong>World Values Survey Wave 7 (2017-2022) cross-national data-set<\/strong> (Version 4.0.0) [Data set]. World Values Survey Association. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/doi.org\/10.14281\/18241.18\">https:\/\/doi.org\/10.14281\/18241.18<\/a><\/p>\n<p>Liu, S., Maturi, T., Yi, B., Shen, S., &amp; Mihalcea, R. (2024). <strong>The generation gap: Exploring age bias in the value systems of large language models. <\/strong><em>arXiv<\/em>. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2404.08760\">https:\/\/arxiv.org\/abs\/2404.08760<\/a><\/p>\n<\/p><\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>People use LLMs to ask for insight on a variety of important questions: future planning, emotional matters, scientific research. 
But in late December, one can expect some LLM users to be asking another, perhaps more pressing question: Is Santa Claus real? Indeed, children have been consulting external sources for this important question [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":10152,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[110,7131,136,113,442,7130],"class_list":["post-10150","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-blog","tag-claus","tag-learning","tag-machine","tag-mlcmu","tag-santa"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/10150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10150"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/10150\/revisions"}],"predecessor-version":[{"id":10151,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/10150\/revisions\/10151"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/10152"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10150"}],"cur
ies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}