{"id":4776,"date":"2025-07-21T17:08:39","date_gmt":"2025-07-21T17:08:39","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=4776"},"modified":"2025-07-21T17:08:39","modified_gmt":"2025-07-21T17:08:39","slug":"the-obtain-how-your-knowledge-is-getting-used-to-coach-ai-and-why-chatbots-arent-docs","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=4776","title":{"rendered":"The Download: how your data is being used to train AI, and why chatbots aren\u2019t doctors"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found.<\/p>\n<p>Thousands of images\u2014including identifiable faces\u2014were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from the web. Because the researchers audited just 0.1% of CommonPool\u2019s data, they estimate that the real number of images containing personally identifiable information, including faces and identity documents, is in the hundreds of millions.\u00a0<\/p>\n<p>The bottom line? 
Anything you put online can be and probably has been scraped.<strong> <\/strong><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.technologyreview.com\/2025\/07\/18\/1120466\/a-major-ai-training-data-set-contains-millions-of-examples-of-personal-data\/?utm_source=the_download&amp;utm_medium=email&amp;utm_campaign=the_download.unpaid.engagement&amp;utm_term=*%7CSUBCLASS%7C*&amp;utm_content=*%7CDATE:m-d-Y%7C*\">Read the full story<\/a>.<\/p>\n<p><em>\u2014Eileen Guo<\/em><\/p>\n<p class=\"has-medium-font-size\"><strong>AI companies have stopped warning you that their chatbots aren\u2019t doctors<\/strong><\/p>\n<p>AI companies have now mostly abandoned the once-standard practice of including medical disclaimers and warnings in response to health questions, new research has found. In fact, many leading AI models will now not only answer health questions but even ask follow-ups and attempt a diagnosis.<\/p>\n<p>Such disclaimers serve as an important reminder to people asking AI about everything from eating disorders to cancer diagnoses, the authors say, and their absence means that users of AI are more likely to trust unsafe medical advice. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.technologyreview.com\/2025\/07\/21\/1120522\/ai-companies-have-stopped-warning-you-that-their-chatbots-arent-doctors\/?utm_source=the_download&amp;utm_medium=email&amp;utm_campaign=the_download.unpaid.engagement&amp;utm_term=*%7CSUBCLASS%7C*&amp;utm_content=*%7CDATE:m-d-Y%7C*\">Read the full story<\/a>.<\/p>\n<p><em>\u2014James O\u2019Donnell<\/em><\/p>\n<\/p><\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. 
Thousands of images\u2014including identifiable faces\u2014were found in a small subset of DataComp CommonPool, a major AI training set for [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4778,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[54],"tags":[4183,1817,157,1788,562,2547],"class_list":["post-4776","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-news","tag-arent","tag-chatbots","tag-data","tag-doctors","tag-download","tag-train"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4776"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4776\/revisions"}],"predecessor-version":[{"id":4777,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4776\/revisions\/4777"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/4778"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}