{"id":3949,"date":"2025-06-27T03:06:24","date_gmt":"2025-06-27T03:06:24","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=3949"},"modified":"2025-06-27T03:06:24","modified_gmt":"2025-06-27T03:06:24","slug":"advancing-selfish-video-query-answering-with-multimodal-giant-language-fashions","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=3949","title":{"rendered":"Advancing Egocentric Video Question Answering with Multimodal Large Language Models"},"content":{"rendered":"<div>\n<p>Egocentric Video Question Answering (QA) requires models to handle long-horizon temporal reasoning, first-person perspectives, and specialized challenges like frequent camera movement. This paper systematically evaluates both proprietary and open-source Multimodal Large Language Models (MLLMs) on QaEgo4Dv2, a refined dataset of egocentric videos derived from QaEgo4D. Four popular MLLMs (GPT-4o, Gemini-1.5-Pro, Video-LLaVa-7B and Qwen2-VL-7B-Instruct) are assessed using zero-shot and fine-tuned approaches for both OpenQA and CloseQA settings. We introduce QaEgo4Dv2 to mitigate annotation noise in QaEgo4D, enabling more reliable comparison. Our results show that fine-tuned Video-LLaVa-7B and Qwen2-VL-7B-Instruct achieve new state-of-the-art performance, surpassing previous benchmarks by up to +2.6% ROUGE\/METEOR (for OpenQA) and +13% accuracy (for CloseQA). We also present a thorough error analysis, indicating the model\u2019s difficulty in spatial reasoning and fine-grained object recognition, key areas for future improvement.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Egocentric Video Question Answering (QA) requires models to handle long-horizon temporal reasoning, first-person perspectives, and specialized challenges like frequent camera movement. 
This paper systematically evaluates both proprietary and open-source Multimodal Large Language Models (MLLMs) on QaEgo4Dv2, a refined dataset of egocentric videos derived from QaEgo4D. Four popular MLLMs (GPT-4o, Gemini-1.5-Pro, Video-LLaVa-7B and Qwen2-VL-7B-Instruct) are [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":3951,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[3196,3627,1182,634,1797,266,306,3626,180],"class_list":["post-3949","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-advancing","tag-answering","tag-egocentric","tag-language","tag-large","tag-models","tag-multimodal","tag-question","tag-video"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/3949","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3949"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/3949\/revisions"}],"predecessor-version":[{"id":3950,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/3949\/revisions\/3950"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/3951"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3949"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3949"},
{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3949"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}