{"id":4472,"date":"2025-07-12T14:01:11","date_gmt":"2025-07-12T14:01:11","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=4472"},"modified":"2025-07-12T14:01:11","modified_gmt":"2025-07-12T14:01:11","slug":"overcoming-vocabulary-constraints-with-pixel-level-fallback","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=4472","title":{"rendered":"Overcoming Vocabulary Constraints with Pixel-level Fallback"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>Subword tokenization requires balancing computational efficiency and vocabulary coverage, which often leads to suboptimal performance on languages and scripts not prioritized during training. We propose to augment pretrained language models with a vocabulary-free encoder that generates input embeddings from text rendered as pixels. Through experiments on English-centric language models, we demonstrate that our approach substantially improves machine translation performance and facilitates effective cross-lingual transfer, outperforming tokenizer-based methods. Furthermore, we find that pixel-based representations outperform byte-level approaches and standard vocabulary expansion. 
Our approach enhances the multilingual capabilities of monolingual language models without extensive retraining and reduces decoding latency via input compression.<\/p>\n<ul class=\"links-stacked\">\n<li>\u2020 University of Copenhagen<\/li>\n<li>\u2021 Mohamed bin Zayed University of Artificial Intelligence<\/li>\n<li>** Work done while at Apple<\/li>\n<\/ul>\n<figure id=\"figure1\" class=\"\" aria-label=\"Figure 1\">\n<div class=\"bg-gray-light text-base rounded\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/mlr.cdn-apple.com\/media\/fig1a_converted_9bd2b2fd2a.png\" aria-label=\"Diagram of Hindi-to-English translation pipeline: left shows source encoding and generation process, right shows fallback network segmentation, patch rendering, and word embedding output.\" tabindex=\"-1\" target=\"_blank\" class=\"mt-0\"><img decoding=\"async\" src=\"https:\/\/mlr.cdn-apple.com\/media\/fig1a_converted_9bd2b2fd2a.png\" alt=\"Diagram of Hindi-to-English translation pipeline: left shows source encoding and generation process, right shows fallback network segmentation, patch rendering, and word embedding output.\" loading=\"lazy\" class=\"bg-gray-light\"\/><\/a><\/div><figcaption class=\"muted\" id=\"figure-figure1-caption\" aria-hidden=\"true\">Figure 1: Illustration of our proposed NLP pipeline for Hindi-to-English machine translation. 
The decoder-only language model receives an instruction, encodes the source text using the fallback network, and autoregressively generates an English translation.<\/figcaption><\/figure>\n<figure id=\"figure2\" class=\"\" aria-label=\"Figure 2\">\n<div class=\"bg-gray-light text-base rounded\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/mlr.cdn-apple.com\/media\/fig1b_converted_19a251e872.png\" aria-label=\"Illustration of fallback network: text segmented, rendered into bigram patches, and embedded for input into decoder-only LLM.\" tabindex=\"-1\" target=\"_blank\" class=\"mt-0\"><img decoding=\"async\" src=\"https:\/\/mlr.cdn-apple.com\/media\/fig1b_converted_19a251e872.png\" alt=\"Illustration of fallback network: text segmented, rendered into bigram patches, and embedded for input into decoder-only LLM.\" loading=\"lazy\" class=\"bg-gray-light\"\/><\/a><\/div><figcaption class=\"muted\" id=\"figure-figure2-caption\" aria-hidden=\"true\">Figure 2: Inside the fallback network, the text is segmented into a list of words, rendered into image patches containing character bigrams, and projected into patch embeddings z<sub>i,j<\/sub>. The encoder outputs single-vector word representations y<sub>i<\/sub>, mapped as input embeddings to the language model.<\/figcaption><\/figure>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Subword tokenization requires balancing computational efficiency and vocabulary coverage, which often leads to suboptimal performance on languages and scripts not prioritized during training. We propose to augment pretrained language models with a vocabulary-free encoder that generates input embeddings from text rendered as pixels. 
Through experiments on English-centric language models, we demonstrate that our approach [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4474,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[3989,3991,3987,3990,3988],"class_list":["post-4472","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-constraints","tag-fallback","tag-overcoming","tag-pixellevel","tag-vocabulary"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4472","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4472"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4472\/revisions"}],"predecessor-version":[{"id":4473,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4472\/revisions\/4473"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/4474"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}