{"id":5261,"date":"2025-08-04T18:57:33","date_gmt":"2025-08-04T18:57:33","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=5261"},"modified":"2025-08-04T18:57:33","modified_gmt":"2025-08-04T18:57:33","slug":"advancing-low-useful-resource-languages-with-multitask-nlp-pre-coaching-paper-reflections","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=5261","title":{"rendered":"Advancing Low-Useful resource Languages With Multitask NLP Pre-Coaching [Paper Reflections]"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>Lately, Giant Language Fashions (LLMs) have largely improved by scaling. This has primarily concerned <a rel=\"nofollow\" target=\"_blank\" href=\"http:\/\/neptune.ai\/state-of-foundation-model-training-report#h-scaling\" target=\"_blank\" rel=\"noreferrer noopener\">growing the dimensions of the LLMs and the info they&#8217;re educated on<\/a>, leading to a extremely resource-intensive course of that may price as much as thousands and thousands of {dollars}.<\/p>\n<p>Whereas LLMs have grow to be ubiquitous, the resource-intensive pre-training course of poses a menace to the inclusion of low-resource languages, the place knowledge is scarce. 
Often, this is accompanied by a lack of funding for compute resources.<\/p>\n<p>In our paper, <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/aclanthology.org\/2025.africanlp-1.14\/\"><em><strong>SabiYarn: Advancing Low-Resource Languages with Multi-task NLP Pre-Training<\/strong><\/em><\/a>, which was accepted at the <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/sites.google.com\/view\/africanlp2025\/home\">AfricaNLP workshop<\/a> at <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/2025.aclweb.org\/\">ACL 2025<\/a>, we propose a series of optimization techniques for the LLM pre-training process that made it possible to train a SOTA multilingual foundation model on Nigerian languages on a single 24 GB GPU.<\/p>\n<p>One of these techniques is a mask-based loss computation strategy. This simple idea avoids computing loss on input prompt tokens the model already knows. It allows the loss function to accurately reflect the model\u2019s true performance on the tokens that matter and avoids wasting compute by backpropagating losses that don&#8217;t contribute to the model\u2019s learning process.<\/p>\n<p>In this article, we\u2019ll explore this technique, how it reflects the broader compute-aware pre-training design, and its impact on the model\u2019s performance.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-prompt-tokens-are-too-expensive-in-low-resource-settings\">Prompt tokens are (too) expensive in low-resource settings<\/h2>\n<p>During pre-training, LLMs are trained on causal language modeling via a next-token prediction task. This is typically a slow process involving trillions of tokens, whose goal is to reduce the cross-entropy loss between the predicted token and the label via backpropagation. 
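<\/p>
<p>To make the objective concrete: at each position, the cross-entropy loss is the negative log-probability the model assigns to the true next token. Below is a minimal plain-Python sketch; the toy vocabulary and probabilities are illustrative, not the output of a real model.<\/p>

```python
import math

# Toy vocabulary and a model's predicted next-token distribution
# (illustrative numbers only, not from a real model).
vocab = ["Mo", "f\u1eb9\u0301r\u00e0n", "\u00ecr\u1eb9s\u00ec", "rice"]
predicted_probs = [0.1, 0.6, 0.2, 0.1]  # softmax output, sums to 1

label = "f\u1eb9\u0301r\u00e0n"  # the true next token
label_index = vocab.index(label)

# Cross-entropy at one position: negative log-probability of the true token.
loss = -math.log(predicted_probs[label_index])
print(round(loss, 4))  # 0.5108
```

<p>Averaging this quantity over every position of every sequence gives the training loss that backpropagation drives down.<\/p>
<p>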
Along the way, the model acquires a number of skills, memorizes facts, and builds a world model.<\/p>\n<p>For state-of-the-art models like Meta\u2019s <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/ai.meta.com\/blog\/llama-4-multimodal-intelligence\/\">Llama 4<\/a> or OpenAI\u2019s <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/openai.com\/index\/gpt-4\/\">GPT-4<\/a>, this computationally intensive process typically involves running thousands of GPUs for months, <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/epoch.ai\/data-insights\/models-over-1e25-flop\">performing over 10<sup>25<\/sup> floating-point operations (FLOP)<\/a>.<\/p>\n<p>Let\u2019s look at a concrete example. Given a sequence like <em>\u201cTranslate English to Yoruba: I like rice. -&gt; Mo f\u1eb9\u0301r\u00e0n \u00ecr\u1eb9s\u00ec,\u201d<\/em> the model is trained to predict every token, from the prompt to the final answer:<\/p>\n<div id=\"medium-table-block_572dfada8da30393d14061db5e9e390c\" class=\"block-medium-table c-table__outer-wrapper  aligncenter l-padding__top--0 l-padding__bottom--standard l-margin__top--0 l-margin__bottom--0\">\n<table class=\"c-table\">\n<thead class=\"c-table__head\">\n<tr>\n<td class=\"c-item\"><p>Step<\/p><\/td>\n<td class=\"c-item\"><p>Prompt<\/p><\/td>\n<td class=\"c-item\"><p>Next token<\/p><\/td>\n<\/tr>\n<\/thead>\n<tbody class=\"c-table__body\">\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>1<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate<\/p><\/td>\n<td class=\"c-ceil\"><p>English<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>2<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English<\/p><\/td>\n<td class=\"c-ceil\"><p>to<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>3<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to<\/p><\/td>\n<td class=\"c-ceil\"><p>Yoruba:<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>4<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba:<\/p><\/td>\n<td class=\"c-ceil\"><p>I<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>5<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I<\/p><\/td>\n<td class=\"c-ceil\"><p>like<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>6<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I like<\/p><\/td>\n<td class=\"c-ceil\"><p>rice.<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>7<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I like rice.<\/p><\/td>\n<td class=\"c-ceil\"><p>-&gt;<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>8<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I like rice. -&gt;<\/p><\/td>\n<td class=\"c-ceil\"><p>Mo<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>9<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I like rice. -&gt; Mo<\/p><\/td>\n<td class=\"c-ceil\"><p>f\u1eb9\u0301r\u00e0n<\/p><\/td>\n<\/tr>\n<tr class=\"c-row\">\n<td class=\"c-ceil\"><p>10<\/p><\/td>\n<td class=\"c-ceil\"><p>Translate English to Yoruba: I like rice. -&gt; Mo f\u1eb9\u0301r\u00e0n<\/p><\/td>\n<td class=\"c-ceil\"><p>\u00ecr\u1eb9s\u00ec<\/p><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>In this setup, all tokens are treated equally, regardless of whether they&#8217;re part of the prompt or the answer. On the one hand, this is simple to set up. On the other hand, it means spending compute on learning to predict tokens that are already known and static.<\/p>\n<p>While this is fine in settings with nearly unlimited compute, it becomes problematic in resource-constrained training. Every token prediction contributes to the total training FLOPs. 
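<\/p>
<p>The table above can be generated mechanically: every prefix of the token sequence is a prompt, and the token that follows it is the prediction target. Here is a short sketch (word-level tokens for readability; real models operate on subword tokens):<\/p>

```python
# Word-level "tokens" for readability; real tokenizers emit subword units.
sequence = "Translate English to Yoruba: I like rice. -> Mo f\u1eb9\u0301r\u00e0n \u00ecr\u1eb9s\u00ec"
tokens = sequence.split()

# Each prefix is a prompt; the token after it is the prediction target.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

print(len(pairs))  # 10 training positions for an 11-token sequence
print(pairs[0])    # (['Translate'], 'English')
```

<p>Note that the first several positions do nothing but teach the model to reproduce the fixed task prompt; that is exactly the compute the masking strategy reclaims.<\/p>
<p>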
If half the sequence is an instruction or prompt that never changes, that\u2019s half your compute spent on learning what the model doesn\u2019t need to.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-making-do-without-instruction-tuning\">Making do without instruction-tuning<\/h2>\n<p>Due to severe compute constraints, we couldn&#8217;t include a post-training stage, in which models are typically aligned with user-facing goals using supervised examples and <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/neptune.ai\/blog\/reinforcement-learning-from-human-feedback-for-llms\">reinforcement learning from human feedback (RLHF)<\/a>. In such stages, models learn not just to predict the next token but to generate helpful and aligned responses.<\/p>\n<p>For example, a pre-trained base model might respond to <em>\u201cHow are you today\u201d<\/em> with <em>\u201c?\u201d<\/em>, completing the sequence with the most likely next token. In contrast, an instruction-tuned model would try to provide a response that aligns with the goal of being a helpful assistant or chatbot, e.g., <em>\u201cI\u2019m doing good.\u201d<\/em><\/p>\n<p>Since post-training wasn\u2019t feasible for SabiYarn, we embedded task awareness directly into the pre-training phase. Our goal was to help the model generalize beyond basic next-token prediction and toward solving meaningful tasks like named-entity recognition, sentiment analysis, and translation solely through prompt-based conditioning.<\/p>\n<p>In our <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/drive.google.com\/file\/d\/1wkWdXucSYE0hwGxowkzO3iJiTDR5qayP\/view?usp=sharing\">paper<\/a>, we propose a task-specific training scheme where the model is conditioned on the task it must perform using XML-like prompt tags. 
Taking inspiration from the <a rel=\"nofollow noreferrer noopener\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1910.10683\">T5 paper<\/a>, we used the following template:<\/p>\n<div style=\"opacity: 0;\" class=\"block-code-snippet  l-padding__top--0 l-padding__bottom--0 l-margin__top--standard l-margin__bottom--standard block-code-snippet--regular language-py line-numbers block-code-snippet--show-header\" data-show-header=\"show\" data-header-text=\"\">\n<pre style=\"font-size: .875rem;\" data-prismjs-copy=\"Copy the snippet!\"><code><task_tag> model_input <closing_tag> Model\u2019s output.<\/closing_tag><\/task_tag><\/code><\/pre>\n<\/div>\n<p>For example, an English-to-Pidgin translation task looks like this:<\/p>\n<div style=\"opacity: 0;\" class=\"block-code-snippet  l-padding__top--0 l-padding__bottom--0 l-margin__top--standard l-margin__bottom--standard block-code-snippet--regular language-py line-numbers block-code-snippet--show-header\" data-show-header=\"show\" data-header-text=\"\">\n<pre style=\"font-size: .875rem;\" data-prismjs-copy=\"Copy the snippet!\"><code><translate> let me call my father <pcm> : Make I go call my Papa<\/pcm><\/translate><\/code><\/pre>\n<\/div>\n<p>With this structured format, we were able to calculate the cross-entropy loss on just the label tokens (<em>\u201cMake I go call my Papa\u201d<\/em>).<\/p>\n<p>This is simple to implement in PyTorch by masking out the prompt tokens in the label tensor. 
We use <span class=\"c-code-snippet\">-100<\/span> because the ignore index, which PyTorch\u2019s <span class=\"c-code-snippet\">cross_entropy<\/span> loss operate skips:<\/p>\n<div style=\"opacity: 0;\" class=\"block-code-snippet  l-padding__top--0 l-padding__bottom--0 l-margin__top--standard l-margin__bottom--standard block-code-snippet--regular language-py line-numbers block-code-snippet--show-header\" data-show-header=\"show\" data-header-text=\"\">\n<pre style=\"font-size: .875rem;\" data-prismjs-copy=\"Copy the JavaScript snippet!\"><code>labels = input_ids.clone()&#13;\nlabels[:, :prompt_len] = -100<\/code><\/pre>\n<\/div>\n<p>Since PyTorch\u2019s cross-entropy loss operate ignores the -100 token by default, the immediate tokens are ignored when calculating the loss for that sequence.\u00a0<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-learning-only-what-matters\">Studying solely what issues<\/h2>\n<p>An surprising good thing about this method is improved process focus. Because the mannequin shouldn&#8217;t be backpropagating on the enter portion of the sequence, the mannequin\u2019s studying sign comes completely from task-relevant tokens.<\/p>\n<p>Think about a pre-training situation the place an LLM is introduced with:<\/p>\n<div style=\"opacity: 0;\" class=\"block-code-snippet  l-padding__top--0 l-padding__bottom--0 l-margin__top--standard l-margin__bottom--standard block-code-snippet--regular language-py line-numbers block-code-snippet--show-header\" data-show-header=\"show\" data-header-text=\"\">\n<pre style=\"font-size: .875rem;\" data-prismjs-copy=\"Copy the JavaScript snippet!\"><code>translate&gt; let me name my father <pcm> : Make I am going name my Papa<\/pcm><\/code><\/pre>\n<\/div>\n<p>When the loss is computed on each token, the mannequin learns to breed the immediate construction, memorizes the duty tags, and generates the outputs. 
The learning signal is diluted across the entire sequence.<\/p>\n<p>Using loss masking, the model can still make input-output connections through the self-attention mechanism during the forward pass. However, backpropagation (learning) only occurs when predicting the output tokens:<\/p>\n<p>We can compare this to how we as humans learn to translate into a new language: we receive the full input as context, but learning happens when we\u2019re corrected on our translation, not on the input sentence already provided to us.<\/p>\n<p>Masking out the input forces the model to treat prompts as context rather than a prediction target, allowing training to focus on input-output mappings and reducing the tendency to overfit on prompt formatting.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-investigating-the-impact-of-task-focus-on-training-performance\">Investigating the impact of task focus on training performance<\/h2>\n<p>To confirm this finding, we ran an experiment in which we trained the model on the non-trivial problem of descrambling sentences, using the masked loss scheme and a non-masked loss as a comparison.<\/p>\n<p>The task was to turn grammatically incoherent sentences into their coherent forms using the same words as the input. For example, \u201c<em>The equations expensive. <strong>show is<\/strong> optimization computationally that.\u201d<\/em> should be corrected to <em>\u201cThe equations <strong>show<\/strong> that optimization <strong>is<\/strong> computationally expensive.\u201d<\/em> This task requires learning complex relationships between input and output sequences.<\/p>\n<p>Here\u2019s what the loss curves looked like:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" width=\"1920\" height=\"1009\" src=\"https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=1920%2C1009&amp;ssl=1\" alt=\"Loss curves\" class=\"wp-image-47889\" srcset=\"https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=1920%2C1009&amp;ssl=1 1920w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=768%2C404&amp;ssl=1 768w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=200%2C105&amp;ssl=1 200w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=1536%2C808&amp;ssl=1 1536w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=220%2C116&amp;ssl=1 220w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=120%2C63&amp;ssl=1 120w, 
https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=160%2C84&amp;ssl=1 160w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=300%2C158&amp;ssl=1 300w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=480%2C252&amp;ssl=1 480w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?resize=1020%2C536&amp;ssl=1 1020w, https:\/\/i0.wp.com\/neptune.ai\/wp-content\/uploads\/2025\/07\/Advancing-Low-Resource-Languages-with-Multitask-NLP-Pretraining-loss-curves.png?w=1999&amp;ssl=1 1999w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"\/><\/figure>\n<\/div>\n<p>We can see that the model converged faster on the task when the loss on the input prompt wasn\u2019t calculated. These efficiency gains compound over the full training run, leading to faster convergence.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-the-cost-of-masking-what-are-we-losing\">The cost of masking: what are we losing?<\/h2>\n<p>While masking the prompt tokens during loss computation helps conserve compute and sharpen focus, it\u2019s not without tradeoffs. Excluding the prompts from the learning signal increases the risk that the model will fail to adapt to tasks where the prompt structure or phrasing changes at inference time.<\/p>\n<p>That said, such tradeoffs must be weighed against the reality of resource constraints. 
In low-resource training scenarios, approaches that reduce compute while preserving core task performance are often preferable to fully supervised, resource-intensive alternatives.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-the-case-for-native-llms-for-african-languages\">The case for native LLMs for African languages<\/h2>\n<p>While the broader African LLM community has focused its efforts on adapting open-source pre-trained models to African languages, pre-training a foundation model from scratch offers the promise of building a model that doesn\u2019t inherit the cultural biases of Euro-American corpora. It also provides invaluable research insights and data about tokenization, transfer learning, linguistic patterns, and training dynamics for African languages.<\/p>\n<p>An often neglected area is the tokenizer. Tokenizers determine how languages are broken into tokens that LLMs can recognize. Training from scratch allows us to train our own language-specific tokenizers, thereby integrating morphological and phonological structure, such as tonal diacritics in Yoruba, which also carry semantic meaning.<\/p>\n<p>It also helps with efficiency, as we gain a tokenizer that splits each language into tokens capturing useful grammatical structures, such as affixes and punctuation, which the model can use to learn meaningful representations. In contrast, using an existing tokenizer that isn&#8217;t trained on the target languages leads to poor tokenization, with tokens that don\u2019t accurately reflect grammatical structure, inflated sequence lengths, and ultimately degraded performance. 
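<\/p>
<p>One quick way to see the sequence-length problem is to count bytes: a tokenizer with no merges learned for Yoruba often falls back to byte-level pieces, so a short word carrying tonal diacritics can fragment into many tokens. A rough illustration, using the byte count as a worst-case proxy for the token count (not the output of any specific tokenizer):<\/p>

```python
# The Yoruba word "f\u1eb9\u0301r\u00e0n": e-with-dot-below (U+1EB9) followed by
# a combining acute accent (U+0301), plus precomposed a-grave (U+00E0).
word = "f\u1eb9\u0301r\u00e0n"

n_chars = len(word)                  # code points
n_bytes = len(word.encode("utf-8"))  # UTF-8 bytes

print(n_chars, n_bytes)  # 6 10
```

<p>A byte-level fallback could spend up to 10 tokens on this 6-character word, which a language-aware tokenizer might cover with one or two subword units.<\/p>
<p>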
This is especially true for small models, which are appealing due to their lower compute demands.<\/p>\n<p>Looking ahead, our research group\u2019s future work focuses on exploring modern LLM architectures and bringing reasoning, instruction following, and test-time compute strategies to resource-constrained pre-training. We\u2019re also exploring hardware-specific optimizations in training and inference and expanding our efforts to even more African languages.\u00a0<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>In recent years, Large Language Models (LLMs) have improved largely through scaling. This has primarily involved increasing the size of LLMs and the data they&#8217;re trained on, resulting in a highly resource-intensive process that can cost up to millions of dollars. While LLMs have become ubiquitous, the resource-intensive pre-training process [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":5263,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[3196,3095,4485,4486,4050,424,4487,4488],"class_list":["post-5261","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-advancing","tag-languages","tag-lowresource","tag-multitask","tag-nlp","tag-paper","tag-pretraining","tag-reflections"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/5261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5261"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/5261\/revisions"}],"predecessor-version":[{"id":5262,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/5261\/revisio
ns\/5262"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/5263"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. Learn more: https://airlift.net. Template:. Learn more: https://airlift.net. Template: 69d9690a190636c2e0989534. Config Timestamp: 2026-04-10 21:18:02 UTC, Cached Timestamp: 2026-05-12 17:07:13 UTC -->