{"id":7462,"date":"2025-10-08T05:38:03","date_gmt":"2025-10-08T05:38:03","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=7462"},"modified":"2025-10-08T05:38:03","modified_gmt":"2025-10-08T05:38:03","slug":"this-puzzle-exhibits-simply-how-far-llms-have-progressed-in-a-little-over-a-yr","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=7462","title":{"rendered":"This Puzzle Shows Just How Far LLMs Have Progressed in a Little Over a\u00a0Year"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">We all know that the capabilities of LLMs have progressed dramatically in the last few years, but it\u2019s hard to quantify just how good they\u2019ve become.<\/p>\n<p class=\"wp-block-paragraph\">That got me thinking back to a geometry problem I came across on a YouTube channel last year. This was in June 2024, and I tried to get the leading large language model at the time (GPT-4o) to solve the puzzle. It didn\u2019t go that well and required <strong><em>a lot <\/em><\/strong>of effort to find a solution, and I wondered how the latest LLMs would fare with the same puzzle.<\/p>\n<h2 class=\"wp-block-heading\">The puzzle<\/h2>\n<p class=\"wp-block-paragraph\">Here\u2019s a quick reminder of what I was asking the LLM to solve back then. Assume we have the following grid of dots\/nodes. In the x-y plane, each node is exactly one unit away from its adjacent neighbour. 
It looks like this:<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/10\/image-82.png\" alt=\"\" class=\"wp-image-624514\"\/><\/figure>\n<p class=\"wp-block-paragraph\">Now, the question I wanted to answer was this:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">How many distinct squares can be drawn on this diagram?<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It quickly became clear that GPT-4o didn&#8217;t know the answer, so I changed tack slightly and instead asked it this.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">I would like a Python program that plots out all the squares we can \ndraw on the attached diagram, assuming that the corners of any square \nmust lie on one of the spots on the diagram. Assume each adjacent spot is \n1 unit apart in both the x and y directions. Also print out a summary of \nthe number of squares of the same size and what their side lengths are<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Long story short, I <strong><em>eventually<\/em><\/strong> got GPT-4o to come up with a correct Python-based solution. However, it took me around two hours and well over 40 iterations of going back and forth with the model to refine its answer until it came up with something that worked.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">NB, have you worked out the answer yet? 
Even now, I still can\u2019t quite believe that you can draw 21 squares on this grid.<\/p>\n<\/blockquote>\n<h2 class=\"wp-block-heading\">Interaction with Sonnet\u00a04.5<\/h2>\n<p class=\"wp-block-paragraph\">This brings us back to the present day and Anthropic\u2019s recent release of what is considered the best coding model available today, namely Claude Sonnet 4.5. It\u2019s been around 16 months since I last looked at this, so just how much have LLMs improved in the intervening period? Let\u2019s see how Sonnet 4.5 copes with the puzzle.<\/p>\n<p class=\"wp-block-paragraph\">To use Sonnet 4.5, I visited Anthropic\u2019s homepage.<\/p>\n<p class=\"wp-block-paragraph\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.anthropic.com\">https:\/\/www.anthropic.com<\/a><\/p>\n<p class=\"wp-block-paragraph\">You should see a page like this.<\/p>\n<figure class=\"wp-block-image alignwide\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*qqvYGzpjlJnPYqI74uDGYA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Image from the Anthropic website<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Click the large black <strong>\u201cTry Claude\u201d<\/strong> button located near the top right of the screen. You&#8217;ll be asked to log in at this point. If you don\u2019t already have an account with Anthropic, please set one up now. 
It\u2019s completely free to do so.<\/p>\n<p class=\"wp-block-paragraph\">Once you are logged in, you will be taken to a screen displaying a familiar chat-based window that allows you to interact with and ask questions of Sonnet 4.5.<\/p>\n<p class=\"wp-block-paragraph\">Before I asked the model to provide a Python program to solve the puzzle, I asked it a straight question to see if it already knew the answer.<\/p>\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/10\/image-84-1024x289.png\" alt=\"\" class=\"wp-image-624516\"\/><\/figure>\n<p class=\"wp-block-paragraph\">It came back with,<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Looking at this grid, I need to count all possible squares of different \nsizes that can be formed by connecting the dots.\nLet me first identify the grid structure. The dots form:\n\n2 dots in the top row\n2 dots in the second row\n6 dots in the third row\n6 dots in the fourth row\n2 dots in the fifth row\n2 dots in the sixth row\n\nThis appears to be a cross or plus-sign shaped grid pattern.\nNow I will count squares by size:\n1\u00d71 squares: These are formed by any 4 dots that make a unit square.\n\nIn the horizontal middle section (rows 3-4): 5 squares per row \u00d7 1 = 5 squares\nIn the vertical middle section (columns where the cross intersects): counting \nthe vertical segments gives us 5 more squares\nTotal: 10 small squares\n\n2\u00d72 squares: These need 4\u00d74 arrangements of dots.\n\nIn the horizontal section: 4 squares\nIn the vertical section: 4 squares\nOverlap in center: -1 (counted twice)\nTotal: 7 squares of size 2\u00d72\n\n3\u00d73 squares and larger: The grid pattern is only 6 dots wide at its maximum, but the cross shape limits where larger squares can fit. 
Checking the geometry:\n\nOne 3\u00d73 square can fit in the center where the arms of the cross overlap\nTotal: 1 square of size 3\u00d73\n\nTotal: 10 + 7 + 1 = 18 squares<\/code><\/pre>\n<p class=\"wp-block-paragraph\">So, confidently wrong. Intrigued by the model\u2019s reference to a 3\u00d73 square (which is impossible on this grid), I asked it to show me the square on the diagram. It replied with this.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Looking at the grid more carefully, I need to reconsider whether a 3\u00d73 square \nactually exists.\nFor a 3\u00d73 square to exist, I would need a 4\u00d74 arrangement of dots \n(since a 3\u00d73 square requires 4 dots on each side). Let me check the grid \nstructure:\nThe grid appears to be:\n\nRows 1-2: only 2 dots each (centered)\nRows 3-4: 6 dots each (full width)\nRows 5-6: only 2 dots each (centered)\n\nFor a 3\u00d73 square, I would need 4 consecutive rows and 4 consecutive columns all with dots at the intersection points. However, looking at the cross\/plus pattern:\n\nThe top only has 2 columns of dots\nThe middle has 6 columns of dots\nThe bottom only has 2 columns of dots\n\nSince rows 1-2 and rows 5-6 only have 2 dots each (not spanning the full width),\nthere isn't actually a 4\u00d74 block of dots anywhere in this grid.\nI need to correct my answer: There's NO 3\u00d73 square possible in this grid.\nThe total should be: 10 (1\u00d71 squares) + 7 (2\u00d72 squares) = 17 squares\nI apologize for the error in my initial count. The cross-shaped pattern \nprevents any square 
larger than 2\u00d72 from being formed.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Alright, it semi-redeemed itself with that answer, but fundamentally, it still didn\u2019t correctly say there were 21 squares in total.<\/p>\n<h2 class=\"wp-block-heading\">The final\u00a0answer<\/h2>\n<p class=\"wp-block-paragraph\">To keep the model comparison fair across the 16 months, I asked Sonnet 4.5 the exact same question I had asked GPT-4o, which, as a reminder, was this.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">I would like a Python program that plots out all the squares we can draw on \nthe attached diagram, assuming that the corners of any square must lie on one \nof the spots on the diagram. Assume each adjacent spot is 1 unit apart in \nboth the x and y directions. Also print out a summary of the number of \nsquares of the same size and what their side lengths are<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Sonnet 4.5 thought for all of 5 seconds before producing its Python code.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import matplotlib.pyplot as plt\nimport numpy as np\nfrom itertools import combinations\nfrom collections import defaultdict\n\n# Define the grid points based on the image\npoints = [\n    # Top two rows (2 dots each, centered)\n    (3, 6), (4, 6),\n    (3, 5), (4, 5),\n    # Middle two rows (6 dots each, full width)\n    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),\n    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),\n    # Bottom two rows (2 dots each, centered)\n    (3, 2), (4, 2),\n    (3, 1), (4, 1),\n]\n\ndef is_square(p1, p2, p3, p4):\n    \"\"\"Check if 4 points form a square and return side length if true.\"\"\"\n    points = [p1, p2, p3, p4]\n    \n    # Calculate all pairwise distances\n    distances = []\n    for i in range(len(points)):\n        for j in range(i + 1, len(points)):\n            dist = np.sqrt((points[i][0] - points[j][0])**2 + \n                          (points[i][1] - points[j][1])**2)\n            distances.append(dist)\n    \n    distances.sort()\n    \n    # For a square: 4 equal sides and 2 equal diagonals\n    # Diagonals should be sqrt(2) times the side length\n    if len(distances) == 6:\n        sides = distances[:4]\n        diagonals = distances[4:]\n        \n        # Check if all sides are equal\n        if np.allclose(sides, [sides[0]] * 4, atol=1e-6):\n            # Check if diagonals are equal\n            if np.allclose(diagonals, [diagonals[0]] * 2, atol=1e-6):\n                # Check if diagonal = side * sqrt(2)\n                if np.allclose(diagonals[0], sides[0] * np.sqrt(2), atol=1e-6):\n                    return True, sides[0]\n    \n    return False, 0\n\ndef order_square_points(p1, p2, p3, p4):\n    \"\"\"Order points of a square in counterclockwise order starting from bottom-left.\"\"\"\n    points = [p1, p2, p3, p4]\n    # Find centroid\n    cx = sum(p[0] for p in points) \/ 4\n    cy = sum(p[1] for p in points) \/ 4\n    \n    # Sort by angle from centroid\n    def angle(p):\n        return np.arctan2(p[1] - cy, p[0] - cx)\n    \n    points.sort(key=angle)\n    return points\n\n# Find all squares\nsquares = []\nsquares_by_size = defaultdict(list)\n\nfor combo in combinations(points, 4):\n    is_sq, side_length = is_square(*combo)\n    if is_sq:\n        ordered = order_square_points(*combo)\n        squares.append(ordered)\n        squares_by_size[round(side_length, 2)].append(ordered)\n\n# Print summary\nprint(\"=\" * 50)\nprint(\"SUMMARY OF SQUARES FOUND\")\nprint(\"=\" * 50)\ntotal_squares = 0\nfor side_length in sorted(squares_by_size.keys()):\n    count = len(squares_by_size[side_length])\n    total_squares += count\n    print(f\"Side length {side_length}: {count} square(s)\")\n\nprint(f\"\\nTotal squares: {total_squares}\")\nprint(\"=\" * 50)\n\n# Create visualization\nfig, axes = plt.subplots(2, 2, figsize=(14, 14))\naxes = axes.flatten()\n\n# Plot 1: All dots\nax = axes[0]\nx_coords = [p[0] for p in points]\ny_coords = [p[1] for p in points]\nax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)\nax.set_xlim(0, 7)\nax.set_ylim(0, 7)\nax.set_aspect('equal')\nax.grid(True, alpha=0.3)\nax.set_title('Original Grid', fontsize=14, fontweight='bold')\nax.set_xlabel('X')\nax.set_ylabel('Y')\n\n# Plot 2: All squares overlaid\nax = axes[1]\nax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)\ncolors = plt.cm.rainbow(np.linspace(0, 1, len(squares)))\nfor square, color in zip(squares, colors):\n    square_closed = square + [square[0]]\n    xs = [p[0] for p in square_closed]\n    ys = [p[1] for p in square_closed]\n    ax.plot(xs, ys, c=color, linewidth=2, alpha=0.6)\nax.set_xlim(0, 7)\nax.set_ylim(0, 7)\nax.set_aspect('equal')\nax.grid(True, alpha=0.3)\nax.set_title(f'All {total_squares} Squares Overlaid', fontsize=14, fontweight='bold')\nax.set_xlabel('X')\nax.set_ylabel('Y')\n\n# Plot 3 &amp; 4: Squares by size\nplot_idx = 2\nfor side_length in sorted(squares_by_size.keys()):\n    if plot_idx &gt;= len(axes):\n        break\n    \n    ax = axes[plot_idx]\n    ax.scatter(x_coords, y_coords, c='lightgray', s=100, zorder=2)\n    \n    squares_of_size = squares_by_size[side_length]\n    colors = plt.cm.viridis(np.linspace(0, 1, len(squares_of_size)))\n    \n    for square, color in zip(squares_of_size, colors):\n        square_closed = square + [square[0]]\n        xs = [p[0] for p in square_closed]\n        ys = [p[1] for p in square_closed]\n        ax.plot(xs, ys, c=color, linewidth=2.5, marker='o', markersize=8)\n    \n    ax.set_xlim(0, 7)\n    ax.set_ylim(0, 7)\n    ax.set_aspect('equal')\n    ax.grid(True, alpha=0.3)\n    ax.set_title(f'Squares with side length {side_length} ({len(squares_of_size)} total)', \n                 fontsize=14, fontweight='bold')\n    ax.set_xlabel('X')\n    ax.set_ylabel('Y')\n    plot_idx += 1\n\nplt.tight_layout()\nplt.show()<\/code><\/pre>\n<p class=\"wp-block-paragraph\">At first sight, I thought there was no way Sonnet could have come up with a correct solution so quickly. GPT-4o had<strong><em> taken hours<\/em><\/strong>. Nonetheless, I installed all the libraries the code needed, pasted it into a Jupyter Notebook cell, ran it, and obtained the following output. 
I&#8217;ve omitted the graphs for now, which were also output.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">==================================================\nSUMMARY OF SQUARES FOUND\n==================================================\nSide length 1.0: 9 square(s)\nSide length 1.41: 4 square(s)\nSide length 2.24: 2 square(s)\nSide length 2.83: 4 square(s)\nSide length 3.61: 2 square(s)\n\nTotal squares: 21\n==================================================\n\n#\n# Plus some graphs that I'm not showing here\n#<\/code><\/pre>\n<p class=\"wp-block-paragraph\">That surprised me. The answer was absolutely spot on.<\/p>\n<p class=\"wp-block-paragraph\">The only slight thing the model didn\u2019t quite get right was that it didn\u2019t output a plot of each set of differently sized squares. It just did the 9 1x1s and the 4 \u221a2x\u221a2 ones. I solved that by asking Sonnet to include those, too.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Can you print the graphs in square side order. 
Also can you have two graphs \nside by side on each \"line\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">This is what it produced.<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/10\/image-85.png\" alt=\"\" class=\"wp-image-624517\"\/><\/figure>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/10\/image-87.png\" alt=\"\" class=\"wp-image-624519\"\/><\/figure>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/10\/image-88.png\" alt=\"\" class=\"wp-image-624521\"\/><\/figure>\n<p class=\"wp-block-paragraph\">Beautiful.<\/p>\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n<p class=\"wp-block-paragraph\">To demonstrate just how dramatically LLMs have advanced in about a year, I decided to revisit a challenging geometric puzzle I first tried to solve with GPT-4o back in June 2024. The puzzle was to write a Python program that finds and plots all possible squares on a specific cross-shaped grid of dots.<\/p>\n<p class=\"wp-block-paragraph\">My experience a little over a year ago was a struggle; it took me roughly two hours and over 40 prompts to guide GPT-4o to a correct Python solution.<\/p>\n<p class=\"wp-block-paragraph\">Fast forward to today, and I tested the new Claude Sonnet 4.5. When I first asked the model the question directly, it didn&#8217;t calculate the correct number of squares. Not a great start. However, the real test was giving it the <strong>exact same prompt<\/strong> I used on GPT-4o.<\/p>\n<p class=\"wp-block-paragraph\">To my surprise, it produced a complete, correct Python solution in <strong>one shot<\/strong>. 
The code it generated not only found all 21 squares but also correctly categorised them by their unique side lengths and generated detailed plots to visualise them. While I needed one quick follow-up prompt to perfect the plots, the core problem was solved immediately.<\/p>\n<p class=\"wp-block-paragraph\">Could it be that the very act of my trying to solve this puzzle last year and publishing my findings introduced it to the web-o-sphere, meaning Anthropic have simply crawled it and incorporated it into their model knowledge base? Yes, I suppose that could be it, but then why couldn\u2019t the model correctly answer the first direct question I asked it about the total number of squares?<\/p>\n<p class=\"wp-block-paragraph\">To me, this experiment starkly illustrates the incredible leap in LLM capability. What was once a two-hour iterative struggle with the leading model of its time 16 months ago is now a five-second, one-shot success with the leading model today.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>We all know that the capabilities of LLMs have progressed dramatically in the last few years, but it\u2019s hard to quantify just how good they\u2019ve become. That got me thinking back to a geometry problem I came across on a YouTube channel last year. 
This was in June 2024, and I tried to get [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":7464,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[5785,1112,5784,296,518],"class_list":["post-7462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-ayear","tag-llms","tag-progressed","tag-puzzle","tag-shows"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7462"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7462\/revisions"}],"predecessor-version":[{"id":7463,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7462\/revisions\/7463"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/7464"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}