This Puzzle Shows Just How Far LLMs Have Progressed in a Little Over a Year

We all know that the capabilities of LLMs have progressed dramatically in the last couple of years, but it's hard to quantify just how good they've become.

That got me thinking back to a geometry problem I came across on a YouTube channel last year. This was in June 2024, and I tried to get the leading large language model at the time (GPT-4o) to solve the puzzle. It didn't go that well and took a lot of effort to reach a solution, so I wondered how the latest LLMs would fare with the same puzzle.

The puzzle

Here's a quick reminder of what I was asking the LLM to solve back then. Assume we have the following grid of dots/nodes. In the x and y plane, each node is exactly one unit away from its adjacent neighbour. The dots form a cross: two full rows of six dots in the middle, with two rows of two centred dots above and two below. It looks like this,

Now, the question I wanted to answer was this,

How many distinct squares can be drawn on this diagram?
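For readers following along without the picture, here is a minimal sketch that prints the same layout as text. It assumes the dot coordinates match the points list in the generated code shown later in this article; the helper names `cross_grid` and `render` are purely illustrative.

```python
def cross_grid():
    """Return the 20 dot coordinates of the cross-shaped grid as a set."""
    # Vertical arm: columns 3-4, rows 1-6 (assumed to match the diagram).
    vertical = {(x, y) for x in (3, 4) for y in range(1, 7)}
    # Horizontal arm: columns 1-6, rows 3-4.
    horizontal = {(x, y) for x in range(1, 7) for y in (3, 4)}
    return vertical | horizontal


def render(points):
    """Draw the dots as text, top row first ('o' = dot, '.' = empty)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        rows.append(" ".join("o" if (x, y) in points else "."
                             for x in range(min(xs), max(xs) + 1)))
    return "\n".join(rows)


print(render(cross_grid()))
# . . o o . .
# . . o o . .
# o o o o o o
# o o o o o o
# . . o o . .
# . . o o . .
```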

It quickly became clear that GPT-4o didn't know the answer, so I changed tack slightly and instead asked it this.

I would like a Python program that plots out all the squares we can 
draw on the attached diagram, assuming that the corners of any square 
must lie on one of the spots on the diagram. Assume each adjacent spot is 
1 unit apart in both the x and y directions. Also print out a summary of 
the number of squares of the same size and what their side lengths are

Long story short, I eventually got GPT-4o to come up with a correct Python-based solution. However, it took me around two hours and well over 40 iterations of going back and forth with the model to refine its answer until it came up with something that worked.

NB, have you worked out the answer yet? Even now, I still can't quite believe that you can draw 21 squares on this grid.

Interaction with Sonnet 4.5

This brings us back to the present day and Anthropic's recent release of what's considered the best coding model available today, namely Claude Sonnet 4.5. It's been around 16 months since I last looked at this, so just how much have LLMs improved in the intervening period? Let's see how Sonnet 4.5 copes with the puzzle.

To use Sonnet 4.5, I visited Anthropic's homepage.

https://www.anthropic.com

You should see a page like this.

Click the large black "Try Claude" button located near the top right of the screen. You'll be asked to log in at this point. If you don't already have an account with Anthropic, please set one up now. It's completely free to do so.

Once you are logged in, you will be taken to a screen displaying a familiar chat-based window that allows you to interact with and ask questions of Sonnet 4.5.

Before I asked the model to provide a Python program to solve the puzzle, I asked it a straight question to see if it already knew the answer.

It came back with,

Looking at this grid, I need to count all possible squares of different 
sizes that can be formed by connecting the dots.
Let me first identify the grid structure. The dots form:

2 dots in the top row
2 dots in the second row
6 dots in the third row
6 dots in the fourth row
2 dots in the fifth row
2 dots in the sixth row

This appears to be a cross or plus-sign shaped grid pattern.
Now I'll count squares by size:
1×1 squares: These are formed by any 4 dots that make a unit square.

In the horizontal middle section (rows 3-4): 5 squares per row × 1 = 5 squares
In the vertical middle section (columns where the cross intersects): counting 
the vertical segments gives us 5 more squares
Total: 10 small squares

2×2 squares: These need 4×4 arrangements of dots.

In the horizontal section: 4 squares
In the vertical section: 4 squares
Overlap in center: -1 (counted twice)
Total: 7 squares of size 2×2

3×3 squares and larger: The grid pattern is only 6 dots wide at its maximum, but the cross shape limits where larger squares can fit. Checking the geometry:

One 3×3 square can fit in the center where the arms of the cross overlap
Total: 1 square of size 3×3

Total: 10 + 7 + 1 = 18 squares

So, confidently incorrect. Curious about the model's reference to a 3×3 square (which is impossible on this grid), I asked it to show me the square on the diagram. It replied with this.

Looking at the grid more carefully, I need to reconsider whether a 3×3 square 
actually exists.
For a 3×3 square to exist, I would need a 4×4 arrangement of dots 
(since a 3×3 square requires 4 dots on each side). Let me check the grid 
structure:
The grid appears to be:

Rows 1-2: solely 2 dots every (centered)
Rows 3-4: 6 dots every (full width)
Rows 5-6: solely 2 dots every (centered)

For a 3×3 square, I would need 4 consecutive rows and 4 consecutive columns all with dots at the intersection points. However, looking at the cross/plus pattern:

The top only has 2 columns of dots
The middle has 6 columns of dots
The bottom only has 2 columns of dots

Since rows 1-2 and rows 5-6 only have 2 dots each (not spanning the full width), 
there isn't actually a 4×4 block of dots anywhere in this grid.
I need to correct my answer: There is NO 3×3 square possible on this grid.
The total should be: 10 (1×1 squares) + 7 (2×2 squares) = 17 squares
I apologize for the error in my initial count. The cross-shaped pattern 
prevents any square larger than 2×2 from being formed.

Alright, it semi-redeemed itself with that answer, but fundamentally, it still didn't arrive at the correct total of 21 squares.
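One way to sanity-check those two replies is to count only the upright (axis-aligned) squares, which is all the model considered. The following is a minimal sketch, assuming the dot coordinates used in the generated code later in this article; `cross_grid` and `axis_aligned_squares` are illustrative names, not anything the model produced.

```python
from collections import Counter


def cross_grid():
    """Return the 20 dot coordinates of the cross-shaped grid as a set."""
    vertical = {(x, y) for x in (3, 4) for y in range(1, 7)}
    horizontal = {(x, y) for x in range(1, 7) for y in (3, 4)}
    return vertical | horizontal


def axis_aligned_squares(points):
    """Count squares with sides parallel to the axes, keyed by side length."""
    counts = Counter()
    max_side = max(max(x for x, _ in points), max(y for _, y in points))
    for (x, y) in points:
        # Treat (x, y) as the bottom-left corner and try every side length k.
        for k in range(1, max_side):
            corners = {(x + k, y), (x, y + k), (x + k, y + k)}
            if corners <= points:
                counts[k] += 1
    return dict(counts)


print(axis_aligned_squares(cross_grid()))  # {1: 9}
```

Only nine upright unit squares turn up, and no upright 2×2 squares at all. The remaining twelve of the 21 squares are therefore tilted, which is the case neither of the model's direct answers considered.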

The final answer

To keep the comparison between the models fair over the 16 months, I asked Sonnet 4.5 the exact same question I had asked GPT-4o, which, as a reminder, was this.

I would like a Python program that plots out all the squares we can draw on 
the attached diagram, assuming that the corners of any square must lie on one 
of the spots on the diagram. Assume each adjacent spot is 1 unit apart in 
both the x and y directions. Also print out a summary of the number of 
squares of the same size and what their side lengths are

Sonnet 4.5 thought for all of 5 seconds before producing its Python code.

import matplotlib.pyplot as plt
import numpy as np
from itertools import combinations
from collections import defaultdict

# Define the grid points based on the image
points = [
    # Top two rows (2 dots each, centered)
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    # Middle two rows (6 dots each, full width)
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    # Bottom two rows (2 dots each, centered)
    (3, 2), (4, 2),
    (3, 1), (4, 1),
]

def is_square(p1, p2, p3, p4):
    """Check if 4 points form a square and return side length if true."""
    points = [p1, p2, p3, p4]

    # Calculate all pairwise distances
    distances = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.sqrt((points[i][0] - points[j][0])**2 +
                          (points[i][1] - points[j][1])**2)
            distances.append(dist)

    distances.sort()

    # For a square: 4 equal sides and 2 equal diagonals
    # Diagonals should be sqrt(2) times the side length
    if len(distances) == 6:
        sides = distances[:4]
        diagonals = distances[4:]

        # Check if all sides are equal
        if np.allclose(sides, [sides[0]] * 4, atol=1e-6):
            # Check if diagonals are equal
            if np.allclose(diagonals, [diagonals[0]] * 2, atol=1e-6):
                # Check if diagonal = side * sqrt(2)
                if np.allclose(diagonals[0], sides[0] * np.sqrt(2), atol=1e-6):
                    return True, sides[0]

    return False, 0

def order_square_points(p1, p2, p3, p4):
    """Order points of a square in counterclockwise order starting from bottom-left."""
    points = [p1, p2, p3, p4]
    # Find centroid
    cx = sum(p[0] for p in points) / 4
    cy = sum(p[1] for p in points) / 4

    # Sort by angle from centroid
    def angle(p):
        return np.arctan2(p[1] - cy, p[0] - cx)

    points.sort(key=angle)
    return points

# Find all squares
squares = []
squares_by_size = defaultdict(list)

for combo in combinations(points, 4):
    is_sq, side_length = is_square(*combo)
    if is_sq:
        ordered = order_square_points(*combo)
        squares.append(ordered)
        squares_by_size[round(side_length, 2)].append(ordered)

# Print summary
print("=" * 50)
print("SUMMARY OF SQUARES FOUND")
print("=" * 50)
total_squares = 0
for side_length in sorted(squares_by_size.keys()):
    count = len(squares_by_size[side_length])
    total_squares += count
    print(f"Side length {side_length}: {count} square(s)")

print(f"\nTotal squares: {total_squares}")
print("=" * 50)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes = axes.flatten()

# Plot 1: All dots
ax = axes[0]
x_coords = [p[0] for p in points]
y_coords = [p[1] for p in points]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title('Original Grid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 2: All squares overlaid
ax = axes[1]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
colors = plt.cm.rainbow(np.linspace(0, 1, len(squares)))
for square, color in zip(squares, colors):
    square_closed = square + [square[0]]
    xs = [p[0] for p in square_closed]
    ys = [p[1] for p in square_closed]
    ax.plot(xs, ys, c=color, linewidth=2, alpha=0.6)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title(f'All {total_squares} Squares Overlaid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 3 & 4: Squares by size
plot_idx = 2
for side_length in sorted(squares_by_size.keys()):
    if plot_idx >= len(axes):
        break

    ax = axes[plot_idx]
    ax.scatter(x_coords, y_coords, c='lightgray', s=100, zorder=2)

    squares_of_size = squares_by_size[side_length]
    colors = plt.cm.viridis(np.linspace(0, 1, len(squares_of_size)))

    for square, color in zip(squares_of_size, colors):
        square_closed = square + [square[0]]
        xs = [p[0] for p in square_closed]
        ys = [p[1] for p in square_closed]
        ax.plot(xs, ys, c=color, linewidth=2.5, marker='o', markersize=8)

    ax.set_xlim(0, 7)
    ax.set_ylim(0, 7)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.set_title(f'Squares with side length {side_length} ({len(squares_of_size)} total)',
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    plot_idx += 1

plt.tight_layout()
plt.show()

At first sight, I thought there was no way Sonnet could have come up with a correct solution so quickly. GPT-4o had taken hours. However, I installed the libraries the code needed, pasted it into a Jupyter Notebook cell and ran it, obtaining the following output. I've omitted the graphs, which were also output, for now.

==================================================
SUMMARY OF SQUARES FOUND
==================================================
Side length 1.0: 9 square(s)
Side length 1.41: 4 square(s)
Side length 2.24: 2 square(s)
Side length 2.83: 4 square(s)
Side length 3.61: 2 square(s)

Total squares: 21
==================================================

#
# Plus some graphs that I'm not showing here
#

That shocked me. The answer was absolutely spot on.
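As an independent cross-check of that total, here is a minimal sketch of a different technique from the pairwise-distance test Sonnet used. It takes every ordered pair of dots as one edge, rotates that edge by 90 degrees, and checks whether the two remaining corners are also dots. Working with squared side lengths keeps everything in exact integers. The function names are illustrative, and the coordinates are assumed to match the points list in the code above.

```python
from collections import Counter
from math import sqrt


def cross_grid():
    """Return the 20 dot coordinates of the cross-shaped grid as a set."""
    vertical = {(x, y) for x in (3, 4) for y in range(1, 7)}
    horizontal = {(x, y) for x in range(1, 7) for y in (3, 4)}
    return vertical | horizontal


def find_squares(points):
    """Map each square (a frozenset of 4 corners) to its squared side length."""
    squares = {}
    for (x1, y1) in points:
        for (x2, y2) in points:
            dx, dy = x2 - x1, y2 - y1
            if (dx, dy) == (0, 0):
                continue
            # Rotating the edge (dx, dy) by 90 degrees gives (-dy, dx).
            p3 = (x2 - dy, y2 + dx)
            p4 = (x1 - dy, y1 + dx)
            if p3 in points and p4 in points:
                # The frozenset key removes duplicates found from other edges.
                corners = frozenset({(x1, y1), (x2, y2), p3, p4})
                squares[corners] = dx * dx + dy * dy
    return squares


squares = find_squares(cross_grid())
by_side_sq = Counter(squares.values())
for side_sq in sorted(by_side_sq):
    print(f"side = sqrt({side_sq}) = {sqrt(side_sq):.2f}: "
          f"{by_side_sq[side_sq]} square(s)")
print(f"Total squares: {len(squares)}")
# side = sqrt(1) = 1.00: 9 square(s)
# side = sqrt(2) = 1.41: 4 square(s)
# side = sqrt(5) = 2.24: 2 square(s)
# side = sqrt(8) = 2.83: 4 square(s)
# side = sqrt(13) = 3.61: 2 square(s)
# Total squares: 21
```

The decimal side lengths in Sonnet's summary are therefore √2, √5, √8 and √13, rounded to two places.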

The only slight thing the model didn't quite get right was that it didn't output a plot for each set of differently sized squares. It just did the nine 1x1s and the four √2x√2 ones. I solved that by asking Sonnet to include the others, too.

Can you print the graphs in square side order. Also, can you have two graphs  
side by side on each "line"

This is what it produced.

Beautiful.
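Sonnet's revised code is not reproduced here. As a rough sketch of the layout that follow-up prompt asks for (one panel per side length, in ascending order, two panels per row), something like the following would work. It is an assumption about one reasonable approach, not the model's actual output; `grid_shape` and `plot_by_side` are hypothetical helpers, and `squares_by_size` is expected in the same form as in the code above.

```python
import math


def grid_shape(n_panels, ncols=2):
    """Rows and columns needed to show n_panels plots, ncols per row."""
    return math.ceil(n_panels / ncols), ncols


def plot_by_side(squares_by_size, points, ncols=2):
    """Draw one panel per side length, ascending, ncols panels per row."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    sizes = sorted(squares_by_size)
    nrows, ncols = grid_shape(len(sizes), ncols)
    # squeeze=False always returns a 2D array of axes, even for a single row.
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(7 * ncols, 7 * nrows), squeeze=False)
    flat = axes.flatten()
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    for ax, side in zip(flat, sizes):
        ax.scatter(xs, ys, c='lightgray', s=100, zorder=2)
        for square in squares_by_size[side]:
            closed = list(square) + [square[0]]  # repeat first corner to close
            ax.plot([p[0] for p in closed], [p[1] for p in closed], linewidth=2)
        ax.set_aspect('equal')
        ax.set_title(f'Side length {side} '
                     f'({len(squares_by_size[side])} total)')
    # Hide any unused panel, e.g. the sixth slot when there are five sizes.
    for ax in flat[len(sizes):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


print(grid_shape(5))  # (3, 2)
```

With the five side lengths found above, this gives three rows of two panels with the last slot left empty. Calling `plot_by_side(squares_by_size, points)` followed by `plt.show()` would display the figure.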

Summary

To demonstrate just how dramatically LLMs have advanced in about a year, I decided to revisit a challenging geometric puzzle I first tried to solve with GPT-4o back in June 2024. The puzzle was to write a Python program that finds and plots all possible squares on a specific cross-shaped grid of dots.

My experience a little over a year ago was a struggle; it took me roughly two hours and over 40 prompts to guide GPT-4o to a correct Python solution.

Fast forward to today, and I tested the new Claude Sonnet 4.5. When I first asked the model the question directly, it didn't calculate the correct number of squares. Not a great start. However, the real test was giving it the exact same prompt I used on GPT-4o.

To my surprise, it produced a complete, correct Python solution in one shot. The code it generated not only found all 21 squares but also correctly categorised them by their unique side lengths and generated detailed plots to visualise them. While I needed one quick follow-up prompt to perfect the plots, the core problem was solved instantly.

Could it be that the very act of my trying to solve this puzzle last year and publishing my findings introduced it to the web-o-sphere, meaning Anthropic have simply crawled it and incorporated it into their model's knowledge base? Yes, I suppose that could be it, but then why couldn't the model correctly answer the first direct question I asked it about the total number of squares?

To me, this experiment starkly illustrates the incredible leap in LLM capability. What was a two-hour iterative struggle with the leading model of its time 16 months ago is now a five-second, one-shot success with the leading model today.