Assessing internal quality while coding with an agent

There’s no shortage of reports on how AI coding assistants, agents, and fleets of agents have written vast amounts of code in a short time, code that reportedly implements the features desired. It’s rare that people talk about non-functional requirements like performance or security in that context, maybe because that’s not a concern in many of the use cases the authors have. And it’s even rarer that people assess the quality of the code generated by the agent. I’d argue, though, that internal quality is crucial for development to continue at a sustainable pace over years, rather than collapse under its own weight.

So, let’s take a closer look at how the AI tooling performs when it comes to internal code quality. We’ll add a feature to an existing application with the help of an agent and look at what’s happening along the way. Of course, this makes it “just” an anecdote. This memo is by no means a study. At the same time, much of what we’ll see falls into patterns and can be extrapolated, at least in my experience.

The feature we’re implementing

We’ll be working with the codebase for CCMenu, a Mac application that shows the status of CI/CD builds in the Mac menu bar. This adds a degree of difficulty to the task because Mac applications are written in Swift, which is a common language, but not quite as common as JavaScript or Python. It’s also a modern programming language with a complex syntax and type system that requires more precision than, again, JavaScript or Python.

CCMenu periodically retrieves the status from the build servers with calls to their APIs. It currently supports servers using a legacy protocol implemented by the likes of Jenkins, and it supports GitHub Actions workflows. The most requested server that’s not currently supported is GitLab. So, that’s our feature: we’ll implement support for GitLab in CCMenu.

The API wrapper

GitHub provides the GitHub Actions API, which is stable and well documented. GitLab has the GitLab API, which is also well documented. Given the nature of the problem space, they are semantically quite similar. They’re not the same, though, and we’ll see how that affects the task later.

Internally, CCMenu has three GitHub-specific files to retrieve the build status from the API: a feed reader, a response parser, and a file that contains Swift functions that wrap the GitHub API, including functions like the following:

  func requestForAllPublicRepositories(user: String, token: String?) -> URLRequest
  func requestForAllPrivateRepositories(token: String) -> URLRequest
  func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest

The functions return URLRequest objects, which are part of the Swift SDK and are used to make the actual network request. Because these functions are structurally quite similar, they delegate the construction of the URLRequest object to one shared, internal function:

  func makeRequest(method: String = "GET", baseUrl: URL, path: String,
        params: Dictionary<String, String> = [:], token: String? = nil) -> URLRequest

Don’t worry if you’re not familiar with Swift; as long as you recognise the arguments and their types you’re fine.
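To make the delegation concrete, here is a minimal sketch of how such a shared function and one wrapper could fit together. It is not CCMenu’s actual implementation: the function bodies, the "Bearer" header format, and the way the URL is assembled are all assumptions made for illustration.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives in this module on Linux
#endif

// Hypothetical body for the shared helper; only the signature comes from the text.
func makeRequest(method: String = "GET", baseUrl: URL, path: String,
                 params: [String: String] = [:], token: String? = nil) -> URLRequest {
    var components = URLComponents(url: baseUrl, resolvingAgainstBaseURL: false)!
    components.path = path
    if !params.isEmpty {
        // Sort the keys so that the generated query string is deterministic.
        components.queryItems = params.keys.sorted().map {
            URLQueryItem(name: $0, value: params[$0])
        }
    }
    var request = URLRequest(url: components.url!)
    request.httpMethod = method
    if let token = token {
        // Assumed header format; the header is only set when a token is present.
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

// A wrapper only supplies what is specific to its endpoint.
func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest {
    makeRequest(baseUrl: URL(string: "https://api.github.com")!,
                path: "/repos/\(owner)/\(repository)/actions/workflows",
                token: token)
}
```

The point of the design is that header handling and URL construction live in exactly one place, so each wrapper shrinks to a path and its arguments.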

Optional tokens

Next, we should look at the token argument in a bit more detail. Requests to the APIs can be authenticated. They don’t have to be, but they can be. This allows applications like CCMenu to access information that’s restricted to certain users. For most APIs, GitHub and GitLab included, the token is simply a long string that needs to be passed in an HTTP header.

In its implementation CCMenu uses an optional string for the token, which in Swift is denoted by a question mark following the type, String? in this case. This is idiomatic use, and Swift forces recipients of such optional values to deal with the optionality in a safe way, avoiding the classic null pointer problems. There are also special language features to make this easier.

Some functions are nonsensical in an unauthenticated context, like requestForAllPrivateRepositories. These declare the token as non-optional, signalling to the caller that a token must be provided.
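For readers who have not used Swift, a small sketch of the language features in question may help. The two helper functions below are hypothetical and exist only for this illustration; they show unwrapping with guard let and transforming an optional with map, both of which make the “no token” case explicit.

```swift
// Hypothetical helper: describes how a request would be authenticated.
func authDescription(token: String?) -> String {
    // `guard let` unwraps the optional or leaves the function early.
    guard let token = token else {
        return "anonymous"
    }
    return "token of length \(token.count)"
}

// Hypothetical helper: `map` transforms the value if present and passes nil through.
func headerValue(token: String?) -> String? {
    token.map { "Bearer \($0)" }
}
```

In both cases the compiler refuses to treat the optional as a plain String until the nil case has been dealt with.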

Let’s go

I’ve tried this experiment a couple of times, during the summer using Windsurf and Sonnet 3.5, and now, recently, with Claude Code and Sonnet 4.5. The approach remained similar: break down the task into smaller chunks. For each of the chunks I asked Windsurf to come up with a plan first before asking for an implementation. With Claude Code I went straight for the implementation, relying on its internal planning, and on Git when something ended up going in the wrong direction.

As a first step I asked the agent, more or less verbatim: “Based on the GitHub files for API, feed reader, and response parser, implement the same functionality for GitLab. Only write the equivalent for these three files. Do not make changes to the UI.”

This sounded like a reasonable request, and by and large it was. Even Windsurf, with the less capable model, picked up on key differences and handled them, e.g. it recognised that what GitHub calls a repository is a project in GitLab; it saw the difference in the JSON response, where GitLab returns the array of runs at the top level while GitHub has this array as a property in a top-level object.
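The difference in response shape can be sketched with Codable. The Run model and the two decode functions are hypothetical and heavily simplified; only the overall shapes (a wrapping object with a workflow_runs property for GitHub, a bare array for GitLab) reflect the difference described above.

```swift
import Foundation

// Simplified, assumed model: only the fields needed for this illustration.
struct Run: Decodable {
    let id: Int
    let status: String
}

// GitHub wraps the array of runs in a top-level object.
struct GitHubRunsResponse: Decodable {
    let workflowRuns: [Run]

    enum CodingKeys: String, CodingKey {
        case workflowRuns = "workflow_runs"
    }
}

func decodeGitHubRuns(_ data: Data) throws -> [Run] {
    try JSONDecoder().decode(GitHubRunsResponse.self, from: data).workflowRuns
}

// GitLab returns the array at the top level, so it can be decoded directly.
func decodeGitLabRuns(_ data: Data) throws -> [Run] {
    try JSONDecoder().decode([Run].self, from: data)
}
```

A parser copied from the GitHub version without this adjustment would fail at decode time, which is why it is notable that the agents handled it unprompted.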

I hadn’t looked at the GitLab API docs myself at this stage, and from a cursory scan of the generated code everything looked pretty okay: the code compiled and even the complex function types were generated correctly. Or were they?

First surprise

In the next step, I asked the agent to implement the UI to add new pipelines/workflows. I deliberately asked it not to worry about authentication yet, to just implement the flow for publicly accessible information. The discussion of that step is maybe for another memo, but the new code somehow needs to acknowledge that a token might be present in the future

  var apiToken: String? = nil

and then it can use the variable in the call to the wrapper function

  let req = GitLabAPI.requestForGroupProjects(group: name, token: apiToken)
  var projects = await fetchProjects(request: req)

The apiToken variable is correctly declared as an optional String, initialised to nil for now. Later, some code could retrieve the token from another place depending on whether the user has decided to sign in. This code led to the first compiler error: the compiler rejected passing apiToken as the token argument.

What’s happening here? Well, it turns out that the code for the API wrapper in the first step had a bit of a subtle problem: it declared the tokens as non-optional in all of the wrapper functions, e.g.

  func requestForGroupProjects(group: String, token: String) -> URLRequest

The underlying makeRequest function, for one reason or another, was created correctly, with the token declared as optional.

The code compiled because, the way the functions were written, the wrapper functions definitely have a string, and that can of course be passed to a function that takes an optional string, an argument that can be a string or nothing (nil). But now, in the code above, we have an optional string, and that can’t be passed to a function that needs a (definite) string.
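The asymmetry is easy to reproduce in isolation. The two functions below are hypothetical stand-ins for makeRequest (optional parameter) and the generated wrapper (non-optional parameter):

```swift
// Stand-in for a function with an optional parameter, like makeRequest.
func takesOptional(_ token: String?) -> Bool { token != nil }

// Stand-in for a function with a non-optional parameter, like the generated wrapper.
func takesDefinite(_ token: String) -> Int { token.count }

let definite: String = "abc"
let maybe: String? = nil

let a = takesOptional(definite)  // fine: a String is implicitly promoted to String?
let b = takesOptional(maybe)     // fine: the types match
let c = takesDefinite(definite)  // fine: the types match
// let d = takesDefinite(maybe)  // does not compile: a String? is not a String
```

Promotion only works in one direction, which is why the mistake in the wrapper stayed invisible until a caller had an optional in hand.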

The vibe fix

Being lazy I simply copy-pasted the error message back to Windsurf. (Building a Swift app in anything but Xcode is a whole different story, and I remember an experiment with Cline where it alternated between adding and removing explicit imports, at about 20¢ per iteration.) The fix proposed by the AI for this problem worked: it changed the call site and inserted an empty string as a default value for when no token was present, using Swift’s ?? operator.

  let req = GitLabAPI.requestForGroupProjects(group: name, token: apiToken ?? "")
  var projects = await fetchProjects(request: req)

This compiles, and it kinda works: if there’s no token an empty string is substituted, which means that the argument passed to the function is either the token or the empty string; it’s always a string and never nil.

So, what’s wrong? The whole point of declaring the token as optional was to signal that the token is optional. The AI ignored this and introduced new semantics: an empty string now signals that no token is available. This is

not idiomatic,
not self-documenting,
unsupported by Swift’s type system.

It also required changes in every place where this function is called.

The real fix

Of course, what the agent should have done is to simply change the function declaration of the wrapper function to make the token optional. With that change everything works as expected, the semantics remain intact, and the change is limited to adding a single ? to the function argument’s type, rather than spraying ?? "" all over the code.
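A sketch of the corrected wrapper, under stated assumptions: the makeRequest stand-in below is reduced to the token handling, the host gitlab.example.com is a placeholder, and the "Bearer" header format is assumed rather than taken from CCMenu.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives in this module on Linux
#endif

// Hypothetical, reduced stand-in for the shared helper.
func makeRequest(path: String, token: String? = nil) -> URLRequest {
    var request = URLRequest(url: URL(string: "https://gitlab.example.com/api/v4\(path)")!)
    if let token = token {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

// The fix: a single `?` on the parameter type.
func requestForGroupProjects(group: String, token: String?) -> URLRequest {
    makeRequest(path: "/groups/\(group)/projects", token: token)
}

let apiToken: String? = nil

// The optional is passed straight through; no header is set when it is nil.
let clean = requestForGroupProjects(group: "demo", token: apiToken)

// The empty-string workaround still compiles, but in this sketch the helper
// now sees a non-nil token and sets an Authorization header anyway.
let sentinel = requestForGroupProjects(group: "demo", token: apiToken ?? "")
```

Whether the real helper would behave like this stand-in is an assumption, but it illustrates the risk: once “no token” is encoded as an empty string, every function downstream has to know about that convention.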

Does it really matter?

You might ask whether I’m splitting hairs here. I don’t think I am. I think this is a clear example where an AI agent left to its own devices would have changed the codebase for the worse, and it took a developer with experience to notice the issue and to direct the agent to the correct implementation.

Also, this is just one of many examples I encountered. At some point the agent wanted to introduce a completely unnecessary cache, and, of course, couldn’t explain why it had even suggested the cache.

It also failed to realise that the user/org overlap in GitHub doesn’t exist in GitLab, and went on to implement some complicated logic to handle a non-existent problem. It took more than nudging the agent towards the correct places in the documentation to talk it down from insisting that the logic was needed.

It also “forgot” to use existing functions to construct URLs, replicating such logic in multiple places, often without implementing all functionality, e.g. the option to overwrite the base URL for testing purposes using the defaults system on macOS.
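To illustrate what gets lost when URL construction is duplicated, here is a minimal sketch of a base URL lookup with an override from the defaults system. The defaults key and the function name are hypothetical; the key CCMenu actually uses is not shown in this memo.

```swift
import Foundation

// Hypothetical defaults key, chosen for this illustration only.
let baseURLOverrideKey = "GitLabBaseURL"

// Returns the override from the defaults system if one is set and parses
// as a URL, and the public GitLab host otherwise.
func gitLabBaseURL(defaults: UserDefaults = .standard) -> URL {
    if let override = defaults.string(forKey: baseURLOverrideKey),
       let url = URL(string: override) {
        return url
    }
    return URL(string: "https://gitlab.com")!
}
```

Code that builds its URLs through one such function picks up the override automatically; code that assembles URLs by hand silently ignores it, and nothing fails until someone tries to test against a local server.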

So, in these cases, and there were more, the generated code worked. It implemented the functionality required. But the new code also would have added completely unnecessary complexity and it missed non-obvious functionality, lowering the quality of the codebase and introducing subtle issues.

If working on large software systems has taught me one thing it’s that investing in the internal quality of the software, the quality of the codebase, is a worthwhile investment. Don’t get overwhelmed by technical debt. Humans and agents find it more difficult to work with a complicated codebase. Without careful oversight, though, the AI agents seem to have a strong tendency to introduce technical debt, making future development harder, for humans and agents.

One more thing

If possible, CCMenu shows the avatar of the person/actor that triggered the build. In GitHub the avatar URL is part of the response to the build status API call. GitLab has a “cleaner”, more RESTful design and keeps additional user information out of the build response. The avatar URL must be retrieved with a separate API call to a /user endpoint.

Both Windsurf and Claude Code stumbled over this in a major way. I remember a longish conversation where Claude Code wanted to convince me that the URL was in the response. (It probably got mixed up because several endpoints were described on the same page of the documentation.) In the end I found it easier to implement that functionality without agent support.

My conclusions

During the experiments in the summer I was on the fence. The Windsurf / Sonnet 3.5 combo did speed up writing code, but it required careful planning with prompts, and I had to switch back and forth between Windsurf and Xcode (for building, running tests, and debugging), which always felt somewhat disorientating and got tiring quickly. The quality of the generated code had significant issues, and the agent had a tendency to get stuck trying to fix a problem. So, on the whole it felt like I wasn’t getting much out of using the agent. And I traded doing what I like, writing code, for overseeing an AI with a tendency to write sloppy code.

With Claude Code and Sonnet 4.5 the story is somewhat different. It needs less prompting, and the code has better quality. It’s by no means high quality code, but it’s better, requiring less rework and less prompting to improve quality. Also, running a conversation with Claude Code in a terminal window alongside Xcode felt more natural than switching between two IDEs. For me this has tilted the scales enough to use Claude Code regularly.