{"id":15067,"date":"2026-05-24T04:50:10","date_gmt":"2026-05-24T04:50:10","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=15067"},"modified":"2026-05-24T04:50:11","modified_gmt":"2026-05-24T04:50:11","slug":"maintainability-sensors-for-coding-brokers","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=15067","title":{"rendered":"Maintainability sensors for coding agents"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>There are several dimensions we usually want to achieve and monitor in our codebases: functional correctness (works as intended), <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.thoughtworks.com\/insights\/decoder\/f\/fitness-functions\">architectural fitness<\/a> (is fast\/secure\/usable enough), and maintainability. I define maintainability here as making it easy and low risk to change the codebase over time &#8211; <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/martinfowler.com\/articles\/is-quality-worth-cost.html\">also known as \u201cinternal quality\u201d<\/a>. So I do not only want to be able to make changes quickly today, but also in the future. And I do not want to worry about introducing bugs or degrading fitness every time I make a change &#8211; or have AI make a change. I usually see the first signs of cracks in the maintainability of an AI-generated codebase when the number of files changed for a small adjustment increases, or when changes start breaking things that used to work.<\/p>\n<p>Internal quality problems affect AI agents in much the same way that they affect human developers. 
An agent working in a tangled codebase might look in the wrong place for an existing implementation, create inconsistencies because it has not seen a duplicate, or be forced to load more context than a task should require.<\/p>\n<p>In this article, I describe my experiments with various sensors that help us and AI reflect on the maintainability of a codebase, and what I learned from them.<\/p>\n<section id=\"TheApplication\">\n<h2>The application<\/h2>\n<p>I am working on an internal analytics dashboard for group managers that reads chat space activity, engagement, and demographic data from a mix of APIs and presents the data in a web frontend.<\/p>\n<div class=\"figure \" id=\"sensors-example-application.png\"><img decoding=\"async\" alt=\"Overview showing the application frontend, backend, and 4 external APIs - Google Chat, Google People, Employee API, Gemini API\" src=\"https:\/\/martinfowler.com\/articles\/sensors-for-coding-agents\/sensors-example-application.png\" \/><\/p>\n<p class=\"photoCaption\">Figure 1:<br \/>\n        The example app: web UI, service layer, and external APIs.\n      <\/p>\n<\/div>\n<p>The tech stack is TypeScript, NextJS, and React. The backend reads and joins data from the APIs. The application has been around for a while, but for the sake of these experiments I rebuilt it with AI from scratch.<\/p>\n<p>There are hardly any guides (e.g. 
markdown files) for AI about code quality and maintainability present; I wanted to see how well it can do just by relying on sensor feedback.<\/p>\n<section id=\"OverviewOfAllSensorsUsed\">\n<h3>Overview of all sensors used<\/h3>\n<div class=\"figure \" id=\"sensors-example-overview.png\"><img decoding=\"async\" alt=\"Overview of sensors: During coding session, after integration in the pipeline, repeatedly, and runtime feedback in production\" src=\"https:\/\/martinfowler.com\/articles\/sensors-for-coding-agents\/sensors-example-overview.png\" \/><\/p>\n<p class=\"photoCaption\">Figure 2:<br \/>\n          Where sensors can run: during the initial coding session, in the pipeline, on a schedule, and in production.\n        <\/p>\n<\/div>\n<p>This is an overview of the sensors I set up across the path to production.<\/p>\n<p><b>During coding session<\/b><\/p>\n<p>Sensors that run continuously alongside the agent to provide fast feedback.<\/p>\n<ul>\n<li>Type checker (computational)<\/li>\n<li>ESLint (computational)<\/li>\n<li>Semgrep, the SAST tool prescribed by our internal AppSec team (computational)<\/li>\n<li>dependency-cruiser, runs structural rules to check internal module dependencies (computational)<\/li>\n<li>Test suite results including test coverage (computational &#8211; though the test suite is generated by AI, and therefore created in an inferential way)<\/li>\n<li>Incremental mutation testing (computational)<\/li>\n<li>GitLeaks runs as part of the pre-commit hook; I consider it to be a sensor as well, as it will give the agent feedback when it tries to commit (computational)<\/li>\n<\/ul>\n<p><b>After integration &#8211; pipeline<\/b><\/p>\n<p>The same computational sensors run again in CI. The in-session sensors give the agent early feedback during development. 
The CI pipeline confirms the result on clean infrastructure and after integration.<\/p>\n<p><b>Repeatedly<\/b><\/p>\n<p>Sensors that run on a slower cadence to detect drift that accumulates over time, rather than errors that occur in the moment.<\/p>\n<ul>\n<li>A security review, prompt derived from our AppSec guidelines for internal applications (inferential)<\/li>\n<li>A data handling review, prompt describes things like \u201cno user names should ever be sent to the web frontend\u201d (inferential)<\/li>\n<li>Dependency freshness report, which first runs a script to get the age and activity of the library dependencies, and then has AI create a report with recommendations about potential upgrades, deprecations, etc. (computational and inferential)<\/li>\n<li>Modularity and coupling review (computational and inferential)<\/li>\n<\/ul>\n<p>With this context out of the way, let&#8217;s dive into the first category of sensors.<\/p>\n<\/section>\n<section id=\"BaseHarnessesAndModels\">\n<h3>Base harnesses and models<\/h3>\n<p>Throughout building the application, I used a mix of Cursor, Claude Code, and OpenCode (in that order of frequency). My default model was usually Claude Sonnet; for some of the planning and analysis tasks I used Claude Opus, and for implementation tasks I frequently used Cursor&#8217;s composer-2 model.<\/p>\n<\/section>\n<\/section>\n<section id=\"StaticCodeAnalysisBasicLinting\">\n<h2>Static code analysis: Basic linting<\/h2>\n<p>I will start with my learnings from using ESLint in this application. 
Basic linting tools like ESLint mostly target maintainability risk at the level of individual files and functions.<\/p>\n<section id=\"RulesForTypicalAiShortcomings\">\n<h3>Rules for typical AI shortcomings<\/h3>\n<p>In my experience, the AI failure modes that are the lowest-hanging fruit for static code analysis are<\/p>\n<ul>\n<li>Max number of arguments for functions<\/li>\n<li>File length<\/li>\n<li>Function length<\/li>\n<li>Cyclomatic complexity<\/li>\n<\/ul>\n<p>However, these were not even active in ESLint&#8217;s default preset; I had to configure maximums for them first. Hopefully, static analysis tools will evolve to provide better presets for use with AI. A bit of research shows that people are also starting to publish ESLint plugins with rule sets that specifically target known agent failure modes, like <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/Factory-AI\/eslint-plugin\">this one by Factory<\/a>, with rules about things like requiring test files or structured logging.<\/p>\n<\/section>\n<section id=\"GuidanceForSelf-correction\">\n<h3>Guidance for self-correction<\/h3>\n<p>A sensor is meant to give the agent feedback so that it can self-correct. Ideally, we want to give the agent extra context for that self-correction &#8211; a good kind of prompt injection. To do that, I built a custom ESLint formatter to override some of the default messages &#8211; with the help of AI, of course.<\/p>\n<p>Here is an example of my guidance for the <code>no-explicit-any<\/code> warning.<\/p>\n<pre>We want things to be typed to make it easier to avoid mistakes, especially for key concepts.\nBut we also want to avoid cluttering our codebase with unnecessary types. Make a judgment\ncall about this. 
If you choose not to introduce a type, suppress it with:\n\/\/ eslint-disable-next-line @typescript-eslint\/no-explicit-any -- (give reason why)<\/pre>\n<\/section>\n<section id=\"ManagingWarnings-NowMoreFeasible\">\n<h3>Managing warnings &#8211; now more feasible?<\/h3>\n<p>Static code analysis has been around for a long time, and yet teams often did not use it consistently, even when they had it set up. One of the reasons for that is the management overhead that comes with it. Effective use of this analysis requires a team to keep a \u201cclean house\u201d, otherwise the metrics just become noise. In particular, warnings like the <code>no-explicit-any<\/code> example above are tricky, because you do not always want to fix them &#8211; it depends. And suppressing them one by one has always felt tedious, and like noise in the code.<\/p>\n<p>With coding agents, we might now have a chance at that clean baseline. In the guidance text above, the agent is told to make a judgment call, and allowed to suppress a warning in the code. This keeps the suppressions manageable, visible and reviewable.<\/p>\n<p>For thresholds, like the maximum number of lines, or the maximum allowed cyclomatic complexity, I told the agent in the lint message that it can slightly increase the threshold if it thinks that a refactoring is unnecessary or impossible in a particular case. This does not suppress the threshold forever, it just increases it, so that the rule fires again if it gets even worse in the future. 
Constraints are preserved without forcing a binary suppress-or-comply choice.<\/p>\n<\/section>\n<section id=\"Observations\">\n<h3>Observations<\/h3>\n<ul>\n<li>Looking at the exceptions AI created (suppressed warnings, increased thresholds) was a good point to start my code review.<\/li>\n<li>AI frequently decided to increase the cyclomatic complexity threshold, but suggested good refactorings when I nudged it further. It was the only category where it did that, and I later discovered that I did not have self-correction guidance in place for this one, so there was no explicit instruction saying that a threshold increase should be the absolute exception. That is an indicator that the custom lint messages can indeed make quite a difference.<\/li>\n<li>Sometimes I want to treat rules differently in different parts of the code. Let&#8217;s take <code>no-console<\/code>, telling AI off when it uses <code>console.log<\/code>. In the backend, I want it to use a logger component instead. In the frontend, I might want to not use direct logging at all, or at the very least I want to use a different logging component. This is another example of the power of the self-correction guidance, and where AI can help with semantic judgment and management of analysis warnings.<\/li>\n<li>I was watching out for examples of trade-offs between rules. The only one I have seen so far was created by the <code>max-lines<\/code> and <code>max-lines-per-function<\/code> rules. I have seen AI do quite a bit of useful refactoring and breakdown into smaller functions and components as a result of this sensor feedback. However, in the React frontend, I am seeing a worrying trend of components with lots and lots of properties as a result of passing values through a growing chain of smaller and smaller components. 
I do not have useful observations yet about how good AI can be at making consistent decisions between trade-offs like that.<\/li>\n<\/ul>\n<\/section>\n<section id=\"MainTakeaways\">\n<h3>Main takeaways<\/h3>\n<p>Overall, I was positively surprised by how many things I can cover with static analysis. I had to remind myself a few times why it has been considerably underused in the past, and what has changed: the cost-benefit balance. Cost is reduced because it is much cheaper to create custom scripts and rules with AI. And the benefit has also increased: the analysis results help me get a first sense of lots of hygiene factors that would not even occur that much when I write code myself, so I can get common AI mistakes out of the way.<\/p>\n<p>However, I cannot help but wonder if this can also lead to a false sense of security and an illusion of quality. After all, another reason why linters like this have been less used in the past is that they have limits, and we have been wary of using them as a simplified indicator of quality. There are lots of more semantic aspects of quality that static analysis cannot catch; it remains to be seen if AI can adequately fill that gap in partnership with these tools. I also discovered new supposed issues in the code every time I activated a new set of rules. It was always a mix of irrelevant things and things that actually matter. So I worry about feedback overload for the agent, sending it into a spiral of over-engineered refactorings. <\/p>\n<\/section>\n<\/section>\n<section id=\"StaticCodeAnalysisDependencyRules\">\n<h2>Static code analysis: Dependency rules<\/h2>\n<p>Basic linting is mostly focussed on quality and complexity within a file or function. 
Next I started looking into sensors that could give me and the agent feedback about maintainability concerns that cross file and module boundaries. Analysis tools in this area are historically even more underused than basic linting.<\/p>\n<p>To learn about the potential of sensors that can help us and AI keep up good modularity within a codebase, I explored three things:<\/p>\n<ul>\n<li>Dependency rules (deterministic)<\/li>\n<li>Coupling analysis (deterministic and inferential)<\/li>\n<li>Modularity review (inferential)<\/li>\n<\/ul>\n<p>Let&#8217;s start with dependency rules. I worked with the agent to come up with a layered module structure for my application, about halfway through implementing it. I asked it to help me write <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/sverweij\/dependency-cruiser\"><code>dependency-cruiser<\/code><\/a> rules to enforce these layers.<\/p>\n<div class=\"figure \" id=\"sensors-structure-chativity.png\"><img decoding=\"async\" src=\"https:\/\/martinfowler.com\/articles\/sensors-for-coding-agents\/sensors-structure-chativity.png\" \/><\/p>\n<p class=\"photoCaption\">Figure 3:<br \/>\n          Layered module structure and dependency rules\n        <\/p>\n<\/div>\n<p>For example, one of the rules enforces that code in the <code>clients<\/code> folder never imports anything from the <code>services<\/code> folder:<\/p>\n<pre>{\n  name: \"clients-no-services\",\n  comment:\n    \"API clients should not rely on the orchestration layer above them. 
\" + LAYERS,\n  severity: \"error\",\n  from: { path: \"^server\/clients\/\", pathNot: \"\/__tests__\/\" },\n  to: { path: \"^server\/services\/\" },\n},<\/pre>\n<p>As with the ESLint messages, I also expanded the error messages a bit to be self-correction guidance, recapping the layering concept as a whole:<\/p>\n<pre>ERROR  clients-no-services\n  API clients should not rely on the orchestration layer above them. \n  [Layers: routes -&gt; services -&gt; clients + domain; Services orchestrate: fetch data via clients, compute via domain -- no I\/O, no SDKs, no knowledge of data fetching.]<\/pre>\n<section id=\"Observations\">\n<h3>Observations<\/h3>\n<ul>\n<li>Without AI, I would not have gotten these rules in place quickly. The tool&#8217;s configuration syntax has a steep entry cost, and AI absorbed that cost almost entirely.<\/li>\n<li>The agent violated the rules a handful of times after I introduced them, and then self-corrected based on <code>dependency-cruiser<\/code> feedback, so it did help maintain my folder concepts.<\/li>\n<li>I also used the same approach to introduce conventions for how React hooks should be structured in the frontend.<\/li>\n<li>I had to figure out how to catch things when AI starts creating new folders outside of this structure, with a rule that requires every new file to be somewhere in the predefined folder structure.<\/li>\n<\/ul>\n<\/section>\n<section id=\"MainTakeaways\">\n<h3>Main takeaways<\/h3>\n<p>At the point when I introduced these rules, the structuring of code into folders had already become a little bit haphazard. I could see how the rules helped the agent clean that up, and then continue to enforce these layers going forward. So I have found it quite a useful substitute for describing code structure in a markdown guide. 
However, tools like this are limited to what is expressible via imports, file names, and folder structure.<\/p>\n<\/section>\n<\/section>\n<section id=\"StaticCodeAnalysisCouplingData\">\n<h2>Static code analysis: Coupling data<\/h2>\n<p>Next, I experimented with the extraction of typical coupling metrics from my codebase, i.e. the number of incoming and outgoing imports and calls per file.<\/p>\n<p>I did not use any existing tools for this; instead I had a coding agent write an application that creates these metrics with the help of the TypeScript compiler, so that I would have maximum flexibility to play around with this as part of my experimentation. I had it add two interfaces: a web interface with a bunch of different visualisations of these metrics for my own human consumption, and a CLI that can provide these metrics to a coding agent.<\/p>\n<div class=\"figure \" id=\"sensors-coupling-dashboards.png\"><img decoding=\"async\" src=\"https:\/\/martinfowler.com\/articles\/sensors-for-coding-agents\/sensors-coupling-dashboards.png\" style=\"max-width: 95vw;\" width=\"930\" \/><\/p>\n<p class=\"photoCaption\">Figure 4:<br \/>\n          Coupling metrics: web visualisations and CLI for agents.\n        <\/p>\n<\/div>\n<section id=\"ForHumanConsumption\">\n<h3>For human consumption<\/h3>\n<p>Most of these visualisations are well established concepts, like a dependency structure matrix (DSM). I found them tedious to interpret, and although they were vibe coded and could most definitely be improved, I think that had more to do with the nature of the data. It is quite detailed data that needs a lot of context and experience to interpret, and to map back to more high level good practices. 
So I have a feeling that these types of tools still will not really help reduce a human&#8217;s cognitive load much when reviewing codebases that were modified by AI.<\/p>\n<\/section>\n<section id=\"ForAiConsumption\">\n<h3>For AI consumption<\/h3>\n<p>I gave an agent access to this custom CLI (<code>coupling-analyser<\/code>) and asked it to create a report based on the data, including suggestions of how to improve the critical issues.<\/p>\n<p>Here is an excerpt of what that prompt looked like &#8211; I am mainly reproducing this to show you that I did not actually give it much guidance on what good or bad modularity looks like; I mostly left it to the model to interpret that:<\/p>\n<div class=\"prompt\">\n<p>Produce a markdown report on modularity and coupling quality for the target TypeScript codebase, grounded in actual CLI output from <code>npx coupling-analyser<\/code>, not guesswork from static browsing alone.<\/p>\n<h2 id=\"gather-evidence-run-the-cli\">Gather evidence (run the CLI)<\/h2>\n<p>Execute the CLI and capture stdout. Use the <code>report<\/code> subcommands\u2014combine as useful for the question:<br \/>\n\u2026<\/p>\n<h2 id=\"write-the-markdown-report\">Write the markdown report<\/h2>\n<p>Use clear headings. 
Prefer <strong>concrete module IDs \/ paths and numbers<\/strong> quoted or paraphrased from CLI output.<\/p>\n<p>Suggested sections:<\/p>\n<ol>\n<li>\n<p><strong>Context<\/strong> \u2014 What was analyzed<\/p>\n<\/li>\n<li>\n<p><strong>Executive summary<\/strong> \u2014 2\u20135 bullets: overall modularity posture, top 1\u20133 systemic issues.<\/p>\n<\/li>\n<li>\n<p><strong>Findings from the tool<\/strong> \u2014 Summarize hotspots, top risks, notable cycles or mutual dependencies, and behavioural highlights <strong>as reported by the CLI<\/strong>.<\/p>\n<\/li>\n<li>\n<p><strong>Interpretation (modularity lens)<\/strong> \u2014 Tie metrics to software design: cohesion vs. spread of change, stability vs. dependency direction, fan-in\/fan-out intuition, cycle impact.<\/p>\n<\/li>\n<li>\n<p><strong>Deep dives for each high and critical issue<\/strong><\/p>\n<\/li>\n<\/ol>\n<ul>\n<li>What it is \u2014 Module(s), role in the system, dependency neighbours (from CLI + minimal code peek if needed).<\/li>\n<li>Responsibilities today \u2026<\/li>\n<li>Why it hurts \u2026<\/li>\n<li>Design options (2+ where reasonable) \u2026<\/li>\n<li>Why the new design is better \u2014 Fewer cycles, clearer dependency direction, smaller surfaces, test seams, align with likely change vectors.<\/li>\n<li>Future change risk \u2014 How each option reduces regression risk and makes safe evolution cheaper (concrete scenarios: \u201cadding X\u201d, \u201cswapping Y\u201d, \u201cshipping Z independently\u201d).<\/li>\n<\/ul>\n<p>\u2026<\/p>\n<\/div>\n<p>This LLM-led analysis actually pointed me to the same coupling hot spots that I would have found by looking through the visual diagrams, just in a format that was more digestible. 
And asking the LLM to ground its analysis in the results from the deterministic tool gave me a higher level of confidence, and probably also used less time and fewer tokens than if the agent had scanned the codebase itself to find coupling problems.<\/p>\n<\/section>\n<section id=\"Observations\">\n<h3>Observations<\/h3>\n<p>What the LLM found based on this data was quite lackluster (I used Claude Opus 4.7 for this):<\/p>\n<ul>\n<li>It said one of the biggest issues was a factory that initialises all the necessary components, but I had introduced that factory on purpose as a component that acts like a lightweight dependency injection framework.<\/li>\n<li>Another issue it had was with a shared (<code>zod<\/code>) schema between frontend and backend, declared a \u201cgod module\u201d by the LLM. It is a common pattern though to create an explicit contract between backend and frontend, and is not as much of an issue when backend and frontend evolve together anyway, or even live together in the same repo, like in my case.<\/li>\n<li>When legitimate patterns appear as high-coupling hubs, there needs to be a way to suppress these in future analyses, otherwise they create even more noise.<\/li>\n<li>The only somewhat interesting finding it had: an <code>index.ts<\/code> file in the domain folder indiscriminately exposed all files in <code>.\/domain<\/code>, and is imported in lots of places. 
While that is also a common pattern to create explicit contracts for a layer, it does have its pros and cons, and is at least worth an investigation to see whether it is appropriate for this codebase.<\/li>\n<\/ul>\n<\/section>\n<section id=\"MainTakeaways\">\n<h3>Main takeaways<\/h3>\n<p>The examples above show that even more so than with the basic linting, <i>good<\/i> and <i>bad<\/i> do not have a clear definition; instead it is all about what is <i>appropriate<\/i>. And what coupling is appropriate depends on a lot of context, not just the raw call and import graph of a codebase. So based on this small experiment, I do not have the impression that this type of coupling data is useful to AI on its own.<\/p>\n<p>A more practical use I can imagine for this data is during risk triage for code review. When I review a code change made by AI, it seems useful to know what the impact radius of the changed files is, so that I can pay more attention when e.g. a file with 10+ callers is changed. 
Or an AI review agent could use the data to prioritise where it spends its tokens.<\/p>\n<\/section>\n<\/section>\n<section id=\"StaticCodeAnalysisAiModularityReview\">\n<h2>Static code analysis: AI modularity review<\/h2>\n<p>The lackluster results from the coupling data experiment could have multiple reasons:<\/p>\n<ul>\n<li>My prompt about what to analyse was not very specific<\/li>\n<li>The coupling data is not useful to AI<\/li>\n<li>The coupling data alone is too shallow and lacks the context of the full code<\/li>\n<\/ul>\n<p>So the final thing I did was to go fully down the inferential route and use <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/vladikk\/modularity\">Vlad Khononov&#8217;s \u201cModularity Skills\u201d<\/a> to analyse the codebase design and find modularity issues. This proved to be very fruitful! It gave me lots of interesting pointers for refactorings that would clearly reduce the risk of future changes. I ran the skills a second time and gave them access to my coupling analysis CLI. The AI mostly found confirmation in the data, but no additional findings. On the contrary, it pointed out lots of things that the CLI was missing. It is also worth noting that the second run of the analysis (without context of the first one) surfaced one more issue that the first run did not find. A useful reminder that when it matters, it is often worth running an LLM-based analysis multiple times, to get a fuller picture.<\/p>\n<section id=\"Observations\">\n<h3>Observations<\/h3>\n<p>Here are some highlights from the results (model used was Claude Opus 4.7, same as for the coupling analysis):<\/p>\n<ul>\n<li><b>Duplicate route code<\/b> &#8211; all three of my backend endpoints had their own route file, and each of these route implementations was almost identical. 
So whenever I would want to introduce a change to the general principles of the backend API (for instance introducing a request ID, or changing the error handling or logging approach), I would have to do it in multiple files. I had only just introduced a third endpoint, so I think it is fair enough that this was not abstracted out yet. But in my experience, AI agents usually do not go ahead and start refactoring without an explicit nudge when they repeat a piece of code for the third or fourth time; they are quite happy to copy and paste.<\/li>\n<li><b>Inconsistency in calling the backend<\/b> &#8211; or put another way, one more form of semantic duplication. I have 3 pages in the application that need to call the backend with the same set of parameters (selected chat space, and which date range to analyse). Two of these pages were using the same hook and general approach to do this, but when AI introduced the third page, it deviated from that and reimplemented similar behaviour in its own way. This can e.g. lead to inconsistencies in error handling, or again the need to change multiple files when backend API principles change.<\/li>\n<li><b>Inefficient handling of the core arguments<\/b> &#8211; As just mentioned, all the pages in the application pass on a chat space ID and a date range to the backend. I had already noticed when I changed the way a user can specify a date range that AI had to change a <i>lot<\/i> of files for that change &#8211; over 40! So I was already aware that something was fishy here, and the analysis confirmed it: \u201cIssue: Request parameters repeated at every level\u201d. The recommendation was to introduce an object that wraps all of these parameters. 
AI had already done that in a way &#8211; but never fully followed through with the usage of that object, so it was an inconsistent mess.<\/li>\n<li><b>Responsibilities in the wrong place<\/b> &#8211; The review found a bit of authentication code sitting inside our factory that was supposed to only be responsible for wiring up our modules. It implemented a fallback to mock data when the user is not authenticated. An unexpected location like that creates a risk of being missed when new routes are added.<\/li>\n<li><b>Better interpretation of appropriate high-import-count \u201chubs\u201d<\/b> &#8211; Remember the \u201cgod classes\u201d found by my earlier coupling analysis? The modularity skills also noticed these, but in both cases correctly pointed out that they have a purpose in the context of this application. I assume that is either due to good prompting in these skills, or due to the fact that this analysis actually read what was in the code, whereas I asked the other one to only rely on the coupling data.<\/li>\n<\/ul>\n<\/section>\n<section id=\"MainTakeaways\">\n<h3>Main takeaways<\/h3>\n<ul>\n<li>Dependency parsers like <code>dependency-cruiser<\/code> can be effective live sensors to enforce some basic folder structures and dependency directions, but they can only go so far.<\/li>\n<li>The AI modularity review is a good example of \u201cgarbage collection\u201d, and worked quite well when given powerful prompts. Grounding it in actual coupling data did not seem to make much difference. 
It would be great to find a way to apply this to the changed files in a commit, to have this earlier in the pipeline, but I did not explore this yet.<\/li>\n<li>I ran the modularity review after building most of the codebase without applying that kind of review myself &#8211; and it had some pretty concerning and very valid findings that would have increased risk in the future. It shows that without human review and coupling expertise, AND without these extra AI reviews, the agent was definitely compounding <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/martinfowler.com\/bliki\/TechnicalDebtQuadrant.html\">inadvertent technical debt<\/a>.<\/li>\n<\/ul>\n<p>Overall, codebase design and modularity seems like a concern where computational sensors alone cannot help us much; AI is needed to add semantic interpretation and consider trade-offs.<\/p>\n<\/section>\n<\/section>\n<div class=\"next-installment\">\n<p>In the next update to this article, I will cover regression<br \/>\n        testing&#8217;s role as a sensor, and my experience with using coverage and<br \/>\n        mutation testing on AI-generated test suites.<\/p>\n<p> To find out when we publish the next installment subscribe to this<br \/>\n        site&#8217;s<br \/>\n        <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/martinfowler.com\/feed.atom\">RSS feed<\/a>, or Martin&#8217;s feeds on<br \/>\n        <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/toot.thoughtworks.com\/@mfowler\">Mastodon<\/a>,<br \/>\n        <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/bsky.app\/profile\/martinfowler.com\">Bluesky<\/a>,<br \/>\n        <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.linkedin.com\/in\/martin-fowler-com\/\">LinkedIn<\/a>, or<br \/>\n        <a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/twitter.com\/martinfowler\">X<\/a>.\n        <\/p>\n<\/div>\n<hr class=\"bodySep\" \/>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>There are several dimensions we usually want to achieve and monitor in our codebases: functional correctness (works as intended), architectural fitness (is fast\/secure\/usable enough), and maintainability. I define maintainability here as making it easy and low risk to change the codebase over time &#8211; also known as \u201cinternal quality\u201d. So I [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":15069,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56],"tags":[617,1256,9143,9144],"class_list":["post-15067","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software","tag-agents","tag-coding","tag-maintainability","tag-sensors"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/15067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15067"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/15067\/revisions"}],"predecessor-version":[{"id":15068,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/15067\/revisions\/15068"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/15069"}],"wp:attachment"
:[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}