How I use an AI agent to code new features

Last month at Buoy I worked on an Architecture Decision Record (ADR) to delineate a common foundation for a handful of upcoming projects. I thought about it during workdays and slept on it, generated diagrams and pseudo-code, spoke with different team members, and incorporated some of their feedback. It looked good, and last Monday morning I started implementing it. I thought it’d take me about half a day to have a first Pull Request ready for review.

In general terms, the project begins when our “Bariloche” Rails app receives a message from our monolith that may kick off a series of computations and automated actions. The goal is to ease the work and remove potential human errors from our Plasma Donation Facilities.

We use packwerk for the different slices of our apps. I’d start a new pack with the single route for the single controller action to receive that event, then I’d focus on the rest of the implementation. It was an ideal chance to try Cursor’s AI agent on a bigger task than I had been using it for.

I configured Cursor with the claude-3.5-sonnet model, and began a conversation with its Composer “Agent” mode:

tute: I want to develop a new feature which, after receiving a POST request to a new API’s controller and action, I handle the response. The incoming message will look like the following: { ... }. Please create a new packwerk pack called “actions” where the new route and controller will live.

I say “please” and “thank you” and use complete punctuation so I don’t forget to be polite when I ask similar questions of actual people.

I described what I wanted, and it delivered. The agent tried running a nonexistent packwerk generator and hit the error Could not find generator 'packwerk:pack', so it moved on manually. See its thought process:

Cursor agent creating the pack structure

And it went on creating the controller and route. I asked it to rename the controller to a better name I hadn’t specified before, and it renamed the file, the class, and the route together, then asked whether I wanted a controller spec. “Yes, sure”. And then I had more than I asked for, in a good way. The spec looked a bit different from how I’d write it, but it was technically equivalent. It errored out though: I had to plug the pack’s new route file into the main config/routes.rb file, then the spec ran green.

I don’t have timestamps, but I was already surprised: I had a working pack with a spec done in a few minutes. While no piece of it was particularly challenging even to new developers, it freed me from moving files and strings around.

Is this goodbye, beloved vim code editor? I used vim for 10+ years, and used Sublime Text’s (then VS Code’s) multi-cursor feature for certain changes I could do faster there. I have my dotfiles with terminal commands for quick project-wide find-and-replace and other common tasks. I prided myself on doing such work fast. I don’t need these tools so much now. This code was generated and renamed as I would have done it but faster, and the agent and I could understand each other surprisingly well without much prompt wrangling or custom rules.

The new pack and controller task isn’t a particularly laborious example either; with my tools it would have taken me maybe a few more minutes. But I’d have had no spec, which is nice to have as I work with new names and files, and it’s good to have that solved while I have the ADR in mind and think of the next steps and how they’ll tie together. Renaming classes with specs is a bit cumbersome even with my fancy terminal commands, and I’m happy to delegate such details forever.

The only piece it didn’t automatically solve was plugging the pack’s routes file into the main routes file, which I debugged and fixed.

Encouraged, I raised the stakes.

The next piece of the ADR is the pseudo-code, a stream of classes and methods with the general idea I sketched up. I had polished the snippet in Cursor, to the point that it became running software. I didn’t expect it from early notes but thought, “why not!” Well, because my colleagues could now leave code-review type of comments on a document I intended to stay high level, that’s why! I sent the sample code to the AI agent:

tute: For each of these classes create a new file within the actions pack. Decide if they should go in services/ or models/ directories. Create unit specs for each.

And did it deliver! It created the files, placing each class in the right directory. It also found a background job and created the jobs/ directory for it, and it namespaced all classes within an Actions module, a helpful pattern to follow when organizing Rails apps with packwerk. I hadn’t specifically asked for either (I would have later).
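One reason the Actions namespace fits so well: Rails autoloading expects a file's path under each app/ subdirectory to match the constant it defines. Below is a simplified sketch of that convention in plain Ruby. The helpers expected_constant and camelize are illustrative only, not Zeitwerk's real implementation (they ignore custom inflections and acronyms), and the job file name is hypothetical.

```ruby
# Simplified illustration of the path-to-constant naming convention.
# Assumption: paths are relative and start with app/<type>/, as in the post.

# Turn "example_job" into "ExampleJob" (no acronym or inflection handling).
def camelize(segment)
  segment.split("_").map(&:capitalize).join
end

# Drop the app/<type>/ root and the .rb suffix, then map directories to modules.
def expected_constant(path)
  relative = path.sub(%r{\Aapp/[^/]+/}, "").delete_suffix(".rb")
  relative.split("/").map { |segment| camelize(segment) }.join("::")
end

runner_constant = expected_constant("app/services/actions/runner.rb")
# Hypothetical job file, used only to show a multi-word name.
job_constant = expected_constant("app/jobs/actions/example_job.rb")
```

Under these assumptions, runner_constant is "Actions::Runner" and job_constant is "Actions::ExampleJob": the actions/ directory inside services/ or jobs/ is what becomes the Actions module.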

I edited most of its specs myself: it’d use doubles in a way we don’t, or concrete classes in unit specs, which I didn’t prefer either; it’d use let constructs at the top where I’d rather go with the four phases in each test. Some changes I asked it to implement, others I did myself. I carefully review specs to ensure they cover what I intend and document behavior as we usually do.

Sometimes it would fill a spec description like it "does the thing when X condition holds" which I prefer to write as context "X condition holds" / it "does the thing". I started editing the first such line and it prompted me to autocomplete with <Tab> three associated changes at once:

a context line above
the it line below (with the proper strings after each)
the extra level of indentation in the contained block

It then prompted me to continue hitting <Tab> to apply the same changeset in the following similar it lines. Yes, sure, thank you!

Tab for yes

In the pseudo-code I had encoded a use case that won’t be our first project; I asked it to rename it to another example, and it did (and it completed a simple implementation that was good enough for these first commits exploring the new architecture). Specs were green.

The implementation was 85% ready, pending a few rounds of self-review and improvements. It didn’t take half a day but about an hour. I know because between these two milestones I sent messages to my colleague showing the use case, as we are both experimenting with AI agents after years in vim.

I asked it to polish the directory structure: services/ will see more instances of similar objects, so they’ll benefit from two subdirectories organizing them. It created the directories, moved the files properly, created the new Ruby namespaces, and decided to rename the children to remove the suffix now expressed by the new namespace, and to rename the parent object to Base to avoid repeating the same name. It did the same with specs, which continued to be green.

Cursor agent properly namespacing and renaming classes
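To make that renaming concrete, here is a minimal sketch of the pattern. Every name below except the Actions module is hypothetical: the post doesn't say which suffix or subdirectories were involved, so Handlers, Email, and the method bodies are stand-ins.

```ruby
# Hypothetical before: Actions::Handler (parent) and Actions::EmailHandler (child).
# Hypothetical after: the shared suffix moves into a namespace.
module Actions
  module Handlers
    # Formerly the parent class; "Base" avoids the stutter of Handlers::Handler.
    class Base
      def call(_payload)
        raise NotImplementedError, "#{self.class} must implement #call"
      end
    end

    # Formerly EmailHandler; the namespace now carries the "Handler" meaning.
    class Email < Base
      def call(payload)
        "email queued for #{payload.fetch(:id)}"
      end
    end
  end
end

result = Actions::Handlers::Email.new.call({ id: 1 })
```

The call site reads Actions::Handlers::Email instead of Actions::EmailHandler, and the directory layout mirrors the namespace.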

And a bit of a relief! Because I found something wrong! You won’t steal my job quite yet; I demanded:

tute: Move the Runner from models into services, same with its spec

But it didn’t follow my order and asked me to look twice:

agent: I don’t see Runner in the models directory - it seems it’s already in app/services/actions/runner.rb. However, let’s check if its spec is in the right place and move it if needed:

ls -l …

tute: [Whops!] right, TY!

agent: You’re welcome! Yes, looking at the files, Runner is already correctly placed in:

Implementation: app/services/actions/runner.rb

Spec: spec/services/actions/runner_spec.rb

So no changes are needed. The service layer is the right place for this class since it orchestrates […]

I’ve heard the case that a coding agent is like a great junior engineer, but this interaction felt senior, patiently correcting: “you want to do something but I know that’s wrong and I’ll try to steer you to find you don’t actually need it”. Correct, confident, and polite.

So be it! I told it to complete the feature with a feature flag, similar to that other one in the app. And, no surprise by now, it did: a bit more code and associated specs than I wanted, but easy to delete and easier than doing it myself.

After another round of review and small edits the tests went red (in a separate terminal process), and I asked about the bug. It would have been just as easy to Google, but I had the chat closer:

tute: In Ruby, how can I send a stringified hash as keyword arguments for a method call?

And did it deliver? You bet! It answered the generic question in a short sentence, and in the next one it mentioned where in the code it happened, mentioned two fixes, and proposed a change for me to accept or reject. I was refactoring so I copied and moved it around myself.
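For reference, here is a minimal sketch of the general technique, assuming “stringified hash” means a hash with string keys. The method and argument names are hypothetical, and these are not necessarily the two fixes the agent proposed.

```ruby
require "json"

# Hypothetical method that takes keyword arguments.
def schedule(name:, priority:)
  "#{name} scheduled with priority #{priority}"
end

# String keys, as they might arrive from JSON or serialized job arguments.
string_keyed = { "name" => "example", "priority" => 2 }

# Option 1: convert the keys to symbols, then double-splat into keywords.
from_hash = schedule(**string_keyed.transform_keys(&:to_sym))

# Option 2: if the data starts as JSON text, ask the parser for symbol keys.
parsed = JSON.parse('{"name": "example", "priority": 2}', symbolize_names: true)
from_json = schedule(**parsed)
```

In a Rails app, ActiveSupport's Hash#symbolize_keys does the same conversion as transform_keys(&:to_sym).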

I had the final say on every file and bit of code, but I can’t infer that this made me necessary, at least not for government-regulated projects. I could believe the thought we invested in the ADR and the pseudo-code set the agent up for success, but would then have to hide the fact that the ADR itself saw AI collaboration. The edits I applied were mostly about conventions in our team, a bug it will soon be able to fix by itself if it doesn’t already, and some style, which anyway has room for some divergence when it technically behaves the same.

I felt more focused on the Product than the Engineering, ensuring the AI agent didn’t hallucinate and created no more and no less than the software we needed.

Defining rules for common edits on top of the AI work is a possibility I’ll explore next.

A last round of self-review and refactoring, run specs, commit, and present the work on GitHub. Enjoy, because I tend to be optimistic. And digest.