fileops: Agentic Engineering

Role: Designer and sole engineer
Type: Personal tool · Design-engineering
Stack: Python

A tool I built to fix a problem I kept hitting while working with AI coding agents: when an agent changes several files at once, a failure partway through leaves your project half-edited with no clean way back. fileops makes a batch of file changes all-or-nothing. Either every change lands, or none of them do.

The friction

AI coding agents change files one at a time. Rename a function across ten files and that's ten separate steps, each one a chance to fail. When step seven breaks, you're left with a half-finished change: some files updated, some not, and the cleanup is on you. There was no way to tell the agent "make all of these changes together, or don't touch anything."

why it's also slow

Each operation is its own round trip: the agent proposes a change, waits, confirms, then proposes the next. An N-file refactor is O(N) turns, and every turn spends context-window space and API credits. Batching collapses that to one call: the same single request whether it edits 1 file or 50.

The insight

Databases solved this problem decades ago. A transaction either commits in full or rolls back to where it started; you never see a half-written state. File changes had no equivalent for this workflow, so I built one: declare every operation up front in a single spec, and let the tool guarantee the all-or-nothing outcome.
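To make "declare every operation up front" concrete, here is one hypothetical way such a spec could be modeled. The operation types (`Write`, `Edit`, `Delete`), their fields, and `validate` are illustrative assumptions; the source doesn't describe fileops' actual spec format. The point is that the whole batch exists as data, and can be rejected, before anything touches disk.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical operation types; fileops' real spec format may differ.
@dataclass(frozen=True)
class Write:
    path: str
    content: str


@dataclass(frozen=True)
class Edit:
    path: str
    old: str  # text to find
    new: str  # replacement text


@dataclass(frozen=True)
class Delete:
    path: str


Op = Union[Write, Edit, Delete]


def validate(spec: list[Op]) -> None:
    """Reject a malformed batch up front, before any file is touched."""
    if not spec:
        raise ValueError("empty batch")
    for op in spec:
        if not op.path:
            raise ValueError(f"operation missing a path: {op!r}")
        if isinstance(op, Edit) and not op.old:
            raise ValueError(f"edit with empty search text: {op!r}")


# One declarative request covers the whole rename, whether it spans 1 file or 50.
spec: list[Op] = [
    Edit("src/app.py", "old_name(", "new_name("),
    Edit("src/util.py", "def old_name", "def new_name"),
    Write("CHANGELOG.md", "Renamed old_name to new_name\n"),
]
validate(spec)
```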

How it works

The core trick is to do all the risky work off to the side first, then swap each file into place with a fast, atomic rename. New file contents are prepared as temporary copies and the originals are backed up; only then does the tool commit. If anything fails partway, it undoes every step it already took and restores the originals, so you never see a broken in-between state.
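The all-or-nothing shape described here is the classic undo-log pattern. As a generic sketch (not fileops' code; `run_all_or_nothing` and the `(apply, undo)` pairing are assumptions for illustration), each step records how to reverse itself, and a failure replays those reversals newest-first:

```python
from typing import Callable

# An (apply, undo) pair: undo must reverse exactly what apply did.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_all_or_nothing(steps: list[Step]) -> None:
    """Apply every step in order; on any failure, undo finished steps newest-first."""
    undo_log: list[Callable[[], None]] = []
    try:
        for apply, undo in steps:
            apply()
            undo_log.append(undo)  # only steps that fully succeeded get undone
    except BaseException:
        for undo in reversed(undo_log):
            undo()
        raise  # re-raise so the caller still sees the original failure
```

Undoing in reverse order means each undo sees the state its own apply left behind. A production version would also have to decide what happens when an undo itself fails, which this sketch does not handle.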

the mechanism

Atomicity rides on POSIX rename semantics. Prepare: stage new content to a temp file in the target's own directory (same filesystem, so the rename is atomic) and back up anything being modified or deleted. Commit: call os.replace(temp, target) for each operation; on POSIX, os.replace maps to rename(2), which guarantees the swap is atomic. Rollback: on any failure, undo the committed operations in reverse order and restore the backups. No temp files are left behind on either path.
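A minimal sketch of that prepare/commit/rollback sequence, assuming write-only operations on plain files (no symlinks, no deletes). `commit_writes` is a hypothetical name, not fileops' API, and fsync is left out here for brevity.

```python
import os
import shutil
import tempfile


def commit_writes(changes: dict[str, str]) -> None:
    """Replace every file in `changes` with new text, all or nothing.

    Sketch only: write operations, plain files (no symlinks), no fsync.
    """
    staged: dict[str, str] = {}   # target -> temp file with the new content
    backups: dict[str, str] = {}  # target -> temp copy of the original
    committed: list[str] = []     # targets already swapped into place
    try:
        # Prepare: stage beside each target so the rename never crosses
        # filesystems, and back up every file that already exists.
        for target, text in changes.items():
            directory = os.path.dirname(os.path.abspath(target))
            fd, tmp = tempfile.mkstemp(dir=directory, prefix=".stage-")
            staged[target] = tmp  # record first so cleanup sees it
            with os.fdopen(fd, "w") as f:
                f.write(text)
            if os.path.exists(target):
                shutil.copymode(target, tmp)  # keep the original permissions
                fd, bak = tempfile.mkstemp(dir=directory, prefix=".backup-")
                os.close(fd)
                backups[target] = bak
                shutil.copy2(target, bak)
        # Commit: one atomic rename per operation.
        for target, tmp in staged.items():
            os.replace(tmp, target)
            committed.append(target)
    except BaseException:
        # Rollback: undo committed renames newest-first.
        for target in reversed(committed):
            if target in backups:
                os.replace(backups.pop(target), target)
            else:
                os.remove(target)  # it did not exist before the batch
        raise
    finally:
        # Clean up leftover temp files on both the success and failure paths.
        for path in [*staged.values(), *backups.values()]:
            if os.path.exists(path):
                os.remove(path)
```

Staging in the target's own directory is what keeps `os.replace` a single rename(2) call; a temp file under a different mount would turn the swap into a copy, which is not atomic.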

surviving a crash

Staged data is fsync'd before the rename and the parent directory is fsync'd after, so an OS or process crash can't surface a renamed-but-empty file or a lost rename. This is crash consistency, not protection against sudden power loss; that would need a full hardware flush (F_FULLFSYNC on macOS) at roughly 50x the cost, for a failure mode outside this tool's job.
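A minimal sketch of that ordering for a single file, assuming a POSIX system where a directory can be opened and fsync'd. `write_durably` is a hypothetical helper; it uses plain `os.fsync`, so it carries the same limit described here (no F_FULLFSYNC, no power-loss guarantee), and it skips permission preservation for brevity.

```python
import os
import tempfile


def write_durably(target: str, data: bytes) -> None:
    """Stage `data` beside `target`, fsync it, rename it in, fsync the directory."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".stage-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # persist the bytes before they become visible
            # On macOS, fcntl.fcntl(fd, fcntl.F_FULLFSYNC) would also flush the
            # drive cache; deliberately not used, matching the crash-only scope.
        os.replace(tmp, target)   # atomic swap on the same filesystem
    except BaseException:
        os.remove(tmp)            # never leave a staged file behind
        raise
    # Persist the directory entry so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The order is the whole point: fsync the data first so a crash can't expose a renamed-but-empty file, then fsync the directory so the rename can't be lost.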

Proving it was real

A working prototype is not a finished tool, and most of the work happened after it first ran. I reviewed my own code, found six functional bugs, and wrote regression tests so they couldn't come back. I ran adversarial cases against it and fixed five more correctness and safety problems. I added handling for the ugly edge cases: preserving a file's permissions, refusing to silently clobber symlinked files, and honestly reporting which operations rolled back instead of claiming success.
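Two of those edge cases can be sketched as small pieces, assuming POSIX semantics. `refuse_symlinks` and `BatchReport` are hypothetical names, not fileops' API. The symlink check matters because rename(2) replaces a symlink at the destination path instead of following it, so an unchecked swap would turn the link into a regular file and never touch the file it pointed to.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


def refuse_symlinks(paths: list[str]) -> None:
    """Fail before staging anything if a target path is a symlink.

    os.replace() onto a symlink swaps out the link itself for a regular file;
    the file the link pointed to never sees the edit.
    """
    for path in paths:
        if os.path.islink(path):
            raise ValueError(f"refusing to replace symlink: {path}")


@dataclass
class BatchReport:
    """Hypothetical result type: states what happened instead of a bare 'ok'."""
    committed: list[str] = field(default_factory=list)
    rolled_back: list[str] = field(default_factory=list)
    error: Optional[str] = None

    @property
    def succeeded(self) -> bool:
        # A batch only counts as a success if nothing failed or was undone.
        return self.error is None and not self.rolled_back
```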

The one I'm proudest of is smaller than all of that. I had written that the tool protected your data through a power loss. Reading that claim closely, I saw it wasn't strictly true: the tool guaranteed consistency through a crash, which is a different and weaker promise. So I narrowed the claim to exactly what the code could back. The work was choosing accuracy over a better-sounding sentence.

the bug I almost shipped

A write followed by an edit to the same file in one batch used to compose against the stale on-disk text instead of the batch's pending content. The first operation was silently dropped at commit while the batch still reported success: the worst kind of failure, the kind that lies. It's now covered by a regression test.
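One way to express the fix, sketched under the assumption that a batch keeps an in-memory map of pending file contents: every operation reads through that overlay first and only falls back to disk for files the batch hasn't touched yet. `PendingView` and its methods are hypothetical, not fileops' code.

```python
from pathlib import Path


class PendingView:
    """Hypothetical overlay: reads see earlier operations in the same batch."""

    def __init__(self) -> None:
        self.pending: dict[Path, str] = {}

    @staticmethod
    def _key(path: Path) -> Path:
        # Normalize so "src/a.py" and "./src/../src/a.py" share one entry.
        return path.resolve()

    def read(self, path: Path) -> str:
        # The fix: prefer the batch's pending text over the stale file on disk.
        key = self._key(path)
        if key in self.pending:
            return self.pending[key]
        return path.read_text()

    def write(self, path: Path, text: str) -> None:
        self.pending[self._key(path)] = text

    def edit(self, path: Path, old: str, new: str) -> None:
        current = self.read(path)  # sees an earlier write in this batch
        if old not in current:
            raise ValueError(f"{old!r} not found in {path}")
        self.pending[self._key(path)] = current.replace(old, new, 1)
```

With the old behavior, `edit` would have read the on-disk text instead, so the write's content would vanish from the committed result even though both operations "succeeded".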

engineering health

33 tests on the executor alone cover every operation type, rollback, dry-run, and cleanup. CI runs ruff, mypy --strict, and the full suite across Python 3.11 through 3.14 on every push.

I don't only design interfaces; I build the tools that make the work possible, and I hold them to a real engineering standard. The judgment that goes into a clean component is the same judgment that decides which guarantees a tool should make and refuses to overstate the ones it can't keep.