I've Been Pair Programming With Xcode's AI Agent for 3 Months. We Need to Talk.

By NativeFirst Team

You know when a company hires a brilliant new intern? They’re fast. They’re eager. They know everything in the textbook. They produce massive amounts of work on their first day and you think, “Holy crap, this person is going to be incredible.”

Then you read what they actually wrote.

The code works. Technically. But it’s 200 lines where you needed 40. It reinvents a utility that already exists three folders over. It handles edge cases that can’t physically happen in your app. And it adds fourteen comments explaining what a for loop does.

That’s been my life since February. Apple shipped Xcode 26.3 with full agentic coding — Claude and Codex baked directly into the IDE — and I’ve been using it daily on real iOS projects ever since.

Three months in, here’s the conversation nobody seems to be having honestly.


What It Actually Does (For Those Who Haven’t Tried It)

Quick context for anyone still on an older Xcode: the agent isn’t autocomplete. It’s not a smarter version of code suggestions. It’s a full coding partner that can search your project, read your documentation, navigate your file tree, update build settings, capture Xcode Previews, and iterate through build-run-fix cycles autonomously.

You describe what you want. It writes it. It builds. If it fails, it reads the error, fixes it, builds again. Repeat until green.

Apple called it “astoundingly fast, smart, and too convenient” in the press release. AppleInsider’s hands-on review used almost the same words. And honestly? Those first two weeks, I agreed completely.


The Honeymoon Was Real

Let me give credit where it’s due. Here’s what Xcode’s agent is genuinely, unambiguously good at:

Boilerplate generation. Need a new MVVM module with a view model, a view, a repository protocol, and mock data? Describe it, get it in 30 seconds. The structure is clean. The naming follows your existing conventions (because it reads your project). This used to be 15 minutes of copying and renaming. Now it’s a prompt.

Test writing. I hate writing tests for straightforward view models. The agent loves it. It reads the view model, generates comprehensive XCTest cases, handles edge cases I’d have been too lazy to cover, and even uses your existing mock patterns. My test coverage went from “aspirational” to “actually respectable.”

Preview iteration. This one surprised me. The agent captures Xcode Previews, sees what the UI looks like, and iterates. “Make the spacing tighter.” “That card needs a corner radius of 12.” “The text is truncating. Fix it.” It’s like having a designer looking over your shoulder, except this one actually makes the changes.

Documentation and error messages. Converting a codebase from “no docs” to “has docs” used to be a guilt-ridden weekend project. Now it’s a prompt.

For our SwiftUI at Scale course, I used the agent to scaffold initial lesson code examples. What used to take an hour of boilerplate per lesson now takes ten minutes of prompting and five minutes of review.
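If you haven't seen this kind of scaffold, here's roughly the shape of the MVVM module described above. Treat it as a minimal sketch, not the agent's actual output: every name (Article, ArticleRepository, MockArticleRepository, ArticleListViewModel) is made up for illustration, and the SwiftUI view is left out so the example stays Foundation-only.

```swift
import Foundation

// Hypothetical model for illustration; not from any real project.
struct Article: Identifiable, Equatable, Sendable {
    let id: Int
    let title: String
}

// Repository protocol: the seam that lets previews and tests swap in mock data.
protocol ArticleRepository: Sendable {
    func fetchArticles() async throws -> [Article]
}

// Mock implementation returning canned data.
struct MockArticleRepository: ArticleRepository {
    var stubbed: [Article] = [
        Article(id: 1, title: "Hello"),
        Article(id: 2, title: "World"),
    ]

    func fetchArticles() async throws -> [Article] {
        stubbed
    }
}

// View model: owns the screen state and talks only to the protocol.
@MainActor
final class ArticleListViewModel {
    private(set) var articles: [Article] = []
    private(set) var errorMessage: String?

    private let repository: ArticleRepository

    init(repository: ArticleRepository) {
        self.repository = repository
    }

    func load() async {
        do {
            articles = try await repository.fetchArticles()
            errorMessage = nil
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```

None of this is hard to write. It's just tedious, which is exactly why it's the part worth handing off. The review is quick too, because the only real decision in the file is where the protocol seam goes.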


Then Reality Kicked In

Around week three, the honeymoon ended. Not with a crash — with a slow, creeping realization.

The 50-line problem. Simon Willison (Django co-creator) nailed this in a recent interview: AI agents write 50 lines where a human would write 5. And he’s right. I started noticing the agent creating entire helper structs for things a one-line extension could handle. Building custom view modifiers when a simple .padding() would do. Wrapping Foundation APIs in abstractions that add zero value.

The code works. It compiles. It passes tests. But when you come back to it two weeks later, you’re wading through a swamp of over-engineering that makes the simple feel complex.
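To make the helper-struct complaint concrete, here's a deliberately small, made-up example of the pattern. StringTrimmingHelper and the trimmed property are hypothetical names, not code from a real session, but the shape should look familiar.

```swift
import Foundation

// What the agent tends to produce: a dedicated type for a trivial job.
struct StringTrimmingHelper {
    private let characterSet: CharacterSet

    init(characterSet: CharacterSet = .whitespacesAndNewlines) {
        self.characterSet = characterSet
    }

    func trimmed(_ input: String) -> String {
        input.trimmingCharacters(in: characterSet)
    }
}

// What the job actually needed: one computed property.
extension String {
    var trimmed: String { trimmingCharacters(in: .whitespacesAndNewlines) }
}
```

Both versions behave the same for the default case. The difference is that one of them gives you a new type to name, import, inject, and explain to the next person who opens the file.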

Architecture astronautics. Ask the agent to “add offline caching to this feature” and it’ll design you a three-layer architecture with a protocol-oriented repository pattern, a caching strategy protocol, an in-memory fallback, a disk persistence layer, and a synchronization coordinator. All for what should’ve been a 20-line SwiftData model and query, persisted through a plain .modelContainer.

It doesn’t know when simple is better. It doesn’t know that your app has 200 users, not 200 million. It builds everything like it’s preparing for Google scale.

The false confidence trap. This is the dangerous one. The agent’s code always looks professional. Clean formatting. Good naming. Proper access control. So you start trusting it faster, reviewing it less carefully. And then three weeks later you find it’s been force-unwrapping an optional that absolutely can be nil in production, buried inside a function that looked fine at a glance.
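Here's a stripped-down illustration of how that kind of force unwrap hides in plain sight. The UserProfile type and both functions are hypothetical, and the assumption (flagged in the comments) is that nickname can legitimately be missing in production data.

```swift
import Foundation

// Hypothetical model: `nickname` is optional because the backend may omit it.
struct UserProfile {
    let name: String
    let nickname: String?
}

// Looks tidy at a glance, but traps at runtime whenever nickname is nil.
func greetingUnsafe(for profile: UserProfile) -> String {
    "Hi, \(profile.nickname!)"
}

// Same result on the happy path, with an explicit fallback for nil.
func greeting(for profile: UserProfile) -> String {
    "Hi, \(profile.nickname ?? profile.name)"
}
```

The unsafe version passes every test that uses a fully populated fixture, which is usually every test the agent wrote for it. One character separates the two functions, and that's the character you miss when you skim.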


The Security Conversation Nobody Wants to Have

David Mytton, CEO of Arcjet, predicted “catastrophic problems” from unreviewed agent-coded apps hitting production. Simon Willison went further — he said we’re “due a Challenger disaster” with coding agent security.

That sounds dramatic. But after three months, I get why they’re worried.

The agent doesn’t think about security the way a paranoid iOS developer does. It handles the happy path beautifully. But it won’t volunteer that your Keychain access needs error handling for locked devices. It won’t mention that the URL you’re constructing from user input could be manipulated. It won’t flag that your App Transport Security exception probably shouldn’t be there.

It writes correct code, not defensive code. And in iOS development — where you’re dealing with user data, payment information, health records, location — that gap matters.
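The URL case is the easiest one to show. Below is a minimal sketch of "correct" versus "defensive", assuming a made-up search endpoint. The host api.example.com, the /search path, and both function names are placeholders I invented for the example.

```swift
import Foundation

// Naive version: string interpolation lets input such as "a&admin=true"
// smuggle an extra query parameter into the request.
func searchURLUnsafe(query: String) -> URL? {
    URL(string: "https://api.example.com/search?q=\(query)")
}

// Defensive version: URLComponents percent-encodes the value, so the
// whole input stays inside the single `q` parameter.
func searchURL(query: String) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "api.example.com"
    components.path = "/search"
    components.queryItems = [URLQueryItem(name: "q", value: query)]
    return components.url
}
```

Both functions return a usable URL for a query like "swift", so a happy-path test can't tell them apart. You only see the difference when the input contains characters that mean something to a URL parser.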

I’ve started treating every agent session like a PR from a junior developer: read every line, question every assumption, check every boundary. Which, ironically, can take nearly as much effort as writing the code myself. The speed gain is real, but only if your review muscle is strong.

We’ve written about this broader pattern before — the vibe coding technical debt crisis is exactly what happens when people skip the review step.


What Changed About How I Work

Here’s the thing nobody prepared me for: the agent didn’t just change what I write. It changed what I think about while working.

Before Xcode 26.3, my brain was split roughly 70/30 between “what should this code do” and “how exactly do I type that in Swift.” Now it’s more like 90/10. The mechanical translation work is gone. I spend almost all my time on architecture, on naming, on intent.

That’s mostly good. It means I’m working at a higher level of abstraction. More time designing the system, less time remembering whether it’s URLSession.shared.data(for:) or URLSession.shared.data(from:).

But there’s a dark side. I’ve noticed my recall of Swift APIs getting fuzzier. Last week I blanked on the @Observable macro syntax — something I’ve written a hundred times. My fingers knew it. My brain didn’t anymore. That freaked me out a little.

It’s like how GPS killed my sense of direction. I can get anywhere, faster than ever. But I no longer know where I am.


The Prompt Quality Problem

Here’s something Apple doesn’t advertise: the quality of your results scales directly with the quality of your prompts. Vague descriptions get vague code. Precise, contextual prompts get excellent output.

“Add networking to this screen” gets you a generic URLSession call with no error handling.

“Add a repository method that fetches paginated results from this specific endpoint, handles 401 by refreshing the token, surfaces network errors as a typed enum, and cancels in-flight requests when the view disappears” gets you production-ready code.

The gap between those two outcomes is enormous. And it means the developers who benefit most from AI agents are, counterintuitively, the developers who already know exactly what they want. The ones who could write it themselves but are tired of typing it.
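For a sense of what the precise prompt is really asking for, here's a rough sketch of such a repository method. It is an illustration under stated assumptions, not what the agent generated: FeedError, Page, HTTPResult, and FeedRepository are hypothetical names, and the network call and token refresh are injected as closures so the example doesn't depend on a real endpoint or on URLSession.

```swift
import Foundation

// Typed errors the UI can switch over. All names here are hypothetical.
enum FeedError: Error, Equatable {
    case unauthorized
    case server(status: Int)
    case transport
    case decoding
}

struct Page: Decodable, Equatable, Sendable {
    let items: [String]
    let nextPage: Int?
}

// Minimal response shape so the sketch stays independent of URLSession.
struct HTTPResult: Sendable {
    let status: Int
    let body: Data
}

struct FeedRepository: Sendable {
    // Assumed seams: `send` performs the request with a bearer token,
    // `refreshToken` returns a fresh token after a 401.
    let send: @Sendable (_ page: Int, _ token: String) async throws -> HTTPResult
    let refreshToken: @Sendable () async throws -> String

    func fetchPage(_ page: Int, token: String) async throws -> Page {
        var currentToken = token
        // At most two attempts: the original call, then one retry after a refresh.
        for attempt in 0..<2 {
            // Stop early if the calling task was cancelled.
            try Task.checkCancellation()

            let result: HTTPResult
            do {
                result = try await send(page, currentToken)
            } catch is CancellationError {
                throw CancellationError()
            } catch {
                throw FeedError.transport
            }

            switch result.status {
            case 200..<300:
                guard let decoded = try? JSONDecoder().decode(Page.self, from: result.body) else {
                    throw FeedError.decoding
                }
                return decoded
            case 401 where attempt == 0:
                do { currentToken = try await refreshToken() }
                catch { throw FeedError.unauthorized }
            case 401:
                throw FeedError.unauthorized
            default:
                throw FeedError.server(status: result.status)
            }
        }
        // Not reached in practice; the loop always returns or throws.
        throw FeedError.unauthorized
    }
}
```

Calling fetchPage from SwiftUI's .task modifier covers the "cancel when the view disappears" requirement, since SwiftUI cancels that task for you and the checkCancellation call picks it up. Every line here maps to a clause in the prompt, and that's the point: the agent didn't invent the requirements, the developer did.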

If you’re working with AI coding tools daily — whether in Xcode or elsewhere — having a solid prompt library saves you from reinventing the same instructions every time. That’s actually why we built PromptKit: a place to store, organize, and quickly retrieve the prompts that actually produce good results across your tools.


Looking Ahead: What WWDC 2026 Needs to Fix

WWDC 2026 is five weeks away (June 8-12). If Apple’s serious about agentic coding — and given the investment, they clearly are — here’s what the next version needs:

Project-level context awareness. The agent reads your files but doesn’t understand your project’s philosophy. It doesn’t know you prefer composition over inheritance, or that you’ve deliberately avoided Combine in favor of async/await. It needs some kind of project-level instruction file, something like a .xcode-agent-rules file, that sets constraints.

Scale-appropriate suggestions. An indie app with 500 users doesn’t need the same architecture as iMessage. The agent should ask about context or infer it from project size.

Security-first mode. A toggle that makes the agent run every generated code path through a security checklist before presenting it. iOS developers handle sensitive data. The tooling should reflect that.

Honest confidence scores. Instead of presenting all output with equal confidence, flag when it’s guessing. “I’m not sure this is the right approach for SwiftData relationships — here’s what I think, but verify” would save hours of debugging.


The Verdict After 90 Days

Xcode’s AI agent is the best and worst thing to happen to my workflow simultaneously.

It’s the best because I ship features faster, write more tests, and spend less time on mechanical translation work. Real, measurable productivity gains. Not hype.

It’s the worst because it tempts me to think less. To trust more. To let “it compiles and passes tests” become my definition of done. Every time I catch myself skimming agent output instead of reading it, I have to physically stop and reset.

My rule after three months: use the agent for generation, never for decision-making. Let it write the implementation. Never let it choose the architecture. Let it produce test cases. Never let it decide what to test. Let it scaffold. Never let it design.

The developers who thrive with this tool will be the ones who use it like power steering — not like autopilot. It makes the mechanical work effortless. But you still need to know where you’re going.

If you’re an iOS developer who hasn’t tried it yet, you should. Just go in with eyes open. And for the love of everything, read what it writes before you ship it.


Building with AI tools in your iOS workflow? Our SwiftUI at Scale course covers how to architect apps that stay maintainable whether the code comes from your fingers or an AI agent — because the architecture decisions are the part machines still can’t make for you.


NativeFirst Team

Editorial

The NativeFirst team — engineers and designers building native Apple apps and writing the courses we wish we had when we started.