ai development mcp swift architecture

From Grep to Graph: Building a Local RAG That Actually Understands Code

I used to watch AI agents run grep searches, load 50 files into context, and still miss the answer sitting in line 47 of a file they never checked. Context window full of irrelevant matches. Agent guessed. Code broke.

Eight days later, I had something that worked. The difference isn't a smarter model. It's structure.

(The git history doesn't lie. January 19th: "Start Local RAG indexing + MCP tools." January 26th: AST chunking for Swift, Ruby, TypeScript. January 30th: dependency graph. Rapid iteration when the tools help build themselves.)


The Problem

Context windows are huge now. 200K tokens. A million tokens. Infinite context is coming, right?

Wrong framing.

The problem isn't capacity. It's relevance. Dump 200K tokens of code into a prompt and the model drowns in noise. Important details get buried. Signal-to-noise ratio tanks.

"But context windows keep growing. Why not just wait?"

Because the problem scales with the solution. Bigger context windows mean you can dump more code, which means more noise, which means the model still misses what matters. I've tested this. A 200K context with 180K tokens of mediocre matches performs worse than a 20K context with 15K tokens of precise matches. Relevance beats capacity.

Here's what a typical AI coding session looks like at scale:

  1. Agent runs grep -r "validatePayment" --include="*.ts"
  2. Returns 47 matches across the codebase
  3. Agent loads 15 files into context
  4. Half are test files. A quarter are unrelated string matches.
  5. Context window hits 80K tokens
  6. The actual implementation? In a file the agent never read because grep matched the wrong thing first.

Text search doesn't understand code. It doesn't know that validatePayment in a test file matters less than validatePayment in PaymentService.ts. It doesn't know that PaymentService imports ValidationRules, where the actual logic lives.

I spent months working around this. Cherry-picking files. Writing elaborate prompts. Accepting mediocre results.

So I built something better.


What I Built

Peel is a macOS app I built in January 2026. Weekend experiment turned primary development tool.

The core:

  1. AST Chunker - Parses Swift, TypeScript, Ruby, and Glimmer files into semantic chunks (classes, functions, components) instead of arbitrary line ranges
  2. Dependency Graph - Tracks imports, inheritance, protocol conformance, and mixins across the codebase
  3. Local Embeddings - MLX-powered semantic search running entirely on-device
  4. Code Analyzer - Local LLM that generates summaries and semantic tags for each chunk
  5. MCP Server - Exposes everything to VS Code, Cursor, and Claude Desktop via JSON-RPC

The entire system runs locally. No API calls for search. No code leaving my machine. Fast enough to query mid-thought.


AST Chunking

Most RAG systems chunk by lines. 100 lines per chunk. Maybe 200. Overlap them to avoid cutting functions in half.

This is wrong.

Code has structure. A 4,000-line file might have 12 meaningful units: 3 classes, 8 extensions, 1 import block. Each unit is a complete thought. When you search, you want semantic units, not arbitrary slices.

Here's the core data structure:

public struct ASTChunk: Sendable, Equatable {
  public let constructType: ConstructType  // classDecl, function, component
  public let constructName: String?        // "UserService", "validateForm"
  public let startLine: Int
  public let endLine: Int
  public let text: String
  public let metadata: ASTChunkMetadata
}

The ConstructType enum captures what kind of code this is:

public enum ConstructType: String {
  case file           // Entire file or fallback
  case imports        // Import block
  case classDecl      // Class definition
  case structDecl     // Struct definition
  case enumDecl       // Enum definition
  case protocolDecl   // Protocol/interface
  case function       // Top-level function
  case method         // Method within a type
  case component      // UI Component (Ember/React)
  // ...
}

The real value is in the metadata:

public struct ASTChunkMetadata: Sendable {
  public var decorators: [String]      // @MainActor, @Observable, @tracked
  public var protocols: [String]       // Sendable, Equatable
  public var imports: [String]         // Foundation, SwiftUI
  public var superclass: String?       // Component, Actor
  public var frameworks: [String]      // SwiftUI, Ember, Rails
  
  // Language-specific
  public var propertyWrappers: [String]  // Swift: @State, @Environment
  public var mixins: [String]            // Ruby: include, extend
  public var callbacks: [String]         // Rails: before_action
  public var associations: [String]      // Rails: has_many, belongs_to
  public var usesEmberConcurrency: Bool  // Ember: task patterns
  public var hasTemplate: Bool           // Glimmer: <template> blocks
}

Now when I search, results include:

  • "This is a classDecl named PaymentService"
  • "It conforms to Sendable and PaymentProcessing"
  • "It uses @MainActor and imports Combine"
  • "It inherits from BaseService"

The agent doesn't just find text. It finds semantic units with context.
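
To make that concrete, here's a rough sketch of how chunk metadata could be flattened into the one-line context an agent sees next to each result. The format is illustrative, not Peel's actual output:

func contextHeader(for chunk: ASTChunk) -> String {
  // Start with the construct kind and name: "classDecl PaymentService"
  var parts = ["\(chunk.constructType.rawValue) \(chunk.constructName ?? "unnamed")"]
  if !chunk.metadata.protocols.isEmpty {
    parts.append("conforms to " + chunk.metadata.protocols.joined(separator: ", "))
  }
  if let superclass = chunk.metadata.superclass {
    parts.append("inherits from " + superclass)
  }
  if !chunk.metadata.decorators.isEmpty {
    parts.append("uses " + chunk.metadata.decorators.joined(separator: ", "))
  }
  if !chunk.metadata.imports.isEmpty {
    parts.append("imports " + chunk.metadata.imports.joined(separator: ", "))
  }
  return parts.joined(separator: " | ")
}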


Multi-Language Parsing

My daily work spans Swift, TypeScript (with Ember/Glimmer), and Ruby (Rails). Each language needs its own parser.

Swift uses Apple's SwiftSyntax library:

import SwiftParser
import SwiftSyntax

let sourceFile = Parser.parse(source: source)
for statement in sourceFile.statements {
  // Each top-level statement wraps its declaration in `item`
  if let classDecl = statement.item.as(ClassDeclSyntax.self) {
    let metadata = extractMetadata(from: classDecl)
    // decorators, protocols, superclass...
  }
}

TypeScript and Glimmer use Babel, bundled for JavaScriptCore:

const ast = babelParse(source, {
  plugins: ['typescript', 'decorators-legacy', 'classProperties']
});

// Handle Glimmer <template> blocks specially
if (isGlimmer) {
  const templateRanges = preprocessGlimmer(source);
  // Associate templates with their components
}

Glimmer files (.gts) need special handling. They embed <template> blocks inside TypeScript. The chunker extracts these and associates them with their parent component class. When I search for a component, I get both the class definition and its template.
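
The preprocessing step is conceptually simple. Here's a minimal sketch (illustrative, not Peel's actual preprocessor) that finds the template ranges so they can be attached to the enclosing component chunk:

import Foundation

func templateRanges(in source: String) -> [Range<String.Index>] {
  var ranges: [Range<String.Index>] = []
  var cursor = source.startIndex
  // Scan for <template>...</template> pairs left to right
  while let open = source.range(of: "<template>", range: cursor..<source.endIndex),
        let close = source.range(of: "</template>", range: open.upperBound..<source.endIndex) {
    ranges.append(open.lowerBound..<close.upperBound)
    cursor = close.upperBound
  }
  return ranges
}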

Ruby uses pattern matching plus heuristics. No AST library needed for the metadata I care about:

// Ruby class bodies are indented, so work on the trimmed line
let trimmed = line.trimmingCharacters(in: .whitespaces)

// Rails associations
if trimmed.hasPrefix("has_many") || trimmed.hasPrefix("belongs_to") {
  metadata.associations.append(extractAssociation(trimmed))
}

// Mixins
if trimmed.hasPrefix("include ") {
  metadata.mixins.append(extractMixin(trimmed))
}

// Callbacks
if trimmed.contains("before_action") || trimmed.contains("after_create") {
  metadata.callbacks.append(extractCallback(trimmed))
}

I'm not trying to build a full Ruby parser. I'm extracting the metadata that helps AI agents understand the code.


Dependency Graph

Every time I index a file, I extract its dependencies:

enum LocalRAGDependencyType: String {
  case `import`   // Swift: import, TS: import
  case require    // Ruby: require, require_relative
  case include    // Ruby: include (mixin)
  case extend     // Ruby: extend
  case inherit    // Class inheritance
  case conform    // Protocol/interface conformance
}

These get stored in a SQLite table with source file, target, and relationship type.
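
Roughly, it's an edge table shaped like this (simplified for illustration; the column names here are just shorthand for the example queries below):

let createDependencies = """
CREATE TABLE IF NOT EXISTS dependencies (
  source_file_id  INTEGER NOT NULL REFERENCES files(id),  -- file declaring the dependency
  target_file_id  INTEGER REFERENCES files(id),           -- resolved target file, when known
  target_name     TEXT NOT NULL,                          -- e.g. "PaymentService", "Combine"
  dependency_type TEXT NOT NULL                           -- import, inherit, conform, ...
);
"""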

Use case: "What uses PaymentService?"

Old way: grep -r "PaymentService" across 200 files. Manually filter out tests, comments, string literals.

New way:

SELECT files.path
FROM dependencies
JOIN files ON files.id = dependencies.source_file_id
WHERE dependencies.target_file_id =
  (SELECT id FROM files WHERE path LIKE '%PaymentService%')

Returns only files that actually import or inherit from PaymentService. Five results instead of 47. All relevant.

Use case: "Show me the inheritance chain"

The graph knows PaymentController inherits from BaseController which inherits from ApplicationController. One query traces the entire chain.
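
With the edges in SQLite, "one query" can literally be one recursive CTE. A sketch against the simplified schema above (not Peel's exact query):

let inheritanceChainQuery = """
WITH RECURSIVE chain(file_id) AS (
  -- seed with PaymentController's file, then walk inherit edges upward
  SELECT id FROM files WHERE path LIKE '%PaymentController%'
  UNION
  SELECT d.target_file_id
  FROM dependencies AS d
  JOIN chain ON d.source_file_id = chain.file_id
  WHERE d.dependency_type = 'inherit'
)
SELECT files.path FROM files JOIN chain ON files.id = chain.file_id;
"""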

Use case: "What would break if I change this interface?"

Query everything that depends on that interface. Get a precise impact analysis. No grep guessing.
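
Same idea, reversed: instead of walking up the chain, walk inward over everything that points at the target, transitively. Another sketch on the simplified schema, not Peel's actual query:

let impactQuery = """
WITH RECURSIVE impacted(file_id) AS (
  -- seed with the file declaring the interface, then follow incoming edges
  SELECT id FROM files WHERE path LIKE '%PaymentProcessing%'
  UNION
  SELECT d.source_file_id
  FROM dependencies AS d
  JOIN impacted ON d.target_file_id = impacted.file_id
)
SELECT files.path FROM files JOIN impacted ON files.id = impacted.file_id;
"""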


Local Embeddings

Embeddings run entirely on-device using MLX (Apple's ML framework for Apple Silicon). No API calls.

The model selection scales to available RAM:

static func recommended(forMemoryGB gb: Double) -> MLXAnalyzerModelTier {
  if gb >= 48 { return .large }       // Mac Studio: 7B model
  else if gb >= 24 { return .medium } // 32GB MacBook: 3B model  
  else if gb >= 12 { return .small }  // 18GB M3: 1.5B model
  else { return .tiny }               // 8GB: 0.5B model
}

My M3 MacBook Pro with 18GB runs the 1.5B Qwen2.5-Coder model. Mac Studio users get the 7B model. Everyone gets quality embeddings without API costs.

  • Privacy: Code never leaves the machine
  • Speed: No network latency, works offline
  • Cost: Zero API costs, run unlimited queries
  • Quality: Qwen2.5-Coder models understand code

Vector search finds conceptually similar code even when keywords don't match. "How does authentication work?" matches the SessionManager class even though the word "authentication" doesn't appear in it.
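
The ranking idea behind that is plain cosine similarity between the query embedding and each chunk embedding. A toy version of the scoring step, not the production code:

func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
  // Dot product divided by the product of magnitudes
  let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
  let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
  let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
  let denominator = normA * normB
  return denominator > 0 ? dot / denominator : 0
}

func nearestChunks(to query: [Float],
                   in chunks: [(path: String, embedding: [Float])],
                   limit: Int) -> [String] {
  // Score every chunk, highest similarity first
  chunks
    .map { (path: $0.path, score: cosineSimilarity(query, $0.embedding)) }
    .sorted { $0.score > $1.score }
    .prefix(limit)
    .map { $0.path }
}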


Search Pipeline

When an agent asks a question:

  1. Text search for exact keyword matches (fast, precise)
  2. Vector search for semantic similarity (catches concepts)
  3. Combine and rank results by relevance (see the sketch after this list)
  4. Enrich with metadata from the dependency graph
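
Step 3 is where the two result lists meet. One way to do the merge is reciprocal rank fusion; this is a sketch of the idea, not Peel's actual ranking:

func fuseResults(text: [String], vector: [String], k: Double = 60) -> [String] {
  // Each list contributes 1 / (k + rank); items near the top of either list,
  // or present in both, end up with the highest combined score.
  var scores: [String: Double] = [:]
  for (rank, id) in text.enumerated() { scores[id, default: 0] += 1 / (k + Double(rank) + 1) }
  for (rank, id) in vector.enumerated() { scores[id, default: 0] += 1 / (k + Double(rank) + 1) }
  return scores.sorted { $0.value > $1.value }.map { $0.key }
}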

The result structure gives agents everything they need:

struct LocalRAGSearchResult: Sendable {
  let filePath: String
  let startLine: Int
  let endLine: Int
  let snippet: String
  
  let constructType: String?    // "classDecl", "function", "component"
  let constructName: String?    // "PaymentService", "validateForm"
  let language: String?         // "Swift", "Glimmer TypeScript"
  let isTest: Bool              // Deprioritize test files
  
  let aiSummary: String?        // "Handles payment validation and processing"
  let aiTags: [String]          // ["validation", "payments", "async"]
  let score: Float?             // Relevance score
}

The agent gets structured answers. Not a wall of text.


MCP Server

Everything exposed via Model Context Protocol. VS Code config:

{
  "mcp.servers": {
    "peel": {
      "url": "http://127.0.0.1:8765/rpc"
    }
  }
}

Now any MCP-compatible tool can query my codebase:

# Text search
curl -X POST http://127.0.0.1:8765/rpc \
  -d '{"method":"tools/call","params":{"name":"rag.search","arguments":{"query":"payment validation","limit":10}}}'

# Vector search  
curl -X POST http://127.0.0.1:8765/rpc \
  -d '{"method":"tools/call","params":{"name":"rag.search.vector","arguments":{"query":"how does auth work"}}}'

# Dependency queries
curl -X POST http://127.0.0.1:8765/rpc \
  -d '{"method":"tools/call","params":{"name":"rag.dependencies","arguments":{"filePath":"PaymentService.swift"}}}'

Copilot, Claude, Cursor, or anything else that speaks MCP gets the same capabilities.

MCP from day one meant agents could validate their own work. When an agent writes code that touches LocalRAGStore, it queries the dependency graph to see what else might break. It searches for similar patterns. It checks if tests exist.

The tool helps build itself. Every improvement to Peel makes the agents better at improving Peel.


Eight Days from Idea to Daily Driver

The git log tells the story:

January 19 (Day 1): "Start Local RAG indexing + MCP tools." SQLite schema, file scanning, line-based chunking. And the MCP server skeleton. From day one, VS Code agents could query the (terrible) early search results. This meant the agents helping me build Peel could use Peel to understand Peel.

January 19-20 (Days 1-2): System embeddings, vector search stub, UI scaffolding. Results were mediocre but the pipeline worked.

January 21 (Day 3): Core ML embeddings, context injection into agents, first draft of product manual. The "this actually works" moment.

January 26 (Day 8): AST chunking explosion. Swift via SwiftSyntax. Ruby via tree-sitter. TypeScript/Glimmer via Babel bundled for JavaScriptCore. Four commits in one day adding language-aware parsing.

January 28 (Day 10): Structured metadata extraction. Decorators, protocols, imports tracked per chunk.

January 30 (Day 12): Dependency graph. Import tracking, inheritance chains. The "what depends on this?" feature I'd been wanting.

The timeline compressed because each improvement made the next one easier. Better search meant agents found relevant code faster. Better code understanding meant agents made fewer mistakes. The tool helped build itself.

I've used it every day since. The codebase I work with has 12+ services, hundreds of thousands of lines across Swift, TypeScript, and Ruby. Before Peel, I spent significant time manually finding relevant code. Now the agents find it.


Development Workflow

I write Swift in VS Code, not Xcode. This surprises people.

Peel is a native macOS/iOS app. SwiftUI, SwiftData, strict Swift 6 concurrency. I open the .xcodeproj folder in VS Code and use Copilot agents to write code. Xcode handles building, debugging, and previews.

Why? Copilot for Xcode is progressing but still has rough edges. VS Code agent experience is smoother today. I can have an agent read MCP server output, understand what's happening, and write code that integrates correctly.

My workflow:

  1. VS Code: Write code with agents, use MCP tools to search and understand
  2. Xcode: Build, debug, run previews, profile
  3. Peel: The app I'm building provides tools to the agents building it

Not elegant, but fast. When Copilot for Xcode catches up, I'll consolidate. The MCP server doesn't care which IDE calls it.


Before and After

Before:

  • Agent runs 5+ grep searches
  • Loads 20+ files into context
  • Half are irrelevant (tests, comments, string matches)
  • Context at 100K+ tokens
  • Agent makes educated guesses
  • Code often wrong, needs multiple iterations

After:

  • Agent calls RAG search once
  • Gets 5-10 precise chunks
  • Each chunk has metadata explaining what it is
  • Context under 20K tokens
  • Agent understands the structure
  • Code works on first try more often

The models aren't the bottleneck. Context is. Give an AI agent structured context instead of raw grep output and it performs like a different tool.

Something I can't prove but keep noticing: the code quality is higher. Not just "it works" but polished. UX details I didn't ask for. No duplicated logic. Edge cases handled. The code built this week is hard to find fault with, and I keep looking. Maybe smaller, precise context lets the model think more clearly instead of pattern-matching through noise. Maybe I'm imagining it. But the results are there.

What still breaks: Cross-repo references. When ServiceA in one repo imports a type from ServiceB in another repo, the dependency graph loses the trail. I'm working on it. Also, the Ruby parser is too naive for metaprogramming-heavy code. Rails conventions work great. DSL-heavy gems? Not so much. Peel isn't magic. It's just better than grep.


What I'd Do Differently

Most of this worked. Some of it was luck. If I were starting over:

  1. Start with the dependency graph. It's more valuable than I expected. Import relationships tell you more about code organization than any other signal. I built it last because I thought it was a nice-to-have. It's not.

  2. Don't overthink chunking boundaries. My first version tried to be too clever about splitting large classes. Just keep the whole class if it's under 300 lines. Agents handle it fine.

  3. Test search quality obsessively. The difference between "good" and "great" search results compounds. Every improvement in retrieval quality shows up immediately in agent performance.

  4. MCP from day one. Build for integration, not standalone.


If You Want to Build This

Core technologies:

  • Swift: SwiftSyntax for parsing, SQLite for storage, MLX for embeddings
  • TypeScript: Babel for AST, bundled with esbuild for JavaScriptCore
  • Search: Hybrid text + vector with result merging
  • Integration: MCP server (JSON-RPC over HTTP)

The AST chunker for TypeScript/Glimmer is ~400 lines of JavaScript. The Swift chunker is ~600 lines. The dependency extractor is ~300 lines per language. This isn't a massive system. It's a focused tool that does one thing well.

The full Peel app has more features (agent orchestration, parallel worktrees, distributed swarm execution), but the RAG core is the foundation everything else builds on.


Bigger Picture

This is Intent-Driven Development at the infrastructure level.

Structure your code understanding with AST parsing. Tag it with semantic metadata. Build the dependency graph. Give AI agents a map instead of making them wander.

Models will keep getting better. Context windows will keep growing. The fundamental problem remains: relevance beats capacity. A focused 10K-token context outperforms a sprawling 100K-token dump.

Structure compounds. Yesterday I added HuggingFace reranking. Today the dependency graph landed. Tomorrow, who knows. Each piece makes the others more useful.

Eight days to build. Used every day since. That's the ROI on building your own tools.


Peel: Available Soon

Peel is in rapid development. I'm using it daily, shipping features by the minute, and dogfooding relentlessly. It's not ready for public release yet, but it's getting close.

When it launches, it'll be a paid app. I'm still figuring out the details, but not subscription nonsense where you lose access if you stop paying. Pay once and own it, or subscribe if you want to support ongoing development. Something fair.

I need to pay my mortgage. You need tools that work.

If you're interested in early access or want to follow development, email me: cory@crunchybananas.com. I'm building this because I needed it. Turns out I'm not the only one.


This post is part of a series on AI-augmented development. See also: Intent-Driven Development and Why We're Back.


AI Transparency: This post was written with assistance from Claude. The technical details, code examples, and architecture are from my actual implementation. Claude helped structure the narrative and tighten the prose.

If this article helped you ship code, consider sponsoring my work.

No perks. No ads. Just support for writing that doesn't waste your time.