From Grep to Graph: Building a Local RAG That Actually Understands Code
I used to watch AI agents run grep searches, load 50 files into context, and still miss the answer sitting in line 47 of a file they never checked. Context window full of irrelevant matches. Agent guessed. Code broke.
Eight days later, I had something that worked. The difference isn't a smarter model. It's structure.
(The git history doesn't lie. January 19th: "Start Local RAG indexing + MCP tools." January 26th: AST chunking for Swift, Ruby, TypeScript. January 30th: dependency graph. Rapid iteration when the tools help build themselves.)
The Problem
Context windows are huge now. 200K tokens. A million tokens. Infinite context is coming, right?
Wrong framing.
The problem isn't capacity. It's relevance. Dump 200K tokens of code into a prompt and the model drowns in noise. Important details get buried. Signal-to-noise ratio tanks.
"But context windows keep growing. Why not just wait?"
Because the problem scales with the solution. Bigger context windows mean you can dump more code, which means more noise, which means the model still misses what matters. I've tested this. A 200K context with 180K tokens of mediocre matches performs worse than a 20K context with 15K tokens of precise matches. Relevance beats capacity.
Here's what a typical AI coding session looks like at scale:
- Agent runs grep -r "validatePayment" --include="*.ts"
- Returns 47 matches across the codebase
- Agent loads 15 files into context
- Half are test files. A quarter are unrelated string matches.
- Context window hits 80K tokens
- The actual implementation? In a file the agent never read because grep matched the wrong thing first.
Text search doesn't understand code. It doesn't know that validatePayment in a test file matters less than validatePayment in PaymentService.ts. It doesn't know that PaymentService imports ValidationRules, where the actual logic lives.
I spent months working around this. Cherry-picking files. Writing elaborate prompts. Accepting mediocre results.
So I built something better.
What I Built
Peel is a macOS app I built in January 2026. Weekend experiment turned primary development tool.
The core:
- AST Chunker - Parses Swift, TypeScript, Ruby, and Glimmer files into semantic chunks (classes, functions, components) instead of arbitrary line ranges
- Dependency Graph - Tracks imports, inheritance, protocol conformance, and mixins across the codebase
- Local Embeddings - MLX-powered semantic search running entirely on-device
- Code Analyzer - Local LLM that generates summaries and semantic tags for each chunk
- MCP Server - Exposes everything to VS Code, Cursor, and Claude Desktop via JSON-RPC
The entire system runs locally. No API calls for search. No code leaving my machine. Fast enough to query mid-thought.
AST Chunking
Most RAG systems chunk by lines. 100 lines per chunk. Maybe 200. Overlap them to avoid cutting functions in half.
This is wrong.
Code has structure. A 4,000-line file might have 12 meaningful units: 3 classes, 8 extensions, 1 import block. Each unit is a complete thought. When you search, you want semantic units, not arbitrary slices.
Here's the core data structure:
public struct ASTChunk: Sendable, Equatable {
    public let constructType: ConstructType   // classDecl, function, component
    public let constructName: String?         // "UserService", "validateForm"
    public let startLine: Int
    public let endLine: Int
    public let text: String
    public let metadata: ASTChunkMetadata
}
The ConstructType enum captures what kind of code this is:
public enum ConstructType: String {
    case file            // Entire file or fallback
    case imports         // Import block
    case classDecl       // Class definition
    case structDecl      // Struct definition
    case enumDecl        // Enum definition
    case protocolDecl    // Protocol/interface
    case function        // Top-level function
    case method          // Method within a type
    case component       // UI Component (Ember/React)
    // ...
}
The real value is in the metadata:
public struct ASTChunkMetadata: Sendable {
    public var decorators: [String]           // @MainActor, @Observable, @tracked
    public var protocols: [String]            // Sendable, Equatable
    public var imports: [String]              // Foundation, SwiftUI
    public var superclass: String?            // Component, Actor
    public var frameworks: [String]           // SwiftUI, Ember, Rails

    // Language-specific
    public var propertyWrappers: [String]     // Swift: @State, @Environment
    public var mixins: [String]               // Ruby: include, extend
    public var callbacks: [String]            // Rails: before_action
    public var associations: [String]         // Rails: has_many, belongs_to
    public var usesEmberConcurrency: Bool     // Ember: task patterns
    public var hasTemplate: Bool              // Glimmer: <template> blocks
}
Now when I search, results include:
- "This is a
classDeclnamedPaymentService" - "It conforms to
SendableandPaymentProcessing" - "It uses
@MainActorand importsCombine" - "It inherits from
BaseService"
The agent doesn't just find text. It finds semantic units with context.
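A small illustration of how that metadata can travel with the snippet. The contextHeader helper below is hypothetical formatting, not Peel's actual output; it just shows how the fields on ASTChunkMetadata turn into a one-line structural summary:

extension ASTChunk {
    /// Builds a one-line structural summary to prepend to the snippet an agent sees.
    /// Illustrative formatting only; the helper name and layout are assumptions.
    var contextHeader: String {
        var parts = ["\(constructType.rawValue) \(constructName ?? "<anonymous>")"]
        if let superclass = metadata.superclass {
            parts.append("inherits \(superclass)")
        }
        if !metadata.protocols.isEmpty {
            parts.append("conforms to \(metadata.protocols.joined(separator: ", "))")
        }
        if !metadata.decorators.isEmpty {
            parts.append("decorated with \(metadata.decorators.joined(separator: ", "))")
        }
        if !metadata.imports.isEmpty {
            parts.append("imports \(metadata.imports.joined(separator: ", "))")
        }
        return "// " + parts.joined(separator: " | ")
    }
}

For the PaymentService example above, that yields something like // classDecl PaymentService | inherits BaseService | conforms to Sendable, PaymentProcessing | decorated with @MainActor | imports Combine.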
Multi-Language Parsing
My daily work spans Swift, TypeScript (with Ember/Glimmer), and Ruby (Rails). Each language needs its own parser.
Swift uses Apple's SwiftSyntax library:
let sourceFile = Parser.parse(source: source)
for statement in sourceFile.statements {
    if let classDecl = statement.item.as(ClassDeclSyntax.self) {
        let metadata = extractMetadata(from: classDecl)
        // decorators, protocols, superclass...
    }
}
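The extraction itself walks the declaration's syntax nodes. Here's a rough sketch of what extractMetadata(from:) might look like. The accessors (attributes, inheritanceClause, inheritedTypes) reflect a recent SwiftSyntax release and may differ by version, and the empty ASTChunkMetadata() initializer is assumed:

func extractMetadata(from classDecl: ClassDeclSyntax) -> ASTChunkMetadata {
    var metadata = ASTChunkMetadata()   // assumes a default "empty" initializer
    // Attributes like @MainActor or @Observable become the chunk's "decorators"
    for element in classDecl.attributes {
        if let attribute = element.as(AttributeSyntax.self) {
            metadata.decorators.append("@" + attribute.attributeName.trimmedDescription)
        }
    }
    // Swift puts the superclass and protocol conformances in one inheritance clause;
    // a later lookup can decide which entry is actually the superclass
    if let clause = classDecl.inheritanceClause {
        metadata.protocols = clause.inheritedTypes.map { $0.type.trimmedDescription }
    }
    return metadata
}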
TypeScript and Glimmer use Babel, bundled for JavaScriptCore:
const ast = babelParse(source, {
  plugins: ['typescript', 'decorators-legacy', 'classProperties']
});

// Handle Glimmer <template> blocks specially
if (isGlimmer) {
  const templateRanges = preprocessGlimmer(source);
  // Associate templates with their components
}
Glimmer files (.gts) need special handling. They embed <template> blocks inside TypeScript. The chunker extracts these and associates them with their parent component class. When I search for a component, I get both the class definition and its template.
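The extraction happens on the JavaScript side (preprocessGlimmer above); the association step, wherever it runs, is conceptually just interval matching. A simplified Swift sketch of the idea, using the ASTChunk type from earlier and an illustrative TemplateRange:

struct TemplateRange {
    let startLine: Int
    let endLine: Int
}

func associate(templates: [TemplateRange], with chunks: [ASTChunk]) -> [ASTChunk] {
    chunks.map { chunk in
        // A component chunk claims any <template> whose line range falls inside it
        guard chunk.constructType == .component,
              templates.contains(where: { $0.startLine >= chunk.startLine && $0.endLine <= chunk.endLine })
        else { return chunk }
        var metadata = chunk.metadata
        metadata.hasTemplate = true
        // ASTChunk's properties are lets, so rebuild the chunk with updated metadata
        return ASTChunk(constructType: chunk.constructType,
                        constructName: chunk.constructName,
                        startLine: chunk.startLine,
                        endLine: chunk.endLine,
                        text: chunk.text,
                        metadata: metadata)
    }
}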
Ruby uses pattern matching plus heuristics. No AST library needed for the metadata I care about:
// Rails associations
if line.hasPrefix("has_many") || line.hasPrefix("belongs_to") {
    metadata.associations.append(extractAssociation(line))
}

// Mixins
if line.hasPrefix("include ") {
    metadata.mixins.append(extractMixin(line))
}

// Callbacks
if line.contains("before_action") || line.contains("after_create") {
    metadata.callbacks.append(extractCallback(line))
}
I'm not trying to build a full Ruby parser. I'm extracting the metadata that helps AI agents understand the code.
Dependency Graph
Every time I index a file, I extract its dependencies:
enum LocalRAGDependencyType: String {
    case `import`    // Swift: import, TS: import
    case require     // Ruby: require, require_relative
    case include     // Ruby: include (mixin)
    case extend      // Ruby: extend
    case inherit     // Class inheritance
    case conform     // Protocol/interface conformance
}
These get stored in a SQLite table with source file, target, and relationship type.
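The exact schema isn't shown here, but it's roughly this shape. The table and column names below are assumptions chosen to match the query that follows, not necessarily Peel's:

let createDependenciesTable = """
CREATE TABLE IF NOT EXISTS dependencies (
    id             INTEGER PRIMARY KEY,
    source_file_id INTEGER NOT NULL REFERENCES files(id),
    target_file_id INTEGER REFERENCES files(id),    -- NULL until the target is indexed
    target_name    TEXT NOT NULL,                   -- "PaymentService", "Combine", ...
    kind           TEXT NOT NULL                    -- LocalRAGDependencyType.rawValue
);
"""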
Use case: "What uses PaymentService?"
Old way: grep -r "PaymentService" across 200 files. Manually filter out tests, comments, string literals.
New way:
SELECT files.path
FROM dependencies
JOIN files ON files.id = dependencies.source_file_id
WHERE dependencies.target_file_id =
    (SELECT id FROM files WHERE path LIKE '%PaymentService%');
Returns only files that actually import or inherit from PaymentService. Five results instead of 47. All relevant.
Use case: "Show me the inheritance chain"
The graph knows PaymentController inherits from BaseController which inherits from ApplicationController. One query traces the entire chain.
Use case: "What would break if I change this interface?"
Query everything that depends on that interface. Get a precise impact analysis. No grep guessing.
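With the edges in one table, impact analysis is a transitive walk over reverse dependencies. A sketch, assuming a hypothetical dependentFiles(of:) lookup backed by SQL like the query above:

func impactedFiles(of target: String, using dependentFiles: (String) -> [String]) -> Set<String> {
    var impacted: Set<String> = []
    var queue = [target]
    while let current = queue.popLast() {
        // Every file that imports, inherits from, or conforms to `current` is affected,
        // and its own dependents are affected transitively
        for dependent in dependentFiles(current) where impacted.insert(dependent).inserted {
            queue.append(dependent)
        }
    }
    return impacted
}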
Local Embeddings
Embeddings run entirely on-device using MLX (Apple's ML framework for Apple Silicon). No API calls.
The model selection scales to available RAM:
static func recommended(forMemoryGB gb: Double) -> MLXAnalyzerModelTier {
    if gb >= 48 { return .large }         // Mac Studio: 7B model
    else if gb >= 24 { return .medium }   // 32GB MacBook: 3B model
    else if gb >= 12 { return .small }    // 18GB M3: 1.5B model
    else { return .tiny }                 // 8GB: 0.5B model
}
My M3 MacBook Pro with 18GB runs the 1.5B Qwen2.5-Coder model. Mac Studio users get the 7B model. Everyone gets quality embeddings without API costs. The on-device approach buys four things:
- Privacy: Code never leaves the machine
- Speed: No network latency, works offline
- Cost: Zero API costs, run unlimited queries
- Quality: Qwen2.5-Coder models understand code
Vector search finds conceptually similar code even when keywords don't match. "How does authentication work?" matches the SessionManager class even though the word "authentication" doesn't appear in it.
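Under the hood it's the standard trick: embed the query, embed each chunk, rank by similarity. A minimal sketch of the scoring step (the embedding calls themselves go through MLX and aren't shown):

func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must share a dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}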
Search Pipeline
When an agent asks a question:
- Text search for exact keyword matches (fast, precise)
- Vector search for semantic similarity (catches concepts)
- Combine and rank results by relevance (one merging approach is sketched after this list)
- Enrich with metadata from the dependency graph
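The combine step has to merge text hits and vector hits whose scores live on different scales. Reciprocal rank fusion is one common way to do that; this sketch illustrates the idea and is not necessarily how Peel weights things:

/// Merge ranked result lists by reciprocal rank fusion:
/// a chunk's score is the sum of 1 / (k + rank) across every list it appears in.
func fuseByReciprocalRank(_ lists: [[String]], k: Float = 60) -> [String] {
    var scores: [String: Float] = [:]
    for list in lists {
        for (rank, chunkID) in list.enumerated() {
            scores[chunkID, default: 0] += 1 / (k + Float(rank) + 1)
        }
    }
    return scores.sorted { $0.value > $1.value }.map(\.key)
}

Text-only matches, vector-only matches, and chunks found by both all land in one ranking, with double hits naturally floating to the top.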
The result structure gives agents everything they need:
struct LocalRAGSearchResult: Sendable {
    let filePath: String
    let startLine: Int
    let endLine: Int
    let snippet: String
    let constructType: String?    // "classDecl", "function", "component"
    let constructName: String?    // "PaymentService", "validateForm"
    let language: String?         // "Swift", "Glimmer TypeScript"
    let isTest: Bool              // Deprioritize test files
    let aiSummary: String?        // "Handles payment validation and processing"
    let aiTags: [String]          // ["validation", "payments", "async"]
    let score: Float?             // Relevance score
}
The agent gets structured answers. Not a wall of text.
MCP Server
Everything exposed via Model Context Protocol. VS Code config:
{
  "mcp.servers": {
    "peel": {
      "url": "http://127.0.0.1:8765/rpc"
    }
  }
}
Now any MCP-compatible tool can query my codebase:
# Text search
curl -X POST http://127.0.0.1:8765/rpc \
-d '{"method":"tools/call","params":{"name":"rag.search","arguments":{"query":"payment validation","limit":10}}}'
# Vector search
curl -X POST http://127.0.0.1:8765/rpc \
-d '{"method":"tools/call","params":{"name":"rag.search.vector","arguments":{"query":"how does auth work"}}}'
# Dependency queries
curl -X POST http://127.0.0.1:8765/rpc \
-d '{"method":"tools/call","params":{"name":"rag.dependencies","arguments":{"filePath":"PaymentService.swift"}}}'
Copilot, Claude, Cursor, or anything else that speaks MCP gets the same capabilities.
MCP from day one meant agents could validate their own work. When an agent writes code that touches LocalRAGStore, it queries the dependency graph to see what else might break. It searches for similar patterns. It checks if tests exist.
The tool helps build itself. Every improvement to Peel makes the agents better at improving Peel.
Eight Days from Idea to Daily Driver
The git log tells the story:
January 19 (Day 1): "Start Local RAG indexing + MCP tools." SQLite schema, file scanning, line-based chunking. And the MCP server skeleton. From day one, VS Code agents could query the (terrible) early search results. This meant the agents helping me build Peel could use Peel to understand Peel.
January 19-20 (Days 1-2): System embeddings, vector search stub, UI scaffolding. Results were mediocre but the pipeline worked.
January 21 (Day 3): Core ML embeddings, context injection into agents, first draft of product manual. The "this actually works" moment.
January 26 (Day 8): AST chunking explosion. Swift via SwiftSyntax. Ruby via tree-sitter. TypeScript/Glimmer via Babel bundled for JavaScriptCore. Four commits in one day adding language-aware parsing.
January 28 (Day 10): Structured metadata extraction. Decorators, protocols, imports tracked per chunk.
January 30 (Day 12): Dependency graph. Import tracking, inheritance chains. The "what depends on this?" feature I'd been wanting.
The timeline compressed because each improvement made the next one easier. Better search meant agents found relevant code faster. Better code understanding meant agents made fewer mistakes. The tool helped build itself.
I've used it every day since. The codebase I work with has 12+ services, hundreds of thousands of lines across Swift, TypeScript, and Ruby. Before Peel, I spent significant time manually finding relevant code. Now the agents find it.
Development Workflow
I write Swift in VS Code, not Xcode. This surprises people.
Peel is a native macOS/iOS app. SwiftUI, SwiftData, strict Swift 6 concurrency. I open the .xcodeproj folder in VS Code and use Copilot agents to write code. Xcode handles building, debugging, and previews.
Why? Copilot for Xcode is progressing but still has rough edges. VS Code agent experience is smoother today. I can have an agent read MCP server output, understand what's happening, and write code that integrates correctly.
My workflow:
- VS Code: Write code with agents, use MCP tools to search and understand
- Xcode: Build, debug, run previews, profile
- Peel: The app I'm building provides tools to the agents building it
Not elegant, but fast. When Copilot for Xcode catches up, I'll consolidate. The MCP server doesn't care which IDE calls it.
Before and After
Before:
- Agent runs 5+ grep searches
- Loads 20+ files into context
- Half are irrelevant (tests, comments, string matches)
- Context at 100K+ tokens
- Agent makes educated guesses
- Code often wrong, needs multiple iterations
After:
- Agent calls RAG search once
- Gets 5-10 precise chunks
- Each chunk has metadata explaining what it is
- Context under 20K tokens
- Agent understands the structure
- Code works on first try more often
The models aren't the bottleneck. Context is. Give an AI agent structured context instead of raw grep output and it performs like a different tool.
Something I can't prove but keep noticing: the code quality is higher. Not just "it works" but polished. UX details I didn't ask for. No duplicated logic. Edge cases handled. The code built this week is hard to find fault with, and I keep looking. Maybe smaller, precise context lets the model think more clearly instead of pattern-matching through noise. Maybe I'm imagining it. But the results are there.
What still breaks: Cross-repo references. When ServiceA in one repo imports a type from ServiceB in another repo, the dependency graph loses the trail. I'm working on it. Also, the Ruby parser is too naive for metaprogramming-heavy code. Rails conventions work great. DSL-heavy gems? Not so much. Peel isn't magic. It's just better than grep.
What I'd Do Differently
Most of this worked. Some of it was luck. If I were starting over:
Start with the dependency graph. It's more valuable than I expected. Import relationships tell you more about code organization than any other signal. I built it last because I thought it was a nice-to-have. It's not.
Don't overthink chunking boundaries. My first version tried to be too clever about splitting large classes. Just keep the whole class if it's under 300 lines. Agents handle it fine.
Test search quality obsessively. The difference between "good" and "great" search results compounds. Every improvement in retrieval quality shows up immediately in agent performance.
MCP from day one. Build for integration, not standalone.
If You Want to Build This
Core technologies:
- Swift: SwiftSyntax for parsing, SQLite for storage, MLX for embeddings
- TypeScript: Babel for AST, bundled with esbuild for JavaScriptCore
- Search: Hybrid text + vector with result merging
- Integration: MCP server (JSON-RPC over HTTP)
The AST chunker for TypeScript/Glimmer is ~400 lines of JavaScript. The Swift chunker is ~600 lines. The dependency extractor is ~300 lines per language. This isn't a massive system. It's a focused tool that does one thing well.
The full Peel app has more features (agent orchestration, parallel worktrees, distributed swarm execution), but the RAG core is the foundation everything else builds on.
Bigger Picture
This is Intent-Driven Development at the infrastructure level.
Structure your code understanding with AST parsing. Tag it with semantic metadata. Build the dependency graph. Give AI agents a map instead of making them wander.
Models will keep getting better. Context windows will keep growing. The fundamental problem remains: relevance beats capacity. A focused 10K-token context outperforms a sprawling 100K-token dump.
Structure compounds. Yesterday I added HuggingFace reranking. Today the dependency graph landed. Tomorrow, who knows. Each piece makes the others more useful.
Eight days to build. Used every day since. That's the ROI on building your own tools.
Peel: Available Soon
Peel is in rapid development. I'm using it daily, shipping features by the minute, and dogfooding relentlessly. It's not ready for public release yet, but it's getting close.
When it launches, it'll be a paid app. I'm still figuring out the details, but not subscription nonsense where you lose access if you stop paying. Pay once and own it, or subscribe if you want to support ongoing development. Something fair.
I need to pay my mortgage. You need tools that work.
If you're interested in early access or want to follow development, email me: cory@crunchybananas.com. I'm building this because I needed it. Turns out I'm not the only one.
This post is part of a series on AI-augmented development. See also: Intent-Driven Development and Why We're Back.
AI Transparency: This post was written with assistance from Claude. The technical details, code examples, and architecture are from my actual implementation. Claude helped structure the narrative and tighten the prose.