Document extraction & validation
I turn messy PDFs and documents into clean, validated, structured data — and build the AI systems and SaaS products around them.
I ship real products — then make the results provably correct: tested code does the math, not the model.
What I do
A finance background, an engineer's hands, and a hard rule: tested code proves the answer — the model never gets the last word on a number.
PDFs into clean, structured JSON — then every total re-derived (foot, crossfoot, articulate) and locked with golden-file tests. No OCR for born-digital text.
Multi-agent systems, RAG, and LLM pipelines built behind verification — sealed holdouts and fresh-context critics — so improvements are earned, not hallucinated.
Production B2B products end to end — auth, multi-tenant data, billing, payroll-format exports, audit trails, and dashboards. Real users, real subscriptions.
Selected work
Five projects across live SaaS, document intelligence, autonomous AI, and native C++. Read the full write-ups →
Helps tipped-industry employers track and document the “no tax on tips & overtime” deductions under the One Big Beautiful Bill Act: automatic Treasury Tipped Occupation Code assignment, FLSA overtime, W-2 Box 14 exports (ADP/Gusto/QuickBooks), an audit trail, multi-tenant role-based access, and an analytics dashboard. Subscriptions plus a free trial.
Visit the live product
An autonomous system that improves its own code behind a 7-tier verification gauntlet (parse, unit, property, mutation, benchmark, sealed holdout, fresh-context critic), so improvements are earned, not hallucinated. A panel of specialized agents (planner, coder, tester, reviewer, critic) plus meta-improvement loops, all sandboxed.
Improvements gated by a sealed holdout + fresh-context critic
Inner research loop ported from Udit Goenka’s autoresearch (MIT, based on Andrej Karpathy’s work); the meta-improvement architecture and verification stack are my own.
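A minimal sketch of how a tiered gate like this could work. The tier names come from the description above; the function names, the fixed ordering, and the stop-at-first-failure behavior are assumptions for illustration, not the project's actual code.

```python
from typing import Callable, Dict, Optional

# Tier names taken from the description; the order is assumed, cheapest checks first.
TIERS = [
    "parse",
    "unit",
    "property",
    "mutation",
    "benchmark",
    "sealed holdout",
    "fresh-context critic",
]

# Hypothetical type: each check takes a candidate change and returns pass/fail.
Check = Callable[[str], bool]


def first_failing_tier(candidate: str, checks: Dict[str, Check]) -> Optional[str]:
    """Run the tiers in order and return the first one that fails, or None."""
    for tier in TIERS:
        if not checks[tier](candidate):
            return tier
    return None


def accept(candidate: str, checks: Dict[str, Check]) -> bool:
    """A candidate improvement is kept only if every tier passes."""
    return first_failing_tier(candidate, checks) is None
```

In this sketch a change that passes six tiers but fails the sealed holdout is rejected just like one that fails to parse; there is no partial credit.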
Turns a government audit statement (PDF) into clean structured JSON, then re-derives every total to prove the extraction is correct, with a golden-file regression test. The model never does the arithmetic — tested code does.
25 checks, 0 exceptions — inject one wrong figure and it’s caught
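To illustrate the tie-out idea, here is a small sketch of footing (re-adding each column) and crossfooting (re-adding each row) over extracted figures. The statement layout, field names, and amounts are invented for illustration and are not taken from the actual project; articulation between statements is not covered.

```python
from decimal import Decimal

# Hypothetical extracted statement: every name and figure here is illustrative.
statement = {
    "columns": ["general", "special", "total"],
    "rows": {
        "taxes": ["100.00", "20.00", "120.00"],
        "fees": ["50.00", "5.00", "55.00"],
    },
    "totals": ["150.00", "25.00", "175.00"],
}


def tie_out(stmt):
    """Return a list of exception messages; an empty list means everything ties."""
    exceptions = []
    rows = {name: [Decimal(v) for v in vals] for name, vals in stmt["rows"].items()}
    totals = [Decimal(v) for v in stmt["totals"]]

    # Crossfoot: each row's detail columns must add up to its reported row total.
    for name, vals in rows.items():
        detail = sum(vals[:-1])
        if detail != vals[-1]:
            exceptions.append(f"crossfoot {name}: {detail} != {vals[-1]}")

    # Foot: each column's rows must add up to the reported column total.
    for i, col in enumerate(stmt["columns"]):
        footed = sum(vals[i] for vals in rows.values())
        if footed != totals[i]:
            exceptions.append(f"foot {col}: {footed} != {totals[i]}")

    return exceptions
```

On the figures above, `tie_out(statement)` returns an empty list. Changing a single amount, for example the general-fund taxes figure, makes both the row check and the column check for that cell report an exception. Amounts are parsed as `Decimal` rather than `float` so that comparisons are exact.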
Full-stack B2B SaaS that auto-categorizes nonprofit grant spending into 2 CFR 200 budget categories, tracks budget-to-actual per grant, and generates audit-ready compliance reports, with QuickBooks/Xero integration.
Audit-ready compliance reports, budget-to-actual per grant
A native C++20 + SDL2 Pac-Man built from scratch: co-op multiplayer, four distinct ghost AIs, procedural chiptune audio, cross-compiled to an ARM handheld, with sanitizer/coverage/strict build presets and tests. Around 3,000 lines.
About
I’m an AI & document-automation engineer with a finance background. I build document-extraction pipelines, AI systems, and full-stack SaaS — with a focus on output that is verifiable, not just plausible.
Over the past year I designed and shipped five production document-automation tools for a public-accounting firm (Millhuff-Stang, CPA), built a live tax-compliance SaaS (OBBBA Tracker), and built a self-improving multi-agent engineering harness. Earlier I built a Retrieval-Augmented Generation (RAG) document system for a law firm and a case-management web application for a legal practice.
More about me, the full stack & credentials
Writing & research
Short, sanitized pieces on verifiable AI. Read them all →
01 Why deterministic extraction + tie-out validation + golden-file tests beat trusting an LLM with numbers.
02 How AgentA keeps an autonomous improvement loop honest with a sealed holdout and a fresh-context critic.
03 Shipping a real tax-compliance SaaS: data model, payroll-format exports, and keeping a compliance product trustworthy.
Contact
I’m available for contract or full-time AI, automation, and document-intelligence work. The fastest way to reach me is email — no forms, no backend, just say hello.