Intelligent systems & ML

Production-grade intelligent systems, not demos: LLM apps, retrieval, agents, eval harnesses, and deployment patterns that hold up for real users.

Overview

I build AI that survives real users: grounded outputs, clear failure modes, and systems you can observe and improve. That means retrieval you can trust, evals that match production traffic, and deployment patterns that don’t fall over when load or cost spikes.

Typical engagements

  • Assistants and copilots with tool use, tenancy, and audit-friendly logging.
  • RAG pipelines with versioning, chunking strategy, and re-ingestion that doesn’t break the world.
  • Evaluation harnesses: offline test sets plus live sampling, with latency and cost dashboards.
  • Partnering with security and compliance on data retention, access, and red-team-style reviews.

How I work

  • Start from the product promise and risk profile — not from the latest model name.
  • Ship thin vertical slices, then harden: observability before scale.
  • Treat prompts and tools as code: review, version, and test.

Tools & context

  • Python & TypeScript
  • OpenAI / Anthropic / open-source LLMs
  • Vector DBs & embeddings
  • Kubernetes & observability stacks

Want to talk about something in this space?

Get in touch