Upheal · Senior AI Engineer (Contract) · Nov 2023 – May 2026
How do you ship LLM products that don't regress in production, across 100+ releases?
Led prompt engineering, evaluation and agentic systems for AI-generated clinical progress notes, the core product of an AI documentation platform for therapists that won a Best Startup Award.
Owned quality and shipped 100+ production releases. Built an LLM-as-judge eval framework on Langfuse with datasets, eval runs and trace-level debugging. Productionised RAG, prompt orchestration and agentic patterns across Gemini, Claude, GPT-* and Llama; benchmarked Vertex AI, Bedrock, Azure OpenAI and Anthropic for quality, latency and cost. Automated customer support with an agent built on the Claude Agent SDK. Reported on the AI roadmap and cost trade-offs directly to the founders.
Python · TypeScript · Langfuse · Vertex AI · Bedrock · Anthropic API · Grafana · BetterStack
