Posted on 06 May 2026

Founding AI Evaluation & Trace Engineer

Remote
Permanent
DOE

Applied AI

Our client is building a platform that enables AI systems to reason over complex, high-stakes data. Think structured evidence, multi-step workflows, and systems that don’t just generate answers but justify them.

They’re hiring a Founding AI Evaluation & Trace Engineer to own how system performance is measured, understood, and improved.

This isn’t about tweaking prompts. This is about building the engine that makes AI systems get better over time.

What You'll Do

  • Own the evaluation layer for multi-step AI systems - what happened, why it happened, and how to improve it
  • Build trace infrastructure to capture reasoning, decisions, and failures across workflows
  • Design benchmarks, scoring systems, and regression tests that actually reflect real-world performance
  • Turn expert feedback into structured data (labels, rubrics, preference sets)
  • Diagnose where things break: reasoning, retrieval, data gaps, prompts, tooling
  • Create comparison tooling across models, prompts, and system versions
  • Lay the groundwork for future model improvement (distillation, fine-tuning, synthetic data)

What They Want

  • You know how to tell if an AI system is actually improving - not just sounding better
  • You can turn messy human feedback into clear evaluation frameworks
  • You’ve worked with LLMs, agents, or multi-step systems
  • You understand where AI fails, and how to isolate the cause
  • You can build, not just think (evaluation tools, pipelines, infra)

Nice To Have

  • Experience with LLM evaluation, benchmarking, or human-in-the-loop systems
  • Exposure to post-training methods (RL, distillation, etc.)
  • Background in high-stakes domains where accuracy matters

Why This Role

  • Founding-level ownership: you define how quality is measured
  • Direct impact on how the system improves, not just what it outputs
  • Work at the intersection of AI performance, product, and real-world use

What It's Not

  • Not a pure research / model training role
  • Not prompt tinkering
  • Not dashboards nobody uses