# Phoenix Evaluations: LLM Quality Metrics
## Objective

Implement automated Phoenix evaluations that score every agent LLM response on the quality metrics below.
## Metrics to Track
- **Hallucination Detection** - Check responses against source documents
- **Reasoning Quality** - Evaluate logical consistency
- **Decision Confidence Calibration** - Compare predicted confidence to actual outcomes
- **Response Relevance** - Measure alignment with user intent (see the type sketch after this list)
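A minimal sketch of how these metrics and their results could be typed on the TypeScript side. Every name below (`BuiltInMetric`, `EvaluationResult`, `LlmCallRecord`, the score and label fields) is an assumption made for illustration, not part of Phoenix's own API.

```ts
// Hypothetical shared types for the evaluator; the names below are assumptions
// made for this sketch, not part of Phoenix's own API surface.

/** The four built-in quality metrics this task tracks. */
export type BuiltInMetric =
  | "hallucination"
  | "reasoning_quality"
  | "confidence_calibration"
  | "response_relevance";

/** One evaluation outcome for a single LLM response. */
export interface EvaluationResult {
  spanId: string;       // trace span of the evaluated LLM call
  metric: string;       // usually a BuiltInMetric, but custom names are allowed
  score: number;        // normalized score in [0, 1]
  label?: string;       // e.g. "factual" vs. "hallucinated"
  explanation?: string; // judge rationale, surfaced in the Phoenix UI
}

/** The data each evaluator receives about the LLM call under test. */
export interface LlmCallRecord {
  spanId: string;
  prompt: string;
  response: string;
  sourceDocuments?: string[]; // retrieved context, needed by the hallucination check
  statedConfidence?: number;  // agent-reported confidence, needed for calibration
}
```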
## Implementation
- Create a `PhoenixEvaluator` class in `src/telemetry/phoenix-evaluator.ts`
- Use the Phoenix Evaluations API to score responses
- Run evaluations asynchronously after each LLM call
- Store evaluation results in Phoenix (see the class sketch after this list)
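A minimal sketch of what `PhoenixEvaluator` could look like under these constraints, assuming the types from the previous sketch and an injected `EvalClient` whose `logEvaluation` method stands in for whichever Phoenix client call the project adopts; none of these names come from the Phoenix SDK itself.

```ts
// Hypothetical module layout; the types come from the sketch in "Metrics to Track".
import type { EvaluationResult, LlmCallRecord } from "./phoenix-eval-types";

/** Placeholder for the real Phoenix client: anything that can persist one result. */
export interface EvalClient {
  logEvaluation(result: EvaluationResult): Promise<void>;
}

/** One evaluation function: scores a single LLM call on a single metric. */
export type EvaluatorFn = (record: LlmCallRecord) => Promise<EvaluationResult>;

export class PhoenixEvaluator {
  private readonly evaluators = new Map<string, EvaluatorFn>();

  constructor(private readonly client: EvalClient) {}

  /** Register a built-in or custom evaluation function under a metric name. */
  register(metric: string, fn: EvaluatorFn): void {
    this.evaluators.set(metric, fn);
  }

  /**
   * Fire-and-forget entry point, called right after each LLM call returns.
   * All registered evaluators run concurrently and their results are logged
   * to Phoenix; failures are reported but never block the agent.
   */
  evaluateAsync(record: LlmCallRecord): void {
    void Promise.allSettled(
      [...this.evaluators.values()].map(async (evaluate) => {
        const result = await evaluate(record);
        await this.client.logEvaluation(result);
      }),
    ).then((outcomes) => {
      for (const outcome of outcomes) {
        if (outcome.status === "rejected") {
          console.error("Phoenix evaluation failed", outcome.reason);
        }
      }
    });
  }
}
```

`Promise.allSettled` keeps one failing evaluator from suppressing the others, and the fire-and-forget entry point keeps evaluation work off the agent's response path.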
## Acceptance Criteria
- All LLM responses automatically evaluated
- Evaluations visible in Phoenix UI
- Evaluation latency <500ms
- Support custom evaluation functions (see the usage example below)
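A hypothetical usage example showing how the custom-evaluator hook from the sketches above could be wired in; `scoreRelevanceWithJudge`, `phoenixClient`, and the import paths are placeholders, not existing project code.

```ts
// Hypothetical wiring; PhoenixEvaluator, EvalClient, and the import paths are
// placeholders based on the sketches above, not existing project modules.
import { PhoenixEvaluator, type EvalClient } from "./phoenix-evaluator";

// Stand-ins for pieces the project would supply: a Phoenix-backed client and
// an LLM-as-judge scoring call returning a value in [0, 1].
declare const phoenixClient: EvalClient;
declare function scoreRelevanceWithJudge(prompt: string, response: string): Promise<number>;

const evaluator = new PhoenixEvaluator(phoenixClient);

// A custom evaluation function registered alongside the built-in ones.
evaluator.register("response_relevance", async (record) => ({
  spanId: record.spanId,
  metric: "response_relevance",
  score: await scoreRelevanceWithJudge(record.prompt, record.response),
  explanation: "LLM-as-judge relevance score",
}));

// Called from the LLM wrapper immediately after each completion returns.
export function afterLlmCall(
  spanId: string,
  prompt: string,
  response: string,
  sourceDocuments: string[],
): void {
  evaluator.evaluateAsync({ spanId, prompt, response, sourceDocuments });
}
```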