No server. No signup. Multi-objective scoring from YAML specs. Deterministic code judges + customizable LLM judges, version-controlled in Git.
No cloud dependency. All data stays on your machine. Zero overhead to get started.
Correctness, latency, cost, and safety measured in a single evaluation run.
Deterministic code validators and customizable LLM judges, composable and extensible.
Direct LLM providers, plus agent targets: Claude Code, Codex, Pi, Copilot, and OpenCode.
Structured criteria with weights and auto-generation. Google ADK-style object rubrics.
Compare evaluation runs side-by-side with statistical deltas and regression detection.
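To make the weighted criteria and run comparison above concrete, here is a minimal TypeScript sketch. It is not AgentV's implementation: `RubricItem`, `weightedScore`, `compareRuns`, and the `tolerance` parameter are hypothetical names chosen for illustration, and the "delta" here is a plain difference of mean scores rather than whatever statistics AgentV actually reports.

```typescript
// Hypothetical data model for illustration; AgentV's real types may differ.
interface RubricItem {
  id: string;
  weight: number; // relative importance (any positive number)
  passed: boolean; // verdict from a code judge or an LLM judge
}

// Weighted score in [0, 1]: weight of passed items divided by total weight.
function weightedScore(items: RubricItem[]): number {
  const total = items.reduce((sum, item) => sum + item.weight, 0);
  if (total <= 0) return 0; // avoid dividing by zero on an empty rubric
  const earned = items.reduce(
    (sum, item) => sum + (item.passed ? item.weight : 0),
    0
  );
  return earned / total;
}

// Arithmetic mean; returns 0 for an empty list.
function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Compare two runs by mean score and flag a regression when the
// candidate drops below the baseline by more than `tolerance`.
function compareRuns(
  baseline: number[],
  candidate: number[],
  tolerance: number = 0.05
): { delta: number; regressed: boolean } {
  const delta = mean(candidate) - mean(baseline);
  return { delta, regressed: delta < -tolerance };
}

// Usage: one heavily weighted criterion passes, a minor one fails.
const score = weightedScore([
  { id: "correct", weight: 3, passed: true },
  { id: "concise", weight: 1, passed: false },
]);
console.log(score); // 0.75
console.log(compareRuns([1, 1, 0.5], [1, 0.5, 0.5]).regressed); // true
```

A fixed tolerance is the simplest possible regression rule; a real comparison would likely account for sample size and variance.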
`npm install -g agentv`

`agentv init`

Copy `.env.example` to `.env` and add your API keys.
description: Math evaluation
execution:
  target: default
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?

`agentv eval ./evals/example.yaml`

| Feature | AgentV | LangWatch | LangSmith | LangFuse |
|---|---|---|---|---|
| Setup | npm install | Cloud account + API key | Cloud account + API key | Cloud account + API key |
| Server | None (local) | Managed cloud | Managed cloud | Managed cloud |
| Privacy | All local | Cloud-hosted | Cloud-hosted | Cloud-hosted |
| CLI-first | ✓ | ✗ | Limited | Limited |
| CI/CD ready | ✓ | Requires API calls | Requires API calls | Requires API calls |
| Version control | ✓ YAML in Git | ✗ | ✗ | ✗ |
| Evaluators | Code + LLM + Custom | LLM only | LLM + Code | LLM only |
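
The "Code" evaluators in the table are deterministic checks. As a rough illustration for the `addition` test shown earlier, a code judge could look like the TypeScript sketch below. The `JudgeResult` shape and the `judgeAddition` function are assumptions made for this example, not AgentV's actual judge interface.

```typescript
// Hypothetical result shape; AgentV's real judge contract may differ.
interface JudgeResult {
  pass: boolean;
  score: number; // 1 on pass, 0 on fail
  reason: string;
}

// Deterministic check: the answer must contain the expected value
// as a standalone number (so "142" does not count as "42").
function judgeAddition(answer: string, expected: number = 42): JudgeResult {
  // Extract every integer or decimal literal from the answer text.
  const matches = answer.match(/-?\d+(?:\.\d+)?/g) || [];
  const numbers = matches.map(Number);
  const pass = numbers.some((n) => n === expected);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? `Found expected value ${expected}`
      : `Expected ${expected}, found [${numbers.join(", ")}]`,
  };
}

// Usage
console.log(judgeAddition("15 + 27 = 42").pass); // true
console.log(judgeAddition("The answer is 43.").pass); // false
```

Because the check uses no model call, it returns the same verdict on every run, which is the property that makes code judges useful as a baseline alongside LLM judges.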