Pipeline
Configure how models work together on your tasks
Enter a prompt to see costs
Cost comparison
Baseline:
Type a prompt below to see cost estimates
Quality estimate
Pipeline quality: ~109% vs Opus
Range: 82–135% · medium confidence
Task mix: router prior. Benchmarks by task type: prompt = arena_hard / alpaca_eval_2 (lineage); agent = SWE-bench Verified (lineage); reasoning = AIME 2025 / GPQA Diamond (exact).
Baseline quality: ~100% for Opus 4