How LitFin’s reasoning brain and credit-mind score against the public 2026 leaderboards. Each card lists the dataset version, the subset we ran, and the date. We publish the raw JSON alongside each result so the numbers are fully auditable.
Honest scoring: where compute permitted only a stratified subset, we label that subset clearly and do not overstate coverage.
1. Run pnpm install.
2. Run npm run bench:all to regenerate every JSON file under Docs/benchmarks/.
3. Set LITFIN_BENCH_LIVE=1 (and optionally LITFIN_BENCH_MODEL=sonnet) for live runs. The cache makes re-runs cost-free.
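Once the JSON files exist under Docs/benchmarks/, a quick way to start auditing them is to confirm that each one parses and to list what it contains. The following is a minimal sketch, not part of the LitFin tooling: the function name summarize_benchmarks is hypothetical, and it assumes nothing about the JSON schema beyond the files sitting directly in that directory.

```python
import json
from pathlib import Path


def summarize_benchmarks(root="Docs/benchmarks"):
    """Map each JSON file name under `root` to its sorted top-level keys.

    Hypothetical helper: no schema is assumed for the benchmark files.
    """
    summary = {}
    for path in sorted(Path(root).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # json.load raises json.JSONDecodeError if a file is malformed.
            data = json.load(handle)
        # Report keys only when the top level is a JSON object.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_benchmarks().items():
        print(f"{name}: {', '.join(keys) or '(top level is not an object)'}")
```

Run it from the repository root after the benchmark files have been generated. A malformed file stops the script with a parse error, which is the intended behavior for an audit pass.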