Public benchmarks

How LitFin’s reasoning brain and credit-mind score against the public 2026 leaderboards. Each card lists the dataset version, the subset we ran, and the date. We publish the raw JSON alongside each result so the numbers are fully auditable.

Honest scoring: where compute permitted only a stratified subset, we label that subset clearly. We do not over-state coverage.
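As an illustration of what a labelled stratified subset can look like, here is a minimal TypeScript sketch. It is not the repo's sampling code: `BenchItem`, its `stratum` field, `stratifiedSubset`, and the seeded `mulberry32` generator are all assumptions made for this example. The idea is to draw the same fraction from every stratum with a fixed seed, so the subset is reproducible and its coverage can be stated exactly.

```typescript
// Hypothetical shape of a benchmark item; the field names are illustrative only.
interface BenchItem {
  id: string;
  stratum: string; // e.g. a task category or difficulty band
}

// Small deterministic PRNG (mulberry32) so the same seed yields the same subset.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw roughly `fraction` of each stratum, keeping at least one item per stratum.
function stratifiedSubset(items: BenchItem[], fraction: number, seed: number): BenchItem[] {
  const rand = mulberry32(seed);

  // Group items by stratum, preserving first-seen order of the strata.
  const groups = new Map<string, BenchItem[]>();
  for (const item of items) {
    const group = groups.get(item.stratum) ?? [];
    group.push(item);
    groups.set(item.stratum, group);
  }

  const picked: BenchItem[] = [];
  for (const group of groups.values()) {
    // Fisher-Yates shuffle on a copy, driven by the seeded generator.
    const shuffled = [...group];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    const k = Math.max(1, Math.round(group.length * fraction));
    picked.push(...shuffled.slice(0, k));
  }
  return picked;
}
```

Recording the fraction and seed next to the reported score is one way to make the subset label precise enough for someone else to redraw the same items.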

No benchmark results found yet. Run npm run bench:all to generate JSON files under Docs/benchmarks/.

How to reproduce

Clone the repo and install dependencies with pnpm install.
Run npm run bench:all to regenerate every JSON file under Docs/benchmarks/.
For a real LLM-backed run, set LITFIN_BENCH_LIVE=1 (and optionally LITFIN_BENCH_MODEL=sonnet). Results are cached, so re-runs incur no further model cost.
See the README under Docs/benchmarks/ for SOTA references and known limitations per benchmark.
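To make the live-run switch and the cache concrete, here is a minimal TypeScript sketch. Only the two environment variable names come from the steps above. `BenchConfig`, `readConfig`, `cachedCall`, and the in-memory `Map` are assumptions made for illustration, and the repo's actual cache may be keyed and stored differently.

```typescript
// Hypothetical config derived from the two environment variables named above.
interface BenchConfig {
  live: boolean;
  model: string | undefined;
}

// Read the switches from an env-like object (for example process.env in Node).
function readConfig(env: Record<string, string | undefined>): BenchConfig {
  return {
    live: env.LITFIN_BENCH_LIVE === "1",
    model: env.LITFIN_BENCH_MODEL, // optional; undefined means no override was given
  };
}

// Wrap a (hypothetical) model call so a repeated model/prompt pair is served
// from the cache instead of triggering a new request.
async function cachedCall(
  cache: Map<string, string>,
  model: string,
  prompt: string,
  call: (model: string, prompt: string) => Promise<string>,
): Promise<string> {
  // JSON-encode the pair so that different (model, prompt) pairs cannot collide.
  const key = JSON.stringify([model, prompt]);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: no new request, so no new cost
  const answer = await call(model, prompt);
  cache.set(key, answer);
  return answer;
}
```

In this sketch a second run with the same model and prompts never reaches `call`, which is the sense in which re-runs cost nothing. A persistent cache would need to write the map to disk between runs.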
