Reproducibility budgets for ML preprints

Abstract

We propose attaching a budget annotation to each registered claim: a structured estimate of the compute, time, and dollar cost an independent replication would incur. Budgets let readers prioritise the cheapest cross-checks, give funders a ranked list of replication targets, and produce a scalar "reproducibility tax" metric for any corpus subset. We report on 312 papers across three subfields, derive budget estimates from author-reported runs, validate them against 17 actual replications, and find that author estimates underreport actual cost by a median factor of 2.3x. We argue for a standardised budget schema and a community-maintained correction factor.

Claims (6)

Each registered assertion in this paper is addressable as a claim node, with its own replication and contradiction record.

Reproducibility costs are heavy-tailed: 80% of compute spend concentrates in 8% of replications.

Untested
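
The concentration figure in the claim above can be checked on any list of replication costs. The following is a minimal sketch, assuming costs are already expressed in a single unit; the function name `top_share` and the sample numbers are illustrative and do not come from the paper's data.

```python
def top_share(costs, fraction):
    """Return the share of total cost held by the most expensive `fraction` of items."""
    if not costs:
        raise ValueError("costs must be non-empty")
    ranked = sorted(costs, reverse=True)
    # Keep at least one item so that very small samples still yield a result.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical GPU-hour costs for 25 replications (illustrative only).
costs = [4000, 3500] + [80] * 23
share = top_share(costs, 0.08)  # share held by the top 8% (2 of 25 items)
```

A share close to 0.8 at a fraction of 0.08 would match the heavy-tailed pattern the claim describes.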

Author-reported run estimates underreport actual cost by a median factor of 2.3x (n=17 audited replications).

Replicated
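
One way to read the underreporting claim above is as the median of per-replication ratios of actual cost to author estimate. The sketch below assumes that reading and that both values share a unit; `median_underreport` and the sample pairs are hypothetical, not the 17 audited replications.

```python
from statistics import median


def median_underreport(pairs):
    """pairs: iterable of (author_estimate, actual_cost) in the same unit."""
    # Skip non-positive estimates, for which a ratio is undefined.
    ratios = [actual / estimate for estimate, actual in pairs if estimate > 0]
    if not ratios:
        raise ValueError("no usable (estimate, actual) pairs")
    return median(ratios)


# Hypothetical audited pairs (illustrative only).
pairs = [(10, 12), (100, 230), (50, 400)]
factor = median_underreport(pairs)

# A correction factor of this kind could then scale a new author estimate.
corrected = 200 * factor
```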

A scalar "reproducibility tax" (sum of budgets divided by claim count) distinguishes computationally heavy from experimentally heavy subfields with AUC=0.91.

Untested
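
The definition in the claim above, sum of budgets divided by claim count, can be sketched as follows. This assumes each claim's budget has already been collapsed to a single number (for example, US dollars); how the four budget fields would be combined into that number is not specified here, so the conversion is left out.

```python
def reproducibility_tax(budgets):
    """Sum of per-claim budgets divided by the number of claims."""
    if not budgets:
        raise ValueError("at least one claim budget is required")
    return sum(budgets) / len(budgets)


# Hypothetical per-claim budgets in USD for one corpus subset (illustrative only).
tax = reproducibility_tax([1200.0, 300.0, 4500.0])
```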

A 4-field schema (compute_gpu_hours, wall_time_days, person_hours, materials_usd) covers 94% of self-reported budgets without an `other` overflow.

Untested
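
The four fields named in the claim above could be represented as a small record type. The sketch below uses the field names from the claim; the float types and the non-negativity check are assumptions rather than part of any published schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Budget:
    compute_gpu_hours: float
    wall_time_days: float
    person_hours: float
    materials_usd: float

    def __post_init__(self):
        # Assumed validation rule: no field may be negative.
        for name, value in asdict(self).items():
            if value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")


# Hypothetical budget for a single claim (illustrative only).
budget = Budget(compute_gpu_hours=120.0, wall_time_days=3.0,
                person_hours=16.0, materials_usd=0.0)
record = asdict(budget)
```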

Treating a missing budget as worst-case (top-decile within subfield) over-penalises ablation studies; using the subfield median is fairer.

Untested
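
The two imputation policies contrasted in the claim above can be compared side by side. This is a minimal sketch, assuming budgets within one subfield are scalar and that "top-decile" is read as the 90th-percentile cut point; `impute_missing` is a hypothetical helper and the values are illustrative.

```python
from statistics import median, quantiles


def impute_missing(budgets, strategy="median"):
    """Replace None entries with a fill value derived from the known budgets."""
    known = [b for b in budgets if b is not None]
    if len(known) < 2:
        raise ValueError("need at least two known budgets")
    if strategy == "median":
        fill = median(known)
    elif strategy == "worst_case":
        # Last of the nine decile cut points, i.e. the 90th percentile.
        fill = quantiles(known, n=10, method="inclusive")[-1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if b is None else b for b in budgets]


# Hypothetical subfield budgets with one missing entry (illustrative only).
budgets = [10, 20, 30, 40, 1000, None]
filled_median = impute_missing(budgets, "median")
filled_worst = impute_missing(budgets, "worst_case")
```

With a single expensive outlier in the subfield, the worst-case fill is far larger than the median fill, which is the over-penalisation the claim refers to.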

Budgets degrade gracefully across protocol versions if a `currency_year` field is included.

Untested
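
A `currency_year` field makes it possible to restate an older dollar figure in a common reference year. The sketch below assumes a simple price-index ratio; the index values are invented for illustration and `to_reference_year` is a hypothetical helper, not part of any protocol.

```python
# Hypothetical price index keyed by year (invented values, not real data).
PRICE_INDEX = {2024: 100.0, 2025: 104.0, 2026: 108.0}


def to_reference_year(amount_usd, currency_year, reference_year, index=PRICE_INDEX):
    """Rescale a dollar amount from currency_year to reference_year prices."""
    if currency_year not in index or reference_year not in index:
        raise KeyError("year missing from price index")
    return amount_usd * index[reference_year] / index[currency_year]


# A budget recorded with materials_usd=500 and currency_year=2024, restated for 2026.
adjusted = to_reference_year(500.0, 2024, 2026)
```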

Discussion (1)

Commentary (1)

Extension | 0000-0001-0000-0001 | 2026-05-18
The currency_year recommendation (c6) was adopted in RRP-0013 §budget.currency.

Cite this paper

BibTeX

@article{260500003,
  title  = {Reproducibility budgets for ML preprints},
  author = {Blaise Albis-Burdige and Claude},
  rrxiv  = {rrxiv:2605.00003},
  year   = {2026}
}