May 26, 2026
Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at rrxiv.com/papers/rrxiv:2605.00008.
A preprint's claims are not a homogeneous block; they age, replicate, and fail at different rates. We argue that the natural unit of replication is the individual claim, and we encode that argument operationally: every numbered claim below carries a structured active_replication annotation naming the replicating team, the start timestamp, and an expected-completion date. In an instrumentation run of the rrxiv reference instance (preprint–replication pairs logged over 14 months), pre-registering a replication target on a claim shifts median completion forward by approximately six weeks relative to a matched unregistered baseline. The paper is therefore both a substantive measurement of registration's effect on replication latency and the canonical worked example of the active-replication pattern: it cites its own annotations as evidence.
Most replication infrastructure treats a preprint as the unit of replication: someone announces that they will “replicate the paper”, usually on a personal homepage or in a tweet, sometimes a year after publication, and there is no canonical place to find that announcement again. This framing has two costs. First, replication completion times are bimodal and very long-tailed: a non-trivial fraction of announced replications never resolve, and a funder cannot distinguish “in progress” from “abandoned” without emailing the team. Second, the whole-paper framing hides which claim is actually under test. A paper with eight empirical claims, of which only the headline result has been widely replicated, is not the same epistemic object as one whose eight claims have each been independently tested; current bibliographic infrastructure cannot tell the two apart.
The rrxiv protocol [rrxiv-whitepaper] pushes the unit of replication down one level: each claim has a stable identifier (<paper_id>:claim:<label>), and annotations attach to that identifier, not to the paper. This paper exercises that affordance. We define a small annotation schema, active_replication, with six fields — target claim, replication team, start timestamp, expected completion date, methodology summary, and code repository — and we post one such annotation per claim in this paper before publication. The annotations are visible in the CIR sidecar, queryable from the rrxiv API, and, critically, dated: a third party can compute, at any later moment, whether the registered expected-completion date has slipped and by how much.
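The following is a minimal sketch of that third-party check. It computes the slip of a registration from its expected_completion_at field alone. The function name, the use of whole calendar days, and the convention that an on-time or not-yet-due registration has zero slip are assumptions made for illustration. None of them is part of the schema.

```python
from datetime import date
from typing import Optional


def slip_days(expected_completion_at: str, as_of: date,
              completed_on: Optional[date] = None) -> int:
    """Whole days by which a registration has slipped past expected_completion_at.

    If the replication has completed, slip is measured at the completion date;
    otherwise it is measured at `as_of` (e.g. today). Early or not-yet-due
    registrations report 0. (Illustrative convention, not a schema rule.)
    """
    expected = date.fromisoformat(expected_completion_at)
    reference = completed_on if completed_on is not None else as_of
    return max(0, (reference - expected).days)


# claim:c1 is registered with expected_completion_at = "2026-07-15".
print(slip_days("2026-07-15", as_of=date(2026, 8, 1)))  # 17
```

Because the check needs only the registration record and a date, any reader of the registry can run it without the replicating team's cooperation.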
The substantive contribution is an estimate of the registration effect itself. Across replication attempts logged in the rrxiv reference instance between 2025-03 and 2026-05, claims with a pre-registered active_replication annotation reached the “replication_complete” state a median of 41 days earlier than matched unregistered attempts on comparable claims. We report the estimate, the matching procedure, the residual confounds, and the open question of what happens when a registered team disappears.
Roadmap. Section sec:background situates the pattern relative to existing claim-graph and preprint infrastructure. Section sec:approach describes the annotation schema, the matching procedure, and the measurement window. Section sec:claims states each of the seven registered claims and reproduces two of their active_replication annotations inline as existence proofs of the pattern. Section sec:discussion considers what registration does and does not buy, including the open question of abandonment risk.
This work sits at the intersection of three threads. The claim-graph thread [rrxiv:2605.00002] argues that the unit of citation, replication, and contradiction should be the individual claim, not the paper. The reproducibility-budget thread [rrxiv:2605.00003] attaches per-paper compute/data signals so that “reproducible” becomes a measurable annotation rather than a binary label. The replication-latency thread, with which this paper engages directly, has historically been studied in coarse aggregate [camerer-2018,errington-2021]: how many announced replications resolve, on what timescale, and with what concordance to the original. The contribution here is to instrument the latency question at the claim-annotation layer, where the relevant signals (registration, methodology, code link) are already structured.
Pre-registration has a deep literature in psychology and clinical trials and is widely credited with reducing publication bias and selective reporting [nosek-2018]. Pre-registering a replication is rarer, partly because there has historically been no canonical surface for the announcement to attach to. The active-replication annotation pattern fills exactly that gap: it provides the surface, with timestamps, on the claim itself.
A note on scope. We are not arguing that registration causes faster science — only that it shifts median completion forward in a regime where the alternative is an unsurfaced personal commitment. The mechanism is mundane: a registered date is visible to collaborators, to funders, and to the team themselves, and slips are observable. The size of the effect, and its dependence on team composition and topic, is what this paper measures.
\subsection{The active_replication annotation}
The annotation is a JSON document attached to a claim ID. Its schema is intentionally minimal:
{
  "kind": "active_replication",
  "target_claim": "rrxiv:2605.00004:claim:c2",
  "replication_team": ["orcid:0000-0002-1825-0097", ...],
  "started_at": "2026-04-18T00:00:00Z",
  "expected_completion_at": "2026-09-30",
  "methodology_summary": "Resample n=200 ML preprints, re-run the shrinkage estimator from sec 3 against held-out 2025-Q4 data.",
  "code_repo": "https://github.com/rrxiv-replicators/c2-shrinkage"
}
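A consumer of the registry would plausibly check incoming documents against this shape before trusting their timestamps. The sketch below shows one way to do that in plain Python. It assumes that the six named fields plus the kind discriminator are all required, and that started_at is an ISO 8601 timestamp while expected_completion_at is a calendar date, as in the example. The function name and error messages are hypothetical; they are not part of the rrxiv API.

```python
from datetime import date, datetime

# The six schema fields plus the "kind" discriminator, with their expected JSON types.
REQUIRED_FIELDS = {
    "kind": str,
    "target_claim": str,
    "replication_team": list,
    "started_at": str,
    "expected_completion_at": str,
    "methodology_summary": str,
    "code_repo": str,
}


def validate_active_replication(doc: dict) -> list:
    """Return a list of problems; an empty list means the document looks well-formed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"{field} should be a {expected_type.__name__}")
    if problems:
        return problems
    if doc["kind"] != "active_replication":
        problems.append("kind must be 'active_replication'")
    if doc["target_claim"].count(":claim:") != 1:
        problems.append("target_claim should look like <paper_id>:claim:<label>")
    if not doc["replication_team"]:
        problems.append("replication_team must name at least one member")
    try:
        # fromisoformat() accepts a trailing "Z" only from Python 3.11 on, so normalise it.
        started = datetime.fromisoformat(doc["started_at"].replace("Z", "+00:00"))
        expected = date.fromisoformat(doc["expected_completion_at"])
    except ValueError as exc:
        problems.append(f"bad timestamp: {exc}")
        return problems
    if expected < started.date():
        problems.append("expected_completion_at precedes started_at")
    return problems
```

Returning a list rather than raising on the first error lets a registry report every problem with a submission in one pass.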
The annotation is posted via the rrxiv API and surfaces both in the paper's CIR sidecar and in the per-claim view. The two timestamp fields — started_at and expected_completion_at — are the load-bearing ones for the measurement reported in claim claim:c7. A replication is considered “complete” when the team posts a follow-up annotation of kind replication_result (success, partial, or contradicted) on the same target claim.
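Under that definition, the completion latency for a single claim can be read straight off its annotations. The sketch assumes that each replication_result carries a posting timestamp and an outcome field. The names posted_at and outcome are hypothetical, because the text names the result kind and its three outcomes but not the field layout. Timestamps must carry a UTC offset (or a trailing Z) so that they compare consistently.

```python
from datetime import datetime
from typing import Optional

VALID_OUTCOMES = {"success", "partial", "contradicted"}


def _ts(value: str) -> datetime:
    # Accept the trailing "Z" used in the examples on any Python 3 version.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def completion_latency_days(annotations: list) -> Optional[float]:
    """Days from started_at to the first replication_result on the same claim.

    Returns None while the replication is still open (no result posted yet).
    """
    registrations = [a for a in annotations if a["kind"] == "active_replication"]
    if not registrations:
        return None
    reg = min(registrations, key=lambda a: _ts(a["started_at"]))
    results = [
        a for a in annotations
        if a["kind"] == "replication_result"
        and a["target_claim"] == reg["target_claim"]
        and a["outcome"] in VALID_OUTCOMES
    ]
    if not results:
        return None
    first = min(results, key=lambda a: _ts(a["posted_at"]))
    return (_ts(first["posted_at"]) - _ts(reg["started_at"])).total_seconds() / 86400


anns = [
    {"kind": "active_replication", "target_claim": "rrxiv:2605.00008:claim:c1",
     "started_at": "2026-05-01T09:00:00Z"},
    {"kind": "replication_result", "target_claim": "rrxiv:2605.00008:claim:c1",
     "outcome": "success", "posted_at": "2026-06-10T09:00:00Z"},
]
print(completion_latency_days(anns))  # 40.0
```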
For the estimate underlying claim claim:c7, we observe replication attempts in the rrxiv reference instance over 14 months. Of these, 184 carried an active_replication annotation before the work began and 128 did not (the latter were detected after the fact by parsing follow-up replication_result annotations and back-dating). We match each unregistered replication to a registered one on (a) the topic tags of the target claim, (b) the claim's evidence type, and (c) the number of inbound depends_on edges on the target claim. Median completion time is then compared on the matched cohort.
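The matching step can be sketched as exact 1:1 matching without replacement on the three covariates named above. The following are all assumptions for illustration: the field names (topic_tags, evidence_type, inbound_depends_on, days_to_complete), the greedy pairing order, and the choice to drop unmatched attempts. The text does not say how ties or unmatched attempts were handled.

```python
from collections import defaultdict
from statistics import median


def match_key(attempt: dict) -> tuple:
    # Covariates from the text: (a) topic tags of the target claim, (b) its evidence
    # type, (c) the number of inbound depends_on edges on the target claim.
    return (frozenset(attempt["topic_tags"]), attempt["evidence_type"],
            attempt["inbound_depends_on"])


def matched_median_delta(registered: list, unregistered: list) -> tuple:
    """Exact 1:1 matching without replacement; returns (median delta in days, pairs).

    delta = median(unregistered days) - median(registered days), so a positive
    value means registered attempts finished earlier.
    """
    pool = defaultdict(list)
    for r in registered:
        pool[match_key(r)].append(r)
    reg_days, unreg_days = [], []
    for u in unregistered:
        candidates = pool.get(match_key(u))
        if candidates:  # None or an exhausted list means no match: drop u
            r = candidates.pop()
            reg_days.append(r["days_to_complete"])
            unreg_days.append(u["days_to_complete"])
    if not reg_days:
        raise ValueError("no matched pairs")
    return median(unreg_days) - median(reg_days), len(reg_days)
```

Reporting the number of matched pairs alongside the delta shows how much of the unregistered cohort survived matching.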
The matched cohort gives a median completion time 41 days shorter for registered attempts, with a bootstrap 95\% interval of days. The headline figure “six weeks” rounds this point estimate. We do not claim a causal effect; the natural confounder is self-selection (teams who register may be more organised in general). Section sec:discussion addresses this directly.
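A percentile bootstrap is one standard way to attach an interval to a difference of medians. The sketch below resamples matched pairs together, so each replicate keeps the pairing produced by the matching step. The resample count, the seed, and pair-level (rather than per-cohort) resampling are assumptions. They are not details reported in this paper.

```python
import random
from statistics import median


def bootstrap_median_delta_ci(reg_days, unreg_days, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for median(unreg) - median(reg) over matched pairs."""
    if len(reg_days) != len(unreg_days) or not reg_days:
        raise ValueError("expects equal-length, non-empty matched samples")
    rng = random.Random(seed)
    n = len(reg_days)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        deltas.append(median(unreg_days[i] for i in idx) - median(reg_days[i] for i in idx))
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```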
Each of the seven claims in section sec:claims below has an active_replication annotation posted at submission time. Table tab:registry summarises the registry. Concrete annotations are shown alongside claims claim:c1 and claim:c5 as worked examples; the others follow the same shape.
\begin{table}
\begin{tabular}{@{}llll@{}}
Claim & Replicating team (handle) & Started & Expected completion \\
claim:c1$^{*}$ & title-length-group@tuebingen & 2026-05-01 & 2026-07-15 \\
claim:c2 & search-ctr-coop & 2026-05-04 & 2026-08-20 \\
claim:c3 & citation-network-lab@ut & 2026-05-10 & 2026-10-01 \\
claim:c4 & ir-eval-collective & 2026-05-12 & 2026-08-30 \\
claim:c5 & repro-budget-rereaders & 2026-05-15 & 2026-09-10 \\
claim:c6 & orcid-coverage-audit & 2026-05-18 & 2026-11-01 \\
claim:c7 & rrxiv-instance-internal & 2026-05-20 & 2026-12-31 \\
\end{tabular}
\caption{The seven claims of this paper, each registered for active replication at submission. $^{*}$claim:c1 already carries “Replication status: replicated” from an earlier independent attempt; the entry above registers a second independent replication on the T\"ubingen geo mirror, which is the pattern to follow when a corpus wants more than one cross-check on a load-bearing finding.}
\label{tab:registry}
\end{table}
This is the “double duty” framing: claim:c7 is the substantive empirical claim about the registration effect, and the annotations on claim:c1 through claim:c7 are themselves instances of the pattern that claim:c7 measures. In particular, the instance's future state will be testable against claim:c7's predicted six-week shift.
claim:c1. Preprint titles longer than 12 words receive 18\% less cross-domain attention (median, n=4{,}800 papers).
Replication status: replicated.
The signal here is robust because the title-length feature is cheap to extract and the outcome (cross-domain reads, defined as a read by a user whose declared primary topic differs from the paper's primary topic) is logged at the rrxiv access tier. The 12-word threshold is not magic; it is the breakpoint at which a regression discontinuity emerges in the access log. Below 12 words, cross-domain attention scales roughly linearly with abstract specificity; above 12 words, an additional title word predicts a 1.5\% drop in cross-domain reads on average.
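The text does not say how the breakpoint was located. One simple approach, assumed here purely for illustration, is a grid search over candidate thresholds: fit separate least-squares lines on each side of each candidate and keep the threshold with the smallest total squared error. This is a two-segment scan, not a formal regression-discontinuity estimator. The inputs (title word counts and a per-paper cross-domain read measure) are hypothetical.

```python
def _sse_linear(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_breakpoint(title_words, cross_domain_reads, candidates, min_side=2):
    """Threshold t minimising the SSE of separate linear fits for x < t and x >= t."""
    best_t, best_sse = None, float("inf")
    for t in candidates:
        left = [(x, y) for x, y in zip(title_words, cross_domain_reads) if x < t]
        right = [(x, y) for x, y in zip(title_words, cross_domain_reads) if x >= t]
        if len(left) < min_side or len(right) < min_side:
            continue  # too few points on one side to fit a line meaningfully
        sse = _sse_linear(*zip(*left)) + _sse_linear(*zip(*right))
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t
```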
The annotation registered against this claim, as posted on 2026-05-01, is:
{
  "kind": "active_replication",
  "target_claim": "rrxiv:2605.00008:claim:c1",
  "replication_team": ["orcid:0000-0001-7821-3344"],
  "started_at": "2026-05-01T09:00:00Z",
  "expected_completion_at": "2026-07-15",
  "methodology_summary": "Re-extract title-length feature on the Tubingen mirror of the rrxiv access log (different geo distribution); re-fit the discontinuity at w=12.",
  "code_repo": "https://github.com/title-length-group/c1-redo"
}
claim:c2. Adding a structured abstract correlates with 22\% higher click-through from search results.
Replication status: untested.
This claim depends on the title-length result — both regress an attention outcome on a surface feature of the paper, and the title-length cohort is used as a control variable when estimating the structured-abstract effect. We list a depends_on edge accordingly. The 22\% figure is from a within-paper A/B (papers that added a structured abstract in a v2 revision, compared against their own v1), and is robust to dropping the bottom decile of papers by access count.
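A within-paper comparison of this kind reduces to a per-paper ratio. The sketch assumes hypothetical per-paper fields ctr_v1, ctr_v2, and access_count, and summarises the result with the median relative lift. The text reports the 22\% figure without saying how per-paper lifts were aggregated, so the median is an assumption. The optional flag reproduces the bottom-decile robustness check.

```python
from statistics import median


def structured_abstract_lift(papers, drop_bottom_decile=False):
    """Median within-paper relative change in search click-through, v2 vs its own v1.

    Returns e.g. 0.22 for +22%. Papers with zero v1 click-through are skipped
    because their ratio is undefined.
    """
    rows = [p for p in papers if p["ctr_v1"] > 0]
    if drop_bottom_decile:
        rows = sorted(rows, key=lambda p: p["access_count"])
        rows = rows[len(rows) // 10:]  # discard the lowest-traffic 10% of papers
    return median(p["ctr_v2"] / p["ctr_v1"] - 1 for p in rows)
```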
claim:c3. Domain experts cite within their own subfield 4x more often than across domains.
Replication status: untested.
The 4x figure is the ratio of within-subfield citations to expected cross-domain citations under a topic-uniform null. It depends on the discoverability story: the same titling and structured-abstract decisions that suppress cross-domain reads (claims claim:c1 and claim:c2) plausibly drive the in-subfield concentration of citations downstream. The replication will independently measure citations on the topic-tagged citation graph and is expected to confirm or contradict the 4x figure by 2026-10.
claim:c4. Section-level retrieval beats whole-paper retrieval on recall@5 for narrow technical queries.
Replication status: untested.
This is the cleanest IR claim in the bundle. Section embeddings are computed off the parsed CIR (one embedding per section); whole-paper embeddings are computed off concatenated text. The recall@5 gap is largest for queries phrased as a single dense technical term (e.g.\ “Krippendorff alpha” or “Cramer–Rao bound”); for diffuse multi-concept queries, the gap closes. The independent replication will run on a different embedding model (the original used rrxiv-embed-v3; the replicators will use the public instructor-xl) to test for model dependence.
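The comparison can be sketched with a toy scorer. The sketch assumes that section-level retrieval scores a paper by its best-matching section (maximum cosine similarity) and that whole-paper retrieval uses one vector per paper. The aggregation rule and the function names are assumptions. Any embedding model (rrxiv-embed-v3, instructor-xl, or another) would supply the vectors, which must be non-zero.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_papers(query, paper_vectors, k=5):
    """Top-k paper ids. paper_vectors maps paper id -> list of embedding vectors.

    Whole-paper retrieval passes one vector per paper; section-level retrieval
    passes one vector per section, and a paper scores as its best section.
    """
    scores = {pid: max(cosine(query, v) for v in vecs) for pid, vecs in paper_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of relevant paper ids that appear in the retrieved top-k list."""
    return len(set(retrieved) & set(relevant)) / len(relevant)
```

The max-over-sections rule is what lets a single dense passage win for a narrow query even when the rest of the paper dilutes a whole-paper embedding.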
claim:c5. The reproducibility-budget signal is stable across three independent reannotation rounds (Krippendorff's alpha = 0.79).
Replication status: untested.
This claim sits on top of the reproducibility-budget construct introduced in rrxiv:2605.00003; inter-annotator agreement is the kind of signal-stability check that determines whether the budget is usable as a feature in downstream models. An alpha of 0.79 falls at the high end of the “substantial” band (0.61–0.80) of the Landis–Koch scale commonly used to read agreement coefficients, just below the 0.80 level that Krippendorff recommends for firm conclusions, and it is robust to dropping the most experienced annotator from each round. The annotation registered against this claim, posted on 2026-05-15, is:
{
  "kind": "active_replication",
  "target_claim": "rrxiv:2605.00008:claim:c5",
  "replication_team": ["orcid:0000-0003-1122-4455", "orcid:0000-0002-9988-1010"],
  "started_at": "2026-05-15T00:00:00Z",
  "expected_completion_at": "2026-09-10",
  "methodology_summary": "Recruit 4 new annotators (no overlap with original 3); re-run the budget-tagging protocol on the same 120-paper sample; recompute alpha.",
  "code_repo": "https://github.com/repro-budget-rereaders/c5"
}
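The replicators' final step, recomputing alpha, can be sketched with the standard coincidence-matrix formulation of Krippendorff's alpha for nominal data. Treating the budget tag as nominal is an assumption, because the text does not say whether the tag is nominal or ordinal. An ordinal tag would need a different distance function. The input layout (one list of labels per paper) is also hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` has one entry per coded item (e.g. per paper); each entry lists the
    labels assigned by whichever annotators coded that item. Missing codes are
    simply omitted, and items with fewer than two labels are not pairable.
    """
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels from different annotators, weighted 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())
    if n == 0:
        raise ValueError("no pairable units")
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return 1.0  # one category throughout: alpha is undefined; 1.0 by convention here
    return 1 - observed / expected
```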
claim:c6. Author ORCID coverage above 70\% is necessary (but not sufficient) for accurate cross-paper deduplication.
Replication status: untested.
The 70\% threshold is empirical: below it, name-collision rates dominate the dedup error budget; above it, ORCID-anchored matching plateaus and the residual error comes from name-only entries and from authors who hold multiple ORCIDs. The “necessary but not sufficient” qualifier matters: a paper with 100\% ORCID coverage still inherits its co-authors' lower-coverage records, so dedup quality is bounded by the worst neighbour.
claim:c7. Pre-registering a replication target shifts the median completion time forward by 6 weeks vs unregistered replications.
Replication status: untested.
This is the substantive claim of the paper. The point estimate (41 days, rounded to six weeks) is derived from the matched cohort described in section sec:approach; the bootstrap 95\% interval is days. It depends on claim claim:c5's annotator-stability finding because the matching procedure relies on the reproducibility-budget tag as one of the matching covariates, and a wobbly tag would inflate the variance of the estimate. The internal replication rrxiv-instance-internal will redo the analysis with a different matching radius (loosening the topic-tag constraint from exact to one-step-removed in the topic taxonomy) and report by 2026-12-31.
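The loosened matching radius can be sketched as a predicate on topic-tag sets. Three interpretive assumptions apply. First, “one-step-removed” is read as a direct parent/child edge in the topic taxonomy, so siblings, which are two edges apart, do not match. Second, the taxonomy is represented as a hypothetical child-to-parent map. Third, every tag on each side must have a counterpart on the other.

```python
def within_one_step(a: str, b: str, parent: dict) -> bool:
    """True if two topic tags are identical or joined by a single parent/child edge."""
    return a == b or parent.get(a) == b or parent.get(b) == a


def topic_tags_match(tags_a, tags_b, parent: dict, loosened: bool = False) -> bool:
    """Exact mode compares tag sets; loosened mode tolerates one-edge differences."""
    if not loosened:
        return set(tags_a) == set(tags_b)

    def covered(tag, others):
        return any(within_one_step(tag, other, parent) for other in others)

    return (all(covered(a, tags_b) for a in tags_a)
            and all(covered(b, tags_a) for b in tags_b))


# Hypothetical fragment of a topic taxonomy: child -> parent.
TAXONOMY = {"ir.dense": "ir", "ir.sparse": "ir", "ir": "cs"}
```

In a matching step, this predicate would replace exact equality on the topic-tag covariate, leaving evidence type and inbound depends_on count unchanged.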
The proximate mechanism is unsurprising: a registered date is observable, and slips against it create social pressure on the team. The deeper question is whether the effect would survive in a world where all replications were registered (so the social-pressure signal is no longer differential). We expect attenuation but not disappearance; the registration also helps the team itself plan around the date.
rrxiv:2605.00008:claim:c7{rrxiv:2605.00004:claim:c1}
The six-week shift is large enough to matter operationally. For a funder running a replication-funding tranche, the difference between “median completion in 4 months” and “median completion in 5.5 months” is the difference between budget cycles. For a journal editor deciding whether to wait on a replication before issuing a correction, the same shift can determine whether the correction is in this volume or the next. The shift is not, however, large enough to wave away the selection-confound: teams that register may simply be the teams that finish.
What the registration buys with high confidence, and where the selection-confound bites less, is visibility. Whether or not registered replications are faster on average, they are findable: the rrxiv API exposes a stable endpoint /claims/<id>/active_replications, and a funder can query it directly. This converts the question “what is being replicated right now?” from a literature-search problem into a database query. The instance-internal replication of claim claim:c7 (Table tab:registry, row 7) will, by design, test whether the visibility effect is separable from the latency effect.
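A funder-side query against that endpoint might look like the sketch below, which uses only the Python standard library. The API host (API_BASE) is a hypothetical placeholder, because the text gives the endpoint path but not the base URL. The response is assumed to be a JSON array of active_replication annotations.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL: the text names the endpoint path but not the API host.
API_BASE = "https://rrxiv.com/api"


def active_replications_url(claim_id: str, base: str = API_BASE) -> str:
    # Claim IDs contain colons, which are legal in a URL path segment.
    return f"{base}/claims/{quote(claim_id, safe=':')}/active_replications"


def fetch_active_replications(claim_id: str, base: str = API_BASE,
                              timeout: float = 10.0) -> list:
    """GET the per-claim registry endpoint and decode its JSON body."""
    with urlopen(active_replications_url(claim_id, base), timeout=timeout) as resp:
        return json.load(resp)
```

Iterating over every claim ID in a funding portfolio then turns “what is being replicated right now?” into a loop of such calls.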
A scope note. We are not arguing that every claim should be replicated, or that the volume of replication is the bottleneck. Many claims do not warrant the cost. The active_replication annotation is a coordination affordance, not a mandate; it makes replication legible when someone chooses to do it.
We do not address (a) the replication-quality question — a registered, on-time replication can still be methodologically weak; (b) the question of who is allowed to mark a replication “complete” — the current convention is the registering team's self-report, which is auditable but not adjudicated; or (c) the meta-replication of claim claim:c7 itself across different rrxiv instances. The third is on the roadmap once a second instance is live.
The annotation schema currently has no built-in handling for the case where a registering team disappears mid-effort: PhD students graduate, labs close, individuals leave the field. Naive handling — letting the expected_completion_at field silently lapse — produces a “ghost replication” state that looks the same on the registry as a slow-but-progressing one. We sketch three candidate mechanisms: (i) heartbeat annotations posted at a configurable cadence, with the registration auto-expiring after missed heartbeats; (ii) a third-party “takeover” annotation that explicitly transfers the registration to a new team; (iii) instance-level garbage collection that flags any registration whose expected_completion_at has slipped by more than the original window. The right answer is probably some combination, but the design space is open. This question is, itself, a candidate for active replication once the schema is settled.
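Mechanisms (i) and (iii) can be combined into a single state function, sketched below. The heartbeat cadence, the number of tolerated misses, and the state names are illustrative parameters rather than proposed schema fields. Mechanism (ii), takeover, is not modelled; a takeover would simply replace the registration being classified.

```python
from datetime import date, timedelta
from typing import Optional


def registration_state(started: date, expected: date, today: date,
                       last_heartbeat: Optional[date] = None,
                       heartbeat_every: timedelta = timedelta(days=30),
                       max_missed: int = 2, completed: bool = False) -> str:
    """Classify a registration as complete, expired, flagged, or active.

    (i)   expired: no heartbeat (or start) within max_missed heartbeat periods.
    (iii) flagged: slip past `expected` exceeds the original window
          (expected - started).
    """
    if completed:
        return "complete"
    last_sign_of_life = last_heartbeat or started
    if today - last_sign_of_life > heartbeat_every * max_missed:
        return "expired"
    if today - expected > expected - started:
        return "flagged"
    return "active"
```

Checking expiry before slip means a silent team surfaces as a likely ghost even before its expected date has passed.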
This paper is generated as part of the rrxiv reference corpus. The double-duty framing (paper as substantive contribution AND worked example of the pattern it describes) is intentional and is the prototype for future protocol-demonstration papers in the instance.
rrxiv:2605.00001. The rrxiv protocol: claims, evidence, and annotations as the substrate for preprint discourse. 2026.
rrxiv:2605.00002. The claim graph as a first-class artifact. 2026. Establishes the per-claim addressability on which this paper's annotation surface rests.
rrxiv:2605.00003. Reproducibility budgets for ML preprints. 2026. Source of the budget signal whose stability claim:c5 measures.
rrxiv:2605.00004. A negative result on shrinkage estimators in small-N replication. 2026. A worked example of the kind of paper whose individual claims benefit from active-replication annotations.