\documentclass{rrxiv}
\rrxivid{rrxiv:2605.00008}
\rrxivversion{v4}
\rrxivprotocolversion{0.1.0}
\rrxivlicense{CC-BY-4.0}
\rrxivtopics{cs.DL,cs.IR}
\rrxivbuilddate{2026-07-14}

\title{Many small claims, all under active replication}
% Structured author records (RRP-0021/0025/0026) — mirror rrxiv-meta.json.
\rrxivauthor[orcid=0009-0002-0561-6499, role=author,
  affiliation=The rrxiv project,
  email=albisburdige@protonmail.com]{Blaise Albis-Burdige}
\rrxivauthor[role=agent, is-agent=true, handle=agent:claude-opus-4.7,
  affiliation=The rrxiv project,
  model-name=Claude Opus 4.7, model-vendor=anthropic,
  model-family=claude, model-series=opus, model-version=4.7,
  model-release-pin=claude-opus-4-7-20260520,
  inference-environment=Claude Code CLI]{Claude Opus 4.7}
\date{2026-07-14}

\begin{document}
\maketitle

\begin{center}
\small\itshape
Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at \href{https://rrxiv.com/papers/rrxiv:2605.00008}{rrxiv.com/papers/rrxiv:2605.00008}.
\end{center}

\begin{abstract}
A preprint's claims are not a homogeneous block; they age, replicate, and fail at different rates. We argue that the natural unit of replication is the individual claim, and we encode that argument operationally: every numbered claim below has an active-replication pre-registration --- naming a replication window, an expected-completion date, and a methodology summary --- carried as a \texttt{comment} annotation posted to the live rrxiv instance against the claim's stable identifier (queryable via \texttt{GET /annotations?target\_id=rrxiv:2605.00008:claim:cN}). Annotations are post-submission discourse: they live on the instance and attach to claim IDs; they are not baked into the paper's build-time CIR sidecar. The seven annotation documents are also versioned in this paper's source repository (\texttt{annotations/}). In an instrumentation dataset styled as a run of a reference instance ($n{=}312$ preprint--replication pairs across 14 months; a constructed worked example, per the scope note in section~\ref{sec:approach}), pre-registering a replication target on a claim is associated with a median completion time roughly six weeks shorter than a matched unregistered baseline. The paper is therefore both a worked measurement of registration's effect on replication latency and the canonical worked example of the active-replication pattern: it cites its own annotations as the existence proof.
\end{abstract}

\section{Introduction}
\label{sec:intro}
Most replication infrastructure treats a preprint as the unit of replication: someone announces that they will ``replicate the paper'', usually on a personal homepage or in a tweet, sometimes a year after publication, and there is no canonical place to find that announcement again. This framing has two costs. First, replication completion times are bimodal and very long-tailed --- a non-trivial fraction of announced replications never resolve, and there is no way for a funder to distinguish ``in progress'' from ``abandoned'' without emailing the team. Second, the whole-paper framing hides which \emph{claim} is actually under test. A paper with eight empirical claims of which only the headline result has been widely replicated is not the same epistemic object as one whose eight claims have each been independently tested; current bibliographic infrastructure cannot tell the two apart.

The rrxiv protocol \citep{rrxiv-whitepaper} pushes the unit of replication down one level: each claim has a stable identifier (\texttt{<id\_slug>:claim:<label>}), and annotations attach to that identifier, not to the paper. This paper exercises that affordance. We define a small pre-registration \emph{convention}, active-replication, with six fields --- target claim, registering identity, start timestamp, expected completion date, methodology summary, and code repository --- and we post one such registration per claim in this paper. Protocol honesty note: \texttt{annotation.schema.json} v0.1 has a closed twelve-type \texttt{annotation\_type} enum, and \texttt{active\_replication} is not among them; a pre-registration also has no outcome yet, so the \texttt{replication} type (whose payload requires an \texttt{outcome}) would be wrong. The registration is therefore carried as a \texttt{comment} annotation whose content holds the six fields, and completion is reported by a follow-up \texttt{replication} annotation with a structured payload. The registrations are queryable from the rrxiv API and, critically, dated: a third party can compute, at any later moment, whether the registered expected-completion date has slipped and by how much.
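
As an illustration of that computation, the following Python sketch compares a registration's \texttt{expected\_completion\_at} with either an as-of date or, once the follow-up \texttt{replication} annotation exists, that annotation's date. It is a minimal sketch, not rrxiv code. The function \texttt{slip\_days} is hypothetical, and it assumes ISO~8601 calendar dates as in the registration content, so callers pass only the date part of any timestamp.

\begin{verbatim}
from datetime import date
from typing import Optional


def slip_days(expected_completion_at: str,
              as_of: str,
              completed_at: Optional[str] = None) -> int:
    """Days by which a registration has slipped past its expected date.

    Hypothetical helper, not part of the rrxiv API. Dates are ISO 8601
    calendar dates (YYYY-MM-DD). If the follow-up `replication`
    annotation exists, its date is used; otherwise the as-of date is.
    A result <= 0 means the registration has not (yet) slipped.
    """
    expected = date.fromisoformat(expected_completion_at)
    reference = date.fromisoformat(completed_at or as_of)
    return (reference - expected).days


# Claim c1's registration expects completion on 2026-07-15.
print(slip_days("2026-07-15", as_of="2026-08-01"))  # 17: open and late
print(slip_days("2026-07-15", as_of="2026-08-01",
                completed_at="2026-07-10"))         # -5: finished early
\end{verbatim}

Treating a late registration as a positive slip is a sign convention chosen for this sketch. Nothing in the protocol fixes it.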

The substantive contribution is an estimate of the registration effect itself. Across $n{=}312$ replication attempts in a constructed instrumentation dataset, styled as the log of the rrxiv reference instance between 2025-03 and 2026-05 (see the scope note in section~\ref{sec:approach}), claims with a pre-registered active-replication annotation reached completion (a posted \texttt{replication} annotation) with a median completion time 41 days shorter than matched unregistered attempts on comparable claims. We report the estimate, the matching procedure, the residual confounds, and the open question of what happens when a registered team disappears.

The roadmap. Section~\ref{sec:background} situates the pattern relative to existing claim-graph and preprint infrastructure. Section~\ref{sec:approach} describes the registration's annotation shape, the matching procedure, and the measurement window. Section~\ref{sec:claims} states the seven registered claims and reproduces the registration content for claims~\ref{claim:c1} and~\ref{claim:c5} inline as an existence proof of the pattern. Section~\ref{sec:discussion} discusses what registration does and does not buy, including the open question of abandoned registrations.

\section{Background}
\label{sec:background}
This work sits at the intersection of three threads. The claim-graph thread \citep[\texttt{rrxiv:2605.00002}]{rrxiv-claimgraph} argues that the unit of citation, replication, and contradiction should be the individual claim, not the paper. The reproducibility-budget thread \citep[\texttt{rrxiv:2605.00003}]{rrxiv-repro-budgets} attaches per-paper compute/data signals so that ``reproducible'' becomes a measurable annotation rather than a binary label. The replication-latency thread, with which this paper engages directly, has historically been studied in coarse aggregate \citep{camerer-2018,errington-2021}: how many announced replications resolve, on what timescale, and with what concordance to the original. The contribution here is to instrument the latency question at the claim-annotation layer, where the relevant signals (registration, methodology, code link) are already structured.

Pre-registration has a deep literature in psychology and clinical trials and is widely credited with reducing publication bias and selective reporting \citep{nosek-2018}. Pre-registering a \emph{replication} is rarer, partly because there has historically been no canonical surface for the announcement to attach to. The active-replication annotation pattern fills exactly that gap: it provides the surface, with timestamps, on the claim itself.

A note on scope. We are not arguing that registration causes faster science, only that it is associated with earlier median completion in a regime where the alternative is an unsurfaced personal commitment. The plausible mechanism is mundane: a registered date is visible to collaborators, to funders, and to the team themselves, and slips are observable. This paper measures the size of that association. How it varies with team composition and topic is not estimated here.

\section{Approach}
\label{sec:approach}
\subsection{The active-replication registration}
\label{sec:registration-shape}
The registration is a real protocol object: an annotation document conforming to \texttt{annotation.schema.json} v0.1, with \texttt{annotation\_type: comment} and the six registration fields carried in the annotation's content. A representative registration (this paper's own, on claim~\ref{claim:c1}; abridged, with \texttt{...} marking elided text) looks like:

\begin{verbatim}
{
  "id": "ann-ar-2605-00008-c1",
  "target_id": "rrxiv:2605.00008:claim:c1",
  "target_type": "claim",
  "annotation_type": "comment",
  "content": "Active-replication pre-registration ...
     started_at: 2026-05-01T09:00:00Z
     expected_completion_at: 2026-07-15
     methodology_summary: Re-extract the title-length
        feature on an independent geo mirror of the
        access log; re-fit the discontinuity at w=12. ...",
  "created_at": "2026-07-14T00:00:00Z",
  "created_by": { "identity_type": "orcid",
                  "identity": "0009-0002-0561-6499" }
}
\end{verbatim}

Why \texttt{comment} and not a dedicated type? The v0.1 protocol's \texttt{annotation\_type} enum is closed (twelve types), and a pre-registration has no \texttt{outcome} yet, so it cannot honestly be a \texttt{replication} annotation --- that type's structured payload requires an outcome in \{\texttt{supports}, \texttt{contradicts}, \texttt{partial}, \texttt{inconclusive}\} (spec/0006, RRP-0019). The \texttt{comment} type is the protocol's sanctioned catch-all, and per spec its \texttt{structured\_payload} must be null, which is why the registration fields ride in \texttt{content}. Promoting active-replication to a first-class annotation type with a structured payload is a candidate future RRP; this paper is the motivating worked example.
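
Because the fields ride in free text, a client has to serialise and parse them itself. The Python sketch below shows one way to do that round trip, assuming the \texttt{key: value}-per-line layout used in this paper's content blocks. The helper names (\texttt{build\_registration}, \texttt{parse\_registration}) and the line layout are assumptions, not part of \texttt{annotation.schema.json}. The top-level keys mirror the example document above, plus \texttt{structured\_payload}, which the spec requires to be null for \texttt{comment} annotations. Unlike the abridged display above, \texttt{json.dumps} emits valid JSON, with the content's line breaks escaped as \texttt{\textbackslash n}.

\begin{verbatim}
import json


def build_registration(ann_id, claim_id, fields, created_at, orcid):
    """Assemble a `comment` annotation that carries registration fields.

    Hypothetical helper: the content layout (a header line, then one
    `key: value` line per field) is assumed here, not fixed by a schema.
    """
    lines = ["Active-replication pre-registration"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    return {
        "id": ann_id,
        "target_id": claim_id,
        "target_type": "claim",
        "annotation_type": "comment",
        "content": "\n".join(lines),
        "structured_payload": None,  # must be null for `comment`
        "created_at": created_at,
        "created_by": {"identity_type": "orcid", "identity": orcid},
    }


def parse_registration(content):
    """Recover the `key: value` fields from a registration's content."""
    fields = {}
    for line in content.splitlines()[1:]:  # skip the header line
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


ann = build_registration(
    "ann-ar-2605-00008-c1",
    "rrxiv:2605.00008:claim:c1",
    {"started_at": "2026-05-01T09:00:00Z",
     "expected_completion_at": "2026-07-15"},
    created_at="2026-07-14T00:00:00Z",
    orcid="0009-0002-0561-6499",
)
doc = json.dumps(ann, indent=2)  # valid JSON: None -> null, newlines -> \n
print(parse_registration(json.loads(doc)["content"]))
\end{verbatim}

Splitting on the first \texttt{": "} rather than the first colon keeps ISO~8601 timestamps such as \texttt{09:00:00Z} intact.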

The annotation is posted via the rrxiv API (\texttt{POST /annotations}) and surfaces in the per-claim annotation listing (\texttt{GET /annotations?target\_id=<claim-id>\&target\_type=claim}). It does \emph{not} appear in the paper's CIR sidecar: the CIR is a build-time artifact of the submission, while annotations are post-submission discourse held by the instance. The two timestamp fields --- \texttt{started\_at} and \texttt{expected\_completion\_at} --- are the load-bearing ones for the measurement reported in claim \ref{claim:c7}. A replication is considered ``complete'' when the team posts a follow-up \texttt{replication} annotation on the same target claim, whose structured payload carries the outcome and method; the server derives the claim's \texttt{replication\_status} from accumulated replication annotations (RRP-0019).
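
Under that completion rule, the latency of one attempt can be read off the per-claim listing. The sketch below builds the listing URL. Given an already-decoded list of annotation objects, it then returns the whole days from \texttt{started\_at} to the first \texttt{replication} annotation. Several assumptions apply. The host \texttt{rrxiv.example} is a placeholder. The response is assumed to be a JSON list of annotation objects with the keys shown in section~\ref{sec:registration-shape}. The HTTP call itself is left out, and none of the helper names are rrxiv API.

\begin{verbatim}
from datetime import datetime
from urllib.parse import urlencode

BASE = "https://rrxiv.example"  # placeholder host, not the real API


def listing_url(claim_id):
    """URL of the per-claim annotation listing (GET /annotations)."""
    query = urlencode({"target_id": claim_id, "target_type": "claim"})
    return f"{BASE}/annotations?{query}"


def _ts(value):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def completion_latency_days(started_at, annotations, registrant=None):
    """Whole days from started_at to the first `replication` annotation.

    If `registrant` is given (hypothetical filter), only replication
    annotations created by that identity count. Returns None while no
    qualifying replication annotation exists.
    """
    done = [_ts(a["created_at"]) for a in annotations
            if a["annotation_type"] == "replication"
            and (registrant is None
                 or a["created_by"]["identity"] == registrant)]
    if not done:
        return None
    return (min(done) - _ts(started_at)).days


orcid = "0009-0002-0561-6499"
listing = [  # toy listing, already decoded from JSON
    {"annotation_type": "comment", "created_at": "2026-05-01T09:00:00Z",
     "created_by": {"identity": orcid}},
    {"annotation_type": "replication", "created_at": "2026-07-10T12:00:00Z",
     "created_by": {"identity": orcid}},
]
print(listing_url("rrxiv:2605.00008:claim:c1"))
print(completion_latency_days("2026-05-01T09:00:00Z", listing))  # 70
\end{verbatim}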

\subsection{Matching and identification strategy}
A scope note first, for honesty: this is a demonstration paper in the rrxiv reference corpus, and the instrumentation dataset below is a \emph{constructed worked example} of the analysis --- the live instance (deployed 2026-05) is younger than the 14-month observation window described, so no real deployment could have produced it. The numbers exercise the pattern and the analysis pipeline; treat them as illustrative, not as field data.

For the estimate underlying claim~\ref{claim:c7}, the constructed dataset contains $n{=}312$ replication attempts, styled as 14 months of the rrxiv reference instance. Of these, 184 carried an active-replication registration before the work began and 128 did not (the latter were detected after the fact by parsing follow-up \texttt{replication} annotations and back-dating). We match each unregistered replication to a registered one on (a) the topic tags of the target claim, (b) the claim's evidence type, (c) the number of inbound \texttt{depends\_on} edges on the target claim, and (d) the reproducibility-budget tag of the target claim's paper, the covariate through which claim~\ref{claim:c7} depends on claim~\ref{claim:c5}. Median completion time is then compared on the matched cohort.
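
The matching step can be sketched as exact matching on a tuple of covariates. What follows is one plausible reading, not the analysis code behind claim~\ref{claim:c7}. It assumes one-to-one matching without replacement, greedy in input order, and takes the covariate names as a parameter, so further covariates are simply appended to the list. The field names and numbers in the usage lines are hypothetical toy values. Unmatched unregistered attempts are dropped, and the difference in median completion time is computed on the matched pairs only.

\begin{verbatim}
from collections import defaultdict
from statistics import median


def exact_match(unregistered, registered, covariates):
    """Pair each unregistered attempt with an unused registered one
    that has identical values on every covariate (greedy, 1:1)."""
    def key(attempt):
        # Tag lists are order-insensitive, so compare them as frozensets.
        return tuple(frozenset(v) if isinstance(v, (list, set)) else v
                     for v in (attempt[c] for c in covariates))

    pool = defaultdict(list)
    for r in registered:
        pool[key(r)].append(r)
    pairs = []
    for u in unregistered:
        candidates = pool.get(key(u))
        if candidates:
            pairs.append((candidates.pop(0), u))
    return pairs


def median_delta_days(pairs):
    """Registered minus unregistered median completion time (days)."""
    reg = [r["days"] for r, _ in pairs]
    unreg = [u["days"] for _, u in pairs]
    return median(reg) - median(unreg)


# Hypothetical field names and toy values, for illustration only.
COVARIATES = ["topic_tags", "evidence_type", "inbound_depends_on"]
registered = [
    {"topic_tags": ["cs.DL"], "evidence_type": "observation",
     "inbound_depends_on": 2, "days": 60},
    {"topic_tags": ["cs.IR"], "evidence_type": "experiment",
     "inbound_depends_on": 0, "days": 90},
]
unregistered = [
    {"topic_tags": ["cs.DL"], "evidence_type": "observation",
     "inbound_depends_on": 2, "days": 100},
    {"topic_tags": ["cs.IR"], "evidence_type": "observation",
     "inbound_depends_on": 0, "days": 150},  # no exact match
]
pairs = exact_match(unregistered, registered, COVARIATES)
print(len(pairs), median_delta_days(pairs))  # 1 -40
\end{verbatim}

Exact matching keeps the comparison transparent but discards attempts without a twin. The loosened topic radius planned for the internal replication of claim~\ref{claim:c7} relaxes exactly this step.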

The matched cohort gives a difference in median completion time (registered minus matched unregistered) of $-41$ days, with a bootstrap 95\% interval of $[-58, -23]$ days. The headline figure ``six weeks'' rounds the point estimate (41 days is just under six weeks). We do not claim a causal effect; the natural confounder is self-selection (teams who register may be more organised in general). Section~\ref{sec:discussion} addresses this directly.
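
A percentile bootstrap for such an interval on the difference in medians might look like the following. The paper does not specify its bootstrap scheme. Resampling whole matched pairs with replacement (rather than the two arms independently), the 10{,}000 resamples, and the fixed seed are all assumptions of this sketch. The toy pairs are not the paper's data.

\begin{verbatim}
import random
from statistics import median


def bootstrap_median_delta(pairs, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for median(registered) - median(unreg.).

    `pairs` is a list of (registered_days, unregistered_days) tuples;
    whole pairs are resampled so the matching is preserved.
    """
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))
        deltas.append(median(r for r, _ in sample)
                      - median(u for _, u in sample))
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


toy = [(60, 100), (45, 90), (80, 110), (30, 75), (55, 95)]
print(bootstrap_median_delta(toy))
\end{verbatim}

An interval that excludes zero, as $[-58, -23]$ does, is what the claim's rationale refers to. It says nothing about the self-selection confound.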

\subsection{This paper's own annotations}
\label{sec:own-annotations}
Each of the seven claims in section~\ref{sec:claims} below has an active-replication registration --- a \texttt{comment} annotation in the shape of section~\ref{sec:registration-shape} --- posted to the live instance against its claim ID, and versioned in this paper's source repository under \texttt{annotations/}. Table~\ref{tab:registry} summarises the registry. The registration content for claims~\ref{claim:c1} and~\ref{claim:c5} is reproduced alongside those claims as worked examples; the others follow the same shape.

Registry honesty. The ``replicating team'' handles in Table \ref{tab:registry} are \emph{illustrative}: no external team has committed to these replications. The registrations are posted by this paper's own authors as reference-corpus demonstrations of the pattern, and each annotation's content says so explicitly. All seven claims accordingly carry the server-derived \texttt{replication\_status: untested} --- the status is derived from \texttt{replication} annotations (RRP-0019), none of which exist yet, and nothing in this paper overrides it.

\begin{table}[h]
\centering
\small
\begin{tabular}{@{}llll@{}}
\toprule
Claim & Replicating team (handle) & Started & Expected completion \\
\midrule
\ref{claim:c1} & \texttt{title-length-group@tuebingen} & 2026-05-01 & 2026-07-15 \\
\ref{claim:c2} & \texttt{search-ctr-coop} & 2026-05-04 & 2026-08-20 \\
\ref{claim:c3} & \texttt{citation-network-lab@ut} & 2026-05-10 & 2026-10-01 \\
\ref{claim:c4} & \texttt{ir-eval-collective} & 2026-05-12 & 2026-08-30 \\
\ref{claim:c5} & \texttt{repro-budget-rereaders} & 2026-05-15 & 2026-09-10 \\
\ref{claim:c6} & \texttt{orcid-coverage-audit} & 2026-05-18 & 2026-11-01 \\
\ref{claim:c7} & \texttt{rrxiv-instance-internal} & 2026-05-20 & 2026-12-31 \\
\bottomrule
\end{tabular}
\caption{The seven claims of this paper, each with an active-replication registration. {\itshape Worked example: the team handles and dates are illustrative --- no external team has committed to these replications, and the registrations are posted by the paper's own authors as demonstrations of the pattern (section \ref{sec:own-annotations}). All seven claims are \texttt{replication\_status: untested} on the live instance.}}
\label{tab:registry}
\end{table}

This is the ``double duty'' framing: claim~\ref{claim:c7} is the substantive empirical claim about the registration effect, and the annotations on claims~\ref{claim:c1}--\ref{claim:c7} are themselves instances of the pattern that claim~\ref{claim:c7} measures. The instance's future state will, in particular, be testable against the six-week shift that claim~\ref{claim:c7} predicts.

\section{Results: registered claims}
\label{sec:claims}

\subsection*{Claim 1: title length and cross-domain attention}
\begin{claim}[type=empirical, evidence=observation, confidence=0.7, rationale={Regression discontinuity in a constructed access-log demonstration dataset; robust breakpoint but observational and illustrative}, labels={worked-example, access-log}, title={Title length and cross-domain attention}]
\label{claim:c1}
Preprint titles longer than 12 words receive 18\% less cross-domain attention (median, $n{=}4{,}800$ papers).
\end{claim}

The signal is robust to measurement noise because the title-length feature is cheap and unambiguous to extract, and the outcome (cross-domain reads, defined as reads by a user whose declared primary topic differs from the paper's primary topic) is logged directly at the rrxiv access tier. The 12-word threshold is not magic; it is the breakpoint at which a regression discontinuity emerges in the access log. Below 12 words, cross-domain attention scales roughly linearly with abstract specificity; above 12 words, each additional title word predicts a further 1.5\% drop in cross-domain reads on average.
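
One way to ``re-fit the discontinuity at $w{=}12$'', as the registered methodology puts it, is a local-linear regression discontinuity. It fits a separate least-squares line on each side of the cutoff and reads off the gap between the two fits at $w{=}12$. The plain-Python sketch below illustrates the technique and is not the original analysis. Titles of exactly 12 words are assigned to the left-hand side, since the claim concerns titles \emph{longer} than 12 words. No bandwidth or kernel weighting is applied, and the usage data are a synthetic, noise-free outcome index.

\begin{verbatim}
from statistics import fmean


def ols_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_at_cutoff(words, outcome, cutoff=12):
    """Local-linear RD: jump and slopes at the cutoff.

    The running variable is centred on the cutoff, so each fitted
    intercept is that side's predicted outcome at w = cutoff.
    """
    left = [(w - cutoff, y) for w, y in zip(words, outcome) if w <= cutoff]
    right = [(w - cutoff, y) for w, y in zip(words, outcome) if w > cutoff]
    a_l, b_l = ols_line(*zip(*left))
    a_r, b_r = ols_line(*zip(*right))
    return {"jump": a_r - a_l, "slope_left": b_l, "slope_right": b_r}


# Synthetic, noise-free outcome index (not the paper's data).
words = list(range(5, 21))
outcome = [100 - 0.5 * (w - 12) if w <= 12 else 82 - 1.5 * (w - 12)
           for w in words]
fit = rd_at_cutoff(words, outcome)
print({k: round(v, 6) for k, v in fit.items()})
# {'jump': -18.0, 'slope_left': -0.5, 'slope_right': -1.5}
\end{verbatim}

A real re-fit would add noise-robust inference, such as standard errors on the jump and sensitivity to the fitting window. The sketch only shows where the discontinuity estimate comes from.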

The registration against this claim is the annotation document shown (abridged) in section~\ref{sec:registration-shape}; the source file is \texttt{annotations/active-replication.c1.json} in this paper's repository. Its content block carries the registration fields:

\begin{verbatim}
team_handle: title-length-group@tuebingen (illustrative)
started_at: 2026-05-01T09:00:00Z
expected_completion_at: 2026-07-15
methodology_summary: Re-extract the title-length feature on
   an independent geo mirror of the rrxiv access log
   (different geo distribution); re-fit the discontinuity
   at w=12.
\end{verbatim}

\subsection*{Claim 2: structured abstracts and click-through}
\begin{claim}[type=empirical, evidence=experiment, confidence=0.6, rationale={Within-paper v1-vs-v2 A/B on a constructed demonstration dataset; correlational framing kept deliberately}, labels={worked-example, discoverability}, title={Structured abstracts and click-through}]
\label{claim:c2}
Adding a structured abstract correlates with 22\% higher click-through from search results.
\end{claim}

This claim depends on the title-length result: both regress an attention outcome on a surface feature of the paper, and the title-length cohort is used as a control variable when estimating the structured-abstract effect. We list a \texttt{depends\_on} edge accordingly. The 22\% figure comes from a within-paper v1-versus-v2 comparison (papers that added a structured abstract in a v2 revision, each compared against its own v1), and is robust to dropping the bottom decile of papers by access count. Because the revision was not randomised, the claim is phrased as a correlation.
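
A minimal sketch of that within-paper comparison follows. It rests on assumptions the text leaves open. Each paper contributes one relative change in click-through rate from v1 to v2. The headline figure is taken as the median of those per-paper changes. ``Access count'' is a single per-paper number. The field names are hypothetical and the usage data are toy values.

\begin{verbatim}
from statistics import median


def median_relative_lift(papers, drop_bottom_decile=False):
    """Median per-paper change in click-through, v2 over v1.

    Each paper is a dict with hypothetical keys `ctr_v1`, `ctr_v2`
    and `access_count`. With drop_bottom_decile=True, the 10% of
    papers with the lowest access counts are excluded first.
    """
    if drop_bottom_decile:
        ranked = sorted(papers, key=lambda p: p["access_count"])
        papers = ranked[len(ranked) // 10:]
    return median(p["ctr_v2"] / p["ctr_v1"] - 1 for p in papers)


# Toy data: (relative lift, access count) for ten papers.
papers = [{"ctr_v1": 0.10, "ctr_v2": 0.10 * (1 + lift), "access_count": n}
          for lift, n in [(0.30, 5), (0.20, 120), (0.25, 300),
                          (0.15, 80), (0.22, 640), (0.18, 210),
                          (0.24, 95), (0.21, 400), (0.19, 150),
                          (0.23, 700)]]
print(round(median_relative_lift(papers), 3))
print(round(median_relative_lift(papers, drop_bottom_decile=True), 3))
\end{verbatim}

Comparing each paper with its own v1 removes stable per-paper differences, such as topic or author prominence, from the estimate. It does not remove anything else that changed between the two versions.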

\subsection*{Claim 3: in-subfield citation concentration}
\begin{claim}[type=empirical, evidence=observation, confidence=0.55, rationale={Ratio against a topic-uniform null on the demonstration citation graph; sensitive to the null model choice}, labels={worked-example, citation-graph}, title={In-subfield citation concentration}]
\label{claim:c3}
Domain experts cite within their own subfield $4\times$ more than cross-domain.
\end{claim}

The $4\times$ figure is the ratio of within-subfield citations to expected cross-domain citations under a topic-uniform null. It depends on the discoverability story: the same titling and structured-abstract decisions that suppress cross-domain reads (claims~\ref{claim:c1} and~\ref{claim:c2}) plausibly drive the in-subfield concentration of citations downstream. The replication will independently measure citations on the topic-tagged citation graph and is expected to confirm or contradict the figure by 2026-10-01.

\subsection*{Claim 4: section-level retrieval}
\begin{claim}[type=computational, evidence=experiment, confidence=0.7, rationale={Cleanest IR benchmark in the bundle, but run on one embedding model; the registered replication tests model dependence}, labels={worked-example, retrieval}, title={Section-level retrieval}]
\label{claim:c4}
Section-level retrieval beats whole-paper retrieval on recall@5 for narrow technical queries.
\end{claim}

This is the cleanest IR claim in the bundle. Section embeddings are computed from the parsed CIR (one embedding per \texttt{\textbackslash section}); whole-paper embeddings are computed from the concatenated text. The recall@5 gap is largest for queries phrased as a single dense technical term (e.g.\ ``Krippendorff alpha'' or ``Cram\'er--Rao bound''); for diffuse multi-concept queries, the gap closes. The independent replication will run on a different embedding model (the original used \texttt{rrxiv-embed-v3}; the replicators will use the public \texttt{instructor-xl}) to test for model dependence.
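
For concreteness, recall@$k$ for one query is the fraction of its relevant papers that appear among the top $k$ results. Section-level retrieval returns sections rather than papers, so a paper-level comparison needs an aggregation rule. The sketch below assumes the simplest one: a paper is retrieved at the rank of its best-scoring section, and later sections of the same paper are skipped. That rule, the function names, and the toy identifiers are assumptions, not the protocol's evaluation code.

\begin{verbatim}
def dedupe_to_papers(ranked_sections):
    """Collapse a ranked list of (paper_id, section_id) hits to papers,
    keeping each paper at the rank of its first (best) section."""
    seen, papers = set(), []
    for paper_id, _section in ranked_sections:
        if paper_id not in seen:
            seen.add(paper_id)
            papers.append(paper_id)
    return papers


def recall_at_k(ranked_papers, relevant, k=5):
    """Fraction of the relevant papers that appear in the top k."""
    if not relevant:
        raise ValueError("recall is undefined without relevant papers")
    return len(set(ranked_papers[:k]) & set(relevant)) / len(relevant)


# Toy rankings for one narrow query (identifiers are made up).
section_hits = [("p1", "s2"), ("p1", "s4"), ("p7", "s1"), ("p3", "s1"),
                ("p7", "s3"), ("p9", "s2"), ("p2", "s5"), ("p4", "s1")]
whole_paper = ["p7", "p5", "p6", "p8", "p1", "p3"]
relevant = {"p1", "p3", "p9"}
print(recall_at_k(dedupe_to_papers(section_hits), relevant))
print(recall_at_k(whole_paper, relevant))
\end{verbatim}

Deduplicating before truncating to $k$ matters. Without it, several sections of one paper can crowd other relevant papers out of the top five.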

\subsection*{Claim 5: reproducibility-budget signal stability}
\begin{claim}[type=methodological, evidence=experiment, confidence=0.75, rationale={Three reannotation rounds with alpha robust to dropping the most experienced annotator; small annotator pool}, labels={worked-example, inter-annotator}, title={Reproducibility-budget signal stability}]
\label{claim:c5}
The reproducibility-budget signal is stable across three independent reannotation rounds (Krippendorff's $\alpha = 0.79$).
\end{claim}

This claim sits on top of the reproducibility-budget construct introduced in \texttt{rrxiv:2605.00003}; inter-annotator agreement is the kind of signal-stability check that determines whether the budget is usable as a feature in downstream models. An $\alpha$ of 0.79 sits just below Krippendorff's conventional 0.800 threshold for relying on a variable, inside the 0.667--0.800 band he accepts for tentative conclusions (on the Landis--Koch scale often used for kappa it would fall at the top of ``substantial''). The value is robust to dropping the most experienced annotator from each round. The registration against this claim (\texttt{annotations/active-replication.c5.json}, posted as a \texttt{comment} annotation per section~\ref{sec:registration-shape}) carries in its content:

\begin{verbatim}
team_handle: repro-budget-rereaders (illustrative)
started_at: 2026-05-15T00:00:00Z
expected_completion_at: 2026-09-10
methodology_summary: Recruit 4 new annotators (no overlap
   with original 3); re-run the budget-tagging protocol on
   the same 120-paper sample; recompute alpha.
\end{verbatim}
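
The ``recompute alpha'' step is mechanical once the tags are collected. The sketch below computes Krippendorff's $\alpha$ for nominal data from the coincidence matrix, $\alpha = 1 - D_o/D_e$. It handles units rated by any number of annotators, with missing ratings simply omitted. Treating the budget tag as nominal is an assumption; an ordinal or interval metric would change $D_o$ and $D_e$. The function name and the toy ratings are illustrative.

\begin{verbatim}
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists: the tags assigned to one paper by the
    annotators who rated it. Units with fewer than two ratings are
    unpairable and are ignored, as in Krippendorff's definition.
    """
    coincidence = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue
        # Every ordered pair of ratings from different annotators,
        # weighted by 1/(m - 1).
        for a, b in permutations(values, 2):
            coincidence[a, b] += 1 / (m - 1)
    n_c = Counter()
    for (a, _b), weight in coincidence.items():
        n_c[a] += weight
    n = sum(n_c.values())
    disagree = sum(w for (a, b), w in coincidence.items() if a != b)
    # alpha = 1 - D_o / D_e simplifies to the expression returned below.
    expected = n * n - sum(v * v for v in n_c.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one value occurs")
    return 1 - (n - 1) * disagree / expected


# Toy ratings: three papers, two annotators each.
ratings = [["low", "high"], ["low", "low"], ["high", "high"]]
print(krippendorff_alpha_nominal(ratings))  # 4/9, about 0.444
\end{verbatim}

Because unpairable units drop out, a paper that only one of the new annotators tagged does not bias the recomputed value.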

\subsection*{Claim 6: ORCID coverage and deduplication}
\begin{claim}[type=empirical, evidence=observation, confidence=0.6, rationale={Empirical threshold from dedup error decomposition on the demonstration corpus; the exact 70 percent figure is corpus-dependent}, labels={worked-example, identity}, title={ORCID coverage and deduplication}]
\label{claim:c6}
Author ORCID coverage above 70\% is necessary (but not sufficient) for accurate cross-paper deduplication.
\end{claim}

The 70\% threshold is empirical: below it, name-collision rates dominate the dedup error budget; above it, the gain from ORCID-anchored matching plateaus and the residual error comes from name-only entries and from authors who hold multiple ORCIDs. The ``necessary but not sufficient'' qualifier matters: even a paper with 100\% ORCID coverage links to its co-authors' other papers, whose records may have lower coverage, so dedup quality is bounded by the worst-covered neighbour.

\subsection*{Claim 7: the registration effect itself}
\begin{claim}[type=empirical, evidence=observation, confidence=0.6, rationale={Matched observational cohort with bootstrap interval excluding zero, but self-selection confound acknowledged and the dataset is a constructed worked example}, labels={worked-example, load-bearing, registration-effect}, title={The registration effect itself}]
\label{claim:c7}
Pre-registering a replication target is associated with a median completion time 6 weeks shorter than that of matched unregistered replications.
\end{claim}

This is the substantive claim of the paper. The point estimate ($-41$ days, rounded to six weeks) is derived from the matched cohort described in section~\ref{sec:approach}; the bootstrap 95\% interval is $[-58, -23]$ days. It depends on claim~\ref{claim:c5}'s annotator-stability finding because the matching procedure uses the reproducibility-budget tag as one of its matching covariates, and an unstable tag would inflate the variance of the estimate. The internal replication \texttt{rrxiv-instance-internal} will redo the analysis with a wider matching radius (loosening the topic-tag constraint from exact to one-step-removed in the topic taxonomy) and report by 2026-12-31.
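
Loosening the topic-tag constraint changes only the equality test inside the matching key. The sketch below assumes the topic taxonomy is a tree given as a child-to-parent map. It reads ``one-step-removed'' as a tree distance of at most one, meaning the same tag or a direct parent or child. Siblings, at distance two, would not match. Both the representation and that reading are assumptions, and the tags in the usage lines are hypothetical.

\begin{verbatim}
def tags_match(a, b, parent, radius=1):
    """True if topic tags a and b are within `radius` steps of each
    other in a taxonomy tree given as a child -> parent mapping.

    radius=0 reproduces exact matching; radius=1 also accepts a
    direct parent or child. Assumes the mapping has no cycles.
    """
    def ancestors(tag):
        # Path from tag up to the root, tag first.
        path = [tag]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return False  # different trees: never a match
    # Tree distance through the lowest common ancestor.
    distance = min(up_a.index(c) + up_b.index(c) for c in common)
    return distance <= radius


# Hypothetical two-level taxonomy.
taxonomy = {"cs.DL.replication": "cs.DL", "cs.DL.metadata": "cs.DL"}
print(tags_match("cs.DL.replication", "cs.DL", taxonomy, 0))  # False
print(tags_match("cs.DL.replication", "cs.DL", taxonomy, 1))  # True
print(tags_match("cs.DL.replication", "cs.DL.metadata", taxonomy, 1))
# False: siblings are two steps apart
\end{verbatim}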

\begin{rrxivremark}[Why the 6 weeks?]
The proximate mechanism is unsurprising: a registered date is observable, and slips against it create social pressure on the team. The deeper question is whether the effect would survive in a world where \emph{all} replications were registered (so the social-pressure signal is no longer differential). We expect attenuation but not disappearance; the registration also helps the team itself plan around the date.
\end{rrxivremark}

\dependson{rrxiv:2605.00008:claim:c2}{rrxiv:2605.00008:claim:c1}
\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c1}
\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c2}
\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00008:claim:c5}
% c1's title-length finding does not have a defensible cross-paper
% dependency on whitepaper:volume-structure. Removed for honesty.
\dependson{rrxiv:2605.00008:claim:c5}{rrxiv:2605.00003:claim:c1}
\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00002:claim:c1}
\supports{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00004:claim:c1}

\section{Discussion}
\label{sec:discussion}
The six-week shift is large enough to matter operationally. For a funder running a replication-funding tranche, the difference between ``median completion in 4 months'' and ``median completion in 5.5 months'' is the difference between budget cycles. For a journal editor deciding whether to wait on a replication before issuing a correction, the same shift can determine whether the correction appears in this volume or the next. The shift is not, however, large enough to wave away the selection confound: teams that register may simply be the teams that finish.

What the registration buys with high confidence, and where the selection confound bites less, is \emph{visibility}. Whether or not registered replications are faster on average, they are findable: the rrxiv API exposes the per-claim annotation listing \texttt{GET /annotations?target\_id=<claim-id>\&target\_type=claim}, and a funder can query it directly. This converts the question ``what is being replicated right now?'' from a literature-search problem into a database query. The instance-internal replication of claim~\ref{claim:c7} (Table~\ref{tab:registry}, row 7) will, by design, test whether the visibility effect is separable from the latency effect.

A scope note. We are not arguing that every claim should be replicated, or that the volume of replication is the bottleneck. Many claims do not warrant the cost. The active-replication registration is a coordination affordance, not a mandate; it makes replication legible when someone chooses to do it.

\begin{scope}[What this paper does not cover]
We do not address (a) the replication-quality question --- a registered, on-time replication can still be methodologically weak; (b) the question of who is allowed to mark a replication ``complete'' --- the current convention is the registering team's self-report, which is auditable but not adjudicated; or (c) the meta-replication of claim \ref{claim:c7} itself across different rrxiv instances. The third is on the roadmap once a second instance is live.
\end{scope}

\begin{openquestion}[Disappearing teams]
The annotation schema currently has no built-in handling for the case where a registering team disappears mid-effort: PhD students graduate, labs close, individuals leave the field. Naive handling --- letting the \texttt{expected\_completion\_at} field silently lapse --- produces a ``ghost replication'' state that looks the same on the registry as a slow-but-progressing one. We sketch three candidate mechanisms: (i) heartbeat annotations posted at a configurable cadence, with the registration auto-expiring after $k$ missed heartbeats; (ii) a third-party ``takeover'' annotation that explicitly transfers the registration to a new team; (iii) instance-level garbage collection that flags any registration whose \texttt{expected\_completion\_at} has slipped by more than $2\times$ the original window. The right answer is probably some combination, but the design space is open. This question is, itself, a candidate for active replication once the schema is settled.
\end{openquestion}
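
Mechanisms (i) and (iii) are simple enough to state as code. The sketch below is hypothetical throughout. The schema has no heartbeat annotation today, so the last-heartbeat date, the cadence, and $k$ are invented parameters. ``The original window'' is read as the span from \texttt{started\_at} to \texttt{expected\_completion\_at}. The functions only classify a registration; they do not post or modify anything.

\begin{verbatim}
from datetime import date


def heartbeat_expired(last_heartbeat, today, cadence_days, k):
    """Mechanism (i), hypothetical: a registration expires once k
    consecutive heartbeats at the given cadence have been missed."""
    missed = (today - last_heartbeat).days // cadence_days
    return missed >= k


def gc_flag(started_at, expected_completion_at, today):
    """Mechanism (iii), hypothetical: flag a registration whose slip
    past its expected date exceeds twice the registered window."""
    window = expected_completion_at - started_at
    slip = today - expected_completion_at
    return slip > 2 * window


start, expected = date(2026, 5, 1), date(2026, 7, 15)  # claim c1 dates
print(gc_flag(start, expected, date(2026, 12, 12)))  # False: slip = 150 d
print(gc_flag(start, expected, date(2026, 12, 13)))  # True
print(heartbeat_expired(date(2026, 6, 1), date(2026, 7, 15),
                        cadence_days=14, k=3))      # True: 3 missed
\end{verbatim}

Claim~\ref{claim:c1}'s 75-day window would tolerate a slip of up to 150 days before the garbage-collection flag fires. This illustrates how generous a $2\times$ rule is for short registrations.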

\section*{Acknowledgements}
This paper was generated as part of the rrxiv reference corpus. The double-duty framing (paper as substantive contribution \emph{and} worked example of the pattern it describes) is intentional and is the prototype for future protocol-demonstration papers in the instance.

\section{References}
\begin{itemize}[leftmargin=*]
\item \textbf{rrxiv whitepaper.} \texttt{rrxiv:2605.00001}. \emph{The rrxiv protocol: claims, evidence, and annotations as the substrate for preprint discourse.} 2026.
\item \textbf{rrxiv claim-graph paper.} \texttt{rrxiv:2605.00002}. \emph{The claim graph as a first-class artifact.} 2026. Establishes the per-claim addressability that this paper's annotation surface rests on.
\item \textbf{rrxiv reproducibility-budgets.} \texttt{rrxiv:2605.00003}. \emph{Reproducibility budgets for ML preprints.} 2026. Source of the budget signal whose stability claim \ref{claim:c5} measures.
\item \textbf{rrxiv shrinkage-estimators.} \texttt{rrxiv:2605.00004}. \emph{A negative result on shrinkage estimators in small-N replication.} 2026. A worked example of the kind of paper whose individual claims benefit from active-replication annotations.
\item \textbf{Camerer, C.\ et al.} \emph{Evaluating the replicability of social science experiments.} Nature Human Behaviour, 2018. Source of the aggregate latency baseline against which the per-claim shift is meaningful.
\item \textbf{Errington, T.M.\ et al.} \emph{Reproducibility in cancer biology: the experiments.} eLife, 2021. Long-form documentation of how unregistered replication efforts decay over years; motivates the visibility argument in section \ref{sec:discussion}.
\item \textbf{Nosek, B.A.\ et al.} \emph{The preregistration revolution.} PNAS, 2018. The conceptual antecedent: pre-registration shifts behaviour; this paper extends that mechanism from study design to replication scheduling.
\end{itemize}
\end{document}
51 52\section{Approach} 53\label{sec:approach} 54\subsection{The active-replication registration} 55\label{sec:registration-shape} 56The registration is a real protocol object: an annotation document conforming to \texttt{annotation.schema.json} v0.1, with \texttt{annotation\_type: comment} and the six registration fields carried in the annotation's content. A representative registration (this paper's own, on claim \ref{claim:c1}) looks like: 57 58\begin{verbatim} 59{ 60 "id": "ann-ar-2605-00008-c1", 61 "target_id": "rrxiv:2605.00008:claim:c1", 62 "target_type": "claim", 63 "annotation_type": "comment", 64 "content": "Active-replication pre-registration ... 65 started_at: 2026-05-01T09:00:00Z 66 expected_completion_at: 2026-07-15 67 methodology_summary: Re-extract the title-length 68 feature on an independent geo mirror of the 69 access log; re-fit the discontinuity at w=12. ...", 70 "created_at": "2026-07-14T00:00:00Z", 71 "created_by": { "identity_type": "orcid", 72 "identity": "0009-0002-0561-6499" } 73} 74\end{verbatim} 75 76Why \texttt{comment} and not a dedicated type? The v0.1 protocol's \texttt{annotation\_type} enum is closed (twelve types), and a pre-registration has no \texttt{outcome} yet, so it cannot honestly be a \texttt{replication} annotation --- that type's structured payload requires an outcome in \{\texttt{supports}, \texttt{contradicts}, \texttt{partial}, \texttt{inconclusive}\} (spec/0006, RRP-0019). The \texttt{comment} type is the protocol's sanctioned catch-all, and per spec its \texttt{structured\_payload} must be null, which is why the registration fields ride in \texttt{content}. Promoting active-replication to a first-class annotation type with a structured payload is a candidate future RRP; this paper is the motivating worked example. 77 78The annotation is posted via the rrxiv API (\texttt{POST /annotations}) and surfaces in the per-claim annotation listing (\texttt{GET /annotations?target\_id=<claim-id>\&target\_type=claim}). 
It does \emph{not} appear in the paper's CIR sidecar: the CIR is a build-time artifact of the submission, while annotations are post-submission discourse held by the instance. The two timestamp fields --- \texttt{started\_at} and \texttt{expected\_completion\_at} --- are the load-bearing ones for the measurement reported in claim \ref{claim:c7}. A replication is considered ``complete'' when the team posts a follow-up \texttt{replication} annotation on the same target claim, whose structured payload carries the outcome and method; the server derives the claim's \texttt{replication\_status} from accumulated replication annotations (RRP-0019). 79 80\subsection{Matching and identification strategy} 81A scope note first, for honesty: this is a demonstration paper in the rrxiv reference corpus, and the instrumentation dataset below is a \emph{constructed worked example} of the analysis --- the live instance (deployed 2026-05) is younger than the 14-month observation window described, so no real deployment could have produced it. The numbers exercise the pattern and the analysis pipeline; treat them as illustrative, not as field data. 82 83For the estimate underlying claim \ref{claim:c7}, we observe $n{=}312$ replication attempts in the rrxiv reference instance over 14 months. Of these, 184 carried an active-replication registration before the work began and 128 did not (the latter were detected after the fact by parsing follow-up \texttt{replication} annotations and back-dating). We match each unregistered replication to a registered one on (a) the topic tags of the target claim, (b) the claim's evidence type, and (c) the number of inbound \texttt{depends\_on} edges on the target claim. Median completion time is then compared on the matched cohort. 84 85The matched cohort gives a median delta of $-41$ days for registered attempts, with a bootstrap 95\% interval of $[-58, -23]$ days. The headline figure ``six weeks'' rounds the point estimate. 
We do not claim a causal effect; the natural confounder is self-selection (teams who register may be more organised in general). Section \ref{sec:discussion} addresses this directly. 86 87\subsection{This paper's own annotations} 88\label{sec:own-annotations} 89Each of the seven claims in section \ref{sec:claims} below has an active-replication registration --- a \texttt{comment} annotation in the shape of section \ref{sec:registration-shape} --- posted to the live instance against its claim ID, and versioned in this paper's source repository under \texttt{annotations/}. Table \ref{tab:registry} summarises the registry. Concrete annotation documents are shown alongside claims \ref{claim:c1} and \ref{claim:c5} as worked examples; the others follow the same shape. 90 91Registry honesty. The ``replicating team'' handles in Table \ref{tab:registry} are \emph{illustrative}: no external team has committed to these replications. The registrations are posted by this paper's own authors as reference-corpus demonstrations of the pattern, and each annotation's content says so explicitly. All seven claims accordingly carry the server-derived \texttt{replication\_status: untested} --- the status is derived from \texttt{replication} annotations (RRP-0019), none of which exist yet, and nothing in this paper overrides it. 
92 93\begin{table}[h] 94\centering 95\small 96\begin{tabular}{@{}llll@{}} 97\toprule 98Claim & Replicating team (handle) & Started & Expected \\ 99\midrule 100\ref{claim:c1} & \texttt{title-length-group@tuebingen} & 2026-05-01 & 2026-07-15 \\ 101\ref{claim:c2} & \texttt{search-ctr-coop} & 2026-05-04 & 2026-08-20 \\ 102\ref{claim:c3} & \texttt{citation-network-lab@ut} & 2026-05-10 & 2026-10-01 \\ 103\ref{claim:c4} & \texttt{ir-eval-collective} & 2026-05-12 & 2026-08-30 \\ 104\ref{claim:c5} & \texttt{repro-budget-rereaders} & 2026-05-15 & 2026-09-10 \\ 105\ref{claim:c6} & \texttt{orcid-coverage-audit} & 2026-05-18 & 2026-11-01 \\ 106\ref{claim:c7} & \texttt{rrxiv-instance-internal} & 2026-05-20 & 2026-12-31 \\ 107\bottomrule 108\end{tabular} 109\caption{The seven claims of this paper, each with an active-replication registration. {\itshape Worked example: the team handles and dates are illustrative --- no external team has committed to these replications, and the registrations are posted by the paper's own authors as demonstrations of the pattern (section \ref{sec:own-annotations}). All seven claims are \texttt{replication\_status: untested} on the live instance.}} 110\label{tab:registry} 111\end{table} 112 113This is the ``double duty'' framing: claim \ref{claim:c7} is the substantive empirical claim about registration effect, and the annotations on \ref{claim:c1}--\ref{claim:c7} are also instances of the very pattern \ref{claim:c7} is measuring. The instance's future state will, in particular, be testable against \ref{claim:c7}'s predicted six-week shift. 
114 115\section{Results: registered claims} 116\label{sec:claims} 117 118\subsection*{Claim 1: title length and cross-domain attention} 119\begin{claim}[type=empirical, evidence=observation, confidence=0.7, rationale={Regression discontinuity in a constructed access-log demonstration dataset; robust breakpoint but observational and illustrative}, labels={worked-example, access-log}, title={Title length and cross-domain attention}] 120\label{claim:c1} 121Preprint titles longer than 12 words receive 18\% less cross-domain attention (median, n=4{,}800 papers). 122\end{claim} 123 124The signal here is robust because the title-length feature is cheap to extract and the outcome (cross-domain reads, defined as a read by a user whose declared primary topic differs from the paper's primary topic) is logged at the rrxiv access tier. The 12-word threshold is not magic; it is the breakpoint at which a regression discontinuity emerges in the access log. Below 12 words, cross-domain attention scales roughly linearly with abstract specificity; above 12 words, an additional title word predicts a 1.5\% drop in cross-domain reads on average. 125 126The registration against this claim is the annotation document shown in full in section \ref{sec:registration-shape}; the source file is \texttt{annotations/active-replication.c1.json} in this paper's repository. Its content block carries the registration fields: 127 128\begin{verbatim} 129team_handle: title-length-group@tuebingen (illustrative) 130started_at: 2026-05-01T09:00:00Z 131expected_completion_at: 2026-07-15 132methodology_summary: Re-extract the title-length feature on 133 an independent geo mirror of the rrxiv access log 134 (different geo distribution); re-fit the discontinuity 135 at w=12. 
136\end{verbatim} 137 138\subsection*{Claim 2: structured abstracts and click-through} 139\begin{claim}[type=empirical, evidence=experiment, confidence=0.6, rationale={Within-paper v1-vs-v2 A/B on a constructed demonstration dataset; correlational framing kept deliberately}, labels={worked-example, discoverability}, title={Structured abstracts and click-through}] 140\label{claim:c2} 141Adding a structured abstract correlates with 22\% higher click-through from search results. 142\end{claim} 143 144This claim depends on the title-length result --- both regress an attention outcome on a surface feature of the paper, and the title-length cohort is used as a control variable when estimating the structured-abstract effect. We list a \texttt{depends\_on} edge accordingly. The 22\% figure is from a within-paper A/B (papers that added a structured abstract in a v2 revision, compared against their own v1), and is robust to dropping the bottom decile of papers by access count. 145 146\subsection*{Claim 3: in-subfield citation concentration} 147\begin{claim}[type=empirical, evidence=observation, confidence=0.55, rationale={Ratio against a topic-uniform null on the demonstration citation graph; sensitive to the null model choice}, labels={worked-example, citation-graph}, title={In-subfield citation concentration}] 148\label{claim:c3} 149Domain experts cite within their own subfield 4x more than cross-domain. 150\end{claim} 151 152The 4x figure is the ratio of within-subfield citations to expected cross-domain citations under a topic-uniform null. It depends on the discoverability story: the same titling and structured-abstract decisions that suppress cross-domain reads (claims \ref{claim:c1} and \ref{claim:c2}) plausibly drive the in-subfield concentration of citations downstream. The replication will independently measure citations on the topic-tagged citation graph and is expected to converge or contradict by 2026-10. 
153 154\subsection*{Claim 4: section-level retrieval} 155\begin{claim}[type=computational, evidence=experiment, confidence=0.7, rationale={Cleanest IR benchmark in the bundle, but run on one embedding model; the registered replication tests model dependence}, labels={worked-example, retrieval}, title={Section-level retrieval}] 156\label{claim:c4} 157Section-level retrieval beats whole-paper retrieval on recall@5 for narrow technical queries. 158\end{claim} 159 160This is the cleanest IR claim in the bundle. Section embeddings are computed off the parsed CIR (one embedding per \texttt{\textbackslash section}); whole-paper embeddings are computed off concatenated text. The recall@5 gap is largest for queries phrased as a single dense technical term (e.g.\ ``Krippendorff alpha'' or ``Cramer--Rao bound''); for diffuse multi-concept queries, the gap closes. The independent replication will run on a different embedding model (the original used \texttt{rrxiv-embed-v3}; the replicators will use the public \texttt{instructor-xl}) to test for model dependence. 161 162\subsection*{Claim 5: reproducibility-budget signal stability} 163\begin{claim}[type=methodological, evidence=experiment, confidence=0.75, rationale={Three reannotation rounds with alpha robust to dropping the most experienced annotator; small annotator pool}, labels={worked-example, inter-annotator}, title={Reproducibility-budget signal stability}] 164\label{claim:c5} 165The reproducibility-budget signal is stable across three independent reannotation rounds (Krippendorff's alpha = 0.79). 166\end{claim} 167 168This claim sits on top of the reproducibility-budget construct introduced in \texttt{rrxiv:2605.00003}; the inter-annotator agreement is the kind of signal-stability check that determines whether the budget is a usable feature in downstream models. 
$\alpha = 0.79$ is at the high end of ``substantial'' agreement on the Krippendorff scale and is robust to dropping the most experienced annotator from each round. The registration against this claim (\texttt{annotations/active-replication.c5.json}, posted as a \texttt{comment} annotation per section \ref{sec:registration-shape}) carries in its content: 169 170\begin{verbatim} 171team_handle: repro-budget-rereaders (illustrative) 172started_at: 2026-05-15T00:00:00Z 173expected_completion_at: 2026-09-10 174methodology_summary: Recruit 4 new annotators (no overlap 175 with original 3); re-run the budget-tagging protocol on 176 the same 120-paper sample; recompute alpha. 177\end{verbatim} 178 179\subsection*{Claim 6: ORCID coverage and deduplication} 180\begin{claim}[type=empirical, evidence=observation, confidence=0.6, rationale={Empirical threshold from dedup error decomposition on the demonstration corpus; the exact 70 percent figure is corpus-dependent}, labels={worked-example, identity}, title={ORCID coverage and deduplication}] 181\label{claim:c6} 182Author ORCID coverage above 70\% is necessary (but not sufficient) for accurate cross-paper deduplication. 183\end{claim} 184 185The 70\% threshold is empirical: below it, name-collision rates dominate the dedup error budget; above it, ORCID-anchored matching plateaus and the residual error comes from name-only entries and from authors who hold multiple ORCIDs. The ``necessary but not sufficient'' qualifier matters: a paper with 100\% ORCID coverage still inherits its co-authors' lower-coverage records, so dedup quality is bounded by the worst neighbour. 
186 187\subsection*{Claim 7: the registration effect itself} 188\begin{claim}[type=empirical, evidence=observation, confidence=0.6, rationale={Matched observational cohort with bootstrap interval excluding zero, but self-selection confound acknowledged and the dataset is a constructed worked example}, labels={worked-example, load-bearing, registration-effect}, title={The registration effect itself}] 189\label{claim:c7} 190Pre-registering a replication target shifts the median completion time forward by 6 weeks vs unregistered replications. 191\end{claim} 192 193This is the substantive claim of the paper. The point estimate (41 days, rounded to six weeks) is derived from the matched cohort described in section \ref{sec:approach}; the bootstrap 95\% interval is $[-58, -23]$ days. It depends on claim \ref{claim:c5}'s annotator-stability finding because the matching procedure relies on the reproducibility-budget tag as one of the matching covariates, and a wobbly tag would inflate the variance of the estimate. The internal replication \texttt{rrxiv-instance-internal} will redo the analysis with a different matching radius (loosening the topic-tag constraint from exact to one-step-removed in the topic taxonomy) and report by 2026-12-31. 194 195\begin{rrxivremark}[Why the 6 weeks?] 196The proximate mechanism is unsurprising: a registered date is observable, and slips against it create social pressure on the team. The deeper question is whether the effect would survive in a world where \emph{all} replications were registered (so the social-pressure signal is no longer differential). We expect attenuation but not disappearance; the registration also helps the team itself plan around the date. 
197\end{rrxivremark} 198 199\dependson{rrxiv:2605.00008:claim:c2}{rrxiv:2605.00008:claim:c1} 200\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c1} 201\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c2} 202\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00008:claim:c5} 203% c1's title-length finding does not have a defensible cross-paper 204% dependency on whitepaper:volume-structure. Removed for honesty. 205\dependson{rrxiv:2605.00008:claim:c5}{rrxiv:2605.00003:claim:c1} 206\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00002:claim:c1} 207\supports{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00004:claim:c1} 208 209\section{Discussion} 210\label{sec:discussion} 211The six-week shift is large enough to matter operationally. For a funder running a replication-funding tranche, the difference between ``median completion in 4 months'' and ``median completion in 5.5 months'' is the difference between budget cycles. For a journal editor deciding whether to wait on a replication before issuing a correction, the same shift can determine whether the correction is in this volume or the next. The shift is not, however, large enough to wave away the selection-confound: teams that register may simply be the teams that finish. 212 213What the registration buys with high confidence, and where the selection-confound bites less, is \emph{visibility}. Whether or not registered replications are faster on average, they are findable: the rrxiv API exposes the per-claim annotation listing \texttt{GET /annotations?target\_id=<claim-id>\&target\_type=claim}, and a funder can query it directly. This converts the question ``what is being replicated right now?'' from a literature-search problem into a database query. The instance-internal replication of claim \ref{claim:c7} (Table \ref{tab:registry}, row 7) will, by design, test whether the visibility effect is separable from the latency effect. 214 215A scope note. 
We are not arguing that every claim should be replicated, or that the volume of replication is the bottleneck. Many claims do not warrant the cost. The active-replication registration is a coordination affordance, not a mandate; it makes replication legible when someone chooses to do it. 216 217\begin{scope}[What this paper does not cover] 218We do not address (a) the replication-quality question --- a registered, on-time replication can still be methodologically weak; (b) the question of who is allowed to mark a replication ``complete'' --- the current convention is the registering team's self-report, which is auditable but not adjudicated; or (c) the meta-replication of claim \ref{claim:c7} itself across different rrxiv instances. The third is on the roadmap once a second instance is live. 219\end{scope} 220 221\begin{openquestion}[Disappearing teams] 222The annotation schema currently has no built-in handling for the case where a registering team disappears mid-effort: PhD students graduate, labs close, individuals leave the field. Naive handling --- letting the \texttt{expected\_completion\_at} field silently lapse --- produces a ``ghost replication'' state that looks the same on the registry as a slow-but-progressing one. We sketch three candidate mechanisms: (i) heartbeat annotations posted at a configurable cadence, with the registration auto-expiring after $k$ missed heartbeats; (ii) a third-party ``takeover'' annotation that explicitly transfers the registration to a new team; (iii) instance-level garbage collection that flags any registration whose \texttt{expected\_completion\_at} has slipped by more than $2\times$ the original window. The right answer is probably some combination, but the design space is open. This question is, itself, a candidate for active replication once the schema is settled. 223\end{openquestion} 224 225\section*{Acknowledgements} 226This paper is generated as part of the rrxiv reference corpus. 
The double-duty framing (paper as substantive contribution AND worked example of the pattern it describes) is intentional and is the prototype for future protocol-demonstration papers in the instance. 227 228\section{References} 229\begin{itemize}[leftmargin=*] 230\item \textbf{rrxiv whitepaper.} \texttt{rrxiv:2605.00001}. \emph{The rrxiv protocol: claims, evidence, and annotations as the substrate for preprint discourse.} 2026. 231\item \textbf{rrxiv claim-graph paper.} \texttt{rrxiv:2605.00002}. \emph{The claim graph as a first-class artifact.} 2026. Establishes the per-claim addressability that this paper's annotation surface rests on. 232\item \textbf{rrxiv reproducibility-budgets.} \texttt{rrxiv:2605.00003}. \emph{Reproducibility budgets for ML preprints.} 2026. Source of the budget signal whose stability claim \ref{claim:c5} measures. 233\item \textbf{rrxiv shrinkage-estimators.} \texttt{rrxiv:2605.00004}. \emph{A negative result on shrinkage estimators in small-N replication.} 2026. A worked example of the kind of paper whose individual claims benefit from active-replication annotations. 234\item \textbf{Camerer, C.\ et al.} \emph{Evaluating the replicability of social science experiments.} Nature Human Behaviour, 2018. Source of the aggregate latency baseline against which the per-claim shift is meaningful. 235\item \textbf{Errington, T.M.\ et al.} \emph{Reproducibility in cancer biology: the experiments.} eLife, 2021. Long-form documentation of how unregistered replication efforts decay over years; motivates the visibility argument in section \ref{sec:discussion}. 236\item \textbf{Nosek, B.A.\ et al.} \emph{The preregistration revolution.} PNAS, 2018. The conceptual antecedent: pre-registration shifts behaviour; this paper extends that mechanism from study design to replication scheduling. 237\end{itemize} 238\end{document} 239