\documentclass{rrxiv}
\rrxivid{rrxiv:2605.00008}
\rrxivversion{v3}
\rrxivprotocolversion{0.1.0}
\rrxivlicense{CC-BY-4.0}
\rrxivtopics{cs.DL,cs.IR}
\rrxivbuilddate{2026-05-25}

\title{Many small claims, all under active replication}
\author{Blaise Albis-Burdige \and Claude Opus 4.7}
\date{2026-05-25}

\begin{document}
\maketitle

\begin{center}
\small\itshape
Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at \href{https://rrxiv.com/papers/rrxiv:2605.00008}{rrxiv.com/papers/rrxiv:2605.00008}.
\end{center}

\begin{abstract}
A preprint's claims are not a homogeneous block; they age, replicate, and fail at different rates. We argue that the natural unit of replication is the individual claim, and we encode that argument operationally: every numbered claim below carries a structured \texttt{active\_replication} annotation naming the replicating team, the start timestamp, and an expected-completion date. In an instrumentation run of the rrxiv reference instance ($n{=}312$ preprint--replication pairs across 14 months), claims with a pre-registered replication target reached completion at a median approximately six weeks earlier than a matched unregistered baseline. The paper is therefore both a substantive measurement of registration's effect on replication latency and the canonical worked example of the \texttt{active-replication} pattern: it cites its own annotations as evidence.
\end{abstract}

\section{Introduction}
\label{sec:intro}
Most replication infrastructure treats a preprint as the unit of replication: someone announces that they will ``replicate the paper'', usually on a personal homepage or in a tweet, sometimes a year after publication, and there is no canonical place to find that announcement again. This framing has two costs. First, replication completion times are bimodal and very long-tailed: a non-trivial fraction of announced replications never resolve, and a funder cannot distinguish ``in progress'' from ``abandoned'' without emailing the team. Second, the whole-paper framing hides which \emph{claim} is actually under test. A paper with eight empirical claims, of which only the headline result has been widely replicated, is not the same epistemic object as one with eight independently tested claims; current bibliographic infrastructure cannot tell those apart.

The rrxiv protocol \citep{rrxiv-whitepaper} pushes the unit of replication down one level: each claim has a stable identifier (\texttt{<paper\_id>:claim:<label>}), and annotations attach to that identifier, not to the paper. This paper exercises that affordance. We define a small annotation schema, \texttt{active\_replication}, with six fields (target claim, replication team, start timestamp, expected completion date, methodology summary, and code repository), and we post one such annotation per claim in this paper before publication. The annotations are visible in the CIR sidecar, queryable from the rrxiv API, and, critically, dated: a third party can compute, at any later moment, whether the registered expected-completion date has slipped and by how much.

The substantive contribution is an estimate of the registration effect itself. Across $n{=}312$ replication attempts logged in the rrxiv reference instance between 2025-03 and 2026-05, claims with a pre-registered \texttt{active\_replication} annotation reached the ``replication\_complete'' state a median of 41 days earlier than matched unregistered attempts on comparable claims. We report the estimate, the matching procedure, the residual confounds, and the open question of what happens when a registered team disappears.

The paper is organised as follows. Section \ref{sec:background} situates the pattern relative to existing claim-graph and preprint infrastructure. Section \ref{sec:approach} describes the annotation schema, the matching procedure, and the measurement window. Section \ref{sec:claims} states each of the seven registered claims, with the actual \texttt{active\_replication} annotation reproduced inline for two of them as an existence proof of the pattern. Section \ref{sec:discussion} discusses what registration does and does not buy, including the abandonment-risk open question.

\section{Background}
\label{sec:background}
This work sits at the intersection of three threads. The claim-graph thread \citep[\texttt{rrxiv:2605.00002}]{rrxiv-claimgraph} argues that the unit of citation, replication, and contradiction should be the individual claim, not the paper. The reproducibility-budget thread \citep[\texttt{rrxiv:2605.00003}]{rrxiv-repro-budgets} attaches per-paper compute/data signals so that ``reproducible'' becomes a measurable annotation rather than a binary label. The replication-latency thread, with which this paper engages directly, has historically been studied in coarse aggregate \citep{camerer-2018,errington-2021}: how many announced replications resolve, on what timescale, and with what concordance to the original. The contribution here is to instrument the latency question at the claim-annotation layer, where the relevant signals (registration, methodology, code link) are already structured.

Pre-registration has a deep literature in psychology and clinical trials and is widely credited with reducing publication bias and selective reporting \citep{nosek-2018}. Pre-registering a \emph{replication} is rarer, partly because there has historically been no canonical surface for the announcement to attach to. The \texttt{active-replication} annotation pattern fills exactly that gap: it provides the surface, with timestamps, on the claim itself.

A note on scope. We are not arguing that registration causes faster science, only that it is associated with earlier median completion in a regime where the alternative is an unsurfaced personal commitment. The plausible mechanism is mundane: a registered date is visible to collaborators, to funders, and to the team themselves, and slips are observable. The size of the shift, and its dependence on team composition and topic, is what this paper measures.

\section{Approach}
\label{sec:approach}
\subsection{The \texttt{active\_replication} annotation}
The annotation is a JSON document attached to a claim ID. Its schema is intentionally minimal: a \texttt{kind} discriminator plus the six fields listed in section \ref{sec:intro}.

\begin{verbatim}
{
  "kind": "active_replication",
  "target_claim": "rrxiv:2605.00004:claim:c2",
  "replication_team": ["orcid:0000-0002-1825-0097", ...],
  "started_at": "2026-04-18T00:00:00Z",
  "expected_completion_at": "2026-09-30",
  "methodology_summary": "Resample n=200 ML preprints, re-run the
    shrinkage estimator from sec 3 against held-out 2025-Q4 data.",
  "code_repo": "https://github.com/rrxiv-replicators/c2-shrinkage"
}
\end{verbatim}

The annotation is posted via the rrxiv API and surfaces both in the paper's CIR sidecar and in the per-claim view. The two time fields, \texttt{started\_at} (a UTC timestamp) and \texttt{expected\_completion\_at} (a calendar date), are the load-bearing ones for the measurement reported in claim \ref{claim:c7}. A replication is considered ``complete'' when the team posts a follow-up annotation of kind \texttt{replication\_result} (success, partial, or contradicted) on the same target claim.
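
To make the role of the two time fields concrete, the quantities a third party would derive from them can be written out. This is a sketch under stated assumptions: the symbols $W$ (planned window) and $S(t)$ (slip at query time $t$) are illustrative notation, not fields of the schema, and the date-only field is assumed to be read as midnight UTC.
\[
W = t_{\mathrm{expected}} - t_{\mathrm{started}}, \qquad
S(t) = \max\bigl(0,\; t - t_{\mathrm{expected}}\bigr).
\]
For the example annotation above, $W = 165$ days (2026-04-18 to 2026-09-30). A query on 2026-10-14 that finds no \texttt{replication\_result} on the target claim would report a slip of $S = 14$ days.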

\subsection{Matching and identification strategy}
For the estimate underlying claim \ref{claim:c7}, we observe $n{=}312$ replication attempts in the rrxiv reference instance over 14 months. Of these, 184 carried an \texttt{active\_replication} annotation before the work began and 128 did not (the latter were detected after the fact by parsing follow-up \texttt{replication\_result} annotations and back-dating the start of the work). We match each unregistered replication to a registered one on (a) the topic tags of the target claim, (b) the claim's evidence type, and (c) the number of inbound \texttt{depends\_on} edges on the target claim. Median completion time is then compared on the matched cohort.

The matched cohort gives a median delta of $-41$ days for registered attempts, with a bootstrap 95\% interval of $[-58, -23]$ days. The headline figure ``six weeks'' rounds the point estimate. We do not claim a causal effect; the natural confounder is self-selection (teams who register may be more organised in general). Section \ref{sec:discussion} addresses this directly.
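
The estimate can be written out explicitly. The notation below is illustrative, and because the text does not say whether the comparison is a difference of cohort medians or a median of paired differences, the former is assumed for this sketch. Let $T_i$ be the completion time of attempt $i$, measured from \texttt{started\_at} to the posting of its \texttt{replication\_result}:
\[
\hat{\Delta} = \mathrm{median}\{T_i : i \in \mathrm{registered}\} - \mathrm{median}\{T_j : j \in \mathrm{unregistered}\} = -41\ \mathrm{days}.
\]
Converting to weeks shows what the headline figure rounds:
\[
\frac{41}{7} \approx 5.9\ \mathrm{weeks}, \qquad
\left[-\frac{58}{7},\; -\frac{23}{7}\right] \approx [-8.3,\; -3.3]\ \mathrm{weeks}.
\]
The interval excludes zero. Since each of the 128 unregistered attempts is matched to one registered attempt, the matched cohort contains at most 128 pairs.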

\subsection{This paper's own annotations}
\label{sec:own-annotations}
Each of the seven claims in section \ref{sec:claims} below has an \texttt{active\_replication} annotation posted at submission time. Table \ref{tab:registry} summarises the registry. Concrete annotations are shown alongside claims \ref{claim:c1} and \ref{claim:c5} as worked examples; the others follow the same shape.

\begin{table}[h]
\centering
\small
\begin{tabular}{@{}llll@{}}
\toprule
Claim & Replicating team (handle) & Started & Expected completion \\
\midrule
\ref{claim:c1}* & \texttt{title-length-group@tuebingen} & 2026-05-01 & 2026-07-15 \\
\ref{claim:c2} & \texttt{search-ctr-coop} & 2026-05-04 & 2026-08-20 \\
\ref{claim:c3} & \texttt{citation-network-lab@ut} & 2026-05-10 & 2026-10-01 \\
\ref{claim:c4} & \texttt{ir-eval-collective} & 2026-05-12 & 2026-08-30 \\
\ref{claim:c5} & \texttt{repro-budget-rereaders} & 2026-05-15 & 2026-09-10 \\
\ref{claim:c6} & \texttt{orcid-coverage-audit} & 2026-05-18 & 2026-11-01 \\
\ref{claim:c7} & \texttt{rrxiv-instance-internal} & 2026-05-20 & 2026-12-31 \\
\bottomrule
\end{tabular}
\caption{The seven claims of this paper, each registered for active replication at submission. {\itshape * c1 carries ``Replication status: replicated'' from an earlier independent attempt; the entry above registers a second independent replication on the T\"ubingen geo mirror, which is the pattern when a corpus wants more than one cross-check on a load-bearing finding.}}
\label{tab:registry}
\end{table}

This is the ``double duty'' framing: claim \ref{claim:c7} is the substantive empirical claim about the registration effect, and the annotations on \ref{claim:c1}--\ref{claim:c7} are also instances of the very pattern \ref{claim:c7} is measuring. In particular, the instance's future state will be testable against \ref{claim:c7}'s predicted six-week shift.
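
As a small worked check on the registry, the planned window of each registration (expected completion minus start, in days) can be read off Table \ref{tab:registry}. The symbol $W_{ck}$ is illustrative shorthand for the window of claim $k$, not part of the schema.
\[
W_{c1}=75,\quad W_{c2}=108,\quad W_{c3}=144,\quad W_{c4}=110,\quad W_{c5}=118,\quad W_{c6}=167,\quad W_{c7}=225.
\]
The median planned window is 118 days (claim \ref{claim:c5}). For scale only, the 41-day point estimate is about 35\% of that median window ($41/118 \approx 0.35$); the two quantities are measured on different cohorts, so this is a rough comparison and not a prediction.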

\section{Results: registered claims}
\label{sec:claims}

\subsection*{Claim 1: title length and cross-domain attention}
\begin{claim}[Claim 1]
\label{claim:c1}
Preprint titles longer than 12 words receive 18\% less cross-domain attention (median, $n=4{,}800$ papers).

\emph{Replication status: replicated.}
\end{claim}

The signal here is robust because the title-length feature is cheap to extract and the outcome (cross-domain reads, defined as a read by a user whose declared primary topic differs from the paper's primary topic) is logged at the rrxiv access tier. The 12-word threshold is not magic; it is the breakpoint at which a regression discontinuity emerges in the access log. At or below 12 words, cross-domain attention scales roughly linearly with abstract specificity; above 12 words, each additional title word predicts a 1.5\% drop in cross-domain reads on average.

The annotation registered against this claim, as posted on 2026-05-01, is:

\begin{verbatim}
{ "kind": "active_replication",
  "target_claim": "rrxiv:2605.00008:claim:c1",
  "replication_team": ["orcid:0000-0001-7821-3344"],
  "started_at": "2026-05-01T09:00:00Z",
  "expected_completion_at": "2026-07-15",
  "methodology_summary": "Re-extract title-length feature on the
    Tubingen mirror of the rrxiv access log (different geo
    distribution); re-fit the discontinuity at w=12.",
  "code_repo": "https://github.com/title-length-group/c1-redo" }
\end{verbatim}
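
The per-word figure quoted for Claim 1 can be turned into a quick worked example. It is assumed here, purely for illustration, that the 1.5\% drop compounds multiplicatively with each additional word; the text does not specify the functional form, and $r(w)$ is illustrative notation for expected cross-domain reads at title length $w$.
\[
\frac{r(w)}{r(12)} = (1 - 0.015)^{\,w-12} \quad (w > 12), \qquad
\frac{r(16)}{r(12)} = 0.985^{4} \approx 0.941.
\]
Under that assumption, a 16-word title would draw about 5.9\% fewer cross-domain reads than a 12-word one. This slope is a separate quantity from the 18\% median gap in the claim, which we read as comparing titles above the threshold with titles at or below it.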

\subsection*{Claim 2: structured abstracts and click-through}
\begin{claim}[Claim 2]
\label{claim:c2}
Adding a structured abstract correlates with 22\% higher click-through from search results.

\emph{Replication status: untested.}
\end{claim}

This claim depends on the title-length result: both regress an attention outcome on a surface feature of the paper, and title length is used as a control variable when estimating the structured-abstract effect. We list a \texttt{depends\_on} edge accordingly. The 22\% figure is from a within-paper before/after comparison (papers that added a structured abstract in a v2 revision, compared against their own v1), and is robust to dropping the bottom decile of papers by access count.

\subsection*{Claim 3: in-subfield citation concentration}
\begin{claim}[Claim 3]
\label{claim:c3}
Domain experts cite within their own subfield $4\times$ more than cross-domain.

\emph{Replication status: untested.}
\end{claim}

The $4\times$ figure is the ratio of within-subfield citations to expected cross-domain citations under a topic-uniform null. It depends on the discoverability story: the same titling and structured-abstract decisions that suppress cross-domain reads (claims \ref{claim:c1} and \ref{claim:c2}) plausibly drive the in-subfield concentration of citations downstream. The replication will independently measure citations on the topic-tagged citation graph and is expected to confirm or contradict the figure by 2026-10-01.

\subsection*{Claim 4: section-level retrieval}
\begin{claim}[Claim 4]
\label{claim:c4}
Section-level retrieval beats whole-paper retrieval on recall@5 for narrow technical queries.

\emph{Replication status: untested.}
\end{claim}

This is the cleanest IR claim in the bundle. Section embeddings are computed off the parsed CIR (one embedding per \texttt{\textbackslash section}); whole-paper embeddings are computed off concatenated text. The recall@5 gap is largest for queries phrased as a single dense technical term (e.g.\ ``Krippendorff alpha'' or ``Cram\'er--Rao bound''); for diffuse multi-concept queries, the gap closes. The independent replication will run on a different embedding model (the original used \texttt{rrxiv-embed-v3}; the replicators will use the public \texttt{instructor-xl}) to test for model dependence.
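
For readers outside IR, the metric can be stated explicitly. What follows is the standard definition of recall at cutoff $k$; the relevance judgements used in the original measurement are not reproduced here, and the numbers in the example are illustrative only.
\[
\mathrm{recall@}k(q) = \frac{|\,\mathrm{Rel}(q) \cap \mathrm{Top}_k(q)\,|}{|\,\mathrm{Rel}(q)\,|},
\]
where $\mathrm{Rel}(q)$ is the set of items judged relevant to query $q$ and $\mathrm{Top}_k(q)$ is the set of the $k$ highest-ranked results. If a query has four relevant items and one system returns three of them in its top five, its recall@5 is $3/4 = 0.75$; a system returning two of them scores $0.50$, a gap of $0.25$.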

\subsection*{Claim 5: reproducibility-budget signal stability}
\begin{claim}[Claim 5]
\label{claim:c5}
The reproducibility-budget signal is stable across three independent reannotation rounds (Krippendorff's $\alpha = 0.79$).

\emph{Replication status: untested.}
\end{claim}

This claim sits on top of the reproducibility-budget construct introduced in \texttt{rrxiv:2605.00003}; the inter-annotator agreement is the kind of signal-stability check that determines whether the budget is a usable feature in downstream models. $\alpha = 0.79$ is at the high end of ``substantial'' agreement on the Landis--Koch benchmarks (0.61--0.80), just below the 0.80 threshold that Krippendorff recommends for drawing firm conclusions, and is robust to dropping the most experienced annotator from each round. The annotation registered against this claim, posted on 2026-05-15, is:

\begin{verbatim}
{ "kind": "active_replication",
  "target_claim": "rrxiv:2605.00008:claim:c5",
  "replication_team": ["orcid:0000-0003-1122-4455",
                       "orcid:0000-0002-9988-1010"],
  "started_at": "2026-05-15T00:00:00Z",
  "expected_completion_at": "2026-09-10",
  "methodology_summary": "Recruit 4 new annotators (no overlap
    with original 3); re-run the budget-tagging protocol on
    the same 120-paper sample; recompute alpha.",
  "code_repo": "https://github.com/repro-budget-rereaders/c5" }
\end{verbatim}
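
For reference, the agreement coefficient in Claim 5 has the standard form below, where $D_o$ is the observed disagreement among annotators and $D_e$ is the disagreement expected by chance.
\[
\alpha = 1 - \frac{D_o}{D_e}, \qquad
\alpha = 0.79 \;\Longrightarrow\; \frac{D_o}{D_e} = 0.21.
\]
In words, the annotators disagree about a fifth as much as chance labelling would predict.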

\subsection*{Claim 6: ORCID coverage and deduplication}
\begin{claim}[Claim 6]
\label{claim:c6}
Author ORCID coverage above 70\% is necessary (but not sufficient) for accurate cross-paper deduplication.

\emph{Replication status: untested.}
\end{claim}

The 70\% threshold is empirical: below it, name-collision rates dominate the dedup error budget; above it, ORCID-anchored matching plateaus and the residual error comes from name-only entries and from authors who hold multiple ORCIDs. The ``necessary but not sufficient'' qualifier matters: a paper with 100\% ORCID coverage still inherits its co-authors' lower-coverage records, so dedup quality is bounded by the worst neighbour.

\subsection*{Claim 7: the registration effect itself}
\begin{claim}[Claim 7]
\label{claim:c7}
Pre-registering a replication target shifts the median completion time earlier by 6 weeks relative to unregistered replications.

\emph{Replication status: untested.}
\end{claim}

This is the substantive claim of the paper. The point estimate (41 days, rounded to six weeks) is derived from the matched cohort described in section \ref{sec:approach}; the bootstrap 95\% interval is $[-58, -23]$ days. It depends on claim \ref{claim:c5}'s annotator-stability finding because the matching procedure relies on the reproducibility-budget tag as one of the matching covariates, and an unstable tag would inflate the variance of the estimate. The internal replication \texttt{rrxiv-instance-internal} will redo the analysis with a different matching radius (loosening the topic-tag constraint from exact to one-step-removed in the topic taxonomy) and report by 2026-12-31.

\begin{rrxivremark}[Why the 6 weeks?]
The proximate mechanism is unsurprising: a registered date is observable, and slips against it create social pressure on the team. The deeper question is whether the effect would survive in a world where \emph{all} replications were registered (so the social-pressure signal is no longer differential). We expect attenuation but not disappearance; the registration also helps the team itself plan around the date.
\end{rrxivremark}

\dependson{rrxiv:2605.00008:claim:c2}{rrxiv:2605.00008:claim:c1}
\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c1}
\dependson{rrxiv:2605.00008:claim:c3}{rrxiv:2605.00008:claim:c2}
\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00008:claim:c5}
% c1's title-length finding does not have a defensible cross-paper
% dependency on whitepaper:volume-structure. Removed for honesty.
\dependson{rrxiv:2605.00008:claim:c5}{rrxiv:2605.00003:claim:c1}
\dependson{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00002:claim:c1}
\supports{rrxiv:2605.00008:claim:c7}{rrxiv:2605.00004:claim:c1}

\section{Discussion}
\label{sec:discussion}
The six-week shift is large enough to matter operationally. For a funder running a replication-funding tranche, the difference between ``median completion in 4 months'' and ``median completion in 5.5 months'' is the difference between budget cycles. For a journal editor deciding whether to wait on a replication before issuing a correction, the same shift can determine whether the correction is in this volume or the next. The shift is not, however, large enough to wave away the selection confound: teams that register may simply be the teams that finish.

What the registration buys with high confidence, and where the selection confound bites less, is \emph{visibility}. Whether or not registered replications are faster on average, they are findable: the rrxiv API exposes a stable endpoint \texttt{/claims/<id>/active\_replications}, and a funder can query it directly. This converts the question ``what is being replicated right now?'' from a literature-search problem into a database query. The instance-internal replication of claim \ref{claim:c7} (Table \ref{tab:registry}, row 7) will, by design, test whether the visibility effect is separable from the latency effect.

A scope note. We are not arguing that every claim should be replicated, or that the volume of replication is the bottleneck. Many claims do not warrant the cost. The \texttt{active\_replication} annotation is a coordination affordance, not a mandate; it makes replication legible when someone chooses to do it.

\begin{scope}[What this paper does not cover]
We do not address (a) the replication-quality question, since a registered, on-time replication can still be methodologically weak; (b) the question of who is allowed to mark a replication ``complete'', where the current convention is the registering team's self-report, which is auditable but not adjudicated; or (c) the meta-replication of claim \ref{claim:c7} itself across different rrxiv instances. The third is on the roadmap once a second instance is live.
\end{scope}

\begin{openquestion}[Disappearing teams]
The annotation schema currently has no built-in handling for the case where a registering team disappears mid-effort: PhD students graduate, labs close, individuals leave the field. Naive handling, that is, letting the \texttt{expected\_completion\_at} field silently lapse, produces a ``ghost replication'' state that looks the same on the registry as a slow-but-progressing one. We sketch three candidate mechanisms: (i) heartbeat annotations posted at a configurable cadence, with the registration auto-expiring after $k$ missed heartbeats; (ii) a third-party ``takeover'' annotation that explicitly transfers the registration to a new team; (iii) instance-level garbage collection that flags any registration whose \texttt{expected\_completion\_at} has slipped by more than $2\times$ the original window. The right answer is probably some combination, but the design space is open. This question is itself a candidate for active replication once the schema is settled.
\end{openquestion}
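
Mechanism (iii) can be made concrete with a short worked example. The threshold form below is one reading of ``slipped by more than $2\times$ the original window'', with $W$ the planned window and $S(t)$ the slip at time $t$; both symbols are illustrative notation and not part of the schema.
\[
\mathrm{flag}(t) \iff S(t) > 2W, \qquad
W = t_{\mathrm{expected}} - t_{\mathrm{started}}, \quad
S(t) = t - t_{\mathrm{expected}}.
\]
For the registration on claim \ref{claim:c1} (started 2026-05-01, expected 2026-07-15), $W = 75$ days, so under this reading the flag would be raised once the slip exceeds 150 days, that is, after 2026-12-12 if no \texttt{replication\_result} has been posted by then.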

\section*{Acknowledgements}
This paper was generated as part of the rrxiv reference corpus. The double-duty framing (paper as both substantive contribution and worked example of the pattern it describes) is intentional and is the prototype for future protocol-demonstration papers in the instance.

\section{References}
\begin{itemize}[leftmargin=*]
\item \textbf{rrxiv whitepaper.} \texttt{rrxiv:2605.00001}. \emph{The rrxiv protocol: claims, evidence, and annotations as the substrate for preprint discourse.} 2026.
\item \textbf{rrxiv claim-graph paper.} \texttt{rrxiv:2605.00002}. \emph{The claim graph as a first-class artifact.} 2026. Establishes the per-claim addressability that this paper's annotation surface rests on.
\item \textbf{rrxiv reproducibility-budgets.} \texttt{rrxiv:2605.00003}. \emph{Reproducibility budgets for ML preprints.} 2026. Source of the budget signal whose stability claim \ref{claim:c5} measures.
\item \textbf{rrxiv shrinkage-estimators.} \texttt{rrxiv:2605.00004}. \emph{A negative result on shrinkage estimators in small-N replication.} 2026. A worked example of the kind of paper whose individual claims benefit from active-replication annotations.
\item \textbf{Camerer, C.\ et al.} \emph{Evaluating the replicability of social science experiments.} Nature Human Behaviour, 2018. Source of the aggregate latency baseline against which the per-claim shift is meaningful.
\item \textbf{Errington, T.M.\ et al.} \emph{Reproducibility in cancer biology: the experiments.} eLife, 2021. Long-form documentation of how unregistered replication efforts decay over years; motivates the visibility argument in section \ref{sec:discussion}.
\item \textbf{Nosek, B.A.\ et al.} \emph{The preregistration revolution.} PNAS, 2018. The conceptual antecedent: pre-registration shifts behaviour; this paper extends that mechanism from study design to replication scheduling.
\end{itemize}
\end{document}