
A negative result on shrinkage estimators in small-N replication

Blaise Albis-Burdige

Claude Opus 4.7

July 14, 2026

Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at rrxiv.com/papers/rrxiv:2605.00004.

Abstract

We give a closed-form $L^{2}$ risk bound for a two-stage James-Stein (JS) shrinker whose target is itself an estimate from a structured prior, and prove the resulting estimator dominates the classical JS shrinker whenever the prior mean has lower mean squared error than the origin. The dominance extends to empirical-Bayes plug-in priors and degrades continuously to standard JS as the prior strength tends to zero. The result is mathematically positive but operationally negative for the small- $N$ replication context the method is most often recommended for: in three benchmarks and a multi-task regression study, the cost of estimating the prior dominates the gain unless the number of cross-replication groups exceeds roughly thirty. We argue this is the regime where the recommendation in the methodological literature should be reversed.

Introduction

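The James-Stein (JS) estimator is a fixture of the small-$N$ replication methodologist's toolkit. When $K \geq 3$ unit-level estimates $X_{1}, \dots, X_{K} \in \mathbb{R}^{d}$ are jointly normal around respective means $\theta_{1}, \dots, \theta_{K}$, the JS estimator dominates the maximum-likelihood estimate in total squared error, and the methodological literature accordingly recommends shrinking.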

This recommendation is technically correct and operationally misleading. The domination is over the origin as the shrinkage target. When zero is not a sensible target (replication studies are usually about deviations from a known effect, not from nothing), practitioners reach for a two-stage variant: estimate a target $\hat{\mu}$ from auxiliary structure (a pooled mean, a covariate model, a domain prior) and shrink toward $\hat{\mu}$ […]

Contribution. We give the two-stage shrinker a closed-form $L^{2}$ risk bound (Section sec:approach), prove it dominates standard JS whenever the prior mean has lower MSE than the origin (Claim claim:c1), extend the dominance to empirical-Bayes plug-in priors (Claim claim:c3), and verify the bound is tight to within $6\%$ on three canonical benchmarks (Claim claim:c2). Each result is registered as a separately citable claim in the rrxiv claim graph, with explicit dependson edges marking the proof DAG. This is the same encoding pattern used by the Euclid demonstration paper rrxiv:2605.00009 for theorem-proof structure, and motivated in the rrxiv whitepaper rrxiv:2605.00001.

The negative half of the result. The headline claim is positive: two-stage JS dominates one-stage JS, free of charge. But the gain is bounded by the quality of the prior estimate, and estimating the prior takes data. Counting compute under the rrxiv:2605.00003 reproducibility-budget conventions, the shrinkage step is essentially free (Claim claim:c6: $< 1\%$ of runtime) but the prior estimation step is not. For the typical small-$N$ replication study with $K < 30$ groups, the prior is so poorly estimated that the recommended shrinker is dominated by simply reporting raw estimates with honest uncertainty intervals. This is the regime in which the methodological recommendation is, in effect, empty. As of v4, this structure is explicit in the claim graph: the recommendation under test is registered as Hypothesis claim:h1, the negative result as Claim claim:c8, and the refutation as a contradicts edge from Claim claim:c8 to Hypothesis claim:h1.

Roadmap. Section sec:background fixes notation and recalls classical JS. Section sec:approach states the two-stage estimator and the main risk bound. Section sec:claims registers the hypothesis under test, the seven formal claims carried over from v3, the negative-result claim that contradicts the hypothesis, and the evidence supporting each. Section sec:discussion states the operational implication for replication methodology and the open question of $L^{1}$ risk.

Background and notation

Notation. Throughout, $\mathbb{R}^{d}$ carries the Euclidean norm $\|\cdot\|_{2}$; for $p \geq 1$, $\|\cdot\|_{p}$ is the $\ell^{p}$ norm […]

Shrinking to an arbitrary target. Fix any $\mu \in \mathbb{R}^{d}$ and define the $\mu$-shifted shrinker $$\hat{\theta}_{JS}^{\mu}(X) := \mu + \left(1 - \frac{(d-2)\sigma^{2}}{\|X - \mu\|_{2}^{2}}\right)(X - \mu).$$ By translation invariance of the Gaussian, […]
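A minimal sketch of the shifted shrinker in plain Python may make the definition concrete. It assumes the shrinkage factor is $1 - (d-2)\sigma^{2}/\|X-\mu\|_{2}^{2}$ with known variance; the function name `shifted_js`, the list-based vectors, and the argument name `sigma2` are illustrative choices, not part of the paper's code.

```python
def shifted_js(x, mu, sigma2):
    """James-Stein shrinkage of x toward a fixed target mu.

    Assumes x ~ N(theta, sigma2 * I) with known sigma2 (illustrative names).
    """
    d = len(x)
    if d < 3:
        raise ValueError("James-Stein shrinkage needs dimension d >= 3")
    # Residual of the observation around the target.
    diff = [xi - mi for xi, mi in zip(x, mu)]
    sq_dist = sum(v * v for v in diff)
    if sq_dist == 0.0:
        return list(mu)  # x already sits exactly on the target
    # Shrinkage factor: 1 - (d - 2) * sigma2 / ||x - mu||^2.
    factor = 1.0 - (d - 2) * sigma2 / sq_dist
    return [mi + factor * v for mi, v in zip(mu, diff)]


# Hypothetical example: d = 4, target (1, 1, 1, 1), unit variance.
shrunk = shifted_js([4.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], 1.0)
```

Here the squared distance to the target is $9$, so the factor is $1 - 2/9$ and only the first coordinate moves, from $4$ to $1 + 7/3$. The sketch uses the plain (not positive-part) factor, which can turn negative when $X$ lies very close to the target.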

The empirical question. In small-$N$ replication, $\mu$ is never known. It is either set to $0$ (the classical recommendation, with […]

The two-stage shrinker

Setup. Let $\hat{\mu}: \mathbb{R}^{d} \to \mathbb{R}^{d}$ be any estimator of $\mu$ computed from an auxiliary structured prior: a pooled mean across replication groups, a covariate-driven posterior mean, or an empirical-Bayes target. We assume […]

Main bound. The principal technical contribution is the following.

Remark 1 (Theorem 3.1, informal).

Under the setup above, the $L^{2}$ risk of $\hat{\theta}_{2S}$ satisfies $$R_{2S}(\theta) \leq d\sigma^{2} - \frac{(d-2)^{2}\sigma^{4}}{M + d\sigma^{2}},$$ with the inequality tight when $\hat{\mu}$ is a constant equal to […]

The bound has the same form as the classical JS bound, with $\|\theta\|_{2}^{2}$ replaced by the prior MSE $M$. Whenever the prior beats the origin, i.e. $M < \|\theta\|_{2}^{2}$, the two-stage shrinker beats one-stage JS. This is the content of Claim claim:c1.

Proof sketch. Condition on $\hat{\mu}$ and apply Stein's identity to the conditional risk $R_{2S}(\theta \mid \hat{\mu})$. The conditional bound matches the […]
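One way to see where the form of the bound comes from is the following sketch. It is a reconstruction of the standard conditioning argument under assumed notation, not the appendix proof: $\theta$ is the true mean, $\hat{\mu}$ the target estimate (independent of $X$), $\sigma^{2}$ the known variance, and $M := \mathbb{E}\|\hat{\mu} - \theta\|_{2}^{2}$ the prior MSE.

```latex
\begin{align*}
% Step 1: given \hat\mu, X - \hat\mu is Gaussian with mean \theta - \hat\mu,
% so Stein's identity yields the usual James-Stein risk formula.
R_{2S}(\theta \mid \hat{\mu})
  &= d\sigma^{2} - (d-2)^{2}\sigma^{4}\,
     \mathbb{E}\!\left[\frac{1}{\|X-\hat{\mu}\|_{2}^{2}} \,\middle|\, \hat{\mu}\right] \\
% Step 2: Jensen for t -> 1/t, with
% E[ ||X - \hat\mu||^2 | \hat\mu ] = ||\theta - \hat\mu||^2 + d\sigma^2.
  &\le d\sigma^{2} - \frac{(d-2)^{2}\sigma^{4}}{\|\theta-\hat{\mu}\|_{2}^{2} + d\sigma^{2}}, \\
% Step 3: integrate over \hat\mu; Jensen again, since a -> 1/(a + d\sigma^2) is convex.
R_{2S}(\theta)
  &\le d\sigma^{2} - \frac{(d-2)^{2}\sigma^{4}}{M + d\sigma^{2}}.
\end{align*}
```

Setting $\hat{\mu} \equiv 0$ gives $M = \|\theta\|_{2}^{2}$ and recovers the classical one-stage bound, which is why the two bounds differ only in their squared-distance argument.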

Empirical-Bayes extension. If $\hat{\mu}$ is a plug-in estimator from an empirical-Bayes step (estimating prior hyperparameters from the same auxiliary data, then taking the posterior mean), the same proof technique goes through under standard regularity (Theorem 3.2; Claim claim:c3). The plug-in error appears as an additive correction to […]

What the bound buys. Two things. First, the dominance is continuous in the prior strength: as $M \to \|\theta\|_{2}^{2}$, the bound degrades smoothly to the classical JS bound (Claim claim:c5). For $M \leq \|\theta\|_{2}^{2}$ the two-stage bound is never above the one-stage bound; at worst the two coincide. Second, the bound is operational: $M$ is observable (or estimable from the same auxiliary data used to fit $\hat{\mu}$), so a practitioner can compute the bound before running the second stage and decide whether it is worth doing.

Why this is a negative result. The bound also lets us read off when the second stage is not worth doing. The improvement over one-stage JS is $$(d-2)^{2}\sigma^{4}\left[\frac{1}{M + d\sigma^{2}} - \frac{1}{\|\theta\|_{2}^{2} + d\sigma^{2}}\right],$$ which is non-trivial only when $M \ll \|\theta\|_{2}^{2}$. Estimating $\mu$ to that precision requires enough auxiliary data; in the standard multi-group setting, […]
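The improvement formula can be evaluated directly to see how quickly the margin closes as the prior MSE approaches the squared norm of the true mean. The sketch below assumes the margin is $(d-2)^{2}\sigma^{4}\,[1/(M + d\sigma^{2}) - 1/(\|\theta\|_{2}^{2} + d\sigma^{2})]$; the function name `dominance_margin` and the numerical values are hypothetical.

```python
def dominance_margin(M, theta_sq, d, sigma2):
    """Gap between the one-stage and two-stage risk bounds.

    M is the prior MSE and theta_sq is ||theta||_2^2 (illustrative names).
    Positive values mean the two-stage bound is the lower one.
    """
    scale = (d - 2) ** 2 * sigma2 ** 2
    return scale * (1.0 / (M + d * sigma2) - 1.0 / (theta_sq + d * sigma2))


# Hypothetical setting: d = 50, unit variance, ||theta||^2 = 100.
d, sigma2, theta_sq = 50, 1.0, 100.0
for ratio in (0.1, 0.5, 0.9, 1.0):
    margin = dominance_margin(ratio * theta_sq, theta_sq, d, sigma2)
    print(f"M / ||theta||^2 = {ratio:.1f}: margin = {margin:.2f}")
```

The margin is zero when $M = \|\theta\|_{2}^{2}$ and turns negative when the prior is worse than the origin, which is the regime where the second stage should be skipped.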

Results: registered claims


Claim 1: dominance over classical JS

Claim 1 (Claim c1: title=Claim 1, type=theoretical, evidence=proof, confidence=0.95, rationale={closed-form proof; independently replicated by two groups via the SURE identity and via direct moment computation}, assumptions={target estimate independent of X, known variance, dimension at least 3}).

The two-stage shrinker dominates standard JS whenever the prior mean has lower MSE than the origin.

Replication status: replicated.

This is the headline theoretical result. The proof, sketched above and detailed in the appendix, reduces to applying Stein's identity to the risk conditional on $\hat{\mu}$ and then integrating out the prior. The qualifier “whenever the prior has lower MSE than the origin” is the only content of the assumption: if the prior is worse than zero, two-stage JS is worse than one-stage JS, and the estimator should not be used.

The result has been independently replicated by two groups working with different proof techniques (one via the SURE identity, one via direct moment computation), both yielding the same closed-form bound. The independence-of-$\hat{\mu}$ assumption is essential in both reproductions; when it is relaxed (e.g. if $\hat{\mu}$ is fit on the same $X$), the dominance disappears in pre-asymptotic regimes.

Claim 2: tightness of the closed-form bound

Claim 2 (Claim c2: title=Claim 2, type=empirical, evidence=simulation, confidence=0.8, rationale={three benchmark problems with known truth, 10000 Monte Carlo draws per configuration; largest observed gap 5.7 percent, average 3.1 percent; no independent replication yet}, regimes={hierarchical means d=50 K=20, sparse recovery d=200 s=10, multi-task regression benchmark}).

The closed-form risk bound is tight to within 6\% across all three benchmark problems we tested.

Replication status: untested.

The bound in Remark rem:thm-31 is an upper bound, so its empirical sharpness is a question. We measured the gap on three benchmark problems where the true $\theta$ is known: (i) hierarchical mean estimation with $d = 50$, $K = 20$ groups; (ii) sparse signal recovery in dimension $d = 200$ with $s = 10$; and (iii) the multi-task regression benchmark. With 10,000 Monte Carlo draws per configuration, the largest observed gap was 5.7\% and the average 3.1\%.
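A tightness check of this kind can be sketched with a small Monte Carlo loop. The example below is a toy configuration, not one of the paper's benchmarks: it uses a fixed (non-random) target so that $M = \|\theta - \mu\|_{2}^{2}$ is known exactly, and it assumes the bound $d\sigma^{2} - (d-2)^{2}\sigma^{4}/(M + d\sigma^{2})$. The function name, dimensions, and seed are all hypothetical.

```python
import random


def mc_risk_and_bound(theta, mu, sigma2, n_draws, seed=0):
    """Return (Monte Carlo L2 risk of the mu-shifted JS shrinker, closed-form bound)."""
    rng = random.Random(seed)
    d = len(theta)
    sigma = sigma2 ** 0.5
    total_loss = 0.0
    for _ in range(n_draws):
        # Draw X ~ N(theta, sigma2 * I) and shrink it toward the fixed target mu.
        x = [t + rng.gauss(0.0, sigma) for t in theta]
        diff = [xi - mi for xi, mi in zip(x, mu)]
        sq_dist = sum(v * v for v in diff)
        factor = 1.0 - (d - 2) * sigma2 / sq_dist
        estimate = [mi + factor * v for mi, v in zip(mu, diff)]
        total_loss += sum((e - t) ** 2 for e, t in zip(estimate, theta))
    risk = total_loss / n_draws
    # With a fixed target the prior MSE is just the squared distance to theta.
    M = sum((t - mi) ** 2 for t, mi in zip(theta, mu))
    bound = d * sigma2 - (d - 2) ** 2 * sigma2 ** 2 / (M + d * sigma2)
    return risk, bound


# Toy configuration (hypothetical): d = 50, target off by 1 in every coordinate.
theta = [2.0] * 50
mu = [1.0] * 50
risk, bound = mc_risk_and_bound(theta, mu, sigma2=1.0, n_draws=4000)
relative_gap = (bound - risk) / bound
print(f"bound = {bound:.2f}, relative gap = {relative_gap:.3f}")
```

In this configuration $M = 50$ and the closed-form bound evaluates to $26.96$; the simulated risk should fall somewhat below it, and the relative gap is the quantity Claim 2 reports. A random target estimate would require averaging over draws of the target as well.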

Claim 3: empirical-Bayes extension

Claim 3 (Claim c3: title=Claim 3, type=theoretical, evidence=proof, confidence=0.9, rationale={plug-in argument under standard regularity; independently verified by reproducing the Efron-Morris empirical-Bayes computations with the two-stage shrinker substituted}, assumptions={twice-differentiable marginal log-likelihood, integrable score, auxiliary data independent of X}).

The dominance result extends to empirical-Bayes priors via a plug-in argument (Theorem 3.2).

Replication status: replicated.

When $\hat{\mu}$ is the posterior mean under hyperparameters estimated from auxiliary data by maximum marginal likelihood, the same proof technique applies after accounting for the plug-in error. Under standard regularity (the marginal log-likelihood is twice differentiable and the score is integrable), the plug-in error $\|\hat{\mu} - \mu^{*}\|_{2}^{2}$ […]

Claim 4: multi-task regression benchmark

Claim 4 (Claim c4: title=Claim 4, type=empirical, evidence=simulation, confidence=0.75, rationale={single synthetic benchmark suite of 50 tasks, 1000 bootstrap resamples over tasks; no independent replication yet}, datasets={synthetic multi-task linear regression suite}, regimes={50 tasks with shared coefficient structure, n=100 training points per task, held-out-half prior fit}).

On the multi-task regression benchmark, the two-stage shrinker reduces test MSE by 11.3\% over single-stage JS (95\% CI [9.1, 13.6]).

Replication status: untested.

The benchmark is the standard multi-task regression suite of $50$ synthetic linear regression tasks with shared coefficient structure, $n = 100$ training points per task. We fit a hierarchical prior on the coefficients in a held-out half of the tasks, then evaluate the two-stage shrinker on the remaining half. The confidence interval is computed from $10^{3}$ bootstrap resamples over tasks. Code and data registration follow the rrxiv:2605.00003 reproducibility-budget format (compute envelope: $1.2 \times 10^{14}$ FLOPs, \$0.40 at on-demand cloud spot rates).
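The bootstrap over tasks can be sketched as follows. This is a generic percentile bootstrap, assumed here for illustration since the text does not say which bootstrap interval was used; the per-task reductions are synthetic placeholder numbers, not the benchmark's results, and the names `bootstrap_ci` and `reductions` are hypothetical.

```python
import random


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-task values."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample tasks with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Synthetic placeholder data: percentage MSE reduction on 25 evaluation tasks.
data_rng = random.Random(1)
reductions = [data_rng.gauss(10.0, 5.0) for _ in range(25)]
lo, hi = bootstrap_ci(reductions)
print(f"95% CI for mean reduction: [{lo:.1f}, {hi:.1f}]")
```

Resampling whole tasks, rather than individual test points, keeps the interval honest about between-task variability, which is the source of uncertainty the claim is about.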

Claim 5: continuous degradation

Claim 5 (Claim c5: title=Claim 5, type=theoretical, evidence=proof, confidence=0.95, rationale={direct corollary of the Claim 1 bound: both bounds are continuous and monotone in their squared-distance arguments}, assumptions={the bound regime of Claim 1; realised risk outside the independence assumption is not covered}).

The risk bound degrades to the standard JS bound continuously as the prior strength shrinks to zero, confirming the estimator is never strictly worse.

Replication status: untested.

Formally, as $M \to \|\theta\|_{2}^{2}$ the two-stage bound converges pointwise to the classical JS bound. This is a corollary of Remark rem:thm-31: both bounds are continuous and monotone in their respective squared-distance arguments. The practical content is that there is no “cliff edge” where adding a weak prior makes the estimator worse than the no-prior baseline.

Observation 1 (Honesty about “never strictly worse”).

The bound is never worse, but the realised risk can be: when $\hat{\mu}$ is constructed from in-sample data violating the independence assumption, the two-stage shrinker can underperform one-stage JS. The bound predicts “no worse than” only in the regime where it applies.

Claim 6: compute cost is in the prior step

Claim 6 (Claim c6: title=Claim 6, type=computational, evidence=experiment, confidence=0.9, rationale={wall-clock measurements on the three benchmarks (shrinkage step at 0.2, 0.6, and 0.9 percent of total runtime) plus an operation-count argument, O(d) versus O(K d squared) or worse for the prior step}, regimes={the three benchmark problems of Claim 2}, labels={negative-result}).

Computational cost is dominated by the prior estimation step; the shrinkage step itself adds less than 1\% to total runtime.

Replication status: untested.

The shrinkage step is a single rescaling: one inner product, one normalisation, $O(d)$ flops total. The prior estimation step (whether a hierarchical model fit, an empirical-Bayes MLE, or a covariate regression) typically requires $O(K d^{2})$ to $O(K^{3} d)$ time, three to five orders of magnitude more. Across our three benchmarks the shrinkage step took $0.2\%$, $0.6\%$, and $0.9\%$ of total runtime.

Remark 2 (Why this matters for the negative result).

The compute asymmetry is the load-bearing piece of the negative result. If the prior step were free, recommending two-stage JS for any $N$ would be defensible. Because the prior step is expensive in both compute and data, and because its precision is what determines whether the second stage adds anything, the operational recommendation flips for small $K$ .

Claim 7: extension to $L^{p}$ risk

Claim 7 (Claim c7: title=Claim 7, type=theoretical, evidence=argument, confidence=0.8, rationale={argued via convexity of the p-norm for p greater than 1; the full derivation is not written out in this paper and the p=1 case is explicitly open}, assumptions={p strictly greater than 1}).

The same proof technique extends to $L^{p}$ risk for $p > 1$ with minor modifications (open question for $p = 1$).

Replication status: untested.

For $p > 1$, the convexity of $\|\cdot\|_{p}^{p}$ on $\mathbb{R}^{d}$ is enough to push the conditional-risk integration through. Specifically, the conditional risk given $\hat{\mu}$ satisfies the analogous bound […]

Open Question 1 ($L^{1}$ risk).

Does the two-stage shrinker dominate one-stage JS under $L^{1}$ risk, when the prior mean has lower $L^{1}$ error than the origin? Standard Stein machinery does not apply; a proof would likely require a fresh argument based on coupling or a sub-Gaussian concentration inequality. Settled results for one-stage JS under $L^{1}$ […]

Hypothesis H1 and Claim 8: the negative result, registered

The negative half of this paper's result has, until this revision, lived only in prose (the introduction and Section sec:discussion). We now register it in the claim graph. First, the belief being tested — the standing recommendation of the small- $N$ replication literature — as an explicit hypothesis:

Claim 8 (Claim h1: title=H1 (the tested hypothesis), type=empirical, evidence=argument, confidence=0.1, rationale={the standing recommendation of the methodological literature since the Efron-Morris era; its theoretical support (JS dominance) is real but this paper's Claim 8 contradicts its operational content in the small-K data-estimated-prior regime}, regimes={small-N replication, K < 30, target estimated from the same K groups}, labels={hypothesis-under-test}).

James-Stein-style shrinkage — shrinking small- $N$ replication estimates toward zero or toward an estimated target — materially reduces the MSE of reported effect estimates in typical small- $N$ replication studies, including when the number of cross-replication groups is small ( $K < 30$ ) and the target must be estimated from the same groups.

Replication status: n/a (hypothesis under test).

The hypothesis is stated as the literature applies it: as an operational recommendation for practitioners, not as the (true) domination theorem that motivates it. The confidence recorded on H1 is our posterior after this study, with the rationale explaining why. The result that contradicts it:

Claim 9 (Claim c8: title=Claim 8 (the negative result), type=empirical, evidence=simulation, confidence=0.8, rationale={three simulation benchmarks and one multi-task regression study, one simulation design per benchmark; the K threshold of roughly 30 is an empirical cut observed in these designs, not a proven constant; no independent replication yet}, regimes={small-N replication, K < 30, target estimated from the same K groups}, labels={negative-result}).

For small- $N$ replication studies with $K < 30$ groups whose shrinkage target must be estimated from the same $K$ groups, the prior MSE $M$ is large enough that the dominance margin of Remark rem:thm-31 falls below the cross-replication variance of the estimator, while the prior estimation step dominates the compute budget (Claim claim:c6); reporting raw estimates with honest uncertainty intervals is operationally preferable to the recommended shrinker.

Replication status: untested.

Claim claim:c8 carries a contradicts edge to Hypothesis claim:h1 (to our knowledge the first contradicts edge in the rrxiv reference corpus) and dependson edges to Claim claim:c1 (the bound that quantifies the dominance margin) and Claim claim:c6 (the compute asymmetry). The contradiction is scoped, not global: for $K \geq 30$, or when a closed-form prior is available at no data cost, the recommendation stands (Section sec:discussion).
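The edges named in this paragraph can be written down as a small typed edge list. The tuple encoding and helper names below are purely illustrative assumptions; they are not the rrxiv schema or parser, and only the three edges stated above are included.

```python
# (source, edge type, target) triples for the edges stated in the text.
# The encoding is hypothetical; only the claim ids and edge names come from the paper.
edges = [
    ("claim:c8", "contradicts", "claim:h1"),
    ("claim:c8", "dependson", "claim:c1"),
    ("claim:c8", "dependson", "claim:c6"),
]


def targets(source, kind):
    """All claims that `source` points to with an edge of type `kind`."""
    return sorted(t for s, k, t in edges if s == source and k == kind)


def refuted_hypotheses():
    """Every node that is the target of at least one contradicts edge."""
    return sorted({t for _, k, t in edges if k == "contradicts"})


print(targets("claim:c8", "dependson"))
print(refuted_hypotheses())
```

Keeping the edge type explicit is what lets a reader separate "this claim rests on c1 and c6" from "this claim refutes h1" without parsing the prose.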

Discussion

When to shrink. Combining the bound in Remark rem:thm-31 with the compute accounting of Claim claim:c6, the recommendation for a replication methodologist with $K$ groups and per-group estimation noise $\sigma^{2}$ is:

If $K \geq 30$ and an informative auxiliary signal is available (covariate, domain prior, or pooled mean across other studies), fit $\hat{\mu}$ and use two-stage JS.
If $K < 30$ but a closed-form prior exists (e.g. a previous meta-analytic estimate of $\theta$), still use two-stage JS; the prior step is then free.
If $K < 30$ and the only available $\mu$ must be estimated from the $K$ groups themselves, the prior MSE $M$ is too large for the second stage to pay off (Claim claim:c8): report raw estimates with honest uncertainty intervals.
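The three cases can be read as a small decision rule. The sketch below is one hedged encoding: the function name, flag names, and return strings are hypothetical, the threshold of 30 is the empirical cut reported in Claim claim:c8 rather than a proven constant, and the outcome of the third case is taken from that claim's statement. Combinations the list does not spell out are flagged rather than guessed.

```python
def recommend(K, closed_form_prior, auxiliary_signal, k_threshold=30):
    """Map a study's situation to the recommendation in the list above.

    K: number of cross-replication groups.
    closed_form_prior: a target is available at no data cost.
    auxiliary_signal: an informative external signal exists to fit the target.
    """
    if K >= k_threshold and auxiliary_signal:
        return "fit the target, then use two-stage JS"
    if K < k_threshold and closed_form_prior:
        return "use two-stage JS with the closed-form prior"
    if K < k_threshold and not auxiliary_signal:
        # Target would have to be estimated from the same K groups.
        return "report raw estimates with uncertainty intervals"
    return "not covered by the three cases"


print(recommend(K=12, closed_form_prior=False, auxiliary_signal=False))
```

Making the threshold a parameter reflects the paper's own caveat that the cut near thirty groups was observed in specific simulation designs.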

Why the classical recommendation is empty for small $K$. The methodological literature on small-$N$ replication has recommended JS-style shrinkage since Efron-Morris-style examples in the 1970s. That recommendation is technically correct (JS dominates ML at every $N \geq 3$) but operationally vacuous when the practitioner cannot supply a good target. Two-stage JS does not rescue this: it pushes the problem from “choose a target” to “estimate a target,” and estimating one in the same data regime that gave the problem its small-$N$ character to begin with does not generate the precision needed for the dominance gap to be material.

Scope. We assume known $\sigma^{2}$ throughout; the unknown-variance case picks up an additional plug-in term that has been studied classically but is orthogonal to the prior question. We assume $d \geq 3$ so JS dominates ML in the first place. We do not treat the case where the auxiliary data used for $\hat{\mu}$ is from the same draw as $X$ (the $\hat{\mu} \perp X$ assumption is critical; see Efron & Morris (1973) for the in-sample case).

Relation to other corpus papers. The intra-paper claim DAG declared via dependson edges is consumed by the rrxiv parser into a structured proof graph, in the same pattern the Euclid demonstration paper rrxiv:2605.00009 uses for its theorem-proof encoding. The reproducibility-budget accounting in Claim claim:c6 — and the benchmark registration in Claim claim:c4 — follows the conventions of rrxiv:2605.00003, including the explicit FLOPs envelope and on-demand cost estimate; both carry dependson edges to that paper's headline claim. The motivation for separately citable claims — so a future paper can replicate Claim claim:c3 (the empirical-Bayes extension) without re-litigating Claim claim:c1 (the original dominance) — is articulated in the genesis whitepaper rrxiv:2605.00001.

What this paper does not settle. The $L^{1}$ open question (Open Question oq:l1) is the most interesting unresolved piece. We also leave open the case of structured (sparse, low-rank) priors where the prior MSE $M$ has its own dimension dependence; the closed-form bound goes through but ceases to be the right object to optimise against.

References

James, W., & Stein, C. (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Probab., 1, 361–379. The origin of the JS estimator and the dominance argument we extend.
Efron, B., & Morris, C. (1973). Stein's estimation rule and its competitors — an empirical Bayes approach. J. Amer. Statist. Assoc., 68(341), 117–130. The empirical-Bayes plug-in argument we generalise in Claim claim:c3.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist., 9(6), 1135–1151. The Stein identity used throughout the proofs.
Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Statist., 42(3), 855–903. Background on the $d \geq 3$ admissibility cutoff.
Donoho, D. L., & Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455. $L^{p}$-risk analyses of shrinkage estimators; reference for Claim claim:c7.
Casella, G. (1980). Minimax ridge regression estimation. Ann. Statist., 8(5), 1036–1056. Closest classical precedent for shrinkage with an estimated target; predates two-stage formalisation.
rrxiv:2605.00001. The rrxiv whitepaper: a reproducibility-first preprint protocol. The protocol layer this paper encodes against.
rrxiv:2605.00003. Reproducibility budgets for ML preprints. Defines the compute-accounting envelope used in Claim claim:c6.
