May 26, 2026
Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at rrxiv.com/papers/rrxiv:2605.00004.
We give a closed-form risk bound for a two-stage James-Stein (JS) shrinker whose target is itself an estimate from a structured prior, and prove the resulting estimator dominates the classical JS shrinker whenever the prior mean has lower mean squared error than the origin. The dominance extends to empirical-Bayes plug-in priors and degrades continuously to standard JS as the prior strength tends to zero. The result is mathematically positive but operationally negative for the small- replication context the method is most often recommended for: in three benchmarks and a multi-task regression study, the cost of estimating the prior dominates the gain unless the number of cross-replication groups exceeds roughly thirty. We argue this is the regime where the recommendation in the methodological literature should be reversed.
The James-Stein (JS) estimator is a fixture of the small- replication methodologist's toolkit. When unit-level estimates are jointly normal around respective means
This recommendation is technically correct and operationally misleading. The dominance holds with the origin as the shrinkage target. When zero is not a sensible target (replication studies are usually about deviations from a known effect, not from nothing), practitioners reach for a two-stage variant: estimate a target
Contribution. We give the two-stage shrinker a closed-form risk bound (Section sec:approach), prove it dominates standard JS whenever the prior mean has lower mean squared error than the origin (Claim claim:c1), extend the dominance to empirical-Bayes plug-in priors (Claim claim:c3), and verify that the bound is tight to within 6\% on three canonical benchmarks (Claim claim:c2). Each result is registered as a separately citable claim in the rrxiv claim graph, with explicit dependson edges marking the proof DAG. This is the same encoding pattern that the Euclid demonstration paper rrxiv:2605.00009 uses for theorem-proof structure, and it is motivated in the rrxiv whitepaper rrxiv:2605.00001.
The negative half of the result. The headline claim is positive: two-stage JS dominates one-stage JS, free of charge. But the gain is limited by the quality of the prior estimate, and estimating the prior takes data. Counting compute under the rrxiv:2605.00003 reproducibility-budget conventions, the shrinkage step is essentially free (Claim claim:c6: it adds 1\% to total runtime), but the prior estimation step is not. For the typical small- replication study with fewer than roughly thirty groups, the prior is so poorly estimated that the recommended shrinker is dominated by simply reporting raw estimates with honest uncertainty intervals. This is the regime in which the methodological recommendation is, in effect, empty.
Roadmap. Section sec:background fixes notation and recalls classical JS. Section sec:approach states the two-stage estimator and the main risk bound. Section sec:claims registers the seven formal claims and the evidence supporting each. Section sec:discussion states the operational implication for replication methodology and the open question of $L_1$ risk.
Notation. Throughout, $\mathbb{R}^d$ carries the Euclidean norm $\|\cdot\|_2$; for , is the
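Shrinking to an arbitrary target. Fix any $\mu \in \mathbb{R}^d$ and define the JS estimator shrinking toward $\mu$:
$$\hat\theta_{JS}^{\mu}(X) := \mu + \left(1 - \frac{(d-2)\sigma^2}{\|X - \mu\|_2^2}\right)(X - \mu).$$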
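A minimal NumPy sketch of the estimator just defined, assuming $X \sim N(\theta, \sigma^2 I_d)$ with known `sigma2` and $d \ge 3$. The function name `js_toward` is our own, not from the paper or any library. It follows the formula as written, with no positive-part truncation of the shrinkage factor.

```python
import numpy as np


def js_toward(x: np.ndarray, mu: np.ndarray, sigma2: float) -> np.ndarray:
    """James-Stein estimate shrinking the observation x toward a fixed target mu.

    Assumes x ~ N(theta, sigma2 * I_d) with d >= 3 and sigma2 known.
    """
    d = x.shape[0]
    if d < 3:
        raise ValueError("James-Stein shrinkage needs d >= 3")
    resid = x - mu                              # deviation from the shrinkage target
    dist2 = float(resid @ resid)                # ||x - mu||_2^2: a single inner product
    factor = 1.0 - (d - 2) * sigma2 / dist2     # shrinkage factor, no positive-part clipping
    return mu + factor * resid                  # rescale the residual around the target
```

For example, with `x = [3, 0, 4, 0]`, `mu = 0` and `sigma2 = 1`, the factor is $1 - 2/25 = 0.92$ and the estimate is `[2.76, 0, 3.68, 0]`.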
The empirical question. In small- replication, $0
Setup. Let $\hat\mu : \mathbb{R}^d \to \mathbb{R}^d$ be any estimator of
Main bound. The principal technical contribution is the following.
Under the setup above, the risk of $\hat\theta_{JS}^{\hat\mu}(X)$ satisfies
$$R_{2S}(\theta) \le d\sigma^2 - \frac{(d-2)^2\sigma^4}{M + d\sigma^2},$$
with the inequality tight when
The bound has the same form as the classical JS bound, with $\|\theta\|_2^2$ replaced by $M$, the mean squared error of the target. Whenever the prior beats the origin, i.e. $M < \|\theta\|_2^2$, the two-stage shrinker beats one-stage JS. This is the content of Claim claim:c1.
Proof sketch. Condition on $R_{2S}()
Empirical-Bayes extension. If $
What the bound buys. Two things. First, the dominance is continuous in the prior strength: as the prior strength tends to zero, the bound degrades smoothly to the classical JS bound (Claim claim:c5). The estimator is never strictly worse than one-stage JS; at worst it matches it. Second, the bound is operational: $M$ is observable (or estimable from the same auxiliary data used to fit $\hat\mu$), so a practitioner can compute the bound before running the second stage and decide whether it is worth doing.
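A sketch of that pre-check, assuming known `sigma2` and an estimate `M_hat` of the target's mean squared error obtained from auxiliary data. Plugging in $\|X\|_2^2 - d\sigma^2$ for $\|\theta\|_2^2$ (unbiased when $X \sim N(\theta, \sigma^2 I_d)$) is our illustrative choice, and `worth_second_stage` with its `margin` parameter is a hypothetical decision rule, not the paper's procedure.

```python
import numpy as np


def js_bound(d: int, sigma2: float, sq_dist: float) -> float:
    """Upper bound d*sigma2 - (d-2)^2 * sigma2^2 / (sq_dist + d*sigma2).

    sq_dist = ||theta||^2 gives the classical one-stage JS bound;
    sq_dist = M (the target's MSE) gives the two-stage bound.
    """
    return d * sigma2 - (d - 2) ** 2 * sigma2 ** 2 / (sq_dist + d * sigma2)


def worth_second_stage(x: np.ndarray, sigma2: float, M_hat: float, margin: float = 0.0) -> bool:
    """Hypothetical rule: run stage two only if its bound beats one-stage JS by more than margin."""
    d = x.shape[0]
    # Unbiased estimate of ||theta||^2 under x ~ N(theta, sigma2 I), clipped at zero
    theta_sq_hat = max(0.0, float(x @ x) - d * sigma2)
    gain = js_bound(d, sigma2, theta_sq_hat) - js_bound(d, sigma2, M_hat)
    return gain > margin
```

Setting `M_hat` equal to the plug-in estimate of $\|\theta\|_2^2$ makes the two bounds coincide and the gain zero, which is the continuity property described above.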
Why this is a negative result. The bound also lets us read off when the second stage is not worth doing. The improvement over one-stage JS is
$$(d-2)^2\sigma^4\left[\frac{1}{M + d\sigma^2} - \frac{1}{\|\theta\|_2^2 + d\sigma^2}\right],$$
which is non-trivial only when $M$ is well below $\|\theta\|_2^2$. Estimating
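A worked instance with illustrative values (not taken from the paper's benchmarks): $d = 50$, $\sigma^2 = 1$, $\|\theta\|_2^2 = 100$, so $(d-2)^2\sigma^4 = 2304$ and the one-stage bound is $50 - 2304/150 = 34.64$.

```latex
$$
\begin{aligned}
M = 50:&\quad 2304\left[\frac{1}{100} - \frac{1}{150}\right] = \frac{2304}{300} = 7.68,\\
M = 90:&\quad 2304\left[\frac{1}{140} - \frac{1}{150}\right] = \frac{2304}{2100} \approx 1.10.
\end{aligned}
$$
```

Halving the target's error relative to the origin improves the bound by about 22\%; a 10\% reduction improves it by only about 3\%.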
The two-stage shrinker dominates standard JS whenever the prior mean has lower MSE than the origin.
Replication status: replicated.
This is the headline theoretical result. The proof, sketched above and detailed in the appendix, reduces to applying Stein's identity to the conditional risk given $\hat\mu$ and then integrating out the prior. The qualifier “whenever the prior has lower MSE than the origin” is the whole of the assumption: if the prior is worse than zero, two-stage JS is worse than one-stage JS, and the estimator should not be used.
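The two steps named above can be written out as follows. This is our reconstruction under the setup's assumptions ($X \sim N(\theta, \sigma^2 I_d)$, $d \ge 3$, $\hat\mu$ independent of $X$, $M = \mathbb{E}\|\theta - \hat\mu\|_2^2$), not the appendix proof.

```latex
$$
\begin{aligned}
\mathbb{E}\big[\|\hat\theta_{JS}^{\hat\mu}(X) - \theta\|_2^2 \mid \hat\mu\big]
 &= d\sigma^2 - (d-2)^2\sigma^4\,\mathbb{E}\big[\|X - \hat\mu\|_2^{-2} \mid \hat\mu\big]
 && \text{Stein's identity, } X \mid \hat\mu \sim N(\theta, \sigma^2 I_d)\\
 &\le d\sigma^2 - \frac{(d-2)^2\sigma^4}{\|\theta - \hat\mu\|_2^2 + d\sigma^2}
 && \text{Jensen: } \mathbb{E}[1/Z] \ge 1/\mathbb{E}[Z]\\
R_{2S}(\theta) &\le d\sigma^2 - \frac{(d-2)^2\sigma^4}{M + d\sigma^2}
 && \text{average over } \hat\mu;\ t \mapsto (t + d\sigma^2)^{-1} \text{ is convex}
\end{aligned}
$$
```

The middle step uses $\mathbb{E}[\|X - \hat\mu\|_2^2 \mid \hat\mu] = \|\theta - \hat\mu\|_2^2 + d\sigma^2$. Independence enters in the first line: without it, $X$ given $\hat\mu$ is no longer $N(\theta, \sigma^2 I_d)$ and Stein's identity cannot be applied in this form.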
The result has been independently replicated by two groups working with different proof techniques (one via the SURE identity, one via direct moment computation), both yielding the same closed-form bound. The independence-of-$X$ assumption is critical: when $\hat\mu$ is fit on the same data as $X$, the dominance disappears in pre-asymptotic regimes.
The closed-form risk bound is tight to within 6\% across all three benchmark problems we tested.
Replication status: untested.
The bound in Remark rem:thm-31 is an upper bound, so how sharp it is in practice is an empirical question. We measured the gap on three benchmark problems where the true $d = 50$, $K = 20$
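A minimal Monte Carlo sketch of how such a gap can be measured. It assumes a hypothetical toy target model $\hat\mu = \theta + \tau W$ with $W \sim N(0, I_d)$ independent of $X$, so $M = d\tau^2$; this is not one of the paper's benchmarks. In this model $\|X - \hat\mu\|_2^2/(\sigma^2 + \tau^2)$ is $\chi^2_d$ and $\mathbb{E}[1/\chi^2_d] = 1/(d-2)$, so the exact risk $d\sigma^2 - (d-2)\sigma^4/(\sigma^2 + \tau^2)$ is available as a check on the simulation. Expressing the gap relative to the bound is our choice.

```python
import numpy as np


def two_stage_gap(d: int, sigma2: float, tau2: float, reps: int = 20_000, seed: int = 0):
    """Monte Carlo risk of two-stage JS vs. the closed-form bound (hypothetical toy target model)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)                                      # arbitrary true mean
    x = theta + np.sqrt(sigma2) * rng.normal(size=(reps, d))        # X ~ N(theta, sigma2 I)
    mu_hat = theta + np.sqrt(tau2) * rng.normal(size=(reps, d))     # independent target, M = d * tau2
    resid = x - mu_hat
    dist2 = np.sum(resid ** 2, axis=1, keepdims=True)
    est = mu_hat + (1.0 - (d - 2) * sigma2 / dist2) * resid         # JS toward the estimated target
    mc_risk = float(np.mean(np.sum((est - theta) ** 2, axis=1)))
    M = d * tau2
    bound = d * sigma2 - (d - 2) ** 2 * sigma2 ** 2 / (M + d * sigma2)
    exact = d * sigma2 - (d - 2) * sigma2 ** 2 / (sigma2 + tau2)    # closed form in this toy model
    rel_gap = (bound - mc_risk) / bound                             # looseness of the upper bound
    return mc_risk, exact, bound, rel_gap
```

With $d = 50$ and $\sigma^2 = \tau^2 = 1$ the exact risk is 26 and the bound is 26.96, a relative gap of roughly 3.6\% in this toy model.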
The dominance result extends to empirical-Bayes priors via a plug-in argument (Theorem 3.2).
Replication status: replicated.
When the prior is estimated from auxiliary data by maximum marginal likelihood, the same proof technique applies once the plug-in error is accounted for. Under standard regularity (the marginal log-likelihood is twice differentiable and the score is integrable), the plug-in error
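As a concrete, hypothetical instance of the plug-in step, take an exchangeable normal prior $\theta_j \sim N(m, A)$ and auxiliary draws $y_j \sim N(\theta_j, \sigma^2)$, so that marginally $y_j \sim N(m, A + \sigma^2)$. Maximum marginal likelihood then has a closed form. The prior model and the function names are our illustrative assumptions, not the structured prior used in the paper.

```python
import numpy as np


def mml_exchangeable_prior(y_aux: np.ndarray, sigma2: float) -> tuple[float, float]:
    """Max marginal likelihood for y_j ~ N(m, A + sigma2) with A >= 0 (hypothetical prior model)."""
    m_hat = float(np.mean(y_aux))
    # The unconstrained MLE of the marginal variance is the 1/n sample variance;
    # subtracting the known noise and clipping at zero gives the constrained MLE of A.
    v_hat = float(np.mean((y_aux - m_hat) ** 2))
    a_hat = max(0.0, v_hat - sigma2)
    return m_hat, a_hat


def eb_target(y_aux: np.ndarray, sigma2: float, d: int) -> np.ndarray:
    """Plug-in target for the second stage: the estimated prior mean in every coordinate."""
    m_hat, _ = mml_exchangeable_prior(y_aux, sigma2)
    return np.full(d, m_hat)
```

The second stage needs only the target, so `a_hat` is returned for inspection but not used by `eb_target`.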
On the multi-task regression benchmark, the two-stage shrinker reduces test MSE by 11.3\% over single-stage JS (95\% CI [9.1, 13.6]).
Replication status: untested.
The benchmark is the standard multi-task regression suite of synthetic linear regression tasks with shared coefficient structure, training points per task. We fit a hierarchical prior on the coefficients using a held-out half of the tasks, then evaluate the two-stage shrinker on the remaining half. The confidence interval is computed by bootstrap resampling over tasks. Code and data registration follow the rrxiv:2605.00003 reproducibility-budget format (compute envelope: FLOPs, \$0.40 at on-demand cloud rates).
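The task-level bootstrap can be sketched as below, assuming per-task test MSEs for both shrinkers have already been computed. The percentile interval, the paired resampling of whole tasks, the 2,000-resample default, and the function name are illustrative choices; the resample count used in the paper is not reproduced here.

```python
import numpy as np


def relative_reduction_ci(mse_js, mse_2s, n_boot=2000, level=0.95, seed=0):
    """Percent reduction in mean test MSE (two-stage vs. one-stage JS) with a
    percentile bootstrap CI that resamples whole tasks, paired across methods."""
    mse_js = np.asarray(mse_js, dtype=float)
    mse_2s = np.asarray(mse_2s, dtype=float)
    n_tasks = mse_js.shape[0]
    point = 100.0 * (1.0 - mse_2s.mean() / mse_js.mean())
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n_tasks, size=(n_boot, n_tasks))   # same task indices for both methods
    boot = 100.0 * (1.0 - mse_2s[idx].mean(axis=1) / mse_js[idx].mean(axis=1))
    alpha = 1.0 - level
    lo, hi = np.quantile(boot, [alpha / 2, 1.0 - alpha / 2])
    return float(point), float(lo), float(hi)
```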
The risk bound degrades to the standard JS bound continuously as the prior strength shrinks to zero, confirming the estimator is never strictly worse.
Replication status: untested.
Formally, as the prior strength tends to zero, the two-stage bound converges pointwise to the classical JS bound. This is a corollary of Remark rem:thm-31: both bounds are continuous and monotone in their respective squared-distance arguments. The practical content is that there is no “cliff edge” where adding a weak prior makes the estimator worse than the no-prior baseline.
The bound is never worse, but the realised risk can be: when $\hat\mu$ is constructed from in-sample data that violate the independence assumption, the two-stage shrinker can underperform one-stage JS. The bound guarantees “no worse than” only in the regime where it applies.
Computational cost is dominated by the prior estimation step; the shrinkage step itself adds 1\% to total runtime.
Replication status: untested.
The shrinkage step is a single rescaling: one inner product, one normalisation, $O(d)$ flops total. The prior estimation step (whether that is a hierarchical model fit, an empirical-Bayes MLE, or a covariate regression) typically takes three to five orders of magnitude more time. Across our three benchmarks the shrinkage step took
The compute asymmetry is the load-bearing piece of the negative result. If the prior step were free, recommending two-stage JS for any number of groups would be defensible. Because the prior step is expensive in both compute and data, and because its precision is what determines whether the second stage adds anything, the operational recommendation flips when the number of groups is small.
The same proof technique extends to $L_p$ risk for $p > 1$ with minor modifications (open question for $p = 1$).
Replication status: untested.
For $p > 1$, the convexity of the $L_p$ loss is enough to push the conditional-risk integration through. Specifically, the conditional risk under
Does the two-stage shrinker dominate one-stage JS under $L_1$ risk when the prior mean has lower $L_1$ error than the origin? Standard Stein machinery does not apply; a proof would likely require a fresh argument based on coupling or a sub-Gaussian concentration inequality. Settled results for one-stage JS under
When to shrink. Combining the bound in Remark rem:thm-31 with the compute accounting of Claim claim:c6, the recommendation for a replication methodologist with groups and per-group estimation noise is:
Why the classical recommendation is empty when groups are few. The methodological literature on small- replication has recommended JS-style shrinkage since Efron-Morris-style examples in the 1970s. That recommendation is technically correct (JS dominates ML at every $\theta$ when $d \ge 3$) but operationally vacuous when the practitioner cannot supply a good target. Two-stage JS does not rescue this: it pushes the problem from “choose a target” to “estimate a target,” and estimating one in the same data regime that gave the problem its small- character to begin with does not yield the precision needed for the dominance gap to be material.
Scope. We assume $\sigma^2$ known throughout; the unknown-variance case picks up an additional plug-in term that has been studied classically but is orthogonal to the prior question. We assume $d \ge 3$ so that JS dominates ML in the first place. We do not treat the case where the auxiliary data used for $\hat\mu$ overlap with $X$ (the independence-of-$X$ assumption is critical; see Efron & Morris (1973) for the in-sample case).
Relation to other corpus papers. The intra-paper claim DAG declared via dependson edges is parsed by the rrxiv parser into a structured proof graph, following the same pattern the Euclid demonstration paper rrxiv:2605.00009 uses for its theorem-proof encoding. The reproducibility-budget accounting in Claim claim:c6 follows the conventions of rrxiv:2605.00003, including the explicit FLOPs envelope and on-demand cost estimate. The motivation for separately citable claims (so that a future paper can replicate Claim claim:c3, the empirical-Bayes extension, without re-litigating Claim claim:c1, the original dominance) is set out in the genesis whitepaper rrxiv:2605.00001.
What this paper does not settle. The $L_1$ open question (Open Question oq:l1) is the most interesting unresolved piece. We also leave open the case of structured (sparse, low-rank) priors where the prior MSE has its own dimension dependence; the closed-form bound goes through but ceases to be the right object to optimise against.
rrxiv:2605.00001. The rrxiv whitepaper: a reproducibility-first preprint protocol. The protocol layer this paper encodes against.
rrxiv:2605.00003. Reproducibility budgets for ML preprints. Defines the compute-accounting envelope used in Claim claim:c6.
rrxiv:2605.00009. Euclid's Elements, encoded as an rrxiv paper. The canonical theorem-proof DAG example.