The Typicality Principle and Its Implications for Statistics and Data Science
Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 4–25
Pub. online: 26 January 2026
Type: Philosophies of Data Science
Open Access
Received: 4 January 2026
Accepted: 15 January 2026
Published: 26 January 2026
Abstract
A central focus of data science is the transformation of empirical evidence into knowledge. By “knowledge,” we mean claims that are (i) supported by data through an explicit inferential procedure and (ii) accompanied by calibrated measures of uncertainty. As such, the scientific insights and attitudes of deep thinkers like Ronald A. Fisher, Karl R. Popper, and John W. Tukey are expected to inspire exciting new advances in machine learning and artificial intelligence in years to come. Along these lines, the present paper advances a novel typicality principle which states, roughly, that if the observed data is sufficiently “atypical” in a certain sense relative to a posited theory, then that theory is unwarranted. This emphasis on typicality brings familiar but often overlooked background notions like model-checking to the inferential foreground. One instantiation of the typicality principle is in the context of parameter estimation, where we propose a new typicality-based regularization strategy that leans heavily on goodness-of-fit testing. The effectiveness of this new regularization strategy is illustrated in three non-trivial examples where ordinary maximum likelihood estimation fails miserably. We also demonstrate how the typicality principle fits within a bigger picture of reliable and efficient uncertainty quantification.
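To make the typicality principle concrete, here is a minimal sketch (ours, not the paper's regularization method) of its basic logic: a posited theory is screened by a goodness-of-fit test, and declared unwarranted when the observed data are sufficiently atypical under it. The Kolmogorov–Smirnov test stands in for whichever goodness-of-fit statistic one prefers; the function name `is_warranted` and the threshold `alpha` are illustrative choices, not notation from the paper.

```python
import numpy as np
from scipy import stats

# Posited theory: the data are i.i.d. Normal(0, 1).
def is_warranted(data, alpha=0.05):
    """Declare the theory unwarranted if the observed data are too
    atypical under it, as judged by a Kolmogorov-Smirnov
    goodness-of-fit test at level alpha."""
    _, p_value = stats.kstest(data, "norm", args=(0, 1))
    return bool(p_value >= alpha)

# A maximally "typical" sample: evenly spaced standard-normal quantiles.
typical = stats.norm.ppf(np.arange(1, 100) / 100)
# An atypical sample: the same points shifted far from the theory.
atypical = typical + 3.0

print(is_warranted(typical))   # True: data look like Normal(0, 1)
print(is_warranted(atypical))  # False: theory rejected as atypical
```

The paper's actual proposal embeds this kind of check into parameter estimation as a regularization device; the sketch above only shows the accept/reject logic of the principle itself.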
Supplementary material
Code to reproduce all figures in this paper is included in the supplementary materials.
References
Aldrich J (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12(3): 162–176. https://doi.org/10.1214/ss/1030037906
Basu D (1975). Statistical information and likelihood. Sankhyā: The Indian Journal of Statistics, Series A, 37(1): 1–71. Discussion and correspondence between Barnard and Basu. https://doi.org/10.1111/j.2517-6161.1975.tb01024.x
Birnbaum A (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57: 269–326. https://doi.org/10.1080/01621459.1962.10480660
Breiman L (2001). Statistical modeling: The two cultures. Statistical Science, 16(3): 199–231. https://doi.org/10.1214/ss/1009213726
Cella L, Martin R (2023). Possibility-theoretic statistical inference offers performance and probativeness assurances. International Journal of Approximate Reasoning, 163: 109060. https://doi.org/10.1016/j.ijar.2023.109060
Datta GS, Ghosh JK (1995). On priors providing frequentist validity for Bayesian inference. Biometrika, 82(1): 37–45. https://doi.org/10.2307/2337625
Dempster AP (1966). New methods for reasoning towards posterior distributions based on sample data. The Annals of Mathematical Statistics, 37: 355–374. https://doi.org/10.1214/aoms/1177699517
Dempster AP (2008). The Dempster–Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48(2): 365–377. https://doi.org/10.1016/j.ijar.2007.03.004
Donoho D (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4): 745–766. https://doi.org/10.1080/10618600.2017.1384734
Dubois D, Foulloy L, Mauris G, Prade H (2004). Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliable Computing, 10(4): 273–297. https://doi.org/10.1023/B:REOM.0000032115.22510.b5
Durbin J (1970). On Birnbaum’s theorem on the relation between sufficiency, conditionality and likelihood. Journal of the American Statistical Association, 65(329): 395–398. https://doi.org/10.1080/01621459.1970.10481088
Eschker SJ, Liu C (2024). Towards strong AI: Transformational beliefs and scientific creativity. arXiv preprint: https://arxiv.org/abs/2412.19938
Fisher RA (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, 222: 309–368. https://doi.org/10.1098/rsta.1922.0009
Fisher RA (1925). Theory of statistical estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 22: 200–225. https://doi.org/10.1017/S0305004100009580
Fisher RA (1935a). The fiducial argument in statistical inference. Annals of Eugenics, 6: 391–398. https://doi.org/10.1111/j.1469-1809.1935.tb02120.x
Fisher RA (1935b). The logic of inductive inference. Journal of the Royal Statistical Society, 98: 39–82. https://doi.org/10.2307/2342435
Fraser DAS, Reid N, Lin W (2018). When should modes of inference disagree? Some simple but challenging examples. The Annals of Applied Statistics, 12(2): 750–770. https://doi.org/10.1214/18-AOAS1160SF
Hannig J, Iyer H, Lai RCS, Lee TCM (2016). Generalized fiducial inference: A review and new results. Journal of the American Statistical Association, 111(515): 1346–1361. https://doi.org/10.1080/01621459.2016.1165102
Hinton G, Vinyals O, Dean J (2015). Distilling the knowledge in a neural network. arXiv preprint: https://arxiv.org/abs/1503.02531
Hose D (2022). Possibilistic Reasoning with Imprecise Probabilities: Statistical Inference and Dynamic Filtering, Ph.D. thesis, University of Stuttgart. https://dominikhose.github.io/dissertation/diss_dhose.pdf
Jiang Y, Liu C (2025). Estimation of over-parameterized models from an auto-modeling perspective. Journal of the American Statistical Association, 120(552): 2422–2434. https://doi.org/10.1080/01621459.2025.2455192
Jiang Y, Liu C, Zhang H (2023). Finite sample valid inference via calibrated bootstrap. arXiv preprint: https://arxiv.org/abs/2408.16763
Kyburg HE Jr (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence, 31(3): 271–293. https://doi.org/10.1016/0004-3702(87)90068-3
Le Cam L (1990). Maximum likelihood: An introduction. International Statistical Review, 58(2): 153–171. https://doi.org/10.2307/1403464
Liu C (2023). Reweighted and circularised Anderson-Darling tests of goodness-of-fit. Journal of Nonparametric Statistics, 35(4): 869–904. https://doi.org/10.1080/10485252.2023.2213782
Liu Y, Yao Y, Ton JF, Zhang X, Guo R, Cheng H, et al. (2024). Trustworthy LLMs: A survey and guideline for evaluating large language models’ alignment. arXiv preprint: https://arxiv.org/abs/2308.05374
Martin R (2018). On an inferential model construction using generalized associations. Journal of Statistical Planning and Inference, 195: 105–115. https://doi.org/10.1016/j.jspi.2016.11.006
Martin R (2022a). Valid and efficient imprecise-probabilistic inference with partial priors, I. First results. arXiv preprint: https://arxiv.org/abs/2203.06703
Martin R (2022b). Valid and efficient imprecise-probabilistic inference with partial priors, II. General framework. arXiv preprint: https://arxiv.org/abs/2211.14567
Martin R (2023). Valid and efficient imprecise-probabilistic inference with partial priors, III. Marginalization. arXiv preprint: https://arxiv.org/abs/2309.13454
Martin R (2024). A possibility-theoretic solution to Basu’s Bayesian–frequentist via media. Sankhya A, 86: 43–70. https://doi.org/10.1007/s13171-023-00323-9
Martin R (2025a). A new Monte Carlo method for valid prior-free possibilistic statistical inference. arXiv preprint: https://arxiv.org/abs/2501.10585
Martin R (2025b). Possibilistic inferential models: A review. Journal of the American Statistical Association. To appear. arXiv preprint: https://arxiv.org/abs/2507.09007
Martin R, Liu C (2013). Inferential models: A framework for prior-free posterior probabilistic inference. Journal of the American Statistical Association, 108(501): 301–313. https://doi.org/10.1080/01621459.2012.747960
Martin R, Prim SN, Williams J (2025). Decision-making with possibilistic inferential models. arXiv preprint: https://arxiv.org/abs/2112.13247
Mayo D (2014). On the Birnbaum argument for the strong likelihood principle. Statistical Science, 29(2): 227–239. https://doi.org/10.1214/13-STS457
Neyman J, Scott EL (1948). Consistent estimates based on partially consistent observations. Econometrica, 16: 1–32. https://doi.org/10.2307/1914288
Shafer G (1982). Belief functions and parametric models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 44(3): 322–352. With discussion. https://doi.org/10.1111/j.2517-6161.1982.tb01211.x
Stein C (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, vol. I, Neyman J (Ed.), 197–206. University of California Press, Berkeley and Los Angeles.
Stein C (1959). An example of wide discrepancy between fiducial and confidence intervals. The Annals of Mathematical Statistics, 30: 877–880. https://doi.org/10.1214/aoms/1177706072
Stigler SM (2007). The epic story of maximum likelihood. Statistical Science, 22(4): 598–620. https://doi.org/10.1214/07-STS249
Tibshirani R (1989). Noninformative priors for one parameter of many. Biometrika, 76(3): 604–608. https://doi.org/10.1093/biomet/76.3.604
Tukey JW (1986). The Collected Works of John W. Tukey, Vol. III: Philosophy and Principles of Data Analysis 1949–1964 (Jones LV, Ed.). The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA. With a biography of Tukey by Frederick Mosteller.
Xie M, Singh K (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81(1): 3–39. https://doi.org/10.1111/insr.12000
Zabell SL (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7(3): 369–387. https://doi.org/10.1214/ss/1177011233