A Joint Equivalence and Difference (JED) Test for Practical Use in Controlled Trials

Riffenburgh, Robert H.; Wang, Lingge

doi:10.6339/24-JDS1142

Journal of Data Science

Volume 23, Issue 1 (2025), pp. 171–187

Pub. online: 2 July 2024

Type: Statistical Data Science

Open Access

Received: 5 February 2024

Accepted: 8 June 2024

Published: 2 July 2024

Abstract

A joint equivalence and difference (JED) test is needed because difference tests and equivalence (more exactly, similarity) tests each provide only a one-sided answer. The concept and its underlying theory have appeared numerous times, as noted and discussed here, but never in a form usable in workaday statistical applications. This work provides such a form: a straightforward test with a step-by-step guide, formulas, and possible interpretations. As an initial treatment, it restricts attention to a t test of two means. The guide is illustrated by a numerical example from orthopedics. To assess the quality of the JED test, the sensitivity and specificity of its outcomes are examined as functions of the error risk α, total sample size, sub-sample size ratio, and variability ratio; the results are tabulated and their interpretations discussed. It is concluded that the test exhibits high power and effect size, and that commonly seen values of these parameters affect its power or effect size only when samples are quite small. The data for the example and computer code for running the JED test are accessible through links to supplementary material. We recommend that this work be extended to other test forms and to multivariate forms.
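
The abstract does not spell out the JED procedure, so the following is only a rough illustration of the general idea of reporting a difference test and an equivalence test side by side for two means. It is not the authors' method. It assumes a Welch t test for the difference, two one-sided tests (TOST) for equivalence, and a symmetric equivalence margin chosen by the analyst. The names `jed_sketch`, `t_sf`, and `margin`, and the four-way reading of the two flags, are hypothetical. Python with the standard library only is used here for portability; the supplementary code is in R.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def jed_sketch(x, y, margin, alpha=0.05):
    """Hypothetical helper: Welch difference test plus TOST equivalence test."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # squared standard errors
    diff = mean(x) - mean(y)
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximate degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_diff = 2.0 * t_sf(abs(diff / se), df)       # H0: means are equal
    p_lower = t_sf((diff + margin) / se, df)      # H0: diff <= -margin
    p_upper = t_sf((margin - diff) / se, df)      # H0: diff >= +margin
    p_equiv = max(p_lower, p_upper)               # TOST p-value
    return {
        "diff": diff,
        "p_diff": p_diff,
        "p_equiv": p_equiv,
        "different": p_diff < alpha,
        "equivalent": p_equiv < alpha,
    }
```

Calling `jed_sketch(sample_a, sample_b, margin=5.0)` returns both p-values and two flags. Under the assumptions above, the flags give four possible readings: different only, equivalent only, both (a statistically detectable difference that lies within the margin), or neither (inconclusive). The paper's own step-by-step guide and interpretations should take precedence over this sketch.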

Supplementary material

The dataset used in the numerical example (Section 3) and the R code for the tables (Section 4) are available at: https://github.com/wlingge/JED

© 2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

decision-making; error rate estimation; means testing; medical decisions; statistical testing
