A Joint Equivalence and Difference (JED) Test for Practical Use in Controlled Trials
Pub. online: 2 July 2024
Type: Statistical Data Science
Open Access
Received
5 February 2024
Accepted
8 June 2024
Published
2 July 2024
Abstract
A joint equivalence and difference (JED) test is needed because difference tests and equivalence (more exactly, similarity) tests each provide only a one-sided answer. The concept and its underlying theory have appeared many times in the literature, as noted and discussed here, but never in a form usable in everyday statistical applications. This work provides such a form: a straightforward test with a step-by-step guide, formulas, and possible interpretations. As an initial treatment, it restricts attention to a t test of two means. The guide is illustrated by a numerical example from orthopedics. To assess the quality of the JED test, its sensitivity and specificity are examined for test outcomes as functions of the error risk α, total sample size, sub-sample size ratio, and variability ratio. The results are presented in tables, and their interpretation is discussed. We conclude that the test exhibits high power and effect size and that, for commonly seen values of these parameters, only quite small samples show any effect on the power or effect size of the JED test. Data for the example and computer code for using the JED test are available through links to supplementary material. We recommend that this work be extended to other test forms and to multivariate forms.
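The general idea of pairing a difference test with an equivalence test on two means can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it assumes a Welch-type two-sample t statistic, a two one-sided tests (TOST) formulation of equivalence, and a user-chosen similarity margin `delta`. The function names `t_cdf` and `jed_sketch` and the outcome labels are hypothetical. The authors' own R code is linked in the supplementary material.

```python
import math
from statistics import mean, variance


def t_cdf(x, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density (stdlib only)."""
    if x == 0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = min(0.5, total * h / 3)  # integral of the density from 0 to |x|
    return 0.5 + area if x > 0 else 0.5 - area


def jed_sketch(x, y, delta, alpha=0.05):
    """Illustrative joint difference/equivalence check for two means.

    Assumptions (not taken from the paper): Welch standard error,
    Satterthwaite degrees of freedom, TOST with symmetric margin delta,
    and samples with nonzero variance.
    """
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    d = mean(x) - mean(y)

    # Difference test: two-sided p-value for H0: mean difference = 0.
    p_diff = 2 * (1 - t_cdf(abs(d) / se, df))

    # Equivalence test (TOST): reject both H0: d <= -delta and H0: d >= delta.
    p_lower = 1 - t_cdf((d + delta) / se, df)
    p_upper = t_cdf((d - delta) / se, df)
    p_equiv = max(p_lower, p_upper)

    different, equivalent = p_diff < alpha, p_equiv < alpha
    labels = {
        (True, False): "different, not equivalent",
        (False, True): "equivalent, not different",
        (True, True): "different but within the margin",
        (False, False): "inconclusive",
    }
    return {
        "diff": d,
        "p_difference": p_diff,
        "p_equivalence": p_equiv,
        "different": different,
        "equivalent": equivalent,
        "outcome": labels[(different, equivalent)],
    }
```

Calling `jed_sketch(x, y, delta=0.5)` on two lists of measurements returns both p-values and one of four joint outcomes, which shows why running either test alone leaves part of the question unanswered. The margin `delta` has to be chosen on subject-matter grounds before looking at the data.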
Supplementary material
The dataset used in the numerical example (Section 3) and the R code for the tables (Section 4) can be found at: https://github.com/wlingge/JED