Bibliographical Connections for Semiparametric Analysis in Case-Control Studies on Gene-Environment Interactions
Pub. online: 16 October 2024
Type: Data Science Reviews
Open Access
Received
30 June 2024
Accepted
23 September 2024
Published
16 October 2024
Abstract
Analyzing gene-environment interactions (GEI) is crucial for understanding the etiology of many complex traits. Among the available study designs, case-control studies are popular for analyzing GEI because of their efficiency in collecting covariate information. An extensive literature explores efficient estimation under various assumptions about the relationship between genetic and environmental variables. In this paper, we comprehensively review methods based on or related to the retrospective likelihood, including methods based on the hypothetical population concept, which has been largely overlooked in GEI research over the past decade. Furthermore, we establish a methodological connection between these two groups of methods by deriving a new estimator from both the retrospective likelihood and the hypothetical population perspectives. Numerical studies demonstrate the validity of the derivation.
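To make the role of the gene-environment assumption concrete, the sketch below works through the simplest possible case: binary G and E, a logistic disease model with a G×E term, and expected (noise-free) counts from a hypothetical case-control study. It is not the estimator developed in the paper. It only illustrates two classical facts that the retrospective-likelihood literature builds on: a prospective logistic fit on case-control data still recovers the interaction odds ratio exp(bGE) (only the intercept is distorted by sampling), and when G and E are independent in the population and the disease is rare, the G-E odds ratio among cases alone (the case-only estimator) approximates exp(bGE). All parameter values and function names (`expected_case_control_counts`, `ge_odds_ratio`, and so on) are hypothetical choices made for this illustration.

```python
import math
from itertools import product


def expected_case_control_counts(p_g, p_e, beta, n_cases, n_controls):
    """Expected (d, g, e) cell counts for a hypothetical case-control study.

    Assumed setup (illustrative only): binary G and E that are independent in
    the source population, and a logistic disease model
        logit P(D=1 | G, E) = b0 + bG*G + bE*E + bGE*G*E.
    Cases and controls are sampled separately, so counts within each disease
    stratum follow the retrospective distribution P(G, E | D).
    """
    b0, bg, be, bge = beta
    joint = {}
    for g, e in product((0, 1), repeat=2):
        # G-E independence in the population: P(G, E) = P(G) * P(E)
        p_ge = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
        odds = math.exp(b0 + bg * g + be * e + bge * g * e)
        joint[(1, g, e)] = p_ge * odds / (1 + odds)  # P(D=1, G=g, E=e)
        joint[(0, g, e)] = p_ge / (1 + odds)         # P(D=0, G=g, E=e)
    prevalence = sum(v for (d, _, _), v in joint.items() if d == 1)
    sizes = {1: n_cases, 0: n_controls}
    margins = {1: prevalence, 0: 1 - prevalence}
    # Rescale each disease stratum to its sampled size.
    return {key: sizes[key[0]] * v / margins[key[0]] for key, v in joint.items()}


def ge_odds_ratio(counts, d):
    """Odds ratio between G and E within disease stratum d."""
    def c(g, e):
        return counts[(d, g, e)]
    return (c(1, 1) * c(0, 0)) / (c(1, 0) * c(0, 1))


def case_control_interaction_or(counts):
    """Estimate of exp(bGE) from the saturated prospective logistic fit.

    With binary G and E the fit is saturated, so the interaction estimate is the
    G-E odds ratio in cases divided by the G-E odds ratio in controls.
    """
    return ge_odds_ratio(counts, 1) / ge_odds_ratio(counts, 0)


def case_only_interaction_or(counts):
    """Case-only estimate: approximately exp(bGE) when G and E are independent
    in the population and the disease is rare (control G-E odds ratio near 1)."""
    return ge_odds_ratio(counts, 1)


# Hypothetical parameters: baseline odds 0.01 (rare disease), true interaction OR 1.8.
beta = (math.log(0.01), math.log(1.5), math.log(2.0), math.log(1.8))
counts = expected_case_control_counts(0.3, 0.4, beta, n_cases=1000, n_controls=1000)
print(case_control_interaction_or(counts))  # ≈ 1.8
print(case_only_interaction_or(counts))     # ≈ 1.75, slightly biased toward 1
```

Because the counts are expected values, the example shows only bias, not variance. The efficiency gain that motivates exploiting G-E independence (and, more generally, the retrospective-likelihood methods reviewed here) would only become visible through repeated simulation or asymptotic variance calculations.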
Supplementary material
The code that implements Algorithm 1 in Section 3.3 is provided in the Supplementary Materials.