Brain imaging research poses challenges due to the intricate structure of the brain and the absence of clearly discernible features in the images. In this study, we propose a technique for analyzing brain image data to identify regions relevant to patients' conditions, focusing on Diffusion Tensor Imaging (DTI) data. Our method places a Bayesian Dirichlet process prior on generalized linear models, which enhances clustering performance while retaining the flexibility to accommodate a varying number of clusters. The approach further improves class identification by using locational information, treating the proximity between locations as a clustering constraint. We apply our technique to a dataset from the Transforming Research and Clinical Knowledge in Traumatic Brain Injury study to identify important regions in the brain's gray matter, white matter, and overall brain tissue that differentiate between young and old age groups. Additionally, we relate our findings to existing results in brain network research.
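As a rough illustration of the clustering idea, the sketch below approximates it with a truncated Dirichlet-process Gaussian mixture from scikit-learn, appending scaled voxel coordinates to the imaging features so that nearby locations tend to land in the same cluster. This is not the paper's exact model (a Dirichlet process prior over generalized linear models); the data, the spatial weight, and the truncation level are all invented for illustration.

```python
# Minimal sketch of spatially informed Dirichlet-process clustering.
# Assumption: appending (scaled) voxel coordinates to the imaging
# features approximates the proximity constraint; the paper's actual
# DP-prior-on-GLMs model is not implemented here.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_voxels = 500
features = rng.normal(size=(n_voxels, 3))         # e.g., DTI-derived measures
coords = rng.uniform(0, 100, size=(n_voxels, 3))  # voxel locations (x, y, z)

# Weighting the coordinates controls how strongly proximity constrains clusters.
spatial_weight = 0.5
X = np.hstack([StandardScaler().fit_transform(features),
               spatial_weight * StandardScaler().fit_transform(coords)])

# weight_concentration_prior_type="dirichlet_process" lets the model keep
# only as many of the 20 candidate clusters as the data support.
dpgmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print("clusters used:", np.unique(labels).size)
```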
Published online: 13 March 2024. Type: Statistical Data Science. Open Access.
Journal: Journal of Data Science, Volume 22, Issue 2 (2024): Special Issue: 2023 Symposium on Data Science and Statistics (SDSS): “Inquire, Investigate, Implement, Innovate”, pp. 280–297.
The use of visuals is a key component of scientific communication. Decisions about the design of a data visualization should be informed by which design elements best support the audience's ability to perceive and understand its components. We build on the foundations of Cleveland and McGill's work in graphical perception, employing a large, nationally representative, probability-based panel of survey respondents to test perception in stacked bar charts. Our findings provide actionable guidance that data visualization practitioners can employ in their work.
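For readers who want a concrete picture of the stimulus class under study, here is a minimal matplotlib sketch of a stacked bar chart; the categories and values are invented for illustration and do not come from the experiment.

```python
# Minimal stacked bar chart of the kind whose perception is tested;
# the groups and segment values below are invented for illustration.
import matplotlib.pyplot as plt
import numpy as np

groups = ["A", "B", "C", "D"]
bottom_seg = np.array([23, 31, 18, 27])
top_seg = np.array([14, 9, 22, 16])

fig, ax = plt.subplots()
ax.bar(groups, bottom_seg, label="Segment 1")
ax.bar(groups, top_seg, bottom=bottom_seg, label="Segment 2")  # stacked on top
ax.set_ylabel("Value")
ax.legend()
plt.show()
```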
Our contribution is to widen the scope of extreme value analysis applied to discrete-valued data. Extreme values of a random variable are commonly modeled using the generalized Pareto distribution, a peak-over-threshold method that often gives good results in practice. When data are discrete, we propose two alternative methods, based on a discrete generalized Pareto distribution and a generalized Zipf distribution, respectively. Both are theoretically motivated, and we show that they perform well in estimating rare events in several simulated and real-data cases, such as word frequencies, tornado outbreaks, and multiple births.
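A minimal sketch of how the first proposal could be operationalized: a maximum-likelihood fit of a discrete generalized Pareto distribution, defined here by differencing the continuous GPD survival function. This is an assumption-laden illustration, not the paper's exact estimator; threshold selection and the generalized Zipf alternative are omitted.

```python
# Sketch: MLE for a discrete generalized Pareto distribution with
# P(X = k) = S(k) - S(k + 1), where S is the survival function of the
# continuous GPD. Data are simulated for illustration only.
import numpy as np
from scipy.stats import genpareto
from scipy.optimize import minimize

def neg_log_lik(params, k):
    shape, scale = params
    if scale <= 0:
        return np.inf
    pmf = genpareto.sf(k, shape, scale=scale) - genpareto.sf(k + 1, shape, scale=scale)
    if np.any(pmf <= 0):
        return np.inf  # outside the support, or numerically degenerate
    return -np.sum(np.log(pmf))

# Simulated discrete exceedances: floor of continuous GPD draws.
rng = np.random.default_rng(1)
data = np.floor(genpareto.rvs(0.3, scale=2.0, size=1000, random_state=rng)).astype(int)

fit = minimize(neg_log_lik, x0=[0.1, 1.0], args=(data,), method="Nelder-Mead")
print("shape, scale estimates:", fit.x)
```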
This study examines the impact of the COVID-19 pandemic on enrollment rates in on-site undergraduate programs at Brazilian public universities. Employing the Machine Learning Control Method, we constructed a counterfactual scenario in which the pandemic did not occur. By contrasting this hypothetical scenario with real-world data on new entrants, we defined a variable that characterizes the pandemic's impact on these programs. This variable reveals that the impact varies significantly with the geographical location of the institutions offering the courses: courses offered by institutions in cities with smaller populations experienced a more pronounced impact than those in larger urban centers.
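A minimal sketch of the Machine Learning Control Method's logic, under invented data and column names: train a learner on pre-pandemic years only, predict pandemic-year enrollment as the counterfactual, and measure impact as observed minus predicted.

```python
# Sketch of the counterfactual construction: the model never sees
# pandemic years during training, so its predictions for 2020+ stand in
# for "what enrollment would have been". All columns are invented.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "year": rng.integers(2014, 2022, size=2000),          # 2014..2021
    "city_population": rng.lognormal(11, 1, size=2000),
    "enrollment": rng.poisson(80, size=2000).astype(float),
})

pre = df[df["year"] < 2020]    # training window: no pandemic
post = df[df["year"] >= 2020]  # years to be "controlled"

features = ["year", "city_population"]
model = GradientBoostingRegressor(random_state=0).fit(pre[features], pre["enrollment"])

counterfactual = model.predict(post[features])
impact = post["enrollment"].to_numpy() - counterfactual  # effect per course-year
print("mean estimated impact:", impact.mean())
```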
One crucial aspect of precision medicine is to allow physicians to recommend the most suitable treatment for their patients. This requires understanding treatment heterogeneity from a patient-centric view, quantified by estimating the individualized treatment effect (ITE). With large amounts of genetic data and medical factors being collected, a more complete picture of individuals' characteristics is forming, which provides more opportunities to accurately estimate ITE. Recent developments using machine learning methods within the counterfactual outcome framework show excellent potential for analyzing such data. In this research, we propose to extend meta-learning approaches to estimate individualized treatment effects with survival outcomes. Two meta-learning algorithms are considered, the T-learner and the X-learner, each combined with three types of machine learning methods: random survival forests, the Bayesian accelerated failure time model, and survival neural networks. We examine the performance of the proposed methods and provide practical guidelines for their application in randomized clinical trials (RCTs). Moreover, we propose to use the Boruta algorithm to identify risk factors that contribute to treatment heterogeneity based on ITE estimates. The finite-sample performance of these methods is compared through extensive simulations under different randomization designs. The proposed approach is applied to a large RCT on an eye disease, age-related macular degeneration (AMD), to estimate the ITE on delaying time-to-AMD progression and to make individualized treatment recommendations.
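As a hedged illustration of one configuration studied (a T-learner with random survival forests), the sketch below fits one forest per arm with scikit-survival and contrasts predicted survival probabilities at a fixed horizon. The data, the horizon, and the variable names are invented, and the X-learner and the other base learners are not shown.

```python
# T-learner sketch for survival outcomes: one random survival forest per
# treatment arm; ITE_i = S1_i(t*) - S0_i(t*) at an assumed horizon t*.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 5))                    # baseline covariates
treat = rng.integers(0, 2, size=n).astype(bool)
time = rng.exponential(5 + 2 * treat)          # event/censoring times
event = rng.random(n) < 0.7                    # event indicator
y = Surv.from_arrays(event=event, time=time)

rsf_treated = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X[treat], y[treat])
rsf_control = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X[~treat], y[~treat])

horizon = 5.0  # clinically meaningful time point (assumption)
s1 = np.array([fn(horizon) for fn in rsf_treated.predict_survival_function(X)])
s0 = np.array([fn(horizon) for fn in rsf_control.predict_survival_function(X)])
ite = s1 - s0  # individualized effect on surviving past the horizon
print("mean ITE:", ite.mean())
```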
The exploration of whether artificial intelligence (AI) can evolve to possess consciousness is an intensely debated and researched topic within the fields of philosophy, neuroscience, and artificial intelligence. Understanding this complex phenomenon hinges on integrating two complementary perspectives of consciousness: the objective and the subjective. Objective perspectives involve quantifiable measures and observable phenomena, offering a more scientific and empirical approach. This includes the use of neuroimaging technologies such as electrocorticography (ECoG), EEG, and fMRI to study brain activities and patterns. These methods allow for the mapping and understanding of neural representations related to language, visual, acoustic, emotional, and semantic information. However, the objective approach may miss the nuances of personal experience and introspection. On the other hand, subjective perspectives focus on personal experiences, thoughts, and feelings. This introspective view provides insights into the individual nature of consciousness, which cannot be directly measured or observed by others. Yet, the subjective approach is often criticized for its lack of empirical evidence and its reliance on personal interpretation, which may not be universally applicable or reliable. Integrating these two perspectives is essential for a comprehensive understanding of consciousness. By combining objective measures with subjective reports, we can develop a more holistic understanding of the mind.
The United States has a racial homeownership gap rooted in a legacy of historic inequality and discriminatory policies, but the factors that contribute to the disparity in homeownership rates between White Americans and people of color have not been fully characterized. To alleviate this issue, policymakers need a better understanding of how risk factors affect the homeownership rates of racial and ethnic groups differently. In this study, data from several publicly available surveys, including the American Community Survey and the United States Census, were combined with statistical learning models to investigate factors related to homeownership rates across racial and ethnic categories, with a focus on how risk factors vary by race or ethnicity. Our models indicated that job availability for specific demographics and residence in specific regions of the United States affect homeownership rates in Black, Hispanic, and Asian populations in different ways. Based on these results, it is recommended that policymakers promote strategies to increase access to jobs for people of color (POC), such as vocational training and programs to reduce implicit bias in hiring practices. These interventions could ultimately increase homeownership rates for POC and be a step toward reducing the racial wealth gap.
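One simple way to probe how risk factors vary by group, in the spirit of the analysis above, is to fit a separate model per racial/ethnic group and compare permutation importances. The sketch below uses invented data and column names, not the study's actual survey variables or models.

```python
# Per-group models with permutation importances: a rough way to see
# which predictors matter most for each group. All data are simulated.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
n = 3000
df = pd.DataFrame({
    "group": rng.choice(["Black", "Hispanic", "Asian"], size=n),
    "job_availability": rng.normal(size=n),
    "median_income": rng.normal(size=n),
    "region_south": rng.integers(0, 2, size=n),
    "homeownership_rate": rng.uniform(0.2, 0.9, size=n),
})

features = ["job_availability", "median_income", "region_south"]
for group, sub in df.groupby("group"):
    model = RandomForestRegressor(random_state=0).fit(sub[features], sub["homeownership_rate"])
    imp = permutation_importance(model, sub[features], sub["homeownership_rate"],
                                 n_repeats=10, random_state=0)
    print(group, dict(zip(features, imp.importances_mean.round(3))))
```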
Racial and ethnic representation in home ownership rates is an important public policy topic for addressing inequality within society. Although more than half of the households in the US are owner-occupied rather than rented, home ownership is unequally represented among different racial and ethnic groups. Here we analyze the US Census Bureau's American Community Survey data to conduct an exploratory and statistical analysis of home ownership in the US, and find sociodemographic factors that are associated with differences in home ownership rates. We use binomial and beta-binomial generalized linear models (GLMs) with 2020 county-level data to model the home ownership rate, and fit the beta-binomial models with Bayesian estimation. We determine that race/ethnic group, geographic region, and income all have significant associations with the home ownership rate. To make the data and results accessible to the public, we develop a Shiny web application in R with exploratory plots and model predictions.
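A minimal sketch of a Bayesian beta-binomial GLM of the kind described, written in PyMC rather than R (the paper's application is in R). The simulated intercept-plus-income design stands in for the paper's race/ethnicity, region, and income covariates.

```python
# Beta-binomial regression: owned_i ~ BetaBinomial(n_i, mu_i*kappa,
# (1-mu_i)*kappa) with logit(mu_i) = X_i @ beta. Data are simulated.
import numpy as np
import pymc as pm

rng = np.random.default_rng(5)
n_counties = 200
X = np.column_stack([np.ones(n_counties), rng.normal(size=n_counties)])  # intercept + income (std.)
households = rng.integers(500, 5000, size=n_counties)
owned = rng.binomial(households, 0.65)

with pm.Model() as model:
    beta = pm.Normal("beta", 0.0, 1.0, shape=X.shape[1])
    kappa = pm.Gamma("kappa", 2.0, 0.1)            # concentration: controls overdispersion
    mu = pm.math.invlogit(pm.math.dot(X, beta))    # county-level mean ownership rate
    pm.BetaBinomial("owned", alpha=mu * kappa, beta=(1 - mu) * kappa,
                    n=households, observed=owned)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

print(idata.posterior["beta"].mean(dim=("chain", "draw")).values)
```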
In 2022 the American Statistical Association established the Riffenburgh Award, which recognizes exceptional innovation in extending statistical methods across diverse fields. In the same year, the Department of Statistics at the University of Connecticut celebrated its sixtieth anniversary, having grown into a leading center for training statisticians for academia, industry, and government. To honor this legacy, a virtual interview was conducted with the department's founder, Dr. Robert H. Riffenburgh, covering his career trajectory, his insights into the statistical profession, and recollections from the faculty and students he mentored. This article documents the conversation, with more detailed background on each topic than is presented in the video recording on YouTube.
In the form of a scholarly exchange with ChatGPT, we cover fundamentals of modeling stochastic dependence with copulas. The conversation is aimed at a broad audience and provides a light introduction to the topic of copula modeling, a field of potential relevance in all areas where more than one random variable appears in the modeling process. Topics covered include the definition, Sklar’s theorem, the invariance principle, pseudo-observations, tail dependence and stochastic representations. The conversation also shows to what degree it can be useful (or not) to learn about such concepts by interacting with the current version of a chatbot.
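Two of the concepts covered lend themselves to a short numerical illustration: sampling from a Gaussian copula via Sklar's theorem, and forming pseudo-observations by the rank transform. The sketch below is illustrative only; the data and margin choices are invented.

```python
# Gaussian copula sample via Sklar's theorem, plus pseudo-observations.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(6)
n, rho = 1000, 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])

# U = (Phi(Z1), Phi(Z2)) has uniform margins but keeps the normal
# dependence structure: Sklar's theorem in action.
Z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
U = norm.cdf(Z)

# Pseudo-observations: rank-transform data with unknown margins to
# (0, 1)^2 before fitting a copula; dividing by n + 1 avoids the boundary.
def pseudo_observations(x):
    return np.apply_along_axis(rankdata, 0, x) / (x.shape[0] + 1)

# Apply arbitrary strictly increasing margins; the pseudo-observations are
# unaffected by them (the invariance principle mentioned above).
data = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])
V = pseudo_observations(data)
print(np.corrcoef(U[:, 0], U[:, 1])[0, 1], np.corrcoef(V[:, 0], V[:, 1])[0, 1])
```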