This study examines the impact of the COVID-19 pandemic on enrollment in on-site undergraduate programs at Brazilian public universities. Using the Machine Learning Control Method, a counterfactual scenario was constructed in which the pandemic did not occur. By contrasting this hypothetical scenario with real-world data on new entrants, a variable was defined to characterize the impact of the pandemic on these programs. This variable reveals that the impact varies significantly with the geographical location of the institutions offering the courses: courses offered by institutions in cities with smaller populations experienced a more pronounced impact than those in larger urban centers.
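The counterfactual comparison described in this abstract can be illustrated with a minimal sketch. It is not the study's method: a simple linear trend stands in for the machine learning model, the impact variable is assumed to be the relative gap between observed and counterfactual entrants, and all numbers and function names are made up for illustration.

```python
def fit_linear_trend(years, values):
    """Fit values = intercept + slope * year by ordinary least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def relative_impact(observed, counterfactual):
    """Assumed impact variable: relative gap between observed and counterfactual."""
    return (observed - counterfactual) / counterfactual


# Toy pre-pandemic series of new entrants for one course (illustrative only).
pre_years = [2015, 2016, 2017, 2018, 2019]
pre_entrants = [100, 102, 104, 106, 108]

# Train only on pre-pandemic data, then project the "no pandemic" scenario.
intercept, slope = fit_linear_trend(pre_years, pre_entrants)
counterfactual_2020 = intercept + slope * 2020

# Compare the projection with the (toy) observed value for 2020.
observed_2020 = 88
impact_2020 = relative_impact(observed_2020, counterfactual_2020)
print(f"counterfactual={counterfactual_2020:.1f}, impact={impact_2020:.2f}")
```

A negative value means fewer entrants than the no-pandemic projection. Computing this quantity per course would allow the comparison by city size that the abstract reports.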
The United States has a racial homeownership gap due to a legacy of historic inequality and discriminatory policies, but the factors that contribute to the disparity in homeownership rates between White Americans and people of color (POC) have not been fully characterized. To address this gap, policymakers need a better understanding of how risk factors affect the homeownership rates of racial and ethnic groups differently. In this study, data from several publicly available surveys, including the American Community Survey and the United States Census, were combined with statistical learning models to investigate potential factors related to homeownership rates across racial and ethnic categories, with a focus on how risk factors vary by race or ethnicity. Our models indicated that job availability for specific demographics and specific regions of the United States were factors that affect homeownership rates in Black, Hispanic, and Asian populations in different ways. Based on these results, we recommend that policymakers promote strategies to increase access to jobs for POC, such as vocational training and programs to reduce implicit bias in hiring practices. These interventions could ultimately increase homeownership rates for POC and be a step toward reducing the racial wealth gap.
Racial and ethnic representation in home ownership rates is an important public policy topic for addressing inequality within society. Although more than half of US households own rather than rent their homes, home ownership is unequally distributed among racial and ethnic groups. Here we analyze the US Census Bureau’s American Community Survey data to conduct an exploratory and statistical analysis of home ownership in the US and identify sociodemographic factors that are associated with differences in home ownership rates. We use binomial and beta-binomial generalized linear models (GLMs) with 2020 county-level data to model the home ownership rate, and fit the beta-binomial models with Bayesian estimation. We determine that race/ethnic group, geographic region, and income all have significant associations with the home ownership rate. To make the data and results accessible to the public, we develop a Shiny web application in R with exploratory plots and model predictions.
This paper aims to determine the effects of socioeconomic and healthcare factors on how well COVID-19 is controlled in the Southern and Southeastern United States. This analysis will provide government agencies with information to determine which communities need additional COVID-19 assistance, to identify counties that control COVID-19 effectively, and to apply effective strategies on a broader scale. The statistical analysis uses data from 328 counties, each with a population of more than 65,000, across 13 states. We define a new response variable by considering infection and mortality rates to capture how well each county controls COVID-19. We collect 14 factors from the 2019 American Community Survey Single-Year Estimates and obtain county-level infection and mortality rates from USAfacts.org. We use least absolute shrinkage and selection operator (LASSO) regression to fit a multiple linear regression model and develop an interactive system in R Shiny to deliver all results. The interactive system at https://asa-competition-smu.shinyapps.io/COVID19/ provides many options for users to explore our data, models, and results.
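For readers unfamiliar with LASSO, the variable-selection step can be sketched as cyclic coordinate descent with soft-thresholding. This is a generic illustration, not the paper's code (the paper's analysis is in R): the three features and the response are simulated, the penalty value is arbitrary, and the data are assumed to be roughly centered so that no intercept is fitted.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic updates."""
    n, k = len(X), len(X[0])
    coef = [0.0] * k
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(k):
            z_j = sum(row[j] ** 2 for row in X) / n
            # Covariance of feature j with the residual after adding back
            # feature j's current contribution.
            rho_j = sum(row[j] * (r + row[j] * coef[j])
                        for row, r in zip(X, resid)) / n
            new_coef = soft_threshold(rho_j, lam) / z_j
            delta = new_coef - coef[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                coef[j] = new_coef
    return coef


# Simulated data: only the first of three hypothetical factors matters.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + random.gauss(0, 0.5) for row in X]

coef = lasso_coordinate_descent(X, y, lam=0.2)
print(coef)
```

The penalty shrinks the coefficient of the relevant factor slightly and sets the irrelevant ones to exactly zero, which is how LASSO performs selection among candidate factors.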
Linear regression models are widely used in empirical studies. When serial correlation is present in the residuals, generalized least squares (GLS) estimation is commonly used to improve estimation efficiency. This paper proposes an alternative estimator, the approximate generalized least squares estimator based on high-order AR(p) processes (GLS-AR). We show that GLS-AR estimators are asymptotically as efficient as GLS estimators when the number of AR lags, p, and the number of observations, n, increase together such that $p = o(n^{1/4})$ in the limit. The proposed GLS-AR estimators do not require identification of the residual serial autocorrelation structure and perform more robustly in finite samples than conventional FGLS-based tests. Finally, we illustrate the usefulness of the GLS-AR method by applying it to global warming data from 1850–2012.
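One way to read the GLS-AR idea is as a two-step prewhitening scheme: estimate the regression by OLS, fit an AR(p) model to the residuals, filter both sides of the regression with the estimated AR polynomial, and re-estimate by OLS. The sketch below follows that reading under stated assumptions. It may differ from the paper's exact estimator, the function names are hypothetical, the data are simulated, and p = 3 with n = 400 is only an illustrative choice of a small lag order relative to n.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def gls_ar(X, y, p):
    """Approximate GLS: OLS, AR(p) fit on residuals, then prewhitened OLS."""
    beta_ols = ols(X, y)
    resid = [yi - sum(b * v for b, v in zip(beta_ols, row)) for row, yi in zip(X, y)]
    # AR(p) on residuals: e_t = phi_1 e_{t-1} + ... + phi_p e_{t-p} + u_t
    lagged = [[resid[t - j] for j in range(1, p + 1)] for t in range(p, len(resid))]
    phi = ols(lagged, resid[p:])

    def prewhiten(series):
        return [series[t] - sum(phi[j - 1] * series[t - j] for j in range(1, p + 1))
                for t in range(p, len(series))]

    y_f = prewhiten(y)
    cols_f = [prewhiten([row[i] for row in X]) for i in range(len(X[0]))]
    X_f = [list(r) for r in zip(*cols_f)]
    return ols(X_f, y_f), phi


# Simulated regression y = 1 + 2x + e with AR(1) errors (coefficient 0.6).
random.seed(0)
n, p = 400, 3
x = [random.gauss(0, 1) for _ in range(n)]
errors, prev = [], 0.0
for _ in range(n):
    prev = 0.6 * prev + random.gauss(0, 0.5)
    errors.append(prev)
X = [[1.0, xi] for xi in x]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, errors)]

beta_gls, phi_hat = gls_ar(X, y, p)
print(beta_gls, phi_hat)
```

Note that the analyst never specifies the true error structure here: an AR(3) filter is fitted even though the simulated errors are AR(1), which mirrors the abstract's point that the autocorrelation structure does not need to be identified.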
Statistical learning methods have been growing in popularity in recent years. Many of these procedures have parameters that must be tuned for models to perform well. Tuning has been studied extensively for neural networks, but not for many other learning methods. We examined the behavior of tuning parameters for support vector machines, gradient boosting machines, and AdaBoost in both classification and regression settings. We used grid search to identify ranges of tuning parameters where good models can be found across many different datasets. We then explored different optimization algorithms to select a model across the tuning parameter space. Models selected by the optimization algorithms were compared to the best models obtained through grid search to identify well-performing algorithms. This information was used to create an R package, EZtune, that automatically tunes support vector machines and boosted trees.
Identifying treatment effect modifiers (i.e., moderators) plays an essential role in improving treatment efficacy when substantial treatment heterogeneity exists. However, studies are often underpowered for detecting treatment effect modifiers, and exploratory analyses that examine one moderator per statistical model often yield spurious interactions. Therefore, in this work, we focus on creating an intuitive and readily implementable framework to facilitate the discovery of treatment effect modifiers and to make treatment recommendations for time-to-event outcomes. To minimize the impact of a misspecified main effect and avoid complex modeling, we construct the framework by matching the treated with the controls and modeling the conditional average treatment effect via regressing the difference in the observed outcomes of a matched pair on the averaged moderators. Inverse-probability-of-censoring weighting is used to handle censored observations. As matching is the foundation of the proposed methods, we explore different matching metrics and recommend the use of Mahalanobis distance when both continuous and categorical moderators are present. After matching, the proposed framework can be flexibly combined with popular variable selection and prediction methods such as linear regression, least absolute shrinkage and selection operator (Lasso), and random forest to create different combinations of potential moderators. The optimal combination is determined by the out-of-bag prediction error and the area under the receiver operating characteristic curve in making correct treatment recommendations. We compare the performance of various combined moderators through extensive simulations and the analysis of real trial data. Our approach can be easily implemented using existing R packages, resulting in a straightforward optimal combined moderator to make treatment recommendations.
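The matched-pair construction described in this abstract can be sketched in a simplified setting. This is an illustration under strong assumptions, not the authors' implementation: outcomes are uncensored and continuous (for example, a log event time where larger is better), so the inverse-probability-of-censoring weights are omitted. Matching is nearest-neighbor with replacement on two simulated continuous moderators, and plain linear regression stands in for the variable selection step. All names and data are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def inverse_covariance_2d(points):
    """Inverse of the 2x2 sample covariance matrix of the moderators."""
    n = len(points)
    m0 = sum(pt[0] for pt in points) / n
    m1 = sum(pt[1] for pt in points) / n
    s00 = sum((pt[0] - m0) ** 2 for pt in points) / (n - 1)
    s11 = sum((pt[1] - m1) ** 2 for pt in points) / (n - 1)
    s01 = sum((pt[0] - m0) * (pt[1] - m1) for pt in points) / (n - 1)
    det = s00 * s11 - s01 ** 2
    return [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]


def mahalanobis_sq(a, b, S_inv):
    """Squared Mahalanobis distance between two 2-d moderator vectors."""
    d0, d1 = a[0] - b[0], a[1] - b[1]
    return (d0 * (S_inv[0][0] * d0 + S_inv[0][1] * d1)
            + d1 * (S_inv[1][0] * d0 + S_inv[1][1] * d1))


def simulate(is_treated):
    """Toy subject: main effect x0 + x1, treatment effect 1 + 2 * x0."""
    x = (random.gauss(0, 1), random.gauss(0, 1))
    effect = 1.0 + 2.0 * x[0] if is_treated else 0.0
    return x, x[0] + x[1] + effect + random.gauss(0, 0.5)


random.seed(1)
treated = [simulate(True) for _ in range(200)]
controls = [simulate(False) for _ in range(200)]
S_inv = inverse_covariance_2d([x for x, _ in treated + controls])

# Match each treated subject to its nearest control, then regress the
# outcome difference of the pair on the pair-averaged moderators.
design, diffs = [], []
for x_t, y_t in treated:
    x_c, y_c = min(controls, key=lambda c: mahalanobis_sq(x_t, c[0], S_inv))
    design.append([1.0, (x_t[0] + x_c[0]) / 2, (x_t[1] + x_c[1]) / 2])
    diffs.append(y_t - y_c)
cate_coef = ols(design, diffs)


def recommend_treatment(x):
    """Recommend treatment when the estimated conditional effect is positive."""
    return cate_coef[0] + cate_coef[1] * x[0] + cate_coef[2] * x[1] > 0


print(cate_coef, recommend_treatment((2.0, 0.0)))
```

Because the matched subjects have similar moderators, the main effect largely cancels in the pairwise difference, so the regression targets the treatment effect without requiring a model for the main effect.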