Home
Search

Journal of Data Science

Submit your article Information

Journal home
To appear
Current issue
All issues
More
Journal home To appear Current issue All issues

Detailed search

Title

Author

Types

Abstract

Keywords

Published

Pages

Volumes

Issues

DOI

Affiliation

Search results 892

Order by:

Select: All None Download:

Race-Specific Risk Factors for Homeownership Disparity in the Continental United States

Rachel E. Richardson Damon T. Leach Natalie M. Winans All authors (6)

https://doi.org/10.6339/23-JDS1116

Pub. online: 13 Dec 2023 Type: Data Science In Action

Open Access

Journal: Journal of Data Science Volume 22, Issue 4 (2024), pp. 591–604

Abstract

The United States has a racial homeownership gap due to a legacy of historic inequality and discriminatory policies, but factors that contribute to the racial disparity in homeownership rates between White Americans and people of color have not been fully characterized. In order to alleviate this issue, policymakers need a better understanding of how risk factors affect the homeownership rates of racial and ethnic groups differently. In this study, data from several publicly available surveys, including the American Community Survey and United States Census, were leveraged in combination with statistical learning models to investigate potential factors related to homeownership rates across racial and ethnic categories, with a focus on how risk factors vary by race or ethnicity. Our models indicated that job availability for specific demographics, and specific regions of the United States were factors that affect homeownership rates in Black, Hispanic, and Asian populations in different ways. Based on the results of this study, it is recommended policymakers promote strategies to increase access to jobs for people of color (POC), such as vocational training and programs to reduce implicit bias in hiring practices. These interventions could ultimately increase homeownership rates for POC and be a step toward reducing the racial wealth gap.

Exploring Racial and Ethnic Differences in US Home Ownership with Bayesian Beta-Binomial Regression

Jhonatan Medri Tejasvi Channagiri Lu Lu

https://doi.org/10.6339/23-JDS1113

Pub. online: 31 Oct 2023 Type: Data Science In Action

Open Access

Journal: Journal of Data Science Volume 22, Issue 4 (2024), pp. 605–620

Abstract

Racial and ethnic representation in home ownership rates is an important public policy topic for addressing inequality within society. Although more than half of the households in the US are owned, rather than rented, the representation of home ownership is unequal among different racial and ethnic groups. Here we analyze the US Census Bureau’s American Community Survey data to conduct an exploratory and statistical analysis of home ownership in the US, and find sociodemographic factors that are associated with differences in home ownership rates. We use binomial and beta-binomial generalized linear models (GLMs) with 2020 county-level data to model the home ownership rate, and fit the beta-binomial models with Bayesian estimation. We determine that race/ethnic group, geographic region, and income all have significant associations with the home ownership rate. To make the data and results accessible to the public, we develop an Shiny web application in R with exploratory plots and model predictions.

Legendary Career and Colorful Life: A Conversation with Dr. Bob Riffenburgh

Haim Bar Jun Yan

https://doi.org/10.6339/23-JDS1115

Pub. online: 12 Oct 2023 Type: Data Science Conversation

Open Access

Journal: Journal of Data Science Volume 21, Issue 4 (2023), pp. 818–837

Abstract

In 2022 the American Statistical Association established the Riffenburgh Award, which recognizes exceptional innovation in extending statistical methods across diverse fields. Simultaneously, the Department of Statistics at the University of Connecticut proudly commemorated six decades of excellence, having evolved into a preeminent hub for academic, industrial, and governmental statistical grooming. To honor this legacy, a captivating virtual dialogue was conducted with the department’s visionary founder, Dr. Robert H. Riffenburgh, delving into his extraordinary career trajectory, profound insights into the statistical vocation, and heartfelt accounts from the faculty and students he personally nurtured. This multifaceted narrative documents the conversation with more detailed background information on each topic covered by the interview than what is presented in the video recording on YouTube.

The Philosophy of Copula Modeling: A Conversation with ChatGPT

Marius Hofert

https://doi.org/10.6339/23-JDS1114

Pub. online: 11 Oct 2023 Type: Philosophies Of Data Science

Open Access

Journal: Journal of Data Science Volume 21, Issue 4 (2023), pp. 619–637

Abstract

In the form of a scholarly exchange with ChatGPT, we cover fundamentals of modeling stochastic dependence with copulas. The conversation is aimed at a broad audience and provides a light introduction to the topic of copula modeling, a field of potential relevance in all areas where more than one random variable appears in the modeling process. Topics covered include the definition, Sklar’s theorem, the invariance principle, pseudo-observations, tail dependence and stochastic representations. The conversation also shows to what degree it can be useful (or not) to learn about such concepts by interacting with the current version of a chatbot.

The Effects of County-Level Socioeconomic and Healthcare Factors on Controlling COVID-19 in the Southern and Southeastern United States

Jackson Barth Guanqing Cheng Webb Williams All authors (5)

https://doi.org/10.6339/23-JDS1111

Pub. online: 5 Sep 2023 Type: Data Science In Action

Open Access

Journal: Journal of Data Science Volume 22, Issue 4 (2024), pp. 631–646

Abstract

This paper aims to determine the effects of socioeconomic and healthcare factors on the performance of controlling COVID-19 in both the Southern and Southeastern United States. This analysis will provide government agencies with information to determine what communities need additional COVID-19 assistance, to identify counties that effectively control COVID-19, and to apply effective strategies on a broader scale. The statistical analysis uses data from 328 counties with a population of more than 65,000 from 13 states. We define a new response variable by considering infection and mortality rates to capture how well each county controls COVID-19. We collect 14 factors from the 2019 American Community Survey Single-Year Estimates and obtain county-level infection and mortality rates from USAfacts.org. We use the least absolute shrinkage and selection operator (LASSO) regression to fit a multiple linear regression model and develop an interactive system programmed in R shiny to deliver all results. The interactive system at https://asa-competition-smu.shinyapps.io/COVID19/ provides many options for users to explore our data, models, and results.

Editorial: Advances in Network Data Science

Yuguo Chen Daniel Sewell Panpan Zhang All authors (4)

https://doi.org/10.6339/23-JDS213EDI

Pub. online: 8 Aug 2023 Type: Editorial

Open Access

Journal: Journal of Data Science Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 443–445

A Mixed-Membership Model for Social Network Clustering

Guang Ouyang Dipak K. Dey Panpan Zhang

https://doi.org/10.6339/23-JDS1109

Pub. online: 7 Aug 2023 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 508–522

Abstract

We propose a simple mixed membership model for social network clustering in this paper. A flexible function is adopted to measure affinities among a set of entities in a social network. The model not only allows each entity in the network to possess more than one membership, but also provides accurate statistical inference about network structure. We estimate the membership parameters using an MCMC algorithm. We evaluate the performance of the proposed algorithm by applying our model to two empirical social network data, the Zachary club data and the bottlenose dolphin network data. We also conduct some numerical studies based on synthetic networks for further assessing the effectiveness of our algorithm. In the end, some concluding remarks and future work are addressed briefly.

Generating General Preferential Attachment Networks with R Package wdnet

Yelie Yuan Tiandong Wang Jun Yan All authors (4)

https://doi.org/10.6339/23-JDS1110

Pub. online: 25 Jul 2023 Type: Computing In Data Science

Open Access

Journal: Journal of Data Science Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 538–556

Abstract

Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitate the development of associated analytical methodologies. Existing software packages only provide limited functions for this purpose with restricted configurations and efficiency. We present a generic, user-friendly implementation of weighted, directed PA network generation with R package wdnet. The core algorithm is based on an efficient binary tree approach. The package further allows adding multiple edges at a time, heterogeneous reciprocal edges, and user-specified preference functions. The engine under the hood is implemented in C++. Usages of the package are illustrated with detailed explanation. A benchmark study shows that wdnet is efficient for generating general PA networks not available in other packages. In restricted settings that can be handled by existing packages, wdnet provides comparable efficiency.

Network A/B Testing: Nonparametric Statistical Significance Test Based on Cluster-Level Permutation

Hongwei Shang Xiaolin Shi Bai Jiang

https://doi.org/10.6339/23-JDS1112

Pub. online: 25 Jul 2023 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 523–537

Abstract

A/B testing is widely used for comparing two versions of a product and evaluating new proposed product features. It is of great importance for decision-making and has been applied as a golden standard in the IT industry. It is essentially a form of two-sample statistical hypothesis testing. Average treatment effect (ATE) and the corresponding p-value can be obtained under certain assumptions. One key assumption in traditional A/B testing is the stable-unit-treatment-value assumption (SUTVA): there is no interference among different units. It means that the observation on one unit is unaffected by the particular assignment of treatments to the other units. Nonetheless, interference is very common in social network settings where people communicate and spread information to their neighbors. Therefore, the SUTVA assumption is violated. Analysis ignoring this network effect will lead to biased estimation of ATE. Most existing works focus mainly on the design of experiment and data analysis in order to produce estimators with good performance in regards to bias and variance. Little attention has been paid to the calculation of p-value. We work on the calculation of p-value for the ATE estimator in network A/B tests. After a brief review of existing research methods on design of experiment based on graph cluster randomization and different ATE estimation methods, we propose a permutation method for calculating p-value based on permutation test at the cluster level. The effectiveness of the method against that based on individual-level permutation is validated in a simulation study mimicking realistic settings.

Revisiting the Use of Generalized Least Squares in Time Series Regression Models

Yue Fang Sergio G. Koreisha Qi-man Shao

https://doi.org/10.6339/23-JDS1108

Pub. online: 21 Jul 2023 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 22, Issue 4 (2024), pp. 486–504

Abstract

Linear regression models are widely used in empirical studies. When serial correlation is present in the residuals, generalized least squares (GLS) estimation is commonly used to improve estimation efficiency. This paper proposes the use of an alternative estimator, the approximate generalized least squares estimators based on high-order AR(p) processes (GLS-AR). We show that GLS-AR estimators are asymptotically efficient as GLS estimators, as both the number of AR lag, p, and the number of observations, n, increase together so that $p=o({n^{1/4}})$ in the limit. The proposed GLS-AR estimators do not require the identification of the residual serial autocorrelation structure and perform more robust in finite samples than the conventional FGLS-based tests. Finally, we illustrate the usefulness of GLS-AR method by applying it to the global warming data from 1850–2012.

8 9 10 11 12

Items per page

Export citation

Copy and paste formatted citation

Formatted citation

Placeholder

Citation style

Download citation in file

Export format

Authors

Placeholder

RSS

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

About

About journal

For contributors

Submit
OA Policy
Become a Peer-reviewer

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China