Pub. online:25 Jul 2023Type:Computing In Data ScienceOpen Access
Journal:Journal of Data Science
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 538–556
Abstract
Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitate the development of associated analytical methodologies. Existing software packages only provide limited functions for this purpose with restricted configurations and efficiency. We present a generic, user-friendly implementation of weighted, directed PA network generation with R package wdnet. The core algorithm is based on an efficient binary tree approach. The package further allows adding multiple edges at a time, heterogeneous reciprocal edges, and user-specified preference functions. The engine under the hood is implemented in C++. Usages of the package are illustrated with detailed explanation. A benchmark study shows that wdnet is efficient for generating general PA networks not available in other packages. In restricted settings that can be handled by existing packages, wdnet provides comparable efficiency.
Modeling heterogeneity on heavy-tailed distributions under a regression framework is challenging, yet classical statistical methodologies usually place conditions on the distribution models to facilitate the learning procedure. However, these conditions will likely overlook the complex dependence structure between the heaviness of tails and the covariates. Moreover, data sparsity on tail regions makes the inference method less stable, leading to biased estimates for extreme-related quantities. This paper proposes a gradient boosting algorithm to estimate a functional extreme value index with heterogeneous extremes. Our proposed algorithm is a data-driven procedure capturing complex and dynamic structures in tail distributions. We also conduct extensive simulation studies to show the prediction accuracy of the proposed algorithm. In addition, we apply our method to a real-world data set to illustrate the state-dependent and time-varying properties of heavy-tail phenomena in the financial industry.
As the COVID-19 pandemic has strongly disrupted people’s daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics with regard to the COVID-19, namely the basic reproduction number, the incubation period, the serial interval and the epidemic doubling time. We collect relevant studies based on the COVID-19 data in China and conduct a meta-analysis to obtain pooled estimates on the four metrics. From the summary results, we conclude that the COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.
Pub. online:16 Sep 2021Type:Statistical Data ScienceOpen Access
Journal:Journal of Data Science
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 446–469
Abstract
In this paper, we study macroscopic growth dynamics of social network link formation. Rather than focusing on one particular dataset, we find invariant behavior in regional social networks that are geographically concentrated. Empirical findings suggest that the startup phase of a regional network can be modeled by a self-exciting point process. After the startup phase ends, the growth of the links can be modeled by a non-homogeneous Poisson process with a constant rate across the day but varying rates from day to day, plus a nightly inactive period when local users are expected to be asleep. Conclusions are drawn based on analyzing four different datasets, three of which are regional and a non-regional one is included for contrast.