Generating General Preferential Attachment Networks with R Package wdnet

Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitates the development of associated analytical methodologies. Existing software packages provide only limited functionality for this purpose, with restricted configurations and efficiency. We present a generic, user-friendly implementation of weighted, directed PA network generation with the R package wdnet. The core algorithm is based on an efficient binary tree approach. The package further allows adding multiple edges at a time, heterogeneous reciprocal edges, and user-specified preference functions. The engine under the hood is implemented in C++. Usage of the package is illustrated with detailed explanations. A benchmark study shows that wdnet is efficient for generating general PA networks not available in other packages. In restricted settings that can be handled by existing packages, wdnet provides comparable efficiency.


Introduction
Preferential attachment (PA) networks are important network models in scientific research. The standard PA model (Barabási and Albert, 1999) evolves under the mechanism that a new node is attached to an existing node with probability proportional to its degree. With the increasing need to accommodate the heterogeneity and complexity of modern networks, a variety of extended PA network models have been proposed. Examples are directed PA models (Bollobás et al., 2003), generalized directed PA models (Britton, 2020), weighted PA models (Barrat et al., 2004), and PA models with reciprocal edges (Britton, 2020; Wang and Resnick, 2022a,b). In a general setting, the probability that a node gets a new edge is proportional to a preference function of some node-specific characteristics (e.g., node degree or strength). Due to their versatility, PA models have found a wide range of applications such as friendship networks (Momeni and Rabbat, 2015), scientific collaboration networks (Abbasi et al., 2012), Wikipedia networks (Capocci et al., 2006), and the World Wide Web (Kong et al., 2008), among others. Many of these networks are massive in scale.
Efficient generation of large-scale PA networks is critical to the investigation of their complex local and asymptotic properties. When the preference function is linear in node degree, Wan et al. (2017) developed a structured algorithm with complexity O(n) for generating directed PA networks, where n is the number of generation steps. When the preference function is nonlinear in node degree, however, a naive extension of this algorithm requires visiting existing nodes one after another at each sampling step, increasing the complexity to O(n^2).
Other node-degree-based techniques such as stratified sampling or grouping (Hadian et al., 2016) cannot handle continuous edge weights. An algorithm based on a binary tree (Atwood et al., 2015) has complexity O(n log n) at the cost of additional storage of subtree information for each node. This algorithm is promising for handling weighted, directed PA networks with general preference functions. No user-friendly software package, however, has been available beyond the C implementation of Atwood et al. (2015).
Existing software packages provide only limited functionality for PA network generation. The Python package NetworkX (Hagberg et al., 2008) has a utility function for generating unweighted, undirected, linear PA networks. The R packages igraph (Csardi and Nepusz, 2006), PAFit (Pham et al., 2020), and fastnet (Dong et al., 2020) contain functions for generating directed and/or undirected PA networks, but none of them allows edge weights. Both igraph and PAFit provide preference functions that are nonlinear in node degrees, but they cover only a small class of power and logarithm functions. Further, no existing package implements the recently proposed PA models with reciprocity (Britton, 2020; Wang and Resnick, 2022a,b). See Table 1 for a brief summary of the functions for generating PA networks in these packages.
We introduce the R package wdnet (Yuan et al., 2023) for efficient generation of a general class of PA networks. The core algorithm is a generalization of the binary tree approach (Atwood et al., 2015). Our package substantially improves the flexibility of PA network generation: it not only allows directed edges and edge weights, but also has additional features such as multiple edge additions, user-defined preference functions, and heterogeneous reciprocal edges, among others. See Table 1 for a summary of the features in comparison with existing packages. The engine under the hood is implemented in C++ for speed and interfaced to R via the package Rcpp (Eddelbuettel and François, 2011).
The rest of the paper is organized as follows. In Section 2, we introduce the preliminaries of weighted, directed PA networks and present the core binary tree algorithm for generating PA networks with basic configurations. In Section 3, we illustrate the usage of the main generation function and how to control the PA network configurations for advanced features such as adding multiple edges and reciprocal edges, and defining user-specified preference functions. Performance comparisons are conducted in Section 4. Section 5 concludes with a summary and discussions.


Generating weighted, directed PA networks
We begin with an introduction to weighted, directed PA networks and a generic PA network generation framework in Section 2.1. The core of the efficient PA network generation algorithm in package wdnet is specified in Section 2.2.

Preliminaries
For discrete time t = 0, 1, 2, ..., let G(t) := (V(t), E(t)) be a weighted, directed network with node set V(t) and edge set E(t). For any v_j, v_k ∈ V(t), let (v_j, v_k, w_jk) ∈ E(t) denote a directed edge from v_j to v_k, where w_jk > 0 represents its weight. There can be more than one edge from v_j to v_k. For the special case of j = k, (v_j, v_k, w_jk) ∈ E(t) is a self-loop. By convention, an initial (or seed) network G(0) has at least one node and one edge. We consider weighted, directed PA networks that allow adding multiple edges at each epoch. For illustration, we begin with a standard directed PA network that adds one edge at a time. There are three edge creation scenarios, respectively associated with probabilities α, β, γ ⩾ 0, subject to α + β + γ = 1. Note that we do not allow β = 1 to avoid degenerate situations. At each step t ⩾ 1, we flip a three-sided coin whose outcomes correspond to the three edge creation scenarios as follows: (1) with probability α, add a new edge from a new node to an existing one in G(t − 1); (2) with probability β, add a new edge between two existing nodes in G(t − 1) (self-loops are allowed); (3) with probability γ, add a new edge from an existing node in G(t − 1) to a new one.
For convenience, we call these three scenarios α, β, and γ schemes, respectively; see Figure 1 for a graphical illustration.
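For a concrete sense of the scheme selection, the three-sided coin flip at each step can be sketched in base R (a minimal illustration, not package code; the scheme probabilities shown are arbitrary):

```r
# Draw one edge creation scheme with probabilities (alpha, beta, gamma).
alpha <- 0.2; beta <- 0.6; gamma <- 0.2
scheme <- sample(c("alpha", "beta", "gamma"), size = 1,
                 prob = c(alpha, beta, gamma))
```

Depending on the drawn scheme, the step then creates the corresponding new node(s) and edge.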
Once an edge creation scenario is decided, we need to determine the corresponding source and/or target nodes. The probability of each candidate node from the current network being selected is proportional to its preference score, which is given by a function (called preference function) of node-specific characteristics. For unweighted, directed PA networks, the most commonly used characteristics are out- and in-degrees, whereas for weighted, directed PA networks, out- and in-strengths are usually adopted. Let O(v_j, t) := ∑_k w_jk and I(v_j, t) := ∑_k w_kj represent the out- and in-strength of v_j ∈ V(t), respectively, where the sums are over the edges incident to v_j. Let θ_1(v_j, t) := f_1(O(v_j, t), I(v_j, t)) be the preference score for sampling v_j as a source node for a newly added edge at step t + 1, with a non-negative function f_1(•) called the source preference function. Then the probability of node v_j ∈ V(t) being selected as a source node at time t + 1 is given by θ_1(v_j, t) / ∑_{v_k ∈ V(t)} θ_1(v_k, t).
Similarly, with a non-negative target preference function f_2(•), one can define the preference score for sampling v_j ∈ V(t) as a target node at time t + 1, as well as the associated sampling probability. The default option for both f_1 and f_2 in the package is a power function of the form

f(x, y) = a_1 x^{a_2} + a_3 y^{a_4} + a_5,

where the parameters a_i, i = 1, ..., 5, are specified by the users and can be different for f_1 and f_2. User-defined preference functions are also allowed; see Section 3.2 for details.
Once the source and target nodes of a new edge are selected, its weight is drawn independently from a distribution with probability density or mass function h on a positive support. The in- and out-strengths of the corresponding nodes are then updated, as well as their source and target preference scores. Then the algorithm proceeds to the next step.
Algorithm 1 summarizes the core structure of generating a weighted, directed PA network. The bottleneck of the algorithm is how to efficiently sample source or target nodes, i.e., the Sample_Node() function in Algorithm 1. We use the sampling procedure for source nodes as an illustration. At time t + 1, the sampling step takes an updated vector of preference scores {θ_1(v_j, t) : v_j ∈ V(t)} as input. The task is straightforward: given the grid of increasing breakpoints formed by cumulative sums of the current preference scores, find the interval that contains a uniform random variable U drawn from Unif(0, ∑_{v_j ∈ V(t)} θ_1(v_j, t)). This can be done by sequentially subtracting node preference scores from U until we reach the node whose preference score, once subtracted, makes U ⩽ 0. This sampling method is a fundamental linear search, as it has to visit each of the existing nodes (one after another) and keeps updating their preference scores. This sampling approach is the linear method in the package, and the complexity of network generation using this method is O(n^2).
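The linear search described above can be sketched as follows (an illustration of the idea, not the package's C++ implementation; theta holds the current preference scores):

```r
# Linear node sampling: draw U ~ Unif(0, sum(theta)) and subtract the
# preference scores one node at a time until U drops to or below zero.
sample_node_linear <- function(theta) {
  u <- runif(1, 0, sum(theta))
  for (j in seq_along(theta)) {
    u <- u - theta[j]
    if (u <= 0) return(j)  # removing theta[j] makes u non-positive
  }
  length(theta)            # guard against floating-point round-off
}
```

Each call visits up to all existing nodes, which is the source of the O(n^2) overall complexity.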
Fast sampling is possible for some special cases, such as when the source and target preference functions are linear in node out- and in-degrees, respectively. Without loss of generality, consider a source preference function of the form f_1(x, y) = x + a_5. When the edges are unweighted, the interval (containing U) can be determined by one uniform draw (Wan et al., 2017). This algorithm acts like putting the node labels into a bag the same number of times as their out-degrees and then drawing a label from the bag, which is analogous to Pólya urn theory (Mahmoud, 2008). This technique is called bag in package igraph, and the same name is adopted in package wdnet. When the edges are weighted, by a clever maneuver, the sampling step for the whole generation process can be done in one batch with a pre-set cumulative sum vector of the edge weights using the base R function findInterval(). This is an extension of the bag algorithm, so we name it bagx. See Appendix A for more details about bag and bagx.
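The batch sampling idea behind bagx can be illustrated with findInterval() (a sketch under the simplifying assumption of a linear preference in strength; the edge weights shown are hypothetical):

```r
# With a linear preference function, each unit of edge weight acts like a
# label in the bag; cumulative weights form the sampling breakpoints.
w <- c(1, 0.5, 2, 1.5)            # hypothetical edge weights in creation order
cw <- cumsum(w)
u <- runif(3, 0, cw[length(cw)])  # a batch of uniform draws
idx <- findInterval(u, cw) + 1    # which weighted slot each draw falls into
idx
```

Mapping the sampled slots back to the corresponding node labels recovers the preferentially sampled nodes in one pass.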

Node sampling based on a binary tree
Our recommended approach for Sample_Node() is a binary tree approach that extends the algorithm in Atwood et al. (2015) to weighted, directed PA networks with general preference functions.

Algorithm 1: Generating a weighted, directed PA network.
Input: Number of steps n; initial network G(0) = (V(0), E(0)); probabilities for the three edge creation scenarios α, β, γ; probability density (or mass) function h for drawing edge weights; preference function f_1 for source nodes and f_2 for target nodes.
Algorithm: Update O, I, θ_1 and θ_2 with initial network G(0); t ← 1; ...

Figure 2: A generated network with its initial graph colored in blue (panel a), the corresponding complete binary tree structure (panel b), and a summary of node attributes (panel c). Edge weights are marked next to the edges. The source and target preference functions are f_1(x, y) = x + 1 and f_2(x, y) = y + 1, respectively.

In a binary tree structure, each node has no more than two children nodes. The two children of a parent node are distinguished by their positions, i.e., the left and the right child. Except for the root node, each node has exactly one parent node. A complete binary tree refers to a binary tree with all levels fully filled except possibly the last level. The last level is not necessarily completely filled, but has to be filled from left to right. A hypothetical example of a complete binary tree is given in Figure 2b. An important application of binary trees is searching. The complexity of searching for a specific node in a binary tree with n nodes is of order O(log n) (Mahmoud, 1992), which is more efficient than a linear search of complexity O(n).
We translate a PA network to a binary tree as follows. Each node in a PA network corresponds to a node in the associated complete binary tree based on the time of its creation. Suppose that G(0) contains only one node v_1 with a self-loop; then v_1 becomes the root of the complete binary tree. Node v_2, which joins the PA network at t = 1, is the left child of v_1, and the next new node v_3, which joins the PA network at t = 2, is the right child of v_1. The next newcomer v_4 (at time t = 3) is attached to v_2 as a left child since the first level (consisting of v_2 and v_3) is fully filled. The translation continues in this fashion until all nodes in the PA network are added to the complete binary tree. For an initial graph G(0) containing more than one node, a node enumeration {v_1, v_2, ..., v_|V(0)|} is required before constructing the binary tree; see Figures 2a and 2b for an example with an initial graph consisting of two nodes (connected by one edge, colored in blue). Different enumeration orders of the initial network result in different binary trees and, consequently, different networks because of the underlying sampling mechanism.

Algorithm 2: Node sampling function based on a binary tree storage structure.
Input: Node set V(t − 1); i ∈ {1, 2} for sampling a source or a target node, respectively.
Output: j, the index of the sampled node.
Function Sample_Node(V(t − 1), i): ...
Having built a complete binary tree, we augment its nodes with a collection of node attributes. At step t, the binary tree node v_j stores the following information: parent κ(v_j) (except for v_1), left child l(v_j), right child r(v_j), out-strength O(v_j, t), in-strength I(v_j, t), preference score as a source node θ_1(v_j, t), and preference score as a target node θ_2(v_j, t). For now we consider θ_1 and θ_2 as functions of node out- and in-strengths, but in general, θ_1 and θ_2 can be functions of any node-level characteristics. Additionally, let η_1(v_j, t) and η_2(v_j, t) denote the total source and target preference of the subtree (a portion of the binary tree consisting of a node and all of its descendants) with root v_j, respectively, giving rise to the following relationship for i ∈ {1, 2}:

η_i(v_j, t) = θ_i(v_j, t) + η_i(l(v_j), t) + η_i(r(v_j), t),

where η_i(•, t) is taken as 0 for a missing child. Figure 2c summarizes the node attributes, including η_1 and η_2, from the complete binary tree (in Figure 2b) that is constructed from the weighted network in Figure 2a. Node sampling based on the binary tree structure, available as the binary method in wdnet, is summarized in Algorithm 2. Generally, the algorithm searches for the subtree to which the potentially sampled node belongs in a recursive manner until the root of the resulting subtree (or the node itself if it is at the bottom level) is returned. The subtree-based search substantially reduces the complexity of the sampling step from O(n) (for the linear search algorithm) to O(log n). The network generation algorithm with binary search thus has complexity O(n log n).
Upon the creation of a new edge (v_j, v_k, w_jk), the following quantities need to be updated: node strengths O(v_j, t) and I(v_k, t); preference scores θ_1(v_j, t), θ_1(v_k, t), θ_2(v_j, t) and θ_2(v_k, t); and total preference scores of subtrees, e.g., η_1(v_j, t), η_2(v_j, t), η_1(κ(v_j), t), η_2(κ(v_j), t). The update of total preference scores is not shown in the algorithm. It traces the growth path through subtrees backwards, and has the same time complexity O(log n) as the sampling method.
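Algorithm 2 can be sketched in R using the implicit array layout of a complete binary tree, where node j's children sit at indices 2j and 2j + 1 (an illustration of the search logic only; the package implements this in C++ with its own storage):

```r
# theta[j]: preference score of node j; eta[j]: total preference score of
# the subtree rooted at j, so eta[j] = theta[j] + eta[2j] + eta[2j + 1].
sample_node_binary <- function(theta, eta) {
  n <- length(theta)
  subtotal <- function(j) if (j <= n) eta[j] else 0
  j <- 1
  u <- runif(1, 0, eta[1])
  repeat {
    u <- u - theta[j]
    if (u <= 0 || 2 * j > n) return(j)  # node j sampled (or leaf reached)
    if (u <= subtotal(2 * j)) {
      j <- 2 * j                        # descend into the left subtree
    } else if (2 * j + 1 <= n) {
      u <- u - subtotal(2 * j)
      j <- 2 * j + 1                    # descend into the right subtree
    } else {
      return(j)                         # round-off guard
    }
  }
}
```

After an edge is added, the eta values along the path from the affected nodes back to the root are incremented by the change in theta, which takes the same O(log n) time.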

Usage
We start with the main function to generate PA networks with basic configurations in Section 3.1, and then introduce additional features in Section 3.2.

Main generation function
The function rpanet() is used to generate PA networks.
With respect to the required inputs in Algorithm 1, we elaborate on the usage of rpanet() and its specifications via the control argument as follows.
Initial network The initial.network is specified by a list containing a matrix of edges (edgelist) in the order of edge creation, a vector of edge weights (edgeweight), and a logical argument (directed) indicating whether the initial network, as well as the generated network, is directed. Each row of edgelist has two elements specifying the two nodes of an edge. The length of edgeweight is equal to the number of rows of edgelist. If edgeweight is not specified, all edges of the initial network are assumed to have weight 1. The default initial network has only one edge, (1, 2, 1.0), corresponding to a network consisting of two nodes with a unit-weight edge from node 1 to node 2. The initial.network can also be specified by a wdnet object, which can be constructed via the utility functions edgelist_to_wdnet() or adj_to_wdnet(). The following example sets up an initial network with two weighted edges, (1, 2, 0.5) and (3, 4, 2.0).

netwk0 <- list(
  edgelist = matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE),
  edgeweight = c(0.5, 2.0),
  directed = TRUE
)

Edge scenarios The function rpa_control_scenario() is used to specify the probability of each edge creation scenario. Based on the real data analysis in Wan et al. (2017), we include two additional edge creation scenarios beyond the α, β and γ schemes introduced in Section 2.1: (1) the ξ scheme, where a new edge is added between two new nodes, and (2) the ρ scheme, where a new node with a self-loop is added. Self-loops are allowed in the β scheme by setting beta.loop to TRUE. When beta.loop = FALSE, the order of sampling source and target nodes may affect the structure of the generated PA network, as controlled by the logical argument source.first. The default settings of these arguments are α = 1, β = γ = ξ = ρ = 0, beta.loop = TRUE, and source.first = TRUE. The following example sets up a configuration that excludes self-loops under the β scheme and samples target nodes before source nodes.
ctr1 <- rpa_control_scenario(
  alpha = 0.2, beta = 0.6, gamma = 0.2,
  beta.loop = FALSE, source.first = FALSE
)

Edge weights Edge weights are controlled by rpa_control_edgeweight() through its sampler argument. This argument accepts a function that takes a single parameter, representing the number of values to sample, and returns a vector of sampled edge weights. Note that the sampled values must be positive real numbers. The default setting is sampler = NULL, referring to the case where all new edges take unit weight. In the following example, edge weights are sampled from a gamma distribution with shape 5 and scale 0.2; the "+" operator has been overloaded to concatenate multiple control lists.
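The code for the gamma edge-weight example just described is not shown above; a sketch consistent with the description (the object name ctr2 is assumed) is:

```r
# Concatenate the scenario control with a gamma edge-weight sampler
# (shape 5, scale 0.2) via the overloaded "+" operator.
ctr2 <- ctr1 + rpa_control_edgeweight(
  sampler = function(n) rgamma(n, shape = 5, scale = 0.2)
)
```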
We further allow users to specify their own preference functions; see Section 3.2 for details.
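For the default power-function form introduced in Section 2.1, the parameters (a_1, ..., a_5) are passed through rpa_control_preference(); the following sketch assumes the argument names sparams and tparams (for f_1 and f_2, respectively) from the package documentation:

```r
# Default-style preference controls: f1(x, y) = x + 1, f2(x, y) = y + 1.
ctr_pref <- rpa_control_preference(
  ftype = "default",
  sparams = c(1, 1, 0, 0, 1),  # f1(x, y) = 1 * x^1 + 0 * y^0 + 1
  tparams = c(0, 0, 1, 1, 1)   # f2(x, y) = 0 * x^0 + 1 * y^1 + 1
)
```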
Returned value Function rpanet() returns a list of class wdnet containing the following components: newedge is a vector summarizing the number of new edges added at each step; edge.attr is a data frame containing edge weights and edge creation scenarios, where edges from schemes α, β, γ, ξ, ρ are respectively denoted as scenarios 1, 2, 3, 4, 5, and edges from the initial network are denoted as scenario 0; node.attr is a data frame containing node out- and in-strengths as well as source and target preference scores. Other items are self-explanatory.
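A minimal end-to-end call, assuming the interface described above (the argument name nstep and the method choices are taken from the package documentation), might look like:

```r
library(wdnet)
set.seed(123)
net <- rpanet(
  nstep = 1e3,
  control = rpa_control_scenario(alpha = 0.2, beta = 0.6, gamma = 0.2),
  method = "binary"
)
names(net)           # expect components such as newedge, edge.attr, node.attr
head(net$node.attr)  # out-/in-strengths and preference scores per node
```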

Additional features
The package wdnet provides a few additional distinctive features in the PA network generation process that are not available in other software packages. These features are obtained by adapting Algorithm 1.

Multiple edge addition
The creation of multiple edges at one step is controlled by the function rpa_control_newedge(). The first argument of this function, sampler, determines the distribution of the number of new edges to be added at the same step. This argument accepts a function that takes a single parameter, representing the number of values to be sampled, and returns a vector of sampled numbers of new edges. Note that the sampled values must be positive integers. By default, sampler is set to NULL, representing the addition of only one edge at each step.
When more than one edge is added in one step, we keep the node strengths and their preference scores unchanged until all edges of that step have been added. Users need to specify whether to sample the candidate nodes with replacement or not. For directed networks, the logical arguments snode.replace and tnode.replace determine whether the source and target nodes are sampled with replacement, respectively. For undirected networks, only one logical argument, node.replace, needs to be specified.
The code below updates the setting from ctr3 by letting the number of new edges follow a unit-shifted Poisson distribution (Wang and Resnick, 2023) with probability mass function

P(m) = e^{−λ} λ^{m−1} / (m − 1)!, m = 1, 2, ...,

with λ = 2 here. Both source and target nodes are sampled without replacement.
ctr4 <- ctr3 + rpa_control_newedge(
  sampler = function(n) rpois(n, lambda = 2) + 1,
  snode.replace = FALSE, tnode.replace = FALSE
)

Reciprocal edges Reciprocal edges are mutual links between two nodes. We allow reciprocal edges under a heterogeneous setting (Wang and Resnick, 2022b) where each node belongs to one of K ⩾ 1 groups. Upon the emergence of each new node, its group label is assigned according to a user-specified probability vector π := (π_1, π_2, ..., π_K), where 0 ⩽ π_k ⩽ 1 represents the probability that the node belongs to group k ∈ {1, 2, ..., K}. Similar to stochastic block models, there is a probability block matrix q := (q_kℓ)_{K×K} (not necessarily symmetric), also specified by the users, that determines the probability of adding a reciprocal edge for each new edge joining the network. For example, consider a new edge (v_i, v_j, w_ij) where v_i and v_j are labeled with k = 2 and ℓ = 3, respectively; then its reciprocal counterpart (v_j, v_i, w_ji) is added to the network instantaneously with probability q_32. The weight of the reciprocal edge (if added), w_ji, is independently sampled with the configurations specified in rpa_control_edgeweight(). When more than one new edge is added at a step, the reciprocal edge for each of them is added independently, one after another. The function rpa_control_reciprocal() gives the configurations of reciprocal edges. The arguments group.prob and recip.prob specify the probability vector π and the block probability matrix q, respectively. In addition, the logical argument selfloop.recip determines whether reciprocal edges for self-loops are allowed. Their default settings are group.prob = NULL, recip.prob = NULL and selfloop.recip = FALSE, respectively, referring to the case of no immediate reciprocal edges. The following example creates a configuration with π = (0.4, 0.6) and block matrix q with rows (0.4, 0.1) and (0.2, 0.5).
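The example code itself is not shown above; a sketch consistent with the description (the object name ctr5 is assumed, matching the next example) is:

```r
# Two groups with membership probabilities pi = (0.4, 0.6) and
# reciprocal-edge block matrix q (rows index the target node's group).
ctr5 <- ctr4 + rpa_control_reciprocal(
  group.prob = c(0.4, 0.6),
  recip.prob = matrix(c(0.4, 0.1, 0.2, 0.5), nrow = 2, byrow = TRUE)
)
```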
Customized preference functions User-defined preference functions in C++ syntax are allowed by setting ftype = "customized" in rpa_control_preference(). This is implemented in C++ through the utility functions in the R package RcppXPtrUtils (Ucar, 2022). For directed networks, one-line C++ expressions can be passed to the arguments spref and tpref to define the source and target preference functions, respectively. The expressions are strings in R but with valid C++ syntax as transformations of outs and ins. The strings are passed to the function cppXPtr() in package RcppXPtrUtils, which compiles the source code and returns an XPtr (external pointer) to the compiled preference function. The default preference functions f_1(x, y) = x + 1 and f_2(x, y) = y + 1 can be equivalently achieved by setting spref = "outs + 1" and tpref = "ins + 1". The following example sets the preference functions to f_1(x, y) = ln(x + 1) + 1 and f_2(x, y) = ln(y + 1) + 1:

ctr6 <- ctr5 + rpa_control_preference(
  ftype = "customized",
  spref = "log(outs + 1) + 1",
  tpref = "log(ins + 1) + 1"
)

For undirected networks, the argument pref specifies a one-line C++ expression as a transformation of node strength s. The default preference function g(x) = x + 1 can be equivalently achieved by pref = "s + 1". Users need to ensure the non-negativity of the preference functions. The generation process will be terminated if a negative preference value is encountered.
For more advanced preference functions which may take multiple lines of C++ code, see examples in Appendix B.

Benchmarks
In this section, we generate weighted and unweighted PA networks with different sizes and preference functions via our package (wdnet, version 1.2.0), igraph (version 1.3.5), and PAFit (version 1.2.5), and compare their performance. All simulations were run on a single core of an Intel Xeon Gold 6150 CPU @ 2.70GHz with 16 GB of RAM.
Weighted networks Since the other two packages (i.e., igraph and PAFit) do not admit edge weights, the comparison of weighted PA network generation is between the linear and binary methods in our package wdnet. Specifically, we assign the same probabilities to the edge creation scenarios (i.e., α = β = γ = 1/3) and set the source and target preference functions to f_1(x, y) = x^k + 0.1 and f_2(x, y) = y^k + 0.1, respectively, with k ∈ {0.5, 1, 2} (where k = 0.5 and k = 2 respectively refer to sub-linearity and super-linearity). The edge weights are drawn independently from Gamma(5, 0.2). For each k, we generate PA networks over various numbers of evolutionary steps (n ∈ {10^3, ..., 10^7}) with a simple initial network consisting of two nodes and one edge (1, 2, 1) (the default). The top three panels of Figure 3 compare the median runtimes of generating 100 independent weighted, directed PA networks via the binary and linear methods. When the preference functions are sub-linear (k = 0.5) or linear (k = 1), the binary method is much more efficient than the linear method. Moreover, the larger the number of steps, the more advantageous the binary method. Some simulations for the linear method are omitted because they are excessively time-consuming. For a super-linear preference function (k = 2), the difference in generation speed between the two methods becomes subtle. A further investigation reveals that the sums of source (and target) preference scores of the 20 earliest created nodes, {v_1, v_2, ..., v_20}, take up 99% of the total (over all nodes), making them dominant in the sampling process. These early created nodes are always quickly selected under whichever edge addition scenario, since the linear search visits these "ancestors" first. Consequently, the time cost of the linear method is significantly reduced.
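The weighted-network setup above can be reproduced along these lines (a sketch; the customized C++ preference strings and the method argument follow Section 3, and the names shown are our assumptions):

```r
library(wdnet)
# Equal scheme probabilities, preference functions x^k + 0.1 and y^k + 0.1
# with k = 0.5 (use "outs + 0.1" for k = 1, "pow(outs, 2) + 0.1" for k = 2),
# and gamma-distributed edge weights.
ctrl <- rpa_control_scenario(alpha = 1/3, beta = 1/3, gamma = 1/3) +
  rpa_control_preference(
    ftype = "customized",
    spref = "pow(outs, 0.5) + 0.1",
    tpref = "pow(ins, 0.5) + 0.1"
  ) +
  rpa_control_edgeweight(sampler = function(n) rgamma(n, shape = 5, scale = 0.2))
net <- rpanet(nstep = 1e4, control = ctrl, method = "binary")
```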
To further investigate the impact of early created nodes on the sampling process, we consider a modified (i.e., weighted and directed) Erdős–Rényi (ER) network (Erdős and Rényi, 1959; Gilbert, 1959) as an initial network. For each simulation run, we generate an ER network with 10^4 nodes and 10^6 edges, where edge weights are drawn independently from Gamma(5, 0.2). We keep all other parameters the same as in the previous experiment, and report the median runtimes (over 100 independently generated PA network replicates) in the bottom three panels of Figure 3. We observe similar patterns for k = 0.5 and k = 1, so we focus on k = 2 only. Here the binary algorithm outperforms the linear method, since the large seed network alleviates the domination of old nodes in the subsequent sampling process.
In fact, most nodes that are sampled throughout the process are those with high strengths in the seed graph. Owing to the nature of the ER network, a few hundred nodes (out of 10^4) in the seed network are repeatedly sampled. However, a larger pool (compared to 20 in the previous experiment) results in a longer runtime when using the linear method. On the other hand, we believe the performance of the linear algorithm will improve as n gets larger, since fewer nodes will continue to be dominant in the sampling process. Since we have added a sorting procedure to the linear algorithm, those nodes will be quickly selected. Accordingly, the linear method may eventually outperform the binary method for extremely large networks. Last but not least, we find that the runtime curves between 10^3 and 10^4 become flatter in the bottom panels compared to their upper counterparts (for each k). This is due to the large seed graph, which requires a certain amount of time to initialize the sampling process. Consequently, there is little difference in the total generation time for relatively small n.
Unweighted networks Next, we compare the performance of generating unweighted PA networks using our package wdnet and the two popular packages PAFit and igraph. Since PAFit and igraph allow the α scheme only, we now set α = 1 in the rpa_control_scenario() function. Under this setting, we only need to define a target preference function of the form f_2(x, y) = y^k + 0.1, where we again take k ∈ {0.5, 1, 2}. Similar to the previous experiments, we generate unweighted PA networks with n ∈ {10^3, ..., 10^7} for each k. The default initial network is adopted for each simulation run. The results are given in the top three panels of Figure 4. For k = 0.5 and k = 1, we find that the binary algorithm in our package and that in igraph (the psumtree method) are almost equally efficient, and both outperform the rest. The linear method in our package performs better than PAFit (which uses a similar linear search) since the former is implemented in C++ whereas the latter is implemented in R. For k = 2, we do not see much difference among the two methods in our package wdnet and that in igraph, which is consistent with our conclusions from the weighted PA network generation experiment. Overall, PAFit is the least efficient, especially for generating large PA networks. Lastly, we repeat our simulations with large ER networks as the initial graphs in order to alleviate the impact of a few old nodes during the generation process. The setup is the same as that for the weighted network simulations. Since PAFit does not accept arbitrary initial networks, we exclude it from this set of comparisons. The corresponding results are shown in the bottom three panels of Figure 4. Similar to the conclusions drawn for weighted PA networks, the binary method outperforms the linear method in our package owing to the reduced influence of old nodes when k = 2. Moreover, there is little performance difference between the binary method and igraph across all considered values of k.

Discussions
Our R package wdnet provides useful tools for efficiently generating large-scale PA networks. The package admits a wide range of PA network specifications such as multiple edge addition scenarios, weighted and directed edges, and reciprocal edges. Our implementations extend those discussed in Wan et al. (2017) and Britton (2020), most of which are not available in other existing packages. A distinctive feature of the package is that it allows users to define their own preference functions. Our binary algorithm is efficient in general situations. Our linear algorithm outperforms implementations of the same algorithm in other packages due to its sorting step. The core implementation is in C++ for speed.
Efficient generation of PA networks facilitates the investigation of PA network properties and goodness-of-fit diagnostics in real applications. A PA network is controlled by many parameters. When theoretical properties, for example, transitivity and clustering coefficients, are challenging to derive, their empirical versions can be easily learned by generating many realizations given the model parameters. When the initial network size is large relative to the desired PA network size, its impact may not be ignorable and could be studied through simulations. In real applications, goodness-of-fit diagnostics of a PA network can be done by generating many replicates from the fitted PA model and comparing the observed network statistics with the empirical distribution of the same statistics from the replicates. Such goodness-of-fit checks may motivate modifications of the PA models to fit the real data better (e.g., Wang et al., 2022).
Beyond general PA network generation, wdnet also provides a collection of other functions. Specifically, several centrality measures are available via the function centrality(), including the recently proposed weighted PageRank centrality (Zhang et al., 2022). Assortativity measures for weighted, directed networks discussed in Yuan et al. (2021) are available via the function assortcoef(). A degree-preserving rewiring algorithm for generating networks with predetermined assortativity coefficients (Wang et al., 2022) is available via the function dprewire(). All these functions are derived from recent research and are not available in other packages.
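A brief sketch of calling these utilities on a generated network (default arguments assumed; exact signatures may differ slightly from this illustration):

```r
library(wdnet)
set.seed(1)
net <- rpanet(nstep = 1e3)  # a small PA network with default settings
centrality(net)             # centrality measures, incl. weighted PageRank
assortcoef(net)             # assortativity for weighted, directed networks
```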

Figure 1 :
Figure 1: Three edge creation scenarios corresponding to α, β and γ schemes (from left to right), respectively.
(Algorithm 1, initialization) Initialize (n + |V(0)|)-dimensional zero vectors of out- and in-strengths, O and I; initialize (n + |V(0)|)-dimensional zero vectors of source and target preference scores, θ_1 and θ_2.

Figure 4 :
Figure 4: Algorithm runtimes for unweighted networks with the default initial network (upper), consisting of one edge from v_1 to v_2, and with initial unweighted ER networks (lower) of 10^4 nodes and 10^6 edges. Probabilities of the edge schemes are α = 1, β = γ = 0. The target preference function is f_2(x, y) = y^k + 0.1. Each point represents the median runtime over 100 replications.

Table 1 :
Summary of packages generating PA networks.
(Algorithm 2, excerpt)
/* Search in the subtree with root l(v_j) */
  j ← index of l(v_j);
else if U > temp then /* Search in the subtree with root r(v_j) */
  U ← U − temp;
  j ← index of r(v_j);
else if U ⩽ 0 then /* Return the index of the sampled node v_j */
  return j;