This document contains code used to analyse real data from the 9-11 Hijacker network for the Flow-Through Centrality project. We want to test which method of computing centrality is the most stable when applied to real data. The R package PMCMRplus is used to obtain rank-based tests of accuracy. We also need the R package "tidyverse" to massage the data into the appropriate format for hypothesis testing. The references are given below.
Before the tests are run, we subtract the true centrality value from each observed centrality value for each of the 10 samples, multiply by 2, then divide by the sum of the absolute values of the true and observed centralities, thus obtaining a symmetric relative (percentage) error. Where both values are zero the ratio is 0/0 (NaN), and we set the error to zero. This operation puts all centrality measures on the same scale and allows comparisons with the true value.
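As a sketch, this error can be written as a small helper function (the name `relError` is ours, not part of the analysis code):

```r
# Symmetric relative error: 2 * (observed - true) / (|observed| + |true|).
# When observed and true are both zero the ratio is 0/0 (NaN); we treat
# that as zero error, matching the cleaning step applied below.
relError <- function(observed, true) {
  err <- 2 * (observed - true) / (abs(observed) + abs(true))
  ifelse(is.nan(err), 0, err)
}

relError(c(1.06, 0, 0.94), c(1, 0, 1))
## [1]  0.05825243  0.00000000 -0.06185567
```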
Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel, Achim Zeileis (2008). Implementing a Class of Permutation Tests: The coin Package. Journal of Statistical Software 28(8), 1-23. URL http://www.jstatsoft.org/v28/i08/
From the PMCMRplus package we use the function chaAllPairsNashimotoTest, which performs the Nashimoto-Wright NPM test for all-pairs comparisons against an ordered alternative. The data need to be in long format and balanced across the groups.
.csv files associated with this markdown document are in “~/Code/HijackerData”. Original data were given in Excel files, which are in the directory “~/Code/HijackerData/ExcelFiles”.
# setwd("./Code/HijackerData")
#install.packages("coin") - use if you have not installed the package.
library(coin) # load the package in R
## Loading required package: survival
library(tidyverse) # for tidying the data
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(xtable) # for obtaining tables in LaTeX format
library(PMCMRplus) # For Nashimoto-Wright NPM-test for ordered means of non-normal data
The code below enters the Hijacker data with 6% perturbation and calculates the relative percentage error for each centrality measure.
varNames <- c("Network",paste0(c("Cent","Tier"), rep(1:62,each=2)))
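As a quick sanity check (ours, not part of the original analysis), the interleaved names produced by `paste0` with recycling can be inspected:

```r
# c("Cent","Tier") is recycled across rep(1:62, each = 2), pairing each
# index with both prefixes.
varNames <- c("Network", paste0(c("Cent", "Tier"), rep(1:62, each = 2)))
head(varNames, 5)  # "Network" "Cent1" "Tier1" "Cent2" "Tier2"
length(varNames)   # 125 columns: Network plus 62 Cent/Tier pairs
```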
dataBW <- read.csv(file="Hijack_6percent_betweenness.csv",skip=5,nrows=11,header=FALSE)
colnames(dataBW) <- varNames
dataBW$Network<-str_replace_all(dataBW$Network, c(" " = "" ))
test <- dataBW %>% mutate_at(vars(matches("Cent")), .funs = function(x) is.infinite(x)) # flag infinite centralities
length(test[test == TRUE]) # count the infinite values
## [1] 11
dataBW <- dataBW %>% mutate_at(vars(matches("Cent")), .funs = function(x) (2*(x-x[1]))/(abs(x)+abs(x[1]))) # symmetric relative error vs. the true value in row 1
dataBW <- dataBW %>% mutate_at(vars(matches("Cent")), .funs = function(x) ifelse(is.nan(x), 0, x)) # set 0/0 (NaN) cases to zero error
dataBW <- dataBW[-1,] # delete first row of true values
dataBW$Measure <- rep("BW",10)
dataBW$Sample <- 1:10
dataBWLong <- gather(dataBW,key="CentLevel",value="Centrality",starts_with("Cent"), factor_key=TRUE) %>% select(Sample,Network,Measure,Centrality,CentLevel)
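The per-measure blocks repeat the same read/clean/reshape steps for each of the five centrality measures. As a sketch, they could be collected in a helper such as this hypothetical `readMeasure()` (assuming the same .csv layout: five skipped header lines, a true-value row, then ten sample rows):

```r
library(tidyverse)

# Hypothetical helper for the repeated per-measure steps: read one measure's
# .csv file, convert each centrality to a symmetric relative error against
# the true value in the first row, zero out 0/0 (NaN) cases, drop the
# true-value row, and reshape to long format.
readMeasure <- function(file, measure, varNames) {
  dat <- read.csv(file = file, skip = 5, nrows = 11, header = FALSE)
  colnames(dat) <- varNames
  dat$Network <- str_replace_all(dat$Network, c(" " = ""))
  dat <- dat %>%
    mutate_at(vars(matches("Cent")),
              .funs = function(x) (2 * (x - x[1])) / (abs(x) + abs(x[1]))) %>%
    mutate_at(vars(matches("Cent")),
              .funs = function(x) ifelse(is.nan(x), 0, x))
  dat <- dat[-1, ]          # delete first row of true values
  dat$Measure <- measure
  dat$Sample <- 1:10
  gather(dat, key = "CentLevel", value = "Centrality",
         starts_with("Cent"), factor_key = TRUE) %>%
    select(Sample, Network, Measure, Centrality, CentLevel)
}

# e.g. dataBWLong <- readMeasure("Hijack_6percent_betweenness.csv", "BW", varNames)
```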
dataCL <- read.csv(file="Hijack_6percent_Closeness.csv",skip=5,nrows=11,header=FALSE)
colnames(dataCL) <- varNames
dataCL$Network<-str_replace_all(dataCL$Network, c(" " = "" ))
test <- dataCL %>% mutate_at(vars(matches("Cent")), .funs = function(x) is.infinite(x))
length(test[test == TRUE])
## [1] 11
dataCL <- dataCL %>% mutate_at(vars(matches("Cent")), .funs =function(x) (2*(x-x[1]))/(abs(x)+abs(x[1])))
dataCL <- dataCL %>% mutate_at(vars(matches("Cent")), .funs = function(x) ifelse(is.nan(x), 0, x))
dataCL <- dataCL[-1,] # delete first row of true values
dataCL$Measure <- rep("CL",10)
dataCL$Sample <- 1:10
dataCLLong <- gather(dataCL,key="CentLevel",value="Centrality",starts_with("Cent"), factor_key=TRUE) %>% select(Sample,Network,Measure,Centrality,CentLevel)
dataFB <- read.csv(file="Hijack_6percent_FlowBetween.csv",skip=5,nrows=11,header=FALSE)
colnames(dataFB) <- varNames
dataFB$Network<-str_replace_all(dataFB$Network, c(" " = "" ))
test <- dataFB %>% mutate_at(vars(matches("Cent")), .funs = function(x) is.infinite(x))
length(test[test == TRUE])
## [1] 11
dataFB <- dataFB %>% mutate_at(vars(matches("Cent")), .funs =function(x) (2*(x-x[1]))/(abs(x)+abs(x[1])))
dataFB <- dataFB %>% mutate_at(vars(matches("Cent")), .funs = function(x) ifelse(is.nan(x), 0, x))
dataFB <- dataFB[-1,] # delete first row of true values
dataFB$Measure <- rep("FB",10)
dataFB$Sample <- 1:10
dataFBLong <- gather(dataFB,key="CentLevel",value="Centrality",starts_with("Cent"), factor_key=TRUE) %>% select(Sample,Network,Measure,Centrality,CentLevel)
dataFT <- read.csv(file="Hijack_6percent_Flowthrough.csv",skip=5,nrows=11,header=FALSE)
colnames(dataFT) <- varNames
dataFT$Network<-str_replace_all(dataFT$Network, c(" " = "" ))
test <- dataFT %>% mutate_at(vars(matches("Cent")), .funs = function(x) is.infinite(x))
length(test[test == TRUE])
## [1] 11
dataFT <- dataFT %>% mutate_at(vars(matches("Cent")), .funs =function(x) (2*(x-x[1]))/(abs(x)+abs(x[1])))
dataFT <- dataFT %>% mutate_at(vars(matches("Cent")), .funs = function(x) ifelse(is.nan(x), 0, x))
dataFT <- dataFT[-1,] # delete first row of true values
dataFT$Measure <- rep("FT",10)
dataFT$Sample <- 1:10
dataFTLong <- gather(dataFT,key="CentLevel",value="Centrality",starts_with("Cent"), factor_key=TRUE) %>% select(Sample,Network,Measure,Centrality,CentLevel)
dataSB <- read.csv(file="Hijack_6percent_StableBetween.csv",skip=5,nrows=11,header=FALSE)
colnames(dataSB) <- varNames
dataSB$Network<-str_replace_all(dataSB$Network, c(" " = "" ))
test <- dataSB %>% mutate_at(vars(matches("Cent")), .funs = function(x) is.infinite(x))
length(test[test == TRUE])
## [1] 77
dataSB <- dataSB %>% mutate_at(vars(matches("Cent")), .funs =function(x) (2*(x-x[1]))/(abs(x)+abs(x[1])))
dataSB <- dataSB %>% mutate_at(vars(matches("Cent")), .funs = function(x) ifelse(is.nan(x), 0, x))
dataSB <- dataSB[-1,] # delete first row of true values
dataSB$Measure <- rep("SB",10)
dataSB$Sample <- 1:10
dataSBLong <- gather(dataSB,key="CentLevel",value="Centrality",starts_with("Cent"), factor_key=TRUE) %>% select(Sample,Network,Measure,Centrality,CentLevel)
hijackPert6 <- bind_rows(dataFTLong, dataFBLong, dataCLLong, dataBWLong, dataSBLong)
hijackPert6$Measure <- ordered(hijackPert6$Measure, levels=c("FT","FB","CL","BW","SB"))
hijackPert6Lev1 <- subset(hijackPert6, CentLevel == "Cent1", select = c("Sample","Network","Measure","Centrality"))
npmLev1 <- chaAllPairsNashimotoTest(Centrality~Measure,data=hijackPert6Lev1,p.adjust.methods="holm")
hijackPert6Lev2 <- subset(hijackPert6, CentLevel == "Cent2", select = c("Sample","Network","Measure","Centrality"))
npmLev2 <- chaAllPairsNashimotoTest(Centrality~Measure,data=hijackPert6Lev2,p.adjust.methods="holm")
hijackPert6Lev3 <- subset(hijackPert6, CentLevel == "Cent3", select = c("Sample","Network","Measure","Centrality"))
npmLev3 <- chaAllPairsNashimotoTest(Centrality~Measure,data=hijackPert6Lev3,p.adjust.methods="holm")
hijackPert6Lev4 <- subset(hijackPert6, CentLevel == "Cent4", select = c("Sample","Network","Measure","Centrality"))
npmLev4 <- chaAllPairsNashimotoTest(Centrality~Measure,data=hijackPert6Lev4,p.adjust.methods="holm")
hijackPert6Lev5 <- subset(hijackPert6, CentLevel == "Cent5", select = c("Sample","Network","Measure","Centrality"))
npmLev5 <- chaAllPairsNashimotoTest(Centrality~Measure,data=hijackPert6Lev5,p.adjust.methods="holm")
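The five per-level subsets and Nashimoto tests above also follow one pattern; as a sketch, a hypothetical wrapper `runNpmByLevel()` could produce all of them in one call:

```r
library(PMCMRplus)

# Hypothetical wrapper: subset the long data to one centrality level at a
# time, run the Nashimoto-Wright all-pairs test on it, and return the
# results as a named list.
runNpmByLevel <- function(dat, levs = paste0("Cent", 1:5)) {
  out <- lapply(levs, function(lev) {
    d <- subset(dat, CentLevel == lev,
                select = c("Sample", "Network", "Measure", "Centrality"))
    chaAllPairsNashimotoTest(Centrality ~ Measure, data = d,
                             p.adjust.methods = "holm")
  })
  names(out) <- levs
  out
}

# e.g. npmByLevel <- runNpmByLevel(hijackPert6); npmByLevel$Cent1 corresponds to npmLev1
```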
npmLev1
##
## Pairwise comparisons using Nashimoto-Wright's NPT'-Test
## data: Centrality by Measure
## FT FB CL BW
## FB 1.00000 - - -
## CL 1.00000 1.00000 - -
## BW 0.00063 0.00063 0.00063 -
## SB 0.00063 0.00063 0.00063 1.00000
##
## P value adjustment method: holm
## alternative hypothesis: greater
npmLev2
##
## Pairwise comparisons using Nashimoto-Wright's NPT'-Test
## data: Centrality by Measure
## FT FB CL BW
## FB 1 - - -
## CL 1 1 - -
## BW 1 1 1 -
## SB 1 1 1 1
##
## P value adjustment method: holm
## alternative hypothesis: greater
npmLev3
##
## Pairwise comparisons using Nashimoto-Wright's NPT'-Test
## data: Centrality by Measure
## FT FB CL BW
## FB 1.0000 - - -
## CL 1.0000 1.0000 - -
## BW 0.0037 0.0037 0.0037 -
## SB 0.0079 0.0079 0.0079 1.0000
##
## P value adjustment method: holm
## alternative hypothesis: greater
npmLev4
##
## Pairwise comparisons using Nashimoto-Wright's NPT'-Test
## data: Centrality by Measure
## FT FB CL BW
## FB 1 - - -
## CL 1 1 - -
## BW 1 1 1 -
## SB 1 1 1 1
##
## P value adjustment method: holm
## alternative hypothesis: greater
npmLev5
##
## Pairwise comparisons using Nashimoto-Wright's NPT'-Test
## data: Centrality by Measure
## FT FB CL BW
## FB 0.974 - - -
## CL 0.456 0.456 - -
## BW 0.456 0.456 0.951 -
## SB 0.011 0.011 0.456 0.456
##
## P value adjustment method: holm
## alternative hypothesis: greater
We plot only the first five centrality levels, since including all of them overcrowds the plot.
mypal <- c("#CC79A7","#E69F00","#0072B2", "#009E73", "#56B4E9", "#D55E00")
hijackPert6All <- bind_rows(hijackPert6Lev1,hijackPert6Lev2,hijackPert6Lev3,hijackPert6Lev4,hijackPert6Lev5, .id="CentLevel")
ggplot(hijackPert6All, aes(x=Sample,y=Centrality,colour=Measure)) +
theme_bw() + geom_line(aes(linetype=Measure), size=1.1) +
scale_colour_manual(values=mypal) +
facet_grid(CentLevel~.)
pdf(file="hijackPert6Lines.pdf")
ggplot(hijackPert6All, aes(x=Sample,y=Centrality,colour=Measure)) +
theme_bw() + geom_line(aes(linetype=Measure), size=1.1) +
scale_colour_manual(values=mypal) +
facet_grid(CentLevel~.)
dev.off()
## quartz_off_screen
## 2
sumDataPert6 <- hijackPert6All %>% group_by(CentLevel,Measure) %>% summarise(Mean=mean(Centrality),SD=sd(Centrality),Median=median(Centrality),MAD=mad(Centrality))
## `summarise()` has grouped output by 'CentLevel'. You can override using the `.groups` argument.
ggplot(sumDataPert6,aes(x=CentLevel,y=Mean,colour=Measure)) +
theme_bw() + geom_point(aes(shape=Measure), size=3) +
scale_colour_manual(values=mypal)
## Warning: Using shapes for an ordinal variable is not advised
pdf(file="hijackPert6Summary.pdf")
ggplot(sumDataPert6,aes(x=CentLevel,y=Mean,colour=Measure)) +
theme_bw() + geom_point(aes(shape=Measure), size=3) +
scale_colour_manual(values=mypal)
## Warning: Using shapes for an ordinal variable is not advised
dev.off()
## quartz_off_screen
## 2
xtable(sumDataPert6)
## % latex table generated in R 4.1.2 by xtable 1.8-4 package
## % Tue Jul 12 12:10:02 2022
## \begin{table}[ht]
## \centering
## \begin{tabular}{rllrrrr}
## \hline
## & CentLevel & Measure & Mean & SD & Median & MAD \\
## \hline
## 1 & 1 & FT & 0.00 & 0.00 & 0.00 & 0.00 \\
## 2 & 1 & FB & 0.00 & 0.00 & 0.00 & 0.00 \\
## 3 & 1 & CL & -0.02 & 0.02 & -0.02 & 0.01 \\
## 4 & 1 & BW & 0.00 & 0.00 & 0.00 & 0.00 \\
## 5 & 1 & SB & 0.00 & 0.00 & 0.00 & 0.00 \\
## 6 & 2 & FT & 0.04 & 0.21 & 0.11 & 0.08 \\
## 7 & 2 & FB & 0.04 & 0.23 & 0.06 & 0.30 \\
## 8 & 2 & CL & -0.02 & 0.02 & -0.01 & 0.02 \\
## 9 & 2 & BW & 0.05 & 0.44 & 0.04 & 0.37 \\
## 10 & 2 & SB & -0.07 & 0.90 & 0.00 & 0.51 \\
## 11 & 3 & FT & 0.20 & 0.63 & 0.00 & 0.00 \\
## 12 & 3 & FB & -0.07 & 0.34 & 0.02 & 0.05 \\
## 13 & 3 & CL & -0.03 & 0.05 & -0.01 & 0.01 \\
## 14 & 3 & BW & 0.20 & 0.63 & 0.00 & 0.00 \\
## 15 & 3 & SB & 0.00 & 0.00 & 0.00 & 0.00 \\
## 16 & 4 & FT & -0.00 & 0.11 & 0.02 & 0.07 \\
## 17 & 4 & FB & -0.02 & 0.20 & -0.03 & 0.15 \\
## 18 & 4 & CL & -0.02 & 0.02 & -0.02 & 0.02 \\
## 19 & 4 & BW & -0.03 & 0.11 & 0.01 & 0.06 \\
## 20 & 4 & SB & -0.01 & 0.11 & 0.00 & 0.14 \\
## 21 & 5 & FT & -0.18 & 0.47 & -0.01 & 0.30 \\
## 22 & 5 & FB & -0.46 & 0.50 & -0.36 & 0.51 \\
## 23 & 5 & CL & -0.01 & 0.01 & -0.01 & 0.01 \\
## 24 & 5 & BW & -0.41 & 0.84 & -0.05 & 0.12 \\
## 25 & 5 & SB & 0.00 & 0.00 & 0.00 & 0.00 \\
## \hline
## \end{tabular}
## \end{table}