
Journal of Data Science

Search results 877


Building an Honest Tree for Mass Spectra Classification Based on Prior Logarithm Normal Distribution

Cheng-Jian Xu Ping He Yi-Zeng Liang

https://doi.org/10.6339/JDS.2003.01(4).179

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 1, Issue 4 (2003), pp. 497–509

A Meta Analysis for the Basic Reproduction Number of COVID-19 with Application in Evaluating the Effectiveness of Isolation Measures in Different Countries

Jianghu Dong Yongdao Zhou Ying Zhang All authors (5)

https://doi.org/10.6339/JDS.202007_18(3).0016

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 496–510

Abstract

COVID-19 is spreading quickly around the world and poses a significant threat to public health. This study applies meta-analysis to estimate the basic reproduction number (R0) more accurately, because prior estimates of R0 in the existing literature range broadly from 1.95 to 6.47. Using meta-analysis techniques, we obtain a more robust estimate of R0, which is substantially larger than that provided by the World Health Organization (WHO). A susceptible-infectious-removed (SIR) model for new infection cases, based on the R0 from the meta-analysis, is proposed to estimate the effective reproduction number Rt. The curves of estimated Rt values over time illustrate that the isolation measures enforced in China and South Korea were substantially more effective in controlling COVID-19 than the measures enacted early in both Italy and the United States. Finally, we present the daily standardized infection cases per million population over time across countries, which is a good index of the effectiveness of isolation measures in preventing COVID-19. This standardized infection case count indicates whether the current infection severity exceeds the national health capacity to care for patients.

Improved Test for Detecting Explosive Bubbles

Harsha S Ismail B

https://doi.org/10.6339/JDS.201707_15(3).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 3 (2017), pp. 495–508

Abstract

Recent decades have witnessed a series of damages in the financial sector due to price movements beyond certain limits. These movements are commonly termed financial bubbles. The formation and burst of a bubble causes huge damage in the field of finance. Hence, to protect the market from such damage, the detection and modeling of financial bubbles is essential. We propose improved test procedures for detecting financial bubbles by combining the existing Max test and the Supremum Augmented Dickey-Fuller (SADF) test, both generally used for detecting bubbles. The performance of the proposed test is compared with that of existing tests via Monte Carlo simulation. The proposed test has higher power than the existing tests for detecting collapsible bubbles, irrespective of window length and probability of collapse. Further, the power of the proposed test increases as the window size decreases. An empirical study of S&P 500 monthly data from January 2006 to December 2010 is carried out to demonstrate the advantages of the proposed test procedures over existing tests.

Copulas Applications in Estimating Value-at-Risk (VaR): Iranian Crude Oil Prices

Faramarz Kashanchi Pranesh Kumar

https://doi.org/10.6339/JDS.201407_12(3).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 3 (2014), pp. 495–518

Abstract

Crude oil, as the primary source of energy, is unquestionably the main driving engine of every country in the world, whether an oil-producing or an oil-consuming economy. As one of the key strategic products in the global market, crude oil may influence the economies of both exporting and importing countries. Iran is one of the major crude oil exporting partners of the Organization of the Petroleum Exporting Countries (OPEC). Analysis of the risk measures associated with Iranian oil price data is of strategic importance to the Iranian government and policy makers, in particular for short- and long-term planning in setting oil production targets. Oil price risk management focuses mainly on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly accepted risk measure and is evaluated by analysing the negative tail of the probability distribution of returns (profit and loss). Among the several approaches for calculating VaR, the most common are the variance-covariance approach, historical simulation and Monte Carlo simulation. Recently, copula functions have emerged as a powerful tool to model and simulate multivariate probability distributions. Copula applications have been noted predominantly in finance, actuarial science, economics, and health and clinical studies. In addition, copulas are useful devices for dealing with the non-normality and non-linearity frequently observed in financial time series data. In this paper we apply the Frank, Clayton and Gumbel copulas to analyse the time series of Iranian crude oil prices in relation to OPEC prices. The data considered are: (i) monthly average prices for a barrel of Iranian and OPEC crude oil, from January 1997 to December 2008; (ii) the seasonal number of barrels of Iran's crude oil exports, from January 1997 to December 2008. The results demonstrate that copula-simulated data provide higher and lower relative change values on the upper and lower tails, respectively, in comparison to the original data.

Modelling the Distribution of Age at Last Conception of Females

Shruti Verma Kaushalendra K. Singh

https://doi.org/10.6339/JDS.201507_13(3).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 3 (2015), pp. 495–508

Abstract

Social phenomena related to human beings cannot be studied under controlled conditions, which makes it difficult for policy planners to anticipate future conditions in society under varying situations and to form policies. Modelling can be of real help to planners in these situations. The present paper attempts to find the distribution of age at last conception of women by means of stochastic modelling of human fertility, taking into consideration different parity progression behaviours among couples. This may give planners at least a rough estimate of the proportion of women in different age groups who will be completing their childbearing and willing to undergo sterilization after marriage, under different stopping rules regarding desired family size and sex composition of children. Accordingly, these estimates will help planners to optimize the cost and service provision of sterilization programs for women.

The Bayesian Multiple Logistic Random Effects Model for Analysis of Clinical Trial Data

Karan P. Singh Alfred A. Bartolucci Sejong Bae

https://doi.org/10.6339/JDS.2010.08(3).606

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 3 (2010), pp. 495–504

Abstract

A prospective, multi-institutional, randomized surgical trial involving 724 early-stage melanoma patients was conducted to determine whether excision margins for intermediate-thickness melanomas (1.0 to 4.0 mm) could be safely reduced from the standard 4-cm radius. Patients with 1- to 4-mm-thick melanomas on the trunk or proximal extremities were randomly assigned to receive either a 2- or 4-cm surgical margin, with or without immediate node dissection (i.e., immediate vs. later, within 6 months). The median follow-up time was 6 years. Recurrence rates did not correlate with surgical margins, even among stratified thickness groups. The hospital stay was shortened from 7.0 days for patients receiving 4-cm surgical margins to 5.2 days for those receiving 2-cm margins (p = 0.0001). This reduction was largely due to the reduced need for skin grafting in the 2-cm group. The overall conclusion was that the narrower margins significantly reduced the need for skin grafting and shortened the hospital stay. Given the adequacy of subject follow-up, a recent statistical focus was on which prognostic factors, usually called covariates, actually determined recurrence. As anticipated, the thickness of the lesion (p = 0.0091) and whether or not the lesion was ulcerated (p = 0.0079) were found to be significantly associated with recurrence events using the logistic regression model. This type of fixed-effect analysis is rather routine. The authors have determined that a Bayesian consideration of the results would afford a more coherent interpretation of the effect of the model, assuming a random effect of the covariates thickness and ulceration. Thus, using a Markov chain Monte Carlo method of parameter estimation with noninformative priors, one can obtain posterior estimates and credible regions for these effects, as well as for their interaction, on the recurrence outcome. Graphical displays of convergence history and posterior densities affirm the stability of the results. We demonstrate how the model performs under relevant clinical conditions. The conditions are all tested using a Bayesian statistical approach, allowing for robust testing of the model parameters under various recursive partitioning conditions of the covariates and hyperparameters that we introduce into the model. The convergence of the parameters to stable values is seen in trace plots that follow the convergence patterns. This allows for precise estimation in determining the clinical conditions under which the response pattern will change.

A Bayesian Multiple Comparison Approach for Gene Expression Data Analysis

Erlandson F. Saraiva Luís A. Milan

https://doi.org/10.6339/JDS.201607_14(3).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 3 (2016), pp. 491–508

Abstract

The methods used to detect differentially expressed genes in situations with one control and one treatment are t-tests. These methods do not perform well when the control and treatment variances differ. In situations with a control and more than one treatment, it is common to apply analysis of variance followed by a Tukey and/or Duncan test to identify which treatment caused the difference. We propose a Bayesian approach for multiple comparison analysis that is very useful in the context of DNA microarray experiments. It uses an a priori Dirichlet process and a Polya urn scheme. It is a unified procedure (for cases with one or more treatments) that detects differentially expressed genes and identifies the treatments causing the difference. We use simulations to verify the performance of the proposed method and compare it with the usual methods. In cases with a control and one treatment, and with a control and more than one treatment followed by Tukey and Duncan tests, the method performs better when the variances are different. The method is applied to two real data sets. In these cases, genes not detected by the usual methods are identified by the proposed method.

Psychometric Data Analysis: A Size/fit Trade-off Evaluation Procedure for Knowledge Structures

Ali Ünlü Waqas Ahmed Malik

https://doi.org/10.6339/JDS.2008.06(4).480

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 4 (2008), pp. 491–514

Abstract

A crucial problem in knowledge space theory, a modern psychological test theory, is the derivation of a realistic knowledge structure representing the organization of knowledge in an information domain and examinee population under reference. Often, one is left with the problem of selecting among candidate competing knowledge structures. This article proposes a measure for the selection among competing knowledge structures. It is derived within an operational framework (prediction paradigm), and is partly based on the unitary method of proportional reduction in predictive error as advocated by the authors Guttman, Goodman, and Kruskal. In particular, this measure is designed to trade off the (descriptive) fit and size of a knowledge structure, which is of high interest in knowledge space theory. The proposed approach is compared with the Correlational Agreement Coefficient, which has been recently discussed for the selection among competing surmise relations. Their performances as selection measures are compared in a simulation study using the fundamental basic local independence model in knowledge space theory.

Count Regression Models with an Application to Zoological Data Containing Structural Zeros

Ilknur Ozmen Felix Famoye

https://doi.org/10.6339/JDS.2007.05(4).385

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 4 (2007), pp. 491–502

Abstract

Recently, count regression models have been used to model over-dispersed and zero-inflated count response variables that are affected by one or more covariates. Generalized Poisson (GP) and negative binomial (NB) regression models have been suggested to deal with over-dispersion. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models have been used to handle count data with many zeros. The aim of this study is to model the number of C. caretta hatchlings dying from exposure to the sun. We present an evaluation framework for the suitability of applying the Poisson, NB, GP, ZIP and ZIGP models to a zoological data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood (ML) is provided. Based on the score test and the goodness-of-fit measure for the zoological data, the GP regression model performs better than the other count regression models.

Tests of Independence with Incomplete Contingency Tables Using Likelihood Functions

Shin-Soo Kang Michael D. Larsen

https://doi.org/10.6339/JDS.201110_09(4).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 4 (2011), pp. 487–500

Abstract

Kang (2006) used the log-likelihood function with Lagrangian multipliers for estimation of cell probabilities in two-way incomplete contingency tables. The constraints on cell probabilities can be incorporated through Lagrangian multipliers for the likelihood function. The method can be readily extended to multidimensional tables. Variances of the MLEs are derived from the matrix of second derivatives of the log-likelihood with respect to cell probabilities and the Lagrange multiplier. Wald and likelihood ratio tests of independence are derived using the estimates and estimated variances. Simulation results, when data are missing at random, reveal that maximum likelihood estimation (MLE) produces more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case (CC) analysis. Neither MLE nor MI, however, leads to an improvement over CC analysis with respect to power of tests for independence in 2×2 tables. Thus, the partially classified marginal information increases precision about proportions, but is not helpful for judging independence.

Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China