Abstract: This paper aims to propose a suitable statistical model for the age distribution of prostate cancer detection. Descriptive studies suggest that the onset of prostate cancer occurs after 37 years of age, with the peak age of diagnosis at around 70 years. The major deficiency of descriptive studies is that their results cannot be generalized to all populations, which usually differ in environmental conditions. The suitability of the proposed model is checked with several statistical tools: the Akaike information criterion, the Bayesian information criterion, the Kolmogorov–Smirnov distance, and the χ² statistic. The maximum likelihood estimates of the parameters of the proposed model, along with their asymptotic confidence intervals, have been obtained for the real data set considered.
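For reference, the selection criteria named above have standard closed forms. Writing k for the number of fitted parameters, n for the sample size, \hat{L} for the maximized likelihood, F_n for the empirical distribution function, and \hat{F} for the fitted distribution function, the model with the smaller criterion value is preferred:

\[ \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}, \qquad D_n = \sup_x \bigl| F_n(x) - \hat{F}(x) \bigr|. \]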
Abstract: The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants, and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied to the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper are US-based, the computational techniques we have tested can be adapted and applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.
Abstract: Through a series of carefully chosen illustrations from biometry and biomedicine, this note underscores the importance of using appropriate analytical techniques to increase power in statistical modeling and testing. These examples also serve to highlight some of the important recent developments in applied statistics of use to practitioners.
Abstract: Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is at the same time computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to both compact and interpretable feature sets. Program files to be used with the statistical software R implementing the variable selection approaches presented in this article are available from my homepage: http://b-klaus.de.
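As a generic illustration of effect-size-based ranking (not necessarily the estimator proposed in the article), in a two-class setting the standardized mean difference of variable j,

\[ \hat{d}_j = \frac{\bar{x}_{1j} - \bar{x}_{2j}}{s_j}, \]

can be computed from the group means \bar{x}_{1j}, \bar{x}_{2j} and a pooled standard deviation s_j; variables are then ranked by |\hat{d}_j| and only the top-ranked ones enter the LDA rule.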
Abstract: As the COVID-19 pandemic has strongly disrupted people’s daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics with regard to COVID-19, namely the basic reproduction number, the incubation period, the serial interval, and the epidemic doubling time. We collect relevant studies based on COVID-19 data in China and conduct a meta-analysis to obtain pooled estimates of the four metrics. From the summary results, we conclude that COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.
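For context, the simplest pooled estimate of a metric from study-level estimates \hat{\theta}_i with variances \hat{\sigma}_i^2 is the fixed-effect inverse-variance average (the manuscript's meta-analysis may instead use a random-effects variant, which adds a between-study variance component to each weight):

\[ \hat{\theta} = \frac{\sum_i w_i\, \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{\hat{\sigma}_i^2}. \]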
Compound distributions gained their importance from the fact that natural factors have compound effects, as in medical, social, and logical experiments. Dubey (1968) introduced the compound Weibull distribution by compounding the Weibull distribution with the gamma distribution. The main aim of this paper is to define a bivariate generalized Burr (compound Weibull) distribution whose marginals have univariate generalized Burr distributions. Several properties of this distribution, such as the marginals, conditional distributions, and product moments, have been discussed. The maximum likelihood estimates of the unknown parameters of this distribution and their approximate variance-covariance matrix have been obtained. Some simulations have been performed to assess the performance of the MLEs. A data analysis has been carried out for illustrative purposes.
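Dubey's construction is a gamma mixture over the Weibull rate parameter: if X given \theta has Weibull density f(x \mid \theta) = c\,\theta\,x^{c-1} e^{-\theta x^c} and \theta follows a gamma distribution with shape k and rate \lambda, integrating \theta out yields a Burr-type marginal:

\[ f(x) = \int_0^\infty c\,\theta\,x^{c-1} e^{-\theta x^c}\, \frac{\lambda^k}{\Gamma(k)}\, \theta^{k-1} e^{-\lambda\theta}\, d\theta = \frac{c\,k\,\lambda^k\, x^{c-1}}{(\lambda + x^c)^{k+1}}, \qquad x > 0, \]

which reduces to the standard Burr XII density when \lambda = 1.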
In this paper, we introduce a new lifetime model, called the Generalized Weibull-Burr XII distribution. We discuss some of its mathematical properties, such as the density, hazard rate function, quantile function, and moments. The maximum likelihood method is used to estimate the model parameters. A simulation study is performed to assess the performance of the maximum likelihood estimators by means of their biases and mean squared errors. Finally, we show that the proposed distribution is very competitive with other classical models by means of an application to a real data set.
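For reference, the Burr XII baseline on which such generalizations build has closed-form distribution and quantile functions (with shape parameters c, k > 0); the exact form of the generalized model is given in the paper itself:

\[ F(x) = 1 - \left(1 + x^c\right)^{-k}, \qquad Q(u) = \left[ (1-u)^{-1/k} - 1 \right]^{1/c}, \quad 0 < u < 1. \]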
Abstract: In this paper, we propose power weighted quantile regression (PWQR), which can effectively reduce the effect of heterogeneity in the conditional densities of the response and improve the efficiency of quantile regression. In addition, this article proves that the proportion of total weight assigned to observations whose actual value is less than the PWQR fitted value is very close to the corresponding quantile level. Finally, this article establishes the relationship between geomagnetic indices and geomagnetically induced currents (GIC). Motivated by the security of power system operation, we construct a GIC risk value table, which is practical to use and provides important guidance for the secure operation of power systems.
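Although the abstract does not define the weights, a weighted quantile regression estimator of this kind generically minimizes a weighted version of the standard check loss (the specific power weights w_i are defined in the paper):

\[ \hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} w_i\, \rho_\tau\!\left(y_i - x_i^{\top}\beta\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right). \]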
Abstract: An individual in a finite population is represented by a random variable whose expectation is linearly composed of explanatory variables and a personal effect. This expectation locates her (his) random variable on a scale when s(he) responds to a questionnaire item or physical instrument. This formulation reinterprets design-based sampling, which represents an individual as a constant waiting to be observed. Retaining constant expectations, however, along with fixed realizations of random variables, preserves and strengthens design-based theory through the Horvitz-Thompson (1952) theorem. This interpretation reaffirms the usual design-based regression estimates, whose normality is seen to be free of any assumptions about the distribution of the outcome variable. It also formulates response error in a way that renders a superpopulation, postulated by model-based sampling, unnecessary. The value of distribution-free regression is illustrated with an analysis of American presidential approval.
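The Horvitz-Thompson (1952) theorem invoked here concerns the classical estimator of a population total: for a sample s drawn with inclusion probabilities \pi_i > 0,

\[ \hat{T} = \sum_{i \in s} \frac{y_i}{\pi_i} \]

is design-unbiased for T = \sum_{i=1}^{N} y_i, with no distributional assumptions on the y_i.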
The statistical modeling of natural disasters is an indispensable tool for extracting information for prevention and the reduction of casualties. The Poisson distribution can reveal the characteristics of a natural disaster. However, this distribution is insufficient for capturing the clustering of natural events and related casualties. The best approach is to use the Neyman type A (NTA) distribution, which accommodates the feature that two or more events may occur in a short time. We obtain some properties of the NTA distribution and suggest that it provides a suitable description for analyzing the distribution of natural disasters and casualties. We support this argument using disaster events, including earthquakes, floods, landslides, forest fires, avalanches, and rock falls, in Turkey between 1900 and 2013. The data strongly support the NTA distribution as the main tool for handling these disaster data. The findings indicate that approximately three earthquakes, fifteen landslides, five floods, six rock falls, six avalanches, and twenty-nine forest fires are expected in a year. The results from this model suggest that the probability of the total number of casualties is highest for earthquakes and lowest for rock falls. This study also finds that the expected number of natural disasters is approximately 64 per year and that the inter-event time between two successive earthquakes is approximately four months. The inter-event time for natural disasters overall is approximately six days in Turkey.
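The NTA distribution is a Poisson cluster (Poisson-stopped Poisson) law with probability generating function \( G(s) = \exp\{\lambda\,(e^{\phi(s-1)} - 1)\} \) and mean \lambda\phi, where \lambda governs the number of clusters and \phi the events per cluster. The reported inter-event times follow from simple rate arithmetic on the paper's own frequency estimates:

\[ \frac{12\ \text{months/year}}{3\ \text{earthquakes/year}} = 4\ \text{months}, \qquad \frac{365\ \text{days/year}}{64\ \text{disasters/year}} \approx 5.7\ \text{days} \approx 6\ \text{days}. \]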