Home
Search

Journal of Data Science

Submit your article Information

Journal home
To appear
Current issue
All issues
More
Journal home To appear Current issue All issues

Detailed search

Title

Author

Types

Abstract

Keywords

Published

Pages

Volumes

Issues

DOI

Affiliation

Search results 892

Order by:

Select: All None Download:

A Generalization Of Inverse Marshall-Olkin Family Of Distributions

K. Jayakumar K. K. Sankaran

https://doi.org/10.6339/JDS.202001_18(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 1 (2020), pp. 1–43

Abstract

We introduce a new family of distributions namely inverse truncated discrete Linnik G family of distributions. This family is a generalization of inverse Marshall-Olkin family of distributions, inverse family of distributions generated through truncated negative binomial distribution and inverse family of distributions generated through truncated discrete Mittag-Leffler distribution. A particular member of the family, inverse truncated negative binomial Weibull distribution is studied in detail. The shape properties of the probability density function and hazard rate, model identifiability, moments, median, mean deviation, entropy, distribution of order statistics, stochastic ordering property, mean residual life function and stress-strength properties of the new generalized inverse Weibull distribution are studied. The unknown parameters of the distribution are estimated using maximum likelihood method, product spacing method and least square method. The existence and uniqueness of the maximum likelihood estimates are proved. Simulation is carried out to illustrate the performance of maximum likelihood estimates of model parameters. An AR(1) minification model with this distribution as marginal is developed. The inverse truncated negative binomial Weibull distribution is fitted to a real data set and it is shown that the distribution is more appropriate for modeling in comparison with some other competitive models.

A Comparison Of Regularized Linear Discriminant Functions For Poorly-Posed Classification Problems

L. A. Thompson Wade Davis Phil D. Young All authors (5)

https://doi.org/10.6339/JDS.201901_17(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 1 (2019), pp. 1–36

Abstract

For statistical classification problems where the total sample size is slightly greater than the feature dimension, regularized statistical discriminant rules may reduce classification error rates. We review ten dispersion-matrix regularization approaches, four for the pooled sample covariance matrix, four for the inverse pooled sample covariance matrix, and two for a diagonal covariance matrix, for use in Anderson’s (1951) linear discriminant function (LDF). We compare these regularized classifiers against the traditional LDF for a variety of parameter configurations, and use the estimated expected error rate (EER) to assess performance. We also apply the regularized LDFs to a well-known real-data example on colon cancer. We found that no regularized classifier uniformly outperformed the others. However, we found that the more contemporary classifiers (e.g., Thomaz and Gillies, 2005; Tong et al., 2012; and Xu et al., 2009) tended to outperform the older classifiers, and that certain simple methods (e.g., Pang et al., 2009; Thomaz and Gillies, 2005; and Tong et al., 2012) performed very well, questioning the need for involved cross-validation in estimating regularization parameters. Nonetheless, an older regularized classifier proposed by Smidt and McDonald (1976) yielded consistently low misclassification rates across all scenarios, despite the shape of the true covariance matrix. Finally, our simulations showed that regularized classifiers that relied primarily on asymptotic approximations with respect to the training sample size rarely outperformed the traditional LDF, and are thus not recommended. We discuss our results as they pertain to the effect of high dimension, and offer general guidelines for choosing a regularization method for poorly-posed problems.

A Method for Evaluating Options for Motif Detection in Electricity Meter Data

Ian Dent Tony Craig Uwe Aickelin All authors (4)

https://doi.org/10.6339/JDS.201801_16(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 1 (2018), pp. 1–28

Abstract

Investigation of household electricity usage patterns, and mat- ching the patterns to behaviours, is an important area of research given the centrality of such patterns in addressing the needs of the electricity indu- stry. Additional knowledge of household behaviours will allow more effective targeting of demand side management (DSM) techniques. This paper addresses the question as to whether a reasonable number of meaningful motifs, that each represent a regular activity within a domestic household, can be identified solely using the household level electricity meter data. Using UK data collected from several hundred households in Spring 2011 monitored at a frequency of five minutes, a process for finding repeating short patterns (motifs) is defined. Different ways of representing the motifs exist and a qualitative approach is presented that allows for choosing between the options based on the number of regular behaviours detected (neither too few nor too many).

The Performance of Hybrid Artificial Neural Network Models for Option Pricing during Financial Crises

David Liu Siyuan Huang

https://doi.org/10.6339/JDS.201601_14(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 1 (2016), pp. 1–18

Abstract

Abstract: this paper provides a novel research on the pricing ability of the hybrid ANNs based upon the Hang Seng Index Options spanning a period of from Nov, 2005 to Oct, 2011, during which time the 2007-20008 financial crisis had developed. We study the performances of two hybrid networks integrated with Black-Scholes model and Corrado and Su model respectively. We find that hybrid neural networks trained by using the financial data retained from a booming period of a market cannot have good predicting performance for options for the period that undergoes a financial crisis (tumbling period in the market), therefore, it should be cautious for researchers/practitioners when carry out the predictions of option prices by using hybrid ANNs. Our findings have likely answered the recent puzzles about NN models regarding to their counterintuitive performance for option pricing during financial crises, and suggest that the incompetence of NN models for option pricing is likely due to the fact NN models may have been trained by using data from improper periods of market cycles (regimes), and is not necessarily due to the learning ability and the flexibility of NN models.

Assessing Effects of An Intervention on Bottle-Weaning and Reducing Daily Milk Intake from Bottles in Toddlers Using Two-Part Random Effects Models

Yungtai Lo

https://doi.org/10.6339/JDS.201501_13(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 1 (2015), pp. 1–20

Abstract

Abstract: Two-part random effects models have been used to fit semi-continuous longitudinal data where the response variable has a point mass at 0 and a con tinuous right-skewed distribution for positive values. We review methods pro posed in the literature for analyzing data with excess zeros. A two-part logit-log normal random effects model, a two-part logit-truncated normal random effects model, a two-part logit-gamma random effects model, and a two-part logit-skew normal random effects model were used to examine effects of a bottle-weaning intervention on reducing bottle use and daily milk intake from bottles in toddlers aged 11 to 13 months in a randomized controlled trial. We show in all four two-part models that the intervention promoted bottle-weaning and reduced daily milk intake from bottles in toddlers drinking from a bottle. We also show that there are no differences in model fit using either the logit link function or the probit link function for modeling the probability of bottle-weaning in all four models. Furthermore, prediction accuracy of the logit or probit link function is not sensitive to the distribution assumption on daily milk intake from bottles in toddlers not off bottles.

Assessing Agreement between Raters from the Point of Coefficients and Loglinear Models

Ayfer Ezgi Yilmaz Tulay Saracbasi Tulay Saracbasi

https://doi.org/10.6339/JDS.201701_15(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 1 (2017), pp. 1–24

Abstract

In square contingency tables, analysis of agreement between row and column classifications is of interest. For nominal categories, kappa co- efficient is used to summarize the degree of agreement between two raters. Numerous extensions and generalizations of kappa statistics have been pro- posed in the literature. In addition to the kappa coefficient, several authors use agreement in terms of log-linear models. This paper focuses on the approaches to study of interrater agreement for contingency tables with nominal or ordinal categories for multiraters. In this article, we present a detailed overview of agreement studies and illustrate use of the approaches in the evaluation agreement over three numerical examples.

Comparing Two Dependent Groups: Dealing with Missing Values

Rand R. Wilcox

https://doi.org/10.6339/JDS.201101_09(1).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 1–13

Abstract

Abstract: The paper considers the problem of comparing measures of lo cation associated with two dependent groups when values are missing at random, with an emphasis on robust measures of location. It is known that simply imputing missing values can be unsatisfactory when testing hypothe ses about means, so the goal here is to compare several alternative strategies that use all of the available data. Included are results on comparing means and a 20% trimmed mean. Yet another method is based on the usual median but differs from the other methods in a manner that is made obvious. (It is somewhat related to the formulation of the Wilcoxon-Mann-Whitney test for independent groups.) The strategies are compared in terms of Type I error probabilities and power.

Yoon Young Jung Youngja Park Dean P. Jones All authors (5)

https://doi.org/10.6339/JDS.2010.08(1).481

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 1 (2010), pp. 1–19

Abstract

Abstract: High resolution of NMR spectroscopic data of biosamples are a rich source of information on the metabolic response to physiological variation or pathological events. There are many advantages of NMR techniques such as the sample preparation is fast, simple and non-invasive. Statistical analysis of NMR spectra usually focuses on differential expression of large resonance intensity corresponding to abundant metabolites and involves several data preprocessing steps. In this paper we estimate functional components of spectra and test their significance using multiscale techniques. We also explore scaling in NMR spectra and use the systematic variability of scaling descriptors to predict the level of cysteine, an important precursor of glutathione, a control antioxidant in human body. This is motivated by high cost (in time and resources) of traditional methods for assessing cysteine level by high performance liquid chromatograph (HPLC).

Least square and Empirical Bayes Approaches for Estimating Random Change Points

Yuanjia Wang Yixin Fang

https://doi.org/10.6339/JDS.2009.07(1).444

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 1 (2009), pp. 1–12

Abstract

Abstract: Here we develop methods for applications where random change points are known to be present a priori and the interest lies in their estimation and investigating risk factors that influence them. A simple least square method estimating each individual’s change point based on one’s own observations is first proposed. An easy-to-compute empirical Bayes type shrinkage is then proposed to pool information from separately estimated change points. A method to improve the empirical Bayes estimates is developed. Simulations are conducted to compare least-square estimates and Bayes shrinkage estimates. The proposed methods are applied to the Berkeley Growth Study data to estimate the transition age of the puberty height growth.

Estimating Optimum Linear Combination of Multiple Correlated Diagnostic Tests at a Fixed Specificity with Receiver Operating Characteristic Curves

Feng Gao

https://doi.org/10.6339/JDS.2008.06(1).368

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 1 (2008), pp. 1–13

Abstract

Abstract: Receiver operating characteristic (ROC) methodology is widely used to evaluate diagnostic tests. It is not uncommon in medical practice that multiple diagnostic tests are applied to the same study sample. A va riety of methods have been proposed to combine such potentially correlated tests to increase the diagnostic accuracy. Usually the optimum combina tion is searched based on the area under a ROC curve (AUC), an overall summary statistics that measures the distance between the distributions of diseased and non-diseased populations. For many clinical practitioners, however, a more relevant question of interest may be ”what the sensitivity would be for a given specificity (say, 90%) or what the specificity would be for a given sensitivity?”. Generally there is no unique linear combination superior to all others over the entire range of specificities or sensitivities. Under the framework of a ROC curve, in this paper we presented a method to estimate an optimum linear combination maximizing sensitivity at a fixed specificity while assuming a multivariate normal distribution in diagnostic tests. The method was applied to a real-world study where the accuracy of two biomarkers was evaluated in the diagnosis of pancreatic cancer. The performance of the method was also evaluated by simulation studies.

79 80 81 82 83

Items per page

Export citation

Copy and paste formatted citation

Formatted citation

Placeholder

Citation style

Download citation in file

Export format

Authors

Placeholder

RSS

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

About

About journal

For contributors

Submit
OA Policy
Become a Peer-reviewer

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China