Journal of Data Science

Search results 892

A Two-Stage Bayesian Model for Predicting Winners in Major League Baseball

Tae Young Yang Tim Swartz

https://doi.org/10.6339/JDS.2004.02(1).142

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 1 (2004), pp. 61–73

Abstract

The probability of winning a game in major league baseball depends on various factors relating to team strength, including the past performance of the two teams, the batting ability of the two teams and the starting pitchers. These three factors change over time. We combine these factors by adopting contribution parameters, and include a home field advantage variable in forming a two-stage Bayesian model. A Markov chain Monte Carlo algorithm is used to carry out Bayesian inference and to simulate outcomes of future games. We apply the approach to data obtained from the 2001 regular season in major league baseball.

Stability and Structure of CART and SPAN Search Generated Data Partitions for the Analysis of Low Birth Weight

Roger J. Marshall Panagiota Kitsantas

https://doi.org/10.6339/JDS.2012.10(1).1014

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 1 (2012), pp. 61–73

Abstract

Searching for data structure and decision rules using classification and regression tree (CART) methodology is now well established. An alternative procedure, search partition analysis (SPAN), is less well known. Both provide classifiers based on Boolean structures; in CART these are generated by a hierarchical series of local sub-searches and in SPAN by a global search. One issue with CART is its perceived instability; another is the awkward nature of the Boolean structures generated by a hierarchical tree. Instability arises because the final tree structure is sensitive to early splits. SPAN, as a global search, seems more likely to render stable partitions. To examine these issues in the context of identifying mothers at risk of giving birth to low birth weight babies, we have taken a very large sample, divided it at random into ten non-overlapping sub-samples and performed SPAN and CART analyses on each sub-sample. The stability of the SPAN and CART models is described and, in addition, the structure of the Boolean representation of classifiers is examined. It is found that SPAN partitions have more intrinsic stability and are less prone to Boolean structural irregularities.

Notes on Entropy for Concomitants of Record Values in Farlie-Gumbel-Morgenstern (FGM) Family

Saeid Tahmasebi

https://doi.org/10.6339/JDS.2013.11(1).1104

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 1 (2013), pp. 59–68

Abstract

Let {(X_i, Y_i), i ≥ 1} be a sequence of bivariate random variables from a continuous distribution. If {R_n, n ≥ 1} is the sequence of record values in the sequence of X’s, then the Y which corresponds to the nth record will be called the concomitant of the nth record, denoted by R_[n]. In the FGM family, we determine the amount of information contained in R_[n] and compare it with the amount of information given in R_n. Also, we show that the Kullback-Leibler distance among the concomitants of record values is distribution-free. Finally, we provide some numerical results of mutual information and the Pearson correlation coefficient for measuring the amount of dependency between R_n and R_[n] in the copula model of the FGM family.
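
For readers unfamiliar with the family named in this abstract, the standard textbook form of the FGM copula is sketched below. The symbols u, v and the dependence parameter \alpha are assumptions of this note; the abstract does not state the paper's own notation.

```latex
% Standard FGM copula on the unit square (notation assumed, not taken from the paper)
C(u, v) = u v \left[ 1 + \alpha (1 - u)(1 - v) \right],
\qquad 0 \le u, v \le 1, \quad -1 \le \alpha \le 1 .

% Corresponding copula density
c(u, v) = \frac{\partial^2 C(u, v)}{\partial u \, \partial v}
        = 1 + \alpha (1 - 2u)(1 - 2v) .

% Correlation between the uniform margins (Spearman's rho)
\rho_S = \frac{\alpha}{3}, \qquad \text{so } |\rho_S| \le \tfrac{1}{3} .
```

The bound on the correlation shows that this family can only represent fairly weak dependence.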

Data Science Applications and Implications in Legal Studies: A Perspective Through Topic Modelling

Jinzhe Tan Huan Wan Ping Yan All authors (4)

https://doi.org/10.6339/22-JDS1058

Pub. online: 4 Aug 2022 Type: Data Science In Action

Open Access

Journal: Journal of Data Science Volume 21, Issue 1 (2023), pp. 57–67

Abstract

Law and legal studies have been an exciting new field for data science applications, while technological advancement also has profound implications for legal practice. For example, the legal industry has accumulated a rich body of high-quality texts, images and other digitised formats, which are ready to be further processed and analysed by data scientists. On the other hand, the increasing popularity of data science has been a genuine challenge to legal practitioners, regulators and even the general public, and has motivated a long-lasting debate in academia focusing on issues such as privacy protection and algorithmic discrimination. This paper collects 1236 journal articles involving both law and data science from the platform Web of Science to understand the patterns and trends of this interdisciplinary research field in terms of English journal publications. We find a clear trend of increasing publication volume over time and a strong presence of high-impact law and political science journals. We then use Latent Dirichlet Allocation (LDA) as a topic modelling method to classify the abstracts into four topics based on the coherence measure. The four topics identified confirm that both challenges and opportunities have been investigated in this interdisciplinary field and help offer directions for future research.

The Poisson Burr X Inverse Rayleigh Distribution And Its Applications

Rania H. M. Abdelkhalek

https://doi.org/10.6339/JDS.202001_18(1).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 1 (2020), pp. 56–77

Abstract

A new flexible extension of the inverse Rayleigh model is proposed and studied. Some of its fundamental statistical properties are derived. We assessed the performance of the maximum likelihood method via a simulation study. The importance of the new model is shown via three applications to real data sets. The new model is much better than other important competitive models.

Automating Data Analysis Methods in Epidemiology

George Choueiry Pascale Salameh

https://doi.org/10.6339/JDS.201901_17(1).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 1 (2019), pp. 55–80

Abstract

Technological advances in software development effectively handled technical details that made life easier for data analysts, but also allowed non-experts in statistics and computer science to analyze data. As a result, medical research suffers from statistical errors that could otherwise be prevented, such as errors in choosing a hypothesis test and in checking model assumptions. Our objective is to create an automated data analysis software package that can help practitioners run non-subjective, fast, accurate and easily interpretable analyses. We used machine learning to predict the normality of a distribution as an alternative to normality tests and graphical methods to avoid their downsides. We implemented methods for detecting outliers, imputing missing values, and choosing a threshold for cutting numerical variables to correct for non-linearity before running a linear regression. We showed that data analysis can be automated. Our normality prediction algorithm outperformed the Shapiro-Wilk test in small samples, with a Matthews correlation coefficient of 0.5 vs. 0.16. The biggest drawback was that we did not find alternatives to the statistical tests for linear regression assumptions, which are problematic in large datasets. We also applied our work to a dataset about smoking in teenagers. Because of the open-source nature of our work, these algorithms can be used in future research and projects.
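
The Matthews correlation coefficient quoted in this abstract can be computed from a binary confusion matrix. The sketch below is a minimal illustration of the standard formula, not the authors' code; the function name and the example counts are hypothetical.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when
    one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the calculation.
mcc_example = matthews_corrcoef(tp=40, tn=35, fp=15, fn=10)
print(round(mcc_example, 3))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it suitable for comparing classifiers on imbalanced data.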

Estimating Transmissibility of Seasonal Influenza Virus by Surveillance Data

Shenghai Zhang

https://doi.org/10.6339/JDS.201101_09(1).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 55–64

Abstract

It is important to estimate transmissibility of influenza virus during its growing phase for understanding the propagation of the virus. The estimation procedures of the transmissibility are usually based on the data generated in flu seasons. The data-generating process of the outbreak of influenza has many features. The data is generated by not only a biological process but also control measures such as flu vaccination. The estimation is discussed by considering the aspects of the data-generating process and using the model to capture the essential characteristics of flu transmission during the growing phase of a flu season.

Ratio and Inverse Moments of Marshall-Olkin Extended Burr Type XII Distribution Based on Lower Generalized Order Statistics

Devendra Kumar

https://doi.org/10.6339/JDS.201601_14(1).0004

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 1 (2016), pp. 53–66

Abstract

In this short note we establish some new explicit expressions for ratio and inverse moments of lower generalized order statistics for the Marshall-Olkin extended Burr type XII distribution. These explicit expressions can be used to develop the relationship for moments of ordinary order statistics, record statistics and other ordered random variable techniques. Further, a characterization result for this distribution is obtained using the conditional moment of the lower generalized order statistics.

The Extended Dagum Distribution: Properties and Application

Alisson de O. Silva Luana Cecília M. da Silva Gauss M. Cordeiro

https://doi.org/10.6339/JDS.201501_13(1).0004

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 1 (2015), pp. 53–72

Abstract

We study a new five-parameter model called the extended Dagum distribution. The proposed model contains as special cases the log-logistic and Burr III distributions, among others. We derive the moments, generating and quantile functions, mean deviations and Bonferroni, Lorenz and Zenga curves. We obtain the density function of the order statistics. The parameters are estimated by the method of maximum likelihood. The observed information matrix is determined. An application to real data illustrates the importance of the new model.

Bayesian Circle Segmentation with Application to DNA Copy Number Alteration Detection

Junfeng Liu Harner Harner Harry Yang

https://doi.org/10.6339/JDS.2008.06(1).390

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 1 (2008), pp. 53–73

Abstract

Several statistical approaches have been proposed to consider circumstances under which one universal distribution is not capable of fitting into the whole domain. This paper studies Bayesian detection of multiple interior epidemic/square waves in the interval domain, featured by two identical statistical distributions at both ends. We introduce a simple dimension-matching parameter proposal to implement the sampling-based posterior inference for special cases where each segmented distribution on a circle has the same set of regulating parameters. Molecular biology research reveals that cancer progression may involve DNA copy number alteration at genome regions, and connection of two biologically inactive chromosome ends results in a circle holding multiple epidemic/square waves. A slight modification of a simple novel Bayesian change point identification algorithm, random grafting-pruning Markov chain Monte Carlo (RGPMCMC), is proposed by adjusting the original change point birth/death symmetric transition probability with a differ-by-one change point number ratio. The algorithm performance is studied through simulations with connection to DNA copy number alteration detection, which promises potential application to cancer diagnosis at the genome level.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China