Journal of Data Science


Tilted Normal Distribution and Its Survival Properties

Sudhansu S. Maiti, Mithu Dey

https://doi.org/10.6339/JDS.2012.10(2).1038

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 2 (2012), pp. 225–240

Abstract

To analyze skewed data, Azzalini (1985) proposed the skew normal distribution. Because the skewness parameter of this distribution is difficult to estimate in practice, Gupta and Gupta (2008) suggested the power normal distribution as an alternative. We consider another alternative, the tilted normal distribution, obtained by following the approach of Marshall and Olkin (1997), which adds a positive parameter to a general survival function, and taking that survival function to be normal. We derive several properties of this distribution, obtain maximum likelihood estimates of its parameters, and compare the tilted normal distribution with the skew normal and power normal distributions.
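A minimal sketch of the Marshall-Olkin construction with a normal baseline, assuming the standard form G(x) = λ S(x) / (1 - (1 - λ) S(x)), where S is the normal survival function and λ > 0 is the added parameter. The function names are illustrative, and the paper's exact parameterization may differ. Setting λ = 1 recovers the normal distribution.

```python
import math

# Illustrative helper names; not taken from the paper.

def normal_sf(x, mu=0.0, sigma=1.0):
    """Normal survival function S(x) = 1 - Phi((x - mu) / sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def tilted_normal_sf(x, lam, mu=0.0, sigma=1.0):
    """Marshall-Olkin survival function lam*S(x) / (1 - (1 - lam)*S(x)), lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    s = normal_sf(x, mu, sigma)
    return lam * s / (1.0 - (1.0 - lam) * s)

def tilted_normal_pdf(x, lam, mu=0.0, sigma=1.0):
    """Density obtained as -dG/dx: lam*f(x) / (1 - (1 - lam)*S(x))**2."""
    s = normal_sf(x, mu, sigma)
    z = (x - mu) / sigma
    f = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return lam * f / (1.0 - (1.0 - lam) * s) ** 2
```

The density is what a likelihood for λ, μ and σ would be built from; at x = μ the survival function equals λ / (1 + λ), so λ directly controls how much mass is tilted to the right of the normal centre.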

Estimating the Interest Rate Term Structures of Treasury and Corporate Debt with Bayesian Penalized Splines

Min Li, Yan Yu

https://doi.org/10.6339/JDS.2005.03(3).216

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 3 (2005), pp. 223–240

Abstract

This paper provides a Bayesian approach to estimating the interest rate term structures of Treasury and corporate debt with a penalized spline model. Although the literature on term structure modeling is vast, to the best of our knowledge, all methods developed so far belong to the frequentist school. In this paper, we develop a two-step estimation procedure from a Bayesian perspective. The Treasury term structure is first estimated with a Bayesian penalized spline model. The smoothing parameter is naturally embedded in the model as a ratio of posterior variances and does not need to be selected as in the frequentist approach. The corporate term structure is then estimated by adding a credit spread to the estimated Treasury term structure, incorporating knowledge of the positive credit spread into the Bayesian model as an informative prior. In contrast to the frequentist method, the small sample size of the corporate debt poses no particular difficulty to the proposed Bayesian approach.
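The idea that the smoothing parameter is a variance ratio can be sketched for a truncated-linear penalized spline, assuming normal priors with variance σ_b² on the knot coefficients, a flat prior on the intercept and slope, and normal errors with variance σ_ε². Conditional on the two variances, the posterior mean of the coefficients solves a ridge-type system with λ = σ_ε² / σ_b². A full Bayesian fit would also place priors on the variances and sample them (for example by Gibbs sampling); this sketch shows only the conditional step, and the basis, knots and function names are assumptions rather than the authors' specification.

```python
import numpy as np

# Illustrative helper names; basis and prior choices are assumptions.

def truncated_linear_basis(x, knots):
    """Design matrix [1, x, (x - k)_+ for each knot] of a linear spline."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def penalized_spline_fit(x, y, knots, sigma2_eps, sigma2_b):
    """Posterior mean of the spline coefficients given both variances.

    Solves (C'C + lam*D) theta = C'y with lam = sigma2_eps / sigma2_b,
    where D penalizes only the knot coefficients (flat prior on 1 and x).
    """
    C = truncated_linear_basis(x, knots)
    lam = sigma2_eps / sigma2_b
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    rhs = C.T @ np.asarray(y, dtype=float)
    theta = np.linalg.solve(C.T @ C + lam * D, rhs)
    return theta, lam
```

A larger σ_b² relative to σ_ε² gives a smaller λ and a wigglier curve; in the Bayesian treatment the data inform this ratio through the posterior rather than through a separate selection step.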

Modeling Compositional Regression With Uncorrelated and Correlated Errors: A Bayesian Approach

Taciana K. O. Shimizu, Francisco Louzada, Adriano K. Suzuki, et al. (4 authors)

https://doi.org/10.6339/JDS.201804_16(2).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 2 (2018), pp. 221–250

Abstract

Compositional data consist of vectors whose components are positive, lie in the interval (0, 1), and represent proportions or fractions of a "whole"; the components must sum to one. Compositional data arise in many fields, such as geology, economics, and medicine. In this paper, we propose a new statistical tool for volleyball data: a Bayesian analysis of compositional regression that applies the additive log-ratio (ALR) transformation and assumes either uncorrelated or correlated errors. Bayesian inference is based on Markov chain Monte Carlo (MCMC) methods. The methodology is applied to an artificial data set and to a real volleyball data set.
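The ALR step can be sketched as follows, assuming the last component serves as the reference part (any fixed part could be used). A regression model, Bayesian or otherwise, is then fit to the D - 1 log-ratio coordinates, and fitted values are mapped back to the simplex with the inverse transform. The MCMC estimation itself is not shown, and the function names are illustrative.

```python
import math

# Illustrative helper names; the last part is assumed to be the reference.

def alr(composition):
    """Additive log-ratio: log(x_i / x_D) for i = 1..D-1."""
    ref = composition[-1]
    return [math.log(x / ref) for x in composition[:-1]]

def alr_inverse(y):
    """Map D-1 log-ratio coordinates back to a composition summing to one."""
    expo = [math.exp(v) for v in y] + [1.0]  # reference part has log-ratio 0
    total = sum(expo)
    return [e / total for e in expo]
```

Because the ALR coordinates are unconstrained real numbers, standard regression error structures (uncorrelated or correlated) can be placed on them directly.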

Modelling Location, Scale and Shape Parameters of the Birnbaum-Saunders Generalized T Distribution

Luiz R. Nakamura, Robert A. Rigby, Dimitrios M. Stasinopoulos, et al. (6 authors)

https://doi.org/10.6339/JDS.201704_15(2).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 2 (2017), pp. 221–238

Abstract

The Birnbaum-Saunders generalized t (BSGT) distribution is a very flexible family of distributions that admits different degrees of skewness and kurtosis and includes, as special or limiting cases, some important distributions available in the literature, such as the Birnbaum-Saunders and Birnbaum-Saunders t distributions. In this paper we provide a regression-type model for the BSGT distribution based on the generalized additive models for location, scale and shape (GAMLSS) framework. The resulting model is highly flexible and therefore has great potential for modeling the distribution parameters of response variables with light or heavy tails, i.e., platykurtic or leptokurtic shapes, as functions of explanatory variables. Simulations under different parameter settings are performed to investigate the behavior of the estimators. The potential of the new regression model is illustrated with a real motor vehicle insurance data set.
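Birnbaum-Saunders-type distributions are commonly built by transforming a symmetric variable Z: T = β[αZ/2 + √((αZ/2)² + 1)]². With Z standard normal this gives the classical Birnbaum-Saunders distribution; the BSGT distribution takes Z from a generalized t distribution, whose density is not implemented here. The sketch below shows only the transformation and its inverse, under that assumed construction. Because the map is increasing and Z is symmetric about zero, β is the median of T, which makes it a natural candidate for the location parameter in a GAMLSS-style model; the paper's link functions and exact parameterization are not reproduced.

```python
import math

# Illustrative helper names; the generalized t density for Z is not implemented.

def bs_from_symmetric(z, alpha, beta):
    """T = beta * (alpha*z/2 + sqrt((alpha*z/2)**2 + 1))**2, alpha, beta > 0."""
    w = alpha * z / 2.0
    return beta * (w + math.sqrt(w * w + 1.0)) ** 2

def symmetric_from_bs(t, alpha, beta):
    """Inverse map: z = (sqrt(t/beta) - sqrt(beta/t)) / alpha, for t > 0."""
    return (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
```

The inverse follows from sqrt(T/β) = u and sqrt(β/T) = 1/u with u - 1/u = αZ, so the density of T can be obtained from the density of Z by a change of variables.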

Quantifying Treatment Effects When Flexibly Modeling Individual Change in a Nonlinear Mixed Effects Model

Robert J. Gallop, Sona Dimidjian, David C. Atkins, et al. (4 authors)

https://doi.org/10.6339/JDS.201104_09(2).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 2 (2011), pp. 221–241

Abstract

A core task in analyzing randomized clinical trials based on longitudinal data is to find the best way to describe change over time for each treatment arm. We review the implementation and estimation of a flexible piecewise hierarchical linear model (HLM) for modeling change over time. The flexible piecewise HLM consists of two phases with differing rates of change. The breakpoint between the two phases, as well as the rate of change in each phase, is allowed to vary between treatment groups and between individuals. While this approach may provide better model fit, it is not clear how to quantify treatment differences over the longitudinal period. In this paper, we develop a procedure for summarizing the longitudinal data for the flexible piecewise HLM along the lines of Cook et al. (2004). We focus on quantifying overall treatment efficacy using the area under the curve (AUC) of the individual flexible piecewise HLM models. Methods are illustrated with data from a placebo-controlled trial in the treatment of depression comparing psychotherapy and pharmacotherapy.
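For a two-phase linear trajectory the area under the curve has a closed form. The sketch below assumes each individual's fitted curve is y(t) = b0 + b1·t up to a breakpoint τ and continues with slope b2 afterward, over a follow-up window [0, t_end]. The per-arm summary (mean individual AUC and its between-arm difference) is an illustrative choice, not necessarily the estimand or variance calculation used in the paper, and all names are hypothetical.

```python
# Illustrative helper names; the arm-level summary is an assumption.

def piecewise_mean(t, b0, b1, b2, tau):
    """Two-phase linear trajectory: slope b1 before breakpoint tau, b2 after."""
    if t <= tau:
        return b0 + b1 * t
    return b0 + b1 * tau + b2 * (t - tau)

def piecewise_auc(b0, b1, b2, tau, t_end):
    """Exact area under the two-phase trajectory on [0, t_end]."""
    if tau < 0:
        raise ValueError("breakpoint must be non-negative")
    tau = min(tau, t_end)  # breakpoint after t_end: only phase 1 is observed
    phase1 = b0 * tau + b1 * tau ** 2 / 2.0
    level = b0 + b1 * tau  # value of the curve at the breakpoint
    width = t_end - tau
    phase2 = level * width + b2 * width ** 2 / 2.0
    return phase1 + phase2

def arm_difference(arm_a, arm_b, t_end):
    """Difference in mean individual AUC between two arms.

    Each arm is a list of (b0, b1, b2, tau) tuples of individual coefficients.
    """
    def mean_auc(arm):
        return sum(piecewise_auc(*c, t_end) for c in arm) / len(arm)
    return mean_auc(arm_a) - mean_auc(arm_b)
```

Because each individual's breakpoint and slopes enter the AUC, the summary stays on the scale of the outcome integrated over time even when the arms differ in where their change slows down.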

Analyzing Collinear Data by Principal Component Regression Approach — An Example from Developing Countries

Abu Jafar Mohammad Sufian

https://doi.org/10.6339/JDS.2005.03(2).220

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 2 (2005), pp. 221–232

Abstract

The aim of this paper is to identify the effects of socioeconomic factors and family planning program effort on the total fertility rate, using national-level data from forty-three developing countries. The data were taken mainly from the secondary source "Family Planning and Child Survival: 100 Developing Countries", compiled by the Center for Population and Family Health, Columbia University. Because the independent variables were found to be highly correlated with one another, principal component regression was used to analyze the data. The analysis shows that family planning program effort makes the largest contribution to lowering the total fertility rate, followed by the percentage of urban population, the female literacy rate, and the infant mortality rate, in that order. Policy implications are discussed.
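Principal component regression can be sketched with NumPy: standardize the predictors, obtain their principal components from a singular value decomposition, regress the response on the leading component scores (which are mutually uncorrelated, so collinearity no longer inflates the estimates), and map the coefficients back to the standardized predictors. The number of components to keep is the analyst's choice and is a parameter here; the function name is hypothetical, and nothing below reproduces the paper's data or results.

```python
import numpy as np

# Illustrative helper name; component count is chosen by the caller.

def principal_component_regression(X, y, n_components):
    """Regress y on the leading principal components of standardized X.

    Returns (coefficients on the standardized predictors, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / sd                        # put predictors on a common scale
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V_k = Vt[:n_components].T                  # loadings of retained components
    scores = Z @ V_k                           # uncorrelated component scores
    gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
    beta_std = V_k @ gamma                     # back to standardized predictors
    return beta_std, y.mean()
```

With all components retained the result coincides with ordinary least squares on the standardized predictors; dropping trailing low-variance components is what stabilizes the coefficients when predictors are nearly collinear.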

A New Family of Generalized Distributions on the Unit Interval: The T-Kumaraswamy Family of Distributions

Patrick Osatohanmwen, F. O. Oyegue, Ewere F., et al. (4 authors)

https://doi.org/10.6339/JDS.202004_18(2).0001

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 18, Issue 2 (2020), pp. 219–237

Abstract

The so-called Kumaraswamy distribution is a probability distribution developed to model doubly bounded random processes whose mode need not lie within the bounds. In this article, a generalization of the Kumaraswamy distribution, called the T-Kumaraswamy family, is defined using the T-R{Y} family of distributions framework. The resulting T-Kumaraswamy family is obtained using the quantile functions of some standardized distributions. Some general mathematical properties of the new family are studied, and five new generalized Kumaraswamy distributions are proposed using the T-Kumaraswamy method. Real data sets are used to test the applicability of the new family.
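Under the T-R{Y} framework, a new CDF is generated by composing CDFs with a quantile function: F_X(x) = F_T(Q_Y(F_R(x))), where here R is the Kumaraswamy distribution with CDF 1 - (1 - x^a)^b on (0, 1). The sketch below picks T standard normal and Y standard logistic purely for illustration; these choices and the function names are assumptions and are not necessarily among the five distributions proposed in the paper.

```python
import math

# Illustrative helper names; T = normal, Y = logistic is an assumed example.

def kumaraswamy_cdf(x, a, b):
    """Kumaraswamy CDF on (0, 1): 1 - (1 - x**a)**b."""
    return 1.0 - (1.0 - x ** a) ** b

def kumaraswamy_quantile(u, a, b):
    """Closed-form inverse CDF: (1 - (1 - u)**(1/b))**(1/a)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def logistic_quantile(u):
    """Standard logistic quantile function log(u / (1 - u))."""
    return math.log(u / (1.0 - u))

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def t_kumaraswamy_cdf(x, a, b, t_cdf=normal_cdf, y_quantile=logistic_quantile):
    """T-R{Y} construction F_T(Q_Y(F_R(x))) with R = Kumaraswamy, 0 < x < 1."""
    return t_cdf(y_quantile(kumaraswamy_cdf(x, a, b)))
```

Passing the logistic CDF as t_cdf makes T and Y identical, and the construction collapses back to the plain Kumaraswamy distribution, which is a quick sanity check on any implementation.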

Regression for Compositional Data with Compositional Data as Predictor Variables with or without Zero Values

Abdulaziz Alenazi

https://doi.org/10.6339/JDS.201901_17(1).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 1 (2019), pp. 219–238

Abstract

Compositional data are positive multivariate data constrained to lie within the simplex. Regression analysis of such data has been studied and many regression models have been proposed, but most of them do not allow for zero values. Moreover, the case in which the predictor variables are compositional has received little research interest. Surprisingly, the case in which both the response and the predictor variables are compositional has not been widely studied. This paper suggests a solution to this last problem. Principal components regression using the α-transformation and the Kullback-Leibler divergence are the key elements of the proposed approach. An advantage of this approach is that zero values are allowed in both the response and the predictor variables. Simulation studies and examples with real data illustrate the performance of our algorithm.
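Two ingredients named in the abstract can be sketched directly, assuming the commonly used definition of the α-transformation, z = H(D·x^α / Σx^α - 1)/α, where H is the (D - 1) × D Helmert sub-matrix, and the Kullback-Leibler loss Σ y_i log(y_i / m_i) between an observed composition y and a fitted one m, with 0·log 0 taken as 0. Both accept zero components (the transformation for α > 0), which is the property the approach relies on. The principal components regression step and the choice of α are not shown, and the function names are hypothetical.

```python
import math

# Illustrative helper names; only the building blocks are sketched.

def helmert_submatrix(D):
    """(D-1) x D Helmert sub-matrix: orthonormal rows orthogonal to ones."""
    rows = []
    for i in range(1, D):
        c = 1.0 / math.sqrt(i * (i + 1))
        rows.append([c] * i + [-i * c] + [0.0] * (D - i - 1))
    return rows

def alpha_transform(x, alpha):
    """alpha-transformation of composition x; zeros allowed when alpha > 0."""
    D = len(x)
    powered = [v ** alpha for v in x]
    total = sum(powered)
    u = [(D * p / total - 1.0) / alpha for p in powered]
    return [sum(h * ui for h, ui in zip(row, u)) for row in helmert_submatrix(D)]

def kl_divergence(observed, fitted):
    """Sum of y*log(y/m); terms with y = 0 contribute zero."""
    return sum(y * math.log(y / m) for y, m in zip(observed, fitted) if y > 0)
```

Unlike log-ratio transforms, neither function takes the logarithm of a zero component, so compositions with structural or rounded zeros can be used without imputation.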

Double Sampling Designs to Reduce the Non-discovery Rate: Application to Microarray Data

Maela Kloareg, David Causeur

https://doi.org/10.6339/JDS.2009.07(2).452

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 2 (2009), pp. 219–234

Abstract

Simultaneously testing a huge number of hypotheses is a core issue in high-throughput experimental methods such as microarrays for transcriptomic data. In the central debate about the type I error rate, Benjamini and Hochberg (1995) proposed a procedure that is shown to control the now popular false discovery rate (FDR) under the assumption of independence between the test statistics. These results have been extended to a larger class of dependence structures by Benjamini and Yekutieli (2001), and improvements have emerged in recent years, among which step-up procedures have shown desirable properties. The present paper focuses on the type II error rate. The proposed method improves power by means of double-sampling test statistics that integrate external information available both on the sample for which the outcomes are measured and on additional items. The small-sample distribution of the test statistics is provided, and simulation studies show the beneficial impact of introducing relevant covariates into the testing strategy. Finally, the method is applied to microarray data used to select the genes that affect the degree of muscle destructuration in pigs, where a phenotypic covariate is introduced into the analysis to improve the search for differentially expressed genes.
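The Benjamini-Hochberg (1995) step-up procedure that the paper builds on can be sketched in a few lines; it controls the FDR at level q under independence. The paper's double-sampling statistics, which bring in covariate information to raise power, are not implemented here, and the function name is illustrative.

```python
# Illustrative helper name; this is the classical BH procedure only.

def benjamini_hochberg(p_values, q=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k * q / m. Returns booleans (True = rejected)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest qualifying rank (step-up)
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

Because the rule searches for the largest qualifying rank, a p-value can be rejected even if it misses its own threshold, as long as a larger p-value meets its threshold; for example, with q = 0.05 the p-values 0.001, 0.03, 0.035, 0.9 lead to the first three being rejected.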

Bayesian Wavelet Regression for Spatial Estimation

G. Avarez, B. Sansó

https://doi.org/10.6339/JDS.2008.06(2).416

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 2 (2008), pp. 219–229

Abstract

We consider the problem of estimating the properties of an oil reservoir, such as porosity and sand thickness, in an exploration scenario where only a few wells have been drilled. We use gamma ray records measured directly from the wells as well as seismic traces recorded around the wells. To model the association between the soil properties and the signals, we fit a linear regression model. Additionally, we account for the spatial correlation structure of the observations using a correlation function that depends on the distance between two points. We transform the predictor variable using discrete wavelets and then perform Bayesian variable selection using a Metropolis search. Thanks to the Bayesian nature of our method, we obtain predictions of the properties over the whole reservoir together with a probabilistic quantification of their uncertainties. The cross-validated results show that very high accuracy can be achieved even with a very small number of wavelet coefficients.
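A discrete wavelet transform of a predictor signal can be sketched with the orthonormal Haar wavelet; the abstract does not state which wavelet family the paper uses, so Haar is an assumption here. Because the transform is orthonormal, the signal's energy is preserved, and a small number of large coefficients often carries most of it, which is what makes selecting a few coefficients as regressors feasible. The Bayesian variable selection (Metropolis search) and the spatial correlation model are not shown, and the function name is illustrative.

```python
import math

# Illustrative helper name; Haar is an assumed choice of wavelet.

def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length signal.

    Returns [approximation] followed by detail coefficients ordered
    from the coarsest to the finest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    s = 1.0 / math.sqrt(2.0)
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        details = detail + details  # prepend so coarser levels come first
    return approx + details
```

The resulting coefficient vector has the same length as the trace, so each coefficient can be treated as a candidate predictor in a variable-selection step.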


Journal of Data Science

Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China