Journal of Data Science

Search results 889

Allometric Extension for Multivariate Regression

Thaddeus Tarpey Christopher T. Ivey

https://doi.org/10.6339/JDS.2006.04(4).287

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 4, Issue 4 (2006), pp. 479–495

Abstract

In multivariate regression, interest lies in how the response vector depends on a set of covariates. A multivariate regression model is proposed where the covariates explain variation in the response only in the direction of the first principal component axis. This model is not only parsimonious, but it also provides an easy interpretation in allometric growth studies, where the first principal component of the log-transformed data corresponds to constants of allometric growth. The proposed model naturally generalizes the two-group allometric extension model to the situation where groups differ according to a set of covariates. A bootstrap test for the model is proposed, and a study on plant growth in the Florida Everglades is used to illustrate the model.

Making Sense of Contingency Tables in Archaeology: the Aid of Correspondence Analysis to Intra-Site Activity Areas Research

Gianmarco Alberti

https://doi.org/10.6339/JDS.2013.11(3).1141

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 3 (2013), pp. 479–499

Abstract

The use of contingency tables is widespread in archaeology. Cross tabulations are used in many different studies as a useful tool to report data synthetically, and are also useful when the analyst wishes to look for latent data structures. The latter case is where Correspondence Analysis (CA) comes into play. By graphically displaying the dependence between rows and columns, CA enables the analyst to explore the data in search of a meaningful inner structure. The article aims to show the utility of CA in archaeology in general and, in particular, for the identification of areas devoted to different activities within settlements. The application of CA to the data from a prehistoric village in north-eastern Sicily (P. Milazzese at Panarea, Aeolian Islands, Italy), taken as a case study, shows how CA succeeds in pinpointing different activity areas and in providing grounds to open new avenues of inquiry into other aspects of the archaeological documentation.

Using Visualization to Support Investment & Divestment Decisions in the Canadian Armed Forces

Mark Rempel

https://doi.org/10.6339/JDS.201407_12(3).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 3 (2014), pp. 477–494

Abstract

Many nations’ defence departments use capability-based planning to guide their investment and divestment decisions. This planning process involves a variety of data that in its raw form is difficult for decision-makers to use. In this paper we describe how dimensionality reduction and partition clustering are used in the Canadian Armed Forces to create visualizations that convey how important military capabilities are in planning scenarios and how much capacity the planned force structure has to provide the capabilities. Together, these visualizations give decision-makers an overview of which capabilities may require investment or may be candidates for divestment.

Analyst Optimism in the Automotive Industry: A Post-Bailout Boost and Methodological Insights

Barry Hettler Nonna Sorokina Yertai Tanai et al.

https://doi.org/10.6339/JDS.201507_13(3).0004

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 3 (2015), pp. 473–494

Abstract

This paper empirically investigates the impact of the government bailout on analysts’ forecast optimism regarding firms in the automotive industry. We compare the results from M- and MM-robust methodologies to the results from OLS regression in an event study context and find that inferences change. When M- and MM-robust estimation methods are used to estimate the same model, the results for key control variables fall directly in line with those of similar previous studies. Furthermore, an analysis of residuals indicates that the application of M- and MM-estimation methods pulls the main prediction equation towards the main sample data, suggesting a more rigorous fit. Based on robust methods, we observe changes in analyst optimism during the announcement period of the bailout, as evidenced by the significantly positive variable of interest. We support our empirical results with simulations and confirm significant improvements in estimation accuracy when robust regression methods are applied to samples contaminated by outliers.

Inferences about a Probabilistic Measure of Effect Size When Dealing with More Than Two Groups

Rand R. Wilcox

https://doi.org/10.6339/JDS.201107_09(3).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 3 (2011), pp. 471–486

Abstract

For two independent random variables, X and Y, let p = P(X > Y) + 0.5P(X = Y), which is sometimes described as a probabilistic measure of effect size. It has been argued that for various reasons, p represents an important and useful way of characterizing how groups differ. In clinical trials, for example, an issue is the likelihood that one method of treatment will be more effective than another. The paper deals with making inferences about p when three or more groups are to be compared. When tied values can occur, the results suggest using a multiple comparison procedure based on an extension of Cliff’s method used in conjunction with Hochberg’s sequentially rejective technique. If tied values occur with probability zero, an alternative method can be argued to have a practical advantage. As for a global test, extant rank-based methods are unsatisfactory given the goal of comparing groups based on p. The one method that performed well in simulations is based in part on the distribution of the difference between each pair of random variables. A bootstrap method is used where a p-value is based on the projection depth of the null vector relative to the bootstrap cloud. The proposed methods are illustrated using data from an intervention study.
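As an illustration of the definition only, the quantity p = P(X > Y) + 0.5P(X = Y) can be estimated from two samples by comparing every pair of observations. The sketch below is a plain plug-in estimate and is not the inference procedure proposed in the paper; the function name `prob_effect_size` and the sample values are hypothetical.

```python
from itertools import product


def prob_effect_size(x, y):
    """Plug-in estimate of p = P(X > Y) + 0.5 * P(X = Y).

    Every observation in x is compared with every observation in y;
    ties count as half a "win". The name and interface are assumptions
    made for this sketch, not taken from the paper.
    """
    x, y = list(x), list(y)
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a, b in product(x, y) if a > b)
    ties = sum(1 for a, b in product(x, y) if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


# Hypothetical samples: 2 of the 9 pairs have x > y and 2 are ties.
estimate = prob_effect_size([1, 2, 3], [2, 2, 4])
print(estimate)
```

A value of 0.5 indicates that neither group tends to produce larger observations; comparing a sample with itself gives exactly 0.5 under this estimate.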

Interval Estimation for Ratios of Correlated Age-Adjusted Rates

Ram C. Tiwari Yi Li Zhaohui Zou

https://doi.org/10.6339/JDS.2010.08(3).610

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 3 (2010), pp. 471–482

Abstract

Providing reliable estimates of the ratios of cancer incidence and mortality rates across geographic regions has been important for the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program as it profiles cancer risk factors and makes decisions on cancer control planning. A fundamental difficulty, however, arises when such ratios have to be computed to compare the rate of a subregion (e.g., California) with that of a parent region (e.g., the US). Such a comparison is often made for policy-making purposes. Based on F-approximations as well as normal approximations, this paper provides new confidence intervals (CIs) for such rate ratios. Intensive simulations, which capture the real issues with the observed mortality data, reveal that these two CIs perform well. In general, for rare cancer sites, the F-intervals are often more conservative, and for moderate and common cancers, all intervals perform similarly.

Distortion Diagnostics for Covariate-adjusted Regression: Graphical Techniques Based on Local Linear Modeling

Danh V. Nguyen Damla Şentürk

https://doi.org/10.6339/JDS.2007.05(4).363

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 4 (2007), pp. 471–490

Abstract

Linear regression models are often useful tools for exploring the relationship between a response and a set of explanatory (predictor) variables. When both the observed response and the predictor variables are contaminated/distorted by unknown functions of an observable confounder, inferring the underlying relationship between the latent (unobserved) variables is more challenging. Recently, Şentürk and Müller (2005) proposed the method of covariate-adjusted regression (CAR) analysis for this distorted data setting. In this paper, we describe graphical techniques for assessing departures from or violations of specific assumptions regarding the type and form of the data distortion. The type of data distortion consists of multiplicative, additive or no-distortion. The form of the distortion encompasses a class of general smooth distorting functions. However, common confounding adjustment methods in regression analysis implicitly make distortion assumptions, such as assuming additive or multiplicative linear distortions. We illustrate graphical detection of departures from such assumptions on the distortion. The graphical diagnostic techniques are illustrated with numerical and real data examples. The proposed graphical assessment of distortion assumptions is feasible due to the CAR estimation method, which utilizes a local regression technique to estimate a set of transformed distorting functions (Şentürk and Nguyen, 2006).

Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults

Liang Zhu Jianguo Sun Phillip Wood

https://doi.org/10.6339/JDS.2009.07(4).471

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 4 (2009), pp. 469–485

Abstract

Alcohol and drug uses are common in today’s society, and it is well known that they can lead to serious consequences. Studies have been conducted in order, for example, to understand short- or long-term temporal processes of alcohol and drug uses. This paper discusses statistical modeling for joint analysis of alcohol and drug uses, and several models and the corresponding estimation approaches are presented. The methods are applied to a prospective study of alcohol and drug uses among college freshmen, which motivated this investigation. The analysis results suggest that female subjects seem to have far fewer consequences of alcohol and drug uses than male subjects and that the consequences of alcohol and drug uses decrease with age.

The Gamma Burr XII Distributions: Theory and Applications

Renata Rojas Guerra Fernando A. Peña-Ramírez Gauss M. Cordeiro

https://doi.org/10.6339/JDS.201707_15(3).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 3 (2017), pp. 467–494

Abstract

We introduce a four-parameter distribution, called the Zografos-Balakrishnan Burr XII distribution. Our purpose is to provide a Burr XII generalization that may be useful in still more complex situations. The new distribution may be an interesting alternative to describe income distributions and can also be applied in actuarial science, finance, bioscience, telecommunications and modelling lifetime data, for example. It contains as special models some well-known distributions, such as the log-logistic, Weibull, Lomax and Burr XII distributions, among others. Some of its structural properties are investigated. The method of maximum likelihood is used for estimating the model parameters and a simulation study is conducted. We provide two applications to real data to demonstrate the usefulness of the proposed distribution. Since the Ristić-Balakrishnan Burr XII distribution has a similar structure to the studied distribution, we also present some of its properties and expansions.

Analyzing Spatial Panel Data of Cigarette Demand: A Bayesian Hierarchical Modeling Approach

Yanbing Z Zheng Jun Zhu Dong Li

https://doi.org/10.6339/JDS.2008.06(4).428

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 4 (2008), pp. 467–489

Abstract

Analysis of spatial panel data is of great importance and interest in spatial econometrics. Here we consider cigarette demand in a spatial panel of 46 states of the US over a 30-year period. We construct a demand equation to examine the elasticity of per pack cigarette price and per capita disposable income. The existing spatial panel models account for both spatial autocorrelation and state-wise heterogeneity, but fail to account for temporal autocorrelation. Thus we propose new spatial panel models and adopt a fully Bayesian approach for model parameter inference and prediction of cigarette demand at future time points using MCMC. We conclude that the spatial panel model that accounts for state-wise heterogeneity, spatial dependence, and temporal dependence clearly outperforms the existing models. Analysis based on the new model suggests a negative cigarette price elasticity but a positive income elasticity.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China