One of the major climatic interests of the last decades has been to understand and describe the rainfall patterns of specific areas of the world as functions of other climate covariates. We do it for the historical climate monitoring data from Tegucigalpa, Honduras, using non-homogeneous hidden Markov models (NHMMs), which are dynamic models usually used to identify and predict heterogeneous regimes. For estimating the NHMM in an efficient and scalable way, we propose the stochastic Expectation-Maximization (EM) algorithm and a Bayesian method, and compare their performance in synthetic data. Although these methodologies have already been used for estimating several other statistical models, it is not the case of NHMMs which are still widely fitted by the traditional EM algorithm. We observe that, under tested conditions, the performance of the Bayesian and stochastic EM algorithms is similar and discuss their slight differences. Analyzing the Honduras rainfall data set, we identify three heterogeneous rainfall periods and select temperature and humidity as relevant covariates for explaining the dynamic relation among these periods.
Abstract: Methods used to detect differentially expressed genes in situations with one control and one treatment are t-tests. These methods do not per- form well when control and treatment variances are different. In situations with a control and more than one treatment, it is common to apply analysis of variance followed by a Tukey and/or Duncan test to identify which treat- ment caused the difference. We propose a Bayesian approach for multiple comparison analysis which is very useful in the context of DNA microarray experiments. It uses a priori Dirichlet process and Polya urn scheme. It is a unified procedure (for cases with one or more treatments) which detects differentially expressed genes and identify treatments causing the difference. We use simulations to verify the performance of the proposed method and compare it with usual methods. In cases with control and one treatment and control and more than one treatment followed by Tukey and Duncan tests, the method presents better performance when variances are different. The method is applied to two real data sets. In these cases, genes not detected by usual methods are identified by the proposed method.
The choice of an appropriate bivariate parametrical probability distribution for pairs of lifetime data in presence of censored observations usually is not a simple task in many applications. Each existing bivariate lifetime probability distribution proposed in the literature has different dependence structure. Commonly existing classical or Bayesian discrimination methods could be used to discriminate the best among different proposed distributions, but these techniques could not be appropriate to say that we have good fit of some particular model to the data set. In this paper, we explore a recent dependence measure for bivariate data introduced in the literature to propose a graphical and simple criterion to choose an appropriate bivariate lifetime distribution for data in presence of censored data.
Abstract: Although many scoring models have been developed in literature to offer financial institutions guidance in credit granting decision, the pur pose of most scoring models are to improve their discrimination ability, not their explanatory ability. Therefore, the conventional scoring models can only provide limited information in the relationship among customer de mographics, default risk, and credit card attributes, such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scor ing model is proposed to help financial institutions identify factors which truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. High income, female, or cardholders with higher education are more likely to have good repayment ability.
Journal:Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 269–292
Abstract
This article develops nonlinear functional forms for modeling count time series of daily deaths due to the COVID-19 virus. Our models explain the mean levels of the time series while accounting for the time-varying variances. A Bayesian approach using Markov chain Monte Carlo (MCMC) is adopted for analysis, inference and forecasting of the time series under the proposed models. Applications are shown for time series of death counts from several countries affected by the pandemic.