A major interest in climate research over recent decades has been to understand and describe the rainfall patterns of specific regions of the world as functions of other climate covariates. We do this for historical climate monitoring data from Tegucigalpa, Honduras, using non-homogeneous hidden Markov models (NHMMs), dynamic models commonly used to identify and predict heterogeneous regimes. To estimate the NHMM in an efficient and scalable way, we propose a stochastic Expectation-Maximization (EM) algorithm and a Bayesian method, and compare their performance on synthetic data. Although these methodologies have already been used to estimate several other statistical models, this is not the case for NHMMs, which are still widely fitted by the traditional EM algorithm. We observe that, under the tested conditions, the performance of the Bayesian and stochastic EM algorithms is similar, and we discuss their slight differences. Analyzing the Honduras rainfall data set, we identify three heterogeneous rainfall periods and select temperature and humidity as relevant covariates for explaining the dynamic relation among these periods.
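To make the "non-homogeneous" part concrete, the following is a minimal sketch of one common NHMM parameterization, in which each row of the transition matrix is a multinomial logistic (softmax) function of the covariates at that time step. The abstract does not state the exact link function, so the softmax form, the function name `transition_matrix`, and the coefficient layout `beta[j][k]` are assumptions made for illustration only.

```python
import math


def transition_matrix(x, beta):
    """Return a K x K row-stochastic transition matrix for covariates x.

    x    : list of covariate values at one time step (hypothetical layout,
           e.g. [1.0, temperature, humidity] with a leading intercept).
    beta : beta[j][k] is an assumed coefficient vector for the move from
           hidden state j to hidden state k.
    """
    n_states = len(beta)
    matrix = []
    for j in range(n_states):
        # Linear score for each destination state k.
        scores = [sum(b * xi for b, xi in zip(beta[j][k], x))
                  for k in range(n_states)]
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Illustrative values only: three hidden states, intercept plus two covariates.
example_beta = [[[0.1 * (j - k), 0.5 * k, -0.2 * j] for k in range(3)]
                for j in range(3)]
example_P = transition_matrix([1.0, 0.3, -1.2], example_beta)
```

Because the matrix is recomputed from the covariates at every time step, the chain's regime-switching behaviour can change with temperature or humidity, which is what distinguishes an NHMM from a homogeneous hidden Markov model.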
Abstract: In this paper we propose a new bivariate long-term distribution based on the Farlie-Gumbel-Morgenstern copula model. The proposed model allows for the presence of censored data and covariates in the cure parameter. For inferential purposes, a Bayesian approach via Markov chain Monte Carlo (MCMC) is considered. Further, some discussion of model selection criteria is given. In order to examine outlying and influential observations, we develop Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real HIV data.
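For readers unfamiliar with the Farlie-Gumbel-Morgenstern (FGM) copula, the sketch below evaluates its standard form C(u, v) = uv[1 + θ(1 − u)(1 − v)], with −1 ≤ θ ≤ 1, and draws pairs from it by inverting the conditional distribution of V given U. It covers only the copula itself, not the long-term (cure fraction) model, censoring, or covariates of the paper; the function names `fgm_copula` and `sample_fgm` are hypothetical.

```python
import math
import random


def fgm_copula(u, v, theta):
    """Evaluate the FGM copula C(u, v) for dependence parameter theta."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def sample_fgm(n, theta, seed=0):
    """Draw n pairs (u, v) from the FGM copula by conditional inversion."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()
        w = rng.random()
        # P(V <= v | U = u) = v + a*v*(1 - v) with a = theta*(1 - 2u).
        a = theta * (1.0 - 2.0 * u)
        disc = (1.0 + a) ** 2 - 4.0 * a * w
        denom = (1.0 + a) + math.sqrt(disc)
        # Stable form of the quadratic root; reduces to v = w when a = 0.
        v = 2.0 * w / denom if denom > 0.0 else 0.0
        pairs.append((u, v))
    return pairs
```

The FGM family can only represent weak dependence: the correlation between U and V equals θ/3, so it never exceeds 1/3 in absolute value.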
Abstract: The methods used to detect differentially expressed genes in situations with one control and one treatment are t-tests. These methods do not perform well when control and treatment variances are different. In situations with a control and more than one treatment, it is common to apply analysis of variance followed by a Tukey and/or Duncan test to identify which treatment caused the difference. We propose a Bayesian approach for multiple comparison analysis which is very useful in the context of DNA microarray experiments. It uses a Dirichlet process prior and the Polya urn scheme. It is a unified procedure (for cases with one or more treatments) which detects differentially expressed genes and identifies the treatments causing the difference. We use simulations to verify the performance of the proposed method and compare it with the usual methods. In both settings (a control with one treatment, and a control with more than one treatment, where the usual analysis is followed by Tukey and Duncan tests), the proposed method performs better when variances are different. The method is applied to two real data sets. In these cases, genes not detected by the usual methods are identified by the proposed method.
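The Polya urn scheme mentioned above can be illustrated with a short simulation. In this minimal sketch of the generic scheme, the i-th draw is a fresh value from a base distribution with probability α/(α + i − 1) and otherwise repeats one of the earlier draws chosen uniformly at random. How the paper embeds this in its multiple comparison procedure is not described in the abstract, so the function name `polya_urn`, the standard normal base distribution, and the parameter values are all assumptions.

```python
import random


def polya_urn(n, alpha, base_draw, seed=0):
    """Simulate n draws from a Dirichlet process via the Polya urn scheme.

    alpha     : concentration parameter (larger values give more new draws).
    base_draw : function taking a random.Random and returning one draw
                from the base distribution (hypothetical interface).
    """
    rng = random.Random(seed)
    values = []
    for i in range(n):
        # With i earlier draws, a new value appears with prob alpha/(alpha+i).
        if rng.random() < alpha / (alpha + i):
            values.append(base_draw(rng))
        else:
            # Otherwise copy an earlier draw, which creates ties (clusters).
            values.append(rng.choice(values))
    return values


# Illustrative run: 50 draws with a standard normal base distribution.
draws = polya_urn(50, alpha=1.0, base_draw=lambda r: r.gauss(0.0, 1.0))
n_clusters = len(set(draws))
```

The ties produced by the scheme are what make it attractive for comparisons: draws that share a value fall into the same group, so a model built on this prior can assign positive probability to several treatments having equal means.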
Choosing an appropriate bivariate parametric probability distribution for pairs of lifetime data in the presence of censored observations is usually not a simple task in many applications. Each bivariate lifetime distribution proposed in the literature has a different dependence structure. Existing classical or Bayesian discrimination methods can be used to select the best among the proposed distributions, but these techniques may not be adequate to establish that a particular model fits the data set well. In this paper, we use a recent dependence measure for bivariate data introduced in the literature to propose a simple graphical criterion for choosing an appropriate bivariate lifetime distribution for data in the presence of censoring.
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 269–292
Abstract
This article develops nonlinear functional forms for modeling count time series of daily deaths due to the COVID-19 virus. Our models explain the mean levels of the time series while accounting for the time-varying variances. A Bayesian approach using Markov chain Monte Carlo (MCMC) is adopted for analysis, inference and forecasting of the time series under the proposed models. Applications are shown for time series of death counts from several countries affected by the pandemic.