Investigating household electricity usage patterns, and matching those patterns to behaviours, is an important area of research given the centrality of such patterns in addressing the needs of the electricity industry. Additional knowledge of household behaviours will allow more effective targeting of demand side management (DSM) techniques. This paper addresses the question of whether a reasonable number of meaningful motifs, each representing a regular activity within a domestic household, can be identified solely from household-level electricity meter data. Using UK data collected from several hundred households in Spring 2011, monitored at five-minute intervals, a process for finding repeating short patterns (motifs) is defined. Different ways of representing the motifs exist, and a qualitative approach is presented for choosing between the options based on the number of regular behaviours detected (neither too few nor too many).
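As a rough illustration of what finding repeating short patterns in meter readings can look like, the sketch below discretizes readings into letter bands and counts repeated fixed-width words. It is not the paper's procedure: the breakpoints, the window width, the minimum count, the rule that drops flat words, and the function names `symbolize` and `find_motifs` are all assumptions made for this example.

```python
from collections import Counter

def symbolize(values, breakpoints):
    """Map each reading to a letter for the band it falls in (assumed bands)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    symbols = []
    for v in values:
        band = sum(1 for b in breakpoints if v >= b)  # number of breakpoints at or below v
        symbols.append(letters[band])
    return "".join(symbols)

def find_motifs(values, breakpoints, width, min_count):
    """Return words of length `width` that occur at least `min_count` times."""
    symbols = symbolize(values, breakpoints)
    counts = Counter(symbols[i:i + width] for i in range(len(symbols) - width + 1))
    # Assumption: flat words (one repeated letter) carry no activity and are dropped.
    return {w: c for w, c in counts.items() if c >= min_count and len(set(w)) > 1}

# Hypothetical five-minute readings in kW, not taken from the study.
readings = [0.2, 0.2, 1.5, 2.8, 0.3, 0.2, 0.2, 1.6, 2.9, 0.2, 0.2, 0.2]
motifs = find_motifs(readings, breakpoints=[1.0, 2.0], width=3, min_count=2)
print(motifs)
```

Changing the number of bands or the word width changes how many motifs are reported, which is the kind of representation choice the abstract refers to.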
Time series modelling is a very popular technique in data science. The main aim of time series modelling is to understand the data-generating process and to estimate its parameters, which depend on all the observations. A few observations may misrepresent the data and unduly influence the parameter estimates; such observations are called outliers. The present study deals with the handling of outliers in the context of ARIMA time series and proposes an alternative approach for replacing them. Two ways of handling outliers are in common use: removing the outliers from the data, or replacing them with nearby values. Removal does not work for autocorrelated data such as time series, and replacing an outlier with the immediately preceding or following value is also not appropriate because of the dependency structure. We therefore propose an alternative approach in which each outlier is replaced by the value estimated from the best-fitting model. The methodology is discussed in detail, and an empirical analysis of time series from the National Pension Scheme (NPS) is then carried out. Most of the series are modelled well; a few are not, owing to their non-stationary nature. After an outlier-free series is obtained, forecasting is also carried out. The proposed methodology is also applied to realizations of the series to obtain a more general view of its behaviour, and similar results are obtained.
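The idea of replacing an outlier with a model-based estimate can be sketched as follows. This is a deliberately simplified stand-in: it fits an AR(1) model with intercept by ordinary least squares rather than selecting a best ARIMA model, and it assumes the outlier positions are already known. The function names `fit_ar1` and `replace_outliers` and the toy series are hypothetical.

```python
def fit_ar1(series, skip=()):
    """Least-squares fit of x[t] = c + phi * x[t-1], ignoring pairs that touch `skip`."""
    pairs = [(series[t - 1], series[t])
             for t in range(1, len(series))
             if t not in skip and (t - 1) not in skip]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def replace_outliers(series, outlier_idx):
    """Replace flagged points with the one-step AR(1) prediction from the previous value."""
    c, phi = fit_ar1(series, skip=set(outlier_idx))
    cleaned = list(series)
    for t in sorted(outlier_idx):
        if t == 0:
            raise ValueError("the first observation has no predecessor to predict from")
        cleaned[t] = c + phi * cleaned[t - 1]
    return cleaned

# Toy series following x[t] = 2 + 0.5 * x[t-1], with index 3 corrupted to 50.
series = [10.0, 7.0, 5.5, 50.0, 4.375, 4.1875, 4.09375]
cleaned = replace_outliers(series, outlier_idx=[3])
print(cleaned[3])
```

Excluding the flagged points from the fit keeps the outlier from distorting the parameters that are then used to replace it.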
Many researchers have studied specific random fields whose finite-dimensional marginal distributions are multivariate closed skew-normal or multivariate extended skew-t, in both time and spatial domains. In this paper, a necessary and sufficient condition, based on the marginal distributions, is provided for the applicability of such random fields in spatial interpolation. Two deficiencies of the random fields generated by some well-known multivariate distributions are pointed out and, in contrast, a suitable skew and heavy-tailed random field is proposed. The efficiency of the proposed random field is illustrated through the interpolation of a real data set.
Efficiency analysis is a useful and important way to measure the performance of firms in the competitive market of a rapidly developing country like Bangladesh. The more efficient firms, or decision making units (DMUs), are usually treated as benchmark units for development. In this study, efficiency scores are obtained using the non-parametric Data Envelopment Analysis (DEA) technique for 1007 manufacturing firms in Bangladesh from the enterprise survey data. DEA calculates weights for inputs and outputs so as to assign the maximum possible efficiency score to the DMU under evaluation. In total, 29 firms are found to be efficient under the variable returns to scale assumption. The significant determinants of inefficiency found in this analysis are mainly firm size, the manager’s experience in the respective sector, annual losses due to power outages, and the number of production workers.
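To give a feel for a DEA efficiency score, the sketch below covers only the simplest special case: one input, one output, and constant returns to scale, where each unit's score reduces to its output/input ratio divided by the best ratio in the sample. The study itself uses multiple inputs and outputs under variable returns to scale, which requires solving a linear program per DMU; the function name `crs_efficiency` and the figures are hypothetical.

```python
def crs_efficiency(inputs, outputs):
    """Single-input, single-output efficiency under constant returns to scale.

    Each unit's productivity (output / input) is compared with the best
    productivity observed; the best unit scores 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical firms: input could be labour cost, output could be sales.
inputs = [10.0, 20.0, 30.0]
outputs = [20.0, 30.0, 30.0]
scores = crs_efficiency(inputs, outputs)
print(scores)
```

Units scoring 1.0 form the benchmark against which the others are measured.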
Partial Least Squares Discriminant Analysis (PLSDA) is a statistical method for classification. It consists of a classical Partial Least Squares regression in which the dependent variable is categorical and expresses the class membership of each observation. The aim of this study is twofold: to analyze the performance of PLSDA in correctly classifying the 28 European Union (EU) member countries and 7 candidate countries (Albania, Montenegro, Serbia, Macedonia FYR and Turkey, together with the potential candidates Bosnia and Herzegovina and Kosova) into their pre-defined classes (candidate or member), and to determine which economic and/or demographic indicators are effective in the classification, using a data set obtained from the World Bank database.
The probability that an estimator is exactly equal to the value of the estimated parameter is zero. Hence, in practical applications, point estimates are reported together with their estimated standard errors. When a random variable has a distribution with heavier or thinner tails than the normal distribution, the confidence intervals common in the literature are not applicable. In this study, we obtain some results on a confidence procedure for the parameters of the generalized normal distribution, based on the pivotal quantities approach and a random sample of fixed size n, that is robust whether the tails are heavier or thinner than those of the normal distribution. Some simulation studies and applications are also presented.
A new four-parameter lifetime distribution, named the power Lomax Poisson, is introduced and studied. The distribution is obtained by compounding the power Lomax and Poisson distributions. Structural properties of the power Lomax Poisson model are derived. The model parameters are estimated using the maximum likelihood, least squares and weighted least squares techniques. An intensive simulation study is performed to evaluate the performance of the different estimators based on their relative biases, standard errors and mean square errors. Finally, the superiority of the new compound distribution over some existing distributions is illustrated by means of two real data sets. The results show that the suggested model can produce better fits than some well-known distributions.
Families of distributions are commonly used to model insurance claims data that require flexible distributional forms, but assessing the goodness-of-fit of the hypothesized model (the specification problem) can be a challenge because of the complexity of the likelihood function of the family of distributions involved. Previous work shows that these specification problems can be addressed by means of semi-parametric tests based on generalized method of moments (GMM) estimators. While the approach can be applied directly to both discrete and continuous families of distributions, this paper focuses on developing a testing strategy within a framework of discrete families of distributions. Both the local power analysis and the approximate slope method demonstrate the excellent performance of these tests. The finite-sample performance of the tests, based on both asymptotic and bootstrap critical values, is also discussed and compared with established methods that require the complete specification of likelihood functions.
Hierarchical Bayes models have been used in disease mapping to examine small-scale geographic variation. State-level geographic variation in less common causes of mortality has been reported; however, county-level variation is rarely examined. Because of concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed under the statistical reliability criteria of the Division of Vital Statistics, National Center for Health Statistics (NCHS). This precludes an examination, using direct estimates, of spatio-temporal variation at the county level in less common causes of mortality such as suicide rates (SRs). Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality to enable examination of spatio-temporal variation on smaller geographic scales such as counties. This method allows examination of spatio-temporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatio-temporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA, to predict year- and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented in R-INLA with spatially structured and unstructured random effects, correlated time effects, time-varying confounders and space-time interaction terms, borrowing strength across both counties and years to produce smoothed county-level SRs. Model-based estimates of SRs were mapped to explore geographic variation.
Overdispersion is a common phenomenon in Poisson modelling. The generalized Poisson (GP) distribution accommodates both overdispersion and underdispersion in count data. In this paper, we briefly review different overdispersed and zero-inflated regression models. To study the impact of fitting an incorrect model to data simulated from another model, we simulate data from the zero-inflated generalized Poisson (ZIGP) distribution and fit the Poisson, GP, zero-inflated Poisson (ZIP), ZIGP and zero-inflated negative binomial (ZINB) models. We compare the performance of the estimates from the Poisson, GP, ZIP, ZIGP and ZINB models in terms of mean square error, bias and standard error when the samples are generated from the ZIGP distribution. We propose estimators of the parameters of the ZIGP distribution based on the first two sample moments and the proportion of zeros, referred to as the MOZE estimator, and compare its performance with that of the maximum likelihood estimator (MLE) through a simulation study. It is observed that the MOZE estimators are almost as efficient as, or even more efficient than, the MLEs of the parameters of the ZIGP distribution.
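The role of the dispersion parameter and of zero inflation can be checked numerically with a short sketch. It assumes Consul's parameterization of the generalized Poisson, P(Y = y) = θ(θ + λy)^(y-1) exp(-θ - λy) / y!, restricted here to 0 ≤ λ < 1 (the overdispersed case); the abstract does not state which parameterization the paper uses, and the function names and parameter values below are illustrative only.

```python
import math

def gp_logpmf(y, theta, lam):
    """Log-probability of the generalized Poisson at count y (theta > 0, 0 <= lam < 1)."""
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))

def zigp_pmf(y, pi, theta, lam):
    """Zero-inflated GP: extra mass pi at zero, GP scaled by (1 - pi) elsewhere."""
    gp = (1.0 - pi) * math.exp(gp_logpmf(y, theta, lam))
    return pi + gp if y == 0 else gp

def moments(pmf, upper=400):
    """Mean and variance from a truncated sum over 0..upper-1 (tail is negligible here)."""
    mean = sum(y * pmf(y) for y in range(upper))
    var = sum((y - mean) ** 2 * pmf(y) for y in range(upper))
    return mean, var

theta, lam, pi = 2.0, 0.3, 0.2  # illustrative values, not from the paper
gp_mean, gp_var = moments(lambda y: math.exp(gp_logpmf(y, theta, lam)))
zigp_zero = zigp_pmf(0, pi, theta, lam)
print(gp_mean, gp_var, gp_var / gp_mean, zigp_zero)
```

Under this parameterization the GP mean is θ/(1 - λ) and the variance is θ/(1 - λ)^3, so the variance-to-mean ratio exceeds 1 whenever λ > 0, and setting λ = 0 recovers the ordinary Poisson.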