ANALYSIS OF DELAYED S SHAPED SOFTWARE RELIABILITY GROWTH MODEL WITH TIME DEPENDENT FAULT CONTENT RATE FUNCTION

Many software reliability growth models (SRGMs) based on a non-homogeneous Poisson process (NHPP) have been proposed to measure and assess the reliability of a software system quantitatively. Generally, the error detection rate and the fault content function during software testing are considered to depend on the elapsed testing time. In this paper we propose three SRGMs that incorporate the notion of error generation over time, as extensions of the delayed S-shaped software reliability growth model based on an NHPP. The model parameters are estimated by the maximum likelihood method for interval domain data, and three data sets are used to illustrate the estimation technique. The proposed models are compared with the existing delayed S-shaped model on the basis of the error sum of squares, mean sum of squares, predictive ratio risk and Akaike's information criterion. We show that the proposed models perform better than the existing models on these data sets.

These models were developed on the assumption that faults detected in the testing phase are removed immediately, with no debugging time delay, and that no new faults are introduced into the software. In other words, it is assumed that whenever an attempt is made to remove a fault, it is removed with certainty; this is referred to as perfect debugging. In practice, because of a number of factors such as the tester's skill and expertise, debugging is not always perfect and the original fault may remain, leading to a phenomenon known as imperfect debugging. Another possibility is that, while a software error is being corrected, additional errors may be generated and enter the software. Models that account for this may be referred to as error generation models. In the case of error generation, the total fault content increases as testing progresses because new faults are introduced into the system while the original faults are being removed.

Goel and Okumoto (1979) introduced the concept of imperfect debugging for the first time in the literature. Ohba and Chou (1989) extended this work by introducing error generation into the Goel-Okumoto (GO) model and named it the imperfect debugging model. Kapur and Garg (1990) introduced imperfect debugging into the GO model and considered the fault introduction rate per remaining fault to be reduced by imperfect debugging. Thus the number of failures observed and detected by infinite time is more than the initial fault content. Yamada et al. (1992) extended the GO model by assuming that the fault introduction rate is a linear or an exponential function of time. Several authors have since used this approach to model error generation in software reliability models. Pham and Zhang (1997) developed a software reliability model with an exponential error introduction rate and a non-decreasing error detection rate. Pham et al. (1999) discussed a linear fault introduction rate function combined with a non-decreasing, time-dependent fault detection rate. Tokuno and Yamada (2000) proposed an imperfect debugging SRGM with two types of hazard rates for software reliability measurement and assessment. Zhang and Pham (2000) developed an NHPP model with imperfect debugging and a time-dependent fault detection rate. Zhang et al. (2003) proposed an SRGM that integrates fault removal efficiency, failure rate and fault introduction rate into software reliability assessment. Chatterjee and Shukla (2016) proposed a change point-based SRGM under imperfect debugging with a revised concept of fault dependency.

The models discussed above do not always reflect real testing environments. Recently, Hanagal and Bhalerao (2016a, 2016b, 2017) obtained software reliability growth models based on the inverse Weibull, generalized inverse Weibull and extended inverse Weibull distributions. It is of interest to construct a software reliability model (SRM) that considers error generation during testing and central processing unit (CPU) execution time.
The remainder of the paper is organized as follows. In Section 2, we describe NHPP delayed S-shaped models. In Section 3, we discuss the proposed models with three different fault content functions, the software reliability assessment measures, and the estimation of the unknown model parameters by the maximum likelihood method. Parameter estimation is of primary importance in software reliability prediction: once the analytical solution for $m(t)$ is known for a given model, the parameters in the solution need to be determined. Parameter estimation is carried out by maximum likelihood estimation (MLE) for interval domain data. In many cases, maximum likelihood estimators are consistent and asymptotically normally distributed as the sample size increases; see Zhao and Xie (1996). In Section 4, we present the analysis of three data sets, DS I, DS II and DS III. Section 5 contains the major conclusions and remarks of the study.

NHPP Delayed S-Shaped Model
In the NHPP S-shaped model, the software reliability growth curve is S-shaped. The error detection rate changes with time: it becomes greatest at a certain time after testing begins, after which it decreases exponentially. In other words, some faults are covered by other faults at the beginning of the testing phase, and until those covering faults are actually removed, the covered faults remain undetected. Yamada et al. (1984) also determined that the software testing process usually involves a learning process in which testers become familiar with the software products, environments and software specifications. Several S-shaped models, such as the delayed S-shaped and inflection S-shaped models, exist in the literature (Yamada et al. 1984; Pham 1997a). Kareer et al. (1990) proposed an S-shaped SRGM with two types of errors, classified according to their severity. Gupta et al. (2011) obtained software reliability estimates using the delayed S-shaped model under imperfect debugging and time lag. Ahmed et al. (2011) developed an inflection S-shaped SRGM considering log-logistic testing effort and imperfect debugging. Kaur and Sharma (2015) predicted the time between failures and the accuracy of SRGMs using the CASRE tool. Bokhari et al. (2017) proposed a delayed S-shaped SRGM with imperfect debugging and a new modified Weibull testing-effort function.
We now discuss a stochastic model for a software error detection process based on an NHPP in which the growth curve of the number of detected software errors for the observed failure data is S-shaped, called the delayed S-shaped NHPP model (Yamada et al. 1984). The software error detection procedure described by an S-shaped curve can be characterized as a learning process in which the test team members become familiar with the test environment, testing tools or project requirements, i.e., their test skills gradually improve. The delayed S-shaped model is based on the following assumptions:
1. All the faults in the program are mutually independent from the failure detection point of view.
2. The probability of failure detection at any time is proportional to the current number of faults in the software.
3. The proportionality of failure detection is constant.
4. The initial error content of the software is a random variable.
5. A software system is subject to failures at random times caused by errors present in the system.
6. The time between the $(i-1)$th and $i$th failures depends on the time to the $(i-1)$th failure.
7. Each time a failure occurs, the error which caused it is immediately removed and no other errors are introduced.
On the basis of these assumptions we have the following differential equation:
$$\frac{dm(t)}{dt} = b(t)\,[a - m(t)], \tag{2.1}$$
where $a$ is the expected total number of faults that exist in the software before testing and $b(t)$ is the failure detection rate per fault, which also represents the average failure rate of a fault.
In the delayed S-shaped NHPP model,
$$b(t) = \frac{b^2 t}{1 + bt}, \tag{2.2}$$
and the resulting mean value function (2.3) is an S-shaped curve. This model is called the delayed S-shaped NHPP model. Further, the counting process $\{N(t),\ t \ge 0\}$ representing the cumulative number of software faults detected up to testing time $t$ is a stochastic process. Basic assumptions about this process lead to the commonly accepted conclusion that, for any $t \ge 0$, $N(t)$ is Poisson distributed with time-dependent Poisson parameter $m(t)$, the mean value function (MVF):
$$P\{N(t) = n\} = \frac{[m(t)]^n}{n!}\, e^{-m(t)}, \quad n = 0, 1, 2, \ldots, \tag{2.4}$$
where $m(t)$ is given by (2.3). The MVF represents the expected number of software errors that have accumulated up to time $t$, or the estimated cumulative faults up to time $t$.
The intensity function $\lambda(t)$ of the NHPP is given by
$$\lambda(t) = \frac{dm(t)}{dt} = a b^2 t\, e^{-bt}. \tag{2.5}$$
This represents the number of faults detected per unit testing time.
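As a quick numerical check of the relations above, the following Python sketch evaluates the delayed S-shaped mean value function of Yamada et al. (1984), $m(t) = a[1 - (1 + bt)e^{-bt}]$, together with the detection rate $b(t) = b^2 t/(1 + bt)$ and the intensity $\lambda(t) = ab^2 t e^{-bt}$, and confirms that $\lambda(t) = b(t)[a - m(t)]$. The function names and the parameter values $a = 100$, $b = 0.2$ are hypothetical and chosen only for illustration.

```python
import math


def mvf(t, a, b):
    """Delayed S-shaped mean value function m(t) = a[1 - (1 + b t) e^{-b t}]."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def detection_rate(t, b):
    """Per-fault detection rate b(t) = b^2 t / (1 + b t)."""
    return b * b * t / (1.0 + b * t)


def intensity(t, a, b):
    """Intensity lambda(t) = a b^2 t e^{-b t}, the derivative of m(t)."""
    return a * b * b * t * math.exp(-b * t)


# Hypothetical parameter values, for illustration only.
a, b = 100.0, 0.2
for t in (0.0, 5.0, 10.0, 20.0, 40.0):
    m = mvf(t, a, b)
    lam = intensity(t, a, b)
    # Right-hand side of the differential equation dm/dt = b(t)[a - m(t)].
    rhs = detection_rate(t, b) * (a - m)
    print(f"t={t:5.1f}  m(t)={m:8.3f}  lambda(t)={lam:7.4f}  b(t)[a-m(t)]={rhs:7.4f}")
```

The intensity rises until $t = 1/b$ and decays afterwards, which is the inflection point that gives $m(t)$ its S shape.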
In general, the last assumption (7), i.e., that the detected faults are removed with certainty and no new faults are introduced, is not always true [see Yamada et al. (1992), Pham and Zhang (1997) and Pham (1999)]. If additionally introduced faults affect the fault content, we have to modify this assumption by introducing the concept of error generation while the debugging (or testing) process is in action. That is, we use a total fault content rate function $a(t)$ representing the sum of the expected number of initial software faults and the faults introduced by time $t$. Replacing $a$ by $a(t)$ in equation (2.1), we have the differential equation
$$\frac{dm(t)}{dt} = b(t)\,[a(t) - m(t)]. \tag{2.6}$$
The above differential equation was given by Pham (2007).
The generalized mean value function (MVF) solution of the differential equation (2.6), given by Pham and Zhang (1997), is
$$m(t) = e^{-B(t)}\left[m_0 + \int_{t_0}^{t} a(\tau)\, b(\tau)\, e^{B(\tau)}\, d\tau\right], \tag{2.7}$$
where $B(t) = \int_{t_0}^{t} b(\tau)\, d\tau$, with initial condition $m(t_0) = m_0$, and $t_0$ is the time at which the debugging process begins. Pham (2007) obtained an NHPP SRGM with $a(t) = a(1 + bt)^2$, in which case $a(t)$ and $b(t)$ share the common parameter $b$. In our proposed models we treat the parameters of $a(t)$ and $b(t)$ as independent rather than dependent.
Many existing software reliability growth models [Pham (2006), Pham and Zhang (2003), Yamada and Osaki (1985)] can be considered special cases of the above general model. An increasing function $a(t)$ implies an increasing total number of faults and reflects imperfect debugging. An increasing $b(t)$ implies an increasing fault detection rate, which could be attributed either to imperfect debugging, or to software process fluctuations, or to a combination of both. Different $a(t)$ and $b(t)$ functions also reflect different assumptions about the software testing process.
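To see how the choice of fault content function changes the mean value function, the general differential equation $dm(t)/dt = b(t)[a(t) - m(t)]$ with $m(0) = 0$ can be integrated numerically. The sketch below is only an illustration under stated assumptions: it uses a classical fourth-order Runge-Kutta scheme (not a method taken from the paper), the delayed S-shaped detection rate $b(t) = b^2 t/(1 + bt)$, and hypothetical parameter values. It compares a constant fault content with a linearly increasing one.

```python
import math


def detection_rate(t, b):
    """Delayed S-shaped per-fault detection rate b(t) = b^2 t / (1 + b t)."""
    return b * b * t / (1.0 + b * t)


def solve_mvf(fault_content, b, t_end, steps=2000):
    """Integrate dm/dt = b(t) [a(t) - m(t)], m(0) = 0, with classical RK4."""
    h = t_end / steps

    def rhs(t, m):
        return detection_rate(t, b) * (fault_content(t) - m)

    m = 0.0
    for i in range(steps):
        t = i * h
        k1 = rhs(t, m)
        k2 = rhs(t + h / 2, m + h * k1 / 2)
        k3 = rhs(t + h / 2, m + h * k2 / 2)
        k4 = rhs(t + h, m + h * k3)
        m += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return m


# Hypothetical values, for illustration only.
A, B, ALPHA = 100.0, 0.2, 0.01


def constant_content(t):
    return A  # no error generation


def increasing_content(t):
    return A * (1.0 + ALPHA * t)  # fault content grows linearly with time


for t_end in (10.0, 20.0, 40.0):
    # Closed form of the delayed S-shaped model, valid for constant a(t) only.
    closed_form = A * (1.0 - (1.0 + B * t_end) * math.exp(-B * t_end))
    m_const = solve_mvf(constant_content, B, t_end)
    m_incr = solve_mvf(increasing_content, B, t_end)
    print(f"t={t_end:5.1f}  closed={closed_form:8.3f}  "
          f"constant a(t)={m_const:8.3f}  increasing a(t)={m_incr:8.3f}")
```

For constant $a(t)$ the numerical solution should agree with the closed-form delayed S-shaped curve, while an increasing $a(t)$ yields a larger expected number of detected faults that still stays below $a(t)$.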
In the simplest model, the functions $a(t)$ and $b(t)$ are both constants. A constant $a(t) = a$ stands for the assumption that no new errors are introduced during the debugging process. A constant $b(t) = b$ implies that the proportional factor relating the error detection rate $\lambda(t)$ to the total number of remaining errors is constant. In a general model, $a(t)$ and $b(t)$ are both functions of time.
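As a worked check, constants can be substituted into the general solution of Pham and Zhang (1997), taken here in the form $m(t) = e^{-B(t)}\left[m_0 + \int_{t_0}^{t} a(\tau) b(\tau) e^{B(\tau)} d\tau\right]$ with $B(t) = \int_{t_0}^{t} b(\tau) d\tau$, $t_0 = 0$ and $m_0 = 0$. The sketch below does this for constant $a(t) = a$, first with constant $b(t) = b$, which gives the exponential GO form, and then with the delayed S-shaped detection rate $b(t) = b^2 t/(1 + bt)$.

```latex
% Case 1: a(t) = a, b(t) = b, so B(t) = bt
m(t) = e^{-bt} \int_0^t a\, b\, e^{b\tau}\, d\tau = a\left(1 - e^{-bt}\right)

% Case 2: a(t) = a, b(t) = b^2 t / (1 + bt), so B(t) = bt - \ln(1 + bt)
e^{B(t)} = \frac{e^{bt}}{1 + bt}, \qquad
\int_0^t b(\tau)\, e^{B(\tau)}\, d\tau = e^{B(t)} - 1

m(t) = a\, e^{-B(t)} \left(e^{B(t)} - 1\right) = a\left[1 - (1 + bt)\, e^{-bt}\right]
```

The second case reproduces the delayed S-shaped mean value function of Yamada et al. (1984).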
In the following section we propose general models with three different types of fault content function $a(t)$ and discuss the methods of estimating the parameters of the models.

3. Proposed Models with Three Different Fault Content Rate Functions in Delayed S-shaped NHPP

3.1 Linear Fault Content Rate Function
Here we assume that the fault content rate function is linear in time,
$$a(t) = a_1(t) = a(1 + \alpha t), \quad a > 0,\ \alpha > 0,\ t \ge 0,$$
where $a$ is the initial fault content of the system and $\alpha$ is the rate of increase of the number of introduced faults relative to the initial fault content. Let $m_1(t)$ be the MVF when the fault content function $a_1(t)$ is substituted for $a(t)$ in (2.6); thus we have
$$\frac{dm_1(t)}{dt} = b(t)\,[a(1 + \alpha t) - m_1(t)]. \tag{3.1}$$
Solving the differential equation (3.1) with respect to $m_1(t)$ under the initial condition $m_1(0) = 0$, we derive the mean value function $m_1(t)$, referred to as equation (3.2). In this paper we call the NHPP model with MVF $m_1(t)$ given in (3.2) the delayed S-shaped linear model. The intensity function of this model is $\lambda_1(t) = dm_1(t)/dt = b(t)\,[a_1(t) - m_1(t)]$.

Software Reliability Assessment Measures
Based on the delayed S-shaped linear model, we can derive the following quantitative measures useful for software reliability assessment.
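For orientation, two measures commonly derived for NHPP models are the expected number of remaining faults (the residual fault content function plotted in Figures 2, 4 and 6) and the conditional software reliability. The forms below are the standard NHPP expressions written with the linear model's $a_1(t)$ and $m_1(t)$; they are given as an assumption about what is meant, not as the paper's own list.

```latex
% Expected number of faults remaining at time t
n_1(t) = a_1(t) - m_1(t)

% Probability of no failure in (t, t + x], given testing up to time t
R(x \mid t) = \exp\left\{-\left[m_1(t + x) - m_1(t)\right]\right\}, \quad x \ge 0
```

The same expressions apply to the other models with $a_1$, $m_1$ replaced by the corresponding fault content and mean value functions.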

3.1.1 Parameter Estimation
In this section we discuss the method of estimating the parameters of the model mentioned above.

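Estimation Using Interval Domain Data
Suppose that $n$ pairs of observations $(t_i, y_i)$, $i = 1, 2, \ldots, n$, are available, where $y_i$ is the cumulative number of faults detected by time $t_i$. With $t_0 = 0$ and $y_0 = 0$, the likelihood function for the interval domain data is
$$L = \prod_{i=1}^{n} \frac{[m_1(t_i) - m_1(t_{i-1})]^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!}\, \exp\{-[m_1(t_i) - m_1(t_{i-1})]\}.$$
Closed form expressions for the MLEs of $a$, $b$ and $\alpha$ cannot be obtained. However, the MLEs can be obtained by an iterative procedure: setting the derivatives of the log-likelihood function with respect to $(a, b, \alpha)$ to zero, the resulting equations are solved by the Newton-Raphson method. We have used R software for the iterative solution. Let $\hat{a}$, $\hat{b}$ and $\hat{\alpha}$ be the MLEs of $a$, $b$ and $\alpha$ respectively.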
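The estimation step can be sketched in code. The example below is a simplified illustration, not the procedure used in the paper: it fits the baseline delayed S-shaped model with constant fault content, $m(t) = a[1 - (1 + bt)e^{-bt}]$, to grouped data, assuming that $y_i$ is the cumulative fault count at time $t_i$ and that the likelihood is the usual product of Poisson probabilities for the interval counts. For fixed $b$ the maximizing value is $a = y_n / F(t_n)$ with $F(t) = 1 - (1 + bt)e^{-bt}$, so a one-dimensional golden-section search over $b$ is used here in place of Newton-Raphson. The data and function names are hypothetical.

```python
import math


def unit_mvf(t, b):
    """F(t) = 1 - (1 + b t) e^{-b t}, so that m(t) = a F(t)."""
    return 1.0 - (1.0 + b * t) * math.exp(-b * t)


def unit_increment(t_prev, t, b):
    """F(t) - F(t_prev), written so that values near 1 are not subtracted."""
    return ((1.0 + b * t_prev) * math.exp(-b * t_prev)
            - (1.0 + b * t) * math.exp(-b * t))


def log_likelihood(times, cum_counts, a, b):
    """Grouped-data NHPP log-likelihood with t_0 = 0 and y_0 = 0."""
    ll, t_prev, y_prev = 0.0, 0.0, 0
    for t, y in zip(times, cum_counts):
        dm = a * unit_increment(t_prev, t, b)  # expected faults in (t_prev, t]
        dy = y - y_prev                        # observed faults in (t_prev, t]
        ll += dy * math.log(dm) - dm - math.lgamma(dy + 1)
        t_prev, y_prev = t, y
    return ll


def profile_a(times, cum_counts, b):
    """For fixed b the likelihood is maximised at a = y_n / F(t_n)."""
    return cum_counts[-1] / unit_mvf(times[-1], b)


def fit(times, cum_counts, lo=1e-3, hi=2.0, iters=100):
    """Golden-section search over b on the profile log-likelihood."""
    def g(b):
        return log_likelihood(times, cum_counts, profile_a(times, cum_counts, b), b)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if g(x1) < g(x2):
            lo = x1
        else:
            hi = x2
    b_hat = (lo + hi) / 2.0
    return profile_a(times, cum_counts, b_hat), b_hat


# Hypothetical interval-domain data: cumulative faults at the end of each period.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cum_counts = [5, 13, 22, 30, 36, 40, 43, 45, 47, 48]

a_hat, b_hat = fit(times, cum_counts)
print(f"a_hat={a_hat:.2f}  b_hat={b_hat:.3f}  "
      f"logL={log_likelihood(times, cum_counts, a_hat, b_hat):.3f}")
```

For the proposed models no such closed-form profile step is available for every parameter, so a multi-parameter optimizer (as the Newton-Raphson iteration in R mentioned above) would be needed instead of the one-dimensional search.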

3.2 Quadratic Fault Content Rate Function
Here we assume that the fault content rate function is quadratic in time,
$$a(t) = a_2(t) = a + \alpha t - \beta t^2, \quad a > 0,\ \alpha > 0,\ \beta > 0,\ t \ge 0,$$
where $a$ is the initial fault content of the system prior to testing, and $\alpha$ and $\beta$ are, respectively, the increasing and decreasing rates of the number of introduced faults relative to the initial fault content.
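A short worked calculation shows the shape this implies. Writing the quadratic fault content function as $a_2(t) = a + \alpha t - \beta t^2$ (the symbol names $\alpha$ and $\beta$ are assumed here), its turning point follows from setting the derivative to zero.

```latex
\frac{d a_2(t)}{dt} = \alpha - 2\beta t = 0
\;\Rightarrow\;
t^* = \frac{\alpha}{2\beta},
\qquad
a_2(t^*) = a + \frac{\alpha^2}{4\beta}
```

For example, with the hypothetical values $a = 100$, $\alpha = 2$ and $\beta = 0.05$, the fault content grows until $t^* = 20$, where it reaches $120$, and declines afterwards. The quadratic form is therefore presumably meant for test durations over which $a_2(t)$ remains positive.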
Let $m_2(t)$ be the MVF when the fault content function $a_2(t)$ is substituted for $a(t)$ in (2.6); thus we have
$$\frac{dm_2(t)}{dt} = b(t)\,[a + \alpha t - \beta t^2 - m_2(t)], \quad a > 0,\ \alpha > 0,\ \beta > 0,\ b(t) > 0,\ t \ge 0. \tag{3.12}$$
Solving the differential equation with respect to $m_2(t)$ under the initial condition $m_2(0) = 0$, we derive the mean value function $m_2(t)$, referred to as equation (3.13). We shall call the present NHPP model with MVF given in (3.13) the delayed S-shaped quadratic model. The intensity function of this model is given by $\lambda_2(t) = dm_2(t)/dt = b(t)\,[a_2(t) - m_2(t)]$. Based on the delayed S-shaped quadratic model, we can derive the following quantitative measures useful for software reliability assessment.

3.2.1 Parameter Estimation
In this section we discuss the method of estimating the parameters of the model mentioned above.

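Estimation Using Interval Domain Data
The likelihood for the interval domain data $(t_i, y_i)$, $i = 1, 2, \ldots, n$, with $t_0 = 0$ and $y_0 = 0$, is given by
$$L = \prod_{i=1}^{n} \frac{[m_2(t_i) - m_2(t_{i-1})]^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!}\, \exp\{-[m_2(t_i) - m_2(t_{i-1})]\}.$$
Closed form expressions for the MLEs of $a$, $b$, $\alpha$ and $\beta$ cannot be obtained. However, the MLEs can be obtained by an iterative procedure: setting the derivatives of the log-likelihood function with respect to $(a, b, \alpha, \beta)$ to zero, the resulting equations are solved by the Newton-Raphson method. We have used R software for the iterative solution. Let $\hat{a}$, $\hat{b}$, $\hat{\alpha}$ and $\hat{\beta}$ be the MLEs of $a$, $b$, $\alpha$ and $\beta$ respectively.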

3.3 Exponential Fault Content Rate Function
Here we assume that the fault content rate function is an exponential function of time. We chose the following fault content function:
$$a(t) = a_3(t) = a\, e^{-\alpha t}, \quad a > 0,\ \alpha > 0,\ t \ge 0,$$
where $a$ is the initial fault content of the system prior to testing and $\alpha$ is the rate of the number of introduced faults relative to the initial fault content. Let $m_3(t)$ be the MVF when the fault content function $a_3(t)$ is substituted for $a(t)$ in (2.6), so that
$$\frac{dm_3(t)}{dt} = b(t)\,[a\, e^{-\alpha t} - m_3(t)].$$
Solving this differential equation with respect to $m_3(t)$ under the initial condition $m_3(0) = 0$, we derive the mean value function $m_3(t)$, referred to as equation (3.24). We shall call the present model with MVF given in (3.24) the delayed S-shaped exponential model. The intensity function of this model is given by $\lambda_3(t) = dm_3(t)/dt = b(t)\,[a_3(t) - m_3(t)]$. Based on the delayed S-shaped exponential model, we can derive the following quantitative measures useful for software reliability assessment.

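3.3.1 Estimation of Parameters Using Interval Domain Data
The likelihood function for the interval domain data $(t_i, y_i)$, $i = 1, 2, \ldots, n$, is given by
$$L = \prod_{i=1}^{n} \frac{[m_3(t_i) - m_3(t_{i-1})]^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!}\, \exp\{-[m_3(t_i) - m_3(t_{i-1})]\},$$
where $t_0 = 0$ and $y_0 = 0$. Taking the natural logarithm of this equation, we obtain
$$\ln L = \sum_{i=1}^{n} (y_i - y_{i-1}) \ln[m_3(t_i) - m_3(t_{i-1})] - m_3(t_n) - \sum_{i=1}^{n} \ln[(y_i - y_{i-1})!].$$
Closed form expressions for the MLEs of $a$, $b$ and $\alpha$ cannot be obtained. However, the MLEs can be obtained by an iterative procedure: setting the derivatives of the log-likelihood function with respect to $(a, b, \alpha)$ to zero, the resulting equations are solved by the Newton-Raphson method. We have used R software for the iterative solution. Let $\hat{a}$, $\hat{b}$ and $\hat{\alpha}$ be the MLEs of $a$, $b$ and $\alpha$ respectively.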

Analysis of Three Data Sets
Data Set I concerns the US Naval Tactical Data System (NTDS), given by Goel and Okumoto (1979a). The software data set was extracted from information about failures in the development of software for the real-time multi-computer complex of the US Naval Fleet Computer Programming Center of the NTDS. The software consists of 38 different project modules. The time horizon is divided into four phases: production phase, test phase, user phase and subsequent test phase. Twenty-six software failures were found during the production phase and five during the test phase, and the last failure was found on 1 January 1971. One failure was observed during the user phase, in September 1971, and two failures during the subsequent test phase in 1971.

Data Set II concerns an online data entry IBM software package. The data, reported by Ohba (1984), were recorded from testing an online data entry software package developed by IBM.

Data Set III concerns a real-time command and control system. The data set was reported by Musa et al. (1987) and is based on failure data from a real-time command and control system; it represents the failures observed during system testing over 25 hours of CPU time.
Applying the models discussed above to data sets DS I, DS II and DS III, we obtain the maximum likelihood (ML) estimates and goodness-of-fit measures, namely the sum of squares due to error (SSE), mean squared error (MSE), predictive ratio risk (PRR) [Pham and Deng (2003)], Akaike's information criterion (AIC) and Bayesian information criterion (BIC). The values for the existing and the proposed finite failure models are summarized in Tables 1, 2 and 3.

From Table 1, we observe that the SSE, MSE, PRR, AIC and BIC values of our proposed delayed S-shaped linear, quadratic and exponential models are smaller than those of the existing delayed S-shaped and Pham models. Hence we may conclude that our proposed models perform better on Data Set I. Further, we observe that the SSE, MSE, PRR, AIC and BIC values for the delayed S-shaped exponential model are smaller than those of the other two proposed models and of the existing delayed S-shaped and Pham models. Hence we conclude that our proposed delayed S-shaped exponential model performs better than all the other models for Data Set I.
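The comparison criteria can be computed as in the following sketch. The definitions are the ones commonly used in the SRGM literature and are assumed here rather than taken from the paper: SSE is the sum of squared differences between observed and fitted cumulative faults, MSE divides SSE by $n - k$ for $k$ estimated parameters, PRR sums the squared deviations relative to the fitted values, and AIC and BIC penalize the maximized log-likelihood. The function name and the numbers are hypothetical.

```python
import math


def fit_criteria(observed, fitted, log_lik, n_params):
    """Goodness-of-fit criteria for a fitted mean value function.

    observed : cumulative fault counts y_i
    fitted   : estimated mean value function at the same times, m_hat(t_i)
    log_lik  : maximised log-likelihood of the model
    n_params : number of estimated parameters k
    """
    n = len(observed)
    sse = sum((y - m) ** 2 for y, m in zip(observed, fitted))
    mse = sse / (n - n_params)
    # Predictive ratio risk: deviation measured relative to the fitted value.
    prr = sum(((m - y) / m) ** 2 for y, m in zip(observed, fitted))
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n)
    return {"SSE": sse, "MSE": mse, "PRR": prr, "AIC": aic, "BIC": bic}


# Hypothetical observed and fitted values, for illustration only.
observed = [5, 13, 22, 30, 36, 40]
fitted = [4.5, 13.2, 22.1, 29.7, 35.6, 40.0]
for name, value in fit_criteria(observed, fitted, log_lik=-12.3, n_params=2).items():
    print(f"{name}: {value:.4f}")
```

Smaller values of each criterion indicate a better fit, which is the sense in which the models are ranked in the tables.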
From Figure 1, we observe that for our proposed delayed S-shaped linear, quadratic and exponential models, the estimated MVF is closer to the actual data points than the MVF of the existing delayed S-shaped and Pham software reliability growth models. Hence we conclude that our proposed models explain the data better than the existing models. Figure 2 illustrates the behavior of the residual fault content functions for the delayed S-shaped model and our proposed delayed S-shaped linear, quadratic and exponential models. We observe that, in comparison with the existing delayed S-shaped model, the residual fault content functions of our proposed models get closer to zero as time increases. Hence our proposed models perform better for Data Set I.

From Table 2, we observe that the SSE, MSE, PRR, AIC and BIC values of our proposed delayed S-shaped linear, quadratic and exponential models are smaller than those of the existing delayed S-shaped and Pham models. Hence we may conclude that our proposed models perform better on Data Set II. Further, we observe that the SSE, MSE, PRR, AIC and BIC values for the delayed S-shaped exponential model are smaller than those of the other two proposed models and of the existing delayed S-shaped and Pham models. Hence we conclude that our proposed delayed S-shaped exponential model performs better than all the other models for Data Set II. From Figure 3, we observe that for our proposed delayed S-shaped linear, quadratic and exponential models, the estimated MVF is closer to the actual data points than the MVF of the existing delayed S-shaped and Pham software reliability growth models. Hence we conclude that our proposed models explain the data better than the existing models. Figure 4 illustrates the behavior of the residual fault content functions for the delayed S-shaped model and our proposed delayed S-shaped linear, quadratic and exponential models. We observe that, in comparison with the existing delayed S-shaped model, the residual fault content functions of our proposed models get closer to zero as time increases. Hence our proposed models perform better for Data Set II.
From Table 3, we observe that the SSE, MSE, PRR, AIC and BIC values of our proposed delayed S-shaped linear, quadratic and exponential models are smaller than those of the existing delayed S-shaped and Pham models. Hence we may conclude that our proposed models perform better on Data Set III. Further, we observe that the SSE, MSE, PRR, AIC and BIC values for the delayed S-shaped exponential model are smaller than those of the other two proposed models and of the existing delayed S-shaped and Pham models. Hence we conclude that our proposed delayed S-shaped exponential model performs better than all the other models for Data Set III. From Figure 5, we observe that for our proposed delayed S-shaped linear, quadratic and exponential models, the estimated MVF is closer to the actual data points than the MVF of the existing delayed S-shaped and Pham software reliability growth models. Hence we conclude that our proposed models explain the data better than the existing models. Figure 6 illustrates the behavior of the residual fault content functions for the delayed S-shaped model and our proposed delayed S-shaped linear, quadratic and exponential models. We observe that, in comparison with the existing delayed S-shaped model, the residual fault content functions of our proposed models get closer to zero as time increases. Hence our proposed models perform better for Data Set III.

Conclusions and Remarks
In this paper we proposed three different software reliability growth models based on three types of fault content rate function $a(t)$. The model parameters are estimated by the maximum likelihood method using interval domain data. Three data sets are used to illustrate the estimation technique and the model comparisons. The proposed models are compared with the existing delayed S-shaped models using the sum of squares due to error, mean squared error, predictive ratio risk, Akaike's information criterion and the Bayesian information criterion. The proposed models perform better than the existing models on all three data sets. We also plotted the estimated MVF values against time.

In (2.2), $b$ is the error detection rate per error in the steady state. Solving the differential equation (2.1) with respect to $m(t)$ under the initial condition $m(0) = 0$, Yamada et al. (1984) obtained the mean value function
$$m(t) = a\,[1 - (1 + bt)\, e^{-bt}]. \tag{2.3}$$
Further, we have plotted the actual data and the estimated cumulative faults (the mean value function, MVF) of the finite failure models against time for data sets DS I, DS II and DS III, as shown in Figures 1, 3 and 5 respectively. The S-shaped model of Yamada et al. (1984), with a constant fault content function, is compared with the proposed delayed S-shaped linear, quadratic and exponential models.

Figure 1: Plot of data and estimated cumulative faults against time for DS I

Figure 2: Plot of estimated residual fault content function against time for DS I

Figure 3: Plot of data and estimated cumulative faults against time for DS II

Figure 4: Plot of estimated residual fault content function against time for DS II

Figure 5: Plot of data and estimated cumulative faults against time for DS III

Figure 6: Plot of estimated residual fault content function against time for DS III

Table 1: ML and AIC estimates of models for DS I

Table 2: ML and AIC estimates of models for DS II

Table 3: ML and AIC estimates of models for DS III