Bayesian Behavior Scoring Model

Although many scoring models have been developed in the literature to guide financial institutions in credit-granting decisions, the purpose of most scoring models is to improve discrimination ability, not explanatory ability. Conventional scoring models therefore provide only limited information on the relationship among customer demographics, default risk, and credit card attributes such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scoring model is proposed to help financial institutions identify factors that truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by a major bank in Taiwan. The empirical results show that increasing APR greatly raises the default probability. Single cardholders are less accountable for credit card repayment. Cardholders who have high income, are female, or have higher education are more likely to have good repayment ability.


Introduction
Financial institutions, such as credit card issuing banks, make heavy use of scoring models to decide on credit applications and to monitor consumers' repayment behavior. A model for the former purpose is known as a credit scoring model, while a model for the latter purpose is known as a behavior scoring model. Most scoring models are built to yield a binary outcome indicating whether an application should be approved given the first-time applicant's debt ratio, credit report, and demographic variables. Regardless of purpose, the most common scoring models include the logit model (e.g., Hand et al., 2005), linear discriminant analysis (e.g., Lee et al., 2002), data envelopment analysis (e.g., Min et al., 2008), and other data mining approaches such as neural networks (e.g., West, 2000; Lee et al., 2002; Baesens et al., 2003; Baesens et al., 2003), classification trees (e.g., Paass and Kindermann, 1998), support vector machines (e.g., Baesens et al., 2003; Huang et al., 2007), and rule-based approaches (e.g., Messier and Hansen, 1988).
Even though these scoring models guide financial institutions in credit-granting decisions, they suffer from the following limitations. First, a scoring model can only reflect an applicant's current status. In reality, a customer's repayment ability varies with personal circumstances that neither credit card issuing banks nor researchers can observe. The most a financial institution can do is monitor a customer's repayment history over time, so that the issuing bank can prepare for the consequences as soon as a default signal appears.
Second, a scoring model is constructed under the assumption that the credit-granting decisions made by the financial institution are correct. How the financial institution makes these decisions, however, is usually unknown and open to question. Third, most scoring models focus on discrimination ability, not explanatory ability. The discrimination ability of a scoring model is usually evaluated by its type 1 and type 2 errors. As mentioned above, however, using discrimination ability to evaluate the performance of a scoring model is questionable because it takes for granted that the "true" decisions made by the financial institution are correct. In addition, because such a model lacks explanatory ability, it provides limited information on customer types and their relationship to product attributes such as APR (annual percentage rate) and credit limits. Consequently, even scoring models with decent predictive accuracy cannot help financial institutions determine which factors truly reflect customer value, or how the credit line and APR should be set to minimize default risk.
To address these issues, we present a Bayesian behavior scoring model that parameterizes the relationship among customer repayment ability, credit card product attributes (e.g., credit limit and APR), and demographic variables. To illustrate the proposed model, we applied it to the credit cardholder database provided by a major bank in Taiwan. This dataset includes the transaction records, demographic variables, and payment information of 2,948 cardholders from January 1, 2010 to December 31, 2010. We demonstrate that the proposed Bayesian behavior scoring method can be employed to evaluate customer repayment ability given credit card attributes and demographic variables. Our methodology gives financial institutions an additional option for analyzing customer value and monitoring customers who may have higher default risk. This paper is organized as follows. Section 2 describes the proposed Bayesian behavior scoring model. Section 3 presents the empirical results. Concluding remarks are given in Section 4.

Bayesian Behavior Scoring Model
In this paper, we propose a Bayesian behavior scoring model to study customer value. The rationale behind our model is that consumer credit is granted according to the likelihood that the borrower can repay the loan on time in the future. Thus, the debt ratio should be an important factor in determining a credit applicant's repayment ability.
Let $H$ denote the total number of customers, $w_{ij}$ the $i$-th customer's amount of principal repayment in the $j$-th period, $p_{ij}$ the $i$-th customer's total principal amount payable in the $j$-th period, $Q_{ij}$ the $i$-th customer's amount of interest paid in the $j$-th period, and $s_{ij}$ the $i$-th customer's total interest payment due in the $j$-th period. Then the $i$-th cardholder's repayment ability in the $j$-th period is defined as

$$M_{ij} = \rho \, \frac{w_{ij}}{p_{ij}} + (1 - \rho) \, \frac{Q_{ij}}{s_{ij}}. \tag{1}$$

In Eq. (1), the first term ($w_{ij}/p_{ij}$) represents a customer's principal repayment ability, and the second term ($Q_{ij}/s_{ij}$) measures a customer's interest repayment ability. The weight $\rho$ ($0 \le \rho \le 1$) is given to principal repayment ability, and $1 - \rho$ to interest repayment ability. Since $M_{ij}$ is a weighted average of two ratios, each between 0 and 1, the value of $M_{ij}$ lies between 0 and 1. The greater $M_{ij}$ is, the higher the repayment ability of the $i$-th customer.
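The weighted average in Eq. (1) can be illustrated with a few lines of Python. This is a minimal sketch: the function name, the input validation, and the example amounts are hypothetical and are not taken from the bank's data.

```python
def repayment_ability(w, p, q, s, rho):
    """Weighted repayment ratio for one cardholder-period, following Eq. (1).

    w, p: principal repaid and principal payable; q, s: interest paid and
    interest due; rho: weight on principal repayment (0 <= rho <= 1).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if p <= 0 or s <= 0:
        # Assumption: periods with nothing due are excluded before this step.
        raise ValueError("amounts due must be positive")
    return rho * (w / p) + (1.0 - rho) * (q / s)


# Hypothetical period: 600 of 1,000 in principal repaid, all 50 in interest paid.
for rho in (0.1, 0.5, 0.9):
    print(f"rho={rho}: M={repayment_ability(600, 1000, 50, 50, rho):.2f}")
```

For this hypothetical cardholder, who pays all interest but only part of the principal, the measured repayment ability falls as more weight is placed on principal repayment.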
To further determine how customers' credit limits and APRs affect repayment ability, and to explain the heterogeneity of customer repayment ability, we let $x_{1ij}$ represent the $i$-th cardholder's credit limit in the $j$-th period, $x_{2ij}$ the $i$-th cardholder's APR in the $j$-th period, and $z_i$ the $i$-th cardholder's demographic variables. $M_{ij}/(1 - M_{ij} + k)$ is the odds of customer repayment, where $k$ is a small positive constant added to avoid a zero denominator, which would otherwise occur when a cardholder pays in full on time. The advantage of using the odds of customer repayment is that we can measure the relative likelihood of being a valuable cardholder. For example, when the odds equal 4, the likelihood of being an accountable customer is four times the likelihood of being delinquent in payment. We let the behavior scoring model for the $i$-th cardholder take the log-linear form

$$\ln\!\left(\frac{M_{ij}}{1 - M_{ij} + k}\right) = \beta_{i0} + \beta_{i1} x_{1ij} + \beta_{i2} x_{2ij} + \varepsilon_{ij}, \qquad j = 1, \dots, J_i, \tag{2}$$

where $J_i$ denotes the total number of observations of the $i$-th cardholder and $\varepsilon_{ij}$ is an error term. The behavior scoring model defined in Eq. (2) can therefore be expressed as

$$\begin{pmatrix} \ln\dfrac{M_{i1}}{1 - M_{i1} + k} \\ \vdots \\ \ln\dfrac{M_{iJ_i}}{1 - M_{iJ_i} + k} \end{pmatrix} = \begin{pmatrix} 1 & x_{1i1} & x_{2i1} \\ \vdots & \vdots & \vdots \\ 1 & x_{1iJ_i} & x_{2iJ_i} \end{pmatrix} B_i + \begin{pmatrix} \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iJ_i} \end{pmatrix}, \tag{3}$$

where $B_i = (\beta_{i0}, \beta_{i1}, \beta_{i2})'$ contains the intercept and the coefficients of the product attributes credit limit ($x_{1ij}$) and APR ($x_{2ij}$). We further let $y_i$ denote the $J_i \times 1$ vector of log-odds on the left-hand side of Eq. (3), $X_i$ the $J_i \times 3$ matrix of regressors, and $\varepsilon_i$ the $J_i \times 1$ vector of error terms; then Eq. (3) can be re-defined as

$$y_i = X_i B_i + \varepsilon_i. \tag{4}$$

To investigate the influence of demographic variables on customer response to credit card attributes, we further assume that

$$B_i = \Theta z_i + \Delta_i, \tag{5}$$

where $z_i$ is a $d \times 1$ vector of the $i$-th cardholder's demographic variables, $\Theta$ is a $3 \times d$ coefficient matrix, and $\Delta_i$ is a $3 \times 1$ vector of the $i$-th cardholder's stochastic terms distributed as $N_3(0, \Lambda)$.
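The odds transformation and the regressors of the log-linear model can be sketched as follows. The value of k used here is hypothetical, since the text only states that k is a small positive constant, and the function names are illustrative.

```python
import math

K = 0.01  # hypothetical value; the text only says k is a small positive constant


def repayment_odds(m, k=K):
    """Odds M / (1 - M + k) of repayment for a repayment ability m in [0, 1]."""
    return m / (1.0 - m + k)


def log_odds_response(m, k=K):
    """Response of the log-linear model; requires m > 0 so the log is defined."""
    return math.log(repayment_odds(m, k))


def design_row(credit_limit, apr):
    """Regressor row (1, x1, x2) paired with one log-odds observation."""
    return [1.0, credit_limit, apr]


# With k = 0, a repayment ability of 0.8 gives odds of 0.8 / 0.2, i.e. about 4.
print(repayment_odds(0.8, k=0.0))
# A cardholder who pays in full (m = 1) still has finite odds because k > 0.
print(repayment_odds(1.0))
```

The second call shows why k is needed: without it, full repayment would produce a division by zero.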
The Gibbs sampler was used to estimate the proposed model under conjugate priors on the model parameters, including a prior on vec(Θ), where vec(Θ) stacks the columns of the 3 × d coefficient matrix Θ. Detailed explanations of the Gibbs sampler and conjugate priors can be found in Gelfand and Smith (1990) and Smith and Roberts (1993).
The Markov chain Monte Carlo (MCMC) procedure is carried out by generating draws from the full conditional distributions of the model parameters. The calculation details of these full conditionals are given in the appendix.
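The estimation scheme can be illustrated with a compact Gibbs sampler for a hierarchical linear model of this type. The sketch below is not the authors' implementation: it assumes a common error variance with an inverse-gamma prior, an inverse-Wishart prior on Λ, a normal prior on vec(Θ), and hypothetical hyperparameter values. All function and variable names are illustrative.

```python
import numpy as np


def sym_inv(a):
    """Invert a symmetric positive-definite matrix and re-symmetrize the result."""
    inv = np.linalg.inv(a)
    return (inv + inv.T) / 2.0


def draw_inv_wishart(rng, df, scale):
    """Draw from IW(df, scale) by inverting a Wishart(df, scale^-1) draw (integer df)."""
    chol = np.linalg.cholesky(sym_inv(scale))
    x = rng.standard_normal((df, scale.shape[0])) @ chol.T  # rows ~ N(0, scale^-1)
    return sym_inv(x.T @ x)


def gibbs(y, X, Z, n_iter=2000, burn=1500, seed=0):
    """y[i]: (J_i,) log-odds; X[i]: (J_i, 3) rows (1, x1, x2); Z: (H, d) demographics."""
    rng = np.random.default_rng(seed)
    H, d = Z.shape
    p = 3
    # Hypothetical diffuse hyperparameters (assumptions, not taken from the paper).
    u0, V0_inv = np.zeros(p * d), np.eye(p * d) / 100.0
    nu0, S0 = p + 2, np.eye(p)
    a0, b0 = 2.0, 1.0
    B, Theta = np.zeros((H, p)), np.zeros((p, d))
    Lam, sig2 = np.eye(p), 1.0
    n_obs = sum(len(yi) for yi in y)
    keep_B, keep_Theta = [], []
    for it in range(n_iter):
        Lam_inv = sym_inv(Lam)
        # 1) B_i | rest: combine cardholder i's data with the population model.
        for i in range(H):
            C = sym_inv(X[i].T @ X[i] / sig2 + Lam_inv)
            m = C @ (X[i].T @ y[i] / sig2 + Lam_inv @ Theta @ Z[i])
            B[i] = rng.multivariate_normal(m, C)
        # 2) vec(Theta) | rest: multivariate regression of B_i on z_i.
        W = sym_inv(np.kron(Z.T @ Z, Lam_inv) + V0_inv)
        rhs = (Lam_inv @ B.T @ Z).flatten(order="F") + V0_inv @ u0
        Theta = rng.multivariate_normal(W @ rhs, W).reshape((p, d), order="F")
        # 3) Lambda | rest: inverse-Wishart update from the residuals B_i - Theta z_i.
        R = B - Z @ Theta.T
        Lam = draw_inv_wishart(rng, nu0 + H, S0 + R.T @ R)
        # 4) sigma^2 | rest: inverse-gamma update from the regression residuals.
        sse = sum(np.sum((y[i] - X[i] @ B[i]) ** 2) for i in range(H))
        sig2 = 1.0 / rng.gamma(a0 + n_obs / 2.0, 1.0 / (b0 + sse / 2.0))
        if it >= burn:  # keep only post-burn-in draws
            keep_B.append(B.copy())
            keep_Theta.append(Theta.copy())
    return np.mean(keep_B, axis=0), np.mean(keep_Theta, axis=0)
```

The function returns posterior means of the cardholder-level coefficients and of Θ computed from the retained draws. Under a different error structure or different priors, steps 3 and 4 would change accordingly.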

Empirical Study
A dataset provided by a leading bank in Taipei, Taiwan, is used to illustrate the proposed Bayesian behavior scoring model. The dataset contains information on credit cardholders whose applications were approved between 2008 and 2009 and whose transaction records span the complete year from January 1, 2010 to December 31, 2010. The transaction data contain account-level information, such as the transaction date, the transaction amount, the repayment date, and the repayment amount.
To ensure the consistency of the data fields and meet the requirements of the Bayesian behavior scoring model, data preprocessing was needed. We first used a customer identifier to match two raw datasets: one contains the active credit card account information of 25,328 cardholders, and the other stores over 5 million individual transaction records for these accounts. To satisfy the requirements of Bayesian model estimation, we kept only those cardholders who had at least 8 repayment records within the one-year period, which yielded 8,915 cardholders. We then randomly selected 2,948 cardholders, which represent 25% of the valid sample, to build the behavior scoring model. For each cardholder in this sample, the data include five demographic variables, annual income, and the credit limit and APR granted by the bank. The summary statistics for the variables are presented in Table 1.
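The filtering and sampling steps can be sketched as follows. The function name, its parameters, and the toy identifiers are hypothetical; the bank's actual record layout is not described in the text.

```python
import random
from collections import Counter


def select_cardholders(repayment_ids, min_records=8, n_sample=None, seed=0):
    """Keep cardholders with at least min_records repayments, then sample.

    repayment_ids: one customer identifier per repayment record.
    n_sample: number of eligible cardholders to draw at random (None keeps all).
    """
    counts = Counter(repayment_ids)
    eligible = sorted(cid for cid, n in counts.items() if n >= min_records)
    if n_sample is None or n_sample >= len(eligible):
        return eligible
    return sorted(random.Random(seed).sample(eligible, n_sample))


# Toy records: cardholder "b" has only 7 repayments and is dropped.
records = ["a"] * 8 + ["b"] * 7 + ["c"] * 10
print(select_cardholders(records))
```

Fixing the seed makes the random subsample reproducible, which is useful when the same sample must be reused across the three ρ scenarios.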
The model was evaluated under ρ = 0.9, ρ = 0.5, and ρ = 0.1 to represent different decision scenarios. When ρ = 0.5, equal weights are assigned to a customer's principal repayment ability and interest repayment ability. When ρ = 0.9, a customer's principal repayment ability is the major concern of the credit card issuer. When ρ = 0.1, on the other hand, a customer's interest repayment ability is much more important than principal repayment ability. For all scenarios, the MCMC sampler was run for 2,000 iterations and converged after 1,500 iterations. The draws from the last 500 iterations were used to compute posterior estimates.

The Posterior Estimates of $B_i$
To better present the results, we computed the average of the posterior estimates of $B_i$ across cardholders, $\bar{\beta} = \sum_{i=1}^{H} B_i / H$. The posterior estimates $\bar{\beta}$ and $\exp(\bar{\beta})$ are presented in Tables 2 and 3, respectively. The purpose of exponentiating $\bar{\beta}$ is to examine the impact of each product attribute on the odds of being a responsible rather than a defaulting customer.
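The summary step can be sketched in plain Python. The nested-list layout of the draws and the function names are assumptions made for illustration, and the toy draws are not estimates from the study.

```python
import math


def posterior_mean_B(draws, burn_in):
    """Average retained draws; draws[t][i] is cardholder i's (b0, b1, b2) at iteration t."""
    kept = draws[burn_in:]
    n_card, n_coef = len(kept[0]), len(kept[0][0])
    return [[sum(d[i][c] for d in kept) / len(kept) for c in range(n_coef)]
            for i in range(n_card)]


def average_over_cardholders(B_hat):
    """Compute beta-bar, the mean of the cardholder-level estimates."""
    n_card = len(B_hat)
    return [sum(row[c] for row in B_hat) / n_card for c in range(len(B_hat[0]))]


def exponentiate(beta_bar):
    """Turn log-odds coefficients into multiplicative effects on the odds."""
    return [math.exp(b) for b in beta_bar]


# Toy chain: 3 iterations, 2 cardholders, first iteration discarded as burn-in.
toy = [[[0, 0, 0], [0, 0, 0]],
       [[1, 0, -2], [3, 0, -4]],
       [[1, 2, -2], [3, 2, -4]]]
beta_bar = average_over_cardholders(posterior_mean_B(toy, burn_in=1))
print(beta_bar, exponentiate(beta_bar))
```

In the study itself the burn-in would be 1,500 of 2,000 iterations, so `burn_in=1500` would keep the last 500 draws.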
As shown in Table 2, for all scenarios, all estimated values of $\bar{\beta}$ are away from zero, indicating that both credit limit and APR have significant impacts on the odds of customer repayment. Table 3 shows that, when APR and credit limit are not considered, the average likelihood of being a responsible customer is 1.68, 10.09, and 19.74 times that of being a defaulting customer for ρ = 0.1, ρ = 0.5, and ρ = 0.9, respectively. When the credit limit is increased by one unit, with everything else held constant, the odds of being a responsible rather than a defaulting customer are multiplied by 1.39, 1.67, and 2.46 for ρ = 0.1, ρ = 0.5, and ρ = 0.9, respectively. Finally, when APR is increased by one unit, with everything else held constant, the odds are multiplied by 0.02, 0.007, and 0.003, respectively.
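To make the size of the APR effect concrete, the reported values can be combined arithmetically. The sketch below only multiplies numbers quoted in the text. The conversion from odds back to an implied repayment ability is an illustrative assumption that inverts the odds formula with k set to zero, and the baseline corresponds to zero values of both attributes, so the output is not an additional empirical result.

```python
# Values quoted from Table 3 in the text.
baseline_odds = {0.1: 1.68, 0.5: 10.09, 0.9: 19.74}
apr_multiplier = {0.1: 0.02, 0.5: 0.007, 0.9: 0.003}


def implied_ability(odds, k=0.0):
    """Invert odds = M / (1 - M + k) to recover M; k = 0 is an assumption here."""
    return odds * (1.0 + k) / (1.0 + odds)


def odds_after_apr_increase(rho, units=1):
    """Baseline odds after raising APR by `units`, everything else held fixed."""
    return baseline_odds[rho] * apr_multiplier[rho] ** units


for rho in (0.1, 0.5, 0.9):
    before = baseline_odds[rho]
    after = odds_after_apr_increase(rho)
    print(f"rho={rho}: odds {before:.2f} -> {after:.4f}, "
          f"implied M {implied_ability(before):.3f} -> {implied_ability(after):.3f}")
```

For ρ = 0.5, for instance, baseline odds of 10.09 fall to about 0.07 after a one-unit APR increase, which is why the APR effect is described as large.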
Regardless of the scenario, the following pattern can be observed in Table 3: as ρ increases, $\exp(\beta_0)$ and $\exp(\beta_1)$ increase, but $\exp(\beta_2)$ decreases. This result indicates that increasing APR greatly raises the default probability, because the high accrued interest makes it even more difficult for a cardholder to reduce his or her total debt. The positive association between the likelihood of being accountable and credit limit arises because a higher credit limit is usually assigned to an applicant with a better credit score and higher income. This result is thus consistent with the general expectation that an applicant with higher income can usually keep up regular payments. Finally, the change in $\exp(\beta_0)$ across values of ρ shows the critical impact of a credit card issuer's policy on the likelihood of being accountable. A more conservative policy is in effect when the issuer is more concerned with a cardholder's principal repayment ability (e.g., ρ = 0.9) and values those who pay principal regularly. If a cardholder can pay most of the principal, less interest accrues, and he or she is less likely to default.

Posterior Estimates of Θ
The influence of each demographic variable ($Z_d$) on the coefficients ($\beta_{i0}$, $\beta_{i1}$, $\beta_{i2}$) of credit card attributes for each scenario is summarized in Table 4. As shown in Table 4, as the weight assigned to principal repayment ability increases (e.g., ρ = 0.9), demographic variables have less explanatory power for the coefficients ($\beta_{i0}$, $\beta_{i1}$, $\beta_{i2}$). For example, for ρ = 0.9 in Table 4(c), only gender influences the coefficient of APR ($\beta_{i2}$). For ρ = 0.1 in Table 4(a), however, annual income, gender, industry sector, and education have a significant influence on $\beta_{i0}$, the coefficient of credit limit ($\beta_{i1}$) is influenced by the intercept, work class, and industry type, and the coefficient of APR ($\beta_{i2}$) is influenced by gender and education.
For ρ = 0.1, a male customer who has less education, has less income, and works in the service sector is more likely to keep up with repayment given fixed APR and credit limit. This result is reasonable for ρ = 0.1 because such customers may not be able to pay most of the principal but can pay interest regularly and on time. In addition, work class ($Z_4$) and industry sector ($Z_5$) have little influence on the coefficient of credit limit ($\beta_{i1}$) since, in Table 4(a), the posterior estimates of $\Theta_{2,5}$ and $\Theta_{2,6}$ are very close to zero.
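The role of Θ can be illustrated by computing the expected coefficients Θz for a given demographic profile. The sketch below uses a hypothetical 3 × 3 matrix with made-up entries and a made-up profile (a constant, an income value, and a gender indicator); it does not reproduce the estimates in Table 4.

```python
def expected_coefficients(theta, z):
    """Return Theta z: the expected (beta_0, beta_1, beta_2) for demographics z."""
    if any(len(row) != len(z) for row in theta):
        raise ValueError("each row of theta needs one entry per element of z")
    return [sum(t * v for t, v in zip(row, z)) for row in theta]


# Hypothetical Theta: rows are intercept, credit limit, and APR coefficients;
# columns are a constant, an income measure, and a gender indicator.
theta = [[1.0, 0.5, -0.2],
         [0.3, 0.1, 0.0],   # a zero entry means that variable does not shift beta_1
         [-3.0, 0.0, -0.5]]
print(expected_coefficients(theta, [1.0, 2.0, 1.0]))
```

An entry of Θ close to zero, as reported for $\Theta_{2,5}$ and $\Theta_{2,6}$, means the corresponding demographic variable barely shifts that coefficient.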
For ρ = 0.5, demographic variables other than annual income ($Z_2$) and marital status ($Z_6$) have no significant effect on the coefficient of credit limit ($\beta_{i1}$). This result is reasonable because credit limits are usually granted according to annual income. The result also suggests that high-income cardholders tend to have a greater coefficient of credit limit ($\beta_{i1}$), which implies that they are more likely to be accountable for both principal and interest repayment. Compared with married cardholders, unmarried cardholders are more sensitive to their credit limit and are less accountable for credit card repayment.

Posterior Estimates of Λ
Table 5 shows the posterior estimates of Λ for ρ = 0.1, ρ = 0.5, and ρ = 0.9, respectively. The covariance estimates are given in the upper triangle of each matrix, and the correlation estimates in the lower triangle. The standard deviations of the estimates are shown in parentheses. As shown in Table 5, the variance of the intercept increases as ρ increases. This suggests that, when the bank places more weight on principal repayment ability, the estimates of $\beta_{i0}$ become more heterogeneous and cardholders exhibit more varied levels of repayment ability. Placing more weight on principal repayment ability is therefore a better way to distinguish customer types. Although negative correlation estimates between intercept and APR can be observed at all values of ρ, the scenario of ρ = 0.9 has a stronger negative correlation than those of ρ = 0.1 and ρ = 0.5. The negative correlation between intercept and APR is also observed. The stronger negative correlation is observed when ρ = 0.1, and the estimated values for ρ = 0.5 and ρ = 0.9 are close to each other.
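Because Table 5 reports covariances and correlations side by side, it may help to recall how one is obtained from the other. The sketch below converts a covariance matrix into a correlation matrix. The example matrix is hypothetical and does not reproduce the estimates in Table 5.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical 2 x 2 block: variances 4 and 9, covariance -2.
print(cov_to_corr([[4.0, -2.0], [-2.0, 9.0]]))
```

A covariance can grow in magnitude simply because a variance grows, so the correlations in the lower triangle are the safer basis for comparing scenarios.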
Although no clear pattern can be observed in the variance of APR or in the covariance between APR and credit limit, we regard the variance of APR as essentially the same in all three scenarios because all estimated values are close to 10. For the covariance between APR and credit limit, the estimated values for ρ = 0.1 and ρ = 0.5 are close to −0.6, while the estimated value for ρ = 0.9 is −0.9612. This pattern suggests that greater consumer heterogeneity is observed when more weight is placed on principal repayment ability (ρ = 0.9). The result also suggests that principal repayment ability can be a better basis for discriminating among customers' repayment abilities.
A negative association is observed in all scenarios between the coefficient of APR ($\beta_2$) and the intercept ($\beta_0$), and between the coefficient of APR ($\beta_2$) and the coefficient of credit limit ($\beta_1$). The negative correlation between $\beta_2$ and $\beta_1$ is supported by Table 2, in which the posterior means of $\beta_1$ and $\beta_2$ have opposite signs. The same explanation applies to the negative association between the intercept ($\beta_0$) and the coefficient of APR ($\beta_2$). A negative association between the intercept ($\beta_0$) and the coefficient of credit limit ($\beta_1$) is observed even though both $\beta_0$ and $\beta_1$ have positive values, as indicated in Table 2. We believe that this occurs because of the greater variance of the intercept ($\beta_0$), which leads to greater variance in each individual's intercept estimates.

Summary
In this empirical study, we have demonstrated that the proposed Bayesian behavior scoring method can be employed to evaluate the odds of customer repayment given credit card attributes. Moreover, the proposed model can identify factors that lead to higher default probability. Financial institutions therefore gain an additional option for analyzing customer value and monitoring customers who may have higher default risk.

Concluding Remarks
Even though inappropriate credit-granting decisions can result in high credit risk, monitoring and controlling default risk after credit is granted is equally important. In this paper, a Bayesian scoring model is constructed to parameterize the relationship among the odds of being accountable, financial product attributes, and customer demographics. Unlike most scoring models in the literature, which focus either on improving discrimination ability and prediction accuracy or on deriving decision rules for credit granting, our methodology can help financial institutions identify factors that lead to higher default risk. As soon as a default signal appears, credit issuers can take action and prepare for the consequences.
A credit card dataset provided by a local bank in Taiwan is used to illustrate the proposed Bayesian behavior scoring model under three different scenarios. In general, the empirical results show that both credit limit and APR have significant impacts on the odds of customer repayment. Specifically, increasing APR greatly raises the default probability. This result is reasonable because the high accrued interest makes it even more difficult for a cardholder to reduce his or her total debt. Unmarried cardholders are more sensitive to their credit limit and are less accountable for credit card repayment. High-income customers are more likely to be accountable for credit card repayment. Female cardholders and cardholders with higher education are more likely to have good repayment ability. This research demonstrates a practical Bayesian application in monitoring customer default risk. The model itself is straightforward and can easily be extended to other financial products, such as mortgages or car loans. In future research, an objective function, such as profit maximization or risk minimization, can be incorporated into the proposed model to design credit-granting policies.

Appendix
Prior: $\mathrm{Vec}(\Theta) \sim N(u_0, V_0)$. The full conditionals for MCMC can be obtained as follows. Let $B$ and $\Delta$ be the $H \times M$ matrices whose $i$-th rows are $B_i'$ and $\Delta_i'$, and let $Z$ be the $H \times d$ matrix whose $i$-th row is $z_i'$, where $M = 3$ is the number of coefficients in $B_i$. Then

$$\mathrm{Vec}(B') = \mathrm{Vec}(\Theta Z') + \mathrm{Vec}(\Delta') = (Z \otimes I_M)\,\mathrm{Vec}(\Theta) + \mathrm{Vec}(\Delta'),$$

and

$$\mathrm{Vec}(B') \sim N\big((Z \otimes I_M)\,\mathrm{Vec}(\Theta),\; I_H \otimes \Lambda\big).$$

So the full conditional of Θ is normal, and the posterior draw of Θ can be generated from

$$\mathrm{Vec}(\Theta) \sim N\big(W\,[(Z' \otimes \Lambda^{-1})\,\mathrm{Vec}(B') + V_0^{-1} u_0],\; W\big),$$

where $W$ is the posterior covariance matrix of $\mathrm{Vec}(\Theta)$.
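The covariance $W$ in the full conditional of Θ follows from completing the square in $\mathrm{Vec}(\Theta)$. The steps below are a sketch rather than the authors' own derivation. They assume the normal prior $\mathrm{Vec}(\Theta) \sim N(u_0, V_0)$ suggested by the term $V_0^{-1} u_0$, and they read $Z$ as the $H \times d$ matrix with rows $z_i'$ and $B$ as the $H \times 3$ matrix with rows $B_i'$.

```latex
\begin{aligned}
p(\mathrm{Vec}(\Theta) \mid B, \Lambda)
  &\propto \exp\Big\{-\tfrac12
     \big[\mathrm{Vec}(B') - (Z \otimes I_M)\mathrm{Vec}(\Theta)\big]'
     (I_H \otimes \Lambda)^{-1}
     \big[\mathrm{Vec}(B') - (Z \otimes I_M)\mathrm{Vec}(\Theta)\big]\Big\} \\
  &\quad \times \exp\Big\{-\tfrac12
     \big[\mathrm{Vec}(\Theta) - u_0\big]' V_0^{-1}
     \big[\mathrm{Vec}(\Theta) - u_0\big]\Big\}, \\[4pt]
W^{-1}
  &= (Z \otimes I_M)'(I_H \otimes \Lambda^{-1})(Z \otimes I_M) + V_0^{-1}
   = (Z'Z \otimes \Lambda^{-1}) + V_0^{-1}, \\[4pt]
W^{-1}\,\mathrm{E}\big[\mathrm{Vec}(\Theta) \mid B, \Lambda\big]
  &= (Z \otimes I_M)'(I_H \otimes \Lambda^{-1})\,\mathrm{Vec}(B') + V_0^{-1} u_0
   = (Z' \otimes \Lambda^{-1})\,\mathrm{Vec}(B') + V_0^{-1} u_0 .
\end{aligned}
```

Both simplifications use the mixed-product property $(A \otimes B)(C \otimes D) = AC \otimes BD$. The resulting mean agrees with the expression $W[(Z' \otimes \Lambda^{-1})\mathrm{Vec}(B') + V_0^{-1}u_0]$ used for the posterior draw of Θ.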

Table 1: Summary statistics for the variables

Table 4: Posterior estimates of Θ

Table 5: Posterior estimates of Λ