PREDICTING MORTGAGE DEFAULT: A DISCRIMINANT ANALYSIS OF CAUSAL FACTORS.

By

Douglas Waldo, University of Sarasota

Robert Wharton, Palm Beach Atlantic College

Abstract

Confronted by an increasingly competitive environment and the need for expanded portfolios, lenders must accept greater risk. These competitive pressures call for more accurate risk assessment of mortgage portfolios. This study analyzed the predictive ability of variables commonly used in credit scoring. Accurate prediction of defaulted mortgages was significantly increased through an analysis of loan-to-value ratios, interest rates, and years at current address.

Introduction

Melchiorre (1995) associated the growth in mortgage default rates with the increasing popularity of Adjustable Rate Mortgages (ARMs). While this type of mortgage agreement contributes to the feasibility of many loan applications, the experience of what Shelton (1995) referred to as payment shock has contributed to the prevalence of delinquency and default over the past several decades. While many borrowers enjoy the benefit of lower payments in the initial years of the agreement, subsequent rate adjustments in excess of increases to borrower income or appraised home value often place borrowers in seemingly unrecoverable circumstances. Regardless of whether default can be avoided, adjustments to mortgage interest rates clearly influence borrowers’ ability to meet their obligations.

Antecedents for Default

Nadler (1993) referred to a study conducted by the debt-rating agency, Fitch Investors Service of New York, that analyzed regional default rates and their deterministic variables. Using regional economic data, a model was devised to study the influences of two types of variables on mortgage default rates for a given locale. The variables chosen for the model reflected home equity and borrower’s ability to pay. Specifically, regional unemployment rates were considered because of their explanatory power in determining the consistency of borrower income and therefore ability to make mortgage payments.

When income is lost, even temporarily, borrowers choose among competing debt payments in order to remain financially solvent. This decision is made more difficult when the borrower is confronted by numerous debt service obligations during an already stressful period. While many of these debts are not secured, the vigorous collection initiatives used by credit card companies often cause the borrower to dedicate dwindling funds to commitments other than the mortgage obligation. Equity considerations and years at current residence were determined to be quite significant in avoiding default (Nadler, 1993). Expectations regarding equity and future value have a tremendous influence on the effort borrowers make to avoid delinquency and, ultimately, default.

Nadler (1993) found that one traditional underwriting variable, borrower income, was not substantiated by empirical research. In fact, despite popular assumptions, a positive correlation appeared between income and default risk at higher income levels. Thus, it was confirmed that prevailing beliefs regarding ability-to-pay variables are fallible.

In analyzing the presence of racial discrimination in residential lending markets, Nesiba (1996) studied the home appraisal process as a contributory element to mortgage insolvency. While searching for discriminatory pressures in home valuation, his research found that estimates of default risk were significantly affected by previous real estate transactions within the immediate geographic vicinity. Loan applications originating from areas characterized by comparatively fewer transactions were likely to be denied with greater frequency. Those loans actually packaged for closing were likely to require a higher down payment and higher rate of interest than those originating from areas with greater real estate traffic, thus increasing the probability of default.

Nesiba (1996) utilized these findings to confirm that underlying assumptions regarding racial discrimination in residential lending were often the result of inadequate transaction history and not perceptible bias. However, the effects of more stringent loan packaging in affected neighborhoods yield a higher default rate that appears to be racially skewed. Future loan applications are then developed with measures of risk that are not indicative of the borrower’s ability to meet the mortgage obligation. The inadequacy of available information furthers the notion of race as a contributory element to the failures of credit scoring. In fact, loans associated with these assumptions are more adversely influenced by improper loan-to-value measures. As a result, a self-fulfilling prophecy is established. Lang and Nakamura (1991) found that the reinforcement of poor transaction history and information gathering has promoted the continuation of racially biased residential lending decisions and inaccurate loan-to-value assumptions.

Lending programs that allow higher loan-to-value ratios provide the opportunity for home ownership to many borrowers who would otherwise fail to qualify. Higher loan-to-value programs are therefore subject to greater risk, since they are typically composed of borrowers with lower savings from which to cover their down payments. The lack of savings also contributes to the likelihood of insolvency, as less protection is available in periods of job loss or other financial crisis. Borrowers in higher loan-to-value programs are more apt to deplete financial resources sooner during such crises and therefore are less likely to avoid default. Hence the loan-to-value ratio is a critical variable in assessing the default risk of individual loan applications.

Credit Scoring

Sullivan (1994) remarked that credit scoring gained popularity over the past few decades because of its emphasis on reducing risk in loan applications and managing existing loans more profitably. Further, Dennis (1995) noted that credit scoring has promoted a greater predictability of loan performance as well as greater marketability of loan portfolios. Finally, Kulkosky (1996) pointed to greater efficiencies in loan origination, servicing and secondary marketing as a result of credit scoring.

Shelton (1995) identified two characteristics of credit scoring that demonstrated more reliable decision outcomes than traditional managerial judgment. First, while traditional assessment of loan applications has been plagued by suggestions of discrimination and subjectivity, automated credit scoring systems benefit from objectivity (Sonntag, 1995). The objective nature of scoring systems lends greater predictability in that assumptions used in modeling are based on variables with statistical weighting (Taylor-Shoff, 1997). Second, given the historical analysis of the data, weighting applied to the data provides an optimization of credit scoring. That is, subjective and discriminatory measurements are compensated for in the design of the scoring model, based on the historically significant predictability of each variable.

Method

In determining the applicability of variables used in credit scoring, this study utilized multivariate discriminant analysis (MDA) to classify variables taken from mortgage applications. The data were selected through simple random sampling that resulted in a pool of applications comprising 200 non-default and 37 default mortgages originating between 1986 and 1993. The sample is both cross-sectional and time-series in nature.

The variables analyzed include: number of dependents (DEPEND), loan-to-value ratio (LTV), marital status (MARSTAT), payment-to-income ratio (PAYINC), interest rate (RATE), years at current address (YRSADD), and years at current job (YRSJOB). The variable observations for each mortgage were analyzed following the designation of testing and training samples utilizing the holdout method. A training sample of 100 non-default mortgages and 18 default mortgages was compared with a testing sample of 100 non-default mortgages and 19 default mortgages. Summary statistics were prepared for variables observed in both default and non-default observations (Table 1).
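The holdout designation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the split was made separately within each group (the paper does not say how the samples were drawn), and the function name `holdout_split`, the `default` key, and the seed are hypothetical.

```python
import random

def holdout_split(records, label_key="default", train_fraction=0.5, seed=0):
    """Split records into training and testing samples, group by group.

    Assumption: the split is stratified by default status; the paper
    does not state how its holdout samples were drawn.
    """
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(record[label_key], []).append(record)

    train, test = [], []
    for members in groups.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)  # rounds down
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Hypothetical records carrying only the group label
# (0 = non-default, 1 = default), sized like the sample pool.
loans = [{"default": 0} for _ in range(200)] + [{"default": 1} for _ in range(37)]
train, test = holdout_split(loans)
```

With 200 non-default and 37 default records and an even split that rounds down, this sketch reproduces the reported sample sizes: 100 and 18 for training, 100 and 19 for testing.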

Variables were selected for inclusion in the model using the stepwise inclusion method. The selection rule utilized in this process sought to maximize the minimum Mahalanobis distance (D²) between the default and non-default groups. The initial stepwise selection process yielded four remaining variables. This process was completed a second time with three resulting variables. While a third selection was attempted, no further elimination was needed.
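To make the selection rule concrete, the sketch below computes the Mahalanobis distance between two group means using the pooled within-group covariance matrix, then adds variables greedily by that distance. It is a simplified illustration under stated assumptions: the entry and removal thresholds of a full stepwise procedure are omitted, all function names are hypothetical, and the toy data are invented for the example rather than taken from the study.

```python
def column_means(rows, cols):
    return [sum(r[c] for r in rows) / len(rows) for c in cols]

def pooled_covariance(group_a, group_b, cols):
    """Pooled within-group covariance matrix for the chosen columns."""
    k = len(cols)
    pooled = [[0.0] * k for _ in range(k)]
    for rows in (group_a, group_b):
        means = column_means(rows, cols)
        for r in rows:
            dev = [r[c] - means[i] for i, c in enumerate(cols)]
            for i in range(k):
                for j in range(k):
                    pooled[i][j] += dev[i] * dev[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in pooled]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x

def mahalanobis_d2(group_a, group_b, cols):
    """Squared Mahalanobis distance between the two group mean vectors."""
    diff = [a - b for a, b in zip(column_means(group_a, cols),
                                  column_means(group_b, cols))]
    weights = solve(pooled_covariance(group_a, group_b, cols), diff)
    return sum(d * w for d, w in zip(diff, weights))

def forward_select(group_a, group_b, candidates, n_keep):
    """Greedy selection: at each step add the variable that maximizes D2."""
    chosen = []
    while len(chosen) < n_keep:
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining,
                   key=lambda c: mahalanobis_d2(group_a, group_b, chosen + [c]))
        chosen.append(best)
    return chosen

# Invented toy data: column 0 separates the groups, column 1 does not.
group_a = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0]]
group_b = [[11.0, 4.0], [12.0, 5.0], [13.0, 3.0]]
first_pick = forward_select(group_a, group_b, candidates=[0, 1], n_keep=1)
```

With only two groups there is a single pair of centroids, so maximizing the minimum distance reduces to maximizing the one distance between the default and non-default groups.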

Testing Assumptions

In utilizing discriminant analysis, two assumptions were tested during the process. First, the observations were tested for equal covariance matrices. The following hypothesis was tested at both the .01 and .05 significance levels while calculating Box’s M:

Ho: covariance matrices are equal.

The resulting approximate F was 1.67, compared with the critical F of 1.7 at the .01 significance level and 1.46 at the .05 significance level. Therefore the null hypothesis was rejected at the .05 level but not at the .01 level.

Second, the variables were tested for normal distribution using the Ryan-Joiner test (Table 2). The following hypothesis was tested at both the .01 and .05 significance levels for the default and non-default groups.

Ho: the variable is from a normal population.

The results of this test indicate that the null hypothesis was rejected at both the .05 and .01 significance levels for all but two of the variables in the default group and all but three in the non-default group.
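The counts in the preceding paragraph can be checked by applying the Table 2 decision rule, which rejects when the Ryan-Joiner statistic falls below the critical value. The sketch below uses only the figures printed in Table 2; the names `RJ_STATISTICS`, `CRITICAL`, and the helper functions are hypothetical.

```python
# Ryan-Joiner statistics from Table 2: (non-default group, default group).
RJ_STATISTICS = {
    "RATE": (0.9967, 0.9859),
    "MARSTAT": (0.9998, 1.000),
    "DEPEND": (0.9859, 0.9766),
    "YRSADD": (0.8740, 0.7931),
    "YRSJOB": (0.9398, 0.9134),
    "LTV": (0.9731, 0.9352),
    "PAYINC": (0.9795, 0.9822),
}

# Critical values from the Table 2 decision rule, keyed by significance level.
CRITICAL = {0.01: 0.9865, 0.05: 0.9835}

def rejected_at_both_levels(statistic):
    """True when the statistic falls below both critical values."""
    return all(statistic < critical for critical in CRITICAL.values())

def exceptions(group_index):
    """Variables for which the null was NOT rejected at both levels."""
    return [name for name, stats in RJ_STATISTICS.items()
            if not rejected_at_both_levels(stats[group_index])]

non_default_exceptions = exceptions(0)
default_exceptions = exceptions(1)
```

Applied to the table, the rule leaves three exceptions in the non-default group (RATE, MARSTAT, DEPEND) and two in the default group (RATE, MARSTAT), which matches the text.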

Standardized Coefficients

Table 3 lists the standardized coefficients and the group means. Standardized coefficients demonstrate the contribution of each variable to the power of the discriminating function relative to the contribution of the other variables in the equation. According to the group means, a negative sign tends to place higher values of the particular variable in the non-default group, while higher values of a variable with a positive sign tend to be associated with the default group. Both years at current address (YRSADD) and years at current job (YRSJOB) have negative signs. In other words, the greater the number of years for either variable, the more likely the individual will be categorized as non-default. The remaining variables, number of dependents (DEPEND), loan-to-value ratio (LTV), marital status (MARSTAT), payment-to-income ratio (PAYINC), and interest rate (RATE), have positive signs.

The Discriminant Function

After the second phase of the selection process, the discriminant function was derived:

default = -7.68 + 2.75 (LTV) + .78 (RATE) – .054 (YRSADD)

Cross validation provided accurate classification for 73% of the non-default group and 86% of the default group, with an overall accuracy of 74.7% of the 217 observations.
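As an illustration of how the function scores a loan, the sketch below evaluates it at the group means reported in Table 1. The function name `discriminant_score` is hypothetical, and no classification cutoff is applied because the paper does not state one.

```python
def discriminant_score(ltv, rate, yrsadd):
    """Phase-two discriminant function as reported in the text."""
    return -7.68 + 2.75 * ltv + 0.78 * rate - 0.054 * yrsadd

# Inputs are the Table 1 group means for LTV, RATE, and YRSADD.
non_default_score = discriminant_score(ltv=0.72, rate=7.56, yrsadd=6.59)
default_score = discriminant_score(ltv=0.82, rate=9.01, yrsadd=4.45)
```

At these inputs the non-default profile scores about -0.16 and the default profile about 1.36, so higher scores point toward default, in line with the positive signs on LTV and RATE and the negative sign on YRSADD.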

The Hold Out Method

The canonical discriminant function for the training set was derived:

default = -7.6 + 3.417 (LTV) + .698 (RATE) – .042 (YRSADD)

When applied to the holdout sample, this function yielded an accurate classification for 72.6% of the non-default group and 93.8% of the default group, with an overall accuracy of 75.7% for all observations.
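The classification rates reported for both procedures can be recomputed from the counts in Table 5. The sketch below is only an arithmetic check; the function name `classification_rates` and its argument names are hypothetical.

```python
def classification_rates(n00, n01, n10, n11):
    """Rates from classification counts.

    n00: non-default classified as non-default
    n01: non-default classified as default
    n10: default classified as non-default
    n11: default classified as default
    """
    total = n00 + n01 + n10 + n11
    return {
        "non_default": n00 / (n00 + n01),
        "default": n11 / (n10 + n11),
        "overall": (n00 + n11) / total,
    }

# Counts taken from Table 5.
cross_validation = classification_rates(n00=137, n01=51, n10=4, n11=25)
holdout = classification_rates(n00=69, n01=26, n10=1, n11=15)

for name, rates in (("cross-validation", cross_validation), ("holdout", holdout)):
    print(name, {key: round(100 * value, 1) for key, value in rates.items()})
```

Note that the overall rate is dominated by the much larger non-default group, which is why it sits close to the non-default rate in both procedures.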

Discussion

An important finding emerges out of a mere review of the descriptive statistics. The three variables with significant variation between the means of default and non-default groups were selected by the variable selection process. One implication is that by a cursory observation of the descriptive elements of this study, variables with greater discriminating power can be determined with considerable accuracy.

Recent literature holds that criteria for evaluating the performance of discriminant functions focus on classification results. According to the results of both the cross-validation procedure and the holdout method, approximately 75% of all observations were accurately classified. A comparison of these two methods indicates that the holdout method was slightly more accurate overall and performed notably better when predicting default observations. This finding is important because a majority of the recent literature has supported the notion that incorrect classification of a default observation is the costlier error.

The results of this study indicate that credit-scoring systems relying on variables such as loan-to-value ratio, interest rate, and years at current address can improve the predictability of mortgage solvency, thereby reducing the likelihood of default through the approval process. Finally, while many variables are available for assessing default risk, utilizing the variables mentioned here can contribute to an assessment that more accurately reflects the solvency of the obligation, ignoring those variables previously assumed to have a demographic or geographic bias.

Table 1

Descriptive Statistics

              Non-default group       Default group
Variable      Mean      St.Dev.       Mean      St.Dev.
AGE           47.03     14.99         44.83     12.76
DEPEND         2.12      1.06          2.44      1.54
LTV            0.72      0.16          0.82      0.18
MARSTAT        1.69      0.95          1.67      0.96
PAYINC         0.14      0.06          0.13      0.08
RATE           7.56      1.13          9.01      1.03
YRSADD         6.59      6.53          4.45      5.57
YRSJOB         8.31      6.94          7.07      6.64

Table 2

Ryan-Joiner Test for Normality

Variable      Non-default group     Default group
RATE          .9967                 .9859
MARSTAT       .9998                 1.000
DEPEND        .9859                 .9766
YRSADD        .8740                 .7931
YRSJOB        .9398                 .9134
LTV           .9731                 .9352
PAYINC        .9795                 .9822

Decision Rule: (.01 significance level) Reject if < .9865
               (.05 significance level) Reject if < .9835

Table 3

Standardized Canonical Discriminant Function Coefficients

and Group Means

Variable      Coefficient
DEPEND         .51335
LTV            .46197
MARSTAT        .13503
PAYINC         .04233
RATE           .73897
YRSADD        -.36413
YRSJOB        -.03109

Group Means
Default Group:         1.42667
Non-default Group:     -.19329

Table 4

Phase One: Variable Selection

Step     Variable     Minimum D²
1        RATE         1.24170
2        LTV          1.75156
3        DEPEND       2.20774
4        YRSADD       2.58312

Discriminant Function:

Default = -8.21 + .349 (DEPEND) + 3.29 (LTV) + .685 (RATE) – .062 (YRSADD)

Phase Two: Variable Selection

Step     Variable     Minimum D²
1        RATE         1.39640
2        LTV          1.81064
3        YRSADD       2.04769

Discriminant Function:

Default = -7.68 + 2.75 (LTV) + .78 (RATE) – .054 (YRSADD)

Table 5

Comparison of Cross-validation and Holdout Method

0 = Non-default

1 = Default

Classification            Cross-validation     Holdout
0 as 0                    137                  69
0 as 1                    51                   26
Subtotal N                188                  95
1 as 0                    4                    1
1 as 1                    25                   15
Subtotal N                29                   16
Total N Correct           162                  84
Total N                   217                  111
% Correct Default         86%                  94%
% Correct Non-default     73%                  73%
% Correct Overall         74.7%                75.7%

References

Dennis, Warren L., “Fair lending and credit scoring,” Mortgage Banking, Vol. 56, pp. 55 – 59, 1995.

Kulkosky, Edward, “Credit scoring could have a downside, experts say,” American Banker, Vol. 161, p. 8, 1996.

Lang, William W., and Nakamura, Leonard I., “Flight to quality in banking and economic activity,” Journal of Monetary Economics, Vol. 36, pp. 145 – 164, 1995.

Melchiorre, Camillo T., “A new weapon in default servicing,” Mortgage Banking, Vol. 55, pp. 26 – 33, 1995.

Nadler, James, “Mapping default zones,” Mortgage Banking, Vol. 54, pp. 127 – 134, 1993.

Nesiba, Reynold F., “Racial discrimination in residential lending markets: Why empirical researchers always see it,” Journal of Economic Issues, Vol. 30, pp. 51 – 78, 1996.

Shelton, Larry W., “A default prevention effort,” Mortgage Banking, Vol. 55, p. 89, 1995.

Sonntag, Janet, “The debate over credit scoring,” Mortgage Banking, Vol. 56, pp. 46 – 52, 1995.

Sullivan, Deidre, “Scoring borrower risk,” Mortgage Banking, Vol. 55, pp. 94 – 99, 1994.

Taylor-Shoff, Sally, “Shedding new light on credit scoring,” Mortgage Banking, Vol. 57, pp. 56 – 62, 1997.
