THE NON-LINEAR RELATIONSHIPS OF NUMERIC FACTORS ON HOUSING PRICES BY USING GAM

Most research on housing price modeling uses linear regression models. Such studies describe the contribution of each factor as linear in magnitude, either positive or negative. The goal of this paper is to identify the non-linear patterns for 3 major types of real estate by building models that include 49 housing factors. The dataset comprises 33,027 transactions in Taipei City from July 2013 to the end of 2016. The non-linear patterns, derived from Generalized Additive Models (GAM), appear as combinations of sequences of uptrends and downtrends.


Introduction and motivation
Similar types of homes command varying prices in different regions and neighborhoods (see Visser et al. [1]). Therefore, environmental factors must be included when investigating the factors behind housing prices. Hanink et al. [2] and Zoppi et al. [3] analyzed the relationship between housing values and a set of determinants related to both the urban environment and the housing market's structural factors. Building on these analyses, this paper also adopts both structural and environmental factors to analyze housing prices in Taipei City.
Following Chen and Wang [4], this paper describes environmental factors with two measures: proximity (see Sah et al. [5]) and the number of specific factors (see Wang et al. [6]). Proximity represents the degree of convenience in reaching a point of interest (POI), and the number of specific factors shows the maturity of those factors in the vicinity.
Many studies have examined environmental factors. Kim and Lahr [7] and Shyr et al. [8] studied Metropolitan Rapid Transit (MRT) systems. Wen et al. [9], Wang [10], and Owusu-Edusi et al. [11] found that the distance to different types of schools is significant. Emrath [12] and Pope and Pope [13] focused on shopping centers. Public parks received attention from Wu et al. [14] and Hammer et al. [15]. Chiang et al. [16] were interested in convenience stores.
The aforesaid studies each discuss specific environmental characteristics, and most of them use linear regression models. This paper encompasses more factors in order to identify interesting patterns in the factors of 3 major housing types in Taipei.

Data
This study's fundamental data were downloaded from the Taiwan Actual Price Registration (APR). The APR's factors are treated as structural (shown as factors 1~18 in the Appendix). Environmental factors in the vicinity of each home were retrieved from a circle of a designated distance (1,200 m) centered on the house.
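The two environmental measures used in this paper, proximity and the number of specific factors, can be illustrated with a minimal Python sketch. It is not the paper's implementation (the paper used R). The function names, the use of the haversine distance, and the reading of 1,200 m as a radius are assumptions made only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def environmental_factors(house, pois, radius_m=1200.0):
    """Return (proximity, count) for one POI type around one house.

    house    -- (lat, lon) of the house (hypothetical input format)
    pois     -- list of (lat, lon) for POIs of one type, e.g. parks
    radius_m -- circle size; 1,200 m is ASSUMED here to be a radius

    proximity is the distance in metres to the nearest POI inside the
    circle (None if the circle contains no POI); count is the number of
    POIs inside the circle.
    """
    distances = [haversine_m(house[0], house[1], lat, lon) for lat, lon in pois]
    inside = [d for d in distances if d <= radius_m]
    proximity = min(inside) if inside else None
    return proximity, len(inside)
```

Calling `environmental_factors(house, parks)` once per POI type would yield one proximity factor and one count factor per type, which is the shape of the environmental factors described above.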
Following Chen and Wang [4], the major housing types addressed in this paper are apartments (APT), buildings (BLD), and suites (SUT). In total, this paper applies 49 factors, which are listed in Appendix 1. The overall data include 33,027 observations from July 2013 to the end of 2016: 8,891 APT, 19,066 BLD, and 5,070 SUT.
Housing price was used as the dependent variable, and the other housing factors were used as the independent variables. A 5-fold cross-validation was conducted.
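As a rough illustration of 5-fold cross-validation, the Python sketch below splits observation indices into five folds and yields one train/test pair per fold. The paper does not state whether observations were shuffled or which seed was used, so the shuffling and the `seed` parameter are assumptions, and `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_obs, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once (ASSUMPTION: the paper does not say how
    folds were formed), then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each observation appears in exactly one test fold, so a model fitted on each training set is always evaluated on data it has not seen.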

Methodology
Trevor Hastie and Robert Tibshirani developed GAM in 1986 (see Hastie [17]). GAM is a generalized linear model whose linear predictor involves a sum of smooth functions of covariates (see Wood [18]). It is an additive modeling technique in which the effect of each predictor is captured by a smooth function. In general, the model has the following structure:
$$ g(E(Y_i)) = \alpha + X_i^{*}\theta + \sum_{j=1}^{p} s_j(x_{ji}) $$
where Y_i is the dependent variable and follows some exponential family distribution (Gaussian, Gamma, Poisson, or Binomial), E(Y_i) represents the expected value, g() denotes the link function that links E(Y_i) to the predictor variables, α denotes the intercept or mean, X_i^* is a row of the model matrix for the parametric (categorical) model components, θ is the corresponding parameter vector, s_j(x_ji) is the smooth, nonparametric function of the j-th numeric factor, β_j are the coefficients of the smooth s_j, p is the number of numeric factors, and ε_i = Y_i − E(Y_i) is the residual.
In this research, the adopted smooth functions are thin plate regression splines. The family argument in GAM is set to 'Gaussian', as the housing price is assumed to follow a normal distribution. The link function is set to 'identity', which amounts to not using a link function.
The research flow is shown in Figure 1. This paper uses R 3.3.3. GAM is fitted with the mgcv 1.8.17 package (see Wood [19]) to identify non-linear relationships.

Factor selection procedures
To identify the most applicable factors, this paper uses two factor selection procedures. First, variance inflation factors (VIF) are used to discover factors with high collinearity, which are then removed. Second, Akaike's information criterion (AIC) is adopted in forward, backward, and stepwise selection procedures to determine the modified model with the most suitable factors. The modified model is defined in this paper as the model having the highest adj_R2 with the fewest factors. The results of factor selection are shown in Table 1. The number in parentheses after adj_R2 is the number of factors chosen. The complete model includes all factors. The VIF model removes factors stepwise according to their VIF values. This selection is suggested by Zainodin et al. [20]; factors are excluded until all VIF values are below 5 (or an even lower threshold), which indicates that there is no serious collinearity problem.
Table 2 shows the means and standard deviations of the housing prices of the 3 housing types. On average, buildings are the most expensive housing type in Taipei (worth over NTD 27 million), followed by apartments (about NTD 14 million) and suites (about NTD 9 million). Apartment prices span the largest range.
Table 3 shows the adj_R2 values of the 3 housing types. In this paper, the factors used for the 3 housing types come from the modified models. The factors of each modified model are those shown in the Appendix without a '-' label.
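The VIF-based exclusion can be sketched in Python as follows. This is a minimal illustration, not the paper's procedure. It assumes that the factor with the largest VIF is dropped first and that VIF values are read from the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) from regressing each factor on the others). All function names are hypothetical, and the sketch does not handle constant or perfectly collinear columns.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))  # zero for a constant column
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_values(columns):
    """VIF of each column, taken from the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


def drop_by_vif(data, threshold=5.0):
    """Drop the factor with the largest VIF until every VIF is below threshold.

    data maps factor name -> list of values. Dropping the worst factor
    first is an ASSUMPTION; the threshold of 5 follows the text.
    """
    kept = dict(data)
    while len(kept) > 1:
        names = list(kept)
        vifs = vif_values([kept[name] for name in names])
        worst = max(range(len(names)), key=lambda i: vifs[i])
        if vifs[worst] < threshold:
            break
        del kept[names[worst]]
    return list(kept)
```

With two nearly proportional factors and one unrelated factor, the loop removes one of the proportional pair and keeps the rest, which mirrors the intent of excluding factors until no serious collinearity remains.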

The non-linear patterns
Chen and Wang [4] have shown some patterns, such as those of floor area and housing age. This paper presents other interesting patterns, especially non-linear ones with shapes such as V, Λ, and L.

Apartments
The pattern of floor number for apartments has an 'L' shape, as shown in Figure 2.

Buildings
The patterns of distance to shopping malls, hospitals, and parks all have 'V' shapes for buildings, as shown in Figure 3 (a), (b), and (c) respectively. In terms of proximity, housing prices for buildings are higher at places either near these factors or farther from them. The lowest housing prices appear at distances of 0.56 km, 0.8 km, and 0.3 km from shopping malls, hospitals, and parks respectively.
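One way to read off the bottom of a 'V' shape is to evaluate the fitted smooth on a grid of distances and look for the point where a downtrend turns into an uptrend. The Python sketch below does this for any sampled curve. The function name is hypothetical, and the procedure is an assumption about how such turning points could be located, not the method used in the paper.

```python
def turning_points(distances, effects):
    """Return (distance, kind) for each interior grid point where the trend reverses.

    distances -- grid of distances (e.g. km), in increasing order
    effects   -- value of the fitted smooth at each grid point
    kind is "trough" where a downtrend turns into an uptrend (the bottom
    of a 'V') and "peak" where an uptrend turns into a downtrend (the top
    of a 'Λ').
    """
    points = []
    for i in range(1, len(effects) - 1):
        before = effects[i] - effects[i - 1]
        after = effects[i + 1] - effects[i]
        if before < 0 and after > 0:
            points.append((distances[i], "trough"))
        elif before > 0 and after < 0:
            points.append((distances[i], "peak"))
    return points
```

A curve with a single trough corresponds to a 'V' pattern and a single peak to a 'Λ' pattern; a curve with no reversal at all, such as a steep drop that flattens out, would be consistent with an 'L' pattern.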

Suites
The patterns of distance to senior high schools, hospitals, and shopping malls also have 'V' shapes for suites, as shown in Figure 4.

Summary
Overall, this paper leverages GAM to detail the major relationships between housing prices and housing factors for apartments, buildings, and suites. Its main contribution is to show, through GAM, that the relationships for several interesting factors are non-linear.