Detecting Financial Fraud Using Data Mining Techniques: A Decade Review from 2004 to 2015

Objective: Financial fraud has been a major concern for many organizations across industries; billions of dollars are lost yearly because of it. Businesses therefore employ data mining techniques to address this continued and growing problem. This paper aims to review research studies conducted to detect financial fraud using data mining tools within one decade and to communicate the current trends to academic scholars and industry practitioners. Method: Various combinations of keywords were used to identify the pertinent articles. The majority of the articles were retrieved from Science Direct, but the search spanned other online databases (e.g., Emerald, Elsevier, World Scientific, IEEE, and Routledge - Taylor and Francis Group). Our search yielded a sample of 65 relevant articles (58 peer-reviewed journal articles and 7 conference papers). One-fifth of the articles were found in Expert Systems with Applications (ESA), while about one-tenth were found in Decision Support Systems (DSS). Results: 41 data mining techniques were used to detect fraud across different financial applications such as health insurance and credit card. The logistic regression model appeared to be the leading data mining tool in detecting financial fraud, with 13% of usage. In general, supervised learning tools have been used more frequently than unsupervised ones. Financial statement fraud and bank fraud are the two largest financial applications investigated in this area: about 63%, which corresponds to 41 of the 65 reviewed articles. Also, the two primary journal outlets for this topic are ESA and DSS. Conclusion: This review provides a fast and easy-to-use source for both researchers and professionals, classifies financial fraud applications into a high-level and a detailed-level framework, shows the most significant data mining techniques in this domain, and reveals the countries most exposed to financial fraud.


Introduction
Financial fraud has been a major concern for many organizations across industries and countries because it causes severe damage to business. Billions of dollars are lost yearly due to financial fraud; Bank of America, for example, agreed to pay $16.5 billion to resolve a financial fraud case [49]. Also, IRS (2014) indicates that Mr. Walker, the founder of Bixby Energy Systems, deceived more than 1,800 investors and committed a multi-million dollar fraud. His fraudulent actions involved providing false statements about a) his subordinates' salaries and commissions, b) the operational capacity of the firm's core products, and c) an initial public stock offering [30]. Hence, the numbers indicate that this is still a growing problem, which needs more attention from professionals and academics.
Financial fraud detection tools have been brought onto the scene to address this problem and to provide reliable solutions to business. Financial fraud is normally discovered through an outlier detection process [32] enabled by data mining techniques, which also identify valuable information by revealing hidden trends, relationships, and patterns in a large database [25]. Data mining, defined as "a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequently gain knowledge from a large database" [50], is a major contributor to detecting different types of financial fraud through its diverse methods, such as logistic regression, decision tree, support vector machine (SVM), neural network (NN) and naïve Bayes. Some of these techniques outperform the others in specific financial contexts. Glancy and Yadav (2011) divide those contexts into three main areas: internal, insurance and credit [22]. Jans et al. (2011) further classify internal fraud into two categories: financial statement fraud and transaction fraud [31]. They define financial statement fraud as "the intentional misstatement of certain financial values to enhance the appearance of profitability and deceive shareholders or creditors", while transaction fraud captures the process of stealing organizational assets.
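To make the outlier detection idea concrete, the following is a minimal sketch and not a method taken from the reviewed studies. It flags unusual transaction amounts with a modified z-score based on the median absolute deviation (MAD). The function name `flag_outliers`, the 3.5 cutoff, and the sample amounts are illustrative assumptions.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    Assumption: a single numeric feature (the transaction amount) is enough
    for illustration; real fraud detection would use many features.
    """
    med = median(amounts)
    abs_dev = [abs(a - med) for a in amounts]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median, so nothing can be ranked as an outlier.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical transaction amounts; the value at index 6 is far from the rest.
transactions = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 4980.0, 44.1]
suspicious = flag_outliers(transactions)
```

The median-based score is used here only because it is less distorted by the extreme value than a mean-based score would be; any other outlier criterion could be substituted.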
Although detecting financial fraud is considered a high priority for many organizations, the current literature lacks an up-to-date, comprehensive and in-depth review that can help firms select the appropriate data mining technique. Ngai et al. (2011) provide a well-organized and detailed literature review on detecting financial fraud via data mining methods based on 49 articles ranging from 1997 to 2008 [50]. However, that time period does not capture the increasing trend of research in this area, specifically in 2011, which is considered a record year for financial fraud [11]. This has motivated us to extend Ngai et al.'s review and contribute by 1) revealing which context should implement which data mining technique, 2) identifying which technique can yield higher classification accuracy in detecting financial fraud, 3) providing a new classification framework for financial fraud, and 4) expanding the sample of reviewed articles to make it one of the most comprehensive reviews on this topic. Overall, this paper is an attempt to leverage our knowledge and to increase our understanding of data mining applications in financial fraud.

Literature Review
Due to its high importance, financial fraud has received considerable attention in prior research. The literature has addressed different types of financial fraud using different data mining methods.

Method
A number of keywords were used to identify the pertinent articles, for instance, "detecting financial fraud", "financial fraud and data mining", "financial fraud detection", and "detecting financial fraud via data mining". Most of the relevant articles were found in MIS-related journals, e.g., Expert Systems with Applications and Decision Support Systems, but some were found in finance- and economics-related journals, e.g., Journal of Risk and Insurance and Applied Economics. Table 2 lists the thirty-nine journal and conference titles included in our analysis.
Although the majority of the articles were retrieved from Science Direct, the search spanned other online databases (e.g., Emerald, Elsevier, World Scientific, IEEE, and Routledge - Taylor and Francis Group). Our search yielded a sample of 65 relevant articles (58 peer-reviewed journal articles and 7 conference papers). One-fifth of the articles were found in Expert Systems with Applications, while about one-tenth were found in Decision Support Systems (Table 2). Hence, these two journals have been the primary outlets for this topic. Most of the studies were conducted in the United States, followed by Taiwan, China and Spain (Table 3).

Results
This section highlights the data mining techniques most frequently used in financial fraud detection, along with their usage frequency, description and business application. Also, based on the different applications of financial fraud reviewed, this section provides a new classification scheme at two levels: high and detailed.

Usage Frequency of Data Mining Techniques
Out of the 41 data mining techniques used in the reviewed articles, Table 4 shows the most applied ones in the period from 2004 to 2015. The logistic regression model appears to be the leading data mining technique in detecting financial fraud at 13%, followed by neural network and decision tree, each at 11%. Support vector machine accounts for 9% and naïve Bayes for 6%. Besides fraud detection, data mining techniques can address a wide array of business applications, for example, bankruptcy prediction, sales forecasting and scheduling optimization, as shown in Table 4.
This tool uses "if" and "then" to unfold related items [62].

General business application: market basket analysis.
15. Process mining (usage frequency: 2). This algorithm gives access to knowledge by mining event logs to analyze system processes [31]. General business application: fraud detection.
16. Fuzzy logic (usage frequency: 2). This algorithm can deal with human reasoning and decision-making processes. General business application: models for project risk assessment.
This table demonstrates that supervised learning techniques (e.g., neural network, decision tree, support vector machine, and naïve Bayes) have been used more frequently than unsupervised ones (e.g., clustering, association rules, and fuzzy logic). Thus, it could be stated that supervised learning techniques are better-performing tools than unsupervised ones in detecting financial fraud.
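As an illustration of what a supervised technique does in this setting, the sketch below fits a one-feature logistic regression by batch gradient descent on labeled records. It is a minimal example under stated assumptions, not the procedure of any reviewed study: the feature (a standardized risk indicator), the labels, the learning rate, and the names `fit_logistic` and `predict` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(fraud | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each labeled record under the current model.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean log-loss with respect to w and b.
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(w, b, x, cutoff=0.5):
    """Label a record as fraud (1) when its estimated probability reaches cutoff."""
    return 1 if sigmoid(w * x + b) >= cutoff else 0


# Hypothetical training data: one standardized indicator per record, 1 = fraud.
xs = [-1.2, -0.9, -0.7, -0.4, 0.5, 0.8, 1.1, 1.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

The point of the sketch is the dependence on labeled examples (`ys`), which is what distinguishes supervised tools from unsupervised ones such as clustering.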

Classification Framework Based on Fraud Type
Based on the analysis of the reviewed articles, financial fraud can be classified at a high level into four major categories: financial statement fraud, bank fraud, insurance fraud, and other related financial fraud (Table 5). The table shows the number of articles found for each type of financial fraud, while the accompanying pie chart represents those numbers as percentages. It is evident that financial statement fraud and bank fraud constitute the largest portion (63%); this percentage corresponds to 41 of the 65 reviewed articles. Table 6 further classifies the fraud types and provides in-depth analysis by indicating the frequency of their sub-categories. Bank fraud is subcategorized into credit card fraud, money laundering, and fraudulent bank account, while insurance fraud is subcategorized into healthcare fraud, auto fraud, and corp fraud.
The proposed classification framework can serve as a reference for financial fraud detection research by helping scholars identify the areas that need more attention. It can also provide industry professionals with an index for selecting the appropriate data mining technique for a specific context of financial fraud. For example, firms that suffer from credit card fraud have the option of employing any of the supervised learning tools (i.e., naïve Bayes, decision tree, neural network, and SVM), and it is recommended to go with the most frequently used technique, decision tree. As noted, this selection is based on the fraud context and the frequency of the data mining technique, but it can also be based on performance (Table 2).
The years 2008, 2009, 2010 and 2011 (highlighted in gray in Table 7) account for more than half of the publications in financial fraud detection. This high rate of publication reflects a serious growth in financial fraud across industries during these years. In particular, there was a dramatic increase in published papers during 2011. This increase seems to be a natural response to the surge of fraud activities in that year: financial fraud increased by 13% in 2011 compared to the previous year [60]. Also, ABC News (2012) indicated that 2011 is considered the worst year for financial fraud on record [11].
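The two-level classification framework described in this section can be sketched as a small lookup structure. This is only an illustrative encoding: the names `FRAMEWORK`, `high_level_category`, and `share_percent` are hypothetical, and sending unknown fraud types to "other related financial fraud" is an assumption of the sketch rather than a rule stated in the review. The last helper reproduces the 63% figure from the 41 of 65 articles reported above.

```python
# High-level categories mapped to their detailed sub-categories, as named in the text.
FRAMEWORK = {
    "financial statement fraud": [],
    "bank fraud": ["credit card fraud", "money laundering", "fraudulent bank account"],
    "insurance fraud": ["healthcare fraud", "auto fraud", "corp fraud"],
    "other related financial fraud": [],
}


def high_level_category(fraud_type):
    """Map a high-level or detailed fraud type to its high-level category."""
    name = fraud_type.strip().lower()
    for category, subtypes in FRAMEWORK.items():
        if name == category or name in subtypes:
            return category
    # Assumption: anything not listed falls under the residual category.
    return "other related financial fraud"


def share_percent(part, total):
    """Share of reviewed articles, rounded to a whole percentage."""
    return round(100 * part / total)


# Financial statement fraud and bank fraud together: 41 of the 65 reviewed articles.
largest_share = share_percent(41, 65)
```

A lookup of this kind is one way the framework could serve as an index: a practitioner starts from the detailed fraud type and is led to the high-level category under which the relevant techniques are reported.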

Limitations and Conclusion
This review has some limitations. First, it does not consider all sub-categories of financial fraud, e.g., advance-fee fraud, which targets a very large number of people who look for "work-from-home" opportunities. This fraud deceives people into paying a fee in advance in order to receive an offer, but once the fee is collected, they do not realize the expected benefits. Second, a decade review may not be sufficient to address this growing problem, which has existed for as long as business itself. Third, the 65 articles explored may not reveal the entire story of data mining usage in the domain of financial fraud; additional online databases need to be included in the sample for a more powerful presentation and analysis.
However, it is crucial to have a wide-ranging review on detecting financial fraud in order to increase the understanding and expand the knowledge of this area among researchers and professionals. This review sheds light on several valuable aspects of financial fraud detection:
• It provides a fast and easy-to-use source for scholars and practitioners who are interested in the topic.
• It shows the importance of the investigated data mining techniques in the domain of financial fraud by presenting their frequency, usage percentage, and other general business applications. Although it is notable that logistic regression, decision tree, SVM, NN and Bayesian networks have been widely used (> 50%) to detect financial fraud, they are not always associated with the best classification results.
• This review provides high-level and detailed classification frameworks of financial fraud. The high-level framework includes four major types: financial statement fraud, bank fraud, insurance fraud, and other related financial fraud. The detailed framework sub-classifies bank fraud into credit card fraud, money laundering, and fraudulent bank account, and sub-classifies insurance fraud into healthcare fraud, auto fraud, and corp fraud. Combining the two frameworks into a single integrated catalog scheme can help to classify any new type of financial fraud. However, it is apparent that financial statement fraud has been the most examined type in this area. Thus, it is necessary for business firms to be more cautious when they audit or process their financial statements.
• This paper emphasizes the huge increase in research conducted to address financial fraud in the years 2008, 2009, 2011 and 2012. These four years account for more than 50% of the publications in the 10-year period. More notably, the amount of research increased by 42% in 2011 compared to the previous year.
• Considering the country distribution table, it is possible to conclude that the countries that collectively published 65% of the total articles on this topic (United States, Taiwan, China and Spain) are more exposed to financial fraud. In particular, the United States accounts for more than one-third (35%) of the papers published in this area.
In sum, the aspects highlighted in this review can provide organizations with useful information regarding the various types of financial fraud and the data mining techniques available to them. Organizations may be able to select the most suitable technique after considering its particular usage context, frequency, and performance. This could lead to a higher level of accuracy in detecting financial fraud. Besides this benefit, researchers can take advantage of knowing the most frequently used methods and their contexts, so that they can develop a research project either to investigate such a method in a different context or to suggest a new method in a similar context. However, the primary contribution of this paper is twofold: the first is to provide an up-to-date and comprehensive analysis of this crucial topic as an extension to Ngai et al.'s review. The second is to provide scholars and practitioners with an excellent source of data mining applications used in financial fraud for their fast access and use.
Table 1 presents the 65 examined articles in chronological order. From the table, we can determine which methods are frequently implemented for which case of financial fraud and which method can work best across fraud types. For example, the logistic model can help in detecting financial fraud in automobile insurance, corporate insurance, financial statements, and credit cards, but it can be considered the best-performing method in the context of corporate insurance fraud.

Table 3: The number of articles for detecting financial fraud by countries

Table 4: Most used data mining methods, their usage frequency, description and general business application

Table 5: Classification of fraud types examined by data mining methods in one decade
21 This type of fraud is prevalent in today's business world and is one of the biggest challenges faced by managers and investors. It is basically an act of intentional or irresponsible conduct that conveys deception or misrepresentation; this produces materially misleading

Table 6: Further break-down for fraud types with corresponding data mining techniques
"Whoever knowingly executes, or attempts to execute, a scheme or artifice - (1) to defraud a financial institution; or (2) to obtain any of the moneys, funds, credits, assets, securities, or other property owned by, or under the custody or control of, a financial institution, by means of false or fraudulent pretenses, representations, or promises" [10]. Bank fraud is sub-categorized here into credit card fraud, money laundering, and fraudulent bank account.
Insurance fraud 14 This term is broadly labeled as insurance abuse, especially in practice [69]. Insurance fraud includes here auto insurance fraud, healthcare insurance fraud, and corp insurance fraud.

Table 7 and Chart 1: Yearly distribution of the articles on detecting financial fraud

Table 7 and Chart 1 above highlight the yearly distribution of the 65 articles across the 10-year period, with selected years highlighted in gray.