THOMPSON RIVERS UNIVERSITY

Credit Risk Modeling
A Comparative Analysis of Artificial and Deep Neural Networks

by
Marriappan Vasudevan

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF MASTER OF BUSINESS ADMINISTRATION

KAMLOOPS, BRITISH COLUMBIA
APRIL, 2020
Supervisor: Dr. Mohammad Mahbobi

Committee Members
Dr. Salman Kimiagiri
Dr. Li Zhang
Dr. Jabed Tomal

© Marriappan Vasudevan, 2020

Abstract
Credit risk assessment plays a major role in the banks and financial institutions to prevent
counterparty risk failure. One of the primary capabilities of a robust risk management system
must be detecting the risks earlier, though many of the bank systems today lack this key
capability which leads to further losses (MGI, 2017). In searching for an improved
methodology to detect such credit risk and increasing the lacking capabilities earlier, a
comparative analysis between Deep Neural Network (DNN) and machine learning techniques
such as Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Artificial Neural
Network (ANN) were conducted. The Deep Neural Network used in this study consists of six
layers of neurons. Further, sampling techniques such as SMOTE, SVM-SMOTE, RUS, and
All-KNN to make the imbalanced dataset a balanced one were also applied. Using supervised
learning techniques, the proposed DNN model was able to achieve an accuracy of 82.18% with
a ROC score of 0.706 using the RUS sampling technique. The All-KNN sampling technique
was capable of achieving the maximum true positives in two different models. Using the
proposed approach, banks and credit check institutions can help prevent major losses occurring
due to counterparty risk failure.
Keywords: Credit Risk, Deep Neural Network, Artificial Neural Network, Support Vector
Machines, Sampling techniques

ii

Acknowledgement

I would like to thank Dr. Mohammad Mahbobi for providing me guidance during my research
and supervising my thesis. I would also like to thank my thesis committee members Dr. Salman
Kimiagari, Dr. Li Zhang from School of Business and Economics, and Dr. Jabed Tomal from
Faculty of Science, and Dr. Maryam Darvish from Department of Operations and Decision
Systems, University of Laval for providing their thoughtful insights, reviewing my work and
providing thoughtful comments on my thesis.
I would like to take this opportunity to thank Thompson Rivers University and School of
Business and Economics for providing me with this opportunity to pursue my Master of
Business Administration with the thesis. I would like to thank Heidi Milovick, Catherine
Dallaire, Monica Macaulay, and Shelley Lee for their continued support and guidance
throughout my MBA journey at TRU.
I would also like to thank all the professors at TRU with whom I have taken classes and who
have been instrumental in helping me complete this program.
I’m humbled and blessed with so many friends that I have made during the time I have spent
in TRU and Kamloops who have always been on cheer and supported for me at every stage of
my career and this journey
Lastly, I would like to thank my parents for their love and support throughout my program
journey.

iii

Table of contents

Abstract .................................................................................................................................... ii
Acknowledgement .................................................................................................................. iii
Table of contents .................................................................................................................... iv
List of tables............................................................................................................................ vi
List of figures ........................................................................................................................ viii
List of Abbreviations ...............................................................................................................x
1.0 Introduction ......................................................................................................................1
1.1 Background .....................................................................................................................1
1.2 Rationale for the Study ..................................................................................................2
1.3 Organization of Thesis ...................................................................................................4
2.0 Classification Techniques and Approaches .....................................................................5
2.1 Supervised and Unsupervised Learning ......................................................................5
2.2 Support Vector Machines with Sigmoid and RBF Kernel .........................................6
2.3 K- Nearest Neighbours...................................................................................................8
2.4 Artificial Neural Network (ANN) .................................................................................9
2.5 Deep Learning Architectures ......................................................................................11
3.0 Literature Review ...........................................................................................................14
3.1 Credit Risk Assessment with SVM .............................................................................14
3.2 KNN in Credit Risk Assessment .................................................................................17
3.3 Artificial Neural Networks in Credit Risk Assessment ............................................18
3.4 Deep Learning Models in Credit Risk Assessment ...................................................20
iv

4.0 Methodology .....................................................................................................................26
4.1 Software Used ...............................................................................................................26
4.2 Dataset ...........................................................................................................................27
4.3 Sampling techniques ....................................................................................................31
4.4 Performance Evaluation ..............................................................................................32
4.5 Overall Framework ......................................................................................................35
5.0 Results and Analysis ........................................................................................................38
5.1 Preliminary Analysis ....................................................................................................38
5.2 Feature Selections .........................................................................................................42
5.3 Model Analysis – Confusion Matrix with 10 features...............................................45
5.4 Performance Metrics Analysis ....................................................................................49
5.5 Confusion Matrices with 23 features ..........................................................................62
5.6 Performance Metrics with 23 features .......................................................................66
6.0 Implications and Conclusion...........................................................................................78
6.1 Policy Implications regarding the use of machine learning in Canada...................78
6.2 Future Work .................................................................................................................79
6.3 Key Contributions ........................................................................................................80
6.4 Practical Insights ..........................................................................................................81
6.5 Conclusion .....................................................................................................................82
References ...............................................................................................................................83

v

List of tables
Table 1.0 Functions and Parameters of SVM used in this study .............................................. 7
Table 2.0 Literature Review Gap - SVM and KNN ............................................................... 23
Table 3.0 Literature Review Gap - ANN and DNN ............................................................... 24
Table 4.0 Literature Review Gap - Deep Neural Network ..................................................... 25
Table 5.0 Software used for models in this study ................................................................... 27
Table 6.0 Features of the dataset used in this study................................................................ 30
Table 7.0 Confusion Matrix as used in this study................................................................... 33
Table 8.0 Imbalanced Dataset ................................................................................................. 38
Table 9.0 Descriptive Statistics - Age, Sex, Education and Marriage .................................... 38
Table 10.0 Descriptive Statistics - Payment status of six months .......................................... 41
Table 11.0 Descriptive Statistics - Amount of Bill Statements over 6 months ...................... 41
Table 12.0 Descriptive Statistics - Payment Amounts over 6 months .................................... 42
Table 13.0 Logistic Regression Results .................................................................................. 44
Table 14.0 Confusion Matrix - DNN ...................................................................................... 45
Table 15.0 Confusion Matrix - ANN ...................................................................................... 46
Table 16.0 Confusion Matrix - SVM with RBF Kernel ......................................................... 46
Table 17.0 Confusion Matrix - KNN ...................................................................................... 47
Table 18.0 Consolidated Confusion Matrix ............................................................................ 48
Table 19.0 Performance Metrics - DNN ................................................................................. 49
Table 20.0 Performance Metrics - ANN ................................................................................. 52
Table 21.0 Performance Metrics - SVM- RBF Kernel ........................................................... 55
Table 22.0 Performance Metrics - KNN ................................................................................. 58
Table 23.0 Consolidated Accuracies of the Models ............................................................... 61
Table 24.0 Consolidated Balanced Accuracies of the Models ............................................... 61
Table 25.0 Consolidated ROC Scores of the Models ............................................................. 61
Table 26.0 Confusion Matrix with 23 features - DNN ........................................................... 62
Table 27.0 Confusion Matrix with 23 features - ANN ........................................................... 63
Table 28.0 Confusion Matrix with 23 features - SVM with RBF Kernel............................... 64
Table 29.0 Confusion Matrix with 23 features -KNN ............................................................ 64
vi

Table 30.0 Consolidated Confusion Matrix - 23 features ....................................................... 65
Table 31.0 Performance Metrics with 23 features - DNN ...................................................... 66
Table 32.0 Performance Metrics with 23 features - ANN ...................................................... 69
Table 33.0 Performance Metrics with 23 features - SVM - RBF Kernel ............................... 72
Table 34.0 Performance Metrics with 23 features - KNN ...................................................... 75
Table 35.0 Consolidated Accuracies of the Models - 23 features .......................................... 77
Table 36.0 Consolidated Balanced Accuracies of the Models – 23 features .......................... 77

vii

List of figures
Figure 1.0 Illustration of SVM.................................................................................................. 7
Figure 2.0 Illustration of feed-forward neural network .......................................................... 10
Figure 3.0 Architecture of the feed-forward network used in the study ................................. 11
Figure 4.0 Illustration of DNN with Feed-Forward propagation ............................................ 12
Figure 5.0 Architecture of DNN used in this study ................................................................ 12
Figure 6.0 Illustration of Receiver Operating Characteristics ................................................ 35
Figure 7.0 Overall Framework used in this study ................................................................... 37
Figure 8.0 Age versus default Payment .................................................................................. 39
Figure 9.0 Sex versus default payment ................................................................................... 39
Figure 10.0 Marriage versus default Payment ........................................................................ 40
Figure 11.0 Education versus default payments ..................................................................... 40
Figure 12.0 Plot of features and their relative importance using logistic regression .............. 44
Figure 13.0 Receiver Operating Characteristics - DNN with SMOTE .................................. 50
Figure 14.0 Receiver Operating Characteristics - DNN with SVM SMOTE ......................... 50
Figure 15.0 Receiver Operating Characteristics - DNN with RUS ........................................ 51
Figure 16.0 Receiver Operating Characteristics - DNN with All-KNN ................................. 51
Figure 17.0 Receiver Operating Characteristics - ANN with SMOTE .................................. 53
Figure 18.0 Receiver Operating Characteristics - ANN with SVM SMOTE ......................... 53
Figure 19.0 Receiver Operating Characteristics - ANN with RUS ........................................ 54
Figure 20.0 Receiver Operating Characteristics - ANN with All-KNN ................................. 54
Figure 21.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
SMOTE ................................................................................................................................... 56
Figure 22.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
SVM SMOTE ......................................................................................................................... 56
Figure 23.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
RUS ......................................................................................................................................... 57
Figure 24.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
All-KNN ................................................................................................................................. 57
Figure 25.0 Receiver Operating Characteristics - KNN with SMOTE .................................. 59
Figure 26.0 Receiver Operating Characteristics - KNN with SVM SMOTE ......................... 59
viii

Figure 27.0 Receiver Operating Characteristics - KNN with RUS ........................................ 60
Figure 28.0 Receiver Operating Characteristics - KNN with All-KNN ................................. 60
Figure 29.0 Receiver Operating Characteristics with 23 features - DNN with SMOTE........ 67
Figure 30.0 Receiver Operating Characteristics with 23 features - DNN with SVM SMOTE
................................................................................................................................................. 67
Figure 31.0 Receiver Operating Characteristics with 23 features - DNN with RUS ............. 68
Figure 32.0 Receiver Operating Characteristics with 23 features - DNN with All-KNN ...... 68
Figure 33.0 Receiver Operating Characteristics with 23 features - ANN with SMOTE........ 69
Figure 34.0 Receiver Operating Characteristics with 23 features - ANN with SVM SMOTE
................................................................................................................................................. 70
Figure 35.0 Receiver Operating Characteristics with 23 features - ANN with RUS ............. 70
Figure 36.0 Receiver Operating Characteristics with 23 features - ANN with All-KNN ...... 71
Figure 37.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with SMOTE ...................................................................................................... 72
Figure 38.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with SVM SMOTE ............................................................................................. 73
Figure 39.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with RUS ............................................................................................................ 73
Figure 40.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with All-KNN ..................................................................................................... 74
Figure 41.0 Receiver Operating Characteristics with 23 features - KNN with SMOTE........ 75
Figure 42.0 Receiver Operating Characteristics with 23 features - KNN with SVM SMOTE
................................................................................................................................................. 76
Figure 43.0 Receiver Operating Characteristics with 23 features - KNN with RUS ............. 76
Figure 44.0 Receiver Operating Characteristics with 23 features - KNN with All-KNN ...... 77

ix

List of Abbreviations
ANN...................................................................................................Artificial Neural Network
AI...............................................................................................................Artificial Intelligence
DL........................................................................................................................Deep Learning
DNN..........................................................................................................Deep Neural Network
G-Mean.............................................................................................................Geometric Mean
KNN..........................................................................................................K-Nearest Neighbour
ML..................................................................................................................Machine Learning
RUS.....................................................................................................Random Under Sampling
ROC.....................................................................................Receiver Operating Characteristics
ReLU..........................................................................................................Rectified Linear Unit
SVM....................................................................................................Support Vector Machines
SMOTE................................................................Synthetic Minority Oversampling Technique

x

1.0 Introduction
1.1 Background
Credit risk is known as the probability of an organization or a consumer of financial credit
instruments defaulting on the debt payment obligation, i.e. counterparty failure risk (Basel I,
p.8). There are numerous standardized ways identified by the Basel Committee and Bank of
International Settlements through which the member central banks and regional banks across
the world can mitigate this risk. These techniques include collateralized transactions (Basel II,
p.40), On Balance Sheet Netting (Basel II, p.42), Guarantees and Credit Derivatives (Basel II,
p.42), Maturity Mismatch (Basel II, p.42) and other approaches like collateral against the debt
obligations. Basel Accord II recommends forming credit risk control units (Basel II p.102), a
team internal to the banking operations which can help in maintaining the ratings of the
consumer and thereby maintaining oversight on the overall exposure of the bank to credit risk.
These teams are likely to produce the internal ratings for a given credit approval request
thereby which the banking officials can decisively take actions for the approval of debt or any
kind of financial credit instruments. Although banks do implement these techniques in their
credit risk management procedures, but by predicting these risks during the application process
or prior to the customer request, banks can avert any sort of counterparty failure.
The financial credit instrument that we have used in this study are credit cards which have
become a common form of payment in the last decade for a range of financial transactions. As
per the report published by the Payments Canada (2019) on Canadian Payment Methods and
Trends, of the total payment transactions that took place in 2018, 28% of the transactions were
carried out by credit cards, an increase of 52% from 2017. Data released by the Canadian
Bankers Association on credit card statistics (2018) indicated that the total net dollar value of
transactions carried out by VISA and MasterCard holders exceeded CAD $547 billion in 2018.
There were 75.8 Million cards in circulation for the year of which 0.8% of the card holder’s
were delinquent in credit card payment resulting in more than 600,000 credit card delinquency
cases in 2018 alone (CBA, 2018). As the per the Global Payment reports (2019) published by
JP Morgan Chase on the United States, US has a credit card penetration of 2.01 per capita
which are enabled for e-commerce transactions. US Federal Reserve Bank’s Economics

1

Research published the delinquency rate at 2.59% for the Q1 2019 which has been steadily
increasing for the past two years from 2.42% in Q1 2017.
Given the growing trend in payments through credit cards, it can be assumed that the
delinquency rate may increase over the coming years in terms of credit card payments. The
major reason for the increase in the delinquency rate as per St Louis Federal Reserve (2019)
has been due to increased user base of credit cards especially between age group of 18 to 29
years. The delinquency rate among these users in 2019 alone has been 8.05% as per St Louis
Federal Reserve. In order to understand the delinquency, we must take a look at the definition
of default used by banks across the globe. As per the Basel accord II, the definition of default
is as follows (Basel II, p.104,105):
“A default is considered to have occurred with regard to a particular obligor when either or
both of the two following events have taken place.
The bank considers that the obligor is unlikely to pay its credit obligations to the banking
group in full, without recourse by the bank to actions such as realizing security (if held).

The obligor is past due more than 90 days on any material credit obligation to the banking
group.”

Following the definition of default, the delinquency rate for credit card payment obligations is
calculated as defaulters who fail to pay the obligations for more than 90 days. Due to the
limitations in the dataset the complete definition of delinquency may not be implemented in
this study. However, for conducting this study since the credit instruments used has been credit
card, default is considered when the clients fail to make any payment in the next month by due
date. By predicting and identifying credit card customers who might be defaulting in the
payments, banks can avoid major losses occurring due to the credit card defaulters. As per the
Canadian Bankers Association Data on Credit Card Delinquency, the net annualized loss rate
for 2019 alone has been 3.45%.
1.2 Rationale for the Study
According to McKinsey Global Institute (2016) implementing adequate measures with
advanced analytics to detect credit risk and averting further losses, portfolios can reduce up to
2

50% of the cost in the credit risk operations of the business. One of the primary capabilities of
a robust risk management system must be detecting the risks earlier, though many of the bank
systems today lack this key capability which leads to further losses (MGI,2017).

By

implementing and placing a system to check defaulters, banks can avoid losses which will help
save the bank millions of dollars. With reference to our study, these losses would be occurring
due to credit card default payments. This leads us to the rationale behind the study of
developing a model using deep neural network (DNN) architecture which can efficiently help
the banks in identifying the defaulters and thereby helping them save millions of dollars.
Identifying and classifying credit card defaulters using machine learning and advanced
analytics can help banks and financial institutions detect their risk early in the transactions or
in a client’s portfolio based on the data available in the system. This will allow banks and
financial institutions to implement appropriate measures and help them in targeted risk-based
pricing, faster client service without sacrifice in risk levels, and more effective management of
existing portfolios (Bahillo et.al, 2016).
Our major objective is to develop a robust and efficient DNN model with a combination of
specific sampling algorithms based on machine learning techniques. This thesis would then
conduct a comparative study with the already established machine learning techniques like
Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Artificial Neural
Network (ANN) used in credit risk assessment and the respective literature. These models have
been developed from the understanding of the current literature and techniques already in place
for credit risk identification and classification. To undertake and complete this research we
plan to use datasets that include open-source data sets offered by the University of California,
Irvine database (https://archive.ics.uci.edu/ml/datasets, 2019) available for conducting
researches and developing such models.
Our inspiration for research is based on the recent advancements in the use of artificial
intelligence and machine learning techniques to solve the problems faced by the financial
industry. The probability of default and classification of the defaulters in credit risk assessment
has been widely studied with machine learning techniques but limited with regards to deep
learning techniques. In this thesis, we propose a 6 Layer-DNN Model to study credit risk
assessment. We will be comparing it with techniques like ANN, SVM and KNNs which are
3

some of the widely used models to and predict study credit risk assessment. This thesis will
also include the study of sampling techniques like SMOTE, RUS, SVM-SMOTE, and AllKNN to be used along with the imbalanced dataset and the models.
1.3 Organization of Thesis
The organization of the thesis is as follows. Chapter 2 describes the current classification
techniques used in credit risk research and models used in this study. Established machine
learning models used for the comparative study are explained in detail. DNN architectures
along with our model proposed for this study is introduced in this chapter. Chapter 3 presents
the literature on credit risk assessment along with specific techniques or models used in those
studies. Chapter 4 outlines the methodology used in this study and the process carried out while
conducting the study. The performance evaluation, robustness, and sensitivity analysis carried
out for the models are discussed in detail in this chapter. Chapter 5 presents a comparative
study between the performances of the different models using performance metrics, confusion
matrix, and ROC curve. Chapter 6 concludes the thesis by presenting key results of the
evaluations, further discussion into the policy implications of using the models in real-world
application and future work in incorporating a combination of techniques for credit risk
classification.

4

2.0 Classification Techniques and Approaches
Post-Great Recession (2008-2009), credit risk identification and prevention have
received great importance from managers of the financial institution for issuing debt and line
of credit (Harris 2013). Regulatory developments post-global financial crisis has mandated to
perform complete due diligence on the credit history of the companies and candidates
requesting for the credit line. These regulations have initiated the development of a variety of
techniques under the credit risk scoring model (e.g. Basel III). Financial firm and investment
banks heavily rely on these scoring techniques to identify defaulters so that credit lines are
offered to the most legit ones. One of the earliest risks scoring statistical techniques
discriminant analysis (DA) was developed based on the Fisher's linear discriminant model
(1936) and his seminal paper published on the topic of quantitative techniques to classify
between "good" and "bad" applicants.
In the past few decades, data mining techniques based on supervised learning and
unsupervised learning algorithms have been implemented for classification and default
identification. Data mining is a process of analyzing data using different techniques and by
different dimensions which can then be used in the process of decision making to cut costs, to
identify risk, to improve customer service and to involve many more applications. It ideally
involves finding the relationship between the dependent variable and set of independent
variables or features involved in a given dataset. In this chapter, we take a deeper look into the
established techniques used in the study and introduce our DNN model.
2.1 Supervised and Unsupervised Learning
Data Mining techniques can be classified as supervised learning and unsupervised learning
techniques. The primary difference between supervised learning and unsupervised learning is
that in supervised learning the models are trained using a partial dataset ideally 80% of the
dataset and post which these models are used for prediction and classification problems. In
unsupervised learning however, the step to train the model is skipped and these models are
directly used for solving the problems. Unsupervised learning techniques are much more
complex as compared to the supervised learning techniques given the nature of the problem in
hand.
5

Supervised learning techniques involves the process of modifying and optimizing the
systems so that the desired outputs or targets are detected for a given range of inputs (Reed &
Marks, 1999). It involves training the model which can also be termed as the process of
adaptation through which the models can learn the relationship between the inputs and outputs.
It involves an external entity termed as an external “teacher” (Reed & Marks, 1999) which
helps in specifying the output for a given set of input variables. In some machine learning
literature directed data mining is termed as supervised learning which involves classifications,
prediction, and estimation (Hamori, 2014) whereas undirected data mining techniques are
termed as unsupervised learning which involves affinity grouping and clustering
(Schmidhuber, 2014). This thesis utilizes supervised learning techniques. These techniques
include SVM with RBF Kernel, KNNs, ANN and DNNs.
2.2 Support Vector Machines with Sigmoid and RBF Kernel
Support Vector Machines (SVM) are one of the prominent binary classification
machine learning models utilized to resolve the problem of classification especially if the
dataset consists of binary features (T. Harris, 2013). Support Vector Machines. SVM were first
developed by Vapnik & Cortes in 1995 which attempts to find the optimal separating
hyperplane between the classes by maximizing the class margin (T. Harris, 2013). The model
can be depicted as in Figure 1.0. The points lying on the boundaries of the hyperplane are
called support vectors. The optimal hyperplane is found by maximizing the width of the
margin. Figure 1.0 shows the margin as the distance between the separating hyperplane
between the positive class and the negative class.
The optimization function in the SVMs for finding the optimal hyperplane is carried
out by functions called kernel functions. These functions play a similar role in finding an
optimized solution similar to an optimization problem. For this thesis, the Radial Basis
Function (RBF) is used as a kernel function. RBF reflect SVMs with exponential functions
whereas Sigmoid functions are taken as a function of the tangent to the input parameters.

6

Source: T. Harris, 2013

Figure 1.0 Illustration of SVM
Table 1.0 indicates the functional form of SVM involved in the study along with
parameters and default values.
Table 1.0 Functions and Parameters of SVM used in this study
Kernel Functions

Functional Form

Radial Basis Function

K(xi.xj)=exp(-𝛾𝛾||xi-xj^2)

Parameters
𝛾𝛾 ∈ 𝑅𝑅

Default Values
𝛾𝛾 = 1

Source: Khemakhem & Boujelbène, 2015

SVM works on the optimization of the margin between the hyperplane. For a set of training
instances say {(𝑥𝑥1 , 𝑦𝑦1 ), … … … … . . (𝑥𝑥𝑛𝑛 , 𝑦𝑦𝑛𝑛 )} where 𝑥𝑥 ∈ 𝑅𝑅 𝑛𝑛 , 𝑦𝑦 ∈ {−1,1} where y is the class

label for the dependent feature in a binary classification problem as in this study. In a binary

classification problem, SVM attempts in finding a classifier 𝑓𝑓(𝑥𝑥) which in turn minimizes the
misclassification rate. The 𝑓𝑓(𝑥𝑥) is the hyperplane which can be represented as 𝑓𝑓(𝑥𝑥) =

𝑠𝑠𝑠𝑠𝑠𝑠(𝑤𝑤 𝑇𝑇 𝑥𝑥 + 𝑏𝑏). This function in training results in the convex quadratic optimization problem.
7

The convex optimization problem can be rewritten in a dual quadratic programming problem
form using the Lagrangian functions as below.
𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 𝑊𝑊(𝛼𝛼) = 1/2 ∑𝑛𝑛𝑖𝑖=1 ∑𝑛𝑛𝑗𝑗=1 𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦�𝑥𝑥𝑖𝑖 , 𝑥𝑥𝑗𝑗 �

(1)

𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑡𝑡𝑡𝑡
𝑛𝑛

� 𝑦𝑦𝑦𝑦𝑦𝑦𝑦𝑦 = 0 ∀𝑖𝑖 ∶ 0 ≤ 𝛼𝛼𝛼𝛼 ≤ 𝐶𝐶
𝑖𝑖=1

Here 𝛼𝛼 is the Lagrange multipliers and C is the tradeoff between the maximum margin and

misclassification error. The term 𝐾𝐾(𝑥𝑥𝑖𝑖 , 𝑥𝑥𝑗𝑗 ) represents the kernel functions which are used to

map linearly non-separable instances into a higher dimensional space. The kernel used in the
study is represented in Table 1.0.
2.3 K- Nearest Neighbours
Nearest Neighbour algorithm has been one of the majorly studied algorithms with

respect to classification problem. The algorithm was first introduced by Fix & Hodge in 1951
with their seminal paper on ‘Discriminatory analysis, nonparametric discrimination’. The
researchers were the first ones to establish the rules of the Nearest Neighbour and how the
algorithm identifies the nearest neighbors using Euclidean distance. Cover & Hart introduced
the nearest neighbor algorithm for pattern classification in 1967 and identified how the K-NN
algorithm can fit into a broader applications of classification problems.
KNN was introduced by Altman N. S in 1992 as a nonparametric method for pattern
recognition and classification. This algorithm also belongs to the class of supervised learning
techniques as the algorithm requires to be trained before the actual application of the algorithm
on a give set of independent features. It is also one of the standard machine learning methods
which can be extended and applied for large scale data mining problems (Nadkarni, 2016). The
algorithm uses the common principle that in a given dataset, similar objects or features exist
within the proximity of one another.
Being a non-parametric classification technique, KNNs can be used for non-linear
datasets like credit risk assessment. In this thesis, the K-NN algorithm is used as a classification
technique to identify the default payments in the dataset. Parameter tuning is key relative to
the K-NN model. One of the most important parameters to be identified for K-NN is the
8

number of nearest neighbors. Using the trial and error method, we have tuned our nearest
neighbor to be 10 based on the understanding of overfitting and underfitting the model.
Overfitting the model means using excessive data points to fit the data onto the model which
results in plain memorization of the datapoints by the model (Massaron & Boschetti, 2016,
p.94) and thereby can provide incorrect measurements for the model prediction. Underfitting
on the other hand indicates use of less datapoints or information to fit the model thereby not
utilizing the complete information for training the model accurately.
2.4 Artificial Neural Network (ANN)
ANNs consist of neurons that are similar to those of human neurons. These neurons form
a single functional unit in the layer of networks. The ANN can consist of one to many layers
making them easily programmable algorithms to be studied in the field of computer science.
The mathematical model of a neuron was proposed by McCulloch & Pitts in 1943. The neuron
proposed by McCulloch & Pitts in 1943 consisted of binary input, binary output, and single
activation function. Stacking multiple neurons with a given set of input variables and
connecting them with different weights and activation functions provides us with ANNs or
simply neural networks. The most common form of the neural network is known as the feedforward network where the information from the input variables is carried forward linearly
through cross-connected neurons as the middle layers and finally towards the desired output
layer. These networks are termed as “feed-forward” as the information flow in only one
direction without any feedback loops or back into the hidden layers

9

Source: Retrieved from https://www.extremetech.com

Figure 2.0 Illustration of feed-forward neural network
Over the past few years with the help of advanced programming languages, neural network
research has led to several other architectures like error back-propagation neural networks,
recurrent neural networks and convolutional neural networks which is a widely implemented
neural network in the image processing and image recognition technologies. The ANN in this
study has been influenced by the work of Khemakhem & Boujelbene, 2017 where they used
an ANN to conduct a credit risk assessment. The ANN used in this thesis consists of 4 layers
which are as follows:
Layer 1: Input Layer consisting of 10 neurons representing the 10 input variables
Layer 2: A hidden layer consisting of 16 neurons
Layer 3: A hidden layer consisting of 10 neurons
Layer 4: An output layer consisting of a single neuron.
This thesis uses Rectified Linear Unit (ReLU) as the activation function for the neurons
with a feed-forward neural architecture as explained above. The hidden layer neurons were
10

optimized throughout this study for better accuracy and classification results through the trial
and error method. The choice of neurons in the hidden layer were decided by a common
assumption to form a tunnel architecture in the network topology of the neural networks as to
reduce the error rates in the neural networks. Combined with this assumption and using
multiple trials for avoiding overfitting of the models the neurons were appropriated at 16 and
10 for the hidden layers in the ANN architecture. Similar method was carried out for finalizing
the architecture of the DNN model. We have used the binary_crossentropy as our loss function
and Stochastic Gradient Descent as our optimizer for the neural network model.

Figure 3.0 Architecture of the feed-forward network used in the study
2.5 Deep Learning Architectures
DNNs consists of multiple layers of neural networks and works on a similar line of ANN.
They form a part of the larger family of deep learning architectures which also consists of Deep
Recurrent Neural Network, Deep Belief Network, and Deep Convolutional Neural Networks.
Figure 4.0 presents an idea of a DNN with 3 hidden layers. DNN architectures for broader
applications can include N-different hidden layers depending upon the optimization of the
model and problem being solved using DNN.

11

Source: Retrieved from http://neuralnetworksanddeeplearning.com/chap5.html

Figure 4.0 Illustration of DNN with Feed-Forward propagation

Figure 5.0 Architecture of DNN used in this study
The DNN used in this thesis consists of 6 layers which are as follows:
Layer 1: Input Layer consisting of 10 neurons representing the 10 input variables
Layer 2: A hidden layer consisting of 30 neurons

12

Layer 3: A hidden layer consisting of 25 neurons
Layer 4: A hidden layer consisting of 15 neurons
Layer 5: A hidden layer consisting of 10 neurons
Layer 6: An output layer consisting of a single neuron.
This thesis uses Rectified Linear Unit (ReLU) as the activation function for the neurons
with a feed-forward neural architecture as explained above. The hidden layer neurons were
optimized throughout this study for better accuracy and classification results through the trial
and error method. To reduce the loss function, we have used the binary_crossentropy and we
have used Stochastic Gradient Descent as our optimizer for the DNN model.

13

3.0 Literature Review
In this chapter, a detailed literature review of the studies in the field of credit risk
assessment is presented. Section 3.1 outlines the studies conducted with SVM and comparison
with other methods. Section 3.2 discusses in detail the studies conducted with ANN. The
following section 3.3 discusses the latest research in the credit card default detection
techniques and outlines literature on DNNs.
One of the earliest risks scoring statistical techniques, discriminant analysis (DA) was
developed based on the Fisher's linear discriminant model (1936) and his seminal paper
published on the topic of quantitative techniques to classify between "good" and "bad"
applicants. Post-1980, the DA techniques were replaced by statistical techniques like linear
regression, logistic regression and early stage base classifier likes nearest neighbours, decision
trees that provided significant results provided the data were linearly separable, however, if the
data sets are not linearly separable then these techniques have proved to be insufficient for
credit risk analysis (S. Chen et al, 2011). In the past decade, researchers and analyst have
shifted their focus on ANNs and machine learning techniques to classify the defaulters from
non-defaulters where the datasets are not linearly separable. Some of the non-linear numerical
methods for classification included ANN, SVM and maximum likelihood model proposed by
Standard & Poor's Risk Solutions Group (S. Chen et al, 2011). Khemakhem & Boujelbène
(2015) studied the difference between DA and ANN on Tunisian companies and established
the fact that neural network (NN) models are more accurate in terms of predictability. They
criticized NN models for being less robust and less well-founded terming them a black box of
unknown operating rules as NN models are unable to explain the results provided by them.
3.1 Credit Risk Assessment with SVM
In the past few years, kernel-based vector algorithms derived from the statistical
learning theory by Vapnik (1998) have come into a wide variety of research for classification
problems and one of them is SVM. SVM are one of the latest machine learning techniques
used in the finance industry to classify defaulters and non-defaulters based on their credit and
financial history. SVM falls under the category of supervised machine learning techniques

14

which can be used for classification or regression problems but often these techniques are used
for classification problems.
L. Yu et al (2010) studied credit risk evaluation using SVM with a multiagent ensemble
learning system They used credit card applicants from British financial service companies and
increased the bad applicants to match the level of good applicants. This allowed them to
perform their study on the balanced dataset. As per L. Yu et al (2010) Multiagent system with
SVM outperformed Logistic Regression, Quadratic Discriminant Analysis, and Feed-forward
neural network but lagged with Multiagent Feedforward Neural Network model. Their study
did not include any kind of sensitivity analysis or robustness test with the model which would
have helped in understanding the application of the models. Their study also lacked in
explaining the implications of using such models on credit risk evaluation and future
applications.
S. Chen et al (2011) studied the bankruptcy of German firms using SVM with a
Gaussian Kernel. They used 28 different financial ratios for the firms that went bankrupt
between 1996 to 2002 and used these ratios as features for the algorithm. S. Chen et al (2011)
identified that SVM outperforms logit in terms of classification problems especially in the case
of linearly non-separable datasets. Their datasets consisted of 20,000 solvent firms and 1,000
solvent whose financial statements were extracted from the database Creditreform. S. Chen et
al (2011) did perform sensitivity analysis using the parameters of the SVM but overlooked the
imbalanced dataset they used for the study.
J.-H. Trustorff et al (2011) conducted a similar study using least squares SVM and
logistic regression models. They chose 5 debt ratios to identify the credit risk of the companies
and in total studied 78.000 companies using these ratios. One of the major outcomes of this
study was that SVM perform well under small training samples with high variance in the input
data (J.-H. Trustorff et al, 2011). Both J.-H. Trustorff et al (2011) and S. Chen et al (2011)
have overlooked the imbalanced dataset they used in the study. To overcome this problem in
our study we have used over-sampling and under-sampling techniques which will be explained
in detail in the next chapter.
Wang & Ma (2012) used a hybrid ensemble approach for detecting enterprise credit
risk assessment. They used financial records of 239 companies provided by the Industrial and
15

Commercial Bank of China. The method involved Bagging and Boosting techniques along
with Linear and Polynomial SVM kernel. However, the dataset used in this study was much
smaller in comparison to other datasets used in the study. Lack of applications of the
methodologies to a large dataset was one of the shortcomings of this research.
Harris studied credit risk assessment in 2013 and in 2015 which is of particular interest
to us. These two studies involve the use of SVM in credit risk assessment. T. Harris (2013)
conducted a study on credit risk assessment based on default definitions as given by the Bank
of International Settlements and Base Committee. His study argued that using “narrow” and
“Broad” definitions of defaults based on the number of days past due payments, credit risk
evaluations could be improved using quantitative credit risk models. His methodologies,
however, lacked in providing clear applications of the credit risk models along with any
sensitivity analysis of the models. His study in 2015 involved the application of clustered
SVM proposed by Gu and Han (2013) and compared it with techniques like logistic regression,
decision trees and a combination of other techniques. In this study, he used German Credit
Dataset provided by UCI Machine learning repository and Barbados credit union dataset.
Cao et.al (2013) proposed a novel model-based of cost-sensitive SVM (CS-SVM)
enhanced by particle swarm optimization technique (PSO) for loan default discrimination.
Their research improved the SVM model integrating with cost sensitivity and PSO increasing
the accuracy of the output but their model was applied as a binary classification technique to a
specific bank data thereby limiting the application of the model for a wider dataset. Limitation
of the model application on the wider dataset places the question of efficiency and scalability
on the model used by Cao et al (2013) and suggested for further research on multi-class multifeature classification clustering models for shortcomings in their research.
Paulius Danenas & Gintautas Garsva have studied the application of SVM in credit risk
assessment in different scenarios and using different combinations of kernel functions. One of
their recent research (Danenas & Garsva, 2015) on credit risk assessment was completed by
SVM with particle swarm optimization as used by Cao et al (2013). They also utilized financial
ratios as the input features for the credit risk assessment. In their research, they used the
Zmijewksi score (Z-score) as a binary output feature with companies scoring greater than zero
i.e. Z > 0 to be labeled as bankrupt. They compared the measurements of the model with
16

logistic regression and RBF based network classifiers. Limitations on the stability of particle
swarm optimization-based SVM were one of the major lacking points of their research. The
model didn’t outperform linear SVM models as used by other researchers in the credit risk
assessment.
Based on the literature presented above, SVM has been one of the prominently studied
models in credit risk assessment. This makes it one of the ideal models to be involved in the
study and conduct comparative research with the DNN Model presented in this study.
3.2 KNN in Credit Risk Assessment
Henley & Hand (1996) studied K-nearest neighbor as a classifier for credit risk scoring
techniques by considering the bad risk rate as part of their research. The authors identified that
K-NN performed well in identifying the bad risk rate and was able to perform well in
comparison to decision trees, logistic and linear regression. The dataset used by Henley &
Hand (1996) was fairly balanced with over 54% of the dataset consisting of credit risk and
involved 16 features. The researchers were able to reduce the bad risk rate up to 40%. Although
the research was carried in the early developmental stages of machine learning techniques, the
researchers didn’t give a detailed performance metrics of the models studied and further
application of the model in the credit risk assessment. Post their study as per our knowledge
based on the research for literature review, K-NN’s application was not studied until the early
2000s.
Marinakis et. al (2008) studied the nearest neighbor classifier using metaheuristic
algorithms for credit risk assessment using loan portfolios of 1411 firms from Greek
Commercial Bank. The authors used 16 different financial ratios including profitability,
solvency and managerial performance ratios. The dataset used had 218 firms with default class
whereas 1193 firms were non-default class (Marinakis et. al, 2008) making it an imbalanced
dataset but their research didn’t involve any techniques to make the imbalanced dataset a
balanced one. Using the metaheuristics algorithms some of the models were able to achieve
more than 98% accuracy with an overall average of between 94% to 97% percent.
Abdelmoula (2015) studied the Tunisian bank credit risk using the K-NN algorithm with 3
nearest neighbor parameters. The dataset consisted of 924 credit records between 2003 to 2006
held by a Tunisian commercial bank (Abdelmoula, 2015). Abdelmoula (2015) was able to
17

obtain an accuracy of 88.63% with over 95% in terms of ROC score. The author used over 24
financial and non-financial ratios as features of the study with cash flow and non-cash flow
models. Abdelmoula (2015) also used Type 1 and Type 2 error rates as credit risk and
commercial risk to identify whether the models are able to cover these error rates which would
help the banks in making efficient risk management decisions. Type 1 error rate indicates the
rate of default customers being categorized as non-default customers and Type 2 error indicates
the rate of non default customers being categorized as default customers(Abdelmoula,
2015).With respect to methodology, although the author used ROC as the main performance
metric, there was no discussion regarding the dataset’s imbalanced nature. It would have been
highly possible that the dataset involved may have been imbalanced and thereby the research
lacked any techniques to improve the dataset. Being said that to the best of our knowledge
while conducting this research Abdelmoula’s (2015) research is one of the high-quality
researches in the use of K-NN with respect to credit risk assessment.
Although K-NN is one of the base classifiers and highly popular machine learning
techniques, there hasn’t been much application of different types of K-NN in the credit risk
assessment. This knowledge discovery comes as a collateral finding as a part of this research.
3.3 Artificial Neural Networks in Credit Risk Assessment
Khashman (2010) built a credit risk evaluation system using three different neural network
models using 24 numerical attributes and implemented it with nine different learning schemes.
From 27 different learning models, he chooses 3 learning models which provided an error rate
of less than 0.008 which does indicate that efficient models require iterative regression
procedures to deliver accurate risk evaluation techniques. These three models delivered an
overall accuracy rate of 83.6% but the research lacked in multiple points like feature selection
procedures as in how the clients were chosen for the training and validation procedures. All
three models used only one hidden layer in their design whereas the latest research focuses
more on multiple hidden layers to enhance the results and achieve better accuracy.
Cimpoeru (2011) introduced the concepts of neural calculus and studied the concept of
error backpropagation techniques. The author of this research focused on multiple models like
feedforward networks with multiple layers, adaptive networks based on fuzzy algorithms and
SVM’s. Cimpoeru (2011) conducted a study on Romanian small-medium enterprises whose
18

turnover was between EUR 700,000 and EUR 3,755,000. The research was conducted on 2%
of the total population as sample and input variables were financial ratios based on the data
available. Although the research conducted was extensive but the research lacked clearly
outlining the application of these models in real-time datasets and what can be done to improve
the efficiency of the models.
Karaa et.al (2012) conducted a similar study by comparing SVM and NN models and
established the superiority of NN models over SVM. The researchers focused mainly on the
historical datasets of the companies and their financial ratios. The authors didn’t mention if the
dataset was imbalanced and any use of sampling techniques in the research. They achieved
accuracy of 90.2% accuracy with NN model and Type 1 error rate at 18.55%. They also
indicated their comparative results between DA and logistic regression techniques which
proved that logistic regression is a better model in resolving classification problems.
Oreski et.al (2012) investigated the extent of the impact that total data from a single
bank has on the genetic algorithms based neural network (GA-NN) for credit risk assessment.
Their primary study was based on the subject of feature engineering and feature selection
through hybrid models of genetic algorithms which helps in better feature selection for data
processing and evaluation as compared to other models. Using the same hybrid models in both
places i.e. in feature extraction and in the data-processing has allowed the researchers to get
better accuracy as compared to using different models in different places. Although the
research was carried out and performed with far better accuracy genetic algorithm-based neural
network (GA-NN) are computationally intensive techniques as per the researchers and the
feature selection process takes a longer duration of time to complete. Implementing this
technique in the banks will definitely require optimization of the models and the internal
parameters as well because each bank uses a different set of ratios to determine the credit risk
assessment of the clients. Even though the accuracy rate of 82.30% was achieved using these
models it can be improved using some of the advanced artificial intelligence techniques like
SVM and DNN Models. Moreover, the limited application of this model due to technologyintensive requirement necessitates the study to be improved and provide better models for realworld applications.

19

Khemakhem & Boujelbène (2015) studied the difference between DA and ANNs on
Tunisian companies and established the fact that neural network (NN) models are more
accurate in terms of predictability but they criticized NN models being less robust and less
well-founded terming them black-box operating rules as NN models are unable to explain the
results provided by models used in the study of Tunisian companies. ANN although in many
cases provided better results (Oreski et.al 2012, Khemakhem & Boujelbène 2015) as compared
to linear models in classification, it has been criticized for being vulnerable to multiple minima
problems as OLS and MLE were (S. Chen et al 2011). The major reason cited behind this
vulnerability was due to the principle of minimizing empirical risks leading to the poor
classification of sample data sets (Haykin 1998, S. Chen et al 2011). Several researchers in the
past years have done comparative analysis between different models of ANN and ML
techniques to understand the shortcomings and learning to improve the efficiency of such
models. Khashman (2010), Cimpoeru (2011) and Karaa et.al (2012) conducted this kind of
research by comparing different models to understand their impact on the data and the output.
3.4 Deep Learning Models in Credit Risk Assessment
With the advancements in machine learning, development of software languages and faster
processing capabilities of computers, DNN and Deep Learning Architectures have taken center
stage in the study of applications relative to predictions and classifications. Sun & Vasarhelyi
(2018) studied the application of DNN on credit card delinquencies, one of the major
influencers for conducting this study. The credit card applicants from one of the largest banks
in Brazil with over 700,000 credit card applicants and found out that deep learning actually
improves the accuracy of prediction in case of a large dataset. Although they used a novel
approach but lacked in terms of sensitivity analysis and overlooked the imbalanced dataset
they used in the study and did not incorporate any kind of sampling techniques that might have
helped in overcoming the imbalanced dataset.
Hamori et al (2018) studied credit card delinquency using the same dataset as we have used
in this study. Their study involved a comparison of ensemble learning methods along with
Neural Networks and DNNs with Tanh and ReLU activations functions. They identified that
the dataset used was imbalanced and used the approach of normalization rather than sampling
techniques with the dataset. Secondly, their DNN model consisted of only two-layer which
20

ideally falls under the category of neural network and did not include higher number neurons
or layers of neurons as is the case with DNN.
Zhu et al (2018) introduced the use of Relief Algorithm based Convolutional Neural
Network (CNN) in the consumer credit scoring. The researchers used consumer credit data
from a Chinese consumer finance company which consisted of 24,387 data points and over
570 numeric attributes. Out of these 570 numeric attributes, they used 50 attributes concerned
with the consumer credit (Zhu et al, 2018). They compared the results with logistic regression
and random forest which are two completely different sets of statistical techniques and machine
learning algorithms respectively. Their study only included AUC and F1- Measure which
indicates that the dataset used was highly imbalanced whereas their methodology did not
include any data normalization or sampling techniques with the neural network.
H. Kvamme et al (2018) used a convolutional neural network to predict mortgage defaults
from the consumer's account balance. They used a dataset from a Norwegian Bank, DNB
consisting of 20,989 data points with a time series from 2012 to 2016. Their neural network
consisted of 3 hidden layers with ReLU Activation functions with one output layer with a
SoftMax activation function. For overcoming imbalanced dataset problem and overfitting of
the model, they used data augmentation and regularization on the both the CNN models they
used in this research. One of the major critiques of this research would on the selection of data
features and use of only consumer account dataset, not financial transactions data which the
customers carry out in the day to day life.
Bayraci & Susuz (2019) studied DNN-based classification models in credit risk assessment
of Tunisian financial institutions in two separate datasets. For the datasets pertaining to credit
card applicants, to avoid the imbalanced nature of the dependent variable, the researchers used
a random selection of the major and minor classes. They identified that DNN works well with
complex datasets. However, their research lacked in presenting sufficient evaluations of DNN
Models in terms of F1- Measure and AUC, instead they chose to use the Weighted Average
Accuracy rate. Secondly, the researchers didn’t quite specify the activation functions or the
number of layers used in the DNN model used in the research.
From the literature review, it can be observed that several gaps could be outlined. Previous
research on DNN Model has majorly overlooked the sampling techniques that could be
21

implemented along with these models. The evaluation of the models has been limited to
accuracy whereas in the case of the imbalanced dataset it is recommended to use other
measures like F1- Score, G-Mean and AUC – ROC Curve. Limited research has been
completed on comparing the established scoring techniques like SVM with DNN models
which could help us in understand whether DNN models have an advantage or not. Previous
researches have been limited to presenting the outcomes of the models in terms of their
performance, however limited discussion has been presented on the policy implication for the
use of such models in financial institutions.
This thesis aims at filling the gaps in the literature as highlighted in Table 2.0, Table
3.0 and Table 4.0. The methodologies presented in the next chapter will outlay the sampling
techniques that are used to overcome the imbalanced nature of the dataset. Evaluations
techniques like F1-Score, G-Mean along with accuracy, sensitivity, and specificity for the
imbalanced datasets are also presented. This thesis also intends on presenting some of the latest
policies that are formulated or are in place for the use of such models for credit risk prediction
and what could be done better in terms of adopting these models into real-life applications.

22

Table 2.0 Literature Review Gap - SVM and KNN
Author/Authors

Models Used

Dataset

L. Yu et al (2010)

SVM
with
Ensemble learning,
LogitR,
FeedForward
Neural Network
SVM
with
Gaussian Kernel
SVM with Least
Squares
SVM with the
hybrid ensemble
SVM

Balanced
increasing
applicants

S. Chen et al (2011)
Trustorff et al (2011)
Wang and Ma (2012)

Sampling
Techniques
by No
bad

Gap in the Literature

Imbalanced

No

Sampling techniques

Imbalanced

No

Sampling techniques

Sampling techniques

smaller - 239 No
instances
Smaller - 1000 No
instances

Smaller dataset

Danenas & Garsva (2015) PSO-SVM, SVM

Imbalanced,
24000 instances

No

Measurements
for
imbalanced dataset, ROC,
Sampling techniques

Henley & Hand (1996)

KNN

Balanced

No

Performance
measurements

Marinakis et. al (2008)

KNN

Abdelmoula (2015)

KNN

Imbalanced, 1411 No
instances
N/A,
924 No
instances

Harris (2013 and 2015)

Smaller dataset, Sampling
techniques

Sampling
techniques,
Smaller Dataset
No
discussion
on
imbalanced dataset

Literature Gap
Filled by this study
Sampling
techniques

Sampling
techniques
Sampling
techniques
30,000
instances
used in this study
30,000
instances
used in this study,
Sampling
techniques
Sampling
techniques
and
better performance
measurement
techniques
Performance
measurements under
imbalanced dataset
Sampling
techniques
Sampling
techniques

23

Table 3.0 Literature Review Gap - ANN and DNN

Author/Authors

Models Used

Dataset

Gap
in
the
Literature
Performance
measurements like
ROC, F-Measure
N/A

Literature Gap Filled
by this study
Performance
measurements
under
imbalanced dataset
N/A

Khashman (2010)

ANN

24 Attributes
Financial ratios

Cimpoeru (2011)

ANN - Neural
calculus, Error
Back
propagation
techniques
GA- NN

Financial Ratios, No No
discussion
on
dataset's nature

Financial Ratios, No No
discussion
on
dataset's nature

Technology
intensive

Financial Ratios of No
Tunisian
Companies

Less Robust Model

Vasarhelyi DNN
(Layers Credit
Card No
not mentioned)
delinquencies700,000 instances

Overlooked
Imbalanced dataset

DNN model used in this
study (Able to run on
any laptop with 8 GB
ram)
Consistent
results
obtained by ANN and
DNN model used in this
study
Sampling techniques
along with DNN Model

Oreski et.al (2012)

Khemakhem
Boujelbène (2015)

Sun &
(2018)

& ANN

Sampling
Techniques
of No

24

Author/Authors

Models Used

Dataset

Hamori et al (2018)

DNN - 2 Layers

Same Dataset as used No
in this study

Zhu et al (2018)

CNN

H. Kvamme et al (2018)

Bayraci & Susuz (2019)

Sampling
Techniques

Chinese
Consumer No
Finance
Company,
24387
instances,
Imbalanced Dataset
CNN
DNB Bank, 20989 No
instances,
Augmentation
and
Regularization
for
imbalanced nature of
the dataset
DNN (Layers not Tunisian
Financial No
mentioned)
Institutions, Random
selection of major and
minor classes

Gap
in
Literature

the Literature Gap
Filled by this
study
Sampling techniques Sampling
techniques, DNN
model used in this
study has 4 hidden
layers
Sampling techniques, Sampling
No model comparison techniques,
4
different models
used in this study
Sampling techniques Sampling
techniques

Sampling techniques

Sampling
techniques

Table 4.0 Literature Review Gap - Deep Neural Network

25

4.0 Methodology
In this chapter, the methodologies used in the study are discussed in depth. Section 4.1
outlines the software used and how the models are constructed. Section 4.2 outlines the dataset
used in the study. Section 4.3 discusses the sampling techniques to over the imbalanced
datasets as we discussed in the literature review. Section 4.4 describes the evaluation
techniques used to determine the performance of the models. Section 4.5 discusses the overall
framework used in the study.
4.1 Software Used
LIBLINEAR: It is an open-source library developed by National Taiwan University in 2008,
used primarily for large scale classifications (Fan et al, 2008). This software package primarily
supports logistic regression and linear SVM. The package allows developers and common
users with limited knowledge in programming to implement and research about the impact of
classification techniques in several fields.
SCI-KIT LEARN: SciKit is a python based open-source software library distributed under the
BSD licenses. The major focus of the developers of this package has been on the
implementation of the models (Pedregosa et al, 2011). It provides a range of in-build
algorithms for classification, regression, and clustering such as SVM, random forests, gradient
boosting, and KNN along with sampling algorithms.
KERAS: Keras is described as one of the widely used python based deep learning library. This
software package is capable of running along with other higher-end software packages in deep
learning like TensorFlow, Theano and CNTK. Keras supports two kinds of models within its
packages, one consisting of the sequential model which reflects the feed-forward neural
architecture and the second one through functional API (Application programming interface)
for complex models.
TensorFlow: TensorFlow was developed at Google as a part of the research project by Abadi
et al (2016). TensorFlow packages were also developed in python programming language and
released as an open-source software package. As per the white paper published by Abadi et al
(2016), TensorFlow was primarily developed to operate at large scale computing systems and
26

in heterogeneous environments. Over the past years, TensorFlow has gained a lot of traction
in the machine learning research community due to the ease of implementation and advanced
machine learning algorithms
ECLIPSE: For conducting this study, Eclipse has been used as an integrated development
environment and software packages of sci-kit-learn, TensorFlow, Keras has been integrated
into the environment through PyDev-Plugins (Python Development Environment). These
plugins allow for the integration of python-based software packages like sci-kit-learn,
TensorFlow, Keras into Eclipse which based on Java programming language.
The models used in the study were developed by using the above-mentioned packages. The
following table outlines the software package used for the corresponding models.
Table 5.0 Software used for models in this study
Models

Software Used

SVM – RBF Kernel

SciKit-Learn, LIBLINEAR

KNN

SciKit-Learn

Two-layer – ANN

Keras, TensorFlow, SciKit -Learn

DNN

Keras, TensorFlow, SciKit-Learn

Sampling Techniques

SciKit-Learn

4.2 Dataset

The data utilized for the research has been obtained from the University of California,
Irvine Machine Learning repository which is one of the leading databases for research datasets
in artificial intelligence and machine learning. The dataset contains over 30,000 rows of
individual client credit cards with 23 explanatory features. These 23 explanatory features are
outlined in Table 6.0. The explanatory features are based on the 30,000 client’s credit card
transaction that happened between April to September 2005. The response variable or the
dependent variable is ‘default payment next month’ which indicates that the client will fail in
paying any amount to the financial institution in the next month, thereby defaulting in the credit
card payment.
27

For training and testing the models, this study uses a ratio of 80:20 for splitting the
entire dataset randomly using the software package Sklearn. 80% of the dataset has been used
for training the models whereas 20% of the dataset was used for testing the models. The
preliminary analysis of the dataset has been explained in detail in Chapter 5. To particularly
identify the explanatory features contributing towards the probability of default, the dataset
has been kept consistent throughout with the ratios of testing and training datasets. The
following table defines the 24 features of the dataset:

28

Features

Type

Explanation
Amount of the given credit (NT dollar): it includes both the individual consumer credit

LIMIT_BAL

Quantitative

and his/her family (supplementary) credit.

SEX

Qualitative

Gender (1 = male; 2 = female).
Education (0=No Education, 1 = graduate school; 2 = university; 3 = high school; 4,5,6

EDUCATION

Qualitative

= others).

MARRIAGE

Qualitative

Marital status (1 = married; 2 = single; 3,0 = others).

AGE

Qualitative

Age (year)
The measurement scale for the repayment status is: -2=No payment required as
BILL_AMT =0, -1 = pay duly; 1 = payment delay for one month; 2 = payment delay
for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine
months and above.

History of past payment
PAY_0

Quantitative

The repayment status in September: 2005.

PAY_2

Quantitative

The repayment status in August: 2005.

PAY_3

Quantitative

The repayment status in July: 2005.

PAY_4

Quantitative

The repayment status in June: 2005.

PAY_5

Quantitative

The repayment status in May: 2005.

PAY_6

Quantitative

The repayment status in April: 2005.

29

Amount

of

bill

statement
BILL_AMT1

Quantitative

Amount of bill statement in September: 2005

BILL_AMT2

Quantitative

Amount of bill statement in August: 2005

BILL_AMT3

Quantitative

Amount of bill statement in July: 2005

BILL_AMT4

Quantitative

Amount of bill statement in June: 2005

BILL_AMT5

Quantitative

Amount of bill statement in May: 2005

BILL_AMT6

Quantitative

Amount of bill statement in April: 2005

PAY_AMT1

Quantitative

Amount paid in September: 2005

PAY_AMT2

Quantitative

Amount paid in August: 2005

PAY_AMT3

Quantitative

Amount paid in July: 2005

PAY_AMT4

Quantitative

Amount paid in June: 2005

PAY_AMT5

Quantitative

Amount paid in May:2005

PAY_AMT6

Quantitative

Amount paid in April: 2005

Quantitative

Output variable/ Response Variable/Dependent variable

Amount

of

previous

payment (NT dollar)

default payment next
month

Source: University of California, Irvine, Retrieved from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients, 2019

Table 6.0 Features of the dataset used in this study

30

4.3 Sampling techniques
As discussed in Chapter 3, one of the gaps in the literature has been the use of sampling
techniques along with the models implemented to study credit risk assessment. Sampling
techniques are used to overcome the problem of an imbalanced dataset and minimize the
impact of such datasets on the final outcome provided by the models. These sampling
techniques can be divided into two namely, over-sampling and under-sampling techniques.
Oversampling techniques helps increasing the minority class to match the majority class
thereby providing balanced dataset. Under-sampling techniques helps in reducing the majority
class to match the minority class.
For this study, the following oversampling and under-sampling have been used for further
analysis of dataset along with models in credit risk assessment.
Over-Sampling techniques:

SMOTE – Synthetic Minority Over-Sampling Technique
SMOTE was first proposed by Chawla et al (2002) in their seminal paper on the technique.
Based on google scholar’s estimation over 9000 papers have cited this research, indicating the
review of this technique over the past two decades. SMOTE is implemented by over-sampling
the minority class and by under-sampling the majority class (Chawla et al, 2002). In this study,
the minority class would be the segment of data with credit card clients defaulted in their
payment and the majority class would be vice versa.
SVM – SMOTE
It is a variant of the SMOTE Algorithm which uses the SVM kernel algorithm for detecting
samples and generating new synthetic samples (Karaa, Cooper and Kamei,2009). Based on our
literature review, SVM-SMOTE has not been used in the literature of credit risk assessment as
researchers prefer to use SMOTE as a form of oversampling and conduct a further comparison.
By using one more method in this study, a comparison between these two oversampling
methods can also be established.
Under-Sampling techniques:
RUS – Random Under-Sampling
31

Random Under Sampling has been one of the widely used under-sampling techniques in
the literature we have reviewed. This technique under-samples the majority class by randomly
picking samples with or without replacement.
All-KNN
All – K Nearest Neighbour (All- KNN) uses a K-Nearest neighbor algorithm to carry out
the under-sampling. This technique has been developed based on the paper published by
Tomek (1976). Based on our literature review, All-KNN under-sampling technique has not
been previously employed to study the effect of this technique on the respective models used
in this study. Using this technique in this study will allow us to establish a comparison between
the two under-sampling techniques which will be used for further analysis.
Although oversampling and under sampling techniques both help in creating a balanced
dataset. These two techniques have their own advantages and disadvantages while used in
conjunction with the machine learning techniques. Oversampling techniques tends to become
computationally intensive due to increase in the datapoints whereas with under sampling its
vice versa. Oversampling helps in increasing the datapoint of the class or dependent variable
which are less in the original dataset also called as minority class. Under sampling techniques
results in loss of information the datapoints of major dependent variable are reduced to match
the minority class where as Oversampling techniques helps in increasing the information at
hand.
4.4 Performance Evaluation
To understand the model’s performance with respect to each other we have outlined the
following metrics for all of them. Since we identify that our dataset may be imbalanced in
nature, we have also included metrics for understanding the performances of the models under
such conditions.
Confusion Matrix:
Confusion matrix has been used widely to understand the segregation of true positives, false
positives, true negatives and false-negative within the study of classification models. For this
study, the confusion matrix defines the default payments and payments occurring in time.
Following tables illustrates the confusion matrix used in this study.
32

Table 7.0 Confusion Matrix as used in this study
Actual Y

Predicted Y
Default Payment(Y=1)

Payment on Time (Y=0)

Default Payment(Y=1)

True Positive (TP)

False Negative (FN)

Payment on Time (Y=0)

False Positive (FP)

True Negative (TN)

Accuracy:
Accuracy of the classification model is the proportion of correct predictions to the total
number of instances or data points used in the prediction. It is given by formula as below. The
values for the accuracy range from 0 to 1 where 0 indicates the least accuracy and 1 indicates
the highest accuracy of classification for positive and negative values.
Accuracy =

(𝑇𝑇𝑇𝑇+𝑇𝑇𝑇𝑇)

(𝑇𝑇𝑇𝑇+𝑇𝑇𝑇𝑇+𝐹𝐹𝐹𝐹+𝐹𝐹𝐹𝐹)

(2)

Where TP stands for True positives, TN for True Negative, FP for False Positives and FN for
False Negatives.
Sensitivity:
Sensitivity is known as the true positive rate is the proportion of true positives to the total
number of positive instances or positive data points used in the prediction. The values for the
sensitivity range from 0 to 1 where 0 indicates the least sensitivity and 1 indicates the highest
sensitivity and the model is geared towards classifying positive values better.
Sensitivity =

𝑇𝑇𝑃𝑃

(𝑇𝑇𝑇𝑇+𝐹𝐹𝐹𝐹)

(3)

33

Specificity:
Specificity is known as the true negative rate is the proportion of true negative to the total
number of negative instances used in the study.
Specificity =

𝑇𝑇𝑇𝑇

(4)

(𝑇𝑇𝑇𝑇+𝐹𝐹𝐹𝐹)

The above metrics are generally used among all the machine learning and neural network
model evaluation and performance.
In the case of imbalanced datasets or skewed datasets, it is ideal that more appropriate metrics
are used for comparison. The following metrics used in this study will allow for such
comparisons.
Balanced Accuracy:
Balanced accuracy is most commonly used when dealt with imbalanced datasets. It is the
arithmetic mean of sensitivity and specificity for a given model. The values for the balanced
accuracy range from 0 to 1 where 0 indicates the least accuracy and 1 indicates the highest
accuracy of classification for positive and negative values.
Balanced Accuracy =
Geometric Mean:

Specificity+Sensitivity
2

(5)

Geometric Mean or G- mean in this context is defined as the square root of sensitivity and
specificity. The values for the geometric mean range from 0 to 1 where 0 indicates the least
value for Geometric mean and 1 indicates the highest value for geometric mean.
Geometric Mean = �Specificity x Sensitivity

(6)

F1-Score or Balanced F-Score or F- measure:

F1- Score is defined as the harmonic mean of precision and recall characteristics of the model.
The best value is 1 and the worst value is 0. It is given by the below formula.

34

F1=

2 𝑋𝑋 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 𝑋𝑋 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅
(𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃+𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅)

(7)

Precision is the ratio of true positives to total positives including both true and false positives
where recall is the ratio of true positives to true positives and false negatives.
Area Under the Curve (AUC):
The area under the curve is the measurement of the Receiver Operating Characteristic (ROC)
of the model which is calculated from prediction scores. Figure 6.0 portrays an example of a
ROC curve for a classifier. Any classifier which follows the 45-degree line is considered as a
useless classifier. A perfect classifier classifies a default payment as “default” 100% of the
time whereas real-life classifier’s performance lies in between useless and perfect classifiers.

Source: Yang, 2002

Figure 6.0 Illustration of Receiver Operating Characteristics
4.5 Overall Framework
In this study, we have implemented 4 different models using 2 oversampling techniques
and 2 under-sampling techniques as described in the previous sections. Before applying the
models to the dataset, preprocessing of the dataset was undertaken to perform preliminary
analysis and the feature selection procedure was carried out. To understand the feature
importance and use them in further analysis we used logistic regression which is one of the
35

widely used techniques in the feature selection in the literature reviewed. Once the set number
of features is selected based on the output from the logistic regression, the cleaned dataset was
then passed through all the models along with sampling techniques. The following flowchart
presents an outline of the overall framework used in this study.

36

Figure 7.0 Overall Framework used in this study

37

5.0 Results and Analysis
In this chapter analysis of the models' output and their performance are discussed along with
results from different sampling techniques used in this study. Section 5.1 outlines the
preliminary analysis of the raw dataset. Section 5.2 discusses the selection of features using
logistic regression. Section 5.3 outlines the model analysis using a confusion matrix for each
of the models and the sampling techniques. Section 5.4 discusses the results of each model
based on the performance metrics outlined in Chapter 4. Section 5.5 showcases the ROC curve
achieved under each of the models and sampling techniques.
5.1 Preliminary Analysis
To understand the dataset better, a preliminary analysis was conducted on the raw dataset
and several of the descriptive statistics were identified. The descriptive statistics are listed as
shown in the below table. Table 8.0 shows how the dataset is distributed between default and
non – default datapoints. Out of 30,000 records of clients in the dataset, 6,636 have defaulted
in their payments. The percentage of the default records to total records in the dataset used to
conduct this study is at 22.12 %, making the dataset an imbalanced dataset.
Table 8.0 Imbalanced Dataset
Total dataset

30000

default payments

6636

Percentage of default payments

22.12%

Table 9.0 Descriptive Statistics - Age, Sex, Education and Marriage

Count
Mean
Std Dev.
min
25% Conf. Int
50% Conf. Int

SEX
30000
1.6037
0.4891
1
1
2

EDUCATION
30000
1.8531
0.7903
0
1
2

MARRIAGE
30000
1.5519
0.5220
0
1
2

AGE
30000
35.4855
9.2179
21
28
34

75% Conf. Int
Max

2
2

2
6

2
3

41
79

38

Age v/s default payment

0 - Non- Default
1 - Default Customer

1400
1200

Count

1000
800
600
400
200
0

21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75
AGE

Figure 8.0 Age versus default Payment

SEX v/s default payments

0 - Non Default

16000
1 - Default Customers

14000

COUNT

12000
10000
8000
6000
4000
2000
0

Male

Female
SEX

Figure 9.0 Sex versus default payment
Table 9.0 highlights the statistics of the clients regarding their age, sex, education, and
marriage. The average age of the client is over 35 years with the minimum age being 21 and
maximum age at 79, indicating the use of credit cards across different generations. The average
education of clients is more than 1 indicating most of the clients having at least school level
education. Figures 8.0, 9.0, 10.0 and 11.0 depict the count of each category against the default
39

payments which portrays a clearer picture of the different categories in the dataset. In these
figures 0 define the non default customers represented by the blue colour and 1 defines the
default customer represented by orange colour.

14000

Marriage v/s default payments

0 - Non Default
1 - Default Customers

12000

Count

10000
8000
6000
4000
2000
0

Others

Married

Single

Marriage

Figure 10.0 Marriage versus default Payment

Education v/s default payments
12000

0 - Non Default

10000

1 - Default Customers

Count

8000
6000
4000
2000
0

No
Education

Others

High School Graduate University
School
Education

Figure 11.0 Education versus default payments

40

Table 10.0 Descriptive Statistics - Payment status of six months
PAY_0

PAY_2

PAY_3

PAY_4

PAY_5

PAY_6

Count

30000

30000

30000

30000

30000

30000

Mean

-0.0167

-0.1338

-0.1662

-0.2207

-0.2662

-0.2911

Std Dev.

1.1238

1.19719

1.19687

1.16914

1.13319

1.14999

Min

-2

-2

-2

-2

-2

-2

25% Conf. Int -1

-1

-1

-1

-1

-1

50% Conf. Int 0

0

0

0

0

0

75% Conf. Int 0

0

0

0

0

0

8

8

8

8

8

8

Max

Table 11.0 Descriptive Statistics - Amount of Bill Statements over 6 months
BILL_AM
T1

BILL_AM
T2

BILL_AM
T3

BILL_AM
T4

BILL_AM
T5

BILL_AM
T6

30000

30000

30000

30000

30000

30000

Mean
Std
Dev.

51223.33

49179.08

47013.15

43262.95

40311.40

38871.76

73635.86

71173.77

69349.39

64332.86

60797.16

59554.11

min
25%
Conf.
Int
50%
Conf.
Int
75%
Conf.
Int

-165580

-69777

-157264

-170000

-81334

-339603

3558.75

2984.75

2666.25

2326.75

1763

1256

22381.5

21200

20088.5

19052

18104.5

17071

67091

64006.25

60164.75

54506

50190.5

49198.25

max

964511

983931

1664089

891586

927171

961664

Coun
t

Table 10.0 highlights the status of the payments of the clients over the past 6 months and
how much they have delayed in payments of the credit card statements. The lowest value being
-2 and highest value 8 indicating some defaulters haven’t paid the bills for over 8 months. The
mean across the payments holds a negative sign indicating customers who have defaulted for

41

a month or two may have paid the bills as well. This indicates the data consists of different
combinations of the client with the payment status
Table 11.0 indicates the bill statements of the clients over the past 6 months. The average bill
statements across the 6 months have been over $40,000 NT dollars indicating the expenditures
and payments occurring through the credit cards. The maximum bills statements have been
over $90,000 NT dollars highlighting the use of credit cards for expenses.
Table 12.0 Descriptive Statistics - Payment Amounts over 6 months
PAY_AMT1

PAY_AMT2

PAY_AMT3

PAY_AMT4

PAY_AMT5

PAY_AMT6

Count

30000

30000

30000

30000

30000

30000

Mean

5663.58

5921.16

5225.68

4826.08

4799.39

5215.50

Std Dev.

16563.28

23040.87

17606.96

15666.16

15278.31

17777.47

min
25%
Conf. Int
50%
Conf. Int
75%
Conf. Int

0

0

0

0

0

0

1000

833

390

296

252.5

117.75

2100

2009

1800

1500

1500

1500

5006

5000

4505

4013.25

4031.5

4000

max

873552

1684259

896040

621000

426529

528666

Table 12.0 highlights the payments made by clients against their bill statements over the 6
months. The minimum amount paid was 0 indicating clients who have defaulted in payments
and the maximum payments have been in a wide range depending on bills with an average of
over $5000 NT dollars.
5.2 Feature Selections
To eliminate noise in the dataset and to further optimize the importance of the features on
the output variable, we implemented logistic regression on the raw dataset and identified that
out of the 23 features in the raw dataset only 10 features played an important role in the
detection of default payment. Out of the 10 variables, 6 variables were PAY_0 to PAY_6 which
indicates that past repayment status plays a major role in identifying whether the client will
make any future payments. It could also be stated that these repayment statuses will be
42

correlated with the dependent variable. The logistic regression is given by the equation
(8) which includes 23 independent variables and 1 dependent variable.
𝑃𝑃(𝑌𝑌 = 1|𝑋𝑋1 , 𝑋𝑋2 , 𝑋𝑋3 … , 𝑋𝑋23 ) =

𝑒𝑒 𝛽𝛽0+𝛽𝛽1 𝑋𝑋1 +𝛽𝛽2 𝑋𝑋2 +𝛽𝛽3 𝑋𝑋3 ……………+𝛽𝛽23 𝑋𝑋23 +𝜀𝜀𝑡𝑡
1 + 𝑒𝑒𝛽𝛽0 +𝛽𝛽1 𝑋𝑋1 +𝛽𝛽2 𝑋𝑋2 +𝛽𝛽3 𝑋𝑋3 ……………+𝛽𝛽23 𝑋𝑋23 +𝜀𝜀𝑡𝑡

(8)

where β0 is the constant, and β1 , β2 , … … … . , β23 are Coefficients of independent variables.
The independent features are labelled as X1 , X2 , X3 … , X23 , and Y has been defined as the binary
response for the client to be at fault Y = 1 or non-default whenY = 0.
The independent variables are defined by the characteristics of the each of the client’s data

included in this study. These characteristics are outlined in detail in the Table 6.0. The choice
of independent and dependent variable has been made based on these characteristics and by
definition of default. Based on these definitions, in this study, the dependent variable will be
the default payment and independent variables are remaining features as outlined in Table 6.0.
Common types of regression analysis use Mean Squared Error (MSE) as loss function that
gives a convex shape. A complete optimization can be done by finding its vertex as a global
minimum. However, there is no such option for logistic regression. Since the dependent feature
is not continuous, the hypothesis of MSE will result in a non-convex graph with local
minimums. The appropriate loss function for logistic regression is known as Cross Entropy
Loss Function for linear classification models as defined by (Murphy, 2012). Such loss
function also ensures that as the probability of the correct answer is maximized, the probability
of the incorrect answer is minimized; since the two sum to one, any increase in the probability
of the correct answer is coming at the expense of the incorrect answer. The optimized Cross
Entropy Loss Function is reported by MATLAB
Using the coefficients of dependent variables obtained from the logistic regression we plotted
the graph of independent variables against their relative importance. The plot of the relative
importance of the features can be seen in Figure 8.0. Table 13.0 displays the logistic regression
results obtained with the variables as defined in equation 8. The pseudo R-square value of
0.1207 in the table reflects the McFadden’s R-Square as per the documentation of the
programming used for calculating the value of the logistic regression results. Assuming, L0 be
the value of the likelihood function for a model with no predictors, and let Lm be the likelihood
for the model being estimated. McFadden’s R -square is defined as
43

𝑅𝑅2 = 1 − (

ln(𝐿𝐿𝑚𝑚)
ln(𝐿𝐿0)

)

(9)

As per McFadden (1974) a small ratio of the log likelihood indicates that model being
estimated is far better fit than the model with no predictors. Based on the results from this
step, the dataset was reduced to only 10 features which played an important role and was
used for further analysis of the models.

Figure 12.0 Plot of features and their relative importance using logistic regression
Table 13.0 Logistic Regression Results
Model:
Method:
Dep. Variable:
No. Observations:
Df Residuals:
Df Model:
Pseudo R-square:
Log-Likelihood:
LL-Null:
LLR p-value:

Logit
MLE
default payment next month
30000
29976
23
0.1207
-13939
-15853
0.00000
44

5.3 Model Analysis – Confusion Matrix with 10 features
The model analysis is presented with the help of metrics discussed in Chapter 4 Section 4.4.
The following tables give a detailed confusion matrix of the models used in this study. Each
model outlays the true positives, true negatives, false positives, and false negatives as discussed
in the previous sections. These true positives, true negatives, false positives, and false negatives
are generated by the models as we perform the tests on these models once the models are
trained using the training dataset. For the dataset used in this study true positive detection
indicates that the model was able to detect the default payment correctly, true negative
indicates that the model was able to detect the non-default payment correctly, false-positive
indicate that the model was not able to detect the non-default payment correctly and falsenegative indicate that the model was not able to detect the default payment correctly. Table
14.0 gives a detailed confusion matrix for all the sampling techniques for the DNN model used
in this study. All-KNN sampling technique has the highest true positives at 655 instances as
compared to any other sampling technique with this model and SMOTE oversampling has the
least true positives at 331 instances.
Table 14.0 Confusion Matrix - DNN
Model = DNN
Sampling
SMOTE

Actual Y
TRUE
FALSE

Predicted Y
Positive
331
138

Negative
4565
966

SVM SMOTE

TRUE
FALSE

575
412

4291
722

RUS

TRUE
FALSE

516
288

4415
781

ALLKNN

TRUE
FALSE

655
613

4090
642

Table 15.0 outlays the detailed confusion matrix of ANNs with the sampling techniques. As it
can be observed, the All-KNN technique has the highest number of true positives at 783

45

instances as compared to other techniques whereas Random Under Sampling has the least
number of true positives at 496 instances.
Table 15.0 Confusion Matrix - ANN
Model = ANN
Sampling
SMOTE

Actual Y
TRUE
FALSE

Predicted Y
Positive
555
437

Negative
4266
742

SVM SMOTE

TRUE
FALSE

558
455

4248
739

RUS

TRUE
FALSE

496
385

4318
801

ALLKNN

TRUE
FALSE

783
1215

3488
514

Table 16.0 outlays the detailed confusion matrix of SVM- RBF Kernel with the sampling
techniques. As it can be observed, in this model RUS technique has the highest number of true
positives at 775 instances as compared to other techniques whereas All-KNN has the least
number of true positives at 450 instances.
Table 16.0 Confusion Matrix - SVM with RBF Kernel
Model = SVM - RBF Kernel
Actual Y
SMOTE
TRUE
FALSE

Predicted Y
Positive
689
772

Negative
3931
608

SVM SMOTE

TRUE
FALSE

684
682

4021
613

RUS

TRUE
FALSE

775
900

38033
522

ALLKNN

TRUE
FALSE

450
227

4476
847

46

Table 17.0 outlays the detailed confusion matrix of the KNN model with the sampling
techniques. As it can be observed, in this model RUS technique has the highest number of true
positives at 716 instances as compared to other techniques whereas SVM-SMOTE has the least
number of true positives 418 instances.
Out of the 4 models, 2 models have shown the highest number of true positives and number of
true negatives with All-KNN under-sampling techniques establishing that All KNN techniques
detection capabilities are better than the other sampling techniques. KNN model has the highest
number of true positives among all the other models indicating the model’s capabilities to
detect true positives among the models used in this study.
Table 17.0 Confusion Matrix - KNN
Model = KNN
Actual Y
TRUE
FALSE

Predicted Y
Positive
711
1123

Negative
3580
586

SVM SMOTE

TRUE
FALSE

706
1015

3688
591

RUS

TRUE
FALSE

716
942

3761
581

ALLKNN

TRUE
FALSE

418
334

4369
879

SMOTE

Table 18.0 provides the consolidated confusion matrix across all models and sampling
techniques used in this study. The figures have been represented in percentage format to
provide us with a better understanding of sampling techniques and their performance. As true
positives indicate the default clients identified as default, we could observe that All -KNN
technique has performed well with ANN and DNN whereas RUS has performed better with
SVM and KNN in identifying true positives.

47

Confusion Matrix

DNN

ANN

SVM

KNN

Sampling

Actual Y

Positive

Negative

Positive

Negative

Positive

Negative

Positive

Negative

SMOTE

TRUE

5.52%

76.08%

9.25%

71.10%

11.48%

65.52%

11.85%

59.67%

FALSE

2.30%

16.10%

7.28%

12.37%

12.87%

10.13%

18.72%

9.77%

TRUE

9.58%

71.52%

9.30%

70.80%

11.40%

67.02%

11.77%

61.47%

FALSE

6.87%

12.03%

7.58%

12.32%

11.37%

10.22%

16.92%

9.85%

TRUE

8.60%

73.58%

8.27%

71.97%

12.92%

633.88%

11.93%

62.68%

FALSE

4.80%

13.02%

6.42%

13.35%

15.00%

8.70%

15.70%

9.68%

TRUE

10.92%

68.17%

13.05%

58.13%

7.50%

74.60%

6.97%

72.82%

FALSE

10.22%

10.70%

20.25%

8.57%

3.78%

14.12%

5.57%

14.65%

SVM SMOTE

RUS

ALLKNN

Table 18.0 Consolidated Confusion Matrix

48

5.4 Performance Metrics Analysis
Table 18.0 outlays the performance metrics for each of the sampling technique with the
DNN Model. As it can be observed, under most of the sampling technique DNN Model has
been able to give an accuracy of 81% with the RUS-DNN model providing the highest
accuracy at 82.18%. Based on balanced accuracy and G-Mean, All-KNN based DNN model
has the highest performance metrics. All the sampling techniques under the DNN model were
able to achieve an ROC score of 0.70 except for the SVMSMOTE technique which achieved
0.686 ROC score. The average accuracy for the techniques was at 81%, balanced accuracy at
66.16% and ROC score of 0.698.
Table 19.0 Performance Metrics - DNN
Model = DNN

Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the
Curve
Training
Testing

SMOTE
0.8160
0.9707
0.2552
0.6130
0.4977
0.7058
0.2552
0.3749

SVMSMOTE
0.8110
0.9124
0.4433
0.6779
0.6360
0.5826
0.4433
0.5035

RUS
0.8218
0.9388
0.3978
0.6683
0.6111
0.6418
0.3978
0.4912

All
KNN
0.7908
0.8697
0.5050
0.6874
0.6627
0.5166
0.5050
0.5107

0.700
0.701

0.699
0.686

0.705
0.706

0.698
0.698

Average
0.8099
0.9229
0.4003
0.6616
0.6019
0.6117
0.4003
0.4701

0.701
0.698

ROC

Following figures show the ROC Curve for each of the techniques under the DNN Model

49

Figure 13.0 Receiver Operating Characteristics - DNN with SMOTE

Figure 14.0 Receiver Operating Characteristics - DNN with SVM SMOTE

50

Figure 15.0 Receiver Operating Characteristics - DNN with RUS

Figure 16.0 Receiver Operating Characteristics - DNN with All-KNN

51

Table 19.0 outlays the performance metrics for each of the sampling technique with the
ANN Model. As it can be observed, for this model SMOTE, SVM-SMOTE and RUS
techniques were able to give more than 80% accuracy whereas All-KNN lagged in accuracy.
ANN Model has been able to give an accuracy of 80.10% with SVMSMOTE - technique.
Based on balanced accuracy and G-Mean, AllKNN based ANN model has the highest
performance metrics. All the sampling techniques under the ANN model were able to achieve
an ROC score of 0.70 except for the RUS technique which achieved 0.691 ROC score. The
average accuracy for the techniques was at 77.97%, balanced accuracy at 66.43% and ROC
score of 0.703.
Table 20.0 Performance Metrics - ANN
Model = ANN
Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the ROC Curve
Training
Testing

SMOTE
0.8035
0.9071
0.4279
0.6675
0.6230
0.5595
0.4279
0.4849

SVMSMOTE
0.8010
0.9033
0.4302
0.6668
0.6234
0.5508
0.4302
0.4831

RUS
0.8023
0.9181
0.3824
0.6503
0.5925
0.5630
0.3824
0.4555

AllKNN
0.7118
0.7417
0.6037
0.6727
0.6692
0.3919
0.6037
0.4753

Average
0.7797
0.8676
0.4611
0.6643
0.6270
0.5163
0.4611
0.4747

0.707
0.706

0.708
0.707

0.686
0.691

0.708
0.706

0.702
0.703

Following figures show the ROC Curve for each of the techniques under the ANN Model

52

Figure 17.0 Receiver Operating Characteristics - ANN with SMOTE

Figure 18.0 Receiver Operating Characteristics - ANN with SVM SMOTE

53

Figure 19.0 Receiver Operating Characteristics - ANN with RUS

Figure 20.0 Receiver Operating Characteristics - ANN with All-KNN
54

Table 20.0 outlays the performance metrics for each of the sampling technique with the RBF
Kernel-based Support Vector Machine. As it can be observed, under this model All-KNN has
achieved more than 80% accuracy whereas SMOTE and RUS have achieved closer to 77%
accuracy. Based on Balanced Accuracy and G-Mean, RUS has performed much better than
other techniques with this model. All the techniques were able to achieve more than 0.69 of
the Testing - ROC scores except the AllKNN technique which achieved 0.649. Taking the
average on all the techniques, the model was able to achieve more than 78.46% accuracy, with
a balanced accuracy of over 68.18% and 65.19% of G-Mean.
Table 21.0 Performance Metrics - SVM- RBF Kernel
Model = SVM-RBF
Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the
Curve
Training
Testing

SMOTE
0.7700
0.8358
0.5312
0.6835
0.6663
0.4716
0.5312
0.4996

SVMSMOTE
0.7842
0.8550
0.5274
0.6912
0.6715
0.5007
0.5274
0.5137

RUS
0.7630
0.8086
0.5975
0.7031
0.6951
0.4627
0.5975
0.5215

AllKNN
0.8210
0.9517
0.3470
0.6494
0.5747
0.6647
0.3470
0.4559

Average
0.7846
0.8628
0.5008
0.6818
0.6519
0.5249
0.5008
0.4977

0.733
0.684

0.730
0.691

0.721
0.703

0.653
0.649

0.709
0.682

ROC

Following figures show the ROC Curve for each of the techniques under the Support Vector
Machine – RBF Kernel

55

Figure 21.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
SMOTE

Figure 22.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
SVM SMOTE

56

Figure 23.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
RUS

Figure 24.0 Receiver Operating Characteristics - Support Vector Machine - RBF Kernel with
All-KNN

57

Table 21.0 outlays the performance metrics for each of the sampling technique with the KNN.
KNN was able to achieve the least accuracy of all models, even in terms of balanced accuracy
and G-Mean. Within the techniques, it could be observed that AllKNN has performed better
than other sampling techniques with 79.78% and SMOTE has the least accuracy at 71.52%.
On average of all the techniques, the KNN model was able to achieve 74.79% accuracy. All
the techniques have performed differently in terms of the Testing-ROC score. Oversampling
techniques have scored more than 0.65 whereas under-sampling techniques have scored more
than 0.67 except the AllKNN technique with 0.626 ROC score.
Table 22.0 Performance Metrics - KNN
Model = KNN
Accuracy
Specificity
Sensitivity
Balanced Accuracy

SMOTE
0.7152
0.7612
0.5482
0.6547
0.6460
0.3877
0.5482
0.4542

Geometric Mean
Precision
Recall
F1
Area Under the ROC
Curve
0.750
Training
0.655
Testing

SVMSMOTE
0.7323
0.7842
0.5443
0.6643
0.6533
0.4102
0.5443
0.4679

RUS
0.7462
0.7997
0.5520
0.6759
0.6644
0.4318
0.5520
0.4846

AllKNN
0.7978
0.9290
0.3223
0.6257
0.5472
0.5559
0.3223
0.4080

Average
0.7479
0.8185
0.4917
0.6551
0.6277
0.4464
0.4917
0.4537

0.747
0.664

0.710
0.676

0.638
0.626

0.711
0.655

Following figures show the ROC Curve for each of the techniques under the KNN Model

58

Figure 25.0 Receiver Operating Characteristics - KNN with SMOTE

Figure 26.0 Receiver Operating Characteristics - KNN with SVM SMOTE

59

Figure 27.0 Receiver Operating Characteristics - KNN with RUS

Figure 28.0 Receiver Operating Characteristics - KNN with All-KNN

60

Table 23.0 Consolidated Accuracies of the Models
Sampling

SMOTE

SVMSMOTE

RUS

All KNN

Average

DNN - Accuracy

0.8160

0.8110

0.8218

0.7908

0.8099

ANN - Accuracy

0.8035

0.8010

0.8023

0.7118

0.7797

SVM - Accuracy

0.7700

0.7842

0.7630

0.8210

0.7846

KNN - Accuracy

0.7152

0.7323

0.7462

0.7978

0.7479

Table 24.0 Consolidated Balanced Accuracies of the Models
Sampling

SMOTE

SVMSMOTE

RUS

All KNN

Average

DNN - BA

0.6130

0.6779

0.6683

0.6874

0.6616

ANN - BA

0.6675

0.6668

0.6503

0.6727

0.6643

SVM - BA

0.6835

0.6912

0.7031

0.6494

0.6818

KNN - BA

0.6547

0.6643

0.6759

0.6257

0.6551

Table 25.0 Consolidated ROC Scores of the Models
Model

Sampling

SMOTE

SVMSMOTE

RUS

All KNN

Average

DNN - ROC

Training

0.700

0.699

0.705

0.698

0.701

Testing

0.701

0.686

0.706

0.698

0.698

Training

0.707

0.708

0.686

0.708

0.702

Testing

0.706

0.707

0.691

0.706

0.703

Training

0.733

0.730

0.721

0.653

0.709

Testing

0.684

0.691

0.703

0.649

0.682

Training

0.750

0.747

0.710

0.638

0.711

Testing

0.655

0.664

0.676

0.626

0.655

ANN - ROC

SVM - ROC

KNN - ROC

61

Table 23.0, 24.0 and 25.0 provides the consolidated accuracies, balanced accuracies and
ROC scores of the models and the sampling techniques. Based on these tables we could identify
that in terms of accuracies DNN and ANN has performed better whereas SVM has performed
better in terms of Balanced accuracy. ROC scores of ANN and DNN models are much better
as compared to SVM and KNN. To understand the framework and to study the effect of the
remaining features on the models we applied all the models and sampling techniques to the
dataset with 23 independent features and 1 dependent variable. The following tables and
figures will outlay the confusion matrices, performance metrics and ROC curves for each of
the models and sampling techniques.
5.5 Confusion Matrices with 23 features
Table 26.0 gives a detailed confusion matrix for all the sampling techniques for the DNN
with 24 features. As it can be observed that introducing the remaining features has introduced
noise in the dataset increasing the loss functions in the DNN Model. The model was not able
to recognize any true positives across all the sampling techniques used. As it can be seen
consistently across all the sampling techniques, we can conclude that additional features have
taken away the ability of the model to detect default payments accurately.
Table 26.0 Confusion Matrix with 23 features - DNN
Model = DNN
Sampling
SMOTE

Actual Y
TRUE
FALSE

Predicted Y
Positive
0
0

Negative
4703
1297

SVM SMOTE

TRUE
FALSE

0
0

4703
1297

RUS

TRUE
FALSE

0
0

4703
1297

ALLKNN

TRUE
FALSE

0
0

4703
1297

Table 27.0 gives a detailed confusion matrix for all the sampling techniques for the ANN
with 23 features. The results of the confusion matrix for both ANN Model and the DNN model
62

has been consistent across all sampling techniques. The model was not able to recognize any
true positives across all the sampling techniques used. One of the reasons for these results may
also be due to the use of the same activation functions in both the ANN and DNN Model. This
will require further investigation into the activation and loss functions of both the models
which are currently out of scope for this study.
Table 27.0 Confusion Matrix with 23 features - ANN
Model = ANN
Sampling
SMOTE

Actual Y
TRUE
FALSE

Predicted Y
Positive
0
0

Negative
4703
1297

SVM SMOTE

TRUE
FALSE

0
0

4703
1297

RUS

TRUE
FALSE

0
0

4703
1297

ALLKNN

TRUE
FALSE

0
0

4703
1297

Table 28.0 outlays the detailed confusion matrix of SVM- RBF Kernel with the 24 features.
SVM has been able to improve on the introduction of the features but was not successful as
with 10 features. As it can be observed, in this model RUS technique has the highest number
of true positives at 67 instances as compared to other techniques whereas All-KNN has the
least number of true positives at 10 instances. All the sampling techniques have more 4600
true negatives which indicate the effect on the model due to features introduction.

63

Table 28.0 Confusion Matrix with 23 features - SVM with RBF Kernel
Model = SVM - RBF Kernel
Actual Y
SMOTE
TRUE
FALSE

Predicted Y
Positive
52
67

Negative
4636
1245

SVM SMOTE

TRUE
FALSE

48
59

4644
1249

RUS

TRUE
FALSE

67
86

4617
1230

ALLKNN

TRUE
FALSE

10
8

4695
1287

Table 29.0 outlays the detailed confusion matrix of KNNs with the 24 features. KNN has
shown much better results in terms of the true positive detection as compared to the other
models but was not as successful as with only 10 features. As it can be observed, in this model
RUS technique has the highest number of true positives at 716 instances as compared to other
techniques whereas All-KNN has the least number of true positives at 94 instances. All-KNN
sampling technique under this model has the greatest number of true negatives at 4604 whereas
other sampling techniques have more than 3000 true negatives. Out of the 4 models, DNN and
ANN are the most affected models due to the introduction of additional features whereas KNN
is the least affected model.
Table 29.0 Confusion Matrix with 23 features -KNN
Model = KNN
Actual Y
TRUE
FALSE

Predicted Y
Positive
649
1614

Negative
3089
648

SVM SMOTE

TRUE
FALSE

550
1245

3458
747

RUS

TRUE
FALSE

716
1649

3054
581

ALLKNN

TRUE
FALSE

94
99

4604
1203

SMOTE

64

Table 30.0 Consolidated Confusion Matrix - 23 features
Confusion Matrix

DNN

SMOTE

TRUE

0.00%

78.38%

0.00%

78.38%

0.87%

77.27%

10.82%

51.48%

FALSE

0.00%

21.62%

0.00%

21.62%

1.12%

20.75%

26.90%

10.80%

TRUE

0.00%

78.38%

0.00%

78.38%

0.80%

77.40%

9.17%

57.63%

FALSE

0.00%

21.62%

0.00%

21.62%

0.98%

20.82%

20.75%

12.45%

TRUE

0.00%

78.38%

0.00%

78.38%

1.12%

76.95%

11.93%

50.90%

FALSE

0.00%

21.62%

0.00%

21.62%

1.43%

20.50%

27.48%

9.68%

TRUE

0.00%

78.38%

0.00%

78.38%

0.17%

78.25%

1.57%

76.73%

FALSE

0.00%

21.62%

0.00%

21.62%

0.13%

21.45%

1.65%

20.05%

ALLKNN

Positive

Negative

Positive

KNN

Actual Y

RUS

Negative

SVM

Sampling

SVM SMOTE

Positive

ANN

Negative

Positive

Negative

65

5.6 Performance Metrics with 23 features
Table 31.0 outlays the performance metrics for the DNN Model with 23 features. As it can
be observed, we have received consistent accuracy of 78.38% mainly due to true negatives
with sensitivity at 1.000 and balanced accuracy at 50%. ROC score for all the techniques has
been flat 0.50 which indicates that with 23 features DNN Model is a useless classifier and
cannot be used for further applications.
Table 31.0 Performance Metrics with 23 features - DNN
Model = DNN
SMOTE
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the ROC
Curve
0.500
Training
0.500
Testing

SVMSMOTE
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

RUS
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

AllKNN
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

Average
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

0.500
0.500

0.500
0.500

0.500
0.500

0.500
0.500

Following figures show the ROC Curve for each of the techniques under the DNN Model with
23 features

66

Figure 29.0 Receiver Operating Characteristics with 23 features - DNN with SMOTE

Figure 30.0 Receiver Operating Characteristics with 23 features - DNN with SVM SMOTE

67

Figure 31.0 Receiver Operating Characteristics with 23 features - DNN with RUS

Figure 32.0 Receiver Operating Characteristics with 23 features - DNN with All-KNN
Table 32.0 outlays the performance metrics for the ANN Model with 23 features. As it can
be observed, we have received consistent accuracy of 78.38% mainly due to true negatives,
with sensitivity at 1.000 and balanced accuracy at 50%. ROC score for all the techniques has
been flat 0.50 which indicates that with 24 features ANN Model is a useless classifier and
cannot be used for further applications. As mentioned before both ANN and Deep Neural
model has shown similar characteristics with respect to the introduction of features, indicating
68

that these models will require further study on their behaviour towards activation and loss
functions. The major reason behind both ANN and DNN gives out similar results is that the
error rate for both these models converges to the same values using 23 features.
Table 32.0 Performance Metrics with 23 features - ANN
Model = ANN
SMOTE
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the ROC Curve
0.500
Training
0.500
Testing

SVMSMOTE
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

RUS
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

AllKNN
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

Average
0.7838
1.0000
0.0000
0.5000
0.0000
0.0000
0.0000
0.0000

0.500
0.500

0.500
0.500

0.500
0.500

0.500
0.500

Following figures show the ROC Curve for each of the techniques under the ANN Model with
23 features

Figure 33.0 Receiver Operating Characteristics with 23 features - ANN with SMOTE

69

Figure 34.0 Receiver Operating Characteristics with 23 features - ANN with SVM SMOTE

Figure 35.0 Receiver Operating Characteristics with 23 features - ANN with RUS

70

Figure 36.0 Receiver Operating Characteristics with 23 features - ANN with All-KNN
Table 33.0 outlays the performance metrics for the RBF Kernel-based Support Vector
Machine with 23 features. As it can be observed, under this model All-KNN has achieved more
than 78.42% accuracy whereas SMOTE and RUS have achieved closer to 78.1% accuracy.
Based on Balanced Accuracy and G-Mean, RUS has performed much better than other
techniques with this model. ROC score indicates that the performance of the classifier in terms
of such an imbalanced dataset. A huge difference between the training and the testing ROC
score indicates that the model is overfitting due to the use of sampling techniques. In this case,
as we can observe the training ROC for most of the techniques except All-KNN are closing in
at 0.99 and training score at 0.51, the models were overfitted. All-KNN technique is the only
exception with a closer gap between the training and the testing ROC score but the score with
this technique is closer to 0.50 as well indicating the model with the technique cannot be used
for further application.

71

Table 33.0 Performance Metrics with 23 features - SVM - RBF Kernel
Model = SVM-RBF
Accuracy
Specificity
Sensitivity
Balanced Accuracy

SMOTE
0.7813
0.9858
0.0401
0.5130
0.1988
0.4370
0.0401
0.0734

Geometric Mean
Precision
Recall
F1
Area Under the ROC Curve
0.994
Training
0.513
Testing

SVMSMOTE
0.7820
0.9875
0.0370
0.5123
0.1911
0.4486
0.0370
0.0684

RUS
0.7807
0.9817
0.0517
0.5167
0.2253
0.4379
0.0517
0.0924

AllKNN
0.7842
0.9983
0.0077
0.5030
0.0877
0.5556
0.0077
0.0152

Average
0.7821
0.9883
0.0341
0.5112
0.1757
0.4698
0.0341
0.0624

0.994
0.512

0.992
0.517

0.527
0.503

0.877
0.511

Following figures show the ROC Curve for each of the techniques under the SVM - RBF
Kernel with 23 features

Figure 37.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with SMOTE

72

Figure 38.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with SVM SMOTE

Figure 39.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with RUS

73

Figure 40.0 Receiver Operating Characteristics with 23 features - Support Vector Machine RBF Kernel with All-KNN

Table 34.0 outlays the performance metrics for the KNN with 23 features. As it can be
observed, under this model All-KNN has achieved more than 78.30% accuracy whereas
SMOTE and RUS have achieved closer to 62% accuracy. Based on Balanced Accuracy and
G-Mean, RUS has performed much better than other techniques with this model. ROC score
for the oversampling techniques with this model shown a smaller gap between the training and
the testing score as compared to the SVM. Under-sampling techniques have shown a much
lesser gap and have been able to achieve a nearer score in both training and testing ROC. Along
with accuracy, balanced accuracy and G-Mean, KNN along with RUS has shown to be useful
classifier as compared to other techniques.

74

Table 34.0 Performance Metrics with 23 features - KNN
Model = KNN
SMOTE
0.6230
0.6568
0.5004
0.5786
0.5733
0.2868
0.5004
0.3646

Accuracy
Specificity
Sensitivity
Balanced Accuracy
Geometric Mean
Precision
Recall
F1
Area Under the ROC
Curve
0.750
Training
0.579
Testing

SVMSMOTE
0.6680
0.7353
0.4241
0.5797
0.5584
0.3064
0.4241
0.3558

RUS
0.6283
0.6494
0.5520
0.6007
0.5987
0.3027
0.5520
0.3910

AllKNN
0.7830
0.9789
0.0725
0.5257
0.2664
0.4870
0.0725
0.1262

Average
0.6756
0.7551
0.3873
0.5712
0.4992
0.3457
0.3872
0.3094

0.727
0.58

0.653
0.601

0.528
0.526

0.665
0.572

Following figures show the ROC Curve for each of the techniques under the KNN with 23
features

Figure 41.0 Receiver Operating Characteristics with 23 features - KNN with SMOTE

75

Figure 42.0 Receiver Operating Characteristics with 23 features - KNN with SVM SMOTE

Figure 43.0 Receiver Operating Characteristics with 23 features - KNN with RUS

76

Figure 44.0 Receiver Operating Characteristics with 23 features - KNN with All-KNN

Table 35.0 Consolidated Accuracies of the Models - 23 features
Sampling

SMOTE

SVMSMOTE

RUS

All KNN

Average

DNN - Accuracy

0.7838

0.7838

0.7838

0.7838

0.7838

ANN - Accuracy

0.7838

0.7838

0.7838

0.7838

0.7838

SVM - Accuracy

0.7813

0.7820

0.7807

0.7842

0.7821

KNN - Accuracy

0.6230

0.6680

0.6283

0.7830

0.6756

Table 36.0 Consolidated Balanced Accuracies of the Models – 23 features
Sampling

SMOTE

SVMSMOTE

RUS

All KNN

Average

DNN - Accuracy

0.7838

0.7838

0.7838

0.7838

0.7838

ANN - Accuracy

0.7838

0.7838

0.7838

0.7838

0.7838

SVM - Accuracy

0.7813

0.7820

0.7807

0.7842

0.7821

KNN - Accuracy

0.6230

0.6680

0.6283

0.7830

0.6756

77

6.0 Implications and Conclusion
In this chapter, policies and implications for the use of machine learning and artificial
intelligence in the banking sector and financial institutions have been discussed with a focus
on the Canadian Banking sector. Section 6.1 discusses the policy implications and the
development of a robust framework for unified implementation across the financial institutions
in Canada. Section 6.3 outlines future work. Section 6.4 highlights the key contributions of the
study. Section 6.5 concludes the study.
6.1 Policy Implications regarding the use of machine learning in Canada
Being one of the first national governments to establish a pan-national AI strategy, the
Canadian government is at the forefront of bringing AI into applications than other national
governments. The major aim of establishing the Pan-Canadian Artificial Intelligence Strategy
was to increase the number of researches on AI and skilled graduates in the domain of AI and
machine learning (CIFAR, 2019), to develop policies and thought leadership on economic,
ethical and legal implications regarding the developments in the field of artificial intelligence
(CIFAR, 2019). As a part of the efforts, recently CIFAR, the institution responsible for leading
the strategy has increased the AI research Chair across Canada to 80 from 46 within the year
2019 itself.
The initiatives from the government have also been extended in terms of the establishment
of the superclusters across Canada for implementing high-tech, AI-based applications for
supporting different business functionalities. Out of these superclusters, the technology
superclusters are located in the province of British Columbia for enhancing the applications of
AI. Although the government has made headway in applying the knowledge of AI for a better
business environment, but there have been limited responses from the private partners of the
domain. It was not until late 2019, Canadian Banks has developed or implemented AI
technologies in their systems in some or the other form, with Royal Bank of Canada (RBC)
being the leader in the domain. RBC has also been a keen supporter of the government’s
CIFAR initiatives.

78

Bank of Canada’s stance on machine learning and AI has been limited to research. Being
the central bank, they could play a more developmental role in establishing a more robust
framework for the application of artificial intelligence and machine learning in banking
institutions. Through Partnerships in Innovation and technology program (PIVOT) Bank of
Canada has been able to generate interest in developing innovative technologies for them but
there have been limited applications of those technologies in actual business scenarios. As a
central bank, it is of understanding they should consider that formulation of the future monetary
policies may have a major impact due to the use of AI and machine learning in business.
Poloz (2019) in his discussion paper outlines how economies and monetary policies may
drastically change in terms of implementation due to the fourth industrial revolution which
calls for widespread application of machine learning and AI. Using Terms of Trade Economic
Model, the author identified that real-time positive technology shock can lead to economic
expansion and maintain downward pressure on the inflation targets. The technology in question
has been the application of AI and machine learning in the economy. Poloz (2019) also
discusses how the model has taken into account the major financial vulnerabilities faced by the
central bank and the risk associated with macroeconomic factors.
The government of Canada and the Bank of Canada both are making headway towards the
application of AI and machine learning in different parts of the economy. One would call for
a more robust framework which can bring changes in the fundamental parts of the financial
institutions like personal risk management for the credit instruments like the one we have used
in this study. This would require collaborative actions from the banks operating in Canada
along with the central bank being at the center stage of this framework implementation. To
implement AI and machine learning models in such applications, the framework should also
take into consideration the privacy and security of the datasets consisting of client information.
The robust framework for creating such changes can be achieved through public-private
partnerships and through a common understanding of the needs of the institutions participating
in the framework.
6.2 Future Work
To identify the feature’s importance Logistic Regression was used in the pre-processing
stage and several of the features were discarded from further analysis. More robust feature
79

selection procedures can be implemented for the selection of features in conjunction with the
DNN Model proposed in the study. It is of understanding that not all the discarded features
may play an important role but feature selection can play a vital role in the output variable.
The dataset used in the study had 30,000 different client information. To understand the
complete working of the DNN Model proposed in the study, a larger dataset of the order of
millions of records will help in further analyzing the model. A larger dataset can also help in
understanding how fast the proposed model can help in getting the output as compared to the
different models from the literature of credit risk assessment.
To realize the importance of DNN it is imperative that more similar studies will be required
using different credit instruments like home mortgages, line of credits and vehicle loans.
Comparative studies between two different datasets can also help in analyzing the model
further.
6.3 Key Contributions
Some of the primary contributions of this study are as follows:
1. DNN Model proposed in the study has been able to achieve 81% accuracy with a ROC
score of 0.70
2. Application of 4 different sampling techniques along with 4 different models for the study
in credit risk assessment along with two sampling techniques to be used for the first time
in credit risk assessment to the best of knowledge
3. Apart from SMOTE and RUS, All-KNN and SVM SMOTE are equally powerful
sampling techniques under different models and scenarios as studied under this thesis
Some of the secondary contributions of this study are as follows:
1. Use of K-NN in the comparative study as through literature reviews it was identified that
K-NN is the least studied model in credit risk assessment, although it is one of the base
classifiers in the field of machine learning.
2. The proposition of a new framework for the widespread application and implementation
of machine learning and artificial intelligence in the Canadian financial sector.

80

6.4 Practical Insights
Application of machine learning and DNNs can help the financial institutions in predicting
the counterparty risk failure as we have seen in this study. Assuming the model is applied in
the real-life scenario, the loss due to credit card delinquency can be reduced considerably. As
per Mckinsey’s Global Institute research on credit risk management (MGI, 2017), application
of machine learning and advanced analytics can help financial institutions in three different
ways. Firstly, by potential improvement in the revenue due to early detection of credit risk or
counterparty risk. Secondly by saving potential money in cost reduction due to detection of
potential fraud customers in the application of process of credit instruments such as credit
cards. Thirdly by saving money which were previously employed in the risk mitigation
strategies surrounding the credit risk management. At each of these stages’ financial
institutions, can save up to 10 to 15% of the potential value in revenue which in combination
reduces the losses up to 30 to 35% by application of advanced analytical tools in credit risk
management (Bahillo et al, 2016). Further application of advanced analytical models can help
banks in improving their return on equity by approximately up to 4% (Harle, Havas &
Samandari, 2015).
Canadian Bankers Association reported that over 600,000 credit cardholders were
delinquents in 2018 (CBA,2018) with a net loss of approximately CAD $4.38 billion dollars
as the net dollar value for credit cards transactions alone were at CAD $547.98 billion dollars.
This dataset comprises for all the credit card issuing institutions in Canada. The delinquency
rate for 2018 was at 0.8% (CBA,2018) which gives us the total loss value and total delinquent
card holders. By the application of machine learning models, it can be brought down to CAD
$2 to $3 billion dollars approximately if we apply the potential reduction percentages as stated
by Bahillo et al (2016). This understanding and the application of the DNN based models can
have profound impacts on the bottom line of the major financial institutions.
Considering a loss of CAD $4.38 billion dollars with 600,000 card holders, the average loss
per card holder to the financial institutions can be approximated to be CAD $7300 dollars
annually. Assuming the model applied in this study is applied to identity these 600,000 card
holders in the earlier stage, at 82.18% accuracy 493,080 card holders will be classified as

81

delinquents. The savings would be approximately CAD $3.6 (493,080 x 7300) billion dollars
to the financial institutions if these delinquent card holders are detected at the earlier stage.
Financial institutions like major banks and credit agencies can combine the application of
models and computing powers to develop algorithms that can detect credit card delinquency
with better accuracy. Being at the expense of the personal and more accurate information of
the clients can also provide these institutions to accurately choose the required features to
detect the default payments. Application of DNN models can only provide the required results
if provided with the appropriate features to predict the dependent feature, in this study, it was
the default payment for the next month. The choice of features creates a profound impact on
the application of the DNN models as features with the least significant importance can result
in noise and an increase in the error rates where as significant features can increase the accuracy
rates as we have observed in this study.
6.5 Conclusion
One of the primary capabilities of a robust risk management system must be detecting the
risks earlier, though many of the bank systems today lack this key capability which leads to
further losses (MGI, 2017). This thesis was able to contribute to this gap by proposing a DNN
model to be used along with sampling techniques for imbalanced datasets. The proposed model
was able to achieve 82.18% accuracy with the use of the RUS sampling technique and a ROC
score of 0.706. As a direct comparison with the models used by Hamori et al (2018) since they
used the same dataset, our models and techniques have much better accuracy as they were only
able to achieve 69.17% average accuracy in testing. Comparing with other models used in the
literature, since many of them lacked the use of sampling techniques in one way or the other,
this study could not place a direct comparison. Being said that at 82.18% accuracy and 0.706
ROC score, the DNN model proposed in this study can be concluded to be used as a real-life
classifier in predicting credit risk assessment. Further, the application of such techniques and
models will require the construction of a robust framework through a public-private
partnership in the Canadian financial sector.

82

References
2019 Global payments trends report – Canada Country Insights. (2019). Retrieved from
https://www.jpmorgan.com/merchant-services/insights/reports/Canada
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., … Zheng, X. (2016).
TensorFlow: A system for large-scale machine learning. Retrieved from
https://search.ebscohost.com/login.aspx?direct=true&db=edsarx&AN=edsarx.1605.08695&s
ite=eds-live
Abdelmoula, A. K. (2015). Bank credit risk analysis with k-nearest neighbor classifier: Case
of Tunisian banks. Accounting & Management Information Systems / Contabilitate Si
Informatica de Gestiune, 14(1), 79–106. Retrieved from https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=bth&AN=102300233&site=eds-live
An Experiment with the Edited Nearest-Neighbor Rule. (1976). IEEE Transactions on
Systems, Man, and Cybernetics, Systems, Man and Cybernetics, IEEE Transactions on, IEEE
Trans. Syst., Man, Cybern, SMC-6(6), 448–452. https://doiorg.ezproxy.tru.ca/10.1109/TSMC.1976.4309523
Basel I: International Convergence of Capital Measurement and Capital Standards (1988,
July 15). Retrieved from https://www.bis.org/publ/bcbs04a.htm
Bahillo, J. A., Ganguly, S., Kremer, A., & Kristensen, I. (2016). The value in digitally
transforming credit risk management. Retrieved from https://www.mckinsey.com/businessfunctions/risk/our-insights/the-value-in-digitally-transforming-credit-risk-management
Basel I: International Convergence of Capital Measurement and Capital Standards (1988, July
15). Retrieved from https://www.bis.org/publ/bcbs04a.htm
Basel II: International Convergence of Capital Measurement and Capital Standards: a Revised
Framework. (2004, June 10). Retrieved from https://www.bis.org/publ/bcbs107.htm
Basel III: A global regulatory framework for more resilient banks and banking systems revised version June 2011. (2011, June 1). Retrieved from
https://www.bis.org/publ/bcbs189.htm
Canadian demands for speed and convenience influencing payments innovation. (2018,
December 12). Retrieved from https://www.payments.ca/industry-info/our-research/canadiandemands-speed-and-convenience-influencing-payments-innovation
Canadians rapidly adopting new payments channels. (2019, December 5). Retrieved from
https://www.payments.ca/industry-info/our-research/canadians-rapidly-adopting-newpayments-channels
CBA - Credit card statistics. (2019, July 18). Retrieved from https://cba.ca/credit-cardstatistics
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2011). SMOTE: Synthetic
Minority Over-sampling Technique. https://doi-org.ezproxy.tru.ca/10.1613/jair.953
83

CIFAR - Pan-Canadian Artificial Intelligence Strategy. (2019).
https://www.cifar.ca/ai/pan-canadian-artificial-intelligence-strategy

Retrieved from

Cover, T., & Hart, P. (1967, January). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13(1), 21–27. doi: 10.1109/TIT.1967.1053964
BAYRACI, S., & SUSUZ, O. (2019). A Deep Neural Network (DNN) based classification
model in application to loan default prediction. Theoretical & Applied Economics, (4), 75–84.
Retrieved
from
https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=bth&AN=140243898&site=eds-live
Cao, J., Lu, H., Wang, W., & Wang, J. (2013). A loan default discrimination model using costsensitive support vector machine improved by PSO. Information Technology & Management,
14(3), 193–204. https://doi-org.ezproxy.tru.ca/10.1007/s10799-013-0161-1
Chen, S., Härdle, W. K., & Moro, R. A. (2011). Modeling default risk with support vector
machines. Quantitative Finance, 11(1), 135-154. doi: 10.1080/14697680903410015
Cimpoeru, S. S. (2011). Neural Networks and Their Application in Credit Risk Assessment.
Evidence from the Romanian Market. Technological & Economic Development of Economy,
17(3), 519–534. https://doi-org.ezproxy.tru.ca/10.3846/20294913.2011.606339
Danenas, P., & Garsva, G. (2015). Selection of Support Vector Machines based classifiers for
credit risk domain. Expert Systems With Applications, 42(6), 3194–3204. https://doiorg.ezproxy.tru.ca/10.1016/j.eswa.2014.12.001
Equifax
history
(2018).
Retrieved
https://www.equifax.co.uk/resources/what_we_do/credit-experts-since-1899.html

from

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A
Library for Large Linear Classification. JOURNAL OF MACHINE LEARNING
RESEARCH,
9,
1871–1874.
Retrieved
from
https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=edswsc&AN=000262636800009&site=edslive
Fix. E., & Hodges, Jr., J. L. (1951, February). Discriminatory analysis, nonparametric
discrimination. Retrieved from https://apps.dtic.mil/dtic/tr/fulltext/u2/a800276.pdf
Harris, T. (2015). Credit scoring using the clustered support vector machine. Expert Systems
with Applications, 42(2), 741–750. https://doi-org.ezproxy.tru.ca/10.1016/j.eswa.2014.08.029
Härle, P., Havas, A., & Samandari, H. (2015). The future of bank risk management. Retrieved
from
https://www.mckinsey.com/business-functions/risk/our-insights/the-future-of-bankrisk-management
Haykin, S. S. (1998). Neural networks: a comprehensive foundation. Upper Saddle River, NJ:
Prentice-Hall, c1998.
Hien M. Nguyen, Eric W. Cooper, and Katsuari Kamei. 2011. Borderline over-sampling for
imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradigm. 3, 1 (April 2011), 4–
21. DOI:https://doi.org/10.1504/IJKESDP.2011.039875

84

KARAA, A., & KRICHENE, A. (2012). Credit-Risk Assessment Using Support Vectors
Machine and Multilayer Neural Network Models: A Comparative Study Case of a Tunisian
Bank. Accounting & Management Information Systems / Contabilitate Si Informatica de
Gestiune, 11(4), 587–620.
Kasabov, N. K. (1996). Foundations of Neural Networks, Fuzzy Systems, and Knowledge
Engineering. Cambridge, Mass: MIT Press. Retrieved from https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=nlebk&AN=1810&site=eds-live
Khashman, A. (2010). Neural networks for credit risk evaluation: Investigation of different
neural models and learning schemes. Expert Systems with Applications.
https://doi.org/10.1016/j.eswa.2010.02.101
Khemakhem, S., & Boujelbènea, Y. (2015). Credit risk prediction: A comparative study
between discriminant analysis and the neural network approach. Accounting & Management
Information Systems / Contabilitate Si Informatica de Gestiune, 14(1), 60–78
Kim, A., Yang, Y., Lessmann, S., Ma, T., Sung, M.-C., & Johnson, J. E. V. (2020). Can Deep
Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting.
European Journal of Operational Research, 283(1), 217–234
Kvamme, H., Sellereite, N., Aas, K., & Sjursen, S. (2018). Predicting mortgage default using
convolutional neural networks. Expert Systems With Applications, 102, 207–217. https://doiorg.ezproxy.tru.ca/10.1016/j.eswa.2018.02.029
Massaron, L., & Boschetti, A. (2016). Regression Analysis with Python. Packt Publishing
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P.
Zarembka (Ed.), Frontiers in econometrics (pp. 104-142). New York, NY: Academic Press
Murphy, K. P. (2012). Machine learning: A Probabilistic Perspective. Cambridge, MA: MIT
Press
Oreski, S., Oreski, D., & Oreski, G. (2012). Hybrid system with genetic algorithm and artificial
neural networks and its application to retail credit risk assessment. Expert Systems With
Applications,39(16),12605–12617. https://doi-org.ezproxy.tru.ca/10.1016/j.eswa.2012.05.023
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay,
É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research,
12(10),
2825–2830.
Retrieved
from
https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=bth&AN=70109929&site=eds-live
Poloz, S. (2019, November 14). Technological Progress and Monetary Policy: Managing the
Fourth Industrial Revolution. Retrieved from https://www.bankofcanada.ca/2019/11/staffdiscussion-paper-2019-11
Reed, R. D., & Marks, R. J. (1999). Neural Smithing: Supervised Learning in Feedforward
Artificial Neural Networks. Cambridge, Mass: A Bradford Book. Retrieved from
https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=nlebk&AN=9366&site=eds-live

85

Schmidhuber, J. (2014). Deep Learning in Neural Networks: An Overview. https://doiorg.ezproxy.tru.ca/10.1016/j.neunet.2014.09.003
Shigeyuki Hamori, Minami Kawai, Takahiro Kume, Yuji Murakami, & ChikaraWatanabe.
(2018). Ensemble Learning or Deep Learning? Application to Default Risk Analysis. Journal
of
Risk
&
Financial
Management,
11(1),
1.
https://doiorg.ezproxy.tru.ca/10.3390/jrfm11010012
Sun, T., & Vasarhelyi, M. A. (2018). Predicting credit card delinquencies: An application of
deep neural networks. Intelligent Systems in Accounting, Finance & Management, 25(4), 174–
189. https://doi-org.ezproxy.tru.ca/10.1002/isaf.1437
Vapnik, V.N. (2000). The nature of statistical learning theory (2nd ed). New York: Springer
W. E. Henley, & D. J. Hand. (1996). A $k$-Nearest-Neighbour Classifier for Assessing
Consumer Credit Risk. Journal of the Royal Statistical Society. Series D (The Statistician),
45(1), 77. https://doi-org.ezproxy.tru.ca/10.2307/2348414
Yannis Marinakis, Magdalene Marinaki, Michael Doumpos, Nikolaos Matsatsinis, &
Constantin Zopounidis. (2008). Optimization of nearest neighbor classifiers via metaheuristic
algorithms for credit risk assessment. Journal of Global Optimization, 42(2), 279-293.
Retrieved from https://search-ebscohostcom.ezproxy.tru.ca/login.aspx?direct=true&db=edb&AN=34205028&site=eds-live
Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for
consumer credit scoring. 2018 International Conference on Artificial Intelligence and Big
Data (ICAIBD), Artificial Intelligence and Big Data (ICAIBD), 2018 International
Conference On, 205–208. https://doi-org.ezproxy.tru.ca/10.1109/ICAIBD.2018.8396195

86