THOMPSON RIVERS UNIVERSITY

Skin Cancer Detection Using Deep Learning

By Saeid Moradi

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in Data Science

KAMLOOPS, BRITISH COLUMBIA
August, 2024

SUPERVISOR
Dr. Mateen Shaikh

ABSTRACT

Skin cancer is currently one of the most prevalent cancer types worldwide, underscoring the significance of early detection and precise diagnosis for effective treatment. This study employs the HAM10000 dataset, comprising 10,015 skin lesion instances across seven categories of pigmented skin lesions. Preprocessing techniques are applied, including image resizing and normalization, and data augmentation is implemented to address dataset imbalances. The research primarily employs supervised machine learning models for skin cancer detection, utilizing Convolutional Neural Networks (CNNs). Specifically, VGG16, VGG19, ResNet50, MobileNet, MobileNetV2, and MobileNetV3 are examined for their performance on the dataset. Results indicate that ResNet50, with 92.31% accuracy and a 91.98% F1-score, demonstrates the highest classification performance, while MobileNetV3, with about 13 minutes of training time, is the most computationally efficient.

Key Words: Skin Cancer Detection; CNNs; VGG16; ResNet50; MobileNet; MobileNetV2; MobileNetV3.

ACKNOWLEDGEMENTS

Work was conducted on Secwepemcúl’ecw, the unceded territory of the Secwépemc. The TRU Kamloops campus operates on the traditional lands of the Tk’emlúps te Secwépemc. This work acknowledges the support from the NSERC Discovery grant RGPIN-2018-06787.

I want to express my heartfelt gratitude to my thesis supervisor, Dr. Mateen Shaikh, for his unwavering support, guidance, and invaluable insights throughout the research process. His expertise and encouragement have been instrumental in shaping the direction of this thesis.

Beyond academics, I extend my most profound appreciation to my wife, Zohreh Moradi. Her support, patience, and understanding have been the cornerstones of my journey. Her encouragement during challenging moments and celebration during triumphs have made this academic pursuit a shared endeavor.

Thank you to everyone who contributed to and supported this thesis. Your influence has left an indelible mark on my academic and personal growth.

Contents

1 Introduction
  1.1 What is skin cancer?
    1.1.1 Types of skin cancer
  1.2 Summary of contribution
  1.3 Artificial intelligence and skin cancer detection
    1.3.1 Skin cancer detection
    1.3.2 Artificial intelligence methods in medical imaging
  1.4 Data Description
    1.4.1 Exploratory Data Analysis - EDA
2 Literature Review
  2.1 Machine Learning Techniques
    2.1.1 Decision trees
    2.1.2 Support Vector Machines
    2.1.3 Artificial Neural Network
    2.1.4 Naïve Bayes
    2.1.5 K-Nearest Neighbors
    2.1.6 Machine Learning Techniques Summary
  2.2 Deep learning
    2.2.1 Deep learning models in skin cancer detection
    2.2.2 Recurrent Neural Networks
    2.2.3 Long Short-Term Memory
    2.2.4 Generative Adversarial Network
    2.2.5 Convolutional Neural Network
    2.2.6 Related Works
    2.2.7 CNN Architecture
    2.2.8 Transfer Learning
3 Methodology
  3.1 Main stages of the methodology
    3.1.1 Pre-processing
    3.1.2 Data Augmentation
    3.1.3 Model Architecture
    3.1.4 Evaluation Metrics
4 Results & Discussion
  4.1 Transfer Learning and Data Augmentation
    4.1.1 Parameter Tuning and Implementation Details
    4.1.2 CNN models without data augmentation
    4.1.3 CNN models with data augmentation
  4.2 Discussion
5 Conclusion
A Appendix: Accepted Paper

List of Figures

1.1 Melanoma skin lesions examples [1].
1.2 Basal Cell Carcinoma (BCC) skin lesion instance [2].
1.3 Squamous cell carcinoma (SCC) skin lesions examples [3].
1.4 Actinic keratosis (AK) skin lesions examples [3].
1.5 Dysplastic Nevi skin lesions examples [4].
1.6 Skin lesion images of HAM10000 Dataset.
1.7 Distribution of Lesion Types in HAM10000 dataset.
1.8 Distribution of skin lesion types in the HAM10000 dataset.
2.1 Artificial Neural Networks (ANNs) basic architecture [5].
2.2 Shallow ANNs vs Deep neural networks [6].
2.3 Notations of neural network [6].
2.4 Most Common Activation Functions.
2.5 A neural network with one hidden layer.
2.6 Calculation in each node on neural network.
2.7 Computational graph of the model.
2.8 Multiclass classification output layer.
2.9 Recurrent neural networks architecture [7].
2.10 Long short term (LSTM) architecture [8].
2.11 Generative Adversarial Network (GAN) architecture [9].
2.12 CNNs Architecture.
2.13 Filtering with the stride of 2.
2.14 Padding in filtering [10].
2.15 Convolution on RGB images.
2.16 Convolution on RGB images with 2 filters.
2.17 One layer of a convolutional network.
2.18 Pooling layer.
2.19 Dropout layer [10].
2.20 VGG16 architecture [11].
2.21 Residual Block.
2.22 Residual Network.
2.23 ResNet Architecture [12].
2.24 Normal Convolution Vs. Depthwise Separable Convolution [6].
2.25 MobileNet architecture [6].
2.26 MobileNet Version 2 architecture [6].
2.27 MobileNet Version 3 architecture [13].
2.28 MobileNetV3 activation functions [13].
2.29 Transfer learning methodology.
3.1 Our methodology process steps.
3.2 Model architecture.
4.1 Distribution of skin lesions in training dataset before augmentation.

List of Tables

2.1 Summary of Related Works
4.1 Pre-trained CNNs without Data Augmentation.
4.2 Fine-tuned CNNs without Data Augmentation.
4.3 True labels of the test dataset.
4.4 Confusion matrix of MobileNetV1 on the test dataset.
4.5 Confusion matrix of MobileNetV2 on the test dataset.
4.6 Confusion matrix of MobileNetV3 on the test dataset.
4.7 Confusion matrix of VGG16 on the test dataset.
4.8 Confusion matrix of VGG19 on the test dataset.
4.9 Confusion matrix of ResNet50 on the test dataset.
4.10 Frozen Weight CNNs with Data Augmentation.
4.11 Fine-tuned CNNs with Data Augmentation.
4.12 Confusion matrix of MobileNetV1 on the test dataset after data augmentation.
4.13 Confusion matrix of MobileNetV2 on the test dataset after data augmentation.
4.14 Confusion matrix of MobileNetV3 on the test dataset after data augmentation.
4.15 Confusion matrix of VGG16 on the test dataset after data augmentation.
4.16 Confusion matrix of VGG19 on the test dataset after data augmentation.
4.17 Confusion matrix of ResNet50 on the test dataset after data augmentation.
4.18 Summary of Model Performance, Strengths, and Weaknesses

Chapter 1
Introduction

1.1 What is skin cancer?

Skin cancer stands out as one of the most prevalent forms of cancer worldwide [14]. Skin cancer is mainly categorized into two major groups: melanoma, which is the dangerous type, and non-melanoma, which is more common but generally less deadly [15]. Based on the World Cancer Research Fund International (WCRFI) report, estimating the incidence of skin cancer poses a distinctive challenge for several reasons. The existence of various sub-types of skin cancer complicates the compilation of data. For instance, non-melanoma skin cancer is frequently not monitored by cancer registries, and registrations for this type of cancer are often incomplete as many cases are effectively treated through surgical procedures or ablation. Consequently, the reported global incidence of skin cancer is likely lower than the actual occurrence due to these factors. WCRFI reported that melanoma is the 17th most common cancer worldwide; it is the 13th most common cancer in men and the 15th most common cancer in women.
Among women aged 30 to 35, skin cancer is the second most prevalent cancer, following breast cancer, and among women aged 25 to 29, it stands as the most common cancer [16]. In addition, WCRFI reports more than 150,000 new cases of melanoma skin cancer in 2020, within a worldwide total of 324,635 cases. Australia, New Zealand, Denmark, The Netherlands, and Norway were the five countries with the highest melanoma skin cancer rates in 2020. This is likely due to a combination of factors, including high sun and UV exposure levels, especially in Australia and New Zealand, and the predominance of light-skinned populations in these countries, who are more susceptible to UV-induced skin damage. Melanoma skin cancer caused 57,043 deaths worldwide in 2020, with New Zealand, Norway, Montenegro, Slovakia, and Slovenia recording the highest numbers of deaths. There were 1,198,071 cases of the non-melanoma type in 2020, and the five countries with the highest rates were Australia, New Zealand, the US, Canada, and Switzerland. The worldwide mortality of non-melanoma skin cancer was 63,731 deaths in 2020, and Papua New Guinea, Namibia, Mozambique, Zimbabwe, and Angola had the highest mortality rates [17].

1.1.1 Types of skin cancer

Melanoma and non-melanoma represent the principal categories of skin cancer. This section outlines the specific subtypes within these families of skin cancers.

Melanoma

Melanoma is a dangerous form of skin cancer originating from melanocytes, the cells that produce the skin pigment melanin [18]. Melanoma can potentially impact any region of the human body, with a common occurrence on sun-exposed areas like the hands, face, neck, and lips [18]. UV radiation from the sun can penetrate the skin and damage the DNA within skin cells, particularly in melanocytes. This damage often takes the form of DNA mutations, such as the formation of thymine dimers, which can lead to errors during DNA replication. If these mutations affect genes that regulate cell growth and division, such as tumor suppressor genes or oncogenes, they can disrupt normal cellular functions and lead to uncontrolled cell proliferation. Over time, these changes can accumulate, transforming normal melanocytes into malignant melanoma cells [18]. Timely diagnosis is crucial for effectively treating melanoma; otherwise, it can metastasize to other parts of the body and ultimately lead to death [19]. Prolonged exposure to specific forms of light, such as ultraviolet rays from the sun or tanning devices, constitutes the primary factor responsible for the development of both melanoma and non-melanoma skin cancers [20]. Additionally, several factors have been linked to an elevated risk of skin cancer, including radiation exposure, genetic predisposition, and family history, as well as variations in skin pigmentation [20]. In 2024, an estimated 200,340 melanoma cases are projected in the U.S., with 8,290 deaths expected [18]. Figure 1.1 shows examples of melanoma skin lesions.

Figure 1.1: Melanoma skin lesions examples [1].

Non-Melanoma

Non-Melanoma Skin Cancer, alternatively referred to as keratinocyte cancer, originates in the skin’s keratinocyte cells, and it has two major subtypes: basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) [21].

Basal Cell Carcinoma (BCC)

Basal cell carcinoma (BCC), the most prevalent form of skin cancer, originates in basal cells responsible for renewing skin cells in the lower epidermis [2].
While typically confined to sun-exposed areas and rarely metastatic, BCC can lead to disfigurement or, in rare instances, life-threatening spread [2]. According to the American Cancer Society (ACS), around 80 percent of all skin cancers are basal cell cancers [22]. BCC manifests on the skin’s surface, resembling sores, growths, bumps, scars, or red patches. Diagnosed through visual inspection and biopsy, BCC, if untreated, may invade adjacent areas and recur [22]. Its occurrence in sun-exposed regions, such as the face, head, neck, and arms, is linked to long-term sun or UV exposure [2]. Figure 1.2 shows some examples of BCC skin lesions.

Figure 1.2: Basal Cell Carcinoma (BCC) skin lesion instance [2].

Squamous cell carcinoma (SCC)

Squamous cell carcinoma (SCC) is a common form of skin cancer, accounting for approximately 20% of all non-melanoma skin cancers [3]. It is characterized by abnormal growth of squamous cells. Since the primary cause of SCC is UV radiation, it typically appears as scaly patches or raised growths on sun-exposed areas but can occur anywhere on the body. Early detection is crucial for successful treatment, as advanced SCCs can become dangerous by invading deeper layers of the skin, underlying tissues, or even spreading (metastasizing) to lymph nodes and other organs, which can lead to significant complications and be life-threatening [3]. Regular self-examination and annual dermatologist visits are recommended, particularly for individuals at higher risk, such as those with a history of excessive sun exposure, fair skin, or a family history of skin cancer. These practices and sun safety measures can significantly reduce the risk of developing skin cancer [23]. Figure 1.3 shows some examples of SCC skin lesions.

Figure 1.3: Squamous cell carcinoma (SCC) skin lesions examples [3].

Actinic keratosis (AK)

Actinic keratosis (AK), also known as solar keratosis, is a skin condition triggered by prolonged exposure to ultraviolet radiation, typically from sunlight [21]. AK is a pre-malignant skin growth that can potentially progress into squamous cell carcinoma (SCC). AKs usually emerge on skin areas exposed to the elements, such as the head, neck, hands, and forearms [23]. Figure 1.4 shows some examples of AK skin lesions.

Figure 1.4: Actinic keratosis (AK) skin lesions examples [3].

Dysplastic Nevi

Atypical moles, also known as dysplastic nevi, share similarities with regular moles but also display specific characteristics akin to melanoma. They often have an irregular shape or color and are larger than typical moles. Atypical moles can develop on skin that is usually covered, such as the buttocks or scalp, as well as on skin exposed to the sun [23]. Figure 1.5 shows some examples of dysplastic nevi skin lesions.

Figure 1.5: Dysplastic Nevi skin lesions examples [4].

1.2 Summary of contribution

This research is motivated by two primary goals. First, to improve the efficiency and accuracy of skin cancer diagnosis by developing an artificial intelligence-based screening system using dermoscopic images of skin lesions. Such a system could aid clinical screening tests, reduce diagnostic errors, and enhance early detection, which is critical for successful treatment. Second, this study aims to address the urgent need for reliable automated skin cancer detection systems, particularly in regions with limited access to dermatology specialists.
By evaluating the classification performance of six CNN models and analyzing their training behavior and time requirements, this research provides a comprehensive assessment of AI-based solutions for skin cancer diagnosis. Ultimately, this study seeks to bridge diagnostic gaps, enable timely treatment, improve patient outcomes, and potentially save lives. Most recent studies focus on optimizing model accuracy without addressing computational complexity, making them less suitable for real-time or mobile applications. Additionally, many approaches do not adequately address class imbalance in datasets, which can lead to biased models that underperform on minority classes. This study addresses these gaps by evaluating a diverse set of pre-trained CNN models, focusing on accuracy and computational efficiency. Moreover, by fine-tuning these models and analyzing their performance across a balanced dataset, this research aims to develop a practical, scalable solution for skin cancer detection that can be deployed in resource-limited settings.

A portion of this thesis was peer-reviewed and accepted for publication, and will appear in the proceedings of The International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) 2024. The accepted (not the final published) version of the manuscript is provided in the Appendix. In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Thompson Rivers University’s products or services. Internal or personal use of this material is permitted.

1.3 Artificial intelligence and skin cancer detection

1.3.1 Skin cancer detection

Early detection and accurate diagnosis are critical factors in treating skin cancer. Typically, physicians rely on the biopsy method for skin cancer detection, which involves extracting a sample from a suspected skin lesion for laboratory-based confirmation of cancer [24]. However, this process is often painful, slow, and time-consuming. A biopsy is usually conducted to confirm the diagnosis of a suspected lesion or to remove a lesion for cosmetic or therapeutic reasons [24]. Dermatologists can correctly classify skin cancer with an accuracy of 75% to 84% when diagnosing melanoma [25, 26]. However, globally, there is a shortage of skilled dermatologists in public healthcare systems, exacerbating the challenges in dermatological diagnosis and treatment [27] and demonstrating the need for fast and accurate diagnostic techniques that clinicians can easily employ.

1.3.2 Artificial intelligence methods in medical imaging

Artificial intelligence (AI), a domain within computer science characterized by using machines and programs to emulate intelligent human behavior through various technological approaches, stands as a pivotal catalyst driving the fourth industrial revolution [28]. Within this domain, machine learning (ML) emerges as a prominent technique, employing statistical models and algorithms to progressively learn from data, enabling the prediction of characteristics of new samples and the execution of desired tasks [29]. ML trains computers to emulate human cognitive processes, learning from past experiences and expanding upon them with minimal human intervention. Its profound impact spans various societal domains, including production lines, healthcare, education, transportation, and the food industry [29]. Indeed, machine learning is actively reshaping everyday life and industries such as housing, automotive, and retail.
Central to the objective of machine learning is the endowment of computers with the capacity to collect and interpret data, thereby facilitating informed decision-making based on past and present outcomes [30]. ML enables computers to gain insights from data through various paradigms such as supervised, unsupervised, semi-supervised, or reinforcement learning [31]. Supervised learning involves pattern recognition from labeled datasets containing descriptive features and corresponding class labels. In contrast, unsupervised learning algorithms discern patterns from unlabeled datasets, often applied in anomaly detection tasks [32]. Deep Learning (DL), as a subcategory of ML comprising deep neural networks, shares similarities with ML yet operates on a deeper level of complexity. DL techniques can be supervised, unsupervised, or semi-supervised, demonstrating widespread application in medical imaging for tasks such as image segmentation, classification, and object detection due to their superior performance [33].

In recent decades, deep learning has profoundly transformed the field of machine learning. The significant increase in processing power has facilitated remarkable progress in computer vision technologies, notably by developing deep learning models like Convolutional Neural Networks (CNNs) [34]. The urgency for early skin cancer detection has intensified, and deep learning has emerged as a powerful tool in this endeavor. Studies have demonstrated that early identification of skin cancer using deep learning improves the performance of human specialists, ultimately leading to a reduction in mortality rates [35]. By incorporating efficient formulations into deep learning techniques, exceptional and state-of-the-art processing and classification accuracy can be achieved [36]. Computer-based technology presents a promising avenue for diagnosing skin cancer symptoms, offering advantages in comfort, cost-effectiveness, and speed [36]. Typically, the process of skin cancer detection entails several stages, starting with the acquisition of images of skin lesions. These images are then subjected to preprocessing techniques to enhance quality and remove noise [37]. Subsequently, relevant features are extracted from the preprocessed images, which are crucial inputs for classification algorithms. Finally, these algorithms utilize the extracted features to categorize skin lesions into their classes [38]. This approach leverages the capabilities of computer-based technology in the diagnosis process, enabling efficient and accurate identification of potential skin cancer symptoms.

This research aims to develop a skin lesion diagnosis model using the HAM10000 dataset [39], which includes a wide array of dermatoscopic images. The research methodology involves exploring and analyzing the HAM10000 dataset, focusing on harnessing the inherent complexities within the dermatoscopic images. Deep learning techniques and algorithms are applied to develop a model that can effectively recognize patterns and characteristics within skin lesion images. The study aims to contribute to the progress of dermatological diagnostics, particularly in the classification of skin lesions.

1.4 Data Description

Quality data plays a pivotal role in the performance of machine learning models. Therefore, a diverse and comprehensive collection of dermoscopic images is necessary to assess the effectiveness of computer-based systems for skin cancer diagnosis.
The HAM10000 dataset, which consists of high-resolution dermoscopic images, is used in this research. The dataset consists of 10,015 dermatoscopic images obtained from different populations and acquired through various modalities. The dataset was gathered from two sources: Cliff Rosendahl’s skin cancer practice in Queensland, Australia, and the Dermatology Department of the Medical University of Vienna, Austria. It includes representative cases of all significant diagnostic categories for pigmented lesions, namely actinic keratoses and intraepithelial carcinoma (AKIEC), basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevi (NV), and vascular lesions (VASC) [36]. The dataset is publicly available through Kaggle [40]. The dataset includes 327 images of AKIEC, 514 images of basal cell carcinomas, 1099 images of benign keratoses, 115 images of dermatofibromas, 1113 images of melanomas, 6705 images of melanocytic nevi, and 142 images of vascular skin lesions [39]. Figure 1.6 shows images from the dataset for the seven lesion types.

Figure 1.6: Skin lesion images of HAM10000 Dataset.

1.4.1 Exploratory Data Analysis - EDA

Exploratory data analysis (EDA) involves analyzing and summarizing datasets to understand their characteristics better before formal modeling. The main goal of EDA is to identify patterns, trends, and relationships in the data that can inform further analysis and modeling. The following shows the analysis of the HAM10000 dataset to gain some insights into the data structure and samples. Figure 1.7 shows the distribution of lesion types in the HAM10000 dataset. The lesion types bar chart indicates that melanocytic nevi are the most commonly diagnosed condition among the various lesion types in this dataset. On the other hand, dermatofibroma is a benign skin lesion that is less common than the other lesion types in the dataset. This reflects the class imbalance in the HAM10000 dataset.

Figure 1.7: Distribution of Lesion Types in HAM10000 dataset.

The pie chart in Figure 1.8 illustrates the distribution of skin lesion types in the HAM10000 dataset. Approximately 67% of the dataset comprises nevi lesions, while dermatofibroma skin lesions constitute only 1.2%. Such an imbalanced distribution may pose challenges in training models and potentially impact their generalization capabilities. We use data augmentation to mitigate the imbalance in the dataset.

Figure 1.8: Distribution of skin lesion types in the HAM10000 dataset.

Chapter 2
Literature Review

The rising incidence of skin cancer necessitates timely diagnosis and continuous monitoring, placing a strain on specialist medical services. This burden could be alleviated by promoting patient self-surveillance techniques and integrating decision support systems for less experienced physicians. Unlike human diagnosis, machine diagnosis is objective and remains unaffected by external factors, offering consistent results. If properly applied, leveraging AI for skin cancer detection and progression monitoring can potentially reduce the need for biopsies and detect cancers early, before they progress. Additionally, training interventions can empower patients and their caregivers to conduct self-skin examinations, which can facilitate teledermoscopy, a process where images of skin lesions are captured using a smartphone or digital camera and then transmitted to a dermatologist for remote evaluation.
This approach can reduce the frequency of in-person medical consultations while effectively monitoring skin conditions [28].

Finding an automatic classification system for skin cancer is challenging due to the complexity and diversity of skin cancer images. First, it is important to note that different skin lesions often share significant similarities among various classes, increasing the risk of misdiagnosis [41]. Additionally, even within the same class, skin lesions can vary in color, features, structure, size, and location [42].

2.1 Machine Learning Techniques

While traditional machine learning approaches perform well in specific skin cancer classification tasks, they often prove ineffective in handling complicated diagnostic problems. Typically, conventional machine learning methods for skin cancer diagnosis require extracting features from skin disease images and classifying these extracted features [43]. Commonly used features include the asymmetry, borders, color, and diameter of moles (known as ABCD features) [44], as well as 2D wavelet transformations [25] and gray-level co-occurrence matrix (GLCM) features [45]. Various classification techniques like Support Vector Machines (SVM) [43], XGBoost [46], and decision trees [47] are frequently employed. Because of the limited number of selected features, machine learning algorithms may only be able to classify a subset of skin cancer diseases, and they may struggle to generalize to a broader spectrum of disease types [48].

2.1.1 Decision trees

Decision trees, a supervised machine learning method primarily employed for classification problems, offer an intuitive algorithm for assessing the long-term risk of non-melanoma skin cancer post-liver transplant, utilizing variables linked to the peri-transplant period [49]. In a different context, [50] utilizes decision trees as a visual representation mode, dividing branches to depict various outcomes during a clinical procedure. This application involves assessing the cost-effectiveness of sentinel lymph node biopsy, a standard technique in melanoma and breast cancer treatment, specifically in the context of head and neck cutaneous squamous cell carcinoma, a subset of skin cancer. Moreover, decision trees can function as an intermediate layer, as demonstrated in [51], which showcases their effectiveness in region extraction and skin cancer classification using deep convolutional neural networks. In this architecture, decision trees, support vector machines, and k-nearest neighbors are crucial in classifying most features. Notably, the decision tree model in [49] reports a specificity of 42% and a sensitivity of 91%, while models akin to those in [50] exhibit a sensitivity of 77% with a reported 100% specificity. It is essential to recognize that decision tree model predictions are significantly influenced by the quality of the datasets they are trained on [50].

2.1.2 Support Vector Machines

Support Vector Machines (SVMs) are powerful supervised learning models widely used for classifying, predicting, and analyzing data. Within the domain of skin lesion classification, SVMs have proven effective. In [52], using ABCD features facilitates the extraction of critical attributes, including shape, color, and size, from clinical images.
These features are then employed to classify skin lesions into distinct categories, such as melanoma, seborrheic keratosis, and lupus erythematosus, demonstrating the efficacy of the ABCD feature set when coupled with SVMs. In [53], preprocessing steps such as grayscale conversion, noise removal, and binarization are applied to the input image to enhance accuracy. Similarly, a bag-of-features approach incorporating spatial information is employed for skin cancer detection, where SVMs are trained using histograms of oriented gradients, resulting in promising outcomes compared to existing algorithms [54]. A methodology consisting of several phases, including pre-processing, segmentation, feature extraction, and classification, was proposed in [55]. Experimentation was conducted on a dataset comprising 1800 images, resulting in an accuracy of 83% for a six-class classification task. This accuracy was attained using a support vector machine (SVM) with a quadratic kernel.

2.1.3 Artificial Neural Network

An artificial neural network (ANN) is a nonlinear statistical prediction technique that draws its structural inspiration from the biological framework of the human brain. As shown in Figure 2.1, an ANN comprises three layers of neurons: the initial layer is called the input layer, where the input neurons transmit data to the second layer, often referred to as the intermediate or hidden layer. In a typical ANN, multiple hidden layers can exist. The intermediate neurons convey data to the third layer, consisting of output neurons. At each layer, computations are learned through backpropagation, which is employed to grasp the intricate associations and relationships between the input and output layers.

Figure 2.1: Artificial Neural Networks (ANNs) basic architecture [5].

Xie et al. [56] introduced a skin lesion classification system designed to categorize lesions into two primary classes: benign and malignant. The proposed system’s classification results were benchmarked against various classifiers, including SVM, KNN, random forest, Adaboost, and others. The proposed model demonstrated an accuracy rate of 91.11%, outperforming the other classifiers by at least 7.5% in sensitivity. Choudhari and Biday [45] introduced another skin cancer diagnostic system based on artificial neural networks (ANN). In their approach, images were segmented using a maximum entropy thresholding measure, and unique features of skin lesions were extracted using a gray-level co-occurrence matrix (GLCM). Subsequently, a feed-forward ANN was employed to classify the input images into either a malignant or benign stage of skin cancer, achieving an accuracy level of 86.66%.

2.1.4 Naïve Bayes

Naïve Bayes classifiers are another group of machine learning techniques that operate on Bayes’ theorem; they are probabilistic classifiers widely employed in skin cancer research to classify clinical and dermatological images [57]. These models have demonstrated 70.15% accuracy and 73.33% specificity [57]. Expanding their utility, Naïve Bayes classifiers offer a method for detecting and segmenting skin diseases, as documented in [58]. The iterative process of obtaining posterior probability distributions for each output class enables efficient utilization of computational resources, minimizing the need for multiple training sessions. Results of this study indicate that the diagnostic accuracy reached 72.7%.
The Bayesian approach is valuable in various applications, including probabilistically predicting the nature of data points with high accuracy, as demonstrated in [59]. An iterative process obtains a posterior probability distribution for each output class, reducing the computational resources required and eliminating the need for multiple training sessions. This Bayesian sequential framework extends its utility to aiding models designed to detect melanoma invasion into human skin. In this context, three model parameters are estimated: the melanoma cell proliferation rate, the melanoma cell diffusivity, and a constant determining the degradation rate of melanoma cells in skin tissue. The algorithm learns from data sequentially, including a spatially uniform cell assay, a 2D circular barrier assay, and a 3D invasion assay. The versatility of this Bayesian framework allows for its extraction and application in various biological contexts beyond skin cancer detection.

2.1.5 K-Nearest Neighbors

The k-nearest neighbors algorithm (KNN) is a supervised classification method that leverages distance and proximity metrics to classify data points. KNNs have been utilized and assessed in skin cancer detection, with evaluations involving the generation of a confusion matrix to depict the model’s accuracy [60]. The research reports an accuracy of 66.8%. Furthermore, for positive predictions, the precision and recall stand at 71% and 46%, respectively. In [61], KNN is extended using the Radius Nearest Neighbors classifier to classify breast cancer, overcoming limitations posed by extreme values of k. Normalizing the radius value of each point helps effectively recognize outliers, mitigating sensitivity to outliers and underfitting issues. Despite their effectiveness in skin cancer diagnosis, KNN classifiers necessitate continuous training and encounter challenges related to limited training data availability [60, 61].

2.1.6 Machine Learning Techniques Summary

Upon analyzing the diverse implementations of machine learning models in skin cancer diagnosis, it becomes evident that Support Vector Machines (SVMs) yield the most precise and accurate models [20]. However, their requirement for meticulous pre-processing of input data presents a significant challenge. For user flexibility, K-means clustering and K-nearest neighbors offer viable alternatives without substantial compromises in accuracy and performance. Nonetheless, K-nearest neighbors necessitate continuous training as additional data points are introduced, which can be burdensome due to the unpredictable volume of input data [20]. In contrast, Naïve Bayes models exhibit the lowest accuracy among the studied machine learning techniques, likely because other methods, such as decision trees and random forests, build upon the foundational principles of Bayes’ theorem [20].

2.2 Deep learning

Deep neural networks are ANNs with a higher number of hidden layers. Figure 2.2 represents shallow neural networks, with fewer than two hidden layers, and deep neural networks, with five hidden layers. In the following, we go through the mathematics behind neural networks and deep learning and then describe the families of deep learning models commonly used in skin cancer detection.

Figure 2.2: Shallow ANNs vs Deep neural networks [6].

Standard notations for neural networks and deep learning [6]:

• Superscript $(i)$ denotes the $i$-th training example.
• $m$: number of examples in the dataset.
• The training set is $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$.
• $n_x$: input size.
• $n_y$: output size (or number of classes).
• $n_h^{[l]}$: number of hidden units of the $l$-th layer.
• $L$: number of layers in the network.
• $X \in \mathbb{R}^{n_x \times m}$ is the input matrix, $X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix}$.
• $x^{(i)} \in \mathbb{R}^{n_x}$ is the $i$-th example represented as a column vector.
• $Y \in \mathbb{R}^{n_y \times m}$ is the label matrix, $Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}$.
• $y^{(i)} \in \mathbb{R}^{n_y}$ is the output label for the $i$-th example.
• $W^{[l]} \in \mathbb{R}^{(\text{units in next layer}) \times (\text{units in previous layer})}$ is the weight matrix, where the superscript $[l]$ indicates the layer.
• $b^{[l]} \in \mathbb{R}^{\text{units in next layer}}$ is the bias vector in the $l$-th layer.
• $\hat{y} \in \mathbb{R}^{n_y}$ is the predicted output vector. It can also be denoted $a^{[L]}$, where $L$ is the number of layers in the network.
• $a = g^{[l]}(W x^{(i)} + b_1) = g^{[l]}(z_1)$, where $g^{[l]}$ is the $l$-th layer activation function.
• General activation formula: $a_j^{[l]} = g^{[l]}\big(\sum_k W_{jk}^{[l]} a_k^{[l-1]} + b_j^{[l]}\big) = g^{[l]}(z_j^{[l]})$.
• $J(x, W, b, y)$ or $J(\hat{y}, y)$ denotes the cost function. Examples of cost functions are the cross-entropy cost $J_{CE}(\hat{y}, y) = -\sum_{i=1}^{m} y^{(i)} \log \hat{y}^{(i)}$ and the binary cross-entropy cost $J(\hat{y}, y) = -\frac{1}{m}\sum_{i=1}^{m}\big[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\big]$.

Figure 2.3 indicates the notation for a neural network with two hidden layers. In this representation, nodes represent inputs, activations, or outputs, and edges represent weights or biases.

Figure 2.3: Notations of neural network [6].

Activation Function

Activation functions introduce non-linearity into the CNN architecture, enabling the networks to learn complex relationships in the data. Common activation functions include ReLU (Rectified Linear Unit), Leaky ReLU, sigmoid, and tanh. ReLU is widely used due to its simplicity and effectiveness in mitigating the vanishing gradient problem. Figure 2.4 shows the most common activation functions in CNNs: the top-left function is the sigmoid, the top-right is tanh, the bottom-left is the rectified linear unit (ReLU), and the bottom-right is Leaky ReLU. The following are the formulas for these functions.

Figure 2.4: Most Common Activation Functions.

$a = g(z) = \mathrm{sigmoid}(z) = \sigma(z) = \dfrac{1}{1 + e^{-z}}$

$a = g(z) = \tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$

$a = g(z) = \mathrm{ReLU}(z) = \max(0, z)$

$a = g(z) = \mathrm{LeakyReLU}(z) = \max(0.01z, z)$

The derivatives of activation functions play a crucial role in neural network optimization. Thus, we present the derivative corresponding to each activation function:

$g(z) = \sigma(z) = \dfrac{1}{1+e^{-z}} \;\rightarrow\; g'(z) = a(1 - a)$

$g(z) = \tanh(z) = \dfrac{e^{z}-e^{-z}}{e^{z}+e^{-z}} \;\rightarrow\; g'(z) = 1 - a^{2}$

$g(z) = \mathrm{ReLU}(z) = \max(0, z) \;\rightarrow\; g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z > 0 \\ \text{undefined} & \text{otherwise} \end{cases}$

$g(z) = \mathrm{LeakyReLU}(z) = \max(0.01z, z) \;\rightarrow\; g'(z) = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z > 0 \\ \text{undefined} & \text{otherwise} \end{cases}$

Calculation for shallow and deep neural networks

Let us start by doing the calculations for a shallow neural network with one hidden layer for binary classification and then expand them to deep networks. Figure 2.5 indicates the architecture of the network. Figure 2.6 represents each node in the network, which consists of two parts: $z$ is the weighted sum of the node's inputs plus the bias, and $a$ is the result of passing $z$ through the node's activation function.
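The activation functions and derivatives listed above translate directly into code. The following is a minimal NumPy sketch for illustration only, not the implementation used in this thesis; the function names are ours:

```python
import numpy as np

def sigmoid(z):
    # a = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # a = (e^z - e^-z) / (e^z + e^-z)
    return np.tanh(z)

def relu(z):
    # a = max(0, z)
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    # a = max(0.01*z, z)
    return np.maximum(slope * z, z)

# Derivatives, written in terms of the activation value a where convenient.
def sigmoid_grad(z):
    a = sigmoid(z)
    return a * (1.0 - a)              # g'(z) = a(1 - a)

def tanh_grad(z):
    a = np.tanh(z)
    return 1.0 - a ** 2               # g'(z) = 1 - a^2

def relu_grad(z):
    # 0 for z < 0, 1 for z > 0; the undefined point z = 0 is assigned 0 here by convention
    return (z > 0).astype(float)

def leaky_relu_grad(z, slope=0.01):
    # 0.01 for z < 0, 1 for z > 0; z = 0 is assigned the small slope by convention
    return np.where(z > 0, 1.0, slope)
```

In practice, frameworks simply pick a convention for the point $z = 0$, where the ReLU and Leaky ReLU derivatives are formally undefined.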
In a neural network, we have two processes: forward propagation from input to output (left to right), where we propagate the input through the different layers of the network to find $\hat{y}$, and backward propagation from output to input (right to left), where we update the parameters $W$ and $b$ in a way that minimizes the error.

Figure 2.5: A neural network with one hidden layer.

Figure 2.6: Calculation in each node on neural network.

First, we do the forward propagation calculation for one input with three features. For the hidden layer, we have the following calculations:

$z_1^{[1]} = w_1^{[1]T} x + b_1^{[1]}, \quad a_1^{[1]} = \sigma(z_1^{[1]})$
$z_2^{[1]} = w_2^{[1]T} x + b_2^{[1]}, \quad a_2^{[1]} = \sigma(z_2^{[1]})$
$z_3^{[1]} = w_3^{[1]T} x + b_3^{[1]}, \quad a_3^{[1]} = \sigma(z_3^{[1]})$
$z_4^{[1]} = w_4^{[1]T} x + b_4^{[1]}, \quad a_4^{[1]} = \sigma(z_4^{[1]})$

We can rewrite the above calculation in matrix form. The weight matrix $W^{[1]}$ includes the weights of all nodes in layer one:

$W^{[1]} = \begin{bmatrix} \cdots\, w_1^{[1]T} \cdots \\ \cdots\, w_2^{[1]T} \cdots \\ \cdots\, w_3^{[1]T} \cdots \\ \cdots\, w_4^{[1]T} \cdots \end{bmatrix}$

The bias vector $b^{[1]}$ includes all biases in the first layer:

$b^{[1]} = \begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ b_3^{[1]} \\ b_4^{[1]} \end{bmatrix}$

The multiplication and summation results are collected in the vector $z^{[1]}$:

$z^{[1]} = \begin{bmatrix} z_1^{[1]} \\ z_2^{[1]} \\ z_3^{[1]} \\ z_4^{[1]} \end{bmatrix}$

and the same for $a^{[1]}$:

$a^{[1]} = \begin{bmatrix} a_1^{[1]} \\ a_2^{[1]} \\ a_3^{[1]} \\ a_4^{[1]} \end{bmatrix}, \qquad a^{[1]} = \sigma(z^{[1]})$

We can write the same calculation for the output layer, which has a single node for binary classification. Ensuring the right dimensions for the matrices and vectors in the calculation is pivotal. The whole calculation for both layers in matrix form, with dimensions shown as subscripts, is:

$z^{[1]}_{(4,1)} = W^{[1]}_{(4,3)}\, x_{(3,1)} + b^{[1]}_{(4,1)}$
$a^{[1]}_{(4,1)} = \sigma(z^{[1]})$
$z^{[2]}_{(1,1)} = W^{[2]}_{(1,4)}\, a^{[1]}_{(4,1)} + b^{[2]}_{(1,1)}$
$a^{[2]}_{(1,1)} = \sigma(z^{[2]})$

Now we can expand the calculation to $m$ inputs with a "for loop" like the following:

for $i = 1$ to $m$:
  $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
  $a^{[1](i)} = \sigma(z^{[1](i)})$
  $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
  $a^{[2](i)} = \sigma(z^{[2](i)})$

We can also use vectorization instead of a "for loop", which means writing all the calculations in matrix form and using matrix products instead of loops in programming to speed up the process:

$Z^{[1]} = W^{[1]} X + b^{[1]}$
$A^{[1]} = \sigma(Z^{[1]})$
$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
$A^{[2]} = \sigma(Z^{[2]})$

where $X$ is the input matrix and $Z^{[1]}$ and $A^{[1]}$ stack the per-example vectors column by column:

$X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix}, \quad Z^{[1]} = \begin{bmatrix} z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \end{bmatrix}, \quad A^{[1]} = \begin{bmatrix} a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \end{bmatrix}$

These matrices indicate that we move horizontally through the examples (inputs) and vertically through the units (nodes) in the hidden layers. These are the calculations for forward propagation.

Let us start the calculation for backward propagation, from output to input. It is an optimization problem where we try to update the network parameters $W$ and $b$ to minimize the error so that the predicted value $\hat{y}$ is closer to the actual target variable $y$. Gradient descent is a popular method for optimizing the cost function in neural networks and updating the model parameters. This method is iterative and tries to reduce the error in each iteration by updating the parameters. In our neural network model, we have four sets of parameters, $W^{[1]}_{(n^{[1]}, n^{[0]})}$, $b^{[1]}_{(n^{[1]}, 1)}$, $W^{[2]}_{(n^{[2]}, n^{[1]})}$, and $b^{[2]}_{(n^{[2]}, 1)}$, where $n_x = n^{[0]}$ and $n^{[2]} = 1$.
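Before deriving the gradients, the computations above can be made concrete. The following NumPy sketch implements the vectorized forward pass for this one-hidden-layer binary classifier together with repeated gradient-descent updates; the gradient formulas it uses are exactly the vectorized backward-propagation expressions derived in the discussion that follows. This is a minimal illustration with arbitrary sizes and random data, not the training code used in this thesis:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5              # input size, hidden units, number of examples (illustrative)
X = rng.normal(size=(n_x, m))      # columns are examples, matching the notation above
Y = rng.integers(0, 2, size=(1, m)).astype(float)

# Parameters: W[1] is (n_h, n_x), b[1] is (n_h, 1); W[2] is (1, n_h), b[2] is (1, 1)
W1, b1 = rng.normal(size=(n_h, n_x)) * 0.01, np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)) * 0.01, np.zeros((1, 1))
alpha = 0.1                        # learning rate

for _ in range(1000):
    # Forward propagation (vectorized over all m examples)
    Z1 = W1 @ X + b1
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)               # A2 is y_hat

    # Backward propagation (vectorized formulas derived in the text)
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))   # sigmoid'(Z1) = A1 (1 - A1)
    dW1 = (dZ1 @ X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # Gradient-descent parameter update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
```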
In gradient descent, we update the parameters using the derivative of the cost function with respect to each parameter. We start with a computational graph to make the calculation easier. Figure 2.7 indicates the graph, where the black arrows show the forward propagation path and the red arrows show the backward propagation path from output to input.

Figure 2.7: Computational graph of the model.

We define the derivatives of the cost function with respect to the parameters as

$dW^{[1]} = \dfrac{dJ}{dW^{[1]}}, \quad db^{[1]} = \dfrac{dJ}{db^{[1]}}, \quad dW^{[2]} = \dfrac{dJ}{dW^{[2]}}, \quad db^{[2]} = \dfrac{dJ}{db^{[2]}}$

and then update the parameters in an iterative process like below:

Repeat {
  compute predictions ($\hat{y}^{(i)}$, $i = 1$ to $m$)
  compute $dW^{[1]}, db^{[1]}, dW^{[2]}, db^{[2]}$
  $W^{[1]} := W^{[1]} - \alpha\, dW^{[1]}$, $\quad b^{[1]} := b^{[1]} - \alpha\, db^{[1]}$
  $W^{[2]} := W^{[2]} - \alpha\, dW^{[2]}$, $\quad b^{[2]} := b^{[2]} - \alpha\, db^{[2]}$
}

where $\alpha$ is the learning rate. Below are the derivatives based on the computational graph in Figure 2.7. In this example, we solve a binary classification problem, and the loss function is $L(a^{[2]}, y) = -\big[y \log(a^{[2]}) + (1-y)\log(1-a^{[2]})\big]$.

$da^{[2]} = \dfrac{dL(a^{[2]}, y)}{da^{[2]}} = -\dfrac{y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}$

$dz^{[2]} = \dfrac{dL(a^{[2]}, y)}{dz^{[2]}} = \dfrac{dL(a^{[2]}, y)}{da^{[2]}}\,\dfrac{d\sigma(z^{[2]})}{dz^{[2]}} = \Big[-\dfrac{y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}\Big]\big[a^{[2]}(1 - a^{[2]})\big] = a^{[2]} - y$

$dW^{[2]} = \dfrac{dL(a^{[2]}, y)}{dW^{[2]}} = \dfrac{dL(a^{[2]}, y)}{da^{[2]}}\,\dfrac{da^{[2]}}{dz^{[2]}}\,\dfrac{dz^{[2]}}{dW^{[2]}} = dz^{[2]}\, a^{[1]T} = (a^{[2]} - y)\, a^{[1]T}$

$db^{[2]} = \dfrac{dL(a^{[2]}, y)}{db^{[2]}} = \dfrac{dL(a^{[2]}, y)}{da^{[2]}}\,\dfrac{da^{[2]}}{dz^{[2]}}\,\dfrac{dz^{[2]}}{db^{[2]}} = dz^{[2]}$

$dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$

$dW^{[1]} = dz^{[1]} x^{T}$

$db^{[1]} = dz^{[1]}$

where $g^{[1]}(z)$ is the activation function of layer one, and $*$ denotes element-wise multiplication. The vectorized form of the above calculation can be written as:

$dZ^{[2]} = A^{[2]} - Y$
$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$
$db^{[2]} = \frac{1}{m}\Big[\sum_{j=1}^{m} dZ^{[2]}_{i,j}\Big]_{i=1}^{n_h^{[2]}}$
$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]})$
$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$
$db^{[1]} = \frac{1}{m}\Big[\sum_{j=1}^{m} dZ^{[1]}_{i,j}\Big]_{i=1}^{n_h^{[1]}}$

After the calculation for a shallow neural network, we can expand the calculation to deep neural networks as follows.

Forward propagation:

$Z^{[1]} = W^{[1]} X + b^{[1]}$
$A^{[1]} = g^{[1]}(Z^{[1]})$
$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
$A^{[2]} = g^{[2]}(Z^{[2]})$
$\vdots$
$A^{[L]} = g^{[L]}(Z^{[L]})$

Backward propagation:

$dZ^{[L]} = A^{[L]} - Y$
$dW^{[L]} = \frac{1}{m}\, dZ^{[L]} A^{[L-1]T}$
$db^{[L]} = \frac{1}{m}\Big[\sum_{j=1}^{m} dZ^{[L]}_{i,j}\Big]_{i=1}^{n_h^{[L]}}$
$dZ^{[L-1]} = W^{[L]T} dZ^{[L]} * g^{[L-1]\prime}(Z^{[L-1]})$
$\vdots$
$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]})$
$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$
$db^{[1]} = \frac{1}{m}\Big[\sum_{j=1}^{m} dZ^{[1]}_{i,j}\Big]_{i=1}^{n_h^{[1]}}$

These are the calculations from shallow to deep neural networks for a binary classification problem. In multiclass classification, the whole process is the same, but the output layer activation function is the softmax:

$z^{[L]} = W^{[L]} a^{[L-1]} + b^{[L]}$

$\hat{y} = a^{[L]} = g^{[L]}(z^{[L]}) = \dfrac{e^{z^{[L]}}}{\sum_{j} e^{z^{[L]}_j}}$

where the sum in the denominator runs over all classes. This activation function takes a vector as input and produces a vector as output. Figure 2.8 illustrates a multiclass classification scenario with the softmax activation function utilized at the output layer.

Figure 2.8: Multiclass classification output layer.

The following are the commonly used deep learning models in skin cancer detection.

2.2.1 Deep learning models in skin cancer detection

The discipline of deep learning within artificial intelligence is rapidly expanding, offering numerous potential applications. Deep learning is one of the most potent and extensively employed machine learning techniques based on artificial neural networks, particularly for recognizing and categorizing images [62].
In recent years, deep learning algorithms have gained extensive usage for skin cancer classification. In contrast to traditional machine learning techniques, deep learning algorithms can accurately analyze data from large-scale datasets, enabling them to extract relevant features efficiently [63]. Deep learning techniques find applications in various domains, including speech recognition [64], computer vision and pattern recognition [65], and bioinformatics [66]. In recent years, diverse deep learning approaches, including Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Generative Adversarial Networks (GAN), and Convolutional Neural Networks (CNN), have been employed for computer-based skin cancer detection.

2.2.2 Recurrent Neural Networks

A recurrent neural network (RNN) is a subset of artificial neural networks and has found application in melanoma skin cancer detection [67]. Figure 2.9 shows the architecture of RNN models. In [68], deep features are extracted from clinical images in a feature extraction process using the Hamming distance approach and fed into a dual bidirectional long short-term memory (LSTM) network for feature learning, with a softmax activation function for image classification. Similarly, ensemble models are employed for automating mammogram breast cancer detection, where features extracted through the grey-level co-occurrence matrix and grey-level run-length matrix are inputted into the RNN layer. The segmented tumor binary image is provided as input to the CNN layer, leading to improved diagnostic accuracy. Moreover, RNNs have been instrumental in segmenting various dermoscopic images [69]. The recurrent model's ability to train deeper and larger models enhances performance, ensuring better feature representation. The modified RNNs proposed in [67] exhibit an average accuracy of around 90%, with an F1-score of 0.865. Similarly, RNNs in [70] achieve an accuracy of 98% and an F1-score of 0.745. The model in [69] reports a testing accuracy of 87.09% and an average F1-score of 0.86.

Figure 2.9: Recurrent neural networks architecture [7].

2.2.3 Long Short-Term Memory

Long Short-Term Memory (LSTM) is an artificial neural network architecture with feedback connections; it is a specialized form of recurrent neural network engineered to address the vanishing gradient challenge encountered in RNNs. The ability of LSTMs to learn intricate temporal dependencies in sequential data makes them highly effective across various tasks, including time series prediction, natural language processing, and speech recognition. Figure 2.10 indicates the architecture of LSTM models. Memory cells, which sustain a cell state capable of retaining information over extended durations, are central to the LSTM model's structure. These memory cells incorporate a range of gates, including input, forget, and output gates, which manage the flow of information within the cell. This model efficiently maintains stateful information, leading to accurate predictions and fast recognition of target regions while requiring fewer computations than previous algorithms. Including an LSTM improves prediction accuracy due to its ability to retain information from earlier time steps. LSTMs can predict cancer and tumors in irregular medical data, leveraging their superior performance in screening time-series data [71].
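To make the gating mechanism described above concrete, the following NumPy sketch evaluates one time step of a standard LSTM cell. It illustrates the standard LSTM equations rather than code from this thesis; the function and variable names are ours, and shapes are kept minimal (no batching):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One time step of a standard LSTM cell.

    x_t: input at time t (d,), h_prev/c_prev: previous hidden and cell states (n,),
    each W_* has shape (n, n + d) and each b_* has shape (n,).
    """
    z = np.concatenate([h_prev, x_t])     # previous hidden state concatenated with current input
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: how much of the old cell state to keep
    i_t = sigmoid(W_i @ z + b_i)          # input gate: how much new information to write
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # updated cell state (the long-term memory)
    o_t = sigmoid(W_o @ z + b_o)          # output gate: how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t
```

Running this function over a sequence, carrying $h_t$ and $c_t$ forward at each step, is what allows the cell state to retain information from earlier time steps.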
Skin disease classification models utilize deep learning approaches like LSTM, often enhanced with hybrid optimization algorithms such as the Hybrid Squirrel Butterfly Search Optimization algorithm (HSBSO) [72]. This modified LSTM, incorporating HSBSO and optimized parameters, maximizes classification accuracy and overall efficiency, achieving an average sensitivity of 53% and specificity of 80%.

Figure 2.10: Long short-term memory (LSTM) architecture [8].

2.2.4 Generative Adversarial Network

A Generative Adversarial Network (GAN) is a powerful class of deep neural networks (DNN) inspired by zero-sum game theory [73]. GANs consist of two neural networks, a generator and a discriminator, which compete to analyze and capture the variance in a given dataset. The generator module creates fake data samples based on the data distribution, aiming to deceive the discriminator, while the discriminator distinguishes between real and fake data samples [74]. Through repeated iterations during training, both networks improve their performance as they compete against each other. GANs excel at generating fake samples resembling real ones, addressing the problem of insufficient training examples in deep learning. Figure 2.11 shows the architecture of generative adversarial networks.

Figure 2.11: Generative Adversarial Network (GAN) architecture [9].

Rashid et al. [64] proposed a GAN-based classification system, augmenting a training set with realistic-looking skin lesion images generated via GAN. A deconvolutional network served as the generator, while the discriminator utilized a CNN classifier. The proposed system achieved an accuracy of 86.1% for skin lesion classification. To address limitations in deep learning methods, such as the need for large datasets and the problem of class imbalance, [65] proposed a system combining data purification with GAN-based data augmentation. Decoupled deep convolutional GANs were employed for data generation, resulting in improved performance compared to the baseline ResNet-50 model. These studies demonstrate the effectiveness of GANs in enhancing the performance of skin cancer diagnostic systems by addressing challenges related to dataset size and imbalance.

2.2.5 Convolutional Neural Network

A convolutional neural network (CNN) is a crucial subtype of deep neural networks extensively used in computer vision. CNNs are particularly adept at image classification, grouping, and recognition tasks. In CNNs, the convolution operation is a fundamental process that helps extract features from the input data, such as images. Equation 2.1 represents the mathematical expression for a 2D convolution:

$S(i, j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} K(m, n) \cdot I(i + m, j + n) \qquad (2.1)$

$S(i, j)$ is the output feature map, $I(i, j)$ is the input image, and $K(m, n)$ is the convolutional kernel (filter) of size $M \times N$. The convolution operation slides the kernel $K$ over the input image $I$. At each position, it computes the sum of element-wise products between the kernel and the corresponding patch of the input image. The result is stored in the output feature map $S$.

In [75], deep CNNs have been utilized to classify skin cancer into four categories: basal cell carcinoma, squamous cell carcinoma, actinic keratosis, and melanoma. The authors assess the performance using evaluation parameters such as accuracy, sensitivity, and specificity. Recent research has explored integrating patient data with CNNs to enhance diagnostic accuracy in dermatology [76].
The patient data typically included information such as sex, age, and lesion location, and one-hot encoding was used to incorporate this data. The decision to fuse image features with patient data was contingent on the complexity of each classification task. These studies highlight the potential advantages and benefits of incorporating patient data into deep CNN algorithms in dermatology. In [77], the pre-trained Inception v3 model was fine-tuned on two different resolution scales of input lesion images: a coarse scale and a finer scale. The coarse scale captured the lesions' shape characteristics and overall contextual information. In contrast, the finer scale focused on gathering detailed texture information of the lesion, facilitating the differentiation between various skin lesions. In [78], a deep convolutional neural network (CNN) architecture was introduced to classify 12 distinct types of skin lesions. Initially, it was trained using 3797 lesion images; subsequently, data augmentation was applied, expanding the dataset 29 times through variations in lighting conditions and scale transformations. The proposed technique achieved an impressive AUC (Area Under the Curve) value of 0.99 for the classification of hemangioma lesions, pyogenic granuloma (PG) lesions, and intraepithelial carcinoma (IC) skin lesions.

2.2.6 Related Works

Previous studies have demonstrated the effectiveness of Convolutional Neural Networks (CNNs) in skin cancer classification. For instance, a study utilizing the HAM10000 dataset employed MobileNet for skin lesion detection, achieving an accuracy of 83% [79]. Another study introduced a Fully Convolutional Residual Network (FCRN) with 16 residual blocks for melanoma detection, achieving an accuracy of 85.5% with segmentation and 82.8% without segmentation [80]. Huang et al. developed two deep learning models using DenseNet and EfficientNet, achieving 89.5% accuracy in binary classification on the KCGMH dataset and 85.8% on the HAM10000 dataset [81]. Furthermore, using Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) for image enhancement, coupled with a modified ResNet50 model, has improved classification metrics such as accuracy, precision, recall, and F1-score [82]. Another study focused on accurately classifying skin lesions into seven categories using the HAM10000 dataset by leveraging 13 deep transfer learning models. The research emphasizes the importance of early detection in reducing mortality rates and highlights the potential of AI-based systems to enhance diagnostic accuracy, particularly in regions with limited access to dermatological care [83]. Most current state-of-the-art approaches rely on either hybrid models [84, 85] or ensembles of deep learning classifiers [86, 87, 88], which, despite their high accuracy, are often too resource-intensive for mobile applications. Developing a practical mobile application requires identifying a deep learning model that balances state-of-the-art performance with a lightweight architecture. Therefore, this thesis evaluates the performance of six different CNN models and analyzes their training time requirements.

Despite these advancements, several limitations remain. Many studies focus primarily on optimizing model accuracy without addressing computational complexity, which makes these models less suitable for real-time or mobile applications. Additionally, class imbalance in datasets is often not adequately addressed, leading to biased models that underperform on minority classes.
Additionally, class imbalance in datasets is often not adequately addressed, leading to biased models that underperform on minority classes. This study addresses these gaps by evaluating a diverse set of pre-trained CNN models, focusing on accuracy and computational efficiency. Moreover, by fine-tuning these models and analyzing their performance across a balanced dataset, this research aims to develop a practical, scalable solution for skin cancer detection that can be deployed in resource-limited settings. Table 2.1 summarizes related works, their limitations, and our contribution to this research.

Table 2.1: Summary of Related Works
Dataset | Method | Accuracy | Limitations | Our Contribution
HAM10000 | MobileNet [79] | 83% | Focuses on accuracy; lacks discussion on computational efficiency | Evaluates models for accuracy, F1-score, and computational efficiency
HAM10000 | FCRN (16 residual blocks) [80] | 85.5% with segmentation, 82.8% without | Computationally intensive; segmentation requirement | Proposes models for real-time applications
KCGMH, HAM10000 | DenseNet, EfficientNet [81] | 89.5% (KCGMH), 85.8% (HAM10000) | Focuses on binary classification | Evaluates performance on multi-class classification
HAM10000 | ESRGAN + ResNet-50 [82] | 86% | High resource usage | Develops a lightweight solution for mobile applications
HAM10000 | 13 deep transfer learning models [83] | 82.9% | Low accuracy; computationally expensive | Improves accuracy; develops a mobile-friendly solution

2.2.7 CNN Architecture

In this research, we use the CNN family of deep neural networks to detect skin lesions in our dataset. Therefore, we go through the architecture details and different layers of CNNs. The hidden layers of a CNN typically include convolution layers, nonlinear pooling layers, and fully connected layers [89]. Figure 2.12 shows the basic architecture of a CNN.

Figure 2.12: CNNs Architecture.

Convolutional Layer

The convolutional layer is the core building block of CNNs. It applies a set of learnable filters to the input image to extract features. Each filter scans through the input image and produces a feature map by performing element-wise multiplication and summation. The output feature maps capture different aspects of the input image, such as edges, textures, and patterns. This powerful aspect enables Convolutional Neural Networks to automatically extract essential features at each layer, eliminating the necessity for manual feature engineering or selection. CNNs inherently possess the capability to learn hierarchical representations of data, starting from low-level features such as edges and textures and progressing to higher-level features that capture complex patterns and structures. The output dimension of convolutional layers can be calculated using Equation 2.2.

[(n_h + 2p − f)/s + 1] × [(n_w + 2p − f)/s + 1]    (2.2)

n_h and n_w are the input image height and width, f is the filter size (both height and width), p is the padding (both height and width), and s is the stride length. This formula computes the height and width of the output feature map produced by a convolutional layer based on the input image's parameters, filter size, padding, and stride length.

Stride: The stride is a parameter within the filter, influencing the extent of movement across an image. When employing a stride of 1, the network processes data pixel by pixel. Alternatively, setting a stride of 2 entails processing data while skipping every other pair of adjacent pixels.

Figure 2.13: Filtering with the stride of 2.
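Before walking through the worked example in Figure 2.13, Equation 2.2 can be sanity-checked with a small helper. This is an illustrative sketch only; the function name and example sizes are assumptions, not taken from the thesis code.

def conv_output_size(n, f, p=0, s=1):
    """Output size along one spatial dimension for a convolution with
    input size n, filter size f, padding p, and stride s (Equation 2.2):
    floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# Example: a 224 x 224 input with a 3 x 3 filter and padding 1 keeps its
# spatial size at stride 1, while stride 2 halves it.
print(conv_output_size(224, f=3, p=1, s=1))  # 224
print(conv_output_size(224, f=3, p=1, s=2))  # 112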
Figure 2.13 indicates the calculation for filtering with a stride of 2. After each calculation, the filter skips one column as it moves across the image, and it skips one row after completing all columns. Below is the calculation for the first row of the output matrix.

(2 × (−1)) + (6 × 0) + (8 × 1) + (2 × (−2)) + (7 × 0) + (4 × 2) + ((−1) × (−1)) + (1 × 0) + (9 × 1) = 20
(8 × (−1)) + ((−1) × 0) + (0 × 1) + (4 × (−2)) + (3 × 0) + (2 × 2) + (9 × (−1)) + (0 × 0) + (3 × 1) = −18
(0 × (−1)) + (5 × 0) + (3 × 1) + (2 × (−2)) + (8 × 0) + ((−1) × 2) + (3 × (−1)) + (6 × 0) + (4 × 1) = −2

Padding: Padding pertains to the augmentation of an image with additional pixels during kernel processing. For instance, when employing zero-padding in a CNN, extra pixels with zero value are appended to the image. Applying filters or kernels to scan the image often results in a size reduction. To retain the original image dimensions and extract low-level features effectively, it becomes necessary to prevent such size reduction by adding supplementary pixels around the image boundaries. Figure 2.14 shows the padding for a 5 × 5 matrix in a 3 × 3 filtering process. Generating the results matrix involves multiplying each element of the 3 × 3 filter with its corresponding neighbor in the input matrix and summing these products. As an illustration, the first-row values of the result matrix are computed in the following manner:

(0 × (−1)) + (0 × 0) + (0 × 1) + (0 × (−2)) + (7 × 0) + (1 × 2) + (0 × (−1)) + (2 × 0) + (9 × 1) = 11
(0 × (−1)) + (0 × 0) + (0 × 1) + (7 × (−2)) + (1 × 0) + (2 × 2) + (2 × (−1)) + (9 × 0) + (3 × 1) = −9
(0 × (−1)) + (0 × 0) + (0 × 1) + (1 × (−2)) + (2 × 0) + (4 × 2) + (9 × (−1)) + (3 × 0) + (7 × 1) = 4
(0 × (−1)) + (0 × 0) + (0 × 1) + (2 × (−2)) + (4 × 0) + (8 × 2) + (3 × (−1)) + (7 × 0) + (6 × 1) = 15
(0 × (−1)) + (0 × 0) + (0 × 1) + (4 × (−2)) + (8 × 0) + (0 × 2) + (7 × (−1)) + (6 × 0) + (0 × 1) = −15

In this instance, with a stride value of one, we shift the filter one column to the right, compute the second value of the results matrix, and so on.

Figure 2.14: Padding in filtering [10].

Convolutions on RGB images

The composition of color images involves three distinct channels (red, green, and blue), each represented by a matrix of pixel intensity values. The fusion of these channels generates an RGB image. Notably, convolution operations for RGB images deviate from those applied to 2D images with one channel. Precisely, in RGB image convolution, the filter or kernel matches the number of channels in the input RGB image. Illustrated in Figure 2.15, an RGB image with the dimension of 6 × 6 × 3 undergoes convolution with a filter sized 3 × 3 × 3. This convolution yields a resulting output of dimensions 4 × 4, constituting a 2D image. Each pixel in this output is computed by multiplying and summing the 27 values within the 3 × 3 × 3 filter, aligned with their respective pixels in the input image. For the present example, no padding is applied, and a stride of 1 is assumed.

Convolutional layers typically integrate multiple filters in practical convolutional neural network (CNN) implementations. Incorporating a greater number of filters facilitates the extraction of additional features from the input data. The output is a volume where the number of output channels equals the number of filters. Each channel within the output represents the feature maps associated with its corresponding filter, as depicted in Figure 2.16. Here, the outcomes derived from two distinct filters yield an output featuring two channels.

Figure 2.15: Convolution on RGB images.

Figure 2.16: Convolution on RGB images with 2 filters.
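To make Equation 2.1 and the RGB example of Figures 2.15 and 2.16 concrete, the NumPy sketch below slides two 3 × 3 × 3 filters over a random 6 × 6 × 3 input with stride 1 and no padding, yielding a 4 × 4 × 2 output. It is a minimal illustration, not the implementation used in this thesis.

import numpy as np

def convolve(image, filters, stride=1):
    """Direct implementation of Equation 2.1, extended to multi-channel
    inputs and multiple filters. image: (H, W, C); filters: (f, f, C, K)."""
    H, W, C = image.shape
    f, _, _, K = filters.shape
    out_h = (H - f) // stride + 1   # Equation 2.2 with p = 0
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w, K))
    for k in range(K):               # each filter produces one output channel
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i*stride:i*stride+f, j*stride:j*stride+f, :]
                out[i, j, k] = np.sum(patch * filters[:, :, :, k])
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))       # 6 x 6 RGB image
filters = rng.standard_normal((3, 3, 3, 2))  # two 3 x 3 x 3 filters
print(convolve(image, filters).shape)        # (4, 4, 2)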
Figure 2.17: One layer of a convolutional network.

One Layer of a Convolutional Network

Examine a single layer within a convolutional neural network (CNN) and explore how neural network principles can illuminate its operations. Figure 2.17 illustrates such a layer, where the input is a 6 × 6 × 3 RGB image, and the output is a 4 × 4 × 2 feature map. Each channel in the output represents distinct features extracted by individual filters. In neural network mathematics, these filters can be viewed as matrices of weights, and the sample calculations may be coerced into standard matrix algebra. When the input is convolved with each filter, the resulting outputs undergo bias addition and nonlinear activation functions. These processed outputs from each filter are then stacked together to form the final output.

Convolutional layer notation

A summary of the notation used for a convolutional layer in a CNN follows. If layer l is a convolutional layer, the dimensions and notation for this layer are:

f^[l] = filter size, p^[l] = padding, s^[l] = stride, n_c^[l] = number of filters
Input size: n_H^[l−1] × n_W^[l−1] × n_c^[l−1]
Each filter size: f^[l] × f^[l] × n_c^[l−1]
Activations size (a^[l]): n_H^[l] × n_W^[l] × n_c^[l]
Activations matrix size (A^[l]): m × n_H^[l] × n_W^[l] × n_c^[l]
Weights size: f^[l] × f^[l] × n_c^[l−1] × n_c^[l], where n_c^[l] is the number of filters in layer l
Bias size: n_c^[l]
Output size: n_H^[l] × n_W^[l] × n_c^[l]
n_H^[l] = (n_H^[l−1] + 2p^[l] − f^[l]) / s^[l] + 1
n_W^[l] = (n_W^[l−1] + 2p^[l] − f^[l]) / s^[l] + 1

Pooling Layer

Pooling layers downsample the feature maps generated by convolutional layers, reducing their spatial dimensions. Max pooling and average pooling are two commonly used pooling techniques. Max pooling selects the maximum value within each pooling region, while average pooling computes the average values. Pooling helps to reduce computational complexity, control overfitting, and increase the network's translational invariance [90]. The pooling layer does not have any parameters for learning in the network. Figure 2.18 indicates the max and average pooling process. The pooling layer hyperparameters are outlined as follows:

f: filter size, s: stride, and pooling type: [max, average]

In most instances, padding is not applied in pooling layers, except for certain special cases. Typically, common values for the stride (s) and filter size (f) parameters are set to 2. This configuration results in a halving of the input dimension in each pooling layer.

Figure 2.18: Pooling layer.

Fully Connected (Dense) Layer

Fully connected layers are artificial neural network layers where each neuron is connected to every neuron in the previous layer. These layers integrate high-level features extracted by convolutional and pooling layers, which are then used for final classification or regression tasks. The outputs from fully connected layers are typically passed through activation functions.

Dropout

Deep neural networks are powerful tools in supervised learning but often face a significant challenge known as overfitting. Overfitting occurs when a model learns to perform exceptionally well on the training data but fails to generalize to new, unseen data. This issue is prevalent in deep networks due to their large number of parameters. Dropout is a regularization technique designed to combat overfitting.
It randomly removes a subset of neurons (along with their connections) from the neural network during training. This forces the network to learn more robust features, as no single neuron can rely on the presence of others. During training, dropout effectively generates numerous ”thinned” networks. During testing, the averaging effect of these thinned networks is approximated by using a single network with scaled-down weights, significantly reducing overfitting and improving generalization [91]. Dropout is typically governed by a probability parameter p, which determines the percentage of neurons to exclude from the network, often ranging between 0.2 and 0.5. Figure 2.19 illustrates the dropout process in neural networks. Please refer to the Evaluation metrics section (see Section 3.1.4) for a more detailed discussion of overfitting. Convolutional neural networks used in this research In this study, we employ pre-trained CNN models and adjust their parameters to address challenges in skin lesion detection. The CNN architectures utilized in this study include VGG16 [92], VGG19 [92], MobileNet [93], MobileNetV2 57 Figure 2.19: Dropout layer. [10]. [94], MobileNetV3 [13], and ResNet [12]. Below, we outline the architectures and distinctive features of these CNN families. VGG16 & VGG19 VGG16 and VGG19 are convolutional neural network models introduced by K. Simonyan and A. Zisserman from the Visual Geometry Group at the University of Oxford [92]. The numbers 16 and 19 indicate the number of weight layers in these models. These models gained prominence for their exceptional performance, achieving a top-5 test accuracy of 92.7% on the ImageNet dataset, comprising over 14 million images distributed across 1000 classes. VGG16 was notably submitted to the ILSVRC-2014 competition, where it showcased significant improvements over its predecessor, AlexNet. Figure 2.20 shows the VGG16 architecture. One of the key advancements of VGG16 over previous models like AlexNet [95] is its utilization of multiple 3×3 kernel-sized filters in place of larger kernel sizes (e.g., 11×11 and 5 × 5 in the first and second convolutional layers of 58 Figure 2.20: VGG16 architecture. [11]. AlexNet). The network architecture consists of convolutional layers followed by ReLU activation functions, with a fixed input size of 224 × 224 × 3 RGB images. All the convolutional layers in VGG16 have 3 × 3 filters, stride of 1, and padding of 1, so the input and output of each convolutional layer have the same size. VGG16 uses 3 × 3 filters to capture spatial features effectively. VGG16 incorporates spatial pooling through five max-pooling layers, which are interspersed among the convolutional layers. Max-pooling layers use 2 × 2 filters with a stride of 2, aiding in downsampling and feature extraction. VGG16 includes three fully connected (FC) layers following the convolutional layers. The first two contain 4096 channels each, and the third performs a 1000-way classification for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The final layer employs a softmax function for classification. 59 Residual Neural Network (ResNet) The ResNet is a deep learning model designed for computer vision tasks. It introduced significant advancements in the ILVRSC 2015 competition. ResNet achieved unprecedented results by effectively addressing challenges associated with training profound neural networks. 
ResNet surpassed other architectures by a substantial margin, winning the image classification task in ILVRSC 2015 with an impressive top-five error rate of 3.57% [12]. One of the primary issues ResNet aims to tackle is the Vanishing/Exploding Gradient Problem commonly encountered in deeper neural networks. As the number of layers increases, gradients of the loss function to the weights may become either excessively small or excessively large during backpropagation, hindering effective learning. The key components of the ResNet architecture include: Residual Block: Residual blocks are fundamental components of Residual Neural Networks. Unlike in plain neural networks, where the input is transformed by convolutional layers and passed through an activation function, ResNet introduces a residual connection. In a residual block, the input to the block is added to the output, creating a residual connection: a[l+2] = g [l+2] (z [l+2] + a[l] ) (2.3) Here, g [l+2] represents the activation function in layer l + 2. Figure 2.21 represents the residual block. Skip Connection: Skip connections play a crucial role in forming residual 60 Figure 2.21: Residual Block. Figure 2.22: Residual Network. blocks. They involve bypassing the residual block’s input over the convolutional layer and adding it to the block’s output. Stacked Layers: ResNet architectures are constructed by stacking multiple residual blocks together. By leveraging these stacked residual blocks, ResNet can achieve remarkable depth. Various versions of ResNet, including those with 50, 101, and 152 layers, were introduced. Figure 2.22 represents stacking residual blocks to make a residual network. Global Average Pooling (GAP): ResNet architectures typically employ Global Average Pooling as the final layer before the fully connected layer. 61 Figure 2.23: ResNet Architecture. [12]. GAP reduces spatial dimensions to a single value per feature map, providing a compact representation of the entire feature map. Figure 2.23 shows the architecture of ResNet models compared to VGG16 and plain networks. MobileNets MobileNets represent a class of efficient models tailored for mobile and embedded vision tasks. They employ a streamlined architecture that relies on depthwise separable convolutions to construct lightweight deep neural networks [93]. One notable feature of MobileNets is the incorporation of two straightforward global hyperparameters, which effectively balance latency and accuracy. These hyperparameters offer model builders the flexibility to select an appropriately sized model that aligns with the constraints of their specific application. MobileNets exhibit effectiveness across diverse applications and use cases, spanning object detection, fine-grained classification, analysis of facial at- 62 tributes, and large-scale geo-localization tasks. This versatility underscores the adaptability and practical utility of MobileNets in various scenarios, particularly those requiring efficient and accurate vision processing on resourceconstrained devices. The core concept underlying the MobileNet revolves around utilizing depthwise separable convolutions instead of conventional convolutions, aiming to diminish computational complexity and model size. This approach disassembles the standard convolution operation into two distinct stages: depthwise convolution and pointwise convolution. Depthwise convolution conducts independent convolutions on each input channel, utilizing a single filter per channel. 
Compared to conventional convolutions, this separation minimizes the number of parameters and computational requirements. It employs a 3 × 3 depthwise convolution with a stride of 1, followed by batch normalization and ReLU activation, thereby effectively capturing spatial information within each channel. Pointwise convolution operates on the output of the depthwise convolution, employing a 1 × 1 convolution to amalgamate information across channels. It uses a small number of 1 × 1 filters to facilitate cross-channel feature combinations and dimensionality reduction, enabling the mixing and transformation of features from diverse channels. Figure 2.24 indicates the standard convolution and depthwise separable convolution.

Figure 2.24: Normal Convolution Vs. Depthwise Separable Convolution. [6].

MobileNet efficiently reduces parameters and computations while maintaining satisfactory accuracy by dividing the convolution process into these two stages. Utilizing depthwise separable convolutions enables the network to obtain concise representations of input data, rendering MobileNet suitable for resource-limited scenarios. The architecture of MobileNet, depicted in Figure 2.25, comprises 13 blocks of depthwise separable convolutional layers as described in the original paper [93].

Figure 2.25: MobileNet architecture. [6].

MobileNet version 2 (MobileNetV2) represents a significant advancement in mobile model performance across various tasks and benchmarks and in different model sizes [94]. The architecture of MobileNetV2 revolves around an inverted residual structure, which diverges from traditional residual models by employing thin bottleneck layers at the input and output of the residual block. Figure 2.26 represents the MobileNetV2 architecture.

Figure 2.26: MobileNet Version 2 architecture. [6].

MobileNetV2 adopts an inverted residual structure, where the input and output of the residual block consist of thin bottleneck layers. This design contrasts with conventional residual models that typically use expanded representations in the input. Instead of employing standard convolutions, MobileNetV2 utilizes lightweight depthwise convolutions to filter features within the intermediate expansion layer. The MobileNetV2 authors report that this approach reduces computational complexity and model size while maintaining effective feature extraction [94].

Removal of Non-linearities in Narrow Layers: MobileNetV2 removes nonlinear activation functions in the narrow layers to preserve representational power, that is, the model's ability to capture and model the complex patterns and structures in the input data [94]. This design choice ensures that the model can capture intricate patterns and features, even in layers with fewer parameters.

Overall, MobileNetV2's innovative architectural design, incorporating inverted residual structures and lightweight depthwise convolutions, enhances performance across various tasks and model sizes. By prioritizing efficiency without compromising accuracy, MobileNetV2 remains a powerful solution for mobile and embedded vision applications [94].

Figure 2.27: MobileNet Version 3 architecture. [13].

MobileNet version 3 (MobileNetV3) introduces complementary search techniques and innovative architectural designs.
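Before turning to MobileNetV3's details, the Keras sketch below gives a rough sense of the parameter savings from depthwise separable convolutions by comparing a standard 3 × 3 convolution with a depthwise plus pointwise pair on the same input. The layer sizes are illustrative assumptions, not values from the thesis.

from tensorflow.keras import layers, Input, Model

inp = Input(shape=(112, 112, 64))

# Standard convolution: 3*3*64*128 weights (+ 128 biases)
standard = layers.Conv2D(128, 3, padding="same", use_bias=True)(inp)

# Depthwise separable: one 3x3 depthwise filter per input channel,
# then a 1x1 pointwise convolution to mix channels (as in MobileNetV1 blocks)
dw = layers.DepthwiseConv2D(3, padding="same", use_bias=True)(inp)
pw = layers.Conv2D(128, 1, use_bias=True)(dw)

print(Model(inp, standard).count_params())  # 73,856 parameters
print(Model(inp, pw).count_params())        # 8,960 parameters (roughly 8x fewer)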
Tailored specifically for mobile phone CPUs, MobileNetV3 integrates hardware-aware network architecture search (NAS) [13] alongside the NetAdapt algorithm [13], further refined through novel architecture advancements. This iteration introduces two variants, MobileNetV3-Large and MobileNetV3-Small, catering to high- and low-resource use cases. Figure 2.27 indicates the architecture of MobileNetV3.

Compared to MobileNetV2, MobileNetV3 incorporates the Squeeze and Excitation (SE) module, initially introduced in SENet [96], to enhance feature learning. To improve computational efficiency, MobileNetV3 replaces the sigmoid activation function in the SE module with the hard-sigmoid function, defined through Equations 2.4 and 2.5. Additionally, MobileNetV3 replaces the traditional ReLU activation function with the Swish activation function to enhance non-linearity [13]; its hardware-friendly variant, the hard-swish function, is given in Equation 2.6.

ReLU6(x) = min(max(0, x), 6)    (2.4)

h-sigmoid(x) = ReLU6(x + 3) / 6    (2.5)

h-swish(x) = x · h-sigmoid(x)    (2.6)

The Swish activation can be computationally inefficient on mobile and embedded hardware [97, 13]. This issue spurred the hard-swish (H-Swish) activation, which was incorporated in MobileNetV3. H-Swish retains the non-linear properties of Swish while offering improved efficiency for mobile hardware implementations. This ensures that MobileNetV3 maintains high performance while being well-suited for deployment on mobile and embedded devices [13]. Figure 2.28 represents the activation functions of the MobileNetV3 model.

Figure 2.28: MobileNetV3 activation functions. [13].

2.2.8 Transfer Learning

With the remarkable advancements in deep learning, transfer learning has become a central element in various computer vision fields, including multimedia [98], surveillance [99], and medical applications [100]. The concept involves leveraging pre-trained models originally trained on non-medical or natural image datasets. These models are then fine-tuned with new data to adapt to specific tasks [101]. Transfer learning is crucial in deploying convolutional neural networks for diagnostic imaging tasks such as skin cancer detection [37], Alzheimer's Disease diagnosis [102], and chest X-ray analysis [103]. Figure 2.29 illustrates the architectures used in the transfer learning approach.

Typically, open-source pre-trained models are trained on extensive datasets containing numerous classes. For instance, the ImageNet dataset comprises 14 million images distributed across 1000 classes. Transfer learning allows us to modify pre-trained networks by replacing the top layer with an output layer tailored to our dataset. Depending on the size of our dataset, we can adjust or fine-tune the parameters of the pre-trained models to better suit our specific needs.

Figure 2.29: Transfer learning methodology.

Chapter 3 Methodology

This chapter focuses on detecting skin cancer in the HAM10000 dataset and the pre-trained CNN methods. Figure 3.1 visually depicts the critical steps of our methodology. We use an ASUS TUF Gaming A15 system with an AMD Ryzen 7 6800H processor with Radeon Graphics (3201 MHz, 8 cores, 16 logical processors) and 16 GB of RAM.

3.1 Main stages of the methodology

The methodology consists of four main steps: pre-processing, data augmentation, model architecture, and evaluation metrics.

Figure 3.1: Our methodology process steps.

3.1.1 Pre-processing

Pre-processing plays a crucial role in detecting skin cancer using deep learning models.
This improves the performance of our models. The initial stage of our analysis involved reading images from the dataset and pre-processing them. The initial size of the images is 600 × 450 × 3, and we resized them to dimensions of 224 × 224 × 3, ensuring compatibility with the convolutional neural network (CNN) architectures employed. Additionally, we use built-in functions in the Keras [104] library in Python for data normalization to enhance the uniformity of their pixel values, thus preparing them for subsequent training procedures. After pre-processing, we divided the dataset into training, validation (development), and test sets with an 80/10/10 ratio.

3.1.2 Data Augmentation

The HAM10000 dataset includes an imbalanced distribution, where some categories have many images while others have only a few. The imbalance is one of the significant challenges because classifiers tend to be influenced by the dominant class while neglecting the smaller ones [105]. This means that the classifier does not achieve the desired level of accuracy across all classes. The idea of resampling can be applied to tackle this problem. Data augmentation, achieved by applying transformations to images, is commonly employed to mitigate the challenges of imbalanced datasets. The available data can be diversified by augmenting the dataset through transformations such as rotation, flipping, scaling, and cropping, helping to address the class imbalance issue. This augmentation process enriches the dataset with variations of existing images, providing the model with a more comprehensive understanding of different instances within each class. Consequently, it enhances the model's ability to generalize effectively across all classes, even in scenarios where certain classes are underrepresented in the original dataset.

3.1.3 Model Architecture

We use pre-trained CNN models and fine-tune their parameters to alleviate skin lesion detection issues. Transfer learning gives us the flexibility of using all the parameters of these powerful CNN models or freezing the parameters and just using pre-trained weights. The exploration of transfer learning, utilizing pre-trained models such as VGG16 [92], VGG19 [92], MobileNet [93], MobileNetV2 [94], MobileNetV3 [13], and ResNet [12], is integral to the project. Transfer learning is employed to enhance the generalization ability of computer-aided diagnostic systems. Figure 3.2 represents the model architecture, where we drop the top layer of pre-trained models and add average pooling, dropout, and softmax layers with the number of classes in our dataset.

Figure 3.2: Model architecture.

In our skin lesion detection application, we employ transfer learning by adjusting the pre-trained weights of VGG16, VGG19, ResNet50, MobileNetV1, MobileNetV2, and MobileNetV3, which have been trained on the ImageNet dataset comprising over 14 million images across 1000 classes. This allows us to capitalize on the rich feature representations these models learn from diverse categories. We fine-tune pre-trained model parameters, including weights and biases, based on the HAM10000 dataset, enhancing the performance and accuracy of our skin lesion detection systems. This process involves removing the top layer of the networks and replacing it with average pooling, dropout, and softmax layers tailored to classify our dataset's seven categories.
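A minimal Keras sketch of this head replacement and two-stage fine-tuning (revisited in Section 4.1.1) is shown below, using MobileNetV2 as the example base. The optimizer settings, layer index, and dataset objects are illustrative assumptions rather than the exact configuration used in the thesis.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # HAM10000 lesion categories

# ImageNet-pretrained backbone without its original 1000-way top layer
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# Replace the top with average pooling, dropout, and a 7-way softmax
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Stage 1: freeze the pre-trained weights and train only the new head
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # hypothetical datasets

# Stage 2: unfreeze the upper part of the backbone and fine-tune
# with a smaller learning rate (the layer index is illustrative)
base.trainable = True
for layer in base.layers[:100]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=50)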
By adapting the pre-trained weights to our unique classification tasks, we optimize the performance of our models for effective skin cancer lesion detection, saving computational resources and training time while leveraging the generalization power of these architectures.

3.1.4 Evaluation Metrics

Evaluating the performance of a skin cancer detection model is essential to assess its accuracy and effectiveness. In deep learning, it is crucial to ensure that a model performs well on the training data and generalizes effectively to unseen data. This is where the concepts of overfitting and underfitting become particularly important. Overfitting and underfitting are two critical challenges in machine learning that directly impact model performance. Overfitting occurs when a model is too complex, capturing noise and outliers in the training data rather than the underlying distribution. This results in excellent performance on the training data but poor generalization to new data. Techniques such as dropout, cross-validation, and regularization are commonly employed to mitigate overfitting [106]. In contrast, underfitting happens when a model is too simple to capture the underlying patterns in the data. An underfit model fails to perform well even on the training data, leading to poor predictions. To address underfitting, one might consider increasing model complexity, using more sophisticated algorithms, or providing more features to the model [106].

Based on established evaluation metrics in the skin cancer image classification domain [15, 19, 56, 80], this thesis assesses the performance of the models using metrics such as accuracy and weighted F1-score. Overfitting or underfitting is monitored by comparing the performance of both the training and validation datasets.

Accuracy: Accuracy measures the proportion of correctly classified instances from the total number of cases in the dataset. Equation 3.1 indicates how we calculate the accuracy.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3.1)

Where: TP (True Positives) are the instances correctly classified as positive. TN (True Negatives) are the instances correctly classified as negative. FP (False Positives) are the instances incorrectly classified as positive. FN (False Negatives) are the instances incorrectly classified as negative.

Precision: Precision measures the proportion of true positive predictions among all positive predictions made by the model. Equation 3.2 shows the formula for calculating the precision.

Precision = TP / (TP + FP)    (3.2)

Recall (Sensitivity): Recall measures the proportion of true positive predictions among all actual positive instances in the dataset. The formula for calculating recall is shown in Equation 3.3.

Recall = TP / (TP + FN)    (3.3)

F1-Score: The F1-score is the harmonic mean of precision and recall, balancing the two metrics. Equation 3.4 indicates the formula for the F1-score.

F1 = 2 × (Precision × Recall) / (Precision + Recall)    (3.4)

Weighted F1-Score: The weighted F1-score is a metric used to evaluate the performance of a classification model, particularly in scenarios where there is class imbalance [107]. It is computed as the weighted average of the F1-scores for each class, with the weights assigned based on the number of actual occurrences (true instances) of each class in the dataset. This weighting ensures that classes with more instances significantly influence the final score, which can be crucial in datasets where some classes are underrepresented.
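In practice, these metrics, including the weighted F1-score defined next, can be computed directly with scikit-learn; the snippet below is a small illustration with made-up label vectors rather than the thesis' actual predictions.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth and predicted class indices for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

print(accuracy_score(y_true, y_pred))                       # 0.75
print(precision_score(y_true, y_pred, average="weighted"))  # per-class precision, weighted by support
print(recall_score(y_true, y_pred, average="weighted"))
print(f1_score(y_true, y_pred, average="weighted"))         # weighted F1, as in Equation 3.5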
The F1-score is first calculated for each class using Equation 3.4, and the weighted F1-score is then computed using Equation 3.5.

Weighted F1 = Σ_{i=1}^{C} F1_i × W_i    (3.5)

C denotes the total number of classes in the dataset, and W_i represents the weight for class i. Specifically, W_i is the proportion of true instances of class i relative to the total number of instances in the dataset. This means that classes with more true instances contribute more to the weighted F1-score, reflecting their prevalence in the dataset.

Training time: We assess the models' performance by considering their training time alongside other evaluation metrics to determine which model is more efficient and suitable for real-world applications, particularly low-power devices.

Chapter 4 Results & Discussion

In this chapter, we present the findings of our investigation into skin lesion detection using deep learning models.

4.1 Transfer Learning and Data Augmentation

4.1.1 Parameter Tuning and Implementation Details

We fine-tuned the parameters for the pre-trained models, including MobileNetV1, MobileNetV2, MobileNetV3, VGG16, VGG19, and ResNet50. Fine-tuning was carried out on specific layers tailored to each architecture, as follows:

Tuning Process: To optimize the performance of these models, we conducted extensive parameter tuning. The critical parameters adjusted during this process included:

Batch Size: We experimented with 16, 32, 64, 128, and 256 batch sizes to identify the optimal size for efficient learning and model convergence.
Learning Rate: Various learning rates, ranging from 0.00001 to 0.01, were tested to ensure the models converged effectively without overshooting the optimal point.
Number of Epochs: We varied the number of epochs, testing 10, 20, 50, 70, and 100 epochs to balance sufficient training and the prevention of overfitting.
Layers for Fine-Tuning: Depending on the architecture, various layers were tested to determine the best configuration for the model. Finally, specific layers were selected for fine-tuning based on performance.
Dropout: Different dropout probabilities ranging from 0.1 to 0.9 were used to prevent overfitting and find the most robust model.

Implementation: The tuning process was implemented using Python's Keras [104] library. For each model, we monitored performance metrics on the validation set to identify the best combination of parameters. The final results were reported based on validation accuracy and loss. Here are the parameters selected during the freezing and unfreezing stages:

Freezing Stage:
Optimizer: Adam with parameters β1 = 0.9, β2 = 0.999, and α = 0.001.
Dropout: p = 0.2
Batch Size: 32
Epochs: 20

Unfreezing Stage:
Learning Rate: α = 0.0001
Epochs: 50

This detailed tuning and implementation strategy ensured that each model was fine-tuned to achieve optimal performance on our skin cancer detection task.

4.1.2 CNN models without data augmentation

In this section, we present the results obtained from our experimentation with transfer learning techniques applied to pre-trained CNN models on the HAM10000 dataset. We initially explore the performance of these models without any data augmentation, freezing the pre-trained weights while training only the top layers on our dataset. Table 4.1 presents the outcomes obtained from employing pre-trained models with frozen weights and without data augmentation, focusing on accuracy and F1-score evaluation metrics.
Based on the results, ResNet50 achieves 79 the highest accuracy among the considered models, indicative of its superior adaptability when utilizing frozen pre-trained weights and adjusting the upper layers to our dataset. Conversely, MobileNetV3 exhibits the most efficient runtime, emphasizing its potential suitability for real-time applications and low-power devices. Table 4.1Pre-trained CNNs without Data Augmentation. Training Training Validation Validation Model Accuracy F1-Score Accuracy F1-Score Run Time VGG16 0.7679 0.5350 0.749 0.5289 128m 39s VGG19 0.7644 0.5777 0.754 0.5721 162m 19s ResNet50 0.8534 0.7531 0.8085 0.6451 61m 48.5s MobileNetV1 0.8156 0.6878 0.7944 0.6065 20m 51.4s MobileNetV2 0.8179 0.7039 0.7752 0.5469 22m 53.9s MobileNetV3 0.7843 0.6027 0.7772 0.5429 6m 57.6s Parameter Tuning and Implementation Details: ” Following that, we proceed with fine-tuning the parameters of the pretrained models, focusing on specific layers tailored to each architecture. For MobileNetV1, MobileNetV2, MobileNetV3, VGG16, VGG19, and ResNet50, the fine-tuning of parameters commences from layers 50, 100, 120, 10, 13, and 120 out of a total of 86, 154, 157, 19, 22, and 175 layers, respectively. Table 4.2 showcases transfer learning results utilizing pre-trained CNN models, where weights are trained based on the HAM10000 dataset without resampling or data augmentation. ResNet50 is the top performer in training and validation accuracy and the F1-score. Furthermore, MobileNetV3 demon80 strates rapid training, requiring less than 9 minutes, while achieving a training accuracy of 98.54% and a validation accuracy of 85.79%. Table 4.2Fine-tuned CNNs without Data Augmentation. Training Training Validation Validation Model Accuracy F1-Score Accuracy F1-Score Run Time VGG16 0.9842 0.8868 0.8427 0.7752 263m 16.5s VGG19 0.9868 0.8903 0.8306 0.7511 277m 51s ResNet50 0.9934 0.9233 0.8639 0.8131 92m 40.1s MobileNetV1 0.9712 0.8561 0.833 0.761 32m 21.2s MobileNetV2 0.9823 0.8839 0.8538 0.7782 37m 22.5s MobileNetV3 0.9854 0.8871 0.8579 0.7802 8m 44.8s Examine the confusion matrix of the models on the test data. Table 4.3 illustrates the true labels of the test dataset, which comprises 1001 images from 7 classes. Table 4.4 displays the confusion matrix results for MobileNetV1 on the test dataset without augmentation. The confusion matrix results reveal that MobileNetV1 successfully identifies the nv skin lesion family, achieving 642 correct predictions out of 654 instances. Bkl lesions also show a relatively high number of correct predictions (77). However, the model struggles with the vasc lesion family, which is frequently misclassified. Precisely, mel lesions are often mistaken for nv and bkl, with 64 and 30 instances, respectively. Akiec and df also show considerable misclassifications between these classes and others like bcc and nv. This indicates that MobileNetV1 has difficulty distinguishing between these classes. Table 4.5 illustrates the confusion matrix results for MobileNetV2 on the 81 Table 4.3: True labels of the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 37 0 0 0 0 0 0 0 51 0 0 0 0 0 0 0 108 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 654 0 0 0 0 0 0 0 131 0 0 0 0 0 0 0 10 Table 4.4: Confusion matrix of MobileNetV1 on the test dataset. 
True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 19 1 0 0 2 2 0 3 31 2 1 2 1 0 9 9 77 3 7 30 0 0 0 2 4 0 0 0 6 9 26 2 642 64 3 0 1 1 0 1 33 0 0 0 0 0 0 1 7 82 Table 4.5: Confusion matrix of MobileNetV2 on the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 20 6 5 1 3 2 0 1 24 0 1 3 1 0 5 5 71 2 13 30 1 0 0 0 4 0 0 0 10 12 28 2 628 47 3 1 3 4 0 6 51 0 0 1 0 0 1 0 6 test dataset without augmentation. While MobileNetV2 performs well on identifying nv (628 correct predictions), it struggles with mel lesions correctly identifying only 51 out of 131 samples, frequently misclassifying them as nv and bkl. There are also notable confusions between akiec and mel with nv. Table 4.6 indicates the confusion matrix for MobileNetV3 on the test dataset without augmentation. The model correctly identifies 16 instances of akiec but struggles with misclassifications, especially with bkl and nv. It correctly predicts 36 cases of bcc lesions, yet confusion remains with akiec, bkl, nv, and mel. While the model excels in identifying bkl with 64 correct predictions, it also misclassifies these as nv and mel. Df is relatively well-predicted, though minor confusions with other types persist. The model performs strongly in predicting nv with 602 correct predictions but faces challenges with misclassifications such as bkl and mel. Similarly, while it correctly identifies 65 instances of mel, significant misclassifications with bkl 83 Table 4.6: Confusion matrix of MobileNetV3 on the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 16 4 4 1 3 2 0 2 36 4 2 8 1 1 10 4 64 0 14 22 0 0 0 2 6 1 1 0 5 3 24 1 602 40 1 4 3 10 0 26 65 0 0 1 0 0 0 0 8 and nv occur. Vasc identification is highly accurate, with eight instances correctly classified. Overall, MobileNetV3 demonstrates excellent performance in identifying nv and vasc, but struggles significantly with distinguishing mel and faces challenges with certain other class confusions. Table 4.7 represents the confusion matrix for VGG16 on the test dataset without augmentation. The model identifies 18 akiec instances. However, it misclassified this akiec class with mel and nv. Regarding the bcc class, the model demonstrates a correct prediction rate of 33 cases; nevertheless, confusion persists with akiec, bkl, nv, and mel. While the model accurately predicts 65 bkl cases, it misclassifies some instances as nv and mel. Df prediction encounters minor confusion with other lesion types. The model showcases proficiency in predicting nv, accurately identifying 618 instances, albeit facing challenges with misclassifications as bkl, bcc, and mel. Similarly, while correctly identifying 87 mel instances, misclassifications with nv are 84 Table 4.7: Confusion matrix of VGG16 on the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 18 7 5 0 0 3 0 2 33 3 1 5 0 1 3 1 65 1 6 5 0 0 0 1 7 1 0 0 5 5 16 0 618 36 0 9 5 18 1 23 87 0 0 0 0 0 1 0 9 noted. Vasc classification is highly accurate, with nine instances correctly identified out of 10. In summary, while the VGG16 model performs well in identifying nv and vasc lesions, it struggles when classifying mel and akiec instances. Table 4.8 depicts the confusion matrix for VGG19 on the test dataset without augmentation. The model successfully identifies 25 instances of akiec. However, it misclassifies some akiec cases as bcc, bkl, and nv. 
Regarding the bcc class, the model achieves an accuracy rate of 30 cases; nonetheless, confusion persists with akiec, bkl, nv, and mel. While accurately predicting 75 bkl cases, the model also misclassifies some as nv and mel. Confusion with other lesion types is observed in df prediction. Notably, the model demonstrates proficiency in predicting nv, correctly identifying 614 instances, despite encountering challenges with misclassifications as bkl, bcc, and mel. Similarly, while correctly identifying 76 mel instances, misclassifications with 85 Table 4.8: Confusion matrix of VGG19 on the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 25 7 5 3 3 5 0 2 30 2 0 5 0 0 7 3 75 1 12 14 0 0 0 0 5 0 0 0 3 10 15 1 614 34 0 0 1 11 0 19 76 0 0 0 0 0 1 2 10 nv, bkl, and akiec are observed. Vasc classification is highly accurate, with all ten cases correctly identified. In conclusion, although the VGG19 model performs well in identifying vasc lesions, it faces difficulties in accurately classifying df instances. Table 4.9 shows the confusion matrix for ResNet50 on the test dataset without augmentation. The model identifies 16 instances of akiec, but it incorrectly categorizes some as bkl and nv. For the bcc class, the model achieves an accuracy rate of 28 cases; however, confusion remains with other classes. While accurately predicting 63 bkl cases, it also misclassifies some instances as nv and mel. Df prediction, with six cases correct out of 10, experiences confusion with other lesion types. Notably, the model demonstrates proficiency in predicting nv, accurately identifying 630 instances, despite encountering challenges with misclassifications as bkl, bcc, vasc, and mel. Similarly, while correctly identifying 65 mel instances, misclassifications with nv 86 Table 4.9: Confusion matrix of ResNet50 on the test dataset. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 16 3 2 0 0 2 0 2 28 0 0 5 1 0 7 3 63 1 5 14 0 2 3 2 6 1 0 0 9 4 18 3 630 45 0 1 8 21 0 9 65 1 0 2 2 0 4 4 9 and bkl are observed. Vasc classification stands out for its high accuracy, correctly identifying nine out of 10 cases. In summary, although the ResNet50 model performs well in identifying nv lesions, it faces difficulties in accurately classifying akiec instances. In summary, the findings from Table 4.1 and Table 4.2 underscore the efficacy of transfer learning in leveraging pre-trained models for skin lesion detection, despite working with imbalanced datasets such as HAM10000. However, the confusion matrices provide deeper insights into how dataset imbalance impacts model classification challenges. Moreover, the results suggest that models perform better in classes with more training data, highlighting the importance of dataset balance in improving classification accuracy. 87 4.1.3 CNN models with data augmentation After assessing the effectiveness of transfer learning in using pre-trained models across various image types and fine-tuning the weights based on our dataset, we try to address the imbalance issue in the HAM10000 dataset. To enhance the performance of our model, we applied transfer learning in conjunction with data augmentation techniques. Data augmentation helps to increase both the diversity and size of the training dataset without the need for additional data collection. Specifically, we employed the following augmentation methods: Geometric Transformations: Rotation: Random rotations up to ±20 degrees. 
Horizontal Flip: Random flipping of images. Random Cropping: Randomly cropping a portion of the image to simulate different perspectives. Color Transformations Brightness Adjustment: Random adjustments to the brightness of the images. These transformations were applied to generate additional instances for each class in the training dataset. Figure 4.1 shows the distribution of images across the different classes before applying data augmentation. Each class is represented by numerical values where 0, 1, 2, 3, 4, 5, and 6 correspond to ’nv,’ ’mel,’ ’bkl,’ ’bcc,’ 88 Figure 4.1: Distribution of skin lesions in training dataset before augmentation. ’akiec,’ ’vasc,’ and ’df,’ respectively. This figure highlights the class imbalance present in the original dataset. After applying the augmentation techniques, the dataset was balanced by generating additional samples for each class. Specifically, each class contained exactly 2,000 samples, resulting in a balanced training dataset with a total of 14,000 images (2,000 samples per each of the 7 classes). Table 4.10 presents the outcomes obtained using pre-trained CNN models without any fine-tuning on weights coupled with data augmentation. Once more, ResNet50 emerges as the top performer, demonstrating strong performance in training and validation accuracy and F1-score metrics. Specifically, by solely training the top layers and leveraging knowledge transferred from other datasets, ResNet50 achieves a validation accuracy of 84.94% and an F1-score of 83.59%. Conversely, MobileNetV3 demonstrates comparatively lower accuracy and F1-score than ResNet50 and different versions of the MobileNet model. However, it stands out in terms of runtime efficiency and computational costs. Following data augmentation, we proceeded to fine-tune the parame89 Table 4.10Frozen Weight CNNs with Data Augmentation. Training Training Validation Validation Model Accuracy F1-Score Accuracy F1-Score Run Time VGG16 0.7753 0.7077 0.7668 0.6954 182m 33.6s VGG19 0.7788 0.7268 0.7690 0.7055 228m 11.3s ResNet50 0.8550 0.8494 0.8359 0.8171 86m 14.6s MobileNetV1 0.8089 0.8013 0.8041 0.7663 26m 51.5s MobileNetV2 0.8148 0.8056 0.8115 0.7834 30m 10.9s MobileNetV3 0.7961 0.7770 0.7833 0.7581 10m 4.2s ters of the pre-trained models, incorporating the augmented data. Specifically, for MobileNetV1, MobileNetV2, MobileNetV3, VGG16, VGG19, and ResNet50, parameters were fine-tuned from specific layers within each architecture. These layers were chosen based on their position within the network architecture to strike a balance between retaining the learned features from the pre-trained model and adapting to the specifics of the target dataset. Table 4.11 showcases the outcomes obtained through transfer learning utilizing pre-trained CNN models, where the weights are trained based on the HAM10000 dataset with data augmentation. After fine-tuning with augmented data, all methods exhibited commendable accuracy and F1-score performance. ResNet50, once again, emerged as a top performer, achieving 99.89% accuracy on the training dataset and 92.31% accuracy on the validation dataset. Following ResNet50, MobileNetV2, MobileNetV3, VGG16, VGG19, and MobileNetV1 demonstrated progressively better accuracy performance. Runtime is a crucial metric for gauging the computational costs incurred 90 by the models. Larger networks such as VGG19, VGG16, and ResNet50 incurred significantly higher computational costs. Among them, VGG19, with almost 430 minutes, had the highest training time. 
Conversely, MobileNet models demonstrated notable efficiency in terms of computational costs. Among these, MobileNetV3 stood out with a training time of less than 13 minutes, making it an optimal choice for resource-constrained devices like smartphones. This comprehensive evaluation underscores the effectiveness of transfer learning and data augmentation in addressing class imbalance and enhancing model performance across various CNN architectures. ResNet50 performs best, while MobileNetV3 offers an attractive balance between performance and computational efficiency. Let’s proceed to table 4.11 to delve into each model’s detailed performance metrics and runtime statistics. Table 4.11Fine-tuned CNNs with Data Augmentation. Training Training Validation Model Accuracy F1-Score Accuracy Validation Run Time F1-Score VGG16 0.9936 0.9896 0.9106 0.9052 327m 23.1s VGG19 0.9944 0.9902 0.9092 0.9064 430m 20.8s ResNet50 0.9989 0.9957 0.9231 0.9198 133m 28.1s MobileNetV1 0.9949 0.9815 0.9011 0.8981 48m 11.9s MobileNetV2 0.9959 0.9911 0.9161 0.9131 38m 9s MobileNetV3 0.9951 0.9903 0.9155 0.9112 13m 27.9s The confusion matrices for the models on the test data after data augmentation provide valuable insights into their performance. Let’s analyze 91 each model’s results: Table 4.12 shows the confusion matrix for MobileNetV1 on the test dataset after augmentation. The model correctly identifies 29 instances of akiec, though it misclassifies several cases as bkl and nv. The model achieves 40 correct predictions for bcc cases, yet there is still notable confusion with nv and other classes. It accurately predicts 89 instances of bkl, but some are incorrectly classified as nv and bcc. For df class, the model correctly identifies 7 out of 10 instances, despite occasional confusion with other lesion types. The model predicts nv, with 650 correct identifications out of 654 cases. However, it struggles with misclassifications involving bkl, bcc, and akiec. While 58 melanoma lesions are correctly identified, many are misclassified as nv and bkl. The classification of vasc lesions is highly accurate, with 8 out of 10 instances correctly identified. Overall, MobileNet demonstrates a high accuracy rate of 99.38% for nv lesions but struggles significantly with mel lesions, achieving a detection rate of only 44.27%. Table 4.13 displays the confusion matrix results for MobileNetV2 on the test dataset after data augmentation. The model accurately identifies 33 instances of akiec but misclassifies some as bkl and nv. It correctly predicts 34 bcc cases, though confusion with other classes like nv and akiec persists. The model predicts 86 instances of bkl accurately but misclassifies some as akiec and nv. The model achieves a high accuracy rate for df, correctly identifying 8 out of 10 cases, though it occasionally confuses other lesion types. MobileNetV2 is proficient in predicting nv, with 643 correct identifications out of 654 instances, but faces challenges with misclassifications involving bkl, mel, and akiec. It correctly identifies 80 mel instances but often misclassifies these as nv and bkl. The classification of vasc lesions is accurate, with 8 out 92 Table 4.12: Confusion matrix of MobileNetV1 on the test dataset after data augmentation. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 29 0 0 0 1 3 0 2 40 2 1 1 3 0 4 4 89 1 2 20 0 0 0 1 7 0 0 0 2 6 15 1 650 45 2 0 1 1 0 0 58 0 0 0 0 0 0 2 8 of 10 instances correctly identified. 
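As an aside, per-class breakdowns like the confusion matrices discussed in this section can be produced with scikit-learn; the sketch below uses short, hypothetical lists of true and predicted lesion labels rather than the thesis' outputs.

from sklearn.metrics import confusion_matrix, classification_report

CLASSES = ["akiec", "bcc", "bkl", "df", "nv", "mel", "vasc"]

# y_true and y_pred would hold one label per test image (hypothetical here)
y_true = ["nv", "nv", "mel", "bkl", "vasc", "mel", "nv"]
y_pred = ["nv", "nv", "nv",  "bkl", "vasc", "mel", "bkl"]

# In scikit-learn's convention, rows are true classes and columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=CLASSES))
print(classification_report(y_true, y_pred, labels=CLASSES, zero_division=0))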
Table 4.14 indicates the confusion matrix of MobileNetV3 on the test dataset with data augmentation. The model correctly identifies 29 instances of akiec, though some are misclassified as bkl and nv. It achieves 46 correct predictions for bcc, yet confusion with other classes persists. The model accurately predicts 84 instances of bkl, but some are misclassified as nv, bcc, and akiec. It demonstrates a high accuracy rate for df, correctly classifying 9 out of 10 instances despite occasional confusion with other lesion types. MobileNetV3 excels in predicting nv, accurately identifying 644 out of 654 instances, although it faces challenges with misclassifications involving bkl, mel, and bcc. The model correctly identifies 92 instances of mel but frequently misclassifies these as nv and bkl. The classification of vasc lesions is highly accurate, with all 10 cases correctly identified, achieving 100% accu- 93 Table 4.13: Confusion matrix of MobileNetV2 on the test dataset after data augmentation. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 33 5 4 0 2 1 0 0 34 0 0 1 1 0 2 3 86 1 5 18 1 0 0 0 8 0 0 0 2 7 15 1 643 31 1 0 1 3 0 2 80 0 0 1 0 0 1 0 8 racy for this lesion type. In summary, MobileNetV3 performs exceptionally well in detecting vasc and nv lesions, with a high accuracy rate of 98.47% for nv. However, it struggles with the mel lesion family, achieving a detection rate of only 70.23%. Table 4.15 represents the confusion matrix for VGG16 on the test dataset with data augmentation. The model identifies 30 instances of akiec but misclassifies some as mel and nv. It achieves 41 correct predictions for bcc, yet confusion with other classes remains an issue. The model accurately predicts 81 instances of bkl but misclassifies some as nv, mel, bcc, and akiec. It shows a high accuracy rate for df, correctly classifying 9 out of 10 instances despite occasional confusion with other lesion types. VGG16 is proficient in predicting nv, accurately identifying 638 out of 654 cases, but faces challenges with misclassifications involving mel, bkl, and bcc. It correctly identifies 100 94 Table 4.14: Confusion matrix of MobileNetV3 on the test dataset after data augmentation. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 29 2 3 0 0 1 0 2 46 3 1 2 1 0 2 1 84 0 3 11 0 0 0 1 9 0 0 0 2 1 12 0 644 26 0 2 1 5 0 5 92 0 0 0 0 0 0 0 10 mel instances but often misclassifies them as nv and bkl. The classification of vasc lesions is highly accurate, with all ten instances correctly identified, achieving 100% accuracy for this lesion type. While VGG16 achieves perfect accuracy in detecting vasc lesions, it shows lower accuracy in detecting bkl lesions. Table 4.16 represents the confusion matrix for VGG19 on the test dataset with data augmentation. The model correctly identifies 35 instances of akiec but misclassifies some cases as bkl and nv. It achieves 38 correct predictions for bcc, though confusion with other classes persists. The model accurately predicts 89 instances of bkl but misclassifies some as nv, mel, and akiec. It shows a high accuracy rate for df, correctly classifying 8 out of 10 instances despite occasional confusion with other lesion types. VGG19 is proficient in predicting nv, accurately identifying 637 out of 654 cases, but faces chal- 95 Table 4.15: Confusion matrix of VGG16 on the test dataset after data augmentation. 
True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 30 4 3 0 0 2 0 1 41 2 0 2 0 0 1 1 81 1 3 4 0 0 0 1 9 1 0 0 2 2 9 0 638 25 0 3 3 12 0 10 100 0 0 0 0 0 0 0 10 lenges with misclassifications involving mel, bkl, akiec, and bcc. It correctly identifies 90 mel instances but frequently misclassifies them as nv and bkl. The classification of vasc lesions is highly accurate, with all ten instances correctly identified, achieving 100% accuracy for this lesion type. While VGG19 excels in detecting vasc lesions, it faces significant challenges in accurately detecting the mel skin lesion family. Table 4.17 shows the confusion matrix for ResNet50 on the test dataset with data augmentation. The model successfully identifies 32 instances of akiec but misclassifies some as bkl and nv. It achieves 43 correct predictions for bcc, but confusion with other classes remains an issue. The model accurately predicts 88 instances of bkl but misclassifies some as nv and mel. It demonstrates a high accuracy rate for df, correctly classifying 9 out of 10 instances despite occasional confusion with other lesion types. ResNet50 is 96 Table 4.16: Confusion matrix of VGG19 on the test dataset after data augmentation. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 35 4 3 1 2 3 0 0 38 1 0 2 0 0 1 2 89 0 5 10 0 0 0 0 8 0 0 0 1 6 9 1 637 26 0 0 1 6 0 7 90 0 0 0 0 0 1 2 10 proficient in predicting nv, accurately identifying 646 out of 654 cases, but faces challenges with misclassifications involving mel, bkl, and bcc. It correctly identifies 89 mel instances but frequently misclassifies them as nv and bkl. The classification of vasc lesions is highly accurate, with all ten instances correctly identified, achieving 100% accuracy for this lesion type. ResNet50 achieves the highest accuracy in detecting vasc lesions but struggles significantly with the mel skin lesion family. In summary, the results confirm the effectiveness of transfer learning in combination with data augmentation for skin lesion detection. While ResNet50 demonstrates acceptable accuracy on both the dev and test datasets, MobileNetV3 is a promising choice for real-world deployment due to its efficient runtime and compatibility with low-power devices. These findings are consistent with the performance metrics presented in Table 4.10 and Table 97 Table 4.17: Confusion matrix of ResNet50 on the test dataset after data augmentation. True Label Predicted Label akiec bcc bkl df nv mel vasc akiec bcc bkl df nv mel vasc 32 1 1 0 0 1 0 1 43 0 0 2 1 0 2 1 88 0 2 10 0 0 1 1 9 0 0 0 2 2 6 1 646 28 0 0 3 11 0 3 89 0 0 0 1 0 1 2 10 4.11, reinforcing the efficacy of the applied methodologies. 4.2 Discussion The analysis presented in this thesis highlights the effectiveness of transfer learning and data augmentation techniques in improving the performance of deep learning models for skin lesion detection. By leveraging pre-trained models on the HAM10000 dataset, we were able to develop classifiers that achieve high accuracy and F1 scores across multiple skin lesion types. However, the performance of each model varied depending on the specific characteristics of the dataset and the model architecture, as summarized in Table 4.18. 98 Table 4.18: Summary of Model Performance, Strengths, and Weaknesses Model Accuracy F1 Score VGG16 90.81% 0.9065 Strengths Weaknesses - Strong performance - Challenges in classi- in identifying nv and fying mel and akiec. vasc lesions. - Some confusion between bcc and mel. 
4.2 Discussion

The analysis presented in this thesis highlights the effectiveness of transfer learning and data augmentation techniques in improving the performance of deep learning models for skin lesion detection. By leveraging pre-trained models on the HAM10000 dataset, we were able to develop classifiers that achieve high accuracy and F1 scores across multiple skin lesion types. However, the performance of each model varied depending on the specific characteristics of the dataset and the model architecture, as summarized in Table 4.18.

Table 4.18: Summary of Model Performance, Strengths, and Weaknesses

VGG16 (accuracy 90.81%, F1-score 0.9065)
  Strengths: strong performance in identifying nv and vasc lesions.
  Weaknesses: challenges in classifying mel and akiec; some confusion between bcc and mel.
VGG19 (accuracy 90.61%, F1-score 0.9036)
  Strengths: effective in detecting nv and df.
  Weaknesses: struggles with misclassification between mel and nv.
ResNet50 (accuracy 91.61%, F1-score 0.9132)
  Strengths: consistently strong across most classes, especially akiec and bcc.
  Weaknesses: challenges with distinguishing bkl from mel and mel from nv.
MobileNetV1 (accuracy 88.01%, F1-score 0.8686)
  Strengths: good at predicting common classes like nv.
  Weaknesses: high misclassification rates in df and mel.
MobileNetV2 (accuracy 89.11%, F1-score 0.8863)
  Strengths: high accuracy in detecting df and nv.
  Weaknesses: issues with distinguishing mel from nv and bkl.
MobileNetV3 (accuracy 91.31%, F1-score 0.9102)
  Strengths: excels in identifying nv and vasc with high accuracy.
  Weaknesses: struggles significantly with distinguishing mel; misclassifies bkl and nv frequently.

As shown in the table, ResNet50 emerged as the most robust model, with an accuracy of 91.61% and an F1 score of 0.9132. Its strengths lie in its ability to accurately classify akiec and bcc lesions, which are critical for identifying malignant conditions. However, it faces challenges distinguishing bkl from mel and mel from nv, indicating areas where further improvement is needed.

While slightly less accurate at 91.31%, MobileNetV3 demonstrated notable efficiency in runtime and performed exceptionally well in identifying nv and vasc lesions. However, it struggled significantly with mel lesions, highlighting the limitations of this model when dealing with visually similar classes. These findings suggest that while MobileNetV3 is a strong candidate for real-time applications, particularly in resource-constrained environments, further refinement is necessary to enhance its performance in more challenging classification tasks.

VGG16 and VGG19, with accuracies of 90.81% and 90.61%, respectively, also demonstrated strong performance, particularly in identifying nv and vasc lesions. However, both models exhibited difficulties in classifying mel and akiec; in some cases, there was confusion between bcc and mel. This highlights the potential need for advanced augmentation techniques or alternative model architectures to better capture the subtle differences between these lesion types.

While efficient, the MobileNetV1 and MobileNetV2 models showed lower accuracy than their counterparts, particularly in classifying df and mel lesions. Their performance underscores the trade-off between computational efficiency and classification accuracy, particularly in models designed for deployment on devices with limited processing power.

These results align with existing literature, suggesting that deeper models like ResNet50 perform better on complex classification tasks. In contrast, more lightweight models like MobileNet are better suited for scenarios where computational resources are limited. The findings also emphasize the critical role of dataset characteristics, particularly class imbalance, in influencing model performance.

Chapter 5

Conclusion

In conclusion, the results of this study demonstrate the effectiveness of transfer learning and data augmentation in developing high-performance deep learning models for skin lesion detection. ResNet50 consistently emerged as a top performer, achieving the highest accuracy and F1 scores across most lesion classes, making it a reliable model for clinical applications where accuracy is paramount. On the other hand, MobileNetV3, with its impressive runtime efficiency, is particularly well-suited for deployment in real-time applications and on resource-constrained devices such as smartphones.
However, its struggles with distinguishing mel lesions from other types underscore the need for further refinement, perhaps through more sophisticated data augmentation techniques or by incorporating additional features to enhance its discriminative power.

The limitations of the HAM10000 dataset, including its size, demographic representation, and potential biases related to skin type, highlight the importance of using diverse and representative datasets in future research. These biases can impact the model's generalizability and effectiveness across different populations, emphasizing the need for careful consideration in both dataset selection and model development. Expanding the dataset to include images from varied populations and skin types and incorporating advanced augmentation techniques, such as Generative Adversarial Networks (GANs), could help address these limitations and improve model generalization. Future research should also explore integrating domain adaptation techniques to enhance model adaptability across different datasets and real-world scenarios. Additionally, real-world validation through clinical trials and continuous learning systems will be crucial for maintaining the accuracy and relevance of these models over time.

Ultimately, the findings of this study contribute to the ongoing development of accurate and efficient diagnostic tools for skin lesion detection, with significant implications for clinical practice and patient care. By continuing to refine and optimize these models, focusing on diversity and bias mitigation, we can move closer to achieving reliable, real-time diagnostic systems that are both effective and accessible to a broad range of patients.

Bibliography

[1] “Skin cancer foundation-melanoma.” https://www.skincancer.org/skin-cancer-information/melanoma/melanoma-warning-signs-and-images/.
[2] “Skin cancer foundation-bcc.” https://www.skincancer.org/skin-cancer-information/basal-cell-carcinoma/.
[3] “Skin cancer foundation-scc.” https://www.skincancer.org/skin-cancer-information/squamous-cell-carcinoma/.
[4] “National cancer institute.” https://www.cancer.gov/types/skin/moles-fact-sheet.
[5] I. H. Sarker, “Deep cybersecurity: A comprehensive overview from neural network and deep learning perspective,” SN Computer Science, vol. 2, p. 154, 2021.
[6] “Deep learning specialization.” https://www.deeplearning.ai/courses/deep-learning-specialization/.
[7] O. P. Ogunmolu, X. Gu, S. B. Jiang, and N. R. Gans, “Nonlinear systems identification using deep dynamic neural networks,” ArXiv, vol. abs/1610.01439, 2016.
[8] R. N. Toma, M. N. Hasan, A.-A. Nahid, and B. Li, “Electricity theft detection to reduce non-technical loss using support vector machine in smart grid,” in 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–6, 2019.
[9] D. Xu, Y. Wang, S. Xu, K. Zhu, N. Zhang, and X. Zhang, “Infrared and visible image fusion with a generative adversarial network and a residual network,” Applied Sciences, vol. 10, no. 2, 2020.
[10] “Convolutional neural networks (CNN) architectures explained.” https://medium.com/@draj0718/convolutional-neural-networks-cnn-architectures-explained-716fb197b243.
[11] M. Loukadakis, J. Cano, and M. F. P. O’Boyle, “Accelerating deep neural networks on low power heterogeneous architectures,” 2018.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015.
[13] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y.
Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam, “Searching for mobilenetv3,” 2019. [14] R. Ashraf, S. Afzal, A. U. Rehman, S. Gul, J. Baber, M. Bakhtyar, I. Mehmood, O.-Y. Song, and M. Maqsood, “Region-of-interest based transfer learning assisted framework for skin cancer detection,” IEEE Access, vol. 8, pp. 147858–147871, 2020. [15] M. Elgamal, “Automatic skin cancer images classification,” International Journal of Advanced Computer Science and Applications, vol. 4, no. 3, 2013. 105 [16] W. Gouda, N. U. Sama, G. Al-Waakid, M. Humayun, and N. Z. Jhanjhi, “Detection of skin cancer based on skin lesion images using deep learning,” Healthcare, vol. 10, no. 7, 2022. [17] “World cer cancer research statistics.” fund international: Skin can- https://www.wcrf.org/cancer-trends/ skin-cancer-statistics/. [18] “Skin cancer foundation-melanoma.” https://www.skincancer.org/ skin-cancer-information/melanoma/. [19] M. Q. Khan, A. Hussain, S. U. Rehman, U. Khan, M. Maqsood, K. Mehmood, and M. A. Khan, “Classification of melanoma and nevus in digital images for diagnosis of skin cancer,” IEEE Access, vol. 7, pp. 90132–90144, 2019. [20] N. Melarkode, K. Srinivasan, S. M. Qaisar, and P. Plawiak, “Aipowered diagnosis of skin cancer: A contemporary review, open challenges and future research directions,” Cancers (Basel), vol. 15, no. 4, p. 1183, 2023. [21] “Canadian skin cancer foundation: Skin cancer.” https://www. canadianskincancerfoundation.com/skin-cancer/?gad=1&gclid= CjwKCAjwjOunBhB4EiwA94JWsNs46V81Hivw4ZUdqzu6VP94k5HWbxWSqcmw_ OKeAuv5MqsezY3orBoC_sQQAvD_BwE. [22] “What cers?.” are basal and squamous cell skin can- https://www.cancer.org/cancer/types/ basal-and-squamous-cell-skin-cancer/about/ what-is-basal-and-squamous-cell.html#:~:text=About%208% 20out%20of%2010,head%2C%20neck%2C%20and%20arms. 106 [23] M. Naqvi, S. Q. Gilani, T. Syed, O. Marques, and H.-C. Kim, “Skin cancer detection using deep learning—a review,” Diagnostics, vol. 13, no. 11, p. 1911, 2023. [24] A. Kilic, A. Kilic, A. Kivanc, and A. Sisik, “Biopsy Techniques for Skin Disease and Skin Cancer: A New Approach.,” Journal of Cutaneous and Aesthetic Surgery, vol. 13, no. 3, pp. 251–254, 2020. [25] W. F. Cueva, F. Muñoz, G. Vásquez, and G. Delgado, “Detection of skin cancer ”melanoma” through computer vision,” in 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 1–4, 2017. [26] R. Marks, “Epidemiology of melanoma,” Clinical and Experimental Dermatology, vol. 25, pp. 459–463, 11 2000. [27] M. A. Kadampur and S. Al Riyaee, “Skin cancer detection: Applying a deep learning based model driven architecture in the cloud for classifying dermal cell images,” Informatics in Medicine Unlocked, vol. 18, p. 100282, 2020. [28] K. Das, C. J. Cockerell, A. Patil, P. Pietkiewicz, M. Giulini, S. Grabbe, and M. Goldust, “Machine learning and its application in skin cancer,” International Journal of Environmental Research and Public Health, vol. 18, p. 13409, 2021. [29] T. Davenport and R. Kalakota, “The potential for artificial intelligence in healthcare,” Future healthcare journal, vol. 6, pp. 94–98, 2019. [30] X. Du-Harpur, F. Watt, N. Luscombe, and M. Lynch, “What is AI? Applications of artificial intelligence to dermatology,” British Journal of Dermatology, vol. 183, pp. 423–430, 09 2020. 107 [31] A. S. Panayides, A. Amini, N. D. Filipovic, A. Sharma, S. A. Tsaftaris, A. Young, D. Foran, N. Do, S. Golemati, T. Kurc, K. Huang, K. S. Nikita, B. P. Veasey, M. Zervakis, J. H. Saltz, and C. S. 
Pattichis, “Ai in medical imaging informatics: Current challenges and future directions,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 7, pp. 1837–1857, 2020. [32] A. A. Patel, Hands-on unsupervised learning using Python: how to build applied machine learning solutions from unlabeled data. O’Reilly Media, 2019. [33] R. Aggarwal, V. Sounderajah, G. Martin, D. S. Ting, A. Karthikesalingam, D. King, H. Ashrafian, and A. Darzi, “Diagnostic accuracy of deep learning in medical imaging: a systematic review and metaanalysis,” NPJ digital medicine, vol. 4, no. 1, p. 65, 2021. [34] M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov, “The transformational role of gpu computing and deep learning in drug discovery,” Nature Machine Intelligence, vol. 4, no. 3, pp. 211–221, 2022. [35] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, and Q. Sun, “Deep learning for image-based cancer detection and diagnosisa survey,” Pattern Recognition, vol. 83, pp. 134–149, 2018. [36] M. Dildar, S. Akram, M. Irfan, H. U. Khan, M. Ramzan, A. R. Mahmood, S. A. Alsaiari, A. H. Saeed, M. O. Alraddadi, and M. H. Mahnashi, “Skin cancer detection: A review using deep learning techniques,” International Journal of Environmental Research and Public Health, vol. 18, no. 10, p. 5479, 2021. 108 [37] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115–118, 2017. [38] S. S. Han, I. Park, S. Eun Chang, W. Lim, M. S. Kim, G. H. Park, J. B. Chae, C. H. Huh, and J.-I. Na, “Augmented intelligence dermatology: Deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders,” Journal of Investigative Dermatology, vol. 140, no. 9, pp. 1753–1761, 2020. [39] P. Tschandl, C. Rosendahl, and H. Kittler, “The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific Data, vol. 5, no. 1, 2018. [40] “Skin cancer mnist: Ham10000.” https://www.kaggle.com/ datasets/kmader/skin-cancer-mnist-ham10000. [41] A.-R. Ali, J. Li, S. J. O’Shea, G. Yang, T. Trappenberg, and X. Ye, “A deep learning based approach to skin lesion border extraction with a novel edge detector in dermoscopy images,” 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2019. [42] M. Goyal, T. Knackstedt, S. Yan, and S. Hassanpour, “Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities,” Computers in Biology and Medicine, vol. 127, p. 104065, 2020. [43] J. L. Arroyo and B. G. Zapirain, “Automated detection of melanoma in dermoscopic images,” Series in BioEngineering, p. 139–192, 2014. [44] T. Kanimozhi and D. A. Murthi, “Computer aided melanoma skin cancer detection using artificial neural network classifier,” 2016. 109 [45] S. Choudhari and S. Biday, “Artificial neural network for skincancer detection,” 2014. [46] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. [47] S. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660–674, 1991. [48] V. Pomponiu, H. Nejati, and N.-M. 
Cheung, “Deepmole: Deep neural networks for skin mole lesion classification,” in 2016 IEEE International Conference on Image Processing (ICIP), pp. 2623–2627, 2016. [49] T. Tanaka and M. D. Voigt., “Decision tree analysis to stratify risk of de novo non-melanoma skin cancer following liver transplantation,” Journal of Cancer Research and Clinical Oncology, vol. 144, pp. 607– 615, 2018. [50] P. L. Quinn, J. B. Oliver, O. M. Mahmoud, and R. J. Chokshi, “Costeffectiveness of sentinel lymph node biopsy for head and neck cutaneous squamous cell carcinoma,” Journal of Surgical Research, vol. 241, pp. 15–23, 2019. [51] T. Saba, M. A. Khan, A. Rehman, and S. L. Marie-Sainte, “Region extraction and classification of skin cancer: A heterogeneous framework of deep cnn features fusion and reduction,” Journal of Medical Systems, vol. 43, pp. 15–23, 2019. 110 [52] K. Melbin and Y. Jacob Vetha Raj, “Integration of modified abcd features and support vector machine for skin lesion types classification,” Multimedia Tools Applications, vol. 80, no. 6, p. p8909, 2019. [53] A. G. Neela, “Implementation of support vector machine for identification of skin cancer,” International Journal of Engineering and Manufacturing, 2019. [54] G. Arora, A. K. Dubey, Z. A. Jaffery, and A. Rocha, “Bag of feature and support vector machine based early diagnosis of skin cancer.,” Journal of Neural Computing Applications, vol. 34, no. 11, p. p8385, 2022. [55] N. Hameed, A. Shabut, and M. A. Hossain, “A computer-aided diagnosis system for classifying prominent skin lesions using machine learning,” in 2018 10th Computer Science and Electronic Engineering (CEEC), pp. 186–191, 2018. [56] F. Xie, H. Fan, Y. Li, Z. Jiang, R. Meng, and A. Bovik, “Melanoma classification on dermoscopy images using a neural network ensemble model,” IEEE Transactions on Medical Imaging, vol. 36, no. 3, pp. 849– 858, 2017. [57] O. F. Alwan, “Skin cancer images classification using naÏve bayes,” Emergent: Journal of Educational Discoveries and Lifelong Learning (EJEDL), vol. 3, p. 19–29, Apr. 2022. [58] V. Balaji, S. Suganthi, R. Rajadevi, V. Krishna Kumar, B. Saravana Balaji, and S. Pandiyan, “Skin disease detection and segmentation using dynamic graph cut algorithm and classification through naive bayes classifier,” Measurement, vol. 163, p. 107922, 2020. 111 [59] A. Mobiny, A. Singh, and H. Van Nguyen, “Risk-aware machine learning classifier for skin lesion diagnosis,” Journal of Clinical Medicine, vol. 8, no. 8, p. 1241, 2019. [60] S. Alkhushayni, D. Al-Zaleq, L. Andradi, and P. Flynn, “The application of differing machine learning algorithms and their related performance in detecting skin cancers and melanomas.,” Journal of skin cancer, vol. 2022, no. 2839162, 2022. [61] M. F. Ak, “A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications,” Healthcare (Basel, Switzerland), vol. 8, no. 2, p. 111, 2020. [62] T. Mazhar, I. Haq, A. Ditta, S. A. H. Mohsan, F. Rehman, I. Zafar, J. A. Gansau, and L. P. W. Goh, “The role of machine learning and deep learning approaches for the detection of skin cancer.,” Healthcare (Basel, Switzerland), vol. 11, no. 3, p. 415, 2023. [63] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, “Deep learning for visual understanding: A review,” Neurocomputing, vol. 187, pp. 27–48, 2016. [64] H. Rashid, M. A. Tanveer, and H. 
Aqeel Khan, “Skin lesion classification using gan based data augmentation,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 916–919, 2019. [65] D. Bisla, A. Choromanska, R. S. Berman, J. A. Stein, and D. Polsky, “Towards automated melanoma detection with deep learning: Data purification and augmentation,” in 2019 IEEE/CVF Conference on Com- 112 puter Vision and Pattern Recognition Workshops (CVPRW), pp. 2720– 2728, 2019. [66] A. Farag, L. Lu, H. R. Roth, J. Liu, E. Turkbey, and R. M. Summers, “A bottom-up approach for pancreas segmentation using cascaded superpixels and (deep) image patch labeling,” IEEE Transactions on Image Processing, vol. 26, no. 1, pp. 386–399, 2017. [67] D. Divya and T. Ganeshbabu, “Fitness adaptive deer hunting-based region growing and recurrent neural network for melanoma skin cancer detection,” International Journal of Imaging System and Technology, vol. 30, pp. 731–752, 2020. [68] B. Ahmad, M. Usama, T. Ahmad, S. Khatoon, and C. M. Alam, “An ensemble model of convolution and recurrent neural network for skin disease classification.,” International Journal of Imaging Systems and Technology, vol. 32, pp. 218–229, 2021. [69] M. Z. Alom, C. Yakopcic, M. S. Nasrin, T. M. Taha, and V. K. Asari, “Breast cancer classification from histopathological images with inception recurrent residual convolutional neural network.,” Journal of Digital Imaging, vol. 32, no. 4, pp. 605–617, 2019. [70] R. Patil and N. Biradar, “Automated mammogram breast cancer detection using the optimized combination of convolutional and recurrent neural network.,” Evol. Intel, vol. 14, p. 1459–1474, 2021. [71] X. Wu, H. Y. Wang, P. Shi, R. Sun, X. Wang, Z. Luo, F. Zeng, M. S. Lebowitz, W. Y. Lin, J. J. Lu, R. Scherer, O. Price, Z. Wang, J. Zhou, and Y. Wang, “Long short-term memory model - a deep learning ap- 113 proach for medical data with irregularity in cancer predication with tumor markers.,” Computers in biology and medicine, vol. 144, 2022. [72] M. A. Elashiri, A. Rajesh, S. Nath Pandey, S. Kumar Shukla, S. Urooj, and A. Lay-Ekuakille, “Ensemble of weighted deep concatenated features for the skin disease classification model using modified long short term memory,” Biomedical Signal Processing and Control, vol. 76, p. 103729, 2022. [73] L. Gonog and Y. Zhou, “A review: Generative adversarial networks,” in 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 505–510, 2019. [74] I. J. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” ArXiv, vol. abs/1701.00160, 2016. [75] U.-O. Dorj, K. K. Lee, J.-Y. Choi, and M. Lee, “The skin cancer classification using deep convolutional neural network,” Multimedia Tools and Applications, vol. 77, p. 9909–9924, 2018. [76] J. Höhn, A. Hekler, E. Krieghoff-Henning, J. N. Kather, J. S. Utikal, F. Meier, F. F. Gellrich, A. Hauschild, L. French, J. G. Schlager, and et al., “Skin cancer classification using convolutional neural networks with integrated patient data: A systematic review (preprint),” Journal of Medical Internet Research, 2020. [77] T. Devries and D. Ramachandram, “Skin lesion classification using deep multi-scale convolutional neural networks,” ArXiv, vol. abs/1703.01402, 2017. [78] D. B. Mendes and N. C. da Silva, “Skin lesions classification 114 using convolutional neural networks in clinical images,” ArXiv, vol. abs/1812.02316, 2018. [79] S. S. Chaturvedi, K. Gupta, and P. S. 
Prasad, Skin Lesion Analyser: An Efficient Seven-Way Multi-class Skin Cancer Classification Using MobileNet, p. 165–176. Springer Singapore, May 2020. [80] L. Yu, H. Chen, Q. Dou, J. Qin, and P.-A. Heng, “Automated melanoma recognition in dermoscopy images via very deep residual networks,” IEEE Transactions on Medical Imaging, vol. 36, no. 4, p. 994–1004, 2017. [81] H. Hsin-Wei, W.-Y. H. Benny, L. Chih-Hung, and S. T. Vincent, “Development of a light-weight deep learning model for cloud applications and remote diagnosis of skin cancers,” DERMATOLOGY, vol. 48, no. 3, p. 310–316, 2020. [82] A. Ghadah, G. Walaa, H. Mamoona, and S. Najm, Us, “Melanoma detection using deep learning-based classifications,” Healthcare (Basel), vol. 10, no. 12, p. 2481, 2022. [83] F. Mohammad and F. Esraa, “On the automatic detection and classification of skin cancer using deep transfer learning,” Sensors (Basel), vol. 22, no. 13, p. 4963, 2022. [84] M. Roshni Thanka, E. Bijolin Edwin, V. Ebenezer, K. Martin Sagayam, B. Jayakeshav Reddy, H. Günerhan, and H. Emadifar, “A hybrid approach for melanoma classification using ensemble machine learning techniques with deep transfer learning,” Computer Methods and Programs in Biomedicine Update, vol. 3, p. 100103, 2023. 115 [85] L. Umesh, Kumar, S. Sarita, S. Yogesh, Kumar, K. Kuldeep, Singh, V. B. R. K, B, R. M. R. V, V, B. Anupam, B. Anchit, and A. Roobaea, “A precise model for skin cancer diagnosis using hybrid u-net and improved mobilenet-v3 with hyperparameters optimization,” Scientific Reports, vol. 14, p. 4299, 2024. [86] T. Jitendra, V, H. Nachiketa, P. Hemprasad, Y, and D. Tausif, “Skin cancer detection using ensemble of machine learning and deep learning techniques,” Multimedia Tools and Applications, vol. 82, p. 27501–27524, 2023. [87] H. Mehdi, H. Dildar, Z. M. Firas, Muhammad, A. Farhan, A, V. Amirhossein, Noroozi, A. Parvaneh, D. Aso, M. Mazhar, Hussain, and L. Sang, Woong, “A model for skin cancer using combination of ensemble learning and deep learning,” PloS one, vol. 19, no. 5, 2024. [88] M. M. Hossain, M. M. Hossain, M. B. Arefin, F. Akhtar, and J. Blake, “Combining state-of-the-art pre-trained deep learning models: A noble approach for skin cancer detection using max voting ensemble,” Diagnostics, vol. 14, no. 1, 2024. [89] A. W. Harley, “An interactive node-link visualization of convolutional neural networks,” in Advances in Visual Computing (G. Bebis, R. Boyle, B. Parvin, D. Koracin, I. Pavlidis, R. Feris, T. McGraw, M. Elendt, R. Kopper, E. Ragan, Z. Ye, and G. Weber, eds.), pp. 867– 877, Springer International Publishing, 2015. [90] H. Gholamalinezhad and H. Khosravi, “Pooling methods in deep neural networks, a review,” ArXiv, vol. abs/2009.07485, 2020. 116 [91] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929– 1958, 2014. [92] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015. [93] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” 2017. [94] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” 2019. [95] A. Krizhevsky, I. Sutskever, and G. E. 
Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, pp. 84 – 90, 2012. [96] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” 2019. [97] R. Avenash and P. Viswanath, “Semantic segmentation of satellite images using a modified cnn with hard-swish activation function,” in VISIGRAPP, 2019. [98] I. U. Haq, K. Muhammad, A. Ullah, and S. W. Baik, “Deepstar: Detecting starring characters in movies,” IEEE Access, vol. 7, pp. 9265– 9272, 2019. [99] K. Muhammad, S. Khan, V. Palade, I. Mehmood, and V. H. C. de Albuquerque, “Edge intelligence-assisted smoke detection in foggy 117 surveillance environments,” IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 1067–1075, 2020. [100] K. Muhammad, R. Hamza, J. Ahmad, J. Lloret, H. Wang, and S. W. Baik, “Secure surveillance framework for iot systems using probabilistic image encryption,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3679–3689, 2018. [101] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345– 1359, 2010. [102] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, “Early diagnosis of alzheimer’s disease with deep learning,” in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 1015–1018, 2014. [103] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3462–3471, 2017. [104] F. Chollet et al., “Keras.” https://keras.io, 2015. [105] S. Abokadr, A. Azman, H. Hamdan, and N. Amelina, “Handling imbalanced data for improved classification performance: Methods and challenges,” in 2023 3rd International Conference on Emerging Smart Technologies and Applications (eSmarTA), pp. 1–8, 2023. [106] C. Aliferis and G. Simon, Overfitting, Underfitting and General Model Overconfidence and Under-Performance Pitfalls and Best Practices in 118 Machine Learning and AI, pp. 477–524. Cham: Springer International Publishing, 2024. [107] D. M. W. Powers, “What the f-measure doesn’t measure: Features, flaws, fallacies and fixes,” 2019. 119 Appendix A Appendix: Accepted Paper 120 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 Transfer Learning Based Skin Cancer Detection Using Convolutional Neural Networks Saeid Moradi1 , Mateen Shaikh2* Department of Mathematics and Statistics Thompson Rivers University Kamloops, Canada * Corresponding author’s email: mshaikh@tru.ca Abstract— Skin cancer, a global health concern, requires early and accurate detection methods to improve patient outcomes. Despite significant advancements in deep learning, challenges like dataset imbalances and the trade-off between model accuracy and computational efficiency persist. This study introduces a comprehensive analysis of various Convolutional Neural Network (CNN) architectures for skin cancer detection using the HAM10000 dataset, comprising 10,015 dermatoscopic images of seven pigmented lesions. This research addresses class imbalances and enhances model robustness by implementing a data augmentation strategy combined with standard preprocessing techniques, such as image resizing and normalization. 
Six state-of-the-art CNN models—VGG16, VGG19, ResNet50, MobileNet, MobileNetV2, and MobileNetV3—are systematically evaluated to determine their effectiveness. The findings reveal that ResNet50 achieves the highest accuracy and F1-score, making it reliable for precise diagnosis. At the same time, MobileNetV3 excels in computational efficiency, suggesting its suitability for resourceconstrained environments or real-time applications. This study provides critical insights into the trade-offs between accuracy and efficiency in CNN-based skin cancer detection, offering a practical framework for selecting the appropriate model based on specific application needs. Keywords—Skin Cancer Detection; CNNs; VGG16; ResNet50; MobileNet, MobileNetV2, MobileNetV3 I. INTRODUCTION Skin cancer stands out as one of the most prevalent forms of cancer in the current decade [1]. It is mainly categorized into two major groups: melanoma and nonmelanoma skin cancer [2]. Based on the World Cancer Research Fund International (WCRFI) report, melanoma is the 17th most common cancer worldwide. It is the 13th most common cancer in men and the 15th most common cancer in women. The mortality of melanoma skin cancer around the world in 2020 was 57,043 deaths, where New Zealand, Norway, Montenegro, Slovakia, and Slovenia had the highest number of deaths. The worldwide mortality rate for non-melanoma skin cancer was 63,731 in 2020, whereas Papua New Guinea, Namibia, Mozambique, Zimbabwe, and Angola had the highest mortality rates. Early detection and accurate diagnosis are pivotal factors in treating skin cancer. Typically, physicians rely on the biopsy method for skin cancer detection, which is often painful, slow, and time-consuming [3]. Studies have indicated that dermatologists exhibit classification performance values of 75% to 84% when diagnosing melanoma, drawing upon their professional experiences [4, 5]. Additionally, globally, there is a shortage of skilled dermatologists in public healthcare systems, exacerbating the challenges in dermatological diagnosis and treatment [6]. This research is motivated by two primary goals. First, to improve the efficiency and accuracy of skin cancer diagnosis by developing an artificial intelligence-based screening system using dermoscopic images of skin lesions. Such a system could aid clinical screening tests, reduce diagnostic errors, and enhance early detection, which is critical for successful treatment. Second, this study aims to address the urgent need for reliable automated skin cancer detection systems, particularly in regions with limited access to dermatology specialists. By evaluating the classification performance of six CNN models and analyzing their training behavior and time requirements, this research provides a comprehensive assessment of AI-based solutions for skin cancer diagnosis. Ultimately, this study seeks to bridge diagnostic gaps, enable timely treatment, improve patient outcomes, and potentially save lives. Machine learning (ML) is a technique that employs statistical models and algorithms to learn from data progressively, enabling the prediction of characteristics of new samples and the execution of desired tasks [7]. ML's profound impact spans various societal domains, including production lines, healthcare, education, transportation, and food industries [7]. Deep Learning (DL), a subcategory of ML comprising deep neural networks, shares similarities with ML yet operates on a deeper level of complexity. 
In recent decades, deep learning has profoundly transformed the field of machine learning. The significant increase in processing power has facilitated remarkable progress in computer vision technologies, notably by developing deep learning models like Convolutional Neural Networks (CNNs) [8]. Deep learning has been widely and successfully applied in a variety of classification problems, such as signal processing and radar systems [9, 10], autonomous vehicles [11, 12], cybersecurity [13, 14], and healthcare [15, 16].

The urgency for early skin cancer detection has intensified, and deep learning has emerged as a powerful tool in this endeavor. Studies have demonstrated that early identification of skin cancer using deep learning improves the performance of human specialists, ultimately leading to a reduction in mortality rates [17]. By incorporating efficient formulations into deep learning techniques, exceptional and state-of-the-art processing and classification accuracy can be achieved [18]. Computer-based technology presents a promising avenue for diagnosing skin cancer symptoms, offering advantages in comfort, cost-effectiveness, and speed [18].

Quality data plays a pivotal role in the performance of machine learning models. Therefore, a diverse and comprehensive collection of dermoscopic images is necessary to assess the effectiveness of computer-based systems for skin cancer diagnosis. The HAM10000 dataset [19] is used in this research. The dataset was gathered from two sources: Cliff Rosendahl's skin cancer practice in Queensland, Australia, and the Dermatology Department of the Medical University of Vienna, Austria. It comprises 10,015 dermatoscopic images obtained from different populations and acquired through various modalities. It includes representative cases of the most significant diagnostic categories for pigmented lesions, such as actinic keratoses and intraepithelial carcinoma (AKIEC), basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevi (NV), and vascular lesions (vasc). Each image is annotated with one of seven skin lesion types. Using dermoscopic images for training and applying AI models involves handling sensitive personal health information. To protect patient identities and prevent unauthorized access, all the images in the dataset are anonymized. The dataset is publicly available through the ISIC archive.

The HAM10000 dataset is imbalanced: it includes 327 images of AKIEC, 514 images of basal cell carcinomas, 1099 images of benign keratoses, 115 images of dermatofibromas, 1113 images of melanomas, 6705 images of melanocytic nevi, and 142 images of vascular skin lesions. This imbalance is one of the significant challenges because classifiers tend to be influenced by the dominant class while neglecting the smaller ones [20].

II. LITERATURE REVIEW

Leveraging AI for skin cancer detection has the potential to significantly reduce the need for biopsies and empower patients to conduct self-examinations, facilitating teledermoscopy and decreasing the frequency of medical consultations [21]. However, developing an automatic classification system for skin cancer is challenging due to the complexity and diversity of skin cancer images.
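As a concrete illustration of this imbalance, the class distribution can be read directly from the dataset's metadata and naively rebalanced by oversampling. This is only a sketch: the metadata file name and the 'dx' diagnosis column are assumptions about the distributed CSV layout, and the balancing actually used in this work relies on image augmentation rather than simple row duplication (Section III).

import pandas as pd

# Assumed metadata layout: one row per image with a 'dx' diagnosis column.
meta = pd.read_csv("HAM10000_metadata.csv")
counts = meta["dx"].value_counts()
print(counts)   # expected to be heavily dominated by the nv class

# Naive oversampling with replacement until every class matches the
# largest one; shown only to make the imbalance concrete.
target = counts.max()
balanced = (
    meta.groupby("dx", group_keys=False)
        .apply(lambda g: g.sample(target, replace=True, random_state=0))
)
print(balanced["dx"].value_counts())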
Skin lesions can share significant similarities across classes, increasing the risk of misdiagnosis [22]. Even within the same class, variations in color, features, structure, size, and location add to the difficulty of accurate classification [23]. CNNs are among the most powerful and widely used ML techniques for image recognition and categorization [24]. Their architecture typically includes convolutional layers, nonlinear pooling layers, and fully connected layers [25]. Fig. 1 shows the basic architecture of a CNN. Fig. 1 CNN architecture. Previous studies have demonstrated the effectiveness of CNNs in skin cancer classification. For instance, a study utilizing the HAM10000 dataset employed MobileNet for skin lesion detection, achieving an accuracy of 83% [25]. Another study introduced a fully convolutional residual network (FCRN) with 16 residual blocks for melanoma detection, achieving an accuracy of 85.5% with segmentation and 82.8% without segmentation [26]. Huang et al. developed two deep learning models using DenseNet and EfficientNet, achieving 89.5% accuracy in binary classification on the KCGMH dataset and 85.8% on the HAM10000 dataset [27]. Furthermore, using Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) for image enhancement, coupled with a modified ResNet-50 model, improved classification metrics such as accuracy, precision, recall, and F1-score [28]. Another study aims to accurately classify skin lesions into seven categories using the HAM10000 dataset by leveraging 13 deep transfer learning models. The research emphasizes the importance of early detection in reducing mortality rates. It highlights the potential of AI-based systems to enhance diagnostic accuracy, especially in regions with limited access to dermatological care [29]. Most current state-of-the-art approaches rely on either hybrid models [30, 31] or ensembles of deep learning classifiers [32, 33, 34], which are often too resource-intensive for mobile applications. Developing a practical mobile application requires identifying a deep learning model that balances state-of-the-art performance with lightweight architecture. Therefore, this paper evaluates the performance of six different CNN models and analyzes their training time requirements. Despite these advancements, several limitations remain. Most studies focus on optimizing model accuracy without addressing the computational complexity, making them less suitable for real-time or mobile applications. Additionally, many approaches do not adequately address class imbalance in datasets, which can lead to biased models that underperform on 122 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 minority classes. This study addresses these gaps by evaluating a diverse set of pre-trained CNN models, focusing on accuracy and computational efficiency. Moreover, by fine-tuning these models and analyzing their performance across a balanced dataset, this research aims to develop a practical, scalable solution for skin cancer detection that can be deployed in resource-limited settings. In this study, we employ pre-trained CNN models and finetune their parameters to address challenges in skin lesion detection. The CNN architectures utilized in this study include VGG16 [35], VGG19 [35], MobileNet [36], MobileNetV2 [37], MobileNetV3 [38], and ResNet [39]. 
VGG16 and VGG19 are convolutional neural network (CNN) models named for their 16 and 19 weight layers, respectively. A notable advancement of VGG16 over earlier models like AlexNet [40] is its use of multiple 3×3 kernel-sized filters, which replaced larger kernels. The architecture of VGG16 consists of convolutional layers with ReLU activation functions designed for fixed input dimensions of 224 × 224 × 3 RGB images. Each convolutional layer employs 3×3 filters with a stride of 1 and padding of 1. Following the convolutional layers, VGG16 incorporates three fully connected layers, with the first two containing 4096 channels each and the final layer performing 1000-way classification for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) using a softmax function for output classification. ResNet is a deep learning model designed for computer vision tasks. It introduced significant advancements in the ILVRSC 2015 competition. One of the primary issues ResNet aims to tackle is the Disappearing/Exploding gradient problem commonly encountered in deeper neural networks. comprises 13 blocks of depthwise separable convolutional layers, as described in the original paper [36]. Fig. 4 MobileNet architecture [36]. MobileNet version 2 (MobileNetV2) significantly advances mobile model performance across various tasks and benchmarks and in different model sizes [37]. The architecture of MobileNetV2 revolves around an inverted residual structure, which diverges from residual models by employing thin bottleneck layers at the input and output of the residual block. Fig. 5 represents the MobileNetV2 architecture. Fig. 5 MobileNetV2 architecture [37]. MobileNet version 3 (MobileNetV3) introduces complementary search techniques and innovative architectural designs. Tailored specifically for mobile phone CPUs, MobileNetV3 integrates hardware-aware network architecture search (NAS) [38] alongside the NetAdapt algorithm [38]. Fig. 6 indicates the architecture of MobileNetV3. The critical components of the ResNet architecture include Residual Block, Skip Connection, Stacked Layers, and Global Average Pooling (GAP). Fig.3 shows the architecture of ResNet models compared to VGG16 and plain networks. Fig. 6 MobileNetV3 architecture [38]. Compared to MobileNetV2, MobileNetV3 incorporates the Squeeze and Excitation (SE) module, initially introduced in SENet [41], to enhance feature learning. MobileNetV3 replaces the sigmoid activation function in the SE module with the hardsigmoid function to improve computational efficiency. Fig. 3 ResNet architecture [39]. MobileNets represent a class of efficient models tailored for mobile and embedded vision tasks. They employ a streamlined architecture that relies on depthwise separable convolutions to construct lightweight deep neural networks [36]. The core concept underlying the MobileNet revolves around utilizing depthwise separable convolutions and disassembling the standard convolution operation into two distinct stages: depthwise convolution and pointwise convolution. The architecture of MobileNet, depicted in Fig. 4, With the remarkable advancements in deep learning, transfer learning has become a central element in various computer vision fields, including multimedia [42], surveillance [43], and medical applications [44]. The concept involves leveraging pre-trained models originally trained on nonmedical or natural image datasets and then fine-tuning these models with new data to adapt to specific tasks [45]. 
Transfer learning plays a crucial role in deploying convolutional neural networks for diagnostic imaging tasks such as skin cancer detection [46], Alzheimer’s Disease diagnosis [47], and chest X-ray analysis [48]. Fig. 7 illustrates the architectures used in the transfer learning approach. Typically, open-source pre-trained models 123 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 are trained on extensive datasets containing numerous classes. Transfer learning allows us to modify pre-trained networks by replacing the top layer with an output layer tailored to our dataset. Depending on the size of our dataset, we can adjust or fine-tune the parameters of the pre-trained models to better suit our specific needs. Fig. 7 Transfer learning methodology. III. METHODOLOGY The methodology consists of four main steps: preprocessing, data augmentation, model architecture, and evaluation metrics. Pre-processing: The initial stage of our analysis involved reading images from the dataset and pre-processing them. The initial size of images is 600 × 450 × 3, and we resized images to dimensions of 244 × 224 × 3, ensuring compatibility with the convolutional neural network (CNN) architectures employed. Additionally, we use built-in functions in the Keras library in Python for data normalization to enhance the uniformity of their pixel values, thus preparing them for subsequent training procedures. Data augmentation: The HAM10000 dataset includes an imbalanced distribution of data. Resampling is applied to tackle this problem. The available data is diversified by augmenting the dataset through transformations such as rotation, flipping, scaling, and cropping, helping to address the class imbalance issue and enriching the dataset with variations of existing images. Model architecture: In our skin lesion detection application, exploring transfer learning utilizing pre-trained models such as VGG16, VGG19, MobileNet, MobileNet V2, MobileNet V3, and ResNet is integral to the project. Fig. 8 represents the model architecture, where we drop the top layer of pre-trained models and add average pooling, dropout, and softmax layers with the number of classes in our dataset. the proposed model based on training time for low-power devices. IV. RESULTS & DISCUSSION This research was done using an ASUS TUF Gaming A15 system with AMD Ryzen 7 6800H processor information with Radeon Graphics, 3201 Mhz, 8 Core(s), 16 Logical Processor(s), and 16GB of RAM. A. CNN models without data augmentation First, the results of transfer learning based models without any data augmentation are reported. We proceed with finetuning the parameters of the pre-trained models, focusing on specific layers tailored to each architecture. For MobileNetV1, MobileNetV2, MobileNetV3, VGG16, VGG19, and ResNet50, the fine-tuning of parameters commences from layers 50, 100, 120, 10, 13, and 120 out of a total of 86, 154, 157, 19, 22, and 175 layers, respectively. Table 1 showcases the results, where ResNet50 emerges as the top performer in training and validation accuracy and the F1-score. Furthermore, MobileNetV3 demonstrates rapid training, requiring less than 9 minutes, while achieving a training accuracy of 98.54% and a validation accuracy of 85.79%. Model TABLE 1. FINE-TUNED CNNS WITHOUT DATA AUGMENTATION. Training Training Validation Validation Run Time Accuracy F1-Score Accuracy F1-Score VGG16 0. 
9842 0.8868 0.8427 0.7752 263m 16.5s VGG19 0.9868 0.8903 0.8306 0.7511 277m 51s ResNet 50 0.9934 0.9233 0.8639 0.8131 92m 40.1s Mobile Net 0.9712 0.8561 0.833 0.761 32m 21.2s Mobile NetV2 0.9823 0.8839 0.8538 0.7782 37m 22.5s Mobile NetV3 0.9854 0.8871 0.8579 0.7802 8m 44.8s Examine the confusion matrix of the models on the test data. Fig. 9 illustrates the test dataset's true labels, comprising 1001 images from 7 classes. Fig. 8 Model architecture. Evaluation metrics: Based on the evaluation metrics in the skin cancer image classification domain [49, 50, 51], our assessment will encompass standard metrics, including accuracy and Weighted F1-Score. Additionally, we evaluate Fig. 9 True labels of the test dataset. Fig. 10 displays the confusion matrix results for MobileNetV1. The confusion matrix results reveal that MobileNetV1 successfully identifies the nv skin lesion family, 124 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 achieving 642 correct predictions out of 654 instances. Bkl lesions also show a relatively high number of correct predictions (77). However, the model struggles with the vasc lesion family, which is frequently misclassified. Mel lesions are often mistaken for nv and bkl, with 64 and 30 instances, respectively. Akiec and df also show considerable misclassifications between these classes and others like bcc and nv. This indicates that MobileNetV1 has difficulty distinguishing between these classes. accuracy in vasc identification (8 correct cases) highlights the model's proficiency with more easily distinguishable lesions. Fig. 12 Confusion matrix of MobileNetV3 on the test dataset. Fig. 10 Confusion matrix of MobileNet on the test dataset. Fig. 11 illustrates the confusion matrix results for MobileNetV2. The model shows significant misclassification between classes with likely visual similarities. Many akiec instances are misclassified as nv (10 out of 37), and some bcc instances as akiec or nv, indicating overlapping features. While bkl is correctly classified in 71 out of 108 cases, there are substantial errors with nv. The model poorly distinguishes df, with only 4 out of 10 correctly classified. It performs well on nv (628 out of 651), though some are misclassified as mel. The mel class has lower accuracy, with frequent misclassification into bkl and nv. Despite its small size, vasc is mostly correctly classified (6 out of 10). Figure 13 shows the confusion matrix for VGG16 on the test dataset without augmentation. The model identifies 18 instances of akiec but frequently misclassifies this class as mel and nv. This pattern of errors may stem from the model’s struggle to capture the unique characteristics of akiec that distinguish it from other lesions, which is crucial given the clinical significance of akiec. While the model correctly predicts 33 cases for the bcc class, the ongoing confusion with akiec, bkl, nv, and mel suggests that the model might overly rely on shared features, leading to ambiguity. Although the model performs well in predicting bkl with 65 correct identifications, the misclassifications with nv and mel indicate that these classes might share overlapping features that the model is not effectively separating. Df prediction shows minor confusion with other types, possibly due to insufficient distinctive features being learned. 
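The transfer-learning setup used throughout these experiments (a pre-trained backbone with its classification head removed, followed by global average pooling, dropout, and a seven-way softmax, fine-tuned from a model-specific layer index onward) can be sketched in Keras as below. ResNet50 is used as the example, unfrozen from layer 120 of 175 as stated earlier; the dropout rate, optimizer, and learning rate are illustrative assumptions rather than the exact settings of this study.

from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 7
IMG_SHAPE = (224, 224, 3)
UNFREEZE_FROM = 120          # fine-tune ResNet50 from layer 120 of 175

# Pre-trained backbone without its ImageNet classification head.
backbone = keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=IMG_SHAPE)

# Freeze the early layers and leave the rest trainable for fine-tuning.
for layer in backbone.layers[:UNFREEZE_FROM]:
    layer.trainable = False
for layer in backbone.layers[UNFREEZE_FROM:]:
    layer.trainable = True

# New top: global average pooling, dropout, and a 7-way softmax output.
model = keras.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),                       # illustrative rate
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(1e-4),   # illustrative settings
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()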
The model performs strongly in predicting nv (618 correct identifications), but misclassifications such as bkl, bcc, and mel suggest difficulties differentiating between lesions with similar visual characteristics. The high accuracy in vasc classification (9 out of 10 cases) underscores the model’s proficiency with more easily distinguishable classes. Fig. 11 Confusion matrix of MobileNetV2 on the test dataset. Figure 12 presents the confusion matrix for MobileNetV3 on the test dataset without augmentation. The model correctly identifies 16 instances of akiec but struggles significantly with misclassifications, particularly confusing akiec with bkl and nv. This confusion suggests that the model might focus on shared features, such as color and texture, which are not distinctive enough for accurate differentiation. Similarly, although the model accurately predicts 36 cases of bcc lesions, it often confuses them with akiec, bkl, nv, and mel, indicating a potential overlap in the feature space of these classes. The model performs well in predicting bkl with 64 correct identifications, but the confusion with nv and mel raises concerns about its ability to differentiate between lesions with subtle variations. The model’s strong performance in predicting nv (602 correct identifications) reflects its effectiveness with more distinct classes. Yet, the misclassifications as bkl and mel suggest a need for more refined feature extraction. The high Fig. 13 Confusion matrix of Vgg16 on the test dataset. Figure 14 depicts the confusion matrix for VGG19 on the test dataset without augmentation. The model successfully identifies 25 instances of akiec but struggles with misclassifications, particularly confusing akiec with bcc, bkl, and nv. This suggests that the model may be focusing on features that are not distinctive enough, leading to errors in classification, which is particularly concerning for a class as clinically crucial as akiec. While the model achieves an accuracy rate of 30 cases for the bcc class, the confusion with akiec, bkl, nv, and mel indicates that subtle visual similarities among these classes may challenge the model. The model performs strongly in predicting bkl with 75 correct identifications. Still, the misclassifications with nv and mel suggest that the model might benefit from more refined feature extraction or additional training data that better highlights the 125 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 distinctions between these classes. While the model demonstrates proficiency in predicting nv (614 correct identifications), the persistent misclassifications as bkl, bcc, and mel highlight a need for more effective differentiation of these classes. The accurate classification of all 10 vasc cases showcases the model’s strength with more visually distinct lesions. Fig. 14 Confusion matrix of Vgg19 on the test dataset. Figure 15 presents the confusion matrix for ResNet50 on the test dataset without augmentation. The model correctly identifies 16 instances of akiec, but significant misclassifications with bkl and nv are observed. These errors suggest that the model may struggle to distinguish between akiec and other lesions with overlapping features, which could be problematic in clinical settings where accurate identification of akiec is critical. 
While the model achieves an accuracy rate of 28 cases for the bcc class, confusion with other classes remains, indicating that the model may need more discriminative features to improve its accuracy. The model performs well in predicting bkl with 63 correct identifications. Still, the confusion with nv and mel suggests that the model may not adequately capture the subtle differences between these classes. Df prediction shows moderate performance with 6 correct cases out of 10, indicating potential challenges in differentiating this class from others, possibly due to limited training data or insufficient feature representation. The model excels in predicting nv with 630 correct identifications, but the misclassifications as bkl, bcc, vasc, and mel highlight areas where the model could benefit from further refinement. The high accuracy in vasc classification (9 out of 10 cases) reflects the model’s strength in identifying more distinct lesions. Fig. 15 Confusion matrix of ResNet50 on the test dataset. In summary, the findings from Table 1 demonstrate the effectiveness of transfer learning in utilizing pre-trained models for skin lesion detection, even when working with imbalanced datasets like HAM10000. However, the confusion matrices reveal critical insights into how this dataset imbalance exacerbates classification difficulties, particularly for lessrepresented classes. The results indicate that models perform significantly better in classes with more abundant training data. This underscores the need for a balanced dataset to achieve optimal classification accuracy across all lesion types. This highlights the importance of addressing dataset imbalance through data augmentation, re-sampling, or advanced loss functions to mitigate the bias toward majority classes and improve overall model performance. B. CNN models with data augmentation After assessing the effectiveness of transfer learning in using pre-trained models across various image types and finetuning the weights based on our dataset, we try to address the imbalance issue in the HAM10000 dataset. To improve our results, we employ transfer learning again, incorporating the data augmentation concept. This involves techniques such as random flipping, rotation, adjustment of brightness and contrast, and cropping of images within the dataset to generate additional data instances. After data augmentation, each class in the training dataset has been augmented to contain 2000 samples, resulting in a balanced training dataset. Table 2 showcases the outcomes obtained through transfer learning utilizing pre-trained CNN models, where the weights are trained based on the HAM10000 dataset with data augmentation. Model TABLE 2. FINE-TUNED CNNS WITH DATA AUGMENTATION Training Training Validation Validation Run Time Accuracy F1-Score Accuracy F1-Score VGG16 0. 9936 0.9896 0.9106 0.9052 327m 23.1s VGG19 0.9944 0.9902 0.9092 0.9064 430m 20.8s ResNet 50 0.9989 0.9957 0.9231 0.9198 133m 28.1s Mobile Net 0.9949 0.9815 0.9011 0.8981 48m 11.9s Mobile NetV2 0.9959 0.9911 0.9161 0.9131 38m 9s Mobile NetV3 0.9951 0.9903 0.9155 0.9112 13m 27.9s After fine-tuning with augmented data, all methods exhibited commendable accuracy and F1-score performance. ResNet50, once again, emerged as a top performer, achieving 99.89% accuracy on the training dataset and 92.31% accuracy on the validation dataset. Following ResNet50, MobileNetV2, MobileNetV3, VGG16, VGG19, and MobileNetV1 demonstrated progressively better accuracy. 
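A minimal Keras sketch of the augmentation pipeline described at the start of this subsection (random flips, rotation, brightness and contrast jitter, and cropping) is shown below, together with a helper that repeats a minority class until 2000 augmented samples are produced. The parameter ranges and the helper name are illustrative assumptions, not the exact configuration used here.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Random transformations applied to training images; the crop assumes the
# inputs are larger than 224 x 224 (e.g., the original 600 x 450 images).
augment = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),        # up to roughly +/- 36 degrees
    layers.RandomBrightness(0.1),
    layers.RandomContrast(0.1),
    layers.RandomCrop(224, 224),
])

def augment_class_to_target(images, target=2000, batch_size=32):
    """Repeat and randomly transform one class's images until `target`
    augmented samples are produced, mirroring the per-class balancing."""
    ds = tf.data.Dataset.from_tensor_slices(images)
    ds = ds.repeat()
    ds = ds.map(lambda img: augment(img, training=True),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.take(target).batch(batch_size)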
Runtime is a crucial metric for gauging the computational costs incurred by the models. Larger networks such as VGG19, VGG16, and ResNet50 incurred significantly higher computational costs. Among them, VGG19, with almost 430 minutes, had the highest training time. Conversely, MobileNet models demonstrated notable efficiency in terms of computational costs. Among these, MobileNetV3 stood out 126 ICIIBMS 2024, Track 3: Bioinformatics, Biomedical, Bioengineering, Medical Imaging, Neuroscience and Natural Science, Tokyo-Okinawa, Japan, Nov. 21-24, 2024 with a training time of less than 13 minutes, making it an optimal choice for resource-constrained devices like smartphones. The confusion matrices for the models on the test data after data augmentation provide valuable insights into their performance. Fig. 16 shows the confusion matrix for MobileNet on the test dataset after data augmentation, highlighting the model's strengths and weaknesses. While MobileNet accurately identifies 29 instances of akiec, it struggles with misclassifications, particularly confusing akiec with bkl and nv. The model correctly predicts 40 bcc cases, but confusion with nv and other classes remains significant. Although 89 instances of bkl are correctly identified, the model misclassifies several as nv and bcc, indicating challenges in distinguishing between these lesion types. The model shows moderate accuracy in classifying df lesions, correctly identifying 7 out of 10 cases, but occasional misclassifications suggest room for improvement. MobileNet demonstrates a high accuracy rate (99.38%) for nv, successfully identifying 650 out of 654 cases; however, the misclassification of 4 instances, particularly as bkl, underscores the challenges in differentiating between these similar lesion types. Detecting melanoma (mel) is notably problematic, with only 44.27% accuracy, as the model frequently confuses mel with nv and bkl, which could have severe clinical implications. The classification of vasc lesions is commendable, with 8 out of 10 instances correctly identified. Fig. 17 Confusion matrix of MobileNetV2 on the test dataset after data Fig. 18 demonstrates the confusion matrix for MobileNetV3 after data augmentation showcases the model's balanced performance across various classes. While it correctly identifies 29 instances of akiec, it still misclassifies some as bkl and nv, suggesting challenges in distinguishing between these classes. The model's 46 correct predictions for bcc indicate good performance, yet confusion with other classes persists. Although the model accurately predicts 84 instances of bkl, it struggles with misclassifications involving nv, bcc, and akiec, highlighting potential areas for improvement. MobileNetV3 demonstrates high accuracy for df, correctly identifying 9 out of 10 cases. The model excels in nv classification, with 644 out of 654 instances accurately identified, yet it continues to face challenges with bkl, mel, and bcc misclassifications. Melanoma detection shows improvement with 92 instances correctly identified, yet the model frequently misclassifies these as nv and bkl, indicating that subtle distinctions between these lesions remain challenging to capture. The classification of vasc lesions is perfect, with all ten instances correctly identified, achieving 100% accuracy. While MobileNetV3 performs exceptionally well in detecting nv and vasc lesions, its lower accuracy in detecting mel (70.23%) suggests further model refinement to better distinguish between closely related lesions. 
Fig. 17 presents the confusion matrix for MobileNetV2 on the test dataset with data augmentation, revealing its overall strong performance, particularly with nv lesions, correctly identifying 643 out of 654 instances. However, the model encounters substantial difficulties distinguishing mel and akiec from classes such as nv and bkl. This pattern of misclassification highlights the potential visual similarities between these lesions, which are often clinically significant. For instance, the frequent misclassification of mel as nv could lead to severe clinical consequences, emphasizing the need for more nuanced feature extraction or additional augmentation strategies. The results indicate that while MobileNetV2 is proficient in handling well-represented classes, it requires further refinement to improve the differentiation of lesions with overlapping features.

Fig. 17 Confusion matrix of MobileNetV2 on the test dataset after data augmentation.

Fig. 18 shows the confusion matrix for MobileNetV3 after data augmentation, showcasing the model's balanced performance across the classes. While it correctly identifies 29 instances of akiec, it still misclassifies some as bkl and nv, suggesting challenges in distinguishing between these classes. The model's 46 correct predictions for bcc indicate good performance, yet confusion with other classes persists. Although the model accurately predicts 84 instances of bkl, it struggles with misclassifications involving nv, bcc, and akiec, highlighting potential areas for improvement. MobileNetV3 demonstrates high accuracy for df, correctly identifying 9 out of 10 cases. The model excels in nv classification, with 644 out of 654 instances accurately identified, yet it continues to face challenges with bkl, mel, and bcc misclassifications. Melanoma detection shows improvement, with 92 instances correctly identified, yet the model frequently misclassifies these as nv and bkl, indicating that subtle distinctions between these lesions remain challenging to capture. The classification of vasc lesions is perfect, with all ten instances correctly identified, achieving 100% accuracy. While MobileNetV3 performs exceptionally well in detecting nv and vasc lesions, its lower accuracy in detecting mel (70.23%) suggests that further model refinement is needed to better distinguish between closely related lesions.

Fig. 18 Confusion matrix of MobileNetV3 on the test dataset after data augmentation.

Fig. 19 shows the confusion matrix for VGG16 on the test dataset with data augmentation, providing a detailed view of the model's performance. VGG16 correctly identifies 30 instances of akiec but struggles with misclassifications, particularly with mel and nv, reflecting the challenge of distinguishing between these visually similar lesions. The model's 41 correct predictions for bcc indicate a solid performance, though misclassifications with other classes persist. The model accurately predicts 81 instances of bkl but also shows significant misclassifications as nv, mel, bcc, and akiec, suggesting that the model may benefit from further refinement in distinguishing between these classes. VGG16 demonstrates high accuracy in df classification, correctly identifying 9 out of 10 instances. The model is proficient in predicting nv, with 638 out of 654 cases correctly classified, but continues to face challenges with misclassifications involving mel, bkl, and bcc. Melanoma detection is relatively strong, with 100 instances correctly identified; however, frequent misclassifications as nv and bkl highlight the need for improved feature extraction. The classification of vasc lesions is perfect, achieving 100% accuracy. While VGG16 excels in detecting vasc lesions, its lower accuracy in detecting bkl lesions indicates the need for targeted improvements in model training.

Fig. 19 Confusion matrix of VGG16 on the test dataset after data augmentation.
Fig. 20 shows the confusion matrix for VGG19 on the test dataset with data augmentation. The model correctly identifies 35 instances of akiec, yet misclassifications as bkl and nv persist, indicating the challenge of distinguishing these lesions. VGG19 achieves 38 correct predictions for bcc, though confusion with other classes remains an issue. The model accurately predicts 89 instances of bkl but struggles with misclassifications as nv, mel, and akiec, suggesting potential areas for improvement. VGG19 demonstrates a high accuracy rate for df, correctly identifying 8 out of 10 cases, although occasional confusion with other lesion types is observed. The model is proficient in predicting nv, correctly identifying 637 out of 654 instances, yet continues to face challenges with misclassifications involving mel, bkl, akiec, and bcc. Melanoma detection is relatively strong, with 90 cases correctly identified, but frequent misclassifications as nv and bkl highlight the need for enhanced feature differentiation. The classification of vasc lesions is perfect, achieving 100% accuracy. Although VGG19 excels in detecting vasc lesions, it faces significant challenges in accurately detecting the mel skin lesion family, suggesting that further model adjustments are needed.

Fig. 20 Confusion matrix of VGG19 on the test dataset after data augmentation.

Fig. 21 illustrates the confusion matrix for ResNet50 on the test dataset with data augmentation. ResNet50 successfully identifies 32 instances of akiec but misclassifies some as bkl and nv, indicating difficulties in distinguishing these lesions. The model achieves 43 correct predictions for bcc, but confusion with other classes remains a challenge. ResNet50 accurately predicts 88 instances of bkl but misclassifies some as nv and mel, suggesting that the model may struggle with subtle differences between these lesion types. The model demonstrates high accuracy in df classification, correctly identifying 9 out of 10 cases. ResNet50 excels in predicting nv, with 646 out of 654 instances accurately identified, yet faces challenges with misclassifications involving mel, bkl, and bcc. Melanoma detection is strong, with 89 cases correctly identified, but frequent misclassifications as nv and bkl highlight the need for improved model precision. The classification of vasc lesions is perfect, achieving 100% accuracy. While ResNet50 achieves the highest accuracy in detecting vasc lesions, its struggles with the mel skin lesion family underscore the need for targeted refinements to better distinguish between these critical lesion types.

Fig. 21 Confusion matrix of ResNet50 on the test dataset after data augmentation.

The results affirm the effectiveness of combining transfer learning with data augmentation for skin lesion detection. ResNet50 achieves substantial accuracy on both the development and test datasets, showcasing its robust performance across the various lesion types. However, MobileNetV3 is optimal for real-world deployment, given its efficient runtime and suitability for low-power devices. This aligns with the performance metrics detailed in Tables 1 and 2, which underscore the effectiveness of the applied methodologies and their practical implications for deploying these models in resource-constrained environments.
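As a rough illustration of the pipeline this summary describes, a pretrained backbone can be combined with on-the-fly augmentation and a new classification head in Keras. This is a hedged sketch, not the configuration reported in Tables 1 and 2: the choice of the MobileNetV3 Small variant, the input size, the augmentation parameters, and the optimizer settings are all assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 7          # seven HAM10000 lesion types
IMG_SIZE = (224, 224)    # assumed input resolution, not the reported one

# Random transformations comparable to the augmentation described above.
augment = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# Pretrained backbone with ImageNet weights; the original top is discarded.
base = keras.applications.MobileNetV3Small(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False   # freeze the backbone for the transfer-learning phase

inputs = keras.Input(shape=IMG_SIZE + (3,))
x = augment(inputs)                     # augmentation is active only in training
x = base(x, training=False)             # run the frozen backbone in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# Assumes integer class labels; use categorical_crossentropy for one-hot labels.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # hypothetical tf.data datasets
```

Freezing the backbone first and optionally unfreezing its top layers for a short fine-tuning pass is a common way to trade accuracy against the runtime advantages that make the MobileNet family attractive for low-power devices.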
V. LIMITATIONS AND FUTURE RESEARCH DIRECTIONS

The current approach and dataset have several limitations. The HAM10000 dataset, while comprehensive, is limited in size and demographic diversity, predominantly featuring images from specific population groups, which restricts the model's generalizability across broader populations. A notable challenge is class imbalance: benign lesions are far more prevalent, which can bias the model and reduce its accuracy for less common, malignant lesions. Although transfer learning from pre-trained CNNs improves performance, the model may not generalize well to new datasets or real-world scenarios because of the specific features learned from HAM10000. Data augmentation mitigates overfitting, but the risk remains if the augmented images do not fully represent real-world variability. Real-world deployment presents additional challenges, including the need for validation across diverse populations, seamless integration into clinical workflows, and adherence to regulatory standards. The potential for false positives or negatives also raises ethical concerns, underscoring the need for interpretable models that clinicians can trust.

Future research should expand and diversify the dataset, incorporate images from varied populations, and explore advanced augmentation techniques such as Generative Adversarial Networks (GANs). Fine-tuning on more varied datasets and investigating domain adaptation techniques will be critical for improving adaptability. Real-world validation through clinical trials and the implementation of continuous learning systems will help maintain the model's accuracy and relevance over time.

VI. CONCLUSION

Based on the findings presented in this research, transfer learning and data augmentation are effective strategies for improving the performance of deep learning models in skin lesion detection tasks. Using the HAM10000 dataset and pretrained CNN models allowed for the development of classifiers that accurately identify skin lesions. ResNet50 consistently emerged as a top performer in accuracy and F1-score, demonstrating its adaptability and effectiveness in leveraging pre-trained weights for feature extraction. On the other hand, MobileNetV3 showed notable runtime efficiency, making it a viable option for real-time applications and resource-constrained devices. Incorporating data augmentation further enhanced model performance, particularly in mitigating issues related to dataset imbalance: by generating additional training instances through random transformations, the models were better equipped to generalize to unseen data and improve classification accuracy. Overall, the results highlight the importance of thoughtful model selection and optimization strategies in achieving high-performance skin lesion detection systems. The findings have implications for clinical practice, where accurate and efficient diagnostic tools are essential for timely and effective patient care. Future research could investigate additional augmentation techniques such as color shifting and explore advanced model architectures such as vision transformers. Moreover, enhancing the datasets could further improve performance and broaden the applicability of deep learning models in dermatology.

ACKNOWLEDGMENT
• Work was conducted on Secwepemcúl'ecw, the unceded territory of the Secwépemc. The TRU Kamloops campus operates on the traditional lands of the Tk'emlúps te Secwépemc.
• This work acknowledges the support received from the NSERC Discovery grant RGPIN-2018-06787.

REFERENCES
[1] R. Ashraf, S. Afzal, A. U. Rehman, S. Gul, J. Baber, M. Bakhtyar, I. Mehmood, O.-Y. Song, and M. Maqsood, "Region-of-interest based transfer learning assisted framework for skin cancer detection," IEEE Access, vol. 8, pp. 147858–147871, 2020.
[2] M. Elgamal, "Automatic skin cancer images classification," International Journal of Advanced Computer Science and Applications, vol. 4, no. 3, 2013.
[3] A. Kilic, A. Kilic, A. Kivanc, and A. Sisik, "Biopsy techniques for skin disease and skin cancer: A new approach," Journal of Cutaneous and Aesthetic Surgery, vol. 13, no. 3, pp. 251–254, 2020.
[4] W. F. Cueva, F. Muñoz, G. Vásquez, and G. Delgado, "Detection of skin cancer "melanoma" through computer vision," in 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 1–4, 2017.
[5] R. Marks, "Epidemiology of melanoma," Clinical and Experimental Dermatology, vol. 25, pp. 459–463, 2000.
[6] M. A. Kadampur and S. Al Riyaee, "Skin cancer detection: Applying a deep learning based model driven architecture in the cloud for classifying dermal cell images," Informatics in Medicine Unlocked, vol. 18, p. 100282, 2020.
[7] T. Davenport and R. Kalakota, "The potential for artificial intelligence in healthcare," Future Healthcare Journal, vol. 6, pp. 94–98, 2019.
[8] M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov, "The transformational role of gpu computing and deep learning in drug discovery," Nature Machine Intelligence, vol. 4, no. 3, pp. 211–221, 2022.
[9] S. Naoumi, A. Bazzi, R. Bomfin, and M. Chafii, "Complex Neural Network based Joint AoA and AoD Estimation for Bistatic ISAC," IEEE Journal of Selected Topics in Signal Processing, doi: 10.1109/JSTSP.2024.3387299.
[10] M. Delamou, A. Bazzi, M. Chafii, and E. M. Amhoud, "Deep Learning-based Estimation for Multitarget Radar Detection," 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, 2023, pp. 1–5, doi: 10.1109/VTC2023-Spring57618.2023.10200157.
[11] K. Yang and L. Liu, "An Improved Deep Reinforcement Learning Algorithm for Path Planning in Unmanned Driving," IEEE Access, vol. 12, pp. 67935–67944, 2024, doi: 10.1109/ACCESS.2024.3400159.
[12] D. Chun, J. Choi, H.-J. Lee, and H. Kim, "CP-CNN: Computational Parallelization of CNN-Based Object Detectors in Heterogeneous Embedded Systems for Autonomous Driving," IEEE Access, vol. 11, pp. 52812–52823, 2023, doi: 10.1109/ACCESS.2023.3280552.
[13] Y. Fang, Y. Zhang, and C. Huang, "CyberEyes: Cybersecurity Entity Recognition Model Based on Graph Convolutional Network," The Computer Journal, vol. 64, no. 8, pp. 1215–1225, Oct. 2020, doi: 10.1093/comjnl/bxaa141.
[14] D. Arnold, M. Gromov, and J. Saniie, "Network Traffic Visualization Coupled With Convolutional Neural Networks for Enhanced IoT Botnet Detection," IEEE Access, vol. 12, pp. 73547–73560, 2024, doi: 10.1109/ACCESS.2024.3404270.
[15] M. Zaabi, N. Smaoui, H. Derbel, and W. Hariri, "Alzheimer's disease detection using convolutional neural networks and transfer learning based methods," 2020 17th International Multi-Conference on Systems, Signals & Devices (SSD), Monastir, Tunisia, 2020, pp. 939–943, doi: 10.1109/SSD49366.2020.9364155.
[16] R. Lahoti, S. K. Vengalil, P. B. Venkategowda, N. Sinha, and V. V. Reddy, "Whole Tumor Segmentation from Brain MR images using Multi-view 2D Convolutional Neural Network," 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Mexico, 2021, pp. 4111–4114, doi: 10.1109/EMBC46164.2021.9631035.
[17] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, and Q. Sun, "Deep learning for image-based cancer detection and diagnosis: a survey," Pattern Recognition, vol. 83, pp. 134–149, 2018.
[18] M. Dildar, et al., "Skin cancer detection: A review using deep learning techniques," International Journal of Environmental Research and Public Health, vol. 18, no. 10, p. 5479, 2021.
[19] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, no. 1, 2018.
[20] S. Abokadr, A. Azman, H. Hamdan, and N. Amelina, "Handling imbalanced data for improved classification performance: Methods and challenges," in 2023 3rd International Conference on Emerging Smart Technologies and Applications (eSmarTA), pp. 1–8, 2023.
[21] K. Das, et al., "Machine learning and its application in skin cancer," International Journal of Environmental Research and Public Health, vol. 18, p. 13409, 2021.
[22] A.-R. Ali, J. Li, S. J. O'Shea, G. Yang, T. Trappenberg, and X. Ye, "A deep learning based approach to skin lesion border extraction with a novel edge detector in dermoscopy images," 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2019.
[23] M. Goyal, T. Knackstedt, S. Yan, and S. Hassanpour, "Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities," Computers in Biology and Medicine, vol. 127, p. 104065, 2020.
[24] T. Mazhar, et al., "The role of machine learning and deep learning approaches for the detection of skin cancer," Healthcare (Basel, Switzerland), vol. 11, no. 3, p. 415, 2023.
[25] S. Chaturvedi, K. Gupta, and P. S. Prasad, "Skin Lesion Analyser: An Efficient Seven-Way Multi-class Skin Cancer Classification Using MobileNet," Advances in Intelligent Systems and Computing, vol. 1141, Springer, Singapore, 2021, https://doi.org/10.1007/978-981-15-3383-9_15.
[26] L. Yu, H. Chen, Q. Dou, J. Qin, and P.-A. Heng, "Automated melanoma recognition in dermoscopy images via very deep residual networks," IEEE Transactions on Medical Imaging, vol. 36, no. 4, pp. 994–1004, 2017.
[27] H.-W. Huang, B. W. Hsu, C.-H. Lee, and V. S. Tseng, "Development of a light-weight deep learning model for cloud applications and remote diagnosis of skin cancers," Journal of Dermatology, vol. 48, pp. 310–316, 2021, https://doi.org/10.1111/1346-8138.15683.
[28] G. Alwakid, W. Gouda, M. Humayun, and N. U. Sama, "Melanoma Detection Using Deep Learning-Based Classifications," Healthcare (Basel), vol. 10, no. 12, p. 2481, 2022.
[29] M. Fraiwan and E. Faouri, "On the Automatic Detection and Classification of Skin Cancer Using Deep Transfer Learning," Sensors (Basel), vol. 22, no. 13, p. 4963, 2022.
[30] M. Roshni Thanka, et al., "A hybrid approach for melanoma classification using ensemble machine learning techniques with deep transfer learning," Computer Methods and Programs in Biomedicine Update, vol. 3, 2023, https://doi.org/10.1016/j.cmpbup.2023.100103.
[31] K. Lilhore, et al., "A precise model for skin cancer diagnosis using hybrid U-Net and improved MobileNet-V3 with hyperparameters optimization," Scientific Reports, vol. 14, p. 4299, 2024, https://doi.org/10.1038/s41598-024-54212-8.
[32] J. V. Tembhurne, et al., "Skin cancer detection using ensemble of machine learning and deep learning techniques," Multimedia Tools and Applications, vol. 82, pp. 27501–27524, 2023, https://doi.org/10.1007/s11042-023-14697-3.
[33] M. Hosseinzadeh, et al., "A model for skin cancer using combination of ensemble learning and deep learning," PLoS ONE, vol. 19, no. 5, 2024, doi: 10.1371/journal.pone.0301275.
[34] Md. Hossain, et al., "Combining State-of-the-Art Pre-Trained Deep Learning Models: A Noble Approach for Skin Cancer Detection Using Max Voting Ensemble," Diagnostics, vol. 14, no. 1, p. 89, 2024, https://doi.org/10.3390/diagnostics14010089.
[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015.
[36] A. G. Howard, et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017.
[37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," 2019.
[38] A. Howard, et al., "Searching for MobileNetV3," 2019.
[39] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2015.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, pp. 84–90, 2012.
[41] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," 2019.
[42] I. U. Haq, K. Muhammad, A. Ullah, and S. W. Baik, "DeepStar: Detecting starring characters in movies," IEEE Access, vol. 7, pp. 9265–9272, 2019.
[43] K. Muhammad, S. Khan, V. Palade, I. Mehmood, and V. H. C. de Albuquerque, "Edge intelligence-assisted smoke detection in foggy surveillance environments," IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 1067–1075, 2020.
[44] K. Muhammad, R. Hamza, J. Ahmad, J. Lloret, H. Wang, and S. W. Baik, "Secure surveillance framework for iot systems using probabilistic image encryption," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3679–3689, 2018.
[45] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[46] A. Esteva, et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[47] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, "Early diagnosis of alzheimer's disease with deep learning," in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 1015–1018, 2014.
[48] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3462–3471, 2017.
[49] L. Yu, H. Chen, Q. Dou, J. Qin, and P.-A. Heng, "Automated melanoma recognition in dermoscopy images via very deep residual networks," IEEE Transactions on Medical Imaging, vol. 36, no. 4, pp. 994–1004, 2017.
[50] F. Xie, H. Fan, Y. Li, Z. Jiang, R. Meng, and A. Bovik, "Melanoma classification on dermoscopy images using a neural network ensemble model," IEEE Transactions on Medical Imaging, vol. 36, no. 3, pp. 849–858, 2017.
[51] M. Q. Khan, et al., "Classification of melanoma and nevus in digital images for diagnosis of skin cancer," IEEE Access, vol. 7, pp. 90132–90144, 2019.