Abstract

Mesothelioma is an aggressive cancer that affects the lining of the lungs. It is one of the deadliest cancers diagnosed in people exposed to fibrous silicate minerals (asbestos). Millions of people face severe consequences because the disease is diagnosed at a late stage. This study compares several machine learning approaches on distinct feature sets and addresses the issue of class imbalance. The dataset used in this study is publicly available from the University of California Irvine (UCI) Machine Learning Repository. Three techniques were used to handle the class imbalance: resampling, the synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN). Most of the machine learning strategies performed well with the resampling technique. With resampling, the best accuracy was achieved by artificial neural networks (ANN). The highest accuracy, 96%, was recorded on the feature set selected by principal component analysis (PCA). Overall, ensemble techniques performed well. The proposed stacking-based classifier achieved the highest accuracy (89%) on data balanced using SMOTE and ADASYN.
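For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its k nearest minority neighbours. The sketch below is illustrative only and is not the implementation used in the study; the function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data invented for illustration (not the UCI mesothelioma records).
majority = [[float(x), float(y)] for x in range(6) for y in range(4)]  # 24 samples
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]    # 4 samples

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. ADASYN follows the same interpolation scheme but generates more samples around minority points that are harder to learn.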

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)