Using AI Image Recognition for Breast Cancer Detection and Classification
Overview
Artificial intelligence (AI) refers to algorithms and processes that mimic human cognitive functions such as problem solving. Machine learning and deep learning are subcategories of AI. In machine learning, a system improves at a task over time by learning from data. Deep learning is a subset of machine learning that uses multi-layered artificial neural networks, loosely inspired by the human brain, to learn features directly from raw data instead of relying on hand-crafted features. Some major applications of AI include voice search and recognition and personalized shopping recommendations.
In this tutorial, you will learn how to apply transfer learning, a technique widely used in deep learning, on the Rescale platform. Transfer learning means training a model on a base dataset and base task, then transferring what the model has learned to a second dataset and task. There are two approaches: developing the base model yourself or using a pre-trained model. To develop the model yourself, first select a related predictive modeling problem that has an abundance of training data. Next, train a model on that base task until it performs well, but avoid fitting it so closely to the base task that its learned features no longer carry over to the second task. Finally, use the trained model as the starting point for the second task. With a pre-trained model, you skip the first two steps: you choose an open source pre-trained model and use it directly as the starting point for the second task.
Many pre-trained models are available online, such as the Microsoft ResNet50 and the Google Inception Model. In this tutorial, you will use the Microsoft ResNet50, an image classification model that was pre-trained on a large image dataset for a task that required it to make predictions across a large number of classes. That training taught the model to extract general-purpose features from a photograph, which it can reuse to tell the user what a new input image shows.
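The pre-trained approach can be sketched in a few lines of Keras. This is a minimal illustration, not the code from the Kaggle notebook this tutorial uses: the function name `build_transfer_model`, the class order in `CLASS_NAMES`, the 224x224 input size, and the optimizer and loss settings are all assumptions chosen for the example. It loads ResNet50 with ImageNet weights, freezes it, and adds a new three-class output layer for the second task.

```python
"""Transfer-learning sketch: reuse a pre-trained ResNet50, train only a new head."""

# Assumed label order for the three classes named in this tutorial.
CLASS_NAMES = ["benign", "malignant", "normal"]


def build_transfer_model(num_classes=len(CLASS_NAMES), weights="imagenet",
                         input_shape=(224, 224, 3)):
    """Return a compiled Keras model: frozen ResNet50 base plus a new softmax head."""
    # Imported lazily so this file can be imported without TensorFlow installed.
    import tensorflow as tf

    # Base task: ResNet50 without its original ImageNet classification layer.
    # pooling="avg" turns the final feature maps into one feature vector per image.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    base.trainable = False  # keep the features learned on the base task fixed

    # Second task: a small classifier for the ultrasound classes.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_transfer_model()  # downloads the ImageNet weights on first use
    model.summary()
```

Freezing the base means only the new output layer is trained, which is the cheapest form of transfer learning. Unfreezing some of the later ResNet50 layers afterwards (fine-tuning) is a common next step when more accuracy is needed.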
More specifically, this tutorial shows you how to take the pre-trained Microsoft ResNet50 model and train it further on a subset of a dataset of 780 breast ultrasound images, collected in 2018 from women between 25 and 75 years old (the training step). You then input another breast ultrasound image from the same dataset to see whether the model classifies it as malignant, benign, or normal (the validation step). You will use the Google Chrome Interactive software and the Conda Miniconda Interactive software to run this tutorial on Rescale Workstations. Rescale Workstations let you interact with the model in real time, so you can change the image you want to classify and modify the code. This tutorial does not go through every block of code; instead, it focuses on getting the code set up on Rescale and on the results.
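The idea of training on one subset of the images and holding back others for validation can be sketched with the standard library alone. The helper `split_by_class`, the 20% validation fraction, and the demo file names are assumptions for illustration; the notebook may split the data differently. The split is done class by class so that malignant, benign, and normal images all appear in both subsets.

```python
import random
from collections import defaultdict


def split_by_class(labeled_paths, validation_fraction=0.2, seed=0):
    """Split (path, label) pairs into train and validation lists, class by class."""
    by_label = defaultdict(list)
    for path, label in labeled_paths:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        # Hold back at least one image per class for validation.
        n_val = max(1, round(len(paths) * validation_fraction))
        validation += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, validation


if __name__ == "__main__":
    # Hypothetical file names; the real dataset uses its own naming scheme.
    demo = [(f"{label}/{label}_{i}.png", label)
            for label in ("benign", "malignant", "normal")
            for i in range(10)]
    train, validation = split_by_class(demo)
    print(len(train), "training images,", len(validation), "validation images")
```

Because no image appears in both lists, accuracy measured on the validation list reflects how the model handles images it has not been trained on.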
This tutorial was taken from Kaggle’s “Breast Cancer Detection Using ResNet50” by Khizar Khan. The breast ultrasound image dataset was taken from Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863.
Video Tutorial
Configuring Your Workstation
Go through the following sections to properly configure your workstation.
Monitoring Your Workstation
Now you can monitor the progress of your workstation from the Workstations home screen.
On the Workstations home screen, select My Workstations to see all of the workstations you have created as a Rescale user:
Or select Active Workstations to see the workstations you are currently running:
Improving Workload with Coretype Explorer
Optionally, you can help optimize your workload for performance and/or cost by using Rescale’s Coretype Explorer.
Coretype Explorer is useful for deciding which core type and how many cores to use for a project. It lets you compare core types across attributes such as Cores/Node, Memory/Node, Storage/Node, GPUs/Node, Price – On Demand, and Price – On Demand Priority, so you can choose the best core type for your budget, memory, and project needs. To open Coretype Explorer, go to Hardware Settings in the Setup. It is located at the bottom right corner of the Specify Hardware Settings table:
As shown below, four core types (Citrine, Iolite-1, Emerald, and Ruby) were compared on Cores/Node, Memory/Node, Storage/Node, and Price – ODP (On Demand Priority). Emerald was included because it is highly popular for running jobs and workstations thanks to its lower cost. Citrine, Iolite-1, and Ruby were included because they cost about the same as one another, which makes it possible to compare their memory, storage, and cores at a similar price. Ruby was chosen for this tutorial: although it costs more than Emerald, it matches the price of Citrine and Iolite-1 while offering significantly higher Cores/Node, Memory/Node, and Storage/Node than the other core types. That makes it well suited to projects that use a lot of memory and to running code in an interactive environment such as a Rescale Workstation. If you value price over memory, storage, and cores, Emerald is probably the better choice.
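The selection reasoning above (most resources within a price limit, otherwise the cheapest option) can be written down as a small helper. Everything here is hypothetical: `CoreType`, `pick_coretype`, and above all the numbers in `PLACEHOLDER_OPTIONS`, which are invented placeholders that only preserve the relationships described in the text and are not Rescale's actual specifications or prices. Substitute the values you see in Coretype Explorer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    cores_per_node: int
    memory_gb_per_node: float
    storage_gb_per_node: float
    price: float  # placeholder price in arbitrary units


# PLACEHOLDER values only, NOT real Rescale figures. They mirror the text:
# Emerald is cheapest; the other three share a price; Ruby has the most resources.
PLACEHOLDER_OPTIONS = [
    CoreType("Emerald", 8, 32, 100, 0.10),
    CoreType("Citrine", 8, 32, 100, 0.20),
    CoreType("Iolite-1", 8, 64, 200, 0.20),
    CoreType("Ruby", 16, 128, 400, 0.20),
]


def pick_coretype(options, max_price=None):
    """Return the option with the most resources at or below max_price.

    Options are ranked by cores, then memory, then storage; a lower price
    breaks any remaining tie.
    """
    affordable = [o for o in options if max_price is None or o.price <= max_price]
    if not affordable:
        raise ValueError("no core type fits the price limit")
    return max(
        affordable,
        key=lambda o: (o.cores_per_node, o.memory_gb_per_node,
                       o.storage_gb_per_node, -o.price),
    )


if __name__ == "__main__":
    print(pick_coretype(PLACEHOLDER_OPTIONS).name)                  # resources first
    print(pick_coretype(PLACEHOLDER_OPTIONS, max_price=0.15).name)  # price first
```

With these placeholder values, the unrestricted call selects Ruby and the price-capped call selects Emerald, matching the two choices discussed above.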
Conclusion
This tutorial showed you how to classify an image from a breast ultrasound image dataset collected in 2018, using a model trained from the Microsoft ResNet50 Image Classification Model. To classify a different image from the dataset, copy and paste its file path into the last block of code. You can also apply the same principles with the Microsoft ResNet50 to train on a different image dataset and classify a different kind of image.
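Classifying a different image amounts to loading the new file, preparing it the way ResNet50 expects, and reading off the most likely class. The sketch below is an assumption about how such a final block could look, not the notebook's actual code: the names `load_for_resnet50` and `classify_image`, the class order, and the 224x224 size are illustrative, and it presumes the model was trained on inputs prepared with the same ResNet50 preprocessing.

```python
# Assumed label order; it must match the order used when the model was trained.
CLASS_NAMES = ["benign", "malignant", "normal"]


def load_for_resnet50(image_path, size=(224, 224)):
    """Load one image file as a ResNet50-ready batch of shape (1, H, W, 3)."""
    # Imported lazily so classify_image can be exercised without TensorFlow.
    import tensorflow as tf

    image = tf.keras.utils.load_img(image_path, target_size=size)  # RGB by default
    batch = tf.keras.utils.img_to_array(image)[None, ...]  # add the batch axis
    return tf.keras.applications.resnet50.preprocess_input(batch)


def classify_image(model, image_path, class_names=CLASS_NAMES,
                   loader=load_for_resnet50):
    """Return (label, probability) for the most likely class of one image."""
    probabilities = [float(p) for p in model.predict(loader(image_path))[0]]
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]
```

With a trained model in hand, a call such as `classify_image(model, "path/to/another_image.png")` (a hypothetical path) returns the predicted label and its probability, so switching images only requires changing that one argument.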
Furthermore, you can use Coretype Explorer, under Hardware Settings in the Setup, to compare coretypes and choose the one that fits your project and financial needs. Please see Further (Optional) Steps for more information on how to navigate this option. Image classification models are widely used in artificial intelligence and machine learning for tasks such as object identification in satellite images, animal classification, medical imaging, and brake light detection.
Completing this tutorial on Rescale lets you leverage high computing power, try different types of hardware and software to test scalability on the same simple workstation, interact with the code in real time, change the input image to classify, and cut the code runtime to less than half of what it would be if you ran the same Python program locally on your computer.