1 “M. Albanesi” Allergy and Immunology Unit, Bari, Italy
2 The Allergist, Bari, Italy
3 Department of Engineering and Science, Universitas Mercatorum, Rome, Italy
4 Department of Medicine and Surgery, LUM University, Casamassima, Italy
Received Date: 26/01/2025; Published Date: 28/02/2025
*Corresponding author: Stefano Palazzo MSc, P. Eng. “M. Albanesi” Allergy and Immunology Unit, Bari, Italy; The Allergist, Bari, Italy; Department of Engineering and Science, Universitas Mercatorum, Rome, Italy
Email ID: stefanopalazzo971418@gmail.com; ORCID: 0009-0000-7274-5800
Tumor grading is an important aspect of diagnosis, prognosis, and treatment of cancer. Conventional grading systems, including the mitotic index for breast cancer and the Gleason score for prostate cancer, are typically based on manual evaluations susceptible to inter- and intra-observer variability. These limitations highlight the importance of standardised, objective, and automated methods to enhance the accuracy of cancer grading.
Supervised learning systems have shown great promise in tumor quantification and grading, producing consistent and reproducible results across a wide range of cases. Algorithms such as Support Vector Machines, Random Forests, and Convolutional Neural Networks facilitate the recognition of intricate cellular patterns, automate grading, and lessen dependence on subjective assessments.
Supervised systems generally start with image pre-processing, followed by feature extraction, model training, and model validation, each of which contributes to diagnostic accuracy.
Automated cellular quantification, reduced observer variation, and the integration of molecular and genomic data are all designed to assist the pathologist in assigning a more reliable grade. Obstacles remain, including a shortage of annotated datasets, poor model generalizability over heterogeneous data, and a lack of robust clinical validation, but emerging solutions, like Generative Adversarial Networks for data augmentation and Explainable AI for model transparency, may open new avenues.
Integrating supervised learning systems into oncology workflows could change the landscape of personalized cancer care. Through improving diagnostic accuracy, process efficiency, and personalized therapeutic management, these innovations lead to better clinical outcomes and may create future opportunities for advances in translational oncology research.
Keywords: Supervised Learning Systems; Tumor Cell Quantification; Histological Image Analysis; Convolutional Neural Networks; Support Vector Machines; Random Forest; Tumor Grading Automation; Digital Pathology; Precision Medicine
Cancer grading adds value to both diagnosis and prognosis by guiding treatment selection and forecasting progression. For a long time, however, tumor grading systems in breast cancer have relied on manual assessment of mitotic index, nuclear morphology, and glandular structure [1]. While these assessments are important, they are subject to variability both within and between observers, so the grade depends heavily on the individual pathologist making the diagnosis. This inherent variability reinforces the importance of standardized and objective methods for cancer grading [2]. Although that work focuses predominantly on breast cancer, these challenges are not limited to this type of cancer.
For instance, in prostate cancer, the Gleason scoring system is well established for tumor grading but is affected by inter-observer variability [3]. Similarly, gliomas are graded using features such as cell density, yet it remains difficult to differentiate accurately between low-grade and high-grade tumors [4,5].
The grade of sarcoma is similarly determined by cellular atypia, mitotic count, and necrosis. Despite being pivotal in prognostication, it suffers from inter-evaluator variability, thereby making standardization of therapeutic approaches challenging [6].
Furthermore, the evaluation of gastrointestinal tumors is based on grading systems such as the GIST (Gastrointestinal Stromal Tumors) system, which takes into account factors like the mitotic rate and tumor size [7,8]. While widely used, such methods may still be subject to variability arising from subjective interpretation. For pancreatic cancer, the evaluation of glandular differentiation relies heavily on histological grading, which is also hindered by observer bias and the complexity of the tissue architecture [9].
Grading in hematological malignancies including lymphomas carries added complexity, wherein systems such as the Ann Arbor staging and Lugano classification incorporate not only histological features, but also radiological findings that may be subject to subjective interpretation [10,11]. Additionally, melanoma grading includes metrics such as Breslow thickness and ulceration status, and minor variations in measurement can have a major effect on staging and treatment decisions [12].
It is very clear from these challenges that advanced computational methods and models need to be employed as adjuncts to the standard grading of various cancers [5]. These examples highlight the need for more robust and objective tools to assist pathologists in tumor grading.
In the last few years, supervised learning systems have helped to address various grading problems. These systems improve tumor diagnosis by minimizing the reliance on subjective evaluations and shortening turnaround time. Furthermore, they offer better consistency across cases, enhancing the integrity and reliability of cancer grading [5,13,14].
The Paige application (https://paige.ai/) is a good example of such technological innovation: it uses AI to help diagnose breast and prostate cancers based on biopsy samples. With the help of AI, Paige has been able to improve diagnostic processes by detecting certain patterns in tissue samples that might be difficult for medical specialists to see [15,16].
AI models are also being applied to other grading systems, including those for gliomas and sarcomas, with the promise of decreasing variability and increasing precision. AI-enabled platforms have also emerged to assist in the grading of lymphomas and melanoma, applying image recognition and pattern detection to tackle the complexities of these systems [17,18].
The future of oncology points toward the integration of such tools into practice so that tumor grading becomes more accurate and more streamlined [19-22].
Application of supervised learning algorithms in tumor cell quantification has represented a real step forward in computer-assisted diagnosis. Such algorithms allow analysis of complex data, improving the accuracy and efficiency in the classification and quantification of cells by exploiting statistical and machine learning approaches [23-25].
Some of the most relevant techniques are analyzed hereinafter:
Support Vector Machines (SVM):
SVMs are well suited to small datasets with linear or nonlinear class boundaries [26].
This makes them appropriate for applications where the available data are limited but well labeled. However, these algorithms can become impractical on very large datasets because of their computational complexity [27].
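As a rough illustration of the idea, the sketch below trains a linear SVM on a tiny labeled set by stochastic sub-gradient descent on the hinge loss. It is a minimal sketch under stated assumptions, not the method of the cited studies: the two features, their values, the learning rate, and the function names are all hypothetical, and nonlinear (kernel) SVMs are not shown.

```python
import random

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200, seed=0):
    """Fit w, b by stochastic sub-gradient descent on the regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 regularization shrinks the weights slightly at every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # Sample lies inside the margin or is misclassified:
                # move the decision boundary away from it.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 (tumor) or -1 (normal) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardized features per cell: [nuclear area, texture score].
X = [[-1.0, -0.8], [-1.2, -1.1], [-0.7, -1.0], [-0.9, -1.3],   # normal (-1)
     [1.0, 1.1], [0.8, 1.3], [1.2, 0.7], [1.4, 1.0]]           # tumor (+1)
y = [-1, -1, -1, -1, 1, 1, 1, 1]

w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

With only eight well-separated, well-labeled samples the model already recovers a usable boundary, which is the small-data setting described above. The per-sample update loop also hints at why training cost grows with dataset size.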
Random Forest:
Random forests combine many decision trees and perform well on complex datasets, with built-in feature ranking that aids interpretability [28].
Each tree adds to the robustness of the model as a whole and consequently lowers the risk of overfitting. Automatic feature prioritization also helps identify the most important parameters for classification. However, if the dataset is not balanced, there is a risk of overfitting on irrelevant features [29].
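The ensemble idea can be sketched with a deliberately simplified forest in which every tree is a one-split decision stump trained on a bootstrap sample and a random subset of features. This is an illustrative simplification, not a full random forest: the feature names and values are hypothetical, and counting how often a feature is chosen for a split is only a crude stand-in for the impurity-based importance used by real implementations.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the fewest errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:
        # Degenerate bootstrap sample (no valid split): always vote the majority label.
        majority = Counter(y).most_common(1)[0][0]
        return (None, 0.0, majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]            # sample with replacement
        feats = sorted(rng.sample(range(len(X[0])), max_features))
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = [l_lab if f is None or x[f] <= thr else r_lab
             for f, thr, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

def feature_usage(forest, n_features):
    """Count how many trees split on each feature."""
    counts = [0] * n_features
    for f, *_ in forest:
        if f is not None:
            counts[f] += 1
    return counts

# Hypothetical per-field features: [mitotic count, mean nuclear area, noise].
X = [[1, 40, 0.3], [2, 45, 0.9], [0, 38, 0.5], [3, 50, 0.1],     # low grade (0)
     [8, 80, 0.4], [10, 95, 0.8], [7, 70, 0.2], [12, 88, 0.6]]   # high grade (1)
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
usage = feature_usage(forest, 3)
```

In this toy data the noise column never separates the classes better than the two informative columns, so it is never selected for a split, which mirrors the feature prioritization described above.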
Convolutional Neural Networks (CNN):
CNNs are particularly suitable for the analysis of highly complex histological images because they can learn feature hierarchies directly from raw data [30,31].
The major advantage they bring is the ability to identify complex and subtle cellular patterns without the need for manual feature engineering.
CNNs have been shown to perform strongly in classifying histological images, as evidenced by work on non-small cell lung cancer images in which a ResNet-50 network achieved an accuracy of 95%, far superior to traditional methods [32,33].
This result highlights their ability to distinguish normal cells from tumor cells with high accuracy. High dataset quality nevertheless remains crucial to avoid bias.
Furthermore, the use of advanced architectures such as ResNet and EfficientNet has further improved performance on complex datasets, making it possible to identify subtle patterns that may be imperceptible to humans [34,35].
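To make the notion of learned feature hierarchies more concrete, the sketch below runs the forward pass of a single convolutional stage (convolution, ReLU, 2x2 max pooling) on a toy image patch. It is only an illustration under simplifying assumptions: the kernel is set by hand to respond to a vertical edge, whereas a real CNN such as ResNet learns many kernels from labeled data, and the 6x6 patch is hypothetical.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Toy 6x6 patch: dark background on the left, bright tissue on the right.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-set kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = relu(conv2d(patch, vertical_edge))   # 4x4 map, strongest at the edge
pooled = max_pool2x2(feature_map)                  # 2x2 summary of the edge response
```

Stacking several such stages, with kernels fitted during training rather than fixed by hand, is what lets deeper layers respond to progressively more complex structures.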
Supervised learning algorithms have proven very effective in several real-world clinical applications. For example, in the grading of prostate cancer, deep learning models, including CNNs, have been integrated with the widely accepted Gleason scoring system to decrease inter-observer variability and increase both objectivity and workflow efficiency [36]. Supervised models have also been employed in glioma classification, where they help differentiate low-grade from high-grade tumors by examining patterns in cell density and other histological features, improving prognostic accuracy [5].
These examples showcase the impact of machine learning on cancer diagnostics, empowering clinicians with accurate and actionable data that were otherwise difficult to obtain.
The quantification of tumor cells typically follows a structured pipeline divided into several phases.
The first phase is preprocessing, which aims to enhance image quality through noise reduction, illumination equalization, and segmentation of regions of interest. Advanced techniques, including adaptive shading for illumination correction, even out the brightness across the sample to offer better visibility of critical cellular structures [37]. Spatial neural network-based filtering further increases image-to-image consistency by diminishing the effects of instrumental variation, which enhances the accuracy of the analysis [38].
Next, annotation is performed, which involves creating a labeled dataset with input from expert pathologists. Semi-automatic annotation methods, such as Active Learning-based algorithms, are gaining popularity because they can reduce the experts' workload while maintaining high labeling accuracy. In this context, standardization is important to minimize discrepancies among experts.
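A very simple form of illumination equalization can be sketched as background subtraction: estimate the slowly varying brightness with a local mean filter, subtract it, and restore the global mean. This is a hedged stand-in rather than the adaptive shading method cited above; the filter radius and the toy gradient image are assumptions chosen for illustration.

```python
def box_mean(image, radius):
    """Local mean of each pixel over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [image[a][b]
                      for a in range(max(0, i - radius), min(h, i + radius + 1))
                      for b in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(window) / len(window)
    return out

def correct_illumination(image, radius=2):
    """Subtract the estimated background and re-center on the global mean."""
    h, w = len(image), len(image[0])
    background = box_mean(image, radius)
    global_mean = sum(sum(row) for row in image) / (h * w)
    return [[image[i][j] - background[i][j] + global_mean for j in range(w)]
            for i in range(h)]

def intensity_spread(image):
    """Difference between the brightest and darkest pixel."""
    values = [v for row in image for v in row]
    return max(values) - min(values)

# Toy slide region with a left-to-right brightness gradient (uneven lighting).
uneven = [[10 * j for j in range(8)] for _ in range(4)]
corrected = correct_illumination(uneven)
```

On this toy gradient the spread between the darkest and brightest pixel drops from 70 to 20, with the residual confined to the borders where the window is truncated. In practice the radius would need to be much larger than a cell so that cellular detail is not removed along with the background.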
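One common Active Learning strategy is uncertainty sampling: the current model scores the unlabeled images, and only those it is least sure about are sent to the pathologist. The sketch below assumes a binary model that outputs a tumor probability per image; the function name, the probabilities, and the annotation budget are hypothetical.

```python
def select_for_annotation(probabilities, budget):
    """Return indices of the `budget` samples whose predicted tumor
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]

# Hypothetical model outputs for five unlabeled image tiles.
tile_probabilities = [0.98, 0.52, 0.10, 0.45, 0.70]
to_label = select_for_annotation(tile_probabilities, budget=2)
```

Here tiles 1 and 3 would be queued for expert labeling, while the confidently scored tiles are left for later rounds, which is how the expert workload is reduced.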
Feature extraction involves identifying relevant features, such as cell size, shape, and texture, or learning them from deep learning models [39].
Feature extraction techniques based on histograms of oriented gradients (HOG) and Gabor texture descriptors can complement deep learning models and enhance the system's ability to distinguish between normal and tumor cells.
Training is the heart of the workflow, where labeled data are used to build the predictive model. Advanced optimization strategies such as AdamW or Ranger can improve a model's ability to converge quickly while limiting overfitting, and thus to generalize to new data [40].
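Hand-crafted descriptors of cell size and shape can be computed directly from a segmentation mask. The sketch below is a minimal example assuming a binary mask for a single cell; it derives area, a pixel-edge perimeter, and circularity (4πA/P²). The function name and the toy mask are hypothetical, and the pixel-edge perimeter overestimates the contour of round objects, so the circularity value is best read as a relative measure.

```python
import math

def shape_features(mask):
    """Compute area, perimeter, and circularity of the foreground in a binary mask."""
    h, w = len(mask), len(mask[0])
    area = sum(sum(row) for row in mask)
    perimeter = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            # Count every pixel side that touches background or the image border.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < 0 or ni >= h or nj < 0 or nj >= w or not mask[ni][nj]:
                    perimeter += 1
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "circularity": circularity}

# Toy mask: a 2x2 "nucleus" inside a 4x4 tile.
nucleus_mask = [[0, 0, 0, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]]

features = shape_features(nucleus_mask)
```

Vectors of such measurements, one per cell, are the kind of input a classical classifier can consume alongside or instead of features learned by a deep network.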
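The AdamW update can be written out in a few lines: it is Adam with the weight-decay term applied directly to the parameters rather than folded into the gradient. The sketch below applies it to a one-parameter toy loss purely for illustration; the hyperparameter values and the loss are assumptions, and Ranger is not shown.

```python
import math

def adamw_minimize(grad_fn, theta, lr=0.05, beta1=0.9, beta2=0.999,
                   eps=1e-8, weight_decay=0.01, steps=500):
    """Minimize a function given its gradient, using AdamW updates."""
    theta = list(theta)
    m = [0.0] * len(theta)   # running mean of gradients
    v = [0.0] * len(theta)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)   # bias correction
            v_hat = v[k] / (1 - beta2 ** t)
            # Decoupled weight decay: shrink theta directly, separately
            # from the adaptive gradient step.
            theta[k] -= lr * (m_hat / (math.sqrt(v_hat) + eps)
                              + weight_decay * theta[k])
    return theta

# Toy loss (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
solution = adamw_minimize(lambda th: [2 * (th[0] - 3.0)], [0.0])
```

The parameter settles close to the minimum at 3, pulled very slightly toward zero by the decay term; that small pull is the regularizing effect that helps limit overfitting in larger models.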
Finally, the performance of the model is evaluated by validation with metrics such as accuracy, precision, AUC-ROC, and F1-score. Stratified cross-validation is necessary to ensure the results are robust and generalizable, meaning that the model performs well even on data other than those used in training [41].
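The validation metrics named above, together with a stratified fold assignment, can be sketched as follows. This is a minimal example assuming binary labels (1 = tumor, 0 = normal); the labels, predictions, and scores are made up for illustration, AUC-ROC is computed through its rank interpretation (the probability that a random positive scores higher than a random negative), and the fold assignment is a simple round-robin per class without shuffling.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

def auc_roc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k):
    """Assign sample indices to k folds so each fold keeps the class proportions."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds

# Hypothetical ground truth, hard predictions, and model scores for 8 tiles.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
folds = stratified_folds(y_true, k=2)
```

For this toy data accuracy, precision, recall, and F1 all equal 0.75, the AUC is 0.9375, and each of the two folds contains two tumor and two normal samples. Keeping the class balance identical across folds is what makes the per-fold scores comparable.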
Supervised systems have demonstrated a significant impact on tumor grading. A practical example is the use of Convolutional Neural Networks (CNNs) for prostate cancer grading, where a supervised model outperformed the average diagnostic accuracy of expert pathologists, reducing inter-observer variability [42,43].
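Inter-observer variability is often summarized with an agreement statistic. As a hedged illustration, the sketch below computes Cohen's kappa, a measure not named in the text above but widely used for this purpose, on made-up grades from two hypothetical raters. Kappa corrects the raw agreement rate for the agreement expected by chance.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters assigned grades independently.
    expected = sum(counts_a[g] * counts_b[g]
                   for g in set(rater_a) | set(rater_b)) / n ** 2
    if expected == 1.0:
        return 1.0   # both raters used a single identical grade throughout
    return (observed - expected) / (1 - expected)

# Hypothetical tumor grades (1-3) assigned to ten slides by two observers.
pathologist_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
pathologist_2 = [1, 2, 2, 2, 3, 2, 1, 2, 3, 1]

kappa = cohens_kappa(pathologist_1, pathologist_2)
```

The two observers agree on 8 of 10 slides, but after correcting for chance the kappa is about 0.70. The same statistic can be applied between a model and a reference pathologist to check whether automation narrows the gap.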
By automating cellular quantification, these systems reduce variability between pathologists and give objective support to grading. They can identify complex cellular patterns that might be difficult for the human eye to catch and can produce heat maps that highlight regions of interest, providing additional clinically relevant information. In this way, objective, technology-based elements are integrated into a process that was traditionally based on the visual and subjective experience of the pathologist. However, integrating these systems into clinical workflows presents challenges, such as ensuring compatibility with hospital systems like PACS and RIS, training medical personnel, and managing high implementation costs. Open-source and cloud-based solutions offer viable alternatives for resource-limited settings [44-47].
With proper validation by regulatory bodies, these systems can be seamlessly incorporated into clinical practice, enhancing diagnostic accuracy and patient care.
Supervised learning also enables prognostic models tailored to the individual patient. Analyzing cellular properties and biomarkers, such as cell density and proliferation, alongside clinical outcomes promises more accurate predictions and models built around individual patients [48]. Integrated models that combine tissue-wide histological features with molecular and genetic data now provide new tools for precision prognosis and move the field further toward precision medicine [49].
These innovations find a clear application when supervised learning models are combined with Gene Set Enrichment Analysis (GSEA) and other statistical methods that incorporate histological, genomic, and biomarker data (e.g. HER2, PD-L1). These methods improve therapy stratification, enhancing prognostication in precision medicine [50-52]. By sharing data and using standardized protocols, multi-institutional models enhance reliability and generalizability, which ultimately advances oncology towards more tailored and individual treatment.
Despite the progress, several challenges still persist.
Annotating data requires so much time and so many resources that large, high-quality datasets are few and far between [53]. Data augmentation approaches such as Generative Adversarial Networks (GANs) can generate new, realistic images to mitigate this limitation [54,56].
Generalization of models is another major challenge, since many models perform poorly on data from sources different from those seen during training. Domain adaptation techniques may improve model performance in heterogeneous environments [55].
Clinical validation is the most important step: rigorous, large-scale validation must be performed before clinical implementation. Multicenter studies with heterogeneous cohorts are necessary to test the robustness and reliability of the systems.
Accessibility in developing countries is limited by poor infrastructure and the need for staff training. Investments in low-cost technologies and global partnerships could ease adoption in less privileged contexts.
Supervised learning systems represent a promising frontier for tumor cell quantification and oncology grading.
Future research efforts should be directed toward the use of molecular and genomic information to enhance predictive accuracy, increasing insight into the biological mechanisms underpinning the disease processes [57].
Another area of potential interest is the development of automated pipelines that depend less on manual annotation, thereby streamlining data analysis.
Multicenter clinical trials should be set up to test the robustness and reliability of the models on heterogeneous populations under real-world conditions.
Further, combining imaging, clinical, and genomic data through multimodal approaches could enable more detailed analysis and improve diagnosis and prognosis.
Finally, the use of Explainable AI (XAI) systems is necessary to ensure transparency and trust in the clinical use of models, making the results more interpretable and acceptable for healthcare professionals [58].
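One simple, model-agnostic XAI technique is occlusion sensitivity: mask one image region at a time and record how much the model's output drops. The sketch below illustrates the idea under stated assumptions: `toy_tumor_score` is a hypothetical stand-in for a trained classifier (it just averages pixel intensity), and the patch size and image are made up.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Heat map of score drops when each patch x patch block is masked out."""
    h, w = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * (w // patch) for _ in range(h // patch)]
    for bi in range(h // patch):
        for bj in range(w // patch):
            occluded = [row[:] for row in image]   # copy so the input is untouched
            for i in range(bi * patch, (bi + 1) * patch):
                for j in range(bj * patch, (bj + 1) * patch):
                    occluded[i][j] = fill
            # A large drop means the model relied on this region.
            heat[bi][bj] = baseline - score_fn(occluded)
    return heat

def toy_tumor_score(image):
    """Hypothetical stand-in for a trained model: mean pixel intensity."""
    pixels = [v for row in image for v in row]
    return sum(pixels) / len(pixels)

# Toy 4x4 tile with a bright cluster in the top-left corner.
tile = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]

heat_map = occlusion_map(tile, toy_tumor_score)
```

Only the top-left block produces a score drop, so the heat map points at the region that drives the prediction. Overlaying such a map on the slide gives the pathologist a way to check whether the model is attending to plausible tissue.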
With continuous development in machine learning technologies, automation of diagnostic processes may become a fundamental pillar in personalized medicine, significantly improving the quality of oncology care and opening new avenues for translational research.
Acknowledgments: None
Authorship
S.P. conceived the project idea and conducted the relevant literature review. He wrote the manuscript and managed the technical and engineering development, consistent with his role as a biomedical engineer, given the technical nature of the paper.
A.D., as an anatomic pathologist, provided support in drafting the clinical section of the manuscript, ensuring the accuracy of the more strictly clinical-medical aspects.
Consent for Publication: All the authors have approved the manuscript and the submission.
Funding: The authors declare that this study was carried out with institutional resources only.
Conflict of Interest: The authors declare no competing financial interests.