A Machine Learning Model for Prostate Cancer Prediction in Korean Men
Abstract
Purpose
Unnecessary prostate biopsies for detecting prostate cancer (PCa) should be minimized. Therefore, this study developed a machine learning (ML) model to predict PCa in Korean men and evaluated its usability.
Materials and Methods
We retrospectively analyzed clinical data from 928 patients who underwent prostate biopsies at Kangwon National University Hospital between May 2013 and May 2023. Of these, 377 (40.6%) were diagnosed with PCa, and 551 (59.4%) did not have cancer. For external validation, clinical data from 385 patients aged 48–89 years who underwent prostate biopsies from September 2005 to September 2023 at Wonju Severance Christian Hospital were also included. Twenty-two clinical features were used to develop an ML model to predict PCa. Features were selected based on their contributions to model performance, leading to the inclusion of 15 features. A meta-learner was constructed using logistic regression to predict the probability of PCa, and the classifier was trained and validated on randomly extracted training and test sets at an 8:2 ratio.
Results
The prostate health index, prostate volume, age, nodule on digital rectal examination, and prostate-specific antigen were the top 5 features for predicting PCa. The area under the receiver operating characteristic curve (AUC) of the meta-learner logistic regression model was 0.89, and the accuracy, sensitivity, and specificity were 0.828, 0.711, and 0.909, respectively. Our model also showed excellent prediction performance for high-grade PCa (Gleason score of 7 or higher), with an AUC of 0.903. Furthermore, we evaluated the performance of the model using external cohort clinical data and achieved an AUC of 0.863.
Conclusions
Our ML model excelled in predicting PCa, specifically clinically significant PCa. Although extensive cross-validation in other clinical cohorts is needed, this ML model is a promising option for future diagnostics.
INTRODUCTION
Prostate cancer (PCa) is the most frequently diagnosed cancer in more than half of the countries around the world (112 out of 185) and the fifth leading cause of cancer death among men in 2020 [1]. According to the National Cancer Information Center, PCa was the third most frequently diagnosed cancer in Korean males in 2020. The incidence of PCa in Korea has increased significantly, while the mortality rate has remained stable [2]. The low mortality rate compared to the high incidence rate reflects advancements in treatment and earlier detection through increased screening [3,4].
Since its introduction several decades ago, serum prostate-specific antigen (PSA) has been widely used as a screening tool for PCa and is regarded as the most valuable diagnostic marker [5]. However, the usefulness of the PSA screening test remains controversial. PSA is prostate-specific, not cancer-specific; various factors, such as benign prostatic hyperplasia, prostatic inflammation, and ejaculation, can affect PSA levels [6], and its specificity is therefore low. The positive-predictive value for PSA was approximately 25% in a pooled meta-analysis [7], leading to many false-positive results. This finding implies that up to 3 of 4 patients who undergo prostate biopsy do so unnecessarily. Therefore, there is an unmet need for a new, convenient method to improve the diagnosis of PCa.
Recently, ML models have been actively studied in the medical field for diagnosis, assessment of patient morbidity or mortality risk, and treatment strategies [8-10]. In PCa, ML has been applied to image analysis tasks such as prostate segmentation [11] and the analysis of pathological slides [12]. It has also been applied to high-level inference and prediction tasks such as PCa detection and characterization [13]. These studies applied MRI or CT image-based information to ML techniques to increase the accuracy of PCa lesion detection.
In this study, we developed an ML prediction model and evaluated its performance in predicting PCa occurrence. Our ML model, based on clinical information, is intended to help clinicians make decisions before prostate biopsy and may lead to a substantial reduction in unnecessary biopsies, medical resource use, and patient suffering.
MATERIALS AND METHODS
1. Study Subjects
The data of 928 male patients who underwent transrectal ultrasound-guided prostatic biopsy from May 2013 to May 2023 at Kangwon National University Hospital were retrospectively retrieved. The study population included men aged 40–95 years with total PSA (tPSA) ≥3.5 ng/mL who underwent prostate biopsy for suspected PCa. All patients underwent a systematic biopsy with 10–12 cores. Clinical information, including PSA and prostate health index (PHI), of these patients was collected. For external validation of the model, clinical data from 385 patients aged 48–89 years who underwent prostate biopsy from September 2005 to September 2023 at Wonju Severance Christian Hospital were also included.
2. Clinical Information
Clinical features considered to be highly or potentially related to PCa were selected. These included PSA, family history of PCa, abnormal digital rectal examination (DRE), and prior prostate biopsy results, which are constitutive factors of the Prostate Cancer Prevention Trial risk calculator [14]. In total, the clinical information consisted of 22 characteristics, including age, height, weight, the American Urological Association (AUA) symptom score, hypertension, diabetes status, PHI, high-density lipoprotein (HDL), triglyceride (TG), fasting blood sugar (FBS), glycosylated hemoglobin, C-reactive protein, lactate dehydrogenase, and prostate volume. Prior to the prostate biopsy, blood was drawn to measure the prebiopsy tPSA, free PSA (fPSA), and [-2]proPSA (p2PSA) levels. Patients with waist circumference ≥90 cm, TG ≥150 mg/dL, HDL <40 mg/dL, FBS ≥100 mg/dL, and blood pressure ≥130/85 mmHg or who were taking hypertension medication were considered to have metabolic syndrome. The blood samples were processed using the Access2 immunoassay kit (Beckman Coulter, Brea, CA, USA). The serum samples were analyzed using calibrated Access tPSA and fPSA assays at a single laboratory. The prostate volume was determined using transrectal ultrasonography. A single skilled genitourinary pathologist who was blinded to the test results processed and evaluated the specimens. PCa was identified and graded according to the definitions of the 2005 consensus conference of the International Society of Urological Pathology. The PHI was calculated as [(p2PSA/fPSA)×√tPSA].
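The PHI formula above can be written as a short function. This is a minimal sketch, not the authors' code: the function name and the example values are hypothetical, and the units (p2PSA in pg/mL, fPSA and tPSA in ng/mL) are an assumption based on the usual reporting convention for these assays.

```python
import math


def prostate_health_index(p2psa_pg_ml: float, fpsa_ng_ml: float, tpsa_ng_ml: float) -> float:
    """Return PHI = (p2PSA / fPSA) * sqrt(tPSA), the formula given in the text.

    Units are an assumption: p2PSA in pg/mL, fPSA and tPSA in ng/mL.
    """
    if fpsa_ng_ml <= 0:
        raise ValueError("fPSA must be positive")
    if tpsa_ng_ml < 0:
        raise ValueError("tPSA must be non-negative")
    return (p2psa_pg_ml / fpsa_ng_ml) * math.sqrt(tpsa_ng_ml)


# Hypothetical patient: p2PSA 15 pg/mL, fPSA 1.0 ng/mL, tPSA 4.0 ng/mL.
example_phi = prostate_health_index(15.0, 1.0, 4.0)  # (15 / 1) * sqrt(4) = 30.0
```

Because tPSA enters through a square root, doubling tPSA raises PHI by a factor of about 1.41, whereas doubling the p2PSA/fPSA ratio doubles PHI.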
3. Data Preprocessing and Feature Selection
Among the 22 characteristics, features with very low importance were excluded (data not shown). Finally, 15 variables were selected to construct the prediction model, including 7 categorical variables (family history, prior biopsy results, finasteride, DRE nodules, metabolic disease, hypertension, and diabetes) and 8 continuous variables (age, AUA symptom score, height, weight, testosterone levels, PSA, PHI, and prostate volume). The categorical data were divided based on their presence or absence. All categorical features were preprocessed using one-hot encoding. Continuous variables were normalized using the robust method, which is resilient to outliers. Furthermore, missing values were imputed iteratively using the light gradient boosting machine (LightGBM) algorithm [15]. The data from Wonju Severance Christian Hospital used for external validation did not include PHI values; therefore, the remaining 14 clinical features were used in the external validation test. These data were preprocessed in the same manner as the data from Kangwon National University Hospital.
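The scaling and encoding steps can be illustrated as follows. This sketch assumes that "the robust method" means centering each continuous feature on its median and dividing by its interquartile range, as in common robust scalers; the text does not spell this out. The function names and the PSA values are hypothetical, and the LightGBM-based imputation step is not reproduced here.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    The median and IQR are barely affected by a few extreme values,
    which is why this scaling is described as resilient to outliers.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature cannot be robust-scaled")
    center = median(values)
    return [(v - center) / iqr for v in values]


def one_hot_presence(flags):
    """Encode presence/absence as two indicator columns: (absent, present)."""
    return [(0, 1) if present else (1, 0) for present in flags]


# Hypothetical PSA values (ng/mL) with one extreme outlier.
psa = [4.0, 5.0, 6.0, 7.0, 100.0]
scaled_psa = robust_scale(psa)  # median 6, IQR 2

# Hypothetical DRE nodule findings for three patients.
dre_encoded = one_hot_presence([True, False, True])
```

In this example the outlier maps to a large scaled value, but it does not compress the other four values toward zero, as scaling by the mean and standard deviation would.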
4. Model Development and Evaluation
The model employed in this study was based on a stacking framework designed to minimize bias and variance. The base learners consisted of LightGBM, random forest, and logistic regression, and based on the 3 base learners, the meta-learner was a logistic regression model as shown in Fig. 1 [15-17]. The hyperparameters of each base learner were selected through 10-fold cross-validation on the training patient data. Model evaluation was conducted by assessing the model's stability via 1,000 bootstrapping iterations [18]. Additionally, feature importance for each model variable was determined using the SHAP (SHapley Additive exPlanations) method [19]. The performance of the model was evaluated by comparing 4 metrics: the receiver operating characteristic (ROC) curve and its corresponding area under the curve (AUC), accuracy, sensitivity, and specificity.
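A stacking framework of this shape can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated by `make_classification`, not patient data), scikit-learn's `GradientBoostingClassifier` is swapped in for LightGBM so that the example has a single dependency, hyperparameters are left at simple fixed values rather than tuned by 10-fold cross-validation, and the internal 5-fold setting for generating meta-features is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 15-feature clinical table (not patient data).
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=5, random_state=0
)

# 8:2 split into training and test sets, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Three base learners; the gradient boosting model stands in for LightGBM.
base_learners = [
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# The meta-learner is a logistic regression fitted on the base learners'
# out-of-fold predicted probabilities.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, keeps it from simply trusting whichever base learner overfits the training set most.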
5. Statistical Analysis
For comparative analysis between the 2 groups (PCa vs. non-PCa), Student t-test was used for normally distributed continuous variables, and the Mann-Whitney U-test was used for categorical variables and nonnormally distributed continuous variables. The box plots for these analyses were constructed with the “ggplot2” package in R v4.3.1 (R Foundation for Statistical Computing, Vienna, Austria). The significance level for all tests was set at 5% (p<0.05). The bootstrap method was used to split the training set at an 8:2 ratio over 1,000 iterations to evaluate the model’s performance, and the model’s stability was assessed using the test dataset.
RESULTS
1. Patient Data
Nine hundred twenty-eight men underwent prostate biopsy owing to suspicion of PCa caused by tPSA over 3.5 ng/mL at Kangwon National University Hospital from May 2013 to May 2023. Out of the 928 patients, 377 (40.6%) were diagnosed with PCa (Table 1), and among them, 324 patients (86%) had aggressive cancer with a Gleason score (GS) ≥7 (Table 2). Among PCa patients, 189 patients (57.0%) had tPSA of 3.5–10 ng/mL, the so-called gray zone, and 10.8% of PCa patients had tPSA <4 ng/mL. In the case of Wonju Severance Christian Hospital, among the 385 patients, 117 (30.4%) were diagnosed with PCa, and among them, 222 (83%) had aggressive cancer with GS ≥7. Among the PCa patients, 117 patients (44%) had a tPSA of 3.5–10 ng/mL, which is referred to as the gray zone.
2. Baseline Patient Characteristics
The baseline characteristics of all the subjects are shown in Tables 1 and 2 and Fig. 2. Among the continuous variables (Fig. 2A), age, PSA, and PHI levels were significantly higher in the PCa group than in the non-PCa group. However, prostate volume, %fPSA, and the body mass index (BMI) were lower in the PCa group than in the non-PCa group. No significant differences were observed in the other continuous variables between the 2 groups. Among categorical variables, only DRE positivity was found to be significantly higher in the cancer group (Fig. 2B). These characteristics showed similar trends in the subgroups of GS <7 and GS ≥7 (Table 2). For more aggressive cancers (GS ≥7), age, PSA, and PHI were significantly higher, and DRE positivity was more frequent (p<0.05). However, the %fPSA did not differ significantly between the 2 subgroups.
3. Model Evaluation
The performance of the stacked model was rigorously evaluated using a bootstrap method with 1,000 replications to estimate 95% confidence intervals (CIs), ensuring the robustness and reliability of the model assessments. The AUC, accuracy, sensitivity, and specificity were the metrics used for evaluation (Table 3).
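How 1,000 bootstrap replications yield a 95% CI can be sketched as follows. The example uses the standard percentile bootstrap, which resamples cases with replacement; the resampling scheme used in this study (repeated 8:2 splits) may differ in detail. The function names, the seed, and the label vectors are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute
    the metric each time, and take the alpha/2 and 1 - alpha/2 percentiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical labels and predictions: 100 cases, 80 classified correctly.
labels = [1] * 50 + [0] * 50
predictions = [1] * 40 + [0] * 10 + [0] * 40 + [1] * 10

point_estimate = accuracy(labels, predictions)
ci_lower, ci_upper = bootstrap_ci(labels, predictions, accuracy)
```

The width of the interval reflects how much the metric would be expected to vary across cohorts of the same size, so a narrow interval is what supports a claim of stability.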
As a base learner, LightGBM achieved an AUC of 0.874, with 95% CI of 0.861–0.884, indicating a high degree of discriminative ability. The model also achieved an accuracy of 0.809 (95% CI, 0.792–0.825), sensitivity of 0.709 (95% CI, 0.654–0.764), and specificity of 0.877 (95% CI, 0.841–0.909), reflecting a balanced performance between identifying positive and negative classes. Random forest, another base learner, exhibited superior performance with an AUC of 0.907 (95% CI, 0.892–0.893), suggesting excellent classification capabilities. Accuracy was 0.836 (95% CI, 0.815–0.856), coupled with a sensitivity of 0.779 (95% CI, 0.724–0.827) and a specificity of 0.875 (95% CI, 0.837–0.909), underscoring the robustness of the model in handling diverse PCa patients. The logistic regression base learner showed an AUC of 0.844 (95% CI, 0.832–0.854), accuracy of 0.775 (95% CI, 0.756–0.794), sensitivity of 0.625 (95% CI, 0.565–0.681), and specificity of 0.878 (95% CI, 0.837–0.914). While the model presented the least discriminative power among the base learners, a high level of specificity was maintained.
The meta-learner, also a logistic-regression model, demonstrated an integrated assessment with an AUC of 0.903 (95% CI, 0.891–0.915). This high AUC value, along with an accuracy of 0.833 (95% CI, 0.815–0.850), sensitivity of 0.754 (95% CI, 0.701–0.804), and specificity of 0.887 (95% CI, 0.855–0.918), validated the effectiveness of the stacking approach in achieving a harmonized prediction by leveraging the strengths of individual base learners.
4. Diagnostic Performance of the Metamodel
The diagnostic performance of the metamodel for PCa detection in the test dataset is shown in Table 4, and the ROC curves are shown in Fig. 3. In terms of PCa detection, the metamodel achieved an AUC of 0.890, accuracy of 0.828, sensitivity of 0.711, and specificity of 0.909, indicating high diagnostic accuracy. Our model also demonstrated a superior AUC of 0.903 for high-grade PCa (GS ≥7), suggesting a more precise ability to predict clinically significant PCa.
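The four reported metrics can be computed from true labels and predicted probabilities as in the sketch below. The 0.5 probability cutoff is an assumption (the decision threshold is not stated in the text), and the function names and example values are hypothetical.

```python
def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a probability cutoff.

    The 0.5 default is an assumption; the study's cutoff is not stated.
    """
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # cancers correctly flagged
        "specificity": tn / (tn + fp),  # non-cancers correctly cleared
    }


def roc_auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical biopsy outcomes (1 = PCa) and predicted probabilities.
outcomes = [1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

metrics = threshold_metrics(outcomes, probabilities)
auc = roc_auc(outcomes, probabilities)
```

Unlike the AUC, which summarizes ranking quality across all cutoffs, the other three metrics depend on the chosen threshold: raising it trades sensitivity for specificity.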
5. Feature Importance
In our predictive model analysis, the SHAP method was employed to quantify the contribution of each feature to the model’s predictions. The resultant feature importance plots (Fig. 4) reveal the varying influences of features for each model, including logistic regression, LightGBM, and random forest. The feature importance ranking differed slightly between base learners. However, PHI, prostate volume, age, DRE nodule presence, and PSA were the top 5 important features of the final combined meta-learning model (Fig. 4C).
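The idea behind Shapley-based attribution can be shown with an exact computation on a toy model. This sketch is not the SHAP library used in the study: it enumerates all feature subsets directly, which is feasible only for a handful of features, and it treats a feature as "absent" by replacing it with a fixed reference value, which is a simplifying assumption. The scoring function, weights, and feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    A feature that is 'absent' from a coalition is set to its baseline
    value (a simplifying assumption for this sketch).
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


# Hypothetical linear risk score over three features.
weights = [0.5, -0.2, 0.1]


def linear_score(features):
    return sum(w * f for w, f in zip(weights, features))


patient = [4.0, 3.0, 7.0]
reference = [2.0, 5.0, 6.0]
contributions = shapley_values(linear_score, patient, reference)
```

For a linear score, each contribution reduces to the weight times the feature's deviation from the reference, and the contributions sum to the difference between the patient's score and the reference score. Averaging the absolute contributions over many patients gives a feature importance ranking of the kind shown in Fig. 4.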
6. External Validation
We verified the model performance on 385 data samples from the Wonju Severance Christian Hospital. However, as this hospital did not collect PHI values, 14 clinical features were included in the external validation test. The model achieved an AUC of 0.863 (Fig. 5A) and accuracy, sensitivity, and specificity of 0.82, 0.84, and 0.76, respectively. Prostate volume, PSA level, age, and DRE nodule presence were also ranked as important features for the final combined meta-learning model (Fig. 5B).
DISCUSSION
Presently, PSA is the most widely used and representative tumor marker for the screening and prognosis of PCa [5,20]. However, unnecessary biopsies frequently occur owing to the low specificity of PSA [7]. Unfortunately, there are no ideal biomarkers that can distinguish PCa from benign prostatic conditions and differentiate between aggressive and indolent cancers. Therefore, considerable effort is being made to develop novel noninvasive biomarkers with high sensitivity and specificity for PCa detection [21-23]. Several new biomarkers, such as the 4K score and PHI, have shown promising results in improving PCa risk assessment [24,25]. Our research team previously reported on the usefulness of PHI: when the PHI cutoff value was set to 22.9 in gray zone patients, the sensitivity for the detection of PCa was 90% and the specificity was 68.3% [26].
In this study, we developed an ML model that merged several ML algorithms based on the clinical information of 928 Korean men, which demonstrated excellent PCa prediction accuracy. The model also showed excellent predictive performance for aggressive PCa. The final verification of the model’s performance using separate test data showed excellent prediction performance, and the evaluation of 4 metrics (AUC, accuracy, sensitivity, and specificity) was within the 95% CI of the model constructed by bootstrapping 1,000 iterations using training data. Furthermore, we evaluated the performance of the model by using external cohort clinical data and achieved an AUC of 0.863. These results confirm the robustness and stability of the model. In our study, prostate volume, PSA level, age, and DRE nodule presence were the most significant characteristics influencing the PCa prediction model.
A few Asian studies have reported on ML prediction models using clinical information. Chen et al. [27] constructed 5 models using 4 algorithms for 551 Chinese men. The multivariate logistic regression model exhibited the best performance, demonstrating the limitations of using PSA alone as a predictor of PCa. Jeong et al. [28] developed a logistic regression model using the clinical variables of 3,482 Korean men. The predictive accuracy (AUC=0.81) showed higher benefits than other PCa risk calculators (Prostate Cancer Prevention Trial Risk Calculator; European Randomized Study of Screening for Prostate Cancer Risk Calculator).
Our study differs from existing studies in 3 main aspects.
First, our model was constructed using more clinical information of patients than existing studies.
Second, from a methodological perspective, most PCa prediction ML models to date have been developed with a single algorithm or used to compare single algorithms. In contrast, we adopted an ensemble method that combined multiple ML algorithms rather than a single model. Such methods improve on the accuracy and stability of a single model by training multiple models and combining their predictions [29]. The main premise of our model is that by combining multiple models, the error of a single model is more likely to be compensated by other models, resulting in better overall prediction performance for the ensemble model than for a single model.
Third, we also achieved high specificity, which can compensate for the shortcomings of PSA (Table 4). In the training dataset, the ML model achieved a specificity of 0.887, and high specificity (0.909) was maintained in the test dataset in both the overall population and high-grade PCa patients. High specificity indicates a low false-positive rate and allows better identification of patients who do not have cancer. This could lead to a substantial reduction in unnecessary biopsies.
To the best of our knowledge, we demonstrated the development of a PCa prediction ensemble ML model based on clinical data for the first time in a Korean cohort.
However, there are several limitations to our study. First, the sample size may be insufficient for the ML algorithms. The larger the dataset, the more likely it is that the model will learn common patterns and avoid overfitting or bias. Therefore, verification is needed through large-scale external cohort studies in the future to confirm the findings.
Second, data from the individual datasets were collected retrospectively. Although patient management was consistent, the computational estimation of missing values of clinical information should be analyzed more precisely and specifically.
Despite these limitations, our model achieved high accuracy in external validation using data from another institution, supporting the reliability and robustness of the results. Our findings provide valuable insights into the role of ML in the development of PCa diagnostics.
CONCLUSIONS
We can conclude from our study that an ML model based on clinical data could be an excellent option for future PCa diagnosis, and large-scale multi-cohort cross-studies are needed.
Notes
Grant/Fund Support
This work was supported by the Promotion of Innovative Businesses for Regulation-Free Special Zones funded by the Ministry of SMEs and Startups (MSS, Korea) (1425170909).
Research Ethics
The present study protocol was reviewed and approved by the Institutional Review Board of Kangwon National University Hospital (Reg. No. KNUH-2021-09-010-005). The study protocol and the use of patient data for recruitment and follow-up were approved before patient recruitment. Informed consent was obtained from all participants at enrollment.
Conflicts of Interest
The authors have nothing to disclose.
Author Contribution
Conceptualization: JHK; Provision of study materials and patients: JHK, HP, SWL, GS, JHJ, JKJ, EBC; Data curation: SC, JML, SHK, SEL; Formal analysis: SC, BS; Funding acquisition: JHK; Methodology: BS, SO; Project administration: JHK, SC, SEL; Writing - original draft: SC, BS; Writing - review & editing: JHK, SC.
Acknowledgements
The authors thank the Biobank of Kangwon University Hospital, a member of the Korea Biobank Network, for providing the biospecimens and data used in this study.