A Machine Learning Model for Prostate Cancer Prediction in Korean Men

Article information

J Urol Oncol. 2024;22(3):201-210
Publication date (electronic) : 2024 November 30
doi : https://doi.org/10.22465/juo.244800400020
1Department of Urology, Kangwon National University School of Medicine, Chuncheon, Korea
2LifeSemantics Inc., Seoul, Korea
3Department of Urology, National Medical Center, Seoul, Korea
4Department of Urology, Hanyang University College of Medicine, Seoul, Korea
5Department of Biomedical Research Institute and Biobank, Kangwon National University Hospital, Chuncheon, Korea
6Department of Precision Medicine and Urology, Yonsei University Wonju College of Medicine, Wonju, Korea
Corresponding author: Jeong Hyun Kim, Department of Urology, Kangwon National University School of Medicine, Chuncheon 24341, Korea. Email: urodr348@kangwon.ac.kr
Received 2024 May 20; Revised 2024 August 13; Accepted 2024 September 18.

Abstract

Purpose

Unnecessary prostate biopsies for detecting prostate cancer (PCa) should be minimized. Therefore, this study developed a machine learning (ML) model to predict PCa in Korean men and evaluated its usability.

Materials and Methods

We retrospectively analyzed clinical data from 928 patients who underwent prostate biopsies at Kangwon National University Hospital between May 2013 and May 2023. Of these, 377 (40.6%) were diagnosed with PCa, and 551 (59.4%) did not have cancer. For external validation, clinical data from 385 patients aged 48–89 years who underwent prostate biopsies from September 2005 to September 2023 at Wonju Severance Christian Hospital were also included. Twenty-two clinical features were used to develop an ML model to predict PCa. Features were selected based on their contributions to model performance, leading to the inclusion of 15 features. A meta-learner was constructed using logistic regression to predict the probability of PCa, and the classifier was trained and validated on randomly extracted training and test sets at an 8:2 ratio.

Results

The prostate health index, prostate volume, age, nodule on digital rectal examination, and prostate-specific antigen were the top 5 features for predicting PCa. The area under the receiver operating characteristic curve (AUC) of the meta-learner logistic regression model was 0.89, and the accuracy, sensitivity, and specificity were 0.828, 0.711, and 0.909, respectively. Our model also showed excellent prediction performance for high-grade PCa, with a Gleason score of 7 or higher and an AUC of 0.903. Furthermore, we evaluated the performance of the model using external cohort clinical data and achieved an AUC of 0.863.

Conclusions

Our ML model excelled in predicting PCa, specifically clinically significant PCa. Although extensive cross-validation in other clinical cohorts is needed, this ML model is a promising option for future diagnostics.

INTRODUCTION

In 2020, prostate cancer (PCa) was the most frequently diagnosed cancer in more than half of the countries of the world (112 of 185) and the fifth leading cause of cancer death among men [1]. According to the National Cancer Information Center, PCa was the third most frequently diagnosed cancer in Korean males in 2020. The incidence of PCa in Korea has increased significantly, while the mortality rate has remained stable [2]. The low mortality rate compared to the high incidence rate reflects advancements in treatment and earlier detection through increased screening [3,4].

Serum prostate-specific antigen (PSA) was introduced as a screening tool for PCa several decades ago and has been widely used as the most valuable diagnostic marker [5]. However, the usefulness of the PSA screening test remains controversial. PSA is prostate-specific, not cancer-specific, so PSA levels are affected by various factors, such as benign prostatic hyperplasia, prostatic inflammation, and ejaculation [6]; PSA therefore has low specificity. The positive predictive value of PSA was approximately 25% in a pooled meta-analysis [7], which implies many false-positive results: up to 3 of 4 patients undergo unnecessary prostate biopsies. Therefore, there is an unmet need for a new, convenient method to improve the diagnosis of PCa.

Recently, machine learning (ML) models have been actively studied in the medical field for diagnosis, assessment of patient morbidity or mortality risk, and treatment strategies [8-10]. In PCa, ML has been applied to image analysis tasks such as prostate segmentation [11] and the classification of pathological slides [12]. It has also been applied to high-level inference and prediction tasks such as PCa detection and characterization [13]. These studies applied MRI or CT image-based information to ML techniques to increase the accuracy of PCa lesion detection.

In this study, we developed an ML prediction model and evaluated its performance in predicting PCa occurrence. An ML model based on clinical information could help clinicians make decisions before prostate biopsy and may substantially reduce unnecessary biopsies, the use of medical resources, and patient suffering.

MATERIALS AND METHODS

1. Study Subjects

The data of 928 male patients who underwent transrectal ultrasound-guided prostate biopsy from May 2013 to May 2023 at Kangwon National University Hospital were retrospectively retrieved. The study population included men aged 40–95 years with total PSA (tPSA) ≥3.5 ng/mL who underwent prostate biopsy for suspected PCa. All patients underwent a systematic biopsy with 10–12 cores. Clinical information, including PSA and prostate health index (PHI), of these patients was collected. For external validation of the model, clinical data from 385 patients aged 48–89 years who underwent prostate biopsy from September 2005 to September 2023 at Wonju Severance Christian Hospital were also included.

2. Clinical Information

Clinical features considered to be highly or potentially related to PCa were selected. These included PSA, family history of PCa, abnormal digital rectal examination (DRE), and prior prostate biopsy results, which are the constituent factors of the Prostate Cancer Prevention Trial risk calculator [14]. In total, the clinical information comprised 22 characteristics, including age, height, weight, the American Urological Association (AUA) symptom score, hypertension, diabetes status, PHI, high-density lipoprotein (HDL), triglyceride (TG), fasting blood sugar (FBS), glycosylated hemoglobin, C-reactive protein, lactate dehydrogenase, and prostate volume. Prior to the prostate biopsy, blood was drawn to measure the prebiopsy tPSA, free PSA (fPSA), and [-2]proPSA (p2PSA) levels. Patients with waist circumference ≥90 cm, TG ≥150 mg/dL, HDL <40 mg/dL, FBS ≥100 mg/dL, and blood pressure ≥130/85 mmHg or who were taking hypertension medication were considered to have metabolic syndrome. The blood samples were processed using the Access2 immunoassay kit (Beckman Coulter, Brea, CA, USA). The serum samples were analyzed using calibrated Access tPSA and fPSA assays at a single laboratory. The prostate volume was determined using transrectal ultrasonography. A single skilled genitourinary pathologist who was blinded to the test results processed and evaluated the specimens. PCa was identified and graded according to the definitions of the 2005 consensus conference of the International Society of Urological Pathology. The PHI was calculated as (p2PSA/fPSA)×√tPSA.
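The PHI formula above can be written as a small function. This is a minimal sketch: the function and parameter names are illustrative, the units (p2PSA in pg/mL, fPSA and tPSA in ng/mL) are an assumption based on common assay reporting rather than something stated in the text, and the input values are made up.

```python
import math


def prostate_health_index(p2psa: float, fpsa: float, tpsa: float) -> float:
    """PHI = (p2PSA / fPSA) * sqrt(tPSA), as given in the text.

    Assumed units (not stated in the paper): p2PSA in pg/mL,
    fPSA and tPSA in ng/mL.
    """
    if fpsa <= 0 or tpsa < 0:
        raise ValueError("fPSA must be positive and tPSA non-negative")
    return (p2psa / fpsa) * math.sqrt(tpsa)


# Illustrative values only, not taken from the study cohort.
example_phi = prostate_health_index(p2psa=15.0, fpsa=1.0, tpsa=4.0)
print(round(example_phi, 1))  # 30.0
```

Because tPSA enters through a square root, doubling tPSA raises PHI by a factor of about 1.41, whereas doubling the p2PSA/fPSA ratio doubles PHI.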

3. Data Preprocessing and Feature Selection

Among the 22 characteristics, features with very low importance were excluded (data not shown). Finally, 15 variables were selected to construct the prediction model, including 7 categorical variables (family history, prior biopsy results, finasteride, DRE nodules, metabolic disease, hypertension, and diabetes) and 8 continuous variables (age, AUA symptom score, height, weight, testosterone levels, PSA, PHI, and prostate volume). Categorical variables were coded as present or absent, and all categorical features were preprocessed using one-hot encoding. Continuous variables were normalized using a robust scaling method, which is resilient to outliers. Furthermore, missing values were imputed iteratively using the light gradient boosting machine (LightGBM) algorithm [15]. The data from Wonju Severance Christian Hospital used for external validation did not include PHI values; therefore, the external validation test used the remaining 14 clinical features. These data were preprocessed in the same manner as the Kangwon National University Hospital data.
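The two simplest preprocessing steps described above can be sketched in plain Python. The paper does not name the implementation, so this assumes that the "robust method" means centering on the median and dividing by the interquartile range; the helper names and the input values are hypothetical, and the LightGBM-based iterative imputation is not shown.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature cannot be robust-scaled")
    center = median(values)
    return [(v - center) / iqr for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector with one position per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical tPSA values (ng/mL) with one extreme outlier.
tpsa = [4.0, 5.0, 6.0, 7.0, 395.0]
scaled_tpsa = robust_scale(tpsa)
print(scaled_tpsa)  # [-1.0, -0.5, 0.0, 0.5, 194.5]

# Hypothetical DRE findings for three patients.
dre_nodule = ["absent", "present", "absent"]
encoded_dre = one_hot(dre_nodule, ["absent", "present"])
print(encoded_dre)  # [[1, 0], [0, 1], [1, 0]]
```

The outlier stays large after scaling, but it does not compress the other four values, because the median and IQR are barely affected by a single extreme observation. Min-max scaling of the same data would push the first four values into a narrow band near zero.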

4. Model Development and Evaluation

The model employed in this study was based on a stacking framework designed to minimize bias and variance. The base learners were LightGBM, random forest, and logistic regression, and a logistic regression meta-learner combined the predictions of the 3 base learners, as shown in Fig. 1 [15-17]. The hyperparameters of each base learner were selected through 10-fold cross-validation on the training patient data. The stability of the model was assessed using 1,000 bootstrapping iterations [18]. Additionally, feature importance for each model variable was determined using the SHAP (SHapley Additive exPlanations) method [19]. The performance of the model was evaluated using 4 metrics: the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity.

Fig. 1.

Machine learning model based on a stacked model structure. Logistic regression, light gradient boosting machine, and random forest were used as base learners. The final prediction was made using the meta-learner logistic regression model based on each prediction result. PCa, prostate cancer.
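The final step of the stacked structure in Fig. 1 can be illustrated as follows. This is only a sketch of how a logistic regression meta-learner turns the three base-learner probabilities into one probability: the coefficients, intercept, and patient values are hypothetical, since the fitted values are not reported, and the fitting of the base learners and the meta-learner is omitted.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def meta_learner_probability(base_probs, weights, intercept):
    """Logistic-regression meta-learner over base-learner probabilities."""
    z = intercept + sum(w * p for w, p in zip(weights, base_probs))
    return sigmoid(z)


# Hypothetical coefficients; the fitted values are not reported in the paper.
WEIGHTS = {"lightgbm": 2.0, "random_forest": 3.0, "logistic_regression": 1.0}
INTERCEPT = -3.0

# Hypothetical base-learner outputs (probability of PCa) for one patient.
patient = {"lightgbm": 0.70, "random_forest": 0.80, "logistic_regression": 0.55}

names = list(WEIGHTS)
p_final = meta_learner_probability(
    [patient[n] for n in names], [WEIGHTS[n] for n in names], INTERCEPT
)
print(round(p_final, 3))  # 0.794
```

With these assumed weights the random forest output influences the final probability most, which is the kind of per-algorithm impact that Fig. 4B reports for the actual model.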

5. Statistical Analysis

For comparative analysis between the 2 groups (PCa vs. non-PCa), Student t-test was used for normally distributed continuous variables, and the Mann-Whitney U-test was used for categorical variables and nonnormally distributed continuous variables. The box plots for these analyses were constructed with the “ggplot2” package in R v4.3.1 (R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at 5% (p<0.05). The bootstrap method was employed to split the training set at an 8:2 ratio over 1,000 iterations to evaluate the model’s performance, and the model’s stability was assessed using the test dataset.

RESULTS

1. Patient Data

Nine hundred twenty-eight men underwent prostate biopsy at Kangwon National University Hospital from May 2013 to May 2023 owing to suspicion of PCa based on a tPSA over 3.5 ng/mL. Of the 928 patients, 377 (40.6%) were diagnosed with PCa (Table 1), and among them, 324 patients (86%) had aggressive cancer with a Gleason score (GS) ≥7 (Table 2). Among PCa patients, 189 (57.0%) had a tPSA of 3.5–10 ng/mL, the so-called gray zone, and 10.8% had a tPSA <4 ng/mL. At Wonju Severance Christian Hospital, among the 385 patients, 117 (30.4%) were diagnosed with PCa, and among them, 222 (83%) had aggressive cancer with GS ≥7. Among the PCa patients, 117 (44%) had a tPSA of 3.5–10 ng/mL, which is referred to as the gray zone.

Demographic and clinical characteristics of the study subjects

Comparison between PCa patients with a GS less than 7 and more aggressive disease

2. Baseline Patient Characteristics

The baseline characteristics of all the subjects are shown in Tables 1 and 2 and Fig. 2. Among the continuous variables (Fig. 2A), age, PSA, and PHI levels were significantly higher in the PCa group than in the non-PCa group, whereas prostate volume, %fPSA, and body mass index (BMI) were lower in the PCa group. No significant differences were observed in the other continuous variables between the 2 groups. Among categorical variables, only DRE positivity was significantly more frequent in the cancer group (Fig. 2B). These characteristics showed similar trends in the subgroups of GS <7 and GS ≥7 (Table 2). For more aggressive cancers (GS ≥7), age, PSA, and PHI were significantly higher, and DRE positivity was more frequent (p<0.05). However, %fPSA did not differ significantly between the 2 subgroups.

Fig. 2.

Characteristics of total subjects. Student t-test was used for normally distributed continuous variables (A), and the Mann-Whitney U-test was used for categorical variables and nonnormally distributed continuous variables (B). PCa, prostate cancer; tPSA, total prostate-specific antigen; fPSA, free prostate-specific antigen; PHI, prostate health index; BMI, body mass index; AUA, American Urological Association; circ., circumference; ns, not significant; Hx, history; p-bx, prostate biopsy; DRE, digital rectal examination. *p<0.05. ***p<0.001.

3. Model Evaluation

The performance of the stacked model was rigorously evaluated using a bootstrap method with 1,000 replications to estimate 95% confidence intervals (CIs), ensuring the robustness and reliability of the model assessments. The AUC, accuracy, sensitivity, and specificity were the metrics used for evaluation (Table 3).
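One common way to obtain such intervals is the percentile bootstrap, sketched below for accuracy. The exact resampling scheme used in the study is not specified, so this is an assumed procedure (patients resampled with replacement, 2.5th and 97.5th percentiles taken as the interval); the labels and predictions are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = int(n_boot * (1 - alpha / 2)) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical labels and predictions for 100 patients (85 correct).
y_true = [1] * 40 + [0] * 60
y_pred = [1] * 32 + [0] * 8 + [0] * 53 + [1] * 7

point = accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

The same function can be reused for sensitivity, specificity, or AUC by passing a different `metric` callable with the same two-argument signature.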

Model evaluation using the training dataset

As a base learner, LightGBM achieved an AUC of 0.874, with 95% CI of 0.861–0.884, indicating a high degree of discriminative ability. The model also achieved an accuracy of 0.809 (95% CI, 0.792–0.825), sensitivity of 0.709 (95% CI, 0.654–0.764), and specificity of 0.877 (95% CI, 0.841–0.909), reflecting a balanced performance between identifying positive and negative classes. Random forest, another base learner, exhibited superior performance with an AUC of 0.907 (95% CI, 0.892–0.893), suggesting excellent classification capabilities. Accuracy was 0.836 (95% CI, 0.815–0.856), coupled with a sensitivity of 0.779 (95% CI, 0.724–0.827) and a specificity of 0.875 (95% CI, 0.837–0.909), underscoring the robustness of the model in handling diverse PCa patients. The logistic regression base learner showed an AUC of 0.844 (95% CI, 0.832–0.854), accuracy of 0.775 (95% CI, 0.756–0.794), sensitivity of 0.625 (95% CI, 0.565–0.681), and specificity of 0.878 (95% CI, 0.837–0.914). While the model presented the least discriminative power among the base learners, a high level of specificity was maintained.

The meta-learner, also a logistic-regression model, demonstrated an integrated assessment with an AUC of 0.903 (95% CI, 0.891–0.915). This high AUC value, along with an accuracy of 0.833 (95% CI, 0.815–0.850), sensitivity of 0.754 (95% CI, 0.701–0.804), and specificity of 0.887 (95% CI, 0.855–0.918), validated the effectiveness of the stacking approach in achieving a harmonized prediction by leveraging the strengths of individual base learners.

4. Diagnostic Performance of the Metamodel

The diagnostic performance of the metamodel for PCa detection in the test dataset is shown in Table 4, and the ROC curves are shown in Fig. 3. For PCa detection, the metamodel achieved an AUC of 0.890, accuracy of 0.828, sensitivity of 0.711, and specificity of 0.909, indicating high diagnostic accuracy. The model achieved a higher AUC of 0.903 for high-grade PCa (GS ≥7), suggesting a more precise ability to predict clinically significant PCa.

Metamodel’s diagnostic performance of PCa using test dataset

Fig. 3.

Performance of the machine learning model. The blue line is the receiver operating characteristic (ROC) curve predicting overall prostate cancer (PCa) patients, and the red line is the predictive performance for patients with a Gleason score of 7 or higher.
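The accuracy, sensitivity, and specificity reported for the test dataset are all ratios of confusion-matrix counts. The sketch below shows the arithmetic. The counts are hypothetical: they assume a 186-patient test set (about 20% of 928) and were chosen only so that the ratios round to the reported values; the confusion matrix itself is not reported in the paper.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of cancers correctly flagged
        "specificity": tn / (tn + fp),  # share of non-cancers correctly cleared
    }


# Hypothetical counts, NOT reported in the paper; picked so the ratios
# round to the published test-set figures.
metrics = diagnostic_metrics(tp=54, fn=22, tn=100, fp=10)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
# {'accuracy': 0.828, 'sensitivity': 0.711, 'specificity': 0.909}
```

Under these assumed counts, a specificity of 0.909 corresponds to 10 of 110 cancer-free men being flagged as positive, that is, a false-positive rate of about 9%.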

5. Feature Importance

In our predictive model analysis, the SHAP method was employed to quantify the impact of each feature on the output of the model. The resultant feature importance plots (Fig. 4) reveal the varying influences of features for each model, including logistic regression, LightGBM, and random forest. The feature importance ranking differs slightly between base learners. However, PHI, prostate volume, age, DRE nodule presence, and PSA were the top 5 important features of the final combined meta-learning model (Fig. 4C).

Fig. 4.

Model feature importance plots generated by the SHAP (SHapley Additive exPlanations) method. (A) Ranking of feature importance for each base learner. (B) Impact assessment of each algorithm on the metamodel. (C) Final combined feature importance for the meta-learner logistic regression model. PHI, prostate health index; tPSA, total prostate-specific antigen; AUA, American Urological Association; DRE, digital rectal examination; PCa, prostate cancer.
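SHAP attributions are based on Shapley values, which can be computed exactly for a very small model by averaging each feature's marginal contribution over all feature orderings. The sketch below does this for a hypothetical three-feature linear risk score; the score, its coefficients, the patient, and the baseline are all made up for illustration and are not the study's model. Practical SHAP implementations rely on approximations or model-specific algorithms, because exact enumeration grows factorially with the number of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings. Features not yet added keep their baseline value."""
    features = list(x)
    contrib = {f: 0.0 for f in features}
    for order in permutations(features):
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = x[f]
            new = model(current)
            contrib[f] += new - prev
            prev = new
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in contrib.items()}


# Hypothetical toy risk score; not the study's fitted model.
def toy_score(v):
    return 0.03 * v["PHI"] - 0.02 * v["volume"] + 0.01 * v["age"]


patient = {"PHI": 60.0, "volume": 40.0, "age": 75.0}
cohort_mean = {"PHI": 40.0, "volume": 50.0, "age": 70.0}

attributions = shapley_values(toy_score, patient, cohort_mean)
print({k: round(v, 2) for k, v in attributions.items()})
# {'PHI': 0.6, 'volume': 0.2, 'age': 0.05}
```

The attributions sum to the difference between the patient's score and the baseline score (0.85 here), which is the additivity property that makes the per-feature bars in a SHAP plot interpretable.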

6. External Validation

We verified the model performance on 385 data samples from the Wonju Severance Christian Hospital. However, as this hospital did not collect PHI values, 14 clinical features were included in the external validation test. The model achieved an AUC of 0.863 (Fig. 5A) and accuracy, sensitivity, and specificity of 0.82, 0.84, and 0.76, respectively. Prostate volume, PSA level, age, and DRE nodule presence were also ranked as important features for the final combined meta-learning model (Fig. 5B).

Fig. 5.

External validation of the model. (A) Receiver operating characteristic (ROC) plot based on the external validation set. (B) Ranking of feature importance. tPSA, total prostate-specific antigen.
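An AUC such as the 0.863 reported for the external cohort can be read as the probability that a randomly chosen PCa patient receives a higher predicted score than a randomly chosen non-PCa patient. The sketch below computes AUC directly from that pairwise definition; the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied pairs count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six patients (1 = PCa).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, probs)
print(round(auc, 3))  # 0.889
```

Eight of the nine positive-negative pairs are ordered correctly here; the only misordered pair is the PCa patient scored 0.4 against the non-PCa patient scored 0.6. The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients.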

DISCUSSION

Presently, PSA is the most widely used and representative tumor marker for the screening and prognosis of PCa [5,20]. However, its low specificity frequently leads to unnecessary biopsies [7]. Unfortunately, there is no ideal biomarker that can distinguish PCa from benign prostatic conditions and differentiate between aggressive and indolent cancers. Therefore, considerable effort is being made to develop novel noninvasive biomarkers with high sensitivity and specificity for PCa detection [21-23]. Several new biomarkers, such as the 4K score and PHI, have shown promising results in improving PCa risk assessment [24,25]. Our research team previously reported on the usefulness of PHI: with a PHI cutoff value of 22.9 in gray zone patients, the sensitivity for the detection of PCa was 90% and the specificity was 68.3% [26].

In this study, we developed an ML model that merged several ML algorithms based on the clinical information of 928 Korean men, and it demonstrated excellent PCa prediction accuracy. The model also showed excellent predictive performance for aggressive PCa. In the final verification using separate test data, the accuracy, sensitivity, and specificity were within the 95% CIs estimated from 1,000 bootstrap iterations on the training data, and the AUC (0.890) was just below the lower bound of its interval (0.891–0.915). Furthermore, we evaluated the performance of the model by using external cohort clinical data and achieved an AUC of 0.863. These results support the robustness and stability of the model. In our study, prostate volume, PSA level, age, and DRE nodule presence were the most significant characteristics influencing the PCa prediction model.

A few Asian studies have reported on ML prediction models using clinical information. Chen et al. [27] constructed 5 models using 4 algorithms for 551 Chinese men. The multivariate logistic regression model exhibited the best performance, demonstrating the limitations of using PSA alone as a predictor of PCa. Jeong et al. [28] developed a logistic regression model using the clinical variables of 3,482 Korean men. Its predictive accuracy (AUC=0.81) was higher than that of other PCa risk calculators (Prostate Cancer Prevention Trial Risk Calculator; European Randomized Study of Screening for Prostate Cancer Risk Calculator).

Our study differs from existing studies in 3 main aspects.

First, our model was constructed using more clinical information of patients than existing studies.

Second, from a methodological perspective, most PCa prediction ML models to date have been developed with a single algorithm or used to compare single algorithms. In contrast, we adopted an ensemble method that combined multiple ML algorithms. Such methods improve on the accuracy and stability of a single model by training multiple models and combining their predictions [29]. The main premise of our model is that, when multiple models are combined, the error of a single model is more likely to be compensated by the other models, resulting in better overall prediction performance for the ensemble model than for a single model.

Third, we also achieved high specificity, which can compensate for the shortcomings of PSA (Table 4). In the training dataset, the ML model achieved a specificity of 0.887, and specificity remained high in the test dataset (0.909) for both overall PCa and high-grade PCa. High specificity indicates a low false-positive rate and allows better identification of patients who do not have cancer. This approach might lead to a substantial reduction in unnecessary biopsies.

To the best of our knowledge, we demonstrated the development of a PCa prediction ensemble ML model based on clinical data for the first time in a Korean cohort.

However, there are several limitations to our study. First, the sample size may not be large enough for the ML algorithm. The larger the dataset, the more likely it is that the model will learn common patterns and minimize overfitting or bias. Therefore, verification is needed through large-scale external cohort studies in the future to confirm the findings.

Second, data from the individual datasets were collected retrospectively. Although patient management was consistent, the computational estimation of missing values of clinical information should be analyzed more precisely and specifically.

Despite these limitations, our model achieved high accuracy in external validation using data from another institution, supporting the reliability and robustness of the results. Our findings provide valuable insights into the role of ML in the development of PCa diagnostics.

CONCLUSIONS

We can conclude from our study that an ML model based on clinical data could be an excellent option for future PCa diagnosis, and large-scale multi-cohort cross-studies are needed.

Notes

Grant/Fund Support

This work was supported by the Promotion of Innovative Businesses for Regulation-Free Special Zones funded by the Ministry of SMEs and Startups (MSS, Korea) (1425170909).

Research Ethics

The present study protocol was reviewed and approved by the Institutional Review Board of Kangwon National University Hospital (Reg. No. KNUH-2021-09-010-005). The study protocol and the use of patient data for recruitment and follow-up were approved before patient recruitment. Informed consent was submitted by all participants when they were enrolled.

Conflicts of Interest

The authors have nothing to disclose.

Author Contribution

Conceptualization: JHK; Provision of study materials and patients: JHK, HP, SWL, GS, JHJ, JKJ, EBC; Data curation: SC, JML, SHK, SEL; Formal analysis: SC, BS; Funding acquisition: JHK; Methodology: BS, SO; Project administration: JHK, SC, SEL; Writing - original draft: SC, BS; Writing - review & editing: JHK, SC.

Acknowledgements

The authors thank the Biobank of Kangwon University Hospital, a member of the Korea Biobank Network, for providing the biospecimens and data used in this study.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49.
2. Hong Y, Lee S, Moon S, Sung S, Lim W, Kim K, et al. Projection of cancer incidence and mortality from 2020 to 2035 in the Korean population aged 20 years and older. J Prev Med Pub Health 2022;55:529–38.
3. Etzioni R, Tsodikov A, Mariotto A, Szabo A, Falcon S, Wegelin J, et al. Quantifying the role of PSA screening in the US prostate cancer mortality decline. Cancer Causes Control 2008;19:175–81.
4. Tsodikov A, Gulati R, Heijnsdijk EAM, Pinsky PF, Moss SM, Qiu S, et al. Reconciling the effects of screening on prostate cancer mortality in the ERSPC and PLCO trials. Ann Intern Med 2017;167:449–55.
5. Oesterling JE. Prostate specific antigen: a critical assessment of the most useful tumor marker for adenocarcinoma of the prostate. J Urol 1991;145:907–23.
6. Hoffman RM. Screening for prostate cancer. N Engl J Med 2011;365:2013–9.
7. Mistry K, Cable G. Meta-analysis of prostate-specific antigen and digital rectal examination as screening tests for prostate carcinoma. J Am Board Fam Pract 2003;16:95–101.
8. Lopes UK, Valiati JF. Pretrained convolutional neural networks as feature extractors for tuberculosis detection. Comput Biol Med 2017;89:135–43.
9. Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med 2018;24:1716–20.
10. Schwalbe N, Wahl B. Artificial intelligence and the future of global health. Lancet 2020;395:1579–86.
11. Peng T, Tang C, Wu Y, Cai J. Semi-automatic prostate segmentation from ultrasound images using machine learning and principal curve based on interpretable mathematical model expression. Front Oncol 2022;12:878104.
12. Park HG, Bhattacharjee S, Deekshitha P, Kim CH, Choi HK. A study on deep learning binary classification of prostate pathological images using multiple image enhancement techniques. J Korea Multimed Soc 2020;23:539–48.
13. Litjens G, Debats O, Barentsz J, Karssemeijer N, Huisman H. Computer-aided detection of prostate cancer in MRI. IEEE Trans Med Imaging 2014;33:1083–92.
14. Ankerst DP, Hoefler J, Bock S, Goodman PJ, Vickers A, Hernandez J, et al. Prostate Cancer Prevention Trial risk calculator 2.0 for the prediction of low- vs high-grade prostate cancer. Urology 2014;83:1362–8.
15. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. In: NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems; Red Hook (NY): Curran Associates Inc.; 2017. p. 3149-57.
16. Rigatti SJ. Random forest. J Insur Med 2017;47:31–9.
17. LaValley MP. Logistic regression. Circulation 2008;117:2395–9.
18. Didona D, Romano P. Using analytical models to bootstrap machine learning performance predictors. In: 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS); 2015 Dec 14-17; Melbourne, VIC, Australia. 2015. p. 405-13.
19. Ekanayake IU, Meddage DPP, Rathnayake U. A novel approach to explain the blackbox nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud Constr Mater 2022;16:e01059.
20. Saini S. PSA and beyond: alternative prostate cancer biomarkers. Cell Oncol 2016;39:97–106.
21. Sartori DA, Chan DW. Biomarkers in prostate cancer: what’s new? Curr Opin Oncol 2014;26:259–64.
22. Lazzeri M, Abrate A, Lughezzani G, Gadda GM, Freschi M, Mistretta F, et al. Relationship of chronic histologic prostatic inflammation in biopsy specimens with serum isoform [-2] proPSA (p2PSA), %p2PSA, and prostate health index in men with a total prostate-specific antigen of 4-10 ng/mL and normal digital rectal examination. Urology 2014;83:606–12.
23. Hatakeyama S, Yoneyama T, Tobisawa Y, Ohyama C. Recent progress and perspectives on prostate cancer biomarkers. Int J Clin Oncol 2017;22:214–21.
24. Konety B, Zappala SM, Parekh DJ, Osterhout D, Schock J, Chudler RM, et al. The 4Kscore® test reduces prostate biopsy rates in community and academic urology practices. Rev Urol 2015;17:231–40.
25. Carroll PR, Parsons JK, Andriole G, Bahnson RR, Castle EP, Catalona WJ, et al. NCCN guidelines insights: prostate cancer early detection, version 2.2016. J Natl Compr Canc Netw 2016;14:509–19.
26. Park H, Lee SW, Song G, Kang TW, Jung JH, Chung HC, et al. Diagnostic performance of %[-2]proPSA and prostate health index for prostate cancer: prospective, multi-institutional study. J Korean Med Sci 2018;33:e94.
27. Chen S, Jian T, Chi C, Liang Y, Liang X, Yu Y, et al. Machine learning-based models enhance the prediction of prostate cancer. Front Oncol 2022;12:941349.
28. Jeong CW, Lee S, Jung JW, Lee BK, Jeong SJ, Hong SK, et al. Mobile application-based Seoul National University Prostate Cancer Risk Calculator: development, validation, and comparative analysis with two western risk calculators in Korean men. PLoS One 2014;9:e94441.
29. Sagi O, Rokach L. Ensemble learning: a survey. WIREs Data Min Knowl Discov 2018;8:e1249.

Table 1.

Demographic and clinical characteristics of the study subjects

| Variable | Total | Non-PCa | PCa | p-value |
|---|---|---|---|---|
| No. of patients | 928 (100) | 551 (59.4) | 377 (40.6) | - |
| Age (yr) | 71.0±9.0 | 69.0±9.1 | 73.0±8.1 | <0.001 |
| BMI (kg/m²) | 24.4±3.2 | 24.6±3.1 | 24.1±3.2 | <0.05 |
| AUA symptom score | 13.5±8.0 | 13.8±8.1 | 13.0±7.9 | 0.205 |
| Waist circumference | 83.5±0.4 | 83.8±5.9 | 83.2±5.5 | 0.437 |
| Testosterone | 4.0±1.5 | 4.0±1.4 | 4.0±1.6 | 0.466 |
| PHI (median, range) | 38.3 (3.2–588.1) | 32.9 (3.2–137.5) | 59.2 (11.0–588.1) | <0.001 |
| tPSA (ng/mL) | 6.1 (3.5–395.0) | 5.3 (3.5–42.6) | 8.2 (3.5–395.0) | <0.001 |
| PV (cm³) | 49 (10–149) | 54 (10–149) | 42 (13–145) | <0.001 |
| %fPSA | 0.17 (0.01–0.82) | 0.2 (0.01–0.82) | 0.13 (0.01–0.51) | <0.001 |
| DRE nodule (+) | 115 (12.4) | 22 (4.0) | 93 (24.7) | <0.001 |
| Family history | 8 (0.86) | 3 (0.54) | 5 (1.33) | 0.184 |
| Prior prostate biopsy | 131 (14.1) | 95 (17.2) | 36 (9.5) | <0.05 |
| Finasteride medication | 39 (4.2) | 23 (4.2) | 16 (4.2) | 0.958 |
| Metabolic syndrome | | | | 0.762 |
| Metabolic syndrome: Yes | 273 (29.4) | 166 (30.1) | 107 (28.4) | - |
| Metabolic syndrome: No | 551 (59.4) | 326 (59.2) | 225 (59.7) | - |
| Metabolic syndrome: Unknown | 104 (11.2) | 59 (10.7) | 45 (11.9) | - |
| Diabetes | 193 (20.8) | 106 (19.2) | 87 (23.1) | 0.157 |
| Hypertension | 608 (65.5) | 364 (66.1) | 244 (64.7) | 0.849 |

Values are presented as number (%), mean±standard deviation, or median (range).

PCa, prostate cancer; BMI, body mass index; AUA, American Urological Association; PHI, prostate health index; tPSA, total prostate-specific antigen; PV, prostate volume; fPSA, free prostate-specific antigen; DRE, digital rectal examination.

Finasteride medication: patients who had taken finasteride for more than 6 months.

Table 2.

Comparison between PCa patients with a GS less than 7 and more aggressive disease

| Variable | GS <7 | GS ≥7 | p-value |
|---|---|---|---|
| No. of patients | 53 (14) | 324 (86) | - |
| Age (yr) | 70.0±8.9 | 74.0±7.9 | <0.01 |
| tPSA (ng/mL) | 5.1 (3.7–62.2) | 9.3 (3.5–395.0) | <0.001 |
| %fPSA | 0.15 (0.06–0.45) | 0.13 (0.01–0.51) | 0.301 |
| PHI | 41.8 (12.8–71.8) | 63.62 (20.2–588.1) | <0.001 |
| PV (cm³) | 42 (16–102) | 42 (13–145) | 0.478 |
| DRE nodule (+) | 7 (13.2) | 92 (28.4) | <0.05 |

Values are presented as number (%), mean±standard deviation, or median (range).

PCa, prostate cancer; GS, Gleason score; tPSA, total prostate-specific antigen; fPSA, free prostate-specific antigen; PHI, prostate health index; PV, prostate volume; DRE, digital rectal examination.

Table 3.

Model evaluation using the training dataset

| Learner | Model | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|---|
| Base learner | LightGBM | 0.874 (0.861–0.884) | 0.809 (0.792–0.825) | 0.709 (0.654–0.764) | 0.877 (0.841–0.909) |
| Base learner | Random forest | 0.907 (0.892–0.893) | 0.836 (0.815–0.856) | 0.779 (0.724–0.827) | 0.875 (0.837–0.909) |
| Base learner | Logistic regression | 0.844 (0.832–0.854) | 0.775 (0.756–0.794) | 0.625 (0.565–0.681) | 0.878 (0.837–0.914) |
| Meta-learner | Logistic regression | 0.903 (0.891–0.915) | 0.833 (0.815–0.850) | 0.754 (0.701–0.804) | 0.887 (0.855–0.918) |

AUC, area under the receiver operating characteristic curve; CI, confidence interval.

Table 4.

Metamodel’s diagnostic performance of PCa using test dataset

| Variable | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| Overall PCa | 0.890 | 0.828 | 0.711 | 0.909 |
| High-grade PCa | 0.903 | 0.842 | 0.731 | 0.909 |

PCa, prostate cancer; AUC, area under the receiver operating characteristic curve.