Construction of a Retrospective Cohort to Observe 10-Year Urologic Cancer Treatment Trends at the Biggest Medical Center of South Korea
Abstract
Purpose
To construct a urologic cancer database using a standardized, reproducible method, and to assess preliminary characteristics of this cohort.
Materials and Methods
Patients with prostate, bladder, and kidney cancers who were enrolled with diagnostic codes in the electronic medical record (EMR) at Asan Medical Center from 2007–2016 were included. Research Electronic Data Capture (REDCap) was used to design the Asan Medical Center-Urologic Cancer Database (AMC-UCD). The process included developing a data dictionary, applying branching logic, mapping clinical data warehouse structures, alpha testing, clinical record summary testing, creating a “standard of procedure” document, importing data, and entering data. Descriptive statistics were used to identify rates of surgeries and numbers of patients.
Results
Clinical variables (n=407) were selected to develop a data dictionary from REDCap. In total, 20,198 urologic cancer patients visited our institution from 2007–2016 (bladder cancer, 4,616; kidney cancer, 5,750; prostate cancer, 10,330). The overall numbers of patients and surgeries increased over time, with robotic surgeries growing rapidly over the decade. The most common treatment for urologic cancer was surgery, followed by chemotherapy and radiation therapy.
Conclusions
Using a standardized method, the AMC-UCD fosters multidisciplinary research. This constructed database provides access to clinical statistics to effectively assist research. Preliminary data should be refined through EMR chart review. The successful organization of data from 2007–2016 provides a framework for future periods of investigation and prospective models.
INTRODUCTION
With the widespread adoption of electronic medical record (EMR) systems, many medical institutions began storing clinical data in databases in the late 1990s and 2000s.1–3 The transition away from paper has accelerated data storage, and large quantities of clinical data have accumulated as a result.4 Recently, attention to the secondary use of these clinical data to improve clinical care has grown.5–8
The fast-growing quantity of clinical data makes reused clinical data a candidate source for “big data.”4 Big data enables researchers to easily explore data, generate research questions, and determine study feasibility.9 In the field of urology, there is a desire to use past data to guide clinical decisions for the future.10
With emerging evidence of the benefits of multidisciplinary research in cancer care,11 numerous multidisciplinary studies are being conducted in various departments on one topic, such as urological cancer.12,13 However, researchers still use databases that differ between project units, researchers, or departments. This practice increases the risk of information leakage during multidisciplinary studies with collaborators.14 Further, the resulting data are redundant and contain many missing values because they are extracted manually, with a higher likelihood of human error.15
A disease group-specific clinical database may help compensate for EMR limitations and provide a more readily accessible means of research.16 Such a database can be made available to researchers in the various departments that study the same disease group, such as urologic cancer. This study aims to (1) construct the Asan Medical Center-Urologic Cancer Database (AMC-UCD) using a standardized and reproducible method, and (2) identify preliminary characteristics of this cohort.
MATERIALS AND METHODS
The administrative procedures for registry planning took place from 2016–2018. We developed a retrospective cohort using Research Electronic Data Capture (REDCap) from July 2018–December 2018 with the methods shown in Fig. 1. The cohort included patients with prostate cancer (C61), bladder cancer (C67 and D09), and kidney cancer (C64). Only patients enrolled with diagnostic codes at Asan Medical Center between 2007 and 2016 were included.
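The inclusion rule above can be sketched as a simple filter. This is a minimal illustration rather than the extraction logic actually used: the record layout (`diagnosis_date`, `codes`) and the function names are assumptions, and only the code prefixes and the 2007–2016 window come from the text.

```python
from datetime import date

# Diagnostic code prefixes named in the text, grouped by cancer type.
CANCER_PREFIXES = {
    "prostate": ("C61",),
    "bladder": ("C67", "D09"),
    "kidney": ("C64",),
}


def cancer_types(codes):
    """Return the set of cancer groups matched by a patient's diagnostic codes."""
    found = set()
    for code in codes:
        normalized = code.strip().upper()
        for cancer, prefixes in CANCER_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if normalized.startswith(prefixes):
                found.add(cancer)
    return found


def in_cohort(record):
    """Hypothetical record: {'diagnosis_date': date, 'codes': [str, ...]}."""
    year_ok = 2007 <= record["diagnosis_date"].year <= 2016
    return year_ok and bool(cancer_types(record["codes"]))
```

Because `cancer_types` returns a set, a patient with codes for two cancers is counted in both groups, which is consistent with the overlap between cancer groups reported in the Results.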
1. Development of a Data Dictionary
We developed a case report form (CRF) to define the range of detail (variables) to be collected in the registry. Because data requirements differ, the scope of each variable was defined by considering the efficiency of data migration from the EMR to the group-specific database. Because the CRF cannot be used to create a data structure by itself, we then created a data dictionary from the CRF to conduct physical modeling of the database.
2. Applying Branching Logic
The branching logic provided by REDCap gives researchers the option of showing or hiding input fields. Because the registry consists of several input forms covering the patient's initial information, treatment information, and follow-up information, the data collection instrument must change dynamically by cancer type. Branching logic makes it possible to show only the input forms that the researchers need to fill out. For example, when entering patients with kidney cancer only, there is no need to show the input fields for bladder cancer and prostate cancer. If these fields are present, they can confuse researchers, degrade the data entry experience, and slow data entry. Therefore, we applied branching logic to show the data fields specific to each cancer type.
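The effect of this show/hide rule can be sketched outside REDCap as follows. The field names and their grouping are hypothetical and are not taken from the AMC-UCD data dictionary; in REDCap itself the rule would be written as a branching logic expression on each field rather than as code.

```python
# Hypothetical fields, each mapped to the cancer group whose form it belongs to.
# None marks a field that is always shown.
FIELD_GROUP = {
    "patient_id": None,
    "turb_date": "bladder",
    "nephrectomy_type": "kidney",
    "psa_at_diagnosis": "prostate",
}


def visible_fields(patient_cancers):
    """Return the fields to display for a patient with the given cancer types."""
    return [
        field
        for field, group in FIELD_GROUP.items()
        if group is None or group in patient_cancers
    ]
```

For a patient with kidney cancer only, `visible_fields({"kidney"})` keeps the shared field and the kidney field and hides the bladder and prostate fields, which is the behavior described above.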
3. Mapping With Clinical Data Warehouse Structure
When we constructed the cohort, we planned to link data from the clinical data warehouse (CDW) to it, so we built the database with future interoperability in mind. The data types and data names used for the cohort were therefore made compatible with the CDW, and the REDCap database structure reflects, as much as possible, the data types used in the EMR and order communication systems.
4. Alpha Test
We structured the REDCap form through the data dictionary, installed it on the test server, and tested the function of REDCap. We conducted an alpha test with a total of 6 people, including 3 residents, 1 fellow, 1 staff professor, and 1 research coordinator. The objective of this test was to confirm that (1) every data field adequately summarized the clinical records, (2) no technical errors occurred while entering data into REDCap, and (3) there were no spelling errors.
In addition, the most important requirement of the REDCap form is that clinical records can be comprehensively captured. To determine whether the clinical records were sufficiently captured, we considered 3 types of treatment (surgery, chemotherapy, and radiotherapy) for each of the 3 cancers in our cohort, giving 9 subcategories in total, and extracted sample patients from each for the test. We recruited clinicians, assigned 5 samples per clinician during the test, and asked them to enter the patient records into REDCap.
5. Clinical Record Summary Test
The clinical research coordinator summarized the clinical records of the cohort. Because the reliability of the data depends on how well the coordinator summarizes the clinical records, it was important to evaluate whether the coordinator abstracted them properly. To do this, we asked the coordinator to enter the records of 2 patients per subgroup, as in the alpha test (18 cases in total). As a reference, a professor with more than 10 years of experience in urology entered the records of the same patients (answer label). However, because the entered records could themselves have been wrong, the coordinator and the clinician reviewed the 18 cases together to produce the final answer data set. Finally, the coordinator's entries were compared with this answer data set.
The records were evaluated by dividing the number of data items correctly entered by the coordinator by the number of data items in the answer data set and multiplying by 100 to obtain a percentage. This study aimed to achieve a clinical record quality of at least 95%. The coordinator in this study exceeded 95%, securing the reliability of the clinical records.
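The percentage described above can be computed as in the following sketch. It assumes that both the coordinator's entries and the answer data set are available as field-to-value mappings; the function name and the example field names are hypothetical.

```python
def record_accuracy(coordinator, answer):
    """Percentage of answer-set items that the coordinator entered correctly.

    Both arguments map a field identifier to the entered value. An item that is
    missing from the coordinator's entries counts as incorrect.
    """
    if not answer:
        raise ValueError("answer data set is empty")
    correct = sum(
        1 for field, value in answer.items() if coordinator.get(field) == value
    )
    return 100.0 * correct / len(answer)


# Illustrative values only, not study data.
answer_set = {"sex": "1", "op_type": "radical", "approach": "open", "stage": "T2"}
entered = {"sex": "1", "op_type": "radical", "approach": "robotic", "stage": "T2"}
accuracy = record_accuracy(entered, answer_set)  # 3 of 4 items match
```

A coordinator would meet the study's target when this value is at least 95.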
6. Create a Standard of Procedure Document
To ensure data quality, we developed a standard of procedure (SOP) document for data entry in case 2 or more research coordinators participated in the study. Each participating research coordinator was also subjected to a clinical record summary test. The SOP was still needed to ensure that data abstracted from the same clinical record remained consistent when 2 or more coordinators entered it. The SOP included information on all data contained in REDCap (including label name, variable name, and code value) and where to find the data in the EMR. It also provided guidance on various situations that could be encountered in the medical records.
The role of managing the SOP was delegated to the research coordinator. In addition, we stated in the SOP document that whenever the version of REDCap is changed, the SOP must also be updated. After this, 2 clinical professors reviewed the SOP and confirmed it.
7. Data Importation
Considering the number of variables included in REDCap and the number of subjects in our cohort, entering data into REDCap by manually summarizing the medical records would have been a large burden. To reduce this burden, we preprocessed the data that could easily be fitted to the REDCap data structure and imported them into REDCap. We extracted the subjects' data from the CDW according to the range of the cohort and preprocessed them with R ver. 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). Then, using the data import function of REDCap, the data were loaded for all subjects in the cohort.
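The kind of preprocessing described above can be sketched as follows. The study used R for this step; Python is used here only for illustration. The CDW column names, the REDCap variable names, the sex coding, and the source date format are all assumptions, not the actual AMC-UCD mapping.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from CDW column names to REDCap variable names.
COLUMN_MAP = {"PT_NO": "record_id", "SEX": "sex", "OP_DT": "surgery_date"}
# Hypothetical coded values for the REDCap choice field.
SEX_CODES = {"M": "1", "F": "2"}


def to_import_rows(cdw_rows):
    """Rename columns, recode values, and normalize dates to YYYY-MM-DD."""
    rows = []
    for raw in cdw_rows:
        # Keep only mapped columns; unmapped CDW columns are dropped.
        row = {COLUMN_MAP[k]: v for k, v in raw.items() if k in COLUMN_MAP}
        row["sex"] = SEX_CODES.get(row.get("sex", ""), "")
        if row.get("surgery_date"):
            parsed = datetime.strptime(row["surgery_date"], "%Y%m%d")
            row["surgery_date"] = parsed.strftime("%Y-%m-%d")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text whose header matches the variable names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting CSV text has one column per data dictionary variable, which is the general shape a file needs before it can be loaded with a data import tool.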
8. Data Entry
Except for preprocessed data that were already uploaded, the clinical research coordinator directly input the parts requiring medical context. Data were searched and recorded using the developed entry SOP through an anonymous chart review in the CDW.
9. Analysis Methods
Descriptive statistical analyses were performed to identify the rates of surgeries and the numbers of patients according to age over the 10-year period, as well as the proportions of surgeries. Analysis of variance and Student t-tests were performed using R ver. 3.5.1 for comparisons of continuous variables. Categorical variables were analyzed using chi-square testing. Because there were patients with 2 or more cancers, we did not compare the statistical differences between groups.
To identify the broad characteristics of the cohort, we analyzed the data using rule-based natural language processing. To find the surgeries performed mainly in the Urology Department of our institute, surgical methods were classified by instrument and resection type (radical vs. partial) using the names of the surgeries. In addition, we identified chemotherapy and radiation therapy data across the entire visit period of each patient.
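The yearly counts of surgeries and patients can be tallied as in this minimal sketch. The input layout, a list of (patient identifier, year) pairs with one pair per surgery, is an assumption made for illustration.

```python
from collections import defaultdict


def yearly_counts(surgeries):
    """Return {year: (number of surgeries, number of distinct patients)}.

    `surgeries` is an iterable of (patient_id, year) pairs, one per surgery.
    """
    n_surgeries = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, year in surgeries:
        n_surgeries[year] += 1
        patients[year].add(patient_id)
    return {year: (n_surgeries[year], len(patients[year])) for year in sorted(n_surgeries)}
```

Counting surgeries and distinct patients separately matters because a single patient can undergo several surgeries in one year, so the two numbers need not agree.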
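A rule-based classification of surgery names along these lines might look like the sketch below. The keyword patterns, the example surgery names, and the choice to treat names without an approach keyword as open surgery are all assumptions; the rules actually used in the study are not given in the text.

```python
import re

# Ordered rules: the first matching pattern decides the approach, so the more
# specific labels (robotic, hand-assisted) are checked before "laparoscopic".
APPROACH_RULES = [
    ("robotic", re.compile(r"robot", re.IGNORECASE)),
    ("hand-assisted laparoscopic", re.compile(r"hand[- ]assisted", re.IGNORECASE)),
    ("laparoscopic", re.compile(r"laparoscop", re.IGNORECASE)),
    ("transurethral", re.compile(r"transurethral|\bTUR", re.IGNORECASE)),
]


def classify_surgery(name):
    """Return (approach, resection_type) inferred from a surgery name."""
    approach = next(
        (label for label, pattern in APPROACH_RULES if pattern.search(name)),
        "open",  # assumption: no approach keyword means open surgery
    )
    if re.search(r"\bpartial\b", name, re.IGNORECASE):
        resection = "partial"
    elif re.search(r"\bradical\b", name, re.IGNORECASE):
        resection = "radical"
    else:
        resection = None
    return approach, resection
```

Rule order is the main design choice here: a name such as "Robot-assisted laparoscopic radical prostatectomy" contains both "robot" and "laparoscop", and checking the robotic rule first keeps it out of the laparoscopic group.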
10. Ethics
The study protocol was approved by the Institutional Review Board at Asan Medical Center (2018–0941). The need for informed consent was waived because of the retrospective nature of the study.
RESULTS
We defined the data requirements for REDCap development at the CRF definition stage. Furthermore, 407 data variables were selected for development of the data dictionary (Table 1).
1. Characteristics of the Cohort
Approximately 20% of our cohort had bladder cancer (including overlap with other cancers). Most patients were male (81.1%), and transurethral resection of the bladder (TURB) was performed at a high rate among surgical treatments for bladder cancer (87.5%). Most radical cystectomies were performed as open surgeries (99.6%).
Among kidney cancer patients, the average age (56.2±13.8 years) was relatively low compared with those of the other 2 cancer cohorts. In addition, the proportions of radical nephrectomies and partial nephrectomies were almost equal (50.1% vs. 49.9%, respectively). Open surgery was the most frequently used approach (50.1%), although various other methods were also used.
Prostate cancer patients accounted for the largest proportion of urologic cancer patients (51.1%). The largest number of surgeries were performed robotically (n=3,888), followed by open surgery (n=1,832) (Table 2).
2. Treatment Trends
The treatment methods were classified, and the surgical procedures and the number of patients were examined. The numbers of patients in each cancer group tended to increase steadily over the measured decade, and patients diagnosed with prostate cancer were the majority (Fig. 2A).
For the treatment of bladder cancer, radical cystectomy was performed more commonly than partial cystectomy, and the number of bladder resections did not increase over the measured decade. However, the number of TURB procedures steadily increased (Fig. 2B).
For the treatment of kidney cancer, radical nephrectomy rates increased more than partial nephrectomy rates (Fig. 2C). By surgical method, laparotomy use increased steadily over the measured decade, and robotic surgery has increased at a constant rate since 2012. Indeed, by 2007, the number of robotic surgeries increased dramatically (Fig. 2D).
For the treatment of prostate cancer, the number of open surgeries and transurethral resections of the prostate did not significantly change, but the number of robotic surgeries increased at a constant rate (Fig. 2E).
Of the surgical approaches in the cohort, transurethral surgeries were the most common, followed by robotic and open surgeries. The numbers of laparoscopic surgeries and hand-assisted laparoscopic surgeries were fewer than 200 per year (Fig. 2F).
3. Treatment Modality Proportion
The most common treatment was surgery, excluding patients who had just been diagnosed and had not yet received treatment. However, the ratios of treatment modalities varied for each urologic cancer. For bladder cancer, unlike kidney and prostate cancer, the number of patients undergoing surgery and chemotherapy (n=2,068) was higher than the number of patients who underwent only surgery (n=1,111) (Fig. 3).
DISCUSSION
We ultimately constructed a cohort of 20,198 urologic cancer patients for which 407 clinical variables were analyzed at Asan Medical Center from 2007–2016. The overall number of patients and surgeries increased over the measured decade, and robotic surgeries showed rapid growth. The most common treatment for urologic cancer was surgery, followed by chemotherapy and radiation therapy. In addition, we confirmed that there were numerous patients who had just been diagnosed at Asan Medical Center but not yet treated.
Through discussions with related departments, we compared data capture tools and databases. The tools and databases compared were ABLE (in-house CDW), REDCap, a common data model, and the clinical information system. Among these, we chose REDCap because it can securely process data by granting authority to each account, and the database setting is designed to be flexible and efficient.17 It was also possible to generate and modify the CRF in real time. Furthermore, we were in the process of developing a linkage function with ABLE, and we expected that REDCap as a data capture tool would reduce the labor of data entry.
We considered whether the data sets that researchers had collected individually could be evaluated and used for development. Secondary use of data and the ability to load existing data could shorten the time needed to build a cohort. However, the data collected individually by researchers could not be guaranteed to be complete, and using them would have required more resources. Therefore, a new registry was constructed that did not require the use of existing data sets.
From 2007–2016, the numbers of surgeries and patients continuously increased. This does not mean that cancer rates increased. In Korea, the age-standardized incidence rates of prostate, kidney, and bladder cancers were 25.5 (males only), 6.0, and 4.4 per 100,000, respectively.18 The annual percentage changes were stable for prostate and kidney cancers. Even for bladder cancer, the annual percentage change decreased slightly (−1.4%).18 Therefore, our results are likely related to the growing capacity to accommodate cancer patients at Asan Medical Center. A new building was opened in May 2008, and an additional robotic machine was introduced in July 2007. In addition, the difference between the numbers of patients and surgeries was likely caused by multiple surgeries in single patients, especially for TURB.
Several limitations are present in this study. The exact numbers of urologic cancer patients and their treatments require validation. The numbers in this study were checked only against the diagnostic codes entered in the EMR. However, our preliminary results likely reflect an accurate trend over the measured decade. Furthermore, the differences between these numbers and previously entered data could be used to confirm the differences between the diagnostic codes in the EMR and real medical services. This may be related to entry error or the medical insurance system.19 Finally, selection bias cannot be avoided with the retrospective model. Our institution is the biggest medical center in Korea, and many patients were transferred here from other hospitals.
CONCLUSIONS
Using a standardized method, the AMC-UCD fosters multidisciplinary research. This constructed database provides access to clinical statistics to effectively assist research. Preliminary data should be refined through EMR chart review. The successful organization of data from 2007–2016 provides a framework for future periods of investigation and prospective models.
Notes
The authors claim no conflicts of interest.