Using Patient-Reported Outcomes to Describe the Patient Experience on Phase I Clinical Trials

Abstract
Background: Symptoms are common among patients enrolled in phase I trials. We assessed the validity of items from the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) in relation to previously validated assessments of quality of life and psychological distress, using data from a randomized trial testing a palliative care support intervention for patients enrolled on phase I trials.
Methods: Patients (n = 479) were accrued to the parent study before initiating a phase I clinical trial, with data collected at baseline, 4 weeks, and 12 weeks. We determined the correlation of PRO-CTCAE scores with distress level and with Functional Assessment of Cancer Therapy-General (FACT-G) total and subscale domain scores.
Results: Most patients were female (56.8%) and older than 60 years, and 30.7% were from minority populations. Correlations with distress level for all PRO-CTCAE items were small to moderate (Pearson r = 0.33-0.46). Correlations with FACT-G total were moderate (r = -0.45 to -0.69). Associations were stronger when only the PRO-CTCAE mood items were considered (with distress level, r = 0.55-0.60; with FACT-G, r = -0.54 to -0.60). PRO-CTCAE symptom interference scores had the strongest correlations with distress level (Pearson r = 0.46) and FACT-G total (Pearson r = -0.69). Correlations between PRO-CTCAE items and the corresponding FACT-G (total and subscale) scores and distress levels were statistically significant for all items (P < .001).
Conclusion: These findings support the validity of PRO-CTCAE in a heterogeneous US sample of patients receiving cancer treatment on phase I trials, with small to moderate correlations with distress level for all PRO-CTCAE items and moderate correlations with quality of life as measured by FACT-G total.

Standard adverse event (AE) reporting in phase I clinical trials has historically not engaged patients to self-report symptoms, potentially leading to underestimation of harms, both at baseline and over the course of a trial (1-10). A growing body of evidence supports the use of patient-reported outcomes (PROs) in oncology clinical trials (11-22). The National Cancer Institute's (NCI) Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) was developed to allow patients to self-report symptomatic AEs and to improve the quality of symptomatic AE detection (23). Attention to the patient experience is essential for optimal care, especially as health-related quality of life becomes an integral part of cancer clinical trials (16,24-26).
Industry sponsors are now beginning to implement PRO-CTCAE across the continuum of trials, including early-phase, phase III, and postmarketing studies. The US Food and Drug Administration has also encouraged adoption of this tool in oncology trials (27,28). Historically, the most common PRO strategy in oncology has been to assess the broad, multidomain concept of health-related quality of life (28-30). These existing measures have strengths, including the cancer therapeutic development community's familiarity with them, but they often ask questions less relevant to the trial context and/or miss important symptoms. This limitation is especially important in the era of novel cancer therapeutics, whose adverse events can differ from those of traditional cytotoxic chemotherapy because of novel mechanisms of action, continuous oral administration, and longer treatment duration.
Although phase I studies are primarily focused on treatment safety and feasibility, it remains important to both quantify mild or moderate adverse events and assess their impact on patient function and well-being. This concept of "treatment tolerability" becomes increasingly important in an era where patients are living longer and cancer is increasingly managed as a chronic disease (31). Few investigators conducting phase I oncology trials have explored whether PRO-CTCAE is correlated with important patient measures, such as distress and quality of life.
The PRO-CTCAE is an item library that includes individual patient questions representing 78 unique symptomatic AEs (23). Items are written in patient-friendly language and have undergone rigorous psychometric development and validation (32,33). The PRO-CTCAE includes up to 3 discrete questions for each AE, separately representing the frequency (F), severity (S), and/or interference (I) of each event. Items are available from the NCI at http://healthcaredelivery.cancer.gov/pro-ctcae.
It is essential to establish that PRO-CTCAE accurately and reliably captures the underlying experience it is intended to measure. To accomplish this, we performed a secondary analysis of all patients who enrolled in a randomized trial investigating a palliative care intervention for cancer patients enrolled on phase I trials at 2 institutions. We evaluated the measurement properties of several items in the PRO-CTCAE library and correlated these items with the Distress Thermometer and with FACT-G total and subscale scores. These anchors were primary endpoints of the parent trial and were chosen because published studies associate patient quality of life and distress with physical function and symptom burden (34-36).

Design
This secondary analysis of patient-reported outcome data was derived from a randomized clinical trial funded by the NCI to test the integration of palliative care for patients beginning a phase I trial. In the parent study, patients with solid tumors (n = 479) were accrued from 2 NCI Comprehensive Cancer Centers, with baseline data collected before the initial phase I treatment. Patients were randomly assigned to usual care or the palliative care intervention. For patients in the intervention group, the study nurse created a care plan based on the baseline evaluation; each patient was discussed at an interdisciplinary meeting of the study investigators, nurses, a chaplain, and a social worker; and the research nurse delivered 2 teaching sessions using standardized materials addressing symptom and quality-of-life (QOL) concerns. Follow-up evaluations occurred at 4 and 12 weeks.
The primary aim of the parent study was to test the effects of a palliative care intervention on patients' quality of life, psychological distress, and satisfaction with oncology care and communication. To qualify, patients were required to be 21 years of age or older, fluent in English, without cognitive impairment, diagnosed with a solid tumor, and initiating treatment on a phase I clinical trial. Exclusion criteria included cognitive impairment and hematologic malignancy. The trial was approved by the institutional review boards at each site and registered with ClinicalTrials.gov (NCT01612598). Each participant provided written informed consent.

Questionnaire
The previously developed PRO-CTCAE item library consists of 78 symptomatic AEs represented by 124 distinct items (23). When planning for this study, we met with stakeholders from the NCI and selected a pool of 45 items that were deemed relevant to the study population (see Table 1). PRO-CTCAE items were completed by phase I trial participants prior to clinic appointments. Participants were required to answer questions without assistance but could request technical assistance from study staff.

Quantitative Data and Anchors
A demographic data tool and 2 well-validated, patient-reported psychosocial measures were used as comparators in the instrument validation. These PRO anchors were administered to participants prospectively and selected based on literature review and expert consensus.

Psychological Distress Scale
The Psychological Distress Scale is a single item asking patients to rate their distress on a scale from 0 (none) to 10 (extreme distress) (37). A score of 5 or above indicates a need for intervention.

FACT-G
The FACT-G is a well-established, validated QOL scale consisting of 27 items, each rated on a 5-point scale from 0 ("not at all") to 4 ("very much"). It includes 4 subscales (physical well-being, social/family well-being, emotional well-being [EWB], and functional well-being) that together yield an overall QOL score. The highest possible score is 24 for the EWB subscale and 28 for each of the other 3 subscales; thus, the total FACT-G score ranges from 0 to 108, with higher scores indicating better QOL. Subscale scores can be prorated when a defined amount of data is missing (38).
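As a worked illustration of the scoring arithmetic above, the following Python sketch totals the four subscales and prorates a subscale with missing items. It assumes item responses have already been oriented so that higher values mean better QOL (the FACT-G reverse-scores its negatively worded items before summing), and it assumes the proration rule commonly given in the FACT scoring guidelines: the sum of answered items is multiplied by the number of subscale items and divided by the number answered, computed only when more than half the items were answered. The function names are hypothetical and are not part of any published scoring software.

```python
from typing import Dict, Optional, Sequence

# Items per FACT-G subscale: 7 + 7 + 6 + 7 = 27 items; maxima 28, 28, 24, 28.
SUBSCALE_ITEMS = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7}


def prorated_subscale(responses: Sequence[Optional[int]], n_items: int) -> Optional[float]:
    """Prorate one subscale from 0-4 responses; None marks a missing item.

    Assumes responses are already oriented so that higher = better QOL.
    Returns None when half or fewer of the items were answered (assumed rule).
    """
    if len(responses) != n_items:
        raise ValueError("expected one entry per subscale item")
    answered = [r for r in responses if r is not None]
    if len(answered) * 2 <= n_items:
        return None
    if any(not 0 <= r <= 4 for r in answered):
        raise ValueError("FACT-G items are scored 0-4")
    return sum(answered) * n_items / len(answered)


def fact_g_total(subscales: Dict[str, Sequence[Optional[int]]]) -> Optional[float]:
    """Sum the four subscale scores (range 0-108); None if any cannot be scored."""
    scores = [prorated_subscale(subscales[name], n) for name, n in SUBSCALE_ITEMS.items()]
    if any(s is None for s in scores):
        return None
    return sum(scores)


example = {
    "PWB": [4, 3, 3, None, 4, 2, 3],  # one missing item, so PWB is prorated
    "SWB": [3] * 7,
    "EWB": [2] * 6,
    "FWB": [4] * 7,
}
print(fact_g_total(example))
```

Here the prorated PWB score is 19 × 7 / 6 ≈ 22.17, so the total is about 83.17 rather than the 80 an unprorated sum would give.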

Statistical Analysis
This study was designed as a 2-group experiment, powered to detect statistically significant differences in QOL and related metrics between the intervention and control cohorts over time. Because the patient-reported outcome components were secondary endpoints, the parent study was not powered for analyses specific to this tool.
Aggregate PRO-CTCAE scores were calculated to explore overall symptom frequency (10 items), symptom severity (39 items), and symptom interference (21 items) by summing all scored items classified under each attribute. Dueck et al. implemented a more complex scoring system based on permutations of responses to the 45 questions (39). Our goal was to take a simplified approach that measured the overall load of severity (S), interference (I), and frequency (F) of patient-reported symptoms (the 2 items related to presence of symptoms were not included in these calculations). The items used to compose each of these overall scores are listed in Table 1. We used these metrics to identify associations between PRO-CTCAE and other validated tools.
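A minimal Python sketch of the simplified aggregate scoring described above follows. It assumes each response is keyed by symptom and attribute (a hypothetical "<symptom>:<attribute>" naming scheme) and scored 0-4; it sums items within each of F, S, and I and skips the 2 presence (P) items. It is not the Dueck et al. composite-grade algorithm. The text does not state how missing items were handled, so skipping them here is an assumption. Under this scheme the severity total can range from 0 to 156 (39 items × 4), interference from 0 to 84, and frequency from 0 to 40.

```python
from typing import Dict, Optional


def aggregate_scores(responses: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Sum 0-4 PRO-CTCAE responses within each attribute (F, S, I).

    Keys use a hypothetical "<symptom>:<attribute>" scheme. Presence (P)
    items are excluded, as in the text; missing responses (None) are
    skipped, which is an assumption.
    """
    totals = {"F": 0, "S": 0, "I": 0}
    for key, value in responses.items():
        _, attribute = key.rsplit(":", 1)
        if attribute == "P" or value is None:
            continue
        if attribute not in totals:
            raise ValueError(f"unknown attribute in {key!r}")
        if not 0 <= value <= 4:
            raise ValueError(f"{key!r} must be scored 0-4")
        totals[attribute] += value
    return totals


patient = {
    "fatigue:S": 3,
    "fatigue:I": 2,
    "anxiety:F": 2,
    "anxiety:S": 1,
    "anxiety:I": 1,
    "hair loss:P": 1,  # presence item, not counted
    "cough:S": None,   # missing response, skipped
}
print(aggregate_scores(patient))  # {'F': 2, 'S': 4, 'I': 3}
```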
To assess convergent validity, baseline scores were used to compute Pearson correlations between each PRO-CTCAE attribute group (F, S, I) and the Functional Assessment of Cancer Therapy-General (FACT-G) health-related quality of life (HRQOL) summary score, its subscales, and distress level. Corresponding correlations using scores at subsequent time points were also examined. Correlation values less than 0.3 were considered negligible, 0.3-0.5 small, and 0.5-0.7 moderate (40). Where applicable, P values are reported for a 2-sided test of the null hypothesis that the correlation coefficient equals 0, that is, the probability of observing a coefficient at least as large in absolute value as the observed one if the null hypothesis were true. Because many coefficients were tested, the threshold for statistical significance was set at .001.

Quantitative Data: PRO-CTCAE Scores and Correlation With Other Validated Tools
PRO symptom frequency, interference, severity, and problem and/or presence (P) items are scored from 0 (not at all, no problem, or none) to 4 (all the time, big problem, or a lot). Frequencies of PRO-CTCAE symptoms at baseline and follow-up (unadjusted and adjusted rates) are reported in Table 3, with the number (percentage) of patients who reported any symptom (grade > 0) and the number reporting symptoms of grade 3 or higher.

Table 1
No.  Item                                          Category             S (n = 39)  I (n = 21)  F (n = 10)  P (n = 2)
…    Problems with memory                          Attention/Memory     X           X           -           -
3    Arm or leg swelling                           Cardio/Circulatory   X           X           -           -
4    Pounding or racing heartbeat (palpitations)   Cardio/Circulatory   X           -           X           -
5    Tremors                                       Cardio/Circulatory   X           -           X           -
6    Acne and pimples                              Cutaneous            X           -           -           -
7    Hair loss                                     Cutaneous            -           -           -           X
8    Hand and foot syndrome                        Cutaneous            X           -           -           -
9    Problems with nails                           Cutaneous            X           -           -           -
10   Skin burns from radiation                     Cutaneous            X           -           -           -
11   Skin problems                                 Cutaneous            X           X           -           -
12   Bloating of abdomen                           Gastrointestinal     X           -           X           -
13   Constipation                                  Gastrointestinal     …           …           …           …
…    Heartburn                                     Gastrointestinal     …           …           …           …
…    Hiccups                                       Gastrointestinal     …           …           …           …
19   Problems tasting food or drink                Gastrointestinal     X           -           -           -
20   Vomiting                                      Gastrointestinal     …           …           …           …
21   Urge to urinate                               Gynecologic/Urinary  -           X           X           -
22   Frequent urination                            Gynecologic/Urinary  -           X           X           -
23   Bruise easily                                 Miscellaneous        -           -           -           X
24   Hot flashes                                   Miscellaneous        X           -           -           -
25   Shivering                                     …                    …           …           …           …
…
30   Numbness in hands and feet                    Neurological         X           X           -           -
31   Difficulty swallowing                         Oral                 X           -           -           -
32   Dry mouth                                     Oral                 X           -           -           -
33   Mouth sores                                   Oral                 X           X           -           -
34   Skin cracking at mouth                        Oral                 X           -           -           -
35   Headache                                      Pain                 X           X           -           -
36   Pain                                          Pain                 X           X           -           -
37   Problems with breathing                       Respiratory          X           X           -           -
38   Cough                                         Respiratory          X           X           -           -
39   Shortness of breath                           Respiratory          X           X           -           -
40   Decreased sexual interest                     Sexual               X           -           -           -
41   Problems with ejaculation                     Sexual               -           -           X           -
42   Fatigue                                       …                    …           …           …           …
…
Dashes indicate items not used for this measure. F = frequency; I = interference; P = presence; S = severity.
Symptom levels reported at the 4- and 12-week follow-ups were combined to reflect the highest level across both time points for each symptom. Unadjusted scores reflect the worst (highest) level of each symptom reported at either follow-up time point, without consideration of symptoms at baseline. Adjusted scores were obtained using the baseline grade subtraction method for patient reports (24), which takes into account the level of the PRO reported at baseline, with the intent of identifying patients whose symptom worsened at either follow-up point. Adjusted scores therefore represent the subset of patients whose symptom for a PRO item was worse at follow-up (week 4 or 12) than at baseline, excluding individuals whose symptoms were unchanged or improved from baseline. This was deemed important for capturing overall symptom burden and how symptoms changed over time, given our interest in correlating these measures with both patient distress and quality of life. Almost all participants reported the presence of at least 1 symptom (ie, a score of >0) at baseline. At baseline, patients frequently reported problems with memory (S, 45.3%), concentration (S, 42.6%), appetite (S, 47.0%), constipation (S, 45.9%), anxiety (F, 73.9%), depression (S, 49.7%), numbness in hands and feet (S, 42.2%), dry mouth (I, 41.8%), pain (S, 61.6%), fatigue (S, 73.5%), and insomnia (S, 50.3%). Symptoms at baseline that were scored 3 or higher by more than 10% of patients included hair loss (P, 13.4%) … (Figure 4). Low-frequency items (sexual, gynecologic and/or urinary, and other miscellaneous items) were not included in the graphs.
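The distinction between unadjusted and adjusted counts can be sketched as follows. This is a simplified reading of the description above, not a reference implementation of the baseline grade subtraction method: a patient counts toward the unadjusted total when the worst grade across weeks 4 and 12 reaches a threshold, and toward the adjusted total only if that worst grade also exceeds the baseline grade. Excluding patients with a missing baseline from the adjusted count, and the function names, are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple

Record = Tuple[Optional[int], Optional[int], Optional[int]]  # (baseline, week 4, week 12)


def worst_follow_up(week4: Optional[int], week12: Optional[int]) -> Optional[int]:
    """Highest grade reported at either follow-up visit (None if both missing)."""
    grades = [g for g in (week4, week12) if g is not None]
    return max(grades) if grades else None


def count_symptom(records: Iterable[Record], threshold: int = 1) -> Tuple[int, int]:
    """Return (unadjusted, adjusted) patient counts for one PRO-CTCAE item.

    unadjusted: worst follow-up grade >= threshold, ignoring baseline.
    adjusted:   additionally requires the worst follow-up grade to exceed
                the baseline grade; a missing baseline is excluded (assumed).
    """
    unadjusted = adjusted = 0
    for baseline, week4, week12 in records:
        worst = worst_follow_up(week4, week12)
        if worst is None or worst < threshold:
            continue
        unadjusted += 1
        if baseline is not None and worst > baseline:
            adjusted += 1
    return unadjusted, adjusted


# Hypothetical grades (0-4) for 5 patients on one item.
records = [(0, 2, 1), (2, 2, 0), (1, None, 3), (3, 1, 1), (0, None, None)]
print(count_symptom(records))               # (4, 2): any symptom, and worsened
print(count_symptom(records, threshold=3))  # (1, 1): grade 3 or higher
```

The patient reporting grade 2 at baseline and grade 2 at week 4 appears in the unadjusted count but not the adjusted one, which is exactly the "same or improved" case the adjusted rate discounts.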
Mood, pain, sleep, and attention were common patient-reported issues (Figure 1). Although most symptoms were reported as mild or moderate in severity, fatigue and lack of energy were more commonly rated severe or very severe (16.2%). Pain was commonly reported as severe (6.3%) and as interfering "quite a bit" or "very much" (7.0%).
Gastrointestinal adverse events (Figure 2) were reported with higher severity. Problems with appetite were among the most common issues reported (47.0%) and increased in severity at follow-up (6.3% severe or very severe at baseline vs 9.2% at follow-up). Cutaneous and oral symptoms (Figure 3) were largely mild or moderate, with the exception of frequency of hair loss (13.4% at baseline). Interference from skin problems (12.9% at baseline vs 18.1% at follow-up) and dry-mouth severity (2.3% at baseline vs 3.1% at follow-up) were reported more often at follow-up than at baseline.
Respiratory, neurologic, and cardiovascular symptoms were mostly mild or moderate (Figure 4). However, the severity of cough, dyspnea, dizziness, numbness, swelling, and tremors increased at follow-up.
Correlations of the derived baseline aggregate scores for frequency, severity, and interference PRO-CTCAE items with the previously validated tools are displayed in Table 4. (Correlation coefficients using scores at subsequent time points [week 4 and week 12] were also computed, yielded comparable results, and are not shown.) Overall, PRO-CTCAE scores were positively correlated with the distress scale and negatively correlated with FACT-G. The correlation coefficients (r) for the psychological distress scale were small (frequency, Pearson r = 0.33; severity, r = 0.43; symptom interference, r = 0.46). For FACT-G total scores, correlations with PRO-CTCAE symptom frequency (Pearson r = -0.45), severity (Pearson r = -0.67), and symptom interference (Pearson r = -0.69) were moderate.
The moderate correlation between the PRO-CTCAE tool and the physical well-being subscale of FACT-G (Table 4) provides evidence of concurrent validity, because both measures explore related constructs in symptom assessment. Similar results were … Table 4 focuses on the mood items of the PRO-CTCAE (anxiety and depression). Stronger Pearson correlation coefficients were noted between distress level and all PRO-CTCAE mood attributes: symptom frequency (Pearson r = 0.57), symptom severity (Pearson r = 0.60), and symptom interference (Pearson r = 0.55). Similar correlations were noted between PRO-CTCAE mood items and FACT-G total score (Pearson r = -0.54 to -0.60), with the strongest correlation for the EWB subscale. Overall, symptom severity explained a larger proportion of the variability in distress and FACT-G than interference or frequency. Correlations between PRO-CTCAE items and the corresponding FACT-G (total and subscale) scores and distress levels reached statistical significance for all items (P < .001).
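The statement that severity "explained a larger proportion of variability" rests on the squared correlation: for a single predictor in a simple linear regression, r² is the proportion of variance in one measure accounted for by the other. The short Python example below applies this to the mood-item correlations with distress quoted above; it only restates the reported coefficients and does not reproduce the underlying data. The function name is hypothetical.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share under a linear fit (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


# Baseline correlations of PRO-CTCAE mood attributes with distress level, as reported.
reported_r = {
    "mood frequency": 0.57,
    "mood severity": 0.60,
    "mood interference": 0.55,
}

for attribute, r in reported_r.items():
    print(f"{attribute}: r = {r:.2f}, r^2 = {shared_variance(r):.2f}")
# mood frequency: r = 0.57, r^2 = 0.32
# mood severity: r = 0.60, r^2 = 0.36
# mood interference: r = 0.55, r^2 = 0.30
```

Even the strongest of these correlations corresponds to roughly a third of shared variance, which is consistent with describing the associations as moderate.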

Discussion
This secondary analysis provides evidence supporting the added value of PRO-CTCAE for measuring the symptoms of patients enrolled in phase I oncology trials. We noted small to moderate correlations with distress level for all PRO-CTCAE items (Pearson r = 0.33-0.46) and moderate correlations with QOL as measured by FACT-G total (Pearson r = -0.45 to -0.69).
Strengths of this study include a patient sample that was diverse with respect to age and disease site, with enrichment for less common cancers (pancreatic, kidney, sarcoma). Both participating institutions are leaders in phase I therapeutics. We focused on patients treated on phase I trials for advanced cancer given the high frequency of symptomatic adverse events in this setting. As patients accrue new toxicities or worsening of baseline symptoms over the course of treatment, corresponding changes in QOL or distress levels would be expected. In addition, 30% of participants were from minority populations, supporting the feasibility of administering the survey to patients of diverse racial backgrounds.
Our primary objective was to investigate the association of symptomatic toxicities, as measured by PRO-CTCAE, with global quality of life and psychological distress anchors. Demonstrating a correlation among symptoms, patient quality of life, and distress is important for several reasons. First, the US Food and Drug Administration has identified symptomatic AEs, physical function, and patient QOL as priority areas of interest for PRO analysis (41). Second, although phase I trials are primarily focused on dose finding and a preliminary assessment of the safety of a new agent or drug combination, several investigators have suggested expanding the definition of a dose-limiting toxicity to include PRO data (42). A deeper understanding of how symptoms affect patient QOL and distress informs the assessment of a cancer therapeutic's overall tolerability.
As it now stands, lower-grade toxicities below the threshold of the dose-limiting toxicity definition elude current methods of AE analysis, which may therefore underestimate a drug's effect on patient well-being (43). Establishing a correlation between patient-reported toxicities as measured by PRO-CTCAE and QOL therefore has the potential to capture the impact of cumulative lower-grade toxicities.
Recent studies have demonstrated underreporting by physicians, compared with patients, of the common symptoms of anorexia, nausea, constipation, diarrhea, and hair loss (44). In this study, many symptoms, such as bloating of the abdomen, constipation, problems with memory and concentration, frequent urination, dry mouth, anxiety, depression, and shortness of breath, affected nearly 40% of patients and were rated as severe or very severe (Table 3). These symptoms would be missed by global health-related QOL assessment tools such as the FACT-G. Our data also demonstrate that patients experience symptoms differently, with distinct quality, frequency, intensity, and levels of interference. For example, hiccups (19%) and easy bruising (22%) were frequent problems but were almost never identified as severe or as interfering with daily activities, whereas urination problems not only occurred frequently (39.9%) but also interfered with activities (26.1%) and were noted as a high-grade toxicity (scored ≥ 3) by 9% of patients. Therefore, it is important to measure not only the presence of a symptom but also the distinct symptom experience and how it affects patient-reported overall quality of life. This heightened awareness would allow clinicians to better target the psychosocial needs of patients.
Correlating PRO-CTCAE and distress level is similarly important, because patients with advanced cancer often experience distress associated with disease-related symptoms or treatment-related side effects. In a preliminary report from the trial described here, emotional distress levels were high (45). The average overall distress on the Distress Thermometer was 3.6, with scores above 3 generally requiring clinical assessment and intervention (46). Associations with distress level were stronger when only the mood items of the PRO-CTCAE were considered (r = 0.55-0.60), and among the aggregate scores for all items, symptom interference had the strongest correlation with distress level (Pearson r = 0.46) and FACT-G total (Pearson r = -0.69). Previous investigators have documented the negative relationship between symptom distress and QOL, both physically and emotionally (47,48).
The PRO-CTCAE was not designed for combining individual items; the best way to combine the attributes (frequency, severity, interference) and to interpret the resulting scores has not been established and is under study. Dueck and colleagues recently presented a novel scoring algorithm for mapping PRO-CTCAE individual item scores into a single composite AE grade (39). Our intent in formulating an aggregate score was to explore whether symptom clusters in subcategories (interference, frequency, severity) would better characterize the patient experience. This is consistent with NCI guidance recommending descriptive reporting of the available attributes (49). Importantly, our work enhances the interpretability and utility of PRO-CTCAE and adds to the currently sparse literature.
Several caveats and limitations should be considered. First, our study was conducted in an English-speaking, US-resident patient population, which limits its generalizability. Second, we assessed convergent validity, but other measures of construct validity, such as divergent, discriminative, and predictive validity, are warranted. Third, the items tested were correlated with FACT-G and distress anchors, neither of which has been widely used in PRO-CTCAE validation and reliability studies to date. Future work will be critical to determine which modifications could be made to existing HRQOL instruments to reduce duplication and patient burden, with the ultimate goal of achieving a comprehensive evaluation of the aspects of the patient experience most affected by therapy while maximizing the relevance of individual questions and minimizing duplicative work.
In conclusion, the results of this study suggest that PRO-CTCAE is correlated with validated patient-reported tools measuring general quality of life and psychological distress and can achieve its intended aim of amplifying the patient's voice. Further validation and additional psychometric work are needed to advance the clinical utility of PROs.

Table 4 notes: b All PRO-CTCAE items, summarized over all PRO-CTC items with frequency (n = 10), severity (n = 39), or interference (n = 21) attributes, respectively. c Mood items only (anxiety and depression), summarized over all mood-related PRO-CTC items with frequency (n = 1), severity (n = 2), or interference (n = 2) attributes.

Funding
This research is supported by a research grant from the NCI (R01 CA177562), "Integration of Palliative Care for Cancer Patients on Phase 1 Trials" (B. Ferrell, T. Smith: Co-PIs); the City of Hope Core, NCI P30CA033572; and the Johns Hopkins Sidney Kimmel Comprehensive Cancer Center Core Grant, NCI.

Notes
Role of the funders: The funders had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.

Conflicts of interest:
The authors have no conflicts of interest to report.
Role of the authors: All authors contributed to review and revision. BF and TJS wrote grant applications. All authors performed data analysis and interpretation. RS, BF, NR, TJS drafted the manuscript. All authors contributed to editing and critical revision for important intellectual content.