Testing Ways to Display Patient-Reported Outcomes Data for Patients and Clinicians


Structured Abstract

Background:

Patient-reported outcomes (PROs) assess health, disease, and treatment from the patient perspective. The large number of PRO questionnaires and lack of standardization in scoring and scaling make it difficult for patients and clinicians to interpret PRO scores for use in practice.

Objectives:

We investigated PRO score presentation approaches to promote patient and clinician understanding and use. We addressed (1) individual patients' PRO scores for monitoring and management (individual-level data) and (2) PRO results from research studies comparing treatment options (group-level data). Because previous research in this area was conducted primarily in cancer populations, we conducted the study in a cancer treatment setting.

Methods:

We conducted a 3-part mixed-methods study. In Part 1, we conducted in-person semistructured interviews with 50 survivors and 20 clinicians to assess which aspects of current data display formats were helpful or confusing. In Part 2, work groups composed of Part 1 participant volunteers partnered with the research team to develop improved data presentation formats, which were then preliminarily evaluated via in-person interviews with 39 survivors and 40 clinicians. Part 3 tested the formats that emerged from Part 2 using a broad-based online survey of cancer survivors (n = 1256), cancer clinicians (n = 608), and PRO researchers (not cancer specific) (n = 747) recruited via email lists of stakeholder groups and snowball sampling, plus in-person interviews with 20 survivors and 25 clinicians. Across Parts 1 through 3, we recruited in-person interviewees from a mid-Atlantic consortium of academic and community health systems. We purposively sampled survivors based on education, cancer type, and clinical setting; we selected clinicians based on specialty and clinical setting. A 9-member Stakeholder Advisory Board informed all aspects of study design, conduct, and reporting.

Results:

The Part 1 findings supported presenting line graphs of scores over time for individual-level data; the group-level data findings suggested that clinicians value statistical information (eg, P values, confidence limits), but patients find this information confusing. Therefore, in Parts 2 through 3, we addressed group-level data presentation to patients separate from clinicians. Part 2 identified formats to test in Part 3. In Part 3, for individual-level data, interpretation accuracy and clarity ratings were better for line graphs, with higher scores always indicating better outcomes vs higher scores indicating “more” of the outcome (better for function, worse for symptoms). Clarity ratings and overall preferences supported using a threshold line to indicate possibly concerning scores. For presentation of group-level data to patients, interpretation accuracy, clarity ratings, and overall preferences supported presenting proportions using pie charts (vs bar graphs or icon arrays); interpretation accuracy supported “better” line graphs (compared with “more” or normed line graphs); clarity ratings supported “better” vs “more” line graphs. For presentation of group-level data to clinicians, interpretation accuracy and clarity did not differ significantly between pie charts and bar graphs. Interpretation accuracy and clarity ratings supported better over normed line graphs (no difference between “better” and “more”). Clarity ratings supported including some indication of significant differences between groups (eg, asterisk).

Conclusions:

Interpretation accuracy, clarity ratings, and preferences differ among PRO presentation formats. We will now use these results to conduct a modified-Delphi consensus process to develop recommendations for PRO data presentation.

Background

One of the key measures of health is how people feel and function—their well-being and ability to perform their usual activities. Patient-reported outcome (PRO) measures are questionnaires that collect information directly from patients about topics such as symptoms, functioning, and health-related quality of life.1,2 PRO measures can be used to promote patient-centered care in many ways.3 For example, PROs can be administered to individuals as part of routine care and then used to monitor the person's status and inform his or her management.4-17 PRO results from clinical trials and other comparative research studies (eg, treatment X vs treatment Y) can inform management and decision making, both through educational materials/decision aids directed to the patient and through publication of research results in peer-reviewed journals directed to clinicians.18-22

However, many barriers prevent optimal use of PROs in clinical practice. Many different PRO questionnaires exist,23 and there is no standardization in the field in how these PROs are scored and scaled, or in how the data are presented. Some PROs are scored such that higher scores indicate better outcomes, some PROs are scored such that lower scores indicate better outcomes, and some PROs are scored such that higher scores indicate “more” of what is being measured (making higher scores better for function domains but worse for symptom domains). Further, some measures are scaled 0 to 100, with the best and worst scores at the extremes, whereas others are normed to, say, a population average of 50. Patients and clinicians report that this lack of standardization creates confusion when working with individual patient-level data because a score of 50 can mean completely different things depending on the measure, and it can be hard to remember whether higher scores are better or worse on any given questionnaire.24,25 Similarly, for group-level results from research studies, clinicians have endorsed the value of PRO data for informing patient-centered care but report a lack of confidence in interpreting the findings.21

There is also wide variation in the way that PRO data are presented. In terms of individual patient-level data, some examples of formats used on score reports are line graphs of scores over time,24 tabulated data,26 and heat maps.27 For group-level research studies, results may be presented based on a “responder definition” (ie, the proportion of patients improved, stable, worsened) or by showing mean scores over time.28 Given the lack of standardization in scoring and scaling, identifying approaches for presenting PRO results to promote understanding and use by patients and clinicians is critical. To that end, we investigated the following research questions:

  1. To what extent do current practices of PRO reporting limit clinician and patient understanding and use of PRO data, and what are the most and least desirable attributes of current practices of presenting PROs?
  2. What are novel ways to present PRO results to clinicians and patients to improve their usefulness?
  3. To what extent are these novel ways of presenting PROs effective in improving understanding and application of the data?

We conducted the study in 3 parts. Each part followed a formal study protocol and was reviewed by the Johns Hopkins School of Medicine Institutional Review Board. Across all parts, the project addressed both presentation of individual patient scores (individual-level data) and results from comparative research studies (group-level data). Because preliminary data addressing these questions were derived primarily from cancer populations, we conducted the study in a cancer treatment setting. Below, we describe the methods and results for each part separately.

Role of the Stakeholders in Shaping the Research Design

To ensure that relevant stakeholders' perspectives were incorporated in the design of the research, and through every stage of the project that followed, we first identified the key stakeholder groups: cancer patients/survivors, oncology clinicians, and PRO researchers. We then ensured that at least 1 representative of each of these stakeholder groups was on our team of principal and coinvestigators. Thus, every aspect of the design of the study, conduct of the project, interpretation of the findings, and reporting of the results included the key stakeholder perspectives. Supplementing the stakeholder perspectives on the investigator team, a Stakeholder Advisory Board (SAB), which included survivor, caregiver, clinician, and PRO researcher representatives, also informed the study design, project conduct, and results interpretation. In each section below, we describe the role of stakeholders during each part of the study.

Part 1: Evaluating Current Approaches for PRO Data Presentation

The Part 1 study has been published in Quality of Life Research.29 Here, we summarize key aspects of the methods and results.

Methods

Study Design and Role of Stakeholders

Part 1 was a cross-sectional mixed-methods study with the objectives to (1) evaluate survivor and clinician comprehension of PRO data using existing presentation approaches and (2) obtain qualitative feedback on attributes of different presentation formats that are helpful or confusing. Cancer survivors and clinicians were randomly assigned to evaluate either individual-level or group-level presentation formats. We held a 1-day in-person SAB meeting before the conduct of Part 1. During this meeting, we oriented the SAB to the project and their role. The SAB also reviewed and piloted the draft interview guides and suggested significant changes, which we implemented. For example, while we had initially planned only interview data collection, the SAB recommended that there be a self-directed portion followed by a debriefing interview (both parts described in more detail below). The SAB also reviewed the formats we planned to test.

Study Population and Setting

We conducted Part 1 with cancer survivors and oncology clinicians recruited from the Johns Hopkins Clinical Research Network (JHCRN), a consortium of mid-Atlantic academic and community health systems. To be eligible, cancer survivors had to be aged 21 years or older, have a cancer history (excluding nonmelanoma skin cancer), be > 6 months postdiagnosis, no longer be receiving acute therapy (long-term, chronic treatments taken for > 1 year were acceptable), be able to communicate in English without a translator, and have known education status. We purposively sampled survivors such that ≤ 30% represented a given cancer type, ≥ 10% had less than a college degree, and ≥ 30% were from Johns Hopkins and ≥ 30% from the other study sites. Clinicians had to be in active practice treating adult cancer patients as a medical oncologist, radiation oncologist, surgical oncologist, gynecologic oncologist/urologist, oncology nurse practitioner/physician assistant, or oncology fellow. We purposively sampled clinicians, recruiting ≥ 2 from each specialty and ≥ 30% from Johns Hopkins and ≥ 30% from the other study sites. Both survivor and clinician participants provided written consent. We initially conducted 50 survivor interviews and 20 clinician interviews, with the option to continue to conduct interviews if thematic saturation had not been achieved. We informed all participants that they would receive a $35 gift card.

Data Collection

We conducted 1-time, in-person, semistructured interviews with the survivor and clinician participants. The data collection instruments and interviews were the same for survivors and clinicians, and while the content of the interviews differed between individual-level and group-level data, the structure of the interviews was the same. We audio-recorded and transcribed the interviews. We pilot-tested the interviews before fielding.

First, participants completed a self-directed exercise, during which they completed an example PRO measure that included 6 domains from the European Organization for Research and Treatment of Cancer Quality-of-Life Core Questionnaire (QLQ-C30)30: physical function, emotional function, global/overall quality of life, fatigue, nausea/vomiting, and pain.

We then showed different ways to display hypothetical PRO score results for the 6 QLQ-C30 domains from the practice exercise. Formats selected for testing covered a range of examples from the literature, were included in the draft interview guide submitted with the PCORI application, and were refined based on consultation with the SAB. For most formats, the data were displayed using the QLQ-C30 scoring convention of higher indicating “more” of what is measured (ie, better for function, worse for symptoms). Each format was shown on a separate page with its own description of which data were being shown and whether higher scores were better or worse. Instructions informed participants to refer to only the information on that particular format's page. There were 4 different formats for individual-level data: line graphs of scores over time,24 tabulated scores,26 bubble plots,11 and heat maps27 (Figures 1a-1d). There were 6 different formats for group-level data: simple line graphs of mean scores over time, line graphs with norms, line graphs with confidence intervals, bar charts of average changes, bar charts based on a responder definition, and cumulative distribution functions1 (Figures 2a-2f). To control for order effects, there were 4 different orders of presentation for both the individual-level and group-level interviews.

Figure 1a. Individual Patient Line Graphs of Scores Over Time (Adapted From Brundage et al).

Figure 1b. Tabulated Scores (Adapted From Brundage et al).

Figure 1c. Bubble Plot of Scores (Adapted From Brundage et al).

Figure 1d. Heat Map of Scores (Adapted From Brundage et al).

Figure 2a. Line Graphs of Mean Scores.

Figure 2b. Normed Line Graphs.

Figure 2c. Line Graphs with Confidence Intervals (Adapted From Brundage et al).

Figure 2d. Bar Charts of Average Changes (Adapted From Brundage et al).

Figure 2e. Bar Charts of Responders (Adapted From Brundage et al).

Figure 2f. Cumulative Distribution Function (Adapted From Brundage et al).

For each format shown, we asked 2 comprehension questions, primarily to engage participants with the formats. An example question from the individual-level data is "How is the patient doing today, in terms of fatigue, compared to the last few visits?" (better, worse, about the same, not sure); an example from the group-level data is "On which treatment do patients report better emotional function?" (treatment L, treatment P, treatments are about the same, not sure). Participants were then asked to rate both the format's ease of understanding (0 = very difficult to 10 = very easy) and perceived usefulness (0 = not at all to 10 = extremely). Following completion of the self-directed portion, the interviewer reviewed the questionnaire for missing items and asked participants whether they wanted to complete any missed items, while reminding them that doing so was not required. The interviewer then proceeded with the debriefing interview, asking participants questions regarding how they interpreted the data on the different formats and aspects of the data presentation they found helpful or confusing. At the end of the interview, participants were asked to select the format they found easiest to understand and, separately, most useful.

Analysis

The Part 1 quantitative analyses were descriptive, with summary statistics for the proportions who answered each comprehension question correctly for each format, and median ratings for ease of understanding and usefulness. We analyzed patients and clinicians separately, exploring differences in perspectives. We also calculated the proportions selecting each format as most preferred for ease of understanding and for usefulness. The qualitative results from the debriefing interviews, including for missing data, informed our interpretation of the quantitative data. Specifically, since small numbers precluded dealing with missing data statistically, the interviewer used the debriefing interview to probe nonresponses on the self-directed portion to identify why the participant did not answer. To identify emergent themes, each transcript was coded in Atlas.ti using a codebook based on the format being discussed and whether the respondent was saying something positive, saying something negative, or recommending a change. The coding was completed by 1 investigator and reviewed by a second research team member. Atlas.ti reports were generated summarizing the positive and negative comments and recommended changes for each format. Team members independently reviewed the reports and identified themes, which we then discussed and refined based on these discussions.
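
For illustration only, the following Python sketch shows how the descriptive quantitative summaries described above could be tabulated; the file and column names are hypothetical, and this is not the study's actual analysis code.

    import pandas as pd

    # Hypothetical long-format file: one row per participant x format, with the
    # respondent type ("survivor" or "clinician"), the number of the 2
    # comprehension questions answered correctly, and the 0-10 ratings.
    responses = pd.read_csv("part1_responses.csv")

    # Proportion of comprehension questions answered correctly, by group and format
    accuracy = (
        responses.groupby(["respondent_type", "format"])["n_correct"]
        .apply(lambda s: s.sum() / (2 * len(s)))  # 2 comprehension questions per format
        .rename("prop_correct")
    )

    # Median ease-of-understanding and usefulness ratings, by group and format
    ratings = responses.groupby(["respondent_type", "format"])[
        ["ease_of_understanding", "usefulness"]
    ].median()

    # Hypothetical file: one row per participant with the format selected as
    # easiest to understand (a parallel column could hold "most useful")
    preferences = pd.read_csv("part1_preferences.csv")
    preferred_share = (
        preferences.groupby("respondent_type")["easiest_to_understand"]
        .value_counts(normalize=True)
        .rename("prop_selected")
    )

    print(accuracy, ratings, preferred_share, sep="\n\n")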

Results

Study Population

Thematic saturation was achieved with our initial target sample size of 50 cancer survivors and 20 oncology clinicians (Table 1), half of whom evaluated the individual-level data formats and half the group-level formats. The survivors had a median age of 66 years; 54% were female; 78% were White; they were a median of 3 years from their most recent diagnosis; and, per our purposive sampling, they represented a range of cancer types, with 44% without a college degree. The clinicians had a median age of 42 years; 40% were female; 85% were White; and, per our purposive sampling, they represented the various specialty categories. The clinicians had been in practice for a median of 17 years.

Table 1. Part 1 Participant Demographics (Adapted From Brundage et al).

Individual-Level Data Formats

Table 2 details the respondents' ratings, overall preferences, and feedback regarding the individual-level data formats. Accuracy of interpretation was generally high across formats, ranging from 67% to 100% for survivors and 90% to 100% for clinicians. Both survivors and clinicians consistently preferred line graphs of scores over time. Specifically, 50% of survivors and 70% of clinicians selected line graphs as best for ease of understanding, and 62% of survivors and 80% of clinicians selected them for usefulness. Line graphs also had the highest median ratings for ease of understanding (median 8.0 for survivors and 8.5 for clinicians [out of 10]) and for usefulness (median 8.0 for survivors and 9.0 for clinicians).

Table 2. Summary of Results for Individual-Level Data Formats from Part 1 (Adapted From Brundage et al).

Two interpretation challenges emerged from the individual-level data interviews: (1) how to highlight scores requiring clinical attention and (2) how to deal with directionality (ie, whether higher scores represent “more” or are always “better”). Regarding the clinical alerts, the approaches for highlighting potentially concerning scores (eg, the yellow frame on the line graphs) were not always understood, indicating a need to improve approaches for conveying scores requiring clinical attention. In terms of the score directionality, many comments reported confusion at the change in whether higher scores were good or bad: “It's just a little confusing that way, because it switches up from chart to chart”; “Just for user friendliness, I would make the directionality the same on all of them.” There were also some indications that this inconsistency affected interpretation accuracy, with some respondents who interpreted the score direction incorrectly reporting, “It was going up so I felt it was better, and … I didn't notice [indications of direction]” and “For some of them going up is better and so … my first instinct was incorrect.”

Group-Level Data Formats

Table 3 details the respondents' ratings, overall preferences, and feedback regarding the group-level data formats. Accuracy of format interpretation ranged from 36% to 100% for survivors and 56% to 100% for clinicians. Among the 25 survivors randomized to evaluate group-level formats, simple line graphs were most often preferred, with a third of participants selecting them as easiest to both understand and use. Survivors also rated them highest for ease of understanding and usefulness (median 8.0 out of 10 on both measures). Among the 10 clinicians randomized to evaluate group-level formats, 30% selected line graphs with confidence limits and 30% selected line graphs with norms as easiest to understand and use, although they rated simple line graphs highest for ease of understanding (median 9.0) and use (median 8.5). From the interviews, we found that clinicians valued the additional statistical information of P values, confidence limits, and norms, but survivors found this information confusing.

Table 3. Summary of Results for Group-Level Data Formats from Part 1 (Adapted From Brundage et al).

Based on these results, in which survivors and clinicians expressed different preferences, we consulted with our SAB and jointly decided that, to be responsive to this feedback, the needs of survivors and clinicians should be addressed separately. Further, while there was greater support for line graphs displaying scores over time, the appropriate format depends on context (ie, the operationalization of the study's primary outcome). Thus, Part 2 addressed both presentation of mean scores over time as well as proportions meeting a responder definition. In addition to confusion associated with inconsistent directionality also identified from the individual-level data collection, the key issues that emerged for group-level data presentation included (1) how to highlight meaningful differences (ie, statistical significance and clinical importance) and (2) how to address scaling variations.

Part 2: Partnering with Stakeholders to Develop Improved Data Presentation Formats

The Part 2 study has been published in Supportive Care in Cancer,31 and the literature review that informed the conduct of Part 2 was published in Patient Education and Counseling.32 Below, we summarize key aspects of the methods and results for Part 2.

Methods

Study Design and Role of Stakeholders

Part 2 was an iterative, innovative, stakeholder-engaged project designed to address the interpretation challenges identified in Part 1 through improved PRO data presentation approaches. Building on a key Part 1 finding that, for group-level data, survivors and clinicians prefer differing levels of detail, we addressed 3 different PRO data applications in Part 2: (1) individual-level data (as before), (2) group-level data for presentation to patients (such as in educational materials and decision aids), and (3) group-level data for presentation to clinicians (such as in peer-reviewed publications).

The objective of Part 2 was to identify candidate formats for broad-scale testing in Part 3. To do so, for each of the 3 applications, we undertook the same steps:

  1. Based on Part 1 results, the research team developed potential presentation approaches.
  2. We presented these candidate formats to a stakeholder work group composed of Part 1 participant volunteers who collaborated with the research team in improving presentation approaches.
  3. Based on feedback from the work group, we narrowed down and refined the candidate formats.
  4. These candidate formats were evaluated in one-on-one in-depth interviews.
  5. We presented results to our SAB.

In addition to these steps, we conducted a targeted review of the literature regarding PRO data display.

Stakeholder engagement occurred on multiple levels: (1) Our SAB contributed to the interpretation of the Part 1 results, which informed the potential presentation approaches the research team developed; (2) our novel work group approach involved Part 1 participants who were invited to become Part 2 collaborators in improving the presentation formats; (3) we obtained additional input from survivor and clinician one-on-one interviewees; and (4) the SAB provided input on interpreting the Part 2 findings. We describe the work groups and one-on-one interviews in more detail below. In terms of the SAB's role, we held a 2-day in-person meeting. During the first day, we reviewed the Part 1 results with the members and obtained their input on the interpretation of the findings. During the second day, we consulted with them on the plans for the Part 2 work groups and one-on-one interviews. We specifically elicited their input on how the work groups should be composed and conducted, as well as which formats and variations to test based on their interpretation of the Part 1 findings.

Literature Review

We reviewed the published literature addressing the presentation of PROs to inform developing and testing the candidate PRO data presentation formats. To briefly summarize the findings that have been published previously,32 we found in the literature 9 relevant empirical studies, all of which were conducted in oncology, with 8 of the 9 focusing on adult populations. Of the studies, 4 addressed individual-level data presentation, 4 addressed group-level data presentation, and 1 addressed both.

The 4 key themes identified from the review were the following: (1) Many patients and most clinicians can accurately interpret some PRO graphs; (2) interpretation accuracy, personal preference, and perceived level of understanding are sometimes discordant; (3) patient age and education may predict PRO graph comprehension; and (4) patients tend to prefer simpler graphs than clinicians do. These findings supported our Part 1 results and strategy for Part 2.

Study Population and Setting

Similar to Part 1, we conducted Part 2 with cancer survivors and oncology clinicians recruited from the JHCRN. Work group members were Part 1 participants who had volunteered to work with the research team to develop improved data presentation formats. Eligibility criteria for the one-on-one interviews were the same as in Part 1, including the purposive sampling. For application 1 (individual-level data) and application 2 (group-level data presented to patients), both survivors and clinicians were represented on the work groups and in the interviews; for application 3 (group-level data for clinicians), only clinicians were represented on the work groups and in the interviews. All work group and interview participants provided written consent and were informed that work group members would receive a $70 gift card for attending meetings and interviewees a $35 gift card for participating.

Work Group Process

Although the work groups were not focus groups, the moderation of the discussions was similar; therefore, 2 research team members with extensive experience in focus group moderation led the discussions. These experienced moderators were able to elicit input from participants with diverse outlooks and differing levels of authority/expertise. At each work group meeting, the preliminary findings from Part 1 were presented, the key interpretation challenges described, and potential approaches for addressing these issues presented and discussed. Discussions were structured to encourage participation from all stakeholders toward jointly developing best practices. This was aided by prioritizing the communicative needs and capacities of the primary audience in each case (eg, clinicians for journal publications, patients for decision aids). The Individual-Level Data Work Group included 6 survivors and 2 clinicians; the Group-Level Data for Patients Work Group included 3 survivors and 2 clinicians; and the Group-Level Data for Clinicians Work Group included 5 clinicians. The work group discussions were not formally analyzed, but audio-recordings, transcripts, and notes were used for reference.

One-on-One Interviews

We tested the presentation formats recommended by the work groups in one-on-one interviews with new survivors and clinicians, as appropriate. The structure of these one-on-one interviews was similar to that used in Part 1, with some modification. As in Part 1, the participants first completed a self-directed exercise, including completion of an example PRO questionnaire, followed by presentation of hypothetical results using different formats shown in a random order. For each format, participants were asked 2 data interpretation questions (to engage them with the formats), and were also asked to rate the format's ease of understanding. Following the self-directed portion, a debriefing interview was conducted during which participants were probed about their interpretation of the formats and presentation attributes they found helpful or confusing. Before conducting the debrief, the interviewer reviewed the questionnaire for missing items and asked participants whether they wanted to complete any missed items, while reminding them that doing so was not required.

Unique to the Part 2 interviews was a series of pair-wise comparisons in which, for each format displayed, a slightly different version was shown during the interview for comparison (eg, instead of shading the normal region green, shading the concerning region red). Participants were asked which option they preferred and why. Table 4 summarizes, for each of the applications (individual-level, group-level for patients, group-level for clinicians), the interpretation challenges addressed, the formats included in the self-directed portion, and the alternatives presented for pair-wise comparisons. Notably, with the separation of patients and clinicians, we reevaluated the formats to test. For example, we added pie charts and icon arrays for testing with patients, as these are commonly used in patient educational materials and decision aids.

Table 4. Summary of Interpretation Challenges Addressed and Approaches Tested for Each Data Presentation Topic in Part 2 (Adapted From Smith et al).

For individual-level data, based on the Part 1 findings, we tested only line graphs of scores over time. The 2 interpretation challenges addressed for this application were (1) directional inconsistency (ie, whether higher scores should represent better outcomes regardless of the domain, or more of what is measured [better for function, worse for symptoms]) and (2) how to highlight potentially concerning scores for clinical attention. Notably, the directionality issue applies across both individual-level and group-level applications, but we included it only in the individual-level interviews for Part 2 due to the other considerations that required attention for the group-level formats. The 3 formats included in the self-directed portion for individual-level data were (1) y-axis labeled with descriptors (eg, none, mild, moderate, severe), (2) shading the normal range of the graph in green, and (3) red circles around potentially concerning scores. Table 4 lists the various alternatives to each of these “base-case” formats.

For group-level data presented to patients, we addressed both line graphs of scores over time and proportions meeting a responder definition because the appropriate approach for data presentation is determined by the operationalization of the study's primary endpoint. The 2 interpretation challenges addressed were (1) meaning of scores (ie, what is good, bad, normal) and (2) highlighting important differences between treatments. Because the Part 1 findings identified statistical terminology as confusing to patients, these formats focused on differences described as "important," rather than distinctions between statistical and clinical significance. The self-directed portion included 3 line graphs: (1) descriptive labels added to the y-axis, (2) normal region shaded green, and (3) asterisks indicating important differences. It also included 3 proportion formats: (1) bar graphs, (2) icon arrays, and (3) pie charts. Table 4 lists the various alternatives to each of these base-case formats.

We addressed mean scores over time/average changes and proportions for group-level data presented to clinicians. The interpretation challenges addressed included (1) meaning of scores, (2) representations of statistical significance, and (3) representation of clinical importance. For the mean scores over time, the base-case formats included in the self-directed portion were (1) line graphs with descriptors on the y-axis, (2) line graphs with confidence limits and P values, (3) line graphs with an indication of clinical significance, and (4) bar charts of average changes at a single time point with an asterisk indicating statistical significance. A bar chart of proportions meeting a responder definition was also included. Table 4 lists the various alternatives to each of these base-case formats.

We pilot-tested the interviews before fielding.

Analysis

The analysis for Part 2 was similar to Part 1, including exploring missing self-directed exercise responses via qualitative methods. Quantitative analyses were descriptive, with summary statistics for the proportions selecting each format in the pair-wise comparisons and median ratings for ease of understanding. We analyzed survivors and clinicians separately, exploring differences in perspectives. We also analyzed the qualitative data from the debriefing interviews for emergent themes using similar approaches to Part 1, including a coding scheme based on the different formats, coding by 1 researcher and review by another, creation of summary reports, and discussion of emergent themes as a team.
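
As a rough illustration of the descriptive analysis of the Part 2 pair-wise comparisons (again with hypothetical file and column names, not the study's actual code), the proportions preferring each option and the median ease-of-understanding ratings could be tabulated as follows.

    import pandas as pd

    # Hypothetical file: one row per participant x pair-wise comparison, recording
    # the respondent type, the comparison shown (eg, "labels_vs_numbers"), and the
    # option the participant preferred.
    choices = pd.read_csv("part2_pairwise_choices.csv")

    # Percentage of survivors and clinicians choosing each option within every
    # pair-wise comparison (eg, descriptive labels vs numbers alone on the y-axis)
    pairwise_share = (
        choices.groupby(["respondent_type", "comparison"])["preferred_option"]
        .value_counts(normalize=True)
        .mul(100)
        .round()
        .rename("percent")
    )

    # Hypothetical file of 0-10 ratings: median ease of understanding per format
    ratings = pd.read_csv("part2_format_ratings.csv")
    median_ease = ratings.groupby(["respondent_type", "format"])["ease_of_understanding"].median()

    print(pairwise_share, median_ease, sep="\n\n")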

Results

One-on-One Interview Study Population

We conducted a total of 79 one-on-one interviews: 19 survivors and 10 clinicians evaluated individual-level data formats, 20 survivors and 10 clinicians evaluated group-level data formats for presentation to patients, and 20 clinicians evaluated group-level data formats for presentation to clinicians (Table 5). The sample characteristics reflect our purposive sampling approach.

Table 5. Part 2 Interview Participant Characteristics (Adapted From Smith et al).

Individual-Level Data Formats

Across the formats, median ease-of-understanding ratings were high, ranging from 9 to 10 for patients and 8.5 to 10 for clinicians. However, there were wide ranges in ratings for the patients, with some ratings as low as 0 to 1. Clinicians had less variation, with minimum ratings ranging from 5 to 7.

The descriptive labels were considered helpful in interpreting the scores, with multiple participants referring to them when responding to the interpretation questions. However, some did not like the labels: “none, mild, moderate, and severe are maybe arbitrary or a little bit fuzzy.” Overall, 79% of patients and 90% of clinicians preferred having descriptive labels (either standard or based on the questionnaire response format) vs numbers alone on the y-axis. Participants considered the shading helpful, both in terms of understanding the score meaning and for directionality: “lets you know immediately visually whether good is at the top or the bottom.” Participants considered the green-shaded normal range to be “less threatening” and “more positive,” while they found the red-shaded concerning range represented “a call to action.” Participants considered the red/yellow/green spectrum formats to be “too busy” and to have “too many colors.” Overall, 74% of patients and 80% of clinicians preferred red or green shading over the spectrum. The red circles effectively helped participants identify possibly concerning scores (“the little circles were a good give away”), but the threshold line alternative gave some “a little bit more information … [about] how far below the threshold that score is.” Overall, 69% of patients and 70% of clinicians preferred the red circles or threshold lines over shading. We also tested using an exclamation point to indicate possibly concerning changes in scores, but it was not clear that patients could distinguish between poor scores in absolute terms as opposed to important worsening. Finally, both patients and clinicians were evenly split on whether it was preferable to have higher scores represent better outcomes consistently (“I … like … where they're all the same”) or “more” of the domain being measured (“going up to equal less doesn't make sense to me”).

Based on these findings, and in consultation with our SAB, we selected 3 formats for testing in Part 3: green shading of the normal range, red-circled possibly concerning scores, and a threshold line between normal and concerning scores.

Group-Level Data Formats for Presentation to Patients

All 3 line graph approaches had high ease-of-understanding ratings from both patients and clinicians. Patients' median ratings ranged from 8 to 9 (minimum = 4, maximum = 10) and clinicians' median ratings ranged from 7.5 to 10 (minimum = 6, maximum = 10). As with the individual-level formats, patients considered descriptive labels on the y-axis to be helpful: "I think the word use here of 'good,' 'moderate,' 'poor,' 'very poor' is also pretty clear." Overall, 80% of patients and 100% of clinicians preferred labels over numbers alone. While some participants liked the green-shaded normal range ("It really gives an idea of where a patient should be"), many did not understand it ("I was not sure what the shaded green areas meant"). Patients also considered it busy, and there were questions about whether the shaded region should be normal for the general population, or normal for those on treatment. Some patients understood the asterisks indicating important differences ("means the difference is big enough to be important"), but others did not ("doesn't really mean anything to me"). Notably, although we did not evaluate scoring directionality specifically, patients made several comments related to this: "One you want going up and the other you want your line going down"; "I got tripped up over the higher is better for patients functioning over time and patients symptoms over time lower is better." Overall, 80% of clinicians preferred asterisks over shading and 90% preferred asterisks over the descriptive labels; patients were more evenly split (50% and 50% on asterisks vs shading; 45% and 55% for asterisks vs descriptive labels).

In terms of the proportion formats, median ease-of-understanding ratings tended to be a bit lower, ranging from 7 to 9 (minimum = 3, maximum = 10) for patients and 6 to 7.5 (minimum = 1, maximum = 10) for clinicians. Patients most preferred pie charts (55% of patients, 70% of clinicians) over the bar graphs and icon arrays. Participants considered the pie charts to be “easy to understand” and “familiar.” Patients considered bar charts to be more challenging to interpret (“I have to study it, and study it, and study it”) and icon arrays to be busy (“It just hurts my eyes, all these people”), although patients did note that the icons may be “more relatable.”

Based on these findings, and in consultation with our SAB, we tested 3 line graphs in Part 3: descriptive y-axis labels with higher = better, descriptive y-axis labels with higher = more, and normed line graphs. We also included all 3 proportion formats: pie charts, icon arrays, and bar graphs.

Group-Level Data Formats for Presentation to Clinicians

Clinicians' median ease-of-understanding ratings for the 3 line graph versions ranged from 7.5 to 8 (minimum = 1, maximum = 10), which was higher than both the bar charts of average changes at a single time point (median = 6.5, range 2-10) and bar charts of proportions meeting a responder definition (median = 7.0, range 1-10). They had a slight preference for descriptive labels on the y-axis (55%) vs numbers only (45%). Most clinicians (95%) wanted P values in addition to confidence limits ("adds another degree of clarity"). Although several participants appreciated confidence limits, others had difficulty interpreting them correctly: "I don't want to … look at crossing confidence intervals." Further, some clinicians appreciated the distinction between statistical and clinical significance: "The P value is always important … but whether they're clinically significant or not it's very difficult to tell just from a P value." Others confused the concepts by, for example, referring to P values when asked about clinically important differences: "I just jumped to the P values." Overall, 75% of clinicians preferred some indication of clinical importance in addition to statistical significance, but they had no clear preference between shading and asterisks for doing so, with 45% of clinicians endorsing each approach; clinicians considered having a legend only to be insufficient. Clinicians were less favorable toward the bar charts of average changes because they represent only a single time point and provide information only on change from baseline (as opposed to absolute levels). When asked to select their preferred line graph format, 60% selected ones with an indication of clinical significance and 30% selected ones with confidence limits. Clinicians preferred line graphs of mean scores over time to the bar chart of average changes at a single time point (75% vs 25%). When the bar chart of proportions meeting a responder definition was added as an option, only 10% of clinicians preferred it.

Based on these results, and in consultation with our SAB, in Part 3, we tested 3 approaches to line graphs of mean scores over time: descriptive y-axis labels with higher = better, descriptive y-axis labels with higher = more, and normed line graphs, as well as variations on each of these for indicating statistical significance and clinical importance. We also included 2 formats for displaying proportions responding to treatment because, even though this approach was not most preferred, it is commonly used as an outcome in comparative research studies. In addition to the bar charts tested here, we also included the pie charts, which were most preferred for the group-level data communicated to patients.

Part 3: Evaluating PRO Presentation Approaches from Part 2

Three papers report the main Part 3 study results, 1 for each of the 3 applications investigated.33-35 In this section, we summarize key aspects of the methods for Part 3 overall, and the findings for each of the 3 applications.

Methods

Study Design and Role of Stakeholders

Part 3 included an internet survey of stakeholder groups, supplemented with one-on-one in-person interviews conducted through the JHCRN. The objectives of Part 3 were to evaluate interpretation accuracy and clarity of the candidate formats developed in Part 2. The SAB met in person for 2 days at the end of Part 2 to review the results with the research team and to discuss the strategy for the Part 3 online survey and interviews. As described above, the SAB's input during these meetings was critical in determining the formats to test in Part 3. We also held a web conference at the end of Part 3 to review the results with the SAB and to discuss next steps. During this web conference, the SAB advised that the evidence base from the project was sufficient to inform the development of consensus standards but not to set standards as is. In response, we applied for funding to support this consensus development.

Study Population and Setting

We conducted the internet survey with adult cancer survivors, oncology clinicians, and PRO researchers (not necessarily cancer focused). Survey participants had to confirm they were aged 21 years or older and were then asked which group they most closely identified with: researcher with experience developing and investigating methods associated with PROs, health care provider to cancer patients, or cancer patient/survivor. We used a snowball sampling technique in which we partnered with our SAB and other contacts to identify email lists and social media accounts through which to circulate our survey. Examples of organizations that helped circulate the survey include Stupid Cancer (survivors), MDRing (clinicians), and the International Society for Quality of Life Research (researchers). The survey invitation encouraged participants to forward the link to others who might be interested and eligible.

There was no target sample size for the online survey, and the instructions informed participants that completion of the survey represented their consent. We offered online survey participants the chance to win $100 gift cards.

The one-on-one in-person interviewees included cancer survivors and clinicians recruited from the JHCRN, excluding individuals who had participated in Parts 1 or 2. The eligibility criteria were the same as for the previous one-on-one interviews, including the purposive sampling approaches. The target sample sizes for the Part 3 interviews were 10 survivors and 10 clinicians for the individual-level data application, 10 patients and 5 clinicians for group-level data presented to patients, and 10 clinicians for group-level data presented to clinicians. As before, all interviewees provided written consent and were informed they would receive a $35 gift card.

Data Collection

We used Qualtrics to administer the internet survey. There were a total of 30 survey versions: 6 versions addressing individual-level data, 6 addressing group-level data presented to patients, and 18 addressing group-level data presented to clinicians. The versions primarily differed in the order in which the candidate PRO data presentation formats were displayed. We describe in more detail the specific formats tested for each of the 3 applications under the Results section, and the appendix includes a sample version from each of the 3 applications. Clinicians and PRO researchers were randomly assigned to any of the 30 versions; patients/survivors were randomized to either the individual-level or the group-level-for-patients versions. Clinician randomization was slightly underweighted for group-level data for patients, given that clinicians are a secondary audience for this application.
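
To make the assignment scheme concrete, the following Python sketch routes respondents to the 30 versions; the weights are illustrative assumptions (the report describes only that survivors saw two of the three applications and that clinicians were slightly underweighted for the patient-facing group-level versions), and the version labels are hypothetical.

    import random

    # Version labels: 6 individual-level, 6 group-level-for-patients, and
    # 18 group-level-for-clinicians versions (30 total), as described above.
    INDIVIDUAL = [f"IND-{i}" for i in range(1, 7)]
    GROUP_PATIENTS = [f"GLP-{i}" for i in range(1, 7)]
    GROUP_CLINICIANS = [f"GLC-{i}" for i in range(1, 19)]

    def assign_version(respondent_type: str, rng: random.Random) -> str:
        """Assign a survey version; the weights below are illustrative only."""
        if respondent_type == "survivor":
            # Survivors see only individual-level or group-level-for-patients versions.
            return rng.choice(INDIVIDUAL + GROUP_PATIENTS)
        if respondent_type == "clinician":
            # Clinicians can see any version, but the patient-facing group-level
            # block is underweighted relative to its 6 versions (weight 3 vs 6 here).
            block = rng.choices(
                [INDIVIDUAL, GROUP_PATIENTS, GROUP_CLINICIANS],
                weights=[6, 3, 18],
            )[0]
            return rng.choice(block)
        # Researchers are assigned uniformly across all 30 versions.
        return rng.choice(INDIVIDUAL + GROUP_PATIENTS + GROUP_CLINICIANS)

    rng = random.Random(2024)
    print([assign_version("clinician", rng) for _ in range(5)])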

For all survey versions, a brief description of the PRO application (ie, individual patient monitoring, patient education/decision aids, peer-reviewed publications) was provided. Participants were informed that the survey would show different ways of displaying hypothetical PRO results. Based on feedback from the Part 2 work groups, only 4 domains were shown for each format: physical function, emotional function, fatigue, and pain. For each candidate format displayed, participants responded to 1 or more questions to evaluate accuracy of interpretation and were asked to rate the clarity on a 4-point scale from “very clear” to “very confusing.” They also had the opportunity to provide free-text comments. A screen notified participants when the format and data were changing. At the end of the survey, participants were asked to select the most useful format and to comment on why. If participants skipped a question, a pop-up invited them to complete it, but allowed them to proceed without answering if they chose.

The one-on-one interviewees completed 1 of the online survey versions in the presence of a research coordinator. Interviewees were asked to verbalize their thought processes while answering the questions, as well as to respond to specific prompts to describe aspects of the presentation formats that were helpful or confusing. The encounters were audio-recorded and transcribed for analysis.

We pilot-tested both the internet survey and the one-on-one interviews before fielding.

Analysis

We summarized the sample characteristics for each of the 3 applications separately. We analyzed the interpretation accuracy questions and clarity ratings descriptively. In addition, we constructed multivariable logistic regression generalized estimating equation models with logit links and exchangeable correlation structures. We modeled both interpretation accuracy and clarity as a function of format and respondent type (survivor, clinician, researcher) and, for accuracy, with fixed effects for the specific accuracy survey questions. In the multivariable models, we counted missing responses as incorrect (because the respondent had not answered the question correctly), which allowed us to retain missing responses in the analysis. We used Fisher's exact and chi-square tests to test for differences in the format selected as most useful. These quantitative analyses were complemented by qualitative analyses of the comments from the online survey and the in-person interview transcripts. For the interview data, similar to Parts 1 and 2, 1 researcher coded the transcripts in Atlas.ti using a codebook based on the format, and a second researcher reviewed the coding. Atlas.ti reports and online comments were organized by format and reviewed to identify themes, which were then discussed by team members.
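
As an illustration of the modeling approach described above (not the study's actual code; file and variable names are hypothetical assumptions), a logistic generalized estimating equation model of this kind could be fit with the statsmodels package as follows.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical long-format file: one row per respondent x accuracy question,
    # with a binary "correct" outcome (missing responses already recoded as 0),
    # the format shown, the respondent type, a question identifier, and a
    # respondent identifier used as the clustering variable.
    data = pd.read_csv("part3_accuracy_long.csv")

    # Logistic GEE with a logit link and an exchangeable working correlation,
    # clustering on respondent: accuracy modeled as a function of format and
    # respondent type, with fixed effects for the specific accuracy questions.
    model = smf.gee(
        "correct ~ C(format) + C(respondent_type) + C(question_id)",
        groups="respondent_id",
        data=data,
        family=sm.families.Binomial(),
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    result = model.fit()
    print(result.summary())  # exponentiated coefficients give odds ratios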

Results

The Part 3 study population, comparators and survey design, and findings are summarized for each of the 3 applications separately.

Individual-Level Data

Study population

A total of 1113 online respondents were randomized to evaluate individual-level data formats: 627 survivors, 236 clinicians, and 250 researchers (Table 6).

Table 6. Part 3 Individual-Level Data Online Survey Participant Characteristics (Adapted From Snyder et al).

  • Survivors were a mean age of 59 years, 85% female, and 96% White. Breast cancer was the most common diagnosis (56%), and 49% were within 5 years of diagnosis. There were 20% who had not graduated from college.
  • Clinicians were a mean age of 45 years, 58% female, and 73% White. They had been in practice for 17 years on average, and 44% were medical oncologists.
  • Researchers were a mean age of 46 years, 72% female, and 85% White. They most commonly reported expertise in PRO assessment, psychology, or sociology (51%), and 45% had > 10 years' experience.

The in-person interviewees included 10 survivors (3 breast cancer survivors, 7 from Johns Hopkins, and 3 with less than a college degree). The 10 clinician interviewees included ≥ 1 participant from each medical specialty; 4 were from Johns Hopkins.

Comparators and survey design

Based on the Part 1 and 2 findings, we tested 3 line graph formats in Part 3: green-shaded normal range, red-circled potentially concerning scores, and red threshold lines between normal and concerning scores (Figures 3a-3c). The formats were presented in 3 different orders depending on the version, such that each format was presented first, second, or third for a third of respondents.

Figure 3a. Green-Shaded Normal Range-"More" Directionality (Adapted From Snyder et al).

Figure 3b. Red-Circled Concerning Scores-"More" Directionality (Adapted From Snyder et al).

Figure 3c. Threshold Line for Possibly Concerning Scores-"More" Directionality (Adapted From Snyder et al).

Further, half the respondents were shown line graphs in which higher scores indicated “more” of what is measured (better for function, worse for symptoms) and half were shown line graphs in which higher scores always indicated better outcomes (example in Figure 3d). Thus, there were a total of 6 survey versions (3 format orders × 2 approaches to directionality).

Figure 3d. "Better" Directionality (Example Using Threshold Line for Possibly Concerning Scores) (Adapted From Snyder et al).

For the first format seen by participants, 2 interpretation accuracy questions evaluated directionality (eg, did the patient's ability to do physical activities get better or worse), 1 question asked participants to select which of the domains shown “have changed by at least 10 points (select all that apply),” and 1 question asked participants to select which of the domains “are possibly concerning? (select all that apply).” For each of the second and third formats, there was 1 question regarding directionality and 1 question regarding possibly concerning scores (“select all that apply” format).

Because of the "select all that apply" questions, there were a total of 12 accuracy questions for the first format seen by participants and 24 accuracy questions across the 3 formats. The underlying data changed for each of the 3 formats (ie, the data for the second format seen differed from the data for the first format seen), and as noted above, a dividing screen warned of the change. However, the underlying data and questions were always the same across all versions for the first, second, and third formats seen. Thus, the only difference was the format used to display the results. This enabled us to compare formats without confounding by differences in question difficulty or differences in the underlying data.

Findings

Accuracy of interpretation for clinical importance was generally high across the 3 formats and 2 directionalities. Survivors identified the highlighted areas of concern accurately 67% to 84% of the time (depending on the item) for green shading, 67% to 86% for red circles, and 53% to 85% for threshold lines (Table 7). They identified changes ≥ 10 points accurately 74% to 82% of the time for green shading, 80% to 83% for red circles, and 76% to 83% for threshold lines. Clinicians identified the highlighted areas of concern accurately 67% to 90% of the time for green shading, 64% to 98% of the time for red circles, and 65% to 90% for threshold lines. They identified changes ≥ 10 points accurately 82% to 87% of the time for green shading, 80% to 98% for red circles, and 80% to 85% for threshold lines. Researchers identified the highlighted areas of concern accurately 78% to 95% of the time for green shading, 76% to 98% for red circles, and 66% to 93% for threshold lines. They identified changes ≥ 10 points accurately 85% to 93% of the time for green shading, 86% to 98% for red circles, and 86% to 93% for threshold lines.

Table 7. Part 3 Individual-Level Data: Accuracy of Interpretation for Clinical Importance (Adapted From Snyder et al).

Table 8. Part 3 Individual-Level Data: Accuracy of Interpretation for Directionality (Adapted From Snyder et al).

In the multivariable models comparing accuracy of interpretation (Table 9), the only statistically significant difference by format (ie, green shading, red circles, threshold lines) was an advantage of red circles over green shading for the first format seen (odds ratio [OR], 1.29; P = .05). In terms of directionality, having higher scores consistently indicate better outcomes was more accurately interpreted than having higher scores indicate "more" of what is measured across all format questions (OR, 1.30; P = .009).

Table 9. Part 3 Individual-Level Data: Multivariable Model Results for Accuracy of Interpretation and Clarity Ratings.

Clarity ratings were also high across formats (Table 10). The proportion of survivors rating each format either “somewhat” or “very” clear ranged from 83% for “more” green shading to 90% for “better” red circles. For clinicians, the range was 75% for “more” threshold lines to 85% for “more” red circles and “better” threshold lines. For researchers, the range was 79% for “more” red circles to 93% for “better” threshold lines. In the multivariable models (Table 9), the threshold lines were more likely to be rated “very” clear than the green shading (OR, 1.43; P < .0001) and the red circles (OR, 1.22; P = .03). Compared with the “more” directionality, “better” directionality was more likely to be rated “somewhat” or “very” clear (OR, 1.39; P = .002) and “very” clear (OR, 1.36; P < .0001).

Table 10. Part 3 Individual-Level Data: Clarity Ratings.

In the in-person interviews and online comments, respondents described the green shading as “user friendly, easy on the eye and brain” and noted “with the shading, it was more apparent even without reading indicators … which areas were good, which areas were bad.” However, other respondents did not understand the shading (“not sure why there is green shading. Does this represent something?”) or thought it made more sense to shade the concerning scores (“like the ‘danger zone’”). Respondents considered the red circles an “awesome … obvious” way to indicate problems; they felt the circle format “instantly conveys ‘Hey, warning—look at this!’” Again, some participants did not understand the meaning of the red circles, and noted that they do not indicate how far from the normal range a score is. Others found the circles to be a “distraction” or were concerned that they could “cause alarm.” Respondents considered the threshold lines to be “very easy to read and understand” and “idiot-proof.” However, others thought having a single cutoff between normal and concerning scores was arbitrary, and some who saw the “more” directionality complained that “you have to pay attention to … where the instructions say results above or below.”

Overall, across respondent types (survivors, clinicians, researchers) and the directionalities (“more” and “better”), the threshold lines were most often selected as most useful (Table 11).

Table 11. Part 3 Individual-Level Data: Proportion Selecting Each Format “Most Useful”.

Differences in the preference percentages were statistically significant, with the exception of survivors who saw the “more” directionality.

Group-Level Data for Presentation to Patients

Study population

A total of 1017 online respondents were randomized to evaluate group-level data formats for presentation to patients: 629 survivors, 139 clinicians, and 249 PRO researchers (Table 12).

Table 12. Part 3 Group-Level for Patients: Online Survey Participant Characteristics (Adapted From Tolbert et al).

  • Survivors had a mean age of 58 years; 87% were female and 94% were White. Breast cancer was the most common diagnosis (56%), 46% were within 5 years of diagnosis, and 23% had less than a college degree.
  • Clinicians had a mean age of 44 years; 54% were female and 70% were White. They had been in practice for 16 years on average, and 44% were medical oncologists.
  • Researchers had a mean age of 45 years; 67% were female and 79% were White. They most commonly reported expertise in PRO assessment, psychology, or sociology (54%), and 46% had more than 10 years' experience.

The in-person interviewees included 10 survivors (3 breast cancer survivors, 3 from Johns Hopkins, and 3 with less than a college degree). The 5 clinician interviewees included an oncology fellow; a nurse practitioner; and surgical, radiation, and medical oncologists. Four of the 5 were from Johns Hopkins.

Comparators and survey design

The formats tested for group-level data presented to patients included 3 different proportion formats (pie charts, bar graphs, and icon arrays) (Figures 4a-4c) and 3 different line graphs of mean scores over time (higher = “more” of the outcome, higher = “better” for all outcomes, normed to the US general population) (Figures 4d-4f). Respondents were shown all 3 proportion formats, with the order randomized so that each format was shown first, second, and third for a third of respondents. Respondents saw only 1 version of the line graphs: a third of the population saw line graphs with higher = “more,” a third higher = “better,” and a third normed. Half the respondents saw the line graphs before the proportion formats and half saw them after. Thus, there were a total of 6 survey versions (3 proportion format orders, with line graphs shown either before or after).
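The 6-version scheme can be made concrete with a short sketch. This is illustrative only, not the authors' survey software; the labels come from the text, and the assumption that the 3 proportion orders are simple rotations (each format shown first exactly once) is ours.

```python
# Illustrative enumeration of the 6 survey versions for the patient
# group-level application: 3 proportion-format orders x 2 positions of the
# line graphs (shown before or after the proportion formats).
from itertools import product

proportion_orders = [
    ("pie", "bar", "icon_array"),
    ("bar", "icon_array", "pie"),
    ("icon_array", "pie", "bar"),
]
line_graph_positions = ["line_graphs_first", "line_graphs_last"]

versions = list(product(proportion_orders, line_graph_positions))
print(len(versions))  # 6
for order, position in versions:
    print(position, order)
```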

Figure 4a. Pie Charts: Group-Level Data for Patients (Adapted From Tolbert et al).

Figure 4b. Bar Graphs: Group-Level Data for Patients (Adapted From Tolbert et al).

Figure 4c. Icon Arrays: Group-Level Data for Patients (Adapted From Tolbert et al).

Figure 4d. “More” Line Graphs: Group-Level Data for Patients (Adapted From Tolbert et al).

Figure 4e. “Better” Line Graphs: Group-Level Data for Patients (Adapted From Tolbert et al).

Figure 4f. Normed Line Graphs: Group-Level Data for Patients (Adapted From Tolbert et al).

Two accuracy of interpretation questions were asked for the first proportion format seen and 1 each for the second and third proportion formats seen (eg, “At 9 months, on which treatment did more patients improve with regard to doing PHYSICAL activities?”). As with the individual-level application, the underlying data and questions were always the same for a given position (first, second, or third format seen) but differed between positions, with a dividing screen warning of the change between formats.

Similarly, regardless of the line graph format shown, the underlying data and survey questions were the same. A total of 3 accuracy questions were asked about the line graphs (eg, “At 12 months, on which treatment do patients report better EMOTIONAL well-being?”).

Findings

For the 3 proportions tested (pie charts, bar graphs, icon arrays), all respondents were most likely to respond correctly to the first 2 accuracy questions if they saw the pie charts (Table 13).

Table 13. Part 3 Group-Level for Patients: Accuracy of Interpretation-Proportions.

Specifically, among survivors, 77% answered the first 2 questions correctly if they saw pie charts, vs 51% for bar graphs and 75% for icon arrays. The differences were more distinct among clinicians (91% pie charts, 70% bar graphs, 79% icon arrays) and researchers (94% pie charts, 66% bar graphs, 84% icon arrays). In the multivariable models (Table 14), bar graphs were statistically significantly less likely than pie charts to be interpreted correctly, both when they were the first format seen (OR, 0.22; P < .0001) and across all format questions (OR, 0.39; P < .0001). Bar graphs were also less accurately interpreted than icon arrays when they were the first format seen (OR, 0.30; P < .0001) and across all format questions (OR, 0.47; P < .0001). No statistically significant differences existed between pie charts and icon arrays.

Table 14. Part 3 Group Level for Patients: Multivariate Model Results for Accuracy of Interpretation and Clarity Ratings Proportions (Adapted From Tolbert et al).

Pie charts were also most likely to be rated “somewhat” or “very” clear (Table 15): 90% of survivors (vs 73% for bar graphs and 55% for icon arrays), 91% of clinicians (vs 73% for bar graphs and 71% for icon arrays), and 87% of researchers (vs 83% for bar graphs and 64% for icon arrays). In the multivariable models, bar graphs and icon arrays were consistently less likely to be rated clear than pie charts, and bar graphs were more likely to be rated clear than icon arrays (Table 14). From the in-person interviews, respondents noted that pie charts were “easiest to read” and made it “immediately clear which treatment is better,” although some found them “difficult to interpret.” Among positive comments on the bar graphs, respondents described them as “easy to compare treatments side by side” and “very crisp, visually clean,” with negative comments noting “you have to concentrate to ascertain what they mean” and that they are “too clinical looking for the everyday patient.” Finally, respondents described the icon arrays as “cute and pleasant” and valued them for “represent[ing] people, which too often gets lost in looking at cancer statistics.” On the negative side, others noted that they looked “overwhelming” and the challenges of having “to sit and count the little people.”

Table 15. Part 3 Group-Level for Patients: Clarity Ratings for Proportions.

Overall, pie charts were most likely to be selected as “most useful”: 64% of survivors, 44% of clinicians, and 39% of researchers (Table 16). The differences in preference percentages were statistically significant.

Table 16. Part 3 Group-Level for Patients: Proportion Selecting Each Proportion Format “Most Useful”.

Among the line graphs, survivors and clinicians were most likely to answer all 3 accuracy questions correctly if they saw the version in which higher scores always indicate better outcomes: 56% of survivors (vs 41% for “more” and 40% for normed) and 70% of clinicians (vs 65% for “more” and 65% for normed). Researchers were more likely to answer all 3 accuracy questions correctly for higher scores indicating “more” of what is measured: 75% (vs 65% for “better” and 40% for normed) (Table 17). In the multivariable models, “better” line graphs were more likely to be interpreted accurately than “more” line graphs (OR, 1.43; P = .01) and normed line graphs (OR, 1.88; P = .04).

Table 17. Part 3 Group-Level for Patients: Accuracy of Interpretation-Line Graphs.

Normed line graphs were interpreted less accurately than “more” line graphs (OR, 0.76; P = .04) (Table 18).

Table 18. Part 3 Group-Level for Patients: Multivariate Model Results for Accuracy of Interpretation and Clarity Ratings-Line Graphs (Adapted From Tolbert et al).

Survivors, clinicians, and researchers who saw the “better” line graphs were more likely to rate them “somewhat” or “very” clear (84% of survivors, 81% of clinicians, 85% of researchers) than were those who saw the “more” line graphs (78%, 77%, and 78%, respectively) or the normed line graphs (77%, 78%, and 82%, respectively) (Table 19). Clarity ratings were statistically significantly better for respondents who saw the “better” line graphs than for those who saw the “more” line graphs (OR, 1.51; P = .05); none of the other differences in clarity ratings were statistically significant (Table 18). In the qualitative data, across the line graph formats, respondents made many positive comments about the depiction of trends over time: “Could show, for example if the relative differences are getting farther apart.” Respondents made numerous comments regarding the “more” line graphs and the change in interpretation of higher scores between function and symptom domains: “The fact that the positive/negative scale changes between functioning and symptoms … makes error much, much more likely”; “Directional changes … can lead to interpretation errors if someone views the whole set quickly.” However, there were also some comments about the way the scoring was flipped on the “better” line graphs such that lines going up always indicated better outcomes even when the numerical score was lower. Finally, some respondents who saw the normed version found it confusing. Because respondents saw only 1 version of the line graphs, there was no “overall preference” question for this comparison.

Table 19. Part 3 Group-Level for Patients: Clarity Ratings for Line Graphs.

Group-Level Data for Presentation to Clinicians

Study population

A total of 481 online respondents were randomized to evaluate group-level data formats for presentation to clinicians: 233 clinicians and 248 researchers (Table 20).

Table 20. Part 3 Group-Level for Clinicians: Online Survey Participant Characteristics (Adapted From Brundage et al).

  • Clinicians had a mean age of 45 years; 55% were female and 74% were White. They had been in practice for 17 years on average, and 55% were medical oncologists.
  • Researchers had a mean age of 44 years; 63% were female and 79% were White. They most commonly reported expertise in PRO assessment, psychology, or sociology (38%), and 38% had more than 10 years' experience.

The 10 in-person clinician interviewees included 1 medical oncologist, 1 radiation oncologist, 1 urologist, 1 oncology nurse practitioner, 3 surgical oncologists, and 3 oncology fellows; 5 were from Johns Hopkins.

Comparators and survey design

The formats tested for group-level data presented to clinicians included 2 different proportion formats (pie charts, bar graphs) (Figures 5a-5b) and 9 different line graphs of mean scores over time (3 directionalities with 3 variations each, described below). In all cases, the line graphs were shown before the proportion formats; half the respondents saw the pie charts first and half saw the bar graphs first. Respondents saw only 1 directionality version of the line graphs: a third of the population saw line graphs with higher = “more,” a third higher = “better,” and a third normed.

Figure 5a. Pie Charts: Group-Level Data for Clinicians (Adapted From Brundage et al).

Figure 5b. Bar Graphs: Group-Level Data for Clinicians (Adapted From Brundage et al).

However, 3 different variations in these line graph versions were included: plain (P values only; no confidence limits or clinical significance), clinically significant differences indicated by an asterisk (but no confidence limits), and confidence limits in addition to the asterisk indicating clinical significance (Figures 5c-5g). The order in which the plain, clinical significance, and confidence limit line graphs were displayed was randomized such that each was shown first, second, or third, for a third of the sample. Thus, there was a total of 18 survey versions: 3 line graph types (“more,” “better,” normed) by 3 orders of line graph variations (plain, clinical significance, confidence limits) by 2 orders of proportions (pie charts or bar graphs first). As with the other applications, the underlying data were the same across orders; only the formats differed.
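A short sketch can make the 18-version factorial concrete. This is illustrative only, not the authors' survey software; the labels come from the text, and the assumption that the 3 variation orders are rotations (each variation shown first exactly once) is consistent with the stated 3 × 3 × 2 = 18 count but is our own.

```python
# Illustrative enumeration of the 18 clinician survey versions:
# 3 line graph directionalities x 3 orders of the line graph variations
# x 2 proportion-format orders.
from itertools import product

directionalities = ["more", "better", "normed"]
variation_orders = [
    ("plain", "clinical_significance", "confidence_limits"),
    ("clinical_significance", "confidence_limits", "plain"),
    ("confidence_limits", "plain", "clinical_significance"),
]
proportion_orders = [("pie_first", "bar_second"), ("bar_first", "pie_second")]

versions = list(product(directionalities, variation_orders, proportion_orders))
assert len(versions) == 18
```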

Figure 5c. “More” Line Graphs: Group-Level Data for Clinicians (Plain Version) (Adapted From Brundage et al).

Figure 5d. “More” Line Graphs: Group-Level Data for Clinicians (Clinical Significance Version) (Adapted From Brundage et al).

Figure 5e. “More” Line Graphs: Group-Level Data for Clinicians (Confidence Limit Version) (Adapted From Brundage et al).

Figure 5f. Example of “Better” Line Graphs: Group-Level Data for Clinicians (Confidence Limit Version) (Adapted From Brundage et al).

Figure 5g. Example of Normed Line Graphs: Group-Level Data for Clinicians (Confidence Limit Version) (Adapted From Brundage et al).

Similarly, for the most part, the accuracy questions asked were based on the order of presentation (ie, 2 “standard” accuracy questions were asked for the first line graph format seen [eg, “On which treatment do patients report worse FATIGUE over time?”]). In addition, questions specific to each line graph variation were asked regardless of when that variation fell in the order. For example, a question about domains with clinically important differences between treatments at 9 months was asked whenever the clinical significance variation was shown, and an additional question regarding the time points at which the differences between treatments were statistically significant was asked whenever the confidence limit variation was shown. Notably, while publications tend to show error bars on the treatment curves, as we have done in this report, evaluating the degree of overlap can only hint at statistical significance; confidence intervals for the difference between treatments are preferred.

In summary, if the confidence limit format was shown first, 4 different accuracy questions were asked (2 standard questions, 1 on clinical importance, 1 on statistical significance), whereas if the plain line graph was shown first, only the 2 standard accuracy questions were asked. This design enabled us to compare the accuracy of interpretation across the “more,” “better,” and normed formats, as well as to detect differences in accuracy of interpretation with the addition of clinical significance with or without confidence limits.
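The question-assignment rule described above can be summarized in a short sketch. The function and question labels are hypothetical illustrations of the rule, not the survey instrument itself.

```python
# Hypothetical sketch of the rule for which accuracy questions accompany
# each line graph variation, depending on the order in which the
# variations are shown. Labels and function name are illustrative only.
def accuracy_questions(variation_order):
    plan = {}
    for position, variation in enumerate(variation_order):
        questions = []
        if position == 0:
            # The 2 "standard" questions are asked only for the first
            # line graph format seen.
            questions += ["standard_1", "standard_2"]
        if variation in ("clinical_significance", "confidence_limits"):
            # Clinical importance question asked whenever clinical
            # significance (asterisks) is displayed.
            questions.append("clinical_importance")
        if variation == "confidence_limits":
            # Statistical significance question asked whenever confidence
            # limits are displayed.
            questions.append("statistical_significance")
        plan[variation] = questions
    return plan

# Confidence limit variation shown first: 4 questions for that graph.
print(accuracy_questions(("confidence_limits", "plain", "clinical_significance")))
```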

For the proportions, we asked 2 accuracy questions on the first format seen and 2 on the second format seen (eg, “At 9 months, on which treatment did more patients worsen with regard to PAIN?”).

Findings

For the proportions, clinicians who saw pie charts first were more likely to respond correctly to the first 2 accuracy questions than those who saw bar graphs first (25% vs 19%), although the opposite was true for researchers (11% for pie charts vs 16% for bar graphs) (Table 21). The overall accuracy rate was low because 1 of the questions asked which treatment was better when the P value was .10; most respondents selected the treatment that was better in absolute terms rather than “about the same,” which was the technically correct answer given the nonsignificant P value. In the multivariable models (Table 22), no statistically significant differences in interpretation accuracy existed between pie charts and bar graphs. In terms of clarity ratings, 74% of clinicians and 71% of researchers rated pie charts “somewhat” or “very” clear, and 70% of clinicians and 71% of researchers rated bar graphs “somewhat” or “very” clear (Table 23); these differences were not statistically significant in the multivariable models (Table 22). In the qualitative interviews, several respondents were very negative about pie charts: “I hate pie graphs”; “Pie graphs like these are simpler to read but don't look as ‘scientific.’” However, others noted “the pie chart is much easier to understand … there are only 3 ways to stratify patients worse and improved and about the same and that they add up to about 100. On the bar graphs it just took me a while to understand what was really being displayed.” Bar graphs were described as “boring,” although other respondents liked them: “it's just easier to look at the bar graphs and appreciate what's bigger, what's smaller, and what's the same in a more simplistic fashion.” Overall, clinicians were evenly split between pie charts and bar graphs; researchers preferred bar graphs (56% vs 44%), but the difference was not statistically significant (Table 24).

Table 21. Part 3 Group-Level for Clinicians: Accuracy of Interpretation-Proportions.

Table 22. Part 3 Group-Level for Clinicians: Multivariate Model Results for Accuracy of Interpretation and Clarity Ratings-Proportions.

Table 23. Part 3 Group-Level for Clinicians: Clarity Ratings for Proportions.

Table 24. Part 3 Group-Level for Clinicians: Proportion Selecting Each Proportion Format “Most Useful”.

Among the line graphs, clinicians and researchers were most likely to answer both “standard” accuracy questions correctly if they saw the version in which higher scores always indicate better outcomes: 68% of clinicians (vs 62% for “more” line graphs and 61% for normed line graphs) and 68% of researchers (vs 64% for “more” and 54% for normed) (Table 25). In the multivariable models, “better” line graphs were more likely to be interpreted accurately than normed line graphs (OR, 1.55; P = .04) (Table 26). Table 27 displays the clarity ratings for clinicians and researchers randomized to “more,” “better,” or normed line graphs. In the multivariable models, the normed versions were less likely to be rated “somewhat” or “very” clear (OR, 0.61; P = .005) and “very” clear (OR, 0.66; P = .005) compared with “more”; the “better” versions were more likely to be rated “somewhat” or “very” clear (OR, 1.53; P = .01) and “very” clear (OR, 1.91; P < .0001) compared with the normed (Table 26).

Table 25. Part 3 Group-Level for Clinicians: Accuracy of Interpretation-Line Graphs.

Table 26. Part 3 Group-Level for Clinicians: Multivariate Model Results for Accuracy of Interpretation and Clarity Ratings-Line Graphs.

Table 27. Part 3 Group-Level for Clinicians: Clarity Ratings for Line Graphs.

Respondents' comments on the different line graph versions echo themes previously identified. Respondents who saw the “more” versions were confused by the changing directionality: “whether it's physical or fatigue is in one graph lower and in one graph higher … is somewhat confusing and requires very close attention to detail to figure out what the true comparison is.”

Across clinicians and researchers, and regardless of randomization to “better,” “more,” or normed versions, the variation with asterisks indicating clinical significance was most likely to be rated “somewhat” or “very” clear, compared with the plain or confidence limit variations (Table 27). In the qualitative comments, respondents valued additional statistical information, with some preferring asterisks only (“I believe the asterisk format is the easiest in showing patient results without the confidence intervals”) and others preferring confidence limits in addition (“I feel that they offer more statistical information that is helpful to the clinician”). Several respondents commented that using asterisks to indicate clinical importance was confusing since asterisks are so commonly used for statistical significance. Others questioned how clinical importance was determined. Overall, clinicians and researchers preferred the confidence limit variation (52% and 49%, respectively) (Table 28). These differences in preference percentages were statistically significant.

Table 28. Part 3 Group-Level for Clinicians: Proportion Selecting Each Line Graph Variation “Most Useful”.

Discussion

Context for Study Results

We conducted a 3-part mixed-methods study to investigate approaches for presenting PRO data to patients and clinicians to promote understanding and use. Part 1 identified aspects of current presentation formats that were helpful or confusing. Both survivors and clinicians preferred line graphs for displaying individual-level patient data. For presenting group-level data, clinicians valued additional statistical information (eg, P values, confidence limits), but survivors found this information confusing. This finding is consistent with one of the themes that emerged from our Part 2 literature review (described above)—that patients tend to prefer simpler graphs than clinicians do. Thus, in Parts 2 and 3, we addressed presentation of group-level data to patients and clinicians separately. Part 2 involved an innovative, iterative, stakeholder-engaged approach to develop improved presentation formats, which we then tested in Part 3.

Part 3 involved a large, broad-based internet survey of key stakeholder groups. Findings suggest that for individual-level data, having higher scores always indicate better outcomes improved accuracy and clarity, and threshold lines had better clarity ratings than the alternatives. The “better” line graph formats were also more accurately interpreted and more likely to be rated clear for group-level data presented to patients. For group-level data presented to clinicians, the normed versions were inferior, with no clear winner between the “better” and “more” versions. Across applications, descriptive labels for the scores on the y-axis were considered helpful, as were explanations of how to interpret upward-trending lines. For displaying proportion data, pie charts were more accurately interpreted and more likely to be rated clear for presentation to patients; no significant differences existed between pie charts and bar graphs for presentation to clinicians. The proportion formats tested were intended to convey “gist” rather than “verbatim” estimates,36 which may have made the precision of icon arrays seem less useful in this context.

As noted in the Part 2 literature review, few studies have investigated the best approaches for presenting PRO data to patients and clinicians to promote understanding and use; our study contributes to the science through its strong evidence base regarding the accuracy of interpretation, clarity, and preferences of different PRO data display formats.

Strengths, Limitations, and Generalizability

The findings from this study should be interpreted within the context of the strengths and limitations of its design and conduct. The application of mixed methods was particularly valuable to the interpretation of our results: we know not only how participants answered our questions but also why they answered that way. To recruit as broadly as possible, we conducted the study through the JHCRN, enabling recruitment of diverse survivor and clinician populations. However, the Part 3 internet survey used a convenience sample of stakeholder groups and resulted in a cancer survivor population that was predominantly White, female, from the United States, and highly educated. While caution should be used when generalizing the internet survey results to more diverse populations, the in-person interviews conducted in Parts 1 through 3 enabled purposive sampling, including for lower education. All study parts were conducted in English, limiting the generalizability of the findings to non-English speakers. Our cancer survivor samples had a median age in the 60s; this is consistent with the cancer survivor population overall, whose median age at diagnosis is 66,37 but may not reflect younger populations. We have reported the results for survivors and clinicians (and, for Part 3, researchers) separately. However, even for the Part 3 online survey, given the 30 different survey versions and the homogeneity of our sample, sample sizes limited our ability to conduct subgroup analyses. Nevertheless, these 30 versions, to which participants were randomly assigned, enabled us to control for order effects, thus minimizing bias in our findings. This study was conducted in a cancer setting, and while we expect the results to be generalizable to other disease groups, this requires further study. It would also be helpful to explore whether these findings are replicated in populations with greater diversity in age, race, sex, education level, country of origin, and language.

Another strength of our study was the stakeholder engagement, including the implementation of several novel approaches. The research questions were motivated by previous feedback from patients and clinicians; our SAB was involved in helping us design the data collection instruments, reviewing the results, and discussing next steps. They also helped with recruiting participants for the Part 3 online survey. Part 2 used the innovative approach of recruiting volunteers from Part 1 to partner with the research team to review the results and develop improved presentation formats.

Future Research

The findings from this research agenda, along with data from other, smaller studies,32,38,39 provide a valuable evidence base for developing recommendations for PRO data display. We have been funded by PCORI to conduct a modified-Delphi process with key stakeholders to review the findings from this study, and other research, to develop recommended best practices for PRO data display.

Conclusions

Patient-reported outcomes have enormous potential to promote patient-centered care, but patients and clinicians need to be able to interpret score meaning for this potential to be realized. For individual-level data, the results from this study, conducted with cancer survivors and clinicians, support having higher scores always indicate better outcomes and using threshold lines to indicate possibly concerning results. For group-level data, “better” line graph formats are preferred for patients, and for clinicians the findings warn against presenting normed scores. Across applications, descriptive labels along the y-axis and for directionality are recommended. Pie charts are the preferred approach for presenting proportion data to patients, although clinicians had no clear preference. The results from this study, along with other, smaller studies, will now be used as part of a modified-Delphi process to recommend best practices for PRO data display. Additional research in noncancer populations is needed to evaluate the generalizability of these findings to other disease areas.

References

1.
US Food and Drug Administration. Guidance for industry. Patient reported outcome measures: use in medical product development to support labeling claims. Federal Register. 2009;74(35):65132-65133.
2.
Acquadro C, Berzon R, Dubois D, et al. Incorporating the patient's perspective into drug development and communication: an ad hoc task force report of the Patient-Reported Outcomes (PRO) Harmonization Group meeting at the Food and Drug Administration, February 16, 2001. Value Health. 2003;(6):522-531. [PubMed: 14627058]
3.
Greenhalgh J. The applications of PROs in clinical practice: what are they, do they work, and why? Qual Life Res. 2009;(18):115-123. [PubMed: 19105048]
4.
Snyder CF, Aaronson NK, Choucair AK, et al. Implementing patient-reported outcomes assessment in clinical practice: a review of the options and considerations. Qual Life Res. 2012;(21):1305-1314. [PubMed: 22048932]
5.
Snyder CF, Aaronson NK. Use of patient-reported outcomes in clinical practice. Lancet. 2009;(374):369-370. [PubMed: 19647598]
6.
Greenhalgh J, Meadows K. The effectiveness of the use of patient-based measures of health in routine practice in improving the process and outcomes of patient care: a literature review. J Eval Clin Pract. 1999;(5):401-416. [PubMed: 10579704]
7.
Marshall S, Haywood K, Fitzpatrick R. Impact of patient-reported outcome measures on routine practice: a structured review. J Eval Clin Pract. 2006;(12):559-568. [PubMed: 16987118]
8.
Haywood K, Marshall S, Fitzpatrick R. Patient participation in the consultation process: a structured review of intervention strategies. Patient Educ Couns. 2006;(63):12-23. [PubMed: 16406464]
9.
Velikova G, Booth L, Smith AB, et al. Measuring quality of life in routine oncology practice improves communication and patient well-being: a randomized controlled trial. J Clin Oncol. 2004;(22):714-724. [PubMed: 14966096]
10.
Berry DL, Blumenstein BA, Halpenny B, et al. Enhancing patient-provider communication with the Electronic Self-Report Assessment for Cancer: a randomized trial. J Clin Oncol. 2011;(29):1029-1035. [PMC free article: PMC3068053] [PubMed: 21282548]
11.
Santana M-J, Feeny D, Johnson JA, et al. Assessing the use of health-related quality of life measures in the routine clinical care of lung-transplant patients. Qual Life Res. 2010;(19):371-379. [PubMed: 20146009]
12.
Detmar SB, Muller MJ, Schornagel JH, Wever LDV, Aaronson NK. Health-related quality-of-life assessments and patient-physician communication. a randomized clinical trial. JAMA. 2002;(288):3027-3034. [PubMed: 12479768]
13.
Cleeland CS, Wang XS, Shi Q, et al. Automated symptom alerts reduce postoperative symptom severity after cancer surgery: a randomized controlled trial. J Clin Oncol. 2011;(29):994-1000. [PMC free article: PMC3068055] [PubMed: 21282546]
14.
McLachlan SA, Allenby A, Matthews J, et al. Randomized trial of coordinated psychosocial interventions based on patient self-assessment versus standard care to improve the psychosocial functioning of patients with cancer. J Clin Oncol. 2001;(19):4117-4125. [PubMed: 11689579]
15.
Basch E, Deal AM, Kris MG, et al. Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial. J Clin Oncol. 2016;(34):557-565. [PMC free article: PMC4872028] [PubMed: 26644527]
16.
Jensen RE, Snyder CF, Abernethy AP, et al. A review of electronic patient reported outcomes systems used in cancer clinical care. J Oncol Pract. 2014;(10):e215-e222. [PMC free article: PMC4094646] [PubMed: 24301843]
17.
Wu AW, Jensen RE, Salzburg C, Snyder C. Advances in the use of patient reported outcome measures in electronic health records: including case studies. Landscape Review prepared for: PCORI National Workshop to Advance the Use of PRO measures in Electronic Health Records. November 19-20, 2013, Atlanta, GA. -Workshop-EHR-Landscape-Review-111913.pdf. Accessed May 6, 2016.
18.
Till JE, Osoba D, Pater JL, Young JR. Research on health-related quality of life: dissemination into practical applications. Qual Life Res. 1994;3(4):279-283. [PubMed: 7812281]
19.
Au H-J, Ringash J, Brundage M, et al. Added value of health-related quality of life measurement in cancer clinical trials: the experience of the NCIC CTG. Expert Rev Pharmacoecon Outcomes Res. 2010;10(2):119-128. [PubMed: 20384559]
20.
Stacey D, Légaré F, Col NF, et al. Decision aids for people facing health treatment or screening decisions. Cochrane Database Syst Rev. 2014;(1):CD001431. doi:10.1002/14651858.CD001431.pub5 [PubMed: 24470076] [CrossRef]
21.
Brundage M, Bass B, Ringash J, Foley K. A knowledge translation challenge: clinical use of quality of life data from cancer clinical trials. Qual Life Res. 2011;(20):979-985. [PubMed: 21279446]
22.
Bezjak A, Ng P, Skeel R, Depetrillo AD, Comis R, Taylor KM. Oncologists' use of quality of life information: results of a survey of Eastern Cooperative Oncology Group physicians. Qual Life Res. 2001;(10):1-13. [PubMed: 11508471]
23.
Online support for clinical outcomes assessments. ePROVIDE website. Accessed May 6, 2016. https://eprovide.mapi-trust.org/
24.
Snyder CF, Jensen R, Courtin SO, Wu AW. PatientViewpoint: a website for patient-reported outcomes assessment. Qual Life Res. 2009;(18):793-800. [PMC free article: PMC3073983] [PubMed: 19544089]
25.
Jones JB, Snyder CF, Wu AW; Website for Outpatient QOL Assessment Research Network. Issues in the design of internet-based systems for collecting patient-reported outcomes. Qual Life Res. 2007;(16):1407-1417. [PubMed: 17668293]
26.
Abernethy AP, Wheeler JL, Zafar SY. Management of gastrointestinal symptoms in advanced cancer patients: the rapid learning cancer clinic model. Curr Opin Support Palliat Care. 2010;(4):36-45. [PMC free article: PMC2871247] [PubMed: 19952928]
27.
PROMIS software demonstration. HealthMeasures system website. Accessed May 6, 2016. http://nihpromis.org/software/demonstration
28.
Brundage M, Bass B, Davidson J, et al. Patterns of reporting health-related quality of life outcomes in randomized clinical trials: implications for clinicians and quality of life researchers. Qual Life Res. 2011;(20):653-664. [PubMed: 21110123]
29.
Brundage MD, Smith KC, Little EA, Bantug ET, Snyder CF; PRO Data Presentation Stakeholder Advisory Board. Communicating patient-reported outcome scores using graphic formats: results from a mixed methods evaluation. Qual Life Res. 2015;(24):2457-2472. [PMC free article: PMC4891942] [PubMed: 26012839]
30.
Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst. 1993;(85):365-376. [PubMed: 8433390]
31.
Smith KC, Brundage MD, Tolbert E, et al. Engaging stakeholders to improve presentation of patient-reported outcomes data in clinical practice. Support Care Cancer. 2016;(24):4149-4157. [PubMed: 27165054]
32.
Bantug ET, Coles T, Smith KC, et al. Graphical displays of patient-reported outcomes (PRO) for use in clinical practice: what makes a PRO picture worth a thousand words? Patient Educ Couns. 2016;(99):483-490. [PubMed: 26603445]
33.
Snyder C, Smith K, Bantug E, et al. What do these scores mean? presenting patient-reported outcomes data to patients and clinicians to improve interpretability. Cancer. 2017;(123):1848-1859. [PMC free article: PMC5419857] [PubMed: 28085201]
34.
Tolbert E, Brundage M, Bantug E, et al. Picture this: presenting patient-reported outcome research study results to patients. Med Decis Making. 2018;38(8):994-1005. [PMC free article: PMC6221949] [PubMed: 30132393]
35.
Brundage M, Tolbert E, Blackford A, et al. Presenting comparative study PRO results to clinicians and researchers: beyond the eye of the beholder. Qual Life Res. 2018;27(1):75-90. [PMC free article: PMC5770492] [PubMed: 29098606]
36.
Corbin JC, Reyna VF, Weldon RB, Brainerd CJ. How reasoning, judgment, and decision making are colored by gist-based intuition: a fuzzy-trace theory approach. J Appl Res Mem Cogn. 2015;(4):344-355. [PMC free article: PMC4671075] [PubMed: 26664820]
37.
Risk factors-age. National Cancer Institute website. Accessed June 22, 2017.
38.
Kuijpers W, Giesinger JM, Zabernigg A, et al. Patients' and health professionals' understanding of and preferences for graphical presentation styles for individual-level EORTC QLQ-C30 scores. Qual Life Res. 2016;(25):595-604. [PMC free article: PMC4759250] [PubMed: 26353905]
39.
Hartzler AL, Izard JP, Dalkin BL, Mikles SP, Gore JL. Design and feasibility of integrating personalized PRO dashboards into prostate cancer care. J Am Med Inform Assoc. 2016;(23):38-47. [PMC free article: PMC5009933] [PubMed: 26260247]

Acknowledgments

The Johns Hopkins Clinical Research Network site investigators and staff include Ravin Garg, MD, and Steven P. DeMartino, CCRC, CRT, RPFT (Anne Arundel Medical Center); Melissa Gerstenhaber, MAS, MSN, RN, CCRN (JHCRN/Anne Arundel Medical Center); Gary Cohen, MD, and Cynthia MacInnis, BS, CCRP (Greater Baltimore Medical Center); James Zabora, ScD, MSW (Inova Health System); Sandra Schaefer, BSN, RN, OCN (JHCRN/Inova Health System); Paul Zorsky, MD, Lynne Armiger, MSN, CRNP, ANP-C, Sandra L. Heineken, BS, RN, OCN, and Nancy J. Mayonado, MS (Peninsula Regional Medical Center); Michael Carducci, MD (Johns Hopkins Sibley Memorial Hospital); and Carolyn Hendricks, MD, Melissa Hyman, RN, BSN, OCN, and Barbara Squiller, MSN, MPH, CRNP (Suburban Hospital). We are most appreciative to all the study participants, with special thanks to the Part 1 participants who volunteered for the Part 2 work groups, and to all the individuals and organizations that assisted us in circulating the internet survey.

Research reported in this report was [partially] funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (323). Further information is available at: https://www.pcori.org/research-results/2012/testing-ways-display-patient-reported-outcomes-data-patients-and-clinicians

Appendix

Survey (PDF, 6.4M)

Original Project Title: Presenting Patient-Reported Outcomes Data to Improve Patient and Clinician Understanding and Use
PCORI ID: 323

Suggested citation:

Snyder C, Brundage M, Smith KC, et al. (2018). Testing Ways to Display Patient-Reported Outcomes Data for Patients and Clinicians. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/12.2018.CE.323

Disclaimer

The [views, statements, opinions] presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

*

Institutional affiliation has changed since the time of study conduct; original affiliation is listed.

Copyright © 2018. Johns Hopkins University. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/.)

Bookshelf ID: NBK592644; PMID: 37315167; DOI: 10.25302/12.2018.CE.323
