Part One. Review of Existing Guidance Documents
Key Points
- Publicly available guidance discussing when and how to conduct a MTC, as well as how to interpret and report the results of such analyses, is summarized.
- The majority of guidance is applicable to network meta-analysis in general, and not specific to MTC.
- Guidance is provided by many organizations, including the Health Information and Quality Authority, ISPOR, the AHRQ Effective Health Care (EHC) Program, DERP, CRD, the Canadian Agency for Drugs and Technologies in Health (CADTH), the Australian Department of Health and Ageing, NICE, the Health Care Knowledge Centre in Belgium, the German Institute for Quality and Efficiency in Health Care, Haute Autorite de Sante, the National Department of Health – Republic of South Africa, and the Cochrane Collaboration.
- Guidance from these organizations is not comprehensive and many aspects are not fully commented on. This reflects the lack of definitive evidence in the literature on these approaches and the need for future research.
- Either a Bayesian or Frequentist framework can be used to conduct a MTC.
- Limitations of the Lumley Frequentist method include: it is restricted to studies with at least one closed loop, it does not account for correlations that may exist between effect estimates when they are obtained from a single multi-arm trial, and there are weaknesses in situations where zero cells are common.
- These limitations can be addressed through special preparations such as using a small increment to address zero cells and adding steps to adjust for correlations between effect estimates.
- Limitations of the Bayesian method include: it requires specification of noninformative priors, it is more complex to understand, and its software is more difficult to use.
- Regardless of the method used to conduct the MTC, homogeneity and consistency of factors and event rates between direct and indirect comparisons are paramount if network meta-analysis is to be conducted.
- Homogeneity and consistency should always be assessed as an a priori component of the review process.
- What is regarded as homogeneous and consistent enough is not well defined and is a subjective determination.
- Some organizations recommend presenting direct and indirect evidence separately and if deemed consistent, performing network meta-analysis/MTC.
- Sensitivity analyses should include testing alternative specifications of the prior distribution to assess robustness of model results.
- For the Bayesian method, assessment and reporting of model fit is recommended.
- ISPOR provides a comprehensive checklist for conducting and synthesizing network meta-analysis including MTC.
- Reporting the study selection process, providing a description of included individual studies, and use of a graphical representation of the network results can help to improve transparency.
Detailed Analysis
Although our objective for part one was to focus on guidance for conducting a MTC, the majority of guidance available is applicable to network meta-analysis in general. When available, we also present guidance specific to MTCs, using either Bayesian or Frequentist methods.
General Description of Guidance Documents
Searches identified 25 relevant documents from which we extracted information. These included documents from regulatory/government-affiliated groups and nongovernmental organizations and collaborations involved in comparative effectiveness review and health technology assessment. Appendix A provides noteworthy verbatim statements from the 25 documents organized according to the categories listed in the Methods section. Most guidance is for network meta-analysis in general, regardless of the specific methodology used to conduct the analysis. The documents identified include:
- A guidance document from Health Information and Quality Authority (2011)7
- A guidance document from AHRQ's EHC Program (2010)3
- A guidance document from DERP (2011)14
- A guidance document from CRD (2009)15
- A published proceedings paper from the Ad Hoc Network Meta-analysis Methods Meeting Working Group (Li et al, 2011)16
- A guidance document from NICE (2008)19
- Seven guidance documents from NICE's Decision Support Unit (DSU) (each updated in 2012)20
- A guidance document from the Health Care Knowledge Centre in Belgium (2008)21
- A guidance document from the German Institute for Quality and Efficiency in Health Care (2011)22
- A guidance document from Haute Autorite de Sante (2009)23
- A guidance document from the National Department of Health – Republic of South Africa (2010)24
When To Conduct a Network Meta-Analysis/MTC
The definition or meaning of the term network meta-analysis varies across the identified guidance documents.9-11,23 Often these documents use terms such as “indirect treatment comparison,” “multiple treatment comparison,” “multiple treatment meta-analysis,” “mixed treatment meta-analysis” (MTM), and “mixed treatment comparison” as synonyms for network meta-analysis.9,10,16,18,20,25 When used in this way, these terms are meant to represent the simultaneous synthesis of evidence of all pairwise comparisons across three or more interventions.9,10,20,25 However, other documents use the above terms more definitively in order to differentiate the statistical analysis framework to be applied. Many guidance documents used the term MTC as we do in this report, specifically to describe a statistical approach used to analyze a network of evidence with more than two interventions that are being compared indirectly, and at least one pair of interventions compared both directly and indirectly.9,10,23 However, in some cases, MTC is referred to as “an extension”25 or “special case”9,10 of only a Bayesian framework.7,10 Of note, a Bayesian framework can be used for, but is not restricted to, synthesizing networks with at least one closed loop.9,10,23 Lumley's mixed model approach is used to describe one common Frequentist mixed model method for “analyzing a combination of direct and indirect evidence where there is at least one closed loop of evidence connecting the two technologies of interest.”7,11,12,23 Other similar mixed model methods exist.27,28
A key component of nearly all documents is a discussion around when conducting a network meta-analysis is justified. Here the documents are almost entirely in agreement that synthesizing direct evidence only (from sufficient head-to-head randomized controlled trials) “should take precedence”25 or is “preferred”14,17 over analysis containing indirect evidence.19,21,22 However, in the absence of sufficient direct evidence, network meta-analysis “can be considered as an additional analytic tool”,3,19,21-23 although one document specifically states “pursuit of qualitative or quantitative indirect comparison is never required…”.14 In cases where analysis of both direct and indirect comparisons is undertaken, two guidance documents suggest the approaches should be considered and reported separately.19,25 Of note, a few documents9,10,19,23 appear to advocate for conducting MTC even in the presence of reasonable direct evidence, suggesting the combination of indirect and direct evidence may “add information that is not available from head-to-head comparison”,19 “strengthen the assessment between treatments directly evaluated,”9,10 and “yield a more refined and precise estimate of the interventions directly compared and broaden inference to the population sampled because it links and maximizes existing information within the network of treatment comparisons.”9,10
An additional key discussion theme of identified guidance documents revolves around the assumptions of “homogeneity” and “consistency” (also referred to as “exchangeability” in some documents) that must be met in order to undertake network meta-analysis. Documents agreed that the validity of a network meta-analysis relies on the included studies or trials being similar in all factors (other than the intervention) that may affect outcomes (an assumption also important in standard pair-wise meta-analysis), and on the direct and indirect estimates being similar.
How To Conduct a Network Meta-Analysis/MTC
A number of the identified guidance documents reaffirmed that the same “good research practices” or “principles of good practice” used when conducting a traditional systematic review and meta-analyses should be carried over to conducting a network meta-analysis.9,10,19,23 These documents often reminded readers, “to minimize error and ensure validity of findings from meta-analyses, the systematic review, whether it involves a standard, pairwise meta-analysis or a network meta-analysis, must be designed rigorously and conducted carefully.”16 This includes an a priori declaration of the intent to conduct a network meta-analysis and a clear statement in the protocol of the methods to be utilized and how they will be implemented.
A particularly variable area of focus of these documents includes strategies for systematically searching for studies. While many documents suggest following “conventional guidance” when conducting systematic literature searches for a network meta-analysis, some documents also acknowledge the additional time and resources necessary to conduct a network meta-analysis search due to the larger number of interventions to assess. While one document suggests an investigator might consider restricting a search to the minimum number of interventions of interest,7 another document emphasizes that “different specification of eligibility criteria may result in differences in the structure or extent of a network, leading to discrepant findings for network meta-analyses on the same topic.”26 Moreover, many documents acknowledged that as more interventions are included in a network meta-analysis, uncertainty is reduced,20 precision is increased,26 and “the ability to establish whether various sources of evidence ‘agree’ with each other” is enhanced.26 In doing so, the documents suggest that network meta-analyses may need to include comparisons not of direct interest (e.g., placebo controls and therapies no longer used in current practice) as they may provide valuable information for the primary comparison(s) through indirect means.20,26 To this end, documents propose various strategies to balance validity and efficiency, with the understanding that inclusion of therapies no longer used in clinical practice may at times be inappropriate, as erroneous conclusions may be drawn on the efficacy and/or safety of these outdated treatments versus standards of care. Suggested strategies include restricting to direct evidence only and broadening the search only after demonstrating that no direct data exist,17 using “iterative search methods” such as those proposed by Hawkins et al.,29 and using previously published, good quality and up-to-date systematic reviews to augment a search.16 While not uniformly done, some guidelines state16 or imply20 that evidence should be derived from RCTs only.16,17
Perhaps the most comprehensive guidance on the planning and design of a network meta-analysis is available in the ISPOR document, which provided “a checklist of good research practices”.10 Below is the checklist, which includes guidance in the areas of search strategies, data collection, statistical analysis planning, data analysis and reporting (Table 1). Of note, the checklist often refers researchers to conventional guidelines on conducting meta-analysis.
Many of the identified guidance documents provided advantages and disadvantages for the use of the different analysis frameworks (i.e., Frequentist and Bayesian methods) in network meta-analysis. Documents highlight that the “pattern” of the network of included studies may often dictate the framework used.7,9-12,23 Networks of studies that do not contain a “closed loop,” such as a star or ladder pattern (Figure 1), cannot be analyzed using the Frequentist method described by Lumley, since a closed loop design is needed for calculating the estimate of incoherence, which is then used to construct 95% confidence intervals for the indirect estimate(s). However, networks containing a closed loop (Figure 1) can be analyzed using either of the two more complex approaches, Bayesian or Frequentist methods. The Bayesian method of conducting a MTC can be used to analyze any network pattern.
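For orientation, the sketch below (Python) illustrates the simplest building block of these networks: the Bucher adjusted indirect comparison, in which an A versus C estimate is derived from A versus B and C versus B evidence. This is not Lumley's full mixed model, and all numbers are hypothetical.

```python
import math

def bucher_indirect(d_ab, se_ab, d_cb, se_cb):
    """Adjusted indirect comparison (Bucher method) on the log odds ratio scale.

    Given estimates of A vs. B and C vs. B, the indirect estimate of A vs. C
    is the difference of the two log odds ratios; the variances sum because
    the two estimates come from separate sets of trials.
    """
    d_ac = d_ab - d_cb
    se_ac = math.sqrt(se_ab**2 + se_cb**2)
    ci = (d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac)
    return d_ac, se_ac, ci

# Hypothetical log odds ratios and standard errors from two pairwise meta-analyses
d_ac, se_ac, (lo, hi) = bucher_indirect(d_ab=-0.40, se_ab=0.12, d_cb=-0.15, se_cb=0.10)
print(f"Indirect log OR (A vs C): {d_ac:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print(f"As odds ratio: {math.exp(d_ac):.2f} ({math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Note how the indirect standard error is necessarily larger than either input, which is one reason documents treat indirect evidence as supplementary rather than a substitute for direct trials.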
Documents list a number of additional considerations when choosing between a Frequentist and Bayesian framework for analyzing these more complex closed loop networks of studies. Perhaps the most frequent consideration noted is the potential advantage of Bayesian methods in that “the method naturally leads to a decision framework that supports decisionmaking”9-11,23 by facilitating ranking of compared interventions.
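To make this decisionmaking advantage concrete, the minimal sketch below shows how the probability that each treatment is “best” falls directly out of a Bayesian analysis; the posterior draws are simulated here for illustration, standing in for real MCMC output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of treatment effects vs. a common reference
# (log odds ratios; lower = better). In practice these come from the MCMC output.
draws = {
    "A": rng.normal(-0.40, 0.10, 10_000),
    "B": rng.normal(-0.30, 0.15, 10_000),
    "C": rng.normal(-0.35, 0.08, 10_000),
}

effects = np.column_stack(list(draws.values()))  # shape: (n_draws, n_treatments)
best = effects.argmin(axis=1)                    # index of the best treatment in each draw

for i, name in enumerate(draws):
    print(f"P({name} is best) = {(best == i).mean():.2f}")
```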
With respect to statistical modeling, most guidance documents refer reviewers to the paper by Lumley (2002) for statistical guidance in implementing a Frequentist MTC when multi-arm trials are not present, including the necessary code. For MTCs with a Bayesian framework, the DSU of NICE has built a set of “core models” based upon the framework of generalized linear modeling.20 The guidance document provides models for Normal, Binomial, Poisson, and Multinomial likelihoods, with identity, logit, log, complementary log-log, and probit link functions. Moreover, these “core models” can accommodate the assumptions of fixed-effect and random-effects settings, as well as multi-arm trials and multi-/shared parameter models for trials reporting results in different formats (trial versus group level data).
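To show the structure of such a model, below is a minimal sketch of a random-effects binomial-logit MTC written in Python with PyMC. It illustrates the class of model described, not the NICE DSU code itself (which is written for WinBUGS); the trial data, prior values, and sampler settings are all hypothetical, and the sketch handles two-arm trials only (multi-arm trials require correlated random effects, as discussed below).

```python
import numpy as np
import pymc as pm

# Hypothetical two-arm trial data: treatment indices (0 = reference),
# events r and sample sizes n for the baseline and comparator arms.
t_base = np.array([0, 0, 1, 0, 1])   # baseline treatment in each trial
t_comp = np.array([1, 2, 2, 1, 2])   # comparator treatment in each trial
r = np.array([[12, 9], [15, 10], [11, 8], [20, 14], [9, 7]])
n = np.array([[100, 100], [120, 115], [90, 95], [150, 150], [80, 85]])
n_trials, n_treat = len(t_base), 3

with pm.Model() as mtc:
    # Vague priors, in the spirit of (but not identical to) the NICE DSU examples
    mu = pm.Normal("mu", 0.0, sigma=10.0, shape=n_trials)            # trial baselines
    d_rest = pm.Normal("d_rest", 0.0, sigma=10.0, shape=n_treat - 1)
    d = pm.Deterministic("d", pm.math.concatenate([np.zeros(1), d_rest]))  # d[0] = 0
    sigma = pm.Uniform("sigma", 0.0, 2.0)                            # between-trial SD

    # Random effects: trial-specific log odds ratios, centered on the
    # consistency equation d[comparator] - d[baseline]
    delta = pm.Normal("delta", d[t_comp] - d[t_base], sigma=sigma, shape=n_trials)

    # Binomial likelihoods with logit links
    pm.Binomial("y_base", n=n[:, 0], p=pm.math.invlogit(mu), observed=r[:, 0])
    pm.Binomial("y_comp", n=n[:, 1], p=pm.math.invlogit(mu + delta), observed=r[:, 1])

    trace = pm.sample(2000, tune=2000, chains=3, random_seed=1)
```

Posterior summaries of d then give the log odds ratio of each treatment versus the reference, from which all pairwise contrasts follow by subtraction.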
Identified guidance documents also comment on additional statistical modeling issues related to MTC conducted with either Frequentist or Bayesian methods. The merits of using a fixed- or random-effects model are discussed in a number of documents. While, fundamentally, either a fixed- or random-effects model can be used,9,10 at least one document18 states a preference for using the random-effects approach “because the standard error obtained from a fixed effect analysis will be too small if there is heterogeneity between trials (beyond random variation)…”, and due to the fact “that there may be additional heterogeneity in an indirect comparison compared to a direct comparison.”18 A few documents acknowledge the potential benefit of incorporating study-level covariates into the model (extending the network to include treatment-by-covariate interactions or meta-regression); however, they also note concerns with implementation, as too few studies are often included in such meta-analyses, which increases the potential for ecological bias. Guidance from NICE19 highlights that when a comparison of the results from single treatment arms from different RCTs is undertaken, the data must be treated as observational and appropriate steps taken to adjust for possible bias and increased uncertainty (including extending the network to include treatment-by-covariate interactions or meta-regression). To this end, guidance typically suggests such naïve analyses “are completely untrustworthy” and should never be undertaken.23
The implementation of Bayesian methods in a MTC was discussed in detail in many documents. Of note, the guidance from the Haute Autorite de Sante provides a detailed description of Markov chain Monte Carlo methods (simulation-based methods which can be used for the analysis of complex statistical models and to obtain estimates from distributions) and their use in MTC. While acknowledging the potentially “arbitrary” nature of the selection of priors (or priors whose form is not defended) in a MTC using Bayesian methods (particularly for between-study variance in a random-effects model), many of these documents suggested “vague” or, perhaps more accurately described, “noninformative” priors for such analyses, provided specific values (Appendix A) for different model parameters, and proposed alternative strategies for eliciting/determining priors (i.e., use of larger meta-analyses or expert clinicians in a field) when applicable. Documents also highlighted the need for checking convergence (i.e., running at least three chains, starting from widely different but sensible initial values, and examining posterior distributions visually for spikes and unwanted peculiarities) and running a “conservatively” large number of iterations for both the initial “burn-in” and the posterior sampling.
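As an illustration of one such convergence check, the sketch below computes the Gelman-Rubin potential scale reduction factor (R-hat) for a single parameter from first principles; the chains are simulated purely for demonstration.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is an (m, n) array: m independent chains of n post-burn-in draws.
    Values close to 1 (commonly < 1.05) suggest the chains have mixed into the
    same distribution; larger values suggest more iterations are needed.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(42)
converged = rng.normal(0.0, 1.0, size=(3, 5000))        # three well-mixed chains
stuck = converged + np.array([[0.0], [1.5], [-1.5]])    # chains stuck in different regions
print(f"R-hat (converged):     {gelman_rubin(converged):.3f}")  # close to 1.00
print(f"R-hat (not converged): {gelman_rubin(stuck):.3f}")      # well above 1
```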
Additional statistical modeling discussion from the guidance documents included (1) the selection of the referent in MTC with Bayesian methods (as this can affect the posterior estimates), (2) the inappropriateness of treating multi-arm trials as if they were separate trials in a network meta-analysis (the correlation among the effect estimates of pair-wise comparisons must be taken into account), (3) the potential need for multi-/shared parameter models to address situations where trials report results in different formats (i.e., binomial data versus summary log odds and variance), and (4) the choice of summary effect measure, with documents often recommending relative over absolute measures due to concerns regarding varying baseline risk, and odds ratios as the preferred relative measure as they are symmetrical around the line of unity.
Nearly all guidance documents addressed identification and handling of potential bias and inconsistency in network meta-analyses. Inconsistency was commonly defined by documents as a conflict between “direct” evidence and “indirect” evidence of a comparison. As noted by one of the NICE guidance documents, “like heterogeneity, inconsistency is caused by effect-modifiers, and specifically by an imbalance in the distribution of effect modifiers in the direct and indirect evidence.”20 Many documents reminded readers that network meta-analyses, like traditional meta-analysis, are akin to observational studies because the value of randomization does not hold across trials (albeit, they allow one to compare two or more treatments that have not previously been directly compared, while maintaining the benefit of within-trial randomization).9,10,17,25 Consequently, they are prone to similar biases, particularly confounding bias. Other noted factors that might potentially influence effect estimates include the number of trials with two or more comparison arms and heterogeneity (as with traditional pair-wise meta-analysis).16
Documents unanimously agree that the “consistency” or “exchangeability” assumption must be assessed and should be an a priori component of the review protocol.7,9-26 Both the CADTH and Australian Government's Department of Health and Ageing documents provide guidance for determining whether the “consistency” or “exchangeability” assumption is met based upon a detailed review of included studies (Appendix A). Both frameworks include an assessment of comparability of the common or “linking” treatment and comparability of patients in trials for presence of clinical or methodological heterogeneity. The Australian Government's Department of Health and Ageing document more specifically suggests assessing, for both the direct trials and the indirect comparison, whether the measure of comparative treatment effect is appropriate, and assessing the event rates of the linking interventions. Another document further warned that “with increased complexity and greater numbers of treatments, the prospect of inconsistency increases.”7
Documents also suggest more quantitative methods for detecting inconsistency between direct and indirect evidence. As noted in the ISPOR document, many regulatory agencies require the direct estimates and indirect estimates be calculated separately and shown to be consistent before they are combined. Within a Bayesian framework, a consistency model can be compared to an inconsistency model, with the residual deviance used as a test of “global inconsistency”. The same NICE DSU document that provided the core Bayesian code also provides these models to assess inconsistency.20 Other, less favored, statistical methods noted by documents for detecting inconsistency include node splitting and use of measures of inconsistency variance.
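As a concrete illustration of the simplest such check, the sketch below contrasts a direct and an indirect estimate of the same comparison with a Bucher-style z-test. The estimates are hypothetical, and the approach assumes the two sources of evidence are independent; it is one option among the methods named above, not a replacement for model-based checks.

```python
import math

def inconsistency_z(d_direct, se_direct, d_indirect, se_indirect):
    """Bucher-style consistency check for one loop of evidence.

    Compares the direct and indirect estimates of the same contrast (on the
    log scale), assuming the two sources are independent. Returns the
    inconsistency estimate, its standard error, and a two-sided p-value.
    """
    w = d_direct - d_indirect
    se_w = math.sqrt(se_direct**2 + se_indirect**2)
    z = w / se_w
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # 2 * (1 - Phi(|z|))
    return w, se_w, p

# Hypothetical log odds ratios for the same A vs. C comparison
w, se_w, p = inconsistency_z(d_direct=-0.30, se_direct=0.11,
                             d_indirect=-0.25, se_indirect=0.16)
print(f"Inconsistency: {w:.2f} (SE {se_w:.2f}), p = {p:.2f}")
```

A large, statistically significant inconsistency term would argue against pooling the direct and indirect evidence for that loop.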
Guidance documents are clear in their cautions about conducting network meta-analysis if the “consistency” assumption is not met. Unfortunately, as pointed out by one document, even if inconsistency is detected, “no commonly accepted standard [defines] which studies are similar enough”9,10 and the determination is a “subjective” one. Moreover, some guidance documents stress that the validity of indirect comparisons may often be “unverifiable” because of limited detail in publications3 and the underpowered nature of tests for detecting heterogeneity,18,20 and yet another cautioned that inconsistency may affect different regions of a network of trials differently.16 Therefore, many documents provide more unwavering recommendations against network meta-analysis in the presence of inconsistency, while others make more flexible statements such as: “large inconsistencies rule out meta-analysis, small inconsistencies should add uncertainty to the results”11,12 and “…researchers must evaluate departures from consistency and determine how to interpret them.”9
A number of documents discussed the importance of assessing model fit when conducting a MTC using a Frequentist or Bayesian framework, both to aid in selecting between fixed- and random-effects models (or other competing models, e.g., with or without covariate interactions) and to demonstrate that the overall model fit is adequate. Examination of residual deviance (the lower the residual deviance, the better the fit) and deviance information criterion (DIC) statistics were most commonly recommended when using a Bayesian approach.
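For illustration, the sketch below computes DIC from MCMC deviance output using the standard definition DIC = Dbar + pD, where Dbar is the posterior mean deviance and pD = Dbar - D(theta_bar) is the effective number of parameters. The deviance samples here are simulated; in practice they come from the fitted models being compared.

```python
import numpy as np

def dic_from_deviance(deviance_draws, deviance_at_posterior_mean):
    """Deviance information criterion from MCMC output.

    DIC = Dbar + pD, with pD = Dbar - D(theta_bar). Lower DIC indicates a
    better trade-off between fit and model complexity.
    """
    d_bar = np.mean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

# Hypothetical deviance samples from two competing models (fixed vs. random effects)
rng = np.random.default_rng(7)
dic_fe, pd_fe = dic_from_deviance(rng.normal(210.0, 4.0, 8000), 204.0)
dic_re, pd_re = dic_from_deviance(rng.normal(196.0, 5.0, 8000), 186.0)
print(f"Fixed effects:  DIC = {dic_fe:.1f} (pD = {pd_fe:.1f})")
print(f"Random effects: DIC = {dic_re:.1f} (pD = {pd_re:.1f})")
```

In this simulated comparison the random-effects model would be preferred despite its larger effective number of parameters, because its overall DIC is lower.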
Some of the guidance documents emphasized researchers should test alternate specifications of the prior distribution to assess robustness of model results. Noted assumptions to be tested in sensitivity analysis included different priors, inclusion or exclusion of covariate/modifiers in the model, and use of a fixed- or random-effects model.
How To Report and Interpret a Network Meta-Analysis/MTC
The proper interpretation of network meta-analyses is of paramount importance given their propensity to inform both clinical decisionmaking as well as coverage for third-party payers. A few guidance documents discussing the proper interpretation and reporting of network meta-analyses were identified in our literature search and are discussed here.7,9,10,16,19,23
When interpreting the results of a network meta-analysis, it is important to consider the internal validity of the analyses as this “maximizes transparency and avoid(s) errors in interpretation.”16 This can be achieved by assessing the appropriateness of inclusion criteria of the evidence network, the quality of the included studies, and the existence of confounding bias.9,10,23 As mentioned previously, “good research practices” are necessary when conducting network meta-analyses, similar to traditional systematic reviews, and this includes use of “rigorous and extensive literature search methods”9,10 to minimize the potential for publication bias. Moreover, the validity of the network meta-analysis also hinges on the internal validity of the studies included in the review. It is recommended that “each study included in the network meta-analysis should be critically evaluated for bias.”9,10 One of these determinants should be the similarity between the included trials. This involves evaluating the clinical and methodological characteristics of the included studies in order to identify potential sources of bias and includes (but is not limited to) assessing differences in patient populations, methods for outcomes measurement, protocol requirements, duration of follow-up, and the time frame in which the studies were conducted.9,10,16 Differences in these characteristics could affect the integrity of the network and potentially impact interpretation of its results if a treatment-by-covariate interaction exists. An example would be significant differences in “baseline risks and placebo responses across trials” which “can reflect additional important differences in study or patient characteristics across studies.”9,10
In addition to assessing the internal validity of both the included studies as well as the network itself, decisionmakers should assess the external validity of the meta-analysis' findings and whether they apply to the population of interest.9,10 This is important since many clinical trials are conducted using selected and homogenous populations, which can compromise external validity. However, decisionmakers should embrace a certain level of dissimilarity between studies in a network meta-analysis, as this often more closely reflects real-world clinical practice. It has been said that “some heterogeneity across trials in the network may arguably increase external validity.”9,10 This view should be interpreted with caution, as a high degree of heterogeneity within the direct comparisons may also significantly weaken the network and adversely affect its outputs.
As discussed above, probability statements regarding which intervention in a MTC is “best” are commonplace. It has been recommended that these “probability statements should be interpreted carefully since the difference between treatments might be small and not clinically meaningful.”16 Moreover, posterior probabilities resulting from a MTC using a Bayesian framework (which are themselves estimates and contain inherent random variability) may in certain situations lead to misinterpretation of the relative efficacy of an evaluated intervention that can limit, rather than enhance, decisionmaking. For example, two interventions could demonstrate quite comparable safety and efficacy profiles (that is, be similar clinically), but may appear different based on their posterior probabilities. Additionally, this determination “cannot be made on the basis of efficacy endpoints alone.”9,10 This assessment should include evaluations of other available safety and effectiveness data not included in the network meta-analysis, including observational evidence. This will provide a more detailed picture of the totality of information for the intervention and allow the decisionmaker to more properly assess its place in medical practice.
Guidance from the Haute Autorite de Sante provides a brief “critical review guide” which suggests users of network meta-analyses/MTC consider the following to evaluate validity/usefulness: (1) acceptability of the approach used; (2) search strategy and selection process for data contributing to the indirect comparison calculations; (3) clinical homogeneity of trials and stability of effects; (4) consistency of estimates; (5) degree of concordance of the result with that of existing direct comparisons; and (6) correct interpretation of results in the proposed conclusions.23 Similar guidance has recently been provided by the NICE Decision Support Unit in the form of a “reviewer checklist” for evidence synthesis reports, which addresses “issues specific to network synthesis” including: (1) adequacy of information on model specification and software implementation; (2) multi-arm trials; (3) connected and disconnected networks; and (4) inconsistency.20
Guidance documents have been published providing recommendations for the proper reporting of indirect comparisons and network meta-analyses.9,10,13,16,30,31 A Task Force on Indirect Treatment Comparisons Good Research Practices by the ISPOR has proposed a simplified checklist to assist decisionmakers in the proper evaluation of a reported network meta-analysis.9,10 The items included by this task force are included in Table 2. It should be noted that this list is not all-inclusive and does not include enough information to adequately assess either the internal or external validity of an indirect comparison or network meta-analysis.
This guidance document provides recommendations on items that should be included in the introduction, methods, results, and discussion sections of a network meta-analysis report as well as a detailed description of what to look for in each of these sections.9,10 Many of the items discussed overlap with guidance on the proper reporting of traditional meta-analyses.32 Aspects unique to conducting a network meta-analysis deserve special mention, much of which involves appropriate reporting of methods and results. If a Bayesian framework was used to perform the data analysis, it is recommended that “the choice of prior distributions for the model parameters should be defined.”9,10 If sensitivity analyses were conducted evaluating the prior distribution assumptions, these results should also be reported. In addition, the software package used to analyze the data as well as the written code from the program should be provided, “at least in an online appendix.”9,10
When reporting the results of a network meta-analysis, the ISPOR Task Force suggests that a graphical representation of the network be provided to “improve transparency of the analyses.”9,10 In addition to discussing the study selection process and description of the individual studies, the report should provide results of both the pairwise comparisons as well as indirect treatment comparisons.9,10,19 It has also been recommended that investigators “explain the difference between direct and indirect evidence based upon study characteristics.”3,19 Additionally recommended items for good reporting include goodness-of-fit of the data as well as calculations of residual deviance.9,10
Additional guidance documents for reporting of studies using a Bayesian framework come from the Reporting Of Bayes Used in clinical Studies (ROBUST) criteria, BayesWatch (Bayesian analysis in biomedical research), and Bayesian Standards in Science (BaSiS).13,30,31 Although these documents are intended for Bayesian analyses in general, they can be applied to meta-analyses as well. The ROBUST criteria suggest that any Bayesian study report should include: the prior distributions used (specified and justified, with sensitivity analysis); the analyses run, including the statistical model and analytical techniques; and the results, including measures of central tendency and standard deviation or credible intervals/Bayesian confidence intervals (an interval in the domain of a posterior probability distribution used for interval estimation).13 BayesWatch and BaSiS include more technical and computational items, such as information about the model itself, details about the software used, whether Markov chain Monte Carlo simulation was used (and if so, the number and length of runs as well as convergence diagnostics), the shape of the posterior densities, and use of appropriate Bayes factors, amongst others.30,31 It has been questioned whether these more detailed requirements are important to include for a clinical journal and should be reserved for a more methodologically focused periodical.13
Part Two. Systematic Review of Existing MTCs
Results of the Literature Search
A total of 626 citations were identified through the database search with an additional five citations identified manually (Figure 2). After duplicates were removed, 572 citations remained and were screened at the abstract level. Of the abstracts reviewed, 341 were excluded and 231 were considered at the full-text level. After full-text review, 44 articles representing 43 unique MTCs that utilized either Bayesian or Frequentist methods to conduct a MTC were included. A list of excluded studies can be found in Appendix E.
Key Points
- Of the included MTCs, the majority use Bayesian methods.
- Thirty-four unique MTCs that used Bayesian methods were identified and were conducted in 10 different countries. Thirteen disease categories were evaluated, with the most common being cardiovascular. Most analyses were funded by government/foundation sources.
- Pharmacologic interventions were evaluated in the majority of networks.
- The statistical code was rarely made available to the reader, although raw data was commonly published.
- A similar percentage of MTCs either reported using vague priors or did not specify whether the priors were intended to be vague or informative. Few models declared using informative priors. It was uncommon for the specific priors to be reported, which may be related to the lack of code reporting. However, the majority of journals that published these MTCs allowed supplement or appendix publication, and several manuscripts did utilize this option.
- Random effects models were used in the majority of MTCs. A broad range of methods were used to evaluate convergence, heterogeneity, and inconsistency. Unfortunately, lack of reporting within a manuscript does not necessarily mean such evaluations were omitted.
- It was common for authors to rank order interventions based on the probability of the intervention being best for a given outcome. Rarely did authors conclude equivalence or non-inferiority of interventions based on MTC results.
- Most MTCs evaluated binary outcomes and reported results as odds ratios or relative risks. However, most MTCs did not specify whether these were mean or median values of the posterior distribution. All models reported 95 percent credible intervals. Of the models that reported continuous outcomes, the weighted mean difference was the effect measure used almost exclusively.
- A mixture of tables, text, and figures was commonly used to report results of the MTCs.
- Nine MTCs used Frequentist methods.
- These MTCs were conducted in five different countries and evaluated five disease categories including cardiology, behavioral health, pain management, rheumatology and gastro-urology.
- Three analyses specifically referenced/used Lumley's MTC method.
- Most analyses evaluated pharmacologic interventions with on average 7.3 interventions evaluated.
- Eight MTCs included a traditional meta-analysis as well. It was more common for heterogeneity to be evaluated in the traditional meta-analysis than in the network meta-analysis. The majority of MTCs evaluated inconsistency.
- None of the MTCs made claims of equivalence, non-inferiority, or defined minimally important differences. Most analyses reported binary outcomes with the majority using odds ratios as the effect estimates. All analyses reported variance using 95 percent confidence intervals.
Detailed Analysis
The results are first presented for the journals in which identified MTCs were published, followed by results according to the method used to conduct the MTC, either Bayesian or Frequentist. When applicable, mean values are accompanied by SDs (mean±SD). Text and tables do not duplicate each other in all cases, and either format may have been used to present data.
Journal-Level Characteristics
Our systematic literature search identified 42 unique MTCs that used either Bayesian or Frequentist methods to conduct MTC. The majority of MTCs used Bayesian methods (33 out of 42, 78.6 percent)33-66 and few used Frequentist methods (8 out of 42, 19.0 percent).68-75 One review (2.4 percent) used both methods.67 Complete details of each journal in which at least one review was published and the journal's characteristics can be found in Appendix Table F-1. The 42 MTCs were published in 32 different journals, with a mean impact factor of 8.67±8.1 (Table 3). The journal with the highest number of MTCs published was the British Medical Journal (5 of the 42 reviews, 11.9 percent). The majority of journals allowed online supplements or appendices, and also imposed word count limits (Table 3). However, the majority of these journals did not impose limitations on the number of tables or figures allowed.
MTC Using Bayesian Methods
A summary of the results of Bayesian MTCs can be found in Table 4 to Table 6. Detailed characteristics of each analysis can be found in Appendix Tables F-2 to F-4. One analysis used both Bayesian and Frequentist methods and is considered in both sections of the results.67 The analysis by Orme et al.43 included two individual networks and whether this analysis was considered once or twice for a given characteristic is defined within table legends.
General Characteristics
The majority of MTCs identified in our literature search used Bayesian methods to conduct the analysis (81.0 percent). On average, 6.1±4.8 authors were listed per publication and the majority of publications (52.9 percent) did not include a methodologist as an author. The most common country from which authors published reviews was the United Kingdom (35.3 percent), followed by the United States (11.8 percent) and Greece (11.8 percent). The remaining analyses were published in a variety of countries (Table 4). The majority of analyses were funded by government/foundation sources (29.1 percent), followed by industry (26.5 percent) and analyses that did not report funding sources (23.6 percent). Only two analyses (5.9 percent) identified an affiliation, one each with the Health Technology Assessment Program and The Cochrane Collaboration. The mean number of printed pages per publication was 16.6±36.3 and over half (58.8 percent) published a supplement or appendix. Of the publications without a supplement or appendix, only one lacked the option given the journal's specifications, and one was an affiliated report that did not have a word or page restriction.
There were 13 different categories of disease states evaluated with a wide dispersion of categories. The most common category was cardiology (17.6 percent) (Table 4). The mean number of interventions included within the analyses was 8.5±4.3. The majority of analyses evaluated pharmacologic interventions (85.7 percent) with few evaluating devices (8.5 percent) or other interventions (2.9 percent), such as blood glucose monitoring. One analysis included both pharmacologic interventions and devices (2.9 percent). The mean number of trials included in the analyses was 35.9±30.1 and the mean number of patients included was 33,460±71,233.
Methods Characteristics
The majority of analyses also included a traditional meta-analysis (76.5 percent) (Table 5). The most common model used in Bayesian MTCs was a random-effects model (58.8 percent), followed by both a random and fixed effects model (20.6 percent), unspecified (17.6 percent), or a fixed-effects model (2.9 percent). The majority of analyses did not report information about whether there was adjustment for multiple arms (82.4 percent) or adjustment for covariates (73.8 percent). Less than half of the analyses reported testing the model fit (44.1 percent), while the remaining did not comment on testing model fit. Of the 15 analyses that reported testing model fit, the most common method was use of residual deviance (40.0 percent), followed by use of both residual deviance and the deviance information criterion (20.0 percent), solely the deviance information criterion (13.3 percent), unspecified methods (13.3 percent), mean sum deviation (6.7 percent), or Q-Q plots (6.7 percent).
All analyses used WinBUGS software. Two analyses also specified additional software, including BUGS XLA Wrapper and S-Plus. The majority of analyses did not make their code available to the reader (79.4 percent), although of the seven analyses that did provide the code (20.6 percent), the most common presentation was within the online supplement (five MTCs, 71.4 percent). Raw data was frequently available to the reader (61.8 percent of MTCs) and of the 21 analyses that published raw data, the most common format was within the manuscript itself (18 MTCs, 85.7 percent). Most analyses did not report evaluating convergence (64.7 percent). Of the 12 analyses (35.3 percent) that did evaluate convergence, the most common method was the Gelman-Rubin statistic (58.8 percent), although several less frequent methods were used as well (Table 5). Totals of each individual method combined may not add up to the number of studies because one study may have used multiple methods.
Most analyses did not report whether the priors used were considered vague or informative (47.1 percent) while 44.1 percent of MTCs specifically described the prior distributions used as vague or non-informative. The remaining 8.8 percent of analyses used informative priors. It was uncommon for the actual prior distribution to be reported for the population treatment effect (d) and the between-study standard deviation of population treatment differences across studies (sigma), as only 32.1 percent and 29.4 percent of MTCs, respectively, reported the actual priors. Most analyses did not perform sensitivity analysis based on the priors used (88.2 percent).
Evaluation of heterogeneity within traditional meta-analyses was common (16 out of 26 MTCs that included a traditional meta-analysis, 61.5 percent). Some reported multiple means to test for heterogeneity and therefore the totals of each individual method combined may not add up to the number of studies. The most common method used was the I2 statistic (81.3 percent) followed by the Cochran Q statistic (43.8 percent), among many less frequent methods (Table 5). Evaluation of heterogeneity within the MTC was less common, reported in only 32.4 percent. Some analyses reported multiple means to test for heterogeneity and therefore totals of each individual method combined may not add up to the number of studies. Of these 11 analyses, the most common method used to assess heterogeneity was tau2 (54.5 percent) followed by between-study standard deviation (45.5 percent), among several other less frequent methods (Table 5).
Inconsistency was evaluated in 70.6 percent of analyses. One review reported being unable to evaluate inconsistency due to lack of direct data while the remaining MTCs (10 MTCs, 29.4 percent) did not report evaluating inconsistency. Totals of each individual method combined may not add up to the number of studies because one study may have used multiple methods. The majority of the 24 analyses that evaluated inconsistency did so through comparison of the results with either the results of their traditional meta-analysis or a previously conducted meta-analysis (50.0 percent) followed by unspecified methods (33.3 percent), among several others (Table 5).
Outcome and Results Reporting
Few analyses presented graphical representation of the posterior distribution of outcomes (8.8 percent) (Table 6). The use of rank ordering of interventions based on the probability that the given intervention was the best for a given outcome was reported in 61.8 percent of analyses. Only one analysis made claims of equivalence (2.9 percent) and two made claims of non-inferiority (5.9 percent). Of the three analyses that made claims of equivalence or non-inferiority, two defined a minimally important difference. Four (11.8 percent) analyses defined minimally important differences although they did not make specific claims of equivalence or non-inferiority.
Most analyses reported outcomes that were binary (67.6 percent), followed by both binary and continuous outcomes (17.6 percent), solely continuous outcomes (11.8 percent), and one reported on a categorical non-binary outcome (2.9 percent). Of the 29 analyses that reported binary outcomes, odds ratios were the most commonly reported effect measure (62.1 percent), followed by relative risks (17.2 percent) and hazard ratios (13.8 percent), among other less frequent measures. Of the 10 analyses that reported continuous outcomes, the weighted mean difference was the most common effect measure (80.0 percent). Two network meta-analyses used multiple effect measures, including standardized mean difference and a measure specific to the content (e.g., prevention fraction in a dental analysis). The one analysis that reported a categorical non-binary outcome used relative risk to measure effect. All analyses reported variance with 95 percent credible intervals and one also reported standard errors. Most analyses (85.3 percent) did not report whether the posterior estimate was the mean or median of the distribution. Presentation of results data varied, although most analyses used multiple media (and were therefore counted multiple times) including tables, figures, and text. Of the 34 analyses, 32 used text (94.1 percent), 24 used tables (70.6 percent), and 21 used figures (61.8 percent) to present results.
Frequentist MTCs
A summary of the results of MTCs that used Frequentist methods can be found in Table 7 to Table 9. Detailed characteristics for each analysis can be found in Appendix Tables F-5 to F-7. One analysis used both Bayesian and Frequentist methods and is considered in both sections of the results.67 When applicable, mean values are accompanied by SDs (mean±SD).
General Characteristics
A minority of the analyses identified by our systematic review used Frequentist methods (nine MTCs, 20.9 percent). Again, one MTC used both Bayesian and Frequentist methods.67 On average, 7.1±5.4 authors were listed per publication and a plurality of publications (44.4 percent) were not considered to have a methodologist as an author (Table 7). The most common country from which authors published these MTCs was the United States (44.4 percent), followed by the United Kingdom (22.2 percent) and France (22.2 percent). The majority of analyses were funded by government/foundation sources (44.4 percent) followed by industry (33.3 percent), among other sources. Two analyses identified an affiliation, one each with the Health Technology Assessment Program and the Cochrane Collaboration. The mean number of printed pages per publication was 16.1±16.0 and most of the publications (66.7 percent) published supplements or appendices. The two MTCs with affiliations were those without a supplement.
There were five different categories of disease states evaluated in the analyses, with the most in cardiology (33.3 percent). The mean number of interventions included within the evaluated analyses was 7.3±2.8. Eight analyses evaluated pharmacologic interventions (88.9 percent) while one evaluated multiple intervention types (11.1 percent). The mean number of trials included in the analyses was 59.0±51.9 and the mean number of patients included was 59,615±70,268.
Methods Characteristics
Eight of the nine MTCs also included a traditional meta-analysis. The language used to describe the model implemented in each analysis was heterogeneous and can be found in Appendix Table F-6. Of note, three MTCs specifically referenced use of the Frequentist methods described by Lumley70-72 and the other six analyses used other mixed model approaches for Frequentist MTC.67-69,73,74 Weighting of studies was not reported in most analyses (88.9 percent), while one (11.1 percent) weighted studies using inverse variance (Table 8). Two analyses (22.2 percent) adjusted the model for covariates while the others did not report whether adjustments were made. Raw data was available in most analyses (88.9 percent) and of the eight that published raw data, the format was mostly within the manuscript itself (62.5 percent) as opposed to an online supplement (37.5 percent). Three analyses (37.5 percent) used R as the software, three (37.5 percent) used SAS, and one used Stata (11.1 percent), while the last did not report the software used.
Heterogeneity within traditional meta-analyses was evaluated in four of the eight reviews (50.0 percent) that conducted a traditional meta-analysis. The most common method used in these four analyses was the I2 statistic (50.0 percent), while one analysis used both the I2 statistic and the Cochran Q statistic (25.0 percent) and one used the Riley Day test (25.0 percent). Evaluation of heterogeneity within network meta-analyses was less common, reported in only two of the nine analyses (22.2 percent). One used covariance statistics and standard error and one used tau2. Inconsistency was evaluated in eight of the nine analyses. The majority of these analyses (62.5 percent) evaluated inconsistency by comparing results from the MTC to either the traditional meta-analysis or previously published literature. Other methods reported to evaluate inconsistency included evaluating incoherence values (25.0 percent) and t-tests based on odds ratios from the traditional and network meta-analyses (12.5 percent).
Outcome and Results Reporting
None of the analyses made claims of equivalence, noninferiority, or defined a minimally important difference (Table 9). Seven analyses reported outcomes that were binary (77.8 percent) while one analysis reported continuous outcomes and the last reported both outcome types. Of the eight analyses that reported binary outcomes, most used odds ratios as effect measures. All analyses reported variance with 95 percent confidence intervals. Presentation of results data varied although most reviews used multiple media including tables, figures, and text. Of the nine analyses, eight used text (88.9 percent), three used tables (33.3 percent), and six used figures (66.7 percent) to present results.
Part Three. MTC Focus Group
Key Points
- Nine individuals participated in our focus group, all of whom were authors of MTCs using Bayesian methods identified in part two of this report. Unfortunately, despite all efforts, none of the limited number of investigators who conducted MTCs using Frequentist methods replied to our invitation or participated in the group.
- The majority of respondents were from academic settings, have been trained in network meta-analysis methods and have conducted at least two such analyses. The respondents seemed to be involved in a variety of the steps in conducting the identified network meta-analysis.
- Respondents seem to feel the term “network meta-analysis” is used ambiguously and inconsistently in the medical literature, although they do not feel the same about the terms “mixed treatment comparison” or “Frequentist network meta-analysis.”
- Of the questions asking general opinions of network meta-analysis, most responses were, on average, neutral on a 5-point scale. The statements with clear majority opinions were:
- Disagreement that investigators should consider restricting their search to the minimum number of interventions of interest when conducting a network meta-analysis
- Agreement that the combination of indirect and direct evidence adds valuable information that is not available from head-to-head comparisons.
- Agreement that network meta-analysis should provide a graphical depiction of the evidence network.
- When asked specifically about Bayesian methods to conduct MTC, respondents provided a variety of strengths and limitations. Although many were unique, the limitation mentioned most commonly was in regards to the software while there was no commonly mentioned strength.
- When asked specifically about their MTC, most respondents built the code from scratch or adapted it from previously published code. Unfortunately, we did not gain insight as to how or why prior distributions were chosen, but rather only what priors were chosen.
- Additionally respondents were asked to rate 11 criteria on how influential each was in their decision to use Bayesian methods for their MTC. The most influential criteria, on average, were the method's ability to handle multi-arm studies and collaborator's or respondent's prior expertise and/or experience. The least influential criterion was the requirement to specify noninformative priors.
Detailed Analysis
Tables are used throughout this section to present results for each individual focus group question or to present free text responses. Not all data appear in both text and table format, and some data are reported exclusively in one format or the other.
Composition of the Focus Group
The focus group was comprised of nine individuals (hereafter respondents), who authored a unique MTC using Bayesian methods identified in part two of this project. Despite all efforts to contact the authors of the analyses using Frequentist methods, no authors successfully replied or participated in the group. Therefore, the presented results represent the views of investigators who have used Bayesian methods to conduct their MTC. Most respondents work in academic settings (66.7 percent) and consider themselves to have the expertise needed to implement a network meta-analysis themselves (77.8 percent). Most respondents (88.9 percent) have received either formal or informal training in network meta-analysis methods (Table 10).
Three respondents are affiliated with an organization involved in conducting synthesis, systematic review, or meta-analysis, including AHRQ (n=2) and Cochrane (n=1). The referenced meta-analysis was not the first in which any of the respondents used such methods. All respondents have conducted at least two network meta-analyses and three of the nine respondents (33.3 percent) have conducted five or more of these analyses. When asked to select which activities described their involvement in the given analysis, it appears that the respondents were involved in multiple steps of the process (Table 11).
General Questions Regarding Network Meta-analysis
Respondents were asked a series of 14 questions, using a 5-point Likert scale, regarding general principles and views of network meta-analysis. The results for each question are presented in Table 12. In summary, mixed results were obtained when asking the respondents their opinion as to the ambiguity and consistency with which certain terms are used in the literature. Respondents felt that the term “network meta-analysis” is used ambiguously and inconsistently in the medical literature, whereas the term “mixed treatment comparison” was consistently and unambiguously used. Last, most respondents were neutral to how the term “Frequentist network meta-analysis” is used in the literature. All respondents agreed that the combination of indirect and direct evidence adds valuable information that is not available from head-to-head comparisons as well as the necessity for MTCs to provide a graphical depiction of the evidence network. The majority of respondents disagreed that “when conducting a network meta-analysis, an investigator should consider restricting a search to the minimum number of interventions of interest.” All respondents agreed or were neutral with the statement “the combination of direct and indirect evidence yields a more refined and precise estimate of the interventions directly compared” and the statement “the combination of direct and indirect evidence broadens the external validity of the analysis.” The remaining questions had a mixture of responses that did not have a majority representation.
Questions Specific to Bayesian Methods for MTC
The respondents were asked a series of open-ended questions. First, they were asked to list the three most significant barriers to using Bayesian methods when conducting MTC, the results of which are found in Table 12. All respondents listed at least one barrier, seven listed two barriers, and five listed three barriers. Respondents were also asked to list the three most significant strengths of Bayesian methods when conducting MTC, the results of which are listed in Table 13. All respondents listed at least one strength, eight listed two strengths, and six listed three strengths.
Questions Specific to Respondent's MTC
Respondents were asked to rate 11 criteria based on how influential each criterion was in their decision to conduct a MTC using Bayesian methods. A 5-point scale was used ranging from “not at all” to “extremely.” The responses to each question can be found in Table 14. On average, the criteria with the most influence were the method's ability to handle multi-arm trials and the collaborator's or respondent's prior experience and/or expertise. The next most influential criterion was the amount of methodological research supporting this method followed by the method's ability to allow rank ordering of interventions according to the probability they are best. The remaining criteria were less influential in the respondents' decision-making to use Bayesian methods (Table 14).
In response to a true/false question, five of nine respondents involved a researcher/collaborator solely due to their methodological expertise in Bayesian methods. Eight of the nine respondents did not use formal guidance to direct how the MTC was conducted. One respondent replied that there was no guidance available at the time of their analysis. Respondents were asked how the code used in the analysis was derived. Three codes were adapted from previously published code, three were built from scratch, one was built from scratch with the help of WinBUGS examples, one was adapted from publicly available code, and in the last instance the respondent was unsure how the code was derived (Table 15). The last open-ended question asked how prior distributions were chosen for the meta-analysis and why they were chosen over others. Unfortunately, the responses collected do not provide insight as to how or why, but rather what the prior distributions were (Table 16).