Abstract

Government programs are often offered on an optional basis to market participants. We explore the economics of such voluntary regulation in the context of a Medicare payment reform, in which one medical provider receives a single, predetermined payment for a sequence of related healthcare services, instead of separate service-specific payments. This “bundled payment” program was originally implemented as a five-year randomized trial, with mandatory participation by hospitals assigned to the new payment model; however, after two years, participation was made voluntary for half of these hospitals. Using detailed claim-level data, we document that voluntary participation is more likely for hospitals that can increase revenue without changing behavior (“selection on levels”) and for hospitals that had large changes in behavior when participation was mandatory (“selection on slopes”). To assess outcomes under counterfactual regimes, we estimate a stylized model of responsiveness to and selection into the program. We find that the current voluntary regime generates inefficient transfers to hospitals, and that alternative (feasible) designs could reduce these inefficient transfers and raise welfare. Our analysis highlights key design elements to consider under voluntary regulation.

I. Introduction

Government intervention is designed to move market actors away from market equilibrium. Yet some government programs allow these actors to voluntarily decide whether to participate in the program. There are a number of reasons voluntary programs are popular. From a political perspective, voluntary programs may face less opposition from industry or consumer lobbies, since their members need only sign up if they benefit. Voluntary programs may also be more palatable to those with an ideological aversion to government mandates and a preference for regulatory “nudges” (Thaler and Sunstein 2003).

The key economic benefit of voluntary programs—that is, “choose your own incentives”—is that they might generate favorable selection. If actors have private information about their net benefits from changing behavior, then the resulting “selection on slopes”—also known as selection on gains or Roy selection (Heckman and Honoré 1990)—might result in selection into the program by those with the highest net social benefits. However, if voluntary programs attract participants who, without changing their behavior, can simply receive a higher government transfer, the resulting “selection on levels” could lead to higher government spending without the desired behavior change. Thus, the extent to which a voluntary program is more or less socially desirable depends critically on the nature and extent of selection into the program.

Conceptually, this idea is not new. Indeed, it is a core trade-off highlighted in theoretical analyses of optimal contract design and government regulation, generally (e.g., Laffont and Tirole 1993) and in the healthcare sector (e.g., Newhouse 1996, 2004). This article’s key contribution is to leverage a unique, midcourse reform that changed a mandatory-participation, randomized controlled trial of a particular incentive scheme into a voluntary participation program; this provides a rare opportunity to estimate “slope” effects for both those who choose to participate and those who do not. This allows us to estimate empirically the core components of the canonical model and illustrate—via our particular context and estimates—how the framework can be used for applied analysis.

We explore these trade-offs in the context of the U.S. Medicare program, the public health insurance program for the elderly and the disabled. Since about 2011, Medicare has rapidly expanded the use of alternative models for reimbursing healthcare providers, such as accountable care organizations, bundled payments, and primary care coordination models. By 2016, over 30% of Traditional Medicare spending was based on alternative payment models (Shatto 2016). With few exceptions, provider participation in these payment models has been voluntary (GAO 2018). There is an ongoing and active policy debate over whether these programs should be made mandatory, but this debate has focused primarily on concerns about selection on levels and has ignored the potential salutary benefits from selection on slopes (e.g., Gronniger et al. 2017; Levy, Bagley, and Rajkumar 2018; King 2019; Frakt 2019b; Liao, Pauly, and Navathe 2020).

We analyze the Medicare bundled payments program for hip and knee replacement, known as Comprehensive Care for Joint Replacement (CJR). Hip and knee replacement is a large category, with almost half a million procedures and $10.7 billion in Medicare spending in 2014. Under bundled payments, Medicare makes a single payment to the hospital for all services related to the episode of care, including the initial hospital stay and physician fees, as well as any subsequent care by other medical providers during the recovery period. By contrast, under the status quo fee-for-service (FFS) system, Medicare makes separate payments to different providers based on the care provided. The idea behind the bundled payments reform is that by making the hospital the residual claimant on the costs related to the entire episode of care, the hospital will internalize the incentive to provide care efficiently, including coordinating with downstream providers. In practice, this was implemented by providing financial bonuses or penalties—known as reconciliation payments—to hospitals when the submitted Medicare FFS claims for the episode deviated from the bundle price.

CJR was initially designed by Medicare administrators as a mandatory-participation, five-year randomized trial. Randomization was conducted at the metropolitan statistical area (MSA) level. In the 67 treatment MSAs, hospitals were paid under the bundled payments program. In the 104 control MSAs, hospitals were paid under the FFS system. The program was implemented as designed in April 2016. However, toward the end of the second year of the program, Medicare announced that participation would be made voluntary in half the treated MSAs (CMS 2017), and about three-quarters of the affected hospitals subsequently opted out.

We begin by providing descriptive evidence on the mandatory and voluntary regimes. In the mandatory regime, we closely follow prior analyses and find, consistent with them, that bundled payments caused, on average, a modest reduction in submitted Medicare claims, driven predominantly by reduced discharges to post-acute care (PAC) facilities (Finkelstein et al. 2018; Lewin Group 2018; Barnett et al. 2019; Haas et al. 2019). Once reconciliation payments were taken into account, however, the mandatory bundled payment regime had essentially no impact on government expenditures. We also explore and document heterogeneity across hospitals in the effect of the payment reform, as well as the nature of selection into the program once it became voluntary. Consistent with selection on levels, we find that hospitals with lower claims under FFS—which, holding behavior constant, would benefit more from bundled payments—are more likely to remain in the program. Consistent with selection on slopes, we find that hospitals that achieved larger reductions in claims under mandatory bundled payments are less likely to opt out when participation became voluntary.

Motivated by these patterns, we specify and estimate a stylized model of responsiveness to and selection into the bundled payment program. In the model, hospitals are characterized by a hospital-specific “level” (average claims per episode under FFS incentives) and a hospital-specific “slope” (the reduction in claims under bundled payments). Under a voluntary regime, the selection decision depends on the hospital-specific bundle price—the bundled payment the hospital receives from the government under the program—as well as on the hospital’s level and slope parameters. The random assignment in years 1–2 of the program, when participation was mandatory, identifies the levels and slopes, and the voluntary decision in year 3 identifies the selection equation.

We estimate that average episode claims under the status quo FFS incentives would be about $25,000 and that bundled payments reduced claims, on average, by about $250 per episode. These averages, however, mask substantial heterogeneity across hospitals in both levels and slopes. Heterogeneity is particularly large in levels, where the standard deviation across hospitals in claims under FFS incentives is about $5,000. Observed bundle prices do not come close to capturing this heterogeneity, thus making selection on levels the key driver of the participation decision once participation becomes voluntary.

We use the estimated model to compare outcomes and social welfare under the observed voluntary and mandatory programs as well as under alternative counterfactual designs. We define social welfare as the sum of consumer surplus and producer (hospital) profits minus the social cost of public spending. We assume that consumer surplus is not affected by the payment regime; this is consistent with evidence from the randomized trial that healthcare quality, patient mix, and patient volume did not change with bundled payments (Finkelstein et al. 2018; Lewin Group 2018; Barnett et al. 2019; Haas et al. 2019). We define the social cost of public spending as government (i.e., Medicare) spending multiplied by the shadow cost of public funds, which we assume (conservatively) to be 0.15. This “cost of public funds” generates the key trade-off in designing a voluntary bundled payment model: higher bundle prices will induce more hospitals to participate and increase productive efficiency, but will involve higher government spending, which is socially costly. Producer surplus and government spending can be calculated directly from the data and estimated model parameters.
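One stylized way to write this objective (the notation and exact accounting here are ours, not a formal statement of the model) is

$$ W \;=\; CS \;+\; \sum_h \pi_h \;-\; (1+\lambda)\,G, \qquad \lambda = 0.15, $$

where $CS$ is consumer surplus (held fixed across payment regimes), $\pi_h$ is hospital $h$'s profit, $G$ is Medicare spending, and $\lambda$ is the shadow cost of public funds. Because each dollar of Medicare spending shows up as provider revenue inside profits but is assumed to cost taxpayers $1+\lambda$ dollars, a pure transfer of one dollar to hospitals lowers welfare by $\lambda$; this is the sense in which transfers that do not change behavior are socially costly.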

The model estimates, like the descriptive work, indicate substantial selection on levels: hospitals that opt into the voluntary bundled payment regime would have had (counterfactual) average episode claims under the FFS status quo that were substantially lower (relative to the bundle price) than hospitals that do not opt in. This selection on levels is offset by a small amount of favorable selection on slopes and by social savings from a “too low” bundle price. On net, we estimate that the voluntary bundled payments regime modestly raises social surplus relative to the FFS status quo.

We also compare the surplus from the observed bundle prices to counterfactuals with better and worse targeting. Within the range defined by no targeting at the bottom and the best feasible targeting at the top, we find that the observed bundle prices generate approximately two-thirds of the feasible gains. We also discuss how defining the bundle more narrowly could effectively allow for better targeting of the bundle price.

Although the midstream regulatory change to the program we study is unique from a research perspective, the regulator’s problem of whether to allow for voluntary participation is ubiquitous. Our empirical estimates are naturally specific to our setting, but the payment reform we analyze is quite similar in nature to a host of alternative payment models that Medicare has been introducing over the past decade (CMS 2020) and is continuing to introduce through a mixture of voluntary and mandatory models.1 Incentive programs with voluntary participation are widespread in other sectors. For example, recent work has analyzed selection by landowners into voluntary incentive programs for providing environmental services (Jack and Jayachandran 2019), by polluting firms into whether to pay taxes based on their disclosed and verifiable emissions or the average emission rate among nondisclosers (Cicala, Hémous, and Olsen 2021), by private schools into whether to accept public vouchers (DeAngelis, Burke, and Wolf 2019), and by residential electricity consumers into whether to face a constant or time-varying regulated electricity price schedule (Ito, Ida, and Tanaka 2021).

Our article relates to several distinct literatures. In addition to our conceptual debt to the theoretical literature on optimal regulation discussed above, our empirical analysis of the potential for selection not only on levels but also on slopes relates to work in labor economics on selection on gains (Heckman and Honoré 1990). Within health economics, our work contributes to the growing literature on the impact and optimal design of financial incentives for healthcare providers (e.g., Cutler 1995; Gaynor, Rebitzer, and Taylor 2004; Clemens and Gottlieb 2014; Ho and Pakes 2014; Einav, Finkelstein, and Mahoney 2018; Eliason et al. 2018; Gaynor, Mehta, and Richards-Shubik 2020). It also relates to work on so-called selection on moral hazard—that is, consumer selection of health insurance plans based not only on levels but also on slopes (Einav et al. 2013, 2016; Shepard 2020; Marone and Sabety, 2021); here, we examine selection on moral hazard from the provider side rather than the consumer side.

Most narrowly, the article contributes to the literature on the effect of Medicare bundled payment programs. This includes several recent evaluations of the program we study focusing on random assignment under mandatory participation (e.g., Finkelstein et al. 2018; Lewin Group 2018; Barnett et al. 2019; Haas et al. 2019; Einav et al. 2020a; Wilcock et al. 2021). It also includes evaluations of the much larger number of voluntary participation bundled payment programs for a host of conditions, including coronary bypass, prenatal care, cancer, and hip and knee replacement.2 It is well understood that nonrandom selection into voluntary models can bias the evaluation of these programs (e.g., Gronniger et al. 2017; Levy, Bagley, and Rajkumar 2018). Our focus here is on proposing (and applying) a framework that allows us to quantify the impact of such programs, accounting for this selection.

The rest of the article proceeds as follows. Section II provides background on our setting. Section III describes the data and presents descriptive evidence of the impact of bundled payments under mandatory participation as well as the nature of hospital selection once the program became voluntary. Section IV presents a stylized model of selection into a voluntary bundled payment program. Section V presents the econometric specification of the model and describes its identification and estimation. Section VI presents our main results. The last section concludes. All appendix material can be found in the Online Appendix.

II. Setting

II.A. Medicare Bundled Payment Programs

Medicare is the public health insurance program for the elderly and the disabled in the United States. We focus on the Traditional Medicare program, which provides coverage to about two-thirds of Medicare enrollees. In 2017, Traditional Medicare (hereafter “Medicare”) had 38.7 million enrollees and annual expenditures of $377 billion (CMS 2019).

Throughout most of its history, Medicare has paid providers on a fee-for-service (FFS) basis, in which providers are reimbursed based on claims submitted for services. For instance, for a patient undergoing hip replacement, Medicare might make separate payments to the hospital for the initial hospital stay, the surgeon for performing the procedure, and the skilled nursing facility for post-acute care, as well as additional payments for each postoperative visit by the surgeon or for renting a wheelchair during the recovery period. Moreover, in most of these categories, the payment would depend on the specific services provided.3

Over the past decade, Medicare has responded to concerns that the FFS system may encourage excessive healthcare use by attempting to shift providers toward alternative payment models, such as accountable care organizations (ACOs), bundled payments, and primary care coordination models. By 2016, over 30% of Traditional Medicare spending was based on these alternative models (Shatto 2016).

Our focus is on bundled payments, which represent a middle ground between FFS and fully capitated models, such as ACOs, in which providers are paid a fixed per capita amount per annum. Under bundled payments, Medicare makes a predetermined, single payment to one provider for all services related to a clearly defined episode of care. Episodes typically start with an acute-care hospital stay (e.g., for hip replacement surgery) and include most subsequent care during the recovery period. The payments are sometimes adjusted to reflect predictable variation in patient health or costs in the local medical market. The contracts may also be structured to limit risk exposure for the hospital.

Proponents argue that by providing a single, fixed reimbursement, bundled payments will improve coordination of care and reduce unnecessary healthcare utilization. Yet some are concerned that because providers do not receive marginal payments, they may cut back on necessary care or cherry-pick low-cost patients (Cutler and Ghosh 2012; Fisher 2016).

Most prior studies of bundled payments have been observational, focusing on the experience of a small number of hospitals that voluntarily participated. Many of these studies have found large government savings associated with bundled payments (e.g., Cromwell, Dayhoff, and Thoumaian 1997; Newcomer et al. 2014; Doran and Zabinski 2015; Froemke et al. 2015; Dummit et al. 2016; Navathe et al. 2017; Carroll et al. 2018). However, voluntary participation makes separating treatment from selection difficult, and the small number of participating hospitals raises concerns about generalizability (Gronniger et al. 2017; King 2019).

II.B. Comprehensive Care for Joint Replacement (CJR)

We focus on the Medicare bundled payment program for hip and knee replacement, known as CJR. Hip and knee replacement (also referred to in the medical literature as lower extremity joint replacement, or LEJR) is a large Medicare category; in 2014, the year before CJR was announced, Medicare covered almost half a million LEJR procedures, accounting for about 5% of Medicare admissions and inpatient spending (Finkelstein et al. 2018).

Under CJR, an episode begins with a hospital stay in a qualifying diagnosis-related group (DRG) and ends 90 days after hospital discharge. Medicare pays the hospital a predetermined bundle price for the episode. The hospital is then financially responsible for medical claims over the entire episode (except for care deemed clearly unrelated). By contrast, under the status quo FFS regime, Medicare pays the hospital a fixed amount for the hospital stay, and reimburses the surgical procedure and postdischarge care separately based on those providers’ submitted claims.

The level and targeting of the bundle price are key design elements in a bundled payment program. Let $b_h$ denote the average per episode bundle price at hospital $h$ in a given year, and let $y_h$ denote average per episode claims submitted that year. Under FFS, Medicare pays $y_h$ on average. Under bundled payments, Medicare pays $b_h$ on average. More specifically, under bundled payments, providers continue to submit claims and receive reimbursement of $y_h$ on average as if they were under FFS, allowing us to observe $y_h$ even under bundled payments. At the end of the year, hospitals under bundled payments receive a reconciliation payment of $b_h - y_h$ per episode, so that the gross Medicare payment is $b_h$.
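To make the mechanics concrete, the two regimes can be summarized as follows; under both, providers submit claims, and the regimes differ only in the year-end reconciliation:

$$ \text{FFS payment} \;=\; y_h, \qquad \text{bundled payment} \;=\; \underbrace{y_h}_{\text{claims paid as submitted}} \;+\; \underbrace{(b_h - y_h)}_{\text{reconciliation}} \;=\; b_h. $$

As a purely hypothetical illustration (these numbers are not from our data), if a hospital's bundle price $b_h$ is $26,000 and its average submitted claims $y_h$ are $24,500, it receives a reconciliation payment of $1,500 per episode at year end; if instead claims average $27,000, it owes Medicare $1,000 per episode.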

Under CJR, Medicare tried to set the bundle price before each program year to be slightly lower than expected per episode claims under FFS.4 To do so, the bundle price was set at a small discount off a weighted average of historical hospital and regional (defined by the nine census divisions) per episode claims from three prior reference years, with the weight on the regional component increasing over time from one-third in the first two years of the program to 100% in the last two years. The discount factor was designed to reflect Medicare’s portion of expected savings from CJR.
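In stylized notation (ours, suppressing several regulatory details), the bundle price for hospital $h$ in program year $t$ was therefore approximately

$$ b_{ht} \;\approx\; (1 - d)\,\Big[(1 - w_t)\,\bar{y}^{\,\text{hist}}_{h} \;+\; w_t\,\bar{y}^{\,\text{hist}}_{r(h)}\Big], $$

where $d$ is the small discount factor, $\bar{y}^{\,\text{hist}}_{h}$ and $\bar{y}^{\,\text{hist}}_{r(h)}$ are average per episode claims over the three reference years for hospital $h$ and for its census-division region $r(h)$, and the regional weight $w_t$ rises from one-third in program years 1–2 to one in the final two years.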

We abstract in our analyses from two other features of CJR. First, to mitigate concerns that bundled payments would create incentives to shirk on quality, hospitals were only eligible for positive reconciliation payments if they met a minimum quality standard. However, in practice the quality standard was not binding for the vast majority of the hospitals.5 In addition, prior research has not detected effects on either incentivized or nonincentivized measures of quality (Finkelstein et al. 2018).

Second, like most bundled payment programs, CJR is not a pure bundled payment model that exposes hospitals and Medicare to unbounded risk. Rather, to limit risk exposure for both parties, the reconciliation payment was subject to stop-loss and stop-gain provisions. Specifically, if $b_h - y_h$ is less than (the negative of) the stop-loss amount, the hospital “only” has to pay Medicare the stop-loss amount. Similarly, if $b_h - y_h$ is greater than the stop-gain amount, Medicare “only” has to pay the hospital the stop-gain amount.6 These provisions complicate the model and its estimation presented later, but they do not affect the qualitative economic analysis and do not seem to be quantitatively important (see Einav et al. 2020b), so we abstract from them throughout.
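In our notation, letting $\ell$ and $g$ denote the stop-loss and stop-gain amounts, the reconciliation payment with these provisions can be written compactly as

$$ R_h \;=\; \min\big\{\max\{\,b_h - y_h,\ -\ell\,\},\ g\big\}, $$

so the hospital's per episode loss is capped at $\ell$ and its per episode gain is capped at $g$.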

II.C. Experimental Design

CJR was initially designed by CMS as a five-year, mandatory-participation, randomized trial. Year 1 was defined as April 1 to December 31, 2016, and years 2–5 were defined as the 2017–2020 calendar years. CMS randomized 196 eligible MSAs into treatment (bundled payments) or control (status quo FFS). Specifically, MSAs were divided into eight strata based on the interaction of historical LEJR spending quartile and above- versus below-median MSA population. MSA treatment probabilities varied by strata (ranging from 30% to 45%), with higher treatment probabilities for strata with higher historical LEJR payments. CMS announced assignment to treatment and control in the July 2015 Federal Register (CMS 2015b). Treatment and control MSAs are balanced on outcome variables and MSA characteristics (Finkelstein et al. 2018). After exclusions, the program covered 67 treatment MSAs and 104 control MSAs.7 Within the 171 MSAs assigned to treatment or control, a small number of hospital types and episode types were further excluded from eligibility (see Section III for more details).

Participation was mandatory in treatment MSAs: eligible hospitals had no choice but to be reimbursed under the new bundled payment model. This mandatory participation feature was immediately controversial, with then U.S. Representative Tom Price spearheading a letter in September 2016, signed by 179 members of Congress, complaining that mandatory participation was unethical and unauthorized (The Hill 2016). Subsequently, as the new secretary of Health and Human Services—the federal agency charged with overseeing Medicare—Price led the effort to roll back mandatory-participation bundled payment models. As a result, in a rule finalized in December 2017, Medicare decided to cancel two previously scheduled mandatory bundled payment models (Advancing Care Coordination Through Episode Payment and Cardiac Rehabilitation Incentive Payment Models) and modified CJR to be voluntary in half of the treated MSAs for the remaining three program years (CMS 2017).

Specifically, hospital participation in CJR was made voluntary in 33 of the 67 treatment MSAs with the lowest episode claims under FFS (using 2011–2014 data, before random assignment was announced). In these “voluntary treatment” MSAs, hospitals had to make a one-time decision at the beginning of program year 3 of whether to opt in and continue to be paid under bundled payments for the remaining three program years. If they did not opt in, reimbursement would revert to FFS for the remaining three years. About a quarter of the hospitals in the voluntary bundled payment MSAs (73 out of 279) chose to remain under bundled payments, which is typical of participation rates in other voluntary Medicare payment programs, where rates have ranged from about 15% to one-third (Lewin Group 2015). In the 34 mandatory bundled payment MSAs, hospitals did not face a choice and continued to be paid under bundled payments. Control group hospitals were unaffected by this change and continued to be paid under FFS.

In the analysis that follows, we define three time periods. Period 1 is the period before bundled payments when all providers were reimbursed under FFS. Period 2 covers the approximately two years when the mandatory participation regime was in effect. Period 3 is defined as the final three years of the program, when the program was voluntary for some hospitals.8

Figure I shows a flow chart of the experimental design. The middle part of the figure shows the initial assignment to treatment and control for period 2, when the program was mandatory, and the bottom part shows period 3, where treatment MSAs were divided into mandatory and voluntary treatment groups. Because this division was based on predetermined historical MSA spending, we can analogously divide the control MSAs into mandatory and voluntary control MSAs based on this variable. For some of the subsequent analysis, we compare mandatory treatment to mandatory control and voluntary treatment to voluntary control.

Figure I
Experimental Design

The figure shows the design of the bundled payments experiment. The top part shows the preprogram period. The middle part shows the initial mandatory design in program years 1–2. The bottom part shows the partially voluntary design in program years 3–5. Episode shares are based on data from program years 1–2. *Control group MSAs were assigned to mandatory versus voluntary by the authors, using the same criterion that CMS used to assign treatment group MSAs to mandatory versus voluntary. Specifically, the bottom half of the MSAs in the control group—based on historical spending over the period July 1, 2011, through June 30, 2014—were assigned to voluntary.

III. Data and Descriptive Evidence

In this section, we describe the data and sample and present evidence of the average effects of bundled payments during the mandatory participation period. We present evidence on heterogeneity in levels and slopes across hospitals and on selection on levels and slopes during the voluntary period. These patterns motivate our subsequent modeling decisions.

III.A. Data and Sample

Our main data are the 100% Medicare enrollment and claims files from 2013 to 2018. These contain basic demographic information (age, race, sex, and Medicaid enrollment) and claims for inpatient, outpatient, and post acute care. The claims data include information on Medicare payments made to providers and out-of-pocket payments owed, dates of admission and discharge, diagnoses, and discharge destinations.

We supplement these data with several additional data sources. First, we obtained data from the CJR website on the eligibility and treatment status of each hospital in each year, the hospital’s annual bundle price, annual reconciliation payment, and whether the hospital opted into bundled payments when it became voluntary in 2018 (CMS Innovation Center n.d.). When participation is voluntary (in 2018), we only observe bundle prices for hospitals that select in or are mandated to participate in bundled payments. In Online Appendix A, we describe how we impute bundle prices for the hospitals that do not select into the program.9 Second, we use data from the 2016 American Hospital Association (AHA) annual survey on the number of beds, ownership type (for-profit, nonprofit, or government-owned), and the teaching status of the hospital. Third, we obtained data from Hospital Compare on each hospital’s official quality measures (for 2016 and 2017).

We limit our analysis sample to the 171 eligible MSAs and, within these MSAs, to hospitals and episodes that were eligible for CJR. MSAs were excluded primarily due to a low volume of hip and knee replacements. Within treatment and control MSAs, hospitals were excluded from CJR if they were already participating in a preexisting Medicare voluntary bundled payment model for LEJR. Episodes were excluded if the patient did not have Medicare as the primary payer, was readmitted during the episode for LEJR, or died during the episode. Finkelstein et al. (2018) provide more detail on these eligibility criteria.

We define period 2 (the period of mandatory participation bundled payments) to include all episodes with an index admission between April 1, 2016, and September 15, 2017. The start date corresponds to the program start date, and the end date was chosen so that nearly all 90-day episodes would end by December 31, 2017, the close of the second year. The end date also ensures that all admissions (and most discharges) occurred before the December 2017 announcement that participation would become voluntary for some MSAs starting on January 1, 2018 (CMS 2017). Similarly, we define period 3 to include all episodes admitted between January 1, 2018 and September 15, 2018. Following prior work (Finkelstein et al. 2018), we define period 1 (the period when all hospitals are under FFS) to include all episodes admitted between April 1, 2013, and September 15, 2014, and omit 2015 from the analysis to avoid contamination from potential anticipatory effects; treatment and control MSAs were announced in July 2015 (CMS 2015a).

To construct our baseline sample, we start with the universe of 1,584 hospitals in the 171 treatment and control MSAs that had a CJR episode during period 2; these hospitals had a total of 396,643 CJR episodes during period 2. So that we can observe outcomes in all three periods, we restrict the sample to the 1,416 hospitals that have at least one CJR episode in each period. These 1,416 hospitals constitute our baseline sample, of which 647 are located in a treatment MSA and 769 in a control MSA. A total of 379,150 CJR episodes in period 2 occur at these 1,416 hospitals.

III.B. Average Treatment Effects

Average effects of CJR in the two-year mandatory participation period (period 2) have been well studied (Finkelstein et al. 2018; Lewin Group 2018; Barnett et al. 2019; Haas et al. 2019). Because the program was mandatory and assignment was random at the MSA level, we follow Finkelstein et al. (2018) and estimate:
$$\textit{outcome}_{j2} = \beta_0 + \beta_1 BP_j + X_{j1}^{\prime}\gamma + \delta_{s(j)} + \varepsilon_{j} \tag{1}$$
where $\textit{outcome}_{j2}$ is the average per episode outcome in MSA $j$ and period 2, $BP_j$ is an indicator for being randomly assigned to bundled payments, and $\beta_1$ is the average treatment effect of bundled payments. The vector $X_{j1}$ contains lagged outcomes from 2013 and 2014 (i.e., period 1), included as controls to improve statistical power. Because the probability of random assignment to treatment varied across strata, we include strata fixed effects, $\delta_{s(j)}$, to isolate the experimental variation. In all tables, we report heteroskedasticity-robust standard errors.
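As an illustration of how a specification like equation (1) could be estimated, the sketch below uses MSA-level data; the data frame and column names (outcome_p2, outcome_p1, bp, stratum) are hypothetical placeholders rather than our actual variable names.

```python
import statsmodels.formula.api as smf

# msa_df: a pandas DataFrame with one row per MSA and (hypothetical) columns
#   outcome_p2 - average per-episode outcome in period 2
#   outcome_p1 - lagged (period 1) outcome, included to improve power
#   bp         - 1 if the MSA was randomly assigned to bundled payments
#   stratum    - randomization stratum (fixed effects isolate the experiment)
model = smf.ols("outcome_p2 ~ bp + outcome_p1 + C(stratum)", data=msa_df)
results = model.fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(results.params["bp"], results.bse["bp"])  # ATE estimate and its standard error
```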

Table I shows the average treatment effects. To provide a baseline, the first two columns show the mean and the standard deviation of the outcome from the control group in period 2. The remaining columns show the average treatment effect, standard error, and p-value of the estimate.

TABLE I

Experimental Estimates during the Mandatory Participation Period

                                                      Control    Control      Average            Standard
                                                      mean       std. dev.    treatment effect   error      p-value
Panel A: Claims, utilization, and gov’t spending (per episode)
  Claims                                              25,294     3,603        −790               204        .001
    Claims for index admission                        13,542     2,389        −169               89         .060
    Claims for institutional PAC                      4,119      1,378        −499               128        .001
    Claims for home health                            1,800      918          −89                59         .131
    Other claims                                      5,832      532          28                 55         .610
  Utilization
    Number of days in index admission                 2.6        0.4          −0.1               0.0        .217
    Number of days in institutional PAC               7.7        2.3          −0.6               0.2        .014
  Discharge destination
    Institutional PAC                                 0.313      0.104        −0.034             0.009      .001
    Home health agency                                0.339      0.196        0.004              0.018      .812
    Home (w/o home health agency)                     0.329      0.232        0.042              0.018      .020
    Other                                             0.019      0.032        −0.004             0.002      .052
  Government spending                                 25,294     3,603        40                 208        .848
Panel B: Quality measures
  Complication rate                                   0.011      0.005        0.001              0.001      .255
  ER visit during episode                             0.198      0.027        0.003              0.003      .399
  90-day all cause readmission rate                   0.102      0.015        −0.001             0.002      .725
Panel C: Admissions and patient composition
  LEJR admissions (per 1,000 enrollees)               29.9       15.8         −0.8               0.5        .095
  CJR-eligible LEJR admissions (per 1,000 enrollees)  23.6       11.3         0.1                0.5        .889
  Elixhauser comorbidity score                        2.4        0.3          0.0                0.0        1.00

Notes. The table shows results from estimating equation (1) by OLS on period 2 data; the regression includes strata fixed effects and lagged outcomes from period 1. Standard errors are heteroskedasticity robust. Control means and standard deviations are from period 2. “Claims” include patient cost-sharing. The complication rate is defined, as in Finkelstein et al. (2018), as the share of CJR-eligible patients who have at least one of eight underlying complications that go into the total hip arthroplasty/total knee arthroplasty 90-day complication measure used in the targeted quality score.

1. Healthcare Claims, Use, and Government Spending

Table I, Panel A examines effects on healthcare claims, healthcare use, and government spending per episode. “Claims” consist of Medicare claims paid and patient cost sharing owed over the entire episode of care but do not account for any reconciliation payment associated with a bundled payment; they thus correspond to $y_h$ in the notation from Section II.B and are measured as average per episode claims. Average per episode claims in the control group were about $25,300, with roughly half of this spending on the index admission, which is already reimbursed with a DRG-based, prospective payment under the status quo. Of the remaining $11,800, about $4,100 comes from postdischarge claims for institutional post-acute care (PAC)—predominantly skilled nursing facilities (SNFs)—$1,800 represents postdischarge claims for home health care, and the remaining $5,800 includes categories such as claims for the surgeon and other physicians (both inpatient and outpatient), hospice, and durable medical equipment, such as wheelchair rental.

We estimate that bundled payments reduced average episode claims by about $800, or about 3%, a statistically significant but economically modest result. This reduction is primarily driven by a statistically significant $500 decline in claims for institutional PAC (12% of the control mean), with no statistically or economically significant effects on other categories of claims. The effects on claims are reflected in the effects on utilization. Bundled payments did not affect average length of stay for the index admission, but they decreased the unconditional average number of days spent in institutional PAC by about 0.6 days (8% of the control mean).

The decline in use of institutional PAC in turn reflects, at least in part, an extensive-margin response in patients’ discharge destination following their index admission. In the control group, patients are discharged to institutional PAC, home with home healthcare, and home without home healthcare in equal proportion. Bundled payments reduced discharges to institutional PAC by a statistically significant 3.4 percentage points (11%). The decline in discharges to institutional PAC is accompanied by a similarly sized increase in discharges to home without home health. This can be either because the patients who would have been sent to institutional PAC are being sent home without home health, or because there is a cascading effect where the patients who would have been sent to institutional PAC are being sent home with home health, and patients who would have been sent home with home health are now being sent home without home health support. A cascading effect seems more likely (to us), but we cannot differentiate between these two mechanisms. These experimental estimates are consistent with qualitative evidence on how hospitals respond to CJR. In a survey of hospital executives and administrators, Zhu et al. (2018) find that hospitals report responding by reducing SNF discharges using risk stratification and home care support and by forming networks of preferred SNFs to influence quality and costs, conditional on discharge.

Although bundled payments reduced episode claims as they would be paid under FFS, actual government spending (claims, $y_h$, under FFS and the bundle price, $b_h$, under bundled payments) did not change. The point estimate indicates a statistically insignificant increase of $33 (standard error of $208) in average government spending per episode. The lack of a reduction in government spending—despite modest reductions in submitted claims—reflects the design feature of bundled payments, according to which bundle prices were set to approximate counterfactual claims under FFS.

2. (Lack of) Quality Shirking and Cream Skimming

The primary concern with bundled payments is that because providers are no longer paid on the margin, they will cut back on medically necessary care, cherry-pick patients who have lower costs of provision, or both. The quality incentives provided by the program, as well as physician ethics, reputational concerns, and the threat of malpractice lawsuits, may limit any quality response. Indeed, to the extent that low-quality care increases downstream costs, hospitals may have incentives to improve quality. Consistent with prior work on CJR, the bottom two panels of Table I show no evidence of an impact of CJR on quality of care or patient composition.

Panel B examines three measures of quality: a clinically defined complication rate, whether the patient had an emergency room visit during the episode, and 90-day all-cause readmission. We estimate a fairly precise zero effect on all of these measures. Of course, in interpreting this evidence, it is important to bear in mind that the quality measures available in claims data are limited. We cannot, for example, measure outcomes such as morbidity, mobility, or activities of daily living in our data. However, surveys of patients at the end of their episode of care show similar improvements in functional status and pain in the treatment and control groups (Lewin Group 2020), which is consistent with clinical trial evidence that patient mobility following knee surgery is not improved by inpatient (as opposed to outpatient) rehabilitation (Buhagiar et al. 2017). Given this lack of evidence of a quality response on the margins we can observe, in our subsequent model and counterfactual exercises, we assume that quality remains fixed and that patients’ utility is unaffected by the hospital’s response to incentives.

Panel C examines patient volume and composition. We estimate a precise zero effect on the number of LEJR admissions per 1,000 Medicare enrollees and the number of CJR-eligible admissions. We examine patient composition by estimating effects on the Elixhauser Comorbidity Score of the patient pool, which is constructed as the sum of indicators for 31 comorbidities (Elixhauser et al. 1998; Quan et al. 2005). We estimate a precise zero effect on this measure as well. The lack of a patient volume response is consistent with LEJR as a nondiscretionary procedure, or at least a procedure where the change in financial incentives from bundled payments is small relative to other determining factors.

The lack of cream skimming may also reflect the fact that assignment to bundled payments in period 2 is determined at the MSA level, so that the closest substitutes for a given hospital are likely to be paid under the same regime. In principle, cream-skimming responses could potentially be different in period 3 once participation is voluntary, as there may now be hospitals paid under bundled payments and under FFS in the same MSA. Indeed, in a different voluntary payment model, Alexander (2020) documents that physicians strategically direct patients across hospitals in a local area to maximize revenue. We therefore looked at whether hospitals that opted to remain under the bundled payment model experienced changes in their patient volume or patient composition relative to hospitals in the same market that opted out of the bundled payment regime. We might expect physicians to direct more complex patients toward hospitals that are no longer under bundled payment while steering less complex (and hence lower expected cost) patients toward hospitals that remained under bundled payment. However, Online Appendix Table A.1 finds no clear evidence of such effects, at least during the first year of the voluntary regime, which is covered by our data.

Because our article focuses on selection—which is a hospital-level choice—all of our remaining analyses are conducted at the hospital level, with hospitals weighted by the number of episodes in period 2 so that the results are representative of the average episode. Online Appendix Table A.2 shows that the main results from Table I are largely similar when estimated at the MSA level or at the hospital level (in both cases weighted by the number of episodes), although some of the point estimates shrink in magnitude. In addition, one of our three quality measures (the complication rate) shows a statistically significant increase of 0.2 percentage points (off a base of 1.1%) in one of the three specifications in Online Appendix Table A.2.

3. (Lack of) Dynamic Effects

Online Appendix Table A.3 explores the dynamics of these treatment effects. We focus on the outcomes where the treatment effects are noticeable: total episode claims, institutional PAC claims, and the share discharged to institutional PAC. For these outcomes, we report treatment effects for four roughly six-month intervals within period 2 (2016–2017) for all MSAs (Panel A) and voluntary MSAs (Panel B). In addition, when we analyze the time pattern of treatment effects for mandatory MSAs (Panel C), we extend the analysis to add two additional time periods in 2018 (the first year of period 3). The results indicate that the treatment effects occur immediately and are largely stable over time, albeit with slight (but statistically insignificant) upward trends for total episode and institutional PAC claims.

The immediate and relatively stable treatment effects suggest that dynamics (e.g., learning by doing) are unlikely to be first order. The similar time patterns for the treatment effects across voluntary and mandatory MSAs (compare Online Appendix Table A.3, Panels B and C) suggest that there was little anticipatory response to the opportunity to opt out of CJR in period 3 (finalized in December 2017). That is, there is no evidence that the treatment effects fade out in voluntary MSAs in anticipation of many of these hospitals opting out in 2018.

More generally, the patterns are consistent with qualitative evidence from Zhu et al. (2018) and the Lewin Group (2018, 2019a, 2019b, 2020) on how hospitals responded to the incentives from bundled payments. Based on semistructured interviews with hospital executives and administrators, these authors report that hospitals endeavored to reduce spending on institutional PAC through a mix of fixed cost and variable cost activities. To reduce discharges to institutional PAC, hospitals employed presurgery patient education and physical therapy (so-called prehabilitation), both primarily variable costs. To control spending conditional on discharge to institutional PAC, hospitals worked to form networks of preferred SNFs and to monitor patients at these SNFs. Although the formation of networks is primarily a fixed cost, building data platforms and hiring care coordinators to follow up with PAC facilities and track patient suitability for discharge involves a mix of variable and fixed costs.

III.C. Heterogeneity in Levels and Slopes

The average treatment effects reported in the last section mask substantial heterogeneity in the levels and slopes across hospitals. Table II examines this heterogeneity, focusing again on total episode claims, institutional PAC claims, and the share discharged to institutional PAC.

TABLE II

Correlates of Levels and Slopes

Panel A: Heterogeneity in levels
                               Claims             Claims for           Probability of
                                                  institutional PAC    discharge to PAC
  Mean (std. dev.)             28,357 (5,998)     5,814 (3,021)        0.455 (0.187)
  Coefficient (std. err.) from bivariate regression:
    Number of CJR episodes     −5.31 (1.56)       −3.39 (0.69)         −0.0001 (0.0001)
    Quality                    −441 (41)          −197 (18)            −0.010 (0.001)
    Number of beds             3.90 (1.21)        0.62 (0.32)          0.0001 (0.00003)
    Teaching                   4,528 (599)        561 (258)            0.049 (0.021)
    For-profit                 −3,030 (660)       −387 (304)           −0.064 (0.025)
    Nonprofit                  −219 (596)         369 (264)            0.008 (0.023)

Panel B: Heterogeneity in slopes
                               Claims             Claims for           Probability of
                                                  institutional PAC    discharge to PAC
  Mean (std. dev.)             −1,154 (3,054)     −1,011 (1,809)       −0.057 (0.105)
  Coefficient (std. err.) from bivariate regression:
    Number of CJR episodes     −0.85 (0.57)       −1.10 (0.37)         −0.00003 (0.00002)
    Quality                    −123 (24)          −77 (15)             −0.004 (0.001)
    Number of beds             0.53 (0.46)        0.09 (0.20)          0.000004 (0.00001)
    Teaching                   819 (525)          −212 (241)           −0.025 (0.017)
    For-profit                 −1,012 (651)       −414 (292)           −0.030 (0.020)
    Nonprofit                  −32 (617)          −160 (246)           −0.012 (0.018)

Notes. Panel A reports hospital-specific levels (hospital-specific period 1 outcomes) and coefficients from separate regressions of hospital-specific levels on hospital characteristics. Panel B reports hospital-specific slopes (obtained from estimating equation (2)) and coefficients from separate regressions of hospital-specific slopes on hospital characteristics. In both panels, the coefficients on for-profit and nonprofit are obtained from the same regression, where government-owned is the omitted category. All regressions are weighted by the number of episodes in period 2. Robust standard errors are shown in parentheses.

To examine hospital-specific levels, in principle we would want to examine the level for each outcome in period 2. Because treated hospitals change behavior in response to bundled payments (see Table I), we do not observe unaffected spending levels. The model developed in the following sections allows us to formally recover counterfactual spending levels. To provide model-free evidence, for now we simply look at outcomes in period 1, when all hospitals were paid under FFS. Since levels are strongly autocorrelated within hospitals over time, period 1 levels are a good proxy for levels in period 2.10

To construct hospital-specific slopes (i.e., the behavioral changes in response to bundled payments in period 2), we estimate a modified version of equation (1) that allows the treatment effect of bundled payments to vary by hospital. Letting $\textit{outcome}_{h2}$ denote the average episode outcome for hospital $h$ in period 2, we estimate:
$$\textit{outcome}_{h2} = \beta_0 + \beta_{1h} BP_h + X_{h1}^{\prime}\gamma + \delta_{s(h)} + \varepsilon_{h} \tag{2}$$
where $BP_h$ is an indicator for being randomly assigned to bundled payments and $\beta_{1h}$ is the hospital-specific treatment effect. As in equation (1), we include lagged outcomes ($X_{h1}$) as covariates to improve statistical power, although in this specification the lags are defined at the hospital level rather than the MSA level. As before, we also include strata fixed effects because randomization was conducted within strata. We estimate this specification on the set of voluntary treatment and control hospitals (see Figure I).
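The following sketch shows one way to recover hospital-specific slopes of this form; the column names and the construction of a separate treatment indicator for each treated hospital are our illustrative choices, not a description of the authors' code.

```python
import numpy as np
import statsmodels.formula.api as smf

# hosp_df: a pandas DataFrame with one row per hospital and (hypothetical) columns
#   outcome_p2, outcome_p1, bp, stratum, hospital_id, episodes_p2

# Each treated hospital gets its own treatment indicator; all control hospitals
# share the omitted "control" category, so each estimated coefficient is that
# hospital's treatment effect (beta_1h) relative to the controls.
hosp_df["bp_hospital"] = np.where(
    hosp_df["bp"] == 1, hosp_df["hospital_id"].astype(str), "control"
)
results = smf.wls(
    "outcome_p2 ~ C(bp_hospital, Treatment(reference='control'))"
    " + outcome_p1 + C(stratum)",
    data=hosp_df,
    weights=hosp_df["episodes_p2"],  # weight hospitals by period 2 episodes
).fit()
slopes = results.params.filter(like="bp_hospital")  # hospital-specific slopes
```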

The top row of Table II reports the mean and standard deviation of the hospital-specific levels and slopes. The means of the hospital-specific level and slope estimates are similar to the control means and average treatment effects reported in Table I, and the standard deviations indicate substantial heterogeneity in levels and slopes across hospitals.11 The remaining rows of Table II report coefficients from bivariate regressions of these hospital-specific levels and slopes on hospital characteristics. Panel A shows that levels for each outcome are lower at hospitals with more CJR episodes, a higher quality index, and for-profit ownership, but higher at larger hospitals, teaching hospitals, and nonprofits. The patterns are similar in Panel B, where we examine associations with hospital slopes. That is, the hospital characteristics that are associated with low levels are also associated with larger decreases in outcomes in response to the incentives from bundled payments.12

Although there are clear patterns in these associations, hospital characteristics only explain a small share of the overall variation in levels and slopes. In Online Appendix B, we describe a variance decomposition exercise that quantifies the explanatory power of the hospital characteristics. The accompanying Online Appendix Table A.5 shows that the hospital-level characteristics explain little of the cross-hospital variation. Specifications with all of the characteristics, along with strata and MSA fixed effects, explain only a quarter of the variation in levels and even less of the variation in slopes. Fully saturated specifications that additionally control for all available patient characteristics still leave at least half of the variation unexplained.
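A simple way to gauge explanatory power along these lines is to compare R-squared values across progressively richer regressions; the sketch below illustrates the idea (column names are hypothetical, and this is not the exact procedure of Online Appendix B).

```python
import statsmodels.formula.api as smf

# hosp_df as above, plus a "level" column (the period 1 outcome) and
# (hypothetical) characteristics: n_episodes, quality, beds, teaching,
# for_profit, nonprofit, stratum, msa
covars = "n_episodes + quality + beds + teaching + for_profit + nonprofit"
specs = {
    "characteristics only":    f"level ~ {covars}",
    "+ strata fixed effects":  f"level ~ {covars} + C(stratum)",
    "+ MSA fixed effects":     f"level ~ {covars} + C(msa)",
}
for name, formula in specs.items():
    fit = smf.wls(formula, data=hosp_df, weights=hosp_df["episodes_p2"]).fit()
    print(f"{name}: R^2 = {fit.rsquared:.2f}")  # share of cross-hospital variation explained
```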

In addition to elucidating the specific activities that hospitals engage in to respond to the incentives from bundled payments, the interviews conducted by Zhu et al. (2018) and the Lewin Group (2018, 2019a, 2019b, 2020) also shed light on some of the sources of the remaining unexplained heterogeneity in slopes (i.e., treatment effects). Some hospitals reported that prior experience with bundled payments (e.g., with private insurance) or with other value-based performance reforms in Medicare left them well placed to make changes, while others noted that their prior efforts left less scope for further improvement (Lewin Group 2018). There was more general agreement across hospitals on the importance of having a “physician champion” who would coordinate and lead efforts for change in response to the bundled payment (Lewin Group 2019b). Hospitals also noted the importance of obtaining buy-in from surgeons, which was generally easier when surgeons were employed by the hospital but could also be achieved when surgeons were not employees (Lewin Group 2020).

III.D. Selection on Levels and Slopes

As we formalize in the next section, hospitals have incentives to select into CJR on both levels and slopes. By selection on levels, we mean that hospitals have a larger incentive to select in if their average claims, holding behavior fixed at what it would be under FFS, would be below their bundle price. By selection on slopes, we mean that hospitals have a larger incentive to select in if they can more easily reduce their average claims below the bundle price. We present descriptive evidence on both margins, examining how the decision among voluntary treatment hospitals to select in or out of bundled payments in period 3 correlates with episode claim levels in period 1 (levels) and behavioral responses to bundled payments in period 2 (slopes). Table III presents the results.

TABLE III

Selection

                                                    Voluntary    Voluntary    Voluntary     p-value of select-in vs.
                                                    control      select-in    select-out    select-out difference
                                                    (1)          (2)          (3)           (4)

Number of hospitals                                 323          73           183
Number of episodes in period 1                      51,469       14,664       24,777
Percent of episodes in period 1                                  37.2%        62.8%

Panel A: Selection on levels (period 1 outcomes per episode)
 Claims                                             26,524       26,146       27,776        .03
                                                    (5,367)      (4,176)      (5,497)
 Claims for institutional PAC                       4,811        4,681        5,551         .01
                                                    (2,402)      (2,054)      (2,446)
 Share discharged to institutional PAC              36.9%        37.3%        41.6%         .10
                                                    (16.0%)      (16.2%)      (14.5%)

Panel B: Selection on slopes
 Impact on episode claims                                        −791         −665          .73
                                                                 (1,931)      (2,826)
 Impact on institutional PAC claims                              −518         −176          .05
                                                                 (973)        (1,474)
 Impact on share discharged to institutional PAC                 −3.3%        −1.2%         .12
                                                                 (7.8%)       (9.2%)

Panel C: Selection on hospital characteristics
 Mean number of CJR episodes                        306          320          252           .13
 Mean number of beds                                339          320          362           .43
 Teaching                                           18.7%        4.3%         16.1%         .02
 For-profit                                         17.5%        26.6%        12.5%         .06
 Nonprofit                                          78.5%        63.4%        68.6%         .56
 Government-owned                                   4.0%         10.0%        18.9%         .20
 Mean quality score                                 11.7         13.1         10.6          .001

Notes. The table reports means (standard deviations in parentheses). In Panel A, all outcomes are defined in period 1 (although the bundle price is defined in period 2). Panel B reports the average (and standard deviation) of the hospital-specific slopes (obtained from estimating equation (2)). In Panel C, the mean number of CJR episodes is based on period 1 data. The mean number of beds and the percentages of hospitals that are teaching, for-profit, nonprofit, and government-owned are taken from the 2016 American Hospital Association annual survey; we are unable to match three hospitals to this survey. For these outcomes, the numbers of hospitals in the control, select-in, and select-out groups are 323, 73, and 182, respectively. Finally, the mean quality score is based on a modified version of the hospitals' composite quality scores from period 2, which is based on the first 18 points of the score. The p-values of the differences are computed from a simple t-test of equality of the means.


Table III, Panel A shows how the selection decision varies with period 1 levels. Specifically, we show mean outcomes and their standard deviations for three groups of hospitals. Column (1) shows hospitals in the voluntary control group, which we define as control group hospitals that would have been assigned to the voluntary arm based on their prior spending levels (see Figure I). Columns (2) and (3) show period 1 outcomes for hospitals in the voluntary treatment group, split by whether they selected into or out of bundled payments in period 3. The three rows report results for the three outcomes for which we observed a statistically significant effect of bundled payments in Table I: episode claims, claims for institutional PAC, and share of patients discharged to institutional PAC.

The results, which are similar to those in Wilcock et al. (2021), are consistent with selection on levels. In Table II we showed substantial heterogeneity across hospitals in period 1 levels, indicating potential scope for selection. Table III, columns (2) and (3) show that, as expected, hospitals that select into bundled payments have, on average, about $1,600 lower average episode claims than those that select out, a statistically significant difference of about 6% of the control mean. The patterns are similar for institutional PAC claims and for the share discharged to institutional PAC.

To assess selection on slopes, we use the estimated hospital-specific slopes from Table II (i.e., the estimated behavioral changes in response to bundled payments in period 2) and examine how selection into bundled payments in period 3 varies with this measure. Table III, Panel B shows the results. Specifically, we show the average estimated hospital-specific treatment effects (β1h) and their standard deviations separately for hospitals that select into bundled payments (column (2)) and those that select out (column (3)). The results once again show selection in the expected direction: for hospitals that selected into bundled payments in period 3, bundled payments in period 2 reduced average claims per episode by $791, compared with a smaller reduction ($665) for hospitals that revert to FFS in period 3. However, these differences in average slopes are not statistically distinguishable (column (4)). Selection is also in the expected direction for the other two outcomes in Panel B: hospitals that experienced greater declines in institutional PAC claims and in the share of patients discharged to institutional PAC due to bundled payments are more likely to remain under bundled payments. The difference in the period 2 effect on institutional PAC claims between those who remain in bundled payments in period 3 (a $518 reduction) and those who select out (a $176 reduction) is statistically distinguishable (p-value = .05).13

Table III, Panel C briefly examines other characteristics of hospitals that select in and select out of bundled payments. Hospitals that select into bundled payments in period 3 have a somewhat higher volume of CJR episodes in period 1, suggesting there may be fixed costs to remaining in the program, a point we return to with our model specification in Section V. Hospitals that select in are less likely to be teaching hospitals, more likely to be for-profit, and less likely to be government-owned; they are also associated with higher measured quality.

IV. Model of Voluntary Selection

IV.A. Setting

We consider a pool of CJR episodes, indexed by i, which are admitted to hospital h. We assume throughout that this pool is taken as given and is known to the hospital.

Under FFS, providers are reimbursed based on claims. Let λi denote the claims generated under FFS incentives by a given episode. The preceding sections' description of the institutional environment and the estimates of the average effects of bundled payments suggest that it is useful to decompose $\lambda_i = f_i^{HOSP} + f_i^{OTH}$, where $f_i^{HOSP}$ are the fixed, DRG-based claims submitted for the index hospitalization and $f_i^{OTH}$ are the claims submitted by PAC and other downstream providers. Let $c_i^{HOSP}$ denote the costs incurred by the hospital and $c_i^{OTH}$ the costs incurred by the other providers. For tractability, we assume that other providers are reimbursed at cost, so that $f_i^{OTH}=c_i^{OTH}$.14 In what follows, for each variable xi, we focus on hospital-level averages, defined as $x_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_i$, where nh is the number of episodes at the hospital.

1. Hospital Profits and Participation Incentive

Under FFS, average government spending (i.e., Medicare reimbursement) per episode is $\lambda_h = f_h^{HOSP} + f_h^{OTH}$. Hospitals incur costs, and receive Medicare payment, only for care delivered in the hospital, so they earn profits $\pi_h^{FFS} = f_h^{HOSP} - c_h^{HOSP}$. Under bundled payments, Medicare reimburses the admitting hospital the fixed bundle price bh for the entire episode, so average government spending per episode is bh. Hospitals are effectively required to incur not only hospital costs $c_h^{HOSP}$ but also downstream providers' claims $f_h^{OTH}$, which would have been reimbursed by Medicare under FFS. We assume that the hospital can reduce claims outside of the hospital by e by exerting "effort" φh(e), where φh(0) = 0, $\phi^{\prime}_h > 0$, and $\phi^{\prime\prime}_h > 0$.15 Hospitals thus choose effort to maximize
(3) $\pi_h^{BP}(e) = b_h - \left(c_h^{HOSP} + f_h^{OTH} - e\right) - \phi_h(e),$
and optimal effort is pinned down by $\phi^{\prime}_h(e_h^*)=1$. Because the hospital internalizes both the social marginal cost and benefit of effort, bundled payments induce the first-best effort level.

For tractability, we assume that the cost of effort is quadratic, of the form $\phi_h(e) = \frac{e^2}{2\omega_h}$, where ωh > 0 is a hospital-specific parameter. With this assumption, under bundled payments the hospital's optimal choice of effort is $e_h^*=\omega_h$, average claims are $f_h^{HOSP} + f_h^{OTH} - \omega_h = \lambda_h-\omega_h$, and hospital profits are $\pi_h^{BP} = b_h - (c_h^{HOSP} + f_h^{OTH} - \frac{\omega_h}{2})$.
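To fill in the intermediate step, these expressions follow directly from the first-order condition of equation (3) under the quadratic effort cost:
$$\phi_h(e) = \frac{e^2}{2\omega_h} \;\Rightarrow\; \phi^{\prime}_h(e_h^*) = \frac{e_h^*}{\omega_h} = 1 \;\Rightarrow\; e_h^* = \omega_h, \qquad \phi_h(e_h^*) = \frac{\omega_h}{2},$$
and plugging back into equation (3) gives
$$\pi_h^{BP} = b_h - \left(c_h^{HOSP} + f_h^{OTH} - \omega_h\right) - \frac{\omega_h}{2} = b_h - \left(c_h^{HOSP} + f_h^{OTH} - \frac{\omega_h}{2}\right).$$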

Hospitals select into a voluntary bundled payment program, denoted by the indicator BPh = 1, if and only if $\pi_h^{BP} > \pi_h^{FFS}$. Substituting in yields the criterion
(4) $\left(b_h - \lambda_h\right) + \frac{\omega_h}{2} > 0,$
where the left-hand side of the inequality is the sum of a level effect (bh − λh) and a slope effect ($\frac{\omega_h}{2}$). The level effect (bh − λh) represents the transfer payment hospitals would receive from the government under bundled payments relative to FFS if they did not change their behavior from what it was under FFS. The slope effect ($\frac{\omega_h}{2}$) denotes the net savings that hospitals obtain from changing behavior under bundled payments, that is, the reduced provider costs $e_h^*=\omega_h$ net of the effort cost that reduction entails ($\frac{\omega_h}{2}$). These incentives are well understood by the hospital industry. For example, ArborMetrix, a healthcare consulting firm, advises its client hospitals to consider the following questions when deciding whether to participate in a bundled payments program: "One: 'How good is my target price?' Two: 'What has changed [since the target prices were set]?' And three: 'What is my opportunity to improve?'" (ArborMetrix n.d.)
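As a minimal numerical illustration of the criterion in equation (4), with made-up numbers rather than estimates from our data:

```python
def selects_into_bundled_payments(b, lam, omega):
    """Participation criterion from equation (4): (b - lambda) + omega/2 > 0.

    b:     bundle price per episode
    lam:   average episode claims under FFS incentives (the hospital's "level")
    omega: effort-cost parameter; with quadratic effort cost, omega is the optimal
           claims reduction and omega/2 is the hospital's net savings (the "slope").
    """
    level_effect = b - lam        # transfer holding behavior fixed
    slope_effect = omega / 2.0    # net savings from re-optimizing behavior
    return level_effect + slope_effect > 0

# A hospital paid $400 below its FFS claims stays out unless its slope is large enough.
print(selects_into_bundled_payments(b=25_000, lam=25_400, omega=600))    # False: -400 + 300 < 0
print(selects_into_bundled_payments(b=25_000, lam=25_400, omega=1_000))  # True:  -400 + 500 > 0
```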

2. Social Welfare

The distinction between selection on levels and slopes has important implications for the social welfare consequences of voluntary programs. We define social welfare W as the sum of consumer surplus (S) and producer profits (π), minus government spending (G) weighted by one plus the marginal cost of public funds Λ > 0:
(5) $W = S + \pi - (1+\Lambda)\, G.$
The multiplier Λ > 0 captures the deadweight loss associated with raising government revenue through distortionary taxation. Alternatively, it can be thought of as capturing a societal preference for money in the hands of the government (or consumers) rather than in the hands of hospitals.16 Consistent with the descriptive results in Section III, we assume that the hospital’s effort does not affect patient welfare (S).
Government spending per episode (G) is bh under bundled payments and λh under FFS. Plugging in for these and for hospital profits implies that hospital participation in bundled payments improves social welfare if and only if
(6) $\frac{\omega_h}{2} - \Lambda\left(b_h - \lambda_h\right) > 0.$

Equation (6) illustrates the key social welfare trade-off. On the one hand, bundled payments incentivize hospitals to exert the first-best level of effort $e_h^*=\omega_h$, which increases social welfare by $\frac{\omega_h}{2}$. On the other hand, enticing hospitals to participate in bundled payments increases government spending by bh − λh, which is associated with a social cost of Λ per dollar transferred. In other words, because of the cost of public funds Λ, and the need to ensure that hospitals are willing to participate in bundled payments, hospital participation is not always social-welfare enhancing.

IV.B. Graphical Intuition

We illustrate the setting graphically in Figure II, which depicts the participation incentives for hospitals and the corresponding social-welfare implications. Hospitals are represented by a {λh, ωh} pair. If one could mandate participation without any additional government costs, the welfare-maximizing outcome would be to mandate that all hospitals join the BP program (given that ωh is positive, by design, for all hospitals). However, if participation is voluntary and Medicare’s ability to encourage participation rests on the financial incentive, bh, the trade-off is represented in Figure II.

Figure II
Hospital Selection into Bundled Payment and Social Welfare Implications

The figure shows, for a given bundle price b, the hospital participation decision and social welfare implications as a function of the hospital's level (λ), shown on the horizontal axis, and slope (ω), shown on the vertical axis.

To draw the figure, we hold the bundle price b fixed across hospitals. At this payment, the solid line represents the set of hospitals that are indifferent between participation in bundled payments and FFS. Hospitals to the left prefer bundled payments, because the sum of the transfer holding their behavior constant (b − λ) and the savings they get under bundled payments ($\frac{\omega}{2}$) is positive. Hospitals to the right of the solid line prefer to remain under FFS. Thus, hospitals have both a simple "level" incentive to participate (if bh − λh > 0) and an additional "slope" incentive, which explains why the solid line slopes up. All else equal, a higher ωh provides an additional incentive for the hospital to join the bundled payment program because it captures some of the savings it can generate.

The dashed line in Figure II represents the set of hospitals for which social welfare is the same whether they participate in bundled payment or FFS. While the slope effect ($\frac{\omega}{2}$) enters identically (and positively) into the private participation condition (equation (4)) and the social-welfare condition (equation (6)), the level effect (b − λ) enters positively into the hospital's participation decision (equation (4)) but negatively into the social-welfare calculus (equation (6)). This explains why the dashed line is downward sloping and illustrates the central social-welfare tension in designing a voluntary regime: enticing providers to participate can be socially costly.

Taken together, Figure II partitions hospitals into three groups: hospitals that choose the FFS regime, hospitals that efficiently select into bundled payments, and hospitals that select into bundled payments inefficiently because they get paid much more than they “should” but do not generate significant efficiency gains (due to low ωh).
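A small sketch of this partition, combining the private criterion in equation (4) with the social-welfare criterion in equation (6); the cost-of-funds value is the one we use in the counterfactuals below (Λ = 0.15), and the example hospitals are hypothetical:

```python
LAMBDA = 0.15  # marginal cost of public funds, as in the counterfactuals below

def classify(b, lam, omega, cost_of_funds=LAMBDA):
    """Classify a hospital into the three regions of Figure II."""
    participates = (b - lam) + omega / 2.0 > 0                 # equation (4)
    welfare_gain = omega / 2.0 - cost_of_funds * (b - lam)     # equation (6), relative to FFS
    if not participates:
        return "stays under FFS"
    return "efficient selection into BP" if welfare_gain > 0 else "inefficient selection into BP"

print(classify(b=24_000, lam=25_000, omega=800))    # -1,000 + 400 < 0  -> stays under FFS
print(classify(b=24_500, lam=25_000, omega=1_200))  # -500 + 600 > 0; 600 + 75 > 0 -> efficient
print(classify(b=27_000, lam=25_000, omega=400))    # 2,000 + 200 > 0; 200 - 300 < 0 -> inefficient
```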

IV.C. Targeting

The bundled payment program aligns effort incentives. If Medicare could generate participation without any additional public expenditure, it would be social-welfare improving to do so (since we assume ωh > 0). However, in a voluntary regime Medicare must respect the hospitals' participation constraint. If Medicare has perfect information about {λh, ωh}, it could maximize social welfare by setting $b_h=\lambda_h - \frac{\omega_h}{2}$ for each hospital. Under these bundle prices, all hospitals would voluntarily participate and government spending would be lower.
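To spell out the step behind this claim: at $b_h=\lambda_h-\frac{\omega_h}{2}$ the participation condition in equation (4) holds with equality, so every hospital (weakly) opts in, while the per episode welfare gain in equation (6) becomes
$$-\Lambda\left(b_h-\lambda_h\right)+\frac{\omega_h}{2} = \Lambda\,\frac{\omega_h}{2}+\frac{\omega_h}{2} = (1+\Lambda)\,\frac{\omega_h}{2} > 0,$$
and government spending per episode falls by $\frac{\omega_h}{2}$ relative to FFS.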

Once information about the joint distribution of {λh, ωh} is incomplete, setting the payment amount involves a trade-off, similar to the one in the classic optimal regulation design problem of Laffont and Tirole (1993). Figure III illustrates this trade-off in our setting. In Panel A we start with Figure II and superimpose on it the participation and social welfare indifference sets that are associated with a higher bundle price b′ > b. The black (solid and dashed) indifference lines that correspond to b′ are analogous to the gray lines, which correspond to b. Naturally, the higher payment amount increases the share of hospitals that select into bundled payments. For many of the marginal hospitals that opt in, participation increases social welfare. At the same time, however, the greater payment increases the social welfare cost associated with inframarginal participants, by a fixed amount of Λ(b′ − b), and in doing so makes participation social-welfare reducing for some of these hospitals (those that lie between the two dashed lines).

Figure III
Model Illustration

The figure illustrates some of the key analytics in voluntary bundled payment design as a function of the hospital's level (λ), shown on the horizontal axis, and slope (ω), shown on the vertical axis. Panel A illustrates the trade-offs involved in setting a higher bundle price b′ > b; Panels B–D consider the effect of different primitives and targeting, with Panels B and C comparing outcomes with higher versus lower ω and Panels C and D comparing outcomes with more versus less unobserved heterogeneity in λ.

The ability to effectively target depends on the underlying joint distribution of {λh, ωh} as well as any information about this joint distribution that the social planner can condition on in setting bundle prices. This is shown in the remaining panels of Figure III, which use ovals to illustrate three examples of underlying joint distributions, conditional on (priced) observables.

Comparing Panels B and C shows the importance of the overall level of ωh. When ωh is high (i.e., the cost of effort associated with reducing claims is lower and therefore optimal effort under bundled payment is higher), as in Panel B, the participation incentives of the hospital and the social planner are more closely aligned. In this case, even if λh is heterogeneous after conditioning on available information, it is easier to generate social-welfare-enhancing participation in bundled payment. However, when ωh is low (i.e., the cost of effort is higher and optimal effort is lower), as in Panel C, selection is primarily driven by levels, and it is difficult to generate social-welfare-enhancing participation by hospitals. In this case, generating social welfare gains through the voluntary bundled payment program requires much less heterogeneity in λh, or much more precise information about it.

Comparing Panels C and D shows the importance of the relative heterogeneity in λh and ωh. Because the primary policy instrument is a fixed payment, large heterogeneity in λh, as in Panel C, leads to more inefficient selection into bundled payment. In contrast, if the primary source of heterogeneity is the “slope” ωh, as in Panel D, voluntary participation is more likely to generate social welfare gains.

The joint distribution of λh and ωh, conditional on (priced) observables, is key in assessing the potential social-welfare gains from alternative payment models. It is therefore the object of interest in our econometric exercise in the next section.

V. Specification and Results

V.A. Econometric Specification

We turn to econometrically estimating the economic model presented in the last section. Recall the way we defined the periods associated with the experimental setting. In period 1, all hospitals are still under FFS, and assignment to bundled payment has not been announced. In period 2, a subset of the hospitals are randomly assigned to bundled payment. In period 3, a subset of these latter hospitals endogenously choose to remain under bundled payment while the rest switch back to FFS.

For each hospital in the sample, we observe three periods of claims data {yh1, yh2, yh3} and three periods of bundled payment participation indicators {BPh1, BPh2, BPh3}, such that in period 1 BPh1 = 0 for all hospitals, in period 2 BPh2 depends on the treatment assignment, and in period 3 BPh3 stays the same as in period 2 for all hospitals, except for those in the voluntary treatment group, which endogenously choose their period 3 participation status. For hospitals randomly assigned to bundled payment in period 2, we also observe the hospital-specific bundle prices in period 2 and period 3 (even if they select out of the program), {bh2, bh3}.

The model in Section IV defines each hospital by two hospital-specific parameters (level and slope). To accommodate the panel nature of the data, we maintain the assumption that hospitals are associated with these level and slope parameters, {λh, ωh}. Although we assume that ωh is fixed over time (consistent with the evidence presented in Online Appendix Table A.3) and is known to the hospital, we allow the realized level parameter to vary over time, such that
(7) $\ln \lambda_{ht} = \ln \lambda_h + \gamma_t + \epsilon_{ht},$
where {γ1, γ2, γ3} are period-specific shifters (with a normalization of γ2 = 0), and εht is drawn (i.i.d.) from $N(0,\sigma_\epsilon^2)$. Thus, using the model of Section IV, these assumptions imply that claims are given by
(8) $y_{ht} = \lambda_{ht} - \omega_h \cdot BP_{ht}.$
We also assume that hospitals know their level type λh and the period-specific shifters (γ1 and γ3) but do not have any ex ante information about the random shock εht.
We further assume that {λh, ωh} are drawn from a joint log-normal distribution with a correlation parameter ρ, so that
(9) $\begin{pmatrix} \ln \lambda_h \\ \ln \omega_h \end{pmatrix} \sim N\!\left( \begin{pmatrix} x_h^{\prime}\beta^{\lambda} \\ x_h^{\prime}\beta^{\omega} \end{pmatrix},\; \begin{pmatrix} \sigma_{\lambda}^2 & \rho\,\sigma_{\lambda}\sigma_{\omega} \\ \rho\,\sigma_{\lambda}\sigma_{\omega} & \sigma_{\omega}^2 \end{pmatrix} \right),$
where xh is a vector of hospital characteristics as well as strata fixed effects.17 Because claims for the index admission are fixed, we restrict ωh from being infeasibly large by truncating it from above at 0.71λh.18
Finally, we use the model of Section IV to specify the participation decision:
(10) $BP_{h3} = \mathbf{1}\left\{ \left(b_{h3} - \lambda_h e^{\gamma_3}\right) + \frac{\omega_h}{2} + \nu_h > 0 \right\}.$
The participation decision (which applies only in period 3) follows the model we described in Section IV, with two changes. First, because hospitals do not know εh3 at the time they make their participation decision, they do not observe λh3 and instead base their decision on the components they do observe (namely, λh and γ3).19 The second change is that we introduce a hospital-level choice shifter, νh, into the participation equation, which we assume to be drawn (i.i.d.) from $N(x_h^{\prime}\beta^{\nu},\sigma_\nu^2)$. For now, we remain agnostic as to what this choice shifter represents and, in particular, whether it only affects hospital choice or is also relevant for social welfare. In our counterfactual analyses in Section VI, we consider both possibilities.

To summarize, our econometric model is fully characterized by equations (7), (8), and (10), and the parametric (normal and log-normal) distributional assumptions we described. It can be thought of as three equations, one for λ, one for ω, and one for ν, each associated with mean shifter parameters β and a variance parameter σ. The remaining model parameters are γ1, γ3, σε, and ρ.
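As a compact way to see how equations (7), (8), and (10) fit together, the following sketch simulates data from the model. The parameter values are round numbers in the spirit of our estimates (Tables IV and V), and the bundle-price rule is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h = 1_000

# Hospital types: (ln lambda_h, ln omega_h) joint normal with correlation rho, as in equation (9)
mu_lam, sig_lam = 10.17, 0.19
mu_om, sig_om = 4.9, 1.75
rho, sig_eps = 0.14, 0.073
gamma = {1: 0.067, 2: 0.0, 3: 0.015}

cov = [[sig_lam**2, rho * sig_lam * sig_om],
       [rho * sig_lam * sig_om, sig_om**2]]
ln_lam, ln_om = rng.multivariate_normal([mu_lam, mu_om], cov, size=n_h).T
lam = np.exp(ln_lam)
omega = np.minimum(np.exp(ln_om), 0.71 * lam)    # truncation from above, as in the text
nu = rng.normal(-7_700, 31_000, size=n_h)        # choice shifter
bp2 = rng.integers(0, 2, size=n_h)               # random assignment in period 2
b3 = lam * np.exp(gamma[3]) - 2_000              # hypothetical period 3 bundle prices

def claims(t, bp):
    """Equation (8): y_ht = lambda_ht - omega_h * BP_ht, with ln lambda_ht from equation (7)."""
    lam_t = np.exp(ln_lam + gamma[t] + rng.normal(0.0, sig_eps, size=n_h))
    return lam_t - omega * bp

y1 = claims(1, np.zeros(n_h))
y2 = claims(2, bp2)

# Equation (10): period 3 participation among period 2 treatment hospitals
bp3 = (bp2 == 1) & ((b3 - lam * np.exp(gamma[3])) + omega / 2 + nu > 0)
y3 = claims(3, bp3.astype(float))
```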

V.B. Identification

The conceptual identification of the model is fairly straightforward given random assignment to bundled payments in period 2. The model has a set of parameters that correspond to the level of claims under FFS incentives and its evolution over time: βλ, σλ, γ1, γ3, σε; a set of parameters that correspond to the reduction in claims under bundled payment: βω, σω; a correlation parameter that relates the levels and slopes: ρ; and choice shifter parameters: βν, σν. The intuition for identification follows in three steps.

First, using data from the control group alone, in which we observe λh1, λh2, and λh3 for the same set of hospitals, we can identify βλ, γ1, γ3, σλ, and σε. Intuitively, these would be identified from a standard random effects model estimated on the control group. We can use the control group alone to estimate these parameters because random assignment guarantees that parameters estimated from the control group are valid for the entire sample.

Second, using the mandatory assignment part of the data (that is, periods 1 and 2), we can identify βω, σω, and ρ. We observe λh1 and λh2 for hospitals in the control group, and λh1 and $\lambda_{h2} - \omega_h$ for hospitals in the treatment group. Because of random assignment we know that γ1, which is identified off the control group, is valid for the whole sample. For treatment hospitals, the average difference between λh1 and $\lambda_{h2} - \omega_h$ in excess of γ1 identifies βω.

The dispersion within the treatment hospitals in the change in episode claims between period 1 and period 2 is driven by a combination of the stochastic evolution of λht and the dispersion in ωh. Since the stochastic evolution of λht is already identified from the control group, we can (loosely) net it out, and the residual dispersion for treatment hospitals identifies σω. The intuition for identifying ρ is similar: we observe the reduction in claims for each hospital in the treatment group, can correlate it with the hospital’s period-1 claims, and adjust it appropriately for the additional independent noise driven by the stochastic evolution of λht, which is already identified by the control group.

Finally, the bundled payment participation equation identifies the distribution of the remaining choice shifter parameters βν and σν. This equation resembles a probit equation, but the error term has an economic interpretation as reflecting hospitals’ profit-maximizing choices. The joint distribution of λh and ωh, which is identified from the previous two steps, together with our model, generates predictions for the overall participation rate in a voluntary bundled payment program. Any systematic deviation from this “predicted” participation rate identifies βν and σν.

V.C. Estimation

We estimate the model using a Markov chain Monte Carlo (MCMC) Gibbs sampler. Without the participation equation, the model is linear and resembles a random effects model, so estimation using traditional methods such as maximum likelihood would be fairly straightforward. However, the participation equation makes estimation more difficult, as it requires us to numerically integrate over two-dimensional random effects, which combined with the large number of parameters introduced to capture observed heterogeneity across hospitals, makes optimization infeasible in our computing environment.

In contrast, the Gibbs sampler is tractable because our econometric model is fully parameterized and can be rewritten in a hierarchical manner, augmenting λh, ωh, and νh as (pseudo) parameters of the model. This means that we can break the simulation from the posterior distribution into smaller steps. Conditional on draws of λh, ωh, and νh, the model is linear and fairly standard. Drawing values from the posterior distribution of each component of λh, ωh, and νh, conditional on the other two components, is reasonably simple. Online Appendix C provides complete details of the estimation procedure.

We verified the estimation method using simulated data. Reassuringly, it also produced estimates similar to those from a simpler version of the model that we estimated by maximum likelihood in a previous version of the article (Einav et al. 2020b). The Gibbs sampler converges to a stable (posterior) distribution after several thousand simulations. We therefore ran the Gibbs sampler for 100,000 iterations and used the last 90,000 to construct our parameter estimates. For each parameter, we report the posterior mean and posterior standard deviation, which we refer to interchangeably as the parameter estimate. For other quantities of interest, we compute each quantity simulation by simulation and then report the posterior mean and posterior standard deviation, thus capturing any underlying correlation among parameters. Online Appendix D provides more details on how we map the MCMC estimates to the results reported below.

V.D. Results

We present the results in Table IV, Table V, and Figure IV.20 We start in Table IV by presenting the parameter estimates. Because the parameter estimates are not always economically interpretable, in Table V we present summary statistics on the distribution of key economic objects. Finally, in Figure IV, we present empirical analogues of the selection figures we used to illustrate the model in Section IV.

Figure IV
Model Estimates

The figure reports the empirical analogues of Figures II and III. Specifically, it reports simulated hospitals based on our estimates for the 256 hospitals in the voluntary treatment group (see Figure I), with circles proportional to the number of episodes. Panel A reports results assuming there are no bundle prices and the choice shifter νh in the hospital selection equation is not decision-relevant. Selection decisions are characterized by levels (λh3), shown on the horizontal axis, and slopes (ωh), shown on the vertical axis. Panel B considers the role of targeted bundle prices by plotting $\lambda_{h3} - b_h + \overline{b}$ on the horizontal axis. In this panel, in addition to netting out the bundle price bh, we add the average bundle price $\overline{b}$ so that the axis remains on the same scale as in Panel A.

TABLE IV

Parameter Estimates

                          ln(λ) equation         ln(ω) equation         ν equation
                          Mean       Std. dev.   Mean       Std. dev.   Mean       Std. dev.

Panel A: Equation-specific parameters
 Constant*                10.165     0.005       4.895      0.293       −7,984     12,616
 ln(CJR episodes)         −0.066     0.004       −0.559     0.145       4,930      7,601
 ln(Beds)                 0.050      0.006       0.502      0.247       1,172      5,065
 Quality score            −0.169     0.022       4.895      1.217       43,473     54,507
 Teaching                 0.017      0.002       −0.034     0.084       −2,275     3,939
 For-profit               −0.008     0.002       0.078      0.066       3,912      4,517
 Government-owned         −0.002     0.001       −0.102     0.060       −621       1,322
 Nonprofit                omitted category       omitted category       omitted category
 Strata fixed effects     yes                    yes                    yes
 σ                        0.139      0.003       0.727      0.117       24,669     27,548

Panel B: Additional model parameters
 γ1                       0.067      0.004
 γ2                       normalized to 0
 γ3                       0.015      0.003
 σε                       0.073      0.001
 ρ                        0.143      0.196

Notes. The table reports posterior means and posterior standard deviations of the model parameters (parameter estimates), estimated using a Gibbs sampler. Panel A reports parameter estimates associated with the three hospital-specific components of the model (λh, ωh, and νh). Hospital characteristics are demeaned so that the constant term can be interpreted as the mean of that component of the model. Panel B shows estimates of the time trend (γt) and other non-hospital-specific parameters. *The constant is the episode-weighted average of strata-specific constants.


TABLE V

Posterior Distributions

                               E(x)       SD(x)      P5         P25        P50        P75        P95

Panel A: All hospitals
 ln(λh)                        10.17      0.19       9.89       10.04      10.14      10.27      10.51
 ln(ωh)                        4.895      1.747      1.835      3.717      5.020      6.179      7.528
 λh3                           27,028     5,960      19,621     23,043     25,921     29,786     38,238
 ωh                            485        1,180      7          44         160        500        1,902

Panel B: Hospitals in the voluntary treatment group only
 λh2                           25,247     4,527      19,532     22,207     24,390     27,370     33,723
 λh3                           25,517     5,109      19,333     21,949     24,597     27,841     35,277
 bh                            23,659     3,744      19,530     21,174     22,785     25,110     31,141
 bh − λh·e^γ3                  −1,961     2,640      −6,728     −3,224     −1,706     −358       1,390
 ωh                            246        611        5          24         73         218        1,014
 (bh − λh·e^γ3) + ωh/2         −1,838     2,623      −6,593     −3,074     −1,584     −258       1,477
 νh                            −7,719     31,103     −59,838    −28,472    −7,227     13,454     42,939

Notes. The table presents summary statistics on the distribution of economic objects, weighting each hospital by the number of episodes per hospital so that the statistics are representative. Panel A shows statistics for all hospitals. Panel B shows statistics for hospitals in the voluntary treatment group, which is the group for which we can observe bundle prices. See Online Appendix D for more details on how the model estimates are reported.


Table IV, Panel A presents parameter estimates associated with the three hospital-specific components of the model (ln λh, ln ωh, and νh). We demean the hospital characteristics so that the constants can be interpreted as the mean of that component of the model. Although we estimate separate constants for each stratum, for the sake of brevity we report the episode-weighted mean of the strata-specific constants in the table.

The parameter estimates for the hospital characteristics largely match the patterns from the descriptive analysis shown in Table II. First, hospital size (number of beds) and experience with CJR (episode volume) play distinct roles: large hospitals are associated with higher levels and slopes, whereas hospitals with a high volume of CJR episodes are associated with a smaller level and a smaller slope. Large hospitals also have a higher propensity to select into the bundled payment program, although all of the parameters associated with ν are imprecisely estimated. Second, high-quality hospitals have slightly lower episode spending, are more responsive to the incentives from bundled payments, and, based on the point estimate, are more likely to select into the bundled payment program. Third, for-profit hospitals respond to incentives more strongly and, based on the point estimates, are more likely to select into the bundled payment program than teaching hospitals or government-owned hospitals. Finally, via the observables, the choice shifter ν is negatively correlated with the level of spending (correlation of −0.136) and positively correlated with the slope (correlation of 0.108).21 Because the choice equation already accounts for the level and slope incentives, this suggests that hospitals may even slightly overweight the level and slope incentives in their selection decisions. We defer discussion of the constants and the variance parameters (σλ, σω, σν) to our discussion of Table V.

Table IV, Panel B shows estimates of the time trend and other non-hospital-specific parameters. The γt parameters capture the time trends in the level of spending. The time trend is U-shaped, with a large 6.7% decline in spending between period 1 and period 2 (γ1 = 0.067 relative to γ2 = 0) followed by a small 1.5% increase from period 2 to period 3 (γ3 = 0.015). The correlation between the level and slope is positive but small (ρ = 0.14). We discuss σε below.

Table V, Panel A presents summary statistics on the distribution of key economic objects across hospitals, weighting each hospital by the number of episodes so that the statistics are representative of the average episode. Among all hospitals, the mean level of claims in period 3 (E[λh3]) is $27,028, and the mean slope (E[ωh]) is $485, which is qualitatively similar to, but somewhat smaller than, the estimate of $797 in Table I. Although not statistically significant, the difference in point estimates presumably reflects some combination of the log specification and the richer structure of the estimated model. The 0.19 standard deviation of ln(λh) is large relative to the 0.073 standard deviation of the idiosyncratic component (σε = 0.073 in Table IV), indicating that the cross-hospital component is the primary determinant of the level of claims.
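As a rough check on this statement, equation (7) implies that the within-period variance of log claims under FFS incentives is the sum of the cross-hospital and idiosyncratic components, so the cross-hospital share is approximately
$$\frac{0.19^2}{0.19^2 + 0.073^2} \approx 0.87.$$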

Panel B shows summary statistics for hospitals in the voluntary treatment group. This is the group of hospitals for which we observe bundle prices, and it will be the focus of our counterfactuals in the next section. Among voluntary treatment hospitals, the mean level of claims is unsurprisingly lower than among all hospitals (recall that the voluntary treatment group was specified as treatment group hospitals from MSAs with below-median historical spending), and the slope is lower as well.

The model emphasizes the importance of heterogeneity in levels relative to the slope in determining the nature of selection and social welfare under voluntary bundled payments. The standard deviation of the level of claims ($4,527) is much larger than the mean or standard deviation of the slope ($246 and $611, respectively). While the observed bundle prices reduce the heterogeneity in levels by roughly half (the standard deviation of $b_h - \lambda_h e^{\gamma_3}$ is $2,640), the reduction is not large enough to undo the result that most of the selection is on levels. Finally, the choice shifter term ν has a large negative mean and substantial variation.

One question is whether additional observables could be used to better set the bundle prices. A comparison of the conditional standard deviation of ln λ (0.139 in Table IV) to the “unconditional” standard deviation of ln λ (0.19 in Table V) is not encouraging, as it indicates that observables account for less than one-third of the cross-hospital variation in levels.22 This underscores the difficulty in customizing bundle prices appropriately to specific hospitals, a point we return to in the next section.

To further examine the role of bundle prices, Figure IV produces empirical analogues of the selection figures (Figure III) we used to illustrate the model in Section IV, again focusing on the set of hospitals in the voluntary treatment group, for which we observe bundle prices. To provide a baseline, Figure IV, Panel A plots simulated hospitals from the joint distribution of ωh (vertical axis) and λh3 (horizontal axis), without netting out the bundle price. Consistent with the discussion of Table V, Panel A indicates that selection on levels is a primary concern, with a large mass of hospitals either selecting into bundled payments inefficiently or selecting FFS. Figure IV, Panel B examines the role of targeting by plotting $\lambda_{h3} - b_h + \overline{b}$ on the horizontal axis, which nets out the hospital's bundle price bh and adds back the average bundle price $\overline{b}$ so that the axis remains on the same scale as in Panel A. Netting out bundle prices shrinks the heterogeneity along the horizontal axis, but not by enough: a sizable mass of hospitals continues to select into bundled payments inefficiently or to select FFS.

VI. Counterfactuals

We use the estimated model to perform two sets of counterfactual exercises. First, we compare social welfare under mandatory FFS, mandatory bundled payments, and voluntary bundled payments. Second, in the voluntary bundled payments regime, we consider how alternative bundle prices affect social welfare.

The current bundled payment option offers the choice of only one bundle price to each hospital, paired with incentives that make the hospital the full residual claimant on effort. It would be natural to explore social welfare under menus of contracts that trade off higher bundle prices in return for shallower incentives, as in the classic optimal regulation problem in Laffont and Tirole (1993). However, given that, as shown in Figure IV, selection on levels rather than slopes is the primary concern and that the average slope is relatively modest, better targeting of the price level rather than better design of the screening contracts seems a more fruitful avenue to explore.

Throughout, we focus on hospitals in the voluntary treatment group, that is, treatment group hospitals that were given a choice in period 3 of whether to remain under bundled payments or revert to FFS (see Figure I), since this is the group of hospitals for which we observe bundle prices.

VI.A. Voluntary versus Mandatory

We first compare outcomes under the observed period 3 voluntary bundled payment program to two period 3 counterfactuals: all hospitals mandated to be under the status quo FFS regime (as they were in period 1) or all hospitals mandated to participate in the bundled payment program (as they were in period 2). These counterfactuals can be thought of as measuring the effect of the government’s decision to make the bundled payment program voluntary, relative to canceling the program entirely or keeping it mandatory. Throughout, we assume a social cost of funds of Λ = 0.15. Online Appendix D provides more details about how we map the estimation results to the reported quantities.

Table VI, Panel A shows the results. The first row reports the results that would occur if there were no bundled payment program and all hospitals were paid under FFS. Government spending (G in the definition of social welfare from equation (5)) averages $25,517 per episode, which corresponds to the mean λh3 reported in Table V, Panel B. The remaining entries are normalized to zero; the mandatory FFS counterfactual serves as the benchmark against which we compare other regimes.

TABLE VI

Counterfactuals

                                         Ignoring choice shifter
                                         Percent        Government   Relative       Relative          Relative
                                         selecting in   spending     social costs   hospital profit   social surplus
                                         (1)            (2)          (3)            (4)               (5)

Panel A: Mandatory vs. voluntary
 Mandatory FFS (benchmark)               0.0            25,517       0              0                 0
 Mandatory bundled payment               100.0          23,659       −2,137         −1,736            402
 Voluntary bundled payment               38.8           25,055       −532           −405              127

Panel B: Alternative voluntary regimes with different bundle prices
 Perfect targeting                       38.7           24,870       −745           −589              155
 Feasible targeting                      38.5           24,908       −700           −551              150
 Observed targeting                      38.7           25,018       −574           −440              133
 No targeting                            39.1           25,302       −248           −157              91
 Narrow bundle, no targeting             38.5           25,045       −543           −413              130

Notes. All counterfactuals are conducted on the 259 hospitals in the voluntary treatment group (see Figure I). We weight the hospital-level simulated data by the number of episodes per hospital, so that the statistics are representative of the average episode. In Panel A, the first row reports results from the counterfactual in which bundled payment does not exist and all hospitals are paid FFS, the second row reports results from a mandatory participation bundled payment counterfactual, and the third row reports results from the observed voluntary participation bundled payment regime. Panel B reports results from counterfactual voluntary participation regimes that vary in their bundle prices. Column (2) reports Medicare spending (G from equation (5)). All other columns report results relative to the FFS counterfactual. Column (3) reports relative social costs (i.e., $(1 + \Lambda)(b_h - \lambda_h)$). Columns (4) and (5) report hospital profits and social surplus relative to the FFS counterfactual, under the assumption that νh is not welfare relevant; therefore hospital profits relative to FFS are given by $(b_h - \lambda_h) + \frac{\omega_h}{2}$, and social surplus relative to FFS is given by $-\Lambda(b_h - \lambda_h) + \frac{\omega_h}{2}$.


The second row of Panel A considers a counterfactual in which all hospitals are mandated to enroll in bundled payments in period 3, as was intended under the initial design. Under mandatory bundled payments, hospitals receive a transfer of (bh − λh3) and are residual claimants on the ω-related savings they generate ($\frac{\omega_h}{2}$). If bundle prices had been calibrated to equal counterfactual claims under FFS incentives ($\mathbb{E}_h b_h = \mathbb{E}_h \lambda_{h3}$), government spending would have been unaffected relative to the baseline. However, as seen in Table V, Panel B, bundle prices bh were on average nearly $2,000 lower than FFS claims, so government spending is $1,859 lower and (multiplied by 1 + Λ = 1.15) social costs decrease by $2,137 (column (3)).

We consider two different versions of the welfare analysis, depending on whether we treat the choice shifter term, νh, as welfare relevant. The choice shifter would be welfare relevant if it represents a real hospital cost, such as fixed or variable costs of changing behavior, or perhaps uncertainty about λh, which might be greater for smaller hospitals. The choice shifter might not be welfare relevant, however, if it represents a choice friction such as status quo bias. In Table VI, columns (4) and (5), where we assume that νh is not welfare relevant, hospital profits relative to FFS decline by $1,736 (column (4)). This decline reflects the difference between the ω-related savings of $123 (half of the expected value of ωh from Table V, Panel B) and the $1,859 reduction in government payments. Social surplus rises by $402 (column (5)). In other words, the incentive effects of bundled payment (which generate $\frac{\omega_h}{2} = \$123$ in social savings on average) represent about one-third of the social gain, with the remainder coming from the reduction in government spending. Naturally, if νh is taken into account (as in Online Appendix Table A.10), both hospital profits and relative social surplus are lowered by its average value of more than $7,700 (see Table V, Panel B).
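To make the arithmetic behind these entries explicit (using the rounded averages from Table V, Panel B, so the figures match Table VI only up to rounding):
$$\Delta G = \mathbb{E}[b_h] - \mathbb{E}[\lambda_{h3}] \approx 23{,}659 - 25{,}517 \approx -\$1{,}859,$$
$$\text{relative social costs} = (1+\Lambda)\,\Delta G \approx 1.15 \times (-1{,}859) \approx -\$2{,}137,$$
$$\text{relative hospital profit} = \Delta G + \tfrac{1}{2}\mathbb{E}[\omega_h] \approx -1{,}859 + 123 = -\$1{,}736,$$
$$\text{relative social surplus} = -\Lambda\,\Delta G + \tfrac{1}{2}\mathbb{E}[\omega_h] \approx 279 + 123 = \$402.$$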

The third row of Panel A considers the voluntary selection scenario that actually took place. (To be consistent with the other counterfactuals, outcomes are still simulated from the model.) We find that 38.8% of episode-weighted hospitals select into bundled payments, which is almost identical to the actual selection percentage (37.2% in Table III), providing assurance about the in-sample fit of the model. As we discussed in Section V, the much greater heterogeneity in bh − λh, relative to $\frac{\omega_h}{2}$, suggests that selection into bundled payments is primarily on levels. Consequently, voluntary participation raises government spending per episode by $1,396 relative to the mandatory bundled payment regime, but lowers government spending per episode by $463 relative to the mandatory FFS regime.

Because hospitals are given a choice, their profits must be weakly higher under the voluntary regime than under a mandatory regime. However, because we treat the νh term as non-welfare-relevant, hospital profits (net of νh) decline by $405 relative to the mandatory FFS benchmark, which is $1,330 higher than under mandatory bundled payments. Ignoring νh, social surplus under voluntary bundled payments is higher than under the mandatory FFS benchmark (by $127, column (5)) but lower than under the mandatory bundled payment regime (by $275); the lower social surplus reflects both the larger transfers to hospitals and the smaller share of hospitals generating ω-related efficiency gains. When we treat νh as welfare relevant (in Online Appendix Table A.10), hospital profits and social welfare are much higher under the voluntary regime, as hospitals sort on the value of their choice shifter, and the standard deviation of νh across hospitals is large. Intuitively, if we think that the νh-related costs are real, it is important to let hospitals avoid them when the offsetting ω-related benefits are not large enough.

Whether or not we treat the $\nu_h$ term as welfare relevant, the results indicate that the voluntary bundled payment model generates social surplus relative to the mandatory FFS status quo. Yet the gains (when we ignore the $\nu_h$ term) are relatively modest. This is because the positive $\omega$-related efficiency gains generated by voluntary bundled payments are partially offset by the social welfare losses associated with "overpaying" participating hospitals relative to their counterfactual FFS claims. We turn to exploring whether and how the performance of the voluntary program could improve if Medicare were able to set bundle prices to better reflect underlying hospital-specific costs, and thus capture more of the roughly \$60 million of potential annual $\omega$-related gains in social surplus (from approximately 500,000 CJR episodes per year, with an average $\frac{\omega_h}{2}$ of \$123).

VI.B. Targeting

To explore price targeting under voluntary participation in a systematic fashion, we approximate the observed bundle prices using a parametric distribution, and then examine the effects of shifting its parameters. Specifically, we assume that hospital-level bundle prices, $b_h$, are log-normally distributed and are correlated with hospital costs. We then explore voluntary participation under different parameter values. Online Appendix E provides more details.
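To fix ideas, the counterfactual exercise can be sketched in a few lines of code. The sketch below is a stylized stand-in for the procedure described in Online Appendix E, not the actual estimation or simulation code: all parameter values (means, dispersions, correlations) are placeholders that are not calibrated to our estimates, and the participation rule assumes hospitals opt in when the levels term $(b_h - \lambda_{h3})$ plus the slopes term $\frac{\omega_h}{2}$ exceeds the choice shifter $\nu_h$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of simulated hospitals (placeholder)

# Placeholder parameters for the joint log-normal distribution of the persistent
# FFS claims level (lambda_h) and the bundle price (b_h); rho is the targeting knob.
mu_lam, sd_lam = np.log(25_000), 0.25
mu_b, sd_b, rho = np.log(24_000), 0.20, 0.6

z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
lam_h = np.exp(mu_lam + sd_lam * z[:, 0])                # persistent claims level
lam_h3 = lam_h * np.exp(0.10 * rng.standard_normal(n))   # realized period-3 claims (adds eps_h3)
b_h = np.exp(mu_b + sd_b * z[:, 1])                      # bundle price, correlated with lam_h
omega_h = np.maximum(rng.normal(250, 400, n), 0.0)       # incentive-driven savings (placeholder)
nu_h = rng.normal(7_700, 5_000, n)                       # choice shifter (dispersion is a placeholder)

# Participation rule: opt in when the levels term plus half the slopes term exceeds nu_h.
opt_in = (b_h - lam_h3) + omega_h / 2 > nu_h

Lambda = 0.15  # social cost of public funds
gov_spending = np.where(opt_in, b_h, lam_h3).mean()
surplus_vs_ffs = (opt_in * (omega_h / 2 - Lambda * (b_h - lam_h3))).mean()  # ignores nu_h

print(f"share opting in: {opt_in.mean():.3f}")
print(f"government spending per episode: {gov_spending:,.0f}")
print(f"social surplus vs. mandatory FFS: {surplus_vs_ffs:,.0f}")
```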

Figure V summarizes the outcomes from this exercise, plotting social surplus relative to the mandatory FFS benchmark (y-axis) against government spending (x-axis) for different bundle price counterfactuals. In the plot, we focus on the social surplus values that do not consider $\nu_h$ (column (5) of Table VI). Table VI, Panel B reports outcomes associated with each exercise (and Online Appendix Table A.10 reports the parallel exercise when we treat $\nu_h$ as welfare relevant). The black dot in Figure V corresponds to the observed distribution of $b_h$ and serves as a baseline. Because of the log-normal parameterization of $b_h$, the outcomes for observed targeting in Table VI, Panel B are slightly different from those in the voluntary bundled payments row of Panel A.

[Figure V: Medicare Costs and Social Surplus under Alternative Bundle Prices. The figure plots social surplus per episode (relative to the FFS counterfactual) against government spending, corresponding to the values in Table VI, columns (5) and (2). See that table's note for more details.]

We consider three counterfactual targeting policies. The first, indicated by the point labeled "perfect targeting," sets bundle prices so that they are perfectly correlated with realized claims under FFS ($\lambda_{h3}$), while maintaining the mean of the log bundle price at the level we estimate in the data. This improved targeting leads to lower government spending and raises average social surplus (relative to the mandatory FFS benchmark) by \$155 per episode, \$22 per episode (17%) more than under observed targeting.

Perfect targeting is a useful benchmark, but it is not feasible in the context of our model. To see this, recall that we model $\ln \lambda_{h3} = \ln \lambda_h + \gamma_3 + \varepsilon_{h3}$, where $\varepsilon_{h3}$ is an i.i.d. random variable that is not predictable in advance. To gauge the benefits of a more feasible contract, we consider a second scenario, labeled "feasible targeting," in which Medicare sets a bundle price that is perfectly correlated with $\ln \lambda_h$ but uncorrelated with $\varepsilon_{h3}$. We estimate that, relative to observed targeting, feasible targeting generates average welfare gains of \$17 per episode, or 77% ($=\frac{\$150-\$133}{\$155-\$133}$) of the social welfare gains that could be achieved with perfect targeting.
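In the sketch above, the targeting counterfactuals amount to replacing the simulated prices with prices whose log tracks a different cost signal, while holding the log-price mean and dispersion fixed. A hypothetical helper along these lines, again an assumption-laden illustration rather than the procedure in Online Appendix E:

```python
import numpy as np

def counterfactual_prices(lam_h, lam_h3, mu_b, sd_b, mode):
    """Illustrative bundle prices under alternative targeting rules.

    'perfect'  : log price tracks realized log FFS claims, ln(lambda_h3)
    'feasible' : log price tracks the persistent component, ln(lambda_h)
    'none'     : one uniform price for all hospitals
    All modes hold the mean and dispersion of log prices at (mu_b, sd_b).
    """
    if mode == "none":
        return np.full_like(lam_h, np.exp(mu_b))
    signal = np.log(lam_h3 if mode == "perfect" else lam_h)
    z = (signal - signal.mean()) / signal.std()  # standardized cost signal
    return np.exp(mu_b + sd_b * z)
```

Feeding each of these price vectors through the participation rule above traces out analogues of the points in Figure V, although with placeholder parameters only the qualitative ordering is meaningful.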

The analysis so far asks how well observed targeting does compared with its potential; the mirror image is to ask how well it does relative to no targeting. We therefore undertake a third exercise, labeled "no targeting," in which bundle prices are uniform across hospitals at a value equal to the average bundle price. Relative to observed targeting, the no-targeting case leads to slightly greater participation in bundled payments, higher government spending, and welfare gains of \$91 per episode relative to the FFS benchmark. Within the \$91 to \$155 range defined by no targeting at the bottom and perfect targeting at the top, observed targeting generates approximately 66% ($=\frac{\$133-\$91}{\$155-\$91}$) of the potential gains.

Although improved information is a natural way for Medicare to achieve better targeting, it is not the only way to do so. Medicare could also achieve better targeting through a narrower definition of the bundle. To illustrate, we decompose average per-episode claims into two additively separable components: hospital claims and other claims, which include PAC. Because the hospital claims portion was already reimbursed with a predetermined, DRG-based amount under FFS, we do not expect hospital claims to respond to the incentives from bundled payments; in effect, hospitals already faced a bundled payment for the services they provided. Indeed, the empirical evidence shows that most of the $\omega$-related savings come from nonhospital claims, and PAC claims in particular. However, while hospital claims are not a source of $\omega$-related savings, they are heterogeneous across hospitals and thus a source of selection on levels. Eliminating hospital claims from the bundle therefore effectively increases the degree of targeting, as the sketch below illustrates.
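A minimal numerical illustration of why narrowing the bundle acts like better targeting, with a hypothetical split of claims into hospital and nonhospital components; all magnitudes are placeholders, not estimates from the data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # hypothetical hospitals

# Placeholder decomposition of counterfactual per-episode FFS claims into a
# hospital component (already paid via a fixed DRG-based amount) and an
# "other" component (PAC and other nonhospital claims); both vary across hospitals.
lam_hosp = rng.normal(14_000, 2_000, n)
lam_other = rng.normal(11_000, 3_000, n)
lam_total = lam_hosp + lam_other

# Uniform ("no targeting") bundle prices set at average counterfactual claims.
b_wide = lam_total.mean()     # wide bundle covers the whole episode
b_narrow = lam_other.mean()   # narrow bundle covers only nonhospital claims

# Selection-on-levels term: the transfer a hospital receives with no behavior change.
levels_wide = b_wide - lam_total
levels_narrow = b_narrow - lam_other  # hospital claims drop out; the DRG payment is unchanged

print(np.std(levels_wide), np.std(levels_narrow))
# Narrowing the bundle strips the hospital-claims heterogeneity out of the levels
# term, while omega-related savings (which come from nonhospital claims) are
# unaffected -- that is, a narrower bundle behaves like better targeting.
```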

We simulate the effects of eliminating hospital claims from the bundle. Under this counterfactual payment regime, labeled "narrow bundling, no targeting," the bundled payment for hospitals in treatment MSAs covers only the nonhospital costs; in both treatment and control MSAs, hospitals continue to be paid a lump-sum, DRG-based "bundle" to cover the within-hospital costs. As shown in the last row of Panel B, this bundle achieves almost identical social surplus to observed targeting. This suggests that a narrowly defined bundle with even a modest amount of targeting of nonhospital costs would raise surplus relative to the observed regime.

Overall, our findings suggest that the observed targeting performs fairly well, but also that feasible improvements in targeting could generate meaningful reductions in government costs and higher total surplus. These improvements could arise from tailoring bundle prices to better reflect each hospital's claims under FFS, or from focusing the bundle on a narrower set of services in which cost savings can be realized.

VII. Conclusion

Government regulations are sometimes based on voluntary participation, allowing market actors to “choose their own incentives.” These voluntary regimes may be socially beneficial if they induce selection of actors with private information about the net benefits from changing their behavior, but they may be socially costly if they primarily attract actors who can receive higher government payments with no behavior change. We explored this trade-off between selection on slopes and selection on levels in the context of Medicare payment reform.

Our analysis takes advantage of a unique setting in which Medicare introduced an alternative payment model—called bundled payments—as a randomized trial and then modified the experimental design in midstream. Bundled payments were originally imposed as a mandatory participation model for hospitals in 67 randomly selected treatment markets, with hospitals in 104 other randomly selected control markets paid under the status quo. Two years into this five-year experiment, however, hospitals in half of the treatment markets were allowed to choose whether to remain under bundled payments or revert to the status quo payment model. This provided an opportunity to estimate the effect of the alternative model on all hospitals’ behavior and then observe which hospitals voluntarily chose to continue under it.

The descriptive evidence and the model estimates indicate that selection was primarily based on levels rather than on slopes. The main driver of participation in the voluntary bundled payment model was whether the hospital would benefit financially from the alternative regime without any change in behavior (selection on levels). We also found selection on slopes—hospitals that changed their behavior more in response to bundled payments were also more likely to opt in—but selection on this margin was much less quantitatively important.

As a result, we estimated that the voluntary bundled payment model generated inefficient transfers to hospitals and only a modest increase in social welfare relative to imposing the status quo FFS payment regime on all hospitals. However, we also estimated that alternative (feasible) voluntary designs that targeted reimbursement more closely to hospitals’ claims levels under FFS could reduce these inefficient transfers substantially. Of course, any design with less generous reimbursements to better-performing actors may raise concerns about fairness, as well as concerns about ratchet effects (Freixas, Guesnerie, and Tirole 1985).

Our quantitative results are, of course, specific to our setting. They are, however, likely to be fairly representative of Medicare’s experience with alternative payment models, which account for over 30% of Medicare spending (Shatto 2016). In these other models, the estimated effects (slopes) have also tended to be modest—“singles, not home runs” in the words of Frakt (2019a). This suggests that, as in our setting, there may be limited scope for social-welfare-improving selection on slopes, and that voluntary models should be designed with particular attention to limiting selection on levels.

In addition, while the cancellation of a mandatory participation RCT provides us with a unique research opportunity to estimate treatment effects and then observe participation decisions, from the participant perspective, our setting is similar to other Medicare programs in allowing one-time retroactive withdrawal from the program. For example, the BPCI-Advanced models, which are voluntary bundled payment models for certain inpatient and outpatient episodes, allowed participating hospitals the opportunity to leave the program—that is, retroactively withdraw with no financial consequences—during the first six months of the program. Thus, in these voluntary participation programs, as in the one we study, there is opportunity for participants to learn something about their levels and slopes before making a final participation decision (CMS 2018).

Beyond the specific quantitative results, our analysis suggests the importance of considering—and ideally estimating—selection on both slopes and levels in settings where a regulator is contemplating a voluntary regime. Although there is an active and ongoing debate over the merits of voluntary versus mandatory payment reforms, voluntary participation is currently the norm across Medicare’s alternative payment models. With few exceptions, all other bundled payment models, ACOs, and primary care coordination models have been implemented in a voluntary manner. Moreover, as we noted in the introduction, voluntary regulation is widespread outside of healthcare, in sectors such as education, electricity, and environmental regulation. In these sectors, in addition to the trade-off between selection on slopes and selection on levels, firm exit may be an important margin of adjustment. Exploring the impact and optimal design of voluntary programs in such settings, with heterogeneity in levels, slopes, and exit propensities, is an important and fruitful direction for future work.

Data Availability

Code replicating the tables and figures in this article can be found in Einav et al. (2021) in the Harvard Dataverse, https://doi.org/10.7910/DVN/DEULAR.

Footnotes

*We thank Kate Ho, Parag Pathak, Jonathan Skinner, four anonymous referees, Stefanie Stantcheva (the editor), and participants in many seminars for helpful comments. We are grateful to Yining Mo, Xuyang Xia, and Chuan Yu for outstanding research assistance. We gratefully acknowledge support from J-PAL North America’s Health Care Delivery Initiative (Finkelstein and Mahoney), National Institute on Aging grant P01AG019783-15, the Laura and John Arnold Foundation (Einav, Finkelstein, and Mahoney), the Becker Friedman Institute at the University of Chicago (Mahoney), and National Science Foundation grant SES-1730466 (Mahoney).

1. For instance, the announced End-Stage Renal Disease (ESRD) Treatments Model is mandatory, whereas the Kidney Care Choices (KCC) Model is voluntary. See the CMS Innovation Center website for more on these and other models: https://innovation.cms.gov/innovation-models/kidney-care-choices-kcc-model.

3. One exception to this system is hospital reimbursement. Starting in 1982, Medicare adopted the Prospective Payment System, in which it makes a fixed payment for the hospital stay based on the patient’s diagnosis.

4. In particular, Medicare set hospital-specific bundle prices for four severity groups determined by the two-by-two interaction of the patient’s DRG (469 or 470) and whether the patient had a hip fracture.

5. For example, based on our calculation from the CJR reconciliation data, in the first year of the program fewer than 9% of treatment hospitals failed to meet the minimum quality standard for receiving a bonus. See https://innovation.cms.gov/Files/x/cjr-qualsup.pdf for more details on the quality standard.

6. The stop-loss and stop-gain amounts increased over time. In the first year, the stop-gain amount was set at 5% of $b_h$ and the stop-loss was 0 (meaning that hospitals would never need to make payments to Medicare). By years 4 and 5, the stop-gain and stop-loss amounts were each scheduled to be set at 20% of $b_h$.

7. After the initial assignment, Medicare realized that it had not excluded some hospitals that were already (prior to assignment) signed up for BPCI (a different Medicare program), and it subsequently excluded an additional 8 MSAs from the treatment group. Medicare later identified the 17 MSAs in the control group that would have been excluded based on these criteria. Because these exclusions were based on hospital decisions made prior to assignment, we simply drop these 25 MSAs from the study.

8. At the start of period 3, Medicare also began to cover (in all hospitals) outpatient knee replacement in addition to inpatient knee replacement. Because we find no statistically or quantitatively significant effect of treatment assignment on period 3 knee replacement volume or setting (not reported), we abstract from it in what follows.

9. Among hospitals where we observe prices, the correlation between the imputed prices and the observed prices is 0.98. Because of the strength of this correlation, for the rest of the article we abstract from any potential measurement error associated with the imputation procedure.

10. Specifically, for control group hospitals, the correlation coefficient between period 1 and period 2 is 0.77 for total episode claims, 0.65 for institutional PAC claims, and 0.71 for the share discharged to institutional PACs.

11. Heterogeneity in the hospital-specific slopes may partially reflect idiosyncratic changes in hospitals over time. In the model we specify below, we explicitly model such idiosyncratic variation, which may be partially driven by measurement error.

12. To examine the sensitivity of these estimates, we estimate slopes using a modified version of equation (2) that controls for hospital-specific linear time trends using hospital-level data going back to 2010. Online Appendix Table A.4 reproduces the analysis in Table II using these alternative estimates of the hospital-level slopes.

13. In Online Appendix Table A.6, we examine the sensitivity of these estimates to a specification that allows for hospital-specific (linear) time trends using data going back to 2010. The general results remain qualitatively similar, though differences between select-in and select-out hospitals remain statistically insignificant and the differences between the point estimates are smaller.

14. This assumption is primarily made to simplify notation. It is straightforward in the context of the model to allow other providers to obtain a fixed markup, but reasonable levels of such markups would only slightly affect the quantitative results and would have no effect on the qualitative conclusions.

15. For simplicity, we assume that $c_h^{HOSP}$ remains the same under bundled payments and is not affected by $e$. This is not essential and can be viewed as a normalization, although it is a natural assumption: if the effort to reduce hospital costs and the effort to reduce PAC costs are separable, the hospital cost level was already optimized under FFS, given that hospitals were already paid a fixed amount for the hospital portion of the episode.

16. Recall that $y_h$ in practice measures claims paid by Medicare or owed out of pocket by consumers. Under this interpretation, it would be natural to multiply $S$ by $(1 + \Lambda)$, so that $\Lambda$ represents the wedge between hospitals on the one hand and consumers and the government on the other. Because we net $S$ out of the calculations below, this can be done without loss of generality.

17. Because assignment to bundled payments in period 2 was random conditional on strata, allowing the mean to vary by strata isolates the experimental variation and is the analogue of controlling for strata fixed effects in equation (1).

18. The value of 0.71 represents the 99th percentile in the data for the ratio of these other claims to total episode claims. By truncating the distribution of $\omega_h$ at the 99th percentile of other claims, we are essentially assuming that savings cannot exceed the 99th percentile of “other” spending.

19. This specification captures a setting where hospitals base their participation decision on their point estimate of $\lambda_{h3}$ without accounting for the realization noise. It can also be viewed as an approximation of a setting where hospitals make their participation decision based on expected profits and integrate over the realization noise. In this latter case, given the log-normal distribution of $\lambda_{h3}$, the part of the expectation that depends on the variance of $\varepsilon_{h3}$ will be captured by the mean (across hospitals) of the choice shifter $\nu_h$.

20. Recall that we define period 2 to include data from 2016 and 2017. To address the potential concern that hospitals might have adjusted their behavior in anticipation of the switch to voluntary participation, which was finalized in December 2017, we reestimate the model and reconduct the counterfactuals using data from 2016 only (and not 2017) for period 2. The results, which are essentially unchanged, are shown in Online Appendix Tables A.7, A.8, and A.9.

21. To estimate this correlation, we use the correlation (across hospitals) between $\nu$ and $\lambda$ and between $\nu$ and $\omega$ in each iteration of the Gibbs sampler and then average across iterations. Recall that $\nu$ is assumed to be independent of other parts of the model, so this correlation is solely driven by observables.

22. In contrast, the analogous exercise indicates that observables account for more than half of the cross-hospital variation in slopes.

References

Alexander, Diane, “How Do Doctors Respond to Incentives? Unintended Consequences of Paying Doctors to Reduce Costs,” Journal of Political Economy, 128 (2020), 4046–4096.

ArborMetrix, “A Data-Driven Approach to BPCI Advanced: Methods for Selecting Appropriate Risk,” Technical report, n.d.

Barnett, Michael L., Andrew Wilcock, J. Michael McWilliams, Arnold M. Epstein, Karen E. Joynt Maddox, E. John Orav, David C. Grabowski, and Ateev Mehrotra, “Two-Year Evaluation of Mandatory Bundled Payments for Joint Replacement,” New England Journal of Medicine, 380 (2019), 252–262.

Buhagiar, Mark A., Justine M. Naylor, Ian A. Harris, Wei Xuan, Friedbert Kohler, Rachael Wright, and Renee Fortunato, “Effect of Inpatient Rehabilitation vs a Monitored Home-Based Program on Mobility in Patients with Total Knee Arthroplasty: The HIHO Randomized Clinical Trial,” Journal of the American Medical Association, 317 (2017), 1037–1046.

Carroll, Caitlin, Michael Chernew, A. Mark Fendrick, Joe Thompson, and Sherri Rose, “Effects of Episode-Based Payment on Health Care Spending and Utilization: Evidence from Perinatal Care in Arkansas,” Journal of Health Economics, 61 (2018), 47–62.

Cicala, Steve, David Hémous, and Morten Olsen, “Adverse Selection as a Policy Instrument: Unraveling Climate Change,” Tufts University Working Paper, 2021, https://www.stevecicala.com/papers/unraveling/unraveling_climate_change_draft.pdf.

Clemens, Jeffrey, and Joshua D. Gottlieb, “Do Physicians’ Financial Incentives Affect Medical Treatment and Patient Health?,” American Economic Review, 104 (2014), 1320–1349.

CMS, “Medicare Program; Comprehensive Care for Joint Replacement Payment Model for Acute Care Hospitals Furnishing Lower Extremity Joint Replacement Services. Final Rule,” Federal Register, 80 (2015a), 73273–73554.

CMS, “Medicare Program; Hospital Inpatient Prospective Payment Systems for Acute Care Hospitals and the Long-Term Care Hospital Prospective Payment System Policy Changes and Fiscal Year 2016 Rates; Revisions of Quality Reporting Requirements for Specific Providers, Including Changes Related to the Electronic Health Record Incentive Program; Extensions of the Medicare-Dependent, Small Rural Hospital Program and the Low-Volume Payment Adjustment for Hospitals. Final Rule; Interim Final Rule with Comment Period,” Federal Register, 80 (2015b), 49325–49886.

CMS, “Medicare Program; Cancellation of Advancing Care Coordination through Episode Payment and Cardiac Rehabilitation Incentive Payment Models; Changes to Comprehensive Care for Joint Replacement Payment Model: Extreme and Uncontrollable Circumstances Policy for the Comprehensive Care for Joint Replacement Payment Model,” Federal Register, 82 (2017), 57066–57104.

CMS, “CMS Statement Regarding the Retroactive Withdrawal Policy and the New Participation Agreement for BPCI Advanced,” CMS Technical report, 2018.

CMS, “CMS Fast Facts,” CMS Technical report, 2019.

CMS, “CMS Innovation Center Episode Payment Models,” CMS Technical report, 2020.

CMS Innovation Center, n.d.

Cromwell, Jerry, Debra A. Dayhoff, and Armen H. Thoumaian, “Cost Savings and Physician Responses to Global Bundled Payments for Medicare Heart Bypass Surgery,” Health Care Financing Review, 19 (1997), 41.

Cutler, David M., “The Incidence of Adverse Medical Outcomes under Prospective Payment,” Econometrica, 63 (1995), 29–50.

Cutler, David M., and Kaushik Ghosh, “The Potential for Cost Savings through Bundled Episode Payments,” New England Journal of Medicine, 366 (2012), 1075–1077.

DeAngelis, Corey A., Lindsey M. Burke, and Patrick J. Wolf, “The Effects of Regulations on Private School Choice Program Participation: Experimental Evidence from Florida,” Social Science Quarterly, 100 (2019), 2316–2336.

Doran, James P., and Stephen J. Zabinski, “Bundled Payment Initiatives for Medicare and Non-Medicare Total Joint Arthroplasty Patients at a Community Hospital: Bundles in the Real World,” Journal of Arthroplasty, 30 (2015), 353–355.

Dummit, Laura A., Daver Kahvecioglu, Grecia Marrufo, Rahul Rajkumar, Jaclyn Marshall, Eleonora Tan, Matthew J. Press, Shannon Flood, L. Daniel Muldoon, Qian Gu, et al., “Association between Hospital Participation in a Medicare Bundled Payment Initiative and Payments and Quality Outcomes for Lower Extremity Joint Replacement Episodes,” Journal of the American Medical Association, 316 (2016), 1267–1278.

Einav, Liran, Amy Finkelstein, Yunan Ji, and Neale Mahoney, “Randomized Trial Shows Healthcare Payment Reform Has Equal-Sized Spillover Effects on Patients Not Targeted by Reform,” Proceedings of the National Academy of Sciences, 117 (2020a), 18939–18947.

Einav, Liran, Amy Finkelstein, Yunan Ji, and Neale Mahoney, “Voluntary Regulation: Evidence from Medicare Payment Reform,” NBER Working Paper 27223, 2020b.

Einav, Liran, Amy Finkelstein, Yunan Ji, and Neale Mahoney, “Replication Data for: ‘Voluntary Regulation: Evidence from Medicare Payment Reform’,” Harvard Dataverse, 2021, https://doi.org/10.7910/DVN/DEULAR.

Einav, Liran, Amy Finkelstein, Raymond Kluender, and Paul Schrimpf, “Beyond Statistics: The Economic Content of Risk Scores,” American Economic Journal: Applied Economics, 8 (2016), 195–224.

Einav, Liran, Amy Finkelstein, and Neale Mahoney, “Provider Incentives and Healthcare Costs: Evidence from Long-Term Care Hospitals,” Econometrica, 86 (2018), 2161–2219.

Einav, Liran, Amy Finkelstein, Stephen P. Ryan, Paul Schrimpf, and Mark R. Cullen, “Selection on Moral Hazard in Health Insurance,” American Economic Review, 103 (2013), 178–219.

Eliason, Paul J., Paul L. E. Grieco, Ryan C. McDevitt, and James W. Roberts, “Strategic Patient Discharge: The Case of Long-Term Care Hospitals,” American Economic Review, 108 (2018), 3232–3265.

Elixhauser, Anne, Claudia Steiner, D. Robert Harris, and Rosanna M. Coffey, “Comorbidity Measures for Use with Administrative Data,” Medical Care, 36 (1998), 8–27.

Finkelstein, Amy, Yunan Ji, Neale Mahoney, and Jonathan Skinner, “Mandatory Medicare Bundled Payment Program for Lower Extremity Joint Replacement and Discharge to Institutional Postacute Care: Interim Analysis of the First Year of a 5-Year Randomized Trial,” Journal of the American Medical Association, 320 (2018), 892–900.

Fisher, Elliott S., “Medicare’s Bundled Payment Program for Joint Replacement: Promise and Peril?,” Journal of the American Medical Association, 316 (2016), 1262–1264.

Frakt, Austin, “‘Value’ of Care Was a Big Goal. How Did It Work Out?,” New York Times, October 23, 2019a.

Frakt, Austin, “Which Health Policies Actually Work? We Rarely Find Out,” New York Times, October 9, 2019b.

Freixas, Xavier, Roger Guesnerie, and Jean Tirole, “Planning under Incomplete Information and the Ratchet Effect,” Review of Economic Studies, 52 (1985), 173–191.

Froemke, Cecily C., Lian Wang, Matthew L. DeHart, Ronda K. Williamson, Laura Matsen Ko, and Paul J. Duwelius, “Standardizing Care and Improving Quality under a Bundled Payment Initiative for Total Joint Arthroplasty,” Journal of Arthroplasty, 30 (2015), 1676–1682.

GAO, “Voluntary and Mandatory Episode-Based Payment Models and Their Participants,” Technical Report GAO-19-156, 2018.

Gaynor, Martin, Nirav Mehta, and Seth Richards-Shubik, “Optimal Contracting with Altruistic Agents: A Structural Model of Medicare Payments for Dialysis Drugs,” NBER Technical Report, 2020.

Gaynor, Martin, James B. Rebitzer, and Lowell J. Taylor, “Physician Incentives in Health Maintenance Organizations,” Journal of Political Economy, 112 (2004), 915–931.

Gronniger, T., M. Fiedler, K. Patel, L. Adler, and P. Ginsberg, “How Should the Trump Administration Handle Medicare’s New Bundled Payment Programs?,” Health Affairs Blog, April 10, 2017.

Haas, Derek A., Xiaoran Zhang, Robert S. Kaplan, and Zirui Song, “Evaluation of Economic and Clinical Outcomes Under Centers for Medicare & Medicaid Services Mandatory Bundled Payments for Joint Replacements,” JAMA Internal Medicine, 179 (2019), 924–931.

Heckman, James J., and Bo E. Honoré, “The Empirical Content of the Roy Model,” Econometrica, 58 (1990), 1121–1149.

Ho, Kate, and Ariel Pakes, “Hospital Choices, Hospital Prices, and Financial Incentives to Physicians,” American Economic Review, 104 (2014), 3841–3884.

Ito, Koichiro, Takanori Ida, and Makoto Tanaka, “Selection on Welfare Gains: Experimental Evidence from Electricity Plan Choice,” NBER Working Paper 28413, 2021.

Jack, B. Kelsey, and Seema Jayachandran, “Self-Selection into Payments for Ecosystem Services Programs,” Proceedings of the National Academy of Sciences, 116 (2019), 5326–5333.

King, Robert, “CMS to Use Mandatory Models ‘Very Judiciously,’ Official Says,” Modern Healthcare, 26 (2019).

Laffont, Jean-Jacques, and Jean Tirole, A Theory of Incentives in Procurement and Regulation (Cambridge, MA: MIT Press, 1993).

Levy, S., N. Bagley, and R. Rajkumar, “Reform at Risk – Mandating Participation in Alternative Payment Plans,” New England Journal of Medicine, 378 (2018), 1663–1665.

Lewin Group, “CMS Bundled Payments for Care Improvement (BPCI) Initiative Models 2-4: Year 1 Evaluation & Monitoring Annual Report,” Technical report, 2015.

Lewin Group, “CMS Comprehensive Care for Joint Replacement Model: Performance Year 1 Evaluation Report,” Technical report, 2018.

Lewin Group, “CMS Comprehensive Care for Joint Replacement Model: Performance Year 2 Evaluation Report,” Technical report, 2019a.

Lewin Group, “CMS Comprehensive Care for Joint Replacement Model: Performance Year 2 Evaluation Report: An In-Depth Look: Hospital Case Studies,” Technical report, 2019b.

Lewin Group, “CMS Comprehensive Care for Joint Replacement Model: Performance Year 3 Evaluation Report,” Technical report, 2020.

Liao, Joshua M., Mark V. Pauly, and Amol S. Navathe, “When Should Medicare Mandate Participation in Alternative Payment Models?,” Health Affairs, 39 (2020), 305–309.

Marone, Victoria R., and Adrienne Sabety, “Should There Be Vertical Choice in Health Insurance Markets?,” NBER Working Paper 28779, 2021.

Navathe, Amol S., Andrea B. Troxel, Joshua M. Liao, Nan Nan, Jingsan Zhu, Wenjun Zhong, and Ezekiel J. Emanuel, “Cost of Joint Replacement Using Bundled Payment Models,” JAMA Internal Medicine, 177 (2017), 214–222.

Newcomer, Lee N., Bruce Gould, Ray D. Page, Sheila A. Donelan, and Monica Perkins, “Changing Physician Incentives for Affordable, Quality Cancer Care: Results of an Episode Payment Model,” Journal of Oncology Practice, 10 (2014), 322–326.

Newhouse, Joseph P., “Reimbursing Health Plans and Health Providers: Efficiency in Production versus Selection,” Journal of Economic Literature, 34 (1996), 1236–1263.

Newhouse, Joseph P., Pricing the Priceless: A Health Care Conundrum (Cambridge, MA: MIT Press, 2004).

Quan, Hude, Vijaya Sundararajan, Patricia Halfon, Andrew Fong, Bernard Burnand, Jean-Christophe Luthi, L. Duncan Saunders, Cynthia A. Beck, Thomas E. Feasby, and William A. Ghali, “Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data,” Medical Care, 43 (2005), 1130–1139.

Shatto, John D., “Center for Medicare and Medicaid Innovation’s Methodology and Calculations for the 2016 Estimate of Fee-for-Service Payments to Alternative Payment Models,” CMS Technical report, 2016.

Shepard, Mark, “Hospital Network Competition and Adverse Selection: Evidence from the Massachusetts Health Insurance Exchange,” NBER Working Paper 22600, 2020.

Thaler, Richard H., and Cass R. Sunstein, “Libertarian Paternalism,” American Economic Review, 93 (2003), 175–179.

The Hill, “Lawmakers Call for End to Medicare ‘Experiments’,” Technical report, 2016.

Wilcock, Andrew D., Michael L. Barnett, J. Michael McWilliams, David C. Grabowski, and Ateev Mehrotra, “Hospital Responses to Incentives in Episode-Based Payment for Joint Surgery: A Controlled Population-Based Study,” JAMA Internal Medicine, 181 (2021), 932–940.

Zhu, Jane M., Viren Patel, Judy A. Shea, Mark D. Neuman, and Rachel M. Werner, “Hospitals Using Bundled Payment Report Reducing Skilled Nursing Facility Use and Improving Care Integration,” Health Affairs, 37 (2018), 1282–1289.

