Made available through Montana State University’s ScholarWorks

Who chooses commitment? Evidence and welfare implications
Mariana Carrera, Heather Royer, Mark Stehr, Justin Sydnor, Dmitry Taubinsky

This is a pre-copyedited, author-produced PDF of an article accepted for publication in The Review of Economic Studies following peer review. The version of record [Who Chooses Commitment? Evidence and Welfare Implications. The Review of Economic Studies (2021)] is available online at: https://doi.org/10.1093/restud/rdab056.

NBER WORKING PAPER SERIES

WHO CHOOSES COMMITMENT? EVIDENCE AND WELFARE IMPLICATIONS

Mariana Carrera
Heather Royer
Mark Stehr
Justin Sydnor
Dmitry Taubinsky

Working Paper 26161
http://www.nber.org/papers/w26161

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
August 2019, Revised August 2021

A previous version of this paper circulated under “How are Preferences for Commitment Revealed?” We are grateful to seminar and conference participants at Harvard, Wharton, UC San Diego, University of Zurich, Dartmouth, Claremont Graduate University, Erasmus University, the Economics Science Association conference, the American Society of Health Economists conference, Hebrew University, Stanford Institute for Theoretical Economics, and the Stanford-Berkeley mini conference for helpful comments and suggestions, as well as to Doug Bernheim, Stefano DellaVigna, David Molitor, Matthew Rabin, Gautam Rao, Frank Schilbach, Charles Sprenger, Séverine Toussaert, and Jonathan Zinman for helpful comments. Paul Fisher, Max Lee, Priscila de Oliveira, and Afras Sial provided excellent research assistance. We are grateful for funding from an NIH grant R21AG042051 entitled “Commitment Contracts for Health Behavior Change,” and from an Alfred P. Sloan Foundation grant entitled “Behavioral Economics in Equilibrium: Evidence and Welfare Implications.” This study was approved by the IRB at Case Western Reserve University and UC Santa Barbara.
The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research. NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

© 2019 by Mariana Carrera, Heather Royer, Mark Stehr, Justin Sydnor, and Dmitry Taubinsky. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Who Chooses Commitment? Evidence and Welfare Implications
Mariana Carrera, Heather Royer, Mark Stehr, Justin Sydnor, and Dmitry Taubinsky
NBER Working Paper No. 26161
August 2019, Revised August 2021
JEL No. C9,D9,I12

ABSTRACT

This paper investigates whether offers of commitment contracts, in the form of self-imposed choice-set restrictions and penalties with no financial upside, are well-targeted tools for addressing self-control problems. In an experiment on gym attendance (N = 1,248), we examine take-up of commitment contracts, and also introduce a separate elicitation task to identify actual and perceived time inconsistency. There is high take-up of commitment contracts for greater gym attendance, resulting in significant increases in exercise. However, this take-up is influenced both by noisy valuation and incorrect beliefs about one’s time inconsistency. Approximately half of the people who take up commitment contracts for higher gym attendance also take up commitment contracts for lower gym attendance. There is little association between commitment contract take-up and reduced-form and structural estimates of actual or perceived time inconsistency. A novel information treatment providing an exogenous shock to awareness of time inconsistency reduces demand for commitment contracts.
Structural estimates of a model of quasi-hyperbolic discounting and gym attendance imply that offering our commitment contracts lowers consumer surplus, and is less socially efficient than utilizing linear exercise subsidies that achieve the same average change in behavior.

Mariana Carrera
Department of Agricultural Economics and Economics
Montana State University
P.O. Box 172920
Bozeman, MT 59717
and NBER
mariana.carrera@montana.edu

Justin Sydnor
Wisconsin School of Business, ASRMI Department
University of Wisconsin-Madison
975 University Avenue, Room 5287
Madison, WI 53726
and NBER
jsydnor@bus.wisc.edu

Heather Royer
Department of Economics
University of California, Santa Barbara
2127 North Hall
Santa Barbara, CA 93106
and NBER
royer@econ.ucsb.edu

Dmitry Taubinsky
University of California, Berkeley
Department of Economics
530 Evans Hall #3880
Berkeley, CA 94720-3880
and NBER
dmitry.taubinsky@berkeley.edu

Mark Stehr
Drexel University
LeBow College of Business
Ghall 10th Floor
3220 Market Street
Philadelphia, PA 19104
stehr@drexel.edu

One of the central insights from economic models of time inconsistency and limited self-control is that people should desire incentives and mechanisms that help them alter their own future behavior (Strotz, 1955; Laibson, 1997; O’Donoghue and Rabin, 1999; Heidhues and Kőszegi, 2009). Although this insight has a number of economic implications, the most prominent focus in the field-experimental literature has been on demand for commitment contracts, which we define as contracts that reduce choice-sets or impose penalties with no financial upside.1 As shown in Table 1, there are thirty-three empirical studies of commitment contract take-up as of the writing of this paper, spanning domains such as savings, health, and work effort, with all but two written in the last ten years.
The high take-up rates (see Table 1) and significant effects on behavior documented in the literature suggest that commitment contracts could be welfare-enhancing, but this is not guaranteed. For example, if individuals are partially naive—they are aware of their time inconsistency but underestimate it—then they might incur costs from choosing ineffective commitment devices (e.g., Heidhues and Kőszegi, 2009). Nor do existing results shed light on whether other approaches to behavior change, such as taxes or subsidies (e.g., Gruber and Kőszegi, 2001; O’Donoghue and Rabin, 2006), might be more or less efficient. In this paper, we develop a framework to answer three key research questions. First, who takes up commitment contracts? Specifically, how does take-up of commitment contracts relate to people’s actual and perceived time inconsistency and marginal benefits of behavior change? What are the causal effects of increasing people’s awareness of their time inconsistency on their demand for commitment contracts? Second, do other factors—such as stochastic valuation errors in perception of incentives (see, e.g., Woodford, 2019, for a review)—affect take-up of commitment contracts? The existence of these other factors may help reconcile the high take-up rates observed in experiments with the low take-up rates predicted by theory (see, e.g., Laibson, 2015). Third, taking into account all of the drivers of commitment contract take-up, do commitment contracts increase consumer surplus and social welfare? Are commitment contracts more or less efficient than the kinds of tax instruments studied by, e.g., Gruber and Kőszegi (2001) and O’Donoghue and Rabin (2006)? We address these questions through a combination of theory and empirical findings from a field experiment on gym attendance with 1,248 participants. Our approach has four novel features. 
First, we directly assess how commitment take-up relates to reduced-form and structural estimates of both perceived and actual time inconsistency. In addition to offering commitment contracts, we utilize a separate experimental elicitation to estimate people’s perceived and actual time inconsistency. Second, we introduce a new approach to detecting stochastic valuation errors or other confounds in the take-up of commitment contracts. We offer individuals commitment contracts both for going to the gym more and for going to the gym less, and we study the correlation in people’s propensity to take up both types of contracts. Third, we develop a novel information treatment that increases people’s sophistication about their time inconsistency, and we use this treatment to study the causal effect of sophistication on commitment contract take-up. Fourth, our rich experimental data allows us to estimate a structural model of quasi-hyperbolic discounting and partial naivete (Laibson, 1997; O’Donoghue and Rabin, 1999, 2001), and to validate it with out-of-sample tests; this is one of the first such estimates using field-experimental data. The model allows us to estimate whether commitment contracts are on net welfare-enhancing in our setting.

1 This definition of commitment contracts implies that the contracts would not be taken up by time-consistent individuals. Thus, the definition excludes contracts such as those analyzed by DellaVigna and Malmendier (2004), which individuals may want to take up to counteract their perceived time inconsistency, but that may also be taken up by time-consistent individuals who see significant financial upside in some states of the world (e.g., contracts with high fixed fees and low utilization fees can be appealing to time-consistent individuals forecasting high utilization).
We further use this model to study the key question of whether it is more socially efficient to use commitment contracts or linear tax instruments to counteract failures of self-control.

Section 2 fleshes out our approach to estimating models of time inconsistency. The empirical content of models of time inconsistency consists of three objects: (i) how people desire to behave in the future, (ii) how people expect to behave in the future, and (iii) how people actually behave in the future. Objects (ii) and (iii) can be estimated directly by measuring people’s forecasts and actual attendance at different levels of attendance incentives. We show that the wedge between (i) and (ii) can be elicited by extending the insights from DellaVigna and Malmendier (2004) and Acland and Levy (2015). Intuitively, the Envelope Theorem implies that a person who believes herself to be time-consistent, and forecasts, say, 8 attendances over the experimental period at an incentive of $p per attendance, should value a marginal $dp per attendance increase in incentives by $8dp. Valuations above $8dp indicate that the person values the behavior change induced by the incentive increase more than a time-consistent individual would. We call the deviation from the time-consistent benchmark the behavior change premium, and we provide a simple sufficient statistics formula for estimating this object using people’s forecasted behavior and willingness to pay for incentives.

Commitment contract take-up is a coarse measure of the behavior change premium, and can be misleading in the presence of noise in people’s valuations of incentives. On the one hand, take-up may underestimate perceived time inconsistency because uncertainty about the future, and thus the need for flexibility, erodes the value of such contracts.
Generalizing the numerical examples in Laibson (2015), we provide formal mathematical results showing that there should be little take-up of commitment contracts under even moderate uncertainty. On the other hand, we show that take-up decisions may not reflect perceived time inconsistency and may be systematically biased by mean-zero noise in people’s valuations of incentives. This bias will be an upward bias when there is sufficient uncertainty such that demand for commitment contracts would be very low in the absence of noise in people’s valuations. This is in contrast to our sufficient statistics approach to estimating the behavior change premium, which we show delivers an unbiased estimate at the population level.

Our experimental design, summarized in Section 3, revolves around the concepts introduced in Section 2. The experiment involved 1,248 members of a fitness facility in a large city in the Midwest of the United States, and consisted of an online elicitation followed by four weeks of observed gym attendance under different attendance incentives.

Following the measurement approach laid out in Section 2, we first elicited people’s forecasted attendance over the next four weeks at different levels of piece-rate incentives that ranged from $0 to $12 per attendance. We then used an incentive-compatible procedure to elicit participants’ willingness to pay (WTP) for different piece-rate incentives. Finally, we randomly assigned different piece-rate incentives to a subset of the subjects and measured the impact on actual gym attendance.

To study commitment contract take-up, we elicited demand for commitment contracts tied to attending the gym at least 8, 12, or 16 times over the next four weeks. For each of these thresholds, participants chose between an unconditional payment of $80 and a conditional payment of $80 that they received only if their attendance met or exceeded the threshold.
We also asked participants to choose between receiving $80 unconditionally or conditional on going to the gym fewer than 8, 12, or 16 times over the next four weeks.

To estimate the causal effects of increasing participants’ awareness of time inconsistency, we included a randomized information treatment prior to the elicitations, aimed at reducing overestimation of gym attendance.2 The treatment provided participants with information about their past gym attendance and highlighted (truthfully) that members of this gym tended to overestimate how often they would use the gym.

After describing the data in Section 4, in Section 5 we report reduced-form results on people’s forecasted, desired, and actual attendance. On average, people overestimate their future gym attendance. At the same time, we estimate a significantly positive average behavior change premium, which implies partial sophistication about time inconsistency. The estimates imply that, on average, participants valued increasing their future selves’ gym attendance by $1.78 per visit. Our information treatment significantly increased the behavior change premium, and simple proxies for sophistication are also strongly positively associated with the behavior change premium.

In Section 6, we report results on commitment contract take-up. We find high take-up of commitment contracts to attend the gym more, consistent with the take-up rates observed in other studies with similar designs (64% for 8+ visits, 49% for 12+ visits, and 32% for 16+ visits). We also find that participants who were randomly assigned to receive the conditional $80 incentive for 12+ visits increased their attendance by 3.51 visits, on average. Results such as these are often interpreted as smoking-gun evidence for widespread awareness of time inconsistency, as well as evidence of the welfare benefits of commitment contracts.
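The contract choices behind these take-up rates have a simple payoff structure, which the following minimal Python sketch makes explicit. The function name `contract_payout` and its arguments are hypothetical and chosen only for illustration; the $80 payment and the visit thresholds come from the design described above. The sketch looks at one choice in isolation and does not model how the experiment selected which choice was implemented.

```python
def contract_payout(visits, threshold, direction, take_contract, amount=80):
    """Dollar payout from one commitment-contract choice (illustrative only).

    direction: "more" pays if visits >= threshold,
               "less" pays if visits < threshold.
    take_contract: False means the participant kept the unconditional payment.
    """
    if not take_contract:
        return amount  # unconditional payment, independent of attendance
    if direction == "more":
        met = visits >= threshold
    elif direction == "less":
        met = visits < threshold
    else:
        raise ValueError("direction must be 'more' or 'less'")
    return amount if met else 0


# In money terms the contract never pays more than the unconditional option,
# so only a wish to change one's own future behavior can justify taking it.
for visits in range(0, 29):
    for direction in ("more", "less"):
        assert contract_payout(visits, 12, direction, True) <= contract_payout(
            visits, 12, direction, False
        )
```

For instance, a participant who takes the 12+ contract and then attends 10 times receives nothing, whereas the same attendance under the unconditional option pays $80.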
However, we present a range of new findings that suggest that such inferences may be inappropriate in the absence of additional evidence. Most strikingly, we find that 27-34 percent of participants chose commitment contracts to attend the gym less, and that the take-up of “more” and “less” contracts at each threshold is significantly positively correlated.3 Choosing both contracts is inconsistent with using commitment contracts as a self-control strategy, but is consistent with our theoretical predictions about the consequences of stochastic valuation errors, including predictions about the positive correlation. Intuitively, if stochastic valuation errors are the primary driver of take-up, then individuals most prone to these errors will be most likely to take up both types of contracts, which generates the positive correlation in take-up. Consistent with this evidence, we find little association between commitment contract take-up and the behavior change premium and other proxies for awareness of time inconsistency. Finally, the information treatment significantly decreased the take-up of commitment contracts for higher gym attendance, suggesting that in our setting increased sophistication reduces desire for commitment contracts. Taken together, this evidence suggests take-up of commitment contracts partly reflects a combination of limited sophistication and noisy valuation of contracts.

2 As we describe in Section 3, in our first wave of the experiment we had a simpler information treatment that only provided information about past visits to the gym and found that this did not meaningfully affect beliefs. The second two waves of the experiment used an enhanced information treatment, which we show in Section 5 significantly reduced expectations of gym visits.

3 We present a range of robustness checks for these results. We show that take-up is not concentrated only on participants who think these contracts will not be binding for them: those whose expected attendance in the absence of incentives is well above the contract threshold are almost as likely to take up the “less” contracts as those below the contract threshold. We also rule out other explanations for our results, such as participants simply confusing the “fewer visits” contracts for the “more visits” contracts, or participants simply disengaging and not taking their decisions seriously.

In Section 7, we combine our empirical results with a structural model to evaluate the welfare effects of commitment contracts, taking into account that at least some of the take-up reflects mistakes. We first use our data on piece-rate incentives to estimate a structural model of quasi-hyperbolic preferences with partial sophistication. We assume that all future utility is discounted by an additional β ≤ 1, which we refer to as present focus in the language of Ericson and Laibson (2019). Following O’Donoghue and Rabin (2001), we parametrize misprediction of time inconsistency by allowing people to believe that their future selves behave as if their present focus parameter is β̃. We estimate an actual average present focus parameter of β̂ = 0.55 and an average (across both information treatment and control groups) perceived present focus parameter of \hat{\tilde{\beta}} = 0.84. We estimate a (perceived) long-run benefit of exercise of b̂ = $9.66 per attendance, which sits comfortably in the range of health benefits estimated in the public health literature. These estimates imply an average internality (the harms people impose on themselves due to present focus) of (1 − β̂) · b̂ = $4.39. Our information treatment lowered the perceived present focus parameter from \hat{\tilde{\beta}} = 0.86 to \hat{\tilde{\beta}} = 0.78 and increased awareness of present focus from (1 − \hat{\tilde{\beta}})/(1 − β̂) = 0.30 to (1 − \hat{\tilde{\beta}})/(1 − β̂) = 0.49.

However, and consistent with our reduced-form results, commitment contract take-up is largely unrelated to any of the model parameters. This suggests that offering our commitment contracts is not a well-targeted intervention, and this is reflected formally in our welfare estimates. On average, consumers who take up the 8+, 12+, and 16+ commitment contracts incur losses equivalent to −$7.91, −$18.69, and −$10.51 per person, respectively, under the long-run criterion. Moreover, while we estimate that the contracts lead to modest gains in the social efficiency of gym attendance, these gains pale in comparison to the effects of linear per-attendance incentives that are offered to the entire population and scaled to generate the same increases in average attendance.

Our study fleshes out a number of mechanisms for why take-up and behavior change are not sufficient statistics for evaluating the efficacy of commitment contracts, and provides methods for assessing the importance of these mechanisms in other domains. This is illustrated by our results about how our commitment contracts are suboptimal tools for both measuring and addressing self-control problems in our exercise setting. Of course, this need not be true for all other domains of behavior or other types of contracts. In Section 8, we summarize a number of caveats to our results and discuss how our methods can be usefully extended to address other questions about data-driven incentive design for present-focused individuals.

1 Relation to prior literature

Although take-up of commitment contracts is commonly interpreted as smoking gun evidence for awareness of present focus, we are not the first to consider the possibility of decision-making errors influencing take-up. Kaur, Kremer, and Mullainathan (2015) document that take-up of commitment contracts is positively associated with indicators of time inconsistency for data-entry workers, but only after workers have repeated exposure to contracts. Initial take-up decisions seem to reflect some degree of valuation errors.
Our finding of the simultaneous take-up of contracts for more and fewer visits to the gym provides direct evidence of this possibility. This suggests that learning from repeated take-up decisions, as in Kaur, Kremer, and Mullainathan (2015) and Schilbach (2019), may be important for interpreting take-up of commitment contracts. This is particularly important in light of the fact that only sixteen of the thirty-three studies in Table 1 ever even mention potential confounds, and only eight discuss the confounds in depth as potential drivers of take-up.4 There is also related work in both psychology and economics that investigates experimenter demand effects (e.g., Oettingen et al., 2015; de Quidt, Haushofer, and Roth, 2018), though this work is not explicitly focused on demand effects in commitment contract take-up. Our novel design feature of offering commitment contracts for fewer visits to the gym is a complementary approach.5

Several studies have documented positive associations between demand for commitment contracts and indicators of actual time inconsistency (Augenblick, Niederle, and Sprenger, 2015; Kaur, Kremer, and Mullainathan, 2015). However, other studies have found at best weak (Ashraf, Karlan, and Yin, 2006) or negative associations between commitment contract take-up and time inconsistency (Sadoff, Samek, and Sprenger, 2019; John, 2020).6,7 John (2020) reports a negative association between proxies for naivete and take-up of commitment contracts for saving. We extend these results by providing a uniquely detailed analysis of correlates of take-up that relates take-up to both a set of reduced-form proxies and structural estimates of perceived and actual present focus. We also introduce a novel information treatment that increases awareness of time inconsistency, and we use it to provide unique causal evidence about the impact of sophistication on take-up of commitment contracts.8

Studying the link between commitment contract demand and sophistication is important because as Heidhues and Kőszegi (2009) show theoretically, partially naive individuals can harm themselves by taking up ineffective commitment contracts. Bai et al. (Forthcoming) estimate a parametrized distribution of β and β̃ from commitment contract choices and conclude that a large share of individuals are partially naive in their setting and commitment contracts are likely damaging to individual welfare. In our setting, we similarly find that commitment contracts appear to harm individual welfare. An advantage of our approach is that we use empirical moments that are separate from contract take-up to directly estimate β, β̃, and internalities both for individuals who take up the contracts and for those who do not. Our welfare evaluation of commitment contracts is also the first to allow both a non-deterministic decision environment and stochastic valuation errors in take-up decisions.

Finally, we contribute to work estimating structural models of time inconsistency, particularly in field settings. While there is a growing set of papers estimating the present focus parameter in the field after assuming either naivete or sophistication,9 only a handful of papers provide more complete and direct identification by estimating both people’s actual and perceived present focus: Skiba and Tobacman (2018), Augenblick and Rabin (2019), Chaloupka, Levy, and White (2019), Allcott et al. (Forthcoming), and Bai et al. (Forthcoming). Our estimation approach follows the ideas of DellaVigna and Malmendier (2004) and Acland and Levy (2012), and is most similar in spirit to that of Augenblick and Rabin (2019), who provide direct estimates of people’s desired, forecasted, and realized effort in a laboratory experiment with college students.10 But unlike Augenblick and Rabin (2019), our approach does not rely on the assumption that future effort costs are deterministic, and can be tractably applied in many field settings. For example, Allcott et al. (Forthcoming) extend our approach to study present focus among payday loan borrowers, a complex decision environment with non-separable payoffs and high uncertainty, non-quasilinearity in money, and potentially low financial literacy of experimental subjects.

4 We coded a study as discussing confounds if it used the keywords experimenter effects, demand effects, alternative considerations, alternative explanations, confusion, noise, desirability bias, or Hawthorne effects. Eight discuss such effects but consider them to be relatively minor determinants of commitment take-up, and another eight mention that they may play an important role. For example, Exley and Naecker (2017) discuss demand effects, John (2020) discusses intrahousehold conflict, Brune et al. (2016) discuss the desire to shield savings from one’s social network, Bonein and Denant-Boèmont (2015) discuss the role of peer pressure, and Kaur, Kremer, and Mullainathan (2015) and Schilbach (2019) discuss both perceived social pressure and confusion.

5 Methods in prior studies are focused on the idea that subjects may have beliefs about which behavior experimenters desire. Our approach is different in that it reveals a more general tendency to accept novel options one is presented with, but not necessarily specific beliefs about what behavior the experimenter desires. There are no clear beliefs about experimenter demand for behavior that would justify the behavior we observe of people committing to both more and fewer gym visits, but this behavior is consistent with generally accepting novel options (along with other forms of noisy valuation as outlined in Section 2).

6 Ashraf et al. (2006) find a significant positive association between commitment demand and an indicator of present focus from monetary discounting decisions for women, but they find no significant association for women when present focus is measured over consumption decisions (e.g., rice or ice cream), and no significant associations for men.

7 Even in cases where there is an overall positive association between indicators of actual time inconsistency and commitment contract take-up, there is often evidence consistent with our central finding that take-up may partly reflect something other than sophistication about time inconsistency. For example, in Augenblick, Niederle, and Sprenger (2015), 33 percent of subjects are identified as present-focused based on effort allocation decisions, yet 59 percent take up an offer of a commitment contract. Our theory and evidence on the link between commitment contract take-up and both noisy valuation and partial naivete help to explain why some studies document robust commitment contract take-up that may not be solely targeting time inconsistency.

8 Our information treatment connects to a recent theoretical and empirical literature on how giving people statistics derived from their own experience impacts beliefs and behavior (Hanna, Mullainathan, and Schwartzstein, 2014; Schwartzstein, 2014; Gagnon-Bartsch, Rabin, and Schwartzstein, 2021), to recent evidence linking imperfect recall to over-optimistic beliefs about one’s self (Huffman, Raymond, and Shvets, 2020), and to recent evidence that in some situations individuals may learn from observing their past behavior (Allcott et al., Forthcoming).

9 For field estimates, see Fang and Silverman (2004), Shui and Ausubel (2005), Paserman (2008), Laibson et al. (2018), Mahajan, Michel, and Tarozzi (2020), and Martinez, Meier, and Sprenger (2020). There is also a large laboratory literature focused almost exclusively on estimating actual but not perceived time inconsistency; see, e.g., the review in Ericson and Laibson (2019).

10 Unlike the working paper version of Acland and Levy (2012), we utilize an approach that provides estimates of both β and β̃, and we develop our behavior change premium statistic to provide a model-free test of perceived time inconsistency that is not tied to specific parametric assumptions.

2 Theoretical predictions and measurement techniques

2.1 Model setup

We consider individuals who in periods t = 1, . . . , T have the option to take an action a_t ∈ {0, 1}. Choosing a_t = 1 generates immediate stochastic costs c_t realized in period t as well as deterministic delayed benefits b realized in period T + 1. We assume that c_t > 0 with positive probability, but don’t preclude the possibility of draws c_t < 0. For concreteness, we will often refer to a_t = 1 as attending the gym and a_t = 0 as not attending the gym, with the understanding that our results apply to the general model presented here and not just gym attendance.

For ā = ∑_{t=1}^{T} a_t, we consider incentive contracts that pay out in T + 1, denoted as (y, P(ā)), that consist of a fixed transfer y (which could be negative), and a contingent reward P(ā) for certain levels of gym attendance. The contingent component P(ā) is non-negative, with min_{ā∈[0,T]} P(ā) = 0. We assume for simplicity that utility is quasilinear in money, given the relatively modest incentives involved in our experiment. A piece-rate incentive contract with per-attendance incentive p has y = 0 and P(ā) = pā.
Penalty-based commitment contracts for attending the gym at least r times are (−p, P), with P(ā) = p · 1_{ā≥r}. Conversely, a contract (−p, P), with P(ā) = p · 1_{ā<r} […] > 0 implies that β̃ < 1. We call this reduced-form measure the behavior change premium per dollar of financial incentives, as it corresponds to individuals’ valuation of the behavior change induced by a ∆ = $1 increase in piece-rate incentives.11

The assumption about negligible terms is essentially the same as those in the canonical Harberger (1964) formula of the dead-weight loss of taxation: the change in incentives is not too large, particularly relative to the degree of curvature in the region of the incentive change. The assumptions are reasonable in our data, where we find that both the actual and expected attendance curves are approximately linear. We note that the result in Proposition 1 cannot by itself be used to identify β̃; we make additional parametric assumptions in Section 7 to separately estimate β̃ and b.

2.2.1 Commitment contract take-up coarsely measures the behavior change premium

Take-up of commitment contracts is less informative about perceived and actual time inconsistency than the behavior change premium. We illustrate this by returning to Figure 1 and assuming a single period of action (T = 1), so that the attendance curves in Figure 1 give the probability of a = 1, and the vertical line running through points H and I corresponds to the individual attending the gym with probability 1. A commitment contract where the individual puts an amount ∆ at stake is equivalent to the individual receiving an increase ∆ in attendance incentives, while also having to pay ∆ for sure. The surplus loss from paying ∆ is the rectangle BEHI, and thus a commitment contract is perceived to be valuable if the behavior change premium DCFG exceeds the loss CFHI. This illustrates that commitment contract take-up constitutes a coarse measure of the behavior change premium.
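As a rough numerical illustration of the behavior change premium, the sketch below uses a discrete approximation: the change in WTP per dollar of incentive, minus the average forecasted attendance at the two incentive levels. This trapezoid form is an assumption made for illustration and may differ from the exact statistic in the paper. The function and argument names are hypothetical, and the input numbers are made up rather than taken from the experiment.

```python
def behavior_change_premium(wtp_low, wtp_high, forecast_low, forecast_high, delta):
    """Approximate behavior change premium per dollar of incentive.

    wtp_low, wtp_high: willingness to pay for piece rates p and p + delta.
    forecast_low, forecast_high: forecasted attendance at p and p + delta.
    Under the time-consistent benchmark, a small incentive increase is worth
    about delta times forecasted attendance (the Envelope Theorem argument).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    # Assumed benchmark: average forecasted attendance times the increase.
    benchmark = 0.5 * (forecast_low + forecast_high) * delta
    return (wtp_high - wtp_low - benchmark) / delta


# Made-up inputs: WTP rises by $10 for a $1 higher piece rate, while
# forecasted attendance is 8.0 and 8.4 visits at the two incentive levels.
premium = behavior_change_premium(20.0, 30.0, 8.0, 8.4, 1.0)
print(round(premium, 2))
```

A positive value indicates a valuation above the time-consistent benchmark, which the text above associates with β̃ < 1.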
11 Assuming quasilinearity in money is not without loss, but is plausible for the relatively modest incentive sizes that are offered in field experiments such as ours. If participants are non-negligibly risk-averse over small amounts of money, then the statistic in (3) underestimates the WTP for behavior change, and leads to overestimates of β̃ (see Allcott et al., Forthcoming, for further details). Empirically, we do not find associations between the behavior change premium and our measure of small-stakes risk aversion. This is suggestive evidence that relative to other sources of variation in the behavior change premium, risk aversion does not appear to be an important determinant of the BCP. Perhaps more speculatively, it may also be worth noting that to the extent that subjects’ apparent risk aversion in small-stakes lab gambles is more of a perceptual bias (as in the work by Khaw, Li, and Woodford, 2021), it is not obvious that it should manifest itself as anything other than mean-zero noise in our WTP exercise, and our results point in that direction.

In general, it is unlikely that the behavior change premium DCFG exceeds the loss CFHI when the probability of attendance is non-negligibly below 1. In Appendix A.2.2 we derive two general results about the demand for commitment contracts when costs are uncertain. These results generalize the numerical simulation arguments in Laibson (2015), which make a number of special assumptions, such as uniform densities. First, we show that for a broad class of stochastic cost distributions, the quasi-hyperbolic model predicts that there should not be demand for any commitment contract when there is at least a moderate chance that costs exceed delayed benefits. Second, when there is enough uncertainty to make commitment contracts unattractive, the perceived harms of a commitment contract, given by the difference between CFHI and DCFG in the figure, are increasing in perceived present focus 1 − β̃.
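Both results can be checked numerically in a stripped-down case. The sketch below is illustrative only and is not taken from the appendix. It assumes T = 1, costs c drawn uniformly from [0, 1], and a contract that takes a stake up front and returns it only upon attendance. It also assumes that the individual expects her future self to attend whenever c ≤ β̃(b + stake), while evaluating outcomes at time 0 by b − c. All function names are hypothetical.

```python
def perceived_value(beta_tilde, b, stake):
    """Time-0 perceived surplus of a stake-based contract (T = 1, c ~ U[0, 1])."""
    # Assumed forecast rule: the future self attends when c <= beta_tilde * (b + stake).
    t = min(1.0, max(0.0, beta_tilde * (b + stake)))
    # E[(b - c) 1{c <= t}] = b*t - t^2/2 under uniform costs;
    # the stake is forfeited with perceived probability 1 - t.
    return b * t - t * t / 2.0 - stake * (1.0 - t)


def best_gain(beta_tilde, b, max_stake=5.0, steps=5000):
    """Largest perceived gain over a grid of positive stakes, relative to no contract."""
    base = perceived_value(beta_tilde, b, 0.0)
    stakes = [max_stake * k / steps for k in range(1, steps + 1)]
    return max(perceived_value(beta_tilde, b, s) for s in stakes) - base


def perceived_harm(beta_tilde, b, stake):
    """Perceived loss from accepting a contract with a given stake."""
    return perceived_value(beta_tilde, b, 0.0) - perceived_value(beta_tilde, b, stake)


if __name__ == "__main__":
    # Costs exceed benefits 20% of the time when b = 0.8 under U[0, 1] costs.
    for bt in (0.8, 0.9, 1.0):
        print("beta_tilde", bt, "best gain", best_gain(bt, b=0.8))
    # Harm of a fixed small stake as perceived present focus rises.
    for bt in (0.95, 0.90, 0.85, 0.80):
        print("beta_tilde", bt, "harm", perceived_harm(bt, b=0.6, stake=0.1))
```

Under these assumptions, no positive stake is perceived as beneficial for the first set of parameters, and the perceived harm of the fixed stake grows as β̃ falls.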
That is, people who perceive themselves to be more present-focused will find commitment contracts less attractive (i.e., more harmful).

In Appendix A.2.2 we show that there are two key conditions on the distribution of cost draws under which the value of commitment contracts is eroded, which we summarize here. First, the chances of getting a cost draw under which it is suboptimal to take the action (c > b) must be at least as high as the chances of getting a cost draw under which the time t = 0 individual thinks she should choose a = 1 but thinks that her time t = 1 self will not do so. Second, the cost draws exceeding b must not be concentrated in a “small” neighborhood of b.

As a simple numerical illustration for the case T = 1, suppose that c is uniformly distributed on [0, 1]. Then, it can be shown that no individuals with β̃ ≥ 0.8 desire any kind of commitment contract when the costs of attendance exceed the benefits at least 20% of the time, an arguably modest degree of uncertainty. Appendix A.2.2 presents additional examples.

2.3 The consequences of stochastic mean-zero mistakes

In light of the results above, a natural question is why we see so much take-up of commitment contracts in behavioral economics experiments. One possible reason is that because evaluating incentive schemes may be complicated, individuals may do so imperfectly. This is in line with a long intellectual history of measuring and modeling stochastic valuation errors in individuals’ decisions, starting from Block and Marschak (1960), continuing with Quantal Response Equilibrium (McKelvey and Palfrey, 1995), and recently gaining prominence in a variety of new approaches to bounded rationality (e.g., Woodford, 2012; Wei and Stocker, 2015; Khaw, Li, and Woodford, 2021; Natenzon, 2019). We refer to this mechanism as imperfect perception.
Another reason is that some individuals simply like to say “yes” to offers, feel pressure to do so (DellaVigna et al., 2012), or falsely assume that the authority offering the contracts must be offering something valuable. We incorporate such social pressure effects into our model in Appendix A.2.3, and we derive our results under more general assumptions that allow for these effects.

We formalize this with a reduced-form econometric model that supposes that for a given choice-set j, individual i behaves as if her forecasted utility under contract (y, P) is

\[
\hat{V}(y, P) = V(y, P) + \sigma(P)\,\varepsilon_{ij} \tag{4}
\]

where \(\varepsilon_{ij}\) has unbounded support, and σ(P) > σ(0) when P ≠ 0; i.e., the presence of contingent incentives amplifies complexity and thus stochastic errors. We allow (but do not require) σ(0) = 0, meaning that individuals have no problems assessing sure incentives. The assumption that P affects the error term only through the variance guarantees that the error term is mean-zero; this is a key assumption of this model, and is typical in standard “random utility” models.

In the types of decisions we study, this model is consistent with the two-stage Luce model (Echenique and Saito, 2019) when \(\varepsilon_{ij}\) has the standard logistic distribution, σ(0) = 0, and σ is constant over all P ≠ 0. When choosing between a sure incentive y′ and a contract (y, P) with V(0, P) ≥ 0, the individual chooses (y, P) with probability \(e^{V(y,P)/\sigma} / \left(e^{y'/\sigma} + e^{V(y,P)/\sigma}\right)\).12 For short, we refer to this framework as the imperfect perception model.

2.3.1 Commitment contract take-up is systematically biased by mean-zero mistakes

The take-up of commitment contracts is a particularly problematic measure in the presence of imperfect perception because binary take-up decisions are biased by even mean-zero valuation errors (Aigner, 1973; Hausman, 2001). Even if the errors are symmetric (say, 10% of the individuals always choose the wrong option), binary choice data will typically introduce bias.
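The arithmetic behind this kind of bias can be sketched in a few lines. The snippet assumes, purely for illustration, that a fixed share of individuals pick the opposite of their preferred option regardless of what they prefer, so that the observed share is s(1 − e) + (1 − s)e for true share s and error rate e. The function name and the example shares are hypothetical.

```python
def observed_share(true_share, error_rate):
    """Observed take-up when a symmetric share of choices is flipped."""
    # People who want the option and choose correctly, plus people who do
    # not want it but choose it by mistake.
    return true_share * (1.0 - error_rate) + (1.0 - true_share) * error_rate


if __name__ == "__main__":
    for s in (0.05, 0.25, 0.50, 0.75):
        print(f"true share {s:.2f} -> observed share {observed_share(s, 0.10):.2f}")
```

Symmetric errors pull the observed share toward 50%, so the distortion is largest in relative terms when few people truly want the option.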
For example, if 10% of choices are mistakes, and only 5% of people actually want a given option, 14% will still end up choosing that given option.

As we show formally in Appendix A.2.3, the imperfect perception model generates three predictions for penalty-based commitment contracts:

1. Individuals will demand commitment contracts to both exercise more and to exercise less.

2. As long as average β̃ is not too far below 1, there will be a positive correlation between take-up of commitment contracts to exercise more and take-up of commitment contracts to exercise less.

3. In the presence of moderate to high uncertainty about costs, increasing individuals’ sophistication about their present focus will decrease their demand for commitment contracts to exercise more.13

The intuition for the first prediction is that an extreme enough draw of ε can lead individuals to mistakenly choose undesirable contracts. The intuition for the second prediction is that if commitment contracts would generally look unappealing to individuals in the absence of valuation errors, then individuals with the highest variance in the stochastic valuation term ε will be the most likely to take up both types of contracts. The intuition for the third prediction is that under moderate to large uncertainty, the perceived harms of a commitment contract are decreasing in β̃ in the standard quasi-hyperbolic model (see Appendix A.2.2). Although in the standard quasi-hyperbolic model these conditions would lead individuals to never choose a commitment contract, in our imperfect perception model individuals still choose the contract, but with a propensity that is decreasing in the expected harms in the standard model.

12 At the same time, a key property of the model, arising from the fact that \(\varepsilon_{ij}\) is common to all options in choice set j, is that if (y, P) transparently dominates another contract (y′, P′), in the sense that y ≥ y′ and V(0, P) ≥ V(0, P′), then the dominated contract is never chosen when σ(0) = 0 and σ is constant over all P ≠ 0. This is consistent with our experimental results that participants almost never choose $0 over a larger sure reward, or $0 over a positive incentive for gym attendance.

13 Interestingly, the converse does not hold for the “less” contracts. Intuitively, this is because a lower β̃ dampens the impact of financial incentives in both cases, and thus makes penalty-based contracts potentially more harmful in both cases.

2.3.2 Estimates of the behavior change premium are not biased by mean-zero mistakes

Measuring the behavior change premium is not subject to bias at the population level, because it is a continuous variable that preserves the mean-zero nature of people’s valuation errors. Specifically, let the subscript i denote each individual i’s WTP w, beliefs α, and so forth. Then Proposition 1 continues to hold for population averages, as we show in Appendix A.2.3. For example, equation (2) becomes

\[
E\left[\frac{w_i(p+\Delta) - w_i(p)}{\Delta}\right] = E\left[\frac{\tilde{\alpha}_i(p+\Delta) + \tilde{\alpha}_i(p)}{2} + (1 - \tilde{\beta}_i)\left(b_i + p + \Delta/2\right)\frac{\tilde{\alpha}_i(p+\Delta) - \tilde{\alpha}_i(p)}{\Delta}\right] \tag{5}
\]

The formula also continues to hold if individuals’ stated beliefs \(\alpha_i\) are a noisy function of their true subjective beliefs, as long as the noise is also mean-zero.14 Core to our result is that the WTP can range from below to above expected earnings, meaning that the measure of WTP for behavior change can range from negative to positive.15 Having some, but not full, continuity in a commitment measure is insufficient.16

3 Experimental design

Our study recruited members of a fitness facility in a large city in the Midwest U.S. The facility is affiliated with a private university, offering subsidized memberships to graduate students, faculty, and staff, but is also open to the public.17 The university has a separate facility for undergraduates.
The study consisted of an online component followed by four weeks of observation of gym attendance. Appendix Table A1 shows the ordering of all parts of the online component of the study, which we summarize in more detail below. Enrollment was limited to people over the age of 18 who had held memberships over the past eight weeks.

14 Systematic over-statement of true beliefs would make this a particularly conservative test, as this would bias against us finding a demand for behavior change.

15 Note that even though our experiment imposed a lower bound of $0 for WTP for a piece-rate incentive, the multiplicative nature of errors in our model implies that the perceived valuations for a piece-rate incentive cannot be below zero. Intuitively, individuals should not perceive the value of a positive piece-rate incentive as negative.

16 For example, restricting WTP for a commitment contract, as in Milkman, Minson, and Volpp (2014), would mechanically lead to an upward bias in valuations, since negative draws of errors in valuation would be censored at 0 while positive draws of errors would be uncensored. Similarly, presenting experimental participants with a continuous commitment contract range of many possible penalties or targets as in, e.g., Kaur, Kremer, and Mullainathan (2015), would lead to bias if the range only allows participants to commit to doing more of something, but not less of something.

17 There are three membership types at the gym: regular, graduate student, and members through a wellness program offered by their health insurance company. Graduate students have a subsidized membership fee by semester, included by default with their tuition and fees. Members of a health insurer’s wellness program are also able to obtain heavily subsidized memberships. Regular members pay an initiation fee and a monthly membership fee, which varies based on their affiliation with the university or other local employers.
The study was open for three recruitment periods starting in October 2015 and ending in March 2016. During each recruitment period, the study was advertised through email invitations and flyers posted near the gym. Waves 1, 2, and 3 had 350, 528, and 414 participants, respectively.18

A key feature of the design is that we elicited preferences for commitment contracts and valuations of linear attendance incentives from all participants in an incentive-compatible manner, while at the same time generating random assignment of contracts and attendance incentives for most participants. The full study instructions are contained in a separate Study Instructions Appendix.

Information treatment. Before answering any of the questions described below, participants were assigned to receive an information treatment with 50% chance. In wave 1 of the study, the information treatment consisted of a graph showing the number of visits made by the participant in each of the past twenty weeks.

In waves 2 and 3, we enhanced the information treatment in two ways. First, participants were asked to enter their best estimate for the average number of weekly visits they had made, while viewing the graph of their past visits. We anticipated that this would prompt them to pay more attention and better process the information. Second, participants were given information on how participants from the prior wave of the study overestimated their future attendance: “Participants estimated that they would visit [gym name] 4 more days over 4 weeks than they actually did. On average, that means they overestimated their attendance by 1 visit per week.”

Participants randomized into the no-information control group proceeded directly to the elicitations described below.
18 Because many gym members are university students or employees, we scheduled the four-week incentive periods to avoid long breaks in the academic calendar. Thus, the first wave of the online component was in the fall semester, the second wave was in the spring semester preceding spring break, and the third wave was in the spring semester following spring break.

Forecasted attendance and WTP for incentives. All participants were asked to give their “best guess” of the number of days they would visit over the next 4 weeks (starting the Monday following the date of the online component), their goal number of visits over that period, and their perceived probability of meeting their goal.

Additionally, participants were asked to consider six different incentive contracts for the four weeks starting the Monday after they completed the online component. The incentives were $1/day, $2/day, $3/day, $5/day, $7/day, and $12/day. Each incentive was presented on a separate page, and the order of these pages was randomized.

For each incentive, participants were first asked to estimate how many days (0-28) they expected they would visit the gym over the next four weeks under that incentive. On the same page, they used a slider to indicate their willingness to pay (WTP) for this incentive; i.e., the largest possible fixed payment over which they would prefer to receive the piece-rate incentive. Importantly, this WTP could be as low as $0 and thus substantially below the expected earnings from the incentive. If participants indicated the maximum WTP allowed by the slider (i.e., positioned it all the way to the right), they were taken to a fill-in-the-blank question where they entered their willingness to pay.19 Consistent with our theoretical model, all financial rewards were paid out after the four-week period.
The WTP elicitation used the incentive-compatible Becker-DeGroot-Marschak (BDM) mechanism: at the end of the online component, participants would learn which of the questions had been randomly chosen to apply to them, and which randomly chosen fixed payment would be compared to their WTP to determine their outcome. If their WTP was above the randomly chosen fixed payment, they would receive the piece-rate incentive. If their WTP was below the randomly chosen fixed payment, they would receive the randomly chosen fixed payment.

We devoted several screens to developing participants’ understanding of how to use a slider to indicate WTP and why truth-telling was incentive compatible. We also included two questions testing participants’ comprehension of the slider. Participants who answered one or both of these questions incorrectly were given another chance to answer correctly before moving to the next section of the online component.

We did not incentivize accuracy of people’s attendance forecasts because according to standard models of time inconsistency, individuals with β̃ < 1 could use these forecasts as a means of commitment: stating a forecast higher than one’s actual belief would incentivize additional attendance.20 Because there is no incentive to misreport beliefs in the absence of financial incentives (and a strict dis-incentive in the presence of lying costs), we plausibly assume that on average (up to mean-zero noise), people accurately report their subjective beliefs in our study.

Commitment contracts. In the next section, participants were presented with commitment contract options targeting both more and fewer visits over the same four-week period. For example, participants were asked to answer both of the following questions:

Which do you prefer?
• $80 fixed payment (regardless of how often you go to the gym)
• $80 incentive you get only if you go to the gym at least 12 days over the next four weeks.

Which do you prefer?
• $80 fixed payment (regardless of how often you go to the gym)
• $80 incentive you get only if you go to the gym 11 or fewer days over the next four weeks.

In waves 1 and 2, participants made binary choices like these between an unconditional $80 payment and $80 conditional on making “8 or more,” “12 or more,” “16 or more,” “7 or fewer,” “11 or fewer,” and “15 or fewer” visits to the gym (i.e., a series of 6 choices). In wave 3, this section of the online component was modified. Participants were only asked to consider commitments to visit “12 or more” and “11 or fewer” days, but they were also asked for their beliefs about their probabilities of meeting these commitments.21

19 The minimum value on each slider was zero, and the maximum was the value of the per-day incentive multiplied by 30 to include (slightly more than) the maximum possible expected earnings. 7.4% of responses were at the slider maximum. Of the subsequent fill-in-the-blank responses, half indicated a willingness to pay that was actually below the maximum, 22% indicated a willingness to pay equal to the maximum, and 28% indicated a willingness to pay that was above the maximum.

20 Although Augenblick and Rabin (2019) show that this inflation is theoretically small for small incentives in deterministic environments, this is not generally true in environments featuring some uncertainty, such as ours.

Incentive-compatibility and assignment of attendance incentives. One question was randomly chosen to determine each participant’s attendance incentive. When the selected question involved a piece-rate incentive, the participant’s WTP for that incentive was compared against a randomly drawn fixed payment. Fixed payments were drawn from a mixture distribution with two components: a uniform distribution from $0-$7 (mixture weight = 0.99), and a uniform distribution from the full range of slider values (mixture weight = 0.01).
The rationale for this distribution was to avoid the endogenous assignment of incentives to participants with higher WTPs for those incentives. Given this design, incentives were exogenously assigned, with the exception of two rare cases. The first case is when the fixed payment draw exceeded $7 (n = 12). The second case is when a participant indicated a WTP value within the $0-$7 range from which our fixed payments were heavily drawn (n = 32). In these two cases, participants with higher WTP values are more likely to receive an attendance incentive, which would bias our estimation of incentive effects on gym visits due to selection. These 44 observations are excluded from the analyses throughout, but their exclusion makes very little qualitative or quantitative difference.

We targeted a small number of questions with high probabilities of selection in order to power our comparisons of the incentive effects. In wave 1, the questions about the $2 and $7 piece-rate incentives were each assigned a 0.33 probability of being chosen. To create a group that did not face any incentive to visit the gym, the study also included a choice between a $0 per day incentive and a $20 fixed payment, and this question was also chosen with 0.33 probability. The remaining 1% was a random draw from all six piece-rate incentives and commitment contract questions.22

21 After observing the surprising patterns in commitment demand in wave 1 (i.e., many participants chose both “fewer” and “more” contracts), we sought to replicate the patterns in wave 2 with no changes to the commitment contract component. After the wave 2 replication, we altered our design in wave 3 to further investigate the mechanisms of commitment contract demand. We elicited beliefs about the likelihood of meeting the thresholds stipulated by the “more” and “fewer” contracts to rule out some alternative hypotheses not consistent with the model we propose in Section 2.3. This also motivated us to randomize some participants into actually receiving the commitment contracts, to make sure that we could replicate previous findings that the commitment contracts do alter behavior (thereby also confirming that participants were not confused about the terms ex-post); we discuss this randomization below.

22 We informed the participants about this randomization scheme in the instructions by clarifying: “To keep within our grant budget, incentives and fixed payments with lower amounts are more likely to be randomly selected, but every incentive and fixed amount we ask you about has some chance of being selected.”

The targeted incentives were varied to document the effects of different incentive sizes.23 In wave 2, we shifted half of the probability mass at the $7 piece-rate incentive to the $5 piece-rate incentive to better understand the curvature of attendance as a function of the linear incentives. This shift resulted in the following incentive assignment probabilities: 33% for the $0 incentive; 33% for the $2 incentive; 16.5% for the $5 incentive; 16.5% for the $7 incentive.

In wave 3, we added a group that would receive $80 conditional on making 12 or more visits, an attendance incentive equivalent to receiving one of the commitment contracts. Participants in this group would receive the $80 conditional payment as long as they had chosen option (a) for the question: “Which do you prefer? (a) $80 incentive you get only if you go to the gym at least 12 days over the next four weeks or (b) $0 fixed payment – no chance to earn money.”24 Since an incentive of $80 for 12 visits equals $6.67 per visit, we determined $7 to be the most comparable piece-rate incentive. Thus, our assignment probabilities in wave 3 were 33% for the $80 incentive to make 12 visits, 33% for the $0 incentive, and 33% for the $7 piece-rate incentive, to allow us to compare their effects.
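The BDM comparison and the mixture draw of fixed payments described in this section can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function names are hypothetical, the slider maximum is taken to be 30 times the per-day incentive as stated in footnote 19, and ties between WTP and the fixed payment (which the text does not address) are resolved in favor of the fixed payment as an assumption.

```python
import random

SLIDER_MULTIPLIER = 30  # slider maximum = per-day incentive x 30 (footnote 19)


def draw_fixed_payment(per_day_incentive, rng):
    """Draw a fixed payment from the two-component uniform mixture."""
    slider_max = per_day_incentive * SLIDER_MULTIPLIER
    if rng.random() < 0.99:
        # Component 1 (weight 0.99): uniform on $0-$7.
        return rng.uniform(0.0, 7.0)
    # Component 2 (weight 0.01): uniform on the full slider range.
    return rng.uniform(0.0, slider_max)


def bdm_outcome(wtp, fixed_payment):
    """Return (assignment, fixed payment received) under the BDM rule."""
    if wtp > fixed_payment:
        return ("piece_rate", 0.0)
    # Assumption: a tie is treated like WTP below the fixed payment.
    return ("fixed_payment", fixed_payment)


if __name__ == "__main__":
    rng = random.Random(2016)  # arbitrary seed for reproducibility
    payment = draw_fixed_payment(per_day_incentive=7, rng=rng)
    print(bdm_outcome(wtp=60.0, fixed_payment=payment))
```

Because almost all draws fall in the $0-$7 range, a participant whose WTP exceeds $7 receives the piece-rate incentive regardless of how far above $7 the WTP lies, except in the rare draws from the second component.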
Announcement and disbursement of incentives. In the final section of the online component, participants learned which incentive, if any, they would receive in the next four weeks. Participants received an email upon completion of the online component that confirmed their incentive and reminded them that the four-week incentive period would begin on the upcoming Monday. Afterwards, participants were notified via email of their total number of visits and the total payment they had earned. Final payments were disbursed via mailed checks.

4 Data

Attendance data. Our measure of attendance is computed from participants’ swiping into the gym using their membership ID cards. Gym login records are potentially problematic if participants enter and leave the gym to earn incentives without exercising. We do not believe this possibility is a major concern because this behavior includes many of the costs of attending the gym (e.g., travel) but excludes some benefits (e.g., exercise). We also introduced a new checkout procedure partway through the study (in February 2016). Participants after that time were required to swipe out after attending the gym for at least 10 minutes in order to get credit for a visit toward their incentive. Introducing this procedure did not change visit patterns or the estimated incentive effects in the study and the swipe-out records reveal that the vast majority of gym visits lasted substantially longer than 10 minutes.

23 Our initial plan to target only two distinct incentive levels was based on conservative estimates of the number of participants our budget would support and the potential variance of the incentive effects.

24 Note that this is different from the question we used to elicit demand for commitment contracts, in which participants chose between a fixed payment of $80 and the $80 conditional payment. This enabled us to observe behavior under the incentive among both the participants who would and would not select into commitment contracts on their own. All but five individuals (1.2% of wave 3 participants) who were asked this question chose the $80 incentive over $0.

Sample. Table 2 summarizes characteristics of our sample, including a break-down by wave. The participant pool is 61% female with a mean age of just under 34 years. 57% of the participants are either part- or full-time students, 57% work either part- or full-time, 27% are married, just under half hold an advanced degree, and household income averages fifty-five thousand dollars. Participants averaged 6.9 visits over the past four weeks. We find that the participant pools look similar across waves, but in relevant analyses we still include wave fixed effects. Appendix Table A2 shows the p-values for tests that the information treatment group means equal those of the information control group for wave 1, as well as for waves 2-3. Overall, the results are consistent with good balance between treatment and control groups.

Compared to samples in other field experiments on commitment contract demand, particularly those involving low-income populations, our sample is more educated and numerate due to being affiliated with a university. For example, 95.2% of our sample correctly answered two numeracy questions from Lusardi and Mitchell (2007), which is significantly higher than the rate in the broader U.S. population.25 Given this high numeracy, it does not seem likely that our sample is more susceptible to imperfect perception than the typical sample in commitment contract field experiments.

Attention checks. We have a few measures that proxy for engagement and attention to our online elicitations. First, as described in Section 3, we had two questions that offered a binary choice in which one of the choices, $0, was clearly dominated by the other. Only 1.8% of participants chose a dominated option.
Second, we had an attention check question that presented a multiple-choice question to the participants but instructed them to click the “next” button without filling out one of the choices, with the explanation that this would indicate their attention to the question prompts. Only 3.5% of participants failed the attention check. Finally, we had two comprehension checks about the WTP elicitations and can use failing both as an additional indicator of lack of engagement. We find that only 4.3% of participants failed these comprehension checks twice. Taken together, these statistics suggest that attention and engagement were high, and compare favorably with most other lab-in-the-field studies.

25 The percentage calculation question asks, “If the chance of getting a disease is 10 percent, how many people out of 1,000 would be expected to get the disease?” The lottery division question asks, “If 5 people all have the winning number in the lottery and the prize is 2 million dollars, how much will each of them get?” For comparison, in a sample of 1,984 adults aged 51-56 in the 2004 HRS, the percentages answering each question correctly were 83.5% (the percentage calculation) and 56% (the lottery division) (Lusardi and Mitchell, 2007).

5 Actual, forecasted, and desired attendance

5.1 Actual and forecasted attendance

Figure 2 summarizes the forecasted and actual attendance curves, as introduced in Section 2.2. Both forecasted and actual attendance increase significantly with incentives, and there is a significant difference between the two, consistent with naivete (β̃ > β). On average, participants forecasted 11.5 visits in the absence of incentives and 17.7 visits with the $7 incentive during the four-week study period. In reality, participants attended the gym an average of 7.2 times in the absence of incentives and 13.3 times with the $7 incentive.
Figure 3 shows how the information treatments affected expectations and actual visits, splitting the sample into information treatment and control groups. Our simple wave 1 information treatment had no effect on either expectations of visits or realized visit patterns, as shown in panel (a). By contrast, the enhanced information treatment in waves 2 and 3 had a significant effect on beliefs that partially reduced participants’ overoptimism, as seen in panel (b). This “first-stage” allows us to study the causal effects of sophistication on the behavior change premium and commitment contract take-up.

Figure A2 in Appendix C.1 presents a binned scatter plot of actual attendance versus expected attendance for the (randomly assigned) incentive people actually received. Although participants are over-optimistic about their attendance, the figure shows a tight relationship between forecasted and realized attendance.

5.2 Willingness to pay for incentives

Figure 4 plots the average WTP for piece-rate incentives elicited from our participants for each of the six different piece-rate levels. The figure also shows the average subjective expected earnings at that piece-rate, i.e., the piece-rate multiplied by the participants’ forecasted attendance. The WTP is above participants’ subjective expected earnings for low incentives. For example, under a $1 per-visit piece-rate, participants believed that they would attend an average of 12.92 times but had an average WTP of $18.30, $5.38 more than their subjective expected earnings. The fact that people are willing to pay more for small incentives than they expect to earn is consistent with the theoretical predictions for agents that are aware of present focus (i.e., β̃ < 1). We also observe that the WTP is below the expected earnings on average for high incentives.
This is consistent with the implication of equation (2), given moderate perceived present focus (\(\tilde{\beta}_i\) reasonably close to 1).26

Figure A3 in Appendix C.2 presents binned scatter plots of how WTP for the incentives varies with people’s forecasts about attendance given those incentives. As would be implied by standard models, there is a tight relationship between WTP and both the size of the incentive and people’s forecasted attendance with that incentive. Moreover, the size of the incentive changes not only the level of WTP, but also its slope with respect to forecasted attendance.

26 To see this formally, note that the derivative of expected earnings with respect to the incentive level p is given by \(E[\alpha_i(p) + \alpha_i'(p)]\). Thus as long as \(E[(b_i + p)(1 - \tilde{\beta}_i)] < 1\), which will be the case for moderate levels of perceived present focus, \(\frac{d}{dp} E[w_i(p)] < E[\alpha_i(p) + \alpha_i'(p)]\).

5.3 The behavior change premium

The seven different incentive levels for which we elicited WTP and forecasts allow us to produce a precise estimate of the average behavior change premium. Formally, order the incentive levels \(p_0 = 0, p_1, \dots, p_K\) in ascending order. For each pair of adjacent incentives, \(p_k\) and \(p_{k+1}\), we construct an estimate of the behavior change premium according to equation (3), applied to \(p = p_k\) and \(\Delta = p_{k+1} - p_k\). We then take the average across all participants and all incentive pairs. We focus primarily on the average, rather than individual differences, because Corollary 1 in Appendix A.2.3 shows that the average statistic is the unbiased measure of the mean behavior change premium in the presence of imperfect perception. Consistent with our conjecture of imperfect perception of contract values, we find substantial variation in estimates of the behavior change premium at the individual level.27

Figure 5 shows the average value across six incentive levels, as well as the average excluding the valuation of increasing the piece-rate from $0 to $1, along with 95% confidence intervals.
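The pairwise construction can be sketched in code. This is a minimal sketch rather than the estimation code used in the paper. It assumes that the per-pair statistic in equation (3) is the WTP difference divided by ∆, minus the average forecasted attendance at the two incentive levels, which is the form suggested by equation (5). The function name and the numbers in the example are hypothetical.

```python
def behavior_change_premia(prices, wtp, forecast):
    """Per-pair behavior change premium for one participant.

    prices   -- incentive levels in ascending order (p_0 = 0 first)
    wtp      -- stated WTP at each incentive level
    forecast -- forecasted number of visits at each incentive level
    """
    premia = []
    for k in range(len(prices) - 1):
        delta = prices[k + 1] - prices[k]
        # WTP gained per dollar of additional piece-rate incentive.
        wtp_slope = (wtp[k + 1] - wtp[k]) / delta
        # Expected earnings gained per dollar, holding behavior beliefs fixed:
        # the average forecasted attendance across the two levels.
        avg_forecast = (forecast[k + 1] + forecast[k]) / 2.0
        premia.append(wtp_slope - avg_forecast)
    return premia


if __name__ == "__main__":
    # Hypothetical participant with three incentive levels.
    premia = behavior_change_premia(
        prices=[0.0, 1.0, 2.0],
        wtp=[0.0, 14.0, 27.0],
        forecast=[10.0, 12.0, 13.0],
    )
    print(premia, sum(premia) / len(premia))
```

Averaging these per-pair values across participants and pairs gives the population statistic; dropping the first pair corresponds to the more conservative measure that excludes the move from $0 to $1.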
On average, the behavior change premium is $2.01 per $1 of incentive increase. However, this valuation is driven in part by an especially large premium for the $1 incentive. As Corollary 1 in the Appendix shows, if there are social pressure effects influencing willingness to pay for contingent incentives, the more robust measure of the behavior change premium is calculated only from changes in positive piece-rate levels. This more conservative average is $1.20 per dollar of piece-rate increase, and is also statistically significant. A linear regression of expected attendance on the piece-rate incentives shows that participants expect that, on average, a $1 change in piece-rates will increase attendance by 0.67 visits (participant-cluster-robust s.e. 0.014). Our two measures of the behavior change premium thus imply that individuals on average value increasing their future selves’ attendance by $1.78 per visit (based on the conservative measure) to $3.00 per visit (based on the less conservative measure). Throughout the rest of the paper, we focus on the more conservative measure of the behavior change premium, unless otherwise stated.

5.4 Correlates and determinants of the behavior change premium

Table 3 examines the relationship between the behavior change premium and our information treatment, as well as proxies for people’s perceived present focus. In column 1 of Table 3 we regress the behavior change premium on indicators for the information treatments. Consistent with the null effect on beliefs documented in Section 5.1, the wave 1 information treatment had no effect on the behavior change premium. Consistent with the strong effect on beliefs documented in Section 5.1, the enhanced information treatment significantly increased the average behavior change premium, increasing the measure by $1.36 from the information control group average of $0.66.
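The per-visit valuations reported in Section 5.3 follow from a simple ratio: the premium per $1 of incentive divided by the expected change in visits per $1 of incentive. The short Python sketch below reproduces that arithmetic from the rounded figures in the text. The function name `value_per_visit` is ours, and the small gap between the computed conservative value and the reported $1.78 presumably reflects rounding of the published inputs.

```python
def value_per_visit(premium_per_dollar: float, visits_per_dollar: float) -> float:
    """Convert a behavior change premium per $1 of incentive into a value per visit.

    premium_per_dollar: premium (in $) per $1 increase in the piece-rate.
    visits_per_dollar: expected extra visits per $1 increase in the piece-rate.
    """
    return premium_per_dollar / visits_per_dollar


# Rounded inputs taken from the text of Section 5.3.
VISITS_PER_DOLLAR = 0.67      # expected extra visits per $1 of piece-rate
CONSERVATIVE_PREMIUM = 1.20   # excludes the $0 -> $1 step
FULL_PREMIUM = 2.01           # includes the $0 -> $1 step

conservative = value_per_visit(CONSERVATIVE_PREMIUM, VISITS_PER_DOLLAR)
less_conservative = value_per_visit(FULL_PREMIUM, VISITS_PER_DOLLAR)

print(f"conservative:      ${conservative:.2f} per visit")
print(f"less conservative: ${less_conservative:.2f} per visit")
```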
In columns 2 and 3 we examine the association between the behavior change premium and two proxies for awareness of present focus. In column 2 we study a standardized measure of the gap between goal and forecasted attendance as a covariate. We find that a one standard deviation increase in the gap between stated goal and expected visits is associated with a $0.71 increase in the behavior change premium, compared to an overall mean of $1.17. In column 3 we study the standardized difference between participants’ actual attendance under the incentive they were randomly assigned and their expected attendance under that incentive. This difference is negative on average, reflecting participants’ over-optimism. We find that a one standard deviation decrease in the gap between expected and actual attendance corresponds to a $0.45 increase in the behavior change premium.28 In Appendix C.3 we present a regression of the behavior change premium on people’s expected change in behavior. Consistent with Proposition 1, we find that it is strongly related to the expected change in attendance. Moreover, when excluding the $1/visit incentive, the constant term in column 1 of Table A3 implies that the behavior change premium is indistinguishable from zero for individuals who expect no change in behavior.

27 For example, we observe that the estimated value of behavior change is negative for 33 percent of observations. If we took those negative measures at face value, it would imply that participants have a desire to reduce their gym use at some incentive level 33 percent of the time. However, these negative values more likely represent valuation errors in participants’ decisions about willingness to pay and/or their estimates of visit rates.
In summary, we find that the behavior change premium is significant (though modest) even in the information control group, is significantly affected by the enhanced information treatment, varies strongly with proxies for sophistication, varies strongly with individuals’ subjective beliefs about behavior change, and is approximately zero for individuals not expecting behavior change.

6 Take-up of commitment contracts

6.1 Take-up of “more” commitment contracts

Participants in our study had high take-up of commitment contracts to visit the gym at least 8, 12, or 16 times. The take-up rates were 64% at the 8 visit threshold, 49% at the 12 visit threshold, and 32% at the 16 visit threshold. These take-up rates are well within the range reported in the literature.29 Consistent with the existing literature, we find that commitment contracts had a substantial effect on behavior. Recall that in wave 3, we randomized some participants into receiving the commitment contracts, and that for most participants this assignment was exogenous to their stated desire to take up the contract. We find that assignment of a “12 or more” visits contract increased attendance by 3.51 visits (p-value < 0.01) for those participants who wanted the contract, and by 4.04 visits (p-value < 0.01) for those who did not. At the same time, and also consistent with prior work, we find that a substantial fraction of participants who took up the contract subsequently failed to reach the target (35%). Our results, like those in prior studies, would typically be interpreted as clear evidence of widespread awareness of present focus. However, we show that such inference may not be warranted without additional tests.

28 Appendix Table A4 shows that the estimates are virtually unchanged when controlling for demographic characteristics.
29 As Table 1 shows, while take-up rates are lower for studies that require participants to put their own money at stake, take-up rates are much higher for studies like ours that feature “house money” or other currency like course grade points. Most similar to our contract options, Schilbach (2019) also offers participants a choice between money for sure versus the same amount of money only if participants stay sober, and finds take-up rates ranging from 31% to 55%.

6.2 Commitment contract take-up is at best weakly related to awareness of present focus

Building on the analysis in Section 5.4, we examine how take-up of “more” commitment contracts is affected by our information treatments, how it is associated with the proxies for sophistication introduced in Section 5.4, and how it is associated with the behavior change premium. Table 4 presents our main results. In column 1, we study the effects of our basic and enhanced information treatments. Consistent with the basic information treatment having no effect on beliefs, we find no effect of the information treatment on commitment contract take-up. On the other hand, we find a significant and negative effect of the enhanced information treatment. Recall that the enhanced information treatment had a significantly positive effect on the behavior change premium, consistent with the treatment increasing awareness of present focus. Thus, its negative effect on commitment contract take-up is consistent with the prediction in Section 2.3 that increasing sophistication can decrease take-up of commitment contracts for more gym attendance. Intuitively, the information treatment reduces our participants’ confidence that they will meet the threshold of the commitment contract. Moreover, we find only a weak association between take-up of commitment contracts and the behavior change premium, as shown in column 2.
A one standard deviation increase in the behavior change premium is associated with around a 3 percentage point increase in the take-up of commitment contracts. We supplement these findings with Appendix Table A5, which examines the association between the behavior change premium and take-up of each type of contract, both in the information treatment and information control group. The table shows that this association is even smaller for the information control group.30 Next, we examine how take-up of “more” commitment contracts correlates with our proxies for sophistication introduced in Section 5.4. Column 3 shows that the gap between goal and expected attendance is positively associated with take-up of commitment contracts. However, in contrast to the relationship with the behavior change premium, the association with commitment contract take-up is relatively small in magnitude: a one standard deviation increase in the gap between goal and expected attendance is associated with a 3.8 percentage point increase in the take-up of commitment contracts, from an average take-up rate of 49 percent. Moreover, and in starker contrast to our results on the behavior change premium, column 4 shows that participants who are more over-optimistic about their gym attendance are actually more likely to take up commitment contracts for higher gym attendance.31

Collectively, these results are consistent with the hypotheses introduced in Sections 2.2.1 and 2.3: commitment contract take-up might not be positively related to perceived or actual present focus, because commitment contracts are most unattractive to those with stronger perceived present focus and/or because their take-up may be influenced by stochastic valuation errors. The next section provides a more direct test of whether stochastic valuation errors are affecting the take-up of commitment contracts.

30 One potential reason for the lack of association between the behavior change premium and commitment contract take-up could be that both measures are noisy and there is attenuation bias in the relationship. However, the analysis in Table 3 showed very strong associations between the behavior change premium and our proxy for sophistication, suggesting the measure is not so noisy as to attenuate all relationships. Moreover, the average pairwise correlation of the individual-level behavior change premium at different incentive levels is 0.17 (bootstrapped cluster-robust s.e. 0.06) and the average pairwise correlation of demand for the different “more” contracts is 0.49 (bootstrapped cluster-robust s.e. 0.02).

31 Appendix Table A6 shows that the estimates are virtually unchanged when controlling for demographic characteristics.

6.3 Commitment contract take-up appears to reflect imperfect perception

Table 5 presents our central result about take-up of both “more” commitments and “fewer” commitments at each of the visit thresholds. Column 2 shows that approximately one-third of participants selected the “fewer visits” contracts. Under the standard interpretation of commitment contracts as indicating a desire to influence one’s future behavior, take-up of these “fewer visits” contracts would be interpreted as a reasonably large share of the population having either awareness of future bias or perceiving visits to the gym as having immediate benefits and delayed costs. However, the imperfect perception model in Section 2.3 not only predicts that some participants will select the “fewer visits” contracts, but also makes the stronger prediction that some participants will select both types of contracts at the same threshold. Our within-subject design allows us to examine this prediction. Columns 3 and 4 in the table show the shares of participants selecting each type of contract conditional on selecting the other contract type for each threshold.
Many participants selected both the “more visits” and the “fewer visits” contracts at the same threshold. In particular, among participants who selected “more visits” contracts at each threshold, nearly half also selected the “fewer visits” contract at the same threshold. Choosing both contracts at the same threshold is inconsistent with decisions driven by awareness of present focus, and thus a strong indicator that stochastic valuation errors or perceived social pressure are prevalent in commitment contract take-up. An even stronger prediction of our imperfect perception model is that there will be a positive correlation in the take-up of the two types of contracts. Consistent with this, the last two columns of Table 5 show that participants who chose the “fewer” commitment contracts were significantly more likely to choose the “more” commitment contracts, and vice versa. Appendix Table A7 shows that these patterns are consistent in both our information control group and the group receiving the enhanced information treatment. While these results suggest the presence of stochastic valuation errors (or social pressure effects), they do not imply that all take-up of commitment contracts is explained by these confounds. For example, just over half of the participants who selected “more visits” commitments at each threshold did not select the “fewer visits” contracts and conversely for participants who selected “fewer visits” contracts. These patterns could be consistent with some participants truly wanting to commit to attending the gym more, and some participants wanting to commit to attending the gym less. However, in Appendix Table A8 we investigate the association between the measured behavior change premium and taking up a “more” but not “fewer” contract, and we do not find any positive association.
This suggests that it may not be possible to reliably identify the behavior change premium by simply restricting to individuals who take up “more” contracts but not “fewer” contracts.

6.4 Robustness of results on take-up of “fewer visits” contracts

6.4.1 Participants don’t confuse “fewer visits” for “more visits” contracts

Although the reported patterns of behavior are consistent with the imperfect perception model in Section 2.3, one could argue that an asymmetric error process could make take-up of “fewer visits” contracts noisy while not affecting take-up of “more visits” contracts. For example, people could mistake “fewer visits” contracts for “more visits” contracts. But the fact that some people select “fewer visits” contracts without also selecting “more visits” contracts speaks against this possibility as an explanation for all choices. The experimental instructions made a clear distinction between the two types of contracts, with the differences underlined for emphasis. Moreover, if participants were simply confusing “fewer” contracts for “more” contracts, then any variable that is positively associated with perceived success in or take-up of a “more” contract should also be positively associated with perceived success in or take-up of a “fewer” contract. Table 6 shows that participants differentiated between questions about perceived likelihood of success in a “more” contract versus a “fewer” contract. Participants who expected to attend the gym frequently in the absence of incentives were more likely to believe that they would meet the terms of a “more” contract, and less likely to believe that they would meet the terms of a “fewer” contract. Moreover, the positive and negative coefficients are not identified off of different subgroups: when restricting to the subgroup who both chose “more” and “fewer” contracts, the results are very similar, as shown in column 4.
This implies that at least in answering the forecasting questions, participants were not simply misreading the “fewer” contract to be the “more” contract. In Appendix C.6 we continue with this analysis and present associations of commitment contract take-up with (i) perceived likelihood of success under “more” and “fewer” commitment contracts (Appendix Table A9), (ii) subjective expected attendance in the absence of incentives (Appendix Table A10), (iii) past attendance (Appendix Table A10), and (iv) desired goal attendance (Appendix Table A10). Each of these variables is significantly positively associated with take-up of “more” contracts, and significantly negatively associated with take-up of “fewer” contracts.

6.4.2 Results are not a consequence of disengagement from the study

In Section 4 we summarized results from attention and comprehension checks, which suggest strong engagement and attention. When we exclude the small percentage of participants who failed a comprehension check or attention check or chose a dominated option, overall demand for the “fewer” contracts falls from 31% to 30%, and this exclusion has no effect on demand for the “more” contracts. While these proxies cannot be guaranteed to identify all individuals who disengaged or misunderstood some portion of the study, the lack of association between the proxies and demand for commitment contracts implies that disengagement or misunderstanding is unlikely to drive our results.

6.4.3 Results are not driven by participants for whom the contracts are not binding

Because our commitment contract offers are only weakly financially dominated, some of the take-up may be driven by individuals for whom the contracts are not really binding.
For example, individuals who choose the 11 or fewer visits contract could be individuals who would already attend the gym 11 or fewer times in the absence of any discouragement.32 In our data, it does not appear that much of the take-up is driven by individuals for whom the contracts would be inconsequential. As shown in Appendix C.7, individuals whose expected attendance exceeds the “fewer” threshold by 2 or 4 visits are nearly as likely to select the “fewer visits” contracts as the full sample. The same pattern holds for the “more visits” contracts. Perhaps most importantly, the positive association between take-up of “more” and “fewer” contracts remains unchanged when restricting to a subset of participants for whom either the “more” or the “fewer” contract would be at least moderately binding (Appendix Table A12).

6.5 Summary of reduced-form results

Sections 5 and 6 establish the following set of reduced-form results. First, participants in our study perceive themselves to be time-inconsistent. Second, participants appear to be only partially aware of their time inconsistency, as they overestimate their future gym attendance. Third, awareness of time inconsistency appears to be malleable, as our information treatment significantly increased the average behavior change premium. Fourth, take-up of commitment contracts is not strongly related to perceived present focus and appears to be influenced by stochastic valuation errors. This suggests that commitment contracts are unlikely to be a well-targeted tool for addressing time inconsistency in this setting, which we examine formally in the next section using a structural model of quasi-hyperbolic discounting.

7 Structural estimates and welfare implications

7.1 Summary of methodology

We estimate the model of present focus introduced in Section 2.1 using data on forecasted and actual attendance and the WTP for the piece-rate incentives.
We estimate the model both by pooling over the full population and separately for various subsamples to incorporate heterogeneity. For simplicity, we assume that once people have financial incentives in place, their daily gym attendance decisions are not biased by stochastic valuation errors, although our welfare results do incorporate people’s possible errors in contract take-up decisions. We discuss this assumption in Section 7.3.1.

32 Such patterns of choice appear to be prevalent in some studies, such as Augenblick, Niederle, and Sprenger (2015), who find that demand for choice-set restrictions decreases substantially when a small price is introduced. However, other studies, such as Schilbach (2019), find less evidence for this.

We assume that each day corresponds to a period, and we thus set T = 28 to correspond to the four-week study period. We assume attendance costs in each period are distributed independently and identically according to the exponential distribution with rate parameter λ. This assumption implies that the net immediate costs of attending the gym (taking into account the hassle costs of getting to the gym, but also possible gratification from entertainment or endorphins) are always non-negative. The free parameters in our model are the perceived and actual present focus parameters β̃ and β, the (perceived) delayed health benefits b, and the rate parameter λ. The parametric assumptions imply that actual and forecasted average attendance at per-attendance incentive p are given by \(28 \cdot \left(1 - e^{-\lambda\beta(b+p)}\right)\) and \(28 \cdot \left(1 - e^{-\lambda\tilde{\beta}(b+p)}\right)\), respectively. We note that people’s behavior is determined by their perceptions of the per-attendance health benefits, not the actual health benefits. If the two are different, our methodology identifies the perceived health benefits, and our welfare results overestimate (underestimate) the benefits of increasing attendance if people overestimate (underestimate) the true health benefits.
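The closed-form attendance curves are easy to explore numerically. The following Python sketch is an illustration only and is not the GMM procedure of Appendix D.1: the parameter values are hypothetical, and the helper names `expected_visits` and `linearized` are ours. It also shows that \(-\ln(1 - \text{visits}/28)\) is linear in p under the model, so two points on the actual attendance curve are enough to recover \(\lambda\beta\) and b.

```python
import math

T = 28  # days in the four-week study period (from the text)


def expected_visits(beta: float, b: float, lam: float, p: float, T: int = T) -> float:
    """Expected visits over T days with i.i.d. exponential(lam) daily costs.

    On each day the person attends iff cost <= beta * (b + p), which
    happens with probability 1 - exp(-lam * beta * (b + p)).
    """
    return T * (1.0 - math.exp(-lam * beta * (b + p)))


def linearized(visits: float, T: int = T) -> float:
    """Return -ln(1 - visits/T), equal to lam * beta * (b + p) under the model."""
    return -math.log(1.0 - visits / T)


# Hypothetical parameter values, chosen only for illustration.
beta, beta_tilde, b, lam = 0.5, 0.8, 10.0, 0.1

incentives = (0.0, 2.0)
actual = {p: expected_visits(beta, b, lam, p) for p in incentives}
forecast = {p: expected_visits(beta_tilde, b, lam, p) for p in incentives}

# Two points on the linearized actual curve pin down its slope (lam * beta)
# and its zero crossing at p = -b.
slope_actual = (linearized(actual[2.0]) - linearized(actual[0.0])) / 2.0
b_recovered = linearized(actual[0.0]) / slope_actual

# The ratio of linearized slopes equals beta_tilde / beta.
slope_forecast = (linearized(forecast[2.0]) - linearized(forecast[0.0])) / 2.0
slope_ratio = slope_forecast / slope_actual

for p in incentives:
    print(f"p=${p:.0f}: actual {actual[p]:.2f} visits, forecast {forecast[p]:.2f} visits")
print(f"recovered b = {b_recovered:.2f}, beta_tilde/beta = {slope_ratio:.2f}")
```

With these hypothetical inputs the forecasted curve lies above the actual curve at every incentive level because β̃ > β, and both linearized curves cross zero at p = −b.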
Because we have rich information about the forecasted and actual attendance curves and the behavior change premium, and because these objects are functions of only four parameters (β, β̃, b, λ), identification of our parametric model follows straightforwardly from the logic introduced in Section 2.2. Roughly speaking, the projected intersection of the forecasted and actual attendance curves identifies b, the behavior change premium identifies β̃, the difference between forecasted and actual attendance identifies β̃−β, and the slopes of the forecasted and actual attendance curves identify λ. In sum, we have four parameters, and we have five sets of moments identifying them: the average behavior change premium, the intercepts of the forecasted and actual attendance curves, and the slopes of the forecasted and actual attendance curves. Formally, we estimate the parameters using generalized method of moments (GMM), with the moment equations and the estimation procedure detailed in Appendix D.1. Since the forecasted attendance curve and the behavior change premium utilize multiple observations per person, we cluster all standard errors at the subject level. In Appendix D.2 we show that, to a first order, our parameter estimates can be regarded as estimates of population averages, under the assumption that the health benefits b and the cost parameter λ are independent of each other, and independent of actual and perceived present focus parameters β and β̃. We provide evidence for these assumptions in the results we summarize below. Appendix D.3 presents the derivations for how present-focused individuals behave in the presence of commitment contracts, and how commitment contracts affect their period 0 surplus.
The threshold incentives of the commitment contracts generate payoffs that are non-separable over time, and we solve for individuals’ equilibrium strategies by backwards induction, formalized as the Perception Perfect Equilibrium by O’Donoghue and Rabin (2001). Given an incentive scheme, a person’s perceived and actual expected utility of starting out in period t with \(h_t\) prior attendances can be computed recursively. These value functions allow us to conduct welfare analyses and to obtain analytic solutions for a person’s strategy in each period t. Our welfare analyses take the long-run preferences of present-focused individuals as the normative criterion, which is a common but not uncontroversial assumption (Bernheim and Rangel, 2009; Bernheim, 2016; Bernheim and Taubinsky, 2018).

7.2 Parameter estimates and out-of-sample validation

Table 7 presents our parameter estimates. Column 1 presents our estimate of the (average) present focus parameter β, column 2 presents our estimate of the (average) perceived present focus parameter β̃, column 3 presents our estimate of the (average) perceived health benefits b, and column 4 presents our estimate of the average attendance cost c. Column 5 presents our estimate of the average internality (1−β)b, which is the wedge between forecasted and desired attendance, in units of marginal utility. Column 6 presents a measure, introduced by Augenblick and Rabin (2019), of the degree to which people are aware of their present focus: (1− β̃)/(1− β). Row 1 presents our estimates for all participants in the study. We estimate actual and perceived present focus parameters \(\hat{\beta} = 0.55\) and \(\hat{\tilde{\beta}} = 0.84\), respectively, and health benefits b̂ = $9.66 per attendance. Our estimates of (β, β̃) are approximately in the middle of the range of estimates from studies estimating both parameters: (0.31, 0.73) in Mahajan, Michel, and Tarozzi (2020), (0.37, 0.8) in Bai et al.
(Forthcoming), (0.67, 0.85) in Chaloupka, Levy, and White (2019), (0.74, 0.77) in Allcott et al. (Forthcoming), and (0.85, 1) in Augenblick and Rabin (2019). As reviewed in Appendix D.9, our estimate b̂ of (perceived) health benefits is close to the middle of the range of public health estimates. Rows 2 and 3 present parameter estimates for participants in the information control group and participants who received the enhanced information treatment. Consistent with our interpretation that the information treatment affects awareness of present focus, the two rows show a significant difference in the estimated \(\hat{\tilde{\beta}}\), but essentially identical estimates β̂ and b̂. The remarkable similarity of the β̂ and b̂ estimates across the two rows would be a highly unlikely coincidence if our model were misspecified (e.g., if overestimation of future attendance was due to underestimation of future cost shocks or aspirational reporting of beliefs, but we incorrectly modeled the gap between reported beliefs and behavior as due solely to naivete about present focus). If this were the case, the information treatment would not change the behavior change premium, or at least not in a way that aligns perfectly with its effects on overestimation of attendance. Thus, the reduced gap between forecasted and actual attendance would be interpreted as the information treatment increasing β and/or decreasing b, which would lead the estimates β̂, b̂ to be significantly impacted by the information treatment. Rows 4 and 5 explore heterogeneity by gym attendance over the past four weeks. Past attendance is highly predictive of future attendance, suggesting that there are stable “attendance types”: the regression coefficient from a regression of realized attendance on past attendance is 0.685 (robust s.e.
0.028).33 Consistent with economic intuition, lower attendance is associated with lower β̂ and b̂ estimates. On the other hand, we find that \((1 - \hat{\tilde{\beta}})/(1 - \hat{\beta})\) is remarkably stable across the two attendance groups. In rows 6-8, we estimate the model for the subsamples of participants who indicated that they wanted the 8+, 12+, and 16+ contracts, respectively; we present estimates for those rejecting the contracts in Table A13 in Appendix D.4. Consistent with our reduced-form results, we find slightly lower estimates of β and β̃ for individuals taking up the “more” contracts, but the differences are economically small. We find no evidence that commitment contracts are chosen by those with particularly high perceived or actual self-control problems, or those with particularly high internalities (1− β)b. Row 9 explores the potential bias that might result from ignoring heterogeneity. We assume that there are eight types of individuals corresponding to eight subgroups: below- or above-median past attendance, crossed with receiving either the enhanced information treatment or no information treatment, crossed with willingness to take up the 12+ commitment contract.34 We exclude individuals who received the ineffective information treatment in wave 1, although treating these individuals as being in the information control group leads to essentially identical results. We estimate the parameters separately for these eight groups, and then report the average, with each group weighted in proportion to its size. As rows 2-5 show, there is significant heterogeneity along these dimensions. However, the estimates in row 9 show that averaging over these eight subgroups produces essentially the same estimates as in row 1. Of course, there is likely additional heterogeneity not captured by the subsample splits in row 9, but the exercise illustrates the econometric result from Appendix D.2 that our estimates can be regarded as sample averages. Figure A4 in Appendix D.4 shows a tight in-sample fit of our model to the actual and forecasted attendance curves. Panel (a) uses the representative agent specification from row 1 of Table 7, while panel (b) allows for eight different types as in row 9 of Table 7. The fact that the in-sample fit is nearly identical in both panels is consistent with the Appendix D.2 result that our parameter estimates can be regarded as sample averages. Table A14 in Appendix D.4 shows that our estimates are virtually unchanged when excluding subjects flagged for potential confusion.35

33 The fact that weekly attendance is predictable and fairly stable might suggest that this is an environment conducive to learning. The fact that individuals overestimate their attendance in this fairly stable environment might be consistent with imperfect memory and/or low perceived benefits of having well-calibrated expectations.

34 We focus on the 12+ commitment contract since the other contracts were offered only in the first two waves.

35 Specifically, we exclude the 8.4% of subjects who either failed the attention check, the slider comprehension check, or preferred $0 to a larger fixed or contingent payment.

7.2.1 Out-of-sample validation tests

Recall that in wave 3, we elicited preferences for commitment from all participants, but only a subset of participants were randomized to actually receive the 12+ contract. Row 1 of Table 8 reports our empirical estimates of how the 12+ commitment contract affects the behavior of those who want it. Column 1 reports the change in average attendance, column 2 reports the likelihood of attending 12 or more times with the contract, and column 3 reports the likelihood of attending 12 or more times without the contract. Column 4 reports the difference between columns 3 and 2: the impact of the commitment contract on the likelihood of attending 12 or more times.
Rows 2-5 report our model’s predictions under different assumptions about heterogeneity, still restricting to those individuals who chose to take up the contract offer. Row 2 assumes homogeneity conditional on taking up the 12+ contract, which is analogous to the specification in row 7 of Table 7. Row 3 allows for more heterogeneous parameters, allowing them to vary by the attendance and information subgroups considered in row 9 of Table 7. Rows 4-6 consider robustness to alternative heterogeneity assumptions: in particular, heterogeneity by median past attendance only, by quartile of past attendance only, or by quartile of past attendance crossed with receipt of the enhanced information treatment. Table 8 shows that while all specifications accurately predict the impact on average attendance, more realistic heterogeneity assumptions are required to match the impact of the 12+ commitment contract on the likelihood of attending the gym 12 or more times. When individuals are assumed to be homogeneous, the model counterfactually predicts that individuals who take up the contract almost always meet its 12-visit threshold but that they rarely do so in the absence of the contract. Allowing for heterogeneity substantially changes the predictions, because individuals with high β and b are likely to attend the gym 12 or more times both with and without the commitment contract, while individuals with low β and b are unlikely to attend the gym 12 or more times both with and without the commitment contract. As illustrated by the similar predictions of rows 4-6, the exact modeling of heterogeneity is largely inconsequential, as long as the model allows for both “low”- and “high”-attendance types.

7.3 Welfare effects of offering commitment contracts

Table 9 presents our welfare estimates for different types of incentive schemes. We conduct these calculations under the assumption of eight heterogeneous types, as in row 9 of Table 7.
The welfare results are similar for other assumptions about heterogeneity, and are reported in Appendix D.6. The results for the 8+ and 16+ contracts, which were offered only in waves 1 and 2, are also very similar, and reported in Appendix D.5. Column 1 of Table 9 reports the predicted impact on average gym attendance. Column 2 reports the average impact on individuals’ long-run utility. Column 3 reports the average impact on health benefits.36 Column 4 reports the average increase in attendance costs that results from an increase in attendance. Any incentive scheme that increases the likelihood of attendance each day must mechanically increase the incurred attendance costs. Column 5 reports the difference between columns 3 and 4. The number reported in column 5 is the social surplus from an incentive scheme, and corresponds to a standard utilitarian welfare criterion, such as the one used in Gruber and Kőszegi (2001) or O’Donoghue and Rabin (2006). The difference between individual surplus (column 2) and social surplus (column 5) is due to how the individuals’ financial outcomes are treated: the former treats penalty payments as a “loss” to individuals, while the latter assumes that these payments are “recycled” back to society.37 36Specifically, if ∆k is the average impact on∑attendance of type k individuals who have delayed health benefits bk, then the average impact on health benefits is k µk∆kbk where µk is the fraction of type k individuals. 37Here we make the implicit assumption that the marginal cost to the gym of an additional attendance is negligible. 28 Row 1 presents the estimated surplus of offering a commitment contract for 12 or more gym attendances. Offering this commitment contract lowers individuals’ private surplus, as shown in column 2. Individuals who take up this contract incur a surplus loss of −$18.69 per person. 
Averaging over all participants (not just those who take up the contract), this implies that offering this contract lowers overall consumer surplus by an average of $9.23 per person.

Although individuals are made worse off by taking up the contract, the increased gym attendance generated by this contract (2.47 visits for those who take it up, 1.22 visits averaged over all participants) increases social efficiency. However, the 12+ contract is not the most efficient means of generating the average increase of 1.22 visits. As reported in row 2, a gym attendance subsidy of $1.90 per attendance generates the same change in average attendance, but in a more socially efficient manner. This subsidy generates both a larger increase in health benefits and a smaller increase in attendance costs, leading to a net social surplus gain of $4.39 per person.[38] The fact that this subsidy generates higher surplus to individuals is mechanical and not economically interesting.

The results are similar for the 8+ and 16+ contracts, as reported in Appendix D.5. Both contracts lower individuals’ private surplus, and both generate positive but small increases in social efficiency. In both cases, linear attendance subsidies that generate the same average increase in attendance are far more socially efficient.

Row 3 considers the per-attendance subsidy that maximizes social surplus, which approximately equals the average value of \((1-\beta_i)b_i/\beta_i\). We calculate this subsidy to be $7.54 per attendance, and we find that the subsidy increases social surplus by $9.36 per person. We do not compare to the “optimal” commitment contract because theory does not provide clear guidance about what this would be, particularly in light of our findings about stochastic valuation errors. By contrast, the optimal subsidy is straightforward to calculate and implement, and is estimated to yield large social gains.
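The approximation for the surplus-maximizing subsidy can be illustrated with a short calculation. Assuming the attendance rule implied by the model (a person attends when β(b + p) is at least the cost draw c) and the long-run criterion (attend when b is at least c), equating the two cutoffs gives a type-specific subsidy of (1 − β)b/β. The Python sketch below averages that quantity over types. The type shares and the β and b values are hypothetical placeholders, not the paper's estimates, and the names `AgentType`, `type_subsidy`, and `average_subsidy` are introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AgentType:
    share: float  # mu_k: population share of this type (hypothetical value)
    beta: float   # present-focus parameter beta_k (hypothetical value)
    b: float      # delayed health benefit per visit in dollars (hypothetical)


def type_subsidy(t: AgentType) -> float:
    """Subsidy p solving beta * (b + p) = b, i.e. p = (1 - beta) * b / beta."""
    return (1.0 - t.beta) * t.b / t.beta


def average_subsidy(types: Sequence[AgentType]) -> float:
    """Share-weighted average of (1 - beta_k) * b_k / beta_k across types."""
    total_share = sum(t.share for t in types)
    return sum(t.share * type_subsidy(t) for t in types) / total_share


# Two illustrative types with equal shares and the same health benefit.
types = [AgentType(share=0.5, beta=0.5, b=8.0),
         AgentType(share=0.5, beta=0.8, b=8.0)]

for t in types:
    print(f"beta={t.beta}: type-specific subsidy = {type_subsidy(t):.2f}")
print(f"average subsidy = {average_subsidy(types):.2f}")
```

In this toy example the more present-focused type calls for a much larger corrective subsidy, so a single uniform subsidy set at the average over-corrects one type and under-corrects the other, which is one reason the average is only an approximation to the optimum.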
The optimal subsidy calculation illustrates the potential benefits of using structural estimates to inform the design of simple incentive schemes.

Linear incentives are estimated to be more socially efficient than commitment contracts for two basic reasons. First, although commitment contracts are not more likely to be taken up by those with the largest internalities \((1-\beta_i)b_i\), they nevertheless change behavior unevenly across people. Mechanically, only those who take up the contracts increase their attendance. However, the efficiency gains from behavior change are concave: it is more efficient to increase everyone’s attendance by 1.5 visits than to increase half of the population’s attendance by 3.0 visits, if that half of the population does not differ from the broader population.[39]

Second, commitment contracts change behavior unevenly across time. By definition, a linear attendance subsidy increases a person’s motivation to attend the gym by the same degree each day. Commitment contracts, however, introduce time-varying incentives because financial rewards are discontinuous at the threshold.[40] The incentives to attend the gym are relatively small at the beginning, when there are many remaining opportunities for meeting the threshold. Moreover, present-focused individuals will “procrastinate” on fulfilling the threshold requirement. As shown in Figure A5 in Appendix D.7, our structural model predicts that on average, commitment contracts will have a limited effect on behavior at the beginning of the four-week period and a large effect on behavior at the end of the four-week period. Appendix Figure A6 shows that this prediction is borne out in the data: the 12+ commitment contract has a larger effect on people’s behavior at the end of the four-week period. For reasons summarized above, this unequal distribution of treatment effects over time is less efficient than the constant effects of linear attendance subsidies.[41]

[37, continued] If the gym incurs non-negligible costs from additional attendances, the social efficiency criterion in column 5 would need to be modified to include those costs as well. The column 5 measure also corresponds to a consumer surplus metric when providers fund the subsidies through lump-sum taxes or fees and return commitment contract penalties through lump-sum rebates. For example, employers might provide gym attendance subsidies at the ultimate expense of less generous bonuses or other benefits, such that on net, the subsidies only change behavior and do not create a financial transfer between employees and employers. In principle, there may be cases where provider revenue is weighted more heavily than consumer incomes. Such cases push against subsidies and toward commitment contracts. However, such cases also push most strongly toward Pigovian taxes. For example, “sin taxes” would compare particularly favorably to commitment contracts for reducing consumption of sugary drinks. Thus, a high marginal value of provider funds does not mechanically favor using commitment contracts as a policy tool.
[38] Additionally, column 3 of Table 9 reveals that a linear attendance subsidy not only minimizes costs, but is also more targeted to people with the highest estimates of health benefits \(b_i\). This is not a general property of subsidies, and is not true for the 16+ contract, as shown in Appendix Table A15.

7.3.1 Further robustness considerations

Alternative assumptions about the cost distribution. We have assumed that the smallest value of a cost draw c is zero and we consider robustness to this assumption in Appendix D.8. As Appendix D.8 shows, our conclusions about individual and social surplus are largely the same under alternative assumptions: commitment contracts on net harm those who take them up, and linear incentives are a more efficient means of changing behavior.
The parameter estimates naturally change, but in a manner that worsens both the in-sample and out-of-sample fit of the model. Because our data on perceived and actual attendance are sufficiently rich, and the curves themselves exhibit only modest curvature, how we “connect the dots” via parametric assumptions does not have a big impact on our key structural estimates. To illustrate, when we re-estimate row 1 of Table 7 with a quadratic approximation to the cumulative distribution function of cost draws,[42] we obtain very similar estimates of perceived and actual present focus that are within the confidence bands of our reported estimates: \(\hat{\tilde{\beta}} = 0.82\) and \(\hat{\beta} = 0.51\).

[39] The intuition is simply that if \(c_i^*\) is the marginal cost draw at which a person is indifferent between attending the gym or not, then a marginal change in this person’s motivation to attend the gym generates social benefits of \(b_i - c_i^*\). Thus, the more motivated a person is to attend in the first place, the higher is \(c_i^*\), and thus the lower are the social benefits of providing this person with additional motivation to exercise.
[40] A similar argument would apply to financial rewards that are kinked at the threshold, as in, e.g., Kaur, Kremer, and Mullainathan (2015).
[41] Both of these principles apply to non-stationary cost distributions, including situations where costs might decrease or increase over time. More generally, it is most efficient for incentives for behavior change to be distributed evenly.
[42] That is, perceived and actual attendance are modeled, respectively, as \(\tilde{\alpha}(p) = 28\left[\lambda_1 \tilde{\beta}(b+p) - \lambda_2 \left(\tilde{\beta}(b+p)\right)^2\right]\) and \(\alpha(p) = 28\left[\lambda_1 \beta(b+p) - \lambda_2 \left(\beta(b+p)\right)^2\right]\).

Imperfect perception of incentives on the “intensive” margin. Although we have allowed for stochastic valuation errors in people’s choice of incentives, we have assumed that stochastic valuation errors are not present in people’s daily gym attendance decisions once the chosen incentives are instituted.
This does not exclude the possibility that people’s perceptions of the health benefits of exercise are incorrect; we exclude only the possibility that these perceptions fluctuate over the time frame of our experiment. This assumption seems plausible for at least the linear piece-rate incentives, where a person’s daily attendance decision involves comparing the costs c to the benefits b + p for a single day, and does not involve complex aggregation over a longer horizon beyond formulation of beliefs about b. This assumption is also consistent with our model’s tight fit to various moments of the data. For example, the stability of our estimates of b and β in rows 2 and 3 of Table 7, or the out-of-sample validation in Table 8, would be less likely in a misspecified model. At the same time, this assumption may be less realistic for the dynamic incentives generated by the threshold incentives of commitment contracts, since reacting to these incentives requires people to solve the dynamic programming problem detailed in Appendix D.3. If this complexity injects noise into people’s decisions about gym attendance, it would strengthen our qualitative results about commitment contracts’ negative effects on consumer surplus, and the greater social efficiency of simple linear subsidies.

8 Concluding remarks and implications for future work

Who chooses commitment contracts? The typical revealed preference logic in the literature has been that people are revealing a desire to change their future selves’ behavior when they agree to penalties with no financial upside. Our results show that take-up of commitment contracts is not strongly related to perceived present focus, appears to be influenced by stochastic valuation errors, and reduces welfare.

Better understanding how present-focused individuals make choices between various incentives, including commitment contracts, informs both positive and normative analysis.
In addition to producing new estimates of present focus and new evidence about who takes up commitment contracts, the insights from this study can help inform policy design aimed at counteracting limited self-control. For example, while economists have long studied “sin taxes” (e.g., O’Donoghue and Rabin, 2006; Allcott, Lockwood, and Taubinsky, 2019), there is little work on when the optimal policy mix should involve such taxes instead of offering commitment contracts, or when the two tools are complementary. One intuition is that because taxes and subsidies are blunt policy tools that affect everyone, policy instruments that don’t restrict choices, such as offers of commitment contracts, are better targeted. However, our results about the disappointing welfare effects of our commitment contracts illustrate how a combination of naiveté and other types of mistakes can make freedom-preserving policies particularly poorly targeted, and consequently less socially efficient than the standard economic tools of taxation.

Our results come with caveats and leave open many questions. First, given the potential for measurement error, it may not be surprising that different estimates of time inconsistency have low association with each other. Thus, commitment contract take-up may be useful as one imperfect measure of awareness of time inconsistency, even if measurement error creates a bias for binary outcomes like take-up of commitment contracts. In our setting, both the experimental evidence and structural estimates suggest that this is an upward bias: the commitment contracts should have been unattractive to many of those who were fully sophisticated about their time inconsistency. Continuous measures, such as the behavior change premium approach in this paper, make it possible to study awareness of time inconsistency using population averages that are more robust to noisy valuations and measurement error.
But that does not imply that there is no additional information about time inconsistency in the take-up of commitment contracts.

Second, our analyses focus on a particular set of commitment contracts and incentive schemes; it will be important for future work to apply our methodology to evaluate other types of commitment contracts and incentive schemes. Although our results illustrate that high take-up and high treatment effects on behavior do not by themselves imply that commitment contracts are welfare-enhancing, our results do not preclude the possibility that commitment contracts different from ours may be more beneficial.

Third, it is natural to expect that in the presence of noisy valuation and other frictions such as perceived social pressure, stakes will matter. Although our $80 stakes were not low relative to many other commitment contract experiments, settings like those of Ashraf, Karlan, and Yin (2006), Kaur, Kremer, and Mullainathan (2015), and Schilbach (2019) feature larger stakes. Although the participants in those studies are likely to be less numerate than the participants in our study, and thus presumably more susceptible to valuation errors, it is possible that the larger stakes in those studies lead to less noise than what we observe. Analyzing the impact of stakes, holding the sample constant, is another important question for future research.

Fourth, our estimates are local to the participants of our fitness center. Even within the exercise domain, it will be valuable to apply our methodology to other populations. More broadly, it will be valuable to extend our methods to other domains of behavior, such as food choice, education, and saving and borrowing decisions. For example, Allcott et al. (Forthcoming) extend our method for estimating present focus parameters to consumer lending markets, though they do not examine offers of commitment contracts.
Fifth, although we theoretically clarify the important role that uncertainty about future costs plays in commitment contract demand, we do not explore it empirically. Yet, results from settings with naturally occurring differences in uncertainty, like Kaur, Kremer, and Mullainathan (2015), are clearly in line with our theoretical results. Future work should home in on this comparative static.

Sixth, our analyses assume the long-run criterion is the normative standard, which has been challenged by Bernheim and Rangel (2009) and others. Exploring welfare implications under alternative criteria could be fruitful.

While there is a clear need for further testing, refining, and critiquing of our approach, our results illustrate the value of theoretically grounded quantitative methods such as ours in helping improve incentive design for people with limited self-control.

References

Acland, Dan, and Vinci Chow. 2018. “Self-Control and Demand for Commitment in Online Game Playing: Evidence from a Field Experiment.” Journal of the Economic Science Association 4 (1): 46–62.
Acland, Dan, and Matthew R. Levy. 2012. “Naiveté, Projection Bias, and Habit Formation in Gym Attendance.” Working Paper: GSPP13-002.
Acland, Dan, and Matthew R. Levy. 2015. “Naiveté, Projection Bias, and Habit Formation in Gym Attendance.” Management Science 61 (1): 146–160.
Afzal, Uzma, Giovanna D’Adda, Marcel Fafchamps, Simon R. Quinn, and Farah Said. 2019. “Implicit and Explicit Commitment in Credit and Saving Contracts: A Field Experiment.” NBER Working Paper 25802.
Aigner, Dennis J. 1973. “Regression with a Binary Independent Variable Subject to Errors of Observation.” Journal of Econometrics 1: 49–60.
Alan, Sule, and Seda Ertac. 2015. “Patience, self-control and the demand for commitment: Evidence from a large-scale field experiment.” Journal of Economic Behavior and Organization 115: 111–122.
Allcott, Hunt, Joshua Kim, Dmitry Taubinsky, and Jonathan Zinman. Forthcoming.
“Are High-Interest Loans Predatory? Theory and Evidence from Payday Lending.” Review of Economic Studies.
Allcott, Hunt, Benjamin B. Lockwood, and Dmitry Taubinsky. 2019. “Regressive Sin Taxes, with an Application to the Optimal Soda Tax.” Quarterly Journal of Economics 134 (3): 1557–1626.
Ariely, Dan, and Klaus Wertenbroch. 2002. “Procrastination, Deadlines, and Performance: Self-Control by Precommitment.” Psychological Science 13 (3): 219–224.
Ashraf, Nava, Dean Karlan, and Wesley Yin. 2006. “Tying Odysseus to the Mast: Evidence From a Commitment Savings Product in the Philippines.” The Quarterly Journal of Economics 121 (2): 635–672.
Augenblick, Ned, Muriel Niederle, and Charles Sprenger. 2015. “Working Over Time: Dynamic Inconsistency in Real Effort Tasks.” The Quarterly Journal of Economics 130 (3): 1067–1115.
Augenblick, Ned, and Matthew Rabin. 2019. “An Experiment on Time Preference and Misprediction in Unpleasant Tasks.” The Review of Economic Studies 86 (3): 941–975.
Avery, Mallory, Osea Giuntella, and Peiran Jiao. 2019. “Why Don’t We Sleep Enough? A Field Experiment among College Students.” IZA Discussion Paper, No. 12772.
Bai, Liang, Benjamin Handel, Ted Miguel, and Gautam Rao. Forthcoming. “Self-Control and Demand for Preventive Health: Evidence from Hypertension in India.” Review of Economics and Statistics.
Bernheim, B. Douglas. 2016. “The Good, the Bad, and the Ugly: A Unified Approach to Behavioral Welfare Economics.” Journal of Benefit-Cost Analysis 7 (1): 12–68.
Bernheim, B. Douglas, and Antonio Rangel. 2009. “Beyond Revealed Preference: Choice-Theoretic Foundations for Behavioral Welfare Economics.” Quarterly Journal of Economics 124 (1): 51–104.
Bernheim, B. Douglas, and Dmitry Taubinsky. 2018. “Behavioral Public Economics.” In The Handbook of Behavioral Economics, edited by Bernheim, B. Douglas, Stefano DellaVigna, and David Laibson, Volume 1. New York: Elsevier.
Beshears, John, James J. Choi, Christopher Harris, David Laibson, Brigitte C.
Madrian, and Jung Sakong. 2020. “Which Early Withdrawal Penalty Attracts the Most Deposits to a Commitment Savings Account?” Journal of Public Economics 183: Article 104144.
Bhattacharya, Jay, Alan M. Garber, and Jeremy D. Goldhaber-Fiebert. 2015. “Nudges in Exercise Commitment Contracts: A Randomized Trial.” NBER Working Paper 21406.
Bisin, Alberto, and Kyle Hyndman. 2020. “Present-Bias, Procrastination and Deadlines in a Field Experiment.” Games and Economic Behavior 119: 339–357.
Blair, Steven N., Harold W. Kohl, Ralph S. Paffenbarger, Debra G. Clark, Kenneth H. Cooper, and Larry W. Gibbons. 1989. “Physical Fitness and All-Cause Mortality: A Prospective Study of Healthy Men and Women.” Journal of the American Medical Association 262 (17): 2395–2401.
Block, H.D., and Jacob Marschak. 1960. “Random Orderings and Stochastic Theories of Response.” In Contributions to Probability and Statistics. Essays in Honor of Harold Hotelling, edited by Olkin, Ingram. Stanford University Press.
Bonein, Aurélie, and Laurent Denant-Boèmont. 2015. “Self-Control, Commitment and Peer Pressure: A Laboratory Experiment.” Experimental Economics 18 (4): 543–568.
Brune, Lasse, Eric Chyn, and Jason T. Kerwin. Forthcoming. “Pay Me Later: A Simple Employer-Based Saving Scheme.” American Economic Review.
Brune, Lasse, Xavier Giné, Jessica Goldberg, and Dean Yang. 2016. “Facilitating Savings for Agriculture: Field Experimental Evidence from Malawi.” Economic Development and Cultural Change 64 (2): 187–220.
Casaburi, Lorenzo, and Rocco Macchiavello. 2019. “Demand and Supply of Infrequent Payments as a Commitment Device: Evidence from Kenya.” American Economic Review 109 (2): 523–55.
Chaloupka, Frank J., Matthew R. Levy, and Justin S. White. 2019. “Estimating Biases in Smoking Cessation: Evidence from a Field Experiment.” NBER Working Paper 26522.
Chow, Vinci. 2011. “Demand for a Commitment Device in Online Gaming.” Unpublished.
DellaVigna, Stefano, John A. List, and Ulrike Malmendier. 2012.
“Testing for Altruism and Social Pressure in Charitable Giving.” Quarterly Journal of Economics 127 (1): 1–56.
DellaVigna, Stefano, and Ulrike Malmendier. 2004. “Contract Design and Self-Control: Theory and Evidence.” The Quarterly Journal of Economics 119 (2): 353–402.
Dupas, Pascaline, and Jonathan Robinson. 2013. “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.” American Economic Review 103 (4): 1138–71.
Echenique, Federico, and Kota Saito. 2019. “General Luce Model.” Economic Theory 68 (4): 811–826.
Ek, Claes, and Margaret Samahita. 2020. “Pessimism and Overcommitment.” Working Paper.
Ericson, Keith M., and David Laibson. 2019. “Intertemporal Choice.” In Handbook of Behavioral Economics, edited by Bernheim, B. Douglas, Stefano DellaVigna, and David Laibson, Volume 2. Elsevier.
Exley, Christine L., and Jeffrey K. Naecker. 2017. “Observability Increases the Demand for Commitment Devices.” Management Science 63 (10): 3262–3267.
Fang, Hanming, and Dan Silverman. 2004. “Time Inconsistency and Welfare Program Participation: Evidence from the NLSY.” July, Cowles Foundation Discussion Paper No. 1465.
Gagnon-Bartsch, Tristan, Matthew Rabin, and Joshua Schwartzstein. 2021. “Channeled Attention and Stable Errors.” Working Paper.
Giné, Xavier, Dean Karlan, and Jonathan Zinman. 2010. “Put Your Money Where Your Butt Is: A Commitment Contract for Smoking Cessation.” American Economic Journal: Applied Economics 2 (4): 213–235.
Gruber, Jonathan, and Botond Kőszegi. 2001. “Is Addiction Rational? Theory and Evidence.” Quarterly Journal of Economics 116 (4): 1261–1305.
Hall, Alistair R. 2005. Generalized Method of Moments. Oxford University Press.
Hanna, Rema, Sendhil Mullainathan, and Joshua Schwartzstein. 2014. “Learning Through Noticing: Theory and Evidence from a Field Experiment.” The Quarterly Journal of Economics 129 (3): 1311–1353.
Hansen, Lars Peter. 1982.
“Large Sample Properties of Generalized Method of Moments Estimators.” Econometrica 50 (4): 1029–1054.
Harberger, Arnold. 1964. “Taxation, Resource Allocation, and Welfare.” In The Role of Direct and Indirect Taxes in the Federal Revenue System, 25–80. Princeton University Press.
Hausman, Jerry. 2001. “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left.” Journal of Economic Perspectives 15 (4): 57–67.
Heidhues, Paul, and Botond Kőszegi. 2009. “Futile Attempts at Self-Control.” Journal of the European Economic Association 7 (2): 423–434.
Houser, Daniel, Daniel Schunk, Joachim Winter, and Erte Xiao. 2018. “Temptation and Commitment in the Laboratory.” Games and Economic Behavior 107: 329–344.
Huffman, David, Collin Raymond, and Julia Shvets. 2020. “Persistent Overconfidence and Biased Memory: Evidence from Managers.” Working Paper.
John, Anett. 2020. “When Commitment Fails: Evidence from a Field Experiment.” Management Science 66 (2): 503–529.
Karlan, Dean, and Leigh L. Linden. 2017. “Loose Knots: Strong Versus Weak Commitments to Save for Education in Uganda.” NBER Working Paper 19863.
Kaur, Supreet, Michael Kremer, and Sendhil Mullainathan. 2015. “Self-Control at Work.” Journal of Political Economy 123 (6): 1227–1277.
Khaw, Mel Win, Ziang Li, and Michael Woodford. 2021. “Cognitive Imprecision and Small-Stakes Risk Aversion.” Review of Economic Studies 88 (4): 1979–2013.
Laibson, David. 1997. “Golden Eggs and Hyperbolic Discounting.” Quarterly Journal of Economics 112 (2): 443–478.
Laibson, David. 2015. “Why Don’t Present-Biased Agents Make Commitments?” American Economic Review 105 (5): 267–272.
Laibson, David, Peter Maxted, Andrea Repetto, and Jeremy Tobacman. 2018. “Estimating Discount Functions with Consumption Choices over the Lifecycle.” Working Paper.
Lusardi, Annamaria, and Olivia S. Mitchell. 2007.
“Baby Boomer Retirement Security: The Roles of Planning, Financial Literacy, and Housing Wealth.” Journal of Monetary Economics 51 (1): 205–224.
Mahajan, Aprajit, Christian Michel, and Alessandro Tarozzi. 2020. “Identification of Time-Inconsistent Models: The Case of Insecticide Treated Nets.” NBER Working Paper 27198.
Martinez, Seung-Keun, Stephan Meier, and Charles Sprenger. 2020. “Procrastination in the Field: Evidence from Tax Filing.” Working Paper.
McKelvey, Richard D., and Thomas R. Palfrey. 1995. “Quantal Response Equilibria for Normal Form Games.” Games and Economic Behavior 10 (1): 6–38.
Milgrom, Paul, and Ilya Segal. 2002. “Envelope Theorems for Arbitrary Choice Sets.” Econometrica 70 (2): 583–601.
Milkman, Katherine L., Julia A. Minson, and Kevin G. M. Volpp. 2014. “Holding the Hunger Games Hostage at the Gym: An Evaluation of Temptation Bundling.” Management Science 60 (2): 283–299.
Natenzon, Paulo. 2019. “Random Choice and Learning.” Journal of Political Economy 127 (1): 419–457.
Neumann, Peter J., Joshua T. Cohen, and Milton C. Weinstein. 2014. “Updating Cost-Effectiveness: The Curious Resilience of the $50,000-per-QALY Threshold.” The New England Journal of Medicine 371 (9): 796–797.
O’Donoghue, Ted, and Matthew Rabin. 1999. “Doing It Now or Later.” American Economic Review 89 (1): 103–124.
O’Donoghue, Ted, and Matthew Rabin. 2001. “Choice and Procrastination.” Quarterly Journal of Economics 116 (1): 121–160.
O’Donoghue, Ted, and Matthew Rabin. 2006. “Optimal Sin Taxes.” Journal of Public Economics 90 (10): 1825–1849.
Oettingen, Gabriele, Heather Barry Kappes, Katie B. Guttenberg, and Peter M. Gollwitzer. 2015. “Self-regulation of Time Management: Mental Contrasting with Implementation Intentions.” European Journal of Social Psychology 45 (2): 218–229.
Paserman, M. Daniele. 2008. “Job Search and Hyperbolic Discounting: Structural Estimation and Policy Evaluation.” The Economic Journal 118 (531): 1418–1452.
de Quidt, Jonathan, Johannes Haushofer, and Christopher Roth. 2018. “Measuring and Bounding Experimenter Demand.” American Economic Review 108 (11): 3266–3302.
Royer, Heather, Mark Stehr, and Justin Sydnor. 2015. “Incentives, Commitments, and Habit Formation in Exercise: Evidence from a Field Experiment with Workers at a Fortune-500 Company.” American Economic Journal: Applied Economics 7 (3): 51–84.
Sadoff, Sally, and Anya Samek. 2019. “Can Interventions Affect Commitment Demand? A Field Experiment on Food Choice.” Journal of Economic Behavior and Organization 158: 90–109.
Sadoff, Sally, Anya Savikhin Samek, and Charles Sprenger. 2019. “Dynamic Inconsistency in Food Choice: Experimental Evidence from a Food Desert.” Review of Economic Studies 1–35.
Schilbach, Frank. 2019. “Alcohol and Self-Control: A Field Experiment in India.” American Economic Review 109 (4): 1290–1322.
Schwartz, Janet, Daniel Mochon, Lauren Wyper, Josiase Maroba, Deepak Patel, and Dan Ariely. 2014. “Healthier by Precommitment.” Psychological Science 25 (2): 538–546.
Schwartzstein, Joshua. 2014. “Selective Attention and Learning.” Journal of the European Economic Association 12 (6): 1423–1452.
Shui, Haiyan, and Lawrence M. Ausubel. 2005. “Time Inconsistency in the Credit Card Market.” Working Paper.
Skiba, Paige Marta, and Jeremy Tobacman. 2018. “Payday Loans, Uncertainty, and Discounting: Explaining Patterns of Borrowing, Repayment, and Default.” Working Paper.
Strotz, R. H. 1955. “Myopia and Inconsistency in Dynamic Utility Maximization.” The Review of Economic Studies 23 (3): 165–180.
Sun, Kai, Jing Song, Larry M. Manheim, Rowland W. Chang, Kent C. Kwoh, Pamela A. Semanik, Charles B. Eaton, and Dorothy D. Dunlop. 2014. “Relationship of Meeting Physical Activity Guidelines with Quality Adjusted Life Years.” Seminars in Arthritis and Rheumatism 44 (3): 264–270.
Toussaert, Séverine. 2018.
“Eliciting Temptation and Self-Control Through Menu Choices: A Lab Experiment.” Econometrica 86 (3): 859–889.
Toussaert, Séverine. 2019. “Revealing Temptation Through Menu Choice: Field Evidence.” Unpublished.
Wei, Xue-Xin, and Alan A. Stocker. 2015. “A Bayesian Observer Model Constrained by Efficient Coding Can Explain Anti-Bayesian Percepts.” Nature Neuroscience 18: 1509–1517.
Woodford, Michael. 2012. “Inattentive Valuation and Reference-Dependent Choice.” Unpublished.
Woodford, Michael. 2019. “Modeling Imprecision in Perception, Valuation and Choice.” Annual Review of Economics 12: 579–601.
Zhang, Qing ⓡ Ben Greiner. 2021. “Time Inconsistency, Sophistication, and Commitment: An Experimental Study.” Economics Letters 203: Article 109982.

Table 1: Summary of commitment contract studies

Type of contract
  Authors (year)                                      Take-up rate  At stake

A. Penalty-based:
  Giné, Karlan, and Zinman (2010)                     11%           own money
  Royer, Stehr, and Sydnor (2015)                     12%           earned money
  Bai et al. (Forthcoming)                            14%           own money
  Bhattacharya, Garber, and Goldhaber-Fiebert (2015)  23%           own money
  John (2020)                                         27%           own money
  Kaur, Kremer, and Mullainathan (2015)               36%           own money
  Schwartz et al. (2014)                              36%           house money
  Bonein and Denant-Boèmont (2015)                    42%           other [1]
  Beshears et al. (2020)                              39-46% [2]    house money
  Toussaert (2019)                                    21-65%        house money
  Schilbach (2019)                                    31-55%        house money
  Exley and Naecker (2017)                            41-65%        house money
  Avery, Giuntella, and Jiao (2019)                   63%           house money
  Ariely and Wertenbroch (2002)                       73%           other [3]

Average take-up rates (Penalty-based contracts)
  Own money at stake                                  22%
  House money at stake                                47%
  Other stakes                                        42%
  Overall                                             37%

B. Removing options: Restricted access to
  Brune et al. (2016)                                 6%            own money
  Afzal et al. (2019)                                 4-9%          own money
  Zhang ⓡ Greiner (2021)                              16-31%        other
  Sadoff and Samek (2019)                             20-50%        other
  Ek and Samahita (2020)                              27% [4]       other
  Ashraf, Karlan, and Yin (2006)                      28%           own money
  Sadoff, Samek, and Sprenger (2019)                  33%           other
  Acland and Chow (2018)                              35%           other
  John (2020)                                         42%           own money
  Karlan and Linden (2017)                            44%           own money
  Toussaert (2018)                                    45%           other
  Bisin and Hyndman (2020)                            31-62%        other
  Houser et al. (2018)                                48%           other
  Brune, Chyn, and Kerwin (Forthcoming)               50%           own money
  Beshears et al. (2020)                              56% [5]       house money
  Augenblick, Niederle, and Sprenger (2015)           59%           other
  Milkman, Minson, and Volpp (2014)                   61% [4]       other
  Dupas and Robinson (2013)                           65%           own money
  Alan and Ertac (2015)                               69%           house chocolates
  Chow (2011)                                         79%           other
  Casaburi and Macchiavello (2019)                    93%           own money

Average take-up rates (Option removal contracts)
  Own money at stake                                  42%
  House money/object at stake                         63%
  Other stakes                                        43%
  Overall                                             45%

[1] Points in a two-part experiment
[2] Fraction of endowment put into account with early withdrawal penalty
[3] Grade points
[4] Percent of participants with WTP>0
[5] Fraction of endowment put into account with early withdrawal prohibited

Notes: This table reports the take-up rates for (weakly) dominated commitment contracts offered at no cost. We include studies appearing in Table 1 of Schilbach (2019) or Table 1 of John (2020) as well as six more recent studies. Panel A represents contracts that imposed a penalty when the commitment threshold was not reached, i.e. non-binding contracts, while Panel B represents fully binding commitments. For studies that reported take-up rates from different waves or treatment groups, the range of relevant take-up rates is shown. At the bottom of each panel, we report unweighted averages across the studies of each type.
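The unweighted averages at the bottom of Panel A can be reproduced from the rows of the table. The Python sketch below does so under two assumptions that the table notes do not state: a study reporting a range enters at the midpoint of that range, and the study with "earned money" at stake is grouped with "other" stakes. Under these assumptions the rounded averages agree with the reported 22%, 47%, 42%, and 37%. The group labels and function name are illustrative only.

```python
from statistics import mean

# Panel A of Table 1: (study, low %, high %, stake group).
# Assumptions of this sketch: a reported range enters at its midpoint, and
# "earned money" (Royer, Stehr, and Sydnor 2015) is grouped with "other".
PANEL_A = [
    ("Gine, Karlan, and Zinman (2010)", 11, 11, "own"),
    ("Royer, Stehr, and Sydnor (2015)", 12, 12, "other"),
    ("Bai et al. (Forthcoming)", 14, 14, "own"),
    ("Bhattacharya, Garber, and Goldhaber-Fiebert (2015)", 23, 23, "own"),
    ("John (2020)", 27, 27, "own"),
    ("Kaur, Kremer, and Mullainathan (2015)", 36, 36, "own"),
    ("Schwartz et al. (2014)", 36, 36, "house"),
    ("Bonein and Denant-Boemont (2015)", 42, 42, "other"),
    ("Beshears et al. (2020)", 39, 46, "house"),
    ("Toussaert (2019)", 21, 65, "house"),
    ("Schilbach (2019)", 31, 55, "house"),
    ("Exley and Naecker (2017)", 41, 65, "house"),
    ("Avery, Giuntella, and Jiao (2019)", 63, 63, "house"),
    ("Ariely and Wertenbroch (2002)", 73, 73, "other"),
]


def unweighted_average(rows, group=None):
    """Mean of range midpoints, optionally restricted to one stake group."""
    rates = [(lo + hi) / 2 for _, lo, hi, g in rows if group in (None, g)]
    return mean(rates)


averages = {g: unweighted_average(PANEL_A, g) for g in ("own", "house", "other")}
averages["overall"] = unweighted_average(PANEL_A)

for name, value in averages.items():
    print(f"{name:>7}: {round(value)}%")
```

The same function can be applied to Panel B by transcribing its rows in the same format.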
Table 2: Study demographics

                                      Wave 1    Wave 2    Wave 3    Overall
Female                                0.66      0.61      0.57      0.61
                                      (0.47)    (0.49)    (0.50)    (0.49)
Age [a]                               30.93     34.55     34.38     33.51
                                      (12.61)   (15.29)   (15.70)   (14.82)
Student, full-time                    0.64      0.54      0.55      0.57
                                      (0.48)    (0.50)    (0.50)    (0.50)
Working, full- or part-time           0.50      0.60      0.59      0.57
                                      (0.50)    (0.49)    (0.49)    (0.50)
Married                               0.25      0.28      0.27      0.27
                                      (0.44)    (0.45)    (0.45)    (0.44)
Advanced degree [b]                   0.41      0.48      0.47      0.46
                                      (0.49)    (0.50)    (0.50)    (0.50)
Household income [a]                  45,804    58,502    58,527    55,139
                                      (40,574)  (48,248)  (49,722)  (47,121)
Visits in the past 4 weeks, recorded  7.04      7.63      5.89      6.91
                                      (5.86)    (6.12)    (5.36)    (5.86)
N                                     340       509       399       1,248

[a] Imputed from categorical ranges.
[b] A graduate degree beyond a B.A. or B.S.

Notes: This table shows the means of demographic variables reported in the study across the three waves of implementation. The table also summarizes data on past visit frequencies to the gym. Recorded visits are obtained from the fitness center’s log-in records.

Table 3: Association between the behavior change premium and proxies for sophistication

Dependent variable: Behavior change premium

                                      (1)       (2)       (3)
Basic info. treatment                 0.30      0.45      0.28
                                      (0.56)    (0.57)    (0.56)
Enhanced info. treatment              1.36**    1.41**    1.25**
                                      (0.57)    (0.58)    (0.59)
Goal − exp. attend. (z-score)                   0.71**
                                                (0.29)
Actual − exp. attend. (z-score)                           0.45**
                                                          (0.22)
Dep. var. mean                        1.17      1.17      1.17
                                      (0.22)    (0.22)    (0.22)
Dep. var. mean, info. control group   0.66      0.66      0.66
                                      (0.24)    (0.24)    (0.24)
Wave FEs                              Yes       Yes       Yes
N                                     1,126     1,126     1,126

Notes: This table reports the association between the estimated behavior change premium (calculated excluding the $1 incentive) and proxies for sophistication. Basic info. treatment and Enhanced info. treatment are dummies for whether participants received the basic and enhanced information treatments, respectively (see Section 3 for further details about the two information treatments). Goal − exp. attend.
is the standardized (z-score) difference between participants’ goal attendance and their subjective expectations of attendance in the absence of incentives (unstandardized mean: 3.34, SD: 3.64). Actual − exp. attend. is the standardized (z-score) difference between participants’ actual attendance and their subjective expectations of attendance for the incentive assigned to them (unstandardized mean: −4.17, SD: 6.61). Each column presents coefficient estimates from OLS regressions with heteroskedasticity-robust standard errors in parentheses. Dependent variable means, with standard errors in parentheses, are reported for the full sample and information control group. The sample excludes participants in wave 3 assigned a commitment contract (122 participants) rather than a piece-rate incentive, since the Actual − exp. attend. proxy cannot be computed for those participants. ** denotes a statistic that is statistically significantly different from 0 at the 5% level.

Table 4: Association between take-up of “more” commitment contracts and proxies for sophistication

                                        Take-up of “more” visits contracts
                                        (1)          (2)          (3)          (4)
Basic info. treatment                   −0.022       −0.023       −0.013       −0.019
                                        (0.041)      (0.041)      (0.041)      (0.041)
Enhanced info. treatment                −0.080**     −0.086***    −0.079**     −0.072**
                                        (0.031)      (0.032)      (0.031)      (0.031)
Behavior change premium (z-score)                    0.027**
                                                     (0.011)
Goal − exp. attend. (z-score)                                     0.038***
                                                                  (0.013)
Actual − exp. attend. (z-score)                                                −0.043***
                                                                               (0.013)
Dep. var. mean                          0.49         0.49         0.49         0.49
                                        (0.01)       (0.01)       (0.01)       (0.01)
Dep. var. mean, info. control group     0.52         0.52         0.52         0.52
                                        (0.01)       (0.01)       (0.01)       (0.01)
Wave FEs                                Yes          Yes          Yes          Yes
Contract FEs                            Yes          Yes          Yes          Yes
N                                       2,824        2,824        2,824        2,824
Clusters                                1,126        1,126        1,126        1,126

Notes: This table reports the association between take-up of a “more” visits commitment contract and proxies for sophistication and the behavior change premium.
We pool the data by participant and include commitment contract threshold fixed effects (i.e., 8-, 12-, 16-visit thresholds). The independent variables in this table are defined exactly as in Table 3, and the behavior change premium is standardized to be a z-score as well. Each column presents coefficient estimates from OLS regressions with standard errors, clustered by subject, in parentheses. Dependent variable means, with standard errors in parentheses, are reported for the full sample and information control group. The sample excludes participants in wave 3 assigned a commitment contract (122 participants) rather than a piece-rate incentive, since the Actual − exp. attend. proxy cannot be computed for those participants. **,*** denote statistics that are statistically significantly different from 0 at the 5% and 1% level respectively.

Table 5: Take-up of “more” and “fewer” commitment contracts

             Chose “more”    Chose “fewer”    Chose “more” given    Chose “fewer” given    Diff         Diff
             contract        contract         chose “fewer”         chose “more”
Threshold    (1)             (2)              (3)                   (4)                    (3)-(1)      (4)-(2)
8 visits     0.64            0.34             0.89                  0.47                   0.25***      0.13***
12 visits    0.49            0.31             0.67                  0.43                   0.18***      0.12***
16 visits    0.32            0.27             0.50                  0.43                   0.18***      0.15***

Notes: Column 1 reports take-up rates of commitment contracts to visit the gym at least 8, 12, or 16 days over the next four weeks (i.e., take-up of the “more” contract). Column 2 reports take-up rates of commitment contracts to visit the gym fewer than 8, 12, or 16 days over the same period (i.e., take-up of the “fewer” contract). Columns 3 and 4 show the take-up rates of each type of commitment contract conditional on having chosen the other type of commitment contract, for each threshold. Columns 5 and 6 display the difference in the take-up rates of column 3 versus column 1 and the difference in the take-up rates of column 4 versus column 2, respectively.
Over three study waves, all participants faced the choice of a commitment contract at the 12-visit threshold (N=1,248) while the 8-visit and 16-visit commitment contracts were only presented in the first two waves (N=849). *** denotes differences that are statistically significantly different from 0 at the 1% level.

Table 6: Association between perceived success in contracts and expected attendance

                                           Subjective expected attendance without incentives
                                           (1)         (2)         (3)         (4)
Subj. prob. succeed in “more” contract     8.46***                 9.17***     9.68**
                                           (1.31)                  (1.17)      (3.79)
Subj. prob. succeed in “fewer” contract                −3.96***    −4.64***    −9.97***
                                                       (0.91)      (0.85)      (3.10)
N                                          399         399         399         76
“More” − “Fewer”                                                   13.81***    19.64***
                                                                   (1.37)      (6.02)

Notes: This table reports the association between subjective beliefs about commitment contract success and expected attendance with no incentives. Each column presents coefficient estimates from OLS regressions with heteroskedasticity-robust standard errors in parentheses. Subj. prob. succeed in “more” contract is participants’ subjective expectations of attending the gym 12 or more days during the 4-week incentive period, coded as a probability between 0 and 1. Subj. prob. succeed in “fewer” contract is participants’ subjective expectations of attending the gym fewer than 12 times during the 4-week incentive period, coded as a probability between 0 and 1. The dependent variable is participants’ subjective expectations of attendance in the absence of any incentives. The “More” − “Fewer” row shows the estimated difference between the coefficient on the probability of success under the “more” contract versus the coefficient on the probability of success under the “fewer” contract. The sample in columns 1-3 consists of all participants in wave 3, the only wave in which we elicited the probabilities of contract success.
The sample in column 4 is restricted to participants in wave 3 who indicated that they wanted both the “more” and “fewer” contract with a threshold of 12 visits. **,*** denote statistics that are statistically significantly different from 0 at the 5% and 1% level respectively.

Table 7: Parameter estimates

                                               (1)             (2)             (3)              (4)               (5)             (6)
                                               β̂               β̃̂               b̂                1/λ̂               (1 − β̂)·b̂       (1 − β̃̂)/(1 − β̂)
1  All (N=1,126)                               0.55            0.84            9.66             14.81             4.39            0.36
                                               (0.51, 0.58)    (0.80, 0.88)    (9.05, 10.28)    (13.61, 16.00)    (4.02, 4.77)    (0.29, 0.43)
2  Information control (N=560)                 0.54            0.86            10.03            15.02             4.63            0.30
                                               (0.50, 0.58)    (0.82, 0.90)    (9.13, 10.93)    (13.48, 16.55)    (4.15, 5.11)    (0.22, 0.37)
3  Enhanced information treatment (N=392)      0.54            0.78            9.83             14.76             4.49            0.49
                                               (0.47, 0.62)    (0.69, 0.87)    (8.77, 10.89)    (12.33, 17.19)    (3.73, 5.26)    (0.35, 0.63)
4  Below-median past attendance (N=550)        0.38            0.78            7.07             13.75             4.39            0.36
                                               (0.33, 0.43)    (0.70, 0.86)    (6.45, 7.68)     (11.91, 15.58)    (3.92, 4.86)    (0.25, 0.46)
5  Above-median past attendance (N=576)        0.68            0.88            12.57            15.66             4.08            0.36
                                               (0.63, 0.72)    (0.84, 0.92)    (11.45, 13.69)   (14.09, 17.24)    (3.54, 4.63)    (0.26, 0.45)
6  Chose 8+ visit contract (N=546)             0.54            0.84            9.16             14.23             4.23            0.36
                                               (0.49, 0.59)    (0.77, 0.90)    (8.34, 9.98)     (12.51, 15.96)    (3.70, 4.76)    (0.24, 0.47)
7  Chose 12+ visit contract (N=556)            0.50            0.81            9.62             12.33             4.84            0.37
                                               (0.45, 0.54)    (0.75, 0.88)    (8.78, 10.47)    (10.86, 13.81)    (4.31, 5.38)    (0.26, 0.47)
8  Chose 16+ visit contract (N=275)            0.47            0.75            10.30            10.33             5.46            0.48
                                               (0.39, 0.55)    (0.63, 0.86)    (8.94, 11.67)    (8.22, 12.44)     (4.57, 6.34)    (0.33, 0.64)
9  Averaging heterogeneity (N=952)             0.55            0.85            10.24            15.55             4.21            0.35
                                               (0.52, 0.58)    (0.81, 0.89)    (9.50, 10.98)    (14.24, 16.85)    (3.83, 4.59)    (0.27, 0.42)

Notes: This table reports parameter estimates and respective 95% confidence intervals for various subsamples.
The subsamples are determined by the participants’ days of attendance over the 4 weeks prior, selection into the enhanced information treatment group, and their take-up of the various commitment contracts for more visits. Section 7.1 describes how the parameter estimation was performed. The present focus parameter is denoted by β, the perceived present focus parameter is denoted by β̃, people’s (perceived) health benefits of a gym attendance are denoted by b, and people’s expected costs of a gym attendance are denoted by 1/λ. Row 9 averages estimates across eight subsamples corresponding to (i) assignment to either the enhanced information treatment or the information control group, crossed with (ii) whether days of attendance over the 4 weeks prior to the experiment is below or above the median, crossed with (iii) take-up of the more-visit contract with a threshold of 12 visits. Over the three study waves, only participants in waves 2 and 3 (N=908) were eligible for random assignment to the enhanced information treatment group, and thus row 9 excludes participants assigned to the “basic” information treatment in wave 1. Inference for the statistics in columns 4-6, and for the averages reported in row 9, is conducted using the Delta method. All participants faced a take-up decision about a commitment contract with a 12-visit threshold (N=1,248), while the 8-visit and 16-visit commitment contracts were only presented in the first two waves (N=849). The samples exclude participants in wave 3 assigned a commitment contract (122 participants), rather than a piece-rate incentive, as our structural estimates only make use of data about how participants behave under piece-rate incentives.

Table 8: Estimated impact of 12+ contract on attendance

                                                          (1)              (2)               (3)                 (4)
                                                          ∆ in att.        Pr(att. ≥ 12)     Pr(att. ≥ 12)       ∆ in Pr(att. ≥ 12)
                                                                           with contract     without contract
1  Empirical                                              3.51             0.65              0.22                0.42
                                                          (1.38, 5.65)     (0.52, 0.78)      (0.10, 0.35)        (0.26, 0.58)
2  Homogeneous                                            3.05             0.91              0.15                0.76
3  Heterogeneous by median past att., info. treatment     2.47             0.74              0.34                0.40
4  Heterogeneous by median past att.                      2.61             0.74              0.33                0.41
5  Heterogeneous by quartile past att.                    2.74             0.73              0.31                0.41
6  Heterogeneous by quartile past att., info. treatment   2.65             0.73              0.32                0.41

Notes: This table assesses our estimated models’ predictions about how the “12 visits or more” contract affects the behavior of participants who indicated that they would take it up. All calculations are for the four-week period in our experiment. Row 1 reports empirical estimates from OLS regressions with wave fixed effects, with 95% confidence intervals in parentheses. In row 2, we assume that participants are homogeneous conditional on taking up the 12+ contract. Thus, row 2 assumes that there are only two types of individuals: those who take up the 12+ contract and those who don’t. In row 3, we estimate a heterogeneous model, as in row 9 of Table 7. In rows 4-6, we consider alternative heterogeneity assumptions. Row 4 divides individuals only according to their median past attendance. Row 5 divides individuals by quartile of past attendance. Row 6 divides individuals by quartile of past attendance crossed with receipt of the enhanced information treatment.

Table 9: Estimated welfare effects of piece-rates and commitment contracts

                                          (1)            (2)          (3)          (4)             (5)
                                          Avg. ∆ in      ∆ Agent      ∆ Health     ∆ Attendance    ∆ Social
                                          attendance     surplus      benefits     costs           surplus
1  12+ visits contract                    1.22           −$9.23       $10.88       $9.68           $1.21
2  Linear incentive, p = $1.90            1.22           $22.95       $12.45       $8.06           $4.39
3  Optimal linear incentive, p = $7.54    4.38           $106.71      $44.46       $35.10          $9.36

Notes: This table reports the estimated effects of three different incentive schemes, averaged over the full population, using the heterogeneity assumptions from row 9 of Table 7.
Row 1 reports the estimated effect of offering individuals the 12+ commitment contract. All calculations are for a four-week period, as in our experiment. The numbers reported in row 1 are averages over those who take up the contract (and thus are affected by it) and those who do not. Row 2 reports the estimated effects of a linear per-attendance subsidy of p = $1.90, which has the same impact on average population attendance as does the 12+ contract. Row 3 reports the effects of the optimal per-attendance subsidy. The formula for this subsidy is derived in Appendix D.3.3.

Figure 1: Illustration of the behavior change premium for a present-focused agent

[Figure. Vertical axis: Incentive (p); horizontal axis: Attendance. Curves: Actual, Forecasted, Desired. Labeled regions: Behavior change premium; Change in Total Surplus if Time Consistent.]

Notes: This figure gives a representation of actual, forecasted, and desired attendance curves as a function of incentives. See Section 2.2 for a detailed description of this figure.

Figure 2: Actual attendance and subjective expectations of attendance by incentive

[Figure. Vertical axis: Visits; horizontal axis: Per-visit incentive ($). Series: Average expected visits; Average realized visits.]

Notes: This figure reports the means and 95% confidence intervals for participants’ subjective expectations of gym attendance (“Best guess of days I would attend over the next four weeks”) and realized attendance, for different levels of piece-rate incentives. Subjective expectations are averaged over all participants in the analysis sample, while average realized visits are based on the subsets of participants who were randomized to receive each incentive. Section 3 describes how different incentive levels were probabilistically targeted in each of the three study waves.
Because the incentive levels shown here were not all targeted in every wave, the sample sizes underlying the average realized visits statistics differ (N=413 ($0); N=293 ($2); N=75 ($5); N=342 ($7)).

Figure 3: Effect of information treatments on actual attendance and subjective expectations of attendance

[Figure with two panels. Vertical axis: Visits; horizontal axis: Per-visit incentive ($).
(a) Impact of basic information treatment. Series: average expected and average realized visits, for the information control group and for the basic information treatment group.
(b) Impact of enhanced information treatment. Series: average expected and average realized visits, for the information control group and for the enhanced information treatment group.]

Notes: This figure presents the effects of the basic and enhanced information treatments on participants’ subjective expectations of attendance, as well as their actual attendance. Panel (a) presents results from wave 1, where the basic information treatment was randomized. Panel (b) presents results from waves 2 and 3, where the enhanced information treatment was randomized. Subjective expectations are averaged over all participants in the analysis sample, while average realized visits are based on the subsets of participants who were randomized to receive each incentive. Section 3 describes how different incentive levels were probabilistically targeted in each of the three study waves. Because the incentive levels shown here were not all targeted in every wave, the sample sizes underlying the average realized visits statistics differ (Panel (a): N=105 ($0), N=112 ($2), N=121 ($7); Panel (b): N=308 ($0); N=181 ($2); N=74 ($5); N=221 ($7)).
Figure 4: Subjective expectations of earnings and willingness to pay for piece-rate incentives

[Figure. Vertical axis: dollars ($); horizontal axis: Per-visit incentive ($). Series: Avg. subjective expected earnings; Avg. WTP for that incentive.]

Notes: This figure compares participants’ WTP for piece-rate incentives to their subjective expected earnings from the piece-rate incentives. For each incentive, subjective expected earnings are the product of the piece-rate level and participants’ subjective beliefs about the number of days they would visit under that incentive.

Figure 5: Estimated average behavior change premium

[Figure. Horizontal axis: Behavior change premium ($). Estimates shown: Average across incentives; Average excluding $1 incentive.]

Notes: This figure shows the participants’ average behavior change premium per dollar of additional incentive, as formalized in Sections 2.2 and 2.3.2. The top number averages across all incentive levels, while the bottom number reports the average excluding the $1 incentive. 95% confidence intervals are obtained from heteroskedasticity-robust standard errors.

Online Appendix
Carrera, Royer, Stehr, Sydnor, and Taubinsky

Online Appendix Table of Contents

A Theory Appendix
  A.1 Proof of Proposition 1
  A.2 Formal results for Section 2
  A.3 Proofs of the remaining Propositions
B Further study details
C Further results and robustness tests for reduced-form results
  C.1 Further results on actual versus expected attendance
  C.2 Additional results on willingness to pay for incentives
  C.3 Additional results on the behavior change premium
  C.4 Additional results for Section 6.2
  C.5 Additional results for Section 6.3
  C.6 Additional results for Section 6.4.1
  C.7 Additional results for Section 6.4.3
D Structural estimation appendix
  D.1 Details on GMM estimation of parameters
  D.2 Implications of heterogeneity for our parameter estimates
  D.3 Details on equilibrium strategies, value functions, and simulated behavior
  D.4 Additional structural estimation results
  D.5 Welfare effects of other commitment contracts
  D.6 Welfare estimates for alternative specifications of heterogeneity
  D.7 How commitment contracts affect attendance over time
  D.8 Alternative assumptions about the cost distribution
  D.9 Dollar value of exercise from public health estimates

A Theory Appendix

A.1 Proof of Proposition 1

Proof. Let $F_t$ and $f_t$ denote the CDF and PDF, respectively, of the cost draws in period t.
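When the costs are distributed independently, we have

\[
\begin{aligned}
\frac{d}{dp} V\Big(0, p\sum_t a_t\Big) &= \frac{d}{dp} \sum_t \int_{c \le \tilde\beta(b+p)} (b+p-c)\, f_t(c)\, dc \\
&= \sum_t F_t\big(\tilde\beta(b+p)\big) + (1-\tilde\beta)(b+p)\,\tilde\beta \sum_t f_t\big(\tilde\beta(b+p)\big) \\
&= \tilde\alpha(p) + (1-\tilde\beta)(b+p)\,\tilde\alpha'(p)
\end{aligned}
\]

\[
\frac{d^2}{dp^2} V\Big(0, p\sum_t a_t\Big) = \tilde\alpha'(p) + (1-\tilde\beta)(b+p)\,\tilde\alpha''(p) + (1-\tilde\beta)\,\tilde\alpha'(p)
\]

\[
\frac{d^3}{dp^3} V\Big(0, p\sum_t a_t\Big) = O\big(\tilde\alpha''(p)\big)
\]

Consequently, if the terms $\Delta^3$ and $\Delta^2\tilde\alpha''(p)$ are negligible,

\[
\begin{aligned}
V\Big(0, (p+\Delta)\sum_t a_t\Big) - V\Big(0, p\sum_t a_t\Big)
&= \Delta \frac{d}{dp} V\Big(0, p\sum_t a_t\Big) + \frac{\Delta^2}{2} \frac{d^2}{dp^2} V\Big(0, p\sum_t a_t\Big) + O\big(\Delta^3, \Delta^2\tilde\alpha''(p)\big) \\
&= \Delta\tilde\alpha(p) + \Delta(1-\tilde\beta)(b+p)\,\tilde\alpha'(p) + \frac{\Delta^2}{2}(2-\tilde\beta)\,\tilde\alpha'(p) + O\big(\Delta^3, \Delta^2\tilde\alpha''(p)\big) \\
&= \Delta\Big(\tilde\alpha(p) + \frac{\Delta}{2}\tilde\alpha'(p)\Big) + \Delta(1-\tilde\beta)(b+p+\Delta/2)\,\tilde\alpha'(p) + O\big(\Delta^3, \Delta^2\tilde\alpha''(p)\big) \\
&= \Delta\,\frac{\tilde\alpha(p+\Delta) + \tilde\alpha(p)}{2} + (1-\tilde\beta)(b+p+\Delta/2)\big(\tilde\alpha(p+\Delta) - \tilde\alpha(p)\big) + O\big(\Delta^3, \Delta^2\tilde\alpha''(p)\big)
\end{aligned}
\]

Next, consider the case in which the costs are not distributed independently, but $\tilde\beta = 1$. Here, we regard a strategy as a mapping from cost vectors $(c_1, \dots, c_T)$ to a set of actions $(a_1, \dots, a_T)$. The person’s expected utility under piece-rate p, $V(0, p\sum_t a_t)$, will be differentiable in p as long as the costs are smoothly distributed. Thus, Theorem 1 of Milgrom and Segal (2002) implies that

\[
\frac{d}{dp} V\Big(0, p\sum_t a_t\Big) = \tilde\alpha(p).
\]

Proceeding as above shows that

\[
V\Big(0, (p+\Delta)\sum_t a_t\Big) - V\Big(0, p\sum_t a_t\Big) = \Delta\,\frac{\tilde\alpha(p+\Delta) + \tilde\alpha(p)}{2} + O\big(\Delta^3, \Delta^2\tilde\alpha''(p)\big)
\]

A.1.1 Relaxing local linearity assumptions and assessing approximation error

More generally,

\[
V\Big(0, (p+\Delta)\sum_t a_t\Big) - V\Big(0, p\sum_t a_t\Big) = \int_{x=p}^{x=p+\Delta} \Big(\tilde\alpha(x) + (1-\tilde\beta)(b+x)\,\tilde\alpha'(x)\Big)\, dx
\]

To assess the potential approximation error in the proposition, suppose that the cost draws are exponentially distributed with rate λ, as in our structural model, so that $\tilde\alpha(x) = 28 \cdot \big(1 - e^{-\lambda\tilde\beta(b+x)}\big)$ and $\tilde\alpha'(x) = 28 \cdot \lambda\tilde\beta e^{-\lambda\tilde\beta(b+x)}$.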
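Now using

\[
\begin{aligned}
\frac{1}{28}\int_{x=p}^{x=p+\Delta} \tilde\alpha(x)\, dx &= \Delta - \frac{e^{-\lambda\tilde\beta(b+p)}}{\lambda\tilde\beta}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) \\
\frac{1}{28}\int_{x=p}^{x=p+\Delta} \tilde\alpha'(x)\, dx &= e^{-\lambda\tilde\beta(b+p)}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) \\
\frac{1}{28}\int_{x=p}^{x=p+\Delta} x\,\tilde\alpha'(x)\, dx &= \lambda\tilde\beta e^{-\lambda\tilde\beta b} \int_{x=p}^{x=p+\Delta} x e^{-\lambda\tilde\beta x}\, dx \\
&= e^{-\lambda\tilde\beta b}\Big[p + \frac{1}{\lambda\tilde\beta}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) - \Delta e^{-\lambda\tilde\beta\Delta}\Big]
\end{aligned}
\]

we obtain that

\[
\begin{aligned}
\frac{V\big(0, (p+\Delta)\sum_t a_t\big) - V\big(0, p\sum_t a_t\big)}{28}
&= \Delta - \frac{e^{-\lambda\tilde\beta(b+p)}}{\lambda\tilde\beta}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) \\
&\quad + (1-\tilde\beta)\, b\, e^{-\lambda\tilde\beta(b+p)}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) \\
&\quad + (1-\tilde\beta)\, e^{-\lambda\tilde\beta b}\Big[p + \frac{1}{\lambda\tilde\beta}\Big(1 - (1+\lambda\tilde\beta\Delta)e^{-\lambda\tilde\beta\Delta}\Big)\Big]
\end{aligned}
\]

meaning that the exact value of the behavior change premium is given by

\[
BCP(p,\Delta) = \frac{(1-\tilde\beta)\, b\, e^{-\lambda\tilde\beta(b+p)}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) + (1-\tilde\beta)\, e^{-\lambda\tilde\beta b}\Big[p + \frac{1}{\lambda\tilde\beta}\Big(1 - (1+\lambda\tilde\beta\Delta)e^{-\lambda\tilde\beta\Delta}\Big)\Big]}{\Delta}
\]

The approximation from Proposition 1 is that

\[
\frac{V\big(0, (p+\Delta)\sum_t a_t\big) - V\big(0, p\sum_t a_t\big)}{28}
\approx \Delta\Big(1 - e^{-\lambda\tilde\beta(b+p)}\,\frac{1 + e^{-\lambda\tilde\beta\Delta}}{2}\Big) + (1-\tilde\beta)(b+p+\Delta/2)\Big(e^{-\lambda\tilde\beta(b+p)} - e^{-\lambda\tilde\beta(b+p+\Delta)}\Big)
\]

and the approximation error in the BCP is therefore given by

\[
\begin{aligned}
&\frac{\dfrac{V\big(0, (p+\Delta)\sum_t a_t\big) - V\big(0, p\sum_t a_t\big)}{28} - \Delta\Big(1 - e^{-\lambda\tilde\beta(b+p)}\,\dfrac{1 + e^{-\lambda\tilde\beta\Delta}}{2}\Big)}{(1-\tilde\beta)\, b\, e^{-\lambda\tilde\beta(b+p)}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) + (1-\tilde\beta)\, e^{-\lambda\tilde\beta b}\Big[p + \frac{1}{\lambda\tilde\beta}\Big(1 - (1+\lambda\tilde\beta\Delta)e^{-\lambda\tilde\beta\Delta}\Big)\Big]} - 1 \\[6pt]
&\quad = \frac{\Delta - \dfrac{e^{-\lambda\tilde\beta(b+p)}}{\lambda\tilde\beta}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) - \Delta\Big(1 - e^{-\lambda\tilde\beta(b+p)}\,\dfrac{1 + e^{-\lambda\tilde\beta\Delta}}{2}\Big)}{(1-\tilde\beta)\, b\, e^{-\lambda\tilde\beta(b+p)}\Big(1 - e^{-\lambda\tilde\beta\Delta}\Big) + (1-\tilde\beta)\, e^{-\lambda\tilde\beta b}\Big[p + \frac{1}{\lambda\tilde\beta}\Big(1 - (1+\lambda\tilde\beta\Delta)e^{-\lambda\tilde\beta\Delta}\Big)\Big]}
\end{aligned}
\tag{6}
\]

At our values of $\hat\lambda = 0.068$, $\hat b = 9.66$, and $\hat{\tilde\beta} = 0.84$, this implies that the approximation error in the estimated value of the behavior change premium for the pairs (p, ∆) ∈ {(1, 1), (2, 1), (3, 2), (5, 2), (7, 5)} is 0.10, 0.06, 0.26, 0.16, and 1.29 percent respectively.

A.2 Formal results for Section 2

Except where noted, we state our formal results for the case of T = 1 to simplify intuition and exposition. Where noted, we generalize the key results to T > 1.

A.2.1 Behavior in absence of stochastic valuation errors or perceived social pressure

In period 1, individuals choose a = 1 if β(b + p) − c ≥ 0, or equivalently if c ≤ β(b + p). This decision rule says that for the person to act, the current costs of action have to be less than the discounted future benefits plus contingent rewards from action.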
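The period 1 decision rule can be illustrated numerically. The sketch below is only an illustration under stated assumptions: costs are drawn independently each day from an exponential distribution with rate λ over a 28-day period (the specification used in A.1.1), and the inputs are the full-sample point estimates from row 1 of Table 7. The function names `attend_prob` and `expected_visits` are hypothetical helpers, not part of the paper's estimation code. Applying the rule c ≤ β(b + p) gives attendance under the actual β, and replacing β with β̃ gives the forecasted attendance α̃(p).

```python
import math

# Full-sample point estimates from Table 7, row 1 (lambda = 1 / 14.81).
BETA = 0.55        # actual present focus
BETA_TILDE = 0.84  # perceived present focus
B = 9.66           # perceived health benefit per visit, in dollars
LAM = 1 / 14.81    # rate of the assumed exponential cost distribution
DAYS = 28          # length of the incentive period in days


def attend_prob(discount: float, p: float, b: float = B, lam: float = LAM) -> float:
    """Probability that a cost draw c ~ Exp(lam) satisfies c <= discount * (b + p)."""
    return 1.0 - math.exp(-lam * discount * (b + p))


def expected_visits(discount: float, p: float) -> float:
    """Expected number of visits over DAYS independent daily cost draws."""
    return DAYS * attend_prob(discount, p)


for p in (0, 2, 5, 7):
    actual = expected_visits(BETA, p)          # behavior governed by beta
    forecast = expected_visits(BETA_TILDE, p)  # belief governed by beta-tilde
    print(f"p=${p}: actual {actual:.2f} visits, forecast {forecast:.2f} visits")
```

Under these assumptions the forecast exceeds actual attendance at every incentive level, by roughly 8.4 versus 11.8 visits at p = 0, which is the direction of the gap between realized and expected visits described in the notes to Figure 2.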
In period 0, an individual’s perceived expected utility given contract (y, ap) is

\[
V(y, ap) = \beta\Big[\, y + \int_{c \le \tilde\beta(b+p)} (b + p - c)\, dF(c) \Big]
\]

Assume p > 0. We call a contract (−p, ap) a commitment contract for a = 1 with penalty p. This contract is perceived as a dominated contract by an individual who believes himself to be time-consistent. We call a contract (−p, (1 − a)p) a commitment contract for a = 0 with penalty p. We define ∆V(p) = V(−p, ap) − V(0, 0).

A.2.2 With uncertainty about costs, quasi-hyperbolic preferences rarely generate demand for commitment

Commitment contracts for a = 1 will be desired when β̃ < 1 and there is little uncertainty about the action a = 1 being desirable from the period t = 0 perspective. For example, suppose that the costs c are always smaller than the delayed benefits b, but that the individual thinks that because of present focus she may sometimes choose a = 0. In this case, the individual will always want a commitment contract with a high enough penalty p that guarantees that she will always choose a = 1. In our notation, this is a contract (−p, ap) with p ≥ (1 − β̃)b/β̃.

More generally, when there is only a small chance that immediate costs will exceed the delayed benefits, individuals with β̃ < 1 will want penalty-based contracts as long as β̃ is not too low. If β̃ is too low, then the penalties will lead to financial losses that are too large in magnitude relative to the desired behavior change. This line of logic can be used to establish that when there is a small chance that costs exceed benefits, there will be demand for commitment by some individuals, and it will be non-monotonic in β̃. This is analogous to the results of Heidhues and Kőszegi (2009), John (2020), and Schilbach (2019). Those with β̃ = 1, due to either naivete or actual time consistency, do not want commitment contracts.
Those with very low β̃ do not want commitment contracts because they perceive the contracts to be largely ineffective. But those with intermediate levels of β̃ do want the contracts.

However, such results about (non-monotonic) demand for commitment depend on strong assumptions about how much uncertainty there is about the costs of doing the action. We now show that the standard quasi-hyperbolic model predicts that there should not be demand for commitment when there is at least a moderate chance that costs exceed delayed benefits.

We consider first whether for a fixed penalty p there exists any β̃ such that individuals will want the contract. Second, we consider whether for a given β̃ there exists any commitment contract (including fully binding ones) that will be desirable. Throughout, we will assume that the distribution of costs can be characterized by a continuous density function f with support on [c̲, c̄].

Proposition 2. Fix p and assume that f(c₂)/f(c₁) ≥ (c₁/c₂)² for all c₂ > c₁ in some interval [β̲b, β̄(b + p)]. Then ∆V(p) is strictly increasing in β̃ ∈ [β̲, β̄]. In particular, if β̲ = 0 and β̄ = 1, then ∆V(p) is strictly increasing in β̃ for all β̃, and thus no individual will want the contract.

The economic content of the assumption in Proposition 2 is that in the region of cost draws where individuals’ decisions can actually be affected by a financial incentive of size p, the amount of uncertainty is not “too small.” In particular, the chances of a cost draw that exceeds the benefits do not rapidly vanish to zero. The assumption is satisfied by, for example, a uniform distribution on [0, c̄], where c̄ ≥ b + p. For instance, suppose that c ∼ U[0, 1.5b], so that time-consistent individuals do not want to take the action 33% of the time. In this case, there does not exist any β̃ for which a commitment contract with penalty p < b/2 is desirable.
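The uniform example can be checked directly from the definitions of V and ∆V(p) in A.2.1. The following is a minimal sketch, assuming c ∼ U[0, 1.5b] with b normalized to 1; the helper names `surplus` and `delta_v_over_beta` are hypothetical and introduced only for this illustration. It evaluates ∆V(p)/β on a grid of β̃ values for several penalties below b/2.

```python
def surplus(benefit: float, cutoff: float, c_bar: float) -> float:
    """Integral of (benefit - c) / c_bar over c in [0, min(cutoff, c_bar)], for c ~ U[0, c_bar]."""
    x = min(max(cutoff, 0.0), c_bar)
    return (benefit * x - 0.5 * x * x) / c_bar


def delta_v_over_beta(beta_tilde: float, b: float, p: float, c_bar: float) -> float:
    """Perceived gain, divided by beta, from the contract (-p, ap) relative to no contract."""
    # With the contract: pay p up front, then act whenever c <= beta_tilde * (b + p).
    with_contract = -p + surplus(b + p, beta_tilde * (b + p), c_bar)
    # Without the contract: act whenever c <= beta_tilde * b.
    without_contract = surplus(b, beta_tilde * b, c_bar)
    return with_contract - without_contract


b, c_bar = 1.0, 1.5                    # normalize b = 1, so c ~ U[0, 1.5b]
grid = [i / 100 for i in range(101)]   # beta_tilde from 0 to 1
for p in (0.1, 0.25, 0.49):            # penalties below b / 2
    gains = [delta_v_over_beta(bt, b, p, c_bar) for bt in grid]
    print(f"p={p}: largest perceived gain on the grid = {max(gains):.4f}")
```

In this sketch the perceived gain rises with β̃, as Proposition 2 states, but remains negative even at β̃ = 1, so no perceived type on the grid wants the contract.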
In fact, the uniform distribution example overstates how big the probability of costs exceeding benefits must be to erode demand for commitment. Proposition 2 shows that even if the density of cost draws between b and 1.5b is decreasing at rate 1/c², individuals will still not want commitment.

We complement our first result with a proposition that fixes β̃ and gives sufficient conditions for there to exist no desirable commitment contract at any value of p. This includes commitment contracts that simply restrict choice to a = 1 with infinite penalties p = ∞ for choosing a = 0.

Proposition 3. Fix β̃ and assume that (i) f is unimodal;⁴³ (ii) c̄ ≥ b + (1 − β̃)b; (iii) f(c₂)/f(c₁) ≥ (c₁/c₂)² for all c₂ > c₁ in the interval [β̃b, c̄); and (iv) 1 − F(b) ≥ F(b) − F(β̃b) if f does not have a mode in [β̃b, b + (1 − β̃)b], and otherwise 1 − F(b) ≥ [F(b) − F(β̃b)]/β̃. Under these four assumptions, there exists no value of p, including p = ∞, such that a penalty of size p for choosing a = 0 is desirable.

The economic content of the assumptions of Proposition 3 is again that there is at least some meaningful uncertainty about the desirability of choosing a = 1. While assumption (i) is a technical regularity condition, assumptions (ii)-(iv) provide bounds on uncertainty. The key assumption is assumption (iv), which says that the chances of getting a cost draw under which it is suboptimal to take the action (c > b) are at least as high as the chances of getting a cost draw under which the time t = 0 individual thinks she should choose a = 1, but thinks that her time t = 1 self will not do so (c ∈ [β̃b, b]). Assumptions (ii) and (iii) strengthen the content of assumption (iv) by ensuring that the cost draws exceeding b are not all concentrated at a point only slightly higher than b.

All four of the assumptions of Proposition 3 are satisfied by a uniform distribution with support [0, c̄], where c̄ ≥ b + (1 − β̃)b.
For example, with β̃ = 0.8, the assumptions are satisfied by a uniform distribution with support [0, 1.2b]. For this distribution, a time-consistent individual would not want to take the action only 17% of the time, and in those 17% of cases, the cost draws do not exceed the delayed benefits by more than 20%. This is an arguably modest amount of uncertainty. Yet this modest amount of uncertainty erodes demand for all possible commitment contracts.

Figure A1 summarizes commitment contract demand for the case in which c is uniformly distributed on [0, 1].⁴⁴

43. Formally, there do not exist c₁ < c₂ < c₃ such that f(c₂) < min(f(c₁), f(c₃)).
44. Since particularly high draws of c are what make commitment contracts particularly costly, the thin-tailed uniform distribution overstates the amount of uncertainty it would take to erode demand for commitment.

Figure A1: Commitment contract demand for uniform distribution of costs

Notes: This figure illustrates the commitment contract demand for the case in which costs are distributed uniformly on the unit interval (c ∼ U[0, 1]). Commitment contract demand is a function of delayed benefits b and perceived short-run discount factor β̃. As can be seen, for β̃ ≥ 0.75 and b ≤ 0.8, individuals do not want any commitment contract. In that case, the perceived damages from a commitment contract are increasing in the degree of perceived present focus, 1 − β̃. When individuals do want a commitment contract, they prefer that it is binding, a sharp result that holds for uniform distributions but is not generally true.

A.2.3 Imperfect perception and social pressure

More generally, for a given decision j, individual i behaves as if her forecasted utility under contract (y, P) is

\[
\hat V(y, P) = V(y, P) + \sigma(P)\,\varepsilon_{ij} + \eta_i \mathbf{1}_{P \neq 0} \tag{7}
\]

where $E[\varepsilon_{ij}] = 0$ and $\mathbf{1}_{P \neq 0}$ is an indicator that at least some contingent incentives are involved.
The ηᵢ term, which need not be positive, captures perceived social pressure. We model this term as additive to reflect the common intuition that social motives such as social desirability bias have a smaller percentage effect at larger stakes. For simplicity, we assume that ηᵢ and εᵢⱼ are unrelated to βᵢ and β̃ᵢ. To allow for some heterogeneity in the propensity for stochastic valuation, we assume that for a fraction µ of individuals εᵢⱼ ∼ G is i.i.d. with G supported on (−∞, ∞), while for a fraction 1 − µ of individuals εᵢⱼ ≡ 1.

To characterize the new implications of the model, we begin with the observation that in the standard quasi-hyperbolic model, no individuals would ever choose commitment contracts for a = 0. This is simply because individuals would not choose to commit to take actions that in effect have immediate benefits and delayed costs. However, choice of commitment contracts for a = 0 can be consistent with our imperfect perception model in this section. So can the choice of commitment contracts for both a = 1 and a = 0 by the same person, even when the conditions of Proposition 3 are met.

Proposition 4. Set p > 0 and assume that either µ > 0 or Pr(ηᵢ > βᵢp) > 0. Then

1. Irrespective of the distribution of βᵢ, a positive mass of individuals will choose penalty-based commitment contracts for both a = 1 and a = 0.

2. There will be a positive association between demand for commitment contracts for a = 1 and commitment contracts for a = 0 if E[β̃ᵢ] is sufficiently close to 1 and one of the following conditions holds: (i) µ = 1 and there are individual differences in ηᵢ, (ii) µ = 0 and Pr(ηᵢ > βᵢp) > 0, or (iii) µ ∈ (0, 1) and ηᵢ = 0 for all i.

Part 1 of Proposition 4 establishes that imperfect perception and demand effects can lead individuals to choose commitment contracts both for a = 0 and for a = 1, even when there is significant uncertainty about the cost of doing the activity.
Part 2 shows that in experiments in which individuals are faced with a number of decisions, with only one decision randomly selected to be implemented, there can be a positive association between demand for commitment contracts to do more of an activity and to do less of an activity.

As we show below, the imperfect perception model also implies that with at least moderate uncertainty about future costs, the likelihood of choosing a penalty-based commitment contract for a = 1 will be monotonically increasing in β̃. This is in contrast to the more standard results about non-monotonicity, such as those of Heidhues and Kőszegi (2009) and John (2020).

Proposition 5. Suppose that $f(c_2)/f(c_1) \ge (c_1/c_2)^2$ for all c2 > c1 in the interval [0, b + p]. Then the likelihood of choosing the contract (−p, ap), for p ≥ 0, is increasing in β̃.

This result is a corollary of Proposition 2, which shows that under moderate to large uncertainty, the perceived harms of a commitment contract are decreasing in β̃ in the standard quasi-hyperbolic model. Although in the standard quasi-hyperbolic model these conditions would lead individuals to never choose a commitment contract, in our imperfect perception model individuals still choose the contract, but with a propensity that is decreasing in the expected harms in the standard model.45 Intuitively, the less harmful the contracts would seem in the absence of noise and demand effects, the less noise and demand effects it takes to generate take-up.

45 Interestingly, the converse of Proposition 5 does not hold for commitment contracts for a = 0. That is, it does not hold that the likelihood of choosing a commitment contract for a = 0 is decreasing in β̃. Intuitively, this is because a lower β̃ dampens the impact of financial incentives in both cases, and thus makes penalty-based contracts potentially more harmful in both cases.

Finally, we have the following corollary to Proposition 1:

Corollary 1. Under the assumptions in Proposition 1 and the imperfect perception model, if β̃i = 1 for all i and p > 0 then

$$E\left[\frac{w_i(p+\Delta) - w_i(p)}{\Delta}\right] = E\left[\frac{\tilde\alpha_i(p+\Delta) + \tilde\alpha_i(p)}{2}\right] \tag{8}$$

and if β̃i < 1 for some i and costs are independent across time then

$$E\left[\frac{w_i(p+\Delta) - w_i(p)}{\Delta}\right] = E\left[\frac{\tilde\alpha_i(p+\Delta) + \tilde\alpha_i(p)}{2} + (1-\tilde\beta_i)\,(b_i + p + \Delta/2)\,\frac{\tilde\alpha_i(p+\Delta) - \tilde\alpha_i(p)}{\Delta}\right]. \tag{9}$$

We condition on p > 0 in the corollary because that allows the fixed terms ηi to be differenced out. Variations of our imperfect perception model in which valuation errors are not mean-zero, or in which perceived social pressure rises with stakes, would invalidate the methodology we propose here, along with using commitment demand as a measurement tool, and all other approaches to measurement of time inconsistency. Fortunately, the key assumptions behind Corollary 1 are testable: individuals who expect no change in behavior (α̃i(p + ∆) − α̃i(p) = 0) should have an average behavior change premium equal to zero when p > 0. If instead ηi increased with p, or if E[εij] > 1, then we would estimate a positive behavior change premium even for individuals who expect no behavior change. We implement this test in Appendix C.3.

Proof of the Corollary

Proof. If p > 0, then

$$w_i(p+\Delta) - w_i(p) = \left[V_i\Big(0, (p+\Delta)\sum_t a_t\Big) - V_i\Big(0, p\sum_t a_t\Big)\right]\varepsilon_{ij},$$

and thus

$$E[w_i(p+\Delta) - w_i(p)] = E\left[V_i\Big(0, (p+\Delta)\sum_t a_t\Big) - V_i\Big(0, p\sum_t a_t\Big)\right].$$

If p = 0,

$$E[w_i(\Delta) - w_i(0)] = E\left[V_i\Big(0, \Delta\sum_t a_t\Big) - V_i(0, 0)\right] + E[\eta_i].$$

A.2.4 Generalization of Proposition 2 to the dynamic case

We generalize Proposition 2 by considering commitment contracts like those in our experiment, which involve a penalty p if the individual does not choose at = 1 at least r ≤ T times.

Proposition 6. Fix p and suppose that F(·|ht) has a density function f(·|ht) for each ht, which satisfies $f(c_2|h_t)/f(c_1|h_t) \ge (c_1/c_2)^2$ for all c1 < c2 < b + p.
Then the perceived utility loss of a commitment contract that involves a penalty p for $\sum_t a_t < r$ is decreasing in β̃. Consequently, no individuals should desire commitment contracts.

Analogous to before, the key condition for commitment contracts to be unattractive is that the density of cost shocks in period t, conditional on any period t history of actions, does not diminish too quickly toward zero, in the sense of Proposition 2. Under this condition, backwards induction using repeated application of Proposition 2 establishes a result analogous to Proposition 2.

One possible intuition, in the spirit of the Central Limit Theorem, is that uncertainty becomes less of an issue when there are more opportunities to act. However, this is counteracted by the fact that future selves’ misbehavior is also more of an issue in dynamic settings in which payoffs are not separable in actions; this non-separability is generated by commitment contracts to meet a certain threshold.

A.3 Proofs of the remaining Propositions

A.3.1 Proof of Proposition 2

Proof. We have

$$\begin{aligned}
\frac{d}{d\tilde\beta}\,\Delta V/\beta &= p(b+p)f(\tilde\beta(b+p)) + (b+p)\big(b - \tilde\beta(b+p)\big)f(\tilde\beta(b+p)) - b(b - \tilde\beta b)f(\tilde\beta b) \\
&= (1-\tilde\beta)(b+p)^2 f(\tilde\beta(b+p)) - (1-\tilde\beta)\,b^2 f(\tilde\beta b)
\end{aligned} \tag{10}$$

The expression (10) is positive if $\frac{f(\tilde\beta(b+p))}{f(\tilde\beta b)} \ge \frac{b^2}{(b+p)^2}$.

Since the condition implies Pr(c > b) > 0 when β̄ = 1, β̃ = 1 individuals have ∆V < 0. The first part of the proposition then implies that ∆V < 0 for all β̃.

A.3.2 Proof of Proposition 3

We begin with a lemma:

Lemma 1. Under the assumptions of the proposition, no individuals will want commitment contracts that force a = 1.

Proof. To shorten equations, set γ = (1 − β̃)b. The perceived expected gains from a binding commitment contract are given by

$$\Delta V/\beta = \int_{c \ge \tilde\beta b} (b-c)f(c)\,dc.$$

The goal is thus to show that $\int_{c \ge \tilde\beta b} (b-c)f(c)\,dc < 0$ under the assumptions of the proposition.

Case 1: Suppose that f is increasing on [b, b + γ].
Then by the single-peak assumption, f is increasing on [b − γ, b + γ]. Then the value of the fully binding contract is

$$\begin{aligned}
\int_{c=\tilde\beta b}^{\infty} (b-c)f(c)\,dc &\le \int_{c=\tilde\beta b}^{b+(1-\tilde\beta)b} (b-c)f(c)\,dc \\
&= \int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc + \int_{c=b}^{b+(1-\tilde\beta)b} (b-c)f(c)\,dc \\
&\le \int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc + \int_{c=b}^{b+(1-\tilde\beta)b} (b-c)f(2b-c)\,dc \\
&= \int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc - \int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc \\
&= 0
\end{aligned}$$

where to get to the second-to-last line we perform a change-of-variable on the second integral via the function ϕ(x) = 2b − x.

Case 2: Suppose now that f is decreasing on [b − γ, b + γ]. Define µ := F(b) − F(b − γ), and recall that the fourth assumption requires that 1 − F(b) ≥ µ. On the other hand, $\mu = \int_{x=b-\gamma}^{b} f(x)\,dx \ge \int_{x=b-\gamma}^{b} f(b)\,dx = \gamma f(b)$. Now,

$$\begin{aligned}
\int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc &= \int_{c=\tilde\beta b}^{b} (b-c)f(b)\,dc + \int_{c=\tilde\beta b}^{b} (b-c)\big(f(c)-f(b)\big)\,dc \\
&= \frac{\gamma^2}{2} f(b) + \int_{c=\tilde\beta b}^{b} (b-c)\big(f(c)-f(b)\big)\,dc \\
&\le \frac{\gamma^2}{2} f(b) + \int_{c=\tilde\beta b}^{b} \gamma\big(f(c)-f(b)\big)\,dc \\
&= \frac{\gamma^2}{2} f(b) + \big(\mu - \gamma f(b)\big)\gamma \\
&= \gamma\mu - \frac{\gamma^2}{2} f(b)
\end{aligned} \tag{11}$$

Intuitively, all of the mass that is in excess of a uniform distribution on [b − γ, b] with density f(c) = f(b) is concentrated on the point adding the most to the mean: c = β̃b. Next,

$$\begin{aligned}
\int_{c \ge b} (b-c)f(c)\,dc &= \int_{c=b}^{b+\gamma} (b-c)f(c)\,dc + \int_{c \ge b+\gamma} (b-c)f(c)\,dc \\
&\le \int_{c=b}^{b+\gamma} (b-c)f(c)\,dc - \int_{c \ge b+\gamma} \gamma f(c)\,dc \\
&= \int_{c=b}^{b+\gamma} (b-c)f(c)\,dc - \gamma\big(1 - F(b+\gamma)\big) \\
&= \int_{c=b}^{b+\gamma} (b-c)f(c)\,dc - \gamma\big[(1 - F(b)) - (F(b+\gamma) - F(b))\big] \\
&\le \int_{c=b}^{b+\gamma} (b-c)f(c)\,dc - \gamma\left[\mu - \int_{c=b}^{b+\gamma} f(c)\,dc\right] \\
&= \int_{c=b}^{b+\gamma} (b+\gamma-c)f(c)\,dc - \gamma\mu \\
&\le \int_{c=b}^{b+\gamma} (b+\gamma-c)f(b)\,dc - \gamma\mu \\
&= \frac{\gamma^2}{2} f(b) - \gamma\mu
\end{aligned} \tag{12}$$

Intuitively, the quantity $-\int_{c=b}^{b+\gamma} (b-c)f(c)\,dc$ is minimized when 1 − F(b) = µ and as much of the mass µ as possible belongs to [b, b + γ]. So to minimize $-\int_{c=b}^{b+\gamma} (b-c)f(c)\,dc$, we need to maximize the mass of F on [b, b + γ], and the way to do that is to let it be uniform on [b, b + γ], with density f(c) := f(b). In this case, the rest lies on points c ≥ b + γ and has to integrate to at least (µ − γf(b))γ.
Putting (11) and (12) together shows that $\int_{c \ge \tilde\beta b} (b-c)f(c)\,dc \le 0$.

Case 3: Suppose that the mode of f lies in [b − γ, b] and that µ ≥ γf(b). Equation (12) holds because as in Case 2, f is decreasing on [b, b + γ].

Next, we consider the maximum of the function A given by $A(f) := \int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc$, over all f that have a mode on [b − γ, b]. Suppose for a given f that the mode is at c∗ > β̃b, and that $\int_{c=\tilde\beta b}^{b} \big(f(c^*) - f(c)\big)\,dc > 0$. Then consider f̃ given by f̃(c) = f(c) for c ≥ c∗, and

$$\tilde f(c) = \frac{\int_{c=\tilde\beta b}^{b} \big(f(c^*) - f(\tilde\beta b)\big)\,dc}{c^* - \tilde\beta b}$$

for c < c∗. Since f is increasing on [β̃b, c∗], f stochastically dominates f̃. Consequently, since b − c is positive and decreasing in c, A(f̃) > A(f). This establishes that the f that maximizes A must be decreasing almost everywhere on [β̃b, b] (except for a set of zero Lebesgue measure). We can then proceed as in Case 2 to establish that $\int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc \le \gamma\mu - \frac{\gamma^2}{2} f(b)$.

Case 4: Suppose that the mode lies in [b − γ, b] and that µ < γf(b). As in Case 3, we have shown that A is maximized when f is decreasing almost everywhere. But since µ < γf(b), this means that f must be uniform almost everywhere, with density f(c) = µ/γ. Thus in this case

$$\int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc \le \gamma\mu/2. \tag{13}$$

Now the highest value of $\int_{c \ge b} (b-c)f(c)\,dc$ is obtained by a density function f that puts as much mass toward b as possible, and minimizes the value of f(b). That is, $f(c) = (b/c)^2 f(b)$ for c ≥ b, with c̄ = b + γ, and f(b) large enough to satisfy the constraint $\int_{c \ge b} f(c)\,dc = \mu/\tilde\beta$.
The constraint on f(b) is

$$\begin{aligned}
\mu/\tilde\beta &\le \int_{x=b}^{b+\gamma} \frac{b^2}{x^2} f(b)\,dx \\
&= \left(-\frac{b^2}{x} f(b)\right)\Bigg|_{x=b}^{b+\gamma} \\
&= \left(b - \frac{b^2}{b+\gamma}\right) f(b) \\
&= \frac{\gamma}{b+\gamma}\, b f(b)
\end{aligned}$$

Now for k = 1 − β̃,

$$\begin{aligned}
-\int_{c=b}^{b+\gamma} (b-c)f(c)\,dc &= \int_{x=b}^{b+\gamma} (x-b)\,\frac{b^2}{x^2} f(b)\,dx \\
&= b^2 f(b) \int_{x=b}^{b+\gamma} \left(\frac{1}{x} - \frac{b}{x^2}\right) dx \\
&= b^2 f(b) \left[\ln(x) + \frac{b}{x}\right]_{x=b}^{b+\gamma} \\
&= b^2 f(b) \left[\ln(b+\gamma) + \frac{b}{b+\gamma} - \ln(b) - 1\right] \\
&= b^2 f(b) \left[\ln(1+k) - \frac{k}{1+k}\right] \\
&\ge b^2 f(b) \left[k - \frac{k^2}{2} - \frac{k}{1+k}\right] \qquad (14) \\
&= b^2 f(b) \left[\frac{k + k^2 - k}{1+k} - \frac{k^2}{2}\right] \\
&= b^2 f(b) \left[\frac{k^2}{1+k} - \frac{k^2}{2}\right] \\
&= f(b) \left[\frac{\gamma^2}{1+k} - \frac{\gamma^2}{2}\right] \\
&= f(b)\, \frac{\gamma^2 (1-k)}{2(1+k)} \\
&= f(b)\, \frac{\tilde\beta \gamma^2}{2(1+k)} \\
&= \frac{1}{2}\, \tilde\beta \gamma\, \frac{\gamma}{b+\gamma}\, b f(b) \\
&\ge \frac{\tilde\beta \gamma}{2} \cdot \frac{\mu}{\tilde\beta} \\
&= \gamma\mu/2 \qquad (15)
\end{aligned}$$

To obtain (14), we need to show that log(1 + x) ≥ x − x²/2 for x ≥ 0. To that end, note that equality holds when x = 0. The derivatives of the left and right hand side of the inequality with respect to x are 1/(1 + x) and 1 − x, respectively, so it is enough to show that 1/(1 + x) ≥ 1 − x. This holds iff 1 ≥ 1 − x², which follows because x² ≥ 0.

The combination of (13) and (15) implies that $\int_{c \ge \tilde\beta b} (b-c)f(c)\,dc \le 0$.

Case 5: Suppose that the mode is in [b, b + γ]. Since this implies that f is increasing on [b − γ, b], the highest possible value of $\int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc$, given that F(b) − F(β̃b) = µ, is obtained when f is almost everywhere uniform, with density f(c) = µ/γ. As in Case 4, this implies that $\int_{c=\tilde\beta b}^{b} (b-c)f(c)\,dc \le \gamma\mu/2$. And as in Case 4, the highest value of $\int_{c \ge b} (b-c)f(c)\,dc$ is obtained by a density function f that puts as much mass toward b as possible, and minimizes the value of f(b). That is, $f(c) = (b/c)^2 f(b)$ for c ≥ b, with c̄ = b + γ, and f(b) large enough to satisfy the constraint $\int_{c \ge b} f(c)\,dc = \mu/\tilde\beta$. Proceeding as in that case establishes the result.

With the lemma in hand, we are ready to prove Proposition 3.

Proof of the proposition

Proof. Case 1: Suppose that c̄ = ∞. Then Proposition 2 implies that for any value of p, the value of the commitment contract is increasing in β̃.
But since ∆V < 0 for β̃ = 1 individuals, it must be that ∆V < 0 for all β̃.

Case 2: Suppose that c̄ < ∞. Set β† = min(1, c̄/(b + p)). If β† < β̃ then this commitment contract generates the same utility as a fully binding commitment contract. Lemma 1 implies that it is undesirable. If β† > β̃ then Proposition 2 implies that an individual with perceived present focus β† expects higher gains from this contract than an individual with perceived present focus β̃. However, to an individual with perceived present focus β†, this contract is equivalent to a fully binding commitment contract. It is thus enough to show that a fully binding commitment contract is undesirable to an individual with perceived present focus β†. To this end, note that a commitment contract that binds individuals to a = 1 is (weakly) less attractive to individuals with higher β̃. But since Lemma 1 implies that a fully binding commitment contract is undesirable to an individual with perceived present focus β̃, a fully binding commitment contract must also be undesirable to an individual with perceived present focus β†.

A.3.3 Proof of Proposition 4

Proof. Consider the contracts (y, P) and (y, P′) given by (−p, ap) and (−p, (1 − a)p), respectively. An individual will choose (−p, ap) if

$$\left[\int_{c=0}^{\tilde\beta_i(b+p)} (b+p-c)\,dF(c) - \int_{c=0}^{\tilde\beta_i b} (b-c)\,dF(c)\right] + \big(\sigma(P) - \sigma(0)\big)\varepsilon_{ij} \ge p - \eta_i/\beta_i \tag{16}$$

and will choose (−p, (1 − a)p) if

$$\left[\int_{c \ge \tilde\beta_i(b-p)} p\,dF(c) + \int_{c=0}^{\tilde\beta_i(b-p)} (b-c)\,dF(c) - \int_{c=0}^{\tilde\beta_i b} (b-c)\,dF(c)\right] + \big(\sigma(P') - \sigma(0)\big)\varepsilon_{ij} \ge p - \eta_i/\beta_i \tag{17}$$

Both conditions will be satisfied if either ηi > βip, or if the individual is prone to stochastic valuation errors and the draw εij is sufficiently high. This establishes part 1.

To prove part 2, first suppose that β̃i = 1 for all individuals. In this case, the propensity to choose either contract is strictly increasing in ηi both for individuals subject to stochastic valuation errors and for those who are not.
Thus, if the population share of those making stochastic valuation errors is µ = 1, there is a strictly positive association in the take-up of contracts. If it is µ = 0 and Pr(ηi > βip) > 0, then there will also be a strictly positive association. Finally, consider µ ∈ (0, 1) and ηi = 0 for all i. Since only individuals prone to stochastic valuation errors will take up either contract with positive probability, take-up of these contracts will again be strictly positively correlated. This establishes a strictly positive correlation in take-up of contracts for E[β̃i] = 1. By continuity, the positive correlation holds if E[β̃i] is sufficiently close to 1.

More generally, for the case of T > 1, an individual will choose a commitment contract (y, P) if

$$V(y, P) - V(0, 0) + \big(\sigma(P) - \sigma(0)\big)\varepsilon_{ij} + \eta_i \ge 0 \tag{18}$$

Clearly, this will hold for either ηi or εij high enough, and thus both “more” and “fewer” contracts will be chosen with positive probability. The propensity to choose either contract will again be increasing in ηi and thus there will be a positive correlation in take-up when µ ∈ {0, 1} and β̃i = 1 for all i. Similarly, when µ ∈ (0, 1), ηi ≡ 0 and β̃i = 1 for all i, only individuals with stochastic valuation errors will choose either type of contract with positive probability, and thus there is again a positive correlation in take-up.

A.3.4 Proof of Proposition 5

Proof. Since the probability of choosing a commitment contract is increasing in ∆V, the result follows if we show that ∆V is increasing in β̃i and in b. By Proposition 2, ∆V is increasing in β̃i.

A.3.5 Proof of Proposition 6

Throughout, we use the following straightforward but useful extension of Proposition 2:

Lemma 2. Consider a density function f(c) such that $f(c_2)/f(c_1) \ge (c_1/c_2)^2$ for all c1 < c2 < B. Let the payoffs for choosing a = 0 and a = 1 be b0 and b1, respectively.
Suppose that the density function f(c) is such that $f(c_2)/f(c_1) \ge (c_1/c_2)^2$ for all c1 < c2 < b1 − b0. Define $W = b_0 + \int_{c \le \tilde\beta(b_1 - b_0)} (b_1 - b_0 - c) f(c)\,dc$. Then $\frac{\partial^2 W}{\partial\tilde\beta\,\partial b_0} < 0$, and consequently $\frac{\partial W}{\partial b_0} > 0$.

Proof. The first part, $\frac{\partial^2 W}{\partial\tilde\beta\,\partial b_0} < 0$, is an immediate consequence of Proposition 2, since decreasing b0 is equivalent to instituting a penalty for choosing a = 0. The second part follows because $\frac{\partial W}{\partial b_0} > 0$ clearly holds for β̃ = 1, and thus by the first statement must hold for any β̃ < 1.

We now prove the proposition:

Proof. Let Vt(ht) denote the period 0 expectation of period t self’s utility, following $h_t = \sum_{\tau=1}^{t-1} a_\tau$ choices of aτ = 1. Note that Vt(ht) is also the period t − 1 expectation of self-t utility, since both period 0 and period t − 1 selves have the same beliefs about period t self’s behavior.

Step 1. We first show that Vt(h + 1) ≥ Vt(h) for all h. We do this by induction. Consider t = T. If h ≥ r or if h ≤ r − 2 then Vt(h + 1) = Vt(h), since in the former case the individual meets the threshold regardless and in the latter case the individual fails to meet the threshold regardless. If ht = r − 1 then Proposition 2 implies that Vt(h + 1) > Vt(h), since in the former case there is no penalty for choosing at = 1 while in the latter case there is.

Now suppose that Vt+1(h) is increasing in h. In period t, this means that the delayed payoffs from choosing at = 1 and at = 0, respectively, are Vt+1(ht + 1) and Vt+1(ht). Clearly, period t utility is increasing in Vt+1(ht + 1). Lemma 2 establishes that period t utility must also be increasing in Vt+1(ht), the payoff from choosing at = 0. And since Vt+1 is increasing in ht by the induction hypothesis, this establishes that Vt must also be increasing in ht.

Step 2. We now show that Vt(ht) is increasing in β̃ for all ht. We again do this by induction. Consider first t = T. If hT ≥ r or if hT ≤ r − 2, then the penalty does not matter.
If hT = r − 1 then Proposition 2 implies that $\frac{\partial}{\partial p} V_T(h_T) < 0$ and $\frac{\partial^2}{\partial\tilde\beta\,\partial p} V_T(h_T) > 0$. Now suppose that $\frac{\partial}{\partial p} V_{t+1}(h_{t+1}) < 0$ and $\frac{\partial^2}{\partial\tilde\beta\,\partial p} V_{t+1}(h_{t+1}) > 0$. In period t, the delayed payoffs from choosing at = 1 and at = 0, respectively, are Vt+1(ht + 1) and Vt+1(ht). The induction hypothesis implies that these delayed payoffs decrease with p, which by Lemma 2 implies that Vt is decreasing in p. Moreover, the induction hypothesis implies that these payoffs decrease the most for those with the lowest β̃. Lemma 2 therefore also implies that Vt decreases the most in p for those with the lowest β̃.

B Further study details

Table A1: Study details by wave
Notes: This table describes the variations in the study across the three waves of implementation.

Table A2: Demographics and balance

                                        Overall mean   Difference in means: Treatment − control
                                        Waves 1-3      Wave 1        P-value   Waves 2-3     P-value
                                        (1)            (2)           (3)       (4)           (5)
Female                                  0.613          –0.043        0.41      –0.042        0.20
Age (a)                                 33.51          –0.47         0.73      –0.83         0.42
Student, full-time                      0.569          –0.089        0.09      0.004         0.91
Working, full- or part-time             0.571          0.141         0.01      –0.004        0.91
Married                                 0.272          0.082         0.08      –0.004        0.89
Advanced degree (b)                     0.457          0.045         0.40      –0.002        0.94
Household income (a)                    55,139         1,637         0.74      –4,399        0.21
Visits in the past 4 weeks, recorded    6.91           0.21          0.74      –0.10         0.79
N                                       1,248          166 control             456 control
                                                       174 treated             452 treated

a. Imputed from categorical ranges.
b. A graduate degree beyond a B.A. or B.S.
Notes: This table shows the means of demographic variables elicited in our online survey, as well as differences in treatment and control group means. In wave 1 of the experiment, the treatment group received the basic information treatment. In waves 2 and 3, treated participants received the enhanced information treatment. See Section 3 for further details about the two information treatments. The table also summarizes data on past visit frequencies to the gym.
Recorded visits are obtained from the fitness center’s log-in records.

C Further results and robustness tests for reduced-form results

C.1 Further results on actual versus expected attendance

Figure A2: Actual attendance versus participants’ subjective expectations of attendance
[Figure: binned scatterplot. Horizontal axis: Expected attendance under assigned incentive (0 to 30). Vertical axis: Actual attendance (0 to 30).]
Notes: This figure shows a binned scatterplot comparing participants’ actual attendance to their subjective expectations of gym attendance under the incentives they received, along with a regression-fitted line for the scatterplot. A dashed 45-degree line is included for reference. The sample excludes participants in wave 3 assigned a commitment contract (122 participants) rather than a piece-rate incentive. The fact that the first point does not lie below the 45-degree line does not imply that some people are under-optimistic. This is consistent with mean-zero noise in stated beliefs generating a form of mean-reversion between actual and forecasted behavior.

C.2 Additional results on willingness to pay for incentives

Figure A3: Willingness to pay versus participants’ subjective expectations of attendance
[Figure: binned scatterplot. Horizontal axis: Expected attendance (0 to 30). Vertical axis: WTP for incentive ($) (0 to 300). Series: Per-visit incentive ($) of 1, 2, 3, 5, 7, 12.]
Notes: This figure presents a binned scatterplot comparing participants’ WTP for piece-rate incentives to their subjective expectations of attendance under those incentives.

C.3 Additional results on the behavior change premium

Table A3: Association between the behavior change premium and expected behavior change

                              Behavior change premium
                              (1)          (2)
Expected behavior change      1.51***      1.52***
                              (0.13)       (0.13)
Constant                      0.10
                              (0.22)
Dep. var. mean:               1.20         1.20
                              (0.15)       (0.15)
Wave FEs                      No           Yes
N                             6,240        6,240
Clusters                      1,248        1,248

Notes: This table reports the association between the estimated behavior change premium at each piece-rate incentive level and the expected behavior change in visits per dollar increase in the piece-rate incentive. Each column presents coefficient estimates from OLS regressions with heteroskedasticity-robust standard errors in parentheses. All incentive levels except the $1 incentive are included. The regression in column 2 includes wave fixed effects and omits the constant term. *** denotes statistics that are statistically significantly different from 0 at the 1% level.

Table A4: Association between the behavior change premium and proxies for sophistication, with demographic controls

                              Behavior change premium
                              (1)          (2)          (3)
Basic info. treatment         0.28         0.41         0.25
                              (0.57)       (0.57)       (0.56)
Enhanced info. treatment      1.20**       1.25**       1.07*
                              (0.54)       (0.55)       (0.55)
Goal − exp. attend.                        0.59**
(z-score)                                  (0.30)
Actual − exp. attend.                                   0.55**
(z-score)                                               (0.21)
Dep. var. mean:               1.17         1.17         1.17
                              (0.22)       (0.22)       (0.22)
Dep. var. mean,               0.66         0.66         0.66
info. control group:          (0.24)       (0.24)       (0.24)
Demographic controls          Yes          Yes          Yes
Wave FEs                      Yes          Yes          Yes
N                             1,119        1,119        1,119

Notes: This table reports the association between the estimated behavior change premium (calculated excluding the $1 incentive) and proxies for sophistication. Basic info. treatment and Enhanced info. treatment are dummies for whether participants received the basic and enhanced information treatments, respectively (see Section 3 for further details about the two information treatments). Goal − exp. attend. is the standardized (z-score) difference between participants’ goal attendance and their subjective expectations of attendance in the absence of incentives (unstandardized mean: 3.34, SD: 3.64). Actual − exp. attend.
is the standardized (z-score) difference between participants’ actual attendance and their subjective expectations of attendance for the incentive assigned to them (unstandardized mean: −4.17, SD: 6.61). Each column presents coefficient estimates from OLS regressions with heteroskedasticity-robust standard errors in parentheses. Dependent variable means, with standard errors in parentheses, are reported for the full sample and information control group. Each column includes controls for gender, age, student status, employment status, marital status, attainment of an advanced degree, and household income. The sample excludes participants who declined to answer one or more demographic questions, as well as those in wave 3 assigned a commitment contract (122 participants) rather than a piece-rate incentive, since the Actual − exp. attend. proxy cannot be computed for those participants. *,** denote statistics that are statistically significantly different from 0 at the 10% and 5% level respectively. C.4 Additional results for Section 6.2 Here we show that the results in Table 4 on the association between take-up of “more” contracts and the behavior change premium are robust to splitting the sample by those in the information control group and those receiving the enhanced information treatment, and also hold for each of the “more” contracts separately. We find here that there is no significant correlation for the control group and the point estimates are actually negative. There is a somewhat stronger association between the measured behavior change premium and the take-up of “more” commitments for those who received the enhanced information intervention. We also show that Table 4 is largely unchanged when controlling for demographic characteristics. 
Table A5: Association between the behavior change premium and take-up of “more” contracts

(a) Information control group
                              Take-up of “more” visits contract
                              8+ visits    12+ visits   16+ visits   Pooled
                              (1)          (2)          (3)          (4)
Behavior change premium       –0.040       –0.013       –0.036       –0.028
(z-score)                     (0.025)      (0.024)      (0.029)      (0.022)
Dep. var. mean:               0.65         0.52         0.36         0.51
                              (0.02)       (0.02)       (0.02)       (0.01)
Wave FEs                      Yes          Yes          Yes          Yes
Contract FEs                  No           No           No           Yes
N                             429          622          429          1,480
Clusters                      429          622          429          622

(b) Information treatment group
                              Take-up of “more” visits contract
                              8+ visits    12+ visits   16+ visits   Pooled
                              (1)          (2)          (3)          (4)
Behavior change premium       0.035***     0.041***     0.055***     0.044***
(z-score)                     (0.013)      (0.013)      (0.014)      (0.012)
Dep. var. mean:               0.62         0.47         0.31         0.47
                              (0.03)       (0.02)       (0.03)       (0.02)
Wave FEs                      Yes          Yes          Yes          Yes
Contract FEs                  No           No           No           Yes
N                             246          452          246          944
Clusters                      246          452          246          452

(c) Full sample
                              Take-up of “more” visits contract
                              8+ visits    12+ visits   16+ visits   Pooled
                              (1)          (2)          (3)          (4)
Behavior change premium       0.019*       0.020*       0.026*       0.022**
(z-score)                     (0.011)      (0.012)      (0.013)      (0.010)
Dep. var. mean:               0.64         0.49         0.32         0.49
                              (0.02)       (0.01)       (0.02)       (0.01)
Wave FEs                      Yes          Yes          Yes          Yes
Contract FEs                  No           No           No           Yes
N                             849          1,248        849          2,946
Clusters                      849          1,248        849          1,248

Notes: This table reports OLS regressions of the take-up of “more” commitment contracts on the estimated average behavior change premium (calculated excluding the $1 incentive and expressed as a z-score) for the information control group only (panel (a)); the enhanced information treatment group only (panel (b)); and the full sample (panel (c)). In columns 1, 2, and 3, the dependent variables are the take-up of the “more” visit contract with a threshold of 8, 12, and 16 visits, respectively.
In column 4, the dependent variable is the take-up of a “more” visit contract, with observations pooled across the three contracts, controlling for commitment contract threshold fixed effects (i.e., 8-, 12-, 16-visit thresholds). Standard errors are heteroskedasticity-robust in columns 1-3, and are clustered at the subject level in column 4. *,**,*** denote statistics that are statistically significantly different from 0 at the 10%, 5%, and 1% level respectively.

Table A6: Association between take-up of “more” commitment contracts and proxies for sophistication, with demographic controls

                              Take-up of “more” visits contracts
                              (1)          (2)          (3)          (4)
Basic info. treatment         –0.024       –0.025       –0.017       –0.022
                              (0.041)      (0.041)      (0.041)      (0.041)
Enhanced info. treatment      –0.091***    –0.096***    –0.090***    –0.084***
                              (0.031)      (0.031)      (0.031)      (0.031)
Behavior change premium                    0.024**
(z-score)                                  (0.011)
Goal − exp. attend.                                     0.032**
(z-score)                                               (0.013)
Actual − exp. attend.                                                –0.038***
(z-score)                                                            (0.014)
Dep. var. mean:               0.49         0.49         0.49         0.49
                              (0.01)       (0.01)       (0.01)       (0.01)
Dep. var. mean,               0.52         0.52         0.52         0.52
info. control group:          (0.01)       (0.01)       (0.01)       (0.01)
Demographic controls          Yes          Yes          Yes          Yes
Wave FEs                      Yes          Yes          Yes          Yes
Contract FEs                  Yes          Yes          Yes          Yes
N                             2,807        2,807        2,807        2,807
Clusters                      1,119        1,119        1,119        1,119

Notes: This table reports the association between take-up of a “more” visits commitment contract and proxies for sophistication and the behavior change premium. We pool the data by participant and include commitment contract threshold fixed effects (i.e., 8-, 12-, 16-visit thresholds). The independent variables in this table are defined exactly as in Table A4, and the behavior change premium is standardized to be a z-score as well. Each column presents coefficient estimates from OLS regressions with standard errors, clustered by subject, in parentheses.
Dependent variable means, with standard errors in parentheses, are reported for the full sample and information control group. Each column includes controls for gender, age, student status, employment status, marital status, attainment of an advanced degree, and household income. The sample excludes participants who declined to answer one or more demographic questions, as well as those in wave 3 assigned a commitment contract (122 participants) rather than a piece-rate incentive, since the Actual − exp. attend. proxy cannot be computed for those participants. **,*** denote statistics that are statistically significantly different from 0 at the 5% and 1% level respectively.

C.5 Additional results for Section 6.3

We first show that the patterns of take-up for “more” and “fewer” commitment contracts, and in particular the positive association between those two take-up decisions, hold when we split the sample into information control and enhanced information treatment groups. We then examine the associations between proxies for sophistication and the decision to take up a “more” but not a “fewer” contract. At least qualitatively, these results are largely similar to those of Table 4.
Table A7: Take-up of “more” and “fewer” commitment contracts

(a) Information control group
              Chose “more”   Chose “fewer”   Chose “more” given   Chose “fewer” given   Diff        Diff
              contract       contract        chose “fewer”        chose “more”          (3)-(1)     (4)-(2)
Threshold     (1)            (2)             (3)                  (4)
8 visits      0.65           0.36            0.88                 0.49                  0.23***     0.13***
12 visits     0.52           0.33            0.72                 0.45                  0.20***     0.13***
16 visits     0.36           0.31            0.56                 0.48                  0.20***     0.17***

(b) Information treatment group
              Chose “more”   Chose “fewer”   Chose “more” given   Chose “fewer” given   Diff        Diff
              contract       contract        chose “fewer”        chose “more”          (3)-(1)     (4)-(2)
Threshold     (1)            (2)             (3)                  (4)
8 visits      0.62           0.30            0.89                 0.43                  0.27***     0.13***
12 visits     0.47           0.29            0.62                 0.38                  0.15***     0.09***
16 visits     0.31           0.22            0.47                 0.34                  0.16***     0.12***

Notes: This table performs analysis identical to that of Table 5 in the body of the paper, but split by information control versus information treatment groups. *** denotes statistics that are statistically significantly different from 0 at the 1% level.

Table A8: Association between take-up of “more” but not “fewer” commitment contracts and proxies for sophistication

                              Take-up of “more” but not “fewer” visits contracts
                              (1)          (2)          (3)          (4)
Basic info. treatment         0.023        0.022        0.031        0.024
                              (0.038)      (0.038)      (0.038)      (0.038)
Enhanced info. treatment      –0.018       –0.020       –0.017       –0.014
                              (0.031)      (0.031)      (0.031)      (0.031)
Behavior change premium                    0.009
(z-score)                                  (0.014)
Goal − exp. attend.                                     0.039***
(z-score)                                               (0.012)
Actual − exp. attend.                                                –0.020
(z-score)                                                            (0.012)
Dep. var. mean:               0.27         0.27         0.27         0.27
                              (0.01)       (0.01)       (0.01)       (0.01)
Dep. var. mean,               0.27         0.27         0.27         0.27
info. control group:          (0.01)       (0.01)       (0.01)       (0.01)
Wave FEs                      Yes          Yes          Yes          Yes
Contract FEs                  Yes          Yes          Yes          Yes
N                             2,824        2,824        2,824        2,824
Clusters                      1,126        1,126        1,126        1,126

Notes: This table performs analysis identical to that of Table 4 in the body of the paper using the take-up of “more” but not “fewer” visits commitment contracts as the dependent variable. *** denotes statistics that are statistically significantly different from 0 at the 1% level.

C.6 Additional results for Section 6.4.1

Here we provide additional results showing that measures that are positively correlated with the take-up of “more” commitments tend to be negatively correlated with the take-up of “fewer” commitments. These results bolster the arguments in Section 6.4.1 that participants were not simply confusing “fewer” contracts for “more” contracts.

Table A9: Correlation between perceived success in contracts and take-up of contracts

                      Subj. prob. succeed in                Subj. prob. succeed in
                      “more” contract                       “fewer” contract
                      (1)         (2)         (3)           (4)         (5)         (6)
Commit to “more”      0.12***                 0.14***       –0.09***                –0.13***
                      (0.02)                  (0.02)        (0.03)                  (0.03)
Commit to “fewer”                 –0.05*      –0.08***                  0.17***     0.20***
                                  (0.03)      (0.02)                    (0.03)      (0.03)
N                     399         399         399           399         399         399
“More” − “Fewer”                              0.22***                               –0.34***
                                              (0.03)                                (0.05)

Notes: This table reports the association between the take-up of “more” and “fewer” commitment contracts (with a threshold of 12 visits) and subjective beliefs about the probability of success if exogenously assigned the contract. Each column presents coefficient estimates and heteroskedasticity-robust standard errors in parentheses from separate OLS regressions. Columns 1-3 display associations with participants’ subjective expectations of following through on the “more” contract with a threshold of 12 visits, with the subjective expectations coded on a scale of 0 to 1.
Columns 4-6 display associations with participants’ subjective expectations of following through on the “fewer” contract with a threshold of 12 visits, with the subjective expectations coded on a scale of 0 to 1. The sample consists of participants in wave 3, the only wave in which we elicited the probabilities of contract success. *,**,*** denote statistics that are statistically significantly different from 0 at the 10%, 5%, and 1% level respectively.

Table A10: Other correlates of commitment contract take-up

                            Expected attendance   Past attendance   Goal attendance
                            (1)                   (2)               (3)
Chose “more” contract       1.94***               1.31***           2.56***
                            (0.21)                (0.22)            (0.22)
Chose “fewer” contract      –0.87***              –1.94***          –1.03***
                            (0.23)                (0.23)            (0.25)
N                           2,946                 2,946             2,946
“More” − “Fewer”            2.81***               3.25***           3.59***
                            (0.34)                (0.35)            (0.36)

Notes: This table presents results from three stacked OLS regressions that study how the three dependent variables in columns 1-3 relate to people’s decision to take up the “more” contracts and the “fewer” contracts. Since participants were asked about multiple commitment contracts in waves 1 and 2, each participant contributes three observations to the regressions in these two waves. Heteroskedasticity-robust standard errors are reported in parentheses. *** denotes statistics that are statistically significantly different from 0 at the 1% level.

C.7 Additional results for Section 6.4.3

Here we present additional results that highlight that the patterns of selecting “more” and “fewer” commitment contracts are not limited to participants for whom the contract was unlikely to be binding. For each visit threshold, we identify participants whose self-reported subjective expectations for gym visits in the absence of incentives were at least two or four visits below the threshold. For these individuals, the “more” contract would likely be significantly binding.
Similarly, we identify participants whose subjective expectations for gym visits in the absence of incentives were at least one or three more than the threshold, which implies two or four more than the limit for compliance with the “fewer” contract. The tables show that the take-up of both types of contracts is similar if we limit the sample to those for whom they were more likely to be binding (Table A11). Moreover, the correlation between the take-up of “more” and “fewer” contracts is similar as we limit the sample to those for whom one of the contract types was more likely to be binding (Table A12).

Table A11: Take-up rate by expected attendance

| Threshold (r) | (1) Chose “more” contract | (2) Chose “more” given exp. att. ≤ r − 2 | (3) Chose “more” given exp. att. ≤ r − 4 | (4) Chose “fewer” contract | (5) Chose “fewer” given exp. att. ≥ r + 1 | (6) Chose “fewer” given exp. att. ≥ r + 3 |
|---|---|---|---|---|---|---|
| 8 visits | 0.64 | 0.62 | 0.63 | 0.34 | 0.31 | 0.29 |
| 12 visits | 0.49 | 0.39 | 0.35 | 0.31 | 0.30 | 0.29 |
| 16 visits | 0.32 | 0.24 | 0.23 | 0.27 | 0.31 | 0.32 |

Notes: Each column reports the take-up rate of a “more” or “fewer” commitment contract with a given visits threshold r ∈ {8, 12, 16}. In columns 2, 3, 5, and 6, the samples are restricted to participants whose subjective expectations of gym attendance in the absence of incentives are ≤ r − 2 (column 2), ≤ r − 4 (column 3), ≥ r + 1 (column 5), or ≥ r + 3 (column 6).

Table A12: Correlation of “more” and “fewer” take-up by expected attendance

| Threshold (r) | (1) All | (2) Exp. att. ≤ r − 2 | (3) Exp. att. ≤ r − 4 | (4) Exp. att. ≥ r + 1 | (5) Exp. att. ≥ r + 3 | (6) Exp. att. ≤ 6 | (7) Exp. att. ≥ 17 |
|---|---|---|---|---|---|---|---|
| 8 visits | 0.37*** | 0.39*** | 0.46*** | 0.37*** | 0.38*** | 0.39*** | 0.41*** |
| 12 visits | 0.24*** | 0.23*** | 0.27*** | 0.31*** | 0.27*** | 0.29*** | 0.32*** |
| 16 visits | 0.23*** | 0.22*** | 0.22*** | 0.33*** | 0.33*** | 0.25** | 0.33*** |

Notes: Each column reports the correlation between the take-up of “more” and “fewer” commitment contracts with a given visits threshold, with the sample limited in columns 2-7 by participants’ attendance expectations in the absence of incentives. **,*** denote statistics that are statistically significantly different from 0 at the 5% and 1% level respectively.

D Structural estimation appendix

D.1 Details on GMM estimation of parameters

Let \(\xi = (\beta, \tilde\beta, b, \lambda)\) denote the vector of parameters that we are seeking to estimate. Let \(\tilde\alpha_i(p)\) denote an individual i’s forecasted visits as a function of piece-rate incentive p, and let \(a_i\) denote actual visits. Let \(p_i\) denote the piece-rate incentive assigned to individual i. We have three sets of moment conditions. The first set of moment conditions corresponds to forecasted attendance:

\[ E\Big[\Big(28\big(1 - e^{-\lambda \tilde\beta (b+p)}\big) - \tilde\alpha_i(p)\Big)\, p^n\Big] = 0 \]

for all \(p \in P = \{0, 1, 2, 3, 5, 7, 12\}\), and all \(n \in \{0, 1, 2\}\). The set P is the set of all incentives for which we elicited forecasts. We use \(1, p, p^2\) as the instruments for the forecasted attendance equation, and our results are virtually unchanged for smaller and larger values of n.

The second set of moment conditions corresponds to actual attendance:

\[ E\Big[\Big(28\big(1 - e^{-\lambda \beta (b+p_i)}\big) - a_i\Big)\, p_i^n\Big] = 0 \]

for all \(n \in \{0, 1, 2\}\).

The third set of moment conditions corresponds to the behavior change premium:

\[ E\left[(1-\tilde\beta)\Big(b + \frac{p_k + p_{k+1}}{2}\Big)\frac{\tilde\alpha_i(p_k+\Delta_k) - \tilde\alpha_i(p_k)}{\Delta_k} - \left(\frac{w_i(p_k+\Delta_k) - w_i(p_k)}{\Delta_k} - \frac{\tilde\alpha_i(p_k+\Delta_k) + \tilde\alpha_i(p_k)}{2}\right)\right] = 0 \]

where \(p_k\) and \(p_{k+1}\) are one of five pairs of adjacent incentives from the set \(P \setminus \{0\}\), and \(\Delta_k := p_{k+1} - p_k\).
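The first two sets of moment conditions share the same structure, so their sample analogues can be sketched with one helper. The following is a minimal illustration in plain Python, not the authors' estimation code: the function names, the stacking of all (individual, incentive) observations into a single average, and the identity-weighted objective are assumptions made for the example.

```python
import math

T = 28  # number of days in the study period, as in the moment conditions


def predicted_visits(discount, b, lam, p):
    """Model-implied visits 28 * (1 - exp(-lam * discount * (b + p))).

    `discount` is beta for actual attendance or beta-tilde for forecasts.
    """
    return T * (1.0 - math.exp(-lam * discount * (b + p)))


def attendance_moments(discount, b, lam, incentives, visits, max_power=2):
    """Sample analogue of E[(predicted - observed) * p^n] for n = 0..max_power.

    `incentives` and `visits` are equal-length sequences with one entry per
    observation (hypothetical data layout chosen for this sketch).
    """
    resid = [predicted_visits(discount, b, lam, p) - v
             for p, v in zip(incentives, visits)]
    n_obs = len(resid)
    return [sum(u * p ** n for u, p in zip(resid, incentives)) / n_obs
            for n in range(max_power + 1)]


def identity_weighted_objective(moments):
    """First-step GMM criterion m'Wm with W equal to the identity matrix."""
    return sum(m * m for m in moments)
```

Passing forecasts with `discount` set to the perceived present focus parameter gives the first set of moments; passing realized visits with `discount` set to the actual present focus parameter gives the second.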
Letting \(\hat\xi\) denote the parameter estimates, the GMM estimator chooses the parameter \(\hat\xi\) that minimizes

\[ \big(m(\xi) - m(\hat\xi)\big)'\, W\, \big(m(\xi) - m(\hat\xi)\big), \]

where \(m(\xi)\) are the theoretical moments, \(m(\hat\xi)\) are the empirical moments, and W is the optimal weighting matrix given by the inverse of the variance-covariance matrix of the moment conditions. We approximate W using the two-step estimator outlined in Hall (2005). In the first step, we set W equal to the identity matrix,⁴⁶ and use this to solve the moment conditions for \(\hat\xi\), which we denote \(\hat\xi_1\). Since \(\hat\xi_1\) is consistent, by Slutsky’s theorem the sample residuals \(\hat u\) will also be consistent. We then use these residuals to estimate the variance-covariance matrix of the moment conditions, S, given by Cov(zu), where z are the instruments for the moment conditions. We then minimize

\[ \big(m(\xi) - m(\hat\xi)\big)'\, \hat W\, \big(m(\xi) - m(\hat\xi)\big) \]

using \(\hat W = \hat S^{-1}\), which gives the optimal \(\hat\xi\) (Hansen, 1982).

⁴⁶ One other common approach is to use \((zz')^{-1}\) as the weighting matrix in the first stage, where z is a vector of the instruments in the moment equations. We confirmed our standard errors and point estimates are the same under both choices.

D.2 Implications of heterogeneity for our parameter estimates

Consider a first-order, linear approximation to person i’s expected attendance, \(A_i(p) = \lambda^0_i + \lambda^1_i \beta_i (b_i + p)\). The forecasted attendance curve is given by \(\tilde A_i(p) = \lambda^0_i + \lambda^1_i \tilde\beta_i (b_i + p)\), and the desired attendance curve is given by \(A^*_i(p) = \lambda^0_i + \lambda^1_i (b_i + p)\). The behavior change premium is then given by \(BCP_i(p, \Delta) = (1 - \tilde\beta_i)(b_i + p + \Delta/2)\lambda^1_i \tilde\beta_i\). We show that we can recover \(E[\beta_i]\), \(E[\tilde\beta_i]\) and \(E[b_i]\) from the population averages \(\bar A(p)\), \(\bar{\tilde A}(p)\), and \(E[BCP_i(p, \Delta)]\).
In other words, if one assumes that the aggregate forecasted and realized attendance curves and the behavior change premium are generated by a representative agent, the parameters ascribed to that representative agent in fact correspond to the average parameters in the population. We make the following assumptions:

Assumption 1. The parameters \(\tilde\beta_i\), \(b_i\), \(\lambda^1_i\) are mutually independent.

Assumption 2. The parameters \(\beta_i\), \(b_i\), \(\lambda^1_i\) are mutually independent.

Assumption 3. Terms of order \(E[(1 - \tilde\beta_i)^2]\) are negligible.

Proof. Without loss of generality, consider two values of p: \(p_1\) and \(p_2 = p_1 + 1\). Let \(\bar{\tilde A}^{-1}\) denote the inverse of \(\bar{\tilde A}(p)\), which is also approximately linear, by assumption. We then have

\[ E[\tilde A_i(p_2) - \tilde A_i(p_1)] = E[\tilde\beta_i]\,E[\lambda^1_i] \tag{19} \]
\[ E[A_i(p_2) - A_i(p_1)] = E[\beta_i]\,E[\lambda^1_i] \tag{20} \]
\[ \bar{\tilde A}^{-1}(0) = -E[b_i] \tag{21} \]

Since the left-hand side of all three equations above is observed in the data, we can solve for \(E[\tilde\beta_i]E[\lambda^1_i]\), \(E[\beta_i]E[\lambda^1_i]\), and \(E[b_i]\). Next, note that

\[ \begin{aligned} E[BCP_i(p,\Delta)] &= E\big[(1-\tilde\beta_i)(b_i + p + \Delta/2)\lambda^1_i\tilde\beta_i\big] \\ &= E\big[(1-\tilde\beta_i)(b_i + p + \Delta/2)\big]\,E[\tilde\beta_i]\,E[\lambda^1_i] + O\big(E[(1-\tilde\beta_i)^2]\big) \\ &= E[1-\tilde\beta_i]\Big(E[b_i]\,E[\tilde\beta_i]\,E[\lambda^1_i] + (p + \Delta/2)\,E[\tilde\beta_i]\,E[\lambda^1_i]\Big) + O\big(E[(1-\tilde\beta_i)^2]\big) \end{aligned} \]

Since \(E[b_i]E[\tilde\beta_i]E[\lambda^1_i]\) and \(E[\tilde\beta_i]E[\lambda^1_i]\) are identified from the system of equations (19)-(21), we can therefore solve for \(E[1 - \tilde\beta_i]\) given a value of \(E[BCP_i(p,\Delta)]\) for a pair of \((p, \Delta)\). Given a value of \(E[\tilde\beta_i]\), equation (19) then identifies \(E[\lambda^1_i]\), and given the value of \(E[\lambda^1_i]\), equation (20) then identifies \(E[\beta_i]\).

D.3 Details on equilibrium strategies, value functions, and simulated behavior

D.3.1 Equilibrium value functions and strategies

We let f denote the probability density function (PDF) of a random variable given by \(\underline{c} + X\), where X is distributed exponentially with rate parameter \(\lambda\). We let F denote the cumulative distribution function (CDF). As before, T is the total number of periods to which the contract applies.
The exponential distribution provides closed-form solutions for both the conditional expectation and the CDF (with \(\underline{c}\) denoting the lower bound of the cost distribution):

\[ \int_{c=\underline{c}}^{x} c\, f(c)\, dc = \underline{c} + \frac{1}{\lambda}\Big(1 - e^{-\lambda(x - \underline{c})}\Big) - x\, e^{-\lambda(x - \underline{c})} \tag{22} \]
\[ F(x) = 1 - e^{-\lambda(x - \underline{c})} \tag{23} \]

Let \(h_t = \sum_{j=1}^{t-1} a_j\) denote the period-t history summarizing a person’s total attendance in periods 1, . . . , t − 1. Given a contract C, we let \(W^*_t(C, h_t; \beta, \tilde\beta)\) denote a person’s expected utility using the period t − 1 information set and the long-run criterion. Let \(W_t(C, h_t; \beta, \tilde\beta)\) denote a person’s forecast of the expected utility (normalized by \(\beta\)), which may differ from \(W^*_t\) if \(\tilde\beta \neq \beta\). When C is a linear piece-rate incentive of p per attendance,

\[ W^*_t(C, h_t; \beta, \tilde\beta) = (T - t) \cdot \int_{c=\underline{c}}^{\beta(b+p)} (b - c) f(c)\, dc \]
\[ W_t(C, h_t; \beta, \tilde\beta) = (T - t) \cdot \int_{c=\underline{c}}^{\tilde\beta(b+p)} (b - c) f(c)\, dc \]

and in each period a person chooses to attend the gym if and only if \(\beta(b + p) \ge c_t\).

We now characterize \(W^*_t\) and \(W_t\) when C is a contract where participants lose p if they don’t attend at least r times. We start with the sophisticated case where \(\beta = \tilde\beta\). In period T,

\[ W^*_T(h_T) = \begin{cases} \int_{c=\underline{c}}^{\beta b} (b - c) f(c)\, dc & \text{if } h_T \ge r \\ \int_{c=\underline{c}}^{\beta(b+p)} (b - c) f(c)\, dc - \big(1 - F(\beta(b+p))\big)\, p & \text{if } h_T = r - 1 \\ \int_{c=\underline{c}}^{\beta b} (b - c) f(c)\, dc - p & \text{if } h_T < r - 1 \end{cases} \]

Now, for any history h, define \(\Delta W^*_{t+1}(h) := W^*_{t+1}(h + 1) - W^*_{t+1}(h)\). Then a person chooses to attend the gym in period t if and only if \(\beta(b + \Delta W^*_{t+1}(h_t)) \ge c_t\). For t < T, we have the following recursion on the value functions:

\[ W^*_t(h_t) = \int_{c=\underline{c}}^{\beta(b + \Delta W^*_{t+1}(h_t))} \big(b + W^*_{t+1}(h_t + 1) - c\big) f(c)\, dc + \int_{c=\beta(b + \Delta W^*_{t+1}(h_t))}^{\infty} W^*_{t+1}(h_t) f(c)\, dc. \tag{24} \]

Note that (22) and (23) imply that the expression in (24) above has a closed-form solution for \(W_t\) given a value function \(W_{t+1}\).

Next, note that \(W_t(C, h_t; \beta, \tilde\beta) = W^*_t(C, h_t; \tilde\beta, \tilde\beta)\), meaning that subjective expectations of utility of partial naifs are immediately implied by the recursion for sophisticates.
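The recursion for sophisticates can be evaluated by backward induction using the closed forms in (22) and (23). The sketch below is one possible implementation under stated assumptions rather than the authors' code: the function names and the dictionary layout `W[(t, h)]` are hypothetical, the threshold is clamped at the lower bound of the cost distribution, and any parameter values passed in are up to the user.

```python
import math


def F(x, lam, c_min=0.0):
    """CDF of c = c_min + X with X exponential at rate lam, as in (23)."""
    return 1.0 - math.exp(-lam * (x - c_min)) if x > c_min else 0.0


def partial_mean(x, lam, c_min=0.0):
    """Closed form of the integral of c * f(c) from c_min to x, as in (22)."""
    if x <= c_min:
        return 0.0
    decay = math.exp(-lam * (x - c_min))
    return c_min + (1.0 - decay) / lam - x * decay


def flow(k, x, lam, c_min=0.0):
    """Integral of (k - c) * f(c) from c_min to x, for a constant k."""
    return k * F(x, lam, c_min) - partial_mean(x, lam, c_min)


def sophisticate_values(beta, b, lam, p, r, T=28, c_min=0.0):
    """Backward recursion for a contract that costs p unless visits >= r.

    Returns a dict W[(t, h)] for periods t = 1..T and histories h = 0..t-1.
    """
    W = {}
    for h in range(T):  # terminal period T
        if h >= r:
            W[(T, h)] = flow(b, beta * b, lam, c_min)
        elif h == r - 1:
            x = beta * (b + p)
            W[(T, h)] = flow(b, x, lam, c_min) - (1.0 - F(x, lam, c_min)) * p
        else:
            W[(T, h)] = flow(b, beta * b, lam, c_min) - p
    for t in range(T - 1, 0, -1):  # recursion (24)
        for h in range(t):
            stay, go = W[(t + 1, h)], W[(t + 1, h + 1)]
            x = beta * (b + go - stay)  # attendance threshold for the cost draw
            W[(t, h)] = flow(b + go, x, lam, c_min) + (1.0 - F(x, lam, c_min)) * stay
    return W
```

Because the forecast of a partial naif equals the sophisticate's value function evaluated at the perceived present focus parameter, calling `sophisticate_values` with that perceived parameter in place of `beta` yields the forecasted value function.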
In period T,

\[ W_T(C, h_T; \beta, \tilde\beta) = \begin{cases} \int_{c=\underline{c}}^{\tilde\beta b} (b - c) f(c)\, dc & \text{if } h_T \ge r \\ \int_{c=\underline{c}}^{\tilde\beta(b+p)} (b - c) f(c)\, dc - \big(1 - F(\tilde\beta(b+p))\big)\, p & \text{if } h_T = r - 1 \\ \int_{c=\underline{c}}^{\tilde\beta b} (b - c) f(c)\, dc - p & \text{if } h_T < r - 1 \end{cases} \]

while \(W^*_T(C, h_T; \beta, \tilde\beta) = W^*_T(C, h_T; \beta, \beta)\). For any history h, define \(\Delta W_{t+1}(h) := W_{t+1}(h + 1) - W_{t+1}(h)\). In period t, a person chooses to attend the gym if and only if \(\beta(b + \Delta W_{t+1}(h_t)) \ge c_t\). For t < T, we have the following recursion on the value functions:

\[ W^*_t(C, h_t; \beta, \tilde\beta) = \int_{c=\underline{c}}^{\beta(b + \Delta W_{t+1}(h_t))} \big(b + W^*_{t+1}(h_t + 1) - c\big) f(c)\, dc + \int_{c=\beta(b + \Delta W_{t+1}(h_t))}^{\infty} W^*_{t+1}(h_t) f(c)\, dc. \tag{25} \]

A person’s incremental gain from the contract is given by \(W^*_0(C, h_t; \beta, \tilde\beta) - W^*_0(\emptyset, h_t; \beta, \tilde\beta)\), where \(\emptyset\) denotes the absence of a contract.

D.3.2 Simulating the impacts of contracts on behavior

Under a piece-rate incentive of p per attendance, a person attends in period t if and only if \(\beta(b + p) \ge c_t\), and thus the impact of a piece-rate incentive on behavior is simply \(F(\beta(b + p)) - F(\beta b)\), for which an analytic solution is given by (23).

An analytic solution does not exist for the impacts of commitment contracts. We thus study the effects using simulation methods. Specifically, we simulate attendance under a commitment contract over 10,000 draws of a T-period cost vector \((c_1, c_2, \ldots, c_T)\), where each \(c_t\) is an independent draw from the exponential distribution with CDF F. In each draw, a person’s behavior in each period can be computed recursively by “forward induction,” i.e., first computing behavior in period t = 1, then t = 2, and so forth. In particular, in period 1, a person chooses \(a_1 = 1\) if

\[ c_1 \le \beta\big[b + W_2(C, 1; \beta, \tilde\beta) - W_2(C, 0; \beta, \tilde\beta)\big]. \]

For periods t > 1, a person chooses \(a_t = 1\) if

\[ c_t \le \beta\big[b + W_{t+1}(C, h_t + 1; \beta, \tilde\beta) - W_{t+1}(C, h_t; \beta, \tilde\beta)\big]. \]

D.3.3 Optimal piece-rate incentives for efficient behavior change

Consider a set J of types indexed by j, and having a share \(\mu_j\) in the population.
The efficiency of behavior change under a piece-rate incentive p is given by

\[ WE = T \cdot \sum_{j \in J} \mu_j \int_{c=\beta_j b_j}^{\beta_j(b_j + p)} (b_j - c) f_j(c)\, dc. \]

The first-order condition is

\[ \sum_{j \in J} \mu_j \beta_j \big(b_j(1 - \beta_j) - \beta_j p\big) f_j\big(\beta_j(b_j + p)\big) = 0, \]

which implies that the optimal incentive must satisfy

\[ p = \frac{\sum_{j \in J} \mu_j (1 - \beta_j)\, b_j\, \beta_j\, f_j\big(\beta_j(b_j + p)\big)}{\sum_{j \in J} \mu_j \beta_j^2\, f_j\big(\beta_j(b_j + p)\big)}. \]

For example, under homogeneity, the optimal value of p is simply \((1 - \beta)b/\beta\). We verify numerically that there is a unique value of p satisfying the condition above in the heterogeneous cases that we study.

D.4 Additional structural estimation results

Table A13: Additional parameter estimates

| Row | Sample | (1) \(\hat\beta\) | (2) \(\hat{\tilde\beta}\) | (3) \(\hat b\) | (4) \(1/\hat\lambda\) | (5) \((1-\hat\beta)\cdot\hat b\) | (6) \((1-\hat{\tilde\beta})/(1-\hat\beta)\) |
|---|---|---|---|---|---|---|---|
| 1 | All (N=1,126) | 0.55 (0.51, 0.58) | 0.84 (0.80, 0.88) | 9.66 (9.05, 10.28) | 14.81 (13.61, 16.00) | 4.39 (4.02, 4.77) | 0.36 (0.29, 0.43) |
| 2 | Waves 1 and 2 (N=849) | 0.56 (0.52, 0.60) | 0.84 (0.79, 0.89) | 9.64 (8.92, 10.36) | 14.94 (13.53, 16.35) | 4.23 (3.78, 4.67) | 0.36 (0.27, 0.45) |
| 3 | Waves 2 and 3 (N=786) | 0.53 (0.49, 0.57) | 0.81 (0.76, 0.86) | 10.07 (9.29, 10.84) | 14.70 (13.18, 16.22) | 4.75 (4.27, 5.23) | 0.40 (0.31, 0.49) |
| 4 | Chose 8+ visit contract (N=546) | 0.54 (0.49, 0.59) | 0.84 (0.77, 0.90) | 9.16 (8.34, 9.98) | 14.23 (12.51, 15.96) | 4.23 (3.70, 4.76) | 0.36 (0.24, 0.47) |
| 5 | Chose 12+ visit contract (N=556) | 0.50 (0.45, 0.54) | 0.81 (0.75, 0.88) | 9.62 (8.78, 10.47) | 12.33 (10.86, 13.81) | 4.84 (4.31, 5.38) | 0.37 (0.26, 0.47) |
| 6 | Chose 16+ visit contract (N=275) | 0.47 (0.39, 0.55) | 0.75 (0.63, 0.86) | 10.30 (8.94, 11.67) | 10.33 (8.22, 12.44) | 5.46 (4.57, 6.34) | 0.48 (0.33, 0.64) |
| 7 | Rejected 8+ visit contract (N=303) | 0.61 (0.55, 0.67) | 0.86 (0.81, 0.92) | 10.64 (9.23, 12.04) | 16.69 (14.37, 19.00) | 4.13 (3.39, 4.86) | 0.35 (0.24, 0.47) |
| 8 | Rejected 12+ visit contract (N=570) | 0.59 (0.55, 0.64) | 0.86 (0.82, 0.89) | 9.46 (8.59, 10.32) | 17.26 (15.55, 18.98) | 3.84 (3.36, 4.32) | 0.35 (0.27, 0.43) |
| 9 | Rejected 16+ visit contract (N=574) | 0.58 (0.54, 0.62) | 0.85 (0.81, 0.89) | 9.11 (8.28, 9.94) | 16.70 (15.09, 18.30) | 3.83 (3.37, 4.29) | 0.36 (0.28, 0.43) |

Notes: This table reports parameter estimates and respective 95% confidence intervals for various subsamples. The subsamples are determined by the participants’ take-up of the various commitment contracts for more visits, or the wave in which they participated. Section 7.1 describes how the parameter estimation was performed. The present focus parameter is denoted by \(\beta\), the perceived present focus parameter is denoted by \(\tilde\beta\), people’s (perceived) health benefits of a gym attendance are denoted by b, and people’s expected costs of a gym attendance are denoted by \(1/\lambda\). Inference for the statistics in columns 4-6 is conducted using the Delta method. All participants faced a take-up decision about a commitment contract with a 12-visit threshold (N=1,248), while the 8-visit and 16-visit commitment contracts were only presented in the first two waves (N=849). The samples exclude participants in wave 3 assigned a commitment contract (122 participants), rather than a piece-rate incentive, as our structural estimates only make use of data about how participants behave under piece-rate incentives.

Figure A4: Structural models’ in-sample fit to participants’ forecasted and realized attendance

[Figure with two panels: (a) Homogeneous structural parameters; (b) Heterogeneous structural parameters. Each panel plots visits (vertical axis, 0 to 20) against the per-visit incentive in dollars (horizontal axis, 0 to 12), showing predicted expected visits and predicted realized visits with 95% CIs, together with average expected visits and average realized visits.]

Notes: These figures assess the structural models’ fit to participants’ subjective expectations of attendance and actual attendance.
Panel (a) considers the specification in row 1 of Table 7. Panel (b) considers the structural model with eight heterogeneous types, as in row 9 of Table 7. The empirical estimates of realized attendance and subjective expectations of attendance are as in Figure 2.

Table A14: Parameter estimates excluding subjects flagged for some form of confusion

| Row | Sample | (1) \(\hat\beta\) | (2) \(\hat{\tilde\beta}\) | (3) \(\hat b\) | (4) \(1/\hat\lambda\) | (5) \((1-\hat\beta)\cdot\hat b\) | (6) \((1-\hat{\tilde\beta})/(1-\hat\beta)\) |
|---|---|---|---|---|---|---|---|
| 1 | All (N=1,031) | 0.55 (0.51, 0.59) | 0.84 (0.80, 0.88) | 9.39 (8.79, 9.99) | 14.56 (13.36, 15.77) | 4.22 (3.84, 4.59) | 0.36 (0.28, 0.43) |
| 2 | Information control (N=516) | 0.55 (0.51, 0.59) | 0.87 (0.84, 0.91) | 9.88 (8.99, 10.78) | 15.08 (13.53, 16.64) | 4.43 (3.95, 4.91) | 0.28 (0.20, 0.36) |
| 3 | Enhanced information treatment (N=349) | 0.54 (0.46, 0.62) | 0.77 (0.67, 0.87) | 9.34 (8.33, 10.35) | 14.14 (11.62, 16.66) | 4.31 (3.53, 5.10) | 0.50 (0.34, 0.65) |
| 4 | Below-median past attendance (N=502) | 0.40 (0.35, 0.45) | 0.79 (0.71, 0.87) | 7.03 (6.41, 7.64) | 13.92 (12.00, 15.84) | 4.24 (3.77, 4.72) | 0.35 (0.23, 0.46) |
| 5 | Above-median past attendance (N=529) | 0.67 (0.63, 0.71) | 0.89 (0.85, 0.93) | 11.98 (10.90, 13.06) | 15.16 (13.62, 16.71) | 3.93 (3.41, 4.44) | 0.34 (0.24, 0.44) |
| 6 | Chose 8+ visit contract (N=510) | 0.55 (0.49, 0.60) | 0.83 (0.76, 0.91) | 8.69 (7.92, 9.46) | 13.57 (11.88, 15.26) | 3.95 (3.43, 4.46) | 0.37 (0.24, 0.49) |
| 7 | Chose 12+ visit contract (N=507) | 0.49 (0.45, 0.54) | 0.82 (0.75, 0.89) | 9.12 (8.32, 9.91) | 11.81 (10.36, 13.25) | 4.61 (4.10, 5.12) | 0.36 (0.25, 0.47) |
| 8 | Chose 16+ visit contract (N=253) | 0.48 (0.40, 0.56) | 0.76 (0.64, 0.88) | 9.25 (8.07, 10.43) | 9.53 (7.63, 11.44) | 4.81 (4.02, 5.60) | 0.46 (0.29, 0.63) |
| 9 | Averaging heterogeneity (N=865) | 0.56 (0.52, 0.59) | 0.85 (0.81, 0.89) | 9.96 (9.23, 10.69) | 15.44 (14.12, 16.76) | 4.08 (3.70, 4.45) | 0.34 (0.26, 0.41) |

Notes: This table performs parameter estimation identical to Table 7 in the body of the paper, but excludes participants flagged for potential confusion.
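The simulation procedure described in Section D.3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, costs are drawn from an exponential distribution with a lower bound of zero, the perceived value function is obtained from the sophisticate recursion evaluated at the perceived present focus parameter, and the final-period decision uses the penalty p as the continuation gain when the person is one visit short of the threshold r.

```python
import math
import random


def F(x, lam):
    """Exponential CDF with rate lam and lower bound zero."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def flow(k, x, lam):
    """Integral of (k - c) * f(c) from 0 to x, using closed forms (22)-(23)."""
    if x <= 0:
        return 0.0
    decay = math.exp(-lam * x)
    return k * (1.0 - decay) - ((1.0 - decay) / lam - x * decay)


def perceived_values(beta_tilde, b, lam, p, r, T):
    """Forecasted value function W[(t, h)] from the backward recursion."""
    W = {}
    for h in range(T):
        if h >= r:
            W[(T, h)] = flow(b, beta_tilde * b, lam)
        elif h == r - 1:
            x = beta_tilde * (b + p)
            W[(T, h)] = flow(b, x, lam) - (1.0 - F(x, lam)) * p
        else:
            W[(T, h)] = flow(b, beta_tilde * b, lam) - p
    for t in range(T - 1, 0, -1):
        for h in range(t):
            stay, go = W[(t + 1, h)], W[(t + 1, h + 1)]
            x = beta_tilde * (b + go - stay)
            W[(t, h)] = flow(b + go, x, lam) + (1.0 - F(x, lam)) * stay
    return W


def simulate_mean_visits(beta, beta_tilde, b, lam, p, r, T=28, draws=10_000, seed=0):
    """Average visits over simulated T-period cost vectors, computed forward in time."""
    W = perceived_values(beta_tilde, b, lam, p, r, T)
    rng = random.Random(seed)
    total = 0
    for _ in range(draws):
        h = 0  # visits accumulated so far
        for t in range(1, T + 1):
            if t < T:
                gain = W[(t + 1, h + 1)] - W[(t + 1, h)]
            else:
                gain = p if h == r - 1 else 0.0  # last chance to avoid the penalty
            if rng.expovariate(lam) <= beta * (b + gain):
                h += 1
        total += h
    return total / draws
```

Comparing `simulate_mean_visits` with and without a penalty gives a simulated analogue of the average change in attendance; setting p to zero reproduces behavior without a contract.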
D.5 Welfare effects of other commitment contracts

Table A15: Estimated welfare effects of piece-rates and commitment contracts

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 8+ visits contract | 0.77 | −$5.09 | $6.41 | $6.14 | $0.27 |
| 2 | Linear incentive, p = $1.21 | 0.77 | $14.42 | $8.18 | $5.26 | $2.93 |
| 3 | 16+ visits contract | 1.43 | −$3.40 | $15.00 | $12.05 | $2.94 |
| 4 | Linear incentive, p = $2.24 | 1.43 | $27.75 | $14.77 | $9.70 | $5.06 |

Notes: Analogous to Table 9, this table reports the estimated effects of four different incentive schemes, averaged over the full population. There are eight heterogeneous types in all rows. In rows 1 and 2, we assume that there are eight types of individuals, corresponding to eight subgroups: below- or above-median past attendance, crossed with receiving either the enhanced information treatment or no information treatment, crossed with choosing the 8+ commitment contract. In rows 3 and 4, we assume that there are eight types of individuals, corresponding to eight subgroups: below- or above-median past attendance, crossed with receiving either the enhanced information treatment or no information treatment, crossed with choosing the 16+ commitment contract.

D.6 Welfare estimates for alternative specifications of heterogeneity

Table A16: Estimated welfare effects of piece-rates and commitment contracts, homogeneity

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 12+ visits contract | 1.51 | −$3.82 | $14.49 | $14.86 | −$0.38 |
| 2 | Linear incentive, p = $2.15 | 1.51 | $26.91 | $14.37 | $8.67 | $5.70 |
| 3 | Optimal linear incentive, p = $7.98 | 5.04 | $118.61 | $48.07 | $36.53 | $11.53 |
| 4 | 8+ visits contract | 0.63 | −$1.39 | $5.81 | $6.08 | −$0.28 |
| 5 | Linear incentive, p = $0.88 | 0.63 | $10.57 | $6.13 | $3.62 | $2.50 |
| 6 | 16+ visits contract | 1.64 | −$3.46 | $16.88 | $16.69 | $0.20 |
| 7 | Linear incentive, p = $2.32 | 1.64 | $29.80 | $15.61 | $9.42 | $6.19 |

Notes: This table reports welfare effects for the incentive schemes considered in Tables 9 and A15 along with several others, but under different assumptions about heterogeneity. In this table, we assume that individuals are homogeneous conditional on their choice of contract, as in row 2 of Table 8 (and its analogues for rows 4/5 and rows 6/7).

Table A17: Estimated welfare effects of piece-rates and commitment contracts, heterogeneity along past attendance (below/above median)

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 12+ visits contract | 1.29 | −$9.10 | $10.84 | $9.83 | $1.02 |
| 2 | Linear incentive, p = $1.97 | 1.29 | $23.72 | $12.67 | $7.99 | $4.68 |
| 3 | Optimal linear incentive, p = $7.83 | 4.61 | $111.78 | $45.17 | $35.17 | $10.00 |
| 4 | 8+ visits contract | 0.91 | −$5.07 | $6.53 | $6.80 | −$0.27 |
| 5 | Linear incentive, p = $1.37 | 0.91 | $16.27 | $9.07 | $5.77 | $3.30 |
| 6 | 16+ visits contract | 1.31 | −$5.95 | $12.60 | $11.91 | $0.70 |
| 7 | Linear incentive, p = $1.98 | 1.31 | $24.22 | $12.86 | $8.15 | $4.71 |

Notes: This table reports welfare effects for the incentive schemes considered in Tables 9 and A15 along with several others, but under different assumptions about heterogeneity. In this table, we make the heterogeneity assumption in row 4 of Table 8 (and its analogues for rows 4/5 and rows 6/7).
Table A18: Estimated welfare effects of piece-rates and commitment contracts, heterogeneity along past attendance (quartile)

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 12+ visits contract | 1.35 | −$9.82 | $11.04 | $10.17 | $0.86 |
| 2 | Linear incentive, p = $2.15 | 1.35 | $25.74 | $13.48 | $8.65 | $4.83 |
| 3 | Optimal linear incentive, p = $7.74 | 4.42 | $108.70 | $43.75 | $34.23 | $9.52 |
| 4 | 8+ visits contract | 0.91 | −$7.29 | $6.64 | $6.40 | $0.24 |
| 5 | Linear incentive, p = $1.43 | 0.91 | $16.85 | $9.25 | $6.04 | $3.20 |
| 6 | 16+ visits contract | 1.25 | −$6.82 | $11.09 | $10.41 | $0.68 |
| 7 | Linear incentive, p = $1.95 | 1.25 | $23.52 | $12.39 | $8.06 | $4.34 |

Notes: This table reports welfare effects for the incentive schemes considered in Tables 9 and A15, along with several others, but under different assumptions about heterogeneity. In this table, we make the heterogeneity assumption of row 5 of Table 8 (and its analogues for rows 4/5 and rows 6/7).

D.7 How commitment contracts affect attendance over time

Figure A5: Simulated probability of attendance each day, chose 12+ visits contract

[Figure: probability of going to the gym (vertical axis, 0 to 1) plotted against Day (horizontal axis, 0 to 28) for three series: Baseline, With 12+ visits contract, and First best.]

Notes: This figure displays the simulated probability of attending the gym each day, under the heterogeneity assumptions of Table 9.

Figure A6: Change in likelihood of attendance each day, chose 12+ visits contract

[Figure: change in likelihood of going to the gym (vertical axis, −0.2 to 0.5) plotted against Day (horizontal axis, 0 to 28).]

Notes: This figure displays the estimated change in the likelihood of attending the gym each day from assignment to the “more” contract with a threshold of 12 visits. Estimates are obtained from an OLS regression of gym attendance on indicators for each day and their interactions with an indicator for assignment to the contract.
The coefficients on the interaction terms are plotted with 95% confidence intervals, obtained from standard errors clustered at the subject level. The sample is limited to participants who wanted the contract and were exogenously assigned to either receive the contract or to receive no incentives. A line is plotted with an intercept and slope equal to the coefficients on 12+ visits contract and Day × 12+ visits contract, respectively, from the regression in Table A19.

Table A19: Daily likelihood of attendance, chose 12+ visits contract

| | Attendance likelihood (1) |
|---|---|
| Day | –0.005*** (0.001) |
| 12+ visits contract | 0.051 (0.045) |
| Day × 12+ visits contract | 0.005** (0.002) |
| Wave FEs | Yes |
| N | 7,336 |
| Clusters | 262 |

Notes: This table reports the estimated change in the likelihood of attending the gym each day by assignment to the “more” contract with a threshold of 12 visits. Day is an index for the day in the 4-week study period, from 1 to 28, and 12+ visits contract is an indicator for assignment to the contract. The table presents coefficient estimates and standard errors clustered at the subject level in parentheses from an OLS regression. The sample is limited to participants who wanted the contract and were exogenously assigned to either receive the contract or to receive no incentives. **,*** denote statistics that are statistically significantly different from 0 at the 5% and 1% level respectively.

D.8 Alternative assumptions about the cost distribution

We consider models in which c ∼ −$5 + X or c ∼ $10 + X, where X is exponentially distributed with rate \(\lambda\). The first assumption corresponds to the net immediate costs being negative on “good” days, while the second assumption corresponds to the minimal net cost being equivalent to $10.
The parameter estimates naturally change, but in a manner that worsens both the in-sample and out-of-sample fit of the model. Higher mean costs lead to a higher estimate of perceived health benefits b; this, in turn, leads to lower estimates of \((1 - \tilde\beta)\) and \((1 - \beta)\) because the wedges between the actual, forecasted, and desired attendance are functions of \((1 - \beta)b\) and \((1 - \tilde\beta)b\).

The in-sample fit to the actual and forecasted attendance curves does not suffer when we assume the higher cost-draw distribution, but it worsens significantly when we assume the lower cost-draw distribution, as shown in Appendix Figure A8.

The out-of-sample fit to the effects of the 12+ commitment contracts worsens dramatically for both assumptions. The higher distribution of cost draws leads the model to predict that commitment contracts have too large an effect on the probability of attending the gym 12 or more times, while the lower distribution of cost draws leads the model to predict that commitment contracts have too small an effect on both average attendance and the probability of attending the gym 12 or more times.

D.8.1 Minimal cost draw of $10

Figure A7: Structural models’ in-sample fit to participants’ forecasted and realized attendance

[Figure with two panels: (a) All; (b) Averaging heterogeneity. Each panel plots visits (vertical axis, 0 to 20) against the per-visit incentive in dollars (horizontal axis, 0 to 12), showing predicted expected visits and predicted realized visits with 95% CIs, together with average expected visits and average realized visits.]

Notes: This figure replicates Figure A4, but assumes that the distribution of cost draws is given by 10 + X, where X is an exponentially distributed random variable.

Table A20: Estimated impact of 12+ contract on attendance

| Row | Specification | (1) ∆ in att. | (2) Pr(att. ≥ 12) with contract | (3) Pr(att. ≥ 12) without contract | (4) ∆ in Pr(att. ≥ 12) |
|---|---|---|---|---|---|
| 1 | Empirical | 3.51 (1.38, 5.65) | 0.65 (0.52, 0.78) | 0.22 (0.10, 0.35) | 0.42 (0.26, 0.58) |
| 2 | Homogeneous | 3.78 | 0.96 | 0.10 | 0.86 |
| 3 | Heterogeneous by median past att., info. treatment | 3.80 | 0.89 | 0.30 | 0.58 |
| 4 | Heterogeneous by median past att. | 3.99 | 0.91 | 0.30 | 0.61 |
| 5 | Heterogeneous by quartile past att. | 4.24 | 0.90 | 0.29 | 0.61 |
| 6 | Heterogeneous by quartile past att., info. treatment | 4.03 | 0.89 | 0.31 | 0.59 |

Notes: This table replicates Table 8, but assumes that the distribution of cost draws is given by 10 + X, where X is an exponentially distributed random variable.

Table A21: Estimated welfare effects of piece-rates and commitment contracts

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 12+ visits contract | 1.88 | −$2.01 | $38.03 | $35.64 | $2.39 |
| 2 | Linear incentive, p = $2.21 | 1.88 | $30.33 | $39.19 | $30.61 | $8.58 |
| 3 | Optimal linear incentive, p = $7.34 | 5.56 | $114.84 | $115.99 | $100.26 | $15.74 |
| 4 | 8+ visits contract | 1.15 | −$1.06 | $22.46 | $21.61 | $0.85 |
| 5 | Linear incentive, p = $1.36 | 1.15 | $18.20 | $24.27 | $18.76 | $5.51 |
| 6 | 16+ visits contract | 1.76 | −$1.53 | $39.83 | $36.81 | $3.02 |
| 7 | Linear incentive, p = $2.12 | 1.76 | $29.22 | $37.37 | $29.11 | $8.26 |

Notes: This table replicates Tables 9 and A15, but assumes that the distribution of cost draws is given by 10 + X, where X is an exponentially distributed random variable.
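To see how the minimal cost draw enters the model, the daily attendance probability under a shifted exponential cost distribution can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and it holds the other parameters fixed, whereas the appendix re-estimates all parameters under each cost assumption.

```python
import math


def attend_prob(beta, b, p, lam, c_min=0.0):
    """Daily attendance probability F(beta * (b + p)) when costs are c_min + X,
    with X exponentially distributed at rate lam."""
    x = beta * (b + p)
    return 1.0 - math.exp(-lam * (x - c_min)) if x > c_min else 0.0


def piece_rate_effect(beta, b, p, lam, c_min=0.0):
    """Change in the daily attendance probability from a piece-rate incentive p."""
    return attend_prob(beta, b, p, lam, c_min) - attend_prob(beta, b, 0.0, lam, c_min)
```

With a minimal cost of $10 and fixed parameters, attendance is zero whenever the discounted benefit plus incentive falls below $10, which is consistent with the estimate of b rising once mean costs are assumed to be higher.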
D.8.2 Minimal cost draw of −$5

Figure A8: Structural models’ in-sample fit to participants’ forecasted and realized attendance

[Figure with two panels: (a) All; (b) Averaging heterogeneity. Each panel plots visits (vertical axis, 0 to 20) against the per-visit incentive in dollars (horizontal axis, 0 to 12), showing predicted expected visits and predicted realized visits with 95% CIs, together with average expected visits and average realized visits.]

Notes: This figure replicates Figure A4, but assumes that the distribution of cost draws is given by −5 + X, where X is an exponentially distributed random variable.

Table A22: Estimated impact of 12+ contract on attendance

| Row | Specification | (1) ∆ in att. | (2) Pr(att. ≥ 12) with contract | (3) Pr(att. ≥ 12) without contract | (4) ∆ in Pr(att. ≥ 12) |
|---|---|---|---|---|---|
| 1 | Empirical | 3.51 (1.38, 5.65) | 0.65 (0.52, 0.78) | 0.22 (0.10, 0.35) | 0.42 (0.26, 0.58) |
| 2 | Homogeneous | 1.57 | 0.78 | 0.33 | 0.45 |
| 3 | Heterogeneous by median past att., info. treatment | 0.64 | 0.58 | 0.41 | 0.17 |
| 4 | Heterogeneous by median past att. | 0.63 | 0.58 | 0.41 | 0.17 |
| 5 | Heterogeneous by quartile past att. | 0.69 | 0.57 | 0.39 | 0.18 |
| 6 | Heterogeneous by quartile past att., info. treatment | 0.70 | 0.59 | 0.39 | 0.19 |

Notes: This table replicates Table 8, but assumes that the distribution of cost draws is given by −5 + X, where X is an exponentially distributed random variable.

Table A23: Estimated welfare effects of piece-rates and commitment contracts

| Row | Incentive scheme | (1) Avg. ∆ in attendance | (2) ∆ Agent surplus | (3) ∆ Health benefits | (4) ∆ Attendance costs | (5) ∆ Social surplus |
|---|---|---|---|---|---|---|
| 1 | 12+ visits contract | 0.32 | −$16.27 | $1.85 | $1.65 | $0.20 |
| 2 | Linear incentive, p = $0.86 | 0.32 | $9.51 | $1.94 | $1.08 | $0.86 |
| 3 | Optimal linear incentive, p = $6.12 | 2.09 | $75.70 | $12.73 | $9.60 | $3.12 |
| 4 | 8+ visits contract | 0.19 | −$10.25 | $0.91 | $0.66 | $0.25 |
| 5 | Linear incentive, p = $0.55 | 0.19 | $6.02 | $1.28 | $0.71 | $0.58 |
| 6 | 16+ visits contract | 0.50 | −$11.49 | $3.75 | $4.19 | −$0.44 |
| 7 | Linear incentive, p = $1.41 | 0.50 | $15.88 | $3.10 | $1.82 | $1.28 |

Notes: This table replicates Tables 9 and A15, but assumes that the distribution of cost draws is given by −5 + X, where X is an exponentially distributed random variable.

D.9 Dollar value of exercise from public health estimates

We provide two “back of the envelope” calculations of the dollar benefit of an hour of exercise. Our goal is not to provide a comprehensive review of the literature on the value of exercise, but to demonstrate that the literature provides a range of possible values.

Sun et al. (2014) find a median difference of 0.112 Quality Adjusted Life Years (QALYs) between a group that was inactive over a two-year period and a group that exercised on average at least 2.5 hours per week over the two-year period, controlling for sociodemographic characteristics (age, race/ethnicity, living arrangement, income, and education) and health status (e.g., smoking and BMI). If we adopt 50,000 dollars as the value for a QALY (Neumann, Cohen, and Weinstein, 2014), the benefit from an hour of exercise, with 104 being the number of weeks in the two-year period, is:

\[ 0.112 \times \$50{,}000 / (2.5 \times 104) = \$21.5 \]

Despite the inclusion of control variables, this study likely overstates the causal effect of exercise because it does not control for other factors that may affect the difference in QALYs between the two groups, such as diet before and during the period of study and exercise before the period of study.

Blair et al.
(1989) examine the association between mortality risk and exercise over a fifteen-year period among a population of healthy non-geriatric adults. They find that a male who moved from the least fit quintile to the average of the other four quintiles would reduce his chances of dying by 36.7%, and a female who made a similar move would reduce her chances of dying by 48.4%. The authors also find that a brisk walk of 30 to 60 minutes each day would be sufficient to move an individual to a plateau where further exercise would not further lower the risk of death.

If we assume that 45 minutes per day of exercise would at least move a person out of the lowest quintile of exercise and into the upper four quintiles (a smaller change than reaching the plateau), then it would lead to the reported reductions in mortality (36.7% for men and 48.4% for women). The paper reports an age-adjusted all-cause mortality rate of 64 per 10,000 person-years among men in the lowest quintile of exercise and 39.5 per 10,000 person-years among women in the lowest quintile. The sample in our study is 61.3% female and 38.7% male with an average age of 34 years. Assuming men at age 34 have a death rate of 161 per 100,000 and women at age 34 have a death rate of 85 per 100,000, the weighted average reduction in the death rate from this level of exercise for an individual at age 34 in our sample is⁴⁷

\[ \text{reduction in death rate} = 0.387 \times 0.367 \times \frac{161}{100{,}000} + 0.613 \times 0.484 \times \frac{85}{100{,}000} = \frac{48.1}{100{,}000} \]

The value of the exercise then depends on the value of remaining life for a 34-year-old.
If we adopt the VSL (value of a statistical life) used by the US Environmental Protection Agency of 9.0 million dollars, we obtain

\[ \frac{48.1}{100{,}000} \times \$9{,}000{,}000 = \$4{,}329 \]

Since the exercise required to achieve this gain was 45 minutes per day, the value of an hour of exercise is:

\[ \$4{,}329 / (0.75 \times 365) = \$15.81 \]

Alternatively, we could assume that a QALY is worth $50,000, use life tables to calculate the probability of survival to each age beyond 34, and calculate the present discounted value (PDV) of life remaining. Using a discount rate of 2%, we calculate $1,431,000 for men and $1,519,000 for women. Performing similar calculations to the ones above for men and women and then taking the weighted average based on the fraction of each gender in the sample, we obtain $2.61 per hour of exercise.

Since part of the reason for discounting is to take account of the decreasing probability of survival at higher ages, it may be appropriate to apply an even lower discount rate. If we assume a discount rate of 0% so that the decrease in the contribution of QALYs at higher ages is entirely attributable to a decreased probability of survival, the value of life remaining past age 34 increases to $2,189,000 for men and $2,390,000 for women, and the value of an hour of exercise increases to $4.06.

⁴⁷ NCHS, National Vital Statistics System, Mortality. “United States Life Tables, 2014.” National Vital Statistics Reports Vol. 66 No. 4. August 14, 2017.
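The QALY-based and mortality-based calculations in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function and argument names are hypothetical and chosen for readability.

```python
def qaly_value_per_hour(qaly_gain=0.112, qaly_value=50_000,
                        hours_per_week=2.5, weeks=104):
    """Dollar value of an hour of exercise from the QALY difference over two years."""
    return qaly_gain * qaly_value / (hours_per_week * weeks)


def death_rate_reduction(share_male=0.387, reduction_male=0.367, rate_male=161 / 100_000,
                         share_female=0.613, reduction_female=0.484, rate_female=85 / 100_000):
    """Sample-weighted reduction in the annual death rate at age 34."""
    return (share_male * reduction_male * rate_male
            + share_female * reduction_female * rate_female)


def mortality_value_per_hour(reduction, value_of_life=9_000_000,
                             hours_per_day=0.75, days_per_year=365):
    """Dollar value of an hour of exercise implied by a death-rate reduction."""
    return reduction * value_of_life / (hours_per_day * days_per_year)
```

Using the rounded reduction of 48.1 per 100,000 or the unrounded weighted sum gives the same value per hour to the cent. The 2% and 0% discounting variants additionally require life-table survival probabilities and are therefore not reproduced here.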