Can Simple Psychological Interventions Increase Preventive Health Investment?

Abstract
Behavioral constraints may explain part of the low demand for preventive health products. We test the effects of two light-touch psychological interventions on water chlorination and related health and economic outcomes using a randomized controlled trial among 3,750 women in rural Kenya. One intervention encourages participants to visualize alternative realizations of the future, and the other builds participants’ ability to make concrete plans. After 12 weeks, visualization increases objectively measured chlorination, reduces diarrhea episodes among children, and increases savings. Effects on chlorination and savings persist after almost 3 years. Effects of the planning intervention are weaker and largely insignificant. Analysis of mechanisms suggests both interventions increase self-efficacy (beliefs about one’s ability to achieve desired outcomes). Visualization also increases participants’ skill in forecasting their future utility. The interventions do not differentially affect beliefs and knowledge about chlorination. Results suggest simple psychological interventions can increase future-oriented behaviors, including use of preventive health technologies.


Appendix B: Cost-Effectiveness Per Disability-Adjusted Life Year Saved
To compute the cost per disability-adjusted life year (DALY) saved by our interventions, we proceed as follows. The cost of our interventions was USD 4 per household. Because participants have on average 1.08 children under 5 and 2.53 children under 15, this corresponds to USD 3.70 per child under 5 or USD 1.58 per child under 15. This figure includes the costs relevant to a potential scale-up, i.e., the cost of running the sessions including overheads, but excludes our incentivized surveys as well as the cost of sampling and targeting women aged 18 to 35. Troeger et al. (2018) estimate the DALYs lost per child under 5 in Kenya due to diarrhea in 2016, the year before our study was conducted, to be 0.127. This includes both the acute, immediate effects of illness or death and the longer-term burden of disease associated with growth impairment due to diarrhea. Our region is one of the poorer ones in Kenya and likely has worse health outcomes and a higher burden of disease from diarrhea than the Kenya-wide average, so using this figure will lead to a conservative estimate. We focus on children under 5, as these are the focus of the estimates in Troeger et al. (2018). We assume that the treatment effects of our interventions on diarrhea relative to the active control group were never higher during the three-month study period than what we measured at the 3-month endline, and that they went to zero immediately afterwards. This, too, is a conservative assumption: it yields a lower bound on cost-effectiveness (i.e., an upper bound on the cost per DALY saved). For children under 5, Table 2 shows a 47% reduction in under-5 diarrhea in the Visualization group and no significant effect in the Planning group. Accounting for both the acute and long-term effects of diarrhea, DALYs saved per child due to the Visualization intervention are 0.127 × 47% × 3/12 = 0.0149225. The cost of one DALY saved is therefore USD 3.70 / 0.0149225 ≈ USD 248.
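The under-5 arithmetic above can be checked in a few lines. This is a minimal sketch that encodes exactly the stated assumption (a constant proportional reduction in annual diarrhea DALYs that lasts a given number of months and then drops to zero); the function name `cost_per_daly` is ours, not part of the study's materials.

```python
def cost_per_daly(cost_per_child, dalys_lost_per_year, effect, months):
    """Cost (USD) per DALY saved, assuming the proportional reduction `effect`
    in annual diarrhea DALYs lasts `months` months and then falls to zero."""
    dalys_saved = dalys_lost_per_year * effect * months / 12
    return cost_per_child / dalys_saved


# Inputs from the text: USD 4 per household / 1.08 children under 5 (about USD 3.70),
# 0.127 DALYs lost per child under 5 per year (Troeger et al. 2018),
# a 47% reduction in under-5 diarrhea (Visualization), lasting 3 months.
cost_u5 = round(4 / 1.08, 2)  # 3.7
dalys_saved = 0.127 * 0.47 * 3 / 12
print(f"DALYs saved per child: {dalys_saved:.7f}")                     # 0.0149225
print(f"USD per DALY: {cost_per_daly(cost_u5, 0.127, 0.47, 3):.0f}")  # 248
```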
For children under 15, the effect of our interventions on diarrhea relative to the active control group is 46% for Visualization and 23% for Planning. We could not find published estimates for Kenya of DALYs lost annually per child under 15 due to diarrhea, so we use the same figure of 0.127 as for children under 5. This likely overstates the disease burden among children under 15, as older children are less likely to die from diarrhea. However, older children do benefit from a reduction in ongoing enteric dysfunction, which may cause adverse effects such as stunting and impaired cognition across a wide age range; restricting attention to children under 5 would therefore be too narrow. Under this assumption, DALYs saved per child are 0.127 × 46% × 3/12 = 0.014605 for the Visualization intervention and 0.127 × 23% × 3/12 = 0.0073025 for the Planning intervention. Combined with an intervention cost of USD 1.58 per child for both interventions, this implies a cost per DALY saved of USD 1.58 / 0.014605 ≈ USD 108 for the Visualization intervention and USD 1.58 / 0.0073025 ≈ USD 216 for the Planning intervention.
When extrapolating to other contexts, note that these numbers are sensitive to the number of children per treated participant. Estimates are also sensitive to how long the effect on diarrhea lasts: if, instead of three months, the effects lasted for one year and then faded out, the estimated cost per DALY saved for the Visualization intervention would fall to USD 27 (considering children under 15) and USD 62 (considering only children under 5). All our estimates are highly cost-effective by WHO standards. The WHO classifies an intervention as "cost-effective" if its cost per DALY saved is below USD 4525, and as "highly cost-effective" if it is below USD 1508 (https://www.who.int/bulletin/volumes/93/2/14-138206/en/).
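The same formula reproduces the under-15 figures and the one-year sensitivity check, and classifies each estimate against the WHO thresholds quoted above. As before, this sketch assumes a constant proportional effect over the stated duration and applies the 0.127 DALY figure to children under 15, exactly as the text does; the 12-month Planning row is not reported in the text and simply follows mechanically from the same formula.

```python
def cost_per_daly(cost_per_child, dalys_lost_per_year, effect, months):
    """Cost (USD) per DALY saved under a constant effect lasting `months` months."""
    return cost_per_child / (dalys_lost_per_year * effect * months / 12)


def who_tier(usd_per_daly):
    """WHO thresholds as quoted in the text (USD per DALY saved)."""
    if usd_per_daly < 1508:
        return "highly cost-effective"
    if usd_per_daly < 4525:
        return "cost-effective"
    return "not cost-effective"


DALYS_LOST = 0.127  # applied to children under 5 and under 15 alike
SCENARIOS = {  # label: (USD per child, proportional reduction in diarrhea)
    "Visualization, under 5": (3.70, 0.47),
    "Visualization, under 15": (1.58, 0.46),
    "Planning, under 15": (1.58, 0.23),
}

for months in (3, 12):
    for label, (cost, effect) in SCENARIOS.items():
        usd = cost_per_daly(cost, DALYS_LOST, effect, months)
        print(f"{months:>2} months | {label:<24} | USD {usd:5.0f} | {who_tier(usd)}")
```

With a three-month duration the rows give USD 248, 108, and 216; with twelve months the two Visualization rows give USD 62 and 27, matching the figures in the text.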
Policymakers may be interested in how our interventions compare with other cost-effective preventive health interventions, such as insecticide-treated nets. Cost-effectiveness estimates for malaria bednets range from USD 29 to USD 100 per DALY (Wisniewski et al. 2020), depending on assumptions. The WHO estimates a cost per DALY of USD 29 to 34 (https://www.who.int/news-room/feature-stories/detail/newcost-effectiveness-updates-from-who-choice). The higher estimate of USD 100 is from GiveWell (https://forum.effectivealtruism.org/posts/HbunzTyFPRwcYihg6/longlasting-insecticide-treated-nets-usd3-340-per-life). A recent meta-analysis of studies in Africa estimates a cost per DALY between USD 42 and USD 80 (Wisniewski et al. 2020).
Notes: The table reports OLS estimates of heterogeneous treatment effects on diarrhea in the 33-month survey, by season and type of water source. The sample is restricted to individuals who attended the baseline survey. Surveyed during rainy season is an indicator equal to 1 if the respondent was surveyed for the follow-up survey after November 1, 2020. Unprotected water source equals 1 if the household's primary water source is not protected (unprotected well, unprotected spring, rainwater, surface water), and 0 otherwise (private or public tap, borehole or tubewell, protected well or spring). We report standard errors in parentheses. All columns include village-level fixed effects, control for diarrhea at baseline and a vector of individual characteristics, and cluster standard errors at the level of the intervention cohort. As the regressors are potentially endogenous to treatment, all regressions should be interpreted as correlational evidence. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.

Appendix D: Supplementary Evidence on Mechanisms
Table D.2. Correlates of chlorination. Notes: Predictive OLS regressions of chlorination and savings measures on alternative mechanisms and on demographics. All regressors except those measured at baseline are potentially endogenous to treatment and thus provide only correlational evidence. All regressions include village-level fixed effects and the demographic controls listed, and standard errors are clustered at the level of the intervention cohort.
Covariates are listed on the left and are described in detail in Section 4. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.
Notes: Columns (1) and (6) report the mean and standard deviation of the control group. Columns (2)-(3) and (7)-(8) report the coefficients of interest, with standard errors in parentheses. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. Square brackets contain additional p-values corrected for multiple hypothesis testing using the false discovery rate. All columns include village-level fixed effects, a vector of individual characteristics, fixed effects for the week and the day of the week of the relevant survey, and standard errors which are clustered at the level of the intervention cohort. Outcome measures are listed on the left and are described in detail in Section 4.

Appendix E: Pure Control Comparison
Table E.2. Psychological outcomes (comparison with pure control group).
Column groups: Endline (10-12 weeks) and Follow-Up (30-36 months). Notes: Columns (1) and (6) report the mean and standard deviation of the control group. Columns (2)-(3) and (7)-(8) report the coefficients of interest, with standard errors in parentheses. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. Square brackets contain additional p-values corrected for multiple hypothesis testing using the false discovery rate. All columns include village-level fixed effects, a vector of individual characteristics, fixed effects for the week and the day of the week of the relevant survey, and standard errors which are clustered at the level of the intervention cohort. Outcome measures are listed on the left and are described in detail in Section 4.
Table E.3. Alternative mechanisms (comparison with pure control group).
Column groups: Endline (10-12 weeks) and Follow-Up (30-36 months). Notes: Columns (1) and (6) report the mean and standard deviation of the control group. Columns (2)-(3) and (7)-(8) report the coefficients of interest, with standard errors in parentheses. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. All columns include village-level fixed effects, a vector of individual characteristics, fixed effects for the week and the day of the week of the relevant survey, and standard errors which are clustered at the level of the intervention cohort. Outcome measures are listed on the left and are described in detail in Section 4. The bottom panel of the table reports the probability of remembering a chlorine-related word, or a savings-related word, on a given word list in the salience task. In the endline survey, participants were read three word lists, resulting in three observations per individual. In the long-run follow-up, participants were read one word list (randomly selected out of three). Salience regressions additionally control for the total number of words the participant remembered on that word list.

Appendix F: Test Corrections and Experimental Integrity
Table F.1 lists our hypotheses and pre-specified outcome variables. We adjust for multiple hypothesis testing within outcome groups (behaviors and psychological mechanisms) and within hierarchical outcome categories (primary, secondary, and exploratory), but not across them. Behavioral outcomes are our main focus. Our primary hypothesis is that the interventions affect water chlorination, measured with the primary outcomes of objectively measured water chlorination (10 weeks) and self-reported chlorination (30-36 months). In the follow-up, we add a second primary hypothesis, that the interventions affect chlorination-related health outcomes, which we measure as child diarrhea. After 30-36 months, we correct p-values across these two primary hypotheses.
Our secondary hypothesis tests whether the interventions have domain-general effects on future investments. We consider one pre-specified outcome each for savings behavior, labor supply, and education investment (10 weeks), and for savings behavior and labor supply (30-36 months). We adjust p-values across this group of outcomes in each round.
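The FDR-corrected p-values reported in square brackets can be illustrated with the Benjamini-Hochberg step-up procedure, applied separately within each family of hypotheses described above. This is a sketch under an assumption: the text says only that the corrections control the false discovery rate, so the exact FDR procedure used in the paper may differ, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values; each family is corrected separately, never pooled.
families = {
    "primary (30-36 months): chlorination, diarrhea": [0.012, 0.048],
    "secondary (30-36 months): savings, labor supply": [0.030, 0.410],
}
for name, pvals in families.items():
    print(name, [round(q, 3) for q in benjamini_hochberg(pvals)])
```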
For the analysis of psychological outcomes, after 10 weeks we test three main hypotheses, namely that the interventions affect planning, time preferences, and self-efficacy, with one primary variable to capture each concept. We correct p-values over the three hypotheses. After 30-36 months, we examine only time preferences and self-efficacy, as we found few short-term effects on planning measures. We also run exploratory analyses on some pre-specified and some non-pre-specified variables. We correct across all the exploratory tests we run on behaviors and, separately, across those on psychological outcomes.
Notes: Column (1) reports the mean and standard deviation of the control group. Columns (2)-(3) report the coefficients of interest, with standard errors in parentheses. Square brackets contain additional p-values corrected for multiple hypothesis testing using the false discovery rate. All columns include village-level fixed effects, control for a vector of individual characteristics, and cluster standard errors at the level of the intervention cohort. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. Outcome measures are listed on the left and are described in detail in Section 4. The Tower of London is a lab game that measures a participant's ability to plan ahead. The General Self-Efficacy score measures a participant's belief in her own ability to achieve the outcomes she desires. Time preference parameters β and δ measured over money are derived from responses to Multiple Price Lists (MPL).
Notes: OLS estimates of baseline balance on observed characteristics for villages with and without WASH chlorine dispensers. For each variable, column (1) reports the mean for villages without a chlorine dispenser, with the standard deviation in parentheses. Column (2) reports the difference for villages with a chlorine dispenser, with standard errors in parentheses. All standard errors are clustered at the level of the intervention cohort. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.
Table F.5. Attrition analysis: treatments vs. active control. Notes: OLS estimates of the probability of attriting relative to the active control group. For each variable, we report the coefficients of interest, with standard errors in parentheses. Each column represents a different specification, with or without controls and interaction terms, to assess whether (i) there was differential attrition for groups with certain observed characteristics (columns (4)-(6)) and (ii) any observed characteristic had a differential effect on the probability of attriting in any treatment group (columns (7)-(9)). All standard errors are clustered at the level of the intervention cohort. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.
Table F.6. Attrition analysis: active treatments vs. pure control. Notes: OLS estimates of the probability of attriting relative to the pure control group. For each variable, we report the coefficients of interest, with standard errors in parentheses. Each column represents a different specification, with or without controls and interaction terms, to assess whether (i) there was differential attrition for groups with certain observed characteristics (columns (4)-(6)) and (ii) any observed characteristic had a differential effect on the probability of attriting in any treatment group (columns (7)-(9)). All standard errors are clustered at the level of the intervention cohort. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.

G.1. Tower of London Planning Task
In our computerized version of the task, participants see a screen with two parts: on the left side is the word "start" with a picture of three "pegs" and various shapes positioned on the pegs; on the right side is the word "goal" with a similar picture of three pegs and the same shapes positioned differently. To complete the task, participants must reposition the shapes under "start" on the left to match the "goal" configuration on the right. They are instructed to complete each round in as few moves as possible, and the minimum number of moves is shown as a number on the screen. In addition to a practice round, participants attempt four rounds of increasing complexity, beginning with one shape requiring only one move and concluding with three shapes in a pattern that requires at least four moves. In all rounds, participants are limited to a maximum of 10 moves. If a participant reaches this limit, the round ends and she is required to contact a staff member, who ensures she understands the task before she continues to the next round. The distribution of scores is therefore censored at both ends: no round can be completed in fewer than its minimum number of moves, and no round can take more than 10 moves. Performance on the Tower of London task is computed as the total number of moves used across the four rounds. An example of the participant's screen is shown in Figure G.1. Payment was based on a randomly selected round, with a payment of KES 250 for completing that round with the minimum number of moves and a KES 50 deduction for each additional move.
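The scoring and payment rules above can be summarized in a short sketch. It assumes that a round ended at the 10-move limit counts as 10 moves and that payment cannot fall below zero (the text does not say what happens after more than five extra moves); the per-round minimums in the example are hypothetical, since the text states only that the first round needs one move and the last at least four.

```python
import random

MAX_MOVES = 10      # per-round cap stated in the text
BASE_PAY_KES = 250  # paid if the selected round is solved in the minimum moves
PENALTY_KES = 50    # deducted for each move above the minimum


def tol_score(moves_per_round, min_moves_per_round):
    """Total moves across rounds (lower = better planning), capped per round."""
    if len(moves_per_round) != len(min_moves_per_round):
        raise ValueError("need one minimum per round")
    total = 0
    for used, minimum in zip(moves_per_round, min_moves_per_round):
        if used < minimum:
            raise ValueError("a round cannot be solved in fewer than its minimum moves")
        total += min(used, MAX_MOVES)
    return total


def tol_payment(moves_used, min_moves):
    """KES earned on one round; the floor at zero is our assumption."""
    extra = min(moves_used, MAX_MOVES) - min_moves
    return max(0, BASE_PAY_KES - PENALTY_KES * extra)


def pay_participant(moves_per_round, min_moves_per_round, rng=random):
    """Pay one randomly selected round, as in the task."""
    k = rng.randrange(len(moves_per_round))
    return tol_payment(moves_per_round[k], min_moves_per_round[k])


hypothetical_minimums = [1, 2, 3, 4]
moves = [1, 3, 3, 10]  # the last round hit the 10-move limit
print(tol_score(moves, hypothetical_minimums))  # 17
print(tol_payment(3, 2))                        # 200
```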

G.2. Effort Discounting Task
Following recent innovations in the elicitation of time preferences (Andreoni and Sprenger 2012; Augenblick et al. 2015), we estimate time preferences in the effort domain using the methodology of Augenblick (2017): participants choose how many units of an effort task they want to complete at time t for a piece rate w, where t is 0, 1, 7, or 8 days from today, and the piece rate w is KES 2, 6, or 10. Variation in timing identifies the discount rate, while variation in piece rates identifies the curvature of the utility function. One time and one piece rate are randomly implemented at the end (described below). Figure G.2 provides an example of the participant interface for the task.¹ In contrast to Augenblick (2017), we hold the time of decision constant and vary the time of effort provision, which requires us to control for weekday effects. All questions required a minimum allocation of one task at each time, to control for the fixed costs of starting, and allowed a maximum of 50 tasks.
Developing an effort task adapted to a field setting in a developing country with low levels of literacy was challenging: the required variation in timing meant that effort could not be completed in the laboratory. We needed to monitor and enforce when, and how much, effort participants supplied while they were in their homes and did not have access to a computer. We thus developed a new effort task adapted to our setting: participants completed data entry tasks by SMS, using toll-free numbers administered by the Busara Center.² Each SMS was supposed to contain a 30-digit random number string, which takes approximately two minutes to type. Participants were given a sheet with 50 such strings, including a counter to keep track. To ensure comprehension, participants completed one practice SMS during the survey. At the end of the survey, one decision (out of 12) was randomly selected to be the "decision that counts": at the selected piece rate and time horizon, participants had to send the exact number of SMS they had chosen. If they did, they received the full piece-rate payment plus a KES 100 completion bonus. If they failed to implement their decision, they lost both the payment for this task and the completion bonus (see Augenblick (2017) for a full description of this method).³ Earnings from this task were paid 14 days after the survey date, regardless of the selected effort time horizon.
¹ To consider the possibility that respondents feel obligated to carry out some effort regardless of the wage, a subsample of participants was also asked how many units of effort they would supply for a piece rate of KES 0 (while still receiving the KES 100 completion bonus explained above).
We estimate time preferences over effort following the approach of Augenblick (2017), assuming quasi-linear utility (linear in money, convex in effort) and a power cost of effort function. We additionally assume quasi-hyperbolic discounting. Following DellaVigna and Pope (2017), we allow for a non-monetary reward s, which participants receive for each task in addition to the piece rate. The non-monetary reward captures a range of motives, from norms or a sense of duty, to reciprocity towards the employer (for the flat payment), to intrinsic motivation and personal competitiveness. It is motivated by the observation that participants supply non-zero amounts of effort even for low piece rates (DellaVigna and Pope 2017). The optimal level of effort is thus given by
$$ e^{*} = \arg\max_{e} \; \left( s + D_m(14)\,\varphi\, w \right) e \;-\; \beta^{\mathbb{I}(t>0)}\, \delta^{t} \left( \frac{1}{\gamma} e^{\gamma} + d_w\, e \right) \qquad \text{(G.1)} $$
where β and δ capture (quasi-hyperbolic) temporal discounting of effort, w is the piece rate, D_m(14) captures monetary discounting of the payment in 14 days (this is constant across all questions and is thus allowed to differ from effort discounting), t is the time of effort provision, γ > 1 captures convex costs of effort, φ is a slope parameter, and d_w are weekday indicators that allow the opportunity cost of time to vary across weekdays. Within the non-linear objective function above, we estimate additive treatment effects of V, P, and AC on the parameters β, δ, s, and γ.⁴ Sixty-six percent of participants identifiably sent at least one SMS (other than the practice SMS during the session), 60% sent the correct number of SMS during the correct time window, and 41% additionally satisfied the required accuracy threshold (see footnote 3) and got paid.
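Because the cost of effort in (G.1) is a power function with γ > 1, the objective is concave in e and the optimal allocation has a closed form: the first-order condition s + D_m(14)φw = β^{I(t>0)}δ^t(e^{γ−1} + d_w) gives e* = [(s + D_m(14)φw)/(β^{I(t>0)}δ^t) − d_w]^{1/(γ−1)}, clipped to the allowed range of 1 to 50 tasks. The sketch below encodes this; it treats effort as continuous (actual choices are whole numbers of SMS) and uses hypothetical parameter values, so it illustrates the model's predictions rather than the paper's estimation routine.

```python
def optimal_effort(w, t, beta, delta, gamma, s, phi, d14, d_w=0.0, e_min=1.0, e_max=50.0):
    """Maximize (s + d14*phi*w)*e - beta**[t>0] * delta**t * (e**gamma/gamma + d_w*e)
    over e in [e_min, e_max], treating effort as continuous."""
    if gamma <= 1:
        raise ValueError("gamma > 1 is required for convex effort costs")
    marginal_benefit = s + d14 * phi * w
    cost_weight = (beta if t > 0 else 1.0) * delta ** t  # beta applies only to future effort
    # First-order condition: marginal_benefit = cost_weight * (e**(gamma - 1) + d_w)
    x = marginal_benefit / cost_weight - d_w
    if x <= 0:
        return e_min  # marginal cost exceeds marginal benefit at every positive e
    interior = x ** (1.0 / (gamma - 1.0))
    # Concave objective: clipping the interior optimum gives the constrained optimum.
    return min(max(interior, e_min), e_max)


# Hypothetical parameters: present bias (beta < 1) raises effort chosen for t > 0.
params = dict(beta=0.9, delta=0.99, gamma=2.0, s=1.0, phi=1.0, d14=0.95)
for w in (2, 6, 10):
    print(w, [round(optimal_effort(w=w, t=t, **params), 2) for t in (0, 1, 7, 8)])
```

In this parameterization, β < 1 makes future effort look cheaper at the time of decision, so allocations for t = 1, 7, 8 exceed the allocation for t = 0; this jump between t = 0 and t = 1 is what identifies β, while the gradual change between t = 1 and t = 8 identifies δ.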
The key challenge in verifying the effort task was matching SMS to participants: despite various safeguards (including name and subject ID in each SMS, and asking participants to report all phone numbers they might use), 59,049 SMS from 3,144 phone numbers could not be matched to any of our 2,983 participants. This challenge arises from a field setting where individuals commonly use multiple phones and share them within or across households (see footnote 2).
To test for difficulties in accessing phones, we included a small module in the endline survey in which participants were asked about their access to a mobile phone, particularly at the times necessary to complete the SMS task. To alleviate the concern that respondents did not understand the payment scheme, we included three multiple-choice comprehension questions immediately before the task that asked participants to calculate the payout in different circumstances. Respondents could not participate in the task until they had answered the comprehension questions correctly. Table G.4 shows phone access and task comprehension by treatment group. We find high rates of phone access and comprehension across all treatment groups, and no large differences between them. The exception is the pure control group, which showed lower comprehension at endline than the active treatment groups, presumably because it was their first time completing the task, whereas the other groups had already experienced it at baseline. We therefore interpret differences in time preferences between this group and the others with caution.

G.3. Money Discounting Task
In addition to the effort discounting task, we included a conventional Multiple Price List (MPL) task to measure monetary discounting. Participants were asked to make 10 choices between payments at earlier and later dates. The payment at the early date was always KES 100, while the payment at the later date increased gradually from KES 110 to KES 300, using gross interest rates of 1.1, 1.25, 1.75, 2, and 3. Each decision was first made in a near time frame (today vs. four weeks from today) and later in a future time frame (four weeks vs. eight weeks from today), giving 5 × 2 = 10 choices. The list of decisions is presented in Table G.1. One decision was randomly selected to be paid out. As outcome measures from the MPL, we estimate β and δ in the quasi-hyperbolic discounting model of Laibson (1997), assuming that utility is linear in money.
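With utility linear in money, the two time frames pin down the two parameters. Measuring time in four-week periods, indifference in the near frame implies 100 = βδ · 100 · R_near, and indifference in the future frame implies βδ · 100 = βδ² · 100 · R_far, so δ = 1/R_far and β = R_far/R_near. The sketch below applies these identities under a simplifying assumption introduced only for illustration: each participant's indifference rate is approximated by the lowest gross rate at which she chose the later payment (interval midpoints or structural estimation are common alternatives, and participants who switch more than once need separate handling). The choice data are hypothetical.

```python
GROSS_RATES = (1.1, 1.25, 1.75, 2.0, 3.0)  # later payment = KES 100 * rate


def switch_rate(chose_later):
    """Lowest gross rate at which the later payment was chosen, or None if never."""
    for rate, later in zip(GROSS_RATES, chose_later):
        if later:
            return rate
    return None


def beta_delta(r_near, r_far):
    """Quasi-hyperbolic (beta, delta), with delta measured per four-week period.

    Near frame (today vs. 4 weeks):   100 = beta * delta * 100 * r_near
    Future frame (4 vs. 8 weeks):     beta * delta * 100 = beta * delta**2 * 100 * r_far
    """
    delta = 1.0 / r_far
    beta = r_far / r_near
    return beta, delta


# Hypothetical participant who needs a higher rate to wait when "now" is involved.
near = [False, False, True, True, True]  # switches at 1.75
far = [False, True, True, True, True]    # switches at 1.25
beta, delta = beta_delta(switch_rate(near), switch_rate(far))
print(round(beta, 3), round(delta, 3))  # 0.714 0.8
```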

G.4. Alternative Mechanisms
Beliefs About Effectiveness of Chlorination. We assess differential beliefs across treatment groups about the proportion of pediatric diarrhea cases which can be prevented by water chlorination. At baseline, all participants in the active treatment groups ("Visualization," "Planning," and "AC+INF") are told that water chlorination reduces childhood diarrhea by approximately one third. At endline they are asked this question in a multiple-choice format. We take the proportion of cases the participant believes chlorine can avert as a measure of belief about chlorine effectiveness.
Knowledge of How to Use Chlorine. We assess differential knowledge across treatment groups of how to use chlorine to sanitize water. At endline, we ask two multiple-choice questions whose correct answers all three active treatment groups were told at baseline: (i) how much chlorine to add to water; and (ii) how much time needs to pass after chlorine is added for the water to be safe to drink.

Risk Preferences.
We include a modified Eckel-Grossman task to account for changes in risk preferences (Charness et al. 2013). Participants choose one of three 50/50 lotteries, represented as bets on a coin flip. We construct an ordinal measure of risk aversion based on the expected payout the participant is willing to forgo in exchange for greater certainty of payout.
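The ordinal risk-aversion measure can be sketched as follows. The lottery payouts are hypothetical placeholders, since the text does not list them; the sketch only assumes, as in Eckel-Grossman designs, that the three 50/50 lotteries are ordered so that expected payout rises as the payout becomes less certain, so choosing a safer lottery means forgoing expected payout.

```python
from typing import NamedTuple


class Lottery(NamedTuple):
    heads: float  # payout (KES) if the coin lands heads
    tails: float  # payout (KES) if the coin lands tails

    @property
    def expected_value(self) -> float:
        return 0.5 * (self.heads + self.tails)


# HYPOTHETICAL payouts, ordered from safest to riskiest.
LOTTERIES = (Lottery(40, 40), Lottery(30, 60), Lottery(10, 90))


def risk_aversion(choice, lotteries=LOTTERIES):
    """Return (ordinal score, expected KES forgone); score 0 = riskiest choice."""
    evs = [lot.expected_value for lot in lotteries]
    if any(a >= b for a, b in zip(evs, evs[1:])):
        raise ValueError("lotteries must be ordered by rising expected value")
    score = len(lotteries) - 1 - choice
    forgone = evs[-1] - evs[choice]
    return score, forgone


print(risk_aversion(0))  # (2, 10.0)
print(risk_aversion(2))  # (0, 0.0)
```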

Salience of Chlorination.
We test for the possibility that our treatments differentially increased the salience of water chlorination. During the endline survey, enumerators read out three lists of nine words each to every participant and asked her to recall as many words as possible directly after reading each list. Participants were paid KES 5 for every word they remembered. Each list contained words from three categories of future-related behaviors (chlorine, savings, and farm investment), as well as filler words unrelated to the future. The word lists are available in the original Swahili and in English translation in Table G.2. We estimate salience effects using equation (G.2), where w_im is an indicator for participant i correctly recalling the chlorine-related word in list m; X_im is the number of words the participant correctly recalled from that list; δ_m is a fixed effect for list m; and T_j are treatment indicators. We test H0: α1 = α2 = α3, with the null hypothesis corresponding to no differential salience of chlorine across (active) treatment groups. In case our treatments differentially affected the salience of chlorine, we further test whether this is due to an increased salience of future-oriented behaviors in general, which may result from our main psychological mechanisms of interest. To this end, we estimate whether the differential treatment effect on chlorine words also holds for two other future-oriented behaviors (saving and farm investment), which were not emphasized in the sessions. In this pooled specification, w_imn is an indicator for participant i correctly recalling the word in list m related to future-oriented behavior n (chlorination, savings, or farm investment), and chlorine_n is a dummy for the word being related to chlorine. The a_j coefficients capture increased future orientation due to treatment, while the b_j coefficients capture whether salience increased differentially for chlorination.
We test H0: b1 = b2 = b3, with the null hypothesis corresponding to no differential salience of chlorine across (active) treatments.
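The estimating equations for the salience analysis did not survive extraction. One form consistent with the variable definitions and hypotheses above is sketched here for reference only: the coefficient θ on X_im and the error terms are our notation, any normalization needed for identification is left implicit, and the regressions in the paper may include further controls (for example, the village fixed effects and individual characteristics used in the other tables).

```latex
w_{im} = \sum_{j=1}^{3} \alpha_j T_j + \theta X_{im} + \delta_m + \varepsilon_{im}
\qquad \text{(G.2)}

w_{imn} = \sum_{j=1}^{3} a_j T_j + \sum_{j=1}^{3} b_j \left( T_j \times \mathrm{chlorine}_n \right) + \theta X_{im} + \delta_m + \varepsilon_{imn}
```

Under this form, H0: α1 = α2 = α3 asks whether chlorine-word recall differs across the active groups, and H0: b1 = b2 = b3 asks whether any such difference is specific to chlorine rather than shared with the savings and farm-investment words.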

G.4.1. Schedule of Tasks and Treatments
Participants were randomly assigned to attend baseline and intervention sessions either in the morning or in the afternoon. While participants were encouraged to attend the session type assigned to them, they were allowed to switch to the other session time if necessary, to minimize attrition. Within a geographical region and within each treatment group, participants were invited to sessions in alphabetical order, based on the first letter of their last name. Participants were invited to a 7:30AM or 12:30PM session at a village hall in their area. Sessions lasted between two and four hours. Participants received short breaks between agenda items.
During zTree portions of the session, each participant sat in front of a Windows tablet computer, sufficiently spaced to prevent participants from seeing the answers of their neighbors. One enumerator read instructions and answer options aloud in Kiswahili from the center of the room, while several others were available to answer individual questions or assist with the technology.
During the SurveyCTO questionnaires at endline, five to eight enumerators went through the questionnaires with participants individually, in the order in which participants arrived.
Interventions were carried out in cohorts of approximately five participants, seated in a circle outside when weather permitted. Groups were physically separated to ensure participants could not be overheard. All participants received the same intervention on a given day.
Notes: Columns (1) and (6) report the mean and standard deviation of the control group. Columns (2)-(3) and (7)-(8) report the coefficients of interest, with standard errors in parentheses. All columns include village-level fixed effects and a vector of individual characteristics, and cluster standard errors at the level of the intervention cohort. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level.
…, which were included in the long-run follow-up survey. Respondents are assigned to group A (B) and told: "We hypothesize that people who participated in this study and received the same treatment as you will give higher (lower) responses to these questions than others." They are then asked how often they added chlorine to water collected from their primary source in the last 7 days. Following de Quidt et al. (2018), the responses can be used to obtain bounds a⁺(·) and a⁻(·) on the impact of experimenter demand effects on self-reports.
Notes: OLS estimates of treatment effects on chlorine in water (TCR) after 12 weeks, with additional controls for testing order within the village. For each variable, we report the coefficients of interest, with standard errors in parentheses. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. All columns include village-level fixed effects, a vector of individual characteristics, and standard errors which are clustered at the level of the intervention cohort.
Notes: OLS estimates of treatment effects, using the specification from the pre-analysis plan (PAP), without survey date fixed effects. For each variable, columns (1) and (6) report the mean and standard deviation of the control group. Columns (2)-(3) and (7)-(8) report the coefficients of interest, with standard errors in parentheses. * denotes significance at 10 pct., ** at 5 pct., and *** at 1 pct. level. Square brackets contain additional p-values corrected for multiple hypothesis testing using the false discovery rate. All columns include village-level fixed effects, a vector of individual characteristics, and standard errors which are clustered at the level of the intervention cohort. The sample in all regressions is restricted to participants in active treatment groups who attended the baseline survey. Where available, we control for the baseline value of the dependent variable. Outcome measures are listed on the left and are described in detail in Section 4.

Appendix H: Robustness Checks
Table H.4. Psychological outcomes (without survey date fixed effects).