Choice with Endogenous Categorization

We propose a novel categorical thinking model (CTM) where the framing of the decision problem affects how the agent categorizes each product, and the product's category affects her evaluation of the product. We show that a number of prominent models of salience, status quo bias, loss-aversion, inequality aversion, and present bias all fit under the umbrella of CTM. This suggests categorization as an underlying mechanism for key departures from the neoclassical model of choice and an account for diverse sets of evidence that are anomalous from its perspective. We specialize CTM to provide a behavioral foundation for the salient thinking model of Bordalo et al. (2013), highlighting its strong predictions and distinctions from other existing models.


Introduction
Psychologists have long held that knowledge about our environment is organized into categories, and that this categorization plays a key role in decision making. Categorization has been used by both humans and animals for thousands of years. As Ashby & Maddox [2005] write, "All organisms assign objects and events in the environment to separate classes or categories... Any species lacking this ability would quickly become extinct."
Categorization plays a key role in a number of important anomalies for the neoclassical model of choice. Attributes categorized as losses get higher weight relative to those categorized as gains [Tversky & Kahneman, 1991]. An object's most salient attribute plays a disproportionate role in the agent's subsequent evaluation [Bordalo et al., 2013]. Subjects avoid objects they categorize as not-obviously-better-than the status-quo [Masatlioglu & Ok, 2005]. Agents are less patient when deciding between dated rewards in the short-term than in the long-term [Strotz, 1955]. Allocations among members of society are evaluated according to whether inequities are advantageous or disadvantageous [Fehr & Schmidt, 1999].
This paper proposes and axiomatizes a simple model of the role that categorization plays in economic decisions. In the Categorical Thinking Model (CTM), a decision maker (DM) first groups objects together into categories, consciously or unconsciously, then evaluates each object through the lens of the category to which it belongs. The model has two key features motivated by psychological evidence. First, categorization is context-dependent, as summarized by a reference point that may depend on the choice set. Second, how an object is categorized affects its valuation. Prominent models of loss-aversion, salience, status quo bias, present bias, and inequality aversion all fit under the umbrella of CTM. Hence, CTM suggests categorization as an underlying explanation for many key departures from the neoclassical model in many different decision-making environments.
To make our results comparable with previous work, we begin by assuming that a family of reference-dependent preference relations describes the DM's choices for each reference point. Each alternative has a pair of observed attributes, such as price and quality, height and weight, or size and timing of a reward. In CTM, the context in which the decision takes place determines a reference point, which in turn divides the alternatives into categories. Each category has its own utility function, and within a given category, the DM evaluates the options according to it. Hence, the DM makes different trade-offs between the attributes when they are differentially categorized. We show that the DM conforms to CTM if and only if she behaves as a standard DM when comparing objects categorized the same way. That is, her choices satisfy some standard axioms, such as acyclicity, and do not depend on the reference point when restricted to alternatives that belong to the same category.
CTM is a parsimonious approach to incorporating psychological evidence into economics. Psychological factors determine how each alternative is perceived, which CTM captures through different categories. Moreover, they predict how being categorized in a particular way affects the DM's choice, which CTM captures through the category's utility function. For instance, salience and loss-aversion make distinct predictions about when a DM puts higher weight on a dimension. The most salient attribute gets more weight, as does an attribute classified as a loss. Our result shows that CTM closes the model by requiring that the DM acts consistently within the alternatives categorized the same way.
Despite its generality, CTM makes testable predictions and excludes certain types of modeling choices. For instance, a number of models capture salience effects, including the salient thinking model [Bordalo et al., 2013] (BGS), Kőszegi & Szeidl [2013], Bhatia & Golman [2013], Gabaix [2014], and Bushong et al. [2015]. Of these models, only BGS is a CTM. In other words, even the most general version of BGS excludes these models, so BGS offers a different method of modeling salience. Our results highlight trade-offs between the different modeling approaches. For instance, BGS maintains a stronger consistency condition across reference points than does the constant loss aversion model of Tversky & Kahneman [1991], but the latter, unlike BGS, satisfies Monotonicity across regions.
We then provide the first complete characterization of the observable choice behavior equivalent to the BGS model, clarifying and identifying the nature of the assumptions used in the model. The first crucial step towards understanding the model is getting a handle on its novel salience function that determines which attribute stands out for a given reference point. We study the salience function based on a simple observation: while it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. This makes BGS a special case of CTM, so our earlier results allow a characterization.
One key feature of BGS is that the reference point is endogenously determined by the set of available options. Since the salience of each alternative depends on the reference point, varying the budget set affects the salience of, and so the DM's evaluation of, a given alternative. Our final contribution addresses this challenge by extending our characterization of CTM to the setting where the reference point is endogenous. Our primitive is a choice correspondence describing the DM's choices.
The menu maps to a reference point, such as the average level of each attribute over alternatives in the set. As long as the reference point varies systematically with the choice problem, we characterize the properties of the choice correspondence equivalent to CTM. Specifically, we show that if the DM's choices obey the natural analogs of our earlier axioms, then CTM rationalizes her behavior. We apply it to provide a completely endogenous characterization of the BGS function.
The paper proceeds as follows. The next subsection provides a brief overview of the relevant psychology literature on categorization. Section 2 introduces CTM and discusses the models covered under its umbrella. Section 3 axiomatizes CTM and compares and contrasts the models of riskless choice discussed in Section 2. Section 4 contains our analysis of the salient thinking model. Section 5 introduces the endogenous reference point setting, and applies our axiomatizations of CTM to it. Section 6 concludes with a discussion of related literature.
First, categories are context dependent. Tversky [1977] and Tversky & Gati [1978] present evidence that replacing one item in a set of objects can drastically alter how people categorize the remaining objects. Tversky & Gati [1978] argue that categorization "is generally not invariant with respect to changes in context or frame of reference." For example, they show that subjects put East Germany and West Germany into the same category when the salient feature is geography or cultural background, but categorize the two differently if political system is salient. Similarly, Choi & Kim [2016] posit that depending on the context an Apple Watch can be categorized as a tech product, a fashion product, a fitness product, or a simple watch. Ratneshwar & Shocker [1991] show that subjects categorize ice cream and cookies together in terms of similarity (e.g. they are both desserts), but categorize ice cream and hot dogs together in terms of usage benefit (e.g. both are good snacks to have at the pool). Stewart et al. [2002] present evidence that relative magnitude information, derived from a comparison with the reference point, is used in categorization of sounds.
Second, how an object is categorized affects its final valuation. In a classic series of experiments, Rosch [1975] shows that differently categorized but physically identical stimuli are perceptually encoded as distinct objects. Wanke et al. [1999] demonstrate that "wine" is evaluated more positively when categorized with "lobster" than with "cigarettes." Mogilner et al. [2008] show that categorizing goods differently resulted in different reported satisfaction. Chernev [2011] shows that bundling a healthy food item with a junk food item reduced the reported caloric content beyond that of the junk food alone.
Finally, categories take the form of regions in the alternative space. This tracks very closely with decision bound theory in psychology.

Model
To aid in comparison with the existing literature and to separate the effects of reference point formation, we follow Tversky & Kahneman [1991] by taking as given a family of reference-dependent preference relations. We assume that the space of alternatives is $X = \mathbb{R}^n_{++}$, focusing on $n = 2$ when not otherwise noted. 1 We often use the convention of writing $x$ as $(x_i, x_{-i})$, with $x_{-i}$ denoting the components of $x$ other than $i$. The next subsections explore three different interpretations of $X$ in different contexts: as a riskless object with different attributes, as a dated reward or consumption stream, and as an allocation of consumption across individuals. For each reference point $r \in X$, the DM maximizes a complete and transitive preference relation, denoted by $\succsim_r$, over $X$. As usual, $\succ_r$ denotes strict preference and $\sim_r$ indifference. The primitive of the model is a family of such preferences indexed by the set of reference points, $\{\succsim_r\}_{r \in X}$. In this section, we assume that the reference point is exogenously given. We relax this assumption in Section 5 to allow endogenous reference point formation.
1 We note when there is a distinction between general $n$ and $n = 2$. Theorem 5 and the results that rely on it use the full structure of $\mathbb{R}^n_{++}$. The remaining results all generalize to any $X$ that is a finite Cartesian product of open, linearly ordered, separable, connected sets endowed with the order topology, where $X$ itself has the product topology.
2.1. Categorical Thinking Model. The first ingredient of the model is a mapping from the reference $r$ to categories. Each category corresponds to a different psychological treatment and changes as the reference changes. We allow the categories to have a very general structure.
Definition. A vector-valued function $K = (K^1, K^2, \dots, K^m)$ is a category function if each $K^k : X \to 2^X$ satisfies the following properties: (1) $K^k(r)$ is a non-empty, regular open set, and $\mathrm{cl}(K^k(r))$ is connected, (2) the categories $K^k(r)$ are pairwise disjoint and their union is dense in $X$, and (3) $K^k(r)$ varies continuously with $r$.
Categories arise from the psychology of the phenomenon to be modeled. For CTM to be applicable, the psychology must make an unambiguous prediction about which alternatives are affected. For instance, with gain-loss utility, alternatives that dominate the reference point are treated differently than those better in only one dimension. Similarly, with present-bias, alternatives that pay off sooner than the reference are categorized together. While we take the categories as given, if the psychology only makes partial predictions, then the categorization of other alternatives can often be inferred from choice. Proposition 1 does so for the salient thinking model.
We interpret the properties as follows. Every category contains some alternative for every reference point. If a particular product, say x, belongs to the category k, then so do all products that are close enough to x. There is a path that stays within the category between any two points, so categories cannot be the union of "islands." Almost every alternative is in at least one category, and none are in two categories.
Finally, if the reference point does not change too much, then neither do the categories.
The consumer values each good in a way that depends not only on the attributes of a product, as in the standard neoclassical model, but also on the category to which the product belongs. When alternatives $x$ and $y$ are both categorized in category $k$, the category utility function $U^k : X \to \mathbb{R}$ represents the DM's choices. That is, she weakly prefers $x$ to $y$ if and only if $U^k(x) \ge U^k(y)$. We focus on the effect of categorization on distorting trade-offs, so we require that a category utility function is additively separable and monotonic: $U^k(x) = \sum_i U^k_i(x_i)$, where each $U^k_i$ is strictly monotone and continuous. 4 The utility index $U^k_i$ represents the DM's preferences over dimension $i$ when an alternative belongs to the category $k$.
When alternatives belong to different categories, the reference point may affect the DM's choice. If the alternative $x$ lies in the category $k$ when the reference is $r$, that is, $x \in K^k(r)$, then the value of consumption $x$ is represented by $U^k(x|r)$. However, the reference does not affect the utility trade-off within a category. To capture this, we require that $U^k(\cdot|r)$ agrees with $U^k$, in the sense that it is an increasing transformation thereof. Then, $U^k(x|r) \ge U^k(y|r)$ if and only if $U^k(x|r') \ge U^k(y|r')$ for any references $r, r' \in X$. We can now formally define the model as follows.
Definition. The family $\{\succsim_r\}_{r \in X}$ conforms to CTM under category function $K = (K^1, K^2, \dots, K^m)$ if for each category $k$ there is a category utility function $U^k$ so that when $x \in K^k(r)$ and $y \in K^l(r)$ for some $r$, $x \succsim_r y \iff U^k(x|r) \ge U^l(y|r)$, and $U^k(\cdot|r)$ is an increasing transformation of $U^k(\cdot)$ for each $r \in X$ and category $k$.
A CTM is increasing if $U^k_i$ is increasing in $x_i$ for every category $k$ and dimension $i$. We also consider two sub-classes: a CTM is affine if $U^k(\cdot|r)$ is a positive affine transformation of $U^k$ for each $r$, and a CTM is strong if $U^k(\cdot|r) = U^k(\cdot)$ for each $r$. Most of the models we discuss below are affine CTM, and those of riskless consumer choice are all increasing.
4 That is, $U^k_i$ is either strictly increasing on $\mathbb{R}_+$ or strictly decreasing on $\mathbb{R}_+$.

Riskless Consumer Choice.
In this subsection, we consider our primary application: riskless consumer choice. The four models are introduced formally, and each is shown to be CTM. Figure 1 plots their indifference curves and categories, with darker lines indicating higher utility.
Bordalo et al. [2013] propose an intuitive and descriptive behavioral model based on salience. In the model, an attribute receives more weight when it is salient than when it is not. The magnitude of salience is determined by a salience function, $\sigma : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}_+$. Given a reference $(r_1, r_2)$, attribute 1 is salient for good $x$ if $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and attribute 2 is salient for good $x$ if $\sigma(x_1, r_1) < \sigma(x_2, r_2)$. That is, the salient attribute is the one that differs the most from the reference according to the salience function.
Definition 3. The family { r } r∈X has a BGS (σ; w 1 , w 2 , u 1 , u 2 ) representation if each r is represented by for a salience function σ, strictly positive weights with , and each u i strictly increasing.
To illustrate this model, consider the salience function proposed by BGS: $\sigma(a, b) = \frac{|a - b|}{a + b}$. Based on it, the left-upper panel in Figure 1 illustrates BGS. There are two categories: those that are 1-salient, i.e. $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and those that are 2-salient, i.e. $\sigma(x_2, r_2) > \sigma(x_1, r_1)$. To visualize them, note that the entire product space is divided into four distinct areas by the two dashed curves that intersect at the reference point. The areas lying to the north and south of the reference point are categorized as the 2-salient products. Similarly, 1-salient products lie east and west of the reference point.
The figure incorporates indifference curves as well, holding fixed the reference point. There are two potential sets of indifference curves, illustrated by dotted lines. Depending on the category, one of the two is utilized to determine the DM's choice. When attribute 1 is salient, the steeper one becomes the indifference curve since it puts higher weight on the first attribute. Conversely, the flatter one is the indifference curve when attribute 2 is salient. We draw two different indifference curves, where the darker color corresponds to higher utility.
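The categorization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the leading BGS example $\sigma(a, b) = |a - b|/(a + b)$, and the weight `w_salient` and the linear utility indices are hypothetical choices made for the example, not parameters taken from the paper.

```python
def salience(a, b):
    """Leading BGS example of a salience function: |a - b| / (a + b), for a, b > 0."""
    return abs(a - b) / (a + b)


def bgs_category(x, r):
    """Return 1 if attribute 1 is salient for x given reference r, 2 if attribute 2 is.

    Alternatives on the boundary (equal salience) belong to neither open category.
    """
    s1 = salience(x[0], r[0])
    s2 = salience(x[1], r[1])
    if s1 > s2:
        return 1
    if s2 > s1:
        return 2
    return None


def bgs_value(x, r, w_salient=0.7):
    """Weighted sum of attributes with extra weight on the salient one.

    Assumptions for illustration only: linear utility indices u_i(x_i) = x_i,
    weight w_salient on the salient attribute, equal weights on the boundary.
    """
    k = bgs_category(x, r)
    if k == 1:
        w1 = w_salient
    elif k == 2:
        w1 = 1.0 - w_salient
    else:
        w1 = 0.5
    return w1 * x[0] + (1.0 - w1) * x[1]


if __name__ == "__main__":
    x = (4.0, 2.5)
    # East of the reference: attribute 1 stands out.
    print(bgs_category(x, (2.0, 2.0)), bgs_value(x, (2.0, 2.0)))
    # Same good, different reference: now attribute 2 stands out.
    print(bgs_category(x, (4.0, 1.0)), bgs_value(x, (4.0, 1.0)))
```

The second call shows the mechanism the text emphasizes: the good itself is unchanged, but moving the reference point moves it into a different category and so changes the trade-off the DM applies to it.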
, which attaches equal weight to each attribute. If she experiences a loss in attribute $i$, then she inflates the weight attached to that attribute by $\lambda_i > 1$. There are four different categories in the TK formulation: (i) gain in both dimensions, (ii) gain in the first dimension and loss in the second dimension, (iii) loss in the first dimension and gain in the second dimension, and (iv) loss in both dimensions (see the right-upper panel in Figure 1). We model this with the gain-loss category function. Then, the utility function is where $\lambda_1, \lambda_2 > 0$ ($> 1$ if loss averse) and each $u_i$ is strictly increasing. TK is a special case of affine CTM with four categories defined by a gain-loss category function.
an alternative is unambiguously superior to the status quo, the DM does not feel any psychological discomfort in forgoing the status quo; in such cases there will be no cost.
Formally, Q(r) is a closed set denoting the alternatives that are unambiguously superior to the default option r (see the left-bottom panel of Figure 1). If an alternative does not belong to this set, then the DM pays a cost c(r) > 0, which may depend on the reference point, to move away from the status quo. In this model, there are two categories . This is an example of an affine CTM for general c, and a strong CTM when c(r) is constant.
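A minimal sketch of the status quo model just described, under assumptions that are not in the text: $Q(r)$ is taken to be the set of alternatives that weakly dominate $r$ in every attribute, the hedonic utility is the sum of attributes, and the cost is a constant. Only the structure (a cost $c(r) > 0$ charged outside $Q(r)$) comes from the description above.

```python
def in_Q(x, r):
    """Hypothetical Q(r): x is 'unambiguously superior' if it weakly dominates r."""
    return all(xi >= ri for xi, ri in zip(x, r))


def hedonic(x):
    """Hypothetical hedonic utility: the sum of the attributes."""
    return sum(x)


def status_quo_value(x, r, cost=1.0):
    """Utility of x given status quo r: pay `cost` when x lies outside Q(r)."""
    base = hedonic(x)
    return base if in_Q(x, r) else base - cost


if __name__ == "__main__":
    r = (2.0, 2.0)
    x = (3.0, 3.0)   # dominates r: no cost
    y = (5.0, 1.5)   # better on attribute 1 only: cost applies
    z = (4.5, 0.4)   # higher hedonic utility than r, yet rejected after the cost
    for alt in (r, x, y, z):
        print(alt, status_quo_value(alt, r))
```

With a constant cost, the value inside each of the two categories does not depend on $r$, which is the strong CTM case noted above; letting `cost` vary with `r` shifts utility outside $Q(r)$ by a reference-dependent constant, giving an affine CTM.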
Prototype Theory (PT): Prototype theory was first proposed by Posner & Keele [1970]. According to it, each category is associated with a prototype, its "most typical" member. Initial categorization is determined by comparing each product to each prototype. We now formalize this idea and show that this is CTM.
There are $m$ prototypes, $p^1, \dots, p^m$. The DM categorizes alternatives according to how similar they are to a given prototype. Then, category $K^i(r)$ is the set of alternatives categorized as most similar to prototype $p^i$. Similarity may depend on the reference. There is a family of metrics indexed by $r$ so that $d_r(x, y)$ indicates how far away the DM perceives $x$ to be from $y$ given reference $r$. Formally, and the DM evaluates alternatives in category $i$ according to where $U(\cdot)$ is a hedonic utility function and $\lambda^i_j > 0$. A particularly interesting specification is where $\lambda^i_j = \frac{\partial}{\partial p^i_j} U(p^i)$. Then, the DM approximates the utility of $x$ according to a first-order Taylor expansion around the prototype most similar to it (see the right-bottom panel of Figure 1). This is an example of a strong CTM.
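The Taylor-expansion specification can be illustrated with a short sketch. Everything concrete here is an assumption made for the example: the two prototypes, the hedonic utility $U(x) = \ln x_1 + \ln x_2$, and a Euclidean metric that does not vary with the reference. The evaluation rule is the first-order expansion $U(p^i) + \sum_j \lambda^i_j (x_j - p^i_j)$ with $\lambda^i_j = \partial U(p^i) / \partial p^i_j$.

```python
import math

# Hypothetical prototypes p^1 and p^2 in the two-attribute space.
PROTOTYPES = [(1.0, 4.0), (4.0, 1.0)]


def U(x):
    """Hypothetical hedonic utility: log(x1) + log(x2)."""
    return math.log(x[0]) + math.log(x[1])


def grad_U(p):
    """Gradient of the hypothetical U at p; these are the weights lambda^i_j."""
    return (1.0 / p[0], 1.0 / p[1])


def category(x, prototypes=PROTOTYPES):
    """Index of the nearest prototype under a Euclidean metric.

    Ties (points on the boundary between categories) go to the lower index here.
    """
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))


def prototype_value(x, prototypes=PROTOTYPES):
    """First-order Taylor approximation of U around the nearest prototype."""
    p = prototypes[category(x, prototypes)]
    lam = grad_U(p)
    return U(p) + sum(l * (xj - pj) for l, xj, pj in zip(lam, x, p))


if __name__ == "__main__":
    for x in [(1.0, 4.0), (2.0, 4.0), (3.5, 1.5)]:
        print(x, category(x), round(prototype_value(x), 4), round(U(x), 4))
```

At a prototype the approximation is exact; away from it, the DM applies the prototype's marginal rates of substitution to every alternative in the category, which is what makes the within-category utility additive and linear.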

Time Preferences.
We apply our model to choices of dated rewards. The pair (x, t) represents a payment of x at time t. Motivated by present bias, we propose a model where the DM divides time periods according to short term and long term.
Given a reference $r = (r_x, r_t)$, rewards arriving before $r_t$ are perceived as short-term and after $r_t$ as long-term. Hence The utility function is where $0 < \delta < 1$ and $0 < \beta \le 1$. The model is additively separable after taking logs, so it is a special case of CTM. It exhibits present bias when $\beta < 1$: there exist values $y > x > 0$ so that the DM prefers $(x, \tau)$ to $(y, \tau + 1)$ under reference $r$ if and only if $\tau < r_t - 1$. Figure 2 plots its indifference curves.
In the Relative Inequality Aversion (RIA) model, the DM $i$ feels guilty if her own relative gain $(x_i - r_i)$ is higher than the relative gain of individual $j$, $(x_j - r_j)$. Otherwise, Observe that when $r_i = r_j$ for all $i$ and $j$, the utility function reduces to that of Fehr & Schmidt [1999]. Throughout, we follow them in assuming that $\alpha \ge \beta \ge 0$ and $\beta < 1$. We illustrate that this model is CTM by using the two-person version of it. With two individuals, the category function can be written as The set $K^j_R(r)$ contains allocations where individual $j$ gets a relatively better deal than the other. The relative inequality aversion model can be written as which is an affine CTM.
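A two-person sketch of relative inequality aversion may help fix ideas. The functional form below is an assumption: it takes the Fehr & Schmidt [1999] utility and replaces payoff differences with differences in relative gains $x_i - r_i$, which is consistent with the verbal description above but is not copied from the paper. The parameter values are hypothetical and satisfy $\alpha \ge \beta \ge 0$ and $\beta < 1$.

```python
def ria_category(x, r):
    """Category of allocation x for individual 1 given reference r.

    'guilt' when her own relative gain exceeds the other's, 'envy' when it is
    lower, and None on the boundary where relative gains are equal.
    """
    own, other = x[0] - r[0], x[1] - r[1]
    if own > other:
        return "guilt"
    if own < other:
        return "envy"
    return None


def ria_utility(x, r, alpha=0.8, beta=0.4):
    """Assumed Fehr-Schmidt-style utility of individual 1 over relative gains."""
    own, other = x[0] - r[0], x[1] - r[1]
    envy = max(other - own, 0.0)    # disadvantageous relative inequality
    guilt = max(own - other, 0.0)   # advantageous relative inequality
    return x[0] - alpha * envy - beta * guilt


if __name__ == "__main__":
    x, y = (5.0, 3.0), (5.0, 4.0)   # y gives individual 2 one more unit
    for r in [(0.0, 0.0), (4.0, 0.0)]:
        print(r, ria_category(x, r), ria_utility(x, r), ria_utility(y, r))
```

Under the first reference the allocation is categorized as guilt and giving more to individual 2 raises utility; under the second it is categorized as envy and the same change lowers utility. Within a category the utility is linear in the attributes, so the reference only matters through which category applies and an additive shift.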
Distributional Preferences: Charness & Rabin [2002] propose a model of social preferences where utility is increasing with the minimum of all individuals' payoffs and The parameter δ ∈ (0, 1) measures the degree of concern for helping the worst-off individual (Rawlsian) versus maximizing the total social payoffs (Utilitarian). The parameter λ ∈ (0, 1) measures how the DM balances social welfare with her own utility, where λ = 0 captures pure self-interest.
We propose a natural extension of their model with an exogenously given reference point. We call this model Reference-Dependent Distributional Preferences. That is, According to this model, each individual cares to maximize the minimum possible relative We show that this model is CTM. To do that, we first define categories for this model. Each category corresponds to the individual who has the worst relative payoff.
In this case, showing that the model is an affine CTM.

Behavioral Foundation for CTM
In this section, we provide a set of behavioral postulates characterizing increasing CTM. These postulates represent the key features of the model. We show that they hold if and only if the data is representable by increasing CTM, rendering the model behaviorally testable. In subsequent subsections, we explore the various strengthenings of the model and provide axiomatizations of these as well.
For each category $k$, define the revealed ranking within that category $\succsim^k$ so that $x \succsim^k y$ if and only if there exists $r$ such that $x, y \in K^k(r)$ and $x \succsim_r y$. The sub-relations $\succ^k$ and $\sim^k$ are defined in the usual way. The ranking $\succsim^k$ captures preference within category $k$. The following axiom states that the within-category revealed preference has no cycles.
Weak Reference Irrelevance ensures that the DM reacts consistently to alternatives when they are categorized the same way. That is, the categories reflect the DM's psychological treatment of the alternative. Although she may have choice cycles, these cycles occur only when the context changes how the DM categorizes alternatives. Since $\succsim^k$ is acyclic, we can take its transitive closure to derive full comparisons. Let $\succsim^{k*}$ be its transitive closure, with $\succ^{k*}$ and $\sim^{k*}$ the asymmetric and symmetric parts.
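The acyclicity requirement can be checked mechanically on a finite data set. The sketch below is illustrative and rests on assumptions: observations are triples `(r, x, y)` recording that $x$ is weakly preferred to $y$ under reference $r$, the category function is the two-category salience example with $\sigma(a, b) = |a - b|/(a + b)$, and a "cycle" is read as a chain of within-category weak rankings that contains at least one strict step.

```python
from collections import defaultdict


def salience(a, b):
    """Example salience function |a - b| / (a + b), for a, b > 0."""
    return abs(a - b) / (a + b)


def category_of(x, r):
    """Hypothetical two-category function: 1-salient, 2-salient, or None on the boundary."""
    s1, s2 = salience(x[0], r[0]), salience(x[1], r[1])
    if s1 == s2:
        return None
    return 1 if s1 > s2 else 2


def within_category_relation(observations, k):
    """Pairs (x, y) with x weakly preferred to y at some r where both lie in category k."""
    weak = set()
    for r, x, y in observations:
        if category_of(x, r) == k and category_of(y, r) == k:
            weak.add((x, y))
    return weak


def has_cycle_with_strict_step(weak):
    """True if some strict ranking x > y is contradicted by a weak chain from y back to x."""
    succ = defaultdict(set)
    for x, y in weak:
        succ[x].add(y)
    strict = {(x, y) for (x, y) in weak if (y, x) not in weak}
    for x, y in strict:
        stack, seen = [y], {y}
        while stack:  # depth-first search along weak rankings starting at y
            z = stack.pop()
            if z == x:
                return True
            for nxt in succ[z]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


if __name__ == "__main__":
    a, b, c = (4.0, 2.2), (5.0, 2.1), (6.0, 2.05)
    data = [((2.0, 2.0), a, b), ((2.1, 2.0), b, c)]
    print(has_cycle_with_strict_step(within_category_relation(data, 1)))
    data.append(((1.9, 2.0), c, a))  # closes the loop a, b, c, a inside category 1
    print(has_cycle_with_strict_step(within_category_relation(data, 1)))
```

Comparisons in which the two alternatives fall into different categories are simply dropped when building the relation, which reflects the point made above: choice cycles are allowed as long as they run through a change in categorization.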
Within a category, preference has an additive structure. The next axiom implies that each r satisfies Cancellation when restricted to a given category.

Axiom 2 (Category Cancellation). For all $x_1, y_1, z_1, x_2, y_2, z_2 \in \mathbb{R}_+$, $r \in X$, and cate-
Category Cancellation adapts the well-known Cancellation axiom to our setting, differing in its requirement that the alternatives belong to the same category. Without the qualifiers on how alternatives are categorized, the axiom is a well-known necessary condition for an additive representation that appears in Krantz et al. [1971] and Tversky & Kahneman [1991], among others. If $X$ has strictly more than two dimensions, then we can replace it with the analog of P2 [Savage, 1954]; see Debreu [1959]. The next axiom requires that Monotonicity holds between objects categorized the same way.
Axiom 3 (Category Monotonicity (CM)). For any $x, y, r \in X$: if $x \ge y$ and $x \ne y$, then $y \not\succsim^{k*} x$ for any category $k$; in particular, if $x, y \in K^k(r)$, then $x \succ_r y$.
Since both attributes are "goods" as opposed to "bads," Monotonicity means that if a product x contains more of some or all attributes, but no less of any, than another product y, then x is preferred to y. The postulate requires that choice respects Monotonicity for alternatives within the same category. However, it does not require that this comparison holds when the goods belong to different categories, and we shall see later that salience can distort comparisons enough to cause Monotonicity violations.
Finally, the family of preference relations is suitably continuous.
Axiom 4 (Category Continuity). For any $r \in X$ and any $x \in \bigcup_i K^i(r)$, the sets Moreover, the set has an empty interior.
Category Continuity adapts the usual continuity condition to apply only within a category. It says that when $y$ is preferred to $x$ in a given context and $y'$ is close enough to $y$, then $y'$ is also preferred to $x$, provided that $y'$ belongs to the same category as $y$. The final condition requires that if an alternative $x$ is neither better than everything within category $j$ nor worse than everything within category $j$, then there exists something in category $j$ that is as good as $x$, or as good as something arbitrarily close to $x$. For such an $x$, the category must intersect almost all indifference curves close to $x$'s since each category is almost connected.
Finally, we make a structural assumption.
Assumption (Structure). The category function $K$ is such that for any category $k$, the following sets are connected:
The Structure Assumption is satisfied by all the models we discussed in the previous section. Indeed, $E^k = \mathbb{R}^n_{++}$ for every category in these models. These conditions establish that the objects categorized in the same way have enough topological structure so that "local" properties can be extended to global ones. Chateauneuf & Wakker [1993] show that the structure assumption, applied to a single preference relation and domain, is needed to guarantee that a local additive representation implies a global one.
We provide a brief outline of how the proof works, and all omitted proofs can be found in the appendix. The axioms are sufficient for a "local" additive representation of $\succsim_r$ (and thus $\succsim^k$) on an open ball around each alternative within category $k$. The Structure Assumption allows us to apply Theorem 2.2 of Chateauneuf & Wakker [1993] to aggregate the local additive representation of $\succsim^k$ into a global one. To do so, we must establish that the global preference is complete, transitive, monotone, and continuous. We establish these properties for preference within each category by showing that the transitive closure of each $\succsim^k$ is complete and suitably continuous. The remainder of the proof shows that Category Continuity allows us to stitch the different within-category representations together into an overall utility function.

3.1. Reweighting. In all of the models discussed in Section 2.2, the DM evaluates the difference between alternatives categorized in the same way similarly. That is, regardless of the category, the DM agrees on how much better a value $x_i$ is than $y_i$ in dimension $i$. Categorization affects only how much weight she puts on each dimension. This is captured by the following axiom.
Axiom 5 (Reference Interlocking). For any $a, b, a', b', x', y', x, y \in X$ and categories
The term "Reference Interlocking" comes from Tversky & Kahneman [1991]. If each $\succsim^k$ is complete, then their statement of it is equivalent given the other axioms.
Roughly, the DM agrees on the difference in utilities along a given dimension regardless of how an alternative is categorized. To interpret, observe that the first pair of comparisons reveals that the difference between $a_i$ and $b_i$ exceeds that between $x_i$ and $y_i$ when the alternatives belong to category $k$. For alternatives categorized in $j$, the DM should not reveal the opposite ranking. We defer to the above paper for a detailed discussion.

Theorem 2. Suppose that $\{\succsim_r\}_{r \in X}$ conforms to increasing CTM under $K$ and each $E^k$ is connected. For each dimension $i$, there exist a utility index $u_i$ and a weight $w^k_i > 0$ for each category $k$ so that each category utility $U^k$ is cardinally equivalent to one that satisfies $U^k(x) = \sum_i w^k_i u_i(x_i)$ if and only if Reference Interlocking holds.
All of the models in Section 2.2 satisfy the axiom, and are thus special cases of increasing CTM satisfying Reference Interlocking. For instance, differences in the salient dimension of BGS receive higher weight, but the relative size of two given differences in the same dimension is the same regardless of whether both are salient or both are not. The axiom implies that the utility index within each category must be the same, up to an increasing, affine transformation.
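The claim about relative differences can be verified in one line. The derivation below is a sketch that assumes each category utility takes the weighted additive form $U^k(x) = \sum_i w^k_i u_i(x_i)$ with $w^k_i > 0$; the values $a_i, b_i, c_i, d_i$ are arbitrary levels of attribute $i$ introduced only for the illustration, with $u_i(c_i) \neq u_i(d_i)$.

```latex
% Assumed weighted additive form of the category utility.
U^k(x) = \sum_i w^k_i \, u_i(x_i), \qquad w^k_i > 0 .

% Changing only attribute i, holding x_{-i} fixed, within category k:
U^k(a_i, x_{-i}) - U^k(b_i, x_{-i}) = w^k_i \bigl[ u_i(a_i) - u_i(b_i) \bigr] .

% The category weight cancels in the ratio of two such differences:
\frac{U^k(a_i, x_{-i}) - U^k(b_i, x_{-i})}{U^k(c_i, x_{-i}) - U^k(d_i, x_{-i})}
  = \frac{u_i(a_i) - u_i(b_i)}{u_i(c_i) - u_i(d_i)} .
```

The right-hand side does not depend on $k$, so the category changes how much a difference in dimension $i$ matters relative to other dimensions, but not how two differences in the same dimension compare with each other.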

Behavioral Foundation for Affine CTM.
In this section, we explore when an affine CTM exists. That is, when is $U^k(\cdot|r)$ a positive affine transformation of $U^k(\cdot|r')$ for any $r, r'$? All of the models from Section 2.2 fall into this class. Unsurprisingly, the key restriction relative to CTM is that tradeoffs across categories are affine. As is usual, this is captured by a form of linearity, or the "Independence Axiom." We require it to hold only when alternatives combined belong to the same category, and adjust for the curvature of the utility index.
To state the key axiom, we define an operation $\oplus^k$ along similar lines as Ghirardato et al. [2003]. For $x, y \in \mathbb{R}$ and a category $k$, . If $\succsim^k$ has an additive representation,
Axiom 6 (Affine Across Categories (AAC)). For any $r \in X$,
This axiom is a natural adaptation of the linearity axiom, a close relative of the independence axiom. If we strengthened Affine Across Categories to be stated using the traditional linearity condition, then we would obtain a representation where each $U^k(\cdot|r)$ is itself an affine function. Otherwise, it requires that the $\oplus^k$ operation preserves indifference.
The second axiom deals with a technical issue.
Axiom 7 (Unbounded). For any $r \in X$: if $K^k(r)$ contains a sequence $x^n$ so that
We note that $U^k$ is unique up to a positive affine transformation. Hence whenever the utility of some sequence goes to infinity for some representation of $\succsim^k$, it must also go to infinity for any other representation. While the axiom can be stated in terms of primitives, we instead state it in terms of the $U^k$. 11 It ensures that a category containing alternatives whose utility goes to positive (negative) infinity must contain an alternative better (worse) than any other given alternative. If it failed, then no affine transformation of the category utility would represent the preference.
All the models discussed in Section 2 fall into the class of Affine CTM, so the result reveals the behavior all have in common. Relative to CTM, Affine Across Categories imposes stronger requirements on how the DM relates alternatives in different categories. Not only does the DM evaluate utility within a category using an additive function, but the additive structure persists across categories. Moreover, this aids with interpreting utility differences. If every pair of categories contains alternatives indifferent to one another, the entire representation is unique up to a common positive affine transformation. We call the combination of Axioms 1-4 and 6-7 the Affine CTM axioms.
3.3. Behavioral Foundation for Strong CTM. For a strong CTM, changing the reference point does not reverse the ranking of two products unless it also changes their categorization. The following axiom imposes this.
Axiom 8 (Reference Irrelevance). For any x, y, r, r ∈ X: For the general CTM, the reference point influences choice trough two channels: the category to which it belongs and its valuation. The axiom eliminates the latter. 11 The statement in terms of primitives involves standard sequences and does not reveal key aspects of behavior, so we instead present the simpler and easier to interpret one above. In special cases, this is easy to do. For instance, if U k is linear, then the axiom simply states that if K k (r) is an unbounded set, then the conclusion of the above axiom holds. When comparing two alternatives across different reference points, the DM's relative ranking does not change when neither's category changes. This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal. Of course, they differ in how alternatives are categorized, but the models also reflect distinct behavior within and across categories.
In addition to Reference Irrelevance, they are distinguished by whether they satisfy two classic axioms: Monotonicity and Cancellation, the unrestricted versions of Category Monotonicity and Category Cancellation. 12 The first requires that a dominant bundle is chosen, and the latter that an additive structure obtains. The representation theorem of Tversky & Kahneman [1991] imposes those two axioms in addition to continuity. In Appendix A.8, we show that an affine CTM with a Gain-Loss category function satisfies the two classic axioms and continuity if and only if it has a TK representation. We provide a detailed examination of the BGS model in Section 4.
14 Whenever $c(r) = c(r')$ for every $r, r' \in X$.
The behavior in this example is both intuitively and formally consistent with the salient thinking model of BGS. 15 Without any promotion, the consumer expects to pay a high price for a relatively low quality selection. When choosing between Syrah or Shiraz, the consumer focuses on the French wine's sublime quality, and she is willing to pay at least $6 more for it. When choosing between water and Syrah, the low price of water stands out and she reveals that the gap between wine and water is less than $8. However, when there is no promotion, she focuses again on the quality, and she is willing to pay an additional $2 for even her less-preferred Australian Shiraz over water.
Notice that this explanation does not require that the reference points are different.
Since the consumer visits this bar regularly, intuitively, her reference point should be fixed and stable.
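The numerical illustration given in the accompanying footnote can be checked directly with a fixed reference point. The following is a minimal sketch, assuming the linear BGS specification in which the salient attribute receives weight w = 0.6 and the other attribute 1 − w, and assuming the salience function σ(a, b) = |a − b| / (|a| + |b|), a common homogeneous-of-degree-zero choice that the text itself does not pin down. The function names and the equal-weight treatment of ties are hypothetical.

```python
def sigma(a, b):
    """Assumed salience function |a - b| / (|a| + |b|), with sigma(0, 0) = 0."""
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom


def bgs_value(x, r, w=0.6):
    """Linear salient-thinking value of x = (-price, quality) at reference r."""
    s_price, s_quality = sigma(x[0], r[0]), sigma(x[1], r[1])
    if s_price > s_quality:      # price is salient
        return w * x[0] + (1 - w) * x[1]
    if s_quality > s_price:      # quality is salient
        return (1 - w) * x[0] + w * x[1]
    return 0.5 * (x[0] + x[1])   # tie: equal weights (an assumption)


# Qualities and reference point as reported in the footnote.
q_fs, q_as, q_w = 8.0, 6.9, 5.1
r = (0.5 * (-10 + -8), 0.5 * (q_w + q_as))

comparisons = [
    ((-8, q_fs), (-2, q_as)),    # Syrah at $8 versus Shiraz at $2
    ((-2, q_w), (-10, q_fs)),    # water at $2 versus Syrah at $10
    ((-10, q_as), (-8, q_w)),    # Shiraz at $10 versus water at $8
]
strict = [bgs_value(x, r) > bgs_value(y, r) for x, y in comparisons]
print(strict)
```

Under these assumptions all three strict rankings hold at the same reference point. An additive representation satisfying Cancellation would combine the first two rankings to conclude that (−8, q_w) is weakly preferred to (−10, q_as), so the third ranking is where Cancellation fails.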
3.5. Non-increasing CTM. For simplicity, we have so far focused on increasing CTM. This is a desirable feature in consumer choice, but models of social preference often violate this property. For instance, an inequality-averse individual 1 prefers to increase the allocation to individual 2 from x to y when she feels guilty but not when she is envious. However, she always prefers to increase the allocation to 2 in an allocation categorized as guilty, and to decrease it in any categorized as envious. This contradicts Category Monotonicity, suggesting the following weakening.
Axiom (Consistent Preference within Category, CPC). For each category k, there exists a set of attributes $P_k$ so that if $x_j \geq y_j$ for all $j \in P_k$, $y_i \geq x_i$ for all $i \notin P_k$, and $x \neq y$, then $y \prec^{*}_{k} x$.
The set $P_k$ contains the attributes for which an increase positively affects the DM's evaluation. CPC requires that the set of positive attributes in a category does not depend on the reference point. For the two-person-RIA model, the set for the "guilty" category is {1, 2} since she strictly prefers increasing everyone's allocation, but the set for the "envious" one is {1}: she prefers more for herself but dislikes others having even more. Note that CM is the special case of CPC where $P_k$ includes every dimension for every category.
15 Implicitly, the example reveals that the quality of French Syrah is higher than Australian Shiraz which is in turn higher than water. The numerical value of quality assigned to each beverage is irrelevant to the violation of Cancellation. For examples of qualities so that choice can be represented by the BGS model, one can calculate that $(-8, q_{fs}) \succ_r (-2, q_{as})$, $(-2, q_w) \succ_r (-10, q_{fs})$ and $(-10, q_{as}) \succ_r (-8, q_w)$ for $q_{fs} = 8$, $q_{as} = 6.9$, $q_w = 5.1$, and the reference point $r = (\tfrac{1}{2}(-10 + -8), \tfrac{1}{2}(q_w + q_{as}))$ when $w = 0.6$.
A CTM is characterized by all the properties of an increasing CTM, except that CM is replaced by CPC. The proof is a straightforward generalization of the earlier one, so it is omitted.

BGS Model and Categories
The BGS model is intuitive, tractable, and accounts for a number of empirical anomalies for the neoclassical model of choice. Despite its popularity, it can be difficult to understand all of the implications of the BGS model. Its new components are unobservable, and its functional form rather involved.
The first crucial step towards understanding the model is getting a handle on the novel salience function that determines which attribute stands out for a given reference point. While one can work out the implications of a particular salience function, this exercise is not fruitful since the particular function that applies to a given agent is unobservable. Moreover, it is not clear how the model changes when the underlying salience function changes.
CTM provides a lens through which we can study the salience function. While it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. Therefore, its role is simply to divide the domain into distinct categories, each associated with a particular attribute being most salient. We study the salience function by focusing on the properties of the categories it generates.
Categories are generated by a function s : only if $s(x_i, r_i) > s(x_j, r_j)$ for all $j \neq i$. In the BGS model, categories are generated by a salience function σ that must satisfy the following properties. First, it increases in contrast, i.e. for $\epsilon > 0$ and $a > b$, $\sigma(a + \epsilon, b) > \sigma(a, b)$ and $\sigma(a, b - \epsilon) > \sigma(a, b)$. Second, it is continuous in both arguments. Third, it is symmetric, i.e. $\sigma(a, b) = \sigma(b, a)$. Two other properties are sometimes assumed: σ is Homogeneous of Degree Zero (HOD) if for all $\alpha > 0$, $\sigma(\alpha a, \alpha b) = \sigma(a, b)$, and σ has diminishing sensitivity if for all $\epsilon > 0$ and σ(a, b). 16 Finally, we always impose that the salience function is grounded: $\sigma(r, r) = \sigma(r', r')$ for all $r, r' \in X$. This is an implication of HOD satisfied by all of the specifications of which we are aware in the literature, and is a necessary condition for an attribute to be salient only if it differs from the reference.
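To make these requirements concrete, the properties can be spot-checked numerically for a candidate salience function. The sketch below assumes σ(a, b) = |a − b| / (|a| + |b|) on strictly positive attribute values; this functional form, the grid, and the helper names are illustrative assumptions rather than part of the model. A finite grid check can only falsify a property, not prove it.

```python
from itertools import product


def sigma(a, b):
    """Assumed salience function |a - b| / (|a| + |b|), with sigma(0, 0) = 0."""
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom


GRID = [0.5, 1.0, 2.0, 3.5, 7.0]   # strictly positive attribute values
EPS = [0.1, 0.25]                  # small enough that b - eps stays positive
TOL = 1e-12


def increases_in_contrast():
    # For eps > 0 and a > b: sigma(a + eps, b) > sigma(a, b)
    # and sigma(a, b - eps) > sigma(a, b).
    return all(
        sigma(a + e, b) > sigma(a, b) and sigma(a, b - e) > sigma(a, b)
        for a, b, e in product(GRID, GRID, EPS)
        if a > b
    )


def symmetric():
    return all(abs(sigma(a, b) - sigma(b, a)) <= TOL
               for a, b in product(GRID, GRID))


def homogeneous_of_degree_zero():
    return all(abs(sigma(k * a, k * b) - sigma(a, b)) <= TOL
               for a, b, k in product(GRID, GRID, [0.5, 2.0, 10.0]))


def grounded():
    # sigma(r, r) takes the same value for every r.
    return len({sigma(v, v) for v in GRID}) == 1


print(increases_in_contrast(), symmetric(),
      homogeneous_of_degree_zero(), grounded())
```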
Consider the following properties of categories.

S6: (Equal Salience) For any
The properties have natural interpretations. Any category function satisfies S0 by definition; we include it for completeness. S1 indicates that making a bundle's less salient attribute closer to the reference point does not change the salience of the bundle.
That is, when x and y differ only in attribute l, and y is closer to the reference in that attribute, if x is k-salient, then so is y. S2 requires that the same ranking is used for  out more relative to r 1 than a 2 does to r 2 , and a 2 stands out more relative to r 2 than a 3 does to r 3 , then a 1 stands out more relative to r 1 than a 3 does to r 3 . S4 says simply that any difference stands out more than no difference. S5 implies that increasing both the good and the reference by the same amount in the same dimension does not move the good from one category to another. S6 reads that if every attribute of x differs from the reference point by the same percentage, then none of the attributes stands out. More formally, if the percentage difference between x k and r k is the same across attributes, then x is not k-salient for any k ∈ {0, 1}.  The proof provides an algorithm for this in general. We illustrate for the case where u 1 and u 2 are linear. Fixing a reference point r, any alternative that differs only in dimension i from r must be i-salient. Hence, we can identify the weights on dimensions within each category from the slope of the indifference curve passing through that alternative. Now, we can test whether y is 1-salient by seeing if the indifference curves close to it are those generated by the weights for 1-salient alternatives. Varying y and r allows identification of the salience function, and hence the categories.
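The identification argument for the linear case can be illustrated numerically. The sketch below is only a stylized stand-in: it assumes the analyst can trace local indifference curves, which is modeled here by a hypothetical `value` oracle built from the linear BGS form with an assumed weight W = 0.6 and an assumed salience function σ(a, b) = |a − b| / (|a| + |b|). The procedure itself uses only local slopes: it first reads off the slope at alternatives that differ from r in a single dimension, then classifies y by matching the slope near y.

```python
def sigma(a, b):
    """Assumed salience function |a - b| / (|a| + |b|), with sigma(0, 0) = 0."""
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom


W = 0.6  # hypothetical weight on the salient attribute


def value(x, r):
    """Stand-in for the DM's evaluation; the analyst never sees this directly."""
    s1, s2 = sigma(x[0], r[0]), sigma(x[1], r[1])
    w1 = W if s1 > s2 else (1 - W) if s2 > s1 else 0.5
    return w1 * x[0] + (1 - w1) * x[1]


def local_mrs(x, r, h=1e-4):
    """Finite-difference slope of the indifference curve through x."""
    d1 = value((x[0] + h, x[1]), r) - value((x[0] - h, x[1]), r)
    d2 = value((x[0], x[1] + h), r) - value((x[0], x[1] - h), r)
    return d1 / d2


def revealed_category(y, r, tol=1e-6):
    """Return 1 or 2 if y's local slope matches that category, else None."""
    # Alternatives differing from r only in dimension i must be i-salient.
    mrs_1 = local_mrs((r[0] + 1.0, r[1]), r)
    mrs_2 = local_mrs((r[0], r[1] + 1.0), r)
    m = local_mrs(y, r)
    if abs(m - mrs_1) < tol:
        return 1
    if abs(m - mrs_2) < tol:
        return 2
    return None


r = (3.0, 3.0)
print(revealed_category((6.0, 3.5), r), revealed_category((3.2, 6.0), r))
```

Repeating the classification over many y and r traces out the revealed categories, which is the sense in which varying y and r identifies the salience function.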
In addition to the particular form of categories, BGS satisfies several properties that distinguish it from other CTMs. The most general of these is Reference Irrelevance, above, making BGS a strong CTM. The other follows.

Axiom 9 (Salient Dimension Overweighted, SDO). For any $x, y, r, r' \in X$: if $x, y \in K^k(r) \cap K^l(r')$, $x \succsim_r y$, $x_l > y_l$, and $y_k > x_k$, then $x \succ_{r'} y$.
This axiom requires that categories correspond to the dimension that gets the most weight. That is, the DM is more willing to choose an alternative whose "best" attribute is k when it is k-salient. To illustrate, consider alternatives x, y with $x_1 > y_1$ and $y_2 > x_2$. Because x is relatively strong in attribute 1, x should benefit more than y from a focus on it. If x is chosen over y when attribute 2 stands out for both, then this advantage in the first dimension is so strong that even a focus on the other one does not offset it. Hence, the DM should surely choose x over y when attribute 1 stands out for both.
cardinal structure on X. Properties S0-S4 are defined. Subsequent results that rely on Theorem 5, such as Propositions 2 and 3, remain true when imposing only S0-S4 in this setting.

Proposition 2.
Assume that there exists $x \in K^k(r)$ and $y \in K^j(r)$ with $x \sim_r y$ for any categories k, j and any $r \in X$.
This result characterizes the BGS model. It also provides guidance for comparing it with other models in the CTM class (see Figure 1 and Table 1). By outlining the model's testable implications, the result provides guidance on how to design experiments to test it. 18
In their 2013 paper, BGS focus on a special case where the model is linear: $w^1_1 = w^2_2 = 1 - w^2_1 = 1 - w^1_2 > \tfrac{1}{2}$ and $u_1(x) = u_2(x) = x$. In an earlier version of this paper, we show this model is characterized by strengthening Affine Across Categories to require linearity and imposing a reflection axiom that requires permuting two alternatives and the reference point in the same way not to reverse the DM's choice between the two. 19
Taken together, Propositions 1 and 2 provide an outline for a fully subjective axiomatization of a family of preferences with a BGS representation. Proposition 1 shows that we can reveal a category function from the family of preferences, provided they have a representation. We check whether these revealed categories exist and satisfy S0-S5. If so, then the axioms shown necessary by the second result apply with this revealed category function.
18 The assumption that alternatives indifferent to each other exist in each category for each reference point is not strictly necessary. A sufficient condition for it to be necessary is that the utility indexes are both unbounded above (or below).
19 Formally, the first is that Affine Across Categories holds with $\oplus_k$ replaced by the usual + operation. The second is that $(a, b) \succsim_{(r_1, r_2)} (c, d)$ if and only if $(b, a) \succsim_{(r_2, r_1)} (d, c)$. One can verify that these additional assumptions imply that the ancillary assumption about indifference holds.

Choice Correspondence
In this section, the modeler observes only the DM's choice from a finite subset of choices and nothing more. A model consists of both a theory of reference formation and a theory of choice given categorization. In this setting, we can jointly test the theory of choice given categorization, categorization given reference, and reference formation.
We model reference formation via a reference generator, a map from finite subsets of alternatives to reference points. We denote the reference generator $A : 2^S \setminus \emptyset \to X$, with the interpretation that $A(S)$ is the reference point when the menu is $S$. Examples include the BGS theory that A(S) is the average alternative, that A(S) is the median bundle, that A(S) is the upper (or lower) bound of S, and the Köszegi & Rabin

Fixing a categorization function K and a reference generator A, let $\mathcal{X}$ be the set of finite and non-empty subsets of X such that every alternative is categorized. Formally,

$\mathcal{X} = \{ S \subset X : 0 < |S| < \infty \text{ and } S \subseteq \bigcup_k K^k(A(S)) \}$.

We call these menus or categorized menus for short. The requirement ensures that each alternative in the choice set belongs to a category given the reference point A(S). We leave open how the DM chooses when alternatives that are uncategorized belong to the choice set. By leaving the choice from this small set of menus ambiguous, we can more clearly state the properties of choice implied by the model. 20
20 One can, of course, extend the model to account for these choices. For instance, BGS hypothesize that these alternatives are evaluated according to their sum. Complications arise because the set of uncategorized alternatives is "small": its complement is open and dense, and moreover it has zero measure.
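We summarize the DM's choices by a choice correspondence $c : \mathcal{X} \rightrightarrows X$ on the set $\mathcal{X}$ of categorized menus, with $c(S) \subseteq S$ and $c(S) \neq \emptyset$ for each $S \in \mathcal{X}$. Adapted to this setting, the model has the following representation.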

Reference point formation.
Provided that the reference generator is responsive enough to changes in the menu, there is the possibility of testing the properties required by categorization on r . One example of enough structure is that the reference point is the average bundle. However, this is just one example. An even more general sufficient condition is as follows.

Examples of generalized average reference include the average bundle
$A(S) = \frac{1}{|S|} \sum_{x \in S} x$,
the median value of each attribute, and a weighted average
$A(S) = \left( \frac{\sum_{x \in S} w(x) x_1}{\sum_{x \in S} w(x)}, \frac{\sum_{x \in S} w(x) x_2}{\sum_{x \in S} w(x)} \right)$
for any continuous weight function $w : X \to [a, b]$ with $b > a > 0$. We sometimes impose the additional requirement that $A(S) \in \mathrm{co}(S) \setminus \mathrm{ext}(S)$ for all non-singleton S; if so, we call A a strong generalized average. The first and last of these examples satisfy this property. The supremum and infimum on their own are not weighted averages, nor (necessarily) is the choice acclimating reference generator, c(S) = A(S). 21
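The reference generators named above are simple to compute for a finite menu. The following sketch represents bundles as tuples of floats; the function names and the particular weight function are hypothetical and chosen only for illustration.

```python
from statistics import median


def average_reference(menu):
    """Average bundle: the attribute-wise mean over the menu."""
    menu = list(menu)
    dims = range(len(menu[0]))
    return tuple(sum(x[i] for x in menu) / len(menu) for i in dims)


def median_reference(menu):
    """Median value of each attribute."""
    menu = list(menu)
    return tuple(median(x[i] for x in menu) for i in range(len(menu[0])))


def weighted_average_reference(menu, weight):
    """Weighted average with a weight function bounded away from zero."""
    menu = list(menu)
    total = sum(weight(x) for x in menu)
    return tuple(sum(weight(x) * x[i] for x in menu) / total
                 for i in range(len(menu[0])))


def sup_reference(menu):
    """Attribute-wise upper bound of the menu; not a weighted average."""
    menu = list(menu)
    return tuple(max(x[i] for x in menu) for i in range(len(menu[0])))


def example_weight(x):
    # Illustrative continuous weight taking values in [1, 2].
    return 1.0 + 1.0 / (1.0 + x[0] ** 2)


menu = [(1.0, 5.0), (2.0, 1.0), (9.0, 3.0)]
print(average_reference(menu), median_reference(menu),
      weighted_average_reference(menu, example_weight), sup_reference(menu))
```

For a non-singleton menu, a weighted average with strictly positive weights puts positive weight on every bundle, so it cannot coincide with an extreme point of the convex hull. The attribute-wise supremum can fall outside the convex hull altogether.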

Behavioral Foundations for Strong-CTM.
We now consider the behavior by a DM who conforms to Strong-CTM for a given category function and reference generator. To do so, we make use of our earlier analysis by revealing how the DM evaluates alternatives categorized in a given way. When A(S) is a generalized average, this provides enough structure to identify enough of the family to apply our earlier analysis.
The main behavioral content comes from the choice correspondence equivalent of Reference Irrelevance. To state it, we introduce the following definition and notation.
Definition 5. The alternative x in category k is indirectly revealed preferred to alternative y in category j, written $(x, k)\, R\, (y, j)$, if there exist finite sequences of $S_n$, and for each i:
We replace Reference Irrelevance with the following weakening of the Strong Axiom of Revealed Preference (SARP).
We first illustrate in a simple two-menu setting, analogous to a test case for the Weak Axiom of Revealed Preference (WARP). Consider two menus $S_1$ and $S_2$ and two chosen products $x_1 \in c(S_1)$ and $x_2 \in c(S_2)$ where both products are categorized in the same way for both menus. For example, $x_1$ is in category 1 for both menus, and $x_2$ is in category 2 for both. The observation $x_1 \in c(S_1)$ reveals that the valuation of $x_1$ is at least as high as that of $x_2$ when $x_1$ belongs to the first category and $x_2$ to the second. Since the categorization of products does not change when the menu changes from $S_1$ to $S_2$, their relative valuation stays the same as well. Hence, if $x_2$ is chosen from $S_2$, then $x_1$ must be chosen too. Since neither product's category has changed, the DM should obey WARP for these two menus. However, the axiom leaves open the possibility of a WARP violation when either is differentially categorized.
21 Recall $\sup S = (\max_{x \in S} x_1, \max_{x \in S} x_2)$ and $\inf S$ is defined analogously.
The axiom extends this logic to sequences of choices in much the same way that SARP does to WARP. A finite sequence of choices, where the choice from the next menu is available in the current one and has the same salience in both, does not lead to a choice reversal. Since salience does not change along the sequence of choices, the choices do not exhibit a reversal.
Category SARP limits the effect of unchosen alternatives. Modifying them can alter the DM's choice, but only insofar as changing them changes the reference point and thus the salience of alternatives. It states that these unchosen options do not alter the relative ranking of two alternatives, unless they change the region to which the alternatives belong. That is, when comparing the same two alternatives in different menus, the DM's relative ranking does not change when neither's salience changes.
This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal.
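The two-menu version of this restriction can be turned into a simple test on choice data. The sketch below implements only the pairwise (WARP-like) check described above, not the full sequence-based axiom. It assumes the average bundle as the reference generator and, for categorization, the salience function σ(a, b) = |a − b| / (|a| + |b|); both are illustrative assumptions, and all function names are hypothetical.

```python
def sigma(a, b):
    """Assumed salience function |a - b| / (|a| + |b|), with sigma(0, 0) = 0."""
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom


def average_reference(menu):
    menu = list(menu)
    return tuple(sum(x[i] for x in menu) / len(menu) for i in range(2))


def salience_category(x, r):
    """Return 1 or 2 for the salient attribute, or None if uncategorized."""
    s1, s2 = sigma(x[0], r[0]), sigma(x[1], r[1])
    return 1 if s1 > s2 else 2 if s2 > s1 else None


def category_warp_violations(observations, reference, category):
    """observations: list of (menu, chosen) pairs, each a frozenset of bundles.

    Flags (x1, x2, S1, S2) when x1 is chosen from S1 with x2 available,
    x2 is chosen from S2 with x1 available, neither bundle changes category
    between the two menus, and yet x1 is not chosen from S2.
    """
    violations = []
    for s1, c1 in observations:
        for s2, c2 in observations:
            r1, r2 = reference(s1), reference(s2)
            for x1 in c1:
                for x2 in c2:
                    if x1 not in s2 or x2 not in s1:
                        continue
                    cats = [category(x1, r1), category(x1, r2),
                            category(x2, r1), category(x2, r2)]
                    if None in cats:
                        continue  # uncategorized: the axiom is silent
                    unchanged = cats[0] == cats[1] and cats[2] == cats[3]
                    if unchanged and x1 not in c2:
                        violations.append((x1, x2, s1, s2))
    return violations


S1 = frozenset({(4.0, 2.0), (2.0, 4.0)})
S2 = frozenset({(4.0, 2.0), (2.0, 4.0), (5.0, 1.0), (1.0, 5.0)})
consistent = [(S1, frozenset({(4.0, 2.0)})), (S2, frozenset({(4.0, 2.0)}))]
reversal = [(S1, frozenset({(4.0, 2.0)})), (S2, frozenset({(2.0, 4.0)}))]
print(len(category_warp_violations(consistent, average_reference, salience_category)))
print(len(category_warp_violations(reversal, average_reference, salience_category)))
```

In this example both menus have the same average bundle, so no alternative changes category and the reversal in the second data set is flagged. Had the added alternatives moved the reference enough to recategorize one of the two bundles, the same reversal would be permitted.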
The remaining axioms are the natural generalizations to the choice correspondence of Category Cancellation, Category Monotonicity, Category Continuity, Reference Interlocking, and Affine Across Categories. We denote these by appending a "*" to distinguish from their reference-dependent-preference formulation. Appendix B.1 contains their formal statement.
As before, we require some additional topological structure on the categories. For a category k, let The generalization of the structure assumption is as follows.
Assumption (Revealed Structure). For any category k, E R,k is open, E R,k is dense in D k , and the following sets are connected: E R,k , {x ∈ E R,k : x j = s} for all dimensions j and scalars s ∈ R, and {y ∈ E R,k : (x, k) ∼ R (y, k)} for all x ∈ E R,k .
In addition to what was imposed by the Structure Assumption, we require that almost all objects categorized in a category are chosen in some menu. This can be weakened, but is typically satisfied by the models in which we are interested, such as BGS.
We require one last assumption.
Axiom (Comparability Across Regions, CAR). If x ∈ E R,k , then for any j there exists This is a version of the assumption we made for Strong CTM. It requires that every alternative chosen when it belongs to category k is revealed to be equally good to some other alternative when it is categorized in category j. With it, we can now state the result.

Theorem 6. Assume that Revealed Structure and CAR hold and that A is a generalized average. A choice correspondence c conforms to strong-CTM under (K, A) if and only if c satisfies Category-SARP, Category Monotonicity*, Category Cancellation*, Category Continuity*, and Affine Across Categories*.
The result is the counterpart of Theorem 4 with an endogenous reference point.
The behavior corresponding to categorization does not fundamentally change across settings. As long as the DM reacts consistently when alternatives are categorized in the same way, then we can represent her choices as categorical thinking where the reference point only affects how she categorizes each alternative. The key challenge in the proof is to establish that the arguments we used to establish our earlier results still hold. We adapt our earlier arguments to show that revealed preference within category k is complete on E R,k . This relies on small changes in alternatives not changing choice, a property implied by generalized average. Then, the remaining axioms establish that this within-category preference has an additive representation. CAR allows us to extend across categories.

Behavioral Foundations for BGS.
In this subsection, we provide a behavioral foundation for BGS. The first step is to show that the Revealed Structure assumption holds.

Lemma 1. If A is a strong generalized average, K satisfies S0, S1, and S4, and c satisfies Category Monotonicity*, then E R,k = $\mathbb{R}^2_{++}$ for k = 1, 2.
Given the assumptions we have made so far, every alternative is chosen in some menu when it is k-salient. Consequently, the revealed structure assumption must hold. While we argue in this paper that a number of prominent behavioral economic models can be thought of as resulting from categorization, few papers in economics explicitly address categorization. Mullainathan [2002]  is not a subset of B z j for any j, j and θ(z j ) < θ(z j+1 ), so x ∈ B z 1 and y ∈ B zm . Moreover, since θ crosses each indifference curve only once, if z k k * z k+1 (z k ≺ k * z k+1 ) for any k, then z j k * z j (z k k * z k+1 ) for any j > j. W.L.O.G. consider the former. Pick a 1 ∈ B z 1 B z 2 Y so that x k a 1 and then pick a j ∈ B z j B z j+1 Y so that a j−1 k a j . Then, x k * a 1 k * a 2 * · · · k * a m k * y. Since k * is transitive, we conclude x k * y. Since x, y were arbitrary, k * is complete.
Apply CW Theorem 2.2 to get an additive representation Proof. If neither (ii) nor (iii) holds, then after relabeling categories if necessary, there exist x ∈ K i (r) and y, z ∈ K j (r) such that y r x r z. Let U C j (x) and LC j (x) be the strict upper and lower contour sets of x in category j for reference r. Any ] is indifferent to x, so either (i) holds or the set is empty. There exists an > 0 such that for every x ∈ B (x), y r x r z by Category Continuity and hence K j (r) = U j (x ) and K j (r) = L j (x ). By Category Continuity, there exists x ∈ B (x) such that K j (r)\[U C j (x ) LC j (x )] = ∅ (otherwise, B (x) is contained in the interior of the set considered), so we can take y ∈ K j (r) \ [U C j (x ) LC j (x )] and conclude y ∼ r x .
We omit the dependence on r when clear from context. Define the relation r by x r y if there exists an indifference sequence of categories (Q 1 , . . . , Q m ) with x ∈ Q 1 and y ∈ Q m . It is easy to see that r is an equivalence relation (reflexive, symmetric, and transitive). Let [x] r denote the r equivalence class of x.

Lemma 5. If $y \notin [x]_r$ and $x \succ_r y$, then $x' \succ_r y'$ for all $x' \in [x]_r$ and $y' \in [y]_r$.
Proof. Fix x, y, r ∈ X with y / ∈ [x] r and x r y, and assume x ∈ K k . Pick any y ∈ [y] r . By definition, there is an IS (Q 1 , . . . , Q m ) with y ∈ Q m and y ∈ Q 1 . Let i = 1 and y 1 = y. If there exists y ∈ Q i with y r x, then y r x r y i , so by Lemma 4, we can find z ∈ Q i and x ∈ K k with z ∼ r x . If that occurs, then (K k , Q i , . . . , Q 1 ) is an IS and y ∈ [x] r , a contradiction. Thus x r y for all y ∈ Q i . Now, there exists y i+1 ∈ Q i+1 with x r y i+1 by transitivity and definition of IS. Hence, we can apply above logic to Q i+1 as well: x r y for all y ∈ Q i+1 . Inductively, this extends all the way to Q m , so x r y in particular. Since y is arbitrary, this extends to any y ∈ [y] r .
Similar arguments show that x r y for any x ∈ [x] r . Combining, x r y whenever x ∈ [x] r and y ∈ [y] r .
Fix a reference point r. Let A 1 , . . . , A n be the distinct equivalence classes of r . By Lemma 5, these sets can be completely ordered by r , i.e. A i r A j ⇐⇒ x r y for all x ∈ A i and y ∈ A j . Label so that A 1 r A 2 r · · · r A n .
Pick an indifference class A i and an IS Q 1 , . . . , Q M that contains points in every There is no loss in assuming that V i is bounded, and the closure of its range is an interval. 23 Now, assume inductively that, for a given m ≤ k, V i represents r when restricted to m−1 j=1 Q j ≡ Q m−1 , is bounded, is continuous on Q m−1 , and is an increasing transformation of U k within Q j when Q j = K k (r). Then, extend V i to Q m as follows. By Lemma 5, it is impossible that y r x for every x ∈ Q m−1 and every y ∈ Q m . It will be convenient to relabel regions so that Q m = K m (r).
Pick a bounded, strictly increasing, continuous h : R → R. For any x ∈ K m (r) so that x r y for all y ∈ Q m−1 , set For any x ∈ K m (r) for which there exists y, y ∈ Q m−1 so that y r x r y , let For all other x ∈ K m (r), let This V i is bounded and continuous.
We now show that it represents r on Q m . Pick x, y ∈ Q m . There are four cases: Case 1: x, y ∈ Q m−1 : then the claim follows by hypothesis. Case 2: x ∈ K m (r) and either x r y for all y ∈ Q m−1 or y r x for all y ∈ Q m−1 : the claim is immediate. Case 3: x ∈ K m (r) and y ∈ Q m−1 : If y r x, then y − r x for some > 0 so that y − belongs to the same region as y. If y ∼ r x, then V i (y) ≥ V i (x). If this does not hold with equality, then there is a y ∈ Q m−1 so that y r x and y r y (since y r y). But then y r x, a contradiction. If x r y but V i (y) ≥ V i (x), there exists z ∈ Q m−1 so that V i (z) ≤ V i (y) and z r x. But then by transitivity and hypothesis, y r z r x. Case 4: x, y ∈ K m (r) and Case 2 does not hold for either x or y: Suppose x r y. If not, then V i (y) > V i (x) so there exists a z ∈ Q m−1 so that z r x and z r y. By weak order, y r z and so y r x, a contradiction.
Since it represents r on K m (r), it also agrees with m on K m (r). Hence it is an increasing transformation of For any x, y ∈ A i , the above establishes that V i (x) ≥ V i (y) ⇐⇒ x r y. For any x ∈ A i and y ∈ A j where i < j, x r y by Lemma 5 and construction. Since . Define U k (·|r) to agree with the appropriate restriction of V i , and conclude { r } r∈X conforms to CTM under K. Since r was arbitrary, this completes the proof.
A.2. Proof for Theorem 2. Sufficiency is easy to verify. Suppose that U k (x) = n i=1 U k i (x i ). We show that for every category j there exists a vector w 0 so that Consider dimension 1, and the rest follow the same arguments. The goal is to If this is the case, then standard uniqueness results give that U j 1 (x) = αU k 1 (x) + β. The β can be dropped completing the claim.
Let π i be the projection onto the i-coordinate. Then, E k 1 = π 1 (E k ) is open and connected for any category k. This follows from E k connected and open and π i continuous. In R, connected implies convex.

Claim 1. For any
To see it is true, pick x ∈ E k 1 E j 1 . Then there is an a l ∈ E l with a l 1 = x for l = k, j. Let U k −i (y) = j =i U k j (y j ) for any y ∈ X. Since each a l ∈ K l (r l ) for some r l ∈ X, there exists an l > 0 so that B 2 l (a l ) ⊂ K l (r l ) ⊂ E l , where the distance is given by the supnorm. Pick ∈ (0, l ) so that by Category Continuity and CM. In particular, . By Reference Interlocking and weak order, , so we conclude that the claim holds with x = .
We now extend to the entire domain (this follows similar arguments in CW). Pick an arbitrary x * < x * ∈ E k 1 E j 1 and consider Z = (x * , x * ]. If the claim is true, then standard uniqueness results give that U j 1 (x) = αU k 1 (x) + β for all x ∈ O z for some α > 0. Let α * , β * be the constants so that U j 1 (x) = α * U k 1 (x) + β * for all x in the neighborhood of x * , as guaranteed to exist by the claim.
Let  inductively. If there is no intersecting category, we can start again and obtain a (disjoint) interval, the values of U i 1 (and U j 1 ) on which have no bearing on the DM's choices. Similar arguments obtain for the other dimensions. Moreover, there is no loss in setting each β = 0. This completes the proof.
A.3. Proof of Theorem 3. To save notation, until after Lemma 10, we fix r and write K k instead of K k (r) and instead of r . We also identify xα k y with the alternative αx ⊕ k (1 − α)y. Let (U 1 , . . . , U n ) be the additive functions that represent Recall from Definition 6 that an indifference sequence is a finite sequence of categories with indifference between each succeeding members.
Definition 7. The function v is a utility for the indifference sequence (Q 1 , . . . , Q m ) if v is an increasing additive utility function on each Q k and for all k, x, y ∈ Q k Q k+1 : For any y ∈ K l and α such that yα l x l ∈ B l (x l ), there exists β ∈ (0, 1) such that x * β k x * ∼ yα l x l by Category Continuity, CM, and that is a weak order. Let V l (y) = α −1 U k (x * β k x * ). This is well defined, additive, increasing, and ranks alternatives in the same way as U l . Thus, V l (y) = aU l (y) + b for some a > 0 and b ∈ R.
For any x ∈ K k and y ∈ K l , pick α ∈ [0, 1] such that xα k x k ∈ B k (x k ) and yα l x l ∈ B l (x l ). By construction, yα l x l ∼ y when y ∈ B k (x k ) and U k (y ) = αV l (y). Thus, xα k x k y ∼ yα l x l holds if and only if U k (x) ≥ V l (y) and x y ⇐⇒ xα k x k yα l x l by AAC since x k ∼ x l , completing the proof.
For an indifference sequence (Q 1 , . . . , Q m ) with utility v, we label the range of utilities as cl(v(Q k )) = [l k , u k ] where l k ≤ u k . Note that we allow Q k = Q l for k = l.
Lemma 7. For an indifference sequence (Q 1 , . . . , Q m ), there is an affine, increasing utility v for it.
Proof. The proof is by induction. We claim that there is a utility v k : X → R that is a utility for the IS (Q 1 , . . . , Q k ) for any k. When k = 1 or k = 2, this is true by the above lemmas. The induction hypothesis (IH) is that the claim is true for k = N . Consider k = N + 1. Let v N be the utility for (Q 1 , . . . , Q N ) be index that exists by the IH. If Q N +1 ⊆ N i=1 Q i , then we are done. If not, then for Q N = K l , there is no loss in normalizing v N so that it equals U l on K l (r). Suppose Q N +1 = K j (r), and let α, β be the scalars claimed to exist by Lemma 6, so that U j (x) ≥ αU l (y) + β ⇐⇒ x r y for x ∈ K k (r) and y ∈ K l (r). Restricted to Then, if l < N and x, y ∈ Q l Q l+1 , then we are done by the IH, . If x, y ∈ Q N Q N +1 , then Lemma 6 and construction implies the result. The claim then holds by induction.
Proof. The Lemma is vacuously true for any 1 or 2-element IS. Fix an IS (Q 1 , . . . , Q n ) with n ≥ 3 and v as above, and suppose . . , Q n ) is an IS; it remains to be shown that v is a utility for it. There is an > 0 s.
This establishes the Lemma.
Lemma 9. Fix an indifference sequence (Q 1 , . . . , Q n ) with utility v. If (l 1 , u 1 ) (l n , u n ) = ∅, then there exists i and Proof. If there is i with (l i , u i ) (l i+2 , u i+2 ) = ∅, then there is u ∈ j=i,i+1,i+2 (l j , u j ) so there exists x j ∈ Q j with v(x j ) = u for j = i, i + 1, i + 2 and thus by the hypothesis, We show there exists such an i by contradiction. If l i+2 > u i for all i or l i > u i+2 for all i, then (l 1 , u 1 ) (l n , u n ) = ∅, a contradiction. So there must exist i such that [l i+2 > u i and l i+2 > u i+4 ] or [u i+2 < l i and u i+2 < l i+4 ]. In the first case, l i+2 ∈ (l i+1 , u i+1 ) (l i+3 , u i+3 ); in the second, u i+2 ∈ (l i+1 , u i+1 ) (l i+3 , u i+3 ). In either case, we have a contradiction.
Lemma 10. Fix an indifference sequence (Q 1 , . . . , Q n ) with utility v. Then for all Proof. This is clearly true if n = 1. (IH) Suppose the claim is true for any IS with m < n elements. Fix an IS (Q 1 , . . . , Q n ) with utility v. If x / ∈ Q 1 Q n or y / ∈ Q 1 Q n , then the claim immediately follows from the IH, and clearly holds if x, y ∈ Q i for some i. So it suffices to consider arbitrary x ∈ Q 1 and y ∈ Q n . By Lemmas 8 and 9, if (u 1 , l 1 ) (l n , u n ) = ∅, we can form a shorter IS from Q 1 to Q n and the claim then follows from the IH.
There are two cases to consider: l n > u 1 and u n < l 1 . Consider l n > u 1 . The range of v restricted to n−1 i=1 Q i is dense in n−1 i=1 (l i , u i ) = (l,ū). Note l n ∈ (l,ū) since x n−1 ∼ y n , so (l n−1 , u n−1 ) (l n , u n ) = ∅. Then (l n , v(y)) is an open interval having a non-empty intersection with (l,ū). Since the range of v is dense in (l,ū), there exists y ∈ Q n with l n < v(y ) < v(y). Since l n > u 1 , n > 1. Then (Q 1 , . . . , Q n ) and (Q n , . . . , Q n ) are both ISes with strictly less than n elements. Applying the IH, y x and y y . Conclude using transitivity that y x. Similar arguments obtain the desired conclusion when u n < l 1 .
Define r as in the proof of Theorem 1, and let A 1 , . . . , A n be the distinct indifference classes of r . Again using Lemma 5, we can relabel so that x ∈ A i and y ∈ A i+1 implies x r y. By Lemma 10, there is v i on A i so that v i is additive and increasing within categories and x y ⇐⇒ v i (x) ≥ v i (y) for all x, y ∈ A i . By Unbounded and Lemma 5, every positive unbounded region (if any) is a subset of A 1 , and every negative unbounded region (if any) is a subset of A n . If one region is both positive and negative unbounded, then n = 1. Therefore, v i (A i ) is bounded for all i ∈ (1, n), and v n (A n ) is bounded above whenever n > 1. Define Observe V (·) is a positive affine transformation of v i (·) when restricted to A i , and if x ∈ A i , y ∈ A j and i > j, then V (x) > V (y). Thus V represents r and, when restricted to any given region, is affine and increasing.
Defining U k (·|r) as the (unique) affine transformation of U k so it agrees with V on K k (r) establishes that r is an affine CTM. Since r was arbitrary, this establishes that each r has such a representation. Conclude that { r } conforms to Affine CTM, completing the proof.
A.4. Proof of Theorem 4. Without loss of generality, normalize so that $U_1(\cdot|r) = U_1(\cdot|r')$ for all $r, r'$. Suppose $U_k(\cdot|r) \neq U_k(\cdot|r')$ for some $r, r'$ and some $k$. Then, let $\bar{\epsilon} = d(r, r')$ and pick a sequence $\bar{r}_n \to \bar{r}$ such that: $U_k(\cdot|\bar{r}_n) \neq U_k(\cdot|r)$, $\bar{r}_n \in B_{\bar{\epsilon}}(r)$ for all $n$, and $d(\bar{r}_n, r) \to \inf\{d(r', r) : U_k(\cdot|r) \neq U_k(\cdot|r')\}$. Since $\bar{r}_n \in \mathrm{cl}(B_{\bar{\epsilon}}(r))$, there is no loss in assuming this sequence converges. Similarly, let $r_n$ be a sequence in $B_{\bar{\epsilon}}(r)$ such that $r_n \to \bar{r}$ and $U_k(\cdot|r) = U_k(\cdot|r_n)$.
By hypothesis and the fact that each $K_k(r)$ is open, there exist $\epsilon > 0$, $x_k$ and $x_1$ such that $B_{2\epsilon}(x_k) \subset K_k(\bar{r})$, $B_{2\epsilon}(x_1) \subset K_1(\bar{r})$, and $x_k \sim_{\bar{r}} x_1$. By continuity of the region functions, $B_{\epsilon}(x_k) \subseteq K_k(r_n) \cap K_k(\bar{r}_n)$ and $B_{\epsilon}(x_1) \subseteq K_1(r_n) \cap K_1(\bar{r}_n)$ for $n$ large enough. For $z$ close enough to $x_k$, there exists $y(z) \in B_{\epsilon}(x_1)$ such that $z \sim_{\bar{r}} y(z)$. But then by SC, $z \sim_{r_n} y(z)$ and $z \sim_{\bar{r}_n} y(z)$. Thus $U_k(z|r_n) = U_1(y(z)|r_n) = U_1(y(z)|\bar{r}_n) = U_k(z|\bar{r}_n)$ for all $z$ close enough to $x_k$, implying that $U_k(\cdot|r_n) = U_k(\cdot|\bar{r}_n)$, a contradiction. Conclude $U_k(\cdot|r) = U_k(\cdot|r')$ for all $r, r'$.
A.5. Examples from Table 1. Example 1 shows that BGS violates Cancellation and inspecting Figure 1 shows it violates Monotonicity. It remains to show that TK violates Reference Irrelevance and that MO violates Cancellation. This is established by the following two examples.
A.6. Other models and CTM. In this subsection, we present the functional forms of the other models of salience we discussed, and show that they are not CTM.
• Gabaix [2014] assumes a rational DM would maximize $u(a, w)$ but actually maximizes $u(a, (w_1 m^*_1, \dots, w_n m^*_n))$ where $m^* \in \arg\min$ where $\Lambda_{ij}$ incorporates the "variance" in the marginal utility of dimensions $i$ and $j$. When $n$ is large, $m^*_i$ is often zero, so $(w_1 m^*_1, \dots, w_n m^*_n)$ is a "sparse" vector.
• Tversky & Kahneman [1991] refer in general to where $v_i$ is concave above 0 and convex below
• Bordalo et al. [2019] and the continuous form of the salient thinking model has where $w$ has the same properties as a salience function.
• Munro & Sugden [2003] use the functional form
• [2013] assume that the DM chooses the bundle $x$ that maximizes given a reference point $r$, where each $\alpha_i$ is increasing and positive.
The first fails to be CTM because its indifference curves have the same slope everywhere for a fixed context. If it were CTM, it would necessarily have only a single region, and single-region CTM coincides with the neoclassical model. The final four explicitly take a reference point into account. In all four, it is easy to see that the reference point affects the marginal rate of substitution between attributes. This implies a violation of weak reference irrelevance for any given category function, since that property requires that any two points in the same category that are indifferent to each other remain so after a sufficiently small change in the reference point.
A.7. Proof of Proposition 2. $K$ satisfying S0-S4 implies that $E^1 = E^2 = \mathbb{R}^n_{++}$, so the structure assumption is satisfied. Moreover, Theorem 5 gives that the categories are generated by a salience function. The axioms allow us to apply Theorems 2 and 4 to get a strong CTM representation of the family with reweighted utility indexes. Hence, $U_k(x) = w^k_1 u_1(x_1) + w^k_2 u_2(x_2) + \beta_k$ for each $x \in X$.
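To make the additive form concrete, the following sketch evaluates $U_k(x) = w^k_1 u_1(x_1) + w^k_2 u_2(x_2) + \beta_k$ for a two-attribute bundle. Everything numerical here is an assumption made for illustration: the logarithmic attribute utilities, the weights, and the intercepts in `PARAMS` are hypothetical and are not values implied by the proposition.

```python
import math

# Hypothetical parameters for two regions k = 1, 2: (w_1^k, w_2^k, beta_k).
# Region 1 overweights attribute 1; region 2 overweights attribute 2.
PARAMS = {
    1: (0.7, 0.3, 0.0),
    2: (0.3, 0.7, 0.0),
}


def attribute_utility(value):
    """Assumed attribute utility u_i (log); any increasing function would do."""
    return math.log(value)


def regional_utility(x, k):
    """Evaluate U_k(x) = w_1^k u_1(x_1) + w_2^k u_2(x_2) + beta_k."""
    w1, w2, beta = PARAMS[k]
    return w1 * attribute_utility(x[0]) + w2 * attribute_utility(x[1]) + beta


# The same bundle, evaluated through the lens of each region.
bundle = (math.e, 1.0)
values = {k: regional_utility(bundle, k) for k in PARAMS}
```

Under these assumed parameters the same bundle receives a different value in each region, which is the sense in which an object's category affects its valuation.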
A.8. TK. This subsection states and proves a characterization theorem for TK.

Proposition 5. A family of preferences $\{\succsim_r\}_{r \in X}$ has a TK representation if and only if it is an affine CTM with a gain-loss regional function that satisfies Reference Interlocking, Monotonicity, Cancellation, and continuity of each $\succsim_r$.
Tversky & Kahneman [1991, p. 1053] provide an alternative axiomatic characterization of the model, and our result makes heavy use of their theorem.
Proof. Necessity follows from the discussion above and TK's theorem. To show sufficiency, we rely on TK's theorem, which states that any monotone, continuous family of preference relations that satisfies cancellation, sign-dependence and reference interlocking has a TK representation. Given our assumptions, we need to show that $\{\succsim_r\}$ satisfies sign-dependence and reference interlocking.
TK say that $\{\succsim_r\}$ satisfies sign-dependence if "for any $x, y, r, s \in X$, $x \succsim_r y \iff x \succsim_s y$ whenever $x$ and $y$ belong to the same quadrant with respect to $r$ and with respect to $s$, and $r$ and $s$ belong to the same quadrant with respect to $x$ and with respect to $y$." This happens if and only if $x \in K_k(r) \cap K_k(s)$ and $y \in K_k(r) \cap K_k(s)$ for some $k \in \{1, 2, 3, 4\}$. Then, sign-dependence is exactly an implication of affine CTM, since $U_k(\cdot|r) = \alpha U_k(\cdot|s) + \beta$ for $\alpha > 0$.
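The region condition used in this step can be sketched in code. The block below assigns a two-attribute point to one of four gain-loss regions relative to a reference point and checks whether $x$ and $y$ share one region with respect to both $r$ and $s$. The numbering of the quadrants and the function names are assumptions made for illustration; points on a boundary are assigned to no region, in line with the regions being open.

```python
def gain_loss_region(x, r):
    """Return the quadrant label of x relative to r, or None on a boundary.

    Assumed labelling: 1 = gain in both attributes, 2 = gain in attribute 1
    only, 3 = gain in attribute 2 only, 4 = loss in both attributes.
    """
    d1, d2 = x[0] - r[0], x[1] - r[1]
    if d1 == 0 or d2 == 0:
        return None  # boundary points belong to no open region
    if d1 > 0:
        return 1 if d2 > 0 else 2
    return 3 if d2 > 0 else 4


def same_region_for_both(x, y, r, s):
    """True if x and y lie in one common region k under both r and s."""
    labels = {gain_loss_region(p, q) for p in (x, y) for q in (r, s)}
    return None not in labels and len(labels) == 1
```

For instance, `same_region_for_both((3, 3), (4, 2.5), (1, 1), (2, 2))` holds because both points are gains in both attributes under either reference point, so sign-dependence would require the two rankings of these points to agree.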
TK say that $\{\succsim_r\}$ satisfies reference interlocking if "for any $w, w', x, x', y, y', z, z'$ that belong to the same quadrant with respect to $r$ as well as with respect to $s$, $w_1 = w'_1$, $x_1 = x'_1$, $y_1 = y'_1$, $z_1 = z'_1$ and $x_2 = z_2$, $w_2 = y_2$, $x'_2 = z'_2$, $w'_2 = y'_2$, if $w \sim_r x$, $y \sim_r z$, and $w' \sim_s x'$ then $y' \sim_s z'$." The assumptions on quadrants imply that $w, w', x, x', y, y', z, z' \in K_k(r) \cap K_l(s)$ for some $k, l \in \{1, 2, 3, 4\}$. Since $y', z' \in K_l(s)$, the conclusion follows immediately from RI.
A.9. Example 4. Example 4. The regions generated by the following salience functions all satisfy S0-S3, but satisfy only a subset of the other properties.
That (ii) implies (i) follows from the first part and from the fact that S6 is implied by symmetry and homogeneity of degree zero. Now we show that (i) implies (ii). Set $\sigma(a, b) = \max\{a/b, b/a\}$. Clearly $\sigma$ is a salience function, and we show that $\sigma$ generates $K_1$ and $K_2$. Fix $r \in X$ and set $A = \{x : \sigma(x_1, r_1) > \sigma(x_2, r_2)\}$. We show $A = K_1(r)$.
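As a quick numerical companion to this construction, the sketch below implements $\sigma(a, b) = \max\{a/b, b/a\}$ and the set $A$, and spot-checks symmetry and homogeneity of degree zero on a few sample points. The sample values and function names are hypothetical, and a finite check of this kind illustrates the properties rather than proving them.

```python
def sigma(a, b):
    """Ratio salience of level a against reference level b (both > 0)."""
    return max(a / b, b / a)


def in_A(x, r):
    """Membership in A = {x : sigma(x_1, r_1) > sigma(x_2, r_2)}."""
    return sigma(x[0], r[0]) > sigma(x[1], r[1])


# Spot-check symmetry and homogeneity of degree zero on sample pairs.
samples = [(2.0, 4.0), (3.0, 1.5), (5.0, 5.0)]
symmetric = all(sigma(a, b) == sigma(b, a) for a, b in samples)
homogeneous = all(sigma(2 * a, 2 * b) == sigma(a, b) for a, b in samples)
```

With reference point `(2.0, 3.0)`, the bundle `(4.0, 3.0)` differs from the reference only in attribute 1 and so lies in $A$, while `(2.0, 6.0)` differs only in attribute 2 and does not.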
If $y \in K_k(r)$, then there exists $y' \in I^*(y)$ arbitrarily close to $y$ so that $y, y' \in K_k(r)$; for any such $y'$, $y \sim_r y'$. If $y \not\sim_r y'$ for every $y' \in I^*(y) \cap B_{\epsilon}(y)$ for some $\epsilon > 0$, then $y \notin K_k(r)$.
Fix two regions $k$ and $j$. By CAR, for any $x \in E^{R,k}$ there exist $x' \in E^{R,k}$, $y \in E^{R,j}$, and $S \in \mathcal{X}$ so that $x', y \in c(S)$ and $x \sim^{R,k} x'$. This implies there exists a strictly increasing function $H$ so that $V(x|r) = U_k(x)$ when $x \in K_k(r)$ and $V(x|r) = H(U_j(x))$ when $x \in K_j(r)$ represents choice (when $S \subset K_k \cup K_j$). This is well-defined and represents choice by Category SARP. By AAC*, $H$ is an affine function. The arguments are readily seen to extend inductively to all regions, which completes the proof.
B.3. Proof of Lemma 1. Pick any $x \in X$ and set $S = \{x, x'\}$ where $x' = (\tfrac{1}{2} x_1, x_2)$. Then, $A(S)_2 = x_2$ by strong generalized average, so both $x$ and $x'$ are 1-salient by S4. By CM*, $x \in c(S)$, and so $x \in E^{R,1}$. Since $x$ was arbitrary, $X = E^{R,1}$. The argument for $K_2$ is similar.
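The menu used in this proof can be checked numerically under extra assumptions. The sketch below takes $A(S)$ to be the coordinate-wise arithmetic mean of the menu and reads "1-salient" as $\sigma(x_1, A(S)_1) > \sigma(x_2, A(S)_2)$ with the ratio salience function $\sigma(a, b) = \max\{a/b, b/a\}$. Both choices are illustrative stand-ins, not the general objects of the lemma, and the bundle `(4.0, 3.0)` is hypothetical.

```python
def sigma(a, b):
    """Assumed ratio salience function (inputs strictly positive)."""
    return max(a / b, b / a)


def mean_reference(menu):
    """Assumed reference map A(S): coordinate-wise arithmetic mean."""
    n = len(menu)
    return tuple(sum(x[i] for x in menu) / n for i in range(2))


def is_1_salient(x, ref):
    """Attribute 1 stands out more than attribute 2 relative to ref."""
    return sigma(x[0], ref[0]) > sigma(x[1], ref[1])


# Menu from the proof: x and x' = (x_1 / 2, x_2).
x = (4.0, 3.0)
x_prime = (0.5 * x[0], x[1])
menu = [x, x_prime]
ref = mean_reference(menu)
both_1_salient = all(is_1_salient(p, ref) for p in menu)
```

Because the two bundles share the second attribute, the reference point matches them on that attribute, so only attribute 1 can stand out for either bundle.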
B.4. Proof of Proposition 3. By Lemma 1, the structure assumption is satisfied. By Theorem 5, the category function is generated by a salience function. By Theorem 6, $c$ conforms to Strong CTM. Mimicking the arguments of Theorem 2, Reference Interlocking implies $U_k(x) = w^k_1 u_1(x_1) + w^k_2 u_2(x_2) + \beta_k$. The rest follows from the arguments that establish Proposition 2.
B.5. Proof of Proposition 4. Pick any $r$, and suppose $U_k(r) \geq U_{-k}(r)$. Since $A$ is a generalized average, for any y r there exists a menu $S$ so that $A(S)$ is arbitrarily close to $r$ and y x for all $x \in S$ (pick $S$ so that its convex hull lies in a neighborhood of $r$ small enough that it does not include $y$). By making that neighborhood smaller if necessary, either $y$ belongs to $K_k(A(S))$ or $y \notin K_k(r)$. There exists a $y'$ arbitrarily close to, but not equal to, $y$, so that $U_k(y') = U_k(y)$ and $U_{-k}(y) \neq U_{-k}(y')$. In the former case either $y$ or $y'$ is chosen from the menu $S'$ by categorical monotonicity*, where $S'$ is a menu (assumed to exist by generalized average) with $A(S')$ sufficiently close to $A(S)$ and $y, y' \in S'$. Moreover, $y' \in c(S')$ if it is close enough that it too belongs to $K_k$. Conclude that $y \in K_k(r)$ implies that there exists $y'$ arbitrarily close to, but not equal to, $y$ with $U_k(y') = U_k(y)$ and $\{y, y'\} = c(S')$. If $y \notin K_k(r)$, then $y$ and $y'$ cannot both be chosen. Either $y$ is not chosen because it is in $K_{-k}$, or the DM will not be indifferent between $y'$ and $y$. The rest follows from Proposition 1.