## 1. Introduction

Objective Bayesianism purports to tell us how strongly it is rational for an agent to believe in the propositions that can be expressed in his language. The degree of belief an agent has in a proposition can be represented by a number in the interval [0, 1] with 0 indicating that the agent is certain that the proposition is false and 1 indicating that the agent is certain that the proposition is true. Objective Bayesianism is the view that an agent’s degrees of belief should – on pain of irrationality – satisfy the following three norms (cf. p. iii):1

Probability: The probability norm states that an agent’s degrees of belief should be probabilities. So, for example, if an agent believes that the toss of a coin will come up heads with degree of belief 0.7, then he should believe that it will come up tails with degree of belief 0.3 (assuming that heads and tails are exclusive and exhaustive outcomes).

Calibration: The calibration norm states that an agent’s degrees of belief should be calibrated with his evidence. So, for example, if an agent has evidence that a coin is biased towards heads to the extent that it lands heads between 80% and 90% of the time, then his degree of belief that it will land heads on the next toss should be between 0.8 and 0.9.

Equivocation: The equivocation norm states that an agent’s degrees of belief should be suitably non-committal. So, for example, if an agent knows that the outcome of the toss of a coin may be either heads or tails then, other things being equal, his degree of belief in each outcome should be 0.5 (since this is the most non-committal it is possible to be). But if, as above, the calibration norm requires him to assign a degree of belief of between 0.8 and 0.9 to heads, then the equivocation norm requires him to assign a degree of belief of 0.8 to heads (because the function that assigns 0.8 to heads and 0.2 to tails is closer – in a sense to be explained later – to the function that assigns 0.5 to heads and 0.5 to tails than any other probability function allowed by the calibration norm).

Objective Bayesianism is to be contrasted with two weaker forms of Bayesianism: strictly subjective Bayesianism, which endorses only the probability norm, and empirically based subjective Bayesianism, which endorses both the probability and calibration norms (but not the equivocation norm).2 However, objective Bayesians also reject one norm that is typically endorsed by proponents of both forms of subjective Bayesianism. This is the conditionalization norm, which pertains to how an agent should change his degrees of belief in the light of new evidence. Objective Bayesianism has no need of such a norm, since the three norms that it prescribes already tell an agent how he should alter his degrees of belief in the light of new evidence. In fact, it would be inconsistent to add this norm to objective Bayesianism, since in some cases the updating prescribed by the objective Bayesian’s three norms differs from the updating prescribed by the conditionalization norm.

This is a quick sketch of the view that Jon Williamson explores in his book In Defence of Objective Bayesianism. The first four chapters of the book articulate and defend the view in the context of languages of propositional logic. Chapter 5 develops the view in the context of the richer languages of predicate logic, which facilitates a tantalizing but brief and very abstract discussion of how objective Bayesianism can provide a semantics for probabilistic logic (Ch. 7). Objective Bayesianism has been criticized on the grounds that it is not feasible to calculate objective Bayesian degrees of belief except in toy problems, so Chapter 6 explains how objective Bayesian nets can be used to simplify this task. Chapter 8 discusses the perhaps somewhat tangential – but nonetheless interesting – issue of how we should arrive at a single judgement when faced with the different judgements of a number of individuals who have different evidence and have formed different beliefs. Chapter 9 articulates objective Bayesianism in the context of the language of probability theory and discusses the bête noire of objective Bayesianism: the charge that it is perniciously language relative, because two agents faced with the same evidence but using different languages may – following the norms of objective Bayesianism – come to different conclusions as to what degree of belief it is rationally permissible to assign a given proposition.

In Defence of Objective Bayesianism is an excellent book. As Williamson notes (163), objective Bayesianism – and in particular the equivocation norm – is not easy to defend. It seems intuitively reasonable, but it turns out to be hard to justify without begging the question. Some objective Bayesians might just give up and claim that reasoning in accord with the norms of objective Bayesianism is partly constitutive of what it is to be rational, and therefore the norms do not need, and probably cannot be given, non-question-begging justification. This might seem to have all the advantages of theft over honest toil. But it doesn’t: such an argument would not convince someone who does not already accept these norms that they are reasonable. This is particularly problematic because there are many people who explicitly reject these norms. Williamson makes a manful effort to give the norms independent motivation and to defend objective Bayesianism against a number of other charges that have been levelled against it. And whilst in places he may not be wholly successful, his labours are always interesting.

In this review, I comment mainly on Williamson’s discussion of what seem to me the most important challenges facing objective Bayesianism: the justification for adopting the calibration and equivocation norms (Sections 2 and 3), the justification for rejecting the conditionalization norm endorsed by subjective Bayesians (Section 4) and the charge that objective Bayesianism is perniciously language relative (Section 5).

## 2. Calibration

The intuition behind the calibration norm is that, ‘A wise man [ …] proportions his belief to the evidence’ (Hume 2008: § 10, part 1). The calibration norm is an extension of Miller’s principle (Miller 1966), which is also known as the principal principle (Lewis 1980). Roughly speaking, this states that if an agent knows that the physical probability of an event is a, then the agent’s degree of belief in the event should also be a. For example, if I know that the physical probability that an arbitrary atom of carbon-11 will decay in the next 20.334 minutes is 0.5, then my degree of belief that an arbitrary atom of carbon-11 will decay in the next 20.334 minutes should also be 0.5.

Williamson motivates this principle by an appeal to a betting argument (40–41). Suppose an agent is betting on a sequence of coin tosses. Let $h_i$ state that the $i$-th toss will be heads, and assume further that the agent’s degree of belief in each $h_i$ is the same. Let Cr be the agent’s credence function and let Ch be the chance function.3 Let $Cr(h_i) = a$ and let $Ch(h_i) = b$. If a ≠ b then the agent will be inclined to make bets that are guaranteed to lose him money in the long run. Suppose a > b (the other case is analogous). The agent will be prepared to bet a stake S on heads at odds of (1 − a): a (heads: tails) at every toss, so that each bet pays him S(1 − a) if the toss lands heads and costs him Sa if it lands tails. Assuming that the outcomes of the tosses are independently and identically distributed in accordance with Ch, the strong law of large numbers implies that, with probability 1, for any ε > 0 there is some number N such that |m/n − b| < ε for all n > N (where m is the number of heads in the first n tosses). After n tosses (for n > N) the agent’s gain will be:

$$mS(1-a) - (n-m)Sa = nS\left(\frac{m}{n} - a\right) < nS(b + \varepsilon - a).$$

Letting ε = a − b makes the right-hand side of the inequality 0, so the agent’s gain is negative: he loses money.
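A short simulation makes the long-run argument concrete. It is a sketch under the assumptions stated above: i.i.d. tosses with chance b, a fixed stake S = 1 per toss, and the convention that a bet at odds (1 − a): a pays S(1 − a) on a win and costs Sa on a loss. The function name, seed and parameter values are illustrative and are not taken from Williamson.

```python
import random


def long_run_gain(a: float, b: float, n: int, stake: float = 1.0, seed: int = 0) -> float:
    """Total gain after n tosses for an agent with credence a in heads who bets
    `stake` on heads at odds (1 - a) : a on every toss, when the chance of heads is b.

    Assumed convention: a win pays stake * (1 - a); a loss costs stake * a.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    m = sum(1 for _ in range(n) if rng.random() < b)  # number of heads
    # m wins and n - m losses; algebraically this equals stake * (m - n * a)
    return m * stake * (1 - a) - (n - m) * stake * a


n = 100_000
print(long_run_gain(a=0.7, b=0.5, n=n) / n)  # average per toss: close to b - a = -0.2
print(long_run_gain(a=0.5, b=0.5, n=n) / n)  # calibrated agent: close to 0
```

The average gain per toss is m/n − a. This tends to b − a, so it stays negative whenever the agent's credence exceeds the chance.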

As noted, this argument assumes that the agent’s degree of belief in each $h_i$ is the same. Williamson says that it is ‘intuitively untenable’ to have different degrees of belief in different $h_i$’s as they are ‘indistinguishable with respect to current evidence’ (41). While many people will agree that it is intuitively unreasonable to have different degrees of belief in different $h_i$’s here, it may appear that the assumption is question-begging in this context. The claim that we should assign the same degree of belief to evidentially indistinguishable propositions seems to be a variant of the principle of indifference, which is rejected not only by strictly subjective Bayesians (who reject the calibration norm) but also by empirically based subjective Bayesians (who themselves accept the calibration norm). However, although the intuitions behind this assumption and the principle of indifference seem to be similar, the two are not the same. The principle of indifference could be stated as follows: an agent should assign the same degree of belief to different possible outcomes that are indistinguishable with respect to his current evidence. The principle at work here could be stated as follows: an agent should assign the same degree of belief to the same outcome in repeated runs of an experiment, where the repeated runs are indistinguishable with respect to his current evidence. This latter principle is arguably more plausible than the principle of indifference. Moreover, the argument does not assume that an agent must follow this principle. It only shows that if an agent does follow this principle and he does not also respect the calibration norm, then he can be subject to a sure long-term loss. So, to avoid a sure long-term loss he must either deliberately flout this principle or respect the calibration norm.
It seems more plausible to say that rationality demands that an agent respect the calibration norm than to say that rationality demands that he flout this principle, because it does not seem to be irrational to assign the same degree of belief to the same outcome in repeated runs of an experiment, where the repeated runs are indistinguishable with respect to current evidence.

One thing Williamson does not say a great deal about is where our knowledge of the physical probabilities comes from. He thinks that this knowledge is inferred, at least in part, from observed frequencies. But he says very little about how this happens. For the most part he is content to leave this as a matter for the theory of statistics to deal with (his longest discussion of this issue is on 166–69). But given the importance of the calibration norm for the objective Bayesian, I think this subject could have been treated in rather more detail. (In fact, Williamson has addressed this subject at greater length in recent work: see Williamson forthcoming.)

## 3. Equivocation

The least widely endorsed and most controversial of the objective Bayesian’s norms is the equivocation norm. The intuition behind the norm is that, ‘If someone’s evidence leaves the truth or falsity of θ open, then she would be irrational to strongly believe θ or its negation’ (163).

Properly spelling out the content of the equivocation norm requires a little setting up. It is easiest to consider the case where we have a language of propositional logic with a finite number of atomic sentences, $p_1, \dots, p_n$. Define an atomic state, ω, of such a language to be a sentence of the form $\pm p_1 \,\&\, \dots \,\&\, \pm p_n$ (where each $\pm p_i$ is to be replaced by either $p_i$ or $\neg p_i$). There are thus $2^n$ atomic states. Call the set of all the atomic states Ω. Define the equivocator function, $P_=$, to be the function that assigns equal probability, $1/2^n$, to every atomic state, i.e.:

$$P_=(\omega) = \frac{1}{2^n} \quad \text{for all } \omega \in \Omega.$$

We can measure the distance between two probability functions, P and Q, using the cross entropy (28–29):

$$d(P, Q) = \sum_{\omega \in \Omega} P(\omega) \log \frac{P(\omega)}{Q(\omega)}.$$

As a first approximation the equivocation norm can be stated as follows: an agent’s credence function should be the function (of those that are calibrated with the evidence) that is closest to the equivocator. However, this will not quite do. In some cases there is no function (of those that are calibrated with the evidence) that is the closest to the equivocator. For example (cf. 29), if an agent has evidence that the physical probability of a coin landing heads is > 0.5 then, other things being equal, the calibration norm demands that his degree of belief that it will land heads on the next toss should be greater than 0.5. But for any function that respects this requirement there is always another that also respects this requirement and is closer to the equivocator (i.e. assigns a degree of belief closer to 0.5 to the proposition that the coin will land heads on the next toss).
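Two claims about closeness to the equivocator can be checked numerically. The first is that 0.8 is the closest permissible value when Cr(heads) must lie in [0.8, 0.9]. The second is that no closest value exists when Cr(heads) need only exceed 0.5. This sketch assumes that the cross entropy is the Kullback-Leibler divergence d(P, Q) = Σ P(ω) log(P(ω)/Q(ω)). It uses a grid search rather than an exact optimizer, and its function names are illustrative.

```python
import math


def divergence(p: list[float], q: list[float]) -> float:
    """Cross entropy (Kullback-Leibler) divergence d(P, Q) = sum P(w) log(P(w) / Q(w)),
    with the convention 0 log 0 = 0."""
    return sum(pw * math.log(pw / qw) for pw, qw in zip(p, q) if pw > 0)


def equivocator(n_atoms: int) -> list[float]:
    """P_= assigns 1 / 2**n to each of the 2**n atomic states."""
    return [1 / 2**n_atoms] * 2**n_atoms


eq = equivocator(1)  # one atomic sentence: states heads / tails

# Calibration requires Cr(heads) in [0.8, 0.9]: search a grid of candidates.
candidates = [0.8 + 0.001 * k for k in range(101)]
best = min(candidates, key=lambda x: divergence([x, 1 - x], eq))
print(round(best, 3))  # 0.8, the endpoint nearest 0.5

# Calibration requires Cr(heads) > 0.5: distances keep shrinking, with no minimum.
for x in (0.6, 0.51, 0.501, 0.5001):
    print(x, divergence([x, 1 - x], eq))
```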

So the equivocation norm is instead stated as follows: an agent’s credence function should be one of the functions (of those that are calibrated with the evidence) that is sufficiently equivocal. This is rather vague, but as a rule of thumb Williamson proposes (130) that, in the absence of contextual factors determining otherwise, if there is a function (of those that are calibrated with the evidence) that is closest to the equivocator then that function is the only one of these functions that is sufficiently equivocal. On the other hand, if there is no function (of those that are calibrated with the evidence) that is the closest to the equivocator, then any of these functions is sufficiently equivocal.

Williamson attempts to motivate this norm by appeal to another betting argument (64). In the Kelly gambling scenario (Kelly 1956) we consider a horse race with m runners, exactly one of which will win. We will assume that the race is run repeatedly. We can also suppose for simplicity that the odds on every horse winning are the same (but essentially the same result can be obtained if this assumption is relaxed). Suppose the agent has evidence that the physical probability of each horse winning is given by a function Ch. Let K be the agent’s capital and let the function G be the agent’s gambling strategy: the agent bets G(i)K on the i-th horse winning. If the agent is only going to bet once, his expected gain is maximized by betting his entire stake on the horse that is most likely to win, i.e. he should set G(i) = 1 if Ch(i) is the maximum value of Ch and should otherwise set G(i) = 0. We might call this the casual gambler’s strategy. But things are rather different for the professional gambler. The professional gambler (we will assume) bets on the race repeatedly: after every round he reinvests his entire capital in bets on the race. In this case the agent’s best strategy is to maximize the expected growth rate of his capital, which turns out to be equivalent to maximizing the expected value of log(G(i)), i.e. maximizing:

$$\mathbb{E}_{Ch}[\log G(i)] = \sum_{i=1}^{m} Ch(i) \log G(i).$$

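With equal odds on every horse, a standard consequence of Gibbs' inequality is that this expectation is maximized by G = Ch, that is, by betting on each horse in proportion to its chance. The sketch below compares a few strategies on hypothetical chances for a three-horse race. It also shows why the casual gambler's all-in strategy is ruinous when repeated.

```python
import math


def expected_log_growth(ch: list[float], g: list[float]) -> float:
    """E_Ch[log G(i)] = sum_i Ch(i) log G(i).

    With equal odds on every horse this differs from the expected growth rate of
    the gambler's capital only by a constant, so it ranks strategies the same way.
    """
    total = 0.0
    for c, x in zip(ch, g):
        if c == 0:
            continue  # a horse that cannot win contributes nothing
        if x == 0:
            return -math.inf  # nothing bet on a horse that can win: capital can be wiped out
        total += c * math.log(x)
    return total


ch = [0.5, 0.3, 0.2]  # hypothetical chances for a three-horse race

strategies = {
    "proportional (G = Ch)": [0.5, 0.3, 0.2],
    "equal split": [1 / 3, 1 / 3, 1 / 3],
    "mostly on the favourite": [0.8, 0.1, 0.1],
    "all on the favourite (casual gambler)": [1.0, 0.0, 0.0],
}
for name, g in strategies.items():
    print(f"{name}: {expected_log_growth(ch, g):.4f}")
```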
But now suppose that the agent doesn’t know exactly which function gives the physical probability of each horse winning, but only knows that it lies in a certain set of functions. The gambling strategy, G, that maximizes ECh[log(G(i))] will be different for different possible Ch’s. Williamson (following Grünwald (2000)) suggests that the agent’s best policy is to maximize his worst-case expected growth rate, i.e. to choose the G that maximizes ECh[log(G(i))] for the possible Ch that gives the smallest possible maximum of ECh[log(G(i))]. The possible Ch that gives the smallest possible maximum of ECh[log(G(i))] turns out to be the one that is closest to the equivocator. So, according to Williamson and Grünwald, the agent should choose his gambling strategy on the basis of the possible Ch that is closest to the equivocator. In other words, he should bet as if his degrees of belief are given by the possible Ch that is closest to the equivocator. And since (for Williamson) degrees of belief are to be interpreted via betting behaviour (31–33), this amounts to saying that his degrees of belief should be given by the possible Ch that is closest to the equivocator, i.e. he should follow the equivocation norm.
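In the two-outcome case the maximin claim can be checked by brute force. The sketch assumes evens odds, so that only E_Ch[ln G] matters, and it borrows the evidence set Ch(h) ∈ [0.5, 0.9] from the coin example discussed below. The grid sizes are arbitrary. The strategy that maximizes the worst-case expected log should coincide with the permitted Ch closest to the equivocator, which here is Ch(h) = 0.5.

```python
import math


def expected_log(c: float, g: float) -> float:
    """E_Ch[ln G] for a two-outcome bet: Ch(h) = c, G(h) = g, G(t) = 1 - g."""
    return c * math.log(g) + (1 - c) * math.log(1 - g)


# Assumed evidence: Ch(h) anywhere in [0.5, 0.9].
chances = [0.5 + 0.004 * j for j in range(101)]
# Candidate strategies G(h) = 0.001, ..., 0.999.
strategies = [k / 1000 for k in range(1, 1000)]


def worst_case(g: float) -> float:
    """Smallest expected log growth of strategy g over all permitted chances."""
    return min(expected_log(c, g) for c in chances)


best_g = max(strategies, key=worst_case)
print(best_g)  # 0.5: bet as if Ch(h) were the permitted value closest to the equivocator
```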

There seem to me to be two significant problems with this argument. First (as Williamson acknowledges (64)), the Kelly gambling scenario is quite specific and one might wonder how widely it can plausibly be applied. For example, it is assumed that the agent bets repeatedly on a rerun of the same race, that his evidence about the chance function does not change, and that he bets his entire capital on each trial. Whilst these might be reasonable idealizations for constructing models of some of the situations we face, it is very doubtful that they are in general reasonable idealizations. Consequently, one might doubt whether the argument can do anything more than suggest that the equivocation norm is prudential in one (perhaps not even very common) type of case. But of course that is not sufficient for the objective Bayesian since he claims that the equivocation norm should be generally applied.4

But secondly, and more worryingly, a crucial assumption in the argument is that the agent’s best policy is to maximize his worst-case expected growth rate. But this is a very strong and very questionable assumption. Consider a Kelly gambling scenario with two outcomes, h and t; for instance, a professional gambler betting on a series of tosses of a coin. As before, we will assume for simplicity that the odds offered by the bookie on both outcomes are the same, in this case evens. Suppose the agent has evidence that the coin is biased and that the physical probability of heads, Ch(h), lies between 0.5 and 0.9. Let us consider how three strategies fare across the possible values of Ch(h). Strategy 1 is the Williamson/Grünwald strategy (i.e. it is the implementation of the equivocation norm). It is designed to maximize worst-case expected growth rate (so we might also call it the pessimist’s strategy). Worst-case expected growth rate occurs when Ch(h) = 0.5 (and Ch(t) = 0.5) and is maximized by setting G(h) = 0.5 (and G(t) = 0.5). Strategy 3 we might call the Leibniz or optimist’s strategy. It is designed to maximize best case expected growth rate. Best case expected growth rate occurs when Ch(h) = 0.9 (and Ch(t) = 0.1) and is maximized by setting G(h) = 0.9 (and G(t) = 0.1). Strategy 2 we might call the Aristotle or midpoint strategy. It maximizes expected growth rate when Ch(h) lies in the middle of its range of possible values, i.e. when Ch(h) = 0.7 (and Ch(t) = 0.3). In this strategy we set G(h) = 0.7 (and G(t) = 0.3). The expected values of ln(G(i)) for these different strategies are plotted against the possible values of Ch(h) in Figure 1.

Figure 1.

The expected values of ln(G(i)) against the possible values of Ch(h) for the Williamson/Grünwald (pessimist) strategy (G(h) = 0.5), the Aristotle (midpoint) strategy (G(h) = 0.7) and the Leibniz (optimist) strategy (G(h) = 0.9).


Williamson suggests that the equivocation norm is motivated by considerations of caution (65). But we can see from Figure 1 that while the equivocation norm is cautious in a sense – in the sense that implementing it in this case yields a strategy that does the best in the worst-case scenario – there is another sense in which it is not at all cautious: there are some possible Ch’s (e.g. when Ch(h) = 0.9) for which the Williamson/Grünwald strategy performs the worst of the three strategies. If we want to be cautious in the sense of doing relatively well over the whole range of possible values of Ch(h) then the midpoint strategy seems to be the best to adopt: of these three strategies it never performs the worst, over a fairly wide range of possible values of Ch(h) it performs the best and even when it is only second best it is not much worse than the best.
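The comparison plotted in Figure 1 can be tabulated directly. The sketch assumes, as in the text, evens odds, so that only E_Ch[ln G] matters, and Ch(h) ranging over [0.5, 0.9]. The strategy labels follow the names used above.

```python
import math


def expected_log(c: float, g: float) -> float:
    """E_Ch[ln G] when Ch(h) = c and the gambler stakes G(h) = g on heads, 1 - g on tails."""
    return c * math.log(g) + (1 - c) * math.log(1 - g)


strategies = {"pessimist": 0.5, "midpoint": 0.7, "optimist": 0.9}

print("Ch(h)  " + "  ".join(f"{name:>10}" for name in strategies))
for c in (0.5, 0.6, 0.7, 0.8, 0.9):
    row = {name: expected_log(c, g) for name, g in strategies.items()}
    worst = min(row, key=row.get)
    print(f"{c:.1f}  " + "  ".join(f"{v:>10.3f}" for v in row.values()) + f"   worst: {worst}")
```

On this arithmetic the midpoint strategy is never the worst of the three anywhere in [0.5, 0.9]. It is the best of the three for Ch(h) between roughly 0.60 and 0.81.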

The midpoint strategy discussed above can be thought of as the implementation of a midpoint norm, which states that an agent’s degrees of belief should lie in the middle of those allowed by the calibration norm. There are other betting scenarios in which implementing the midpoint norm seems even more clearly preferable (from the point of view of caution) than implementing the equivocation norm. Consider the following scenario. A gambler is in possession of a biased coin and he has evidence that the physical probability of the coin landing heads is in the interval [0.5, 0.9]. The gambler wants to get some action, so he decides to take a bet, at what he considers fair odds and for a stake S, that the next toss will be heads (so that he loses if the coin lands heads). If he follows the equivocation norm his degree of belief that the next toss will be heads will be 0.5 and he will offer the bet at evens. If he instead follows the midpoint norm his degree of belief that the next toss will be heads will be 0.7 and he will offer odds of 3: 7 (heads: tails). What is his worst-case expected loss (evaluated according to the physical probability that the coin will land heads)? If he implements the equivocation norm and offers evens, his worst-case expected loss occurs when the physical probability of the coin landing heads is 0.9. In this case his expected loss is:

$$0.9 \times S(1 - 0.5) - 0.1 \times 0.5S = 0.4S.$$

If he implements the midpoint norm his worst-case expected loss also occurs when the physical probability of the coin landing heads is 0.9. In this case his expected loss is:

$$0.9 \times S(1 - 0.7) - 0.1 \times 0.7S = 0.2S,$$

which is clearly less. So in this scenario the gambler’s worst-case expected loss is worse when implementing the equivocation norm than when implementing the midpoint norm: so midpoint is the more cautious norm in this set-up.
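The two worst-case figures can be checked by scanning the permitted chances. The sketch assumes that the gambler is the one taking bets on heads, with stake S = 1. It also assumes the payoff convention used earlier in this review: a bet at odds (1 − a): a pays S(1 − a) to a winning bettor and costs a losing bettor Sa. The function name is illustrative.

```python
def expected_loss(chance_heads: float, credence_heads: float, stake: float = 1.0) -> float:
    """Gambler's expected loss when he takes a bet on heads at odds (1 - a) : a, where a
    is his credence in heads: he pays stake * (1 - a) if heads, collects stake * a if tails."""
    a = credence_heads
    return chance_heads * stake * (1 - a) - (1 - chance_heads) * stake * a


chances = [0.5 + 0.01 * j for j in range(41)]  # Ch(h) in [0.5, 0.9]
for label, a in (("equivocation", 0.5), ("midpoint", 0.7)):
    worst = max(chances, key=lambda c: expected_loss(c, a))
    print(label, round(worst, 2), round(expected_loss(worst, a), 2))
```

Algebraically the expected loss is S(Ch(h) − a). It increases with the chance of heads, which is why the worst case sits at Ch(h) = 0.9 under either norm.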

Construed as an argument in favour of the midpoint norm these considerations leave much to be desired. Like the Kelly gambling scenario the above set-up is quite specific and one might wonder how widely it can plausibly be applied. Consequently, one might doubt that the foregoing considerations do anything more than suggest that the midpoint norm is prudential in some (perhaps not even very common) cases. But my aim here is not to argue that the midpoint norm should be generally applied. My aim is rather to argue that there are reasons for thinking that the equivocation norm should not generally be applied – or at least that it is not in general supported by considerations of caution as Williamson suggests.

## 4. Conditionalization

As noted, the objective Bayesian rejects the conditionalization norm endorsed by most subjective Bayesians. The conditionalization norm is intended to tell us how we should update our credence function in the light of new evidence. Suppose an agent’s old credence function is Cr and that he acquires new evidence e (with Cr(e) > 0). The conditionalization norm states that the agent’s new credence function, Cr′, should be such that, for any p:

$$Cr'(p) = Cr(p \mid e) = \frac{Cr(p \,\&\, e)}{Cr(e)}.$$

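For a finite propositional language, conditionalization is easy to state in terms of atomic states: discard the states where e is false and rescale the rest. The sketch below uses exact fractions and a hypothetical prior over the four atomic states of a two-atom language. The helper names are illustrative.

```python
from fractions import Fraction

# A hypothetical prior credence function over the atomic states (p1, p2) of a two-atom language.
cr = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}


def prob(credence, sentence):
    """Credence in a sentence = total credence of the atomic states where it is true."""
    return sum(w for s, w in credence.items() if sentence(s))


def conditionalize(credence, e):
    """Cr'(w) = Cr(w & e) / Cr(e): zero out states where e is false, then renormalize."""
    ce = prob(credence, e)
    if ce == 0:
        raise ValueError("Cr(e) = 0: conditionalization is undefined")
    return {s: (w / ce if e(s) else Fraction(0)) for s, w in credence.items()}


def p1(state):
    return state[0]


def p2(state):  # the new evidence e
    return state[1]


cr_new = conditionalize(cr, p2)
print(prob(cr_new, p1))  # Cr(p1 | p2) = (3/10) / (4/10) = 3/4
print(prob(cr_new, p2))  # 1
```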
The conditionalization norm is usually motivated by appeal to a betting argument: it can be shown that if an agent does not update his degrees of belief according to conditionalization then he can be the subject of a diachronic Dutch book (see Teller 1973). Since Williamson uses betting arguments to support the three norms he endorses (he uses the normal synchronic Dutch book argument in support of the probability norm (33–38)) it seems particularly important that he should say what is wrong with this particular betting argument. In fact he does not exactly do this, but he does at least argue that something must be wrong with diachronic Dutch book arguments, by showing that, ‘in certain situations one can Dutch book anyone who changes their degrees of belief at all, regardless of whether or not they change them using conditionalisation’ (85, original emphasis).

The sort of situation he has in mind is where it is generally known that an agent, A, will be presented with evidence that will definitely not decrease (or definitely not increase) his degree of belief in a proposition p (e.g. the agent might be a juror who is about to hear the case for the prosecution). Suppose the agent’s initial credence function is Cr, where Cr(p) = a. He will therefore be prepared to bet with odds of (1 − a): a (for the bet p: ¬p). If at some later date his credence function is Cr′, where Cr′(p) = a′, he will at that later date be prepared to bet with odds of (1 − a′): a′ (for the bet p: ¬p). If another agent, B, knows that A will be presented with evidence that will definitely not decrease his degree of belief in p (so that a′ ≥ a) then A can be Dutch booked if he changes his degree of belief in p at all: B need only make a bet on p at the original odds of (1 − a): a (that will pay him S(1 − a) or cost him Sa) and then make a bet on ¬p at the later odds (1 − a′): a′ (that will cost him S(1 − a′) or pay him Sa′). Whether or not p turns out to be true B will gain (and A will lose) Sa′ − Sa ≥ 0 (since a′ ≥ a), which will be strictly greater than 0 unless a = a′. So, in this case if A changes his degree of belief in p at all he can be Dutch booked no matter how he changes his degrees of belief. So Williamson concludes, ‘avoidance of Dutch books is a lousy criterion for deciding on an update rule’ (85).
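The bookie’s guaranteed profit can be checked for both truth values of p. The credences below are hypothetical, and the payoffs follow the convention in the paragraph above.

```python
def bookie_b_net(a: float, a_new: float, p_true: bool, stake: float = 1.0) -> float:
    """B's net gain: bet on p at A's original odds (1 - a) : a, then on not-p at A's
    later odds (1 - a_new) : a_new, each with stake S."""
    first = stake * (1 - a) if p_true else -stake * a  # the bet on p
    second = -stake * (1 - a_new) if p_true else stake * a_new  # the bet on not-p
    return first + second


a, a_new = 0.4, 0.55  # hypothetical: A's credence in p rises after hearing the evidence
for outcome in (True, False):
    print(outcome, round(bookie_b_net(a, a_new, outcome), 10))  # S(a' - a) = 0.15 either way
```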

However, it seems to me that an advocate of conditionalization might plausibly be able to reject this argument. If it is generally known – and in particular known also by the agent in question – that an agent will be presented with evidence that supports (or at least does not undermine) a proposition p then it is perhaps not unreasonable to expect the agent to take this into account when arriving at his degree of belief in p, so that his degree of belief in p is determined not only by the ‘first-order’ evidence he has that directly relates to p but also by the second-order evidence that he will be presented with more first-order evidence that supports (or at least does not undermine) p. So, for example, if the prosecution present very little or very weak evidence in support of the claim that the suspect is guilty then a juror’s belief that he is guilty might actually go down, because he knew that the prosecution was going to present some evidence in support of the case and was expecting it to be more compelling. An analogous phenomenon is often observed in the stock market. The price of shares in a company can go down after it reports a yearly profit if the market was expecting that it would have reported a greater profit, since the expected profit is factored into the share price.

However, it turns out that in many cases updating by conditionalization and updating by objective Bayesian methods actually yield the same results. In particular, in the context of a finite propositional language, if the agent initially has a set E of evidence and this imposes a consistent set X of affine constraints on the permissible probability functions, then the methods will agree provided:

1. The new evidence, e, is a sentence of the agent’s language,

2. e is simple with respect to E,

3. X′ is consistent (where X′ is the set of constraints imposed by E∪{e}),

4. Cr(·|e) satisfies X (where Cr is the agent’s initial credence function).

Cf. 78.
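The simplest case in which all four conditions hold can be checked directly. This hypothetical toy example has two atomic sentences and initial evidence that imposes no constraints, so X is empty and the initial credence function is the equivocator. The new evidence is e = p1, which is simple and consistent. The objective Bayesian update is taken to be the function with Cr′(p1) = 1 that is closest to the equivocator by the divergence d. A grid search stands in for an exact optimizer.

```python
import math

# Two atomic sentences p1, p2; atomic states as (p1, p2) truth-value pairs.
STATES = [(True, True), (True, False), (False, True), (False, False)]
EQUIVOCATOR = {s: 0.25 for s in STATES}


def divergence(p, q):
    """d(P, Q) = sum P(w) log(P(w) / Q(w)), with 0 log 0 = 0."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in STATES if p[s] > 0)


def candidate(x):
    """A function with Cr'(p1) = 1, fixed by x = Cr'(p1 & p2)."""
    return {(True, True): x, (True, False): 1 - x, (False, True): 0.0, (False, False): 0.0}


# Objective Bayesian update on e = p1: the candidate closest to the equivocator.
best_x = min((k / 1000 for k in range(1001)), key=lambda x: divergence(candidate(x), EQUIVOCATOR))

# Conditionalization on e = p1, starting from the equivocator.
cond = EQUIVOCATOR[(True, True)] / (EQUIVOCATOR[(True, True)] + EQUIVOCATOR[(True, False)])

print(best_x, cond)  # 0.5 0.5: the two updates agree
```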

So perhaps a more promising line of argument for the objective Bayesian to pursue is to argue not that diachronic Dutch book arguments are suspect in general but that they are suspect when one or more of these conditions fail, so that where diachronic Dutch book arguments can legitimately be used to support conditionalization they can also be used to support objective Bayesian updating (because the two coincide).

In the first case, where e is not a sentence of the agent’s language, diachronic Dutch book arguments do not even get off the ground, since setting up such a Dutch book requires that the dupe be prepared to make bets based on Cr(·|e). So in this case it is trivial that diachronic Dutch book arguments cannot be legitimately used (because they cannot be used at all). Indeed, in this case conditionalization and updating via objective Bayesian methods differ only because conditionalization gives us no advice on how to update. (Cf. Williamson’s argument that updating via objective Bayesian methods is better than updating via conditionalization when the first condition fails (79).)

In the second case e is not simple with respect to E: this means that e imposes a constraint other than just Cr′(e) = 1 (where Cr′ is the new credence function), which implies that X′ ≠ X∪{Cr′(e) = 1} (78). For example (cf. 79), if e is the sentence Ch(p) = 0.8 (where Ch is the chance function), then this imposes the constraint (given the calibration norm) that Cr′(p) = 0.8 (in addition to the constraint Cr′(e) = 1), so e is not simple with respect to E (unless E already imposes the constraint that Cr(p) = 0.8). If we were to update via objective Bayesian methods we would set Cr′(p) = 0.8, but if we were to update via conditionalization then we would set Cr′(p) = Cr(p|e), which may not be equal to 0.8. In this case too it seems that diachronic Dutch book arguments are not legitimate, or at least not compelling. Suppose Cr(p|e) ≠ 0.8 (so that conditionalization and objective Bayesian updating disagree). Then, in order to avoid a diachronic Dutch book, we must set Cr′(p) = Cr(p|e), and hence Cr′(p) ≠ Ch(p). In other words, the only way to avoid a diachronic Dutch book is to violate the calibration norm. But then we open ourselves up to a sure long-term loss (if the betting argument in favour of the calibration norm is correct). So in this situation we have to pick the lesser of two evils: either the possibility of sure loss via a diachronic Dutch book or the possibility of sure long-term loss (admittedly it is not obvious which of these is the lesser evil, but the argument at least casts doubt on the assumption that the diachronic Dutch book argument is legitimate in these cases). (Cf. Williamson’s argument that updating via objective Bayesian methods is better than updating via conditionalization when the second condition fails (79–81).)

In the third case X′ is not consistent. Since (by assumption) X is consistent, this means that e is not consistent with X. This means that Cr(e) = 0, so that Cr(·|e) is undefined. As in the first case, diachronic Dutch book arguments do not even get off the ground, since setting up such a Dutch book requires that the dupe be prepared to make bets based on Cr(·|e). So here too it is trivial that diachronic Dutch book arguments cannot be legitimately used (because they cannot be used at all). Indeed, as in the first case, conditionalization and updating via objective Bayesian methods differ only because conditionalization gives us no advice on how to update.5 (Cf. Williamson’s argument that updating via objective Bayesian methods is better than updating via conditionalization when the third condition fails (81).)

In the fourth case Cr(·|e) does not satisfy X. Provided e is simple with respect to E, X ⊆ X′ (if e is not simple with respect to E then we are dealing with the second case). Hence Cr(·|e) does not satisfy X′ either, and if we update via conditionalization, Cr′(·) will not satisfy X′. The only way to avoid a diachronic Dutch book in this case is therefore to have a new credence function that does not satisfy the calibration norm. This means that we open ourselves up to a sure long-term loss (if the betting argument in favour of the calibration norm is correct). Again, we have to pick the lesser of two evils. (Cf. Williamson’s argument that updating via objective Bayesian methods is better than updating via conditionalization when the fourth condition fails (81–82).)

## 5. Language Relativity

Objective Bayesianism is language relative, in the sense that two objective Bayesians using different languages but with the same evidence might disagree as to what degree of belief it is rational to assign a given proposition. For example, consider two languages of propositional logic, L1 and L2. L1 has just a single atomic sentence, b, which is true just in case Bob is tall, dark and handsome. L2 has three atomic sentences: (i) t, which is true just in case Bob is tall, (ii) d, which is true just in case Bob is dark and (iii) h, which is true just in case Bob is handsome. Agent A1, an objective Bayesian who uses language L1, will, in the absence of any pertinent evidence, have degree of belief 0.5 in b (since he equivocates over the two atomic states of L1: b and ¬b). Agent A2, an objective Bayesian who uses language L2, will, in the absence of any pertinent evidence, have degree of belief 0.125 in t&d&h (since he equivocates over the eight atomic states of L2: t&d&h, t&d&¬h, etc.). But b is true if and only if t&d&h is true. So, even though the agents have the same evidence they assign different degrees of belief to the same proposition.
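The arithmetic behind this example is easy to reproduce. The sketch below assumes, as in the text, that with no pertinent evidence each agent's credence is uniform over the atomic states of his own language. The variable names are illustrative.

```python
from itertools import product

# L1: one atomic sentence b ("Bob is tall, dark and handsome").
l1_states = list(product([True, False], repeat=1))
# L2: three atomic sentences t, d, h.
l2_states = list(product([True, False], repeat=3))

# With no pertinent evidence, each agent spreads credence equally over his atomic states.
cr_a1_b = sum(1 for (b,) in l1_states if b) / len(l1_states)
cr_a2_tdh = sum(1 for (t, d, h) in l2_states if t and d and h) / len(l2_states)

print(cr_a1_b, cr_a2_tdh)  # 0.5 0.125
```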

Williamson argues that this type of language relativity is appropriate, because:

[t]here is a sense in which an agent’s language constitutes empirical evidence. Language evolves to better latch onto the world. Hence, that an agent adopts a particular language L rather than an arbitrary language L′ is evidence that the important and natural properties in the agent’s environment are better captured by predicates in L than by those in L′; that predicates of L are more likely to be relevant to each other than those in L′; that L offers a more efficient coding of the messages that the agent is likely to pass, and perhaps even that the atomic states of L are more suitable than those of L′ as a partition over which to equivocate. (156; see also Williamson 2005, Ch. 12)

Looked at from this perspective, the fact that objective Bayesianism is language relative (in the sense that two objective Bayesians using different languages but with (otherwise) the same evidence might disagree as to what degree of belief it is rational to assign a given proposition) is no more worrying a phenomenon than the fact that it is evidence relative (in the sense that two objective Bayesians using the same language but with different evidence might disagree as to what degree of belief it is rational to assign a given proposition): it is not worrying at all.

But there are related problems, Bertrand-style problems, that are not resolved by this approach. Suppose we are told that the edge length of a given cube is between 1 m and 2 m. It is tempting to think that the objective Bayesian would recommend a degree of belief of 0.5 in the proposition that the cube is longer than 1.5 m, since it seems that the most equivocal we can be (given the evidence) is to have a credence function that is uniform over the interval [1, 2]. But if we are instead (but equivalently) told that the volume of the cube is between 1 m³ and 8 m³, then it is tempting to think that the objective Bayesian would recommend a degree of belief of 0.5 in the proposition that the cube is more voluminous than 4.5 m³, since it now seems that the most equivocal we can be (given the evidence) is to have a credence function that is uniform over the interval [1, 8]. And of course these recommendations are incompatible.
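
A few lines of arithmetic show why the two recommendations cannot both hold. The sketch assumes edge length L in [1, 2] and volume V = L³ in [1, 8], and compares the two candidate "uniform" credence functions on the same propositions. The function names are hypothetical.

```python
# Edge length L is in [1, 2] metres, so volume V = L**3 is in [1, 8] cubic metres.

def p_uniform_length(v_threshold):
    """P(V > v_threshold) if credence is uniform over edge length in [1, 2]."""
    l_threshold = v_threshold ** (1 / 3)
    return (2 - l_threshold) / (2 - 1)

def p_uniform_volume(v_threshold):
    """P(V > v_threshold) if credence is uniform over volume in [1, 8]."""
    return (8 - v_threshold) / (8 - 1)

# "Longer than 1.5 m" is the same proposition as "more voluminous than 1.5**3 = 3.375 m^3".
for v in (1.5 ** 3, 4.5):
    print(f"P(V > {v}): uniform length {p_uniform_length(v):.3f}, "
          f"uniform volume {p_uniform_volume(v):.3f}")
# P(V > 3.375): uniform length 0.500, uniform volume 0.661
# P(V > 4.5): uniform length 0.349, uniform volume 0.500
```

Each "equivocal" function assigns 0.5 to its own midpoint proposition, but the other function assigns a different value to that same proposition.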

The (purported) problem in the language relativity cases is that different languages give rise to non-equivalent equivocator functions. The Bertrand-style problems are not caused by language relativity; after all, we only have a single language here. The problem in these cases is that the language we are using does not allow us to uniquely determine an equivocator function, because it gives rise to a continuum of atomic states. Of course, Williamson is well aware of this problem, but he does not advance a general solution: he only notes that ‘potential equivocators can be determined by appealing to constraints imposed by invariance considerations, constraints imposed by considerations to do with conservative extension, and other objective Bayesian statistical techniques’ (153). From a philosophical point of view this is rather disappointing, and it seems to leave one of the main problems facing objective Bayesianism virtually untouched.
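
To see what an "invariance consideration" might look like in the cube case, consider one familiar choice, which is not attributed to Williamson here: a credence function uniform over the logarithm of the edge length. Since log V = 3 log L, such a function is also uniform over log V, so the length and volume descriptions agree. The sketch below assumes this scale-invariant (log-uniform) equivocator, and the function names are hypothetical.

```python
import math

# Hypothetical scale-invariant equivocator: uniform over log(edge length) on [1, 2].
# Because log V = 3 * log L, this is also uniform over log(volume) on [1, 8].

def p_length_exceeds(l_threshold):
    """P(L > l_threshold) under a density uniform in log L on [1, 2]."""
    return math.log(2 / l_threshold) / math.log(2)

def p_volume_exceeds(v_threshold):
    """P(V > v_threshold) under a density uniform in log V on [1, 8]."""
    return math.log(8 / v_threshold) / math.log(8)

# The same proposition, described via length or via volume, gets the same credence.
print(round(p_length_exceeds(1.5), 3), round(p_volume_exceeds(1.5 ** 3), 3))  # 0.415 0.415
```

This works only because length and volume are related by a power law. It does not by itself tell us which invariance group is appropriate in general, so it leaves the general problem open.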

1 Throughout this review references that indicate only a page number refer to In Defence of Objective Bayesianism, by Jon Williamson (Oxford University Press, 2010. vi + 186 pp. £47.00).
2 Although, as Williamson notes (157), these names are not perfectly apposite. The strictly subjective Bayesian is not strictly subjective (in the sense of thinking that an agent may rationally hold any degrees of belief): he insists that an agent’s degrees of belief should be probabilities, after all. And, as we shall see, the objective Bayesian is not objective, insofar as (i) the degree of belief that it is rational for an agent to hold in a proposition, according to the norms of objective Bayesianism, is relative to the agent’s evidence and language, and (ii) even for fixed evidence and language the degree of belief that it is rational for an agent to have in a proposition is not always uniquely determined by the norms of objective Bayesianism.
3 The agent’s credence function assigns to a proposition the agent’s degree of belief that the proposition is true. The chance function assigns to a proposition the physical probability that the proposition is true.
4 However, as Williamson points out (63), it can also be shown, under fairly general assumptions, that whenever the agent’s loss function is logarithmic the agent minimizes his worst-case expected loss by setting his credence function to be the function (of those calibrated with the evidence) closest to the equivocator (Grünwald and Dawid 2004; Topsøe 1979), and this arguably covers a fairly wide range of cases (though obviously not all).
5 This assumes that the conditional probability P(A|B) is defined as P(A&B)/P(B). In some approaches conditional probabilities are taken as primitive, so that Cr(·|e) may be defined even if Cr(e) = 0. But suppose E = {¬e}. In the case where Cr(e|e) = 1, updating (on learning e) via objective Bayesian methods and conditionalization will agree. So suppose Cr(e|e) ≠ 1. Then updating via conditionalization demands that we set Cr′(e) ≠ 1, but that just means that we do not, after all, accept e as evidence (since by assumption we assign all evidence degree of belief 1).

## References

Grünwald, P.D. 2000. Maximum entropy and the glasses you are looking through. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 238–46. San Francisco: Morgan Kaufmann.

Grünwald, P.D. and A.P. Dawid. 2004. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Annals of Statistics 32: 1367–433.

Hume, D. 2008. An Enquiry Concerning Human Understanding. Oxford: Oxford University Press.

Kelly, J.L. 1956. A new interpretation of information rate. Bell System Technical Journal 35: 917–26.

Lewis, D.K. 1980. A subjectivist’s guide to objective chance. In Philosophical Papers, vol. 2, 83–132. Oxford: Oxford University Press.

Miller, D. 1966. British Journal for the Philosophy of Science 17: 59–61.

Teller, P. 1973. Conditionalisation and observation. Synthese 26: 218–58.

Topsøe, F. 1979. Information theoretical optimization techniques. Kybernetika 15: 1–27.

Williamson, J. 2005. Bayesian Nets and Causality: Philosophical and Computational Foundations. Oxford: Oxford University Press.

Williamson, J. 2010. In Defence of Objective Bayesianism. Oxford: Oxford University Press.

Williamson, J. Why frequentists and Bayesians need each other. Erkenntnis. doi:10.1007/s10670-011-9317-8.