A study of mathematical determination through Bertrand’s paradox

Certain mathematical problems prove very hard to solve because some of their intuitive features have not been assimilated or cannot be assimilated by the available mathematical resources. This state of affairs triggers an interesting dynamic whereby the introduction of novel conceptual resources converts the intuitive features into further mathematical determinations in light of which a solution to the original problem is made accessible. I illustrate this phenomenon through a study of Bertrand’s paradox.


Introduction
Mathematical problems often call for the introduction of new concepts or methods because certain intuitive features involved in their formulation cannot be codified by the mathematical apparatus canonically available to study them.
In such cases what looks like an inherent difficulty of a given problem is best regarded as an effect of the fact that its intuitive content has not yet been resolved into mathematical determinations that can be relied upon in order to obtain a solution. This paper aims to explore and clarify this phenomenon with respect to one particular example, namely Bertrand's paradox. The reason for this choice is threefold. First, Bertrand's paradox is an interesting mathematical problem that has aroused much discussion among both philosophers and mathematicians. Secondly, a recent exchange on the paradox contained in Rowbottom (2013) and Klyve (2013) can be fruitfully reconsidered in light of the phenomenon that this paper discusses. Finally, in view of this discussion, it is possible to introduce an elementary approach to Bertrand's paradox itself, motivated by the need to convert certain intuitive features of its geometrical setting into numerical determinations (more plainly, it is necessary numerically to specify the size of certain infinite collections of geometrical entities). This move can be made once the canonical resources of probability theory are supplemented with new computational resources. The next section extracts from the analyses of Rowbottom and Klyve an interpretation of Bertrand's paradox that stresses the incongruity between canonical probability models and the character of this problem. It is because of this incongruity that alternative probability models, to be introduced in section 3, are required.

Two readings of Bertrand's paradox
If one were to draw at random a chord in a circle, what is the probability for it to be shorter than the side of the inscribed equilateral triangle? This question, originally posed in Bertrand (1889), gives rise to a puzzle, generally known as Bertrand's paradox, on account of the fact that it is possible to specify distinct, seemingly equivalent, drawing procedures, each of which determines a distinct value for the sought probability. Bertrand specified three distinct drawing procedures, leading respectively to the probability values 2/3, 1/2 and 3/4. The debate around the existence of a uniquely determined solution has lasted longer than a century and has occupied several authors, such as Borel (1909), Mosteller (1965) and Jaynes (1973). Recently, the structure of Bertrand's paradox and its interpretation have been helpfully re-examined in Rowbottom (2013) and Klyve (2013). On Rowbottom's reading, Bertrand's drawing procedures operate on a restrictive ensemble, which is not representative of the random selection of interest. Each procedure leads to the deployment of a continuous, uniform probability distribution and, thus, to a probability model that can be handled for the sake of computing numerical probability values. However, the serviceability of Bertrand's drawing procedures may be at variance with the character of the problem, because all of them come at the cost of focussing on a subcollection of the full collection of chords, which is not the ensemble one intended to study in the first place. To see how this state of affairs hints at a problem of mathematical determination, consider, by way of comparison, the trivial case of throwing a fair die: in order to specify the probability that the outcome of a throw will be a number strictly smaller than three, it is sufficient to consider the totality of six outcomes and the totality of two outcomes of interest. The probability model implicitly adopted in this case is a uniform, discrete distribution on the space of outcomes resulting from a throw.
The totality of outcomes as well as the subset of relevant outcomes can be numerically specified and the numerical specifications can then be used to carry out computations of probability values. Bertrand's question, about selecting a chord from the totality of all chords, mirrors the character of the die problem in an infinite setting.
Rowbottom simply points out that the question posed by Bertrand refers to the totality of chords, not a part thereof, and to certain distinctive subcollections of this totality. If probability values are to be computed, the infinite collections involved must be assigned numerical determinations. Following the template of the die model, such determinations should lead to the introduction of a uniform, discrete distribution on the numerically specifiable totality of chords. This approach is not viable if the canonical resources of probability theory are employed but, as will be shown in section 3, it is accessible to supplementary computational resources. To sum up the discussion so far, Rowbottom's analysis points to the need for a direct consideration of the totality of chords determined by a circle. If this totality is to be part of a workable probability model, a numerical estimate of its size, with which ordinary arithmetical computations can be carried out, must be available. In other words, an intuitive feature of Bertrand's geometrical setup, i.e. the fact that a circle determines an infinite collection of chords, is to be assigned a mathematical determination, i.e. a numerical specification, which cannot be offered in the canonical (i.e. measure-theoretic) context of probability theory. The possibility of introducing the missing determination depends on an expansion of the mathematical resources at hand: since the resources of probability theory are being used as instruments to intervene on a given geometrical setup, I shall refer to them as a particular mathematical instrumentality, or simply an instrumentality. 
Thus, Bertrand's paradox poses a problem of mathematical determination that cannot be solved in presence of the canonical instrumentality of probability theory but may well be solved through the appeal to a distinct instrumentality (which does not have to be a replacement of the canonical instrumentality, but may be a modification or extension thereof). As will be shown in section 3, the new instrumentality yields numerical specifications of the restrictions on the ensemble of chords qualitatively alluded to by Rowbottom, but does not rule out the possibility that some of these restrictions should offer adequate characterisations of the problem. Such a question cannot be decided upon in the absence of a suitable mathematical determination of the problem itself. Before offering mathematical support to these remarks, I wish to turn to Klyve's analysis of Bertrand's problem in order to show that it points to the dynamics of determination and instrumentality revealed by Rowbottom's own discussion, albeit from a different point of view and despite the fact that Klyve is critical of Rowbottom's conclusions. Against him, Klyve maintains that Bertrand's drawing procedures are adequate, i.e., actually take all chords into account, in which case "[t]he only thing that changes is that the method of selecting one (class of) chord from this set may be biased" (Klyve 2013: 368). In my opinion Klyve's important contribution does not lie in his intended refutation of Rowbottom but in his focus on what he calls the bias of a procedure, which is best spelled out as lack of mathematical determination and whose source is not so much a selection of drawing method but the resort to a prescribed instrumentality. In short, it seems to me possible at once to vindicate the correctness of Rowbottom's analysis and to extract from Klyve (2013) an important lesson, which is independent of the rejection of Rowbottom (2013).
Klyve's critique of Rowbottom is based on a close reading of Bertrand's manner of specifying his drawing procedures. For instance, with respect to the procedure […] This conclusion must suggest itself if one is an advocate of the absolute validity of the canonical instrumentality of probability theory, which does not afford numerical means to e.g. count alternatives over an infinite set or deploy a uniform, discrete distribution on it. If, however, one is not an absolute advocate of a prescribed instrumentality, the same conclusion can be read as a call for numerical resources that offer a more precise specification of the command to choose a chord at random. Precisely this call will be answered in section 3.
Thus, if one accepts Klyve's interpretation of Bertrand's intention, it reveals, from an angle alternative to Rowbottom's analysis, that the canonical instrumentality of probability theory is too imprecise or, in the present terminology, lacks sufficient determination to tackle the problem of selecting a chord at random. Thus, it is best to dismiss Klyve's references to Bertrand's results as effects of biassed drawing procedures that are sufficiently well-determined mathematically. An independent reason for this can be offered by a brief discussion of the numerical example taken from Bertrand, upon which Klyve relies in order to illustrate what he means by bias. The example is presented as a solution to the problem of determining the probability of choosing a number greater than 50 by picking at random in the sample space {1, . . . , 100}. Given a uniform, discrete distribution, the answer is trivial, but, since the numbers in the sample space are uniquely determined by their squares, one might also decide to choose over {1, . . . , 10,000}, in which case the probability of drawing a number whose square root is greater than 50 (but possibly not an integer) is 3/4 and not 1/2, as in the original setup. Klyve qualifies the second problem as a variant of the first in which only the procedure to pick a number has changed, thus introducing a bias. As a matter of fact, the sample space has changed from one scenario to the other and the question being answered is no longer the same (in the second case one is picking at random a number whose square root is greater than 50 and not a number greater than 50). It is certainly possible to exchange a move to a different sample space with a move to a different distribution over the same sample space {1, 2, 3, . . . , 100}, but the non-uniform distribution that gives rise to the probability value 3/4 has been manufactured out of the explicit consideration of a different problem.
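The two scenarios of the numerical example can be told apart by direct enumeration. The following is a minimal sketch, assuming uniform, discrete distributions on both sample spaces; the variable names are illustrative only and do not come from Klyve's or Bertrand's texts.

```python
from fractions import Fraction

# Scenario 1: uniform draw from {1, ..., 100}; event: the number exceeds 50.
space_1 = range(1, 101)
p_first = Fraction(sum(1 for k in space_1 if k > 50), len(space_1))

# Scenario 2: uniform draw from {1, ..., 10000}; event: the square root of
# the number exceeds 50, i.e. the number itself exceeds 50 ** 2 = 2500.
space_2 = range(1, 10_001)
p_second = Fraction(sum(1 for m in space_2 if m > 50 ** 2), len(space_2))

print(p_first, p_second)  # prints: 1/2 3/4
```

The sample spaces have 100 and 10,000 elements respectively, so the difference between the two values is traceable to a difference in what is being counted.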
The problems in question here are easily distinguishable because sufficient numerical specifications are available to tell them apart. What Klyve calls bias reduces to their discriminability on numerical grounds. This reduction is less straightforward in the context of Bertrand's geometrical problem because there are insufficient numerical resources to identify restrictions and effect discriminations. The same reduction becomes apparent, though, when sharper numerical specifications can actually be used, as will be seen in the next section. What Klyve calls an effect of bias is in fact a problem of mathematical determination: his remarks point in the same direction as Rowbottom's. Under the canonical instrumentality of probability theory, Bertrand's paradox is intimately connected with a lack of mathematical determination. Supplying resources that provide a more sharply determined problem leads to a novel analysis of Bertrand's three drawing procedures and to some surprising conclusions about their agreement. This is the subject of section 3.

A study of Bertrand's paradox
The discussion from section 2 has primarily served the purpose of identifying […] is necessary in order to compute numerical probability values. Moreover, the required measures must be able to discriminate between an infinite collection and its infinite subcollections. This is necessary in order to keep track, in a computationally effective way, of the restrictions to infinite subcollections of chords involved in Bertrand's drawing methods. It is important to realise that the two desiderata just listed call for measures alternative to Cantorian cardinals, which abrogate the principle that strict subsets always have smaller measure than the sets including them. Ordinals are unsuitable for the same reason. Moreover, in both cases ordinary arithmetical laws fail: in other words, computational drawbacks and identification between part and whole make an appeal to Cantorian ideas unsuitable to supply the kind of mathematical determination required by infinite probability models. It is mandatory to look for a 'counting' measure that is computationally effective and reinstates the general principle that the part should be smaller than the whole. These conditions are met by Sergeyev's approach. Sergeyev's informal approach consists in drawing a distinction between infinite collections, most notably N, and the numerals that refer to their elements and to the sizes of their parts. In presence of this distinction, it is natural to think that a richer numeral system than one relying on a finite base should support size discriminations between infinite parts of a collection, not only between finite ones. The desired enrichment is obtained by introducing a suitable base for the richer numeral system, which, given the goal at hand, can only be infinitely large. Sergeyev's numeral system works with the infinite base x (read: gross-one), which is intended to refer to the number of items in the infinite collection N = {1, 2, 3, . . .}.
Then x denotes an infinitely large integer, greater than the natural numbers representable in a finite base. The purpose of introducing x is not merely to denote a specification of the 'level' of infinity attained by the set of natural numbers, but to increase the discriminability of 'levels' in a way that vindicates the principle that the whole should be greater than the part. Thus, for instance, the set N ∪ {0} has a number of elements denoted by x + 1 > x and the set {2, 3, 4, . . .} has a number of elements denoted by x − 1 < x. Moreover, as pointed out above, it is assumed that the familiar laws of field arithmetic extend to a notation for elements of the real field that includes terms expressible by means of the symbol x. For example, the terms x + 1, x + 2, x + 3, . . . , 2x, . . . , 3x, . . . , x², . . . all denote infinitely large reals not in N, which can be summed and multiplied in the usual manner. The same treatment, when applied to x, yields atomic sentences like x − x = 0, x⁰ = 1 or x/x = 1, which would give rise to indeterminate forms if one sought to develop arithmetic using e.g. ∞ or ℵ₀. For present purposes, field arithmetic based on x is not enough, because it does not, on its own, allow enough numerical discriminations of size. In order to compute the number of, say, chords in a circle that are as long as the side of an inscribed equilateral triangle, it will prove necessary to rely on a divisibility property that Sergeyev also postulates.
[…] numerosity. This possibility is ruled out in Sergeyev's framework, which does require choices of labellings.
Divisibility amounts to the assumption that any partition of N into n disjoint arithmetic progressions, with n finite, should have cells containing the same number of elements, denoted by x/n (see fn. 4). Note that x/n, as the evaluation of the size of an infinite aggregate, denotes an infinitely large natural number, since x/n < x. It follows that the partition of N into the two disjoint progressions of odd and even numbers determines two cells containing the same, infinitely large number of items, denoted by x/2. In a similar vein, the numerical specification of the collection of all multiples of three is x/3 < x/2. It is worth remarking that these ideas have been formalised in Lolli (2015), within the context of a conservative extension of second-order, predicative Peano arithmetic (see fn. 5). Lolli's idea is to work with models of arithmetic that contain infinitely large elements and to fix an infinitely large 'cut-off' point, denoted by x, intuitively intended to single out N within a larger model. Axioms governing a suitable measure guarantee that, given an initial segment of a model, e.g. the set of all items smaller than the cut-off x, every subset thereof has a measure. Measures so defined identify (bounded) sets in one-to-one correspondence and enforce the principle that the whole should be greater than the part. Divisibility axioms guarantee that there is a sufficiently rich family of measures that are actually computable and can be expressed using Sergeyev's numeral system. Computability of measures guaranteed by divisibility will play a crucial role in the discussion of Bertrand's paradox to follow. In order to apply Sergeyev's computational methodology to it, it is necessary to decide how the chords in a circle should be parametrised and what probability distribution is to be imposed upon them.
In the next section the choice of an adequate parametrisation and distribution will be first motivated and then used to compute probability estimates.
4 More precisely, x/n denotes the number of elements of any arithmetical progression of the form k, k + n, k + 2n, . . ., with 1 ≤ k ≤ n and k, n finite. Once n is fixed, letting k increase from 1 to n, one obtains a partition of N into n progressions.
5 The same treatment is possible on the basis of first-order Peano arithmetic, at the cost of cumbersome numerical coding. A discussion of this matter can be found in Lolli (2015: 9).
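The bookkeeping behind divisibility can be illustrated on a finite analogue. The sketch below partitions {1, . . . , N_STANDIN} into arithmetic progressions of the form k, k + n, k + 2n, . . . and counts each cell. The bound N_STANDIN is a hypothetical finite stand-in for x, chosen here only because it is divisible by 2 and 3; the divisibility postulate itself concerns the infinite collection N and is not established by a finite computation.

```python
def progression_sizes(n_total, n_cells):
    """Sizes of the cells k, k + n_cells, k + 2 * n_cells, ... inside
    {1, ..., n_total}, for k = 1, ..., n_cells."""
    return [len(range(k, n_total + 1, n_cells)) for k in range(1, n_cells + 1)]

N_STANDIN = 600  # hypothetical finite stand-in for the infinite base

halves = progression_sizes(N_STANDIN, 2)  # odd and even numbers
thirds = progression_sizes(N_STANDIN, 3)  # residues 1, 2, 0 modulo 3

print(halves)  # prints: [300, 300]
print(thirds)  # prints: [200, 200, 200]
```

In the finite case every cell has N_STANDIN/n elements and a cell of thirds is strictly smaller than a cell of halves, which mirrors the relations x/3 < x/2 < x stated for the infinite case.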

A counting argument
Let C be a circle of unit radius in R². In order to describe the random selection of a chord, it is necessary to determine, by means of a numerical specification, how many points on the boundary of the circle C can be discriminated. In general, if the numeral system adopted includes the symbol n, denoting a natural number, then the partition of C into equal arcs of length 2π/n makes it possible to discriminate distinct points on the boundary of C by assigning them distinct labels from the list {1, 2, . . . , n}.
This may not be very helpful if one can only end up with finitely many discriminable points, but it becomes a fruitful approach if an infinitely large number of discriminations can be effected. An obvious, but fruitful, choice is to set n equal to x (greater, infinitely large numbers could also be chosen, depending on the required level of accuracy; see fn. 6). A numerical specification of discriminable points leads to a direct computation of the number of discriminable chords. As Figure 1 below shows, this computation is based on the subdivision of C's boundary into least discriminable arcs marked by x equally spaced, labelled points. A discriminable chord is uniquely determined by a pair of labelled points on the circumference. Once a labelled endpoint is fixed, x − 1 discriminable chords through it may be counted. As one ranges through the x labelled endpoints, x(x − 1) discriminable chords are counted, but each chord is counted twice, since the two distinct orderings of its labelled endpoints are counted as distinct chords. As a consequence, the total number of discriminable chords is the infinitely large integer denoted by the term: x(x − 1)/2 = (x² − x)/2. The last numerical specification, inexpressible in a traditional numeral system […] In particular, let P(e), P(s), P(l) be, respectively, the probability of selecting a discriminable side of some equilateral triangle inscribed in C, the probability of selecting a shorter chord and the probability of selecting a longer chord. The problem set by Bertrand is to evaluate P(s). Since 1 = P(e) + P(s) + P(l), once P(e) and P(s) are computed, a value for P(l) can also be determined. In order to compute P(e), it is convenient to count first the number of points that lie on any arc subtended by the side of some equilateral triangle inscribed into C.
6 One could e.g. pick x² or x³, both of which are evenly divided by 3, by divisibility, a fact on which the argument that follows relies. One could even consider an amount of 3 · 10^x discriminable points, which measures the continuum [0, 3), if one deploys a numeral system based on decimal expansions with x places, each of which is filled by one digit from the list {0, 1, 2, . . . , 9}. In this case, 10^x points are discriminable on [0, 1) and three times this amount on [0, 3).
Since the whole arc has length 2π/3 and two consecutive, discriminable points are separated by an arc of infinitesimal width 2π/x, there are x/3 least discriminable arcs covering one third of the circumference. Note that x/3 denotes a natural number, by divisibility. It now follows that an arc of length 2π/3 contains x/3 + 1 discriminable points. It is convenient to work with the assignment of labels 1, 2, . . . , x − 1, x illustrated in Figure 1. Any discriminable side of an inscribed equilateral triangle is uniquely determined when one of its discriminable endpoints is fixed. The other endpoint is identified by summing x/3 to the label on the endpoint that has been fixed. The discriminable sides of equilateral triangles inscribed in C are thus systematically identified by the following pairs of labels: (1, x/3 + 1), (2, x/3 + 2), . . . There are x such sides, one for each choice of the fixed endpoint, out of x(x − 1)/2 discriminable chords, so that P(e) = 2/(x − 1). Similarly, each labelled endpoint belongs to 2(x/3 − 1) discriminable chords shorter than such a side, which gives x(x − 3)/3 shorter chords once double counting is removed. Therefore P(s) = 2(x − 3)/(3(x − 1)) = 2/3 − 4/(3(x − 1)).

This is an evaluation of the probability requested in the problem set up by Bertrand. It is infinitely close to 2/3. It is finally possible to compute: P(l) = 1 − P(e) − P(s) = (x − 3)/(3(x − 1)), whose finite part is 1/3. Although the analysis carried out in this section can be refined by more accurate numerical estimates (see fn. 5), information about the finite part of the estimates P (s) or P (l) is already available. Furthermore, it is possible to reconsider Bertrand's drawing methods as giving rise to approximations of the probability model introduced in this subsection and assess their adequacy against it. The next subsections are devoted to carrying out precisely this task.
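The counting argument of this subsection can be checked by brute force on a finite analogue. The sketch below is an illustration under stated assumptions, not the computation carried out with x itself: a finite number n of equally spaced points, divisible by 3, stands in for x, and the keys "e", "s", "l" mirror the events of P(e), P(s), P(l). Replacing x by n in the counting argument suggests the closed forms 2/(n − 1), 2(n − 3)/(3(n − 1)) and (n − 3)/(3(n − 1)), against which the enumeration can be compared.

```python
from fractions import Fraction
from itertools import combinations

def chord_probabilities(n):
    """Classify every chord joining two of n equally spaced points
    (n divisible by 3) as equal to, shorter than or longer than the
    side of an inscribed equilateral triangle."""
    assert n % 3 == 0 and n > 3
    side = n // 3  # number of least arcs spanned by a triangle side
    counts = {"e": 0, "s": 0, "l": 0}
    for a, b in combinations(range(n), 2):
        steps = min(b - a, n - (b - a))  # shorter arc between the endpoints
        key = "e" if steps == side else ("s" if steps < side else "l")
        counts[key] += 1
    total = n * (n - 1) // 2  # every unordered pair of points is one chord
    return {key: Fraction(value, total) for key, value in counts.items()}

p = chord_probabilities(12)
print(p["e"], p["s"], p["l"])  # prints: 2/11 6/11 3/11
```

As n grows, the value for "s" approaches 2/3 and the value for "l" approaches 1/3, which is the finite counterpart of the finite parts discussed in the text.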

Selecting chords through a fixed point
Among the three drawing procedures considered by Bertrand, let us consider first the one that represents the random selection of a chord's endpoint, followed by another endpoint selection, as a de facto selection from the collection of chords in C through an arbitrary fixed point on the circumference. Using the system of x labels illustrated in Figure 1, we may conveniently fix the point whose numeral label is 1 (see Figure 2).
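In this case only x − 1 out of (x² − x)/2 discriminable chords are taken into account. Exactly two of them are sides of inscribed equilateral triangles and 2(x/3 − 1) of them are shorter than such a side, so that P₁(e) = 2/(x − 1) and P₁(s) = 2(x − 3)/(3(x − 1)), which coincide with the values of P(e) and P(s). It follows that P₁(l) = P(l). The drawing method just examined is a scaled […]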

Parallel chords
Let us now turn to the drawing procedure that corresponds to the selection of a diameter and then a chord perpendicular to it, which is represented by Bertrand as the de facto selection of a chord from the ensemble of those perpendicular to a fixed diameter. In the present context, this selection restricts the independently given ensemble of discriminable chords to those perpendicular to a fixed diameter and, thus, parallel to one another. Let P₂(e), P₂(s), P₂(l) be the probabilities of selecting a chord respectively equal, shorter or longer than the side of an equilateral triangle inscribed in C from the restricted ensemble. In the upper semicircle from Figure 3 there are x/4 such chords, including the diameter between x/4 + 1 and 3x/4 + 1. The same situation arises in the lower semicircle from Figure 3. As a consequence, the total number of chords determined by the drawing procedure is x/2 − 1 (the diameter in this ensemble being counted only once). It is clear that, among the parallel chords, only two can be sides of an inscribed equilateral triangle (one of them has end-points labelled by 5x/6 + 1 and x/6 + 1 and subtends the arc containing these points as well as the point labelled by 1; the other is the reflection of the first around the diameter parallel to both). We can therefore compute: P₂(e) = 2/(x/2 − 1) = 4/(x − 2). In order to determine P₂(s), note that each semicircle in C contains x/6 − 1 chords that are shorter than the side of an inscribed equilateral triangle. Thus: P₂(s) = 2(x/6 − 1)/(x/2 − 1) = 2(x − 6)/(3(x − 2)).
Relative to the combinatorial argument from section 3.1, the values obtained for this drawing procedure exhibit an infinitesimal discrepancy of order x⁻¹ because the chords connecting consecutive discriminable points are systematically neglected. They were, on the contrary, included in the counts from subsections 3.1 and 3.2 (in the latter case, one of the consecutive points had to have the numeral label 1). Nevertheless, the finite parts of P(s), P₁(s) and P₂(s), as well as those of P(l), P₁(l) and P₂(l) are the same. Bertrand's original treatment of the drawing method just discussed leads to the probability value 1/2.
With the new instrumentality employed so far, the same value can be simulated, up to a discrepancy of order x⁻¹, by setting up a probability model for the random selection of a point from a diameter, once it is declared that x + 1 points can be discriminated along a fixed diameter. It is worth emphasising that Bertrand could not offer a numerical model for the selection of single diameters (or directions), when describing his drawing methods. He thus resorted to the assumption that the draw ultimately reduces to picking a chord perpendicular to a diameter. In presence of sharper numerical determinations, the random selection of a diameter can be explicitly described and it gives rise to an ensemble of endpoints round the circle that suffices to set up the model from subsection 3.1. In presence of this model, the selection of a chord from those perpendicular to an arbitrary diameter is entirely describable without superadding chords assigned to a uniform distribution of discriminable points along a particular diameter. That such superaddition introduces a distortion is revealed by the fact that, given a partition of the boundary of C into equal arcs, the discriminable chords orthogonal to a fixed diameter won't partition it into equal intervals.
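Both points made about parallel chords can be illustrated on a finite analogue. In the sketch below, n is a hypothetical finite stand-in for x (assumed divisible by 12), labels are shifted to 0, . . . , n − 1, and the parallel chords are those joining the points k and −k modulo n, for k = 1, . . . , n/2 − 1. The function names are illustrative only. The first function counts equal, shorter and longer chords in this restricted ensemble; the second returns the positions at which the chords cross the fixed diameter of a unit circle.

```python
import math
from fractions import Fraction

def parallel_chord_probabilities(n):
    """Chords joining points k and -k (mod n), all perpendicular to the
    diameter through point 0; n is assumed divisible by 12."""
    assert n % 12 == 0
    side = n // 3  # number of least arcs spanned by a triangle side
    counts = {"e": 0, "s": 0, "l": 0}
    for k in range(1, n // 2):  # k = n / 4 yields the diameter
        steps = min(2 * k, n - 2 * k)  # shorter arc between the endpoints
        key = "e" if steps == side else ("s" if steps < side else "l")
        counts[key] += 1
    total = n // 2 - 1
    return {key: Fraction(value, total) for key, value in counts.items()}

def feet_on_diameter(n):
    """Signed distances from the centre at which the parallel chords
    meet the fixed diameter of a unit circle."""
    return [math.cos(2 * math.pi * k / n) for k in range(1, n // 2)]

p2 = parallel_chord_probabilities(12)
print(p2["e"], p2["s"], p2["l"])  # prints: 2/5 2/5 1/5

feet = feet_on_diameter(12)
gaps = [feet[i] - feet[i + 1] for i in range(len(feet) - 1)]
print([round(g, 3) for g in gaps])  # prints: [0.366, 0.5, 0.5, 0.366]
```

The unequal gaps show, in the finite case, that equally spaced endpoints on the boundary do not induce equally spaced crossing points on the diameter.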
When the distortion is removed, probability values that are finitely accurate can still be obtained.

Selecting midpoints of chords
[…] points. In this case the centre of C is the common midpoint of x/2 discriminable diameters. The discriminable midpoints in the interior of C are therefore: x(x − 1)/2 − x/2 + 1 = (x² − 2x + 2)/2.
Since a diameter is longer than the side of an inscribed equilateral triangle, the number of discriminable midpoints of chords shorter than such a side is the same as in section 3.1, namely x(x − 3)/3. Calling P₃(s) the probability of selecting the midpoint of a chord shorter than the side of an inscribed equilateral triangle, it is now easy to compute: P₃(s) = (x(x − 3)/3)/((x² − 2x + 2)/2) = 2x(x − 3)/(3(x² − 2x + 2)).
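The midpoint count can be checked geometrically on a finite analogue. The sketch below assumes n equally spaced points on a unit circle, with n a hypothetical finite stand-in for x (taken divisible by 6), and identifies midpoints by their coordinates rounded to nine decimals, a tolerance chosen only for illustration. Replacing x by n, the expected number of distinct midpoints is n(n − 1)/2 − n/2 + 1 and the expected probability is 2n(n − 3)/(3(n² − 2n + 2)).

```python
import math
from fractions import Fraction
from itertools import combinations

def midpoint_model(n):
    """Return (number of distinct chord midpoints, probability that a
    uniformly chosen midpoint belongs to a chord shorter than the side
    of an inscribed equilateral triangle)."""
    assert n % 6 == 0
    pts = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
           for k in range(n)]
    side = n // 3  # number of least arcs spanned by a triangle side
    all_mid, short_mid = set(), set()
    for a, b in combinations(range(n), 2):
        # Rounding merges the midpoints of all diameters into the centre.
        mid = (round((pts[a][0] + pts[b][0]) / 2, 9),
               round((pts[a][1] + pts[b][1]) / 2, 9))
        all_mid.add(mid)
        if min(b - a, n - (b - a)) < side:
            short_mid.add(mid)
    return len(all_mid), Fraction(len(short_mid), len(all_mid))

count, p3_short = midpoint_model(12)
print(count, p3_short)  # prints: 61 36/61
```

For growing n the probability approaches 2/3, in agreement with the finite part obtained for the other models.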

Averaging over drawing methods
Aerts and Sassoli de Bianchi draw a distinction between an easy problem and a hard problem raised by Bertrand's paradox. The easy problem is to figure out why the fact that distinct probability values arise from distinct chord selection procedures does not contradict the principle of indifference (very roughly, the principle that no particular selection outcomes are more likely to occur than others). The hard problem is to obtain a uniquely determined value for P (s).
In order to tackle the easy problem, Aerts and Sassoli de Bianchi note that the […] of labelling invariance, see Gyenis and Rédei (2015: 357-358)). Its violation is then nothing but the fact that the probability value P(s) is not preserved across Bertrand's probability models. In view of section 3, this kind of violation may be regarded as a pointer to differences between the probability models that cannot be fully detected by the canonical instrumentality. In other words, it is an indicator of the level of canonical discriminability between these models. Under the enriched instrumentality supplied by Sergeyev's computational methodology, discrimination power does not only increase, but becomes mathematically informative. This is because, if one tries to simulate the probability values obtained by Bertrand using Sergeyev's computational methodology, one ends up with probability models whose sample spaces do not contain the same number of elements (e.g. the first drawing method takes into account x − 1 discriminable chords whereas the second drawing method takes into account x + 1 discriminable points). Even if these models are based on sample spaces that are classically indistinguishable relative to size, if one relies on the finer, numerical distinctions of size afforded by the numeral system based on x, it becomes clear that what looked like indistinguishable collections cannot in fact be related by bijections. In effect, as was argued in section 3, the probability models that simulate Bertrand's distinct probability values may not even be seen as models of the same phenomenon. This difference was not visible under the canonical instrumentality other than as a failure of labelling invariance, whereas it becomes transparent after the shift to a numerically more expressive instrumentality is carried out.
It follows that Gyenis and Rédei do not only provide a subtle analysis of Bertrand's paradox within a canonical context, but also pinpoint the canonical property whose failure corresponds to an actual differentiation between models once Bertrand's problem is endowed with the canonically missing numerical determinations. It is noteworthy that, under Sergeyev's methodology, labelling invariance holds, either because there are no bijections joining the relevant spaces or because a straightforward generalisation of this notion for a finite sample space is available.

Summary
This paper explored one aspect of mathematical thinking, which is equally significant in a pure and an applied context, and may be referred to as the dynamics of determination and instrumentality. Certain mathematical problems, as well as mathematised empirical problems, occur within an enquiry as objects of investigation calling for symbolic instruments adequate to their character and, thus, capable of tackling them. It may well be the case that a canonical array of instruments should prove insufficient to carry out a successful intervention upon a problem, in which case the forging of new instruments is required, if progress in enquiry is to be made. Bertrand's paradox nicely illustrates a situation in which canonical instruments are not effective because they cannot render into computationally serviceable terms certain features of the problem at hand, namely numerical specifications of infinite collections of chords. Once new computational instruments, such as those coming from Sergeyev's methodology, are introduced, greater insight into the paradox can be gained and certain difficulties produced by exclusive resort to the canonical instrumentality of probability theory are overcome.