Quasi-probabilities in Conditioned Quantum Measurement and a Geometric/Statistical Interpretation of Aharonov's Weak Value

We show that the joint behaviour of an arbitrary pair of quantum observables can be described by quasi-probabilities, which are extensions of the standard probabilities used for describing the behaviour of a single observable. The physical situations that require these quasi-probabilities arise when one considers quantum measurement of an observable conditioned by some other variable, with the notable example being the weak measurement employed to obtain Aharonov's weak value. Specifically, we present a general prescription for the construction of quasi-joint-probability (QJP) distributions associated with a given pair of observables. These QJP distributions are introduced in two complementary approaches: one from a bottom-up, strictly operational construction realised by examining the mathematical framework of the conditioned measurement scheme, and the other from a top-down viewpoint realised by applying the results of spectral theorem for normal operators and its Fourier transforms. It is then revealed that, for a pair of simultaneously measurable observables, the QJP distribution reduces to their unique standard joint-probability distribution, whereas for a non-commuting pair there exists an inherent indefiniteness in the choice, admitting a multitude of candidates that may equally be used for describing their joint behaviour. In the course of our argument, we find that the QJP distributions furnish the space of operators with their characteristic geometric structures such that the orthogonal projections and inner products of observables can, respectively, be given statistical interpretations as `conditionings' and `correlations'. The weak value $A_{w}$ for an observable $A$ is then given a geometric/statistical interpretation as either the orthogonal projection of $A$ onto the subspace generated by another observable $B$, or equivalently, as the conditioning of $A$ given $B$.

We show that the joint behaviour of an arbitrary pair of (generally non-commuting) quantum observables can be described by quasi-probabilities, which are an extended version of the standard probabilities used for describing the outcome of measurement for a single observable. The physical situations that require these quasi-probabilities arise when one considers quantum measurement of an observable conditioned by some other variable, with the notable example being the weak measurement employed to obtain Aharonov's weak value. Specifically, we present a general prescription for the construction of quasi-joint-probability (QJP) distributions associated with a given combination of observables. These QJP distributions are introduced in two complementary approaches: one from a bottom-up, strictly operational construction realised by examining the mathematical framework of the conditioned measurement scheme, and the other from a top-down viewpoint realised by applying the results of spectral theorem for normal operators and its Fourier transforms. It is then revealed that, for a pair of simultaneously measurable observables, the QJP distribution reduces to the unique standard joint-probability distribution of the pair, whereas for a non-commuting pair there exists an inherent indefiniteness in the choice of such QJP distributions, admitting a multitude of candidates that may equally be used for describing the joint behaviour of the pair. In the course of our argument, we find that the QJP distributions furnish the space of operators in the underlying Hilbert space with their characteristic geometric structures such that the orthogonal projections and inner products of observables can, respectively, be given statistical interpretations as 'conditionings' and 'correlations'. 
The weak value $A_w$ for an observable $A$ is then given a geometric/statistical interpretation as either the orthogonal projection of $A$ onto the subspace generated by another observable $B$, or equivalently, as the conditioning of $A$ given $B$ with respect to the QJP distribution under consideration.

Introduction
Since the discovery of quantum mechanics at the beginning of the last century, our classical understanding of the concept of observables has undergone a drastic change. It is by now widely accepted that, in the microscopic world, measured values of a physical quantity, termed 'observable' in quantum mechanics, are intrinsically random, and that certain combinations of quantum observables do not admit coexistence, as exemplified typically by the pair of observables corresponding to the position and the momentum of a particle. Such remarkable characteristics of quantum observables impose a strong limitation on the mathematical framework to be employed for describing their probabilistic behaviour; namely, it is no longer possible, in general, to assign probability spaces for the description of the joint behaviour of their arbitrary combinations in the classical sense. Nonetheless, various attempts have been made to construct a proper mathematical framework for the probabilistic description of combinations of quantum observables that resembles the Kolmogorovian formulation of classical probability theory. Extending the notion of probability has since been one of the major trends, yielding the generalised notion that commonly goes by the name of 'quasi-probability' or 'pseudo-probability'. Among the most celebrated proposals is the Wigner-Ville (WV) distribution [1,2], commonly known as the Wigner function in the physics community, which is primarily considered for a canonically conjugate pair of quantum observables to describe their joint behaviour. Another, though less known, example is the Kirkwood-Dirac (KD) distribution [3,4], which is structured differently but is meant to serve a similar purpose for arbitrary pairs.
Historically, these proposals, including the WV and KD distributions, have been made more or less in a heuristic manner. As such, the general mathematical framework for their study, including a prescription for the concrete construction of such distributions for a pair of arbitrary quantum observables, which may comprehensively be termed 'quasi-joint-probability' (QJP) distributions, is still underdeveloped, not to mention a transparent overview of the relations among the QJPs. We know, for instance, that both the WV and KD distributions retain properties similar to those of the standard joint-probability distributions defined for a pair of classical random variables, but each exhibits its own peculiarity: the former admits negative values, whereas the latter takes even complex values. However, we still do not know whether these peculiar properties, which have occasionally been considered a serious impediment to the physical interpretation of the distributions, are the norm for QJP distributions, or whether there can be other types of examples which share classical properties of joint-probability in different aspects. The theme of this paper revolves around the concept of QJP distributions of quantum observables, with the first objective being to present a mathematically solid framework to address some of their problems in a more systematic and lucid manner.
Another motivation of this paper comes from the recent rise of interest in the novel quantum observable called the weak value, which has been put forward by Aharonov and coworkers [5] based on their time-symmetric formulation of quantum mechanics [6] proposed more than half a century ago. In simple terms, the weak value is a physical quantity that supposedly characterises the value of the observable $A$ in the process specified by an initial state $|\psi\rangle$ and a final state $|\psi'\rangle$, both specified in advance. Unlike the standard physical value, which is given by one of the eigenvalues of an observable $A$, the weak value admits a definite value for any $A$, and is envisaged to be meaningful even for a set of non-commuting observables simultaneously. This inspired a new insight for analysing the quantum nature of the system as well as for understanding various counter-intuitive phenomena in quantum mechanics based on the weak value. For instance, the complex-valued nature of $A_w$ allows for a direct measurement of the wave function, offering a novel technique to rival the existing technology of quantum tomography. This in turn invites us to contemplate the possible trajectory of a particle [7,8], a notion which has conventionally been deemed untenable due to the incompatibility of measuring the position and the momentum simultaneously. The weak value also admits novel physical interpretations of such fundamental aspects of quantum mechanics as the wave-particle duality and the local existence of the physical quantity itself, offering us a possible resolution to some of the quantum paradoxes, including the three-box paradox [9], Hardy's paradox [10] and the Cheshire cat paradox [11].
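To make the notion concrete, the following minimal sketch evaluates a weak value for a two-level system. It assumes the explicit formula $A_w = \langle\psi', A\psi\rangle / \langle\psi', \psi\rangle$ commonly used in the literature (the formula is not displayed in the text above), and the states, the observable and all helper names are purely illustrative choices.

```python
# Hypothetical illustration (pure Python): weak value of a 2x2 observable.
# Assumption: A_w = <psi', A psi> / <psi', psi>, the form commonly used
# in the literature; the states below are arbitrary illustrative choices.
import math


def inner(u, v):
    """Inner product, anti-linear in the first argument, linear in the second."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(mat, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def weak_value(A, pre, post):
    """Weak value of A for the initial state `pre` and the final state `post`."""
    return inner(post, apply(A, pre)) / inner(post, pre)


sigma_z = [[1, 0], [0, -1]]  # eigenvalues are +1 and -1

theta, phi = math.pi / 4, -math.pi / 5
pre = [math.cos(theta), math.sin(theta)]
post = [math.cos(phi), math.sin(phi)]

# For these real states the weak value is cos(theta + phi) / cos(theta - phi),
# which lies outside the eigenvalue range [-1, 1] because the two states
# are nearly orthogonal.
a_w = weak_value(sigma_z, pre, post)

# When the final state equals the initial state, the same expression
# reduces to the ordinary expectation value.
expectation = weak_value(sigma_z, pre, pre)
```

With the nearly orthogonal pair chosen here, `a_w` exceeds the largest eigenvalue of `sigma_z`, which illustrates why the weak value cannot in general be read as an eigenvalue.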
Despite the growing attention it receives, the status of the weak value in quantum mechanics is still not solid, and its physical interpretation in particular is still open to debate. One of the recent strategies in addressing this question has been to investigate its relations to quasi-probabilities, specifically to the KD distribution [12,13]. In this paper, we shall follow this line of study and show, among other things, that a novel geometric/statistical interpretation emerges from these distributions. This necessitates a sound mathematical basis of QJP distributions, which we will provide in the course of our discussions.
The main theme of this paper is thus to obtain a more coherent understanding of the formalism of QJP distributions of quantum observables, and subsequently to apply the results in some areas of the foundational problems of quantum mechanics. In view of this, the key problems regarding QJP distributions may be to (i) provide a reasonably solid mathematical framework for the study of QJP distributions based on measure and integration theory, and possibly on the theory of generalised functions, (ii) present a viable scheme to address the inherent indefiniteness/arbitrariness to the possible candidates for QJP distributions of non-commuting pairs of quantum observables, a methodical way for their constructions, and the relation between each of the candidates, and (iii) devise a procedure for measuring such various candidates of QJP distributions in a systematic manner.
We shall address these problems from two complementary approaches: one from a bottom-up, strictly operational construction realised by carefully reviewing the mathematical description of the conditioned measurement scheme, and the other from a top-down viewpoint realised by applying the results of the spectral theorem for normal operators and its Fourier transforms. The results of the study shall subsequently be applied to the analysis of the physical interpretation of the weak value. To this end, we first concentrate on the $L^2$ structures which the QJP distributions naturally induce, and observe that they furnish a statistical interpretation of the geometric structures introduced on the space of observables in the underlying Hilbert space, analogously to those introduced in the space of random variables in classical probability theory. Geometric concepts such as orthogonal projections and inner products are accordingly endowed with statistical interpretations as 'conditionings' and 'correlations', respectively, and in addition the representation of linear operators by functions provides us with a convenient tool for evaluating the statistical quantities involved. These observations form a basis for further study of the weak value in general. As a result, the weak value $A_w$ is given a geometric/statistical interpretation: either as the orthogonal projection of an observable $A$ on the subspace generated by another observable $B$ which is determined by one of the predetermined states entering in the weak value, or equivalently, as the conditioning of $A$ given $B$ with respect to the QJP distribution under consideration. Although we shall not discuss it here, we mention that this interpretation also leads to a set of novel and remarkable inequalities of uncertainty relations for approximation/estimation which are capable of treating both the standard position-momentum inequality and the time-energy inequality [14].
As for the practical outcomes of our argument laid out for QJP distributions, we mentioned earlier the systematic construction of QJP distributions and the geometric/statistical interpretation of the weak value, but each of these can be made more explicit as follows. First, for the systematic construction of QJP distributions, we furnish a general prescription which ensures that it can describe the joint behaviour of an arbitrary pair of quantum observables. Specifically, inspired by the observations made on the Fourier transform of the product spectral measure of two simultaneously measurable observables $A$ and $B$, we introduce a mixture $\#(s, t)$ of the disintegrated components of $e^{-isA}$ and $e^{-itB}$ with real parameters $s, t$ for arbitrary pairs of (generally non-commuting) observables $A$ and $B$, and thereby define the QJP distribution of the pair by the inverse Fourier transform of the distribution $(s, t) \mapsto \langle \psi, \#(s, t)\psi \rangle / \|\psi\|^2$ for a given quantum state $|\psi\rangle$. Each of the QJP distributions is then found to possess reasonable properties to qualify for its name, and one can confirm that both the WV distribution and the KD distribution belong to this class. The inherent arbitrariness among the candidates for QJP distributions is then understood as the variety of ways in which one could mix the disintegrated components of the unitary operators, which originates directly from the non-commutative nature of the pair of observables $A$ and $B$. A concrete measurement scheme for members of a specific subfamily of QJP distributions is further proposed.
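As a rough finite-dimensional illustration of how such a distribution behaves, the sketch below tabulates a Kirkwood-Dirac-type distribution for a qubit. It assumes the discrete form $\langle\psi, b\rangle\langle b, a\rangle\langle a, \psi\rangle / \|\psi\|^2$ over eigenvectors $a$ of $A$ and $b$ of $B$, one common ordering convention in the literature rather than the Fourier-transform construction described above; the bases, the state and the function names are hypothetical.

```python
# Hypothetical illustration (pure Python): a discrete Kirkwood-Dirac-type
# distribution for a qubit. Assumption: KD(a, b) = <psi,b><b,a><a,psi>/|psi|^2,
# one common ordering convention; this is not the paper's general construction.
import math


def inner(u, v):
    """Inner product, anti-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def kirkwood_dirac(psi, basis_a, basis_b):
    """Return the table kd[i][j] for eigenvectors basis_a[i] and basis_b[j]."""
    norm2 = inner(psi, psi).real
    return [
        [inner(psi, b) * inner(b, a) * inner(a, psi) / norm2 for b in basis_b]
        for a in basis_a
    ]


r = 1.0 / math.sqrt(2.0)
basis_z = [[1, 0], [0, 1]]    # eigenvectors of sigma_z
basis_x = [[r, r], [r, -r]]   # eigenvectors of sigma_x
psi = [r, 1j * r]             # an eigenvector of sigma_y

kd = kirkwood_dirac(psi, basis_z, basis_x)

total = sum(sum(row) for row in kd)
marginal_a = [sum(row) for row in kd]                        # sum over b
marginal_b = [sum(kd[i][j] for i in range(2)) for j in range(2)]  # sum over a
```

The entries of `kd` are complex for this non-commuting pair, yet they sum to one and both marginals reproduce the ordinary Born probabilities, which is the sense in which the table still behaves like a joint distribution.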
For the geometric/statistical interpretation of the weak value, on the other hand, we start by noting that, as distributions, each QJP distribution naturally induces an $L^2$ structure. We will then find that the QJP distributions provide convenient methods of representing geometric structures in terms of the inner products of the form $\langle\!\langle B, A \rangle\!\rangle_{\psi,\alpha} := \frac{1+\alpha}{2} \cdot \frac{\langle B\psi, A\psi \rangle}{\|\psi\|^2} + \frac{1-\alpha}{2} \cdot \frac{\langle A\psi, B\psi \rangle}{\|\psi\|^2}$, $-1 \le \alpha \le 1$, (1.2) which can be introduced on the space of operators in the underlying Hilbert space by integration of functions. With this inner product, we are allowed to consider orthogonal projections onto the subspaces $E_\psi(B)$ of operators generated by self-adjoint operators $B$, and find that the orthogonal projections can be interpreted as conditioning given $B$ with respect to the QJP distributions under consideration. The projection of the observable $A$ on the subspace $E_\psi(B)$ is further found to be described by the weak value, providing us with its proper geometric/statistical interpretation (Proposition 7.7).
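The form (1.2) can be explored numerically for matrices. The following sketch evaluates it for Pauli matrices, reading (1.2) as the convex-type combination of $\langle B\psi, A\psi\rangle$ and $\langle A\psi, B\psi\rangle$ with weights $(1 \pm \alpha)/2$; the finite-dimensional setting, the chosen state and the helper names are illustrative assumptions.

```python
# Hypothetical illustration (pure Python): the form <<B, A>>_{psi, alpha}
# of equation (1.2), evaluated for 2x2 matrices.


def inner(u, v):
    """Inner product, anti-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def apply(mat, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in mat]


def qjp_inner(B, A, psi, alpha):
    """(1+alpha)/2 * <B psi, A psi>/|psi|^2 + (1-alpha)/2 * <A psi, B psi>/|psi|^2."""
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    norm2 = inner(psi, psi).real
    b_psi, a_psi = apply(B, psi), apply(A, psi)
    return (0.5 * (1 + alpha) * inner(b_psi, a_psi)
            + 0.5 * (1 - alpha) * inner(a_psi, b_psi)) / norm2


sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
psi = [1, 0]

# For this state <sigma_x psi, sigma_y psi> = i, so the form equals alpha * i.
values = {alpha: qjp_inner(sigma_x, sigma_y, psi, alpha) for alpha in (-1.0, 0.0, 1.0)}
```

The parameter `alpha` interpolates between the two operator orderings: the symmetric choice `alpha = 0` returns the real part of $\langle B\psi, A\psi\rangle / \|\psi\|^2$, whereas `alpha = 1` or `alpha = -1` keeps a single ordering and may be complex.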
Having furnished a general introduction to the topic of QJP distributions of quantum observables and the weak value along with a brief summary of the content, we now give the outline of the present paper. After this introductory section, we organise the main body, Section 2 to Section 7, into the following three logical groups of mutually interrelated topics:
(A) QJP: Heuristic Construction. Four sections, from Section 2 to Section 5, are devoted to a heuristic and bottom-up construction of QJP distributions of a pair of quantum observables. This is accomplished by a thorough analysis of the mathematical formalism of two measurement schemes. One is the standard scheme, which we call the 'unconditioned measurement (UM) scheme', in which we measure an observable $A$ under a given state as conventionally done (Section 2 to 3). The other is what we call the 'conditioned measurement (CM) scheme', in which under a given state we measure an observable $A$ along with another observable $B$ whose outcome is used for conditioning (Section 4 to 5). Each of these analyses will be conducted on the level of (conditional) expectations and (conditional) probabilities.
(i) UM I. We start by reviewing, in Section 2, the UM scheme by a standard operator-centric approach, and investigate how one could reclaim the information of the target system by that means.
(ii) UM II. Subsequently, in Section 3, we take a closer look at the UM scheme at the level of probabilities, where the quantity of interest is now not only the statistical average, but also the 'raw' probability measure describing the probabilistic behaviour of the measurement outcomes of the meter observable, and discuss how one could recover the probability measure describing the outcomes of the target observable.
(iii) CM I. From Section 4 onward, we turn our attention to the CM scheme. In Section 4, we first conduct, in parallel with the preceding Section 2, an analysis at the operator level, where now the quantity of interest becomes the conditional expectation of the meter observable given another conditioning observable $B$ of the target system.
(iv) CM II. In Section 5, the study of the CM scheme is given a probabilistic approach, where the quantity of interest is the Wigner-Ville distribution of a pair of canonically conjugate observables on the meter system conditioned by the outcome of the conditioning observable $B$ of the target system. We then see that this implies the existence of the concept of QJP distributions of pairs of generally non-commuting observables.
(B) QJP: Formal Definition. Inspired by the heuristic arguments employed in the operational analyses over the preceding four sections, we devote Section 6 to the top-down construction of QJP distributions for arbitrary pairs of generally non-commuting quantum observables. We shall then summarise our findings obtained through Section 2 to Section 5 from a bird's-eye viewpoint, discussing where the heuristic arguments and observations in the preceding sections find their places in this relatively general framework.
(C) Application to the Interpretation of Weak Values. As an application of the mathematical formalism provided so far, in Section 7 we conduct a study on the quantum analogue of correlations, which can be defined even for a pair of non-commuting observables. This leads us to the aforementioned geometric/statistical interpretation of the weak value as conditional quasi-expectations.
We shall finally summarise our results and give some concluding remarks in the last Section 8. Prior to our main discussions, however, we wish to say a few words about the mathematical preliminaries we have assumed of the readers in preparing this paper. The formalism that we intend to provide necessarily requires, on top of the mandatory functional analysis, moderate acquaintance with measure and integration theory, preferably some familiarity with the basic terminology of general topology, and ideally insight into the basic ideas of the theory of generalised functions. The obvious difficulty is then to find a decent balance between rigour and generality on one side, and accessibility on the other. To achieve this balance as far as possible, and to keep our arguments accessible without prior knowledge of advanced mathematics, we have included at the beginning of each section a subsection entitled Reference Materials containing a rather lengthy introduction of the mathematical concepts that are used in the subsequent discussions. While the authors took care to introduce these mathematical concepts and their results in a self-contained manner that respects their logical sequence, these Reference Materials are primarily intended to serve as a convenient place to summarise the basic concepts and results as a crash course, and as such, they are not intended as a text from which to learn the mathematical theories from scratch. Those who are interested in the mathematics itself are advised to refer to standard textbooks on the respective topics, e.g., for general topology [15][16][17], measure and integration theory [18][19][20][21][22][23], functional analysis [24,25], and also those specifically targeting the audience from the physics community [26][27][28][29]. Naturally, those who are already familiar with the preparatory materials may safely skip them and directly go to the main arguments that follow.
Admittedly, the style of discussion found in this paper is heavily oriented toward mathematical rigour and logical clarity rather than brevity and physical intuition, especially compared to that found in the majority of the literature in physics. However, in spite of the initial hesitation that general readers may feel due to the unfamiliarity of the style, the authors decided to adopt it in the belief that this way of presentation has its own merit, and that the rewards will outweigh the costs in the end. In fact, several important concepts and results from the branches of mathematics mentioned above (specifically, measure and integration theory and functional analysis) are quite indispensable in understanding some of the interesting results obtained in this paper. This is so, for instance, in defining the conditional quasi-expectations (to which Aharonov's weak value belongs as a special case) in terms of the Radon-Nikodým derivative to understand their properties (Section 4.3.2), in formulating the problem of the 'limit of amplification' by conditioning in terms of essential suprema (Section 4.2.1), in defining a family of QJP distributions of a combination of generally non-commuting quantum observables by the method of hashing (Section 6), and in providing the geometric and 'statistical' interpretation of conditional quasi-expectations (Section 7). The authors hope that the readers will not be discouraged by these mathematical materials, but will rather enjoy working through the discussions and reach the fruit of the physical results they finally bring forth.

Mathematical Notations Employed.
Throughout this paper, we denote by $\mathbb{K}$ either the real field $\mathbb{R}$ or the complex field $\mathbb{C}$, and define $\mathbb{K}^\times := \mathbb{K} \setminus \{0\}$. In order to avoid confusion, we denote the collection of all natural numbers including $0$ by $\mathbb{N}_0$, and $\mathbb{N}^\times := \mathbb{N}_0 \setminus \{0\}$. Since our primary interest is in quantum mechanics, Hilbert spaces are always assumed to be complex. Conforming to the convention in physical literature, we denote the complex conjugate of a complex number $c \in \mathbb{C}$ by $c^*$, and an inner product $\langle \cdot, \cdot \rangle$ defined on a complex linear space is anti-linear in its first argument and linear in the second. For simplicity, we adopt the natural units where we specifically have $\hbar = 1$, unless stated otherwise.

Unconditioned Measurement I: In Terms of Expectations
We start by providing a brief review on the archetype of the indirect measurement scheme widely known as the von Neumann measurement scheme. The scheme will be referred to as the unconditioned measurement (UM) scheme in generic terms throughout this paper, primarily in order to contrast it with the conditioned measurement (CM) scheme (which includes the post-selected measurement scheme as a special case) discussed later.

Reference Materials
As a preamble to this section, we here include three introductory topics that form the basis of our study. We start by collecting some of the basic terminologies and results of measure and integration theory, based on which modern probability theory was established by Kolmogorov et al. Subsequently, we provide a brief note on both the Schrödinger representation and the Weyl representation of the canonical commutation relations (CCR), which will be extensively employed in describing the meter system in our measurement scheme. We finally close this subsection by providing a short summary on the precise definition of tensor products of Hilbert spaces and that of self-adjoint operators. Since these materials are included just to make our presentation self-contained, those who are already familiar with the subject may safely skip the contents and proceed directly to Section 2.2.

A Crash-Course into Measure and Integration Theory.
We begin by presenting some of the most basic concepts and results of measure and integration theory, starting from the definition of measure spaces up to the construction of the Lebesgue integration, followed by the definition of $L^p$ spaces.

σ-algebras and Measurable Spaces.
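Let $X$ be any set, and let $\mathcal{P}(X)$ denote the power set of $X$, i.e., the collection of all subsets of $X$. A family $\mathcal{A} \subset \mathcal{P}(X)$ of subsets of $X$ is called a σ-algebra over $X$, if it satisfies the following conditions: (i) $X \in \mathcal{A}$ holds. (ii) For any $A \in \mathcal{A}$, its complement $X \setminus A \in \mathcal{A}$ holds. (iii) For any sequence $(A_n)_{n \ge 1}$ of sets in $\mathcal{A}$, $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ holds. Given a σ-algebra $\mathcal{A}$ over $X$, each element $A \in \mathcal{A}$ is called a measurable set, and the ordered pair $(X, \mathcal{A})$ is called a measurable space.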
Generator of a σ-algebra.
A trivial, but important property of σ-algebras is that, for any collection $(\mathcal{A}_i)_{i \in I}$ of σ-algebras over $X$ indexed by an index set $I$, the intersection $\bigcap_{i \in I} \mathcal{A}_i = \{A \in \mathcal{P}(X) : A \in \mathcal{A}_i, \forall i \in I\}$ is itself a σ-algebra over $X$. This leads to the following basic fact: For any collection $\mathcal{E} \subset \mathcal{P}(X)$ of subsets of $X$, there exists a smallest (with respect to set inclusion) σ-algebra encompassing $\mathcal{E}$, namely, the intersection of all σ-algebras that encompass $\mathcal{E}$. The intersection is called the σ-algebra generated by $\mathcal{E}$, denoted as $\sigma(\mathcal{E})$, and $\mathcal{E}$ is in turn called the generator of $\sigma(\mathcal{E})$.
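On a finite set the generated σ-algebra can be computed explicitly, which may help in getting a feel for the definition. The sketch below builds it by repeatedly closing the generator under complements and pairwise unions (on a finite set, countable unions reduce to finite ones); it illustrates the notion under that finiteness assumption and is not the intersection construction itself, and the function name is hypothetical.

```python
# Hypothetical illustration (pure Python): the sigma-algebra generated by a
# collection of subsets of a FINITE set X, obtained by closing the collection
# under complements and pairwise unions until nothing new appears.
from itertools import combinations


def generate_sigma_algebra(X, generator):
    """Return sigma(generator) over the finite set X as a set of frozensets."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(E) for E in generator}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:                      # closure under complements
            if X - A not in family:
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):  # closure under unions
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family


# The generator {{1}, {1, 2}} over X = {1, 2, 3, 4} separates the points into
# the atoms {1}, {2}, {3, 4}, so the generated sigma-algebra has 2**3 members.
sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
```

Note that `{2}` belongs to `sigma` although it is not in the generator, while `{3}` does not, since no operation can separate 3 from 4.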
Let $X$ be a metric (or, in general, a topological) space, and let $\mathcal{O}$ denote the collection of all open sets of $X$. We call the σ-algebra generated by $\mathcal{O}$ the Borel σ-algebra of $X$, and denote it by $\mathcal{B}(X) := \sigma(\mathcal{O})$. For the special case $X = \mathbb{R}^n$ ($n \in \mathbb{N}^\times$), we prepare the special symbol $\mathcal{B}^n := \mathcal{B}(\mathbb{R}^n)$ for the Borel σ-algebra of $\mathbb{R}^n$, which is among the most well-known examples of σ-algebras and which, incidentally, also plays an important role in quantum theory. For simplicity, we occasionally denote $\mathcal{B} := \mathcal{B}^1$ whenever there is no risk of confusion.
Measures and Measure Spaces.
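Let $(X, \mathcal{A})$ be a measurable space. A map $\mu : \mathcal{A} \to \overline{\mathbb{R}}$ from the σ-algebra $\mathcal{A}$ to the extended real line $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, \infty\}$ is called a measure, if $\mu$ satisfies the following conditions: (i) $\mu(\emptyset) = 0$ holds. (ii) $\mu(A) \ge 0$ holds for any $A \in \mathcal{A}$. (iii) For any sequence $(A_n)_{n \ge 1}$ of pairwise disjoint sets in $\mathcal{A}$, the countable additivity $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$ holds.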
Given a measure $\mu$ over a measurable space $(X, \mathcal{A})$, the ordered triple $(X, \mathcal{A}, \mu)$ is called a measure space.
As a concrete example, we make notes on the $n$-dimensional Lebesgue-Borel measure $\beta^n$ ($n \in \mathbb{N}^\times$) defined on the measurable space $(\mathbb{R}^n, \mathcal{B}^n)$, which is among the most well-known and important examples of measure spaces. To this end, we first recall that a measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}^n)$ is called translation invariant, if $\mu(B + a) = \mu(B)$, $B \in \mathcal{B}^n$, (2.2) holds for any $a \in \mathbb{R}^n$, where $B + a := \{x + a : x \in B\}$. The Lebesgue-Borel measure $\beta^n$ is then specified as the unique translation invariant measure on $(\mathbb{R}^n, \mathcal{B}^n)$ that satisfies the normalisation condition $\beta^n(]0, 1]^n) = 1$, where $]0, 1]^n := \{x \in \mathbb{R}^n : 0 < x_i \le 1 \text{ for } 1 \le i \le n\}$ and $x_i$ is the $i$th coordinate of $x$.
This is the measure which is implicitly assumed in most cases when performing the usual integration denoted by the symbol $\int f(x)\,dx := \int f\,d\beta^n$, which is a common practice in the physics community (the precise definition of the integral on the r. h. s. will be presented shortly after). The proof of the existence and uniqueness of the Lebesgue-Borel measure can be found in most elementary textbooks on the topic.

Measurable Functions.
Let $(X, \mathcal{A})$ and $(X', \mathcal{A}')$ be measurable spaces. A map $f : X \to X'$ is called $\mathcal{A}$-$\mathcal{A}'$ measurable (or just measurable for short, whenever the measurable spaces concerned are obvious by context), if $f^{-1}(\mathcal{A}') \subset \mathcal{A}$ holds, i.e., if $f^{-1}(A') \in \mathcal{A}$ for every $A' \in \mathcal{A}'$. In particular, we call a map $f : X \to X'$ from a metric (or a topological) space $X$ to another metric (or topological) space $X'$ Borel-measurable, if it is $\mathcal{B}(X)$-$\mathcal{B}(X')$ measurable. An important fact to note is that a continuous map $f : X \to X'$ is necessarily Borel-measurable.

Numerical Functions.
In integration theory, it proves fruitful to consider not only real functions $f : X \to \mathbb{R}$, but also functions that take values in the extended real line $\overline{\mathbb{R}}$, which are called numerical functions. One naturally equips $\overline{\mathbb{R}}$ with the ordering $-\infty < a < +\infty$, $a \in \mathbb{R}$, and may also define agreeable operations of addition, subtraction and multiplication, where most of them should be self-evident, except for the following rather arbitrary definition $0 \cdot (\pm\infty) := (\pm\infty) \cdot 0 := 0$, $\infty - \infty := -\infty + \infty := 0$. (2.5) We then define the σ-algebra on $\overline{\mathbb{R}}$ by $\overline{\mathcal{B}} := \{B \cup E : B \in \mathcal{B},\ E \subset \{-\infty, +\infty\}\}$, where, in particular, its restriction on the real line gives $\overline{\mathcal{B}}|_{\mathbb{R}} = \mathcal{B}$. We then say that a numerical function $f : X \to \overline{\mathbb{R}}$ is measurable, if it is $\mathcal{A}$-$\overline{\mathcal{B}}$ measurable. Throughout this paper, we denote by $M^+(\mathcal{A})$ (or occasionally by $M^+$, whenever the σ-algebra concerned is evident by context) the collection of all measurable non-negative numerical functions.
In introducing the concept of integration, we proceed in three steps: We first define the integration for non-negative step functions, then extend the treatment to functions belonging to $M^+$, and finally discuss the integrability of measurable numerical or complex functions.
(i) Integration of Step Functions.
Let $(X, \mathcal{A}, \mu)$ be a measure space. A measurable function $f : (X, \mathcal{A}) \to (\mathbb{R}, \mathcal{B})$ is called a step function (staircase function, simple function), if it takes only finitely many distinct values in $\mathbb{R}$. The collection of all measurable non-negative step functions will be denoted by $T^+$. One readily sees that a non-negative step function $f \in T^+$ admits an expression $f = \sum_{i=1}^{m} a_i \chi_{A_i}$, (2.7) where $a_1, \ldots, a_m \ge 0$ are non-negative real numbers, $A_1, \ldots, A_m \in \mathcal{A}$ are measurable sets, and $\chi_A$ denotes the characteristic function of the subset $A \subset X$. We then define the ($\mu$-)integral of $f$ (over $X$) as $\int f\,d\mu := \sum_{i=1}^{m} a_i\,\mu(A_i)$, (2.9) whose value lies in $[0, \infty]$. Note that, although the expression (2.7) is non-unique due to the possible choice of the measurable sets used, the definition (2.9) is well-defined since the value of the integral is independent of the choice.
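The independence of the integral from the chosen representation can be checked by hand on a small example. The following sketch assumes a finite measure space in which the measure is given by point weights, and compares two different representations of the same non-negative step function; the weights, the representations and the helper names are hypothetical.

```python
# Hypothetical illustration (pure Python): the integral of a non-negative step
# function does not depend on the representation f = sum_i a_i * chi_{A_i}.
from fractions import Fraction

# A measure on the finite set {0, 1, 2, 3, 4}, given by point weights.
weights = {
    0: Fraction(1, 2),
    1: Fraction(1, 3),
    2: Fraction(1, 6),
    3: Fraction(1),
    4: Fraction(2),
}


def mu(A):
    """Measure of a subset A, by additivity over its points."""
    return sum(weights[x] for x in A)


def integral(representation):
    """Integral of sum_i a_i * chi_{A_i}, computed as sum_i a_i * mu(A_i)."""
    return sum(a * mu(A) for a, A in representation)


def evaluate(representation, x):
    """Value of the step function at the point x."""
    return sum(a for a, A in representation if x in A)


# Two representations of the same function: overlapping sets vs. disjoint sets.
rep_overlapping = [(2, {0, 1, 2}), (3, {2, 3})]
rep_disjoint = [(2, {0, 1}), (5, {2}), (3, {3})]

same_function = all(
    evaluate(rep_overlapping, x) == evaluate(rep_disjoint, x) for x in weights
)
value_overlapping = integral(rep_overlapping)
value_disjoint = integral(rep_disjoint)
```

Exact rational arithmetic is used so that the two values can be compared for equality rather than up to rounding.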
(ii) Integration of Functions in M + . Now that we have defined the Lebesgue integral of non-negative step functions, we next define the integral of non-negative measurable numerical functions. For f ∈ M + , the Lebesgue integral of f is defined as holds, which in particular implies Having provided the definition of the Lebesgue integration, we close this subsection by introducing an important class of function spaces: L p . Let L p (µ), 1 ≤ p < ∞, denote the space of all measurable functions f : X → K for which its L p -norm is finite. For p = ∞, we let L ∞ (µ) denote the space of all f for which its essential supremum is finite (such a function is called essentially bounded). The term essential supremum is justified by the fact that the evaluation |f | ≤ f ∞ µ-a.e. universally holds (to see this, {|f | > f ∞ + 1/n} is a set of measure zero). Now, by identifying two functions f, g ∈ L p (µ) by the equivalence relation f ∼ g ⇔ f = g µ-a.e., we obtain a quotient space L p (µ) := L p (µ)/ ∼. For simplicity, it is customary to denote an element of L p (µ) by its representative f ∈ L p (µ) whenever there is no risk of confusion. For f ∈ L p (µ), one finds that the quantity f p , 1 ≤ p ≤ ∞ is welldefined (irrespective of the choice of the representative), and that this in fact provides a norm on L p (µ), called the L p -norm. The norm · p is also known to be complete and hence makes L p (µ) into a Banach space. The case p = 2 is of particular interest in the context of quantum mechanics, where the integration, g, f := g * f dµ, (2.21) defines an inner product that satisfies f, f = f 2 2 , making L 2 (µ) into a Hilbert space. As a special case, we are mostly interested in the choice (R n , B n , β n ) of the measure space. Conforming to convention in physical literature, we prepare a special symbol for the L p spaces of it and denote L p (R n ) := L p (β n ).
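Among the most important inequalities regarding $L^p$-spaces is Hölder's inequality: for $1 \leq p, q \leq \infty$ with $1/p + 1/q = 1$ and measurable functions $f$, $g$, one has $\|f g\|_1 \leq \|f\|_p \, \|g\|_q$.
For the specific choice $p = q = 2$, the resulting inequality is known as the Cauchy-Schwarz inequality.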
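As a quick numerical illustration of Hölder's inequality $\|fg\|_1 \leq \|f\|_p \|g\|_q$ with $1/p + 1/q = 1$, the following sketch works on a finite measure space, where every integral reduces to a weighted sum. The helper names `lp_norm` and `holder_gap`, the random test data and the chosen exponents are illustrative assumptions and do not appear in the text.

```python
import random

def lp_norm(values, weights, p):
    """L^p norm of a function on a finite measure space.

    values[i] is f(x_i) and weights[i] = mu({x_i}) >= 0.
    p = float('inf') gives the essential supremum, which ignores
    points of measure zero.
    """
    if p == float("inf"):
        return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def holder_gap(f, g, weights, p):
    """Return ||f||_p * ||g||_q - ||f g||_1 for the conjugate exponent q (p finite)."""
    q = float("inf") if p == 1 else p / (p - 1)
    fg = [a * b for a, b in zip(f, g)]
    return lp_norm(f, weights, p) * lp_norm(g, weights, q) - lp_norm(fg, weights, 1)

# Hypothetical test data: 50 points with random masses and complex values.
random.seed(0)
weights = [random.random() for _ in range(50)]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

# The gap should be non-negative for every admissible exponent;
# p = 2 is the Cauchy-Schwarz case.
gaps = {p: holder_gap(f, g, weights, p) for p in (1, 1.5, 2, 3)}
```

A non-negative entry in `gaps` for each exponent is what the inequality predicts; the check is of course no substitute for the proof, which covers arbitrary measure spaces.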

Rudimentary Techniques in Handling the CCR. While the following topics are widely known, we include this material mainly for the reader's convenience, and also for self-consistency and reference.
Schrödinger Representation of the CCR.
We start by recalling the definition of the Schwartz space. A smooth function $f : \mathbb{R}^n \to \mathbb{K}$ is called rapidly decreasing when $\sup_{x \in \mathbb{R}^n} |x^{\alpha} D^{\beta} f(x)| < \infty$ holds for any pair of multi-indices $\alpha, \beta \in \mathbb{N}_0^n$. Here, a multi-index $\gamma := (\gamma_1, \dots, \gamma_n) \in \mathbb{N}_0^n$ is understood to be used as $x^{\gamma} := x_1^{\gamma_1} \cdots x_n^{\gamma_n}$ and $D^{\gamma} := D_1^{\gamma_1} \cdots D_n^{\gamma_n}$, where $D_i := \partial/\partial x_i$ is the partial differentiation operator with respect to the variable $x_i$. The space $\mathcal{S}(\mathbb{R}^n)$ of all rapidly decreasing functions is then called the Schwartz space, and its elements are in turn called Schwartz functions. The Schwartz space is known to be a dense subspace $\mathcal{S}(\mathbb{R}^n) \subset L^p(\mathbb{R}^n)$ for $1 \leq p < \infty$. A well-known example of Schwartz functions is provided by the form $x^{\gamma} e^{-a|x|^2} \in \mathcal{S}(\mathbb{R}^n)$, $\gamma \in \mathbb{N}_0^n$, $a > 0$. (2.26) Specifically, the Gaussian wave-functions, which also appear later in our analysis, are among the most frequently used members of the Schwartz space belonging to this class.

Now that we have the necessary definitions, we return to the main topic of this subsection and, for simplicity, confine ourselves to the case $n = 1$ without loss of generality. We start by introducing a pair of important operators $\hat{x}$ and $\hat{p}$ on the Hilbert space $L^2(\mathbb{R})$. Among these, $\hat{x} : \mathrm{dom}(\hat{x}) \to L^2(\mathbb{R})$ is the operator on $L^2(\mathbb{R})$ defined by the multiplication of a function $f$ by $x$, $(\hat{x} f)(x) := x f(x)$, (2.27) with its domain $\mathrm{dom}(\hat{x}) := \{ f \in L^2(\mathbb{R}) : \int_{\mathbb{R}} |x f(x)|^2 \, dx < \infty \}$. The operator $\hat{x}$ is known to be self-adjoint and is called the (one-dimensional) position operator.
Next, consider the operator $-iD$ defined on $\mathcal{S}(\mathbb{R})$, with $D := d/dx$ being the usual differential operator in our case $n = 1$. The operator $-iD : \mathcal{S}(\mathbb{R}) \to L^2(\mathbb{R})$ is known to be essentially self-adjoint, which allows us to define the (one-dimensional) momentum operator by its self-adjoint extension (see footnote 3), $\hat{p} := \overline{-iD}$. (2.32) Here, the overline on a closable operator denotes its closure, which in the case of an essentially self-adjoint operator coincides with its (unique) self-adjoint extension. One then verifies that the pair $\{\hat{x}, \hat{p}\}$ satisfies the familiar (one-dimensional) canonical commutation relations (CCR), $[\hat{x}, \hat{p}] = iI$, on the common invariant domain $\mathcal{S}(\mathbb{R})$. One then concludes from the above argument that the combination $\{L^2(\mathbb{R}), \mathcal{S}(\mathbb{R}), \hat{x}, \hat{p}\}$ gives a concrete example of a representation of the CCR, called the (one-dimensional) Schrödinger representation of the CCR.

Footnote 3: While the explicit identification of the domain of the operator $\hat{p}$ is not quite straightforward, we mention that it is given by $\mathrm{dom}(\hat{p}) = \{ f \in L^2(\mathbb{R}) : f|_J \in AC(J) \text{ for every compact interval } J \subset \mathbb{R}, \ f' \in L^2(\mathbb{R}) \}$, where $f|_J$ denotes the restriction of the function $f$ to the interval $J$, and $AC(J)$ denotes the space of all absolutely continuous functions on $J$.
Here, a function $f : [a, b] \to \mathbb{K}$ is called absolutely continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $\sum_{k} |f(b_k) - f(a_k)| < \varepsilon$ holds for every finite family of pairwise disjoint subintervals $(a_k, b_k) \subset [a, b]$ with $\sum_{k} (b_k - a_k) < \delta$.

A pair $\{Q, P\}$ of self-adjoint operators on a Hilbert space $H$ is said to satisfy the Weyl relations, and $\{H, \{Q, P\}\}$ is called a Weyl representation of the CCR, if $e^{isQ} e^{itP} = e^{-istI} e^{itP} e^{isQ}$, (2.39) $e^{isQ} e^{itQ} = e^{itQ} e^{isQ}$, $e^{isP} e^{itP} = e^{itP} e^{isP}$, (2.40) hold for $s, t \in \mathbb{R}$. One of the advantages of the Weyl relations, as compared to the CCR, is that they deal only with unitary operators, for which no particular consideration of the domains of the operators involved is necessary because of their boundedness. Fortunately, in the present case one can actually prove that the pair $\{\hat{x}, \hat{p}\}$ of the position and momentum operators introduced earlier satisfies the Weyl relations (2.39) and (2.40) on $L^2(\mathbb{R})$. This implies that the Schrödinger representation of the CCR $\{L^2(\mathbb{R}), \mathcal{S}(\mathbb{R}), \hat{x}, \hat{p}\}$ furnishes an example of the Weyl representation of the CCR, at least in the case of the configuration space $\mathbb{R}$. One also finds that this is true for the Euclidean configuration space $\mathbb{R}^n$.

One may naturally be interested in how the Weyl representation of the CCR relates to the standard representation of the CCR. To this end, we begin by collecting some of the necessary definitions and basic theorems. Recall that a vector-valued map $F : U \to V$ from an open subset $U \subset \mathbb{R}$ into a normed space $V$ is called strongly continuous at $t_0 \in U$ if $\lim_{t \to t_0} F(t) = F(t_0)$ holds with respect to the norm $\|\cdot\|$ on $V$, and in turn, strongly continuous on $U$ if it is strongly continuous at every point of $U$. The map $F$ is then called strongly differentiable at $t_0 \in U$ with strong derivative $F'(t_0) \in V$ if $\lim_{t \to t_0} \left\| \frac{F(t) - F(t_0)}{t - t_0} - F'(t_0) \right\| = 0$ holds, and accordingly strongly differentiable on $U$ if it is strongly differentiable at every point of $U$. We will occasionally write its strong derivative in either of the notations $F'(t_0)$ or $\frac{dF}{dt}(t_0)$. Now, let $A : H \supset \mathrm{dom}(A) \to H$ be a self-adjoint operator on a Hilbert space $H$, and consider a one-parameter unitary group $\{e^{itA}\}_{t \in \mathbb{R}}$ (defined by means of functional calculus).
Then, Stone's theorem on one-parameter unitary groups states that, on account of the boundedness of the unitary operator, for a fixed |φ ∈ H the unitary group yields a strongly continuous vector-valued map, one proves the differentiability of the r. h. s. for the choice of the initial state |ψ ∈ dom(P Q) ∩ dom(P ), which yields Turning to the l. h. s., differentiability also leads to where, in particular, e itP P |ψ ∈ dom(Q) is implied, and the first equality is due to the closedness of the operator Q (recall that a self-adjoint operator is necessarily closed). By combining the above two results, one has Taking t = 0, we learn that this in particular leads to the operator identity, on the subspace dom(P Q) ∩ dom(P ). One also sees from this result that the choice |ψ ∈ dom(P Q) ∩ dom(P ) automatically implies |ψ ∈ dom(QP ). Proceeding further from (2.40) by analogous reasoning, one eventually obtains the CCR (2.35) and (2.36) on the domain (2.37). In the case where D is dense, one sees that a Weyl representation of the CCR {H, {Q, P }} together with the subspace D indeed gives a representation of the CCR. In fact, in the case where H is separable, D is known to be dense.
In passing, we mention that the importance of the Weyl relations becomes evident when one considers configuration spaces other than the Euclidean space $\mathbb{R}^n$, for which no reasonable counterpart of the CCR can be defined. For instance, when the configuration space is given by a coset space $G/H$, where $G$ is a Lie group and $H$ its subgroup (typical examples being the spheres $S^n \simeq O(n+1)/O(n)$), one can readily adopt the inherent group-theoretic structure of the configuration space to define the Weyl relations extended to that space. Unlike the Euclidean case, such extended Weyl relations are known to admit multiple inequivalent representations.
2.1.3. Tensor Product of Hilbert Spaces and Self-adjoint Operators. We finally provide a brief review of tensor products of Hilbert spaces and of self-adjoint operators. Although the topic is elementary, we find it beneficial to summarise its precise definition, since tensor products are used extensively in this paper owing to its focus on indirect measurement schemes.

Algebraic Tensor Products.
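Let $V, W$ be $\mathbb{K}$-vector spaces. We call an ordered pair consisting of a vector space $V \otimes W$ and a bilinear map $\otimes : V \times W \to V \otimes W$ an (algebraic) tensor product of the vector spaces $V$ and $W$ if, for any $\mathbb{K}$-vector space $Z$ and any bilinear map $T : V \times W \to Z$, there exists a unique linear map $\tilde{T} : V \otimes W \to Z$ such that the diagram formed by $\otimes$, $T$ and $\tilde{T}$ commutes (footnote 5), i.e. $T = \tilde{T} \circ \otimes$ (universal property of (algebraic) tensor products). Each element of $V \otimes W$ is called a tensor, and the bilinear map $\otimes$ is called the tensor map, the image of which is written $v \otimes w := \otimes(v, w)$. The tensor products thus defined are in fact unique up to isomorphism. Indeed, if $(V \otimes W, \otimes)$ and $(V \otimes' W, \otimes')$ were two such, then by first letting $Z = V \otimes' W$ and $T = \otimes'$ in the above diagram, and subsequently interchanging the roles of $(V \otimes W, \otimes)$ and $(V \otimes' W, \otimes')$, one concludes that the induced linear maps $\tilde{\otimes}' : V \otimes W \to V \otimes' W$ and $\tilde{\otimes} : V \otimes' W \to V \otimes W$ are linear bijections with $\tilde{\otimes} \circ \tilde{\otimes}' = I$. In this sense, we may refer to $(V \otimes W, \otimes)$ as the tensor product of $V$ and $W$, and forget about the way in which it is constructed (footnote 6). One basic fact worthy of special note is that, given two bases $\{e_i\}_{i \in I}$ and $\{f_j\}_{j \in J}$ of $V$ and $W$, respectively, the tensors $\{e_i \otimes f_j\}_{i \in I, j \in J}$ form a basis of $V \otimes W$.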
Tensor Product of Hilbert Spaces.
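We are specifically interested in tensor products of Hilbert spaces. For a pair of Hilbert spaces $(H_1, \langle \cdot, \cdot \rangle_{H_1})$ and $(H_2, \langle \cdot, \cdot \rangle_{H_2})$, we denote by $H_1 \otimes_{\mathrm{alg}} H_2$ their algebraic tensor product, defined from their purely algebraic structures as described above. We then introduce $\langle v \otimes w, v' \otimes w' \rangle_{H_1 \otimes H_2} := \langle v, v' \rangle_{H_1} \langle w, w' \rangle_{H_2}$, defined for all pairs of tensors belonging to $D := \{v \otimes w : v \in H_1, w \in H_2\}$, and let it extend linearly to the whole of $H_1 \otimes_{\mathrm{alg}} H_2 = \mathrm{span}(D)$. Here, $\mathrm{span}(S) := \{k_1 v_1 + \cdots + k_n v_n : k_i \in \mathbb{K}, v_i \in S, n \in \mathbb{N}\}$ (2.62) denotes the subspace of a $\mathbb{K}$-vector space $V$ spanned by a nonempty set $S \subset V$, i.e. the set of all finite linear combinations of vectors belonging to $S$. It is routine to check that the extension $\langle \cdot, \cdot \rangle_{H_1 \otimes H_2}$ thus defined is well-defined, and one moreover proves that it is in fact an inner product on $H_1 \otimes_{\mathrm{alg}} H_2$, making the pair $(H_1 \otimes_{\mathrm{alg}} H_2, \langle \cdot, \cdot \rangle_{H_1 \otimes H_2})$ into a pre-Hilbert space (i.e., an inner product space). The tensor map $\otimes$ can also be shown to be continuous with respect to the topology that the inner product generates. We then finally define the tensor product of the Hilbert spaces as the completion of this pre-Hilbert space, and denote it by $H_1 \otimes H_2$. (2.64)

From the universal property of the algebraic tensor product mentioned above, one readily sees, for a pair of linear operators $A_i : H_i \supset \mathrm{dom}(A_i) \to H_i$, $i = 1, 2$, the existence of a unique linear map $A_1 \otimes_{\mathrm{alg}} A_2 : \mathrm{dom}(A_1) \otimes_{\mathrm{alg}} \mathrm{dom}(A_2) \to H_1 \otimes_{\mathrm{alg}} H_2$ (2.65) that makes the corresponding diagram commute. Note in particular that the diagram implies $(A_1 \otimes_{\mathrm{alg}} A_2)(v \otimes w) = A_1 v \otimes A_2 w$. Extending both the domain and the range of (2.65), we can think of it as an operator (2.68) on the Hilbert space $H_1 \otimes H_2$.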
Tensor Product of Self-Adjoint Operators. Now, for a pair of densely defined closable operators $A_i : H_i \supset \mathrm{dom}(A_i) \to H_i$, $i = 1, 2$, the operator (2.68) is itself known to be closable, whereby we define the tensor product of the pair by its closure. Specifically, since self-adjoint operators are densely defined and closed, the tensor product (2.69) is always well-defined for them. Although self-adjointness is not preserved in general by taking (2.68), its essential self-adjointness is at least known to be guaranteed. As the closure of an essentially self-adjoint operator, the tensor product (2.69) is then itself self-adjoint, which is precisely the definition of the tensor product of self-adjoint operators.
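In finite dimensions none of the domain questions above arise, and the tensor product reduces to the Kronecker product. The following sketch, under that finite-dimensional assumption, checks the factorisation of the inner product on product tensors, the action $(A_1 \otimes A_2)(v \otimes w) = A_1 v \otimes A_2 w$, and the self-adjointness of the tensor product of two Hermitian matrices. All function names and the sample vectors and matrices are hypothetical and chosen only for illustration.

```python
def kron_vec(v, w):
    """Coordinates of v (x) w in the product basis e_i (x) f_j (row-major order)."""
    return [a * b for a in v for b in w]

def kron_mat(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def inner(u, v):
    """Inner product, anti-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dagger(M):
    """Conjugate transpose."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

# Hypothetical sample data: vectors in C^2 and C^3, Hermitian matrices A1, A2.
v, v2 = [1, 2j], [0.5, -1j]
w, w2 = [1, -1, 0.5], [2j, 0, 1]
A1 = [[1, 1j], [-1j, 2]]
A2 = [[0, 1, 0], [1, 0, 2j], [0, -2j, 1]]

# <v (x) w, v' (x) w'> = <v, v'> <w, w'>
lhs_inner = inner(kron_vec(v, w), kron_vec(v2, w2))
rhs_inner = inner(v, v2) * inner(w, w2)

# (A1 (x) A2)(v (x) w) = A1 v (x) A2 w
lhs_action = apply(kron_mat(A1, A2), kron_vec(v, w))
rhs_action = kron_vec(apply(A1, v), apply(A2, w))

# Self-adjointness of A1 (x) A2 for Hermitian A1, A2
product = kron_mat(A1, A2)
is_hermitian = all(
    abs(x - y) < 1e-12
    for row, row_dag in zip(product, dagger(product))
    for x, y in zip(row, row_dag)
)
```

The closure and essential self-adjointness statements of the text are genuinely infinite-dimensional and are not captured by such a matrix computation.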

Unconditioned Measurement
Now that we have reviewed the necessary materials, we begin our study of the unconditioned measurement scheme. Suppose that the experimenter wishes to extract information on the combination of a given but unknown observable $A$ and a state $|\phi\rangle \in H$ of the target system, without direct access to it. To accomplish this, one first arranges an auxiliary meter system $K$ equipped with a pair of observables $\{Q, P\}$ for which $\{K, \{Q, P\}\}$ gives a Weyl representation of the CCR. As we have seen above, the choice $K = L^2(\mathbb{R})$, $Q = \hat{x}$ and $P = \hat{p}$ gives a concrete example. One then prepares the meter system in a certain initial state represented by the vector $|\psi\rangle \in K$, and combines the two systems into the product state $|\phi \otimes \psi\rangle \in H \otimes K$.

Fig. 1 A graphical illustration of the unconditioned measurement scheme. The figure is to be read from top to bottom. The initial state preparation stage of both the target and the meter systems is depicted in the top part, and the manner in which the two quantum systems undergo a von Neumann type interaction is illustrated in the middle part. The composite system after the interaction, which is depicted in the bottom part, generally becomes entangled. One finally performs a measurement of an observable X on the meter system.
Choosing an observable $Y$ of the meter system $K$ as either $Y = Q$ or $Y = P$, the composite system is subjected to a von Neumann type interaction, i.e., a unitary evolution $|\phi \otimes \psi\rangle \mapsto |\Psi_g\rangle := e^{-igA \otimes Y} |\phi \otimes \psi\rangle$ on the composite system parametrised by a real number $g$, which is often interpreted as the intensity of the interaction between the two systems, its time duration, or a combination thereof. Finally, the experimenter performs a local measurement of an observable $X$ of the meter system $K$, chosen as either $X = Q$ or $X = P$ (independently of $Y$), or equivalently a measurement of $I \otimes X$ on the generally entangled composite state $|\Psi_g\rangle$ after the interaction (see Figure 1). As a preparation for further analysis, we first introduce the reduced density operator $\psi_g := \mathrm{Tr}_H \left[ |\Psi_g\rangle \langle \Psi_g| \right]$ representing the state of the meter system $K$ after the interaction (footnote 7), obtained by taking the partial trace of the composite state $|\Psi_g\rangle$ with respect to the target system $H$. The quantity of interest for our measurement is thus the expectation value of the observable $I \otimes X$ on the composite state $|\Psi_g\rangle$ after the interaction, $E[I \otimes X; \Psi_g] = E[X; \psi_g]$, (2.72) which can interchangeably be written in terms of that of the local observable $X$ on the density operator $\psi_g$ of the meter system.

Footnote 7: Here we are adopting, instead of the more common usage $\rho_g$, a slightly unusual notation $\psi_g$ to denote the generically mixed state of the meter. This we do because we wish to reserve the letter $\rho$ for the density of some absolutely continuous complex measures (see Section 3.1.2). However, our notation has an advantage of its own in that, if we also write the state as $|\psi_g\rangle$ when it is pure, as we usually do, the correspondence between the two, $\psi_g$ and $|\psi_g\rangle$ (both of which represent the same state), becomes obvious.
Main Objective of this Subsection.
The main objective of this subsection is to demonstrate the following basic proposition, which provides a sufficient condition for the expectation value (2.72) to be well-defined, together with its explicit evaluation. For definiteness, we shall from now on fix $Y = P$ without loss of generality.

Proposition 2.2 (Unconditioned Measurement I).
In the context of the UM scheme, let $Y = P$ for definiteness. Given suitable choices of the initial states of both the target and the meter systems, depending on the choice of the observable $X$ of the meter system to be measured, the composite state after the interaction lies in $|\Psi_g\rangle \in \mathrm{dom}(I \otimes X)$, $g \in \mathbb{R}$. The expectation value (2.72) thus remains finite over the whole range of the interaction parameter, and reads $E[X; \psi_g] = E[Q; \psi] + g \, E[A; \phi]$ for $X = Q$, and $E[X; \psi_g] = E[P; \psi]$ for $X = P$. (2.73)

Some Operator Identities.
Before we move on to the proof, we note some important operator identities that will be used extensively throughout this paper. Our analysis is based on operator identities on the composite Hilbert space $H \otimes K$, similar to those of (2.39) and (2.40), the central one being $e^{isI \otimes Q} e^{itA \otimes P} = e^{-istA \otimes I} e^{itA \otimes P} e^{isI \otimes Q}$, $s, t \in \mathbb{R}$. (2.74) To see this, consider first the special case in which $A$ has a purely discrete spectrum, $A = \sum_n a_n \Pi_{a_n}$, where $\Pi_{a_n}$ is the projection onto the eigenspace associated with the eigenvalue $a_n$. In the case where the eigenspace is one-dimensional (or non-degenerate), one may write $\Pi_{a_n} = |a_n\rangle \langle a_n|$ with the eigenstate $|a_n\rangle$ for which $A|a_n\rangle = a_n |a_n\rangle$ holds (more on the topic of spectral decomposition in Section 3.1.5). Now, for an arbitrary self-adjoint operator $Z$ on the meter system $K$, one may expect from the defining property $\Pi_{a_n}^2 = \Pi_{a_n}$ of projections that the formal computation $e^{itA \otimes Z} = \sum_n \Pi_{a_n} \otimes e^{ita_n Z}$ is legitimate. This in fact turns out to be correct as an operator identity on $H \otimes K$ with full rigour, which can be proven in a fairly straightforward manner by means of rudimentary techniques of functional calculus. One then has $e^{isI \otimes Q} e^{itA \otimes P} = (I \otimes e^{isQ}) \sum_n \Pi_{a_n} \otimes e^{ita_n P} = \sum_n \Pi_{a_n} \otimes e^{isQ} e^{ita_n P} = \sum_n \Pi_{a_n} \otimes e^{ita_n (P - sI)} e^{isQ} = \left( \sum_n \Pi_{a_n} \otimes e^{ita_n (P - sI)} \right) (I \otimes e^{isQ}) = e^{itA \otimes (P - sI)} e^{isI \otimes Q} = e^{-istA \otimes I} e^{itA \otimes P} e^{isI \otimes Q}$, $s, t \in \mathbb{R}$, (2.78) which proves (2.74) for our special case, where we have used (2.39) in the third step. Returning to the general case in which $A$ is now an arbitrary self-adjoint operator, one observes that both the left-most and the right-most sides of the above equality remain well-defined. From this, one may expect that the same result also holds in the general case, which indeed turns out to be true (as usual, one may prove this without much difficulty through rudimentary techniques of functional calculus).

Measurement Outcomes.
We now return to the main problem of this subsection. We are interested in finding the condition under which (2.72) is well-defined, and subsequently in obtaining an explicit formula in terms of the components of both the target and the meter system. Since most of the techniques employed here are the same as those introduced in Section 2.1, we shall proceed by sketching the proofs.
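Proof of Proposition 2.2. Let us begin by choosing the operator $X = Q$ for the measurement of the meter system, and rewrite the r. h. s. of (2.74) to obtain $e^{isI \otimes Q} e^{itA \otimes P} = e^{itA \otimes P} e^{is(I \otimes Q - tA \otimes I)}$, $s, t \in \mathbb{R}$ (2.79) for better usability (footnote 8). By differentiating both sides of the above equality with respect to $s$ and taking $s = 0$, an argument analogous to that given earlier for obtaining (2.51) leads to the operator identity $(I \otimes Q) \, e^{itA \otimes P} = e^{itA \otimes P} (I \otimes Q - tA \otimes I)$ (2.80) on $\mathrm{dom}(I \otimes Q - tA \otimes I)$, that is, $(I \otimes Q) \, e^{itA \otimes P} |\Phi\rangle = e^{itA \otimes P} (I \otimes Q - tA \otimes I) |\Phi\rangle$ for $|\Phi\rangle \in \mathrm{dom}(I \otimes Q - tA \otimes I)$. (2.81) Here, it may be worthwhile to note the analogy between (2.51) and (2.80). To put this in our context, let $t = -g$ above. If one chooses $|\psi\rangle \in \mathrm{dom}(Q)$ as the meter state, and likewise assumes $|\phi\rangle \in \mathrm{dom}(A)$ as the system state prepared prior to the interaction, one has in particular $|\phi \otimes \psi\rangle \in \mathrm{dom}(I \otimes Q + gA \otimes I)$. Then, setting $|\Phi\rangle = |\phi \otimes \psi\rangle$ in (2.81), we find for the composite state $|\Psi_g\rangle = e^{-igA \otimes P} |\phi \otimes \psi\rangle$ that $(I \otimes Q) |\Psi_g\rangle = e^{-igA \otimes P} (I \otimes Q + gA \otimes I) |\phi \otimes \psi\rangle$. (2.82) This guarantees that the expectation value (2.72) of the observable $I \otimes Q$ on the composite state $|\Psi_g\rangle$ remains finite and is given by $E[I \otimes Q; \Psi_g] = E[Q; \psi] + g \, E[A; \phi]$ for any such combination of the initial states. Similarly, for the choice $X = P$, one finds the validity of the operator identity $e^{itA \otimes P} (I \otimes P) e^{-itA \otimes P} = I \otimes P$, $t \in \mathbb{R}$, (2.84) on the subspace $\mathrm{dom}(I \otimes P)$ by analogous reasoning. From this, one readily concludes that the expectation value of $I \otimes P$ reads $E[I \otimes P; \Psi_g] = E[P; \psi]$, which is well-defined for any choice of the state $|\psi\rangle \in \mathrm{dom}(P)$ of the meter system and $g \in \mathbb{R}$, irrespective of the initial choice of the state $|\phi\rangle \in H$ of the target system.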
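The linear shift of the meter position can be checked numerically in a simple setting. The sketch below assumes a target observable with finitely many non-degenerate eigenvalues, a Gaussian meter state centred at the origin (so that the initial expectation of $Q$ vanishes), and the sign convention $e^{-igA \otimes P}$ with $P = -i\,d/dx$, under which the eigencomponent belonging to $a_n$ has its meter wave-function translated by $g a_n$. The function names, the eigenvalues, the amplitudes and the grid parameters are all illustrative assumptions.

```python
import math

def gaussian_density(x, mean, sigma):
    """Position density |psi(x - mean)|^2 of a Gaussian meter state."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def meter_position_mean(g, eigenvalues, amplitudes, sigma=1.0, half_width=40.0, steps=8000):
    """Expectation of Q in the reduced meter state, by a midpoint Riemann sum.

    After the interaction the position density of the meter is the mixture
    sum_n |c_n|^2 |psi(x - g a_n)|^2 of translated copies of the initial density.
    """
    probs = [abs(c) ** 2 for c in amplitudes]
    dx = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        density = sum(p * gaussian_density(x, g * a, sigma) for p, a in zip(probs, eigenvalues))
        total += x * density * dx
    return total

# Hypothetical two-level target: eigenvalues -1, +1 with probabilities 0.2, 0.8.
eigenvalues = [-1.0, 1.0]
amplitudes = [math.sqrt(0.2), math.sqrt(0.8)]
expectation_A = sum(abs(c) ** 2 * a for c, a in zip(amplitudes, eigenvalues))

# Dividing the measured mean by g recovers E[A; phi] at every coupling strength.
recovered = {g: meter_position_mean(g, eigenvalues, amplitudes) / g for g in (0.1, 1.0, 5.0)}
```

Each entry of `recovered` should agree with `expectation_A` up to quadrature error, in line with the linear dependence on $g$ derived in the proof. The momentum expectation is not simulated here; by (2.84) it is unaffected by the interaction.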

Recovery of the Target Profile
Now that we have revealed the explicit behaviour of the measurement outcomes of the meter, we turn to recovering the information of the target system from it. As one may expect from the statement of Proposition 2.2, the information of the target system (which should essentially consist of the specification of the pair of $A$ and $|\phi\rangle$) manifests itself in the form of the expectation value $E[A; \phi]$. In recovering the desired information, one recognises from (2.73) that it fully suffices to examine only the outcomes of the measurement of the observable $X$ conjugate to $Y$, and that the choice $X = Y$ is of no use (this is to be contrasted with the conditioned measurement we discuss later). Specifically, one finds below that there are two typical techniques for obtaining the desired information: one is to investigate the behaviour of the measurement outcome (2.73) in the strong region $g \to \pm\infty$ of the interaction parameter, and the other is to examine its local behaviour around $g = 0$, which shall respectively be called the strong unconditioned measurement and the weak unconditioned measurement in this paper.
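2.3.1. Strong Unconditioned Measurement. Our result (2.73) shows that the expectation value of the measurement of $X = Q$ behaves linearly with respect to $g$, and that its growth is proportional to the expectation value $E[A; \phi]$ of the target observable. The experimenter would thus divide the measurement outcomes of $Q$ by $g$ and then take the limit of strong coupling $g \to \pm\infty$ (or equivalently $g^{-1} \to 0$): $\lim_{g \to \pm\infty} E[Q; \psi_g] / g = E[A; \phi]$.

2.3.2. Weak Unconditioned Measurement. Alternatively, since (2.73) is a polynomial of first order in $g$, one may examine its local behaviour around $g = 0$ and find $\left. \frac{d}{dg} E[Q; \psi_g] \right|_{g=0} = E[A; \phi]$, which implies that the expectation value $E[A; \phi]$ of our interest may also be obtained as the first differential coefficient ($n = 1$) of the measured outcome at $g = 0$.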
2.3.3. Discussion. While this whole section consisted of rather trivial results, the line of argument presented here serves as the baseline of our analysis throughout this paper. Namely, we first examine the full behaviour of the target of our measurement (for this section, it was the expectation value (2.72) of the observable $X$ of the meter) and intend to obtain an explicit description of how the profile of the initial configuration of the target system gets mixed into that of the meter system through the interaction (which, for the current case, is the result (2.73)). We then intend to extract the information of the target system (for this section, it is the expectation value $E[A; \phi]$) by separating it from the measurement outcomes. Specifically, we find that examining either the strong or the weak region of the interaction parameter $g$ proves useful for this purpose, and this is the strategy that we take in the subsequent sections.

Footnote 9: Alternatively, one may consider the shift of the expectation value, $E[X; \psi_g] - E[X; \psi]$, for $g \in \mathbb{R}$ as a quantity directly related to the observable $A$ of the system. For the choice $X = Q$, one then simply has $E[Q; \psi_g] - E[Q; \psi] = g \, E[A; \phi]$, which might be a more straightforward way to be employed in practice.
In the next section, the UM scheme is analysed in depth in terms of probabilities, following the same line as described above. Specifically, while the distinction between the strong and the weak measurements looked rather vague at the operator level, we shall see shortly that these two strategies are recognised to be qualitatively different from the viewpoint of probabilities.

Unconditioned Measurement II: In Terms of Probabilities
We have so far conducted an analysis of the UM scheme on the operator level, where the quantity of interest is the expectation value of an observable. However, one may be interested in the raw information that the measurement provides, i.e., the probability distribution describing the behaviour of the individual measurement outcomes, which is the target of our study in this section.

Reference Materials
To prepare for our discussion, we here provide a concise summary on the topic of complex measures and integration with respect to them. We next make a brief review on the spectral theorem for self-adjoint operators and recall the general framework for describing the ideal measurement of a quantum observable. Subsequently, we expound on density functions and see how this relates to the description by measures.
3.1.1. The Space of Complex Measures. As a preparation in dealing with the spectral theorem for self-adjoint operators, we collect below the basic definitions and results regarding complex measures and integration with respect to them.

Signed Measures, Jordan Decomposition and Total Variation.
Let $(X, \mathcal{A})$ be a measurable space. A map $\nu : \mathcal{A} \to \overline{\mathbb{R}}$ is called a signed measure if it satisfies the following properties: (i) $\nu(\emptyset) = 0$; (ii) $\nu$ attains at most one of the values $+\infty$ and $-\infty$; (iii) countable additivity (2.1) holds for any sequence $(A_n)_{n \geq 1}$ of pairwise disjoint measurable sets $A_n \in \mathcal{A}$.
Signed measures are, in a sense, generalisations of the concept of standard measures obtained by allowing negative numbers to be assigned to measurable sets. A signed measure $\nu$ is called finite if $\nu(\mathcal{A}) \subset \mathbb{R}$. One of the most important properties of a signed measure is described by the Jordan decomposition theorem, which states that every signed measure $\nu$ has a Jordan decomposition, i.e., a unique decomposition $\nu = \nu^{+} - \nu^{-}$ of $\nu$ into a difference of two measures $\nu^{+}$ and $\nu^{-}$, respectively called the positive and negative variation of $\nu$, at least one of which is finite. Here, the positive and negative variations are singular to one another, denoted as $\nu^{+} \perp \nu^{-}$, in the sense that there exists a decomposition $X = P \cup N$ of $X$ into two disjoint measurable sets such that $\nu^{+}(N) = 0$ and $\nu^{-}(P) = 0$ hold. The Jordan decomposition is minimal in the following sense: given any decomposition $\nu = \rho - \sigma$ of $\nu$ into two measures $\rho, \sigma$, at least one of which is finite, then $\nu^{+} \leq \rho$, $\nu^{-} \leq \sigma$ holds.
Let $M(\mathcal{A})$ denote the space of all finite signed measures on $(X, \mathcal{A})$. For $\nu \in M(\mathcal{A})$, the sum $|\nu| := \nu^{+} + \nu^{-}$ defines a measure called the variation of $\nu$. We then define its total variation by $\|\nu\| := |\nu|(X)$, which is nothing but the value that the non-negative measure $|\nu|$ assigns to the whole space $X$. One proves that the total variation defines a norm on $M(\mathcal{A})$, and in fact makes $(M(\mathcal{A}), \|\cdot\|)$ into a real Banach space.
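On a finite set a signed measure is determined by the real weights it assigns to single points, and the Jordan decomposition simply separates the positive from the negative weights. The sketch below illustrates this special case only; the dictionary representation, the function names and the sample weights are hypothetical choices made for the illustration.

```python
def jordan_decomposition(nu):
    """Split a finite signed measure, given as {point: weight}, into (nu_plus, nu_minus)."""
    nu_plus = {x: max(w, 0.0) for x, w in nu.items()}
    nu_minus = {x: max(-w, 0.0) for x, w in nu.items()}
    return nu_plus, nu_minus

def measure_of(m, subset):
    """Value m(A) of a finite measure on a subset A of the underlying points."""
    return sum(m[x] for x in subset)

def total_variation(nu):
    """||nu|| = |nu|(X) with the variation |nu| = nu_plus + nu_minus."""
    nu_plus, nu_minus = jordan_decomposition(nu)
    return sum(nu_plus.values()) + sum(nu_minus.values())

# Hypothetical signed measure on the four-point space {a, b, c, d}.
nu = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}
nu_plus, nu_minus = jordan_decomposition(nu)

# Decomposition X = P u N witnessing the mutual singularity of nu_plus and nu_minus.
positive_set = {x for x, w in nu.items() if w >= 0}
negative_set = set(nu) - positive_set

# Any other decomposition nu = rho - sigma dominates the Jordan one (minimality);
# here rho and sigma are obtained by adding the same extra mass kappa to both parts.
kappa = {"a": 0.0, "b": 1.0, "c": 0.25, "d": 0.0}
rho = {x: nu_plus[x] + kappa[x] for x in nu}
sigma = {x: nu_minus[x] + kappa[x] for x in nu}
```

For the sample weights, `nu_plus` lives on `positive_set`, `nu_minus` on `negative_set`, and `total_variation(nu)` sums the absolute values of all weights.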

Complex Measures.
Let $(X, \mathcal{A})$ be a measurable space. A map $\nu : \mathcal{A} \to \mathbb{C}$ is called a complex measure when it is countably additive (2.1). One sees that $\nu$ is a complex measure if and only if both its real and imaginary parts $\mathrm{Re}[\nu]$, $\mathrm{Im}[\nu]$ are finite signed measures. Analogous to the case of signed measures, the collection $M_{\mathbb{C}}(\mathcal{A})$ of all complex measures on $(X, \mathcal{A})$ becomes a $\mathbb{C}$-linear space, equipped with the natural addition and scalar multiplication. For a complex measure $\nu \in M_{\mathbb{C}}(\mathcal{A})$, we define the variation of a measurable set $A \in \mathcal{A}$ by $|\nu|(A) := \sup \left\{ \sum_{k=1}^{n} |\nu(A_k)| : A_1, \dots, A_n \in \mathcal{A} \text{ pairwise disjoint}, \ \bigcup_{k=1}^{n} A_k = A \right\}$, and also its total variation, $\|\nu\| := |\nu|(X)$. (3.4) The definition coincides with the previous definition when $\nu$ happens to be a signed measure. The variation $|\nu|$ of $\nu$ is known to be the smallest positive measure $\mu$ on $(X, \mathcal{A})$ satisfying $|\nu(A)| \leq \mu(A)$, $A \in \mathcal{A}$. In parallel to the case of signed measures, one finds that the total variation defines a norm on the linear space $M_{\mathbb{C}}(\mathcal{A})$ and makes $(M_{\mathbb{C}}(\mathcal{A}), \|\cdot\|)$ into a complex Banach space.

Integration over Complex Measures.
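It is now natural to define integration with respect to complex measures as an extension of that defined for (standard) measures. For a complex measure $\nu \in M_{\mathbb{C}}(\mathcal{A})$, we let $\rho := \mathrm{Re}[\nu]$, $\sigma := \mathrm{Im}[\nu]$ and consider the intersection of the spaces $\mathcal{L}^1(\nu) := \mathcal{L}^1(\rho^{+}) \cap \mathcal{L}^1(\rho^{-}) \cap \mathcal{L}^1(\sigma^{+}) \cap \mathcal{L}^1(\sigma^{-})$, (3.5) where $\rho^{\pm}$ and $\sigma^{\pm}$ are respectively the positive and negative variations of $\rho$ and $\sigma$. We then define the Lebesgue integral of $f \in \mathcal{L}^1(\nu)$ with respect to $\nu$ by $\int_X f \, d\nu := \int_X f \, d\rho^{+} - \int_X f \, d\rho^{-} + i \left( \int_X f \, d\sigma^{+} - \int_X f \, d\sigma^{-} \right)$. Linearity of the Lebesgue integral with respect to the complex measure follows naturally, as expected.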
New Measure from Old. There are several ways to construct a new (complex) measure from a given measure. We mention below two of the most important manners that are frequently employed throughout this paper.
(A) Measure with Density.
Let $(X, \mathcal{A}, \mu)$ be a measure space. Given a $\mu$-integrable function $f : X \to \mathbb{C}$, one may define a complex measure by $\nu(A) := \int_A f \, d\mu$, $A \in \mathcal{A}$. (3.7) The complex measure constructed in this manner is occasionally called the complex measure with the density $f$ with respect to $\mu$, and we write it as $\nu = f \odot \mu$. A measurable function $g : X \to \mathbb{K}$ is known to be $(f \odot \mu)$-integrable if and only if the product $g \cdot f$ is $\mu$-integrable, in which case the equality $\int_X g \, d(f \odot \mu) = \int_X g \cdot f \, d\mu$ (3.8) holds.

(B) Image Measure. Let $(X, \mathcal{A}, \mu)$ be a measure space. Given another measurable space $(Y, \mathcal{B})$ and a measurable map $f : X \to Y$, one may construct a new measure on $(Y, \mathcal{B})$ by $f(\mu)(B) := \mu(f^{-1}(B))$, $B \in \mathcal{B}$, called the image measure (push-forward measure) of $\mu$ with respect to $f$. A measurable function $g : Y \to \mathbb{K}$ is known to be $f(\mu)$-integrable if and only if the composition $g \circ f$ is $\mu$-integrable, in which case the change of variables formula $\int_Y g \, d f(\mu) = \int_X g \circ f \, d\mu$ holds.
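Both constructions become transparent on a finite measure space, where integrals are weighted sums. The sketch below, with hypothetical helper names and sample data, builds a complex measure with a given density and an image measure, and evaluates both sides of the two integration formulas stated above.

```python
def integrate(g, mu):
    """Integral of g with respect to a finite (possibly complex) measure {point: weight}."""
    return sum(g(x) * w for x, w in mu.items())

def with_density(f, mu):
    """The measure f (.) mu, which assigns f(x) * mu({x}) to each point x."""
    return {x: f(x) * w for x, w in mu.items()}

def image_measure(f, mu):
    """Push-forward of mu under f: the mass of every preimage point is moved to f(x)."""
    out = {}
    for x, w in mu.items():
        y = f(x)
        out[y] = out.get(y, 0) + w
    return out

# Hypothetical base measure on five integer points.
mu = {-2: 0.5, -1: 1.0, 0: 2.0, 1: 1.5, 2: 0.25}

# (A) complex density and a test integrand
density = lambda x: complex(x, 1.0)
g = lambda x: x ** 2 + 1
lhs_density = integrate(g, with_density(density, mu))
rhs_density = integrate(lambda x: g(x) * density(x), mu)

# (B) image measure under the (non-injective) squaring map and a test integrand on the image
square = lambda x: x * x
h = lambda y: 3 * y - 1
pushed = image_measure(square, mu)
lhs_image = integrate(h, pushed)
rhs_image = integrate(lambda x: h(square(x)), mu)
```

The squaring map is deliberately non-injective, so `pushed` merges the masses of $\pm 1$ and of $\pm 2$; the total mass is nevertheless preserved.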

Measure Algebra.
The space of complex measures has an additional well-known structure regarding convolutions. The convolution of two complex measures $\mu, \nu \in M_{\mathbb{C}}(\mathcal{B}^n)$ is defined by $(\mu * \nu)(B) := \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \chi_B(x + y) \, d\mu(x) \, d\nu(y)$, $B \in \mathcal{B}^n$. (3.11) One can easily confirm that the convolution is a bilinear operation, and it is moreover shown to be associative, $\mu * (\nu * \rho) = (\mu * \nu) * \rho$, and commutative, $\mu * \nu = \nu * \mu$. Together with the estimate $\|\mu * \nu\| \leq \|\mu\| \, \|\nu\|$ based on the total variation norm (3.4), one sees that the convolution makes the complex Banach space $M_{\mathbb{C}}(\mathcal{B}^n)$ into a complex commutative Banach algebra, called the measure algebra of $\mathcal{B}^n$. The measure algebra $M_{\mathbb{C}}(\mathcal{B}^n)$ has a multiplicative identity $e$ given by the delta measure $e = \delta_0$ centred at the origin, that is, $\delta_0 * \mu = \mu * \delta_0 = \mu$ holds for all $\mu \in M_{\mathbb{C}}(\mathcal{B}^n)$. Here, the delta measure (or the Dirac measure) $\delta_a$ is a finite measure centred at $a \in \mathbb{R}^n$ defined by $\delta_a(B) := 1$ if $a \in B$ and $\delta_a(B) := 0$ otherwise, (3.13) and characterised by the integral $\int_{\mathbb{R}^n} f \, d\delta_a = f(a)$ whenever the integration is well-defined. It is essentially the same object as the delta distribution that appears in the theory of generalised functions.
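For complex measures supported on finitely many points, the convolution is again finitely supported: the mass $\mu(\{x\})\,\nu(\{y\})$ is placed at $x + y$. The following sketch, restricted to this discrete special case and using hypothetical names and sample data, illustrates the identity property of $\delta_0$, commutativity, associativity and the norm estimate.

```python
def convolve(mu, nu):
    """Convolution of two finitely supported complex measures given as {point: weight}."""
    out = {}
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] = out.get(x + y, 0) + a * b
    return out

def total_variation_norm(mu):
    """Total variation norm of a finitely supported complex measure."""
    return sum(abs(w) for w in mu.values())

def difference(mu, nu):
    """The measure mu - nu, used to compare two measures in the total variation norm."""
    keys = set(mu) | set(nu)
    return {k: mu.get(k, 0) - nu.get(k, 0) for k in keys}

# Delta measure at the origin and three hypothetical measures on integer points.
delta0 = {0: 1}
mu = {0: 1 + 1j, 1: -2, 3: 0.5j}
nu = {-1: 1j, 2: 3}
rho = {0: 0.5, 1: 0.5}

identity_error = total_variation_norm(difference(convolve(delta0, mu), mu))
commutator_error = total_variation_norm(difference(convolve(mu, nu), convolve(nu, mu)))
associator_error = total_variation_norm(
    difference(convolve(mu, convolve(nu, rho)), convolve(convolve(mu, nu), rho))
)
norm_product = total_variation_norm(mu) * total_variation_norm(nu)
norm_of_convolution = total_variation_norm(convolve(mu, nu))
```

Measures with densities cannot be represented exactly in this way, which is precisely the distinction between the measure algebra and its absolutely continuous part discussed next.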
3.1.2. The Space of Density Functions. For later use, we are particularly interested in a special subspace of the space $M_{\mathbb{C}}(\mathcal{B}^n)$ of complex measures, namely, the space of complex measures that are absolutely continuous with respect to the Lebesgue-Borel measure $\beta^n$. We shall provide a concise review of its definition, comment on its relation to the space of complex density functions, and see that the subspace is in fact a sub-algebra of the measure algebra.
Absolute Continuity and Density Functions.
Let $\mu$ and $\nu$ be signed (or complex) measures on a measurable space $(X, \mathcal{A})$. We say that $\nu$ is $\mu$-continuous, or absolutely continuous with respect to $\mu$, written as $\nu \ll \mu$, if $\mu(A) = 0$ implies $\nu(A) = 0$ for all $A \in \mathcal{A}$. A signed measure $\mu$ is called $\sigma$-finite if there exists a sequence $(A_n)_{n \geq 1}$ of disjoint measurable sets $A_n \in \mathcal{A}$ satisfying $X = \bigcup_{n=1}^{\infty} A_n$ and $|\mu(A_n)| < \infty$ ($n \in \mathbb{N}^{\times}$). By definition, finite measures are always $\sigma$-finite. The Lebesgue-Borel measure $\beta^n$ is among the most important examples of $\sigma$-finite measures. The following theorem is of great importance.
Theorem (Radon-Nikodým Theorem for Complex Measures). Let µ be a σ-finite measure and ν ≪ µ be a complex measure. Then, ν has a density with respect to µ, that is, there exists a µ-integrable function ρ : X → C such that ν = ρ ⊙ µ, and ρ is unique µ-a.e. If ν happens to be positive, then one may choose ρ ≥ 0.
In the above situation of the Radon-Nikodým theorem, the function $\rho$ satisfying $\nu = \rho \odot \mu$ is called the Radon-Nikodým derivative (or more casually, the density), and is denoted by $\frac{d\nu}{d\mu} := \rho$. This is nothing but to say that $\nu(A) = \int_A \frac{d\nu}{d\mu} \, d\mu$, $A \in \mathcal{A}$, holds, if explicitly written out. For a $\nu$-integrable function $f$, a direct application of (3.8) leads to $\int_X f \, d\nu = \int_X f \, \frac{d\nu}{d\mu} \, d\mu$, (3.17) in which the notation for the Radon-Nikodým derivative (which might at first seem strange) reveals its advantage.
Absolute Continuity with respect to the Lebesgue-Borel Measure.
We are particularly interested in the sub-family $L^1(\mathcal{B}^n) \subset M_{\mathbb{C}}(\mathcal{B}^n)$ consisting of complex measures that are absolutely continuous with respect to the Lebesgue-Borel measure $\beta^n$ on $(\mathbb{R}^n, \mathcal{B}^n)$. Whenever there is no risk of confusion, members of $L^1(\mathcal{B}^n)$ shall occasionally be referred to simply as absolutely continuous measures, without reference to the base measure $\beta^n$. One readily finds that the collection $L^1(\mathcal{B}^n)$ forms a linear subspace of $M_{\mathbb{C}}(\mathcal{B}^n)$. Now, uniqueness $\beta^n$-a.e. of the Radon-Nikodým derivative allows us to define a linear map $L^1(\mathcal{B}^n) \to L^1(\mathbb{R}^n)$, $\nu \mapsto \frac{d\nu}{d\beta^n}$, (3.18) which maps an absolutely continuous complex measure to its density. Conversely, one may construct a new complex measure from a given integrable function $f \in L^1(\mathbb{R}^n)$ by $\nu := f \odot \beta^n$. From this, one obtains a bijective linear map between the space of absolutely continuous complex measures $L^1(\mathcal{B}^n)$ and the space of integrable functions $L^1(\mathbb{R}^n)$, associating an absolutely continuous complex measure $\nu \in L^1(\mathcal{B}^n)$ with its density $d\nu/d\beta^n \in L^1(\mathbb{R}^n)$. In this manner, one may identify a specific subspace of the space of complex measures with that of integrable functions as $L^1(\mathcal{B}^n) \cong L^1(\mathbb{R}^n)$, (3.19) and may translate and interpret various properties of complex measures in terms of density functions. To discuss how this works, let $d\nu/d\beta^n \in L^1(\mathbb{R}^n)$ be the density of $\nu \in L^1(\mathcal{B}^n)$ with respect to the Lebesgue-Borel measure. One confirms from (3.17) that, for any measurable function $g$, the equality $\int_{\mathbb{R}^n} g \, d\nu = \int_{\mathbb{R}^n} g(x) \, \frac{d\nu}{d\beta^n}(x) \, d\beta^n(x)$ (3.20) holds whenever the integration exists. In this manner, one may replace the Lebesgue integration of $g$ with respect to the complex measure $\nu$ (the l. h. s.) by that with respect to the Lebesgue-Borel measure with the help of the (possibly more familiar notion of) density function $d\nu/d\beta^n$ (the r. h. s.).

Convolution Algebra.
The space $L^1(\mathcal{B}^n)$ of absolutely continuous complex measures is readily shown to be a topologically closed subset (with respect to the topology induced by the total variation norm $\|\cdot\|$ in (3.4)) of the Banach space $M_{\mathbb{C}}(\mathcal{B}^n)$. This implies that the subspace $L^1(\mathcal{B}^n)$ is itself a Banach space. One then finds that the linear bijection (3.18) between the two Banach spaces actually defines an isometric (linear) isomorphism, which is to say that $\|\nu\| = \left\| \frac{d\nu}{d\beta^n} \right\|_1$ holds for all $\nu \in L^1(\mathcal{B}^n)$, where the l. h. s. is the total variation norm (3.4) of the complex measure $\nu$ and the r. h. s. is the $L^1$-norm (2.19) of its density function. We next see how this bijection interacts with convolution. To this end, we first recall that a linear subspace $I$ of a commutative algebra $A$ is called an ideal if it 'absorbs' multiplication by elements of $A$, i.e., $i \in I, \ a \in A \Rightarrow i \cdot a = a \cdot i \in I$. (3.22) In fact, it is known that the subspace $L^1(\mathcal{B}^n)$ forms an ideal of the measure algebra $M_{\mathbb{C}}(\mathcal{B}^n)$, which is to say that $\mu \in L^1(\mathcal{B}^n), \ \nu \in M_{\mathbb{C}}(\mathcal{B}^n) \Rightarrow \mu * \nu \in L^1(\mathcal{B}^n)$. In passing, the density of the convolution $\mu * \nu$ above is given by the convolution of the density of $\mu$ and the complex measure $\nu$ as $\frac{d(\mu * \nu)}{d\beta^n} = \frac{d\mu}{d\beta^n} * \nu$, (3.24) in which we understand the convolution of an integrable function $f \in L^1(\mathbb{R}^n)$ and a complex measure $\mu \in M_{\mathbb{C}}(\mathcal{B}^n)$ to be $(f * \mu)(x) := \int_{\mathbb{R}^n} f(x - y) \, d\mu(y)$, where the integral is well-defined $\beta^n$-a.e. for $x \in \mathbb{R}^n$. In particular, being an ideal trivially implies that the space $L^1(\mathcal{B}^n)$ of absolutely continuous complex measures is closed under the operation of convolution, i.e., it forms a sub-algebra of the measure algebra $M_{\mathbb{C}}(\mathcal{B}^n)$. Applying (3.17) to (3.24), one concludes that the density of the convolution of two absolutely continuous complex measures $\mu, \nu \in L^1(\mathcal{B}^n)$ is given by the convolution of their densities as $\frac{d(\mu * \nu)}{d\beta^n} = \frac{d\mu}{d\beta^n} * \frac{d\nu}{d\beta^n}$, (3.26) in which we understand the familiar convolution of two integrable functions $f, g \in L^1(\mathbb{R}^n)$ to be $(f * g)(x) := \int_{\mathbb{R}^n} f(x - y) \, g(y) \, d\beta^n(y)$, (3.27) where the integral is well-defined $\beta^n$-a.e. for $x \in \mathbb{R}^n$.
Equality (3.26) implies that, equipped with the convolution (3.27), the space L 1 (R n ) of integrable functions becomes a Banach algebra that is isomorphically mapped to the sub-algebra L 1 (B n ) ⊂ M C (B n ) by the isometric algebra isomorphism (3.18). Incidentally, the sub-algebra L 1 (B n ) ∼ = L 1 (R n ) of the measure algebra is given its own name, and is occasionally called the convolution algebra.
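For illustration, the Banach-algebra structure can be checked numerically on a grid. The following is a minimal Python sketch, not part of the original argument: the Riemann-sum discretisation, the grid step `h` and the two sample functions `f` and `g` are assumptions chosen for the example. It verifies that the discretised convolution (3.27) is commutative and that the $L^1$-norm is submultiplicative, with equality for non-negative functions.

```python
def convolve(f, g, h):
    """Riemann-sum version of (f * g)(x) = int f(x - y) g(y) dy on a grid of step h."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * h
    return out


def l1_norm(f, h):
    """Riemann-sum version of the L1-norm of a sampled function."""
    return h * sum(abs(v) for v in f)


h = 0.01                             # grid step (assumed for the example)
f = [1.0] * 100                      # box density on [0, 1)
g = [0.5] * 100 + [-0.25] * 100      # signed step function on [0, 2)

fg = convolve(f, g, h)
gf = convolve(g, f, h)

# Submultiplicativity of the norm: ||f * g|| <= ||f|| ||g||
print(l1_norm(fg, h) <= l1_norm(f, h) * l1_norm(g, h) + 1e-12)
# Commutativity of the convolution, up to rounding
print(max(abs(a - b) for a, b in zip(fg, gf)) < 1e-12)
```

The cancellation between the positive and negative parts of `g` is what makes the inequality strict in this example.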
At this point, we note that the convolution algebra $L^1(\mathcal{B}^n)$ is a proper sub-algebra of the measure algebra $M_{\mathbb{C}}(\mathcal{B}^n)$ in general, i.e., not every complex measure may be represented by integrable functions. This can be readily seen by observing that the delta measure $\delta_a$ centred at $a \in \mathbb{R}^n$ (3.13) does not admit a description by density functions. Intuitively, such a density function, if it existed, would be given by the 'delta function' centred at $a$, but this is actually a distribution and not a member of $L^1(\mathbb{R}^n)$ as required. This leads to the basic fact that the convolution algebra $L^1(\mathbb{R}^n)$ is non-unital, i.e., it lacks a multiplicative identity in the sense that there is no element $e \in L^1(\mathbb{R}^n)$ for which
$$e * f = f * e = f \tag{3.28}$$
holds for all $f \in L^1(\mathbb{R}^n)$. This should be contrasted to the measure algebra $M_{\mathbb{C}}(\mathcal{B}^n)$, which always possesses a multiplicative identity.

Given two measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, we understand $\mathcal{A} \otimes \mathcal{B} := \sigma(\{A \times B : A \in \mathcal{A}, B \in \mathcal{B}\})$ to be the product-σ-algebra of $\mathcal{A}$ and $\mathcal{B}$. The following fact and definition are of importance.
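Definition (Product Measure). Given two measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$, let both $\mu$ and $\nu$ be σ-finite. Then there exists a unique measure $\mu \otimes \nu : \mathcal{A} \otimes \mathcal{B} \to [0, \infty]$ on the product-σ-algebra for which
$$(\mu \otimes \nu)(A \times B) = \mu(A)\,\nu(B), \quad A \in \mathcal{A}, \ B \in \mathcal{B},$$
holds. The measure $\mu \otimes \nu$ is σ-finite and is called the product measure of $\mu$ and $\nu$.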
Integration with respect to the product measure $\mu \otimes \nu$ of two σ-finite measures $\mu$ and $\nu$ can be performed by iterated integration over each of the respective variables. This is the essence of Fubini's Theorem, one of the most frequently used theorems of integration theory.
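Theorem (Fubini's Theorem). Let $\mu$ and $\nu$ be σ-finite. Then, the following statements hold:
(i) If $f : X \times Y \to \mathbb{C}$ is $\mu \otimes \nu$-integrable, then $f(x, \cdot)$ is $\nu$-integrable for $\mu$-a.e. $x \in X$ and $f(\cdot, y)$ is $\mu$-integrable for $\nu$-a.e. $y \in Y$; that is, the exceptional sets $A \subset X$ and $B \subset Y$ on which this fails are null sets. The functions $x \mapsto \int_Y f(x, y)\,d\nu(y)$ and $y \mapsto \int_X f(x, y)\,d\mu(x)$ are respectively $\mu$-integrable on $A^c$ and $\nu$-integrable on $B^c$, and the equalities
$$\int_{X \times Y} f \, d(\mu \otimes \nu) = \int_{A^c} \left( \int_Y f(x, y)\,d\nu(y) \right) d\mu(x) = \int_{B^c} \left( \int_X f(x, y)\,d\mu(x) \right) d\nu(y)$$
hold.
(ii) If $f : X \times Y \to \mathbb{C}$ is measurable and one of the integrals
$$\int_{X \times Y} |f| \, d(\mu \otimes \nu), \quad \int_X \left( \int_Y |f(x, y)|\,d\nu(y) \right) d\mu(x), \quad \int_Y \left( \int_X |f(x, y)|\,d\mu(x) \right) d\nu(y) \tag{3.36}$$
is finite, then all three of them are finite and agree, $f$ is $\mu \otimes \nu$-integrable, and the statements under (i) hold.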
3.1.4. Measure on Topological Spaces. Let $X$ be a metric space (or a topological space). One may naturally be interested in how the topology relates to the complex measures defined on the Borel σ-algebra $\mathcal{B} := \mathcal{B}(X)$ generated by it. To this end, we briefly review one of the prominent results in this realm, namely the famous Riesz-Markov-Kakutani Representation Theorem. In order to avoid complexity, we shall only deal with the case where the given measurable space is $(\mathbb{R}^n, \mathcal{B}^n)$. Observing now that a complex measure $\nu \in M_{\mathbb{C}}(\mathcal{B}^n)$ generates an (algebraic) linear map $f \mapsto \int_{\mathbb{R}^n} f \, d\nu$ that maps a function to a complex number, our interest lies in the converse question, namely: what class of linear functionals admits representation by integration with respect to some complex measure?

Riesz-Markov-Kakutani Representation Theorem.
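Let $C_0(\mathbb{R}^n)$ be the space of all continuous functions $f : \mathbb{R}^n \to \mathbb{C}$ that vanish at infinity, in the sense that for every $\epsilon > 0$ there exists a compact subset $K \subset \mathbb{R}^n$ for which $|f(x)| < \epsilon$ holds for all $x \in K^c$. The space $C_0(\mathbb{R}^n)$ equipped with the supremum norm $\|f\|_\infty := \sup\{|f(x)| : x \in \mathbb{R}^n\}$ is known to be a Banach space. Now for each $\nu \in M_{\mathbb{C}}(\mathcal{B}^n)$, the map
$$C_0(\mathbb{R}^n) \to \mathbb{C}, \quad f \mapsto \int_{\mathbb{R}^n} f \, d\nu$$
gives rise to a continuous (i.e., bounded) $\mathbb{C}$-linear functional from $C_0(\mathbb{R}^n)$ to $\mathbb{C}$, for indeed the evaluation
$$\left| \int_{\mathbb{R}^n} f \, d\nu \right| \leq \|f\|_\infty \, \|\nu\|$$
holds. The Riesz-Markov-Kakutani representation theorem is a classical theorem in measure and integration theory stating that the converse is also true, which is to say that, for any continuous $\mathbb{C}$-linear functional $I \in C_0'(\mathbb{R}^n)$, there exists a unique complex measure $\nu \in M_{\mathbb{C}}(\mathcal{B}^n)$ for which
$$I(f) = \int_{\mathbb{R}^n} f \, d\nu, \quad f \in C_0(\mathbb{R}^n),$$
holds. The precise statement is given as follows.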
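Theorem (Riesz-Markov-Kakutani Representation Theorem for Euclidean Spaces). The correspondence
$$M_{\mathbb{C}}(\mathcal{B}^n) \to C_0'(\mathbb{R}^n), \quad \nu \mapsto I_\nu, \quad I_\nu(f) := \int_{\mathbb{R}^n} f \, d\nu,$$
that maps a complex measure to a continuous linear functional on $C_0(\mathbb{R}^n)$ is a bijection, which moreover satisfies
$$\|I_\nu\| = \|\nu\|. \tag{3.41}$$
In other words, the space of complex measures $M_{\mathbb{C}}(\mathcal{B}^n)$ is isomorphic to the topological dual of $C_0(\mathbb{R}^n)$, and the two can be mapped to each other by an isometric isomorphism.
Here, the norm on $M_{\mathbb{C}}(\mathcal{B}^n)$ on the r. h. s. of (3.41) is naturally the total variation norm, and the norm on the topological dual $C_0'(\mathbb{R}^n)$ (the l. h. s.) is the operator norm defined by
$$\|I\| := \sup\{ |I(f)| : f \in C_0(\mathbb{R}^n), \ \|f\|_\infty \leq 1 \}. \tag{3.42}$$
In this sense we identify
$$C_0'(\mathbb{R}^n) \cong M_{\mathbb{C}}(\mathcal{B}^n), \tag{3.43}$$
and may interchangeably interpret a continuous $\mathbb{C}$-linear functional on the space $C_0(\mathbb{R}^n)$ as a complex measure on the measurable space $(\mathbb{R}^n, \mathcal{B}^n)$, and vice versa.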
3.1.5. Spectral Theorem and its Consequences. We next provide a concise review of some of the basic facts regarding the spectral theorem for self-adjoint operators, which generalises the familiar eigendecomposition theorem for Hermitian matrices on finite-dimensional vector spaces to the case of arbitrary dimension. In order to avoid confusion with operators, Borel sets on $\mathbb{R}^n$ shall occasionally be denoted by $\Delta \in \mathcal{B}^n$ in place of $B$, especially when we are working in the context of quantum mechanics.

Spectral Measures.
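Closely associated to the notion of complex measures is that of spectral measures on a Hilbert space $H$. Let $L(H)$ denote the space of all bounded operators on $H$, and recall that a map
$$E : \mathcal{B}^n \to L(H) \tag{3.44}$$
is called an $n$-dimensional spectral measure (or projection-valued measure), if each $E(\Delta)$, $\Delta \in \mathcal{B}^n$, is an orthogonal projection on $H$ and satisfies
$$E(\mathbb{R}^n) = I, \qquad E\Big( \bigcup_{k=1}^{\infty} \Delta_k \Big) = \sum_{k=1}^{\infty} E(\Delta_k) \tag{3.45}$$
for every sequence of mutually disjoint Borel sets $\Delta_k \in \mathcal{B}^n$, where the sum is understood to converge strongly. The support of a spectral measure $E$ on $\mathcal{B}^n$ is defined as the smallest closed set $\Delta \in \mathcal{B}^n$ that satisfies $E(\Delta) = I$. An important point is that a spectral measure $E$ and a pair of vectors $|\phi'\rangle, |\phi\rangle \in H$ induce a complex measure on $\mathcal{B}^n$ given by
$$\Delta \mapsto \langle \phi', E(\Delta)\phi \rangle. \tag{3.46}$$

Spectral Theorem of Self-adjoint Operators.
Having recalled the necessary definitions, we now state the spectral theorem for self-adjoint operators, which constitutes one of the most important mathematical ingredients in quantum mechanics.
Theorem (Spectral decomposition of self-adjoint operators). Let $A : H \supset \mathrm{dom}(A) \to H$ be self-adjoint. Then there exists a unique one-dimensional spectral measure $E_A$ supported on the spectrum $\sigma(A) \subset \mathbb{R}$ of $A$ satisfying
$$\langle \phi', A\phi \rangle = \int_{\mathbb{R}} a \, d\langle \phi', E_A(a)\phi \rangle, \quad |\phi\rangle \in \mathrm{dom}(A), \ |\phi'\rangle \in H, \tag{3.47}$$
where the r. h. s. of the equality is understood as the Lebesgue integral with respect to the complex measure $\Delta \mapsto \langle \phi', E_A(\Delta)\phi \rangle$ induced from $E_A$ and the pair of vectors $|\phi\rangle$ and $|\phi'\rangle$.
Under the situation above, the self-adjoint operator $A$ is occasionally written symbolically as
$$A = \int_{\mathbb{R}} a \, dE_A(a) \tag{3.48}$$
in terms of integration with respect to its spectral measure.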

Finite-dimensional Case.
To see the meaning of the above formula, we make a brief note on how the familiar eigendecomposition theorem for Hermitian matrices appears as a special case of the general statement. Let $A$ be a Hermitian matrix on an $N$-dimensional complex Hilbert space $H := \mathbb{C}^N$, $N \in \mathbb{N}^\times$. The eigendecomposition theorem states that there exists an orthonormal basis $\{|a_1\rangle, \dots, |a_N\rangle\}$ of $H$ with real numbers $a_1, \dots, a_N \in \mathbb{R}$ such that
$$A|a_i\rangle = a_i|a_i\rangle, \quad i = 1, \dots, N, \tag{3.49}$$
hold. For each eigenvalue $a \in \sigma(A) = \{a_1, \dots, a_N\}$ of $A$, we have the projection $\Pi_a$ onto the subspace
$$H_a := \{ |\phi\rangle \in H : A|\phi\rangle = a|\phi\rangle \} \tag{3.50}$$
spanned by the collection of all eigenvectors associated with $a$. As we noted before, when the eigenvalue $a$ is non-degenerate, i.e., when the subspace $H_a$ is one-dimensional, we may write $\Pi_a = |a\rangle\langle a|$. With the projection $\Pi_a$ in hand, the spectral measure of $A$ is defined by
$$E_A(\Delta) := \sum_{a \in \Delta \cap \sigma(A)} \Pi_a, \quad \Delta \in \mathcal{B}, \tag{3.51}$$
with the convention $\sum_{a \in \emptyset} \Pi_a := 0$. One readily verifies that $E_A$ is indeed a spectral measure supported on the spectrum $\sigma(A)$, and subsequently sees that the projection $\Pi_a = E_A(\{a\})$ is nothing but the image of the spectral measure $E_A$ on the Borel set $\{a\} \in \mathcal{B}$ consisting of a single eigenvalue $a \in \sigma(A)$ of the observable $A$. One then finds
$$A = \sum_{a \in \sigma(A)} a \, E_A(\{a\}) \tag{3.52}$$
in accordance with (2.76), and subsequently proves
$$\langle \phi', A\phi \rangle = \sum_{a \in \sigma(A)} a \, \langle \phi', E_A(\{a\})\phi \rangle. \tag{3.53}$$
The spectral decomposition formula (3.47) and the formal expression (3.48) are respectively just the generalisations of the finite-dimensional versions (3.53) and (3.52).
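The finite-dimensional construction can be made concrete with a small example. The following Python sketch is an illustration under stated assumptions rather than anything taken from the paper: the observable is assumed to be the $2 \times 2$ matrix with rows $(0, 1)$ and $(1, 0)$, whose eigenvalues are $\pm 1$, and the helper names (`outer`, `spectral_measure`, `reconstruct`) are hypothetical. A Borel set $\Delta$ is represented by its indicator function.

```python
import math


def outer(v):
    """Rank-one projection |v><v| for a normalised real vector v."""
    return [[a * b for b in v] for a in v]


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def matmul(m1, m2):
    n = len(m2)
    return [[sum(m1[i][k] * m2[k][j] for k in range(n)) for j in range(len(m2[0]))]
            for i in range(len(m1))]


def close(m1, m2, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


s = 1.0 / math.sqrt(2.0)
# Assumed example: A = [[0, 1], [1, 0]] with eigenvalues +1 and -1.
PROJECTIONS = {+1.0: outer([s, s]), -1.0: outer([s, -s])}
ZERO = [[0.0, 0.0], [0.0, 0.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def spectral_measure(indicator):
    """E_A(Delta): sum of the projections Pi_a over the eigenvalues a lying in Delta."""
    total = ZERO
    for a, proj in PROJECTIONS.items():
        if indicator(a):
            total = add(total, proj)
    return total


def reconstruct():
    """Sum of a * Pi_a over the spectrum, which should give back A."""
    total = ZERO
    for a, proj in PROJECTIONS.items():
        total = add(total, scale(a, proj))
    return total


print(close(spectral_measure(lambda a: True), IDENTITY))   # E_A(R) = I
print(close(reconstruct(), [[0.0, 1.0], [1.0, 0.0]]))      # A recovered
```

Representing $\Delta$ by an indicator function keeps the sketch independent of any particular encoding of Borel sets; the empty set then automatically yields the zero operator, matching the stated convention.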

Functional Calculus.
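By means of the spectral decomposition of a self-adjoint operator, one may create a new set of operators from it. Let $A : H \supset \mathrm{dom}(A) \to H$ be a self-adjoint operator on a Hilbert space $H$, and let $E_A$ be the unique one-dimensional spectral measure associated with it. Given a measurable complex function $f : \mathbb{R} \to \mathbb{C}$, the integral
$$\langle \phi', f(A)\phi \rangle := \int_{\mathbb{R}} f(a) \, d\langle \phi', E_A(a)\phi \rangle$$
defines an operator $f(A)$ on the domain
$$\mathrm{dom}(f(A)) := \Big\{ |\phi\rangle \in H : \int_{\mathbb{R}} |f(a)|^2 \, d\langle \phi, E_A(a)\phi \rangle < \infty \Big\},$$
where $|\phi\rangle$ is any vector belonging to its domain, and $|\phi'\rangle \in H$. The operator $f(A)$ is occasionally written symbolically as
$$f(A) = \int_{\mathbb{R}} f(a) \, dE_A(a) \tag{3.56}$$
in terms of integration with respect to its spectral measure.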

Born Rule and Quantum Measurement.
The axiom of quantum mechanics states that a quantum observable is represented by a self-adjoint operator $A : H \supset \mathrm{dom}(A) \to H$ on a Hilbert space $H$, and that the probabilistic behaviour of the outcomes of an ideal measurement of $A$ on the state $|\phi\rangle \in H$ is described by the probability measure
$$\mu^{\phi}_A(\Delta) := \frac{\langle \phi, E_A(\Delta)\phi \rangle}{\|\phi\|^2}, \quad \Delta \in \mathcal{B}. \tag{3.57}$$
Here, the spectral measure $E_A$ is induced from $A$ by the spectral theorem, and the Born rule proclaims that the measurement outcome be given by one of the elements in the spectrum $\sigma(A)$ and that $\mu^{\phi}_A(\Delta)$ provides the probability of finding the measurement outcome in the measurable set $\Delta \in \mathcal{B}$. Given $|\phi\rangle \in \mathrm{dom}(A)$, one then realises from the spectral theorem (3.47) that the statistical average of the measurement outcomes of $A$ gives the expectation value,
$$\int_{\mathbb{R}} a \, d\mu^{\phi}_A(a) = \frac{\langle \phi, A\phi \rangle}{\|\phi\|^2},$$
where the l. h. s. is understood to be the Lebesgue integral with respect to the probability measure (3.57).
3.1.6. Observables admitting a Description by Density Functions. While the analysis based on probability measures provides an adequately general framework to work with, we find it useful to prepare a terminology for a special class of observables for which probability density functions, not just probability measures, are available to fully describe the behaviour of the measurement outcomes.
Observable admitting a description by probability density functions.
In this paper, we simply say that an observable $A$ admits a description by probability density functions, if the probability measure (3.57) induced by the spectral measure of $A$ is absolutely continuous with respect to the Lebesgue-Borel measure for every choice of the quantum state $|\phi\rangle \in H$, which is to say that, for every $|\phi\rangle \in H$, there exists an integrable function $\rho^{\phi}_A$ for which
$$\mu^{\phi}_A(\Delta) = \int_{\Delta} \rho^{\phi}_A \, d\beta, \quad \Delta \in \mathcal{B},$$
holds. A well-known example is provided by the one-dimensional position operator $\hat{x}$ on $L^2(\mathbb{R})$ defined in (2.27). Indeed, one proves that the spectral measure of $\hat{x}$ is given by the multiplication of the characteristic function (2.8), in the sense that
$$(E_{\hat{x}}(\Delta)\psi)(x) = \chi_{\Delta}(x)\,\psi(x), \quad \psi \in L^2(\mathbb{R}),$$
holds. Specifically, this implies that
$$\mu^{\psi}_{\hat{x}}(\Delta) = \int_{\Delta} \frac{|\psi(x)|^2}{\|\psi\|_2^2} \, d\beta(x),$$
where the denominator of the integrand of the r. h. s. denotes the square of the $L^2$-norm of $\psi \in L^2(\mathbb{R})$ (see (2.19)). One thus concludes that the density of the probability measure $\mu^{\psi}_{\hat{x}}$ is provided by
$$\rho^{\psi}_{\hat{x}}(x) = \frac{|\psi(x)|^2}{\|\psi\|_2^2}.$$
Incidentally, it is known that each member of the pair of observables $\{Q, P\}$ that satisfies the Weyl relations (2.39) and (2.40) admits descriptions in terms of density functions. However, it should be noted that this is not the case in general: an observable whose spectrum consists of a finite number of discrete eigenvalues (such as spin) provides a simple counterexample. To see this, let $A$ be such an observable with $N \in \mathbb{N}^\times$ distinct eigenvalues, and let $\sigma(A) = \{a_1, \dots, a_N\}$ be any enumeration of its spectrum. A straightforward application of (3.51) leads to
$$\mu^{\phi}_A = \sum_{i=1}^{N} \frac{\|\Pi_{a_i}\phi\|^2}{\|\phi\|^2} \, \delta_{a_i}, \tag{3.64}$$
in which one sees that the probability measure $\mu^{\phi}_A$ is given by the weighted sum of delta measures centred at each eigenvalue. Since none of the delta measures is absolutely continuous, the resultant probability measure does not admit a description by density functions.
For later use, we also note that, once the observable $A$ admits a description in terms of probability density functions, then the complex measure (3.46) is also absolutely continuous for an arbitrary pair of vectors $|\phi'\rangle, |\phi\rangle \in H$. That this is the case can be seen by a straightforward application of the polarisation identity
$$\langle \phi', T\phi \rangle = \frac{1}{4} \sum_{k=0}^{3} i^k \, \langle \phi + i^k \phi', T(\phi + i^k \phi') \rangle \tag{3.65}$$
with respect to the operator $T$, valid for any pair of vectors $|\phi\rangle, |\phi'\rangle \in \mathrm{dom}(T)$, where the inner product is taken to be antilinear in its first argument.

3.1.7. Simultaneously measurable Observables. For reference, we briefly review the basic mathematical definitions and facts involved in describing measurements of simultaneously measurable observables, including the simultaneous measurement of local observables on the tensor product of Hilbert spaces.
Strong Commutativity of Self-adjoint Operators.
Let $A$ and $B$ be self-adjoint operators on a Hilbert space $H$, and let $E_A$ and $E_B$ be their respective spectral measures. We say that the pair of operators $A$ and $B$ strongly commutes, if
$$E_A(\Delta_A)\,E_B(\Delta_B) = E_B(\Delta_B)\,E_A(\Delta_A), \quad \Delta_A, \Delta_B \in \mathcal{B},$$
holds as an operator equality. Note that the strong commutativity of $A$ and $B$ implies their (familiar) commutativity $AB = BA$. On the other hand, it is known that the converse is in general not true in the case where either (or both) of the operators happens to be unbounded. The term strong commutativity is named after this fact, for it indicates a stronger condition than mere commutativity.

Product Spectral Measures.
It is a basic result of functional analysis that, given such a pair $A$ and $B$ of strongly commuting self-adjoint operators, there exists a unique two-dimensional spectral measure $E_{A,B}$, called the product spectral measure of $A$ and $B$, for which
$$E_{A,B}(\Delta_A \times \Delta_B) = E_A(\Delta_A)\,E_B(\Delta_B), \quad \Delta_A, \Delta_B \in \mathcal{B}, \tag{3.67}$$
holds. This is a straightforward operator-valued analogue of product measures in measure theory. With a pair of vectors $|\phi\rangle, |\phi'\rangle \in H$ being specified, this gives rise to a complex measure on $(\mathbb{R}^2, \mathcal{B}^2)$, defined by
$$\Delta \mapsto \langle \phi', E_{A,B}(\Delta)\phi \rangle, \quad \Delta \in \mathcal{B}^2. \tag{3.68}$$
In the context of quantum mechanics, for a given pair of simultaneously measurable quantum observables represented by strongly commuting self-adjoint operators $A$ and $B$, the probabilistic behaviour of the outcomes of an ideal simultaneous measurement of both the observables on the state $|\phi\rangle \in H$ is described by the joint-probability distribution
$$\mu^{\phi}_{A,B}(\Delta) := \frac{\langle \phi, E_{A,B}(\Delta)\phi \rangle}{\|\phi\|^2}, \quad \Delta \in \mathcal{B}^2, \tag{3.69}$$
of the pair of observables $A$ and $B$ on the state $|\phi\rangle$, which is a two-dimensional probability measure on the measurable space $(\mathbb{R}^2, \mathcal{B}^2)$. Here, the r. h. s. of (3.69) is interpreted as the probability of finding the outcomes of a simultaneous measurement of both observables in the Borel set $\Delta \in \mathcal{B}^2$. Note that the measurement outcomes of $A$ and $B$ may not be independent, i.e., the equality
$$\mu^{\phi}_{A,B}(\Delta_A \times \Delta_B) = \mu^{\phi}_A(\Delta_A)\,\mu^{\phi}_B(\Delta_B), \quad \Delta_A, \Delta_B \in \mathcal{B}, \tag{3.70}$$
may not necessarily hold, or in other words, the joint-probability distribution is not necessarily the product measure $\mu^{\phi}_{A,B} = \mu^{\phi}_A \otimes \mu^{\phi}_B$ of each of the respective measurements, in general.

Functional Calculus regarding simultaneously measurable Observables.
Given a pair of strongly commuting self-adjoint observables A and B, one readily confirms As for the sum and product of the observables, we first note the following basic fact. is essentially self-adjoint.
As a direct consequence, we thus have the operator equalities worth of special notice. As above, overlines on closable operators denote their closures, and specifically for essentially self-adjoint operators, their self-adjoint extensions.
Composite Systems.
We comment on the special case of the above situation in which the Hilbert space of our interest is the tensor product H ⊗ K of the target system H and the meter system K, and the operators involved are (local) self-adjoint operators A 1 and A 2 on the respective Hilbert spaces. Observing that the operators strongly commute with each other on the composite Hilbert space H ⊗ K, and that their spectral measures respectively read the previous argument leads to the existence of a unique two-dimensional product spectral measure E A1 ⊗ E A2 := EÃ 1,Ã2 satisfying the operator equality (3.77) Here, the left-most hand side denotes the two-dimensional spectral measure defined as in (3.67), while the right-most hand side denotes the tensor product of the self-adjoint operators E A1 (∆ 1 ) and E A2 (∆ 2 ) for each ∆ 1 , ∆ 2 ∈ B. As we have seen in the previous argument, this gives rise to a complex measure, for a given selection of a pair |Ψ , |Φ ∈ H ⊗ K of vectors of the composite system, and the map, (here, we have slightly abused the notation on the l. h. s. by writing A n in place ofÃ n for each n = 1, 2) provides a probability measure describing the probabilistic behaviour of the outcomes of the ideal local measurements simultaneously performed on each system in the state |Φ ∈ H ⊗ K.
In passing, we note that in the case where the state $|\Phi\rangle$ happens to be a direct product state $|\Phi\rangle = |\phi_1 \otimes \phi_2\rangle$, the induced joint-probability distribution of the two local observables (3.79) becomes the product measure of the two probability measures associated with $A_1$ and $A_2$, indicating that the measurement outcomes of each local measurement $A_1$ and $A_2$ are statistically independent (i.e., $\mu^{\phi_1 \otimes \phi_2}_{A_1, A_2} = \mu^{\phi_1}_{A_1} \otimes \mu^{\phi_2}_{A_2}$). On the other hand, if one chooses the state $|\Phi\rangle$ to be an entangled state (i.e., one of those states in $H \otimes K$ that are not direct product states), the joint-probability distribution (3.79) is in general no longer a product measure of those associated with the local observables. In the language of physics, this implies that the local measurements performed on each remote system may have some correlation if the state of the composite system happens to be entangled, and this is widely considered to be one of the most intriguing properties of quantum mechanics. Of course, statistical independence between the target and the meter systems is useless for the purpose of our measurement, and we naturally need an entangled state $|\Phi\rangle$ in order to retrieve any meaningful information about the former system out of the measurement of the latter.
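The contrast between product and entangled states can be illustrated with two two-level systems. The following minimal Python sketch is an assumption-laden example, not the paper's construction: each local observable is assumed to be diagonal in the computational basis, outcomes are labelled by the basis index $0$ or $1$ instead of the eigenvalue, and the function names are hypothetical.

```python
import math


def joint_distribution(amplitudes):
    """Joint Born probabilities for a two-qubit state given as {(i, j): amplitude}."""
    norm = sum(abs(c) ** 2 for c in amplitudes.values())
    return {k: abs(c) ** 2 / norm for k, c in amplitudes.items()}


def marginals(joint):
    """Marginal distributions of the first and the second local measurement."""
    first = {0: 0.0, 1: 0.0}
    second = {0: 0.0, 1: 0.0}
    for (i, j), p in joint.items():
        first[i] += p
        second[j] += p
    return first, second


def is_product_measure(joint, tol=1e-12):
    """True if the joint distribution factorises into its two marginals."""
    first, second = marginals(joint)
    return all(abs(joint.get((i, j), 0.0) - first[i] * second[j]) < tol
               for i in (0, 1) for j in (0, 1))


def product_state(phi1, phi2):
    """Amplitudes of the direct product state phi1 (x) phi2."""
    return {(i, j): phi1[i] * phi2[j] for i in (0, 1) for j in (0, 1)}


s = 1.0 / math.sqrt(2.0)
product = product_state([s, s], [1.0, 0.0])                  # direct product state
bell = {(0, 0): s, (0, 1): 0.0, (1, 0): 0.0, (1, 1): s}      # entangled state

print(is_product_measure(joint_distribution(product)))  # True
print(is_product_measure(joint_distribution(bell)))     # False
```

For the entangled example both marginals are uniform, yet the joint distribution is concentrated on the perfectly correlated outcomes, so it cannot factorise. For other choices of local observables an entangled state may still happen to yield a product measure, which is why the statement in the text is qualified by 'in general'.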
Sum and Product of Local Observables.
As for the sum and product of a pair of local observables, we note that a direct application of Lemma 3.1 leads to and subsequently as expected.

Unconditioned Measurement
Now that we have recalled the necessary mathematical concepts and results, we shall embark on our main analysis. The target of our analysis is the probability measure describing the behaviour of the outcome of the composite observable I ⊗ X on |Ψ g , which may be rewritten in terms of that of the local observable X on the mixed state ψ g as is merely a straightforward extension of probability measures (3.57) for density operators (for the proof of the equality (3.84), just replace X with E X (∆) in (2.72)).

Main Objective of this Subsection.
The primary interest of our study is now to investigate how the information of the target system is encoded into the profile of the outcome of the meter system (3.84) through the interaction. As in the previous subsection, we assume without loss of generality that the meter observable Y coupled with the target observable A to yield the von Neumann interaction (2.70) is given by Y = P . The main objective of the passage is to demonstrate the following proposition as an answer to this question. The results, which shall be shortly demonstrated, form the bases we rely on in conducting our further study.
In the context of the UM scheme, let Y = P be fixed for definiteness, and let |φ ∈ H and |ψ ∈ K respectively be the initial states of the target and the meter systems. Then, the probability measure (3.84) for both the choice X = Q, P reads in which the resultant profile of the measurement outcomes of X after the interaction can be exclusively written by the convolution of the initial profiles of both the target and the meter systems.
Specifically, the interaction causes the change only in the profile of the outcome of the observable X conjugate to Y , in which the initial profile of the target system acts upon that of the meter system through convolution of measures. On the other hand, the profile of X for the same choice as Y is left untouched. The proposition can be readily demonstrated by observing that the change of the spectral measure of the measuring observables (I ⊗ X) with respect to the unitary operator U (g) := e −igA⊗P is provided by in the Heisenberg picture (they are respectively direct consequences of (2.74) and (2.75)), and that the probability distribution dictating the probabilistic behaviour of the sum of two simultaneously measurable observables is described by the convolution of both the individual profiles of the observables involved (which is in parallel to the well-known result for random variables in classical probability theory). However, in the main passages that follow, we intend to provide a more elementary and straightforward demonstration. As a corollary to this, one equivalently has: . Under the same condition as above, the result (3.85) can also be rewritten as by rescaling the outcome by the interaction parameter. Combining the interaction parameter g and the target observable A (the former) corresponds to the scaling of the target observable A → gA, whereas combining g and the meter observable P (the latter) corresponds to the scaling of the pair of the meter observables {Q, P } → {g −1 Q, gP }. Note that the pair of scaled observables {g −1 Q, gP } for g ∈ R × still satisfies the Weyl relations (2.39) and (2.40). Later on, we shall be investigating how one could recover the information of the target system µ φ A based on the results that we obtained here. 
Incidentally, one finds that probing either the strong or the weak region of the interaction parameter proves itself useful for this purpose, and the equalities (3.85) and (3.88) shall serve as the respective starting points for analysing the weak and the strong UM schemes.

Preliminary Observation.
For our purpose, we first consider the case where the target observable A has a finite point spectrum σ(A) = {a 1 , . . . , a N }, N ∈ N × . Writing the spectral decomposition of A as (2.76) and applying (2.77), one finds that the composite state after the interaction reads (3.90) It then follows that where we have used the operator equality 11 in the third to last equality, and have applied (3.64) to obtain the last equality.

Description of the Measurement Outcome.
Returning to the general case, where the target observable $A$ is now arbitrary, we may conjecture from (3.91) that
$$\mu^{\psi^g}_Q(\Delta) = \int_{\mathbb{R}} \mu^{\psi}_Q(\Delta - ga) \, d\mu^{\phi}_A(a), \quad \Delta \in \mathcal{B}, \tag{3.92}$$
generally holds, which indeed turns out to be true; it can be shown straightforwardly in the general framework of functional analysis and measure and integration theory. From (3.92), we see that the probability measure describing the behaviour of the measurement outcome of $Q$ on the (mixed) state $\psi^g$ after the interaction can be explicitly given by those of the initial states of both the meter and the target system. Speaking intuitively, each value $a \in \sigma(A)$ of the spectrum of $A$ causes a translation $\mu^{\psi}_Q(\Delta) \to \mu^{\psi}_Q(\Delta - ga)$, $\Delta \in \mathcal{B}$, of the probability measure of the initial meter state while keeping the 'shape' of its profile intact, and these translated profiles are then summed, weighted by the original probability $\mu^{\phi}_A$ of the target observable $A$.
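The 'mixture of translates' picture can be checked numerically for a target observable with finite spectrum. In the following Python sketch, offered only as an illustration, the Gaussian initial meter profile, the two-point target distribution `target_probs` and the coupling `g = 2.0` are all assumptions made for the example. The post-interaction density is the weighted sum of translated copies of the initial density; it remains normalised, and its mean is shifted by $g$ times the expectation value of $A$.

```python
import math


def gaussian(q, sigma=1.0):
    """Assumed initial meter density: a centred Gaussian of width sigma."""
    return math.exp(-q * q / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def meter_density_after(q, g, target_probs, meter_density=gaussian):
    """Mixture of translates: sum over a of p_a * rho(q - g * a)."""
    return sum(p * meter_density(q - g * a) for a, p in target_probs.items())


def integrate(func, lo=-20.0, hi=20.0, n=4000):
    """Midpoint rule on [lo, hi] with n cells."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


target_probs = {-1.0: 0.25, +1.0: 0.75}   # hypothetical target distribution on {-1, +1}
g = 2.0                                    # hypothetical interaction parameter

total = integrate(lambda q: meter_density_after(q, g, target_probs))
mean = integrate(lambda q: q * meter_density_after(q, g, target_probs))
expected_shift = g * sum(a * p for a, p in target_probs.items())

print(abs(total - 1.0) < 1e-6)            # normalisation is preserved
print(abs(mean - expected_shift) < 1e-6)  # mean shifted by g times the target mean
```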
Parallel to this, we remark that the ideal measurement of the observable $X = P$ after the von Neumann interaction would result in
$$\mu^{\psi^g}_P(\Delta) = \mu^{\psi}_P(\Delta), \quad \Delta \in \mathcal{B}, \tag{3.93}$$
which states that the interaction does not alter the profile of the measurement of $X = P$ at all. This can be readily shown by changing $Q$ to $P$ in (3.91), and by applying the operator equality $e^{itP} E_P(\Delta) e^{-itP} = E_P(\Delta)$, $\Delta \in \mathcal{B}$.

Scaling of Measures and Density Functions.
For later arguments, it proves convenient to rewrite our previous result (3.92) in terms of convolution of measures after introducing some notation. Let $\mu \in M_{\mathbb{C}}(\mathcal{B}^n)$ be a complex measure, and define a parametrised family $\{\mu_t\}_{t \in \mathbb{R}}$ of complex measures by
$$\mu_t(\Delta) := \begin{cases} \mu(t^{-1}\Delta), & t \neq 0, \\ \mu(\mathbb{R}^n)\,\delta_0(\Delta), & t = 0, \end{cases} \quad \Delta \in \mathcal{B}^n. \tag{3.94}$$
Note that this definition is well-defined, for the continuity of the map $x \mapsto tx$ implies its Borel-measurability, hence $t^{-1}\Delta \in \mathcal{B}^n$ for $\Delta \in \mathcal{B}^n$. The coefficient $\mu(\mathbb{R}^n)$ multiplying the delta measure for $t = 0$ keeps the total evaluation $\mu_t(\mathbb{R}^n) = \mu(\mathbb{R}^n)$ constant for all $t \in \mathbb{R}$. Intuitively speaking, this parametrisation allows us to narrow down the profile of a given complex measure $\mu$ while keeping its total evaluation intact, so that it 'tends' to the delta measure (weighted by its total evaluation $\mu(\mathbb{R}^n)$) as $t \to 0$.
To help visualise this, suppose that $\mu$ is absolutely continuous and write $\rho := d\mu/d\beta^n$ for simplicity. One then finds
$$\mu_t(\Delta) = \int_{t^{-1}\Delta} \rho \, d\beta^n = \int_{\Delta} \rho_t \, d\beta^n, \quad t \in \mathbb{R}^\times, \tag{3.95}$$
where we have introduced the scaling
$$f_t(x) := |t|^{-n} f(t^{-1}x) \tag{3.96}$$
of any given integrable function $f \in L^1(\mathbb{R}^n)$ by $t \in \mathbb{R}^\times$. This implies that $\mu_t$ is also absolutely continuous for each $t \in \mathbb{R}^\times$ by definition, and that its density is given by $\rho_t$, i.e.,
$$\frac{d\mu_t}{d\beta^n} = \left( \frac{d\mu}{d\beta^n} \right)_t, \tag{3.97}$$
where the l. h. s. is the density of the scaled measure $\mu_t$, and the r. h. s. is the density of the original measure $\mu$ scaled by $t$ as in (3.96). In the special case where $\mu$ is a probability measure, one may intuitively see that the parametrisation (3.96) takes any non-negative integrable function with a total integral of unity (i.e., a probability density function) to the 'delta function' in the limit $t \to 0$.
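The effect of the scaling on a density can be seen in a one-dimensional example. The following Python sketch is illustrative only: the triangular density on $[-1, 1]$ and the chosen values of $t$ are assumptions, and `scaled` implements the rule $\rho_t(x) = |t|^{-1}\rho(x/t)$ for $n = 1$. The total integral stays equal to one while the mass within a fixed window around the origin grows as $t$ decreases.

```python
def triangle(x):
    """Assumed example: compactly supported probability density on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def scaled(rho, t):
    """Density of the scaled measure for t != 0 in one dimension: |t|^{-1} rho(x / t)."""
    if t == 0:
        raise ValueError("t = 0 gives a delta measure, which has no density")
    return lambda x: rho(x / t) / abs(t)


def integrate(func, lo, hi, n=20000):
    """Midpoint rule on [lo, hi] with n cells."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


# Total mass is preserved for every t, while the mass in [-0.2, 0.2] increases as t -> 0.
totals = {t: integrate(scaled(triangle, t), -2.0, 2.0) for t in (1.0, 0.5, 0.1)}
masses = {t: integrate(scaled(triangle, t), -0.2, 0.2) for t in (1.0, 0.5, 0.1)}

print(all(abs(v - 1.0) < 1e-6 for v in totals.values()))
print(masses[1.0] < masses[0.5] < masses[0.1])
```

The case $t = 0$ is deliberately rejected by `scaled`, reflecting the fact that the delta measure admits no density function.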
Von Neumann Interaction and Convolution.
Now, note here that for each $t \in \mathbb{R}^\times$, the measure $\mu_t$ is nothing but the image measure (3.9) of $\mu$ with respect to the map $x \mapsto tx$ (i.e., multiplication by $t$). With the help of the change of variables formula for image measures (3.10), one confirms that the equality
$$\int f(x) \, d\mu_t(x) = \int f(tx) \, d\mu(x) \tag{3.98}$$
holds for all $f$ that are integrable with respect to $\mu_t$. This allows us to rewrite (3.92) in terms of convolution as
$$\mu^{\psi^g}_Q = \mu^{\psi}_Q * (\mu^{\phi}_A)_g. \tag{3.99}$$
Alternatively, by scaling $\Delta \to g\Delta$ in (3.92), one finds from the definition that
$$(\mu^{\psi^g}_Q)_{g^{-1}} = (\mu^{\psi}_Q)_{g^{-1}} * \mu^{\phi}_A, \quad g \in \mathbb{R}^\times, \tag{3.100}$$
which is another way to describe how the von Neumann type interaction causes a change in the profile of the meter observable $X = Q$.

Scaling of Observables.
We make a short digression at this point to seek for the physical meaning of the two findings (3.99) and (3.100), which we have just acquired. To prepare for our argument, we first introduce some notations regarding scaling of spectral measures, in parallel to that of complex measures as we have done before. Let E : B n → L(H) be an n-dimensional spectral measure on the Hilbert space H, and define a parametrised family {E t } t∈R of spectral measures by Here, we have introduced the 'delta spectral measure' E 0 centred at 0 ∈ R n , defined by Incidentally, for the one-dimensional case (n = 1), the delta spectral measure E 0 centred at the origin is nothing but the spectral measure accompanying the zero operator 0 on H. We next confirm some basic facts regarding scaling of observables and their accompanying spectral measures. Let E A be the spectral measure of a self-adjoint operator A : H ⊃ dom(A) → H. The goal is to specify the spectral measure of the scaled self-adjoint operator tA, (t ∈ R) and to show that where the l. h. s. is the desired spectral measure accompanying the scaled operator tA, whereas the r. h. s. is the spectral measure accompanying the operator A scaled by t. To see this, first observe the following equality for the choice |φ ∈ dom(A), where we have used (3.98) to obtain the second to last equality. Applying the polarisation identity (3.65) for T = tA, one then has for any |φ , |φ ′ ∈ dom(A). Observing that the domain of a self-adjoint operator is dense in H by definition, one may continuously extend the above equality on |φ ′ ∈ H, based on which the uniqueness of the spectral measure leads to the desired result (3.103).
Returning to our main line of arguments, we first observe that the equality (3.103) leads to which states that the probability measure describing the ideal measurement outcome of the scaled observable tA on the state |φ ∈ H coincides with that of the original observable A scaled by t. Armed with this result, one may reformulate our previous findings (3.99) and (3.100) respectively as where we have also explicitly written down the profile of the outcome of the measurement of X = P . This completes our proof for Proposition 3.2 and Corollary 3.3.

Recovery of the Target Profile
We now consider the inverse problem of what we have discussed so far, that is, we argue how one can recover the probability measure µ φ A of the target observable A from the probability measure µ ψ g Q obtained through the measurement of Q on the meter system. Following the same line in the previous section, one finds it useful to probe either the strong or the weak region of the interaction for this purpose, which we shall see below one by one.

Strong Unconditioned Measurement.
We first concentrate on (3.88) (or equivalently (3.100)), and observe that the problem of recovering the desired probability measure reduces to the problem of 'deconvolution', where one wishes to find the solution $\mu := \mu^{\phi}_A$ of the equation of the form
$$\nu_{\mathrm{out}} = \nu_{\mathrm{in}} * \mu, \tag{3.109}$$
having knowledge and control over both the 'input' $\nu_{\mathrm{in}} := \mu^{\psi}_{g^{-1}Q}$ and the 'output' $\nu_{\mathrm{out}} := \mu^{\psi^g}_{g^{-1}Q}$ on their respective sides. Whilst there is a rich literature on the topic of deconvolution, we take a specific approach to the solution in order to keep our arguments simple.

Main Objective of this Passage.
A quick observation leads us to the naïve expectation that, if one could tune the input so that $\nu_{\mathrm{in}}$ becomes a multiplicative identity (in our case, the delta measure $\delta_0$ centred at the origin), or, where this is impossible, if one could make the input approximate it closely enough, then one may obtain the desired solution $\mu$ directly as the measured output $\nu_{\mathrm{out}} \to \delta_0 * \mu = \mu$. One typical manner of attaining such a gradual approximation would be to fix the initial state $\psi$ and take the strong limit $g^{-1} \to 0$ ($g \to \pm\infty$) of the interaction parameter, so that $\nu_{\mathrm{in}} = (\mu^{\psi}_Q)_{g^{-1}}$ 'tends' towards the desired identity $\delta_0$ in an intuitive manner (recall (3.94) and (3.96)). The main objective of this passage is to confirm that this idea is indeed valid, and thus to state it in a mathematically rigorous way.
As will become apparent through the discussion below, there are certain mathematical hurdles that must be overcome to achieve this objective. In order to avoid undue intricacy, we shall impose certain conditions on the choice of the target observable, and present our main result in the following way:
Proposition 3.4 (Strong Unconditioned Measurement). In the context of the UM scheme, suppose that (i) the target observable $A$ admits a description by density functions, and (ii) the initial profile $\mu^{\psi}_Q$ of the meter observable $Q$ on the state $|\psi\rangle$ is compactly supported$^{12}$. Then, the scaled profile of $Q$ after the interaction converges to the desired target,
$$\lim_{g \to \pm\infty} \mu^{\psi^g}_{g^{-1}Q} = \mu^{\phi}_A, \tag{3.110}$$
in the strong limit of the interaction with respect to the total variation norm (or, equivalently, the $L^1$-norm) for any choice of the initial state $|\phi\rangle \in H$.
The remainder of this passage is devoted to its demonstration.
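Before turning to the demonstration, the kind of convergence asserted here can be observed numerically. The following Python sketch is an illustration under assumptions and does not reproduce the argument of the paper: the scaled input is modelled by a triangular density of half-width $t$ (playing the role of $g^{-1}$), the target density is assumed to be the indicator function of $[0, 1)$, and all integrals are approximated by the midpoint rule. The $L^1$-distance between the smoothed output and the target shrinks as $t$ decreases.

```python
def triangle(x):
    """Assumed compactly supported probability density on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def kernel(t):
    """Scaled input density e_t(x) = t^{-1} e(x / t) for t > 0."""
    return lambda x: triangle(x / t) / t


def box(x):
    """Assumed target density: the indicator function of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def midpoint(func, lo, hi, n):
    """Midpoint rule on [lo, hi] with n cells."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def l1_error(t, n_x=300, n_y=200):
    """Approximate L1-distance between the convolution e_t * box and box."""
    e_t = kernel(t)

    def smoothed(x):
        # (e_t * box)(x) reduces to the integral of e_t(x - y) over y in [0, 1]
        return midpoint(lambda y: e_t(x - y), 0.0, 1.0, n_y)

    return midpoint(lambda x: abs(smoothed(x) - box(x)), -1.0, 2.0, n_x)


errors = [l1_error(t) for t in (0.4, 0.2, 0.1)]
print(errors[0] > errors[1] > errors[2])  # the distance decreases with t
```

The target in this example is discontinuous, so the convergence cannot hold uniformly, yet the $L^1$-distance still decreases; this is the reason the statement is phrased in the total variation (equivalently, $L^1$) norm.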

Preliminary Observations.
Let us make a preliminary observation following the above idea. The first thing we realise is that, in general, we cannot prepare the input $\nu_{\mathrm{in}}$ so that its profile exactly coincides with the multiplicative identity $\delta_0$. To see this quickly, first recall that the realisable input probability measures $\nu_{\mathrm{in}}$ are exactly those that are absolutely continuous with respect to the Lebesgue-Borel measure. Since the delta measure $\delta_0$ does not belong to the space $L^1(\mathcal{B})$, one concludes that it is impossible to prepare the input in such a way that $\nu_{\mathrm{in}} = \delta_0$ holds. An alternative approach to this problem may be to consider a sequence of inputs $(\nu_{\mathrm{in}})_n$ that tends to the delta measure $\delta_0$, in the hope that the resultant sequence of multiplicative products $(\nu_{\mathrm{out}})_n := (\nu_{\mathrm{in}})_n * \mu$ also converges towards the desired solution $\mu$ in the limit. Indeed, if one could only construct a sequence $(\nu_{\mathrm{in}})_n$ so that
$$\lim_{n \to \infty} \| (\nu_{\mathrm{in}})_n - \delta_0 \| = 0 \tag{3.111}$$
under the total variation norm, one concludes from the evaluation
$$\| (\nu_{\mathrm{out}})_n - \mu \| = \| ((\nu_{\mathrm{in}})_n - \delta_0) * \mu \| \leq \| (\nu_{\mathrm{in}})_n - \delta_0 \| \, \| \mu \| \to 0$$
that the outcome tends to the desired solution in the limit. Unfortunately, however, one immediately realises that this idea also fails, since in general there is no sequence $(\nu_{\mathrm{in}})_n$ that meets the condition (3.111) in the first place: since the space $L^1(\mathcal{B})$ of absolutely continuous complex measures is a topologically closed subset of the measure algebra $M_{\mathbb{C}}(\mathcal{B})$, a sequence in $L^1(\mathcal{B})$ never converges to an element outside of $L^1(\mathcal{B})$ with respect to the total variation norm.
Discussion on the possible Approaches.
From the quick overview of our current situation, we learn that the problem at hand is to do with the topology we have given to the measure algebra M C (B). Namely, the topology induced from the total variation norm is too strong (fine) for our convenience. A fundamental cure for this would thus be to equip the space with a weaker (coarser) topology on M C (B) such that, at least, it may allow us to construct sufficiently abundant sequences (or nets, in general) of the 'inputs' in L 1 (B) that converges towards δ 0 , and that the sequence of the resulting 'outputs' (i.e., the multiplicative product (3.109)) would subsequently converge towards the desired solution in the limit 13 .
However, this strategy, while desirable, presupposes moderate familiarity with general topology, which the authors have deemed to be beyond the scope of this paper. An alternative approach that avoids explicit exposure to it would therefore be favourable (possibly at the cost of generality, while hopefully having the merit of being mathematically less demanding). In this paper, this is accomplished by introducing the auxiliary concept of 'approximate identities', whose definition is presented shortly. In essence, we focus only on the convergence of the output in the total variation norm, based on the observation that, even though there is no sequence of inputs that converges to the delta measure (3.111), there are certain conditions under which the sequence of outputs does converge towards the desired solution (3.113). As a preliminary observation, note that the output $\nu_{\mathrm{out}}$ also necessarily lies in $L^1(\mathfrak{B})$$^{14}$. Recalling that $L^1(\mathfrak{B})$ is closed under the topology induced by the total variation norm, one finds that the only candidates for the solution $\mu$ towards which the sequence of outputs could ever converge are those that also lie in $L^1(\mathfrak{B})$. Based on this inspection, in what follows we shall only treat the case in which the target observable $A$ admits a description by density functions, which is to say that the solution $\mu = \mu^{\phi}_{A}$ is assumed to lie in $L^1(\mathfrak{B})$.

Approximate Identities.
The convolution algebra $L^1(\mathfrak{B}) \cong L^1(\mathbb{R}^n)$, in contrast to the measure algebra $M_{\mathbb{C}}(\mathfrak{B})$, is non-unital. In order to compensate for the inconvenience arising from the lack of a multiplicative identity, a weaker concept is often used in analysing problems involving such algebras. In this paper, we call a family $\{e_t\}_{t>0}$ of elements of $L^1(\mathbb{R}^n)$ an approximate identity if, for every element $f \in L^1(\mathbb{R}^n)$, the convolution $e_t * f$ converges to $f$ in the topology induced by the $L^1$-norm, i.e.,
$$\lim_{t\to 0} \|e_t * f - f\|_1 = 0. \tag{3.114}$$
Before we move on to the construction of an example, we collect some necessary terminology. Recall that the support of a function $f : \mathbb{R}^n \to \mathbb{K}$ is the subset of $\mathbb{R}^n$ defined by
$$\mathrm{supp}(f) := \overline{\{x \in \mathbb{R}^n : f(x) \neq 0\}},$$
where the overline on a set denotes its topological closure. The support of a function $f : \mathbb{R}^n \to \mathbb{K}$ is said to be compact if $\mathrm{supp}(f)$ is bounded.

$^{13}$ A straightforward candidate for such a topology would be the weak-* topology based on the identification (3.43) by the Riesz-Markov-Kakutani representation theorem, namely, the initial topology with respect to the family of all linear functionals of the form $\mu \mapsto \int_{\mathbb{R}} f \, d\mu$, where $f \in C_0(\mathbb{R})$. One eventually finds that the norm topology of the total variation is nothing but the strong topology with respect to this identification, which implies that the weak-* topology is strictly weaker than the topology we currently have at hand. Moreover, direct application of the dominated convergence theorem and Fubini's theorem reveals that the convergence of a sequence of probability measures $\nu_n \to \delta_0$ implies $\nu_n * \mu \to \mu$ (both convergences are meant in the weak-* sense), which is a much cleaner result than what we have seen in the main paragraphs. As an example of such a sequence (net) of probability measures converging towards $\delta_0$, the scaling $\nu_t$ (3.94) of a given probability measure $\nu$ is typical. In fact, the scaling becomes a continuous parametrisation from $\mathbb{R}$ to $M_{\mathbb{C}}(\mathfrak{B})$ under this topology, which is also a welcome property.
Now, let $\eta \in L^1(\mathbb{R}^n)$ be any integrable function possessing compact support with total integral unity,
$$\int_{\mathbb{R}^n} \eta \, d\beta^n = 1.$$
With this, consider the family $\{\eta_t\}_{t\in\mathbb{R}^{\times}}$ of scaled functions defined as in (3.96), which preserve the total integral of unity for all $t \in \mathbb{R}^{\times}$. One may then intuitively expect that $\eta_t$ tends to the 'delta function' in the limit $t \to 0$ and can be used as an approximate identity,
$$\lim_{t\to 0} \|\eta_t * f - f\|_1 = 0, \quad f \in L^1(\mathbb{R}^n).$$
To confirm that this is indeed the case, observe the inequality
$$\|\eta_t * f - f\|_1 \le \int_{\mathbb{R}^n} |\eta_t(a)| \, \|\tau_a f - f\|_1 \, d\beta^n(a),$$
where $\tau_a$ is the translation operator defined by
$$(\tau_a f)(x) := f(x - a), \quad a \in \mathbb{R}^n. \tag{3.119}$$
Recalling that $\lim_{a\to 0} \|\tau_a f - f\|_1 = 0$ for any $f \in L^1(\mathbb{R}^n)$, we see that for any $\epsilon > 0$ there exists a $\delta > 0$ for which $a \in K_{\delta}(0) := \{x \in \mathbb{R}^n : |x| < \delta\}$ leads to $\|\tau_a f - f\|_1 < \epsilon$. By taking $|t|$ small enough so that $\mathrm{supp}(\eta) \subset K_{|t|^{-1}\delta}(0)$, or equivalently $\mathrm{supp}(\eta_t) \subset K_{\delta}(0)$, we find that the r. h. s. of the above inequality is less than $\epsilon \|\eta\|_1$ (which equals $\epsilon$ when $\eta$ is non-negative). This shows that the family defined by
$$e_t := \eta_{(\pm t)}, \quad t > 0, \tag{3.120}$$
makes a simple example of approximate identities. Here, the meaning of the subscript on the two sides of the equation is not to be confused: the subscript on the l. h. s. indicates an index of the elements of the convolution algebra $L^1(\mathbb{R}^n)$, whereas that on the r. h. s. indicates the scaling parameter of an integrable function $\eta$ defined in (3.96). Obviously, the construction of such approximate identities is highly non-unique, and one may attain it in various different ways.
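The convergence $\|\eta_t * f - f\|_1 \to 0$ can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the text: $n = 1$, the scaling convention $\eta_t(x) = t^{-1}\eta(x/t)$ for $t > 0$, a box kernel $\eta = \chi_{[-1/2,1/2]}$, and the Laplace density $f(x) = e^{-|x|}/2$ as the test function. For this choice the convolution has a closed form in terms of the cumulative distribution function of $f$, and the $L^1$-norm is approximated by a Riemann sum on a truncated grid.

```python
import math

def laplace_pdf(x):
    # Test function f: the Laplace density, integrable but not smooth at 0.
    return 0.5 * math.exp(-abs(x))

def laplace_cdf(x):
    # Cumulative distribution function of the Laplace density.
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def box_smoothed(x, t):
    # (eta_t * f)(x) for eta = indicator of [-1/2, 1/2], assuming the
    # scaling eta_t(x) = eta(x / t) / t: the average of f over a window
    # of width t centred at x.
    return (laplace_cdf(x + t / 2) - laplace_cdf(x - t / 2)) / t

def l1_error(t, half_width=30.0, dx=2e-3):
    # Riemann-sum approximation of || eta_t * f - f ||_1 on a truncated grid.
    n = int(round(2 * half_width / dx))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * dx
        total += abs(box_smoothed(x, t) - laplace_pdf(x))
    return total * dx

errors = [l1_error(t) for t in (1.0, 0.5, 0.25, 0.125)]
for t, err in zip((1.0, 0.5, 0.25, 0.125), errors):
    print(f"t = {t:5.3f}   approximate L1 error = {err:.5f}")
```

The printed errors shrink as $t$ decreases, in line with the estimate above. The kernel and the test function are illustrative choices only, and any compactly supported $\eta$ with unit integral could be substituted.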
Realisation of Approximate Identities.
Our observation so far has revealed that, as long as the target profile $\mu \in L^1(\mathfrak{B})$ is absolutely continuous, by choosing the family of inputs $\{(\nu_{\mathrm{in}})_t\}_{t>0}$ so that it makes an approximate identity in $L^1(\mathfrak{B})$, the resulting family of outputs $(\nu_{\mathrm{out}})_t := (\nu_{\mathrm{in}})_t * \mu$ converges to the desired solution in the $L^1$-norm (or equivalently, in the total variation norm)$^{15}$. We are now interested in the construction of such approximate identities for our current situation. To this end, we first observe that, since the profile of the input $\nu_{\mathrm{in}} = \mu^{\psi}_{g^{-1}Q}$ in our case is exclusively determined by the choice of the interaction parameter $g$ and the initial state $|\psi\rangle \in \mathcal{K}$ of the meter system, the problem reduces to finding a family of pairs $(g, |\psi\rangle)_t$, $t > 0$, that makes the input an approximate identity. As a first example of such a construction, we fix the initial state $|\psi\rangle$ and observe that the density of the input is given by
$$\frac{d\mu^{\psi}_{g^{-1}Q}}{d\beta} = \left( \frac{d\mu^{\psi}_{Q}}{d\beta} \right)_{g^{-1}},$$
where we have used our previous result (3.97). Then, choosing $|\psi\rangle$ so that the density of $\mu^{\psi}_{Q}$ is compactly supported, one realises that taking the strong limit of the interaction $g^{-1} \to 0$ (or equivalently $g \to \pm\infty$) yields the desired result. Alternatively, we may fix the interaction parameter $g \in \mathbb{R}^{\times}$ and choose a sequence of initial states that makes the corresponding probability measures an approximate identity. Since the scaling of an approximate identity by $g^{-1}$ is still an approximate identity, one obtains another example of such a construction.
One thus finds a general guiding principle for the construction of an approximate identity to be a combination of the following two manoeuvres:
• taking the strong limit of the interaction $g^{-1} \to 0$,
• narrowing down the profile of the probability measure towards the delta measure (symbolically $\mu^{\psi}_{Q} \to \delta_0$) by changing the meter state $|\psi\rangle \in \mathcal{K}$.
In order to see explicitly how these work together, choose a family of initial states $|\psi\rangle, |\psi_h\rangle \in \mathcal{K}$, $h > 0$, such that the density of the initial profile $\mu^{\psi}_{Q}$ is compactly supported and that the parametrisation corresponds to its scaling,
$$\mu^{\psi_h}_{Q} = (\mu^{\psi}_{Q})_h, \tag{3.123}$$
which makes itself an approximate identity as $h \to 0$ (one may easily construct such a family in the special case in which the meter system is described in the Schrödinger representation of the CCR$^{16}$). Then, observing that the scaling of it by $g^{-1}$ is
$$(\mu^{\psi_h}_{Q})_{g^{-1}} = (\mu^{\psi}_{Q})_{hg^{-1}},$$
one finds that it is indeed an approximate identity that tends to the delta measure in the limit as the combination $hg^{-1} \to 0$.

Concluding Remarks.
In conclusion, we see that the UM scheme allows us to recover the information of the target system and its observable $A$, not only in the form of expectation values described earlier, but also in the form of probability measures $\mu^{\phi}_{A}$. This is accomplished by taking the limit of either narrowing the profile of the probability measure $\mu^{\psi}_{Q}$ of the meter system, or intensifying the interaction parameter $g \to \pm\infty$, or otherwise by appropriately balancing both contributions and having $hg^{-1} \to 0$ as a whole. In this sense, we may say that intensifying the interaction parameter plays a role equivalent to narrowing the profile of the probability measure of the meter. It may thus appear reasonable that, also in this respect, the von Neumann measurement scheme is sometimes referred to as the 'strong measurement' or the 'sharp measurement'.

Weak Unconditioned Measurement.
We shall see next how the measurement outcome of the UM scheme behaves locally around $g = 0$ in terms of probability measures. Specifically, we are interested in the (higher-order) derivatives of the map
$$g \mapsto \nu_{\mathrm{out}}(g) := \mu^{\psi}_{Q} * (\mu^{\phi}_{A})_g, \tag{3.126}$$
which is now a map from the real line $\mathbb{R}$ to the space $M_{\mathbb{C}}(\mathfrak{B})$ of complex measures.

Main Objective of this Passage.
The main objective of this passage is first to compute the derivatives of the map (3.126) at the origin $g = 0$, and subsequently to argue how one may reconstruct the profile of the probability measure $\mu^{\phi}_{A}$ of our interest from the information obtained. However, as one realises in the line of discussion that follows, this involves certain mathematical intricacies. In order to avoid the difficulties and complications that may arise, we impose some restrictions on the configuration of the target and meter systems, and thus obtain the following two propositions, the first of which shall be demonstrated in the main passages below.

Proposition 3.5 (Outcome of the Weak Unconditioned Measurement). In the context of the UM scheme, suppose that (i) the target profile $\mu^{\phi}_{A}$ is compactly supported, and (ii) the density of $\mu^{\psi}_{Q}$ belongs to the Schwartz space, $d\mu^{\psi}_{Q}/d\beta \in \mathcal{S}(\mathbb{R})$. Then, the map (3.126) is arbitrarily many times strongly differentiable in the $L^1$-norm (or, equivalently, in the total variation norm), and its derivatives at $g = 0$ read
$$\left. \frac{d^n \nu_{\mathrm{out}}(g)}{dg^n} \right|_{g=0} = \mathbb{E}[A^n; \phi] \, (-D)^n \mu^{\psi}_{Q}, \quad n \in \mathbb{N}_0, \tag{3.127}$$
where $D$ denotes the operation uniquely specified through the relation
$$\frac{d(D\nu)}{d\beta} := D \frac{d\nu}{d\beta} \tag{3.128}$$
by differentiating the density of absolutely continuous complex measures $\nu \in L^1(\mathfrak{B})$ whose density $d\nu/d\beta \in \mathcal{S}(\mathbb{R})$ lies in the Schwartz space.

$^{16}$ One may choose any wave-function $\psi \in L^2(\mathbb{R})$ with compact support, and define
$$\psi_{\{h\}}(x) := h^{-1/2} \, \psi(h^{-1} x), \quad h > 0.$$
Here, the braces around the subscript $h$ denoting the index are employed merely to avoid confusion with the subscript denoting the scaling of a function (3.96). One then readily finds that this qualifies as an example of the desired family (3.123).
Note that compactness of the support of $\mu^{\phi}_{A}$ implies the existence of all the higher-order moments $|\mathbb{E}[A^n; \phi]| < \infty$ of the observable $A$, and that the Schwartz space is closed under the operation of differentiation (i.e., $D^n (d\nu/d\beta) \in \mathcal{S}(\mathbb{R})$), hence both sides of (3.127) are well-defined. Operationally, the above proposition implies that one may obtain not only the expectation value ($n = 1$) of $\mu^{\phi}_{A}$, as we have found by the operator level analysis (2.89) conducted in the previous section, but also its higher-order moments by probing the local behaviour of the interaction around $g = 0$. Incidentally, one might expect that one could recover the full profile of the original probability measure $\mu^{\phi}_{A}$ by knowing sufficiently many of its higher-order moments, which in fact turns out to be the case under our assumption.

Proposition 3.6 (Weak Unconditioned Measurement). Let $A$ be self-adjoint and $|\phi\rangle \in \mathcal{H}$ a state for which the probability measure $\mu^{\phi}_{A}$ is compactly supported. If $\mu$ is another compactly supported probability measure on $(\mathbb{R}, \mathfrak{B})$ all of whose higher moments coincide with those of $\mu^{\phi}_{A}$, then the two probability measures agree, $\mu = \mu^{\phi}_{A}$. In other words, one may uniquely reconstruct the probability measure $\mu^{\phi}_{A}$ of the target system by knowing all the higher moments of $A$ by means of the weak UM.
Proof. In fact, this is one instance of the famous problems collectively called the classical moment problem [31,32]. We provide a sketch of the proof for our specific case at hand. To this end, we first observe that knowing all the higher-order moments (3.129) is equivalent to knowing the integral $\int_K p(a) \, d\mu^{\phi}_{A}(a)$ of all polynomials $p \in P(K)$ on some compact subset $K \subset \mathbb{R}$ on which $\mu^{\phi}_{A}$ is supported. Now, choose a compact subset $K \subset \mathbb{R}$ that contains the supports of both $\mu^{\phi}_{A}$ and $\mu$, i.e., $\mu|_{K^c} = \mu^{\phi}_{A}|_{K^c} = 0$, and observe that the space of continuous functions on $K$ trivially coincides with that of continuous functions on $K$ that vanish at infinity, $C(K) = C_0(K)$. We thus have $C(K)' = C_0(K)' \cong M_{\mathbb{C}}(\mathfrak{B}|_K)$ by the Riesz-Markov-Kakutani representation theorem. Since the space of polynomials $P(K)$ is dense in $C(K)$ with respect to the supremum norm (cf. the Stone-Weierstraß approximation theorem), one concludes that $\int_K p(a) \, d\mu(a) = \int_K p(a) \, d\mu^{\phi}_{A}(a)$ for all $p \in P(K)$ implies $\mu = \mu^{\phi}_{A}$.

Preliminary Observation.
We now begin our analysis. As a preliminary observation, we note that the target of our study is the following formal expression,
$$\left. \frac{d\nu_{\mathrm{out}}(g)}{dg} \right|_{g=0} = \lim_{g\to 0} \frac{\nu_{\mathrm{out}}(g) - \nu_{\mathrm{out}}(0)}{g}, \tag{3.131}$$
in which we leave aside, just for now, all the inherent subtleties regarding the operation of taking the limit that will shortly become apparent. Now, since the numerator of the r. h. s. of the above formula can be written as
$$\nu_{\mathrm{out}}(g) - \nu_{\mathrm{out}}(0) = \mu^{\psi}_{Q} * (\mu^{\phi}_{A})_g - \mu^{\psi}_{Q} * (\mu^{\phi}_{A})_0,$$
one finds that the analysis of (3.131) reduces to the study of the formal expression of the form
$$\lim_{t\to 0} \frac{\nu_{\mathrm{out}}(t) - \nu_{\mathrm{out}}(0)}{t} = \nu_{\mathrm{in}} * \lim_{t\to 0} \frac{\mu_t - \mu_0}{t}, \tag{3.133}$$
where $\mu, \nu_{\mathrm{in}} \in M_{\mathbb{C}}(\mathfrak{B})$ are probability measures (the latter being absolutely continuous), $\nu_{\mathrm{out}}(t) := \nu_{\mathrm{in}} * \mu_t$, and the subscript on $\mu_t$ denotes the scaling defined in (3.94). In studying (3.133), one might find it a decent starting point to focus on the formal expression (the right component of the above convolution)
$$\mu'_0 := \lim_{t\to 0} \frac{\mu_t - \mu_0}{t}.$$
From this, one realises that our problem is nothing but the differentiability of the map $t \mapsto \mu_t$ at the origin $t = 0$ (recall that we have defined $\mu_0 := \delta_0$ for any probability measure $\mu$). We have thus symbolically written the limit of the above expression as $\mu'_0$, temporarily leaving aside the question of its existence and well-definedness just as before. It would then be tempting to expect
$$\left. \frac{d\nu_{\mathrm{out}}(t)}{dt} \right|_{t=0} = \nu_{\mathrm{in}} * \mu'_0, \tag{3.135}$$
which should resolve our main problem fairly nicely.
A Formal Computation of the Derivative. Guided by the above naïve observation, we are naturally led to consider what the derivative of the map $t \mapsto \mu_t$ at $t = 0$ for a given probability measure $\mu \in M_{\mathbb{C}}(\mathfrak{B})$ would look like. As a first step, suppose for simplicity that $\mu$ is absolutely continuous, and denote its density by $\eta := d\mu/d\beta$. Armed with our previous finding $\eta_t = d\mu_t/d\beta$, $t \in \mathbb{R}^{\times}$, regarding the scaling of measures and that of their densities (see (3.97)), we then intend to formally obtain $\mu'_0$ in terms of density functions, by first computing the derivative at $t > 0$ and then taking the limit $t \to 0$. Now, assuming suitable differentiability and integrability conditions for the density $\eta$, one computes the derivative of the map $t \mapsto \eta_t$ at $t > 0$ as
$$\frac{d\eta_t}{dt} = -D (x\eta)_t,$$
where $D := d/dx$ is the usual operation of differentiation and $(x\eta)_t$ denotes the scaling of the function $x \mapsto x\eta(x)$. Then, one might be tempted to formally proceed as
$$\lim_{t\to 0} \frac{d\eta_t}{dt} = -D \lim_{t\to 0} (x\eta)_t = -\mathbb{E}[x; \mu] \, D\delta_0,$$
where we have used (3.94) in the second equality. The above argument suggests that the derivative of the map $t \mapsto \mu_t$ at the origin would appear as
$$\mu'_0 = -\mathbb{E}[x; \mu] \, D\delta_0, \tag{3.139}$$
which is the 'derivative of the delta measure' weighted by the expectation value of the original probability measure $\mu$. As for the general case in which the original probability measure $\mu$ is not necessarily absolutely continuous, we may conjecture that, since the r. h. s. of (3.139) does not depend on the absolute continuity of $\mu$, the same result should hold in the general case as well.
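The identity for the derivative of the scaled density at $t > 0$, namely $d\eta_t/dt = -D(x\eta)_t$, can be checked numerically. The sketch below rests on assumptions not stated in this passage: the scaling convention $\eta_t(x) = t^{-1}\eta(x/t)$ for $t > 0$, and a standard Gaussian as an illustrative density $\eta$. Both sides are approximated by central differences, so agreement is only up to discretisation error.

```python
import math

def eta(u):
    # Standard Gaussian density (an illustrative choice of smooth density).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def eta_scaled(x, t):
    # Assumed scaling convention for t > 0: eta_t(x) = eta(x / t) / t.
    return eta(x / t) / t

def x_eta_scaled(x, t):
    # Scaling of the function u -> u * eta(u) under the same convention.
    u = x / t
    return u * eta(u) / t

def lhs(x, t, h=1e-5):
    # Central difference in t approximating d(eta_t)/dt at the point x.
    return (eta_scaled(x, t + h) - eta_scaled(x, t - h)) / (2.0 * h)

def rhs(x, t, h=1e-5):
    # Central difference in x approximating -D[(x eta)_t] at the point x.
    return -(x_eta_scaled(x + h, t) - x_eta_scaled(x - h, t)) / (2.0 * h)

max_gap = max(abs(lhs(x, t) - rhs(x, t))
              for t in (0.5, 1.0, 2.0)
              for x in (-1.5, -0.3, 0.0, 0.4, 2.0))
print("largest discrepancy between the two sides:", max_gap)
```

The discrepancy stays at the level of the finite-difference error for every sampled pair $(x, t)$, which is consistent with the formal computation for $t > 0$. It says nothing about the limit $t \to 0$, where the expression leaves the space of functions.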
Discussion on the possible Approaches.
While we have conducted a very formal discussion above, the result in fact turns out to be true and can be made mathematically fully rigorous in the framework of the theory of generalised functions (distributions). In fact, it turns out that the derivative $\mu'_0$ that appears in (3.139) is no longer a member of the space $M_{\mathbb{C}}(\mathfrak{B})$ of complex measures$^{17}$, and accordingly the framework in which we have been working so far (i.e., the space of complex measures) is insufficient for our analysis. For further study of the weak UM scheme, a preferable approach would thus be to expand our framework by introducing the space of distributions. While this method has the great merit of allowing us to conduct our analysis with decent generality (and in fact, distributions have their role not just in this subsection, but also later in studying the quasi-joint-probability distributions in Sections 5 and 6), it has the drawback of being rather mathematically demanding, especially since the theory of distributions is built upon the results of general topology. In view of this, an alternative approach to the problem without direct exposure to the theory of distributions would be favourable. To this end, recalling the idea employed in the previous subsection, we concentrate only on the differentiability of the multiplicative product (3.133), setting aside the intricacies involving that of the map $t \mapsto \mu_t$ we have seen above. To see what we mean, we first expect, by combining (3.135) and (3.139), that the derivative of the map $t \mapsto \nu_{\mathrm{out}}(t)$ at the origin be written as
$$\left. \frac{d\nu_{\mathrm{out}}(t)}{dt} \right|_{t=0} = -\mathbb{E}[x; \mu] \, \nu_{\mathrm{in}} * D\delta_0.$$
Now, assuming a suitable differentiability condition on the density $\rho_{\mathrm{in}} := d\nu_{\mathrm{in}}/d\beta$ of the input $\nu_{\mathrm{in}}$ as a starting point, we employ an auxiliary smooth density function $\eta$ to symbolically express the delta distribution by the limit of its scaling $\delta_0 = \lim_{t\to 0} \eta_t$ (a similar technique is used in (3.141)) and formally obtain the 'density' of the convolution $\nu_{\mathrm{in}} * D\delta_0$ as
$$\lim_{t\to 0} \rho_{\mathrm{in}} * D\eta_t = \lim_{t\to 0} D(\rho_{\mathrm{in}} * \eta_t) = D\rho_{\mathrm{in}}. \tag{3.143}$$
Introducing the notation $D\nu_{\mathrm{in}}$ as defined in (3.128), we thus obtain
$$\left. \frac{d\nu_{\mathrm{out}}(t)}{dt} \right|_{t=0} = -\mathbb{E}[x; \mu] \, D\nu_{\mathrm{in}}. \tag{3.144}$$
The basic idea is that, while we have seen that the distributional derivative $D\delta_0$ of the delta does not allow itself to be expressed by a complex measure, the distributional derivative $D\nu_{\mathrm{in}}$ of some probability measure $\nu_{\mathrm{in}}$ might belong to the space $M_{\mathbb{C}}(\mathfrak{B})$ of complex measures$^{18}$.

$^{17}$ Incidentally, one may recall that the (higher-order) derivatives of the delta distribution appear in several branches of physics, one of the most familiar of which is presumably the theory of electromagnetism. The derivative $D\delta_0$ of the delta distribution is among the best-known examples of a distribution that cannot be expressed by a complex measure. In order to provide an intuitive reasoning with the tools at hand, let $\varphi$ be a smooth function with compact support (i.e., a test function) satisfying $(D\varphi)(0) = 1$. As a concrete example, one may take $\varphi(x) := x\varphi_0(x)$ with any test function $\varphi_0$ satisfying $\varphi_0(0) = 1$. Defining a sequence of test functions by $\varphi_n(x) := n^{-1}\varphi(nx)$, $n \in \mathbb{N}^{\times}$, observe that the dominated convergence theorem necessarily implies $\lim_{n\to\infty} \int_{\mathbb{R}} \varphi_n \, d\mu = 0$ for any complex measure $\mu \in M_{\mathbb{C}}(\mathfrak{B})$. On the other hand, with the help of an auxiliary smooth density function $\rho$ to symbolically express the delta distribution by the limit of its scaling $\delta_0 = \lim_{t\to 0} \rho_t$, one may formally compute the integral of $\varphi_n$ weighted by the 'density' $D\delta_0$ as
$$\int_{\mathbb{R}} \varphi_n \, (D\delta_0) \, d\beta = \lim_{t\to 0} \int_{\mathbb{R}} \varphi_n \, (D\rho_t) \, d\beta = -\lim_{t\to 0} \int_{\mathbb{R}} (D\varphi_n) \, \rho_t \, d\beta = -(D\varphi_n)(0), \tag{3.141}$$
where we have used integration by parts to obtain the second equality. This implies $\lim_{n\to\infty} \int_{\mathbb{R}} \varphi_n \, (D\delta_0) \, d\beta = \lim_{n\to\infty} -(D\varphi_n)(0) = -1$, which would lead to a contradiction if $D\delta_0$ were to be expressed by a complex measure.

$^{18}$ As one may expect, the distributional derivative $D\nu$ of an arbitrary complex measure $\nu$ can be made well-defined by extending our framework into the theory of generalised functions. In general, the derivative $D\nu$ is a distribution itself (as we have seen for the special case $\nu = \delta_0$), but not necessarily a complex measure anymore.
If we could moreover find a condition under which the differentiability (3.144) is valid with respect to the norm topology of the total variation (i.e., strongly differentiable), we could develop a line of argument that is totally confined to the space $M_{\mathbb{C}}(\mathfrak{B})$, without referring to the theory of distributions at all.
On the Main Results.
One finds below that the above idea is indeed valid. To this end, we assume:
• The probability measure $\mu$ has compact support.
• The density of $\nu_{\mathrm{in}}$ belongs to the Schwartz space, $d\nu_{\mathrm{in}}/d\beta \in \mathcal{S}(\mathbb{R})$.
Under the above two conditions, we demonstrate below that the map $t \mapsto \nu_{\mathrm{out}}(t)$ is in fact arbitrarily many times strongly differentiable, and that its higher-order derivatives read
$$\left. \frac{d^n \nu_{\mathrm{out}}(t)}{dt^n} \right|_{t=0} = (-D)^n \nu_{\mathrm{in}} * (x^n \odot \mu)_0 = \mathbb{E}[x^n; \mu] \, (-D)^n \nu_{\mathrm{in}}, \quad n \in \mathbb{N}_0, \tag{3.146}$$
at the origin $t = 0$. Here, $D^n \nu_{\mathrm{in}}$ denotes the signed measure defined in (3.128), and the signed measure $x^n \odot \mu$ is defined in (3.7). Note that our two conditions above, namely, the compactness of the support of $\mu = x^0 \odot \mu$ and the density of $\nu_{\mathrm{in}} = (-D)^0 \nu_{\mathrm{in}}$ belonging to the Schwartz space, hold not only for $n = 0$, but also for $x^n \odot \mu$ and $(-D)^n \nu_{\mathrm{in}}$ for all $n \in \mathbb{N}_0$. Note also that compactness of the support of $\mu$ guarantees the finiteness of all its higher-order moments $|\mathbb{E}[x^n; \mu]| < \infty$, $n \in \mathbb{N}_0$. Applying (3.146) to our physical situation by letting $\mu = \mu^{\phi}_{A}$ and $\nu_{\mathrm{in}} = \mu^{\psi}_{Q}$ would prove Proposition 3.5.
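Proof of our Main Result. For demonstration, we provide a sketch of the proof by mathematical induction. One may readily confirm by definition that the above statement is trivially true for $n = 0$. Now, assuming that the statement is true for $n \in \mathbb{N}_0$, we write $\tilde{\nu}_{\mathrm{in}} := (-D)^n \nu_{\mathrm{in}}$, $\tilde{\mu} := x^n \odot \mu$ and $\tilde{\nu}_{\mathrm{out}}(t) := \tilde{\nu}_{\mathrm{in}} * \tilde{\mu}_t$ for better readability. Recalling that the convolution algebra $L^1(\mathfrak{B})$ is an ideal in the measure algebra $M_{\mathbb{C}}(\mathfrak{B})$, one finds that $\tilde{\nu}_{\mathrm{out}}(t)$ is absolutely continuous for all $t \in \mathbb{R}$ (in passing, one moreover finds that the density of $\tilde{\nu}_{\mathrm{out}}(t)$ is also a Schwartz function), and that its density $\tilde{\rho}_{\mathrm{out}}(t) := d\tilde{\nu}_{\mathrm{out}}(t)/d\beta$ is given by
$$\tilde{\rho}_{\mathrm{out}}(t)(x) = \int_{\mathbb{R}} \tilde{\rho}_{\mathrm{in}}(x - ta) \, d\tilde{\mu}(a),$$
where $\tilde{\rho}_{\mathrm{in}}$ denotes the density of $\tilde{\nu}_{\mathrm{in}}$ (see (3.24) for this result). In order to prove the strong differentiability of the map $t \mapsto \tilde{\nu}_{\mathrm{out}}(t)$, we work in the space of density functions. We start by demonstrating the point-wise differentiability of the map $t \mapsto \tilde{\rho}_{\mathrm{out}}(t)$. To this end, we fix $t_0, x \in \mathbb{R}$ and observe
$$\left. \frac{d\tilde{\rho}_{\mathrm{out}}(t)(x)}{dt} \right|_{t=t_0} = \lim_{t\to t_0} \int_{\mathbb{R}} \frac{\tilde{\rho}_{\mathrm{in}}(x - ta) - \tilde{\rho}_{\mathrm{in}}(x - t_0 a)}{t - t_0} \, d\tilde{\mu}(a) = \int_{\mathbb{R}} (-a) \, (D\tilde{\rho}_{\mathrm{in}})(x - t_0 a) \, d\tilde{\mu}(a), \tag{3.148}$$
where the exchange of the limit and integration in the second equality, while we omit the details of its proof, is essentially a consequence of the dominated convergence theorem.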
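Next, we return to its strong differentiability (i.e., differentiability with respect to the $L^1$-norm). To this end, we assume $t_0 < t$ without loss of generality and recall the mean-value theorem, which states that there exists a $t_1 \in \, ]t_0, t[$ such that
$$\frac{\tilde{\rho}_{\mathrm{out}}(t)(x) - \tilde{\rho}_{\mathrm{out}}(t_0)(x)}{t - t_0} = \left. \frac{d\tilde{\rho}_{\mathrm{out}}(s)(x)}{ds} \right|_{s=t_1}$$
holds. Then, one obtains an upper bound on the $L^1$-norm of the difference between the difference quotient and the point-wise derivative (3.148) in terms of translates of $D\tilde{\rho}_{\mathrm{in}}$, where the exchange of the order of integration in the last inequality is guaranteed to hold (Fubini's theorem), and the translation operator $\tau_a$ is defined in (3.119). Compactness of the support of $\tilde{\mu}$, together with an argument analogous to that made in (3.117), implies that this upper bound tends to 0 as $t \to t_0$, which completes our proof of strong differentiability. We thus have by (3.148)
$$\left. \frac{d\tilde{\nu}_{\mathrm{out}}(t)}{dt} \right|_{t=0} = (-D)\tilde{\nu}_{\mathrm{in}} * (x \odot \tilde{\mu})_0 = \mathbb{E}[x^{n+1}; \mu] \, (-D)^{n+1} \nu_{\mathrm{in}},$$
where we have used (3.94) and $(x^{n+1} \odot \mu)(\mathbb{R}) = \mathbb{E}[x^{n+1}; \mu]$ in the last equality. This completes the proof.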
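The first-order case of the main result can be illustrated numerically. The sketch below uses illustrative assumptions that are not taken from the text: $\mu = \tfrac14\delta_{-1} + \tfrac34\delta_{2}$ as a compactly supported probability measure, the scaling convention $\mu_t = \sum_i p_i \delta_{t a_i}$, and a standard Gaussian density for $\nu_{\mathrm{in}}$ (a Schwartz function). It compares the difference quotient of the output density with the predicted derivative $\mathbb{E}[x;\mu](-D)\rho_{\mathrm{in}}$ in an approximate $L^1$-norm.

```python
import math

ATOMS = (-1.0, 2.0)   # support points a_i of the illustrative measure mu
PROBS = (0.25, 0.75)  # their weights p_i, summing to one

def rho_in(x):
    # Density of nu_in: a standard Gaussian, which is a Schwartz function.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def rho_out(x, t):
    # Density of nu_in * mu_t, assuming mu_t = sum_i p_i * delta_{t * a_i}.
    return sum(p * rho_in(x - t * a) for a, p in zip(ATOMS, PROBS))

def predicted_derivative(x):
    # E[x; mu] * (-D rho_in)(x); for the Gaussian, (-D rho_in)(x) = x * rho_in(x).
    mean = sum(p * a for a, p in zip(ATOMS, PROBS))
    return mean * x * rho_in(x)

def l1_gap(t, half_width=12.0, dx=0.005):
    # Riemann-sum approximation of the L1 distance between the difference
    # quotient (rho_out(t) - rho_out(0)) / t and the predicted derivative.
    n = int(round(2 * half_width / dx))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * dx
        quotient = (rho_out(x, t) - rho_out(x, 0.0)) / t
        total += abs(quotient - predicted_derivative(x))
    return total * dx

gaps = [l1_gap(t) for t in (0.1, 0.01, 0.001)]
for t, gap in zip((0.1, 0.01, 0.001), gaps):
    print(f"t = {t:6.3f}   approximate L1 gap = {gap:.5f}")
```

The gap decreases roughly in proportion to $t$, as one would expect for a map that is strongly differentiable at the origin with the stated derivative. A discrete $\mu$ is chosen deliberately, since the result does not require $\mu$ to be absolutely continuous.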

Conditioned Measurement I: In Terms of Conditional Expectations
We shall next embark on our study of the measurement scheme that we call the conditioned measurement (CM) scheme. As the name indicates, the CM scheme involves conditioning, where one employs the measurement of another observable on the target system on top of the UM scheme studied earlier. The CM scheme can be understood as a natural generalisation of the post-selected measurement scheme, which has recently been attracting much attention from several groups in the physics community. While the post-selected measurement scheme itself has been practised for quite a while, it has attracted renewed interest since Aharonov et al. reintroduced it under the term weak measurement, which in particular applies to the post-selected measurement in the weak limit, along with the complex quantity termed the weak value that is purported to be measured by it. The two sections starting here are devoted to the analysis of the CM scheme. Following the same line as that of the unconditioned counterpart, we start by examining the measurement scheme in terms of conditional expectations (Section 4), and subsequently in terms of conditional probabilities (Section 5).
Organisation of this Section. The contents of this section are organised as follows. We first provide a concise summary of some of the necessary mathematical concepts that provide us with the tools for conducting the analysis. We then briefly review the CM scheme within a relatively general framework, and make some comments on the technique of employing conditioning (or post-selection, as a special case) in precision measurements, whose alleged advantages have recently become the topic of intensive debate. We shall then investigate how one could reclaim the information of the configuration of the target system from the measured outcomes, and to this end, we concentrate on the behaviour of the conditional expectation of the meter observable around the weak limit $g = 0$ of the interaction parameter. In parallel to the unconditioned case, we call this procedure the weak conditioned measurement scheme in this paper. We finally close this section by introducing the concept of conditional quasi-expectations of a quantum observable given another (not necessarily simultaneously measurable) observable, as a generalisation of the standard conditional expectations, and examine some of their notable properties.

Reference Materials
In this subsection, we shall briefly recall the necessary mathematical definitions and results regarding the formal mathematical description of conditioning.
4.1.1. Conditioning. The essence of the CM scheme lies in the conditioning of the outcomes of a measurement of an observable X of the meter system K by that of an additional observable B of the target system H. The quantity of interest is then the conditional expectation of X given B, in contrast to the UM scheme described in Section 2, where the quantity of interest was the mere (unconditional) expectation value of X.
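Since one may find the general definition of conditional expectations to be rather involved, we start with some preliminary discussion in order to ease the introduction. Let $(\mathbb{R}^n, \mathfrak{B}^n, \mu)$ be a probability space, and let $f : \mathbb{R}^n \to \mathbb{R}$ be $\mu$-integrable. Given a Borel set $B \in \mathfrak{B}^n$ with non-vanishing probability $\mu(B) \neq 0$, one defines the conditional expectation of $f$ given the measurable set $B$ by the real number
$$\mathbb{E}[f|B] := \frac{1}{\mu(B)} \int_B f \, d\mu. \tag{4.1}$$
Next, let
$$\mathbb{R}^n = \bigcup_{i=1}^{N} B_i$$
be a decomposition of $\mathbb{R}^n$ into a finite number of mutually disjoint Borel sets, and let $E := \{B_i\}_{i=1,\dots,N}$ denote their collection. We then define
$$\mathfrak{A} := \sigma(E) \tag{4.2}$$
to be the sub-σ-algebra of $\mathfrak{B}^n$ generated by $E$. Assuming $\mu(B_i) \neq 0$ for all $i = 1, \dots, N$, this gives rise to an $\mathfrak{A}$-measurable function
$$\mathbb{E}[f|\mathfrak{A}] := \sum_{i=1}^{N} \mathbb{E}[f|B_i] \, \chi_{B_i}, \tag{4.3}$$
where each $\chi_{B_i}$ is the characteristic function of the subset $B_i$. Observing that each element $A \in \mathfrak{A}$ can be expressed as a union of elements of $E$, one has
$$\int_A f \, d\mu = \int_A \mathbb{E}[f|\mathfrak{A}] \, d\mu|_{\mathfrak{A}}, \quad A \in \mathfrak{A},$$
where $\mu|_{\mathfrak{A}}$ denotes the restriction of the probability measure $\mu$ to the sub-σ-algebra $\mathfrak{A}$.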
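Guided by this observation, the conditional expectation of an integrable function $f$ given a sub-σ-algebra $\mathfrak{A}$ is defined in the following manner:

Definition (Conditional expectation given a sub-σ-algebra). Let $(\mathbb{R}^n, \mathfrak{B}^n, \mu)$ be a probability space. For a sub-σ-algebra $\mathfrak{A} \subset \mathfrak{B}^n$ and a $\mu$-integrable function $f : \mathbb{R}^n \to \mathbb{R}$, the conditional expectation of $f$ given $\mathfrak{A}$, denoted by $\mathbb{E}[f|\mathfrak{A}]$, is defined as a $\mu|_{\mathfrak{A}}$-integrable function satisfying
$$\int_A f \, d\mu = \int_A \mathbb{E}[f|\mathfrak{A}] \, d\mu|_{\mathfrak{A}}, \quad A \in \mathfrak{A}. \tag{4.5}$$
The conditional expectation $\mathbb{E}[f|\mathfrak{A}]$ exists, and is unique $\mu|_{\mathfrak{A}}$-a.e.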
To see the validity of the definition, first observe that the l. h. s. of (4.5) defines a complex measure $A \mapsto (f \odot \mu)(A)$, $A \in \mathfrak{A}$. Since $(f \odot \mu)|_{\mathfrak{A}} \ll \mu|_{\mathfrak{A}}$, the Radon-Nikodým theorem leads to the existence and uniqueness $\mu|_{\mathfrak{A}}$-a.e. of the conditional expectation
$$\mathbb{E}[f|\mathfrak{A}] = \frac{d (f \odot \mu)|_{\mathfrak{A}}}{d\mu|_{\mathfrak{A}}},$$
which is nothing but the Radon-Nikodým derivative (density) of the restriction $(f \odot \mu)|_{\mathfrak{A}}$ with respect to the restriction $\mu|_{\mathfrak{A}}$. Note that the conditional expectation is defined as a function (or more precisely, an equivalence class of functions) rather than a mere number. The elementary definition (4.3) mentioned earlier is in fact a special case of the above general definition, in which the sub-σ-algebra concerned is given by (4.2). The conditional expectation $\mathbb{E}[f|\mathfrak{A}]$ serves, so to speak, as the best approximation of the original function $f$ by measurable functions defined on the coarser$^{19}$ σ-algebra $\mathfrak{A} \subset \mathfrak{B}^n$.
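The elementary construction (4.3) and the defining property (4.5) can be verified exactly on a small example. The sketch below assumes an illustrative discrete probability measure on six points of the real line, the function $f(x) = x^2$, and a partition of $\mathbb{R}$ into three intervals; none of these choices comes from the text. Exact rational arithmetic is used so that the two integrals can be compared for equality on every set of the generated σ-algebra.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative discrete probability measure mu = sum_k w_k * delta_{x_k}.
POINTS = [-2, -1, 0, 1, 2, 3]
WEIGHTS = [Fraction(1, 12), Fraction(1, 6), Fraction(1, 4),
           Fraction(1, 4), Fraction(1, 6), Fraction(1, 12)]

def f(x):
    return x * x

def cell(x):
    # Index of the partition cell: B_0 = (-inf, 0), B_1 = [0, 2), B_2 = [2, inf).
    if x < 0:
        return 0
    return 1 if x < 2 else 2

def integral(func, cells):
    # Integral of func with respect to mu over the union of the given cells.
    return sum(w * func(x) for x, w in zip(POINTS, WEIGHTS) if cell(x) in cells)

# E[f | B_i] = (1 / mu(B_i)) * integral of f over B_i, cf. (4.1).
cond = [integral(f, {i}) / integral(lambda x: 1, {i}) for i in range(3)]

def cond_exp(x):
    # The simple function sum_i E[f | B_i] * chi_{B_i}, cf. (4.3).
    return cond[cell(x)]

# The generated sigma-algebra consists of all unions of cells (2^3 sets).
all_sets = [set(c) for r in range(4) for c in combinations(range(3), r)]
defining_property_holds = all(
    integral(f, A) == integral(cond_exp, A) for A in all_sets
)
print(cond, defining_property_holds)
```

The conditional expectation is constant on each cell, and its integral agrees with that of $f$ on every union of cells, which is the content of (4.5) for a σ-algebra generated by a finite partition.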

Conditional Expectation given another Function.
We next recall the definition of the conditional expectation given another real measurable function. As above, we first provide an introductory argument. Let $(\mathbb{R}^n, \mathfrak{B}^n, \mu)$ be a probability space, and let $f : \mathbb{R}^n \to \mathbb{R}$ be $\mu$-integrable. Given another measurable function $g : \mathbb{R}^n \to \mathbb{R}$, suppose that the probability of obtaining the outcome $y \in \mathbb{R}$ of $g$ is non-vanishing, $\mu(g^{-1}(y)) \neq 0$. In a similar manner as before, one may define the conditional expectation of $f$ given the outcome $y$ of $g$ as
$$\mathbb{E}[f|g = y] := \frac{1}{\mu(g^{-1}(y))} \int_{g^{-1}(y)} f \, d\mu,$$
where we have just replaced $B = g^{-1}(y)$ in (4.1). It is now tempting to construct a function $y \mapsto \mathbb{E}[f|g = y]$ that maps each of the possible outcomes of $g$ to the corresponding conditional expectation. Assuming that the function $g$ only takes a finite number of distinct outcomes $\{y_i\}_{i=1,\dots,N}$, $y_i \in \mathbb{R}$, one accordingly obtains a decomposition $\mathbb{R}^n = \bigcup_{i=1}^{N} g^{-1}(y_i)$ of $\mathbb{R}^n$ into a finite number of mutually disjoint Borel sets. Assuming moreover that $\mu(g^{-1}(y_i)) \neq 0$ for all $i$, one obtains a well-defined measurable function
$$\mathbb{E}[f|g] := \sum_{i=1}^{N} \mathbb{E}[f|g = y_i] \, \chi_{\{y_i\}},$$
called the conditional expectation of $f$ given $g$.
To see how this relates to the previous definition of the conditional expectation given a sub-σ-algebra, consider a general situation in which one is given a set $X$ (without a σ-algebra), a measurable space $(Y, \mathfrak{A})$ and a function $g : X \to Y$. The collection
$$I(g) := g^{-1}(\mathfrak{A}) = \{g^{-1}(A) : A \in \mathfrak{A}\}$$
makes itself into a σ-algebra, called the initial σ-algebra on $X$ with respect to $g$, and it is the coarsest σ-algebra on $X$ for which the map $g$ is measurable. In the above situation, we take $(Y, \mathfrak{A}) = (\mathbb{R}, \mathfrak{B}^1)$ and define
$$I(g) := g^{-1}(\mathfrak{B}^1) = \sigma(E),$$
where we have let $E := \{g^{-1}(y_i)\}_{i=1,\dots,N}$. Now, since we have assumed that $\mu(g^{-1}(y_i)) \neq 0$ for all $i$, the conditional expectation of $f$ given $I(g)$ can be expressed as
$$\mathbb{E}[f|I(g)] = \mathbb{E}[f|\sigma(E)] = \sum_{i=1}^{N} \mathbb{E}[f|g^{-1}(y_i)] \, \chi_{g^{-1}(y_i)},$$
where the last equality is due to (4.3) by replacing $B_i = g^{-1}(y_i)$. It is then fairly straightforward to see that the conditional expectations $\mathbb{E}[f|I(g)]$, $\mathbb{E}[f|g]$ and the conditioning function $g$ are related to one another through a commutative diagram, i.e., $\mathbb{E}[f|I(g)] = \mathbb{E}[f|g] \circ g$, where each of the functions is measurable. In this sense, the function $\mathbb{E}[f|g]$ is understood to be nothing but the factorisation of $\mathbb{E}[f|I(g)]$ by $g$. The validity of this observation in the general case is guaranteed by the following Factorisation Theorem.
Theorem (Factorisation Theorem). Let $X$ be a non-empty set, and let $I(g) := g^{-1}(\mathfrak{B})$ be the initial σ-algebra of a map $g : X \to (Y, \mathfrak{B})$. A function $h : (X, I(g)) \to (\mathbb{R}, \mathfrak{B}^1)$ is measurable if and only if there exists a measurable function $\tilde{h} : (Y, \mathfrak{B}) \to (\mathbb{R}, \mathfrak{B}^1)$ that makes the diagram commute, i.e., $h = \tilde{h} \circ g$.
By letting $(Y, \mathfrak{B}) = (\mathbb{R}, \mathfrak{B}^1)$ and $h = \mathbb{E}[f|I(g)]$, this guarantees the existence of the function $\mathbb{E}[f|g] := \tilde{h}$ that makes the desired diagram commute, even in the general case.
As for the integrability of the conditional expectation $\mathbb{E}[f|g]$, we first observe that the probability of obtaining the outcome of $g$ in a Borel set $B \in \mathfrak{B}^1$ is dictated by the probability measure
$$g(\mu)(B) := \mu(g^{-1}(B)), \quad B \in \mathfrak{B}^1,$$
which is nothing but the image measure of $\mu$ with respect to $g$ (see (3.9) for its definition and properties). One thus sees by the formula
$$\int_{\mathbb{R}} \mathbb{E}[f|g] \, dg(\mu) = \int_{\mathbb{R}^n} \mathbb{E}[f|g] \circ g \, d\mu = \int_{\mathbb{R}^n} f \, d\mu$$
that the function $\mathbb{E}[f|g]$ is $g(\mu)$-integrable, and its expectation value coincides with the expectation value of $f$ under $\mu$, which is what one naturally expects. Guided by the above observation, the conditional expectation of an integrable function $f$ given another measurable function $g$ is defined in the following manner:

Definition (Conditional expectation given a measurable function). Let $(\mathbb{R}^n, \mathfrak{B}^n, \mu)$ be a probability space, and let $f : \mathbb{R}^n \to \mathbb{R}$ be $\mu$-integrable. The conditional expectation of $f$ given a measurable function $g : \mathbb{R}^n \to \mathbb{R}$, denoted by $\mathbb{E}[f|g]$, is defined as a $g(\mu)$-integrable function that makes the corresponding diagram on $(\mathbb{R}^n, I(g), \mu|_{I(g)})$ commute, i.e., $\mathbb{E}[f|I(g)] = \mathbb{E}[f|g] \circ g$. Its existence and uniqueness $g(\mu)$-a.e. is known to be guaranteed.

We also write
$$\mathbb{E}[f|g = y] := \mathbb{E}[f|g](y)$$
to denote the conditional expectation of $f$ given the outcome $y$ of $g$. Note that this definition depends on the choice of the representative and may admit ambiguity. Indeed, for a choice of $y \in \mathbb{R}$ for which the probability of obtaining the outcome of $g$ in $\{y\}$ vanishes, $g(\mu)(\{y\}) = \mu(g^{-1}(\{y\})) = 0$, one sees that $\mathbb{E}[f|g = y]$ is indefinite and may take any real number. As this example shows, the conditional expectation $\mathbb{E}[f|g]$ of $f$ given $g$ is appropriately viewed as an equivalence class of integrable functions, rather than a single function.
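The factorisation $\mathbb{E}[f|I(g)] = \mathbb{E}[f|g] \circ g$ and the equality of the two expectation values can be checked exactly on a small example. The sketch below assumes an illustrative discrete probability measure on six points, with $f(x) = x$ and $g(x) = x^2$, so that $g$ takes finitely many outcomes, each with non-vanishing probability. These choices are hypothetical and serve only to make the definitions concrete.

```python
from fractions import Fraction

# Illustrative discrete probability measure mu = sum_k w_k * delta_{x_k}.
POINTS = [-2, -1, 0, 1, 2, 3]
WEIGHTS = [Fraction(1, 12), Fraction(1, 6), Fraction(1, 4),
           Fraction(1, 4), Fraction(1, 6), Fraction(1, 12)]

def f(x):
    return x

def g(x):
    return x * x

outcomes = sorted({g(x) for x in POINTS})

def image_measure(y):
    # g(mu)({y}) = mu(g^{-1}({y})), the probability of the outcome y of g.
    return sum(w for x, w in zip(POINTS, WEIGHTS) if g(x) == y)

def cond_given_outcome(y):
    # E[f | g = y] = (1 / mu(g^{-1}(y))) * integral of f over g^{-1}(y).
    numerator = sum(w * f(x) for x, w in zip(POINTS, WEIGHTS) if g(x) == y)
    return numerator / image_measure(y)

# E[f | g] as a function on the set of outcomes of g.
cond_f_given_g = {y: cond_given_outcome(y) for y in outcomes}

def cond_f_given_initial_sigma_algebra(x):
    # Factorisation: E[f | I(g)](x) = E[f | g](g(x)).
    return cond_f_given_g[g(x)]

expectation_f = sum(w * f(x) for x, w in zip(POINTS, WEIGHTS))
expectation_cond = sum(cond_f_given_g[y] * image_measure(y) for y in outcomes)
print(cond_f_given_g, expectation_f, expectation_cond)
```

The function on outcomes is integrated against the image measure $g(\mu)$, and the result coincides with the expectation value of $f$ under $\mu$. Outcomes of probability zero do not occur in this example, so the ambiguity mentioned above does not arise here.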
Conditioning by Simultaneously Measurable Observables.
As in the previous section, we occasionally denote the Borel sets on $\mathbb{R}^n$ by $\Delta \in \mathcal{B}^n$ in place of $B$ for readability, especially in the context of quantum theory, where the symbol $B$ may be confused with that of an operator. Let $A$ and $B$ be a pair of simultaneously measurable observables on a quantum system $\mathcal{H}$. We have seen that this yields a probability measure $\mu^\phi_{A,B}$ on $(\mathbb{R}^2, \mathcal{B}^2)$ (cf. (3.69)), which is interpreted as the joint-probability distribution describing the outcomes of a simultaneous measurement of $A$ and $B$ performed on the quantum system in the state $|\phi\rangle \in \mathcal{H}$. Letting $f(a, b) = \pi_A(a, b) := a$ and $g(a, b) = \pi_B(a, b) := b$ describe the measurement outcomes of the observables $A$ and $B$, respectively, we shall briefly see below how the previous discussion on conditioning fits in the context of quantum mechanics.

For our purpose, assume $|\phi\rangle \in \mathrm{dom}(A)$ so that the projection $\pi_A(a, b) = a$ is integrable with respect to $\mu^\phi_{A,B}$. Observing that the image measure of $\mu^\phi_{A,B}$ with respect to the second projection $\pi_B$ is nothing but the probability measure $\mu^\phi_B$ describing the outcome of $B$, we define the conditional expectation $E[A|B; \phi]$ of an observable $A$ given $B$ on the state $|\phi\rangle$ as the (equivalence class of) function(s)
$$ E[A|B; \phi] := E[\pi_A|\pi_B], $$
where the r.h.s. is the conditional expectation of $\pi_A$ given $\pi_B$ under the probability measure $\mu^\phi_{A,B}$. Under the same assumption, we analogously define the conditional expectation of an observable $A$ given the outcome $b$ of an observable $B$ on the state $|\phi\rangle \in \mathrm{dom}(A)$ by
$$ E[A|B = b; \phi] := E[\pi_A|\pi_B = b]. \tag{4.21} $$
We note again that the last definition incorporates some ambiguity, in that the number $E[A|B = b; \phi]$ is not well-defined in the case where the probability that the measurement of $B$ yields the outcome $b$ vanishes.

Conditioned Measurement
The CM scheme incorporates the measurements of two observables, where the experimenter measures one local observable on the meter system and the other on the target system. In this paper, we generally define the CM scheme as the act of measuring the conditional expectation
$$ E[X|B; \Psi_g] \tag{4.22} $$
of an observable $X$, for the choice of either $X = Q$ or $X = P$ of the meter system, given another observable $B$ of the target system. Here, for better readability, we have slightly abused the notation by writing $X$ instead of $I \otimes X$ and $B$ for $B \otimes I$. We emphasise again that the conditional expectation (4.22) is defined as an equivalence class of functions that are integrable with respect to the probability measure which describes the behaviour of the outcome of the measurement of the local observable $B$ on the target system. Here, we have introduced the density matrix on the target system defined in a parallel manner as in (2.71). For its well-definedness, we note the following statement for reference.
be the choice of the initial states of the target and meter systems. Then, the conditional expectation $E[X|B; \Psi_g]$ is well-defined over the whole range of the interaction parameter $g \in \mathbb{R}$.
Proof. For the demonstration, we only refer to Proposition 2.2, which guarantees the integrability of the outcomes of the measurement of $X$ (i.e., $|E[I \otimes X; \Psi_g]| < \infty$) over the whole range of $g \in \mathbb{R}$ under the conditions assumed.
Post-selected Measurement.
As a special subclass of this measurement scheme, we use the term post-selected measurement scheme to refer to the case where the conditioning observable $B = |\phi'\rangle\langle\phi'|$ happens to be a projection on a one-dimensional subspace of $\mathcal{H}$ spanned by some normalised vector $|\phi'\rangle \in \mathcal{H}$, and in such a case, the act of conditioning will occasionally be referred to as post-selection. It is also common practice in the literature to call the state $|\phi\rangle$ prepared prior to the measurement the initial or the pre-selected state, and the normalised vector $|\phi'\rangle$ spanning the image of the one-dimensional projection $B = |\phi'\rangle\langle\phi'|$ the final or the post-selected state.
4.2.1. Topic: 'Amplification Technique' by Conditioning.
It is widely known that, in general, the range of the conditional expectation may exceed the (unconditional) expectation value, i.e., for some clever choice of the conditioning observable $B$ and its outcome $b \in \mathbb{R}$, the conditional expectation given $B = b$ exceeds the unconditional expectation value, with the outcome $b$ occurring with non-vanishing probability. Clearly, this property should prove useful in certain situations.
While this property has occasionally been utilised in experiments, it has recently caught wide attention due to the reports on the success of application in precision measurements, including the experimental detection of the spin-Hall effect of light (SHEL) in 2008 [33], and the detection of an ultra-sensitive beam deflection in a Sagnac interferometer in 2009 [34]. The experiments have effectively utilised the technique of conditioning (or post-selection) to yield an enhancement (or 'amplification') of an extremely small beam displacement to the extent that it is large enough to overcome various technical imperfections (noise level), and eventually realising significant detection of such tiny effects. In this context, this technique has often been referred to as the 'weak value amplification' or as 'Aharonov-Albert-Vaidman effect' of amplification [5].

Review of the Recent Theoretical Analyses.
Extensive theoretical analyses have been conducted in recent years from various viewpoints on the technical advantages of the technique of post-selection over the conventional unconditioned counterpart. Some of them addressed the question of signal amplification and its limit, asking to what extent one can amplify the signal [35] and how one could achieve the optimisation [36]; the question of the existence of the limit of amplification will be addressed shortly in a more general framework. As far as the authors are aware, the first sound analytic result appeared around 2012 [37], in which the limit to the amplification rate, as well as to the signal-to-noise ratio, was explicitly presented. The computation was conducted for the special case where the observable $A$ fulfils the condition $A^2 = I$ and the meter wave functions were assumed to be Gaussian, which we shall also address in a relatively more general setting later in this section, and also in Appendix A.
Others focused on the statistical loss which occurs due to the post-selection, and examined the feasibility of improving the parameter estimation of the coupling constant $g$ by post-selection based on estimation theory (for a concise review of the topic from this point of view, see [38]). The result is that post-selection statistically deteriorates the quality of estimation, both in the case where ideal noiseless experiments can be performed [39], and also in some cases where certain types of fully-known or controllable noise are present [40][41][42]. In an attempt to address the question of how the post-selection technique, while being statistically inferior to the unconditioned case, could be advantageous in realistic experiments, the authors have conducted a theoretical analysis of post-selected measurement in the presence of some intractable 'measurement uncertainty', a relatively modern concept in metrology expressing unknown or uncontrollable sources of technical imperfections [43]. It was then found that, while post-selection suffers from statistical deterioration, in certain cases the amplification effect becomes favourable in overcoming the unknown/uncontrollable sources of technical imperfections that one could not completely eliminate through 'noise hunting', and that accordingly cannot be reduced by statistical repetition. This suggests that the post-selection technique should be understood as the practice of taking advantage of the trade-off between the reduced contribution from intractable sources of measurement uncertainty due to its signal amplification effect, and the statistical deterioration caused by the decrease in success probability.

Topic: 'Limit of Amplification' in Terms of Essential Suprema.
In what follows, we provide a fairly general result regarding the question of the 'limit of amplification' by conditioning, which has been one of the most actively discussed topics in the study of the technical advantages of employing conditioning in experiments. A typical way to address this problem is to ask to what extent one could enlarge the conditional expectation $E[X|B; \Psi_g]$ by choosing an appropriate conditioning observable $B$ and its outcome $b \in \sigma(B)$ with non-vanishing probability. By recalling the definition of the essential supremum of a function (2.20), one realises that the question is equivalent to asking to what extent one could make the essential supremum of the conditional expectation large by the choice of the conditioning observable $B$.

Preliminaries.
To prepare for our arguments, we first observe some basic facts regarding absolute continuity and essential suprema.
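Let $(X, \mathcal{A}, \mu)$ be a probability space, and let $\nu : \mathcal{A} \to \mathbb{C}$ be a complex measure. Then, the following conditions are equivalent:
(i) $\nu$ is absolutely continuous with respect to $\mu$;
(ii) the total variation $|\nu|$ is absolutely continuous with respect to $\mu$;
(iii) there exists $0 \le M \le \infty$ such that
$$ |\nu(A)| \le M \cdot \mu(A) \tag{4.27} $$
holds for all $A \in \mathcal{A}$.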
In such a case, the Radon-Nikodým derivative $d\nu/d\mu$ exists by the Radon-Nikodým theorem, and its essential supremum $\|d\nu/d\mu\|_\infty$ gives the smallest of such $M$ that satisfies (4.27).
Proof. For the equivalence (i) ⇔ (ii), the reader is referred to any textbook on measure and integration theory. We already know from the Reference Material in Section 3.1 that $|\nu(A)| \le |\nu|(A)$, $A \in \mathcal{A}$. The implication (ii) ⇒ (iii) is then trivial by simply taking $M = \infty$. The converse (iii) ⇒ (ii) is also immediate by the definition of absolute continuity. Now that we have proved the equivalence of the three conditions, we move on to the demonstration of the final statement. To this end, first observe the evaluation which contradicts the minimality of $|\nu|$.
As a corollary to this, the following observation is of special interest.

Corollary 4.3 (Conditional Expectations and Essential Suprema).
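Let $(X, \mathcal{A}, \mu)$ be a probability space, $f : X \to \mathbb{R}$ be $\mu$-integrable, and $\mathcal{B} \subset \mathcal{A}$ be a sub-σ-algebra. Then the evaluation
$$ \|E[f|\mathcal{B}]\|_\infty \le \|f\|_\infty $$
holds. As a direct consequence, if moreover a measurable function $g : X \to \mathbb{R}$ is given, the evaluation
$$ \|E[f|g]\|_\infty \le \|f\|_\infty $$
naturally holds.
Proof. First recall that the conditional expectation $E[f|\mathcal{B}]$ is nothing but the Radon-Nikodým derivative of the complex measure $f \odot \mu$ (restricted to $\mathcal{B}$) with respect to the restriction $\mu|_{\mathcal{B}}$. Letting $\nu := f \odot \mu$ and replacing $\mu$ by $\mu|_{\mathcal{B}}$ in the above Lemma, one finds that $M = \|f\|_\infty$ satisfies (4.27), since $|\nu(A)| = |\int_A f\, d\mu| \le \|f\|_\infty \cdot \mu(A)$ for all $A \in \mathcal{B}$. Hence $\|E[f|\mathcal{B}]\|_\infty \le \|f\|_\infty$, which was to be demonstrated.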
In casual language, this is to say that each value of the conditional expectation of f never exceeds the maximum number that f takes under a given probability measure, which is a result that should be intuitively clear. As a direct application of the result in the context of quantum measurement of a pair of simultaneously measurable observables A and B, this reduces to the following.
Corollary 4.4. Given a pair of strongly commuting self-adjoint operators $A$ and $B$ and a fixed state $|\phi\rangle \in \mathrm{dom}(A)$, the essential supremum of the conditional expectation of $A$ given $B$ is bounded as
$$ \|E[A|B; \phi]\|_\infty \le \|A\|^\phi_\infty, $$
where $\|A\|^\phi_\infty := \|a\|_\infty$ denotes the essential supremum of the measurable function $a \mapsto a$ under the probability measure $\mu^\phi_A$ describing the behaviour of the outcome of the measurement of $A$ on the state $|\phi\rangle$. If $A$ happens to be bounded, its operator norm$^{20}$ $\|A\|$ becomes the universal (i.e., state-independent) upper bound of $\|A\|^\phi_\infty$, hence
$$ \|E[A|B; \phi]\|_\infty \le \|A\| $$
holds for all $|\phi\rangle \in \mathcal{H}$.

$^{20}$ For a bounded operator $X$, recall that the operator norm of $X$ is defined by $\|X\| := \sup\{\|X\phi\| : \|\phi\| = 1\}$. (4.35)

Proof. The former part of the statement is immediate by Corollary 4.3. For the latter part, we first recall that the numerical range of a self-adjoint operator $X$ is defined as
$$ W(X) := \{\langle\psi, X\psi\rangle : |\psi\rangle \in \mathrm{dom}(X),\ \|\psi\|^2 = 1\}, \tag{4.37} $$
which is nothing but the collection of all possible expectation values of $X$. Now, a direct application of the Cauchy-Schwarz inequality leads to $|\langle\psi, X\psi\rangle| \le \|X\psi\| \cdot \|\psi\| \le \|X\|$ for every unit vector $|\psi\rangle$, i.e., $\sup\{|w| : w \in W(X)\} \le \|X\|$ for bounded $X$, and by recalling the basic relation $\sigma(X) \subset \overline{W(X)}$, where the overline on $W(X)$ denotes its topological closure, one concludes $\|A\|^\phi_\infty \le \sup\{|a| : a \in \sigma(A)\} \le \|A\|$, which was to be demonstrated.
The latter part of the statement says that the conditional expectations of a bounded observable have a universal upper bound given by its operator norm, which is also a result that should be intuitively clear.
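The bound of Corollary 4.4 can be checked directly in a finite-dimensional toy case. The sketch below is an illustration only: the commuting pair is modelled by two diagonal matrices on $\mathbb{C}^4$, and the eigenvalues, the state and the helper `cond_exp` are hypothetical choices, not taken from the text.

```python
import numpy as np

# Hypothetical commuting pair on C^4: both operators are diagonal in the same basis.
a = np.array([-1.0, 0.5, 2.0, 3.0])       # eigenvalues of A
b = np.array([0.0, 0.0, 1.0, 1.0])        # eigenvalues of B (each two-fold degenerate)
phi = np.array([0.5, 0.5, 0.5, 0.5], dtype=complex)   # normalised state

# Joint distribution of (A, B) on the common eigenbasis.
weights = np.abs(phi) ** 2

def cond_exp(outcome):
    """E[A | B = outcome; phi] for the commuting pair above."""
    mask = b == outcome
    return float((a[mask] * weights[mask]).sum() / weights[mask].sum())

values = {float(o): cond_exp(o) for o in np.unique(b)}

# Essential supremum of |a| under mu_A^phi, and the operator norm of A.
ess_sup_A = float(np.abs(a[weights > 0]).max())
op_norm_A = float(np.abs(a).max())
```

In this example every conditional expectation stays within the essential supremum of $A$, which in turn stays within the operator norm, as the corollary asserts for simultaneously measurable pairs.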
On the 'Limit of Amplification' by Conditional Measurement.
As a direct application of the above corollary to our problem, we obtain the main result of this passage.

Proposition 4.5 (Amplification by Conditioning).
Under the framework of the CM scheme, the essential supremum of the conditional expectation of $X$ given $B$ is never greater than that of the UM scheme of $X$,
$$ \|E[X|B; \Psi_g]\|_\infty \le \|X\|^{\psi_g}_\infty, $$
where $\|X\|^{\psi_g}_\infty := \|x\|_\infty$ denotes the essential supremum of $x$ under the probability measure $\mu^{\psi_g}_X$ describing the behaviour of the outcome of the local measurement of $X$ on the meter system. In other words, $\|X\|^{\psi_g}_\infty$ gives the (conditioning-observable-independent) upper bound to the extent to which the conditional expectation can be 'amplified' by means of conditioning$^{21}$.
In physical terms, this says that the extent to which one may 'amplify' the conditional expectation $E[X|B; \Psi_g]$ by changing the conditioning observable $B$ is predetermined by $\|X\|^{\psi_g}_\infty$. This is one general form of answer to the question of the existence of the limit of 'amplification' by conditioning.
As the next step, one might be interested in seeking the condition under which $\|X\|^{\psi_g}_\infty$ is bounded from above, even if we could freely choose the initial state $|\phi\rangle$ of the target system. This would create a universal upper bound of $\|E[X|B; \Psi_g]\|_\infty$ that is indifferent to both the initial and final configurations of the target system (i.e., the choice of the initial target state $|\phi\rangle$ and the conditioning observable $B$). As we have learned from the discussions above, this would typically be the case when there exists a subspace $U(g, \psi) \subset \mathcal{H} \otimes \mathcal{K}$, for fixed $g \in \mathbb{R}$ and $|\psi\rangle \in \mathrm{dom}(X)$, such that $|\Psi_g\rangle \in U(g, \psi)$ for all $|\phi\rangle \in \mathrm{dom}(A)$, and that the restriction of $I \otimes X$ on $U(g, \psi)$ is bounded.

Proposition 4.6 (Limit of Amplification by Conditioning). Under the framework of the CM scheme, let both the interaction parameter $g \in \mathbb{R}$ and the initial meter state $|\psi\rangle \in \mathrm{dom}(X)$ be fixed, and suppose that the target observable $A$ has a spectrum $\sigma(A) = \{a_1, \ldots, a_N\}$, $N \in \mathbb{N}^\times$, of finite cardinality. Then, the following facts hold:
(i) The density operator $\psi_g$ of the meter system (2.71) can be written as a probabilistic mixture of a finite number of projection operators (pure states) supported on the finite-dimensional (at most $N$-dimensional) subspace $K(g, \psi)$ given in (4.41), which is independent of the initial choice $|\phi\rangle \in \mathcal{H}$ of the target state.
(ii) The restriction $X|_{K(g,\psi)}$ of the meter observable $X$ on the subspace (4.41) is bounded, and thus its operator norm provides a finite universal upper bound to the conditional expectation,
$$ \|E[X|B; \Psi_g]\|_\infty \le \|X|_{K(g,\psi)}\|, $$
that is independent of the configuration of the target system (i.e., the choice of the initial state $|\phi\rangle \in \mathcal{H}$ and that of the conditioning observable $B$).
Proof. Under the above condition, first observe that where we have used (2.77). One readily finds from the above formula that the density operator defined as in (2.71) can indeed be written as a probabilistic mixture of a finite number of projection operators (pure states) supported on the subspace (4.41). We then recall that any operator $X$ defined on a finite-dimensional Hilbert space is necessarily bounded, and thus observe that the problem at hand reduces to the situation of Corollary 4.4.
In physical terms, this says that there exists a finite limit $\|X|_{K(g,\psi)}\| < \infty$ to the extent to which one may 'amplify' the conditional expectation $E[X|B; \Psi_g]$ by changing only the configuration of the target system (namely, by changing either or both of the conditioning observable $B$ and the initial state $|\phi\rangle \in \mathcal{H}$ of the target system). Specifically, the evaluation
$$ |E[X|B = b; \Psi_g]| \le \|X|_{K(g,\psi)}\| $$
holds for all $b \in \sigma(B)$ up to a set of probability zero, and the upper bound $\|X|_{K(g,\psi)}\|$ depends neither on the choice of $B$ nor on that of $|\phi\rangle$. Naturally, if one could also change either the interaction parameter $g$ or the initial state $|\psi\rangle$ of the meter system, the above result would no longer be valid.

Recovery of the Target Profile
Parallel to the study of the UM scheme, we are now interested in the information on the target system that can be extracted from the CM scheme. Following the line of arguments for the UM scheme, we are specifically interested in investigating the local behaviour of the outcome of the CM scheme around $g = 0$, i.e., the weak conditioned measurement, in which the target of our analysis is the map from the interaction parameter $g$ to the conditional expectation of $X$ given $B$, which was in general defined as a map from the real line to an equivalence class of functions. To this end, we first conduct a preliminary observation.

Preliminary Observation.
Since the definition of the conditional expectation is given in a rather abstract way, the conditional expectation (4.22) in general does not admit an explicit expression by vectors and operators (in contrast to the UM case (2.73), which always admits such an explicit expression). In view of this, it is helpful to find a condition under which the conditional expectation (4.22) of our interest may be explicitly written down. We first point out that this is indeed the case when the spectrum of the conditioning observable $B$ has finite cardinality. Now, let
$$ B = \sum_{n=1}^{N} b_n \Pi_{b_n} \tag{4.47} $$
be the spectral decomposition of $B$, where $\sigma(B) = \{b_1, \ldots, b_N\}$ is any enumeration of its eigenvalues, and $\Pi_b := E_B(\{b\})$, $b \in \sigma(B)$, denotes the unique projection on the eigenspace associated to $b$. It is then fairly straightforward to see by definition that the conditional expectation of $X$ given $B$ is explicitly given by
$$ E[X|B = b; \Psi_g] = \frac{E[\Pi_b \otimes X; \Psi_g]}{E[\Pi_b \otimes I; \Psi_g]}. $$
Here, recall that conditional expectations are defined as equivalence classes of functions, and hence the value for an outcome $b$ of the measurement of $B$ whose probability of being observed vanishes is indefinite by definition. The study of the weak CM scheme then reduces to the analysis of the map
$$ g \mapsto E[X|B = b; \Psi_g] \tag{4.49} $$
for each $b \in \sigma(B)$ such that the probability of observing it is non-vanishing. Since this is a map from the real line to itself (i.e., a function), it is a much more familiar and straightforward object to deal with.
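The explicit ratio expression for a conditioning observable of finite spectrum is easy to evaluate numerically. The sketch below is a toy model under stated assumptions rather than the setting of the text: both the target and the meter are taken to be qubits, and the interaction is assumed to be of the von Neumann type $\exp(-ig A \otimes Y)$. The particular choices of `A`, `B`, `X`, `Y`, the states, and the helper names are hypothetical.

```python
import numpy as np

def unitary(H, g):
    """exp(-i g H) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * g * w)) @ V.conj().T

# Pauli matrices.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

A, B = sz, sx        # target observable and conditioning observable (assumed)
X, Y = sx, sy        # measured meter observable and coupling observable (assumed)
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)   # target state
psi = np.array([1, 0], dtype=complex)                       # meter state

def conditional_expectations(g):
    """Return Psi_g and {b: (probability of b, E[X | B = b; Psi_g])}."""
    Psi = unitary(np.kron(A, Y), g) @ np.kron(phi, psi)
    evals, evecs = np.linalg.eigh(B)
    table = {}
    for b in np.unique(np.round(evals, 12)):
        cols = evecs[:, np.isclose(evals, b)]
        Pi = cols @ cols.conj().T                            # spectral projection Pi_b
        prob = np.vdot(Psi, np.kron(Pi, I2) @ Psi).real      # E[Pi_b (x) I; Psi_g]
        num = np.vdot(Psi, np.kron(Pi, X) @ Psi).real        # E[Pi_b (x) X; Psi_g]
        # The value is indefinite when the outcome has vanishing probability.
        table[float(b)] = (prob, num / prob if prob > 1e-14 else None)
    return Psi, table

Psi, table = conditional_expectations(0.4)
total = sum(p * e for p, e in table.values() if e is not None)
unconditioned = np.vdot(Psi, np.kron(I2, X) @ Psi).real
```

Averaging the conditional expectations over the outcomes of $B$ (`total`) reproduces the unconditioned expectation of $I \otimes X$ (`unconditioned`), in line with the integrability property of conditional expectations recalled earlier.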

Objective of this Passage.
In what follows, we discuss the differentiability of the function (4.49) at the point $g = 0$. To this end, first observe that the set of $b \in \sigma(B)$ for which the probability of observing it is non-vanishing depends on $g$. Hence, for each $b \in \sigma(B)$, we must first guarantee the well-definedness of (4.49), at least on some neighbourhood of $g = 0$. Fortunately, this is indeed the case for the choice $b \in \sigma(B)$ such that the probability of finding it on the initial state $|\phi\rangle$ of the target system is non-vanishing, $E[\Pi_b \otimes I; \Psi_0] = \|\Pi_b \phi\|^2 \neq 0$, due to the continuity of the function $g \mapsto E[\Pi_b \otimes I; \Psi_g]$. The main objective of this passage is to demonstrate the following statement.

Proposition 4.7 (Differentiability of the Conditional Expectation: Preliminary).
Suppose that the conditioning observable $B$ has a spectrum of finite cardinality, and moreover let $|\phi\rangle \in \mathrm{dom}(A)$, $|\psi\rangle \in D \subset \mathrm{dom}(X)$ (the subspace $D$ is defined as in (2.37)) be assumed, so that the conditional expectation $E[X|B; \Psi_g]$ is well-defined over the whole range of $g \in \mathbb{R}$. Then for $b \in \sigma(B)$ such that $\|\Pi_b \phi\|^2 \neq 0$, the conditional expectation $E[X|B = b; \Psi_g]$ is well-defined on some neighbourhood of $g = 0$. It is moreover differentiable with respect to $g$ at the origin, and the differential coefficient is expressed in terms of the quantities $CV_S[X, Y; \psi]$ and $CV_A[X, Y; \psi]$, occasionally called the symmetric and anti-symmetric (quantum) covariance$^{22}$ of $X$ and $Y$ on the state $|\psi\rangle \in D$, respectively, which are defined through the anti-commutator $\{X, Y\} := XY + YX$ (not to be confused with the braces denoting sets) and the commutator of $X$ and $Y$.
Proof. Throughout the proof, we choose $b \in \sigma(B)$ such that $\|\Pi_b \phi\|^2 \neq 0$. Then, it is fairly straightforward to see that the map
$$ g \mapsto E[X|B = b; \Psi_g] = \frac{E[\Pi_b \otimes X; \Psi_g]}{E[\Pi_b \otimes I; \Psi_g]} \tag{4.54} $$
is well-defined on some neighbourhood $U_0$ around the origin $g = 0$. It then follows directly from the expression (4.54) that the differentiability of both the numerator and the denominator of the r.h.s. gives a sufficient condition for the conditional expectation $E[X|B = b; \Psi_g]$ to be differentiable. In order to simplify our notation, we assume in the following that the vectors $|\phi\rangle$ and $|\psi\rangle$, respectively representing the initial quantum states of the target and the meter system, are normalised. Since the proof is rather lengthy, we divide it into several parts.
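The first-order behaviour asserted by Proposition 4.7 can be probed numerically. The sketch below rests on assumptions that are not fixed by the text above: the interaction is taken to be $\exp(-ig A \otimes Y)$, the covariances are taken as $CV_S[X,Y;\psi] = E[\{X,Y\}/2;\psi] - E[X;\psi]E[Y;\psi]$ and $CV_A[X,Y;\psi] = E[[X,Y]/(2i);\psi]$, and the conditioning is a post-selection $B = |\phi'\rangle\langle\phi'|$ with outcome $b = 1$. Under these conventions a direct application of the Leibniz and quotient rules gives the derivative $2\,\mathrm{Re}(A_w)\, CV_A + 2\,\mathrm{Im}(A_w)\, CV_S$ with $A_w = \langle\phi', A\phi\rangle/\langle\phi', \phi\rangle$, which the code compares with a central finite difference. All matrices and states are hypothetical toy choices.

```python
import numpy as np

def unitary(H, g):
    """exp(-i g H) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * g * w)) @ V.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

A, X, Y = sz, sx, sy                                   # assumed toy observables
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)                    # pre-selected state
phi_post = np.array([np.cos(1.0), np.exp(0.5j) * np.sin(1.0)])               # post-selected state
psi = np.array([np.cos(0.4), np.exp(0.7j) * np.sin(0.4)])                    # meter state
Pi = np.outer(phi_post, phi_post.conj())               # B = |phi'><phi'|, outcome b = 1

def cond_exp(g):
    """E[X | B = 1; Psi_g] as the ratio of two expectation values."""
    Psi = unitary(np.kron(A, Y), g) @ np.kron(phi, psi)
    num = np.vdot(Psi, np.kron(Pi, X) @ Psi).real
    den = np.vdot(Psi, np.kron(Pi, I2) @ Psi).real
    return num / den

def mean(op):
    """Expectation value of a meter operator on psi."""
    return np.vdot(psi, op @ psi)

A_w = np.vdot(phi_post, A @ phi) / np.vdot(phi_post, phi)
cv_s = (mean(X @ Y + Y @ X) / 2 - mean(X) * mean(Y)).real   # symmetric covariance (assumed form)
cv_a = (mean(X @ Y - Y @ X) / 2j).real                       # anti-symmetric covariance (assumed form)

h = 1e-5
finite_difference = (cond_exp(h) - cond_exp(-h)) / (2 * h)
first_order = 2 * A_w.real * cv_a + 2 * A_w.imag * cv_s
```

The agreement of `finite_difference` and `first_order` is only a consistency check of the stated conventions; a different sign convention for the coupling would flip the sign of the derivative.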

Leibniz Rule.
To prepare for our arguments, we first recall some basic facts. Let $F, G : U \to \mathcal{H}$ be maps from an open subset $U \subset \mathbb{R}$ of the real line to a Hilbert space $\mathcal{H}$. If both maps $F$ and $G$ are strongly differentiable at $t_0 \in U$, the inner product $t \mapsto \langle F(t), G(t)\rangle$ is differentiable at $t_0 \in U$, and the derivative satisfies the Leibniz rule,
$$ \frac{d}{dt}\langle F(t), G(t)\rangle\Big|_{t = t_0} = \langle F'(t_0), G(t_0)\rangle + \langle F(t_0), G'(t_0)\rangle. $$

$^{22}$ Note that in the case where the two observables coincide, $X = Y$, the symmetric quantum covariance reduces to the familiar variance, which is reminiscent of the familiar result in classical probability theory, whereas the anti-symmetric covariance vanishes, $CV_A[X, X; \psi] = 0$.

Differentiability of the Numerator.
To prove the differentiability of the numerator of (4.54) and obtain its derivative, we first introduce two auxiliary maps $F(g) := |\Psi_g\rangle$ and $G_X(g) := (\Pi_b \otimes X)F(g)$, by which we rewrite the numerator as the inner product $\langle F(g), G_X(g)\rangle$ (4.56). From the Leibniz rule, one sees that the desired result can be immediately obtained once the differentiability of both maps $F(g)$ and $G_X(g)$ is proven and their derivatives are given. As for the strong differentiability of the map $g \mapsto F(g)$, one readily finds by Stone's theorem on one-parameter unitary groups that the condition
$$ |\phi\rangle \in \mathrm{dom}(A), \quad |\psi\rangle \in D \subset \mathrm{dom}(Y) \tag{4.57} $$
would suffice, in which case the derivative is given by (4.58). As for the map $g \mapsto G_X(g)$, we first observe that it is written as
$$ G_X(g) = (\Pi_b \otimes I)(I \otimes X)F(g). $$
Due to the boundedness (continuity) of the operator $(\Pi_b \otimes I)$, strong differentiability of the vector-valued map $g \mapsto (I \otimes X)F(g)$ would give a sufficient condition for $G_X(g)$ to be strongly differentiable, which one readily proves under the condition assumed in the proposition by imitating the arguments we have made starting from (2.52) with the help of the relation (2.80). Now that the strong differentiability of both maps $g \mapsto F(g), G_X(g)$ is proven, one finds the derivative (4.61) of $G_X$ from the closedness of the self-adjoint operator $(\Pi_b \otimes X)$. Given the results (4.58) and (4.61), the Leibniz rule leads to the desired differentiability of the numerator (4.56), whose derivative (4.62) one computes by using an operator equality valid on the subspace $D$.
Differentiability of the Denominator.
The proof of the differentiability of the denominator $E[\Pi_b \otimes I; \Psi_g]$ proceeds in essentially the same way as that for the numerator: one readily proves its differentiability at $g = 0$ under the condition $|\phi\rangle \in \mathrm{dom}(A)$, $|\psi\rangle \in \mathrm{dom}(Y)$, in which case the derivative (4.64) is obtained by formally replacing $X$ with $I$ in (4.62).

Final Result.
Combining the above two results (4.62) and (4.64), one concludes that, given the choice $|\phi\rangle \in \mathrm{dom}(A)$ and $b \in \sigma(B)$ with $\|\Pi_b \phi\|^2 \neq 0$ of the target configuration, and $|\psi\rangle \in D$ for the meter system, the conditional expectation $E[X|B = b; \Psi_g]$ is indeed differentiable at $g = 0$. Its derivative can then be evaluated by the quotient rule of elementary calculus, which yields the differential coefficient stated in the proposition.

... and the topology in which the convergence is meant. From the result of Proposition 4.7, one might naturally conjecture that the limit is given by an expression involving a 'function' $f$ defined formally as in (4.68). In order to make this observation a precise mathematical statement, we first introduce a convenient concept.

Conditional Quasi-expectations.
Observe that, in the case where $A$ and $B$ are simultaneously measurable, the function (4.68) is nothing but the conditional expectation of $A$ given $B$. In general, however, the target observable $A$ and the conditioning observable $B$ need not be simultaneously measurable. We thus wish to define a quantum analogue of the conditional expectation of an observable $A$ given another observable $B$ that is well-defined even for pairs that are not necessarily simultaneously measurable. To this end, we first fix a non-zero vector $|\phi\rangle \in \mathrm{dom}(A)$ and consider the complex measure (4.69) on $(\mathbb{R}, \mathcal{B}^1)$, where $E_B$ is the unique spectral measure accompanying $B$. Now, a direct application of the Cauchy-Schwarz inequality shows that this complex measure is absolutely continuous with respect to the probability measure $\mu^\phi_B$, so that its Radon-Nikodým derivative with respect to $\mu^\phi_B$ exists, and we call it the conditional quasi-expectation of $A$ given $B$. By definition, it is the unique $\mu^\phi_B$-integrable (equivalence class of) function(s) whose integral over each Borel set $\Delta \in \mathcal{B}^1$ with respect to $\mu^\phi_B$ reproduces the value of the complex measure (4.69) on $\Delta$.

Arbitrariness to Conditional Quasi-expectations.
As one may immediately notice, there exists an arbitrariness in the way one may define conditional quasi-expectations. For example, one may just as well take the complex conjugate of the complex measure (4.69) and introduce its Radon-Nikodým derivative with respect to $\mu^\phi_B$ instead. In fact, it turns out that there exists a multitude of potential candidates for possible definitions of such 'conditional quasi-expectations', all sharing the desirable properties mentioned earlier. We shall return to this problem in the more general framework of quasi-joint-probabilities of quantum observables in Section 6, but for our purpose and the scope of this paper, it suffices to concentrate only on the family (4.77) for definiteness.

Note, by definition, that each member $E^\alpha[A|B; \phi]$, $\alpha \in \mathbb{C}$, of the family of conditional quasi-expectations is integrable with respect to the probability measure $\mu^\phi_B$, and its total integral coincides with the expectation value $E[A; \phi]$ of $A$. If the conditioning observable $B$ happens to possess a spectrum of finite cardinality, so that its spectral decomposition reads (4.47), the conditional quasi-expectation admits an explicit expression by operators and vectors. Incidentally, if the conditioning observable happens to be a projection $B = |\phi'\rangle\langle\phi'|$ on a one-dimensional subspace of $\mathcal{H}$ spanned by a unit vector $|\phi'\rangle$ (i.e., a post-selection), the conditional quasi-expectation of $A$ given the outcome $B = 1$ takes the form (4.80), given that the probability of finding the outcome 1 of $B$ is non-vanishing, $\mu^\phi_B(\{1\}) = |\langle\phi', \phi\rangle|^2 \neq 0$. Specifically for the choice $\alpha = 1$, this reduces to
$$ A_w := \frac{\langle\phi', A\phi\rangle}{\langle\phi', \phi\rangle}. $$
The value $A_w$ is widely referred to as Aharonov's weak value [5,6] of $A$ for the pair of the pre-selected state $|\phi\rangle \in \mathrm{dom}(A)$ and the post-selected state $|\phi'\rangle \in \mathcal{H}$.
Historically, the weak value is said to have been originally introduced as a hypothetical value of an observable $A$ assigned to a quantum process from the pre-selected to the post-selected state, generalising the common practice of solely assigning values to a single static state in the standard framework of quantum mechanics. Following this philosophy, the value (4.80), termed the two-state value [44] of $A$ under the respective selections of states, was recently introduced in an attempt to generalise the idea of the weak value and to find the possible form of a quantity of an observable specified by two quantum states. An application of the generalised Gleason's theorem revealed that, under certain desirable conditions, the most general form of the values of an observable $A$ that can be assigned to the two specifications of the quantum states $|\phi\rangle \in \mathrm{dom}(A)$, $|\phi'\rangle \in \mathcal{H}$ satisfying $\langle\phi', \phi\rangle \neq 0$ is given by (4.80) with a parameter $\alpha \in \mathbb{C}$ representing the ambiguity inherent to it.
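For a conditioning observable with finite spectrum, the quasi-expectation can be evaluated in a few lines. The sketch below covers only the member that reduces to the weak value under post-selection, and it assumes that this member takes the form $\langle\phi, \Pi_b A\phi\rangle / \|\Pi_b\phi\|^2$ for a normalised $|\phi\rangle$; this form, the helper `quasi_expectation`, and the qubit example are assumptions made for illustration.

```python
import numpy as np

def quasi_expectation(A, B, phi):
    """Return {b: (mu_B^phi({b}), <phi, Pi_b A phi> / ||Pi_b phi||^2)}.

    phi is assumed normalised; the value is None (indefinite) for
    outcomes b of vanishing probability.
    """
    evals, evecs = np.linalg.eigh(B)
    table = {}
    for b in np.unique(np.round(evals, 12)):
        cols = evecs[:, np.isclose(evals, b)]
        Pi = cols @ cols.conj().T
        prob = np.vdot(phi, Pi @ phi).real
        value = np.vdot(phi, Pi @ A @ phi) / prob if prob > 1e-14 else None
        table[float(b)] = (prob, value)
    return table

# Hypothetical qubit example: A = sigma_z, nearly orthogonal pre- and post-selection.
A = np.array([[1, 0], [0, -1]], dtype=complex)
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
theta_post = 0.3 + np.pi / 2 - 0.05
phi_post = np.array([np.cos(theta_post), np.sin(theta_post)], dtype=complex)
B = np.outer(phi_post, phi_post.conj())          # post-selection B = |phi'><phi'|

table = quasi_expectation(A, B, phi)
prob_one, value_one = table[max(table)]          # outcome b = 1
A_w = np.vdot(phi_post, A @ phi) / np.vdot(phi_post, phi)

# Total integral against mu_B^phi versus the plain expectation value of A.
total = sum(p * v for p, v in table.values() if v is not None)
expectation = np.vdot(phi, A @ phi)
op_norm_A = np.linalg.norm(A, 2)
```

In this example the value at the outcome $b = 1$ coincides with the weak value `A_w`, its modulus exceeds the operator norm of $A$, and yet the total integral still reproduces the ordinary expectation value of $A$.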

Essential Supremum of Conditional Quasi-expectations.
While conditional quasiexpectations and the standard conditional expectations share various properties in common, the non-commutative nature of quantum observables results in some interesting distinctions between the two concepts. In this paper, as an example, we shall focus on the remarkable difference in the behaviour of their essential suprema. Now, as one recalls from Corollary 4.4, for a pair of simultaneously measurable observables A and B and a fixed state |φ ∈ dom(A), the essential supremum of the conditional expectation E[A|B; φ] ∞ is never greater than the essential supremum A φ ∞ of the measurable function a → a under the probability measure µ φ A . If A happens to be bounded, the operator norm A gives the state independent universal upper bound to A φ ∞ , which in turn also naturally becomes an upper bound to the conditional expectation E[A|B; φ]. However, in general, this property is no longer preserved when A and B fail to be simultaneously measurable. There are several possible ways to express this discrepancy, but for brevity, we formulate it in the following manner.
To this end, we first prepare a terminology. In this paper, we say that an observable A on H is non-trivial if A is not a scalar multiple of the identity operator tI, t ∈ R, or equivalently, if A has a spectrum σ(A) of cardinality not less than 2. Note that the non-triviality of A automatically implies dim(H) ≥ 2, where dim(H) denotes the dimension of the Hilbert space H. Since trivial operators strongly commute with any other self-adjoint operators, the function E α [A|B; φ] always become an authentic conditional expectation, revealing itself to be a constant function always taking its unique eigenvalue E α [A|B; φ] = t, whose case is not interesting for our purpose. Hence, we shall from now on confine ourselves to the case where A is non-trivial. Proposition 4.8 (Essential Supremum of Conditional Quasi-expectations). Let A be a non-trivial observable, |φ ∈ dom(A) a vector that is not an eigenvector of A, and let α ∈ C be any choice of the ambiguity parameter of the conditional quasi-expectation. Then, for any non-negative number 0 ≤ M < ∞, there exists a self-adjoint operator B (not-necessarily simultaneously measurable with A) such that the essential supremum of the conditional quasiexpectation of A given B is not less than (4.82) Specifically, one may always choose such conditioning observable B = |φ ′ φ ′ | to be a projection onto a one-dimensional subspace of H spanned by some unit vector |φ ′ ∈ H.
Proof. It suffices to prove that, one may always adjust the choice of the conditioning observable B = |φ ′ φ ′ | so that the conditional quasi-expectation r, (r ∈ R), (α = 0) (4.83) may take any complex number for the choice α = 0, and any real number for the choice α = 0, while maintaining the probability of observing it to be non-vanishing µ φ B ({1}) > 0. The proof is a direct corollary of Proposition 4.9 that follows immediately.
In particular, this result is to say that one may always choose a conditioning observable B such that the essential supremum E[A|B; φ] ∞ of the conditional quasi-expectation exceeds A φ ∞ , which is never possible for standard conditional expectations defined for a pair of simultaneously measurable observables. This 'amplification of conditional quasiexpectations' is a noteworthy property of quantum mechanics, and the oft-discussed 'amplification of weak values' could be understood as its special case. Proposition 4.9 (Range of the Two-state Value). Let A be a non-trivial observable on H, and let |φ ∈ dom(A) be a pre-selected state that is not an eigenvector of A. Then, the two-state value of A under the pre-selected state |φ may take any complex number in the case α = 0, and in turn any real number in the case α = 0, given an appropriate choice of the post-selected state |φ ′ ∈ H.
Proof. For simplicity, we only provide the proof of the statement for the specific choice α = 1 of the ambiguity parameter without loss of generality. Now, before we go into the main part of the proof, we first observe that, for a non-trivial self-adjoint operator A and a normalised vector |φ ∈ dom(A), there exists a normalised vector |χ ∈ H orthogonal to |φ such that holds 23 . To see this, we first consider the case that is, when |φ is an eigenvector of A. Then, by choosing any normalised state |χ satisfying χ, φ = 0 (the existence of such |χ is guaranteed by the fact dim(H) ≥ 2), one finds that the above equality is fulfilled. Next, suppose that A|φ = E[A; φ] · |φ . Then, by defining one indeed learns that χ = 1 and χ, φ = 0 as stated. Armed with this fact and by fixing such |χ , we choose the post-selected state as with a free parameter c ∈ C × . One then finds This shows that, for the choice of an initial state |φ ∈ dom(A) that is not an eigenvector of A (which is always possible due to the non-triviality of A), the weak value (hence, also the two-state value) may indeed take any complex number by adjusting the free parameter c appropriately.
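The construction used in the proof can be turned into a small numerical recipe. The sketch below uses one possible parametrisation of the post-selected state, $|\phi'\rangle \propto |\phi\rangle + \bar{c}\,|\chi\rangle$, which is an assumption made here and need not coincide with the parametrisation of the original proof. With this choice the weak value equals $E[A;\phi] + c\,\|(A - E[A;\phi])\phi\|$, so any prescribed complex target can be reached by solving for $c$. The function names and the qubit example are hypothetical.

```python
import numpy as np

def post_selection_for(A, phi, target):
    """Return a unit vector phi' with <phi', A phi> / <phi', phi> = target.

    phi is assumed normalised and must not be an eigenvector of A.
    """
    mean = np.vdot(phi, A @ phi).real
    residual = A @ phi - mean * phi
    sigma = np.linalg.norm(residual)
    if sigma < 1e-12:
        raise ValueError("phi is (numerically) an eigenvector of A")
    chi = residual / sigma                  # unit vector orthogonal to phi
    c = (target - mean) / sigma             # free parameter fixed by the target
    phi_post = phi + np.conj(c) * chi       # assumed parametrisation
    return phi_post / np.linalg.norm(phi_post)

def weak_value(A, phi, phi_post):
    """Weak value <phi', A phi> / <phi', phi>."""
    return np.vdot(phi_post, A @ phi) / np.vdot(phi_post, phi)

# Hypothetical qubit example with a target far outside the spectrum of A.
A = np.array([[1, 0], [0, -1]], dtype=complex)
phi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
target = 100 - 50j

phi_post = post_selection_for(A, phi, target)
w = weak_value(A, phi, phi_post)
success_probability = abs(np.vdot(phi_post, phi)) ** 2
```

The price of a large target is visible in `success_probability`, which equals $1/(1 + |c|^2)$ for this parametrisation and therefore shrinks as the weak value is pushed further from the expectation value, while remaining strictly positive.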
The difference between (standard) conditional expectations and conditional quasi-expectations in the behaviour of their essential suprema makes it clear that conditional quasi-expectations are not conditional expectations in the classical sense. This provides an indirect proof of the fact that, in general, the 'joint behaviour' of the outcomes of a pair of (generally non-commuting) quantum observables A and B cannot be described by probability spaces. This point is discussed in depth in Sections 5 and 6.

Weak Conditioned Measurement.
Armed with our newly introduced concept of conditional quasi-expectations (4.77) of a quantum observable given another (not necessarily simultaneously measurable) quantum observable, we shall summarise our findings regarding the first-order local behaviour of the conditional expectation at the origin. Combining Proposition 4.7 and (4.78), one is naturally tempted to conjecture that: Proposition 4.10 (Weak Conditioned Measurement). Let A and B be self-adjoint operators defined on the target system H, and let the respective initial states |φ ∈ dom(A), |ψ ∈ D be fixed. Then, the conditional expectation E[X|B; Ψ g ] is well-defined for all range of g ∈ R, and the limit converges to point-wise µ φ B -almost everywhere. While we have explicitly proved the above statement only in the special case where B has spectrum of finite cardinality, the same statement indeed holds for general B, although we do not go into the technical details for its demonstration. One may thus understand the process of the weak CM scheme as the practice of measuring (the real and imaginary parts of) the conditional quasi-expectation E[A|B; φ] of the target system. This result is to be compared with the unconditioned counterpart, in which one may extract the standard (unconditional) expectation E[A; φ] by means of the weak UM scheme from the first-order differential coefficient of the measurement outcomes.

Topic: Conditional Quasi-expectation as the Merkmal for Amplification.
Under the above conditions, Taylor's theorem provides the first-order expansion (4.91) of the conditional expectation E[X|B; Ψ_g] in g, with remainder o(g), where the equality is understood to hold µ^φ_B-almost everywhere. This fact indicates that the conditional quasi-expectation E[A|B; φ] is the (best first-order) indicator of the degree of 'amplification' of the conditional expectation E[X|B; Ψ_g] that one may attain by choosing the conditioning observable B on the target system. Colloquially speaking, if one hopes to gain a large amplification effect by conditioning, the first place to look is the conditional quasi-expectation, and one may hope to achieve the effect by adjusting the conditioning observable B so that E[A|B; φ] becomes large enough. Note, however, that while the conditional quasi-expectation (for non-trivial A and for an initial state |φ⟩ ∈ dom(A) that is not an eigenvector of A) admits arbitrarily large amplification by a suitable choice of the conditioning observable B (Proposition 4.8), the classical conditional expectation E[X|B; Ψ_g] may have an upper bound depending on its configuration (Proposition 4.5). This suggests that the discrepancy between the full-order behaviour of E[X|B; Ψ_g] and its first-order approximation becomes larger (in other words, the higher-order terms o(g) become more significant) as one adjusts the conditioning observable B so that the conditional quasi-expectation becomes larger. As for the higher-order terms, although we omit the details, we note that one may also prove higher-order differentiability of the conditional expectation by placing stricter conditions on the choice of the initial states of both the target system and the meter system, and subsequently compute higher-order derivatives through a procedure analogous to the one demonstrated above.
In order to confirm this observation with a concrete model, we have included in Appendix A an analytic example in which we compute the conditional expectation E[X|B; Ψ_g] for the special case where the conditioning observable B = |φ′⟩⟨φ′| is a projection onto a one-dimensional subspace spanned by a unit vector |φ′⟩ ∈ H (i.e., the post-selected measurement scheme), and moreover the target observable A is dichotomic. One indeed finds there that the 'amplification' of the conditional expectation by 'weak value amplification' has a limit, along with various other general properties found in the discussions throughout this section.
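The gap between the first-order prediction and the full behaviour can be made tangible with a small numerical sketch. It is not the computation of Appendix A; it rests on the following assumptions of our own: A has eigenvalues ±1, the meter wave-function is a real Gaussian of position spread s, the interaction displaces it by g times the eigenvalue, and c_± := ⟨φ′|±⟩⟨±|φ⟩ are the post-selected amplitudes (up to a common factor). Under these assumptions the conditional mean position of the meter has a closed form in terms of Gaussian overlaps.

```python
import math

def conditional_meter_mean(c_plus, c_minus, g, s=1.0):
    """Post-selected mean position of a Gaussian meter (hypothetical model).

    The unnormalised conditional meter wave-function is assumed to be
    c_plus * psi(x - g) + c_minus * psi(x + g), with psi a real Gaussian
    whose position standard deviation is s.
    """
    overlap = math.exp(-g * g / (2 * s * s))   # <psi(. - g), psi(. + g)>
    num = g * (abs(c_plus) ** 2 - abs(c_minus) ** 2)
    den = (abs(c_plus) ** 2 + abs(c_minus) ** 2
           + 2 * (c_plus * c_minus.conjugate()).real * overlap)
    return num / den

def weak_value(c_plus, c_minus):
    """Weak value of A (eigenvalues +1, -1) from the amplitudes c_pm."""
    return (c_plus - c_minus) / (c_plus + c_minus)

g = 0.05                                       # fixed, small coupling
rows = []
for k in range(1, 25):
    delta = 10 ** (-k / 4)                     # post-selection nears orthogonality
    c_plus, c_minus = 1 + 0j, -(1 - delta) + 0j
    first_order = g * weak_value(c_plus, c_minus).real
    exact = conditional_meter_mean(c_plus, c_minus, g)
    rows.append((delta, first_order, exact))

for delta, first_order, exact in rows[::4]:
    print(f"delta={delta:.1e}  first-order={first_order:.3e}  exact={exact:.3e}")
```

In this model the first-order estimate g · Re(weak value) grows without bound as the post-selection approaches orthogonality, whereas the exact conditional mean stays of the order of the meter spread s and eventually decreases, in line with the qualitative statement above.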

Conditioned Measurement II: In Terms of Conditional Probabilities
In Section 3, we have elaborated the study of the UM scheme conducted in the preceding Section 2 in terms of probabilities. In this section, we follow the same line and intend to refine our analysis for the conditioned counterpart.

Preliminary Observations.
As one may recall, we have seen in Section 2 and Section 3 that, by means of the UM scheme, one can extract the information of the target system in the form of both the expectation value E[A; φ] and the probability measure µ^φ_A, the former by looking at the expectation value of the meter observable X conjugate to Y, and the latter by focusing on its probability measure. Both were obtained, in parallel manners, either by inspecting the strong region g → ±∞ of the interaction or by probing its local behaviour at g = 0. As for the conditioned case, while we have not looked into the strong region g → ±∞ of the interaction parameter, our analysis of the local behaviour conducted in Section 4 revealed that the first-order derivatives of the expectation value for the choices X = Q, P contain portions (the real and imaginary parts) of the conditional quasi-expectation E[A|B; φ]. By comparing this result to the unconditioned case, one may arrive at the naïve conjecture that the expectation value and the conditional quasi-expectation of an observable A have some quality in common. Namely, since the CM scheme incorporates conditioning, one may speculate that the conditional quasi-expectation E[A|B; φ] may be interpreted as some form of a 'conditional average' with respect to an underlying 'probability distribution' of some kind.

Quasi-joint-probability Distributions in Quantum Mechanics.
A quick look at our previous result (4.50) reveals that the full description of the CM scheme must incorporate the information of the measurement outcomes for both choices X = Q, P of the meter observable, in contrast to the unconditioned case, where we may concentrate on the probability distribution describing the outcome of a single observable X that is conjugate to Y. In view of this, it would be natural to consider some form of a 'joint distribution' describing the measurement outcomes of both observables Q and P. However, as we have seen in Section 3.1.7, and as the indirect proof based on the differing behaviour of the essential suprema of conditional (quasi-)expectations also shows, only a pair of simultaneously measurable observables admits, by definition, a description by joint-probability distributions in the classical sense, and unfortunately the pair {Q, P} of our present interest does not fall into this category.
On account of this, there have been various attempts to construct alternative forms of 'joint distributions' for pairs of (generally non-commuting) quantum observables that possess convenient or desirable properties in describing the behaviour of both their outcomes. Among the most well-known classical proposals are the Wigner-Ville distribution (WV distribution) [1,2], which purports to describe the 'joint behaviour' of the otherwise incompatible pair of observables x̂ and p̂ on the normalised wave-function ψ ∈ L²(R), symbolically defined by
W_ψ(x, p) := (1/π) ∫_R ψ*(x + y) ψ(x − y) e^{2ipy} dβ(y), (5.1)
and the Kirkwood-Dirac distribution (KD distribution) [3,4], which on the other hand can be defined for an arbitrary pair of observables A and B, symbolically defined by
K_φ(a, b) := ⟨φ|b⟩⟨b|a⟩⟨a|φ⟩ / ‖φ‖², (5.2)
with the symbolical decompositions A = ∫_R a d|a⟩⟨a| and B = ∫_R b d|b⟩⟨b|. The former allows negative numbers to be assigned, whereas the latter even admits complex numbers. Despite their peculiarity, they both retain some properties that one finds in the standard (i.e., real and non-negative) joint-probability distributions, e.g., that they both have total integration of unity, and that the marginals coincide with the probability distribution describing the behaviour of the remaining observable. In this sense, they are occasionally referred to as quasi-joint-probability (QJP) distributions of the specific pairs of observables.
Quasi-joint-probability Distributions and Conditional Quasi-expectations. Now, as some may expect, conditional quasi-expectations are closely related to the notion of quasi-joint-probabilities in quantum mechanics. Indeed, a quick observation reveals that, given a symbolical spectral decomposition B = ∫_R b d|b⟩⟨b| of the conditioning observable, the complex-parametrised conditional quasi-expectation (4.77) for the choice α = 1 coincides with the, so to speak, 'conditional average' of A given the outcome b of B under the Kirkwood-Dirac distribution, as one finds under the formal computation
∫_R a ⟨φ|b⟩⟨b|a⟩⟨a|φ⟩ da / ∫_R ⟨φ|b⟩⟨b|a⟩⟨a|φ⟩ da = ⟨φ|b⟩⟨b|A|φ⟩ / ⟨φ|b⟩⟨b|φ⟩ = ⟨b|A|φ⟩ / ⟨b|φ⟩. (5.3)
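The relation between the Kirkwood-Dirac distribution and the conditional quasi-expectation for α = 1 can be checked in finite dimensions. The following Python sketch does so for a two-level system; the operator ordering ⟨φ|b⟩⟨b|a⟩⟨a|φ⟩, the choice A = σ_z, B = σ_x, and the state |φ⟩ are assumptions made for illustration, and the integrals of the formal computation become finite sums.

```python
def inner(u, v):
    """Inner product <u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

s = 2 ** -0.5
# (eigenvalue, eigenvector) pairs of A = sigma_z and B = sigma_x on C^2.
A_eig = [(+1, [1, 0]), (-1, [0, 1])]
B_eig = [(+1, [s, s]), (-1, [s, -s])]
phi = [0.6, 0.8j]                      # normalised state, arbitrary choice

def kd(a_vec, b_vec, phi):
    """Kirkwood-Dirac value <phi|b><b|a><a|phi> (ordering is an assumption)."""
    return inner(phi, b_vec) * inner(b_vec, a_vec) * inner(a_vec, phi)

K = {(a, b): kd(av, bv, phi) for a, av in A_eig for b, bv in B_eig}

total = sum(K.values())                                   # normalisation
marg_B = {b: sum(K[(a, b)] for a, _ in A_eig) for b, _ in B_eig}
marg_A = {a: sum(K[(a, b)] for b, _ in B_eig) for a, _ in A_eig}

# 'Conditional average' of A given B = b under the KD distribution.
cond_avg = {b: sum(a * K[(a, b)] for a, _ in A_eig) / marg_B[b]
            for b, _ in B_eig}

# Direct evaluation of <b|A|phi> / <b|phi>, using A = diag(+1, -1).
A_phi = [phi[0], -phi[1]]
weak = {b: inner(bv, A_phi) / inner(bv, phi) for b, bv in B_eig}

print(K)
print(cond_avg, weak)
```

The individual entries of K are complex, yet the total is unity, the marginals reproduce the Born probabilities of A and B, and the conditional average agrees with the directly computed ratio ⟨b|A|φ⟩/⟨b|φ⟩.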
As for the Wigner-Ville distribution, the fact that its values are purely real might lead one to think that it is in some way related to the parametrised conditional quasi-expectation for the choice α = 0. Indeed, one confirms under the formal computation
∫_R p W_ψ(x, p) dp / ∫_R W_ψ(x, p) dp = Re[⟨x|p̂|ψ⟩ / ⟨x|ψ⟩], (5.4)
that the conditional quasi-expectation of p̂ given x̂ = x for the choice α = 0 coincides with the, again so to speak, 'conditional average' of the momentum p̂ given the outcome x of the position x̂ under the Wigner-Ville distribution.
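The statement on the Wigner-Ville distribution can likewise be probed numerically. The sketch below uses the form (5.1) with units in which ħ = 1 and a chirped Gaussian wave-function ψ(x) = π^{−1/4} exp(−(1 − iβ)x²/2); the test function, the value of β, and the plain Riemann-sum quadrature are assumptions chosen for illustration. For this ψ one has Re[−iψ′(x)/ψ(x)] = βx, which is the value the 'conditional average' of the momentum should reproduce.

```python
import cmath
import math

BETA = 1.5  # chirp parameter of the illustrative wave-function (assumption)

def psi(x):
    """Normalised chirped Gaussian pi^(-1/4) exp(-(1 - i*BETA) x^2 / 2)."""
    return math.pi ** -0.25 * cmath.exp(-(1 - 1j * BETA) * x * x / 2)

def wigner(x, p, h=0.05, cutoff=8.0):
    """W(x, p) = (1/pi) * int conj(psi(x+y)) psi(x-y) exp(2ipy) dy (Riemann sum)."""
    n = round(cutoff / h)
    total = 0j
    for k in range(-n, n + 1):
        y = k * h
        total += psi(x + y).conjugate() * psi(x - y) * cmath.exp(2j * p * y)
    # The symmetric grid makes the sum real up to rounding errors.
    return (total * h / math.pi).real

def conditional_momentum(x, h=0.05, cutoff=10.0):
    """Return (marginal density at x, 'conditional average' of p given x)."""
    n = round(cutoff / h)
    ps = [k * h for k in range(-n, n + 1)]
    ws = [wigner(x, p) for p in ps]
    marginal = sum(ws) * h
    mean_p = sum(p * w for p, w in zip(ps, ws)) * h / marginal
    return marginal, mean_p

x0 = 0.7
marginal, mean_p = conditional_momentum(x0)
print(marginal, abs(psi(x0)) ** 2)   # marginal versus |psi(x0)|^2
print(mean_p, BETA * x0)             # conditional average versus Re[-i psi'/psi]
```

The position marginal agrees with |ψ(x)|², and the conditional average of the momentum agrees with βx, i.e. with the real part of the ratio ⟨x|p̂|ψ⟩/⟨x|ψ⟩ for this particular wave-function.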

Conditioned Measurement.
The above observation is instructive in guiding the direction of our analysis. Indeed, it would be natural to expect that measuring the meter system in view of QJP distributions of the pair of observables {Q, P} would allow us to extract the information of the target system in a form that is 'akin' to it, i.e., one might hope to obtain a QJP distribution of the target system whose 'conditional average' coincides with the conditional quasi-expectations E_α[A|B = b] of our interest. Guided by this formal argument and heuristic observation, in this section we analyse the CM scheme in terms of quasi-probabilities, or more specifically, in terms of 'conditional' quasi-probabilities. Now, as our previous arguments (in particular, those developed in Section 3.3.2) indicate, analysis directly on the level of probabilities is better performed in the space of generalised functions, rather than density functions or measures, if one is to conduct it with decent mathematical rigour and generality. This becomes especially crucial when introducing 'quasi-joint-probabilities' of a pair of (not necessarily simultaneously measurable) quantum observables, which is one of the main themes of this paper and is examined in depth in Section 6. However, since the present authors have judged the theory of generalised functions to be beyond the scope of this paper as a tool for analysis, we work exclusively in the space of complex measures and density functions as usual. While this treatment comes with some unavoidable compromise on the generality of the results and some loss of transparency in the line of argument, we hope that we may still convey the essence of the contents.

Conditioned Measurement in View of the WV Distributions.
In this section, the target of our measurement is the QJP distribution of the pair of observables Q and P on the meter, and we study how one may extract information on the configuration of the target system from this viewpoint. Now, as one may realise from the two concrete classical proposals given above (namely, the WV distribution and the KD distribution), there exists an indefiniteness/arbitrariness in the choice of such distributions, and by its very nature, one may equally conduct the analysis in view of any distribution of one's own selection. In this section, we analyse the CM scheme exclusively in terms of the Wigner-Ville distribution. The primary reason for our choice is merely its familiarity in the physics community and, as mentioned above, the choice is essentially arbitrary. One may naturally conduct the same type of analysis in view of another type of quasi-probability distribution (e.g., the Kirkwood-Dirac type) in a similar manner and obtain analogous results, or may treat them collectively from a more general viewpoint (more on this in Section 6).

Reference Materials
As usual, we first make a brief review on the basic concepts and facts that are used in our later discussion. It is immediate that the map µ( · |B) is itself a probability measure satisfying the relation Conditional Probability given a Sub-σ-Algebra.
We now intend to generalise the elementary definition above to suit our further needs. In parallel to the manner we have done for conditional expectations in the previous section, let B ⊂ A be a sub-σ-algebra, and for each measurable set A ∈ A, we define the conditional probability of A given B by where χ A is the characteristic function ( This clarifies the relation between the general definition (5.7) and the elementary definition (5.6).
Conditional Probability given a Function. Now, under the condition above, instead of being given a sub-σ-algebra, suppose that one is given a measurable function g : X → Y for conditioning. We thus define µ(A|g) := µ(A|I(g)), A ∈ A, (5.9) to be the conditional probability of A given g, where I(g) is the initial σ-algebra of g (see (4.9) for its definition), and also introduce µ(A|g = y) := µ(A|g)(y), y ∈ R, (5.10) of which notation involves subtlety regarding the choice of the representative, in parallel to the situation of conditional expectations we have seen earlier.

Conditional Probabilities as Equivalence Classes of Functions.
Given a probability space (X, A, µ) and a sub-σ-algebra B ⊂ A, the conditional probability µ( · |B) satisfies properties analogous to those of probability measures, namely (i) µ(∅|B) = 0, µ(X|B) = 1, (ii) 0 ≤ µ(A|B) ≤ 1, A ∈ A, (iii) for any sequence (A_n)_{n≥1} of pairwise disjoint sets in A, the equality µ(⋃_{n≥1} A_n | B) = Σ_{n≥1} µ(A_n|B) holds. However, the key distinction from the usual probability measures is that the above (in)equalities are guaranteed to hold only almost everywhere, since, by definition, conditional probabilities are equivalence classes of functions. It is thus of natural interest whether we could lift this limitation by dropping the qualification 'almost everywhere', which one may occasionally find troublesome.
To this end, we first recall the definition of transition kernels. Let (X, A) and (Y, B) be measurable spaces. We call a map K : X × B → [0, ∞] that satisfies the conditions (i) the map x → K(x, B) is A-measurable for every B ∈ B, (ii) the map B → K(x, B) is a measure on (Y, B) for every x ∈ X, a transition kernel from (X, A) into (Y, B). A transition kernel is said to be (σ-)finite if the map B → K(x, B) is (σ-)finite for all x ∈ X. If K is normalised to unity, K(x, Y) = 1 for all x ∈ X, we say that K is a transition probability kernel. Given a σ-finite transition kernel holds. The following theorem is of much use.
Theorem (Transition Kernels into Measures on Product Spaces). Let K : X × B → [0, ∞] be a σ-finite transition kernel from (X, A) into (Y, B), and let µ be a measure on (X, A). Then, there exists a measure π on the product space (X × Y, A ⊗ B) such that ∫ f dπ = ∫_X ( ∫_Y f(x, y) K(x, dy) ) dµ(x) for all f ∈ M+(A ⊗ B). If, moreover, both µ and K happen to be finite, then π is the unique finite measure on the product space satisfying π(A × B) = ∫_A K(x, B) dµ(x) for all A ∈ A, B ∈ B.
This provides us with a convenient way to construct a measure on the product space given a transition kernel and a measure.
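On finite spaces the construction of a measure on a product space from a measure and a transition kernel reduces to elementary sums. The following Python sketch, with hypothetical two-point spaces and arbitrarily chosen weights, builds π({(x, y)}) = µ({x}) · K(x, {y}) and compares the integral of a function with respect to π against the iterated sum over the kernel and then over µ.

```python
from fractions import Fraction as F

# A measure on X = {x1, x2} and a transition probability kernel into
# Y = {y1, y2}; all numerical values are illustrative.
mu = {"x1": F(1, 4), "x2": F(3, 4)}
K = {"x1": {"y1": F(1, 2), "y2": F(1, 2)},
     "x2": {"y1": F(1, 3), "y2": F(2, 3)}}

def product_measure(mu, K):
    """pi({(x, y)}) = mu({x}) * K(x, {y}) on the finite product space."""
    return {(x, y): mu[x] * K[x][y] for x in mu for y in K[x]}

def integrate(f, pi):
    """Integral of f with respect to pi."""
    return sum(f(x, y) * w for (x, y), w in pi.items())

def iterated(f, mu, K):
    """Inner sum over the kernel, then outer sum over mu."""
    return sum(mu[x] * sum(f(x, y) * K[x][y] for y in K[x]) for x in mu)

pi = product_measure(mu, K)
f = lambda x, y: 2 * (x == "x1") + 3 * (y == "y2")

print(integrate(f, pi), iterated(f, mu, K))
print(pi[("x2", "y1")])          # pi(A x B) for A = {x2}, B = {y1}
```

Exact rational arithmetic is used so that the two ways of integrating can be compared with equality rather than up to rounding.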

Conditional Probability Measures.
We now return to our main line of arguments, and first introduce the definition of conditional probability measures.
Definition (Conditional Probability Measure). Let (X, A, µ) be a probability space, and let B ⊂ A be a sub-σ-algebra. We call a transition probability kernel K : X × B → [0, 1] a conditional probability measure (or a regular version) of the conditional probability µ( · |B) given B, if the map x → K(x, A) happens to be a representative of µ(A|B) for all A ∈ A, namely holds, where the brackets around an element denote its equivalence class. If such a transition probability kernel exists, we customarily denote it with the same notation µ( · |B), and its images are in turn denoted as interchangeably, depending on the aesthetics of the formula in which it should appear.
The presence of conditional probability measures allows us to readily make a connection between conditional expectations (defined previously in (4.5)) and averages with respect to conditional probabilities under consideration.
Proposition (Conditional Expectations as Averages over Conditional Probability Measures). Let (X, A, µ) be a probability space, B ⊂ A be a sub-σ-algebra, and suppose that the conditional probability µ( · |B) has a conditional probability measure. Then, for every µ-integrable function f , the map is a representative of the conditional expectation of f given B.
We note that conditional probability measures do not necessarily exist for general measure spaces. However, fortunately for us, the case (X, A) = (R n , B n ) that we are interested in is known to always admit it.

Conditional Probability Distributions.
Given a probability space (X, A, µ) and a sub-σ-algebra B ⊂ A, suppose that a measurable map f : (X, A) → (X′, A′) is moreover given.
In parallel to what we have seen for conditional expectations, this allows us to define an equivalence class of functions for all A ′ ∈ A ′ . Then, a transition probability kernel K : is called a conditional probability distribution of f given B. Likewise, given another measurable map g : (X, A) → (Y ′ , B ′ ), a transition probability kernel K : X × A ′ → [0, 1] from (X, I(g)) into (X ′ , A ′ ) satisfying is called a conditional probability distribution of f given g. Such transition probability kernels do not necessarily exist in general, but as above, the case (X ′ , A ′ ) = (R n , B n ) that we are interested in is known to always admit it.

Conditioning in Quantum Measurements.
Under the context of quantum measurements, let A and B be a pair of simultaneously measurable observables. Given a joint-probability distribution µ φ A,B of A and B on some quantum state |φ ∈ H, we introduce where π A (a, b) = a and π B (a, b) = b are measurable functions (projections) respectively representing the behaviour of the measurement outcomes of A and B. Accordingly, we define the conditional probability distribution of A given B to be a transition probability kernel which, as guaranteed above, is known to always exist. The values of the conditional probability distribution of A given B are in turn denoted interchangeably by depending on the context. Accordingly, in this section we employ the renormalised L p -norm and the convolution defined by the renormalised Lebesgue-Borel measure, For brevity, we occasionally write dm 1 = dm whenever there is no risk for confusion. Now, for a function f ∈ L 1 (R n ), recall that the functionsf ,f : R n → C defined bŷ with the scalar product q, x := n k=1 q k x k of two real vectors in R n , are respectively called the Fourier transform and the inverse Fourier transform of f . The C-linear map F that maps f to its Fourier transformf is called the Fourier transformation. It is known that the Fourier transformation is injective, i.e.f =ĝ implies f = g. For f, g ∈ L 1 (R n ), the following properties under the convolution (5.28), scaling (3.96), and translation (3.119), (τ a f )(t) = e i a,x f (t), a ∈ R, (5.33) respectively, are basic. The Fourier transformation F plays particularly well on the subspace S (R n ) ⊂ L 1 (R n ), where it becomes a linear bijection of S (R n ) onto S (R n ), whose inverse is given by the inverse Fourier transformation (recall, on the other hand, that one does not necessarily havef ∈ L 1 (R n ) for f ∈ L 1 (R n ) in general). One then has for f ∈ S (R n ), where we have used the multi-index γ := (γ 1 , . . . , γ n ) ∈ N n 0 as in (2.24) and introduced the shorthand |γ| := γ 1 + · · · + γ n .
On the other hand, if we consider the Fourier transform ofω ψ (x, y) with respect to its second parameter y,W ψ (x, p) := R e −ipyωψ (x, y) dm(y) = R ψ * (x + y/2)ψ(x − y/2)e ipy dm(y), (5.39) we readily find that it is a real function,  Applying Plancherel's theorem, one finds thatW ψ ∈ L 1 (m 2 ), and thus its total integration reads If the total integration (5.42) is non-vanishing, which is equivalent to the condition ψ = 0, the real quasi-probability density function denoted by is called the Wigner-Ville distribution on ψ. As we have seen in (5.41), the WV distribution possesses useful properties for our analysis, namely, that its marginals yield the probability density function describing the behaviour of the measurement outcomes of the respective observablesx andp on the state ψ, which is to say that if explicitly written down. Thus, the choice ψ ∈ L 1 (R) ∩ L 2 (R) defines a complex measure For our later argument we note that, since the functions ω ψ (x, y) and W ψ (x, y) are mapped to one another by Fourier transformation, they just represent the same contents seen from different viewpoints, and are thus essentially the same object.

Conditioned Measurement
We are now interested in simultaneously measuring the probability measure of B on the target system and a QJP distribution of Q and P on the meter system. This should be possible since local measurements on separate systems can always be performed simultaneously, which leads to the existence of a joint distribution of the probability measure of B on one side and a QJP distribution of Q and P on the other. Throughout this section, for definiteness, we exclusively treat the special case in which the meter system is described by the one-dimensional Schrödinger representation {L²(R), S(R), {x̂, p̂}} of the CCR, and choose Y = p̂ without loss of generality.

5.2.1. Conditioning over Quasi-probabilities. Since we are now dealing with complex measures, the definitions for conditioning must be extended accordingly. To this end, we first introduce some terminology.
Definition (Quasi-probabilities). Let (X, A) be a measurable space. We call a complex measure ν on (X, A) satisfying the normalisation condition ν(X) = 1 a quasi-probability measure, and accordingly the triplet (X, A, ν) a quasi-probability space.
If the underlying space is given by (X, A) = (R n , B n ), and the quasi-probability measure ν happens to be absolutely continuous, we call its density dν/dβ n ∈ L 1 (R n ), which is in general a complex function that has the total integration of unity, a quasi-probability density function.
According to the definition, note that the usual (i.e., real and non-negative) probability measures and density functions are special members of the respective families of quasi-probability measures and density functions. In analogy to the standard probability spaces, given a quasi-probability space (X, A, ν) and a ν-integrable function f, we occasionally denote the total integration by E[f; ν] := ∫_X f dν (5.47) and call it the quasi-expectation value of f under ν.
Quasi-joint-probabilities. As a special subclass of quasi-probability measures, we say that a quasi-probability measure ν ∈ M_C(B^n) qualifies as a QJP distribution of the observables A_1, . . . , A_n on the state |φ⟩ ∈ H, if it satisfies ν(R^{k−1} × Δ × R^{n−k}) = µ^φ_{A_k}(Δ), Δ ∈ B^1, for all 1 ≤ k ≤ n. In parallel, we use the term QJP density function for those ν that are absolutely continuous²⁴. One confirms from (5.46) that, for the choice of the quantum state ψ ∈ L¹(R) ∩ L²(R), the quasi-probability measure (5.45) qualifies as a QJP distribution for the pair of observables Q and P. It should be intuitively straightforward to see from the formal arguments made in the introduction that the Kirkwood-Dirac distribution also qualifies as a QJP distribution of the pair of observables under consideration.
Conditional Quasi-expectations. We next introduce analogous definitions regarding conditioning on quasi-probability spaces (X, A, ν). To this end, we make some important remarks on the differences between standard probability measures and quasi-probability measures. Recall that we have made extensive use of the Radon-Nikodým theorem in defining conditional expectations and conditional probabilities. In applying it, first note that positivity of the measure µ is necessary in order for the Radon-Nikodým derivative dν/dµ of a complex measure ν ≪ µ to be well-defined. Hence, conditioning by a sub-σ-algebra B ⊂ A must be such that the restriction ν|_B is a (positive) measure. The second fact to notice is that, for a ν-integrable function f, the complex measure f ⊙ ν on the sub-σ-algebra defined by (f ⊙ ν)(B) := ∫_B f dν, B ∈ B, is not necessarily absolutely continuous with respect to the restriction ν|_B, in contrast to the case of positive measures. With these in mind, we define:
Definition (Conditional Quasi-expectation). Let (X, A, ν) be a quasi-probability space, and B ⊂ A a sub-σ-algebra such that the restriction ν|_B is a probability measure (i.e., real and non-negative). For a ν-integrable function f such that f ⊙ ν ≪ ν|_B, we define the conditional quasi-expectation E[f|B] of f given B as the Radon-Nikodým derivative of the complex measure f ⊙ ν with respect to the measure ν|_B.
Given another measurable function g : X → R such that the above conditions are fulfilled for the initial σ-algebra B = I(g), we define E[f |g] and any other relevant notations such as E[f |g = y] etc. in an analogous manner to those defined for standard probability measures.
We then introduce a complex analogue of conditional probabilities defined for quasi-probability measures.
Definition (Quasi-Conditional Probabilities). Let (X, A, ν) be a quasi-probability space, and B ⊂ A a sub-σ-algebra such that the restriction ν|_B is a probability measure. For a measurable set A ∈ A, we define ν(A|B) := E[χ_A|B] (5.52) to be the quasi-conditional probability of A given B, whenever the r. h. s. is well-defined. If, instead of being given a sub-σ-algebra, one is given a measurable function g : X → Y for conditioning, we define ν(A|g) := ν(A|I(g)), A ∈ A, (5.53) where I(g) is the initial σ-algebra of g, whenever, as usual, the r. h. s. is well-defined.
The conditional quasi-probability ν( · |B) satisfies properties analogous to those of quasi-probability measures, namely (i) ν(∅|B) = 0, ν(X|B) = 1, (ii) ν(A|B) ∈ C, A ∈ A, (iii) for any sequence (A_n)_{n≥1} of pairwise disjoint sets in A, the equality ν(⋃_{n≥1} A_n | B) = Σ_{n≥1} ν(A_n|B) holds, whenever every component above is well-defined. In parallel to conditional probabilities, the (in)equalities above are valid only in the sense of ν|_B-a.e.
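On a finite quasi-probability space, where the conditioning function g has finitely many values each of non-zero weight, the Radon-Nikodým derivatives above reduce to ratios of sums. The following Python sketch illustrates this; the four complex weights are an arbitrary illustrative choice (they sum to unity and have real, non-negative restriction to the σ-algebra generated by g), and the function names are hypothetical.

```python
# Complex weights nu({(a, b)}) on X = {+1, -1} x {+1, -1} (illustrative values).
nu = {(+1, +1): 0.18 - 0.24j, (-1, +1): 0.32 + 0.24j,
      (+1, -1): 0.18 + 0.24j, (-1, -1): 0.32 - 0.24j}

def g(point):
    """Conditioning function: projection onto the second coordinate."""
    return point[1]

def restricted(nu, y):
    """nu({g = y}); conditioning requires this to be real and non-negative."""
    return sum(w for pt, w in nu.items() if g(pt) == y)

def cond_quasi_prob(nu, event, y):
    """nu(A | g = y) = nu(A & {g = y}) / nu({g = y}) on a finite space."""
    joint = sum(w for pt, w in nu.items() if pt in event and g(pt) == y)
    return joint / restricted(nu, y)

def cond_quasi_exp(nu, f, y):
    """E[f | g = y] = (f . nu)({g = y}) / nu({g = y}) on a finite space."""
    return sum(f(pt) * w for pt, w in nu.items() if g(pt) == y) / restricted(nu, y)

X = set(nu)
A1 = {(+1, +1), (+1, -1)}        # event 'first coordinate is +1'
A2 = X - A1                      # its complement, disjoint from A1

for y in (+1, -1):
    print(y, restricted(nu, y),
          cond_quasi_prob(nu, A1, y),
          cond_quasi_exp(nu, lambda pt: pt[0], y))
```

In this example the quasi-conditional probabilities are genuinely complex, while properties (i) and (iii) hold: the empty set receives 0, the whole space receives 1, and the values on the disjoint events A1 and A2 add up to the value on their union.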

Conditional Quasi-probability Measures.
We now expand the definition of transition kernels to fit into the theory of complex measures. Let (X, A) and (Y, B) be measurable spaces. We say that a map K : X × B → C that satisfies the conditions (i) the map x → K(x, B) is A-measurable for every B ∈ B, (ii) the map B → K(x, B) is a complex measure on (Y, B) for every x ∈ X, a complex transition kernel from (X, A) into (Y, B). If a complex transition kernel K satisfies K(x, Y ) = 1 for all x ∈ X, we call such K a transition quasi-probability kernel. The following analogous result is of use.

Proposition 5.1 (Complex Transition Kernels into Complex Measures on Product Spaces).
Let K : X × B → C be a complex transition kernel from (X, A) into (Y, B), and let µ be a measure on (X, A). Then, there exists a complex measure π on the product space for all f , whenever the integration on the r. h. s. is well-defined. In particular, the complex measure π satisfies Armed with the above concepts, we thus introduce: Definition (Conditional Quasi-Probability Measure). Let (X, A, ν) be a quasi-probability space, and let B ⊂ A be a sub-σ-algebra such that the restriction ν| B becomes a probability measure, and that ν(A|B) is well-defined for all A ∈ A. We call a transition quasi-probability kernel K : X × B → C a conditional quasi-probability measure of the conditional quasiprobability ν( · |B), if the map x → K(x, A) happens to be a representative of ν(A|B) for all A ∈ A, namely K( · , A) ∈ ν(A|B) , A ∈ A (5.57) holds, where the brackets around an element denote its equivalence class. If such a transition quasi-probability kernel exists, we customarily denote it with the same notation ν( · |B), and its images are in turn interchangeably denoted by depending on the aesthetics of the formula in which it should appear.
As above, such transition quasi-probability kernels do not exist in general, while the case (X, A) = (R n , B n ) is known to always admit it. We then have: Proposition (Conditional Quasi-expectations as Averages over Conditional Quasi-probability Measures). Let (X, A, ν) be a quasi-probability space, B ⊂ A be a sub-σ-algebra such that the restriction ν| B becomes a probability measure, and suppose that the conditional quasiprobability ν( · |B) has a conditional quasi-probability measure. Then, for every ν-integrable function f , the map is a representative of the conditional quasi-expectation of f given B.

Conditional Quasi-probability Distributions.
On a quasi-probability space (X, A, ν), suppose that a measurable map f : (X, A) → (X ′ , A ′ ) is moreover given. Choosing a sub-σ-algebra B ⊂ A such that the restriction ν| B is a measure, this allows us to define an equivalence class of functions for all A ′ ∈ A ′ , whenever they are well-defined. Then, a transition quasi-probability kernel is called a conditional quasi-probability distribution of f given B. Likewise, given another measurable map g : (X, A) → (Y ′ , B ′ ) such that the restriction of ν over its initial σ-algebra I(g) is a measure, a transition probability kernel K : is called a conditional quasi-probability distribution of f given g.

Conditioned Measurement via the WV Distributions
Now that we have prepared the necessary concepts and results, we may embark on our analysis. By measuring B locally on the target system on one side, and a specific QJP distribution of Q and P locally on the meter system on the other, we obtain a quasi-probability distribution that describes the joint behaviour of the target system and the meter system. If, perchance (e.g., by choosing the right initial state |ψ⟩ ∈ K), the QJP distribution of Q and P on the meter admits a representation by a complex measure, the total quasi-probability distribution of both the target and the meter system also admits a representation by a complex measure. We thus generally define the CM scheme as an act of measuring the conditional quasi-probability distribution of the 'joint outcome' of Q and P of the meter system given the outcome of the conditioning observable B on the target system.

WV Distribution.
To demonstrate our point with an example, we shall from now on exclusively concentrate on the Wigner-Ville distribution for our choice of the QJP distribution of Q and P for definiteness. Since the choice ψ ∈ L 1 (R) ∩ L 2 (R) of the initial meter state allows the WV distribution to be described by quasi-probability measures on (R 2 , B 2 ), we assume such special choice throughout this passage in order to remain contained in the framework of measure and integration theory (so that we may not have to deal with the theory of generalised functions). In this subsection, the CM scheme is studied in view of the WV distribution. We first start by transcribing the CM scheme, which was initially introduced in terms of vectors and operators on Hilbert spaces, into the description by quasi-probability density functions on R 2 . It is then found that the transcription allows a much simpler expression in view of its Fourier transform (rather than the WV distribution itself), in which the description of the meter system after the interaction is given precisely by the convolution of the configuration of both the meter and the target system, quite analogous to the case of the UM scheme that we have previously seen. This allows us to extract the information of the target system either by means of deconvolution discussed earlier (specifically by constructing an approximate identity on the meter system), or by probing the behaviour of the distribution around the origin g = 0. We shall then investigate the properties of the information of the target system we have just obtained, and find that this qualifies as a 'conditional quasi-probability distribution of A given B', of which the average has a connection to the conditional quasi-expectation of A given B introduced earlier.

Preliminary Observation.
As a preliminary observation, we start by assuming that the target observable $A$ has a spectrum consisting of a finite number of eigenvalues $\sigma(A) = \{a_1, \dots, a_N\}$ so that its spectral decomposition reads (2.76). For ease of argument, we further assume that the conditioning observable $B$ also has a spectrum consisting of a finite number of eigenvalues $\sigma(B) = \{b_1, \dots, b_M\}$, that every eigenvalue of $B$ is non-degenerate, i.e., $\Pi_{b_m} = |b_m\rangle\langle b_m|$ for some normalised vectors $|b_m\rangle \in \mathcal{H}$ for all $1 \le m \le M$, and moreover that $\|\Pi_b \phi\|^2 \neq 0$ for all $b \in \sigma(B)$. As for the state preparation, let $\psi \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ be a wave-function of the meter system with normalisation $\|\psi\|_2 = 1$ so that the WV distribution can be represented by a quasi-probability density function, and we also let the initial selection $|\phi\rangle \in \mathcal{H}$ of the target system be normalised, $\|\phi\| = 1$.
Computing the WV Distribution.
We are now interested in measuring the WV distribution of the meter system given the outcome of $B$ on the target system. Since both measurements are local measurements performed on the respective systems, this should be statistically equivalent to measuring the WV distribution of the, so to speak, 'conditional' meter state$^{25}$ given the outcome $b$ of $B$. Our analysis thus reduces to computing the WV distribution of the density operator $\psi^{g}_{B=b}$ for each of the outcomes $b \in \sigma(B)$. In our case, in which we assume that the eigenvalues of $B$ are all non-degenerate, the density operator (5.63) in fact becomes a pure state, whose representation by wave-functions is given by (5.65), where we have used (3.90) to obtain the first equality. Here, we have introduced an auxiliary quasi-probability measure $\nu_b$ defined by means of the spectral measure $E_A$ of $A$, the initial state $|\phi\rangle \in \mathcal{H}$ of the target system, and the outcome $b \in \sigma(B)$ of the conditioning observable, and have used a result analogous to (3.64) in the last equality. One then finds that the WV distribution of the meter wave-function (5.65) is given by (5.67), where we have introduced the product measure $\nu_b^{*} \otimes \nu_b$ of $\nu_b$ and its complex conjugate$^{26}$ in the last equality. In order to gain a better view of our findings, let us now change variables according to a linear transformation $T$. Since $\det T = 1/2 \neq 0$, the transformation $T \in GL(2, \mathbb{R})$ is invertible, i.e., it is a linear automorphism. We then introduce the quasi-probability measure $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ of (5.70), defined on the measurable space $(\mathbb{R}^2, \mathcal{B}^2)$ as the image measure (see (3.9) for the definition of image measures) of the product complex measure with respect to the automorphism $T$ (we shall shortly return to the properties of the quasi-probability measure (5.70) and the justification of its notation).
Then, due to the change of variables formula (3.10), one may rewrite our previous findings (5.67) in terms of the variables $a'_1 = a_1 - a_2$ and $a'_2 = a_1 + a_2$, where the change of the order of integration in the second equality is guaranteed by Fubini's theorem. For later convenience, we introduce the complex number $a \in \mathbb{C}$ defined by $a := a_1 + i a_2$, identifying $\mathbb{C} \cong \mathbb{R}^2$ in the usual manner, and rewrite the result accordingly to obtain (5.72). To sum up, here we have learned how the CM scheme may be rewritten in terms of quasi-probability measures, in which the WV distribution of the initial meter wave-function $\psi$ is acted upon by the quasi-probability measure (5.70) of the target system to yield the final WV distribution of the meter wave-function $\psi^{g}_{B=b}$.
26 For a pair of complex measures $\mu$ and $\nu$, by observing that $\mu \ll |\mu|$ and $\nu \ll |\nu|$, we define the product complex measure of $\mu$ and $\nu$ by $(\mu \otimes \nu)(\Delta) := \int_{\Delta} \frac{d\mu}{d|\mu|}(x)\, \frac{d\nu}{d|\nu|}(y)\; d(|\mu| \otimes |\nu|)(x, y)$.
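The role of the auxiliary measure can be made concrete in a finite-dimensional toy case. The following is a minimal sketch, assuming that the auxiliary measure takes the form $\nu_b(\{a_k\}) = \langle b, E_A(\{a_k\})\phi\rangle / \langle b, \phi\rangle$ with rank-one spectral projections (an assumption, since the defining display is not reproduced here). The qubit data `eigvals`, `eigvecs`, `phi` and `b` are hypothetical values chosen for illustration only. Under that assumption, the weights sum to unity and their first moment equals the weak value $\langle b, A\phi\rangle / \langle b, \phi\rangle$.

```python
def inner(u, v):
    """Inner product <u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def nu_b(eigvecs, phi, b):
    """Assumed form nu_b({a_k}) = <b, a_k><a_k, phi> / <b, phi> for rank-one E_k."""
    norm = inner(b, phi)
    return [inner(b, a) * inner(a, phi) / norm for a in eigvecs]

# Hypothetical qubit data: A = diag(+1, -1) in the computational basis.
eigvals = [1.0, -1.0]
eigvecs = [[1 + 0j, 0j], [0j, 1 + 0j]]
r = 2 ** -0.5
phi = [r + 0j, r * 1j]      # initial state of the target system
b = [0.6 + 0j, 0.8 + 0j]    # eigenvector of the conditioning observable B

weights = nu_b(eigvecs, phi, b)
mean = sum(a * w for a, w in zip(eigvals, weights))

# Direct evaluation of <b, A phi> / <b, phi> for comparison.
a_phi = [eigvals[0] * phi[0], eigvals[1] * phi[1]]
weak_value = inner(b, a_phi) / inner(b, phi)
```

For this choice the weights are complex ($0.36 - 0.48i$ and $0.64 + 0.48i$), which illustrates why the object is a quasi-probability rather than a probability.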
Changing the Viewpoint through Fourier Transformation. One finds below that the transcription (5.72) of the CM scheme admits a much simpler expression when described in terms of the inverse Fourier transform (5.38) of the WV distribution, rather than the WV distribution itself. Introducing the (yet to be normalised) function $\tilde{\omega}_{\psi^{g}_{B=b}}(x, y)$, $g \in \mathbb{R}$, uniquely specified through the relation (5.73) (cf. injectivity of the Fourier transformation), the goal of this small paragraph is to show that our finding (5.72) is equivalent to (5.74), which is essentially nothing but the convolution of the initial profile $\tilde{\omega}_{\psi}$ of the meter state with the two-dimensional quasi-probability measure $\Delta \mapsto \mu^{\phi}_{A}(\Delta|B = b)$ scaled by $g$. If, moreover, the total integration of $\tilde{\omega}_{\psi}$ happens to be non-vanishing, we may renormalise both sides of the above equality to obtain (5.75) for later use$^{27}$. Observe here the analogy with the unconditioned case (3.99): in both cases, the profile of the 'output' of the meter is given by the convolution of the profile of the 'input' of the meter and that of the target system scaled by $g$.
To verify our statement, one may simply repeat the previous argument to obtain the result directly, but it is actually easier to demonstrate that the Fourier transforms of the two sides of the above equality coincide. Indeed, the Fourier transform of the l.h.s. is nothing but $W_{\psi^{g}_{B=b}}$, which is just the definition (5.73). As for the r.h.s., one computes its Fourier transform directly, where the exchange of the order of integration (the first equality) is guaranteed by Fubini's theorem, and the last equality is due to (5.33). Combining the above two results and observing (5.72), the injectivity of the Fourier transformation leads to the desired statement. We emphasise again that both (5.72) and (5.75) represent the same contents seen from different viewpoints.
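The strategy of comparing Fourier transforms can be illustrated numerically in a one-dimensional analogue. The sketch below is not the two-dimensional WV setting of the text: it assumes a Gaussian input profile `gauss`, a hypothetical atomic complex measure `atoms`, and the sign convention $\hat{f}(q) = \int f(x) e^{-iqx}\,dx$ without further normalisation. It checks that the transform of the input profile convolved with the measure scaled by $g$ equals the product of the transform of the input and the transform of the measure evaluated at $gq$.

```python
import cmath
import math

def fourier(f, q, lo=-20.0, hi=20.0, n=4000):
    """Riemann-sum approximation of the integral of f(x) exp(-i q x) over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + k * dx) * cmath.exp(-1j * q * (lo + k * dx))
               for k in range(n + 1)) * dx

def gauss(x):
    """Input profile of the meter (illustrative choice)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Hypothetical atomic complex measure: (point, complex weight) pairs with total mass 1.
atoms = [(-1.0, 0.3 - 0.2j), (0.5, 0.7 + 0.2j)]
g = 0.8  # interaction parameter

def output_profile(x):
    """Convolution of gauss with the atomic measure scaled by g."""
    return sum(w * gauss(x - g * a) for a, w in atoms)

def mu_hat(q):
    """Fourier transform of the atomic measure."""
    return sum(w * cmath.exp(-1j * q * a) for a, w in atoms)

gaps = [abs(fourier(output_profile, q) - fourier(gauss, q) * mu_hat(g * q))
        for q in (0.0, 0.7, 1.5)]
```

The entries of `gaps` measure the discrepancy between the two sides at a few sample frequencies and should vanish up to numerical error.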

Recovery of the Target Profile
We are now interested in how one may recover the profile $\Delta \mapsto \mu^{\phi}_{A}(\Delta|B = b)$ of the target system for each $b \in \sigma(B)$ through the CM scheme. As one may expect, the procedure goes essentially analogously to that of the recovery of the probability measure $\mu^{\phi}_{A}$ in the case of the UM scheme demonstrated in Section 3.2. Recalling the techniques employed there, and by introducing the rescaling $\upsilon_{\psi}(x, y) := 2^{-1}\, \omega_{\psi}(x, 2y)$ (5.78) for ease of discussion, one may readily rewrite (5.75) into (5.79), or equivalently (5.80), where the subscript on the respective quasi-probability measures/density functions denotes the scaling (3.94) and (3.96), just as we have done for the case of the UM scheme (see (3.99) and (3.100)). In parallel to the UM case, these two expressions (5.79) and (5.80) correspond to the manner in which one combines the interaction parameter (3.89), where the former corresponds to the scaling of the target observable $A \to gA$, whereas the latter corresponds to the scaling of the pair of the meter observables $\{Q, P\} \to \{g^{-1}Q, gP\}$ (cf. (3.85) and (3.88)).
27 Here, note that we have used the general property of convolutions $\int_{\mathbb{R}^n} (f * g)\, d\beta^n = \int_{\mathbb{R}^n} f\, d\beta^n \int_{\mathbb{R}^n} g\, d\beta^n$, $f, g \in L^1(\mathbb{R}^n)$ (5.76) regarding integration.

Strong Conditioned Measurement.
We now intend to recover the quasi-probability measure ∆ → µ φ A (∆|B = b) by making use of the latter expression (5.80). The idea and the procedure are essentially the same as those we have employed in the unconditional case, namely, we manipulate both the interaction parameter g ∈ R × and the initial meter state ψ ∈ L 1 (R) ∩ L 2 (R) so that the scaling υ ψ g −1 of the inverse Fourier transform of the WV distribution tends towards the delta measure δ 0 centred at the origin 0 ∈ R 2 .
Recovery of the Conditional Quasi-joint-probability.
For the same reason discussed in Section 3.3.1, we assume throughout this passage:
• The target observable $A$ admits description by density functions.
• The total integration of $\tilde{\omega}_{\psi}$ is non-vanishing.
The first condition guarantees that the quasi-probability measure ∆ → µ φ A (∆|B = b), ∆ ∈ B 2 is absolutely continuous for all b ∈ σ(B), of which density we shall write The last condition is necessary in order to assure the well-definedness of ω ψ . Then, one sees from an analogous argument that we have previously made in Section 3.3.1 that, if one adjusts the pair of g ∈ R × and ψ ∈ L 1 (R) ∩ L 2 (R) so that υ ψ g −1 makes itself an approximate identity in L 1 (R 2 ), one may let the product of the convolution (i.e., the 'outcome') converge towards the desired target with respect to the L 1 -norm. A typical way to construct such an approximate identity is to start by preparing a compactly supported wave-function ψ, which automatically guarantees ψ ∈ L 1 (R) ∩ L 2 (R), and to consider a family {ψ (h) } h>0 of the initial meter state defined as in (3.124). One then finds and hence by observing that the above equality has total integration of |h| R 2ω ψ dm 2 , its normalisation becomes This should further lead to υ ψ(h) when scaled by g −1 . With the initial ω ψ (or equivalently υ ψ ) being compactly supported, one then sees that this indeed makes an example of an approximate identity, and we may thus achieve our objective by either narrowing the wave-function h → 0, by intensifying the interaction g −1 → 0 (g → ±∞) or by appropriately balancing both manoeuvres and letting hg −1 → 0 altogether.
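The convergence mechanism behind the strong scheme can be illustrated with a one-dimensional sketch. It assumes a standard Gaussian density as a stand-in for the target density and a compactly supported triangular kernel $k_h(x) = k(x/h)/h$ as a stand-in for the scaled meter profile; both are illustrative choices, not objects taken from the text. The $L^1$ distance between the density and its convolution with $k_h$ shrinks as the width $h$ decreases, which is the defining behaviour of an approximate identity.

```python
import math

def l1_error_after_smoothing(h, dx=0.01, half_width=10.0):
    """L1 distance between a density rho and rho convolved with a kernel of width h."""
    n = int(round(half_width / dx))
    xs = [k * dx for k in range(-n, n + 1)]
    rho = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
    m = int(round(h / dx))
    # Triangular kernel supported on [-h, h], renormalised to unit mass on the grid.
    kernel = [max(0.0, 1.0 - abs(j * dx) / h) for j in range(-m, m + 1)]
    total = sum(kernel) * dx
    kernel = [v / total for v in kernel]
    err = 0.0
    for i in range(len(xs)):
        acc = 0.0
        for j in range(-m, m + 1):
            if 0 <= i - j < len(xs):
                acc += rho[i - j] * kernel[j + m]
        # acc * dx approximates the convolution evaluated at xs[i].
        err += abs(acc * dx - rho[i]) * dx
    return err

# Narrowing the kernel plays the role of letting h / g tend to zero.
errors = [l1_error_after_smoothing(h) for h in (1.0, 0.5, 0.25)]
```

In this toy setting only the combined width of the kernel matters, which mirrors the remark that narrowing the wave-function and intensifying the interaction are interchangeable manoeuvres.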

Weak Conditioned Measurement.
We shall next investigate how the map (5.86) behaves locally around $g = 0$, and discuss what information on the target configuration one might reveal through it. In parallel to the case of the UM scheme discussed in Section 3.3.2, one finds below that the information on the configuration of the target system is encoded in the differential coefficients of the above map at $g = 0$, and that by knowing all the higher-order derivatives, one may fully recover the quasi-probability measure $\Delta \mapsto \mu^{\phi}_{A}(\Delta|B = b)$ of our interest.

Main Objective.
Throughout the following passage, we assume the following.
• The quasi-probability measure $\Delta \mapsto \mu^{\phi}_{A}(\Delta|B = b)$ has a compact support.
• The total integration of $\tilde{\omega}_{\psi}$ is non-vanishing, and its normalisation belongs to the Schwartz space, $\omega_{\psi} \in \mathcal{S}(\mathbb{R}^2)$.
These requirements are imposed primarily for the same reason as we have previously discussed in analysing the weak UM scheme in Section 3.3.2 (which, in short, is to say that we do not wish to get involved in the theory of generalised functions). A sufficient condition for the first and second assumptions would be to require, respectively, that the spectral measure $E_A$ be compactly supported, and that $\psi \in \mathcal{S}(\mathbb{R})$. Under such conditions, the main objective of this passage is to demonstrate the following proposition.
Proposition 5.2 (Weak Conditioned Measurement). Under the above conditions, the map (5.86) is arbitrarily many times strongly differentiable on the whole real line $\mathbb{R}$, and its $n$th derivative at the origin $g = 0$ is given by (5.87). Here, $\gamma = (\gamma_1, \gamma_2) \in \mathbb{N}_0^2$ is a multi-index introduced in (2.24), and the 'quasi-moments' under the quasi-probability measure $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ are given in explicit form by (5.88), where we understand $a = a_1 + i a_2 \in \mathbb{C}$.
Proof. Since the assumptions and reasonings are essentially the same as those provided for the unconditioned counterpart, we shall avoid reiteration and provide a rough sketch of the proof. In order to avoid clumsiness of notation, we write $\upsilon := \upsilon_{\psi}$, $\upsilon[g] := \upsilon_{\psi^{g}_{B=b}}$ and $\mu := \mu^{\phi}_{A}(\,\cdot\,|B = b)$ for simplicity, and denote by $\upsilon^{(n)}[g]$ the $n$th derivative of the map $g \mapsto \upsilon[g]$. We first prove that the $n$th derivative of $\upsilon[g]$ is given by (5.89). As above, we argue by mathematical induction. The case $n = 0$ is trivial. Suppose that the statement is true for $n \in \mathbb{N}_0$. Then, one may compute its point-wise derivative and subsequently prove its strong differentiability by employing the same technique as above. This completes our first step of the proof. Now, by taking $g = 0$ in (5.89), we obtain the claimed expression, which completes our proof.
One then immediately obtains the following corollary by applying the Stone-Weierstraß approximation theorem and the Riesz-Markov-Kakutani representation theorem.

Corollary 5.3 (Recovery of the Target Profile by Weak Conditioned Measurement).
The weak CM scheme (i.e., the knowledge of all the 'quasi-moments' (5.88)) allows us to uniquely specify the quasi-probability measure µ φ A ( · |B = b) of our interest. Compare these results to those obtained in the case of the weak UM scheme described in Section 3.3.2.
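A finite-dimensional analogue may help to see why moments can determine a measure. The sketch below makes a much stronger assumption than the corollary: the measure is atomic on the real line and its support `points` is known in advance, so that the complex weights follow from the first few moments by solving a Vandermonde system. All numerical values are hypothetical, and the small solver is a plain Gaussian elimination written out to keep the example self-contained.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small complex linear system."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for r in reversed(range(n)):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

points = [-1.0, 0.5, 2.0]                              # hypothetical known support
true_weights = [0.2 - 0.3j, 0.5 + 0.1j, 0.3 + 0.2j]    # complex weights with total mass 1

# Moments of order 0, 1, 2 play the role of the measured 'quasi-moments'.
moments = [sum(w * a ** n for a, w in zip(points, true_weights)) for n in range(3)]

# Row n of the Vandermonde matrix holds the nth powers of the support points.
vandermonde = [[a ** n for a in points] for n in range(3)]
recovered = solve(vandermonde, moments)
```

The general statement does not require knowing the support beforehand; the sketch only shows the simplest situation in which finitely many moments already suffice.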

Profile of the Target System
We have so far investigated how the CM scheme can be transcribed into the language of conditional quasi-probabilities, rather than in terms of mere conditional expectations. As a result, we found that the measurement outcome after the interaction incorporates two components: one being the profile of the meter system in the form of the WV distribution, and the other being that of the target system in the form of the quasi-probability measure $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ defined in (5.70). Specifically, in view of the (scaled) inverse Fourier transform of the WV distribution, we found that the manner in which the two components interact with each other admits a simple description by convolution (5.75), which is quite analogous to the unconditioned case. Based on our findings, we have thus analysed how one may recover the profile $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ by means of both the strong and weak CM schemes, whose procedures are also quite analogous to the unconditioned counterpart. We are now interested in the properties of the quasi-probability measure $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ we have obtained, which should be expected to convey some information on the target system.
Quasi-joint-probability Distribution of a Pair of Observables.
By means of either the strong or weak CM scheme, we have so far obtained the family of quasi-probability measures $\mu^{\phi}_{A}(\,\cdot\,|B = b)$ for all $b \in \sigma(B)$. Allowing it to extend to the whole real line, one may construct a complex transition kernel (5.92) from the space $(\mathbb{R}, \mathcal{B}^1)$ of the measurement outcomes of $B$ into $(\mathbb{C}, \mathcal{B}(\mathbb{C}))$. For definiteness, we assign to each $b \notin \sigma(B)$ an arbitrary quasi-probability measure, so that (5.92) defines a transition quasi-probability kernel as a whole. This allows us to construct a quasi-probability measure $\mu^{\phi}_{A,B}$ on the product space $(\mathbb{C} \times \mathbb{R}, \mathcal{B}(\mathbb{C}) \otimes \mathcal{B}^1)$, by combining the transition quasi-probability kernel (5.92) and the probability measure $\mu^{\phi}_{B}$, that satisfies (5.93), whose existence is guaranteed by Proposition 5.1. The target of our analysis in this passage is the quasi-probability measure (5.93). As one may expect from the notation employed, we shall shortly see that this qualifies as a QJP distribution of the target observable $A$ and the conditioning observable $B$.
Proposition 5.4 (Quasi-joint-probability Distribution). Under the definitions above, the quasi-probability measure $\mu^{\phi}_{A,B}$ qualifies as a QJP distribution of $A$ and $B$ in the sense of (5.48), namely, its marginals coincide with $\mu^{\phi}_{A}$ and $\mu^{\phi}_{B}$. Here, $\mu^{\phi}_{A}$ denotes the probability measure on $(\mathbb{C}, \mathcal{B}(\mathbb{C}))$ generated by the two-dimensional spectral measure associated to $A$ understood as a normal operator, whereas $\mu^{\phi}_{B}$ denotes the probability measure on $(\mathbb{R}, \mathcal{B})$ generated by the one-dimensional spectral measure associated to the self-adjoint operator $B$.
Proof. We start by demonstrating that the marginal of the quasi-probability measure µ φ A,B of the first term coincides with the probability measure µ φ A on (C, B(C)) generated by the spectral measure of A (seen as a normal operator). To this end, we first observe where we have used (5.70) in the last equality. Now, in order to proceed further, we then maintain that the measure is essentially the same object as the continuous C-linear map defined by in the sense of the Riesz-Markov-Kakutani representation theorem. The proof can be carried out in several ways, but for the sake of simplicity, we rather take an elementary approach.
Observing that any two measures on a product space $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ coincide with each other if they coincide on the subset $\mathcal{A} * \mathcal{B} \subset \mathcal{A} \otimes \mathcal{B}$ (see (3.29) for the definition), one verifies the claim on such product sets. Armed with this finding, we return to our original problem (5.95) and finally obtain the desired identity, where $E_{A,0}$ denotes the product spectral measure of the one-dimensional spectral measure $E_A$ of $A$ (as a self-adjoint operator) and that of the $0$ operator $E_0$ (i.e., the 'delta spectral measure' (3.102) centred at the origin), and the last equality is due to the observation that the two-dimensional spectral measure $\tilde{E}_A$ of $A$ as a normal operator coincides with the product spectral measure $\tilde{E}_A = E_{A,0}$ introduced above. This completes our proof for the marginal of the first term.
It now remains to compute the marginal of µ φ A,B of the second term, which one carries out as where the second equality is due to the definition of µ φ A,B , and the third equality is due to the fact that µ φ A (C|B = b) = 1 is normalised to unity (i.e., a quasi-probability measure) for all b ∈ σ(B).
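Both marginal computations can be checked in a small finite-dimensional example. The sketch assumes, as an interpretation of the construction above, that the conditional measure is built from the product of $\nu_b$ and its complex conjugate with $\nu_b(\{a_k\}) = \langle b, a_k\rangle\langle a_k, \phi\rangle / \langle b, \phi\rangle$; the qubit vectors are hypothetical. Averaging the product weights over the outcomes of $B$ then concentrates on the diagonal and reproduces the Born probabilities of $A$, while the weights of $B$ themselves sum to unity.

```python
def inner(u, v):
    """Inner product <u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

r = 2 ** -0.5
a_vecs = [[1 + 0j, 0j], [0j, 1 + 0j]]            # eigenvectors of A (hypothetical qubit)
b_vecs = [[r + 0j, r + 0j], [r + 0j, -r + 0j]]   # eigenvectors of B
phi = [0.6 + 0j, 0.8j]                           # normalised target state

def nu(b, k):
    """Assumed auxiliary weight nu_b({a_k}) for the outcome vector b."""
    return inner(b, a_vecs[k]) * inner(a_vecs[k], phi) / inner(b, phi)

prob_b = [abs(inner(b, phi)) ** 2 for b in b_vecs]   # Born probabilities of B
prob_a = [abs(inner(a, phi)) ** 2 for a in a_vecs]   # Born probabilities of A

# averaged[j][k] = sum over b of prob_b * conj(nu_b({a_j})) * nu_b({a_k})
averaged = [[sum(p * nu(b, j).conjugate() * nu(b, k) for p, b in zip(prob_b, b_vecs))
             for k in range(2)] for j in range(2)]
```

The vanishing off-diagonal entries of `averaged` are the finite-dimensional counterpart of the marginal being carried by the real axis.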
As for the relation between the QJP distribution µ φ A,B and the transition quasi-probability kernel (b, ∆ A ) → µ A (∆ A |B = b), one immediately has the following corollary by construction.
Corollary 5.5. The transition quasi-probability kernel (5.92) is a conditional quasi-probability distribution of $A$ given $B$ under the QJP distribution $\mu^{\phi}_{A,B}$.
Conditional Quasi-expectation of A given B.
It is now tempting to investigate how the 'conditional average' of the QJP distribution $\mu^{\phi}_{A,B}$ relates to the conditional quasi-expectation E α [A|B; φ] we have introduced earlier in (4.77). Proof. For the demonstration, let $b \in \sigma(B)$. A direct computation then yields the claim, where the second equality is due to the change of variables formula (3.10) for image measures.
Obtaining Conditional Quasi-probability Distribution by Conditioned Measurement.
We now realise that the CM scheme, in view of conditional quasi-probabilities, can be regarded as a method of obtaining conditional quasi-probability distributions of the target observable $A$ given the conditioning observable $B$, and that it implies the existence of underlying QJP distributions of a pair of (generally not necessarily simultaneously measurable) quantum observables. Moreover, we have seen a connection between the concept of conditional quasi-expectations and the 'conditional average' of the QJP distributions, which is reminiscent of the familiar relation between classical conditional expectations and conditional averages of probability measures. While we have conducted an analysis for the special case in which both $A$ and $B$ happen to possess spectra of finite cardinalities (and in which $B$ is non-degenerate), we note that one may suitably generalise the results obtained here by introducing appropriate mathematical tools and somewhat more advanced mathematical language.

Quasi-probabilities of Quantum Observables
By studying both the UM and CM schemes in depth throughout the preceding four sections, we have so far naturally arrived, by a purely bottom-up construction, at the concept of quasi-joint-probability (QJP) of an arbitrary pair of quantum observables. While such an operational way of demonstration has its own merit of being solid and down to earth, it has an apparent downside in that the line of argument lacks transparency and that the whole structure may become obscure on occasions. In this section, we conduct a top-down study of the topic as a complement to the analyses made in the preceding sections.
Organisation of this Section.
In this section, we first devote several pages to introducing some mathematical tools for our analysis as usual. We then propose a general prescription for the construction of QJP distributions of a given pair of quantum observables, and observe their basic properties. Since it is difficult to perform a general analysis on the whole class of all possible candidates of QJP distributions with full mathematical rigour due to the limited framework and tools available, for our demonstration we shall mostly concentrate on a special sub-family of such distributions parametrised by a single complex number, hopefully without loss of too much essence. We finally close this section by observing where the bottom-up line of discussion performed in Section 5 fits in this more general framework.

Reference Materials
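As usual, we first prepare some necessary mathematical tools for reference. As a generalisation of those defined on integrable functions, we now introduce Fourier transforms of complex measures. For a complex measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}^n)$, its Fourier transform $\hat{\mu}$ and inverse Fourier transform $\check{\mu}$ are the functions $\hat{\mu}(q) := \int_{\mathbb{R}^n} e^{-i\langle q, x\rangle}\, d\mu(x)$ and $\check{\mu}(q) := \int_{\mathbb{R}^n} e^{i\langle q, x\rangle}\, d\mu(x)$, $q \in \mathbb{R}^n$, where $\langle q, x\rangle := \sum_{k=1}^{n} q_k x_k$ denotes the scalar product on $\mathbb{R}^n$ as usual. Note that the functions $\hat{\mu}, \check{\mu}$ are well-defined, for indeed $|\hat{\mu}(q)| \le \int_{\mathbb{R}^n} |e^{-i\langle q, x\rangle}|\, d|\mu|(x) = \|\mu\| < \infty$ for all $q \in \mathbb{R}^n$, where $|\mu|$ and $\|\mu\|$ are respectively the variation and the total variation of $\mu$ (a similar evaluation holds for $\check{\mu}$).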

Basic Properties.
To see how this newly introduced definition of Fourier transforms relates to that of integrable functions introduced earlier, let $L^1(\mathcal{B}^n) \subset M_{\mathbb{C}}(\mathcal{B}^n)$ be the subalgebra of absolutely continuous complex measures with respect to $m_n$, where $m_n$ denotes the renormalised $n$-dimensional Lebesgue-Borel measure on $(\mathbb{R}^n, \mathcal{B}^n)$ defined in (5.26). Choosing $\mu \in L^1(\mathcal{B}^n)$ and letting $\rho := d\mu/dm_n$ be the Radon-Nikodým derivative of $\mu$, one finds by a direct application of (3.20) that $\hat{\mu}(q) := \int_{\mathbb{R}^n} e^{-i\langle q, x\rangle}\, d\mu(x) = \int_{\mathbb{R}^n} e^{-i\langle q, x\rangle} \rho(x)\, dm_n(x) =: \hat{\rho}(q)$ (6.3) holds as expected. An analogous relation holds for the inverse Fourier transform as well.
The C-linear map F that maps a complex measure into its Fourier transform is called the Fourier transformation. In parallel to that defined for integrable functions, the Fourier transformation on the measure algebra is injective, i.e.,μ =ν implies µ = ν. For µ, ν ∈ M C (B n ), the properties respectively.

Linear Transformation.
Let T be a linear operator on R n (i.e., an n × n real matrix), and let µ ∈ M C (B n ) be a complex measure. We then define the linear transform for a pair of linear operators S and T on R n , and that by the change of variables formula (3.10), whenever the integration exists. Note that the familiar scaling µ t , t = R defined in (3.94), and the translation τ a µ, a ∈ R n defined in (6.7) are respectively special cases of the linear transform of µ with respect to T = tI and T = I − a, where I denotes the identity operator. In such a cases, note also that the linear operators involved are automorphisms, hence members of the general linear group GL(n; R). In relation to the Fourier transformation, one finds where T * denotes the adjoint (in this case, the transpose T * = T t ) of the Matrix T .
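The interplay between linear transforms and the Fourier transformation is easy to verify for atomic measures. The sketch below assumes that the linear transform of $\mu$ by $T$ is the image measure (atoms moved by $T$, weights unchanged) and that the resulting relation is of the form $\widehat{T\mu}(q) = \hat{\mu}(T^{t} q)$; both are stated here as assumptions because the corresponding displays are not reproduced. The atoms and the matrix `T` are hypothetical values.

```python
import cmath

# Hypothetical atomic complex measure on R^2: (point, complex weight) pairs.
atoms = [((1.0, 0.0), 0.4 + 0.1j), ((0.5, -2.0), 0.6 - 0.1j)]
T = [[0.5, 0.5], [-0.5, 0.5]]   # an invertible matrix chosen for illustration

def apply(m, v):
    """Matrix-vector product for a 2 x 2 matrix."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def fourier(measure, q):
    """Fourier transform of an atomic measure with kernel exp(-i <q, x>)."""
    return sum(w * cmath.exp(-1j * (q[0] * x[0] + q[1] * x[1])) for x, w in measure)

# Image measure under T: each atom is moved to T x and keeps its weight.
image = [(apply(T, x), w) for x, w in atoms]

q = (0.3, -1.2)
lhs = fourier(image, q)
rhs = fourier(atoms, apply(transpose(T), q))
```

The identity rests only on $\langle q, Tx\rangle = \langle T^{t} q, x\rangle$, so it holds for any real matrix, invertible or not.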

Complex Conjugate.
We next review how the Fourier transform behaves under the operation of taking the complex conjugate of a complex measure. To this end, let $\mu$ be a complex measure on $(\mathbb{R}^n, \mathcal{B}^n)$, and define the complex conjugate of $\mu$ by $\mu^{*}(\Delta) := \mu(\Delta)^{*}$, $\Delta \in \mathcal{B}^n$, in a natural manner. One then readily finds (6.12), where $f^{\dagger}(x) := f^{*}(-x)$ denotes the involution of a function $f$.

Differentiation.
We finally make a brief note on the basic results regarding differentiability and derivatives of a Fourier transform of a complex measure at the origin.
Lemma 6.1. Let $\mu$ be a complex measure on $(\mathbb{R}^n, \mathcal{B}^n)$, and let $\gamma \in \mathbb{N}_0^n$ be a multi-index. If the integration of $x^{\gamma'}$ with respect to $\mu$ exists for all $0 \le \gamma' \le \gamma$, then the derivative $D^{\gamma}\hat{\mu}$ of the Fourier transform of $\mu$ exists at the origin, in which case it is given, up to a power of $-i$ fixed by the convention for $D^{\gamma}$, by the moment $\int_{\mathbb{R}^n} x^{\gamma}\, d\mu(x)$ (6.14).
Proof. One readily computes the derivative by differentiating under the integral sign, where the exchange of the differentiation and integration is a consequence of the dominated convergence theorem.
Compare this result to that for Schwartz functions (5.34).
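The content of the lemma can be checked numerically for an atomic complex measure on the real line. The sketch assumes the convention $\hat{\mu}(q) = \int e^{-iqx}\, d\mu(x)$ with ordinary derivatives, under which the $n$th derivative at the origin equals $(-i)^n$ times the $n$th moment; the atoms are hypothetical, and the derivatives are estimated by central finite differences.

```python
import cmath

# Hypothetical atomic complex measure on R: (point, complex weight) pairs.
atoms = [(-1.0, 0.2 - 0.3j), (0.5, 0.5 + 0.1j), (2.0, 0.3 + 0.2j)]

def mu_hat(q):
    """Fourier transform of the atomic measure."""
    return sum(w * cmath.exp(-1j * q * x) for x, w in atoms)

def moment(n):
    """nth moment of the atomic measure."""
    return sum(w * x ** n for x, w in atoms)

def central_difference(f, order, step=1e-2):
    """Central finite-difference estimate of the first or second derivative of f at 0."""
    if order == 1:
        return (f(step) - f(-step)) / (2 * step)
    return (f(step) - 2 * f(0.0) + f(-step)) / step ** 2

first = central_difference(mu_hat, 1)    # compare with (-i) * moment(1)
second = central_difference(mu_hat, 2)   # compare with (-i)^2 * moment(2)
```

The agreement is limited only by the discretisation error of the finite differences, which is of second order in `step`.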

Quasi-joint-probabilities of a Combination of Quantum Observables
We now intend to provide a general prescription for defining a QJP distribution of a combination of generally not necessarily simultaneously measurable quantum observables.

Preliminary Observations.
In this passage, we conduct some formal discussions on the topic of QJP distributions of a combination of quantum observables. Since a rigorous treatment requires advanced mathematical tools that are beyond the scope of this paper, we first conduct a formal and intuitive argument to obtain the essence of the idea. Before we embark on our main objective, we first recall a basic theorem regarding strong commutativity of $A$ and $B$ and that of their unitary operators.
Theorem. Let A and B be self-adjoint. Then, the following conditions are equivalent.
(i) The operators A and B strongly commute with each other.
(ii) The operators $e^{isA}$ and $e^{itB}$ commute with each other for all $s, t \in \mathbb{R}$, namely $e^{isA} e^{itB} = e^{itB} e^{isA}$, $s, t \in \mathbb{R}$ (6.16) holds.
This familiar theorem builds the starting point of our discussion that follows.

Fourier Transform of Product Spectral Measures.
Recall that the joint behaviour of the outcomes of an ideal measurement of a pair of simultaneously measurable observables $A$ and $B$ is governed by the product spectral measure $E_{A,B}$ of their respective spectral measures $E_A$, $E_B$ introduced earlier in (3.67). An important observation here is to see that the 'Fourier transform' of the product spectral measure $E_{A,B}$ is nothing but the product (6.16) of the parametrised unitary operators, $\int_{\mathbb{R}^2} e^{-i(sa + tb)}\, dE_{A,B}(a, b) = e^{-isA} e^{-itB} = e^{-i\,\overline{(sA + tB)}}$ (6.17), where the overline on the essentially self-adjoint operator $sA + tB$ denotes its unique self-adjoint extension as usual, and the second equality is due to the familiar Trotter formula.
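The contrast between commuting and non-commuting pairs can be seen already for $2 \times 2$ matrices. The sketch below uses the closed form $e^{-i\theta\, n\cdot\sigma} = \cos\theta\, I - i \sin\theta\, n\cdot\sigma$ for Pauli matrices; the pairs $(\sigma_z, 2\sigma_z)$ and $(\sigma_z, \sigma_x)$ and the parameter values are illustrative choices, not examples taken from the text.

```python
import math

def exp_pauli(theta, n):
    """exp(-i theta (n . sigma)) = cos(theta) I - i sin(theta) (n . sigma), n a unit vector."""
    nx, ny, nz = n
    c, s = math.cos(theta), math.sin(theta)
    return [[c - 1j * s * nz, -1j * s * (nx - 1j * ny)],
            [-1j * s * (nx + 1j * ny), c + 1j * s * nz]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def distance(p, q):
    """Largest entrywise difference between two 2 x 2 matrices."""
    return max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))

Z, X = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
s, t = 0.7, 1.1

# Commuting pair A = sigma_z, B = 2 sigma_z: the product equals exp(-i (sA + tB)).
commuting_gap = distance(matmul(exp_pauli(s, Z), exp_pauli(2 * t, Z)),
                         exp_pauli(s + 2 * t, Z))

# Non-commuting pair A = sigma_z, B = sigma_x: the two orderings differ.
ordering_gap = distance(matmul(exp_pauli(s, Z), exp_pauli(t, X)),
                        matmul(exp_pauli(t, X), exp_pauli(s, Z)))
```

For the non-commuting pair the two orderings differ by $-2i \sin s \sin t\, \sigma_y$, which is the source of the ambiguity exploited in the sequel.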

Hashed Operators.
We now consider a pair of arbitrary (not necessarily strongly commuting) self-adjoint operators $A$ and $B$. Guided by the above observation, we formally introduce #(s, t) := a 'decent' mixture of the disintegrated components of $e^{-isA}$ and $e^{-itB}$ (6.18) for the pair of $A$ and $B$. Examples of such mixtures of the disintegrated components of the unitary operators are given in (6.19), or even by any linear combinations of them. The term 'decent' is intended to express a mathematical condition as to what qualifies as a reasonable 'mixture' to meet our purpose. However, we do not intend to discuss its precise mathematical definition here, for it is beyond the scope of this paper. In this paper, the 'parametrised family of operators' #(s, t) shall occasionally be referred to as hashed operators of the unitary operators, in a rather casual manner. Due to the commutativity of the unitary operators for a simultaneously measurable pair, the hashed operator #(s, t) = $e^{-isA} e^{-itB}$ is always unique, while on the other hand, hashed operators admit variety for non-commutative pairs. Now, we introduce the collection of all hashed operators, $\tilde{M}_{A,B}$ := {# : # is a hashed operator of $A$ and $B$ defined as in (6.18)} (6.20). As we have seen above, in the case in which $A$ and $B$ are simultaneously measurable, the above collection in fact consists of only one trivial element due to the strong commutativity of the two operators. On the other hand, one readily observes that the cardinality of $\tilde{M}_{A,B}$ is always greater than unity if the pair of observables $A$ and $B$ fails to strongly commute. where # is the hashed operator whose inverse Fourier transform is the element $\Pi = \mathcal{F}^{-1}\#$ of our choice.
As for the marginals, by formally introducing the marginal $\Pi_B$ of $\Pi$, one observes under a formal computation that its Fourier transform coincides with that of $E_B$. The injectivity of the Fourier transformation $\mathcal{F}$ leads us to conclude that the marginal $\Pi_B = E_B$ is essentially the same object as the original spectral measure governing the probabilistic behaviour of the outcomes of $B$. By a parallel argument, one also finds that the other marginal is nothing but $\Pi_A = E_A$. These properties are shared with the product spectral measures defined for strongly commuting pairs of self-adjoint operators, although each $\Pi(a, b)$ is not necessarily a projection, and may not even be positive. This tempts us to introduce the term quasi-joint-spectral distributions of a pair of observables, which can be understood as a generalisation of the concept of spectral measures or POVMs.
Definition In the case where the observables A and B happen to strongly commute with each other, we specifically call the unique element of M A,B the joint-spectral distribution of A and B, which is nothing but the product spectral measure E A,B of the pair in standard terminology. We note that the terminologies introduced above are non-standard, and are to be used only in this paper.

Quasi-joint-probability Distributions.
Although the study of the precise definitions and properties of the family of quasi-joint-spectral distributions would be of mathematical interest in its own right, we shall refrain from going further due to the limited mathematical tools available. Instead, we turn to a more elementary object to ease our discussion. Now, given a quasi-joint-spectral distribution $\Pi \in M_{A,B}$ of $A$ and $B$, we fix a specific quantum state $|\phi\rangle \in \mathcal{H}$, and consider a distribution formally defined by $p(a, b) := \langle \phi, \Pi(a, b)\phi\rangle / \|\phi\|^2$, where # is the hashed operator whose inverse Fourier transform $\Pi = \mathcal{F}^{-1}\#$ is the quasi-joint-spectral distribution under consideration. Since the distribution $p$ is 'scalar valued', it should be a much more feasible object to deal with than the 'operator valued' distribution $\Pi$ introduced earlier. We thus introduce the collection $M^{\phi}_{A,B}$ of all such distributions. In the case where the observables $A$ and $B$ happen to strongly commute with each other, we specifically call the unique element of $M^{\phi}_{A,B}$ the joint-probability distribution of $A$ and $B$ on $|\phi\rangle$, which is nothing but the probability measure $\mu^{\phi}_{A,B}$ of the pair introduced in (3.69). Given a hashed operator # of the parametrised unitary groups and a quantum state $|\phi\rangle \in \mathcal{H}$, we call the element $p \in M^{\phi}_{A,B}$ specified by them the QJP distribution generated by # and $|\phi\rangle$. Our choice of the denomination of the elements $p \in M^{\phi}_{A,B}$ is due to the fact that they retain properties similar to those of classical joint-probability distributions. Indeed, the 'total integration' is determined by the value of # at the origin $(s, t) = (0, 0)$, where # is the hashed operator that, together with $|\phi\rangle$, generates $p$. As for the marginals, by formally introducing the marginal distribution $p_B$, one observes through a formal computation that its Fourier transform coincides with that of $\mu^{\phi}_{B}$. The injectivity of the Fourier transformation $\mathcal{F}$ leads us to conclude that the distribution $p_B(b)$ is essentially the same object as the probability measure $\mu^{\phi}_{B}$ describing the probabilistic behaviour of the outcomes of $B$. By a parallel argument, one also finds that the other marginal is nothing but $p_A = \mu^{\phi}_{A}$.
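In finite dimensions the construction can be carried out explicitly, which may help to see the indefiniteness at work. For the ordering $e^{-isA} e^{-itB}$ one has $\langle\phi, e^{-isA} e^{-itB}\phi\rangle = \sum_{a,b} e^{-i(sa + tb)} \langle\phi, E_a F_b \phi\rangle$, so the corresponding distribution is the atomic one with weights $\langle\phi, E_a F_b\phi\rangle$, a form commonly associated with the names of Kirkwood and Dirac. The sketch below uses hypothetical qubit data with rank-one projections $E_a$ and $F_b$; the function names are illustrative. It compares this ordering with the reversed one and with their equal mixture.

```python
def inner(u, v):
    """Inner product <u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

r = 2 ** -0.5
a_vecs = [[1 + 0j, 0j], [0j, 1 + 0j]]            # eigenvectors of A (hypothetical qubit)
b_vecs = [[r + 0j, r + 0j], [r + 0j, -r + 0j]]   # eigenvectors of B
phi = [0.6 + 0j, 0.8j]                           # normalised state

def qjp_ab(j, k):
    """<phi, E_j F_k phi>: weight generated by the ordering exp(-isA) exp(-itB)."""
    return inner(phi, a_vecs[j]) * inner(a_vecs[j], b_vecs[k]) * inner(b_vecs[k], phi)

def qjp_ba(j, k):
    """<phi, F_k E_j phi>: weight generated by the reversed ordering."""
    return qjp_ab(j, k).conjugate()

def qjp_sym(j, k):
    """Equal mixture of the two orderings, which is real-valued."""
    return (qjp_ab(j, k) + qjp_ba(j, k)) / 2

marginal_a = [sum(qjp_ab(j, k) for k in range(2)) for j in range(2)]
marginal_b = [sum(qjp_ab(j, k) for j in range(2)) for k in range(2)]
```

All three candidates share the same marginals, namely the Born probabilities of $A$ and $B$, even though their individual weights differ and may be complex.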
Before we proceed further, we make notes on some mathematical intricacies involved in their definitions for the interested reader.

Mathematical Remarks.
One may notice some subtleties inherent to the definition of $M^{\phi}_{A,B}$. The first problem concerns the domain of definition of the inverse Fourier transformation: while the Fourier transform $\hat{\mu}$ of a complex measure $\mu$ is a function, it does not necessarily lie in $L^1(\mathbb{R}^n)$, so that its inverse Fourier transform may not be well-defined, even in the case where $A$ and $B$ strongly commute with each other. This can be temporarily remedied by understanding the inverse Fourier transform of an element $u \in \tilde{M}^{\phi}_{A,B}$ to be the unique complex measure $\mu$ such that $u = \hat{\mu}$ holds, which should be a reasonable treatment due to the injectivity of the Fourier transformation. This provides a sufficient cure in the case where the pair of observables strongly commutes.
On the other hand, another problem arises in the case in which the pair of self-adjoint operators fails to strongly commute: it might happen that, for some element $u \in \tilde{M}^{\phi}_{A,B}$, there is no complex measure $\mu$ whose Fourier transform coincides with it, $u = \hat{\mu}$. A straightforward and more fundamental cure for this would be to expand our framework into that of generalised functions, specifically, by embedding the space of complex measures into that of tempered distributions. Indeed, since the Fourier transformation is a bijection on the space of tempered distributions, by understanding each of the elements of $\tilde{M}^{\phi}_{A,B}$ to be a tempered distribution, its inverse Fourier transform always exists as a tempered distribution. Nevertheless, since we do not wish to get involved with the theory of generalised functions, we shall be exclusively dealing with those elements $u \in \tilde{M}^{\phi}_{A,B}$ for which there exists a complex measure $\mu$ satisfying $u = \hat{\mu}$, and understand the element $\mu := \mathcal{F}^{-1}u \in M^{\phi}_{A,B}$ to be the complex measure. To this end, we introduce:
Definition (Representation by Quasi-probability Measures). Under the above situation, let $p \in M^{\phi}_{A,B}$ be a QJP distribution of $A$ and $B$, and let $u \in \tilde{M}^{\phi}_{A,B}$ be an element such that $p = \mathcal{F}^{-1}u$. We say that the QJP distribution $p$ admits representation by a quasi-probability measure, if there exists a quasi-probability measure $\mu$ on $K^2$ such that $u = \hat{\mu}$ holds, and understand the QJP distribution $p = \mu$ to be the quasi-probability measure.
A similar concern arises for the definition of quasi-joint-spectral distributions $\Pi = F^{-1}\#$ defined as inverse Fourier transforms of hashed operators $\#$ of the unitary operators $e^{-isA}$ and $e^{-itB}$. Parallel to the 'scalar valued' case seen above, quasi-joint-spectral distributions $\Pi$ are better understood as objects generalising the concept of spectral measures (or POVMs), in the sense that, while spectral measures $E$ (or POVMs) yield probability measures $\langle\phi, E(\,\cdot\,)\phi\rangle / \|\phi\|^{2}$ when combined with a vector $|\phi\rangle$, quasi-joint-spectral distributions $\Pi$ yield generalised functions, symbolically denoted by $\langle\phi, \Pi(a, b)\phi\rangle / \|\phi\|^{2}$. In this respect, quasi-joint-spectral distributions are to be understood as elements of the space of operator valued (tempered) distributions (OVDs), which serves as a generalisation of that of POVMs.
We also note that the methods introduced above for defining QJSDs generalise straightforwardly, not only from a pair ($N = 2$) of quantum observables as presented above to arbitrary combinations ($N \geq 2$) of quantum observables, but even to arbitrary combinations of POVMs. Likewise, one may readily generalise the definition of QJP distributions from pure states, obtained as presented above by sandwiching the QJSDs between bras and kets, to mixed states, by taking the trace of the product of the QJSDs and density operators.

Complex-parametrised Sub-families
Since we have decided to confine ourselves to the framework of complex measures rather than that of generalised functions, owing to the restricted mathematical tools available, we mostly refrain from treating the general cases, and shall concentrate on special sub-families of QJP distributions of a pair of quantum observables $A$ and $B$.

Proof. We provide the proof for the first case without loss of generality. Observe that the complex measure is absolutely continuous with respect to $\mu^{\phi}_{B}$ for all fixed $\Delta_{A} \in \mathcal{B}$. This allows us to construct a transition quasi-probability kernel by taking the Radon-Nikodým derivative of the above complex measure with respect to $\mu^{\phi}_{B}$. A direct application of Proposition 5.1 with $f(a, b) := e^{-i(as+bt)}$ then leads to the desired statement.

This inspires us to introduce the complex linear combinations of the above two distributions. We hereby consider the hashed operators of the form

$$\#^{\alpha}_{\mathrm{add}}(s, t) := \frac{1 + \alpha}{2}\, e^{-itB} e^{-isA} + \frac{1 - \alpha}{2}\, e^{-isA} e^{-itB}, \qquad s, t \in \mathbb{R},\ \alpha \in \mathbb{C}, \tag{6.38}$$

and observe that the QJP distributions induced by them naturally admit representation by quasi-probability measures.
Corollary 6.4. The QJP distributions generated by the hashed operators of the form (6.38) and $|\phi\rangle \in H$ admit representation by quasi-probability measures.
In this paper, we call the above sub-family of QJP distributions the additive complex-parametrised sub-family of QJP distributions of $A$ and $B$ on $|\phi\rangle$ (or simply, the additive sub-family, for short).
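For finite-dimensional observables the additive sub-family can be written down explicitly, which may help fix ideas. The sketch below is an illustration, not part of the original argument: it assumes matrices with non-degenerate spectra, the Fourier convention $\hat{\mu}(s,t) = \int e^{-i(as+bt)}\, d\mu(a,b)$, and hypothetical helper names (`spectral_projectors`, `born`, `additive_qjp`). Under these assumptions the hashed operator (6.38) generates the weights $p(a,b) = \tfrac{1+\alpha}{2}\langle\phi, E_B(b)E_A(a)\phi\rangle/\|\phi\|^2 + \tfrac{1-\alpha}{2}\langle\phi, E_A(a)E_B(b)\phi\rangle/\|\phi\|^2$ on the points $(a,b)$ of the joint spectrum.

```python
import numpy as np

def spectral_projectors(op):
    """Eigenvalues and rank-one spectral projectors of a Hermitian matrix
    (assumption: the spectrum is non-degenerate)."""
    vals, vecs = np.linalg.eigh(op)
    projs = [np.outer(vecs[:, k], vecs[:, k].conj()) for k in range(len(vals))]
    return vals, projs

def born(op, phi):
    """Standard Born probabilities of `op` on `phi`, in eigenvalue order."""
    _, projs = spectral_projectors(op)
    norm2 = np.vdot(phi, phi).real
    return np.array([np.vdot(phi, E @ phi).real / norm2 for E in projs])

def additive_qjp(A, B, phi, alpha):
    """Weights p[i, j] attached to (a_i, b_j) for the additive sub-family."""
    _, PA = spectral_projectors(A)
    _, PB = spectral_projectors(B)
    norm2 = np.vdot(phi, phi).real
    p = np.empty((len(PA), len(PB)), dtype=complex)
    for i, Ea in enumerate(PA):
        for j, Eb in enumerate(PB):
            ba = np.vdot(phi, Eb @ Ea @ phi)  # <phi, E_B(b) E_A(a) phi>
            ab = np.vdot(phi, Ea @ Eb @ phi)  # <phi, E_A(a) E_B(b) phi>
            p[i, j] = ((1 + alpha) / 2 * ba + (1 - alpha) / 2 * ab) / norm2
    return p

# Illustrative non-commuting pair: Pauli Z and Pauli X on an arbitrary state.
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
phi = np.array([1.0, 0.5j])
p = additive_qjp(sz, sx, phi, alpha=0.3 + 0.2j)
print(p)               # complex weights in general
print(p.sum(axis=1))   # marginal in a: compare with born(sz, phi)
print(p.sum(axis=0))   # marginal in b: compare with born(sx, phi)
```

The weights are complex in general, yet both marginals reproduce the Born probabilities of the individual observables for every $\alpha$, since the two coefficients in (6.38) sum to one.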
6.3.2. Convolutive Sub-family.
One realises below that another class of QJP distributions parametrised by a complex number can be introduced. We hereby consider the hashed operators of the form (6.39), where $\langle s, \alpha\rangle := s_{1}\alpha_{1} + s_{2}\alpha_{2}$ denotes the inner product of $s$ and $\alpha$ understood as real vectors of $\mathbb{R}^{2} \cong \mathbb{C}$, and introduce the convolutive complex-parametrised sub-family of QJP distributions of $A$ and $B$ on $|\phi\rangle$ (or simply, the convolutive sub-family, for short) by those elements of $M^{\phi}_{A,B}$ that are generated by the hashed operators of the form (6.39) and $|\phi\rangle$.

Linear Transformation.
It is of natural interest to find the condition under which an element of the convolutive sub-family admits representation by quasi-probability measures. Obviously, the choices $\alpha = \pm 1$ admit it, since they are also members of the additive sub-family introduced earlier. As for the other choices of the complex parameter $\alpha \in \mathbb{C}$, we first introduce an auxiliary distribution $\tilde{u}$ defined by (6.41), where $s = s_{1} + i s_{2} \in \mathbb{C}$, $s_{1}, s_{2} \in \mathbb{R}$, is defined as in (6.40). Once there exists a quasi-probability measure $\tilde{\mu}$ whose Fourier transform coincides with the auxiliary distribution, $F\tilde{\mu} = \tilde{u}$, one finds below that every member of the convolutive sub-family is a linear transform of the quasi-probability measure $\tilde{\mu}$, and hence itself admits representation by a quasi-probability measure. To see this, we first introduce the matrix $T_{\alpha}$ of (6.42), defined for each complex number $\alpha = \alpha_{1} + i\alpha_{2} \in \mathbb{C}$, $\alpha_{1}, \alpha_{2} \in \mathbb{R}$. The Fourier transform $F\tilde{\mu}_{(T_{\alpha} \times I)}$ of the linear transform of the quasi-probability measure $\tilde{\mu}$ with respect to the operator $(T_{\alpha} \times I)(a, b) := (T_{\alpha}a, b)$, $a \in \mathbb{C}$, $b \in \mathbb{R}$, (6.43) reads where we have used (6.11) in the first equality. We thus have:

Lemma 6.5 (Transformation between Parameters). Let $A$ and $B$ be self-adjoint operators on $H$, and let $|\phi\rangle \in H$.
(i) If the inverse Fourier transform of the auxiliary distribution (6.41) admits representation by a quasi-probability measure, then all the members of the convolutive sub-family admit representation by quasi-probability measures.
(ii) Under the above situation, let $T_{\alpha}$ be the linear transformation defined for every choice of the complex parameter $\alpha \in \mathbb{C}$ as in (6.42), and let $\tilde{u}$ be the auxiliary distribution (6.41) and $\tilde{\mu}$ be the quasi-probability measure such that $F\tilde{\mu} = \tilde{u}$. Then, every member of the convolutive sub-family can be described as the linear transform of $\tilde{\mu}$ as where $\mu^{\phi,\alpha}_{\mathrm{cnv}}$ denotes the quasi-probability measure generated by the hashed operator of the form (6.39), and $T_{\alpha} \times I$ is the operator defined as in (6.43).

Representation by Quasi-probability Measures.
Now, observing that the determinant of the linear transform $T_{\alpha}$ reads $\det T_{\alpha} = \operatorname{Im}\alpha / 2$, (6.46) one finds that the transformation $\mu \mapsto \mu_{(T_{\alpha} \times I)}$ is invertible if and only if $\alpha \in \mathbb{C} \setminus \mathbb{R}$, for indeed $T_{\alpha} \in GL(2; \mathbb{R}) \Leftrightarrow \alpha \in \mathbb{C} \setminus \mathbb{R}$. The product rule (6.9) then reveals that one may move from one member of the convolutive sub-family to another by a sequential application of the transformations as for the choice $\alpha \in \mathbb{C} \setminus \mathbb{R}$ and $\alpha' \in \mathbb{C}$. Combining Lemma 6.5 with the above observation, one concludes:

Explicit Computation of the Members of the convolutive Sub-family.
We shall provide an explicit example of the case in which every member of the convolutive complex-parametrised sub-family admits representation by quasi-probability measures.

Proposition 6.7. Let $A$ and $B$ be self-adjoint, and suppose that $B$ has spectrum $\sigma(B)$ of finite cardinality and that it is non-degenerate. For a quantum state $|\phi\rangle \in H$ such that the probability of finding the outcomes of $B$ is non-vanishing, $|\langle b, \phi\rangle|^{2} \neq 0$ for all its eigenvalues $b \in \sigma(B)$, every member of the convolutive sub-family of QJP distributions admits representation by quasi-probability measures.
Proof. Corollary 6.6 states that it suffices to construct the quasi-probability measure $\tilde{\mu}$ that satisfies $F\tilde{\mu} = \tilde{u}$, where $\tilde{u}$ is the auxiliary distribution (6.41). Now, under the above conditions, let $b \in \sigma(B)$ and $\Delta_{A} \in \mathcal{B}^{1}$ be fixed, and introduce the Radon-Nikodým derivative where the complex measure $\nu(\,\cdot\,, \Delta_{A})$ was defined in (6.37). For every fixed $b \in \sigma(B)$, this defines a quasi-probability measure $\Delta_{A} \mapsto \nu_{b}(\Delta_{A})$, which is in fact nothing but a slight generalisation of the quasi-probability measure (5.66) previously introduced in Section 5.
Defining the product complex measure on the product space $\mathbb{R}^{2} \cong \mathbb{C}$ for each $b \in \sigma(B)$, we intend to extend the domain of the variable $b$ to the whole real line to make a transition quasi-probability kernel from $(\mathbb{R}, \mathcal{B})$ into $(\mathbb{C}, \mathcal{B}(\mathbb{C}))$ by defining, for example, where $\delta_{0}$ is the delta measure centred at the origin (for the extension into $\mathbb{R} \setminus \sigma(B)$, we could have assigned any quasi-probability measure so that the extension makes a transition quasi-probability kernel as a whole). Letting $\tilde{\mu}$ denote the quasi-probability measure on the product space $\mathbb{C} \times \mathbb{R}$ defined by $\tilde{K}$ and $\mu^{\phi}_{B}$ by means of which was to be demonstrated. We have thus achieved a concrete construction of the quasi-probability measure whose Fourier transform is the distribution $\tilde{u}$ defined in (6.41).
In passing, we note that, by comparing the transformation matrices (6.42) and (5.69), one finds that the quasi-probability measure µ φ A,B obtained in the preceding Section 5 defined as in (5.93) is nothing but the member of the convolutive complex-parametrised sub-family for the purely imaginary choice α = i of the complex parameter.
6.3.3. Qualification as Quasi-joint-probability Distributions. Although we have provided a formal discussion to the problem, it yet remains to be confirmed by a rigorous treatment that every member of either the additive or the convolutive complex-parametrised subfamily of QJP distributions of a pair of quantum observables indeed qualifies as what its name indicates itself to be. Without loss of generality, we only provide the demonstration for the convolutive sub-family, since the proof for the additive subfamily is essentially the same.
Proposition 6.8 (Qualification as Quasi-joint-probability Distributions). Let A and B be self-adjoint, |φ ∈ H, and suppose that there exists a quasi-probability measure µ on the product space C × R such that holds for some α ∈ C. Then, µ qualifies as a QJP distribution of A and B on |φ , in the sense that (5.48) holds.
Proof. We first observe a general result regarding marginals of complex measures and Fourier transformations. Let $\mu$ be a complex measure on the product space $\mathbb{R}^{m} \times \mathbb{R}^{n}$, and define the marginal $\mu_{2}$ of $\mu$ by which is itself a complex measure on $(\mathbb{R}^{n}, \mathcal{B}^{n})$. One then observes where the second equality is due to the change of variables formula (3.10) for the image measure $\mu_{2} = \pi_{2}(\mu)$, where $\pi_{2}(x, y) = y$, $x \in \mathbb{R}^{m}$, $y \in \mathbb{R}^{n}$, is the projection on the second variable. Applying this fact to our situation as one readily finds $\mu(\mathbb{C} \times \Delta) = \mu^{\phi}_{B}(\Delta)$, $\Delta \in \mathcal{B}^{1}$, (6.59) by the injectivity of the Fourier transformation. One may also demonstrate $\mu(\Delta \times \mathbb{R}) = \mu^{\phi}_{A}(\Delta)$, $\Delta \in \mathcal{B}(\mathbb{K})$, by an analogous reasoning, which completes our proof.
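The general fact about marginals used in the proof can be spelled out as follows. This is a sketch under an assumed convention: the Fourier transform of a complex measure is taken as $\hat{\mu}(s,t) = \int e^{-i(\langle s,x\rangle + \langle t,y\rangle)}\, d\mu(x,y)$ without normalisation factor, and the symbol $\mu_{2}$ for the marginal is chosen here for illustration; the paper's own convention may differ by constants, which does not affect the conclusion.

```latex
\mu_{2}(\Delta) := \mu(\mathbb{R}^{m} \times \Delta), \qquad \Delta \in \mathcal{B}^{n},
\\[4pt]
\hat{\mu}_{2}(t)
  = \int_{\mathbb{R}^{n}} e^{-i\langle t, y\rangle}\, d\mu_{2}(y)
  = \int_{\mathbb{R}^{m} \times \mathbb{R}^{n}} e^{-i\langle t, \pi_{2}(x,y)\rangle}\, d\mu(x, y)
  = \hat{\mu}(0, t), \qquad t \in \mathbb{R}^{n}.
```

In words, restricting the Fourier transform of $\mu$ to the slice $s = 0$ yields the Fourier transform of the marginal on the second variable, so that two measures whose Fourier transforms agree on that slice have the same marginal by injectivity.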
6.3.4. Relation to other known Proposals. We demonstrate below, in passing, that the complex-parametrised sub-families of the QJP distributions of a pair of quantum observables serve as generalisations to the other well known proposals of quasi-probability distributions.

Kirkwood-Dirac Distribution.
We first note that the Kirkwood-Dirac distribution, introduced in (5.2) in a formal manner, can be given a mathematically rigorous definition within our framework, and that it belongs to both the additive and convolutive sub-families of the QJP distributions for the choice $\alpha = 1$.
Definition (Kirkwood-Dirac Quasi-joint-probability Distribution). Let $A$ and $B$ be self-adjoint on $H$, and let $|\phi\rangle \in H$. We call the member of the additive/convolutive sub-family of the QJP distributions of the pair of observables $A$ and $B$ for the choice $\alpha = 1$ the Kirkwood-Dirac QJP distribution of the pair.
To see how this definition can be justified, observe the following formal chain of expressions where K φ A,B is the formal definition of the Kirkwood-Dirac distribution introduced in (5.2). The injectivity of the Fourier transformation leads to the desired statement.

Wigner-Ville Distribution.
We next note that the Wigner-Ville distribution, introduced in (5.43), is also a special member of the convolutive sub-family of QJP distributions.

Proposition 6.9 (Wigner-Ville Distribution). Let $\{L^{2}(\mathbb{R}), \mathcal{S}(\mathbb{R}), \{\hat{x}, \hat{p}\}\}$ denote the one-dimensional Schrödinger representation of the CCR. Then, for the choice $\psi \in L^{1}(\mathbb{R}) \cap L^{2}(\mathbb{R})$ of the wave-function, the member of the convolutive sub-family of the QJP distributions of the canonically conjugate pair $\hat{p}$ and $\hat{x}$ admits representation by quasi-probability measures for the choice $\alpha = 0$, which we denote by $\mu^{\psi,0}_{\mathrm{cnv}}$. The quasi-probability measure $\mu^{\psi,0}_{\mathrm{cnv}}$ is absolutely continuous, and its Radon-Nikodým derivative with respect to the renormalised two-dimensional Lebesgue-Borel measure reads

Proof. Observe that the condition $\psi \in L^{1}(\mathbb{R}) \cap L^{2}(\mathbb{R})$ guarantees the integrability $W_{\psi} \in L^{1}(\mathbb{R}^{2})$ of the WV distribution, based on which we compute Combining (6.3) and the injectivity of the Fourier transformation, one arrives at the desired statement.

Some General Properties
We next observe some general properties of QJP distributions. We first provide some discussion regarding the operation of taking the complex conjugate, and subsequently seek for the condition for their realness.
6.4.1. Complex Conjugate. We are interested in the complex conjugate of QJP distributions of a pair of observables A and B on |φ . To this, let # be a hashed operator of the unitary operators e −isA , e −itB , and let |φ ∈ H be such that the QJP distribution generated by them admits representation by a quasi-probability measure µ. By applying (6.12), one readily finds that the Fourier transform of the complex conjugate µ * reads where #(s, t) * denotes the 'adjoint' of the hashed operator. Since the 'involution' #(−s, −t) * is itself a hashed operator of e −isA and e −itB , one concludes that the complex conjugate µ * is again a QJP distribution of the pair of observables A and B, and that it is precisely the distribution generated by the 'involution' of the original hashed operator. One also specifically finds that the sub-family of the QJP distributions M φ A,B that admit representations by quasi-probability measures is closed under the operation of taking the complex conjugate.
Parallel to this, by observing that the left-most side of (6.63) can be written as where $\Pi = F^{-1}\#$ is the quasi-joint-spectral distribution, one concludes the validity of the equality where $(F\Pi)^{\dagger}$ denotes the 'involution' (observe the analogy with (6.12)). This shows that the 'adjoint' $\Pi^{*}$ of the quasi-joint-spectral distribution of $A$ and $B$ is again a quasi-joint-spectral distribution of the pair, and that it is precisely the inverse Fourier transform of the 'involution' of the original hashed operator.
Complex-parametrised Sub-families. Armed with our findings, one may explicitly compute the complex conjugate of the elements of both the additive and the convolutive sub-families, and see that the sub-families are also closed under the operation of taking the complex conjugate. Indeed, if we respectively introduce for the members of the additive and convolutive sub-families, by observing that the 'involution' of the hashed operators read

Suppose that the member of the convolutive sub-family admits representation by the quasi-probability measure $\mu^{\phi,\alpha}_{\mathrm{cnv}}$ for the choice $\alpha \in \mathbb{C}$. Then, the member for the choice −α ∈ C also admits representation by quasi-probability measures, and the equality holds.
6.4.2. Realness of the QJP Distributions. One may naturally be interested in the condition as to when the quasi-joint-spectral distribution Π = F −1 # becomes 'self-adjoint' so that the resulting QJP distribution, symbolically denoted by p(a, b) = φ, Π(a, b)φ / φ 2 , is also 'real' for any choice of the vector |φ ∈ H. While the task of finding the explicit condition for which Π(a, b) becomes 'self-adjoint' seems at first non-trivial, the problem becomes significantly tractable if one considers its Fourier transform. Indeed, combining (6.65) with the injectivity of the Fourier transform, one concludes that Π = Π * is 'self-adjoint' if and only if its Fourier transform (namely, the hashed operator) # = # † is a 'self-involution'. Examples of such 'self-involutive' hashed operators are provided by

Conditioned Measurement Revisited
We finally investigate how the CM scheme described in Section 5 fits into our general framework of quasi-joint-probabilities of quantum observables. What we see below is that the CM scheme is essentially a measurement scheme for measuring QJP distributions of an arbitrary pair of quantum observables. As before, since the tools for the analysis of the most general cases are beyond the scope of this paper, we shall exclusively concentrate on the subfamily of quasi-joint-probabilities parametrised by a single complex number. Without loss of generality, we only provide below a demonstration for the convolutive sub-family for simplicity.
6.5.1. Short Introduction. We now intend to construct a measurement scheme for obtaining the member of the convolutive sub-family of the QJP distributions for arbitrary choices of the parameter α ∈ C. As for the problem, let us first recall that the quasi-probability measure (5.93) obtained in Section 5 was nothing but the member for the choice of the parameter α = i (see (6.54) for the discussion). In fact, as we have seen before, once we know the member of the subfamily for the parameter α ∈ C \ R, we may compute all other members of the complex parameters by sequentially applying linear transformations as depicted in (6.47). Hence, the knowledge of the distribution for the choice α = i, obtained by means of the CM scheme in view of the WV distribution, actually suffices for our purpose. Even so, one might be interested in how one could measure the QJP distribution for some specific parameter in a more direct manner. This should also provide a much more transparent view of the measurement scheme described in Section 5 from a more general viewpoint, which may be beneficial in its own right.

Model and Assumption.
Throughout this subsection, we let $A$ denote an observable on the target system $H$, and assume for simplicity that the meter system $K$ is described by the one-dimensional Schrödinger representation of the CCR $\{L^{2}(\mathbb{R}), \mathcal{S}(\mathbb{R}), \{\hat{x}, \hat{p}\}\}$. As usual, we prepare the two systems in their respective initial states $|\phi\rangle \in H$, $|\psi\rangle \in K$, and let them interact under the unitary operator $e^{-igA \otimes Y}$, $g \in \mathbb{R}$, for which we choose $Y = \hat{p}$ for definiteness, and let $|\Psi^{g}\rangle$ denote the state of the composite system after the interaction. Since we intend to confine ourselves within the framework of complex measures, we place several conditions throughout this passage, so that, given a conditioning observable $B$ on the target system $H$, all the members of the convolutive sub-family of the QJP distributions of $A$ and $B$ on $|\phi\rangle$ admit representation by quasi-probability measures.

Conditioned Measurement Revisited.
In the previous section, the QJP distribution we chose to measure on the meter system was the WV distribution, which we found to be nothing but the member of our convolutive sub-family of the QJP distributions of the canonically conjugate pair of observables $A = \hat{p}$, $B = \hat{x}$ for the choice of the parameter $\alpha = 0$. The result was that one could obtain the member of the convolutive sub-family of the QJP distributions for arbitrary pairs of quantum observables for the choice $\alpha = i$. Motivated by this finding, it is then natural to conjecture that a different choice of the meter QJP distribution results in a different choice of the target QJP distribution.

Rescaling.
For simplicity of the argument, we only treat the case for the choice $\alpha \in \mathbb{C} \setminus \mathbb{R}$, and for later convenience, we introduce the function where we have combined the second and the last equality of (6.76), and applied the result (5.32).
QJP of the 'conditional' Meter State.
The next step is to compute the function $\tilde{\upsilon}^{\psi^{g}_{b},\alpha}$ for the 'conditional' meter state $\psi^{g}_{b} := \psi^{g}_{B=b}$ introduced in (5.65). What we find below is that, parallel to the findings in Section 5, the resulting function $\tilde{\upsilon}^{\psi^{g}_{b},\alpha}$ is provided by the convolution of the initial profiles of both the meter and the target configurations. As above, we assume, for ease of demonstration, that both the target and the conditioning observables $A$ and $B$ have spectra of finite cardinality, that $B$ is non-degenerate, and that the probability of finding the outcomes of $B$ is non-vanishing, $|\langle b, \phi\rangle|^{2} \neq 0$ for all its eigenvalues $b \in \sigma(B)$. Since the essence of the demonstration is substantially the same as that provided in Section 5, we proceed by sketching the proofs.
In computing the function of our interest, we first compute its Fourier transform to observe where we have used (6.79) in the first equality, and where $\nu_{b}$ is the quasi-probability measure introduced in (5.66). We next change variables of the above equality according to the linear transformation

$$\begin{pmatrix} a_{1} \\ a_{2} \end{pmatrix} = T_{\alpha} \begin{pmatrix} s_{1} \\ s_{2} \end{pmatrix}, \qquad T_{\alpha} := \begin{pmatrix} (1 - \alpha_{1})/2 & (1 + \alpha_{1})/2 \\ -\alpha_{2}/2 & \alpha_{2}/2 \end{pmatrix} \tag{6.81}$$

by substituting is the image measure, and we have combined (6.79) with (5.33) to obtain the second equality. One thus concludes from the injectivity of the Fourier transformation that as promised.
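The determinant formula $\det T_{\alpha} = \operatorname{Im}\alpha/2$ of (6.46), and with it the invertibility criterion $\alpha \in \mathbb{C} \setminus \mathbb{R}$, can be checked numerically against the matrix in (6.81). The sketch below assumes that the four entries of (6.81) are to be read row by row; the function name `T` is a hypothetical helper introduced for this check only.

```python
import numpy as np

def T(alpha):
    """Matrix T_alpha of (6.81); the row-by-row reading of its entries is an assumption."""
    alpha = complex(alpha)
    a1, a2 = alpha.real, alpha.imag
    return np.array([[(1 - a1) / 2, (1 + a1) / 2],
                     [-a2 / 2, a2 / 2]])

for alpha in [1j, 0.5 - 2j, -0.3 + 0.7j, 0.0, 1.0]:
    det = np.linalg.det(T(alpha))
    invertible = not np.isclose(det, 0.0)
    print(alpha, det, complex(alpha).imag / 2, invertible)
```

For real $\alpha$ the second row vanishes, so $T_{\alpha}$ has rank at most one, which agrees with the statement that only the members with $\alpha \in \mathbb{C} \setminus \mathbb{R}$ can serve as a starting point from which all other members are reached by linear transformations.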
Recovery of the Target QJP.
As for the recovery of the target information $\Delta \mapsto \mu^{\phi,\alpha}_{A}(\Delta \,|\, B = b)$, $\Delta \in \mathcal{B}(\mathbb{C})$, one may resort to the familiar techniques we have discussed so far in depth, namely, one may recover the full profile by probing either the strong or the weak region of the interaction parameter. Once we have obtained $\mu^{\phi,\alpha}_{A}(\,\cdot\, | B = b)$ for all $b \in \sigma(B)$, one may extend the domain of $b \in \sigma(B)$ to the whole real line $\mathbb{R}$ in a consistent manner, making it a transition quasi-probability kernel. This allows us to construct the QJP $\mu^{\phi,\alpha}_{A,B}$ of the pair of $A$ and $B$ in the manner described in Proposition 5.1 that satisfies

A close look at the proof of Proposition 6.7 leads one to conclude that the QJP obtained here is in fact nothing but the member of the convolutive sub-family for the choice $\alpha \in \mathbb{C} \setminus \mathbb{R}$, and that $\mu^{\phi,\alpha}_{A}(\,\cdot\, | B = b)$ is the conditional quasi-probability distribution of $A$ given $B = b$.

Application: Interpretation of Aharonov's Weak Value
As an application of the findings on the QJP distributions of quantum observables, we now focus on the geometric structure that the QJP distributions induce in the space of quantum observables. Specifically, by drawing an analogy with the results of classical probability theory, we provide a geometric and statistical interpretation of Aharonov's weak value as 'orthogonal projection' and 'conditional average', respectively.

Reference Materials
As usual, we start by preparing some necessary materials that become useful for our analysis.
The main objective of this subsection is to obtain a geometric understanding of conditional expectations in classical probability theory.
$L^{p}$-spaces for finite Measures.
Let $\mu$ be a finite measure on a measurable space $(X, \mathcal{A})$, i.e., $\mu(X) < \infty$, and let $1 \leq p < q \leq \infty$. By defining $r > 0$ satisfying $\frac{1}{r} = \frac{1}{p} - \frac{1}{q}$, a direct application of Hölder's inequality yields for $f \in L^{q}(\mu)$. The following Lemma is worthy of special notice.
Lemma 7.1. Let µ be a finite measure on a measurable space (X, A). Then, for any 1 ≤ p ≤ q ≤ ∞, the relation holds.
Specifically, for probability spaces, note that one has the evaluation $\|f\|_{p} \leq \|f\|_{q}$ for the choice of the parameters $1 \leq p \leq q \leq \infty$.
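For the reader's convenience, the standard Hölder estimate behind these statements can be sketched as follows. It is written here for $1 \leq p < q < \infty$ with $\frac{1}{r} = \frac{1}{p} - \frac{1}{q}$, applying Hölder's inequality with the conjugate exponents $q/p$ and $q/(q-p)$ to the product $|f|^{p} \cdot 1$; this is the textbook argument and is not claimed to reproduce the original display verbatim.

```latex
\|f\|_{p}^{p}
  = \int_{X} |f|^{p} \cdot 1 \, d\mu
  \leq \left( \int_{X} |f|^{q} \, d\mu \right)^{p/q}
       \left( \int_{X} 1 \, d\mu \right)^{1 - p/q}
  = \|f\|_{q}^{p} \, \mu(X)^{p/r},
\qquad\text{hence}\qquad
\|f\|_{p} \leq \mu(X)^{1/r} \, \|f\|_{q}.
```

The case $q = \infty$ follows directly from $|f| \leq \|f\|_{\infty}$ almost everywhere, which gives $\|f\|_{p} \leq \mu(X)^{1/p}\|f\|_{\infty}$. For a probability measure, $\mu(X) = 1$ and the estimate reduces to $\|f\|_{p} \leq \|f\|_{q}$.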
Conditional Expectations for square-integrable Functions.
Now, consider a probability space $(\mathbb{R}^{n}, \mathcal{B}^{n}, \mu)$, and let $\mathcal{A} \subset \mathcal{B}^{n}$ be a sub-σ-algebra. Since every square-integrable function $f \in L^{2}(\mu) \subset L^{1}(\mu)$ is integrable due to the above Lemma, its conditional expectation $E[f|\mathcal{A}] := d(f \odot \mu)|_{\mathcal{A}} / d\mu|_{\mathcal{A}} \in L^{1}(\mu|_{\mathcal{A}})$ is well-defined, where $(f \odot \mu)|_{\mathcal{A}}$ and $\mu|_{\mathcal{A}}$ denote the restrictions of the respective (complex) measures to the sub-σ-algebra. Now, observe that, for any square-integrable function $g \in L^{2}(\mu|_{\mathcal{A}})$, the equality holds by the definition of the Radon-Nikodým derivative. Specifically, note that this leads to the fact that the conditional expectation $E[f|\mathcal{A}] \in L^{2}(\mu|_{\mathcal{A}})$ of a square-integrable function $f \in L^{2}(\mu)$ is again square-integrable.

Conditioning as Projection.
Another important observation to make from the above equality is that the act of conditioning, which takes a $\mu$-square-integrable function to its conditional expectation, is an orthogonal projection. To see this, first observe that linearity $E[af + bg|\mathcal{A}] = aE[f|\mathcal{A}] + bE[g|\mathcal{A}]$, $f, g \in L^{2}(\mu)$, $a, b \in \mathbb{C}$, follows immediately by definition (naturally, equality is only valid $\mu|_{\mathcal{A}}$-almost everywhere). Now, since $L^{2}(\mu|_{\mathcal{A}})$ is itself a complex Hilbert space, it is a topologically closed subspace of the larger complex Hilbert space $L^{2}(\mu)$. By recalling that there is a one-to-one correspondence between closed subspaces and orthogonal projections in Hilbert spaces, let $P(\,\cdot\,|\mathcal{A}) : L^{2}(\mu) \to L^{2}(\mu|_{\mathcal{A}})$ denote the unique orthogonal projection associated with it. By observing that holds for all $g \in L^{2}(\mu|_{\mathcal{A}})$ and $f \in L^{2}(\mu)$ by definition of orthogonal projections, one realises that the equality (7.3) combined with the non-degeneracy of inner products leads to $P(f|\mathcal{A}) = E[f|\mathcal{A}]$. We summarise the results as follows.
Proposition 7.2. (Orthogonal Projection and Conditional Expectation) Let $\mu$ be a probability measure on $(\mathbb{R}^{n}, \mathcal{B}^{n})$, and let $\mathcal{A} \subset \mathcal{B}^{n}$ be a sub-σ-algebra. Then, the unique orthogonal projection $P(\,\cdot\,|\mathcal{A}) : L^{2}(\mu) \to L^{2}(\mu|_{\mathcal{A}})$ associated with the subspace $L^{2}(\mu|_{\mathcal{A}})$ is provided by the conditional expectation, $P(f|\mathcal{A}) = E[f|\mathcal{A}]$, where $f \in L^{2}(\mu)$.
We here see how the geometric concept of orthogonality relates to the statistical concept of conditioning in L 2 -spaces.
7.1.2. Conditioning as Optimal Approximation.
The geometric property mentioned above leads to several important interpretations of conditional expectations. One of the prominent characteristics of orthogonal projections is the validity of the Pythagorean identity where $\|\cdot\|_{2}$ denotes the standard $L^{2}$-norm introduced earlier in (2.19). An immediate consequence of the above Pythagorean identity is the following equality which states that the optimal $\mu|_{\mathcal{A}}$-square-integrable function one can find in approximating a function $f$ is explicitly provided by the conditional expectation of $f$ given $\mathcal{A}$, and the positive-definiteness of the $L^{2}$-norm shows that the optimum is unique $\mu|_{\mathcal{A}}$-a.e.
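The three properties discussed in this subsection (orthogonality of the residual, the Pythagorean identity, and optimality of the approximation) can be checked on a toy example. The sketch below is an illustration under simplifying assumptions: a finite probability space with strictly positive weights, and a sub-σ-algebra generated by a partition encoded as integer labels, so that the conditional expectation is the weighted average over each block. The names `cond_exp` and `inner` are hypothetical helpers.

```python
import numpy as np

def inner(g, f, mu):
    """Inner product <g, f>_mu = sum_x conj(g(x)) f(x) mu({x}) on a finite space."""
    return np.sum(np.conj(g) * f * mu)

def cond_exp(f, mu, labels):
    """Conditional expectation of f given the sigma-algebra generated by the
    partition `labels`: the mu-weighted average of f on each block."""
    out = np.empty(len(f), dtype=complex)
    for lab in np.unique(labels):
        block = labels == lab
        out[block] = np.sum(f[block] * mu[block]) / np.sum(mu[block])
    return out

rng = np.random.default_rng(0)
mu = np.array([0.10, 0.20, 0.30, 0.15, 0.15, 0.10])   # probability weights
labels = np.array([0, 0, 1, 1, 1, 2])                  # partition into 3 blocks
f = rng.normal(size=6) + 1j * rng.normal(size=6)       # a "square-integrable" f
Pf = cond_exp(f, mu, labels)

# Any block-constant g is measurable with respect to the sub-sigma-algebra.
g = (rng.normal(size=3) + 1j * rng.normal(size=3))[labels]

print(abs(inner(g, f - Pf, mu)))                                       # residual is orthogonal to g
print(inner(f, f, mu).real, inner(Pf, Pf, mu).real + inner(f - Pf, f - Pf, mu).real)
print(inner(f - Pf, f - Pf, mu).real <= inner(f - g, f - g, mu).real)  # Pf is the better approximation
```

The block-constant functions play the role of $L^{2}(\mu|_{\mathcal{A}})$ here, and the block average is exactly the Radon-Nikodým derivative that defines the conditional expectation in this finite setting.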

Statistical Interpretation of Geometric Structures
Now that we have reviewed the geometric interpretation of conditional expectations in classical probability theory, we shall begin our main analysis.
7.2.1. Preliminary Observation.
In classical probability theory, probability measures equip the space of square-integrable functions with a geometry, i.e., an inner product defined by (2.21), which we reiterate for the readers' convenience as

$$\langle g, f\rangle_{\mu} := \int g^{*} f \, d\mu \tag{7.10}$$

(here, we have also explicitly written the probability measure $\mu$ under consideration for clarity). In the context of classical physics, in which observables are represented by functions, a probability measure thus assigns to a given pair of square-integrable classical observables a quantity interpreted as the correlation or covariance between them.
In quantum mechanics, observables are represented by self-adjoint operators on Hilbert spaces, and the statistics of the system are in turn represented by vectors of Hilbert spaces. In order to see how the two distinct frameworks of classical and quantum theory on correlations play together, first let $A$ and $B$ be a pair of simultaneously measurable bounded quantum observables on a Hilbert space $H$, $|\psi\rangle \in H$ a vector, and introduce

$$\langle B, A\rangle_{\psi} := \frac{\langle B\psi, A\psi\rangle}{\|\psi\|^{2}}. \tag{7.11}$$

One readily sees that, for the present case, the above geometry induced by the vector $|\psi\rangle$ is in accordance with the classical theory. Indeed, the unique product spectral measure $E_{A,B}$ of $A$ and $B$ admits a unique representation of the observables given by with $A(a, b) = a$, $B(a, b) = b$, and moreover defines a joint-probability measure $\mu^{\psi}_{A,B}$ of the pair (see (3.69)) on the state $|\psi\rangle \in H$. It is then straightforward to see the validity of the equality based on which one obtains a statistical interpretation of the geometry (7.11) as the correlation or covariance between the observables in the classical sense.
On the other hand, the problem is not so straightforward for the case where the pair $A$ and $B$ does not admit simultaneous measurability. This essentially has to do with the lack of a unique product spectral measure of the pair. As we have seen in the previous Section 6, the non-commutative analogues of product spectral measures are the quasi-joint-spectral distributions (QJSDs) defined as the inverse Fourier transforms of the hashed unitary groups (6.22). The non-uniqueness of the QJSDs for the non-commuting case generally leads to the non-uniqueness of the representation of operators and vectors by functions and quasi-probability distributions. Specifically, given a choice of a QJSD $\Pi_{A,B}$ for a pair of generally non-commuting observables $A$ and $B$, one obtains a functional

introduce the subspace $Z_{\psi}(H) := \{X \in L(H) : X|\psi\rangle = X^{*}|\psi\rangle = 0\}$ (7.29) and define the $\mathbb{C}$-linear quotient space by identifying those operators for which the action of both themselves and their adjoints are indistinguishable on the state $|\psi\rangle$. In other words, this is to say that we identify two operators $X, Y \in L(H)$ by the equivalence relation One readily sees that the sesquilinear form (7.16) passes to the quotient, and we thus obtain a sesquilinear form on the quotient space $L_{\psi}(H)$. Whenever there is no risk of confusion, we shall mostly denote equivalence classes by their representatives for simplicity of notation. Note also that the involution $* : L(H) \to L(H)$, $X \mapsto X^{*}$, that takes a bounded linear operator to its adjoint is also well-defined on the quotient space.
Hilbert Space of Operators.
We have already seen that the original sesquilinear form (7.16) becomes positive and symmetric for the choice $-1 \leq \alpha \leq 1$ of the parameter. Based on the identification above, the sesquilinear form (7.32) on the quotient space $L_{\psi}(H)$ becomes positive definite, which is to say that (i) $⟪X, X⟫_{\psi,\alpha} \geq 0$, $X \in L_{\psi}(H)$, and (ii) $⟪X, X⟫_{\psi,\alpha} = 0 \Leftrightarrow X = 0$, for the choice $-1 \leq \alpha \leq 1$. This makes (7.32) an inner product on $L_{\psi}(H)$ for $-1 \leq \alpha \leq 1$, allowing us to define the norm $\|X\|_{\psi,\alpha} := ⟪X, X⟫^{1/2}_{\psi,\alpha}$, $X \in L_{\psi}(H)$, $-1 \leq \alpha \leq 1$. (7.33) One moreover proves by rudimentary techniques that the space is in fact complete with respect to the norm. We thus have the following result.

This convenient property greatly facilitates our further argument. Hence, in what follows, we will be treating only those geometries associated with the specific choice $-1 \leq \alpha \leq 1$ of the complex parameter.
Sub-algebra generated by an Observable.
We next introduce an important subspace of L(H). Given a bounded self-adjoint operator A ∈ L(H), we prepare a special symbol Identification.
An immediate observation one makes is that the sesquilinear form (7.16) is independent of the choice of the parameter α ∈ C on the space E(A). Indeed, for any choice of a pair of continuous functions f , g defined on the spectrum σ(A), the equality holds, where the right-most hand side denotes the standard inner product introduced on the space of square-integrable complex functions. Following the same line of arguments we have made in the previous discussion of this section, we next intend to identify those operators that are not distinguishable in view of a given state |ψ ∈ H. To this, we introduce the subspace by identifying those normal operators for which the action of both themselves and their adjoints on the state |ψ are indistinguishable. Here, the overline on the quotient space E(A)/Z ψ (A) ⊂ L ψ (H) denotes its topological closure with respect to the topology on the superset L ψ (H) induced by the norm · ψ,α (7.33). Note here that, as a set, the closure E ψ (A) is independent of the choice of the parameter −1 ≤ α ≤ 1, since all the norms · ψ,α coincide on the subspace E(A)/Z ψ (A). Moreover, one may readily check that all the inner products ⟪ · , · ⟫ ψ,α also coincide for the pair of elements of the closure E ψ (A) for any choice of the parameter −1 ≤ α ≤ 1. Now, since by definition the space E ψ (A) is a closed subspace of the complex Hilbert space, it is itself a complex Hilbert space. By denoting the restriction of the inner product as ⟪ · , · ⟫ ψ := ⟪ · , · ⟫ ψ,α | Eψ(A) , which does not depend on the choice of the parameter as we have mentioned above, we have: Lemma 7.5. For a fixed |ψ ∈ H, the ordered pair (E ψ (A), ⟪ · , · ⟫ ψ ) defines a complex Hilbert space.
The next Lemma is of special interest for our purpose.

Proof. We will construct the map $\Phi$ by continuous linear extension. To this end, first recall that, since $\sigma(A)$ is compact, the space $C(\sigma(A))$ of all continuous functions on $\sigma(A)$ is dense in $L^{2}(\mu^{\psi}_{A})$. Since the map $\Phi$ : is an isometry $\|\Phi(f)\|_{\psi} := \|f(A)\psi\| = \|f\|_{2}$ from a dense subspace of a normed space to a Banach space, there exists a unique isometric extension $\Phi : L^{2}(\mu^{\psi}_{A}) \to E_{\psi}(A)$. By construction, one may also prove the surjectivity of $\Phi$, hence $\Phi$ is unitary. This is to say that the Hilbert spaces $E_{\psi}(A) \cong L^{2}(\mu^{\psi}_{A})$ are unitarily isomorphic, and that $\Phi$ gives an embedding of the space of square-integrable functions $L^{2}(\mu^{\psi}_{A})$ into the space of bounded operators on a Hilbert space.

For simplicity of notation, we occasionally denote the image of a square-integrable function $f \in L^{2}(\mu^{\psi}_{A})$ by $f(A) := \Phi(f)$. A word of caution is in order regarding possible confusion over this notation. Here, the notation $f(A) := \Phi(f)$ is meant to denote (a representative of) the equivalence class of bounded linear operators, whereas the notation $f(A)$ is usually used to represent the (generally unbounded) linear operator defined by means of the functional calculus (3.56). The relation between the two different notations can be understood in the following way. For $f \in L^{2}(\mu^{\psi}_{A})$, let $T \in \Phi(f)$ be a representative of the equivalence class of bounded operators, and let $f(A)$ be a (generally unbounded) operator defined by means of the functional calculus (3.56), and note that $|\psi\rangle \in \mathrm{dom}(f(A))$ by definition. Then, we have $T|\psi\rangle = f(A)|\psi\rangle$.

Interpretation of Conditional Quasi-expectations
Now that we have sufficiently prepared our tools, the most important among which is the embedding $\Phi : L^{2}(\mu^{\psi}_{A}) \cong E_{\psi}(A) \subset L_{\psi}(\mathcal{H})$ (7.41) of the $L^{2}$-space of functions into that of bounded linear operators on a Hilbert space, we next focus on orthogonal projections and 'conditioning' with respect to QJP distributions.

7.3.1. Geometric Interpretation of Conditional Quasi-expectations. Recall that, with each closed subspace of a Hilbert space, a unique orthogonal projection is associated. In what follows, we are interested in the orthogonal projection of an observable $A$ onto the subspace $E_{\psi}(B)$ generated by another observable $B$, and we shall see that this provides a geometric interpretation of the conditional quasi-expectation $E_{\alpha}[A|B; \psi]$ introduced earlier in (4.77).
To this end, let $B \in L(\mathcal{H})$ be self-adjoint, $|\psi\rangle \in \mathcal{H}$, $-1 \leq \alpha \leq 1$, and let $E_{\psi}(B)$ be the space generated by $B$, which is a closed subspace of the Hilbert space $(L_{\psi}(\mathcal{H}), \langle\!\langle \cdot, \cdot \rangle\!\rangle_{\psi,\alpha})$. Since $E_{\psi}(B)$ is a closed subspace of a Hilbert space, there exists a unique orthogonal projection $P_{\alpha}(\,\cdot\,|B; \psi) : L_{\psi}(\mathcal{H}) \to E_{\psi}(B)$, $X \mapsto P_{\alpha}(X|B; \psi)$ (7.42) associated with $E_{\psi}(B)$. Recalling the relation between orthogonal projections and conditional expectations in classical probability theory (see Section 7.1), it is natural to conjecture that an analogous relation holds for the quantum case. To this end, let $A \in L(\mathcal{H})$ be self-adjoint, and consider the projection $P_{\alpha}(A|B; \psi)$ of $A$ onto $E_{\psi}(B)$. We have seen in Section 4 that the conditional quasi-expectations $E_{\alpha}[A|B; \psi] \in L^{1}(\mu^{\psi}_{B})$ introduced in (4.77) serve as possible candidates for quantum analogues of conditional expectations that can be defined even for a non-commuting pair of quantum observables. Since the observable $A$ we consider here is bounded, one may prove that the conditional quasi-expectation $E_{\alpha}[A|B; \psi] \in L^{2}(\mu^{\psi}_{B})$ is in fact square-integrable. By letting $E_{\alpha}[A|B; \psi]$ denote both the square-integrable function and its image under the unitary map $\Phi : L^{2}(\mu^{\psi}_{B}) \to E_{\psi}(B)$, it is natural to conjecture the validity of the equality $P_{\alpha}(A|B; \psi) = E_{\alpha}[A|B; \psi]$, where the l.h.s. is the orthogonal projection of $A$ onto the space $E_{\psi}(B)$ with respect to the (parameter-dependent) inner product $\langle\!\langle \cdot, \cdot \rangle\!\rangle_{\psi,\alpha}$, whereas the r.h.s. denotes the image of the (parameter-dependent) conditional quasi-expectation of $A$ given $B$ under the unitary map $\Phi$.

This conjecture is confirmed by Proposition 7.8: with the conditional quasi-expectation identified with its image in $E_{\psi}(B)$ by means of the embedding $\Phi : L^{2}(\mu^{\psi}_{B}) \to E_{\psi}(B)$ defined in (7.41), the orthogonal projection of $A$ onto the subspace $E_{\psi}(B)$ generated by $B$, defined in (7.42), reads $P_{\alpha}(A|B; \psi) = E_{\alpha}[A|B; \psi]$ (7.44), which is to say that orthogonal projections are equivalent to conditional quasi-expectations.
Just as we have seen for the classical case, this result provides a geometric interpretation of conditional quasi-expectations as orthogonal projections. As a corollary, one has a geometric interpretation of Aharonov's weak value: the weak value can be interpreted as the orthogonal projection of $A$ onto the subspace $E_{\psi}(B)$ generated by $B$.
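The geometric picture can be illustrated on a qubit. The sketch below is hedged accordingly: it assumes the particular inner product $\langle X\psi, Y\psi \rangle$ on operators, taken here as one plausible member of the parametrised family (the general $\alpha$-dependent form (7.16) is not reproduced), and the observables $A = \sigma_z$, $B = \sigma_x$, the state `psi`, and all helper names are hypothetical. With these choices, the coefficients of the orthogonal projection of $A$ onto the span of the spectral projections of $B$ coincide with the weak values $\langle b|A|\psi\rangle / \langle b|\psi\rangle$, and their real parts give the best approximation among real-valued functions of $B$.

```python
import math

def inner(u, v):
    """Hilbert-space inner product <u, v>, conjugate-linear in u."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def matvec(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def projector(e):
    """Rank-one projection |e><e| onto a unit vector e."""
    return [[e[i] * e[j].conjugate() for j in range(len(e))] for i in range(len(e))]

# Assumed toy setting: A = sigma_z, B = sigma_x, and a unit state |psi>.
A = [[1.0, 0.0], [0.0, -1.0]]
s = 1 / math.sqrt(2)
b_eigvecs = {+1: [s, s], -1: [s, -s]}
psi = [0.6, 0.48 + 0.64j]

A_psi = matvec(A, psi)

projection = {}  # coefficients of the projection of A onto span{Pi_b}
weak_value = {}  # <b|A|psi> / <b|psi>
for b, e in b_eigvecs.items():
    pb_psi = matvec(projector(e), psi)
    # Assumed inner product on operators: <<X, Y>> = <X psi, Y psi>.
    projection[b] = inner(pb_psi, A_psi) / inner(pb_psi, pb_psi)
    weak_value[b] = inner(e, A_psi) / inner(e, psi)

def distance(coeffs):
    """Return ||(A - f(B)) psi|| for the function f(b) = coeffs[b]."""
    fB_psi = [0j, 0j]
    for b, e in b_eigvecs.items():
        pb_psi = matvec(projector(e), psi)
        fB_psi = [x + coeffs[b] * y for x, y in zip(fB_psi, pb_psi)]
    residual = [x - y for x, y in zip(A_psi, fB_psi)]
    return math.sqrt(inner(residual, residual).real)

# Best real-valued proxy: the real part of the projection coefficients.
real_part = {b: projection[b].real for b in projection}

print(all(abs(projection[b] - weak_value[b]) < 1e-12 for b in b_eigvecs))
print(distance(projection) <= distance(real_part))
```

In this toy example one of the weak values has modulus larger than the largest eigenvalue of $A$, which a projection coefficient is free to do since it is not an average with respect to a probability measure.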
Topic: Weak Value as Optimal Approximation.
As a direct consequence of Proposition 7.8 (specifically, Corollary 7.8), we note an interesting result regarding conditional quasi-expectations (specifically, the weak value) and optimal approximation. As orthogonal projections, conditional quasi-expectations furnish the optimal proxy function for $A$, minimising the distance $\|A - E_{\alpha}[A|B; \psi]\|_{\psi,\alpha} = \min_{f \in L^{2}(\mu^{\psi}_{B})} \|A - f(B)\|_{\psi,\alpha}$, which is a consequence of the 'Pythagorean identity' $\|A - f(B)\|_{\psi,\alpha}^{2} = \|A - E_{\alpha}[A|B; \psi]\|_{\psi,\alpha}^{2} + \|E_{\alpha}[A|B; \psi] - f(B)\|_{\psi,\alpha}^{2}$ valid for orthogonal projections in Hilbert spaces. In particular, the minimisation of $\|(A - f(B))\psi\|$ over real functions $f \in L^{2}(\mu^{\psi}_{B})$ (7.52) arises as a special case. This specific form is known from [45,46], although it was proven there from a different perspective than by directly utilising the geometric observation made in this paper. The interpretation of conditional quasi-expectations as orthogonal projections provides the core geometric reason why the weak value appears as the optimal choice of proxy function in the novel uncertainty relations for approximation/estimation [14].

Statistical Interpretation of Conditional Quasi-expectations

In classical probability theory, conditional expectations admit not only a geometric interpretation as orthogonal projections, but also a statistical interpretation as conditioned averages. We next seek to provide a quantum analogue of this observation, namely, a statistical interpretation of the conditional quasi-expectations as 'conditional averages' with respect to QJP distributions. To this end, we first introduce a general term:

Definition (Conditional Quasi-expectation of Quantum Observables). Let $\mu \in \mathcal{M}^{\psi}_{A,B}$ be a QJP distribution of a pair of quantum observables $A$ and $B$ on the state $|\psi\rangle \in \mathcal{H}$, such that it admits a representation by quasi-probability measures, and suppose that the expectation value $E[\pi_{A}; \mu]$ exists. Denoting the measurable functions representing the behaviour of the measurement outcomes of $A$ and $B$ by $\pi_{A}(a, b) = a$ and $\pi_{B}(a, b) = b$, respectively, we then define the conditional quasi-expectation of $A$ given $B$ under $\mu$ as the conditional expectation of $\pi_{A}$ given $\pi_{B}$ with respect to $\mu$, where the definition of the latter is given in Section 5.2.1, whenever the Radon-Nikodým derivatives concerned exist.
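The notion of a 'conditional average' with respect to a quasi-probability distribution can be made concrete on a qubit. The sketch below is an illustration under assumptions rather than the construction of the text: it uses one common convention for a Kirkwood-Dirac type distribution, $q(a, b) = \langle\psi|b\rangle\langle b|a\rangle\langle a|\psi\rangle$, whose relation to a particular value of the parameter $\alpha$ is not asserted here, and the observables, state, and variable names are hypothetical. It checks that the marginals are the usual Born probabilities and that the conditional average of the outcome $a$ given $b$ equals the weak value $\langle b|A|\psi\rangle / \langle b|\psi\rangle$.

```python
import math

def inner(u, v):
    """Hilbert-space inner product <u, v>, conjugate-linear in u."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# Assumed toy setting: A = sigma_z and B = sigma_x, given by eigenvalue -> eigenvector.
s = 1 / math.sqrt(2)
a_eig = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
b_eig = {+1: [s, s], -1: [s, -s]}
psi = [0.6, 0.48 + 0.64j]  # unit vector

# Assumed Kirkwood-Dirac type quasi-probability: q(a, b) = <psi|b><b|a><a|psi>.
q = {(a, b): inner(psi, eb) * inner(eb, ea) * inner(ea, psi)
     for a, ea in a_eig.items() for b, eb in b_eig.items()}

# Marginals: summing out one outcome recovers the Born probabilities of the other.
marginal_a = {a: sum(q[(a, b)] for b in b_eig) for a in a_eig}
marginal_b = {b: sum(q[(a, b)] for a in a_eig) for b in b_eig}

# Conditional average of the outcome a given the outcome b.
conditional_average = {
    b: sum(a * q[(a, b)] for a in a_eig) / marginal_b[b] for b in b_eig
}

# Weak value <b|A|psi> / <b|psi>, with A psi assembled from the spectral data of A.
A_psi = [sum(a * inner(ea, psi) * ea[i] for a, ea in a_eig.items()) for i in range(2)]
weak_value = {b: inner(eb, A_psi) / inner(eb, psi) for b, eb in b_eig.items()}

print(all(abs(conditional_average[b] - weak_value[b]) < 1e-12 for b in b_eig))
```

Individual entries of `q` are complex in this example even though both marginals are real and non-negative, which is the sense in which such a distribution is only a quasi-probability.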
We next see that this definition of conditional quasi-expectations agrees with that introduced earlier in (4.77).
Proposition 7.9 (Statistical Interpretation of Conditional Quasi-expectations). Let $A, B \in L(\mathcal{H})$ be self-adjoint, $|\psi\rangle \in \mathcal{H}$, and let $\mu^{\psi,\alpha}_{\mathrm{add}}$ be a member of the additive complex-parametrised sub-family of the QJP distributions of $A$ and $B$ for the choice $\alpha \in \mathbb{C}$ defined as in (7.22).

This result provides a statistical interpretation of conditional quasi-expectations as 'conditional averages' of an observable $A$ given another observable $B$ with respect to the QJP distributions concerned. As a corollary, one also has a statistical interpretation of Aharonov's weak value as a conditioning of $A$ given $B$.

… 'amplification' by conditioning, and a systematic method (with an example) to analytically evaluate the conditional expectation in the case where $A$ has a spectrum consisting of finitely many points. As for the retrieval of the target information, we exclusively studied the behaviour of the meter outcome in the weak region of the interaction parameter, and observed that the obtained value can be understood as a quantum analogue of conditional expectations, which we termed conditional quasi-expectations, of the target observable $A$ given the conditioning observable $B$, to which Aharonov's weak value belongs as a special case. It was also revealed that there exist some qualitative differences in properties between the classical conditional expectations and the quantum analogue discussed here.

Section 5 (CM II).
In Section 5, the study of the conditioned measurement scheme was given a probabilistic approach, where the quantity of interest became the QJP distribution of a pair of canonically conjugate observables on the meter system, conditioned by the outcome of the conditioning observable $B$ of the target system. For definiteness, this was accomplished in terms of the Wigner-Ville distribution, which was chosen as a convenient realisation among the various candidates for quasi-probability distributions of the canonically conjugate pair that may be naturally associated with the quantum state of the meter system. It was then argued that, in parallel to the UM case, one can recover the information of the target system by examining either the strong or the weak region of the interaction parameter, and that the information obtained can be understood as a quantum analogue of conditional probabilities, which we termed conditional quasi-probabilities, of the target observable $A$ given the conditioning observable $B$ on the initial state $|\phi\rangle$. We then found that the conditional quasi-probability shares similar properties with its classical counterpart, while it admits complex values, unlike the classical one. We subsequently confirmed that, given $B$, the 'statistical average' of the conditional quasi-probability of $A$ coincides with the conditional quasi-expectation of $A$ obtained in the preceding section. This is precisely the relation that holds between classical conditional probabilities and conditional expectations.
8.1.2. QJP: Formal Definition. Inspired by the heuristic arguments from the bottom-up and operational analyses given in the preceding four sections, in Section 6 we provided the top-down discussion of QJP distributions defined for arbitrary pairs of generally non-commutative quantum observables.

Section 6 (QJP of Quantum Observables).
Based on the results of the spectral theorem for self-adjoint/normal operators on Hilbert spaces and their Fourier transforms, we proposed a general prescription for defining distributions describing the 'joint behaviour' of a pair of generally non-commuting quantum observables, which serves as a natural generalisation of that defined for a pair of simultaneously measurable observables. We then observed that the QJP defined this way for a non-commutative pair of observables admits arbitrariness, that is, there exists a multitude of candidates that all share certain desirable properties qualifying them as QJP. For ease of further discussion, we subsequently concentrated on a special sub-family of the class of QJP distributions parametrised by a single complex number, which includes both the Wigner-Ville type and the Kirkwood-Dirac type of QJP distributions, which are among the most familiar examples considered in the literature. We then summarised our results obtained up to Section 5 from the broader viewpoint gained here, and discussed where the heuristic arguments and observations in the foregoing sections find their places in this broader framework.

8.1.3. Application. As the final topic, we gave an example of an application of our observations on QJP distributions of quantum observables.

Section 7 (Application: Interpretation of the Weak Value).
To discuss where the mathematical observations on QJP distributions may find their use, we studied the quantum analogue of 'correlations' (inner products) that can be defined even for a pair of non-commuting observables. As is well known, due to the non-commutative nature of quantum observables, there is no unique way to introduce a 'natural inner product' on the space of quantum observables. We showed that the ambiguity of the possible geometries that can be introduced on the space corresponds precisely to the ambiguity of the definition of QJP distributions, and that the QJP distributions provide a convenient representation of the geometries in terms of 'integration' (statistics). We then concentrated on a special sub-family of all possible QJP distributions parametrised by a single complex number and observed that the geometric concept of orthogonal projection may be endowed with a statistical interpretation as conditioning. This fact is analogous to the classical case, the difference being that, in the quantum case, there can be multiple orthogonal projections due to the non-uniqueness of the inner product. The main finding is that Aharonov's weak value may be understood as a special realisation of the possible orthogonal projections of a quantum observable $A$ onto the space of all normal operators generated by another observable $B$, and at the same time, as a conditioning of $A$ when the outcome of $B$ is given. The former is a geometric interpretation of the weak value, while the latter is its statistical interpretation; since QJP distributions tie them together, both interpretations are equivalent.

Discussion
Since the advent of quantum theory nearly a century ago, the non-commutativity of quantum observables has undoubtedly been one of the major sources of trouble we face when we try to interpret their measurement outcomes in a sensible manner. This has naturally led to various attempts at a 'quasi-classical' interpretation of quantum observables in terms of commuting quantities familiar to us from classical theory. Wigner, Weyl and Moyal were among the prominent figures who contributed much to this effort, most notably giving rise to the theory of the Wigner-Weyl transform [47] and the Weyl-Groenewold-Moyal product [48,49]. In particular, the theory of the Wigner-Weyl transform provides an invertible mapping between functions defined on a phase space and operators on a Hilbert space, in which the mapping from functions to operators is called the Weyl transform, whereas the inverse is called the Wigner transform. It is notable in this respect that the Wigner-Ville distributions arise as the Wigner transform of density operators, and from this follows the fact that the expectation values of quantum observables can be expressed as statistical averages, obtained by integrating their Wigner transforms with respect to the Wigner-Ville distribution defined on the phase space.
Viewed from the broader context of these quasi-classical transforms, the mathematical methods developed in this paper may be understood as another functional analytic approach to this problem. Recall that, in functional analysis, a map that assigns a ring of functions onto a commutative sub-algebra of the algebra of quantum observables is known as the functional calculus, which in turn is known to be uniquely represented by a spectral measure. The family of quasi-joint-spectral distributions (QJSDs) introduced in Section 6 are non-commutative analogues of spectral measures, which induce maps that assign functions to generally noncommutative sets of quantum observables. Due to the possible non-commutativity of the chosen combination of observables, QJSDs are in general highly non-unique, and this leads to various candidates of quasi-classical transforms. In fact, the Wigner-Weyl transform can be understood as a special case in this framework, namely, the quasi-classical transform corresponding to the member of our complex parametrised convolutive sub-family of QJSDs mentioned in the text for the particular choice α = 0. The method of 'hashing' presented in this paper thus exemplifies a procedure for constructing a broad class of candidates of quasi-classical transforms.
The method of quasi-classical transforms, to which the Wigner-Weyl transform belongs as a special case, not only offers a statistical interpretation of the behaviour of a combination of non-commuting quantum observables, but also sheds new light on the physical analysis in quantum mechanics pertaining to that process. It should be obvious that one can draw an analogy to various concepts and results in classical probability theory when one considers the quantum counterparts obtained by this method, which allows for an intuitive treatment of the latter based on the geometric structure present in the probability theory. Besides, transformation of Hilbert space operators into functions or quasi-probability distributions has its own technical merit in the mathematical analysis, since familiar results in measure and integration theory, including various convergence theorems, integral inequalities and representation theorems, are readily available.
One of the direct applications taking advantage of these properties is the geometric/statistical interpretation of the weak value discussed in Section 7. There, we found that the weak value can be regarded as one of the possible quantum analogues of conditional expectations, which are indeed fundamental quantities in quantum mechanics as much as the standard conditional expectations are in classical probability theory. This interpretation also leads to novel inequalities of uncertainty relations for approximation and estimation which are capable of treating both the position-momentum inequality and the time-energy inequality [14] within a unified framework.
Finally, we wish to note that, in any conditioned quantum measurement such as the weak measurement, non-commutative observables must be dealt with in one way or another in the context of probability theory when one tries to make sense of the measurement outcome. Given this, we expect that our method of quasi-classical transforms, which is established on a rigorous mathematical basis, may offer a fundamental and practical scheme in which issues involving measurement results of non-commuting observables are analysed.

A. Post-selected Measurement
Given that the conditioning observable $B$ has a spectrum of finite cardinality, we have seen in (4.48) that the conditional expectation $E[X|B; \Psi_{g}]$ admits an explicit expression whose value, for the choice $b \in \sigma(B)$ such that the probability of observing it is non-vanishing, is determined by the corresponding spectral projection of $B$. This is roughly to say that the description of a conditioning by a general observable $B$, and hence the study of the conditioned measurement scheme, essentially reduces to that given by a projection. Of course, this should be intuitively clear, since each self-adjoint operator admits a unique spectral decomposition. As the extreme case, the choice of the conditioning observable given by a projection onto a one-dimensional subspace of $\mathcal{H}$ spanned by a unit vector $|\phi'\rangle \in \mathcal{H}$ becomes of special interest for our study. The vast majority of the literature with interests similar to those of this paper is devoted to the study of this special type of conditional measurement, and the act of measuring the conditional expectation is mostly referred to as the 'post-selected measurement' or the 'weak measurement'. In such a context, the unit vector $|\phi'\rangle$ is occasionally called the final state, denoted as $|\phi_{f}\rangle := |\phi'\rangle$, in order to contrast it with the initial state denoted as $|\phi_{i}\rangle := |\phi\rangle$.

A.1. Example: Analytic Model
We are now interested in the construction of a model in which the conditional expectation can be analytically computed for the whole range of the interaction parameter $g \in \mathbb{R}$. To this end, we assume that the target observable $A$ has a spectrum of finite cardinality. One readily sees that the computation of the 'conditional' composite state essentially reduces to that of a vector for the special choice of the conditioning observable $B = \Pi_{f} := |\phi_{f}\rangle\langle\phi_{f}|$ defined by some final state $|\phi_{f}\rangle \in \mathcal{H}$. A careful observation reveals that the 'conditional' meter state (5.63), which is in general a mixed state for a general conditioning observable $B$, in fact becomes a pure state in the post-selected measurement case, in which the conditional expectation takes the form of a quotient that is well-defined whenever its denominator is non-vanishing, i.e., when the 'conditional' meter state is not the zero vector. One thus learns that the computation of the conditional expectation essentially