Maximum Likelihood Estimation from a Tropical and a Bernstein--Sato Perspective

In this article, we investigate Maximum Likelihood Estimation with tools from Tropical Geometry and Bernstein--Sato theory. We investigate the critical points of very affine varieties and study their asymptotic behavior. We relate these asymptotics to particular rays in the tropical variety as well as to Bernstein--Sato ideals and give a connection to Maximum Likelihood Estimation in Statistics.


Introduction
Maximum likelihood estimation. Let $X$ be a smooth closed subvariety of the algebraic torus $(\mathbb{C}^*)^p$ and denote by $t_1, \ldots, t_p$ the coordinates on the torus restricted to $X$. Given $\alpha \in \mathbb{C}^p$, the maximum likelihood estimation (MLE) problem is to find the zeroes of the logarithmic one-form $\operatorname{dlog}(t_1^{\alpha_1} \cdots t_p^{\alpha_p})$. These zeroes are also referred to as critical points. As the terminology suggests, this problem arises from Statistics. We refer to [20] for further details and examples related to this statistical context. In this article, we investigate this problem from an algebro-geometric point of view. The geometric object underlying our analysis is the critical locus $C \subseteq X \times \mathbb{P}^{p-1}$, consisting of all pairs $(x, \alpha)$ for which the logarithmic differential $\operatorname{dlog} t^\alpha$ vanishes at $x$. As proven by Franecki-Kapranov in [14] and later by Huh in [18], for a general data vector $\alpha$, the number of critical points is finite and equal to the signed Euler characteristic $(-1)^{\dim X} \chi(X)$ of $X$. We refer to non-general data vectors as special values. The number of critical points for general $\alpha$ is also called the maximum likelihood degree of $X$, denoted by $d_{ML}(X)$.
Maximum likelihood estimation approaching non-general data vectors. In this article, we study the behavior of the critical points when approaching a special data vector along a curve consisting of general values. More precisely, we study the subset $S_F \subseteq \mathbb{P}^{p-1}$ consisting of all data vectors $\alpha$ for which at least one of the $d_{ML}(X)$ many critical points leaves $X$ as we approach $\alpha$. We refer to the components of $S_F$ that are of codimension one as critical slopes of $F$. In this case, at least one of the coordinates of some critical point approaches either zero or infinity. To obtain more refined information about the asymptotic behavior, we keep track of the direction in which this critical point escapes by recording it in a set $Q_{F,\alpha} \subseteq \mathbb{Z}^p$. The points in $Q_{F,\alpha}$ essentially describe which torus orbits in a tropical compactification of $X$ are approached by the critical points as one approaches $\alpha$. In Section 2, we introduce both of these objects in the general setting of a smooth variety $X$ with a tuple of nowhere vanishing regular functions $F = (f_1, \ldots, f_p)$ on it. It turns out that both $S_F$ and $Q_{F,\alpha}$ are closely related to the divisorial valuations corresponding to the irreducible components of the boundary in a smooth compactification of $X$ with simple normal crossings (SNC) boundary. The critical points for special data vectors were also studied, among others, in [9] for the case of hyperplane arrangements and in [27] using probabilistic methods. To study $S_F$ and $Q_{F,\alpha}$, a good understanding of a smooth compactification and its boundary components is essential, which we tackle in Section 3.

Schön varieties and tropical compactifications.
A natural class of compactifications of closed subvarieties $X$ of tori is provided by the tropical compactifications of [31]. Tropical compactifications are constructed by taking the closure $X_\Sigma$ of $X$ in the toric variety associated to a fan $\Sigma$ whose support is the tropical variety of $X$. In general, these compactifications are not smooth. If the variety $X$ is schön (see [18, Definition 3.6]), it allows for a tropical compactification that is smooth and whose boundary is a simple normal crossings divisor. This class of subvarieties of tori provides a common generalization of hyperplane arrangement complements and of Newton non-degenerate hypersurfaces, as shown in [17]. For these compactifications, we prove the following theorem.
Theorem 3.7. Let $E_i$ be an irreducible component of the boundary in a smooth tropical compactification of $X$ with simple normal crossings boundary. Assume that $\chi(E_i^\bullet) \neq 0$, where $E_i^\bullet := E_i \setminus \bigcup_{j \neq i} E_j$. Then for general $\alpha$ in the hyperplane $H_{E_i} = \{\operatorname{ord}_{E_i}(t_1)s_1 + \cdots + \operatorname{ord}_{E_i}(t_p)s_p = 0\}$, the vector $(\operatorname{ord}_{E_i}(t_1), \ldots, \operatorname{ord}_{E_i}(t_p))$ is contained in $Q_{F,\alpha}$. In particular, the hyperplane $H_{E_i}$ is contained in $S_F$. Moreover, these hyperplanes are the only codimension-one components of $S_F$.
The toric variety associated to $\Sigma$ has codimension-one torus orbits $O_\tau$ indexed by the rays $\tau \in \Sigma$. The boundary component $E_i^\bullet$ is an irreducible component of $X_\Sigma \cap O_\tau$ for some $\tau \in \Sigma$, and thus the condition that $\chi(E_i^\bullet) \neq 0$ is a property of the ray $\tau$ in the tropical variety of $X$. To translate this condition into a tropical condition, we define the notion of rigid rays in $\operatorname{Trop}(X)$; these are the rays for which any small perturbation of the ray changes the associated initial ideal.
Proposition 3.13. Let $X$ be a schön very affine variety and $\Sigma$ a fan supported on the tropical variety of $X$ such that the closure $X_\Sigma$ of $X$ in the toric variety associated to $\Sigma$ is smooth and $X_\Sigma \setminus X$ is a SNC divisor. Assume that for all $\tau \in \Sigma$, the intersection $X_\Sigma \cap O_\tau$ is connected. If $\tau$ is a rigid ray, then for general $\alpha$ in the hyperplane orthogonal to $\tau$, the primitive generator of the ray lies in $Q_{F,\alpha}$. In particular, $\mathbb{P}(\tau^\perp) \subseteq S_F$. This gives a bijection between the rigid rays in $\operatorname{Trop}(X)$ and the codimension-one components of $S_F$.
This proposition completely determines the critical slopes of $F$ in terms of tropical data associated to $X$. By the very definition of $Q_{F,\alpha}$, this proposition gives a description of the asymptotic behavior of the critical points as we approach the special set of data vectors formed by the hyperplane orthogonal to a rigid ray. We refer to Section 3.1 for a more explicit interpretation of this result in terms of maximum likelihood estimation.
Slopes of Bernstein-Sato varieties. In Section 4, we relate these results to Bernstein-Sato ideals. For a tuple $F = (f_1, \ldots, f_p)$ of regular functions on a smooth algebraic variety, the Bernstein-Sato ideal of $F$ is the ideal $B_F$ in $\mathbb{C}[s_1, \ldots, s_p]$ consisting of polynomials $b$ for which there exists a global algebraic linear partial differential operator $P(s_1, \ldots, s_p)$ such that
$P(s_1, \ldots, s_p) \cdot f_1^{s_1+1} \cdots f_p^{s_p+1} = b(s_1, \ldots, s_p) \, f_1^{s_1} \cdots f_p^{s_p}$.
This definition is a generalization of the Bernstein-Sato polynomial of a single regular function $f$. It is an intricate invariant of the tuple $F$ which is related to the singularities of the hypersurface $V(f_1 \cdots f_p)$; see for instance [3] or [5] for relations to monodromy eigenvalues. Sabbah [28] showed that the Bernstein-Sato ideal is non-zero and that the codimension-one components of the Bernstein-Sato variety $V(B_F)$ are affine hyperplanes with rational coefficients. The set of Bernstein-Sato slopes of $F$, denoted by $BS_F \subseteq \mathbb{C}^p$, is the union of these hyperplanes after translating them to the origin. We denote by $\mathbb{P}(BS_F)$ the projectivization of this set. Maisonobe [25] gave a geometric description of the Bernstein-Sato slopes. We denote by $Y$ the closure of $X$ inside $\mathbb{C}^p$ and will assume that $Y$ is smooth. We then study the Bernstein-Sato ideal of the tuple of coordinate functions on $\mathbb{C}^p$ restricted to $Y$. In this setup, we deduce the following theorem using Maisonobe's description of the Bernstein-Sato slopes.
Theorem 4.4. Under the assumptions of Proposition 3.13, the irreducible components of $S_F \cap \mathbb{P}(BS_F)$ are exactly the hyperplanes $\mathbb{P}(\tau^\perp)$ for $\tau$ a rigid ray contained in $\mathbb{R}^p_{\geq 0}$.
A study of $BS_F$ using the critical locus was also undertaken in [2] under a different technical assumption, namely that the tuple $F$ is sans éclatements en codimension 0. We would like to point out that Lemma 2.6 therein, treating the special case $p = 2$, is analogous to our Theorem 4.4. In Example 4.5, we demonstrate that, in general, the sets $S_F$ and $\mathbb{P}(BS_F)$ are incomparable in the sense that either can contain irreducible components not contained in the other one. Theorem 4.4 explains in a rigorous way the observations made in [29, Example 3.1]. We revisit this example in Section 5.
Theorem 4.4 provides information on the slopes of the Bernstein-Sato variety, but not on the affine translation with which these slopes appear. In Subsection 4.2, we formulate a conjecture related to these affine translates. This conjecture is formulated in terms of the log-canonical threshold polytope of the tuple $F$. We prove this conjecture for the case of indecomposable central hyperplane arrangements, in which case it proves [3, Conjecture 3] for complete factorizations of hyperplane arrangements. It turns out that in this case the object involved has already been studied extensively under a different name: it is the matroid polytope of [13], and Proposition 2.4 in loc. cit. precisely recovers our conjecture. The Bernstein-Sato ideal of hyperplane arrangements has recently also been studied using different methods in [1], [26], and [32].
In summary, our results connect Tropical Geometry, Bernstein-Sato Theory, and Likelihood Geometry. Among others, our article provides new tools for Algebraic Statistics and Particle Physics: in the recent article [30], Sturmfels and Telen outline a link of MLE for discrete statistical models to scattering amplitudes.
Notation and conventions. By a variety we mean an integral, separated scheme of finite type over the complex numbers, unless explicitly stated otherwise. A property of a variety is called general if it holds true on a Zariski dense open subset of the variety. For a smooth variety $X$ with compactification $X \hookrightarrow Y$ such that $Y$ is smooth and $E := Y \setminus X = \bigcup_{i=1}^q E_i$ is a divisor with irreducible components $E_i$, we denote by $E_i^\bullet := E_i \setminus \bigcup_{j \neq i} E_j$ the complement in $E_i$ of the other components. When $X$ is equipped with a tuple $F = (f_1, \ldots, f_p)$ of regular functions, we denote by $H_{E_i}$ the hyperplane $\{\operatorname{ord}_{E_i}(f_1)s_1 + \cdots + \operatorname{ord}_{E_i}(f_p)s_p = 0\}$. We denote by $\Delta$ the formal disc $\operatorname{Spec} \mathbb{C}[[t]]$, by $\Delta^\circ := \operatorname{Spec} \mathbb{C}((t))$ its generic point, and by $0$ its closed point. Throughout the article, we will always assume that all cones in a fan are strongly convex.

Asymptotic behavior of critical points
In this section, we introduce the objects of study and investigate basic properties. We study smooth varieties $X$ with a $p$-tuple $F = (f_1, \ldots, f_p)$ of nowhere vanishing regular functions on $X$.
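Definition 2.1. The critical locus of $F$, encoding the zeros of $\operatorname{dlog} f^\alpha$, is defined to be $C_F := \{(x, \alpha) \in X \times \mathbb{P}^{p-1} \mid \operatorname{dlog} f^\alpha(x) = 0\}$.
Let $X \hookrightarrow Y$ be a compactification of $X$ and denote by $E := Y \setminus X$ the boundary of this compactification. We denote by $C_{F,Y}$ the closure of $C_F$ in $Y \times \mathbb{P}^{p-1}$, and by $\pi_1$ and $\pi_2$ the projections to the two factors. We define $S_F := \pi_2(C_{F,Y} \cap \pi_1^{-1}(E)) \subseteq \mathbb{P}^{p-1}$ and refer to its irreducible components of codimension one as critical slopes of $F$.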
The variety $S_F$ is the image of a proper variety and as such a closed subvariety of $\mathbb{P}^{p-1}$.
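For orientation, here is a minimal sketch of these objects in the simplest possible case. It is an illustrative example chosen here, not one taken from the article: $X = \mathbb{C} \setminus \{0, 1\}$ with $F = (x, 1 - x)$ and compactification $\mathbb{P}^1$. The form $\operatorname{dlog} f^\alpha = (\alpha_1/x - \alpha_2/(1-x))\,dx$ has the single zero $x = \alpha_1/(\alpha_1 + \alpha_2)$, which leaves $X$ exactly when $\alpha_1 = 0$, $\alpha_2 = 0$, or $\alpha_1 + \alpha_2 = 0$, so one expects $S_F$ to consist of these three points of $\mathbb{P}^1$. The function names below are hypothetical.

```python
from fractions import Fraction

def critical_point(a1, a2):
    """Zero of dlog(x^a1 * (1-x)^a2) = (a1/x - a2/(1-x)) dx, for a1 + a2 != 0."""
    a1, a2 = Fraction(a1), Fraction(a2)
    if a1 + a2 == 0:
        raise ValueError("alpha lies on {s1 + s2 = 0}: the critical point is at infinity")
    return a1 / (a1 + a2)

def dlog_coefficient(x, a1, a2):
    """Coefficient of dx in dlog(x^a1 * (1-x)^a2) at the point x."""
    return Fraction(a1) / x - Fraction(a2) / (1 - x)

# General data vector: the unique critical point lies in X and the form vanishes there.
x = critical_point(1, 2)
assert dlog_coefficient(x, 1, 2) == 0

# Approach the special data vector (0 : 1) along alpha(t) = (t : 1) with t = 10^-k.
path = [critical_point(Fraction(1, 10**k), 1) for k in range(1, 5)]
print(path)  # the critical points t / (1 + t) tend to the boundary point x = 0
```

Along $\alpha(t) = (t : 1)$ the critical point is $t/(1+t)$, so $f_1 = x$ has order 1 in $t$ and $f_2 = 1 - x$ has order 0, which suggests $(1, 0) \in Q_{F,(0:1)}$ in this toy case.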
Example 2.4. Let $X' \subseteq \mathbb{C}^2$ be the complement of the arrangement defined by $f = xy(x - y)(x - 1)$, the four factors of which form the tuple $F = (f_1, f_2, f_3, f_4)$. As compactification, we take $\mathbb{P}^2$. With a computer algebra system one confirms that the critical slopes of $F$ are […], and $\{s_i = 0\}$ comes from $\{f_i = 0\}$ for $i = 2, 3, 4$. △
Lemma 2.5. Let $Z$ be a variety, $X \subseteq Z$ a locally closed subvariety and $x \in Z$ a closed point. Then $x \in \overline{X}$ if and only if there exists a morphism $\gamma \colon \Delta \to Z$ such that $\gamma(\Delta^\circ) \in X$ and $\gamma(0) = x$.
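Proof. The existence of such $\gamma$ clearly implies that $x \in \overline{X}$. For the reverse implication, let $x \in \overline{X}$. Take a curve $C \subseteq Z$ with $x \in C$ and $C \cap X \neq \emptyset$. Let $\tilde{C}$ denote the normalization of $C$ and choose a point $\tilde{x} \in \tilde{C}$ lying over $x$. Since $\tilde{C}$ is smooth, by the Cohen structure theorem we can construct the required morphism as the composition $\Delta \cong \operatorname{Spec} \widehat{\mathcal{O}}_{\tilde{C}, \tilde{x}} \to \tilde{C} \to C \hookrightarrow Z$.
As an immediate consequence, we deduce the following corollary.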
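Corollary 2.6. A point $\alpha \in \mathbb{P}^{p-1}$ lies in $S_F$ if and only if there exists a morphism $\gamma \colon \Delta \to Y \times \mathbb{P}^{p-1}$ such that $\gamma(\Delta^\circ) \in C_F$ and $\gamma(0) \in E \times \{\alpha\}$.
To keep track of more refined information about the asymptotic behavior of the critical points, we introduce the following subsets of $\mathbb{Z}^p$.
Definition 2.7. Let $Y$ be a compactification of $X$ and let $\alpha \in \mathbb{P}^{p-1}$. Let $\pi_1 \colon Y \times \mathbb{P}^{p-1} \to Y$ denote the projection to the first factor. We denote by $Q_{F,\alpha} \subseteq \mathbb{Z}^p$ the set of vectors that arise in the following manner. A vector is in $Q_{F,\alpha}$ if and only if it is of the form $(\operatorname{ord}_t(\gamma^*(\pi_1^* f_1)), \ldots, \operatorname{ord}_t(\gamma^*(\pi_1^* f_p)))$ for some morphism $\gamma \colon \Delta \to Y \times \mathbb{P}^{p-1}$ such that $\gamma(\Delta^\circ) \in C_F$ and $\gamma(0) \in E \times \{\alpha\}$.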
Lemma 2.8. The sets $S_F$ and $Q_{F,\alpha}$, $\alpha \in \mathbb{P}^{p-1}$, do not depend on the choice of a compactification.
The following observation is immediate from the definitions.
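Lemma 2.9. The closed points of $S_F$ are given by those $\alpha \in \mathbb{P}^{p-1}$ for which $Q_{F,\alpha}$ is non-empty, i.e., $S_F = \{\alpha \in \mathbb{P}^{p-1} \mid Q_{F,\alpha} \neq \emptyset\}$.
We recall that for a boundary component $E_j$ in a compactification of $X$, we denote by $H_{E_j}$ the hyperplane $\{\operatorname{ord}_{E_j}(f_1)s_1 + \cdots + \operatorname{ord}_{E_j}(f_p)s_p = 0\}$. […] $S_F \subseteq \bigcup_{j=1}^q H_{E_j}$, and hence also the critical slopes of $F$ are among the $H_{E_j}$.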
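Proof. Since $E$ is a SNC divisor, the locally free sheaf $\Omega^1_X$ on $X$ extends to the locally free sheaf $\Omega^1_Y(\log E)$ on $Y$. We then have the following commutative diagram: […] The horizontal arrow defines the incidence variety $C_{F,Y}(\log E) := \{(y, \alpha) \in Y \times \mathbb{P}^{p-1} \mid \operatorname{dlog} f^\alpha(y) = 0\}$, where $\operatorname{dlog} f^\alpha$ is regarded as a global section of $\Omega^1_Y(\log E)$, and hence $\operatorname{dlog} f^\alpha(y)$ is an element of the fiber of this locally free sheaf over $y$. We refer to $C_{F,Y}(\log E)$ as the logarithmic critical locus of $F$. This is a closed, possibly reducible, subvariety of $Y \times \mathbb{P}^{p-1}$ […]
Let $y \in \bigcap_{j \in J} E_j$. Assume for simplicity of notation that $J = \{1, \ldots, r\}$. Take local coordinates $y_1, \ldots, y_n$ on a small open $U$ around $y$ in which $E_i \cap U = \{y_i = 0\}$ for $i = 1, \ldots, r$. Then in these coordinates $f_i = u_i \prod_{j=1}^r y_j^{\operatorname{ord}_{E_j}(f_i)}$, where the $u_i$ are non-vanishing functions on $U$. Summing up, we conclude that
$\operatorname{dlog} f^\alpha = \sum_{j=1}^r q_j(\alpha) \frac{dy_j}{y_j} + \theta(\alpha)$, (1)
where $q_j(\alpha) = \operatorname{ord}_{E_j}(f_1)\alpha_1 + \cdots + \operatorname{ord}_{E_j}(f_p)\alpha_p$ and $\theta(\alpha)$ is a regular 1-form around $y$, which we write in coordinates as $\sum_{i=1}^n \theta_i(\alpha)\,dy_i$. In the frame $dy_1/y_1, \ldots, dy_r/y_r, dy_{r+1}, \ldots, dy_n$ for $\Omega^1_Y(\log E)$ we conclude that $\operatorname{dlog} f^\alpha$ is given by $(y_1\theta_1(\alpha) + q_1, \ldots, y_r\theta_r(\alpha) + q_r, \theta_{r+1}(\alpha), \ldots, \theta_n(\alpha))$, and thus these are a system of defining equations for $C_{F,Y}(\log E) \cap (U \times \mathbb{P}^{p-1})$. Since $y \in \bigcap_{j \in J} E_j$ was arbitrary, we conclude that […] For a morphism $\gamma \colon \Delta \to Y \times \mathbb{P}^{p-1}$ as in the statement of the proposition, we thus conclude that […] In particular, $(\bigcap_{j \in J} H_{E_j}) \cap \{\alpha\}$ is non-empty, so that indeed $\alpha \in \bigcap_{j \in J} H_{E_j}$. The last claim of the proposition follows from the first statement and Corollary 2.6.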
Remark 2.11. In the above proof, the reason we work with […] $\pi_1^{-1}(E)$, and hence is defined by a saturation of the ideal defining $C_{F,Y}(\log E)$. The ideal defining $C_{F,Y}(\log E)$ can be written down fairly explicitly as in the proof above, but we do not have a general method to analyze the saturation. We illustrate with an example that in general there is a difference between $C_{F,Y}$ and […], and $V(z)$. We can perform blowups with exceptional locus lying over $V(z)$ to make the boundary $E$ SNC. One can compute that […]

Maximum likelihood estimation on schön very affine varieties
In this section, we investigate schön very affine varieties. Those varieties allow for a smooth compactification with SNC boundary obtained from combinatorial data, namely from their tropical variety. For background in Tropical Geometry and tropical compactifications, we refer the reader to [24], [31], and [23]. In particular, we refer to [24, Definitions 3.1.1 and 3.2.1] for the definition of the tropical variety and to [24, Theorem 3.2.3] for equivalent characterizations. We analyze the MLE problem in terms of Tropical Geometry. The main results of this section are Theorem 3.7 and Proposition 3.13, which completely determine the critical slopes from tropical data.
3.1. Zeroes of logarithmic differential forms. Let $X \subseteq (\mathbb{C}^*)^p$ be a smooth closed subvariety of the algebraic $p$-torus. We denote by $t_1, \ldots, t_p$ the coordinate functions on $(\mathbb{C}^*)^p$ and by $f_i := t_i|_X$ their restrictions to $X$. We will assume that there exists a fan structure $\Sigma$ on $\operatorname{Trop}(X)$ such that the closure $X_\Sigma$ of $X$ in the associated toric variety $T_\Sigma$ is proper and smooth and $X_\Sigma \setminus X = X_\Sigma \cap (T_\Sigma \setminus (\mathbb{C}^*)^p)$ is a reduced simple normal crossings divisor. If $X$ is schön, it admits such a compactification (see the proof of [16, Theorem 2.5]). We denote by $\{\tau_i\}_{i \in I}$ the rays in $\Sigma$ and we denote the primitive ray generator of the ray $\tau$ by $v_\tau \in \mathbb{Z}^p$.
The irreducible components of the boundary $E = X_\Sigma \setminus X$ are partitioned by the rays in $\Sigma$. Those corresponding to $\tau$ are the irreducible components $\{E_{\tau,i}\}$ of $E_\tau := X_\Sigma \cap O_\tau$, where $O_\tau \subseteq T_\Sigma$ is the locally closed torus orbit corresponding to $\tau$. Recall that an alternative characterization of schön very affine varieties is as those closed subvarieties of a torus for which there is a fan structure $\Sigma$ on $\operatorname{Trop}(X)$ such that the multiplication map $m \colon X_\Sigma \times (\mathbb{C}^*)^p \to T_\Sigma$ is smooth. In particular, it follows from this characterization that every $E_\tau$ is smooth, and hence if $E_\tau$ is not irreducible, it is a disjoint union of smooth irreducible components.
By construction of $T_\Sigma$ we have that $\operatorname{ord}_{O_\tau}(t_j) = (v_\tau)_j$. Since $E_\tau$ is reduced by assumption, it follows that also for every component $E_{\tau,i} \subseteq E_\tau$, $\operatorname{ord}_{E_{\tau,i}}(f_j) = (v_\tau)_j$. In particular, for each such $E_{\tau,i}$, $H_{E_{\tau,i}}$ equals $\mathbb{P}(\tau^\perp)$, the hyperplane orthogonal to $\tau$.
Recall that for each $\alpha \in \mathbb{C}^p$, the form $\operatorname{dlog} f^\alpha$ extends to a global section of $\Omega^1_{X_\Sigma}(\log E)$, i.e., a differential one-form on $X_\Sigma$ with logarithmic poles along $E$. As in the proof of Proposition 2.10, we denote by $C_{F,X_\Sigma}(\log E)$ the logarithmic critical locus of $F$, i.e., $C_{F,X_\Sigma}(\log E) := \{(x, \alpha) \in X_\Sigma \times \mathbb{P}^{p-1} \mid \operatorname{dlog} f^\alpha(x) = 0\}$. As also stated there, we have the following containments: […] $\pi_1^{-1}(E)$. However, the following lemma assures that in this situation this cannot be the case.
[…] $(\log E)$ mapping the $i$th generator to $\operatorname{dlog} f^{e_i}$. It follows that the kernel of this morphism has constant rank equal to $p - n$ and thus is locally free. Since $C_{F,X_\Sigma}(\log E)$ is the projectivization of the total space of the vector bundle corresponding to the kernel, it is smooth and irreducible.
Remark 3.2. These properties of the asymptotic critical locus are not intrinsic to $X$, but depend on the chosen compactification. For example, in [9], the authors investigate the singularities of $C_{F,\mathbb{P}^n}$ for hyperplane arrangement complements that are not closed in a tropical compactification but are closed in projective space.
[…] Then the sum of the primitive ray generators […] $(\log E)$ is a projective space bundle over $X_\Sigma$. Assume for the sake of notation that $J = \{1, \ldots, r\}$. Take local coordinates $y_1, \ldots, y_n$ on $U \subseteq X_\Sigma$ centered at $x$ in which […] Denote the image of $(x, \alpha)$ under $\Psi$ by $(x, q)$, $q \in \mathbb{P}^{p-n-1}$. Consider then the curve […] By construction, $\gamma(0) = (x, \alpha)$. It follows immediately from the definition of this $\gamma$ that […]
Let $\alpha \in H_{E_\tau}$. The local expression for the logarithmic differential form $\operatorname{dlog} f^\alpha$ that we computed in Equation (1) shows that $\operatorname{dlog} f^\alpha$ has residue 0 along every irreducible component of $E_\tau$. Hence we can restrict $\operatorname{dlog} f^\alpha$ to $E_\tau$ to obtain […] We define […] By a local computation, one obtains the following description of $C_{F,X_\Sigma,E_\tau}$.
Lemma 3.4. Considering $C_{F,X_\Sigma,E_\tau}$ as a subvariety of $X_\Sigma \times \mathbb{P}^{p-1}$ via the inclusion […]
Proof. We work around a point $y \in \bigcap_{i=1}^k E_i$, with $E_\tau = E_1$, and such that $y$ is not contained in any other boundary component. We take coordinates $y_1, \ldots, y_n$ on a small open $U$ around $y$ such that $E_i \cap U = \{y_i = 0\}$ for $i = 1, \ldots, k$. The local expression for $\operatorname{dlog} f^\alpha$ from Equation (1) gives […] On the other hand, $y_2, \ldots, y_n$ are coordinates on $E_\tau \cap U$, and the form […] This shows that $C_{F,X_\Sigma,E_\tau}$ and $C_{F,X_\Sigma}(\log E) \cap \pi_1^{-1}(E_\tau)$ are defined by the same equations.

We will denote E
[…], and thus by Lemma 3.4, $(x, \alpha) \in C_{F,X_\Sigma}(\log E)$. The claim then follows from Corollary 3.3.
We will use the following theorem of Huh in order to apply Corollary 3.5.
Theorem 3.6 ([14, 18]). Let $Z$ be a smooth very affine variety contained in a torus with coordinates $t_1, \ldots, t_p$. Then there exists a Zariski dense open subset of $\mathbb{C}^p$ such that for all $\alpha$ in this subset, the form $\sum_{i=1}^p \alpha_i \frac{dt_i}{t_i}$ has exactly $(-1)^{\dim(Z)} \chi(Z)$ many zeroes on $Z$.
In this theorem, smoothness is a necessary assumption; the failure of the statement for singular varieties is explained in [7]. Notice that the boundary components $E_\tau^\bullet$ are disjoint unions of smooth very affine varieties, since they are contained in the torus orbit $O_\tau \cong (\mathbb{C}^*)^{p-1}$. Hence, Huh's theorem also applies to $E_\tau^\bullet$. Since the torus orbit $O_\tau$ is naturally a quotient of $(\mathbb{C}^*)^p$, its coordinate ring is naturally a subring of the coordinate ring of $(\mathbb{C}^*)^p$. Under this identification, when applying Theorem 3.6 to $E_\tau^\bullet \subseteq O_\tau$, the vector space $\mathbb{C}^{p-1}$ in which the $\alpha$ live is naturally identified with $\tau^\perp$. When applying this theorem to $X$ (and $E_\tau^\bullet$, resp.), we denote the Zariski dense subset that appears in the theorem by $V \subseteq \mathbb{C}^p$ (and by $V_\tau \subseteq \tau^\perp$, resp.).
Theorem 3.7. Assume that $\chi(E_\tau^\bullet) \neq 0$. Then for all $\alpha \in V_\tau$, the primitive ray generator $v_\tau$ is contained in $Q_{F,\alpha}$. In particular, $H_{E_\tau} \subseteq S_F$. Moreover, these are the only codimension-one components of $S_F$.
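Proof. Since $E_\tau^\bullet$ is smooth and very affine, for every $\alpha \in V_\tau$ the form $\operatorname{dlog} f^\alpha|_{E_\tau}$ has exactly $(-1)^{\dim E_\tau^\bullet} \chi(E_\tau^\bullet) \neq 0$ zeroes, all contained in $E_\tau^\bullet$ by Theorem 3.6. The first claim follows from Corollary 3.5. Since $V_\tau$ is dense inside $\tau^\perp$ and $S_F$ is closed, we conclude that $H_{E_\tau} = \mathbb{P}(\tau^\perp) \subseteq S_F$.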
We prove the last claim by reasoning the other way round. We recall that by Proposition 2.10, every codimension-one component of $S_F$ is of the form $\mathbb{P}(\tau'^\perp)$ for some ray $\tau' \in \Sigma$. Let $\tau'$ be some ray in $\Sigma$ and suppose that […] Then by Theorem 3.6, $E_{\tau'}^\bullet$ has non-zero Euler characteristic.
Remark 3.8. Smooth very affine varieties with zero Euler characteristic can still have "resonant" $\alpha$ for which $\operatorname{dlog} f^\alpha$ has a zero. In this case, strict subvarieties of some $H_{E_\tau}$ might show up as an irreducible component of $S_F$. For relations to resonance varieties of hyperplane arrangements, we refer to [10] and the references therein.

We denote by $I \trianglelefteq \mathbb{C}[t_1^{\pm 1}, \ldots, t_p^{\pm 1}]$ the defining ideal of $X$. For $w \in \mathbb{R}^p$, $\operatorname{in}_w(I)$ denotes the initial ideal of $I$ as defined in [24, Section 1.6]. The following lemma gives a description of the components $E_\tau^\bullet$ in terms of initial ideals.
Lemma 3.9. Let $\tau$ be a ray in $\operatorname{Trop}(X)$ with primitive ray generator $w$. Under the natural isomorphism […]
Proof. For a proof, see [24, page 308].
Recall that we started by taking a fan $\Sigma$ supported on the tropical variety of $X$. In general, there are many such $\Sigma$ and there is no coarsest fan structure on $\operatorname{Trop}(X)$. Despite the lack of a coarsest fan structure, Theorem 3.7 guarantees that some rays are present in every fan structure, namely those for which $E_\tau^\bullet = X_\Sigma \cap O_\tau$ has non-zero Euler characteristic. We now characterize these rays in terms of initial ideals, under a connectedness assumption.
In order to decide in practice if a ray is rigid, one has to compare the initial ideal of the defining ideal of $X$ with respect to this ray with the initial ideal with respect to a relative interior point of each neighboring cone. Rigid rays can also be characterized as follows.
Proof. The ray $\tau$ is rigid if and only if $\operatorname{in}_w(I)$ is not homogeneous with respect to any weight vector other than scalar multiples of $w$. In other words, $\tau$ is rigid if and only if $\operatorname{in}_w(I)$ is not preserved under any non-trivial subtorus of $(\mathbb{C}^*)^p$ other than $T_w := \{(t^{w_1}, \ldots, t^{w_p}) \mid t \in \mathbb{C}^*\}$. The claim then follows from Lemma 3.9.
Lemma 3.12. In the situation as above, the following two statements hold true. […]
Proof. Denote by $T$ the non-trivial subtorus of $O_\tau$ that preserves $E_\tau^\bullet$. The action of $T$ on $E_\tau^\bullet$ is free and hence so is the action of the group $\mathbb{Z}/m\mathbb{Z}$ of $m$-th roots of unity for all $m \in \mathbb{Z}_{>0}$. Hence […] In particular, $\chi(E_\tau^\bullet)$ is divisible by every positive integer $m$ and hence is 0, concluding the proof of the first statement. Since $X$ is schön, $E_\tau^\bullet$ is smooth, and by connectedness it is thus irreducible. Let $\mathcal{F}$ be the irreducible perverse sheaf […]
In summary, we obtain the following implications: […] where the last implication is a bi-implication if we assume in addition that $E_\tau^\bullet$ is connected. Combining this with Theorem 3.7, we obtain the following proposition.
Proposition 3.13. Assume that for all $\tau \in \Sigma$ the intersection $E_\tau^\bullet = X_\Sigma \cap O_\tau$ is connected. If $\tau$ is a rigid ray, then for all $\alpha \in V_\tau$, $v_\tau \in Q_{F,\alpha}$. In particular, $\mathbb{P}(\tau^\perp) \subseteq S_F$. This gives a bijection between the rigid rays in $\operatorname{Trop}(X)$ and the codimension-one components of $S_F$.
Example 3.14. We pick up the arrangement from Example 2.4, i.e., $X' \subseteq \mathbb{C}^2$ is the complement of the arrangement defined by $f = xy(x - y)(x - 1)$, the four factors of which form the tuple […] coinciding with what was computed in Example 2.4. Notice that $\{s_1 = 0\}$ does not show up in $S_F$. There are two equivalent ways of explaining this: using the fact that […] is isomorphic to $\mathbb{C}^*$ and thus has zero Euler characteristic, or using the fact that the ray generated by $e_1$ is not rigid. We note that a compactification of $X'$ is given by $\mathbb{P}^2$. To make the boundary of this compactification SNC we have to blow up the triple intersection points, namely the origin and a point on the hyperplane at infinity. Every ray $\tau$ in the Gröbner fan that we found above then satisfies $\mathbb{P}(\tau^\perp) = H_{E_i}$ for some boundary divisor $E_i$ in this compactification. Since $X'$ is a hyperplane arrangement complement, it makes sense to investigate the relation between the combinatorics of the arrangement and $S_F$. Recall that the combinatorics of an arrangement are encoded in its intersection lattice $L$, which is a poset with one vertex for every subspace that can be formed by intersecting some of the hyperplanes in the arrangement. These vertices are referred to as edges of the arrangement, and are ordered by reverse inclusion. An edge $S$ is a dense edge if the subposet $L_{\leq S}$ is not a non-trivial product of two posets. An edge $S$ is a flacet if neither $L_{\leq S}$ nor $L_{\geq S}$ is a non-trivial product of two posets. The following diagram depicts the Hasse diagram of the arrangement:
[Figure: Hasse diagram of the intersection lattice of the arrangement.]
We see from this diagram that there are 5 dense edges: $\{x = 0\}$, $\{y = 0\}$, $\{x = y\}$, $\{x = 1\}$, and $\{(0, 0)\}$.
To understand the relation between the codimension-one components of $S_F$ and the intersection lattice, we regard $X'$ as the complement of the projective hyperplane arrangement $f = uv(u - v)(u - w)w$ inside $\mathbb{P}^2$ and define $f_5 := w$. Denote by $L$ the intersection lattice of the affine arrangement defined by the same equation. By [13, Theorem 2.7], the rays in the Gröbner fan on $\operatorname{Trop}(X) \subseteq \mathbb{R}^4$ are in bijection with the flacets of $M$. A concrete recipe to compute the rays is as follows: take a flacet $F$ and for $i = 1, \ldots, 5$ define $(v_F)_i = 1$ if $F \subseteq \{f_i = 0\}$ and $(v_F)_i = 0$ otherwise. Then add a multiple of $(1, \ldots, 1)$ to $v_F$ to make the last coordinate equal to 0, and then forget the last coordinate. In this example, all dense edges except for $\{f_1 = 0\}$ are flacets, and it can be verified that the procedure outlined here indeed recovers the 6 rays that we found before.
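The recipe above can be sketched in a few lines of Python. The incidence data (which $f_i$ vanish on each flacet) is an assumption read off here from $f = uv(u - v)(u - w)w$ with $f_1 = u$, $f_2 = v$, $f_3 = u - v$, $f_4 = u - w$, $f_5 = w$; the labels and the function name are hypothetical.

```python
def ray_from_flacet(vanishing, n_hyperplanes=5):
    """Apply the recipe: indicator vector, shift by a multiple of (1,...,1), drop last entry.

    vanishing: set of 1-based indices i such that the flacet lies in {f_i = 0}.
    """
    v = [1 if i in vanishing else 0 for i in range(1, n_hyperplanes + 1)]
    shift = v[-1]                      # subtracting shift * (1,...,1) zeroes the last coordinate
    v = [vi - shift for vi in v]
    return tuple(v[:-1])

# Assumed incidence data for the six flacets (all dense edges except {f1 = 0}).
flacets = {
    "{f2 = 0}": {2},
    "{f3 = 0}": {3},
    "{f4 = 0}": {4},
    "{f5 = 0}": {5},
    "point (0:0:1)": {1, 2, 3},        # u = v = u - v = 0
    "point (0:1:0)": {1, 4, 5},        # u = u - w = w = 0
}

rays = {name: ray_from_flacet(idx) for name, idx in flacets.items()}
for name, v in rays.items():
    print(name, v)

# Coordinate-wise sum of the six ray generators.
ray_sum = tuple(sum(coords) for coords in zip(*rays.values()))
print("sum of rays:", ray_sum)
```

Under these assumptions the hyperplanes orthogonal to the computed rays are $\{s_2 = 0\}$, $\{s_3 = 0\}$, $\{s_4 = 0\}$, $\{s_1 + s_2 + s_3 + s_4 = 0\}$, $\{s_1 + s_2 + s_3 = 0\}$, and $\{s_2 + s_3 = 0\}$, and $\{s_1 = 0\}$ does not occur.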
Finally, let $E_1 := V(f_1)$. We notice that for $\alpha \in \{s_1 = 0\}$, the form $\operatorname{dlog} f^\alpha|_{E_1}$ equals $(\alpha_2 + \alpha_3)\,dy/y$. It follows that the form $\operatorname{dlog} f^\alpha|_{E_1}$ has a zero on $E_1$ if and only if $\alpha \in \{s_1 = s_2 + s_3 = 0\}$. By Corollary 3.5, for general such $\alpha$, $e_1 \in Q_{F,\alpha}$. In conclusion, although $E_1$ does not contribute a codimension-one component to $S_F$, it does contribute an embedded component, in the sense that […]
Maximum likelihood estimation interpretation. Suppose that $X$ has nonzero Euler characteristic. Then by Theorem 3.6, for a general data vector $\alpha$, the function $f^\alpha$ has exactly $(-1)^{\dim X} \chi(X)$ many critical points. This number is also called the maximum likelihood degree of $X$ and is denoted by $d_{ML}(X)$. We denote the Zariski open subset of $\mathbb{P}^{p-1}$ for which this statement holds true by $V$. Consider a rigid ray $\tau$, for which we thus know that $\chi(E_\tau^\bullet) \neq 0$. Since the $E_\tau^\bullet$ are smooth very affine varieties as well, we denote their set of general data vectors by $V_\tau \subseteq \tau^\perp$. By its very definition, the set $Q_{F,\alpha}$ describes the behavior of critical points of $f^{\tilde\alpha}$ as $\tilde\alpha$ approaches $\alpha$. Keeping this in mind, Corollary 3.5 states that as $\tilde\alpha$ approaches an $\alpha \in V_\tau$ along a curve in $V$, at least one of the critical points of $f^{\tilde\alpha}$ approaches the torus orbit corresponding to $\tau$. In particular, its limit lies in $X_\Sigma \setminus X$.
The asymptotic behavior of critical points as described above is particularly explicit in the $d_{ML}(X) = 1$ case, i.e., when the signed Euler characteristic of $X$ is 1. In this case the maximum likelihood estimate is unique and determined by the rational map $\psi \colon \mathbb{P}^{p-1} \dashrightarrow X$ mapping $\alpha$ to the unique critical point of $f^\alpha$. Notice that $\psi$ is the rational inverse to the birational morphism $\pi_2 \colon C_{F,Y} \to \mathbb{P}^{p-1}$. Then $\psi_i := f_i \circ \psi$ are rational functions on $\mathbb{P}^{p-1}$. We now translate Theorem 3.7 into a statement about the $\psi_i$. Note that $v_\tau \in Q_{F,\alpha}$ if and only if there exists a morphism $\gamma$ […] The following diagram gives an overview of the various objects and morphisms that we have defined: […]
By the definition of $\psi$, the morphisms […] In other words, […] Since this is true for all $\alpha \in V_\tau$, we conclude that $\operatorname{ord}_{H_{E_\tau}}(\psi_i) = (v_\tau)_i$. Moreover, $\psi_i$ has no other zeroes or poles besides the $H_{E_\tau}$. To see this, we note that the same argument shows that every additional pole or zero of $\psi_i$ induces a component of $S_F$. Namely, suppose there is an additional variety $R \subseteq \mathbb{P}^{p-1}$ such that $\operatorname{ord}_R(\psi_i) \neq 0$. Then approaching a general point $p$ of $R$ along a curve $\gamma$, $\psi_i(\gamma(t))$ will approach 0 or $\infty$, and in particular $\psi(\gamma(t))$ leaves $X$ as $t \to 0$. Hence, $p \in S_F$ and thus $R \subseteq S_F$. However, by Theorem 3.7 all components of $S_F$ are of the form $H_{E_\tau}$ for some rigid ray $\tau \in \operatorname{Trop}(X)$. We deduce the following proposition.
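Proposition 3.15. Let $X$ be a schön very affine variety with $d_{ML}(X) = 1$ and $\Sigma$ a fan whose support is $\operatorname{Trop}(X)$ such that for all rays $\tau \in \Sigma$, $X_\Sigma \cap O_\tau$ is connected. For a rigid ray $\tau \in \Sigma$, let $g_\tau$ be a defining equation of $\tau^\perp$. Then there exist complex numbers $c_i$ such that $\psi_i = c_i \prod_{\tau \text{ rigid}} g_\tau^{(v_\tau)_i}$.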
Since $\psi_i$ is homogeneous of degree zero, the primitive generators of the rigid rays sum up to 0, i.e., $\sum_{\tau \text{ rigid}} v_\tau = 0$. The fact that the rigid rays sum up to zero is a special property of schön very affine varieties with maximum likelihood degree one, as will be demonstrated in Example 3.19. Related structure results for general very affine varieties with maximum likelihood degree one were obtained in [19] and [12].
Example 3.16. We continue Example 3.14. In this case $\chi(X) = 1$. An explicit computation of the morphism $\psi$ using Mathematica gives the four rational functions […], as predicted by Proposition 3.15. Note that the rigid rays sum up to zero. △
In order to formulate an analogous property for arbitrary maximum likelihood degree, we make use of the following lemma.
Lemma 3.17. Let $\tau$ be a rigid ray and $\alpha \in V_\tau$. Assume that $E_\tau^\bullet = X_\Sigma \cap O_\tau$ is connected and that $d_{ML}(E_\tau^\bullet) > 0$. Let $S \subseteq \mathbb{P}^{p-1}$ be an irreducible smooth curve containing $\alpha$ whose generic point lies in $V$ and which intersects $H_{E_\tau}$ transversely. Then there exists a morphism $\gamma \colon \Delta$ […], and non-empty since it contains $(x, \alpha)$. Hence any local equation $g$ for $\pi_1^{-1}(E_\tau)$ around $(x, \alpha)$ gives a generator for the maximal ideal of $\mathcal{O}_{C,(x,\alpha)}$. By Cohen's structure theorem, $t \mapsto g$ gives an isomorphism $\mathbb{C}[[t]] \cong \widehat{\mathcal{O}}_{C,(x,\alpha)}$ giving rise to the morphism $\gamma \colon \Delta \to C_{F,X_\Sigma}$ that we are looking for, since by construction $\operatorname{ord}_t(\gamma^* E_\tau) = 1$ and thus […]
Remark 3.18. The difference between this construction and the construction in the proof of Corollary 3.3 is the fact that in this lemma we start by specifying a curve in $\mathbb{P}^{p-1}$ along which we approach $\alpha$ and then lift it to $C_{F,X_\Sigma}$. Note that the number of components of $\pi_2^{-1}(S)$ passing through $(x, \alpha)$ is closely related to [20, Conjecture 3.19], which essentially predicts that there is only a single component passing through $(x, \alpha)$.
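As a sanity check on the shape predicted by Proposition 3.15 for the arrangement of Examples 3.14 and 3.16, the critical point can be solved for by hand. The closed form used below is derived here from the two equations $\partial_x \log f^\alpha = \partial_y \log f^\alpha = 0$ and is not copied from the article; the list of ray generators is likewise an assumption (it is what the flacet recipe of Example 3.14 appears to produce), and each $g_\tau$ is normalized as $g_\tau(s) = v_\tau \cdot s$. All function names are hypothetical.

```python
from fractions import Fraction

# Assumed primitive generators of the six rigid rays.
RAYS = [
    (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
    (-1, -1, -1, -1), (1, 1, 1, 0), (0, -1, -1, 0),
]

def psi(alpha):
    """Critical point (x, y) of x^a1 * y^a2 * (x-y)^a3 * (x-1)^a4 for general alpha."""
    a1, a2, a3, a4 = map(Fraction, alpha)
    total = a1 + a2 + a3 + a4
    x = (a1 + a2 + a3) / total
    y = a2 * (a1 + a2 + a3) / ((a2 + a3) * total)
    return x, y

def dlog(x, y, alpha):
    """Coefficients of dx and dy in dlog f^alpha at (x, y)."""
    a1, a2, a3, a4 = map(Fraction, alpha)
    return (a1 / x + a3 / (x - y) + a4 / (x - 1), a2 / y - a3 / (x - y))

def constants(alpha):
    """Ratios psi_i / prod_tau (v_tau . alpha)^((v_tau)_i) for i = 1, ..., 4."""
    x, y = psi(alpha)
    values = (x, y, x - y, x - 1)          # f_1, ..., f_4 at the critical point
    ratios = []
    for i, value in enumerate(values):
        product = Fraction(1)
        for v in RAYS:
            g = sum(Fraction(vj) * aj for vj, aj in zip(v, alpha))
            product *= g ** v[i]
        ratios.append(value / product)
    return tuple(ratios)

alpha = (2, 3, 5, 7)
assert dlog(*psi(alpha), alpha) == (0, 0)  # psi(alpha) really is a critical point
print(constants(alpha), constants((1, 1, 2, 3)))
```

If the ratios agree for different data vectors, as they do for the two sample vectors here, the $\psi_i$ have the product form with constant $c_i$; with another normalization of the $g_\tau$ the constants would change by signs.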
We deduce the following consequence for maximum likelihood estimation. Let $\alpha \in V_\tau$ and $\gamma \colon \Delta \to \mathbb{P}^{p-1}$ with $\gamma(\Delta^\circ) \in V$ approaching $\alpha$. Let $S$ be a system of equations in $\mathbb{C}[t_1, \ldots, t_p, s_1, \ldots, s_p]$ defining $C_F$. Then substitute the components of $\gamma$ for the $s_i$-variables. The system $S'$ obtained in this way consists of equations in $\mathbb{C}[[t]][t_1, \ldots, t_p]$. A solution $\tilde\gamma \in \mathbb{C}((t))^p$ of this system is a solution of the equation $\operatorname{dlog} f^{\gamma(t)}(\tilde\gamma(t)) = 0$. Lemma 3.17 assures that the system $S'$ has a solution and that moreover it has a solution for which $\operatorname{ord}_t(\tilde\gamma_i) = (v_\tau)_i$ for all $i = 1, \ldots, p$. In practice, in order to approximate such a solution, one does a formal power series substitution $x_i = \sum_{j = (v_\tau)_i}^{\infty} c_{i,j} t^j$ into the system $S'$ and iteratively solves for the $c_{i,j}$.
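The order-by-order procedure can be illustrated on a deliberately tiny instance. This is a toy chosen here, not the system of any example in the article: take $X = \mathbb{C} \setminus \{0, 1\}$ with $F = (x, 1 - x)$ and the curve $\gamma(t) = (t : 1)$ approaching $(0 : 1)$. Clearing denominators in $\operatorname{dlog} f^{\gamma(t)} = 0$ gives $t(1 - x) - x = 0$, and one substitutes $x = \sum_{j \geq 1} c_j t^j$, the leading exponent 1 being the expected order of vanishing. The function names are hypothetical.

```python
from fractions import Fraction

def solve_series(order):
    """Coefficients c[0..order] of x(t) = sum_j c[j] t^j solving t*(1 - x) - x = 0.

    The coefficient of t^j in t*(1 - x) - x is [j == 1] - c[j-1] - c[j],
    so each c[j] is determined by the previously computed c[j-1].
    """
    c = [Fraction(0)] * (order + 1)    # c[0] = 0 because the series starts at t^1
    for j in range(1, order + 1):
        c[j] = (1 if j == 1 else 0) - c[j - 1]
    return c

def evaluate(c, t):
    """Evaluate the truncated series at a rational value of t."""
    return sum(cj * t**j for j, cj in enumerate(c))

coeffs = solve_series(6)
print(coeffs[1:])                      # alternating signs: x = t - t^2 + t^3 - ...

# Compare with the exact solution x = t / (1 + t) at t = 1/10.
t0 = Fraction(1, 10)
error = t0 / (1 + t0) - evaluate(coeffs, t0)
print(error)                           # of size t0^7, the first omitted order
```

In a genuine system with several unknowns each step is a small linear solve rather than a single subtraction, but the structure (fix the leading exponents, then match coefficients of successive powers of $t$) is the same.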
Example 3.19. Let $g \in \mathbb{C}[x, y]$ be a generic conic through $(0, 0)$ and write $g = l + q$ as the sum of its linear part $l$ and its quadratic part $q$. We embed $\mathbb{C}^2$ into $\mathbb{C}^3$ via the tuple $F = (x, y, g)$ and denote, as usual, $f = xyg$. Let $X := F(\mathbb{C}^2) \cap (\mathbb{C}^*)^3$ be the intersection of the image with the torus $(\mathbb{C}^*)^3$. Then $X$ is the hypersurface in $(\mathbb{C}^*)^3$ cut out by the polynomial $h := t_3 - g(t_1, t_2)$. Since $g$ is generic for its Newton polygon, $X$ is schön (see [17, Section 2]). The rigid rays are given by $e_1$, $e_2$, $e_3$, $e_1 + e_2 + e_3$, and $-e_1 - e_2 - 2e_3$. We remark that $d_{ML}(X) = 3$ and that the rigid rays do not sum up to zero, demonstrating that the assumption on the maximum likelihood degree in Proposition 3.15 is indeed necessary.

We now illustrate Lemma 3.17 for the schön very affine variety $X$ as above. Consider for example the ray $\tau = -e_1 - e_2 - 2e_3$. The initial ideal $\operatorname{in}_{(-1,-1,-2)}(h)$ is generated by $t_3 - q(t_1, t_2)$. By Lemma 3.9, it follows that $X_\Sigma \cap O_\tau$ is a copy of $\mathbb{P}^1$ minus 4 points, and thus has Euler characteristic $-2$. By Theorem 3.6 and Corollary 3.5, this means that as one approaches a general $\alpha \in \{s_1 + s_2 + 2s_3 = 0\}$, at least two of the three maximum likelihood estimates of $\operatorname{dlog} t^\alpha$ will approach the torus orbit $O_\tau$.

To make this more explicit, consider for instance $g = x + y + x^2 + xy + y^2$, which turns out to be generic enough. Take the point $(2, 1, -3/2) \in \tau^\perp$ and the curve $\gamma \colon t \mapsto (2 + t, 1 + t, -3/2)$ approaching it. Note that $X$ equipped with the tuple $(t_1, t_2, t_3)$ is isomorphic, via the map $F$, to $\mathbb{C}^2 \setminus V(f)$ equipped with the tuple $(x, y, g)$. To simplify the computations we work with the latter. The two components of $\operatorname{dlog} f^{\gamma(t)}$ vanish on $\mathbb{C}^2 \setminus V(f)$ if and only if the following two equations are fulfilled:

$$x + 2tx - 2x^2 + 2tx^2 + 4y + 2ty + xy + 2txy + 4y^2 + 2ty^2 = 0,$$
$$2x + 2tx + 2x^2 + 2tx^2 - y + 2ty - xy + 2txy - 4y^2 + 2ty^2 = 0.$$
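The two displayed equations can be re-derived mechanically. The following is a minimal sketch, assuming that the components of $\operatorname{dlog} f^{\alpha}$ are cleared of denominators by multiplying with $2xg$ and $2yg$ respectively, so that they become $2(\alpha_1 g + \alpha_3 x\,\partial_x g)$ and $2(\alpha_2 g + \alpha_3 y\,\partial_y g)$ with $\alpha = (2+t, 1+t, -3/2)$. The dictionary-based polynomial helpers (`add`, `mul`, `diff`) are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial in (x, y, t) is a dict {(i, j, k): coeff} for coeff * x^i * y^j * t^k.

def add(p, q):
    out = defaultdict(Fraction)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    out = defaultdict(Fraction)
    for (a, b, c), u in p.items():
        for (d, e, f), v in q.items():
            out[(a + d, b + e, c + f)] += u * v
    return {m: c for m, c in out.items() if c != 0}

def diff(p, var):
    """Partial derivative with respect to x (var=0) or y (var=1)."""
    out = {}
    for mono, c in p.items():
        if mono[var] > 0:
            lowered = list(mono)
            lowered[var] -= 1
            out[tuple(lowered)] = c * mono[var]
    return out

x = {(1, 0, 0): Fraction(1)}
y = {(0, 1, 0): Fraction(1)}
two = {(0, 0, 0): Fraction(2)}
# g = x + y + x^2 + xy + y^2, the conic chosen in the example
g = {m: Fraction(1) for m in [(1, 0, 0), (0, 1, 0), (2, 0, 0), (1, 1, 0), (0, 2, 0)]}

# The curve t -> (2 + t, 1 + t, -3/2) of data vectors
alpha1 = {(0, 0, 0): Fraction(2), (0, 0, 1): Fraction(1)}
alpha2 = {(0, 0, 0): Fraction(1), (0, 0, 1): Fraction(1)}
alpha3 = {(0, 0, 0): Fraction(-3, 2)}

# Cleared-denominator components of dlog(x^a1 * y^a2 * g^a3), scaled by 2
eq1 = mul(two, add(mul(alpha1, g), mul(alpha3, mul(x, diff(g, 0)))))
eq2 = mul(two, add(mul(alpha2, g), mul(alpha3, mul(y, diff(g, 1)))))

# Coefficients read off from the two equations printed in the example
expected1 = {(1, 0, 0): 1, (1, 0, 1): 2, (2, 0, 0): -2, (2, 0, 1): 2, (0, 1, 0): 4,
             (0, 1, 1): 2, (1, 1, 0): 1, (1, 1, 1): 2, (0, 2, 0): 4, (0, 2, 1): 2}
expected2 = {(1, 0, 0): 2, (1, 0, 1): 2, (2, 0, 0): 2, (2, 0, 1): 2, (0, 1, 0): -1,
             (0, 1, 1): 2, (1, 1, 0): -1, (1, 1, 1): 2, (0, 2, 0): -4, (0, 2, 1): 2}

print(eq1 == expected1, eq2 == expected2)
```

Exact rational arithmetic is used so that the comparison with the printed coefficients is not affected by rounding.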
The fact that $v_\tau = (-1, -1, -2) \in Q_{F,\alpha}$ implies that this system has a solution $(\eta_1, \eta_2)$ in $t^{-1}\mathbb{C}[[t]]$, for which $g(\eta_1, \eta_2) = c_{-2} t^{-2} + \text{higher order terms}$. With Mathematica, we compute that we indeed have a solution with […]. This solution converges to $p_1 = (1 : \tfrac{1}{8}(-1 - \sqrt{33}) : 0) \in \mathbb{P}^2$, which is one of the three critical points of $f^\alpha$ on $\mathbb{P}^2$. The other ones are $p_2 = (1 : \tfrac{1}{8}(-1 + \sqrt{33}) : 0)$ and $p_3 = (3 : -3 : 1)$. For $p_2$, we can construct a curve similar to the one found above. The point $p_3$ is a point in $X$, and thus we get a solution in $\mathbb{C}[[t]]$ whose first terms are […]

△
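The iterative power series substitution can be sketched for the branch through $p_3 = (3 : -3 : 1)$, that is, the affine point $(x, y) = (3, -3)$, where the solution of the two equations of Example 3.19 lies in $\mathbb{C}[[t]]$. This is only an illustration under stated assumptions: it uses a simplified Newton (chord) iteration with the Jacobian frozen at $t = 0$, which gains one order in $t$ per step because that Jacobian is invertible there (its determinant is 9); the helper names `s_mul`, `lin`, and `residual` are hypothetical. For the branches escaping to the boundary one would first rescale the unknowns by the appropriate powers of $t$, which is not done here.

```python
from fractions import Fraction

N = 4  # all series are truncated modulo t^N

def const(c):
    return [Fraction(c)] + [Fraction(0)] * (N - 1)

def s_mul(a, b):
    """Product of two truncated power series in t."""
    out = [Fraction(0)] * N
    for i, u in enumerate(a):
        for j, v in enumerate(b[: N - i]):
            out[i + j] += u * v
    return out

def lin(*terms):
    """Linear combination sum(c * series) of truncated series."""
    out = [Fraction(0)] * N
    for c, s in terms:
        for k in range(N):
            out[k] += c * s[k]
    return out

T = [Fraction(0), Fraction(1)] + [Fraction(0)] * (N - 2)  # the series t

def residual(x, y):
    """Left-hand sides of the two equations of Example 3.19 on series x(t), y(t)."""
    xx, xy, yy = s_mul(x, x), s_mul(x, y), s_mul(y, y)
    tg = s_mul(T, lin((1, x), (1, y), (1, xx), (1, xy), (1, yy)))  # t * g(x, y)
    e1 = lin((1, x), (-2, xx), (4, y), (1, xy), (4, yy), (2, tg))
    e2 = lin((2, x), (2, xx), (-1, y), (-1, xy), (-4, yy), (2, tg))
    return e1, e2

# Start at the critical point (3, -3) for t = 0.
x, y = const(3), const(-3)
for _ in range(N - 1):
    e1, e2 = residual(x, y)
    # Jacobian at (3, -3), t = 0 is J0 = [[-14, -17], [17, 20]];
    # its inverse is (1/9) * [[20, 17], [-17, -14]]. Update z <- z - J0^{-1} F(z).
    x, y = (lin((1, x), (Fraction(-20, 9), e1), (Fraction(-17, 9), e2)),
            lin((1, y), (Fraction(17, 9), e1), (Fraction(14, 9), e2)))

# Under this computation the branch starts as x = 3 - 74 t + ..., y = -3 + 62 t + ...
print(x[:2], y[:2])
```

After `N - 1` steps the residual vanishes modulo $t^N$; increasing `N` yields further coefficients $c_{i,j}$.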
We remark that in the previous example the following equality holds: […] in analogy to the second equation in Proposition 3.15. It would be interesting to study whether this equality holds true in general.

Bernstein-Sato ideals
In this section, we investigate Bernstein-Sato ideals and the codimension-one components of their vanishing sets. We explain how these can partially be recovered in terms of $Q_{F,\alpha}$ and formulate a conjecture relating those codimension-one components to log-canonical threshold polytopes.

4.1. Slopes of Bernstein-Sato ideals. Let $Y \subseteq \mathbb{C}^p$ be a smooth closed subvariety of affine space. We consider the tuple of regular functions $G = (g_1, \dots, g_p)$ on $Y$ consisting of the restrictions of the coordinates $t_1, \dots, t_p$ on $\mathbb{C}^p$ to $Y$. We denote their product by $g := \prod_{i=1}^{p} g_i$. We denote by $X$ the very affine variety $Y \cap (\mathbb{C}^*)^p$, and on it we consider the tuple of nowhere vanishing functions $F = (f_1, \dots, f_p)$ consisting of the restrictions of the coordinates $t_1, \dots, t_p$ on $\mathbb{C}^p$ to $X$. For a smooth affine algebraic variety with a tuple of regular functions, the Bernstein-Sato ideal is defined as follows. […] where […] is the sheaf of algebraic linear partial differential operators on $Y$ with formal variables $s_1, \dots, s_p$ adjoined.
Sabbah [28] showed that every codimension-one component of $V(B_G)$ is an affine hyperplane. The set of Bernstein-Sato slopes of $G$, denoted by $BS_G$, is defined to be the union of these hyperplanes after translating them to the origin. Since this is a homogeneous variety by definition, we will also consider the projective version of this variety, denoted by $\mathbb{P}(BS_G) \subseteq \mathbb{P}^{p-1}$.

Theorem 4.2 ([25, Résultat 6]). Let […], where $\pi_1, \pi_2$ are the first and second projection from $T^*Y \times \mathbb{C}^p$, and $\pi \colon T^*Y \to Y$ is the natural map.
Note that the description of $BS_G$ in this theorem is very similar to the definition of $S_F$. There are two differences between the objects. First, in the definition of $S_F$, the second factor is $\mathbb{P}^{p-1}$ rather than $\mathbb{C}^p$. Second, in the definition of $S_F$, the closure of $C_F$ is taken inside a compactification of $X$, whereas in Theorem 4.2, the closure of $W_G$ is taken inside the non-compact affine variety $Y$. Hence there will be contributions to $S_F$ from boundary components at infinity that are not relevant from the perspective of the Bernstein-Sato ideal. The following lemma explains how to identify contributions to $BS_G$ in terms of $Q_{F,\alpha}$.

Lemma 4.3. Let $\alpha \in \mathbb{P}^{p-1}$ and denote by $L_\alpha \subseteq \mathbb{C}^p$ the line through the origin corresponding to $\alpha$. […]
We denote by $\mathbb{P}(BS_G)$ the projectivization of the hyperplanes in $BS_G$, and by $X_\Sigma$ a tropical compactification of $X$ as in Section 3.

Theorem 4.4. Assume that $X$ is schön and that for all $\tau \in \Sigma$ the intersection $X_\Sigma \cap O_\tau$ is connected. Then the codimension-one irreducible components of $S_F \cap \mathbb{P}(BS_G)$ are exactly the hyperplanes $\mathbb{P}(\tau^\perp)$ for $\tau$ a rigid ray contained in $\mathbb{R}^p_{\geq 0}$.
Proof. By Proposition 3.13, the codimension-one components of $S_F$ are exactly the hyperplanes $\mathbb{P}(\tau^\perp)$ for $\tau$ a rigid ray. Again by Proposition 3.13, all $\alpha \in \mathbb{P}(\tau^\perp)$ satisfy the condition of Lemma 4.3. Thus, $L_\alpha$ is contained in $BS_G$, and hence $\alpha$ lies in $\mathbb{P}(BS_G)$. Hence $\mathbb{P}(\tau^\perp) \subseteq \mathbb{P}(BS_G)$. We now show that these indeed recover all components in the intersection. As in Section 3, denote by $\Sigma$ a fine enough fan structure on $\operatorname{Trop}(X)$ such that the closure $X_\Sigma$ of $X$ in the associated toric variety is smooth and the boundary $X_\Sigma \setminus X$ is an SNC divisor. Starting from an arbitrary fan, we can always refine it to obtain a fan satisfying this condition, as in [16, proof of 2.5].
Denote by $\Sigma_{\mathbb{C}^p}$ the standard fan for $\mathbb{C}^p$. Let $\Sigma'$ be a refinement of $\Sigma$ of the type mentioned above for which the following is true: if the relative interiors of two cones $\sigma_1 \in \Sigma'$ and $\sigma_2 \in \Sigma_{\mathbb{C}^p}$ intersect, then $\sigma_1 \subseteq \sigma_2$. Denote by $X_{\Sigma'}$ the closure of $X$ in the associated toric variety $T_{\Sigma'}$. Let $\dot{X}$ be the variety obtained by removing from $X_{\Sigma'}$ all divisors corresponding to rays in $\Sigma' \setminus \mathbb{R}^p_{\geq 0}$. This gives a log-resolution $\mu \colon \dot{X} \to Y$ of the divisor of $g$ inside $Y$. Moreover, the irreducible components of the divisor $\mu^{-1}(V(g))$ are in bijection with the rays in $\Sigma' \cap \mathbb{R}^p_{\geq 0}$. By [6, Lemma 4.4.6], the irreducible components of $BS_G$ are among the hyperplanes orthogonal to the rays in $\Sigma' \cap \mathbb{R}^p_{\geq 0}$. Let $C$ be any component of $S_F \cap \mathbb{P}(BS_G)$. Since $C$ is contained in $\mathbb{P}(BS_G)$, we find that $C = \mathbb{P}(\tau^\perp)$ for some ray $\tau \in \Sigma' \cap \mathbb{R}^p_{\geq 0}$. Since $C$ is also a component of $S_F$, this ray must be rigid. It follows that $C$ is indeed of the form $\mathbb{P}(\tau^\perp)$ for some rigid ray in $\mathbb{R}^p_{\geq 0}$, concluding the proof.
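The selection rule of Theorem 4.4 is easy to apply once the rigid rays are known. The following minimal sketch, valid only under the hypotheses of the theorem, takes the six rigid-ray generators $e_2$, $e_3$, $e_4$, $-e_2 - e_3$, $e_1 + e_2 + e_3$, $-e_1 - e_2 - e_3 - e_4$ of the running example (Examples 3.14 and 3.16, $p = 4$) and separates those lying in $\mathbb{R}^4_{\geq 0}$ from the rest. The function names are hypothetical, and each hyperplane $\tau^\perp$ is represented by the coefficient vector of its linear form in $s_1, \dots, s_4$.

```python
# Rigid-ray generators of the running example, as integer vectors in Z^4.
rigid_rays = [
    (0, 1, 0, 0),      # e2
    (0, 0, 1, 0),      # e3
    (0, 0, 0, 1),      # e4
    (0, -1, -1, 0),    # -e2 - e3
    (1, 1, 1, 0),      # e1 + e2 + e3
    (-1, -1, -1, -1),  # -e1 - e2 - e3 - e4
]

def in_nonnegative_orthant(v):
    return all(c >= 0 for c in v)

def hyperplane(v):
    """Coefficients of the linear form cutting out v^perp, sign-normalized.

    v and -v define the same hyperplane, so the first nonzero
    coefficient is made positive.
    """
    first = next(c for c in v if c != 0)
    return tuple(c if first > 0 else -c for c in v)

# Components of S_F that also lie in P(BS_G) according to Theorem 4.4
shared = [hyperplane(v) for v in rigid_rays if in_nonnegative_orthant(v)]
# Components of S_F coming from rays outside the orthant
only_critical = [hyperplane(v) for v in rigid_rays if not in_nonnegative_orthant(v)]

print(shared)
print(only_critical)
```

The second list consists of the forms $s_2 + s_3$ and $s_1 + s_2 + s_3 + s_4$, which are the two components that Example 4.5 reports as lying in $S_F$ but not in $BS_G$.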
Example 4.5. We pick up Example 3.16. Using the library dmod.lib [22] in the computer algebra system Singular [11], we compute the Bernstein-Sato slopes to be […]. It follows that $S_F$ intersected with the projectivized hyperplanes of $BS_G$ equals […]. In particular, the components $\{s_1 + s_2 + s_3 + s_4 = 0\}$ and $\{s_2 + s_3 = 0\}$ are contained in $S_F$ but not in $BS_G$. This is explained by Theorem 4.4, since the rigid rays introducing these components to $S_F$ are $-e_1 - e_2 - e_3 - e_4$ and $-e_2 - e_3$, which are not contained in $\mathbb{R}^4_{\geq 0}$. On the other hand, the component $\{s_1 = 0\}$ is contained in $BS_G$ but not in $S_F$. This shows that in general the sets of components of $BS_G$ and $S_F$ are incomparable. Finally, we combine Theorem 4.4 with Proposition 3.15. We then find that the linear forms appearing in the numerators and denominators of the MLE as determined in Example 3.16 are among the defining equations for the hyperplanes in $BS_G$. This explains the geometry behind the observation made in [29] suggesting a link between the MLE and the Bernstein-Sato ideal of the parametrization of the algebraic model of the statistical experiment. △

4.2. LCT-polytopes. In the previous subsection, we partially recovered the Bernstein-Sato slopes in terms of the critical slopes of the very affine variety. We now formulate a conjecture related to the translations using log-canonical threshold (LCT) polytopes.
Definition 4.6. Let $\mu \colon Y' \to Y$ be a log-resolution of $V(g)$ with $\mu^{-1}(V(g)) = \bigcup_{i=1}^{q} E_i$. Denote by $a_{ij} := \operatorname{ord}_{E_i}(g_j)$ and by $k$ […]

We call an affine hyperplane $H \subseteq \mathbb{C}^p$ facet-defining if $\dim(H \cap \mathrm{LCT}(G)) = p - 1$ and $H \cap \mathrm{LCT}(G) \subseteq \partial(\mathrm{LCT}(G))$. The facet-defining hyperplanes are thus all of the form […], but in general not all these hyperplanes are facet-defining. The relevance of the facet-defining hyperplanes is highlighted in the following theorem: […]

It follows from the proof of Theorem 4.4 that if $X$ is schön we can construct a log-resolution $\mu \colon \dot{X} \to Y$ of the divisor of $g$ as the closure of $X$ inside a toric variety with fan $\Sigma \subseteq \mathbb{R}^p_{\geq 0} \cap \operatorname{Trop}(X)$. In this log-resolution the components of $\mu^{-1}(V(g))$ correspond to the rays in $\Sigma$, and if $\tau$ is a ray in $\Sigma$, then $\operatorname{ord}_{E_\tau}(g_j) = (v_\tau)_j$. We denote by $k_\tau$ the integer $\operatorname{ord}_{E_\tau}(K_{X_\Sigma / X}) + 1$. […] $(v_{\tau_i})_j s_j \leq k_{\tau_i}$ for $i = 1, \dots, q$ […].
By Theorem 4.4, for every rigid ray $\tau$ in $\mathbb{R}^p_{\geq 0}$, we have $\tau^\perp \subseteq BS_G$, i.e., $\tau^\perp$ is a Bernstein-Sato slope of $G$. This means precisely that at least one affine translate of $\tau^\perp$ lies in $V(B_G)$. A natural candidate for such a translate is the one corresponding to the log-canonical threshold, leading to the following conjecture. […]

Example 4.9. Let $Y \subseteq \mathbb{C}^p$ be a linear subspace such that the restrictions of the coordinate functions to $Y$ define a central indecomposable hyperplane arrangement. In this situation, the rigid rays in $\operatorname{Trop}(X)$ are $(1, \dots, 1)$ and $(-1, \dots, -1)$. These rays are rigid since $X$ is homogeneous, and they are the only rigid rays since the arrangement is indecomposable. The matroid polytope of the matroid associated to the arrangement is the intersection of $\mathrm{LCT}(G)$ with the hyperplane $H := \{s_1 + \dots + s_p = \dim Y\}$. It is shown in [13] that the matroid polytope has dimension $p - 1$. Hence $H$ is facet-defining, and by Theorem 4.7, $H$ is an irreducible component of $V(B_G)$. This proves [3, Conjecture 3] for complete factorizations of hyperplane arrangements. Recall that a factorization of a hyperplane arrangement is complete if each of its factors is linear. △

Example: Flipping a biased coin
In this section, we pick up and continue the example of flipping a biased coin twice that was studied in [12] and [29]. Consider the smooth curve $X$ in $\mathbb{P}^2$ defined by the homogeneous polynomial

$$f = \det \begin{pmatrix} p_0 & p_1 \\ p_0 + p_1 & p_2 \end{pmatrix} = p_0 p_2 - (p_0 + p_1) p_1.$$
As pointed out in [12, Example 2], this is the implicit representation of the statistical model describing the following experiment: Flip a biased coin. If it shows heads, flip it again. Here, $p_0$, $p_1$ and $p_2$ are to be thought of as representing the probabilities of the three possible outcomes of the experiment. Since these probabilities must sum to one, we impose the additional condition that $p_0 + p_1 + p_2 \neq 0$ and hence consider the variety $X \setminus H$, where $H$ is the collection of hyperplanes $\{p_0 p_1 p_2 (p_0 + p_1 + p_2) = 0\}$ in $\mathbb{P}^2$. We embed $X \setminus H$ into the 3-torus via the morphism

$$P \colon X \setminus H \to (\mathbb{C}^*)^3, \quad (p_0 : p_1 : p_2) \mapsto \left( \frac{p_0}{p_0 + p_1 + p_2}, \frac{p_1}{p_0 + p_1 + p_2}, \frac{p_2}{p_0 + p_1 + p_2} \right).$$
For $c_1 = c_2 = c_3 = 1$, this recovers the MLE computed in [12]. Finally, we compute the Bernstein-Sato ideal of the tuple of coordinate functions on $Y$. Under the isomorphism […] and thus the Bernstein-Sato slopes of $G$ are $BS_G = V(2s_0 + s_1) \cup V(s_1 + s_2)$. Indeed, the components correspond to $w_1$ and $w_2$. The ray $w_3$ does not contribute to the Bernstein-Sato slopes since it is not contained in $\mathbb{R}^3_{\geq 0}$.
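The appearance of the linear forms $2s_0 + s_1$ and $s_1 + s_2$ can be checked against a direct computation. The following minimal sketch assumes the parametrization $p_0 = \theta^2$, $p_1 = \theta(1 - \theta)$, $p_2 = 1 - \theta$ of the model by the heads probability $\theta$; this parametrization is an assumption of the illustration, chosen because it satisfies $p_0 p_2 - (p_0 + p_1) p_1 = 0$ and $p_0 + p_1 + p_2 = 1$. For a data vector $(u_0, u_1, u_2)$, playing the role of $(s_0, s_1, s_2)$, the log-likelihood is $(2u_0 + u_1)\log\theta + (u_1 + u_2)\log(1 - \theta)$, whose unique critical point is computed below. The function names and the sample data are hypothetical.

```python
from fractions import Fraction

def mle(u0, u1, u2):
    """Critical point of the likelihood under the assumed parametrization."""
    theta = Fraction(2 * u0 + u1, 2 * u0 + 2 * u1 + u2)
    return theta, (theta**2, theta * (1 - theta), 1 - theta)

def score(theta, u0, u1, u2):
    """Derivative of the log-likelihood with respect to theta."""
    return Fraction(2 * u0 + u1) / theta - Fraction(u1 + u2) / (1 - theta)

def f(p0, p1, p2):
    """Implicit equation of the model curve."""
    return p0 * p2 - (p0 + p1) * p1

u = (3, 5, 2)  # sample counts, chosen arbitrarily for illustration
theta_hat, p_hat = mle(*u)

print(theta_hat, p_hat)
print(score(theta_hat, *u), f(*p_hat), sum(p_hat))
```

In this sketch the estimate $\hat\theta$ has numerator $2u_0 + u_1$, and $1 - \hat\theta$ has numerator $u_1 + u_2$, matching the two linear forms cutting out $BS_G$.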

Definition 2.2. The asymptotic critical locus of $F$ with respect to $Y$, denoted $C_{F,Y}$, is the closure of $C_F$ inside $Y \times \mathbb{P}^{p-1}$, i.e., $C_{F,Y} := \overline{C_F}^{\,Y \times \mathbb{P}^{p-1}}$.

Definition 2.3. Denote by $\pi_1 \colon Y \times \mathbb{P}^{p-1} \to Y$ (resp. $\pi_2 \colon Y \times \mathbb{P}^{p-1} \to \mathbb{P}^{p-1}$) the projection to the first (resp. second) factor. Associated to $F$ we define the variety

By Lemma 3.9, all rays of the Gröbner fan on $\operatorname{Trop}(X)$ are rigid. They are generated by the vectors $e_2$, $e_3$, $e_4$, $-e_2 - e_3$, $e_1 + e_2 + e_3$, and $-e_1 - e_2 - e_3 - e_4$. It follows from Proposition 3.13 that the codimension-one part of $S_F$ equals