Definable Inapproximability: New Challenges for Duplicator

We consider the hardness of approximation of optimization problems from the point of view of definability. For many NP-hard optimization problems it is known that, unless P = NP, no polynomial-time algorithm can give an approximate solution guaranteed to be within a fixed constant factor of the optimum. We show, in several such instances and without any complexity theoretic assumption, that no algorithm that is expressible in fixed-point logic with counting (FPC) can compute an approximate solution. Since important algorithmic techniques for approximation algorithms (such as linear or semidefinite programming) are expressible in FPC, this yields lower bounds on what can be achieved by such methods. The results are established by showing lower bounds on the number of variables required in first-order logic with counting to separate instances with a high optimum from those with a low optimum for fixed-size instances.


Introduction
Twenty years ago, the PCP theorem [4] transformed the landscape of complexity theory. It showed that if P ≠ NP then not only is it impossible to efficiently solve NP-hard problems exactly, but for some of them it is also impossible to approximate the solution to within a constant factor. Consider for instance the problem MAX 3SAT. Here we are given a Boolean formula in 3CNF and we are asked to determine $m^*$, the maximum number of clauses that can be simultaneously satisfied by an assignment of Boolean values to its variables. It is a consequence of the PCP theorem that there is a constant $c < 1$ such that, assuming P ≠ NP, no polynomial-time algorithm can be guaranteed to produce an assignment that satisfies at least $cm^*$ clauses, or indeed determine the value of $m^*$ up to a factor of $c$. The proof of the PCP theorem introduced sophisticated new techniques into complexity theory such as the probabilistically checkable proofs that gave the theorem its name. Over the years, stronger results were proved, improving the constant $c$ and, by reductions, proving inapproximability results for a host of other NP-hard problems.
A structural theory of hardness of approximation was introduced by Papadimitriou and Yannakakis [23] who defined the class MAX SNP of approximation problems, with a definition rooted in descriptive complexity theory. They showed that for every problem in this class, there is a constant $d$ such that a polynomial-time algorithm can find approximate solutions within a factor $d$ of the optimum. At the same time, for all problems that are MAX SNP-hard, under approximation-preserving reductions defined by [23], there is a constant $c$ such that no polynomial-time algorithm can approximate solutions within a factor $c$. This makes it a challenge, for each MAX SNP-complete problem, to determine the exact approximation ratio that is achievable by an efficient algorithm. In some cases, this has been pinned down exactly. For instance, for MAX 3SAT we know that there is a polynomial-time algorithm that will produce an assignment satisfying 7/8 of the clauses in any formula but, unless P = NP, there is no polynomial-time algorithm that is guaranteed to produce a solution within 7/8 + ε of the optimal, for any ε > 0 [16]. Another interesting case is MAX 3XOR, where we are given a formula which is the conjunction of clauses, each of which is the XOR of three literals. Here, satisfiability is decidable in polynomial time as the problem is essentially that of solving a system of linear equations over the two-element field. However, determining, for an unsatisfiable system, how many of its clauses can be simultaneously satisfied is MAX SNP-hard, and the exact approximation ratio that is achievable efficiently is known: unless P = NP, no polynomial-time algorithm can achieve an approximation ratio bounded above 1/2 [16].
To give a problem of another flavour, consider minimum vertex cover, the problem of finding, in a graph $G$, a minimum set $S$ of vertices such that every edge is incident on a vertex in $S$. Let $vc(G)$ denote the size of a minimum size vertex cover in $G$. There are algorithms that are guaranteed to find a vertex cover no larger than $2vc(G)$ (this being a minimization problem, the approximation ratio is expressed as a number $c \ge 1$). It has been proved, by means of rather sophisticated reductions starting at the PCP theorem, that, unless P = NP, no polynomial-time algorithm can achieve a ratio better than 1.36 [14]. Very recent results announced in [20] improve this lower bound to √2. It is conjectured that indeed no such algorithm could achieve a ratio of 2 − ε for arbitrarily small ε > 0 but, as of our current knowledge, the right threshold constant could be somewhere between √2 and 2. We approach these questions on the hardness of approximability from the point of view of definability. Our aim is to show that the tools of descriptive complexity can be brought to bear in showing lower bounds on the definability of approximations and that these definability lower bounds have consequences on understanding commonly used techniques in approximation algorithms.
A reference logic in descriptive complexity is fixed-point logic with counting, FPC. The class of problems definable in this logic forms a proper subclass of the complexity class P. However, FPC is very expressive and many natural problems in P are expressible in this logic. For instance, any polynomial-time decidable problem on a proper minor-closed class of graphs is expressible in FPC [15]. Also, problems that can be formulated as linear programming or semidefinite programming problems are in FPC [2,8,13]. At the same time, for many problems we are able to prove categorically, i.e., without complexity theoretic assumptions, that they are not definable in FPC. Among these are NP-complete problems like 3SAT, graph 3-colourability and Hamiltonicity (see [10]). We can also prove that certain problems in P are not in FPC, such as 3XOR.
A particularly interesting class of problems are the optimization problems known as MAX CSP or constraint maximization problems, where we are given a collection of constraints and the problem is to find the maximum number of constraints that can be simultaneously satisfied. When it comes to finding exact solutions, definability in FPC turns out to be an excellent guide to the tractability of such problems. It is known that each such problem is either in P and definable in FPC or it is NP-complete and provably not definable in FPC [12]. We would like to extend such results also to the approximability of such problems. This paper develops the methodology for doing so.
For MAX 3SAT, we prove, without any complexity theoretic assumption, that no algorithm expressible in FPC can achieve an approximation ratio of 7/8 + ε. The question seems ill-posed at first sight as FPC is a formalism for defining problems rather than expressing algorithms. We return to the precise formulation shortly, but first note that there is a sense in which FPC can express, say, the ellipsoid method for solving linear programs [2]. This is the basis for showing that many commonly used algorithmic techniques for approximation problems, such as semidefinite programming relaxations, are also expressible in FPC. Thus, on the one hand, reductions from MAX SNP-hard problems show inapproximability by any polynomial-time algorithm, assuming P ≠ NP. On the other hand, our results show, without the assumption, inapproximability by the most commonly used polynomial-time methods.
Undefinability of a class of structures $\mathcal{C}$ in FPC is typically established by showing that structures in $\mathcal{C}$ cannot be distinguished from structures not in $\mathcal{C}$ in $C^k$ (first-order logic with counting and just $k$ variables) for any fixed $k$. In the terminology of [13], $\mathcal{C}$ has unbounded counting width. On the other hand, hardness of approximation for a maximization problem is typically established by showing that every class that includes all instances with an optimum $m^*$ and excludes all instances with an optimum less than $cm^*$ is NP-hard. Our method combines these two. We aim to show that any class separating instances with an optimum $m^*$ from instances with an optimum less than $cm^*$ has unbounded counting width. In general, we not only show that counting width is unbounded, but establish stronger bounds on how it grows with the size of instances, as such bounds are directly tied to lower bounds on semidefinite programming hierarchies [13,8]. This methodology poses new challenges for Spoiler-Duplicator games in finite model theory. Such games are typically played on pairs of structures that are minimally different. In the new setting, we need to show Duplicator winning strategies in games on pairs of structures that differ substantially on some numeric parameters.
The PCP theorem is the fons et origo of results on hardness of approximation. It established the first provably NP-hard constant gap between the fully satisfiable instances of MAX 3SAT, i.e., those in which all clauses can be satisfied, and the less satisfiable ones, those where no more than a fraction $1 - \epsilon_0$ of the clauses can be satisfied, for some explicit $\epsilon_0 > 0$. The gap between 1 and $1 - \epsilon_0$ was then amplified and also transferred to other problems by means of reductions. For us, the starting point is the problem MAX 3XOR. We are able to establish a definability gap between the satisfiable instances of this and instances in which little more than 1/2 of the clauses can be satisfied. The constant 1/2 is easily seen to be optimal since in every 3XOR instance at least half of the equations can be satisfied.
The methods for establishing this optimal initial gap are very different from those for the PCP theorem. We construct a $k$-locally satisfiable instance of MAX 3XOR which, by a random construction, is at the same time highly unsatisfiable. We can then combine this with a construction adapted from [6] to obtain an optimal gap that defeats any fixed counting width. This shows that no algorithm that is expressible in FPC can approximate MAX 3XOR within a constant above 1/2, even on satisfiable instances. It should be pointed out that, although the inapproximability of MAX 3XOR above 1/2 matches algorithmic lower bounds and is tight, the type of definability gap that we obtain, which applies to satisfiable instances, cannot have an analogue in the algorithmic setting. The satisfiable instances of MAX 3XOR are distinguished from unsatisfiable ones by a polynomial-time algorithm. To show inapproximability for any constant greater than 1/2 one has to show that it is the almost satisfiable ones that are indistinguishable from those that are highly unsatisfiable. This distinction supports our claim that our methods are very different from those for the PCP theorem.
With such an optimal initial gap for MAX 3XOR in hand, we can then transfer it to other problems by means of reductions, just as in classical inapproximability. Our reductions have to preserve FPC definability and we mostly rely on first-order definable reductions. For one, the standard direct reduction from 3XOR to 3SAT is trivially first-order definable and gives an optimal undefinability gap for MAX 3SAT: no algorithm expressible in FPC can achieve an approximation ratio of 7/8 + ε, even on satisfiable instances. Again this matches known algorithmic lower bounds and is tight. For other problems we need to rely on more sophisticated constructions, without leaving the realm of first-order definable reductions. It turns out that many of the reductions used in the classical theory of approximability are first-order reductions but this requires close examination and proof.
We show that the long-code reductions from [16] are definable in first-order logic. Such reductions have the merit of providing different constructions of optimal gaps for MAX 3XOR and MAX 3SAT starting at any initial gap whatsoever. In addition, the techniques that are involved in them have applications elsewhere. For the vertex cover problem, we are able to show that the reduction from [14], which is based on the same long-code reduction techniques as in [16], is first-order definable, showing that FPC cannot give an approximation better than 1.36. It is possible that this could be improved to √2 using the recent breakthrough of [20] but we leave this to future work.

Logics and games
We assume familiarity with first-order logic FO. All our vocabularies are finite and relational, and all structures are finite. For a structure $\mathbb{A}$, we write $A$ to denote its universe. We refer to fixed-point logic FP and fixed-point logic with counting FPC but the definitions of these are not required for the technical development in the paper. For this it suffices to consider the bounded variable fragments of first-order logic. For a fixed positive integer $k$, we write $L^k$ to denote the fragment of first-order logic in which every formula has at most $k$ variables, free or bound. We also write $\exists L^{k,+}$ for the existential positive fragment of $L^k$. This consists of those formulas of $L^k$ formed using only the positive Boolean connectives $\wedge$ and $\vee$, and existential quantification. FOC is the extension of first-order logic with counting quantifiers. For each natural number $i$, we have a quantifier $\exists^i$ where $\mathbb{A} \models \exists^i x\, \varphi$ if, and only if, there are at least $i$ distinct elements $a \in A$ such that $\mathbb{A} \models \varphi[a/x]$. While the extension of first-order logic with counting quantifiers is no more expressive than FO itself, the presence of these quantifiers does affect the number of variables that are necessary to express a query. Let $C^k$ denote the fragment of FOC in which no more than $k$ variables appear, free or bound.
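For two structures $\mathbb{A}$ and $\mathbb{B}$, we write $\mathbb{A} \equiv_{C^k} \mathbb{B}$ to denote that they are not distinguished by any sentence of $C^k$. All that we need to know about FPC is that for every formula $\varphi$ of FPC there is a $k$ such that if $\mathbb{A} \equiv_{C^k} \mathbb{B}$, then $\mathbb{A} \models \varphi$ if, and only if, $\mathbb{B} \models \varphi$. We also write $\mathbb{A} \Rightarrow_k \mathbb{B}$ to denote that every sentence of $\exists L^{k,+}$ that is true in $\mathbb{A}$ is also true in $\mathbb{B}$. While $\equiv_{C^k}$ is an equivalence relation, $\Rightarrow_k$ is reflexive and transitive but not symmetric. These relations have well established characterizations in terms of two-player pebble games. The relation $\Rightarrow_k$ is characterized by the existential $k$-pebble game [21] and $\equiv_{C^k}$ by the $k$-pebble bijective game [17].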
Both versions of the game are played on a pair of structures $\mathbb{A}$ and $\mathbb{B}$ by two players, Spoiler and Duplicator, using $k$ pairs of pebbles $(a_1, b_1), \ldots, (a_k, b_k)$. In a game position, some (or all) of the pebbles $a_1, \ldots, a_k$ are placed on elements of $A$ while the matching pebbles among $b_1, \ldots, b_k$ are placed on elements of $B$. Where it causes no confusion, we do not distinguish notationally between the pebble $a_i$ (or $b_i$) and the element on which it is placed. In the existential $k$-pebble game, at each move Spoiler chooses a pebble $a_i$ (which might or might not already be on an element of $A$) and places it on any element of $A$. Duplicator has to respond by placing $b_i$ on an element of $B$. If the resulting partial map from $A$ to $B$ given by $a_i \mapsto b_i$ is not a partial homomorphism, then Spoiler has won the game. In the bijective $k$-pebble game, Spoiler chooses a pair of pebbles $(a_i, b_i)$ and Duplicator has to respond by giving a bijection $b : A \to B$ which agrees with the map $a_j \mapsto b_j$ for all $j \neq i$. Spoiler then chooses a pair $(a, b(a))$ on which to place the pebbles $(a_i, b_i)$. Again, if the resulting partial map from $A$ to $B$ given by $a_i \mapsto b_i$ is not a partial isomorphism, then Spoiler has won the game. In both games, we say Duplicator has a winning strategy if, no matter how Spoiler plays, she can play forever without losing. The following summarises the connection between these games and the relations $\equiv_{C^k}$ and $\Rightarrow_k$. For any two structures $\mathbb{A}$ and $\mathbb{B}$, the following hold: $\mathbb{A} \Rightarrow_k \mathbb{B}$ if, and only if, Duplicator has a winning strategy in the existential $k$-pebble game played on $\mathbb{A}$ and $\mathbb{B}$ [21]; and $\mathbb{A} \equiv_{C^k} \mathbb{B}$ if, and only if, Duplicator has a winning strategy in the bijective $k$-pebble game played on $\mathbb{A}$ and $\mathbb{B}$ [17].
For undirected graphs, the relation $\equiv_{C^2}$ has a simple combinatorial characterization in terms of vertex refinement (see [19]). For any graph $G$, there is a coarsest partition $C_1, \ldots, C_m$ of the vertices of $G$ such that for each $1 \le i, j \le m$ there exists $\delta_{ij}$ such that each $v \in C_i$ has exactly $\delta_{ij}$ neighbours in $C_j$. Let $H$ be another graph and $D_1, \ldots, D_{m'}$ be the corresponding partition of $H$ with constants $\gamma_{ij}$. Then $G \equiv_{C^2} H$ if, and only if, $m = m'$ and there is a permutation $h \in \mathrm{Sym}_m$ such that $|C_i| = |D_{h(i)}|$ and $\delta_{ij} = \gamma_{h(i)h(j)}$ for all $i$ and $j$.
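The vertex refinement characterization above can be tried out on small graphs. The following is a minimal sketch, not taken from the paper: it computes the coarsest stable partition by iterated colour refinement and compares two graphs by refining their disjoint union, a standard way of testing the condition on the partitions and the constants. The function names `stable_colouring` and `c2_equivalent` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import Counter

def stable_colouring(adj):
    """Colour refinement; adj maps each vertex to the set of its neighbours."""
    colour = {v: 0 for v in adj}
    while True:
        # Signature of v: its colour plus the multiset of its neighbours' colours.
        sig = {v: (colour[v],
                   tuple(sorted(Counter(colour[u] for u in adj[v]).items())))
               for v in adj}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: ids[sig[v]] for v in adj}
        # Each round refines the previous partition, so an unchanged number
        # of classes means the partition is stable.
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def c2_equivalent(adj_g, adj_h):
    """Refine the disjoint union and compare the colour histograms of both sides."""
    union = {("g", v): {("g", u) for u in nb} for v, nb in adj_g.items()}
    union.update({("h", v): {("h", u) for u in nb} for v, nb in adj_h.items()})
    colour = stable_colouring(union)
    hist_g = Counter(c for (side, _), c in colour.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colour.items() if side == "h")
    return hist_g == hist_h

# A 6-cycle and two disjoint triangles are both 2-regular on six vertices.
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {i: {j for j in range(3 * (i // 3), 3 * (i // 3) + 3) if j != i}
                 for i in range(6)}
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

On these examples the 6-cycle and the two triangles are not separated by refinement, while the path and the triangle are, since their degree sequences already differ.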
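Let $\mathcal{C}$ be a class of structures and for any $n \in \mathbb{N}$, let $\mathcal{C}_n$ denote the structures in $\mathcal{C}$ with at most $n$ elements. The counting width of $\mathcal{C}$ [13] is the function $k : \mathbb{N} \to \mathbb{N}$ where $k(n)$ is the smallest value such that for any $\mathbb{A} \in \mathcal{C}_n$ and any $\mathbb{B} \notin \mathcal{C}$, we have $\mathbb{A} \not\equiv_{C^{k(n)}} \mathbb{B}$.
Note that $k(n) \le n$. Because $\mathbb{A} \not\equiv_{C^1} \mathbb{B}$ whenever $\mathbb{A}$ and $\mathbb{B}$ have different numbers of elements, $k(n)$ is also the smallest value such that $\mathcal{C}_n$ is a union of $\equiv_{C^{k(n)}}$-classes. In particular, it follows that the counting width of $\mathcal{C}$ is the same as that of its complement. For $k : \mathbb{N} \to \mathbb{N}$, we say that two disjoint classes $\mathcal{C}$ and $\mathcal{D}$ are $C^k$-separable if whenever $\mathbb{A} \in \mathcal{C}_n$ and $\mathbb{B} \in \mathcal{D}_n$, then we have $\mathbb{A} \not\equiv_{C^{k(n)}} \mathbb{B}$. Equivalently, $\mathcal{C}$ and $\mathcal{D}$ are $C^k$-separable if there is a class $\mathcal{E}$ of counting width at most $k$ such that $\mathcal{C} \subseteq \mathcal{E}$ and $\mathcal{D} \subseteq \overline{\mathcal{E}}$.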

Interpretations
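Consider two signatures $\sigma$ and $\tau$. A $d$-ary FO-interpretation of $\tau$ in $\sigma$ is a sequence of first-order formulas in vocabulary $\sigma$ consisting of: (i) a formula $\delta(x)$; (ii) a formula $\varepsilon(x, y)$; (iii) for each relation symbol $R \in \tau$ of arity $k$, a formula $\varphi_R(x_1, \ldots, x_k)$; and (iv) for each constant symbol $c \in \tau$, a formula $\gamma_c(x)$, where each $x$, $y$ or $x_i$ is a $d$-tuple of variables. We call $d$ the dimension of the interpretation. If $d = 1$, we say that the interpretation is linear. We say that an interpretation $\Theta$ associates a $\tau$-structure $\mathbb{B}$ to a $\sigma$-structure $\mathbb{A}$ if there is a surjective map $h$ from the $d$-tuples $\{a \in A^d : \mathbb{A} \models \delta[a]\}$ to $B$ such that: $h(a_1) = h(a_2)$ if, and only if, $\mathbb{A} \models \varepsilon[a_1, a_2]$; $R^{\mathbb{B}}(h(a_1), \ldots, h(a_k))$ holds if, and only if, $\mathbb{A} \models \varphi_R[a_1, \ldots, a_k]$; and $h(a) = c^{\mathbb{B}}$ if, and only if, $\mathbb{A} \models \gamma_c[a]$. Note that an interpretation $\Theta$ associates a $\tau$-structure with $\mathbb{A}$ only if $\varepsilon$ defines an equivalence relation on $A^d$ that is a congruence with respect to the relations defined by the formulae $\varphi_R$ and $\gamma_c$. In such cases, however, $\mathbb{B}$ is uniquely defined up to isomorphism and we write $\Theta(\mathbb{A}) = \mathbb{B}$. It is also worth noting that the size of $\mathbb{B}$ is at most $n^d$ if $\mathbb{A}$ is of size $n$, but it may in fact be smaller. We call an interpretation $p$-bounded, for a polynomial $p$, if $|B| \le p(|A|)$, and say the interpretation is linearly bounded if $p$ is linear. Every linear interpretation is linearly bounded, but the converse is not necessarily the case.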
For a class of structures $\mathcal{C}$ and an interpretation $\Theta$, we write $\Theta(\mathcal{C})$ to denote the class $\{\Theta(\mathbb{A}) \mid \mathbb{A} \in \mathcal{C}\}$. We mainly use interpretations to define reductions between classes of structures. These allow us to transfer bounds on separability, by the following lemma.
Lemma 1. Let $\Theta$ be a $p$-bounded interpretation of dimension $d$ and let $t$ be the maximum number of variables appearing in any formula of $\Theta$. If $\mathcal{C}$ and $\mathcal{D}$ are two disjoint classes of structures such that $\Theta(\mathcal{C})$ and $\Theta(\mathcal{D})$ are $C^{k(n)}$-separable, then $\mathcal{C}$ and $\mathcal{D}$ are $C^{dk(p(n))+t}$-separable.
Proof. Let $\mathbb{A} \in \mathcal{C}_n$ and $\mathbb{B} \in \mathcal{D}_n$ be two structures. Then, since $\Theta(\mathbb{A})$ and $\Theta(\mathbb{B})$ have size at most $p(n)$, there is a formula $\varphi \in C^{k(p(n))}$ such that $\Theta(\mathbb{A}) \models \varphi$ and $\Theta(\mathbb{B}) \not\models \varphi$. We compose $\varphi$ with the interpretation $\Theta$ to obtain $\varphi'$. That is to say, we replace every relation symbol by its defining formula, including replacing all occurrences of equality by $\varepsilon$, and we relativize all quantifiers to $\delta$. Note that this involves replacing quantification over elements with quantification over tuples. It is well known that a counting quantifier over tuples $\exists^i x$ can be replaced by a series of counting quantifiers over single elements without increasing the total number of variables. Then $\mathbb{A} \models \varphi'$ and $\mathbb{B} \not\models \varphi'$. It is also easy to check that $\varphi'$ has at most $dk(p(n)) + t$ variables. The multiplicative factor $d$ comes from the fact that every variable in $\varphi$ is replaced by a $d$-tuple and the additive $t$ accounts for any other variables that may appear in the formulas of $\Theta$.
When we wish to define a reduction from a class C by a first-order interpretation, it suffices to give an interpretation Θ for all structures in C with at least two elements (or, indeed, at least k elements for any fixed k). This is because we can define an arbitrary map on a finite set of structures by a first-order formula, so we just need to take the disjunction of Θ with the formula that defines the required interpretation on the structures with one element. With this in mind, we define the method of finite expansions which gives us interpretations Θ that take a structure A with universe A to a structure with a universe consisting of l labelled disjoint copies of S for some definable subset S of A. Note that Θ would not, in general, be linear, but it is linearly bounded.
So, fix a value $l$, and let $t$ be the least integer such that $l \le 2^t$. In a structure $\mathbb{A}$ with at least two elements, we say that a $(t+1)$-tuple of elements $(a_1, \ldots, a_{t+1})$ codes an integer $i \in [2^t]$ if $b_1 \cdots b_t$ is the binary representation of $i - 1$ and $b_j = 1$ if, and only if, $a_{j+1} = a_1$. For each $i$, we can clearly define a formula $\gamma_i(y)$ with $t + 1$ free variables that defines those tuples that code $i$. Now, for any formula $\sigma(x)$, let $\delta(x, y)$ be the formula $\sigma(x) \wedge \bigvee_{i \le l} \gamma_i(y)$ and let $\varepsilon(x_1, y_1, x_2, y_2)$ be the formula $x_1 = x_2 \wedge \bigvee_{i \le l} (\gamma_i(y_1) \wedge \gamma_i(y_2))$. In other words, $\delta$ picks out those $(t+2)$-tuples $(s, a)$ where $s$ satisfies $\sigma$ and $a$ codes an integer in $[l]$, and $\varepsilon$ identifies distinct tuples which have the same $s$ and code the same integer. An interpretation using these can be seen to yield a structure with $l$ disjoint copies of the set of elements of $\mathbb{A}$ satisfying $\sigma$.
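The coding of integers by tuples can be made concrete with a small sketch. It is an illustration only, under the convention stated above that bit $b_j$ is 1 exactly when $a_{j+1} = a_1$; the names `coded_integer` and `finite_expansion`, and the representation of $\sigma$ as a Python predicate, are assumptions and not part of the paper.

```python
from itertools import product

def coded_integer(tup):
    """Integer in [2^t] coded by a (t+1)-tuple: bit b_j is 1 iff a_{j+1} == a_1."""
    first, rest = tup[0], tup[1:]
    bits = "".join("1" if a == first else "0" for a in rest)
    return int(bits or "0", 2) + 1          # b_1...b_t is the binary form of i - 1

def finite_expansion(universe, sigma, l):
    """Universe of the interpreted structure: classes (s, i) with sigma(s), i in [l]."""
    t = (l - 1).bit_length()                # least t with l <= 2^t
    classes = set()
    for s in universe:
        if not sigma(s):
            continue
        for y in product(universe, repeat=t + 1):
            i = coded_integer(y)
            if i <= l:                      # the pair (s, y) satisfies delta
                classes.add((s, i))         # epsilon merges tuples with equal (s, i)
    return classes

# Three labelled copies of the definable subset {0, 1} of the universe {0, 1, 2}.
copies = finite_expansion([0, 1, 2], lambda s: s != 2, 3)
```

Many different tuples code the same integer, which is why the equivalence formula is needed; after the quotient the result has exactly $l$ elements per element satisfying $\sigma$.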

The Basic Gap Construction
The problems 3SAT and 3XOR both ask to decide if a formula consisting of the conjunction of Boolean constraints each on exactly three Boolean variables is satisfiable. In 3SAT the constraints are disjunctions of literals on three distinct variables. In 3XOR the constraints are parities of three distinct variables. Both problems are known to have unbounded counting width [6]: the class of satisfiable instances cannot be separated in $C^k$, for bounded $k$, from the class of unsatisfiable ones. Our aim is to show that this result can be strengthened to show that the class of satisfiable instances is not $C^k$-separable from the class of instances that are highly unsatisfiable, meaning that no assignment to the variables can satisfy more than a fraction $s$ of the constraints for some fixed $s \in (0, 1)$. We give a basic construction for 3XOR, based on that in [6], that establishes this for any $s > 1/2$, with a lower bound on the value of $k$ that is linear in the number of variables in the system. Then we use this construction to get one for 3SAT for any $s > 7/8$, also for a value of $k$ that is linear in the number of variables. In both cases, the constants 1/2 and 7/8 are easily seen to be optimal.
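The link between the constants 1/2 and 7/8 can be illustrated with one common way of translating a parity constraint into clauses: each equation on three variables becomes the four clauses that rule out its four falsifying assignments. The sketch below assumes this translation (the paper's own reduction is not spelled out in this section), and the names `xor_to_clauses` and `satisfied_clauses` are hypothetical.

```python
from itertools import product

def xor_to_clauses(variables, b):
    """Four clauses equivalent to x + y + z = b over F_2.

    A literal is a pair (variable, value) and is true when the variable
    takes that value. Each clause excludes one falsifying assignment.
    """
    clauses = []
    for values in product((0, 1), repeat=3):
        if sum(values) % 2 != b:
            clauses.append(tuple((x, 1 - v) for x, v in zip(variables, values)))
    return clauses

def satisfied_clauses(clauses, assignment):
    """Number of clauses with at least one true literal under the assignment."""
    return sum(any(assignment[x] == val for x, val in clause) for clause in clauses)

def clause_fraction(s):
    """Fraction of clauses satisfied when a fraction s of the equations is."""
    return (4 * s + 3 * (1 - s)) / 4
```

An assignment that satisfies the equation satisfies all four clauses, and one that falsifies it satisfies exactly three, so a fraction $s$ of satisfied equations corresponds to a fraction $3/4 + s/4$ of satisfied clauses; at $s = 1/2$ this is 7/8.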

Systems of constraints
Let $\Gamma$ be a finite set of relations over a finite domain $D$, also called a constraint language. Let $I = \{c_1, \ldots, c_m\}$ be a collection (multi-set) of constraints, each of the form $R(x_{i_1}, \ldots, x_{i_k})$, where $R$ is a $k$-ary relation in $\Gamma$, and $x_{i_1}, \ldots, x_{i_k}$ are $k$ distinct $D$-valued variables from a set $x_1, \ldots, x_n$ of $n$ variables. For $c \in [0, 1]$, we say that the system $I$ is $c$-satisfiable if there is an assignment $f : \{x_1, \ldots, x_n\} \to D$ that satisfies at least $cm$ constraints; i.e., that satisfies $(f(x_{i_1}), \ldots, f(x_{i_k})) \in R$ for at least $cm$ constraints $R(x_{i_1}, \ldots, x_{i_k})$ from $I$. Note that, as we are counting the number of satisfied constraints, multiplicities matter and this is why we have multi-sets rather than sets of constraints.
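For small systems the definition of $c$-satisfiability can be checked by exhaustive search. The following sketch is illustrative only: the function name `max_satisfiable_fraction` and the encoding of a constraint as a pair (relation, scope) are assumptions. The example shows why multiplicities matter.

```python
from fractions import Fraction
from itertools import product

def max_satisfiable_fraction(n, domain, constraints):
    """Largest c such that the system is c-satisfiable, by brute force.

    constraints is a list (multi-set) of pairs (relation, scope), where
    relation is a set of tuples over the domain and scope is a tuple of
    variable indices in range(n).
    """
    best = 0
    for f in product(domain, repeat=n):
        count = sum(tuple(f[i] for i in scope) in rel for rel, scope in constraints)
        best = max(best, count)
    return Fraction(best, len(constraints))

NEQ = {(0, 1), (1, 0)}
EQ = {(0, 0), (1, 1)}
as_multiset = [(NEQ, (0, 1)), (NEQ, (0, 1)), (EQ, (0, 1))]   # NEQ counted twice
as_set = [(NEQ, (0, 1)), (EQ, (0, 1))]                       # multiplicity lost
```

With the repeated constraint the system is 2/3-satisfiable, while the same constraints viewed as a set are only 1/2-satisfiable.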
We think of a system $I = \{c_1, \ldots, c_m\}$ over the constraint language $\Gamma$ as a finite structure in two ways. In the first encoding, the universe is the disjoint union of $\{x_1, \ldots, x_n\}$ and $\{c_1, \ldots, c_m\}$. The vocabulary includes binary relations $E_1, E_2, \ldots$ such that $E_i(x, c)$ holds if the constraint $c$ has arity at least $i$ and $x$ is the $i$th variable in $c$. The vocabulary also includes a unary relation $Z_R$ for each relation $R$ in $\Gamma$, such that $Z_R(c)$ holds if $c$ is a constraint of the form $R(x_{i_1}, \ldots, x_{i_k})$, where $k$ is the arity of $R$. In the second encoding, the universe is just the set of variables $x_1, \ldots, x_n$, and the vocabulary includes a $k$-ary relation symbol $R$ for each $k$-ary relation $R$ in $\Gamma$, such that $R(x_{i_1}, \ldots, x_{i_k})$ holds if this is one of the constraints in the collection $c_1, \ldots, c_m$. Note that in this second encoding the collection of constraints is treated as a set. In particular, the multiplicity of constraints is lost, which could affect its $c$-satisfiability.
The constraint language $\Gamma$ is also encoded as a finite structure in two ways. In the first encoding the domain is $D^{\le r} = D \cup D^2 \cup D^3 \cup \cdots \cup D^r$, where $r$ is the maximal arity of a relation in $\Gamma$. The relations $E_1, E_2, \ldots$ are interpreted by the projections: $E_i = \{(b_i, (b_1, \ldots, b_k)) : i \le k \le r,\ (b_1, \ldots, b_k) \in D^k\}$. The relations $Z_R$ are interpreted by the relation $R$ itself as a unary relation over the universe: $Z_R = \{(b_1, \ldots, b_k) \in D^k : (b_1, \ldots, b_k) \in R\}$. In the second encoding, the universe is just $D$, and the relation symbol $R$ is interpreted by $R$ itself. Where it causes no confusion, we do not distinguish between a constraint language $\Gamma$ and the structure that encodes it, and similarly between an instance $I$ and its encoding structure.
It is easily seen that, in both encodings as finite structures, a system $I$ over $\Gamma$ is satisfiable if, and only if, there is a homomorphism from the structure that encodes $I$ to the structure that encodes $\Gamma$. We say that the system is $k$-locally satisfiable if $I \Rightarrow_k \Gamma$.
For 3SAT, the constraint language is denoted $\Gamma_{3SAT}$. It has domain $D = \{0, 1\}$ and the relations are the eight relations $R_1, \ldots, R_8 \subseteq \{0, 1\}^3$ defined by the eight possible clauses on three variables. For 3XOR, the constraint language is denoted $\Gamma_{3XOR}$. It also has domain $D = \{0, 1\}$ and the relations are the two relations $R_0, R_1 \subseteq \{0, 1\}^3$ defined by the two possible linear equations $x + y + z = b$ with three variables over $\mathbb{F}_2 = \{0, 1\}$. Accordingly, 3XOR instances $I$ can be identified with systems of linear equations $Ax = b$ over $\mathbb{F}_2$. In the following, $A$ and $b$ are referred to as the left-hand side matrix of $I$ and right-hand side vector of $I$, respectively.
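Since a 3XOR instance is a linear system over the two-element field, exact satisfiability can be decided by Gaussian elimination. The sketch below is an illustration under assumed conventions: an equation is a pair (variables, right-hand side), and each row is packed into an integer with bit 0 holding the right-hand side. The name `xor_satisfiable` is hypothetical.

```python
def xor_satisfiable(equations):
    """Decide satisfiability of a system of XOR equations by elimination over F_2.

    equations is a list of pairs (vars, b), with vars a tuple of non-negative
    variable indices and b in {0, 1}. Variable v is stored in bit v + 1 of the
    row and the right-hand side in bit 0.
    """
    pivot_rows = {}                          # leading bit -> row with that leading bit
    for vs, b in equations:
        row = b
        for v in vs:
            row ^= 1 << (v + 1)
        while row >> 1:                      # some variable is still present
            lead = row.bit_length() - 1
            if lead not in pivot_rows:
                pivot_rows[lead] = row       # new pivot, row is independent
                break
            row ^= pivot_rows[lead]          # eliminate the leading variable
        else:
            if row == 1:                     # row reduced to the equation 0 = 1
                return False
    return True

# Every variable occurs in exactly two equations, so the left-hand sides sum
# to zero; the system is satisfiable exactly when the right-hand sides do too.
unsat_system = [((0, 1, 2), 1), ((0, 3, 4), 1), ((1, 3, 5), 1), ((2, 4, 5), 0)]
sat_system = [((0, 1, 2), 1), ((0, 3, 4), 1), ((1, 3, 5), 1), ((2, 4, 5), 1)]
```

This only answers the exact question; the number of simultaneously satisfiable equations of an unsatisfiable system is not computed by elimination.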

Gap construction
We now focus on 3XOR and hence on systems of linear equations over $\mathbb{F}_2$.
A starting point for us is the following construction which allows us to convert any $k$-locally satisfiable system of equations into a pair of systems that are $\equiv_{C^k}$-indistinguishable. See [1, Prop. 32] for a related construction, which is inspired by the proof in [6] that satisfiability of systems of linear equations over $\mathbb{F}_2$ is not invariant under $\equiv_{C^k}$ for any $k$.
For any instance $I$ of 3XOR we define another instance $G(I)$ of 3XOR which has two variables $x_j^0$ and $x_j^1$ for each variable $x_j$ of $I$. For each equation $x_j + x_k + x_l = b$ in $I$, we have eight equations in $G(I)$ given by the eight possible values of $a_1, a_2, a_3 \in \{0, 1\}$: $x_j^{a_1} + x_k^{a_2} + x_l^{a_3} = b + a_1 + a_2 + a_3$. If $I$ is the system $Ax = b$, then the homogeneous companion of $I$ is the system $Ax = 0$, which we denote $I_0$. Note that the system $G(I_0)$ is satisfiable for any $I$ by setting each variable $x_j^a$ to $a$. We show that, despite this, as long as $I$ is locally satisfiable, $G(I)$ is hard to distinguish from its homogeneous companion $G(I_0)$.
Lemma 2. If $I$ is $k$-locally satisfiable, then $G(I) \equiv_{C^k} G(I_0)$.
Proof. We describe a strategy for Duplicator in the bijective $k$-pebble game played on $G(I)$ and $G(I_0)$, given a strategy in the existential $k$-pebble game on $I$ and $\Gamma = \Gamma_{3XOR}$.
Suppose we have a position in the existential $k$-pebble game on $I$ and $\Gamma$ with pebbles on $x_1, \ldots, x_{k'}$, for some $k' \le k$, in $I$, and corresponding pebbles on $v_1, \ldots, v_{k'} \in \{0, 1\}$ in $\Gamma$. Suppose further that this is a winning position for Duplicator, i.e. she has a strategy to play forever from this position. Then, we claim that the position in the bijective game where the pebbles in $G(I)$ are on $x_1^{a_1}, \ldots, x_{k'}^{a_{k'}}$ and the matching pebbles in $G(I_0)$ are on $x_1^{a_1 + v_1}, \ldots, x_{k'}^{a_{k'} + v_{k'}}$ (with superscripts added modulo 2) is a winning position for Duplicator in the bijective game on these two structures. To see this, note first that, if $x_j \mapsto v_j$ is a partial homomorphism from $I$ to $\Gamma$, then $x_j^{a_j} \mapsto x_j^{a_j + v_j}$ is a partial isomorphism from $G(I)$ to $G(I_0)$. To see that Duplicator can maintain the condition, suppose Spoiler moves the pebble on $x_j^a$. By assumption, Duplicator has a response in the existential game whenever Spoiler moves the pebble from $x_j$ to $x_l$. This response defines a function $f$ from the variables in $x$ to $\{0, 1\}$. We use this to define the bijection taking $x_l^a$ to $x_l^{a + f(x_l)}$. This is a winning move in the bijective game.
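The construction $G(I)$ can be sketched in a few lines. The code assumes that the eight equations arising from $x_j + x_k + x_l = b$ are $x_j^{a_1} + x_k^{a_2} + x_l^{a_3} = b + a_1 + a_2 + a_3$ for $a_1, a_2, a_3 \in \{0, 1\}$, which is the form consistent with the remark that $G(I_0)$ is satisfied by sending each $x_j^a$ to $a$. The function names and the representation of the variable $x_j^a$ as the pair `(j, a)` are hypothetical.

```python
from itertools import product

def gap_instance(equations):
    """G(I): eight equations over the variables (j, a) per equation of I."""
    out = []
    for (j, k, l), b in equations:
        for a1, a2, a3 in product((0, 1), repeat=3):
            out.append((((j, a1), (k, a2), (l, a3)), (b + a1 + a2 + a3) % 2))
    return out

def homogeneous(equations):
    """The homogeneous companion I_0: same left-hand sides, right-hand sides 0."""
    return [(vs, 0) for vs, _ in equations]

def satisfies(equations, assignment):
    """True if the assignment (a dict) satisfies every equation."""
    return all(sum(assignment[v] for v in vs) % 2 == b for vs, b in equations)

# An unsatisfiable instance: the left-hand sides sum to zero, the right-hand sides to one.
instance = [((0, 1, 2), 1), ((0, 3, 4), 1), ((1, 3, 5), 1), ((2, 4, 5), 0)]
# The assignment sending each x_j^a to a.
canonical = {(j, a): a for j in range(6) for a in (0, 1)}
```

Whatever the right-hand sides of the original instance are, `canonical` satisfies `gap_instance(homogeneous(instance))`, so $G(I_0)$ is always satisfiable while $G(I)$ need not be.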
As far as the degree of satisfiability is concerned, the construction preserves a gap in the following quantifiable terms:
Lemma 3. For every 3XOR instance $I$ and every $c, s \in [0, 1]$, the following hold: 1. if $I$ is $c$-satisfiable, then $G(I)$ is $c$-satisfiable; 2. if $G(I)$ is $(1/2 + s/2)$-satisfiable, then $I$ is $s$-satisfiable.
Proof. Let $m$ be the number of equations in $I$. For proving 1, let $f$ be an assignment to the variables of $I$ and define $g(x_j^a) = f(x_j) + a$; then each equation of $I$ satisfied by $f$ gives rise to eight equations of $G(I)$ satisfied by $g$. For proving 2, suppose $g$ is an assignment of values in $\{0, 1\}$ to the variables $x_i^a$ in $G(I)$. Let $h : \{x_1, \ldots, x_n\} \to \{0, 1\}$ be the assignment defined by $h(x_j) = g(x_j^0)$. We claim that if $e_i$ is an equation $x_j + x_k + x_l = b$ in $I$ that is not satisfied by $h$ then at least four of the eight equations in $G(I)$ arising from $e_i$ are falsified by $g$. To see this, consider two cases. First, suppose that $g(x_t^0) = g(x_t^1)$ for some $t \in \{j, k, l\}$. Without loss of generality, we assume $t = j$. Then consider the four pairs of equations obtained by taking the four possible values of $a_2$ and $a_3$, the two equations in a pair differing only in the value of $a_1$. Since $g(x_j^0) = g(x_j^1)$, the two left-hand sides in a pair take the same value while the right-hand sides differ, so if one equation in a pair is satisfied by $g$ the other is necessarily falsified. Thus, at least four equations are falsified. For the second case, suppose that for each $t \in \{j, k, l\}$ occurring in $e_i$ we have $g(x_t^0) \neq g(x_t^1)$. But then, since we assume that $h$ falsifies $e_i$, it follows that $g$ falsifies $x_j^0 + x_k^0 + x_l^0 = b$ and hence it falsifies all eight equations arising from $e_i$. In either case, $g$ falsifies at least four of the equations arising from $e_i$. Now, suppose that $g$ satisfies at least $(1/2 + s/2) \cdot 8m$ of the $8m$ equations in $G(I)$. We claim that $h$ satisfies at least $sm$ equations in $I$. Suppose for contradiction that $h$ falsifies a proportion $\epsilon > 1 - s$ of the equations. By the above argument, $g$ then falsifies at least $4\epsilon m$ of the equations in $G(I)$. But $4\epsilon m > (1/2 - s/2) \cdot 8m$, contradicting the assumption that $g$ satisfies at least $(1/2 + s/2) \cdot 8m$ equations.
The extreme cases of Lemma 3 are given by $c = 1$ for point 1, and $s = 1/2 + \epsilon$ with $\epsilon > 0$ for point 2. Indeed, every 3XOR instance is 1/2-satisfiable, as witnessed by the all-zero assignment, or the all-one assignment, whichever satisfies more equations. Note also that point 1 of Lemma 3 preserves its extremality: if $I$ is satisfiable, then so is $G(I)$. However, point 2 does not preserve its extremality, since even if $I$ is not $(1/2 + \epsilon)$-satisfiable, the best that can be claimed about $G(I)$ is that it is not $(3/4 + \epsilon/2)$-satisfiable. In the following we show that if the vector $b$ is chosen uniformly at random, then both instances $Ax = b$ and $G(Ax = b)$ are at most $(1/2 + \epsilon)$-satisfiable, with high probability, provided the matrix $A$ has at least a constant-factor more rows than columns.
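The two bounds discussed above can be checked by brute force on a toy instance. This is a sanity check and not a proof; it assumes that the eight equations of $G(I)$ have right-hand side $b + a_1 + a_2 + a_3$ and that Lemma 3 relates the optima as $\mathrm{opt}(I) \le \mathrm{opt}(G(I)) \le 1/2 + \mathrm{opt}(I)/2$. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def gap_instance(equations):
    """G(I) with variables (j, a), assuming right-hand sides b + a1 + a2 + a3."""
    return [(((j, a1), (k, a2), (l, a3)), (b + a1 + a2 + a3) % 2)
            for (j, k, l), b in equations
            for a1, a2, a3 in product((0, 1), repeat=3)]

def count_satisfied(equations, f):
    """Number of equations satisfied by the assignment f (a dict)."""
    return sum(sum(f[v] for v in vs) % 2 == b for vs, b in equations)

def best_fraction(equations):
    """Maximum satisfiable fraction, by exhaustive search over all assignments."""
    variables = sorted({v for vs, _ in equations for v in vs})
    best = 0
    for values in product((0, 1), repeat=len(variables)):
        best = max(best, count_satisfied(equations, dict(zip(variables, values))))
    return Fraction(best, len(equations))

instance = [((0, 1, 2), 1), ((0, 3, 4), 1), ((1, 3, 5), 1), ((2, 4, 5), 0)]
opt_i = best_fraction(instance)
opt_g = best_fraction(gap_instance(instance))       # 12 variables, 32 equations

# The all-zero or the all-one assignment satisfies at least half of the equations.
zeros = count_satisfied(instance, {v: 0 for v in range(6)})
ones = count_satisfied(instance, {v: 1 for v in range(6)})
```

For this instance any three equations are simultaneously satisfiable but all four are not, so `opt_i` is 3/4 and `opt_g` must lie between 3/4 and 7/8.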
Lemma 4. For every two reals $\epsilon > 0$ and $\delta > 0$ there exists an integer $c > 0$ such that for every sufficiently large integer $n$ and every matrix $A \in \{0, 1\}^{m \times n}$, where $m = cn$ and each row of $A$ has exactly three ones, if $b$ is chosen uniformly at random in $\{0, 1\}^m$ then, with probability at least $1 - \delta$, both 3XOR instances $Ax = b$ and $G(Ax = b)$ are at most $(1/2 + \epsilon)$-satisfiable.
Proof. Fix ǫ > 0 and δ > 0 and let c be any integer bigger than 1/ǫ 2 . Let n be sufficiently large, let m = cn, and let A ∈ {0, 1} U ×V be any matrix with U = [m] and V = [n] that has exactly three ones in each row.
The probability of this event is 1/2, and all such events, as u ranges over U, are mutually independent. Thus, setting X f = u∈U X f,u , we have that X f is a binomial random variable with expectation E[X f ] = m/2. By Hoeffding's inequality, the probability that X f −E[X f ] ≥ t is at most e −2t 2 /m . In particular, the probability that X f ≥ (1/2+ǫ)m is at most e −2ǫ 2 m . By the union bound, the probability that some f satisfies Similarly, for each assignment f : We claim that the expectation of the random variable Y f,u is 1/2. To see this, consider two cases: In case 1), either all eight equations that come from u are satisfied, or none is, and each possibility happens with probability 1/2 according to the outcome of the random choice of b u . The expectation of Y f,u is thus 1/2 in this case. In case 2), exactly half of the eight equations that come from u are satisfied, and which half depends on the outcome of the random choice of b u . The expectation of Y f,u is thus 1/2 also in this case. This shows that the expectation of Y f,u is 1/2 in either case. Moreover, the random variables Y f,u , as u ranges over U, are mutually independent. Thus, setting Y f = (1/m) u∈U Y f,u , we have that Y f is the average of m independent random variables with range in [0, 1]. By Hoeffding's inequality, the probability that In particular, the probability that Y f ≥ 1/2 + ǫ is at most e −2ǫ 2 m . By the union bound, the probability that some f satisfies Y f ≥ 1/2 + ǫ is at most 2 n e −2ǫ 2 m .
Since $m = cn$ and $c > 1/\epsilon^2$, twice $2^n e^{-2\epsilon^2 m}$ is at most $2^{n+1} e^{-2n}$ and is less than δ for all sufficiently large values of n. Thus, for any large enough n, the probability that both Ax = b and G(Ax = b) are at most (1/2 + ǫ)-satisfiable is at least 1 − δ.
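The final estimate in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `smallest_c` and `failure_bound` are hypothetical, and the code only evaluates the bound $2 \cdot 2^n e^{-2\epsilon^2 m}$ with $m = cn$ for the smallest admissible integer c.

```python
import math
from fractions import Fraction

def smallest_c(eps):
    """Smallest integer strictly bigger than 1/eps^2 (eps given as a Fraction)."""
    return math.floor(1 / eps**2) + 1

def failure_bound(n, c, eps):
    """Value of 2 * 2^n * exp(-2 * eps^2 * m) with m = c * n (sum of the two union bounds)."""
    m = c * n
    return 2.0 * math.exp(n * math.log(2) - 2 * float(eps) ** 2 * m)

eps = Fraction(1, 10)
c = smallest_c(eps)
for n in (1, 10, 100):
    # The bound is dominated by 2^(n+1) * e^(-2n), which tends to 0 as n grows.
    print(n, failure_bound(n, c, eps), 2 ** (n + 1) * math.exp(-2 * n))
```

Since the bound decays exponentially in n, any fixed δ > 0 is eventually exceeded by its reciprocal, which is all the proof needs.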
The next step is to show that an appropriate choice of the matrix A will give a locally satisfiable instance Ax = b for any right-hand side b. Entirely analogous claims have been known and proved in the context of the proof complexity of propositional resolution; indeed, our proof builds on the methods for resolution width [9], and their relationship to existential pebble games from [5,7].
In the proof, we need the notion of a graph G that is a bipartite unique-neighbour expander graph with parameters (m, n, d, s, e) where m, n, d and s are integer parameters with s < n and e is a positive real number. What this means is that G is a bipartite graph with parts U and V with m and n vertices respectively; each u ∈ U has exactly d neighbours in V ; and for every A ⊆ U with |A| ≤ s we have |∂A| ≥ e|A|, where ∂A denotes the set of vertices in V that are unique neighbours of A; i.e., they are neighbours of a single vertex in A.
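To make the definition concrete, here is a small brute-force sketch. The representation (a dictionary `adj` mapping each left vertex to its set of right neighbours) and the function names are assumptions made for illustration; the check enumerates all subsets of size at most s and is only feasible for tiny graphs.

```python
from itertools import combinations

def unique_neighbours(adj, A):
    """Right vertices that are adjacent to exactly one vertex of A."""
    counts = {}
    for u in A:
        for v in adj[u]:
            counts[v] = counts.get(v, 0) + 1
    return {v for v, k in counts.items() if k == 1}

def is_unique_neighbour_expander(adj, s, e):
    """Check that every A of size at most s has at least e * |A| unique neighbours."""
    left = list(adj)
    for size in range(1, s + 1):
        for A in combinations(left, size):
            if len(unique_neighbours(adj, A)) < e * size:
                return False
    return True

# Toy example with d = 3: three left vertices, six right vertices.
adj = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5, 0}}
print(unique_neighbours(adj, (0, 1)))
print(is_unique_neighbour_expander(adj, s=2, e=2))
```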
Lemma 5. For every integer c > 0 there is a real γ > 0 such that for every sufficiently large integer n there is a matrix A ∈ {0, 1} m×n , where m = cn, such that each row of A has exactly three ones and, for every vector b ∈ {0, 1} m , the 3XOR instance Ax = b is k-locally satisfiable for k ≤ γn.
Proof. Fix an integer c > 0 and reals α > 0 and e > 0, and let $n_0$ be sufficiently large that for every $n \geq n_0$ there exists a graph G that is a bipartite unique-neighbour expander graph with parameters (cn, n, 3, αn, e). For the existence of such graphs with these parameters see [27, Chapter 4]. Let $A \in \{0, 1\}^{U \times V}$ be the incidence matrix of G, where U = [m] and V = [n] are the two sides of G, for m = cn. For each $b = (b_u : u \in U) \in \{0, 1\}^U$, the 3XOR instance Ax = b has one variable $x_v$ for each v ∈ V , and one equation $e_u : x_{v_1(u)} + x_{v_2(u)} + x_{v_3(u)} = b_u$ for each u ∈ U, where $v_1(u), v_2(u), v_3(u)$ are the three neighbours of u in G. We claim that every choice of $b \in \{0, 1\}^U$ gives that Ax = b is k-locally satisfiable for k ≤ γn with γ = eα/9.
Proof. If I is satisfiable, then Duplicator certainly has a winning strategy and there is nothing to prove. Assume then that I is unsatisfiable and let I′ be a minimally unsatisfiable subsystem; that is, a subset of the equations of I that is unsatisfiable and such that every proper subset of it is satisfiable. For each equation $e_u$ : where $z^{(e)}$ stands for the negative literal ¬z if e = 0 and the positive literal z if e = 1. Let F be the 3CNF formula that is the union of all the $F_u$ as u ranges over U. Observe that F is an unsatisfiable 3CNF. We intend to apply Theorem 5.9 from [9] to it.
Let A be the collection of all Boolean functions f u : for u ∈ U. Each function in A is sensitive in the sense of Definition 5.5 from [9], and compatible with F in the sense of Definition 5.3 from [9]. Moreover, if A 0 ⊆ A is the set of functions that corresponds to the minimally unsatisfiable subsystem I ′ of I, then its cardinality m 0 satisfies m 0 > αn by Claim 6. It follows that the expansion e(A ) in the sense of Definition 5.8 from [9] is at least eαn/3. By Theorem 5.9 in [9], every resolution refutation of F requires width at least eαn/3, and hence at least 3k since k ≤ γn = eαn/9. By Theorem 2 in [7], Duplicator has a winning strategy for the existential 3k-pebble game played on the structures F and the constraint language Γ 3SAT of 3SAT, in the second encoding discussed in Section 3.1. We use this winning strategy to design a winning strategy for Duplicator in the existential k-pebble game played on I and Γ 3XOR .
While playing the game on I, Duplicator plays the game on F on the side and keeps the invariant that each pebbled variable in the game on I is also pebbled in the side game, and each pebbled equation in the game on I has its three variables pebbled in the side game. Whenever a new variable is pebbled in the game on I, Duplicator pebbles the same variable in the side game, and copies the answer from its strategy on it. Whenever a new equation is pebbled in the game on I, Duplicator pebbles its three variables in the side game, and answers the pebbled equation accordingly from its strategy. Since at each position of the game on I there are no more than k pebbles on the board, at each time during the simulation the side game has no more than 3k pebbles on the board. This shows that the simulation can be carried on forever and the proof is complete.
This completes the proof of Lemma 5.
We can now prove our first two gap theorems.
Theorem 8. For every real ǫ > 0, if C is the collection of 3XOR instances that are satisfiable and D is the collection of 3XOR instances that are not (1/2 + ǫ)-satisfiable, then C and D are not C k -separable for any k = o(n).
Proof. By combining Lemma 5 with Lemma 4, there is a family of systems (S k ) k≥1 with O(k) variables and equations such that G(S k ) is not (1/2+ǫ)-satisfiable but S k is k-locally satisfiable. Let I 1 k = G(S k ) and I 0 k = G(S 0 k ). Then I 0 k ≡ C k I 1 k by Lemma 2. Moreover, by the first part of Lemma 3, the instance I 0 k is satisfiable while, by choice, the instance I 1 k is not (1/2 + ǫ)-satisfiable. Since I 0 k and I 1 k have two variables for each variable in S k and eight equations for each equation in S k , they also have O(k) variables and equations and the result follows.
Theorem 9. For every real ǫ > 0, if C is the collection of 3SAT instances that are satisfiable and D is the collection of 3SAT instances that are not (7/8 + ǫ)-satisfiable, then C and D are not C k -separable for any k = o(n).
Proof. Consider the reduction Θ from 3XOR to 3SAT that translates each equation into a conjunction of four clauses. Thus x + y + z = d translates into the four clauses $\{x^{(a)}, y^{(b)}, z^{(c)}\}$ with $a, b, c \in F_2$ and a + b + c = d, where $z^{(e)}$ stands for the negative literal ¬z if e = 0 and the positive literal z if e = 1. This is easily defined in first-order logic. As the set of variables in I is the same as in Θ(I), it is linearly bounded. We claim that applying Θ to Theorem 8 with ǫ reset to ǫ/4 gives the theorem through Lemma 1. First, it is clear that if I is a 3XOR instance that is satisfiable, then Θ(I) is also satisfiable. Now, suppose that I is a system of m equations that is not (1/2 + ǫ/4)-satisfiable, and let g be an assignment of truth values to the variables X of Θ(I). Applied to I, the assignment g falsifies at least (1/2 − ǫ/4)m of the equations. For each falsified equation, g must falsify at least one of the four corresponding clauses in Θ(I). Thus, g falsifies at least (1/2 − ǫ/4)m clauses in Θ(I) and so satisfies at most 4m − (1/2 − ǫ/4)m = (7/8 + ǫ/16) · 4m ≤ (7/8 + ǫ) · 4m of the 4m clauses.
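The clause gadget used by Θ is small enough to check exhaustively. The sketch below is illustrative only: the encoding of a literal as a pair `(variable, e)`, read as the positive literal when e = 1 and the negative literal when e = 0, and the function names are assumptions. It confirms that an assignment satisfying the equation satisfies all four clauses, while an assignment falsifying the equation falsifies exactly one of them.

```python
from itertools import product

def clauses_for_equation(x, y, z, d):
    """The four clauses {x^(a), y^(b), z^(c)} with a + b + c = d over F_2."""
    return [((x, a), (y, b), (z, c))
            for a, b, c in product((0, 1), repeat=3)
            if (a + b + c) % 2 == d]

def clause_satisfied(clause, g):
    """A literal (var, e) is true under g exactly when g[var] == e."""
    return any(g[var] == e for var, e in clause)

def falsified_count(x, y, z, d, g):
    """Number of the four clauses for x + y + z = d that g falsifies."""
    return sum(not clause_satisfied(cl, g)
               for cl in clauses_for_equation(x, y, z, d))

for d in (0, 1):
    for vals in product((0, 1), repeat=3):
        g = dict(zip("xyz", vals))
        print(d, vals, falsified_count("x", "y", "z", d, g))
```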

Long Code Reductions
In this section we show that certain reductions from the theory of inapproximability of MAX 3XOR and MAX 3SAT can be expressed as FO-interpretations. While no reduction can provide an improvement on the already optimal inapproximability results that are implied by Theorems 8 and 9, these FO-interpretations have the merit of providing optimal gap pairs starting at any initial gap pair, provided the initial gap pair exhibits any constant gap separation whatsoever. In addition, the details of the FO-interpretations that we work out here will also be useful when we discuss the reductions to the vertex-cover problem in the next section.

Parallel repetition
An instance I of the LABEL COVER problem is given by two disjoint sets of variables U and V with domains of values A and B, respectively, a predicate P : U × V × A × B → {0, 1}, and an assignment of weights W : U × V → N. If all the non-zero weights W (u, v) are equal, then the instance is said to have uniform weights. If for all u ∈ U the sums $W(u) := \sum_{v \in V} W(u, v)$ of incident weights are equal, then the instance is called left-regular. A right-regular instance is defined analogously in terms of $W(v) := \sum_{u \in U} W(u, v)$. The instance is a projection game if for every (u, v) ∈ U × V with W (u, v) ≠ 0 it holds that for every a ∈ A there is exactly one b ∈ B satisfying P (u, v, a, b) = 1. It is called a unique game if |A| = |B| and it is a projection game both ways: from A to B, and from B to A. The instance is said to have parameters (m, n, p, q) if |U| = m, |V | = n, |A| = p and |B| = q. Its domain size is p + q.
A value-assignment for an instance I is a pair of functions f : U → A and g : V → B. The weight v(f, g) of the value-assignment (f, g) is the total weight of the pairs (u, v) ∈ U × V with P (u, v, f (u), g(v)) = 1. For c ∈ [0, 1], we say that the instance is c-satisfiable if there is a value-assignment whose weight is at least $c \cdot W_0$, where $W_0 = \sum_{(u,v) \in U \times V} W(u, v)$ is the maximum possible weight. We call it satisfiable if it is 1-satisfiable.
The bipartite reduction takes an instance I of 3XOR and produces a projection game instance L(I) of LABEL COVER defined as follows. The sets U and V are the set of equations in I and the set of variables in I, respectively. The weight W (u, v) is 1 if v is one of the variables in the equation u, and 0 otherwise. The domains of values associated to U and V are $A = \{(a_1, a_2, a_3) \in F_2^3 : a_1 + a_2 + a_3 = 0\}$ and $B = F_2$, respectively. The predicate P associates to the pair (u, v), where u is the equation $v_1 + v_2 + v_3 = b$ and $v = v_i$, the set of pairs $((a_1, a_2, a_3), a) \in A \times B$ satisfying $a = a_i + b$. In other words, $P(u, v, (a_1, a_2, a_3), a) = 1$ if, and only if, v appears in the equation u, and if u is $v_1 + v_2 + v_3 = b$, then the assignment $\{v_1 \mapsto a_1 + b, v_2 \mapsto a_2 + b, v_3 \mapsto a_3 + b\}$, which satisfies u by construction, agrees with the (partial) assignment $\{v_i \mapsto a\}$. Clearly, this defines a projection game.
satisfied by (f, g). Thus (f, g) satisfies at least 3cm of the 3m constraints in L(I), so L(I) is c-satisfiable. For proving 2, let (f, g) be an assignment for L(I) that satisfies at least (s + 2)m of the 3m constraints in L(I).
For each variable v in I, define h(v) = g(v). Let t be the number of equations of I that are satisfied by h. In terms of t, the assignment (f, g) satisfies at most 3t + 2(m − t) of the 3m constraints of L(I). Thus t ≥ sm, so I is s-satisfiable.
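The counting in this proof, 3 satisfied constraints for each equation satisfied by h and at most 2 for each of the others, can be observed by brute force on a small instance. The following is only a sketch under stated assumptions: equations are encoded as pairs `((v1, v2, v3), b)` with three distinct variables, labels are the triples over F_2 with even sum, a constraint (u, v_i) is satisfied when g(v_i) = a_i + b, and all names are hypothetical.

```python
from itertools import product

# Labels for equations: triples (a1, a2, a3) over F_2 with a1 + a2 + a3 = 0.
LABELS = [a for a in product((0, 1), repeat=3) if sum(a) % 2 == 0]

def satisfied_constraints(vs, b, a, g):
    """Constraints (u, v_i) of L(I) satisfied when equation u gets label a."""
    return sum(g[v] == (a[i] + b) % 2 for i, v in enumerate(vs))

def max_3xor(equations, n):
    """Maximum number of simultaneously satisfiable equations."""
    return max(sum((g[v1] + g[v2] + g[v3]) % 2 == b
                   for (v1, v2, v3), b in equations)
               for g in product((0, 1), repeat=n))

def max_label_cover(equations, n):
    """Maximum number of satisfied constraints of L(I); for a fixed g each
    equation can choose its best label independently of the others."""
    return max(sum(max(satisfied_constraints(vs, b, a, g) for a in LABELS)
                   for vs, b in equations)
               for g in product((0, 1), repeat=n))

equations = [((0, 1, 2), 0), ((0, 1, 2), 1), ((1, 2, 3), 1)]
m, n = len(equations), 4
print(max_3xor(equations, n), max_label_cover(equations, n))
```

On this example the optimum of L(I) equals 2m plus the optimum of I, which matches the bound 3t + 2(m − t) used in the proof.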
The parallel repetition reduction takes an instance I of LABEL COVER, and a positive integer t ≥ 1, and produces another instance R(I, t) of LABEL COVER defined as follows. Let U and V be the sets of variables in I and let W : U ×V → N be the weight assignment. The sets of variables of R(I, t) are U t and V t .
If A and B are the domains of values associated to U and V , then the domains of values associated to U t and V t are A t and B t respectively. For Observe that this definition guarantees that if I is a projection game, then so is R(I, t).
Theorem 11 (Parallel Repetition Theorem [24,18]). There exists a constant α > 0 such that for every instance I of LABEL COVER with domain size at most d ≥ 1, every s ∈ [0, 1] and every t ≥ 1 the following hold: Moreover, if I is a projection game, left-regular, right-regular, or has uniform weights, then so is R(I, t).
Although it is the case that the bipartite and the parallel repetition reductions are both FO-interpretations, we do not need to formulate this. Instead, we show the FO-definability of the composition of these reductions with the long-code reductions that we discuss next.

First long-code reduction
The first long-code reduction that we consider takes a projection game instance I of LABEL COVER and a rational ǫ ∈ [0, 1] and produces an instance C(I, ǫ) of 3XOR defined as follows. Let U and V be the sets of variables of sizes m and n, respectively, with associated domains of values A = [p] and B = [q], let W : U × V → N be the weight assignment, let P : U × V × A × B → {0, 1} be the predicate of I, and for each (u, v) ∈ U × V with W (u, v) ≠ 0 and each a ∈ A let $\pi_{u,v}(a)$ be the unique value b ∈ B that satisfies P (u, v, a, b) = 1. The existence of such a function $\pi_{u,v} : A \to B$ is guaranteed by the assumption that I is a projection game. The set of variables of C(I, ǫ) includes one variable u(a) for each u ∈ U and $a \in F_2^{p-1}$, and one variable v(b) for each v ∈ V and $b \in F_2^{q-1}$, for a total of $m 2^{p-1} + n 2^{q-1}$ variables. Before we are able to define the set of equations of C(I, ǫ) we need a piece of notation. For a vector $z = (z_1, \ldots, z_d) \in F_2^d$ of dimension d ≥ 2, we write $S(z) = z_d$ and $F(z) = (z_1 + S(z), \ldots, z_{d-1} + S(z))$. Note that S(z) is a single field element, and F (z) is a vector of dimension d − 1. With this notation, the set of equations of and each $y, z \in F_2^p$, where M is the denominator of ǫ = N/M reduced to lowest terms, D is the number of positions i ∈ [p] such that $z_i = x_{\pi(i)} + y_i$, and $\pi = \pi_{u,v}$ if W (u, v) ≠ 0.
Theorem 12 (Håstad 3-Query Linear Test [16]). For every s, ǫ ∈ [0, 1] with ǫ > 0 and s > 0 and every projection game instance I of LABEL COVER, the following hold: The proof of Theorem 12 follows from Lemmas 5.1 and 5.2 in [16]. In order to see this, we need to explain how our notation matches the one in [16]. Besides the obvious and minor correspondence between multiplicative and additive notation for F 2 , with −1 ↔ 1 and +1 ↔ 0, there are three other noticeable differences between the statement of Theorem 12 and the statements of Lemmas 5.1 and 5.2 in [16].
The first difference is that Theorem 12 applies to arbitrary projection game instances of LABEL COVER, while the statements in [16] are phrased only for the special cases of the problem that result from applying the parallel repetition construction to a suitable bipartite reduction applied to a 3SAT instance. We chose to formulate Theorem 12 in this more general and modular form because this is what the proofs of Lemmas 5.1 and 5.2 in [16] show, and also because this is how more recent expositions of these results are presented (see, e.g., [3]).
The second difference is that the conclusion of our statement is phrased in terms of the c-satisfiability of a 3XOR instance, while the statements of Lemmas 5.1 and 5.2 in [16] are phrased in terms of the acceptance rate of a probabilistic test that has the following form: given access to certain tables A u and A v , with F 2 entries {A u (x)} x∈I and {A v (y)} y∈J for certain index sets I and J, respectively, choose a random 3-variables parity test on the A u (x) and A v (y) entries under a well-designed special-purpose distribution, and check if it is satisfied. This difference is only notational and minor: our instance of XOR is built by viewing the A u (x) and A v (y) entries as variables u(x) and v(y), and assigning weight to each 3-variable parity equation on these variables proportionally to the probability that it is checked by the probabilistic test on the A u and A v tables. With this change, c-satisfiability of the instance translates into the probability of acceptance of the test being at least c, and vice-versa.
The third difference in the notation is that our variables u(x) and v(y), and the corresponding entries A u (x) and A v (y) of the tables A u and A v , are indexed by F p−1 2 and F q−1 2 instead of the more natural F p 2 and F q 2 , respectively. This is due to the fact that we implement the operations of folding over true and conditioning upon h from [16] directly in our construction. In other words, our tables A u and A v are what [16] calls A W,h,true and A U,true , respectively. Folding over true as in A U,true is achieved for A v through the notation S(z) and F (z) defined above: we chose to partition F p 2 into 2 p−1 pairs of the form (z, 0), (F ((z, 1)), 1), as z ranges over F p−1 2 , and view an arbitrary It is straightforward to see that A ′ v is folded over true, in the definition of [16], by construction.
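The folding mechanism via S(z) and F (z) can be illustrated with a few lines of code. This is a sketch under an assumption that is not spelled out in the surviving text: a table indexed by vectors of dimension d − 1 is read at a vector x of dimension d as table[F(x)] + S(x) over F_2. The helper name `folded_lookup` is hypothetical. Under that reading, x and its complement always share the same table entry and receive opposite values, which is the folding property.

```python
from itertools import product
import random

def S(z):
    """Last coordinate of z."""
    return z[-1]

def F(z):
    """First d - 1 coordinates of z, each shifted by S(z) over F_2."""
    s = z[-1]
    return tuple((zi + s) % 2 for zi in z[:-1])

def folded_lookup(table, x):
    """Value at x of the folded table (assumed rule: table[F(x)] + S(x))."""
    return (table[F(x)] + S(x)) % 2

d = 4
random.seed(0)
# An arbitrary table indexed by F_2^(d-1).
table = {z: random.randint(0, 1) for z in product((0, 1), repeat=d - 1)}

for x in product((0, 1), repeat=d):
    comp = tuple(1 - xi for xi in x)  # x plus the all-ones vector
    print(x, F(x) == F(comp), folded_lookup(table, x), folded_lookup(table, comp))
```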
Conditioning upon h as in A W,h,true for A u is achieved through the same mechanism as folding over true with the additional observation that the operation of conditioning upon h is necessary only if the instance of LABEL COVER fails to satisfy the property that for every (u, v) ∈ U × V and every a ∈ A there is at least one b ∈ B that satisfies the predicate P (u, v, a, b). When this is the case, one defines h = h u,v : A → {0, 1} as the predicate indicating if a given a has at least one b that satisfies P (u, v, a, b), and conditions the table A u upon h. In our case we do not require this since the given instance of LABEL COVER is a projection game instance, and, in particular, for every a there is exactly one b, and hence at least one b, such that P (u, v, a, b) = 1; i.e., h = h u,v is the constant 1 predicate. It should be added that the reason why we can assume that I is a projection game instance is that our bipartite reduction is designed in such a way that the values a in A are partial assignments that always satisfy the corresponding constraints u in U. In contrast, in [16] the values are taken as arbitrary truth assignments to the variables of a collection of clauses, and not all such assignments satisfy all the clauses. Our exposition is again more modular and also matches more recent expositions of the results in [16] (again, see, e.g., [3]).
With this notational correspondence, it is now easy to see that Lemma 5.1 in [16] gives the first claim in Theorem 12, and Lemma 5.2 in [16] applied with δ = (s/ǫ) 1/2 /4 gives the second claim in Theorem 12.
Next, by composing Lemma 10, Theorem 11, and Theorem 12 with the appropriate parameters we get the following: Theorem 13. For every s, ǫ ∈ [0, 1] with 0 < s < 1 and ǫ > 0, there is an FO-interpretation Θ that maps instances of 3XOR to instances of 3XOR in such a way that, for every 3XOR instance I the following hold: Proof. First we define Θ(I) and then check that this definition is an FO-interpretation. In anticipation of the proof, let t be a large enough integer so that the following inequality holds: where α is the constant in Theorem 11. Such a t exists because s < 1 and ǫ > 0. Apply the bipartite reduction to I to obtain the instance I ′ = L(I) from Lemma 10. Observe that the domain size d of I ′ is |A| + |B| = 6. Next apply the parallel repetition reduction to I ′ with parameter t to obtain a new instance I ′′ . Finally apply the long-code reduction to I ′′ with parameter ǫ to obtain the system I ′′′ . The parameters were chosen in a way that the system I ′′′ satisfies properties 1 and 2, through Theorem 12.
It remains to argue that I ′′′ can be produced from I by an FO-interpretation. To define I ′ from I there is no difficulty at all: the FO-interpretation is even linear. To define I ′′ from I ′ we note that t is a constant, and that the weights W (u, v) of I ′ are 0 or 1, so again there is no difficulty. In this case the FO-interpretation has dimension t, and it is n t -bounded. To define I ′′′ from I ′′ we note that the domain sizes p and q of the instance I ′′ are constants, indeed p = 4 t and q = 2 t . This means that there are |U| · 2 p−1 variables of type u(a), and |V | · 2 q−1 variables of type v(b), and these are constant multiples of |U| and |V |, respectively. Such domains are FO-definable by the method of finite expansions (see Section 2). Finally, since the weights W (u, v) of I ′′ are still zeros or ones and both ǫ and q are constants, the multiplicities of the equations of I ′′′ are also constants, and hence FO-definable.
It is useful to compare Theorem 13 with Lemma 3. Both statements are reductions that take 3XOR instances to 3XOR instances, and they both preserve gaps. But the reductions differ in what happens to satisfiable instances. For statement 1, in which the extreme case is c = 1, the reduction in Lemma 3 preserves this extremality exactly. In contrast, the reduction in Theorem 13 incurs a vanishing ǫ loss as it produces instances that are only (1 − ǫ)-satisfiable.

Second long-code reduction
The second long-code reduction takes a projection game instance I of LABEL COVER and a rational δ ∈ [0, 1] and produces an instance D(I, δ) of 3SAT defined as follows. Before we define D(I, δ), let us define an intermediate instance D′(I, ǫ) of 3SAT that takes a different parameter ǫ ∈ [0, 1]. Let U, V , m, n, A, B, p, q, W , P , and $\pi_{u,v}(a)$ be as in the first long-code reduction. The set of variables of D′(I, ǫ) is defined as in the first long-code reduction: a variable u(a) for each u ∈ U and each $a \in F_2^{p-1}$, and a variable v(b) for each v ∈ V and each $b \in F_2^{q-1}$. We also use the folding notation F (z) and S(z) from the first long-code reduction. Now the instance D′(I, ǫ) includes $W(u, v) \cdot M^q \cdot \epsilon^D \cdot (1 - \epsilon)^{E-D} \cdot H$ copies of the clause $\{v(F(x))^{(S(x))}, u(F(y))^{(S(y))}, u(F(z))^{(S(z))}\}$ for each (u, v) ∈ U × V , each $x \in F_2^q$ and each $y, z \in F_2^p$, where M is the denominator of ǫ = N/M reduced to lowest terms, E is the number of positions i ∈ [p] with $x_{\pi(i)} = 1$ and D is the number of positions i ∈ [p] with $x_{\pi(i)} = 1$ and $z_i = y_i$ for $\pi = \pi_{u,v}$ if W (u, v) ≠ 0, while H ∈ {0, 1} is the indicator for the event that in each position i ∈ [p] with $x_{\pi(i)} = 0$ we have $z_i = y_i$. Finally, to define the instance D(I, δ), set $t = \lceil \delta^{-1} \rceil$ and $\epsilon_1 = \delta$, and $\epsilon_{i+1} = \delta^{71} 2^{-35} \epsilon_i$ for i = 1, . . . , t − 1, and let the instance be $\bigcup_{i=1}^{t} D'(I, \epsilon_i)$.
Theorem 14 (Håstad 3-Query Disjunction Test [16]). There exists $s_0 > 0$ such that for every s ∈ [0, 1] with $0 < s < s_0$ and every projection game instance I of LABEL COVER the following hold: 1. if I is satisfiable, then C(I, ǫ) is satisfiable, 2. if I is not s-satisfiable, then C(I, ǫ) is not $(7/8 + \log_2(1/s)^{-1/2})$-satisfiable.
For the proof of Theorem 14, see Lemmas 6.12 and 6.13 in [16]. As in the first long-code reduction, some explanation is needed for seeing this.
Besides the notational differences that were already pointed out in the first long-code reduction, the second long-code reduction adds the following. First, the constants 71 and 35 in the definition of ǫ i+1 come from setting c = 1/35 in the definition of Test F3S δ (u) in [16]. According to Lemma 6.9 in [16], this is an acceptable setting of c. Second, the constant s 0 > 0 in Theorem 14 is meant to be chosen small enough so as to ensure that, for each s satisfying s < s 0 , we have 2 −64δ −2 /25 < 2 −dδ −1 log 2 (δ −1 ) for δ = 8 log 2 (1/s) −1/2 /5, where d is the constant hidden in the asymptotic O-notation of Lemma 6.13 in [16]. Such an s 0 exists because N log 2 (N) = o(N 2 ) as N → +∞. With this notation, Lemma 6.12 in [16] gives point 1, and Lemma 6.13 in [16] with δ = 8 log 2 (1/s) −1/2 /5 gives point 2 in Theorem 14.
By composing Lemma 10, Theorem 11, and Theorem 14 with the appropriate parameters we get the following: Theorem 15. For every s, ǫ ∈ [0, 1] with 0 < s < 1 and ǫ > 0, there is an FO-interpretation Θ that maps instances of 3XOR to instances of 3SAT in such a way that, for every 3XOR instance I the following hold: Proof. First we define Θ(I) and then check that this definition is an FO-interpretation. Let t be a large enough integer so that the following inequality holds: where α is the constant in Theorem 11 and s 0 > 0 is small enough as in Theorem 14.
Such a t exists because s < 1 and ǫ > 0 as well as s 0 > 0. Apply the bipartite reduction to I to obtain the instance I ′ = L(I) from Lemma 10. Observe that the domain size d of I ′ is |A| + |B| = 6. Next apply the parallel repetition reduction to I ′ with parameter t to obtain a new instance I ′′ . Finally apply the second long-code reduction to I ′′ to obtain the system I ′′′ . The parameters were chosen so that the system I ′′′ satisfies properties 1 and 2, through Theorem 14. As in the proof of Theorem 13 this reduction is FO-definable.
This gives us another route to Theorem 9.

Vertex Cover
We investigate gap inexpressibility results for the vertex cover problem VC on graphs.
Recall that a set X ⊆ V of vertices in a graph G = (V, E) is a vertex cover if every edge in E has at least one of its endpoints in X. If the graph comes with a weight function w : V → R + , then the weight of X is the sum of the weights of the vertices in X. If the weights of the vertices are omitted in the specification of the graph, then all the vertices are assumed to have unit weight. The problem of finding the minimum weight vertex cover in a graph is a classic NP-complete problem.
In the following we write vc(G) for the weight of a minimum weight vertex cover, and $vcd(G) := vc(G)/W_0$, where $W_0 := \sum_{v \in V} w(v)$, for the vertex cover density. Analogously, we write IS(G) for the weight of a maximum weight independent set, and $isd(G) := IS(G)/W_0$. Clearly vcd(G) = 1 − isd(G) holds for all weighted graphs.
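The identity vcd(G) = 1 − isd(G) holds because a set is a vertex cover exactly when its complement is independent. A brute-force sketch on a small weighted path makes this visible; the function names and the example graph are illustrative assumptions.

```python
from itertools import combinations
from fractions import Fraction

def subsets(vertices):
    """All subsets of the vertex list, as tuples."""
    for k in range(len(vertices) + 1):
        yield from combinations(vertices, k)

def vc(vertices, edges, w):
    """Weight of a minimum weight vertex cover (exhaustive search)."""
    return min(sum(w[v] for v in X) for X in subsets(vertices)
               if all(u in X or v in X for u, v in edges))

def max_independent_weight(vertices, edges, w):
    """Weight of a maximum weight independent set (exhaustive search)."""
    return max(sum(w[v] for v in X) for X in subsets(vertices)
               if not any(u in X and v in X for u, v in edges))

vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
w = {"a": 1, "b": 3, "c": 1}
W0 = sum(w.values())
vcd = Fraction(vc(vertices, edges, w), W0)
isd = Fraction(max_independent_weight(vertices, edges, w), W0)
print(vcd, isd)
```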

Direct reductions
The standard reduction that proves the NP-completeness of the vertex cover problem (see, e.g. [22,Thm. 9.4]) takes an instance I of 3SAT with n variables and m clauses and gives a graph G with 3m vertices in which the minimum vertex cover has size exactly 2cm, if cm is the maximum number of clauses in I that can be simultaneously satisfied. It is also easy to see that this reduction can be given as an FO-interpretation. This interpretation is linearly bounded and therefore it follows from Theorem 9 and Lemma 1 that for any ǫ > 0 the collection of graphs G with vcd(G) ≤ 7/12 + ǫ and the collection of graphs G with vcd(G) ≥ 2/3 cannot be separated in C k for any k = o(n). This has the consequence that no approximation algorithm for the vertex cover problem expressible in FPC can achieve an approximation ratio better than 8/7.
We can improve on this by considering instead the so-called FGLSS reduction from 3XOR to vertex-cover, which we describe next.
Theorem 16. There is a linearly-bounded first-order reduction G that takes an instance I of 3XOR with m equations to a graph G(I) with 4m vertices so that if m * is the maximum number of equations of I that can be simultaneously satisfied, then vc(G(I)) = 4m − m * .
Proof. For each equation x + y + z = b in I, the graph G(I) has a 4-clique of vertices, each labelled with a distinct assignment of values to the three variables that make the equation true. In addition, we have an edge between any pair of vertices that are labelled by inconsistent assignments. It is easily seen that the largest independent set in G(I) is obtained by taking an assignment g of values to the variables of I that satisfies m * equations and, for each satisfied equation, selecting the vertex in its 4-clique that is the projection of g. This yields an independent set of size exactly m * and the result follows.
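The construction in this proof can be tried out on a toy instance. The sketch below is an illustration under assumptions: equations are encoded as `((x, y, z), b)` with three distinct variables, vertices are pairs of an equation index and a satisfying local assignment, and the independence number is found by exhaustive search, which is feasible only for very small m.

```python
from itertools import product, combinations

def fglss(equations):
    """Graph with a 4-clique per equation and edges between inconsistent labels."""
    vertices = [(i, vals) for i, (vs, b) in enumerate(equations)
                for vals in product((0, 1), repeat=3) if sum(vals) % 2 == b]

    def inconsistent(p, q):
        (i, a), (j, c) = p, q
        if i == j:
            return True  # two vertices of the same 4-clique
        ga = dict(zip(equations[i][0], a))
        gc = dict(zip(equations[j][0], c))
        return any(ga[v] != gc[v] for v in ga.keys() & gc.keys())

    edges = [(p, q) for p, q in combinations(vertices, 2) if inconsistent(p, q)]
    return vertices, edges

def independence_number(vertices, edges):
    """Size of a largest independent set, by exhaustive search."""
    edge_set = set(edges)
    for k in range(len(vertices), 0, -1):
        for X in combinations(vertices, k):
            if not any((p, q) in edge_set for p, q in combinations(X, 2)):
                return k
    return 0

# The first two equations contradict each other, so m* = 2 here.
equations = [((0, 1, 2), 0), ((0, 1, 2), 1), ((1, 2, 3), 1)]
vertices, edges = fglss(equations)
alpha = independence_number(vertices, edges)
print(len(vertices), alpha, len(vertices) - alpha)
```

Here the minimum vertex cover has size 4m − m* = 12 − 2 = 10, as the theorem predicts.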
From this, and Theorem 8, we immediately get the following result.
Corollary 17. For any ǫ > 0, if C is the collection of graphs G with vcd(G) ≤ 3/4 and D is the collection of graphs G with vcd(G) ≥ 7/8 − ǫ then C and D are not C k -separable for any k = o(n).
This improves the FPC inapproximability ratio from 8/7 to 7/6. Better lower bounds on the approximation ratio are known under the assumption that P ≠ NP. One such lower bound was achieved by Dinur and Safra [14] who showed that, under this assumption, no polynomial-time algorithm for approximating vertex cover can achieve an approximation ratio better than 1.36. In the next section we argue that this reduction is also an FO-interpretation, so we get the same inapproximability ratio for algorithms that are expressible in FPC, giving a strengthening of Corollary 17.
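The ratio 7/6 is simply the quotient of the two density thresholds in Corollary 17 as ǫ tends to 0. The following short sketch spells out that arithmetic with exact rationals; the helper name `gap_ratio` is hypothetical.

```python
from fractions import Fraction

def gap_ratio(low, high):
    """Ratio between the high and the low vertex cover density thresholds."""
    return high / low

# Thresholds of Corollary 17 in the limit eps -> 0.
ratio = gap_ratio(Fraction(3, 4), Fraction(7, 8))
print(ratio, float(ratio))
```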

Dinur-Safra reduction
As in the long-code reductions from Section 4, this reduction is also composed of several steps: we start with the bipartite reduction, continue with the parallel repetition reduction, then we apply an intermediate reduction to a technical variant of the independent set problem, and end with a long-code reduction that is specially tailored for the vertex cover problem.
The intermediate reduction takes a projection game instance I of LABEL COVER as input and produces an undirected graph G(I) defined as follows. Let U, V , A, B, W and $\pi_{u,v}$ determine the projection game instance I. The set of vertices of the graph G(I) is U × A. There is an edge between $(u_1, a_1) \in U \times A$ and $(u_2, a_2) \in U \times A$ in G(I) if, and only if, either $u_1 = u_2$ and $a_1 \neq a_2$, or $u_1 \neq u_2$ and there exists v ∈ V such that $W(u_1, v) > 0$ and $W(u_2, v) > 0$ and $\pi_{u_1,v}(a_1) \neq \pi_{u_2,v}(a_2)$. This defines G(I). In the terminology of [14], the graph G(I) is (m, p)-co-partite: its edge-set is the complement of an m-partite graph with all its parts of size p.
For an undirected (unweighted) graph G, recall that IS(G) denotes the size of a largest independent set in G. For a positive integer h ≥ 2, let IS h (G) denote the size of a largest subset of vertices of G that does not contain any h-clique. Note that IS h (G) ≥ IS(G), and IS 2 (G) = IS(G).
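For small graphs the quantity $IS_h(G)$ can be computed directly from the definition. The sketch below uses exhaustive search and hypothetical names; it is meant only to illustrate that $IS_2(G) = IS(G)$ and that $IS_h(G)$ can only grow with h.

```python
from itertools import combinations

def is_h(vertices, edges, h):
    """Size of a largest vertex subset containing no h-clique (exhaustive search)."""
    edge_set = {frozenset(e) for e in edges}

    def has_h_clique(X):
        return any(all(frozenset(p) in edge_set for p in combinations(C, 2))
                   for C in combinations(X, h))

    for k in range(len(vertices), 0, -1):
        if any(not has_h_clique(X) for X in combinations(vertices, k)):
            return k
    return 0

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print([is_h(vertices, edges, h) for h in (2, 3, 4)])
```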
where $l = 2 l_1 \cdot r$, and $l_1$ is an integer that depends only on h, ǫ, and p, and is independent of r, that is set as in Definition 2.3 of [14]. Here, and in the following, $P_{=k}(X)$ and $P_{\geq k}(X)$ denote the collections of subsets of X of size exactly k and size at least k, respectively, and P(X) denotes the collection of all subsets of X. Thus, if n = mp is the number of vertices of G, then H(G, ǫ, p, h; r) has $\binom{n}{l} \cdot 2^{\sum_{i=l_1}^{l} \binom{l}{i}}$ vertices. Since we want to be able to show that for fixed r, h, ǫ and p the graph H(G, ǫ, p, h; r) can be produced from G by an FO-interpretation, we give an alternative presentation of the set of vertices W .
Let $V^{l,\neq}$ denote the set of l-tuples of pairwise distinct elements from V . Formally, $V^{l,\neq} = \{(u_1, \ldots, u_l) \in V^l : u_i \neq u_j \text{ for all } i \neq j\}$. For each $u = (u_1, \ldots, u_l) \in V^{l,\neq}$, let $\sigma_u : \{1, \ldots, l\} \to \{u_1, \ldots, u_l\}$ be the natural bijection defined by $\sigma_u(i) = u_i$ for i = 1, . . . , l. The set $W' := V^{l,\neq} \times P(P_{\geq l_1}([l]))$ is a good proxy for the set W through the identification of {1, . . . , l} and $\{u_1, \ldots, u_l\}$ given by $\sigma_u$. Now, turning W′ into a faithful copy of W is only a matter of taking a quotient with the appropriate equivalence relation, as we do next. Let ∼ be the equivalence relation on $V^l$ defined by $(u_1, \ldots, u_l) \sim (v_1, \ldots, v_l)$ if and only if for each i ∈ [l] there exists j ∈ [l] with $v_j = u_i$ and for each j ∈ [l] there exists i ∈ [l] with $u_i = v_j$. Restricted to $V^{l,\neq} \subseteq V^l$, this is still an equivalence relation. Moreover, whenever $u = (u_1, \ldots, u_l)$ and $v = (v_1, \ldots, v_l)$ are ∼-equivalent tuples in $V^{l,\neq}$, there is a unique permutation $\pi \in S_l$ that sends u to v; i.e., that satisfies π · u = v, or $u_{\pi(i)} = v_i$ for each i ∈ [l]. Now we extend this equivalence relation ∼ from the set $V^{l,\neq}$ to the set $V^{l,\neq} \times P(P_{\geq l_1}([l]))$ as follows: (u, S) ∼ (v, T ) if, and only if, u ∼ v and the unique permutation $\pi \in S_l$ that sends u to v also sends S to T ; i.e., it satisfies π · S = T , where π · S denotes the natural action of π on S. It is not hard to see that the set of equivalence classes $W'' := (V^{l,\neq} \times P(P_{\geq l_1}([l])))/\sim$ is an alternative presentation of the same set W . This alternative presentation of W is useful when we argue that the reduction is an FO-interpretation in Theorem 20 below. We still need to define the vertex-weights and the edge-set of H(G, ǫ, p, h; r). The weight of a vertex (B, S) in W is defined as where M is the denominator of p = N/M reduced to lowest terms, and $q = |P_{\geq l_1}(B)|$.
Next we define the edge-set: two vertices (B 1 , S 1 ) and (B 2 , S 2 ) in W are adjacent if, and only if, either B 1 = B 2 and S 1 ∩ S 2 = ∅, or there exist an edge {v 1 , v 2 } ∈ E of G and an (l − 1)-element subsetB of V such that B 1 =B ∪ {v 1 } and B 2 =B ∪ {v 2 } and, for all Theorem 19 (Dinur-Safra Vertex-Cover Test [14]). For any two rationals ǫ and p satisfying 0 < ǫ ≤ 1 and 0 < p < p max = (3 − √ 5)/2, any small enough s 0 > 0, any large enough integer h, and any (m, r)-co-partite graph G, the following hold: 1. if IS(G) = m, then isd(H(G, ǫ, p, h; r)) ≥ p − ǫ, For the proof of Theorem 19, see Theorem 2.2 in [14].
The reduction described above produces a weighted graph H(G, ǫ, p, h; r). The weights, as defined in (8), are non-negative integers with a maximum value of $M^q$. This value depends on ǫ, p, h and r but is independent of the number of vertices of G. In other words, fixing the other parameters, H gives us a translation from G to a weighted graph, with integer weights bounded by a constant. This can be easily modified to get an unweighted graph. Indeed, let H = (V, E, W ) be a graph with a weight function W : V → N. We define from this an unweighted graph H′ with vcd(H′) = vcd(H). This is obtained by replacing each vertex v by the set of vertices $v^* := \{v\} \times [W(v)]$ and having an edge between (u, i) and (v, j) if, and only if, {u, v} ∈ E. To see that this has the right property, it is sufficient to observe that S ⊆ V is a minimum weight vertex cover in H if, and only if, $S^* := \bigcup_{v \in S} v^*$ is a minimum vertex cover of H′. The direction from right to left is obvious. For the other direction, suppose that H′ has a minimum vertex cover X that is not of this form. In particular, for some v ∈ V , $v^* \cap X \neq \emptyset$ and $v^* \not\subseteq X$. But then $X \setminus v^*$ is still a vertex cover, contradicting the minimality of X.
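The conversion from weighted to unweighted graphs is easy to experiment with. The following sketch, with hypothetical function names and a toy example, builds H′ by replacing each vertex v with W (v) copies and compares the minimum vertex cover weights by exhaustive search.

```python
from itertools import combinations

def min_vertex_cover_weight(vertices, edges, w=None):
    """Minimum weight of a vertex cover; unit weights if w is omitted."""
    w = w or {v: 1 for v in vertices}
    return min(sum(w[v] for v in X)
               for k in range(len(vertices) + 1)
               for X in combinations(vertices, k)
               if all(u in X or v in X for u, v in edges))

def unweight(vertices, edges, w):
    """Replace each vertex v by w[v] copies (v, i); copies inherit all edges of v."""
    new_vertices = [(v, i) for v in vertices for i in range(w[v])]
    new_edges = [((u, i), (v, j)) for u, v in edges
                 for i in range(w[u]) for j in range(w[v])]
    return new_vertices, new_edges

vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
w = {"a": 1, "b": 3, "c": 1}
new_vertices, new_edges = unweight(vertices, edges, w)
print(min_vertex_cover_weight(vertices, edges, w),
      min_vertex_cover_weight(new_vertices, new_edges))
```

Since the number of vertices of H′ equals the total weight of H, equal cover weights give equal vertex cover densities.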
By composing Lemma 10, Theorem 11 and Theorem 19 with the appropriate parameters and combining it with the observation above we get the following.
Theorem 20. For every $s$, $\epsilon$, $p$ with $0 < s, \epsilon < 1$, and $1/3 < p < p_{\max} = (3-\sqrt{5})/2$, there is an FO-interpretation $\Theta$ that maps instances of 3XOR to undirected graphs in such a way that, for every 3XOR instance $I$ the following hold:

Proof. First we define $\Theta(I)$ and then check that it is an FO-interpretation. Let $t$ be a large enough integer so that the following inequality holds: when $s_0$ is small enough, and $h$ is large enough, so that Theorem 19 applies. Such a $t$ exists because $s < 1$ and $s_0 > 0$. Apply the bipartite reduction to $I$ to obtain the instance $I' = L(I)$ from Lemma 10. The domain size of $I'$ is 6. Apply the parallel repetition reduction of Theorem 11 to $I'$ with parameter $t$ to get another instance $I''$.
Next apply the intermediate reduction of Lemma 18 to get a graph G. Finally, apply the Dinur-Safra long-code reduction of Theorem 19 to get a weighted graph H and convert it to an unweighted graph that is the output of Θ. The parameters were chosen in such a way that the points 1 and 2 hold via the relationship vcd(H) = 1 − isd(H).
We still need to check that Θ is an FO-interpretation. As in the proof of Theorem 13, producing I ′ from I and I ′′ from I ′ is straightforward. Producing G = G(I ′′ ) from I ′′ is equally straightforward: the definition of the intermediate reduction is explicit enough that this can be checked directly, especially because the weights of I ′′ are still zeros and ones. On the other hand, producing H = H(G, ǫ, p, h; r) from G requires some explanation.
In the description of the vertex-cover long-code reduction we already described $W''$ as an alternative presentation (7) of $W$ in (4). This alternative presentation suggests that the vertex-set of $H$ be defined by an FO-interpretation of dimension $l$ through the method of finite expansions from Section 2 to produce $W'$ in (6), followed by a quotient by an FO-definable equivalence relation. The method of finite expansions produces a set of the form $V^{l,\neq} \times A$ for some bounded set $A$ that codes $\mathcal{P}(\mathcal{P}_{\geq l_1}([l]))$. The effect of the quotient on $V^{l,\neq} \times A$ can be achieved through the equality-defining formula $\epsilon(x,y)$ of the FO-interpretation, which in this case can be designed as follows. Let $(u,a)$ and $(v,b)$ be two elements of the expanded domain $V^{l,\neq} \times A$. We want $\epsilon(x,y)$ to tell if $u$ and $v$ involve exactly the same elements from $V$ and, in such a case, whether the unique permutation that takes $u$ to $v$ also takes the set of subsets of $[l]$ coded by $a$ to the set of subsets of $[l]$ coded by $b$. The first part can be stated by means of a simple quantifier-free formula. The second part can also be stated by a quantifier-free formula (that depends on $l$) by taking a disjunction over all $l!$ potential permutations of $[l]$.
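The disjunction over permutations can be mimicked directly in code. The sketch below is an illustration under stated assumptions rather than the formula itself: the names `equivalent` and `decode` are hypothetical, indices are 0-based, and the convention for how a permutation acts on index sets is fixed in the docstring (the text only speaks of "the natural action"). The helper `decode` recovers the pair consisting of a block and a family of its subsets that a coded pair stands for, which gives an independent way to check the answer.

```python
from itertools import permutations

def equivalent(u, S, v, T):
    """Decide whether (u, S) and (v, T) are identified by the quotient.

    u, v: tuples of pairwise distinct elements of the same length l.
    S, T: sets of frozensets of indices in range(l).
    Assumed convention: pi sends u to v when v[i] == u[pi[i]] for all i,
    and then an index set t in T corresponds to {pi[j] for j in t} in S.
    """
    l = len(u)
    if len(v) != l:
        return False
    # One disjunct per permutation; since u has distinct entries,
    # at most one permutation can pass the first test.
    for pi in permutations(range(l)):
        if all(v[i] == u[pi[i]] for i in range(l)):
            moved = {frozenset(pi[j] for j in t) for t in T}
            if moved == set(S):
                return True
    return False

def decode(u, S):
    """The pair (B, family of subsets of B) that the coded pair (u, S) stands for."""
    block = frozenset(u)
    family = frozenset(frozenset(u[i] for i in s) for s in S)
    return block, family
```

Two coded pairs should be equivalent exactly when `decode` returns the same value on both, which is the sense in which the quotient is a faithful copy of $W$.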
Once the domain is defined as W ′′ in equation (7), defining the edge-set is easy. Defining the weights is also straightforward given that h, ǫ, p and l are all fixed constants independent of G, and as noted above, we can replace the weights with sets of unweighted vertices. Now we can state the improved version of Corollary 17. Composing Theorem 8, Theorem 20, and Lemma 1 we get the following.
Theorem 21. For any $\epsilon > 0$ there is a $\delta > 0$ such that if $C$ is the collection of graphs $G$ with $\mathrm{vcd}(G) \leq 1 - p_{\max} + \epsilon$ and $D$ is the collection of graphs $G$ with $\mathrm{vcd}(G) \geq 1 - 4p_{\max}^3 + 3p_{\max}^4 - \epsilon$ then $C$ and $D$ are not $C^k$-separable for any $k = o(n^{\delta})$, where $p_{\max} = (3-\sqrt{5})/2$.
In terms of algorithms, Theorem 21 says that no algorithm that can be expressed in FPC, or even $C^k$ for $k = n^{o(1)}$, can achieve an approximation ratio better than $(1 - 4p_{\max}^3 + 3p_{\max}^4)/(1 - p_{\max}) \approx 1.36$. In particular, this means that $n^{\Omega(1)}$ levels of the Lasserre hierarchy are necessary to give an approximation algorithm for vertex cover with an approximation ratio better than 1.36. This result was previously known from the work of Tulsiani [26].
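The constant 1.36 can be checked numerically. The short computation below is only a sanity check of the arithmetic; the variable names are chosen for illustration. Using $p_{\max}^2 = 3p_{\max} - 1$ one can also simplify the ratio by hand to the closed form $10\sqrt{5} - 21$, which the code compares against.

```python
from math import sqrt

p_max = (3 - sqrt(5)) / 2

# The two vertex-cover densities whose quotient is the approximation ratio.
low = 1 - p_max                           # roughly 0.618
high = 1 - 4 * p_max**3 + 3 * p_max**4    # roughly 0.841

ratio = high / low
closed_form = 10 * sqrt(5) - 21

print(ratio, closed_form)
```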

Tight lower and upper bounds for $C^2$
There are straightforward polynomial-time algorithms that yield a vertex cover in a graph with guaranteed approximation ratio 2. It is conjectured that no polynomial-time algorithm can achieve an approximation ratio of 2 − ǫ for any ǫ > 0. It would be interesting to prove a version of this conjecture for algorithms expressible in FPC, and without the assumption that P = NP. This could be established by a strengthened version of Theorem 21 with better ratios. We next show that we can at least do this for the special case of k = 2.
Theorem 22. For any $\epsilon > 0$, if $C$ is the collection of graphs $G$ with $\mathrm{vcd}(G) \leq 1/2$ and $D$ is the collection of graphs $G$ with $\mathrm{vcd}(G) \geq 1 - \epsilon$ then $C$ and $D$ are not $C^2$-separable.
Proof. Fix a degree $d$, large enough in terms of $\epsilon$, and let $(G_n)_{n \in \mathbb{N}}$ be a family of $d$-regular expander graphs on $n$ vertices in which the largest independent set has size at most $\epsilon n$. The degree has to grow as $\epsilon$ shrinks, since every $d$-regular graph on $n$ vertices has an independent set of size at least $n/(d+1)$. For the existence of expander families see [27, Chapter 4]. It follows that the smallest vertex cover in $G_n$ has size at least $(1-\epsilon)n$. Hence, we can choose a value $m$ such that $G_{2m}$ has no vertex cover smaller than $2m(1-\epsilon)$.

Let $H_m$ be a $d$-regular bipartite graph on two sets of $m$ vertices. Now, each part of a bipartite graph is a vertex cover, so $H_m$ has a vertex cover of size $m$. However, it is known that $G \equiv_{C^2} H$ holds for any pair $G$ and $H$ of $d$-regular graphs with the same number of vertices, for any $d$. Thus, $G_{2m} \equiv_{C^2} H_m$ and the result follows.
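The mechanism behind this proof can be seen in miniature on two regular graphs with six vertices. The sketch below is an illustration only: the two graphs ($K_{3,3}$ and the triangular prism) are chosen here and do not come from the proof, and the code checks regularity and minimum vertex cover sizes by brute force but does not itself verify $C^2$-equivalence, which is taken from the fact quoted above.

```python
from itertools import combinations

def degrees(n, edges):
    """Degree sequence of a graph on vertices 0..n-1."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def min_vertex_cover_size(n, edges):
    """Size of a smallest vertex cover, by exhaustive search (tiny graphs only)."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            chosen = set(subset)
            if all(a in chosen or b in chosen for a, b in edges):
                return k
    return n

# K_{3,3}: bipartite with parts {0, 1, 2} and {3, 4, 5}.
k33 = [(a, b) for a in range(3) for b in range(3, 6)]

# Triangular prism: triangles 0-1-2 and 3-4-5 joined by a perfect matching.
prism = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (0, 3), (1, 4), (2, 5)]

print(degrees(6, k33), min_vertex_cover_size(6, k33))
print(degrees(6, prism), min_vertex_cover_size(6, prism))
```

Both graphs have the same degree on every vertex and the same number of vertices, so they fall under the quoted equivalence, yet their minimum vertex covers differ in size. The proof uses the same effect at a scale where the gap approaches a factor of 2.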
Essentially, Theorem 22 tells us that no algorithm that is invariant under $\equiv_{C^2}$ can determine $\mathrm{vc}(G)$ to an approximation better than 2, and Theorem 21 tells us that no algorithm that is invariant under $\equiv_{C^k}$ for constant or even slowly growing $k$ can determine $\mathrm{vc}(G)$ to an approximation better than 1.36. A legitimate question at this point is whether there is any algorithm that is invariant under $\equiv_{C^k}$, as any algorithm expressible in FPC would be, that does achieve an approximation ratio of 2. The natural polynomial-time algorithms that give a vertex cover with size at most $2\,\mathrm{vc}(G)$ are not expressible in FPC. Indeed, we cannot expect a formula of FPC to define an actual vertex cover in a graph $G$ as this is not invariant under automorphisms of $G$. We can only ask for an estimate of the size, i.e. of $\mathrm{vc}(G)$, and this we can get up to a factor of 2. For this, it turns out that $k = 2$ is enough, showing that the lower bound of Theorem 22 is tight:

Theorem 23. For any $\delta$, if $C$ is the collection of graphs $G$ with $\mathrm{vcd}(G) \leq \delta$ and $D$ is the collection of graphs $G$ with $\mathrm{vcd}(G) > 2\delta$ then $C$ and $D$ are $C^2$-separable.
The proof of Theorem 23 proceeds through a series of lemmas.
Lemma 24. If G is a d-regular graph on n vertices, for any d ≥ 1, then vc(G) ≥ n/2.
Proof. Let $S$ be any set of vertices in $G$. Then the number of edges incident on vertices in $S$ is at most $d|S|$. Since the number of edges in $G$ is $dn/2$, if $S$ is a vertex cover then $d|S| \geq dn/2$ and so $|S| \geq n/2$.
Let $G$ be a graph and $C_1,\ldots,C_m$ be the partition of the vertices of $G$ given by vertex refinement. So, there are constants $\delta_{ij}$ such that each $v \in C_i$ has exactly $\delta_{ij}$ neighbours in $C_j$. Since the graph is undirected, the number of edges from $C_i$ to $C_j$ is the same as in the other direction and so $\delta_{ij}|C_i| = \delta_{ji}|C_j|$, for all $i$ and $j$. Also, $\delta_{ij} = 0$ if, and only if, $\delta_{ji} = 0$.
Let $X = \{i \mid \delta_{ii} = 0\}$ and $Y = \{i \mid \delta_{ii} > 0\}$. Consider the undirected graph $X_G$ with vertices $X$ and edges $\{(i,j) \mid \delta_{ij} > 0\}$. Consider the instance $(X_G, w)$ of weighted vertex cover obtained by taking the graph $X_G$ and giving each vertex $i$ the weight $w(i) = |C_i|$. Let $p_G$ denote the value of the minimum weighted vertex cover of this instance. Also, let $q_G = \sum_{i \in Y} |C_i|$. Finally, define $v_G = p_G + q_G$.
Lemma 25. If $G \equiv_{C^2} H$, then $v_G = v_H$.

Proof. The value $v_G$ is determined entirely by the sizes of $C_i$ in the vertex refinement of $G$ and the corresponding values of $\delta_{ij}$. Since $G \equiv_{C^2} H$, these values are the same for $H$.

Lemma 26. For any graph $G$, $\mathrm{vc}(G) \leq v_G$.

Proof. Let $Z \subseteq X$ be a minimum-weight vertex cover in $(X_G, w)$. Take the set $S \subseteq V(G)$ defined by $S = \bigcup_{i \in Y \cup Z} C_i$. Note that the sets $Y$ and $Z$ are disjoint, $\sum_{i \in Y} |C_i| = q_G$ by definition, and $\sum_{i \in Z} |C_i| = p_G$ by construction. So $S$ has exactly $v_G$ vertices. We claim that $S$ is a vertex cover in $G$. Let $e$ be any edge of $G$ with endpoints in $C_i$ and $C_j$. If either $i$ or $j$ is in $Y$, then the corresponding endpoint of $e$ is in $S$ since $C_i \subseteq S$ for all $i \in Y$. If both $i$ and $j$ are not in $Y$ then both are in $X$ and $\delta_{ij} > 0$. Thus, since $Z$ is a vertex cover for the graph $X_G$, one of $i$ or $j$ must be in $Z$ and again at least one endpoint of $e$ is in $S$.
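The quantity $v_G$ defined above can be computed mechanically on small graphs. The following is a minimal sketch under stated assumptions: the function names `refine`, `min_weight_cover` and `v_G` are hypothetical, vertex refinement is implemented as plain colour refinement starting from a single colour, and the weighted vertex cover on the quotient graph is solved by exhaustive search, which is only practical when the number of classes is small.

```python
from itertools import combinations

def refine(n, adj):
    """Colour refinement: returns one colour per vertex of the stable partition."""
    colour = [0] * n
    while True:
        # New colour = old colour plus the multiset of neighbours' colours.
        signature = [
            (colour[v], tuple(sorted(colour[w] for w in adj[v])))
            for v in range(n)
        ]
        ids = {sig: i for i, sig in enumerate(sorted(set(signature)))}
        new_colour = [ids[sig] for sig in signature]
        if len(set(new_colour)) == len(set(colour)):
            return new_colour
        colour = new_colour

def min_weight_cover(vertices, edges, weight):
    """Minimum weight of a vertex cover, by exhaustive search."""
    best = sum(weight[v] for v in vertices)
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            if all(a in chosen or b in chosen for a, b in edges):
                best = min(best, sum(weight[v] for v in chosen))
    return best

def v_G(n, edge_list):
    """The value p_G + q_G for a graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    colour = refine(n, adj)
    classes = sorted(set(colour))
    size = {i: colour.count(i) for i in classes}
    rep = {i: colour.index(i) for i in classes}  # one representative per class
    delta = {
        (i, j): sum(1 for w in adj[rep[i]] if colour[w] == j)
        for i in classes for j in classes
    }
    X = [i for i in classes if delta[i, i] == 0]
    Y = [i for i in classes if delta[i, i] > 0]
    quotient_edges = [(i, j) for i, j in combinations(X, 2) if delta[i, j] > 0]
    p = min_weight_cover(X, quotient_edges, size)
    q = sum(size[i] for i in Y)
    return p + q
```

On a star with three leaves this gives the size of the centre class, and on a triangle it gives all three vertices. In each case the result can be compared with the true minimum vertex cover, obtained from `min_weight_cover` with unit weights, to see that it is an upper bound.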
For the proof of the next lemma, we need the notion of a fractional vertex cover of a graph $G = (V,E)$. This is a function $f : V \to [0,1]$ satisfying the condition that for every $(u,v) \in E$, $f(u) + f(v) \geq 1$. It is known that if $f$ is a fractional vertex cover of $G$, then $\sum_{v \in V} f(v) \geq \mathrm{vc}(G)/2$ (see [28, Thm. 14.2]). More generally, suppose we have an instance of weighted vertex cover, i.e. $G$ along with a weight function $w : V \to \mathbb{N}$ where $\mathrm{vc}(G,w)$ is defined as the value of the minimum weighted vertex cover. Then