Asymptotically Optimal Multi-Paving

Anderson's paving conjecture, now known to hold due to the resolution of the Kadison-Singer problem, asserts that every zero diagonal Hermitian matrix admits non-trivial pavings with dimension independent bounds. In this paper, we develop a technique extending the arguments of Marcus, Spielman and Srivastava in their solution of the Kadison-Singer problem to show the existence of non-trivial pavings for collections of matrices. We show that given zero diagonal Hermitian contractions $A^{(1)}, \cdots, A^{(k)} \in M_n(\mathbb{C})$ and $\epsilon>0$, one may find a paving $X_1 \amalg \cdots \amalg X_r = [n]$ where $r \leq 18k\epsilon^{-2}$ such that, \[\lambda_{max} (P_{X_i} A^{(j)} P_{X_i})<\epsilon, \quad i \in [r], \, j \in [k].\] As a consequence, we get the correct asymptotic estimates for paving general zero diagonal matrices; zero diagonal contractions can be $(O(\epsilon^{-2}),\epsilon)$ paved. As an application, we give a simplified proof, with slightly better estimates, of a theorem of Johnson, Ozawa and Schechtman concerning commutator representations of zero trace matrices.

Thus, we improve the dependence for simultaneously (one-sided) paving $k$ matrices from exponential in $k$ to linear in $k$. Taking $k = 2$ with the pair $A, -A$ for a zero diagonal Hermitian $A$ yields two-sided pavings of size $r = O(1/\epsilon^2)$.
We remark that this bound matches the $r = \Omega(1/\epsilon^2)$ lower bound established by Casazza et al. [CEKP07], and the $r = \Omega(k)$ lower bound established by Popa and Vaes [PV15], who also speculated as to whether a polynomial or linear dependence might be achievable. In the last section, we show how one may combine the above two examples to construct $k$-tuples of zero diagonal matrices for which a multi-paving requires at least $k\lfloor \epsilon^{-2} \rfloor$ blocks. Thus, our result is asymptotically optimal in terms of both $k$ and $\epsilon$.
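To make the gap between the two bounds concrete, the following sketch simply evaluates the upper bound $18k\epsilon^{-2}$ from the abstract and the lower bound $k\lfloor \epsilon^{-2} \rfloor$ quoted above. The function names are ours and purely illustrative; exact rational arithmetic is used so that the floor is not affected by floating point rounding.

```python
from fractions import Fraction
import math

def paving_upper_bound(k: int, eps: Fraction) -> int:
    """Number of blocks that suffices by the main theorem: 18 k / eps^2, rounded up."""
    return math.ceil(18 * k / eps**2)

def paving_lower_bound(k: int, eps: Fraction) -> int:
    """Number of blocks needed for the extremal tuples: k * floor(eps^-2)."""
    return k * math.floor(eps**-2)

for k, eps in [(1, Fraction(1, 2)), (2, Fraction(1, 2)), (10, Fraction(1, 10))]:
    lo, hi = paving_lower_bound(k, eps), paving_upper_bound(k, eps)
    print(f"k={k}, eps={eps}: lower bound {lo}, upper bound {hi}")
```

For every $\epsilon < 1$ the ratio of the two quantities is bounded by an absolute constant, which is the sense in which the result is asymptotically optimal in both $k$ and $\epsilon$.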
We note that in the infinite-dimensional setting, Popa and Vaes [PV15] have shown that given a singular MASA $A$ in a type II$_1$ factor $M$, when considering the tuple $A^\omega \subset M^\omega$ where $A^\omega$ and $M^\omega$ denote the ultrapowers of the relevant von Neumann algebras, the bound $\epsilon = 2\sqrt{r-1}/r$ is achievable for $(r, \epsilon)$-paving for any finite collection of zero diagonal operators. We emphasise here that Popa and Vaes show that in this setting, the multi-paving bound is independent of the number of operators. However, when it comes to MASAs with large normalisers, or finite matrices for that matter, the situation is quite different; the number of blocks must grow linearly in the number of matrices.
A corollary of our main theorem is the following improvement of the quantitative commutator theorem of [JOS13] (who had provided an estimate of $e^{K\sqrt{\log m \log\log m}}$ for a large constant $K$), whose main advantage is a significantly simplified proof.
Corollary 2 Every zero trace matrix $A \in M_m(\mathbb{C})$ may be written as $A = [B, C]$, such that $\|B\|\,\|C\| \leq 300\, e^{9\sqrt{\log(m)}}\, \|A\|$.
We include a short proof of this in the final section of the paper.

Restricted Invertibility
Restricted invertibility refers to the problem of choosing a single submatrix (as opposed to a paving) of a given matrix with small norm. The first such result was given by Bourgain and Tzafriri in [BT87], with improvements and generalizations by [Ver01,SS12,NY16,Rav17,MSS17]. We prove the following result on choosing a single submatrix which simultaneously has small norm for a given collection of matrices A (1) , . . . , A (k) . Our bounds have the merit of being slightly stronger (by a constant factor) than those obtained by applying paving and choosing the largest part. The proof is also somewhat simpler than that of Theorem 1.

Techniques and Organization
To introduce our techniques, let us briefly recall how [MSS15] constructs pavings. First, a one-sided paving is obtained via the method of interlacing families of polynomials (see e.g. [MSS14] for an overview), which relies on three facts:
(I) The largest eigenvalue of a Hermitian matrix is equal to the largest root of its characteristic polynomial.
(II) The characteristic polynomials of the matrices: form an "interlacing family", which implies in particular, that their average is real-rooted and that there exists a polynomial in the family whose largest root is at most that of the sum.
(III) The largest root of the average polynomial can be bounded using a "barrier function argument".
This method is only able to control the largest eigenvalue of a matrix, and therefore can only give one-sided bounds. The suboptimal exponential dependence on k for simultaneously paving k matrices arises from sequentially applying the one-sided result in a black box way k times.
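The distinction between one-sided and two-sided control can be checked numerically. The sketch below (function names are ours) evaluates $\max_i \lambda_{max}(P_{X_i} A P_{X_i})$ for a given partition by working with principal submatrices; for a zero diagonal Hermitian $A$ each such submatrix has trace zero, so its largest eigenvalue is non-negative and agrees with that of the compression. Applying the same function to the pair $A, -A$ recovers the largest block operator norm.

```python
import numpy as np

def one_sided_paving_value(A, blocks):
    """Max over blocks X of the largest eigenvalue of the principal submatrix A(X)."""
    return max(np.linalg.eigvalsh(A[np.ix_(X, X)])[-1] for X in blocks)

def two_sided_paving_value(A, blocks):
    """One-sided value for the pair (A, -A): the largest operator norm of a block."""
    return max(one_sided_paving_value(A, blocks),
               one_sided_paving_value(-A, blocks))

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = (G + G.conj().T) / 2          # Hermitian
np.fill_diagonal(A, 0)            # zero diagonal
A /= np.linalg.norm(A, 2)         # normalise to a contraction

blocks = [[0, 1, 2], [3, 4, 5]]   # an arbitrary 2-paving of {0, ..., 5}
print(one_sided_paving_value(A, blocks), two_sided_paving_value(A, blocks))
```

The partition here is arbitrary and only illustrates the quantities being bounded; it is not produced by the interlacing argument.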
The main contribution of this paper is a technique for simultaneously carrying out the interlacing argument "in parallel" for k matrices, losing only a factor of k in the process. The key idea is to replace the determinant of a single matrix by a mixed determinant of k matrices.

Definition 4 Given a k tuple of matrices
Closely related to the mixed determinant is the following generalization of the characteristic polynomial of a single matrix to a tuple of k matrices, which we call the mixed determinantal polynomial (introduced in [Rav16]) or MDP in short.
Definition 5 Given a $k$ tuple of matrices $A = (A^{(1)}, \cdots, A^{(k)})$ in $M_n(\mathbb{C})$, the mixed determinantal polynomial is defined as,
Closely related polynomials were also used by Borcea and Brändén in [BB08] in their solution of Johnson's conjectures. This definition is useful because it turns out that the largest eigenvalues of $k$ matrices are simultaneously controlled by the largest root of their MDP up to a factor of $k$, serving as a substitute for (I).
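For small $n$ the MDP can be computed by brute force. The sketch below assumes the normalisation described later in the paper, namely that the MDP is the average, over uniformly random $k$-partitions $S_1 \amalg \cdots \amalg S_k = [n]$, of the products $\prod_i \chi[A^{(i)}(S_i)]$ of characteristic polynomials of principal submatrices. The function names are ours, the matrices are assumed Hermitian, and the enumeration over $k^n$ labelings is only meant for tiny examples.

```python
import itertools
import numpy as np

def char_poly(M):
    """Coefficients (highest degree first) of det(xI - M) for a Hermitian M."""
    if M.shape[0] == 0:
        return np.array([1.0])              # empty matrix: constant polynomial 1
    return np.poly(np.linalg.eigvalsh(M))

def mixed_determinantal_polynomial(mats):
    """Average over all labelings [n] -> [k] of prod_i charpoly(A_i restricted to S_i)."""
    k, n = len(mats), mats[0].shape[0]
    total = np.zeros(n + 1)
    for labels in itertools.product(range(k), repeat=n):
        term = np.array([1.0])
        for i, A in enumerate(mats):
            S = [v for v in range(n) if labels[v] == i]
            term = np.polymul(term, char_poly(A[np.ix_(S, S)]))
        total += term
    return total / k**n

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)
p = mixed_determinantal_polynomial([A1, A2])
print(p, np.roots(p))
```

With $k = 1$ this reduces to the ordinary characteristic polynomial. For zero diagonal matrices the coefficient of $x^{n-1}$ vanishes, which is the trace observation used repeatedly in the proofs.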
We then show that parts (II) and (III) of the interlacing families argument can be carried out for MDPs. As in previous works, (II) is established by deriving a differential formula for MDPs which shows that they and all of their relevant convex combinations are real-rooted. For joint restricted invertibility, (III) is derived from the univariate root shrinking estimates of [Rav17]; for multi-paving, the required bound follows by observing that the relevant expected characteristic polynomials are equivalent (after a change of variables) to the mixed characteristic polynomials of [MSS15], and appealing to the bounds derived there.
The interlacing families for joint restricted invertibility and paving are analyzed in Sections 4 and 5 respectively, proving Theorems 3 and 1.

Preliminaries
Submatrices, Pavings, and Derivatives We will use the notation A(S) to denote the submatrix of A indexed by rows and columns in S, and A S to denote the matrix with rows and columns in S removed. For a collection of subsets to denote the block matrix containing submatrices indexed by the S i . Together with the mixed determinantal polynomial from the introduction, we will also need a closely related polynomial, Definition 7 Let A = (A (1) , · · · , A (k) ) be matrices in M n (C) and let S ⊂ [n]. Define the restricted mixed determinantal polynomial by, Central to our main result will be characteristic polynomials and mixed determinantal polynomials of pavings.
Definition 8 Let A ∈ M n (C) and let S = {S 1 , · · · , S r } be a collection of subsets of [n], that we will typically take to be a partition of [n]. Then, we define the characteristic polynomial of the collection, Also, given matrices A = (A (1) , · · · , A (k) ) in M n (C), we define the mixed determinantal polynomial of the collection, noting that the paving S takes precedence over the tuple A. We will use multiset notation to indicate partial derivatives, i.e., for a multiset S = (κ 1 , . . . , κ n ) containing elements of [n] with frequencies κ i , we will write where the variable z i will be clear from the context. Multiplying a multiset by an integer means multiplying the multiplicity of each element in it, for example ∂ k[n] means every element in [n] with multiplicity k. We will frequently deal with doubly indexed arrays of indeterminates z be a tuple of diagonal matrices of indeterminates. For a multiset S, we will use the shorthand notation to indicate differentiation with respect to the (i) variables. We will make frequent use of differential formulas for mixed determinantal polynomials and their variants.
Proposition 9 Let $A = (A^{(1)}, \cdots, A^{(k)})$ be matrices in $M_n(\mathbb{C})$. Then, (1) . Then, (2) Let $S = \{S_1, \cdots, S_r\}$ be a partition of $[n]$. Then, the mixed determinantal polynomial of the paving may be written as,
Proof. Observe that for $A \in M_n(\mathbb{C})$ and $S \subset [n]$, we have, By the Leibniz formula, we have that, Given a partition $T_1 \amalg \cdots \amalg T_k = (k-1)[n]$, for the corresponding term above to be non-zero, all the $T_i$ must be subsets of $[n]$, which in turn yields, letting $S_i = [n] \setminus T_i$ for $i \in [k]$, that $S_1 \amalg \cdots \amalg S_k = [n]$. We therefore have that, For (2), we observe that, where in the last line we have used the fact that $\det[Z - A^{(j)}]$ is multilinear in the variables $z_i$.
And finally, (3) follows from observing that, Interlacing, Stability, and Mixed Characteristic Polynomials We will use the following facts about real-rooted polynomials.
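Definition 10 We say that a real rooted polynomial $g(x) = \alpha_0 \prod_{i=1}^{n-1} (x - \alpha_i)$ interlaces a real rooted polynomial $f(x) = \beta_0 \prod_{i=1}^{n} (x - \beta_i)$ if $\beta_1 \leq \alpha_1 \leq \beta_2 \leq \cdots \leq \alpha_{n-1} \leq \beta_n$. We say that polynomials $f_1, \ldots, f_k$ have a common interlacer if there is a polynomial $g$ so that $g$ interlaces $f_i$ for each $i$.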
By a result of Fell [Fel80], having a common interlacer is equivalent to asserting that for every f i and f j all of the convex combinations αf i + (1 − α)f j are real-rooted. We recall the following elementary lemma from [MSS13] that shows the utility of having a common interlacing.
Lemma 11 Let $f_1, \ldots, f_k$ be polynomials of the same degree that are real-rooted and have positive leading coefficients. Define $f_\emptyset = \sum_{i=1}^{k} f_i$. If $f_1, \ldots, f_k$ have a common interlacing, then there exists an $i$ so that the largest root of $f_i$ is at most the largest root of $f_\emptyset$.
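A two-polynomial example illustrates both the lemma and why the common interlacing hypothesis matters. In the sketch below (the polynomials are our own choice), $f_1$ and $f_2$ with roots $\{1, 4\}$ and $\{2, 5\}$ share the interlacer $x - 3$, so one of them has largest root no larger than that of the sum; the pair with roots $\{0, 1\}$ and $\{2, 3\}$ has no common interlacer, and the sum fails to be real-rooted.

```python
import numpy as np

def largest_root(coeffs):
    """Largest real part among the roots of a polynomial given by its coefficients."""
    return max(np.roots(coeffs).real)

# Common interlacer x - 3: 1 <= 3 <= 4 and 2 <= 3 <= 5.
f1, f2 = np.poly([1, 4]), np.poly([2, 5])
f_sum = f1 + f2
print(largest_root(f1), largest_root(f2), largest_root(f_sum))

# No common interlacer: roots {0, 1} and {2, 3} are separated.
g1, g2 = np.poly([0, 1]), np.poly([2, 3])
g_sum_roots = np.roots(g1 + g2)
print(g_sum_roots)   # a complex conjugate pair, so the sum is not real-rooted
```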
An important class of real-rooted polynomials, which are closely related to MDPs, are mixed characteristic polynomials.
Definition 12 Given independent random vectors $r_1, \ldots, r_m$ with covariance matrices $A_i := \mathbb{E}\, r_i r_i^*$, the mixed characteristic polynomial of $A_1, \ldots, A_m$ is defined as: $\mu[A_1, \ldots, A_m](x) := \mathbb{E}\, \chi\big[\sum_{i=1}^{m} r_i r_i^*\big](x)$.
It was shown in [MSS15] that the expression on the right hand side only depends on the $A_i$ and not on the details of the $r_i$, so mixed characteristic polynomials are well-defined. We will exploit the following key property of mixed characteristic polynomials, also established in [MSS15]:
Recall that a polynomial $p(z_1, \ldots, z_n)$ is real stable if its coefficients are real and it has no zeros with $\mathrm{Im}(z_i) > 0$ for all $i$. Note that a univariate real stable polynomial is real-rooted, and that a degree one polynomial with coefficients of the same sign is always real stable.
We will rely on the following theorem of Borcea and Branden, which characterizes differential operators preserving stability.
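Theorem 14 Let $T : \mathbb{R}[z_1, \ldots, z_n] \to \mathbb{R}[z_1, \ldots, z_n]$ be an operator of the form $T = \sum_{\alpha, \beta \in \mathbb{N}^n} c_{\alpha,\beta}\, z^{\alpha} \partial^{\beta}$, where $c_{\alpha,\beta} \in \mathbb{R}$ and $c_{\alpha,\beta}$ is zero for all but finitely many terms. Define the symbol of $T$ to be $F_T(z, w) := \sum_{\alpha,\beta} c_{\alpha,\beta}\, z^{\alpha} w^{\beta}$. Then $T$ preserves real stability if and only if $F_T(z, -w)$ is real stable.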
Finally, we will use the elementary symmetric functions and the notation e i (A) to denote the i th coefficient of the characteristic polynomial of a matrix A, which is equal to e i of its eigenvalues.
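The relation between the coefficients of the characteristic polynomial and the elementary symmetric functions of the eigenvalues is easy to check numerically. The sketch below assumes the convention $\chi[A](x) = \det(xI - A)$, under which the coefficient of $x^{n-i}$ is $(-1)^i e_i$; the helper name is ours.

```python
import itertools
import numpy as np

def elementary_symmetric(values, i):
    """e_i(values): sum over all i-element subsets of the product of the entries."""
    return sum(np.prod(c) for c in itertools.combinations(values, i))

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
H = (G + G.T) / 2                       # real symmetric, hence Hermitian
eigs = np.linalg.eigvalsh(H)
coeffs = np.poly(eigs)                  # coefficients of det(xI - H), highest degree first

for i in range(5):
    print(i, coeffs[i], (-1) ** i * elementary_symmetric(eigs, i))
```

In particular $e_1(A)$ is the trace and $e_n(A)$ is the determinant.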

Joint Control of The Largest Root
Theorem 6 is a consequence of the following monotonicity result.
Theorem 15 Let A = (A (1) , · · · , A (k) ) be a k tuple of Hermitian matrices and let B be a zero diagonal Hermitian matrix. Then, We will deduce this by setting the rows and columns of B to zero one by one. Let B (S) denote the n × n matrix gotten from B by setting entries in rows and columns from S ⊆ [n] to zero.
Proposition 16 Let A = (A (1) , · · · , A (k) ) and B be zero diagonal n×n Hermitian matrices. Then, We will show that f is monotone increasing in t.
Observe that p t is a polynomial of degree at most one in t since by Definition 4 it is a sum over products of characteristic polynomials of principal submatrices of the A (i) and B, which either contain both the first row and column of B t (introducing terms containing a factor of t) or contain neither. Thus, we may write p t (x) = r(x) + ts(x) for some polynomials r, s. Moreover, r(x) = p 0 (x) is real-rooted and s(x) = lim t→∞ p t (x) must be real-rooted or identically zero by Hurwitz's theorem; let us assume the former case since otherwise we are done.
Since the largest root of a real-rooted polynomial is continuous in its coefficients, $f$ is continuous. Assume for a moment that $r(x)$ and $s(x)$ have no common roots. If $f$ is not monotone, then by continuity we can choose two points $t_1 \neq t_2$ such that $f(t_1) = f(t_2) = z$. This implies that $r(z) + t_1 s(z) = 0 = r(z) + t_2 s(z)$, whence $s(z) = 0$, and consequently $r(z) = 0$, contradicting that $r(x)$ and $s(x)$ have no common roots.
To handle the general case, let r(x) = q(x)r 1 (x) and s(x) = q(x)s 1 (x) where r 1 and s 1 have no common factors. Observe that every root of q is a root of p t for all t. Thus, where M is the largest root of q, which is monotone by the preceding argument.
We now show that f is nondecreasing in t by examining its behavior at ∞. Assume that the first row of B is nonzero since otherwise we are done. Let where the expectation is over a uniformly random (k + 1)−wise partition of [n]. Since every matrix that appears above has zero diagonal and consequently zero trace, the second coefficient of each characteristic polynomial that appears in the above sum is zero, which implies that c 1 (t) = 0 for all t ≥ 0, i.e.
On the other hand, observe that Since the first row of $B$ is nonzero this implies that $c_2(t) \to -\infty$ as $t \to \infty$. Combining this with (4), we have so the roots of $p_t$ must be unbounded in $t$. Since the mean of the roots is always zero, we conclude that $\lambda_{max}(p_t) \to \infty$ and $\lambda_{min}(p_t) \to -\infty$ as $t \to \infty$. Thus, $f(t)$ must be monotone increasing in $t$, and $f(1) \geq f(0)$, as desired.
Theorem 15 now follows by a simple inductive argument.
Proof. Sequentially applying Proposition 16 yields that, We conclude that, To obtain Theorem 6 we iterate Theorem 15 $k - 1$ times, yielding Letting $A = A^{(1)}$ to ease notation, we now observe that , as desired.

Joint Restricted Invertibility
In this section we prove Theorem 3 by analyzing the expected MDP when we choose a common submatrix of a $k$-tuple of matrices. We begin by deriving a useful formula for this polynomial. Recall that $A_S$ denotes the submatrix of $A$ with rows and columns in $S$ removed, and $\chi_S$ denotes the correspondingly restricted characteristic polynomial.
We conclude that, We now show that the roots of the expected MDP can be used to control the roots of the best MDP over all S, via an interlacing family argument.
Proposition 18 Let $A^{(1)}, \cdots, A^{(k)}$ be Hermitian matrices in $M_n(\mathbb{C})$. Then, for any $m \leq n$, there is a subset $S \subset [n]$ of size $m$ such that, Proof. We first claim that the polynomials χ[A ({i}) ] for $i \in [n]$ have a common interlacer. This is equivalent to verifying that for every $\alpha_1, \cdots, \alpha_n > 0$, the polynomial is real rooted. We have, is real stable, the differential operator $\sum_{i \in [n]} \alpha_i \partial_i$ preserves real stability and diagonalization preserves real stability, yielding a univariate real stable polynomial, which is necessarily real rooted.
Since differentiation preserves interlacing, we conclude that the polynomials for fixed $j$ have a common interlacer. As a consequence, see [MSS14], we have that there is an $i_1$ such that, Repeating the same argument, we have that there is an $i_2$ such that, Iterating this, we see that there is a subset $S$ of size $m$ such that, To finish the proof, we appeal to the following theorem from [Rav17, Lemmas 4.3, 4.5], which shows how the roots of a univariate polynomial shrink under taking many derivatives.
Theorem 19 (Root Shrinking) For any real rooted polynomial p of degree n with roots in [−1, 1], with average of roots 0 and with average of the squares of the roots α and any c ≤ (1 + α) −1 , In the proof that follows, given a tuple of Hermitians A = (A (1) , · · · , A (k) ), we will use the notation, Proof of Theorem 3. We start off by noting that the characteristic polynomial of a zero diagonal n × n Hermitian matrix A is of the form, x n−2 + lower order terms.
Observe that for any $k$ tuple of $n \times n$ zero diagonal Hermitians $A := A^{(1)}, \ldots, A^{(k)}$ and partition $S_1 \cup \ldots \cup S_k = [n]$, each of the matrices A x n−2 + lower order terms, and it is evident that this polynomial is positive on $(1, \infty)$ (since it is monic) and has sign $(-1)^n$ on $(-\infty, -1)$. Taking an average over all partitions and noting that each pair of indices has probability $1/k^2$ of being in any particular $S_i$, we find that 2k 2 x n−2 + · · · also has these properties, so all of its roots must lie in $[-1, 1]$. Moreover, by the above, the sum of the roots of $q$ is zero and the average of the squares of the roots is Writing, $m = (1 - c)n$ for $c$ sufficiently small, Theorem 19 yields that, Applying Theorem 6, this implies that Given $\epsilon < 1$, setting $c = \epsilon^2/6k$, we see that, as desired.
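The qualitative phenomenon behind Theorem 19, that the largest root of a real-rooted polynomial can only move inwards under differentiation, can be observed directly. The sketch below uses a polynomial of our own choosing whose roots lie in $[-1, 1]$ and sum to zero, matching the hypotheses of the theorem; it does not attempt to reproduce the quantitative bound of [Rav17].

```python
import numpy as np

roots = [-1.0, -0.5, -0.2, 0.1, 0.6, 1.0]    # in [-1, 1], summing to zero
q = np.poly1d(roots, r=True)                  # monic polynomial with these roots

largest, smallest = [], []
while q.order >= 1:
    real_roots = q.roots.real                 # derivatives stay real-rooted
    largest.append(real_roots.max())
    smallest.append(real_roots.min())
    q = q.deriv()

for j, (lo, hi) in enumerate(zip(smallest, largest)):
    print(f"derivative {j}: roots lie in [{lo:.4f}, {hi:.4f}]")
```

The mean of the roots is preserved by differentiation, so the intervals shrink symmetrically towards zero in this example.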

Joint Paving
In this section we prove Theorem 1. As with restricted invertibility, the idea is to construct an interlacing family on certain MDPs, this time indexed by $r$-wise partitions of $[n]$. To this end, let $\mathcal{P}_r$ denote the set of partitions of $[n]$ into $r$ subsets. Given a partition $S = (S_1, \ldots, S_r) \in \mathcal{P}_r$ and a tuple of matrices $A = (A^{(1)}, \ldots, A^{(k)})$, recall that we defined, We will show that these polynomials can be used to construct an interlacing tree, as follows. For any $m \leq n$ and any ordered $r$-partition (that may include empty sets) of $[m]$, $T = (T_1, \cdots, T_r)$, i.e. $T_1 \amalg \cdots \amalg T_r = [m]$, define,
Proposition 20 Let $A = (A^{(1)}, \cdots, A^{(k)})$ be a $k$ tuple of matrices in $M_n(\mathbb{C})$, let $m \leq n$ and let $T = (T_1, \cdots, T_r)$ be an $r$-partition of $[m]$. Then,
Proof. We have,
We now show that the polynomials $q(T)$ form an interlacing family. We will use the notation $e^i_j$ for $i \in [r]$ and $j \in [n]$ to denote a collection of $r$ subsets of $[n]$, with the $i$'th subset equalling the singleton $\{j\}$ and the other subsets being empty. Formally,
Lemma 21 Let $A = (A^{(1)}, \cdots, A^{(k)})$ be a $k$ tuple of Hermitian matrices in $M_n(\mathbb{C})$, let $m \leq n$ and let $T = (T_1, \cdots, T_r)$ be an $r$-partition of $[m]$. 3. Consequently, there is an $i \in [r]$ such that, $\lambda_{max}\, q(T \cup e^i_{m+1}) \leq \lambda_{max}(q(T))$.
Proof. Let $p$ be the polynomial, We have that, To show that the polynomials $q(T \cup e^i_{m+1})$ have a common interlacer, we need to show that for any non-negative real numbers $\{\alpha_i\}_{i \in [r]}$, the polynomial $\sum_{i \in [r]} \alpha_i\, q(T \cup e^i_{m+1})$ is real rooted. We see that, where $D$ is the differential operator,

The symbol of this differential operator is
$w_j$, and $F_D(-w)$ may be written as, We conclude that $F_D(-w)$ is real stable, which by Theorem 14 shows that $D$ preserves real stability. We conclude that the polynomial in (7) is real rooted. This proves the first assertion of the lemma.
For the second, we see that, The third assertion follows from Lemma 11.
We now give a succinct formula for the expected characteristic polynomial over all partitions. We do this by deriving a formula for the expectation of this polynomial over a random partition, which turns out to be an MDP.
Lemma 22 Given a k-tuple of matrices A = (A (1) , · · · , A (k) ) in M n (C), we have, Proof. By the definition of an MDP, the right hand side is a uniform average over all rk−partitions of [n]. Consider the following two stage sampling process: 1. Choose an r−partition X ∈ P r .

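2. For each $j \leq r$, choose a $k$-partition $S^{(1)}_j \amalg \cdots \amalg S^{(k)}_j = X_j$ of the block $X_j$ uniformly at random.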
We claim that the partition {S (i) j } j≤r,i≤k output by the procedure is uniformly random. To see why, observe that the process constructs a random allocation [n] → [r] × [k] of elements to subsets by choosing the first coordinate uniformly and then the second, which certainly generates a uniformly random allocation. Labeling the r copies of A (i) as A (i,j) with j = 1, . . . , r, we have: as desired.
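The uniformity claim can be verified exactly for small parameters. The sketch below assumes that partitions are ordered and may contain empty parts, so that a uniformly random $r$-partition is the same as a uniformly random map $[n] \to [r]$; under that reading it enumerates the two stage process with exact rational probabilities and records the induced distribution on allocations $[n] \to [r] \times [k]$. The function name is ours.

```python
from collections import Counter
from fractions import Fraction
import itertools

def two_stage_distribution(n, r, k):
    """Exact law of the allocation [n] -> [r] x [k] produced by the two stage process."""
    dist = Counter()
    for first in itertools.product(range(r), repeat=n):          # stage 1: r-partition
        blocks = [[v for v in range(n) if first[v] == j] for j in range(r)]
        # stage 2: an independent uniform k-partition of every block
        per_block = [list(itertools.product(range(k), repeat=len(X))) for X in blocks]
        for choice in itertools.product(*per_block):
            prob = Fraction(1, r**n)
            second = [None] * n
            for X, labels in zip(blocks, choice):
                prob *= Fraction(1, k ** len(X))
                for v, lab in zip(X, labels):
                    second[v] = lab
            dist[tuple(zip(first, second))] += prob
    return dist

dist = two_stage_distribution(n=3, r=2, k=2)
print(len(dist), set(dist.values()))
```

Every one of the $(rk)^n$ allocations receives the same probability $(rk)^{-n}$, which is the uniformity used in the proof.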
Theorem 23 Let $A = (A^{(1)}, \cdots, A^{(k)})$ be a $k$ tuple of Hermitian matrices in $M_n(\mathbb{C})$. Then, there is a partition $S = (S_1, \cdots, S_r)$ of $[n]$ such that, Proof. Iterating the third statement in Lemma 21, we see that there is a partition $S = (S_1, \cdots, S_r)$ of $[n]$ such that, From the definition (5), it is clear that, Also, by Lemma 22, we have that, This completes the proof.
Finally, we prove a bound on the largest root of the expected characteristic polynomial above. This follows by observing that the MDP of a tuple of matrices can be written as the mixed characteristic polynomial of related matrices, which immediately allows us to transfer known bounds for roots of mixed characteristic polynomials to MDPs.
Lemma 24 Let B (1) , · · · , B (k) be positive semidefinite matrices in M n (C) and suppose {v i j } i≤k,j≤n are vectors in C n such that i.e., B (i) is the Gram matrix of the v i j . Letting, we have that, Proof. Define random vectors r 1 , · · · , r n by letting r j be the random vector taking values in k C n ⊕ · · · ⊕ C n that takes values 0 ⊕ · · · ⊕ j √ kv i j ⊕ · · · ⊕ 0 for i ∈ [k] with probability 1/k apiece. We note that, E r j r * j = X j , j ∈ [n]. By the definition of the mixed characteristic polynomial, we have that, We note that, We conclude that, The expression on the right equals x (k−1)n χ[k B (1) , · · · , k B (k) ], by definition.
Remark. Lemma 24 shows that the MDP, which is an expectation over uniform k−partitions of [n], can be written as a mixed characteristic polynomial in the covariance matrices X i which encode the (uniform over [k]) distribution of each coordinate in [n]. With minor modifications the lemma can be generalized to the setting of distributions over nonuniform partitions, in which each coordinate i ∈ [n] is assigned to an element of [k] independently according to some distribution µ i . With this setup, the operation of taking a mixture of two such distributions µ := µ 1 ⊗ . . . ⊗ µ n and ν := ν 1 ⊗ . . . ⊗ ν n simply corresponds to taking an average of the corresponding covariance matrices X i , yielding another mixed characteristic polynomial, which is necessarily real-rooted by Theorem 13. This fact can be used to give an alternate proof of Lemma 21, since the relevant averages of the conditional expected polynomials which appear during the interlacing argument are simply mixtures of such distributions. We have chosen to give a direct proof to keep the presentation self-contained, but would like to mention that conceptually the interlacing argument for MDPs works for the same reason as it does for mixed characteristic polynomials. The above identity allows us to use the estimates of Theorem 13 on roots of mixed characteristic polynomials to get effective upper bounds for mixed determinantal polynomials.
Theorem 25 Let $A^{(1)}, \cdots, A^{(k)}$ be a $k$ tuple of zero diagonal Hermitian contractions, where $k \geq 2$. Then,
Proof. Define the matrices, We note that, Letting the $X_i$ be as in the proof of the previous lemma, we note that, yielding that $X_1 + \cdots + X_n$ is a positive contraction. Further, given that all the vectors $v^i_j$ (in the proof of the previous lemma) have squared lengths $1/2$, we note that, Theorem 13 now yields that, This in turn yields that,
Proof of Theorem 1. Given any tuple of zero diagonal $n \times n$ Hermitian matrices $A = (A^{(1)}, \ldots, A^{(k)})$ and $r$, Lemma 21 tells us that there exists a partition $X \in \mathcal{P}_r$ such that By Theorem 25, the right hand side is at most $3\sqrt{2}/\sqrt{rk}$. By Theorem 6 this means that for every $i = 1, \ldots, k$ we have In order to make the latter quantity at most $\epsilon$ it is sufficient to take $r = 18k/\epsilon^2$, as desired.
We now state a corollary that gives paving estimates for non-Hermitian matrices. We also indicate a simple trick that allows us to get blocks that are all uniformly bounded.
Corollary 26 If $A \in M_m(\mathbb{C})$ is a zero diagonal contraction, and $r$ is an integer, then there is a two-sided paving of $A$ with $r$ blocks such that each block has norm at most $12\sqrt{2}/\sqrt{r}$. Further, there is a paving into $2r$ blocks such that each block has size at most $\lfloor m/r \rfloor$ and norm at most $12\sqrt{2}/\sqrt{r}$.
Proof. By applying Theorem 1 to the tuple $S = (\mathrm{Re}(A), -\mathrm{Re}(A), \mathrm{Im}(A), -\mathrm{Im}(A))$, we may obtain a paving $X_1, \ldots, X_r$ such that, This implies that, This in turn implies that, For the second assertion, take a paving of $A$ into $r$ blocks $\{X_1, \cdots, X_r\}$ such that each block has norm at most $12\sqrt{2}/\sqrt{r}$. Subpartition each block $X_i$ which has more than $\lfloor m/r \rfloor$ elements into sub-blocks, all of which save at most one have size exactly $\lfloor m/r \rfloor$, with at most one sub-block of smaller size. Suppose we get $s$ blocks of size $\lfloor m/r \rfloor$ and $t$ blocks of smaller size (note that $t \leq r$).
We have that $s\lfloor m/r \rfloor + t \leq m$, yielding that the total number of blocks $s + t$ is at most $2r$. Finally, since passing to a sub-block cannot increase the norm, we get the required norm estimate.
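The subpartitioning trick in the second half of the proof is a simple chunking procedure. A minimal sketch, assuming $m \geq r$ so that $\lfloor m/r \rfloor \geq 1$ (the function name is ours):

```python
def subpartition(blocks, m, r):
    """Split every block into consecutive chunks of size floor(m / r), plus one remainder."""
    q = m // r
    if q < 1:
        raise ValueError("need m >= r so that floor(m / r) >= 1")
    out = []
    for X in blocks:
        X = list(X)
        for start in range(0, len(X), q):
            out.append(X[start:start + q])   # last chunk may be smaller than q
    return out

# A deliberately unbalanced 3-paving of {0, ..., 9}.
blocks = [[0, 1, 2, 3, 4, 5, 6], [7], [8, 9]]
refined = subpartition(blocks, m=10, r=3)
print(refined)
```

Each original block contributes at most one chunk of size smaller than $\lfloor m/r \rfloor$, which is where the bound $t \leq r$ in the proof comes from.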

A Simplified Proof of the Quantitative Commutator Theorem
The authors of [JOS13] remarked that optimizing their proof yields a bound of $e^{O(\sqrt{\log m \log\log m})}$. In this section, we explain how to use Theorem 1 to give a simple proof of a slight improvement of the main theorem of [JOS13]. The main idea is the same as in that paper, and consists of three steps. Given $A \in M_m(\mathbb{C})$, the matrix $B$ will be chosen to be diagonal and with entries in a square $S$.
1. Given an $A \in M_m(\mathbb{C})$, write it as an $r \times r$ block matrix containing $r^2$ significantly smaller matrices such that the diagonal blocks $A_{ii}$, $i = 1, \ldots, r$ are square and have smaller norm.
2. Recursively solve the commutator problem for these blocks, to obtain $B_{ii}$ and $C_{ii}$ such that $A_{ii} = [B_{ii}, C_{ii}]$ and every $B_{ii}$ is diagonal with all eigenvalues in the square $S$.
3. Show that for the same $B$, the off-diagonal blocks $C_{ij}$ may be chosen to have small norm. This is done by appealing to an explicit solution of the equation $A_{ij} = B_{ii} C_{ij} - C_{ij} B_{jj}$ due to Rosenblum [ROS56], which implies that the $C_{ij}$ have small norm whenever the spectra of $B_{ii}, B_{jj}$ are separated, which is achieved by appropriately embedding these spectra in a tiling of $S$ by $r$ smaller well-separated squares.
Following [JOS13], let λ(A) denote the smallest norm of a matrix C such that A = [B, C] with the spectrum of B contained in S. Then we have the following, which is closely related to Claim 1 from [JOS13].
Lemma 28 Suppose $A$ is an $m \times m$ zero diagonal contraction and $X_1, \ldots, X_r$ is a partition of $[m]$, with $r^2$ a perfect square. Then,
Proof. The proof is identical to the proof of Claim 1 from [JOS13], with minor adjustments. Let $B_{ii}, C_{ii}$ be matrices with $A_{ii} = [B_{ii}, C_{ii}]$, for $i = 1, \ldots, r$ such that the spectra of the $B_{ii}$ lie in $S$ and $\|C_{ii}\| \leq \lambda(A_{ii})$. Subdivide $S$ into $r = \sqrt{r} \times \sqrt{r}$ smaller squares $S_1, \ldots, S_r$ of side length $2/\sqrt{r}$, and let $B'_{ii}$ be $B_{ii}$ scaled by $1/(2\sqrt{r})$ and translated by a multiple of the identity so that its spectrum lies inside $S_i$ at distance at least $1/(2\sqrt{r})$ from $\partial S_i$. Since translation does not change the commutator, we still have
We now handle the off-diagonal blocks. A result of Rosenblum [ROS56] asserts that for any square matrices $A, S, T$ of the same dimension, the matrix $C = \frac{1}{2\pi i} \oint_{\gamma} (z - S)^{-1} A\, (z - T)^{-1}\, dz$ satisfies $A = SC - CT$, where $\gamma$ is a positively oriented simple closed curve containing the spectrum of $S$ and excluding the spectrum of $T$. One can easily verify by the residue theorem that this solution also works when $A$ is rectangular and $S, T$ are square but of possibly different dimensions. Apply this with $A = A_{ij}$, $S = B'_{ii}$, $T = B'_{jj}$, and $\gamma = \partial S_i$ to obtain a solution $C_{ij}$. Note that $\|(z - S)^{-1}\|, \|(z - T)^{-1}\| \leq 2\sqrt{r}$ on $\gamma$, and the length of $\gamma$ is at most $8/\sqrt{r}$, so by the triangle inequality: Since the norm of $C$ is at most the sum of the norms of its cyclic diagonals (i.e., submatrices containing blocks $C_{ij}$ for $i - j = \ell \pmod{m}$ for $\ell = 1, \cdots, (r^2 - 1)$), and the norm of each diagonal is the maximum over the blocks it contains, we conclude that We now apply this recursively, together with a use of Corollary 26, $\sqrt{\log(m)}$. To finish the proof, we remove the constraint on $r$. Replace $r$ by the next even perfect square, which is at most $4r$ (since the ratio between consecutive even squares is $(2r + 2)^2/4r^2 = 1 + 2/r + 1/r^2 \leq 4$).
The final bound is at most the expression evaluated at r = 4e This yields a $B$ with $\|B\| \leq \sqrt{2}$ and $\|C\| \leq 200\, e^{9\sqrt{\log(m)}}$, as desired. We have not made any attempt to optimize the constants in the above proof.
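Rosenblum's contour integral solution of $A = SC - CT$ can be checked numerically. The sketch below assumes the standard form $C = \frac{1}{2\pi i}\oint_\gamma (z - S)^{-1} A (z - T)^{-1}\,dz$ and, for simplicity, takes $\gamma$ to be a circle rather than the boundary of a square; the integral is approximated by the trapezoid rule. The matrices are our own small examples, with $A$ rectangular and $S, T$ of different sizes.

```python
import numpy as np

def rosenblum_solution(A, S, T, center, radius, num_points=400):
    """Approximate (1 / 2 pi i) * contour integral of (z - S)^-1 A (z - T)^-1 over a circle."""
    p, q = S.shape[0], T.shape[0]
    C = np.zeros((p, q), dtype=complex)
    for theta in 2 * np.pi * np.arange(num_points) / num_points:
        w = radius * np.exp(1j * theta)          # w = z - center, and dz = i w dtheta
        z = center + w
        resolvent_S = np.linalg.inv(z * np.eye(p) - S)
        resolvent_T = np.linalg.inv(z * np.eye(q) - T)
        C += w * (resolvent_S @ A @ resolvent_T)
    return C / num_points                        # (1/2 pi i) * i * (2 pi / N) = 1 / N

S = np.array([[0.2, 0.5], [0.0, -0.3]])                       # spectrum {0.2, -0.3}
T = np.array([[3, 1, 0], [0, -3, 1], [0, 0, 4j]])             # spectrum {3, -3, 4i}
A = np.random.default_rng(3).standard_normal((2, 3))

C = rosenblum_solution(A, S, T, center=0.0, radius=1.5)       # circle separates the spectra
residual = np.linalg.norm(S @ C - C @ T - A)
print(residual)
```

The norm estimate in the proof comes from bounding the integrand by the product of the two resolvent norms and multiplying by the length of $\gamma$.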

Concluding remarks
We end this paper with three remarks; we first show how our multi-paving bounds are asymptotically sharp, both in the number of matrices k and the desired norm ǫ. We next make a calculation with characteristic polynomials of signed adjacency matrices of graphs to get a weak result concerning eigenvalue bounds for adjacency matrices of non-bipartite regular graphs. Thirdly, we discuss a vectorial version of Anderson paving.

Tightness of bounds
The bounds in the main theorem on multi-paving are sharp. For any k ∈ N and ǫ < 1, one has examples of k zero diagonal matrices that need at least r = k⌊ǫ −2 ⌋ blocks to be jointly (r, ǫ) paved.
Fix a positive integer $k$ and $\epsilon < 1$ and let $m = \lfloor \epsilon^{-2} \rfloor$. Let $U$ be the cyclic left shift on $\mathbb{C}^k$ and let $F_m$ be the $m \times m$ Fourier matrix, for $\omega = e^{2\pi i/m}$. Note that $F_m$ is a unitary and every element of $F_m$ has absolute value equal to $m^{-1/2}$. We now define, For the final matrix $A^{(k)}$, we use an idea of Casazza et al. from [CEKP07]. A conference matrix $C_m$ in $M_m(\mathbb{C})$ is a zero diagonal Hermitian matrix such that $C_m C_m^* = (m - 1)I$. As pointed out in the above-mentioned reference, conference matrices are known to exist for infinitely many integers $m$. Define the $k$'th matrix in our tuple as, A simple calculation shows,
Proposition 30 With $A^{(1)}, \cdots, A^{(k)}$ as above, suppose $X_1 \amalg \cdots \amalg X_r = [km]$ is a paving such that, then, we must have that the $X_i$ are all singletons. Consequently, if $(A^{(1)}, \cdots, A^{(k)})$ can be jointly $(r, \epsilon)$ paved, then $r \geq k\lfloor \epsilon^{-2} \rfloor$.
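The two building blocks of the construction are easy to instantiate. The sketch below assumes the usual normalisation $F_m = m^{-1/2}(\omega^{jl})_{j,l}$ for the Fourier matrix, and produces a real symmetric conference matrix by the Paley construction for a prime $q \equiv 1 \pmod 4$; the Paley construction is our choice of example and is not taken from the text. It then checks the properties stated above.

```python
import numpy as np

def fourier_matrix(m):
    """Unitary m x m Fourier matrix with entries omega^(j l) / sqrt(m), omega = exp(2 pi i / m)."""
    idx = np.arange(m)
    return np.exp(2j * np.pi * np.outer(idx, idx) / m) / np.sqrt(m)

def paley_conference_matrix(q):
    """Symmetric conference matrix of order q + 1 for a prime q congruent to 1 mod 4."""
    residues = {(x * x) % q for x in range(1, q)}
    def chi(a):                                   # quadratic character mod q
        a %= q
        return 0 if a == 0 else (1 if a in residues else -1)
    C = np.ones((q + 1, q + 1), dtype=int)
    C[0, 0] = 0
    C[1:, 1:] = [[chi(i - j) for j in range(q)] for i in range(q)]
    return C

F = fourier_matrix(6)
C = paley_conference_matrix(5)                    # order m = 6, so C C^* = 5 I
print(np.allclose(F @ F.conj().T, np.eye(6)), C @ C.T)
```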

A weak result concerning lifts of non-bipartite regular graphs
In [MSS13], the authors solved a conjecture due to Bilu and Linial [BiLu], and used this to show the existence of infinitely many d regular bipartite Ramanujan graphs for every d > 2. They showed that for every d regular graph, there is a signed adjacency matrix A s such that λ max (A s ) ≤ 2 √ d − 1. We now make a calculation that is relevant to non-bipartite graphs.
A straightforward calculation yields that the mixed determinantal polynomials $\chi[A_s, -A_s]$, where $s$ runs over all signings of the graph, form an interlacing family and that, where $m_G$ is the matching polynomial of the graph. Using the Heilmann-Lieb bound [HeLi], we see that there is a signing such that, Together with Theorem 6, we deduce that there is a signing such that, This bound does not, however, yield anything new; Friedman [Fri04] has shown that for any $\epsilon > 0$, with high probability, the non-trivial eigenvalues of a sufficiently large random $d$ regular graph lie in $[-2\sqrt{d-1} - \epsilon,\ 2\sqrt{d-1} + \epsilon]$.

Vectorial versions of Anderson Paving
It is easy to deduce vectorial versions of Anderson paving. Here is one such version; we seek a paving of a block matrix that respects the block structure. We use the expression ∆ to denote the map sending a matrix to its diagonal.
Theorem 31 There is a universal constant $C$ such that for any $\epsilon > 0$ and $r, n, k \in \mathbb{N}$, given a matrix $A \in M_k(\mathbb{C}) \otimes M_n(\mathbb{C})$ such that, there is a collection of diagonal $n \times n$ projections $P_1, \cdots, P_r$ such that $P_1 + \cdots + P_r = I$ and where $r \leq Ck^4\epsilon^{-2}$ such that, $\|(I \otimes P_i)A(I \otimes P_i)\| < \epsilon \|A\|$, $i \in [r]$.
If $A \in M_k(\mathbb{C}) \otimes D_n$, where $D_n$ is the diagonal subalgebra, we may take $r \leq D k\epsilon^{-2}$ for a universal constant $D$. In this case, the estimate is tight up to constant factors.
Write $A = (A_{ij})_{i,j \in [k]}$ for matrices $A_{ij} \in M_n(\mathbb{C})$ and let $X_1 \amalg \cdots \amalg X_s = [n]$ be a $(s, \epsilon/k)$ joint paving for the $k^2$ matrices $(A_{ij})_{i,j \in [k]}$. By Theorem 1, we may take $s = Ck^4\epsilon^{-2}$ for a suitable universal constant $C$.
The desired projections are precisely P i = I ⊗ P X i for i ∈ [s]. Writing A ij (l) for P X l A ij P X l , we have that ||(I ⊗ P l )A(I ⊗ P l )|| equals the norm of the block matrix A(l) = (A ij (l)) i,j∈ [k] . A(l) is a k × k block matrix with each entry being a square matrix with Rank(P l ) rows. Further, each entry has norm at most ǫ/k and it is easy to see that the norm of A(l) is at most ǫ (for instance by writing it as a sum of k matrices, each with a single block on each row and column).
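The norm estimate for the block matrix $A(l)$ rests on splitting it into $k$ cyclic block diagonals, each of which has norm equal to the largest norm of a block it contains. The sketch below (names and random test data are ours) computes that bound and compares it with the actual operator norm for blocks of norm exactly $\epsilon/k$.

```python
import numpy as np

def cyclic_diagonal_bound(blocks):
    """Sum over the k cyclic block diagonals of the largest block norm on that diagonal."""
    k = len(blocks)
    return sum(
        max(np.linalg.norm(blocks[i][(i + shift) % k], 2) for i in range(k))
        for shift in range(k)
    )

rng = np.random.default_rng(1)
k, n, eps = 3, 4, 0.5
blocks = []
for i in range(k):
    row = []
    for j in range(k):
        M = rng.standard_normal((n, n))
        row.append(M * (eps / k) / np.linalg.norm(M, 2))   # rescale to norm eps / k
    blocks.append(row)

full = np.block(blocks)                                     # the k x k block matrix
print(np.linalg.norm(full, 2), cyclic_diagonal_bound(blocks), eps)
```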
For matrices in $M_k(\mathbb{C}) \otimes D_n$, we only need to take a $(s, \epsilon)$ joint paving for $k$ matrices, which explains the better result. In this case, the estimate is asymptotically optimal by Proposition 30.