Primes and polynomials with restricted digits

Let $q$ be a sufficiently large integer, and $a_0\in\{0,\dots,q-1\}$. We show there are infinitely many prime numbers which do not have the digit $a_0$ in their base $q$ expansion. Similar results are obtained for values of a polynomial (satisfying the necessary local conditions) and if multiple digits are excluded. Our proof is based on the Hardy-Littlewood circle method and Fourier analysis of the set of integers with no digit equal to $a_0$ in base $q$.


Introduction
Let $a_0\in\{0,\dots,q-1\}$ and let $\mathcal{A}=\bigl\{\sum_{i\ge 0}n_iq^i : n_i\in\{0,\dots,q-1\}\setminus\{a_0\}\bigr\}$ be the set of numbers which have no digit equal to $a_0$ when written in base $q$. For fixed $q$, the number of elements of $\mathcal{A}$ which are less than $x$ is $O(x^{1-\epsilon_q})$, where $\epsilon_q=\log(q/(q-1))/\log q>0$. In particular, $\mathcal{A}$ is a sparse subset of the natural numbers. A set being sparse in this way presents several analytic difficulties if one tries to answer arithmetic questions such as whether the set contains infinitely many primes. Typically we can only show that sparse sets contain infinitely many primes when the set in question possesses some additional multiplicative structure.
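The sparsity claim can be checked by brute force for a small base. The following is a minimal sketch, not part of the argument; the function names are hypothetical, and we take $a_0\ne0$ so that leading zeros play no role. It counts the integers below $q^k$ with no digit $a_0$ and compares $(q-1)^k$ with $(q^k)^{1-\epsilon_q}$.

```python
import math

def has_digit(n, q, a0):
    """Return True if the base-q expansion of n >= 1 contains the digit a0."""
    while n > 0:
        n, d = divmod(n, q)
        if d == a0:
            return True
    return False

def count_missing_digit(x, q, a0):
    """Count 1 <= n < x whose base-q expansion has no digit equal to a0."""
    return sum(1 for n in range(1, x) if not has_digit(n, q, a0))

def sparsity_exponent(q):
    """epsilon_q = log(q/(q-1)) / log q, as defined in the text."""
    return math.log(q / (q - 1)) / math.log(q)

if __name__ == "__main__":
    q, k, a0 = 10, 3, 7  # illustrative small values, assumed a0 != 0
    x = q**k
    print(count_missing_digit(x, q, a0))     # integers 1 <= n < q^k avoiding a0
    print((q - 1)**k - 1)                    # (q-1)^k, minus one for n = 0
    print(x ** (1 - sparsity_exponent(q)))   # x^(1 - epsilon_q), equal to (q-1)^k
```

For $x=q^k$ the identity $(q-1)^k=x^{1-\epsilon_q}$ is exact, which is where the exponent $\epsilon_q$ comes from.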
The set $\mathcal{A}$ has unusually nice structure in that its Fourier transform has a convenient explicit analytic description, and is often unusually small in size. There has been much previous work [1,2,4,5,6,10,13] studying $\mathcal{A}$ and related sets by exploiting this Fourier structure. In particular the work of Dartyge and Mauduit [7,8] shows the existence of infinitely many integers in $\mathcal{A}$ with at most 2 prime factors, this result relying on the fact that $\mathcal{A}$ is well distributed in arithmetic progressions [7,12,14]. We also mention the related work of Mauduit-Rivat [15], who showed that the sum of digits of primes is well-distributed, and the work of Bourgain [3], which showed the existence of primes in the sparse set created by prescribing a positive proportion of the digits.
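We show that there are infinitely many primes in $\mathcal{A}$, and that any polynomial $P$ satisfying suitable local conditions takes infinitely many values in $\mathcal{A}$, provided the base $q$ is sufficiently large (i.e. provided $\mathcal{A}$ is not too sparse). Our proof is based on the circle method, and in particular makes key use of the Fourier structure of $\mathcal{A}$, in the same spirit as the aforementioned works. Somewhat surprisingly, the Fourier structure is sufficient to deduce the existence of primes in $\mathcal{A}$ using only existing exponential sum estimates for the primes, and without having to investigate further bilinear sums.
Theorem 1.1. Let $q>2\,000\,000$, $a_0\in\{0,\dots,q-1\}$ and let $\mathcal{A}=\bigl\{\sum_{i\ge0}n_iq^i : n_i\in\{0,\dots,q-1\}\setminus\{a_0\}\bigr\}$ be the set of numbers with no digit in base $q$ equal to $a_0$. Then for any constant $A>0$ we have
$$\sum_{n<q^k}\Lambda(n)\mathbf{1}_{\mathcal{A}}(n)=\kappa_q(a_0)(q-1)^k+O_A\Bigl(\frac{(q-1)^k}{(\log q^k)^A}\Bigr),$$
where
$$\kappa_q(a_0)=\begin{cases}\dfrac{q}{q-1}, & \text{if } (a_0,q)\ne1,\\[2mm] \dfrac{q(\phi(q)-1)}{(q-1)\phi(q)}, & \text{if } (a_0,q)=1.\end{cases}$$
Here $\Lambda$ denotes the von Mangoldt function and $\phi$ the Euler totient function.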
Thus there are infinitely many primes with no digit $a_0$ when written in base $q$.
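To make the objects concrete, the primes with a missing digit can be listed directly for a small base. This is a purely illustrative sketch with hypothetical function names: Theorem 1.1 itself is only stated for $q>2\,000\,000$, and nothing below is evidence for or against the statement in small bases.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: return all primes p < limit."""
    if limit < 3:
        return []
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    p = 2
    while p * p < limit:
        if sieve[p]:
            for m in range(p * p, limit, p):
                sieve[m] = False
        p += 1
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def base_digits(n, q):
    """Base-q digits of n >= 1, least significant first."""
    digits = []
    while n > 0:
        n, d = divmod(n, q)
        digits.append(d)
    return digits

def primes_missing_digit(limit, q, a0):
    """Primes p < limit whose base-q expansion has no digit equal to a0."""
    return [p for p in primes_below(limit) if a0 not in base_digits(p, q)]

if __name__ == "__main__":
    # Illustrative choice: base 10, excluded digit 7.
    print(primes_missing_digit(100, 10, 7))
```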
There is nothing special about the fact that we sum up to a power of $q$; one could sum $n$ up to $x$ instead of $q^k$ and have $\sum_{n<x}\mathbf{1}_{\mathcal{A}}(n)$ instead of $(q-1)^k$ in the statement.
We have made no particular effort to optimize the lower bound on $q$; it is likely that it could be improved significantly. In particular, a more involved calculation shows that $q>2500$ is sufficient by the same method, whilst it appears that the method of bilinear sums, Harman's sieve and zero density estimates all have the potential to show the existence of primes missing digits when the base is noticeably smaller. One might conjecture that the result would remain true for all $q>2$.
As presented here the bound is ineffective due to the reliance on estimates for primes in arithmetic progressions. However, since these estimates are only used when the modulus is highly composite, Siegel zeros do not in fact play a role, and so the error terms could be replaced by effective ones of size $O((q-1)^k\exp(-ck^{1/2}))$ if desired.
An analysis of our method reveals that in fact one can choose digits $a_0,\dots,a_{k-1}\in\{0,\dots,q-1\}$, and we obtain the same statement for primes $p=\sum_{i=0}^{k-1}p_iq^i$ with $p_i\in\{0,\dots,q-1\}\setminus\{a_i\}$, uniformly over all such choices of $a_0,\dots,a_{k-1}$.
Our results hold for $q$ sufficiently large not only because we require $\mathcal{A}$ to be not too sparse, but also because we separately get superior $L^1$ control on the Fourier transform of $\mathcal{A}$ as $q\to\infty$. A similar feature was present in the earlier work [11].
Theorem 1.2. Let $q>\exp(\exp(2r))$, and let $P\in\mathbb{Z}[X]$ be a polynomial of degree $r$ with lead coefficient $a_r$. Then for any $A>0$ we have
Given a polynomial $P$ it is a straightforward computation to determine whether $S(P)>0$, in which case it takes infinitely many values in $\mathcal{A}$, or whether $S(P)=0$, in which case it takes finitely many values in $\mathcal{A}$. (This is because $P(\mathbb{Z}_p)$ is a disjoint union of open balls and a finite set of points in the $p$-adic topology.) In particular, by Hensel lifting we see that Theorem 1.2 shows that there are infinitely many $\ell$-th powers in $\mathcal{A}$, provided that $q>\exp(\exp(2\ell))$.
Again we have made no particular effort to optimize the lower bound on $q$. It is clear that the statement must require $q$ to grow with $r$, since the main term $q^{k/r}(q-1)^k/q^k$ is only larger than 1 if $q$ is large enough in terms of $r$. Presumably this bound would be improved if one used stronger bounds of Vinogradov type for the Weyl sums which appear, rather than bounds based on Weyl differencing for large $r$, and by less crude numerical bounds. We note that although the implied constant in the error term in the statement of the Theorem depends on the coefficients of $P$, the lower bound on $q$ depends only on the degree.
Moreover, if $b_1,\dots,b_s$ are consecutive integers then the same result holds provided only that $q-s\ge q^{4/5+\epsilon}$ and $q$ is sufficiently large in terms of $\epsilon$.
In the case of $b_1,\dots,b_s$ consecutive with $q-s=q^{4/5+\epsilon}$ we see that Theorem 1.3 shows the existence of primes in a set containing $x^{4/5+\epsilon}$ elements less than $x$. The exponent $4/5$ is ultimately related to the $4/5$ exponent of Lemma 4.2 for an exponential sum over primes, and represents a limit of our basic method. As with Theorem 1.1, one would hope that utilizing Type I-II sums and Harman's sieve would extend this to sets of smaller density.
The conclusion of Theorem 1.3 holds in the case $q=10^8$ and $s=10$, so one can choose $\{b_1,\dots,b_{10}\}=\{0, 11111111, 22222222,\dots,99999999\}$. Thus there are infinitely many prime numbers with no string of 15 consecutive base 10 digits being the same. (Again, we expect that 15 could be reduced with slightly more effort.) An analogous statement for the set $\mathcal{B}$ for polynomial values also holds, but in the more restrictive region $0<s<q^{1/(r2^r)-\epsilon}$ for arbitrary $b_1,\dots,b_s$, or $q-s\ge q^{1-1/(r2^r)+\epsilon}$ for consecutive $b_1,\dots,b_s$.
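The passage from base $10^8$ to runs of decimal digits rests on a simple observation: any window of 15 consecutive decimal positions contains a full aligned block of 8 positions, so a run of 15 equal decimal digits forces some base-$10^8$ digit to lie in $\{0,11111111,\dots,99999999\}$. The sketch below, with hypothetical function names, checks this implication on sample integers (not on primes specifically).

```python
import random

BASE = 10**8
# The ten excluded base-10^8 digits 0, 11111111, ..., 99999999.
EXCLUDED = {d * 11111111 for d in range(10)}

def base_digits(n, q):
    """Base-q digits of n >= 1, least significant first."""
    digits = []
    while n > 0:
        n, d = divmod(n, q)
        digits.append(d)
    return digits

def avoids_excluded_blocks(n):
    """True if no base-10^8 digit of n is one of the ten excluded values."""
    return all(d not in EXCLUDED for d in base_digits(n, BASE))

def longest_decimal_run(n):
    """Length of the longest run of equal consecutive decimal digits of n."""
    s = str(n)
    best = run = 1
    for prev, cur in zip(s, s[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

if __name__ == "__main__":
    random.seed(0)
    for _ in range(1000):
        n = random.randrange(1, 10**40)
        if avoids_excluded_blocks(n):
            # Avoiding the excluded blocks rules out runs of length 15.
            assert longest_decimal_run(n) < 15
    print("implication verified on the sample")
```

The converse does not hold: an integer can contain an aligned block such as 77777777 without having a run of 15 equal digits, which is why the set in Theorem 1.3 is smaller than the set of integers with no such run.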

Notation
We use $e(x)=e^{2\pi ix}$ as the complex exponential and $\|x\|=\inf_{n\in\mathbb{Z}}|x-n|$ to denote the distance to the nearest integer. We will use various expressions of the form $\min(A,\|\alpha\|^{-1})$, which are interpreted to take the value $A$ if $\|\alpha\|=0$. We use $n\sim N$ to abbreviate $n\in[N,2N)$. Any implied constants in asymptotic notation $\ll$ or $O(\cdot)$ are allowed to depend on the base $q$ and, when dealing with polynomials as in Theorem 1.2, on the polynomial $P$, but on no other quantity unless explicitly indicated by a subscript. Outside of Section 4 all quantities should be thought of in the limit $k\to\infty$. In particular, $k$ will implicitly be assumed to be larger than any fixed constant.

Outline
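We give an informal sketch of the overall outline of the proof, which is essentially an application of the Hardy-Littlewood circle method. We let $\hat F_X$ be the Fourier transform (over $\mathbb{Z}$) of the set $\mathcal{A}$ restricted to $\{1,\dots,X\}$. Thus for $X=q^k$ we have
$$\hat F_{q^k}(\theta)=\sum_{n<q^k}\mathbf{1}_{\mathcal{A}}(n)e(n\theta)=\prod_{i=0}^{k-1}\Bigl(\sum_{\substack{0\le n_i<q\\ n_i\ne a_0}}e(n_iq^i\theta)\Bigr).$$
Here we have written $n=\sum_{i=0}^{k-1}n_iq^i$. It is this factorization of $\hat F_{q^k}$ and the fact that the sum over $n_i$ is almost a geometric series which allows us very good Fourier control over $\mathcal{A}$. By Fourier inversion on $\mathbb{Z}/q^k\mathbb{Z}$,
$$\sum_{n<q^k}\Lambda(n)\mathbf{1}_{\mathcal{A}}(n)=\frac{1}{q^k}\sum_{0\le a<q^k}\hat F_{q^k}\Bigl(\frac{a}{q^k}\Bigr)S_{\Lambda,q^k}\Bigl(\frac{-a}{q^k}\Bigr),$$
where $S_{\Lambda,q^k}(\theta)=\sum_{n<q^k}\Lambda(n)e(n\theta)$ is the exponential sum over primes.
We split the contribution up depending on whether $a/q^k$ is close to a rational with small denominator or not. This distinguishes between those $a$ for which $S_{\Lambda,q^k}(a/q^k)$ is large and those for which it is not. It turns out that $\hat F_{q^k}(a/q^k)$ is large if $a$ is 'close' to a number with few non-zero base-$q$ digits, but these are somewhat rare and 'spread out' except when $a/q^k$ is close to a rational with denominator being a small power of $q$, and so it turns out that this decomposition is adequate for describing $\hat F_{q^k}$ as well as $S_{\Lambda,q^k}$.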
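The factorization of the Fourier transform over digits is easy to confirm numerically for small parameters. The sketch below takes the transform to be $\sum_{n<q^k}\mathbf{1}_{\mathcal{A}}(n)e(n\theta)$ with $n$ written using exactly $k$ digits; this normalisation, the sign convention and the function names are assumptions of the sketch, and we take $a_0\ne0$ so that padding with leading zeros is harmless.

```python
import cmath

def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def fhat_direct(theta, q, k, a0):
    """Sum e(n*theta) over 0 <= n < q^k whose k base-q digits all differ from a0."""
    total = 0
    for n in range(q**k):
        m, allowed = n, True
        for _ in range(k):
            m, d = divmod(m, q)
            if d == a0:
                allowed = False
                break
        if allowed:
            total += e(n * theta)
    return total

def fhat_product(theta, q, k, a0):
    """Product over digit positions i of the sum over allowed digits d of e(d*q^i*theta)."""
    result = 1
    for i in range(k):
        result *= sum(e(d * q**i * theta) for d in range(q) if d != a0)
    return result

if __name__ == "__main__":
    q, k, a0, theta = 5, 4, 2, 0.123  # small illustrative parameters
    print(abs(fhat_direct(theta, q, k, a0) - fhat_product(theta, q, k, a0)))
```

The direct sum has $(q-1)^k$ terms whereas the product has only $k(q-1)$, which is what makes pointwise and averaged bounds on the transform tractable.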
If $\mathcal{D}$ is the set of $a$ such that $a/q^k=\ell/d+\beta$ for some integers $(\ell,d)=1$ and some $\beta\in\mathbb{R}$ with $d|\beta|$ of size $D$, we use an $L^\infty$-$L^1$ bound to show their contribution is at most
$$\frac{1}{q^k}\sup_{a\in\mathcal{D}}\Bigl|S_{\Lambda,q^k}\Bigl(\frac{a}{q^k}\Bigr)\Bigr|\sum_{a\in\mathcal{D}}\Bigl|\hat F_{q^k}\Bigl(\frac{a}{q^k}\Bigr)\Bigr|.$$
One can save a small power of $D$ over the trivial bound on $S_{\Lambda,q^k}(a/q^k)$ for $a\in\mathcal{D}$. By using a large-sieve type argument (and the analytic description of $\hat F$) we show equidistribution for a truncated version $\hat F_J$ of $\hat F_{q^k}$, where $J=\#\mathcal{D}$. We then use the explicit analytic description of $\hat F_{q^k}$ to obtain a final bound which is unusually strong. In particular, we make important use of the averaging over different $\beta$. This bound loses only a small power of $D$ over the size of the largest individual terms in the sum. Crucially this power decreases to 0 as $q\to\infty$, whilst the power saving in $S_{\Lambda,q^k}$ was independent of $q$, and so we have an overall saving of a small power of $D$ if $q$ is sufficiently large. This saving shows that these 'minor arc' contributions when $D$ is large are negligible.
Thus only those $a/q^k$ which are very close to a rational (i.e. $d|\beta|$ is small) make a noticeable contribution. In this case the problem simply reduces to estimating primes and elements of $\mathcal{A}$ separately in short intervals and arithmetic progressions. For primes this is well known, whilst for the set $\mathcal{A}$ this follows from a suitable $L^\infty$ bound on $\hat F$.
After writing this paper, the author discovered that very similar ideas appeared earlier in the literature, notably in [15,12,14,3]. For simplicity we give an essentially self-contained proof, but emphasize to the reader that many of the lemmas appearing here are not new. It appears possible (at least in the case when the base $q$ is large) that an argument similar to the one here might simplify or extend other arguments in the study of digit-related functions.
Much of the previous work on estimating correlations of primes with digit-related functions relied on exploiting a certain property of the Fourier transform described in [16] as the 'carry property', which often allowed one to simplify bilinear expressions so that the Fourier transform only depended on the lower-order digits. This feature is not present in our work.

Exponential sums for primes and polynomials
We first collect some results for exponential sums for primes and polynomials. The bounds here are well-known, but we give essentially complete proofs since they differ slightly from some standard references.
Proof. If $Nd|\beta|<1/2$ then we let $n=n_0+dn_1$ for non-negative integers $n_0,n_1$ with $n_0<d$ and $n_1<N/d$. If $n_0\ne0$ then
Thus the terms with $n_0\ne0$ contribute a total
The terms with $n_0=0$ contribute
Here we have used the fact that since $n_0=0$ and we sum over $n\ge1$ we must have $n_1\ge1$.
We now consider the case N d|β| > 1/2. We let n = n 0 + dn

and this gives a bound
Putting these bounds together gives
Proof. From [9, (6), Page 142], taking $f(n)=e(n\alpha)$ we have that for any choice of
The sum over $r$ is clearly $\ll\min(x/t,\|t\alpha\|^{-1})$ and the sum over $m$ is similarly $\ll\min(M,\|(j-k)\alpha\|^{-1})$. Putting $t$ and $j-k$ into dyadic intervals and applying Lemma 4.1 to the resulting sums (or the trivial bound when $j=k$) gives a bound
Choosing $U=V=x^{2/5}$ and simplifying the terms then gives the result.
Lemma 4.3. Let $P\in\mathbb{Z}[X]$ be an integer polynomial of degree $r\ge2$ with lead coefficient $a_r$. Let $\alpha\in\mathbb{R}$ be such that $a_rr!\alpha=a/d+\beta$ with $(a,d)=1$ and $|\beta|<1/d^2$. Then for any constant $\epsilon>0$ we have
by applying Lemma 4.1. Writing this bound as $x^rB/Z+x^r(\log x)^{(r-1)^2}/B$ and choosing $B=Z^{1/2}$ then gives the result, noting that $(\log x)^{(r-1)^2/2^{r-1}}<\log x$.

Fourier analysis
We now establish in turn several properties of the function $\hat F_{q^k}$, which are the key ingredient in our result.
Lemma 5.1 ($L^1$ bound). There exists a constant $C_q\in[1/\log q,\,1+3/\log q]$ such that
Proof. We expand out the definition of $\hat F_{q^k}$, and let $n=\sum_{i=0}^{k-1}n_iq^i$ be the base-$q$ expansion of $n$.
The sum over $n_i$ is a sum over all values in $\{0,\dots,q-1\}\setminus\{a_0\}$, and so is bounded by
Here we used a small computation to verify $\sum_{1\le t\le(q-1)/2}t^{-1}\le\log q$ for all integers $q<20$, whilst for $q\ge20>2/(\log2-\gamma)$ (where $\gamma$ is Euler's constant), we have $\sum_{1\le t\le(q-1)/2}t^{-1}\le\log q-\log2+\gamma+2/q\le\log q$. (This bound is only relevant to the final lower bound on $q$; for a qualitative statement a bound $O(\log q)$ suffices.)
Lemma 5.2 (Large sieve estimate). We have
Here $C_q$ is the constant described in Lemma 5.1.
Proof. We have that
We note that the fractions $\ell/d+\theta+\epsilon$ with $(\ell,d)=1$, $d<2D$ and $|\epsilon|<1/(10D^2)$ are separated from one another by $\gg1/D^2$. Thus
We note that, writing $n=\sum_{i=0}^{k-1}n_iq^i$, we have
Thus, as in Lemma 5.1, we have
and we have the same bound for $|\hat F_{q^k}(t)|$ but without the $q^k$ factor. We let $t=\sum_{i=1}^{k}t_iq^{-i}+\epsilon$ for some $t_1,\dots,t_k\in\{0,\dots,q-1\}$ and $\epsilon\in[0,1/q^k)$. We see that, as in Lemma 5.1, we have
Putting this all together then gives the result.
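The elementary inequality on the partial harmonic sum quoted in the proof of Lemma 5.1 can be confirmed numerically. The sketch below (function name hypothetical) checks $\sum_{1\le t\le(q-1)/2}t^{-1}\le\log q$ directly for $q<20$, and the chain $\le\log q-\log2+\gamma+2/q\le\log q$ over a finite range of $q\ge20$; the finite range is only a sanity check, not a proof for all $q$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def half_harmonic(q):
    """Sum of 1/t over integers t with 1 <= t <= (q-1)/2."""
    return sum(1.0 / t for t in range(1, (q - 1) // 2 + 1))

if __name__ == "__main__":
    # Small bases: direct verification.
    for q in range(2, 20):
        assert half_harmonic(q) <= math.log(q)
    # Larger bases: the two-step inequality from the text.
    for q in range(20, 5000):
        middle = math.log(q) - math.log(2) + EULER_GAMMA + 2.0 / q
        assert half_harmonic(q) <= middle <= math.log(q)
    # Threshold 2/(log 2 - gamma) beyond which the second step holds.
    print(2.0 / (math.log(2) - EULER_GAMMA))
```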
where $C_q$ is the constant described in Lemma 5.1 and
$$\alpha_q=\frac{\log\bigl(C_q\frac{q}{q-1}\log q\bigr)}{\log q}.$$
Proof. The result follows immediately from Lemma 5.1 if $B>q^k$, so we may assume $B<q^k$. For any integer $k_1\in[0,k]$ we have
Using this and the trivial bound $|\hat F_{q^j}(\theta)|\le(q-1)^j$, for $k_1+k_2\le k$ we have that

Substituting this bound gives
We choose $k_1$ minimally such that $q^{k_1}>B$, and extend the inner sum to $|\eta|<q^{k_1}$. Applying Lemma 5.1 to the inner sum, and then Lemma 5.2 to the sum over $d,\ell$ gives
We choose $k_2=\min(k-k_1,\lfloor2\log D/\log q\rfloor)$. We see that
Combining these bounds gives the result.
Lemma 5.4 ($L^\infty$ bound). Let $d<q^{k/3}$ be of the form $d=d_1d_2$ with $(d_1,q)=1$ and $d_1\ne1$, and let $|\epsilon|<1/(2q^{2k/3})$. Then for any integer $\ell$ coprime with $d$ we have
for some constant $c_q>0$ depending only on $q$.

Minor arcs
We now use the exponential sum estimates from the previous sections to show that when $\alpha$ is 'far' from a rational with small denominator the quantities $\hat F_{q^k}(\alpha)S_{\Lambda,q^k}(-\alpha)$ and $\hat F_{q^k}(\alpha)S_{P,q^k}(-\alpha)$ are typically small in absolute value.
Here $\alpha_q$ is the constant described in Lemma 5.3.

Proof. By Lemma 5.3 we have that if
Putting these together gives
Recalling that $D^2B<q^k$ and $DB<q^k/D_0$ by assumption, we see that this is , and the first term clearly dominates the second.
By partial summation we see that we obtain the same bound for $S_{\Lambda,q^k}(\alpha+O(1/q^k))$ as the bound for $S_{\Lambda,q^k}(\alpha)$ given in Lemma 4.2. Thus in the case $|\eta|\ll1$ we obtain the well-known bound

This gives
Recalling that $1\ll D\ll D_0\ll q^{k/2}$ then gives the result.
Here $\alpha_q$ is the constant described in Lemma 5.3.

Proof. By Lemma 5.3 we have that if
By Lemma 4.3 we have
Putting these together gives
Recalling that $D^2B<q^k$ and $DB<q^k/D_0$ by assumption, we see that this is , and the first term clearly dominates the second. As in Lemma 6.1, in the case where we instead sum over $|\eta|\ll1$, we obtain the same bound as (6.1) with $B$ replaced by 1, since by partial summation we obtain the same bound for $S_{P,q^k}(\alpha+O(1/q^k))$ as the bound for $S_{P,q^k}(\alpha)$ given in Lemma 4.3. This gives
Recalling that $1\ll D\ll D_0\ll q^{k/2}$ then gives the result.

Major Arcs
We now consider $\hat F_{q^k}(\alpha)S_{\Lambda,q^k}(-\alpha)$ and $\hat F_{q^k}(\alpha)S_{P,q^k}(-\alpha)$ when $\alpha$ is close to a rational with small denominator.
Proof. This follows immediately from Lemma 5.4, using the trivial bound for the exponential sum involving primes or polynomials.
Lemma 7.2. Let $A>0$. Then for $D,B<(\log q^k)^A$ and $d>q$ we have
Proof. If $b\ne0$ then by the prime number theorem in arithmetic progressions in short intervals and partial summation we have
Thus the terms with $b\ne0$ contribute
Here we used the trivial bound that $|\hat F_{q^k}(\theta)|\le(q-1)^k$ for all $\theta$.
Using the prime number theorem in arithmetic progressions again, we see that
Thus we may restrict to $d|q$, since all other such $d$ are not square-free. Letting $\ell'/q=\ell/d$, we see that the terms with $b=0$ and $d|q$ contribute
If $a\ne a_0$ then the sum over $m$ is $(q-1)^{k-1}$ since there are $(q-1)$ choices for each digit of $m$ apart from the final one, which must be $a$. If $a=a_0$ then the sum is empty. Thus
Proof. For $y\ll x^{1-1/(2r)}$ we have
Thus the values of $P(n)<q^k$ are well-distributed in arithmetic progressions modulo $d<q^J$ and in short intervals of length $\gg q^{k(1-1/(2r))}$. Therefore by partial summation we have that for $b\ne0$ and $d<D$
In particular, using the trivial bound $|\hat F_{q^k}(\theta)|\le(q-1)^k$, we have
Thus we may restrict our attention to $b=0$. Rewriting $\ell/d$ as $\ell'/q^J$ in the sum, we see that these terms are equal to
Putting $n,m$ into residue classes $\pmod{p^J}$ then gives the result.
Proof of Theorem 1.1. By Dirichlet's approximation theorem, for any choice of $0<D_0$ and any $0\le a<q^k$ there exist integers $(\ell,d)=1$ with $d<D$ and a real $|\beta|<1/(DD_0)$ such that
$$\frac{a}{q^k}=\frac{\ell}{d}+\beta.$$
We see that $q^k\ell/d+q^k\beta\in\mathbb{Z}$. We use Lemmas 7.2 and 7.1 to estimate the contribution when $\max(d,q^k|\beta|)<(\log q^k)^A$, and use Lemma 6.1 for the remaining cases. This gives
Choosing $D_0=q^{k/2}$ we see that the error term is $O_B((q-1)^k(\log q^k)^{-B})$ provided $\alpha_q<1/5$ and $A$ is chosen such that $A>(B+5)/(1/5-\alpha_q)$. We recall from Lemmas 5.3 and 5.1 that
$$\alpha_q=\frac{\log\bigl(C_q\frac{q}{q-1}\log q\bigr)}{\log q},\qquad C_q\le1+\frac{3}{\log q}.$$
This clearly tends to zero as $q\to\infty$. A calculation shows that $\alpha_q<0.198$ for $q>2\,000\,000$. This gives the result.
Proof of Theorem 1.2. The proof is essentially identical to that of Theorem 1.1 above. We choose $D_0=q^{k/2}$, and split our summation according to $\ell,d,\beta$ such that
$$\frac{a\,a_rr!}{q^k}=\frac{\ell}{d}+\beta.$$
We use Lemma 6.2 in place of 6.1 for max(d, |β|q k ) > q J and Lemma 7. Since D 0 = q k/2 , we see that provided α q < 2 −r /r the error term is small. In particular there is some quantity S such that for any such choice of J < c 1/2 Thus, if α q < 2 −r /r we see that S J converges to S as J → ∞ and that 1 q k 0≤a<q kF q k a q k S P,q k −a q k = S a 1/r r q k/r (q − 1) k q k + O q k/r exp(c 1/2 q k 1/2 ) .
Since $\alpha_q\to0$ as $q\to\infty$, we see that $\alpha_q<2^{-r}/r$ for $q>q_0(r)$. From the bound $C_q\le1+3/\log q$, we see that this holds for $q\ge\exp(\exp(2r))$. This completes the proof.
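The two numerical thresholds on $q$ can be sanity-checked with a short computation. The sketch below assumes $\alpha_q=\log\bigl(C_q\frac{q}{q-1}\log q\bigr)/\log q$ (the $s=1$ case of the expression for $\alpha_{q,s}$ given in the section on Theorem 1.3) together with the upper bound $C_q\le1+3/\log q$ from Lemma 5.1. The function name is hypothetical, and it takes $L=\log q$ as input so that $q=\exp(\exp(2r))$ does not overflow.

```python
import math

def alpha_upper(log_q):
    """Upper bound for alpha_q as a function of L = log q.

    Assumes alpha_q = log(C_q * q/(q-1) * log q) / log q with C_q <= 1 + 3/log q.
    """
    c_q = 1.0 + 3.0 / log_q
    ratio = 1.0 / (1.0 - math.exp(-log_q))  # equals q / (q - 1)
    return math.log(c_q * ratio * log_q) / log_q

if __name__ == "__main__":
    # Threshold used for primes: q = 2,000,000.
    print(alpha_upper(math.log(2_000_000)))
    # Threshold used for polynomials of degree r: q = exp(exp(2r)).
    for r in range(2, 8):
        print(r, alpha_upper(math.exp(2 * r)), 2.0 ** (-r) / r)
```

Under these assumptions the bound is decreasing in $q$, so checking the endpoint of each range is enough for this sanity check.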

Modifications for Theorem 1.3
In this section we sketch the modifications required to establish Theorem 1.3, leaving the precise details to the interested reader. The results of Section 4 remain unchanged. In Lemma 5.1, instead of equation (5.1) we have
If the $b_i$ are consecutive integers then this can be improved to $\min(2q,1/\|q^it\|)$. Thus we can instead take $C_q=C_{q,s}=1+(2+s)/\log q$ in general, or $C_{q,s}=2+2/\log q$ if the $b_i$ are consecutive. Lemma 5.2 remains unchanged whilst in Lemma 5.3 all occurrences of $q-1$ should be replaced by $q-s$. In particular, we have
$$\alpha_q=\alpha_{q,s}=\frac{\log\bigl(C_{q,s}\frac{q}{q-s}\log q\bigr)}{\log q}.$$
With these values of $\alpha_{q,s}$ and $C_{q,s}$ in place of $\alpha_q$ and $C_q$, the arguments and statements of Lemmas 5.4, 6.1, 6.2, 7.1, 7.2 and 7.3 all go through as before, except that any occurrence of $q-1$ must be replaced by $q-s$. In Lemma 7.1 we made use of the fact that there were two consecutive digits which were not excluded; clearly this still holds in the cases considered by Theorem 1.3.
The final proofs of Theorems 1.1 and 1.2 then work as before, provided that the constraints $\alpha_{q,s}<1/5$ or $\alpha_{q,s}<r^{-1}2^{-r}$ hold. If the $b_i$ are not necessarily consecutive then we take $C_{q,s}=1+(2+s)/\log q$ and see that if $q$ is sufficiently large in terms of $\epsilon$ and $s<q/2$ then $\alpha_{q,s}\le\frac{\log s}{\log q}+\epsilon$.
If the $b_i$ are consecutive then we can take $C_{q,s}=2+2/\log q$, and see that for $q$ sufficiently large in terms of $\epsilon$ we have $\alpha_{q,s}\le\frac{\log(q/(q-s))}{\log q}+\epsilon$.