Universal adversarial examples and perturbations for quantum classifiers

Abstract. Quantum machine learning explores the interplay between machine learning and quantum physics, which may lead to unprecedented perspectives for both fields. In fact, recent works have shown strong evidence that quantum computers could outperform classical computers in solving certain notable machine learning tasks. Yet, quantum learning systems may also suffer from a vulnerability problem: adding a tiny, carefully crafted perturbation to the legitimate input data can cause the systems to make incorrect predictions at a notably high confidence level. In this paper, we study the universality of adversarial examples and perturbations for quantum classifiers. Through concrete examples involving classifications of real-life images and quantum phases of matter, we show that there exist universal adversarial examples that can fool a set of different quantum classifiers. We prove that, for a set of k classifiers with each receiving input data of n qubits, an O(ln(k)/2^n) increase of the perturbation strength is enough to ensure a moderate universal adversarial risk. In addition, for a given quantum classifier, we show that there exist universal adversarial perturbations, which can be added to different legitimate samples to make them adversarial examples for the classifier. Our results reveal the universality perspective of adversarial attacks for quantum machine learning systems, which would be crucial for practical applications of both near-term and future quantum technologies in solving machine learning problems.

In this Supplementary Material, we provide more details about the proofs of the two theorems, structures of the quantum classifiers, quantum encoding for classical data, training and attacking processes, and the algorithms for obtaining universal adversarial examples and perturbations.
A. PROOF FOR THEOREM 1

In addition to the notations in the main text, we first give further notations and definitions to formulate the problem.
We also introduce the following Lemma A1, which has already been obtained in Ref. [1]. Here, we recap the statement and sketch the proof for completeness.
Lemma A1. Consider a quantum classifier C_i that takes ρ ∈ SU(d), sampled according to the Haar measure µ(·), as input and has a misclassified set E_i. Suppose the adversarial input state ρ′ is restricted to satisfy D_HS(ρ, ρ′) ≤ ε with respect to the clean data ρ. Then, to guarantee an adversarial risk R_i, the perturbation strength ε is bounded below by Ineq. (S2).
To prove Lemma A1, we further introduce the following two lemmas together with their brief proofs.
Lemma A2 (Theorem 3.7 in [2]). Suppose the input samples are chosen from H with metric D(·,·) such that H belongs to an (α, β)-normal Lévy family. For each classifier C_i with risk µ(E_i), consider an adversarial perturbation ρ → ρ′, with ρ, ρ′ ∈ H and D(ρ, ρ′) ≤ ε. If the adversarial risk µ(E_{i,ε}) is guaranteed to be at least R_i, then ε must be bounded below by Ineq. (S3).
Proof. We decompose the perturbation as ε = ε_1 + ε_2. First, construct an ε_1 such that µ(E_i) > α e^{−β ε_1² d}, and consider separately the two cases µ(E_i) ≤ 1/2 and µ(E_i) > 1/2.
Proof. First, apply the isoperimetric inequality [3,5], which states that for H′ ⊆ H with dim(H) = d and µ(H′) ≥ 1/2, the measure of the ε-expansion of H′ obeys Ineq. (S4), where v runs over all unit tangent vectors in H. Combining (S4) and (S1), we can deduce (S5). According to [6], for SU(d) equipped with the Hilbert-Schmidt metric, the curvature quantity entering (S5) is evaluated with D_HS as the metric and v as any unit tangent vector in SU(d). Then, from [7], G(v, v) = 1 and therefore R(H) = d/2. This allows us to rewrite (S5) as (S6). Combining (S3) and (S6) shows that, for a classifier C_i with misclassified set E_i that takes ρ ∈ SU(d) as input under the Hilbert-Schmidt metric, to guarantee an adversarial risk of at least R_i, the adversarial perturbation ε is bounded below exactly as stated in Lemma A1. Hence, we have completed the proof of Lemma A1.

Now, we continue to prove Theorem 1 in the main text by using Ineq. (S2). We consider a set of quantum classifiers C_i, i = 1, ..., k, with misclassified sets E_i, i = 1, ..., k. Our goal is to bound µ(E_ε) for a given perturbation. Consider the set E_set = ∩_{i=1}^{k} E_i of original data that is misclassified by all classifiers in the set. If we assume the additional condition E_set ≠ ∅, then we can construct a quantum classifier C* that misclassifies all ρ ∈ E_set and correctly classifies all other states in H. Applying (S2) to this classifier C*, we deduce that, to guarantee a risk larger than R_0, the perturbation is bounded below by Ineq. (S7). If the additional condition is not satisfied, i.e., ∩_{i=1}^{k} E_i = ∅, then we cannot directly construct such a quantum classifier C*. In this case, we notice that µ(∩_{i=1}^{k} E_{i,ε}) ≥ 1 − Σ_{i=1}^{k} [1 − µ(E_{i,ε})], so µ(E_ε) can be bounded below as in Ineq. (S8). Hence, if we attach a perturbation that ensures µ(E_{i,ε}) ≥ R_{0,i} = (k − 1 + R)/k for each classifier C_i, then the universal adversarial risk is bounded below by R. Replacing R and µ(E_i) in (S2) with R_0 and µ(E)_min (the minimum of µ(E_i) over the classifier set), we finish the proof by arriving at Ineq. (S9). It is worthwhile to mention that Ineq. (S9) holds regardless of whether the additional assumption E_set ≠ ∅ is satisfied.
When E_set ≠ ∅ is satisfied, the problem reduces to the case of the single classifier C*. Yet, we cannot tell which inequality, Ineq. (S7) or Ineq. (S9), gives the tighter bound, as we have no information about the values of µ(E_set) and µ(E)_min. In our numerical simulations, among the test set containing 100 ground states of the Ising model, we find five samples that are misclassified by all eight quantum classifiers without adding any perturbation. This indicates that the additional condition can indeed be satisfied in practice.

B. PROOF FOR THEOREM 2
In this section, we provide the details of the proof of Theorem 2, with some further discussion. Following the definitions in the main text, the adversarial perturbation operator Ê is unitary, and hence Ê⁻¹ is also unitary. By the invariance of the Haar measure under unitary transformations, we obtain Eq. (S10). This indicates that the adversarial risk remains the same after we perform the same unitary perturbation Ê on every input quantum state ρ ∈ H. We randomly pick ρ ∈ H according to the Haar measure. For each selection, the probability that a misclassification occurs is µ(E_Ê) = µ(E). Therefore, we can regard each selection as a Bernoulli random variable that equals 1 when a misclassification occurs and 0 otherwise. Applying Hoeffding's inequality for independent Bernoulli random variables, we obtain Ineq. (S11), which holds with probability at least 1 − δ (δ > 0). This proves the first part of Theorem 2 in the main text.
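To make the Hoeffding step concrete, the following sketch estimates µ(E) from independent Bernoulli trials and compares the empirical deviation against the standard two-sided Hoeffding radius sqrt(ln(2/δ)/(2N)). The misclassification measure used here is a hypothetical number for illustration, not a value from our simulations.

```python
import numpy as np

def hoeffding_radius(n_samples: int, delta: float) -> float:
    """Two-sided Hoeffding bound for [0,1]-valued variables: with probability
    at least 1 - delta, |empirical mean - true mean| <= sqrt(ln(2/delta)/(2N))."""
    return float(np.sqrt(np.log(2.0 / delta) / (2.0 * n_samples)))

rng = np.random.default_rng(0)
mu_true = 0.2                        # hypothetical misclassification measure mu(E)
n = 10_000                           # number of Haar-random input samples
samples = rng.random(n) < mu_true    # Bernoulli indicators of misclassification
mu_hat = samples.mean()              # empirical estimate of mu(E)
radius = hoeffding_radius(n, delta=0.01)
```

With N = 10,000 samples and δ = 0.01, the radius is about 0.016, so the empirical frequency pins down µ(E) to within a few percent.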
To obtain a lower bound for µ(E), we further resort to the no-free-lunch theorem [8] and its reformulation in the context of quantum machine learning [9,10]. Unlike Ref. [9], where quantum inputs and outputs are considered, our discussion is restricted to classification problems in which the outputs are classical labels. To this end, we give a loose estimate of the lower bound of µ(E) under some additional constraints motivated by our numerical simulations.
In our consideration, the quantum classifier takes two steps to classify input samples. In the first step, the classifier takes a quantum state ρ ∈ H as input and applies a variational circuit to arrive at an output state ρ_out belonging to a d-dimensional Hilbert space. In the second step, the classifier outputs a label s ∈ {0, 1, ..., d − 1} according to the largest probability among ⟨0|ρ_out|0⟩, ⟨1|ρ_out|1⟩, ..., ⟨d − 1|ρ_out|d − 1⟩. Based on this, our analysis of µ(E) leads to an average performance bound for the classifier [9].
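The second, readout step can be sketched in a few lines (a generic illustration; the variational circuit that produces ρ_out is not modeled here, and the toy density matrix is ours):

```python
import numpy as np

def classify(rho_out: np.ndarray) -> int:
    """Assign the label s with the largest basis-state population <s|rho_out|s>."""
    probs = np.real(np.diag(rho_out))  # diagonal of the output density matrix
    return int(np.argmax(probs))

# Toy 2x2 output state with <1|rho|1> = 0.7, so the predicted label is 1.
rho = np.array([[0.3, 0.1j],
                [-0.1j, 0.7]])
label = classify(rho)
```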
In the first step from ρ to ρ_out, the quantum ground truth is defined as a unitary process t. Without loss of generality, we may restrict our discussion to quantum pure states. The training set is rewritten as S_N = {(|ψ_1⟩, |φ_1⟩), ..., (|ψ_N⟩, |φ_N⟩)}, and the classifier learns a hypothesis operator V, a unitary process satisfying t|ψ_i⟩ = V|ψ_i⟩ = |φ_i⟩ on the training set. The quantum risk function R_t(V) is defined as in Eq. (S12) [11],
where ||A||_1 denotes the trace norm of a matrix A [12]. The quantum no-free-lunch theorem can now be stated as follows.
Lemma B1 (Quantum No Free Lunch). The quantum risk function of a classification task, averaged over the selection of the quantum ground truth t and the training set S_N with respect to the Haar measure, is bounded below by Ineq. (S13).
The proof of this lemma and more discussion of its implications are provided in Refs. [9,10]. Here, we use this lemma to obtain Ineq. (4) in the main text. Noting that the trace-distance integrand in Eq. (S12) is at most 1 for any pair of quantum states, the contribution to the risk of every ρ = |ψ⟩⟨ψ| ∈ E is at most 1. This means that R_t(V) ≤ 1, regardless of whether the quantum data is correctly classified or not.
Then we come to the case where a quantum input is classified correctly. Without loss of generality, we can assume that the ground truth gives the true label with output state t|ψ⟩ = |i⟩. Since the quantum data is correctly predicted, the population of |i⟩ is the largest among the d basis states, so ⟨i|V|ψ⟩⟨ψ|V†|i⟩ ≥ 1/d. From this inequality, we obtain the fidelity F(V|ψ⟩, t|ψ⟩ = |i⟩) ≥ 1/d. We can then utilize the relation (S14) between the fidelity and the trace norm, where ρ and σ denote arbitrary quantum states; for pure states with F = |⟨ψ|φ⟩|², it reads (1/2)||ρ − σ||_1 = √(1 − F). Hence, for correctly classified quantum data, the integrand is bounded by √(1 − 1/d). As a result, the integral in Eq. (S12) is bounded as in Ineq. (S15). Combining (S13) and (S15), we obtain a lower bound (S16) of µ(E) averaged over the ground truth t and the training set S_N. This completes the proof of Theorem 2.
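The pure-state relation (1/2)||ρ − σ||_1 = √(1 − F) invoked above can be verified numerically with standard linear algebra; the random states below are purely illustrative and have nothing to do with the paper's classifiers:

```python
import numpy as np

def trace_distance(rho: np.ndarray, sigma: np.ndarray) -> float:
    """(1/2)||rho - sigma||_1 via the eigenvalues of the Hermitian difference."""
    eigs = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigs)))

rng = np.random.default_rng(1)
d = 4
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
phi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi /= np.linalg.norm(phi)

F = np.abs(np.vdot(psi, phi)) ** 2            # fidelity of two pure states
td = trace_distance(np.outer(psi, psi.conj()),
                    np.outer(phi, phi.conj()))
# For pure states the identity (1/2)||rho - sigma||_1 == sqrt(1 - F) is exact.
```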
I. The structures of quantum classifiers

In this variational circuit model, we first prepare the (m+n)-qubit input state |ψ_in⟩ ⊗ |1⟩^⊗m, where |ψ_in⟩ is an n-qubit state that encodes the complete information of the input sample to be classified. We then apply a unitary transformation composed of p layers of interleaved operations. Each of the p layers contains two rotation units, each performing arbitrary Euler rotations on the Bloch sphere, and an entangler unit consisting of CNOT gates between each pair of neighboring qubits. The adjustable parameters are the rotation angles, collectively denoted as Θ. This generates the variational state given in the display, where U_i denotes the unitary operation of the i-th layer and U_ent represents the unitary operation generated by the entangler unit.
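One such layer can be sketched in plain NumPy with Euler angles as the trainable parameters. The Rz-Ry-Rz decomposition, the qubit ordering, and the function names are illustrative choices of ours, not the paper's exact conventions:

```python
import numpy as np
from functools import reduce

def ry(t: float) -> np.ndarray:
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def rz(t: float) -> np.ndarray:
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

def euler(a: float, b: float, c: float) -> np.ndarray:
    """Arbitrary single-qubit rotation as Rz(a) Ry(b) Rz(c)."""
    return rz(a) @ ry(b) @ rz(c)

def cnot_chain(n: int) -> np.ndarray:
    """Entangler unit: CNOTs between neighboring qubits (0->1, 1->2, ...),
    with qubit 0 as the most significant bit of the basis index."""
    U = np.eye(2 ** n, dtype=complex)
    for c in range(n - 1):
        layer = np.zeros((2 ** n, 2 ** n), dtype=complex)
        for basis in range(2 ** n):
            bits = [(basis >> (n - 1 - q)) & 1 for q in range(n)]
            if bits[c] == 1:
                bits[c + 1] ^= 1            # flip the target qubit
            out = sum(b << (n - 1 - q) for q, b in enumerate(bits))
            layer[out, basis] = 1.0
        U = layer @ U
    return U

def layer_unitary(thetas: np.ndarray, n: int) -> np.ndarray:
    """One circuit layer: per-qubit Euler rotations followed by the entangler.
    `thetas` has shape (n, 3): three rotation angles per qubit."""
    rots = reduce(np.kron, [euler(*thetas[q]) for q in range(n)])
    return cnot_chain(n) @ rots
```

Stacking p calls to `layer_unitary` with independent angle arrays reproduces the layered structure described above (for small qubit numbers; a real simulation would use a circuit library such as Yao.jl rather than dense matrices).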
The structure of the QCNN and the hyperparameters utilized in this paper are shown in Fig. S1(b). The structure of the QCNN is the same as in Ref. [21].
In our numerical simulations, we focus only on two-category classification problems. Thus, we only need one qubit to encode the labels y = 0, 1. After the variational circuits, the state of the output qubit becomes ρ_out. We compute P(y = m) = Tr(ρ_out |m⟩⟨m|) and then assign y = 1 if P(y = 1) ≥ P(y = 0), and y = 0 otherwise.

II. Quantum encoding for classical data
In the main text, one of our numerical simulations is based on images of handwritten digits. In this dataset, the images are encoded classically, i.e., each image is encoded into an m-dimensional vector v ∈ R^m. To make such classical data processable by quantum classifiers, we need to convert the classical vector into an n-qubit quantum (pure) state in a d = 2^n dimensional Hilbert space. This conversion process is called a quantum encoder. In this paper, we use the amplitude encoder to transfer classical data into quantum states [13,15,26,29–36].
For an amplitude encoder, each component of v is represented by an amplitude of the n-qubit ket vector |ψ_in⟩ in the computational basis. Without loss of generality, we assume that m = 2^n is a power of 2; otherwise, we can append 2^n − m zeros to the end of the vector v so that it can be transformed into an n-qubit pure state. The encoder can be realized by a circuit whose depth is linear in the number of features [37–39]. Under certain conditions, a gate complexity polynomial in m might be needed [40,41]. Such an encoding procedure can be improved using more complex approaches such as tensorial feature maps [13].
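The classical pre-processing of the amplitude encoder (zero-padding and L2 normalization only; the circuit that actually prepares the state is not shown) can be sketched as:

```python
import numpy as np

def amplitude_encode(v: np.ndarray) -> np.ndarray:
    """Pad a classical feature vector to length 2**n and L2-normalize it,
    giving the amplitudes of an n-qubit pure state in the computational basis."""
    m = len(v)
    n = max(1, int(np.ceil(np.log2(m))))  # number of qubits needed
    padded = np.zeros(2 ** n)
    padded[:m] = v                        # append 2**n - m zeros
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector as a quantum state")
    return padded / norm

# A 3-feature sample becomes a normalized 2-qubit (4-amplitude) state.
state = amplitude_encode(np.array([3.0, 4.0, 0.0]))
```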

III. The training process of quantum classifiers
In classical machine learning, different loss functions are introduced for training the networks and estimating their performance. In our numerical simulations, we employ a quantum version of the cross-entropy, given in Eq. (S18), where q = (q_1, q_2) is the diagonal of the output state, diag(ρ_out), and p = (1, 0) for y = 0 and p = (0, 1) for y = 1. In the training procedure of a quantum classifier, an optimizer adjusts the parameters Θ to minimize the empirical loss function L_N(Θ) = (1/N) Σ_{i=1}^{N} L(h(|ψ_i⟩; Θ), p_i). In recent years, a large family of gradient-based algorithms has been broadly used in training classical and quantum neural networks [42–46]. In the numerical simulations in this work, we use the Adam optimization algorithm [45,46], a gradient-based learning algorithm with an adaptive learning rate.
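The cross-entropy between the one-hot label vector p and the diagonal q of the output state can be sketched as follows (function and variable names are ours, not from the paper's code; a small epsilon guards the logarithm):

```python
import numpy as np

def quantum_cross_entropy(rho_out: np.ndarray, y: int, eps: float = 1e-12) -> float:
    """Cross-entropy L(q, p) = -sum_j p_j ln(q_j) with q = diag(rho_out)
    and p the one-hot encoding of the true label y."""
    q = np.real(np.diag(rho_out))
    p = np.zeros_like(q)
    p[y] = 1.0
    return float(-np.sum(p * np.log(q + eps)))

# For a diagonal output state with populations (0.3, 0.7) and label y = 1,
# the loss reduces to -ln(0.7).
rho = np.diag([0.3, 0.7])
loss = quantum_cross_entropy(rho, y=1)
```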
To minimize the loss function using multi-step gradient-based methods, we need to calculate the gradient of L_N(Θ) with respect to the parameters Θ.

Each component of the gradient, ∂L_N(Θ)/∂θ, is given in Eq. (S19), where θ is one of the parameters in Θ. Owing to the special structure of the quantum classifiers, we use the "parameter shift rule" [47–49] in our numerical simulations to obtain the required gradients.
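The parameter shift rule can be illustrated on a single Pauli rotation, where it is exact. The example below (our own toy observable, not the paper's circuit) evaluates ⟨Z⟩ after Ry(θ) acting on |0⟩, which equals cos θ, and recovers its derivative −sin θ from two shifted circuit evaluations:

```python
import numpy as np

def expval_z(theta: float) -> float:
    """<Z> of the state Ry(theta)|0>; analytically equal to cos(theta)."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return float(psi[0] ** 2 - psi[1] ** 2)

def parameter_shift_grad(f, theta: float) -> float:
    """Parameter shift rule for a Pauli-rotation angle:
    df/dtheta = [f(theta + pi/2) - f(theta - pi/2)] / 2, exact (no finite
    differences), using only two evaluations of the circuit itself."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

theta = 0.7
grad = parameter_shift_grad(expval_z, theta)  # analytically -sin(0.7)
```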
Fig. S1. (a) The illustrative structure of the variational quantum classifier, where i = 1, ..., p refers to the layer and k = 1, ..., m + n labels the qubit. After obtaining the output state |φ_out⟩, we compute the probabilities of projective measurements and assign the label corresponding to the largest probability. (b) The illustrative structure of the QCNN classifier. This circuit contains six convolutional layers labeled C1 to C6, two pooling layers labeled P1 and P2, and a fully connected layer labeled FC. The initial parameters are set to random values at the beginning of the training process.

In Fig. S2, we plot the average loss and accuracy of some of the quantum classifiers in our classifier set during the training procedure. The numerical simulations, including the training procedure and the adversarial attacks, were performed on a classical cluster using Yao.jl [50] and its extension packages in the Julia language [51]. To run the simulations on GPUs, and in particular to fit the mini-batch gradient descent algorithm, we use CuYao.jl [52], an efficient GPU extension of Yao.jl that provides a substantial speedup. The Flux.jl [53] and Zygote.jl [54] packages are used to calculate the derivatives. We note that the overfitting risk is low, as the losses on the training data and the validation data are close [55].
In Table S1, we list the number of parameters for each quantum classifier used in this paper, together with their final accuracies in classifying the ground states of the 1D transverse-field Ising model.

D. ADVERSARIAL ALGORITHMS
In this section, we provide more details on the algorithms for obtaining adversarial examples and perturbations.
When mounting an adversarial attack on a quantum classifier that takes quantum data as input, we would ideally maximize the adversarial risk µ(E_ε) mentioned in the main text. In practice, however, µ(E_ε) is typically inaccessible, so we consider maximizing the loss function instead. It is worthwhile to mention that a maximal loss value does not always indicate a maximal risk. In the quantum scenario, we denote the adversarial perturbation attached to a quantum sample as an operator U_δ acting on the given input state. The maximization problem for adding a perturbation can be written as Eq. (S20), where Θ* denotes the optimized parameters after the training process, ∆ is the set of allowed perturbations, |ψ⟩ is the original input state, and p is the correct label. In the case of universal adversarial examples, we have a test set T_M = {(|ψ_0⟩, y_0), ..., (|ψ_M⟩, y_M)} and a set of quantum classifiers that learn hypothesis functions h_1, h_2, ..., h_k. To obtain universal adversarial examples that can deceive all the quantum classifiers in the set, we solve the optimization problem in Eq. (S21), where U_δ^j is the perturbation for the j-th sample. For the case of universal adversarial perturbations, we use one identical perturbation to implement the adversarial attack on all samples in the test set T_M. In this case, we have a single quantum classifier with hypothesis function h, and the maximization problem can be expressed in a similar form. In general, the set ∆ can be taken as the set of unitary operators close to the identity matrix. We use automatic differentiation [56] to improve precision when applying the perturbation. In practice, we restrict ∆ to products of local unitary operators near the identity matrix.
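The flavor of such an iterative, norm-constrained loss maximization can be conveyed with a toy single-qubit example. Everything here is hypothetical: the "classifier" simply reads out computational-basis populations, the perturbation is a single Ry rotation clipped to |δ| ≤ ε, the gradient is taken by finite differences rather than automatic differentiation, and the step size and iteration count are arbitrary choices:

```python
import numpy as np

def ry(t: float) -> np.ndarray:
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def loss(delta: float, psi: np.ndarray, y: int) -> float:
    """Cross-entropy of the perturbed state Ry(delta)|psi> against label y."""
    phi = ry(delta) @ psi
    q = np.abs(phi) ** 2          # populations in the computational basis
    return float(-np.log(q[y] + 1e-12))

def iterative_attack(psi, y, eps=0.3, eta=0.05, steps=300):
    """BIM-style sketch: gradient-ascend the loss of the true label while
    clipping the perturbation angle delta to the ball [-eps, eps]."""
    delta, h = 0.01, 1e-6         # small nonzero start to escape the saddle at 0
    for _ in range(steps):
        g = (loss(delta + h, psi, y) - loss(delta - h, psi, y)) / (2 * h)
        delta = float(np.clip(delta + eta * g, -eps, eps))
    return delta

psi = np.array([1.0, 0.0])        # clean sample with true label y = 0
delta_star = iterative_attack(psi, y=0)
```

The ascent drives the rotation angle to the boundary of the allowed perturbation ball, where the loss of the true label is largest, mirroring how the constrained problems above push U_δ to the edge of ∆.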
In the white-box attack scenario, the attacker has full information about the classifiers, including their inner structures and loss functions, and can therefore calculate the gradient of the loss function, ∇L(h(|ψ⟩; Θ*), p). In this scenario, we use the quantum-adapted basic iterative method (qBIM) introduced in Ref. [49] to solve the optimization problems in Eqs. (S20) and (S21).
Compared with the white-box scenario, the adversary in a black-box setting does not have complete information about the quantum classifier. In classical adversarial learning, black-box attacks are divided into several categories. In a non-adaptive black-box attack [57–59], the adversary knows nothing about the classifier's inner structure but has access to the training data and can analyze its distribution. In an adaptive black-box scenario [58,60,61], the attacker can use the classifier as an oracle without extra information being provided. Another category is the strict black-box scenario [62], where the data distribution is unknown but the adversary can collect input-output pairs from the target classifier. In our simulations, we implement a non-adaptive black-box adversarial attack in which we use the knowledge of one quantum classifier to attack all quantum classifiers in the set, which share the same training and test sets. The results shown in Fig. 2(c) of the main text indicate the effectiveness of such a black-box attack.