Quantum-inspired analysis of neural network vulnerabilities: the role of conjugate variables in system attacks

ABSTRACT Neural networks are vulnerable to small, non-random perturbations, known as adversarial attacks. Such attacks, derived from the gradient of the loss function with respect to the input, can be identified as conjugates of the input, revealing a systemic fragility within the network structure. Intriguingly, this mechanism is mathematically congruent with the uncertainty principle of quantum physics, casting light on a hitherto unanticipated interdisciplinarity. This susceptibility is generally intrinsic to neural network systems, highlighting not only the innate vulnerability of these networks but also suggesting potential interdisciplinary advances in understanding these black-box networks.


Introduction
Despite their widely demonstrated success across various domains, from image classification [1] and speech recognition [2] to predicting protein structures [3], playing chess [4] and other games [5], deep neural networks have recently come under scrutiny for an intriguing vulnerability [6,7]. The robustness of these intricately trained models is being called into question, as they seem to falter under attacks that are virtually imperceptible to human senses.
A growing body of both empirical [8,9,10,11,12,13,14,15,16] and theoretical [17,18,19,20] evidence suggests that these sophisticated networks can be tripped up by minor, non-random perturbations, producing high-confidence yet erroneous predictions. A striking and quite succinct example is the Fast Gradient Sign Method (FGSM) attack [18]. These findings raise significant concerns about the vulnerabilities of such neural networks. If their performance can indeed be undermined by such slight disruptions, the reliability of technologies that hinge on state-of-the-art deep learning could potentially be at risk.
A natural question emerges concerning the vulnerability of deep neural networks. Despite the classical approximation theorems [21,22,23,24] promising that a neural network can approximate a continuous function to any desired level of accuracy, is the observed trade-off between accuracy and robustness an intrinsic and universal property of these networks?
This query stems from the intuition that stable problems, described by stable functions, should intrinsically produce stable solutions.
The debate within the scientific community is still ongoing. If this trade-off is indeed an inherent feature, then a comprehensive exploration into the foundations of deep learning is warranted. Alternatively, if this phenomenon is merely an outcome of the approaches used to construct and train neural networks, it would be beneficial to concentrate on enhancing these processes, as has already been undertaken with, e.g., certified adversarial robustness via randomized smoothing [25,26,27] and concurrent training strategies [28,29,30,19,31,32,33,34].
In this study, we uncover an intrinsic characteristic of neural networks: their vulnerability shares a mathematical equivalence with the uncertainty principle in quantum physics [35,36]. This is observed when gradient-based attacks on the inputs are identified as conjugate variables of those inputs.

Figure 1. Illustration of ∆x and ∆p in a three-layer convolutional neural network trained on the MNIST dataset over 50 epochs. The data's high-dimensional feature space was reduced to two dimensions using the t-SNE (t-Distributed Stochastic Neighbor Embedding) algorithm for easy visualization. (A) Shaded regions indicate the class predictions of the final trained network, and the colors of individual points indicate the true labels of the corresponding test samples. (B) All test samples were subjected to the Projected Gradient Descent (PGD) adversarial attack method [37,38] with ϵ = 0.1 and α = 0.1/4 over four iterative steps. The adversarially perturbed samples clearly deviate from the class regions in which they should lie. (C) The evolution of the prediction region for the digit '8' at epochs 1, 21, and 41. The deeper the color, the more confident the network's prediction. (D) The shaded area is as in (C), but with points representing the adversarial predictions of the attacked images, illustrating the temporal impact of the PGD attack on model accuracy.
Taking into account a trained neural network model, denoted as $f(X, \theta)$, where $\theta$ signifies the parameters and $X$ represents the input variable of the network, we observe a consistent pattern. The network cannot achieve arbitrary levels of measurement certainty on two factors simultaneously: the conjugate variable $\nabla_X l(f(X, \theta), Y)$ (where $Y$ denotes the underlying ground-truth label of $X$) and the input $X$, leading to the observed accuracy-robustness trade-off. This phenomenon, similar to the uncertainty principle of quantum physics, offers a nuanced understanding of the limitations inherent in neural networks.

Conjugate variables as attacks
In quantum mechanics, the concept of conjugate variables plays a critical role in understanding the fundamentals of particle behavior.Conjugate variables are a pair of observables, typically represented by operators, which do not commute.
This non-commutativity implies that the order of their operations is significant, and it is intrinsically tied to Heisenberg's uncertainty principle [35,36]. A prime example of such a pair is the position operator, $\hat{x}_{\mathrm{qt}}$, and the momentum operator, $\hat{p}_{\mathrm{qt}} = -i\hbar\,\partial/\partial x_{\mathrm{qt}}$. Here, the order of operations matters such that $\hat{x}_{\mathrm{qt}}\hat{p}_{\mathrm{qt}}$ is not equal to $\hat{p}_{\mathrm{qt}}\hat{x}_{\mathrm{qt}}$, indicating the impossibility of simultaneously determining the precise values of both position and momentum. This inherent uncertainty is quantitatively expressed in Heisenberg's uncertainty relation: $\Delta x_{\mathrm{qt}}\,\Delta p_{\mathrm{qt}} \ge \hbar/2$, where $\Delta x_{\mathrm{qt}}$ and $\Delta p_{\mathrm{qt}}$ represent the standard deviations of position and momentum measurements, respectively.
Drawing an analogy from quantum mechanics, we can formulate the concepts of conjugate variables within the realm of neural networks. Specifically, the features of the input data provided to a neural network can be conceptualized as feature operators, denoted as $\hat{x}_i$, while the gradients of the loss function with respect to these inputs can be viewed as attack operators, denoted as $\hat{p}_i = \partial/\partial x_i$. Here, the subscript $i$ refers to the $i$-th feature of the entire input feature vector. The attack operators, corresponding to the gradients on inputs, hold a clear relationship with gradient-based attacks, such as the FGSM attack (the application of such attacks often involves a sign function, although this is not strictly necessary [42,43]).
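To make the link between input gradients and gradient-based attacks concrete, the following is a minimal sketch of an FGSM-style perturbation. It assumes a toy logistic-regression model rather than any network used in this study; the weights `w`, bias `b`, input `x`, and the helper names are hypothetical, and the `use_sign` flag only illustrates the remark that the sign function is not strictly necessary.

```python
import math

def loss_and_input_grad(x, y, w, b):
    """Cross-entropy loss of a toy logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # d loss / d x_i
    return loss, grad_x

def fgsm(x, y, w, b, eps, use_sign=True):
    """Move x by eps along the (sign of the) input gradient of the loss."""
    _, grad = loss_and_input_grad(x, y, w, b)
    if use_sign:
        return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    return [xi + eps * g for xi, g in zip(x, grad)]

# Hypothetical model parameters and sample (illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.3, 0.2, 0.4], 1

clean_loss, _ = loss_and_input_grad(x, y, w, b)
x_adv = fgsm(x, y, w, b, eps=0.1)
adv_loss, _ = loss_and_input_grad(x_adv, y, w, b)
x_adv_nosign = fgsm(x, y, w, b, eps=0.1, use_sign=False)
nosign_loss, _ = loss_and_input_grad(x_adv_nosign, y, w, b)
```

In this toy setting both variants raise the loss on the perturbed input; the signed variant additionally bounds every coordinate change by `eps`.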
This analogy leads us to an inherent uncertainty relation for neural networks, mirroring Heisenberg's uncertainty principle in quantum mechanics. For a trained neural network with a properly normalized loss function, the relation reads $\Delta x_i\,\Delta p_i \ge 1/2$ (see derivations in Methods). This relation, relying on both the dataset and the network structure, suggests that there exists an intrinsic limitation in precisely measuring both features and attacks simultaneously. This reveals an inherent vulnerability of neural networks, echoing the uncertainty we observe in the quantum world.
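As a numerical illustration of this bound, the sketch below evaluates both spreads for a few normalized, real one-dimensional functions on a grid. It is not the procedure used in this study: the test functions are arbitrary, and the spread of the operator $\partial/\partial x$ is taken here as $(\int (\partial\psi/\partial x)^2 dx)^{1/2}$, which is an assumption about how that standard deviation is measured for a real function that vanishes at the boundaries.

```python
import math

def uncertainty_product(psi, lo=-10.0, hi=10.0, n=20001):
    """Return (delta_x, delta_p) for a real 1-D function psi sampled on a uniform grid."""
    h = (hi - lo) / (n - 1)
    xs = [lo + k * h for k in range(n)]
    vals = [psi(x) for x in xs]
    norm = math.sqrt(sum(v * v for v in vals) * h)  # enforce integral of psi^2 = 1
    vals = [v / norm for v in vals]
    mean_x = sum(x * v * v for x, v in zip(xs, vals)) * h
    var_x = sum((x - mean_x) ** 2 * v * v for x, v in zip(xs, vals)) * h
    # Central differences; the end points are skipped because psi is ~0 there.
    var_p = sum(((vals[k + 1] - vals[k - 1]) / (2 * h)) ** 2
                for k in range(1, n - 1)) * h
    return math.sqrt(var_x), math.sqrt(var_p)

def gaussian(s):
    """Gaussian whose square has standard deviation s."""
    return lambda x: math.exp(-x * x / (4.0 * s * s))

wide = uncertainty_product(gaussian(1.0))
narrow = uncertainty_product(gaussian(0.25))
sech = uncertainty_product(lambda x: 1.0 / math.cosh(x))
```

Narrowing the Gaussian shrinks the first spread and enlarges the second while their product stays at the bound, whereas the non-Gaussian example lies above it.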
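To intuitively visualize how the aggregate uncertainties $\Delta x$ and $\Delta p$ manifest within neural networks (each combines the per-dimension quantities $\Delta x_i$ and $\Delta p_i$ through the square root of a sum over $i$; see Methods), we use the MNIST dataset as a representative example. The neural network is trained and subsequently subjected to attacks at each training epoch.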
In this scenario, a trained network partitions the hyperspace (the space inhabited by the samples) into distinct regions.A given input, represented as a point in this space, is classified based on the label of the region it falls within.After 50 epochs of training, the shaded areas encapsulate most correctly labeled data points (Fig. 1A).Conversely, the attacks shift these input points slightly, leading to misclassification.The shifted points do not overlap with the regions defined by the trained network (Fig. 1B).
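The small shifts described above can be sketched with an iterative, projected attack of the kind used for Fig. 1B. The example below is only an illustration on a toy logistic model, not the three-layer convolutional network of this study: the model, its weights, and the helper names are hypothetical, there is no random start, and the defaults simply mirror the ϵ = 0.1, α = 0.1/4, four-step setting quoted for the figure.

```python
import math

def logit(x, w, b):
    """Pre-activation score of a toy logistic model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def input_grad(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    return [(p - y) * wi for wi in w]

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.1 / 4, steps=4):
    """Signed-gradient steps, each projected back onto the L-infinity ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_grad(x_adv, y, w, b)
        stepped = [xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Clamp to [x_i - eps, x_i + eps] and to the valid pixel range [0, 1].
        x_adv = [min(max(s, xi - eps, 0.0), xi + eps, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv

# Hypothetical model parameters and sample (illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.3, 0.2, 0.4], 1
x_adv = pgd_attack(x, y, w, b)
```

Every coordinate stays within `eps` of the clean input, yet the score for the true class drops, which is the kind of small displacement across a decision region that the figure visualizes.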
We pay particular attention to class number 8, which exhibits the most interconnections with other classes.This class is further illustrated in Fig. 1C and D. As the training epochs progress, the "effective radius" of the shaded area shrinks, causing the area to gradually coincide with the correctly labeled data points (Fig. 1C).Simultaneously, the "effective radius" of the attacked points begins to deviate further from the shaded regions, and thus from the correctly labeled data (Fig. 1D).
This visualization reveals an inherent trade-off: a reduction in the effective radius of the trained class corresponds to an increase in the effective radius of the attacked points. These two radii can be conceptualized as the visual representations of the uncertainties, $\Delta x$ and $\Delta p$, highlighting the delicate balance of precision and vulnerability in neural networks.
In addition to the adversarial attacks explored in this study, there exist analogous effective conjugates in other types of adversarial attacks as well [37,39,40,41].While we are currently unable to explicitly define the conjugates associated with black-box attacks as referenced in [44,45], it is plausible that these methods may adhere to the same underlying principle.

Manifestation of the uncertainty principle in neural networks
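The shaded areas in Fig. 1A are actually representative of wave functions in quantum physics. Specifically, for the MNIST dataset, we have ten wave functions corresponding to the ten digit classes. Therefore, the uncertainty relation $\Delta x\,\Delta p \ge 1/2$ shown in Fig. 1C and D should be reinterpreted as $\Delta x[\text{class 8}]\,\Delta p[\text{class 8}] \ge 1/2$, indicating that we are concentrating on the class of number 8. This relation expresses the trade-off between $\Delta x[\text{class 8}]$ and $\Delta p[\text{class 8}]$, as depicted in Fig. 2B, accompanied by the associated trade-off between accuracy and robustness (Fig. 2A). The CIFAR-10 dataset, having a higher complexity than MNIST, makes it difficult to identify a specific class that has more connectivity with other classes. In this case, the average values $\Delta x = \mathrm{Mean}(\Delta x[\text{All classes}])$ and $\Delta p = \mathrm{Mean}(\Delta p[\text{All classes}])$ are employed instead. The similar results obtained on CIFAR-10 underscore the inherent uncertainty relation that drives the accuracy-robustness trade-off, as demonstrated in Fig. 2C and D.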

Discussions

Attacking features is more effective than attacking pixels
The pixels in our dataset serve as the raw, unprocessed data, gathered directly from the detectors.
These pixels carry the features that serve as an accurate representation of the real world. While there is a possibility of manipulating these features, it is more common and practical to focus on the pixels themselves. By doing so, we can observe the accuracy-robustness trade-off (Fig. 2E and G), a fundamental concept that is underpinned by the uncertainty relation, as seen in Fig. 2F and H.
However, it is important to note, as evidenced by the testing accuracy results from the MNIST dataset, that there is an initial learning curve or 'kick' that is encountered (Fig. 2E).This is to be expected as the neural network must first familiarize itself with, or 'learn', the features before it can effectively classify the images.
During the initial learning stages, it is also worth noting the fluctuation in both $\Delta x$ and $\Delta p$ for input pixels. This fluctuation is more pronounced than that seen in the features, highlighting the random exploratory nature of the learning algorithm. As illustrated in Fig. 2H, these fluctuations could be attributed to the inherent randomness of the learning process, a factor that is crucial for potentially uncovering more optimal weight configurations.

Phenomenon in attacking well designed neural networks
Typically, network structures are scrupulously architected to fit the demands of specific tasks. To address this, we introduce a more advanced network structure that incorporates residual networks and additional convolutional layers. This refined structure increases the accuracy to nearly 90% (see footnote 1). One can still observe a clear pattern in the trade-off between $\Delta x$ and $\Delta p$ for both features and pixels (Fig. 2I-L). Moreover, this trade-off is more pronounced for features than for pixels. Understanding this trade-off allows for a more effective optimization of the network structure. In closing, constructing a network structure that best fits the task at hand is pivotal in delivering optimal performance.

Neural network as a complex physical system
As scientific research and engineering become increasingly reliant on artificial intelligence (AI) methods, questions about the future role of human beings in these fields naturally arise. Whether guiding AI or being guided by it, understanding the fundamental principles underpinning these sophisticated structures is paramount.

[Footnote 1] Given that the quantities $\Delta x$ and $\Delta p$ are approximately computed through high-dimensional Monte Carlo integrations, a process that is exceedingly time-consuming, we can only feasibly perform these computations for a network of this complexity. If they could be calculated more accurately for more complex and accurate networks with stronger computational resources, we believe the calculated patterns would better conform to the expected regularities.
One approach to glean this understanding is to treat neural networks as complex physical systems, thereby applying principles of physics to elucidate the inner mechanisms of AI.
In the study at hand, it is posited that neural networks, much like quantum systems, are subject to a form of the uncertainty principle.This connection potentially uncovers intrinsic vulnerabilities within the neural networks.A comparison of formulas from these distinct fields is presented in Table 1.Here, concepts from quantum physics such as position, momentum, and wave function are juxtaposed with their counterparts in neural networks: image, attack, normalized loss function, and so on.This comparison not only reveals striking similarities but also indicates that the methodologies employed in physical sciences could potentially be harnessed to investigate the properties of neural networks.
The intersection of AI and physics has the potential to provide novel insights into the intricate complexities of neural networks.For instance, the emergent capabilities exhibited by large language models might be correlated with principles found in statistical physics.Moreover, phenomena such as small data learning could be linked to concepts from Noether's theorem and gauge transformations [47].By drawing inspiration from physical processes such as weak interactions, we can devise innovative generative models, such as "Yukawa Generative Models" [48].Viewing neural networks through the lens of physics can give us a deeper understanding of their structure and functionality from an entirely new perspective.
The synergy between AI and physics, two seemingly distinct fields, could lead to advancements in both domains.It's a twofold benefit: AI could gain from the structured, universal laws of physics, and in return, physics could possibly leverage the predictive and analytical power of AI.

Conclusion
This study reveals the remarkable link between quantum physics and neural networks, demonstrating that these artificial systems, like quantum systems, are subject to the uncertainty principle.This principle, often associated with precision and vulnerability trade-offs, provides new insights into the potential frailties inherent in neural networks.
Our findings also indicate that attacking the features of a neural network can be more effective than focusing on its pixels.This insight could possibly influence the optimization of network structures for better performance.
Meanwhile, viewing neural networks as complex physical systems allows us to apply principles from physics to understand the behaviour of these AI systems better.This interdisciplinary approach not only enhances our comprehension of AI systems but also suggests a wealth of potential applications and advancements in both fields.
As we move forward, further exploration of this accuracy-robustness trade-off and its influence on the design of neural networks will be crucial.While this study provides a valuable perspective on the relationship between quantum physics and AI, additional research is still needed to more comprehensively understand how these principles can be applied to improve neural network robustness and design.

Methods
Detailed methods and materials are given in the online supplementary data.

The uncertainty relation between the standard deviation of position $\sigma_{x_i}$ and the standard deviation of momentum $\sigma_{p_i}$ reads

$$\sigma_{x_i}\,\sigma_{p_i} \ge \frac{\hbar}{2}. \qquad (5)$$

The uncertainty relation Eq. (5) states a fundamental property of quantum systems and can be understood in terms of Niels Bohr's complementarity principle [2]. That is, objects have certain pairs of complementary properties that cannot be observed or measured simultaneously.

Formulas and notations for neural networks
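Without loss of generality, we can assume that the loss function $l(f(X, \theta), Y)$ is square integrable (see footnote 1 below),

$$\int l(f(X, \theta), Y)^2\, dX = \beta. \qquad (6)$$

Eq. (6) allows us to further normalize the loss function as

$$\psi_Y(X) = \frac{l(f(X, \theta), Y)}{\beta^{1/2}}, \qquad (7)$$

so that

$$\int \psi_Y(X)^2\, dX = 1. \qquad (8)$$

For convenience, we refer to $\psi_Y(X)$ as a neural packet in the later discussions. Note that under different labels $Y$, a neural network will have a set of neural packets.

An image $X = (x_1, ..., x_i, ..., x_M)$ with $M$ pixels can be seen as a point in a multidimensional space, where the numerical values of $(x_1, ..., x_i, ..., x_M)$ correspond to the pixel values. The feature and attack operators of the neural packet $\psi_Y(X)$ can then be defined as

$$\hat{x}_i\,\psi_Y(X) = x_i\,\psi_Y(X), \qquad \hat{p}_i\,\psi_Y(X) = \frac{\partial}{\partial x_i}\psi_Y(X). \qquad (9)$$

Similar to Eq. (3), the average pixel value at $x_i$ associated with the neural packet $\psi_Y(X)$ can be evaluated as

$$\langle \hat{x}_i \rangle = \int \psi_Y^*(X)\, x_i\, \psi_Y(X)\, dX. \qquad (10)$$

Since $\psi_Y(X)$ is purely real, with no imaginary part, the above equation is equivalent to

$$\langle \hat{x}_i \rangle = \int \psi_Y(X)\, x_i\, \psi_Y(X)\, dX. \qquad (11)$$

In addition, the attack operator $\hat{p}_i = \partial/\partial x_i$ corresponds to the conjugate variable of $x_i$, and we can obtain the average value for $\hat{p}_i$ as

$$\langle \hat{p}_i \rangle = \int \psi_Y(X)\, \frac{\partial}{\partial x_i} \psi_Y(X)\, dX. \qquad (12)$$

[Footnote 1] In practical applications, it is rational to only consider the loss function in a limited range $l(f(X, \theta), Y) < C$ under a large constant $C$, since samples outside this range can be seen as outliers that are meaningless to the problem. The loss function can then generally be guaranteed to be square integrable in this functional range.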

Derivation of the uncertainty relation
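The uncertainty principle of a trained neural network can then be deduced by the following theorem: the standard deviations $\sigma_{p_i}$ and $\sigma_{x_i}$ corresponding to the attack and feature operators $\hat{p}_i$ and $\hat{x}_i$, respectively, are restricted by the relation

$$\sigma_{p_i}\,\sigma_{x_i} \ge \frac{1}{2}. \qquad (13)$$

We first introduce the standard deviations $\sigma_a$ and $\sigma_b$ corresponding to two general operators $\hat{A}$ and $\hat{B}$. Writing $\hat{a} = \hat{A} - \langle \hat{A} \rangle$ and $\hat{b} = \hat{B} - \langle \hat{B} \rangle$, it follows that

$$\sigma_a = \langle (\hat{A} - \langle \hat{A} \rangle)^2 \rangle^{1/2} = \langle \hat{a}^2 \rangle^{1/2}, \qquad \sigma_b = \langle (\hat{B} - \langle \hat{B} \rangle)^2 \rangle^{1/2} = \langle \hat{b}^2 \rangle^{1/2}. \qquad (14)$$

In general, for any two unbounded real operators $\hat{a}$ and $\hat{b}$, the following relation holds: [...] (15).

If we further replace $\hat{a}$ and $\hat{b}$ in Eq. (15) by the operators $\hat{a}\langle \hat{a}^2 \rangle^{-1/2}$ and $\hat{b}\langle \hat{b}^2 \rangle^{-1/2}$, we can then obtain the property [...] (16).

Seeing the fact that $[\hat{a}, \hat{b}] = [\hat{A}, \hat{B}]$, we finally obtain the uncertainty relation [...] (17).

In terms of neural networks, we can simply replace the operators $\hat{A}$ and $\hat{B}$ by $\hat{p}_i$ and $\hat{x}_i$ introduced in Eq. (9), and this leads to

$$\sigma_{p_i}\,\sigma_{x_i} \ge \frac{1}{2}, \qquad (18)$$

where we have used the relation

$$[\hat{p}_i, \hat{x}_i] = 1. \qquad (19)$$

Note that for a trained neural network, $\psi_Y(X)$ depends on the dataset and the structure of the network. Eq. (18) is a general result for general neural networks.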
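As a cross-check on the bound, the following is a minimal sketch of an alternative route to $\sigma_{x_i}\sigma_{p_i} \ge 1/2$ that uses only integration by parts and the Cauchy-Schwarz inequality. It is not the operator derivation of this section. It assumes that $\psi_Y(X)$ is real, normalized, and decays fast enough for boundary terms to vanish, and it takes $\sigma_{p_i}^2 = \int (\partial_i \psi_Y)^2\, dX$ with $\partial_i = \partial/\partial x_i$, which is an assumption about how the spread of the real operator $\partial/\partial x_i$ is measured.

```latex
% Assumptions (not from the source): \psi_Y is real, \int \psi_Y^2 dX = 1,
% and all boundary terms vanish. \partial_i denotes \partial / \partial x_i.
\begin{align*}
\langle \hat{p}_i \rangle
  &= \int \psi_Y \, \partial_i \psi_Y \, dX
   = \tfrac{1}{2} \int \partial_i \bigl( \psi_Y^2 \bigr) \, dX = 0, \\
\int (x_i - \langle \hat{x}_i \rangle) \, \psi_Y \, \partial_i \psi_Y \, dX
  &= \tfrac{1}{2} \int (x_i - \langle \hat{x}_i \rangle) \,
     \partial_i \bigl( \psi_Y^2 \bigr) \, dX
   = -\tfrac{1}{2} \int \psi_Y^2 \, dX = -\tfrac{1}{2}, \\
\tfrac{1}{4}
  &= \Bigl( \int (x_i - \langle \hat{x}_i \rangle) \, \psi_Y \,
     \partial_i \psi_Y \, dX \Bigr)^2
   \le \int (x_i - \langle \hat{x}_i \rangle)^2 \psi_Y^2 \, dX
     \int (\partial_i \psi_Y)^2 \, dX
   = \sigma_{x_i}^2 \, \sigma_{p_i}^2 .
\end{align*}
```

Taking the square root gives $\sigma_{x_i}\sigma_{p_i} \ge 1/2$, with equality when $\partial_i \psi_Y$ is proportional to $(x_i - \langle \hat{x}_i \rangle)\psi_Y$, that is, for a Gaussian profile along $x_i$.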
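In the FGSM attack, the attacked image is of the form:

$$\begin{aligned} X' &= X + \epsilon \cdot \mathrm{sign}\bigl(\nabla_X l(f(X, \theta), Y)\bigr) \\ &\approx X + \epsilon'\, \hat{P}\, \psi_Y(X), \end{aligned} \qquad (20)$$

where $\hat{P} = (\partial/\partial x_1, ..., \partial/\partial x_i, ..., \partial/\partial x_M)$ and $\epsilon' = \epsilon \cdot \beta^{1/2}$. In the second line of Eq. (20) we have used the property substantiated in [3]: "even without the 'Sign' of the FGSM, a successful attack can also be achieved". From Eq. (20), we can then obtain, component by component,

$$x_i' \approx x_i + \epsilon'\, \hat{p}_i\, \psi_Y(X), \qquad (21)$$

which is the reason that we call $\hat{p}_i$ the attack operator.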
Evaluation of ∆x and ∆p

Approximation of ∆x and ∆p
Eq. (4) involves complex integrals for $\sigma_{x_i}$ and $\sigma_{p_i}$. These integrals are based on loss functions from trained neural networks and are challenging due to their high dimensionality. Specifically, they are 784-dimensional for the MNIST dataset and 3072-dimensional for the CIFAR-10 dataset, which makes them impractical to calculate directly.
To work around this complexity, we simplify these multidimensional problems to a single dimension. Here is how we do it, using the MNIST dataset as an example. We start by calculating the average value of all the input pixels, which we call $X_{\mathrm{base}}$.
Then, for a given trained classifier with a loss function $l(f(X, \theta), Y = 8)$, where $Y = 8$ refers to the loss associated with the label number eight, we focus on one particular dimension, $i$, of the input $X$. We keep all other dimensions fixed at their base values, $X_{\mathrm{base}}$. This reduces the loss function to a function of just one variable, $x_i$.
As a result, the high-dimensional integral in Eq. (6) simplifies to the following one-dimensional integral:

$$\int l(f(X, \theta), Y = 8)^2\, dX \;\Rightarrow\; \int l(f(x_i, \theta), Y = 8)^2\, dx_i. \qquad (22)$$

This integral can now be evaluated using direct Monte Carlo integration.
To get an overall estimate for label number eight, we randomly pick different dimensions $i$ and combine the resulting one-dimensional estimates using the square root of their sum. This approach provides only an approximate estimate of the original high-dimensional integrals. While the results may not match the exact values, the estimate is useful as long as we are interested in the comparative trend of $\Delta X$ and $\Delta P$ rather than their absolute values. Thus, this approximation is considered acceptable for our purposes.
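The one-dimensional reduction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: `toy_loss` is a hypothetical smooth stand-in for the trained loss $l(f(X, \theta), Y)$, `x_base` plays the role of $X_{\mathrm{base}}$, the derivative is taken by central finite differences, the spread of $\partial/\partial x_i$ is measured as $(\int (\partial\psi/\partial x_i)^2 dx_i)^{1/2}$, and the per-dimension values are combined as the square root of the sum of squares, which is one assumed reading of "the square root of the sum".

```python
import math
import random

WIDTHS = [0.8, 1.0, 1.5, 2.0]  # hypothetical per-dimension widths of the toy loss

def toy_loss(X):
    """Hypothetical stand-in for l(f(X, theta), Y); a smooth bump, not a network loss."""
    return math.exp(-sum((x / (2.0 * s)) ** 2 for x, s in zip(X, WIDTHS)))

def one_dim_sigmas(loss, x_base, i, lo, hi, n_samples, rng, h=1e-4):
    """Monte Carlo estimate of (sigma_x_i, sigma_p_i) with other inputs fixed at x_base."""
    s0 = s1 = s2 = sd = 0.0
    for _ in range(n_samples):
        t = rng.uniform(lo, hi)  # sample only the i-th coordinate
        X = list(x_base)
        X[i] = t
        X_plus, X_minus = list(X), list(X)
        X_plus[i], X_minus[i] = t + h, t - h
        l = loss(X)
        dl = (loss(X_plus) - loss(X_minus)) / (2.0 * h)
        s0 += l * l          # normalization integral (beta, up to a common factor)
        s1 += t * l * l
        s2 += t * t * l * l
        sd += dl * dl
    mean_x = s1 / s0
    var_x = s2 / s0 - mean_x ** 2
    var_p = sd / s0
    return math.sqrt(var_x), math.sqrt(var_p)

rng = random.Random(0)
x_base = [0.1, -0.2, 0.3, 0.0]            # hypothetical base point
dims = rng.sample(range(len(x_base)), 3)  # randomly chosen dimensions
per_dim = [one_dim_sigmas(toy_loss, x_base, i, -10.0, 10.0, 20000, rng) for i in dims]
delta_X = math.sqrt(sum(sx ** 2 for sx, _ in per_dim))  # assumed combination rule
delta_P = math.sqrt(sum(sp ** 2 for _, sp in per_dim))
```

Because the common sampling-volume factor cancels in every ratio, only relative weights matter, which is consistent with using the estimate for trends rather than absolute values.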

Integral with Respect to Features and Pixels
In our research, we have employed three distinct neural network architectures.Upon completion of their training, these networks are partitioned into two segments: the feature extractors and the classifiers, as illustrated in Fig. 1.The integration process outlined in Equation ( 22) necessitates the use of an integrand space.This space can consist of either the unprocessed images at the pixel level or the attributes of the images that have undergone processing.Our study takes both scenarios into account in order to compare how the uncertainty principle manifests at each of these levels.
When performing the integration over pixel values, the loss functions associated with the three neural networks serve as the integrands and are evaluated using the Monte Carlo technique, effectively operating within the pixel space.
Alternatively, we initially process the images using the feature extractors, then we

Figure 2. Results of the three different types of neural networks: a three-layer convolutional network running on the MNIST dataset, a four-layer convolutional network on the CIFAR-10 dataset, and a residual network [46] with eight convolutional layers on the CIFAR-10 dataset. The term "feature" in the labels represents the results obtained by attacking the features of the input images, while "pixel" corresponds to attacks directed at the pixels themselves. Each neural network was trained for 50 epochs. The quantities ∆x and ∆p were determined through high-dimensional Monte Carlo integrations. Subfigures (A), (C), (E), (G), (I), and (K) depict the test and robust accuracy metrics, with the robust accuracy evaluated on images perturbed by the PGD adversarial attack method, using parameters ϵ = 8/255 and α = 2/255 across four iterative steps. Subfigures (B), (D), (F), (H), (J), and (L) illustrate the trade-off relationship between ∆x and ∆p.

Figure 1: The three network structures used.

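Table 1. Comparison of the uncertainty principle between quantum physics and neural networks. The subscript $i$ represents the $i$-th dimension. For physics, $i$ stands for the spatial coordinates ($x$, $y$, and $z$), whereas in the context of neural networks, $i$ refers to the $i$-th feature. When we consider pixels, $i$ simply pertains to the $i$-th pixel. Additionally, we utilize Dirac notation, for instance, $\langle \hat{x}_{i,\mathrm{qt}} \rangle = \int \psi^*(X)\, x_{i,\mathrm{qt}}\, \psi(X)\, dX$, where $\langle \hat{x}_{i,\mathrm{qt}} \rangle$ is the expectation value of the $i$-th dimension. Similarly, $\langle \hat{x}_i \rangle = \int \psi_Y(X)\, x_i\, \psi_Y(X)\, dX$ for neural networks.

| Quantity | Quantum physics | Neural networks |
| Coordinates | $X = (x, y, z)$ | $X = (x_1, ..., x_i, ..., x_M)$ |
| Standard deviation of momentum / attack | $\Delta p_{i,\mathrm{qt}} = \langle (\hat{p}_{i,\mathrm{qt}} - \langle \hat{p}_{i,\mathrm{qt}} \rangle)^2 \rangle^{1/2}$ | $\Delta p_i = \langle (\hat{p}_i - \langle \hat{p}_i \rangle)^2 \rangle^{1/2}$ |

Footnote: Take Fig. 2C, D, G, H as an example. In the figure, the network only achieves a test accuracy of around 65% due to the relatively simple network architecture.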