Observing how deep neural networks understand physics through the energy spectrum of one-dimensional quantum mechanics

We investigate how neural networks (NNs) understand physics using 1D quantum mechanics. After training an NN to accurately predict energy eigenvalues from potentials, we used it to confirm the NN's understanding of physics from four different aspects. The trained NN could predict the energy eigenvalues of kinds of potentials different from those it learned, predict the particle's existence probability distribution, which was not used during training, reproduce untrained physical phenomena, and predict the energy eigenvalues of potentials with an unknown matter effect. These results show that NNs can learn physical laws from experimental data, predict the results of experiments under conditions different from those used for training, and predict physical quantities of types not provided during training. Because NNs understand physics in a different way than humans, they will be a powerful tool for advancing physics by complementing the human way of understanding.


Introduction
In recent times, deep neural networks (DNNs) have made remarkable progress in image recognition, natural language processing, voice recognition, and anomaly detection through numerous technological breakthroughs. Among these breakthroughs, the residual connection prevents gradients from vanishing even when neural networks (NNs) have many layers, contributing to very high image recognition capability [1]. Another example is the attention mechanism, which has succeeded in connecting neurons in distant locations, a shortcoming of convolutional neural networks (CNNs), and has significantly advanced fields such as translation, where relationships between distant words are essential [2,3]. Beyond accuracy improvements, NNs have begun generating new images or sentences by themselves, and their performance has been rapidly improving [4,5]. In a slightly different field, the combination of DNNs and reinforcement learning has rendered humans incapable of competing with computers in board games such as Go and Shogi, where human intuition was previously superior [6,7].
The significant difference between NNs and previous artificial intelligence is that NNs do not require human intuition at the design stage. Earlier types of artificial intelligence relied on humans to predetermine features of objects and perform learning. In contrast, NNs do not require such human intuition and find features in the learning process by themselves. In return, NNs comprise numerous parameters and require high computing power for learning.
In addition, it is difficult for humans to understand how trained NNs operate since they do not have the predetermined features humans want to observe to understand the operation.
Another advantage of NNs is that the input and output can be exchanged in some cases, enabling their application to inverse problems. If the relationship between the inputs and outputs of a problem is a bijection, they can be swapped during training to create an NN that solves the inverse problem.
The properties of not requiring prior intuition and applicability to inverse problems make NNs promising in natural sciences, where the goal is to reveal unknown problems, and NNs have been used in many fields, including physics [8]. In this paper, the subject of this study is the Schrödinger equation in quantum mechanics, and even if we focus only on the surrounding area, we can see several recent developments related to NNs. Partial differential equations, including the Schrödinger equation, have been solved using NNs [9], potentials have been estimated inversely from wave functions [10,11], and soliton solutions to the nonlinear Schrödinger equation have been investigated using NNs [12]. A new family of NNs inspired by the Schrödinger equation has also been proposed [13].
In these studies, the Schrödinger equation was used, and in this sense, human understanding of physics was already assumed. However, since we consider this study preparation for applying NNs to unsolved physical systems, we do not directly solve the Schrödinger equation using NNs. From this viewpoint, the research conducted in a context most similar to this paper is Ref. [14]. This pioneering study provided NNs with two-dimensional (2D) potentials and the corresponding energy eigenvalues of the ground states. This prior work and our study differ in technical aspects, such as the method for generating the potentials and NN's architecture, but the most significant difference is that we are most interested in how NNs understand physics. In this sense, despite the differences in physical systems and methods, our study shares a common interest with Ref. [15].
The application of NNs to concrete physics problems seems to be progressing well. However, how an NN learns physics is embedded in numerous parameters and is difficult to find.
In particular, whether the NN is merely learning its output pattern or understanding the underlying physics is essential in considering the future use of NNs. If the NN understands physical laws, it should be able to predict physical quantities of various ranges and types that were not used in training. It should also be possible to reproduce physical phenomena and estimate physical quantities even if the laws governing that physical system were not yet known. This study sheds light on these points using one-dimensional (1D) quantum mechanics as a subject. Since our research assumes that we will use NNs to solve unknown physics problems in the future, we only provide NN potentials and a few energy eigenvalues, which are physical quantities that can be measured experimentally.
This paper is organized as follows. First, in Sec.2, after a brief introduction to NNs for physicists, we explain how we generate the datasets and present the training results. Then, in Sec.3, where we state our main results, we investigate four aspects of how NNs understand physics and the potential for NNs to be used in physics research. Next, Sec.4 summarizes the results of this paper and provides some insights into the future potential of NNs in physics. Finally, the NN architecture used in this paper is summarized in the appendix.

Setup and training of neural networks
This section first presents a brief introduction to NNs for physicists and then explains how NNs are trained in this paper on 1D quantum mechanics problems in Sec.2.1. Next, we explain how to prepare the datasets in Sec.2.2, and finally, we present the training results in Sec.2.3.

Application of Neural Networks to one-dimensional Quantum Mechanics
Regression using an NN is, in a term familiar to physicists, a kind of variational method. An NN is a function that contains parameters, which are determined so as to best reproduce the ground truth outputs. This function, inspired by the brain's structure, has several characteristics:
• While the variational trial function in physics is typically assumed to be an elementary function or a superposition of elementary functions based on some physical intuition, NNs do not require such intuition. In return, NNs have far more parameters than the trial functions of traditional variational methods in physics, so as to be sufficiently flexible.
• Since NNs are built as composites of linear transformations and simple functions called activation functions, their derivatives can be obtained quickly by the back-propagation technique.
• As a technical matter, NNs are becoming firmly recognized as useful, and supportive libraries such as Pytorch and Tensorflow have been developed, as well as technologies such as Cuda for faster computation on GPUs. This has made it easy to experiment with NN technology on a variety of problems.
Here, we explain the training of NNs in more detail. Training an NN requires a dataset recorded through experience, experimentation, or artificial data generation. We write this as {(v^(k)_m, E^(k)_i)}_{k=1,...,N}. The flow of this optimization is depicted in Fig.1. First, the input, v^(k)_m, is fed into the NN, and its output is obtained. Then, the loss resulting from the error between the output and the ground truth data, E^(k)_i, is calculated. In addition, the first-order derivative of the loss function with respect to each parameter, w_α, can be calculated almost simultaneously using a method called back-propagation. The value of w_α is updated to decrease the loss using the derivative. Repeating this process until the loss no longer decreases significantly is called NN training.
In actual training, instead of adding up all samples in the loss function, a few samples are selected, which is called mini-batch learning. Mini-batch learning saves computer memory, and it is empirically known that mini-batch learning is faster than taking all the data. Furthermore, the final results are often better when the batch size is smaller. At least, we do not know a general theory for determining the optimal batch size and have empirically fixed the size to 100 by trial and error in this paper.

Fig. 1 Training a neural network (NN). An NN is a function that takes a multidimensional real-valued input and produces a multidimensional real-valued output. Training an NN requires many input and ground truth output pairs. NN learning is the optimization of a real-valued parameter set, {w_α}, to reduce the loss function, MSE({w_α}), linked to the difference between the correct data and the prediction given by the output of the NN. In this optimization, the back-propagation method, which can quickly calculate the derivative of the loss function with respect to the parameters, is indispensable. In this paper, the input is the 1D potentials' values on mesh points, and the output is the corresponding energy spectra.
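The mini-batch loop described above can be sketched as follows. This is a minimal illustration in plain NumPy with a linear model standing in for our actual CNN; the toy data, learning rate, and step count are illustrative assumptions, not the paper's settings (only the batch size of 100 matches the text).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: inputs v (N x M) and ground-truth outputs E (N x 10),
# standing in for potentials and energy spectra.
N, M, I = 1000, 32, 10
v = rng.normal(size=(N, M))
true_W = rng.normal(size=(M, I))
E = v @ true_W

# Model: a single linear layer, a stand-in for the NN f(v; {w_alpha}).
W = np.zeros((M, I))
batch_size, lr = 100, 0.5

def mse(W):
    return np.mean((v @ W - E) ** 2)

loss_start = mse(W)
for step in range(500):
    # Mini-batch learning: only a few samples enter each update.
    idx = rng.choice(N, size=batch_size, replace=False)
    vb, Eb = v[idx], E[idx]
    # Gradient of the mini-batch MSE with respect to the parameters W
    # (written out by hand here; back-propagation automates this).
    grad = 2.0 * vb.T @ (vb @ W - Eb) / (batch_size * I)
    W -= lr * grad  # update the parameters to decrease the loss

loss_end = mse(W)
print(loss_start > loss_end)  # True: the loss decreases during training
```

For a deep network the hand-written gradient is replaced by back-propagation, but the update loop itself has the same structure.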
Next, we explain how we apply NN learning to a physical problem in this paper. We deal with quantum mechanics in one dimension. A particle is confined to the interval −1 < x < 1 by infinitely high walls. We generate N various potentials between −1 < x < 1 and take their values at equally spaced points as the input data, v^(k)_m. For the output data, E^(k)_i, we take the ten smallest energy eigenvalues corresponding to the potentials (i_max = 10). The following subsection explains how to generate the potentials and calculate the corresponding eigenvalues. In this study, E^(k)_i is obtained by solving the Schrödinger equation numerically; however, because the primary goal of this paper is to see whether an NN can be trained to understand physical laws from observed physical quantities, E^(k)_i should be regarded as a quantity that could, in principle, be measured experimentally. For the NN, we use a 1D CNN. In addition, we also use the residual connection and the self-attention layer, which are effective when used in conjunction with CNNs in the natural language processing and image generation fields [1,16]. We adopt the mean squared error (MSE) between the ground truth energy eigenvalues and the outputs of the NN as the loss function. We describe the architecture of our NN and the specific method for optimizing its parameters in the appendix.

How to prepare the datasets
We take the energy spectrum of a particle confined to an interval in one dimension as our output data. Hence, we consider the following potential:

V^(k)(x) = v^(k)(x) for −1 < x < 1, and V^(k)(x) = ∞ otherwise, (1)

where the v^(k)(x) are arbitrary continuous functions and k takes values from 1 to N, the size of the dataset.
Next, we use the kernel method to generate the potential functions, v^(k)(x). Note that we use the kernel method only to generate the potential functions, not to perform regression using a Gaussian process. To use the kernel method, we set mesh points on −1 < x < 1 with width Δx (= 2/M) as x_l = (2l − M)/M (l = 0, 1, 2, ..., M). Let the kernel function be k(x, x′); then, under our discretization, the kernel matrix is the (M + 1) × (M + 1) matrix defined by K_lm = k(x_l, x_m). Using this kernel matrix as the covariance matrix, we can generate a potential function probabilistically, as follows:

v ~ N(0, K), (2)

where N represents the normal distribution with mean 0 and covariance matrix K. Furthermore, v is an (M + 1)-dimensional vector whose components are regarded as the potential values at the x_l. We obtain v^(k)(x_l) by performing this potential generation N times.
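One draw of v ~ N(0, K) can be generated in a few lines of NumPy, as sketched below. Here we use a Gaussian (squared-exponential) kernel to build K; the correlation length L = 0.2 and mesh size M = 100 are our illustrative assumptions, not values quoted in the paper.

```python
import numpy as np

M = 100                              # number of mesh intervals
x = (2 * np.arange(M + 1) - M) / M   # mesh points x_l = (2l - M)/M on [-1, 1]
sigma, L = 100.0, 0.2                # magnitude / correlation length (L assumed)

# Kernel matrix K_lm = k(x_l, x_m) for a Gaussian (squared-exponential) kernel.
r = np.abs(x[:, None] - x[None, :])
K = sigma**2 * np.exp(-r**2 / (2 * L**2))

# Draw one potential v ~ N(0, K); a small jitter keeps K numerically
# positive semi-definite for the sampler.
rng = np.random.default_rng(1)
v = rng.multivariate_normal(np.zeros(M + 1), K + 1e-8 * np.eye(M + 1))

print(v.shape)  # (101,) -> potential values on the mesh points
```

Repeating the draw N times yields the full set of input potentials.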
This study uses four kernel functions to generate the training data: the Gaussian kernel k^(Gauss)(x, x′), the Matérn5 kernel k^(M5)(x, x′), the Matérn3 kernel k^(M3)(x, x′), and the exponential kernel k^(Exp)(x, x′). These functions are defined as follows:

k^(Gauss)(x, x′) = σ² exp(−r²/(2L²)), (3)
k^(M5)(x, x′) = σ² (1 + √5 r/L + 5r²/(3L²)) exp(−√5 r/L), (4)
k^(M3)(x, x′) = σ² (1 + √3 r/L) exp(−√3 r/L), (5)
k^(Exp)(x, x′) = σ² exp(−r/L), (6)

where we defined r = |x − x′|. These functions have two parameters: σ, which controls the magnitude of the generated potential, and L, which controls the correlation distance of the potential. These kernel functions can be written concisely using the gamma function Γ(ν) and the modified Bessel function of the second kind K_ν, as follows:

k^(ν)(x, x′) = σ² (2^(1−ν)/Γ(ν)) (√(2ν) r/L)^ν K_ν(√(2ν) r/L). (7)

The Gaussian, Matérn5, Matérn3, and exponential kernels correspond to the cases ν = ∞, 5/2, 3/2, and 1/2, respectively. Kernels with larger ν generate smoother potential functions, and the generated potentials are [ν] times differentiable, where [ν] is the largest integer not greater than ν. Figure 2 shows potentials generated from each kernel. The four panels in this figure show how the smoothness of the potentials changes as ν increases.

Now that we have generated the potentials, which are the input data needed to train the NN, we need to find the energy spectra of those potentials, the output data. For this purpose, we solve the time-independent 1D Schrödinger equation,

−(1/(2m)) d²ψ^(k)(x)/dx² + v^(k)(x) ψ^(k)(x) = E^(k) ψ^(k)(x), (8)

where ψ^(k)(x) is the wave function, and we use Planck's constant h = 2π (i.e., ℏ = 1) and the particle's mass m = 1. Since we are considering the potentials in Eq.(1), the boundary conditions for the wave function are ψ^(k)(±1) = 0. In the case of the square potential, v^(k)(x) = 0, the energy eigenvalues are

E_i = π²(i + 1)²/8, i = 0, 1, 2, ..., (9)

and the corresponding wave functions are

ψ_i(x) = sin((i + 1)π(x + 1)/2). (10)

Since we use the ten lowest energy eigenvalues as the ground truth output data, the spectrum should have a rich structure; referring to Eq.(9), we take σ = 100 in Eqs.(3)-(6) throughout this paper, except for the dataset used to obtain Fig.5. Figure 3 shows that this setting works in the actual numerical experiments.
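The empty-well spectrum provides a quick sanity check and motivates the choice σ = 100: with E_i = π²(i + 1)²/8 (which follows from ℏ = 1, m = 1, and a well of width 2), the ten lowest levels range from about 1.2 to about 123, so potentials of magnitude σ = 100 are large enough to deform the spectrum appreciably. A minimal check:

```python
import numpy as np

# Ten lowest eigenvalues of the empty well (v = 0) on -1 < x < 1,
# with hbar = 1 and m = 1: E_i = pi^2 (i + 1)^2 / 8.
E = np.pi**2 * np.arange(1, 11) ** 2 / 8
print(np.round(E[0], 3), np.round(E[-1], 3))  # 1.234 123.37

# The corresponding wave functions sin((i+1) pi (x+1)/2) vanish at the walls.
i, walls = 3, np.array([-1.0, 1.0])
psi = np.sin((i + 1) * np.pi * (walls + 1) / 2)
print(np.allclose(psi, 0.0))  # True
```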
We use the matrix method [17] to solve this eigenvalue problem for general v (k) (x). This method is appropriate for our setting because it enables us to find multiple energy eigenvalues and corresponding wave functions simultaneously. Furthermore, since we have already discretized the problem to generate the values of the potentials on the mesh points, we can directly use this method.
Discretizing Eq.(8) with a second-order central difference on the mesh points, the eigenvalue problem becomes

H^(k) ψ^(k) = E^(k) ψ^(k), (11)

where H^(k) is the matrix which we defined as follows (v^(k)_l = v^(k)(x_l)):

H^(k)_{lm} = −(δ_{l,m+1} − 2 δ_{l,m} + δ_{l,m−1}) / (2(Δx)²) + v^(k)_l δ_{l,m}, l, m = 1, 2, ..., M − 1. (12)

Here, we have used the boundary conditions, ψ^(k)(±1) = 0, so that ψ^(k) is an (M − 1)-dimensional vector of the wave function's values on the interior mesh points. This eigenvalue problem is easy to solve because H^(k) is a tridiagonal real symmetric matrix. We denote the eigenvectors and corresponding eigenvalues as ψ^(k)_i and E^(k)_i, respectively, and normalize the eigenvectors as

Σ_m (ψ^(k)_{i,m})² = 1. (13)

Note that this normalization is natural for the matrix eigenvalue problem but deviates by √Δx from the ordinary normalization condition of the wave function, ∫ |ψ|² dx = 1. Since this is 1D quantum mechanics, the energy eigenvalues have no degeneracy. We adopt the ten lowest eigenvalues (i = 0, 1, 2, ..., 9) as the ground truth output data of the NN.
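A minimal NumPy sketch of this matrix method (not the paper's production code) builds the tridiagonal Hamiltonian on the interior mesh points and checks the empty-well case against the analytic spectrum:

```python
import numpy as np

def spectrum(v, n_levels=10):
    """Lowest eigenvalues/eigenvectors of -psi''/2 + v psi = E psi on (-1, 1).

    v: potential values on the interior mesh points x_l, l = 1..M-1,
    with the boundary conditions psi(+-1) = 0 already imposed.
    """
    n = len(v)               # n = M - 1 interior points
    dx = 2.0 / (n + 1)       # mesh width, Delta x = 2/M
    # Tridiagonal H: central second difference for the kinetic term, plus v.
    H = (np.diag(1.0 / dx**2 + v)
         - np.diag(np.full(n - 1, 0.5 / dx**2), 1)
         - np.diag(np.full(n - 1, 0.5 / dx**2), -1))
    E, psi = np.linalg.eigh(H)      # real symmetric -> sorted eigenvalues
    return E[:n_levels], psi[:, :n_levels]

M = 200
E, _ = spectrum(np.zeros(M - 1))
# Empty well: the exact ground-state energy is pi^2 / 8.
print(abs(E[0] - np.pi**2 / 8) < 1e-3)  # True
```

The eigenvectors returned by `eigh` already satisfy the normalization of Eq.(13).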
We have depicted one potential generated from the Gaussian kernel and the ten lowest energy eigenvalues of the potential in Fig.3(c). We note that we have confirmed that these are the ten lowest energy eigenvalues by counting the nodes of the corresponding wave functions.
This subsection explained how to create the set of (M + 1)-dimensional potential vectors, v^(k)_m, used as input data and the 10-dimensional energy eigenvalue vectors, E^(k)_i, used as output data for training an NN. Here we used the Schrödinger equation to prepare these datasets, but we did not feed the equation to the NN. We only feed the NN the set of potentials and energy eigenvalues; in principle, such a dataset could also be obtained by measuring energy eigenvalues experimentally while varying the potential in various ways. The primary goal of this paper is to demonstrate that an NN understands physical laws using only the dataset {(v^(k)_m, E^(k)_i)}_{k=1,...,N}, which could in principle be observed in experiments.
Here, N is the size of the dataset used, and we define the progress rate as the ratio of the number of parameter updates of {w_α} performed to the total number, 1 × 10^5. The figure displays the NN's learning process using datasets generated from the Gaussian kernel: the dashed lines represent the error for the training data, and the solid lines the error for the validation data, which were generated from the same kernel as the training data. The figure shows that overfitting is suppressed as the size of the dataset increases, and the effect of overfitting is minimal at N5. The NMSE is less than 10^−4 at the end of the training, and the error in the energy eigenvalues is less than 1% on average for N5. We have confirmed that the datasets generated from the other kernels show the same trend, with some quantitative differences. In addition, the smoother the potential, the smaller the error.

Understanding Physics with Neural Networks
In Sec.2, we showed that NNs can predict energy eigenvalues accurately, but it is still unclear whether they understand the physics described by Eq. (8). As described in the appendix, our NN has 4.9 × 10^6 real-valued parameters, {w_α}, while the total real-number size of the outputs of the largest dataset, {E^(k)_i}_{k=1,...,N}, is 5 × 10^6. This fact may lead us to suspect that the NN does not understand physics but merely memorizes the outputs.
In this section, which describes the main results of this paper, we show in four aspects that NNs do understand physics and that they are helpful for the study of physics. First, in Sec.3.1, we show that NNs can predict energy eigenvalues even for potentials of a different shape from those in the training dataset. Next, in Sec.3.2, we successfully extract from the trained NN the magnitudes of the eigenvector components, which were not used for the training. We then show that NNs can predict unknown physical phenomena in Sec.3.3. Finally, we treat potentials with matter effects unknown to the experimenters to demonstrate the broad applicability of the method in Sec.3.4.
These results show that NNs can learn the physical laws from experimental data, predict the results of experiments under conditions different from those used for training, and predict physical quantities of types not provided during training. Because NNs understand physics in a different way than humans, they will be a powerful tool for advancing physics by complementing the human way of understanding.

Expand the scope of applicable data
In Sec.2.3, the training performance of the NN was verified with validation data generated from the same kernel as the training data. However, if the NN understands the physics governed by Eq. (8), it should also predict the energy eigenvalues of potentials generated from a different kernel than the one that generated the training data. Therefore, this subsection examines how the NN predicts energy eigenvalues for validation data generated from different kernels. Figure 4(a) shows the training progress using the N5 dataset generated from the Gaussian kernel. This figure shows that the error for the validation data generated from the Gaussian kernel is the smallest, but the predictions do not break down even for datasets generated from the other kernels.
We show the error at the final epoch in Fig.4(b) by the kernel that generated the training data. This figure shows that NNs trained with data such as the one in Fig.2(d), generated from the exponential kernel, show promising results, regardless of the dataset type. On the other hand, the NN trained on the dataset generated from the Gaussian kernel is not good at predicting potentials that are not smooth.
We used the NN trained on the N5 dataset generated from the exponential kernel to predict the energy eigenvalues of the same potential as in Fig.3(c); the result is shown in Fig.4(c). This figure demonstrates that, although there is a noticeable deviation in the energy eigenvalue of the seventh excited state, the overall prediction is accurate. Though the NN had seen only jagged potentials like the one in Fig.2(d), it correctly predicted the energy eigenvalues for a smooth potential generated from the Gaussian kernel. This result indicates that it understood the physical laws governing this system.
In addition, to test whether the learning works even with highly undulating potentials, we took σ = 1000 and generated a new dataset, N5', with 5 × 10 5 potentials for each kernel.
The same figures as in Fig.4 are shown in Fig.5 for this new dataset. As can be seen from Fig.5(c), the ten lowest energy eigenvalues lie well below the highest point of the potentials. Figures 5(a) and (b) show that although the learning is well advanced, the error is larger than in the case of N5. This error increase is most likely due to differences in the larger energy eigenvalues: in N5, some of the larger energy eigenvalues approached the trivial ones determined only by the boundary conditions, whereas in N5', all ten energy eigenvalues are nontrivial.
Next, we quantitatively examine the relationship between the loss and the similarity of the training and validation potentials. The potentials generated by different kernels appear to have different shapes in Figs.2(a)-2(d). However, some of the 5 × 10^5 potentials in N5 may happen to be close to validation potentials even for different kernels, and the NN may simply remember them.
To discuss this, we define a measure of the similarity between a training dataset, {v^(k)_m}_{k=1,...,N}, and a validation potential, v^(val)_m, as follows:

d ≡ min_k [ min_{c_k} ( Σ_m (v^(val)_m − v^(k)_m − c_k)² )^(1/2) ],

where the value of c_k attained at the minimum for each k is denoted Δ_k. The smaller this number, d, the closer the validation potential is to the training dataset.
In this definition, we find the potential in the training dataset with the smallest L2 norm to the validation potential and use that norm as our measure. However, because we are concerned with the shape of the potentials, we add a constant, c_k, to each training potential so as to minimize the L2 norm. For the validation dataset, 10^4 potentials were generated from the Gaussian kernel, and the one that minimizes d is shown in Fig.6(a), together with the training potential that gives the minimum d value, v^(k′)_m + Δ_{k′}. We used the N5 dataset generated from the Gaussian kernel as the training dataset. In other words, this figure shows the closest pair, under our measure, among the 10^4 validation potentials and the 5 × 10^5 training potentials, both generated from the Gaussian kernel. We can see from this figure that if some of the validation and training potentials are similar in shape, our measure successfully finds them. Similarly, Fig.6(b) depicts the best match between the 10^4 Gaussian-kernel validation potentials and the N5 training dataset generated by the exponential kernel. It can be seen that their overall undulations are similar, but their shapes are not. The closest match between the validation potentials and the N5 training potentials generated by the exponential kernel is shown in Fig.6(c). This figure also demonstrates that while their overall undulations are similar, their shapes are not.
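Since, for fixed k, the L2-minimizing constant c_k is simply the mean of v^(val) − v^(k), the measure d can be computed as below. This is a sketch under our reading of the definition, with a small synthetic training set standing in for the N5 dataset:

```python
import numpy as np

def similarity_d(v_val, train):
    """min over training potentials k (and shifts c_k) of ||v_val - v_k - c_k||_2.

    For each k the optimal constant c_k is the mean of (v_val - v_k),
    since subtracting the mean minimizes the L2 norm of the difference.
    """
    diff = v_val[None, :] - train                      # shape (N, M+1)
    shifted = diff - diff.mean(axis=1, keepdims=True)  # subtract optimal c_k
    return np.sqrt((shifted ** 2).sum(axis=1)).min()

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 101))   # toy "training dataset" of 50 potentials
v_val = train[7] + 3.0               # a validation potential: a shifted copy
print(np.isclose(similarity_d(v_val, train), 0.0))  # True -> same shape
```

A constant vertical shift gives d = 0, as intended: the measure compares shapes, not absolute levels.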
To evaluate the similarity between the validation and training datasets as a whole, we define the following metric, the average of d over the validation potentials:

D ≡ (1/N_v) Σ_j d(v^(val,j)).

Here, we take N_v = 10^4 and N = 5 × 10^5.

Finally, we look at how the residual connection and the self-attention mechanism affect the NN's predictions. For this examination, we omit portions of the NN layers and observe how the loss changes. Figure A1 summarizes our NN architecture. To begin, to see the effect of deepening the NN via the residual connection, we examine how the loss changes with the depth of the NN.
As shown in Fig.A1, our NN has five blocks with residual connections, which we refer to as "Res-SA blocks." To see the difference when we remove these blocks from the top, we define the following quantity:

R_depth(n) ≡ NMSE(n) / NMSE(1),

where NMSE(n) is the NMSE of the NN with n Res-SA blocks; in particular, NMSE(1) results from the NN design that leads directly from the initial Res-SA block to the Flatten layer. The loss rapidly decreases as the NN is deepened using the residual connection, for any training dataset, as shown in the left panel of Fig.8. This change shows that it is essential to capture the overall shape of the potential by broadening the field of view of the neurons.
Next, we see the effect of the self-attention mechanism: the self-attention layers sit inside the Res-SA blocks in Fig.A1, whose contents are shown in Fig.A2. We define the ratio of the NMSE evaluated from the NN with and without the self-attention layers as follows:

R_SA ≡ NMSE(with self-attention mechanism) / NMSE(without self-attention mechanism).

The quantity R_SA represents how much the loss is reduced by the self-attention mechanism. As shown in the right panel of Fig.8, the self-attention mechanism reduces the loss for all combinations of training and validation datasets. It is especially effective when smooth potentials are used for validation.

We examine the attention maps to determine what the self-attention mechanism focuses on. Figure 9 depicts the first and second self-attention maps for the potential shown in Fig.4(c). We used the NN trained with the N5 dataset generated from the exponential kernel. To make the figure easier to understand, we aligned the potential shapes along the left and top sides of the self-attention maps. A self-attention map represents the importance of the relationship between two points on the potential as learned by the NN. The map on the left, from the self-attention layer closest to the input data, shows that the NN perceived the relationship between the two minima of the potential as essential. The map on the right, from the second self-attention layer, shows that the NN additionally perceived the relationship between the locations of the maxima as meaningful. Thus, attention to the relationships between distant feature points seems to reduce the loss. This observation also shows that the NN learns the necessary features by itself through training.
The findings in this subsection show that NNs can help physics experiments produce more results with fewer practical resources. For example, even when only materials with rough surfaces are available, it would be possible to train an NN with the results of experiments using those materials and then use the trained NN to predict the results of experiments using materials with smooth surfaces. In addition, for a physical system with wide parameter space, we can experiment with a part of it and use the observed results to train the NN. Then, we may raise candidates for parameter regions where interesting physical phenomena would be observed by using the trained NN to search for unknown parameter regions. A schematic diagram is shown in Fig.16.

Elicit information that is not being fed to the neural network
When we trained the NN, we only gave it the potentials and the corresponding energy eigenvalues. However, if the NN understands the physics described by Eq. (8) in this process, we may be able to extract physical quantities other than the energy eigenvalues from it. This subsection attempts to extract the probability distribution of the particle's existence, which was not used in training: the first-order derivative of the NN's output with respect to the potential value can be considered the existence probability of the particle for the ground state.
The same can be said for excited states.
This can also be explained as follows: f_i(v_m; {w_α}) should have been learned to approximate the representation of E_i. From the discretized quantum mechanics, Eqs.(11)-(13), the first-order derivative of an energy eigenvalue with respect to a potential value is the square of the corresponding eigenvector component:

∂E_i/∂v_m = (ψ_{i,m})².

This should result in the following relationship if the learning is sufficient:

∂f_i(v; {w_α})/∂v_m ≈ (ψ_{i,m})².

As a property of NNs, first-order derivatives with respect to the arguments can easily be obtained using back-propagation. When training an NN, only the derivatives with respect to the parameters, {w_α}, are used, but the derivatives with respect to the inputs, v_m, can be computed in the same way.

This subsection shows that the trained NN provides an observable quantity other than the output data used during training. As a technical aspect, we also found that differentiation with respect to the input data using back-propagation may be helpful in physics. The same technique, called the saliency map, has been used in object detection [18]. For example, by differentiating the dog probability in the outputs with respect to the input image, one can get an idea of where a dog is in the image. It is very similar to the calculation in this subsection in that it focuses on the response to an input perturbation.
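The relation ∂E_i/∂v_m = (ψ_{i,m})² can be verified directly on the discretized problem by finite differences, as sketched below with the matrix method of Sec.2.2. For the trained NN itself, the same derivative would instead be obtained with back-propagation (e.g. `torch.autograd.grad`); the random potential here is an illustrative stand-in.

```python
import numpy as np

def ground_state(v):
    """Lowest eigenvalue and eigenvector of the tridiagonal H built from v."""
    n = len(v)
    dx = 2.0 / (n + 1)
    H = (np.diag(1.0 / dx**2 + v)
         - np.diag(np.full(n - 1, 0.5 / dx**2), 1)
         - np.diag(np.full(n - 1, 0.5 / dx**2), -1))
    E, psi = np.linalg.eigh(H)
    return E[0], psi[:, 0]

rng = np.random.default_rng(0)
v = rng.normal(scale=10.0, size=99)   # a random potential on interior points
E0, psi0 = ground_state(v)

# Central finite-difference derivative of E_0 with respect to one value v_m.
m, eps = 40, 1e-5
vp, vm_ = v.copy(), v.copy()
vp[m] += eps
vm_[m] -= eps
dE = (ground_state(vp)[0] - ground_state(vm_)[0]) / (2 * eps)

print(abs(dE - psi0[m] ** 2) < 1e-5)  # True: dE_0/dv_m = (psi_{0,m})^2
```

This is just the discrete Hellmann-Feynman theorem; the eigenvector normalization of Eq.(13) is exactly what makes the derivative equal the squared component.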

Predicting Physical Phenomena with Neural Networks
We reproduce a well-known physical phenomenon here, but the same procedure would be able to predict unknown phenomena. 1D quantum mechanics has no degeneracy of energy eigenvalues, as can be seen from the fact that even the symmetric double-well potential has no degeneracy of energy eigenvalues. Two energy eigenvalues do not intersect, and their eigenstates cross over when the potential changes slowly and continuously. This phenomenon plays an essential role in the Mikheyev-Smirnov-Wolfenstein effect of the solar neutrino problem [19][20][21][22].
The following setup is used to reproduce this phenomenon. First, we generate two potentials, v^(1)(x) and v^(2)(x), using the Gaussian kernel, and subtract the energy eigenvalues of the respective ground states to create potentials ṽ^(1)(x) and ṽ^(2)(x), adjusted so that the energy eigenvalues of the ground states become zero. We then introduce a real mixing parameter λ ∈ [0, 1] and create a potential v(x; λ) that varies continuously as the linear combination

v(x; λ) = (1 − λ) ṽ^(1)(x) + λ ṽ^(2)(x).

This continuously varying potential is shown in Fig.11(a). Figure 11(b) shows the energy eigenvalues predicted by the NN as a function of λ.
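The avoided crossing can also be checked directly with the numerical solver of Sec.2.2, independently of the NN. The sketch below uses two hand-picked Gaussian wells instead of kernel-generated potentials (an illustrative assumption), shifts each so its ground-state energy is zero, and scans the mixing parameter:

```python
import numpy as np

def levels(v, n_levels=2):
    """Two lowest eigenvalues of the discretized Hamiltonian for potential v."""
    n = len(v)
    dx = 2.0 / (n + 1)
    H = (np.diag(1.0 / dx**2 + v)
         - np.diag(np.full(n - 1, 0.5 / dx**2), 1)
         - np.diag(np.full(n - 1, 0.5 / dx**2), -1))
    return np.linalg.eigh(H)[0][:n_levels]

x = np.linspace(-1, 1, 201)[1:-1]                 # interior mesh points
v1 = -150.0 * np.exp(-(x - 0.5) ** 2 / 0.02)      # well on the right
v2 = -170.0 * np.exp(-(x + 0.5) ** 2 / 0.02)      # deeper well on the left
v1 = v1 - levels(v1)[0]   # shift so the ground-state energy is zero
v2 = v2 - levels(v2)[0]

# Scan the mixing parameter and record the gap between the two lowest levels.
gaps = []
for lam in np.linspace(0.0, 1.0, 21):
    E0, E1 = levels((1 - lam) * v1 + lam * v2)
    gaps.append(E1 - E0)

# Avoided crossing: the gap dips in the interior of the scan but never closes.
print(min(gaps) > 0.0, min(gaps) < gaps[0], min(gaps) < gaps[-1])
```

As λ sweeps from 0 to 1, the ground state migrates from one well to the other while the two levels approach without touching, which is the behaviour the NN reproduces in Fig.11(b).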
The energy eigenvalues change without crossing. We used the NN trained with the N5 dataset generated from the Gaussian kernel, and we investigate what is happening in the vicinity of the closest approach using the method of Sec.3.2. Figure 12(a) shows the existence probability distribution of the particle at λ = 0.75: the ground state is on the left, and the first excited state is on the right. Figure 12(b) shows the existence probability distribution at λ = 0.79, where the distributions of the ground state and the first excited state overlap. Figure 12(c) shows the existence probability distribution at λ = 0.83: the ground state is on the right, and the first excited state is on the left. In this way, we can see how the eigenstates continuously cross over.
This subsection showed that the trained NN could reproduce the well-known physical phenomenon. Therefore, it is also expected to help humans to find new physical phenomena.

Wide range of applications and experiments necessary for learning
So far, we have looked at the case where the experimenter fully knows the potential. We also knew that the physical system was governed by the Schrödinger equation, so in that case we could solve it numerically and obtain the energy eigenvalues. In this subsection, we consider simple toy models to demonstrate that the NN method has a broader range of applicability.
Consider the case of an experiment with a particle in matter. The experimenters can apply tunable potentials, V_e(x), to the particle by employing an electric field. However, due to matter effects unknown to the experimenters, the potential in matter is expected to be some V_total(x). Because the experimenters do not know how the matter affects the potential, they cannot calculate the energy eigenvalues theoretically.

Case 1

First, as a simple case, consider that a fixed potential is added, as follows (σ = 100):

V_total(x) = V_e(x) + σ sin(πx).

We assume the particle moves in −1 < x < 1, and the experimenters can vary V_e(x) in a variety of ways. The experimenters know neither the matter effect, σ sin(πx), nor V_total(x), but the energy eigenvalues are assumed to be observable in the experiment. The experimenters can then train an NN with their generated V_e(x) as inputs and the observed energy eigenvalues as outputs.
For the numerical experiment we use the N5 dataset as the input potentials, V_e(x). The energy eigenvalues used as the ground truth are calculated from V_total(x) using the Schrödinger equation, yielding an artificial experimental-results dataset. These values correspond to the energy eigenvalues that would be observed in an experiment without experimental error.
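As an illustration of how such an artificial dataset can be produced, here is a minimal sketch that diagonalizes a finite-difference Hamiltonian for V_total(x) = V_e(x) + σ sin(πx). The units (ℏ²/2m = 1), hard walls at x = ±1, the 101-point grid, and all function names are our own assumptions for illustration, not the paper's actual code:

```python
import numpy as np

def eigenvalues(V, x_min=-1.0, x_max=1.0, n_levels=10):
    """Finite-difference eigenvalues of -psi'' + V(x) psi = E psi
    with hard walls at x_min and x_max (units where hbar^2/2m = 1)."""
    n = len(V)
    dx = (x_max - x_min) / (n - 1)
    # Tridiagonal Hamiltonian on the interior grid points.
    main = 2.0 / dx**2 + V[1:-1]
    off = -1.0 / dx**2 * np.ones(n - 3)
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.sort(np.linalg.eigvalsh(H))[:n_levels]

# Case 1: a tunable V_e plus the fixed matter effect sigma*sin(pi*x).
sigma = 100.0
x = np.linspace(-1.0, 1.0, 101)
V_e = np.zeros_like(x)                  # e.g. a flat applied potential
V_total = V_e + sigma * np.sin(np.pi * x)
E = eigenvalues(V_total)                # "observed" ground-truth labels
```

Pairing many such V_e arrays (inputs) with the resulting E arrays (outputs) gives a training set of the kind described above.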

Case 2
Next, consider a more complex case where the matter effect depends on V_e(x) as follows:

V_total(x) = V_e(x) + (1/2σ) V_e(x)² − σ.

We again use the N5 dataset as the input potentials, V_e(x), and σ = 100. Figure 14 shows the results of the numerical experiments in this case. The left figure shows the input potentials V_e(x), the matter effect (1/2σ)V_e(x)² − σ, V_total(x), the ground-truth energy eigenvalues, and their predictions. This figure shows that the energy eigenvalues are also correctly predicted in this case. The NMSE is depicted in the figure on the right. Compared to Fig.13(b) in Case 1, the error is slightly larger, but learning has progressed sufficiently in this case as well. This shows that the NN comprehends the law composed of Eqs. (8) and (25).
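The two toy matter effects can be written compactly. The following sketch (function names and the example V_e(x) are ours; the Case 2 formula is our reading of the text, (1/2σ)V_e(x)² − σ) emphasizes that the NN only ever sees V_e(x) and the observed eigenvalues, never these functions:

```python
import numpy as np

sigma = 100.0

def matter_effect_case1(x):
    # Case 1: a fixed additive effect, independent of the applied potential.
    return sigma * np.sin(np.pi * x)

def matter_effect_case2(V_e):
    # Case 2: an effect that depends on the applied potential V_e itself.
    return V_e**2 / (2.0 * sigma) - sigma

x = np.linspace(-1.0, 1.0, 101)
V_e = 50.0 * np.cos(np.pi * x)          # an example applied potential
V_total_1 = V_e + matter_effect_case1(x)
V_total_2 = V_e + matter_effect_case2(V_e)
```

In both cases the training pairs are (V_e, observed eigenvalues); the hidden V_total only enters through the "experimental" labels.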
Even though V_total(x) is not given, the NN can predict the ten energy eigenvalues for a given V_e(x) in both Cases 1 and 2. These numerical experiments demonstrate that the NN method can be used even when the laws governing the physical system are unknown, as long as sufficient experimental results are available.

Necessary experiments and NN's versatility
As we have seen from the numerical experiments in Cases 1 and 2, the NN can be trained to make predictions even if the experimenters do not know the laws governing the physical system. However, it is necessary to provide a variety of inputs and obtain the corresponding experimental results as outputs. Therefore, an experimental setup should be prepared that automatically generates various inputs while monitoring the results.
As shown in Fig.13(b) and Fig.14(b), the NN trained on potentials generated by the exponential kernel predicts well on potentials generated by the other kernels. This finding suggests that if NNs are trained on a variety of inputs that are easy to produce experimentally, they can predict the outcomes of experiments even in input domains that are difficult to access experimentally. This demonstrates that sufficiently versatile NNs can be created even from experiments in a limited environment.
This NN-based method would also be appropriate when the laws governing a physical system are known but difficult to solve numerically, for example due to high dimensionality or multiparticle interactions. Even in such cases, the NN can be trained using a variety of experimental results. Once the trained NN is obtained, the results of further experiments can be predicted.

Conclusion
This paper investigated how NNs understand physics, using 1D quantum mechanics as a subject. In the preparation stage in Sec.2, it was found that a DNN can find energy eigenvalues from the potentials alone with errors of approximately 1%.
In Sec.3, we showed from four aspects that NNs can understand physics and are helpful for physics research.
• The NN could predict energy eigenvalues even for kinds of potentials different from those used for training. For example, an NN trained on a dataset created from jagged potentials could successfully predict the energy eigenvalues of a smooth potential.
We also found that capturing the overall potential shape with the self-attention mechanism and deepening the NN through residual connections improved prediction accuracy.
• The NN could predict the existence probability distribution of particles not used during training. The results show that extracting physical quantities not given during training from the NN is possible.
• The NN predicted the crossover of states when the potential changes slowly and continuously. This shows that NNs can reproduce or predict physical phenomena.
• NNs can be trained with diverse inputs and the corresponding experimental results, even when the laws governing the physical system are unknown to the experimenters. Once the NN has been trained, it can be used to predict the experimental results of such systems.
As shown in Fig.15, how NNs and humans understand physics seems different. Humans understand physics using their intuition from experimental data, discover unifying laws such as the Schrödinger equation that governs physical systems, or build phenomenological approximation models. On the other hand, NNs attempt to understand physical systems by adjusting numerous parameters, taking advantage of their high computational power to explain a large amount of experimental data. Although humans and NNs follow different paths to understand physics, the underlying physics is the same, and, as a result, they can make the same predictions.
The understanding of physical systems by NNs has the following advantages.
• It allows us to tackle problems without prior intuition. The NN used in this paper is a combination of layers used in natural language processing and image recognition, and no part of it has been specially devised for physics.
• We can create a dataset in a region where experiments or numerical simulations are simple to perform and use it to train NNs. As demonstrated in this paper, this trained NN is versatile and can be used to make predictions in conditions that are difficult to experiment with or numerically simulate. Figure 16 depicts a schematic diagram of the procedure.
On the other hand, it has the following drawbacks.
• Massive amounts of experimental data or highly diverse numerical simulation results are required to train a NN with high reproducibility while avoiding overfitting.
Experiments for these preparations must be automated. High computer performance and clever algorithms are required when numerical simulations, such as Monte Carlo methods, generate the training dataset.
• It is possible to improve the approximation's accuracy, but so far it does not appear possible to obtain an exact solution in the way the Schrödinger equation provides one.

These findings demonstrate that NNs can learn physical laws from experimental data, predict experimental results under conditions different from those used for training, and predict physical quantities of types not provided during training. Because NNs understand physics in a different way than humans do, they will be a powerful tool for advancing physics by complementing the human way of understanding.

A Architecture of Neural Networks
This appendix describes the architecture of the NN we used in this paper. Our NN consists of a combination of the Resnet and SAGAN structures, which have recently been used with success in natural language processing and image generation [1,16]. The residual connection used in Resnet mitigates gradient vanishing and increases the depth of the NN.
On the other hand, the self-attention mechanism used in SAGAN mitigates the shortcoming that convolutional layers cannot relate distant neurons, and is very powerful in natural language processing and image generation.

Fig. 16
Using NN for physics research. Initially, a dataset is prepared in a training area within a physical system that adheres to the same physical laws. If the dataset is created through experimentation, this training area will meet conditions, such as temperature and material, that make it easy to conduct experiments. If the dataset was generated using a numerical simulation, such as a Monte Carlo method, the training area corresponds to a parameter region with fast convergence. NNs can be trained using data generated by experiments or numerical simulations. As demonstrated in this paper, NNs are versatile and can be used to predict experimental results under conditions in which it is difficult to perform experiments or numerical simulations.
We begin with the overall structure, shown in Fig.A1. First, the input is a 101-dimensional vector corresponding to the potential value at each point. Next, this input is stretched in the channel direction by the blocks containing convolutional layers: it is reduced to 17 dimensions in the spatial direction while growing to 256 dimensions in the channel direction. Finally, it is flattened to a 1D vector and passed through fully connected layers, outputting a 10-dimensional vector. This 10-dimensional vector corresponds to the ten smallest energy eigenvalues. Figure A2 shows the details of the Res-SA block used in Fig.A1. The input first passes through a convolutional layer and then through the residual block three times.
Finally, a self-attention layer is used to relate information about distant locations. We embedded the self-attention layer in the same way as in the SAGAN study [16]. The only difference is that SAGAN needs to flatten a 2D image to 1D to create the attention map, whereas our 1D potential allows us to skip this step.
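As a rough illustration of the mechanism, the following is a NumPy sketch of SAGAN-style self-attention over a 1D sequence, not the actual PyTorch layer; the learnable output scale γ used in SAGAN is replaced here by a plain residual add, and all weight shapes are illustrative:

```python
import numpy as np

def self_attention_1d(x, W_q, W_k, W_v):
    """SAGAN-style self-attention over a 1D sequence.
    x: (channels, length); W_q, W_k: (d, channels); W_v: (channels, channels)."""
    q = W_q @ x                      # queries, (d, length)
    k = W_k @ x                      # keys,    (d, length)
    v = W_v @ x                      # values,  (channels, length)
    logits = q.T @ k                 # (length, length) attention logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over key positions
    # Every output position is a weighted sum over ALL positions,
    # so distant points can influence each other in one layer.
    return x + v @ attn.T            # residual add (learnable scale omitted)
```

Because the potential is already 1D, the attention map can be built directly on the (length × length) grid without the flattening step SAGAN needs for images.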
In Fig.A3, we show the structure of the residual block. This block consists of three convolutional layers and one residual connection and has the feature that the input and output have the same shape. This structure is the same as the one used in Resnet.
We employed the ReLU function for all activation functions, 31 in total. The total number of parameters in this network is approximately 4.9 × 10^6.
Next, we explain the specific learning method. We employed Adam as the optimization method and varied the learning rate from 1 × 10^−3 (initial value) to 1 × 10^−5 (value at the last epoch) in a geometric progression. In addition, since learning is more efficient when the input and output values are of order O(1), we multiplied the input and output values by 0.01 to adjust them to that magnitude. The batch size was fixed to 100. PyTorch was the NN framework used, and the GPy library provided the kernel functions. The NumPy library generated the random numbers from the covariance matrix. For diagonalization of the matrix, we used the Scikit-learn library.

Fig. A2
Res-SA block design. The input tensor can change its number of channels in the initial convolutional layer. There, the kernel size is 5 and the padding size is 0, so the tensor size in the spatial direction is reduced by 4. If there is a pooling layer, the tensor size in the spatial direction is halved there. For the other layers, the size of the tensor does not change. The deep structure increases the flexibility of the function, and the self-attention layer helps relate the information of points far apart in position. For the self-attention layer, we used the same structure as in the SAGAN paper [16]. The details of the residual block are depicted in Fig.A3.
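The geometric learning-rate decay described in the learning method can be written as follows (a sketch; the 100-epoch count and endpoint handling are our assumptions):

```python
def geometric_lr(epoch, n_epochs, lr_start=1e-3, lr_end=1e-5):
    """Learning rate decayed in a geometric progression from lr_start
    at epoch 0 down to lr_end at the final epoch."""
    return lr_start * (lr_end / lr_start) ** (epoch / (n_epochs - 1))

# Per-epoch schedule: constant ratio between consecutive epochs.
schedule = [geometric_lr(e, 100) for e in range(100)]
```

In PyTorch this corresponds to multiplying the optimizer's learning rate by the fixed ratio once per epoch.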

Fig. A3
Design of the residual block. The block consists of two pointwise convolutional layers, corresponding to fully connected layers in the channel direction, and a convolutional layer with a kernel size of 3. The padding is adjusted so that the input and output tensors have the same shape. This structure is the same as the one used in Resnet. The presence of the residual connection allows learning to proceed without gradient vanishing, even as the depth of the neural network increases.