Machine learning-aided atomic structure identification of interfacial ionic hydrates from AFM images

ABSTRACT Relevant to broad applied fields and natural processes, interfacial ionic hydrates have been widely studied using ultrahigh-resolution atomic force microscopy (AFM). However, the complex relationship between the AFM signal and the investigated system makes it difficult to determine the atomic structure of such a complex system from AFM images alone. Using machine learning, we achieved precise identification of the atomic structures of interfacial water/ionic hydrates from AFM images, including the position of each atom and the orientations of the water molecules. Furthermore, we found that structure prediction of ionic hydrates can be achieved cost-effectively by transfer learning with a neural network pre-trained on easily available interfacial water data. This work thus provides an efficient and economical methodology that not only opens up avenues for determining atomic structures of more complex systems from AFM images, but may also help to interpret other scientific studies involving sophisticated experimental results.

Figure S10. Examples of network prediction of K⁺ hydrates on simulated and experimental data.
Figure S11. Validation of the predicted K⁺ hydrates structure.
Figure S12. Prediction of Na⁺ hydrates by the NN with or without transfer learning.
Figure S13. The prediction accuracy error bars of Fig. 4 with the manually collected data.
Table S1. Force field parameters of SPC/E water and the Au and Pt surfaces for the interfacial water simulations.

Neural Network Structure
The neural network (NN) for structure prediction from atomic force microscopy (AFM) images is a standard 3D U-Net [1], a convolutional neural network widely used for biological and clinical image segmentation [2]. It consists of an encoder for feature extraction, a decoder that converts the input (AFM images) into the output (a visual structure representation), and skip connections between the encoder and decoder, which effectively prevent the loss of H-atom features during encoding (fig. S1). Ten grayscale AFM images recorded at different tip-sample distances are stacked into a 3D image as the network input. The input size is 1 × 10 × 128 × 128, where the numbers denote channel, depth, height, and width, respectively. The output of the network is a 3-channel 2D image of size 3 × 128 × 128, where the numbers denote channel, height, and width, respectively. Each block in the encoder contains two 3 × 3 × 3 convolutional layers, each preceded by a group normalization (GN) layer and followed by a leaky rectified linear unit (L-ReLU), with a max-pooling layer at the end of the block. The first two max-poolings are 1 × 2 × 2 in size to keep the AFM stack depth at 10, and the next two are 2 × 2 × 2 in size to reduce the data depth to 2. In the decoder, each block consists of an interpolation upsampling matching the max-pooling size of the corresponding encoder block, followed by two 3 × 3 × 3 convolutional layers, each again preceded by a GN layer and followed by an L-ReLU. Skip connections between encoder and decoder layers of the same size provide the decoder with essential high-resolution features. After the decoder, there are two 3 × 3 × 3 convolutional layers and two 3 × 3 convolutional layers; the data are flattened before the 3 × 3 convolutions. Except for the last layer, which uses a ReLU, all convolutional layers use an L-ReLU with slope 0.001.
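As a sanity check on the architecture description above, the stated pooling sizes can be traced through the encoder with a few lines of arithmetic (a sketch, not the authors' code): the two 1 × 2 × 2 poolings preserve the stack depth of 10, and the two 2 × 2 × 2 poolings bring it down to 2.

```python
# Trace the feature-map shapes of a 1 x 10 x 128 x 128 AFM stack through the
# 3D U-Net encoder, using the pooling sizes given in the text.

def pool(shape, kernel):
    """One max-pooling step: floor-divide each dimension by the kernel size."""
    return tuple(s // k for s, k in zip(shape, kernel))

def encoder_shapes(depth=10, height=128, width=128):
    shape = (depth, height, width)
    shapes = [shape]
    for kernel in [(1, 2, 2), (1, 2, 2), (2, 2, 2), (2, 2, 2)]:
        shape = pool(shape, kernel)
        shapes.append(shape)
    return shapes

# encoder_shapes() ->
# [(10, 128, 128), (10, 64, 64), (10, 32, 32), (5, 16, 16), (2, 8, 8)]
```

The final depth of 2 is what allows the post-decoder flattening to 2D before the 3 × 3 convolutions.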

Figure S1 | 3D U-Net architecture. Schematic illustration of the model architecture. Blue cuboids and rectangles represent 3D and 2D feature maps, respectively. The number of channels is denoted above each feature map.

The MD simulations of interfacial water.
The MD simulations of interfacial water were performed in the NVT ensemble using the LAMMPS software package [3]. In the constructed system of interfacial water on the Au(111) surface, the entire box measures about 5.0 × 5.0 × 7.5 nm³; the Au(111) surface, modeled by a six-layer slab, is fixed at the bottom of the box; a water slab of about 5.0 × 5.0 × 3.0 nm³ is placed directly above the surface; and the rest of the box is vacuum. For the system of interfacial water on the Pt surface, we only replaced the surface with Pt(111) and adjusted the box size according to the lattice difference between the Pt and Au surfaces. The detailed force field parameters are listed in Table S1. MD simulations over a wide temperature range from 140 K to 300 K (in steps of 40 K) were performed to sample the various water phase spaces sufficiently. The simulation timestep is 1 fs, and the total simulation time at a given (N, V, T) is 5 ns, of which the first 2 ns are used for structure relaxation.
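The simulation bookkeeping implied by these parameters can be made explicit with plain arithmetic (assuming, as the garbled durations suggest, 5 ns total with 2 ns of relaxation; this is not a LAMMPS script):

```python
# Temperature grid and step counts for the interfacial water MD runs.
temperatures_K = list(range(140, 301, 40))   # 140 K to 300 K in 40 K steps

timestep_fs = 1
total_ns, relax_ns = 5, 2                    # per-temperature run length

steps_total = int(total_ns * 1e6 / timestep_fs)            # 1 ns = 1e6 fs
steps_production = int((total_ns - relax_ns) * 1e6 / timestep_fs)
```

At 1 fs per step, each temperature point therefore costs 5,000,000 MD steps, of which 3,000,000 are used for sampling.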

The MD simulations of ion hydrates.
The MD simulations of Na⁺/K⁺ hydrates were performed using the AMBER16 software package [4], with polarizable force fields [5,6] for Na⁺ hydrates and the recently optimized force fields for K⁺ hydrates [7]. The effectiveness of the force field for Na⁺ hydrates has been demonstrated in our previous publication [8]. The force field parameters are shown in

DFT calculation.
We calculated the electrostatic potential of Na⁺/K⁺ hydrates at the Au(111) surface using DFT within the Perdew-Burke-Ernzerhof generalized gradient approximation [10] (PBE-GGA), which has been shown to give good results for H2O hydrogen bonding [11]. DFT calculations were performed using the Vienna ab initio simulation package (VASP) [12].
Projector augmented-wave pseudopotentials were used with a cutoff energy of 550 eV for the expansion of the electronic wave functions. Van der Waals corrections for dispersion forces were included by using the optB86b-vdW functional [13,14].

Structure Sampling
To acquire sufficient interfacial water structures, the detection window (Fig. 1a) is initially placed at the first water layer and then slid in the directions parallel and perpendicular to the substrate. In the XY plane, the detection window takes 2 steps per direction with a stride of 0.5 nm per step, while in the Z direction it takes 1 step with a stride of 0.5 nm. To ensure the variety of the data, water molecules within distances δ₁ and δ₂ of the window boundary are neglected during each detection. The distances δ₁ and δ₂ are set randomly, but always such that at least one water molecule remains in the detection window. The configurations used for data acquisition are sampled from the MD simulation trajectories at a time interval of 10 ps.
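On one reading of the scheme above ("2 steps" meaning two translations beyond the starting position, giving 3 positions per in-plane axis), the window placements per sampled frame can be enumerated as follows; the function name and this interpretation are mine, not the paper's:

```python
# Enumerate detection-window origins: 2 steps of 0.5 nm stride along x and y,
# 1 step of 0.5 nm stride along z (all offsets in nm).
from itertools import product

def window_origins(steps_xy=2, stride_xy=0.5, steps_z=1, stride_z=0.5):
    xs = [i * stride_xy for i in range(steps_xy + 1)]
    ys = [i * stride_xy for i in range(steps_xy + 1)]
    zs = [i * stride_z for i in range(steps_z + 1)]
    return list(product(xs, ys, zs))

# 3 x 3 x 2 = 18 window placements per sampled MD frame
```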

Training Data Augmentation
To make the network more robust to experimental data and avoid overfitting, the input data is processed as follows.
a) Noise. For each data sample, either a random uniform noise or a Gaussian noise is randomly selected and added to all images in the stack.
A random uniform noise with an amplitude in the range [−δ, δ] is added to each image, where δ = γ(A_max − A_min), A_max and A_min are the maximum and minimum pixel values of the image, and γ is the amplitude ratio.
A Gaussian noise with an amplitude drawn from the distribution γ′(A_max − A_min)·N(0, 1) is added to each image, where γ′ is the amplitude ratio and N(0, 1) is the standard normal distribution.
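The two noise channels, as reconstructed from the (partly garbled) formulas above, can be sketched as follows. Both amplitudes are tied to the image's dynamic range through an amplitude ratio; γ = 0.01 is the value the text later quotes for ionic hydrate training.

```python
# Sketch of the noise augmentation; function names are mine.
import numpy as np

def add_uniform_noise(image, gamma, rng):
    delta = gamma * (image.max() - image.min())
    return image + rng.uniform(-delta, delta, size=image.shape)

def add_gaussian_noise(image, gamma, rng):
    sigma = gamma * (image.max() - image.min())
    return image + rng.normal(0.0, sigma, size=image.shape)

def augment(stack, gamma=0.01, rng=None):
    """Randomly pick one noise type and apply it to every image in the stack."""
    rng = rng or np.random.default_rng()
    noise_fn = [add_uniform_noise, add_gaussian_noise][rng.integers(2)]
    return np.stack([noise_fn(img, gamma, rng) for img in stack])
```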

Structure Representation
We designed a new structure representation, named advanced vdW (a-vdW) spheres, as the neural network output / data label. As shown in fig. S3, it is composed of three grayscale images representing atom type, positive charge, and negative charge. They are the same size and are all projections of the structure onto the x–y plane. Since the interfacial structure is restricted to a box with a height of only 0.3 nm, its information along the z direction is negligible, and the projection of all atoms is sufficient to represent the structure to some extent. In our representation, each atom is described as a sphere and projected onto the x–y plane in height order; if two atoms overlap, the higher atom covers the lower one. In the atom-type image, the diameter and grayscale of an atom are related to its LJ potential parameters σ and ϵ. Here, in order to differentiate elements for easier identification, the ϵ of each element is scaled and transformed, where ϵ_min and ϵ_max are the minimum and maximum ϵ values among all elements, respectively, and k₁ = 0.6 and k₂ = 0.7 are the scale factors. All LJ potential parameters are taken from the OPLS force field [15]. For the positive- and negative-charge images, the diameter of an atom is defined as above, and its grayscale is scaled by charge, where q is the charge of the atom and q_max is the charge of the most charged atom in the structure.
If we consider these three images as three channels of a color image and combine them, we get a colormap where each atom has a different color. According to the definition of
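The height-ordered projection underlying the a-vdW representation can be sketched as a painter's algorithm: atoms are drawn as filled circles in order of increasing height, so a higher atom overwrites any lower atom it overlaps. The radii and grayscale values below are placeholders, not the paper's σ/ϵ-derived ones.

```python
import numpy as np

def render_avdw(atoms, size=128, scale=50.0):
    """atoms: list of (x, y, z, radius, gray); coordinates in nm, mapped to
    pixels via `scale`. Returns one grayscale channel of the representation."""
    image = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for x, y, z, radius, gray in sorted(atoms, key=lambda a: a[2]):
        mask = (xx - x * scale) ** 2 + (yy - y * scale) ** 2 <= (radius * scale) ** 2
        image[mask] = gray          # the higher atom covers the lower one
    return image
```

Running this for the three channels (atom type, positive charge, negative charge) with the appropriate grayscale rules yields the full label.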

Training Process
The loss function is the mean square error (MSE) and the optimizer for gradient descent is Adaptive Moment Estimation (Adam) [17]. For training on the interfacial water data, the learning rate is 0.001 for the first 30 epochs and then decreases to 0.0001 for the next 30 epochs. For transfer learning on Na⁺ hydrates, the network parameters trained on the interfacial water dataset are reused; the learning rates for the last two 2D 3 × 3 convolution layers and for all other layers are 0.001 and 0.0001, respectively. The loss converges after 60 epochs of training. The batch size used throughout training is 8.

Fig. 2. a, The atomic model rendered by OVITO. b, d, f, The reference representations of structure a: atom type, positive charge, and negative charge, respectively. c, e, g, The corresponding representations predicted by the NN for structure a.

Fig. 3. a, The atomic model rendered by OVITO. b, d, f, The reference representations of structure a: atom type, positive charge, and negative charge, respectively. c, e, g, The corresponding representations predicted by the NN for structure a.
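The two learning-rate policies in the training description can be summarized as simple functions (the names are mine; the actual optimizer is Adam with an MSE loss):

```python
def lr_water(epoch):
    """Interfacial-water training: 1e-3 for epochs 0-29, then 1e-4."""
    return 1e-3 if epoch < 30 else 1e-4

def lr_transfer(is_last_2d_conv):
    """Transfer learning on Na+ hydrates: the larger rate applies only to the
    two final 2D 3x3 convolution layers; pretrained layers use the smaller."""
    return 1e-3 if is_last_2d_conv else 1e-4
```

In a typical deep learning framework these would be passed as a learning-rate schedule and per-parameter-group rates, respectively.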

Figure S6 | Bad prediction of the NN on interfacial water (Au surface) data. The representations of structure a: atom type, negative charge, positive charge, and color plot, respectively (upper panels, predicted by the NN; lower panels, reference representations).

Correction of the hydrogen atom shift by transfer learning
After training the NN using data of interfacial water on the Au(111) substrate, we tested the transferability of the NN using data of interfacial water on the Pt(111) substrate (testing data). Manual examination of the NN predictions on the testing data revealed that some of the predicted hydrogen atom positions drifted along hydrogen bonds compared to the reference. As can be seen from fig. S7 (middle plot), the hydrogen shifts are marked with arrows, and the misdescribed hydrogen atom is marked with a circle.
After re-training (transfer learning) the NN using 5000 samples of interfacial water on Pt

Prediction for experimental AFM images of Na⁺·4H2O
Before prediction, the experimental AFM images to be predicted should be preprocessed. They should be padded or cropped to the same physical size as the detection space of the training data, so that the size of the imaged molecule relative to the detection space remains unchanged. Here, the detection space of the experimental AFM images to be predicted is 1.8 × 1.8 nm² at an image size of 240 × 240 pixels; since the detection space of the training data is 2.5 × 2.5 nm², the images are padded to 333 × 333 pixels. The images are then resized to 128 × 128 pixels as the input data. In addition, to reduce the influence of experimental noise on the NN prediction, we set the noise amplitude ratio of the data augmentation to 0.01 during NN training on ionic hydrate data.
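The padding arithmetic above can be sketched as follows: a 240 × 240 px image spanning 1.8 × 1.8 nm² is zero-padded until its field of view matches the 2.5 × 2.5 nm² training window, then resized to the 128 × 128 network input (the resizing itself would use an image library; only the size bookkeeping and padding are shown, and the function names are mine).

```python
import numpy as np

def padded_size(pixels, span_nm, target_span_nm):
    """Pixel count that gives the target field of view at the same nm/px."""
    return round(pixels * target_span_nm / span_nm)

def pad_to_window(image, target):
    """Zero-pad a square image symmetrically to target x target pixels."""
    extra = target - image.shape[0]
    before, after = extra // 2, extra - extra // 2
    return np.pad(image, ((before, after), (before, after)))

# padded_size(240, 1.8, 2.5) -> 333, as quoted in the text
```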
According to the NN prediction for the experimental data (Fig. 3, last row), we created a Na⁺·4H2O structure on a NaCl substrate and simulated its AFM images.
Comparing the simulated (fig. S8, second row)

Prediction for Na⁺ hydrates whose water molecules are at different positions

Figure S9 | Prediction for Na⁺ hydrates whose water molecules are at different positions. The representations of Na⁺ hydrates, from left to right, as one of the water molecules moves from around the ion to above the ion (upper panels, predicted by the NN; lower panels, reference representations).

Structure prediction of K⁺ hydrates with transfer learning
To verify the general applicability of our transfer learning method, we carried out further investigations on K⁺ hydrates, following the same procedure as for Na⁺ hydrates.
In particular, thousands of K⁺·nH2O (n > 3) configurations were prepared by simulation for NN transfer learning. To reproduce most of the important features of the experimental AFM images, the charge at the tip apex in the AFM simulations was set to Q = −0.20 e.
After 50 epochs of training, the NN shows quite good performance on the K⁺ hydrate testing data (fig. S10). The prediction accuracies for K, O, and H (calculated from 700 simulated testing samples) are 99.9%, 96.3%, and 81.4%, respectively. This demonstrates the robustness and general applicability of our method.
The prediction for experimental AFM images of K⁺ hydrates on the Au surface is presented in the fourth row of fig. S10. Validation of the predicted K⁺ hydrate structure is shown in fig. S11: the structure is verified by the consistency between the experimental AFM image and the simulated AFM image of the NN-predicted structure after DFT relaxation. It is worth mentioning that, since the experimental image used to predict the structure is part of a larger interfacial hydrogen-bond network, the simulated AFM images of the predicted structure show some differences from the experimental image near the boundary, where the periodic boundary condition was applied.
Nevertheless, the AFM simulation of the predicted structure agrees well with the experimental result, demonstrating the high predictive power of our method for experimental AFM data. For the hydrogen atoms, specifically, only a few pixels of light grayscale are present.
We use edge detection for atom identification. The image is first smoothed. Furthermore, we consider the shape of a detected region as an important criterion. We define the shape factor s of a region from the distances dᵢ between each pixel on the edge of the region and the center of the region, summed over all edge pixels; a perfect circle has a shape factor of 1. Once s > s_max, where s_max is a threshold, the region is discarded as a shadow. In addition, if two hydrogen atoms overlap in the label and only one region can be detected in the prediction, we count them as two true positives (TPs).
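A circularity test consistent with this description can be sketched as follows. The paper's exact formula is garbled in the source, so the ratio used here (which equals 1 for a perfect circle, by the Cauchy–Schwarz inequality, and grows as the edge distances become less uniform) is my reconstruction of a score with the stated behavior, not necessarily the authors' definition.

```python
import numpy as np

def shape_factor(mask):
    """Circularity score of a boolean region mask; 1 for a perfect circle."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    # edge pixels: region pixels with at least one 4-neighbor outside the region
    padded = np.pad(mask, 1)
    inner = padded[1:-1, :-2] & padded[1:-1, 2:] & padded[:-2, 1:-1] & padded[2:, 1:-1]
    ey, ex = np.nonzero(mask & ~inner)
    d = np.hypot(ey - cy, ex - cx)          # edge-pixel distances to the center
    return len(d) * np.sum(d ** 2) / np.sum(d) ** 2
```

Elongated "shadow" regions score visibly above 1 and can be rejected with a threshold on this value.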
Finally, for each element, the prediction accuracy is calculated by eq. S7 based on all predictions for it in the target dataset. To avoid overestimating the performance of the NN, the parameter set of the prediction-accuracy calculation algorithm was adjusted and cross-checked against manually calculated prediction accuracies: we manually identified the atoms and calculated the prediction accuracy for each atom species across the different datasets.
Using the same datasets as in Fig. 4, we plot the error bars obtained from the manually collected data (fig. S13). The prediction accuracy from the machine calculation with this parameter set is slightly lower than that from the manual calculation, indicating that the machine-calculated prediction accuracy is a conservative and reliable estimate.