3D graph contrastive learning for molecular property prediction

Abstract Motivation Self-supervised learning (SSL) is a method that learns the data representation by utilizing supervision inherent in the data. This learning method is in the spotlight in the drug field, lacking annotated data due to time-consuming and expensive experiments. SSL using enormous unlabeled data has shown excellent performance for molecular property prediction, but a few issues exist. (i) Existing SSL models are large-scale; there is a limitation to implementing SSL where the computing resource is insufficient. (ii) In most cases, they do not utilize 3D structural information for molecular representation learning. The activity of a drug is closely related to the structure of the drug molecule. Nevertheless, most current models do not use 3D information or use it partially. (iii) Previous models that apply contrastive learning to molecules use the augmentation of permuting atoms and bonds. Therefore, molecules having different characteristics can be in the same positive samples. We propose a novel contrastive learning framework, small-scale 3D Graph Contrastive Learning (3DGCL) for molecular property prediction, to solve the above problems. Results 3DGCL learns the molecular representation by reflecting the molecule’s structure through the pretraining process that does not change the semantics of the drug. Using only 1128 samples for pretrain data and 0.5 million model parameters, we achieved state-of-the-art or comparable performance in six benchmark datasets. Extensive experiments demonstrate that 3D structural information based on chemical knowledge is essential to molecular representation learning for property prediction. Availability and implementation Data and codes are available in https://github.com/moonkisung/3DGCL.


Introduction
Self-supervised learning (SSL) learns data semantics by utilizing inherent supervision in data from a large amount of unlabeled data. A self-supervised model using a pretext task has recently outperformed the general supervised model. SSL has been extremely successful in computer vision  and language fields (Mikolov et al., 2013;Devlin et al., 2018) and has attracted particular attention in the field of drugs (Rong et al., 2020), wherein considerable time and money are incurred labeling data, and there is a lack of annotated data compared with other domains. A molecule can be expressed in various ways, such as a chemical fingerprint, for example, ECFP (Rogers and Hahn, 2010), which uses a fixed vector for particular substructures, and simplified molecular input line entry system (SMILES), (Weininger, 1988) which represents the molecule as a string.
In addition, there is a way to represent a molecule as a graph, and Graph Neural Networks is widely used for molecular property prediction (Gilmer et al., 2017) because it can reflect the structure and correlation of atoms and bonds effectively. SSL using enormous amounts of unlabeled data has shown excellent performance for molecular property prediction (Li et al., 2021;Zhang et al., 2020). However, a few issues exist. First, Existing self-supervised learning models are 'large-scale.' They require a million sizes of pre-train data to generalize various downstream tasks and, in many cases, are large-size models such as Transformer (Vaswani et al., 2017) to learn that data. Therefore, there is a limitation to implementing selfsupervised learning where the computing resource is insufficient. We use only 1,128 samples for pre-train data, about 0.5 million model parameters, and overcome the not-high computing environment. Second, in most cases, self-supervised models do not utilize 3D structural information for molecular representation learning. The activity and property of a drug are closely related to the structure of the drug molecule. Nevertheless, most current self-supervised models do not use 3D information or use it partially (Liu et al., 2021b;Stärk et al., 2021). We introduce a novel 3D-3D view contrastive learning method to learn molecular structural-semantic. Contrastive learning is one of the self-supervised learning methods and consists of pretext tasks to learn similarities and dissimilarities between positive and negative pairs. Finally, previous models that apply contrastive learning to molecules have used the augmentation permuting atoms and bonds, while positive samples should be intrinsically identical to each other. Unlike images, molecules can be completely different if we use the augmentation that changes atoms or bonds, so molecules having different characteristics can be in the same positive samples. We generate a conformer pool consisting of several conformers to preserve the molecular composition and use it for molecule-contrastive learning.
We present a 3D Graph Contrastive Learning (3DGCL) framework for molecular property prediction, a small-scale method that uses a tiny dataset, model, and 3D coordinates. Our approach uses approximately 1,000 data and 0.5 million model parameters , and randomly selects molecules from the conformer pool instead of selecting the most stable molecules to learn the 3D structure abundantly. We demonstrated the effectiveness of the 3DGCL through extensive experiments. We compared the proposed method with previous state-of-the-art baselines in four regression and two classification benchmark datasets under the same experimental settings. To investigate the importance of the 3D view, we compared our method of utilizing molecular 3D coordinates with the existing pre-training method of modifying the original molecule. In addition, we also compare our conformer pool, comprising conformers that exist in nature, with molecules that are difficult to exist in nature, which is created by adding noise to the structure of the original molecule. We achieved outstanding performance and the best result in the pre-training effect by comparing the difference between pre-training and non-pre-training, compared to existing methods. The results showed that chemical-based 3D structural information is vital for molecular representation learning in property prediction. We focus not only on large-scale learning but also on pre-training strategies to learn representations correctly in self-supervised learning.
Our main contributions are: • We develop a compact self-supervised learning approach that can be run even in environments with low computational resources, using the small-scale pre-train samples and parameters. We also achieve the stateof-the-art or comparable performance in six molecular benchmarks. • To the best of our knowledge, we propose 3D-3D view contrastive learning that can take full advantage of 3D information for the first time.
We actively utilize 3D positional information inherent in molecules through a pre-training scheme using a conformer pool. • Extensive experiments demonstrate that our method, which can utilize structural information abundantly while maintaining semantics, is more suitable for molecular property prediction than conventional contrastive learning, which may change the structure or properties of molecules.

Graph Neural Networks
Graph Neural Networks (GNNs) is a widely used deep learning technique using graph-structured data. Due to the fact that molecules can be well described in graphs, GNNs for molecular property prediction has been active research (Yang et al., 2019a;Gilmer et al., 2017;Lu et al., 2019). The molecular graph is represented as G = (V, E), where V and E denote the set of atoms and bonds, respectively. The message passing scheme in GNNs (Gilmer et al., 2017) can be formalized as follows: The GNNs aims to learn each node vector h k v and the entire graph vector h G in the k-th layers. N (v) denotes the neighbor node of node v and evw denotes the edge between node v and node w. M and U are message and update functions depending on the GNN models. GNNs update each node through an iterative message passing process. Finally, the readout layer, e.g. sum or mean pooling, is applied to get the entire graph vector while satisfying permutation-invariance.

Self-supervised learning for molecular property prediction
In the drug field, it is challenging to obtain annotated data due to expensive wet-lab experiments; however, it is relatively easy to collect data without annotation. Thus, there have been many SSL approaches recently, and they have shown noticeable results for molecular property prediction. Inspired by BERT (Devlin et al., 2018), a powerful pre-training model in NLP, SMILES-BERT , ChemBERTa (Chithrananda et al., 2020) utilized the enormous SMILES datasets to predict molecular properties. However, SMILES is not geometry-aware because it represents molecules as string sequences. Instead of SMILES as molecular representations, recent studies using molecular graphs use various types of pretext tasks. GROVER (Rong et al., 2020) developed a transformer-style architecture and utilized motif-level pretext tasks. (Dillard, 2021) integrated pretext tasks at a different scale consisting of the atom, fragment, and molecule levels.
Contrastive learning is one of the powerful learning methods of selfsupervised learning, making similar samples close and dissimilar samples far away in embedding space (He et al., 2020;Chen et al., 2020). GraphCL (You et al., 2020) proposed various augmentation methods for graph contrastive learning. MolCLR  focused on molecular representation learning based on approaches presented in GraphCL. These methods change the structure of the original graph, such as dropping nodes or changing edges, so that molecules can have different properties. Zhang et al. (2020) proposed contrastive method brings the subgraph and the graph close together through motif-level sampling from the entire graph. MoCL (Sun et al., 2021) used augmentation to replace the sub-structure in the original molecule with bioisostere, which has similar properties to the original, incorporating domain knowledge. They tried to maintain the semantic information of the molecule, but supervision used in pre-training may not be appropriate for molecular representation learning because even a tiny change in a molecule can lead to a significant difference. Hermosilla and Ropinski (2022) proposed a 3D protein contrastive learning method to learn the structure of a protein. During the pre-training process, the substructures of the protein are used, which is a similar approach to (Sun et al., 2021) except that 3D information is used. MEMO (Zhu et al., 2022) is a multiple-view contrastive learning method using various molecular representations (2D graph, 3D graph, Fingerprint, and SMILES). Uni-Mol ) is a self-supervised model that uses 3D information and consists of a pre-training scheme that predicts masked atoms or denoises 3D positions after adding noise to the molecular coordinates.
Molecular structure information plays an essential role in determining molecular properties. We have recently witnessed studies with great success using 3D geometric data. GraphMVP (Liu et al., 2021b) and 3Dinformax (Stärk et al., 2021). developed 2D-3D view contrastive learning approaches maximizing the mutual information between molecular embedding of 2D graph network and 3D graph network. Existing works with 3D coordinates do not fully utilize geometric information because they allow the 2D view network to learn 3D view information.

3D Graph Neural Networks
Due to a general graph being represented in a non-euclidean space, it is difficult to grasp the exact molecular structure. However, by leveraging 3D positions, we can specifically use unique molecular geometric properties such as distance, angle, and torsion in the euclidean space. For this reason, 3D Graph Neural Networks (3D GNNs) using 3D information are attracting widespread attention in drug and material discovery fields (Unke and Meuwly, 2019;Qiao et al., 2020).
3D Molecular graph can be represented as G 3d = (V, E, R) , where R denotes the 3D coordinates of atoms. SGCN (Danel et al., 2020) applied different weights according to interatomic distance in the message-passing process based on GCN. SchNet (Schütt et al., 2017) used Gaussian radial basis functions to represent distance information in a high-dimensional space. DimeNet (Klicpera et al., 2020) used orthogonal functions to learn both distance and angle information. SphereNet (Liu et al., 2021c) also used torsion information to represent a complete molecular structure using spherical message passing. ChIRo (Adams et al., 2021) developed a chirality-aware 3D network that can learn molecular chirality from torsion angles. SchNet and subsequent models (DimeNet and SphereNet) can capture non-bonded interaction based on 3D positional information within cutoff. SchNet is the most efficient architecture among 3D GNNs in terms of the learning cost and time, although the other models are superior performance. We use SchNet as an encoder for 3D contrastive learning. The critical process of SchNet is expressed as follows: where v 0 i denotes the feature of the atom i and is the initialized value using embedding of the atomic number.
where d ij denotes the distance between the atom i and the atom j. The interatomic distance is encoded to e ij by radial basis functions located in v l+1 i in the l+1-th layer is updated based on neighbor atom j in the l-th layer and the distance information e ij . • indicates the element-wise multiplication.
The global molecular feature vector h G 3d is obtained based on summation of the atom embeddings. We can handle arbitrary positioned atoms in the 3D space through the encoder.

Conformer Pool
A conformer is a molecule group formed by rotation on single bonds in a molecule. Conformers have different potential energies depending on the degree of rotation, and the lower the energy, the higher the probability of existence in nature. For example, butane is a molecule composed of single carbon-carbon bonds and single carbon-hydrogen bonds. If it is Fig. 1. The process of creating a conformer pool we use in contrastive learning. We cannot grasp the spatial characteristics of butane in 2D. However, we can see that the potential energy varies depending on the rotation angle of the bond between the second carbon (C2) and the third carbon (C3), both colored black in the figure on top. We construct a pool of conformers with different stability and pre-train by randomly selecting from the rest of the conformers shown in the yellow box in the figure in the middle, except for the most stable molecule. expressed in 2D, it is difficult to see the change due to the rotation of a single bond. Nevertheless, as shown in Figrue 1, if represented in 3D, we can see that butanes form conformers with different potential energies according to rotation. We generate a conformer pool using the Merck molecular force field (MMFF94) (Halgren, 1996) function in RDkit (Landrum et al., 2020) to utilize diverse molecular geometric information in contrastive learning. The MMFF method combines distance geometry (DG) algorithms, a classic approach that randomly sample conformational space, with energy minimization using MMFF. The conformer pool consists of five conformers with the lowest energy. We do not add more conformers because five conformers are enough to represent almost all molecules in nature (Liu et al., 2021b), and the more conformers we add, the more computational cost we need. We also reduce the cost and enrich the molecular representation by randomly selecting the conformers from the conformer pool.

3D Graph Contrastive Learning
Contrastive learning is a powerful self-supervised learning method, moving positive pairs of similar samples close and negative pairs of dissimilar samples far away in embedding space. In order to be consistent with the basic assumption of contrastive learning, we make the positive samples be intrinsically the same as each other, unlike previous works that have made use of the augmentation permuting atoms and bonds or may not maintain molecular semantics Zhang et al., 2020;Sun et al., 2021).
Maintaining semantics between conformers means that conformers conserve identity with the same element. Therefore, we performed contrastive learning by constructing positive pairs with conformers We generate a conformer pool from the original 2D molecule using 3D coordinates information. The conformer pool consists of five conformers that vary stability with the rotation angle of a single bond. We randomly choose the two molecules from the conformer pool for conformer augmentation. Then, the selected molecule is fed to a 3D encoder for embedding molecular representation. We apply a projection head to map the 3D view representations into the latent space for contrastive learning. The contrastive loss is computed across all positive and negative pairs in the minibatch of N molecules.
(molecules with slightly different properties) instead of the existing methods of changing semantics in the pre-training process. Given molecular-input samples, we augment the molecule by applying our conformer pool to construct the positive and negative pairs. We randomly select two conformers (in yellow circle) in the conformer pool, as shown in Fig. 2.
Graph contrastive learning maximizes the consistency between latent representations of positive pairs against negative pairs. We get the molecular graph embedding h using a 3DGNN encoder as the SchNet, and we apply a projection head to embedding h resulting in the non-linear transformation to obtain latent representation z. The projection head consists of a twolayer multi-layer perceptron and maps the molecular representations to another latent space as advocated in . Finally, we adopt a normalized temperature-scaled cross entropy (NT-Xent)  loss as our objective loss to maximize the agreement between the positive samples compared to the negative samples in a minibatch of N molecules. The contrastive loss for the first and second sampled conformers of k-th graph k c1 , k c2 is defined as: where sim (z 1 , z 2 ) is the cosine similarity z 1 · z 2 /(∥z 1 ∥ ∥z 2 ∥) and τ is the temperature paramerter. We compute the final loss across all training samples in the minibatch. There have been a few molecular graph contrastive learning models using diverse viewpoints recently. As introduced in the Section 2.2, MoCL and MolCLR (Sun et al., 2021;Wang et al., 2022) use the 2D-2D view approach, and there are models 3Dinformax and GraphMVP (Stärk et al., 2021;Liu et al., 2021b) that use the method of the 2D-3D view. However, as far as we know, we propose a 3D-3D perspective contrastive learning model first to fully utilize 3D information in the context of molecular property prediction, as shown in Table 1.

Datasets
Pre-training Dataset For pre-training of 3DGCL. We use only 1,128 ESOL datasets (Delaney, 2004). We add 3D coordinates to pre-train datasets Table 1. Comparison of views in various contrastive learning models.

Downstream Datasets
We use four regression and two classification datasets for downstream tasks. The datasets are described in Table 2: • ESOL: Water solubility data (log solubility in mols per litre) for common organic small molecules.  Table 3. Test performance of 3DGCL and other methods based on four regression (ESOL, Freesolv, QM7, and QM8) and two classification benchmarks (BBBP and BACE). We mark the best results in bold and the second best results in underlined. We split the dataset into 8:1:1 (train:validation:test) using scaffold splitting considering chirality. • QM8 Ramakrishnan et al. (2015): Electronic spectra and excited state energy of small molecules calculated by multiple quantum mechanic methods. • BBBP: Binary labels of blood-brain barrier penetration. • BACE: Quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1(BACE-1).

Metric RMSE (Lower is better) ↓ MAE (Lower is better) ↓ ROC-AUC (Higher is better) ↑
We select the six molecule datasets from MoleculeNet (Wu et al., 2018). The first two, ESOL and Freesolv, are physical chemistry datasets, and the next two, QM7 and QM8, are quantum mechanic datasets, then BACE and BBBP are biophysics and physiology, respectively. We also add 3D locational information to downstream datasets in the same way (Fang et al., 2022) as the pre-training dataset.

Pre-training Setting
During the pre-training of the model, we use an Adam optimizer with an initial learning rate of 0.001 and exponentially reduce the learning rate at a ratio of 0.95 or 0.99. Pre-training process is run with 300 epochs and 400 batch sizes.

Fine-tuning Setting
After pre-training, we fine-tuned the model on six benchmark datasets. We train the model with an Adam optimizer of a learning rate of 0.001 and exponentially decrease the learning rate at rates of 0.95 or 0.99 in the same way as pre-training. We set 32 batch sizes and ran 200 epochs for all datasets.
We split the data set into train/validation/test sets at a ratio of 80/10/10 using the scaffold splitter (Bemis and Murcko, 1996) from DeepChem (Ramsundar et al., 2019) for downstream tasks like previous works (Hu et al., 2019;Rong et al., 2020). Each set is structurally different as the scaffold splitter splits molecular data by their substructure. Then, this splitting method is widely used for molecular-related tasks because it can evaluate the generalization capability of algorithms well. We provide further details of hyperparameter settings in supplemental materials.

Implementation Details
We implement our 3DGCL with PyTorch (Paszke et al., 2019) and PyTorch Geometric frameworks (Fey and Lenssen, 2019) and write our implementation code based on (Liu et al., 2021a). We use RDKit (Landrum et al., 2020), a cheminformatics software for molecular-related tasks. All the experiments are run on a single NVIDIA RTX 3090 GPU.
There are two processes regarding the running time of tasks about conformers: creating a conformer pool from the original ESOL dataset and contrastive learning after selecting two conformers. In our study, each process took 3 and 6 minutes, totaling approximately 9 minutes, and the number of model parameters is about 0.5 million. This indicates that our work requires significantly low resources.

Evaluation
To evaluate our fine-tuned model, we measure the RMSE (Root Mean Squared Error) on ESOL, Freesolv, and the MAE (Mean Absolute Error) on QM7 and QM8 datasets. For a fair comparison with the state-of-the-art models, we run the model with random seeds three times and average the performance and standard deviation in the same way in previous works. In the same way Rong et al., 2020;Zhou et al., 2022), we evaluated QM7 for 1 target task and QM8 for the average of 12 target tasks.

Results
We evaluate the 3DGCL performance with standard supervised baselines and self-supervised models. All compared methods use more than one dataset of six benchmark datasets (ESOL, Freesolv, QM7, QM8, BBBP, and BACE) and conduct experiments under the same condition . The same condition denotes scaffold-splitting (with considering chirality) the train/validation/test data to an 8:1:1 ratio and running tests independently three times with three random seeds. Scaffold splitting can be divided into two types according to the consideration of chirality. We show the results of scaffold splitting with considering chirality in Table 3, the results without considering chirality in Supplementary Table S2.
The baselines are as follows: DMPNN (Yang et al., 2019b) proposes an interactive message passing scheme considering the interactions. AttentiveFP (Xiong et al., 2019) is an attention-based graph neural network.
The rest of the seven methods are self-supervised models. N-Gram (Liu et al., 2019) produces a graph representation in short walks by building the node embedding. MolCLR  is a 2D-2D view contrastive learning model based on atom masking, bond deletion, and subgraph removal. GraphMVP (Liu et al., 2021b) proposes 2D-3D view contrastive learning approaches. GROVER (Rong et al., 2020) uses a predictive pretraining strategy of motif-level. GEM Fang et al. (2022) and Uni-Mol design predictive self-supervised learning scheme using 3D molecular information. We reference the performance of the baselines in Uni-Mol. Table 4. Performance improvements of 3DGCL pre-training. We show enhanced performances to verify the pre-training effect of our method. We also indicate the number of atoms of datasets. We present the experimental result with dataset size in pre-training to show the efficiency of our method, as can be seen in Table 3. We mark the best results in bold and underline the second best in Table 3. 3DGCL outperforms all other methods on four of six datasets and shows the second-best performance in QM7 and third-best in BACE. Furthermore, our method achieves overwhelming performance on Freesolv and BBBP by a large margin.
The results present that 3DGCL consistently achieves the best performance and comparable results. We should also note that our method uses overwhelmingly fewer resources than other methods, as shown in Figure 3. We compare 3DGCL and the three state-of-the-art, GROVER, GEM, and Uni-Mol, that show the best overall performance with the pre-train dataset and model parameters. We can see that 3DGCL uses about 10,000 times fewer datasets using only 1k than other state-of-the-art methods using over 10M datasets, as shown in Figure 3(a). We can also confirm that the number of 3DGCL parameters is approximately over 100 times fewer than the parameter size of other models, as shown in Figure  3(b).

Pre-training effects according to the molecular 3D information and size
3D information is molecular coordinates, and we can obtain geometric information such as distance, angle, and torsion. Our encoder utilize distance information in the pre-training process. To verify the contribution of geometric information on molecular representation learning, we compare our model with pre-training and the model without pre-training, i.e., SchNet.
As a result of the experiment, 3DGCL consistently obtains enhanced performance on all the datasets, as shown in Table 4. We also investigate the pre-training effect by molecular size. The size denotes the average number of atoms in each dataset. The result shows that 3DGCL leads to higher self-supervised performance on small molecules than on larger datasets. This confirms the effect of the backbone encoder (SchNet), which was developed to focus on small organic molecules. We demonstrate that 3D spatial information is critical for molecular property prediction, and the encoder plays an important role in self-supervised learning.

Comparison with diverse augmentation methods
We conduct comprehensive studies to compare the proposed 3DGCL method with the existing contrastive learning approaches, which may alter the original molecular properties. In addition, we also compare our conformer pool, which consists of five conformers that can most likely exist in nature, and molecules that are difficult to exist in nature, which is created by adding noise to the 3D position of the original molecule. Finally, we compared selecting the most stable molecules with our method of selecting molecules at random. We visualize the methods used in this study in Figure 4.
Atom Dropping: Atom dropping removes atoms randomly at a constant ratio in the molecule graph G. In previous graph contrastive learning studies, atom dropping has been widely used for graph augmentation (You et al., 2020;Wang et al., 2022). We randomly delete 20% of the atoms in the original molecule.
Attribute Masking: Like atom dropping, we randomly select 20% of atoms in the original molecule, then mask their attributes with a feature not presented in the original molecules (You et al., 2020;Wang et al., 2022). Since we use only the atomic number as the feature of atoms, we replace the atom number with the new masking number n.
Position Noising: We add noise to the original 3D coordinates of the most stable molecule. We generate the noise drawn from a Gaussian distribution ∼ N (0, 1) whose mean and standard deviation are 0 and 1, then multiply the noise by 0.01 to fit the scale with coordinates.
Conformer-S: We compare the method using the two most stable conformers without creating the conformer pool to verify that the proposed way of randomly selecting in the conformer pool is effective for learning rich molecular representations. We name the approach utilizing only the two stable molecules conformer-S and call our method Conformer-R in this experiment.
We conduct extensive experiments to confirm the efficacy of molecular location information based on chemical knowledge. The experiments consist of three concepts. First, we compare atom dropping and atom masking methods, which are challenging to preserve molecular semantics due to severe structural changes. Secondly, we performed pre-training using the noising position method that utilizes 3D information while subtly giving spatial changes without chemical knowledge. Finally, we witness that our random select way is superior to the stable molecule choice method.
As experimental results, attribute masking has the least improved effect in pre-training, and atom dropping shows the second minor effect after attribute masking, as shown in Table 5. These results show that the method of significantly modifying the structure and properties of the original molecule is not suitable for molecular representation learning. Augmentation, position noising, which generates molecules that are difficult to exist (finely changing the 3D locations of atoms at random), showed a better effect than the above two approaches. We also test comparing the conformer pool (Conformer-R) and without the pool (Conformer-S)  to confirm that our method plentifully learns from molecular geometric representation. As a result, our 3DGCL shows the best performance than other methods. Through diverse comparative experiments, we validate that our proposed approach is the most proper for molecular graph contrastive learning in the context of molecular property prediction.

Application to real world
The drug field suffers from annotation scarcity in that wet-lab experiments are required. We design experiments to verify that our model can be applied to the actual drug area, assuming the real world with a data deficiency. We conduct the experiments using ESOL and only 150, 300, and 600 samples as training data without using the 1,128 entire datasets. Then, we compare the model's results with pre-training and the model without pre-training. The pre-training process was conducted using the whole ESOL dataset in the same way as in the above experiments. We evaluate the results based on RMSE and observe that our method consistently outperforms the model trained from scratch in all the reduced datasets, as shown in Figure 5. The results indicate that our approach can be utilized in the drug field encountering annotation insufficiency to obtain significant effects.

Conclusion and Futher Works
In this work, we first proposed a novel 3D-3D view Graph Contrastive Learning (3DGCL) framework to address existing issues in self-supervised learning for molecular property prediction. Despite the prevailing notion that self-supervised learning requires hyper-scale resources, we present the possibility of small-scale self-supervised learning methods. Furthermore, we suggested a contrastive approach using randomization in a conformer pool that can learn 3D information in abundance while maintaining molecular semantics, unlike previous methods that could alter the property of the molecule. Comprehensive experiments show that our method outperforms existing self-supervised methods. We provide insight that it is proper to leverage 3D geometric information and domain-based knowledge in molecular property prediction. We also demonstrate that 3DGCL can be applied to the actual drug field undergoing annotation scarcity. We have shown remarkable performance using models and datasets, even on a small scale.
Despite the strong strengths, our model 3DGCL also has some weaknesses, such as focusing on only distance-based geometric information due to our 3D-based backbone framework. Another weakness is that the model needs to generate 3D information of molecules, which can slow the model training. If sufficient resources are supported, we will extend 3DGCL to further studies with a broader range of downstream tasks, along with hyper-scale model capacity, and more diverse geometric information.