MolMeDB: Molecules on Membranes Database

Abstract Biological membranes act as barriers or reservoirs for many compounds within the human body. As such, they play an important role in pharmacokinetics and pharmacodynamics of drugs and other molecular species. Until now, most membrane/drug interactions have been inferred from simple partitioning between octanol and water phases. However, the observed variability in membrane composition and among compounds themselves stretches beyond such simplification as there are multiple drug–membrane interactions. Numerous experimental and theoretical approaches are used to determine the molecule–membrane interactions with variable accuracy, but there is no open resource for their critical comparison. For this reason, we have built Molecules on Membranes Database (MolMeDB), which gathers data about over 3600 compound–membrane interactions including partitioning, penetration and positioning. The data have been collected from scientific articles published in peer-reviewed journals and complemented by in-house calculations from high-throughput COSMOmic approach to set up a baseline for further comparison. The data in MolMeDB are fully searchable and browsable by means of name, SMILES, membrane, method or dataset and we offer the collected data openly for further reuse and we are open to further additions. MolMeDB can be a powerful tool that could help researchers better understand the role of membranes and to compare individual approaches used for the study of molecule/membrane interactions.


Introduction
Biological membranes consist of complex lipid and protein mixtures that play a crucial role in molecular transport into/out of cells. Apart from passive or active permeation, molecules can also accumulate in the membranes at specific functional positions or they can disrupt the membrane altogether. All those molecule-membrane interactions are important for the actions of individual molecules in the organism and their pharmacokinetics.
And yet, most chemical databases use octanol/water partition coefficient (logP) as the only measure of small molecule interactions with lipid membranes, but the membrane compositions of individual cells and organelles can widely vary as it is being currently unraveled by findings from lipidomics (1). The membrane protein structural databases provide additional information not only about the position and the topology of the membrane proteins but also about the membrane-type localization (e.g. OPM (2), PDBTM (3), MemProtDB (4), TPML (5) or EncoMPASS (6)); however, the data about various molecule/membrane interactions are scattered among different sources. For example, DrugBank (7) covers logP and information about membrane transporters and carriers for many drug molecules, but it does not provide a measure for the assessment of penetration nor does it involve partitioning through individual membranes. Permeability of compounds through skin membranes is either present in EDETOX database (8) or scattered throughout literature, e.g. sources cited in supplemental information of reference (9). Similarly, the recently established PerMM database (10) covers only cellular permeability together with permeability prediction using an implicit membrane model with rigid compounds. Finally, molecular dynamics simulations are often used for predictions of membrane partitioning (11) or permeability even on a large scale (12,13). However, current theoretical predictions of molecule/membrane interactions vary by method as well as in comparison with data from experiments, lacking community benchmark comparison between individual methods.
To fill this gap, we have developed Molecules on Membranes Database (MolMeDB) as an open and up-to-date online manually curated depository of molecule/membrane interactions. MolMeDB contains over 3600 interactions described in the literature or obtained by our COSMOmicbased high-throughput calculations (14). In addition to listing the individual molecule/membrane interactions, we provide a simple tool for comparison of interactions between multiple methods and/or membranes. Using this information, it is possible to analyze the membrane behavior of the selected subsets of molecules. Examples of these analyses are provided as case studies to better illustrate efficient ways to extract useful knowledge from the MolMeDB database.

Data collection
To collect datasets of molecule-membrane interactions, a manual inspection of articles (15)(16)(17)(18)(19)(20)(21)(22)(23) and already existing databases (e.g. PerMM database (10)) with the focus on expressions like 'membrane partition coefficient', 'membrane permeability' or 'permeability coefficient' was performed ( Figure 1). Primarily, we focused on highthroughput experimental setups like black lipid membrane (BLM) (24), parallel artificial membrane permeability assay (17,25), Caco-2 permeability assay (22), liposomal fluorescence assay (20), n-hexane passive dosing (26) and polydimethylsiloxane-based permeabilities (19,27) that provide partition coefficients of compounds on a variety of natural and artificial membranes. Moreover, we have also collected resources from a broad variety of computational methods, e.g. molecular dynamics-based umbrella sampling approach, COnductor like Screening MOdel for Real Solvents (COSMO-RS) theory-based COS-MOmic calculations, or implicit solvent-based Permeability of Molecules across Membranes (PerMM) model. Those methods also differ in the level of approximation or force fields used to predict the compounds properties. This diversity within individual methods provides additional verification, especially in comparison to experimental data.
Neutral conformers of compounds were generated from SMILES with the LigPrep and MacroModel modules (Small-Molecule Drug Discovery Suite 2015-4, Schrödinger, LLC, New York, NY, 2016, https://www. schrodinger.com). Individual conformers of each compound were generated using the OPLS_2005 force field (28) in vacuum. Mixed MCMM/LMC2 conformational searches were performed to enable low-mode conformation searching with Monte Carlo structure selection. A maximum of 10 conformers were selected for further analysis if they were within 5 kcal/mol of the lowest energy conformer and, to reduce the number of similar conformers, had an atom-positional RMSD of at least 2 Å relative to all other selected conformers. Each selected conformer was subjected to a series of DFT/B-P/cc-TZVP vacuum and COSMO optimizations using Turbomole 6.3 (Turbomole V6.3 2011, http://www.turbomole.com) within the cuby4 framework (29). After each optimization step, single-point energy calculations at the DFT/B-P/cc-TZVPD level with a fine grid (30) were performed to obtain COSMO files for each conformer. The structures of COSMO.mic files describing bilayers were then obtained from fitting COSMO files of individual lipids to the bilayer structures obtained from free 200 ns + long molecular simulations from refs for DMPC (11), for DOPC and ceramide NS (31) and for stratum corneum mixture bilayers (32).
For each conformer/lipid sets we calculated free energy profiles using COSMOmic 15 (14) or COSMOmic/COS-MOperm 18 to obtain averaged free energy profiles. From those, information about membrane partitioning, permeability and affinity, central energy barrier and the position of drug at its energetical minima was extracted.

Database architecture
MolMeDB webpage is built with the combination of HTML5/CSS and PHP7 layouts running on Apache server. The database runs on MySQL ( Figure 2). The AJAX search engine allows search over names of compounds, datasets or SMILES. 2D structures are generated from SMILES using CDK Depict (33). 3D structure visualization is provided by LiteMol (34) over MOL files generated via RDkit (RDKit: Cheminformatics and Machine Learning Software. 2018, http://www.rdkit.org), or downloaded from PubChem (35), or DrugBank (7) databases, or uploaded by the user. DrugBank is used also for interconnection links to other databases. Free energy profiles of individual molecule/method/membrane sets, where available, are visualized using Chart.js JavaScript application (https:// chartjs.org). The potential of mean force (PMF) profiles data are stored as equidistant values spaced 1 Å for possible comparison between individual PMF profiles and they are interpolated from the uploaded data by Neville's algorithm of iterated interpolation.

User interface and database usage
To provide a user-friendly interface for the moleculemembrane interaction data, we developed a web version of MolMeDB database, freely accessible at http://molmedb. upol.cz. Interaction data can be accessed via browse or search functions. 'Browse' section ( Figure 3A) allows the user to scroll through the list of available compounds, membranes and methods used to describe the moleculemembrane interactions. 'Search' section ( Figure 3B) enables the user to search for a desired compound by its name or SMILES notation and to look up compounds measured/computed by individual methods and membranes. Dataset search allows the user to browse within a list of publications by title or authors' names. All data can be added into the Comparator tool (described below). MolMeDB web also includes 'Documentation' section ( Figure 3C) explaining the methodology, giving several examples and a tutorial for using the database. Finally, 'Statistics' section ( Figure 3D) keeps track of the number of entries and of interactions in subsets of individual methods or membranes.
The user can use the 'Search' section ( Figure 3A) to list all compounds matching the entry name. As example of 'xanthine' entry, six compounds were listed partially matching the given expression ( Figure 3). Among the listed entries, the user gets a quick overview of the compounds' properties, 2D structure and links to other chemical databases containing information about the molecule (e.g. Protein Data Bank (36), PubChem (35), ChEBI (37), ChEMBL (38), DrugBank (7)). Desired molecules can be added into the Comparator tool (see below). After selecting a compound, in our case 'xanthine', a purine-based molecule that serves as a parent compound for caffeine and its derivatives, the page is divided into the following three sections (Figure 4): 1. General info ( Figure 4A)-provides a description of compound properties like molecular weight along with links to other databases via the molecule's identifiers. The first section also shows a 2D image generated from SMILES via CDK Depict and a 3D structure generated by RDKit or downloaded from PubChem or DrugBank databases visualized with LiteMol. Figure 4B)-displays an interactive table with molecule-membrane interactions such as membrane/water partitioning (logK m ), permeability coefficient (logP erm ), free energy barrier in the membrane center ( G pen ), affinity toward the membrane measured by the energetical minimum ( G wat ) or the position of the interaction minimum for the molecule on membrane (Z min ) available for a combination of membranes and methods with variation where available. Charge of compound (Q) and temperature are specified for each molecule-membrane interaction. The user can then switch among individual methods to compare the measured/calculated properties. Source references are also listed in the Publication field. The desired data can be directly downloaded as a .csv table. 3. Free energy profile graph ( Figure 4C)-demonstrates the course of the free energy profile along the membrane normal with energetical barriers and minima between membrane center (0 nm) and water environment (3.5 nm). For an individual method, the user can switch among available membranes and directly visualize given energetical profiles.

Use cases-Comparator tool
Although interaction data for individual compounds are valuable as such, their comparison allows the use of the data for research within multiple scientific fields. For this purpose, we have embedded the Comparator tool that allows to gather molecule-membrane interaction data for multiple compounds from one or more methods and to compare them in order to visualize patterns within the data or to assess the validity of predictive methods.

Caffeine and its metabolites
Caffeine is a purine-based molecule, which is metabolized in a set of multiple-step reactions into a series of chemically modified compounds. In the first step, caffeine is metabolized to the following metabolites: theobromine (by enzymes CYP1A2, CYP2E1), theophylline (CYP1A2, CYP2E1), 1,3,7-trimethyluric acid (XO), 6amino-5(N-formylmethylamino)-1,3-dimethyluracil (CYP 1A2) and paraxanthine (CYP1A2) ( Figure 5) (39,40). Free energy profiles on DOPC membrane for this set of caffeine derivatives were calculated using COSMOmic 18. The nature of the chemical modification of the parent molecule caused different interactions of the metabolites with the membrane. Caffeine derivatives are metabolized in three distinct types of reactions: demethylation (theobromine, theophylline, paraxanthine), oxidation (6-   amino-5(N-formylmethylamino)-1,3-dimethyluracil, 1,3, 7-trimethyluric acid) and decyclization (6-amino-5(Nformylmethylamino)-1,3-dimethyluracil). Here we show that individual types of caffeine metabolites exhibit distinguishable changes in free energy profiles ( Figure 5). The lowest energetic barrier ( G pen ) is shown for caffeine as the parent compound, followed by all demethylated metabolites with an identical increase of penetration barrier by ∼3.4 kcal/mol. Oxidized and oxidized/decyclized products experienced an even greater increase of penetration barrier compared with the original caffeine molecule by 5.4 and 5.9 kcal/mol, respectively.
Overall, all products of caffeine metabolism show a hindered passage through the membrane core as evidenced by the increase in the penetration barrier energy ( G pen ). On the other hand, the affinity of all molecules toward the membrane ( G wat ) remained almost the same, which is in concord with their very similar logP values. Finally, all metabolites shifted their energetic minima (Z min ) toward the membrane/water interface by 2 Å.

Comparison of methods
The Comparator tool also allows a comparison of multiple methods/membranes with each other over selected compounds. Such type of comparison can be used to evaluate different theoretical approaches (e.g. PerMM, COM18) versus experimental data (e.g. BLM).
In this example, we show a comparison of permeability coefficient datasets obtained from theoretical PerMM and COSMOmic/COSMOperm 18 predictions and experimental BLM method for 101 neutral/unionized compounds on DOPC/generic phosphatidylcholine membrane. The user can reach the whole dataset for an individual Method-/Membrane via the Search tool and add it directly into the Comparator tool. Upon choosing the desired combination of multiple Method/Membrane options and charge of molecules, the data can be plotted in an interactive window ( Figure 6). Hovering a cursor over a data point shows the name of the compound and its plotted property. Figure 6 shows examples of permeability coefficients for Citric acid and Hydrofluoric acid. Linear regression fit was used here to determine the level of correlation between the two methods, obtaining the coefficient of determination (R 2 ) of 0.76 and 0.77 for COM18/BLM and PerMM/BLM respectively, while PerMM data show almost identical slope to the experimental data.

Discussion
MolMeDB is a unique, manually curated database on interactions of compounds with membranes. To date it contains more than 1200 compounds and 3600 molecule-membrane interactions obtained both theoretically and experimentally. MolMeDB stores multiple descriptors of molecule/membrane interactions and provides the tools for searching and browsing these data and their comparison. MolMeDB can prove to be a valuable resource for many research groups to benchmark the key data on molecule-membrane interactions, which are important in the fields of pharmacology, toxicology and molecular simulations. In the future, we plan to add further datasets and implement also the involvement of transporters and carriers. More complex statistical analysis within the selected datasets and further FAIRification of the data is anticipated in following versions of MolMeDB. We believe that MolMeDB can be a useful starting point, which can facilitate future studies devoted to a deeper understanding of biological roles of molecules on membranes and that it will attract the biological membrane community to establish a common ground for sharing open data in this field.

Availability and requirements
MolMeDB database is freely available at http://molmedb. upol.cz. The visualization of 3D structures with the LiteMol molecular viewer requires the browser to have WebGL enabled.