DNA G-quadruplexes for native mass spectrometry in potassium: a database of validated structures in electrospray-compatible conditions

Abstract G-quadruplex DNA structures have become attractive drug targets, and native mass spectrometry can provide detailed characterization of drug binding stoichiometry and affinity, potentially at high throughput. However, the G-quadruplex DNA polymorphism poses problems for interpreting ligand screening assays. In order to establish standardized MS-based screening assays, we studied 28 sequences with documented NMR structures in (usually ∼100 mM) potassium, and report here their circular dichroism (CD), melting temperature (Tm), NMR spectra and electrospray mass spectra in 1 mM KCl/100 mM trimethylammonium acetate. Based on these results, we make a short-list of sequences that adopt the same structure in the MS assay as reported by NMR, and provide recommendations on using them for MS-based assays. We also built an R-based open-source application to build and consult a database, wherein further sequences can be incorporated in the future. The application handles automatically most of the data processing, and allows generating custom figures and reports. The database is included in the g4dbr package (https://github.com/EricLarG4/g4dbr) and can be explored online (https://ericlarg4.github.io/G4_database.html).

1 General overview

Intended and less-intended uses
g4dbr is an R package containing the Shiny app g4db, which is dedicated to the creation, visualization, and reporting of curated circular dichroism (CD), 1H-NMR, UV-melting and native mass spectrometry (MS) data from oligonucleotides. Although specifically developed for G-quadruplex forming sequences deposited in the PDB, g4dbr can be used with any nucleic acid sequence.
Users can employ the app to visualize a database generated by g4db, visualize data pasted into a templated Excel file (provided in the package), or create/edit/complete a g4db database from data supplied in said template.
The long-term goal is to provide tools for the robust deposition of raw experimental data, and processed data derived from them, while allowing for easy and versatile visualisation and reporting.
Raw data pasted in the supplied Excel template can be deposited, and visualized in several ways, which are open to other scientists without the need for proprietary software. The approach is two-fold:

Templated .xlsx file deposition as is
Once pasted into the input template, the data can be deposited as is. It can then be explored natively in Excel or any open-source equivalent. The data is formatted in a non-ambiguous layout, provided it is properly labeled in the header cells.
The template is also amenable to software allowing header cell import/management, such as Origin, in which import scripts can be used.
Of course, the template can be natively imported in the g4db app. The advantages over Excel/Origin for this particular application are numerous in terms of both ease and speed of use (e.g. data filtering, automated figure plotting), and functionalities (e.g. peak labeling, normalization/calculation, selective data export). See the Main features section for more details.
Any data treatment and filtering performed within g4db is not saved into the input .xlsx file. To save this into a new or existing database file, the second approach must be used:

RData file
g4db allows exporting selected datasets into an RData (.Rda) file where the data is consolidated and all calculations have already been performed. This leads to faster figure display and smaller file size, and makes it possible to host very large datasets (whereas Excel is limited in row numbers, which is particularly problematic for mass spectrometry data).
The downside of this approach is that the file cannot be handled outside of R. Note, however, that g4db is not required to open and use the data: it can be loaded natively in base R, which is free and open source. To do so, use the load function, as shown here for the demo database provided in the package: load(system.file("extdata/demo_database.Rda", package = "g4dbr"))

Extended scope
g4dbr includes a number of functionalities that will be described here within the context of their intended use, but that can be utilized outside of this scope, i.e.

Main features
Below is a list of the main features of g4dbr.
• Visualization of CD, UV-melting, 1H-NMR and native MS data
• Coded in R
• Easy-to-export data tables (practical for standalone data treatment)
• Import template easy to read in other software
• Full code and experimental data hosted openly on GitHub

Workflow
For raw data import, the data must be pasted into a templated Excel file, then read in the importR module of g4db.
In this module, the data can be filtered, processed, and selected for writing into a database file (.Rda). The .Rda file can be opened in the database module for visualization and reporting purposes. It can also simply be loaded in base R for further processing or reporting steps that may not be possible in g4db.
Figure S169: Application workflow

2 Installation and setup

Running the app
Only one function must be called to use all functionalities from g4dbr:

g4db()
This function opens a Shiny app in either the currently used IDE (e.g. RStudio), or a web browser.
Other functions used in g4db are packaged in g4dbr, and can be used as standalone tools. Refer to the Other functions and reference files section.

Interface overview
The interface is divided into 3 tabs that can be selected at the top of the screen, and are used to accomplish specific tasks:
• database, to visualize, report, and remove data from a database file,
• importR, to visualize and process raw data, and export all or part of it to a database file,
• meltR, to visualize and treat UV-melting data, and export all or part of it to a database (via importR).
The tabs make use of various sidebars, mainly to perform data importing, filtering, processing, exporting and reporting.

Figures and tables
In the main area of the interface are the figures and tables, within collapsible and closable boxes, letting the user select what data to display.
All tables are sortable and filterable to assist in exploring rich data sets, and find specific data points rapidly. The data is presented in long format, which makes it easier to filter through, and to map variables into figures, because each variable is contained in its own column. Columns can be selectively hidden, and some of the less relevant ones are hidden by default.
Data presented in figures and tables reflects the values given to the different filters. Conversely, filtering the tables does not alter the figures; it is only a means of accessing and/or exporting a subset of the data.
All tables can be exported as .csv, .xlsx, or in the clipboard. All columns will be exported, regardless of their visibility in the app.

Left sidebars and panels
Each tab has a sidebar on the left-hand side, which contains a number of tools for data importing, exporting, filtering, and formatting. This left sidebar is collapsible to release some space for figures and tables on smaller screens. Each tab has a specific and independent left sidebar, and the values set in a left sidebar modify the data for all the content of its tab (and almost always only this tab). Drop-down menus contain select all/deselect all buttons for quick data selection.
Given the amount of menus necessary for the meltR tab, a large portion is hosted in two collapsible and movable "hovering" panels.
The sidebars of the database and importR tabs, and a panel of meltR, also contain a color palette selection menu, and a submenu for certain palettes having variations (Figure S170). The available palettes include:
• The well-known Brewer palettes, which include qualitative, diverging, and sequential palettes,
• Some discrete palettes from D3.js, a JavaScript library for producing interactive data visualizations (imported from the ggsci package),
• Several palettes inspired by the colors used by scientific journals/publishers (NPG, AAAS, NEJM, Lancet, JAMA, JCO, etc.; imported from the ggsci package).
The selected colour palette is applied to all the figures of the tab, but does not affect other tabs.

Consulting a database: the database tab
The database tab is dedicated to visualizing, exporting, and reporting on the data of a curated database file.

Database input
The data from a given database must be gathered in a single .Rda file generated in the importR tab. It contains five dataframes: one dedicated to the general oligonucleotide information (db.info), and the four other ones to each analytical technique (db.CD, db.NMR, db.MS, db.UV).
g4db automatically extracts all the data, but the file can also be loaded in the global environment (i.e. without using g4db) using load(). For instance, to load the demo database, run: load(system.file("extdata/demo_database.Rda", package = "g4dbr"))
The global environment should now contain five dataframes that can be opened and worked with. When using g4db(), the data will be loaded in the package environment and will therefore not appear in the global environment.
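As a minimal sketch of working with a database file outside g4db, the snippet below loads the five dataframes into a dedicated environment and subsets one of them. To stay self-contained it first writes a small stand-in .Rda file: only the dataframe names (db.info, db.CD, db.NMR, db.MS, db.UV) come from the text, while the column names and values are hypothetical.

```r
# Build a stand-in database file (hypothetical content; a real file
# would be generated in the importR tab).
db.info <- data.frame(oligo = c("oligoA", "oligoB"),
                      sequence = c("GGGTTAGGGTTAGGGTTAGGG", "TGGGGT"))
db.CD  <- data.frame(oligo = "oligoA", wl = c(260, 290), CD = c(1.2, 3.4))
db.NMR <- data.frame(oligo = "oligoA", shift = 11.2, int = 0.8)
db.MS  <- data.frame(oligo = "oligoA", mz = 1500.5, int = 0.5)
db.UV  <- data.frame(oligo = "oligoA", T.K = 300, abs = 0.42)
db_file <- tempfile(fileext = ".Rda")
save(db.info, db.CD, db.NMR, db.MS, db.UV, file = db_file)

# Load into a dedicated environment rather than the global one,
# so that existing objects with the same names are not overwritten.
db <- new.env()
loaded <- load(db_file, envir = db)

sort(loaded)       # names of the restored objects
str(db$db.info)    # general oligonucleotide information
cd_oligoA <- subset(db$db.CD, oligo == "oligoA")
```

Loading into a separate environment is a design choice of this sketch, not a requirement: calling load() without envir places the dataframes directly in the calling environment, as described above.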

Data loading
Upon opening the database tab, the interface should be devoid of data. The first step is to import a database file:
1. Click on Browse in the Load section of the left sidebar (Figure S171),
2. Select a .Rda file that has been prepared in importR.
Figure S171: Empty database view
The General information and oligonucleotide selection table should now be populated by a list of the oligonucleotides for which the database file contains at least information data (Figure S172-1).
The content of this table is controlled by a drop-down menu in the left sidebar, and by the oligonucleotide column filter (in that order) (Figure S172-2). By default, all oligonucleotides are shown, but none are selected for analytical result display (to avoid wait times when the table content is changed).
Figure S172: Demo database loaded in the database tab: the general oligonucleotide information should be displayed (1). The visible oligonucleotides can be filtered in the table or from the dropdown menu in the left sidebar (2). The table (1), and other tables in g4db, can be exported (a), their column visibility changed (b), and their content sorted, filtered or searched through (c)

Data display
To start visualizing data, the oligonucleotide(s) of interest must be selected from the General information table, by clicking on one or several rows (Figure S173-1). Clicking again on a row deselects it.
The CD, NMR and UV-melting data should now be displayed (Figures S173-2 and S174-1). By default, the data acquired for all buffer conditions (i.e. all cation + electrolyte combinations) are shown, but the display can be restricted to certain buffers, electrolytes or cations, using the menus from the left sidebar (Figure S173-3). Individual cation and electrolyte selections supersede the buffer selection. For instance, if the buffers "TMAA + KCl" and "Kp + KCl" are selected, but the "Kp" electrolyte is excluded, then only "TMAA + KCl" will effectively be selected.
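The precedence rule in the example above can be sketched as a logical AND of three independent filters. This is an illustration only: the column names and the way g4db combines the filters internally are assumptions.

```r
# Hypothetical long-format data: each buffer is an electrolyte + cation pair.
cd <- data.frame(
  buffer      = c("TMAA + KCl", "Kp + KCl", "TMAA + NaCl"),
  electrolyte = c("TMAA", "Kp", "TMAA"),
  cation      = c("KCl", "KCl", "NaCl"),
  stringsAsFactors = FALSE
)

selected_buffers      <- c("TMAA + KCl", "Kp + KCl")
selected_electrolytes <- c("TMAA")            # "Kp" is excluded
selected_cations      <- c("KCl", "NaCl")

# A row is kept only if it passes all three filters, so excluding an
# electrolyte removes its buffers even when those buffers are selected.
keep <- cd$buffer %in% selected_buffers &
  cd$electrolyte %in% selected_electrolytes &
  cd$cation %in% selected_cations
effective <- cd$buffer[keep]
```

Here effective contains only "TMAA + KCl", matching the behaviour described in the text.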
Note that the buffers, electrolytes and cations are not a static list, but are automatically collected from the CD and UV-melting data. It is therefore important to keep their naming consistent across the entire database.
Figure S173: Database data display: both oligonucleotides have been selected (1). Their data is displayed (2) but was buffer-filtered (3): only KCl-containing solutions are selected (a). Using the right sidebar, the CD data was panelled by oligonucleotide (b)
The UV-melting data is displayed in two separate figures (S174-1): on the left is shown the raw data with the fit line, and on the right is the processed data. Depending on whether the data was processed by non-linear fitting or not, the processed data will either be the folded fraction or the absorbance normalized in [0,1]. This allows plotting the data of highly stable or unstable species on the same figure as those for which the Tm could be determined.
To also display MS data (S174-2), the Plot MS button must be used (S174-3). This avoids long refresh times when selecting oligonucleotides. Any change in the oligonucleotide, buffer, tune, replicate, and m/z selections will only be effective if the figure is replotted.
Figure S174: Database data display: UV-melting (1) and native MS (2) display. To display the MS data, the Plot MS button must be used (3)
For all these analytical methods, all data points are gathered in tables, collapsed by default. These data points can be sorted, filtered, and exported in .xlsx or .csv files, or copied to the clipboard (Figure S172). Again, filtering data in the tables does not affect the figures; only the left and right sidebars do.

General information
This table gathers all the general information on the deposited oligonucleotides. By default, the following variables are displayed:
• Oligonucleotide name, preferably a PDB code where available
• DOI, with a hyperlink that is automatically generated upon importing with importR
• Submitted by, the initials or full name of the data submission author
• Deposition date, which is generated automatically by g4db
• Sequence, the 5' to 3' oligonucleotide sequence
• Length, the number of nucleotides, generated automatically by g4db
• Average mass and Monoisotopic mass of the oligonucleotide, generated automatically by g4db, and used for the native MS peak labelling
• Extinction coefficient (260 nm), the molar extinction coefficient of the oligonucleotide (in M⁻¹ cm⁻¹), calculated automatically by g4db (via the epsilon.calculator)
• Topology, a short user-supplied description of the oligonucleotide structure (e.g. parallel quadruplex)
The fields hidden by default (nucleotide and atom numbers) are not of direct interest to the general user, but can be displayed using the column visibility button.
Importantly, this table is used to select the oligonucleotide for which the analytical data should be displayed, as shown in Figure S173. It is possible to quickly filter through entries by e.g. topology or length, to select all oligonucleotides falling in a given category.

Circular dichroism
The data is shown as points and lines, colored by buffer. The oligonucleotides are differentiated by point shape.
The right sidebar contains the following settings:
• normalized switch: choose to display molar ellipticities (as automatically calculated in importR; default) or raw data (i.e. in mdeg).
• superimposition dropdown menu: choose to display all data superimposed (default), grouped in panels by oligonucleotide or buffer, or not superimposed at all.
The data is gathered in the CD data table below, which can be sorted, filtered, and exported. The fields displayed by default are Oligonucleotide, Buffer, Wavelength (nm), CD (mdeg), and Delta epsilon (M-1cm-1). The other fields hidden by default can be displayed using the column visibility button.
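The conversion between the two displayed quantities can be sketched with the standard relation between ellipticity and molar circular dichroism, Δε = θ / (32980 × C × l), with θ in mdeg, C in mol/L and l in cm. Whether importR uses exactly this expression is an assumption; the function name and the numbers below are hypothetical.

```r
# Convert raw ellipticity (mdeg) to molar circular dichroism (M^-1 cm^-1).
# conc_uM: oligonucleotide concentration in micromolar (as in the template
# header); path_cm: cuvette path length in cm.
cd_to_delta_epsilon <- function(theta_mdeg, conc_uM, path_cm) {
  conc_M <- conc_uM * 1e-6
  theta_mdeg / (32980 * conc_M * path_cm)
}

# Hypothetical measurement: 10 micromolar strand in a 1 cm cuvette.
theta <- c(-3.298, 0, 6.596)
delta_eps <- cd_to_delta_epsilon(theta, conc_uM = 10, path_cm = 1)
```

Normalizing by concentration and path length is what makes spectra recorded under different conditions directly comparable on a common y-axis.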

1H NMR
The data is shown as a line, colored by oligonucleotide, and is normalized so that all spectra will share the same y-axis range. By default, each spectrum is shown in its own panel. Peak numbers are shown above their peaks and linked by a segment.
The right sidebar contains some settings identical to the CD one (superimposition, scale, line size). In addition, it contains a chemical shift (ppm) slider to select the chemical shift range to display (default: 9.5-12.5 ppm).
The data is gathered in the NMR data table below, which can be sorted, filtered, and exported. The fields displayed by default are Oligonucleotide, Buffer, Chemical shift (ppm), and Intensity. The other fields hidden by default can be displayed using the column visibility button.

UV-melting
UV-melting data is plotted with points, and in the case of the raw data with an additional fit line.
The right sidebar contains some settings identical to some described above (point size, line size, line transparency). In addition, it contains a Temperature (K) slider to select the temperature range to display (default: 278-368 K).
The data is gathered in the UV-melting data table below, which can be sorted, filtered, and exported. The fields displayed by default are Oligonucleotide, Buffer, ramp, T (K), Folded fraction, and Absorbance. The other fields hidden by default can be displayed using the column visibility button.

Native mass spectrometry
There are two distinct plots to visualize MS data, i.e. one full scale and one charge-state focused, to better see the potassium adduct distribution.
In both cases, the data is shown as a line, with labels to name the visible species (Figure S175). By default, spectra are paneled by oligonucleotide (columns) and buffer (rows), which should typically lead to a single spectrum per panel. Peak labels appear above their corresponding peak. The focused plot displays the 5-charge state by default, but this can be changed by the user.
Besides a line size slider, the right sidebar of the full-scale plot contains:
• m/z slider: select the m/z range to display (default: 800-2500 m/z).
• Tunes dropdown menu: select the tunes to display.
The charge-state focused plot sidebar only contains a charge selection menu.
The data is gathered in the native ESI-MS data table above, which can be sorted, filtered, and exported. The fields displayed by default are Oligonucleotide, Buffer, Tune, Replicate, m/z, Normalized intensity, and Intensity. The other fields hidden by default can be displayed using the column visibility button.
The table may take some time to load given the large number of data points.

Report generation
Reports can be generated from the displayed data, either full (with traceability features, titles, ...), or SI (with minimal information to avoid redundancy when reports are collated into a supporting information document), in Word, pdf, and HTML formats, in a few simple steps:

Word formatting
The Word format uses a template file to define its appearance (i.e. the styles). This template file can be changed by the user to generate reports directly with the desired appearance, to avoid additional work outside of g4db.
The template is located in the rmarkdown folder of the g4dbr package. To locate the template, run: system.file("rmarkdown/word-styles-reference.docx", package = "g4dbr")
Then, modify the styles as desired. Local text modifications will not be taken into account.
It is also advised to back up this file in another location, because any new install or update will overwrite it.

Data deletion
It is possible to selectively remove data from the database, by oligonucleotide and analytical method, using the database.eraser function implemented within g4db.
Several oligonucleotides can be processed at once, if the same analytical methods to remove are selected. If all analytical methods are selected, the selected oligonucleotide entries will be entirely purged (including the general information).
In many cases, it is not good practice to ever delete data from a database. If the use of g4db falls within these cases, do not use the data deletion tool, as it permanently deletes data. Here, the data deletion tool was mostly provided as a means to correct and update data cleanly, as the new data might not be written to the database if a duplicate record already exists. It is also a way to generate lighter sub-databases for specific uses, by discarding all irrelevant entries.
By default, a new file will be generated, named Modified database-YYYY-MM-DD.Rda, where YYYY-MM-DD is the date of the day, so as to avoid accidental file overwriting.
To delete one or several entries:
1. Select the oligonucleotide(s) to delete from the dropdown menu in the left sidebar (not from the general info table),
2. Select the methods for which the data must be removed, by flipping the switches on,
3. Click on Erase to a db file,
4. Save the file (with a different name than the one in use),
5. Optional: load the new database file for verification and further use.
For more details on the database.eraser function, refer to the Other functions and reference files section.

Templated-Excel file
Before importing data into a database file using g4db, it is necessary to paste this data into a provided Excel template file. Once filled, this file doubles as a data repository that can be explored in other pieces of software. Note, however, that such files can become quite heavy (in particular with MS data), leading to very slow loading and saving times, and high memory use.
The Excel file is divided into seven tabs that contain raw data (UV, CD, NMR, MS), general oligonucleotide information (info), or peak labeling data (NMR and MS labels). It is essential to maintain consistency throughout the file to ensure that the data and labels are read and associated correctly: oligonucleotide, electrolyte, cations, tunes and replicate must be named identically across columns and tabs. If the data is to be appended to an existing database, the naming scheme must be extended to the new data. In particular, attention should be paid to capitalization (e.g. 'TMAA' vs 'tmaa' vs 'Tmaa') and typical name variants (e.g. 'Kp' vs. 'Kpi').
The template is installed with the package. Its location can be obtained by running: system.file("extdata/demo_input.xlsx", package = "g4dbr")
After adding data, do not save the file in this folder, as it would be overwritten by a package update, and deleted upon package removal.

Info
The first tab gathers essential data on the entries to submit (Figure S176). Five fields must be filled, i.e.:
• oligo, the name of the oligonucleotide, preferably a PDB code where available,
• sequence, in the 5' to 3' direction, without spaces or dashes,
• submitted_by, the initials or full name of the data submission author,
• DOI, the DOI of the paper linked to the PDB deposition. Paste the DOI only, and not a full link, which will be automatically generated by importR,
• Topology, a short user-supplied description of the oligonucleotide structure (e.g. parallel quadruplex).
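To illustrate how some of the automatically generated fields can follow from the user-supplied ones, here is a minimal sketch deriving the sequence length and a DOI hyperlink. The data, the DOIs (placeholders) and the sanity check are hypothetical, and the code is not taken from importR.

```r
info <- data.frame(
  oligo    = c("oligoA", "oligoB"),
  sequence = c("GGGTTAGGGTTAGGGTTAGGG", "TGGGGT"),
  DOI      = c("10.1000/example.1", "10.1000/example.2"),  # placeholder DOIs
  stringsAsFactors = FALSE
)

# Length: number of nucleotides in the 5' to 3' sequence.
info$length <- nchar(info$sequence)

# Hyperlink built from the bare DOI; pasting a full link in the template
# would therefore produce a malformed URL.
info$doi_link <- paste0("https://doi.org/", info$DOI)

# Simple sanity check (an addition of this sketch): flag sequences
# containing spaces, dashes, or characters other than A, C, G, T, U.
info$valid <- !grepl("[^ACGTU]", info$sequence)
```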

Figure S176: Info template
All the other fields that can be seen in the corresponding tables in g4db are calculated automatically.

CD
The CD data must be pasted in two columns, below the header, with the wavelength in the first column and the ellipticity in mdeg in the second column ( Figure S177).
The oligonucleotide, buffer and cation names, the cuvette path length in cm, and the oligonucleotide concentration (in µM) must be supplied in the header rows.
For every new data set (new oligonucleotide/buffer/cation combination), the next two columns must be used, and so forth. Even if the wavelength axis is the same, it must be specified again; this allows dealing with mismatched axes (see the right-hand side columns in Figure S177).
Figure S177: CD template. Four spectra are shown. Note that one of the x-axes is mismatched
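Because each spectrum carries its own x-axis, the column pairs can be stacked into a single long-format table without any alignment step. A minimal sketch, assuming the template block has already been read into a list (the list structure and names are hypothetical):

```r
# Hypothetical template content: two spectra, each with its own wavelength
# column; the second has a different (mismatched) wavelength axis.
spectra <- list(
  list(oligo = "oligoA", buffer = "TMAA", cation = "KCl",
       wl = c(220, 221, 222), cd = c(1.0, 1.5, 2.0)),
  list(oligo = "oligoB", buffer = "TMAA", cation = "KCl",
       wl = c(220, 222), cd = c(-0.5, 0.3))
)

# One long data frame: one row per data point, one column per variable.
# Header values are recycled along each spectrum's data points.
cd_long <- do.call(rbind, lapply(spectra, function(s) {
  data.frame(oligo = s$oligo, buffer = s$buffer, cation = s$cation,
             wl = s$wl, cd = s$cd, stringsAsFactors = FALSE)
}))
```

In this layout a mismatched axis is harmless, since no two spectra ever need to share rows.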

UV-melting
The UV-melting tab is the only one where three columns must be filled for each oligonucleotide/buffer/cation combination:
• Temperature, the solution temperature, in °C or K (importR determines which automatically),
• Absorbance, the absorbance of the solution, with or without blank subtraction (blank subtraction can be performed in importR),
• Blank, the absorbance of the reference blank solution to subtract, where necessary.
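How the temperature unit is detected is not specified here. One plausible heuristic, shown purely as an assumption, relies on the fact that aqueous melting data in °C stay below about 100 whereas the same data in K lie above 273.15. The threshold of 150 and both function names are hypothetical.

```r
# Heuristic unit detection (assumption, not necessarily what importR does):
# values whose maximum is below 150 are treated as degrees Celsius.
to_kelvin <- function(temp) {
  if (max(temp, na.rm = TRUE) < 150) temp + 273.15 else temp
}

# Blank subtraction, skipped when no blank column is supplied.
subtract_blank <- function(abs, blank = NULL) {
  if (is.null(blank) || all(is.na(blank))) abs else abs - blank
}

t_c <- c(5, 50, 95)
t_k <- c(278.15, 323.15, 368.15)
same_scale <- isTRUE(all.equal(to_kelvin(t_c), to_kelvin(t_k)))
corrected  <- subtract_blank(c(0.50, 0.60), blank = c(0.10, 0.10))
```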
Besides the oligonucleotide, buffer and cation names, the header contains a replicate field, to increment when several experiments for the same oligonucleotide/buffer/cation combination are submitted.
Figure S178: UV-melting template. Case where the data is already blank-subtracted
The melting data must be pasted as is, in particular if both cooling and heating ramps are recorded successively. meltR uses the changes in temperature (increase or decrease) between successive rows to assess whether it is dealing with a heating or a cooling ramp, and eventually separates both for further processing.
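The ramp assignment described above can be sketched from the sign of the temperature change between successive rows. This is an illustration under simplifying assumptions (no plateaus, no noise in the temperature readings), not the meltR code.

```r
# Temperatures recorded successively: heating, then cooling (hypothetical).
temp <- c(278, 288, 298, 308, 298, 288, 278)

# Sign of the change between successive rows: +1 heating, -1 cooling.
step <- sign(diff(temp))
# The first row has no predecessor; give it the direction of the first step.
direction <- c(step[1], step)
ramp <- ifelse(direction > 0, "heating", "cooling")

# Number each ramp so that successive ramps can be processed separately.
ramp_id <- cumsum(c(1, diff(direction) != 0))
```

Note that the turning point (308 K here) is assigned to the ramp that ends there, and that rows with identical successive temperatures would need extra handling in real data.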

NMR
The 1H-NMR template follows the same principle as the CD one: two columns (per oligonucleotide/buffer/cation combination) for the chemical shift and intensity, and three header rows for the oligonucleotide, buffer and cation names (Figure S179).

NMR labels
This tab is used to submit 1H NMR peak labelling information (Figure S180). The header structure is the same as in the NMR data tab. The first column must be filled with peak numbers, in any order, with the corresponding chemical shifts in the second column. The labels are handled as text, and therefore several numbers can be submitted for a single chemical shift value.
As a side note, it is possible to keep cells empty if a given peak number is in the list but there is no corresponding peak in the spectrum. This is practical when several spectra are being labelled and a common peak number list is used. Note that the peak list must be repeated for all spectra, even if they are identical.
Figure S180: NMR labels template. Note that both oligonucleotides have completely different labellings
Make sure to mirror the header from the NMR data tab, so that all spectra are labelled.

MS
The MS template shares the same structure as NMR and CD, with m/z and the intensity as columns one and two (Figure S181). The intensity can be supplied normalized or not; it will eventually be normalized in importR. Two additional header rows must be filled:
• tune, a short name identifying the MS parameters. The name must be linked to said parameters alongside the database file (e.g. publication, readme file).
• replicate, a number to increment when several experiments for the same oligonucleotide/buffer/cation/tune combination are submitted.

Figure S181: MS template
It is advised to be relatively conservative with data-heavy spectra to cut on processing time in importR, e.g. irrelevant m/z ranges can be discarded. In case of doubt, everything can be kept at this stage and filtered later on in importR.

MS labels
This tab is aimed at providing the database with the nature of the species to label in the MS spectrum, not their m/z. It therefore differs from the NMR labels tab, where one must supply the chemical shift of each label.
The first column contains the charge state numbers, to label different charge states independently (Figure S182). The second column contains the name of the species to be labelled, which must be supplied using the following syntax: M for the non-adducted oligonucleotide, MK for a single-potassium-adduct species, MK2 for a two-potassium-adduct species, and so forth (up to ten).
Figure S182: MS labels template. Note the difference in labelling between oligonucleotides and buffer.
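Since only the species names are supplied, the label positions have to be computed from the oligonucleotide mass. A minimal sketch of such a calculation is given below, assuming negative ion mode and that each K+ adduct replaces one proton; the function name and the 7000 Da mass are hypothetical, and g4db may proceed differently.

```r
# Monoisotopic masses (Da).
m_H      <- 1.007825    # hydrogen atom
m_K      <- 38.963707   # potassium-39 atom
m_proton <- 1.007276

# m/z of the species M, MK, MK2, ... at charge state z (negative mode):
# each K+ adduct replaces one H, and z protons are removed.
adduct_mz <- function(mono_mass, z, n_K = 0) {
  (mono_mass + n_K * (m_K - m_H) - z * m_proton) / z
}

# Hypothetical oligonucleotide of monoisotopic mass 7000 Da, charge state 5.
labels <- c("M", "MK", "MK2")
mz <- adduct_mz(7000, z = 5, n_K = 0:2)
names(mz) <- labels
```

Successive potassium adducts are spaced by (m_K - m_H)/z, i.e. about 7.59 m/z units at this charge state, which is why a charge-state focused plot helps to read the adduct distribution.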
Make sure to mirror the header from the MS data tab, so that all spectra are labelled.

Populating a database
Once the template file is ready, the data can be loaded in g4db, processed, filtered, and written into a new or existing database file. All of these steps can be performed in the importR tab, except for the UV-melting data treatment that is carried out in meltR (see the Importing UV-melting data: the meltR tab section).
Essentially, importR works just like database. The main window hosts the same data tables and figures as database (except UV-melting figures, which are in meltR, and the charge-focused MS plot), with the same functioning (data filtering, figure customization). In the same vein, the left sidebar also contains the filters and color palette selection menus. All these common features are described in the Interface overview and Consulting a database: the database tab sections, and will not be discussed below.
The key aspect of importR is that it is a selective database writing tool. In that context:
• What you see is what you write to the database. Any data point filtered out (whether by oligonucleotide, buffer composition, or x-axis range) will not be written in the database file.
• Duplicated data points (same technique, oligonucleotide, buffer composition, x-axis position, ...) are discarded. For instance, resubmitting data with a wider x-axis range will have the effect of completing the database (without doubling the already existing points), but resubmitting corrected data on the same range might not replace the initial data. It is therefore better to first remove the erroneous entry (see the Data deletion section).
• Individual oligonucleotides and analytical methods can be included or excluded from the database writing.
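The consequence of discarding duplicates can be illustrated with a small sketch in which existing rows take precedence over resubmitted ones. The key columns and the data are hypothetical, and the actual duplicate detection in importR may rely on different fields.

```r
key <- c("oligo", "buffer", "wl")   # hypothetical identifying columns

existing <- data.frame(oligo = "oligoA", buffer = "TMAA + KCl",
                       wl = c(220, 230, 240), cd = c(1, 2, 3),
                       stringsAsFactors = FALSE)
# Resubmission: wider range (250 is new) and a "corrected" value at 240.
incoming <- data.frame(oligo = "oligoA", buffer = "TMAA + KCl",
                       wl = c(240, 250), cd = c(99, 4),
                       stringsAsFactors = FALSE)

combined <- rbind(existing, incoming)
# duplicated() flags later occurrences, so existing rows win over incoming.
updated <- combined[!duplicated(combined[, key]), ]
```

The new point at 250 nm is added, but the corrected value at 240 nm is silently dropped, which is why erroneous entries should be deleted before resubmission.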

Template file input
The data is imported by selecting a file via the Browse... button in the left sidebar.

Data filtering and processing
Oligonucleotides are selected from the General information table. Further buffer composition filtering can be performed in the left sidebar.
The CD and NMR calculations (e.g. normalization, labeling) and plotting are automatically performed, without any user input. The MS data is processed and plotted when the plot MS button is clicked. Note that if the MS data is not plotted, it cannot be exported to a database.
Method-dependent filtering is performed in the corresponding right sidebars, as described for the database tab.

Importing UV-melting data: the meltR tab
The processing of UV-melting data is performed in meltR, a distinct tab from importR, to avoid overcrowding the interface and allow its use outside of the database frame.
The data is sourced from the template file loaded in importR, and once the data is processed in meltR it can be sent back to importR to be included in the database. Note that the filtering of temperature range and buffer composition must be performed directly in meltR.
The use of meltR itself is described below.

Writing a database file
Once the data has been selected and properly filtered (including or not UV-melting data from meltR), it can be written into a database file in three simple steps:
1. Select a database file, either an existing one (to add new entries) or an empty one (to create a new database). This file can be opened in the Export section of the left sidebar of importR, or from the database tab. Either way, the data can be consulted in the database tab. An empty file is available in the package, and can be found by running: system.file("extdata/empty_database.Rda", package = "g4dbr")
2. Select the methods to write to the database file, using the switches. The MS and UV-melting data must be generated to be exported.
3. Click on Write to db file. By default, the file will be named following the Database-YYYY-MM-DD.Rda template. Rename where necessary. If the database in use was generated on the same day as the deletion operation, there is a risk of it being overwritten: make sure to give the new file a different name.
4. Optional: load the new/updated database to verify that the import worked correctly.

Automated processing of UV-melting data: the meltR tab
3.5.1 Principle
3.5.1.1 Purpose
meltR is an automated UV-melting data processing software. It determines the melting temperatures (Tm), ∆G⁰, ∆H⁰ and ∆S⁰ by non-linear fitting, and converts the absorbances into folded fractions.
Folded fractions are a good way to assess to what extent an oligonucleotide is structured (1: all molecules folded, 0: all molecules unfolded), visually observe the Tm (folded fraction = 0.5), and normalize the data of different samples (and therefore different absorbances) to a common y-scale.³ For the non-linear fitting and the folded fraction calculation to work, the data must contain both a lower and a higher baseline.³ In other words, the oligonucleotide must not be too stable or too unstable. In such cases, meltR allows normalizing the data to [0;1] to at least bring all data to a common y-scale.

Data modeling: General model
In a melting experiment, changes in the solution temperature lead to changes in the amount of folded (decreases with increasing temperatures) and unfolded species (increases with increasing temperatures). The model relies on the expression of the measured absorbance A T as the sum of the absorbances from the folded (F) and unfolded (U) forms, weighted by their abundance expressed from the folded fraction θ T .
Herein, the absorbances measured at 295 nm were converted to molar extinction coe icient (in M -1 cm -1 ) using ε = A/lC, where l is a path length (in cm) and C the oligonucleotide concentration (in M).
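The folded fraction is defined by:

$$\theta_T = \frac{[F]}{[F] + [U]}$$

Assuming a simple two-state model F ⇔ U with an equilibrium constant K = [U]/[F], $\theta_T$ can be expressed as:

$$\theta_T = \frac{1}{1 + K}$$

This leads to:

$$A_T = \frac{1}{1 + K} A_T^F + \frac{K}{1 + K} A_T^U$$

$A_T^F$ and $A_T^U$ can be modeled as linear functions of the temperature (the baselines), where a is the slope and b the intercept of these baselines:

$$A_T^F = a_F T + b_F \qquad A_T^U = a_U T + b_U$$

K can be expressed by thermodynamic quantities of interest: ΔG⁰, ΔH⁰ and ΔS⁰.

$$\Delta G^0 = -RT \ln K = \Delta H^0 - T \Delta S^0$$

Note that in meltR, potential changes in heat capacity over the evaluated temperature range are not taken into account, to avoid over-parameterization. At the melting temperature ($\theta_T = 0.5$, K = 1):

$$\Delta G^0_{T_m} = \Delta H^0 - T_m \Delta S^0 = 0$$

Which leads to:

$$\Delta S^0 = \frac{\Delta H^0}{T_m}$$

And therefore:

$$\Delta G^0 = \Delta H^0 \left(1 - \frac{T}{T_m}\right)$$

Finally, K can be expressed as $\exp\left(-\frac{\Delta H^0 (1 - T/T_m)}{RT}\right)$, yielding:

$$A_T = \frac{(a_F T + b_F) + (a_U T + b_U) \exp\left(-\frac{\Delta H^0 (1 - T/T_m)}{RT}\right)}{1 + \exp\left(-\frac{\Delta H^0 (1 - T/T_m)}{RT}\right)}$$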
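The relationships between ΔH⁰, Tm, ΔS⁰, ΔG⁰, K and the folded fraction can be checked numerically. The sketch below is an illustration and is not taken from the meltR code: it assumes that K is written for the unfolding direction (K = [U]/[F]), and the function name `two_state` and the ΔH⁰ and Tm values are arbitrary.

```r
R_GAS <- 8.314462618  # gas constant, J mol^-1 K^-1

# Two-state quantities from dH (J/mol) and Tm (K), neglecting heat capacity changes.
# Assumption: K = [U]/[F], so that theta (folded fraction) = 1 / (1 + K).
two_state <- function(temp_K, dH, Tm) {
  dS <- dH / Tm                       # from dG(Tm) = 0
  dG <- dH * (1 - temp_K / Tm)        # dG = dH - T * dS
  K  <- exp(-dG / (R_GAS * temp_K))   # dG = -RT ln K
  data.frame(temp_K = temp_K, dG = dG, K = K, theta = 1 / (1 + K), dS = dS)
}

# Arbitrary example: dH = 200 kJ/mol, Tm = 330 K
res <- two_state(c(300, 330, 360), dH = 2e5, Tm = 330)
print(res)
```

At T = Tm the folded fraction is exactly 0.5, and it tends towards 1 and 0 well below and well above the Tm, respectively.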

Data modeling: Implementation and derived values
In meltR, the absorbance is converted to molar extinction coefficients before fitting with the model described above, written with the following variables and parameters: epsilon is the molar extinction coefficient, T is the temperature (in Kelvin), P1 is ΔH⁰, P2 is the Tm, and P3/P5 and P4/P6 are respectively the origins and slopes of the baselines. The optimized parameters are summarized in the meltR tab, and can be later consulted in the database tab.
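To make the parameterization concrete, here is a hedged sketch of such a fit on synthetic data. It is not the meltR implementation: base R `nls` is used as a stand-in for whatever fitting routine the package relies on, the assignment of P3/P4 to the folded baseline and P5/P6 to the unfolded baseline is an assumption, and all numerical values are invented for the demonstration.

```r
R_GAS <- 8.314462618  # gas constant, J mol^-1 K^-1

# Two-state melting model. Assumptions: P3 + P4*T is the folded baseline,
# P5 + P6*T the unfolded baseline, and K = [U]/[F].
melt_model <- function(temp, P1, P2, P3, P4, P5, P6) {
  K <- exp(-P1 * (1 - temp / P2) / (R_GAS * temp))
  (P3 + P4 * temp) / (1 + K) + (P5 + P6 * temp) * K / (1 + K)
}

# Synthetic melting curve with known parameters and a little noise
set.seed(1)
temp  <- seq(276, 363, by = 0.5)
truth <- list(P1 = 2e5, P2 = 325, P3 = 40000, P4 = -20, P5 = 25000, P6 = -5)
dat <- data.frame(
  temp    = temp,
  epsilon = do.call(melt_model, c(list(temp = temp), truth)) +
    rnorm(length(temp), sd = 50)
)

# Non-linear least squares; a reasonable initial Tm is important for convergence
fit <- nls(
  epsilon ~ melt_model(temp, P1, P2, P3, P4, P5, P6),
  data  = dat,
  start = list(P1 = 1.8e5, P2 = 324, P3 = 39000, P4 = -15, P5 = 24000, P6 = -3)
)
est <- coef(fit)

# Folded fraction derived from the fitted P1 (dH) and P2 (Tm)
theta <- 1 / (1 + exp(-est[["P1"]] * (1 - temp / est[["P2"]]) / (R_GAS * temp)))
print(round(est[c("P1", "P2")], 1))
```

The folded fraction is obtained from the fitted ΔH⁰ and Tm only, which is why both baselines must be present in the data for the conversion to be meaningful.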

Workflow
The data is processed following this workflow:
1. Detection of the temperature unit, and conversion to Kelvin where necessary,
2. Generation of a unique id for each combination of oligonucleotide, ramp, buffer, and replicate. From then on, all data is processed by id (in particular, cooling and heating ramps are processed separately).
3. Blank subtraction, if blank data is submitted (can be turned off),
4. Conversion of the absorbance data to molar extinction coefficients,
5. Determination and separation of the ramps (cooling and heating). The ramps are always processed separately.
6. Data selection from user input: oligonucleotides, ramps, buffers, replicates, or individual id.
Step 10 is only carried out for non-fittable data:
10. The ε values are normalized in the [0;1] range, to be displayed alongside folded fraction data (same y-scale).
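Steps 1, 2 and 10 can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package code: the column names of the toy data frame are invented, the Celsius detection rule (all temperatures below 150) is a simple heuristic chosen for the example, and the id format mirrors the ids visible in the figures (oligonucleotide-buffer-ramp-replicate).

```r
# Toy long-format melting data; column names are assumptions for this sketch
melt <- data.frame(
  oligo  = rep(c("1XAV", "2GKU"), each = 4),
  buffer = "TMAA + KCl",
  ramp   = "heating",
  rep    = 1,
  temp   = rep(c(20, 40, 60, 80), times = 2),
  eps    = c(30000, 29000, 25000, 24000, 41000, 40500, 40200, 40000)
)

# Step 1: heuristic unit detection (assumption: values below 150 are in Celsius)
if (max(melt$temp) < 150) melt$temp <- melt$temp + 273.15

# Step 2: one id per oligonucleotide/buffer/ramp/replicate combination
melt$id <- paste(melt$oligo, melt$buffer, melt$ramp, melt$rep, sep = "-")

# Step 10: min-max normalization of epsilon to [0;1], computed separately per id
melt$eps_norm <- ave(
  melt$eps, melt$id,
  FUN = function(x) (x - min(x)) / (max(x) - min(x))
)
print(melt[, c("id", "temp", "eps_norm")])
```

Note that min-max normalized values are not folded fractions: they only bring curves to a common y-scale, and a value of 0.5 does not correspond to the Tm.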

Data loading and filtering
The data must be loaded from the Excel template into importR. All of the UV-melting data is automatically imported into meltR, regardless of the oligonucleotides selected in importR (to facilitate the standalone use). However, only the processed data for the oligonucleotides selected in importR is sent back to that tab.
The meltR interface has a slightly different organization than importR and database: the filtering of data to process is carried out in the hovering Filter panel (Figure S187).
1. Where necessary, refine the temperature range (default: 276-363 K, or ~3-90 °C),
2. Select the oligonucleotides to process (default: all). It is possible to process several oligonucleotides at once. Remember however that, in the context of g4db, these different oligonucleotides need to be selected in importR to be sent to that tab.
3. Select the ramps (heating or cooling) to process (default: both). The nature of the ramps is determined automatically, and the ramps are processed separately.
4. Select the buffers to process (default: all),
5. Select the replicates to process (default: all),
6. If steps 2-5 do not allow the desired data to be selected specifically, it is possible to filter the data directly by id.
Figure S187: The UV-melting data from the demo input, where the Kp+KCl buffer was filtered out
The Filter panel can be minimized by clicking on the header.
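The logic of the Filter panel amounts to combining a temperature window with optional selections on each grouping variable. The sketch below reproduces that logic in base R on a toy table; `filter_melt` and the column names are hypothetical and do not correspond to functions or objects exported by g4dbr.

```r
# Hypothetical filter mirroring the Filter panel: NULL means "keep everything"
filter_melt <- function(d, temp_range = c(276, 363), oligos = NULL,
                        ramps = NULL, buffers = NULL, reps = NULL, ids = NULL) {
  keep <- d$temp >= temp_range[1] & d$temp <= temp_range[2]
  if (!is.null(oligos))  keep <- keep & d$oligo  %in% oligos
  if (!is.null(ramps))   keep <- keep & d$ramp   %in% ramps
  if (!is.null(buffers)) keep <- keep & d$buffer %in% buffers
  if (!is.null(reps))    keep <- keep & d$rep    %in% reps
  if (!is.null(ids))     keep <- keep & d$id     %in% ids
  d[keep, , drop = FALSE]
}

# Toy data: 2 oligonucleotides x 2 buffers x 2 ramps, temperatures in Kelvin
d <- expand.grid(
  temp   = c(270, 300, 330, 365),
  oligo  = c("1XAV", "2GKU"),
  buffer = c("TMAA + KCl", "Kp+KCl"),
  ramp   = c("heating", "cooling"),
  rep    = 1,
  stringsAsFactors = FALSE
)
d$id <- paste(d$oligo, d$buffer, d$ramp, d$rep, sep = "-")

# Keep only one buffer, within the default temperature range
sel <- filter_melt(d, buffers = "TMAA + KCl")
# Or select a single curve directly by id
one <- filter_melt(d, ids = "1XAV-TMAA + KCl-heating-1")
print(nrow(sel)); print(nrow(one))
```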

Data fitting
This section can be carried out only for data that can be fitted. For non-fittable data, skip this section.
1. Click on the Plot derivative button, located in the left sidebar.
a. The Input data box will automatically switch to display Δε/ΔT (Figure S188).
b. The Approximate Tm table, in the Fit box, is filled with the maxima from the derivatives.
c. Artifactual points (e.g. caused by large local data variations) may lead to erroneous approximated Tm: increase the smooth window and click on the button again. If the results are still not satisfactory, continue anyway to step 2 (Figures S188 and S189).
Figure S188: First derivative data was obtained by clicking on Plot derivatives. Note the presence of artifacts at high temperature that will cause an erroneous initialization of the Tm for 1XAV-TMAA + KCl-heating-1
Figure S189: Tm initialization from first derivative data. Here, the second entry is erroneous and must be corrected either by increasing the derivative smoothing, or manually at the next step
2. Click on the Initialize fitting button, located in the left sidebar (Figure S190).
a. The Fit box will automatically switch to the Fit initialization table.
b. If step 1 was not satisfactory, manually correct the Tm.init variable. Correctly initialized Tm are critical for the success of the fitting process. The other initial fitting parameter values can also be modified.
c. If desired, change the legend; by default it is the id.
Figure S190: Fitting initialization. All parameters are initialized. Note that the Tm initialization is being manually corrected
3. Click on the Launch fitting button, and the data will be processed and the result displayed in several figures and tables (Figure S191).
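The derivative-based Tm initialization of step 1, and the effect of the smooth window on artifacts, can be illustrated with a short sketch. It is not the meltR code: the centered moving average and the use of the largest absolute derivative are assumptions made for this example, and the melting curve is synthetic, with a single artifactual point added at high temperature.

```r
# Centered moving average (assumed smoothing scheme); window = number of points
smooth_ma <- function(y, window = 5) {
  as.numeric(stats::filter(y, rep(1 / window, window), sides = 2))
}

# Approximate Tm: temperature at which the smoothed first derivative
# has its largest magnitude
approx_tm <- function(temp, eps, window = 5) {
  d_eps <- diff(eps) / diff(temp)                  # first derivative
  t_mid <- (head(temp, -1) + tail(temp, -1)) / 2   # midpoint temperatures
  d_smooth <- smooth_ma(d_eps, window)
  t_mid[which.max(abs(d_smooth))]                  # NA edges are ignored
}

# Synthetic sigmoidal melting curve centered on 325 K
temp <- seq(276, 363, by = 1)
eps  <- 30000 - 10000 / (1 + exp(-(temp - 325) / 3))
eps[temp == 355] <- eps[temp == 355] + 2000        # artifactual point

tm_raw    <- approx_tm(temp, eps, window = 1)  # no smoothing: picks the artifact
tm_smooth <- approx_tm(temp, eps, window = 9)  # wider window: close to 325 K
print(c(raw = tm_raw, smoothed = tm_smooth))
```

When smoothing is not enough, the value can still be corrected by hand in the Fit initialization table, as described in step 2b.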

Sending data to importR
To send data to importR for database edition:
1. If not already done, select the oligonucleotides to import in importR from the General information table of that tab,
2. Select whether the data was fitted or not with the select data switch, in the left sidebar,
3. Click on the send to importR button,
4. In importR, verify that the data has correctly been sent into the UV-melting data box.

The oligonucleotide molar extinction coefficients at 260 nm are calculated using the nearest-neighbor model in its traditional format:¹,²

$$\varepsilon_{260\,\mathrm{nm}} = \sum_{i=1}^{N_b - 1} \varepsilon_{i,i+1} - \sum_{i=2}^{N_b - 1} \varepsilon_i$$

where $\varepsilon_i$ is the molar extinction coefficient (in M⁻¹ cm⁻¹) of the nucleotide in position i (in the 5' to 3' direction), $\varepsilon_{i,i+1}$ is the extinction coefficient of the doublet of nucleotides in positions i and i + 1, and $N_b$ is the number of nucleotides in the oligonucleotide.
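A nearest-neighbor calculation of this kind can be sketched as follows, assuming the usual form of the model (sum of the doublet contributions minus the sum of the internal nucleotide contributions). The function `nn_epsilon260` is hypothetical, and the reference values below are a small subset of commonly tabulated 260 nm DNA values used here as placeholders; the values stored in the package's epsilondb remain the authoritative ones.

```r
# Placeholder reference values at 260 nm (M^-1 cm^-1), G and T only.
# These are illustrative; use the package's epsilondb for real calculations.
eps_mono <- c(G = 11500, T = 8700)
eps_pair <- c(GG = 21600, GT = 20000, TG = 19000, TT = 16800)

# Nearest-neighbor model: sum of doublets minus sum of internal nucleotides
nn_epsilon260 <- function(seq, mono, pair) {
  nt <- strsplit(toupper(seq), "")[[1]]
  nb <- length(nt)
  stopifnot(nb >= 2)
  pairs <- paste0(nt[-nb], nt[-1])                       # doublets i, i+1
  inner <- if (nb > 2) nt[2:(nb - 1)] else character(0)  # positions 2 to nb-1
  stopifnot(all(pairs %in% names(pair)), all(inner %in% names(mono)))
  sum(pair[pairs]) - sum(mono[inner])
}

# GGTT: (GG + GT + TT) - (G + T)
e260 <- nn_epsilon260("GGTT", eps_mono, eps_pair)
print(e260)
```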

Figure customization
To that effect, it uses epsilondb, a database of reference ε260nm contributions from the individual nucleobases, and from couples of nucleobases (neighboring effects).

4.2 mass.diet

Principle
The importR tab includes an optional mass spectrometric data reduction step, performed by the mass.diet function. It applies two different filters:
• An m/z filter, which excludes all data points above or below a user-supplied m/z range,
• An intensity filter, which excludes data points whose intensity is below a threshold. This intensity threshold is calculated as the mean intensity of a user-supplied m/z baseline range of length n, multiplied by a user-supplied coefficient.
When submitting several spectra, the intensity thresholds are computed for each individual spectrum to avoid issues with different signal-to-noise ratios.

Code
The code of mass.diet is contained in R/massdiet.R.
mass.diet requires that the data is formatted as a dataframe with the following columns:
• mz, the m/z axis,
• int, the intensity,
• oligo, the oligonucleotide names,
• buffer.id, the buffer name,
• tune, the MS tune name,
• rep, the replicate number
The last four columns are used as grouping variables to calculate individual intensity thresholds.
The data is processed in three simple steps. First the m/z range filter is applied, then the intensity threshold is calculated for each spectrum from the average noise in the defined baseline, and finally the intensity thresholds are applied to their respective spectra. If the user leaves the coefficient at its default value, i.e. 0, no intensity filtering will happen.
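The three steps can be sketched in base R as shown below. This is an illustration of the described logic and not the code of R/massdiet.R: the function name `diet_sketch` and its arguments are hypothetical, the toy spectra are random noise with one added peak, and spectra without any point in the baseline range are left unfiltered (an assumption made for this example). Only the column names (mz, int, oligo, buffer.id, tune, rep) come from the description above.

```r
# Hypothetical re-implementation of the described data reduction logic
diet_sketch <- function(d, mz_range, baseline_range, coef = 0) {
  # Step 1: m/z range filter
  d <- d[d$mz >= mz_range[1] & d$mz <= mz_range[2], , drop = FALSE]
  # Step 2: one threshold per spectrum = coef * mean baseline intensity
  grp <- interaction(d$oligo, d$buffer.id, d$tune, d$rep, drop = TRUE)
  in_base <- d$mz >= baseline_range[1] & d$mz <= baseline_range[2]
  noise <- vapply(split(d$int[in_base], grp[in_base]), mean, numeric(1))
  thr <- coef * unname(noise[as.character(grp)])
  thr[is.na(thr)] <- 0   # assumption: no baseline points means no filtering
  # Step 3: apply each threshold to its own spectrum
  d[d$int >= thr, , drop = FALSE]
}

# Two toy spectra with different noise levels and one peak each
set.seed(42)
toy_spectrum <- function(name, noise_sd) {
  data.frame(mz = seq(900, 2100, by = 1),
             int = abs(rnorm(1201, sd = noise_sd)),
             oligo = name, buffer.id = "TMAA + KCl", tune = "soft", rep = 1)
}
ms <- rbind(toy_spectrum("1XAV", 10), toy_spectrum("2GKU", 100))
ms$int[ms$mz == 1500] <- 5000

reduced  <- diet_sketch(ms, c(1000, 2000), c(1250, 1350), coef = 2)
mz_only  <- diet_sketch(ms, c(1000, 2000), c(1250, 1350), coef = 0)
print(c(original = nrow(ms), mz_only = nrow(mz_only), reduced = nrow(reduced)))
```

Computing the threshold per spectrum is what keeps the peak of the noisier spectrum from dictating the cut-off applied to the cleaner one.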

Use
mass.diet can be used outside of g4db, provided the input data contains the above-mentioned columns.
Here, we will use the data from the demo input file. In g4db it is loaded as follows:

    library(readxl)
    library(hablar)
    wide.input <- read_excel(system.file("extdata/demo_input.xlsx", package = "g4dbr"), sheet = "MS")

Here, the 1250-1350 m/z region was picked for the baseline with a coefficient of 2, and the m/z was restricted to 1000-2000. This reduced the number of data points to about 8% of its original value (from 1,268,904 to 98,998). That being said, mass.diet should be used conservatively and the size-reduced data must be inspected visually for excess removal.

Use
Below is an example for the demo database, for which the MS and NMR data will be removed for both entries. To save the modified database, use the save function:

Ion mobility data