Interactive Software for Visualization of Non-Targeted Mass Spectrometry Data – FluoroMatch Visualizer

There are thousands of different per- and polyfluoroalkyl substances (PFAS) in everyday products and in the environment. Discerning the abundance and diversity of PFAS is essential for understanding sources, fate, exposure routes, and the associated health impacts of PFAS. Comprehensive detection of PFAS requires non-targeted mass spectrometry, but data processing is time intensive and prone to error. While automated approaches can compile all mass spectrometric evidence (e.g., retention time, isotopic pattern, fragmentation, and accurate mass) and provide ranking or scoring metrics for annotations, confident assignment of structure often still requires extensive manual review of the data. To aid this process, we present FluoroMatch Visualizer, which provides interactive visualizations including normalized mass defect plots, retention time versus accurate mass plots, MS/MS fragmentation spectra, and tables of annotations and metadata. All graphs and tables are interactive and cross-filtered, such that when a user selects a feature, all other visuals highlight the feature of interest. Several filtering options have been integrated into this novel data visualization tool, specifically filtering by PFAS chemical series, fragment(s), assignment confidence, and MS/MS file(s). FluoroMatch Visualizer is part of FluoroMatch Suite, which also includes FluoroMatch Modular, FluoroMatch Flow, and FluoroMatch Generator. FluoroMatch Visualizer enables annotations to be extensively validated, increasing annotation confidence. The resulting visualizations and datasets can be shared online in an interactive format for community-based PFAS discovery. FluoroMatch Visualizer holds potential to promote harmonization of non-targeted data processing and interpretation throughout the PFAS scientific community.


Introduction:
Innovation in the chemical industry often leads to better consumer products, 1-3 longer life expectancy, new scientific discoveries, and higher interconnectivity of society, to name a few benefits. There are revolutionary new prospects of technology 2 enabled by chemistry. These benefits are realized through a diversification of synthesized chemicals. Alongside the benefits of novel chemicals are the unknown health impacts of exposure 2,4 . Databases of chemicals which may be found in the environment range from 360,000 5 to over 800,000 compounds 6 , with additional compounds added annually. Determining which chemicals pose a health risk is challenging without an understanding of the persistence, bioaccumulation, fate and transport, transformation products, and toxicity of the compound. One major chemical class that is receiving scientific and regulatory attention is the expanding list of per- and polyfluoroalkyl substances (PFAS).
There are thousands of known PFAS. Because they are also pervasive, 7-10 persistent, and often noted as toxic, these chemicals are an important target for comprehensive nontargeted analysis. The health burden associated with PFAS can depend on structure, and most studies on PFAS health effects focus on perfluorooctanoic acid (PFOA) and perfluorooctane sulfonic acid (PFOS). Most PFAS bioaccumulate [11][12][13][14][15][16] and certain PFAS have been linked to high cholesterol 17,18 and triglycerides 19,20 , thyroid disease 18 , pregnancy-induced hypertension 21,22 , ulcerative colitis 23 , cancers 24,25 , and a weakened immune system [26][27][28][29] . While there has been increased regulatory 30,31 , academic, and public awareness, scrutiny, and action to limit PFAS exposures, novel PFAS chemicals are often outside the scope of these efforts [32][33][34] . To accompany the growing need for regulatory changes, tools such as high-resolution mass spectrometry [35][36][37] are needed to rapidly and comprehensively assess the diversity, abundance, and health impacts of environmental exposures, in order to better determine and understand emerging and legacy chemicals of concern.
Liquid chromatography high-resolution mass spectrometry (LC-HRMS/MS) utilizes universal chemical principles to determine chemical structure including chemical polarity, size, or charge (retention time), bond connectivity (fragmentation), molecular formula (mass and isotopic pattern), and chemical moieties / chemical class (fragmentation and spectral similarity) [35][36][37] . The specificity and universality of mass spectral evidence allows for non-targeted analysis: the analysis of thousands to hundreds of thousands of signals representing individual chemicals or structurally similar isobars or isomers. However, annotating chemical structure from mass spectral signal is often the bottleneck in nontargeted LC-HRMS/MS and even with sophisticated software, manual review of data is generally required for high confidence [38][39][40][41] .
For nontargeted PFAS data processing, we have previously released FluoroMatch Flow, FluoroMatch Modular, and FluoroMatch Generator, which cover all steps of the PFAS data processing workflow. FluoroMatch Flow performs all steps automatically, FluoroMatch Modular can be used to annotate feature tables acquired using the user's own peak picking and filtering algorithms, and FluoroMatch Generator can be used to develop PFAS MS/MS libraries from a few class-representative standards. While a low false positive rate has been shown for FluoroMatch Flow when using class-based fragmentation rules from standards, [42][43][44][45] annotations using fragment screening, homologous series detection, and other more non-targeted approaches are susceptible to very high false positive rates. 45 Most features have modest to low confidence in the noted annotations; hence, manual validation is often required for the majority of PFAS suspected in nontargeted high-resolution mass spectrometry datasets.
To aid in manual validation of PFAS annotations, we introduce FluoroMatch Visualizer. This tool provides interactive, cross-linked, and cross-filtered graphics and tables to facilitate evaluation of annotations in complex data. Here we describe a suggested step-by-step workflow employing the tool, exemplifying this workflow using PFAS detected in Aqueous Film Forming Foam (AFFF).

Methods:
2.1. FluoroMatch Visualizer integration with FluoroMatch Flow / Modular
FluoroMatch Visualizer was developed to work with outputs from FluoroMatch Flow or Modular 44,45 . FluoroMatch v2.6 currently outputs a csv file, which contains over 35 columns of information for each feature and includes confidence scores, SMILES structures, identifiers, names, formulas, mass-to-charge ratios, retention times, peak area(s), mass defects, fragments and annotations, MS/MS files, and adducts. The code was modified to also output annotated MS/MS spectra combined across all MS/MS-containing files in a format readable by Microsoft Power BI. These are the only two inputs needed for FluoroMatch Visualizer.
2.2. Data-acquisition and data-processing
An AFFF mixture was collected from a holding tank at a field site in 1999; this holding tank contained a mixture of legacy AFFF products. Therefore, characterization of this AFFF mixture can serve as a proxy for legacy AFFF-contaminated sites 46 . The sample was diluted 100,000-fold in 70:30 water:methanol (Fisher Scientific Optima® LC-MS grade). The diluted sample was injected four times for iterative exclusion information/data-dependent analysis (iterative MS/MS), with a 50 μL injection volume, onto an Agilent 1290 Infinity II ultra-high-performance liquid chromatography (UHPLC) system connected to an Agilent 6545 quadrupole time-of-flight mass spectrometer (Q-TOF MS). Blanks were acquired every other injection for blank filtering. PFAS were detected in negative electrospray ionization mode. Data were acquired from m/z 100-1100, with MS/MS collision energy set to 0, 25, and 40 eV. Source parameters and further acquisition parameters for this dataset have been previously described 45,47 .

3. Results and Discussion:

FluoroMatch Workflow
FluoroMatch and FluoroMatch Visualizer were developed to be comprehensive, modifiable, user-friendly, and open source. The workflow starts with FluoroMatch Flow (or Modular), which can be used to directly process vendor files (Figure 1); the software is freely available at innovativeomics.com/software. FluoroMatch Modular uses the same annotation algorithms as FluoroMatch Flow but can be used to annotate feature tables generated using the user's own peak picking and filtering algorithms/software (vendor or open source, including Profinder, Compound Discoverer, MZMine, and XCMS). FluoroMatch Flow, which covers the entire PFAS nontargeted data-processing workflow (Figure 1), uses MZMine, 48 MSConvert, 49 and in-house code written in R and C# in the background to process the files. FluoroMatch Flow works for most vendor files, including Waters, Agilent, Thermo, and Bruker. Together FluoroMatch Flow, FluoroMatch Modular, FluoroMatch Generator, and FluoroMatch Visualizer are termed FluoroMatch Suite, and this suite of software offers several workflows (Figure 1). After following the naming conventions outlined for the software, users simply drag files onto the interface, choose blank filtering parameters and their preferred output directory, and click run. The software performs file conversion, peak processing (including peak picking, alignment, and gap-filling), blank filtering, compilation of annotation evidence, annotation, and assignment of confidence (Figure 1). Confidence assignments are based on an in-house scoring framework 44,45 (Figure 2) and the Schymanski scheme 50 . Comparison of FluoroMatch Flow with other software, including Compound Discoverer and EnviMass, shows that each software has unique advantages and disadvantages. 42,43
Generally, FluoroMatch Flow has similar or lower false positive rates among high-confidence annotations and has similar or more comprehensive annotation when the same peak list is used, but vendor or open-source peak picking algorithms designed specifically for the methods / instruments deployed often have better peak picking coverage than the built-in algorithms in FluoroMatch Flow. To increase coverage, users can use their own peak picking workflow optimized for their instrument and gradient, and then use FluoroMatch Modular for annotation, compiling evidence, and scoring; this workflow is currently recommended for expert users. FluoroMatch Modular works with any peak picking / processing software (both vendor and open source) that outputs a txt file with m/z and retention time information.
FluoroMatch Visualizer requires the desktop version of Power BI, which can be freely downloaded. An online version is being developed for future use. Using the FluoroMatch Visualizer Power BI file, users upload their data-processing output file from FluoroMatch Modular or FluoroMatch Flow.
Figure 2. FluoroMatch includes a systematic scoring framework to communicate confidence for every single feature, alongside confidence levels reported via the Schymanski scheme. All scores are shown alongside visual and descriptive definitions. * B-- are features with fluorine-containing fragments, screened using a list of 777 fluorine-containing fragments derived from standards, but the fragments observed are not common fragments with only one possible predicted formula.

FluoroMatch Visualizer Description and Possible Workflow
FluoroMatch Visualizer includes three interactive graphs: m/z vs retention time (Figure 3B), normalized mass defects (Figure 3C), and MS/MS spectra (Figure 3E). These graphs can be filtered by score and chemical series (Figure 3A). Data are further summarized in two tables: annotated and scored features (Figure 3F) and annotated fragments for fragment screening (Figure 3D). For all tables and graphs, users can hover over entries to visualize PFAS chemical structures, or substructures related to fragmentation. For this we developed a novel tool for interpreting SMILES within Power BI.
All graphs and tables can be filtered by user-selected feature(s). For example, if a row in the feature table is selected, then only that feature is displayed in the charts, and vice versa. Cross-filtering is important so that all evidence for a feature, PFAS series, or other group of features can be analyzed simultaneously. FluoroMatch Visualizer additionally includes tool tips, which provide further information when the user hovers over a feature or fragment. When hovering over a feature in graphs showing m/z vs retention time (Figure 3B) or normalized mass defects (Figure 3C), the chemical series, formula, SMILES structure, name/class, score, and x/y-axis values are shown. In the graph of MS/MS spectra (Figure 3E), any fragment annotation, potential SMILES substructure of the fragment, fragment m/z, fragment intensity, and ppm error of any fragment annotation are shown. In addition, the features associated with each displayed fragment are listed, which is important when MS/MS spectra from multiple features are overlaid.
User workflows employing FluoroMatch Visualizer can be diverse, especially considering that new graphs, variables, and tables can be designed and added in Power BI Desktop by users familiar with the platform. For example, new columns containing information of interest can be added to tables, new plots (for example, mass defect versus retention time) can be added, and new slicers and filters can be developed. Here we describe a simple workflow for determining false positives and true positives. In Section 3.3, this workflow is exemplified using an AFFF mixture.
Step 1) Determine series with high-confidence annotations
By filtering features to only those having score "A" using the Score Filter (Figure 3B), the most confident annotations are retained (~5% false positive rate). The remaining series, each containing one or more confident annotations, can then be selected in the legend (Figure 3B or Figure 3C). Then all scores can be added back in the Score Filter (no filtering), and the user can now view only those series with confident annotations. Confident series should be documented separately for future use.
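The logic of Step 1 (filter to score A, note which series remain, then add all scores back for those series only) can be sketched outside Power BI as follows. This is a minimal illustration, not the tool's implementation; the record fields `id`, `series`, and `score` and the example rows are hypothetical and do not reflect the actual column names of the FluoroMatch output file.

```python
# Hypothetical feature records; field names are illustrative and are not
# the column names of the actual FluoroMatch output file.
features = [
    {"id": 1, "series": "PFSA", "score": "A"},
    {"id": 2, "series": "PFSA", "score": "D"},
    {"id": 3, "series": "PFCA", "score": "B+"},
    {"id": 4, "series": "unknown_17", "score": "E"},
    {"id": 5, "series": "PFSA", "score": "C"},
]


def confident_series(rows, top_scores=("A",)):
    """Return the set of series containing at least one top-scoring feature."""
    return {r["series"] for r in rows if r["score"] in top_scores}


def expand_to_series(rows, series_names):
    """Return every feature, regardless of score, in the given series."""
    return [r for r in rows if r["series"] in series_names]


kept_series = confident_series(features)
expanded = expand_to_series(features, kept_series)
print(sorted(kept_series))
print([r["id"] for r in expanded])
```

In this toy table only one series contains a score-A feature, so the expanded view keeps its lower-scoring members while dropping the other two series.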
Step 2) Determine false positives using retention time vs m/z plots
After opening the annotated feature table (.csv) from FluoroMatch Flow in Excel (the file with FIN in the name), a new column can be added for comments and for flagging potential false positives or true positives. The file should be saved under a different name in Excel format (.xlsx). Now, select the first confident series and look for patterns in retention time vs m/z (Figure 3B). Members of the same homologous series should follow a diagonal line. Outliers that do not fall along this diagonal can then be noted in the annotated feature table as likely false positives. The remaining members will now have higher confidence, given that they fall within a clear retention time pattern, even for features without MS/MS. This process can be iterated across all remaining high-confidence series. Depending on the goals of the study, this process can also be done for all series, including series without structural annotations, although this may be too time intensive. Because FluoroMatch is designed to define series using the mass defect plot (Figure 3C), members of a series always fall on trend in those plots, so no false positives will be apparent there.
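The visual check in Step 2 is done by eye in the tool, but the underlying idea can be sketched numerically. The example below is an assumption-laden illustration rather than part of FluoroMatch: it assumes the series trend is roughly linear over the m/z range considered (real gradients may curve), fits a robust Theil-Sen line so that a single outlier does not drag the fit, and uses arbitrary default thresholds `k` and `min_tol`. The m/z and retention time values are made up.

```python
from itertools import combinations
from statistics import median


def flag_rt_outliers(mz, rt, k=5.0, min_tol=0.2):
    """Flag series members whose retention time departs from the series trend.

    A Theil-Sen line (median of pairwise slopes) is fitted to RT vs m/z, and a
    point is flagged when its absolute residual exceeds max(k * MAD, min_tol).
    k and min_tol (in minutes) are illustrative defaults, not validated values.
    """
    slopes = [
        (rt[j] - rt[i]) / (mz[j] - mz[i])
        for i, j in combinations(range(len(mz)), 2)
        if mz[j] != mz[i]
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(mz, rt))
    residuals = [y - (slope * x + intercept) for x, y in zip(mz, rt)]
    mad = median(abs(r) for r in residuals)  # median absolute residual
    tol = max(k * mad, min_tol)
    return [abs(r) > tol for r in residuals]


# Made-up homologous series: five members on trend, one eluting far too early.
mz = [298.94, 348.94, 398.94, 448.93, 498.93, 548.93]
rt = [4.0, 5.0, 6.0, 7.0, 8.0, 3.0]
print(flag_rt_outliers(mz, rt))
```

Flags produced this way are only candidates for the comment column; the decision to mark a feature as a false positive remains a manual one.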
Step 3) Determine false positives using MS/MS spectra
For each series selected, the MS/MS spectra are combined (summed) across all individual PFAS species in that series which have MS/MS acquired (Figure 3E). Clear fragmentation patterns can be observed in this manner, with the most abundant and common fragments for the PFAS class readily noted. Any features with MS/MS but without a confident annotation (score of A) can then be elevated if they have the appropriate fragments and fall into the observed fragmentation pattern. Conversely, false positives which do not have the fragments expected for that class can be noted. This information can be recorded in a separate column in the final table in Excel, and flags for both retention time and MS/MS can be referenced to remove false positives which do not belong to the series.
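Summing MS/MS spectra across the members of a series, as described in Step 3, can be sketched as below. This is an illustrative sketch only: the fixed-width m/z binning, the `bin_width` value, and the peak lists are assumptions, and the way FluoroMatch Visualizer actually combines spectra may differ.

```python
from collections import defaultdict


def sum_spectra(spectra, bin_width=0.01):
    """Sum fragment intensities across spectra after binning m/z.

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    Returns {binned_mz: (summed_intensity, number_of_spectra_with_that_bin)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for spectrum in spectra:
        seen = set()
        for mz, intensity in spectrum:
            key = round(mz / bin_width)  # integer bin index
            totals[key] += intensity
            seen.add(key)
        for key in seen:  # count each spectrum once per bin
            counts[key] += 1
    return {round(k * bin_width, 4): (totals[k], counts[k]) for k in totals}


# Made-up peak lists for three members of one series.
series_spectra = [
    [(79.957, 100.0), (98.956, 50.0)],
    [(79.957, 80.0), (98.956, 40.0)],
    [(79.957, 10.0), (130.992, 5.0)],
]
combined = sum_spectra(series_spectra)
for mz_bin, (intensity, n) in sorted(combined.items()):
    print(mz_bin, intensity, n)
```

Reporting the number of contributing spectra next to the summed intensity helps separate fragments shared by most members of the series from a single intense peak in one spectrum.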
Step 4) Discovery of PFAS structures without confident assignments
After flagging potential false positives and true positives for confident annotations, all other annotations with some PFAS MS/MS evidence can be screened. Using the same methodology as in Step 1 through Step 3, series containing features scored B+, B, or B- can be determined (series containing A's can be removed from this list) and false positives flagged. For certain features with a score of B+, B, or B-, structures will be assigned using in-silico approaches or fragment screening. MS/MS evidence should be carefully considered and benchmarked against literature or standards for validation of these structures. Furthermore, isotopic pattern matching and formula prediction (not included in the platform) can be used to validate the formula.
Step 5) Discovering series and PFAS chemical classes using fragment screening
In addition to investigating by predefined series and scores, series can be determined by filtering on fragments which are most abundant across the entire dataset and which represent chemical moieties of interest, using the Fragment Screening Filter (Figure 3D). After these features are discovered, MS/MS evidence (Figure 3E) can be investigated to ensure these fragments are of high abundance for each feature. New series can then be defined, or existing series can be selected and investigated based on containing these moieties.
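Fragment screening as in Step 5 amounts to asking which features have a peak matching a diagnostic fragment within a mass tolerance and above some relative abundance. The sketch below illustrates that idea under stated assumptions: the 10 ppm tolerance, the 5% relative-intensity cutoff, the feature identifiers, and the peak lists are all hypothetical, and the fragment m/z is computed from standard monoisotopic masses rather than taken from the tool's fragment list.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron.
MASS = {"C": 12.0, "O": 15.994915, "F": 18.998403, "S": 31.972071}
ELECTRON = 0.000549


def anion_mz(composition):
    """m/z of a singly charged anion: sum of atomic masses plus one electron."""
    return sum(MASS[el] * n for el, n in composition.items()) + ELECTRON


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def features_with_fragment(spectra_by_feature, target_mz, tol_ppm=10.0, min_rel=0.05):
    """Return feature IDs whose spectrum has the target fragment within
    tol_ppm and at or above min_rel of that spectrum's base peak.
    Both thresholds are illustrative assumptions."""
    hits = []
    for feature_id, peaks in spectra_by_feature.items():
        if not peaks:
            continue
        base = max(intensity for _, intensity in peaks)
        if any(
            abs(ppm_error(mz, target_mz)) <= tol_ppm and intensity / base >= min_rel
            for mz, intensity in peaks
        ):
            hits.append(feature_id)
    return hits


so3f = anion_mz({"S": 1, "O": 3, "F": 1})  # [SO3F]- fragment
spectra = {  # made-up peak lists keyed by hypothetical feature IDs
    "f1": [(98.9556, 500.0), (79.9570, 1000.0)],
    "f2": [(118.9925, 800.0)],
    "f3": [(98.9601, 900.0)],  # close in nominal mass but outside tolerance
}
print(round(so3f, 4), features_with_fragment(spectra, so3f))
```

The relative-intensity cutoff reflects the advice above to confirm that a screened fragment is of high abundance for each feature before defining a new series around it.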

Step 6) Discovery of PFAS series without MS/MS Evidence
Depending on the application, further validation can be performed for series which do not have any structural information but may be PFAS based on accurate mass. In this case, using the methods in Step 1 through Step 2, series with D+ and D can be investigated for retention time patterns (no MS/MS evidence exists for these). In addition, these series can be narrowed down to only those within the normalized mass defect range where high-confidence annotations were observed, as these are more likely to be PFAS (Figure 3C). One way to navigate these series is to sort the table (Figure 3F) by number of compounds in the series and look at series with many members, as these series are likely to be more confident. Finally, as in Step 4, isotopic pattern matching and formula prediction can be used to validate the formula.

Application of FluoroMatch Visualizer to identify PFAS in a Legacy AFFF Mixture
Non-targeted data provide a wealth of information, and in most datasets the majority of chemical features are not PFAS related. When looking at all features simultaneously, it can be difficult to determine patterns and discover potential PFAS series. Often thousands of features are observed (3,686 for the AFFF mixture after blank filtering), and no user can look at all the data and make sense of it simultaneously; for example, when all features are displayed the resulting plots show few clear patterns (Figure 4 and Figure 5).
Figure 5. ... (Figure 5A) and for features that belong to high confidence series with one or more PFAS with a score of A (Figure 5B). A specific region in the plot is where most PFAS can be found, and certain outliers of high confident series can easily be found (Figure 5B). Red highlighted areas show background noise, yellow highlighted area shows PFAS region of plot, and grey highlighting shows another potential non-PFAS series.
Because of the complexity and richness of non-targeted data, users need to look at a subsection of PFAS features, and multiple filters are provided in FluoroMatch Visualizer to prioritize which group of features to investigate. Ideally the user would want to know which series have confident annotations and which series have no annotations, in order to evaluate the most confident annotations first. Filtering by score (Figure 6) allows users to determine which features to focus on based on the evidence provided, by selecting the series which contain the highest scoring features. A description of the scores can be found in Figure 2, and the scores have been described previously 45 . Briefly, features assigned as A (A, A-) contain class-based MS/MS fragments required for confident assignments, features assigned B (B+, B, and B-) have an in-silico or fragment screening match, features assigned C (C+, C, and C-) are in homologous series with B's or A's, and D's form homologous series or have accurate mass evidence consistent with PFAS but no MS/MS information; E's are the remaining features, which are likely not PFAS and therefore can generally be ignored, drastically reducing the number of features needing manual validation 45 .
By filtering to retain only features with high-confidence scores (e.g., scores of A; Figure 6A), known series (about a 5% false positive rate) 44,45 can be readily viewed. In the case of the AFFF mixture, 18 PFAS series were determined, covering 62 individual species with a high-confidence score (Figure 6A). These series can then be expanded to all features which fall within the series (including lower confidence annotations or no annotations), in which case the coverage of species with likely annotations expanded to 257 individual species for the AFFF mixture (Figure 6C). After establishing these series containing high-confidence annotations, it is important to distinguish which members of the series are false positives via manual review.
Organizing features by series is essential for determining false positives, as clear patterns are observed for each series. If a member of a series falls outside of this pattern, it may be considered a false positive. Therefore, using the drop-down menu (or by selecting series in the legend), users can select one or more series to start determining true and false positives, as well as false negatives. Each member of a series should follow a clear diagonal pattern in m/z vs retention time plots, as shown for only high-confidence annotations of perfluorosulfonic acids (PFSAs) (Figure 6B), and any outliers can be considered false positives (Figure 6D). Furthermore, members of the series should also follow a clear trend in the normalized mass defect plots. Because the tool automatically assigns series by mass defect and nominal mass, the PFAS series will follow trends in these plots automatically and no manual removal of false positives is needed (Figure 6A and Figure 6C).
Figure 6. By filtering to only include those series with the most confident annotations, true positives and false negatives can manually be determined using clear trends. Shown above are Kendrick mass defect plots normalized to CF2 (KMD (CF2)) for annotations with high confidence (Figure 6A) and series containing one or more high-confidence annotations (Figure 6C). As an example, for high-confidence annotations, the series for perfluoroalkyl sulfonic acids (PFSA) is selected (Figure 6A) and the retention time vs m/z plot is shown for this series (Figure 6B). Retention time vs m/z plots are also shown for all high-confidence series (Figure 6D).
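The CF2-normalized Kendrick mass defect plotted in Figure 6A and Figure 6C follows from a standard rescaling in which CF2 (exact mass 49.996806 u) is assigned a mass of exactly 50. The sketch below shows that calculation; the sign convention (nominal minus Kendrick mass) is one of two in common use, and the `series_key` grouping rule is a hypothetical simplification, not the series-assignment algorithm used by FluoroMatch.

```python
CF2_EXACT = 12.0 + 2 * 18.998403  # 49.996806 u
CF2_NOMINAL = 50


def kendrick_mass_defect_cf2(mz):
    """Return (nominal Kendrick mass, Kendrick mass defect) on the CF2 scale.

    Sign convention assumed here: KMD = nominal Kendrick mass - Kendrick mass.
    """
    kendrick_mass = mz * CF2_NOMINAL / CF2_EXACT
    nominal = round(kendrick_mass)
    return nominal, nominal - kendrick_mass


def series_key(mz, decimals=3):
    """Hypothetical grouping key: homologues differing only by CF2 units share
    the same KMD and the same nominal Kendrick mass modulo 50. Rounding the
    KMD is a simplification and can split a series at a rounding boundary."""
    nominal, kmd = kendrick_mass_defect_cf2(mz)
    return nominal % CF2_NOMINAL, round(kmd, decimals)


# [M-H]- m/z of two perfluoroalkyl sulfonic acids differing by two CF2 units.
pfhxs, pfos = 398.9366, 498.9302
print(kendrick_mass_defect_cf2(pfhxs))
print(kendrick_mass_defect_cf2(pfos))
print(series_key(pfhxs) == series_key(pfos))
```

Because members of a CF2 homologous series share a mass defect on this scale by construction, the mass defect plot cannot reveal outliers within a series, which is why the retention time and MS/MS checks carry the burden of false positive removal.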
Cross-filtering is especially important when looking at MS/MS spectra, as multiple MS/MS spectra can be overlaid. If the MS/MS spectra of certain members of a series are overlaid, for example by sorting and selecting them in the plots (e.g., as selected in Figure 6A for the PFSA series) or feature table, the trends for neutral losses and fragment ions become clear (Figure 7). Any MS/MS spectra which do not have fragments consistently observed in other features of the same series can then be labeled as potential false positives. Furthermore, the user may assign higher confidence to low-confidence assignments after surveying their MS/MS spectra overlaid with other members of the series.
Fragment screening can also be performed using annotated fragments from MS/MS spectra, which can aid in finding unknowns and identifying chemical series/classes. Common annotated fragments can be selected, and all MS/MS spectra and features which contain these fragments will be shown across all charts. This can aid in grouping compounds which contain certain moieties. For example, [SO3F]- and [SO2F]- may be selected to find PFAS with sulfonic acid groups, and [CF3]- and [C2F5]- may be selected to identify carbon-fluorine chains. PFAS containing sulfonic acid groups (Figure 8) and PFAS containing pentafluorosulfide (Figure 9) were screened in this manner. One strategy to determine which fragments to use for fragment screening is to sort the annotated fragments by fragment abundance and then select those representing a moiety of interest. Fragments related to sulfonic acid and the PFAS carbon-fluorine chain were the most abundant fragments when all fragments across all features were summed (automatically provided by FluoroMatch Visualizer) (Figure 8A). Many potential false positives were determined using retention time vs m/z plots (Figure 8B), showing the importance of this visualization.
Fragment screening using the [SF5]- fragment for pentafluorosulfide-containing PFAS showed that while the fragment was of low abundance across all features (Figure 9B), 11 features were determined (Figure 9A). When looking at all members of those series containing at least one feature with a [SF5]- fragment, 83 features were determined (Figure 9C and Figure 9D). For one series, many false positives (teal series containing m/z 257.0472, 307.0419, and 357.0414) were observed (Figure 9D), and the entire series (blue) was also likely a false positive (no retention time order, data not shown).
Figure 9. ... (Figure 9B), and these fragments were used to screen for pentafluorosulfide-containing series (Figure 9C). Most species form clear trends in retention time vs m/z plots, with some obvious outliers (Figure 9D).
Beyond fragment filtering and overlaying the MS/MS spectra of a series, MS/MS spectra and features can be filtered by MS/MS file, in case certain files represent very different sample types which may have different PFAS isomers. Furthermore, techniques using different MS/MS methods or parameters can be readily cross-compared using the filter by MS/MS file. Here we compared different acquisitions of iterative exclusion (IE), in which a sample is injected repeatedly using a rolling exclusion list of previously selected ions (ensuring that each new injection has MS/MS acquired on new ions not previously selected). As shown in mass defect plots in the PFAS region filtered by each MS/MS file, new series and new members are discovered for each additional round of IE (Figure 10A). 44,51 Note these plots are cumulative, showing all past acquisitions and new acquisitions. Furthermore, by summing and plotting the number of features and filtering by score, it is shown that the advantage dissipates for high-confidence annotations after about the 4th injection, meaning fewer injections are needed if high-confidence annotations are the goal (Figure 10B-D).
Figure 10. ... (Figure 10C and Figure 10D). All scans represent any score A through E (Figure 10B), scans with PFAS MS/MS evidence represent scores of A through B (except B--) (Figure 10C), and confident annotations refer to those with scores of A and A- (Figure 10D).
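The cumulative view used to compare iterative exclusion injections can be expressed as a running union of the features that gained MS/MS in each injection. The sketch below is illustrative only; the per-injection feature sets are invented and the function is not part of FluoroMatch Visualizer.

```python
def cumulative_gain(features_per_injection):
    """For each injection, return (cumulative unique features, newly added features)."""
    seen = set()
    summary = []
    for feats in features_per_injection:
        new = set(feats) - seen  # features not fragmented in earlier injections
        seen |= new
        summary.append((len(seen), len(new)))
    return summary


# Invented feature IDs with MS/MS acquired in four successive injections.
injections = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}, {"a", "e"}]
for number, (total, gained) in enumerate(cumulative_gain(injections), start=1):
    print(number, total, gained)
```

Applying the same count after restricting the feature sets to a given score range would give the kind of per-score comparison described above, where the gain per additional injection can level off sooner for high-confidence annotations than for all features.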

Conclusion and Future Developments:
Nontargeted liquid chromatography high-resolution mass spectrometry (LC-HRMS/MS) provides evidence using universal chemical properties (polarity, functional groups, molecular formula, bond linkages and strength) for comprehensive characterization of molecules. Sifting through this rich set of evidence for structural characterization of the molecules is a major bottleneck in nontargeted workflows. Our automated approach previously released for PFAS (FluoroMatch) incorporates mass defect, retention time order, exact mass matching, homologous series grouping, and fragmentation pattern searching (class-based fragment rules, common PFAS fragment screening, and structure-to-fragmentation in-silico approaches) to annotate features. While scores are provided to communicate confidence in the results, manual review is required for most annotations. FluoroMatch Visualizer was developed to meet this need with interactive tables and charts, allowing the user to visually examine several lines of evidence for annotating a specific feature. Visual evidence includes Kendrick mass defect-type plots normalized to CF2 or another repeating unit, m/z versus retention time plots, and annotated tandem mass spectra, while tables provide a wealth of other evidence for each selected feature. While this software currently covers PFAS, the framework can readily be adapted to other chemical series, and work is being undertaken to expand capabilities to polymers and lipids.
To investigate trends across PFAS, the user can narrow down the number of features to examine using multiple methods. One of the most useful methods is to select individual homologous series, which are automatically determined using nominal mass and normalized mass defect. When series are selected, all visuals, including MS/MS spectra, are updated to show all members of a series overlaid. Patterns can then easily be observed, and outliers determined. Tens to hundreds of series often exist, and therefore series can be further narrowed down to those containing high scores or certain characteristic PFAS fragments. Beyond automatically assigned series, PFAS can be determined in another nontargeted fashion, by selecting all features containing a specific fragment. In an application to a mixture of aqueous film-forming foams (AFFF), certain PFAS functional groups and structural characteristics could be determined, for example, sulfonic acid, alcohol, pentafluorosulfide, double bond, carbon-fluorine chain, and carboxylic acid containing species.
Future developments will further improve the utility of this software. For example, statistics could also be performed in Power BI, including boxplots, tests of significance, ANOVA, and volcano plots. The advantage of embedding statistics here is that the statistical graphs and tables would update depending on the features or series the user selects. Furthermore, this visual platform can be expanded to other applications being developed, including lipids and polymers. By using FluoroMatch Visualizer as part of FluoroMatch Suite, researchers can quickly cross-compare results in a visual manner and share reports and data online in an interactive fashion. This will ideally aid in transparency and community-based validation of results in nontargeted PFAS analysis. The nontargeted community uses various methodologies and lines of evidence, and this harmonization of reporting and data sharing can improve trust and comprehension in the field of LC-HRMS/MS PFAS analysis.