Recommendations for standardizing nomenclature for dietary (poly)phenol catabolites

ABSTRACT There is a lack of focus on the protective health effects of phytochemicals in dietary guidelines. Although a number of chemical libraries and databases contain dietary phytochemicals belonging to the plant metabolome, they are not entirely relevant to human health because many constituents are extensively metabolized within the body following ingestion. This is especially apparent for the highly abundant dietary (poly)phenols, for which the situation is compounded by confusion regarding their bioavailability and metabolism, partially because of the variety of nomenclatures and trivial names used to describe compounds arising from microbial catabolism in the gastrointestinal tract. This confusion, which is perpetuated in online chemical/metabolite databases, will hinder future discovery of bioactivities and affect the establishment of future dietary guidelines if steps are not taken to overcome these issues. In order to resolve this situation, a nomenclature system for phenolic catabolites and their human phase II metabolites is proposed in this article and the basis of its format outlined. Previous names used in the literature are cited along with the recommended nomenclature, International Union of Pure and Applied Chemistry terminology, and, where appropriate, Chemical Abstracts Service numbers, InChIKey, and accurate mass.


Introduction
The complexity of foods and the accompanying limited information on dietary phytochemicals in metabolomics and dietary guidelines has recently been recognized (1). Although a number of databases contain information on phytochemicals of the plant metabolome, they are not directly relevant to human health or metabolomics because many of these compounds undergo substantial metabolism within the body following ingestion. Chemical/metabolite databases such as Phenol-Explorer, FooDB, ChEBI, and PubChem, metabolomics data repositories including HMDB and MetaboLights, and mass spectra databases such as Mass Bank, ReSpect for Phytochemicals, Golm, and Metlin contain fragmentary information that is far from a comprehensive overview of human metabolites resulting from consumption of phytochemicals. Plant-derived dietary compounds have a much greater metabolomic complexity in vivo than is currently recognized and, as a consequence, the metabolomes derived from plant foods are undercharacterized. This is evident with the highly abundant dietary (poly)phenols, which are among the phytochemicals that are recognized as important regulators of human health (2,3). To further complicate matters, there are areas of confusion regarding the absorption and metabolism of dietary (poly)phenols, partially because of the variety of nomenclatures and trivial names used to describe compounds that arise in the gastrointestinal tract as the result of microbial catabolism and human phase II metabolism. This confusion is perpetuated in online chemical and metabolomic databases.
One of the current goals of (poly)phenol research is to establish dietary guidance including a recommended intake or number of servings of (poly)phenol-rich foods to promote optimal health or reduce disease risk (4,5). The aim of the proposed nomenclature standardization is to advance basic and applied (poly)phenol research, and associated methods such as metabolome pathway analysis tools (6-8), to provide evidence in support of observational studies (epidemiological evidence) that form the foundation for Dietary Guidelines (National Nutrition Monitoring and Related Research Act). Databases and pathway tools/systems operate using compound recognition, which requires standardized nomenclature and machine-readable identifiers such as InChIKey and Chemical Abstracts Service (CAS) numbers. It is therefore paramount for (poly)phenol researchers to standardize nomenclature and reporting practices. In addition, there is a requirement for a reliable system of translation, permitting incorporation of pre-existing data described using older or nonstandard terminology into modern databases and bioinformatics tools such as PubChem and HMDB. This need to adopt universally accepted annotations, classification schemes, and ontologies, facilitating data interoperability in metabolomics, has recently been emphasized (9,10).
Calls for standardizing reporting practices in (poly)phenol research are not novel. In 2015 a special article was published in The American Journal of Clinical Nutrition, in which recommendations were proposed on reporting requirements for flavonoids in research (11). This report focused primarily on standardization of reporting relevant to dietary assessment, food composition characterization, food administration, human and animal intervention study design, and analytical reporting. Subsequently, a joint International Union of Pure and Applied Chemistry (IUPAC)/International Union of Biochemistry and Molecular Biology working group published recommendations for general nomenclature of (poly)phenols and provided examples of acceptable trivial names, together with semisystematic and fully systematic names (12). However, both these publications focused on phytochemical nomenclature used to characterize secondary plant metabolites in foods and did not cover catabolites arising from the gut microbiome. The current special article provides standardized nomenclature and reporting practices for human metabolites of dietary (poly)phenols encompassing gut microbial catabolites and their human phase II metabolites, which were overlooked in the previous reports. Our aim is to eliminate inconsistencies in nomenclature/terminology, which create confusion and ultimately affect interpretation of data surrounding the biological effects of plant-based foods in epidemiology and nutrition intervention studies.

Standardization
A call for standardized nomenclature and reporting practices is timely because it will undoubtedly help researchers establish the untapped potential of the non-nutrient phytochemicals in food and diet to modulate human health. Phytochemical structural data need to be captured in ways that have future utility, and this will become of increasing importance because national funding bodies are beginning to require that studies make their source data publicly available using open access data repositories (8,13). Progress is currently impeded by confusion in the nutrition and food science literature because of the previously mentioned inconsistent chemical nomenclatures applied to phenolic catabolites, which make it very difficult to integrate data for review and meta-analysis. The magnitude of this problem is clearly illustrated by the results of Web of Science Core database searches (February 13, 2020; https://clarivate.com/webofsciencegroup/solutions/web-of-science) using known variations in nomenclature for a common (poly)phenol catabolite as the search "topic." These searches revealed that even differences in hyphenation have a significant impact on how many references are identified when searching for a single compound. A search using our recommended name "3-(3′-hydroxyphenyl)propanoic acid" (Table 1) yielded fifty-four references, but its associated synonyms yielded additional references as follows: 3-hydroxyphenylpropionic acid (thirty references), 3-hydroxy-phenylpropionic acid (one reference), 3-hydroxy-phenyl-propionic acid (two references), and 3-(3′-hydroxy-phenyl)-propionic acid (three references). Similarly, a further search using the recommended name "3-hydroxy-3-(3′-hydroxyphenyl)propanoic acid" (Table 1) produced only one reference, whereas its synonyms 3-hydroxy-(3′-hydroxyphenyl)propionic acid yielded eight references, 3-hydroxyphenyl-hydracrylic acid three references, and 3-hydroxyphenylhydracrylic acid one reference.
The nomenclature system for phenolic catabolites provided herein will help clarify confusion in the nutrition literature and provide a harmonized approach for online databases.
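The effect of hyphenation and of the propionic/propanoic alternation on literature searches can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `search_key` and `to_recommended` and the tiny thesaurus are hypothetical, the synonym list is limited to the spellings quoted in the search example above, and the ASCII apostrophe stands in for the prime symbol. Stripping every hyphen is a deliberately crude normalization for matching spellings; it is not a chemical parser.

```python
import re

# Recommended name and the synonym spellings quoted in the search example.
# The ASCII apostrophe (') stands in for the prime symbol.
RECOMMENDED = "3-(3'-hydroxyphenyl)propanoic acid"
SYNONYMS = [
    "3-hydroxyphenylpropionic acid",
    "3-hydroxy-phenylpropionic acid",
    "3-hydroxy-phenyl-propionic acid",
    "3-(3'-hydroxy-phenyl)-propionic acid",
]


def search_key(name: str) -> str:
    """Collapse spelling variants (case, hyphens, spaces, propionic/propanoic)."""
    key = name.lower().replace("\u2032", "'")  # typographic prime -> apostrophe
    key = key.replace("propionic", "propanoic")
    return re.sub(r"[\s\-]", "", key)


# Hypothetical thesaurus: every known spelling points to the recommended name.
THESAURUS = {search_key(s): RECOMMENDED for s in SYNONYMS + [RECOMMENDED]}


def to_recommended(name: str):
    """Return the recommended name for a known spelling, or None if unknown."""
    return THESAURUS.get(search_key(name))


if __name__ == "__main__":
    for spelling in SYNONYMS:
        print(spelling, "->", to_recommended(spelling))
```

Normalization alone merges only the hyphenation variants. Names that omit the side-chain locant still need an explicit thesaurus entry, which is why a curated synonym table remains necessary.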
Biological activity, or its absence, is a function of the 3D structure of a molecule. The goal of a nomenclature system is to describe this 3D structure unambiguously, allowing translation of text to a 2D illustrated structure in a manner as accurate as the available experimental data allow. It should be easy to use and understand, and accommodate multiple forms of isomerism, to permit distinction of isomeric structures with ease. There is, however, no perfect system capable of achieving this objective in a convenient manner that also accommodates all compounds in all situations. Accordingly, this article focuses on the commonly reported isomeric forms of (poly)phenol catabolites and their metabolites, recommending a system of nomenclature and also providing a thesaurus to facilitate translation between different systems of nomenclature. This is important because with many compounds, five and as many as ten synonyms and styles of nomenclature, some incorrect, appear in the nutrition literature (Table 1). In the proposed nomenclature, IUPAC-based names are used, but in instances in which IUPAC terminology is inconsistent (e.g., phenylpropanoic and phenylpropionic acids) or is overly complex and/or where alternative names have become accepted and widely used (e.g., hippuric acids, phenyl-γ-valerolactones, phenylvaleric acids, and urolithins), modifications are used.
Table 1 footnotes:
2 Glucuronic acid-oxygen conjugations are of β-D-configuration.
3 Column 2 contains synonyms along with incorrect nomenclature that are in brackets. Both should be avoided to prevent further confusion in the literature and in online databases.
4 CAS Registry Number (CAS-RN) is a numeric identifier that can contain ≤10 digits, divided by hyphens into 3 parts. Each number is unique and can link to information about the substance to which it refers, but it has no chemical significance per se (see https://www.cas.org/support/documentation/chemical-substances). Not all compounds have a CAS number, typically because they have a limited or no commercial source.
5 The IUPAC is described as the world authority on chemical nomenclature and terminology (https://iupac.org/who-we-are).
6 InChIKey: IUPAC standard textual unique chemical identifier.
7 Cinnamic acids have cis (Z) and trans (E) geometric isomers. In nature, the trans isomer is more common.
8 Alternative "nonprime" nomenclature.
9 Compounds that occur as R- and S-isomers.
10 In rare instances, compounds appear to have 2 different InChIKey formulas in online databases, which are generally associated with different CAS or registry numbers and possibly reflect uncharacterized isomeric configuration.
Figure 1. Numbering of the phenolic ring and its side chain with the current practice (A) and the older less fashionable use of lowercase Greek letters for the side chain (B).
The phenolic compounds that are covered in Table 1 are those originating from acyl-quinic acids, ellagitannins, and the main dietary flavonoids (namely anthocyanins, flavones, flavonols, flavanones, and flavan-3-ols), as well as the avenanthramides and alkyl-resorcinols characteristic of the staple foods wheat, rye, and quinoa. Table 1 does not cover catabolites of minor dietary (poly)phenols or those with a limited distribution, such as isoflavones, stilbenes, phenylpropenes, lignans, and iridoids (oleuropeins). Catabolites of these (poly)phenols also require a standardized nomenclature, which is beyond the scope of this article.

Nomenclature of Phenolic Catabolites
(Poly)phenolic catabolites found in blood, tissues, urine, and feces can be easily subdivided using a shorthand system that describes their molecular skeleton by concisely defining the number of carbon atoms in the phenyl ring and its side chain and whether the side chain is unsaturated (i.e., contains double bonds) and/or contains nitrogen or substituents such as a hydroxyl group. In this shorthand system, C6 identifies a phenyl ring with its 6 carbon atoms (Figure 1), and the number of carbon atoms in the side chain is defined using C0, C1, C2, etc., as necessary, that is, C6-C0, C6-C1, C6-C2, etc. The most extensively studied catabolites have a side chain with a terminal carboxyl group that is designated C-1. Further description of the molecule becomes more complicated because the literature contains trivial names, some of which are synonyms, such as acetic acid and ethanoic acid. For a C6-C3 structure, the synonyms cinnamic, propenoic, and acrylic all signify an unsaturated C3 side chain with a double bond between C-2 and C-3. The synonyms propionic, propanoic, and dihydrocinnamic acid denote a saturated C3 side chain. Longer side chains include pentanoic (C6-C5), also described as valeric, and the less common heptanoic (C6-C7) and nonanoic (C6-C9). The term phenyl-pentenoic identifies a C6-C5 catabolite with an unsaturated side chain but does not specify the location of the double bond. This can be defined by identifying the lowest-numbered carbon associated with it (for example, pent-2-enoic or pent-3-enoic) and, if possible, specifying whether it is of a cis (Z) or trans (E) configuration. The current practice for numbering the carbon atoms in these compounds is shown in Figure 1A, and the former, less fashionable practice of using lowercase Greek letters is shown in Figure 1B; together they illustrate the proposed framework for simplifying phenolic metabolite nomenclature.
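The C6-Cn shorthand and the side-chain synonyms listed above lend themselves to a simple lookup. The following Python sketch is an illustration only: the table `SIDE_CHAIN_TERMS` and the function `skeleton` are hypothetical names, the table contains just the terms mentioned in this article, and matching is by plain substring, so locant-bearing forms such as pent-2-enoic are not recognized.

```python
from typing import NamedTuple, Optional


class SideChain(NamedTuple):
    carbons: int       # number of carbon atoms in the side chain
    unsaturated: bool  # True if the term implies a double bond


# Side-chain terms named in the text, mapped to carbon count and saturation.
SIDE_CHAIN_TERMS = {
    "benzoic": SideChain(1, False),
    "acetic": SideChain(2, False),
    "ethanoic": SideChain(2, False),
    "cinnamic": SideChain(3, True),
    "propenoic": SideChain(3, True),
    "acrylic": SideChain(3, True),
    "propionic": SideChain(3, False),
    "propanoic": SideChain(3, False),
    "dihydrocinnamic": SideChain(3, False),
    "pentanoic": SideChain(5, False),
    "valeric": SideChain(5, False),
    "pentenoic": SideChain(5, True),
    "heptanoic": SideChain(7, False),
    "nonanoic": SideChain(9, False),
}


def skeleton(name: str) -> Optional[str]:
    """Return the C6-Cn shorthand for a name, or None if no term matches."""
    lowered = name.lower()
    # Try longer terms first so "dihydrocinnamic" is not read as "cinnamic".
    for term in sorted(SIDE_CHAIN_TERMS, key=len, reverse=True):
        if term in lowered:
            chain = SIDE_CHAIN_TERMS[term]
            suffix = " (unsaturated)" if chain.unsaturated else ""
            return f"C6-C{chain.carbons}{suffix}"
    return None


if __name__ == "__main__":
    for example in ("3-(3'-hydroxyphenyl)propanoic acid",
                    "dihydrocinnamic acid",
                    "cinnamic acid",
                    "5-(3',4'-dihydroxyphenyl)valeric acid"):
        print(example, "->", skeleton(example))
```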
Figure 1 also illustrates how the ring carbons can be distinguished from the side-chain carbons using a number bearing a prime, for example, 3′ rather than simply 3. This use of a prime is of value when a structure would otherwise contain two atoms that might reasonably be described as "3," as in 3-(3′-hydroxyphenyl)propanoic acid, in which 3 refers to the numbered carbon of the side chain (counting from the carboxyl carbon, COOH = 1) and 3′ refers to the position of the substituent group on the phenyl ring, with 1′ being the carbon on the phenyl ring bearing the primary side chain. This use of primed numbers may not be essential by IUPAC convention, and may seem superfluous to seasoned chemists, but it does not require an extensive understanding of nomenclature to facilitate its use. The system recommended here is designed to be easily accessible to nonchemists.
For C6-C0 or C6-C1 structures such as phenols and benzoic acids, the use of primed numbers is not required because the phenyl ring is the only site for substituent groups. This is illustrated in Figure 2 with the conversion of 3-(3′,5′-dihydroxyphenyl)propanoic acid (Figure 2B) to 3,5-dihydroxybenzoic acid (Figure 2C) and 3-hydroxybenzoic acid (Figure 2E).
The advantage of the prime symbol is clear with a structure such as 3-(3′-hydroxyphenyl)propanoic acid, but one could argue it is not entirely necessary to use a primed number for 3-(4′-hydroxyphenyl)propanoic acid because there are only 3 carbons in the side chain and no risk of assigning a bond position twice. However, this can become confusing when there are ≥2 substituents in the ring, as in 3-(3′,4′-dihydroxyphenyl)propanoic acid, or if 1 of the hydroxyls in the phenolic ring is a biological conjugate, as in 3-(3′-hydroxyphenyl)propanoic acid-4′-sulfate, a feature discussed more extensively later. Table 1 outlines the proposed and alternatively accepted nomenclature for reference and for integration of published works and databases. Examples of the proposed nomenclature are provided in column 1 of Table 1, with equally accurate but alternative "nonprime" nomenclature, often found in online databases, presented in column 2 (superscript 8), followed by common or trivial names, and, finally, in italics and brackets, the often reported nomenclature for phase II conjugates that is inaccurate because of the double assigning of carbon atoms, as further discussed later. The use of trivial names and inaccurate nomenclature should be avoided to prevent further confusion in the literature and online databases. Column 3 contains CAS numbers (https://www.cas.org/support/documentation/chemical-substances), and column 4 includes the IUPAC nomenclature followed in parentheses by standard InChIKey chemical identifiers (https://iupac.org/who-we-are) and monoisotopic mass values. Figure 3 illustrates how the numbering of the ring carbons might change following metabolism.
For example, when the 2′-OH is removed from 3-(2′,4′,5′-trihydroxyphenyl)propanoic acid (Figure 3A) and the 3-carbon side chain is shortened, this produces 3,4-dihydroxybenzoic acid (Figure 3B) because in assigning numbers to the carbons bearing hydroxyl groups, it is necessary to take the shortest route from the carbon bearing the carboxyl moiety. This also applies to 3-(3′-hydroxy-5′-methoxyphenyl)propanoic acid (Figure 3C), in which the numbering of the hydroxyl is given priority over the methoxy group. When the hydroxyl is removed and the side chain shortened, the C6-C1 product is 3-methoxybenzoic acid (Figure 3D) rather than 5-methoxybenzoic acid. The terms hydracrylic acid and lactic acid describe isomeric saturated C3 side chains bearing a hydroxyl, at C-3 for the hydracrylic acid side chain but at C-2 for a lactic acid side chain. These compounds are referred to as 3-hydroxy-3-(phenyl)propanoic acids and 2-hydroxy-3-(phenyl)propanoic acids, respectively. The alternative non-IUPAC names that are often used are 3-(phenyl)-3-hydroxypropionic and 2-hydroxy-3-(phenyl)propionic acids, respectively (Table 1, sections 6 and 8). For both of these types of compounds, there are R- and S-isomers determined by the orientation of the side-chain hydroxyl group.

Phase II Metabolites of Phenolic Catabolites
The correct designation of mammalian phase II conjugates, in which a hydroxyl has been conjugated with a sulfate, glucuronide, or methyl group during the metabolism of dietary (poly)phenols, often causes confusion. For example, if 5-(3′,4′-dihydroxyphenyl)valeric acid is conjugated with a glucuronide moiety, the substrate effectively loses one hydroxyl and becomes 5-(3′-hydroxyphenyl)valeric acid-4′-glucuronide or 5-(4′-hydroxyphenyl)valeric acid-3′-glucuronide, depending on which hydroxyl is conjugated. It is essential to be able to distinguish such isomers. To use 5-(3′,4′-dihydroxyphenyl)valeric acid-3′-glucuronide would be inaccurate and is not recommended nomenclature because the 3′ position is assigned twice. However, in order to facilitate interpretation and translation of pre-existing publications and databases, such "double assignments" are shown, italicized and in brackets, in column 2 of Table 1, because they are common in the literature. "Double assignment" is also encountered when catabolites are described using trivial names, for example, dihydrocaffeic acid, dihydrocaffeic acid-3-glucuronide, and dihydrocaffeic acid-4-glucuronide. This should also be avoided, and we recommend 3-(3′,4′-dihydroxyphenyl)propanoic acid, 3-(4′-hydroxyphenyl)propanoic acid-3′-glucuronide, and 3-(3′-hydroxyphenyl)propanoic acid-4′-glucuronide, respectively. Again, for the purposes of translation, trivial names are shown in column 2 of Table 1.
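The "double assignment" problem described above can be screened for mechanically. The Python sketch below is a rough illustration under stated assumptions: the function name `double_assigned_positions` is hypothetical, names are assumed to follow the primed style used in this article with the ring substituents inside the first pair of parentheses, and the ASCII apostrophe stands in for the prime symbol. It flags any primed ring position that appears both inside the parentheses and again on the conjugate suffix.

```python
import re
from typing import Set

# A primed locant such as 3' or 4' (ASCII apostrophe used as the prime).
PRIMED_LOCANT = re.compile(r"(\d+)'")


def double_assigned_positions(name: str) -> Set[str]:
    """Return ring positions used both for a ring substituent and a conjugate."""
    name = name.replace("\u2032", "'")         # typographic prime -> apostrophe
    ring_part = re.search(r"\(([^)]*)\)", name)  # text inside first parentheses
    if ring_part is None:
        return set()
    ring_positions = set(PRIMED_LOCANT.findall(ring_part.group(1)))
    conjugate_positions = set(PRIMED_LOCANT.findall(name[ring_part.end():]))
    return ring_positions & conjugate_positions


if __name__ == "__main__":
    examples = [
        "5-(3',4'-dihydroxyphenyl)valeric acid-3'-glucuronide",  # not recommended
        "5-(4'-hydroxyphenyl)valeric acid-3'-glucuronide",       # recommended
    ]
    for example in examples:
        print(example, "->", sorted(double_assigned_positions(example)))
```

A check of this kind cannot repair a name, because the intended structure has to be confirmed first, but it can highlight database entries that need manual review.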
It is common when describing conjugated molecules to define the nature of the atom to which the conjugating moiety is attached, for example, O-glucuronide, O-sulfate, and O-methyl. This is necessary in any situation in which alternative points of attachment might be encountered, but as far as the authors are aware, there are no reported human glucuronide or sulfate metabolites of dietary (poly)phenols that are conjugated to a carbon atom. Accordingly, the use of "O" is redundant, albeit not incorrect, but use of the lowercase "o" is incorrect. We also recommend that methyl conjugates be referred to as "methoxy-" rather than "O-methyl" derivatives, for example, 3-(3′-hydroxy-4′-methoxyphenyl)propanoic acid.
The issue of conjugation is often further complicated by commercial vendors describing synthetic reference standards using inaccurate nomenclature. For example, many companies and online databases present the phase II metabolite of 3-(3′,4′-dihydroxyphenyl)propanoic acid as 3-(3′,4′-dihydroxyphenyl)propanoic acid-3′-glucuronide. As noted previously, a bond position for (poly)phenols and phenolic metabolites should not be assigned twice in structural nomenclature. The accurate representation for the structure under the proposed nomenclature is 3-(4′-hydroxyphenyl)propanoic acid-3′-glucuronide. As another example, phenyl-γ-valerolactone conjugates are often inaccurately annotated as 5-(3′,4′-dihydroxyphenyl)-γ-valerolactone-4′-glucuronide, although the correct nomenclature is 5-(3′-hydroxyphenyl)-γ-valerolactone-4′-glucuronide (14). The same applies to the naming of sulfate and methoxy metabolites. In addition, Greek symbols such as "α," "β," and "γ" may be presented as their textual equivalents "alpha," "beta," and "gamma" in some databases in which symbols may not be compatible with certain software or programs.
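The substitution of Greek symbols by their textual equivalents can be expressed as a small mapping. This Python sketch is illustrative only; the names `GREEK_TO_TEXT` and `spell_out_greek` are hypothetical and the mapping covers just the three letters mentioned above.

```python
# Textual equivalents for the Greek letters mentioned in the text.
GREEK_TO_TEXT = {"\u03b1": "alpha", "\u03b2": "beta", "\u03b3": "gamma"}


def spell_out_greek(name: str) -> str:
    """Replace the Greek letters alpha, beta and gamma with their spelled-out forms."""
    for symbol, text in GREEK_TO_TEXT.items():
        name = name.replace(symbol, text)
    return name


if __name__ == "__main__":
    print(spell_out_greek("5-(3'-hydroxyphenyl)-\u03b3-valerolactone-4'-glucuronide"))
```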

Isomers
Dietary (poly)phenols, their catabolites, and phase II conjugates can display multiple forms of isomerism, and because biological activity is a function of the 3D structure, it is important to discriminate between isomers and describe their structure clearly and unambiguously. It is recommended that after determining the general nature of a metabolite, the number of possible isomers is calculated before assigning a more precise structure. The possible number may be surprisingly high, and if there are insufficient data to allow unequivocal discrimination between the possible structures, the description applied must make this clear. There may be good reasons to eliminate some of these possibilities and favor others, and this can be made clear in discussion, but overprecise assignment must be avoided. Previous detailed recommendations on this topic, which we endorse, were provided by Sumner et al. (16).
It is easy to overlook just how many isomers there might be even without the added complication of conjugation and geometric isomerization, because there may be regio- and stereoisomers. Although perhaps not immediately obvious, there are ≥27 C6-C3 compounds having two hydroxyls and an exact mass of 182.0579 Da (Figure 6), and although they are not all typical urinary catabolites, it is important to be able to distinguish between them because each of these structures could have very different biological activities. Furthermore, two stereoisomers are also possible for the C6-C3 phenylpropanoic acids having a side-chain hydroxyl group at C-2 or C-3 (Figure 6, structures 1-6 and 12-15). Racemization (that is, conversion of an S-enantiomer to an R-enantiomer or the reverse) is possible at least for structures 1-3 in Figure 6 and is known to occur during mitochondrial β-oxidation of fatty acids, generally producing a small excess of the R-enantiomer (17). If it is uncertain which isomer is present, or if it is likely to be an unresolvable mixture, the structure can be described as, for example, 3R/S, and the structure can be drawn with the "wiggly" bond as shown in structures 1-6 and 12-15 (Figure 6). The R and S configuration can be critical in determining the interaction of a compound with a protein, transporter, or enzyme and hence is critical for biological activity. As an example of the importance of this
It is clear that the multiplicity of isomers is a problem, not only of nomenclature but also for the identification of metabolites in biological samples, because without the use of appropriate reference standards, the potential for misidentification is considerable. As noted previously, commercial vendors can often be lax with regard to nomenclature, and it is relatively common for samples to be impure or even incorrectly described. Even with appropriate reference standards, exact identification may not be possible if isomers are not chromatographically resolved. In such cases, extreme caution should be used when describing the metabolite, and using "tentative identification" or "partial identification," rather than "identification," is recommended. A feature of the older literature that is still valid today when the exact structure of a particular isomer is not known is to use the empirical formula (C9H10O4), a nominal molecular mass (182 Da), and a general name, for example, dihydroxy C6-C3 metabolite. In this example, all the compounds in Figure 5 would be encompassed because no positional assignment is provided for the hydroxyl groups. In such circumstances, the Metabolomics Standards Initiative proposals of Sumner et al. (16) are of value. From the previous account, it is clear that the multiplicity of synonyms and variations in nomenclature can be confusing for even the seasoned biochemist, and standardization is essential.
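The relationship between the empirical formula C9H10O4, its nominal mass (182 Da), and the exact mass quoted earlier (182.0579 Da) can be checked with a short calculation. The Python sketch below is a minimal illustration: the function names are hypothetical, the parser handles only simple formulas without brackets or charges, and the isotope masses are standard values for the most abundant isotopes, rounded to six decimal places.

```python
import re

# Masses of the most abundant isotopes (Da), rounded to six decimals.
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "S": 31.972071}
# Integer (nominal) masses of the same isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C9H10O4' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[e] * n for e, n in parse_formula(formula).items())


def nominal_mass(formula: str) -> int:
    """Sum the integer masses of all atoms in the formula."""
    return sum(NOMINAL[e] * n for e, n in parse_formula(formula).items())


if __name__ == "__main__":
    formula = "C9H10O4"  # dihydroxy C6-C3 metabolite, isomer unspecified
    print(formula, nominal_mass(formula), round(monoisotopic_mass(formula), 4))
```

Because every positional isomer shares this formula, the mass identifies the group of candidate structures but cannot distinguish between them.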

Conclusion
The proposals made in this article will help establish a convenient, clear, and unambiguous nomenclature that is relevant to studies on microbiota-mediated breakdown of dietary (poly)phenols and their phase II metabolites. In publications, the use of fully characterized accurate names is recommended along with, whenever possible, ≥2 confirmatory identifiers taken from CAS numbers, InChIKey, and the IUPAC name, in addition to monoisotopic mass, because this allows others to locate the structure in question using recognized online databases. Furthermore, where analytical reference standards are not available and precise identification has not been possible, the metabolite should be described as fully as possible and the assignment downgraded to "tentative identification" or "partial identification." The proposed standardization of nomenclature will be of value to researchers because national funding bodies are beginning to require that studies make their source data publicly available using open access data repositories. It will also be of value in literature searches for meta-analyses and systematic reviews. In such circumstances, standardized nomenclature will undoubtedly help researchers establish the untapped potential of (poly)phenols to modulate human health.
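The recommendation to report ≥2 confirmatory identifiers alongside the monoisotopic mass could be enforced when preparing data for deposition. The Python sketch below is one possible approach, not a prescribed workflow: the class `MetaboliteReport` and its methods are hypothetical, the InChIKey check tests only the general 14-10-1 letter layout, and the CAS check uses the registry number's check digit. The usage example deliberately uses water (CAS 7732-18-5), a generic compound unrelated to Table 1, purely to exercise the format checks.

```python
import re
from dataclasses import dataclass
from typing import Optional

INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def valid_cas(cas: str) -> bool:
    """Check the three-part layout and the check digit of a CAS number."""
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... from right to left; the sum mod 10 is the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def valid_inchikey(key: str) -> bool:
    """Check only the general layout of an InChIKey (14-10-1 uppercase letters)."""
    return INCHIKEY_PATTERN.match(key) is not None


@dataclass
class MetaboliteReport:
    name: str
    monoisotopic_mass: float
    cas: Optional[str] = None
    inchikey: Optional[str] = None
    iupac_name: Optional[str] = None

    def confirmatory_identifiers(self) -> int:
        """Count well-formed identifiers among CAS, InChIKey and IUPAC name."""
        count = 0
        if self.cas and valid_cas(self.cas):
            count += 1
        if self.inchikey and valid_inchikey(self.inchikey):
            count += 1
        if self.iupac_name:
            count += 1
        return count

    def meets_recommendation(self) -> bool:
        """True when at least 2 confirmatory identifiers are present."""
        return self.confirmatory_identifiers() >= 2


if __name__ == "__main__":
    # Water is used only as a generic, well-known format example.
    report = MetaboliteReport("water", 18.010565, cas="7732-18-5",
                              inchikey="XLYOFNOQVPJJNP-UHFFFAOYSA-N")
    print(report.meets_recommendation())
```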