Mitja M Zdouc, Kai Blin, Nico L L Louwen, Jorge Navarro, Catarina Loureiro, Chantal D Bader, Constance B Bailey, Lena Barra, Thomas J Booth, Kenan A J Bozhüyük, José D D Cediel-Becerra, Zachary Charlop-Powers, Marc G Chevrette, Yit Heng Chooi, Paul M D’Agostino, Tristan de Rond, Elena Del Pup, Katherine R Duncan, Wenjia Gu, Novriyandi Hanif, Eric J N Helfrich, Matthew Jenner, Yohei Katsuyama, Aleksandra Korenskaia, Daniel Krug, Vincent Libis, George A Lund, Shrikant Mantri, Kalindi D Morgan, Charlotte Owen, Chin-Soon Phan, Benjamin Philmus, Zachary L Reitz, Serina L Robinson, Kumar Saurabh Singh, Robin Teufel, Yaojun Tong, Fidele Tugizimana, Dana Ulanova, Jaclyn M Winter, César Aguilar, Daniel Y Akiyama, Suhad A A Al-Salihi, Mohammad Alanjary, Fabrizio Alberti, Gajender Aleti, Shumukh A Alharthi, Mariela Y Arias Rojo, Amr A Arishi, Hannah E Augustijn, Nicole E Avalon, J Abraham Avelar-Rivas, Kyle K Axt, Hellen B Barbieri, Julio Cesar J Barbosa, Lucas Gabriel Barboza Segato, Susanna E Barrett, Martin Baunach, Christine Beemelmanns, Dardan Beqaj, Tim Berger, Jordan Bernaldo-Agüero, Sandra M Bettenbühl, Vincent A Bielinski, Friederike Biermann, Ricardo M Borges, Rainer Borriss, Milena Breitenbach, Kevin M Bretscher, Michael W Brigham, Larissa Buedenbender, Brodie W Bulcock, Carolina Cano-Prieto, João Capela, Victor J Carrion, Riley S Carter, Raquel Castelo-Branco, Gabriel Castro-Falcón, Fernanda O Chagas, Esteban Charria-Girón, Ayesha Ahmed Chaudhri, Vasvi Chaudhry, Hyukjae Choi, Yukyung Choi, Roya Choupannejad, Jakub Chromy, Melinda S Chue Donahey, Jérôme Collemare, Jack A Connolly, Kaitlin E Creamer, Max Crüsemann, Andres Arredondo Cruz, Andres Cumsille, Jean-Felix Dallery, Luis Caleb Damas-Ramos, Tito Damiani, Martinus de Kruijff, Belén Delgado Martín, Gerardo Della Sala, Jelle Dillen, Drew T Doering, Shravan R Dommaraju, Suhan Durusu, Susan Egbert, Mark Ellerhorst, Baptiste Faussurier, Artem Fetter, Marc Feuermann, David P Fewer, Jonathan Foldi, Andri Frediansyah, 
Erin A Garza, Athina Gavriilidou, Andrea Gentile, Jennifer Gerke, Hans Gerstmans, Juan Pablo Gomez-Escribano, Luz A González-Salazar, Natalie E Grayson, Claudio Greco, Juan E Gris Gomez, Sebastian Guerra, Shaday Guerrero Flores, Alexey Gurevich, Karina Gutiérrez-García, Lauren Hart, Kristina Haslinger, Beibei He, Teo Hebra, Jethro L Hemmann, Hindra Hindra, Lars Höing, Darren C Holland, Jonathan E Holme, Therese Horch, Pavlo Hrab, Jie Hu, Thanh-Hau Huynh, Ji-Yeon Hwang, Riccardo Iacovelli, Dumitrita Iftime, Marianna Iorio, Sidharth Jayachandran, Eunah Jeong, Jiayi Jing, Jung J Jung, Yuya Kakumu, Edward Kalkreuter, Kyo Bin Kang, Sangwook Kang, Wonyong Kim, Geum Jin Kim, Hyunwoo Kim, Hyun Uk Kim, Martin Klapper, Robert A Koetsier, Cassandra Kollten, Ákos T Kovács, Yelyzaveta Kriukova, Noel Kubach, Aditya M Kunjapur, Aleksandra K Kushnareva, Andreja Kust, Jessica Lamber, Martin Larralde, Niels J Larsen, Adrien P Launay, Ngoc-Thao-Hien Le, Sarah Lebeer, Byung Tae Lee, Kyungha Lee, Katherine L Lev, Shu-Ming Li, Yong-Xin Li, Cuauhtémoc Licona-Cassani, Annette Lien, Jing Liu, Julius Adam V Lopez, Nataliia V Machushynets, Marla I Macias, Taifo Mahmud, Matiss Maleckis, Añadir Maharai Martinez-Martinez, Yvonne Mast, Marina F Maximo, Christina M McBride, Rose M McLellan, Khyati Mehta Bhatt, Chrats Melkonian, Aske Merrild, Mikko Metsä-Ketelä, Douglas A Mitchell, Alison V Müller, Giang-Son Nguyen, Hera T Nguyen, Timo H J Niedermeyer, Julia H O’Hare, Adam Ossowicki, Bohdan O Ostash, Hiroshi Otani, Leo Padva, Sunaina Paliyal, Xinya Pan, Mohit Panghal, Dana S Parade, Jiyoon Park, Jonathan Parra, Marcos Pedraza Rubio, Huong T Pham, Sacha J Pidot, Jörn Piel, Bita Pourmohsenin, Malik Rakhmanov, Sangeetha Ramesh, Michelle H Rasmussen, Adriana Rego, Raphael Reher, Andrew J Rice, Augustin Rigolet, Adriana Romero-Otero, Luis Rodrigo Rosas-Becerra, Pablo Y Rosiles, Adriano Rutz, Byeol Ryu, Libby-Ann Sahadeo, Murrel Saldanha, Luca Salvi, Eduardo Sánchez-Carvajal, Christian Santos-Medellin, 
Nicolau Sbaraini, Sydney M Schoellhorn, Clemens Schumm, Ludek Sehnal, Nelly Selem, Anjali D Shah, Tania K Shishido, Simon Sieber, Velina Silviani, Garima Singh, Hemant Singh, Nika Sokolova, Eva C Sonnenschein, Margherita Sosio, Sven T Sowa, Karin Steffen, Evi Stegmann, Alena B Streiff, Alena Strüder, Frank Surup, Tiziana Svenningsen, Douglas Sweeney, Judit Szenei, Azat Tagirdzhanov, Bin Tan, Matthew J Tarnowski, Barbara R Terlouw, Thomas Rey, Nicola U Thome, Laura Rosina Torres Ortega, Thomas Tørring, Marla Trindade, Andrew W Truman, Marie Tvilum, Daniel W Udwary, Christoph Ulbricht, Lisa Vader, Gilles P van Wezel, Max Walmsley, Randika Warnasinghe, Heiner G Weddeling, Angus N M Weir, Katherine Williams, Sam E Williams, Thomas E Witte, Steffaney M Wood Rocca, Keith Yamada, Dong Yang, Dongsoo Yang, Jingwei Yu, Zhenyi Zhou, Nadine Ziemert, Lukas Zimmer, Alina Zimmermann, Christian Zimmermann, Justin J J van der Hooft, Roger G Linington, Tilmann Weber, Marnix H Medema, MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration, Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D678–D690, https://doi.org/10.1093/nar/gkae1115
Abstract
Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/.

Introduction
Many organisms are prolific producers of small molecules known as specialized or secondary metabolites (SMs). These molecules often show a diversity of potent biological activities, which have been leveraged for the development of numerous drugs (1,2). SMs are generally hypothesized to increase the fitness of the producing organism or its host. In microbes, the biosynthetic genes required for the production of an SM are co-regulated and frequently physically clustered in the genome, in a so-called biosynthetic gene cluster (BGC), and often transferred horizontally (3). BGCs, which by definition consist of two or more genes, encode the proteins/enzymes used in biosynthesis, resistance and regulation of SMs and are the object of ‘genome mining’ strategies that leverage analysis of genome sequence data for the discovery of (novel) metabolites (4).
Over the last decades, various methods using manually curated detection rules based on prior knowledge (5–7), and more recently, machine learning-based tools for genome mining have been developed (8–12). These tools rely on accurately curated and machine-readable experimental data for annotation, rule definition and training purposes. Unfortunately, machine-readable data are neither readily available from the scientific literature nor universally required by journals to be directly deposited in databases. While there are efforts to mine data from the literature using computational methods (13,14), these approaches currently often come with limitations when compared with human curators and may not be compatible with copyright laws. Therefore, manual data curation performed by researchers remains the gold standard for the generation of machine-readable data.
The largest manually curated resource on SM BGCs is the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) data repository (15). Initiated in 2015 and based on the MIBiG Data Standard, it now holds over 2500 hand-curated entries of experimentally validated BGCs and their products, alongside additional information such as biological activities and gene annotations. In rare cases, a single gene may be responsible for the biosynthesis of a natural product, such as a large non-ribosomal peptide synthetase; these standalone genes are also entered into MIBiG due to their relevance to specialized metabolism. Conceptualized as an open data repository curated by and for the SM community, the MIBiG repository has seen three iterations of online community-driven data annotation and curation hackathons (also known as ‘annotathons’), with >250 participants from 33 countries (16,17). Despite its size, the MIBiG repository still only describes a part of the continuously growing known biosynthetic space, which motivates further efforts in curating and systemizing information on BGCs.
Here, we present version 4.0 of the MIBiG data standard and repository. Besides a thorough update of the underlying MIBiG data standard, we have substantially grown the number of available entries by initiating a large-scale community curation effort. In the first half of 2024, 267 contributors created 557 new entries and modified 590 existing entries in the scope of eight community annotathons (six general open events and two final data curation sessions with a more dedicated team). In this version of MIBiG, we focused on maintaining and further improving data quality in terms of completeness and accuracy. We encouraged contributors to fully complete entries before submission, which has significantly decreased the number of so-called minimum entries (entries with only the minimally required information) in the database. We also introduced a new peer-review model where modifications to entries are examined and approved by one or more volunteer expert reviewers, who can request corrections from data submitters. Additionally, we have established an initial prototype for efficient and standardized data submission, and during the annotathons we utilized a web interface (MIBiG Submission Portal) that allows for parallel, distributed data input featuring automated input validation. The latter refers to the tests that are performed by the submission portal itself to ensure the correct data types and formats are filled in. Together, these efforts further consolidate MIBiG as the leading database on experimentally characterized BGCs and prepare for the transition to a dynamic, rolling-release curation model.
Materials and methods
Rework of the MIBiG Data Standard
The MIBiG Data Standard (from here onwards, Data Standard) is the ‘blueprint’ of all allowed data in the MIBiG repository. It defines mandatory and optional data fields, allows the use of controlled vocabularies and automated validation and enables the organization of complex data in a consistent, human- and machine-readable way. In this update, we extensively revised the Data Standard to accommodate advances in SM research and to extend the scope and ease of (re-)use of covered (meta)data.
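To make the combination of mandatory fields, optional fields and controlled vocabularies concrete, the following Python sketch validates a toy entry. The field names (`accession`, `biosynthesis_classes`, `compounds`, `comment`, `evidence`) and the second vocabulary term are illustrative assumptions, not the actual Data Standard, which should be consulted in the repository listed under Data availability.

```python
# Toy data standard. All field names are hypothetical and only
# illustrate the principle of schema-driven validation.
MANDATORY_FIELDS = {"accession": str, "biosynthesis_classes": list, "compounds": list}
OPTIONAL_FIELDS = {"comment": str, "evidence": list}

# 'heterologous expression' is an example term from the text;
# 'gene knock-out' is an invented placeholder term.
EVIDENCE_VOCABULARY = {"heterologous expression", "gene knock-out"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passed."""
    problems = []

    # Mandatory fields must be present and have the expected type.
    for name, expected in MANDATORY_FIELDS.items():
        if name not in entry:
            problems.append(f"missing mandatory field: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")

    # Optional fields are only type-checked when present.
    for name, expected in OPTIONAL_FIELDS.items():
        if name in entry and not isinstance(entry[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")

    # Anything outside the standard is rejected.
    for name in entry:
        if name not in MANDATORY_FIELDS and name not in OPTIONAL_FIELDS:
            problems.append(f"unknown field: {name}")

    # Controlled vocabulary check for evidence qualifiers.
    evidence = entry.get("evidence", [])
    if isinstance(evidence, list):
        for term in evidence:
            if term not in EVIDENCE_VOCABULARY:
                problems.append(f"evidence term not in vocabulary: {term}")

    return problems
```

Returning a list of problems rather than raising on the first error mirrors how a submission form would typically report all invalid fields at once.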
Literature references and evidence qualifiers
Previously, all literature references associated with a MIBiG entry were collected in a single block, making it difficult to locate the origin of specific experimental data. In this update, we reorganized the Data Standard such that each data category (e.g. biosynthetic information or compound details) has its own list of literature references. Furthermore, evidence qualifiers can be selected from a controlled vocabulary (e.g. ‘heterologous expression’) that concisely summarizes the experimental support for the claims. While newly added entries adhere to these changes, entries added in previous versions of MIBiG still follow the legacy format, and will be updated gradually over time. To summarize the data quality of an entry concisely, we also introduced a ‘Quality’ identifier, and it is possible to filter entries based on high, medium or questionable quality of data. Note that this label only reflects the presumed data quality of a MIBiG entry and does not address the quality of the underlying literature.
Biosynthesis information, multiple loci and class updates
Biosynthetic information is now organized in a ‘biosynthesis’ section, tracking biosynthetic types, modules, operons and the newly introduced ‘biosynthetic path’, which allows contributors to describe cases where a single BGC can lead to multiple products or describe sub-clusters of genes that produce building blocks. The ‘multiple loci’ system has been re-introduced, allowing the specification of satellite genes or gene clusters that are involved in the biosynthesis but are not clustered with the ‘main’ BGC. Nevertheless, we still require that multiple biosynthetic genes are clustered in the same genomic region, to exclude non-clustered pathways. Furthermore, it is now possible to mark genes that are located within the boundaries of a BGC but do not partake in the biosynthesis, such as pseudo-genes or transposable elements. Additionally, we have separated biosynthetic classification from compound classification (e.g. we removed ‘alkaloid’ as a biosynthetic class) and introduced a custom biosynthesis-inspired chemical ontology for SMs (Supplementary Data 1, section 3.4) based on the work by Dewick (1). Furthermore, we have newly defined the non-ribosomal peptide synthetase Type VI (modular, non-condensation-domain peptide-bond-forming), extending the current classification (18).
Biological activity and resource integration
MIBiG also accepts additional BGC-related data. In this update, we have reworked fields registering the biological activity of BGC-associated SMs: activities are now considered properties of a specific assay, and a controlled vocabulary (Supplementary Data 1, section 3.3) is available for defining bioactivity in a reproducible way. Additionally, we have included an optional ‘Concentration’ field, allowing submission of both qualitative and quantitative bioactivity data. At the same time, additional metadata parameters increase the scope of the already extensive Data Standard, and as such MIBiG references external resources where possible. Newly introduced links include references to the Minimum Information about a Tailoring Enzyme (MITE) data repository for annotation of tailoring enzyme-encoding genes (19), and CyanoMetDB for compound information on cyanobacterial SMs (20).
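The reworked bioactivity model can be pictured as a small record type: an activity term from a controlled vocabulary is attached to one assay, and the concentration is optional, so the same structure holds qualitative and quantitative observations. In the Python sketch below, the class name, attribute names, vocabulary terms and unit handling are assumptions made for illustration; the actual vocabulary is given in Supplementary Data 1, section 3.3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of a controlled vocabulary of activities.
ACTIVITY_VOCABULARY = {"antibacterial", "antifungal", "cytotoxic"}


@dataclass(frozen=True)
class AssayResult:
    """One observed activity of one compound in one assay."""

    compound: str
    activity: str
    concentration: Optional[float] = None  # omitted for qualitative data
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        # Only vocabulary terms are accepted, keeping records comparable.
        if self.activity not in ACTIVITY_VOCABULARY:
            raise ValueError(f"activity not in vocabulary: {self.activity!r}")
        # A number without a unit (or a unit without a number) is ambiguous.
        if (self.concentration is None) != (self.unit is None):
            raise ValueError("concentration and unit must be given together")
        if self.concentration is not None and self.concentration <= 0:
            raise ValueError("concentration must be positive")

    @property
    def is_quantitative(self) -> bool:
        return self.concentration is not None
```

For example, `AssayResult("compound A", "antibacterial")` records a qualitative observation, while `AssayResult("compound A", "antibacterial", 2.5, "uM")` records a quantitative one.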
Community mobilization and data curation
Inspired by the contributions made to MIBiG 3.0, we again sought participation from the scientific community. Following calls on social media, 398 researchers signed up to participate in a series of eight 3-h online annotation sessions, accommodating different time zones (Figure 1). This enormous interest posed organizational challenges in terms of coordination and communication, prompting us to develop a new model for community participation. Individual contributors were part of one or more Interest Groups that communicated using the MIBiG Slack (https://mibigannotathons.slack.com/) channel and were headed by Interest Group Coordinators: subject matter experts responsible for answering biosynthesis- and chemistry-related questions. Kanban-style boards (free version of Trello, https://trello.com/) were employed to coordinate work on entries. Data submission was performed using a MIBiG Submission Portal prototype, a bespoke web interface that uses validated fields for data processing (code available at https://github.com/nlouwen/submission-prototype). Several curators with relevant expertise volunteered to take Reviewer roles, focusing on assessing the quality of newly generated or modified entries using the newly introduced peer review system. To further improve the quality and confidence of entries, Reviewers could leverage the Kanban-style boards (Figure 2) to request revisions of entries if errors were found. To facilitate data curation, we prepared extensive online documentation (Supplementary Data 1) and instructional videos, and trained Interest Group Coordinators and Reviewers for their roles in online meetings. Participants who made a significant contribution (defined as participating in at least two 3-h sessions or an equivalent time investment) were invited to be co-authors of the present publication.

Figure 1. General workflow of the MIBiG annotation process. Data are submitted by annotathon contributors (organized by expertise into Interest Groups) or independent submitters to the database from new experimental data or existing/recent literature. The entries are then assessed by reviewers and revised when needed. Finally, they end up in the online MIBiG repository and become accessible by querying them online on the MIBiG web page or via interoperable tools.

Figure 2. Architecture of the Kanban board used for the MIBiG annotathons. Every BGC would have its own ‘card’, where annotators with specific expertise could fill in and then check a specific part of its annotation. Once the checklist was complete, the card would move to review and, potentially, revision to repair any issues identified by the reviewers.

Figure 3. Quantitative overview of updates to the MIBiG database in comparison with the previous version 3.1. Numbers in panel (a) refer to MIBiG entries, while numbers in panel (b) refer to individual compounds (a single MIBiG entry may contain more than one compound).
Results and discussion
Advancing the MIBiG data repository
In this iteration of the MIBiG annotathons, we put a greater emphasis on self-organization and facilitating motivated contributors to act independently and confidently when curating data. During the call for participation, researchers not only signed up to participate, but also contributed to assembling a list of recent publications associated with the biosynthesis of SMs. This initial effort yielded 552 publications supporting new entries and 266 publications for improvements of existing entries, which were used as a starting point for the curation process. Over the course of the annotathons, 267 contributors made a total of 8304 edits (e.g. adding an entirely new entry, adding biological activity to an existing entry, etc.), resulting in 557 new and 590 modified existing entries. With the present update, MIBiG now contains a total of 3059 entries, a 22% increase in comparison to MIBiG 3.0. Of these, 1655 entries are now associated with 3604 biological activities, and 2634 entries have 5002 associated chemical structures. However, 672 entries still lack chemical structures; hence, future efforts will include attention to improving this aspect, especially with regard to structural information for ribosomally synthesized and post-translationally modified peptides. Additionally, 7677 references and 8582 evidence qualifiers were provided, 171 biosynthetic paths were described for 110 entries and cross-references to 173 MITE and 93 CyanoMetDB entries were established. A summary of the changes in comparison to MIBiG version 3.1 can be seen in Figure 3.
Of the total 1147 contributed entries (557 new, 590 modified), 464 (40%) have been reviewed at the time of manuscript preparation. While all entries are available, those that are reviewed are highlighted in the MIBiG repository website to reflect the additional confidence. For applications using the MIBiG data where a high confidence level is required (e.g. machine learning applications), we recommend the use of reviewed entries only (the website facilitates filtering/sorting on this). We expect the ‘reviewed’ part of the MIBiG repository to grow continuously once we have transitioned to the MIBiG rolling release model, and over time, we aim to formally review all entries in the MIBiG repository.
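The headline proportions reported in this section can be rechecked from the stated counts. The short Python calculation below does so under one explicit assumption that is not stated in the text: the previous total is approximated as the new total minus the newly created entries, ignoring any entries that may have been retired.

```python
# Counts as reported in the text.
total_entries = 3059
new_entries = 557
modified_entries = 590
reviewed_entries = 464

# Assumption: no entries were removed between versions.
previous_total = total_entries - new_entries

growth_percent = 100 * new_entries / previous_total
contributed = new_entries + modified_entries
reviewed_percent = 100 * reviewed_entries / contributed

print(f"estimated previous total: {previous_total}")  # 2502
print(f"growth: {growth_percent:.1f}%")               # 22.3%
print(f"contributed entries: {contributed}")          # 1147
print(f"reviewed share: {reviewed_percent:.1f}%")     # 40.5%
```

Both values round to the reported 22% growth and 40% reviewed share.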
Initiating the MIBiG rolling release model
The aforementioned efforts demonstrate the value of leveraging large community initiatives such as the MIBiG annotathons. We estimate that contributors volunteered ∼4000 h in curating and reviewing entries, an effort in time and expertise that could not be raised by any single research group. Besides expanding the MIBiG repository, the annotathons were appreciated for their community-building aspect, fostering communication and exchange of ideas in the SM research community. In addition, the interaction with other resources prompted improvements to these databases as well, e.g. when curators could not find matching entries for a structure in the NP Atlas, thus encouraging wider cooperation beyond MIBiG itself. The broad interest of the community motivated the planning of a ‘rolling release’ model of MIBiG. In addition to the biennial efforts that will lead to ‘major’ releases of MIBiG (e.g. the current v4.0, or the next major release v5.0), curators will be able to contribute new or modify existing entries on an ad hoc basis, leading to quarterly ‘minor’ releases (e.g. 4.1 and 4.2). Contributors will be able to correct bugs and add references at any time, instead of waiting for the ‘major’ release cycle to perform all edits at once. This new system is currently under development, and we invite the scientific community to participate in the discussion on how to structure contributions and governance (e.g. by communicating with the corresponding authors of this publication or using the MIBiG Slack Workspace https://mibigannotathons.slack.com). Furthermore, to facilitate future MIBiG updates and curation we encourage authors to release BGC sequence data during the publication submission and peer review process, or immediately thereafter, and to provide the respective accession details in the manuscript text.
In summary, we have conducted a large-scale community effort to make experimental data on SM BGCs freely accessible and machine-readable. As a resource created for and by the scientific community, the MIBiG repository can be freely accessed on an entry-by-entry basis or downloaded and parsed in bulk. MIBiG 4.0 also serves as the stepping stone for creating the infrastructure to establish a Wikipedia-like model of continuous community curation. Such a decentralized organization will support the continuous development of MIBiG and help in including the next generations of scientists in the annotation and development process.
Data availability
The MIBiG repository is available at https://mibig.secondarymetabolites.org/. Files in JSON format following the MIBiG data standard (https://github.com/mibig-secmet/mibig-json) can be found on the MIBiG webpage (https://mibig.secondarymetabolites.org/download) and on the MIBiG Zenodo Community page (https://doi.org/10.5281/zenodo.13367755). Further materials are available on GitHub (https://github.com/mibig-secmet). All data are freely available with no restrictions for academic and commercial reuse under the Creative Commons Attribution 4.0 (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/).
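Because the entries are distributed as JSON files, bulk analysis needs little more than a standard library. The Python sketch below reads every `.json` file from a local directory and optionally keeps only entries whose value for a given key matches. The directory layout and any key names passed to `select` (for example a quality or review status key) are assumptions and should be checked against the downloaded files and the published data standard.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def iter_entries(directory: str) -> Iterator[dict]:
    """Yield one parsed JSON document per .json file, in file-name order."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)


def select(entries: Iterable[dict], key: str, wanted: Any) -> list[dict]:
    """Keep entries whose top-level `key` equals `wanted`.

    Entries that lack the key are dropped rather than raising an error,
    since older entries may not carry newer fields.
    """
    return [entry for entry in entries if entry.get(key) == wanted]
```

A call such as `select(iter_entries("mibig_json"), "quality", "high")` would then return the matching subset, where both the directory name and the key are placeholders.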
Supplementary data
Supplementary Data are available at NAR Online.
Acknowledgements
The following contributors are acknowledged for their outstanding efforts as Interest Group Coordinators (in alphabetical order): Chantal D. Bader, Constance B. Bailey, Lena Barra, Thomas J. Booth, Kenan A. J. Bozhüyük, José D. D. Cediel-Becerra, Zachary Charlop-Powers, Marc G. Chevrette, Yit Heng Chooi, Paul M. D’Agostino, Tristan de Rond, Elena Del Pup, Katherine R. Duncan, Wenjia Gu, Novriyandi Hanif, Eric J. N. Helfrich, Matthew Jenner, Yohei K. Katsuyama, Aleksandra K. Korenskaia, Daniel Krug, Vincent Libis, Roger G. Linington, George A. Lund, Shrikant Mantri, Kalindi D. Morgan, Jorge Navarro, Charlotte Owen, Chin-Soon Phan, Benjamin Philmus, Zachary L. Reitz, Serina L. Robinson, Kumar Saurabh Singh, Robin Teufel, Yaojun Tong, Fidele Tugizimana, Dana Ulanova and Jaclyn M. Winter.
Funding
M.M.Z. was supported by the NWO Grant KICH1.LWV04.21.013 and by the Horizon 2020 Grant 101000392; C.L. was supported by the NWO Open Science Project 'BiG-CODEC' No. OSF.23.1.044; C.D.B. was supported by the German Research Foundation Grant No. 547394769; C.B.B. was supported by the University of Sydney Drug Discovery Initiative; T.J.B. was supported by the Novo Nordisk Foundation Grant NNF22OC0078997; Y.H.C. was supported by the Australian Research Council Industry Fellowship IM230100154; P.M.D. was supported by the Hans Fischer Society; K.R.D. was supported by the UK Government Department for Environment, Food & Rural Affairs (DEFRA) Global Centre on Biodiversity for the Climate and by the United Kingdom Research and Innovation (EP/X03142X/1) and by the Horizon Europe Marie Skłodowska-Curie grant agreement No 101072485; N.H. was supported by the Indonesia Endowment Fund for Education Agency (LPDP) and National Research and Innovation Agency (BRIN) of the Republic of Indonesia (106/IV/KS/11/2023 and 41644/IT3/PT.01.03/P/B/2023) and by the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia (027/E5/PG.02.00.PL/2024); M.J. was supported by the UKRI Future Leaders Fellowship (MR/W011247/1); A.Korenskaia was supported by the Horizon Europe Marie Skłodowska-Curie grant agreement No 101072485; V.L. was supported by the ERC Starting Grant 101117891-MeDiSyn and by the Agence Nationale de la Recherche project ANR-22-CE44-0011-01 UMISYN; G.A.L. was supported by the Growing Health Institute Strategic Programme (BB/X010953/1; BBS/E/RH/230003B); S.M. was supported by the Department of Biotechnology (DBT), Government of India and by the National Agri-Food Biotechnology Institute (NABI); C-S.P. was supported by the EU project No. 101087181 (Natural Products Research at Latvian Institute of Organic Synthesis as a Driver for Excellence in Innovation); R.T. was supported by the Swiss National Science Foundation (SNSF), grant 212747; Y.T. 
was supported by the National Key Research and Development Program of China (2021YFA0909500) and by the National Natural Science Foundation of China (32170080 and 32370026) and by the Shanghai Pilot Program for Basic Research - Shanghai Jiao Tong University; D.U. was supported by the Japan Society for Promotion of Science KAKENHI grant number 21K06336; D.Y.A. was supported by the São Paulo Research Foundation (FAPESP) research scholarship (grant 21/07038-0); M.A. was supported by the NWO Talent programme Veni science domain (VI.Veni.202.130); F.A. was supported by the UKRI Future Leaders Fellowship (MR/V022334/1); G.A. was supported by the USDA Evans-Allen Research Grant (222676); N.E.A. was supported by the National Center for Complementary and Integrative Health of the NIH under award number F32AT011475; H.B.B. was supported by the São Paulo Research Foundation (FAPESP) research scholarship (grant 21/08947-3); S.E.B. was supported by the National Science Foundation Graduate Research Fellowship (DGE 21-46756) and by the University of Illinois Urbana-Champaign Illinois Distinguished Fellowship; C.B. was supported by the European Union Horizon 2020 research and innovation program (ERC Grant number: 802736, MORPHEUS); J.B-A. was supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT) [735867]; K.M.B. was supported by the NWO Merian fund (Micro-GRICE); M.W.B. was supported by the United Kingdom Research and Innovation (UKRI) Biotechnology and Biological Sciences Research Council (BBSRC) funded White Rose Mechanistic and Structural Biology Doctoral Training Program (BB/T007222/1); L.B. was supported by the Horizon Europe Marie Skłodowska-Curie Actions Postdoctoral Fellowship funded by the European Union (Project chelOMICS - grant agreement No. 101066127); V.J.C. was supported by the Ministerio de Ciencia, Innovación y Universidades project RYC2020-029240-I; R.C-B. 
was supported by the scholarship SFRH/BD/136367/2018 by Fundação para a Ciência e a Tecnologia (FCT); G.C-F. was supported by the National Institutes of Health (NIH)/NIGMS K12 GM068524 Award as a San Diego IRACDA Scholar; E.C-G. was supported by the HZI POF IV Cooperativity and Creativity Project Call; V.C. was supported by the Alexander von Humboldt-Stiftung (Ref: 3.5-IND-1199743-HFST-P) and by the Cluster of Excellence: Controlling Microbes to Fight Infection (CMFI-YIG (EXC-2124/1-09.029_0)); H.C. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1A6A1A03044512, and NRF-2021R1A2C1010727); Y.C. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1C1C1004046 and NRF-2022R1A5A2021216); J.A.C. was supported by the Signals in the Soil program via UK Research and Innovation (UKRI; NE/T010959/1); K.E.C. was supported by the Chan Zuckerberg Initiative Foundation grant number CZIF2022-007203; M.C. was supported by the German Research Foundation (DFG) Grant No. 495740318; J-F.D. was supported by the Agence Nationale de la Recherche (ShySM grant ANR-24-CE20-7299-01) and by the EUR Saclay Plant Sciences-SPS (ANR-17-EUR-0007); L.C.D-R. was supported by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN); T.D. was supported by the European Regional Development Fund, Programme Johannes Amos Comenius project ‘IOCB MSCA PF Mobility’ no. CZ.02.01.01/00/22_010/0002733; M.dK. was supported by the European Union's Horizon 2020 research and innovation program (ERC Grant number: 802736, MORPHEUS); A.Fetter was supported by the United Kingdom Research and Innovation (EP/X03142X/1) and by the Horizon Europe Marie Skłodowska-Curie grant agreement No 101072485; M.F. 
was supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI; A.Frediansyah was supported by the Fulbright Grant (PS00349981); A.Gavriilidou was supported by the Deutsche Forschungsgemeinschaft [398967434-TRR 261]; A.Gentile was supported by the Italian Ministry of Research (Grant DM60066); H.G. was supported by the Research Foundation–Flanders (FWO) under the scope of junior postdoctoral fellowship (1229222N); L.A.G-S. was supported by the CONAHCYT scholarship (#971765); N.E.G. was supported by the NIGMS R01-GM146224, NERRS NA22NOS4200050; C.G. was supported by the BBSRC (BB/V005723/2); J.E.G.G. was supported by the CONAHCYT scholarship (#1347411); S.G. was supported by the National Institutes of Health (NIH), Grant T32GM136583. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH; L.H. was supported by the NIH F31 1F31ES036421-01; T.Hebra was supported by the European Union's Horizon Europe research and innovation program under the Marie Skłodowska-Curie grant agreement No. 101130799; T.Horch was supported by the Novo Nordisk Foundation grant-number: CFB 2.0, NNF20CC0035580; M.I. was supported by the Italian Ministry of Research (Grant DM60066); E.J. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1C1C1004046 and NRF-2022R1A5A2021216); K.B.K. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1C1C1004046 and NRF-2022R1A5A2021216); S.K. was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-RS-2024-00408499) and by the National Research Foundation of Korea grants funded by the Republic of Korean Government (Ministry of Science and ICT) (NRF-RS-2024-00352229); W.K. 
was supported by the National Research Foundation of Korea grant funded by the Korea government (No. 2022R1C1C2004118); G.J.K. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1A6A1A03044512 and NRF-2021R1A2C1010727); H.K. was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) Grants NRF 2018R1A5A2023127 and RS-2023-00211868; M.K. was supported by the Werner Siemens Foundation grant Paleobiotechnology; R.A.K. was supported by the NWO-XL (OCENW.XL21.XL21.088); A.T.K. was supported by the Danish National Research Foundation CeMiSt, DNRF137 and by the Novo Nordisk Foundation INTERACT, NNF19SA0059360; A.M.K. was supported by the U.S. National Science Foundation (CBET-2032243); A.Kust was supported by the Delta Stewardship Council Delta Science Program; S.L. was supported by the European Research Council under the European Union's Horizon 2020 Research and Innovation Program ERC St grant 852600 Lacto-Be; S-M.L. was supported by the Deutsche Forschungsgemeinschaft LI844/11-1 and LI844/14-1; A.L. was supported by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN); M.I.M. was supported by the CONAHCYT Mexico International PhD Studentship and by the Strathclyde University Global Research Scholarship; M.M. was supported by the Novo Nordisk Foundation (Grant NNF23OC0082881) and Innovation Fund Denmark (Grant 3141-00013A); Y.M. was supported by the Leibniz Association grant K445/2022; M.F.M. was supported by the São Paulo Research Foundation (FAPESP) research scholarship (grant 23/01956-2); C.M.M. was supported by the NSF GRFP (#DGE 2241144); C.M. was supported by the MiCRop Consortium (NWO/OCW grant no. 024.004.014); A.M. was supported by the Carlsberg Foundation (CF22-1239); G-S.N. 
was supported by the SINTEF internal projects POP-SEP BiocatDB (102022750), SEP AGREE (102029187) and POS BIOINFO 2024 (102024676-14), and by the European Union's Horizon 2020 research and innovation programme under Grant Agreement no. 101000392 (MARBLES), no. 101081957 (BLUETOOLS), and no. 862923 (AtlantECO); A.O. was supported by the Marie Skłodowska-Curie grant No. 101106349; B.O.O. was supported by the Ministry of Education and Science of Ukraine (BG-21F) and by the National Research Fund of Ukraine (57/0009; partial support); H.O. was supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231; L.P. was supported by the German Academic Scholarship Foundation; X.P. was supported by the NWO-XL grant OCENW.GROOT.2019.063; M.P. was supported by the Department of Biotechnology (DBT), Government of India and by the University Grants Commission (UGC), Ministry of Education, Government of India; M.P.R. was supported by the Spanish “Junta de Andalucía” project PROYEXCEL_00012; H.T.P. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1C1C1004046 and NRF-2022R1A5A2021216); S.J.P. was supported by the National Health and Medical Research Council GNT2021638 and the Australian Research Council Discovery Project DP230102668; B.P. was supported by the SECRETed EU Project Horizon 2020 (101000794); M.H.R. was supported by the European Research Council (ERC), European Union's Horizon 2020 Research and Innovation Program (grant agreement no. 865738); A.J.R. was supported by the Chemical-Biology Interface Training Grant (Grant T32-GM136629) and a National Science Foundation Graduate Research Fellowship (Grant DGE 21-46756); A.R. was supported by the ERC Advanced Grant 101055020-COMMUNITY; L.R.R-B. 
was supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT) [757173]; L.Salvi was supported by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN); E.S-C. was supported by the PhD scholarship ANID N° 21231991; L.Sehnal was supported by the Horizon Europe Marie Skłodowska-Curie Actions Postdoctoral Fellowship funded by the European Union (Project NAfrAM - grant agreement No. 10106428); A.D.S. was supported by the Biotechnology and Biological Sciences Research Council-funded South West Biosciences Doctoral Training Partnership [BB/T008741/1]; T.K.S. was supported by the Novo Nordisk Foundation (Grant number: NNF22OC0080109); V.S. was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean Government (MSIT) (Grant No. NRF-2020R1A6A1A03044512 and NRF-2021R1A2C1010727); E.C.S. was supported by the Pathfinder Open 2022, a European Innovation Council (EIC) work programme that is part of Horizon Europe (grant agreement no. 101099528) and UK Innovation Funding Agency (UKRI) (reference no. 10062709); M.S. was supported by the Italian Ministry of Research (Grant DM60066); K.S. was supported by the Swedish Pharmaceutical Society PostDoc stipend; A.B.S. was supported by the Swiss National Science Foundation (SNSF, 205320_219638); T.S. was supported by the Carlsberg Foundation (CF22-1239); J.S. was supported by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN); A.T. was supported by the Saarland University through the NextAID project; M.J.T. was supported by the Pathfinder Open 2022, a European Innovation Council (EIC) work programme that is part of Horizon Europe (grant agreement no. 101099528) and UK Innovation Funding Agency (UKRI) (reference no. 10062709); T.T. was supported by the Carlsberg Foundation (CF22-1239); A.W.T. 
was supported by the BBSRC Institute Strategic Program grant (BB/X01097X/1); M.T. was supported by the AUFF (AUFF-E-2022-9-42); L.V. was supported by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN); G.P.vW. was supported by the ERC Advanced Grant 101055020-COMMUNITY; K.W. was supported by the Medical Research Council, UK (MR/N029909/1); S.E.W. was supported by the Novo Nordisk Foundation Postdoctoral Fellowship (NNF22OC0079021); T.E.W. was supported by the Natural Sciences and Engineering Research Council of Canada PGS-D scholarship; D.Y. was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant RS-2024-00440975); A.Z. was supported by the German Center for Infection Research (DZIF) (TTU 09.826); C.Z. was supported by the Austrian Science Fund (FWF) [10.55776/P 34036]; R.G.L. was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant program; T.W. was supported by the Novo Nordisk Foundation (NNF20CC0035580), by the Danish National Research Foundation CeMiSt (DNRF137), and by the European Union's Horizon Europe programme under the Marie Skłodowska-Curie grant agreement No 101072485 (MAGic-MOLFUN).
Conflict of interest statement. J.H. and C.S.M. are employees of Corteva Agriscience. B.R.T. is a consultant for BioConsortia Inc. J.J.J.vdH. is a member of the Scientific Advisory Board of NAICONS Srl, Milano, Italy and consults for Corteva Agriscience, Indianapolis, IN, USA. M.H.M. is a member of the Scientific Advisory Board of Hexagon Bio.
References
Author notes
The first two authors should be regarded as Joint First Authors.