Complementary environmental analysis and functional characterization of lower glycolysis-gluconeogenesis in the diatom plastid

Abstract
Organic carbon fixed in chloroplasts through the Calvin–Benson–Bassham Cycle can be diverted toward different metabolic fates, including cytoplasmic and mitochondrial respiration, gluconeogenesis, and synthesis of diverse plastid metabolites via the pyruvate hub. In plants, pyruvate is principally produced via cytoplasmic glycolysis, although a plastid-targeted lower glycolytic pathway is known to exist in non-photosynthetic tissue. Here, we characterized a lower plastid glycolysis–gluconeogenesis pathway enabling the direct interconversion of glyceraldehyde-3-phosphate and phospho-enol-pyruvate in diatoms, ecologically important marine algae distantly related to plants. We show that two reversible enzymes required to complete diatom plastid glycolysis–gluconeogenesis, Enolase and bis-phosphoglycerate mutase (PGAM), originated through duplications of mitochondria-targeted respiratory isoforms. Through CRISPR-Cas9 mutagenesis, integrative ‘omic analyses, and measured kinetics of expressed enzymes in the diatom Phaeodactylum tricornutum, we present evidence that this pathway diverts plastid glyceraldehyde-3-phosphate into the pyruvate hub, and may also function in the gluconeogenic direction. Our experimental data indicate that the roles of this pathway depend in particular on day length and environmental temperature, and we show that the cpEnolase and cpPGAM genes are expressed at elevated levels in high-latitude oceans where diatoms are abundant. Our data provide evolutionary, meta-genomic, and functional insights into a poorly understood yet evolutionarily recurrent plastid metabolic pathway.


Introduction
Each year, over 250 gigatonnes of atmospheric carbon dioxide is assimilated through photosynthesis, with approximately equal contributions from terrestrial plants and aquatic algae (Friedlingstein et al. 2022). This activity is essential for maintaining planetary climate homeostasis and supports Earth's ecosystems.
Carbon assimilated through photosynthesis via the Calvin-Benson-Bassham Cycle is diverted into multiple metabolic fates (Raines 2003). In plants, these fates include gluconeogenesis of glucose-6-phosphate directly in plastids (e.g. chloroplasts), which can then be used in leaf tissue for starch storage (Scialdone et al. 2013). Additional metabolites, including fatty acids and lipids, amino acids, chlorophyll, and carotenoid pigments, are synthesized directly in the plastid [Tanaka and Tanaka 2007; Bromke 2013; Maréchal and Lupette 2020; Bai et al. 2022 (Fig. 1A)]. Many of these plastid metabolic reactions utilize pyruvate, or its adjacent metabolic precursor phospho-enol-pyruvate (PEP), and are referred to collectively as the pyruvate hub (Shtaida et al. 2015). In addition, plant photosynthate is exported from the plastids to the cytosol for subsequent glycolysis and respiration in the mitochondria (Moog et al. 2015) or for transport to nonphotosynthetic tissue [Carrera et al. 2021 (Fig. 1A)].
Plants are classically thought to generate PEP and pyruvate through glycolysis in the cytoplasm, and then reimport these metabolites into the plastids [Fig. 1A (Moog et al. 2020)]. Alongside this, certain plants may synthesize pyruvate hub substrates directly from the Calvin-Benson-Bassham Cycle inside the plastid. This conversion is performed by two enzymes, plastid-targeted phosphoglycerate mutase and enolase (henceforth referred to as cpPGAM and cpEnolase), which allow the conversion of 1,3-bis-phosphoglycerate from the Calvin-Benson-Bassham Cycle to PEP [Fig. 1A (Raines 2003; Andriotis et al. 2010)]. Both Enolase and PGAM have been shown experimentally to be fully reversible enzymes, with bidirectional functions that we henceforth refer to as glycolysis-gluconeogenesis; we use glycolysis or gluconeogenesis alone to signify enzymatic activity in one direction only (Sutherland et al. 1949). Documented plant cpEnolase and cpPGAM enzymes are associated with non-photosynthetic tissues such as seeds and roots (Prabhakar et al. 2009; Fukayama et al. 2015; Troncoso-Ponce et al. 2018). Arabidopsis thaliana cpEnolase and cpPGAM knockout lines do not have substantially different phenotypes from wild-type lines under replete growth conditions (Prabhakar et al. 2009; Andriotis et al. 2010; Anoman et al. 2016), raising questions about their overall function.

Figure 1. Metabolic context and evolution of the lower half of the diatom plastid glycolysis-gluconeogenesis pathway. A) Schematic comparison of diatom and plant core carbon metabolism, adapted from Liu et al. (2022). This diagram highlights the localization and functions of two enzymes in the lower half of glycolysis-gluconeogenesis (phosphoglycerate mutase and enolase), whose localization to the chloroplast can connect endogenous enzymes in the Calvin-Benson-Bassham Cycle and pyruvate hub to create a complete glycolytic-gluconeogenic pathway. Abbreviations: GA3P, glyceraldehyde-3-phosphate; 1,3-PGA and 2,3-PGA, 1,3- and 2,3-bis-phosphoglycerate; Glu-6-P, glucose-6-phosphate; PEP, phospho-enol-pyruvate; RuBP, ribulose bis-phosphate; PGAM, phosphoglycerate mutase; cER, chloroplast endoplasmic reticulum. B and C) Consensus MrBayes topologies inferred with 3 substitution matrices (GTR, Jones, WAG) of a 163 taxa × 413 aa alignment of organelle-targeted enolase (B) and a 105 taxa × 220 aa alignment of selected organelle-targeted PGAM1 enzymes from diatoms and their closest relatives (C). For clarity, mitochondria- and chloroplast-targeted diatom sequences are indicated with labeled (green and red) boxes. These topologies identify recent duplications and recruitments of respiratory glycolytic-gluconeogenic enzymes from the mitochondria to the plastid in diatoms and their closest relatives. Branch line thickness reflects the frequency with which a given branching relationship was recovered, with thick branches identified by all 3 substitution matrices. For ease of viewing, trees are rooted between bacterial and eukaryotic sequences.

Probing Diatom Chloroplast Lower Glycolysis | 3585
Diatoms are a eukaryotic algal group that is distantly related to plants, with over one billion years of evolutionary separation between the nuclear and mitochondrial genomes of the two groups (Nonoyama et al. 2019; Strassert et al. 2021). In contrast to the primary plastids of plants, which are surrounded by 2 membranes and of bacterial origin, diatoms possess complex plastids surrounded by 4 membranes and derived from a eukaryotic red alga through an endosymbiosis that is likewise ancient (Nonoyama et al. 2019; Liu et al. 2022). Diatoms are extraordinarily successful in the modern ocean, comprising nearly half of total algal abundance based on environmental sequence data such as those produced by the Tara Oceans expedition (Malviya et al. 2016; Behrenfeld et al. 2021). Diatoms are particularly abundant in high-latitude and temperate oceans (i.e. the North Atlantic, North Pacific, and Southern Oceans), which are characterized by stresses including low temperatures and extreme photoperiods [long days in the summer and long nights in the winter (Gilbertson et al. 2022; Joli et al. 2024)]. Previous studies, particularly of the transformable coastal and mesophilic species Phaeodactylum tricornutum, have identified multiple strategies that allow diatoms to tolerate photo-stress, including complex inter-organelle metabolite trafficking (Bailleul et al. 2015; Broddrick et al. 2019; Smith et al. 2019) and extensive photoprotective capabilities (reviewed by Lepetit et al. 2022). These data are further supported by extensive environmental (metagenomic) sequence data such as those of the Tara Oceans mission. Although these environmental data relate fundamentally to species other than Phaeodactylum (i.e. open-ocean diatoms, including those from polar habitats), they may allow us to understand how individual diatom chloroplast proteins function at ecosystem scales as well as under laboratory conditions (Kazamia et al. 2018; Liu et al. 2022).
Diatom carbon metabolism differs greatly from that of plants (Kroth et al. 2008). Differences include the storage of sugars in cytoplasmic vacuoles (as chrysolaminarin) as opposed to plastidial starch, and the synthesis of most lipid groups (e.g. galactolipids and lipids produced in part of the triacylglycerol pathway) directly in the plastid (Zhu et al. 2016; Huang et al. 2024). Diatom plastids furthermore possess no known plastid hexose phosphate transporters, which in plants are implicated in plastidial sugar import in storage tissue. Diatoms are instead inferred to exchange sugars with the cytoplasm via triose phosphates only [Moog et al. 2020; Liu et al. 2022 (Fig. 1A)]. The lower half of respiratory glycolysis-gluconeogenesis (from glyceraldehyde-3-phosphate to pyruvate) in diatoms occurs in the mitochondria, as opposed to the cytoplasm (Kroth et al. 2008; Río Bártulos et al. 2018), and proteins for a complete plastid lower glycolysis-gluconeogenesis pathway, including cpEnolase and cpPGAM, have been inferred from sequenced diatom genomes [Kroth et al. 2008; Smith et al. 2012; Hippmann et al. 2022 (Fig. 1A)]. As diatoms are unicellular or colonial, plastid glycolysis presumably occurs in organelles that also perform photosynthesis, contrasting with its predominant association with non-photosynthetic tissues in plants (Fig. 1A).
Here, we profile sequence datasets from cultivated and environmental diatoms, characterize P. tricornutum CRISPR-Cas9 knockout mutants, and measure kinetic activities of expressed enzymes to infer possible functions of diatom cpEnolase and cpPGAM enzymes. We demonstrate that the genes encoding these enzymes arose from mitochondria-targeted, respiratory isoforms in a common ancestor of all diatoms, and are widespread across diatoms. This distribution contrasts with that in other algae and plants, in which genes encoding lower chloroplast glycolysis have a sporadic distribution. We further show that these genes are most highly expressed at high latitudes in environmental sequence data from Tara Oceans and, consistent with this, that their expression is induced in Phaeodactylum in response to continuous light and low temperature. From Phaeodactylum knockout phenotypes, we present evidence that this pathway may have different functions in cells grown under continuous illumination as opposed to light-dark cycling, and at low compared to moderate temperature. We use mutant phenotypes and measured kinetic activities to propose metabolic functions of diatom cpEnolase and cpPGAM under different illumination and temperature regimes. Overall, our data position lower glycolysis-gluconeogenesis as a modulator of diatom plastid metabolic poise, providing insights into its physiological roles for photosynthetic organisms beyond plants.

Distribution and phylogeny of cpEnolase and cpPGAM across photosynthetic eukaryotes
To evaluate the occurrence of plastid-targeted glycolysis across the algal tree of life, we searched for plastid-targeted homologs of P. tricornutum and Arabidopsis thaliana Enolase and PGAM enzymes in 1,673 plant and algal species, considering genomes from JGI PhycoCosm and transcriptomes from the MMETSP (Marine Microbial Eukaryotic Transcriptome Sequencing Project) and OneKp (One Thousand Plant Transcriptomes) initiatives (Keeling et al. 2014; One Thousand Plant Transcriptomes Initiative 2019; Grigoriev et al. 2021). Plastid-targeted sequences were inferred using both PFAM domain presence and the combined in silico predictions of HECTAR, ASAFind, WolfPSort, TargetP, and PredAlgo (Emanuelsson et al. 2007; Horton et al. 2007; Gschloessl et al. 2008; Tardif et al. 2012). An overview of the distributions of these proteins in different taxonomic groups of plants and algae is provided in Supplementary Data Set 1, sheet 1; a complete list of species with all inferred plastid- and non-plastid-targeted copies of these proteins is provided in Supplementary Data Set 1, sheet 2; and the complete OneKp ASTRAL topology, labeled with the presence of detected plastid-targeted glycolysis proteins, is provided in Supplementary Data Set 1, sheet 3.
Plastid lower glycolysis-gluconeogenesis was frequently inferred in diatoms, with 60/101 (59%) libraries with identified Enolase and PGAM sequences possessing plastid-targeted versions of each. A lower occurrence was found amongst close relatives in the stramenopiles (e.g. pelagophytes, dictyochophytes; 22/69 libraries, 32%) and in other algae with secondary red plastids [cryptomonads, haptophytes; 25/94 libraries, 27% (Supplementary Fig. S1A)]. Within primary plastid-harbouring lineages, only angiosperms were inferred to frequently possess plastid-targeted copies of both enzymes (47/537 libraries, 9%). Alongside the previously described Arabidopsis thaliana plastid lower glycolytic pathway, these species belong to a wide range of plant families, with the greatest numbers in Lamiales (7 species with both plastid-targeted Enolase and PGAM), Asterales (5 species), and Malpighiales (4 species), potentially indicating recurrent origins of this pathway in higher plants (Supplementary Data Set 1, sheets 2 and 3). Notably, only 4/127 (3%) primary green algal libraries, and no primary red algal libraries, were inferred to possess the pathway, suggesting that diatom plastid glycolysis does not derive from the secondary red chloroplast ancestor (Supplementary Fig. S1A).
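The library-level tally behind these percentages can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `LibraryRecord` and its boolean fields are hypothetical stand-ins for the per-library presence calls in Supplementary Data Set 1, and a library is assumed to enter the denominator only if both an Enolase and a PGAM homolog were identified in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryRecord:
    """Hypothetical per-library presence calls (not the authors' data format)."""
    group: str          # taxonomic group, e.g. "diatoms"
    has_enolase: bool   # any Enolase homolog identified
    has_pgam: bool      # any PGAM homolog identified
    cp_enolase: bool    # a plastid-targeted Enolase was predicted
    cp_pgam: bool       # a plastid-targeted PGAM was predicted


def plastid_pathway_tally(records, group):
    """Return (complete, eligible, percent) for one taxonomic group.

    Eligible libraries contain both an Enolase and a PGAM homolog; complete
    libraries additionally carry plastid-targeted copies of both enzymes.
    """
    eligible = [r for r in records
                if r.group == group and r.has_enolase and r.has_pgam]
    complete = [r for r in eligible if r.cp_enolase and r.cp_pgam]
    percent = round(100 * len(complete) / len(eligible)) if eligible else 0
    return len(complete), len(eligible), percent


# Synthetic records reproducing the diatom counts quoted above (60 of 101).
diatoms = ([LibraryRecord("diatoms", True, True, True, True)] * 60
           + [LibraryRecord("diatoms", True, True, True, False)] * 41)
print(plastid_pathway_tally(diatoms, "diatoms"))  # (60, 101, 59)
```

Counting only libraries in which both homologs were found avoids penalizing incomplete transcriptomes in which one enzyme simply was not sampled.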
Considering collection sites, diatom species with either plastid glycolysis enzyme tend to derive from higher latitudes (mean absolute latitude 45.6°, standard deviation 13.5°, n = 81) than those that possess neither (mean absolute latitude 38.9°, standard deviation 24.3°, n = 10), although this difference is not statistically significant across all diatoms (one-way ANOVA P = 0.19; Supplementary Fig. S1B). The difference was significant within certain diatom groups (e.g. araphid pennate diatoms, Supplementary Data Set 1, sheet 1; one-way ANOVA P = 0.012), but was not observed in other algal groups.
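The one-way ANOVA used here compares mean absolute latitude between groups through the ratio of between-group to within-group variance. The sketch below computes only the F statistic on hypothetical latitudes; converting F to a P value requires the F distribution with (k − 1, N − k) degrees of freedom (for example via `scipy.stats.f_oneway`, which returns both), and that step is omitted here to keep the example dependency-free.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA on absolute latitudes.

    Each group is a sequence of signed latitudes in degrees; southern sites
    are negative, so absolute values are taken first.
    """
    groups = [[abs(x) for x in g] for g in groups]
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical collection latitudes (not the study's data).
with_pathway = [-40, 50, -60]
without_pathway = [10, -20, 30]
print(one_way_anova_f(with_pathway, without_pathway))  # 13.5
```

Taking absolute values first matters because the hypothesis concerns distance from the equator, so Arctic and Southern Ocean isolates should be pooled rather than cancel out.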
Next, we explored the specific origins of diatom plastid Enolase and PGAM sequences, including those of P. tricornutum, by building phylogenies of homologs obtained from other diatoms; from the broader taxonomic group to which they belong, the stramenopiles; and from two other algal groups, the cryptomonads and haptophytes. These lineages all possess plastids of secondary red endosymbiotic origin, surrounded by four membranes, which are likely to be closely related to one another (Strassert et al. 2021), but also contain nonphotosynthetic members (e.g. oomycetes in stramenopiles) that only possess respiratory (i.e. mitochondria-targeted) lower glycolytic enzymes (Río Bártulos et al. 2018). Single-gene trees were built for the conserved domains of all organelle-targeted Enolase and PGAM sequences from 289 cryptomonad, haptophyte, and stramenopile genomes and transcriptomes, plus all orthologs from 85 further genomes selected from across the tree of life, based on a previously defined pipeline (Supplementary Data Set 1, sheets 2 to 6). Full and trimmed versions of these alignments are provided in Supplementary Data Set 1, sheets 7 to 11, and nexus format outputs in Supplementary Data Set 1, sheet 12. Figure 1, B and C show consensus MrBayes trees realized with GTR, Jones, and WAG substitution matrices for species with both identifiable plastid- and mitochondria-targeted orthologs of each protein.
The obtained topologies revealed multiple evolutionary origins of plastid Enolase and PGAM sequences from mitochondria-targeted (respiratory) enzymes, with diatom plastid isoforms typically having recent and/or diatom-specific evolutionary origins. Diatom cpEnolase sequences resolve in a well-supported clade with plastid-targeted enzymes from bolidophytes, dictyochophytes, and pelagophytes, which are sisters to diatoms in the stramenopiles (Río Bártulos et al. 2018; Nonoyama et al. 2019), followed by mitochondria-targeted proteins from these groups (MrBayes PP = 1.0 under all studied matrices, Fig. 1B), other photosynthetic stramenopiles (chrysophytes), and non-photosynthetic stramenopiles (oomycetes; MrBayes PP ≥ 0.95 under GTR and Jones matrices, Fig. 1B). This indicates a duplication and recruitment of the host-derived mitochondria-targeted protein to the plastid within a common ancestor of the diatoms, pelagophytes, and dictyochophytes. A broader evaluation of cpEnolase distribution suggests further duplications and plastid retargeting of mitochondria-targeted Enolases in both the chrysophytes and cryptomonads (Supplementary Fig. S2).
The PGAM phylogeny revealed at least two closely related families of plastid-targeted diatom enzymes, both likely derived from host mitochondrial isoforms. The cpPGAM1A clade (typified by the P. tricornutum protein Phatr3_J17086) was closely related to mitochondria-targeted proteins found across the stramenopiles (MrBayes PP = 1.0 under all studied matrices, Fig. 1C), followed by plastid-targeted proteins from chrysophytes and mitochondria-targeted oomycete proteins. Similarly, the cpPGAM1B (Phatr3_J51404) clade included mitochondria-targeted proteins from pelagophytes and dictyochophytes (MrBayes PP ≥ 0.85 under all studied matrices, Fig. 1C), and plastid- and mitochondria-targeted enzymes from the chrysophytes (Supplementary Fig. S3). Further duplications and plastid recruitments of mitochondria-targeted PGAM proteins were again visible in the haptophytes and cryptomonads (Supplementary Fig. S3).
A final plastid-targeted protein annotated as PGAM in the version 3 P. tricornutum genome (Rastogi et al. 2018), hereafter termed PGAM2, was identified exclusively in diatoms, pelagophytes, and haptophytes (Supplementary Fig. S4), with limited homology to PGAM1 (BLASTp e-value > 1.0 in pairwise protein-protein searches). Only PGAM1 contains an annotated phosphoglyceromutase active site (IPR005952) per InterProScan, while both PGAM1 and PGAM2 contain the same PFAM domain (histidine phosphatase, PF03000) per PFAMscan (Jones et al. 2014; Mistry et al. 2021). PGAM2 enzymes were predominantly mitochondria-targeted, with plastid- or dual-targeted isoforms amongst diatoms only identified in P. tricornutum (Phatr3_J37201, and a more divergent copy, Phatr3_J47096) and in three species in which they are inferred to have evolved independently (Supplementary Fig. S4).

Plastidial localization and expression dynamics of Phaeodactylum lower glycolysis enzymes
To confirm plastid localization of P. tricornutum cpEnolase and cpPGAM, eGFP-tagged copies of three proteins (Phatr3_J41515, cpEnolase; Phatr3_J17086, cpPGAM1A; Phatr3_J37201, cpPGAM2) were expressed in P. tricornutum Pt1.86 cells via biolistic transformation.The observed GFP fluorescence patterns were coincident with chlorophyll autofluorescence, consistent with in silico targeting predictions in each case and confirming plastid localization (Fig. 2A, Supplementary Fig. S5).We note that both cpEnolase (named per its previous annotation Phatr2_J56418) and cpPGAM1B (named Phatr2_J42857) have been independently localized with GFP to the P. tricornutum plastid in a separate study (Río Bártulos et al. 2018).
Next, we considered previously published proteomic data from plastid-enriched and total cellular fractions of Phaeodactylum, following Huang et al. (2024). Both cpEnolase and cpPGAM1A were detected in multiple plastid-enriched and total cell proteome samples, forming 0.040% and 0.0033%, respectively, of the mean plastid-enriched total proteome (Supplementary Fig. S6A). These abundances were comparable to the total abundances and plastid enrichment ratios found for other Calvin-Benson-Bassham Cycle and plastidial carbon metabolism enzymes, e.g. pyruvate kinase (Phatr3_J22404), ribose-5-phosphate isomerase (Phatr3_J13382), and ribulose-5-phosphate epimerase (Phatr3_J53395). Both mtEnolase (Phatr3_draftJ1572) and mtPGAM (Phatr3_J33839) were also detected in plastid-enriched fractions, which may relate to the close associations observed between the Phaeodactylum plastid and mitochondria (Bailleul et al. 2015; Uwizeye et al. 2020). No other organelle-associated Enolase or PGAM enzymes were detected, including cpPGAM1B and cpPGAM2, suggesting that these are present at low abundances in the Phaeodactylum cell (Supplementary Fig. S6A; Supplementary Data Set 2, sheet 1).
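The two quantities compared in this paragraph, a protein's share of a proteome and its plastid enrichment ratio, can be expressed as below. This is a sketch assuming that label-free intensities are summed per sample; the intensity values are hypothetical, and the exact quantification used by Huang et al. (2024) may differ.

```python
def percent_of_proteome(intensities, protein):
    """Protein intensity as a percentage of the summed intensity of one sample."""
    return 100 * intensities[protein] / sum(intensities.values())


def plastid_enrichment(plastid_sample, total_sample, protein):
    """Ratio of a protein's proteome share in plastid-enriched vs total-cell samples."""
    return (percent_of_proteome(plastid_sample, protein)
            / percent_of_proteome(total_sample, protein))


# Hypothetical intensities (arbitrary units), not measured values.
plastid = {"cpEnolase": 40, "RuBisCO": 9_000, "other": 960}
total = {"cpEnolase": 20, "RuBisCO": 4_000, "other": 5_980}
print(percent_of_proteome(plastid, "cpEnolase"))        # 0.4
print(plastid_enrichment(plastid, total, "cpEnolase"))  # 2.0
```

An enrichment ratio above 1 indicates that a protein is over-represented in the plastid-enriched fraction relative to the whole cell, which is the comparison used to place cpEnolase and cpPGAM1A alongside other plastid carbon metabolism enzymes.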
We further considered the transcriptional dynamics of Phaeodactylum lower plastid glycolysis proteins, using a pooled and ranked dataset of normalized microarray and RNAseq data to identify genes that are co-expressed with one another and which may perform linked cellular functions [Ashworth et al. 2016; Ait-Mohamed et al. 2020; Liu et al. 2022 (Supplementary Fig. S6B; Supplementary Data Set 2, sheets 2 to 3)]. From these data, the cpenolase and cppgam1A genes showed strong, positive coregulation with one another (r = 0.868, P < 10⁻⁵), with cppgam1A the second most strongly coregulated gene to cpenolase across the entire Phaeodactylum genome. Other cpPGAMs showed much weaker coregulation to both cpenolase and cppgam1A, including cppgam1B (cpenolase r = 0.432; cppgam1A r = 0.473) and cppgam2 (cpenolase r = 0.490; cppgam1A r = 0.478; Supplementary Fig. S6B). The coexpression of cpenolase and cppgam1A and the accumulation of both encoded proteins in the plastid suggest that they possess linked metabolic functions.
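The coexpression ranking can be illustrated with a Pearson correlation between one gene's expression profile and every other profile in a pooled matrix. This sketch assumes expression values have already been normalized and pooled across conditions; the profiles and all gene names other than cpenolase are hypothetical, and the published pipeline (Ait-Mohamed et al. 2020) may rank values within datasets before correlating.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_coexpressed(profiles, target):
    """Rank all other genes by Pearson r to `target`, most positively coregulated first."""
    reference = profiles[target]
    scores = [(gene, pearson_r(reference, values))
              for gene, values in profiles.items() if gene != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical normalized expression across five pooled conditions.
profiles = {
    "cpenolase": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_A": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly coregulated
    "gene_B": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated
    "gene_C": [1.0, 3.0, 2.0, 5.0, 4.0],    # partially coregulated
}
for gene, r in rank_coexpressed(profiles, "cpenolase"):
    print(f"{gene}\t{r:.3f}")
# prints gene_A 1.000, then gene_C 0.800, then gene_B -1.000
```

In a genome-wide run, the position of a gene in this list is what statements such as "second most strongly coregulated gene" refer to.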
We then explored under what conditions cpenolase and cppgam1A genes are likely to be highly expressed, considering RNAseq (Supplementary Fig. S7, A to C) and microarray (Supplementary Fig. S7D) data (Supplementary Data Set 2, sheet 4). These data suggested that nutrient limitation does not directly induce the expression of the chloroplast-targeted glycolysis proteins, with the ratio of expression of genes encoding chloroplast- to mitochondria-targeted copies of each enzyme either remaining unchanged in published nitrate limitation (nitrate reductase knockout) and iron limitation data [Supplementary Fig. S7, A and B …

… 19CL); and under 8C and continuous light (8CL). Relative expression levels were normalized against two housekeeping genes (rps, RNA polymerase subunit 1; tbp, TATA-binding protein) that show invariant expression in response to light cycles in Phaeodactylum (Sachse et al. 2013). Error bars correspond to one standard deviation. *, significantly different expression levels, 1-way ANOVA, P < 0.05; **, the same, P < 0.001.
In contrast, we observed clear impacts of light quality and day length on plastidial glycolysis gene expression. In an RNAseq study of the effects of the circadian cycle on Fe-limited and Fe-replete Phaeodactylum cells (Smith et al. 2016), a much higher ratio of plastid to mitochondrial enolase gene expression was identified in samples harvested 12 h post-illumination than at other timepoints (Supplementary Fig. S7B; 1-way ANOVA P = 4 × 10⁻⁵). In a similar meta-analysis of microarray data (Ashworth et al. 2016), cpenolase showed the greatest relative fold-change in RNA samples collected between 8 and 16 h after light onset (1-way ANOVA P < 10⁻⁵), and was strongly suppressed following 30 min of white, red, green, and blue light treatment (Supplementary Fig. S7D). cppgam1A showed the same trends, albeit with lower expression in normal light 10.5 h after the light induction period than at 16 h (Supplementary Fig. S7D; Supplementary Data Set 2, sheet 4). Both cpenolase and cppgam1A genes showed strong suppression in microarray data obtained following 4 h compared to 30 min of dark incubation (1-way ANOVA, P < 0.05), and following 2-day dark incubation compared to 2-day high light treatment (1-way ANOVA P < 10⁻⁵; Supplementary Fig. S7D), suggesting that these effects relate to light perception.
Finally, we performed quantitative RT-PCR (RT-qPCR) of cpenolase and cppgam1A genes from wild-type Phaeodactylum cells under different conditions (Fig. 2B). A total of 648 discrete Cp values were measured for two cpenolase and two cppgam1A RT-qPCR amplicons, alongside two reference genes (rps and tbp), under three conditions. RNA was collected from late exponential-phase cells at the mid-point of the subjective day under three regimes, selected considering Tara Oceans sampling data (Supplementary Data Set 2, sheet 4): 19 °C with 12-h:12-h light:dark cycling (19C LD); 19 °C with 24-h continuous light (19C CL); and 8 °C with 24-h continuous light (8C CL). Both normalization references (rps and tbp) have previously been shown to have invariant expression under circadian cycles in Phaeodactylum (Sachse et al. 2013).
Both cpenolase and cppgam1A showed transcriptional responses to light and temperature, although the inferred responses depended on the normalization reference. The expression of cpenolase was inferred to be higher in 19C CL than in 19C LD when normalized to rps (fold-change: 2.31, 1-way ANOVA P = 0.028), although no difference was measured by normalization to tbp. In contrast, cpenolase expression was significantly higher under 8C CL than 19C CL conditions when normalized to both rps (fold-change: 2.74, P = 0.015) and tbp (fold-change: 4.17, P = 0.001; Fig. 2B). cppgam1A expression was inferred to be higher in 19C CL than in 19C LD conditions when normalized to both rps (fold-change: 6.33, P = 0.003) and tbp (fold-change: 2.50, P = 0.002), but was only inferred to increase in 8C CL relative to 19C CL conditions when normalized to tbp (fold-change: 1.96, P = 0.005; Fig. 2B). Together, these data suggest additive effects of continuous light and low temperature on Phaeodactylum cpPGAM1A and cpEnolase expression.
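Fold-changes of this kind are commonly derived with the comparative Cp (2^−ΔΔCp) method. The text does not specify the calculation, so the sketch below assumes that method with a default amplification efficiency of 2 (perfect doubling per cycle); all Cp values are hypothetical. It also illustrates why the same target data can yield different fold-changes against rps and tbp: the reference gene's Cp enters the calculation directly.

```python
from statistics import mean


def fold_change(target_test, ref_test, target_ctrl, ref_ctrl, efficiency=2.0):
    """Relative expression (test vs control) by the comparative Cp method.

    Each argument is a list of replicate Cp values; efficiency=2.0 assumes
    the amplicon doubles every cycle (an assumption, not a measured value).
    """
    delta_test = mean(target_test) - mean(ref_test)   # normalize to reference
    delta_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    delta_delta = delta_test - delta_ctrl
    return efficiency ** (-delta_delta)


# Hypothetical Cp values for one target gene under a test and a control condition.
target_ctrl, target_test = [26.0, 26.2, 25.8], [24.0, 24.1, 23.9]
rps_ctrl, rps_test = [20.0, 20.1, 19.9], [20.0, 19.9, 20.1]  # stable reference
tbp_ctrl, tbp_test = [22.0, 22.0, 22.0], [21.0, 21.0, 21.0]  # reference shifts 1 cycle

print(fold_change(target_test, rps_test, target_ctrl, rps_ctrl))  # 4.0
print(fold_change(target_test, tbp_test, target_ctrl, tbp_ctrl))  # 2.0
```

A one-cycle shift in the reference between conditions halves the inferred fold-change here, which is why agreement between two independent references is the stronger evidence.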

Environmental roles of diatom cpEnolase and cpPGAM inferred from meta-genomics
Next, we considered general patterns of transcriptional coregulation of diatom cpEnolase and cpPGAM sequences in environmental sequence data from Tara Oceans. First, we used a previously benchmarked pipeline based on combined hmmer, reciprocal BLAST, and phylogenetic filtration (Liu et al. 2022) to identify Tara Oceans meta-genes that reconcile exclusively with plastid-targeted proteins from cultured diatom species, to the exclusion of non-diatom and non-plastid homologs (Supplementary Fig. S8A). Full and trimmed versions of alignments of these sequences against cultured species replicates are provided in Supplementary Data Set 3, sheet 1, and nexus format outputs of RAxML best-scoring trees of these alignments in Supplementary Data Set 3, sheet 2. Amongst the retained meta-genes likely to be N-terminally complete (BLAST homology within the first 40 residues of a P. tricornutum sequence), about half or more have consensus plastid-targeting sequences (Enolase: 38/78, 49%; PGAM: 58/97, 60%). Only a very small number (one Enolase, 10 PGAM) possess mitochondrial or endomembrane localizations, suggesting that the retained meta-genes principally correspond to plastid-targeted environmental homologs of each protein (Supplementary Fig. S8B).
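The N-terminal completeness filter and the subsequent localization tally can be sketched as below. The record layout, a BLAST subject start coordinate `sstart` on the P. tricornutum reference plus a consensus localization label, is an assumption for illustration, as are the meta-gene identifiers; only the 40-residue window comes from the text.

```python
from collections import Counter
from typing import NamedTuple


class MetaGeneHit(NamedTuple):
    """Hypothetical summary of one meta-gene's best BLAST hit and targeting call."""
    meta_gene: str
    sstart: int        # first aligned residue on the P. tricornutum reference (1-based)
    localization: str  # consensus in silico prediction, e.g. "plastid"


def localization_summary(hits, window=40):
    """Count predicted localizations among N-terminally complete meta-genes.

    A meta-gene is treated as N-terminally complete if its alignment begins
    within the first `window` residues of the reference protein.
    """
    complete = [h for h in hits if h.sstart <= window]
    counts = Counter(h.localization for h in complete)
    plastid_pct = round(100 * counts["plastid"] / len(complete)) if complete else 0
    return len(complete), counts, plastid_pct


hits = [
    MetaGeneHit("mg_001", 3, "plastid"),
    MetaGeneHit("mg_002", 12, "plastid"),
    MetaGeneHit("mg_003", 35, "mitochondrion"),
    MetaGeneHit("mg_004", 18, "other"),
    MetaGeneHit("mg_005", 120, "plastid"),  # N-terminally truncated: excluded
]
n_complete, counts, pct = localization_summary(hits)
print(n_complete, counts["plastid"], pct)  # 4 2 50
```

Restricting the tally to N-terminally complete sequences matters because targeting predictors read the N-terminal presequence; a truncated meta-gene would otherwise be scored as non-plastid by default.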
Within Tara Oceans data, the greatest relative abundances of diatom cpEnolase and cpPGAM1 meta-genes in meta-transcriptome (metaT) data were observed at stations from both high northern and high southern latitudes (Fig. 3). We observed these trends concordantly in both surface and deep chlorophyll maximum (DCM) samples of the 0.8 to 2,000 µm size fraction (Fig. 3A) and in individual size fractions (0.8 to 3/5 µm, 5 to 20 µm, 20 to 180 µm, and 180 to 2,000 µm; Supplementary Fig. S9), suggesting broad reproducibility across diatoms independent of cell size and depth. These levels were notably greater than equivalent levels in metagenome (metaG) data (Fig. 3B, Supplementary Fig. S9).
To confirm that this was due to greater expression of cpPGAM and cpEnolase genes, rather than purely to the greater relative abundance of diatoms at high-latitude Tara Oceans stations, we performed multiple normalization tests (Supplementary Fig. S10; Supplementary Data Set 3, sheet 10). First, metaT abundances calculated for each gene in the 0.8 to 2,000 μm size fraction were divided by the total relative abundance of all diatom metaT sequences, giving the proportion of each diatom meta-transcriptome occupied by cpEnolase and cpPGAM. These normalized values showed positive correlations with latitude in both surface and DCM depth fractions, with the greatest relative abundances (> 0.1% of total diatom mapped transcripts) typically occurring at stations above 60° absolute latitude (Supplementary Fig. S10A). The observed Pearson correlations with latitude were significantly positive [surface cpEnolase R² = 0.18, P < 10⁻⁵; surface cpPGAM1A R² = 0.23, P < 10⁻⁵; DCM cpEnolase R² = 0.53, P < 10⁻⁵; DCM cpPGAM1A R² = 0.59, P < 10⁻⁵ (Supplementary Data Set 3, sheet 10)]. More broadly, across the environmental parameters considered, the metaT-normalized relative abundances showed the clearest positive correlations with day length and negative correlations with temperature. No other parameters (e.g. nutrient concentrations) showed correlations as clear with chloroplast glycolysis metaT relative abundances (Supplementary Data Set 3, sheet 10).
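The first normalization divides each gene's metaT abundance by the total diatom metaT abundance at the same station, then correlates the result with absolute latitude. A minimal sketch follows, assuming per-station abundances are available as plain numbers; the station values are invented for illustration, and R² here is simply the squared Pearson coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def latitude_r_squared(stations):
    """R² between absolute latitude and diatom-normalized gene metaT abundance.

    `stations` holds (latitude, gene_metaT, total_diatom_metaT) tuples.
    """
    abs_lat = [abs(lat) for lat, _, _ in stations]
    share = [gene / diatoms for _, gene, diatoms in stations]  # share of diatom metaT
    return pearson_r(abs_lat, share) ** 2


# Hypothetical stations: (latitude, cpEnolase metaT, all-diatom metaT)
stations = [(-62.0, 12.0, 6000.0), (10.0, 1.0, 2000.0),
            (45.0, 4.0, 4000.0), (78.9, 30.0, 10000.0)]
print(round(latitude_r_squared(stations), 3))
```

Dividing by total diatom metaT removes the effect of diatoms simply being more abundant at high latitude, so a remaining latitude trend reflects a larger share of diatom transcripts devoted to these genes.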
Alongside this, the metaT abundances obtained for diatom cpEnolase and cpPGAM genes were compared (via log normalization, to allow the inclusion of zero values) to the relative abundances calculated for the meta-genomic (metaG) sequences of the same genes (Supplementary Fig. S10B). This can be taken as an indicative measure of the ratio of transcript to gene abundance for each meta-gene, i.e., in effect, its expression level. These measurements showed a weaker but still significant positive correlation with latitude for cpEnolase surface fractions (R² = 0.10, 1-tailed F-test, P < 0.05) and for both genes in DCM fractions [cpEnolase R² = 0.28, 1-tailed F-test P < 0.05; cpPGAM1 R² = 0.29, 1-tailed F-test P < 0.05 (Supplementary Data Set 3, sheet 10)]. For both genes and in both depth fractions, two individual stations within the Arctic (Station 173, 78.93 to 78.96°N; Station 188, 78.25 to 78.36°N) had extremely high metaT to metaG ratios (log₁₀(1 + metaT) − log₁₀(1 + metaG) > 3 to 5), which disrupted the linear relationship between normalized metaT and latitude and point to particularly high expression of chloroplast glycolysis genes in polar waters. To limit the influence of these stations, ranked (Spearman) correlation values were also calculated for normalized chloroplast glycolysis metaT expression levels. Significant positive correlations with latitude were detected in multiple individual size fractions and depths (0.8 to 5, 3/5 to 20, 20 to 180, and 180 to 2,000 μm), including for cpPGAM1 metaT normalized against metaG in surface 3/5 to 20 (one-tailed F-test, P < 10⁻⁵), 20 to 180 (one-tailed F-test, P < 10⁻⁵), and 180 to 2,000 (one-tailed F-test, P < 0.05) μm fractions (Supplementary Data Set 3, sheet 10).
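The expression proxy and the ranked correlation described above can be sketched as follows. The pseudocount of 1 mirrors the log₁₀(1 + x) transform in the text; the Spearman coefficient is computed here as the Pearson correlation of average ranks, a standard definition, and the example values are hypothetical. The final lines illustrate why ranking limits the leverage of extreme stations such as 173 and 188.

```python
import math


def metaT_metaG_ratio(meta_t, meta_g):
    """log10(1 + metaT) - log10(1 + metaG); the pseudocount keeps zeros defined."""
    return math.log10(1 + meta_t) - math.log10(1 + meta_g)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical stations: absolute latitude vs expression proxy, one extreme polar value.
abs_lat = [10, 25, 40, 55, 79]
ratio = [0.1, 0.3, 0.4, 0.6, 4.5]
# Pearson is pulled below 1 by the extreme value; Spearman is 1.0 (monotonic trend).
print(round(pearson_r(abs_lat, ratio), 3), spearman_rho(abs_lat, ratio))
```

Because Spearman's coefficient only uses the ordering of stations, a single polar station with a ratio of 3 to 5 counts as "highest" rather than dominating the fit.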
The transcriptional preference of diatom cpEnolase and cpPGAM1 for high latitudes contrasted strongly with PGAM2, which showed similar relative abundances at stations in the temperate South Pacific and Atlantic as at stations in the Arctic and Southern Oceans (Supplementary Fig. S11; Supplementary Data Set 3, sheet 10). In certain size fraction and depth combinations (e.g. DCM 0.8 to 3 and 3/5 to 20 μm fractions normalized against metaG abundances, and surface and DCM 180 to 2,000 μm fractions normalized against all diatom metaT abundances), PGAM2 metaT abundances even showed significant negative correlations with latitude (Supplementary Data Set 3, sheet 10).
Finally, we tested whether the occurrence of plastidial lower glycolysis may correlate with algal abundance at high latitudes. For this, we screened single-cell and meta-genome assembled genomes (sMAGs) from Tara Oceans for potential plastid-targeted Enolase and PGAM enzymes, using reciprocal BLAST best hit, PFAM annotation, and in silico targeting prediction techniques similar to those previously used for cultured algae [Supplementary Fig. S12; Supplementary Data Set 3, sheet 11 (Delmont et al. 2022)]. We emphasize that these results are preliminary, as many of these genomes are incomplete, and gene non-detection does not formally confirm absence (Delmont et al. 2022; Pierella Karlusich et al. 2023). For each sMAG, we considered the presence or absence of possible plastid-targeted Enolase and PGAM sequences; the taxonomic assignment of the MAG; and the mean mapped vertical coverage of each MAG at each station (i.e. depth and breadth of coverage of sequences recruited to each genome) as a proxy for abundance, regardless of whether plastid-targeted glycolysis genes were detected in that sMAG (Supplementary Fig. S12).
Across 291 eukaryotic algal sMAGs, 32 were found to possess both plausible cpEnolase and cpPGAM proteins, and a further 84 could be assigned one or the other (Supplementary Fig. S12). As expected, diatoms were found to possess plastid-targeted glycolysis much more frequently than other groups, with 17 of the 49 diatom sMAGs possessing both chloroplast-targeted Enolase and PGAM enzymes, and a further 20 possessing only one of the two (Supplementary Fig. S12). We also detected probable complete plastid glycolysis pathways in 10 further sMAGs belonging to lineages (pelagophytes, dictyochophytes, haptophytes, chrysophytes, and bolidophytes) previously inferred to possess complete plastid-targeted glycolysis pathways amongst cultured species (Supplementary Figs. S2 and S3). Surprisingly, given the relative paucity of this pathway in cultured primary green algae, we also identified 5 putative chlorophyte sMAGs with both plausible cpEnolase and cpPGAM1A proteins (Supplementary Fig. S12). All 5 of these sMAGs (TARA_AON_82_MAG_00297, AOS_82_MAG_00181, ARC_108_MAG_00063, ARC_108_MAG_00100, and PSW_86_MAG_00289) are assigned as novel members of the genus Micromonas, which is abundant at high latitudes (Lovejoy et al. 2007; Worden et al. 2009; Delmont et al. 2022). Of note, no cultured Micromonas are inferred to possess this pathway (Supplementary Data Set 1, sheet 1). We therefore infer that the pathway recurrently detected in these Micromonas sMAGs may represent a novel plastid glycolysis pathway specific to uncultivated taxa.
Figure 3. Environmental distributions of diatom plastidial lower glycolysis-gluconeogenesis meta-genes. Total transcriptome (A) and genome (B) relative abundances, sampled from all (0.8 to 2,000 µm) size fractions and surface layer (i, iii) or DCM (ii, iv) stations, for Tara Oceans meta-genes phylogenetically resolved to diatom cpEnolase (i, ii) and cpPGAM1 (iii, iv). These data provide a global overview of Tara Oceans meta-gene abundances, complementing the data from individual size fractions shown in Supplementary Fig. S9. They demonstrate higher meta-transcript abundance without commensurate increases in meta-gene abundance at high northern and southern latitudes.
Considering the biogeography of each sMAG, we note that diatoms that possess complete lower-half plastidial glycolysis pathways show positive correlations between mean mapped vertical coverage and absolute station latitude, albeit only in DCM fractions (Supplementary Fig. S12A; r = 0.517, P < 0.001), while diatom sMAGs possessing only one of cpEnolase or cpPGAM showed positive correlations to latitude at both depths (Supplementary Fig. S12A; surface r = 0.313, two-tailed t-test P = 0.003; DCM r = 0.614, P < 0.001). This trend was not, however, observed for diatom sMAGs lacking plastid-targeted copies of both proteins, which showed non-significant and even weakly negative correlations to absolute latitude (Supplementary Fig. S12A; surface r = −0.069, DCM r = 0.192, P > 0.1). No clear association between plastidial lower glycolysis and occupancy at high latitudes was observed for other algal groups, with the exception of chlorophytes, in which the presence of both cpEnolase and cpPGAM1A was strongly associated with abundance in high-latitude (and particularly Arctic) stations (surface r = 0.508, P < 0.001; DCM r = 0.386, P = 0.017; Supplementary Fig. S12, B and C).
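The coverage-latitude correlations above pair a Pearson coefficient with a two-tailed t-test. A minimal sketch of that calculation is shown below, assuming station latitudes are signed (south negative) and converted to absolute values so that both hemispheres count as "high latitude"; the coverage values are invented and do not correspond to any Tara Oceans sMAG. The statistic t = r·√(n − 2)/√(1 − r²) is compared against Student's t with n − 2 degrees of freedom; converting it to a P-value (e.g. with `scipy.stats.pearsonr`, which reports both) is omitted to keep the example dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, compared against Student's t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Invented example: signed station latitudes and mean vertical coverage of one sMAG.
station_latitudes = [-62.0, -35.0, -5.0, 20.0, 45.0, 71.0, 79.0]
coverage = [3.1, 0.9, 0.4, 0.6, 1.5, 4.2, 5.0]

abs_lat = [abs(lat) for lat in station_latitudes]  # hemisphere-agnostic latitude
r = pearson_r(abs_lat, coverage)
t = t_statistic(r, len(abs_lat))
print(round(r, 3), round(t, 2))
```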

Growth and photophysiology of Phaeodactylum cpEnolase and cpPGAM1A knockouts across light and temperature conditions
We generated homozygous CRISPR knockout lines for both cpenolase and cppgam1A genes in the model diatom P. tricornutum. cpPGAM1A was selected over other PGAM genes (cppgam1B, cppgam2) because of its transcriptional coregulation with cpenolase, the occurrence of its encoded protein in measurable quantities in plastid proteome data (Supplementary Fig. S6; Supplementary Data Set 2), and its latitudinal expression correlation in Tara Oceans (Fig. 3; Supplementary Fig. S11).
Multiple CRISPR knockout lines were generated from two regions with unique sequences in the P. tricornutum genome for each gene [cpenolase CRISPR region 1 n = 4, CRISPR region 2 n = 3; cppgam1A CRISPR region 1 n = 2, CRISPR region 2 n = 3 (Supplementary Fig. S13A)]. Each CRISPR line was verified by sequencing to be homozygous and to contain a frame-shift mutation sufficient to impede translation of the encoded protein (Supplementary Fig. S13A). Commercial antibodies against Enolase and PGAM peptides were found not to specifically label cpEnolase and cpPGAM1A in immunoblots, so we inferred relative protein expression levels from RT-qPCR (Zhang et al. 2020). A total of 1,189 discrete Cp values were measured for two cpenolase and two cppgam1A RT-qPCR amplicons, alongside two reference genes (rps, tbp) as above (Sachse et al. 2013), for all knockout lines included in this study and two empty-vector controls. Under 19C LD conditions, the measured mRNA abundance of the targeted gene in each knockout line was significantly lower (1.8% to 39%) than that in empty-vector controls (n = 4, 1-way ANOVA, P < 0.05; Supplementary Fig. S13B). This is consistent with effective knockdown of the mutated genes, e.g. via nonsense-mediated decay (Chang et al. 2007).
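The relative mRNA abundances reported above (1.8% to 39% of control) are the kind of quantity produced by a ΔΔCp calculation. The section does not specify the exact quantification model, so the sketch below assumes the common 2^−ΔΔCp approach with 100% amplification efficiency, normalizing each target Cp to the arithmetic mean of the two reference-gene Cp values (equivalent to a geometric mean of their relative quantities). The Cp values and function names are hypothetical.

```python
from statistics import mean


def delta_cp(target_cp: float, reference_cps) -> float:
    """Target Cp minus the mean Cp of the reference genes (e.g. rps and tbp)."""
    return target_cp - mean(reference_cps)


def relative_expression(ko_target, ko_refs, ctrl_target, ctrl_refs, efficiency=2.0):
    """Knockout expression relative to control, as efficiency ** -ddCp.

    efficiency=2.0 assumes perfect doubling per cycle (100% amplification efficiency).
    """
    ddcp = delta_cp(ko_target, ko_refs) - delta_cp(ctrl_target, ctrl_refs)
    return efficiency ** (-ddcp)


# Invented Cp values: the target amplifies three cycles later in the knockout.
fraction = relative_expression(27.0, [20.0, 22.0], 24.0, [20.0, 22.0])
print(f"{fraction:.1%} of control")
```

A three-cycle delay relative to the reference genes corresponds to one eighth of the control abundance, which would fall within the range reported for the knockout lines.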
Next, we measured growth curves of cpEnolase and cpPGAM1A knockout lines compared to empty-vector controls (Fig. 4; Supplementary Data Set 4, sheets 3 to 6). We chose to target changes in light and temperature, given that both are clearly associated with cpPGAM1A and cpEnolase expression in Phaeodactylum gene expression and Tara Oceans data (Figs. 2 and 3), using the three conditions (19C CL, 19C LD, and 8C CL) previously tested by RT-qPCR. These conditions are relevant to the environment from which the type culture of Phaeodactylum (strain CCAP 1055/1) was collected (Irish Sea, 53.5°N), with measured sea temperatures (1960 to 1999) between 3 °C and 17 °C and day lengths between 7 and 17 h (Young and Holt 2007; Gachon et al. 2013).
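Growth rates from curves like these are commonly summarized as a specific growth rate, the slope of ln(cell density) against time during exponential growth. The section does not state how growth rates were derived, so the following is a sketch under that assumption only: the exponential-phase window is taken as already chosen, the cell counts are invented, and the function names are hypothetical.

```python
import math


def specific_growth_rate(times_days, densities):
    """Least-squares slope of ln(density) against time, in d^-1.

    Assumes all points lie within the exponential phase.
    """
    logs = [math.log(d) for d in densities]
    n = len(times_days)
    mt, my = sum(times_days) / n, sum(logs) / n
    num = sum((t - mt) * (y - my) for t, y in zip(times_days, logs))
    den = sum((t - mt) ** 2 for t in times_days)
    return num / den


def doubling_time(mu: float) -> float:
    """Generation time (days) corresponding to a specific growth rate mu (d^-1)."""
    return math.log(2) / mu


# Invented counts growing at exactly 0.8 d^-1 from 1e5 cells/mL.
days = [0, 1, 2, 3, 4]
cells = [1e5 * math.exp(0.8 * d) for d in days]
mu = specific_growth_rate(days, cells)
print(round(mu, 3), round(doubling_time(mu), 2))
```

Fitting a regression over several time points, rather than using only the first and last counts, makes the estimate less sensitive to a single noisy measurement.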
To test for possible off-target effects of the CRISPR constructs, we complemented mutant lines with blasticidin resistance genes linked to either cpEnolase-GFP or cpPGAM1A-GFP, modified to remove all CRISPR target sequences [Supplementary Data Set 4, sheet 2 (McCarthy et al. 2017; Buck et al. 2019)]. Despite an overall lower growth rate in all blasticidin-resistant lines compared to primary transformants, and within-line variation, comparative growth curves of 47 complemented vs placebo-transformed mutant lines revealed increased growth rates in complemented cpEnolase and cpPGAM1A knockout lines relative to blank-transformed knockout lines under 19C CL and 19C LD (Supplementary Data Set 4, sheet 7; 1-way ANOVA, 2-tailed P < 0.05). Moreover, the growth rates of complemented knockout lines overlapped with those of empty-vector controls transformed with either cpEnolase or blank complementing vectors, indicating effective rescue of the mutant phenotypes (Supplementary Data Set 4, sheet 7).
Finally, we performed comparative photophysiological measurements of knockout lines in the two conditions (19C LD and 19C CL) under which they presented an aberrant growth phenotype (see Methods). Our data indicate that the presence or absence of these enzymes does not significantly impact photosynthetic performance. The light dependencies of either the relative electron transfer rate through photosystem II [PSII; rETR(II)] or photoprotection (nonphotochemical quenching, NPQ) were very similar between control and knockout lines (Supplementary Fig. S15A; Supplementary Data Set 4, sheets 8 to 11). A slight but significant increase in the functional absorption cross-section of PSII (σPSII) was found under 19C CL in both cpEnolase (319.3 ± 22.5) and cpPGAM1A knockouts (306.6 ± 11.6) compared to controls [292.3 ± 8.2; 1-way ANOVA, P < 0.05 (Gorbunov et al. 2020)]. This elevation was suppressed in both complemented lines (Supplementary Fig. S15B; Supplementary Data Set 4, sheet 11).
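For orientation, the photophysiological quantities above can be written with their widely used textbook definitions: effective PSII yield ΦPSII = (Fm′ − F′)/Fm′, rETR(II) = ΦPSII × irradiance, and Stern-Volmer NPQ = (Fm − Fm′)/Fm′. This sketch assumes those definitions; the instrument and protocol used here (see Methods) may compute them differently. The fluorescence values are invented, while the σPSII means are those reported above (no units are given in the text, so only relative changes are computed).

```python
def phi_psii(f_prime: float, fm_prime: float) -> float:
    """Effective PSII quantum yield, (Fm' - F') / Fm', in the light-adapted state."""
    return (fm_prime - f_prime) / fm_prime


def retr(f_prime: float, fm_prime: float, par: float) -> float:
    """Relative electron transport rate through PSII: phi_PSII x irradiance."""
    return phi_psii(f_prime, fm_prime) * par


def npq(fm: float, fm_prime: float) -> float:
    """Stern-Volmer nonphotochemical quenching, (Fm - Fm') / Fm'."""
    return (fm - fm_prime) / fm_prime


def percent_change(value: float, control: float) -> float:
    """Relative difference from the control mean, in percent."""
    return 100.0 * (value - control) / control


# Invented fluorescence readings at an irradiance of 200 (arbitrary units).
print(round(retr(0.4, 0.6, 200.0), 1), round(npq(1.0, 0.6), 2))

# sigma_PSII means reported in the text (knockouts vs controls under 19C CL).
for label, sigma in [("cpEnolase", 319.3), ("cpPGAM1A", 306.6)]:
    print(label, f"{percent_change(sigma, 292.3):+.1f}%")
```

Expressed this way, the σPSII elevations amount to increases of roughly 5% to 9% over controls, consistent with their description as slight.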

Gene expression profiling of Phaeodactylum cpEnolase and cpPGAM1A knockouts
Next, we investigated the impacts of disrupting plastid glycolysis on diatom metabolism beyond photosynthesis. First, we performed quantitative RNAseq analysis using 63 RNA samples drawn from multiple knockout and empty-vector lines under all three physiological conditions (19C LD, 19C CL, and 8C CL; Supplementary Data Set 5, sheet 1; Materials and Methods). These comprised 19 cpEnolase, 27 cpPGAM1A, and 17 control line samples; 20 were harvested under 19C LD, 21 under 19C CL, and 22 under 8C CL conditions. A minimum of four samples for each genotype × treatment combination was included in all subsequent quantitative analyses. 8C CL was included despite the absence of an aberrant growth phenotype under this condition, because of the high levels of cpEnolase and cpPGAM1A gene expression inferred from RT-qPCR data (Fig. 2B; Fig. 4). Complete results are provided in Supplementary Data Set 5, sheets 5 to 11. Both cpEnolase and cpPGAM1A mRNAs were found to significantly underaccumulate in the corresponding knockout lines, consistent with RT-qPCR analysis (Supplementary Fig. S13B) and suggesting maintenance of the mutant genotypes throughout RNA sequencing. In addition, cpPGAM1B (Phatr3_J51404), but not cpPGAM2 (Phatr3_J37201), was upregulated in cpPGAM1A knockouts (but not in cpEnolase knockouts) under 19C CL conditions, which may suggest compensatory functions between cpPGAM1A and cpPGAM1B (Supplementary Data Set 5, sheet 12).
Genome-scale enrichment analyses of the in silico localizations of proteins encoded by differentially expressed genes revealed distinctive changes in organelle metabolism in the glycolysis knockouts. These effects were most evident under 19C CL, in which 90/239 (38%) of the genes differentially upregulated (mean fold-change >2, P-value < 0.05) in both cpEnolase and cpPGAM1A knockout lines compared to controls were predicted to possess chloroplast-targeting sequences by ASAFind (Gruber et al. 2015) or HECTAR (Gschloessl et al. 2008). This was significantly greater than the proportion of all genes detected in the RNAseq data that were predicted to encode chloroplast-targeted proteins (1,585/11,514, 14%; one-tailed chi-squared P < 10⁻⁵; Fig. 5A; Supplementary Data Set 5, sheet 10). These results were supported by domain enrichment analyses, indicating significant (one-tailed chi-squared P < 0.05) enrichments in light-harvesting complex (GO:0030076), photosynthesis (GO:0009765), and protein-chromophore linkage (GO:0018298) GO terms. A more detailed resolution of gene expression patterns underpinning core organelle metabolism pathways (Ait-Mohamed et al. 2020) suggested concerted upregulation of genes encoding light-harvesting complexes, photosynthesis machinery, and plastid fatty acid synthesis machinery, alongside a probable upregulation of mitochondrial respiratory complex I and ATP synthase (Supplementary Data Set 5, sheets 10 to 11). Less dramatic changes were evident under 19C LD and 8C CL, although 13 of the 51 genes (25%) inferred to be downregulated in both cpEnolase and cpPGAM1A knockout lines under 8C CL were inferred to encode chloroplast-targeted proteins by either ASAFind or HECTAR, likewise representing an enrichment compared to all genes identified within the RNAseq data (one-tailed chi-squared P < 0.05; Fig. 5A).
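The enrichment logic above (select genes passing thresholds in both knockouts, then ask whether chloroplast-targeted genes are over-represented relative to the background) can be sketched as follows. This assumes a 1-degree-of-freedom goodness-of-fit chi-squared test against the genome-wide proportion, with the one-tailed P taken as half the two-tailed tail area when the observed count exceeds expectation; the authors' exact test setup may differ, and the function names are hypothetical. The counts in the example are those reported in the text.

```python
import math


def upregulated_in_both(results_a, results_b, min_fold=2.0, max_p=0.05):
    """Gene IDs passing fold-change and P-value thresholds in both comparisons.

    results_*: dict mapping gene ID -> (mean fold-change, P-value) vs controls.
    """
    def passing(results):
        return {g for g, (fc, p) in results.items() if fc > min_fold and p < max_p}
    return passing(results_a) & passing(results_b)


def enrichment_chi2(k_set, n_set, k_bg, n_bg):
    """1-df goodness-of-fit chi-squared for the targeted fraction of a gene set.

    Expected counts use the background proportion k_bg / n_bg. Returns
    (chi2, one-tailed P for enrichment, i.e. observed > expected).
    """
    p0 = k_bg / n_bg
    exp_in = n_set * p0
    exp_out = n_set - exp_in
    chi2 = (k_set - exp_in) ** 2 / exp_in + ((n_set - k_set) - exp_out) ** 2 / exp_out
    p_two = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared with 1 df
    p_one = p_two / 2.0 if k_set > exp_in else 1.0 - p_two / 2.0
    return chi2, p_one


# Counts reported in the text: 19C CL upregulated set and 8C CL downregulated set.
background = (1585, 11514)
print(enrichment_chi2(90, 239, *background))
print(enrichment_chi2(13, 51, *background))
```

The identity P(χ²₁ > x) = erfc(√(x/2)) lets the standard library compute the tail area without a statistics package.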
To gain a more precise insight into the effects of plastid glycolysis-gluconeogenesis on P. tricornutum metabolism, we additionally validated the differential expression of eleven exemplar genes encoding chloroplast- and mitochondria-targeted proteins by qPCR in knockout and empty-vector control lines across all three conditions (Fig. 5B; Supplementary Data Set 5, sheet 12). A total of 6,612 discrete Cp values were measured for these genes, alongside the two reference genes as above, for knockout lines and two empty-vector controls under 19C LD, 19C CL, and 8C CL conditions. These genes showed relatively limited differences under 19C LD, restricted to a slight depression in the accumulation of lhcf1 (Phatr3_J18049) and chorismate mutase (Phatr3_J43277) mRNA in both cpEnolase and cpPGAM1A knockouts compared to control lines (∼50% downregulation, two-tailed t-test P < 0.05; Fig. 5B). Both knockout lines also overaccumulated (>600%; two-tailed t-test P < 10⁻⁵) mRNAs encoding mitochondrial phosphoglycerate mutase (Phatr3_J33839) under 19C LD compared to control lines (Fig. 5B).

Metabolite profiling of Phaeodactylum cpEnolase and cpPGAM1A knockouts
Next, we considered the compound effects of cpEnolase and cpPGAM1A knockout on global metabolite accumulation under each environmental condition via GC-MS profiling of 32 sugars and amino acids (Fig. 6; Supplementary Fig. S16), across 139 samples drawn from multiple knockout and control lines under 19C LD, 19C CL, and 8C CL. These comprised 53 cpEnolase, 46 cpPGAM1A, and 50 control line samples; 46 were harvested under 19C LD, 49 under 19C CL, and 44 under 8C CL conditions. A minimum of 13 samples for each genotype × treatment combination was included in all subsequent quantitative analyses. These samples were obtained from cell pellets collected from mid-exponential-phase cultures, and thus reflect the long-term impacts on metabolite accumulation in actively growing plastid glycolysis knockout lines. Complete outputs are tabulated in Supplementary Data Set 6, sheets 1 to 2.
We were unable to directly measure the accumulation of any of the substrates or products of either cpPGAM1A or cpEnolase (3-phosphoglycerate, 2-phosphoglycerate, and PEP), although we detected significantly diminished (1-way ANOVA, 2-tailed P-value < 10⁻⁵) accumulation of pyruvate, a metabolite synthesized from PEP by pyruvate kinase, in cpPGAM1A knockouts under all three conditions, and in cpEnolase knockouts under 8C CL (Fig. 6; Supplementary Fig. S16). We similarly could not directly measure the accumulation of glyceraldehyde-3-phosphate (the upstream precursor of the PGAM substrate 3-phosphoglycerate), but detected an overaccumulation of glycerol-3-phosphate (synthesized from the triose phosphate pool, via dihydroxyacetone phosphate, by glycerol-3-phosphate dehydrogenase) in cpEnolase knockout lines under all three conditions (Fig. 6).
In all three conditions, significant reductions (1-way ANOVA, 2-tailed P-value < 0.01 in both cpEnolase and cpPGAM1A knockout lines) were observed in cytoplasmic sugars, sugar derivatives, and amino acids (glucose, sucrose, myo-inositol, and histidine) in cpEnolase and cpPGAM1A knockouts compared to control lines (Fig. 6). cpEnolase and cpPGAM1A knockout lines further underaccumulated citric acid in all three conditions, and malic acid under 8C CL (Fig. 6). A probable overaccumulation of phosphoric acid was observed in all knockout lines except cpPGAM1A under 19C CL (Fig. 6; Supplementary Fig. S17). Significant (1-way ANOVA, 2-tailed P-value < 10⁻⁵) overaccumulations were identified for valine in cpEnolase and cpPGAM1A knockouts under 19C CL and 8C CL, and for methionine and ornithine under 19C CL only, alongside an underaccumulation of arginine under 19C CL only (Fig. 6).
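The genotype comparisons above rely on one-way ANOVA. As a minimal sketch of what that test computes, the function below returns the F statistic (between-group over within-group mean square) and its degrees of freedom; the metabolite values are invented, and converting F to a P-value requires the F distribution (e.g. `scipy.stats.f_oneway`, which returns both the statistic and the P-value).

```python
def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA across groups."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented relative pyruvate levels (arbitrary units) for three genotypes.
control = [1.00, 0.95, 1.08, 1.02]
cp_enolase_ko = [0.91, 0.88, 0.97, 0.93]
cp_pgam1a_ko = [0.52, 0.47, 0.58, 0.50]
result = one_way_anova_f(control, cp_enolase_ko, cp_pgam1a_ko)
print(result)
```

A large F indicates that differences between genotype means are big relative to replicate scatter; identifying which genotype differs from controls requires a follow-up pairwise comparison.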
Finally, specific differences were observed between the metabolite accumulation patterns of cpEnolase and cpPGAM1A knockout lines (Fig. 6; Supplementary Fig. S16). These include a significant (1-way ANOVA, 2-tailed P-value < 10⁻⁵) overaccumulation of three amino acids (aspartate, leucine, and phenylalanine) and one phosphorylated polyol (glycerol-3-phosphate) specifically in cpEnolase knockout lines under all three conditions, and of serine under 19C CL and 8C CL only. No corresponding overaccumulation of these metabolites was observed in cpPGAM1A knockouts. Surprisingly, glycerol-3-phosphate and serine were instead found to significantly underaccumulate under all three conditions in cpPGAM1A knockouts compared to controls (Fig. 6; Supplementary Fig. S16).

Lipid profiling of Phaeodactylum cpEnolase and cpPGAM1A knockouts
Next, we performed GC-MS (55 samples) and LC-MS (89 samples) of lipid profiles in multiple knockout and control lines under 19C LD, 19C CL, and 8C CL. GC-MS analyses comprised 18 cpEnolase, 23 cpPGAM1A, and 14 control line samples; 18 were harvested under 19C LD, 19 under 19C CL, and 18 under 8C CL conditions. LC-MS analyses comprised 24 cpEnolase, 42 cpPGAM1A, and 23 control line samples; 28 were harvested under 19C LD, 29 under 19C CL, and 32 under 8C CL conditions. A minimum of four samples for each genotype × treatment combination was included in all subsequent quantitative analyses. Outputs are listed in Supplementary Data Set 6, sheets 1 and 3 to 5. While the GC-MS data show significant (1-way ANOVA, 2-tailed P-value < 0.05) impacts of growth condition on fatty acid profiles (e.g. a decrease in C20:5 side chain lipids balanced by an increase in C16:1 side chain lipids under 19C CL, an overaccumulation of C16:3 side chain lipids under 19C LD, and of C18:0 side chain lipids under 8C CL), no substantial differences were observed between cpEnolase, cpPGAM1A, and control lines under any of the conditions studied (Supplementary Data Set 6, sheet 3).

Reaction kinetics of expressed copies of Phaeodactylum cpEnolase and cpPGAM1A
Finally, we assessed the kinetics of cpPGAM and cpEnolase in both the glycolytic and gluconeogenic directions. Previous studies (e.g. in animal renal and liver tissue) indicate reversible reaction kinetics for both Enolase and PGAM enzymes. The reaction rates of Enolase and PGAM differ little between the glycolytic and gluconeogenic directions in vivo, with measured enolase reaction rates in rat kidney tissue equivalent to approximately 14,000 µmol g⁻¹ dry weight h⁻¹ in the glycolytic direction and 20,000 µmol g⁻¹ dry weight h⁻¹ in the gluconeogenic direction (Krebs 1963; Scrutton and Utter 1968; Reinoso et al. 1997). In contrast, purified Enolase and PGAM typically show greater affinity for 3-PGA (glycolysis) than for PEP (gluconeogenesis), with a 5- to 8-fold difference in Km measured in mammalian, yeast, and Trypanosoma brucei enzymes (Rider and Taylor 1974; Hannaert et al. 2003).
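One way to reconcile a several-fold Km difference with similar directional rates is the Michaelis-Menten relationship v/Vmax = [S]/(Km + [S]): at substrate concentrations well above both Km values, the two directions approach the same fractional rate. The sketch below illustrates this with hypothetical Km values chosen to be 6-fold apart (within the 5- to 8-fold range cited above) and assumes equal Vmax in both directions; neither assumption comes from measured cpEnolase or cpPGAM1A parameters.

```python
def fractional_rate(substrate_mm: float, km_mm: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: [S] / (Km + [S])."""
    return substrate_mm / (km_mm + substrate_mm)


# Hypothetical Km values 6-fold apart; equal Vmax assumed for both directions.
KM_GLYCOLYTIC_MM = 0.5      # hypothetical Km for 3-PGA
KM_GLUCONEOGENIC_MM = 3.0   # hypothetical Km for PEP

for s in (0.5, 12.0):
    ratio = fractional_rate(s, KM_GLYCOLYTIC_MM) / fractional_rate(s, KM_GLUCONEOGENIC_MM)
    print(f"[S] = {s} mM: glycolytic/gluconeogenic rate ratio = {ratio:.2f}")
```

Under these assumptions, the directional rate ratio shrinks from 3.5 at 0.5 mM substrate to 1.2 at 12 mM, so near-saturating assays are expected to mask much of an underlying affinity difference.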
Using a previously defined assay (Sutherland et al. 1949; Zhang et al. 2020) with modified versions of each protein (codon-optimized and lacking signal peptides) expressed in E. coli, we measured the consumption of reduced nicotinamide adenine dinucleotide (NADH) coupled to either lactate dehydrogenase (glycolytic direction) or glyceraldehyde-3-phosphate dehydrogenase (gluconeogenic direction). Six replicate experiments were performed for each reaction (Supplementary Fig. S20). Both enzymes were inferred to possess reversible reaction kinetics, consuming NADH when supplied both with 3-PGA (in the glycolytic direction) and with PEP (in the gluconeogenic direction; Fig. 8; Supplementary Fig. S20). The measured reaction rates were comparable in both directions, albeit marginally greater in the glycolytic than in the gluconeogenic direction (e.g. 2.47 ± 0.43 vs 1.97 ± 0.29 nmol NADH consumed per gram free weight of purified enzyme per minute when supplied with 12 mM 3-PGA or 12 mM PEP, respectively; Fig. 8).
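Coupled assays of this kind are typically read as the decline in NADH absorbance at 340 nm, converted to an amount of NADH via the Beer-Lambert law with the molar absorption coefficient of NADH (6.22 mM⁻¹ cm⁻¹). The text does not state the readout wavelength, assay volume, or path length, so these are assumptions here; the slope and volume below are invented, and the function names are hypothetical. The last line simply computes the directional ratio of the mean rates reported above.

```python
NADH_EPSILON_340 = 6.22  # mM^-1 cm^-1, absorption coefficient of NADH at 340 nm


def nadh_rate_nmol_per_min(delta_a340_per_min: float, volume_ml: float, path_cm: float = 1.0) -> float:
    """Convert a linear A340 slope into nmol NADH oxidized per minute (Beer-Lambert)."""
    conc_change_mm_per_min = -delta_a340_per_min / (NADH_EPSILON_340 * path_cm)
    return conc_change_mm_per_min * volume_ml * 1000.0  # mM x mL = umol; x1000 -> nmol


def specific_activity(rate_nmol_per_min: float, enzyme_g: float) -> float:
    """Normalize a rate to the mass of purified enzyme in the assay."""
    return rate_nmol_per_min / enzyme_g


# Invented slope: A340 falls by 0.0311 per minute in a 0.2 mL assay, 1 cm path.
rate = nadh_rate_nmol_per_min(-0.0311, 0.2)
print(round(rate, 3))

# Directional ratio of the mean rates reported above (3-PGA vs PEP, 12 mM each).
print(round(2.47 / 1.97, 2))
```

The negative sign converts a falling absorbance into a positive consumption rate; the glycolytic rate is about a quarter higher than the gluconeogenic rate, in line with the "marginally greater" description.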

Discussion
We characterize a lower glycolytic-gluconeogenic pathway associated with diatom plastids, relating specifically to two plastid-targeted proteins, cpEnolase and cpPGAM1A, and focusing on the model species P. tricornutum. Our data position plastid glycolysis-gluconeogenesis as arising in a recent ancestor of diatoms and their closest relatives [e.g. pelagophytes and dictyochophytes (Nonoyama et al. 2019)]. The presence of plastid glycolysis in haptophytes may result from endosymbiotic transfers into this group from a pelagophyte/dictyochophyte-related alga, as suggested in previous studies (Dorrell et al. 2017; Jiang et al. 2023). We further show that plastidial lower glycolysis-gluconeogenesis has a limited distribution across the algal tree of life, with no examples in primary red algae and few in primary green algae (Supplementary Fig. S1; Supplementary Fig. S12). It is possible that the occurrence of organelle-targeted isoforms of these enzymes is underestimated, e.g. due to the lower sensitivity of diatom- and plant-trained targeting predictors on other algal groups (Fuss et al. 2013; Gruber et al. 2015). We propose that diatom plastid glycolysis most likely originated through the duplication and retargeting of mitochondrial respiratory enzymes (Fig. 1).
Using meta-genomic data from Tara Oceans, we show that diatom plastid glycolysis is likely highly expressed at high latitudes (Fig. 3; Supplementary Figs. S8 to S11), which are subject to extreme photoperiods and low temperatures. These data are further supported by the collection sites of cultured species, with no occurrences of cultured diatoms lacking plastid-targeted PGAM enzymes beyond 50°N (Supplementary Fig. S1B), and by evidence from MAGs, in which diatoms that possess apparent plastid-targeted Enolase and PGAM enzymes show an associative preference for high latitudes, while those that lack them do not (Supplementary Fig. S12). These enrichments appear to be largely specific to diatoms: polar haptophytes, cryptomonads, and other ochrophytes lacking apparent plastidial glycolysis are found further than 60°N and 70°S, considering both cultured species and MAG data. A potential exception is the parallel recruitment of plastidial glycolysis in uncultured members of the prasinophyte genus Micromonas, which are abundant in high-latitude Tara stations (Lovejoy et al. 2007). We thus tentatively propose that lower-half plastid glycolysis correlates with diatom occurrence in high-latitude Tara Oceans stations, with diatoms that lack identifiable copies of these proteins absent from these stations, and other algal groups (except potentially Micromonas) showing no preference for plastidial glycolysis at high latitudes.
Figure 7. LC-MS lipid distributions in glycolysis-gluconeogenesis mutant lines. A) Scatterplots of relative proportions of MGDG and TAG in total lipid LC-MS samples from cpEnolase and cpPGAM1A knockout lines and empty-vector controls under each growth condition, showing increased MGDG:TAG ratios in glycolysis knockout lines under 19C, and the inverse relationship in cpEnolase knockout lines only under 8C. B) Violin plots of relative abundances of three further lipid categories inferred to differentially accumulate in glycolysis knockout lines under different growth conditions. Significant differences between knockout and control lines (1-way ANOVA) are marked with asterisks.
We are hesitant to state that plastid glycolysis is an adaptive feature of diatoms toward high latitudes, given that it apparently originated in a common ancestor of diatoms and several other algal groups (i.e. pelagophytes and dictyochophytes), and is retained in species such as Phaeodactylum, which is typically associated with intermediate latitudes (Rastogi et al. 2020). cpEnolase and cpPGAM therefore cannot be viewed as having been gained by specific diatom species in response to environmental selection. An open question, particularly given the largely latitude-insensitive distributions of diatom MAGs lacking plastid glycolysis, is to what extent diatoms that have secondarily lost their plastid glycolytic pathway are abundant in nature (Supplementary Fig. S12). Ultimately, the physiological functions of diatom plastid glycolysis will be best identified through competition assays, e.g. between diatom species with different plastid carbon metabolism arrangements, or between Phaeodactylum knockout and empty-vector control lines under each condition (Siegel et al. 2020).
Nonetheless, from our analysis of published Phaeodactylum transcriptome data and RT-qPCRs, we note that both cpEnolase and cpPGAM1A genes are transcriptionally induced in response to both long-day conditions and low temperatures (Fig. 2B; Supplementary Fig. S7). These biases are particularly interesting given the growth analysis of P. tricornutum knockout lines. In particular, we observe more intense growth defects in knockout lines under continuous illumination than under light:dark cycles (Fig. 4; Supplementary Fig. S9), which, alongside gene expression data, suggests an increased importance of plastid glycolysis in diatoms subject to long days. In contrast, under low temperatures, no difference was observed between the growth rates of glycolysis knockouts and control lines (Fig. 4). We note that the relative expression of cpEnolase and cpPGAM1A is even greater at 8C than at 19C under continuous light, and it is possible that flux occurs through plastid glycolysis at low temperature despite the absence of a clear aberrant growth phenotype in knockout vs control lines. We thus tentatively propose that lower glycolysis-gluconeogenesis may have multiple functions in the diatom plastid, dependent on both light and temperature conditions.
Considering the observed phenotypes of knockout and control lines (Figs. 4 to 7; Supplementary Figs. S14 to S19) and the reversible kinetics of expressed enzymes, we suggest potential functions of the lower half of plastid glycolysis-gluconeogenesis in diatoms under 19C LD, 19C CL, and 8C CL conditions (Fig. 8).
Overall, our suggested roles for cpEnolase and cpPGAM1A predominantly favor metabolic flux in the glycolytic direction, reflecting the underaccumulation of pyruvate in cpPGAM1A knockouts and the overaccumulation of glycerol-3-phosphate in cpEnolase knockouts (Fig. 6). We also base this hypothesis on the innate metabolic activity of the Calvin-Benson-Bassham Cycle, which is likely to yield a high relative abundance of triose phosphate in the plastid under illuminated, photosynthetically active conditions, although we note that the diatom triose phosphate transporters studied to date show higher transport affinity for PEP than for DHAP, which may facilitate substrate supply for gluconeogenic activity (Moog et al. 2020) (Fig. 8). These results are nonetheless inferential, being based on the long-term accumulation patterns of stable metabolites and the expression of implicated metabolic genes. Whilst they would be more effectively validated via direct flux measurements, e.g. comparative ¹³C-glycerol or ¹³C-glucose labeling of glycolysis knockout and control lines (Zheng et al. 2013; Huang et al. 2015), this was beyond the scope of the current study.
Under 19C LD, we observe limited gene expression changes in cpPGAM1A and cpEnolase knockout lines, except (as inferred from qPCR) a downregulation of plastid chorismate mutase, which forms part of the plastid shikimate pathway that typically consumes PEP (Bromke 2013) and may form a primary acceptor of glycolytic products (Fig. 5B). One of the downstream products of chorismate mutase activity, phenylalanine, does seem to overaccumulate in the metabolite pools of cpEnolase mutants only, including under these conditions, pointing to potentially more complex fluxes (discussed below). We also note an upregulation of mitochondrial PGAM in both lines (Fig. 5B), which might relate to a greater level of mitochondrial glycolysis, e.g. of exported plastid glyceraldehyde-3-phosphate, in the knockout lines. Strikingly, both mutant lines underaccumulate TCA cycle intermediates (citric acid in both lines, and fumaric and malic acid in cpPGAM1A only), which may suggest less retention of metabolized sugar in the mitochondrion. Finally, both mutants also underaccumulate sugars typically synthesized in the cytosol (sucrose and myo-inositol), which we interpret to imply less excess fixed carbon in the knockout compared to control lines (Fig. 6). Overall, these data seem to point to less efficient carbon usage, and an overall redirection of plastidial triose phosphate from plastid or cytoplasmic anabolic reactions to mitochondrial catabolism, in knockout lines.
We also note some evidence for lipid remodeling in glycolysis mutant lines. This includes a relative overaccumulation of galactolipids in lieu of TAGs, and of short-chain fatty acids in lieu of longer equivalents (Fig. 7; Supplementary Fig. S18). Previous studies have noted the importance of lipid metabolism in diatom stress responses (Zulu et al. 2018) and that most or all diatom lipid synthesis occurs directly in the plastid (Huang et al. 2024). Many of the metabolic reactions required for lipid synthesis, including acyl-CoA synthesis from pyruvate (Maréchal and Lupette 2020), glycerol-3-phosphate synthesis from glyceraldehyde-3-phosphate (Kroth et al. 2008), and glucosyl-1-phosphate synthesis from cytoplasmic glucosyl-1-phosphate (Zhu et al. 2016), are likely to be impacted by plastid carbon metabolism. Specifically, an underaccumulation of TAGs and fatty acids in lieu of galactolipids would also suggest a lower ratio of pyruvate (for acyl-CoA synthesis) to glyceraldehyde-3-phosphate (for galactosyl-phosphate synthesis) in glycolysis knockout lines (Demé et al. 2014), consistent with the underaccumulation of pyruvate observed in our metabolomic data (Fig. 6). We infer that these changes are probably driven by substrate limitation, as we observe no changes in the transcription of genes involved in fatty acid synthesis in glycolysis knockout lines, nor do we observe increased expression of cpEnolase or cpPGAM1A under cellular conditions (N and P limitation) known to induce diatom lipid accumulation [Fig. 5B; Supplementary Fig. S7 (Abida et al. 2015)].
Under 19C CL, we observed much more dramatic remodeling of cellular transcription in knockout lines compared to controls (Fig. 5A). This is notably concordant with greater expression of cpEnolase and cpPGAM1A in wild-type cells (Fig. 2B) and an enhanced growth defect in knockout lines, together suggesting greater potential flux through this pathway than under 19C LD (Fig. 4B). The transcriptional changes include greater overall photosynthesis gene expression, e.g. of Lhcf1 (Fig. 5B), which was corroborated in photophysiological analyses by a larger PSII antenna size, i.e. a larger functional absorption cross-section [σPSII (Supplementary Fig. S15)]. It should be noted that an increase in PSII antenna size does not necessarily change the quantum yield of individual PSII reaction centers; the increased σPSII is therefore independent of the measured Fv/Fm, which remains equivalent between knockout and control lines (Supplementary Fig. S15). We did not observe consistent differences in the expression of nitrogen or phosphorus stress metabolism genes, or of genes of the P. tricornutum biophysical carbon concentrating mechanism, in knockout lines, suggesting that these differences were not caused by N, P, or CO₂ limitation in the control lines (Supplementary Data Set 5, sheets 4 to 5) (McCarthy et al. 2017; Nawaly et al. 2023). Nor did we measure differences in photosynthetic performance (electron transport) or an upregulation of genes encoding proteins involved in photoprotection, e.g. LhcX family or xanthophyll cycle enzymes, in knockout lines under 19C CL [Fig. 5B; Supplementary Fig. S15; Supplementary Data Set 4, sheet 12 (Buck et al. 2019; Bai et al. 2022)], suggesting that the differential expression of photosynthesis genes in the knockout lines does not directly influence photosynthesis.
In contrast, from RNAseq and qPCR, we observed an upregulation of multiple mitochondrial NADH dehydrogenase and ATP synthase subunits and a downregulation of TCA cycle enzymes in glycolysis knockout lines (Fig. 5B; Supplementary Data Set 5, sheet 12). Our metabolomic data further show an underaccumulation of citric acid, as under 19C LD conditions, but also of arginine, which is synthesized in diatom mitochondria from glutamate and aspartate via the urea cycle (Allen et al. 2011; Bromke 2013). We interpret these phenotypes overall as an increase in mitochondrial respiratory electron transport in glycolysis knockout lines, without necessarily an increase in mitochondrial primary metabolic activity. Previous studies have noted the important role of diatom mitochondria in dissipating excess plastid reducing potential (Bromke 2013; Bailleul et al. 2015; Broddrick et al. 2019), and we wonder whether the phenotypes observed in knockout lines under 19C CL conditions relate to the respiratory dissipation of plastidial NADPH.
It remains to be determined what routes beyond plastidial glycolysis contribute substrates (e.g. PEP and pyruvate) to the P. tricornutum pyruvate hub. Previous studies have noted that diatom plastid triose phosphate transporters may be able to transport PEP directly from the cytoplasm, and one of these (Phatr3_J54017) is indeed upregulated in both cpEnolase and cpPGAM1A knockout lines under 19C CL (Supplementary Data Set 5, sheet 3) (Moog et al. 2020; Liu et al. 2022). Elsewhere, our data suggest that amino acids may modify the concentrations of P. tricornutum plastid PEP and/or pyruvate. These data include the overexpression of plastid alanine transaminase in cpPGAM1A knockouts (Fig. 5B), and the overaccumulation in both knockout lines of amino acids synthesized either from pyruvate (valine) or from PEP (aspartate via PEP carboxylase, and methionine from aspartate), or more broadly involved in plastid amino acid recycling [ornithine and glutamate, in the diatom plastid ornithine cycle (Levering et al. 2016; Smith et al. 2019; Yu et al. 2022); Fig. 6]. The direction (import or export) and significance of these amino acid fluxes will be best determined, e.g. with an inducible knockout mutant compromised for both plastidial glycolysis and amino acid incorporation.
Under 8C CL, we identify an overaccumulation of mRNAs encoding plastid biogenesis and mitochondrial glycolytic proteins, an overaccumulation of short-chain amino acids (valine), and an underaccumulation of cytoplasmic sugars and amino acids (glucose and histidine) in cpEnolase and cpPGAM1A knockouts relative to controls (Figs. 5B and 6). We also note underaccumulations of pyruvate in both knockout lines (Fig. 6). Knockout lines under 8C conditions, however, have additional phenotypes not observed at 19C. These include an overall enrichment in downregulated genes encoding plastid-targeted proteins (Fig. 5A), and a specific overaccumulation of mRNAs encoding a TCA cycle enzyme (citrate synthase) and a possible nonphotochemical quenching-associated protein [LhcX4; Fig. 5B (Bailleul et al. 2015; Murik et al. 2019)]. Finally, specific differences are observed between cpEnolase and cpPGAM1A knockout lines at 8C. These include an overaccumulation of TAGs and SQDGs over glucosyl-lipids, and the overaccumulation of aspartate and phenylalanine, in cpEnolase knockouts only (Figs. 6 and 7; Supplementary Fig. S17).
The more complex phenotypes observed in our knockout lines at 8C, and particularly the differences between cpPGAM1A and cpEnolase knockouts, may have several explanations. Biochemical studies typically report a greater catalytic activity for PGAM than for Enolase, although this may be offset by the greater relative abundance of cpEnolase than cpPGAM1A in Phaeodactylum plastids [Supplementary Fig. S6A (Scrutton and Utter 1968; Huang et al. 2024)]. The loss of cpPGAM1A could also be compensated in knockout lines by functionally redundant proteins (cpPGAM1B and cpPGAM2) in the Phaeodactylum plastid, whereas cpEnolase is functionally non-redundant; however, none of these genes is specifically induced in RNAseq data of cpPGAM1A knockouts under 8C CL (Supplementary Data Set 5, sheet 3), suggesting an absence of specific compensation in the cpPGAM1A mutant lines. We note that several of the phenotypes associated with 8C CL, and with cpEnolase knockouts specifically, relate to the accumulation either of acyl-CoA (TAGs) or of PEP-derived products (aspartate and phenylalanine), which may suggest an impediment of the gluconeogenic rather than the glycolytic activity of cpEnolase and cpPGAM. The reversibility of the cpPGAM1A and cpEnolase reactions is confirmed by enzymatic data (Fig. 8), but it remains to be determined to what extent these enzymes function bidirectionally in vivo. It also remains to be determined how these functions affect the growth kinetics and viability of diatoms in the wild, given the limited differences in growth rate observed in the lab between knockout and control lines (Fig. 4).
The complex phenotypes for diatom plastid glycolysis inferred from environmental and experimental data contrast with those for plant plastid glycolysis, with (for example) A. thaliana cpEnolase and cpPGAM mutants presenting relatively limited aberrant phenotypes (Prabhakar et al. 2009;Andriotis et al. 2010).We note that the cytoplasmic and respiratory plant Enolase and PGAM1 isoforms, alongside having predominant impacts on plant carbon flux, also have important moonlighting roles in plant development, immune responses and even in the structural coordination of plastids and mitochondria (Zhao and Assmann 2011;Zhang et al. 2020;Yang et al. 2022).We similarly anticipate that further surprises will be identified for the functions of diatom plastid glycolysis, and for this still poorly understood pathway in the photosynthetic tree of life.

Culture conditions
P. tricornutum strain Pt1.86 was grown in enhanced seawater (ESAW) medium supplemented with vitamins, but without silicon or added antibiotics, under 50 μE m⁻² s⁻¹ white light. Light profiles were measured with a SpectraPen photofluorometer (Photon Systems Instruments, Czech Republic) and are provided in Supplementary Data Set 4, sheet 13. Cultures were grown under one of four light, temperature, and shaking regimes. For general molecular work and transformation, cultures were grown at 19 °C with 12-h light:12-h dark cycling and shaken at 100 rpm, following the established methodology of Falciatore et al. (1999). For comparative physiology work, we were unable to replicate shaking conditions at low temperature, and therefore used conditions without shaking: 19 °C with 12-h light:12-h dark cycling ("LD" growth conditions and physiological analysis); 19 °C with 24-h continuous light ("CL" growth conditions and physiological analysis); or 8 °C with 24-h continuous light ("8C" growth conditions and physiological analysis). All cultures reached mid-exponential Fv/Fm values of >0.6, suggesting that the absence of shaking did not impair photosynthetic efficiency (Supplementary Data Set 5, sheet 8).
Batch culturing of P. tricornutum for genetic manipulation was performed under fluorescent lamps. Physiological experiments at 19 °C were principally performed in an AlgaeTron AG230 (Photon Systems Instruments) with cool white LED (WIR) illumination; technical specifications are described in https://growth-chambers.com/data/algaetron-ag-230/download/AlgaeTron_AG_230_Manual2021finalweb.pdf. Growth experiments at 8 °C were performed using a low-temperature-adapted cool white LED (WIR, ECCLIM). Details of all three spectra used, as measured with a SpectraPen (PSI), are provided in Supplementary Data Set 4, sheet 13.

Phylogenetic identification of plastid lower glycolysis-gluconeogenesis enzymes
Plastid-targeted enzymes of the lower half of glycolysis were searched across 1,673 plant and algal genomes and transcriptomes (Supplementary Data Set 1, sheet 1). Briefly, this involved an initial search with P. tricornutum genes annotated as pgam (Phatr3_J17086, Phatr3_J51404, Phatr3_J5605, Phatr3_J5629, Phatr3_J8982, Phatr3_J37201, and Phatr3_J47096) or enolase (Phatr3_draftJ1192, Phatr3_draftJ1572, and Phatr3_J41515), using translated peptide sequences with BLASTp and a threshold e-value of 10⁻⁵, followed by a reciprocal BLASTp with the setting -max_target_seqs 1 against the entire P. tricornutum genome to retrieve the best homologs. For PGAM, where P. tricornutum queries failed to retrieve homologs in >50% of searched libraries, a second BLASTp was performed with query peptide sequences from A. thaliana (AT2G17280, AT1G09780, AT3G05170, AT3G08590, AT3G50520, AT5G04120, and AT5G64460), and a reciprocal BLASTp was performed against the P. tricornutum genome supplemented with these sequences. Similar approaches were subsequently used to identify equivalent plastid glycolysis proteins from Tara Oceans MAGs, as assembled by Delmont et al. (2022).
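The reciprocal best-hit logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes BLASTp results have already been parsed into (query, subject, e-value, bitscore) tuples and treats the highest bitscore as the best hit. The function names and the placeholder identifiers (`lib_1`, `non_query_gene`, etc.) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One parsed BLASTp hit: (query_id, subject_id, e-value, bitscore).
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Return the highest-bitscore subject for each query among hits passing the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_filter(forward: List[Hit], reverse: List[Hit],
                      query_family: Dict[str, str],
                      max_evalue: float = 1e-5) -> Dict[str, str]:
    """Keep library sequences whose best reverse hit is one of the original queries.

    forward: P. tricornutum queries searched against a library.
    reverse: library hits searched back against the whole P. tricornutum proteome.
    query_family: query ID -> family label (e.g. "enolase", "pgam").
    Returns library sequence ID -> family label.
    """
    candidates: Set[str] = {subj for _, subj, ev, _ in forward if ev <= max_evalue}
    back = best_hits(reverse, max_evalue)
    return {seq: query_family[back[seq]]
            for seq in candidates
            if seq in back and back[seq] in query_family}


# Toy example with placeholder identifiers.
forward_hits = [("Phatr3_J41515", "lib_1", 1e-40, 300.0),
                ("Phatr3_J41515", "lib_2", 1e-8, 60.0),
                ("Phatr3_J5605", "lib_3", 1e-3, 30.0)]        # lib_3 fails the cutoff
reverse_hits = [("lib_1", "Phatr3_J41515", 1e-40, 300.0),
                ("lib_2", "non_query_gene", 1e-20, 120.0),    # hypothetical better hit
                ("lib_2", "Phatr3_J41515", 1e-8, 60.0)]
families = {"Phatr3_J41515": "enolase", "Phatr3_J5605": "pgam"}
print(reciprocal_filter(forward_hits, reverse_hits, families))  # {'lib_1': 'enolase'}
```

Here `lib_2` is discarded because its best reverse hit is not one of the query genes, which is the property that makes the search reciprocal rather than one-directional.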
The domain content of each potential homolog was identified using hmmscan and version 33.1 of the Pfam database (Mistry et al. 2021). Only Enolase sequences with >90% predicted domain coverage of both the Enolase_N and Enolase_C domains, and PGAM sequences with >50% domain coverage of the His_Phos domain (based on the corresponding coverage observed in P. tricornutum sequences), were considered complete. Sequences in which the N-terminus of the region homologous to the Pfam domain was located within the first 20 aa of the predicted sequence (i.e. less than the length of a typical plastid-targeting sequence) (Emanuelsson et al. 2007) were considered to lack credible targeting sequences. All remaining proteins were scanned, considering both complete proteins and sequences trimmed to the first encoded N-terminal methionine, using TargetP [with a plant scoring matrix (Emanuelsson et al. 2007)], PredAlgo (Tardif et al. 2012), HECTAR (Gschloessl et al. 2008), and ASAFind [with SignalP 5.0 (Gruber et al. 2015; Almagro Armenteros et al. 2019)]. Sequences from primary plastid-containing organisms (plants, green and red algae, glaucophytes) predicted to possess a plastid-targeting sequence by either TargetP or PredAlgo, and sequences from secondary plastid-containing organisms predicted to possess a plastid-targeting sequence by either HECTAR or ASAFind, considering both complete and N-trimmed sequence models, were annotated as putatively plastid-targeted.
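The completeness and targeting rules above reduce to a few boolean checks, sketched below. The sketch assumes that Pfam coverages (as fractions of each HMM), the start position of the Pfam-homologous region, and the set of predictors giving a positive plastid-targeting call on either the complete or the N-trimmed model have already been tabulated. The `Candidate` class, its field names, and the output labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Predictors accepted as evidence of plastid targeting for each lineage type.
PREDICTORS = {
    "primary": {"TargetP", "PredAlgo"},   # plants, green/red algae, glaucophytes
    "secondary": {"HECTAR", "ASAFind"},   # secondary plastid-containing lineages
}


@dataclass
class Candidate:
    family: str                        # "enolase" or "pgam"
    lineage: str                       # "primary" or "secondary"
    domain_coverage: Dict[str, float]  # Pfam domain -> fraction of the HMM covered (0-1)
    domain_start: int                  # 1-based start of the Pfam-homologous region
    positive_calls: Set[str] = field(default_factory=set)  # tools with a positive call (full or N-trimmed)


def is_complete(c: Candidate) -> bool:
    """Apply the family-specific Pfam coverage thresholds."""
    if c.family == "enolase":
        return all(c.domain_coverage.get(d, 0.0) > 0.90 for d in ("Enolase_N", "Enolase_C"))
    if c.family == "pgam":
        return c.domain_coverage.get("His_Phos", 0.0) > 0.50
    return False


def annotate(c: Candidate) -> str:
    if not is_complete(c):
        return "incomplete"
    if c.domain_start <= 20:  # too little upstream sequence for a targeting peptide
        return "no credible targeting sequence"
    if c.positive_calls & PREDICTORS[c.lineage]:
        return "putatively plastid-targeted"
    return "not predicted plastid-targeted"


example = Candidate("enolase", "secondary", {"Enolase_N": 0.95, "Enolase_C": 0.97}, 45, {"ASAFind"})
print(annotate(example))  # putatively plastid-targeted
```

Keeping the lineage-to-predictor mapping in a single table makes it explicit that, for example, a HECTAR call alone is not counted for a primary plastid lineage.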
A more detailed phylogenetic analysis was performed using Enolase and PGAM homologs obtained from a subset of 289 complete cryptomonad, haptophyte and stramenopile genomes and transcriptomes in the above library, alongside homologs identified from a further 85 prokaryotic and eukaryotic genomes sampled with taxonomic balance from across the remainder of the tree of life (Liu et al. 2022).Sequences were also screened for mitochondrial presequences using HECTAR (Gschloessl et al. 2008), and MitoFates, run with a threshold value of 0.35 (Fukasawa et al. 2015).
The pooled set of sequences was aligned first with MAFFT v7.0 under the --auto setting, followed by the in-built alignment program in Geneious v10.0.9 under default settings (Kearse et al. 2012; Katoh et al. 2017). Incomplete and poorly aligned sequences, alongside taxonomically uninformative N- and C-terminal regions, were removed from the alignment manually, followed by trimAl with the setting -gt 0.5 (Capella-Gutiérrez et al. 2009). Phylogenetic analyses were performed with MrBayes v3.2 and RAxML v8, integrated into the CIPRES webserver (Stamatakis 2014; Miller et al. 2015). MrBayes trees were run for 10,000,000 generations with the GTR, Jones, and WAG substitution matrices, 4 starting chains, and sumt and sump burnin fractions set to 0.5; all cold chains were confirmed to have reached a P-value plateau below 0.1 prior to the start of consensus building. RAxML trees were run with the GTR, JTT, and WAG substitution matrices, 350 to 400 ML generations, and automatic bootstopping. Phylogenies were rooted either between bacterial and eukaryotic sequences (Enolase) or on the midpoint (PGAM1, PGAM2), owing to the absence of a single monophyletic bacterial outgroup for the latter two proteins. A summary of these analyses is provided in Supplementary Data Set 1.
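For readers unfamiliar with gap-based column trimming, the following sketch approximates the intent of a trimAl `-gt 0.5` filter: an alignment column is kept only if at least half of the sequences carry a residue rather than a gap at that position. It is an illustrative re-implementation under that assumption, not trimAl's code, and trimAl's exact boundary handling may differ. The function name and the toy alignment are hypothetical.

```python
from typing import Dict


def gap_filter(alignment: Dict[str, str], min_non_gap: float = 0.5,
               gap_chars: str = "-.?") -> Dict[str, str]:
    """Drop alignment columns in which the fraction of non-gap characters is below min_non_gap."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if sum(s[i] not in gap_chars for s in seqs) / len(seqs) >= min_non_gap]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


toy = {"a": "MK-LV", "b": "M--LV", "c": "MKA-V", "d": "M---V"}
print(gap_filter(toy))  # column 3 (A in only 1 of 4 sequences) is removed
```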

Analysis of previously published Phaeodactylum data
The mean relative abundance of peptides corresponding to Enolase and PGAM sequences was retrieved from published mass spectrometry data of Phaeodactylum plastid-enriched fractions and total cell pellets, following Huang et al. (2024). The two datasets were positively correlated (r = 0.891, n = 901, P < 10⁻⁵), considering all proteins recovered in the plastid-associated fraction with a previously suspected plastid localization (plastid-encoded proteins, or plastid-targeted nucleus-encoded proteins inferred using combined ASAFind and HECTAR predictions). These data are provided in Supplementary Data Set 2, sheet 1.
The expression trends of Phaeodactylum plastid glycolysis proteins (cpEnolase, cpPGAM1A, PGAM2) were assessed in ComBat-normalized RNAseq data (Ait-Mohamed et al. 2020) assembled from three prior studies, relating to induced nitrate limitation (nitrate reductase knockout), phosphate limitation and resupply, and iron limitation over a circadian cycle (Cruz de Carvalho et al. 2016; Smith et al. 2016; McCarthy et al. 2017). A second set of comparisons was performed using normalized Phaeodactylum microarray data [summarized by Ashworth et al. (2016)], particularly relating to changes in light quality, wavelength, and circadian time point. To enable global analyses of coregulation with cpEnolase and cpPGAM enzymes, these two datasets were converted into ranked values (i.e. for Spearman correlation) and merged, following Liu et al. (2022). These data are summarized in Supplementary Data Set 2, sheets 2 to 4.
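One way to read the rank-and-merge step is sketched below. It assumes that each gene's expression profile is rank-transformed within each dataset (ties receiving their average rank), that the ranked profiles are then concatenated across datasets, and that a Pearson correlation on the merged ranks serves as the Spearman-type coregulation score. The function names and toy values are hypothetical, and the exact merging procedure of Liu et al. (2022) may differ.

```python
from statistics import mean
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def merge_ranked(*datasets: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rank each gene within each dataset, then concatenate; keep genes present in all datasets."""
    shared = set.intersection(*(set(d) for d in datasets))
    return {g: [r for d in datasets for r in ranks(d[g])] for g in shared}


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rnaseq = {"cpEnolase": [1.0, 5.0, 3.0], "geneX": [10.0, 50.0, 30.0]}
arrays = {"cpEnolase": [0.2, 0.1], "geneX": [2.0, 1.0]}
merged = merge_ranked(rnaseq, arrays)
print(pearson(merged["cpEnolase"], merged["geneX"]))  # identical rank profiles -> r of 1
```

Ranking within each dataset before merging removes scale differences between RNAseq counts and microarray intensities, which is the main reason to compare the two platforms on ranks rather than raw values.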

Tara Oceans Analysis
The complete Tara Oceans and Tara Oceans Polar Circle libraries of meta-genome and meta-transcriptome diversity (Carradec et al. 2018; Royo-Llonch et al. 2021) were searched for orthologues of diatom cpEnolase, cpPGAM1A, and PGAM2 sequences via a phylogenetic reconciliation approach benchmarked in previous studies (Kazamia et al. 2018; Liu et al. 2022). This approach uses the combined outputs of hmmer, BLAST best-hit, and single-gene tree topologies to retain only those Tara Oceans meta-genes that reconcile as monophyletic with a defined query set, in this case the plastid-targeted diatom isoforms of each enzyme. Exemplar tree topologies are shown in Supplementary Fig. S8.
First, an HMM (hidden Markov model) was constructed from all diatom plastid-targeted sequences in the untrimmed alignments for each phylogeny, as detailed above, and searched against the complete Tara Oceans catalog with hmmer (http://hmmer.org) at an e-value of 10⁻¹⁰ to identify putative meta-gene homologs of each protein. Matching sequences were extracted and searched by BLASTp against the complete P. tricornutum genome (Rastogi et al. 2018). Only sequences that retrieved a best hit against an Enolase or PGAM sequence (and therefore likely correspond to homologs of each protein) were retained. Next, the retained sequences were similarly searched by BLASTp against the complete untrimmed alignment of cultured Enolase and PGAM sequences. Only sequences whose best hit was a diatom plastid-targeted isoform were retained, eliminating non-diatom homologs and homologs of diatom non-plastid sequences. Finally, sequences were combined with the untrimmed alignment of cultured sequences for each gene and realigned using the same MAFFT, Geneious, and trimAl pipeline as defined above. Curated alignments were screened with RAxML trees using the JTT substitution matrix, as above. Only Tara Oceans sequences that resolved within a monophyletic clade with diatom plastid-targeted proteins, defined as all sequences positioned closer, on a midpoint rooting of the tree, to diatom plastid-targeted proteins than to any non-diatom or non-plastid-targeted sequences, were extracted for further analyses.
Relative abundances were calculated for the total occurrence of all phylogenetically verified diatom plastid-targeted proteins in both meta-transcriptome and meta-genome data. Relative expression levels of each gene were estimated by reconciling the calculated meta-transcriptome abundances either to total diatom meta-transcriptome sequences, using the formula $10^{6} \times \left(\sum \mathrm{metaT} / \sum \mathrm{DiatomT}\right)$, i.e. expressed per million reconciled diatom reads, or to calculated meta-genome abundances, using the formula $\log_{10}\left(1 + \sum \mathrm{metaT}\right) - \log_{10}\left(1 + \sum \mathrm{metaG}\right)$, which allows the inclusion of zero metaG values. Pearson and Spearman correlations were calculated between relative abundances and all quantitative environmental variables measured for Tara Oceans samples, as stored in the PANGAEA repository (Pesant et al. 2015). All calculations were repeated independently for each depth (surface, or deep chlorophyll maximum/DCM) and size fraction (0.8 to 2,000 μm, 0.8 to 5 μm, 3/5 to 20 μm, 20 to 180 μm, and 180 to 2,000 μm), with 3- and 5-μm filters treated as equivalent to allow reconciliation of Arctic and non-Arctic data, respectively. All Tara Oceans meta-gene assignations, alongside individual and total abundance calculations, are provided in Supplementary Data Set 3, sheets 1 to 10.
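The two abundance normalizations above translate directly into code. This minimal sketch assumes that reconciled read counts for one sample are available as lists of numbers; the function names are hypothetical.

```python
import math
from typing import Sequence


def per_million_diatom(meta_t: Sequence[float], diatom_t: Sequence[float]) -> float:
    """Meta-transcriptome abundance per million reconciled diatom reads: 10^6 * (sum metaT / sum DiatomT)."""
    return 1e6 * sum(meta_t) / sum(diatom_t)


def log_expression_ratio(meta_t: Sequence[float], meta_g: Sequence[float]) -> float:
    """log10(1 + sum metaT) - log10(1 + sum metaG); the +1 keeps samples with zero metaG finite."""
    return math.log10(1 + sum(meta_t)) - math.log10(1 + sum(meta_g))


print(per_million_diatom([30, 20], [1e5, 4e5]))  # 100.0
print(log_expression_ratio([99], [0]))           # 2.0, defined despite zero metaG
```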
Tara Oceans MAGs were screened for credible chloroplast-targeted copies of Enolase and/or PGAM sequences, using a reciprocal BLAST best-hit, Pfam analysis, and in silico targeting prediction pipeline similar to that used for cultured species data (Supplementary Data Set 3, sheet 11), and were partitioned into those containing both identifiable chloroplast-targeted Enolase and PGAM enzymes, only one, or neither. The mean mapped vertical depth (analogous to abundance) was calculated for each MAG from 0.8 to 2,000 μm data at DCM and surface depths, following Delmont et al. (2022), and was compared to absolute station latitude by Pearson correlation and two-tailed t-test.

Nucleic acid isolation
For DNA isolation, 150 mL of early stationary-phase P. tricornutum culture, grown at 19 °C with 12-h light:12-h dark cycling and shaken at 100 rpm as described above, was centrifuged at 4000 rpm for 10 min. The resulting cell pellet was washed 3 times in sterile growth medium and incubated for 4 h in 5 mL TEN buffer (0.1 M NaCl, 0.01 M Tris pH 8, and 0.001 M EDTA) supplemented with 2% (v/v) SDS and 1 U μl⁻¹ proteinase K (Fisher Scientific). Lysed cell fractions were used for phenol:chloroform extraction of cellular DNA, as previously described (Nash et al. 2007), prior to dissolution in 50 μl nuclease-free water and quantification with a NanoDrop spectrophotometer.
For RNA isolations, 10⁵ stationary-phase P. tricornutum cells, as calculated from cell densities counted with a Malassez hemocytometer, were inoculated into a 250-mL conical Erlenmeyer flask containing 80 mL ESAW without antibiotics. Cell cultures were harvested in mid-exponential phase, at counted densities of between 1 and 2 × 10⁶ cells mL⁻¹; in agreement with growth curve dynamics, 19C CL cultures were typically harvested 8 days post-inoculation, 19C LD cultures 9 days post-inoculation, and 8C CL cultures 17 days post-inoculation. Cells were harvested at the midpoint of the light-induction phase of the LD growth condition (15:00 CET), as in previous gene expression studies in P. tricornutum (Cruz de Carvalho et al. 2016).
RNA was isolated from 10⁸ cells from each culture, pelleted and washed as before, and snap-frozen in liquid nitrogen. Frozen cell suspensions were re-equilibrated with 1 mL Trizol reagent (Invivogen) and 200 μl chloroform (Honeywell), prior to phenol:chloroform extraction. An additional separation step was performed in 500 μl pure chloroform to remove any residual phenol traces from the aqueous phase, and purified nucleic acids were precipitated overnight in 500 μl isopropanol at −20 °C. RNA was collected by centrifugation at 10,000 rpm for 30 min, washed with 900 μl 100% ethanol, and resuspended in 50 μl RNase-free water (Qiagen).

GFP localization
Full-length mRNA sequences of cpEnolase, cpPGAM1A, and cpPGAM2 were amplified from P. tricornutum RNA libraries grown at 19 °C under light:dark cycling and replete nutrient conditions as described above, by reverse transcription with the RT Maxima First Strand synthesis kit (Thermo Fisher) from 200 ng template RNA, following the manufacturer's instructions, and with the gene-specific primers shown in Supplementary Data Set 2, sheet 4. PCRs were performed using Pfu high-fidelity DNA polymerase in 50 μl total reaction volume, including 1 μl cDNA template and 2 μl of each specific primer, following the manufacturer's protocol. Typical PCR conditions were 10 min at 95 °C; followed by 35 cycles of 45 s at 95 °C, 45 s at 55 °C, and 2 min at 72 °C; followed by a terminal elongation phase of 5 min at 72 °C. Amplified products were separated on a 1% agarose gel stained with ethidium bromide, excised, and purified using a MinElute PCR purification kit (Qiagen).
Purified products were cloned into linearized versions of pPhat vectors containing eGFP and a zeocin resistance gene (SHBLE). These vectors were amplified using a PCR protocol analogous to that above, with 1 ng purified plasmid DNA and outward-directed PCR primers extending from the start of the fluorescent protein gene sequence to the end of the FcpA promoter region (Supplementary Data Set 2, sheet 4); the products were excised, purified, and treated with 1 U FastDigest DpnI (Thermo Fisher) to remove any residual plasmid DNA. The gene-specific primers for each cpEnolase and cpPGAM construct were modified with 15-nt 5′ extensions complementary to the terminal regions of the FcpA and GFP sequences, allowing the cloning of complete vector sequences using a Hi-Fi DNA assembly kit (NEB), following the manufacturer's instructions. Assembled vectors were transformed into chemically competent Top10 E. coli, and positive clones (as identified by Sanger sequencing of positive colony PCR products) were used to generate purified plasmid DNA with a Plasmid Midi Kit (Qiagen).
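The 15-nt homology arms used for Hi-Fi assembly can be generated programmatically, as in the sketch below. It assumes that the linearized vector ends at the 3′ end of the FcpA promoter on one side and at the first codon of GFP on the other, so that the insert's forward primer carries the last 15 nt of the promoter and its reverse primer carries the reverse complement of the first 15 nt of GFP. For an in-frame C-terminal GFP fusion, the gene-specific reverse primer would also need to omit the stop codon (an assumption about the construct design). All sequences and function names below are placeholders, not the real pPhat vector sequence.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hifi_insert_primers(promoter_3prime: str, gfp_5prime: str,
                        gene_fwd: str, gene_rev: str,
                        overlap: int = 15) -> Tuple[str, str]:
    """Add vector homology arms to gene-specific primers (all written 5'->3').

    promoter_3prime: top-strand vector sequence ending at the promoter's 3' end.
    gfp_5prime: top-strand vector sequence starting at the GFP start codon.
    gene_rev: gene-specific reverse primer (anneals to the top strand).
    """
    fwd = promoter_3prime[-overlap:] + gene_fwd
    rev = reverse_complement(gfp_5prime[:overlap]) + gene_rev
    return fwd, rev


# Placeholder sequences for illustration only.
promoter_end = "AAAACCCCGGGGTTTTACGT"
gfp_start = "ATGGTGAGCAAGGGCGAGGAG"
fwd, rev = hifi_insert_primers(promoter_end, gfp_start, "ATGAAATTT", "AAACCC")
print(fwd)  # CCCGGGGTTTTACGTATGAAATTT
print(rev)  # GCCCTTGCTCACCATAAACCC
```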
Subcellular localization constructs were transformed into wild-type P. tricornutum Pt186 by biolistic transformation, as previously described (Falciatore et al. 1999). Briefly, 5 × 10⁷ mid-exponential-phase cells were plated on a ½ ESAW-1% agarose plate and left to recover for 2 days, prior to bombardment with 10 mg of 1-μm tungsten beads coated with 5 μg purified plasmid DNA in a Helios gene gun (BioRad) at 1,550 psi. Cells were left to recover for 2 days, prior to replating on ½ ESAW-1% agarose plates supplemented with 100 μg mL⁻¹ ampicillin, 100 μg mL⁻¹ streptomycin, 30 μg mL⁻¹ chloramphenicol, and 100 μg mL⁻¹ zeocin. Plates were maintained in a low-light environment (10 μE m⁻² s⁻¹) post-bombardment and for the first 2 days post-selection, prior to return to standard growth conditions.
Positive transformant colonies, as verified by immunoblot analysis with a mouse anti-GFP antibody (Thermo Fisher), were visualized using an SP8 inverted spinning disc confocal microscope (Leica) under 400× magnification, with an excitation wavelength of 485 nm and emission filters of 500 to 550 nm. GFP-negative colonies were used to confirm detection specificity, and empty-vector GFP-transformed cell lines (with cytoplasmic localizations) were used as fluorescence-positive controls. A minimum of 12 GFP-expressing clones were visualized for each construct, with consistent localization.

CRISPR mutagenesis
CRISPR target sequences for cpEnolase and cpPGAM1A genes were identified using PhytoCRISP-Ex (Rastogi et al. 2016), prioritizing positions in the N-terminal region of the conserved domain to minimize the probability of residual enzyme function in knockout lines, and uniqueness across the entire P. tricornutum genome within the final 11 bp of the target sequence to minimize off-target effects. Primers were designed for each target sequence and introduced into a pu6:SG1 CRISPR target sequence plasmid by PCR, as previously described (Nymark et al. 2016). Insertion-positive pu6:SG1 plasmids (2 μg), as confirmed by Sanger sequencing, were co-introduced into wild-type P. tricornutum Pt186 cells by bombardment along with 2 μg of HA-tagged Cas9 and pPhat vectors, as described above. Mutant colonies were genotyped using a DNA lysis buffer containing 0.14 M NaCl, 5 mM KCl, 10 mM Tris-HCl pH 7.5, and 1% (v/v) NP40 to generate crude DNA extracts, followed by PCR amplification across the CRISPR target sequences with DreamTaq PCR reagent (Promega) and Sanger sequencing (Eurofins Genomics). Mixed mutant:wild-type colonies were segregated via repeated dilution on ESAW:zeocin plates until only homozygous mutant genotypes were detected (Nymark et al. 2016; McCarthy et al. 2017). Empty-vector control lines were generated using the same protocol, except that only HA-Cas9 and pPhat plasmids were cotransformed, without a CRISPR target sequence.
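The target-selection criterion (an NGG PAM for SpCas9, with the PAM-proximal 11 bp of the 20-nt target unique in the genome) can be sketched as follows. This is a simplified illustration, not PhytoCRISP-Ex: it scans only the forward strand of the gene, counts exact seed matches on both genome strands, optionally restricts candidates to a positional window (e.g. the N-terminal part of the conserved domain), and ignores mismatch-tolerant off-target scoring. The function names and the sequences in the example are invented.

```python
import re
from typing import List, Optional, Tuple


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(genome: str, motif: str) -> int:
    """Count overlapping exact matches of motif on both genome strands."""
    n = len(re.findall(f"(?={re.escape(motif)})", genome))
    rc = revcomp(motif)
    if rc != motif:  # a palindromic motif would otherwise be counted twice
        n += len(re.findall(f"(?={re.escape(rc)})", genome))
    return n


def candidate_targets(gene: str, genome: str, seed_len: int = 11,
                      window: Optional[Tuple[int, int]] = None) -> List[Tuple[int, str, str]]:
    """Return (position, 20-nt target, PAM) for forward-strand NGG sites with a unique seed."""
    gene, genome = gene.upper(), genome.upper()
    hits = []
    # Lookahead so that overlapping candidate sites are all reported.
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", gene):
        pos = m.start()
        if window is not None and not (window[0] <= pos < window[1]):
            continue
        target, pam = m.group(1), m.group(2)
        if count_occurrences(genome, target[-seed_len:]) == 1:
            hits.append((pos, target, pam))
    return hits


toy_gene = "CATCGATTACAGCTTAGCAA" + "AGG"
toy_genome = "TTTT" + toy_gene + "TTTT"
print(candidate_targets(toy_gene, toy_genome))  # [(0, 'CATCGATTACAGCTTAGCAA', 'AGG')]
```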
Cleaned knockout mutants, their associated genotypes, and the expression levels of the mutated gene copies are tabulated in Supplementary Data Set 4, sheets 1 to 2. Mutant colony genotypes were confirmed periodically (approx. once per month) by PCR and Sanger sequencing throughout all subsequent experiments, and the CRISPR-induced gene modifications were found to remain stable. P. tricornutum Enolase proteins were determined by immunoblot not to be cross-reactive with an anti-Arabidopsis thaliana Enolase-2 antibody (Agrisera), and no plant-derived anti-PGAM antibodies were available at the time of study. Expression of the mutated genes in knockout lines was therefore assessed by RT-qPCR, as described below.

Complementation of knockout lines
Knockout lines were complemented with pPhat:GFP vectors carrying overexpressed copies (under an FcpA promoter) of cpEnolase and cpPGAM1A synthetic constructs, in which all CRISPR target sequences were replaced with silent mutations (Eurofins). Genes were fused to C-terminal GFP, allowing the verification of protein expression and localization. Vectors were identical to those previously used for localization, but carried a blasticidin S-deaminase gene in place of SHBLE (Buck et al. 2018), introduced with the NEB Hi-Fi kit as before. Complementation constructs were transformed via bombardment, and cotransformed colonies were selected on ½ ESAW-1% agarose plates supplemented with 100 μg mL⁻¹ ampicillin, 100 μg mL⁻¹ streptomycin, 30 μg mL⁻¹ chloramphenicol, 100 μg mL⁻¹ zeocin, and 4 μg mL⁻¹ blasticidin.
For each complementation, three cpEnolase and three cpPGAM1A knockout lines (including at least one for each CRISPR target sequence) were complemented separately with the cognate construct and, as a negative control, with an empty blasticidin resistance vector; two empty-vector lines were further complemented with the cpEnolase and cpPGAM1A overexpression constructs, and with an empty blasticidin resistance vector, to exclude possible effects of ectopic overexpression of each gene on cell physiology. A total of 47 colonies, with a minimum of 6 colonies for each knockout:complementation combination, including lines complemented from at least two distinct primary knockout mutant genotypes, were selected for subsequent study (Supplementary Data Set 4, sheet 7). The retention of the primary knockout mutant genotype in each complemented line was verified by colony PCR and sequencing as above, and the overexpression and correct localization of the complementing protein (i.e. to the chloroplast for cpEnolase:GFP and cpPGAM1A:GFP, or to the cytoplasm for ectopic GFP) were verified by immunoblot with an anti-GFP antibody [Thermo Fisher (Erdene-Ochir et al. 2019)] and confocal microscopy.

Growth rate measurements
Stationary-phase P. tricornutum cells of a given culture line were inoculated at a starting density of 10⁴ cells mL⁻¹, as verified with a Malassez hemocytometer, into 15 mL of antibiotic-free ESAW in a sterile, vented-cap plastic culture flask (Celltreat), and grown under LD, CL, or 8C culture conditions as described. Cell densities were recorded every day from 1 day post-inoculation (CL), every day from 2 days post-inoculation (LD), or every 2 days from 5 days post-inoculation (8C), at the midpoint of the LD light induction phase, using a CyFlow Cube 8 counting cytometer (ParTec).
Typically, 15 μl of cell culture, passed at 0.5 μl s⁻¹, was used for each measurement, with three technical replicates performed for each culture, of which the first (enriched in non-cellular particles) was excluded from downstream calculations. Cytometer particle counts were converted to actual cell densities using a calibration curve built from hemocytometer-counted densities of wild-type cell cultures, and cultures with observed densities >2 × 10⁶ cells mL⁻¹ were diluted 10-fold in blank growth medium to avoid saturating the cytometer.
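The cytometer calibration and dilution correction described above can be expressed as follows. This sketch assumes a straight-line (ordinary least squares) calibration between cytometer counts and hemocytometer densities, and applies the 10-fold dilution factor only to diluted samples; the calibration form and the function names are assumptions rather than the study's documented procedure.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_calibration(cytometer_counts: Sequence[float],
                    hemocytometer_densities: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line: density = slope * count + intercept."""
    mx, my = mean(cytometer_counts), mean(hemocytometer_densities)
    sxx = sum((x - mx) ** 2 for x in cytometer_counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cytometer_counts, hemocytometer_densities))
    slope = sxy / sxx
    return slope, my - slope * mx


def culture_density(replicate_counts: Sequence[float], slope: float,
                    intercept: float, dilution: float = 1.0) -> float:
    """Drop the first technical replicate, average the rest, calibrate, undo the dilution."""
    if len(replicate_counts) < 2:
        raise ValueError("need at least two technical replicates")
    return dilution * (slope * mean(replicate_counts[1:]) + intercept)


slope, intercept = fit_calibration([100, 200, 300], [1e5, 2e5, 3e5])
# First replicate (999) is discarded as particle-enriched; sample was diluted 10-fold.
print(culture_density([999, 150, 250], slope, intercept, dilution=10))  # 2000000.0
```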
Cell densities were measured until cell lines were confirmed to have exited log phase (i.e. reached a stationary-phase plateau). Primary knockout mutant growth curves were repeated a minimum of six times (three biological replicates per inoculation, with two independent repetitions) for each mutant line. Growth curves were tested for 7 cpEnolase knockout, 5 cpPGAM1A knockout, and 4 empty-vector control lines, providing a minimum of 24 measurements (i.e. 4 distinct mutant lines) per genotype studied (cpEnolase knockout, cpPGAM1A knockout, and empty-vector control lines).
To avoid artifacts arising from the proximity of the seed cell culture to exponential phase at the time of inoculation (which may affect lag-phase length) or from the relative diameter of each cell in culture (which may affect carrying capacity), cell growth rates were measured exclusively from the log-phase relative division rate. This was calculated as $\Delta \log_2(\text{cell density}) / \Delta t$ for the time period corresponding to 5 × 10⁴ to 4 × 10⁶ cells mL⁻¹, covering in most cases 6 successive measurements of each individual growth curve. To confirm that the cells were measured in exponential phase and were influenced neither by particle contamination of the cytometer nor by exhaustion of the growth medium, the linear correlation between log₂-transformed cell density and time was calculated, with most calculated correlations (129/132) showing linearity (r > 0.95). Three exemplar growth curve outputs are provided in Supplementary Data Set 4, sheets 3 to 5, and an overview of relative growth rates, expressed as a function of mean empty-vector control growth rates, is provided in Supplementary Data Set 4, sheet 6.
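A sketch of the log-phase growth-rate calculation is given below, assuming cell densities in cells mL⁻¹ and times in days. Here the rate is estimated as the least-squares slope of log₂(cell density) against time over the points within the 5 × 10⁴ to 4 × 10⁶ cells mL⁻¹ window, giving divisions per day, together with the Pearson r used as the linearity check. Using a regression slope rather than endpoint differences, and the function name, are assumptions.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def log_phase_rate(times_days: Sequence[float], densities: Sequence[float],
                   lower: float = 5e4, upper: float = 4e6) -> Tuple[float, float]:
    """Return (divisions per day, Pearson r) for log2(density) vs time inside the density window."""
    pts = [(t, math.log2(n)) for t, n in zip(times_days, densities) if lower <= n <= upper]
    if len(pts) < 3:
        raise ValueError("too few points in the log-phase window")
    ts, ys = zip(*pts)
    mt, my = mean(ts), mean(ys)
    stt = sum((t - mt) ** 2 for t in ts)
    syy = sum((y - my) ** 2 for y in ys)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    return sty / stt, sty / math.sqrt(stt * syy)


# A culture doubling exactly once per day, starting from 2 x 10^4 cells/mL.
days = [1, 2, 3, 4, 5, 6, 7]
cells = [1e4 * 2 ** d for d in days]
rate, r = log_phase_rate(days, cells)
print(round(rate, 6), round(r, 6))  # 1.0 1.0; the first two points fall below the window
```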
Complementation growth curves were repeated with at least 2 independent repetitions for each cell line, with 5 timepoints taken to project growth rates, giving a minimum of 60 independent measurements for each mutant:complementation genotype under each growth condition; the average of the two fastest growth rates of each culture was taken as the growth-rate estimate. A heatmap of all estimated complementation line growth rates is provided in Supplementary Data Set 4, sheet 7.

Photophysiology
Cultures for photophysiological analysis were grown in 10-mL vented plastic flasks, without shaking, under 19C CL and 19C LD conditions as described above. Cultures were grown from a starting inoculum of 10⁵ cells mL⁻¹, as measured with a Malassez hemocytometer. Cell cultures that had reached a measured density of 10⁶ cells mL⁻¹ were then refreshed into fresh medium at the initial starting concentration of 10⁵ cells mL⁻¹, to allow prolonged acclimation to each condition under continuous exponential growth. Cells from refreshed culture lines were harvested in exponential phase (between 1 and 3 × 10⁶ cells mL⁻¹), and healthy photophysiology was verified by Fv/Fm measurements >0.6 across all measured lines (Supplementary Data Set 4, sheet 8).
Steady-state light curves (SSLC) were measured with a fluorescence CCD camera recorder (SpeedZen, jBeamBio, France) in a selected set of control lines (n = 2), cpPGAM1A knockouts (n = 3), and cpEnolase knockouts (n = 6), as well as in complemented cpEnolase (n = 2) and cpPGAM1A (n = 3) knockout lines in which we observed a suppression of the knockout growth defect compared to complemented control lines. Measurements were repeated a minimum of two and in most cases four times per line and treatment condition, with a minimum of 6 unique measurements performed for each genotype and treatment. Curves were measured on cell cultures concentrated to between 2 and 5 × 10⁷ cells mL⁻¹. Samples were exposed to an initial 5-min illumination of 35 µmol photons m⁻² s⁻¹ green actinic light (532 nm), followed by 6 steps of 3 min each of increasing intensity up to 750 µmol photons m⁻² s⁻¹.
Minimum ($F_0$) and maximum ($F_M$) fluorescence were measured in dark-adapted (at least 1 min) samples, before and at the end of a 250-ms saturating (multiple turnover) pulse of light (532 nm, 5000 µmol photons m⁻² s⁻¹), and the maximum quantum yield of PSII in the dark was calculated as $F_V/F_M = (F_M - F_0)/F_M$. Every minute of light exposure, steady-state fluorescence ($F_S$) and maximum fluorescence under light ($F_M'$) were measured. PSII quantum yield ($\phi_{PSII}$) and NPQ were calculated on the last time point of each light step as $\phi_{PSII} = (F_M' - F_S)/F_M'$ and $\mathrm{NPQ} = F_M/F_M' - 1$, and rETR at PSII as $\mathrm{rETR} = \phi_{PSII} \cdot E$. The whole rETR vs E curve was fitted as $\mathrm{rETR} = \mathrm{rETR}_M \left(1 - \exp(-\alpha E/\mathrm{rETR}_M)\right)$, where $\mathrm{rETR}_M$ is the maximum rETR and $\alpha$ is the light-limited slope of rETR vs E (Jassby and Platt 1976). Only rETR values from 0 to 450 µmol photons m⁻² s⁻¹ were used for the fit, because values at 600 and 750 µmol photons m⁻² s⁻¹ were too noisy. The light saturation parameter $E_K$ was calculated as $\mathrm{rETR}_M/\alpha$, and the fitted parameter values were used to estimate $\phi_{PSII}$ under the growth light intensity of 50 µmol photons m⁻² s⁻¹ as $\phi_{PSII,50\mu E} = \mathrm{rETR}_M \left(1 - \exp(-50\alpha/\mathrm{rETR}_M)\right)/50$. The NPQ vs E curve was fitted as $\mathrm{NPQ} = \mathrm{NPQ}_M E^{n}/(E_{50NPQ}^{n} + E^{n})$, where $\mathrm{NPQ}_M$ is the maximal NPQ, $E_{50NPQ}$ is the half-saturation intensity for NPQ, and $n$ is the sigmoidicity coefficient (Serôdio and Lavaud 2011).
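The fluorescence-derived parameters and fitted model forms above are written out below as plain functions, which makes the relationships between them explicit. The function names are hypothetical; fitting $\mathrm{rETR}_M$, $\alpha$, $\mathrm{NPQ}_M$, $E_{50NPQ}$, and $n$ to measured data would additionally require a nonlinear least-squares routine (for example `scipy.optimize.curve_fit`), which is not shown.

```python
import math


def fv_fm(f0: float, fm: float) -> float:
    """Maximum PSII quantum yield in the dark."""
    return (fm - f0) / fm


def phi_psii(fs: float, fm_prime: float) -> float:
    """Effective PSII quantum yield under actinic light."""
    return (fm_prime - fs) / fm_prime


def npq(fm: float, fm_prime: float) -> float:
    return fm / fm_prime - 1


def retr(phi: float, e: float) -> float:
    return phi * e


def jassby_platt(e: float, retr_max: float, alpha: float) -> float:
    """rETR = rETR_M * (1 - exp(-alpha * E / rETR_M)); expm1 keeps precision at low E."""
    return retr_max * -math.expm1(-alpha * e / retr_max)


def e_k(retr_max: float, alpha: float) -> float:
    """Light saturation parameter."""
    return retr_max / alpha


def phi_psii_at(e: float, retr_max: float, alpha: float) -> float:
    """Modelled PSII yield at irradiance E (e.g. E = 50 for growth light)."""
    return jassby_platt(e, retr_max, alpha) / e


def npq_hill(e: float, npq_max: float, e50: float, n: float) -> float:
    """Sigmoidal NPQ vs E model."""
    return npq_max * e ** n / (e50 ** n + e ** n)


print(fv_fm(200, 1000))                    # 0.8
print(e_k(100.0, 0.5))                     # 200.0
print(round(phi_psii_at(50, 100.0, 0.5), 4))
```

As $E \to 0$ the modelled yield `phi_psii_at` tends to $\alpha$, which is why $\alpha$ can be read as the light-limited efficiency of the curve.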
The PSII functional absorption cross-section, σPSII, was calculated from the fluorescence induction upon a single-turnover flash of blue light (100 µs, 455 nm, 60 nm bandwidth) on non-concentrated cell culture. The induction curve was measured on samples dark-acclimated for 20 min before centrifugation (average of 2 to 4 independent replicates) with a Fluorescence Induction and Relaxation (miniFIRe) fluorometer (Gorbunov et al. 2020), which also measures single-turnover Fv/Fm and PSII connectivity.
Parameters measured with the miniFIRe fluorometer (as defined above) were also quantified for cultures grown under 8C CL, as the measurements were sufficiently rapid to allow the cultures to be maintained at growth temperature (Gorbunov et al. 2020). Measured photophysiological values are tabulated in Supplementary Data Set 4, sheet 8. Violin plots of photophysiological parameters were generated with BoxPlotR (Spitzer et al. 2014).

Gene expression analysis
Libraries were prepared from 200 ng of DNase-treated RNA for each mutant line and treatment condition, with at least three replicates per sample. Sequencing was performed by Fasteris (Plan-les-Ouates, Switzerland). After initial quality control checks, stranded Illumina mRNA libraries were prepared with a NovaSeq V1.5 kit and sequenced on an SP flow cell with 2 × 100 bp reads over 200 cycles, yielding circa 130 to 160 Gb of sequence data per sample with ≥85% of bases above Q30.
FastQ files were mapped to the gff3 annotations of the P. tricornutum version 3 genome using the Nextflow RNA sequencing pipeline (https://nf-co.re/rnaseq/usage) (Rastogi et al. 2018; Lataretu and Hölzer 2020). Total mapped read counts were then compared between all biological and technical replicates for (i) each mutant line sequenced, (ii) each genotype (cpEnolase knockout, cpPGAM1A knockout, and control), and (iii) each treatment condition (LD, CL, and 8C) by principal component analysis (PCA) using the R package factoextra, and highly variant libraries were removed (Kassambara and Mundt 2017). The final dataset included 63 RNAseq libraries, including 5 cpEnolase and 5 cpPGAM1A knockout lines and 4 empty-vector controls, with a minimum of 4 RNA libraries prepared from at least two genetically distinct mutant constructs for each genotype considered (cpEnolase, cpPGAM1A, and control) (Supplementary Data Set 5, sheets 1 to 2). Differentially expressed genes (DEGs) were then identified between genotypes for each treatment condition using DESeq2, with a fold-change cutoff of 2 and a P-value cutoff of 0.05 [Liu et al. 2021 (Supplementary Data Set 5, sheets 2 to 3)].
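The DEG cutoffs can be expressed as a simple filter on DESeq2-style output. The sketch below assumes each gene is summarized by a log2 fold change (knockout relative to control) and a P-value; whether raw or adjusted P-values were used is not stated above, so that choice is left to the caller. The `GeneResult` type, function names, and gene identifiers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    gene_id: str
    log2_fold_change: float  # knockout relative to control
    p_value: float


def split_degs(results, min_fold=2.0, max_p=0.05):
    """Return (upregulated, downregulated) gene IDs passing both cutoffs.

    A fold change of 2 corresponds to |log2 fold change| >= 1.
    """
    threshold = math.log2(min_fold)
    up, down = [], []
    for r in results:
        if r.p_value < max_p and abs(r.log2_fold_change) >= threshold:
            (up if r.log2_fold_change > 0 else down).append(r.gene_id)
    return up, down


def shared_degs(first, second):
    """Genes called in both knockout-versus-control comparisons.

    Call separately on the up- and downregulated lists to keep direction.
    """
    return sorted(set(first) & set(second))


if __name__ == "__main__":
    enolase_vs_control = [
        GeneResult("gene_a", 1.8, 0.001),
        GeneResult("gene_b", -1.2, 0.02),
        GeneResult("gene_c", 0.4, 0.0001),  # fails the fold-change cutoff
    ]
    print(split_degs(enolase_vs_control))  # (['gene_a'], ['gene_b'])
```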
The mean transcript abundances of DEGs in knockout compared to control lines were first assessed in RNAseq data from N- and P-limited P. tricornutum cell lines sampled at two and nine timepoints, respectively [Supplementary Data Set 5, sheet 4 (Cruz de Carvalho et al. 2016; McCarthy et al. 2017)]. No significant differences were found between DEGs and other genes in the P. tricornutum genome (1-way ANOVA, P > 0.05; Supplementary Data Set 5, sheet 5), confirming that the RNAseq samples were not generated from N- or P-limited cultures. Next, functional enrichments in DEGs were assessed against previously tabulated values for the entire P. tricornutum genome (Supplementary Data Set 5, sheets 6 to 10) (Rastogi et al. 2018; Ait-Mohamed et al. 2020). Functional enrichments were identified by two-tailed chi-squared tests (P < 0.05) for genes differentially expressed in either one knockout-versus-control test (cpEnolase v control; cpPGAM1A v control) or in both tests performed under each physiological condition. Finally, the distribution of DEGs across P. tricornutum core plastid and mitochondrial metabolism pathways was mapped onto a previously defined proteomic model of each organelle (Ait-Mohamed et al. 2020), taking the strongest DEG enrichment for enzymes with multiple homologs (Supplementary Data Set 5, sheet 11).
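For a single functional category, the enrichment test can be framed as a one-degree-of-freedom chi-squared goodness-of-fit test: the observed number of DEGs in the category is compared with the number expected from the category's genome-wide frequency. This framing is an assumption about how the tabulated genome values were used. The sketch uses only the standard library and relies on the identity that, for one degree of freedom, the chi-squared P-value equals `erfc(sqrt(chi2 / 2))`; function names and counts are hypothetical.

```python
import math


def chi2_df1_pvalue(chi2: float) -> float:
    """Upper-tail P-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def category_enrichment(deg_in_cat, n_degs, genome_in_cat, n_genome):
    """Chi-squared goodness-of-fit for one functional category among DEGs.

    Returns (chi2, p_value, observed, expected), where observed and expected
    are the numbers of DEGs inside the category.
    """
    expected_in = n_degs * genome_in_cat / n_genome
    expected_out = n_degs - expected_in
    observed_in = deg_in_cat
    observed_out = n_degs - deg_in_cat
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    return chi2, chi2_df1_pvalue(chi2), observed_in, expected_in


if __name__ == "__main__":
    # Hypothetical counts: 25 of 200 DEGs fall in a category that covers
    # 500 of 10,000 genes genome-wide (10 expected).
    chi2, p, obs, exp = category_enrichment(25, 200, 500, 10_000)
    print(round(chi2, 2), p < 1e-5, obs > exp)  # 23.68 True True
```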
Quantitative RT-PCR (RT-qPCR) validations were performed using cDNA synthesized from 5 ng DNase-treated RNA (per RT-qPCR) with an RT Maxima First Strand synthesis kit (Thermo Fisher), following the manufacturer's instructions. Reactions were run on a 384-well LightCycler (Roche) with Takyon No ROX SYBR 2X MasterMix (Eurogentec), following the manufacturers' protocols. Typical amplification conditions were: 10 min at 95 °C, followed by 40 cycles of 30 s at 95 °C, 30 s at 55 °C, and 30 s at 72 °C. Primer pairs for RT-qPCR amplifications were designed using NCBI Primer-BLAST (Ye et al. 2012), favoring unique amplification targets within the genomic sequence, an amplicon size of 100 to 150 base pairs, primer positions in different regions of the gene studied, and a 3′ terminal G or C on each primer. Primer efficiencies were tested by qPCR with serial dilutions of P. tricornutum gDNA, and only primer pairs that yielded a Cp increment of between 0.9 and 1.1 per half dilution of DNA were retained for RT-qPCR analysis. RT-qPCRs were performed at least 3 times for each amplicon–sample pair. No-RT equivalents (reactions without reverse transcriptase) were performed to subtract residual genomic DNA from each Cp value obtained, and two housekeeping genes (Ribosomal protein S1, RPS; TATA binding protein, TBP) previously shown to have conditionally invariant expression patterns in P. tricornutum were used as quantification references (Sachse et al. 2013). Tabulated RT-qPCR outputs are shown in Supplementary Data Set 5, sheet 13, and sample information and reaction conditions per MIQE guidelines (Bustin et al. 2009) are tabulated in Supplementary Data Set 5, sheet 14.
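The primer-efficiency criterion and the reference-gene normalization can be sketched as follows. The text does not specify the quantification model, so this sketch assumes near-100% amplification efficiency (relative quantity proportional to 2^−Cp, consistent with the 0.9 to 1.1 Cp increment per twofold dilution criterion), subtracts the no-RT signal in linear quantity space, and normalizes to the geometric mean of the two reference genes (RPS and TBP). All function names and Cp values are hypothetical.

```python
from statistics import geometric_mean, mean


def cp_increment_per_twofold(cps):
    """Mean Cp increase between successive twofold (half) dilutions."""
    steps = [b - a for a, b in zip(cps, cps[1:])]
    return mean(steps)


def primer_pair_passes(cps, low=0.9, high=1.1):
    """Retain a primer pair only if the mean Cp increment lies in [low, high]."""
    return low <= cp_increment_per_twofold(cps) <= high


def corrected_quantity(cp_rt, cp_no_rt=None):
    """Relative quantity 2^-Cp minus the residual gDNA signal of the no-RT control.

    cp_no_rt=None means the no-RT control did not amplify.
    """
    quantity = 2.0 ** -cp_rt
    if cp_no_rt is not None:
        quantity -= 2.0 ** -cp_no_rt
    return max(quantity, 0.0)


def normalized_expression(target, references):
    """Target quantity divided by the geometric mean of reference quantities.

    `target` and each item in `references` are (cp_rt, cp_no_rt) pairs.
    """
    ref_quantities = [corrected_quantity(*pair) for pair in references]
    if any(q <= 0.0 for q in ref_quantities):
        raise ValueError("reference gene signal is not above the no-RT control")
    return corrected_quantity(*target) / geometric_mean(ref_quantities)


if __name__ == "__main__":
    print(primer_pair_passes([20.0, 21.0, 22.05, 23.0]))  # True
    # Hypothetical Cp values: target, then RPS and TBP references.
    print(normalized_expression((25.0, None), [(20.0, None), (22.0, None)]))  # ~0.0625
```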

Metabolite analysis
Cell pellets were taken from exponential-phase P. tricornutum cultures (counted density 1 to 2 × 10^6 cells mL−1; 1.5 × 10^8 cells per sample) for metabolite and lipid analysis. Cell pellets were collected without washing to minimize impacts on metabolite turnover and then transferred to a pre-weighed, double-pierced, double-autoclaved 1.5-mL Eppendorf tube for lyophilization. Cell pellet masses were recorded, and samples were immediately snap-frozen in liquid nitrogen and stored at −80 °C for subsequent analysis.
Metabolite profiling was carried out by gas chromatography–mass spectrometry (ChromaTOF software, Pegasus driver 1.61; LECO) as described previously (Lisec et al. 2006). The chromatograms and mass spectra were evaluated using TagFinder software (Luedemann et al. 2012). Metabolite identification was manually checked against the mass spectral and retention index collection of the Golm Metabolome Database (Kopka et al. 2005). Peak heights of the mass fragments were normalized successively on the basis of the fresh weight of the sample, the added amount of an internal standard (ribitol), and values obtained for loading column controls from the same experiment.
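The successive normalization of GC-MS peak heights can be written as a chain of divisions. This is a sketch under stated assumptions: the ribitol step is modelled as dividing by the ribitol peak height per unit of ribitol added, and the final step as dividing by the same fragment's value in the loading column control from the same experiment. The exact formulas used by the authors' pipeline are not given above, and the function name and values are hypothetical.

```python
def normalize_peak_height(
    peak_height: float,
    sample_mass_mg: float,
    ribitol_peak_height: float,
    ribitol_added: float,
    qc_value: float,
) -> float:
    """Successively normalize a mass-fragment peak height.

    1. per unit sample mass,
    2. per internal-standard (ribitol) response, scaled by the amount added,
    3. relative to the loading column (quality-control) value.
    """
    per_mass = peak_height / sample_mass_mg
    ribitol_response = ribitol_peak_height / ribitol_added
    per_standard = per_mass / ribitol_response
    return per_standard / qc_value


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(normalize_peak_height(1000.0, 2.0, 50.0, 10.0, 5.0))  # 20.0
```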

Glycerolipid analysis
Glycerolipids were extracted by suspending cell pellets in 4 mL of boiling ethanol for 5 min to prevent lipid degradation. Lipids were extracted by the addition of 2 mL methanol and 8 mL chloroform at room temperature (Folch et al. 1957). The mixture was then saturated with argon and stirred for 1 h at room temperature. After filtration through glass wool, cell remains were rinsed with 3 mL chloroform/methanol (2:1, v/v), and 5 mL of 1% NaCl was added to the filtrate to initiate biphase formation. The chloroform phase was dried under argon and stored at −20 °C. The lipid extract was resuspended in pure chloroform when needed.
Total glycerolipids were quantified from their fatty acids: 5 µg of 15:0 was added to an aliquot, and the fatty acids present were converted to fatty acid methyl esters (FAMEs) by a 1-h incubation in 3 mL 2.5% H2SO4 in pure methanol at 100 °C (Jouhet et al. 2003). The reaction was stopped by the addition of 3 mL water and 3 mL hexane. The hexane phase was analyzed by gas chromatography with flame ionization detection (GC-FID; Perkin Elmer) on a BPX70 (SGE) column. FAMEs were identified by comparison of their retention times with those of standards (Sigma) and quantified using 15:0 for calibration.
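Quantification against the 15:0 internal standard amounts to scaling each FAME peak area by the known 5 µg of 15:0. The sketch below assumes an equal FID response per unit mass for all FAMEs (a common simplification; no response factors are mentioned above) and no endogenous 15:0 in the sample. Peak areas and names are hypothetical.

```python
STANDARD = "15:0"
STANDARD_UG = 5.0  # µg of 15:0 added to the aliquot


def quantify_fames(peak_areas: dict) -> dict:
    """Convert GC-FID peak areas to µg per fatty acid using the 15:0 standard."""
    standard_area = peak_areas.get(STANDARD, 0.0)
    if standard_area <= 0.0:
        raise ValueError("internal standard 15:0 peak is missing")
    return {
        fatty_acid: area / standard_area * STANDARD_UG
        for fatty_acid, area in peak_areas.items()
        if fatty_acid != STANDARD
    }


if __name__ == "__main__":
    amounts = quantify_fames({"15:0": 100.0, "16:0": 200.0, "20:5": 150.0})
    print(amounts)                # {'16:0': 10.0, '20:5': 7.5}
    print(sum(amounts.values()))  # 17.5 µg total fatty acids
```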
Glycerolipids were further analyzed by high-pressure liquid chromatography–tandem mass spectrometry (HPLC-MS/MS), based on a previously described procedure (Rainteau et al. 2012). Lipid extracts corresponding to 25 nmol of total fatty acids were dissolved in 100 µL of chloroform/methanol [2/1, (v/v)] containing 125 pmol of each internal standard. The internal standards used were phosphatidylethanolamine (PE) 18:0-18:0 and diacylglycerol (DAG) 18:0-22:6 from Avanti Polar Lipids, and SQDG 16:0-18:0 extracted from spinach thylakoids (Demé et al. 2014) and hydrogenated (Buseman et al. 2006). Lipid classes were separated on an Agilent 1200 HPLC system using a 150 mm × 3 mm (length × internal diameter) 5 µm diol column (Macherey-Nagel) at 40 °C. The mobile phases consisted of hexane/isopropanol/water/1 M ammonium acetate, pH 5.3 [625/350/24/1, (v/v/v/v)] (A) and isopropanol/water/1 M ammonium acetate, pH 5.3 [850/149/1, (v/v/v)] (B). The injection volume was 20 µL. After 5 min, the percentage of B was increased linearly from 0 to 100 in 30 min and kept at 100% for 15 min. This elution sequence was followed by a return to 100% A in 5 min and equilibration for 20 min with 100% A before the next injection, leading to a total runtime of 70 min. The flow rate of the mobile phase was 200 µL min−1. The distinct glycerophospholipid classes were eluted successively as a function of their polar head group. Mass spectrometric analysis was performed on a 6460 triple quadrupole mass spectrometer (Agilent) equipped with a Jet Stream electrospray ion source with the following settings: drying gas heater at 260 °C, drying gas flow at 13 L min−1, sheath gas heater at 300 °C, sheath gas flow at 11 L min−1, nebulizer pressure at 25 psi, capillary voltage at ±5000 V, and nozzle voltage at ±1000 V. Nitrogen was used as the collision gas. The quadrupoles Q1 and Q3 were operated at widest and unit resolution, respectively.
Phosphatidylcholine (PC) and diacylglyceryl hydroxymethyltrimethyl-β-alanine (DGTA) analyses were carried out in positive ion mode by scanning for precursors of m/z 184 and 236, respectively, at collision energies (CEs) of 34 and 52 eV. SQDG analysis was carried out in negative ion mode by scanning for precursors of m/z −225 at a CE of −56 eV. PE, phosphatidylinositol (PI), phosphatidylglycerol (PG), MGDG, and digalactosyldiacylglycerol (DGDG) measurements were performed in positive ion mode by scanning for neutral losses of 141, 277, 189, 179, and 341 Da at CEs of 20, 12, 16, 8, and 8 eV, respectively. DAG and triacylglycerol (TAG) species were identified and quantified by multiple reaction monitoring (MRM) as singly charged [M + NH4]+ ions at CEs of 16 and 22 eV, respectively. Quantification was done for each lipid species by MRM with a 50-ms dwell time using the various transitions previously recorded (Abida et al. 2015). Mass spectra were processed using the MassHunter Workstation software (Agilent) for lipid identification and quantification. Lipid amounts (pmol) were corrected for response differences between internal standards and endogenous lipids as described previously (Jouhet et al. 2017).
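The acquisition scheme above can be captured as a lookup table, and internal-standard quantification as a peak-area ratio scaled by the 125 pmol of standard added. The sketch assumes single-point calibration with an optional response-correction factor standing in for the correction of Jouhet et al. (2017), whose exact form is not given here; the factor's default of 1.0, the pairing of each class with a standard, and the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class ScanMode(NamedTuple):
    polarity: str
    scan_type: str
    mass: Optional[float]  # precursor m/z or neutral loss (Da); None for MRM
    collision_energy_ev: float


# Scan settings as described in the text (SQDG precursor is m/z -225 in negative mode).
SCANS = {
    "PC": ScanMode("positive", "precursor", 184, 34),
    "DGTA": ScanMode("positive", "precursor", 236, 52),
    "SQDG": ScanMode("negative", "precursor", 225, -56),
    "PE": ScanMode("positive", "neutral loss", 141, 20),
    "PI": ScanMode("positive", "neutral loss", 277, 12),
    "PG": ScanMode("positive", "neutral loss", 189, 16),
    "MGDG": ScanMode("positive", "neutral loss", 179, 8),
    "DGDG": ScanMode("positive", "neutral loss", 341, 8),
    "DAG": ScanMode("positive", "MRM [M+NH4]+", None, 16),
    "TAG": ScanMode("positive", "MRM [M+NH4]+", None, 22),
}

STANDARD_PMOL = 125.0  # pmol of each internal standard added per sample


def lipid_pmol(species_area, standard_area, response_factor=1.0):
    """Species amount (pmol) from its MRM area relative to the internal standard."""
    return species_area / standard_area * STANDARD_PMOL * response_factor


if __name__ == "__main__":
    print(SCANS["PI"])                 # neutral loss of 277 Da at 12 eV
    print(lipid_pmol(5000.0, 2500.0))  # 250.0
```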
Normalized metabolite and lipid abundances were screened by PCA, as per the RNAseq analysis above, and outliers and biologically non-representative samples were removed.The final datasets consist of 139 libraries (metabolite GC-MS), 55 libraries (lipid GC-MS), and 49 libraries (lipid LC-MS), with a minimum of 3 libraries prepared from at least two genetically distinct mutant constructs for each genotype considered (Supplementary Data Set 6, sheet 1).Violin plots of differentially accumulated lipids were generated with BoxPlotR (Spitzer et al. 2014).
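Neither the PCA scaling nor the outlier criterion is specified above. As one possible sketch, the NumPy code below autoscales each feature, computes principal component scores by singular value decomposition, and flags samples whose score on any retained component lies more than a chosen number of standard deviations from the mean. The three-SD cutoff and function names are hypothetical, and the removal of biologically non-representative samples mentioned in the text requires judgement beyond this rule.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores of an autoscaled samples-by-features matrix."""
    centered = data - data.mean(axis=0)
    sd = data.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant features stay at zero after centering
    scaled = centered / sd
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(data: np.ndarray, n_components: int = 2, z_cut: float = 3.0) -> list:
    """Indices of samples whose PC score exceeds z_cut SDs on any component."""
    scores = pca_scores(data, n_components)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return np.flatnonzero(np.any(np.abs(z) > z_cut, axis=1)).tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    abundances = rng.normal(size=(20, 10))  # hypothetical log abundances
    abundances[5] += 100.0                  # one grossly shifted sample
    print(flag_outliers(abundances, n_components=1))  # [5]
```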

Expressed enzyme reaction kinetics
Measurements of cpEnolase and cpPGAM1A reaction rates were performed following a previously defined protocol [Zhang et al. 2020 (Supplementary Fig. S20)]. First, codon-optimized constructs for E. coli expression were synthesized (Eurofins) using full-length cpEnolase and cpPGAM1A mRNA sequences as templates. Constructs were cloned into a Gateway pDest-CTD-His vector via a pDONR intermediate using BP/LR clonase (all reagents Thermo Fisher), following the manufacturer's instructions (Hartley et al. 2000). To enable optimal expression in E. coli, multiple N-terminal length variants were synthesized for each gene, including variants corresponding to the full gene length minus the N-terminal signal peptide predicted with SignalP (Almagro Armenteros et al. 2019). Complete constructs and primers tested are provided in Supplementary Data Set 6, sheet 7.
cpEnolase- and cpPGAM1A-CTD-His vectors were transformed into Rosetta (DE3) E. coli (Novagen) and co-selected on ampicillin (100 µg/mL) and chloramphenicol (34 µg/mL). Protein expression was induced in overnight cultures at 28 °C, and proteins were purified on a His-Trap column (GE Healthcare) following the manufacturer's instructions and eluted in a buffer consisting of 125 mM NaCl, 250 mM imidazole (Sigma), and protease inhibitors. Eluted proteins were desalted using a Q10/PD10 column (GE Healthcare) and quantified using a Bradford assay. Protein integrity and quantity were assessed routinely throughout the purification using SDS-PAGE.
Reaction rates were measured on 100 µg of purified cpPGAM1A and 100 µg of purified cpEnolase, as quantified with a NanoDrop spectrophotometer. Rates were measured separately for glycolytic and gluconeogenic activity. Briefly, to measure glycolytic reaction rates, both enzymes were combined with 10 units of pyruvate kinase and 10 units of lactate dehydrogenase (both Sigma-Aldrich) at 25 °C, alongside varying concentrations of 9 mM D(−)-3-phosphoglyceric acid, 25 mM adenosine 5′-diphosphate, and 25 mM reduced β-NADH. Enzymatic activity was measured from absorbance at 340 nm as a proxy for NADH consumption, following a previously defined protocol [Sigma protocols EC 5.4.2.1 (Sutherland et al. 1949)]. To measure gluconeogenic reaction rates, a similar reaction was performed with both enzymes combined with 10 units of phosphoglycerate kinase and 10 units of glyceraldehyde-3-phosphate dehydrogenase (both Sigma-Aldrich), alongside 9 mM phospho-enol-pyruvate, 25 mM adenosine 5′-diphosphate, and 25 mM reduced β-NADH. Enzymatic activity was similarly measured by absorbance at 340 nm. A schematic of the measured reactions is provided in Supplementary Fig. S20. Complete measured reaction rates over all technical replicates are provided in Supplementary Data Set 6, sheet 8.
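Converting the 340 nm absorbance trace into a specific activity uses the Beer–Lambert law with the NADH extinction coefficient at 340 nm (6.22 mM−1 cm−1). In both coupled assays, one NADH is oxidized per substrate molecule passing through cpPGAM1A and cpEnolase (by lactate dehydrogenase in the glycolytic direction, and by glyceraldehyde-3-phosphate dehydrogenase in the gluconeogenic direction). The sketch assumes a 1 cm path length; the reaction volume is not stated above, so it is a hypothetical parameter, as are the function name and example values.

```python
NADH_EPSILON_340 = 6.22  # mM^-1 cm^-1 (6220 M^-1 cm^-1)


def specific_activity(delta_a340_per_min, volume_ml, protein_mg,
                      path_cm=1.0, nadh_per_turnover=1.0):
    """Specific activity (µmol substrate min^-1 mg^-1) from the A340 slope.

    A falling A340 (negative slope) reflects NADH oxidation, so the magnitude is used.
    """
    # A concentration change in mM min^-1 equals µmol mL^-1 min^-1.
    nadh_mm_per_min = abs(delta_a340_per_min) / (NADH_EPSILON_340 * path_cm)
    nadh_umol_per_min = nadh_mm_per_min * volume_ml
    return nadh_umol_per_min / nadh_per_turnover / protein_mg


if __name__ == "__main__":
    # Hypothetical slope of -0.622 A340 per min, 1 mL assay, 0.1 mg (100 µg) enzyme.
    print(round(specific_activity(-0.622, 1.0, 0.1), 6))  # 1.0
```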

Statistical analysis
Expression levels of cpEnolase and cpPGAM, mutant growth rates, and mutant GC-MS/LC-MS metabolite distributions compared to control lines were assessed by 1-way ANOVA. Correlations to environmental variations in Tara Oceans data were assessed by Pearson and Spearman correlations, with significance identified via a 1-way t-test. DEGs were assessed using iDEP.91, with a fold-change cutoff of 2 and a P-value cutoff of 0.1. Functional enrichments in differentially expressed gene categories were assessed by two-tailed chi-squared tests. All statistical analyses, including mean values, standard deviations, and sample sizes related to each calculation, are provided in the relevant tabs of Supplementary Data Sets 1 to 6. P-values are signed as follows: for ANOVAs, negative values indicate that the first category in the comparison has a smaller mean value than the second, and positive values that it has a larger mean value; for correlations, negative values indicate a negative correlation coefficient, and positive values a positive correlation coefficient; and for chi-squared tests, negative values indicate that the observed frequency of a particular character is less than the expected frequency, and positive values that it is greater than the expected frequency.
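The P-value signing convention can be stated compactly: the sign is taken from a direction-bearing effect, namely the difference in means for ANOVAs, the correlation coefficient for correlations, and observed minus expected frequency for chi-squared tests. A minimal sketch, with hypothetical function names:

```python
from statistics import mean


def signed_p(p_value: float, effect: float) -> float:
    """Attach the sign of `effect` to a P-value (a zero effect is treated as positive)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P-value must lie in [0, 1]")
    return -p_value if effect < 0 else p_value


def anova_signed_p(p_value, first_group, second_group):
    """Negative when the first category has the smaller mean."""
    return signed_p(p_value, mean(first_group) - mean(second_group))


def correlation_signed_p(p_value, coefficient):
    """Negative for a negative Pearson or Spearman coefficient."""
    return signed_p(p_value, coefficient)


def chi2_signed_p(p_value, observed, expected):
    """Negative when a character is observed less often than expected."""
    return signed_p(p_value, observed - expected)


if __name__ == "__main__":
    print(anova_signed_p(0.03, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # -0.03
    print(chi2_signed_p(0.001, observed=25, expected=10))          # 0.001
```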

Materials distribution statement
The author(s) responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://academic.oup.com/plcell/pages/General-Instructions) are Richard G. Dorrell (richard.dorrell@sorbonne-universite.fr) and Chris Bowler (cbowler@bio.ens.psl.eu).

Accession numbers
RNAseq data associated with this project have been deposited with NCBI BioProject with project number PRJNA788211.

Figure 4. Growth phenotypes of cpEnolase and cpPGAM1A CRISPR-Cas9 knockout mutant and zeocin-resistant empty-vector control P. tricornutum lines. A) Exemplar growth curves from single experiments performed for P. tricornutum lines under 50 μE m−2 s−1 illumination, in non-shaken cultures and replete ESAW media, under 3 conditions: (i) 19 °C and 12-h light:12-h dark cycles (« 19C LD »); (ii) 19 °C and 24-h continuous light (« 19C CL »); and (iii) 8 °C and 24-h continuous light (« 8C CL »). Hashed black lines show the approximate concentrations (between 5 × 10^4 and 4 × 10^6 cells mL−1) over which growth rates were calculated. B) Mean relative log-phase growth rates of each genotype under each condition, measured through a minimum of 3 biological replicates and two technical repetitions (6 measurements per line, minimum 24 measurements per genotype), over 5 timepoints showing a linear relationship between log cell density and time (r² > 0.95). Asterisks indicate significant differences as inferred by 1-way ANOVA. An alternative version of this figure showing absolute growth rates of individual cell lines is provided in Supplementary Fig. S14.
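The growth-rate criterion in panel B can be sketched as a least-squares regression of log cell density on time, restricted to the density window marked in panel A and accepted only when r² > 0.95. The sketch assumes natural-log densities (giving a specific growth rate in day−1; the caption does not state the log base) and relative rates obtained by dividing by the mean control rate. Function names and data are hypothetical.

```python
import math
from statistics import mean


def log_phase_growth_rate(times_days, densities, lower=5e4, upper=4e6,
                          min_points=5, min_r2=0.95):
    """Slope of ln(density) vs time within [lower, upper] cells/mL, or None.

    Returns (rate_per_day, r_squared) when at least `min_points` points fall in
    the window and the linear fit has r^2 above `min_r2`.
    """
    points = [(t, math.log(d)) for t, d in zip(times_days, densities)
              if lower <= d <= upper]
    if len(points) < min_points:
        return None
    t_mean = mean(t for t, _ in points)
    y_mean = mean(y for _, y in points)
    sxx = sum((t - t_mean) ** 2 for t, _ in points)
    syy = sum((y - y_mean) ** 2 for _, y in points)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in points)
    if sxx == 0 or syy == 0:
        return None
    r_squared = sxy ** 2 / (sxx * syy)
    if r_squared <= min_r2:
        return None
    return sxy / sxx, r_squared


def relative_rates(rates, control_rates):
    """Express growth rates relative to the mean empty-vector control rate."""
    reference = mean(control_rates)
    return [r / reference for r in rates]


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4, 5]
    cells = [2e4, 1e5, 2e5, 4e5, 8e5, 1.6e6]  # hypothetical; first point below window
    rate, r2 = log_phase_growth_rate(days, cells)
    print(round(rate, 4), round(r2, 4))  # 0.6931 1.0
```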

Figure 5. Changes in plastid and mitochondrial metabolic architecture inferred from gene expression analyses. A) Predicted consensus localizations (either chloroplast or non-chloroplast) from ASAFind (Gruber et al. 2015) and HECTAR (Gschloessl et al. 2008) of all genes inferred (P < 0.05, fold-change expression >2) to be up- or downregulated in both cpEnolase and cpPGAM1A knockout compared to control lines under 19C LD, 19C CL, and 8C CL. Significantly enriched localizations (two-tailed chi-squared test) are asterisked. B) Relative mRNA abundances of eleven genes encoding exemplar chloroplast- and mitochondria-targeted proteins, verified by RT-qPCR. Genes differentially expressed (two-tailed t-test, P < 0.05) in each condition are asterisked.

Figure 6. Volcano plots of differentially accumulated metabolites assessed by GC-MS. Scatterplots of the log2 accumulation ratios and −log10 P-values of differences in the mass-, ribitol-, and quality-control-normalized abundances of 39 sugar and amino acid metabolites in cpEnolase and cpPGAM1A knockout compared to empty-vector control lines, measured by GC-MS in all 3 experimental conditions tested. Metabolites that show differential accumulation in each plot (P < 10^−5) are labeled; metabolites that show differential accumulation in both knockout lines in each condition are shown in black text, and 5 metabolites that are uniquely over-accumulated in cpEnolase knockout lines under all 3 conditions are shown in dark red text.
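Each point in these volcano plots can be sketched as a pair of coordinates computed from normalized abundances and a P-value, with labels applied below the P < 10^−5 threshold. The sketch assumes the accumulation ratio is the ratio of mean normalized abundances (knockout over control) and that the P-value comes from the 1-way ANOVA described under Statistical analysis; the function names and values are hypothetical.

```python
import math
from statistics import mean


def volcano_point(knockout_values, control_values, p_value):
    """Return (log2 accumulation ratio, -log10 P) for one metabolite."""
    ratio = mean(knockout_values) / mean(control_values)
    return math.log2(ratio), -math.log10(p_value)


def is_labelled(p_value, threshold=1e-5):
    """Metabolites below the P threshold are labelled on the plot."""
    return p_value < threshold


if __name__ == "__main__":
    x, y = volcano_point([4.0, 4.2, 3.8], [1.0, 1.1, 0.9], 1e-6)
    print(round(x, 3), round(y, 3), is_labelled(1e-6))  # 2.0 6.0 True
```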
via nitrate reductase) to allow direct metabolic quantification of de novo synthesized amino acids (McCarthy et al. 2017; Yin and Hu 2023).

Figure 8. Proposed activities of P. tricornutum plastid lower glycolysis-gluconeogenesis. Schematic diagram showing potential inferred roles of lower-half diatom plastid glycolysis-gluconeogenesis in each environmental condition tested. The measured Vmax values of purified cpEnolase and cpPGAM1A supplemented with 3-PGA (glycolytic direction) or PEP (gluconeogenic direction) are provided for 9 mM substrate in each case.