Article https://doi.org/10.1038/s41467-024-51841-5 Covariation of hot spring geochemistry with microbial genomic diversity, function, and evolution Daniel R. Colman 1 , Lisa M. Keller1, Emilia Arteaga-Pozo2, Eva Andrade-Barahona2, Brian St. Clair2, Anna Shoemaker 3, Alysia Cox 2 & Eric S. Boyd 1 The geosphere and the microbial biosphere have co-evolved for ~3.8 Ga, with many lines of evidence suggesting a hydrothermal habitat for life’s origin. However, the extent that contemporary thermophiles and their hydrothermal habitats reflect those that likely existed on early Earth remains unknown. To address this knowledge gap, 64 geochemical analytesweremeasured and 1022 metagenome-assembled-genomes (MAGs) were generated from 34 chemo- synthetic high-temperature springs in Yellowstone National Park and analysed alongside 444 MAGs from 35 published metagenomes. We used these data to evaluate co-variation in MAG taxonomy, metabolism, and phylogeny as a function of hot spring geochemistry.We found that cohorts ofMAGs and their functions are discretely distributed across pH gradients that reflect different geochemical provinces. Acidic or circumneutral/alkaline springs harborMAGs that branched later and are enriched in sulfur- and arsenic-based O2-depen- dent metabolic pathways that are inconsistent with early Earth conditions. In contrast, moderately acidic springs sourced by volcanic gas harbor earlier- branching MAGs that are enriched in anaerobic, gas-dependent metabolisms (e.g. H2, CO2, CH4 metabolism) that have been hypothesized to support early microbial life. Our results provide insight into the influence of redox state in the eco-evolutionary feedbacks between thermophiles and their habitats and suggest moderately acidic springs as early Earth analogs. The microbial biosphere and the geosphere (comprising the litho- sphere, hydrosphere, cryosphere, and atmosphere) have changed in concert for ~3.8 Ga1–3. Subaerial hydrothermal systems (i.e. hot springs) are among the earliest known habitats to support microbial life, as evinced by ~3.5 Ga fossil and geochemical evidence of microorganisms and their activities preserved in ancient hot spring deposits4,5. Phylo- genetic and inferred physiologic data also consistently place microbial thermophiles as among the earliest-branching lineages and suggest that they were supported by anaerobic, chemoautotrophic metabolisms dependent on geogenic energy substrates6,7. Concomitantly, con- temporary continental hydrothermal systems host diverse and abun- dant microbial communities8–11, largely attributed to the extensive geochemical variation and abundant energy substrates available in these environments8,9,12. At the highest temperatures (>~74 oC in cir- cumneutral springs and >~54 oC in acidic springs), photosynthetic metabolism is excluded in hot springs and microbial productivity is driven by chemoautotrophic metabolism supported by geogenic energy substrates like H2S, H2, CO2, and S0 13–16. Consequently, Received: 22 November 2023 Accepted: 20 August 2024 Check for updates 1Department of Microbiology and Cell Biology, Montana State University, Bozeman, MT, USA. 2Department of Chemistry and Geochemistry, Montana Technological University, Butte, MT, USA. 3Department of Earth Sciences, Montana State University, Bozeman, MT, USA. e-mail: daniel.colman@montana.edu; eric.boyd@montana.edu Nature Communications | (2024) 15:7506 1 12 34 56 78 9 0 () :,; 12 34 56 78 9 0 () :,; http://orcid.org/0000-0002-3253-6833 http://orcid.org/0000-0002-3253-6833 http://orcid.org/0000-0002-3253-6833 http://orcid.org/0000-0002-3253-6833 http://orcid.org/0000-0002-3253-6833 http://orcid.org/0009-0002-6670-7730 http://orcid.org/0009-0002-6670-7730 http://orcid.org/0009-0002-6670-7730 http://orcid.org/0009-0002-6670-7730 http://orcid.org/0009-0002-6670-7730 http://orcid.org/0000-0001-9108-143X http://orcid.org/0000-0001-9108-143X http://orcid.org/0000-0001-9108-143X http://orcid.org/0000-0001-9108-143X http://orcid.org/0000-0001-9108-143X http://orcid.org/0000-0003-4436-5856 http://orcid.org/0000-0003-4436-5856 http://orcid.org/0000-0003-4436-5856 http://orcid.org/0000-0003-4436-5856 http://orcid.org/0000-0003-4436-5856 http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-024-51841-5&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-024-51841-5&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-024-51841-5&domain=pdf http://crossmark.crossref.org/dialog/?doi=10.1038/s41467-024-51841-5&domain=pdf mailto:daniel.colman@montana.edu mailto:eric.boyd@montana.edu www.nature.com/naturecommunications contemporary high-temperature springs and the communities they support provide an opportunity to understand how chemosynthetic populations were supported on early Earth and how they have evolved alongside their hydrothermal habitats over geologic time. The extensive variation in the geochemical composition observed within hot springs is generatedby a convergenceof several surface and subsurface processes. In most systems, oxic meteoric or saline waters infiltrate the crust and are heated to high temperature, forming a hydrothermal aquifer17,18. These hydrothermal aquifer fluids tend to be solute rich (e.g. of Na+ and Cl−) due to extensive high-temperature water-rock reactions and are infused with magmatic gases like CO2, 3He, and SO2, the latter of which disproportionates at high tempera- ture in thepresenceofwater to formSO4 2− andH2Sor S 0, dependingon temperature and sulfur concentrations19. During circulation in the subsurface and their ascent to the surface, hydrothermal fluids become depleted in oxygen and are further enriched in crustal or meteoric gases such as CH4, 4He, and H2 17,18,20,21. The development of stark chemical variation in waters principally arises from fluids undergoing decompressional boiling during their ascent along frac- tures and faults, resulting in separation of fluids into a lower-density vapor phase carrying volatile gases (e.g. CO2, H2S, H2, and CH4) and a higher density liquid phase comprising non-volatilizing solutes (e.g. Na+ and Cl−)17,18,22. The low-density vapor phase can continue ascending to the surface and form fumaroles or otherwise condense with near- surface waters18,23. Condensation of vapor phase gases with near sur- facefluids equilibratedwith atmospheric O2 can result in the oxidation of H2S, resulting in the formation of S0 and ultimately sulfuric acid (H2SO4) 23–25. The oxidation ofH2S and S0 compounds is likely driven, at least in part, byO2-dependent thermoacidophilic Archaea25–29 thatmay have diverged from neutrophilic ancestors over only the past ~1 Ga28. The separation of fluids into a vapor phase and a liquid phase and subsequent oxidation of H2S to form SO4 2− and acid leads to the bimodal distribution of hot spring pH observed globally30. This includes the two primary types of hot springs in contemporary geo- thermal fields: acid-sulfate and circumneutral-alkaline springs. Given that O2 only began to accumulate in the atmosphere ~2.4 Ga and only reached present-day levels at ~0.8 to 0.6 Ga (as summarized in ref. 31), acidic hot springs and the thermoacidophiles they host are likely to be relatively recent phenomena on Earth28. Circumneutral to alkaline pH hot springs are the surface expres- sion of the liquid phase and thus tend to be gas- and oxidant-poor, the latter of which is due to long residence times in the subsurface22 that permit extensive water-rock reactions and consumption of oxidants17. Paradoxically, these springs are often dominated by obligately aerobic bacteria within the Aquificales order32–34 and have recently been argued toonly be habitable due to the infusionof atmosphericO2 once the waters reach the surface22,35. It is consequently unclear whether acidic hot springs or circumneutral/alkaline hot springs could have been prevalent microbial habitats on early Earth when O2 was not readily available. Springs with pH intermittent to acid-sulfate and cir- cumneutral/alkaline types are globally more rare30 and occur due to dilution of hydrothermal water by meteoric water or by mixing of meteoric and/or hydrothermalwater/gasses36,37. Thus,while hot spring ecosystems are often invoked as suitable analogs for investigating early life4,38, it remains unclear how well their contemporary geo- chemistry and microbiology reflect early Earth conditions. The most widely studied hydrothermal system is in Yellowstone National Park (YNP) that is host to >10,000 geothermal features that widely vary in their geochemical composition12,17,24,25,39. Extensive spa- tial geochemical variation among hot springs selects for unique assemblages of microorganisms with diverse functionalities8,10,11,34. However, the provenance of this taxonomic and functional diversity and their distributions remain unclear, in particular, given that the geochemical compositions of hydrothermal waters are likely to have substantively changed over Earth history as the Earth became more oxidized, as described above. Here, a census of hot spring genomic and functional diversitywasgenerated alongside detailed geochemical measurements in 34 high-temperature hot springs (>61.9 oC) with conditions that preclude photosynthesis to evaluate the adaptive evolution of thermophilic lineages and their functions in coordination with their hydrothermal habitats. A total of 64 geochemical analytes were measured and 1022 metagenome-assembled-genomes (MAGs) were generated from community metagenomes from these springs. These data were also analyzed in conjunction with 444 MAGs gener- ated from 35 metagenomes in our other recent studies8,29,35,40–43. Results Geologic and geochemical context Thirty-four sediment samples from 34 YNP springs with conditions that preclude photosynthesis (i.e. via a combination of pH, tempera- ture, and sulfide13,14) were collected for metagenomic analysis (Sup- plementary Dataset 10). Water and dissolved gas samples were collected for geochemical analyses in coordination with sediment sampling (Supplementary Dataset 1). In addition, 35 previously pub- lished YNP hot spring samples (20 sediment samples and 15 water samples) from 19 other springs that were analyzed in our other studies (Supplementary Dataset 1) were included in the analyses and were similarly sampled and processed8,35,40–43. The 53 springswere located in 14 geothermal regions spanning a large geographic extent of YNP (Fig. 1a). The primary bedrock type in YNP is rhyolite (silica-rich, iron- poor rock) due to rhyolitic eruptions and lava flows from the Yellow- stone caldera, although minor basalt flows (silica-poor, iron-rich rock) are present in discrete regions of YNP (Fig. 1a). The geyser basins sampled were primarily located in rhyolite formations44,45 including in the Smokejumper (SJ),MiddleGeyser Basin (MGB), LowerGeyser Basin (LGB), Upper Geyser Basin (UGB), Gibbon River (GRV), Sylvan Springs (SYL), Geyser Creek (GCR), Norris Geyser Basin (NOR), Norris- Mammoth Corridor (NMC), Seven Mile Hole (SMH), Crater Hills (CRH), Phantom Fumarole (PFU), and Greater Obsidian Pool Area (GOPA) (Fig. 1a). Geochemical analyses have indirectly suggested the interaction of thermal waters with sedimentary formations in the Washburn (WB) geyser basin17 that is underlaid by the Absaroka Vol- canic Supergroup formation that contains basaltic, andesitic (inter- mediate silica and iron), and dacitic (intermediate silica and iron) components46. Thus, the broadgeologic context formost of the geyser basins sampled in this studywas similar, except for theone spring from the WB area (Fig. 1a). The range of spring pH (1.3–8.9) and temperature (61.9–92.8 oC) among the 69 total samples encompassed previously documented ranges of YNP springs that host non-photosynthetic communities (Fig. 1b)13,14,47. In addition, the spring waters exhibited variation in geo- chemistry spanning that previously observed for YNP springs (Fig. 1c)25. Sulfate (SO4 2−) and chloride (Cl−) concentrations are often used to identify the sources of fluids to hot springs globally18,25. Dilute meteoric- only (MO) waters are sourced from near surface aquifers that are recharged with recent snowmelt or rainfall and carry little solute con- tent, including low concentrations of SO4 2− or Cl−25 (Fig. 1c). The infusion of vapor phase gas containing sulfide into meteoric waters (meteoric water + gas, MG) leads to acidic waters with elevated SO4 2− concentra- tions, but minimal Cl− concentrations (Fig. 1c)25. The deep hydrothermal aquifer of YNP is estimated to have residence time estimated as between ~300 and ~1500 years22, although earlier estimates suggested ages of >10,000 years48 and its waters have high Cl− concentrations due tomore extensive high-temperature water-rock interactions in the subsurface17. Springs sourced by the deep hydrothermal aquifer, or those termed hydrothermal only (HO), therefore have elevatedCl- andmoderate SO4 2− concentrations, the latter being derived from the initial dis- proportionation of SO2 25. Mixing of the aforementionedwater types can result in waters intermediate to the end-member compositions descri- bedabove,while additional infusionofgas (e.g.H2S) andboilingcan lead Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 2 www.nature.com/naturecommunications to boiled hydrothermal-only waters (HB type) with greater solute- concentrations (Fig. 1c) or boiled hydrothermal waters with additional gas input (i.e. HBG type) type waters25. Using the above framework, representatives of nearly all hydro- thermal water types previously documented in YNP were sampled (Fig. 1c), with the exception of HBG-type waters that comprise only a few known examples within YNP25 and carbonate-rich waters in the Mammoth Hot Springs area that are also anomalous within YNP17. The springs were largely reflective of the bimodal pH observed in con- tinental hot springs, with acid-sulfate springs (pH< ~5) and cir- cumneutral alkaline-chloride springs (pH> ~7) representing the two most prevalent spring types18,30. In addition, several springs were sampled that were moderately acidic (e.g. pH ~ 5–7) that derived from mixing of different fluids, dilution of hydrothermal fluids with meteoric water, or infusion of meteoric waters with volcanic gas25,42,49. Consequently, these hot spring pH provinces (pH < 5, 5–7, and >7) were used to structure the subsequent analyses of geochemical and microbial biodiversity and are referred to as acidic springs (AS),mixed springs (MS), and circumneutral/alkaline springs (CS) henceforth. Geochemical variation The distributions of sample temperatures among the three pH groups were similar, owing to the strategy to sample as evenly as possible across temperature and pH realms in springs that do not support photosynthesis (Supplementary Fig. 1). Likewise, water conductivity was similar among groups, except for some of the lowest conductivity values observed in MS springs (Fig. 2a), reflec- tive of waters that arise when vapor phase gas mixes with dilute meteoric water21 or when HO is diluted by meteoric water50. Nearly all waters were hypoxic (defined as those with <60 µM O2 or ~0.95mg/L51 (Fig. 2b), with many also being suboxic (<~5 µM O2 or ~0.16mg/L) (Supplementary Dataset 1). This is consistent with the inverse exponential relationship between O2 solubility and temperature12. MS geochemistry (pH 5–7) reflected values inter- mediate to many of those measured in AS and CS, including of SO4 2− (Fig. 2c; expected to be highest in AS, as discussed above); Cl- (Fig. 2d; expected to be highest in CS, as discussed above), metals like Fe(II) or total Fe (Fig. 2i; Supplementary Figs. 1–4), dissolved organic carbon (DOC), Fig. 2k and dissolved inorganic carbon (DIC) (Supplementary Fig. 1). Concentrations of gases (H2, CH4, and CO2; Fig. 2e–g), in addition to total dissolved sulfide (Fig. 2h), were equivalent or higher among MS than AS (only CO2 was significantly higher), possibly due to increased contributions of gas input to someMS21,42 and the formation of some AS by additional vapor phase gas input. Further, some metals exhibited higher concentrations in MS or CS that could be related to sourcing by deeper circumneutral/ lo g1 0 Su lfa te (m g L-1 ) Chloride (mg / L) a b c 0 2 4 6 8 10 12 pH 60 65 70 75 80 85 90 95 100 Te m pe ra tu re o C 0 200 400 600 800 1,000 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 MG MO HO HBG HB 1 3 45 8 9 10 11 12 13 14 6 72 Fig. 1 | Locations of 69 Yellowstone National Park (YNP) hot spring samples analyzed in this study and major geochemical parameters measured in spring waters. a Map of sampling locations within YNP, modified from USGS I-711, as indicated in the Materials and Methods. Hydrothermal regions are labeled as fol- lows: 1: Norris-Mammoth Corridor (NMC), 2: Norris Geyser Basin (NOR), 3: Sylvan Springs (SYL), 4: Geyser Creek (GCR), 5: Gibbon River (GRV), 6: Washburn (WB), 7: SevenMile Hole (SMH), 8: LowerGeyser Basin (LGB), 9:MiddleGeyser Basin (MGB), 10: Upper Geyser Basin (UGB), 11: Smokejumper (SJ), 12: Phantom Fumarole (PFU), 13: Greater Obsidian Pool Area (GOPA), 14: Crater Hills (CRH). The regions hosting the 34 springs newly analyzed here are indicated in black circles and those only hosting the springs from previous studies are shown in grey circles. b Spring pH and temperature of the 34 hot spring samples newly analyzed in this study (black circles) and the 35 from previous studies (dark grey circles) in context of 7706 measurements of YNP hydrothermal features from a previous geothermal survey (light grey circles; http://rcn.montana.edu). c The sulfate (SO4 2−) and chloride (Cl−) concentrations of the 34 hot spring samples newly analyzed in this study (black circles) and the 35 from previous studies (dark grey circles) in context of 488 other YNPhydrothermal features fromaprevious survey of YNPhot springs (as described in ref. 21). Five hot spring geochemical end-members (i.e. types) are indicated by text, as previously defined by Nordstrom et al. 25 and outlined in the text. MO meteoric only; HO hydrothermal only; MG meteoric + gas; HB hydrothermal only + boiling; HBG hydrothermal only + boiling + gas. The geographic coordinates and geochemical information for each site are provided in Supplementary Dataset 1. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 3 http://rcn.montana.edu www.nature.com/naturecommunications alkaline fluids that undergo high-temperature water-rock interac- tions with rhyolitic bedrock. For example, arsenic (As) (Fig. 2j) exhibited the highest concentrations in CS, consistent with its sourcing from high-temperature reaction with rhyolite bedrock52,53. In addition, major ion composition was largely similar across springs (Supplementary Figs. 1–2), reflective of their sourcing within similar bedrock types. Nearly all waters were depleted in oxidized nitrogen species (i.e. NO3 -, Supplementary Fig. 1; NO2 − was below detection in 2 4 6 C on du ct iv ity ( m S /c m ) 0.0 0.3 0.6 0.9 1.2 D is so lv ed O 2 (m g/ L) 1.0 1.5 2.0 2.5 3.0 3.5 lo g1 0 S O 42- ( pp m ) 0 200 400 600 800 C l- ( pp m ) 1 2 3 lo g1 0 H 2 (n m ol ) −1 0 1 2 3 lo g1 0 C H 4 (n m ol ) 0 1 2 3 lo g1 0 C O 2 (µ m ol ) 1 2 3 4 lg o1 0 S ul fid e (µ g/ L) −1 0 1 lo g1 0 F e (I I) ( m g/ L) 0 1 2 3 lo g1 0 A s (p pb ) −1.0 −0.5 0.0 0.5 1.0 lo g1 0 D O C ( pp m ) 2.0 2.2 2.4 2.6 lo g1 0 S iO 2 (m g/ L) a b c d e f g h i j k l AS MS CS AS MS CS AS MS CS * *** ** ** ** ** * *** *** * * * * * ** Fig. 2 | Variation in select geochemicalparametersmeasured in 69Yellowstone National Park (YNP) hot springs. Springs are organized as acidic (AS; pH < 5), mixed (MS; pH 5–7), and circumneutral/alkaline (CS; pH > 7) spring types. Boxplots for each parameter show the interquartile ranges of distributions in the grey boxes, withmedians shownas black lines in the center of the boxes.Whiskers show the full ranges of the distributions. White circles show individual sample points and those beyond the whiskers are considered outliers. a Conductivity (n = 60), (b) dissolved oxygen (O2) (n = 34), (c) sulfate (SO42−, n = 68), (d) chloride (Cl−; n = 68), (e) dis- solved hydrogen (H2) (n = 35), (f) dissolved methane (CH4) (n = 35), (g) dissolved carbon dioxide (CO2,n = 34), (h) total sulfide (n = 60), (i) ferrous iron (Fe(II),n = 61), (j) total arsenic (As, n = 34), (k) dissolved organic carbon (n = 39), (l) dissolved silica (SiO2, n = 37). Some parameters were log transformed when their ranges spanned several orders ofmagnitude. Statistical analysis in differences in distributionmeans was evaluated with pairwise Wilcoxon rank sum tests (with Benjamini & Hochberg correction for multiple comparisons), and statistically significant differences (two- sided) are shown between groups with red lines and asterisks: *: p <0.05; **: p <0.01; ***: p <0.001. Exact p-values are shown in the accompanying Source Data file. Geochemical data are provided in SupplementaryDataset 1 and additional distributions of other analytes are shown in Supplementary Figs. 1–4. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 4 www.nature.com/naturecommunications all waters) and phosphate (PO4 3−; Supplementary Fig. 1), with all values < 1mg/L, except for one AS (SYL01; PO4 3− concentration of ~15.6mg/L; Supplementary Dataset 1). Geochemical controls on genomic diversity The 69 metagenomic libraries analyzed here (including the 34 newly described in this study) comprised ~2.3 Tbp of paired-end read data (average of 114 × 106 reads/sample) and 7.4 Gbp of assembled data (Supplementary Dataset 1). A total of 1466 metagenome-assembled- genomes (MAGs) of at least medium to high quality (>50% estimated completeness and <10% contamination) were compiled from 67 of the metagenomes. Only the raw reads from 2 of the metagenomes (Old Faithful Geyser andSpouterGeyser)wereused for taxonomic analyses, given the inability to generate appropriate quality MAGs from these samples, as previously described43. The MAGs comprised 372 meta- genomic operational taxonomic units (mOTUs) defined at 95% average nucleotide identity (ANI) that approximately reflects species-level clusters54. The overall census of hot spring diversity was relatively complete, as indicated by rarefaction analysis of mOTUs (Fig. 3a) and based on the contribution of each metagenomic sample to the overall a b e c d 0 100 200 300 0 500 1,000 1,500 U ni qu e m O T U s 0 20 40 60 MAGs Sampled 0 100 200 300 U ni qu e m O T U s -0.4 -0.2 0 0.2 0.4 NMDS Axis 1 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 N M D S Ax is 2 2 3 4 5 6 7 8 -0.4 -0.2 0 0.2 0.4 NMDS Axis 1 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 N M D S A xi s 2 65 70 75 80 85 90 95 pH T(oC) f 14 15 16 17 2.5 5.0 7.5 pH N on pa re il D iv er si ty 70 80 90 Temperature (oC) 14 15 16 17 N on pa re il D iv er si ty Metagenomes Sampled 60 Fig. 3 | Genomic biodiversity present among 1466 metagenome-assembled- genomes (MAGs) recovered in this study from 69 Yellowstone National Park (YNP) hot springs. The 1466 MAGs were clustered into 372 metagenomic opera- tional taxonomic units (mOTUs) at the 95% average nucleotide identity (ANI) level that approximately corresponds to species-level thresholds. Rarefaction curves showing the sampling of unique mOTU diversity with increasing recovery of MAGs (a) or with additional sample inclusion (b). The light red shading shows the expected 95% confidence intervals (not identifiable in panel a because they are less than the thickness of the line). c, d The genomic diversity, as measured with the Nonpareil diversity metric, in each metagenome in relationship to hot spring pH (2nd order polynomial regression; adjusted R2 = 0.26, p = 1.594 × 10−5, two-sided) (c) or temperature (linear regression R2 = 0.043, p =0.05, two-sided) (d). e, f Non- metric multidimensional scaling (NMDS) ordinations of hot spring mOTU com- positionbasedonvariation inmOTUabundances across the 67 hot springs used for MAG reconstruction (2 previously published metagenomes were not used to gen- erate MAGs, as described in the 'Methods'). Each community is colored by corre- sponding hot spring pH (envfit correlation; R2 = 0.87, p ≤0.001, two-sided, uncorrected formultiple comparisons) (e) or temperature (R2 = 0.50, p <0.01, two- sided, uncorrected for multiple comparisons) (f). NMDS stress was 0.10. The rar- efaction andNMDSplot sourcedata are provided in the accompanying SourceData file, while the diversity values are provided in Supplementary Dataset 1. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 5 www.nature.com/naturecommunications mOTU count (Fig. 3b). Thus, the present study represents a nearly complete census of the species-level genomic diversity in high- temperature YNP springs. Among the 1466MAGs, 986were archaeal and 480were bacterial, representing 11 and 36 archaeal and bacterial phylum-level groups, respectively, in addition to one bacterial MAG from SMH02 that could not be classified to a known bacterial phylum (Supplementary Fig. 5; SupplementaryDataset 2).Many archaeal and bacterialMAGswere not classifiable (i.e. genomic database representatives are not available) at lower taxonomic ranks (classes, orders, families, and genera) (Sup- plementary Fig. 5). The level of unclassified taxa far surpassed esti- mates of uncharacterized archaeal andbacterial genomicdiversity that have been considered high in other ecosystems including in the Arctic Ocean and Baltic Sea55,56, while also rivaling that of a recent analysis of global deep-sea hydrothermal vents57. Unclassified taxa were not evenly distributed among springs or spring types, with springs exhi- biting pH ranges between pH 5 and 7 (i.e. MS-type springs) harboring most of the unclassified diversity (Supplementary Fig. 5). Unclassified genus-level diversitywas highly correlatedwith overall spring genomic diversity (measured via the Nonpareil diversity metric; linear regres- sion adjustedR2 = 0.40,p <0.001, two-sided), suggesting that themost biodiverse springs in YNP are MS and that these host the least studied taxa (i.e. a lack of representatives in genomic databases). To evaluate whether the MAG-based dataset accurately repre- sented community taxonomic composition, MAG taxonomic classifi- cations and their associated relative abundances were compared against estimates from assembly-free read-based taxonomic analyses for the 34 metagenomes newly described here. The agreement of the two relative abundance estimates was exceptionally high (Supple- mentary Fig. 6; adjusted R2 = 0.94, slope = 1.02, p <0.001), indicating that the MAG-based dataset robustly represented the taxonomic diversity present in each metagenome. These results are consistent with the genomic coverage analyses described above that suggested nearly complete coverage of genomic diversity for all metagenomes. Taxonomic classifications identified in the read-based analyses, but not in the MAG-based analyses, were also evaluated to assess whether specific taxa were systematically not identified in the MAG-based analyses. None of the read-based classifications that were absent in the MAG-based dataset exhibited estimated relative abundances > ~3% and only two had estimated relative abundances of > 1% (Supplementary Dataset 3). Taken together, the results suggest that the vast majority of taxa, including essentially all higher relative abundance taxa (e.g. ~ >1%), in the spring community metagenomes were present within theMAG-based analyses and their relative abundance estimates were robust to different estimation methods. Undoubtedly, very low- abundance taxa specific to one or a few springs may not be repre- sented in the dataset, but significantly greater sequencing depth than the ~50–100Gbp dedicated to each samplewouldbeneeded to enable recovery of at least medium-quality MAGs. The comprehensive nature of the sequence dataset generated herein was further scrutinized by comparison to an earlier genomic survey of high-temperature non- photosynthetic hot spring communities in YNP that was conducted with low-throughput sequencing methods and that recovered ~30 MAGs comprising ~15 phylotypes from 14 springs58. A comparison of the taxonomic identities of the entire assemblies from the previous 14 springs (n = 297 classifications; 154 unique) against those for the 1022 MAG assemblies newly produced here (n = 2235; 465 unique) revealed that all 154 unique taxonomic affiliations of sequences from the previous study were recovered, in addition to an overall >3-fold increase of taxonomic classifications in the newly reported MAG assemblies of this study (Supplementary Dataset 4). Collectively, these results suggest a highly comprehensive survey of the genomic and taxonomic diversity present within high-temperature YNP springs and represent a significant advancement in our understanding of the microbial biodiversity in these systems. Genomic diversity was not uniformly distributed among spring types, with diversity (based on the Nonpareil diversity metric) highly associated with pH. An initial Pearson correlation analysis indicated a highly significant association with Nonpareil diversity and pH (R =0.51, p < 1 × 10−5), along with weaker correlations to several other para- meters. Specifically, the Nonpareil diversity metric was also sig- nificantly correlated with spring temperature (R = −0.25, p = 0.039), conductivity (R = −0.35, p =0.007), δ18O (R = −0.48, p =0.001), δD (R = −0.43, p =0.003), DIC (R = 0.36, p =0.021), DOC (R = −0.40, p =0.015), Al (R = −0.40, p =0.020), Ti (R = −0.37, p =0.030), Pb (R = −0.38, p =0.028), Th (R = −036, p = 0.035), and CO2 (R = −0.41, p =0.016). Most of these other parameters related to diversity were themselves highly correlated to pH (Supplementary Fig. 7, Supple- mentary Dataset 5), as expected given the role of pH in modulating analyte solubility, speciation, and availability (including of metals: Fe(II), total Fe, Al, Ti, Sb, Ba, La, Ce, Nd, and W; and chemical species: DIC andDOC). The association of Nonpareil diversity and pH exhibited a slightly humped relationship (Fig. 3c) thatwas slightly betterfitwith a 2nd-order polynomial regression; adj. R2 = 0.26, p <0.0001, two-sided) than a linear model (adj. R2 = 0.23, p =0.0117, respectively, two-sided). Temperature was only weakly negatively associated with genomic diversity (Fig. 3d; Pearson R = −0.25, p = 0.0039; linear regression adj. R2 = 0.04, p =0.05, two-sided), contrasting with previously observed temperature-dependent diversity patterns based on 16S rRNA gene sequence analyses of hot springs59–61. This distinction could arise from the use of community genomic data to evaluate biodiversity rather than 16S rRNA gene diversity, as has been used previously, but is also likely influenced by springs of this study spanning ~35 oC since lower- temperature springs (i.e. thosewhere photosynthesis is possible) were not considered. Intriguingly, conductivity was also significantly and inversely correlated to Nonpareil diversity (Pearson R = −0.35, p =0.007, two- sided; linear regression adj. R2 = 0.11, p <0.01, two-sided), although conductivity was not correlated with temperature or pH (Supple- mentary Fig. 7, Supplementary Dataset 5). The peak in genomic diversity in MS is consistent with recent analyses suggesting that moderately acidic hot springs that exhibit greater evidence for mixing of different fluid types generally exhibit higher diversity8,42,61. In many cases, MS exhibit very low conductivity due to the mixing of vapor phase fluids with meteoric waters, while concomitantly harboring exceptionally high taxonomic and functional diversity8. Thus, MS represent biodiversity hot spots in YNP, likely due to underlying geo- logical, hydrological, and geochemical processes that can promote the generation and mixing of oxidized and reduced fluids that generate and maintain biodiversity in hydrothermal systems. Similar observa- tions have been made in deep sea hydrothermal environments, wherein complex subsurface mixing and hydrogeological processes supported exceptionally diverse thermophilic communities62. These observations point to the convergence of subsurface and surface processes in generating andmaintainingmicrobial diversity, such as in the MS identified herein. Among the 66 analytes that were evaluated, water pH best- explained community compositions across springs, followed by tem- perature (Fig. 3e, f; envfit correlation;R2 = 0.87,p <0.001 andR2 = 0.50, p <0.01, two-sided, uncorrected for multiple comparisons, respec- tively). Several other analytes were significantly (p < 0.05) correlated with community composition to a lesser extent than pH or tempera- ture (SupplementaryDataset 6).However, all but four of these analytes were significantly correlated to pH (Supplementary Fig. 7; Supple- mentary Dataset 5), for reasons discussed above. Likewise, of the 4 analytes not significantly associated with pH: Ga, CO2, and elevation were significantly correlated to temperature while B was significantly associated with concentrations of major ions indicative of bulk dif- ferences in water chemistry (e.g. Cl−), suggesting that it is a proxy for major differences in water compositions (e.g. as a proxy for Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 6 www.nature.com/naturecommunications circumneutral/alkaline springs with high Cl− concentrations) (Supple- mentary Fig. 7; Supplementary Dataset 5). Thus, at the system-wide scale, pH and temperature are the only parameters that significantly influence community composition. It is neverthelessworth noting that other individual parameters likely constrain finer-scale taxonomic or functional distributions within spring types, as discussed in the fol- lowing sections. The dominant influence of pH in structuring high- temperature spring communities is consistent with previous taxo- nomic compositional analyses of hot springs based on 16S rRNA gene surveys in YNP and elsewhere9,34,61,63–66, in addition to earlier metage- nomic analysis of 20 YNP hot springs58 andmeta-analyses of functional gene distributions among YNP hot springs67–69. Notably, the single community fromanon-rhyolite setting (WB-02 from Washburn Hot Springs) harbored MAGs almost entirely within mOTUs that were common in other acidic hot springs across YNP (Supplementary Dataset 7), suggesting that the influence of bedrock setting was not as important as overall geochemistry for structuring microbial communities. A recent analysis of 16S rRNA gene amplicon sequence variant (ASV) profiles across continental hot spring systems from different geologic, tectonic, and geographic settings (YNP, Ice- land, and Japan) revealed similar taxonomic groups across all regions61. Notably, spring pH was the greatest predictor of within-system diver- sity distributions, but with evidence for subtle variation in geochem- istry across geologic settings being highly associated with overall variation in communities. Specific comparisons of rhyolite-hosted communities from Iceland and YNP revealed starkly different ASV compositions, despite being from springs with similar geochemical profiles (i.e. with similar pH and temperature). Further, many of the ASVs, including abundant ASVs, were shared between regions, indi- cating that dispersal limitation alone could not account for the observed community differences in geographically distant springs hosted in the same bedrock type. Thus, complex influences from geologic settings, dispersal limitations, and localized geologic/hydro- logic characteristics all likely influence observed biogeographic pat- terns of hot spring biodiversity at within-system and global scales. The inclusion of 35 previously published metagenomes (Supple- mentary Dataset 1) permitted some insights into the stability of com- munities in hot springs over time and differentiation among hot spring phases (e.g. water vs. sediment communities). Three of the springs investigated herewere sampled across years, including three sediment metagenomes from ‘Figure 8’ spring (2013, 2018, and 2019), 2 sedi- ment metagenomes from Perpetual Spouter (2018 and 2019; in addi- tion to a water community metagenome from 2018), and 2 sediment metagenomes fromCinder Pool (2018 and2019), in addition to 6water metagenomes from Cinder Pool (from 0, 9, and 15m depths in 2016 and 2020) (Supplementary Fig. 8). The sediment metagenome com- positions were nearly identical in all three cross-year comparisons, even though the geochemistry of Cinder Pool dramatically changed (acidified from pH ~4 to pH ~2.5) between 2018 and 201941, as did 'Figure 8' across the sampling periods (Supplementary Fig. 8). Like- wise, the depth-resolved samples from Cinder Pool from 2016 and 2020 were all nearly identical, despite spanning the period of spring acidification. In addition, whilemetagenomes concomitantly collected from waters and sediments of 6 different springs were generally compositionally similar, notable differences were observed for some springs, including for Perpetual Spouter and GCR-JH (PS and JH, respectively, in Supplementary Fig. 9). Thus, while overarching geo- chemical determinants like pH and temperature control hot spring composition overall, other factors including historical effects through time (e.g. succession following seismic activity70) and finer-scale geo- chemical differences in waters and sediments likely also contribute to variation35,42,63. Additional studies of the taxonomic and functional composition across a variety of temporal scales and spatial scales within springs would provide useful insights for identifying additional community structuring processes. The pH-dependent variation in community composition reflected discrete pH ranges of individual taxa that likely point to their respec- tive ecological niches (Fig. 4; Supplementary Figs. 10 and 11). For example, AS were dominated by Archaea, including the orders Sulfo- lobales, Desulfurococcales, and Thermoproteales (Fig. 4). The only bacterial taxa identified in AS was the Hydrogenobaculum genus (Aquificota). In contrast, diverse archaeal and bacterial taxa were prevalent among MS, consistent with the above diversity analyses. Lastly, Archaea including Desulfurococcales, Thermoproteales, and Caldarchaeales, along with Thermocrinis (Aquificota) and Armatimo- nadota Bacteria were particularly dominant members of CS. While some taxonomic groups were only identified in discrete pH ranges (e.g. Sulfolobales, Korarchaeales, and Bathyarcheia Archaea, in addi- tion to Deinococcota Bacteria; Fig. 4), many other taxonomic groups were arrayed across broad pH ranges. For example, the 38 Desulfur- ococcales mOTUs (comprising 303 MAGs) and 18 Thermoproteales mOTUs (comprising 143 MAGs) exhibited highly differentiated pH distributions, with both containing specific groups only found in AS, MS, or CS (Supplementary Fig. 10). Likewise, the ninemOTUs classified as Thermocrinis (26MAGs) exhibited distinct pH ranges in eitherMSor CS, while the three mOTUs classified as Sulfurihydrogenibium (10 MAGs) also exhibiteddistinctpH ranges (Supplementary Fig. 11). These observations suggest the widespread presence of taxa (at various phylogenetic ranks) that have evolved to inhabit discrete geochemical provinces, as reflected by spring pH. Network analysis of the correlations in abundance of the 372 mOTUs among the 67 metagenomes used for MAG-based analyses revealed the presence of discrete clusters of taxa, arrayed by the pH of the springs that they inhabited (Supplementary Fig. 12). mOTUs derived fromASwithmean spring pH distributions of pH< 4 exhibited completely non-overlapping sub-networks relative to those derived from MS/CS. Likewise, separate clusters within the sub-network of mOTUs from MS/CS coincided with differences in mean pH distribu- tion ranges (Supplementary Fig. 12). Thus, the network analysis sug- gested the presence of discrete species-level unit (i.e. mOTU) cohorts in springs with similar pH. Taken together, these results suggest that the geologic, hydrologic, and geochemical processes leading to hot spring geochemical variation (as proxied by pH) result in discrete ecological provinces that host discrete cohorts of taxa, implying the presence of eco-evolutionary dynamics among and between lineages and their hot spring environments. Functional distributions across hot spring geochemical provinces To determine how metabolic potentials of the MAGs align with varia- tion in geochemistry, metabolic reconstructions were performed for the 1,466 MAGs recovered in this study (Fig. 5). A comparison of the estimated abundances of functions encoded by the 1022 newly gen- erated MAGs to those within their entire corresponding metagenomic assembly revealed a high level of concordance (Pearson’s R = 0.99; linear model adj. R2 = 0.97, p =0, slope =0.80; Supplementary Fig. 13). Likewise, the overall variation in functional potential profiles from MAGs and those derived from entire metagenome assemblies were essentially identical (Supplementary Fig. 14). Consequently, while some functions encoded in the metagenomes may not have been identified inMAGs, as expected, the comparisons above and the nearly complete census of taxonomic diversity in the unassembled read datasets (described above) suggest that theMAG-based estimations of community functional potentials were comprehensive. In addition, functional profiles formOTUswerebasedon consensus evidence from theMAGswithin thatmOTU that derived fromdifferent samples. Thus, while this represents a conservative approach to defining functions, the leveraging of independently generated MAGs across multiple springs provides additional confidence in the functional assessments described herein. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 7 www.nature.com/naturecommunications Evaluation of the association of geochemistry with functional potential profiles encoded by members of communities (i.e. those related to energy metabolism) indicated that spring pH was again the strongest correlate to variation (envfit R2 = 0.61; p <0.001, two-sided, uncorrected for multiple comparisons; Supplementary Fig. 15). This is consistent with observations for the MAG distribution analyses (Fig. 3e) and also with our previous meta-analysis of functional genes among hot spring metagenomes67. Notably, temperature was not correlated to variation in MAG functional potential (envfit R2 = 0.01; p =0.822, two-sided, uncorrected for multiple comparisons). These results suggest a close relationship between taxonomic andmetabolic diversification in association with hot spring geochemical variation and change. To evaluate the concordance of specific metabolic func- tions with hot spring geochemical provinces, the relative abundances of individual functions (i.e. those shown in Fig. 5) among the three spring types were evaluated and summarized by metabolic cate- gory below. Carbon fixation and 1-carbon metabolism The prevalence of C-fixation pathways significantly varied across spring type, largely due to the predominant taxa within the springs (Fig. 5, Supplementary Fig. 16). Specifically, the reverse TCA (rTCA) C-fixation pathway was significantly more enriched in MS and CS type communities (Fig. 5, Supplementary Fig. 16) and was primarily attrib- uted to Aquificota genera (Thermocrinis, Sulfurihydrogenibium, and Hydrogenobacter) inMS andCS (albeit through twopathway variants33) (Supplementary Dataset 8). The Aquificota have been inferred to be important primary producers in hot springs and exhibit taxon-specific, pH-dependent distributions33,34, consistent with their distributions reported here (Fig. 4, Supplementary Fig. 11). While the Aquificota genus Hydrogenobaculum also encodes the rTCA pathway33 (Supple- mentary Dataset 8) and were present in some AS in this study, they generally only comprised small proportions of their communities and are thus not likely major autotrophs across the AS springs examined herein (Supplementary Fig. 16; Supplementary Dataset 8). Ribulose- bisphosphate carboxylase (RuBisCO)-based (i.e. Calvin Cycle) carbon fixation was also statistically enriched in MS and CS, albeit only in relatively low abundance taxa, including Thermus and Meiothermus (family Thermaceae) and others (Supplementary Dataset 8, Supple- mentary Fig. 16). The potential for autotrophic or mixotrophic meta- bolism in Thermaceae remains enigmatic given prior identification of the potential for the Calvin Cycle in some isolates71 although they are generally considered heterotrophic72. Nevertheless, somemixotrophic isolates have been reported from Iceland73, so it is possible that WOR−3 (8) / (41) Verrucomicrobiota (4) / (6) Thermotogota (4) / (11) Thermodesulfobiota (1) / (2) SZUA−79 (2) / (2) Sumerlaeota (1) / (2) Spirochaetota (4) / (10) Proteobacteria (10) / (13) Planctomycetota (1) / (1) Patescibacteria (16) / (32) Omnitrophota (3) / (3) Nitrospirota (2) / (3) Myxococcota (1) / (1) KSB1 (2) / (2) Goldbacteria (1) / (1) Firmicutes (4) / (5) Elusimicrobiota (3) / (8) Dictyoglomota (4) / (15) Desulfobacterota (12) / (48) Deinococcota (2) / (20) CSP1−3 (3) / (3) Coprothermobacterota (2) / (3) Chloroflexota (14) / (48) Calescibacterota (4) / (14) Caldisericota (3) / (7) Caldatribacteriota (1) / (2) Bipolaricaulota (5) / (20) Bacteroidota (9) / (25) Armatimonadota (12) / (49) Sulfurihydrogenibium (3) / (10) Hydrogenobaculum (1) / (10) Unclassified Aquificaceae (1) / (1) Thermocrinis (8) / (26) Hydrogenobacter (1) / (8) Actinobacteriota (2) / (3) Acidobacteriota (12) / (22) Unclassified Bacteria (1) / (1) Thermoproteia; Thermoproteales (14) / (143) Thermoproteia; Thermofilales (5) / (59) Thermoproteia; Sulfolobales (12) / (83) Thermoproteia; Marsarchaeales (2) / (2) Thermoproteia; Gearchaeales (6) / (25) Thermoproteia; Desulfurococcales (35) / (303) Nitrososphaeria; UBA164 (4) / (45) Nitrososphaeria; Nitrososphaerales (4) / (12) Nitrososphaeria; EX4484−205 (3) / (8) Nitrososphaeria; Caldarchaeales (11) / (60) Unclassified Nitrososphaeria (3) / (6) Methanomethylicia; Nezhaarchaeales (4) / (24) Methanomethylicia; Methanomethylicales (3) / (12) Methanomethylicia; B29−G17 (2) / (11) Unclassified Methanomethylicia (1) / (4) Korarchaeia; Korarchaeales (3) / (21) Bathyarchaeia; TCS64 (1) / (1) Bathyarchaeia; B26−1 (12) / (33) Bathyarchaeia; B24 (2) / (7) Bathyarchaeia; 40CM−2−53−6 (4) / (9) UBA10834 (1) / (1) Thermoplasmatales (3) / (4) ARK−15 (YNP-TEG) (1) / (7) Unclassified Thermoplasmata (1) / (1) E2; UBA202 (1) / (1) Woesearchaeales (1) / (1) UBA10117 (1) / (1) Pacearchaeales (2) / (5) Nanoarchaeales (5) / (25) Micrarchaeota; Micrarchaeales (4) / (11) Halobacteriota; Methanocellales (1) / (1) Halobacteriota; Archaeoglobales (2) / (16) Hadarchaeota; Hadarchaeales (7) / (16) Asgardarchaeota; Odinarchaeales (1) / (1) Aenigmatarchaeota (17) / (23) Relative Abundance (%) 0.1 1.0 10.0 50.0 75.0 Aquificota Nanoarchaeota Thermoplasmatota Thermo- proteota pH < 5 pH 5 - 7 pH > 7 Fig. 4 | pH-dependent ecologicaldistributions andevolutionaryhistories ofhot spring taxa.Distribution of higher order taxonomic groups among 67 Yellowstone National Park (YNP) hot spring communities. Each column represents a single hot spring community metagenome, and each row represents a taxonomic group, identified on the left. The number of metagenomic operational taxonomic units (mOTUs) represented by each taxonomic classification is shown in parentheses, followed by the total number of MAGs within the mOTUs. Archaeal taxa are summed at the order level (phylum given before parentheses or shown to the left) while bacterial taxa are summed at the phylum level (except for the Aquificota that are shown as individual genera). Archaeal taxa are shown in black circles at the top while bacterial taxa are shown in lighter grey circles at the bottom. The estimated relative abundances are shown by circle size based on the scale in the bottom lower left. The estimated relative abundance values are provided alongside taxonomic and mOTU classification information in Supplementary Dataset 2. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 8 www.nature.com/naturecommunications Thermaceae may contribute to primary production via mixotrophic activity in MS and especially CS systems where they are ubiquitous. The 3-hydroxypropionate/4-hydroxybutyrate (3-HP/4-HB) carbon fixation pathway typical of Sulfolobales74 was only statistically enri- ched in AS springs and accordingly was primarily associated with Sulfolobales that were very abundant in those springs (Fig. 5; Supple- mentary Fig. 16; Supplementary Dataset 8). The dicarboxylate/4- hydroxybutyrate (DC/4-HB) carbon fixation pathway shares most of the enzymatic steps as the 3-HP/4-HB pathway75 and was consequently considered present where the 3-HP/4-HB pathway was present, explaining its statistical enrichment among AS communities (Supple- mentary Fig. 16). In contrast to the above carbonfixationpathways, the Wood-Ljungdahl (WL) pathway was statistically enriched in MS MAGs (Fig. 5, Supplementary Fig. 16) and was encoded by diverse archaeal and bacterial taxa (Supplementary Dataset 8). In addition to the con- sistent identification ofWLpathways in recently characterized lineages (Supplementary Dataset 8), taxa not previously known to encode the WL pathway were identified, including two Caldarchaeales (formerly Aigarchaeota) mOTUs not classifiable to a known family (mOTU106 and mOTU235) and a Thermoplasmatota (Thermoplasmata) mOTU in an unclassified family (mOTU306). The widespread prevalence of the WL pathway suggests that it may be a broadly important autotrophic mechanism in MS-type communities. In addition, the distribution of the WL pathway among many members of MS communities (rather MGB02 MGB01 LGB05 LGB02 GCR05 SMH03 LGB01 NMC01 NOR−PSS NOR−PSP NOR02 MGB03 GCR−GGS063S GCR−GGS063P GRV03 GCR06 GCR11 NMC−RSWS NMC−RSWP SMH01 SYL04 SMH02 GCR01 GCR07 GCR10 GOPA02 SYL02 EvPrim SJ3 GCR04 NMC−RSNS NMC−RSNP NOR06 NOR−CP2016.15m NOR−CP2016.9m NOR−CP2016.0m GCR02 GOPA−Fig8.2013 GOPA−F8S NOR−RPS NOR−OHSPS NOR−DS GOPA01 NOR07 NOR05 NOR04 WB02 NOR03 NOR−HFS NOR−Cinder NOR−LCS NOR−CP2020.0m NOR−CP2020.9m NOR−CP2020.15m NMC−RSEP GCR−SBS SYL03 SYL01 NOR−RBS NOR−RBP CRH−Alice PF01 CRH−RHS GOPA−Moose GCR−JHS GCR−JHP GRV02 Relative Abund. (%) 1 10 50 75 Se le na te re du ct io n C hl or ite re du ct io n Pe rc hl or at e re du ct io n H al og en m et ab ol is m F- ty pe AT Pa se V- ty pe AT Pa se O xy ge n re sp ira tio n N AD H d eh yd ro ge na se Iro n re du ct io n Iro n ox id at io n Ar se na te re du ct io n Ar se ni te ox id at io n An ae ro bi c Ar se ni te o xi da tio n N itr ou s ox id e re du ct io n N itr ic o xi de re du ct io n N itr ite ox id at io n N itr ite re du ct io n N itr ite re du ct io n to a m m on ia N itr at e re du ct io n Am m on ia o xi da tio n N itr og en fi xa tio n Th io su lfa te d is pr op or tio na tio n Th io su lfa te ox id at io n Su lfu r r ed uc tio n Su lfu ro xi da tio n Su lfa te re du ct io n Su lfi te re du ct io n Su lfi de ox id at io n Ae ro bi c C O o xi da tio n H yd ro ge n m et ab ol is m [N iF e] H yd ro ge n m et ab ol is m [F eF e] M et ha ne ox id at io n M et ha no ge ne si s Fo rm at e ox id at io n Fo rm al de hy de ox id at io n M et hy l a m in e -> F or m al de hy de M et ha no l o xi da tio n 3H P- 4H B D C -4 -H B H P C BBW L rT C A C-fixation 1-C Gas (CH4, H2, CO) Sulfur Nitrogen Arsenic & Iron O2 / Respir. Misc. pH < 5 pH 5-7 pH > 7 Fig. 5 | Functional potential of metagenome-assembled-genomes (MAGs) recovered from 67 hot spring community metagenomes. Metagenomes are arranged in order of increasing pH, as indicated to the right of the plot. Each pre- dicted function is indicated by a column and is arrayed according to functional category as indicated below the plot. The circle sizes represent the summed relative abundances for all MAGs predicted to encode the given function for a given meta- genome, with relative abundance values scaled to the legend on the bottom left. Distributions of the relative abundances for each specific function are individually shown in Supplementary Figs. 16, 17 and 19, 20, and the relative abundance estima- tions alongside functional inferences are provided in Supplementary Dataset 8. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 9 www.nature.com/naturecommunications than by a few dominant taxa) suggests that they may be ecologically structured more evenly than in AS and CS that were generally domi- nated by one or a few autotrophic populations. These results are consistentwith a previousmetagenomic studyofObsidianPool (anMS type spring also sampled in this study as GOPA-01) that revealed enrichment of Archaea encoding theWL pathway76. TheWL pathway is generally considered to be strictly associated with anaerobic Archaea and Bacteria (e.g. in methanogens, acetogens, and sulfate reducers)77, in contrast to the rTCA and 3-HP/4-HB pathways of the generally aerobic Aquificota and Sulfolobales, respectively. Thus, the differential statistical enrichment of these three pathways among spring types suggests that the modes of primary productivity are related to the redox state of springs. Consistently, DO was generally lowest (but not statistically significantly lower) in MS springs (Fig. 2b). However, it should be noted that DOwas onlymeasured in 32 of the springs newly analyzed in this study in addition to one other previously published spring (Supplementary Dataset 1). Lastly, the 3-hydroxypropionate pathwaywas only identified in 2 low-abundanceMAGs and is therefore not discussed further here. Among the 1-carbon metabolisms evaluated here, formate oxi- dation capacity via formate dehydrogenases and formaldehyde oxi- dation (that involves some of the same genes as formate oxidation) was widespread among taxa and spring types, although formate oxi- dation was particularly statistically enriched in AS (Fig. 5; Supple- mentary Fig. 16; Supplementary Dataset 8). Formate is a low-potential reductant that has been argued to support hot spring primary pro- ductivity, including in YNP78,79, which is consistent with the above results. DOC values were significantly higher in AS than in MS and CS (Fig. 2k), consistent with other studies80, indicating that organic car- bon (e.g. formate) would be more readily available in AS and poten- tially explaining the enrichment of formate oxidation capacity in those springs. Lastly, the capacities formethanol oxidation andmethylamine metabolism were both minimally represented among a few mOTUs (Fig. 5; Supplementary Dataset 8) and are consequently not highly relevant mechanisms of energy conservation in high-temperature YNP springs. Gas-based metabolisms Recent studies have suggested that microbial metabolisms fueled by gaseous substrates areprevalent in springs that exhibitmixingofwater types and that are moderately acidic8,21,42. Consistent with the enrich- ment of WL-encoding MAGs in MS (Supplementary Fig. 17), archaeal MAGs encoding methanogenesis or methane/alkane oxidation path- ways have recently been identified and characterized in MS within YNP8,81. In the present study, methanogenesis (or methane/alkane oxidation) capacity was only inferred for a single mOTU of the Archaeoglobales (mOTU154), a single mOTU of the Hadarchaeota (mOTU087), and three mOTUs within the Methanomethylicia (mOTU062, 127, and 181), all of which were predominantly present in MS, albeit in low abundances (generally <1.0% relative abundance), with complete absence in AS and near absence in CS also (Fig. 5; Supplementary Dataset 8). Contrasting with previous reports, the capacity for methanogenesis (or methane/alkane oxidation) was not observed in six MAGs from six different hot springs that were highly related (>96–99% AAI) to three previously recovered Korarchaeia MAGs (Ca. Methanodesulfokores washburnensis/NM4/LM09) that were produced from the same Washburn hot spring YNP metagenome81–83, with the former six MAGs all estimated as 78-94% complete (Supplementary Dataset 2). The Ca. Methanodesulfokores washburnensis organismswerepreviously suggested to encode the co- occurring capacity to reduce SO3 2− that is coupled to methan/alkan- otrophy. While the SO3 2− reduction capacity was confirmed for the six Ca. Methanodesulfokores washburnensis MAGs of this study (Sup- plementary Datasets 8-9), none of the MAGs encoded the proteins required for methanogenesis (or methan/alkan-otrophy; e.g. Mcr and Mtr). The inability to detect genes encoding homologs of proteins involved in methanogenesis affiliated with Ca. Methanodesulfokores washburnensis in any of theMAGs estimated to be nearly complete (or from searches of the unbinned metagenomic data for each of the six springs) suggests an incomplete understandingof themethanogenesis or methan-/alkan-otrophy potential and ecological role of Ca. Metha- nodesulfokores washburnensis-related organisms within hot springs. Nevertheless, the lack of detection of methane-metabolizing Ca. Methanodesulfokores washburnensis in theMAG or unbinned data for the hot springs analyzed here suggest that they are either very low abundance taxa in YNP hot springs and/or comprise highly genomi- cally variable sub-populations across hot springs. The co-occurring capacity for both SO4 2− reduction and metha- nogenesis (or methan/alkan-otrophy) has also been reported for sev- eral Archaeoglobales MAGs recovered from the same Washburn hot spring metagenome described above (previously published MAGs LM01 and LM0382), in addition to another population exclusively putatively involved in methanogenesis (LM02)82. In contrast, two dis- tinct Archaeoglobales OTUs were identified in the present dataset across the 67 metagenomes (mOTU027, n = 13 MAGs from 13 springs and mOTU154, n = 3 MAGs from three springs), with mOTU027 only encoding the capacity formethanogenesis (ormethan-/alkan-otrophy) and mOTU154 only encoding the capacity for SO4 2− reduction (Sup- plementary Dataset 8). The similarity of the previously described MAGs (LM01 and LM03) was high relative to the 13 exclusively SO4 2− reducing Archaeoglobales MAGs of this study (93–96% AAI), con- sistent with them being a YNP hot spring-specific metapopulation (Supplementary Dataset 9). However, the McrA from LM01/03 was nearly identical to the McrA of the three exclusively CH4-metabolizing MAGs identified in this study (99–100%AAI; SupplementaryDataset 9). The three CH4-metabolizing Archaeoglobales MAGs of this study co- occurred with the SO4 2− reducing Archaeoglobales in all three springs where the formerwas identified (SupplementaryDataset 8),whichmay suggest that they are involved in synergistic metabolic interactions. However, the results concurrently suggest that the previously inferred co-occurring capacity for these functions in the same cells to have potentially been artifactual. Nevertheless, these data point to a con- sistent presence and co-occurrence of CH4-metabolizing organisms in MS (Fig. 5, Supplementary Fig. 17). CH4 values were highest in AS and MS, with some of the highest values reported in MS where the CH4- metabolizing organisms were found (Supplementary Dataset 1), although differences relative to CS were not statistically significant (Fig. 2f). The presence of CH4 in MS may suggest the widespread potential for anaerobic methan-/alkan-otrophy in these spring types. Such potential is consistent with recent enrichment cultures of methylotrophic methanogenic cultures of Archaeoglobales from YNP84 in addition to inferences of methylotrophic methanogenesis or alkanotrophic metabolism in taxa related to the Mcr-encoding organisms identified here85. The potential for methanotrophy via soluble methane monooxygenases (Mmo) was also statistically enri- ched in MS and CS (Supplementary Fig. 17) and was moderately abundant in these spring types. Methanotrophy through Mmo is generally considered an aerobic process86, although the taxa encoding this capacity were largely inferred to be anaerobic, includingmembers of the Desulfurococcales, Nezhaarchaeales, and Thermo- desulfobacteria (Supplementary Dataset 8). To our knowledge, noth- ing is known of this potential in these taxa and warrants further analyses. Nevertheless, given the high levels of CH4 in MS described above, such metabolisms may be expected for these systems. H2 metabolism via [NiFe]- and [FeFe]-hydrogenases was wide- spread among hot spring populations, although primarily via [NiFe]- hydrogenases (Supplementary Fig. 17). [FeFe]-hydrogenases were sta- tistically enriched in MS and CS, albeit at very low abundances (Sup- plementary Fig. 17), while [NiFe]-hydrogenases were highly abundant in all spring types, but especially prominent in MS, and to a lesser Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 10 www.nature.com/naturecommunications extent CS and AS (Fig. 5; Supplementary Fig. 17). [FeFe]-hydrogenases were only present in 31/372 mOTUs and largely within anaerobic bac- terial taxa like the Thermotogota, Caldisericota, and Firmicutes, while [NiFe]-hydrogenases were present in 190/372 mOTUs comprising diverse archaeal and bacterial clades (Supplementary Dataset 8). This is consistent with recent analyses showing enrichment of hydro- genases in MS21 and with geochemical data suggesting H2 concentra- tions as highest in MS springs, albeit not statistically significantly higher than in AS or CS (Fig. 2e)87. The capacity for aerobic carbon monoxide oxidation (e.g. via aerobic carbon monoxide dehy- drogenase, Cox) was statistically enriched in AS and CS communities and mOTUs generally (encoded by 104/372 mOTUs). Cox subunits were also encoded by diverse archaeal taxa in this study, including by Thermoproteales, Nitrososphaeria, Sulfolobales, Desulfurococcales, Gearchaeales, and others (Supplementary Dataset 8), suggesting a widespread ability to capitalize on CO as a reductant and/or carbon source. Despite the enrichment of this function in AS and CS com- munities, it was nevertheless prevalent across all springs, potentially coinciding with high energetic favorability of CO oxidation across the scale of pH found in hot springs12. Sulfur metabolisms The capacities for SO3 2− or SO4 2− reduction (SR) were widespread among the three spring types (primarily through the presence of dis- similatory sulfite reductases, dsr) (Fig. 5, Supplementary Fig. 17). The taxa encoding DsrAB isoforms were distinct among spring types and the DsrAB isoforms themselves exhibited a pH-dependent distribu- tion, with Thermoproteales (Vulcanisaeta and Caldivirga) and Nitro- sosphaeria SR inhabiting AS; Thermoplasmatota, unclassified Thermoproteales, Korarchaeales, Archaeoglobales, Gearchaeales, and bacterial SR inhabiting MS; and Caldarchaeales, in addition to diverse bacterial SR taxa inhabiting CS (Supplementary Fig. 17). Dsr has been suggested to have possibly arisen in a moderately thermoacidophilic archaeon40 and its widespread prevalence in hot spring thermophiles in YNP suggests that SO3 2−or SO4 2− reducingmetabolisms are favorable in diverse geochemical environments, despite considerable variation in the availability of SO3 2− and SO4 2− (e.g. Fig. 1c)40,88. Nevertheless, the general availability of SO4 2− across all hot spring types due to the input of volcanic sulfur dioxide (SO2) in deep hydrothermal aquifers likely supports the phylogenetically diverse SR populations within them. In addition to the confirmation of SO3 2− reduction capacity in recently characterized Archaea including the Thermoplasmatota40, Korarchaeia83, and Nitrososphaeria70, the capacity for SO3 2− reduction was also present in several Caldarchaeales and Gearchaeales MAGs that have not been previously characterized (Supplementary Data- set 8; Supplementary Fig. 18). The basal branching of the novel Cal- darchaeales and Gearchaeales DsrAB among reductive ‘archaeal/ bacterial-types’ in a phylogenetic reconstruction (Supplementary Fig. 18), along with the inferred inability to reduce SO4 2− by either of these taxa (Supplementary Dataset 8), provide additional support for recently developed hypotheses that SO4 2− reduction among Archaea and Bacteria originated in a thermophilic SO3 2−-reducing archaeal lineage40,89. Like SR, the capacity for S0 oxidation was prevalent in taxa across all spring types (i.e. in 108/372mOTUs), suggesting that it is a generally important metabolic function of hot spring populations. Although the total sulfur content of AS is higher than in MS and CS, the capacity to oxidize S0 primarily via genes also involved in thiosulfate (S2O3 2−) oxidation (i.e. sulfur oxidase, sox, genes) was widespread, but statis- tically enriched inAS (Fig. 5, Supplementary Fig. 17). The taxa encoding S0 oxidation capacity included Aquificota taxa that are distributed across all spring types, Sulfolobales primarily within AS springs, and several uncultured archaeal taxa found in MS and CS including Bath- yarchaeia, Nitrososphaeria (including Caldarchaeales), and Gearch- aeales, in addition to uncultured bacterial taxa within the Hydrothermaceae and Calescibacteria groups. The anaerobic meta- bolism of S2O3 − disproportionation was also statistically enriched in MS and CS, largely due to its encoding by Thermodesulfobacteria that characteristically inhabited these spring types, consequently repre- senting a relatively limited metabolism of hot spring populations (Supplementary Dataset 8). AS were dominated by Sulfolobales (Fig. 4), in addition to sub- dominant populations of Desulfurococcales, Nitrososphaeria, and Thermoproteales, among others (Fig. 4). Thus, many of the enriched functions within these springs reflected metabolic functions encoded by the above taxa. Sulfide and S2O3 2− oxidation (that comprised over- lapping genes for S0 oxidation) were particularly prevalent metabo- lisms among MAGs recovered from AS, along with the capacity for S0 reduction, consistent with the high total sulfur contents of these spring types (Fig. 5; Supplementary Fig. 17). Sulfide oxidation is a key initial step in the acidification of hot springs (described in detail above) and recent evidence suggests that Sulfolobales can significantly accelerate its oxidation relative to abiotic reactivity29. Thus, the wide- spread capacity for sulfide and sulfur oxidation in AS observed here is consistent with a key role for Archaea in driving the acidification of hot springs28. Nitrogen metabolisms Several nitrogen-related metabolic functions were notably absent among the MAGs analyzed here. The genes encoding the capacity for ammonia (NH3/NH4 +) oxidation (ammonia monooxygenase; amo), NO2 − oxidation, or dinitrogen (N2) fixation (nitrogenase; nif/anf/vnf) were particularly rare, and the capacity for anammoxwasnot observed at all (Fig. 5).Only 1mOTUcomprising 2MAGs encodedamo genes and was closely related to the previously characterized NH3/NH4 + oxidizer CandidatusNitrosocaldus yellowstonii90. Given the limiteddistribution ofCa. Nitrosocaldus to a fewCS springs, NH3/NH4 +maynot be awidely used reductant in high-temperature YNP springs, despite high NH3/ NH4 + availability in many AS springs91,92. In contrast to AS, NH3/NH4 + availability is low in most MS/CS springs91,92, yet the capacity for N2 fixation to generate bioavailable nitrogen was only identified in three Hydrogenobaculum MAGs (of the 10 Hydrogenobaculum MAGs within mOTU036 only found in AS), in addition to one Ignisphaera MAG (Desulfurococcales order) of mOTU022 (comprising 14 MAGs dis- tributed in MS and CS). The capacity for N2 fixation has not been previously described in either of these taxa, and this represents the first evidence of putative N2 fixation in Archaea outside of methano- gens/methanotrophs. Notably, the 3 Hydrogenobaculum MAGs that encoded the nif operon were all from 'Figure 8' spring from 3 separate sampling years (2013, 2018, and 2019). Numerous Hydrogenobaculum genomes from YNP hot springs have been previously analyzed that lack nif homologs33,93, including in 7 other springs of this study where Hydrogenobaculum was present. Thus, the consistent presence of potentially N2-fixing Hydrogenobaculum in the same spring across multiple years suggests that this capacity may be an adaptation to the specific geochemical or ecological characteristics of that spring. Nevertheless, the limited distribution of nitrogenase among high- temperature YNP springs, in particular in NH3/NH4 +-limited CS, sug- gests that bioavailable nitrogen is generally not limiting to high- temperature spring communities. The ability to utilize oxidized nitrogen species (e.g. NO3 −, NO2 −; nitric oxide, NO; and nitrous oxide, N2O) was generallymore prevalent in MS and CS communities, with dissimilatory nitrate reduction to ammonium, NO2 − reduction, and N2O reduction being statistically enriched in those spring types, while NO3 − and NO reduction was prevalent in all spring types (Supplementary Fig. 19). However, oxi- dized nitrogen compounds were in very low abundance in all spring types, often being below detection limits (Supplementary Fig. 1). Nevertheless, the reduction of these compounds represents highly energetically favorable metabolisms in most hot springs12,94, perhaps Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 11 www.nature.com/naturecommunications explaining the widespread nature of these pathways in MAGs across hot spring types. Consequently, while oxidized nitrogen species may only represent transient sources of energy substrates (or nitrogen sources) for hot spring taxa, theymay be particularly useful substrates to capitalize upon when available. Arsenic and iron metabolisms Of the few metabolic functions particularly enriched in CS popula- tions, those involved in arsenic transformations were significantly enriched relative to bothMS and AS, with enrichment inMS also being significant relative to AS. Aerobic arsenite oxidation (e.g. via Aio enzymatic complexes) and arsenate reduction (e.g. via putative arsenate reducingmolybdopterin enzymes)werehighly enriched inCS communities (Fig. 5, Supplementary Fig. 19, SupplementaryDataset 8). In addition, the capacity for anaerobic arsenite oxidation via anaerobic arsenite oxidases (Arx) was statistically enriched in CS communities, although the relative abundances of taxa encoding these functions were generally low (Fig. 5, Supplementary Fig. 19, Supplementary Dataset 8). Consistently, the availability of arsenic was significantly higher inMSandCS (Fig. 2j).Genes enabling aerobic arsenite oxidation wereprevalent amongAquificota andThermusBacteria, amongothers, that were predominant in CS, in addition to several archaeal taxa, including Pyrobaculum, Bathyarchaeia, and uncultured Desulfur- ococcales (Supplementary Dataset 8). In contrast, the potential for anaerobic arsenite oxidation was limited to a few mOTUs primarily within the Caldarchaeles archaeal order and the Armatimonadota bacterial phylum that were abundant in CS. Consistently, Arx homo- logs related to members of these orders were identified in a previous meta-analysis of YNP hot spring metagenomes52, suggesting a role for these taxa in arsenic cycling within high-temperature circumneutral hydrothermal waters. The potential for arsenate reduction was also prevalent among numerous Archaea including Pyrobaculum, Desul- furococcales, Nitrosophaeria (including the Caldarchaeales), Gearch- aeales, and diverse Bacteria including Chloroflexota, Desulfobacterota, Deinococcocota, and Bipolaricaulota, among others (Supplementary Dataset 8). Notably, few homologs of canonical arsenate reductase (ArrA) were identified, with nearly all arsenate reduction potential deriving from phylogenetically cohesive, but divergent molybdopterin catalytic subunits implicated in the specific reduction of arsenate in arsenate-reducing Pyrobaculum Archaea and arsenate-reducing Anaeromyxobacter Bacteria95,96. Arsenic in YNP hydrothermal waters derives from high-temperature water-rock interactions with rhyolite in the deep subsurface52. Consequently, the prevalence of arsenic metabolism-related adaptations of CS commu- nity taxa (and to a lesser extent MS taxa) is consistent with, and reflects, the sourcing of CS waters that are more reflective of deeply sourced hydrothermal aquifers, as described above. The capacity to oxidize Fe(II) was prevalent in AS populations via several putative iron oxidation homologs97, albeit the capacity was not statistically significantly enriched; Fig. 5, Supplementary Fig. 19). Nevertheless, iron oxidation potential coincided with the significantly greater availability of Fe(II) in AS springs (Fig. 2i; Supplementary Fig. 3)22,49,98 due to the leaching of iron from minerals by acid during reactive transport. Although the capacity for Fe(II) oxidation wasmost prevalent among the Sulfolobales, consistent with metabolisms iden- tified in several Sulfolobales members74, Fe(II) oxidation potential was also encoded by members of the Nitrososphaeria and Desulfur- ococcales inhabiting AS springs (Supplementary Dataset 8), suggest- ing roles for these organisms in iron cycling. Iron reduction was only implicated in a few low-abundance bacterial taxa, suggesting it may not be a particularly prominent metabolism in high-temperature hot springs, despite thewidespreadavailability of oxidized ironminerals at acidic pH. It should be noted that known genetic mechanisms of iron oxidation and reduction are not fully understood. As such, iron-based dissimilatory metabolisms are likely present in more diverse Archaea and Bacteria than currently known, as evinced by recent reports of cryptic iron metabolisms in subsurface-typical taxa99,100. Thus, esti- mates of the prevalence of these activities in the datasets presented here should be considered conservative. O2 respiration and other metabolisms The ability to respire O2 was prevalent in taxa in all three spring types, pointing to its potential importance as an oxidant in hot spring com- munities (Fig. 5; Supplementary Figs. 20; Supplementary Dataset 8), despite that DO was minimally detected or below detection in most springs (Fig. 2b). Given the high energetic favorability of usingO2 as an oxidant during respiration12, the prevalent capacity to use these sub- strates combinedwith their low concentrations suggests that cellsmay preferentially utilize these oxidants when they are available, even if only transiently. In conjunction with the inferred high level of respiratory capacity across spring types, NADH dehydrogenases were encoded by many taxa, but especially in MS and CS, while V-Type and F-Type ATPases followed distributions of the types of taxa that dominated spring types (Fig. 5; Supplementary Fig. 20) (e.g. Archaea in AS and Archaea/Bacteria in MS and CS). In addition, the capacities for chlorite and perchlorate reduction (via chlorite dismutases, Cld, or perchlorate reductase, Pcr, respectively) were particularly prominent across abundant taxa in all three spring types (Fig. 5, Supplementary Fig. 20). Little is knownof the prevalence of thesemetabolisms in high- temperature environments or thermophiles101. Nevertheless, the anaerobic respiration of chlorine oxyanions has recently been shown to be broadly distributed across Archaea and Bacteria via genomic analyses102, suggesting thesemetabolic capacities aremore ubiquitous than previously recognized. During perchlorate reduction to chlorite via Pcr and chlorite dismutation to Cl−, O2 is produced as a byproduct103. This process has been recently invoked to support sub- stantial O2 production in subsurface environments that would other- wise be expected to be anoxic104. The inverse relationship between temperature and O2 solubility, combined with the low potential for O2 ingassing into spring waters from the atmosphere, would lead to substantial ecological advantages for taxa that encode metabolic processes to generate O2. These observations may explain the wide- spread prevalence of these enzymes among hot spring taxa in this study, although little is known about chlorine oxyanion concentrations in hot spring waters, nor of in situ microbial chlorine oxyanion reduction at high temperatures, warranting additional research. Association of geochemistry and evolutionary histories Given the distinctiveness of microbial communities among geo- chemical realms and the implied eco-evolutionary dynamics described above, the phylogenetic patterning of the mOTUs was evaluated. Amaximum likelihood phylogenomic reconstruction was generated including representative MAGs in addition to 622 other genomes from a previously published bacterial + archaeal phyloge- nomic reconstruction comprising selections from major bacterial and archaeal order-level lineages105. The tree was rooted between the bacterial and archaeal domains and the distance of each MAG to the phylogenetic root was calculated as a proxy for the relative ages of the taxa (i.e. tips; Supplementary Fig. 21), using the root-to-tip distance measure. The distances of tips to the archaeal-bacterial divergence were used to estimate degree of relative divergence from the root. The tree agreed with a previously published phylo- geny constructed to identify the phylogenetic placement of major bacterial and archaeal lineages, as evinced by a highly significant level of concordance for the phylogenetic distances of reference taxa in both trees (n = 622) (linear regression adj. R2 = 0.95, p < 0.001, two-tailed, slope = 1.30; Supplementary Fig. 22). Likewise, the topology of the bacterial component of the tree was consistent with previously observed phylogenetic placement of major bacteria bacterial lineages including for deeply-branching Thermotogota Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 12 www.nature.com/naturecommunications and Bipolaricaulota (Acetothermia), followed by Terrabacteria phyla and then Gracilicutes (Supplementary Fig. 23)105,106. The general topology of the archaeal phylogeny also reflected previously observed branching orders for major archaeal groups, including a deep-branching placement of the DPANN, followed by Halobacter- ota (Euryarchaeota) and the TACK + Asgardarchaeota group (Supplementary Fig. 24). The root-to-tip distances generally fol- lowed the branching patterns in the phylogenetic trees, with the exception of well-known highly derived clades that exhibited higher distances (e.g. Patescibacteria and Nanoarchaeota; Supplementary Figs. 25–26). The distributions of phylogenetic distances significantly varied with pH (Fig. 6; Supplementary Fig. 27). The statistical significance was maintained whether considering Archaea or Bacteria alone, although the trend forArchaeawasmore significant (Supplementary Fig. 27) and comparisons of MS and CS to AS could not be adequately made for Bacteria given the few bacterial MAGs identified in AS. Likewise, the overall ranges of archaeal and bacterial phylogenetic distances were 1.75 2.00 2.25 2.50 2.75 GRV−02 GCR−JHP GCR−JHS GOPA−Moose CRH−RHS PFU−01 CRH−Alice NOR−RBS NOR−RBP SYL−01 SYL−03 GCR−SBS NMC−RSEP NOR−CP2020.15m NOR−CP2020.0m NOR−LCS NOR−CP2020.9m NOR−HFS NOR−03 WB−02 NOR−04 NOR−05 N−07 GOPAOR−01 NOR−DS NOR−OHSPS NOR−RPS GOPA−F8S GOPA−Fig8.2013 GCR−02 NOR−Cinder NOR−CP2016.9m NOR−CP2016.0m NOR-06 NMC−RSNS NMC−RSNP GCR−04 SJ3 EvPrim SYL−02 GOPA−02 GCR−10 GCR−07 SGCR−01 MH−02 SYL−04 SMH−01 NMC−RSWP NMC−RSWS GCR−11 GCR−06 GRV−03 GCR−GGS063P GCR-GGS063S MGB−03 NOR−02 NOR−PSS NOR−PSP NMC−01 LGB−01 SMH−03 GCR−05 LGB−02 LGB−05 MGB−01 MGB−02 1 2 3 4 5 6 7 8 9 Spring pH 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 Ph yl og en et ic D is ta nc e to R oo t 2.0 2.5 3.0 p < 2e-16 p = 0.002 p = 7.1e-08 pH < 5 pH 5-7 pH >7 pH < 5 pH 5-7 pH >7 Phylogenetic Distance to Root Ph yl og en et ic D is ta nc e to R oo t NOR−CP2016.15m a b c Fig. 6 | Concordance of phylogenomic analyses and ecological distributions of hot spring populations. a Phylogenetic distances of metagenome-assembled- genomes (MAGs) to the phylogenetic root. Distances to the phylogenetic root (i.e. the archaeal-bacterial bifurcation) were calculated for each MAG as the distance of theplacementof thatMAGto the root in amaximum likelihood (ML)phylogenomic tree (seematerials andmethods for details). TheML tree contained representatives of OTUs for all bacterial and archaeal MAGs analyzed in this study (n = 372) along with 622 other isolate genomes andMAGs frommajor archaeal andbacterial orders used in a previous phylogenetic analysis ofmajor archaeal and bacterial lineages105. The distances calculated for eachmOTU (defined at >95%genomehomology) were used for all MAGs within that mOTU. Root-to-tip distance distributions are shown for the collection of MAGs from each hot spring community, arranged in order of ascending pH of the spring extending from the top to the bottom of the plot. The hot spring community identifier is shown on the Y-axis. Additional details for each hot spring are provided in Supplementary Dataset 1. Black lines within boxplots show median values for that hot spring community distribution, the edges of the boxes show the 25th and 75th quartile ranges, whiskers show the range of data. Outliers are not shown to facilitate visualization.bRoot-to-tip distances for all 1466 MAGs of this study, arranged by the pHof the spring they derive fromon the X-axis. c) Statistical analysis of the distributions of root-to-tip distances for MAGs within acidic (AS; pH< 5, n = 298), mixed (MS; pH 5–7, n = 840), and circumneutral/alka- line (CS; pH> 7, n = 314) spring types. Boxplots are shown and defined the same as in (a), except that outliers are shown as black circles. Differences in overall dis- tributions were statistically assessed by pairwise Wilcoxon rank sum tests with Benjamini & Hochberg multiple comparison correction. The resulting pairwise p- values (two-sided) are shown for each comparison. The distance values for each MAG in association with metagenome origin and taxonomic classification are provided in Supplementary Dataset 2. The phylogenies used to evaluate distances are shown in Supplementary Figs. 21–24 and Supplementary Dataset 11. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 13 www.nature.com/naturecommunications similar,with the exception of thoseof Patescibacteria thatwere among the highest in the datasets (Supplementary Figs. 25, 27; Supplementary Dataset 2), consistent with recent evidence that they comprise a derived bacterial clade105. Thus, the dominance of communities by one or the other domain did not significantly impact observed phyloge- netic distance distributions. AS communities comprised lineageswith the largest phylogenetic distances, indicating the presence of relatively later-branching taxa (Fig. 6). These observations are consistent with other phylogenetic estimates for the relatively recent divergence of thermoacidophiles like the Sulfolobales28,107 that dominate acidic hot spring environments in YNP and in other geothermal systems58,108. Likewise, the other Archaea inhabiting AS including members of the Desulfurococcales (Acidilobus and Caldisphaera; Supplementary Figs. 28–30), Thermo- proteales (Caldivirga, Vulcanisaeta, Thermoproteus, and Thermo- cladium) (Supplementary Figs. 28–29, 31), the UBA164 group of Nitrososphaeria (formerly Thaumarchaeota) (Supplementary Figs. 28–29, 32), and the Thermoplasmatales group of Thermo- plasmata (Supplementary Fig. 28), in addition to the single acidophilic bacterium, Hydrogenobaculum (Supplementary Fig. 33), were all among the later diverging taxa within their higher order taxonomic groups (e.g. orders or classes). These data point to independent diversification of several taxa into acidic niches and that these diver- gences occurred relatively recently within each lineage. The wide- spread oxygenation of Earth’s atmosphere occurred only at ~2.4 Gya and further oxygenation to near present-day values only occurred ~0.8–0.6 Gya, as reviewed in ref. 31. These changes in Earth history are considered to have significantly altered biogeochemical cycling across Earth and especially, enabling oxidative sulfur cycling109. Notably, the organisms that dominate AS (i.e. primarily Sulfolobales) are obligate or facultative aerobes (Supplementary Dataset 8), whose carbon fixation typically relies on the aerobic oxidation of sulfur compounds (e.g. sulfide and S0) (Fig. 5, Supplementary Fig. 17)74. Likewise, acidic hot springs are generated by the oxidation of these same compounds, potentially due to the activity of Sulfolobales or others, as inferred from geomicrobiological studies spanning ~60 years23–29,41. It is possi- ble, if not likely, that these lineages and their environments only appeared relatively recently in Earth history as a consequence of Earth’s oxygenation. These interpretations are consistent with recent inference that O2-dependent ammonia oxidizing Archaea evolved upon initial oxygenation of Earth’s atmosphere and then diversified into subsequently greater oxygenated fresh- andmarinewaters only in the last ~1 Ga when O2 became more widely available110. CS also harbored relatively later diverging lineages (Fig. 6). These springs were primarily dominated by Thermocrinis Bacteria, Pyr- obaculum Archaea, and Archaea from the Desulfurococcales and Cal- darchaeales orders (Fig. 5; Supplementary Dataset 2) that were also among the later-diverging taxa of their higher order taxonomic groupings (Supplementary Figs. 28–32). In continental geothermal systems, higher pH springs are generally derived from the liquid phase following decompressional boiling and are depleted of volatile gases and oxidants, but replete with solutes due to high-temperature water- rock interactions (Fig. 2), as described above and elsewhere17,18,24,25. A typical hallmark of these springs in YNP is the precipitation of silica sinters (e.g. as terraces or mounds) upon their emergence due to cooling of hydrothermal waters supersaturated with respect to silica (Fig. 2l)24,25. Many of these springs exhibit minimal mixing with other waters due to silica armoring of the flow paths that fluids take to reach the surface22,111. Further, high pH waters originating from the subsur- face generally exhibit low DOC concentrations80 (Fig. 2k) and are generally gas-poor (Fig. 2e–g)21,87. Thus, the lack of available oxidants, organic carbon, and gaseous substrates renders these waters energe- tically limited environments for microbial activity. Indeed, recent studies have suggested that these high-temperature springs only become habitable upon their emergence at the surface that allows for infusion of atmospheric O2 22,35. Specifically, near-surface CS-type waters are dominated by autotrophic obligate aerobes (e.g. Thermo- crinis) that likely facilitate the presence of heterotrophs (including those inhabiting waters, sediments, and biofilms)32,33,43. Given the above observations, the habitability of these springs in the absence of atmospheric O2 infusion remains unclear. Moreover, of the few types of metabolisms enriched in CS communities in this study was aerobic oxidation of arsenite in addition to arsenate reduction. Arsenite oxi- dation to arsenate in hot springs is predominantly driven by biological activity52 like that of Thermocrinis that dominated these spring types. Thus, both arsenite oxidation and the generation of arsenate for bio- logical reduction are intrinsically tied to O2 availability at the surfaces of these springs. Consequently, the relatively later-diverging nature of bacterial and archaeal lineages that inhabit CS may suggest relatively recent adaptation to these environments, perhaps also when atmo- spheric O2 became more widely available on Earth. An alternative explanation is that these taxa could have competitively replaced anaerobic taxa that inhabited these environment types once O2 became available. In contrast to the AS and CS communities, taxa inhabiting MS generally comprised earlier-diverging lineages (Fig. 6). This observa- tion was apparent when evaluating higher-order taxonomic groupings (e.g. across the class/phylum levels of Archaea; Supplementary Fig. 28), in addition towithin lineages (e.g. within the Thermoproteota, Caldarchaeales, and Aquificota; Supplementary Figs. 29–31). This pat- tern is driven, in part, by the abundance of archaeal lineages in these springs that exhibit smaller phylogenetic distances from the archaeal- bacterial root and have elsewhere been considered early-branching, including the Methanomethylicia (formerly Verstratearchaeota), Bathyarchaeia, and Hadarchaeota (Fig. 5; Supplementary Fig. 28)112. The WL pathway was specifically enriched in these spring types, and which is primarily associated with anaerobic Archaea and Bacteria77. The WL pathway has been suggested to be the oldest carbon fixation pathway in Archaea and Bacteria due to its presence in deeply branching lineages of both domains113–115, which is consistent with the smallest phylogenetic distances to the root being in MS and the enrichment of this pathway in MS communities. Notably, springs that are only moderately acidic are rare relative to the more prominent AS and CS types in continental hydrothermal systems18,30,37. Many of the MS of this study contained visibly black sediments (Supplementary Dataset 10) typical of sulfide-rich hot springs and the presence of iron- sulfide minerals found in anoxic high-temperature environments116. Further, many of these springs exhibited among the lowest measured DO concentrations (Fig. 2b). Intriguingly, the sediment community of one of the MS that was sampled in a previous study, 'Roadside West' spring42, exhibited among the lowest overall distribution of phyloge- netic distances, while the taxa in the water community from the same spring (NMC-RSWS and NMC-RSWP, respectively, Fig. 6a) exhibited much larger distances. Given that the primary difference in the geo- chemistry ofwater overlying sediments is likely the redox state and the presence of redox-active minerals in sediments, these results support that the oxidation state of hot spring environments plays a significant role in establishing the phylogenetic patterns described above. The separation of fluids into a vapor and liquid phase during decompressional boiling is a physical process that is expected to uni- versally occur in continental hydrothermal systems both today and in the geologic past. While contemporary hot springs exhibit bimodal pH distributions due to theO2-dependent oxidation ofH2S, the absence of near-surface oxygenated waters would be expected to lead to an absence of highly acidic low-pH waters (i.e. AS) in continental hydro- thermal systems, given current models of hot spring formation18,24,25. Consequently, the largely suboxic MS may represent contemporary analogs for understanding microbial systems resultant from incom- plete oxidation of vapor phase gases following condensation with meteoric water. This hypothesis is consistent with those invoking Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 14 www.nature.com/naturecommunications anoxic vapor-dominated continental geothermal systems on early Earth as potential sites for the origin of life117 as well as fossil evidence from the Dresser Formation, Australia, suggesting the presence of microbial life in vapor-sourced hot springs at ~3.5 Ga118. Anaerobic metabolisms are enriched among Archaea and Bacteria inferred to be early-diverging6 and thus, may help explain the phylogenetic pattern- ing and ecological distributions observed herein, where relatively earlier-diverging lineages were present in largely anoxic or suboxicMS relative to AS and CS. Discussion Hot springs have long served as analogs to understand the environ- mental conditions andmetabolisms that supported the earliest life on Earth. However, hot springs vary widely in their geochemical and microbiological composition, rendering it unclear the extent to which contemporaryhydrothermal environments actually reflect the types of environments that might have supported early life on Earth. Here, a comprehensive and integrated assessment of geochemical, physiolo- gical, andphylogenetic data of YNPhigh-temperature springs and their communities revealed that the two most abundant spring types in contemporary continental hydrothermal environments (acidic and circumneutral/alkaline springs) host relatively later-branching archaeal and bacterial lineages, while also hosting communities that are especially dependent on atmospheric O2 for their functioning. In contrast, moderately acidic springs (pH 5–7) that are generally rare in continental hydrothermal systems today hosted relatively earlier- branching lineages comprising populations especially supported by anaerobic, gas-dependent metabolisms. Moderately acidic springs are formed by mixing of volatile (e.g. H2S, H2, CO2 CH4) rich gas with meteoric or hydrothermal waters and represent precursors to the formation of acidic springs (pH< 4). However, the transition from moderately acidic (pH 5–7) conditions to acidic springs (pH < 4) requires sufficient oxidizing power (e.g. O2 or O2-derived oxidation products such as Fe(III)) to generate strong acid (e.g. sulfuric acid) and long enough residence times to allow for acidification to occur. In many instances, such moderately acidic springs form at higher eleva- tions in YNP, likely facilitating lower water residence times20,21 that enables incomplete acidification and suggests a geomorphological component to the geochemical-microbiological associations identi- fied herein. Consequently, prior to the emergence of oxygenic pho- tosynthesis and the generation of widely available O2 during Earth history, moderately acidic springs were likely much more prevalent. Thus, the generally anoxic, gas-rich conditions observed in con- temporary moderately acidic springs may better reflect conditions of such springs. Infusion of O2 from the atmosphere or by input of O2- containing meteoric water is unavoidable in hot spring environments today. Nevertheless, moderately acidic springs in contemporary con- tinental geothermalfields and themicrobial populations they hostmay serve as useful platforms to understand volcanic gas-influenced, similar springs that may have been more prevalent on a mostly anoxic early Earth. The results of this study provide insight into the distribution of genomic biodiversity, functional potential, and phylogenetic pattern- ing of thermophilic Archaea and Bacteria as a function of geochemical variation in YNP hot springs. It should nevertheless be noted that the YNP geothermal system is a singular example that has arisen primarily due to rhyolitic volcanism. Consequently, similar studies should be conducted in other geologic contexts, although the prevailing geo- physical and geochemical processes that lead to geochemical variation in YNP should be similar globally and should lead to similar eco- evolutionary feedbacks within local environments. Consistently, our recent 16S rRNA gene-based analysis of hot spring community com- position among globally distributed, geologically distinct settings demonstrated similar co-variation in geochemical-taxonomic compo- sition patterns across systems61. These collective results suggest that complex influences from geologic settings, dispersal limitation, and localized geologic/hydrologic characteristics likely influence observed biogeographic patterns of global hot spring biodiversity. Notably, tectonic setting has recently been shown to significantly influence coordinated geochemical and microbiological variation observed among continental61,119,120 and deep-sea62 hydrothermal systems. Additional targeted comparisons of such systemsmay provide further insights into the coordinated evolution of thermophiles and their hydrothermal environments across Earth history and across global hydrothermal contexts. Methods Sampling and geochemical analyses Hot spring surface sediment samples were sterilely collected for molecular analyses from 34 springs in 2019 and 2020 (Supplementary Datasets 1 and 10). Data from 19 additional springs comprising 35 total samples (20 sediment community and 15 water community samples) (Supplementary Dataset 1) fromour other studies were included in the analyses andwere similarly sampled andprocessed. Specific details for eachmetagenomeare provided in the original publishing article that is indicated in Supplementary Dataset 1. For the sediment samples newly described here, they were immediately placed on dry ice for transport to the laboratory, followed by subsequent storage at −80 oC. On-site measurements weremade in waters directly above sediment sampling locations for temperature, pH, and conductivity using portable instruments, in addition to measurements of Fe(II), total sulfide (S2−), and total dissolved silica using a field spectrophotometer (Hach DR890) and colorimetric assays as previously described63,121. Dissolved oxygen (DO) concentrations were measured on-site using a high- temperature PSt3 oxygen dipping probe and a Fibox 4 DO instrument (PreSens, Regensburg, Germany)121. Waters were filtered with 0.22 µm pore size filters (Pall) and preserved for geochemical analyses includ- ing anion, trace element, and δD/δ18O water isotopes, as previously described, including immediate refrigeration of samples and acid- ification of waters for trace element analysis63,121. Waters were similarly filtered using 0.22 µm filters and preserved for cation, dissolved inor- ganic carbon (DIC), dissolved organic carbon (DOC), and carbon iso- tope analyses. Briefly, major cations (total dissolved lithium, sodium, potassium, calcium, and magnesium) were analyzed at the Montana Bureau of Mines & Geology (MBMG) Analytical Laboratory using a Thermo Scientific iCAP 6000 Series inductively coupled plasma- optical emission spectrometer (ICP-OES, EPAmethod 30.0). Dissolved inorganic (DIC) and organic carbon (DOC) concentrations were mea- sured using an Aurora 1030WTotal Carbon Analyzer. All samples were analyzed in triplicate and the concentrations determined using a cali- bration curve generated through the analysis of standard solutions made with lithium carbonate/sodium bicarbonate for DIC and potas- sium hydrogen phthalate/sucrose for DOC. The δ13C values were acquired using a Picarro δ13C-CO2 analyzer coupled to the Aurora 1030W. The error for total DIC, DOC, and δ13C values were reported as the standard deviation of the three replicates of each sample. Dis- solvedgasseswere collectedbypumpingwater fromdepth ingas-tight tubing, followed by exsolving gasses into a ~ 10mL ultrapure N2 gas bubble in a syringe with 100mL of spring water. Triplicate N2/dis- solved gas mixtures were then injected via displacement into sealed 20mL glass vials containing a saturated NaCl solution and stored upside down to avoid leakage through the rubber stopper of the vial. Gas (H2, CH4, and CO2) concentrations were then analyzed on a gas chromatograph (model SRI 8610C, SRI instruments, Torrance, CA), as previously described122. Gas concentrations were calculated based on reference to standard curves and converted to concentrations based on Henry’s Law, as previously described123. A geologic map was con- structed inARCGis using datasets from theNational Park Service (NPS) Geologic Resources Inventory (GRI) program, 20200616, Digital Geologic-GIS Map of the Yellowstone National Park and Vicinity, Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 15 www.nature.com/naturecommunications Wyoming,Montana and Idaho (NPS, GRD, GRI, YELL, YELL digitalmap) adapted from U.S. Geological Survey maps by Christiansen, Blank, Prostka, Smedes, Pierce, the U.S. Geological Survey, Elliot, Nelson, Wahl, Witkind, Love and others (1956–2007), and aMontana Bureau of Mines and Geology map by Berg, Lonn and Locke (1999). Auto- correlation among geochemical variables was evaluated using the ‘rcorr’ function of the Hmisc package (v.4.6-0; https://CRAN.R-project. org/package=Hmisc) for R using Pearson correlation values. Correla- tion plots were generated using the corrplot function of the corrplot package for R (v.0.92; https://r-project.org/web/packages/corrplot/). Metagenomic library preparation, sequencing, and processing DNA extractions were conducted as previously described35. Briefly, at least duplicate extractions were conducted from ~0.5 g aliquots of sediments using the FAST DNA Spin Kit for Soil (MP Biomedicals, Irvine, CA) following the manufacturer’s protocols. Extracts were then pooled as needed to achieve sufficientDNA for shotgunmetagenomics analyses at the Genomics Core Facility at the University of Wisconsin- Madison on a single lane of the Illumina NovaSeq S4 platform (2 × 150bp paired-end reads). Metagenomic libraries were sequenced at a depth of ~70–100 × 106 quality-filtered reads individually, although three samples contained < 65 × 106 quality-filtered reads due to lower library yields (Supplementary Dataset 1). Readswere quality-filtered and sequence adapters removed using the TrimGalore v.0.6.0 (https://github.com/FelixKrueger/TrimGalore) program and the Cutadapt pipeline v.2.1.0 with default parameters124. Quality-filtered reads were first down-sampled using the Bbnorm script (https://github.com/BioInfoTools/BBMap; v.38.87) to improve assemblies41. The normalized reads were then assembled with the MetaSPAdes (v.3.14.0) assembler125 using default parameters. Assem- bled contigs from the best quality assembly were then used for metagenome-assembled-genome (MAG) binning using theMetaWRAP v.1.3.2 pipeline126. The binning pipeline first incorporates read- mapping to assembled contigs with the BWA aligner (v.0.7.17)127. Tet- ranucleotide frequency distributions and coverage profiles were then used to independently bin contigs with the metaBAT v.2128, MaxBin v.2129,130, and CONCOCT v.1.1.0130 algorithms, specifying a length (-l) of 2500. The bin_refinement module of MetaWRAP was then used to identify the highest quality individual MAGs based on completeness and contamination estimates via CheckM v.1.1.3 and considering bins derived from each individual binning algorithm and from combina- tions of bin outputs. Protein coding genes within the MAGs were identified and annotated using the prokka pipeline (v.1.14.5)131 with protein prediction conducted with Prodigal (v.2.6.3)132. The relative abundances of MAGs were estimated based on read mapping of quality-filtered reads to theMAG assemblies using the CheckM (v.1.1.3) coverage and profile functions. Taxonomic classifications were con- ducted using the genome taxonomy database (GTDB) classify work- flow within the GTDB-tk program (v.1.3.0)133. Despite sampling springs with conditions that should preclude photosynthetic organisms, sev- eral low-abundantMAGs affiliatedwith knownphotosynthetic lineages were recovered. Those MAGs, along with low-abundance common contaminants of sequence library preparations134, were not included in downstream analyses. A total of 30 MAGs were removed, including seven associated with photosynthetic bacteria (Cyanobacterota or Chloroflexota: relative abundances of 0.09–2.4%), and 23 from com- mon contaminants (Ralstonia and Propionibacterales: relative abun- dances of 0.06–2.7%). The percentage of reads mapped to the MAG assemblies from each metagenome for the newly described meta- genomes was 70.1% on average (range of 48–84%), while that of the previously published metagenomes was 58% on average (range of 20- 93%) (Supplementary Dataset 1). The taxonomic compositions of MAG assemblies and unprocessed reads from each metagenome were also analyzed and compared using the SingleM platform (v.0.15.0) and the ‘pipe’ function with default parameters135. The genomic coverage values for each classification were normalized within each metagen- ome to calculate relative abundanceestimates for a given classification label. To evaluate the concordance of taxonomic composition profiles from the MAG-based datasets with profiles from the raw reads alone, the relative abundances of the 1,022MAGsnewly reported in this study were statistically compared against the estimated relative abundances of the same taxonomic classification among the raw read datasets. The R214 GTDB version was used for the classifications of both the reads and MAGs. Where MAGs were assigned multiple taxonomic classifi- cations (e.g. at the species rank or only at the genus rank), the classi- fication that wasmost represented (based on percentage of contigs) in the MAG assembly was chosen. If an absolute difference in relative abundance between the read-based and MAG-based classification estimates was >10%, the results were manually inspected to verify the discrepancy, and curated if necessary. The estimated relative abun- dances for each taxonomic classification in the read-based and MAG- based datasets were statistically compared using a linear regression in the R base package software platform. Diversity analyses Metagenomic operational taxonomic units (mOTUs) were identified by first calculating the average nucleotide identities (ANIs) among the curated MAGs from all springs (n = 1466) using the fastANI program (v.1.32)136. MAGs were not generated from the previously published Old Faithful and Spouter Geyser samples for reasons indicated in the original report43. The ANI percentages were converted to dissimilarity values by subtracting the pairwise ANI values from 1 (i.e. identical genomes were given scores of 0.0 and genomes that did not share enough ANI to assign a value were given scores of 1.0). The ANI dis- similarity matrix was then clustered at the 0.05 distance level (i.e. 95% similarity in whole-genome ANI) using the ‘cluster’ command of the mothur program (v.1.42.1)137. The 95% ANI level was used because it corresponds to commonly used bacterial and archaeal species-level thresholds54. Rarefaction curves considering all mOTUs together were also constructed in mothur. Whole-metagenome sequence diversity was calculated using a bin-independent approach via the Nonpareil (v.3.304)138 metric and the quality-filtered reads for eachmetagenome. Briefly, the Nd sequence diversity metric is derived from Nonpareil sequencing coverage curves and estimates the level of sequence redundancy within a metagenomic dataset to approximate genomic diversity within a metagenome138. Between-sample mOTU diversity was evaluated based on the relative abundances of mOTUs among sites and the Bray-Curtis dissimilarity metric and non-metric multidimensional scaling (NMDS), as implemented in the vegan package (v.2.5-7)139 for R (v.4.0.4)140. The association of community composition with envir- onmental variables was assessed by fitting the environmental para- meter data to the NMDS ordination using the envfit function of vegan with removal of data containing missing values. mOTU co- occurrence networks were constructed using Pearson correlation coefficients between mOTU pairs that were calculated based on their estimated relative abundances across sites. Only mOTUs identified in >1 site were used for the analyses. Pearson correlation coefficients and corresponding statistical significance (p) values were calculated using the rcorr function for R. The correlation matrix was used to identify strong correlations of mOTU abun- dances (p < 0.001 and |r| >0.6). The resulting 1,023 mOTU correla- tions were used as edges for a co-occurrence network within the Cytoscape (v.3.8.2)141 software suite. The network was visualized using the edge-weighted algorithm and p values as the weighting metric. The networks were visualized with each node representing anmOTU that is colored by the average pH of the springs where they were identified. Sub-networks that were not connected to others and with <3 nodes were removed to simplify visualization. mOTU sample breadth (as a function of spring pH) was visualized using the Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 16 https://CRAN.R-project.org/package=Hmisc https://CRAN.R-project.org/package=Hmisc https://r-project.org/web/packages/corrplot/ https://github.com/FelixKrueger/TrimGalore https://github.com/BioInfoTools/BBMap www.nature.com/naturecommunications ‘geom_density_ridges’ function of the ggridges package for R (v.0.5.4). Functional annotations Proteins encoded by the MAGs were subjected to initial functional annotation and mapping to metabolic pathways within the Kyoto Encyclopedia for Genes and Genomes (KEGG) database142 using the METABOLIC software program (v.4.0)143 and a module identification default cutoff value of 0.75. The presence of several functions were then manually curated following initial identification including: 1) the presence of nitrogenase (Nif) modules was confirmed by evaluating MAGs for the presence of nifHDK operons on scaffolds; 2) the catalytic directionality of dissimilatory sulfite reductases (Dsr; i.e. either func- tioning in sulfite (SO3 2−) reduction or S0 oxidation) was evaluated by phylogenetic analysis as previously described40; 3) instances of the capacity for sulfate reduction in the absence of sulfite reduction capacity were removed; 4) the use of bd-type ubiquinol reductases for O2 respiration in Archaeawere not considered, given a lack of clarity in their functionalities for archaeal respiration144; 5) [NiFe]- and [FeFe]- hydrogenases were supplemented by BLAST identification of diverse large subunit (LSU) homologs against the datasets and checked for conserved CXXC motifs as previously described145,146 and with refer- ence to the HydDB platform147; and 6) the presence of the Wood- Ljungdahl (WL) carbon fixation pathway was confirmed by the pre- sence of most of the carbon monoxide dehydrogenase/acetyl-CoA synthase (CODH/Acs) subunits, in addition to most of the proteins within the bacterial or archaeal western branches of the pathway. Partial WL pathways were considered present for archaeal groups that have been shown to comprise various components of WL pathway western branches76. Several additions were amended to the METABOLIC-based KEGG functional annotations, including the presence of 1) O2 respiration pro- teins encoded by Sulfolobales (Desulfurolobus oxidases; DoxBCE)74; 2) thiosulfate:quinone oxidoreductase proteins encoded by acidophiles and others (DoxAD)74; 3) iron reduction or oxidation proteins, as annotated with the FeGenie software program97; 4) S0 oxidation capacity via a heterodisulfide reductase operon (hdrAB1C1B2C2) that exhibits a conserved structure148; 5) S0 reduction capacity via a putative NAD(P)H sulfur oxidoreductase (NSR) complex and associated ferredoxin:NAD(P)+ oxidoreductase (FNOR) complex149; and 6) putative arsenate reduction genes encoding molybdopterin oxidoreductases that are phylogenetically related to tetrathionate reductase-like genes, but that have been implicated in archaeal95 and bacterial96 arsenate specific reduction. Molybdopterin oxidoreductases comprise a large family of proteins with diverse functionalities and whose functionalities are considered phylogenetically cohesive with catalytic subunit amino acid sequences150. Consequently, a phylogenetic analysis was conducted using catalytic subunits from major molybdopterin oxidoreductase groups, in addition to putative arsenate-respiring homologs described above, and tetrathionate reductase homologs used elsewhere96. Details of the phylogenetic analyses are provide in the below section. The functional compositions for all MAGs within a metagenome were sta- tistically compared using NMDS ordinations, as described above for the taxonomic compositions of communities. In addition, the environ- mental datawere fit to theNMDSordination, as also described above, to assess the association of geochemistry and functional compositional variation. Functional categories in the METABOLIC output were not inclu- ded in visualization or downstream analyses if evidence for their pre- sence in any of the MAGs was not identified. To achieve conservative estimates of the presence of functions and mitigate potential artefac- tual attribution of functions to taxa, OTUs with ≥3 MAGs that only harbored 1MAGencoding a particular functionwerenot considered to encode that function, unless previously published data otherwise suggested the function to actually be encoded by the taxa. The exception to this caveatwas nitrogenase-encoding genes identified in 1 Desulfurococcales MAG that was considered non-artefactual based on comparison to other datasets (described in the results). To assess whether the functional profiles constructed from MAGs deviated sig- nificantly from that present in entire metagenomes, metabolic anno- tations were also evaluated for the entiremetagenome assemblies (i.e. not the MAG-based datasets) using the METABOLIC platform, as described above, but considering all 315 functions. Thenumbersof hits to a given function across the entire metagenome assembly and the sum of the MAGs in that same metagenome were then statistically compared using a linear regression. Only the 34 newly described metagenomes of the present study were compared against the meta- genomes to control against variation in metagenomic assemblies based on sequencing depths, sequencing platform, and other factors. Theprofiles generated from theMAGs andentiremetagenomes for the newly described datasets were also statistically compared using NMDS ordinations, as described above, but after first normalizing the num- bers of functional hits in each profile to the total in that profile to mitigate artefactual comparisons due to diversity and sequencing depth differences. Statistical comparisons of summed relative abun- dances for each metagenome among spring types were conducted using pairwise Wilcoxon rank sum tests in the base R stats package (two-sided significance testing),withBenjamini &Hochbergcorrection for multiple comparisons. Phylogenetic analysis A bacterial and archaeal phylogenomic reconstruction was generated using a concatenation of 30 proteins encoded by housekeeping genes including ribosomal proteins (n = 27) and RNA polymerase subunits (n = 3) thatwere identifiedusing themarkerfinder pipeline151. To enable reasonable phylogenetic computational times, representatives from each of the 372 mOTUs were selected for phylogenomic analysis in conjunction with 622 reference genomes from the major orders of Archaea and Bacteria. The 622 reference genomes were the same as used in a previous phylogenomic analysis conducted to establish the phylogenetic placement ofmajor archaeal and bacterial lineages using a balanced taxonomic sampling across orders105. Additional reference genomes from the Thermotogota were also included since they have been previously shown as one of the deepest-branching bacterial lineages105,106. The markerfinder script (https://github.com/faylward/ markerfinder) was used to search genome assemblies for the 30 uni- versal genes. The individual protein datasets were aligned with Clus- talO (v.1.2.4)152 and concatenated. The concatenated alignment was then subjected to trimming using trimal (v.1.4.rev22) by removing alignment sites with gaps in >90% of the sequences (e.g. -gt 0.1)153. The concatenated alignments were subjected to Maximum Likelihood phylogenetic reconstruction using IQ-TREE (v.1.6.12)154 and the LG + F + I +G4 amino acid substitution model based on the model testing function, ‘TEST’ within IQTREE. One thousand ultrafast bootstraps were used to evaluate branch support and 10 independent phyloge- netic runs were used to identify the optimal tree. The consensus tree that exhibited the highest likelihood was rooted at the bifurcation between bacterial and archaeal lineages for subsequent visualization and analyses. The tree is available as Supplementary Dataset 11. To proxy the relative ages of phylogenetic lineages, the phylogenetic distance of each tip to the root was calculated across the entire phy- logenetic tree using the distroot function of the adephylo package for R (v.1.1-13). To quantitatively evaluate agreement with robustly sup- ported published archaeal+bacterial phylogenies, the root-to-tip dis- tances were also calculated from the balanced order final tree previously published with the reference genomes used here105. The distanceswere then statistically compared via linear regression against the values obtained for the same genomes in the phylogenetic analysis of this study. Themolybdopterin phylogenetic analysis was conducted as described for the phylogenomic analyses, except that the Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 17 https://github.com/faylward/markerfinder https://github.com/faylward/markerfinder www.nature.com/naturecommunications substitutionmodel chosen for analysis was the LG + F +R7 substitution model and removal of alignments with <50% of the median residue count for the entire dataset. Reporting summary Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Data availability All metagenomic assembly and individual MAG sequence data, in addition to annotation and associated metadata that were generated for this study are publicly available in the Integrated Microbial Gen- omes (IMG) M/ER database under accessions provided for each metagenome or MAG, as listed in Supplementary Datasets 1 and 2. Sequence read data are available in the NCBI database under the bio- project accession PRJNA791658. The MAG sequence data from the previously published metagenomes are also available under NCBI bioproject accessions or IMG accessions given for each dataset in Supplementary Dataset 1. Source data locations for all figures are indicated in the corresponding figure legends and are either present in Supplementary Datasets or within the accompanying Source Data file. Source data are provided with this paper. References 1. Knoll, A. H. & Nowak, M. A. The timetable of evolution. Sci. Adv. 3, e1603076 (2017). 2. Anbar, A. D. Oceans. Elements and evolution. Science 322, 1481–1483, (2008). 3. David, L. A. & Alm, E. J. Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469, 93–96, (2011). 4. Djokic, T., Van Kranendonk, M. J., Campbell, K. A., Walter, M. R. & Ward, C. R. Earliest signs of life on land preserved in ca. 3.5Ga hot spring deposits. Nat. Commun. 8, 15263 (2017). 5. Schopf, J. W. Microfossils of the Early Archean Apex chert: new evidence of the antiquity of life. Science 260, 640–646, (1993). 6. Weiss, M. C. et al. The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1, 16116 (2016). 7. Leng, H., Wang, Y., Zhao,W., Sievert, S.M. & Xiao, X. Identification of a deep-branching thermophilic clade sheds light on early bacterial evolution. Nat. Commun. 14, 4354 (2023). 8. Colman, D. R., Lindsay, M. R. & Boyd, E. S. Mixing of meteoric and geothermal fluids supports hyperdiverse chemosynthetic hydro- thermal communities. Nat. Commun. 10, 681 (2019). 9. Power, J. F. et al. Microbial biogeography of 925 geothermal springs in New Zealand. Nat. Commun. 9, 2876 (2018). 10. Hugenholtz, P., Pitulle, C., Hershberger, K. L. & Pace, N. R. Novel division level bacterial diversity in a Yellowstone hot spring. J. Bacteriol. 180, 366–376 (1998). 11. Barns, S. M., Fundyga, R. E., Jeffries, M.W. & Pace, N. R. Remarkable archaeal diversity detected in a Yellowstone National Park hot spring environment. Proc. Natl Acad. Sci. USA 91, 1609–1613 (1994). 12. Shock, E. L. et al. Quantifying inorganic sources of geochemical energy in hydrothermal ecosystems, Yellowstone National Park, USA. Geochim Cosmochim. Ac 74, 4005–4043 (2010). 13. Boyd, E. S., Fecteau, K. M., Havig, J. R., Shock, E. L. & Peters, J. W. Modeling the habitat range of phototrophs in Yellowstone National Park: toward the development of a comprehensive fit- ness landscape. Front. Microbiol. 3, 1–11 (2012). 14. Cox, A., Shock, E. L. & Havig, J. R. The transition to microbial photosynthesis in hot spring ecosystems. Chem. Geol. 280, 344–351 (2011). 15. Castenholz, R. W. The effect of sulfide on the blue-green algae of hot springs II. Yellowstone National Park. Micro Ecol. 3, 79–105 (1977). 16. Brock, T. D.Microorganisms adapted to high temperatures.Nature 214, 882–885, (1967). 17. Fournier, R. O. Geochemistry and dynamics of the Yellowstone- National-Park hydrothermal system.Annu Rev. Earth Planet Sci. 17, 13–53 (1989). 18. Arnórsson, S., Stefánsson, A. & Bjarnason, Jn. O. Fluid-fluid inter- actions in geothermal systems.Rev.Miner. Geochem.65, 259–312 (2007). 19. Holland, H. D. Some applications of thermochemical data to problems of ore deposits; [Part] 2, mineral assemblages and the composition of ore forming fluids. Econ. Geol. 60, 1101–1166 (1965). 20. Lowenstern, J. B., Bergfeld, D., Evans, W. C. & Hunt, A. G. Origins of geothermal gases at Yellowstone. J. Volcano Geoth. Res. 302, 87–101 (2015). 21. Lindsay, M. R. et al. Probing the geological source and biological fateof hydrogen in Yellowstonehot springs.Environ.Microbiol.21, 3816–3830 (2019). 22. Sims, K. W. et al. The dynamic influence of subsurface geological processes on the assembly and diversification of thermophilic microbial communities in continental hydrothermal systems. Geochim. Cosmochim. Acta 362, 77–103 (2023). 23. White, D. E., Keith, T. E. C. & Hutchinson, R. A. The geology and remarkable thermal activity of Norris Geyser Basin, Yellowstone National Park, Wyoming. U.S. Gol. Surv. Prof. Pap., 1456 (1988). 24. Nordstrom, K. D., Ball, J. W. & McCleskey, R. B. Ground water to surface water: chemistry of thermal outflows in Yellowstone National Park in Geothermal biology and geochemistry in Yellow- stone National Park (eds. W. P. Inskeep & T. R. McDermott) 73–94 (Montana State University, 2005). 25. Nordstrom, D. K., McCleskey, R. B. & Ball, J. W. Sulfur geochem- istry of hydrothermal waters in Yellowstone National Park: IV Acid- sulfate waters. Appl. Geochem. 24, 191–207 (2009). 26. Mosser, J. L., Mosser, A. G. & Brock, T. D. Bacterial origin of sulfuric acid in geothermal habitats. Science 179, 1323–1324 (1973). 27. Brock, T. D. & Mosser, J. L. Rate of sulfuric-acid production in YellowstoneNational Park.Geol. Soc. Am. Bull.86, 194–198 (1975). 28. Colman, D. R. et al. Geobiological feedbacks and the evolution of thermoacidophiles. ISME J. 12, 225–236 (2018). 29. Fernandes-Martins, M. C., Colman, D. R. & Boyd, E. S. Sulfide oxidation by members of the Sulfolobales. PNAS Nexus 3, pgae201 (2024). 30. Brock, T. D. Bimodal distribution of pHvalues of thermal springs of the world. Geol. Soc. Am. Bull. 82, 1393 (1971). 31. Lyons, T.W., Reinhard, C. T. & Planavsky, N. J. The rise of oxygen in Earth’s early ocean and atmosphere.Nature 506, 307–315, (2014). 32. Meyer-Dombard, D. R. et al. Hydrothermal ecotones and streamer biofilm communities in the Lower Geyser Basin, Yellowstone National Park. Environ. Microbiol. 13, 2216–2231 (2011). 33. Takacs-Vesbach, C. et al. Metagenome sequence analysis of fila- mentous microbial communities obtained from geochemically distinct geothermal channels reveals specialization of three Aquificales lineages. Front. Microbiol. 4, 84 (2013). 34. Mitchell, K. R. Controls on microbial community structure in ther- mal environments; exploring Bacterial diversity and the relative influence of geochemistry and geography Ph.D. thesis, University of New Mexico, (2009). 35. Fernandes-Martins, M. C. et al. Ecological dichotomies arise in microbial communities due to mixing of deep hydrothermal waters and atmospheric gas in a circumneutral hot spring. Appl. Environ. Microbiol. 87, e0159821 (2021). 36. Stefánsson, A. et al. Quantifying mixing, boiling, degassing, oxi- dation and reactivity of thermal waters at Vonarskard, Iceland. J. Volcano Geoth. Res. 309, 53–62 (2016). Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 18 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA791658/ www.nature.com/naturecommunications 37. Amenabar, M. J., Urschel, M. R., & Boyd, E. S. Metabolic and taxonomic diversification in continental magmatic hydrothermal systems in Microbial Evolution Under Extreme Conditions. 57–95 (De Gruyter, Berlin, 2015). 38. Damer, B. & Deamer, D. The hot spring hypothesis for an origin of life. Astrobiology 20, 429–452 (2020). 39. Allen, E. T. & Day, A. Hot Springs of the Yellowstone National Park. (Carnegie Institute of Washington Publications, 1935). 40. Colman, D. R. et al. Phylogenomic analysis of novel Diaforarchaea is consistent with sulfite but not sulfate reduction in volcanic environments on early Earth. ISME J. 14, 1316–1331 (2020). 41. Colman, D. R., Amenabar, M. J., Fernandes-Martins, M. C. & Boyd, E. S. Subsurface Archaea associated with rapid geobiological change in a model Yellowstone hot spring. Commun. Earth Environ. 3, 205 (2022). 42. Fernandes-Martins,M.C., Colman,D. R. &Boyd, E. S. Relationships between fluid mixing, biodiversity, and chemosynthetic primary productivity in Yellowstone hot springs. Environ. Microbiol. 25, 1022–1040 (2023). 43. Keller, L. M., Colman, D. R. & Boyd, E. S. An active microbiome in Old Faithful geyser. PNAS Nexus 2, pgad066 (2023). 44. Christiansen, R. L. The Quaternary and Pliocene Yellowstone Pla- teau volcanic field of Wyoming, Idaho, and Montana. U.S. Geol. Surv. Professional paper, 729–72, (2001). 45. Larson, P. B. et al. A preliminary study of older hot spring alteration in Sevenmile Hole, Grand Canyon of the Yellowstone River, Yel- lowstone Caldera, Wyoming. J. Volcano Geoth. Res. 188, 225–236 (2009). 46. Smedes, H. W. & Prostka, H. J. Stratigraphic framework of the Absaroka volcanic supergroup in the Yellowstone National Park region. U.S. Geol. Surv. Report No. 2330–7102, (1972). 47. Fecteau, K. M. et al. Cyanobacteria and algae meet at the limits of their habitat ranges in moderately acidic hot springs. J. Geophys. Res. Biogeosci. 127, e2021JG006446 (2022). 48. Kharaka, Y. K., Thordsden, J. J. &White, L. D. Isotope and chemical compositions of meteoric and thermal waters and snow from the greater Yellowstone National Park region. U.S. Geol. Surv. Open- File Report-2-194 (2002). 49. Kaasalainen, H. & Stefansson, A. The chemistry of trace elements in surface geothermal waters and steam, Iceland. Chem. Geol. 330, 60–85 (2012). 50. Stefánsson, A. et al. GeoFluids database 2016: chemical compo- sition of Icelandic fluids and gases. Sci. Inst. Report RH-10-2016 (2016). 51. Berg, J. S. et al. How low can they go? Aerobic respiration by microorganisms under apparent anoxia. FEMS Microbiol. Rev. 46, fuac006 (2022). 52. McCleskey, R. B. et al. The source, fate, and transport of arsenic in the Yellowstone hydrothermal system-an overview. J. Volcanol. Geoth. Res. 432, 107709 (2022). 53. Stauffer, R. E. & Thompson, J. M. Arsenic and antimony in geo- thermal waters of Yellowstone National Park, Wyoming, USA. Geochim. Cosmochim. Acta 48, 2547–2561 (1984). 54. Konstantinidis, K. T. & Tiedje, J. M. Genomic insights that advance the species definition for prokaryotes. Proc. Natl Acad. Sci. USA 102, 2567–2572, (2005). 55. Royo-Llonch, M. et al. Compendium of 530 metagenome- assembled bacterial and archaeal genomes from the polar Arctic Ocean. Nat. Microbiol. 6, 1561–1574 (2021). 56. Alneberg, J. et al. Ecosystem-widemetagenomic binning enables prediction of ecological niches from genomes. Commun. Biol. 3, 119 (2020). 57. Zhou, Z., St. John, E., Anantharaman, K. &Reysenbach, A.-L.Global patterns of diversity and metabolism of microbial communities in deep-sea hydrothermal vent deposits.Microbiome 10, 241 (2022). 58. Inskeep, W. P. et al. The YNP Metagenome Project: environmental parameters responsible for microbial distribution in the Yellow- stone Geothermal Ecosystem. Front. Microbiol. 4, 67 (2013). 59. Sharp, C. E. et al. Humboldt’s spa: microbial diversity is controlled by temperature in geothermal environments. ISME J. 8, 1166–1174 (2014). 60. Miller, S. R., Strong, A. L., Jones, K. L. & Ungerer, M. C. Bar-coded pyrosequencing reveals shared bacterial community properties along the temperature gradients of two alkaline hot springs in Yellowstone National Park. Appl. Environ. Microbiol. 75, 4565–4572 (2009). 61. Colman, D. R. et al. Tectonic and geologic setting influence hot spring microbiology. Environ. Microbiol. 25, 2481–2497 (2023). 62. Reysenbach, A.-L. et al. Complex subsurface hydrothermal fluid mixing at a submarine arc volcano supports distinct and highly diverse microbial communities. Proc. Natl Acad. Sci. USA 117, 32627–32638 (2020). 63. Colman, D. R. et al. Ecological differentiation in planktonic and sediment-associated chemotrophic microbial populations in Yel- lowstone Hot Springs. FEMS Microbiol. Ecol. 92, 9 (2016). 64. Hou,W. et al. A comprehensive census ofmicrobial diversity in hot springs of Tengchong, Yunnan Province China using 16S rRNA gene pyrosequencing. PLoS One 8, e53350 (2013). 65. Moreras-Marti, A. et al. Volcanic controls on the microbial habit- ability of Mars-analogue hydrothermal environments. Geobiology 19, 489–509 (2021). 66. Boyd, E. S., Hamilton, T. L., Wang, J., He, L. & Zhang, C. L. The role of tetraether lipid composition in the adaptation of thermophilic archaea to acidity. Front. Microbiol. 4, 62 (2013). 67. Colman, D. R., Lindsay, M. R., Amenabar, M. J. & Boyd, E. S. The intersection of geology, geochemistry, and microbiology in continental hydrothermal systems. Astrobiology 19, 1505–1522 (2019). 68. Boyd, E. S., Hamilton, T. L., Spear, J. R., Lavin, M. & Peters, J. W. [FeFe]-hydrogenase in Yellowstone National Park: evidence for dispersal limitation and phylogenetic niche conservatism. ISME J. 4, 1485–1495 (2010). 69. Alsop, E. B., Boyd, E. S. & Raymond, J. Mergingmetagenomics and geochemistry reveals environmental controls on biological diversity and evolution. BMC Ecol. 14, 16 (2014). 70. Payne, D. et al. Geologic legacy spanning >90 years explains unique Yellowstone hot spring geochemistry and biodiversity. Environ. Microbiol. 21, 4180–4195 (2019). 71. Müller, W. J. et al. Whole genome comparison of Thermus sp. NMX2. A1 reveals principal carbon metabolism differences with closest relation Thermus scotoductus SA-01. G3 Genes Genomes Genet. 6, 2791–2797 (2016). 72. Balows, A., Trüper, H. G., Dworkin, M., Harder, W. & Schleifer, K.-H. The prokaryotes: a handbook on the biology of bacteria: ecophy- siology, isolation, identification, applications. Vol. 2 (Springer, 1992). 73. Skirnisdottir, S., Hreggvidsson, G.O., Holst, O. & Kristjansson, J. K. Isolation and characterization of a mixotrophic sulfur-oxidizing Thermus scotoductus. Extremophiles 5, 45–51 (2001). 74. Counts, J. A.,Willard, D. J. & Kelly, R.M. Life in hot acid: a genome- based reassessment of the archaeal order sulfolobales. Environ. Microbiol. 23, 3568–3584 (2020). 75. Fuchs, G. Alternative pathways of carbon dioxide fixation: insights into the early evolution of life? Annu Rev. Microbiol. 65, 631–658 (2011). 76. Berghuis, B. A. et al. Hydrogenotrophic methanogenesis in archaeal phylum Verstraetearchaeota reveals the shared ancestry of all methanogens. Proc. Natl Acad. Sci. USA 116, 5037–5044 (2019). 77. Ragsdale, S. W. Enzymology of the wood-Ljungdahl pathway of acetogenesis. Ann. N. Y. Acad. Sci. 1125, 129–136, (2008). Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 19 www.nature.com/naturecommunications 78. Windman, T., Zolotova, N., Schwandner, F. & Shock, E. L. Formate as an energy source for microbial metabolism in chemosynthetic zonesofhydrothermalecosystems.Astrobiology7, 873–890 (2007). 79. Urschel, M. R., Kubo, M. D., Hoehler, T. M., Peters, J. W. & Boyd, E. S. Carbon source preference in chemosynthetic hot spring com- munities. Appl. Environ. Microbiol. 81, 3834–3847 (2015). 80. Nye, J. J., Shock, E. L. & Hartnett, H. E. A novel PARAFACmodel for continental hot springs reveals unique dissolved organic carbon compositions. Org. Geochem. 141, 103964 (2020). 81. Borrel, G. et al. Wide diversity of methane and short-chain alkane metabolisms in uncultured archaea. Nat. Microbiol. 4, 603–613 (2019). 82. Wang, Y., Wegener, G., Hou, J., Wang, F. & Xiao, X. Expanding anaerobic alkane metabolism in the domain of archaea. Nat. Microbiol. 4, 595–602 (2019). 83. McKay, L. J. et al. Co-occurring genomic capacity for anaerobic methane and dissimilatory sulfur metabolisms discovered in the Korarchaeota. Nat. Microbiol. 4, 614–622 (2019). 84. Lynes, M. M., Jay, Z. J., Kohtz, A. J. & Hatzenpichler, R. Methylo- trophic methanogenesis in the Archaeoglobi revealed by cultiva- tion of Ca. Methanoglobus hypatiae from a Yellowstone hot spring. ISME J. 18, wrae026 (2024). 85. Vanwonterghem, I. et al. Methylotrophic methanogenesis dis- covered in the archaeal phylum Verstraetearchaeota. Nat. Micro- biol. 1, 16170 (2016). 86. Semrau, J. D., DiSpirito, A. A. & Yoon, S. Methanotrophs and copper. FEMS Microbiol. Rev. 34, 496–531 (2010). 87. Bergfeld, D., Hunt, A. G., Shanks, W., & Evans, W. Gas and isotope chemistry of thermal features in Yellowstone National Park, Wyom- ing. (US Department of the Interior, US Geological Survey, 2014). 88. Zinder, S. & Brock, T. D. Sulfur dioxide in geothermal waters and gases. Geochim Cosmochim. Ac 41, 73–79 (1977). 89. Neukirchen, S., Pereira, I. A. C. & Sousa, F. L. Stepwise pathway for early evolutionary assembly of dissimilatory sulfite and sulfate reduction. ISME J. 17, 1680–1692 (2023). 90. de la Torre, J. R., Walker, C. B., Ingalls, A. E., Konneke, M. & Stahl, D. A. Cultivation of a thermophilic ammonia oxidizing archaeon syn- thesizing crenarchaeol. Environ. Microbiol. 10, 810–818, (2008). 91. Holloway, J. M., Nordstrom, D. K., Böhlke, J., McCleskey, R. B. & Ball, J. W. Ammonium in thermal waters of Yellowstone National Park: processes affecting speciation and isotope fractionation. Geochim. Cosmochim. Acta 75, 4611–4636 (2011). 92. Hamilton, T. L. et al. Competition for ammonia influences the structure of chemotrophic communities in geothermal springs. Appl. Environ. Microbiol. 80, 653–661 (2014). 93. Romano, C. et al. Comparative genomic analysis of phylogeneti- cally closely related Hydrogenobaculum sp. isolates from Yellow- stone National Park. Appl. Environ.Microbiol. 79, 2932–2943 (2013). 94. Amend, J. P. & Shock, E. L. Energetics of overall metabolic reac- tions of thermophilic and hyperthermophilic archaea and bac- teria. FEMS Microbiol. Rev. 25, 175–243 (2001). 95. Cozen, A. E. et al. Transcriptional map of respiratory versatility in the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. J. Bacteriol. 191, 782–794 (2009). 96. Muramatsu, F. et al. Possible involvement of a tetrathionate reductase homolog in dissimilatory arsenate reduction by Anae- romyxobacter sp. Strain PSR-1. Appl. Environ. Microbiol. 86, e00829-20 (2020). 97. Garber, A. I. et al. FeGenie: a comprehensive tool for the identifi- cation of iron genes and iron gene neighborhoods in genome and metagenome assemblies. Front. Microbiol. 11, 37 (2020). 98. Amenabar, M. J. & Boyd, E. S. Mechanisms of mineral substrate acquisition in a thermoacidophile. Appl. Environ. Microbiol. 84, e00334–00318 (2018). 99. Dunham, E. C., Dore, J. E., Skidmore, M. L., Roden, E. E. & Boyd, E. S. Lithogenic hydrogen supports microbial primary production in subglacial and proglacial environments. Proc. Nat. Acad. Sci. USA 118, e2007051117 (2021). 100. Zavarzina, D., Gavrilov, S. & Zhilina, T. Direct Fe (III) reduction from synthetic ferrihydrite by haloalkaliphilic lithotrophic sulfidogens. Microbiology 87, 164–172 (2018). 101. Liebensteiner, M. G., Tsesmetzis, N., Stams, A. J. & Lomans, B. P. Microbial redox processes in deep subsurface environments and the potential application of (per) chlorate in oil reservoirs. Front. Microbiol. 5, 428 (2014). 102. Barnum, T. P. & Coates, J. D. Chlorine redox chemistry is wide- spread in microbiology. ISME J. 17, 70–83 (2023). 103. Van Ginkel, C., Rikken, G., Kroon, A. & Kengen, S. Purification and characterization of chlorite dismutase: a novel oxygen-generating enzyme. Arch. Microbiol. 166, 321–326, (1996). 104. Ruff, S. E. et al. Hydrogen and dark oxygen drive microbial pro- ductivity in diverse groundwater ecosystems. Nat. Commun. 14, 3194 (2023). 105. Martinez-Gutierrez, C. A. & Aylward, F. O. Phylogenetic signal, congruence, and uncertainty across bacteria and archaea. Mol. Biol. Evol. 38, 5514–5527 (2021). 106. Coleman, G. A. et al. A rooted phylogeny resolves early bacterial evolution. Science 372, eabe0511 (2021). 107. Blank, C. E. Not so old archaea–the antiquity of biogeochemical processes in the archaeal domain of life. Geobiology 7, 495–514 (2009). 108. Ward, L. et al. Microbial community dynamics in Inferno Crater Lake, a thermally fluctuating geothermal spring. ISME J. 11, 1158–1167 (2017). 109. Canfield, D. E. & Teske, A. Late Proterozoic rise in atmospheric oxygen concentration inferred from phylogenetic and sulphur- isotope studies. Nature 382, 127–132, (1996). 110. Ren, M. et al. Phylogenomics suggests oxygen availability as a driving force in Thaumarchaeota evolution. ISME J. 13, 2150–2161 (2019). 111. Gibson, M. L. & Hinman, N.W.Mixing of hydrothermal water and groundwater near hot springs, YellowstoneNational Park (USA): hydrology and geochemistry. Hydrogeol. J. 21, 919–933 (2013). 112. Mei, R., Kaneko, M., Imachi, H. & Nobu, M. K. The origin and evo- lution of methanogenesis and Archaea are intertwined. PNAS Nexus 2, 1–10 (2023). 113. Martin,W., Baross, J., Kelley, D. &Russell,M. J. Hydrothermal vents and the origin of life. Nat. Rev. Microbiol 6, 805–814 (2008). 114. Nitschke, W. & Russell, M. J. Beating the acetyl coenzyme A-pathway to the origin of life. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20120258 (2013). 115. Russell, M. J. & Martin, W. The rocky roots of the acetyl-CoA pathway. Trends Biochem Sci. 29, 358–363, (2004). 116. Inskeep, W. P. et al. Metagenomes from high-temperature che- motrophic systems reveal geochemical controls on microbial community structure and function. PLoS One 5, e9773 (2010). 117. Mulkidjanian, A. Y., Bychkov, A. Y., Dibrova, D. V., Galperin, M. Y. & Koonin, E. V. Origin of first cells at terrestrial, anoxic geothermal fields. Proc. Natl Acad. Sci. USA 109, E821–E830 (2012). 118. VanKranendonk,M. J. et al. Elements for theoriginof life on land: a deep-time perspective from the Pilbara Craton of Western Aus- tralia. Astrobiology 21, 39–59 (2021). 119. Upin, H. E., Newell, D. L., Colman, D. R. & Boyd, E. S. Tectonic settings influence the geochemical andmicrobial diversity of Peru hot springs. Commun. Earth Environ. 4, 112 (2023). 120. Fullerton, K. M. et al. Effect of tectonic processes on biosphere–geosphere feedbacks across a convergent margin. Nat. Geosci. 14, 301–306 (2021). Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 20 www.nature.com/naturecommunications 121. Dahlquist-Selking, G. R., Helfrich, P., Timmer, J. & Cox, A. D. Mudpot geochemistry reveals subsurface geologic activity. ACS Earth Space Chem. 7, 2458–2474 (2023). 122. Lindsay, M. R. et al. Subsurface processes influence oxidant availability and chemoautotrophic hydrogen metabolism in Yel- lowstone hot springs. Geobiology 16, 674–692 (2018). 123. Spear, J. R., Walker, J. J., McCollom, T. M. & Pace, N. R. Hydrogen and bioenergetics in the Yellowstone geothermal ecosystem. Proc. Natl Acad. Sci. USA 102, 2555–2560 (2005). 124. Martin, M. Cutadapt removes adapter sequences from high- throughput sequencing reads. EMBnet. J. 17, 3 (2011). 125. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaS- PAdes: a new versatile metagenomic assembler.Genome Res. 27, 824–834 (2017). 126. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018). 127. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010). 128. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019). 129. Wu, Y. W. et al. An automated binning method to recover indivi- dual genomes from metagenomes using an expectation- maximization algorithm. Microbiome 2, 26 (2014). 130. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014). 131. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioin- formatics 30, 2068–2069 (2014). 132. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and trans- lation initiation site identification. BMC Bioinform. 11, 119 (2010). 133. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022). 134. Sheik, C. S. et al. Identification and removal of contaminant sequences from ribosomal gene databases: Lessons from the census of deep life. Front. Microbiol. 9, 840 (2018). 135. Woodcroft, B. J. et al. SingleM and Sandpiper: robust microbial taxonomic profiles from metagenomic data. bioRxiv, 2024.2001.2030.578060 (2024). 136. Jain, C., Rodriguez, R. L., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic gen- omes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018). 137. Schloss, P. D. et al. Introducing Mothur: Open-source, platform- independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009). 138. Rodriguez, R. L., Gunturu, S., Tiedje, J. M., Cole, J. R. & Kon- stantinidis, K. T. Nonpareil 3: fast estimation of metagenomic coverage and sequence diversity.mSystems3, e00039-18 (2018). 139. vegan: Community Ecology Package v. R package version 2.4 (2017). 140. R: A language and environment for statistical computing v. 3.1.0. (R Foundation for Statistical Computing, Vienna, Austria, 2017). 141. Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P. L. & Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431–432 (2011). 142. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017). 143. Zhou, Z. et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks.Microbiome 10, 33 (2022). 144. Jay, Z. J. et al. Pyrobaculum yellowstonensis strain WP30 respires on elemental sulfur and/or arsenate in circumneutral sulfidic geothermal sediments of YellowstoneNational Park.Appl Environ. Microbiol. 81, 5907–5916 (2015). 145. Poudel, S. et al. Origin and evolution of flavin-based electron bifurcating enzymes. Front. Microbiol. 9, 1762 (2018). 146. Schut, G. J., Boyd, E. S., Peters, J. W. & Adams, M. W. The modular respiratory complexes involved in hydrogen and sulfur metabolism by heterotrophic hyperthermophilic archaea and their evolutionary implications. FEMS Microbiol. Rev. 37, 182–203 (2013). 147. Søndergaard, D., Pedersen, C. N. & Greening, C. HydDB: a web tool for hydrogenase classification and analysis. Sci. Rep. 6, 34212 (2016). 148. Koch, T. & Dahl, C. A novel bacterial sulfur oxidation pathway provides a new link between the cycles of organic and inorganic sulfur compounds. ISME J. 12, 2479–2491 (2018). 149. Mardanov, A. V. et al. The genome sequence of the crenarchaeon Acidilobus saccharovorans supports a new order, Acidilobales, and suggests an important ecological role in terrestrial acidic hot springs. Appl. Environ. Microbiol. 76, 5652–5657 (2010). 150. Laska, S., Lottspeich, F. & Kletzin, A. Membrane-bound hydro- genase and sulfur reductase of the hyperthermophilic and acid- ophilic archaeon Acidianus ambivalens. Microbiology 149, 2357–2371 (2003). 151. Sunagawa, S. et al. Metagenomic species profiling using uni- versal phylogenetic marker genes. Nat. Methods 10, 1196–1199 (2013). 152. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011). 153. Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phyloge- netic analyses. Bioinformatics 25, 1972–1973, (2009). 154. Minh, B. Q. et al. IQ-TREE 2: newmodels and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020). Acknowledgements This work was supported by a NASA grant to DRC, ESB, AC, and BSC (80NSSC19M0150). The Department of Energy Joint Genome Insti- tute’s Community Sequencing Program (CSP 504081 to DRC and ESB) supported the metagenomic sequence generation for several sam- ples that were previously published (Supplementary Dataset 1). We thank Annie Carlson of the National Park Service for permitting under permit numbers YELL-SCI-5544 and -7008. We would like to thank the 2019 and 2020 Laboratory Exploring Geobiochemical Engineering and Natural Dynamics (LEGEND) members and field assistants for sample collection and analyses including Anne Morse, Nathan Car- penter, Kyle Nacey, Robert Pal, Caitlin Cox, Alaina Cox, Alex Mellnik, Ariana Rivera, Justin Pottenger. We thank Jackie Timmer of the Montana Bureau of Mines and Geology for δ2H and δ18O analyses of water. Author contributions D.R.C. and E.S.B. designed the study and prepared the manuscript. D.R.C. conducted the analyses and L.M.K. contributed to genomic data generation. E.A.P., E.A.B., B.S.C., A.C., and D.R.C. generated and ana- lyzed geochemical data. A.S. aided in figure preparation and map con- struction. D.R.C., E.S.B., L.M.K., E.A.P., E.A.B., B.S.C., A.C., and D.R.C. contributed to field sampling. All authors contributed to manuscript preparation. Competing interests The authors declare no competing interests. Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 21 www.nature.com/naturecommunications Additional information Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-024-51841-5. Correspondence and requests for materials should be addressed to Daniel R. Colman or Eric S. Boyd. Peer review information Nature Communications thanks the anon- ymous reviewers for their contribution to the peer review of this work. A peer review file is available. Reprints and permissions information is available at http://www.nature.com/reprints Publisher’s note Springer Nature remains neutral with regard to jur- isdictional claims in published maps and institutional affiliations. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. Youdonot havepermissionunder this licence toshare adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommons.org/licenses/by-nc-nd/4.0/. © The Author(s) 2024 Article https://doi.org/10.1038/s41467-024-51841-5 Nature Communications | (2024) 15:7506 22 https://doi.org/10.1038/s41467-024-51841-5 http://www.nature.com/reprints http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ www.nature.com/naturecommunications Covariation of hot spring geochemistry with microbial genomic diversity, function, and evolution Results Geologic and geochemical context Geochemical variation Geochemical controls on genomic diversity Functional distributions across hot spring geochemical provinces Carbon fixation and 1-carbon metabolism Gas-based metabolisms Sulfur metabolisms Nitrogen metabolisms Arsenic and iron metabolisms O2 respiration and other metabolisms Association of geochemistry and evolutionary histories Discussion Methods Sampling and geochemical analyses Metagenomic library preparation, sequencing, and processing Diversity analyses Functional annotations Phylogenetic analysis Reporting summary Data availability References Acknowledgements Author contributions Competing interests Additional information