ALKALINE MICROALGAE FROM YELLOWSTONE  
NATIONAL PARK: PHYSIOLOGICAL AND GENOMIC  
CHARACTERIZATION FOR BIOFUEL PRODUCTION 
by 
Karen Margaret Moll 
A dissertation submitted in partial fulfillment 
of the requirements for the degree 
of 
Doctor of Philosophy 
in 
Microbiology 
MONTANA STATE UNIVERSITY 
Bozeman, Montana 
April 2021 
  
 
 
 
©COPYRIGHT 
by 
Karen M. Moll 
2021 
All Rights Reserved 
 
DEDICATION 
 This dissertation is dedicated to Robert D. Gardner. 
  
 
ACKNOWLEDGEMENTS 
 
I wish to thank my committee members for their support and guidance, which resulted in the
successful completion of my graduate work: Dr. Brent Peyton, Dr. Matthew Fields, Dr. Joann
Mudge, and Dr. Mensur Dlakic, as well as Dr. Robin Gerlach for feedback. I would especially like to
take a moment to acknowledge Brent Peyton for his continued support throughout the years. 
Thank you to Mensur Dlakic for your help with ezTree. I am especially grateful to Donna 
Negaard for keeping me on track throughout this process. Thank you to all present and past 
members of the Peyton Lab, especially Rob Gardner, Lisa Kirk, Tisza Bell, Muneeb Rathore, 
Everett Eustance, and Todd Pedersen. This work could not have been done without help in the 
lab over the years: Hannah Newhouse, Nathan Murphy, Daniel McDonald, Cansu Bozbiyik, 
Burcu Orza, and Berrak Erturk. I would also like to thank my brother, Brian, for always being
my biggest advocate, and my mother and Pamela Cersosimo for your love, guidance, wisdom, and
support. Thank you to my friends who have been supportive over the years, especially Karla
Sartor and Brian Larson, Anne Rockhold and Sarah Huth. Thank you to everyone at The Center
for Biofilm Engineering for your friendship and for providing a collaborative work environment. I
want to acknowledge everyone at NCGR, especially Joann Mudge, Thiru Ramaraj, Connor 
Cameron, Callum Bell, Kathy Meyers, Anitha Sundararajan, and Nico Devitt. This work was
supported by a NM INBRE Pilot Award, Bridge Funding, Dissertation Completion Awards, TBI, 
DOE, NSF, and Yellowstone National Park. 
 
  
 
TABLE OF CONTENTS 
 
1. INTRODUCTION ...................................................................................................................... 1 
Background ................................................................................................................................. 1 
Difference between Prokaryotic and Eukaryotic Genome Projects ........................................ 2 
Sequencing Types ....................................................................................................................... 3 
Sequencing by Synthesis ......................................................................................................... 3 
454 Pyrosequencing ................................................................................................................ 3 
Illumina ................................................................................................................................... 4 
Long-Read Technologies ........................................................................................................ 5 
Genome Assembly ...................................................................................................................... 5 
Scaffolding Approaches .......................................................................................................... 5 
Combining Technologies ........................................................................................................ 6 
What Is a Good Genome Assembly? ...................................................................................... 7 
Eukaryotic Genome Annotation ............................................................................................. 9 
Conclusion .............................................................................................................................. 9 
Dissertation Overview .............................................................................................................. 10 
2. BIODIESEL (MICROALGAE) ................................................................................................ 13 
Contribution of Authors and Co-Authors ................................................................................. 13 
Manuscript Information ............................................................................................................ 14 
Introduction ............................................................................................................................... 15 
Microalgae ................................................................................................................................ 17 
Extreme Environments .............................................................................................................. 20 
Targeting Extremophiles ........................................................................................................... 21 
Bioprospecting ...................................................................................................................... 23 
Algae as Biofuels ...................................................................................................................... 25 
Other Secondary Products ......................................................................................................... 31 
Take Home Message ................................................................................................................. 32 
3. CHARACTERIZATION OF NINE NOVEL GREEN ALGAE STRAINS FROM 
YELLOWSTONE NATIONAL PARK ....................................................................................... 33 
Contribution of Authors and Co-Authors ................................................................................. 33 
Manuscript Information ............................................................................................................ 34 
 
Abstract ..................................................................................................................................... 35 
Introduction ............................................................................................................................... 36 
Materials & Methods ................................................................................................................ 38 
Screening Studies .................................................................................................................. 39 
In-depth Characterization Studies ......................................................................................... 40 
Bicarbonate Addition ............................................................................................................ 40 
Determination of Unialgal Strains and Strain Identification ................................................ 41 
Dry Cell Weight .................................................................................................................... 41 
Nitrate ................................................................................................................................... 41 
Nile Red Fluorescence .......................................................................................................... 42 
Results ....................................................................................................................................... 42 
Verification of Unialgal Strains and Strain Identification .................................................... 43 
Doubling Time ...................................................................................................................... 46 
Biomass Production .............................................................................................................. 49 
Highest Lipid Producing Strains ........................................................................................... 53 
Discussion ................................................................................................................................. 56 
Growth rate ............................................................................................................................... 56 
Biomass production .................................................................................................................. 56 
Lipid production ........................................................................................................................ 57 
Summary & Conclusions .......................................................................................................... 61 
4. DRAFT GENOME FOR A NOVEL, EXTREMOPHILIC, FRESHWATER DIATOM ........ 62 
Contribution of Authors and Co-Authors ................................................................................. 62 
Manuscript Information ............................................................................................................ 63 
Abstract ..................................................................................................................................... 64 
Introduction ............................................................................................................................... 65 
Methods ..................................................................................................................................... 68 
DNA Extraction .................................................................................................................... 68 
Whole-genome Sequencing ...................................................................................................... 68 
Illumina ................................................................................................................................. 68 
PacBio ................................................................................................................................... 68 
 
Assembly Methods ................................................................................................................ 69 
BUSCO ................................................................................................................................. 69 
Read Alignments and Validation .......................................................................................... 70 
Structural Annotation ............................................................................................................ 70 
Functional Annotation .......................................................................................................... 70 
K-mer Analysis ..................................................................................................................... 71 
Concatenated Protein Phylogenetic Tree .............................................................................. 71 
16S Amplified Sequencing and Analysis ............................................................................. 71 
RNA Sequencing and Transcriptome Assembly ...................................................................... 72 
Transcript Alignment and Assembly .................................................................................... 72 
Results ....................................................................................................................................... 73 
Assembly Comparison .......................................................................................................... 73 
Gene Space Completeness .................................................................................................... 74 
Transcriptome Assembly ...................................................................................................... 77 
Assembly Completeness ....................................................................................................... 80 
Comparative Genomics ......................................................................................................... 81 
RGd-1 Genome-based Metabolic Pathway Analysis ............................................................ 84 
Phycosphere Bacteria ............................................................................................................ 93 
Discussion ................................................................................................................................. 94 
Genome Observations ........................................................................................................... 94 
Metabolic Observations ........................................................................................................ 96 
Bacterial Cohabitants ............................................................................................................ 98 
Conclusions ............................................................................................................................. 101 
5. GENOME SEQUENCE FOR A NOVEL BREVUNDIMONAS STRAIN ........................ 103 
Contribution of Authors and Co-Authors ............................................................................... 103 
Manuscript Information .......................................................................................................... 104 
Abstract ................................................................................................................................... 105 
Announcement ........................................................................................................................ 105 
6. SUMMARY AND FUTURE DIRECTIONS ......................................................................... 109 
Synopsis .................................................................................................................................. 109 
 
Strain Selection ....................................................................................................................... 110 
Future Directions .................................................................................................................... 111 
Future Work ........................................................................................................................ 114 
Closing .................................................................................................................................... 114 
References ............................................................................................................................... 116 
APPENDICES ........................................................................................................................ 144 
APPENDIX A: CHARACTERIZATION OF NINE NOVEL GREEN ALGAE 
STRAINS FROM YELLOWSTONE NATIONAL PARK ................................................... 145 
APPENDIX B: RGD-1 GENOME SUPPLEMENTARY DATA .......................................... 150 
High Molecular Weight DNA Extraction .......................................................................... 151 
BioNano and Assembly Data .............................................................................................. 152 
Transcript Raw Read Data .................................................................................................. 153 
Metabolic Pathways ............................................................................................................ 222 
APPENDIX C: DETERMINING THE EFFECTS OF BLUE LIGHT ON THE RGd-1 
GROWTH RATE .................................................................................................................... 228 
Introduction ............................................................................................................................. 229 
Background ............................................................................................................................. 231 
Methods ................................................................................................................................... 233 
Results and discussion ............................................................................................................ 235 
Conclusions ............................................................................................................................. 242 
APPENDIX D: THE EFFECTS OF ARSENIC SUPPLEMENTATION ON RGd-1 
GROWTH RATE AND LIPID ACCUMULATION ............................................................. 244 
Introduction ............................................................................................................................. 245 
Methods ................................................................................................................................... 249 
Initial Testing ...................................................................................................................... 249 
Ash Free Dry Weight (AFDW) ........................................................................................... 250 
FAME analysis using GC-MS ............................................................................................ 251 
Determination of the optimal P:As ratio ............................................................................. 252 
Results ..................................................................................................................................... 253 
Initial testing ....................................................................................................................... 254 
Arsenate .............................................................................................................................. 257 
 
Discussion ............................................................................................................................... 261 
Summary & Conclusions ........................................................................................................ 263 
Sodium arsenite – Supplementary data ............................................................................... 263 
APPENDIX E: STRATEGIES FOR OPTIMIZING BIONANO AND DOVETAIL 
EXPLORED THROUGH A SECOND REFERENCE QUALITY ASSEMBLY FOR 
THE LEGUME MODEL, MEDICAGO TRUNCATULA ....................................................... 270 
Abstract ................................................................................................................................... 272 
Background ............................................................................................................................. 273 
Results ..................................................................................................................................... 276 
Assembly Continuity .......................................................................................................... 276 
Assembly Completeness ..................................................................................................... 278 
Gene Space Completeness .................................................................................................. 279 
Joins and Breaks ................................................................................................................. 280 
Joins and Breaks in Relation to A17 ................................................................................... 282 
Gaps .................................................................................................................................... 283 
Ordering of Technologies ................................................................................................... 285 
Final Assembly Draft .......................................................................................................... 285 
Novel sequences revealed by the R108 assembly ............................................................... 287 
Chromosomal-scale translocation ....................................................................................... 288 
Discussion ............................................................................................................................... 293 
Novel Sequence Was Found in the R108 Assembly .......................................................... 293 
Technologies Made Similar Continuity Gains and Are Valuable Individually .................. 293 
Further Gains Were Made Using Both Technologies ......................................................... 294 
Join Accuracy Appears to be Higher in Dovetail Compared To BioNano ......................... 295 
Strengths and Weaknesses Dictate Strategy for Ordering Technologies ............................ 297 
Conclusions ............................................................................................................................. 298 
Methods ................................................................................................................................... 299 
PacBio Sequencing and Assembly ...................................................................................... 299 
Dovetail ............................................................................................................................... 300 
BioNano .............................................................................................................................. 301 
 
Illumina ............................................................................................................................... 302 
Transcriptome assembly ..................................................................................................... 303 
BUSCO ............................................................................................................................... 303 
Read alignments .................................................................................................................. 304 
Structural Annotation .......................................................................................................... 304 
Identification of structural rearrangements and novel sequences in R108 ......................... 305 
List of Abbreviations: ......................................................................................................... 305 
Availability of data and material ......................................................................................... 306 
Additional Files ................................................................................................................... 306 
APPENDIX F: SOURCES AND RE-SOURCES:  IMPORTANCE OF NUTRIENTS, 
RESOURCE ALLOCATION, AND ECOLOGY IN MICROALGAL CULTIVATION 
FOR LIPID ACCUMULATION ............................................................................................ 307 
Abstract ................................................................................................................................... 309 
Introduction ............................................................................................................................. 310 
Nutrient Dependent Lipid Accumulation ............................................................................ 313 
Nitrogen and Phosphorus .................................................................................................... 316 
Carbon ................................................................................................................................. 319 
Silicon Limitation ............................................................................................................... 320 
Iron Limitation .................................................................................................................... 321 
Biofilm Growth ................................................................................................................... 322 
Ecological Effects ............................................................................................................... 324 
Integrating Life-Cycle Analysis .......................................................................................... 326 
Conclusion .............................................................................................................................. 328 
APPENDIX G: DIRECT MEASUREMENT AND CHARACTERIZATION OF 
ACTIVE PHOTOSYNTHESIS ZONES INSIDE WASTEWATER REMEDIATING 
AND POTENTIAL BIOFUEL PRODUCING MICROALGAL BIOFILMS ....................... 330 
Abstract ................................................................................................................................... 332 
Introduction ............................................................................................................................. 333 
Materials and methods ............................................................................................................ 335 
Laboratory Strains, Culturing Conditions, and Biomass Sampling. ................................... 335 
Outdoor Culturing Conditions. ........................................................................................... 337 
Oxygen Microsensor Analysis. ........................................................................................... 337 
Lipid Analysis. .................................................................................................................... 338 
Results and discussion ............................................................................................................ 339 
Biofilm Cultivation ............................................................................................................. 339 
Field-RABR for Wastewater Remediation ......................................................................... 341 
Nitrogen Depletion in Lab-RABR Samples ....................................................................... 347 
Biofuel Precursor Production. ............................................................................................. 352 
Conclusions ............................................................................................................................. 356 
Appendix G: Supplementary Data ...................................................................................... 356 
APPENDIX H: DISSOLVED INORGANIC CARBON ENHANCED GROWTH, 
NUTRIENT UPTAKE, AND LIPID ACCUMULATION IN WASTEWATER 
GROWN MICROALGAL BIOFILMS .................................................................................. 357 
Manuscript Information .......................................................................................................... 358 
Abstract ................................................................................................................................... 359 
Introduction ............................................................................................................................. 360 
Materials and methods ............................................................................................................ 362 
Microalgal biofilm culturing and sampling ........................................................................ 362 
Water quality monitoring .................................................................................................... 363 
Oxygen microsensor analysis ............................................................................................. 364 
Biodiesel analysis ................................................................................................................ 364 
Results and discussion ............................................................................................................ 365 
Microalgae growth rate and yield ....................................................................................... 365 
Removal of nitrogen and phosphorus from synthetic wastewater using algal biofilms ..... 366 
Microalgal biofilm photosynthesis and coupled respiration ............................................... 370 
Biofuel precursor production .............................................................................................. 376 
Conclusions ............................................................................................................................. 378 
Appendix H: Supplementary Data ...................................................................................... 379 
 
 
  
 
LIST OF TABLES 
 
Table Page 
 
Table 2.1 Examples of extremophilic microalgae and their desirable temperature, pH, and 
salinity conditions with each having at least one environmental condition outside of 
normal range. ................................................................................................................................ 22 
Table 3.1 SSU rDNA (18S) concentrations for DNA that was extracted, amplified, and 
purified prior to 454 pyrosequencing for the nine YNP green algae strains. DNA was 
quantified using a Qubit fluorometer with a dsDNA BR Assay kit. ............................................ 43 
Table 3.2 Representative sequences for 18S SSU rDNA. Each sequence was BLAST 
searched for identification. There were three strains that had identical BLAST results 
with different identifications for 18S (MF1, PGV6, and PGV10-G1), which represents the 
diverse collection of sequences in NCBI. ..................................................................................... 44 
Table 3.3 BLAST identification of ITS amplicons obtained by Sanger sequencing. The 
sequence identity was determined by BLAST search for identification. There were two 
strains that had identical BLAST results with different identifications for ITS (PGV6 and 
WC-1). .......................................................................................................................................... 45 
Table 4.1 Genome assembly statistics for two RGd-1 assembly versions, v. 1.0 and v. 1.5 
(with a small percentage of additional long PacBio reads). .......................................................... 73 
Table 4.2 The RGd-1 v.1.0 genome assembly was analyzed using five BUSCO lineages: 
Eukaryota, Protists, Alveolata/Stramenopiles, Chlorophyta, and Embryophyta. The gene 
capture percentage was measured as a fraction of the total number of searchable BUSCOs 
identified in the assemblies. .......................................................................................................... 75 
Table 4.3 Gene capture measured by BUSCO. A total of 303 BUSCOs were searched 
within the eukaryota lineage. ........................................................................................................ 76 
Table 4.4 RGd-1 v.1.0 genome assemblies were compared to the other publicly available 
diatom genome assemblies, P. tricornutum, T. pseudonana, C. cryptica, P. multiseries, 
and F. cylindrus. The gene capture percent was measured as a fraction of the total 
number of searchable BUSCOs identified in the assemblies. A total of 303 BUSCOs were 
searched within the eukaryota lineage. ......................................................................................... 77 
Table 4.5 Genome assembly statistics for two RGd-1 transcriptome assembly versions, de 
novo and reference-guided using the RGd-1 v.1.0 genome assembly. Each transcriptome 
assembly shows the statistics pre- and post-filtering for reads ≥ 500 bp. ..................................... 79 
Table 4.6 Two different RGd-1 transcriptome assemblies were compared: a de novo 
assembly and a reference-guided assembly. The gene capture percent was measured as a 
fraction of the total number of searchable BUSCOs identified in the assemblies. A total 
of 303 BUSCOs were searched within the eukaryota lineage. ..................................................... 79 
Table 4.7 Whole-genome alignments using BWA mem where the RGd-1 v. 1.0 genome 
assembly was indexed and the other assembly was queried. The publicly available diatom 
genomes, C. cryptica, F. cylindrus, P. multiseries, P. tricornutum, and T. pseudonana 
were each aligned to RGd-1 at the nucleotide level. ................................................................... 82 
Table 4.8 Identification for 16S amplified sequencing in the RGd-1 culture. Organisms 
were identified using the 16S RDB Database within CLARK.199, 200 For each organism, the 
percentage was calculated relative to all of the categories below (the 9 genera identified and 
the unknowns) and relative to the classified fraction (the 9 genera excluding the unknowns). ............... 94 
Table 5.1 Genome assembly statistics for Brevundimonas sp., strain KM-427. The 
genome was assembled with Canu as part of an RGd-1 PacBio sequencing project.175 ............ 106 
Table 5.2 Gene capture measured by BUSCO. A total of 148 BUSCOs were searched 
within the bacteria odb9 lineage. ................................................................................................ 106 
Table A.1 The optimal Nile Red exposure stain times and stain methods for each of the 
11 green algae strains. Each strain was exposed to the lipophilic stain, Nile Red, in 20% 
DMSO and acetone until an optimal stain time was indicated. The stain method that 
resulted in higher fluorescence was selected as the proper stain method for each strain 
because that carrier (DMSO or acetone) was able to cross the cell membrane more 
effectively.1 ................................................................................................................................. 147 
Table A.2 The endpoint DCW and doubling times in the air-only and sodium bicarbonate 
added conditions for each of the 11 strains. Each condition was grown in triplicate. ................ 148 
Table A.3 Final DCWs and doubling times for each green algae strain for the control and 
sodium bicarbonate addition conditions. The DCWs were the average and 95% 
confidence interval of each triplicate at the time of harvest for each experiment. ..................... 149 
Table B.1 DNA Extraction (JGI Method).2 The 1.5, 30 and 60 mL headers refer to the 
container volumes recommended for the DNA extraction volumes. Fifty milliliters were 
centrifuged and adjusted to ~OD600 1.0 as indicated in step 6. To improve cell wall 
breakage, mechanical stress was applied with sterile sand, a mortar and pestle, and liquid 
nitrogen. Rather than using isopropanol, the DNA was suspended in molecular grade 
ethanol in the -20 °C freezer overnight to improve DNA precipitation. ....................................... 151 
 
 
Table B.2 Assembly statistics for BioNano data. RGd-1 biomass was submitted to the 
Bioinformatics Center at Kansas State for high molecular weight (HMW) DNA extraction and 
whole genome map assembly. The HMW DNA was digested using the endonuclease DNase I to 
introduce nicks and create a 3’ hydroxyl group. DNA polymerase I catalyzed the addition of 
fluorescently labeled Alexa 546 dUTP, which attached to the nucleotides at the 3’ 
hydroxyl group. 5’ to 3’ exonuclease activity removed the nucleotides from the 5’ phosphoryl 
terminus of the nick. The labeled and unlabeled nucleotides replaced the excised nucleotides in 
the original DNA strand. The fluorescently labeled DNA was visualized using the intercalating 
dye YOYO-1. The labeled DNA was added to an IrysChip flow cell, linearized with an 
electrophoretic current and imaged. ............................................................................................ 152 
Table B.3 Pfam proteins. Seven Pfam proteins were found in common among the 18 algal 
genomes that were used for the concatenated protein tree using the ezTree pipeline. .............. 152 
Table C.1 Growth conditions (filter types) and the PAR passing through each filter, 
measured with a spectroradiometer (Ocean Optics). .................................................................. 234 
Table C.2 Growth conditions (filter types) and doubling times for blue light growth studies. Each 
condition was grown in duplicate. One replicate in the yellow condition was excluded due to 
severe clumping. ......................................................................................................................... 239 
Table D.1 2013 Witch Creek water analyses and B8.7SiS chemical concentrations. ................ 251 
Table D.2 Molar phosphorus and arsenic ratios used in arsenate experiments.61 ...................... 252 
Table D.3 Descriptions of arsenate experiments. Seven experiments were performed with 
different concentrations of sodium arsenate. The first set of experiments was used to determine 
the optimal As:P ratio for RGd-1. That phosphorus concentration was used for all future 
experiments with varying As. ..................................................................................................... 253 
Table D.4 Doubling times and DCW for the RGd-1 initial testing with sodium arsenate. ........ 257 
Table E.1 Number and characteristics of contigs and scaffolds for each of the five assemblies.
 ..................................................................................................................................................... 277 
Table E.2 Characteristics of Input Scaffolds that were Joined by BioNano and/or Dovetail. .... 281 
Table E.3 Characteristics of the gaps introduced into the assemblies by BioNano and Dovetail. 
Note that there are no gaps in the Pb-only base assembly, so it is not included. ............................... 283 
Table E.4 Assembly Statistics for R108 version 1.0 (PbDtBn PBJelly gap filled) and its input 
assembly (PbDtBn). .................................................................................................................... 286 
 
Table E.5 R108 v 1.0 assembly characteristics in comparison to the A17 reference assembly. 288 
Table F.1 Genera of 56 eukaryotic photoautotrophs previously studied and reported for the 
accumulation of lipids.  Modified from Breuer et al. (2012).185 ................................................. 317 
Table G.1 Measurements of areal photosynthesis rates, areal respiration rates and relevant depth 
scales for the laboratory- and field-RABR cultured biofilms. .................................................... 340 
Table G.2 Mean extractable biofuel precursor weight % and areal concentrations for the 
laboratory- and field-RABR cultured biofilms (n = 3 with one standard deviation error, or n = 2 
with range reported as error). 
Table G.3 Mean FAME %, weight %, and areal concentration from the laboratory- and field-
RABR cultured biofilms. Biomass was directly transesterified to determine total biofuel potential 
from all fatty acid precursor molecules (extractable and non-extractable) (n = 3 with one standard 
deviation error, or n = 2 with range reported as error). ............................................................... 353 
Table H.1 Measurements of photosynthetic rates, respiration rates, and relevant depth parameters 
for laboratory grown microalgal biofilms with and without bicarbonate amendment. .............. 373 
Table H.2 Total and percent composition of extractable biofuel precursor weight (%) in 
laboratory grown algal biofilms with and without bicarbonate amendment. ............................. 377 
 
 
  
LIST OF FIGURES 
Figure Page 
 
Figure 2.1 YNP diatom strain RGd-1 (left) and YNP green alga WC-1 (right; scale bar 
10 µm). ........................................................................................................................................... 17 
Figure 2.2 Each bar represents the fold difference in Nile Red fluorescence intensities at 
15 d for each treatment compared to the 2 mM Si control. .......................................................... 18 
Figure 2.3 Inputs from thermal hot springs into Witch Creek (research in Yellowstone 
was conducted under an approved Yellowstone Research Permit [Permit # 5480]). ................... 21 
Figure 2.4 Typical heterogeneity of a sampling site containing green algae, diatoms and 
cyanobacteria (research in Yellowstone was conducted under an approved Yellowstone 
Research Permit [Permit # 5480]). ................................................................................................ 23 
Figure 2.5 RGd-1 transmitted light (left) and Nile Red fluorescence under epifluorescent 
light (right). ................................................................................................................................... 25 
Figure 2.6 Outdoor raceway pond (2000 L) at Utah State University, Logan, UT. ....................... 27 
Figure 2.7 Photobioreactor (Green Wave Energy, Inc.) illuminated by artificial light in a 
pilot-scale laboratory setting at Montana State University, Bozeman, MT. .................................. 28 
Figure 3.1 Outline for algae isolation beginning from field collection to strain 
characterization. Each strain was streaked for isolation on solid growth medium, grown in 
liquid growth medium and visualized microscopically at each step to ensure strain 
isolation. ........................................................................................................................................ 39 
Figure 3.2 Full-length ITS Sanger results used for strain identification. Each strain 
(median = 1194 bp) was aligned with MUSCLE (v3.8.1551)134 and phylogenetic distances 
were determined using the Maximum Likelihood method with RAxML (version 
8.2.12).127 The scale bar represents the number of nucleotide changes between strains. The 
bootstrap values at the nodes represent the statistical support for each branching event. ................ 46 
Figure 3.3 Average doubling times based on the maximal growth rates for each of the 
eleven strains (replicate 1 and replicate 2). The error bars represent 95% confidence 
intervals. ........................................................................................................................................ 47 
 
 
Figure 3.4 Growth and lipid accumulation observations for the cultures with the fastest 
growth rates, WC-1, WC-2b and WC-5, with and without sodium bicarbonate addition: 
cell concentration (A), pH (B), medium nitrate concentration (C) and Nile Red 
fluorescence (D). The error bars represent 95% confidence 
intervals. For Figures A-C, the series are represented by the following: WC-5 (circles), 
WC-1 (triangles) and WC-2b (squares). In Figure D1, the series are WC-1 (circles) and WC-2b 
(triangles), and in Figure D2, WC-5 (circles). The open and filled symbols represent air only 
and sodium bicarbonate addition, respectively, for all conditions. ............................................... 50 
Figure 3.5 Growth and lipid accumulation observations for the cultures with the highest 
biomass production as DCW, PC-3 (circles), and WC-2b (triangles), with and without 
sodium bicarbonate addition: cell concentration (A), pH (B), medium nitrate 
concentration (C) and neutral lipids (D). The error bars represent 95% confidence 
intervals. The open and filled symbols represent air only and sodium bicarbonate 
addition, respectively, for all conditions. ...................................................................................... 52 
Figure 3.6 Average final Nile Red fluorescence for each of the eleven strains (with and 
without sodium bicarbonate addition). The error bars represent 95% confidence intervals. ....... 53 
Figure 3.7 Growth and lipid accumulation observations for the cultures with the highest 
lipid production, WC-5 (circles) and UTEX 395 (triangles), with and without sodium 
bicarbonate addition: cell concentration (A), pH (B), medium nitrate concentration (C), 
and FAMEs (D). The error bars represent 95% confidence intervals. The open and filled 
symbols represent air only and sodium bicarbonate addition, respectively, for all 
conditions. ..................................................................................................................................... 55 
Figure 4.1 Comparison of two draft RGd-1 genome assemblies, v. 1.0 and v. 1.5. The 
difference between the two assemblies was the inclusion of an additional small PacBio 
dataset. This figure was generated using MultiQC.204 ................................................................ 74 
Figure 4.2 The number of trimmed paired-end reads for which the pair aligned uniquely, 
one mate aligned uniquely, one mate mapped to multiple locations, the pair mapped 
discordantly, the pair mapped to multiple locations, or neither mate aligned. 
Reads were aligned using HISAT2 and the figure was generated using MultiQC.204 .................... 78 
Figure 4.3 A k-mer sweep generated by KAT34 using a k-mer length of 27. The analysis 
was performed using the paired-end, 50 bp, Illumina reads. ........................................................ 81 
 
 
Figure 4.4 The protein sequences from fifteen publicly available genome annotations (1 
red alga, 1 brown alga, 11 green algae and 3 diatoms) and RGd-1 were used to construct 
a concatenated protein tree using a modified ezTree pipeline.193 RGd-1 was phylogenetically 
closest to P. tricornutum at the protein level. The proteins were trimmed using trimAl196 
and aligned with MAFFT-L-INS-i.195 The scale represents the bootstrap values. ....................... 83 
Figure 4.5 Comparison of P. tricornutum and RGd-1 assemblies based on amino acid 
sequences. The x-axis contains the scaffolds for the P. tricornutum assembly and the 
y-axis contains the 520 scaffolds for the RGd-1 genome assembly. Within the MUMmer 
package, PROmer translates the nucleic acid-based assemblies into amino acids.211 
Perfectly syntenic assemblies would have a slope of 1.0. .......................................................... 84 
Figure 4.6 Annotated carbon fixation pathway in photosynthetic organisms. The green 
boxes represent genes that are present in the RGd-1 genome. The annotated pathway was 
produced using KeggMapper.191 ................................................................................................... 86 
Figure 4.7 Annotated Citric Acid Cycle pathway. The green boxes represent genes that 
are present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.191 ............................................................................................................................ 87 
Figure 4.8 Annotated Glyoxylate and Dicarboxylate Metabolism. The green boxes 
represent genes that are present in the RGd-1 genome. The annotated pathway was 
produced using KeggMapper.191 ................................................................................................... 87 
Figure 4.9 Annotated fatty acid metabolism. The green boxes represent genes that are 
present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.191 ............................................................................................................................ 89 
Figure 4.10 Annotated Fatty Acid Degradation Metabolism. The green boxes represent 
genes that are present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.191 ............................................................................................................................ 90 
Figure 4.11 Annotated Glycerolipid Metabolism. The green boxes represent genes that 
are present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.191 ............................................................................................................................ 92 
Figure 4.12 The citric acid cycle with the glyoxylate and dicarboxylate pathway, which 
diverts isocitrate to malate.226 The two enzymes, isocitrate lyase and malate synthase, 
modify the citric acid cycle, avoiding two decarboxylation steps and resulting in the formation 
of malate from 2 molecules of acetyl-CoA.226 .............................................................................. 97 
Figure 4.13 Potential mechanisms of symbiosis between marine diatoms and bacteria. 
Phytoplankton, such as diatoms, may provide dissolved organic carbon (DOC), particulate 
organic carbon (POC), and other complex algal polysaccharides. The bacteria may supply 
micronutrients, macronutrients, and vitamins such as B12.160,162-165 .......................................... 100 
Figure 5.1 Major features of the genome annotation for Brevundimonas sp. strain KM-427. .......... 107 
Figure A.1 Light microscopy images of the nine YNP green algae isolates. (A) PGV-6 
(B) PGV8-G1 (C) PGV8-G2 (D) PGV10-G1 (E) PGV10-G2 (F) WC2b (G) WC-5A (H) 
MF1 and (I) WC-1. ..................................................................................................................... 146 
Figure B.1 Each sample was analyzed using FastQC and compiled within MultiQC for 
their unique and duplicate sequence counts. There were a total of nine samples: three 
culture conditions with three replicates for each condition. The forward (R1) and reverse 
(R2) reads were analyzed for each sample. Samples A1-A3 had the largest number of 
unique reads among the sequenced samples and C1-C3 had the smallest number of unique 
reads. ........................................................................................................................................... 153 
Figure B.2 This figure indicates the presence of adapter sequence contamination in 
sample A1-R1. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ...................................................... 154 
Figure B.3 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A1-R1. Here, 23.67% of the 
reads were deduplicated. The peaks in the blue line, or the total sequences, indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 154 
Figure B.4 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A1-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that there are 0% Ns in 
the reads and all bases are A, T, C or G. .................................................. 155 
Figure B.5 Quality scores across the positions of the 150 bp reads for sample A1-R1. The 
x- and y-axes represent the position within the read and the quality scores, respectively. The blue line 
represents the average quality of the bases at each position. The error bars represent the 10th 
and 90th percentiles of the quality scores. The yellow box represents the 25th to 75th 
percentile range. .................................................................................................... 155 
Figure B.6 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A1-R1. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide. Ideally, each of the percent 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer ligation that imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 156 
Figure B.7 The per sequence GC content for sample A1-R1. The x- and y-axes represent 
the % GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 156 
Figure B.8 The per sequence quality score for sample A1-R1. The x- and y- axes 
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 157 
Figure B.9 The distribution of the sequence lengths for sample A1-R1. The x- and y-axes 
refer to the read sequence lengths and the number of reads with those lengths. The data 
here indicate that all reads were 150 bp. ................................................................................... 157 
Figure B.10 This figure indicates the presence of adapter sequence contamination in 
sample A1-R2. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ...................................................... 158 
Figure B.11 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A1-R2. Here, 29.88% of the 
reads were deduplicated. The peaks in the blue line, or the total sequences, indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 158 
Figure B.12 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A1-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that there are 0% Ns in 
the reads and all bases are A, T, C or G. .................................................. 159 
 
 
Figure B.13 Quality scores across the positions of the 150 bp reads for sample A1-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively. The blue line 
represents the average quality of the bases at each position. The error bars represent the 10th 
and 90th percentiles of the quality scores. The yellow box represents the 25th to 75th 
percentile range. .................................................................................................... 159 
Figure B.14 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A1-R2. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide. Ideally, each of the percent 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer ligation that imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 160 
Figure B.15 The per sequence GC content for sample A1-R2. The x- and y-axes represent 
the % GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 160 
Figure B.16 The per sequence quality score for sample A1-R2. The x- and y- axes 
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 161 
Figure B.17 The distribution of the sequence lengths for sample A1-R2. The x- and y-
axes refer to the read sequence lengths and the number of reads with those lengths. The 
data here indicate that all reads were 150 bp. ............................................................................ 161 
Figure B.18 This figure indicates the presence of adapter sequence contamination in 
sample A2-R1. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ...................................................... 162 
Figure B.19 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A2-R1. Here, 26.72% of the 
reads were deduplicated. The peaks in the blue line, or the total sequences, indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 162 
 
Figure B.20 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A2-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that there are 0% Ns in 
the reads and all bases are A, T, C or G. .................................................. 163 
Figure B.21 Quality scores across the positions of the 150 bp reads for sample A2-R1. 
The x- and y-axes represent the position within the read and the quality scores, respectively. The blue line 
represents the average quality of the bases at each position. The error bars represent the 10th 
and 90th percentiles of the quality scores. The yellow box represents the 25th to 75th 
percentile range. .................................................................................................... 163 
Figure B.22 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A2-R1. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide. Ideally, each of the percent 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer ligation that imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 164 
Figure B.23 The per sequence GC content for sample A2-R1. The x- and y-axes represent 
the % GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 164 
Figure B.24 The per sequence quality score for sample A2-R1. The x- and y- axes 
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 165 
Figure B.25 The distribution of the sequence lengths for sample A2-R1. The x- and y-axes 
refer to the read sequence lengths and the number of reads with those lengths. The data 
here indicate that all reads were 150 bp. ................................................................................... 165 
Figure B.26 This figure indicates the presence of adapter sequence contamination in 
sample A2-R2. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ...................................................... 166 
 
 
Figure B.27 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A2-R2. Here, 33.26% of the 
reads were deduplicated. The peaks in the blue line, or the total sequences, indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 166 
Figure B.28 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A2-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that there are 0% Ns in 
the reads and all bases are A, T, C or G. .................................................. 167 
Figure B.29 Quality scores across the positions of the 150 bp reads for sample A2-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively. The blue line 
represents the average quality of the bases at each position. The error bars represent the 10th 
and 90th percentiles of the quality scores. The yellow box represents the 25th to 75th 
percentile range. .................................................................................................... 167 
Figure B.30 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A2-R2. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide. Ideally, each of the percent 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer ligation that imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 168 
Figure B.31 The per sequence GC content for sample A2-R2. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 168 
Figure B.32 The per sequence quality score for sample A2-R2. The x- and y- axes 
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 169 
Figure B.33 The distribution of the sequence lengths for sample A2-R2. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 169
 
 
Figure B.34 Adapter sequence contamination in sample A3-R1. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 170
Figure B.35 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A3-R1. Here, 19.32% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 170
Figure B.36 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A3-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 171
Figure B.37 Quality scores across the positions of the 150 bp reads for sample A3-R1. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 171
Figure B.38 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A3-R1. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 172
Figure B.39 The per sequence GC content for sample A3-R1. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 172 
Figure B.40 The per sequence quality score for sample A3-R1. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 173 
 
Figure B.41 The distribution of the sequence lengths for sample A3-R1. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 173
Figure B.42 Adapter sequence contamination in sample A3-R2. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 174
Figure B.43 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample A3-R2. Here, 23.64% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 174
Figure B.44 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample A3-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 175
Figure B.45 Quality scores across the positions of the 150 bp reads for sample A3-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 175
Figure B.46 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample A3-R2. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 176
Figure B.47 The per sequence GC content for sample A3-R2. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. ............................................................................................................................... 176 
 
Figure B.48 The per sequence quality score for sample A3-R2. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 177 
Figure B.49 The distribution of the sequence lengths for sample A3-R2. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 177
Figure B.50 Adapter sequence contamination in sample B1-R1. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 178
Figure B.51 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B1-R1. Here, 29.46% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 178
Figure B.52 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B1-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 179
Figure B.53 Quality scores across the positions of the 150 bp reads for sample B1-R1. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 179
Figure B.54 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B1-R1. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 180
 
Figure B.55 The per sequence GC content for sample B1-R1. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC
content theoretical distribution and the red line represents the actual GC content for the
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 180
Figure B.56 The per sequence quality score for sample B1-R1. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 181 
Figure B.57 The distribution of the sequence lengths for sample B1-R1. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 181
Figure B.58 Adapter sequence contamination in sample B1-R2. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 182
Figure B.59 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B1-R2. Here, 24.99% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 182
Figure B.60 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B1-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 183
Figure B.61 Quality scores across the positions of the 150 bp reads for sample B1-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 183
 
 
Figure B.62 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B1-R2. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 184
Figure B.63 The per sequence GC content for sample B1-R2. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 184
Figure B.64 The per sequence quality score for sample B1-R2. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 185 
Figure B.65 The distribution of the sequence lengths for sample B1-R2. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 185
Figure B.66 Adapter sequence contamination in sample B2-R1. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 186
Figure B.67 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B2-R1. Here, 14.12% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 186
Figure B.68 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B2-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 187
 
Figure B.69 Quality scores across the positions of the 150 bp reads for sample B2-R1.
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 187
Figure B.70 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B2-R1. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 188
Figure B.71 The per sequence GC content for sample B2-R1. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC
content theoretical distribution and the red line represents the actual GC content for the
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 188
Figure B.72 The per sequence quality score for sample B2-R1. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 189 
Figure B.73 The distribution of the sequence lengths for sample B2-R1. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 189
Figure B.74 Adapter sequence contamination in sample B2-R2. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 190
Figure B.75 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B2-R2. Here, 14.12% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 190
 
Figure B.76 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B2-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 191
Figure B.77 Quality scores across the positions of the 150 bp reads for sample B2-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 191
Figure B.78 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B2-R2. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 192
Figure B.79 The per sequence GC content for sample B2-R2. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC
content theoretical distribution and the red line represents the actual GC content for the
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 192
Figure B.80 The per sequence quality score for sample B2-R2. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 193 
Figure B.81 The distribution of the sequence lengths for sample B2-R2. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 193
Figure B.82 Adapter sequence contamination in sample B3-R1. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 194
 
Figure B.83 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B3-R1. Here, 20.06% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 194
Figure B.84 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B3-R1. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 195
Figure B.85 Quality scores across the positions of the 150 bp reads for sample B3-R1. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 195
Figure B.86 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B3-R1. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 196
Figure B.87 The per sequence GC content for sample B3-R1. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC
content theoretical distribution and the red line represents the actual GC content for the
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 196
Figure B.88 The per sequence quality score for sample B3-R1. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 197 
Figure B.89 The distribution of the sequence lengths for sample B3-R1. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 197
 
Figure B.90 Adapter sequence contamination in sample B3-R2. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 198
Figure B.91 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample B3-R2. Here, 15.98% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 198
Figure B.92 The percentage of Ns and their positions across all bases in the 150 bp reads 
for sample B3-R2. The y-axis represents the percentage and the x-axis represents the 
position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 199
Figure B.93 Quality scores across the positions of the 150 bp reads for sample B3-R2. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 199
Figure B.94 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample B3-R2. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 200
Figure B.95 The per sequence GC content for sample B3-R2. The x- and y-axes represent 
the %GC content per read and read counts, respectively. The blue line represents the GC 
content theoretical distribution and the red line represents the actual GC content for the 
150 bp reads. The two broad red peaks that shift away from the mean %GC = 47
indicate contamination. ............................................................ 200
Figure B.96 The per sequence quality score for sample B3-R2. The x- and y-axes
represent the mean quality (Phred Score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 201 
Figure B.97 The distribution of the sequence lengths for sample B3-R2. The x- and y-axes
refer to the read sequence lengths and the number of reads with those lengths, respectively.
The data here indicate that all reads were 150 bp. ............................................................ 201
Figure B.98 Adapter sequence contamination in sample C1-R1. The x- and y-axes represent
the position within the 150 bp read and the percentage of reads with adapter sequence,
respectively. The red line shows the Illumina Universal Adapter appearing from
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. ............................................................ 202
Figure B.99 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample C1-R1. Here, 4.65% of the 
reads were deduplicated. The peaks in the blue line (the total sequences) indicate the
presence of contaminants or highly expressed transcripts under the conditions that were
tested. ............................................................ 202
Figure B.100 The percentage of Ns and their positions across all bases in the 150 bp 
reads for sample C1-R1. The y-axis represents the percentage and the x-axis represents 
the position of the base in the read. The data presented here indicate that the reads contain
0% Ns and that every base was called as A, T, C or G. ............................................................ 203
Figure B.101 Quality scores across the positions of the 150 bp reads for sample C1-R1. 
The x- and y-axes represent the position within the read and the quality scores, respectively.
The blue line represents the average quality of the bases at each position. The error bars
span the 10th to 90th percentiles of the quality scores, and the yellow box spans the 25th
to 75th percentiles. ............................................................ 203
Figure B.102 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample C1-R1. The x- and y-axes represent the position within the
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the percent
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the
reads are due to random hexamer ligation, which imparts bias in RNA-seq libraries. While the
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the
reads.3 ............................................................ 204
Figure B.103 The per sequence GC content for sample C1-R1. The x- and y-axes 
represent the %GC content per read and read counts, respectively. The blue line 
represents the GC content theoretical distribution and the red line represents the actual 
GC content for the 150 bp reads. The shift away from the mean %GC = 47 indicates 
bacterial contamination. .............................................................................................................. 204 
Figure B.104 The per sequence quality score for sample C1-R1. The x- and y-axes 
represent the mean quality (Phred score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 205 
Figure B.105 The distribution of the sequence lengths for sample C1-R1. The x- and 
y-axes refer to the read sequence lengths and the number of reads with those lengths, 
respectively. The data here indicate that all reads were 150 bp. ................................................. 205 
Figure B.106 This figure indicates the presence of adapter sequence contamination in 
sample C1-R2. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ........................................................................ 206 
Figure B.107 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample C1-R2. Here, 8.8% of the reads 
were deduplicated. The peaks in the blue line (total sequences) indicate the presence 
of contaminants or highly expressed transcripts under the conditions that were tested. ............ 206 
Figure B.108 The percentage of Ns and their positions across all bases in the 150 bp 
reads for sample C1-R2. The y-axis represents the percentage and the x-axis represents 
the position of the base in the read. The data presented here indicate that there are 0% Ns 
in the reads and that every base was called as A, T, C, or G. ................................................... 207 
Figure B.109 Quality scores across the positions of the 150 bp reads for sample C1-R2. 
The x- and y-axes represent the position within the read and the quality score, 
respectively. The blue line represents the average quality of the bases at each position. 
The error bars span the 10th to 90th percentiles of the quality scores at each position, and 
the yellow box spans the 25th to 75th percentiles. ..................................................................... 207 
Figure B.110 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample C1-R2. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the four 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer priming, which imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 208 
 
 
Figure B.111 The per sequence GC content for sample C1-R2. The x- and y-axes 
represent the %GC content per read and read counts, respectively. The blue line 
represents the GC content theoretical distribution and the red line represents the actual 
GC content for the 150 bp reads. The shift away from the mean %GC = 47 indicates 
bacterial contamination. .............................................................................................................. 208 
Figure B.112 The per sequence quality score for sample C1-R2. The x- and y-axes 
represent the mean quality (Phred score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 209 
Figure B.113 The distribution of the sequence lengths for sample C1-R2. The x- and 
y-axes refer to the read sequence lengths and the number of reads with those lengths, 
respectively. The data here indicate that all reads were 150 bp. ................................................. 209 
Figure B.114 This figure indicates the presence of adapter sequence contamination in 
sample C2-R2. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ........................................................................ 210 
Figure B.115 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample C2-R2. Here, 7.08% of the 
reads were deduplicated. The peaks in the blue line (total sequences) indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 210 
Figure B.116 The percentage of Ns and their positions across all bases in the 150 bp 
reads for sample C2-R2. The y-axis represents the percentage and the x-axis represents 
the position of the base in the read. The data presented here indicate that there are 0% Ns 
in the reads and that every base was called as A, T, C, or G. ................................................... 211 
Figure B.117 Quality scores across the positions of the 150 bp reads for sample C2-R2. 
The x- and y-axes represent the position within the read and the quality score, 
respectively. The blue line represents the average quality of the bases at each position. 
The error bars span the 10th to 90th percentiles of the quality scores at each position, and 
the yellow box spans the 25th to 75th percentiles. ..................................................................... 211 
 
 
Figure B.118 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample C2-R2. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the four 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer priming, which imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 212 
Figure B.119 The per sequence GC content for sample C2-R2. The x- and y-axes 
represent the %GC content per read and read counts, respectively. The blue line 
represents the GC content theoretical distribution and the red line represents the actual 
GC content for the 150 bp reads. The shift away from the mean %GC = 47 indicates 
bacterial contamination. The sharp peak may indicate overrepresented bacterial reads. ........... 212 
Figure B.120 The per sequence quality score for sample C2-R2. The x- and y-axes 
represent the mean quality (Phred score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 213 
Figure B.121 The distribution of the sequence lengths for sample C2-R2. The x- and 
y-axes refer to the read sequence lengths and the number of reads with those lengths, 
respectively. The data here indicate that all reads were 150 bp. ................................................. 213 
Figure B.122 This figure indicates the presence of adapter sequence contamination in 
sample C3-R1. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ........................................................................ 214 
Figure B.123 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample C3-R1. Here, 29.88% of the 
reads were deduplicated. The peaks in the blue line (total sequences) indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 214 
Figure B.124 The percentage of Ns and their positions across all bases in the 150 bp 
reads for sample C3-R1. The y-axis represents the percentage and the x-axis represents 
the position of the base in the read. The data presented here indicate that there are 0% Ns 
in the reads and that every base was called as A, T, C, or G. ................................................... 215 
 
Figure B.125 Quality scores across the positions of the 150 bp reads for sample C3-R1. 
The x- and y-axes represent the position within the read and the quality score, 
respectively. The blue line represents the average quality of the bases at each position. 
The error bars span the 10th to 90th percentiles of the quality scores at each position, and 
the yellow box spans the 25th to 75th percentiles. ..................................................................... 215 
Figure B.126 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample C3-R1. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the four 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer priming, which imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 216 
Figure B.127 The per sequence GC content for sample C3-R1. The x- and y-axes 
represent the %GC content per read and read counts, respectively. The blue line 
represents the GC content theoretical distribution and the red line represents the actual 
GC content for the 150 bp reads. The shift away from the mean %GC = 47 indicates 
bacterial contamination. .............................................................................................................. 216 
Figure B.128 The per sequence quality score for sample C3-R1. The x- and y-axes 
represent the mean quality (Phred score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 217 
Figure B.129 The distribution of the sequence lengths for sample C3-R1. The x- and 
y-axes refer to the read sequence lengths and the number of reads with those lengths, 
respectively. The data here indicate that all reads were 150 bp. ................................................. 217 
Figure B.130 This figure indicates the presence of adapter sequence contamination in 
sample C3-R2. The x- and y-axes represent the position within the 150 bp read and the 
percentage, respectively. The red line indicates the presence of the Illumina Universal 
Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming 
should target the middle to 3’ end of the reads. ........................................................................ 218 
Figure B.131 The blue line represents the percent of the total sequences and the red line 
indicates the deduplicated (unique) sequences for sample C3-R2. Here, 8.93% of the 
reads were deduplicated. The peaks in the blue line (total sequences) indicate the 
presence of contaminants or highly expressed transcripts under the conditions that were 
tested. .......................................................................................................................................... 218 
 
Figure B.132 The percentage of Ns and their positions across all bases in the 150 bp 
reads for sample C3-R2. The y-axis represents the percentage and the x-axis represents 
the position of the base in the read. The data presented here indicate that there are 0% Ns 
in the reads and that every base was called as A, T, C, or G. ................................................... 219 
Figure B.133 Quality scores across the positions of the 150 bp reads for sample C3-R2. 
The x- and y-axes represent the position within the read and the quality score, 
respectively. The blue line represents the average quality of the bases at each position. 
The error bars span the 10th to 90th percentiles of the quality scores at each position, and 
the yellow box spans the 25th to 75th percentiles. ..................................................................... 219 
Figure B.134 The percentage of each nucleotide, A, C, T and G and their positions across 
the 150 bp reads for sample C3-R2. The x- and y-axes represent the position within the 
150 bp reads and the percentage of each nucleotide, respectively. Ideally, the four 
nucleotide lines should be parallel to each other. The erratic peaks at the beginning of the 
reads are due to random hexamer priming, which imparts bias in RNA-seq libraries. While the 
bias does not affect the entirety of the reads, it enriches for k-mers at the 5’ end of the 
reads.3 .......................................................................................................................................... 220 
Figure B.135 The per sequence GC content for sample C3-R2. The x- and y-axes 
represent the %GC content per read and read counts, respectively. The blue line 
represents the GC content theoretical distribution and the red line represents the actual 
GC content for the 150 bp reads. The shift away from the mean %GC = 47 indicates 
bacterial contamination. The sharp peak may indicate overrepresented bacterial reads. ........... 220 
Figure B.136 The per sequence quality score for sample C3-R2. The x- and y-axes 
represent the mean quality (Phred score) and the number of reads, respectively. The data 
here indicate that the majority of the quality scores were > 38. ................................................. 221 
Figure B.137 The distribution of the sequence lengths for sample C3-R2. The x- and 
y-axes refer to the read sequence lengths and the number of reads with those lengths, 
respectively. The data here indicate that all reads were 150 bp. ................................................. 221 
Figure B.138 The glycolysis/gluconeogenesis metabolic pathway with genes present 
(green) in the RGd-1 genome. The pathway was searched and populated using the online 
platform, KeggMapper.4 ............................................................................................................. 222 
Figure B.139 The pyruvate metabolic pathway with genes present (green) in the RGd-1 
genome. The pathway was searched and populated using the online platform, 
KeggMapper.4 ............................................................................................................................. 223 
Figure B.140 The fatty acid degradation metabolic pathway with genes present (green) in 
the RGd-1 genome. The pathway was searched and populated using the online platform, 
KeggMapper.4 ............................................................................................................................. 224 
Figure B.141 The glycerolipid metabolic pathway with genes present (green) in the RGd-
1 genome. The pathway was searched and populated using the online platform, 
KeggMapper.4 ............................................................................................................................. 225 
Figure B.142 The α-linolenic acid metabolic pathway with genes present (green) in the 
RGd-1 genome. The pathway was searched and populated using the online platform, 
KeggMapper.4 ............................................................................................................................. 226 
Figure B.143 The arachidonic acid metabolic pathway with genes present (green) in the 
RGd-1 genome. The pathway was searched and populated using the online platform, 
KeggMapper.4 ............................................................................................................................. 227 
Figure C.1 RGd-1 cellular morphologies as imaged using field emission – scanning 
electron microscopy. The RGd-1 morphology in 2009 (left), and different cell 
morphologies in 2014 (middle and right). .................................................................................. 230 
Figure C.2 The light intensity at Witch Creek (late morning August 2012, left) and 
laboratory fluorescent grow lights (right). Two measurements were taken in the field 
(1807 and 1828 µW cm⁻² nm⁻¹) and three measurements were taken for the MSU lab light 
systems (421, 412 and 379 µW cm⁻² nm⁻¹). ................................................................................ 235 
Figure C.3 The control (without a color filter) spectroradiometer measurement at 22% 
LED intensity. The spectroradiometer measurements were taken inside empty 
photobioreactor tubes. ................................................................................................................. 237 
Figure C.4 Spectroradiometer measurement (Ocean Optics) for the Rosco filter #313 
(light yellow) with low blue intensity using 22% LED intensity. The spectroradiometer 
measurements were taken inside empty photobioreactor tubes. ................................................. 237 
Figure C.5 Spectroradiometer measurements (Ocean Optics) for the Rosco filter #6 
(yellow) with high blue intensity using 22% LED intensity. The spectroradiometer 
measurements were taken inside empty photobioreactor tubes. ................................................. 238 
Figure C.6 Spectroradiometer measurements (Ocean Optics) for the Rosco filter #384 
(blue) with very high blue intensity and low intensity for other wavelengths, using 22% LED 
intensity. The spectroradiometer measurements were taken inside empty photobioreactor 
tubes. ........................................................................................................................................... 238 
Figure C.7 Cell concentrations for the four blue light conditions tested: the no-filter 
control, light yellow (#313), yellow (#6), and blue (#384). The error bars represent the 
standard deviation of the mean. .................................................................................................. 239 
Figure C.8 Total chlorophyll concentrations (mg/mL) for each of the four blue light 
conditions tested: the no-filter control, light yellow (#313), yellow (#6), and blue (#384). 
The error bars represent the standard deviation of the mean. ..................................................... 241 
Figure C.9 The Nile Red fluorescence (rfu) for the four blue light conditions tested: the 
no-filter control, light yellow (#313), yellow (#6), and blue (#384). The error bars represent 
the standard deviation of the mean. ............................................................................................ 241 
Figure D.1 Speciation of As(III) and As(V) across the pH range -2 to 14 in water.57, 58 ............ 249 
Figure D.2 The Nile Red fluorescence for the initial testing with sodium arsenate. The 
error bars represent the standard deviation of the mean. ............................................................ 255 
Figure D.3 The doubling times for the initial testing with sodium arsenate. Two 
conditions did not grow; Witch Creek Water with Bold’s additions at pH 8 + EDTA. This 
condition was repeated to verify the result. The error bars represent the standard deviation 
of the mean. ................................................................................................................................. 256 
Figure D.4 Cell counts for four phosphorus to arsenic ratios (10:1, 5:1, 2:1 and 1:1) and a 
control (B8.7SiS phosphorus concentrations). The error bars represent the standard 
deviation of the mean. ................................................................................................................. 258 
Figure D.5 The pH for four phosphorus to arsenic ratios (10:1, 5:1, 2:1 and 1:1) and a 
control (B8.7SiS phosphorus concentrations). The error bars represent the standard 
deviation of the mean. ................................................................................................................. 259 
Figure D.6 The total Nile Red fluorescence for four phosphorus to arsenic ratios (10:1, 
5:1, 2:1 and 1:1) and a control (B8.7SiS phosphorus concentrations). The error bars 
represent the standard deviation of the mean. ............................................................................. 259 
Figure D.7 The doubling times for the different arsenate concentrations tested. The error 
bars represent the variance resulting from an ANOVA (2-factor without replacement) 
analysis. The error bars represent the standard deviation of the mean. ...................................... 260 
Figure D.8 The average ash-free dry weights for the different arsenate concentrations 
tested. The error bars represent the variance resulting from an ANOVA (2-factor without 
replacement) analysis. The error bars represent the standard deviation of the mean. ................ 261 
Figure D.9 The doubling time for RGd-1 cultures grown in the presence of sodium 
arsenite in place of sodium arsenate at varying concentrations. ................................................. 264 
Figure D.10 The Nile Red fluorescence for RGd-1 grown in the presence of sodium 
arsenite instead of sodium arsenate at varying concentrations. .................................................. 264 
Figure E.1 Synteny alignment of partial chromosomes 4 and 8 between A17 and R108 
confirms rearrangement of the long arms of the chromosomes. ................................................. 289 
Figure E.2 Synteny alignment of partial A17 chromosomes 4 and 8 against syntenic 
regions in the R108 Illumina-based assembly (top panel), PacBio-based assembly (Pb, 
middle panel) as well as the gap-filled PbDtBn (v1.0) assembly (bottom panel). ..................... 291 
Figure E.3 Schematic of the rearrangement between chromosomes 4 and 8 in A17 (left) 
compared to R108 (right). Green segments indicate homology to A17 chromosome 4, 
while blue segments indicate homology to A17 chromosome 8. Red segments indicate 
sequences not present in the A17 reference. Breakpoint 1 (br1) is pinpointed to a 104 bp 
region (chr4:39,021,788-39,021,891) and includes a 100 bp gap. Breakpoint 2 (br2) is 
pinpointed to a 7,665 bp region (chr8:33,996,308-34,003,972) and includes a 7,663 bp 
gap. Breakpoint 3 (br3) is pinpointed to a 708 bp region (chr8:34,107,285-34,107,992) 
and includes a 100 bp gap. Breakpoint 4 (br4) is pinpointed to a 277 bp region 
(chr8:34,275,249-34,275,525) and includes a 100 bp gap. ......................................................... 292 
Figure F.1 The biological recycling of carbon, nitrogen, and phosphorus to harvest fuel 
and food linked to sunlight to reduce net consumption of N and P and net production of 
C. ................................................................................................................................................. 313 
Figure F.2 Hypothetical performance curve for an increasingly perturbed (i.e., stressed) 
microalgal system being used to produce photoautotrophic biomass and/or lipids.  
Adapted from Odum et al. (1979).175 .......................................................................................... 314 
Figure F.3 Primary stages and (alternative processes) in the microalgae to fuel production 
process. ........................................................................................................................................ 326 
Figure G.1 Representative photographs for: (A) the field-RABR and (B) lab-RABR 
culturing systems designed for algal biofilm culturing (insert shows cross-sectioned 
excised cotton cord substratum with biofilm growth). Note the ‘top’ and ‘bottom’ biofilm 
orientation corresponding to the outer and inner sections of the field-RABR wheel, 
respectively. ................................................................................................................................ 334 
 
Figure G.2 Field-RABR: dissolved oxygen microprofiles measured in the light extending 
from the surface for biofilms grown on the (A) outer wheel surface and (B) inner wheel 
surface; dissolved oxygen microprofiles measured in the dark for biofilms grown on the 
(C) outer wheel surface and (D) inner wheel surface; and photosynthesis profiles 
extending from the surface for biofilms grown on the (E) outer wheel surface and (F) 
inner wheel surface. Note that the biofilm surface position (depth = 0) is approximated by 
the position at which oxygen responses were measurable (subject to ± 25 µm error, or ± 
100 µm error for the photosynthesis profiles, where each data point is a representative 
gross volumetric photosynthesis rate from 2-3 replicates), and individual data points 
represent the mean values from 3-4 replicate profiles in both light and dark conditions. 
Error bars represent plus or minus one standard deviation. Dotted lines indicate the 
photic-zone termination depth, estimated from the light:dark shift method. Note the scale 
change on the x-axis. ................................................................................................................... 342 
Figure G.3 Lab-RABR: dissolved oxygen microprofiles measured in the light extending 
from the surface for biofilms grown in (A) nitrate replete and (B) nitrate deplete 
conditions; dissolved oxygen microprofiles measured in the dark for biofilms grown in 
(C) nitrate replete and (D) nitrate deplete conditions; and photosynthesis profiles 
extending from the surface for biofilms grown in (E) nitrate replete and (F) nitrate 
deplete conditions. Note that the biofilm surface position (depth = 0) is approximated by 
the position at which oxygen responses were measurable (subject to ± 25 µm error, or 
± 100 µm error for the photosynthesis profiles, where each data point is a representative 
gross volumetric photosynthesis rate from 2-3 replicates), and individual data points represent the mean 
values from 3-4 replicate profiles in both light and dark conditions. Error bars represent 
plus or minus one standard deviation. Dotted lines indicate the photic-zone termination 
depth, estimated from the light:dark shift method. Note the scale change on the x-axis. .......... 349 
Figure H.1 Growth curve from log-transformed data showing the exponential phase (days 
3-10) and stationary phase (days 11-18). Insert: Equations and R² describing the 
exponential phase for biofilms with and without bicarbonate amendment. ................................ 365 
Figure H.2 Growth curves for attached and suspended microalgae (A) and dissolved 
inorganic carbon (DIC) concentrations (B) in laboratory-RABRs amended with 
bicarbonate and without bicarbonate addition. Error bars for algal biofilm yield and DIC 
measurements represent standard deviation (n=4). Error bars for suspended growth 
represent range (n=2). Vertical dotted lines represent the end of the 5-day hydraulic retention 
time. ............................................................................................................................................ 367 
Figure H.3 Ammonium, nitrate, nitrite, and phosphate ion concentrations in medium 
amended with bicarbonate and without bicarbonate addition. Error bars represent range 
(n=2). Vertical dotted lines represent the end of the 5-day hydraulic retention time. ................. 369 
Figure H.4 Steady state oxygen microprofiles for illuminated algal biofilms under 
nitrogen replete (A) and nitrogen deprived (B) conditions. Error bars represent standard 
deviation of replicate profiles (n=3); steady state oxygen microprofiles in the dark for 
algal biofilms under nitrogen replete (C) and nitrogen deprived (D) conditions. Error bars 
represent standard deviation of replicate profiles (n=3); and representative photosynthesis 
profiles for algal biofilms under nitrogen replete (E) and nitrogen deprived (F) 
conditions. Zero depth (surface) is at the algal biofilm/air interface. ......................................... 371 
Figure H.5 Total FAMEs and free fatty acid composition of the FAMEs. A: Mean percent 
FAME (w/w), B: percent lipids (w/w), C: areal concentration (g m⁻²). Error bars 
represent range (n=2). ND and NR represent nitrogen deprived and replete algal biofilms, 
respectively. ................................................................................................................................ 378 
 
  
 
GLOSSARY 
 
YNP   Yellowstone National Park 
FAME   Fatty acid methyl ester 
TAG   Triacylglycerol 
DIC   Dissolved inorganic carbon 
DCW   Dry cell weight 
GC-MS  Gas chromatography-mass spectrometry 
PAR   Photosynthetically active radiation 
BP   Biofuel potential 
rfu   Relative fluorescence units 
AFDW  Ash free dry weight 
NGS   Next generation sequencing 
ASP   Aquatic Species Program 
GB   Gigabyte 
UTR   Untranslated region 
EST   Expressed sequence tag 
ZMW   Zero-mode waveguide 
SNA   Single nucleotide addition 
PPi   Pyrophosphate 
SBS   Sequencing by synthesis 
TIRF   Total internal reflection fluorescence 
HMW   High molecular weight (DNA) 
OTU   Operational taxonomic unit 
 
 
  
 
ABSTRACT 
Alternatives are needed to avoid future economic and environmental impacts from the continued 
exploration, harvesting, transport, and combustion of conventional hydrocarbons, which results in a 
rise in atmospheric CO2. Microalgae, including diatoms, are eukaryotic photoautotrophs that can 
utilize inorganic carbon (e.g., CO2) as a carbon source and sunlight as an energy source, and 
many microalgae can store carbon and energy in the form of neutral lipids. In addition to 
accumulating useful precursors for biofuels and chemical feedstocks, the use of autotrophic 
microorganisms can further contribute to reduced CO2 emissions through utilization of 
atmospheric CO2. Most microalgal biofuel research has focused on green algae. However, there 
are good reasons to consider diatoms for biofuel research. Diatoms are responsible for 
approximately 40% of marine primary productivity, are important in freshwater systems, and are 
known to assimilate 20% of global CO2. Identification and implementation of factors that can
contribute to rapid growth will minimize inputs and production costs, thus improving algal 
biofuel viability. Nine green algae strains that were isolated from Witch Creek, Yellowstone 
National Park, were compared to two culture collection strains (PC-3 and UTEX395) for growth 
rates, dry cell weights and lipid accumulation. The strains exhibiting the fastest growth rates 
were WC-5, WC-1 and WC-2b. The culture collection strain was the best biomass producer and 
WC-5 and UTEX395 were the most productive for lipid. Based on the growth rates and lipid 
content, the best strains for biodiesel production were WC-1 and WC-5. In addition to the green 
algae strains, the diatom strain RGd-1 has previously been found to accumulate 30-40% (w/w)
triacylglycerol and 70-80% (w/w) fatty acid methyl esters that can be transesterified into
biodiesel. The RGd-1 genome was sequenced via Illumina 2x50 and PacBio RSII reads, and genome
comparisons revealed that the RGd-1 genome is significantly divergent from other publicly 
available genome sequences. RGd-1 was found to have nearly complete metabolic pathways for 
fatty acid elongation using acetyl-CoA in the mitochondrion or malonyl-CoA in the cytoplasm. 
The ability to switch between two different starting substrates may confer an advantage for fatty 
acid and neutral lipid biosynthesis. Further, RGd-1 was found to use the glyoxylate shunt as part 
of its central carbon metabolism. This carbon conservation pathway may potentially explain why 
RGd-1 is able to produce high concentrations of lipids. Using Illumina® MiSeq sequencing, it
was possible to obtain thorough community analysis of bacteria associated with RGd-1 in 
culture. Nine primary taxa were identified and further research will elucidate their roles as 
potential phycosphere bacteria that may have specific functional roles that contribute to RGd-1 
health. With long-range PacBio reads, RGd-1 was found to have a potential bacterial symbiont, 
Brevundimonas sp.  
CHAPTER ONE 
INTRODUCTION 
Background 
As the world population continues to increase, especially in countries with high energy 
needs such as the United States, China, and India, transportation fuel demand will increase,1 
likely exceeding production.2 However, the search for alternative fuels has been impeded by 
reluctance to invest in technologies that cannot compete with low oil prices. For instance, the 
U.S. Department of Energy’s “Aquatic Species Program” (ASP), after screening more than 3,000 
green algae and diatom species as microalgal biofuel candidates, was cut because the cost of 
algal biofuel production was higher than that of petrofuels production.3 
Improving algal biofuel viability is beneficial for renewable fuel resources and 
minimizing US dependence on non-renewable fossil fuels. Regardless of current market 
conditions and availability of conventional petroleum sources, alternatives are needed to avoid 
future economic and environmental impacts from continued exploration and harvesting of 
conventional hydrocarbons. Microalgae, including diatoms, are eukaryotic photoautotrophs that 
can utilize inorganic carbon (e.g., CO2) as a carbon source and sunlight as an energy source, and 
many microalgae can store carbon and energy in the form of neutral lipids. In addition to 
accumulating useful precursors for biofuels and chemical feed-stocks, the use of autotrophic 
microorganisms can further contribute to reduced CO2 emissions through utilization of 
atmospheric CO2.4 For these reasons, microalgae have been studied in the context of lipid 
accumulation for over 50 years. Even with conservative estimates of lipid accumulations (25–
30% [w/w]), microalgae could replace 50% of U.S. transportation fuel need with an area 
equivalent to 3% of U.S. arable cropland.5, 6 With revived interest in microalgal fuels,7-9 
significant fundamental and applied research is needed to fully maximize biomass and 
biochemical production for biofuels and renewable biochemicals.4  
Most microalgal biofuel research has focused on green algae; however, diatoms are
unique photoautotrophs that also can produce lipids. Diatoms are responsible for approximately 
40% of marine primary productivity and are known to assimilate 25–45% of global CO2.10-12 To 
provide perspective on how much CO2 diatoms fix per year, the Amazon Rainforest fixes 
approximately 2 gigatons of CO2 from the atmosphere each year compared to the 50 gigatons 
fixed each year by diatoms globally.13, 14 Further, in addition to high lipid accumulation, diatoms 
can accumulate high concentrations of other carbonaceous compounds useful for production of 
renewable fuels and high value coproducts.15 
Identification and implementation of factors that will contribute to rapid algal/diatom 
growth while minimizing inputs into algal growth systems and production costs will improve 
algal biofuel viability. Here, I focused on screening 11 green algae strains for biodiesel 
production while sequencing and assembling the RGd-1 genome to identify potential novel genes 
and pathways that are responsible for RGd-1 accumulating high concentrations of lipids. 
However, given some of the unique characteristics of diatoms and the low number of available 
diatom genomes, the sequencing, assembly, and analysis of diatom genomes is not without 
challenges.   
Difference between Prokaryotic and Eukaryotic Genome Projects 
Eukaryotic genomes are inherently more complex than bacterial and archaeal genomes. 
The smaller genomes of bacteria and archaea are relatively simple to sequence and assemble. 
Unlike bacteria and archaea, eukaryotes contain introns and long repetitive regions that are
difficult to sequence and assemble, and they undergo alternative splicing.16 The final mRNA
contains a selection of exons that may differ among growth conditions, and the products of
alternative splicing represent different isoforms of a gene.16 Furthermore, eukaryotic genomes
may have multiple chromosomes and be polyploid, which can make the creation of one reference 
genome assembly difficult.16 As a result, there are extra challenges associated with creating a 
reference assembly for eukaryotic genomes, especially if there is not a reference available for a 
similar strain.  
Sequencing Types 
Sequencing by Synthesis 
Next-generation sequencing was revolutionized by high-throughput sequencing by
synthesis (SBS) technologies. In SBS, the addition of each labelled nucleotide is tracked. Here
we will focus on two types of SBS technologies: pyrosequencing (454 Pyrosequencing) and
sequencing by reversible termination (Illumina).17
454 Pyrosequencing 
454 Pyrosequencing was the first next-generation sequencing technology.18 It is a
bioluminescent, single-nucleotide addition (SNA) method: each nucleotide incorporation
releases pyrophosphate (PPi), which ATP sulphurylase converts to ATP using adenosine
5′-phosphosulfate.19 In the presence of ATP, luciferase converts luciferin to oxyluciferin,
generating light.19 Once the dispensed dNTPs have been incorporated, DNA polymerase extends the
primer and pauses.17 DNA synthesis continues when the next round of dNTPs is dispensed. The
DNA sequence is determined by the order of the bases and the intensity of the light produced by
the addition of each base.17 In 454 Pyrosequencing, there is only one signal that indicates the
addition of a nucleotide. Therefore, each base must be added individually at each position. The
light is imaged with a charge-coupled device (CCD).17 Insertions and deletions are the most
common error type, particularly in homopolymer regions, where the addition of five to six or
more identical bases cannot be detected accurately.17, 19 Overall, the accuracy of
454 Pyrosequencing is at least 99%.20
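The dependence of pyrosequencing base calls on light intensity can be illustrated with a minimal sketch. The Python function below is not the 454 base-calling software; the function name, the flow order, and the example intensities are assumptions made for illustration. It rounds each normalized light signal to the nearest integer to estimate how many identical bases were added during a flow.

```python
def decode_flowgram(flow_order, intensities):
    """Convert per-flow light intensities into a base sequence.

    flow_order  -- order in which dNTPs are dispensed, e.g. "TACG" (assumed)
    intensities -- normalized light signal for each flow (1.0 ~ one base)
    """
    bases = []
    for i, signal in enumerate(intensities):
        base = flow_order[i % len(flow_order)]
        # Round half up; the signal is proportional to the homopolymer length.
        count = max(0, int(signal + 0.5))
        bases.append(base * count)
    return "".join(bases)


# Hypothetical flowgram: two cycles of the flow order T, A, C, G.
example = [1.02, 0.03, 2.10, 0.90, 0.00, 1.00, 0.10, 0.05]
print(decode_flowgram("TACG", example))
```

Because the signal is rounded, a small amount of noise changes the call for long homopolymers: in this sketch a signal of 5.4 is read as five bases and 5.6 as six, which is one way to picture why insertions and deletions dominate the error profile.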
Illumina 
Illumina is perhaps the most successful of the short read technologies. Illumina uses 
cyclic reversible termination (CRT) technology for nucleotide incorporation, fluorescence 
imaging, and cleavage.17, 18 DNA fragments are clonally amplified by bridge PCR, or solid-phase 
amplification, where forward and reverse primers are attached to the flow cell.17, 19 First, DNA
polymerase incorporates a fluorescently labeled nucleotide that is complementary to the template
DNA. Incorporation of additional bases is temporarily blocked because the ribose 3′-OH group of
the incorporated nucleotide is capped by a 3′-O-azidomethyl group. Once the base has been incorporated,
the remaining unincorporated fluorescently labeled nucleotides are washed away. Each of the
four bases is labeled consistently with a specific color. The incorporated nucleotide is imaged to
determine which base was incorporated, after which the terminating and fluorescently labeled
groups are removed. Each of the four colors is detected by total internal reflection fluorescence
(TIRF) that utilizes two to four lasers, depending on the sequencer type.17, 18 Regeneration of the
3′-OH group occurs using a reducing agent such as tris(2-carboxyethyl)phosphine (TCEP).17
Substitutions are the most common error with this chemistry.  
 
 
Long-Read Technologies 
Pacific Biosciences. Unlike the cyclic reversible termination method utilized by Illumina, Pacific
Biosciences (PacBio) utilizes real-time, continuous base-addition measurements. In PacBio
chemistry, DNA polymerase is attached to the bottom of zero-mode waveguide (ZMW) detectors to
continuously measure the addition of each base in real time. While the polymerase holds a
nucleotide during incorporation, the fluorophore attached to its phosphate group is detected.17
Once the fluorophore-bearing phosphate group is cleaved off, the fluorescent signal decays to
background levels. Similar to Illumina technology, each fluorophore corresponds to a different
nucleotide. This NGS platform is known to incur significant sequencing errors (83%).17
However, with sequencing depth and error correction, PacBio reads have been found to be 99%
correct. The PacBio HiFi chemistry on the PacBio Sequel system is highly accurate (99%) and
requires less coverage: approximately 20X, compared to the 100X coverage needed on the RSII
system.21-24
Genome Assembly 
Scaffolding Approaches 
In recent years, there has been a renewed interest in optical-mapping technology to 
improve scaffolding for genome assemblies. BioNano Genomics offers long-read, low-resolution 
scaffolds generated by one or more restriction endonucleases,18 a method that is independent of 
sequencing. This technology employs restriction site mapping, in which fluorophores are
introduced at restriction sites. The restriction enzymes introduce nicks at specific
motifs in double-stranded DNA, creating a free 3′-OH. DNA polymerase I catalyzes the addition of
fluorescently labeled Alexa 546 dUTP at the 3′-OH that was introduced into the double-stranded
DNA by the restriction enzyme. The 5′-3′ exonuclease activity of the polymerase
removes nucleotides from the 5′ phosphoryl terminus of the nick. Labeled and unlabeled
nucleotides replace the excised nucleotides, displacing the original DNA strand. To
image the long DNA strands, the DNA is labeled using the intercalating dye YOYO-1. The
high-molecular-weight DNA fragments are aligned based on the position of the fluorescent restriction
sites, which appear as “dots on a string”, to ultimately create a “consensus genome map”.25, 26 A
reference assembly is then digested in silico using the same restriction enzyme that was used to
create the consensus genome map. The long reads can improve scaffolding or reveal
erroneously assembled regions, such as inversions, insertions, deletions, and gap regions, in
comparison to the reference assembly.25, 26
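The in silico digestion step can be sketched in a few lines of Python. This is an illustration only, not the BioNano software: the function names are hypothetical, and the motif GAATTC (the EcoRI recognition site) and the toy sequence are assumptions chosen for readability rather than the nicking enzyme or data used in this work. The sketch locates every occurrence of a recognition motif on either strand and reports the spacing between neighboring sites, which is the pattern that is compared against the "dots on a string" in the consensus genome map.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_sites(seq, motif):
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    sites = set()
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def site_spacings(sites):
    """Distances between consecutive label sites (the in silico 'map')."""
    return [b - a for a, b in zip(sites, sites[1:])]


# Toy contig with three copies of the illustrative motif.
contig = "AAGAATTCAAAAGAATTCTTGAATTC"
sites = label_sites(contig, "GAATTC")
print(sites, site_spacings(sites))
```

In practice the spacings from each assembled scaffold would be matched against the spacings observed in the optical map; a run of spacings that disagrees points to a candidate misassembly.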
The labeled DNA is linearized on an IrysChip flow cell once electrophoretic movement 
unravels the DNA. Once the DNA is linear, the current is shut off and the DNA is imaged. Read 
sizes may vary from hundreds of kilobases to megabase-sized reads.25, 26 Once imaging has 
finished, molecules are flushed and the process is repeated, producing several gigabytes (GB) of data per
hour. BioNano technology is a high-throughput technology that can process thousands of parallel
channels. Within each channel, only one long strand may be linearized at a time. Once 
linearized, the Irys CCD detector captures the images. 
Combining Technologies 
A recent assembly for the domestic goat (Capra hircus) used a combined strategy with 
PacBio, Illumina short-reads, BioNano maps, and Phase Genomics Hi-C chromatin interaction 
maps.27 Initial contigs were assembled using the PacBio reads, which were further scaffolded 
and error-corrected by the BioNano maps and finally combined with the Hi-C chromatin maps. 
Gaps were filled using PacBio reads, and the final assembly was polished using the Illumina
reads, resulting in a 2.92 Gb assembly with a contig N50 of 19 Mb and a scaffold N50 of 87 Mb. This
combination resulted in the best assembly.
Moll et al. (2016) used a similar approach to generate an assembly for Medicago truncatula,
a model organism (clover) that is used to study alfalfa.28 The methods were similar to the goat 
assembly; however, instead of using Illumina reads for error correction or as part of the genome 
assembly, Dovetail Genomics was used, a type of Hi-C mapping that uses Illumina reads with 
the Chicago library preparation. Here, long-range information was generated with high accuracy 
using the Dovetail Chicago Library preparation method that uses restriction endonucleases on 
high-molecular-weight DNA.28, 29  
What Is a Good Genome Assembly? 
  A number of factors, including read type (e.g., Illumina or PacBio), read length, read quality, and
library preparation, may contribute to an assembly.17 While there typically is not a standardized
protocol for the steps required for a genome assembly, the following measures indicate the
assembly quality:
(1) Short-read alignment. The assembly is validated by aligning short reads against
it.30 Generally, the higher the alignment percentage (≥80%), the better
the assembly captures the high-fidelity reads. A lower alignment percentage may indicate a
fragmented assembly that may cause problems with gene calling, as evidenced by
BUSCO (see below).
(2) N50. The length of the shortest contig in the set of longest contigs that together
represent at least 50% of the assembly. While there is no set standard for the N50 required for
a good genome, the higher the N50, the more contiguous the genome assembly.16 With long
reads generated from PacBio and scaffolding technologies such as BioNano,25 it is possible
to get N50s in the megabase range.
(3) Gene capture. Programs such as BUSCO (Benchmarking Universal Single-Copy 
Orthologs) assume that orthologs are present for at least 90% of the organisms within a 
lineage.31 When a genome assembly is tested against a lineage, it is possible to determine 
the number of single-copy, duplicated, missing, or fragmented orthologs in an 
assembly.31 Therefore, it is of utmost importance to choose a lineage that is relevant to 
the organism represented in the assembly.  
(4) Kmer analysis. Kmers are subsequences of length K that can provide insight into a genome
assembly, such as genome complexity, ploidy, heterozygosity, repeats, and
contamination.32-34 A kmer profile measures how often each kmer of length K occurs within
the short, unassembled reads.33 A comparison of kmer analyses between the reads and the
genome assembly can be used to determine what fraction of reads were incorporated into
the assembly.
(5) Percent GC. The GC content can be an indicator of sample purity or contamination. Two 
or more GC% peaks may be an indicator of more than one organism represented in the 
reads, e.g. 43% and 58%. Alternatively, the extra peaks may be over-represented 
organellar reads. 
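
As a worked illustration of measure (2), the N50 can be computed directly from a list of contig lengths. The Python sketch below is a minimal example under the usual definition (sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size); the function name and the contig lengths are hypothetical and are not taken from the RGd-1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running total to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical assembly of six contigs totalling 290 kb.
contigs_kb = [80, 70, 50, 40, 30, 20]
print(n50(contigs_kb))  # 70: the two longest contigs (150 kb) exceed 145 kb
```

The same function applies to scaffold lengths, which is how a contig N50 and a scaffold N50 can be reported separately for one assembly.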
 
Using the criteria outlined above, it is possible to determine whether different experimental
strategies are required for strain isolation, whether additional sequencing is needed to improve the
assembly, or whether a different sequencing approach is needed to account for a more complex
community than anticipated.
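
Measures (4) and (5) can also be illustrated with a short Python sketch. Dedicated kmer counters are used for real read sets; the version below is only a minimal, in-memory illustration, and the function names and toy reads are assumptions. It counts every kmer of length K in a set of reads, summarizes how many distinct kmers occur at each multiplicity, and computes the percent GC of a sequence.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every kmer of length k across a list of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct kmers seen that many times."""
    return Counter(counts.values())


def gc_percent(seq):
    """Percent G+C among the unambiguous (A, C, G, T) bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


reads = ["ACGTAC", "CGTACG"]          # toy reads, not real data
counts = kmer_counts(reads, k=3)
print(kmer_spectrum(counts))           # each of the four 3-mers occurs twice
print([gc_percent(r) for r in reads])
```

Plotting a histogram of per-read percent GC values is one simple way to look for the multiple peaks described in measure (5).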
Eukaryotic Genome Annotation 
 Eukaryotic genome annotation using the sequence assembly alone is less accurate than
in prokaryotes. The reasons are multifactorial: the presence of introns, alternative
splicing, and long-terminal-repeat (LTR) retrotransposons that may resemble open reading frames.
Unlike in prokaryotes, ab initio gene prediction in eukaryotes using the sequence assembly alone
cannot reliably identify intron/exon boundaries, untranslated regions (UTR), or alternative splice
sites. Further, regions of LTR retrotransposons can be misidentified as protein-coding sequences.16
Evidence-based methods, such as expressed sequence tags (EST), RNA-seq, or proteomics, are 
required to identify these sites. For example, approximately 86% of the P. tricornutum genome 
assembly annotation was supported by the 130,000 ESTs acquired from 16 different growth 
conditions.35  
 Whole-genome alignments show significant divergence among the diatoms
whose genomes are available. For instance, the P. tricornutum and T. pseudonana (both marine
diatoms) genomes are 57% identical at the nucleotide level.36 While it is expected that there will
be greater sequence similarity between RGd-1 and P. tricornutum due to their pennate cell
morphology, we do not expect the genome-wide sequence similarity to be high enough to use P.
tricornutum or T. pseudonana transcriptome or protein files to facilitate genome annotation.
Conclusion 
At present, there are only two published, publicly available diatom genomes
(Phaeodactylum tricornutum and Thalassiosira pseudonana) with quality assemblies.35-38
The Cyclotella cryptica genome was published in 2016 but has since been
removed from the UCSC Genome Browser, where it had been deposited.37 Two other
diatom projects (Fragilariopsis cylindrus and Pseudo-nitzschia multiseries) are in draft form.38-40
P. tricornutum and T. pseudonana are the most advanced and characterized diatom genomes
available for study.35, 36
 With only two diatom genomes (Phaeodactylum tricornutum and Thalassiosira 
pseudonana) that are assembled to the chromosome level and several other diatom genomes in 
draft, RGd-1 represents a novel genome from an extremophilic, alkaline, freshwater stream in 
Yellowstone National Park (YNP). Analysis of the RGd-1 genome will provide insight into 
novel genes and specialized pathways that may allow RGd-1 to accumulate high lipid 
concentrations. 
Dissertation Overview 
To date, two marine diatoms have had their genomes sequenced and
made publicly available. The RGd-1 genome is markedly different from that of any other diatom
sequenced because RGd-1 is an extremophilic, freshwater diatom. Furthermore, with long-range
PacBio reads, RGd-1 was found to have a potential bacterial symbiont, Brevundimonas sp. Using
Illumina® MiSeq sequencing, it was possible to obtain a community analysis of bacteria
associated with RGd-1 in culture. A total of nine prokaryotic operational taxonomic units (OTUs)
were identified, and further research could elucidate their roles as potential phycosphere bacteria,
which may have specific functional roles with RGd-1 in the area around an algal cell.41-48
Following the introduction in Chapter One, Chapter Two titled “Biodiesel (Microalgae)” 
from the book, “Extremophilic Microbial Processing of Lignocellulosic Feedstocks to Biofuels, 
Value-Added Products, and Usable Power”, provides an overview of extremophilic algae, 
isolation, and their use for biodiesel production and secondary high-value coproducts.  
 Transitioning from a broad overview of extremophilic algae for use in biofuels, Chapter 
Three discusses how to isolate strains, determine whether they are unialgal using sequencing 
methods and determine which strains are promising for further optimization through 
physiological characterization. Specifically, 11 strains that were isolated from Witch Creek, 
Yellowstone National Park, WY, USA, and two culture-collection strains were grown with and 
without sodium bicarbonate addition. To facilitate the analysis, the strains were divided into 
fastest growers, lipid producers, and biomass producers to determine if they were promising for 
biodiesel production or other applications. 
 Moving from the physiological level to the genomic level, Chapter Four discusses RGd-1
genome sequencing, assembly, and annotation. This organism was chosen because it was found
to naturally contain 30-40% (w/w) triacylglycerol and 70-80% (w/w) fatty acid methyl esters that
can be transesterified into biodiesel,49 among the highest biofuel potential (BP) published in the
literature. The majority of this chapter discusses the RGd-1 genome and the 16S amplicon
sequencing results for the microbial community in the RGd-1 culture that is thought to constitute the
phycosphere. Of particular focus is the Brevundimonas sp. genome that was assembled as part of an
RGd-1 PacBio sequencing project. Collectively, a host genome and the genomes of its associated
community members are referred to as the hologenome.50 With the RGd-1
genome assembly and data from the 16S amplicon sequencing, it is possible to gain insight into
RGd-1 ecological interactions, ecology, and evolution.
The Brevundimonas sp. genome is introduced in Chapter Four, and Chapter Five builds on
this by providing the genome announcement for the Brevundimonas sp., in preparation for
publication, including its genome assembly statistics and genes of interest that were identified
from the genome annotation. Finally, Chapter Six summarizes the work performed in this
dissertation and proposes future directions.
  
CHAPTER TWO 
BIODIESEL (MICROALGAE)  
Contribution of Authors and Co-Authors 
Manuscript in Chapter 2 
 
Author: Karen Moll 
 
Contributions: Wrote the book chapter 
 
Co-Author: Todd Pederson 
 
Contributions: Wrote the book chapter  
 
Co-Author: Robert D. Gardner 
 
Contributions: Wrote the book chapter 
 
Co-Author: Brent M. Peyton 
 
Contributions: Discussed, commented, and edited the book chapter 
  
Manuscript Information 
Karen Moll, Todd Pederson, Robert D. Gardner, Brent M. Peyton 
 
“Extremophilic Microbial Processing of Lignocellulosic Feedstocks to Biofuels, Value-Added
Products, and Usable Power.” Chapter 4, “Biodiesel (Microalgae).” D.R. Sani, Editor. 2018, pp
63-78, Springer: New York, NY.
 
Status of Manuscript:  
____Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
__x_ Published in a peer-reviewed journal 
  
Introduction 
 Algal biomass represents a promising renewable energy system due to fast 
photoautotrophic growth rates, CO2 fixation, and accumulation of carbon storage metabolites 
which can be used as precursors for fuels and specialty chemicals; moreover, it offers a solution 
to offset the global dependence on conventional fuels. Currently, transportation fuels make up 
approximately 66% of the global energy demand.51, 52 Correspondingly, extensive use of
non-renewable energy sources has increased the global carbon dioxide (CO2) concentration by
approximately 43% since the use of these fuels intensified significantly during the Industrial
Revolution, around 1750 (USEPA 2016). Fossil fuels are subject to volatile price swings based
on geopolitical issues and availability of crude oil. As sources of crude oil become depleted, 
prices associated with fuels and other petroleum-derived products will experience rapid increases 
in response. Biofuels and other bio-products derived from microalgae have potential to 
contribute significantly to this market, yet how large their impact on the market will be remains 
to be seen.52 
 Microalgae are oxygenic, phototrophic eukaryotes, which are abundantly found across 
diverse environments ranging from acidic hot springs to arctic ice and snow. Like other 
phototrophs, microalgae require light energy, water and a few inorganic nutrients (carbon 
dioxide, nitrogen, phosphorus, iron, etc.) which they convert to biomass with diverse 
biochemical composition.53 There are several advantages to utilizing microalgae for biofuel
production. First, microalgae have higher theoretical photosynthetic efficiency (10–12%) than
terrestrial plants (4–6%) and high cell division rates (1–3 day-1), which lead to improved
biomass yield per unit area. The ability to cultivate microalgae continuously
year-round further improves their productivity over terrestrial plants.54 Additionally, microalgae
can be grown using brackish, salt, and wastewater sources reducing their demand for freshwater, 
and can utilize nitrogen and phosphorus from agricultural, industrial, and municipal wastewaters 
as low-cost nutrient sources and as a method for remediation of the wastewater.55 Furthermore, 
algae have the potential to reduce carbon emissions if co-located with a power plant to sequester 
portions of the emitted CO2 before it enters the atmosphere.56 Lastly, and perhaps most 
importantly, microalgae frequently have higher lipid content than terrestrial plants,57 which, in
combination with the other added benefits, often results in higher biofuel productivities on a per
biomass basis. The inherent advantages to using microalgae for sustainable biofuel and bio-
product formation are well known, though several bottlenecks still exist on the path from lab 
bench to full-scale production of microalgae. 
 One of the primary challenges associated with scale-up of biofuel production is algal 
species selection. For rapid growth in large-scale open systems, a robust species that tolerates 
moderate temperature, pH and salinity changes must be selected to keep productivity high. 
Extremophilic algae are a compelling choice for biofuel production because of their innate 
ability to survive and even thrive on the boundary of extreme conditions.58 Extremophilic 
microalgal strains have an added benefit of growing in conditions that inhibit growth of many 
competing microorganisms, which may allow higher biofuel productivity of targeted strains. 
Some strains isolated from alkaline or halophilic environments have been shown to contain very 
high concentrations of lipids, primarily in the form of triacylglycerol (TAG). Further, alkaline 
environments have greater flux of atmospheric CO2 into the algal growth medium, thus 
increasing inorganic carbon available for fixation. Therefore, extremophilic microalgal strains
have the potential to improve algal biofuel viability by providing more cost-effective
production with a greater potential for algal biodiesel productivity and decreased probability of
significant contamination.
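
The link between pH and the form of inorganic carbon available to algae can be illustrated with the standard carbonate equilibria. The Python sketch below is an illustration under stated assumptions, not a model of any specific growth medium: it assumes dilute freshwater at 25 °C with pKa values of 6.35 and 10.33 (textbook values that are not taken from this chapter and that shift with temperature and salinity), and the function name is hypothetical. It computes the fraction of dissolved inorganic carbon (DIC) present as dissolved CO2, bicarbonate, and carbonate at a given pH.

```python
def dic_fractions(ph, pk1=6.35, pk2=10.33):
    """Fractions of DIC present as CO2(aq), HCO3-, and CO3^2- at a given pH.

    pk1 and pk2 are assumed freshwater values at 25 degrees C.
    """
    h = 10.0 ** (-ph)
    k1 = 10.0 ** (-pk1)
    k2 = 10.0 ** (-pk2)
    denom = h * h + k1 * h + k1 * k2
    return {
        "CO2": h * h / denom,
        "HCO3-": k1 * h / denom,
        "CO3^2-": k1 * k2 / denom,
    }


for ph in (6.0, 8.0, 10.0):
    fractions = dic_fractions(ph)
    print(ph, {name: round(value, 3) for name, value in fractions.items()})
```

Under these assumptions, almost none of the DIC remains as dissolved CO2 at alkaline pH; it is held as bicarbonate and carbonate, so an alkaline medium in contact with air can hold a much larger total inorganic carbon pool than a neutral one.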
Microalgae 
 Algal biofuels are derived from two predominant groups, green algae and diatoms, both 
of which are unicellular, photosynthetic eukaryotes (Figure 2.1). Some strains are known to store 
high concentrations of triacylglycerol (TAG) that can be converted into biodiesel. The diatom strain
RGd-1 was found to produce 30–40% (w/w) TAG and 70–80% (w/w) biofuel potential (BP) on an
ash-free dry weight basis.49 An isolate from the Heart Lake area of Yellowstone National Park, RGd-1
is able to grow in exceptionally high silica concentrations that are often inhibitory to marine
diatoms.59 Moll et al. (2014) found that RGd-1 maintained the best growth and TAG
accumulation when grown in 2 mM Si, which is roughly an order of magnitude greater than the
silica concentration in seawater.49 They went on to further stimulate TAG accumulation by
adding 25 mM NaHCO3 just prior to nutrient depletion. The greatest lipid accumulation occurred
during the stress of a combined Si and NO3- limitation with NaHCO3 addition, which yielded a
nearly two-fold increase in TAG accumulation compared to Si limitation alone (Figure 2.2).
Further, NaHCO3 addition increased TAG accumulation compared to only nutrient limitation.49 
 
Figure 2.1 YNP diatom strain RGd-1 (left) and YNP green algal WC-1 (right, scale bar 10µm).  
Chlorophytes (green algae) are thought to utilize the C3 photosynthesis pathway for carbon 
fixation, whereas diatoms, including Thalassiosira pseudonana, are thought to use the C3 and C4 
pathways.36, 60, 61 However, both mechanisms utilize ribulose-1,5-bisphosphate carboxylase 
oxygenase (RuBisCo) to catalyze the first step in the Calvin cycle. RuBisCo has a relatively low 
affinity for CO2 and is less than half saturated under normal atmospheric conditions.62 As a
consequence, microalgae have evolved carbon concentrating mechanisms (CCMs) to increase
the carbon flux to RuBisCo.63-65 Algal CCMs are essentially two-phase processes. In the first
phase, inorganic carbon is acquired from the environment and shuttled to the chloroplast. During
the second phase, the HCO3- concentration is increased in the chloroplast stroma.65
Microalgae have a number of carbonic anhydrases and bicarbonate transport channels to
move inorganic carbon across the periplasmic membrane, through the cytosol, into the
chloroplast, and convert the carbon to CO2 in the direct vicinity of the RuBisCo in the pyrenoid.
 
Figure 2.2 Each bar represents the fold difference in Nile Red fluorescence intensities at 15 d for 
each treatment compared to the 2 mM Si control. 
Interestingly, C4 pathways have the extra ability to convert HCO3- directly to a C4 organic acid
molecule, which is shuttled to the pyrenoid and reconverted to CO2 to be used by RuBisCo.61, 66
 Alkaline environments (e.g., Soap Lake, Washington) often have high bicarbonate ion
concentrations. Given the current understanding of CCMs, it is not surprising that soda lakes are
highly photosynthetically productive. Organisms that thrive in these extreme environments have 
physiological adaptations that allow them to be successful under conditions that would be lethal 
to other microorganisms. Diatoms are uniquely suited to living in alkaline environments due to 
their C4 metabolism. Evidence from the Phaeodactylum tricornutum and Thalassiosira 
pseudonana genomes indicates a propensity for C4 metabolism. Valenzuela et al. (2012) found 
evidence for P. tricornutum using C3 and C4 metabolism when dissolved inorganic carbon 
concentrations were low.67 Further, they found an increase in expression for P. tricornutum 
pyruvate carboxylase, malic enzyme and malate dehydrogenase which indicates the presence of 
the C4 pathway.67 This is advantageous by providing another pathway for CO2 fixation, 
especially given that C4 carboxylases are high affinity molecules allowing carbon to be 
concentrated in the chloroplast. As more extremophilic microalgae are isolated, identified and 
characterized, further advances in biotechnology for biofuels and renewable biochemicals will 
become available.  
 Under replete growth conditions, microalgae capable of TAG accumulation will synthesize
TAG during light hours and utilize the stored carbon for cellular maintenance during dark
hours.68-70 However, when the cellular cycling is stressed or arrested due to nutrient limitation,
environmental stress (e.g., pH, light, temperature stress),71-74 or chemical addition,75, 76 many
algal strains will accumulate and maintain TAG vacuoles within the cell. Thus, industrial algal
biofuel systems producing TAG as a biofuel substrate have to balance rapid growth with a means
of impeding the cell cycle when the culture has reached a desired density.77, 78 Typically, this is
accomplished by timing cellular density with the depletion of nitrogen in the growth medium;
however, this can often make the culture susceptible to contamination or predation by other
microorganisms. The use of extremophilic strains as an industrial algal biofuel platform is
an understudied tactic and merits additional investigation focused on cellular cycling and TAG
accumulation.
Extreme Environments 
 Microalgae have been isolated from extreme environments such as Arctic/Antarctic 
regions and acidic hot springs, as well as from alkaline and/or hypersaline environments. 
Examples of such environments include Yellowstone National Park, Wyoming; Soap Lake, 
Washington; Mono Lake, California; Great Salt Lake, Utah; and the East African Soda Lakes. In
addition to providing a selective advantage for algal growth, increased pH increases
CO2 solubility, leading to enhanced algal growth.79 Soda lakes accumulate very high
concentrations of sodium carbonate salts due to the limited Mg2+ and Ca2+ concentrations, with
pH values of 8-12.80 The East African Soda Lakes are among the most productive lakes in the world,
with gross photosynthetic rates up to 36 g O2 m-2 day-1 for Lake Nakuru, Kenya.81 Another example
of a soda lake from which high-lipid strains have been isolated is Soap Lake,
Washington, which has a pH of 9.9 and contains very high concentrations of sodium carbonate and
sodium bicarbonate at 6870 mg L-1 (0.7%) and 5209 mg L-1 (0.5%), respectively.82 Halophiles
are uniquely adapted to their environments by keeping high concentrations of intracellular K+ to
compensate for the high extracellular Na+ concentrations. Pick et al. (1986) found that when
Dunaliella salina was grown in 1-4 M NaCl, intracellular Na+ concentrations were 20-100 mM
and K+ concentrations were 150-250 mM.83  
 Witch Creek is an alkaline, freshwater creek located in the Heart Lake area of 
Yellowstone National Park (WY, USA). Witch Creek is approximately two miles long and is fed 
by a combination of fresh groundwater and effluent channels (Figure 2.3) from alkaline hot 
springs with high concentrations of metals such as arsenic (~300 ppb) and silicon (~72 ppm). 
Regular inputs from these thermal features make Witch Creek alkaline, supporting the growth of
microorganisms, including microalgae, that are adapted to the alkaline conditions. Such
microalgae have been isolated and characterized for biofuel applications.
 
 
Figure 2.3 Inputs from thermal hot springs into Witch Creek (research in Yellowstone was 
conducted under an approved Yellowstone Research Permit [Permit # 5480]). 
Targeting Extremophiles 
 Extremophilic microalgae are those considered to have improved growth outside of
“normal” environments. For microalgae, these “normal” environmental parameters are
defined by Seckbach (2007) as an optimum temperature range of 4–40°C, a pH range of 5–8.5,
and a salinity range between that of fresh and salt water (0%–3.5%). The bulk of extremophilic
microalgae species fall into the alkaliphile, acidophile, or halophile classifications, although
there are thermophilic microalgae as well.58 Although the majority of microalgae species find
their optimum growth somewhere in these defined ranges, extremophilic species have been
targeted for use in biofuel production because it is generally accepted that formation of lipids
and other products increases when the cell cycle is arrested and environmental stresses are
imposed.15, 84, 85 Table 2.1 highlights several extremophilic algae including acidophiles,
alkaliphiles, psychrophiles, thermophiles and halophiles, and the conditions from which they
were isolated.  
 
Table 2.1 Examples of extremophilic microalgae and their desirable temperature, pH, and 
salinity conditions with each having at least one environmental condition outside of normal 
range.  
                                                 Extremophilic Condition
Extremophilic Organism                           Temp       pH     Salinity  Reference
Dunaliella salina                                0-38°C     6-9    3-31%     Borowitzka (1990)
Cyanidium caldarium                              35-55°C    2-3    <3%       Doemel and Brock (1971)
Yellowstone Diatom Isolate RGd-1                 28°C       9.3    <3%       Moll et al. (2014)
Yellowstone Green Isolate WC-1 Scenedesmus sp.   24°C       9.3    <3%       Gardner et al. (2010)
Chlamydomonas nivalis                            1.5-20°C   6-8    <3%       Remias et al. (2005)
Aphanothece halophytica                          30°C       7.5    15-30%    Madigan et al. (2008)
Cyanidioschyzon sp.                              45-55°C    2.5    <3%       Skorupa et al. (2014)
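
The comparison behind Table 2.1 can be sketched in a few lines of Python. This is only an illustration, assuming the "normal" windows quoted above from Seckbach (2007); the function name `flag_extremes`, the dictionary keys, and the choice to enter "<3%" salinity as the range 0–3% are assumptions made for this example and are not part of the original work. The flags report which reported condition leaves the normal window, not a formal taxonomic classification.

```python
# "Normal" windows as quoted in the text from Seckbach (2007).
NORMAL = {
    "temp_c": (4.0, 40.0),
    "ph": (5.0, 8.5),
    "salinity_pct": (0.0, 3.5),
}


def flag_extremes(conditions):
    """Map each parameter to 'below', 'above', 'both', or 'within'.

    `conditions` maps a parameter name to a (low, high) tuple; a single
    reported value is entered as (value, value).
    """
    flags = {}
    for name, (low, high) in conditions.items():
        normal_low, normal_high = NORMAL[name]
        below = low < normal_low
        above = high > normal_high
        if below and above:
            flags[name] = "both"
        elif below:
            flags[name] = "below"
        elif above:
            flags[name] = "above"
        else:
            flags[name] = "within"
    return flags


# Values taken from Table 2.1; "<3%" salinity is entered as (0, 3) for illustration.
cyanidium = {"temp_c": (35, 55), "ph": (2, 3), "salinity_pct": (0, 3)}
rgd1 = {"temp_c": (28, 28), "ph": (9.3, 9.3), "salinity_pct": (0, 3)}

print(flag_extremes(cyanidium))  # temperature above, pH below the normal window
print(flag_extremes(rgd1))       # only pH lies above the normal window
```

Each organism in the table has at least one parameter flagged as outside the window, which is the criterion stated in the table caption.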
 
 
Bioprospecting 
 Bioprospecting for potential strains that can be used for biodiesel production begins by
matching desired growth conditions (e.g., high pH and salinity values) with natural environments
that contain those conditions. Because locations for microbiological sampling are on either
public or private property, written permission should be obtained before any samples are
collected. To locate extremophilic microalgae, target environments with a pH value
below 5 or above 8.5, and/or a salinity above that of salt water (3.5% [w/w]) up to sodium
chloride saturation (35% [w/w]). For temperature, the upper limit for microalgal
(eukaryotic) growth is approximately 57°C.58 However, around 45°C phototrophic growth may
be dominated by cyanobacterial species. Therefore, with regard to temperature, microalgal
growth will be primarily in the mesophilic range (20-45°C).86 As shown in Figure 2.4, microbial
and microalgal communities can change significantly over very short distances due to gradients
in temperature, pH, salinity, or nutrient availability; therefore, care should be taken to
characterize specific sampling locations for optimum selection of targeted microorganisms. Once
areas have been targeted for sampling and written permission is obtained, samples can be taken
from the area of interest and returned to the lab for isolation.  
 
Figure 2.4 Typical heterogeneity of a sampling site containing green algae, diatoms and 
cyanobacteria (research in Yellowstone was conducted under an approved Yellowstone Research 
Permit [Permit # 5480]). 
In the approach recommended here, samples should be disaggregated and inoculated into 5 mL 
of various microalgal media types (e.g., Bold’s Basal Medium) to determine which would 
provide the best conditions for growth. Typically, standard algal growth media are not ideal for
isolation of extremophilic microalgae and must be adjusted to a higher or lower pH or a higher
salinity, or cultures must be incubated at a higher or lower temperature. Samples should also be
streaked for isolation on the appropriate solid growth media. Once the colonies have grown to 
sufficient size, individual, isolated colonies should be aseptically “picked” from the agar plate 
and inoculated into liquid medium (1 mL) and grown until they change the color of the medium 
(often green or brown, for green algae or diatoms, respectively), and subsequently transferred to 
liquid medium (5 mL). After approximately two weeks or once they have reached substantial 
growth, cultures should be streaked for isolation again and re-picked from agar plates for a total 
of three rounds of streaking for isolation and transferring to new medium to ensure isolation 
from other algal species. Following each round of isolation, strains should be observed under 
transmitted light microscopy to determine the cellular morphology.  
 Each isolated strain should be screened for TAG content by staining with Nile Red dye or 
Bodipy 505-515 and observed using epifluorescence microscopy.87 Lipid vacuoles fluoresce
bright yellow when stained with Nile Red dye (Figure 2.5) or green when
stained with Bodipy. Strains that have the highest TAG content should be selected for further
characterization, since these will likely have the highest biodiesel potential. Strains found to have 
high concentrations of TAG during the screening process should also have their growth rates 
measured, since a combination of fast growth rate and high lipid production is desirable. Strains
that show fast growth rates and high TAG production should be inoculated into approximately 150
mL of liquid medium in 250 mL baffled flasks and grown in triplicate at various temperatures to
determine the optimal growth and TAG accumulation conditions for each strain.  
 
 
Figure 2.5 RGd-1 transmitted light (left) and Nile Red fluorescence under epifluorescent light 
(right). 
Algae as Biofuels 
 Manufacturing of biofuels and bioproducts from microalgae at industrial scale has many
proposed methodologies for start-to-finish generation of targeted end products. Similar to a 
conventional petroleum refinery which makes multiple products and fuels from crude petroleum, 
a biorefinery would produce biofuels and other products from algal biomass. An operation such 
as this would have a variety of different steps, but the major sequences are as follows: cultivation 
of algae for biomass generation, harvesting of algal biomass, and extraction/conversion of algal 
biomass. Significant life cycle analyses and techno-economic analyses will be needed with 
regard to each step of this process to determine the most advantageous strategy for each phase of 
the operation.88 
 Primary cultivation systems for the production of algal biomass are raceway ponds
(Figure 2.6) and photobioreactors (Figure 2.7). Raceway ponds are typically large
closed-loop ponds in which microalgal culture is continuously circulated through a designed
path, generally with the use of revolving paddle wheels.5 Algae are most often grown in large
outdoor raceway ponds because they are one of the most cost-efficient ways to grow large
quantities of algae;56 however, raceway ponds can suffer from large evaporative losses and poor
mass transfer properties for the application of CO2.89 Still, the major caveat of this type of system
is that it is non-sterile and has a potential for undesirable contamination from faster growing
microorganisms that are not biofuel productive. One advantage of using extremophilic algae in
outdoor raceway ponds, which are relatively low-cost to build and operate, is that pond
conditions can be made favorable for extremophilic algal growth while being inhibitory for
faster growing microorganisms that do not produce biodiesel precursors. Conversely,
photobioreactors are collections of small to medium diameter (< 10 cm) transparent tubes which
are all oriented parallel to each other and usually stacked vertically to increase the reactor
volume in a given footprint.5 Primary concerns regarding the use of photobioreactors are the
design limitations on tube length, which is dependent on the degree of O2 production, CO2
depletion, and pH variation.54 While photobioreactors can cost more to build, they typically
offer a more controlled environment and higher productivity than open raceway ponds.3
 

Figure 2.6 Outdoor raceway pond (2,000 L) at Utah State University, Logan, UT.

Figure 2.7 Photobioreactor (Green Wave Energy, Inc.) illuminated by artificial light in a
pilot-scale laboratory setting at Montana State University, Bozeman, MT.

Even dense cultures of microalgae require some removal of excess water for downstream
processing, and cost-effective and energy-efficient processes for the removal of water and
subsequent harvesting of algal biomass are required for economical production of algal
biofuels.56 Harvesting of the algae is an energy-intensive step because many conversion
pathways require the algal biomass to be at substantially higher concentrations than those at
which cultures naturally grow. Typically, even for extremely productive strains, biomass
concentrations will not exceed 5% [w/w] suspensions, while most conversion strategies require a
minimum 20% biomass slurry and can require even more dewatering and drying. There are many
different approaches to harvesting of algae, but the major developed strategies are
flocculation and sedimentation, filtration, and centrifugation. Each of these methods has
advantages and disadvantages, and these methods are often used in combination to reach the 
desired final algae to water ratio. Flocculation and sedimentation is a routine method for 
harvesting algae which do not settle out in well-maintained reactors because of their small cell 
size.51 Flocculation can be obtained through chemical additives such as alum, lime, 
polyacrylamide polymers or surfactants. Following flocculation of the cells, the cells are allowed 
to settle, and excess water can be removed from the top of the cell sediment. Flocculation is also 
commonly used with dissolved air flotation (DAF) where the flocculated biomass is driven 
upward by the attachment of microbubbles where it can be collected at high concentration at the 
tank surface. Another common form of harvesting microalgae is centrifugation. It is likely that 
centrifugation will play a minor role in harvesting of culture where other harvesting methods fail 
to reach the desired algae content for slurry, as centrifugation can reach higher concentration 
biomass slurries than flocculation with sedimentation or other harvesting alternatives. However, 
centrifugation is an energy intensive process which makes it seemingly unappealing for scale-up 
of algae cultivation for biofuels.90 Filtration is another possible alternative for harvesting algae, 
but has lost some appeal in scale-up from laboratory testing because of the potential for 
membrane or screen fouling as well as the labor-intensive process of operating such a system. 
Combinations of these practices for harvesting algae can reach biomass concentrations in the 
20%–30% [w/w] range required for conversion methods utilizing wet biomass, but are not ideal 
for methods that require whole or dry biomass. If further dewatering is needed after the culture 
harvesting, a drying step will be necessary for removing excess water from algae paste or slurry. 
Thermal drying using methane drum dryers is most commonly practiced, but other oven-type 
dryers have been used, as well as solar drying and freeze-drying of algae slurry.51 
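
The scale of the dewatering problem described above follows from a simple mass balance on the dry biomass. The sketch below is only illustrative: the function name `water_to_remove_kg` is hypothetical, and the calculation assumes that no biomass is lost during harvesting and that concentrations are expressed as weight fractions of dry biomass in the suspension.

```python
def water_to_remove_kg(dry_biomass_kg, start_fraction, target_fraction):
    """Water (kg) removed when concentrating a suspension.

    Fractions are dry biomass per total mass (w/w), e.g. 0.05 for 5%.
    Assumes no biomass is lost during harvesting.
    """
    if not 0 < start_fraction <= target_fraction <= 1:
        raise ValueError("need 0 < start_fraction <= target_fraction <= 1")
    start_total_kg = dry_biomass_kg / start_fraction
    target_total_kg = dry_biomass_kg / target_fraction
    return start_total_kg - target_total_kg


# Per kg of dry biomass: a 5% suspension weighs 20 kg and a 20% slurry weighs 5 kg,
# so roughly 15 kg of water must be removed before conversion.
print(water_to_remove_kg(1.0, 0.05, 0.20))

# Drying that 20% slurry completely removes roughly another 4 kg of water.
print(water_to_remove_kg(1.0, 0.20, 1.0))
```

Under these assumptions, most of the water is removed in the first concentration step, which is why low-energy methods such as flocculation and sedimentation are attractive ahead of centrifugation or thermal drying.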
 Conversion of algal biomass into usable biofuels can be accomplished through different
methods, which generally fall into two categories: thermochemical conversion and biochemical
conversion.57 The better known of the two is biochemical conversion. Most
common for microalgal biodiesel production is the process of transesterification. Through the
use of heat and an acid or base catalyst, algal lipids are converted to fatty acid methyl esters
(FAMEs) and glycerol. These FAMEs are crude biodiesel and are similar in composition to
those produced from transesterification of vegetable oils. The high lipid content in microalgae,
primarily TAG, makes transesterification an efficient process with production yields of
70%–90%. While transesterification is primarily a straightforward chemical reaction, it falls into
the biochemical conversion category because it does not have the significant energy
requirements for high temperature and pressure systems typical of thermochemical conversion
pathways. Another biochemical conversion pathway is fermentation of an algal slurry to produce 
ethanol. Fermentation of algae is a less common method for energy production, mostly because 
of the difficulties associated with the process of separating produced ethanol after the 
fermentation process as well as the relatively low starch content of microalgae compared with 
alternative lignocellulosic biomass. Still, fermentation of lipid extracted algae for conversion of 
the residual carbohydrates may offset costs associated with biofuel production from microalgae. 
 As an alternative to biochemical conversion, thermochemical conversion is currently
being heavily studied for its application to microalgal biomass. Of the many different techniques 
for thermochemical conversion, hydrothermal liquefaction and pyrolysis are emerging as the two 
benchmark technologies, while gasification and hydrogenation will likely play smaller roles in 
utilizing all products from the conversion process.91 Hydrothermal liquefaction is a process 
which uses sub-critical water at moderate temperature and pressure (~300°C and 10 MPa) to
convert wet biomass into a liquid fuel called primary oil. This oil can be separated and purified 
using a solvent such as dichloromethane. Other products from hydrothermal liquefaction such as 
the aqueous and gas phases can be recycled to supply nutrients for more algae cultivation or used 
in a gasification or hydrogenation process for other products. Pyrolysis is another 
thermochemical conversion process to produce energy rich compounds such as biofuel, charcoal 
and gaseous products from algal biomass. Short residence times and high temperatures (500°C) 
are used to crack biomass into short chain molecules which can then be rapidly cooled into a 
liquid phase to produce biofuel. Pyrolysis requires a high energy input because the algal
biomass must be completely dried, so 20%–30% algal slurries must have all remaining water
removed through one of the processes mentioned previously. There is
still no general consensus on whether biochemical or thermochemical conversion processes will
be the ultimate solution to producing biofuel from microalgae; however, life cycle analyses and
techno-economic analyses considering the entire production of algae to biofuels are being
completed with considerations for both types of conversion methods.92
Other Secondary Products 
 Microalgae not only offer a source of sustainable biodiesel, but can be used to make an
assortment of products such as food supplements, fertilizers, bioplastics, nutraceuticals and 
cosmetics.15 For algal biofuels to be economically viable, additional co-products will need to be 
formed in concert with biofuel. In particular, those products which have a combination of the 
highest yields and highest specific selling price (e.g., $/lb.) will be the optimum targets for 
coproducing with biofuels. Two examples of these higher value compounds are carotenoids and 
unsaturated fatty acids, which are both produced naturally by microalgae. Carotenoids are 
colorful pigments which can be used as food and feed additives, as well as nutraceutical 
supplements to promote health. The two most common carotenoids produced naturally by
microalgae are β-carotene and astaxanthin.15 Two specific organisms produce these compounds
in much greater quantity than any others studied: Dunaliella salina and Haematococcus pluvialis
for β-carotene and astaxanthin production, respectively. Dunaliella salina is a
halophile with optimum sodium chloride concentrations between 10%–27%, depending on the
targeted growth regime, and also produces cultures rich in β-carotene. Haematococcus pluvialis,
in turn, produces high concentrations of astaxanthin when environmentally
stressed with sodium chloride concentrations in the range of 4%–6%. Furthermore, both of these
organisms can withstand high light environments and can even be induced to make more of their
targeted carotenoids with light-induced stress.15
 Mono- and polyunsaturated fatty acids are another secondary product and
must be valued separately from their role as a biofuel precursor. Depending on the intended
application, omega-3 fatty acids such as α-linolenic acid or eicosapentaenoic acid can be sold as
health supplements at a much higher value than if they were converted to biofuel. Similarly,
the monounsaturated fatty acid oleic acid is a valuable precursor to 9-decenoic acid, which can
be used to create valuable products such as surfactants, lubricants, and polyester, among other
things.93 A feasible production strategy would make not only biofuels, but other valuable
secondary products as well.  
Take Home Message 
● Microalgae are promising candidates for biodiesel production because they have fast growth
rates, can be cultivated on non-arable land, can use brackish water or wastewater, and avoid
competing with food supplies.  
● Some microalgae strains accumulate high concentrations of TAG that can be converted 
into biodiesel. 
● Extremophilic algae are well suited for cultivation in open systems because their growth
conditions inhibit growth of most contaminating (biofuel non-productive) microorganisms.

CHAPTER THREE 
CHARACTERIZATION OF NINE NOVEL GREEN 
ALGAE STRAINS FROM YELLOWSTONE 
NATIONAL PARK 
Contribution of Authors and Co-Authors 
Manuscript in Chapter 3 
 
Author: Karen Moll 
 
Contributions: Designed and performed experiments and primary writer 
 
Co-Author: Robert D. Gardner 
 
Contributions: Designed experiments, performed initial experiments 
 
Co-Author: Todd Pederson 
 
Contributions: Discussed and performed experiments, contributed to writing 
 
Co-Author: Muneeb S. Rathore 
 
Contributions: Discussed and performed experiments, contributed to writing 
 
Co-Author: Brent M. Peyton 
 
Contributions: Discussed, commented, and edited the manuscript
 
Manuscript Information 
Karen M. Moll, Robert D. Gardner, Todd Pederson, Muneeb S. Rathore, Robin Gerlach, Brent 
M. Peyton 
 
Algal Research 
 
Status of Manuscript:  
___x Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
____ Published in a peer-reviewed journal
 
Abstract 
 Many algae have been isolated and characterized for biofuel production capabilities. This 
screening study evaluated nine green algae strains isolated from an alkaline stream in 
Yellowstone National Park (YNP). Two well-characterized culture-collection strains, PC-3
(Coelastrella sp.) and UTEX395 (Chlorella sp.), were characterized in parallel. Here, we describe
the methods used for strain isolation and evaluation from field samples to bench-scale strain 
characterization for potential biomass and biofuel production. Each strain was evaluated for 
growth rate, biomass, and lipid production. The 11 strains were grown with and without sodium 
bicarbonate addition to determine whether added inorganic carbon could stimulate an increase in 
lipid accumulation. While no single strain was ideal for all three evaluation criteria, several 
strains were promising for two of the three criteria indicating that those strains might be good 
candidates for biodiesel or biomass production. Strain WC-5 had a fast growth rate and high fatty 
acid methyl ester (FAME) content and was considered the best candidate for biodiesel 
production. WC-2b performed the best for biomass production due to a fast growth rate and
a high final dry-cell weight (DCW) compared to the other strains.   
 
Keywords: Algae, Alkaliphile, Biodiesel, Biofuel, Extremophile, TAG, Yellowstone 
  
Introduction  
Petroleum-based fuels are not sustainable as a long-term resource for transportation fuels. 
As a renewable energy source, algal biofuels can be used to alleviate transport-fuel demands.94 
With the exception of the recent economic impacts of the COVID-19 pandemic, large spikes in
fuel prices have triggered the economic crises behind 10 of 12 U.S. recessions since World War II.95,96
While it may appear that reserves are increasing, oil that can be accessed most easily is
diminishing.95, 96 This has resulted in drilling for oil in more dangerous and less ideal locations
with higher environmental impacts, and in exploration for drilling in environmentally,
culturally,97 historically, or archaeologically sensitive locations such as the Arctic National
Wildlife Refuge, Standing Rock, Bears Ears, and Grand Staircase-Escalante. As of 2016, there were
13 National Parks with energy production within park borders,98 and an average of 550 active oil
and gas wells operate within the U.S. National Park System each year.99
A potential alternative to petroleum-based transportation fuels is algal biofuel. The 
benefits of algal biofuels over second-generation (ethanol) biofuels are multiple: algae require 
less land area for growth, have faster doubling times than cellulosic crops, and do not increase 
the prices of food stocks.94 Strain selection has been identified as a key factor for improving
algal biofuel viability.3, 77, 94, 100 Fast growth rates, high lipid accumulation and/or high
biomass concentration are the most important considerations with regard to algal biofuel success. 
Fast growth rates increase the probability of successful biofuel production using algal 
monocultures or defined mixed cultures. When algal growth rates are fast, there is an increased 
potential that an algal pond will be productive for biofuel by outcompeting contaminating 
(micro)organisms or low lipid-yielding algae strains.101 Open systems, such as outdoor raceway
ponds, are susceptible to unwanted algal contamination by wind, rain, animals, and other
non-sterile sources.102 With slower growth rates, algae may permit faster growing organisms
that are naturally found in water, such as golden algae,103 cyanobacteria, protists, and
bacteria, to consume limiting nutrients required by algae for
growth and lipid accumulation.103-105 If nutrients such as nitrate, phosphate, sulfate, silica, iron or
magnesium are depleted by non-productive organisms, conditions for a “pond crash,” as defined
by McBride et al. (2014) and Carney et al. (2016), will result.104-106  
Some algae strains naturally accumulate high concentrations of lipids in the form of 
triacylglycerol (TAG) that can be converted to biodiesel. Previous studies have shown that some 
green algae and diatoms can increase the accumulation of TAG and other lipids by 
approximately two-fold when sodium bicarbonate is added at the appropriate time.49, 70, 107, 108 
However, not all strains respond to this stimulation and each strain must be evaluated to 
determine whether sodium bicarbonate addition will increase lipid accumulation. 
While some strains accumulate lipids, others are known to reach high cell densities and 
dry cell weight (DCW) with low lipid concentrations. The ability of these strains to divide
quickly and reach high cell densities diverts fixed carbon into new biomass or cells rather than
into storage lipids such as TAGs. Strains that have fast growth rates and accumulate
high concentrations of TAGs or reach high DCW are the most promising strains for commercial 
production of desirable algal products. Strain evaluation and improvement are therefore of the 
utmost importance.  
Promising algae strains have been isolated from a variety of unique environments,
including extreme environments such as the polar regions (Arctic/Antarctic); alkaline and/or
hypersaline lakes such as Soap Lake, Washington, the East African Soda Lakes (e.g., Lake Nakuru,
Kenya), Great Salt Lake, Utah, and Mono Lake, California; and YNP.69, 71, 80, 81, 101, 109-114
Targeting algae from these extreme environments may have added benefits. For example, alkaline
growth conditions may inhibit the growth of many contaminating microorganisms as well as
provide increased CO2 availability.3, 41, 49, 101 Furthermore, the high pH can potentially pose
an added “stress” to further induce lipid accumulation.49, 107, 115  
Here, we provide an in-depth analysis of nine green-algae strains that were isolated from 
YNP, WY, USA, and two reference culture-collection strains for growth rate, biomass, and 
biodiesel potential (BP). The nine alkaline-adapted YNP strains and two culture-collection 
strains were evaluated for their potential for biodiesel or biomass production. 
Materials & Methods 
Samples collected from Witch Creek in Yellowstone National Park, WY, USA (pH of
9.3),116 were streaked for isolation on agar plates prepared from the following media types:
Bold’s Basal Medium adjusted to pH 8.7 (B8.7), ASP-2 200, Bristol’s Medium, ASP-2-Fresh, and
B8.7SiS. The ASP-2-based media types contained approximately twenty times the NO3-
concentration (59 mM NaNO3) of the Bristol’s and Bold’s Basal Medium-based media types (2.94 mM
NaNO3).49, 117-120 Cultures were grown in direct sunlight or in a light incubator (Percival) for at
least one week. Isolated green colonies were selected from the streak plates and inoculated into
1.0 mL of sterile liquid medium, incubated for one to two weeks, and streaked for isolation
again. A total of three rounds of streaking for isolation and picking individual colonies into 1.0
mL of sterile liquid medium were performed. For each round of isolation, colony and cellular
morphologies were documented. At the time of each transfer, cells were stained with Nile Red
fluorescent stain (Sigma Aldrich) to assess lipid accumulation within the cells87 and were
examined with light microscopy to document cell morphologies. Strains that contained relatively
large lipid vacuoles were selected for additional characterization (Figure 3.1).
 
[Figure 3.1 flowchart. Column headings: Algae Growth; Isolation. Boxes: Collection of field
samples; Streak for isolation; Repeat 3x; Transfer each colony type to algae isolation growth
medium; Following sufficient growth, streak for isolation again; Obtain microscopic cell
morphology and stain with Nile Red; Determine whether strains are unialgal and potential
identification.]
 
Figure 3.1 Outline for algae isolation beginning from field collection to strain characterization. 
Each strain was streaked for isolation on solid growth medium, grown in liquid growth medium 
and visualized microscopically at each step to ensure strain isolation. 
Screening Studies 
Following the three rounds of isolation, the culture volume was increased stepwise from 1 mL
to 5 mL, 75 mL, and 150 mL to obtain a culture volume sufficient for experimental testing.
For growth studies, cultures were inoculated into triplicate flasks of B8.7 medium49 and grown at
room temperature (25°C ± 2°C) with light provided at 400 μmol photons m-2 s-1 using twelve T5
four-foot fluorescent lights in a square-wave 14:10 light/dark (L/D) cycle. Measured
experimental parameters included cell concentration, monitored using a hemacytometer
(Reichert); pH (Accumet); and Nile Red fluorescence, described in detail below. Cultures were
grown for at least 10 days or until Nile Red fluorescence peaked, indicating that cultures
had reached maximum lipid accumulation.49 The cultures exhibiting the fastest growth rates
(lowest doubling times) and highest Nile Red fluorescence were selected for additional in-depth
characterization.
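
Growth rate and doubling time can be derived from the hemacytometer counts as sketched below. This is a minimal illustration that assumes simple exponential growth between two sampling times; the function names and the example cell counts are hypothetical and do not represent data from this study.

```python
import math


def specific_growth_rate(n0, n1, t0_days, t1_days):
    """Specific growth rate mu (day^-1) between two cell concentrations.

    Assumes exponential growth between the two sampling times:
    N1 = N0 * exp(mu * (t1 - t0)).
    """
    if n0 <= 0 or n1 <= 0:
        raise ValueError("cell concentrations must be positive")
    if t1_days <= t0_days:
        raise ValueError("t1_days must be later than t0_days")
    return math.log(n1 / n0) / (t1_days - t0_days)


def doubling_time(mu):
    """Doubling time (days) for a positive specific growth rate."""
    if mu <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / mu


# Hypothetical counts: 1e5 cells/mL on day 0 and 8e5 cells/mL on day 3
# correspond to three doublings in three days.
mu = specific_growth_rate(1e5, 8e5, 0.0, 3.0)
print(round(mu, 3), round(doubling_time(mu), 3))
```

Because doubling time is inversely proportional to the specific growth rate, ranking strains by lowest doubling time is equivalent to ranking them by fastest growth rate.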
In-depth Characterization Studies 
At this stage, eleven cultures (9 YNP isolates and 2 culture-collection strains) were each
grown in B8.7 medium in 1 L photobioreactor (PBR) tubes in a temperature-controlled
circulating-water bath (25 ± 2°C).49, 115, 117 Each strain was inoculated into two sets of
triplicate cultures. One set received sodium bicarbonate (NaHCO3) and was compared to the other
set, which served as an air-only control. Prior to sodium bicarbonate addition,
the two sets are referred to as replicates 1 and 2; afterward, they are referred to as the
air-only and sodium bicarbonate conditions. Cultures were sparged with ambient air at a flow rate
of 0.4 L min-1, which contained approximately 400 ppm of atmospheric CO2 and provided 8–9 mg C L-1
dissolved inorganic carbon (DIC) in the growth medium, as measured by flow meters and carbon
analyzer measurements, respectively.
Bicarbonate Addition 
To determine if sodium bicarbonate addition increased TAG accumulation, 25 mM 
sodium bicarbonate was added before nitrogen depletion at the beginning of the 14:10 L:D light 
cycle.107 The nitrate concentration in the medium was monitored using the szechrome NAS assay 
to ensure sufficient nitrate was available prior to sodium bicarbonate addition.107, 121 Dissolved 
inorganic carbon was measured using a Skalar Formacs TOC/TN Carbon Analyzer with an LAS-
160 autosampler (Skalar). The DIC concentration was determined through injection of 100 μL of 
sample in 2% (v/v) phosphoric acid; CO2 peak area was measured with an infrared (IR) detector 
and calibrated with peak responses from sodium carbonate/bicarbonate standard solutions.49 
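As a worked example of the 25 mM dose, the sketch below converts the target concentration into a mass of sodium bicarbonate and into the equivalent added inorganic carbon. The 1 L culture volume is an assumption taken from the nominal PBR size, and the function names are hypothetical.

```python
NAHCO3_G_PER_MOL = 84.007  # molar mass of NaHCO3
CARBON_G_PER_MOL = 12.011  # molar mass of carbon


def nahco3_mass_g(target_mM, culture_volume_L):
    """Mass of NaHCO3 (g) needed to reach target_mM in the given culture volume."""
    return target_mM / 1000.0 * culture_volume_L * NAHCO3_G_PER_MOL


def added_dic_mg_C_per_L(target_mM):
    """Inorganic carbon added (mg C per L); mmol/L times mg C per mmol."""
    return target_mM * CARBON_G_PER_MOL


mass_g = nahco3_mass_g(25.0, 1.0)        # assumed 1 L culture
dic_added = added_dic_mg_C_per_L(25.0)
print(f"{mass_g:.2f} g NaHCO3 per L adds {dic_added:.0f} mg C per L")
```

Under these assumptions the dose is about 2.1 g NaHCO3 per liter, or roughly 300 mg C L-1, compared with the 8–9 mg C L-1 measured under ambient air sparging.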
Determination of Unialgal Strains and Strain Identification 
DNA was extracted from 50 mL of each culture and sequenced using 454 
Pyrosequencing to confirm that each strain was unialgal. DNA was extracted using a CTAB/bead 
beating method outlined by Moll (2014)49 and quantified using a Qubit fluorometer with a 
dsDNA BR Assay kit. The 454-Pyrosequencing was performed according to Bowen de Leon et 
al., 2012, and Bell et al., 2018.41, 122, 123 Reads were analyzed using QIIME124 and redundant reads 
were clustered and eliminated with CD-HIT-EST.125 The phylogenetic tree was constructed using 
RAxML (version 8.2.12) and FigTree (version 1.4.4).126, 127 To obtain resolved identifications, 
the internal transcribed spacer (ITS) region of each strain, spanning ITS1, 5.8S, and ITS2, was 
amplified with primers ITS1 (5’ TCCGTAGGTGAACCTGCGG 3’) and ITS4 (5’ 
TCCTCCGCTTATTGATATGC 3’) and sequenced by Sanger sequencing (Table 3.2).128-131 
Each culture was periodically verified as axenic when grown on solid medium (B8.7 with 0.5% 
glucose, 0.5% yeast extract and 2% agarose) at room temperature (25 ± 2°C) in the dark. 
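The two primer sequences given above can be checked against a template in silico. The following is a minimal sketch, assuming exact primer matches on a single strand; the helper names and the toy template are hypothetical and are not sequences from this study.

```python
ITS1 = "TCCGTAGGTGAACCTGCGG"   # forward primer, as given in the text
ITS4 = "TCCTCCGCTTATTGATATGC"  # reverse primer, as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by the primers, or None if either site is missing."""
    start = template.find(forward)
    end = template.find(reverse_complement(reverse))
    if start == -1 or end == -1 or end < start:
        return None
    return end + len(reverse) - start


# Toy template: forward site, 100 unknown bases, then the reverse primer binding site.
toy_template = ITS1 + "N" * 100 + reverse_complement(ITS4)
toy_length = amplicon_length(toy_template, ITS1, ITS4)
print(toy_length)
```

Real amplicons would also need tolerance for mismatches and a search of both strands, which this sketch leaves out.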
Dry Cell Weight 
Twenty-five mL of suspended culture was filtered through GF/F glass microfiber filters 
(Whatman) to determine the DCW. To remove water, each sample was lyophilized (Labconco 
lyophilizer). All samples were weighed immediately upon retrieval from the lyophilizer to 
minimize uptake of water vapor from the air.49  
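The DCW then follows from the mass gained by the filter and the filtered volume. The sketch below assumes the filters were tared before filtration, which is not stated explicitly above; the function name and the masses are hypothetical.

```python
def dry_cell_weight_g_per_L(filter_tare_mg, filter_plus_biomass_mg, volume_filtered_mL):
    """Dry cell weight in g/L; mg of biomass per mL filtered is numerically equal to g/L."""
    biomass_mg = filter_plus_biomass_mg - filter_tare_mg
    return biomass_mg / volume_filtered_mL


# Hypothetical masses for a 25 mL sample; not data from this study.
dcw = dry_cell_weight_g_per_L(120.0, 145.0, 25.0)
print(f"DCW = {dcw:.2f} g/L")
```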
 
 
Nitrate 
To ensure sufficient nitrate was available in the growth medium, the colorimetric 
szechrome NAS assay was used prior to sodium bicarbonate addition.107, 121 The Dionex™ ICS-
1100 Ion Chromatography System (IC) was also used to verify and obtain a more accurate 
analysis of the anions (including nitrate) in each sample. The ion chromatography system 
consisted of a Dionex™ IonPac AS22 separation column coupled with a Dionex™ AERS 500 (4 
mm) suppressor. Nitrate peaks were reported in µS using a Thermo Scientific DS6 Heated 
Conductivity Cell. The peaks were integrated and compared against the standard calibration 
curve made using the Dionex™ Combined Seven Anion Standard I solution. The samples were 
diluted (1:5) when necessary using ultrapure water (Milli-Q) to stay within the linear range of 
quantification. 
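Quantification against a standard curve, including the dilution correction, can be sketched as follows. The standards and peak areas are hypothetical, the curve is assumed to be linear, and the 1:5 dilution is assumed to mean a dilution factor of 5.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical nitrate standards (mg/L) and integrated peak areas.
std_conc = [0.0, 10.0, 20.0, 40.0]
std_area = [0.0, 5.0, 10.0, 20.0]
slope, intercept = fit_line(std_conc, std_area)


def nitrate_mg_per_L(peak_area, dilution_factor=1.0):
    """Invert the calibration line, then scale back up by the dilution factor."""
    return (peak_area - intercept) / slope * dilution_factor


sample_conc = nitrate_mg_per_L(12.0, dilution_factor=5.0)
print(f"nitrate = {sample_conc:.1f} mg/L")
```

Applying the dilution factor after inverting the calibration keeps the diluted reading inside the linear range while reporting the concentration of the original sample.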
Nile Red Fluorescence 
To obtain a daily estimate of TAGs for each culture condition, samples were stained 
with Nile Red (9-diethylamino-5H-benzo[α]phenoxazine-5-one) (Sigma-Aldrich).49, 69, 72, 87, 115 
Samples were diluted 1:5 using ultra-filtered water and suspended in 20% DMSO or acetone 
(Table A.1).132 Each sample was stained with 20 µL of Nile Red. The optimal stain time, which 
ranged from 4 to 60 min, had been determined for each strain prior to experimentation.49, 71, 107, 133 
Stained cells were quantified on a microplate reader (BioTek, Synergy H1) with 530/575 nm 
(20% DMSO) or 480/580 nm (acetone) excitation/emission filters. The optimal stain time was 
taken as the time after staining at which Nile Red fluorescence, in relative fluorescence units 
(rfu), reached its maximum, indicating that the stain had permeated the cell wall and stained 
neutral lipids in the lipid vacuoles.49  
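Selecting the optimal stain time from a staining time course amounts to locating the fluorescence maximum. A minimal sketch, with a hypothetical function name and hypothetical readings:

```python
def optimal_stain_time(time_course):
    """Return the (minutes, rfu) reading with the highest fluorescence."""
    return max(time_course, key=lambda point: point[1])


# Hypothetical readings as (minutes after staining, rfu); not data from this study.
readings = [(4, 850), (10, 1420), (20, 2310), (30, 2180), (60, 1760)]
best_min, best_rfu = optimal_stain_time(readings)
print(f"optimal stain time: {best_min} min ({best_rfu} rfu)")
```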
Results 
Each of the nine YNP strains was verified to be unialgal by 454-Pyrosequencing and 
analysis with QIIME before the strain characterization studies.124 To facilitate analysis, the strains 
were first categorized by the three evaluation criteria: growth rate, lipid yield, and biomass 
production. The strains within each of those categories were then characterized in further depth 
for each of the parameters tested. 
Verification of Unialgal Strains and Strain Identification 
 Previously, algae strains were primarily determined to be unialgal by cell morphology 
when viewed microscopically. Today, methods such as next generation sequencing (NGS) 
enable more robust determination of culture purity. Each of the nine YNP isolates was confirmed 
unialgal by 454 Pyrosequencing and analysis with QIIME124 before the characterization studies. 
The extracted DNA concentrations from the YNP green algae strains ranged from 30.1 ng/μL 
(PGV10-G2) to 497 ng/μL (WC-5) (Table 3.1). 
 
Table 3.1 DNA concentrations of the extracted DNA and of the amplified and purified SSU rDNA 
(18S) prior to 454-Pyrosequencing for the nine YNP green algae strains. DNA was quantified 
using a Qubit fluorometer with a dsDNA BR Assay kit. 
Strain      Extracted DNA (ng/μL)   DNA concentration after PCR cleanup for 454-Pyrosequencing (ng/μL)
MF-1        115                     67.0
PGV-6       140                     78.9
PGV8-G1     43.4                    63.4
PGV8-G2     73.1                    48.6
PGV10-G1    237                     52.1
PGV10-G2    30.1                    48.2
WC-1        252                     56.8
WC-2b       157                     45.1
WC-5        497                     57.6
 
Strain purity was determined using the approximately 500 bp amplified V1-V3 region of 18S 
SSU rDNA sequenced by 454 Pyrosequencing. Strain identity was determined using the full-
length ITS1, 5.8S and ITS2 regions by Sanger sequencing (Tables 3.2 and 3.3, Figure 3.2).  
Table 3.2 Representative sequences for 18S SSU rDNA. Each sequence was BLAST searched 
for identification. There were three strains that had identical BLAST results with different 
identifications for 18S (MF1, PGV6, and PGV10-G1), which represents the diverse collection of 
sequences in NCBI. 
Algae isolate  Identification          % ID    Query coverage  E-value  Accession                   Amplicon length (bp)
MF1            Chlorolobion braunii    95.73%  100%            0.0      gi|933801142|KT833591.1     513
               Podohedriella falcata   95.73%  100%            0.0      gi|19847825|X91263.1
PGV6           Tetradesmus obliquus    95.05%  100%            0.0      gi|1341122479|MG022741.1    525
               Scenedesmus acutus      95.05%  100%            0.0      gi|930306211|KR082492.1
PGV8-G1        Chlorococcum sp.        93.74%  99%             0.0      gi|563323151|KF791553.1     523
PGV8-G2        Chlorococcum sp.        94.41%  100%            0.0      gi|563323151|KF791553.1     482
PGV10-G1       Desmodesmus sp.         98.45%  87%             0.0      gi|693012472|AB917137.1     518
               Scenedesmus sp.         98.45%  87%             0.0      gi|402704215|JX258841.1
PGV10-G2       Chlorella pyrenoidosa   96.77%  85%             0.0      gi|5326955|AJ242762.1       544
WC-1           Tetradesmus obliquus    97.56%  74%             0.0      gi|1345677876|MG971386.1    551
               Scenedesmus sp.         97.56%  74%             0.0      gi|1245685193|KY816917.1
WC-2b          Chlorococcum sp.        99.27%  100%            5e-136   gi|563323151|KF791553.1     498
WC-5           Chlorella sorokiniana   94.94%  100%            0.0      gi|1176441302|KY921855.1    514
Table 3.3 BLAST identification of ITS amplicons obtained by Sanger sequencing. The sequence 
identity was determined by BLAST search. There were two strains that had identical BLAST 
results with different identifications for ITS (PGV6 and WC-1). 
Sample    Identification           % ID    Query coverage  E-value  Accession                   Amplicon length (bp)
MF1       Scenedesmus sp.          99.32%  100%            0.0      gi|530330741|KF471115.1     1162
PGV6      Acutodesmus obliquus     100%    100%            0.0      gi|998524167|KU170646.1     1224
          Scenedesmus obliquus     100%    100%            0.0      gi|359385305|FR865721.1
PGV8-G1   Chloromonas sp.          98.26%  25%             9e-77    gi|312270216|HQ404890.1     1358
PGV8-G2   Chloromonas sp.          98.26%  28%             9e-77    gi|312270216|HQ404890.1     1194
PGV10-G1  Desmodesmus sp.          99.65%  100%            0.0      gi|1520100899|MH010842.1    1132
PGV10-G2  Desmodesmus sp.          100%    100%            0.0      gi|1569048762|MK496927.1    1119
WC-1      Tetradesmus obliquus     99.53%  71%             0.0      gi|1341122479|MG022741.1    1752
          Acutodesmus sp.          99.53%  71%             0.0      gi|1273809524|MF326554.1
          Scenedesmus basiliensis  99.53%  71%             0.0      gi|1027901709|KU291882.1
WC-2b     Botryococcus sp.         93.79%  99%             0.0      gi|558605714|KF537773.1     1177
WC-5      Chlorella sp.            95.73%  99%             0.0      gi|1321358367|MG757661.1    1298
 
[Figure 3.2: phylogenetic tree. Tips: WC-5, Botryococcus sp., WC-2b, Chlamydomonas reinhardtii, 
Chlorococcum pyrenoidosum, PGV8-G2, PGV8-G1, Desmodesmus sp., PGV10-G1, PGV10-G2, 
WC-1, PGV6, Tetradesmus obliquus, Acutodesmus sp., Chlorella sorokiniana, Scenedesmus 
obliquus, MF1, Monoraphidium sp., Sphaeropleales sp., and Arabidopsis thaliana. Scale bar: 0.2.] 
Figure 3.2 Full-length ITS Sanger results used for strain identification. The sequences (median 
length = 1194 bp) were aligned with MUSCLE (v3.8.1551)134 and phylogenetic distances were 
determined using the Maximum Likelihood method with RAxML (version 8.2.12).127 The scale 
bar represents number of nucleotide changes between strains. The bootstrap values present at the 
nodes represent the divergence event on a time scale. 
Doubling Time 
Three of eleven strains exhibited distinctly shorter doubling times compared to the others 
(Figure 3.3). The strains exhibiting the fastest doubling times were WC-2b, WC-5, and WC-1 
at 15.98 ± 1.48 h (air-only), 15.00 ± 0.50 h (replicate 2) and 16.07 ± 1.45 h (replicate 2) 
(average ± 95% confidence interval), respectively. The slowest growing strains were PGV6, 
PC-3, and UTEX395 with the doubling times 30.40 ± 2.84 h (air-only) and 27.45 ± 1.45 h 
(sodium bicarbonate addition) (Figure 3.3). 
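For triplicate cultures, an "average ± 95% confidence interval" value can be computed as sketched below. The sketch assumes a two-sided Student's t interval with n = 3; the method used for the intervals reported here is not stated, and the doubling times in the example are hypothetical.

```python
import math
import statistics

T_CRIT_95_DF2 = 4.303  # two-sided 95% Student's t critical value for 2 degrees of freedom


def mean_and_ci95_triplicate(values):
    """Mean and 95% confidence half-width for exactly three replicate values."""
    if len(values) != 3:
        raise ValueError("expected exactly three replicates")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)  # standard error of the mean
    return mean, T_CRIT_95_DF2 * sem


# Hypothetical doubling times (h) from three replicate photobioreactors.
mean_td, half_width = mean_and_ci95_triplicate([15.0, 16.0, 17.0])
print(f"{mean_td:.2f} ± {half_width:.2f} h")
```

With only three replicates the t critical value is large, so the interval is about 2.5 times the standard deviation of the replicates.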
 
Figure 3.3 Average doubling times based on the maximal growth rates for each of the eleven 
strains (replicate 1 and replicate 2). The error bars represent 95% confidence intervals. 
 
While there was a slight increase in doubling time in replicate 2 for each of the three 
strains, the air-only and sodium bicarbonate conditions were within error of each other. Figure 
3.4A shows that WC-5 reached the highest final cell concentrations (7.75 x 10^7 ± 6.34 x 10^6 
cells mL-1 and 7.66 x 10^7 ± 8.64 x 10^6 cells mL-1 for the air-only and sodium bicarbonate 
conditions, respectively) on the day of harvest, day 13.5. On day two there was a large amount of 
error in the WC-2b cell counts because WC-2b cell division was highly variable from PBR to 
PBR. WC-2b has a colonial cellular organization, such that when a mature cell lyses, it can 
release multiple smaller cells.134 In asynchronous cultures, cell lysis is offset between cells, 
causing large variation in less dense cultures. 
Among the fastest-growing strains, WC-5 reached the highest pH. The pH of the air only 
and sodium bicarbonate conditions were similar at 10.88 ± 0.56 and 10.41 ± 0.41, respectively 
(Figure 3.4B). The pH was greater than 10.5 indicating that the culture was likely carbon limited 
and inorganic carbon in the culture primarily existed as carbonate rather than bicarbonate (Figure 
3.4B). There was a noticeable pH decrease for each culture just prior to the sodium bicarbonate 
additions. This was due to the accumulation of dissolved CO2 during the dark-period, just before 
the bicarbonate additions were made.   
As shown in Figure 3.4C, each strain depleted the medium of NO3- between days 6 and 
10. The starting nitrate concentration in Bold’s is 182.4 mg L-1, but after the photobioreactors 
were autoclaved, some evaporative loss occurred causing the nitrate concentration to vary 
slightly. The medium NO3- concentrations were measured just prior to sodium bicarbonate 
addition to ensure there were sufficient concentrations available. Previous results have shown 
that the addition of sodium bicarbonate after NO3- depletion results in a drastic decrease in lipid 
concentrations.115 Nitrate was determined to be available for the three fastest growing strains, 
ensuring that lipid accumulation could result following sodium bicarbonate addition. 
Nile Red fluorescence increased after nitrate depletion (Figure 3.4C, D). The addition of 
sodium bicarbonate changed the Nile Red fluorescence intensity of WC-5, WC-1 and WC-2b by 
+45%, -13%, and +37%, respectively, compared to the air-only condition (Figure 3.4D, Table 
A.2). Strain WC-5 reached the highest Nile Red fluorescence, nearly one order of magnitude 
higher than WC-1 and WC-2b in both conditions. 
Biomass Production 
As shown in Figure 3.5, the highest final DCW occurred after sodium bicarbonate 
addition for nine of eleven strains. At harvest, the strains with the highest DCWs were PC-3 and 
WC-2b at 1.50 ± 0.14 and 1.44 ± 0.10 g L-1, respectively, for the sodium bicarbonate condition. 
The strains with the lowest final DCWs were PGV10-G1, PGV8-G1 and PGV8-G2 at 0.69 
± 0.10 (sodium bicarbonate addition), 0.56 ± 0.11 (air only), and 0.54 ± 0.10 (sodium 
bicarbonate addition) g L-1, respectively (Table 3.3). The strains that were unaffected by sodium 
bicarbonate addition were PGV-6, WC-5, PGV10-G1, PGV10-G2 and PGV8-G2 (Figure 3.5). 
 
Figure 3.5 Average DCW at harvest for each of the eleven strains (with and without sodium 
bicarbonate addition). The error bars represent 95% confidence intervals.  
[Figure 3.4: panels plotted against time (days): (A) cell concentration (cells mL-1), (B) pH, 
(C) nitrate (mg L-1), and (D1, D2) Nile Red fluorescence (rfu).] 
 
Figure 3.4 Growth and lipid accumulation observations for the cultures with the fastest growth 
rates, WC-1, WC-2b and WC-5, with and without sodium bicarbonate addition: cell concentration 
(A), pH (B), medium nitrate concentration (C) and Nile Red fluorescence (D). The error bars 
represent 95% confidence intervals. In panels A-C, the series are WC-5 (circles), WC-1 
(triangles) and WC-2b (squares). Panel D1 shows WC-1 (circles) and WC-2b (triangles), and 
panel D2 shows WC-5 (circles). The open and filled symbols represent air only and sodium 
bicarbonate addition, respectively, for all conditions.  
PC-3 and WC-2b were the two strains that produced the highest endpoint dry cell weights: 1.08 ± 
0.25 and 1.50 ± 0.14 g L-1 for PC-3 and 1.02 ± 0.15 and 1.44 ± 0.10 g L-1 for WC-2b, for the 
air-only and sodium bicarbonate conditions, respectively (Figure 3.5). While WC-2b had shorter 
doubling times than PC-3 (18.13 ± 0.71 h and 27.45 ± 4.75 h, respectively; Figure 3.3), PC-3 had 
a higher maximum cell count by stationary phase at 1.04 x 10^7 ± 5.18 x 10^6 cells mL-1 
compared to WC-2b at 6.08 x 10^6 ± 1.91 x 10^6 cells mL-1 (Figure 3.6A), indicating higher 
biomass production.  
At pH values greater than 10.3, the majority of the dissolved inorganic carbon exists 
as CO3^2-, which is difficult to impossible for some algae to fix.70 At high pH values, some algae 
strains experience a “pH stress” which may contribute to lipid accumulation.49 Once sodium 
bicarbonate was added, the pH ranged from 9.86 – 10.26 during the light periods, due to the 
buffering effect of sodium bicarbonate (pKa = 10.3).135 WC-2b reached the highest pH of the 
two high biomass producers at 11.56 ± 0.06 at day 8 for the sodium bicarbonate condition. 
When grown in the air-only condition, PC-3 maintained a higher pH at approximately 11.10 
compared to the sodium bicarbonate condition, which ranged from 9.86 – 10.38 due to the 
buffering ability of bicarbonate (Figure 3.6B). 
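The pH dependence of the inorganic carbon species can be illustrated with a simple equilibrium calculation. The sketch below uses the pKa of 10.3 quoted above for the bicarbonate/carbonate pair and assumes a first pKa of 6.35 (a typical value for dilute water at 25 °C, not taken from this work); activity corrections and gas exchange are ignored, and the function name is hypothetical.

```python
PKA1 = 6.35  # CO2(aq)/HCO3- pair; assumed value for dilute water at 25 °C
PKA2 = 10.3  # HCO3-/CO3^2- pair; value quoted in the text


def carbonate_fractions(pH):
    """Equilibrium fractions of dissolved CO2, HCO3- and CO3^2- at a given pH."""
    h = 10.0 ** (-pH)
    k1 = 10.0 ** (-PKA1)
    k2 = 10.0 ** (-PKA2)
    denom = h * h + k1 * h + k1 * k2
    return {
        "CO2(aq)": h * h / denom,
        "HCO3-": k1 * h / denom,
        "CO3^2-": k1 * k2 / denom,
    }


for pH in (8.7, 10.3, 11.1):
    f = carbonate_fractions(pH)
    print(pH, {name: round(value, 3) for name, value in f.items()})
```

At pH 10.3 bicarbonate and carbonate are present in equal amounts, and at the pH of about 11.1 seen in the air-only cultures most of the inorganic carbon is carbonate, in line with the carbon limitation described above.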
Once nitrate was depleted (Figure 3.6C), the Nile Red fluorescence increased on days 8 
and 10 for WC-2b and PC-3, respectively (Figure 3.6D). PC-3 had a maximum Nile Red 
fluorescence of 1553 ± 363 and 1678 ± 166 relative fluorescence units (rfu) for the air-only and 
sodium bicarbonate addition conditions, respectively (Figure 3.6D). WC-2b had a maximum Nile 
Red fluorescence of 1688 ± 489 and 2693 ± 250 rfu. While there was not a significant increase in 
Nile Red fluorescence when sodium bicarbonate was added to the PC-3 cultures, there was a 
37% increase in fluorescence for WC-2b, indicating a greater concentration of neutral lipids.  
 
[Figure 3.6: panels plotted against time (days): (A) cell concentration (cells mL-1), (B) pH, 
(C) nitrate (mg L-1), and (D) Nile Red fluorescence (rfu).] 
 
Figure 3.6 Growth and lipid accumulation observations for the cultures with the highest biomass 
production as DCW, PC-3 (circles) and WC-2b (triangles), with and without sodium bicarbonate 
addition: cell concentration (A), pH (B), medium nitrate concentration (C) and neutral lipids (D). 
The error bars represent 95% confidence intervals. The open and filled symbols represent air 
only and sodium bicarbonate addition, respectively, for all conditions. 
Highest Lipid Producing Strains 
 Using Nile Red screening, the two highest lipid producing strains were WC-5 and 
UTEX395. Compared to the next highest strain in Nile Red fluorescence, MF-1 at 18928 
± 2549 rfu, WC-5 and UTEX-395 were approximately double in fluorescence at 41120 
± 6342, and 34960 ± 1522 rfu, respectively. The two strains that reached the highest DCW, PC-3 
and WC-2b, were the lowest in Nile Red fluorescence.  
 
Figure 3.7 Average final Nile Red fluorescence for each of the eleven strains (with and without 
sodium bicarbonate addition). The error bars represent 95% confidence intervals. 
 
Among these strains, WC-5 grew the fastest with a doubling time of 16.00 ± 1.48 h for the 
air-only condition, compared to 25.08 ± 2.42 h for UTEX 395 (Figure 3.8A). However, both 
strains were similar in their final cell concentrations, with WC-5 reaching 7.75 x 10^7 ± 6.34 x 
10^6 cells mL-1 and UTEX 395 reaching 7.66 x 10^7 ± 8.64 x 10^6 cells mL-1. Sodium 
bicarbonate was added at the beginning of the light cycle following a 10-hour dark period. 
Because photosynthesis did not occur during the dark period, dissolved CO2 accumulated in the 
growth medium, which can be observed as a drop in pH at 6.4, 7.4, and 8.4 days (Figure 3.8B). 
The lower pH at the beginning of the light cycle compared to the normal sampling time at the end 
of the light cycle is due to the accumulation of H2CO3 in the growth medium.115 Following this 
measurement, sodium bicarbonate was added. Nitrate was verified to be present in each of the 
samples just before sodium bicarbonate addition on day 7.4 (Figure 3.8C).  
Among the highest lipid producers, the strain that reached the highest Nile Red 
fluorescence intensity was WC-5 (Figure 3.8D). With the addition of sodium bicarbonate, UTEX 
395 showed a 65% increase in Nile Red fluorescence above the air-only condition. Nile Red 
fluorescence started to increase on day 6 for WC-5, at 4213.33 ± 495.36 rfu (replicate 1) and 
3116.67 ± 1580.67 rfu (replicate 2), and on day 7.4 for UTEX 395, at 11263.33 ± 763.83 rfu 
(replicate 1) and 11036.67 ± 436.26 rfu (replicate 2), which coincided with pH stress and nitrate 
limitation. 
 
[Figure 3.8: panels plotted against time (days): (A) cell concentration (cells mL-1), (B) pH, 
(C) nitrate (mg L-1), and (D) Nile Red fluorescence (rfu).] 
 
Figure 3.8 Growth and lipid accumulation observations for the cultures with the highest lipid 
production, WC-5 (circles) and UTEX 395 (triangles), with and without sodium bicarbonate 
addition: cell concentration (A), pH (B), medium nitrate concentration (C), and Nile Red 
fluorescence (D). The error bars represent 95% confidence intervals. The open and filled symbols 
represent air only and sodium bicarbonate addition, respectively, for all conditions. 
Discussion 
As more strains are isolated, potentially novel properties may be identified such as lipid 
accumulation abilities in specialized environments and high-value coproducts. Eleven green 
algae strains were screened for their potential use in biodiesel production. Strains were grouped 
based on three properties: fastest growth, highest biomass yield, and highest lipid production. 
Growth rate 
Of the eleven strains evaluated, the three fastest growing strains were WC-5 (Chlorella), 
WC-1 (Tetradesmus), and WC-2b (Botryococcus). Although some Botryococcus spp. are 
known to accumulate high lipid concentrations,136-138 WC-2b was among the lowest strains 
in Nile Red fluorescence. Fast growing strains are more likely to be productive in outdoor 
raceway ponds because they can outcompete non-algal organisms for nutrients in the growth 
medium. Fast growing strains may also be productive for biomass and/or lipid production. 
Identification of high biomass producers and lipid producers can determine which strains should 
be used for certain applications, such as wastewater remediation or biodiesel production.  
Biomass production 
Based on doubling time and DCW, the best strain for biomass production characterized 
here was PC-3. This strain may be promising in applications where biomass production is 
combined with nutrient removal in wastewater treatment and bioremediation of NO3- and 
PO4^3-.139-143 Eustance et al. (2013) indicated that two green algae strains, Scenedesmus 
obliquus and Monoraphidium sp., were able to utilize ammonia, nitrate or urea while 
concomitantly producing biodiesel and could potentially be used in wastewater 
remediation.144-147 Further, microalgae have been studied for remediation of nutrient-rich swine 
and cattle waste or municipal wastewater, eliminating ammonia and phosphate, followed by 
neutral lipid accumulation for biofuels.144-146, 148-150 One study used a mixed culture composed of 
Scenedesmus, Actinastrum, Chlorella, Spirogyra, Micractinium, Golenkinia, Chlorococcum, 
Closterium, and Nitzschia for the treatment of dairy and municipal wastewater in bench-scale 
reactors. While dairy wastewater resulted in a lipid productivity of 17 mg/L/day, the municipal 
wastewater reactors resulted in 24 mg/L/day, indicating that these mixed cultures were more 
efficient at assimilation of ammonia and phosphate into their biomass than production of 
lipids.145  
Lipid production 
When considering doubling time and lipid accumulation, the two best strains for 
biodiesel production observed here were WC-1 (Tetradesmus) and WC-5 (Chlorella). The two 
highest lipid-producing strains were WC-5 (Chlorella) and UTEX 395 (Chlorella). UTEX 395 
was included as a control since it has previously been shown to be an oleaginous algae 
strain.151, 152 With fast doubling times, these two strains have a higher potential to outcompete 
contaminating, non-biodiesel-producing microorganisms. Algae strains with fast growth rates 
and high lipid content have the greatest potential to produce biodiesel.  
WC-1 was previously identified as Scenedesmus obliquus,115 and Gardner et al. (2012) 
showed WC-1 had a doubling time of 21 hours in an air-only condition, which is similar to the 
doubling time reported here at 19.81 ± 7.19 hours.15 The WC-1 DCWs (0.99 ± 0.07 g L-1; 
α = 0.05) were most similar to the DCWs reported by Gardner et al. (2012) (1.1 ± 0.1 g L-1; 
α = 0.05), which used the same 1.25 L photobioreactors and aeration, allowing greater 
availability of CO2 to the cells compared to algae grown in flasks.107, 115 
When screening for promising strains for lipid production, it can be important to 
determine which strains are responsive to sodium bicarbonate and which are not.69,92 Bicarbonate 
addition buffers the pH and provides higher concentrations of inorganic carbon necessary to 
make lipids. This combined with nitrate depletion, which reduces new biomass growth, can often 
lead to higher rates and extents of lipid production.49, 69, 70, 133 Not all algae increase lipid 
production with added sodium bicarbonate. In this study, there were four strains that did not 
respond to sodium bicarbonate addition. Gardner et al. (2012) found that when sodium 
bicarbonate was added to cultures after nitrate depletion, there was an overall loss in neutral 
lipids, whereas bicarbonate addition just before nitrate depletion increased lipid production 
significantly. Here, nitrate concentrations were measured and verified to be present in all strains 
prior to sodium bicarbonate addition. Strain WC-1 was previously reported to increase its neutral 
lipid content as a response to sodium bicarbonate addition.69 Here, nitrate was present in the 
WC-1 medium, but at a low concentration (4.3 ± 2.5 mg L-1). It is possible that WC-1 has a minimum 
medium nitrate concentration required for sodium bicarbonate addition to result in increased 
lipid content, or perhaps the nitrate was depleted completely between the time of nitrate 
measurement and the sodium bicarbonate addition. Further work is needed to elucidate when and 
why some strains increase lipid production with sodium bicarbonate addition and others do not.  
It was hypothesized that strains with low doubling times would have high DCWs. When 
strains grow faster, they often accumulate more cells and ultimately more biomass. However, a 
linear regression showed no correlation between doubling time and DCW across strains (Figure 
3.9). Similarly, it was hypothesized that there would be an inverse relationship between growth 
rate and lipid concentration, as indicated by Nile Red fluorescence. When strains grow more 
slowly, the fixed carbon may be stored as long-chain fatty acids rather than used to make new 
cells. However, the 11 algae strains characterized here showed an R² of 0.0207, indicating a lack 
of correlation between doubling time and Nile Red fluorescence (Figure 3.10). 
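The regression statistics quoted here (slope, intercept and R²) can be reproduced with an ordinary least-squares fit. The sketch below is a generic implementation, not the software used for Figures 3.9 and 3.10; the function name is hypothetical and the data points are arbitrary values chosen only to illustrate a weak correlation.

```python
def linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept, with the coefficient of determination."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Arbitrary illustrative points; not data from this study.
x_vals = [1.0, 2.0, 3.0, 4.0]
y_vals = [2.0, 1.0, 4.0, 3.0]
slope, intercept, r_squared = linear_regression(x_vals, y_vals)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

An R² near zero, as reported for both regressions in this chapter, means the fitted line explains almost none of the variance between strains.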
 
[Figure 3.9: scatter plot of doubling time (h) against DCW (g/L) for the air-only condition, with 
linear fit y = 7.0193x + 16.205, R² = 0.0625.] 
 
Figure 3.9 Linear regression of the doubling times and DCW for the 9 YNP green algae strains 
and 2 industrial strains. The R² for the linear regression was 0.0625, indicating that there was not 
a correlation between doubling time and DCW.  
[Figure 3.10: scatter plot of Nile Red fluorescence (rfu) against doubling time (h) for the 
air-only condition, with linear fit y = -224.34x + 15377, R² = 0.0207.] 
 
Figure 3.10 Linear regression of the doubling times and Nile Red fluorescence for the 9 YNP 
green algae strains and 2 culture collection strains. The R² for the linear regression was 0.0207, 
indicating that there was not a correlation between doubling time and Nile Red fluorescence. 
The benefits of using extremophilic algae, especially alkaliphilic strains, are numerous. In 
unbuffered systems, algae fix the acidic forms of dissolved inorganic carbon such as carbonic 
acid, driving the pH of the system to alkaline conditions.115 Algae that are adapted to alkaline 
conditions can withstand pH ≥ 8.5.101 Once the pH increases such that the DIC is limited to the 
alkaline CO3^2-, it becomes harder even for algae adapted to alkaline conditions to fix it, so 
they stop fixing DIC and the pH drops as CO2 accumulates in the growth medium. At pH 
between 10.5 and 11, a pH-induced lipid accumulation “stress” can sometimes be observed.115 At 
pH > 8, there is an increased flux of CO2 into the growth medium, increasing the inorganic 
carbon available for synthesis of carbon-rich molecules such as TAGs and 
cellulose.49, 70, 107, 133, 144, 153  
 
Summary & Conclusions 
In the work described here, 9 newly isolated extremophilic algal strains and 2 culture 
collection strains were compared and characterized for their potential for biomass and/or biofuel 
production. The results obtained represent initial detailed characterization, but not optimized 
growth conditions. Parameters such as temperature, pH, light intensity, and growth medium 
composition could further be optimized for each strain and will potentially improve growth rates, 
biomass yields, and the rate and extent of lipid accumulation.  
Among the eleven strains tested, three were found to grow rapidly: WC-1, WC-5 and 
WC-2b. Two strains were identified as producing the highest DCW: PC-3 and WC-2b. Two 
strains were distinctly higher in lipid accumulation than the other nine: the two Chlorella strains, 
WC-5 and UTEX 395. No single strain performed well in all three categories; however, two 
strains performed well in two categories: WC-5 performed well for growth and lipid 
accumulation, and WC-2b had fast growth rates and high DCW. Based on these results, of the 
11 strains tested, WC-5 and WC-2b are the best potential strains for further optimization of 
lipid production for biodiesel and of biomass production, respectively. The 9 green algae strains 
collected from one extremophilic stream differed significantly in growth rate, biomass 
production and lipid accumulation. 
 
  
CHAPTER FOUR 
DRAFT GENOME FOR A NOVEL, EXTREMOPHILIC, 
FRESHWATER DIATOM 
Contribution of Authors and Co-Authors 
Manuscript in Chapter 4 
 
Author: Karen M. Moll 
 
Contributions: Performed data analysis, and writing. 
 
Co-Author: Thiruvarangan Ramaraj 
 
Contributions: Mentored data analysis 
 
Co-Author: Nicholas P. Devitt 
 
Contributions: Mentored data analysis 
 
Co-Author: Connor Cameron 
 
Contributions: Mentored data analysis 
 
Co-Author: Matthew Fields 
 
Contributions: Contributed to initial sequencing design and manuscript review 
 
Co-Author: Joann Mudge 
 
Contributions: Discussed results, mentored data analysis, manuscript review and revision.  
 
Co-Author: Brent M. Peyton 
 
Contributions: Discussed results, manuscript review and revision.  
 
 
 
Manuscript Information 
Karen M. Moll, Thiruvarangan Ramaraj, Nicholas P. Devitt, Connor Cameron, Matthew Fields, 
Joann Mudge, Brent M. Peyton 
 
BMC Genomics 
 
Status of Manuscript:  
__x__ Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
____ Published in a peer-reviewed journal 
 
 
 
 
  
Abstract 
  Diatoms are known for atmospheric CO2 remediation and decreasing ocean acidification. 
With changes in atmospheric CO2, it is important to characterize diatom genomes as they are 
integral to understanding global carbon fixation, with potential for photosynthetic biodiesel 
production. Isolated from an alkaline stream (pH 9.3) in Yellowstone National Park (YNP), 
diatom strain RGd-1 has been shown in previous work to yield lipid concentrations up to 30-40% 
(w/w) triacylglycerol (TAG) and 70-80% (w/w) fatty acid methyl esters (FAMEs) that can be 
transesterified into biodiesel for biofuel applications. Here we report the 24 Mb draft genome for 
RGd-1, an extremophilic, freshwater, pennate diatom. RGd-1 was found to align best to the 
centric diatom, Thalassiosira pseudonana and Phaeodactylum tricornutum on a nucleotide and 
protein level, respectively. A de novo transcriptome assembly was used to annotate the RGd-1 
genome assembly. RGd-1 was shown to have a nearly complete glyoxylate pathway that could 
be used as a carbon conservation strategy to accumulate high concentrations of neutral lipids. As 
part of the RGd-1 whole-genome sequencing project, we assembled an associated, novel 3.1 Mb 
Brevundimonas sp. genome. Nine major bacterial OTUs were found in the RGd-1 culture 
through 16S amplification and sequencing. Of those strains, seven may produce iron chelating 
siderophores, which could make iron biologically available to RGd-1 in an alkaline environment.  
 
 
 
 
Introduction 
Diatoms, unicellular, photosynthetic algae with siliceous cell walls, are critical to 
ecosystem health at a global scale and have the capacity to help buffer climate change. Producing 
at least 25% of Earth’s atmospheric oxygen, diatoms also fix ~25–45% of global CO2 directly 
mitigating the major driver behind climate change.12 However, diatoms are not immune to the 
effects of climate change, manifested in aquatic systems as rapid temperature fluctuations and 
acidification. As a result, a drastic global decline in diatoms has recently been observed that may 
directly reduce oxygen contributions and carbon fixation, further compounding climate change.12  
 As aquatic phytoplankton (class Bacillariophyceae) that contribute to primary production, 
diatoms fix dissolved CO2 as a necessary requirement for growth. The innate ability of diatoms 
to sequester dissolved CO2 and incorporate the carbon into either biomass, starch, or lipids, 
makes these unicellular microorganisms suitable candidates for large-scale cultivation in terms 
of CO2 utilization and valuable by-products. Once diatoms die, their siliceous cell walls sink to 
the bottom of the ocean, thus removing CO2 from the atmosphere and adding a carbon source for 
the utilization of benthic organisms.154, 155 They are especially promising given their potential to 
outcompete other phytoplankton based on silicate availability.154 Accordingly, diatoms may be 
viewed as natural sources of carbon sequestration.154   
While diatoms have been largely overlooked in biofuel research, which has primarily 
focused on green algae, there are many good reasons to consider diatoms for biofuel research and 
potential production. As a near carbon-neutral technology, using phototrophs for biofuel 
production would limit additional CO2 emissions while helping to meet transportation fuel 
demands. Further, in addition to high lipid accumulation, diatoms can amass high concentrations 
of other carbonaceous compounds useful for the production of renewable fuels and high-value 
co-products (e.g., chrysolaminarin and fucoxanthin).15, 156 With these attributes, diatoms can help
recycle atmospheric carbon dioxide while producing biofuels, thereby contributing to improved 
environmental health. Understanding the genetic potential of these understudied organisms is 
now more critical than ever. Characterization of the genome sequence has revealed important 
information on the metabolic capacity that provides an important foundation for monitoring or 
manipulating biofuel production capabilities. 
RGd-1, a pennate diatom most closely related to Lemnicola hungarica based on SSU 
rDNA (18S), was isolated from Witch Creek, an alkaline (pH 9.3), thermally-impacted, arsenic-
containing (300 ppb) creek in Yellowstone National Park (YNP).49 Previously, RGd-1 was 
shown to accumulate very high concentrations of triacylglycerol (TAG) 30–40% wt/wt and fatty 
acid methyl ester (FAME) 70–80% wt/wt, making it a promising candidate for biofuel 
production.49  
Until recently, most diatom biofuel research has been performed on axenic strains.10, 11, 
157, 158 Recent research has shown there are benefits for established relationships between bacteria 
and diatoms.44, 159, 160 While some bacteria associated with diatoms have been identified, specific 
interactions are much less well defined44 with the literature being particularly scarce for 
microbiomes of non-marine diatoms. Bacteria attached to diatom frustules have been found to 
exchange substances such as vitamins, indole, and organic carbon compounds with their 
associated diatom and to receive protection from predators.161-164 Bacteria attached to diatoms 
have been hypothesized to promote organic matter degradation thus increasing community 
biomass due to cross-feeding and increasing sedimentation, which ultimately provides organic 
carbon to benthic communities.165, 166 Additionally, while diatoms do not produce siderophores 
(iron-chelating molecules that bind iron in bioavailable forms) it is thought that siderophore-
producing bacteria may provide bioavailable iron to diatoms, a mechanism that is particularly 
important in alkaline environments where bioavailable iron is scarce.44, 159  
The association of bacteria with diatoms can be thought of as a “diatom microbiome”. 
The bacteria reside in the “phycosphere,” defined by Bell and Mitchell as a region around an 
algal cell surrounded by bacteria.42 This is an important concept in the study of algae and is 
becoming increasingly appreciated. Lending support to the importance of the diatom 
phycosphere, several marine diatoms have been observed to harbor distinct bacterial 
communities.44, 161, 167 For instance, bacteria attached to Thalassiosira rotula and Skeletonema
costatum have been found to consistently be members of the Flavobacter-Sphingobacteria
groups within the Bacteroidetes phylum, and the unattached bacteria are commonly in the
Roseobacter group within α-Proteobacteria.161 As an extremophilic, freshwater diatom, RGd-1
was found to contain bacteria remarkably different from those that have been associated with 
marine diatoms.  
Given the unique environment of RGd-1 and the lack of available diatom genomes, the 
RGd-1 diatom was selected for whole-genome sequencing to obtain insight into unique 
characteristics and its ability to accumulate high concentrations of lipids.49 Further, it was 
possible to gain insight into how a freshwater diatom adapted to extreme conditions of
temperature, pH, and metals. Here, we present the draft genome assembly for the extremophilic, 
freshwater diatom, RGd-1, and associated bacteria. 
 
 
 
Methods 
DNA Extraction 
RGd-1 high molecular weight DNA was extracted using a cetyl trimethyl ammonium 
bromide (CTAB) DNA extraction method (see Appendix B)116 and amplified according to the 
JGI DNA extraction protocol.168 The RGd-1 culture was unialgal, but even after antibiotic 
treatments with Ampicillin (1000 mg L-1), a beta-lactam antibiotic that inhibits cell wall 
synthesis,169, 170 and Imipenem (5 mg L-1), a fluoroquinolone that inhibits prokaryotic DNA 
gyrase (topoisomerase),171 the culture still contained bacteria.  
Whole-genome Sequencing 
Illumina 
Illumina® (San Diego, CA) HiSeq 2000 2x50 reads were sequenced at the National
Center for Genome Resources (Santa Fe, NM). TruSeq DNA libraries were prepared using a 300 
bp insert size. A total of 84,132,116 paired-end, 2x50 reads were generated for a total of 366x 
coverage. 
PacBio 
Long reads were obtained using the Pacific Biosciences (Menlo Park, CA) sequencing
technology at the National Center for Genome Resources. A total of 11 single-molecule real-time
(SMRT) cells were used for sequencing. Seven SMRT cells were run on the Real-time
Sequencing (RS) I machine and four SMRT cells were run on the RS II machine. The first
nine SMRT cells used chemistry 2 polymerase 4 (C2P4) (v. 1.0) and the remaining three were run
with C4P6 chemistry (6-hour movies) to improve read length and coverage (v. 1.5). BluePippin
size selection at 10 kb lengths was used for sample preparation prior to sequencing.172 The
PacBio sequencing produced a total of 226,481,107 raw reads and 206,073,836 filtered reads
yielding 1,723,858,582 bp for a total of 74X coverage.
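
The reported fold coverage follows from dividing the number of sequenced bases by the genome length. The short Python sketch below illustrates the arithmetic for the PacBio data. It assumes the v. 1.0 assembly length (23,268,988 bp, Table 4.1) as the genome size; the function and variable names are illustrative and are not part of any tool used in this work.

```python
def fold_coverage(total_bases: int, genome_length: int) -> float:
    """Return mean fold coverage: sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


# Values reported in the text. The genome length is an assumption:
# the v. 1.0 assembly length from Table 4.1 is used as the genome size.
PACBIO_BASES = 1_723_858_582
ASSEMBLY_LENGTH = 23_268_988

pacbio_coverage = fold_coverage(PACBIO_BASES, ASSEMBLY_LENGTH)
print(f"PacBio coverage: {pacbio_coverage:.1f}X")
```

Under this assumption the result rounds to the 74X stated above.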
Assembly Methods 
The PacBio reads were assembled using the de Bruijn graph to overlap consensus 
(DBG2OLC) read assembler,173 which is a hybrid assembler that uses Illumina reads to error-
correct longer PacBio reads, which are then assembled. The PacBio reads were also assembled 
separately, using Falcon174 (RGd-1 v. 1.0) or Canu (RGd-1 v. 1.5).175 The DBG2OLC assembly 
was then used as “trusted contigs” in SPAdes with the Illumina reads to error-correct. The 
SPAdes assembly was then filtered for contigs ≥ 1000 bp,176 after which CAP3 was used to 
improve scaffolding177 and as a clustering algorithm; and CD-HIT-EST was used to remove 
redundancies.125 Finally, GapCloser (part of the SOAPdenovo package) was used to fill gaps
present in the scaffolds (v. 1.0 and v. 1.5).178 An additional assembly incorporated a small
percentage of long PacBio reads that were used for scaffolding (v. 1.5). For this alternative
assembly, v. 1.5, the v. 1.0 assembly was used as “trusted contigs” in SPAdes, and all other
downstream steps remained the same for the two assemblies.
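
The contig length filter described above can be sketched in plain Python. This is an illustration of the filtering step only, not the command used in this work; the function names and the example records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]],
                   min_length: int = 1000) -> Dict[str, str]:
    """Keep only contigs whose sequence length is >= min_length."""
    return {name: seq for name, seq in records if len(seq) >= min_length}


# Hypothetical two-contig input: one of 1,200 bp and one of 999 bp.
example = [">contig_1", "A" * 600, "C" * 600, ">contig_2", "G" * 999]
kept = filter_contigs(parse_fasta(example), min_length=1000)
print(sorted(kept))
```

Only the 1,200 bp contig passes the 1,000 bp threshold in this toy input.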
BUSCO  
Benchmarking Universal Single Copy Orthologs (BUSCO) software provides a 
quantitative assessment of genome assemblies based on orthologs selected from OrthoDB v.9, a 
catalog of orthologous protein-coding genes for vertebrates, arthropods, fungi, plants, and 
bacteria.179 Assembly assessments were performed using the eukaryote lineage within BUSCO 3, 
which contains 303 genes that are present in at least 90% of the eukaryotic species used to 
assemble the database, though other lineages were also tried.31 BUSCOs were identified by 
tBLASTn searches that align proteins against translated DNA using the basic local alignment 
search tool (BLAST) followed by Augustus gene predictions180, 181, and classified into lineage-
specific matches using the HMMER program within the BUSCO package.  
Read Alignments and Validation 
Illumina 2x50 reads were aligned to the genome assembly using the Burrows-Wheeler 
Aligner (BWA) algorithm (version 0.7.12) with a maximum of two paths and sequence 
alignment map (sam) output format to determine how well the genome was assembled.182 
Structural Annotation 
The MAKER genome annotation pipeline was used for structural annotation of the RGd-
1 genome.183-187 The MAKER pipeline aligns the provided ESTs to the genome and creates ab 
initio gene predictions with SNAP188 and Augustus180 using evidence-based quality values. 
Following the completion of MAKER, the files were combined using fasta_merge and 
gff3_merge that are included as part of the MAKER package to obtain cumulative fasta and gff 
files that represent the annotated genes in the genome assembly.  
Functional Annotation 
The transcriptome assembly was used as EST evidence within MAKER.186, 187 The
transcriptome was filtered for reads ≥ 500 bp. The resulting proteins were used as input for the 
web-based, functional annotation program, GENSAS (v. 6.0).189  The MAKER predicted 
proteins were used as input into KeggMapper for annotation of metabolic pathways.190  
 
 
K-mer Analysis 
A k-mer sweep was performed on the Illumina 2x50 nt short-reads using KAT.33, 191 KAT 
is a reference-free toolkit that uses k-mer frequencies and GC content to assess the quality of 
genome assemblies.34 
Concatenated Protein Phylogenetic Tree  
A concatenated protein tree was created to more accurately determine the phylogeny of 
RGd-1 through modification of the ezTree pipeline.192 Protein sequences of 18 publicly available 
algal genomes were downloaded from the JGI (https://genome.jgi.doe.gov) and Ensembl
Genomes portals (http://ensemblgenomes.org). The ezTree pipeline was modified to skip the gene
prediction step, and the single-copy marker proteins that were shared among all genomes were
identified with the reference database, Pfam v32.0193 (Supplemental Table B.3). These proteins
were aligned individually with MAFFT L-INS-i194 and trimmed with trimAl using a 75% gap
threshold.195 The trimmed proteins were concatenated to generate the final alignment file, and the
resulting tree was viewed with FigTree.196
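
The concatenation step can be illustrated with a short Python sketch. It assumes each marker alignment is available as a mapping from taxon name to an aligned sequence of equal length; it is not the ezTree implementation, and the toy sequences are invented for illustration.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-marker alignments into one concatenated row per taxon.

    Only taxa present in every alignment are kept, mirroring the use of
    marker proteins shared among all genomes.
    """
    if not alignments:
        return {}
    shared = set(alignments[0])
    for aln in alignments[1:]:
        shared &= set(aln)
    for aln in alignments:
        # Every row of one alignment must have the same number of columns.
        if len({len(aln[taxon]) for taxon in shared}) > 1:
            raise ValueError("sequences within an alignment must have equal length")
    return {taxon: "".join(aln[taxon] for aln in alignments)
            for taxon in sorted(shared)}


# Invented toy alignments for two hypothetical marker proteins.
marker_a = {"RGd-1": "MK-L", "P_tricornutum": "MKAL", "T_pseudonana": "MRAL"}
marker_b = {"RGd-1": "GT", "P_tricornutum": "GS", "T_pseudonana": "-S"}
supermatrix = concatenate_alignments([marker_a, marker_b])
for taxon, row in supermatrix.items():
    print(taxon, row)
```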
16S Amplified Sequencing and Analysis 
For the associated bacteria, 16S amplified sequencing was performed after the antibiotic
treatments. The SSU rRNA gene was amplified using FD1 (5′-AGAGTTTGATCCTGGCTCAG-3′)
and 529R (5′-CGCGGCTGCTGGCAC-3′), targeting the V1-V3 region of bacteria.122, 197
High-quality amplicons were sequenced on the Illumina® MiSeq platform (Illumina®, San
Diego, CA) according to the manufacturer’s procedure for paired-end sequencing with 300 
cycles.198 Two PCR reactions were performed. The first PCR reaction was performed to amplify 
the region of interest after which amplicons were purified with Ampure beads. A second PCR 
was performed to index each sample to ensure sample identification for analyses. Indexed 
samples were again purified with Ampure beads and target concentrations were determined with 
PicoGreen using 480 nm/520 nm excitation/emission filters (Quant-iT, Invitrogen, Carlsbad, CA,
USA) and measured on a fluorescent plate reader (BioTek, Synergy H1 Hybrid Reader). Final 
sample concentrations were normalized so that the same concentration of each sample was 
added. The PhiX control library was added at 10% of the sample DNA concentration. The 
samples and control library were pooled and sequenced.198 Reads were analyzed
using CLARK, a k-mer-based clustering algorithm,199 with the Ribosomal Database Project (RDP)
16S SSU database for identification.200
RNA Sequencing and Transcriptome Assembly 
RNA from 9 discrete RGd-1 samples from three different experimental conditions, each
run in biological triplicate, was extracted and purified using the NEB Monarch Total RNA
Miniprep Kit, according to the manufacturer’s instructions. The samples of extracted mRNA
were submitted for strand-specific RNA sequencing. Illumina® (San Diego, CA) HiSeq 5000
2x150, single-index reads were sequenced at Genewiz (South Plainfield, NJ). TruSeq Stranded
libraries were prepared using a 300-400 bp insert size. PolyA selection was used for the removal
of rRNA. A total of 424,066,367 paired-end reads were generated, with a yield of 127,218 Mb and
a mean quality score of 38.05 (provided by the Genewiz NGS Data Report).
Transcript Alignment and Assembly 
 Reads were aligned with the strand-aware aligner, HISAT2, to determine how well
the reads aligned against the RGd-1 assembly.201 The reads were trimmed using the
Trimmomatic function within the Trinity assembler, after which the reads were assembled in
de novo mode and filtered for lengths ≥ 500 bp.184, 185, 202 To determine the best transcriptome
assembly method, the trimmed reads were also assembled against RGd-1 v. 1.0 in
genome-guided mode within Trinity.202
Results 
The RGd-1 genome assembly was created using 366X coverage of Illumina 2x50 reads
and 74X coverage of PacBio reads. As shown in Table 4.1, the RGd-1 genome assembly 
consisted of 520 contigs with a total length of 23.3 Mb. The longest contig was 372,285 bp with 
a contig N50 of 102,387 bp. Using the genome annotation tool, MAKER, the RGd-1 genome 
assembly was found to have 13,422 gene models. 
 
Table 4.1 Genome assembly statistics for two RGd-1 assembly versions, v. 1.0 and v. 1.5 (with a 
small percentage of additional long PacBio reads). 
Statistic                   RGd-1 (v. 1.0)   RGd-1 (v. 1.5)
Number of contigs           520              1,537
Genome length (bp)          23,268,988       27,896,030
Contig N50 (bp)             102,387          37,952
Maximum contig size (bp)    372,285          243,262
Minimum contig size (bp)    1,041            1,294
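
Contig N50, as reported in Table 4.1, is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The Python sketch below shows the calculation on invented toy lengths; it is an illustration of the standard definition, not the script used to produce the table.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Invented toy contig lengths (total = 32, so half = 16).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because N50 depends on the length distribution, splitting an assembly into more, shorter contigs lowers N50 even when the total length increases.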
 
Assembly Comparison 
  Two different RGd-1 draft genome assemblies were compared to determine the best 
quality assembly (v. 1.0) to proceed with further analysis (Table 4.1). The primary difference 
between the two assemblies was the incorporation of additional C4P6 PacBio reads. However, 
the majority of the PacBio reads from the C4P6 PacBio sequencing project were bacterial, and 
only approximately 5% of the total reads were assigned to class Bacillariophyceae to which 
diatoms belong. The diatom reads were used for scaffolding. However, all attempts to include 
the small fraction of long, diatom PacBio reads resulted in three times the number of contigs and 
approximately 1/3 the contig N50 (v. 1.5) compared to the original assembly v. 1.0 (Table 4.1)
despite there being a larger number of long contigs (Figure 4.1).   
 
 
Figure 4.1 Comparison of two draft RGd-1 genome assemblies, v. 1.0 and v. 1.5. The difference 
between the two assemblies was the inclusion of an additional small PacBio dataset. This figure 
was generated using MultiQC.203 
Gene Space Completeness 
To assess genome assembly completeness, or how well the genome was constructed in 
terms of genes captured, the assembly was evaluated with BUSCO. BUSCO assumes the 
orthologous groups it uses will be identified in 90% of the organisms within that lineage. Given 
the primary and secondary endosymbiotic events that have occurred within the diatom group,204 
many genes in either the protist or eukaryota lineage may have diverged enough that they were 
not identified as complete, partial, or fragmented genes, but were instead grouped by BUSCO 
into the “missing genes” category. The RGd-1 assembly was analyzed using the eukaryota,
protist, alveolata/stramenopiles, chlorophyta, and embryophyta lineages (Table 4.2). The best
results were generated when BUSCO was run with the eukaryota lineage, which was then used for
further comparison. With the eukaryota lineage, 57.8% of orthologs were single copy, 1.0% were
duplicated, 6.6% were fragmented, and 34.6% were missing (Table 4.3). The large percentage of missing
orthologs may be due to the incomplete genome assembly but may also reflect evolutionary 
distance from the gene models included in the eukaryota lineage. In other words, some of the 
missing orthologs may be present but have evolved enough to not be recognized. 
 
Table 4.2 The RGd-1 v.1.0 genome assembly was analyzed using five BUSCO lineages, 
Eukaryota, Protists, Alveolata/Stramenopiles, Chlorophyta, and Embryophyta. The gene capture 
percentage was measured as a fraction of the total number of searchable BUSCOs identified in 
the assemblies. 
BUSCO type                 Eukaryota   Protists    Alveolata/Stramenopiles   Chlorophyta   Embryophyta
                           (odb9)      (ensembl)   (ensembl)                 (odb10)       (odb9)
Complete                   58.8%       46.1%       26.1%                     21.9%         7.5%
Complete-single copy       57.8%       45.6%       26.1%                     21.0%         6.9%
Complete-duplicated        1.0%        0.5%        0.0%                      0.9%          0.6%
Fragmented                 6.6%        0.5%        0.4%                      3.5%          0.8%
Missing                    34.6%       53.4%       73.5%                     74.6%         91.7%
Total orthologs searched   303         215         234                       2168          1440
 
 
 
 
Table 4.3 Gene capture measured by BUSCO. A total of 303 BUSCOs were searched within the 
eukaryota lineage. 
BUSCO type Number of BUSCOs found % BUSCOs found 
Complete-single copy 175 57.8% 
Complete-duplicated 3 1.0% 
Fragmented 20 6.6% 
Missing 105 34.6%
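
The percentages in Table 4.3 follow directly from the counts. The Python sketch below assumes each category is rounded to one decimal place and that the missing fraction is taken as the remainder so that the categories sum to 100%; this convention is an assumption made for illustration and is not necessarily how BUSCO computes its summary.

```python
def busco_percentages(single, duplicated, fragmented, total):
    """Convert BUSCO category counts into percentages of the searched set."""
    pct = {
        "single": round(100 * single / total, 1),
        "duplicated": round(100 * duplicated / total, 1),
        "fragmented": round(100 * fragmented / total, 1),
    }
    pct["complete"] = round(pct["single"] + pct["duplicated"], 1)
    # Assumption: missing is the remainder, so the categories sum to 100.
    pct["missing"] = round(100 - pct["complete"] - pct["fragmented"], 1)
    return pct


# Counts from Table 4.3 (eukaryota lineage, 303 BUSCOs searched).
rgd1 = busco_percentages(single=175, duplicated=3, fragmented=20, total=303)
print(rgd1)
```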
 
To distinguish whether missing orthologs were due to an incomplete
genome assembly or to evolutionary distance, the P. tricornutum genome assembly was analyzed
with the eukaryota lineage. P. tricornutum has a high-quality genome assembly in which the number of
scaffolds (33) is equivalent to the number of chromosomes35 and most genes are expected to be
captured. We would, therefore, expect orthologs labeled as missing to be present but evolved
beyond recognition. P. tricornutum generated 75.0% complete orthologs, 4.6% fragmented, and
20.4% missing (Table 4.4). While the percentage of complete orthologs is approximately 16
percentage points higher for P. tricornutum than RGd-1, indicating that there is room for
assembly improvement, there are still 20% missing orthologs in the P. tricornutum genome
assembly, indicating that the eukaryota BUSCO lineage does not capture unique features within
the diatom lineage.
To determine how well BUSCO captures diatom genes across different genomes, four
additional publicly available diatom genomes were also evaluated. Like P. tricornutum, T.
pseudonana had a high percentage of captured complete single-copy orthologs (65.7%). C. cryptica,
P. multiseries, and F. cylindrus had successively lower percentages of complete single-copy
orthologs at 61.1%, 59.4%, and 40.6%, respectively (Table 4.4). The minimum percentage of missing
orthologs was 20.4% for P. tricornutum, which represents the “best-case scenario” of the
available diatom genome assemblies since it is regularly updated in Ensembl. To reiterate, the
comparison of BUSCO results across six genome assemblies indicates that at least 20% of the 303
genes present in 90% of the species within the eukaryota lineage are divergent in the diatom
lineage. If we expect to capture at best 75.0% of the diatom genes, the recalculated BUSCO scores
for RGd-1 are 72.6% single copy, 1.2% duplicate, and 8.6% fragmented orthologs within the
eukaryota lineage.
 
Table 4.4 RGd-1 v.1.0 genome assemblies were compared to the other publicly available diatom 
genome assemblies, P. tricornutum, T. pseudonana, C. cryptica, P. multiseries and, F. cylindrus. 
The gene capture percent was measured as a fraction of the total number of searchable BUSCOs 
identified in the assemblies. A total of 303 BUSCOs were searched within the eukaryota lineage. 
BUSCO type             RGd-1    P.            T.           C.         P.            F.
                       v.1.0    tricornutum   pseudonana   cryptica   multiseries   cylindrus
Complete-single copy   57.8%    73.3%         65.7%        61.1%      59.4%         40.6%
Complete-duplicated    1.0%     1.7%          1.7%         5.6%       1.0%          65.7%
Fragmented             6.6%     4.6%          7.9%         7.3%       6.3%          61.1%
Missing                34.6%    20.4%         24.0%        26.0%      33.3%         59.4%

Transcriptome Assembly 
 The raw reads aligned to the RGd-1 genome assembly with an alignment rate of
20.96%. When the reads were trimmed, the percent alignment increased to 33.31% (Figure 4.2).
This alignment rate is low and potentially indicates the presence of bacterial contaminants, despite
the RNA depletion step completed as part of the Illumina library preparation.205, 206 Nevertheless,
the de novo transcriptome assembly captured 83.8% complete BUSCOs (27.7% single-copy, 56.1%
duplicated) with 5.0% fragmented and 11.2% missing orthologs. A total of 11,150 proteins were
identified following the GENSAS EvidenceModeler annotation function.189
  
Figure 4.2 The number of trimmed paired-end reads that aligned uniquely as pairs, had one
mate aligned uniquely, had one mate mapped to multiple locations, mapped discordantly as pairs,
mapped as pairs to multiple locations, or did not align. Reads were aligned using
HISAT2 and the figure was generated using MultiQC.203
To confirm the best method to assemble the RGd-1 transcriptome, the de novo assembly 
was compared to a reference-guided assembly using the RGd-1 v. 1.0 genome assembly.  The 
transcriptome assembly statistics in Table 4.5 show pre- and post-filtering for reads ≥ 500 bp. 
While the filtered de novo assembly had an 82% increase in the number of contigs compared to
the filtered reference-guided assembly, the de novo assembly length was nearly 50% longer,
indicating that a greater number of genes could be captured (Tables 4.5, 4.6).
 
 
 
 
 
Table 4.5 Genome assembly statistics for two RGd-1 transcriptome assembly versions, de novo 
and reference-guided using the RGd-1 v.1.0 genome assembly. Each transcriptome assembly 
shows the statistics pre- and post-filtering for reads ≥ 500 bp. 
Statistic             de novo       de novo filtered   Reference-guided   Reference-guided filtered
Number of contigs     1,875,787     78,720             222,478            42,868
Genome length         496,220,882   142,135,629        124,808,503        74,549,208
Contig N50            225           3,052              845                2,853
Maximum contig size   29,652        29,652             24,608             24,608
Minimum contig size   129           500                180                500
     
A comparison between the BUSCO results for the de novo transcriptome assembly and 
reference-guided assembly found 11.2% more complete BUSCOs for the de novo assembly. 
Furthermore, the reference-guided assembly had 10.3% more missing orthologs than the de novo 
assembly. While there are more complete-duplicated orthologs in the de novo assembly, they are 
still within the complete fraction and may represent true duplications.28 Based on the BUSCO 
results combined with the assembly statistics, the de novo transcriptome assembly was used for 
annotation.  
 
Table 4.6 Two different RGd-1 transcriptome assemblies were compared: a de novo assembly
and reference-guided assembly. The gene capture percent was measured as a fraction of the total 
number of searchable BUSCOs identified in the assemblies. A total of 303 BUSCOs were 
searched within the eukaryota lineage. 
BUSCO type de novo assembly Reference-guided assembly 
Complete total 83.8% 72.6% 
Complete-single copy 27.7% 35.3% 
Complete-duplicated 56.1% 37.3% 
Fragmented 5.0% 5.9% 
Missing 11.2% 21.5% 
 
Assembly Completeness 
To validate the RGd-1 genome assembly, raw Illumina 2x50 nt reads were aligned
against the genome assembly using the Burrows-Wheeler Aligner (BWA)207 algorithm to
estimate the percentage of the genome that was assembled. The Illumina capture percent was
74.9% across the entire assembly, indicating that approximately 25% of the reads did not align to
the genome, likely reflecting missing parts of the assembly, large repetitive regions that are not
expected to be captured by the short reads, or contaminating bacterial sequences.
A k-mer sweep can reveal insight about a genome assembly, including contamination
and genome complexity.34, 208 As shown in Figure 4.3, the sweep (performed with a k-mer length
of 27) revealed two peaks, with the coverage (the number of reads containing a specific k-mer)
of the rightmost peak approximately twice that of the left peak. The left peak
likely represents sequences from heterozygous regions in which haplotypes generated different
k-mers with half as much coverage as those in the right peak, which were likely derived from
homozygous regions (Figure 4.3).
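
To make the idea of a k-mer spectrum concrete, the Python sketch below counts k-mers in a set of reads and tabulates how many distinct k-mers occur at each multiplicity. It is a toy illustration in plain Python and does not reproduce KAT; the reads and the value of k are invented for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


# Invented reads; real analyses use millions of reads and a larger k.
toy_reads = ["ACGTACGT", "ACGTTCGT"]
spectrum = kmer_spectrum(count_kmers(toy_reads, k=4))
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

Plotting the number of distinct k-mers against multiplicity for real read data gives the kind of profile in which heterozygous and homozygous peaks can be distinguished.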
	
	
 
Figure 4.3 A k-mer sweep generated by KAT34 using a k-mer length of 27. The analysis was 
performed using the paired-end, 50 bp, Illumina reads. 
Comparative Genomics 
 The diatoms,  C. cryptica, F. cylindrus, P. multiseries,  P. tricornutum, and T. 
pseudonana were each aligned to the RGd-1 genome assembly to determine the percentage of 
nucleotides in common across their entire genomes (Table 4.7).  
 
 
 
Table 4.7 Whole-genome alignments using BWA-MEM, where the RGd-1 v. 1.0 genome
assembly was indexed and the other assembly was queried. The publicly available diatom
genomes, C. cryptica, F. cylindrus, P. multiseries, P. tricornutum, and T. pseudonana, were each
aligned to RGd-1 at the nucleotide level.
  
Genomes Aligned % Aligned 
RGd-1 v. 1.0 C. cryptica 3.29 
RGd-1 v. 1.0 F. cylindrus 13.38 
RGd-1 v. 1.0 P. multiseries 0.59 
RGd-1 v. 1.0 P. tricornutum 5.0 
RGd-1 v. 1.0 T. pseudonana 33.33 
 
There were generally low overall alignment percentages on the nucleotide level between 
RGd-1 and the marine diatoms, with the highest being T. pseudonana at 33.33% on a whole-
genome level. This is surprising given that T. pseudonana is a centric diatom and RGd-1 is a 
pennate diatom. The centric and pennate diatoms are thought to have diverged approximately 
130 million years ago during the Cretaceous Period.209 Given the extremophilic, freshwater 
habitat of RGd-1, it is possible that RGd-1 and potentially other extremophilic freshwater 
diatoms have greater divergence from the marine pennate lineage than previously recognized.  
A concatenated protein tree was used to determine phylogeny, consisting of 15 publicly
available algal genomes (1 red alga, 1 brown alga, 10 green algae, and 3 diatoms) and
RGd-1. RGd-1 was found to be most similar to P. tricornutum and P. multiseries (Figure 4.4),
which is compatible with the shared pennate morphology of RGd-1, P. tricornutum, and P.
multiseries and may be a reflection of different adaptations that have occurred as a result of
the different evolutionary pressures from their respective environments. This result differs from
the whole-genome alignment at the nucleotide level, which indicated the most shared nucleotide
space with T. pseudonana.
[Figure 4.4: concatenated protein tree. Taxa shown: Cyanidioschyzon merolae, Galdieria
sulphuraria, Ostreococcus tauri, Micromonas pusilla, Scenedesmus obliquus, Monoraphidium
neglectum, Dunaliella salina, Volvox carteri, Chlamydomonas reinhardtii, Thalassiosira
pseudonana, RGd-1, Phaeodactylum tricornutum, Pseudo-nitzschia multiseries, Fragilariopsis
cylindrus, Aureococcus anophagefferens, Nannochloropsis gaditana, Ectocarpus siliculosus,
Emiliania huxleyi. Node values: 0.6464, 0.757, 0.563, 0.7076, 0.5935, 0.6402, 0.5967, 0.5145,
0.5438, 0.4563, 0.8436, 0.507, 0.7346, 0.388, 0.7813, 0.7459. Scale bar: 0.09.]
Figure 4.4 The protein sequences from fifteen publicly available genome annotations (1 red alga,
1 brown alga, 11 green algae and 3 diatoms) and RGd-1 were used to construct a concatenated
protein tree using a modified ezTree pipeline.192 RGd-1 was phylogenetically closest to P.
tricornutum on a protein level. The proteins were aligned with MAFFT L-INS-i194 and trimmed
using trimAl.195 The scale represents the bootstrap values.
 
Figure 4.5 shows the extent to which RGd-1 and P. tricornutum are related on a 
nucleotide and protein level. The x-axes contain the P. tricornutum assembly chromosomes. The 
y-axes contain the 520 current contigs in the RGd-1 assembly. If P. tricornutum and RGd-1 were 
perfectly syntenic, the slope of the line would be 1.0. As shown in Figure 4.5, RGd-1 differs
significantly on a nucleotide level, but is more syntenic on a protein level. The lack of synteny 
between P. tricornutum and RGd-1 indicates that on a nucleic acid level, the two diatoms have 
evolutionarily diverged. However, on a protein level, the two diatoms have more conservation, 
indicating that shared proteins may have similar functions. 
[Figure 4.5 panels: nucleotides, proteins. x-axis: Phaeodactylum scaffolds (chromosomes);
y-axis: RGd-1 scaffolds.]
 
Figure 4.5 Comparison of P. tricornutum and RGd-1 assemblies based on nucleotide and amino acid
sequences. The x-axes contain the scaffolds for the P. tricornutum assembly and the y-axes contain
the 520 scaffolds for the RGd-1 genome assembly. Within the MUMmer package, PROmer translates
the nucleic acid-based assemblies into amino acids.210 Perfectly syntenic assemblies would
have a slope of 1.0.
 
RGd-1 Genome-based Metabolic Pathway Analysis 
 Annotated pathways were approximately 80% complete, as determined by the number of 
genes found within each pathway. This matches the BUSCO scores with complete total orthologs 
of 83.8% for the de novo transcriptome assembly that was used for annotation.  The missing 
genes may be explained by an incomplete assembly or because the genes are evolutionarily 
distant from currently known genes in the databases.31  
 RGd-1 was found to have a nearly complete carbon fixation pathway (Figure 4.6), with
evidence that RGd-1 uses both C4 and C3 metabolism. C4 metabolism is more energetically
expensive than C3 metabolism but is more photosynthetically efficient.60 Organisms that use C4
metabolism produce the four-carbon intermediate oxaloacetate as the primary product of carbon
fixation. C4 metabolism differs from C3 metabolism in the use of two separate intracellular
compartments, allowing for two carboxylations. There is evidence that the primary carboxylation
in T. pseudonana, P. tricornutum, and F. cylindrus occurs through the fixation of HCO3- by
phosphoenolpyruvate carboxylase (PEPCase), resulting in oxaloacetate and inorganic phosphate in
the outer compartment of the chloroplast.204 Oxaloacetate is transformed into another four-carbon
acid such as malate or aspartate and transported from the outer compartment to the inner
compartment of the chloroplast.204 The second carboxylation occurs when the CO2 is fixed by
Rubisco and the Calvin-Benson cycle. Following decarboxylation of the four-carbon intermediate,
a three-carbon intermediate such as pyruvate or alanine is released to the outer compartment of
the mitochondria. The results presented here indicate that pyruvate-phosphate dikinase (PPDK) is
used to produce phosphoenolpyruvate, with aspartate as an intermediate. Aspartate moves back into
the mitochondria where it is converted to oxaloacetate by aspartate aminotransferase.204 These
results are consistent with the analysis by Valenzuela et al. (2012) on P. tricornutum.67
 
 
Figure 4.6 Annotated carbon fixation pathway in photosynthetic organisms. The green boxes 
represent genes that are present in the RGd-1 genome. The annotated pathway was produced 
using KeggMapper.190 
Figure 4.7 shows that many core TCA cycle genes are present in RGd-1. However, the
following genes were not identified: aconitase and isocitrate dehydrogenase. Interestingly,
isocitrate lyase was identified in the glyoxylate and dicarboxylate pathway (Figure 4.8), where
isocitrate is converted to succinate with glyoxylate as an intermediate; malate synthase, which
then forms malate, was also identified. The carboxylation of malate then forms malonyl-CoA via
acetyl-CoA-carboxylase, after which the malonyl-CoA feeds into fatty acid biosynthesis. Overall,
the glyoxylate and dicarboxylate pathway bypasses two decarboxylations that normally occur in
the TCA cycle.211 Use of the carbon-conserving glyoxylate shunt, which bypasses the removal of
CO2, may partly explain why RGd-1 can accumulate high concentrations of
lipids.49
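
The carbon bookkeeping behind this comparison can be sketched in a few lines of Python. This is an illustration only, not part of the RGd-1 annotation: the carbon counts are standard textbook values (acyl group only for the CoA thioesters), and the step lists and function names are assumptions introduced here. The sketch checks that each step is carbon-balanced and counts the CO2 released by the two TCA decarboxylations versus the two glyoxylate shunt reactions.

```python
# Carbon atoms per metabolite (acyl group only for CoA thioesters).
CARBONS = {
    "isocitrate": 6,
    "alpha-ketoglutarate": 5,
    "succinyl-CoA": 4,
    "succinate": 4,
    "glyoxylate": 2,
    "acetyl-CoA": 2,
    "malate": 4,
    "CO2": 1,
}

# (substrates, products) for the two TCA steps that the shunt bypasses.
TCA_STEPS = [
    (["isocitrate"], ["alpha-ketoglutarate", "CO2"]),    # isocitrate dehydrogenase
    (["alpha-ketoglutarate"], ["succinyl-CoA", "CO2"]),  # alpha-ketoglutarate dehydrogenase
]

# (substrates, products) for the two glyoxylate shunt reactions.
GLYOXYLATE_STEPS = [
    (["isocitrate"], ["succinate", "glyoxylate"]),  # isocitrate lyase
    (["glyoxylate", "acetyl-CoA"], ["malate"]),     # malate synthase
]


def total_carbons(metabolites):
    """Sum the carbon atoms over a list of metabolite names."""
    return sum(CARBONS[m] for m in metabolites)


def co2_released(steps):
    """Count CO2 molecules released, verifying each step is carbon-balanced."""
    released = 0
    for substrates, products in steps:
        if total_carbons(substrates) != total_carbons(products):
            raise ValueError(f"unbalanced step: {substrates} -> {products}")
        released += products.count("CO2")
    return released


print("TCA branch CO2 released:", co2_released(TCA_STEPS))
print("Glyoxylate shunt CO2 released:", co2_released(GLYOXYLATE_STEPS))
```

Under these assumptions the bypassed TCA branch releases two CO2 per isocitrate, while the shunt releases none and instead retains the carbon as succinate and malate.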
 
Figure 4.7 Annotated Citric Acid Cycle pathway. The green boxes represent genes that are 
present in the RGd-1 genome. The annotated pathway was produced using KeggMapper.190 
 
Figure 4.8 Annotated Glyoxylate and Dicarboxylate Metabolism. The green boxes represent 
genes that are present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.190  
RGd-1 was found to have nearly complete metabolic pathways for fatty acid elongation 
using acetyl-CoA in the mitochondrion or malonyl-CoA in the cytoplasm. The ability to switch 
between two different initial substrates may confer an advantage for fatty acid and neutral lipid 
biosynthesis. In September 2013, Witch Creek had total nitrogen (TN) and dissolved inorganic 
carbon (DIC) concentrations of 125 ppb and 13.5 ppm, respectively. The ability to accumulate 
high lipid concentrations provides an ample storage of carbon and electrons that can be recycled 
and used to generate ATP during low nutrient conditions.  
 RGd-1 had a nearly complete fatty acid degradation pathway. Occurring in the 
mitochondria, fatty acid catabolism is important for recycling carbon to generate acetyl-CoA for 
the TCA cycle. Fatty acid oxidation involves multiple cycles of oxidation, each cleaving the 
C𝛼-C𝛽 bond at the 𝛽-carbon. Each catabolic cycle releases a two-carbon unit in the form of 
acetyl-CoA and releases 4 electrons, carried as NADH and FADH2, that drive the electron transport 
pathway and oxidative phosphorylation for the production of ATP.212  
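
As a worked example of this stoichiometry, the short Python sketch below tallies the products of complete 𝛽-oxidation for a saturated, even-chain fatty acid. It is a sketch under stated assumptions: the function name is hypothetical, and unsaturated or odd-chain fatty acids, which need auxiliary enzymes and give slightly different yields, are not handled.

```python
def beta_oxidation_yield(n_carbons):
    """Products of complete beta-oxidation of a saturated, even-chain fatty acid."""
    if n_carbons < 4 or n_carbons % 2 != 0:
        raise ValueError("expects an even chain length of at least 4 carbons")
    cycles = n_carbons // 2 - 1  # the final cycle releases two acetyl-CoA
    return {
        "cycles": cycles,
        "acetyl_coa": n_carbons // 2,
        "nadh": cycles,           # one NADH per cycle
        "fadh2": cycles,          # one FADH2 per cycle
        "electrons": 4 * cycles,  # 2 electrons per NADH plus 2 per FADH2
    }


# Palmitic acid (C16:0), one of the abundant fatty acids reported for RGd-1.
palmitate = beta_oxidation_yield(16)
print(palmitate)
```

For palmitic acid (C16:0) this gives 7 cycles, 8 acetyl-CoA, 7 NADH, and 7 FADH2, which corresponds to 4 electrons per cycle as described above.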
Glycerolipid metabolism is important for the biosynthesis of neutral lipids such as TAGs. 
Glycerolipids are formed from a precursor, phosphatidic acid. Glycerol kinase catalyzes the 
phosphorylation of glycerol, forming glycerol-3-phosphate, the glycolysis intermediate and 
building block for glycerolipid synthesis. Glycerol-3-phosphate is acylated at the 1 and 2 
positions, a process involving glycerol-3-phosphate acyltransferase. Acyltransferase adds the 
first acyl chain, forming monoacylglycerides (MAG), with the concomitant reduction of the 
backbone with NADPH, catalyzed by acyldihydroxyacetone phosphate reductase. The addition of 
further acyl groups leads to the formation of diacylglycerides (DAG) and triacylglycerides (TAG).212  
 
 
 
 
Figure 4.9 Annotated fatty acid metabolism. The green boxes represent genes that are present in 
the RGd-1 genome. The annotated pathway was produced using KeggMapper.190 
 
The RGd-1 genome had multiple copies of the ricin B lectin domain, which is normally 
identified in castor beans and the ergot fungus, Claviceps,213 with e-values ranging from 0.001 to 
7.848e-7. Ricin has been found to have antifungal activity and was co-expressed with the palmitic 
acid-specific elongase gene in the diatom, Chaetoceros gracilis, and may be transesterified to the 
glycerol backbone.213  
 
Figure 4.10 Annotated Fatty Acid Degradation Metabolism. The green boxes represent genes 
that are present in the RGd-1 genome. The annotated pathway was produced using 
KeggMapper.190 
 
A simple experiment may test this hypothesis: split the RGd-1 culture into two “lines”. One 
line would be transferred on a strict schedule, every two weeks; the other line would be 
transferred less often. It is hypothesized that the older culture will accumulate ricinoleic acid or a 
derivative if it is produced by RGd-1. Older cultures may have slower growth rates and distorted 
cell morphologies. The production of ricin, in the form of ricinoleic acid, may potentially 
contribute to the high concentrations of lipids produced by RGd-1. Moll et al., 2014 provides 
supporting evidence that RGd-1 may produce ricinoleic acid, an unsaturated C18 fatty acid: 
C18:1-3 unsaturated fatty acids were the third most abundant in the RGd-1 FAME data. I 
hypothesize that the production of ricin may confer an anti-predatory or other fitness advantage 
in Witch Creek, where a constant flow of creek water can carry ricin away from the 
cells. Potentially, ricin may have at least two benefits: contributing to the pool of long-chain 
fatty acids and protecting the cells from predation. However, in batch culture, the production of 
ricin may become toxic to the cells, especially in older cultures. 
 
 
Figure 4.11 Annotated Glycerolipid Metabolism. The green boxes represent genes that are 
present in the RGd-1 genome. The annotated pathway was produced using KeggMapper.190 
 
 
Phycosphere Bacteria 
 Observations suggest that RGd-1 grows in close association with bacteria, becoming unviable 
in antibiotic-treated cultures. Nine different bacterial taxa were identified in the RGd-1 culture. 
Two possible functions, among many, for RGd-1 phycosphere bacteria are (1) aiding in arsenic 
reduction and uptake and (2) providing siderophores that may improve iron bioavailability in 
alkaline environments. The three most abundant classified taxa were Agrobacterium (40.9%), 
Geobacter (11.1%), and Riemerella (6.2%). Each of these genera is known to produce siderophores, 
including catechols (Rhizobium and Afipia)214 and hydroxamates (Brevundimonas). Interestingly, 
both Rhizobium and Afipia contain many species of nitrogen-fixing bacteria often associated with 
plant roots. Rhizobium, Brevundimonas, and Afipia are known for siderophore production, which 
may be beneficial for diatom growth especially in alkaline environments such as the one from 
which RGd-1 was isolated.215 In all, the seven populations observed here may be able to produce 
siderophores as other strains within those genera have consistently been found to produce 
siderophores. Specifically, members of the genera Geobacter, Agrobacterium, Brevundimonas, 
Magnetococcus, Niastella, and Riemerella have been observed to produce siderophores or have 
the genetic potential to produce siderophores.214, 216-221 Fifty-six percent of the bacteria observed 
here could not be classified using the RDB database for identification within CLARK, 
indicating taxonomic divergence from the reference organisms (Table 4.8).  
 As part of RGd-1 PacBio C4P6 sequencing, a bacterial genome was also assembled, and 
may represent a potential symbiont.222 The bacterial genome had 99% identity and 44% query 
coverage with Brevundimonas sp. within the Caulobacteraceae family. Due to the low query 
coverage, this Brevundimonas sp., identified as KM-427, may represent a new species. KM-427 was 
found to have genes for siderophore biosynthesis (enterobactins and brucebactins), ferrichrome 
iron receptors, Fe outer membrane receptor proteins, ferrochelatase, ferric uptake regulation 
protein (FUR), ferric iron ABC transporter and ferric enterobactin receptor. Further, 
Brevundimonas sp. KM-427 was also found to have genes for arsenic resistance and reduction: 
ArsH (arsenic resistance protein), ACR3 (arsenic resistance protein), ArsR (transcription 
regulatory protein), arsenate reductase glutaredoxin and arsenical resistance operon repressor. 
These genes may be important for survival in Witch Creek, a low-iron and high-arsenic 
environment.223 
Table 4.8 Identification for 16S amplified sequencing in the RGd-1 culture. Organisms were 
identified using the 16S RDB Database within CLARK.199, 200 For each organism, two percentages 
were calculated: the percentage of all of the categories below (the 9 genera identified and the 
unknowns) and the percentage of the classified fraction (the 9 genera excluding the unknowns).   

Name                 Tax ID     Count    Percentage of All (%)    Percentage of Classified (%)
Agrobacterium        373        28924    17.82                    40.89
Geobacter            351604     7819     4.82                     11.05
Riemerella           34085      4390     2.71                     6.20
Niastella            354356     3899     2.40                     5.51
Magnetococcus        1124597    3236     1.99                     4.57
Halothiobacillus     927        3236     1.85                     4.24
Brevundimonas        74313      2854     1.76                     4.03
Agrobacterium H13               2545     1.57                     3.60
Sulfuricurvum        148813     1626     1.00                     2.30
UNKNOWN              N/A        91472    56.38                    N/A
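
The two percentage columns of Table 4.8 are related through the unclassified fraction: because both are computed from the same count, the ratio of "percentage of all" to "percentage of classified" for any row should equal the classified fraction, (100 - 56.38)/100 = 0.4362. The Python sketch below checks this relationship using the percentages copied from the table. The variable and function names are hypothetical, and the tolerance only allows for rounding in the published values.

```python
# (percentage of all, percentage of classified), copied from Table 4.8.
ROWS = {
    "Agrobacterium": (17.82, 40.89),
    "Geobacter": (4.82, 11.05),
    "Riemerella": (2.71, 6.20),
    "Niastella": (2.40, 5.51),
    "Magnetococcus": (1.99, 4.57),
    "Halothiobacillus": (1.85, 4.24),
    "Brevundimonas": (1.76, 4.03),
    "Agrobacterium H13": (1.57, 3.60),
    "Sulfuricurvum": (1.00, 2.30),
}
UNKNOWN_PCT_OF_ALL = 56.38

# Fraction of all sequences that received a classification.
classified_fraction = (100.0 - UNKNOWN_PCT_OF_ALL) / 100.0


def implied_classified_fraction(pct_all, pct_classified):
    """Ratio of the two percentage columns for one row of the table."""
    return pct_all / pct_classified


ratios = {name: implied_classified_fraction(*pcts) for name, pcts in ROWS.items()}
for name, ratio in ratios.items():
    print(f"{name:18s} {ratio:.4f} (expected about {classified_fraction:.4f})")
```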
 
Discussion 
Genome Observations 
Diatoms have immense, untapped biochemical potential due to their strong CO2 
utilization activity and ability to produce unique compounds, but with only five sequenced 
diatom genomes, our knowledge of these organisms is very limited. The five publicly available 
diatom genome assemblies are marine in origin, with no freshwater diatom genomes available. 
The addition of the genome from the extremophilic, freshwater strain, RGd-1, will provide 
further insight about diatom taxonomy, physiology, metabolism, and evolution. 
The RGd-1 draft genome has 520 contigs and 57.8% complete single-copy BUSCOs 
using the eukaryota lineage. When normalized to the number of complete orthologs found in P. 
tricornutum, RGd-1 has only 83.8% complete orthologs and 11.2% missing. This indicates 
that while BUSCO scores can inform the gene capture rate, and thus the quality of the 
genome assembly, they depend on the gene models that were used to create each lineage, and 
the diatom lineage is evolutionarily distant from the other organisms included in the comparisons 
presented. The 17% difference between the number of complete orthologs found 
in the RGd-1 and P. tricornutum assemblies indicates that while the RGd-1 genome 
assembly captured a large proportion of orthologs, there is room for genome assembly 
improvement to capture more gene space.   
RGd-1 was the most similar to T. pseudonana on a nucleotide level. This is surprising 
because RGd-1 and T. pseudonana belong to different lineages within class Bacillariophyceae, 
the pennates and the centrics, respectively, which are distinguished by cell morphology. Considering 
that each of the diatoms with publicly available genome assemblies is marine in origin, it is 
reasonable that there would be a divergence between freshwater and marine diatoms. RGd-1 and 
T. pseudonana may have convergent shared functions that cause them to have a greater sequence 
identity. Despite RGd-1 being the most evolutionarily distant on a nucleotide level from P. 
tricornutum, the two diatoms have more in common when sequences are translated to proteins, 
indicating that many proteins likely share a function. 
 The de novo transcriptome assembly had better assembly statistics and BUSCO scores 
than the reference-guided transcriptome assembly which is consistent with the idea that any 
errors in the genome assembly will be carried into the transcriptome assembly. In de novo mode, 
there is no reference bias and the reads can align optimally.  
Metabolic Observations 
RGd-1 was found to have nearly complete central carbon metabolism pathways. 
Specifically, RGd-1 showed a nearly complete glyoxylate pathway, where the TCA cycle is 
diverted from isocitrate to malate, conserving two carbons that would have been lost as CO2 in 
the decarboxylation steps of the TCA cycle (Figure 4.12). This may help explain why 
RGd-1 can accumulate such high concentrations of palmitoleic acid (C16:1), palmitic acid 
(C16:0), C18:1-3 and eicosapentaenoic acid (C20:5).49 While P. tricornutum has been shown to 
use the glyoxylate shunt,224 to the best of our knowledge, no one has contextualized the 
glyoxylate shunt in terms of its potential role in improving lipid accumulation and, possibly, 
biofuel production. Further analysis using differential gene expression may shed important light 
on the level of expression of genes in the glyoxylate shunt pathway under different 
lipid-accumulating conditions such as carbon-deplete and carbon-replete growth conditions.  
 
 
Figure 4.12 The citric acid cycle with the glyoxylate and dicarboxylate pathway which diverts 
isocitrate to malate.225 The two enzymes, isocitrate lyase and malate synthase modify the citric 
acid cycle avoiding two decarboxylation steps resulting in the formation of malate from 2 
molecules of acetyl-CoA.225  
 The organisms inhabiting Witch Creek experience a wide range of conditions throughout 
the year. The shallow stream experiences temperature shifts from the hot effluent channels that 
feed into the stream, while being impacted by high levels of snowfall during the winter months. 
In addition to the temperature shifts, the creek receives high concentrations of silica and arsenic 
from the effluent channels and is generally limited in nitrogen and inorganic carbon. The ability 
to accumulate high concentrations of lipids via different metabolic pathways offers a greater 
potential for success during the bleaker months of Yellowstone National Park. Through 𝛽-oxidation 
of long-chain fatty acids, RGd-1 can recycle carbon and electrons towards 
maintenance energy during the harsher months when there are fewer nutrients in an already 
nutrient-limited environment and the temperature is sub-optimal for diatom growth.  
Bacterial Cohabitants 
RGd-1 grows well as a unialgal culture. The effects of the bacterial co-habitants are 
currently unknown. The relationships between bacteria and algae can be complex. Phaeobacter 
gallaeciensis has been found to secrete different compounds that either stimulate growth of 
Emiliania huxleyi (auxins) or induce lysis of older cells (algaecides), indicating that P. 
gallaeciensis can switch from a symbiotic to a parasitic relationship with E. huxleyi depending 
on the stage of growth of the cells.226 While most algal cultures, including diatoms, have been grown as axenic, 
or unialgal cultures, there has been a recent paradigm-shift towards growing algae in mixed 
cultures to provide more stable growth and productivity.148, 227 Studying the mechanisms of 
cross-feeding between phototrophs and heterotrophs, a subject about which relatively little is 
known, may provide fundamental insight into diatom growth and critical microbial community 
interactions.  Further, it is important to devise diatom growth strategies that provide the best 
opportunities for improving the rate and extent of biofuel precursors and high-value products. 
Understanding phototroph-heterotroph interactions would improve the ability to implement 
diatom growth strategies that exploit these interactions.  
The 16S amplified community analysis was performed post-antibiotic treatments after it 
was discovered that Brevundimonas sp. was present and the majority of the original PacBio reads 
were prokaryotic. Given the distribution of the bacteria in the RGd-1 culture it is surprising that 
the dominant bacterium, Agrobacterium, was not sequenced and assembled as part of the PacBio 
sequencing project. Following this discovery, aggressive antibiotic treatments were employed 
and MiSeq 16S sequencing was used to determine what bacteria were potentially still present in 
the RGd-1 culture. Therefore, these treatments likely decreased Brevundimonas sp. 
concentrations and the overall bacterial community would have shifted as a result.  
The association of bacteria with diatoms within a phycosphere can be thought of as a 
diatom microbiome. This is an important concept in the study of algae and is often overlooked. 
While some bacteria associated with diatoms have been identified, their specific interactions are 
much less well defined,44 with the literature being particularly scarce for non-marine diatoms. 
Bacteria attached to diatom frustules have been found to exchange substances such as vitamins, 
indole, and organic carbon compounds with their associated diatom and to receive protection 
from predators (Figure 4.13).161-164 Attached bacteria have been hypothesized to promote organic 
matter degradation, which may or may not result in CO2. I hypothesize that this may be one 
explanation for the overall lower TAG accumulation that is observed in mixed communities 
compared to unialgal cultures.102  
When carbon is passed from bacteria to algae, increasing community biomass, 
sedimentation occurs providing organic carbon to benthic communities.44, 159 However, it is 
currently not known whether freshwater diatoms and bacteria form symbiotic relationships to 
overcome the diatom’s inability to produce iron-scavenging siderophores. These inter-domain 
interactions are anticipated to be especially important in high-pH environments, characterized by 
very low iron concentrations, such as Witch Creek, the environment from which RGd-1 was 
isolated. 
 
Figure 4.13 Potential mechanisms of symbiosis between marine diatoms and bacteria. 
Phytoplankton, such as diatoms may provide dissolved organic carbon (DOC), particulate 
organic carbon (POC), and other complex algal polysaccharides. The bacteria may supply 
micronutrients, macronutrients, and vitamins such as B12.159,161-164 
Multiple bacteria identified in the RGd-1 culture (Table 4.8) are closely related to organisms that 
have been shown to produce siderophores. Various Agrobacterium strains have been found to 
produce hydroxamates or agrobactin.214, 217 Brevundimonas diminuta has been found to produce 
siderophores within the rhizosphere of Oryza sativa (rice) as determined by the CAS assay.228, 229 
Niastella is a member of the phylum Bacteroidetes, and multiple species within the Niastella 
genus and a related strain, Arachidicoccus rhizosphaerae, have been found to produce plant 
growth-promoting compounds such as indole-3-acetic acid, siderophores (as detected by the CAS 
assay), and NH3.220, 229 Sulfuricurvum also has a complete siderophore biosynthesis pathway. 
Furthermore, G. uraniireducens is a dissimilatory iron-reducing bacterium (DIRB),230, 231 which, 
in addition to siderophore production, may improve iron availability by 
reducing Fe(III) to Fe(II), a more biologically available form. It is surprising that Geobacter 
constituted 4.82% of the total bacteria identified in the RGd-1 culture given its strict anaerobic 
growth requirements and the highly oxic environment.220 One explanation may be that there are 
microenvironments on or within the RGd-1 frustules that are shielded from oxygen. Witch Creek 
has a pH of 9.3 and is very low in iron. It is possible that in this environment, symbioses may 
form between diatoms and phycosphere bacteria so that the bacteria chelate any iron that may be 
available in Witch Creek and bring the iron in close proximity to the diatom for cell surface 
reduction and/or uptake. 
Conclusions 
Diatom strain RGd-1 is a novel diatom that is genetically divergent from the marine 
diatoms with publicly available sequenced genomes. A comparison of two genome assemblies 
showed that although one contained longer read lengths, it had worse genome statistics and 
BUSCO scores. Therefore, RGd-1 v. 1.0 was used for all downstream analyses. The de novo 
transcriptome assembly had better assembly statistics and BUSCO scores compared to the 
reference-guided transcriptome assembly and was used for the RGd-1 v. 1.0 genome annotation. 
The RGd-1 genome annotation revealed evidence that RGd-1 has the glyoxylate shunt pathway 
that could be used as a carbon conservation strategy; a mechanism that may contribute to 
observations that RGd-1 can accumulate very high concentrations of neutral lipids. The 
glyoxylate shunt has been identified in T. pseudonana, P. tricornutum,232 and C. fusiformis.233 
Next-generation sequencing was used to identify bacteria found in the RGd-1 cultures. Important 
dynamics may be occurring between RGd-1 and its phycosphere bacterial community. While 
cross-feeding of vitamins and nutrients between diatoms and bacteria has been documented, 
there may be other important mechanisms of symbiosis that, to date, are largely unexplored. 
Nine bacterial community members were observed that may play key functions (e.g. iron 
transfer) and contribute to diatom health in low-iron and high-arsenic environments.  
Algae are often claimed to be axenic in the literature. The inclusion of next-generation 
sequencing has increased the sensitivity to determine bacterial contaminants, cohabitants, and 
potential phycosphere organisms. Using GC content and codon bias of observed sequences, we 
were able to assemble the genome of a new species of Brevundimonas, which was shown to have 
the genetic potential to reduce and assimilate heavy metals such as arsenic. Future work is 
needed to elucidate the stability and various functions of the RGd-1 phycosphere community.  
  
CHAPTER FIVE 
GENOME SEQUENCE FOR A NOVEL BREVUNDIMONAS STRAIN 
Contribution of Authors and Co-Authors 
Manuscript in Chapter 5 
 
Karen M. Moll, Nico Devitt, Thiruvarangan Ramaraj, Joann Mudge, Brent M. Peyton  
 
Author: Karen M. Moll 
 
Contributions: Performed the analysis, wrote the paper 
 
Co-Author: Thiruvarangan Ramaraj 
 
Contributions: Provided mentorship and guidance for the analyses 
 
Co-Author: Nicholas P. Devitt 
 
Contributions: Provided mentorship and guidance for the analyses 
 
Co-Author: Joann Mudge 
 
Contributions: Provided mentorship and guidance for the analyses and paper writing 
 
Co-Author: Brent M. Peyton 
 
Contributions: Provided mentorship and guidance for the analyses and paper writing 
 
 
 
 
 
 
 
Manuscript Information 
Karen M. Moll, Nico Devitt, Thiruvarangan Ramaraj, Joann Mudge, Brent M. Peyton  
Microbiology Resource Announcements 
Status of Manuscript:  
___x Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
____ Published in a peer-reviewed journal 
 
  
Abstract 
Brevundimonas sp. strain KM-427 was found to potentially be a phycosphere bacterium 
associated with the extremophilic diatom, RGd-1. Here, we present the complete genome 
assembly and annotation for Brevundimonas sp. strain KM-427 that was sequenced as part of a 
PacBio sequencing project for the diatom, RGd-1. This genome provides insight into the 
Brevundimonas genus.  
Announcement 
Diatom strain RGd-1 was isolated from an alkaline stream in Yellowstone National 
Park, WY, USA.49 Genomic DNA from the unialgal culture was extracted using a 
phenol-chloroform extraction to isolate high-molecular-weight DNA. For sequencing, 2 µg of 
purified high-molecular-weight DNA was prepared. BluePippin size selection at 10 kb 
was used for sample preparation prior to sequencing.234 Three SMRT cells were run with 
C4P6 chemistry (6-hour movies) to improve read length and coverage at The National Center for 
Genome Resources (Santa Fe, NM). The resulting sequence was approximately 5% diatom and 
95% bacterial; the bacterial fraction was eventually determined to belong to the genus Brevundimonas.  
Brevundimonas sp. was sequenced and assembled completely into one contig with a 3.1 
Mb genome length and 68.85% GC content (Table 5.1). The reads were assembled using Canu 
and two genomes were assembled, diatom strain RGd-1 and Brevundimonas strain, KM-427. 
The Brevundimonas sp. genome was annotated using Prokka (version 1.12) and separately with 
RAST.235-237 KM-427 had 99% identity and 44% query coverage with Brevundimonas sp. within 
the Caulobacteraceae family. Due to the low query coverage, Brevundimonas sp., KM-427, may 
represent a new species.  
 
Table 5.1 Genome assembly statistics for Brevundimonas sp., strain KM-427. The genome was 
assembled with Canu as part of an RGd-1 PacBio sequencing project.175 
  
Assembly Statistic          KM-427
Number of contigs           1
Genome length (bp)          3,090,090
Contig N50 (bp)             3,090,090
Maximum contig size (bp)    3,090,090
Minimum contig size (bp)    3,090,090
GC content (%)              68.85
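
For readers unfamiliar with the statistics in Table 5.1, the Python sketch below shows how GC content and contig N50 are conventionally computed. It is illustrative only: the function names are assumptions, and this is not the code used to generate the table. For a single-contig assembly such as KM-427, the N50, maximum, and minimum contig sizes all equal the genome length.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def n50(contig_lengths):
    """Length of the contig at which the sorted running total reaches half the assembly."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# A one-contig assembly: N50 equals the genome length reported in Table 5.1.
print(n50([3090090]))
# A toy sequence, not real KM-427 data.
print(round(gc_content("GGCCAT"), 2))
```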
 
A total of 70.9% of benchmarking universal single-copy orthologs (BUSCOs) were 
identified, with 0.7% fragmented and 28.4% missing, within the bacteria lineage (Table 
5.2). Since KM-427 was assembled entirely into one contig, the missing BUSCOs may represent 
unique orthologs that were not present in the bacteria lineage.31 
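
BUSCO percentages are each category's count divided by the number of BUSCOs searched. The Python sketch below illustrates the arithmetic for the 148 BUSCOs of the bacteria odb9 lineage. The counts used (105 identified, 1 fragmented, 42 missing) are an assumption: they are back-calculated here from the percentages reported above and are not values reported in the text. The function name is likewise hypothetical.

```python
def busco_percentages(counts):
    """Convert per-category BUSCO counts to percentages of the total searched."""
    total = sum(counts.values())
    return {category: round(100.0 * n / total, 1) for category, n in counts.items()}


# Hypothetical counts, back-calculated to sum to the 148 BUSCOs in the lineage.
km427_counts = {"identified": 105, "fragmented": 1, "missing": 42}
km427_pct = busco_percentages(km427_counts)
print(km427_pct)
```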
Table 5.2 Gene capture measured by BUSCO. A total of 148 BUSCOs were searched within the 
bacteria odb9 lineage. 
BUSCO type              Number of BUSCOs found    % BUSCOs found
Complete-single copy    175                       57.8%
Complete-duplicated     3                         1.0%
Fragmented              20                        6.6%
Missing                 105                       34%
 
The Brevundimonas genome annotation revealed genes involved with copper 
homeostasis, arsenic resistance, and cobalt, zinc, and cadmium resistance (Figure 5.1).236, 237 
KM-427 was found to contain genes for arsenic resistance, reduction, and uptake. An arsenic 
resistance operon was identified with the following genes: arsH, acr3, arsR, and arsM (Figure 5.1). 
Arsenic metabolism may be important for the survival and adaptation of Brevundimonas sp. 
strain KM-427 since its natural environment contains 300 ppb of arsenic.101  
The KM-427 genome annotation revealed genes for producing siderophores, which 
are used for chelating iron in iron-limiting environments. TonB and Tol transport 
systems and hemin transport systems, which are known to be involved in ferric 
siderophore transport, were identified, along with ABC transporters, ferrous iron transport 
protein A, ferrichrome iron receptor, ferric uptake regulation protein (FUR), ferric iron ABC 
transporter ATP-binding protein, iron-uptake factor PiuB, ferrous iron transport protein B, ferric 
iron ABC transporter iron-binding protein, and ferric iron ABC transporter permease protein. In addition, KM-427 was 
found to contain enterobactin transferase and brucebactin synthetase genes. Further, 43% of the 
annotated genes are considered “hypothetical proteins”, and it is possible that more siderophore 
biosynthesis genes are within this fraction.   
 Siderophore biosynthesis may also be beneficial to the diatom, RGd-1 from which KM-
427 was identified and isolated. Previous studies have shown the ability of diatoms to utilize 
siderophores produced by bacteria.238, 239 In this case, siderophores may provide RGd-1 with 
bioavailable iron in an otherwise iron-limited environment.  
 
Figure 5.1 The Brevundimonas sp. strain, KM-427, genome annotation major features. 
Data availability. The complete genome assembly is available under the following GenBank 
BioSample accession number, SAMN12024285, strain 2588940, and BioProject PRJNA548375. 
 
 
 
 
  
CHAPTER SIX 
SUMMARY AND FUTURE DIRECTIONS 
Synopsis 
This dissertation focused on alkaliphilic algae for biofuel applications. Algae have strong 
potential in the renewable energy sector; however, more research is needed to bring 
the price at the pump in line with that of petroleum-based fuels. The combustion of 
petroleum-based fuels adds CO2 to the atmosphere and is ultimately environmentally 
polluting.52, 240 As a near carbon neutral 
biotechnology, algae provide a more environmentally sustainable source of fuels.  
A type of algae that forms distinctive silica shells, diatoms are critical to ecosystem 
health at a global scale and have the capacity to buffer climate change. Producing at least 25% of 
Earth’s atmospheric oxygen, diatoms also fix ~25-45% of global CO2 directly mitigating the 
major driver behind climate change.12 However, diatoms are not immune to the effects of climate 
change, manifested in aquatic systems as rapid temperature fluctuations and acidification. As a 
result, a drastic global decline in diatoms has recently been observed that will directly reduce 
oxygen contributions and carbon fixation, further compounding climate change.241 Such 
compounding factors serve to accelerate climate change, and with very limited time to correct, 
now is the critical moment to truly understand diatom ecology, including the relationships with 
bacterial symbionts starting at the genomic-level. These relationships are vital to diatom 
productivity and survival, and are thus critical to oxygen production and carbon fixation. 
Without a fundamental understanding of these symbiotic relationships, diatoms could succumb 
to climate change faster than they can buffer its effects. While some work has studied symbiotic 
relationships of marine diatoms, little is known about freshwater diatom symbiotic relationships, 
despite their often dominant presence in these systems. 
Strain Selection 
 Algal strain selection has been an important focus for algal biodiesel research. We can 
select strains from environments with qualities most similar to those used in outdoor raceway 
ponds. Halophiles have been selected due to their salt tolerance as salinity increases with 
evaporative loss. In addition, conditions like high salinity or alkalinity can limit colonization by 
competing microorganisms. Here, we focused on isolating alkaliphilic strains found to be most 
productive for biofuels based on growth rate, dry cell weight, and lipid accumulating abilities.  
Algal lipid concentration is influenced by several factors, including carbon availability. 
Some algae strains have been shown to increase their lipid concentrations when sodium 
bicarbonate is added just prior to nitrate depletion. However, not all strains respond to the 
sodium bicarbonate addition, and further work is needed to elucidate why some strains respond 
and some do not. Diatom strain RGd-1 accumulated approximately twice the lipid 
concentration when given sodium bicarbonate. Because RGd-1 was able to 
accumulate 70-80% FAMEs (w/w, ash-free dry weight), interest was generated as to why it 
was able to accumulate such high concentrations of lipids.49 The first step toward understanding 
its lipid metabolism was to sequence, assemble and annotate the RGd-1 genome. An annotated 
genome facilitates the identification of potentially novel genes or potential pathways that enable 
RGd-1 to accumulate very high lipid concentrations. 
 Using the RGd-1 genome assembly and annotation, it was possible to determine 
metabolic pathways used by RGd-1. RGd-1 had complete lipid synthesis and degradation 
pathways and near-complete carbon metabolism pathways such as the TCA cycle and carbon 
fixation, and uses the glyoxylate shunt within the TCA cycle. The glyoxylate shunt is of 
particular interest because it is well known to be a carbon-conserving pathway that avoids the two 
decarboxylations of the full TCA cycle, isocitrate to ⍺-ketoglutarate and ⍺-ketoglutarate 
to succinyl-CoA, catalyzed by isocitrate dehydrogenase and ⍺-ketoglutarate 
dehydrogenase, respectively. Use of the glyoxylate shunt may be one mechanism that allows 
RGd-1 to accumulate high concentrations of lipids.  
 In carbon-limited environments, like alkaline systems, it may be advantageous for a 
microorganism such as RGd-1 to develop lipid accumulating and carbon-conservation strategies 
for survival and maintenance. At alkaline pH, there is a greater flux of atmospheric CO2 into the 
stream that will enrich the DIC concentration, allowing for greater access to carbon. However, 
any competing microorganisms that grow at a faster rate, such as cyanobacteria, will be able to 
utilize these resources, leaving Witch Creek in a regular state of nutrient limitation. The ability 
of RGd-1 to utilize the glyoxylate cycle, use either acetyl-CoA or malonyl-CoA as starting 
substrates for fatty acid biosynthesis, and potentially use ricin as a fatty acid source may all 
contribute to high lipid concentrations. Having a collection of lipid-accumulating and 
carbon-conserving strategies increases the potential for RGd-1’s success.  
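
To illustrate why inorganic carbon can still be limiting at alkaline pH, the Python sketch below estimates how DIC is distributed among dissolved CO2, bicarbonate, and carbonate at the Witch Creek pH of 9.3. This is a rough sketch under stated assumptions: the pKa values (6.35 and 10.33) are commonly cited values for dilute freshwater at about 25 °C and were not measured for Witch Creek, activity corrections are ignored, and the function name is hypothetical.

```python
def carbonate_fractions(ph, pka1=6.35, pka2=10.33):
    """Fractions of DIC present as CO2(aq), HCO3- and CO3^2- at a given pH.

    The default pKa values are assumed (dilute freshwater, about 25 C).
    """
    h = 10.0 ** -ph
    k1 = 10.0 ** -pka1
    k2 = 10.0 ** -pka2
    denominator = h * h + h * k1 + k1 * k2
    return {
        "CO2": h * h / denominator,
        "HCO3-": h * k1 / denominator,
        "CO3^2-": k1 * k2 / denominator,
    }


witch_creek = carbonate_fractions(9.3)
for species, fraction in witch_creek.items():
    print(f"{species:7s} {100 * fraction:5.1f}% of DIC")
```

Under these assumptions, bicarbonate dominates the DIC pool at pH 9.3 and dissolved CO2 is only a very small fraction of it.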
Future Directions 
One phycosphere inhabitant, Brevundimonas sp., has the genetic capacity to produce 
siderophores, making iron bioavailable to RGd-1. Two additional taxa, previously shown to 
produce siderophores,229 were identified in association with RGd-1. Sequencing has confirmed 
an additional six taxa living in association with RGd-1.49 The relationship between RGd-1 and 
these bacteria, particularly Brevundimonas sp., is an ideal focus due to potentially intertwined 
metabolite exchange. This relationship likely facilitates the survival of RGd-1 in a high-pH (9.3), 
arsenic-rich (300 ppb), and iron-poor environment. Exploring this relationship through 
CRISPR/CAS modifications will provide critical insights into the importance of symbiosis to 
freshwater diatom resilience. CRISPR/CAS could be used as an antibiotic to systematically target 
the phycosphere bacteria and determine their effects on RGd-1 growth and lipid accumulation. By 
focusing on the removal of each phycosphere bacterium, it will be possible to gain a better 
understanding of inter-domain iron transfer. 
Despite being vital to diatom health, the role of bacterial symbionts is often overlooked. 
Residing in the phycosphere, the outer sheath encapsulating a diatom, bacteria consume a nearly 
limitless carbon supply while occupying a niche virtually free of predators.42 In exchange, 
bacteria provide diatoms with vitamins such as B12, growth factors, siderophores, and 
antimicrobials.159,161-164 Together, each organism is more efficient at its ecosystem function: 
bacteria increase organic matter degradation, producing CO2 and freeing carbon to lower trophic 
levels, while diatom photosynthesis sequesters CO2. However, CO2 sequestration can 
dramatically elevate pH, limiting nutrient bioavailability. It is here that bacterial symbiosis may 
play one of its most critical roles, providing nutrients that are otherwise inaccessible. When pH ≥ 
8, iron precipitates in biologically unavailable forms, such as Fe(OH)3. A strategy 
implemented by many bacteria in iron-limited environments is to produce siderophores, low 
molecular weight (0.5-1.5 kDa) Fe(III)-chelating agents that bind iron through multiple 
coordination bonds (normally hexadentate), solubilizing it and making it biologically available. 
While diatoms do not produce siderophores, they efficiently scavenge iron from siderophores 
produced by bacteria.  
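The effect of pH on iron availability can be illustrated with a rough solubility estimate. The sketch below is a simplification, assuming a handbook-style solubility product for Fe(OH)3 at 25 °C and ignoring soluble hydroxo complexes and organic ligands such as siderophores, so real dissolved iron is higher than these figures; the constant and the function name `free_fe3_molar` are illustrative assumptions. What it shows is the trend: each unit increase in pH lowers the permitted free Fe(III) concentration by a factor of 1000.

```python
# Illustrative only: free Fe(3+) in equilibrium with solid Fe(OH)3.
# KSP_FE_OH3 is an assumed handbook-style value at 25 °C. Soluble hydroxo
# complexes and organic ligands (including siderophores) are ignored, so
# real dissolved iron is higher. The trend with pH is the point.
KSP_FE_OH3 = 2.79e-39   # assumed solubility product, [Fe3+][OH-]^3
KW = 1.0e-14            # ion product of water at 25 °C


def free_fe3_molar(ph):
    """Free Fe(3+) concentration (mol/L) permitted by Fe(OH)3 solubility."""
    hydroxide = KW / 10.0 ** (-ph)
    return KSP_FE_OH3 / hydroxide ** 3


# pH 9.3 is the value reported for the RGd-1 habitat.
for ph in (7.0, 8.0, 9.3):
    print(f"pH {ph:4.1f}: free Fe3+ <= {free_fe3_molar(ph):.2e} M")
```

Because the hydroxide concentration enters as a cube, even modest alkalinization sharply reduces free iron, which is consistent with the value of a chelating strategy in this environment.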
Although diatoms are unable to produce siderophores themselves, it has been shown that 
they can utilize iron complexed with bacterial siderophores57,242 and that diatoms interact with 
siderophore-producing bacteria.44 For instance, previous studies have found that diatoms can 
utilize exogenously produced siderophores such as desferrioxamine B and E, and ferrioxamines 
D and E when grown in iron-deficient media.238,239 At the diatom cell surface, iron is thought to 
be reduced by membrane-bound ferrireductase and taken up by iron permease.243 Weak 
siderophore affinity for Fe(II) allows dissociation of iron from the siderophore complex, 
enabling iron entry into the cell. Further, there is genetic evidence for the direct uptake of 
siderophores by diatoms. The ferrichrome binding protein (FBP) gene in P. tricornutum is 
associated with siderophore uptake and works in concert with ferric reductase (FRE) to sequester 
and convert Fe to biologically available forms. Kazamia et al. (2018) found that P. tricornutum 
endocytosed the siderophore-Fe complex, especially under iron-limiting conditions when iron 
starvation-induced protein 1 (ISIP1) was highly expressed.244 
Lipids are also important for plant-microbe interactions. Phospholipids, sphingolipids, 
glycolipids, and sterol lipids help establish these interactions, and some lipids, 
such as diacylglycerol, phosphatidic acid, and free fatty acids, are signaling molecules for plants 
and microbes.245 These types of molecules may be important in low-iron environments. Here, the 
diatom may detect low iron and respond with increased expression of ISIP, as seen in P. 
tricornutum. Potentially, the diatom may produce a lipid signaling molecule that 
prompts associated bacteria to produce siderophores. Some bacteria produce TAGs, wax 
esters, or polyhydroxyalkanoates (PHAs) that can ultimately be used for biodiesel.246 It is 
possible that the bacteria associated with RGd-1 produce TAGs or free fatty 
acids that become part of the RGd-1 FAME pool when transesterified. More work is needed to 
determine whether the bacteria associated with RGd-1 have the genetic and physiological ability 
to produce signaling lipids and/or lipids suitable for biodiesel. 
Future Work 
1. Examine the glyoxylate shunt more closely for its potential role in the very high 
lipid-accumulating ability of RGd-1.   
2. Determine the metabolic potential for RGd-1 to make ricin. 
 
Closing 
 The work presented here demonstrates the novelty of the extremophilic freshwater 
diatom strain RGd-1 from a genomics perspective. RGd-1 is divergent from the five publicly 
available marine diatom genome assemblies, and this divergence will provide insight into diatom 
ecology and evolution. Further, it is possible to gain insight into key genes that have allowed 
RGd-1 to become adapted to its alkaline environment containing high concentrations of heavy 
metals such as arsenic. With additional omics approaches, combined with physiological data, it 
will be possible to obtain a deep understanding of what makes RGd-1 unique. Lastly, it is 
important to consider not only one organism but also how multiple organisms function together 
in their environment. RGd-1 and Brevundimonas sp. may have a dependent relationship, and it is 
important to determine what factors drive these types of relationships. The complex dynamics 
between the cohabiting bacteria and diatoms may operate so efficiently as to function as a 
single organism.  
 
 
References 
1. Global Transport Scenarios 2050. 2011; Available from: 
https://www.worldenergy.org/wp-
content/uploads/2012/09/wec_transport_scenarios_2050.pdf. 
2. Oil Market Report. 11 April 2019; Available from: 
https://www.iea.org/oilmarketreport/omrpublic/. 
3. Sheehan, J., T. Dunahay, J. Benemann, and P. Roessler, A look back at the U.S. 
Department of Energy’s Aquatic Species Program: biodiesel from algae. National 
Renewable Energy Laboratory. 1998. 
4. Fields, M.W., et al., Sources and resources: importance of nutrients, resource allocation, 
and ecology in microalgal cultivation for lipid accumulation. Appl Microbiol Biotechnol, 
2014. 98(11): p. 4805-16. 
5. Chisti, Y., Biodiesel from microalgae. Biotechnol Adv, 2007. 25(3): p. 294-306. 
6. Georgianna, D.R. and S.P. Mayfield, Exploiting diversity and synthetic biology for the 
production of algal biofuels. Nature, 2012. 488(7411): p. 329-335. 
7. Courchesne, N.M., et al., Enhancement of lipid production using biochemical, genetic 
and transcription factor engineering approaches. J Biotechnol, 2009. 141(1-2): p. 31-41. 
8. Greenwell, H.C., et al., Placing microalgae on the biofuels priority list: a review of the 
technological challenges. J R Soc Interface, 2010. 7(46): p. 703-26. 
9. Razeghifard, R., Algal biofuels. Photosynth Res, 2013. 117(1-3): p. 207-19. 
10. Trentacoste, E.M., et al., Metabolic engineering of lipid catabolism increases microalgal 
lipid accumulation without compromising growth. Proc Natl Acad Sci U S A, 2013. 
110(49): p. 19748-53. 
11. d'Ippolito, G., et al., Potential of lipid metabolism in marine diatoms for biofuel 
production. Biotechnol Biofuels, 2015. 8: p. 28. 
12. Cavicchioli, R., et al., Scientists' warning to humanity: microorganisms and climate 
change. Nat Rev Microbiol, 2019. 
13. Mock, T., et al., Whole-genome expression profiling of the marine diatom Thalassiosira 
pseudonana identifies genes involved in silicon bioprocesses. Proceedings of the National 
Academy of Sciences, 2008. 105(5): p. 1579-1584. 
14. Fleischer, K., et al., Amazon forest response to CO2 fertilization dependent on plant 
phosphorus acquisition. Nature Geoscience, 2019. 12(9): p. 736-741. 
15. Markou, G. and E. Nerantzis, Microalgae for high-value compounds and biofuels 
production: a review with focus on cultivation under stress conditions. Biotechnol Adv, 
2013. 31(8): p. 1532-42. 
16. Yandell, M. and D. Ence, A beginner's guide to eukaryotic genome annotation. Nat Rev 
Genet, 2012. 13(5): p. 329-42. 
17. Metzker, M.L., Sequencing technologies - the next generation. Nat Rev Genet, 2010. 
11(1): p. 31-46. 
18. Goodwin, S., J.D. McPherson, and W.R. McCombie, Coming of age: ten years of next-
generation sequencing technologies. Nat Rev Genet, 2016. 17(6): p. 333-51. 
19. Ambardar, S., et al., High Throughput Sequencing: An Overview of Sequencing 
Chemistry. Indian J Microbiol, 2016. 56(4): p. 394-404. 
20. Margulies, M., et al., Genome sequencing in microfabricated high-density picolitre 
reactors. Nature, 2005. 437. 
21. Lang, D., et al., Comparison of the two up-to-date sequencing technologies for genome 
assembly: HiFi reads of Pacbio Sequel II system and ultralong reads of Oxford 
Nanopore. bioRxiv, 2020. 
22. Hon, T., et al., Highly accurate long-read HiFi sequencing data for five complex 
genomes. bioRxiv, 2020. 
23. Logsdon, G.A., M.R. Vollger, and E.E. Eichler, Long-read human genome sequencing 
and its applications. Nat Rev Genet, 2020. 21(10): p. 597-614. 
24. Wenger, A.M., et al., Accurate circular consensus long-read sequencing improves 
variant detection and assembly of a human genome. Nat Biotechnol, 2019. 37(10): p. 
1155-1162. 
25. Lam, E.T., et al., Genome mapping on nanochannel arrays for structural variation 
analysis and sequence assembly. Nat Biotechnol, 2012. 30(8): p. 771-6. 
26. Pendleton, M., et al., Assembly and diploid architecture of an individual human genome 
via single-molecule technologies. Nat Methods, 2015. 12(8): p. 780-6. 
27. Bickhart, D.M., et al., Single-molecule sequencing and chromatin conformation capture 
enable de novo reference assembly of the domestic goat genome. Nat Genet, 2017. 49(4): 
p. 643-650. 
28. Moll, K.M., et al., Strategies for optimizing BioNano and Dovetail explored through a 
second reference quality assembly for the legume model, Medicago truncatula. BMC 
Genomics, 2017. 18(1): p. 578. 
29. Putnam, N.H., et al., Chromosome-scale shotgun assembly using an in vitro method for 
long-range linkage. Genome Res, 2016. 
30. Simpson, J.T., et al., ABySS: a parallel assembler for short read sequence data. Genome 
Research, 2009. 19(6): p. 1117-1123. 
31. Simao, F.A., et al., BUSCO: assessing genome assembly and annotation completeness 
with single-copy orthologs. Bioinformatics, 2015. 31(19): p. 3210-2. 
32. Vurture, G.W., et al., GenomeScope: fast reference-free genome profiling from short 
reads. Bioinformatics, 2017. 33(14): p. 2202-2204. 
33. Marcais, G. and C. Kingsford, A fast, lock-free approach for efficient parallel counting of 
occurrences of k-mers. Bioinformatics, 2011. 27(6): p. 764-70. 
34. Mapleson, D., et al., KAT: a K-mer Analysis Toolkit to quality control NGS datasets and 
genome assemblies. Bioinformatics, 2017. 33(4): p. 574-576. 
35. Bowler, C., et al., The Phaeodactylum genome reveals the evolutionary history of diatom 
genomes. Nature, 2008. 456(7219): p. 239-44. 
36. Armbrust, E.V., et al., The genome of the diatom Thalassiosira pseudonana: ecology, 
evolution, and metabolism. Science, 2004. 306(5693): p. 79-86. 
37. Traller, J.C., et al., Genome and methylome of the oleaginous diatom Cyclotella cryptica 
reveal genetic flexibility toward a high lipid phenotype. Biotechnol Biofuels, 2016. 9: p. 
258. 
38. Helliwell, K.E., et al., Insights into the evolution of vitamin B12 auxotrophy from 
sequenced algal genomes. Mol Biol Evol, 2011. 28(10): p. 2921-33. 
39. Bowler, C., A. Vardi, and A.E. Allen, Oceanographic and biogeochemical insights from 
diatom genomes. Ann Rev Mar Sci, 2010. 2: p. 333-65. 
40. Mock, T., et al., Evolutionary genomics of the cold-adapted diatom Fragilariopsis 
cylindrus. Nature, 2017. 541(7638): p. 536-540. 
41. Bell, T.A.S., et al., Microbial community changes during a toxic cyanobacterial bloom in 
an alkaline Hungarian lake. Antonie Van Leeuwenhoek, 2018. 
42. Bell, W. and R. Mitchell, Chemotactic and growth responses of marine 
bacteria to algal extracellular products. Biological Bulletin, 1972. 
143(2): p. 265-277. 
43. Bell, W.H. and J.M. Lang, Selective stimulation of marine bacteria by algal extracellular 
products. Limnology and Oceanography, 1974. 19(5): p. 833-839. 
44. Amin, S.A., M.S. Parker, and E.V. Armbrust, Interactions between diatoms and bacteria. 
Microbiol Mol Biol Rev, 2012. 76(3): p. 667-84. 
45. Krohn-Molt, I., et al., Insights into Microalga and Bacteria Interactions of Selected 
Phycosphere Biofilms Using Metagenomic, Transcriptomic, and Proteomic Approaches. 
Front Microbiol, 2017. 8: p. 1941. 
46. Rolland, J.L., et al., Quorum Sensing and Quorum Quenching in the Phycosphere of 
Phytoplankton: a Case of Chemical Interactions in Ecology. J Chem Ecol, 2016. 42(12): 
p. 1201-1211. 
47. Sapp, M., et al., Species-specific bacterial communities in the phycosphere of 
microalgae? Microb Ecol, 2007. 53(4): p. 683-99. 
48. Wang, H., et al., Effects of bacterial communities on biofuel-producing microalgae: 
stimulation, inhibition and harvesting. Crit Rev Biotechnol, 2016. 36(2): p. 341-52. 
49. Moll, K.M., et al., Combining multiple nutrient stresses and bicarbonate addition to 
promote lipid accumulation in the diatom RGd-1. Algal Research, 2014. 5: p. 7-15. 
50. Zilber-Rosenberg, I. and E. Rosenberg, Role of microorganisms in the evolution of 
animals and plants: the hologenome theory of evolution. FEMS Microbiol Rev, 2008. 
32(5): p. 723-35. 
51. Bhateria, R. and R. Dhaka, Algae as biofuel. Biofuels, 2015: p. 1-25. 
52. Hill, J., et al., Environmental, economic, and energetic costs and benefits of biodiesel and 
ethanol biofuels. Proc Natl Acad Sci U S A, 2006. 103(30): p. 11206-10. 
53. Markou, G., D. Vandamme, and K. Muylaert, Microalgal and cyanobacterial cultivation: 
the supply of nutrients. Water Res, 2014. 65: p. 186-202. 
54. Brennan, L. and P. Owende, Biofuels from microalgae—a review of technologies for 
production, processing, and extractions of biofuels and co-products. Renewable and 
sustainable energy reviews, 2010. 14(2): p. 557-577. 
55. Hu, Q., et al., Microalgal triacylglycerols as feedstocks for biofuel production: 
perspectives and advances. Plant J, 2008. 54(4): p. 621-39. 
56. Schenk, P., et al., Second Generation Biofuels: High-Efficiency Microalgae for Biodiesel 
Production. BioEnergy Research, 2008. 1(1): p. 20-43. 
57. Amin, S.A., et al., Photolysis of iron–siderophore chelates promotes bacterial–algal 
mutualism. Proceedings of the National Academy of Sciences, 2009. 106(40): p. 17071-
17076. 
58. Seckbach, J., Algae and cyanobacteria in extreme environments. 2007, The Netherlands: 
Springer. 
59. Hildebrand, M., et al., The place of diatoms in the biofuels industry. Biofuels, 2012. 3(2): 
p. 221-240. 
60. Reinfelder, J.R., A.J. Milligan, and F.M. Morel, The role of the C4 pathway in carbon 
accumulation and fixation in a marine diatom. Plant Physiol, 2004. 135(4): p. 2106-11. 
61. Roberts, K., et al., Carbon acquisition by diatoms. Photosynthesis Research, 2007. 93(1): 
p. 79-88. 
62. Giordano, M., J. Beardall, and J.A. Raven, CO2 concentrating mechanisms in 
algae: mechanisms, environmental modulation, and evolution. Annual Review of 
Plant Biology, 2005. 56(1): p. 99-131. 
63. Kaplan, A. and L. Reinhold, CO2 concentrating mechanisms in 
photosynthetic microorganisms. Annual Review of Plant Physiology & Plant 
Molecular Biology, 1999. 50(1): p. 539. 
64. Moroney, J.V. and A. Somanchi, How Do Algae Concentrate CO2 to Increase the Efficiency of 
Photosynthetic Carbon Fixation? Plant Physiology, 1999. 119: p. 9-16. 
65. Moroney, J.V. and R.A. Ynalvez, Proposed carbon dioxide concentrating mechanism in 
Chlamydomonas reinhardtii. Eukaryot Cell, 2007. 6(8): p. 1251-9. 
66. Radakovits, R., et al., Draft genome sequence and genetic transformation of the 
oleaginous alga Nannochloropis gaditana. Nat Commun, 2012. 3: p. 686. 
67. Valenzuela, J., et al., Potential role of multiple carbon fixation pathways 
during lipid accumulation in Phaeodactylum tricornutum. Biotechnology for Biofuels, 
2012. 5(40): p. 1-17. 
68. Bigelow, N., et al., A Comprehensive GC–MS Sub-Microscale Assay for Fatty Acids and 
its Applications. Journal of the American Oil Chemists' Society, 2011. 88(9): p. 1329-
1338. 
69. Gardner, R.D., et al., Use of sodium bicarbonate to stimulate triacylglycerol 
accumulation in the chlorophyte Scenedesmus sp. and the diatom Phaeodactylum 
tricornutum. Journal of Applied Phycology, 2012. 24(5): p. 1311-1320. 
70. Gardner, R., et al., Cellular Cycling, Carbon Utilization, and Photosynthetic Oxygen 
Production during Bicarbonate-Induced Triacylglycerol Accumulation in a Scenedesmus 
sp. Energies, 2013. 6(11): p. 6060-6076. 
71. Gardner, R., et al., Medium pH and nitrate concentration effects on accumulation of 
triacylglycerol in two members of the chlorophyta. Journal of Applied Phycology, 2010. 
23(6): p. 1005-1016. 
72. Guckert, J.B. and K.E. Cooksey, Triglyceride Accumulation and Fatty Acid Profile 
Changes in Chlorella (Chlorophyta) During High pH-Induced Cell Cycle Inhibition. 
Journal of Phycology, 1990. 26(1): p. 72-79. 
73. Sharma, K.K., H. Schuhmann, and P.M. Schenk, High Lipid Induction in Microalgae for 
Biodiesel Production. Energies, 2012. 5(5): p. 1532-1553. 
74. Valenzuela, J., et al., Nutrient resupplementation arrests bio-oil accumulation in 
Phaeodactylum tricornutum. Appl Microbiol Biotechnol, 2013. 97(15): p. 7049-59. 
75. Gardner, R.D., et al., Comparison of CO(2) and bicarbonate as inorganic carbon sources 
for triacylglycerol and starch accumulation in Chlamydomonas reinhardtii. Biotechnol 
Bioeng, 2013. 110(1): p. 87-96. 
76. Hunt, R.W., et al., Effect of biochemical stimulants on biomass productivity and 
metabolite content of the microalga, Chlorella sorokiniana. Applied biochemistry and 
biotechnology, 2010. 162(8): p. 2400-2414. 
77. Griffiths, M. and S. Harrison, Lipid productivity as a key characteristic for choosing 
algal species for biodiesel production. Journal of Applied Phycology, 2009. 21(5): p. 
493-507. 
78. Borowitzka, M.A., Algal biotechnology products and processes — matching science and 
economics. Journal of Applied Phycology, 1992. 4(3): p. 267-279. 
79. Stumm, W. and J.J. Morgan, Aquatic chemistry: chemical equilibria and rates in natural 
waters. Vol. 126. 2012: John Wiley & Sons. 
80. Jones, B.E., et al., Microbial diversity of soda lakes. Extremophiles, 1998. 2(3): p. 191-
200. 
81. Melack, J.M. and P. Kilham, Photosynthetic activity of phytoplankton in tropical African soda 
lakes. Limnology and Oceanography, 1974. 19(5): p. 743-755. 
82. Richards, A., Identification and structural characterization of siderophores produced by 
halophilic and alkaliphilic bacteria, in Department of Chemical Engineering. 2007, 
Washington State University. 
83. Pick, U., L. Karni, and M. Avron, Determination of Ion Content and Ion Fluxes 
in the Halotolerant Alga Dunaliella salina. Plant Physiol., 1986. 82: p. 91-96. 
84. Cooksey, K.E., Regulation of the initial events in microalgal triacylglycerol (TAG) 
synthesis: hypotheses. Journal of Applied Phycology, 2014. 
85. Cooksey, K.E., Regulation of the initial events in microalgal triacylglycerol (TAG) 
synthesis: hypotheses. Journal of applied phycology, 2015. 27(4): p. 1385-1387. 
86. Madigan, M.T., J.M. Martinko, D.A. Stahl, and D.P. Clark, Brock Biology of 
Microorganisms. 13th ed. 2012, Boston: Benjamin Cummings. 
87. Cooksey, K.E., et al., Fluorometric determination of the neutral lipid content of 
microalgal cells using Nile Red. Journal of Microbiological Methods, 1987. 6(6): p. 333-
345. 
88. Davis, R., A. Aden, and P.T. Pienkos, Techno-economic analysis of autotrophic 
microalgae for fuel production. Applied Energy, 2011. 88(10): p. 3524-3531. 
89. Terry, K.L. and L.P. Raymond, System design for the autotrophic production of 
microalgae. Enzyme and Microbial Technology, 1985. 7(10): p. 474-487. 
90. Williams, P.J.l.B. and L.M.L. Laurens, Microalgae as biodiesel & biomass feedstocks: 
Review & analysis of the biochemistry, energetics & economics. Energy & 
Environmental Science, 2010. 3(5): p. 554. 
91. Amin, S., Review on biofuel oil and gas production processes from microalgae. Energy 
Conversion and Management, 2009. 50(7): p. 1834-1840. 
92. Hise, A.M., et al., Evaluating the relative impacts of operational and financial factors on 
the competitiveness of an algal biofuel production facility. Bioresour Technol, 2016. 220: 
p. 271-281. 
93. Burns, N.A., Biomass--The next revolution in surfactants? Inform, 2010. 21(727-729): p. 
779. 
94. Ahmad, A.L., et al., Microalgae as a sustainable energy source for biodiesel production: 
A review. Renewable and Sustainable Energy Reviews, 2011. 15(1): p. 584-593. 
95. King, J.M.D., Oil’s tipping point has passed. Nature, 2012. 481: p. 434-435. 
96. Hamilton, J.D., Causes and Consequences of the Oil Shock of 2007–08. Brookings 
Papers on Economic Activity, 2009. Spring: p. 215-283. 
97. Finer, M., et al., Oil and gas projects in the Western Amazon: threats to wilderness, 
biodiversity, and indigenous peoples. PLoS One, 2008. 3(8): p. e2932. 
98. Geltman, E.A.G., Oil & Gas Drilling in National Parks. Natural Resources Journal, 
Winter 2016. 56(145). 
99. Geltman, E.A.G., Oil & Gas Drilling in National Parks. Natural Resources Journal, 
2016. 56(1): p. 145-192. 
100. Mata, T.M., A.A. Martins, and N.S. Caetano, Microalgae for biodiesel production and 
other applications: A review. Renewable and Sustainable Energy Reviews, 2010. 14(1): 
p. 217-232. 
101. Moll, K.M., Pedersen, T.C., Gardner, R.D., & Peyton, B.M., Biodiesel (Microalgae), in 
Extremophilic Microbial Processing of Lignocellulosic Feedstocks to Biofuels, Value-
Added Products, and Usable Power, D.R. Sani, Editor. 2018, Springer: New York, NY. 
p. 63-78. 
102. Nalley, J.O., M. Stockenreiter, and E. Litchman, Community Ecology of Algal Biofuels: 
Complementarity and Trait-Based Approaches. Industrial Biotechnology, 2014. 10(3): p. 
191-201. 
103. Xu, C., et al., The Use of the Schizonticidal Agent Quinine Sulfate to Prevent Pond 
Crashes for Algal-Biofuel Production. Int J Mol Sci, 2015. 16(11): p. 27450-6. 
104. Carney, L.T., et al., Pond Crash Forensics: Presumptive identification of pond crash 
agents by next generation sequencing in replicate raceway mass cultures of 
Nannochloropsis salina. Algal Research, 2016. 17: p. 341-347. 
105. Park, S., et al., The Selective Use of Hypochlorite to Prevent Pond Crashes for Algae-
Biofuel Production. Water Environment Research, 2016. 88(1): p. 70-78. 
106. McBride, R.C., et al., Contamination Management in Low Cost Open Algae Ponds for 
Biofuels Production. Industrial Biotechnology, 2014. 10(3): p. 221-227. 
107. Gardner, R., et al., Use of sodium bicarbonate to stimulate triacylglycerol accumulation 
in the chlorophyte Scenedesmus sp. and the diatom Phaeodactylum tricornutum. 
Journal of Applied Phycology, 2012: p. 1-10. 
108. Pedersen, T.C., et al., Assessment of Nannochloropsis gaditana growth and lipid 
accumulation with increased inorganic carbon delivery. Journal of Applied Phycology, 
2018. 30(4): p. 2155-2166. 
109. Doemel, W.N. and T.D. Brock, The Physiological Ecology of Cyanidium caldarium. Journal of 
General Microbiology, 1971. 67: p. 17-32. 
110. Skorupa, D.J., et al., In situ gene expression profiling of the thermoacidophilic alga 
Cyanidioschyzon in relation to visible and ultraviolet irradiance. Environ Microbiol, 
2014. 16(6): p. 1627-41. 
111. Oren, A., The ecology of Dunaliella in high-salt environments. Journal of Biological 
Research-Thessaloniki, 2014. 21(1): p. 23. 
112. Palmisano, A.C. and C.W. Sullivan, Growth, metabolism, and dark survival in sea ice 
microalgae, in Sea ice biota. 2018, CRC Press. p. 131-146. 
113. Wensel, P., et al., Isolation, characterization, and validation of oleaginous, multi-trophic, 
and haloalkaline-tolerant microalgae for two-stage cultivation. Algal Research, 2014. 4: 
p. 2-11. 
114. Vadlamani, A., et al., Cultivation of microalgae at extreme alkaline pH conditions: a 
novel approach for biofuel production. ACS Sustainable Chemistry & Engineering, 2017. 
5(8): p. 7284-7294. 
115. Gardner, R., et al., Medium pH and nitrate concentration effects on accumulation of 
triacylglycerol in two members of the chlorophyta. Journal of Applied Phycology, 2011. 
23(6): p. 1005-1016. 
116. Moll, K., Diatom Biofuels: Optimizing Nutrient Requirements for Growth and Lipid 
Acumulation in YNP Isolate RGd-1, in Microbiology. 2012, Montana State University: 
Bozeman, MT. p. 170. 
117. Bischoff, H.W. and H.C. Bold, Some soil algae from enchanted rock and related Algal 
species. Publication / The University of Texas;no. 6318. 1963, Austin: [s.n.]. 
118. Provasoli, L., J.J.A. McLaughlin, and M.R. Droop, The development of artificial media 
for marine algae. Archives of Microbiology, 1957. 25(4): p. 392-428. 
119. Andersen, R., ed., Algal Culturing Techniques. 2005, San Francisco: Academic Press. 
120. Bold, H.C., The Morphology of Chlamydomonas chlamydogama, Sp. Nov. Bulletin of the 
Torrey Botanical Club, Mar-Apr., 1949. 76(2): p. 101-108. 
121. Robertson, L.A. and J.G. Kuenen, Aerobic denitrification: a controversy revived. Archives of 
Microbiology, 1984. 139: p. 351-354. 
122. Bowen De Leon, K., B.D. Ramsay, and M.W. Fields, Quality-score refinement of SSU 
rRNA gene pyrosequencing differs across gene region for environmental samples. 
Microb Ecol, 2012. 64(2): p. 499-508. 
123. Bell, T.A.S., et al., Contributions of the microbial community to algal biomass and 
biofuel productivity in a wastewater treatment lagoon system. Algal Research, 2019. 39. 
124. Caporaso, J.G., et al., QIIME allows analysis of high-throughput community sequencing 
data. Nature Methods, 2010. 7(5): p. 335-336. 
125. Li, W. and A. Godzik, Cd-hit: a fast program for clustering and comparing large sets of 
protein or nucleotide sequences. Bioinformatics, 2006. 22(13): p. 1658-9. 
126. Ybanez, A.P., M. Sashika, and H. Inokuma, The phylogenetic position of Anaplasma 
bovis and inferences on the phylogeny of the genus Anaplasma. Journal of Veterinary 
Medical Science, 2013: p. 13-0411. 
127. Stamatakis, A., RAxML version 8: a tool for phylogenetic analysis and post-analysis of 
large phylogenies. Bioinformatics, 2014. 30(9): p. 1312-3. 
128. White, T.J., Bruns, T., Lee, S., & Taylor, J. , Amplification and Direct Sequencing of 
Fungal Ribosomal RNA Genes for Phylogenetics, in PCR Protocols: A Guide to Methods 
and Applications, M.A. Innis, Gelfand, D.H., Sninksky, J.J. & White, T.J., Editor. 1990, 
Academic Press, Inc.: San Diego. p. 315-322. 
129. Gardes, M., et al., Identification of indigenous and introduced symbiotic fungi in 
ectomycorrhizae by amplification of nuclear and mitochondrial ribosomal DNA. 
Canadian Journal of Botany, 1991. 69(1): p. 180-190. 
130. Gardes, M. and T.D. Bruns, ITS primers with enhanced specificity for basidiomycetes - 
application to the identification of mycorrhizae and rusts. Molecular Ecology, 1993. 2(2): 
p. 113-118. 
131. Mitchell, T.G., et al., Unique oligonucleotide primers in PCR for identification of 
Cryptococcus neoformans. Journal of Clinical Microbiology, 1994. 32(1): p. 253-255. 
132. Chen, W., et al., A high throughput Nile red method for quantitative measurement of 
neutral lipids in microalgae. J Microbiol Methods, 2009. 77(1): p. 41-7. 
133. Gardner, R.D., et al., Comparison of CO2 and bicarbonate as inorganic carbon sources 
for triacylglycerol and starch accumulation in Chlamydomonas reinhardtii. 
Biotechnology and Bioengineering, 2012. 
134. Weiss, T.L., et al., Colony organization in the green alga Botryococcus braunii (Race B) 
is specified by a complex extracellular matrix. Eukaryotic cell, 2012. 11(12): p. 1424-
1440. 
135. Pismenskaya, N., et al., Dependence of composition of anion-exchange membranes and their electrical 
conductivity on concen-V. Journal of Membrane Science, 2001. 181: p. 185-197. 
136. Cheng, P., et al., The growth, lipid and hydrocarbon production of Botryococcus braunii 
with attached cultivation. Bioresour Technol, 2013. 138: p. 95-100. 
137. Vazquez-Duhalt, R. and H. Greppin, Growth and production of cell constituents in batch 
cultures of botryococcus sudeticus. Phytochemistry, 1987. 26(4): p. 885-889. 
138. Ruangsomboon, S., Effect of light, nutrient, cultivation time and salinity on lipid 
production of newly isolated strain of the green microalga, Botryococcus braunii KMITL 
2. Bioresource Technology, 2012. 109: p. 261-265. 
139. Sydney, E.d., et al., Screening of microalgae with potential for biodiesel production and 
nutrient removal from treated domestic sewage. Applied Energy, 2011. 88(10): p. 3291-
3294. 
140. Mata, T.M., A.A. Martins, and N.S. Caetano, Microalgae for biodiesel production and 
other applications: A review. Renewable and Sustainable Energy Reviews, 2010. 14(1): 
p. 217-232. 
141. Zhou, W., et al., Local bioprospecting for high-lipid producing microalgal strains to be 
grown on concentrated municipal wastewater for biofuel production. Bioresour Technol, 
2011. 102(13): p. 6909-19. 
142. Liu, J., et al., Aerated swine lagoon wastewater: a promising alternative medium for 
Botryococcus braunii cultivation in open system. Bioresour Technol, 2013. 139: p. 190-4. 
143. Ji, M.K., et al., Effect of mine wastewater on nutrient removal and lipid production by a 
green microalga Micratinium reisseri from concentrated municipal wastewater. 
Bioresour Technol, 2014. 157: p. 84-90. 
144. Eustance, E., et al., Growth, nitrogen utilization and biodiesel potential for two 
chlorophytes grown on ammonium, nitrate or urea. Journal of Applied Phycology, 2013. 
25(6): p. 1663-1677. 
145. Woertz, I., A. Feffer, T. Lundquist, and Y. Nelson, Algae Grown on Dairy and Municipal Wastewater for 
Simultaneous Nutrient Removal and Lipid Production for Biofuel Feedstock. Journal of 
Environmental Engineering, 2009. 135(11): p. 1115-1122. 
146. Wilkie, A.C. and W.W. Mulbry, Recovery of dairy manure nutrients by benthic freshwater 
algae. Bioresour Technol, 2002. 84: p. 81-91. 
147. Cheng, D.L., et al., Microalgae biomass from swine wastewater and its conversion to 
bioenergy. Bioresour Technol, 2019. 275: p. 109-122. 
148. Gopalakrishnan, K., J. Roostaei, and Y. Zhang, Mixed culture of Chlorella sp. and 
wastewater wild algae for enhanced biomass and lipid accumulation in artificial 
wastewater medium. Frontiers of Environmental Science & Engineering, 2018. 12(4): p. 
14. 
149. Kothari, R., et al., Experimental study for growth potential of unicellular alga Chlorella 
pyrenoidosa on dairy waste water: an integrated approach for treatment and biofuel 
production. Bioresour Technol, 2012. 116: p. 466-70. 
150. Ahmad, F., A.U. Khan, and A. Yasar, Uptake of Nutrients from Municipal Wastewater and 
Biodiesel Production by Mixed Algae Culture. Pakistan Journal of Nutrition, 2012. 11(7). 
151. Guarnieri, M.T., et al., Genome Sequence of the Oleaginous Green Alga, Chlorella 
vulgaris UTEX 395. Front Bioeng Biotechnol, 2018. 6: p. 37. 
152. Lohman, E.J., et al., Optimized inorganic carbon regime for enhanced growth and lipid 
accumulation in Chlorella vulgaris. Biotechnol Biofuels, 2015. 8: p. 82. 
153. Bernstein, H.C., et al., Direct measurement and characterization of active photosynthesis 
zones inside wastewater remediating and biofuel producing microalgal biofilms. 
Bioresour Technol, 2014. 156: p. 206-15. 
154. Smetacek, V., et al., Deep carbon export from a Southern Ocean iron-fertilized diatom 
bloom. Nature, 2012. 487(7407): p. 313-9. 
155. Subramaniam, A., et al., Amazon 
River enhances diazotrophy and carbon sequestration in the tropical North Atlantic 
Ocean. PNAS, 2008. 105(30): p. 10460–10465. 
156. Gao, B., et al., Co-production of lipids, eicosapentaenoic acid, fucoxanthin, and 
chrysolaminarin by Phaeodactylum tricornutum cultured in a flat-plate photobioreactor 
under varying nitrogen conditions. Journal of Ocean University of China, 2017. 16(5): p. 
916-924. 
157. Lohman, E.J., et al., An efficient and scalable extraction and quantification method for 
algal derived biofuel. J Microbiol Methods, 2013. 94(3): p. 235-44. 
158. Nurachman, Z., et al., Oil from the tropical marine benthic-diatom Navicula sp. Appl 
Biochem Biotechnol, 2012. 168(5): p. 1065-75. 
159. Seymour, J.R., et al., Zooming in on the phycosphere: the ecological interface for 
phytoplankton-bacteria relationships. Nat Microbiol, 2017. 2: p. 17065. 
160. Johansson, O.N., et al., Friends With Benefits: Exploring the Phycosphere of the Marine 
Diatom Skeletonema marinoi. Front Microbiol, 2019. 10: p. 1828. 
161. Grossart, H.-P., et al., Marine diatom species harbour distinct bacterial communities. 
Environmental Microbiology, 2005. 7(6): p. 860-873. 
162. Droop, M.R., Vitamins, phytoplankton and bacteria: symbiosis or scavenging? Journal of 
Plankton Research, 2007. 29(2): p. 107-113. 
163. Grossart, H.-P., G. Czub, and M. Simon, Algae-bacteria interactions and their effects on 
aggregation and organic matter flux in the sea. Environmental Microbiology, 2006. 8(6): 
p. 1074-1084. 
164. Pisman, T.I., Y.V. Galayda, and N.S. Loginova, Population dynamics of an algal–
bacterial cenosis in closed ecological system. Advances in Space Research, 2005. 35(9): 
p. 1579-1583. 
165. Hutchins, D.A. and K.W. Bruland, Iron-limited diatom growth and Si:N uptake ratios in 
a coastal upwelling regime. Nature, 1998. 393(6685): p. 561-564. 
166. Hildebrand, M., et al., The place of diatoms in the biofuels industry. Biofuels, 2012. 
3(2): p. 221-240. 
167. Sison-Mangus, M.P., et al., Host-specific adaptation governs the interaction of the 
marine diatom, Pseudo-nitzschia and their microbiota. ISME J, 2014. 8(1): p. 63-76. 
168. William S. and Helene Feil, A.C., Bacterial genomic DNA isolation using CTAB, M.H. 
11-12-12, Editor. 2004, Joint Genome Institute, Department of Energy. 
169. Hunken, M., J. Harder, and G.O. Kirst, Epiphytic bacteria on the Antarctic ice diatom 
Amphiprora kufferathii Manguin cleave hydrogen peroxide produced during algal 
photosynthesis. Plant Biol (Stuttg), 2008. 10(4): p. 519-26. 
170. Droop, M., A procedure for routine purification of algal cultures with antibiotics. British 
Phycological Bulletin, 1967. 3(2): p. 295-297. 
171. Han, X.Y. and R.A. Andrade, Brevundimonas diminuta infections and its resistance to 
fluoroquinolones. J Antimicrob Chemother, 2005. 55(6): p. 853-9. 
172. Kim, K.E., et al., Long-read, whole-genome shotgun sequence data for five model 
organisms. Sci Data, 2014. 1: p. 140045. 
173. Ye, C., et al., DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous 
Reads of the Third Generation Sequencing Technologies. Sci Rep, 2016. 6: p. 31900. 
174. Chin, C.S., et al., Phased diploid genome assembly with single-molecule real-time 
sequencing. Nat Methods, 2016. 13(12): p. 1050-1054. 
175. Koren, S., et al., Canu: scalable and accurate long-read assembly via adaptive k-mer 
weighting and repeat separation. bioRxiv, 2016. 
176. Bankevich, A., et al., SPAdes: a new genome assembly algorithm and its applications to 
single-cell sequencing. J Comput Biol, 2012. 19(5): p. 455-77. 
177. Huang, X., CAP3: A DNA Sequence Assembly Program. Genome Research, 1999. 9(9): 
p. 868-877. 
178. Simpson, J.T. and R. Durbin, Efficient de novo assembly of large genomes using 
compressed data structures. Genome Res, 2012. 22(3): p. 549-56. 
179. Zdobnov, E.M., et al., OrthoDB v9.1: cataloging evolutionary and functional annotations 
for animal, fungal, plant, archaeal, bacterial and viral orthologs. Nucleic Acids Res, 
2017. 45(D1): p. D744-D749. 
180. Stanke, M., et al., AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic 
Acids Res, 2006. 34(Web Server issue): p. W435-9. 
181. Stanke, M., et al., Gene prediction in eukaryotes with a generalized hidden Markov 
model that uses hints from external sources. BMC Bioinformatics, 2006. 7: p. 62. 
182. Li, H., et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009. 
25(16): p. 2078-9. 
183. Holt, C. and M. Yandell, MAKER2: an annotation pipeline and genome-database 
management tool for second-generation genome projects. BMC Bioinformatics, 2011. 
12: p. 491. 
184. Grabherr, M.G., et al., Full-length transcriptome assembly from RNA-Seq data without a 
reference genome. Nat Biotechnol, 2011. 29(7): p. 644-52. 
185. Haas, B.J., et al., De novo transcript sequence reconstruction from RNA-seq using the 
Trinity platform for reference generation and analysis. Nat Protoc, 2013. 8(8): p. 1494-
512. 
186. Campbell, M.S., et al., Genome Annotation and Curation Using MAKER and MAKER-P. 
Curr Protoc Bioinformatics, 2014. 48: p. 4.11.1-4.11.39. 
187. Cantarel, B.L., et al., MAKER: An easy-to-use annotation pipeline designed for emerging 
model organism genomes. Genome Research, 2008. 18(1): p. 188-196. 
188. Korf, I., Gene finding in novel genomes. BMC Bioinformatics, 2004. 5(1): p. 59. 
189. Humann, J.L., et al., Structural and Functional Annotation of Eukaryotic Genomes with 
GenSAS, in Gene Prediction: Methods and Protocols, M. Kollmar, Editor. 2019, Springer 
New York: New York, NY. p. 29-51. 
190. Kanehisa, M. and Y. Sato, KEGG Mapper for inferring cellular functions from protein 
sequences. Protein Science, 2020. 29(1): p. 28-35. 
191. Perez, N., M. Gutierrez, and N. Vera, Computational Performance Assessment of k-mer 
Counting Algorithms. J Comput Biol, 2016. 23(4): p. 248-55. 
192. Wu, Y.W., ezTree: an automated pipeline for identifying phylogenetic marker genes and 
inferring evolutionary relationships among uncultivated prokaryotic draft genomes. 
BMC Genomics, 2018. 19(Suppl 1): p. 921. 
193. El-Gebali, S., et al., The Pfam protein families database in 2019. Nucleic Acids Res, 
2019. 47(D1): p. D427-D432. 
194. Katoh, K. and D.M. Standley, MAFFT multiple sequence alignment software version 7: 
improvements in performance and usability. Mol Biol Evol, 2013. 30(4): p. 772-80. 
195. Capella-Gutierrez, S., J.M. Silla-Martinez, and T. Gabaldon, trimAl: a tool for automated 
alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 2009. 25(15): 
p. 1972-3. 
196. Rambaut, A., FigTree v1.4.0: a graphical viewer of phylogenetic trees. 
http://tree.bio.ed.ac.uk/software/figtree/, 2012. 
197. Bell, T.A., et al., A Lipid-Accumulating Alga Maintains Growth in Outdoor, Alkaliphilic 
Raceway Pond with Mixed Microbial Communities. Front Microbiol, 2015. 6: p. 1480. 
198. McKay, L.J., et al., Occurrence and expression of novel methyl-coenzyme M reductase 
gene (mcrA) variants in hot spring sediments. Sci Rep, 2017. 7(1): p. 7252. 
199. Ounit, R., et al., CLARK: fast and accurate classification of metagenomic and genomic 
sequences using discriminative k-mers. BMC Genomics, 2015. 16: p. 236. 
200. Cole, J.R., et al., Ribosomal Database Project: data and tools for high throughput rRNA 
analysis. Nucleic Acids Res, 2014. 42(Database issue): p. D633-42. 
201. Kim, D., B. Langmead, and S.L. Salzberg, HISAT: a fast spliced aligner with low 
memory requirements. Nat Methods, 2015. 12(4): p. 357-60. 
202. Henschel, R., et al., Trinity RNA-Seq Assembler Performance Optimization, in XSEDE12. 
July 16-20, 2012: Chicago, Illinois, USA. 
203. Ewels, P., et al., MultiQC: summarize analysis results for multiple tools and samples in a 
single report. Bioinformatics, 2016. 32(19): p. 3047-8. 
204. Smith, S.R., R.M. Abbriano, and M. Hildebrand, Comparative analysis of diatom 
genomes reveals substantial differences in the organization of carbon partitioning 
pathways. Algal Research, 2012. 1(1): p. 2-16. 
205. Zhao, W., et al., Comparison of RNA-Seq by poly(A) capture, ribosomal RNA depletion, 
and DNA microarray for expression profiling. BMC Genomics, 2014. 15(419): p. 1-11. 
206. Hrdlickova, R., M. Toloue, and B. Tian, RNA‐Seq methods for transcriptome analysis. 
Wiley Interdisciplinary Reviews: RNA, 2017. 8(1): p. e1364. 
207. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler 
transform. Bioinformatics, 2009. 25(14): p. 1754-60. 
208. Simpson, J.T., Exploring genome characteristics and sequence quality without a 
reference. Bioinformatics, 2014. 30(9): p. 1228-1235. 
209. Armbrust, E.V., The life of diatoms in the world's oceans. Nature, 2009. 459(7244): p. 
185-92. 
210. Delcher, A.L., S.L. Salzberg, and A.M. Phillippy, Using MUMmer to identify similar 
regions in large sequence sets. Current Protocols in Bioinformatics, 2003(1): p. 10.3.1-10.3.18. 
211. White, D., The Physiology and Biochemistry of Prokaryotes. Third Edition ed. 2007, 
New York: Oxford University Press. 
212. Garrett, R.H. and C.M. Grisham, Biochemistry. Fourth Edition ed. 2010, University of Virginia: 
Brooks/Cole. 
213. Kajikawa, M., et al., Production of ricinoleic acid-containing monoestolide 
triacylglycerides in an oleaginous diatom, Chaetoceros gracilis. Scientific Reports, 2016. 
6(1). 
214. Ong, S.A., T. Peterson, and J.B. Neilands, Agrobactin, a Siderophore from Agrobacterium tumefaciens. 
The Journal of Biological Chemistry, 1979. 254(6): p. 1860-1865. 
215. Mehansho, H., Iron Fortification Technology Development: New Approaches. The 
Journal of Nutrition, 2006. 136(4): p. 1059-1063. 
216. Holmén, B.A., et al., 
Hydroxamate siderophores, cell growth and Fe(III) cycling in two anaerobic iron oxide 
media containing Geobacter metallireducens. Geochimica et Cosmochimica Acta, 1999. 
63(2): p. 227-239. 
217. Penyalver, R., et al., Iron-binding compounds from Agrobacterium spp.: biological 
control strain Agrobacterium rhizogenes K84 produces a hydroxamate siderophore. Appl 
Environ Microbiol, 2001. 67(2): p. 654-64. 
218. Park, Y., et al., Growth promotion of Chlorella ellipsoidea by co-inoculation with 
Brevundimonas sp. isolated from the microalga. Hydrobiologia, 2007. 598(1): p. 219-
228. 
219. Schubbe, S., et al., Complete genome sequence of the chemolithoautotrophic marine 
magnetotactic coccus strain MC-1. Appl Environ Microbiol, 2009. 75(14): p. 4835-52. 
220. Madhaiyan, M., et al., Arachidicoccus rhizosphaerae gen. nov., sp. nov., a plant-growth-
promoting bacterium in the family Chitinophagaceae isolated from rhizosphere soil. Int J 
Syst Evol Microbiol, 2015. 65(Pt 2): p. 578-86. 
221. Tu, J., et al., The siderophore-interacting protein is involved in iron acquisition and 
virulence of Riemerella anatipestifer strain CH3. Veterinary Microbiology, 2014. 168(2): 
p. 395-402. 
222. Moll, K.M., et al., Genome sequence for an extremophilic Brevundimonas strain. 
Microbiology Resource Announcements, In Progress. 
223. Ghosh, P., et al., Bacterial ability in AsIII oxidation and AsV reduction: Relation to 
arsenic tolerance, P uptake, and siderophore production. Chemosphere, 2015. 138: p. 
995-1000. 
224. Valenzuela, J., et al., Potential role of multiple carbon fixation pathways during lipid 
accumulation in Phaeodactylum tricornutum. Biotechnol Biofuels, 2012. 5: p. 40. 
225. Erb, T.J., et al., Synthesis of C5-dicarboxylic acids from C2-units involving crotonyl-CoA 
carboxylase/reductase: the ethylmalonyl-CoA pathway. Proc Natl Acad Sci U S A, 2007. 
104(25): p. 10631-6. 
226. Seyedsayamdost, M.R., et al., The Jekyll-and-Hyde chemistry of Phaeobacter 
gallaeciensis. Nat Chem, 2011. 3(4): p. 331-5. 
227. Chen, G., L. Zhao, and Y. Qi, Enhancing the productivity of microalgae cultivated in 
wastewater toward biofuel production: a critical review. Applied Energy, 2015. 137: p. 
282-291. 
228. Singh, N., et al., Brevundimonas diminuta mediated alleviation of arsenic toxicity and 
plant growth promotion in Oryza sativa L. Ecotoxicol Environ Saf, 2016. 125: p. 25-34. 
229. Schwyn, B. and J.B. Neilands, Universal chemical assay for the detection and 
determination of siderophores. Analytical Biochemistry, 1987. 160(1): p. 47-56. 
230. Tan, Y., et al., The Low Conductivity of Geobacter uraniireducens Pili Suggests a 
Diversity of Extracellular Electron Transfer Mechanisms in the Genus Geobacter. Front 
Microbiol, 2016. 7: p. 980. 
231. Anderson, R.T., et al., Stimulating the in situ activity of Geobacter species to remove 
uranium from the groundwater of a uranium-contaminated aquifer. Appl Environ 
Microbiol, 2003. 69(10): p. 5884-91. 
232. Kroth, P.G., et al., A model for carbohydrate metabolism in the diatom Phaeodactylum 
tricornutum deduced from comparative whole genome analysis. PLoS One, 2008. 3(1): p. 
e1426. 
233. Davis, A., et al., Clarification of Photorespiratory Processes and the Role of Malic 
Enzyme in Diatoms. Protist, 2017. 168(1): p. 134-153. 
234. Wang, M., et al., PacBio-LITS: a large-insert targeted sequencing method for 
characterization of human disease-associated chromosomal structural variations. BMC 
genomics, 2015. 16(1): p. 214. 
235. Seemann, T., Prokka: rapid prokaryotic genome annotation. Bioinformatics, 2014. 
30(14): p. 2068-9. 
236. Overbeek, R., et al., The SEED and the Rapid Annotation of microbial genomes using 
Subsystems Technology (RAST). Nucleic Acids Res, 2014. 42(Database issue): p. D206-
14. 
237. Brettin, T., et al., RASTtk: a modular and extensible implementation of the RAST 
algorithm for building custom annotation pipelines and annotating batches of genomes. 
Sci Rep, 2015. 5: p. 8365. 
238. Soria-Dengg, S. and U. Horstmann, Ferrioxamines B and E as iron sources for the marine diatom 
Phaeodactylum tricornutum. Marine Ecology Progress Series, 1995. 127: p. 269-277. 
239. Soria-Dengg, S., R. Reissbrodt, and U. Horstmann, Siderophores in marine coastal waters 
and their relevance for iron uptake by phytoplankton: experiments with the diatom 
Phaeodactylum tricornutum. Marine Ecology Progress Series, 2001. 220: p. 73-82. 
240. Gurney, K.R., et al., High Resolution Fossil Fuel Combustion CO2 Emission Fluxes for 
the United States. Environmental Science & Technology, 2009. 43(14): p. 5535-5541. 
241. Hutchins, D.A., et al., Climate change microbiology - problems and perspectives. Nat 
Rev Microbiol, 2019. 17(6): p. 391-396. 
242. Kuma, K., et al., Effect of Hydroxamate Ferrisiderophore Complex (Ferrichrome) on 
Iron Uptake and Growth of a Coastal Marine Diatom, Chaetoceros sociale. Limnology 
and Oceanography, 2000. 45(6): p. 1235-1244. 
243. Kustka, A.B., A.E. Allen, and F.M.M. Morel, Sequence analysis and transcriptional 
regulation of iron acquisition genes in two marine diatoms. Journal of Phycology, 2007. 
43(4): p. 715-729. 
244. Kazamia, E., et al., Endocytosis-mediated siderophore uptake as a strategy for Fe 
acquisition in diatoms. Science Advances, 2018. 4(5): p. 1-14. 
245. Siebers, M., et al., Lipids in plant-microbe interactions. Biochim Biophys Acta, 2016. 
1861(9 Pt B): p. 1379-1395. 
246. Hwangbo, M. and K.H. Chu, Recent advances in production and extraction of bacterial 
lipids for biofuel production. Sci Total Environ, 2020. 734: p. 139420. 
247. De Riso, V., et al., Gene silencing in the marine diatom Phaeodactylum tricornutum. 
Nucleic Acids Res, 2009. 37(14): p. e96. 
248. Chisti, Y., Constraints to commercialization of algal fuels. J Biotechnol, 2013. 167(3): p. 
201-14. 
249. Huysman, M.J., et al., AUREOCHROME1a-mediated induction of the diatom-specific 
cyclin dsCYC2 controls the onset of cell division in diatoms (Phaeodactylum 
tricornutum). Plant Cell, 2013. 25(1): p. 215-28. 
250. Huysman, M.J., W. Vyverman, and L. De Veylder, Molecular regulation of the diatom 
cell cycle. J Exp Bot, 2014. 65(10): p. 2573-84. 
251. Coesel, S., et al., Diatom PtCPF1 is a new cryptochrome/photolyase family member with 
DNA repair and transcription regulation activity. EMBO reports, 2009. 10(6): p. 655-
661. 
252. Jungandreas, A., et al., The Acclimation of Phaeodactylum tricornutum to Blue and 
Red Light Does Not Influence the Photosynthetic Light Reaction but Strongly Disturbs 
the Carbon Allocation Pattern. PLoS One, 2014. 9(8): p. 1-14. 
253. Schellenberger Costa, B., et al., Aureochrome 1a is involved in the Photoacclimation 
of the Diatom Phaeodactylum tricornutum. PLoS One, 2013. 8(9): p. 1-14. 
254. Beel, B., et al., A flavin binding cryptochrome photoreceptor responds to both blue and 
red light in Chlamydomonas reinhardtii. The Plant Cell Online, 2012. 24(7): p. 2992-
3008. 
255. Armbrust, E.V., The life of diatoms in the world's oceans. Nature, 2009. 459(7244): p. 
185-192. 
256. Cao, S., J. Wang, and D. Chen, Settlement and cell division of diatom Navicula can be 
influenced by light of various qualities and intensities. J Basic Microbiol, 2013. 53(11): 
p. 884-94. 
257. Yu, X., et al., The Cryptochrome Blue Light Receptors, in The Arabidopsis Book. 2010. 
p. 1-27. 
258. Yu, X., et al., The Cryptochrome Blue Light Receptors. The Arabidopsis Book/American 
Society of Plant Biologists, 2010: p. 2-27. 
259. Su, Y., N. Lundholm, and M. Ellegaard, The effect of different light regimes on diatom 
frustule silicon concentration. Algal Research, 2018. 29: p. 36-40. 
260. Su, Y., et al., Implications for photonic applications of diatom growth and frustule 
nanostructure changes in response to different light wavelengths. Nano Research, 2015: 
p. 1-10. 
261. Huysman, M., et al., Genome-wide analysis of the diatom cell cycle unveils a novel type 
of cyclins involved in environmental signaling. Genome Biology, 2010. 11(2): p. R17. 
262. Kafri, R., et al., Dynamics extracted from fixed cells reveal feedback linking cell growth 
to cell cycle. Nature, 2013. 494(7438): p. 480-3. 
263. Ritchie, R.J., Consistent sets of spectrophotometric chlorophyll equations for acetone, 
methanol and ethanol solvents. Photosynth Res, 2006. 89(1): p. 27-41. 
264. Ritchie, R., Universal chlorophyll equations for estimating chlorophylls a, b, c, and d 
and total chlorophylls in natural assemblages of photosynthetic organisms using acetone, 
methanol, or ethanol solvents. Photosynthetica, 2008. 46(1): p. 115-126. 
265. Matters, G.L. and S.I. Beale, Blue light regulated expression of genes for two early steps of 
chlorophyll biosynthesis in Chlamydomonas reinhardtii. Plant Physiol, 1995. 109: p. 471-
479. 
266. Blankenship, R.E., Molecular Mechanisms of Photosynthesis. 2002, Ames, Iowa, USA: 
Iowa State University Press A Blackwell Science Company. 
267. Lichtenthaler, H.K., et al., Adaptation of 
chloroplast ultrastructure and of chlorophyll protein levels to high light and low light 
growth conditions. Zeitschrift für Naturforschung, 1982. 37: p. 464-475. 
268. Needoba, J.A. and P.J. Harrison, Influence of low light and a light:dark cycle on 
NO3– uptake, intracellular NO3–, and nitrogen isotope fractionation by marine 
phytoplankton. Journal of Phycology, 2004. 40(3): p. 505-516. 
269. Vadiveloo, A., et al., Effect of different light spectra on the growth and productivity of 
acclimated Nannochloropsis sp. (Eustigmatophyceae). Algal Research, 2015. 8: p. 121-
127. 
270. Eskins, K., C.Z. Jiang, and R. Shibles, Light‐quality and irradiance effects on pigments, light‐
harvesting proteins and Rubisco activity in a chlorophyll‐ and light‐ harvesting‐deficient 
soybean mutant. Physiologia Plantarum, 1991. 83: p. 47-53. 
271. Valls, M. and V. de Lorenzo, Exploiting the genetic and biochemical capacities of 
bacteria for the remediation of heavy metal pollution. FEMS Microbiology Reviews, 
2002. 26: p. 327-338. 
272. Mitra, A., S. Chatterjee, and D.K. Gupta, Uptake, Transport, and Remediation of 
Arsenic by Algae and Higher Plants, in Arsenic Contamination in the Environment, D.K. 
Gupta and S. Chatterjee, Editors. 2017, Springer International Publishing. p. 145-169. 
273. Arsenic Contamination in the Environment. 2017, UK: Springer. 
274. Tripathi, R.D., et al., Arsenic hazards: strategies for tolerance and remediation by plants. 
Trends Biotechnol, 2007. 25(4): p. 158-65. 
275. Qin, J., et al., Biotransformation of arsenic by a Yellowstone thermoacidophilic 
eukaryotic alga. Proc Natl Acad Sci U S A, 2009. 106(13): p. 5213-7. 
276. Johnson, D.L. and R.M. Burke, Biological mediation of chemical speciation II. 
Arsenate reduction during marine phytoplankton blooms. 
Chemosphere, 1978. 8: p. 645-648. 
277. Sanders, J.G., Effects of arsenic speciation. J. Phycol., 1979. 15: p. 424-484. 
278. Kaise, T., et al., Biomethylation of Arsenic in an Arsenic-rich Freshwater 
Environment. Applied Organometallic Chemistry, 1997. 11: p. 297-304. 
279. Qin, J., et al., Biotransformation of arsenic by a Yellowstone thermoacidophilic 
eukaryotic alga. Proc Natl Acad Sci U S A, 2009. 106(13): p. 5213-7. 
280. Wang, S. and C.N. Mulligan, Occurrence of arsenic contamination in Canada: sources, 
behavior and distribution. Sci Total Environ, 2006. 366(2-3): p. 701-21. 
281. Howard, A.G., et al., Arsenic Speciation and Seasonal 
Changes in Nutrient Availability and Micro-plankton Abundance in Southampton Water, 
U.K. Estuarine Coastal and Shelf Science, 1995. 40: p. 435-450. 
282. Andreae, M.O., Distribution and speciation of arsenic in natural waters and some marine 
algae. Deep Sea Research, 1978. 25(4): p. 391-402. 
283. Meharg, A.A. and A. Raab, Getting to the bottom of arsenic standards and guidelines. 
Environ Sci Technol, 2010. 44(12): p. 4395-9. 
284. National Primary Drinking Water Regulations; Arsenic and Clarifications to Compliance 
and New Source Contaminants Monitoring; Final Rule. January 22, 2001. p. 6976-7066. 
285. Barral-Fraga, L., et al., Short-term arsenic exposure reduces diatom cell size in biofilm 
communities. Environ Sci Pollut Res Int, 2016. 23(5): p. 4257-70. 
286. Sanders, J.G. and S.J. Cibik, Adaptive behavior of euryhaline phytoplankton communities to arsenic 
stress. Mar. Ecol. Prog. Ser., 1985. 22: p. 199-205. 
287. Planas, D. and F.P. Healey, Effects of arsenate on growth and phosphorus 
metabolism of phytoplankton. J Phycol, 1978. 14: p. 337-341. 
288. Lloyd, J.R. and R.S. Oremland, Microbial Transformations of Arsenic in the Environment: From 
Soda Lakes to Aquifers. Elements, 2006. 2: p. 85-90. 
289. Sele, V., et al., Arsenolipids in marine oils and fats: A review of occurrence, chemistry 
and future research needs. Food Chemistry, 2012. 133(3): p. 618-630. 
290. Zhu, C. and Y. Lee, Determination of biomass dry weight of marine microalgae. Journal 
of Applied Phycology, 1997. 9(2): p. 189-194. 
291. Chelf, P., Environmental control of lipid and biomass production in two diatom species. 
Journal of Applied Phycology, 1990. 2(2): p. 121-129. 
292. Slaughter, D.C., R.E. Macur, and W.P. Inskeep, Inhibition of microbial arsenate 
reduction by phosphate. Microbiol Res, 2012. 167(3): p. 151-6. 
293. Zienkiewicz, K., et al., Stress-induced neutral lipid biosynthesis in microalgae - 
Molecular, cellular and physiological insights. Biochim Biophys Acta, 2016. 1861(9 Pt 
B): p. 1269-1281. 
294. Bligh, E.G. and W.J. Dyer, A rapid method of total lipid extraction and purification. 
Canadian journal of biochemistry and physiology, 1959. 37(8): p. 911-917. 
295. Griffiths, M., R. van Hille, and S. Harrison, Selection of Direct Transesterification as the 
Preferred Method for Assay of Fatty Acid Content of Microalgae. Lipids, 2010. 45(11): 
p. 1053-1060. 
296. Branco, R., A.P. Chung, and P.V. Morais, Sequencing and expression of two arsenic 
resistance operons with different functions in the highly arsenic-resistant strain 
Ochrobactrum tritici SCII24T. BMC Microbiol, 2008. 8: p. 95. 
297. Zhao, C., et al., Insights into arsenic multi-operons expression and resistance 
mechanisms in Rhodopseudomonas palustris CGA009. Front Microbiol, 2015. 6: p. 986. 
298. Rothstein, A., Interactions of arsenate with the phosphate-transporting system of yeast. J 
Gen Physiol, 1963. 46: p. 1075-85. 
299. Scarborough, G.A., The mechanism of arsenate inhibition of the glucose active 
transport system in Neurospora crassa. Archives of Biochemistry and Biophysics, 1975. 
166: p. 245-250. 
300. Shendure, J. and E.L. Aiden, The expanding scope of DNA sequencing. Nature 
biotechnology, 2012. 30(11): p. 1084. 
301. Mostovoy, Y., et al., A hybrid approach for de novo human genome sequence assembly 
and phasing. Nature methods, 2016. 13(7): p. 587-590. 
302. VanBuren, R., et al., Single-molecule sequencing of the desiccation-tolerant grass 
Oropetium thomaeum. Nature, 2015. 527(7579): p. 508-511. 
303. Bickhart, D.M., et al., Single-molecule sequencing and conformational capture enable de 
novo mammalian reference genomes. bioRxiv, 2016: p. 064352. 
304. Ashrafi, H. Using spinach to compare technologies for whole genome assemblies. in 
Plant and Animal Genome Conference XXIII. 2015. San Diego, CA. 
305. Bertioli, D.J., et al., The genome sequences of Arachis duranensis and Arachis ipaensis, 
the diploid ancestors of cultivated peanut. Nat Genet, 2016. 48(4): p. 438-46. 
306. Chaney, L., et al., Genome Mapping in Plant Comparative Genomics. Trends Plant Sci, 
2016. 
307. Schwartz, D.C., et al., Ordered restriction maps of Saccharomyces cerevisiae 
chromosomes constructed by optical mapping. Science, 1993. 262(5130): p. 110-114. 
308. Imelfort, M. and D. Edwards, De novo sequencing of plant genomes using second-
generation technologies. Brief Bioinform, 2009. 10(6): p. 609-18. 
309. Das, S.K., et al., 
Single molecule linear analysis of DNA in nano-channel labeled with sequence specific 
fluorescent probes. Nucleic Acids Res, 2010. 38(18): p. 1-8. 
310. Shelton, J.M., et al., Tools and pipelines for BioNano data: molecule assembly pipeline 
and FASTA super scaffolding tool. BMC Genomics, 2015. 16(1): p. 734. 
311. Belton, J.M., et al., Hi-C: a comprehensive technique to capture the conformation of 
genomes. Methods, 2012. 58(3): p. 268-76. 
312. Schatz, M.C., J. Witkowski, and W.R. McCombie, Current challenges in de novo plant 
genome sequencing and assembly. Genome biology, 2012. 13(4): p. 1. 
313. Jiao, W.B., et al., Improving and correcting the contiguity of long-read genome 
assemblies of three plant species using optical mapping and chromosome conformation 
capture data. Genome Res, 2017. 
314. Jiao, Y., et al., Improved maize reference genome with single-molecule technologies. 
Nature, 2017. 546(7659): p. 524-527. 
315. Chin, C.-S., et al., Phased diploid genome assembly with single-molecule real-time 
sequencing. Nat Meth, 2016. 13(12): p. 1050-1054. 
316. Jarvis, D.E., et al., The genome of Chenopodium quinoa. Nature, 2017. 542(7641): p. 
307. 
317. Zapata, L., et al., Chromosome-level assembly of Arabidopsis thaliana Ler reveals the 
extent of translocation and inversion polymorphisms. Proceedings of the National 
Academy of Sciences, 2016. 113(28): p. E4052-E4060. 
318. Berlin, K., et al., Assembling large genomes with single-molecule sequencing and 
locality-sensitive hashing. Nature biotechnology, 2015. 33(6): p. 623-630. 
319. Zhang, J., et al., Extensive sequence divergence between the reference genomes of two 
elite indica rice varieties Zhenshan 97 and Minghui 63. Proceedings of the National 
Academy of Sciences, 2016. 113(35): p. E5163-E5171. 
320. Daccord, N., et al., High-quality de novo assembly of the apple genome and methylome 
dynamics of early fruit development. Nature genetics, 2017. 49(7): p. 1099-1106. 
321. Du, H., et al., Sequencing and de novo assembly of a near complete indica rice genome. 
Nature communications, 2017. 8(1): p. 1-12. 
322. Reyes-Chin-Wo, S., et al., Genome assembly with in vitro proximity ligation data and 
whole-genome triplication in lettuce. Nature Communications, 2017. 8(1): p. 1-11. 
323. Bredeson, J.V., et al., Sequencing wild and cultivated cassava and related species reveals 
extensive interspecific hybridization and genetic diversity. Nature biotechnology, 2016. 
34(5): p. 562-570. 
324. Pootakham, W., et al., De novo hybrid assembly of the rubber tree genome reveals 
evidence of paleotetraploidy in Hevea species. Scientific reports, 2017. 7: p. 41457. 
325. Sato, S., et al., Genome structure of the legume, Lotus japonicus. DNA research, 2008. 
15(4): p. 227-239. 
326. Schmutz, J., et al., Genome sequence of the palaeopolyploid soybean. Nature, 2010. 
463(7278): p. 178-83. 
327. Young, N.D., et al., The Medicago genome provides insight into the evolution of 
rhizobial symbioses. Nature, 2011. 480(7378): p. 520-4. 
328. Varshney, R.K., et al., Draft genome sequence of chickpea (Cicer arietinum) provides a 
resource for trait improvement. Nat Biotechnol, 2013. 31(3): p. 240-6. 
329. Kang, Y.J., et al., Genome sequence of mungbean and insights into evolution within 
Vigna species. Nat Commun, 2014. 5: p. 5443. 
330. Chen, X., et al., Draft genome of the peanut A-genome progenitor (Arachis duranensis) 
provides insights into geocarpy, oil biosynthesis, and allergens. Proc Natl Acad Sci U S 
A, 2016. 113(24): p. 6785-90. 
331. Li, Y.H., et al., De novo assembly of soybean wild relatives for pan-genome analysis of 
diversity and agronomic traits. Nat Biotechnol, 2014. 32(10): p. 1045-52. 
332. Gan, X., et al., Multiple reference genomes and transcriptomes for Arabidopsis thaliana. 
Nature, 2011. 477(7365): p. 419-23. 
333. Schatz, M.C., et al., Whole genome de novo assemblies of three divergent strains of rice, 
Oryza sativa, document novel gene space of aus and indica. Genome biology, 2014. 
15(11): p. 1. 
334. Zhou, P., et al., Exploring structural variation and gene family architecture with De Novo 
assemblies of 15 Medicago genomes. BMC Genomics, 2017. 18(1): p. 261. 
335. Golicz, A.A., et al., The pangenome of an agronomically important crop plant Brassica 
oleracea. Nature communications, 2016. 7(1): p. 1-8. 
336. Tadege, M., et al., Large-scale insertional mutagenesis using the Tnt1 retrotransposon in 
the model legume Medicago truncatula. Plant J, 2008. 54(2): p. 335-47. 
337. Branca, A., et al., Whole-genome nucleotide diversity, recombination, and linkage 
disequilibrium in the model legume Medicago truncatula. Proceedings of the National 
Academy of Sciences, 2011. 108(42): p. E864-E870. 
338. Tadege, M., P. Ratet, and K.S. Mysore, Insertional mutagenesis: a Swiss Army knife for 
functional genomics of Medicago truncatula. Trends Plant Sci, 2005. 10(5): p. 229-35. 
339. Tang, H., et al., An improved genome release (version Mt4.0) for the model legume 
Medicago truncatula. BMC genomics, 2014. 15(1): p. 1. 
340. Cannon, S.B., et al., Legume genome evolution viewed through the Medicago truncatula 
and Lotus japonicus genomes. Proceedings of the National Academy of Sciences, 2006. 
103(40): p. 14959-14964. 
341. Blanc, G. and K.H. Wolfe, Widespread paleopolyploidy in model plant species inferred 
from age distributions of duplicate genes. Plant Cell, 2004. 16(7): p. 1667-78. 
342. Gnerre, S., et al., High-quality draft assemblies of mammalian genomes from massively 
parallel sequence data. Proceedings of the National Academy of Sciences, 2011. 108(4): 
p. 1513-1518. 
343. Kamphuis, L.G., et al., The Medicago truncatula reference accession A17 has an 
aberrant chromosomal configuration. New Phytol, 2007. 174(2): p. 299-303. 
344. Chin, C.-S., et al., Nonhybrid, finished microbial genome assemblies from long-read 
SMRT sequencing data. Nature methods, 2013. 10(6): p. 563-569. 
345. Koren, S., et al., Hybrid error correction and de novo assembly of single-molecule 
sequencing reads. Nature biotechnology, 2012. 30(7): p. 693-700. 
346. Koren, S. and A.M. Phillippy, One chromosome, one contig: complete microbial 
genomes from long-read sequencing and assembly. Current opinion in microbiology, 
2015. 23: p. 110-120. 
347. Ribeiro, F.J., et al., Finished bacterial genomes from shotgun sequence data. Genome 
research, 2012. 22(11): p. 2270-2277. 
348. English, A.C., et al., Mind the gap: upgrading genomes with Pacific Biosciences RS long-
read sequencing technology. PloS one, 2012. 7(11): p. e47768. 
349. Cao, H., et al., Rapid detection of structural variation in a human genome using 
nanochannel-based genome mapping technology. GigaScience, 2014. 3(34): p. 1-11. 
350. Stankova, H., et al., BioNano genome mapping of individual chromosomes supports 
physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol J, 
2016. 
351. Meyer, C.A. and X.S. Liu, Identifying and mitigating bias in next-generation sequencing 
methods for chromatin biology. Nat Rev Genet, 2014. 15(11): p. 709-21. 
352. Yaffe, E. and A. Tanay, Probabilistic modeling of Hi-C contact maps eliminates 
systematic biases to characterize global chromosomal architecture. Nat Genet, 2011. 
43(11): p. 1059-65. 
353. Santoferrara, L.F., et al., De novo transcriptomes of a mixotrophic and a heterotrophic 
ciliate from marine plankton. PLoS One, 2014. 9(7): p. e101418. 
354. Simpson, J.T., et al., ABySS: a parallel assembler for short read sequence data. Genome 
Res, 2009. 19(6): p. 1117-23. 
355. Huang, X. and A. Madan, CAP3: A DNA sequence assembly program. Genome 
Research, 1999. 9(9): p. 868-877. 
356. Iseli, C., C.V. Jongeneel, and P. Bucher, ESTScan: a program for detecting, evaluating, 
and reconstructing potential coding regions in EST sequences. ISMB-99 Proceedings, 
1999: p. 138-148. 
357. Wu, T.D. and C.K. Watanabe, GMAP: a genomic mapping and alignment program for 
mRNA and EST sequences. Bioinformatics, 2005. 21(9): p. 1859-75. 
358. Chaisson, M.J. and G. Tesler, Mapping single molecule sequencing reads using basic local 
alignment with successive refinement (BLASR): application and theory. BMC 
Bioinformatics, 2012. 13(328): p. 1-17. 
359. Grabherr, M.G., et al., Full-length transcriptome assembly from RNA-Seq data without a 
reference genome. Nature biotechnology, 2011. 29(7): p. 644-652. 
360. Stanke, M. and S. Waack, Gene prediction with a hidden Markov model and a new intron 
submodel. Bioinformatics, 2003. 19(Suppl 2): p. ii215-ii225. 
361. Kent, W.J., BLAT—the BLAST-like alignment tool. Genome research, 2002. 12(4): p. 
656-664. 
362. Morgulis, A., et al., A fast and symmetric DUST implementation to mask low-complexity 
DNA sequences. Journal of Computational Biology, 2006. 13(5): p. 1028-1040. 
363. Benson, G., Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids 
research, 1999. 27(2): p. 573. 
364. Finn, R.D., et al., The Pfam protein families database: towards a more sustainable future. 
Nucleic acids research, 2016. 44(D1): p. D279-D285. 
365. Sheehan, J., et al., A look back at the US Department of Energy’s Aquatic Species 
Program: biodiesel from algae. 1998, National Renewable Energy Laboratory 
(NREL/TP-580-24190). 
366. Chisti, Y., Biodiesel from microalgae. Biotechnol Adv, 2007. 25. 
367. Georgianna, D.R. and S.P. Mayfield, Exploiting diversity and synthetic biology for the 
production of algal biofuels. Nature, 2012. 488. 
368. Courchesne, N.M.D., et al., Enhancement of lipid production using biochemical, genetic 
and transcription factor engineering approaches. J Biotechnol, 2009. 141. 
369. Greenwell, H.C., et al., Placing microalgae on the biofuels priority list: a review of the 
technological challenges. J R Soc Interface, 2010. 7. 
370. Razeghifard, R., Algal biofuels. Photosynth Res, 2013. 117. 
371. Dismukes, G.C., et al., Aquatic phototrophs: efficient alternatives to land-based crops for 
biofuels. Curr Opin Biotechnol, 2008. 19. 
372. Hu, Q., et al., Microalgal triacylglycerols as feedstocks for biofuel production: 
perspectives and advances. Plant J, 2008. 54. 
373. Botte, P., et al., Combined exploitation of CO2 and nutrient replenishment for increasing 
biomass and lipid productivity of the marine diatoms Thalassiosira weissflogii and 
Cyclotella cryptica. Journal of Applied Phycology, 2017. 
374. Mus, F., et al., Physiological and molecular analysis of carbon source supplementation 
and pH stress-induced lipid accumulation in the marine diatom Phaeodactylum 
tricornutum. Applied Microbiology and Biotechnology, 2013. 
375. Chu, F., et al., Phosphorus plays an important role in enhancing biodiesel productivity of 
Chlorella vulgaris under nitrogen deficiency. Bioresour Technol, 2013. 134. 
376. Schnurr, P.J., G.S. Espie, and D.G. Allen, Algae biofilm growth and the potential to 
stimulate lipid accumulation through nutrient starvation. Bioresour Technol, 2013. 136. 
377. Atta, M., et al., Intensity of blue LED light: a potential stimulus for biomass and lipid 
content in fresh water microalgae Chlorella vulgaris. Bioresour Technol, 2013. 148. 
378. Whitman, W.B., D.C. Coleman, and W.J. Wiebe, Prokaryotes: the unseen majority. Proc 
Natl Acad Sci U S A, 1998. 95. 
379. Pate, R., G. Klise, and B. Wu, Resource demand implications for US algae biofuels 
production scale-up. Appl Energy, 2011. 88. 
380. Aslan, S. and I.K. Kapdan, Batch kinetics of nitrogen and phosphorus removal from 
synthetic wastewater by algae. Ecol Eng, 2006. 28. 
381. Hoffmann, J.P., Wastewater treatment with suspended and nonsuspended algae. J 
Phycol, 1998. 34. 
382. Mallick, N., Biotechnological potential of immobilized algae for wastewater N, P, and 
metal removal: a review. BioMetals, 2002. 15. 
383. Pittman, J.K., A.P. Dean, and O. Osundeko, The potential of sustainable algal biofuel 
production using wastewater resources. Bioresour Technol, 2011. 102. 
384. Gordon, J.M. and J.E.W. Polle, Ultrahigh bioproductivity from algae. Appl Microbiol 
Biotechnol, 2007. 76. 
385. Schuhmann, H., D.K.Y. Lim, and P.M. Schenk, Perspectives on metabolic engineering 
for increased lipid contents in microalgae. Biofuels, 2012. 3. 
386. Valenzuela, J., et al., Potential role of multiple carbon fixation pathways during lipid 
accumulation in Phaeodactylum tricornutum. Biotechnol Biofuels, 2012. 5. 
387. Odum, E.P., Trends expected in stressed ecosystems. Bioscience, 1985. 35. 
388. Ratha, S.K., et al., Exploring nutritional modes of cultivation for enhancing lipid 
accumulation in microalgae. J Basic Microbiol, 2013. 53. 
389. Khozin-Goldberg, I. and Z. Cohen, The effect of phosphate starvation on the lipid and 
fatty acid composition of the fresh water eustigmatophyte Monodus subterraneus. 
Phytochemistry, 2006. 67. 
390. Aury, J.M., et al., Global trends of whole-genome duplications revealed by the ciliate 
Paramecium tetraurelia. Nature, 2006. 444(7116): p. 171-8. 
391. Shekh, A.Y., et al., Stress-induced lipids are unsuitable as a direct biodiesel feedstock: a 
case study with Chlorella pyrenoidosa. Bioresour Technol, 2013. 138. 
392. Msanne, J., et al., Metabolic and gene expression changes triggered by nitrogen 
deprivation in the photoautotrophically grown microalgae Chlamydomonas reinhardtii 
and Coccomyxa sp. C-169. Phytochemistry, 2012. 75. 
393. Halsey, K.H., et al., A common partitioning strategy for photosynthetic products in 
evolutionarily distinct phytoplankton species. New Phytol, 2013. 198. 
394. Brown, A.P., A.R. Slabas, and J.B. Rafferty, Fatty acid biosynthesis in plants—metabolic 
pathways, structure and organization, in Lipids in photosynthesis. 2009, Springer. p. 11-
34. 
395. Wang, Z.T., et al., Algal lipid bodies: stress induction, purification, and biochemical 
characterization in wild-type and starchless Chlamydomonas reinhardtii. Eukaryotic 
Cell, 2009. 8. 
396. Valenzuela, J., et al., Nutrient re-supplementation arrests bio-oil accumulation in 
Phaeodactylum tricornutum. Appl Microbiol Biotechnol, 2013. 97. 
397. Breuer, G., et al., The impact of nitrogen starvation on the dynamics of triacylglycerol 
accumulation in nine microalgae strains. Bioresour Technol, 2012. 124. 
398. Wu, Y.H., Y. Yu, and H.Y. Hu, Potential biomass yield per phosphorus and lipid 
accumulation property of seven microalgal species. Bioresour Technol, 2013. 130: p. 
599-602. 
399. Li, Y., et al., Adhesion behavior of marine benthic diatom Nitzschia closterium 
MMDL533 on cationically modified phosphorylcholine copolymer films. Asia-Pacific 
Journal of Chemical Engineering, 2013. 
400. Ren, H.Y., et al., A new lipid-rich microalgae Scenedesmus strain R-16 isolated using 
Nile Red staining: effects of carbon and nitrogen sources and initial pH on the biomass 
and lipid production. Biotechnol Biofuels, 2013. 6. 
401. Kothari, R., et al., Production of biodiesel from microalgae Chlamydomonas 
polypyrenoideum grown on dairy industry wastewater. Bioresour Technol, 2013. 144. 
402. Eustance, E., et al., Growth, nitrogen utilization and biodiesel potential for two 
chlorophytes grown on ammonium, nitrate or urea. Journal of Applied Phycology, 2013: 
p. 1-15. 
403. Bertozzini, E., et al., Neutral lipid content and biomass production in Skeletonema 
marinoi (Bacillariophyceae) culture in response to nitrate limitation. Appl Biochem 
Biotechnol, 2013. 170(7): p. 1624-36. 
404. Yang, Y., et al., At high temperature lipid production in Ettlia oleoabundans occurs 
before nitrate depletion. Appl Microbiol Biotechnol, 2013. 97. 
405. Feng, P., et al., Lipid accumulation and growth characteristics of Chlorella zofingiensis 
under different nitrate and phosphate concentrations. J Biosci Bioeng, 2012. 114. 
406. Liang, K., et al., Effect of phosphorus on lipid accumulation in freshwater microalga 
Chlorella sp. J Appl Phycol, 2012. 25. 
407. Burrows, E.H., et al., Dynamics of Lipid Biosynthesis and Redistribution in the Marine 
Diatom Phaeodactylum tricornutum Under Nitrate Deprivation. BioEnergy Research, 
2012. 5(4): p. 876-885. 
408. Redfield, A., The biological control of chemical factors in the environment. Am Sci, 
1958. 46. 
409. Palmqvist, K., S. Sjöberg, and G. Samuelsson, Induction of inorganic carbon 
accumulation in the unicellular green algae Scenedesmus obliquus and Chlamydomonas 
reinhardtii. Plant Physiol, 1988. 87. 
410. Raven, J.A., Inorganic carbon acquisition by eukaryotic algae: four current questions. 
Photosynth Res, 2010. 106(1-2): p. 123-34. 
411. Spalding, M.H., Microalgal carbon-dioxide-concentrating mechanisms: Chlamydomonas 
inorganic carbon transporters. J Exp Bot, 2008. 59. 
412. Sharma, K.K., H. Schuhmann, and P.M. Schenk, High lipid induction in microalgae for 
biodiesel production. Energies, 2012. 5. 
413. Roessler, P.G., Effects of silicon deficiency on lipid composition and metabolism in the 
diatom Cyclotella cryptica. J Phycol, 1988. 24. 
414. Kroger, N. and N. Poulsen, Diatoms-from cell wall biogenesis to nanotechnology. Annu 
Rev Genet, 2008. 42: p. 83-107. 
415. Raven, J.A., The transport and function of silicon in plants. Biol Rev, 1983. 58. 
416. Burrows, E.H., et al., Dynamics of lipid biosynthesis and redistribution in the marine 
diatom Phaeodactylum tricornutum under nitrate deprivation. BioEnergy Res, 2012. 5. 
417. Smith, S.R., R.M. Abbriano, and M. Hildebrand, Comparative analysis of diatom 
genomes reveals substantial differences in the organization of carbon partitioning 
pathways. Algal Res, 2012. 1. 
418. Lombardi, A. and P. Wangersky, Influence of phosphorus and silicon on lipid class 
production by the marine diatom Chaetoceros gracilis grown in turbidostat cage 
cultures. Marine ecology progress series. Oldendorf, 1991. 77(1): p. 39-47. 
419. McGinnis, K., T. Dempster, and M. Sommerfeld, Characterization of the growth and 
lipid content of the diatom Chaetoceros muelleri. Journal of Applied Phycology, 1997. 
9(1): p. 19-24. 
420. Obata, T., A.R. Fernie, and A. Nunes-Nesi, The central carbon and energy metabolism of 
marine diatoms. Metabolites, 2013. 3(2): p. 325-46. 
421. Yu, E., et al., Triacylglycerol accumulation and profiling in the model diatoms 
Thalassiosira pseudonana and Phaeodactylum tricornutum (Baccilariophyceae) during 
starvation. Journal of Applied Phycology, 2009. 21(6): p. 669-681. 
422. Taguchi, S., J.A. Hirata, and E.A. Laws, Silicate deficiency and lipid synthesis of marine 
diatoms. J Phycol, 1987. 23. 
423. Darley, W.M. and B.E. Volcani, Role of silicon in diatom metabolism: A silicon 
requirement for deoxyribonucleic acid synthesis in the diatom Cylindrotheca fusiformis 
Reimann and Lewin. Experimental Cell Research, 1969. 58(2-3): p. 334-342. 
424. Buesseler, K.O., et al., The effects of iron fertilization on carbon sequestration in the 
southern ocean. Science, 2004. 304. 
425. Allen, A.E., et al., Whole-cell response of the pennate diatom Phaeodactylum 
tricornutum to iron starvation. Proc Natl Acad Sci U S A, 2008. 105(30): p. 10438-43. 
426. Morel, F.M.M., J.G. Rueter, and N.M. Price, Iron nutrition of phytoplankton and its 
possible importance in the ecology of ocean regions with high nutrient and low biomass. 
Oceanography, 1991. 4. 
427. Johnson, M.B. and Z. Wen, Development of an attached microalgal growth system for 
biofuel production. Appl Microbiol Biotechnol, 2010. 85. 
428. Christenson, L.B. and R.C. Sims, Rotating algal biofilm reactor and spool harvester for 
wastewater treatment with biofuels by-products. Biotechnol Bioeng, 2012. 109. 
429. Schnurr, P.J., G.S. Espie, and D.G. Allen, Algae biofilm growth and the potential to 
stimulate lipid accumulation through nutrient starvation. Bioresour Technol, 2013. 136: 
p. 337-44. 
430. Ozkan, A., et al., Reduction of water and energy requirement of algae cultivation using 
an algae biofilm photobioreactor. Bioresour Technol, 2012. 114. 
431. Patil, J.S. and A.C. Anil, Biofilm diatom community structure: influence of temporal and 
substratum variability. Biofouling, 2005. 21. 
432. Irving, T.E. and D.G. Allen, Species and material considerations in the formation and 
development of microalgal biofilms. Appl Microbiol Biotechnol, 2011. 92. 
433. Avendaño-Herrera, R.E. and C.E. Riquelme, Production of a diatom-bacteria biofilm in a 
photobioreactor for aquaculture applications. Aquac Eng, 2007. 36. 
434. Tilman, D., Tests of resource competition theory using 4 species of Lake Michigan algae. 
Ecology, 1981. 62. 
435. Downing, A.L. and M.A. Leibold, Ecosystem consequences of species richness and 
composition in pond food webs. Nature, 2002. 416. 
436. Striebel, M., S. Behl, and H. Stibor, The coupling of biodiversity and productivity in 
phytoplankton communities: consequences for biomass stoichiometry. Ecology, 2009. 90. 
437. Interlandi, S.J. and S.S. Kilham, Limiting resources and the regulation of diversity in 
phytoplankton communities. Ecology, 2001. 82. 
438. Stockenreiter, M., et al., The effect of species diversity on lipid production by microalgal 
communities. J Appl Phycol, 2012. 24. 
439. Schindler, D.W., et al., The cultural eutrophication of Lac la Biche, Alberta, Canada: a 
paleoecological study. Can J Fish Aquat Sci, 2008. 65. 
440. Schindler, D.W., Evolution of phosphorus limitation in lakes. Science, 1977. 195. 
441. Smith, V.H. and S.J. Bennett, Nitrogen:phosphorus supply ratios and phytoplankton 
community structure in lakes. Arch Hydrobiol, 1999. 146. 
442. Stockenreiter, M., et al., Functional group richness: implications of biodiversity for light 
use and lipid yield in microalgae. J Phycol, 2013. 49. 
443. Tilman, D., Resource competition between planktonic algae—experimental and 
theoretical approach. Ecology, 1977. 58. 
444. Murray, A.G., Phytoplankton exudation—exploitation of the microbial loop as a defense 
against algal viruses. J Plankton Res, 1995. 17. 
445. Rhodes, C.J. and A.P. Martin, The influence of viral infection on a plankton ecosystem 
undergoing nutrient enrichment. J Theor Biol, 2010. 265. 
446. Smith, S., Organic contaminants in sewage sludge (biosolids) and their significance for 
agricultural recycling. Philosophical Transactions of the Royal Society A: Mathematical, 
Physical and Engineering Sciences, 2009. 367(1904): p. 4005-4041. 
447. Singh, N.K. and D.W. Dhar, Microalgae as second generation biofuel. Agron Sustain 
Dev, 2011. 31. 
448. Naumann, T., et al., Growing microalgae as aquaculture feeds on twin-layers: a novel 
solid-state photobioreactor. J Appl Phycol, 2013. 25. 
449. Huesemann, M.H., et al., A screening model to predict microalgae biomass growth in 
photobioreactors and raceway ponds. Biotechnol Bioeng, 2013. 110. 
450. Sander, K. and G.S. Murthy, Life cycle analysis of algae biodiesel. Int J Life Cycle 
Assess, 2010. 15. 
451. Clarens, A.F., et al., Environmental life cycle comparison of algae to other bioenergy 
feedstocks. Environ Sci Technol, 2011. 44. 
452. Zaimes, G.G. and V. Khanna, Environmental sustainability of emerging algal biofuels: a 
comparative life cycle evaluation of algal biodiesel and renewable diesel. Environ Prog 
Sustain Energy, 2013. 32. 
453. Klöpffer, W., Life cycle assessment. Environ Sci Pollution Res, 1997. 4. 
454. Brennan, L. and P. Owende, Biofuels from microalgae—a review of technologies for 
production, processing, and extractions of biofuels and co-products. Renew Sust Energ 
Rev, 2010. 14. 
455. Lardon, L., et al., Life-cycle assessment of biodiesel production from microalgae. 
Environ Sci Technol, 2009. 43. 
456. Kirrolia, A., N.R. Bishnoi, and R. Singh, Microalgae as a boon for sustainable energy 
production and its future research and development aspects. Renew Sustain Energy Rev, 
2013. 20. 
457. Davis, R., A. Aden, and P.T. Pienkos, Techno-economic analysis of autotrophic 
microalgae for fuel production. Appl Energy, 2011. 88. 
458. Quinn, J.C., et al., Microalgae to biofuels lifecycle assessment—multiple pathway 
evaluation. Bioenergy Res, 2013. 6. 
459. Chowdhury, R., S. Viamajala, and R. Gerlach, Reduction of environmental and energy 
footprint of microalgal biodiesel production through material and energy integration. 
Bioresour Technol, 2012. 108: p. 102-11. 
460. Torres, C.M., et al., Microalgae-based biodiesel: a multicriteria analysis of the 
production process using realistic scenarios. Bioresour Technol, 2013. 147. 
461. Ríos, S.D., et al., Microalgae-based biodiesel: economic analysis of downstream process 
realistic scenarios. Bioresour Technol, 2013. 136. 
462. Rawat, I., et al., Biodiesel from microalgae: a critical evaluation from laboratory to 
large-scale production. Appl Energy, 2013. 103. 
463. Nagarajan, S., et al., An updated comprehensive techno-economic analysis of algae 
biodiesel. Bioresour Technol, 2013. 145. 
464. Liu, X., et al., Pilot-scale data provide enhanced estimates of the life cycle energy and 
emissions profile of algae biofuels produced via hydrothermal liquefaction. Bioresour 
Technol, 2013. 148. 
465. Frank, E.D., et al., Life cycle comparison of hydrothermal liquefaction and lipid 
extraction pathways to renewable diesel from algae. Mitig Adapt Strateg Glob Chang, 
2013. 18. 
466. Elliott, D.C., G.G. Neuenschwander, and T.R. Hart, Hydroprocessing bio-oil and 
products separation for coke production. Acs Sustainable Chemistry & Engineering, 
2013. 1(4): p. 389-392. 
467. López Barreiro, D.L., et al., Hydrothermal liquefaction (HTL) of microalgae for biofuel 
production: state of the art review and future prospects. Biomass Bioenergy, 2013. 53. 
468. Collet, P., et al., Biodiesel from microalgae–Life cycle assessment and recommendations 
for potential improvements. Renewable Energy, 2014. 71: p. 525-533. 
469. Christenson, L. and R. Sims, Production and harvesting of microalgae for wastewater 
treatment, biofuels, and bioproducts. Biotechnology advances, 2011. 29(6): p. 686-702. 
470. Ördög, V., et al., Screening microalgae for some potentially useful agricultural and 
pharmaceutical secondary metabolites. Journal of applied phycology, 2004. 16(4): p. 
309-314. 
471. Pokoo-Aikins, G., et al., Design and analysis of biodiesel production from algae grown 
through carbon sequestration. Clean Technologies and Environmental Policy, 2010. 
12(3): p. 239-254. 
472. Hall-Stoodley, L., J.W. Costerton, and P. Stoodley, Bacterial biofilms: from the natural 
environment to infectious diseases. Nature reviews microbiology, 2004. 2(2): p. 95-108. 
473. Stewart, P.S. and M.J. Franklin, Physiological heterogeneity in biofilms. Nat Rev 
Microbiol, 2008. 6(3): p. 199-210. 
474. Kühl, M., et al., Microenvironmental control of photosynthesis and photosynthesis‐
coupled respiration in an epilithic cyanobacterial biofilm. Journal of Phycology, 1996. 
32(5): p. 799-812. 
475. Falkowski, P.G. and J.A. Raven, Aquatic photosynthesis. 1997, Blackwell Science: 
Malden, MA. 
476. Boelee, N.C., et al., Scenario analysis of nutrient removal from municipal wastewater by 
microalgal biofilms. Water, 2012. 4(2): p. 460-473. 
477. Revsbech, N.P., An oxygen microsensor with a guard cathode. Limnology and 
Oceanography, 1989. 34(2): p. 474-478. 
478. Stewart, P.S., A review of experimental measurements of effective diffusive permeabilities 
and effective diffusion coefficients in biofilms. Biotechnology and Bioengineering, 1998. 
59(3): p. 261-272. 
479. Bernstein, H.C., et al., In situ analysis of oxygen consumption and diffusive transport in 
high‐temperature acidic iron‐oxide microbial mats. Environmental microbiology, 2013. 
15(8): p. 2360-2370. 
480. Glud, R.N., N.B. Ramsing, and N.P. Revsbech, Photosynthesis and photosynthesis‐
coupled respiration in natural biofilms quantified with oxygen microsensors. Journal of 
Phycology, 1992. 28(1): p. 51-60. 
481. Eustance, E., et al., Growth, nitrogen utilization, and biodiesel potential for two 
chlorophytes grown on ammonium, nitrate, or urea. J Appl Phycol, 2013. 25. 
482. Jørgensen, B.B. and D.J. Des Marais, The diffusive boundary layer of sediments: oxygen 
microgradients over a microbial mat. Limnology and Oceanography, 1990. 35(6): p. 
1343-1355. 
483. Kliphuis, A.M., et al., Effect of O2: CO2 ratio on the primary metabolism of 
Chlamydomonas reinhardtii. Biotechnology and bioengineering, 2011. 108(10): p. 2390-
2402. 
484. Jensen, J. and N.P. Revsbech, Photosynthesis and respiration of a diatom biofilm 
cultured in a new gradient growth chamber. FEMS Microbiology Ecology, 1989. 5(1): p. 
29-38. 
485. Converti, A., et al., Effect of temperature and nitrogen concentration on the growth and 
lipid content of Nannochloropsis oculata and Chlorella vulgaris for biodiesel production. 
Chemical Engineering and Processing: Process Intensification, 2009. 48(6): p. 1146-
1151. 
486. Stephenson, A.L., et al., Influence of nitrogen-limitation regime on the production by 
Chlorella vulgaris of lipids for biodiesel feedstocks. Biofuels, 2010. 1(1): p. 47-58. 
487. Bernstein, H.C. and R.P. Carlson, Microbial Consortia Engineering for Cellular 
Factories: in vitro to in silico systems. Comput Struct Biotechnol J, 2012. 3: p. 
e201210017. 
488. Ellis, J.T., et al., Acetone, butanol, and ethanol production from wastewater algae. 
Bioresource technology, 2012. 111: p. 491-495. 
489. Gardner, R., et al., Medium pH and nitrate concentration effects on accumulation of 
triacylglycerol in two members of the chlorophyta. J Appl Phycol, 2011. 23. 
490. Gardner, R., et al., Use of sodium bicarbonate to stimulate triacylglycerol accumulation 
in the chlorophyte Scenedesmus sp. and the diatom Phaeodactylum tricornutum. J Appl 
Phycol, 2012. 24. 
491. Cai, T., S.Y. Park, and Y. Li, Nutrient recovery from wastewater streams by microalgae: 
status and prospects. Renewable and Sustainable Energy Reviews, 2013. 19: p. 360-369. 
492. Sturm, B.S. and S.L. Lamer, An energy evaluation of coupling nutrient removal from 
wastewater with algal biomass production. Applied Energy, 2011. 88(10): p. 3499-3506. 
493. Hoffmann, J.P., Wastewater treatment with suspended and nonsuspended algae. Journal 
of phycology, 1998. 34(5): p. 757-763. 
494. Gross, M., The mysteries of the diatoms. Curr Biol, 2012. 22(15): p. R581-5. 
495. Kesaano, M., et al., Dissolved inorganic carbon enhanced growth, nutrient uptake, and 
lipid accumulation in wastewater grown microalgal biofilms. Bioresour Technol, 2015. 
180C: p. 7-15. 
496. Pizarro, C., et al., An economic assessment of algal turf scrubber technology for 
treatment of dairy manure effluent. Ecological Engineering, 2006. 26(4): p. 321-327. 
497. Su, C.-H., et al., Factors affecting lipid accumulation by Nannochloropsis oculata in a 
two-stage cultivation process. Journal of Applied Phycology, 2011. 23(5): p. 903-908. 
498. Devi, M.P., G.V. Subhash, and S.V. Mohan, Heterotrophic cultivation of mixed 
microalgae for lipid accumulation and wastewater treatment during sequential growth 
and starvation phases: effect of nutrient supplementation. Renewable energy, 2012. 43: 
p. 276-283. 
499. Rodolfi, L., et al., Microalgae for oil: strain selection, induction of lipid synthesis and 
outdoor mass cultivation in a low-cost photobioreactor. Biotechnol Bioeng, 2009. 
102(1): p. 100-12. 
500. Peng, X., et al., Triacylglycerol accumulation of Phaeodactylum tricornutum with 
different supply of inorganic carbon. Journal of applied phycology, 2014. 26(1): p. 131-
139. 
501. White, D., et al., The effect of sodium bicarbonate supplementation on growth and 
biochemical composition of marine microalgae cultures. Journal of Applied Phycology, 
2013. 25(1): p. 153-165. 
502. Chi, Z., et al., Bicarbonate-based integrated carbon capture and algae production system 
with alkalihalophilic cyanobacterium. Bioresource technology, 2013. 133: p. 513-521. 
503. Metcalf & Eddy, Inc., Wastewater Engineering: Treatment and Reuse. 4th ed. 2003, New 
York: McGraw Hill. 
504. Rhine, E., et al., Improving the Berthelot reaction for determining ammonium in soil 
extracts and water. Soil Science Society of America Journal, 1998. 62(2): p. 473-480. 
505. Crofcheck, C.L., et al. Influence of media composition on the growth rate of Chlorella 
vulgaris and Scenedesmus acutus utilized for CO2 mitigation. in 2012 Dallas, Texas, July 
29-August 1, 2012. 2012. American Society of Agricultural and Biological Engineers. 
506. Glud, R.N., Oxygen dynamics of marine sediments. Marine Biology Research, 2008. 4(4): 
p. 243-289. 
507. Wieland, A. and M. Kühl, Irradiance and temperature regulation of oxygenic 
photosynthesis and O2 consumption in a hypersaline cyanobacterial mat (Solar Lake, 
Egypt). Marine Biology, 2000. 137(1): p. 71-85. 
508. Mus, F., et al., Physiological and molecular analysis of carbon source supplementation 
and pH stress-induced lipid accumulation in the marine diatom Phaeodactylum 
tricornutum. Appl Microbiol Biotechnol, 2013. 97. 
 
  
 
 
 
 
 
 
 
APPENDICES 
  
 
 
 
 
 
APPENDIX A 
 
CHARACTERIZATION OF NINE NOVEL GREEN ALGAE STRAINS FROM 
YELLOWSTONE NATIONAL PARK 
 
Supplementary Data 
 
 
  
 
 
Figure A.1 Light microscopy images of the nine YNP green algae isolates. (A) PGV-6 (B) 
PGV8-G1 (C) PGV8-G2 (D) PGV10-G1 (E) PGV10-G2 (F) WC2b (G) WC-5A (H) MF1 and (I) 
WC-1. 
 
 
  
Table A.1 The optimal Nile Red exposure stain times and stain methods for each of the 11 green 
algae strains. Each strain was exposed to the lipophilic stain, Nile Red, in 20% DMSO and 
acetone until an optimal stain time was indicated. The stain method that resulted in higher 
fluorescence was selected as the proper stain method for each strain because that carrier (DMSO 
or acetone) was able to cross the cell membrane more effectively.1    
Strain      Stain Time   Stain Method
MF1         60 min       20% DMSO
PGV-6       60 min       20% DMSO
PGV8-G1     4 min        20% DMSO
PGV8-G2     6 min        20% DMSO
PGV10-G1    60 min       20% DMSO
PGV10-G2    40 min       20% DMSO
WC-1        60 min       Acetone
WC-2B       10 min       Acetone
WC-5A       4 min        20% DMSO
PC-3A       30 min       Acetone
UTEX-395    10 min       20% DMSO
 
Table A.2 The endpoint DCW and doubling times in the air-only and sodium bicarbonate 
added conditions for each of the 11 strains. Each condition was grown in triplicate. 
            Dry Cell Weight (g/L)           Doubling time (h)
            air only        air + HCO₃⁻     air only        air + HCO₃⁻
Strain      Ave    Std dev  Ave    Std dev  Ave    Std dev  Ave    Std dev
MF1         1.00   0.06     1.22   0.11     24.79  0.91     25.79  3.15
PC3         1.08   0.22     1.50   0.12     27.45  4.20     26.44  1.56
PGV10-G1    1.02   0.08     0.69   0.09     24.33  0.32     26.23  1.36
PGV10-G2    0.95   0.13     0.95   0.19     21.37  1.44     24.15  1.92
PGV6        1.06   0.26     1.08   0.06     30.40  2.51     29.76  2.44
PGV8-G1     0.56   0.10     0.65   0.06     24.63  0.18     20.39  0.19
PGV8-G2     0.73   0.12     0.54   0.01     17.36  5.64     22.62  7.77
UTEX 395    0.83   0.02     1.03   0.12     25.08  2.14     27.44  1.78
WC-1        1.02   0.06     1.18   0.14     19.80  6.35     16.07  1.28
WC-2b       1.02   0.14     1.44   0.08     18.13  0.63     15.00  0.43
WC-5        0.86   0.05     0.91   0.17     15.98  1.31     20.97  1.41
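
As a hedged illustration of how summary values like those in Table A.2 can be obtained from triplicate cultures, the sketch below computes a mean and sample standard deviation, and a doubling time under the assumption of exponential growth between two biomass readings. The text does not state how the doubling times were calculated, so the formula (ln 2 divided by the specific growth rate), the function names, and the replicate values are all assumptions made for illustration only.

```python
import math
import statistics


def summarize(replicates):
    """Return (mean, sample standard deviation) of replicate measurements."""
    return statistics.mean(replicates), statistics.stdev(replicates)


def doubling_time(x0, x1, hours):
    """Doubling time (h) between two biomass readings taken `hours` apart.

    Assumes exponential growth over the interval (an assumption, not a
    method stated in the text).
    """
    mu = math.log(x1 / x0) / hours  # specific growth rate, 1/h
    return math.log(2) / mu


# Hypothetical triplicate endpoint dry cell weights (g/L), illustration only.
dcw = [0.94, 1.00, 1.06]
mean_dcw, sd_dcw = summarize(dcw)
print(f"DCW: {mean_dcw:.2f} ± {sd_dcw:.2f} g/L")
```

The same `summarize` helper could be applied to per-replicate doubling times to fill the Ave and Std dev columns.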
 
 
  
Table A.3 Final DCWs and doubling times for each green algae strain for the control and 
sodium bicarbonate addition conditions. The DCWs were the average and 95% 
confidence interval of each triplicate at the time of harvest for each experiment. 
 
Strain       50 mM NaHCO₃   Cell Dry Weight [g·L⁻¹]
WC-1         Control        1.02 ± 0.06
             Bicarbonate    1.18 ± 0.14
WC-2b        Control        1.02 ± 0.14
             Bicarbonate    1.44 ± 0.08
WC-5         Control        0.86 ± 0.05
             Bicarbonate    0.91 ± 0.17
PGV-6        Control        1.06 ± 0.26
             Bicarbonate    1.08 ± 0.06
PGV-8 G1     Control        0.56 ± 0.10
             Bicarbonate    0.65 ± 0.06
PGV-8 G2     Control        0.73 ± 0.12
             Bicarbonate    0.54 ± 0.01
PGV-10 G1    Control        1.02 ± 0.08
             Bicarbonate    0.69 ± 0.09
PGV-10 G2    Control        0.95 ± 0.13
             Bicarbonate    0.95 ± 0.18
MF1          Control        1.00 ± 0.06
             Bicarbonate    1.22 ± 0.11
PC-3a        Control        1.08 ± 0.22
             Bicarbonate    1.50 ± 0.12
UTEX 395     Control        0.83 ± 0.02
             Bicarbonate    1.02 ± 0.12
 
 
 
 
 
 
 
 
 
APPENDIX B 
 
RGD-1 GENOME SUPPLEMENTARY DATA 
  
High Molecular Weight DNA Extraction 
Table 4 DNA Extraction (JGI Method).2 The 1.5, 30 and 60 mL headers refer to the 
container volumes recommended for the DNA extraction volumes. Fifty milliliters were 
centrifuged and adjusted to ~OD600 1.0 as indicated in step 6. To improve cell wall 
breakage, mechanical stress was applied with sterile sand, a mortar and pestle, and liquid 
nitrogen. Rather than using isopropanol, the DNA was suspended in molecular grade 
ethanol in the -20°C freezer overnight to improve DNA precipitation. 
 1.5ml 30ml 60ml 
1. Grow cells (see above) in broth and pellet at 10,000 rpm for 5 min or scrape from plate. 
2. Transfer bacterial suspension to the appropriate centrifuge tube. 
3. Spin down cells in microfuge or centrifuge at 10,000 rpm for 5 min. 
4. Discard the supernatant. 
5. Resuspend cells in TE.         
6. Adjust to OD600 @ 1.0 with TE buffer (10 mM Tris; 1 mM EDTA, pH 8.0). 
7. Transfer given amount of cell suspension to a clean centrifuge tube. -------    740µl 14.8ml 29.6ml 
8. Add lysozyme (conc. 100mg/ml).  Mix well.     ------------------------------------- 20µl 400µl 800µl 
 This step is necessary for hard to lyse gram (+) and some gram (–) bacteria.  
9. Incubate for 5 min. at room temperature. 
10. Add 10% SDS. Mix well.          --------------------------------------------------------- 40µl 800µl 1.6ml 
11. Add Proteinase K (10mg/ml). Mix well.        ---------------------------------------- 8µl 160µl 320µl 
12. Incubate for 1 hr at 37°C. 
13. Add 5 M NaCl. Mix well.        ----------------------------------------------------------- 100µl 2ml 4ml 
14. Add CTAB/NaCl (heated to 65°C). Mix well.         -------------------------------- 100µl 2ml 4ml 
15. Incubate at 65°C for 10 min. 
16. Add chloroform:isoamyl alcohol (24:1). Mix well.       ---------------------------- 0.5ml 10ml 20ml 
17. Spin at max speed for 10 min at room temperature. 
18. Transfer aqueous phase to clean eppendorf (should not be viscous). 
19. Add phenol:chloroform:isoamyl alcohol (25:24:1). Mix well.      --------------- 0.5ml 10ml 20ml 
20. Spin at max speed for 10 min at room temperature. 
21. Transfer aqueous phase and add 0.6 vol isopropanol (-20°C). 
 (e.g. if 400 µl of aqueous phase is transferred, add 240 µl of isopropanol.             ---- Add 0.6 of vol. ---- 
22. Incubate at room temp for 30 min. 
23. Spin at max speed for 15 min. 
24. Wash pellet with 70% ethanol, spin at max speed for 5 min. 
25. Discard the supernatant and and let pellet dry for 5 – 10 min at room temp. 
26. Resuspend in TE plus RNAse (99 µl TE + 1 µl RNAse (10 mg/ml)). -------- 20µl 400µl 800µl 
27. Transfer to sterile microcentrifuge tubes. 
28. Incubate at 37°C for 20 min. 
29. Run 1 µl in a 1% agarose gel with concentration standards.  
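
The reagent volumes in the protocol scale linearly: the 30 ml and 60 ml columns are 20 and 40 times the 1.5 ml column, and the alcohol added in step 21 is 0.6 of the recovered aqueous volume. A minimal Python sketch of that arithmetic is given here for convenience. The dictionary and function names are hypothetical and are not part of the JGI protocol.

```python
# Volumes (in µl) for the 1.5 ml scale, copied from the protocol table.
BASE_VOLUMES_UL = {
    "cell suspension": 740,
    "lysozyme (100 mg/ml)": 20,
    "10% SDS": 40,
    "proteinase K (10 mg/ml)": 8,
    "5 M NaCl": 100,
    "CTAB/NaCl": 100,
    "chloroform:isoamyl alcohol (24:1)": 500,
    "phenol:chloroform:isoamyl alcohol (25:24:1)": 500,
    "TE plus RNase": 20,
}

# The 30 ml and 60 ml columns of the table are 20x and 40x the 1.5 ml column.
SCALE_FACTORS = {"1.5 ml": 1, "30 ml": 20, "60 ml": 40}


def reagent_volumes_ul(scale):
    """Return the reagent volumes (µl) for one of the three container scales."""
    factor = SCALE_FACTORS[scale]
    return {name: volume * factor for name, volume in BASE_VOLUMES_UL.items()}


def precipitation_volume_ul(aqueous_ul, fraction=0.6):
    """Volume of alcohol (µl) to add for a given recovered aqueous volume."""
    return aqueous_ul * fraction
```

For example, `reagent_volumes_ul("60 ml")["5 M NaCl"]` gives 4000 µl (4 ml), which matches step 13, and `precipitation_volume_ul(400)` reproduces the 240 µl example in step 21.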
BioNano and Assembly Data 
Table B.1 Assembly statistics for BioNano data. RGd-1 biomass was submitted to the
Bioinformatics Center at Kansas State for high molecular weight (HMW) DNA extraction and
whole genome map assembly. The HMW DNA was digested with the endonuclease DNase I to
introduce nicks and create a 3’ hydroxyl group. DNA polymerase I catalyzed the addition of
fluorescently labeled Alexa 546 dUTP nucleotides at the 3’ hydroxyl group. 5’ to 3’
exonuclease activity removed the nucleotides from the 5’ phosphoryl terminus of the nick.
The labeled and unlabeled nucleotides replaced the excised nucleotides in the original DNA
strand. The fluorescently labeled DNA was visualized using the intercalating dye YOYO-1.
The labeled DNA was added to an IrysChip flow cell, linearized with an electrophoretic
current and imaged.
 
Table B.2 Pfam proteins. Seven Pfam proteins were found in common among the 18 algal
genomes that were used for the concatenated protein tree using the ezTree pipeline.
Pfam ID 
PF08149.10 BING4CT BING4CT 
PF12295.7 Symplekin_C Symplekin 
PF03332.12 PMM Eukaryotic 
PF04034.12 Ribo_biogen_C Ribosome 
PF00692.18 dUTPase dUTPase 
PF06862.11 UTP25 Utp25 
PF03568.16 Peptidase_C50 Peptidase 
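
Finding the Pfam entries shared by every genome amounts to a set intersection over the per-genome Pfam lists. The sketch below illustrates that idea only; it is not the ezTree implementation. The function name is hypothetical, and the genome labels and their Pfam assignments in the usage note are invented for illustration.

```python
def shared_pfams(pfams_by_genome):
    """Return the Pfam accessions present in every genome.

    pfams_by_genome maps a genome label to an iterable of Pfam accessions.
    """
    pfam_sets = [set(accessions) for accessions in pfams_by_genome.values()]
    if not pfam_sets:
        return set()
    # Keep only accessions that occur in all of the per-genome sets.
    return set.intersection(*pfam_sets)
```

With a toy input such as `{"genome_a": ["PF08149.10", "PF03332.12", "PF12295.7"], "genome_b": ["PF08149.10", "PF03332.12"]}`, the function returns the two accessions common to both genomes.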
 
 
 
 
Transcript Raw Read Data 
 
Figure B.1 Each sample was analyzed using FastQC and compiled within MultiQC for their 
unique and duplicate sequence counts. There were a total of nine samples, three culture 
conditions and three replicates for each condition. The forward (R1) and reverse (R2) reads were 
analyzed for each sample. Samples A1-A3 had the largest number of unique reads among the
sequenced samples and C1-C3 had the fewest unique reads.
 
 
 
 
 
 
Sample A1-R1 
 
Figure B.2 This figure indicates the presence of adapter sequence contamination in sample A1-
R1. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
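
The adapter content plot can be approximated by locating the first occurrence of the adapter in each read and accumulating those positions across the read length. The following Python sketch shows the idea and a simple 3’ trim. The adapter prefix `AGATCGGAAGAGC` is an assumption (the commonly used start of Illumina adapters) and is not taken from this work, and the function names are hypothetical. Positions in the code are 0-based, whereas the figure uses 1-based positions.

```python
# Assumed adapter prefix; not taken from the dissertation.
ADAPTER = "AGATCGGAAGAGC"


def adapter_start(read, adapter=ADAPTER):
    """0-based index where the adapter first appears, or None if absent."""
    index = read.find(adapter)
    return index if index != -1 else None


def trim_adapter(read, adapter=ADAPTER):
    """Remove the adapter and everything 3' of it."""
    index = read.find(adapter)
    return read if index == -1 else read[:index]


def cumulative_adapter_percent(reads, read_length, adapter=ADAPTER):
    """Percent of reads in which the adapter has started at or before each position."""
    first_seen = [0] * read_length
    for read in reads:
        index = read.find(adapter)
        if 0 <= index < read_length:
            first_seen[index] += 1
    total = len(reads)
    percents, running = [], 0
    for count in first_seen:
        running += count
        percents.append(100.0 * running / total if total else 0.0)
    return percents
```

A curve from `cumulative_adapter_percent` that begins to rise near position 78 would correspond to the pattern described in the caption.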
 
Figure B.3 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample A1-R1. Here, 23.67% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
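
FastQC summarizes duplication as the percentage of sequences that would remain after deduplication. A minimal sketch of that quantity, computed by exact counting over all reads, is shown below. This is a simplification: FastQC estimates duplication from a subset of the file rather than counting every read, so its values can differ. The function name is hypothetical.

```python
from collections import Counter


def percent_remaining_after_dedup(reads):
    """Percent of reads left if every distinct sequence were kept only once."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * len(counts) / total
```

For four reads of which two are identical, three distinct sequences remain, giving 75%.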
 
Figure B.4 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A1-R1. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.5 Quality scores across the positions of the 150 bp reads for sample A1-R1. The x- and
y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
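
The per-position quality summary can be sketched as follows: each FASTQ quality character is decoded to a Phred score, and the mean and selected percentiles are taken over all reads at each position. The sketch assumes Phred+33 encoding and uses a nearest-rank percentile, which may differ slightly from the interpolation FastQC applies. All function names are hypothetical.

```python
import math


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def per_position_summary(quality_strings):
    """Mean and 10/25/75/90th percentiles of quality at each read position."""
    columns = {}
    for quality_string in quality_strings:
        for position, score in enumerate(phred_scores(quality_string)):
            columns.setdefault(position, []).append(score)
    summary = []
    for position in sorted(columns):
        scores = sorted(columns[position])
        summary.append({
            "position": position + 1,  # 1-based, as on the plot's x-axis
            "mean": sum(scores) / len(scores),
            "p10": percentile(scores, 10),
            "p25": percentile(scores, 25),
            "p75": percentile(scores, 75),
            "p90": percentile(scores, 90),
        })
    return summary
```

In the plot, `mean` corresponds to the blue line, `p10` and `p90` to the whiskers, and `p25` and `p75` to the edges of the yellow box.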
 
Figure B.6 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A1-R1. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
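
The per-position base composition, including the N content shown in the preceding figure, can be computed by tallying the bases observed at each position. A minimal Python sketch is given below; the function name is hypothetical.

```python
from collections import Counter


def base_percent_by_position(reads, alphabet="ACGTN"):
    """Percent of each base in `alphabet` at every read position."""
    if not reads:
        return []
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for position, base in enumerate(read.upper()):
            counts[position][base] += 1
    result = []
    for counter in counts:
        total = sum(counter.values())
        # Counter returns 0 for bases that were never seen at this position.
        result.append({base: 100.0 * counter[base] / total for base in alphabet})
    return result
```

In an unbiased library the four nucleotide percentages would stay roughly constant from position to position, so deviations in the first few entries of the returned list reflect the 5’ bias described in the caption.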
 
Figure B.7 The per sequence GC content for sample A1-R1. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
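
The observed (red) distribution is a histogram of the GC percentage of each read. A short Python sketch of that calculation follows; the function names are hypothetical and the theoretical (blue) curve is not modeled here.

```python
from collections import Counter


def gc_percent(read):
    """Percent of G and C bases in a single read."""
    if not read:
        return 0.0
    sequence = read.upper()
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def gc_histogram(reads):
    """Number of reads at each GC percentage, rounded to a whole percent."""
    return Counter(round(gc_percent(read)) for read in reads)
```

A single organism is expected to give one roughly bell-shaped peak, so additional broad peaks in this histogram are what the later captions interpret as contamination.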
 
Figure B.8 The per sequence quality score for sample A1-R1. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
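
The statement that most reads have a mean quality above 38 can be checked directly from the quality strings. The sketch below assumes Phred+33 encoding; the function names and the default threshold argument are hypothetical.

```python
def mean_read_quality(quality_string, offset=33):
    """Mean Phred score of one read (Phred+33 encoding assumed)."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def fraction_above(quality_strings, threshold=38):
    """Fraction of reads whose mean quality exceeds the threshold."""
    if not quality_strings:
        return 0.0
    passing = sum(
        1 for quality in quality_strings
        if mean_read_quality(quality) > threshold
    )
    return passing / len(quality_strings)
```

A returned fraction greater than 0.5 would be consistent with the majority of reads having a mean quality above 38.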
 
Figure B.9 The distribution of the sequence lengths for sample A1-R1. The x- and y-axes refer to 
the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample A1-R2 
 
Figure B.10 This figure indicates the presence of adapter sequence contamination in sample A1-
R2. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.11 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample A1-R2. Here, 29.88% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.12 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A1-R2. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.13 Quality scores across the positions of the 150 bp reads for sample A1-R2. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
 
Figure B.14 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A1-R2. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.15 The per sequence GC content for sample A1-R2. The x- and y-axes represent the %GC
content per read and read counts, respectively. The blue line represents the GC content
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
 
 
Figure B.16 The per sequence quality score for sample A1-R2. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
 
Figure B.17 The distribution of the sequence lengths for sample A1-R2. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
Sample A2-R1 
 
Figure B.18 This figure indicates the presence of adapter sequence contamination in sample A2-
R1. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.19 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample A2-R1. Here, 26.72% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.20 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A2-R1. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.21 Quality scores across the positions of the 150 bp reads for sample A2-R1. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
 
Figure B.22 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A2-R1. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.23 The per sequence GC content for sample A2-R1. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
 
 
Figure B.24 The per sequence quality score for sample A2-R1. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
 
Figure B.25 The distribution of the sequence lengths for sample A2-R1. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
Sample A2-R2 
 
Figure B.26 This figure indicates the presence of adapter sequence contamination in sample A2-
R2. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.27 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample A2-R2. Here, 33.26% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
 
Figure B.28 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A2-R2. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.29 Quality scores across the positions of the 150 bp reads for sample A2-R2. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.30 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A2-R2. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.31 The per sequence GC content for sample A2-R2. The x- and y-axes represent the 
%GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
 
Figure B.32 The per sequence quality score for sample A2-R2. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
Figure B.33 The distribution of the sequence lengths for sample A2-R2. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample A3-R1 
 
Figure B.34 This figure indicates the presence of adapter sequence contamination in sample A3-
R1. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.35 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample A3-R1. Here, 19.32% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.36 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A3-R1. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.37 Quality scores across the positions of the 150 bp reads for sample A3-R1. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.38 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A3-R1. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.39 The per sequence GC content for sample A3-R1. The x- and y-axes represent the 
%GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
 
Figure B.40 The per sequence quality score for sample A3-R1. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
 
Figure B.41 The distribution of the sequence lengths for sample A3-R1. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample A3-R2 
 
Figure B.42 This figure indicates the presence of adapter sequence contamination in sample A3-
R2. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.43 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample A3-R2. Here, 23.64% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.44 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample A3-R2. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.45 Quality scores across the positions of the 150 bp reads for sample A3-R2. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.46 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample A3-R2. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.47 The per sequence GC content for sample A3-R2. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
 
Figure B.48 The per sequence quality score for sample A3-R2. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
Figure B.49 The distribution of the sequence lengths for sample A3-R2. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
  
Sample B1-R1 
 
Figure B.50 This figure indicates the presence of adapter sequence contamination in sample B1-
R1. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.51 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample B1-R1. Here, 29.46% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.52 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample B1-R1. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.53 Quality scores across the positions of the 150 bp reads for sample B1-R1. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.54 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample B1-R1. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.55 The per sequence GC content for sample B1-R1. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The two broad red peaks that shift away from the mean %GC of 47 indicate contamination.
 
Figure B.56 The per sequence quality score for sample B1-R1. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
Figure B.57 The distribution of the sequence lengths for sample B1-R1. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample B1-R2 
 
Figure B.58 This figure indicates the presence of adapter sequence contamination in sample B1-
R2. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.59 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample B1-R2. Here, 24.99% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.60 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample B1-R2. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.61 Quality scores across the positions of the 150 bp reads for sample B1-R2. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.62 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample B1-R2. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.63 The per sequence GC content for sample B1-R2. The x- and y-axes represent the 
%GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The two broad red peaks that shift away from the mean %GC of 47 indicate contamination.
 
Figure B.64 The per sequence quality score for sample B1-R2. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
Figure B.65 The distribution of the sequence lengths for sample B1-R2. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample B2-R1 
 
Figure B.66 This figure indicates the presence of adapter sequence contamination in sample B2-
R1. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.67 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample B2-R1. Here, 14.12% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
Figure B.68 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample B2-R1. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.69 Quality scores across the positions of the 150 bp reads for sample B2-R1. The x- and
y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.70 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample B2-R1. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.71 The per sequence GC content for sample B2-R1. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The two broad red peaks that shift away from the mean %GC of 47 indicate contamination.
 
 
Figure B.72 The per sequence quality score for sample B2-R1. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
Figure B.73 The distribution of the sequence lengths for sample B2-R1. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
 
Sample B2-R2 
 
Figure B.74 This figure indicates the presence of adapter sequence contamination in sample B2-
R2. The x- and y-axes represent the position within the 150 bp read and the percentage, 
respectively. The red line indicates the presence of the Illumina Universal Adapter starting at 
approximately bases 78-79 of 150, indicating that adapter trimming should target the
middle to 3’ end of the reads. 
 
Figure B.75 The blue line represents the percent of the total sequences and the red line indicates 
the deduplicated (unique) sequences for sample B2-R2. Here, 14.12% of the reads were 
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or highly expressed transcripts under the conditions that were tested. 
 
 
Figure B.76 The percentage of Ns and their positions across all bases in the 150 bp reads for 
sample B2-R2. The y-axis represents the percentage and the x-axis represents the position of the 
base in the read. The data presented here indicate that there are 0% Ns in the reads and that all
bases were called as A, T, C or G.
 
Figure B.77 Quality scores across the positions of the 150 bp reads for sample B2-R2. The x- 
and y-axes represent the position within the read and the quality score, respectively. The blue line
represents the average quality of the bases at each position. The whiskers mark the 10th and 90th
percentiles, and the yellow box spans the 25th to 75th percentiles of the quality scores.
 
Figure B.78 The percentage of each nucleotide, A, C, T and G and their positions across the 150 
bp reads for sample B2-R2. The x- and y-axes represent the position within the 150 bp reads and 
the percentage of each nucleotide. Ideally, each of the percent nucleotide lines should be parallel 
to each other. The erratic peaks at the beginning of the reads are due to random hexamer priming, which
imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the reads, it
enriches for k-mers at the 5’ end of the reads.3
 
Figure B.79 The per sequence GC content for sample B2-R2. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The two broad red peaks that shift away from the mean %GC of 47 indicate contamination.
 
Figure B.80 The per sequence quality score for sample B2-R2. The x- and y-axes represent the
mean quality (Phred Score) and the number of reads, respectively. The data here indicate that the 
majority of the quality scores were > 38. 
 
 
Figure B.81 The distribution of the sequence lengths for sample B2-R2. The x- and y-axes refer 
to the read sequence lengths and the number of reads with those lengths, respectively. The data here indicate
that all reads were 150 bp. 
 
 
Sample B3-R1 
 
Figure B.82 This figure indicates the presence of adapter sequence contamination in sample
B3-R1. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.
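A minimal sketch of 3’ adapter trimming, as implied by this caption, is given below. It uses exact string matching only; the adapter prefix shown is an assumption about the sequence reported as the Illumina Universal Adapter, and the function is not the trimming tool used in this work. Production trimmers also tolerate mismatches and quality-trim.

```python
ADAPTER = "AGATCGGAAGAG"  # assumed Illumina Universal Adapter prefix


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove the adapter and everything 3' of it (exact matching only)."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # No full adapter found: look for a partial adapter at the 3' end,
    # trying the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

For example, a read whose adapter begins near base 78 would be shortened to its first 77 or so bases by this function.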
 
Figure B.83 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample B3-R1. Here, 20.06% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
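The duplication summary in this caption can be illustrated with a short sketch that counts exact duplicate reads. This is an assumption-laden simplification: QC tools typically estimate duplication from a subset of reads, and the function names here are hypothetical.

```python
from collections import Counter


def percent_remaining_after_dedup(reads):
    """Percentage of reads left if every exact duplicate is collapsed to one copy."""
    counts = Counter(reads)
    return 100.0 * len(counts) / len(reads)


def duplication_levels(reads):
    """Map duplication level (copies per distinct sequence) to number of distinct sequences."""
    return dict(Counter(Counter(reads).values()))
```

Distinct sequences at high duplication levels are the candidates for either contaminants or highly expressed transcripts.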
 
Figure B.84 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample B3-R1. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.85 Quality scores across the positions of the 150 bp reads for sample B3-R1. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.86 The percentage of each nucleotide (A, C, T and G) at each position across the 150
bp reads for sample B3-R1. The x- and y-axes represent the position within the 150 bp reads and
the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should be
parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3

Figure B.87 The per sequence GC content for sample B3-R1. The x- and y-axes represent the
%GC content per read and read counts, respectively. The blue line represents the theoretical GC
content distribution and the red line represents the actual GC content for the 150 bp reads.
The two broad red peaks that shift away from the mean %GC = 47 indicate contamination.
 
Figure B.88 The per sequence quality score for sample B3-R1. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.89 The distribution of the sequence lengths for sample B3-R1. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
 
Sample B3-R2 
 
Figure B.90 This figure indicates the presence of adapter sequence contamination in sample
B3-R2. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.
 
Figure B.91 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample B3-R2. Here, 15.98% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.

Figure B.92 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample B3-R2. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.93 Quality scores across the positions of the 150 bp reads for sample B3-R2. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.

Figure B.94 The percentage of each nucleotide (A, C, T and G) at each position across the 150
bp reads for sample B3-R2. The x- and y-axes represent the position within the 150 bp reads and
the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should be
parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3

Figure B.95 The per sequence GC content for sample B3-R2. The x- and y-axes represent the
%GC content per read and read counts, respectively. The blue line represents the theoretical GC
content distribution and the red line represents the actual GC content for the 150 bp reads.
The two broad red peaks that shift away from the mean %GC = 47 indicate contamination.
 
Figure B.96 The per sequence quality score for sample B3-R2. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.97 The distribution of the sequence lengths for sample B3-R2. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
 
Sample C1-R1 
 
Figure B.98 This figure indicates the presence of adapter sequence contamination in sample
C1-R1. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.

Figure B.99 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample C1-R1. Here, 4.65% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
 
Figure B.100 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample C1-R1. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.101 Quality scores across the positions of the 150 bp reads for sample C1-R1. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.102 The percentage of each nucleotide (A, C, T and G) at each position across the
150 bp reads for sample C1-R1. The x- and y-axes represent the position within the 150 bp reads
and the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should
be parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3
 
Figure B.103 The per sequence GC content for sample C1-R1. The x- and y-axes represent the 
%GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The shift away from the mean %GC = 47 indicates bacterial contamination. 
 
Figure B.104 The per sequence quality score for sample C1-R1. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.105 The distribution of the sequence lengths for sample C1-R1. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
 
Sample C1-R2 
 
Figure B.106 This figure indicates the presence of adapter sequence contamination in sample
C1-R2. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.

Figure B.107 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample C1-R2. Here, 8.8% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
 
Figure B.108 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample C1-R2. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.109 Quality scores across the positions of the 150 bp reads for sample C1-R2. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.110 The percentage of each nucleotide (A, C, T and G) at each position across the
150 bp reads for sample C1-R2. The x- and y-axes represent the position within the 150 bp reads
and the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should
be parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3

Figure B.111 The per sequence GC content for sample C1-R2. The x- and y-axes represent the
%GC content per read and read counts, respectively. The blue line represents the theoretical GC
content distribution and the red line represents the actual GC content for the 150 bp reads.
The shift away from the mean %GC = 47 indicates bacterial contamination.
 
Figure B.112 The per sequence quality score for sample C1-R2. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.113 The distribution of the sequence lengths for sample C1-R2. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
Sample C2-R2 
 
Figure B.114 This figure indicates the presence of adapter sequence contamination in sample
C2-R2. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.

Figure B.115 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample C2-R2. Here, 7.08% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
 
Figure B.116 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample C2-R2. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.117 Quality scores across the positions of the 150 bp reads for sample C2-R2. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.118 The percentage of each nucleotide (A, C, T and G) at each position across the
150 bp reads for sample C2-R2. The x- and y-axes represent the position within the 150 bp reads
and the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should
be parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3
 
Figure B.119 The per sequence GC content for sample C2-R2. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The shift away from the mean %GC = 47 indicates bacterial contamination. The sharp peak may 
indicate overrepresented bacterial reads. 
 
Figure B.120 The per sequence quality score for sample C2-R2. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.121 The distribution of the sequence lengths for sample C2-R2. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
Sample C3-R1 
 
Figure B.122 This figure indicates the presence of adapter sequence contamination in sample
C3-R1. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.

Figure B.123 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample C3-R1. Here, 29.88% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
 
Figure B.124 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample C3-R1. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.125 Quality scores across the positions of the 150 bp reads for sample C3-R1. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.126 The percentage of each nucleotide (A, C, T and G) at each position across the
150 bp reads for sample C3-R1. The x- and y-axes represent the position within the 150 bp reads
and the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should
be parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3
 
Figure B.127 The per sequence GC content for sample C3-R1. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The shift away from the mean %GC = 47 indicates bacterial contamination. 
 
Figure B.128 The per sequence quality score for sample C3-R1. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.129 The distribution of the sequence lengths for sample C3-R1. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
 
Sample C3-R2 
 
Figure B.130 This figure indicates the presence of adapter sequence contamination in sample
C3-R2. The x- and y-axes represent the position within the 150 bp read and the percentage of
reads containing the adapter, respectively. The red line indicates the presence of the Illumina
Universal Adapter starting at approximately bases 78-79 of 150, indicating that adapter trimming
should target the middle to 3’ end of the reads.

Figure B.131 The blue line represents the percent of the total sequences and the red line indicates
the deduplicated (unique) sequences for sample C3-R2. Here, 8.93% of the reads were
deduplicated. The peaks in the blue line (the total sequences) indicate the presence of
contaminants or of transcripts that were highly expressed under the conditions tested.
 
Figure B.132 The percentage of Ns and their positions across all bases in the 150 bp reads for
sample C3-R2. The y-axis represents the percentage and the x-axis represents the position of the
base in the read. The data presented here indicate that the reads contain 0% Ns and that every
base call is A, T, C or G.

Figure B.133 Quality scores across the positions of the 150 bp reads for sample C3-R2. The x-
and y-axes represent the position within the read and the quality scores, respectively. The blue
line represents the average quality of the bases at each position. The error bars mark the 10th and
90th percentiles of the reads, and the yellow box spans the 25th to 75th percentiles.
 
Figure B.134 The percentage of each nucleotide (A, C, T and G) at each position across the
150 bp reads for sample C3-R2. The x- and y-axes represent the position within the 150 bp reads
and the percentage of each nucleotide, respectively. Ideally, the nucleotide percentage lines should
be parallel to each other. The erratic peaks at the beginning of the reads are due to random hexamer
priming, which imparts bias in RNA-seq libraries. While the bias does not affect the entirety of the
reads, it enriches for k-mers at the 5’ end of the reads.3
 
Figure B.135 The per sequence GC content for sample C3-R2. The x- and y-axes represent the % 
GC content per read and read counts, respectively. The blue line represents the GC content 
theoretical distribution and the red line represents the actual GC content for the 150 bp reads. 
The shift away from the mean %GC = 47 indicates bacterial contamination. The sharp peak may 
indicate overrepresented bacterial reads. 
 
Figure B.136 The per sequence quality score for sample C3-R2. The x- and y-axes represent the
mean quality (Phred score) and the number of reads, respectively. The data here indicate that the
majority of the quality scores were > 38.

Figure B.137 The distribution of the sequence lengths for sample C3-R2. The x- and y-axes refer
to the read sequence lengths and the number of reads with those lengths, respectively. The data
here indicate that all reads were 150 bp.
 
 
 
Metabolic Pathways 
 
Figure B.138 The glycolysis/gluconeogenesis metabolic pathway with genes present (green) in
the RGd-1 genome. The pathway was searched and populated using the online platform
KEGG Mapper.4
 
 
Figure B.139 The pyruvate metabolic pathway with genes present (green) in the RGd-1 genome.
The pathway was searched and populated using the online platform KEGG Mapper.4
 
Figure B.140 The fatty acid degradation metabolic pathway with genes present (green) in the
RGd-1 genome. The pathway was searched and populated using the online platform
KEGG Mapper.4
 
Figure B.141 The glycerolipid metabolic pathway with genes present (green) in the RGd-1
genome. The pathway was searched and populated using the online platform KEGG Mapper.4
 
Figure B.142 The α-linolenic acid metabolic pathway with genes present (green) in the RGd-1
genome. The pathway was searched and populated using the online platform KEGG Mapper.4
 
Figure B.143 The arachidonic acid metabolic pathway with genes present (green) in the RGd-1
genome. The pathway was searched and populated using the online platform KEGG Mapper.4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
APPENDIX C 
DETERMINING THE EFFECTS OF BLUE LIGHT ON THE RGD-1 GROWTH RATE 
  
Introduction 
 
Diatoms are responsible for approximately 40% of marine primary productivity and are 
known to assimilate 20% of global CO2.247 Diatoms fix atmospheric CO2 as a requirement for
growth, which on a large scale may help offset the increase in atmospheric CO2
concentrations. In recent years, there has been renewed interest in using high lipid-producing
algae strains for biodiesel production. As a near carbon-neutral technology, using algae for 
biofuel production may reduce further CO2 emissions and help meet transport fuel demands. 
Outdoor raceway ponds are currently the most cost-effective method to grow algae for large 
scale biofuel production.5, 56, 248 However, to lower production costs, it is important to identify 
factors that contribute to enhanced growth and lipid accumulation. 
Moll et al. (2014) found that diatom RGd-1 accumulated very high lipid concentrations:
30-40% (w/w) triacylglycerol (TAG) and 70-80% (w/w) biofuel potential (BP).49 Due to its
ability to produce such high concentrations of lipids, it was selected for whole-genome 
sequencing. Currently, there are only two other diatoms with sequenced, published and publicly 
available genomes. 
RGd-1 was observed to differentiate into multiple cell sizes and cell morphologies 
(Figure C.1). One hypothesis that may account for different cell phenotypes is a lack of blue light 
in laboratory light systems. A recent study discovered novel, diatom-specific genes of the blue
light cryptochrome/photolyase family (CPF) in the Phaeodactylum tricornutum genome, which
were found to act as transcriptional regulators of cyclin expression.249, 250 Overexpression of
PtCPF1 was found to activate expression of blue light-induced genes that are known to be
involved in the cell cycle and DNA repair.250, 251 This suggests that CPF1 may sense changes in
environmental light conditions and may play a role in signaling the completion of the cell cycle 
with sufficient blue light. Another newly described, stramenopile-specific group of blue
light-absorbing photoreceptors, the aureochromes, has also been identified.252, 253 In P.
tricornutum, AUREO1a is a transcription factor that regulates the diatom-specific cell cycle
protein dsCYC2, which controls the G1-S phase of the cell cycle.252, 253 Both AUREO1a and
CPF1 have been identified in the RGd-1 genome. At least 18 significant hits with e-values of
1e-5 or lower and at least 80% identity were found when aureochromes from P. tricornutum
were BLAST searched against the current RGd-1 genome assembly.

Figure C.1 RGd-1 cellular morphologies as imaged using field emission scanning electron
microscopy. The RGd-1 morphology in 2009 (left), and different cell morphologies in 2014
(middle and right).
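The BLAST thresholds stated in the text (e-values of 1e-5 or lower and at least 80% identity) can be applied as a simple filter. The sketch below assumes the hits were exported in the standard 12-column tabular format (BLAST+ `-outfmt 6`), which the text does not specify; the function name and any query or contig names passed to it are hypothetical.

```python
import csv
import io


def filter_blast_hits(tabular_text, max_evalue=1e-5, min_pident=80.0):
    """Keep hits from BLAST tabular output that pass e-value and %identity cutoffs.

    Assumed column order (outfmt 6): qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident = float(row[2])
        evalue = float(row[10])
        if evalue <= max_evalue and pident >= min_pident:
            hits.append((row[0], row[1], pident, evalue))
    return hits
```

Counting the returned tuples gives the number of significant hits under the chosen cutoffs.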
At present, the RGd-1 growth rate is slower than desired (~29 h doubling time). To
improve growth rates, other strategies have been employed, such as changing the Si, Fe and As
concentrations as well as the light intensity. Here we have focused on varying the intensity of
blue light (350-500 nm) to observe its effects on the RGd-1 growth rate. As shown in the results, 
the light spectrum produced from fluorescent algae grow lights is significantly deficient in blue 
light as compared to the natural light in Yellowstone National Park, where RGd-1 was isolated. 
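The ~29 h doubling time quoted above relates to a specific growth rate through μ = ln 2 / t_d. The following is a minimal sketch assuming exponential growth between two cell counts; the function names and any counts supplied to them are illustrative and not taken from this work.

```python
import math


def specific_growth_rate(n0: float, n1: float, hours: float) -> float:
    """Specific growth rate (per hour) from two cell densities taken `hours` apart."""
    return math.log(n1 / n0) / hours


def doubling_time(mu: float) -> float:
    """Doubling time (hours) for a specific growth rate mu (per hour)."""
    return math.log(2) / mu
```

A 29 h doubling time corresponds to a specific growth rate of about 0.024 per hour.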
It was hypothesized that varying the blue light intensity would affect progression through
the cell cycle, which may alter the growth rate. However, to the best of our knowledge, no studies
have elucidated this effect on diatom growth. While the studies described above investigated the 
effects of different light wavelengths on circadian rhythm specific gene expression, there is a 
current gap in the knowledge of the effects of varying blue light intensities on diatom growth 
rates.  
Background 
Cryptochromes and photolyases are flavoproteins with blue light (350-500 nm) sensitive 
receptors that regulate gene expression and catalyze the repair of UV damaged DNA (repair of 
cyclobutane pyrimidine dimers or 6-4 photoproducts), respectively.254, 255 A novel 
cryptochrome/photolyase family protein (CPF) was recently identified in the P. tricornutum 
genome.251 Blue light is prevalent in the upper layers of the water column where diatoms most 
commonly reside, and it is one of the few wavelengths able to penetrate to greater depths in the 
water column.251, 256 Arabidopsis has been found to have seven blue light receptors, of which the
cryptochromes (CRY1 and CRY2) have been found to have activity in hypocotyl elongation,
floral initiation, the circadian rhythm, chloroplast development, guard cell development, and
stomatal opening.257 Other possible functions include root development, programmed cell death,
magnetoreception, seed dormancy, pathogen responses, and the cell cycle.258 For the pennate
diatom Navicula, blue light induced dense, evenly distributed lawns of cells and increased the
growth rate more than threefold compared to growth under other wavelengths.256 For P.
tricornutum, PtCPF1 overexpression resulted in altered expression for genes associated with 
photoinhibition, thus decreasing photosynthetic efficiency which would limit growth and lipid 
accumulation.251, 253 
In animals, the CPFs act as transcriptional activators controlling the circadian rhythm.251
The CPFs identified in both T. pseudonana and P. tricornutum are more closely related to
those of animals than to those of plants. This is expected given the evidence for secondary
endosymbiosis making diatoms phylogenetically more related to animals than plants.36, 204, 255
Previous studies have investigated the role of animal-like cryptochromes (aCRY) in the green
alga Chlamydomonas reinhardtii and genome-wide changes in gene expression for T. pseudonana
with regard to light at different times of the day.204, 255 Su et al. (2015) investigated the effects
of different wavelengths of light on the growth rate and frustule size of the diatom Coscinodiscus
granii. Blue and red light wavelengths were found to produce the fastest growth rates, and lower
intensities of each wavelength resulted in morphological differences in the diatom
frustule.259, 260 While this study looked at the effect of two different intensities using six
different light-emitting diodes (LEDs) targeting the wavelengths of interest, the authors did not
tightly control the wavelengths used or look at the transcriptomic effects of these conditions.
Several diatom-specific cyclins, proteins associated with the progression of the cell cycle,
have been identified.67, 250, 261 Specifically, P. tricornutum and T. pseudonana have 24 and 52
cyclins, respectively, eleven of which were determined to be diatom specific.261 Bowler et al.
(2008) suggested that this large number of cyclins may be attributable to the unique diatom life
cycles, affecting diatom cell size.35
A study by Kafri et al. (2013) found that larger, faster-growing tissue culture cells halted
progression through the cell cycle at the G1/S checkpoint until the smaller, slower-growing
cells caught up.262 This allowed all cells in a culture to progress through the remainder of the cell
cycle uniformly at the same rate. A deficiency in blue light may alter progression through the 
cell cycle, resulting in a decreased growth rate.  
The hypothesis for this work was that there is a positive relationship between blue light
intensity and the RGd-1 growth rate. The approach was to change the intensity of the blue light
spectrum only (zero, low, medium and high intensity) while maintaining similar intensities for 
all other wavelengths and overall photosynthetically active radiation (PAR).   
From this study, the effect of blue light on RGd-1 growth, cellular morphology, and lipid 
productivity was determined. It was expected that we would see an increased growth rate with 
increased blue light intensity and potentially improve our ability to grow this high lipid-
producing diatom. Stabilizing phenotypic traits such as growth rate and lipid accumulation is 
vital for improving the viability of algal biofuels.  
Methods 
All experiments were performed in temperature-controlled tubular photobioreactors using
1.25 L of sterile alkaline growth medium (modified Bold’s Basal Medium supplemented with
B12, S3 vitamin solution and titrated to pH 8.7), grown with a 14:10 light/dark (L:D) cycle at
27 ± 1 °C.49 Growth studies were performed using light-emitting diodes (LEDs) in the MSU
phototrophic growth lab. Each experiment was maintained at 22% light intensity using a custom
bank of LEDs (25.4 cm wide, 120.65 cm long). Four conditions were used (Table C.1): three
light filters (Rosco), namely blue (filter no. 384), light yellow (filter no. 313) and yellow (filter
no. 6), and a control with no filter. The filters were wrapped around the outside of the
photobioreactor tubes to adjust the blue light intensities. All conditions remained consistent
except for the filter type. Each condition was grown in duplicate, with a filter wrapped around
each individual 1.25 L photobioreactor, for a total of eight tubes. The filters varied the blue light
intensity by filtering out blue light while leaving the other wavelengths at a consistent intensity.
Table C.1 Growth conditions (filter types) and the PAR passing through each filter, measured
with a spectroradiometer (Ocean Optics).
Filter                 PAR (µmol photons m-2 s-1)
Control                1.97e-4
Light Yellow (#313)    1.45e-4
Yellow (#6)            1.70e-4
Blue (#384)            1.72e-5
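To compare the filters on a common scale, the PAR values in Table C.1 can be expressed as a percentage of the no-filter control. This is a small illustrative calculation using only the tabulated values; the dictionary and function names are hypothetical.

```python
# PAR values as listed in Table C.1 (units as given in the table).
PAR = {
    "Control": 1.97e-4,
    "Light Yellow (#313)": 1.45e-4,
    "Yellow (#6)": 1.70e-4,
    "Blue (#384)": 1.72e-5,
}


def percent_of_control(values, control_key="Control"):
    """Express each PAR value as a percentage of the no-filter control."""
    control = values[control_key]
    return {name: 100.0 * v / control for name, v in values.items()}
```

On this scale the light yellow, yellow and blue filters pass roughly 74%, 86% and 9% of the control PAR, respectively.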
 
To evaluate the effect of blue light on growth, the following measurements were 
performed daily as outlined by Moll et al. (2014)49: direct cell counts, pH, chlorophyll and Nile 
Red fluorescence intensity (rfu). To quantify growth, a minimum of 100 cells were counted 
from each sample using a hemacytometer (Reichert). The sample pH was measured using a 
standard benchtop pH meter (Accumet). Lipid accumulation was measured daily by staining 
the cultures with Nile Red (0.25 mg/mL suspended in 20% DMSO) and reading the fluorescence 
on a microplate reader (BioTek Synergy). Once cultures reached maximum Nile Red fluorescence 
intensity, they were harvested and final dry cell weights (DCW) were assessed by filtering 
25 mL of each culture through pre-weighed GF/F Glass Microfiber Filters (Whatman). Samples 
were re-weighed after drying at 60°C for approximately 24 and 48 hours. Chlorophyll was 
extracted using acetone and measured on a plate reader (BioTek) at 632, 652 and 665 nm to 
quantify chlorophylls a and c.263, 264 
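The conversion from a hemacytometer count to a cell concentration can be sketched as follows. This is a minimal illustration, not the procedure used in this work: it assumes a standard counting chamber in which each large square holds 0.1 µL (1 mm² area, 0.1 mm depth), and the function name and arguments are hypothetical.

```python
def cells_per_ml(cells_counted, squares_counted, dilution_factor=1.0):
    """Convert a hemacytometer count to cells per mL.

    Assumption (not from the text): each large square holds 0.1 uL,
    so one cell per square corresponds to 1e4 cells per mL.
    """
    if squares_counted <= 0:
        raise ValueError("squares_counted must be positive")
    squares_per_ml = 1e4  # 1 mL / 0.1 uL per large square
    mean_per_square = cells_counted / squares_counted
    return mean_per_square * dilution_factor * squares_per_ml


# Hypothetical example: 120 cells over 4 large squares, sample diluted 1:2.
example_concentration = cells_per_ml(120, 4, dilution_factor=2)
print(f"{example_concentration:.2e} cells/mL")
```

Counting at least 100 cells per sample, as described above, keeps the counting error of each estimate reasonably small.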
It was expected that when the blue light intensity was changed, the PAR would also 
change. The filters were chosen to avoid this problem as much as possible. The PAR and light 
spectrum produced by each filter were measured using a spectroradiometer (Ocean Optics).   
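One way to derive PAR from a measured spectrum is to convert spectral irradiance to photon flux and integrate over 400 to 700 nm. The sketch below assumes the spectroradiometer reports irradiance in µW cm-2 nm-1; the function name, the trapezoidal integration and the 400 to 500 nm "blue" band are illustrative assumptions rather than the instrument's own processing.

```python
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m s^-1
N_A = 6.02214076e23  # Avogadro constant, mol^-1


def photon_flux_umol(wavelengths_nm, irradiance_uw_cm2_nm, lo=400.0, hi=700.0):
    """Integrate spectral irradiance (uW cm^-2 nm^-1) over [lo, hi] nm and
    return the photon flux in umol photons m^-2 s^-1 (trapezoidal rule)."""
    points = []
    for wl, e in zip(wavelengths_nm, irradiance_uw_cm2_nm):
        if lo <= wl <= hi:
            w_m2_nm = e * 1e-2                           # uW cm^-2 -> W m^-2
            photons = w_m2_nm * (wl * 1e-9) / (H * C)    # photons m^-2 s^-1 nm^-1
            points.append((wl, photons / N_A * 1e6))     # umol m^-2 s^-1 nm^-1
    total = 0.0
    for (w0, f0), (w1, f1) in zip(points, points[1:]):
        total += 0.5 * (f0 + f1) * (w1 - w0)
    return total


# Synthetic flat spectrum (not measured data): 100 uW cm^-2 nm^-1 at every nm.
wl = list(range(400, 701))
spectrum = [100.0] * len(wl)
par = photon_flux_umol(wl, spectrum)              # full PAR band, 400-700 nm
blue = photon_flux_umol(wl, spectrum, 400, 500)   # assumed "blue" band
print(f"PAR = {par:.0f}, blue fraction = {blue / par:.2f}")
```

Reporting the blue band as a fraction of total PAR, as in the last line, would make it easier to separate the effect of blue light from the effect of overall light intensity.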
 
Results and discussion 
 The overall blue light emitted from the MSU fluorescent grow lights was significantly 
lower compared to the blue light produced by the natural sunlight at Witch Creek, where RGd-1 
was isolated (Figure C.2). By changing the blue light intensity, it was hypothesized that there 
would be a significant difference in growth rates between the different conditions.  
 
Figure C.2 The light intensity at Witch Creek (late morning, August 2012, left) and laboratory 
fluorescent grow lights (right). Two measurements were taken in the field (1807 and 1828 uW 
cm-2 nm-1) and three measurements were taken for the MSU lab light systems (421, 412 and 379 
uW cm-2 nm-1). 
 
Four conditions were tested: a no-filter control and three different blue light filters with 
different blue light intensities (Table C.1). The light yellow filter had the lowest blue light 
intensity, the yellow filter had a higher blue light intensity and the blue filter had the highest blue 
light intensity but was deficient in the other wavelengths greater than approximately 550 nm. 
The PAR was substantially lower for the blue light filter (#384) compared to the other two filters 
and control (Table C.1).  
The light emission spectra for the four conditions that were tested are shown in Figures 
C.3-C.6: the no-filter control (Figure C.3), light yellow (#313) (Figure C.4), yellow (#6) 
(Figure C.5) and blue (#384) (Figure C.6). Note that the y-axis scale differs between conditions 
depending on the highest intensity. The no-filter control had the highest intensity and coverage 
for wavelengths greater than 500 nm (Figure C.3). Compared to the light yellow and yellow 
filters (Figures C.4 and C.5), the no-filter control also had a higher blue light intensity. The blue 
light filter had the highest blue light intensity but was deficient in wavelengths exceeding 
approximately 550 nm (Figure C.6). 
Table C.2 shows that each of the four conditions reached similar final cell concentrations 
of approximately 1.0 × 10⁶ cells mL⁻¹. However, the control was slightly higher at 1.45 × 10⁶ 
± 1.41 × 10⁴ cells mL⁻¹, which could be due to a reduced PAR inside the tubes with 
filters (Table C.1, Figure C.7). Further, the doubling times reflected this as well. The light yellow 
filter, with the lowest blue light intensity (Figure C.4), resulted in the fastest doubling time at 
27.04 ± 12.12 h (Table C.2). However, it was not statistically different from the control condition, 
which had a doubling time of 38.49 ± 10.55 h (Figure C.7). When RGd-1 was grown with the 
yellow filter, which passed a greater intensity of blue light (Figure C.5), the doubling time 
increased to 40.23 h compared to the light yellow and control conditions (Table C.2, Figure C.7). 
One of the yellow-filter photobioreactor tubes started clumping, so only one of the two tubes 
was used for analysis.  
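The text does not state how the doubling times were calculated. A common approach, shown here only as a sketch under the assumption of exponential growth between two time points, derives the specific growth rate from two cell counts and converts it to a doubling time. The function name and example values are hypothetical.

```python
import math


def doubling_time_h(n0, n1, t0_h, t1_h):
    """Doubling time (h) from two cell concentrations, assuming exponential
    growth between t0_h and t1_h (an assumption, not the stated method)."""
    if n0 <= 0 or n1 <= n0 or t1_h <= t0_h:
        raise ValueError("requires 0 < n0 < n1 and t0_h < t1_h")
    mu = math.log(n1 / n0) / (t1_h - t0_h)  # specific growth rate, h^-1
    return math.log(2) / mu


# Hypothetical example: a culture that quadruples in 48 h doubles every 24 h.
print(round(doubling_time_h(1.0e5, 4.0e5, 0.0, 48.0), 2))
```

Because the result depends strongly on which time points are treated as the exponential phase, the large standard deviations in Table C.2 may partly reflect that choice.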
 
Figure C.3 The control (without a color filter) spectroradiometer measurement at 22% LED 
intensity. The spectroradiometer measurements were taken inside empty photobioreactor tubes. 
 
 
Figure C.4 Spectroradiometer measurement (Ocean Optics) for the Rosco filter #313 (light 
yellow) with low blue intensity using 22% LED intensity. The spectroradiometer measurements 
were taken inside empty photobioreactor tubes. 
 
 
 
 
Figure C.5 Spectroradiometer measurements (Ocean Optics) for the Rosco filter #6 (yellow) with 
high blue intensity using 22% LED intensity. The spectroradiometer measurements were taken 
inside empty photobioreactor tubes. 
 
 
Figure C.6 Spectroradiometer measurements (Ocean Optics) for the Rosco filter #384 (blue) with 
very high blue intensity and low intensity for other wavelengths, using 22% LED intensity. The 
spectroradiometer measurements were taken inside empty photobioreactor tubes. 
[Figure C.7 plot: cell concentration (cells mL⁻¹, logarithmic axis from 1.5E+04 to 1.5E+06) versus time (0 to 14 days) for the Control, Light Yellow, Yellow and Blue conditions] 
 
Figure C.7 Cell concentrations for the 4 blue light conditions tested, the no light filter control, 
light yellow (#313), yellow (#6) and blue (#384). The error bars represent the standard deviation 
of the mean. 
Table C.2 Growth conditions (filter types) and doubling times for blue light growth studies. Each 
condition was grown in duplicate. One replicate in the yellow condition was excluded due to 
severe clumping. 
Filter                 Final cell count (cells mL-1)    Doubling time (h) 
                       Average ± standard deviation     Average ± standard deviation 
Control                1.45e6 ± 1.41e4                  38.49 ± 10.55 
Light Yellow (#313)    1.23e6 ± 2.62e5                  27.04 ± 12.12 
Yellow (#6)            1.11e6                           40.23 ± N/A 
Blue (#384)            1.10e6 ± 9.19e4                  61.45 ± 2.66 
  
The highest observed chlorophyll concentrations were in the blue (#384) filtered cultures 
(Figure C.8), which also received the lowest PAR. However, the other filtered RGd-1 cultures 
were similar in chlorophyll concentration to the no-filter control. The chlorophyll biosynthesis 
enzyme glutamate-1-semialdehyde aminotransferase (gsa) is blue light-induced in Chlamydomonas 
reinhardtii.265 This enzyme catalyzes a pyridoxal 5'-phosphate-dependent reaction that converts 
glutamate-1-semialdehyde (GSA) to δ-aminolevulinate (ALA), which is the first committed step in 
porphyrin biosynthesis.266 The decreased PAR in the blue (#384) filtered cultures may have 
resulted in increased chlorophyll concentrations. This combination of low PAR and high 
chlorophyll concentration has been observed in other algae cultures.267, 268 
The two culture conditions resulting in the highest final Nile Red fluorescence, 302.5 ± 
13.4 rfu and 231 rfu (one tube), also had the highest blue light intensities, blue (#384) and 
yellow (#6), respectively (Figure C.9). However, this trend did not remain consistent for the two 
lower blue light intensities, light yellow (#313) and the control. The blue (#384) culture condition 
grew the slowest of the four conditions with a 61.45 h doubling time (Table C.2, Figure C.7). 
Increased TAG under higher blue light conditions can be explained by increased activity of carbonic 
anhydrase and ribulose bisphosphate carboxylase/oxygenase (Rubisco), both of which have 
greater activity under blue light.269, 270 It is important to recognize that the cultures in this study 
never reached maximum Nile Red fluorescence due to clumping in some of the tubes. However, 
trends can still be observed. There may be different optimal blue light intensities for TAG and 
for doubling time. Future work is required to elucidate this effect. 
 
[Figure C.8 plot: chlorophyll (mg/mL, 0 to 1.2) versus time (0 to 14 days) for the Control, Light Yellow, Yellow and Blue conditions] 
 
Figure C.8 Total chlorophyll concentrations (mg/mL) for each of the 4 blue light conditions 
tested, the no-filter control, light yellow (#313), yellow (#6) and blue (#384). The error bars 
represent the standard deviation of the mean. 
[Figure C.9 plot: Nile Red fluorescence (rfu, 0 to 350) versus time (0 to 14 days) for the Control, Light Yellow, Yellow and Blue conditions] 
 
Figure C.9 The Nile Red fluorescence (rfu) for the 4 blue light conditions tested, the no-filter 
control, light yellow (#313), yellow (#6) and blue (#384). The error bars represent the standard 
deviation of the mean. 
 
Conclusions 
 It was expected that there would be a clear effect of blue light supplementation on RGd-1 
growth, cellular morphology, and lipid productivity. In particular, it was thought there would be 
a decreased doubling time with increased blue light intensity. RGd-1 cultures grown with the 
light yellow filter (#313) did not follow this trend: this filter had the lowest blue light intensity 
yet resulted in the lowest doubling time compared to the other growth conditions (Table C.2). The 
blue filter (#384) condition resulted in the highest doubling time and the highest Nile Red 
fluorescence, followed by the yellow filter condition (#6). However, the overall PAR for the blue 
filter (Table C.1) was considerably lower compared to the other conditions, which likely 
contributed to the slower doubling time (Table C.2) and, in turn, to higher concentrations of lipids. 
These results indicate that the light yellow filter may be promising as an additional strategy for 
decreasing the RGd-1 doubling time.   
Algal biofuels are an emerging technology, but further research is required to decrease 
production costs. Diatom strain RGd-1 is a very promising candidate for large-scale algal 
biofuel applications due to its high TAG and biodiesel content at 30-40% (w/w) and 70-80% 
(w/w), respectively. To increase the growth rate, other low-cost strategies are required, such as 
supplementing cultures with blue light. Understanding the genes expressed under the different 
blue light conditions will elucidate the role of blue light on the cell cycle which may be highly 
correlated with the growth rate. Further, it is important to perform fundamental studies of diatom 
growth that provide the best opportunities to maximize understanding of unique aspects of 
diatom physiology.  
To understand the effects of different blue light intensities on RGd-1 growth, it is 
important to determine the changes in global gene expression. Given the light-dependent 
activation of aureochromes that regulate dsCYC2 at the G1/S phase of the cell cycle, it is 
expected that there would be increased expression of the gene encoding cyclin D, which is 
prominent during the G1 phase of the cell cycle.28 Additionally, mRNAs of the diatom-specific 
cyclins dsCYC1, dsCYC2, dsCYC5, dsCYC6, dsCYC7, dsCYC8, dsCYC9, and dsCYC11 were 
found to be more prevalent during G1 or S phases in P. tricornutum,28 and we expect to see 
increased expression at G1/S for RGd-1. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
APPENDIX D 
 
THE EFFECTS OF ARSENIC SUPPLEMENTATION ON RGd-1 GROWTH RATE AND 
LIPID ACCUMULATION 
  
Introduction 
Green algae and diatoms have the ability to take up and transform arsenic in a variety of 
aquatic environments. Microbial bioremediation, including algae, may provide a more cost-
effective and environmentally less detrimental way of remediating heavy metals such as 
arsenic.271 Arsenate and arsenite can be taken up into the cell via phosphate transporters and 
aquaglyceroporins, respectively.272, 273 The RGd-1 culture is especially interesting because 
arsenic resistance and reductase genes have been identified in both the diatom and an associated 
bacterium, Brevundimonas sp., for which a genome has been assembled. 
Arsenic resistance in bacteria is under the control of the ars operon.272, 274, 275 The ars 
operon contains three genes: arsR (trans-acting repressor), arsB (membrane-bound arsenite 
permease pump) and arsC (intracellular arsenate reductase).272 ArsR senses the presence of 
As(III) and controls the expression of arsB and arsC. ArsC reduces arsenate to arsenite using 
glutathione as a reducing agent. ArsB functions as an As(OH)3/H+ antiporter that extrudes 
As(III) from the cell. In some bacteria, such as E. coli, the additional genes arsD and arsA are 
co-located in the ars operon. E. coli has also been found to have two efflux systems, ArsB and 
ArsAB, that facilitate the removal of arsenite from the cell.  
Arsenate (AsO43-) is chemically similar to phosphate (PO43-) and is, therefore, a competitive 
inhibitor of phosphate uptake at high concentrations.273 Decreased phosphate concentrations for 
some microalgae cultures have led to an increase in arsenate uptake.276 Arsenate uptake could be 
reduced by increasing the PO43- concentrations in Chlorella salina and Skeletonema costatum,277 
and in Chlamydomonas reinhardtii.272 Arsenate reduction has been identified in Chlorella, 
Skeletonema costatum, and Thalassiosira.276, 278 Previously, Chlorella has been found to convert 
arsenate to dimethylarsenic species.278 Interestingly, this Chlorella was isolated from hot springs 
in Japan, which contain higher levels of arsenic compared to other freshwater areas such as 
non-volcanic or non-hot spring streams and lakes.278 The cyanobacterium Synechocystis and the 
red algae Cyanidiales can oxidize As(III) to As(V).273, 279  
Arsenic is introduced into the environment by natural occurrences such as weathering of 
rocks and volcanic or hot spring activity, or by anthropogenic activities such as pesticide use, 
mining, the combustion of fossil fuels or the use of wood preservatives.272, 280 In marine 
environments, algae and other phytoplankton that are capable of arsenic uptake convert it into 
dimethylarsenic species, which exist primarily as arsenosugars, in a process catalyzed by the 
enzyme arsenite methyltransferase (ArsM). When consumed up the food chain, the 
dimethylarsenic species are converted to trimethylarsenic species (arsenobetaine). The 
arsenosugars and arsenobetaine that bioaccumulate up the food chain have much lower toxicity 
for mammals compared to arsenate/arsenite.278 Arsenoribosides have been detected in 
Chaetoceros concavicornis, Chlorella vulgaris, Monoraphidium arcuatum, Chlamydomonas, 
Dunaliella, Phaeodactylum, Thalassiosira, and the cyanobacteria Synechocystis and Nostoc.272 
Lipid-soluble arsenic compounds have been detected in C. vulgaris, C. ovalis, C. pyrenoidosa, 
D. tertiolecta, P. tricornutum, S. costatum, and T. pseudonana and the cyanobacteria 
Oscillatoria rubescens and Synechocystis.272 The major arsenic species in freshwater systems 
are inorganic arsenic and methylarsenicals including dimethyl- and trimethylarsenic species.278 
Other diatoms besides RGd-1 have been found in arsenic-contaminated water. Skeletonema 
costatum was found to reduce methylated arsenic (III) at NW Netley Buoy and Calshot Buoy near 
Hampshire, England.281 Methylation and reduction of arsenate and arsenite have been found 
primarily in photic zones, where these species are taken up by phytoplankton, including algae. 
Further, methylarsenicals and arsenite have been found in freshwater systems at concentrations 
similar to marine concentrations.282 Witch Creek has been found to have approximately 300 ppb 
arsenic, or thirty times the EPA drinking water standard of 10 ppb.283, 284   
Not all diatoms can utilize arsenic. The main effect on Achnanthidium minutissimum was 
a reduction in cell size during acute exposures.285 Arsenic has been found to inhibit growth and 
oxidative phosphorylation in phytoplankton.277, 286, 287 Specifically, arsenic uptake by diatoms has 
been found to occur when phosphate concentrations are low.277 Diatoms were shown to reduce 
arsenate to dimethylarsenic acid in aquatic environments, which results in a stable, less toxic 
arsenic form.   
While arsenic cannot be degraded from contaminated sites, it can be transformed to a less 
toxic state. Several methods of arsenic remediation have been identified.273 
1. Biotransformation using arsM. Many bacteria can convert arsenate to volatile methylated 
species in aqueous or sub-surface soil systems.273 
2. Sulfate reduction. With sulfate as the electron acceptor, sulfate-reducing bacteria (SRB) 
have been found to generate sulfide which can then react with arsenic. The sulfide-arsenic 
complex has low solubility causing the complex to precipitate. Arsenic can be precipitated 
as arsenates, arseno-sulfides or co-precipitated with other metals.273  
3. Ferrous oxide removal. When As(III) is oxidized to As(V), it becomes adsorbed to Fe(III) 
and is removed from aqueous systems or immobilized in soil systems.273 
According to Figure D.1, the arsenic was predominantly in the form of HAsO42- at pH 9.3 
when RGd-1 was collected from Witch Creek. Inorganic forms of arsenic such as arsenate, 
As(V), and arsenite, As(III), are the most toxic. Arsenic can exert toxicity when As(V) becomes 
incorporated into phosphorylated compounds used by ATP synthase, or by blocking sulfhydryl 
(SH) groups, which interferes with enzymes such as pyruvate dehydrogenase or 2-oxoglutarate 
dehydrogenase, resulting in membrane leakage and cell death due to the production of reactive 
oxygen species.288  
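The predominance of HAsO42- at pH 9.3 can be checked with a simple acid-base speciation calculation for arsenic acid. The sketch below is illustrative only: the pKa values (2.19, 6.94 and 11.5) are approximate literature values assumed here, not values reported in this work, and activity corrections are ignored.

```python
def arsenate_fractions(ph, pkas=(2.19, 6.94, 11.5)):
    """Equilibrium fractions of the four As(V) species at a given pH.

    pkas are approximate values for H3AsO4 (assumed, not from the text).
    """
    h = 10.0 ** (-ph)
    k1, k2, k3 = (10.0 ** (-p) for p in pkas)
    # Each term is proportional to the concentration of one species.
    terms = [h ** 3, k1 * h ** 2, k1 * k2 * h, k1 * k2 * k3]
    total = sum(terms)
    names = ("H3AsO4", "H2AsO4-", "HAsO4^2-", "AsO4^3-")
    return {name: term / total for name, term in zip(names, terms)}


fractions = arsenate_fractions(9.3)
for species, fraction in fractions.items():
    print(f"{species:10s} {fraction:.4f}")
```

With these assumed constants, HAsO4^2- accounts for well over 95% of dissolved As(V) at pH 9.3, consistent with Figure D.1.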
For diatom strain RGd-1, early results had shown a 28% decrease in doubling time when 
grown in artificial Witch Creek water (Bold’s Basal Medium with added Witch Creek chemicals) 
compared to when sodium arsenate was not included in the artificial Witch Creek water. Similarly, 
a decrease in doubling time was found when RGd-1 was grown in the presence of sodium arsenite 
(25 ppb) compared to the control without arsenite. Both datasets indicated that there was potential 
for increased TAG and decreased doubling time in the presence of sodium arsenate in modified 
Bold’s Basal Medium (B8.7SiS – Bold’s + 2 mM sodium metasilicate and ASP2 concentrations 
of B12 and other vitamins) as a diatom growth medium.49, 117, 118 It has been suggested that some 
microbiota detoxify arsenic by attaching the metalloid to lipids as arsenolipids.289 Further, some 
algae have been found to hyper-accumulate and transform arsenic from the surrounding water.272  
Due to these initial testing results, it was hypothesized that the presence of arsenic might 
decrease the RGd-1 doubling time and that arsenic would be stored as arsenolipids, which can be 
converted for use in biofuels, in the high-lipid-producing diatom RGd-1. Here, both arsenate and 
arsenite were added to B8.7SiS and different molar ratios of phosphorus to arsenic were tested to 
determine the optimal ratio for RGd-1 growth.  
 
Figure D.1 Speciation of As(III) and As(V) across pH ranges -2 to 14 in water.57, 58 
 
Methods 
There were two sets of experiments that exposed RGd-1 to arsenic: (1) initial testing with 
RGd-1 growing in Witch Creek Water or Artificial Witch Creek Water and (2) growth in sodium 
arsenate in B8.7SiS.  
Initial Testing 
The initial testing was performed in triplicate 250 mL flasks. RGd-1 cultures were grown 
under a 14:10 light/dark (L/D) cycle and aerated with ambient air in a temperature-controlled 
incubator at 30°C. The light intensity of the incubator was μmole photons m-2 s-1 using twelve 
T5 four ft fluorescent lights in a square-wave 14:10 light/dark (L/D) cycle. Samples were 
collected daily just prior to the end of the light cycle. To quantify growth, a minimum of 100 
cells were counted from each sample using a hemacytometer (Reichert). The sample pH was 
measured using a standard benchtop pH meter (Accumet). Lipid accumulation was measured 
daily by staining the cultures with Nile Red (0.25 mg/mL suspended in 20% DMSO) and reading 
the fluorescence on a microplate reader (BioTek Synergy). Final dry cell weights (DCW) were 
assessed at the end of the experiment by filtering 25 mL of each culture through pre-weighed 
GF/F Glass Microfiber Filters (Whatman). Samples were re-weighed after drying at 60°C for 
approximately 24 and 48 hours.  
 RGd-1 cultures were grown in water collected from Witch Creek (Witch Creek Water) 
and in Artificial Witch Creek Medium. Because Witch Creek water is in limited supply, the 
chemicals from Witch Creek that were not present in B8.7SiS were added to create an artificial 
medium named Artificial Witch Creek Medium (AWCM; Table D.1), which was used in later 
growth studies. The major chemicals constituting Witch Creek water were determined by ion 
chromatography (Dionex ICS-1100 Ion Chromatography System), inductively coupled 
plasma-mass spectrometry (ICP-MS) (Agilent Technologies 7500ce) and nitrogen analysis. 
The AWCM, therefore, contained all measured Witch Creek chemicals, except antimony.  
Ash Free Dry Weight (AFDW) 
The ash-free dry cell weight was determined by heating the DCW sample and filter to 
500°C for five hours. The samples were weighed immediately after removing from the furnace. 
The ash-free dry weight was the difference between the sample weight before (DCW) and after 
ashing.49, 290, 291  
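The dry-weight calculations described above can be sketched as follows. This is a minimal illustration with hypothetical function names and example masses; it assumes the 25 mL filtered volume stated in the methods and that the mass of the glass fiber filter itself does not change during ashing.

```python
def dry_cell_weight_g_per_l(filter_mg, filter_plus_dry_mg, volume_ml=25.0):
    """DCW in g/L: dried biomass mass on the filter divided by the filtered volume."""
    return (filter_plus_dry_mg - filter_mg) / volume_ml  # mg/mL equals g/L


def ash_free_dry_weight_g_per_l(filter_plus_dry_mg, filter_plus_ash_mg, volume_ml=25.0):
    """AFDW in g/L: mass lost on ignition (organic matter) per filtered volume.

    Assumes the filter mass is unchanged by ashing (an assumption, not from the text).
    """
    return (filter_plus_dry_mg - filter_plus_ash_mg) / volume_ml


# Hypothetical masses in mg: empty filter, filter + dried cells, filter + ash.
dcw = dry_cell_weight_g_per_l(100.0, 112.5)
afdw = ash_free_dry_weight_g_per_l(112.5, 102.5)
print(f"DCW = {dcw:.3f} g/L, AFDW = {afdw:.3f} g/L")
```

For a diatom, the gap between DCW and AFDW is expected to be large because the silica frustule remains as ash.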
 
 
 
 
 
Table D.1 2013 Witch Creek water analyses and B8.7SiS chemical concentrations. 
Analyte (unit) Witch Creek Water Bold's Basal Medium
Sodium (ppm) 113.065 114.719
Magnesium (ppm) 13.129 7.389
Aluminum (ppb) 0.069 0.000
Silicon (ppm) 72.105 56.170
Potassium (ppm) 8.320 82.779
Calcium (ppm) 7.200 6.813
Vanadium (ppb) 0.001 0.000
Manganese (ppm) 0.782 0.399
Iron (ppm) 3.490 1.000
Cobalt (ppb) 0.349 0.099
Copper (ppb) 0.872 0.399
Zinc (ppb) 0.149 2.001
Arsenic (ppb) 0.300 0.000
Selenium (ppb) 0.004 0.000
Molybdenum (ppb) 0.095 0.473
Cadmium (ppb) 0.001 0.000
Antimony (ppb) 0.011 0.000
Barium (ppb) 0.010 0.000
Lead (ppb) 0.002 0.000
F (ppm) 130.820 0.000
Cl (ppm) 709.960 27.952
SO42- (ppm) 510.260 34.468
TC (ppm) 31.130 18.976
DIC (ppm) 13.525 0.000
TOC (ppm) 1.995 18.976
TN (ppm) 0.125 0.000  
FAME analysis using GC-MS 
Harvested biomass was lyophilized to remove all the water content. The biomass (20-30 
mg) was added to borosilicate culture tubes (Pyrex) and capped with Teflon caps. To the biomass, 
1 mL of toluene and 2 mL of sodium methoxide were added and the mixture was heated at 90°C 
for 30 minutes while vortexing every 10 minutes (Fisher Scientific). Two mL of 14% boron 
trifluoride in methanol (Sigma Aldrich) were added to samples that had cooled to room 
temperature, and the samples were heated for an additional 30 minutes with vortexing every 10 
minutes. Following this treatment, the samples were cooled and 0.8 mL of hexanes (Fisher 
Scientific) and 0.8 mL of saturated NaCl solution were added to facilitate separation of FAMEs 
into the organic phase. Samples were reheated to 90°C for 10 minutes to further facilitate phase 
separation and then centrifuged for two minutes at 6 ×g. The top organic layer was extracted 
using a glass syringe (Hamilton) and then run on gas chromatography-mass spectrometry 
(GC-MS) (Agilent) using a DB-23 column (Restek) to quantify FAMEs against NLEA FAME mix 
(Restek).  
Determination of the optimal P:As ratio 
Cultures grown with increased phosphate concentrations have been found to take up less 
arsenate. Previous studies with Agrobacterium tumefaciens have found an optimal P:As ratio of 
10:1.292 RGd-1 was grown at five different phosphorus to arsenic ratios to determine the 
optimal parameters for growth: a control (B8.7SiS phosphorus concentration without arsenic), 
1:1, 2:1, 5:1 and 10:1 (Table D.2).  
Table D.2 Molar Phosphorus and arsenic ratios used in arsenate experiments.61 
P:As    Control    10:1     5:1      2:1     1:1 
P       1.63 mM    40 µM    20 µM    8 µM    4 µM 
As      0          1 µM     1 µM     1 µM    1 µM 
 
The 5:1 P:As ratio was selected to carry out the remainder of the experiments because it 
resulted in the highest cell count and was among the fastest doubling times. Therefore, all other 
experiments maintained the 20 µM P concentration while varying the As concentration in the 
growth medium. Low and high sodium arsenate ranges were targeted (low = 10-90 ppb As; high 
= 100-900 ppb As) (Table D.3). 
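Because phosphorus is given in µM and arsenic in ppb, comparing the two requires a unit conversion. The sketch below is an illustration only: it assumes that "ppb As" refers to µg of elemental arsenic per litre and uses the atomic mass of arsenic (74.9216 g/mol); the function names are hypothetical.

```python
import math

AS_MOLAR_MASS = 74.9216  # g/mol, atomic mass of arsenic


def ppb_to_um(ppb, molar_mass=AS_MOLAR_MASS):
    """Convert ppb (taken as ug/L in dilute aqueous solution) to umol/L."""
    return ppb / molar_mass


def p_to_as_molar_ratio(p_um, as_ppb):
    """Molar P:As ratio for a phosphorus concentration in uM and arsenic in ppb."""
    as_um = ppb_to_um(as_ppb)
    if as_um == 0:
        return math.inf  # no arsenic added (control)
    return p_um / as_um


# P:As ratio at 20 uM P for a few of the arsenic levels listed in Table D.3.
for as_ppb in (10, 50, 150, 300, 600):
    print(as_ppb, "ppb As ->", round(p_to_as_molar_ratio(20.0, as_ppb), 2))
```

Expressing every treatment as a molar ratio in this way makes the low and high arsenate ranges directly comparable with the ratios in Table D.2.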
Table D.3 Descriptions of arsenate experiments. Seven experiments were performed with 
different concentrations of sodium arsenate. The first set of experiments was used to determine 
the optimal As:P ratio for RGd-1. That phosphorus concentration was used for all future 
experiments with varying As. 
Experiment  Description                                                      Location 
1           P:As Ratios (10:1, 5:1, 2:1, 1:1; 0 As)                          Cobleigh 307 
2           High As concentrations (0, 150, 300, 600, 900 ppb) – 
            Discounted due to undefined period of time with lights on in 
            Cobleigh 307 (Days 14-16). P=20 µM 
3           Low As concentrations (0, 10, 25, 50, 75 ppb). P=20 µM 
4           High As Concentrations – repeated (0, 150, 300, 600 ppb).        Barnard 115 
            P=20 µM 
5           Filling in low As concentrations (0, 30, 40, 60, 90 ppb). 
            P=20 µM 
6           Bold’s Basal Medium PO43- concentration = 1.63 mM 
            (As concentrations = 0, 10, 50, 100, 500) 
7           0 P (As concentrations = 0, 10, 50, 100, 500) 
  
Results 
A consistent arsenic concentration of 262.5 ppb has been measured in Witch Creek over 
multiple years since 2009. It was hypothesized that RGd-1 may have adapted to survive and 
succeed in high arsenic concentrations. Specifically, (1) the lack of arsenic in the RGd-1 growth 
medium may contribute to a changeable growth rate and cellular morphology, (2) RGd-1 
assimilates arsenic into arsenolipids or arsenosugars, and (3) RGd-1 respires arsenic, reducing 
arsenate to arsenite. The following series of experiments were designed to determine the effect of 
arsenic on RGd-1 growth and lipid accumulation. RGd-1 was grown in two sets of experiments: 
(1) initial testing in Witch Creek Water or AWCM and (2) B8.7SiS + sodium arsenate.  
Initial testing 
  The fastest doubling times occurred when RGd-1 was grown in Witch Creek water with 
Bold's additions. Witch Creek water is low in phosphate and nitrate (Table D.1). However, this 
growth condition resulted in the lowest Nile Red fluorescence of all of the conditions tested at 
that time (Figure D.2, Table D.3). The slowest doubling times occurred when twice the 
amount of iron was added to the AWCM (Figure D.3, Table D.4). One condition resulted in 
no growth: Witch Creek Water with Bold’s additions titrated to pH 8 with Bold’s 
concentration of ethylenediaminetetraacetic acid (EDTA).  
[Figure D.2 plot: bar chart of Nile Red fluorescence (rfu, 0 to 45,000) for each initial-testing growth medium (B8.7SiS, Witch Creek Water, and Bold's/Witch Creek combinations with Fe and EDTA additions at different pH values); individual x-axis labels not recoverable] 
 
Figure D.2 The Nile Red fluorescence for the initial testing with sodium arsenate. The error bars 
represent the standard deviation of the mean. 
[Figure D.3 plot: bar chart of doubling time (h, 0 to 250) for each initial-testing growth medium (B8.7SiS, Witch Creek Water, and Bold's/Witch Creek combinations with Fe and EDTA additions at different pH values); individual x-axis labels not recoverable] 
 
Figure D.3 The doubling times for the initial testing with sodium arsenate. Two conditions did 
not grow; Witch Creek Water with Bold’s additions at pH 8 + EDTA. This condition was 
repeated to verify the result. The error bars represent the standard deviation of the mean. 
 
 
 
 
 
Table D.4 Doubling times and DCW for the RGd-1 initial testing with sodium arsenate. 
Samples  Average doubling time (h)  DCW (g/L) 
RGd-1 Growth rate experiment-2 
B8.7SiS  57.45  0.536 
Witch Creek Water  53.67  0.139 
Bold's Media, 2xFe, pH 8.70, [Fe]/[EDTA]=0.30  55.81  0.424 
RGd-1 Growth rate experiment-3 
Witch Creek Water w/ Bold's Media Additions pH 8.00, 0.51 M Nitrogen, [Fe]/[EDTA]=0.47  30.72  0.503 
Witch Creek Water w/ Bold's Media Additions pH 9.30, 0.51 M Nitrogen [Fe]/[EDTA]=0.47  36.00  0.543 
Bold's Media w/ Witch Creek Water Additions pH 8.00, [Fe]/[EDTA]=0.47  37.02  0.432 
Bold's Media w/ Witch Creek Water Additions pH 9.30, [Fe]/[EDTA]=0.47  38.53  0.463 
Growth rate experiment-4 
Bold's Media, pH 8.70  51.92  0.508 
Bold's Media w/WC additions, no As, pH 9.30, [Fe]/[EDTA]=0.47  78.90  0.488 
WC w/Bold's Media additions, 2x EDTA, pH 9.30, [Fe]/[EDTA]=0.15  no growth  0.187 
Bold's Media w/WC additions, 2x Fe&EDTA, pH 9.30, [Fe]/[EDTA]=0.30  39.59  0.336 
Bold's Media, 2x Fe&EDTA, pH 9.30, [Fe]/[EDTA]=0.15  52.05  0.491 
Growth rate experiment-5 
Bold's Media, 2x Fe&EDTA, pH 9.30, [Fe]/[EDTA]=0.15  148.08  0.153 
Witch Creek Water w/Bold's Media additions, 2x EDTA, pH 9.30, [Fe]/[EDTA]=0.15  no growth  0.024 
Witch Creek Water w/ Bold's Media Additions, 2.97 M Nitrogen (right amount) pH 9.30, [Fe]/[EDTA]=0.47  68.81  0.480 
Bold's Media w/WC additions, 2x Fe&EDTA, pH 9.30, [Fe]/[EDTA]=0.30  91.99  0.360 
K.M. Moll et al 2014 (2 mM Si) (tube reactors)  28.28  0.500 
 
Arsenate 
 When grown in different P:As ratios, there was a significant increase in cell numbers in 
the 5:1 P:As condition (Figure D.4). The doubling time was the lowest in the 10:1 and 5:1 
conditions. Given its increased cell numbers and low doubling time, the 5:1 condition was 
chosen for the remainder of the studies. The 1:1 condition was the most similar in cell numbers 
and in doubling time when compared to the B8.7SiS control (Table D.4).  
 The pH was similar in the five conditions tested, although it increased significantly in the 
5:1 and 10:1 conditions (Figure D.5). However, the 5:1 and 10:1 ratio conditions were within 
error of each other. According to Figure D.6, there was no significant difference in Nile Red 
fluorescence between the five conditions.  
[Figure D.4 plot: cell counts (cells mL⁻¹, 0 to 3.0E+07) versus time (0 to 12 days) for the Control, 10:1, 5:1, 2:1 and 1:1 conditions] 
 
Figure D.4 Cell counts for four phosphorus to arsenic ratios (10:1, 5:1, 2:1 and 1:1) and a control 
(B8.7SiS phosphorus concentrations). The error bars represent the standard deviation of the 
mean. 
[Figure D.5 plot: pH (7.0 to 11.5) versus time (0 to 12 days) for the Control, 10:1, 5:1, 2:1 and 1:1 conditions] 
 
Figure D.5 The pH for four phosphorus to arsenic ratios (10:1, 5:1, 2:1 and 1:1) and a control 
(B8.7SiS phosphorus concentrations). The error bars represent the standard deviation of the 
mean. 
[Figure D.6 plot: Nile Red fluorescence (rfu, 0 to 8,000) versus time (0 to 12 days) for the Control, 10:1, 5:1, 2:1 and 1:1 conditions] 
 
Figure D.6 The total Nile Red fluorescence for four phosphorus to arsenic ratios (10:1, 5:1, 2:1 
and 1:1) and a control (B8.7SiS phosphorus concentrations). The error bars represent the 
standard deviation of the mean. 
 
The fastest doubling times occurred for the 0 phosphorus, 10 and 50 ppb arsenate 
conditions. These conditions grew significantly faster than the same arsenate concentrations grown 
at the 5:1 phosphorus to arsenic concentration (Figure D.7). Neither the arsenate nor the 
phosphorus concentrations affected the AFDW (Figure D.8).  
As shown in Figure D.11, the highest % FAMEs occurred for cultures grown in the highest 
arsenate conditions, whereas the lowest % FAMEs were found in cultures grown in 0 phosphate 
conditions. The percent TAG decreased substantially compared to previous reports 
of 70-80% FAME.49 However, there was substantially lower phosphate in these experiments 
compared to previous high FAME-accumulating experiments (B8.7SiS concentrations of P (1.63 
mM)).117  
[Figure D.7 plot: bar chart of doubling time (h, 0 to 60) versus arsenate concentration (ppb); conditions shown: 0, 0, 0, 0, 10, 10, 30, 40, 50, 50, 60, 90, 100, 100, 100, 150, 300, 500, 500, 600] 
 
Figure D.7 The doubling times for the different arsenate concentrations tested. The error bars 
represent the variance resulting from an ANOVA (2-factor without replacement) analysis. The 
error bars represent the standard deviation of the mean. 
 
Doubling time (h)
 261 
[Figure D.8 plot: DCW (g/L) by arsenate concentration (ppb); conditions ranged from 0 to 600 
ppb.] 
 
Figure D.8 The average ash-free dry weights for the different arsenate concentrations tested. The 
error bars represent the variance resulting from an ANOVA (2-factor without replacement) 
analysis. The error bars represent the standard deviation of the mean. 
Discussion 
The initial testing revealed a 28% decrease in doubling time when arsenate was present in 
AWCM. However, this decrease was not observed when arsenate was added to B8.7SiS. Witch 
Creek Water contains an extensive list of chemicals that are not present in B8.7SiS. Chemicals 
in AWCM (e.g. antimony) may interact synergistically with arsenic and contribute 
to an increased growth rate and TAG accumulation.  
In the B8.7SiS + As studies, the TAG concentration decreased considerably compared to 
previous RGd-1 reports in B8.7SiS. This indicates that the addition of arsenic to B8.7SiS had a 
negative effect on TAG accumulation. However, there were also substantially lower concentrations 
of phosphorus in these studies compared to standard Bold’s phosphorus concentrations, perhaps 
resulting in insufficient phospholipids to build lipids.293  
Arsenotriacylglycerols, arsenic hydrocarbons, and arsenic fatty acids have been observed in 
marine algae including the green alga, Coccomyxa. When transesterified, the arsenophosphate 
group would be cleaved off, resulting in fatty acid methyl esters (FAMEs).289, 294, 295 
Transesterification would cleave off the head group containing the arsenic, leaving free fatty 
acids and a pool of soluble arsenic. An additional arsenic precipitation step would be required to 
remove the majority of the arsenic from this pool. Arsenic may instead exist midchain or at 
the end of the chain in arsenotriacylglycerols and arsenic fatty acids. In that case, it would not be 
possible to obtain arsenic-free FAMEs, since transesterification cleaves the head group. It is not 
likely that FAMEs containing arsenic will be combusted for use as biodiesel or used for 
nutraceuticals. Another step would be required to separate arsenic-free FAMEs from 
arseno-FAMEs for use in biodiesel combustion. As mentioned above, when diatoms methylate 
arsenic, it becomes less toxic. A more favorable option would be simply to use diatoms or other 
algae that are able to assimilate arsenic into the biomass to bioremediate arsenic-contaminated sites.  
According to the results from the initial testing, RGd-1 and its potential phycosphere 
bacteria may have adapted to the chemical constitution of Witch Creek so that in the presence of 
arsenic, RGd-1 may have a reduced doubling time and increased TAG accumulation. The 
co-habiting bacterium, Brevundimonas sp. strain KM-427, has been found to contain the 
following genes in the Ars operon: arsH (arsenic resistance), arsM (As(III)-methyltransferase), 
arsR (arsenate reductase) and acr3 (arsenic resistance). These genes are involved in arsenic 
resistance.296, 297 Brevundimonas sp. strain KM-427 may contribute to RGd-1 success in high 
arsenic conditions.  
 The observed RGd-1 doubling time was the lowest when grown in 0 phosphorus conditions 
with 10 and 50 ppb arsenate, and the FAME concentrations were the lowest in these conditions as well. 
Because arsenic is an analog of phosphate, it can potentially replace phosphate in molecules such 
as arsenolipids.289 Arsenic may inactivate the phosphate active transport system as well as inhibit 
glucose metabolism.277, 298, 299 This may explain why the doubling time was lowest in the lowest 
arsenate concentrations. There may have been just enough arsenate to improve the growth rate 
without negatively impacting other aspects of metabolism like glucose metabolism.  
Summary & Conclusions 
 The fastest growth rates occurred in the 0 phosphorus, 10 and 50 ppb arsenate conditions. 
These arsenate concentrations may be promising for future work, especially in AWCM. 
However, the low FAME results are currently sub-optimal compared to previous results. Future 
work at higher phosphorus concentrations may result in high FAME concentrations. Further, it is 
necessary to determine whether arsenic is assimilated or respired. This work has a potential 
application for arsenic bioremediation through assimilation of arsenic into diatom biomass as 
arsenolipids.  
Sodium arsenite – Supplementary data 
 To determine the effects of arsenite on RGd-1, cultures were grown in B8.7SiS with 
added sodium arsenite in low (10-50 ppb) and high (100-600 ppb) concentrations. No other 
modifications to the growth medium were made. Cultures were grown in 1.25 L photobioreactors 
illuminated with 400 μmol photons m⁻² s⁻¹ of light and grown at 27 ± 1 °C. When grown in 
B8.7SiS + sodium arsenite, there was no consistent effect on doubling time (Figure D.9) or Nile 
Red fluorescence (Figure D.10).  
[Figure D.9 plot: doubling time (h) by arsenite concentration (ppb); conditions ranged from 0 to 
600 ppb.] 
 
Figure D.9 The doubling time for RGd-1 cultures grown in the presence of sodium arsenite in place 
of sodium arsenate at varying concentrations. 
 
[Figure D.10 plot: Nile Red fluorescence (rfu) by arsenite concentration (ppb); conditions ranged 
from 0 to 600 ppb.] 
 
Figure D.10 The Nile Red fluorescence for RGd-1 grown in the presence of sodium arsenite 
instead of sodium arsenate at varying concentrations. 
 
APPENDIX E 
 
STRATEGIES FOR OPTIMIZING BIONANO AND DOVETAIL EXPLORED THROUGH A 
SECOND REFERENCE QUALITY ASSEMBLY FOR THE LEGUME MODEL, MEDICAGO 
TRUNCATULA 
  
Manuscript Information 
 
 
Karen M. Moll, Peng Zhou, Thiruvarangan Ramaraj, Diego Fajardo, Nicholas P. Devitt, Michael 
J. Sadowsky, Robert M. Stupar, Peter Tiffin, Jason R. Miller, Nevin D. Young, Kevin A.T. 
Silverstein, Joann Mudge 
  
BMC Genomics 
 
Status of Manuscript:  
____ Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
_x__ Published in a peer-reviewed journal 
 
18:1 
 
 
  
Abstract 
Third generation sequencing technologies, with sequencing reads in the tens of kilobases, 
facilitate genome assembly by spanning ambiguous regions and improving continuity. This has 
been critical for plant genomes, which are difficult to assemble due to high repeat content, gene 
family expansions, segmental and tandem duplications, and polyploidy. Recently, high-
throughput mapping and scaffolding strategies have further improved continuity. Together, these 
long-range technologies enable quality draft assemblies of complex genomes in a cost-effective 
and timely manner. Here, we present high quality genome assemblies of the model legume plant, 
Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano 
technologies. To test these technologies for plant genome assembly, we generated five 
assemblies using all possible combinations and ordering of these three technologies in the R108 
assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary 
gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio 
alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, 
resulted in notable improvements compared to Dovetail or BioNano alone.  A combination of 
PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. 
truncatula accession widely used in studies of functional genomics. This strategy proved 
efficient and cost-effective for developing a quality draft assembly compared to traditional 
reference assemblies. As a test for the usefulness of the resulting genome sequence, the new 
R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a 
previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 
Mb of novel sequence not present in the earlier A17 reference assembly. 
 
Background 
Next generation sequencing technologies such as 454, Illumina, and SOLiD became 
available in the late 2000s.17, 300 These technologies have the advantage of extremely high 
throughput and much lower cost per sequenced base compared to Sanger sequencing.26, 301-303 
Long read sequencing technologies, such as PacBio and Oxford Nanopore, produce reads in the 
tens-of-kilobases range, much longer than what was possible even with traditional Sanger 
technology. However, they also have higher error rates, lower throughput, and higher costs per 
base compared to the short read technologies. Recently, PacBio throughput and cost per base 
have improved to the point that de novo plant genome assemblies using only PacBio are 
possible.304, 305 
Concomitantly, the throughput and cost of long-range scaffolding and mapping 
technologies that can increase continuity of an assembly have also improved dramatically. 
Traditional physical maps, dependent on expensive BAC library preparation, have given way to 
a variety of new technologies, including Opgen, Keygene, BioNano, and Nabsys maps.25, 306-309 
BioNano is a high throughput optical mapping technology that utilizes endonucleases to nick 
long DNA molecules at the enzyme’s recognition site, incorporating fluorescent nucleotides to 
obtain sequence-based patterns. The specific patterns are then used to assemble DNA molecules 
into a larger genome map, which can then be used to direct and improve a de novo genome 
assembly.310 
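
The idea of a sequence-based label pattern can be illustrated with an in silico map. The sketch below is a simplified illustration rather than the BioNano pipeline: it locates a recognition motif on both strands of a sequence and reports the spacing between labels. The motif GCTCTTC is used purely as an assumed example (the enzyme used in this work is not named in this section), and the function names and toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (N allowed)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def label_positions(seq, motif):
    """Sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    positions = set()
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            positions.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(positions)

def label_spacings(positions):
    """Distances between consecutive labels, i.e. the pattern being matched."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Toy sequence with one forward-strand site and one reverse-strand site.
toy = "AAGCTCTTCAAAAAGAAGAGCTT"
sites = label_positions(toy, "GCTCTTC")
spacings = label_spacings(sites)
```

In an optical map, it is the ordered list of spacings rather than the base sequence itself that is compared between molecules and the in silico digest of an assembly.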
Genomic architecture analyses also can be achieved by sequencing libraries produced 
from chromatin proximity ligation methods such as Hi-C.311 Dovetail Chicago libraries are 
similar to Hi-C but rely on library preparation from in vitro rather than in vivo reconstituted 
chromatin that has been cross-linked and sheared. Dovetail Chicago libraries also rely on 
extraction of high molecular weight DNA, which limits input DNA length compared to Hi-C, 
which uses intact chromosomes. These libraries retain proximity signal with sequences 
physically close together being linked more often than those farther apart. This generates 
sequence pairs with insert sizes that can be as large as the size of the input DNA, typically 
~100 kb, for use in scaffolding with Dovetail’s in-house software.29 
Although BioNano and Dovetail are both long-range scaffolding technologies, there are 
several important differences. While both rely on restriction endonuclease digestions, each 
technology uses different restriction enzymes, potentially introducing different regional 
biases. Dovetail and BioNano also differ in the way they handle gaps. Dovetail does not attempt 
to size the gap, but instead adds 100 Ns between scaffolds that it joins. By contrast, BioNano 
estimates gap size. Consequently, BioNano can appear to increase scaffold size more when the 
same scaffolds are joined with both technologies. In addition, BioNano does not automatically 
split sequences while Dovetail does. BioNano produces a file with possible chimeric sequences, 
but splitting of these sequences requires manual intervention by the user. 
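
The effect of the two gap policies on scaffold length can be sketched with toy data. This example is illustrative only: the function names, contigs, and gap sizes are hypothetical, and it does not reproduce the behavior of either vendor's software beyond the fixed 100 N convention described above.

```python
def join_fixed_gap(contigs, gap_len=100):
    """Join contigs with a fixed-length run of Ns (the unsized-gap policy)."""
    return ("N" * gap_len).join(contigs)

def join_sized_gaps(contigs, gap_sizes):
    """Join contigs using one estimated gap size per junction (the sized-gap policy)."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size per junction")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)
        parts.append(contig)
    return "".join(parts)

# Three toy 1 kb contigs joined in the same order under each policy.
contigs = ["ACGT" * 250, "GGCC" * 250, "TTAA" * 250]
fixed = join_fixed_gap(contigs)                # 3,000 nt of sequence + 2 x 100 Ns
sized = join_sized_gaps(contigs, [500, 2500])  # 3,000 nt of sequence + 3,000 Ns
```

The same joins therefore yield a longer scaffold under the sized-gap policy whenever the estimated gaps exceed 100 nt, which is why scaffold length alone is not a like-for-like comparison between the two technologies.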
These new sequencing and mapping technologies have increased throughput, driven 
down costs, and introduced important technological advantages facilitating the sequencing of 
plant genomes, which are notoriously difficult due to large-scale duplications and repeats.312 
Indeed, these technologies are enabling the construction of multiple high quality plant genome 
assemblies302, 304, 313-324 and are now poised to increase the number of sequenced plant genomes 
even further. 
Because legumes (family Fabaceae) are important in both agriculture and natural 
ecosystems, primarily due to their capacity to form symbiotic relationships with nitrogen fixing 
bacteria, multiple genome assemblies are now available. Reference assemblies exist for lotus 
(Lotus japonicus),325 soybean (Glycine max),326 medicago (Medicago truncatula),327 chickpea 
(Cicer arietinum),328 mungbean (Vigna radiata)329 and peanut (Arachis sp.).305, 330 Recently, 
multiple genome assemblies of a single plant species have begun to appear, enabling the 
identification of variation in genome content and structure segregating within species,331-335 
including legumes.331, 334 
Medicago truncatula is a widely studied legume genome, especially in the area of plant-
bacterial symbioses. Two Medicago accessions have been mainly used for genomic studies, 
R108 and A17.327, 336 The relationship of R108 to A17, the accession used for generating the M. 
truncatula reference genome, makes it valuable both for a technology comparison and as a 
second M. truncatula assembly. Genotype R108 is one of the most distant M. truncatula 
accessions from A17.337 Relative to A17, R108 has much higher transformation efficiency, has a 
shorter generation time, and is easier to germinate, making it attractive for genetic studies.338 
R108 is also important to the plant and symbiosis communities because it is the accession 
that was used to create a large Tnt1-insert population, widely used in functional analysis.336, 338 
Having two high quality references in Medicago therefore allowed us to perform comprehensive 
genome-scale comparisons between the two assemblies, revealing additional novel R108 
sequences as well as increased fine-structure details of important re-arrangement events 
compared to previous analyses using ALLPATHS-LG assemblies.334  
M. truncatula has a modest genome size, approximately 465 Mb.339 However, it also has 
an evolutionary history of whole genome duplications340, 341 and frequent local duplications, 
which appear to be particularly common in this plant species,327 both of which make assembly 
difficult. We therefore generated and evaluated five combinations of PacBio, BioNano, and 
Dovetail technology to see how the technologies could complement each other and to explore 
differences in the ordering of technologies. Ultimately, we present a second, high quality 
reference genome for M. truncatula accession R108, based on an optimized combination of the 
three sequencing/mapping technologies.  
Results 
 Assembly Pb was generated using ~100X PacBio coverage and the FALCON assembler 
followed by Quiver polishing. Four additional assemblies were then created that had either 
BioNano (PbBn), Dovetail (PbDt), or both scaffolding technologies added onto the base 
assembly.  The assemblies with both scaffolding technologies were created by applying BioNano 
and then Dovetail (PbBnDt) or Dovetail and then BioNano (PbDtBn). 
Assembly Continuity 
 The Pb base assembly had just over 1,000 contigs with no gaps in the sequence (Table 
E.1). It totals just under 400 Mb compared to 412 Mb assembled in the M. truncatula A17 
reference out of the estimated 465 Mb genome size. The contig N50 for the Pb assembly is 3.77 
Mb and the longest sequence is 13.49 Mb. We then added mapping or scaffolding technologies 
(BioNano and/or Dovetail) on top of this base assembly to improve scaffolding. 
Both BioNano and Dovetail (PbBn or PbDt) technologies improved the PacBio only base 
assembly in similar ways (Table E.1).  The number of scaffolds decreased in both assemblies, 
dropping by 80 scaffolds in the PbBn assembly and 68 scaffolds in the PbDt assembly while 
having little effect on total scaffold length (Table E.1). The PbBn assembly increased the 
scaffold length by approximately 1%, adding 4.4 Mb, likely reflecting the fact that BioNano, 
unlike Dovetail, sizes the gaps it makes when joining sequences. Dovetail adds 100 Ns for each 
gap it creates, adding only 11.6 kb to the scaffold length. 
Table E.1 Number and characteristics of contigs and scaffolds for each of the five assemblies. 

                          PacBio        PacBio        PacBio        PacBio        PacBio
                          (Pb)          BioNano       Dovetail      BioNano       Dovetail
                                        (PbBn)        (PbDt)        Dovetail      BioNano
                                                                    (PbBnDt)      (PbDtBn)

Assembly software         FALCON        FALCON        FALCON        FALCON        FALCON
                                        Irys          HiRise        Irys          HiRise
                                                                    HiRise        Irys
Contigs                   1,073         1,073         1,121         1,125         1,121
Contig Length             396,973,838   396,973,942   396,973,838   396,973,942   396,973,934
Contig N50 (a)            3,768,504     3,768,512     3,768,504     3,768,512     3,768,504
Scaffolds                 1,073         993           1,005         965           942
Scaffold Length           396,973,838   401,421,527   396,985,438   401,429,527   399,955,467
Maximum Scaffold Length   13,488,151    22,885,216    19,275,758    12,137,306    12,557,854
Scaffold N50 (a)          3,768,504     6,819,834     6,895,511     12,137,306    12,557,854

(a) N50s were also adjusted to use an assembly length of 400 Mb for all assemblies in order to 
facilitate comparisons across assemblies. Scaffold and contig N50s adjusted for a 400 Mb 
assembly size were identical to unadjusted N50s shown above, except for the PbDt scaffold N50 
for which the adjusted N50 was 6,348,449 nt. 
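
The N50 and the size-adjusted N50 reported in Table E.1 can be computed as in the sketch below. This is a generic implementation of the standard definition, not the script used for this work; the function name and the toy lengths are illustrative.

```python
def n50(lengths, total_length=None):
    """Length L such that sequences of length >= L cover at least half of total_length.

    With total_length=None the sum of the lengths is used (plain N50). Passing a
    fixed size (for example 400 Mb) gives a size-adjusted N50 that is comparable
    across assemblies. Returns None if the sequences cannot cover half of total_length.
    """
    target = (total_length if total_length is not None else sum(lengths)) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None

toy = [50, 40, 30, 20, 10]   # toy scaffold lengths, total 150
plain = n50(toy)             # 50 + 40 = 90 reaches 75, so N50 is 40
adjusted = n50(toy, 200)     # 50 + 40 + 30 = 120 reaches 100, so adjusted N50 is 30
```

Adjusting to a fixed size larger than the assembly total can only lower or preserve the N50, which is the direction of the change reported for the PbDt scaffold N50.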
 
The scaffold N50s increased substantially for both the PbBn and PbDt assemblies, from 
3.8 Mb in the base Pb assembly to over 6.8 Mb in both assemblies (Table E.1). Although the 
scaffold N50 was slightly higher in the PbDt assembly (6.9 Mb vs 6.8 Mb), the N50 when 
adjusted for total genome size to allow for comparisons across assemblies (adjusted N50) 
dropped to 6.3 Mb in the PbDt assembly but remained unchanged in the PbBn assembly. 
Maximum scaffold sizes increased in both assemblies, from 13.5 Mb in the Pb assembly to 22.9 
Mb in the PbBn assembly and 19.3 Mb in the PbDt assembly. 
 Adding a second technology to the PbBn and PbDt assemblies resulted in two assemblies 
that differed only in the order in which the BioNano and Dovetail technologies were applied. 
Overall, the PbBnDt and PbDtBn assemblies were very similar by scaffold size metrics (Table 
E.1).  Combining all three technologies resulted in slight decreases in the number of scaffolds, 
slight increases in total scaffold length, and large increases in scaffold N50 (Table E.1).  The 
increase in continuity was particularly striking, with the scaffold N50 nearly doubling to over 12 
Mb relative to the PbBn and PbDt assemblies and more than tripling relative to the Pb base 
assembly. The maximum scaffold length was slightly larger in the PbBnDt assembly (30.4 Mb vs 
27.3 Mb in the PbDtBn assembly), though the PbDtBn assembly had a slightly larger increase 
over its input assembly (PbDt). 
As expected, given that neither BioNano nor Dovetail added a significant amount of 
sequence data, the number of contigs, contig lengths, and N50s, were nearly identical for all five 
assemblies (Table E.1). The only substantial change to the contig stats was a slight increase in 
the number of contigs when Dovetail technology was used, due to the breaking of chimeric 
contigs (Table E.1). 
Assembly Completeness 
To assess assembly completeness, we examined the proportion of genomic reads that were 
captured by each assembly. We used both PacBio reads, which were used to create the 
assemblies, and Illumina reads, which represent an independent read set. 
The base (Pb) assembly captured 91.8% of the PacBio reads and 96.8% of the 
Illumina reads.  Moreover, 95.7% of the Illumina reads aligned as pairs with expected orientation 
and distance, indicating that, at least on the local scale, the assembly is accurate. 
Because BioNano and Dovetail are scaffolding technologies, they are not expected to add 
a substantial amount of additional sequence, but rather to organize the assembly sequences into 
longer scaffolds. Indeed, the estimates of assembly completeness obtained through read capture 
did not change meaningfully upon the addition of these technologies (Supplementary Table S1). 
Gene Space Completeness 
In order to investigate the completeness of the gene space in the five assemblies we 
determined rates of capture for conserved single-copy eukaryotic genes (BUSCO)31 and an R108 
transcriptome assembly, and assessed MAKER-P annotations. Because completeness results for 
all 5 assemblies were quite similar, we discuss only results for the Pb base assembly and present 
results for the other assemblies in the supplement (Supplementary Table S2).  The BUSCO 
analysis indicates that the base assembly (Pb) captured nearly all of the genes (878 of the 956 
genes in the dataset; 91.8%).  Nearly 16% (151) of the putative single-copy genes in the BUSCO 
database were duplicated within the assemblies.  These putative duplicates might be due to true 
duplications in the R108 genome or they might be due to artificial redundancy in the assembly. 
Even though the BUSCO gene groups are generally single copy, given plant genome duplication 
rates it isn’t surprising that some of the genes are duplicated. 
In addition to looking at capture of conserved genes, we also looked at capture of an 
R108 RNA-Seq assembly that was produced independently of the genome. Assembly 
completeness results were similar to those seen with BUSCO, with approximately 92% of 
transcripts (94,519) captured. However, as would be expected, the duplication rate was much higher than 
that seen in BUSCO, which specifically focuses on single copy genes. In the R108 transcript 
assembly, 37,929 transcripts (37% of total, 40.1% of aligned transcripts) were duplicated. 
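
The capture and duplication rates quoted above follow directly from the reported counts. A short sketch of the arithmetic, using only the counts given in the text (the helper name is illustrative):

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

busco_captured = percent(878, 956)        # BUSCO genes captured by the Pb assembly
busco_duplicated = percent(151, 956)      # BUSCO genes present in more than one copy
dup_of_aligned = percent(37_929, 94_519)  # duplicated transcripts among aligned transcripts

print(f"{busco_captured:.1f}% {busco_duplicated:.1f}% {dup_of_aligned:.1f}%")
```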
Finally, we analyzed the total number of genes predicted from MAKER-P. There were 
54,111 genes compared to 50,894 gene loci in Mt4.0 (accession A17). This gives additional 
confirmation that the gene space is largely complete. Further, there may be additional genes in 
the R108 Pb assembly not found in A17 (see below). 
Joins and Breaks 
 When characterizing the joins made by BioNano and Dovetail, some interesting trends 
emerged (Supplementary Table S3). Dovetail joined more scaffolds when applied to the base 
(Pb) assembly compared to BioNano. Dovetail joined 172 Pb scaffolds into 64 PbDt scaffolds 
while BioNano joined 140 Pb scaffolds into 50 PbBn scaffolds. The same trend of more joins for 
Dovetail compared to BioNano held when adding a second scaffolding or mapping technology.  
Dovetail joined 114 PbBn scaffolds into 45 PbBnDt scaffolds and BioNano joined 96 PbDt 
scaffolds into 33 PbDtBn scaffolds. For the two contrasting assemblies created with all 
technologies, the two rounds of scaffolding resulted in a total of 254 scaffolds joined in the 
PbBnDt assembly and 268 scaffolds joined in the PbDtBn assembly, a difference of just over 
5%. While Dovetail joined more scaffolds, BioNano had a higher average number of scaffolds 
per join (Supplementary Table S3). 
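
The join totals and the per-join averages described above can be recomputed from the counts in the text. The sketch below uses only those counts; the dictionary layout and variable names are illustrative.

```python
# (input scaffolds joined, resulting super-scaffolds) for each scaffolding step
steps = {
    ("Pb", "PbDt"): (172, 64),      # Dovetail, round 1
    ("Pb", "PbBn"): (140, 50),      # BioNano, round 1
    ("PbBn", "PbBnDt"): (114, 45),  # Dovetail, round 2
    ("PbDt", "PbDtBn"): (96, 33),   # BioNano, round 2
}

# Total input scaffolds joined over both rounds for each ordering.
total_bn_dt = steps[("Pb", "PbBn")][0] + steps[("PbBn", "PbBnDt")][0]
total_dt_bn = steps[("Pb", "PbDt")][0] + steps[("PbDt", "PbDtBn")][0]
relative_difference = 100.0 * (total_dt_bn - total_bn_dt) / total_bn_dt

# Average number of input scaffolds merged into each super-scaffold.
scaffolds_per_join = {step: joined / made for step, (joined, made) in steps.items()}
```

With these counts, the BioNano steps average more input scaffolds per super-scaffold than the Dovetail steps in both rounds, in agreement with the trend noted above.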
To determine the characteristics of scaffolds that were being joined, we pulled out 
scaffolds from the input assembly that were joined by either technology in either round (Table 
E.2, Supplementary Table S4). The biggest difference between the two technologies was in the 
ability to join shorter scaffolds. Dovetail was able to join scaffolds as short as 4,765 nucleotides 
into a larger super-scaffold (in both rounds 1 and 2), whereas the minimum scaffold size that 
BioNano was able to join was 172,295 in round 1 and 98,093 in round 2. To further understand 
the ability of Dovetail to join smaller contigs, we quantified the number of input scaffolds less 
than 100kb that each technology was able to join (Supplementary Table S4). Dovetail joined 35 
sub-100kb scaffolds (17 in round 1 and 18 in round 2). BioNano, on the other hand, joined only 1 
sub-100kb scaffold in total (in round 2), and that scaffold was nearly 100kb (98,093 nt). Clearly, 
Dovetail is better at incorporating scaffolds shorter than 100 kb. 
 While Dovetail appears to be better at incorporating shorter scaffolds, it also appears to 
more effectively join longer scaffolds.  When only scaffolds of at least 100 kb were examined, 
Dovetail joined 253 input scaffolds and BioNano joined 237 across both rounds.  Similarly, 
when only very large scaffolds (at least 1 Mb) were examined, Dovetail joined 141 input scaffolds and 
BioNano joined 128 across both rounds. Dovetail had a higher number of joins at each cutoff 
when the data were broken down by each round as well (data not shown). 
Table E.2 Characteristics of Input Scaffolds that were Joined by BioNano and/or Dovetail. 

Assembly                Pb -> PbDt      Pb -> PbBn      PbDt -> PbDtBn  PbBn -> PbBnDt

Scaffolds               172             140             96              114
Max Scaffold            13,488,151      13,488,151      19,275,758      22,885,216
Scaffold N50            3,957,684       3,698,567       6,895,511       6,819,834
Scaffold N90            854,372         929,179         1,425,957       1,427,073
Min Scaffold            4,765           172,295         98,093          4,765
Total Scaffold Length   307,402,024     293,002,927     260,974,793     289,680,947
 
 To identify similarities between the two technologies, we determined whether some of 
the joins made were the same between BioNano and Dovetail. We focused on the first round, 
where each technology was added onto the Pb assembly, looking for cases where the same Pb 
scaffolds were joined into a super-scaffold. There were 47 Pb input scaffolds that were 
scaffolded by both BioNano and Dovetail, resulting in 21 scaffolds in the PbDt assembly and 20 
scaffolds in the PbBn assembly. The fact that these joins were made by two independent 
technologies improves our confidence in these joins. The fact that there were also joins 
unique to each technology is consistent with the increased continuity and additional joins that we 
are seeing in assemblies that have both technologies added. 
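
One way to identify input scaffolds joined by both technologies is to compare the sets of scaffold pairs that end up in the same super-scaffold. The sketch below assumes each assembly's joins are available as a mapping from super-scaffold name to the set of Pb scaffolds it contains; this representation, the function names, and the scaffold identifiers are hypothetical and are not the procedure used in this work.

```python
from itertools import combinations

def cojoined_pairs(joins):
    """Set of unordered input-scaffold pairs that share a super-scaffold."""
    pairs = set()
    for members in joins.values():
        pairs.update(frozenset(pair) for pair in combinations(sorted(members), 2))
    return pairs

def shared_inputs(joins_a, joins_b):
    """Input scaffolds that are joined to the same partner in both assemblies."""
    common = cojoined_pairs(joins_a) & cojoined_pairs(joins_b)
    return set().union(*common) if common else set()

# Hypothetical scaffold identifiers, for illustration only.
bionano = {"bn_super1": {"pb1", "pb2", "pb3"}, "bn_super2": {"pb7", "pb8"}}
dovetail = {"dt_super1": {"pb2", "pb3", "pb4"}, "dt_super2": {"pb8", "pb9"}}
shared = shared_inputs(bionano, dovetail)  # only pb2 and pb3 are joined together by both
```

Comparing pairs rather than whole super-scaffolds allows a join to count as shared even when one technology adds further scaffolds to the same super-scaffold.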
 In order to determine whether Dovetail was breaking apart scaffolds that BioNano had 
previously created by merging Pb scaffolds, we looked further into the Dovetail breaks. In other 
words, we asked whether any of the joins made by BioNano when generating the PbBn assembly 
were subsequently split by Dovetail when applied to the PbBn assembly to generate the PbBnDt 
assembly. From the merged scaffolds generated in the PbBn assembly, only 8 PbBn scaffolds 
were broken by Dovetail in the PbBnDt assembly and no breaks occurred directly inside the gaps 
that had been generated by BioNano (median distance from gap was 137,686 nt). We generally 
found read support spanning these regions, with half or more of the alignments having equally 
good hits to other regions of the assembly (data not shown). This indicates that these were large 
repetitive regions and it was difficult to say confidently whether the region should be joined 
(BioNano correct) or broken (Dovetail correct). 
Joins and Breaks in Relation to A17 
 We used alignments of first round assembly scaffolds (PbBn and PbDt) to A17 to predict 
whether scaffold joins were correct. If the joined pieces of a scaffold mapped to the same A17 
chromosome, this lends support for the join. Because of the evolutionary distance between R108 
and A17, rearrangements are expected, so a negative result doesn’t necessarily mean the join is 
incorrect. However, vastly different rates of A17 synteny between scaffold joins made by 
BioNano and Dovetail would suggest better accuracy for one of the technologies. 
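
The synteny test described above reduces to asking whether all pieces of a joined scaffold have the same best-hit A17 chromosome. A minimal sketch, assuming the best-hit chromosome of each piece has already been determined from the alignments; the function names and the toy joins are hypothetical.

```python
def join_is_supported(piece_chromosomes):
    """True when every joined piece maps to the same reference chromosome."""
    return len(set(piece_chromosomes)) == 1

def support_rate(joins):
    """Percentage of joins whose pieces all map to one reference chromosome."""
    supported = sum(join_is_supported(chroms) for chroms in joins.values())
    return 100.0 * supported / len(joins)

# Hypothetical best-hit chromosome for each piece of four joined scaffolds.
toy_joins = {
    "super1": ["chr1", "chr1"],
    "super2": ["chr4", "chr8"],          # pieces map to different chromosomes
    "super3": ["chr5", "chr5", "chr5"],
    "super4": ["chr2", "chr2"],
}
rate = support_rate(toy_joins)  # 3 of 4 joins supported
```

As noted above, an unsupported join under this test is not necessarily incorrect, since true rearrangements between R108 and A17 produce the same signal.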
 Scaffolds joined by BioNano mapped to the same A17 chromosome at a rate of 78.57% 
while those joined by Dovetail mapped to the same A17 chromosome at a rate of 93.75%. This 
suggests that Dovetail had a better accuracy than BioNano. Scaffolds with joins that were 
supported by both BioNano and Dovetail appear to be of higher accuracy based on alignments to 
A17. For BioNano, while over half of joins (54.54%) were from scaffolds that had similar joins 
by Dovetail, only 20.00% of joins that mapped to different A17 chromosomes were supported by 
a similar Dovetail scaffold. As a result, 90.91% of Dovetail-supported BioNano joins 
mapped to the same A17 chromosome, an increase of 12.34 percentage points over all BioNano 
joins. Dovetail had more joins than BioNano (see above), with 36.67% of the joins supported by a 
similar BioNano scaffold. A similar percentage was seen in the number of BioNano-supported Dovetail 
joins compared to all Dovetail joins (33.33%), resulting in 94.29% of BioNano-supported 
Dovetail joins aligning to a single A17 chromosome, representing an increase of 0.54 percentage points. 
 Finally, we looked at A17 synteny in the eight PbBn scaffolds that were subsequently 
broken by Dovetail in the PbBnDt assembly. Three of the scaffolds had input pieces that mapped 
to chromosome U (unknown), making it difficult to determine A17 synteny and indicating that 
repetitive sequence likely made it difficult to make a chromosome assignment. Of the 
other 5 scaffolds, 3 mapped to the same A17 chromosome, supporting the BioNano join and 2 
mapped to different chromosomes, supporting the subsequent Dovetail break. 
Gaps 
 The sizing of gaps by BioNano, versus the addition of 100 Ns by Dovetail, resulted in a 
larger number of nucleotides being added to the total scaffold length in the first round for 
BioNano compared to Dovetail (Table E.1). 
Table E.3 Characteristics of the gaps introduced into the assemblies by BioNano and Dovetail.  
Note, there are no gaps in the Pb only base assembly so it is not included. 

                    PbBn        PbDt        PbBnDt      PbDtBn

Captured Gaps       80          116         160         179
Max Gap             647,836     100         647,836     647,022
Min Gap             500         100         100         100
Mean Gap            55,595      100         27,847      16,657
Gap N50             171,515     100         171,515     105,896
Total Gap Length    4,447,585   11,600      4,455,585   2,981,533
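
The gap statistics in Table E.3 can be cross-checked against Table E.1: the number of gaps should equal contigs minus scaffolds, the scaffold length should equal the contig length plus the total gap length, and the mean gap should equal total gap length divided by gap count. The sketch below performs that bookkeeping with values transcribed from the two tables; the dictionary layout and names are illustrative.

```python
# Values transcribed from Tables E.1 and E.3.
assemblies = {
    "PbBn": dict(contigs=1073, scaffolds=993, contig_len=396_973_942,
                 scaffold_len=401_421_527, gaps=80, gap_len=4_447_585),
    "PbDt": dict(contigs=1121, scaffolds=1005, contig_len=396_973_838,
                 scaffold_len=396_985_438, gaps=116, gap_len=11_600),
    "PbBnDt": dict(contigs=1125, scaffolds=965, contig_len=396_973_942,
                   scaffold_len=401_429_527, gaps=160, gap_len=4_455_585),
    "PbDtBn": dict(contigs=1121, scaffolds=942, contig_len=396_973_934,
                   scaffold_len=399_955_467, gaps=179, gap_len=2_981_533),
}

def is_consistent(a):
    """Check gap count and scaffold length against the contig-level values."""
    return (a["contigs"] - a["scaffolds"] == a["gaps"]
            and a["contig_len"] + a["gap_len"] == a["scaffold_len"])

consistent = {name: is_consistent(a) for name, a in assemblies.items()}
mean_gap = {name: round(a["gap_len"] / a["gaps"]) for name, a in assemblies.items()}
```

For PbDt the check also recovers the fixed gap convention: 116 gaps of 100 Ns account for the 11,600 nt added to the scaffold length.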
 
 In order to see how the gap strategies of BioNano and Dovetail interact, we analyzed the 
second round assemblies (PbBnDt and PbDtBn), which have both technologies incorporated but 
with differing order. When a second scaffolding or mapping technology was added to an 
assembly that already incorporated the other technology, the gaps from the first technology were 
carried over intact. As noted above, Dovetail sometimes broke apart scaffolds that BioNano had 
put together. However, when breaking these scaffolds, Dovetail never broke the scaffolds within 
the gap generated by BioNano but rather broke it in a nearby position. In assemblies where 
BioNano was added to the PbDt assembly, the minimum gap size that BioNano introduced was 
500 nt. This minimum size might be because 500nt is the minimum gap BioNano can span. 
Alternatively, given that the assemblies are all based upon PacBio data, it may be that smaller 
gaps were easily bridged by the PacBio data itself. 
 The assemblies with both BioNano and Dovetail (PbBnDt and PbDtBn) ended up with a 
similar number of captured gaps (Table E.3). The maximum gap length was over 647 kb, 
generated when adding BioNano onto the Pb assembly. Although Dovetail doesn’t size its gaps, 
given the insert size of ~100kb, it is likely that most of the gaps fall below this range. BioNano, 
with a gap N50 of 171,515 (Table E.3), therefore was able to jump across larger distances than 
Dovetail. 
A similarly sized gap generated when adding BioNano onto the PbDt assembly traces 
back to the same Pb scaffolds as the join made by BioNano on the Pb assembly. Finally, the total 
gap length varies. Among those assemblies that contain sized gaps (PbBn, PbBnDt, and 
PbDtBn), the PbDtBn assembly has considerably fewer nts in gaps compared to the other two. 
This is somewhat surprising given the fact that this assembly has the most gaps of any assembly 
and that there were more joins made over the two rounds in the PbDtBn assembly (268) than 
over both rounds in the PbBnDt assembly (254) (Supplementary Table S3). Overall, the gap 
sizes in PbDtBn are smaller (Table E.3), accounting for the lower number of nts in gaps. 
Finally, to infer the nature of the sequence in the gaps and why contigs stop 
instead of continuing, we examined the sequence flanking the gaps (10 kb). Interestingly, the 
joins made by BioNano and Dovetail (and the breaks made by Dovetail) were enriched for 
repetitive sequence in the regions flanking the gap introduced with the join (Supplementary 
Figure S1). BioNano and Dovetail both appear to be able to jump across larger repetitive regions 
than is possible with PacBio reads. In other words, the value of the two technologies is often in 
their ability to bridge across repetitive regions that PacBio reads cannot currently cross. 
Ordering of Technologies 
The ordering of the scaffolding or mapping technologies made a difference to the 
continuity and completeness statistics (Table E.1, Supplementary Tables S1 and S2). Using 
Dovetail before BioNano provides multiple benefits. The fact that Dovetail breaks chimeric 
scaffolds automatically means that using it up front provides a cleaner assembly template for 
BioNano. Dovetail’s ability to scaffold much smaller pieces of DNA compared to BioNano 
means that if Dovetail is used up front, more joins will be made and a better base sequence 
assembly constructed. 
Final Assembly Draft 
 In order to create the best reference assembly, we gap-filled the PbDtBn assembly using 
PBJelly (named R108 version 1.0, Table E.4). The PbDtBn assembly was chosen because it had 
slightly better assembly stats compared to PbBnDt (Table E.1, Supplementary Tables S1 and 
S2). For the five preliminary assemblies interrogated above, we did not do any gap filling or 
polishing (except that the base assembly was polished with Quiver) because these methods 
would obscure the effects that the BioNano and Dovetail technologies were having on the 
assembly process. Nevertheless, PBJelly was used for gap-filling as well as super-scaffolding on 
the final assembly draft in order to improve continuity. While gap filling can be over-aggressive 
especially if flanking sequences are repetitive, having some sequence, even if not perfect, is 
often better than having just Ns. In addition, using Dovetail and then BioNano enabled us to use 
independent data to bring scaffolds together and size the gap between them, making us more 
confident with doing gap-filling. 
Table E.4 Assembly Statistics for R108 version 1.0 (PbDtBn PBJelly gap filled) and its input 
assembly (PbDtBn). 
                        R108 v 1.0       PbDtBn 
Contigs                      1,016        1,121 
Contig Length (nt)     399,348,944  396,973,934 
Contig N50 (nt)          5,925,378    3,768,504 
Scaffolds                      909          942 
Scaffold Length (nt)   402,065,285  399,955,467 
Scaffold N50 (nt)       12,848,239   12,557,854 
 
 PBJelly was able to fill many of the captured gaps, increasing the continuity of the 
PbDtBn assembly (Tables E.1 and E.4). In total, it filled in 415 of 522 gaps (79.50%). As 
expected, gap-filling was able to fill far more small than large gaps, resulting in an increase of 
the gap N50 from 12,335nt to 110,194nt, a nearly 9-fold increase. The latter is much longer than 
typical PacBio reads and may represent repeats that were too long to span with these reads. The 
total gap length was only reduced by 8.82% despite the fact that 79.50% of the gaps were filled, 
again reflecting the preferential filling of small gaps. Nevertheless, continuity is much improved. 
The number of contigs dropped by ~ 12% to just over 1000 (1016 contigs), and the contig N50 
increased from 3,768,504nt to 5,925,378nt, representing an increase of 57.23%. Gap filling had 
little effect on the number of scaffolds, scaffold N50, or total assembly size (differences between 
gap filled and ungapped assemblies were < 0.5%). 
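As a quick arithmetic check, three of the figures quoted in this paragraph can be recomputed directly from the numbers given in the text and in Table E.4. The sketch below does only that. The variable names are illustrative, and nothing here comes from the assembly pipeline itself.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Values quoted in the text and in Table E.4.
gaps_total, gaps_filled = 522, 415
contig_n50_before, contig_n50_after = 3_768_504, 5_925_378
gap_n50_before, gap_n50_after = 12_335, 110_194

filled_pct = gaps_filled / gaps_total * 100
contig_n50_gain = percent_change(contig_n50_before, contig_n50_after)
gap_n50_fold = gap_n50_after / gap_n50_before

print(f"gaps filled: {filled_pct:.2f}%")
print(f"contig N50 increase: {contig_n50_gain:.2f}%")
print(f"gap N50 fold change: {gap_n50_fold:.2f}")
```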
The completeness stats of the gap filled assembly improved slightly relative to the 
PbDtBn assembly before gap-filling (Supplementary Tables S1 and S2). The final draft assembly, 
R108 v 1.0, captured 93.2% of Pb reads and 96.8% of Illumina reads. Of the original Illumina 
readset, 95.8% were not only mapped but also properly paired, indicating that the assembly has 
captured most of the genome. The R108 v 1.0 assembly has also captured most of the gene space, with 
estimates ranging from 92.3% for the transcript assembly to 95.2% for the BUSCO gene set, 
and 55,706 genes predicted by MAKER-P. Overall, this final draft of the R108 assembly captures 
nearly all of the genome and gene space.  
Novel sequences revealed by the R108 assembly 
 A new high quality reference sequence for R108 allowed a side-by-side comparison of 
two Medicago accessions (A17 and R108). We were able to build chromosome-level synteny 
blocks between R108 and A17. We also found extensive novel sequence in the R108 assembly 
that was not part of the A17 reference assembly (Table E.5). There was nearly 23 Mb of R108 
assembly sequence that could not be found in the A17 assembly. This represents 5.7% of the 
nucleotides in the R108 genome. These “novel” sequences are likely a mix of sequences that are 
truly novel in the R108 genome as well as sequences that are present in both genomes but have 
diverged beyond our ability to detect them or sequences that are in the A17 genome but didn’t 
make it into the A17 assembly. Out of the nearly 23 Mb of novel R108 sequence, 1.6 Mb represent 
novel R108 coding sequence that could not be found in the A17 assembly, values quite similar to 
those observed with an earlier ALLPATHS-LG342 assembly of R108.334 These regions contain 
candidate R108-specific genes or genes that were deleted from A17 or arose independently in the 
R108 lineage.
 
Table E.5 R108 v 1.0 assembly characteristics in comparison to the A17 reference assembly. 
                                 Nucleotides   % Nucleotides 
Total Bases                      399,348,955         100.00% 
Repetitive                        96,760,262          24.23% 
Alignable to A17                 366,489,898          91.77% 
Bases in Synteny with A17        283,853,354          71.08% 
Novel Sequences vs A17            22,763,508           5.70% 
Novel Coding Sequences vs A17      1,623,097           0.41% 
Chromosomal-scale translocation 
Although R108 is phylogenetically distant from A17 compared to other accessions, we 
were able to align more than 280 Mb of syntenic regions in both genomes (Table E.5), 
representing over 70% of the R108 assembly. These numbers also correspond well with 
sequence comparisons based on an earlier ALLPATHS-LG assembly of R108.334 Within these 
synteny blocks, extensive variations were discovered including single nucleotide changes, small 
insertions and deletions, as well as large structural changes such as inversion and translocation. 
While most structural changes were TE-related and only involve small local regions, we 
identified two large rearrangements on chromosomes 4 and 8 between R108 and A17. Through 
synteny comparison, we found one R108 scaffold (scf005, 16.4Mb) spanning the upper arm of 
chromosome 4 and the lower arm of chromosome 8 in A17, and another two scaffolds (scf015, 
12.0Mb and scf002, 17.6Mb) together spanning the upper arm of chromosome 8 plus the lower 
arm of chromosome 4 (Figure E.1), indicating a chromosomal-scale translocation between the 
reference Medicago accession (A17) and the widely-used R108 accession.  
Figure E.1 Synteny alignment of partial chromosomes 4 and 8 between A17 and R108 confirms 
rearrangement of the long arms of the chromosomes. 
Previously, Kamphuis et al. reported a rearrangement between linkage groups 4 and 8 in 
the reference accession A17 relative to other accessions.343 Using genetic markers and linkage 
mapping, the authors hypothesized a chromosomal-scale translocation private to A17 which 
involves the lower arms of chromosomes 4 and 8.343 To date, however, the physical location of 
the rearrangement has not been determined and, in fact, the rearrangement itself has not been 
elaborated through genome sequencing. Lack of high quality genome assemblies of non-A17 
accessions certainly hindered such whole genome comparison. However, even with the whole 
genome assemblies available (including the earlier R108 ALLPATHS-LG assembly), it is still 
difficult to fully resolve rearrangement events at such chromosomal scale given the relatively 
short scaffold span of most sequencing and assembly techniques. Figure E.2 clearly illustrates 
the improvements in resolving large-scale structural variation using long PacBio reads together 
with scaffolding or mapping technologies such as Dovetail and BioNano, over traditional 
Illumina-based assembly or assembly based on PacBio reads alone. Using the same synteny 
pipeline we aligned the Illumina-based R108 assembly, assembled with ALLPATHS-LG,342 to 
A17. The rearrangement region (~50Mb) on chromosomes 4 and 8 was split into ~30 
independent scaffolds in the ALLPATHS-LG R108 assembly (Figure E.2, top panel). The 
PacBio-based assembly (Pb), on the other hand, captured the region in ~10 scaffolds and 
partially resolved the breakpoint on chromosome 4 (Figure E.2, middle panel). With the aid of 
BioNano and Dovetail technologies, the affected region was captured in four long scaffolds in 
the final R108 assembly (PacBio+Dovetail+BioNano) with all breakpoints clearly resolved 
(Figure E.2, bottom panel). We were able to pinpoint exact breakpoints of the translocation to a 
single region on chromosome 4 and three regions on chromosome 8, something that could not be 
done with the Illumina-based ALLPATHS-LG assembly (Figure E.3). Interestingly, each of the 
four breakpoints involves a gap (i.e., ‘N’s) in the A17 reference, with one 7.5 kbp gap and three 
100 bp gaps, the latter representing gaps of undetermined size (Haibao Tang, personal 
communication). These gaps indicate that the regions in and around the rearrangement 
breakpoints are structurally unstable, repetitive and/or difficult to assemble even using a BAC-
by-BAC approach. We found numerous transposable element genes near the breakpoints, 
including a reverse transcriptase, a GAG-pre integrase and a cluster of 6 transferases near 
breakpoint 1, two helicases around breakpoint 2, two retrotransposons (UBN2) and two reverse 
transcriptases around breakpoint 3, and a MULE transposase right next to breakpoint 4. 
Intriguingly, a cluster of at least 10 CC-NBS-LRRs was found both upstream and downstream of 
breakpoint 2, and two CC-NBS-LRRs were also found right next to breakpoint 3, possibly 
suggesting a structural role of these resistance genes in plant genomes.  
Figure E.2 Synteny alignment of partial A17 chromosomes 4 and 8 against syntenic 
regions in the R108 Illumina-based assembly (top panel), PacBio-based assembly (Pb, 
middle panel) as well as the gap-filled PbDtBn (v1.0) assembly (bottom panel). 
 
In addition to the translocation, we noticed two large stretches of R108 sequences (1.15 
Mb and 430 kb) downstream from the translocation breakpoints on chromosomes 4 and 8 (Figure 
E.3 red segments) that didn’t have a syntenic match in A17.  The chromosome 4 insertion in 
R108 is a ~1 Mb region with no synteny to A17 and right next to the chr4-8 translocation 
breakpoint. Both the translocation and insertion are found in several other accessions including 
HM034 and HM185 using a similar synteny comparison approach (data not shown). It is thus 
likely that the translocation is private to A17, which is consistent with Kamphuis et al.,343 and that this large insertion 
in R108 actually represents a private deletion in A17 which is expected to be found in the 
majority of M. truncatula accessions. Further examination revealed that most of the insertion is 
novel. A total of 623 kbp of novel segments that do not align anywhere in A17 were identified in 
this region with 136 genes found in this region (Supplementary Table S5). 
	
Figure E.3 Schematic of the rearrangement between chromosomes 4 and 8 in A17 (left) 
compared to R108 (right). Green segments indicate homology to A17 chromosome 4 while 
blue segments indicate homology to A17 chromosome 8. Red segments indicate sequences not 
present in the A17 reference. Breakpoint 1 (br1) is pinpointed to a 104 bp region 
(chr4:39,021,788-39,021,891) and includes a 100 bp gap. Breakpoint 2 (br2) is pinpointed to a 
7,665 bp region (chr8:33,996,308-34,003,972) and includes a 7,663 bp gap. Breakpoint 3 (br3) is 
pinpointed to a 708 bp region (chr8:34,107,285-34,107,992) and includes a 100 bp gap. 
Breakpoint 4 (br4) is pinpointed to a 277 bp region (chr8:34,275,249-34,275,525) and includes a 100 
bp gap. 
Discussion 
This work represents the first published example we are aware of examining multiple 
next generation scaffolding and mapping technologies in all possible combinations with a 
comparative analysis of their contributions. PacBio long reads combined with BioNano and 
Dovetail technologies have allowed us to generate a second, reference quality assembly for the 
model legume, M. truncatula, in the functionally-important R108 accession. In the process, we 
discovered important insights into how these technologies overlap and complement each other 
enabling us to propose an optimal strategy for their incorporation. 
Novel Sequence Was Found in the R108 Assembly 
Long reads improve the continuity of assemblies.315, 344-348 However, continuity is only 
one advantage of using long reads. The long reads help to correctly capture ambiguous regions of 
the genome in the assembly, including repeats and tandemly duplicated genes. Locally 
duplicated genes can be especially problematic as they are often collapsed or over-expanded in 
Illumina-only or even Illumina/PacBio hybrid assemblies (Miller et al, submitted). Using PacBio 
long reads, therefore, results in capture of additional sequence that is not possible with short 
reads. In addition, we capture accession specific sequences as well. In total, over 22 Mb of novel 
sequence, including 1.6 Mb of coding sequence were identified. 
Technologies Made Similar Continuity Gains and Are Valuable 
Individually  
Similar continuity gains were made by each technology in each round, as has been seen previously.313 
Both technologies improved the base Pb assembly, improving the 3.8 Mb scaffold N50 of the Pb 
assembly to just over 6.8 Mb (Table E.1). Indeed, many of the same joins were made between 
both of the technologies. Both technologies, individually, were valuable in increasing continuity. 
Despite the challenges of assembling the M. truncatula genome, with its history of whole 
genome duplication and high rate of local duplication, there are many plant genomes that are 
much more complicated than the 500 Mb, largely homozygous Medicago truncatula genome. 
Increases in genome size, repetitive content, and the number of tandem, segmental, or whole 
genome duplications will change the dynamics of the assembly and the contributions of the 
technologies. For the Medicago genome described here, the PacBio assembly came together quite well, 
making the improvements when using BioNano and Dovetail less dramatic than they might have 
been. As genome complexity increases, including repeat and duplication content, coherent 
PacBio assemblies become increasingly difficult. As PacBio assemblies become more 
fragmented with increased genome complexity, we expect that the improvement in the assembly 
when adding BioNano and/or Dovetail will become increasingly crucial, leading to greater 
relative improvements, even while becoming more challenging. The assembly improvement with 
both technologies should follow similar patterns with increased genome complexity until 
extremely high levels of complexity, especially repeat size, become limiting even for these 
technologies. 
Further Gains Were Made Using Both Technologies 
Though similar gains were seen when using either scaffolding or mapping technology, 
the use of both technologies together increased continuity gains and join numbers further (Table 
E.1 and Supplementary Table S3).313 With a combined approach the two technologies were 
complementary, enabling more joins than either Dovetail or BioNano could make 
independently. Using both scaffolding technologies in either order (PbDtBn or PbBnDt) 
increased the scaffold N50 to just over 12.1 Mb (Table E.1). 
One explanation for the complementarity between the two technologies may be a 
function of the differences in biases of the two technologies. BioNano’s information content is in 
restriction sites and the distances between them. As such, BioNano is highly dependent on the 
motif density of the restriction enzymes used,349, 350 which can vary within a genome 
(Supplementary Figure S2A). Genomic regions where motif density is high become “fragile 
sites” that destabilize the DNA, resulting in limited or no coverage in the maps and breaks 
in the genome map contigs.26, 303, 310, 350  In these regions scaffolding of the assembly simply 
cannot occur. By contrast, regions of the genome with too low of a density of cutting sites also 
will result in low label density and missed join opportunities (a minimum of eight restriction 
sites is required in each DNA molecule, which is a minimum of 150 kb). 
Dovetail is based on Hi-C technology, an extension of chromosome conformation 
capture, which has its own documented biases.351, 352 Dovetail’s information content is “contact 
probabilities,” indicating the probability that any two regions in the genome will be brought 
together during the ligation stage and is inversely correlated with distance. Dovetail, which 
incorporates Illumina sequencing, also inherits biases in next generation sequencing and 
alignment, such as biases in the amplification, shearing and mapping steps. 
Join Accuracy Appears to be Higher in Dovetail Compared To BioNano 
Using A17 synteny as a proxy for accuracy of joined R108 scaffolds, Dovetail had a 
much higher percentage of joins mapping to the same A17 chromosome compared to BioNano 
(93.75% vs 78.57%), suggesting that accuracy is higher in Dovetail than in BioNano. Further, 
when looking at joins in scaffolds supported by both technologies, Dovetail-supported BioNano 
joins mapped to the same A17 chromosome 90.91% of the time, an increase of 12.34 percentage points over all BioNano 
joins. This suggests that Dovetail confirmation increases the accuracy of BioNano joins. 
BioNano-supported Dovetail joins, however, increased mapping to the same A17 chromosome 
by only 0.54 percentage points, suggesting that BioNano confirmation did little to improve Dovetail accuracy. 
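The synteny-based proxy used here can be sketched as follows: a join counts as concordant when the two R108 segments it links map to the same A17 chromosome. The data layout (a pair of chromosome names per join) and the example joins are hypothetical; the last lines simply recompute the difference between the two percentages reported above for BioNano joins.

```python
def same_chromosome_rate(joins):
    """Percent of joins whose two flanking segments map to the same chromosome."""
    if not joins:
        return 0.0
    concordant = sum(1 for left, right in joins if left == right)
    return 100.0 * concordant / len(joins)


# Hypothetical joins: (A17 chromosome of the left segment, of the right segment).
example_joins = [
    ("chr1", "chr1"),
    ("chr4", "chr8"),  # discordant by this proxy, yet could still be a correct join
    ("chr2", "chr2"),
    ("chr5", "chr5"),
]
print(same_chromosome_rate(example_joins))  # 75.0

# Difference between the reported rates for Dovetail-supported and all BioNano joins.
gain_points = round(90.91 - 78.57, 2)
print(gain_points)  # 12.34
```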
These data argue that Dovetail joins are more accurate than BioNano joins. However, we 
cannot rule out the possibility that the larger distances that the BioNano technology spanned 
while joining scaffolds (described above) might make it less likely that two joined scaffolds fall 
into a region that is syntenic with A17 given that synteny tends to decrease with distance. 
BioNano-joined scaffolds, therefore, might map to multiple A17 chromosomes more than 
Dovetail-joined scaffolds due to synteny breakdown rather than inaccuracy of joins. However, 
given that BioNano gaps span less than 200kb and that the majority of the R108 genome has 
synteny blocks with A17 that are greater than 1 Mb (Figures 1-3),334 we expect this difference to 
be small and the difference between Dovetail and BioNano join accuracy to be real. 
Alternatively, Dovetail breaks performed much worse than joins using A17 synteny as a 
measure. Of the PbBn scaffolds subsequently broken by Dovetail in the PbBnDt assembly, only 
40% of them mapped to different A17 chromosomes, indicating that Dovetail might be breaking 
more correct BioNano joins than incorrect ones. 
A17 chromosomal mapping is far from a perfect gold standard given the evolutionary 
distance between A17 and R108. Joined segments of R108 scaffolds that map to different A17 
chromosomes may still map to the same R108 chromosome. Indeed, one of the joins shared by 
both Dovetail and BioNano that mapped to different A17 chromosomes corresponds to the 
known chromosome 4/8 translocation. This join, therefore, is correct, even though synteny to 
A17 put it on two different chromosomes. It is possible that there are other regions where 
synteny to A17 doesn’t accurately predict synteny in R108. Using long-range physical 
information, such as Hi-C data or a genetic map that involves R108, could allow us to better 
validate the BioNano and Dovetail technologies as well as to obtain chromosome-scale ordering 
of the genome assembly. 
Strengths and Weaknesses Dictate Strategy for Ordering Technologies 
For the final assembly, we chose to gap-fill the PbDtBn assembly rather than the PbBnDt 
assembly. This decision was based not only on comparisons of important assembly continuity 
and completeness statistics, as described above, but also on the knowledge we uncovered about 
the differences between the scaffolding and mapping technologies. 
One important difference between the two technologies is their ability to incorporate 
smaller scaffolds. In our study, Dovetail incorporated thirty-five small scaffolds (less than 100 
kb) over both rounds but BioNano incorporated only one. The minimum scaffold size joined by 
BioNano (98.1 kb) was more than 20 times larger than the minimum scaffold size joined by 
Dovetail (4.8 kb).  Similar results were found when applying BioNano maps to the short arm of 
wheat chromosome 7D where the optimum size for incorporation by BioNano was 90 kb or 
higher350 and sequences shorter than 30 kb could not be anchored reliably. Given that the scaffold 
N50 was 3.7 Mb in the Pb assembly to which these technologies were added, the discrepancy 
between the two technologies in joining scaffolds less than 100 kb did not have as great an effect 
on our assemblies. However, if a much more fragmented assembly were used, we would expect 
Dovetail to perform much better than BioNano if only one scaffolding or mapping technology 
were used. If both technologies are used, applying Dovetail first to incorporate the smaller 
scaffolds and create a more contiguous substrate for BioNano to use makes sense and would be 
especially critical for highly fragmented assemblies. 
A second difference in the two technologies also supports applying Dovetail prior to 
BioNano for combined strategies. Dovetail breaks sequences it identifies as chimeric as it runs 
the software. BioNano logs potential chimeric sequences, but does not induce breaks in the 
assembly without manual intervention. Hence, if BioNano is applied first, chimeric contigs may 
not yet be properly separated when the assembler’s master plan for scaffolding is being formed. 
Having a more accurate assembly up-front, as should occur when Dovetail is applied first, is 
always best before scaffolding assemblies. 
Both technologies were able to bridge larger duplicated and/or repetitive regions than was 
PacBio, which requires multiple reads long enough to span an ambiguous region. With only ten 
percent of the sequenced nts in PacBio reads longer than 18,555 nt (N10), the ability of PacBio 
to span ambiguous regions is likely limited to a similar size, though longer reads will increase 
the size of the spannable repeats. Therefore, both mapping technologies can add value for 
spanning ambiguous regions that are beyond the reach of current PacBio capabilities. However, 
both technologies are limited in the size of gap they can span. Dovetail is limited by its longest 
pairs, which in this study, likely kept joins to around 100kb or less, though without sized gaps it 
is difficult to figure out the true maximum. BioNano can join scaffolds over much larger gaps. 
The largest span made in this study created a gap of nearly 650kb, though most joins spanned 
less than 100 kb (Table E.3). Nevertheless, Dovetail and BioNano both were able to span 
ambiguous regions that were beyond PacBio’s current capability. 
Conclusions 
 The use and analysis of both BioNano and Dovetail technologies in all possible 
combinations is novel and yielded strategic information about how best to apply these strategies 
to PacBio. Both technologies were able to span repetitive regions that PacBio was unable to 
bridge. Using PacBio, followed by Dovetail and then BioNano, and then gap-filled with PBJelly, 
we have generated a second, reference quality assembly for M. truncatula. Because of the 
distance between R108 and the A17 reference as well as the inability to interbreed them to create 
a genetic map, having a second high quality M. truncatula reference has been a priority in the 
Medicago truncatula community. A second reference assembly has yielded novel sequence and 
will be an important resource for the R108 functional community to support gene-finding in the 
Tnt1 lines. The R108 reference assembly has also allowed us to investigate the details of the A17 
translocation.  
Methods 
We generated five genome assemblies: a PacBio only assembly (Pb), a PacBio base 
assembly that was scaffolded together with either Dovetail (PbDt) or BioNano (PbBn), a Pb base 
assembly that was scaffolded together with Dovetail and then BioNano (PbDtBn) and a Pb base 
assembly that was scaffolded together with BioNano and then Dovetail (PbBnDt).  The 
completeness of each assembly was evaluated by alignments of PacBio reads as well as 
independent Illumina reads, and capture of an independent transcriptome as well as core 
eukaryotic genes. For comparison, we used the A17 version 4.0 reference genome.339 
PacBio Sequencing and Assembly 
DNA for PacBio assemblies was obtained from fifty grams of young leaf tissue obtained 
from multiple plants grown in the greenhouse and dark-treated for 24 hours. High molecular 
weight genomic DNA was generated by Amplicon Express (Pullman, WA) using their standard 
BAC nuclei prep followed by a CTAB liquid DNA precipitation. 
Whole-genome DNA sequencing was performed using a Pacific Biosciences RS II 
instrument (Pacific BioSciences, Menlo Park, CA). Libraries were constructed using the PacBio 
20-Kb protocol.172 These libraries were loaded onto 122 SMRT cells and sequenced using P4/P6 
polymerase and C2/C4 chemistry with 3- and 6-hour movie times, respectively. PacBio 
sequencing yielded approximately 107X sequence coverage. A de novo assembly of PacBio 
reads was generated using FALCON315 assembler version 0.4 using default parameters. Contigs 
smaller than 1kb were removed. In order to improve the accuracy of the assembly, Quiver 
polishing was done on SMRT portal  (version smrtanalysis_2.3.0.140936.p5.167094) using the 
"RS_Resequencing" protocol using the latest version available at the time. 
Dovetail 
DNA from Amplicon Express (described above) was used. A Chicago library (Dovetail 
Genomics LLC, Santa Cruz, CA)29 was generated using the DpnII restriction endonuclease 
(GATC). Briefly, this entailed reconstituting chromatin using purified histones and chromatin 
assembly factors, followed by cross-linking the chromatin using formaldehyde. DNA was then 
digested using the DpnII restriction endonuclease. The resulting sticky ends were filled in with 
thiolated and biotinylated nucleotides. A blunt end ligation of free ends followed by removal of 
the crosslinking and proteins yielded fragments with DNA joined across distances of up to about 
100 kb. An exonuclease was used to remove the biotinylated nucleotides. The thiolated 
nucleotides, which were proximal to the biotinylated nucleotides, protected the DNA from 
further exonucleation. 
The resulting DNA fragments were taken through a standard Illumina library prep, 
including shearing and adapter ligation. The library was sequenced on an Illumina HiSeq 2000 
(2x100 Base Pairs) to a physical coverage level of ~588X (67X sequence coverage). 
Sequence data generated from this library were used to scaffold the PacBio de novo 
assembly through Dovetail’s HiRise™ pipeline v. 1.3.0-57-g4d1fc9b.29 In short, Chicago library 
reads were mapped back to the assembly using a modified version of SNAP 
(http://snap.cs.berkeley.edu/). Pairs in which both reads were uniquely mapped were used to 
generate a  likelihood model representing how chromatin crosslinking brings sequences together. 
A graph where the nodes are contigs and the edges are ordered integer pairs representing 
placement of the paired reads in the contigs was used for scaffolding beginning with high 
confidence linear subpaths and prioritizing joins in order of log likelihood improvement. During 
the process, in addition to joining sequences, putative chimeric sequences were broken. An 
iterative approach was taken by feeding the resulting scaffolds back into the pipeline. 
Refinement of local ordering and orientation and gap closing using Meraculous’s Marauder 
module was done at the end [60]. 
BioNano 
Five grams of young leaf tissue was obtained from greenhouse-grown plants dark-treated 
for 24 hours before harvest. High molecular weight DNA was extracted and a de novo whole 
genome map assembly was generated using the BioNano Genomics (BNG) (BioNano Genomics, 
San Diego, CA) platform at the Bioinformatics Center at Kansas State University. High 
Molecular Weight (HMW) DNA was nicked and labeled according to the IrysPrep protocol. In 
brief, HMW DNA was double digested by a cocktail of single-stranded nicking endonucleases, 
Nt.BspQI (GCTCTTC) and Nt.BbvCI (CCTCAGC), and then labeled with a fluorescent-dUTP 
nucleotide analog using Taq polymerase. Nicks were ligated with Taq DNA ligase and the 
backbone of the labeled DNA was stained using the intercalating dye, YOYO-1. The nicked and 
labeled DNA was then loaded onto an IrysChip for imaging automatically on the Irys system 
(BioNano Genomics). BNG molecules were filtered with a minimum length of 150 kb and 8 
minimum labels. A p-value threshold for the BNG assembler was set to a minimum of 2.6e-9. 
Molecules were assembled with BioNano Pipeline Version 2884 and RefAligner Version 
2816.349  
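The molecule filter described above (at least 150 kb long and at least 8 labels) amounts to a simple predicate. The sketch below is illustrative only: the `Molecule` record and its field names are hypothetical and do not correspond to the BioNano file format or to any BioNano software interface.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical record for one imaged molecule (not the BioNano data format)."""
    length_bp: int
    label_count: int


MIN_LENGTH_BP = 150_000  # minimum molecule length stated in the text
MIN_LABELS = 8           # minimum number of labels stated in the text


def passes_filter(molecule):
    """True if the molecule meets both the length and the label thresholds."""
    return molecule.length_bp >= MIN_LENGTH_BP and molecule.label_count >= MIN_LABELS


# Illustrative molecules only.
molecules = [Molecule(200_000, 12), Molecule(140_000, 20), Molecule(300_000, 5)]
kept = [m for m in molecules if passes_filter(m)]
print(len(kept))  # 1
```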
For BioNano scaffolding, hybridScaffold.pl version 4618 from BioNano Genomics was 
used. The input assembly fasta sequence was nicked in silico for Nt.BspQI and Nt.BbvCI labels. 
Consensus Maps (CMAP) were only created for scaffolds > 20 kbp with > 5 labels.  A p-value of 
1e-10 was used as the minimum confidence value to output initial (BNG consensus map to in silico 
cmap) and final (in silico cmap to final hybrid cmap) alignments, and a p-value of 1e-13 was 
used as the minimum confidence value to flag chimeric/conflicting alignments and to merge 
alignments. Scaffolds that were not super-scaffolded were added to the output from 
hybridScaffold.pl. 
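To illustrate what nicking a sequence in silico involves, the sketch below locates the Nt.BspQI and Nt.BbvCI recognition sequences on both strands and applies the scaffold thresholds given above (> 20 kbp and > 5 labels). It is a simplified stand-in, not the hybridScaffold.pl implementation: it reports where each recognition motif starts rather than the exact nick position, and the function names are hypothetical.

```python
MOTIFS = ("GCTCTTC", "CCTCAGC")  # Nt.BspQI and Nt.BbvCI recognition sequences
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def label_positions(seq):
    """Sorted 0-based start positions of either motif on either strand."""
    seq = seq.upper()
    positions = set()
    for motif in MOTIFS:
        for pattern in (motif, reverse_complement(motif)):
            start = seq.find(pattern)
            while start != -1:
                positions.add(start)
                start = seq.find(pattern, start + 1)
    return sorted(positions)


def eligible_for_cmap(seq, min_length=20_000, min_labels=5):
    """Apply the thresholds from the text: longer than 20 kbp with more than 5 labels."""
    return len(seq) > min_length and len(label_positions(seq)) > min_labels


print(label_positions("AAGCTCTTCAAGCTGAGG"))  # [2, 11]
```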
The BNG scaffolding pipeline identifies potential breaks that should be made to the base 
assembly in the form of a chimera file, but these suggested breaks are not made without manual 
intervention. We did not attempt to make any of the BioNano breaks. For BioNano joins, only 
joins that incorporated more than one scaffold were considered. 
BioNano sizes gaps but does not fill them exclusively with Ns. Rather, BioNano adds in 
restriction site recognition sequences within the gap according to where restriction sites were 
seen in the BioNano map. This results in hundreds of tiny contigs which break up the BioNano 
gaps into smaller fragments. For the purposes of this paper, we used the GAEMR basic stats 
default of using 200 as a minimum contig size, effectively ignoring these restriction site islands 
for calculating assembly statistics and obtaining a single gap per join. 
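The effect of the minimum contig size can be sketched as follows: runs of sequence shorter than the threshold are absorbed into the surrounding Ns, so each join yields a single gap. This is an illustration of the idea under that assumption and is not GAEMR code. The function name and the toy scaffold are hypothetical.

```python
import re


def contigs_and_gaps(scaffold, min_contig=200):
    """Split a scaffold at runs of N, absorbing sub-threshold contigs into gaps.

    Returns (contig_lengths, gap_lengths). Only gaps that lie between two
    retained contigs are reported.
    """
    contigs, gaps = [], []
    pending_gap = 0
    for match in re.finditer(r"[Nn]+|[^Nn]+", scaffold):
        run = match.group()
        is_gap = run[0] in "Nn" or len(run) < min_contig
        if is_gap:
            pending_gap += len(run)
        else:
            if contigs and pending_gap:
                gaps.append(pending_gap)
            pending_gap = 0
            contigs.append(len(run))
    return contigs, gaps


# Toy scaffold: two contigs separated by a gap containing one restriction site island.
toy = "A" * 300 + "N" * 50 + "GCTCTTC" + "N" * 40 + "C" * 250
print(contigs_and_gaps(toy))                # ([300, 250], [97])
print(contigs_and_gaps(toy, min_contig=1))  # ([300, 7, 250], [50, 40])
```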
Illumina  
 In order to compare the completeness of assemblies constructed with different 
combinations of PacBio, Dovetail, and BioNano, we collected Illumina data that was 
independent of the assemblies. Illumina short-insert paired ends were generated from an 
independent DNA sample using TruSeq v3.0 chemistry and sequenced on an Illumina HiSeq® 
2000. A total of 332,236,248 reads (71.4X coverage) of length 100 nt were generated.  
Transcriptome assembly 
To evaluate how the transcriptome was represented in the genome assemblies, the 
transcriptome of 14 day old R108 roots was sequenced using Illumina’s RNA-Seq protocol. The 
transcriptome was assembled using the Transcriptome Assembly Pipeline (BPA2.1.0).353 The 
BPA pipeline includes a kmer sweep assembly strategy with ABySS (using the kmer values of 
50, 60, 70, 80 and 90),354 followed by an OLC (overlap layout consensus) assembly with 
CAP3355 to find overlaps between contigs (unitigs). Scaffolding with ABySS and gap closure 
were performed to obtain the final assembled transcriptome sequences.354 The transcripts were 
clustered at 98% sequence identity using the CD-HIT-EST software.125 Finally, the set of 
transcript sequences was filtered by length (minimum length of 100 bp). An additional filtering 
step using ESTScan356 was performed to identify open reading frames using M. truncatula 
protein coding genes as a reference, yielding the final transcriptome set. Transcripts were 
mapped against each of the five assemblies using GMAP.357 Transcript hits were retained if 
they aligned along at least 90% of their sequence with at least 90% identity.  
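The retention rule can be expressed as a simple filter. The Python sketch below is illustrative only: the `Hit` record and its field names are hypothetical and do not correspond to GMAP output fields, and coverage and identity are defined here as aligned length over transcript length and matches over aligned length, respectively.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    transcript_id: str
    transcript_length: int  # full transcript length (nt)
    aligned_length: int     # transcript bases covered by the alignment
    matches: int            # identical bases within the aligned region

def passes(hit, min_coverage=0.90, min_identity=0.90):
    """Keep hits aligned over >= 90% of the transcript at >= 90% identity."""
    if hit.transcript_length == 0 or hit.aligned_length == 0:
        return False
    coverage = hit.aligned_length / hit.transcript_length
    identity = hit.matches / hit.aligned_length
    return coverage >= min_coverage and identity >= min_identity

hits = [
    Hit("t1", 1000, 950, 940),  # passes both thresholds
    Hit("t2", 1000, 800, 800),  # fails coverage
    Hit("t3", 1000, 920, 800),  # fails identity
]
retained = [h.transcript_id for h in hits if passes(h)]
print(retained)
```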
BUSCO 
Benchmarking Universal Single-Copy Orthologs (BUSCO) provides a quantitative 
assessment of genome assemblies based on orthologs selected from OrthoDB.31 Assembly 
assessments were performed using the plant early release of BUSCO v1.1b1, which contains 956 
genes that are present in at least 90% of the plant species used to assemble the database.31  
tBLASTn searches were used to identify candidate BUSCOs, followed by Augustus gene predictions; 
the predicted genes were then classified into lineage-specific matches using HMMER within the BUSCO package. 
Read alignments 
In order to assess the completeness of the assembly, PacBio filtered (minimum length of 
50 and minimum quality of 75) subreads were realigned to the five assemblies using the BLASR 
mapper.358 All the subreads were considered for the alignment to the assemblies (-useallccs). 
Illumina reads were aligned to the five assemblies using the Burrows-Wheeler 
Aligner (BWA),207 version 0.7.12 with a maximum of 2 paths and sam output format. 
Structural Annotation 
To understand how gene sequences were affected by the assembly strategies, the 
MAKER-P genome annotation pipeline was used to annotate the five genome assemblies.183, 186, 
187 All available M. truncatula R108 transcripts were assembled using the Trinity Assembler. All 
transcripts were from a single tissue, root, which is not ideal. Nevertheless, GMAP alignments to 
A17 indicate that the transcript assembly contains the majority of genes. Further, within the five 
assemblies, relative capture rates of these transcripts should not be biased by the lack of evidence 
transcripts from multiple tissues. 
The resulting assembly was used as input for expressed sequence tag (EST) evidence for 
MAKER-P annotations.185, 359 The MAKER-P pipeline aligns the provided ESTs to the genome 
and creates ab initio gene predictions with SNAP188 and Augustus180, 360 using evidence-based 
quality values. Each assembly was divided into ten chunks and processed through MAKER-P 
individually. Following completion of MAKER-P runs for each of the ten chunks, FASTA and GFF 
files were combined using fasta_merge and gff3_merge, respectively, both of which are included in the 
MAKER-P package. 
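How each assembly was divided into ten chunks is not specified here. One plausible approach, sketched below in Python purely as an assumption, is a greedy partition that assigns sequences (longest first) to whichever chunk currently holds the fewest bases, so that the ten MAKER-P runs receive similar workloads. The scaffold names and lengths are made up for the example.

```python
import heapq

def split_into_chunks(seq_lengths, n_chunks=10):
    """Greedy partition of sequences into n_chunks with similar total length."""
    heap = [(0, i) for i in range(n_chunks)]  # (bases assigned so far, chunk index)
    heapq.heapify(heap)
    chunks = [[] for _ in range(n_chunks)]
    # Longest sequences first; ties broken by name for reproducibility.
    for name, length in sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return chunks

# Hypothetical scaffold lengths.
lengths = {f"scaffold_{i}": 1000 * (i + 1) for i in range(25)}
chunks = split_into_chunks(lengths)
print([sum(lengths[n] for n in c) for c in chunks])
```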
Identification of structural rearrangements and novel sequences in R108 
        Each R108 PacBio-based assembly was first aligned to the A17 reference (i.e., Mt4.0) using 
BLAT.361 The resulting alignments were merged, fixed (removing non-syntenic or overlapping 
alignment blocks) and cleaned (removing alignment blocks containing assembly gaps). BLAT 
Chain/Net tools were then used to obtain a single coverage best alignment net in the target 
genome (HM101) as well as a reciprocal-best alignment net between genomes. Finally, genome-
wide synteny blocks were built for each assembly (against HM101), enabling identification of 
genome structural rearrangements including the chr4-8 translocation.  
        Based on pairwise genome comparison of R108 and A17, we obtained a raw set of novel 
sequences (present in R108 but absent in A17) by subtracting all aligned regions from the gap-removed 
assembly. Low-complexity sequences and short tandem repeats were scanned and 
removed using Dustmasker362 and Tandem Repeats Finder363. Potential contaminant sequences 
(best hit in non-plant species) were filtered by BLASTing359 against the NCBI Nucleotide (nr/nt) 
database. Genes with more than 50% of their CDS in these regions comprised the accession-specific 
gene set. Pfam analysis and functional enrichment were then performed on this novel gene list.364 
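The 50% CDS criterion amounts to an interval-overlap calculation. The following Python sketch shows one way to compute it, assuming half-open (start, end) coordinates on a single sequence; the function names and example coordinates are hypothetical and are not taken from the pipeline used in this work.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlap_length(cds, regions):
    """Total CDS bases that fall inside the (merged) regions."""
    total = 0
    for c_start, c_end in cds:
        for r_start, r_end in regions:
            total += max(0, min(c_end, r_end) - max(c_start, r_start))
    return total

def is_accession_specific(cds, novel_regions, threshold=0.5):
    """True if more than `threshold` of the CDS lies within novel regions."""
    cds = merge_intervals(cds)
    regions = merge_intervals(novel_regions)
    cds_length = sum(end - start for start, end in cds)
    if cds_length == 0:
        return False
    return overlap_length(cds, regions) / cds_length > threshold

novel = [(100, 400), (350, 600)]             # hypothetical novel regions
gene_a = [(50, 150), (300, 500)]             # 250 of 300 CDS bases are novel
gene_b = [(0, 100), (550, 650)]              # 50 of 200 CDS bases are novel
print(is_accession_specific(gene_a, novel), is_accession_specific(gene_b, novel))
```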
List of Abbreviations: 
Pb: PacBio 
Dt: Dovetail 
Bn: BioNano 
PbDt: PacBio Dovetail 
PbBn: PacBio BioNano 
PbDtBn: PacBio Dovetail BioNano 
PbBnDt: PacBio BioNano Dovetail 
 
 
 
Availability of data and material 
The R108 v1.0 assembly, sample information and the raw PacBio reads are available in GenBank 
(BioProject: PRJNA368719, BioSample: SAMN04571790, PacBio reads: SRS1353205, 
assembly MWMB00000000.1). The gene annotation (GFF3) and BioNano (BNX) files are 
available under DOI: 10.13140/RG.2.2.29595.36647 and DOI: 
10.13140/RG.2.2.32950.80964, respectively. The R108 RNA-Seq data are available in the NCBI 
sequence read archive (SRA), under BioProject accession number SRP077692. 
Additional Files 
Supplementary Figures and Tables. Contains Supplementary Figure S1 and Supplementary 
Tables S1-S5. (DOCX 110 kb) DOI 10.1186. 
 
 
 
 
 
APPENDIX F 
 
SOURCES AND RE-SOURCES:  IMPORTANCE OF NUTRIENTS, RESOURCE 
ALLOCATION, AND ECOLOGY IN MICROALGAL CULTIVATION FOR LIPID 
ACCUMULATION 
  
Manuscript Information 
 
Fields, Matthew W., Adam Hise, Egan J. Lohman, Tisza Bell, Rob D. Gardner, Luisa Corredor, 
Karen Moll, Brent M. Peyton, Greg W. Characklis, and Robin Gerlach 
 
Applied Microbiology and Biotechnology 
 
Status of Manuscript:  
____ Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
__x_ Published in a peer-reviewed journal 
 
99:11 
 
 
  
Abstract 
Regardless of current market conditions and availability of conventional petroleum 
sources, alternatives are needed to circumvent future economic and environmental impacts from 
continued exploration and harvesting of conventional hydrocarbons. Diatoms and green algae 
(microalgae) are eukaryotic photoautotrophs that can utilize inorganic carbon (e.g., CO2) as a 
carbon source and sunlight as an energy source, and many microalgae can store carbon and energy 
in the form of neutral lipids. In addition to accumulating useful precursors for biofuels and 
chemical feed-stocks, the use of autotrophic microorganisms can further contribute to reduced CO2 
emissions through utilization of atmospheric CO2.  Because of the inherent connection between 
carbon, nitrogen, and phosphorus in biological systems, macronutrient deprivation has been 
proven to significantly enhance lipid accumulation in different diatom and algae species. However, 
much work is needed to understand the link between carbon, nitrogen, and phosphorus in 
controlling resource allocation at different levels of biological resolution (cellular versus 
ecological). An improved understanding of the relationship between the effects of N, P, and 
micronutrient availability on carbon resource allocation (cell growth versus lipid storage) in 
microalgae is needed in conjunction with life cycle analysis. This Mini-Review will briefly discuss 
the current literature on the use of nutrient-deprivation and other conditions to control and optimize 
microalgal growth in the context of cell and lipid accumulation for scale-up processes.    
  
Introduction 
In modern societies, petroleum-based products and fuels have strongly influenced human 
culture and infrastructure.  For example, energy, food, and chemicals make up approximately 70% 
of commerce on the planet (www.eia.gov), and petroleum/hydrocarbons directly and indirectly 
impact these commodities.  Petroleum/hydrocarbon markets have become increasingly 
unpredictable and cause destabilized commodity prices (e.g., fuel, food).  In addition, 
increased carbon dioxide (CO2) emissions without balanced CO2 sequestration 
have contributed to increases in atmospheric CO2 levels.  The amount of carbon released in one 
year from the consumption of fossil fuels is more than 400-fold the amount of carbon that can be 
fixed via net global primary productivity (Dukes 2003). In order to offset the massive influx of 
CO2 into the atmosphere, the utilization of renewable biofuels (e.g., ethanol, butanol, H2, CH4, 
and biodiesel) is needed.   
 Bacillariophyta (diatoms) and Chlorophyta (green algae) are eukaryotic photoautotrophs 
that can utilize inorganic carbon (e.g., CO2) as a carbon source and sunlight as an energy source, 
and many microalgae can store carbon and energy in the form of neutral lipids [e.g., 
triacylglycerides (TAGs)].  Moreover, different diatoms and algae can produce and accumulate 
different precursors (e.g., carbohydrates, fatty acids, and pigments) that are value-added products.  
In addition to accumulating useful compounds for biofuels and chemical feed stocks, the use of 
autotrophic microorganisms can further contribute to reduced CO2 emissions through utilization 
of atmospheric CO2.  For these reasons, eukaryotic photoautotrophs have been studied in the 
context of lipid accumulation for over 50 years and were a focus of the U.S. Department of 
Energy’s Aquatic Species Program in the 1980s and 1990s.365 However, low petroleum prices 
eventually eroded monetary support for alternative (and renewable) energy sources until increasing 
petroleum prices over the last two decades reinvigorated interest in alternatives.  The advent and 
increased use of fracking technologies have opened up new petroleum and hydrocarbon reservoirs, 
and almost $190 x 10^9 was spent in the United States in 2012 to drill and “frac” for conventional 
hydrocarbons (www.eia.gov).  However, the process of fracking increases the production rate and 
not the ultimate supply of hydrocarbons, and peak hydrocarbon production is predicted to occur 
around 2030 (www.eia.gov).  Regardless of current market conditions and availability of 
conventional sources, alternatives are needed to circumvent future economic and environmental 
impacts from continued exploration and harvesting of conventional hydrocarbons.   
 Conservative estimates predict (assuming a lipid content of 25–30%  (w/w) in microalgae) 
that an area equivalent to 3% of the arable cropland in the United States would be required to grow 
sufficient microalgae to replace 50% of the transportation fuel needs in the United States.366, 367 
Although the interest in algal biofuels has been reinvigorated,368-370 significant fundamental and 
applied research is still needed to fully maximize algal biomass and biochemical production for 
biofuels and other products. 
 The accumulation of lipids is of substantial interest because these compounds are energy-
rich biodiesel precursors.371, 372 Much of the reported research has focused on increasing algal lipid 
accumulation upon exposing cultures to a range of environmental stresses prior to harvest.157, 224, 
372-374 Temperature variations, pH, salinity, light, osmotic, and chemical stress inducements have 
also been investigated with varying success.73 While a stress event can increase lipid accumulation, 
it can also limit biomass production, but the stress scenario provides a tractable method to study 
and understand lipid accumulation at the laboratory-scale.74 Because of the inherent connection 
between carbon (C), nitrogen (N), and phosphorus (P) in biological systems, macronutrient 
deprivation has been proven to significantly enhance lipid accumulation in different diatom and 
algae species. While nitrogen limitation is the most commonly studied stress in green algae and 
diatoms, the effect of silica limitation is regularly studied in diatoms.67, 157, 375, 376 Light and 
temperature are also known stressors that can impact lipid accumulation,372 and particular 
wavelengths have been shown to impact the rate and amount of accumulated lipid in Chlorella.377 
Keeping in mind that a vast majority of living pools of C, N, and P resides in the microbial realm,378 
much work is needed to understand the link between C, N, and P in controlling resource allocation 
both with respect to natural and man-made systems.  In this context, a 50% replacement of 
transportation fuel by renewable biological sources would impose a vast nutrient demand.379 
However, microalgal biomass/product production can be coupled to wastewater resources (e.g., 
water, N, and P), and wastewater from agricultural, industrial, and municipal activity may provide 
a cost-effective source of nutrients. Agricultural and municipal wastewater can be high in N and 
P,380-383 and thus, there is great potential for the integration of wastewater treatment and algal 
biofuel/biomass production (Figure F.1). However, an improved understanding of the relationship 
between the effects of N, P, and micronutrient availability on cellular resource allocation (cell 
growth versus lipid storage) in microalgae is needed. This Mini-Review will briefly discuss the 
current literature on the use of nutrient-deprivation and other conditions to control and optimize 
microalgal culture growth in the context of cell and lipid accumulation.  
 
Figure F.1 The biological recycling of carbon, nitrogen, and phosphorus to harvest fuel and food 
linked to sunlight to reduce net consumption of N and P and net production of C. 
Nutrient Dependent Lipid Accumulation 
Under optimal growth conditions (i.e., adequate supply of nutrients including C, N, P and 
sunlight), algal biomass productivity can exceed 30 g dry weight m^-2 day^-1;384 however, the lipid 
content of the biomass is typically very low (<5% w/w), depending on the species.384 The low lipid 
content is due to lipid biosynthesis being a metabolic process that is typically stimulated by stress 
inducement. Essentially, biomass synthesis and lipid biosynthesis compete for photosynthetic 
assimilation of inorganic carbon, and a fundamental metabolic switch is required to shift from 
biomass production to energy storage metabolism.385, 386 As described by Odum (1985), stress is a 
syndrome that consists of inputs and outputs: the input is the stressor, which is contrasted with the 
stress, or the output.387 Lipids (the output) are typically believed to provide a storage function 
within the cell that enables the organism to endure adverse environmental conditions, i.e., the 
stressor.  The output can be viewed as the cessation of cell production and the accumulation of 
lipids in response to the input of unbalanced resources (e.g., N, P, and/or sunlight). It is likely that 
there are trade-offs in terms of biomass versus lipid accumulation depending on the different levels 
of perturbation (Figure F.2). 
 
 
Figure F.2 Hypothetical performance curve for an increasingly perturbed (i.e., stressed) 
microalgal system being used to produce photoautotrophic biomass and/or lipids.  Adapted from 
Odum et al. (1979).175 
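The trade-off between biomass and lipid accumulation can be made concrete with simple arithmetic: areal lipid productivity is the product of biomass productivity and lipid content. The Python sketch below uses the unstressed figures quoted above (30 g dry weight m^-2 day^-1 and 5% lipid, treated here as round reference values); the stressed biomass productivity is a hypothetical number chosen only to illustrate the break-even calculation.

```python
def lipid_productivity(biomass_productivity, lipid_fraction):
    """Areal lipid productivity (g lipid m^-2 day^-1)."""
    return biomass_productivity * lipid_fraction

def break_even_lipid_fraction(reference_lipid_productivity, stressed_biomass_productivity):
    """Lipid fraction a stressed culture needs to match the reference lipid output."""
    return reference_lipid_productivity / stressed_biomass_productivity

# Reference (unstressed) case, using the values quoted in the text.
unstressed = lipid_productivity(30.0, 0.05)

# Hypothetical stressed case: biomass productivity is assumed to halve.
stressed_biomass = 15.0
break_even = break_even_lipid_fraction(unstressed, stressed_biomass)

print(unstressed, break_even)
```

Under these assumptions, a stressor that halves biomass productivity only pays off if it raises lipid content above the break-even fraction.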
 
 Recent research has provided evidence that lipids may also act as a reservoir for specific 
fatty acids such as poly-unsaturated fatty acids (PUFAs).388  PUFAs play a key role as structural 
components of cell membranes and as antioxidants (PUFAs can counteract free radical formation 
during photosynthesis). As such, PUFA-rich TAGs might donate specific compounds necessary to 
rapidly reorganize membranes through adaptive metabolic responses to sudden changes in 
environmental conditions.389, 390  However, a recent study showed that PUFA content in lipids can 
negatively impact biodiesel quality based upon lipids from Chlorella pyrenoidosa,391 and this 
result suggests that lipid composition, and not just amounts, should be considered.  In either case, 
lipid is an energy-rich storage compound that can be chemically transesterified to produce fatty 
acid methyl esters (FAME), the biological equivalent to diesel fuel (a.k.a., biodiesel).  However, 
to maximize lipid biosynthesis, the producing organism is typically induced through 
environmental stress conditions.372  In addition, most studies have been based upon axenic cultures 
with limited understanding of potential bacterial “contamination”, and thus, lipid accumulation 
may be different at different scales of biological resolution (discussed below).   
 Significant work has been done to identify and optimize stress inducement strategies that 
enhance lipid accumulation in microalgal species. Nutrient deprivation, specifically nitrogen 
depletion, is the most prevalent technique employed.372 This may be due to two factors: 1) Lack 
of requisite nutrients such as nitrogen limits the capacity to synthesize proteins necessary for 
biomass production (e.g., cellular division). In order to compensate, the organism must take 
advantage of alternative metabolic pathways for inorganic carbon fixation, such as fatty acid 
synthesis and hence store those de novo fatty acids as TAG.392 2) Photosynthesis and the electron 
transport chain in eukaryotic microalgae produce ATP and NADPH as energy “storage” and 
electron carrier metabolites, respectively.393  These metabolites are consumed during biomass 
production resulting in ADP and NADP+, which in turn are regenerated via photosystems.  Under 
normal growth conditions, this cycle maintains a balanced ratio of the reduced and oxidized forms 
of these metabolites; however, when biomass production is impaired due to a lack of requisite 
nutrients, the pool of NADP+ and ADP can become depleted.  This can lead to a potentially 
dangerous situation for the cell because photosynthesis is mainly controlled by light availability, 
and cannot be shut off completely.  Fatty acid synthesis consumes NADPH and ATP; therefore, 
increased fatty acid synthesis replenishes the pool of required electron acceptors in the form of 
NADP+, and de novo fatty acids are most frequently stored as lipid.394  Here we will review the 
most successful strategies involving nutrient stress to induce lipid accumulation in commonly 
studied microalgal species. 
Nitrogen and Phosphorus 
Nutrient availability is critical for cell division and intracellular metabolite cycling, and 
once nutrients such as N or P become depleted or limited in the medium, invariably a steady decline 
in cellular reproduction rate ensues.  Once this occurs, the activated metabolic pathways 
responsible for biomass production are down-regulated and cells instead divert and deposit much 
of the available C into lipid.395, 396 There have been numerous studies to compare different N 
sources in the context of maximal biomass or lipid accumulation, and the results differ 
depending on the organism. Breuer et al. (2012)397 compiled previous literature on 56 eukaryotic, 
photoautotrophic genera studied in the context of lipid accumulation (Table F.1).  
The authors chose Chlorella vulgaris, Chlorella zofingiensis, Nannochloris UTEX 1999, 
Neochloris oleoabundans, Scenedesmus obliquus, Dunaliella tertiolecta, Isochrysis galbana, 
Phaeodactylum tricornutum, and Porphyridium cruentum to conduct normalized growth and lipid 
accumulation studies with nitrate as the N-source.397  Under N-deprivation, C. vulgaris, C. 
zofingiensis, N. oleoabundans, and S. obliquus accumulated over 35% dry weight as TAG, and S. 
obliquus and C. zofingiensis had the highest TAG productivity (240 – 320 mg L^-1 day^-1) among the 
nine compared strains.    
 
 
 
 
Table F.1 Genera of 56 eukaryotic photoautotrophs previously studied and reported for the 
accumulation of lipids.  Modified from Breuer et al. (2012).185 
Amphora           Ankistrodesmus    Biddulphia        Botryococcus
Bracteacoccus     Chaetoceros       Chlamydomonas     Chlorella
Chlorococcum      Chroomonas        Crypthecodinium   Cryptomonas
Cylindrotheca     Dictyosphaerium   Dunaliella        Ellipsoidion
Emiliania         Enteromorpha      Euglena           Fragilaria
Glossomastix      Gymnodinium       Haematococcus     Hantzschia
Hemiselmis        Isochrysis        Monallantus       Monodus
Nannochloris      Nannochloropsis   Navicula          Neochloris
Nephroselmis      Nitzschia         Ochromonas        Parietochloris
Pavlova           Phaeodactylum     Phaeomonas        Polytoma
Porphyridium      Protosiphon       Prototheca        Rhodomonas
Rhodosorus        Scenedesmus       Scrippsiella      Selenastrum
Skeletonema       Stichococcus      Tetraselmis       Thalassiosira
Ulothrix          Volvox
 
When the model Chlorophyte Chlamydomonas reinhardtii was cultivated under N 
limitation, an increase in lipid was also observed. Interestingly, fully saturated C16 fatty acids were 
the most abundantly synthesized compounds, whereas polyunsaturated C18 fatty acids remained 
relatively unchanged in this organism under the tested conditions.157 While nitrate supported 
increased biomass compared to ammonium in Monoraphidium sp. SB2,398 Chlorococcum 
ellipsoideum exhibited elevated lipid levels with urea compared to nitrate.399 A different 
Scenedesmus strain (sp. R-16) was shown to have the highest lipid accumulation with nitrate 
compared to urea, peptone, or yeast extract.400 To date, nitrate is a commonly studied N source 
used to understand nutrient deprivation to induce lipid accumulation; however, different N sources 
have different effects depending on the organism. This is most likely a consequence of the typical 
habitat of the organism as well as the long-term life history of the respective species.  
As the need for nutrient recycling becomes more evident, different types and mixtures of nutrients 
(e.g., human, agricultural, industrial) must continue to be evaluated. For example, two recent 
studies investigated the ability of Chlamydomonas polypyrenoideum and Chlorella pyrenoidosa 
to grow and accumulate lipids during cultivation on dairy wastewater,149, 401 and we recently grew 
a green alga isolated from storage ponds of coal-bed water that produced lipids under nutrient 
deprivation.  Nitrogen deprivation was shown to induce lipid accumulation in the wastewater 
isolates Scenedesmus sp. 131 and Monoraphidium sp. 92 grown with ammonium, nitrate, or urea,402 and 
nitrate depletion induced lipid accumulation in Skeletonema marinoi.403 Interestingly, Ettlia oleoabundans initiated lipid 
accumulation in response to increased temperature before nitrate was completely depleted.404 
These results suggest that different combinations of potential stressors could impact lipid 
accumulation in different ways. 
 In addition to N, P starvation to induce lipid accumulation in microalgae has been studied 
as a sole stress or in combination with N-limitation.  In general, greater lipid accumulation due to 
N deprivation has been observed compared to P deprivation as reported for various Chlorella 
species.405, 406  When the marine diatom Phaeodactylum tricornutum was grown under N and P 
limitation, an increase in lipid accumulation was noticed in all limiting conditions.67, 407 However, 
cultures of P. tricornutum that were limited exclusively in N showed a more significant increase 
in TAG than cultures that were limited solely in P.  The combined limitation of both N and P 
resulted in the highest lipid concentrations in P. tricornutum.67, 74 Given the commonly accepted 
N:P ratio of 16:1 in microalgal biomass,408 the P. tricornutum work demonstrated that the external 
N:P ratio was 27:1 and the cellular N:P ratio was between 8:1 and 9:1 when lipid accumulation was 
observed.224   
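N:P ratios such as those above are molar ratios, so mass concentrations must first be converted to moles. The Python sketch below shows the conversion and a comparison against the 16:1 reference; the medium concentrations are hypothetical values chosen so that the external ratio comes out near 27:1, and the function names are illustrative.

```python
N_MOLAR_MASS = 14.007  # g/mol, nitrogen
P_MOLAR_MASS = 30.974  # g/mol, phosphorus

def molar_np_ratio(n_mg_per_l, p_mg_per_l):
    """Molar N:P ratio from elemental N and P concentrations in mg/L."""
    return (n_mg_per_l / N_MOLAR_MASS) / (p_mg_per_l / P_MOLAR_MASS)

def relative_to_reference(ratio, reference=16.0):
    """Express an N:P ratio as a multiple of the 16:1 reference ratio."""
    return ratio / reference

# Hypothetical medium: 12.21 mg/L N and 1.00 mg/L P (as the elements).
external = molar_np_ratio(12.21, 1.00)

print(round(external, 1))
print(relative_to_reference(external), relative_to_reference(8.5))
```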
Both N- and P-deprivation result in cell cycle cessation, but the relative lipid accumulation 
response is different, and this observation is most likely a consequence of cellular resource 
allocation (e.g., protein/chlorophyll vs. nucleotides).  Based upon results in P. tricornutum, we 
observed a five-fold greater increase in specific fluorescence of Nile Red, a commonly used 
indicator of lipid accumulation,115 when cells were depleted of nitrate compared to cells depleted 
of phosphate.  In addition, re-supplementation of N or P promoted cellular growth, cessation of 
lipid accumulation, and increased lipid consumption in P. tricornutum.  
Carbon 
It is important to keep in mind that when comparing different nutrient-deprived states, 
carbon above all else is absolutely required for lipid biosynthesis.409-411 Without carbon, 
independent of nutrient deprivation, biomass or lipid biosynthesis is impossible.  Therefore, the 
most successful reports of lipid induction techniques in microalgal lipid production typically 
involve elevated concentrations of inorganic carbon in tandem with N and/or P limitation.107, 115, 
412  These strategies often employ a CO2 sparge to increase dissolved CO2 above atmospheric 
concentrations, or addition of soluble inorganic carbon during inoculation or just prior to nutrient 
depletion.107, 115  It should be kept in mind that the addition of soluble inorganic carbon (e.g., 
bicarbonate) can also affect pH and osmolarity.  The addition of large amounts of dissolved 
inorganic carbon via a CO2 gas sparge can contribute significantly to the production cost in an 
algal biorefinery (e.g., Liu et al. 2013), and alternative methods to gaseous CO2-based carbon 
supply should be considered in conjunction with pH control.  Gardner et al. (2011, 2012) 
demonstrated that the dosage of small amounts of bicarbonate, solely or in combination with a 
CO2 sparge, can achieve similar algal growth and lipid production yields compared to continuous 
CO2 sparging.107, 115 The use of bicarbonate addition, versus CO2 sparging, could result in 
significantly lower equipment costs.  In either case, elevated concentrations of C combined with 
N or other nutrient deprivation have been shown to induce lipid accumulation in virtually every 
microalgal species tested.  However, an improved understanding of cellular and population 
responses to not only the respective concentrations but the ratios of macronutrients (e.g., C, N, and 
P) will improve resource utilization and promote efficient, cost-effective processes.   
Silicon Limitation 
Reports on silicon limitation have revealed that both marine and freshwater diatoms will 
accumulate lipid under Si-limiting conditions (Sharma et al., 2012),412 and diatoms possess 
immense potential as contributors to biodiesel production. When faced with Si limitation, most 
diatoms appear to direct carbon storage towards lipid,413 although the response depends on the 
Si content of the cell wall.  Diatoms incorporate biologically available Si as monomeric 
or dimeric silicic acid into siliceous cell walls (frustules), a process that requires approximately 7% of the 
energy expenditure required for the polysaccharide cell wall formation characteristic of green algae.59, 
414, 415 Diatoms produce comparatively less cellular starch, such that fixed carbon has increased 
potential to be allocated to lipid accumulation.115, 416, 413, 417   In fact, diatom cells can accumulate 
enough TAG to cause the frustules to break under silica-deplete conditions,59 potentially reducing 
the need for energy intensive procedures associated with lipid extraction in green algae.  
Numerous studies have shown increased lipid accumulation when diatoms are cultured in 
silica-deplete media.418-421 However, the majority of these studies were performed on marine 
diatoms (e.g., Cylindrotheca spp., Thalassiosira pseudonana, and Phaeodactylum tricornutum) 
grown in media containing comparatively lower silica concentrations.413, 421, 422 The results of Moll 
et al. (2014) indicate that increasing the silica concentration will increase cell numbers, which is 
vital for improving algal biodiesel productivity in terms of increased biomass.49 Therefore, while 
marine diatoms may be advantageous for biofuel applications in large-scale raceway ponds 
because they tolerate saline environments, their actual use may be limited 
until conditions are optimized for diatom cell growth and lipid accumulation.  
 While silica limitation is known to increase lipid accumulation, lipid accumulation may be 
further enhanced when silica limitation is combined with other physiological stresses. A recent study investigated the effect 
of coincident silica and nitrate limitation and HCO3- addition to promote lipid accumulation in a 
freshwater diatom. Moll et al. (2014) observed that combined silica and nitrate limitation, as well 
as sodium bicarbonate addition increased lipid accumulation compared to individual stressors with 
or without HCO3-.49 One hypothesis for this observation is the effect on the cell cycle. Olsen et al. 
(13) and Vaulot et al. (20) revealed that for Thalassiosira weissflogii and Hymenomonas carterae, 
nitrate and silica limitation resulted in halting the cell cycle at G1 and the G1/S and G2/M 
boundaries, respectively.423  It is possible that the two combined nutrient limitations at different 
periods within the cell cycle may contribute to cellular stress and ultimately lead to enhanced lipid 
accumulation in diatoms. 
Iron Limitation 
 As mentioned above, N, P, and C are the most important macronutrients, but Fe is the most 
versatile and important trace element for biochemical catalysis.  Approximately 30 to 40% of the 
world’s oceans are iron limited, and studies have investigated “iron fertilization” experiments 
whereby iron is added to High Nutrient Low Chlorophyll (HNLC) areas to induce phytoplankton 
growth and CO2 fixation.424 Iron-limited conditions are thought to alter cell physiology by 
reducing cell volume, chlorophyll content, and photosynthetic activity, and thus appear to impact 
cellular accumulation more than lipid accumulation per se.  Specifically in P. tricornutum, the 
following enzymes were down-regulated during iron starvation: β-carbonic anhydrase, 
phosphoribulokinase (PRK), two RuBisCO enzymes and a HCO3- transporter, likely resulting in 
decreased carbon fixation and cellular growth.425  The results suggest that iron limitation greatly 
impacts cell growth and accumulation, and that approximately 10 µmol Fe/mol C is needed by 
marine algae.426  Iron limitation has also been linked to increased rates of silicification, thus 
increasing cell density and cell sinking. According to Allen et al. (2008), cells grown under 
Fe-limited conditions fixed carbon 14 times more slowly than cells grown in iron-replete 
conditions.425 Since iron limitation can result in detrimental physiological effects, it is pertinent to 
determine the potential for these processes to be useful for commercial scale lipid accumulation.   
Biofilm Growth 
 One of the most significant limitations to the economical use of algae is the high cost of 
harvesting and concentrating the biomass.153, 427-429  To date, research has been focused on 
microalgae in suspended phase for lipid production, and few studies have focused on the biofilm 
growth state. However, the biofilm growth state provides some advantages over suspended growth 
systems in terms of biomass accumulation and maintenance that would be beneficial for biomass 
harvesting and concentrating prior to processing. Algal suspensions are often between 0.02% and 
0.06% total suspended solids (TSS), and significant energy is required to harvest and concentrate 
the cells to 5 to 25% TSS.  Biofilms can range from 6 to 16% TSS,429 and could potentially 
minimize biomass-processing costs.427, 428, 430 In general, the available algal biofilm studies are 
based upon wastewater treatment, biofilm structure and development, and aquaculture 
applications.427, 428, 431-433  There is a small amount of research on biofilm systems for the 
production of biomass and lipids in eukaryotic photoautotrophs,153 but very little in relation to the 
influence of environmental stresses. 
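The harvesting burden implied by these solids contents can be estimated with a short calculation. The Python sketch below assumes that solids are conserved during dewatering and that TSS percentages are on a mass basis; both are simplifying assumptions, and the function names are illustrative.

```python
def concentration_factor(initial_tss_pct, final_tss_pct):
    """Fold increase in solids content needed to reach the target TSS."""
    return final_tss_pct / initial_tss_pct

def water_removed_fraction(initial_tss_pct, final_tss_pct):
    """Fraction of the starting mass removed, assuming solids are conserved."""
    return 1.0 - initial_tss_pct / final_tss_pct

# Suspended cultures: 0.02-0.06% TSS concentrated to 5-25% TSS (ranges from the text).
suspension_min = concentration_factor(0.06, 5.0)
suspension_max = concentration_factor(0.02, 25.0)

# Biofilms: 6-16% TSS, i.e., already within or near the 5-25% target range.
biofilm_to_25 = concentration_factor(6.0, 25.0)

print(round(suspension_min, 1), round(suspension_max, 1), round(biofilm_to_25, 2))
print(water_removed_fraction(0.02, 5.0))
```

Under these assumptions, a suspended culture must be concentrated by roughly two to three orders of magnitude, whereas a biofilm needs at most a few-fold further concentration.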
Recently, Schnurr et al. (2013) reported biofilm growth under nutrient starvation to 
stimulate lipid accumulation.429 A semi-continuous flat-plate parallel horizontal photobioreactor 
system (PBR) was designed to control the bulk medium nitrogen and silicon concentrations until 
nutrient depletion and biofilm onset. Wastewater was used to seed biofilm growth and was later 
replaced by synthetic medium and pure cultures of Nitzschia palea and Scenedesmus 
obliquus. Well-attached, thick algal biofilms were observed in all experiments, until N and Si levels 
decreased to below detection limits, resulting in detachment from the substratum.  In contrast to 
suspended algae, the algal biofilms did not accumulate more neutral lipids when exposed to 
nutrient deficient conditions in these studies. Similar results were reported by Bernstein et al. 
(2014) who observed little lipid accumulation in mixed culture wastewater biofilms on the field 
scale or in laboratory-scale algal biofilm reactors seeded with a Botryococcus sp. (strain WC2B).  
Based upon these results, there appear to be fundamental differences in the way suspended 
cultures and biofilm cultures respond to nutrient deprivation. The exact reasons for differences 
between suspended and biofilm cells are unknown, but may be a consequence of altered nutrient 
cycling in biofilm cells due to altered carbon flow for cellular turnover and compound 
accumulation. A result of ‘community growth’ (i.e., biofilms) may be to accumulate excess C and 
reducing equivalents as cells and exopolymer rather than internal storage molecules (e.g., lipid). 
Future work is needed to discern the differences between the physiological states of biofilm and 
free-living cells in multiple species.  
  It is possible that benthic microorganisms would prove more useful for biofilm growth 
modes, and we have recently grown a benthic diatom in biofilm reactors that could accumulate 
lipids.  These results suggest that the two growth modes can elicit different behaviors, and 
numerous research approaches and questions need to be explored to better understand the 
feasibility and cellular responses of microalgal biofilms for biomass and lipid accumulation.   
 
 
 
Ecological Effects 
The literature offers many examples of increased lipid production in numerous algal 
species cultivated as monocultures in closed photobioreactor (PBR) or open raceway systems 
under varying nutrient limitations.  However, as demonstrated by mathematical models and field 
experiments, phytoplankton biodiversity can be correlated to increased productivity.434-436 
Furthermore, in natural freshwater systems, productivity, measured as biomass, was highest when 
there was abundant nutrient availability.437 These observations underlie the challenge of needing 
high biomass loads to maximize overall lipid production.  Obviously, productivity can fall under 
the guise of several metrics ranging from biomass, cell number, chlorophyll/pigments, and more 
recently lipid accumulation. Despite the success of increased lipid content in nutrient deprived 
monocultures, recent studies indicate that comparable lipid production can also be achieved in 
nutrient rich systems with a diverse community.   
A study by Stockenreiter et al. (2012) demonstrated increased lipid production in 
naturally occurring algal communities compared to that of single species cultivated in PBRs.438 
For freshwater systems, P is the key nutrient responsible for eutrophication and can greatly alter 
productivity when limited.438-441  Based upon observations with PBRs under nutrient-deplete 
conditions, one would expect to see substantially higher lipid content in the oligotrophic 
communities than the eutrophic. However, Stockenreiter et al. (2012) showed a linear increase in 
total algal lipid content in correlation with species richness of the examined natural communities 
and that lipid content in natural communities did not differ significantly from 22 laboratory 
monocultures (1.4 × 10^6 pg ml^-1 versus 3.3 × 10^6 pg ml^-1).438 Although in no way conclusive, results 
such as these suggest comparable lipid values in nutrient replete and deplete systems and indicate 
the need to further investigate the relationship between nutrient type and abundance in the context 
of lipid production in mixed communities.442    
A diverse community is also more resistant to invasion from other species that could 
outcompete the desired algal species.434, 443 Higher nutrient availability may also aid in algal 
cultivation by making algae less susceptible to viral infection (see below).  Coupled with 
community diversity, the relative health of each species is an important component.  “Healthy” 
algae (i.e., cells not under nutrient stress) can be more resistant to viral infection that leads to cell 
lysis.444 Rhodes and Martin (2010) developed a theoretical model that implicated high nutrient 
availability in significantly reduced viral infection, and such scenarios will be important to 
consider in microalgal cultivation processes.445  
 In contrast, despite the success of increased lipid accumulation in PBR monocultures under 
nutrient limiting conditions, economic assessments indicate that PBRs operating on a large scale 
may not be commercially viable.446 However, some have argued for hybrid systems that utilize a 
combination of both closed and open systems,447 or modified PBRs such as solid-state reactors.448 
In addition, if ponds are not well mixed, biomass loss due to dark respiration may impact 
performance for some microalgae.449 The ecology of open and closed systems will have different 
parameters and inputs that will need to be considered in order to control and optimize ecosystem 
function (e.g., biomass, lipids, value compounds). Therefore, life-cycle analyses should help direct 
research to identify complementarity between water footprint, nutrient sources, regional light 
availability, process design, and targeted lipid-producing organisms.  
 
 
Integrating Life-Cycle Analysis 
Algal biofuels have the potential to provide a substantial fraction of United States 
transportation fuel while imposing a relatively small (arable) land footprint,450 and providing 
opportunities for reducing water and nutrient consumption relative to first generation biofuels.451 
The degree to which biology and engineering can contribute to these goals will, however, be a 
function of the entire lifecycle.452 A circumscribed version of that lifecycle, one involving only the 
production cycle (distinct from the usage cycle), includes microbial growth, dewatering/drying, 
extraction/conversion, and energy/input recovery stages, and each stage involves a number of 
choices (Figure F.3).  
 
Figure F.3 Primary stages and (alternative processes) in the microalgae to fuel production 
process. 
 
With respect to benefits, life cycle analysis (LCA) has promoted system optimization by 
highlighting processing alternatives that produce a net increase in system performance, while also 
avoiding environmental “burden shifting” that can be obscured when viewing the production 
system less holistically.453  In addition, choices made in the other three production stages will have 
implications for the growth stage, with the technology selected for extraction/conversion having 
particular importance.454 While each has respective strengths and weaknesses, two of the most 
critical distinctions from a life-cycle perspective are (a) the degree of pre-conversion drying 
required455 and (b) whether the conversion process involves all of the algal biomass or only the 
lipid fraction.456 
The dependence of transesterification processes on algal lipid content can impose extra 
costs in the growth stage,454 and lipid accumulation procedures typically come at the cost of algal 
productivity.457 As noted by Quinn et al. (2013) and Chowdhury et al. (2012), increasing lipid 
content can result in an increase in process GHG (greenhouse gas) emissions, because less residual 
biomass is used in a potential energy recovery stage.458, 459 Thus, the grid energy requirement 
increases proportionally with the lipid fraction. Wet extraction transesterification processes, while 
significantly reducing the drying energy input, typically involve solvent-based extractions that lead 
to concerns over solvent disposal.460 In addition, the recycling of these solvents can be challenging 
and energy intensive due to the high volumes and accompanying wet slurry.461 Simultaneous 
extraction and transesterification processes (i.e., “reactive extraction”) offer the potential for 
increased oil yields and lower process costs,462 but the effectiveness of these processes at an 
industrial scale is still untested.463   
Hydrothermal liquefaction (HTL) processes, despite greater capital expense, also reduce 
drying/dewatering requirements through the utilization of a wet feedstock, while converting up to 
60% of the total biomass into a useable fuel product.464 HTL can result in greater fuel yields than 
those achieved via transesterification,465 and this technology may reduce the importance of 
advanced culturing methods to enhance algal lipid accumulation for biofuel production.466 
However, thermochemical conversion methods such as HTL make nutrient recycling less efficient, 
as the nutrient rich byproducts are poorly suited for direct recycling into the growth process or 
anaerobic digestion.467 In addition, N loss during the conversion process results in a substantially 
increased nutrient requirement in the production process.464   
Life cycle analysis has been, and continues to be, successfully utilized to identify optimal 
algal biofuel production pathways. Ongoing refinement and application of this analytical technique 
can lead to advances that will guide future research toward a better understanding of the 
implications of many important choices, and thereby promote the development of more cost-
effective and environmentally benign biofuel production processes. LCA has successfully 
identified synergies and tradeoffs between the growth stage and other parts of the production 
process, and results suggest that parallel research efforts involving both experimental research and 
life-cycle modeling can be effective.468 
Conclusion 
With the re-invigorated interest in alternative fuels, microalgae provide one option that will 
likely contribute to an overall plan for biomass, biochemical, and biofuel production in a more 
sustainable and efficient manner.  Given the typical ratio of C:N:P in microalgal biomass 
(C106:N16:P1), much of the research has focused on N and P (P to a lesser extent) and these two elements 
are linked in different ways to C through resource allocation at the cellular, population, and 
community levels.  In addition, the supply of C either as CO2 or bicarbonate at critical times in the 
growth cycle can significantly improve lipid and biomass productivity.  Micronutrients also play 
a role in cellular responses and activity, and Si and Fe need to be further studied with respect to 
C:N:P ratios and the allocation of C into desired compounds (e.g., lipids).  Diatoms have potential 
for important contributions to lipid and biomass production but are less studied than the green 
algae.  Many of the nutrient-deprived states have been studied with monocultures (or nearly 
axenic) as suspended cultures, and regardless of the systems used (e.g., closed reactors vs. open 
ponds), communities will assemble with different characteristics of stability, resiliency, and 
productivity.  In addition, biofilms will likely develop, and may even be desired for the traits of 
accumulated biomass that can provide advantages for harvesting.   
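
Because the C106:N16:P1 ratio cited above is molar, it can be useful to restate it on a mass basis when comparing against nutrient loadings reported in g or mg. The short Python sketch below is illustrative only; the function name is hypothetical and standard atomic masses are assumed.

```python
# Standard atomic masses (g/mol) and the molar C:N:P ratio quoted in the text.
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "P": 30.974}
MOLAR_RATIO = {"C": 106, "N": 16, "P": 1}


def mass_ratio_relative_to(reference="P"):
    """Convert the molar C:N:P ratio into a mass ratio normalized to one element."""
    masses = {el: MOLAR_RATIO[el] * ATOMIC_MASS[el] for el in MOLAR_RATIO}
    return {el: mass / masses[reference] for el, mass in masses.items()}


if __name__ == "__main__":
    ratio = mass_ratio_relative_to("P")
    print({el: round(value, 2) for el, value in ratio.items()})
```

On a mass basis the ratio is approximately 41:7.2:1 (C:N:P).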
Moreover, while not directly covered in this mini-review, other resources/conditions will 
affect the cultivation of microalgae and include water, climate (e.g., light and temperature), land, 
and location (i.e., geography). Water will be essential for any biological process, and the re-cycle 
of water will be crucial as many parts of the globe become increasingly stressed for potable water. 
Light is obviously an important parameter for phototrophs, and is inherently related to temperature 
as the need for light energy and heat-regulation scale at different proportions.  Land is an essential 
commodity whether bioreactors or ponds are used and should not compete with agricultural needs.  
The location of growth and processing facilities is a crucial aspect to be considered via LCA both 
for economic implications as well as the biology/ecology (e.g., biogeography) that can differ from 
region to region.  Therefore, targeted science and engineering research is needed to better inform 
life-cycle analyses and process design to maximize productivity, efficiency, and cost-ratios. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
APPENDIX G 
 
DIRECT MEASUREMENT AND CHARACTERIZATION OF ACTIVE PHOTOSYNTHESIS 
ZONES INSIDE WASTEWATER REMEDIATING AND POTENTIAL BIOFUEL 
PRODUCING MICROALGAL BIOFILMS 
  
Manuscript Information 
 
 
Hans C. Bernstein, Maureen Kesaano, Karen Moll, Terence Smith, Robin Gerlach, Ross P. 
Carlson, Charles D. Miller, Brent M. Peyton, Keith E. Cooksey, Robert D. Gardner, Ronald C. 
Sims 
 
Bioresource Technology 
 
Status of Manuscript:  
____ Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
__x_ Published in a peer-reviewed journal 
 
156 
 
 
 
  
Abstract 
Microalgal biofilm based technologies are of keen interest due to their high biomass 
concentrations and ability to utilize light and CO2. While photoautotrophic biofilms have long 
been used for wastewater remediation, biofuel production represents a relatively new and under-
represented focus area. However, the direct measurement and characterization of fundamental 
parameters required for industrial control are challenging due to biofilm heterogeneity. This 
study evaluated oxygenic photosynthesis and respiration on two distinct microalgal biofilms 
cultured using a novel rotating algal biofilm reactor operated at field- and laboratory-scales. 
Clear differences in oxygenic-photosynthesis and respiration were observed based on different 
culturing conditions, microalgal composition, light intensity and nitrogen availability. The 
cultures were also evaluated as potential biofuel synthesis strategies. Nitrogen depletion was not 
found to have the same effect on lipid accumulation compared to traditional planktonic 
microalgal studies. Physiological characterizations of these microalgal biofilms identify 
fundamental parameters needed to understand and control process optimization. 
 
Key words: Microalgae; Biofilm; Biofuel; Wastewater Remediation; Photosynthesis 
 
  
Introduction 
Photoautotrophic microorganisms are used as biotechnology platforms for many 
applications including biofuel production, wastewater remediation, carbon sequestration, and 
agriculture.469-471 Of these, microalgal biofuel production has been identified as especially 
promising due to its potential for sustainable supplementation or replacement of fossil fuels.5, 372 
Traditionally, microalgal biotechnologies have focused on suspended, planktonic culturing 
methodologies designed to facilitate photo-production: the capture and conversion of energy 
from photons into chemical energy stored in extractable biomolecules (e.g., lipids). This study 
focuses on characterization of oxygenic photosynthesis and respiration in photo-biofilm reactors, 
an alternative and often under-represented growth system with benefits over planktonic culture 
such as high cell density which facilitates harvesting and reduces water requirements. 
Biofilms are matrix-enclosed microbial cells attached to biological or non-biological 
surfaces.472 Photoautotrophic biofilms, composed of microalgae and/or cyanobacteria, are 
ubiquitous to nearly all photic aquatic environments. An important attribute of biofilms is that 
they both create and are functionally controlled by gradients in substrates, products and energy 
sources.473 Spatial gradients in light have been shown to directly control rates of oxygenic 
photosynthesis and corresponding oxygen concentrations inside biofilms.474 Oxygen gradients in 
biofilms are directly influenced by diffusion rates and can result in localized supersaturated 
concentrations (with respect to air saturation) during active oxygenic photosynthesis. The 
resulting high oxygen concentrations can inhibit CO2 incorporation and subsequent photo-
production of carbon storage compounds by competing as a substrate for ribulose 1,5-
bisphosphate carboxylase-oxygenase (RuBisCO) activity (Falkowski and Raven, 1997; Glud et 
al., 1992; Kliphuis et al., 2011).475 Thus, the characterization of spatial gradients in oxygenic 
photosynthesis and respiration activities is a key consideration for microalgal biofilm-based 
technologies.  
This study employed a recently developed rotating algal biofilm reactor (RABR) that was 
designed, built and tested at both the laboratory scale (lab-RABR) and pilot field scale (field-
RABR) (Fig. G.1).428 The advantage of the RABR is the ability to simultaneously facilitate algal 
growth, biomass concentration and dewatering. Biofilm reactors can also reduce the water and 
energy requirements for biomass and photo-production compared to traditional suspended 
culturing strategies.222 
  
Figure G.1 Representative photographs for: (A) the field-RABR and (B) lab-RABR culturing 
systems designed for algal biofilm culturing (insert shows cross-sectioned excised cotton cord 
substratum with biofilm growth). Note the ‘top’ and ‘bottom’ biofilm orientation corresponding 
to the outer and inner sections of the field-RABR wheel, respectively. 
The RABR and other algal-based biofilm technologies have been investigated for their potential 
to concurrently remediate wastewater and produce biofuel precursor molecules.150, 476 The RABR 
can facilitate efficient biomass harvesting via the reported spool harvesting technique.428 
However, optimal biomass harvesting practices need to be determined in the context of biofilm 
specific physiology, such as optimal biomass areal density and biofilm thickness as it relates to 
active photo-production and photosynthesis zones. 
The current study focuses on spatial physiological characterization of microalgal biofilms 
cultured through the RABR method. The specific aims of this study were to:  
(1) characterize and compare two different RABR biofilms (wastewater remediating and 
potentially biofuel-producing) in the context of active photosynthesis zones by directly 
measuring spatial gradients in steady-state oxygen and photosynthesis microprofiles, as well as 
determining rates of photosynthesis and respiration processes; and 
(2) characterize and compare the biofuel potential and (neutral lipid) precursor 
biomolecule composition in these biofilms. In addition to specific aim 2, nitrate starvation was 
investigated as a potential strategy for inducing lipid accumulation in the lab scale RABR 
biofilms. 
Materials and methods 
Laboratory Strains, Culturing Conditions, and Biomass Sampling.  
The Chlorophyte isolate Botryococcus sp. strain WC-2B (henceforth referred to as WC-2B) was 
cultured with the 8 L lab-RABR operated in batch mode. WC-2B was isolated from an 
alkaline stream in Yellowstone National Park (USA) and confirmed unialgal using SSU 18S rDNA, 
which revealed 99% alignment (1,676 bp) with Botryococcus sedeticus UTEX 2629, a strain that has 
previously been described (Senousy et al., 2004). Reactors were operated in triplicate and grown 
at 25ºC in Bold’s basal medium buffered with 25 mM 2-[N-cyclohexylamino]-ethane-sulfonic 
acid (CHES, pKa 9.3) and rotated at 15.3 RPM. All RABR experiments were loaded with 
untreated cotton cord as the biofilm-substratum (0.64 cm diameter).428 The lab-RABRs consisted 
of cords coiled onto plastic cylindrical-spools (10 cm diameter) submerged ~5 cm in the liquid 
medium. The lab-RABRs were cultured under custom light emitting diode (LED) banks (Box 
Elder Innovations, LLC and T&L Design, Box Elder, UT) programmed with LabVIEW (National 
Instruments Corp.) to simulate a diurnal cycle with photosynthetically active radiation (PAR) 
values ranging from 0-900 µmol photons·m-2·sec-1 on a 14:10 L/D diel cycle following the 
equation:  
$$ I = \cos\left[\frac{\pi}{t_L}\,(t - t_M)\right] $$
where I is the light intensity, t_L is the total light time in minutes, t is the independent time 
variable, and t_M is the midpoint time corresponding to the maximum light intensity. 
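
The diel light program can be sketched in a few lines of Python. This is a hedged illustration rather than the LabVIEW routine that was actually used: it assumes the equation reads I = cos(π/t_L · (t − t_M)), that the normalized intensity is scaled by the 900 µmol photons·m-2·sec-1 maximum, and that intensity is held at zero outside the 14 hr light period. The function name and default arguments are hypothetical.

```python
import math

MINUTES_PER_DAY = 24 * 60


def par_at(t_min, t_light_min=14 * 60, t_mid_min=7 * 60, par_max=900.0):
    """PAR (µmol photons m^-2 s^-1) at time t_min (minutes since lights-on).

    Implements I = cos(pi / t_L * (t - t_M)), scaled by par_max (assumed)
    and set to zero during the dark period (assumed).
    """
    t_day = t_min % MINUTES_PER_DAY
    # Outside the light window the cosine would go negative, so clamp to dark.
    if abs(t_day - t_mid_min) >= t_light_min / 2:
        return 0.0
    phase = math.pi / t_light_min * (t_day - t_mid_min)
    return par_max * max(0.0, math.cos(phase))


if __name__ == "__main__":
    for hour in range(0, 24, 2):
        print(f"{hour:02d}:00  {par_at(hour * 60):6.1f}")
```

With these defaults the intensity rises from zero at lights-on to the maximum at 7 hr and returns to zero at 14 hr, followed by 10 hr of darkness.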
Medium nitrate concentrations were monitored using NitraVer 5 pillow packets (HACH). 
Concentrated medium (10x) and supplemental diH2O (deionized) were added, as needed, to 
maintain nutrient replete conditions and offset evaporation. Culturing and sampling were 
performed under non-aseptic conditions (i.e., open-air). Nitrate depletion was induced (after 28 
days of replete culturing) by removing all liquid medium from the reactors followed by 
immediate replacement with Bold’s basal medium without nitrate. Nitrate deplete analysis and 
sampling were performed 60 hr post depletion. 
Biomass cell dry weights (CDW, gCDW·cm-2) were obtained throughout culturing by 
excising a known length cotton cord and its attached biofilm, followed by biofilm removal into 
preweighed aluminum weigh boats. The biomass was dried at 70°C for 18 hr until the biomass 
weight was constant. Biomass CDWs were calculated by subtracting the dry weight of the 
preweighed aluminum boat from the oven dried boat with biomass and normalizing by the 
cylindrical surface area for a known length of cotton cord substratum. 
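
The areal normalization described above can be written out as a short Python sketch. It assumes the substratum area is the lateral surface of a cylinder (π × diameter × length) using the 0.64 cm cord diameter; the function name and the example masses are hypothetical and are not measured values from this study.

```python
import math


def areal_cdw(boat_g, boat_plus_biomass_g, cord_length_cm, cord_diameter_cm=0.64):
    """Biomass areal density (gCDW per cm^2 of cord surface).

    boat_g:               dry mass of the empty, preweighed aluminum boat
    boat_plus_biomass_g:  mass of the boat plus oven-dried biomass
    cord_length_cm:       length of the excised cord segment
    """
    if cord_length_cm <= 0 or cord_diameter_cm <= 0:
        raise ValueError("cord dimensions must be positive")
    biomass_g = boat_plus_biomass_g - boat_g
    # Lateral (side) surface area of a cylinder; end faces are ignored.
    area_cm2 = math.pi * cord_diameter_cm * cord_length_cm
    return biomass_g / area_cm2


if __name__ == "__main__":
    # Hypothetical example: 5 cm of cord, 3.62 g of dried biomass.
    print(f"{areal_cdw(1.25, 4.87, 5.0):.3f} gCDW/cm^2")
```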
 
Outdoor Culturing Conditions.  
Field scale biofilms were cultured outdoors (August 10th – October 17th 2012, Logan, UT, 
USA) with a pilot scale RABR (field-RABR) unit constructed in accordance with previously 
described methods.428 Briefly, biofilms were grown on cotton cord (identical to lab-RABR 
experiments) coiled onto aluminum wheels (193 cm in diameter) which rotated (1.25 RPM) 
partially submerged in ~14,000 L tanks (~10,700 L liquid volume). An important difference 
from the lab-RABR was that the cord-substratum of the field-RABR was exposed to light and 
nutrients from top and bottom (discussed further below). The field-RABR was placed in a 
continuous flow channel of wastewater (~18.9 ºC and pH~7.4) fed at ~1.25 LPM, which was 
drawn from the final pond of the outdoor wastewater lagoon facility (Logan, UT, USA). 
Oxygen Microsensor Analysis.  
Microsensor measurements were performed using Clark-type oxygen micro-electrodes 
with outside tip diameters of 25 µm, response time < 5 s and < 5% stirring sensitivity (Unisense, 
A/S).477 Amplification and sensor positioning were controlled with a microsensor multi-meter 
coupled with an ADC216 USB converter and a motor controlled micromanipulator. Data 
collection was aided by software packages, SensorTrace Pro ver.3.0.1 and Sloper ver. 3.0.3 
(Unisense, A/S). Two-point calibrations were performed in air saturated diH2O ([O2] ≈ 260 µM) 
and in a 1 M NaOH, 0.1 M ascorbic acid solution (anoxic standard). Calibrations were repeatedly 
checked in the anoxic standard and in air saturated diH2O throughout the experiments. 
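
A two-point calibration of this kind amounts to a linear map from sensor signal to concentration. The Python sketch below is illustrative only: it assumes a linear sensor response between the anoxic and air-saturated standards, and the function name and the example signal values are hypothetical (they are not taken from the SensorTrace software or from the measurements reported here).

```python
def make_two_point_calibration(signal_anoxic, signal_air_sat, conc_air_sat_um=260.0):
    """Return a function converting raw sensor signal to [O2] in µM.

    signal_anoxic:   signal in the anoxic standard (defines 0 µM)
    signal_air_sat:  signal in air-saturated water (defines conc_air_sat_um)
    """
    if signal_air_sat == signal_anoxic:
        raise ValueError("calibration points must differ")
    slope = conc_air_sat_um / (signal_air_sat - signal_anoxic)

    def to_concentration(signal):
        # Linear interpolation/extrapolation through the two standards.
        return slope * (signal - signal_anoxic)

    return to_concentration


if __name__ == "__main__":
    calibrate = make_two_point_calibration(signal_anoxic=2.0, signal_air_sat=132.0)
    print(calibrate(132.0))  # air-saturated standard
    print(calibrate(302.0))  # a supersaturated reading
```

Readings above the air-saturated standard, such as those in illuminated biofilms, rely on extrapolating the same line.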
Microsensor measurements were performed between 21 and 25 ºC under both dark and light 
conditions (PAR = 700 µmol photons·m-2·sec-1). Spatial O2 measurements were performed in 
one-dimension (depth-wise) from the biofilm-air interface down towards the cotton cord 
substratum in 25-100 µm steps. The effective diffusion coefficient (De) for O2 in the microalgal 
biofilms was estimated to be 1.2·10^-5 cm^2·sec^-1, by assuming it to be 50% of the aqueous value 
corresponding to fresh water at 25 ºC.478 The oxygen micro-profile and light:dark shift 
techniques used here have been previously described in detail.474, 479, 480 Briefly, Fick’s law was 
used to calculate the total oxygen flux exported from the surface of the biofilm (net areal rate of 
biofilm photosynthesis or Pn) and from the photic zone inside the biofilm (net areal rate of 
photosynthesis of the photic zone or Pn,phot). Additionally, the light:dark shift measurements were 
used to estimate gross photosynthesis profiles and areal rates (Pg) which represent the total 
amount of oxygenic photosynthesis under the assumptions that: (i) there is an initial steady-state 
O2 distribution prior to darkening, (ii) the O2 consumption rate is identical between the light and 
dark time periods, (iii) the O2 diffusion coefficient remains constant during the measurement 
time at each position. Detailed calculations for oxygen transport, photosynthesis, photosynthesis-
coupled respiration and dark-respiration processes are included in the supplemental material for 
this manuscript. 
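
As a rough illustration of the two calculations summarized above, the following Python sketch estimates (a) an areal oxygen flux from the concentration gradient using Fick's first law and (b) a gross volumetric photosynthesis rate from the initial rate of O2 decrease after darkening. It is a simplified sketch under stated assumptions, not the supplemental calculations themselves: gradients are taken as least-squares slopes, the function names and example profiles are hypothetical, and the aqueous diffusion coefficient is simply back-calculated as twice the De given in the text.

```python
D_AQUEOUS = 2.4e-5             # cm^2/s, implied by De being 50% of the aqueous value
D_EFFECTIVE = 0.5 * D_AQUEOUS  # cm^2/s, 1.2e-5 as stated in the text


def _slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def surface_export_flux(depth_um, conc_um, d_eff=D_EFFECTIVE):
    """Areal O2 flux toward the biofilm surface (µmol O2 cm^-2 s^-1).

    depth_um: depths below the surface (µm, positive downward)
    conc_um:  O2 concentrations at those depths (µM)
    Fick's first law gives J = -De * dC/dz along +z (downward); the value
    returned is -J, so it is positive when O2 leaves the biofilm.
    """
    gradient = _slope(depth_um, conc_um)       # µM per µm
    gradient_cm = gradient * 1e-3 / 1e-4       # µmol cm^-3 per cm
    return d_eff * gradient_cm


def gross_photosynthesis_rate(time_s, conc_um):
    """Gross volumetric rate (µM O2 s^-1) from a light:dark shift, taken as
    the rate of O2 decrease immediately after darkening."""
    return -_slope(time_s, conc_um)


if __name__ == "__main__":
    # Hypothetical near-surface profile in the light and a dark-shift trace.
    print(surface_export_flux([0, 100, 200], [300, 400, 500]))
    print(gross_photosynthesis_rate([0, 1, 2], [500, 490, 480]))
```

A gradient of 1 µM per µm with De = 1.2·10^-5 cm^2·sec^-1 corresponds to a flux of 1.2·10^-4 µmol O2·cm^-2·sec^-1, the same order of magnitude as the areal rates reported in Table G.1.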
Lipid Analysis.  
At the time of oxygen microsensor analysis, bulk biomass was harvested from the 
RABRs and washed, by centrifugation and diH2O resuspension, four times to remove medium 
salts. After which, the biomass was pelleted and frozen for lyophilization and lipid analysis. 
Extractable precursor analysis of free fatty acid, mono-, di-, tri-acyl glycerol (FFA, MAG, DAG, 
and TAG, respectively) was performed in accordance with the reported bead beating extraction 
method coupled with gas chromatography – flame ionization detection (GC-FID).157 
Additionally, biofuel potential, defined as total fatty acid methyl esters (FAME), produced 
directly from the biomass,70, 481 along with fatty acid profiles were determined by a previously 
described method of direct in situ biomass transesterification and quantified with gas 
chromatography – mass spectrometry.157  
Results and discussion 
Biofilm Cultivation  
Biofilms were cultured on cotton cord substratum during field and laboratory scale 
RABR experiments (Fig. G.1). Samples from the lab-RABR were analyzed based on nitrate 
replete or deplete conditions. Samples from the field-RABR were separated according to growth 
orientation on the substratum. The field-RABR ‘top’ and ‘bottom’ samples correspond to 
biofilms formed on the outer and inner section of the rotating wheel, respectively. The field-
RABR top biofilms were cultured in an orientation directly exposed to ambient sun light 
(average daily maximum PAR = 1715 µmol photons·m-2·sec-1) compared to the more shaded 
bottom biofilms (average daily maximum PAR = 231 µmol photons·m-2·sec-1). Hence, there 
were four chosen biofilm sample-types analyzed and compared in this study: (i) lab-RABR 
biofilm that is nitrate replete, (ii) lab-RABR biofilm that is nitrate deplete (60 hr deplete 
culturing), (iii) field-RABR biofilm cultured on the top (outer wheel biofilm), and (iv) field-
RABR biofilm cultured on the bottom (inner wheel biofilm). It is important to emphasize that the 
laboratory and field-RABR systems are not identical and represent two different process 
objectives and are intended to be compared independently of each other. However, a future goal 
for the RABR technology is to better integrate the wastewater remediating and biofuel producing 
processes; hence a minimal number of comparisons based on basic biofilm physiology were 
made between the two systems. 

Table G.1 Measurements of areal photosynthesis rates, areal respiration rates and relevant depth 
scales for the laboratory- and field-RABR cultured biofilms. 

                                                              Field RABR            Field RABR            Laboratory RABR       Laboratory RABR
Areal rates (µmol O2·cm^-2·sec^-1)                            Top Biofilm           Bottom Biofilm        Nitrate Replete       Nitrate Deplete
Photosynthesis, Pg                                            a 11.84·10^-4         a 5.23·10^-4          a 7.51·10^-4          a 5.70·10^-4
Net areal rate of biofilm photosynthesis, Pn (%Pg)            3.01·10^-4 (25.4%)    3.55·10^-4 (67.9%)    2.31·10^-4 (30.8%)    2.41·10^-4 (42.3%)
Net areal rate of photic zone photosynthesis, Pn,phot (%Pg)   3.64·10^-4 (30.7%)    3.96·10^-4 (75.7%)    3.10·10^-4 (41.3%)    2.91·10^-4 (51.1%)
Areal respiration of the biofilm, Rlight (%Pg)                8.83·10^-4 (74.6%)    1.68·10^-4 (32.1%)    5.20·10^-4 (69.2%)    3.29·10^-4 (57.7%)
Areal respiration of the photic zone, Rphot (%Pg)             8.20·10^-4 (69.3%)    1.27·10^-4 (24.3%)    4.41·10^-4 (58.7%)    2.79·10^-4 (48.9%)
Respiration in the dark, Rdark                                0.54·10^-4            1.11·10^-4            0.65·10^-4            0.74·10^-4
Depth of photic zone, Lphot (µm)                              b 1100 ± 200          b 900 ± 200           b 675 ± 25            b 650 ± 25
Depth of oxic zone in light (µm)                              b 1750 ± 25           b 1800 ± 25           > 2675                > 2675
Depth of oxic zone in dark (µm)                               b 700 ± 25            b 450 ± 25            b 850 ± 25            b 1150 ± 25
a Mean of 2-3 independent measurements plus or minus a range of 25% from the mean. 
b Plus or minus measurement step-size, n = 2-3 

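
The areal rates in Table G.1 can be cross-checked for internal consistency. The tabulated values appear to satisfy Pg = Pn + Rlight for the whole biofilm and Pg = Pn,phot + Rphot for the photic zone, which is also why the paired percentages sum to 100%. The Python sketch below verifies this from the tabulated numbers; treating these relationships as mass balances is an inference from the table rather than a statement of the supplemental calculations, and the dictionary keys and function names are hypothetical.

```python
# Areal rates from Table G.1, in units of 1e-4 µmol O2 cm^-2 s^-1.
TABLE_G1 = {
    "field_top":    {"Pg": 11.84, "Pn": 3.01, "Pn_phot": 3.64, "R_light": 8.83, "R_phot": 8.20},
    "field_bottom": {"Pg": 5.23,  "Pn": 3.55, "Pn_phot": 3.96, "R_light": 1.68, "R_phot": 1.27},
    "lab_replete":  {"Pg": 7.51,  "Pn": 2.31, "Pn_phot": 3.10, "R_light": 5.20, "R_phot": 4.41},
    "lab_deplete":  {"Pg": 5.70,  "Pn": 2.41, "Pn_phot": 2.91, "R_light": 3.29, "R_phot": 2.79},
}


def balances_close(rates, tol=0.015):
    """True if net rate + respiration reproduces Pg within rounding error."""
    whole_biofilm = abs(rates["Pn"] + rates["R_light"] - rates["Pg"]) <= tol
    photic_zone = abs(rates["Pn_phot"] + rates["R_phot"] - rates["Pg"]) <= tol
    return whole_biofilm and photic_zone


def percent_of_pg(rates, key):
    """Express one rate as a percentage of gross photosynthesis."""
    return 100.0 * rates[key] / rates["Pg"]


if __name__ == "__main__":
    for name, rates in TABLE_G1.items():
        print(name, balances_close(rates),
              f"Pn = {percent_of_pg(rates, 'Pn'):.1f}% of Pg")
```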
The maximum specific growth rates, measured during exponential phase, were 0.09 and 
0.17 day-1 for the laboratory and field cultured biofilms, respectively. The maximum measured 
biomass areal densities (observed during stationary phase) were 0.36 and 0.65 gCDW·cm-2 for the 
lab- and field-RABRs, respectively. The final biomass areal density decreased by 0.01 gCDW·cm-2 
60 hr post nitrate depletion in the lab-RABR biofilms, potentially indicating minor biomass 
sloughing or degradation. The measured biofilm thickness (distance from substratum to biofilm 
surface at late stationary phase) was approximately 1 mm for each lab-RABR biofilm (nitrate 
replete or deplete) and approximately 2 mm for each field-RABR biofilm (top and bottom). 
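
For readers who prefer doubling times, the specific growth rates above can be converted with t_d = ln(2)/µ. The Python sketch below is a simple illustration that assumes strictly exponential growth over the interval in which µ was measured; the function name is hypothetical.

```python
import math


def doubling_time_days(mu_per_day):
    """Doubling time (days) for a specific growth rate mu (day^-1)."""
    if mu_per_day <= 0:
        raise ValueError("specific growth rate must be positive")
    return math.log(2) / mu_per_day


if __name__ == "__main__":
    for label, mu in (("lab-RABR", 0.09), ("field-RABR", 0.17)):
        print(f"{label}: {doubling_time_days(mu):.1f} days")
```

These rates correspond to biomass doubling times of roughly 7.7 and 4.1 days for the laboratory and field cultured biofilms, respectively.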
Field-RABR for Wastewater Remediation 
Biofilm Heterogeneity. Direct, spatially resolved measurements of steady state oxygen 
profiles revealed differences between the biofilms formed on the top and bottom of the field-
RABR wastewater remediating system. The illuminated portions of both biofilms near the 
surfaces became supersaturated with O2, reaching concentrations over 600 µM which was 
approximately 3X the measured O2 saturation of the bulk wastewater (Fig. G.2A and G.2B). 
Both biofilms were oxic to a depth of approximately 1800 µm below the surface while illuminated 
(Table G.1). Steady-state oxygen profiles were also obtained after 15 min of dark conditioning 
(Fig. G.2C and G.2D) and the corresponding oxic-zone depths were 700 and 450 µm in the top 
and bottom biofilms, respectively. This is evidence for higher oxygen consumption potential in 
the darkened bottom oriented biofilm (discussed in more detail below). 
 
Figure G.2 Field-RABR: dissolved oxygen microprofiles measured in the light extending from 
the surface for biofilms grown on the (A) outer wheel surface and (B) inner wheel surface; 
dissolved oxygen microprofiles measured in the dark for biofilms grown on the (C) outer wheel 
surface and (D) inner wheel surface; and photosynthesis profiles extending from the surface for 
biofilms grown on the (E) outer wheel surface and (F) inner wheel surface. Note that the biofilm 
surface position (depth = 0) is approximated by the position at which oxygen responses were 
measurable (subject to ± 25 µm error, or ± 100 µm error for the photosynthesis profiles, where 
each data point is a representative gross volumetric photosynthesis rate from 2-3 replicates) and 
individual data points represent the mean values from 3-4 replicate profiles in both light and dark 
conditions. Error bars represent plus or minus one standard deviation. Dotted lines indicate the 
photic-zone termination depth, estimated from the light:dark shift method. Note the scale change 
on the x-axis. 
Oxygen gradients measured in the steady-state microprofiles show that these wastewater 
remediating biofilms maintain spatially varied microenvironments which may promote niche 
environments capable of supporting different microbial metabolic strategies. A significant 
portion of both biofilm samples remained anoxic during the experimental illuminated conditions 
(~10%) and in the dark (~50%). However, it is possible that these biofilms become fully oxic at 
or near peak solar irradiance during field cultivation. In the field these systems are also subject to 
temporal gradients in solar irradiance, temperature and nutrient flux. It is important to note that 
the measurements reported in this study are specific for standardized and constant incident 
irradiance and only represent comparative physiological potentials for these biofilms. 
The field-RABR was inoculated with the native wastewater microbial flora and was 
composed of a complex community of environmental biofilm-forming microorganisms including 
phototrophs and heterotrophs. Initial 454 pyrosequence analyses indicated a high level of 
diversity in the field-RABR biofilms, where cyanobacteria (predominately Oscillatoria sp. and 
Leptolyngbya sp.) and bacterial heterotrophs accounted for significant fractions of the microbial 
population (Miller et al., unpublished data). However, further molecular work is required to 
elucidate the microbial community differences between the two biofilms with respect to their 
orientation of growth. It is important to reemphasize that the ‘top’ and ‘bottom’ biofilms were 
formed simultaneously on different sides of the same cotton cord substratum and analyzed with 
microsensors ex situ under identical conditions. Other than growth orientation, these biofilms 
were cultured identically and were only spatially separated by the diameter of the 0.64 cm cotton 
cord substratum. 
 
Oxygenic-Photosynthesis. Direct measurements of oxygenic-photosynthesis rates 
quantified fundamental physiological differences in the field-RABR biofilms based only on 
orientation of biofilm formation (Fig. G.2E and G.2F). The measured areal rate of gross 
photosynthesis (Pg) in the top biofilm was ~2X greater than the bottom, signifying a much higher 
potential for photosynthetic electron acquisition (proportional to Pg) from the environment 
(Table G.1). This result was attributed to the availability of solar irradiance (PAR) during biofilm 
growth/formation, which was 1715 and 231 µmol photons·m-2·s-1 for the 
top and bottom, respectively.  
The active zone of photosynthesis is defined here as the position in the biofilm where the 
volumetric gross photosynthesis rate [Pg(z)] is greater than zero and its depth assumed to be 
equal to the biofilm photic zone (Lphot, corresponding to PAR). The Lphot value was only slightly 
higher in the top biofilm (Table G.1) indicating that the penetration depths of actinic light are 
comparable when illuminated at the same incident irradiance. The minor differences observed in 
Lphot values may translate into effective diffusion coefficient variability; however, these variances 
are expected to be very small based on previously reported measurements478 and were not 
considered here in detail. A key observation for this system is that the top oriented biofilms are 
capable of producing oxygen at greater than twice the rate per photon attenuated than the 
neighboring bottom biofilm. Additionally, the respective zones of active photosynthesis are 
nearly identical under standardized incident irradiance. This observation qualitatively indicates 
that the areal quantum yields are greater for the biofilms formed under a higher incident solar 
irradiance. Rigorous quantification of spatially defined quantum yields and photosynthetic 
efficiencies is beyond the scope of this study, although the present result is consistent with 
established photo-physiological observations.475 
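The relationship between the volumetric profile Pg(z), the areal rate Pg and the photic depth Lphot can be sketched numerically. The following is a minimal illustration that assumes a simple trapezoid integration over depth; the function names and the profile values are hypothetical and are not the measured data or the authors' processing code.

```python
def areal_gross_photosynthesis(depths_um, pg_volumetric):
    """Integrate volumetric gross photosynthesis over depth (trapezoid rule).

    depths_um: depths below the biofilm surface in micrometres (ascending).
    pg_volumetric: Pg(z) at each depth, in nmol O2 cm^-3 s^-1.
    Returns the areal rate in nmol O2 cm^-2 s^-1.
    """
    total = 0.0
    for i in range(1, len(depths_um)):
        dz_cm = (depths_um[i] - depths_um[i - 1]) * 1e-4  # um -> cm
        total += 0.5 * (pg_volumetric[i] + pg_volumetric[i - 1]) * dz_cm
    return total


def photic_depth(depths_um, pg_volumetric):
    """Deepest sampled position where Pg(z) is still greater than zero."""
    active = [z for z, p in zip(depths_um, pg_volumetric) if p > 0]
    return max(active) if active else 0.0


# Hypothetical profile sampled every 100 um (illustrative values only).
depths = [0, 100, 200, 300, 400, 500]
rates = [12.0, 9.0, 5.0, 2.0, 0.0, 0.0]

print(areal_gross_photosynthesis(depths, rates), photic_depth(depths, rates))
```

With a 100 µm sampling interval, the photic depth found this way carries the same ± 100 µm positional uncertainty noted for the photosynthesis profiles.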
Net areal photosynthesis rates were equated to the diffusive flux of oxygen transported 
from the biofilm surface (Pn) or the photic zone (Pn,phot) and both measurements were greater in 
the bottom-formed biofilms compared to the top-oriented samples (Table G.1). This difference is 
more pronounced and meaningful when interpreted as a percentage of Pg, which is a proxy for the 
total photosynthetically derived oxygen. Net photosynthesis rates for the entire biofilm (Pn) 
represent 67.9% of Pg in the bottom biofilm as compared to only 25.4% in the top. These 
percentage differences are even greater when evaluated for Pn,phot, which includes consideration 
of oxygen transported to the anoxic portions in the biofilm. These results confirm that net oxygen 
production rates alone are not representative of the oxygenic photosynthesis potential for these 
samples and that the bottom-oriented biofilms have the capacity to provide a greater flux of 
oxygen to the bulk wastewater environment. 
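Equating a net areal rate to a diffusive flux amounts to applying Fick's first law to the measured concentration gradient. The sketch below shows that calculation under stated assumptions: the effective diffusion coefficient and the two concentrations are placeholder values chosen for illustration, and the function name is hypothetical.

```python
def diffusive_flux(c_upper, c_lower, dz_um, d_eff_cm2_s):
    """Fick's first law, J = -D_eff * dC/dz, with z positive into the biofilm.

    c_upper, c_lower: O2 concentration at the shallower and deeper point,
                      in nmol cm^-3 (numerically equal to umol L^-1).
    dz_um: vertical separation of the two points in micrometres.
    d_eff_cm2_s: effective diffusion coefficient in cm^2 s^-1 (assumed).
    Returns the flux in nmol O2 cm^-2 s^-1; a negative value means oxygen
    moves toward the surface, i.e. out of the biofilm.
    """
    gradient = (c_lower - c_upper) / (dz_um * 1e-4)  # nmol cm^-4
    return -d_eff_cm2_s * gradient


# Illustrative numbers only: O2 rises by 100 umol/L over the first 50 um.
flux = diffusive_flux(c_upper=600.0, c_lower=700.0, dz_um=50.0, d_eff_cm2_s=2.0e-5)
print(f"flux = {flux:.2f} nmol O2 cm^-2 s^-1 (negative = export to the bulk)")
```

The same function applied at the lower boundary of the photic zone would give the downward flux toward the anoxic portion of the biofilm.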
The Pn values measured for this study are only representative of steady-state reaction and 
diffusion processes. However, the rotating mechanism employed by the RABR alternates the 
biofilms between different light and fluid regimes in a periodic fashion corresponding to the 
submerged-liquid and ambient air surroundings. Diffusive oxygen flux was measured inside the 
biofilms and the steady-state microprofiles obtained on biofilms exposed to ambient air did not 
provide enough resolution to identify or determine the thicknesses of the diffusive boundary 
layer (DBL) at the surface of the biofilms. However, DBLs almost certainly were present and are 
not ruled out as important regulating factors in the oxygen transport processes, especially while 
being exposed to the liquid medium during rotation. It has been previously established that DBL 
thickness is a function of the velocity differential between the biofilm and bulk fluid.474, 482 This 
is an important consideration for RABR operation since the rotational speed can be optimized to 
reduce the effects of mass transfer limitations external to the biofilm. This highlights a future 
area of characterization for the field-RABR biofilms that has the potential to enhance 
biofilm productivity by minimizing mass transfer limitations. 
Areal-Respiration Rates. The difference between gross and net areal photosynthesis rates 
provided direct measurements of photosynthesis-coupled respiration and revealed physiological 
distinctions between the two field-RABR biofilms. Areal photosynthesis-coupled respiration 
rates were measured during illumination for the entire biofilm (Rlight) and within just the photic 
zone (Rphot). Both measurements were more than 5X higher in the top-formed biofilms compared 
to the bottom (Table G.1). Respiration rates accounted for greater percentages of Pg than the 
corresponding Pn values in the top-formed biofilms. The opposite was true for the 
bottom-oriented biofilms.  
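Because respiration is obtained here as gross minus net photosynthesis, the share of Pg that is respired follows directly from the Pn percentages reported above. The short sketch below makes that bookkeeping explicit; the function names are illustrative and the only inputs are the two percentages quoted in the text.

```python
def coupled_respiration(pg, pn):
    """Photosynthesis-coupled respiration: gross minus net areal rate."""
    return pg - pn


def respiration_percent_of_pg(pn_percent_of_pg):
    """Share of gross photosynthesis consumed by respiration, in percent."""
    return 100.0 - pn_percent_of_pg


# Pn as a percentage of Pg, as reported in the text for the whole biofilm.
top_share = respiration_percent_of_pg(25.4)
bottom_share = respiration_percent_of_pg(67.9)
print(f"top: {top_share:.1f}% of Pg respired, bottom: {bottom_share:.1f}%")
```

On this basis roughly three quarters of the gross oxygen production is respired in the top biofilm, against roughly one third in the bottom biofilm.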
In contrast to the photosynthesis-coupled respiration rates, areal respiration rates in the 
dark (Rdark) were ~2X greater for the bottom-formed biofilms as compared to the 
top (Table G.1). Respiration rates in the light corresponded directly to higher localized oxygen 
concentrations. This observation indicates that respiration in these biofilm consortia increases 
with oxygen concentration and production rate, which are both functions of actinic light 
availability. This provides evidence of photo-respiration processes (e.g., RuBisCO-oxygenase 
activity) acting in concert with heterotrophic oxygen consumption. The bottom-oriented biofilm 
sample has a higher capacity for light-independent heterotrophic respiration compared to the top 
sample which is evinced by the higher Rdark values. 
Photosynthesis-coupled respiration is defined here to include any respiration occurring in 
the active zone of photosynthesis and can be advantageous to overall photo-production by 
lowering the localized O2/CO2 ratios inside the biofilm and resulting in higher selectivity for CO2 
fixation at the RuBisCO complex.475, 480, 483 Oxygen removal via heterotrophic or non-oxygenic 
community member activity is hypothesized to be a beneficial attribute of these wastewater 
remediating biofilm ecosystems. Hence, the encouragement and control of localized respiration 
processes, independent of photo-respiration, is identified here as a potentially important design 
feature for RABR operation and other photosynthetic biofilm reactor technologies and should be 
considered for future optimization of photo-production. 
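The benefit of lowering the local O2/CO2 ratio can be made explicit with the standard competitive description of RuBisCO kinetics. The symbols below are introduced only for illustration and were not quantified in this study: v_c and v_o are the carboxylation and oxygenation rates, and S_c/o is the enzyme's CO2/O2 specificity factor.

```latex
\frac{v_c}{v_o} = S_{c/o}\,\frac{[\mathrm{CO_2}]}{[\mathrm{O_2}]}
```

Under this relation, halving the local O2 concentration at a fixed CO2 concentration doubles the ratio of carboxylation to oxygenation, which is the sense in which localized oxygen removal favors CO2 fixation.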
The top oriented field-RABR biofilm samples showed the highest rates of gross-oxygenic 
photosynthesis and respiration (both Rlight and Rphot). These two processes are tightly coupled 
inside biofilms and not considered independent from each other. In fact, it has been shown 
previously that photosynthesis and respiration increase concurrently with increasing irradiance in 
tightly controlled laboratory cultured algal biofilms.484 The differences between these closely 
associated wastewater remediating biofilms' capacities for photosynthesis and respiration are a 
result of varied solar irradiance delivered during the culturing process. The top-oriented biofilms 
were formed under roughly 7.4-fold greater incident irradiance (PAR) than their close neighbor 
formed on the bottom of the cotton cord substratum (equivalently, the bottom biofilms received 
~87% less). This is a practical result since it is well established that different growth 
environments with respect to solar irradiance availability promote different expression levels of 
components comprising the light harvesting complexes, non-photosynthetic accessory pigments 
(e.g., carotenoids) and respiration components (e.g., terminal oxidases) in photosynthetic systems.475 
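As a quick arithmetic check on the irradiance contrast, the growth PAR values reported earlier for the top and bottom biofilms (1715 and 231 µmol photons·m-2·s-1) can be compared both as a ratio and as a relative reduction:

```latex
\frac{1715}{231} \approx 7.4, \qquad
\frac{1715 - 231}{1715} \approx 0.87
```

That is, the top biofilm received about 7.4 times the irradiance of the bottom biofilm, and the bottom biofilm received about 87% less than the top.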
Nitrogen Depletion in Lab-RABR Samples 
Biofilm Heterogeneity. The lab-RABR biofilms, formed from the known lipid 
accumulating WC-2B strain, established oxygen gradients under both illuminated and dark 
conditions. The microprofiles revealed only subtle differences between biofilms subjected to 
nitrate replete and deplete conditions. Similar to the field-RABR biofilms, the illuminated 
surface associated positions from both replete and deplete biofilm samples became 
supersaturated with O2, reaching ~3X the measured O2 saturation of the medium (Fig. G.3A and 
G.3B). During illumination, the oxic zone extended to depths greater than 2675 µm below the 
biofilm surface (~1675 µm into the substratum) where the flux of oxygen became very low. The 
WC-2B biofilms showed oxygen transport, driven by consumption, in portions of the substratum 
indicating that some biofilm was formed within the cotton cord pore volume. This was also 
observed by confocal microscopy (Supplemental Fig. S2 and S3). These lab-RABR biofilms 
showed a higher degree of spatial heterogeneity with respect to replicate oxygen profiles 
compared to the field-RABR biofilms (evident by the larger standard deviation in Fig. G.3 as 
compared to Fig. G.2). This increased variance between measurements taken below Lphot 
positions could result from biofilm spatial heterogeneity specific for cells attached within the 
cotton material. Steady-state oxygen profiles were also obtained after 15 min of dark 
conditioning (Fig. G.3C and G.3D). The oxic zones in the absence of light extended to 850 and 
1150 µm for the nitrate replete and deplete biofilms, respectively, indicating that the nitrogen starved 
biofilms had a lower potential for heterotrophic oxygen consumption (discussed in more detail 
below). 
Oxygenic-Photosynthesis and Respiration. Direct measurements of oxygenic-
photosynthesis and respiration rates quantified physiological differences in the RABR grown 
WC-2B biofilms cultured under nitrate replete and deplete conditions (Fig. G.3E and G.3F).  
Again, photosynthesis rates were measured as both net and gross production of photochemically 
derived oxygen at the biofilm scale. The WC-2B biofilms exhibited higher Pg values (~30%) 
during nitrate replete conditions indicating a greater potential for electron acquisition from the 
environment when not starved of nitrogen (Table G.1). The active zones of 
photosynthesis, evaluated as the portion of the biofilm between the surface and Lphot, were 
practically indistinguishable (within 25 µm) between the two nitrate availability conditions. This 
measurement supports the observation that actinic light was fully attenuated by the same depth 
and that the oxygenic photosynthesis reaction volumes were near identical under both conditions. 
This observation qualitatively establishes that the WC-2B biofilms exhibit higher photosynthetic 
quantum yields during nitrate replete conditions. 
 
Figure G.3 Lab-RABR: dissolved oxygen microprofiles measured in the light extending from the 
surface for biofilms grown in (A) nitrate replete and (B) nitrate deplete conditions; dissolved 
oxygen microprofiles measured in the dark for biofilms grown in (C) nitrate replete and (D) 
nitrate deplete conditions; and photosynthesis profiles extending from the surface for biofilms 
grown in (E) nitrate replete and (F) nitrate deplete conditions. Note that the biofilm surface 
position (depth = 0) is approximated by the position at which oxygen responses were measurable 
(subject to ± 25 µm error, or ± 100 µm error for the photosynthesis profiles, where each data 
point is a representative gross volumetric photosynthesis rate from 2-3 replicates), and individual 
data points represent the mean values from 3-4 replicate profiles in both light and dark conditions. 
Error bars represent plus or minus one standard deviation. Dotted lines indicate the photic-zone 
termination depth, estimated from the light:dark shift method. Note the scale change on the 
x-axis. 
Differences in the net areal rates of photosynthesis (both Pn and Pn,phot) between the two 
nitrate availability conditions were not as pronounced. However, both Pn and Pn,phot represented a 
greater percentage of Pg under nitrate deplete conditions. This observation is attributed to lower 
areal rates of photosynthetically-coupled respiration during nitrate starvation. Again, the Rlight 
and Rphot values were measured as the difference between Pg and respective net areal 
photosynthesis rates. Nitrate replete conditions promoted a ~20% increase in photosynthesis-
coupled respiration rates. The Rdark measurements were greater during nitrate starvation, 
indicating a higher capacity for heterotrophic (or light independent) respiration. However, the 
maximum areal respiration rates were observed during illumination and corresponded with increased 
Pg. This was consistent with the observations made on the field-RABR biofilms. However, it 
should be noted that these lab-RABR samples are unialgal cultures and, unlike the wastewater 
remediating system, do not represent a complex community of phototrophs and heterotrophs. 
Hence, respiration occurring within the lab-RABR biofilms is attributed to WC-2B physiology. 
Although, as a whole, there were only small differences observed in rates of 
photosynthesis and respiration between the two lab-RABR nitrate conditions, these data suggest 
two important findings. First, the results imply that nitrate depletion in the medium does not have 
a strong effect on the general physiology of the biofilm because only a small fraction of the 
biofilm (outer surface) is actively performing photochemical production under nitrogen replete 
conditions. Secondly, the biofilm remained photosynthetically active under non-growth 
conditions hinting at the importance of maintenance energy for cell viability and the potential for 
nitrogen (re-)cycling. These are important observations, within the setting of algal lipid 
production, since nitrogen stress is a common strategy for triggering triacylglycerol 
accumulation in planktonic microalgal cultures.374, 485, 486 
The first specific aim of this study was to characterize and compare the two different 
RABR biofilms (lab- and field-scale) in the context of active photosynthesis and spatial 
gradients in steady-state oxygen and photosynthesis. Of the physiological parameters measured 
for this specific aim, photosynthesis-coupled respiration is of special interest and should be 
considered a potent design parameter for controlling local O2/CO2 ratios to promote carbon 
fixation and subsequent photo-productivity. One potential strategy for maximizing gross 
photosynthesis while minimizing localized oxygen concentration would be to promote 
heterotrophic activity via mixed culturing techniques. Evidence for this lies in the observation 
that the field-RABR top-oriented biofilm community, as compared to the WC-2B lab-RABR 
biofilm, displayed a higher potential for electron acquisition from the environment (proportional 
to Pg) while channeling much greater percentages of photosynthetically derived oxygen into 
respiration processes. A hybridization of the wastewater remediating and biofuel production 
processes may be better achieved via mixed species inoculation or ‘seeding’ with known lipid 
accumulating photoautotrophic community members combined with compatible heterotrophic 
oxygen scavengers. Consortial cooperation in microbial biofilm technology has previously been 
demonstrated in a number of different cell factory systems.487 
 
 
Biofuel Precursor Production.  
Extractable lipid fractions were recovered from all biofilm samples and analyzed by gas 
chromatography for assessment of biofuel properties (Table G.2). In addition, direct 
transesterification was performed on the lyophilized biomass to identify fatty acids and to 
determine total biofuel potential (extractable and non-extractable) for each biofilm-type (Table 
G.3). Modest increases of extractable precursor concentrations were measured in the nitrate 
deplete biofilms, as compared to the nitrate replete conditions. This observation was also 
qualitatively confirmed in microscopy images (compare Fig. S2A and S2B), where Bodipy 
505/515 was used to visualize the neutral lipid precursors. The total potential FAME-weight %, 
representative of the total biofuel potential of the biofilm, was modestly higher for the lab-RABR 
biofilms that were deplete of nitrate. 
The most notable observations regarding lipid production in the lab-RABR biofilms 
were the differences in the total extractable weight % of lipids (sum of the FFA, MAG, DAG, 
and TAGs) between the nitrate replete and deplete conditions, 4.3 ± 0.4% and 7.3 ± 0.7% (w/w), 
respectively (Table G.2). The largest differences were observed in the DAG and TAG weight % 
and the respective areal concentrations. Although the WC-2B biofilms exhibited the expected, 
reasonable biofuel potentials, the lab-RABR production-system is not considered optimized for 
biofuel production. This is evident in the fact that the extractable precursors only accumulated to 
7.3% (w/w) of the biomass, which was significantly less than planktonic cultures of WC-2B that 
can accumulate up to 13.9% (w/w) of biomass as extractable precursors (7.7% (w/w) of which 
is TAG) under high pH and nitrate deplete conditions (Gardner, unpublished data). This indicates 
that medium nitrate depletion alone may not be an effective condition for inducing TAG 
accumulation in microalgal biofilms, likely due to heterogeneous distributions of nutrients like 
nitrate caused by mass transfer limitations and the resulting distribution in microalgal activity. It 
should be noted that comparisons of these preliminary biofilm oil-production systems to well-
mixed planktonic systems do not account for culturing times, biomass production rates or 
differences associated with required operating costs (e.g., energy required for mixing or biomass 
harvesting or water input requirements).  
Table G.2 Mean extractable biofuel precursor weight % and areal concentrations for the 
laboratory- and field-RABR cultured biofilms (n = 3 with one standard deviation error, or 
n = 2 with range reported as error). 
Biofilms cultured on the field-RABR had the lowest weight percentage in both 
extractable precursor molecules and potential FAMEs, 2.9 ± 1.1% and 5.1 ± 1.0% (w/w), 
respectively (Table G.2 and Table G.3). This observation coincides with the relatively high 
respiration rates measured in the samples (discussed earlier). The field-RABR biofilm samples 
are clearly not optimized for biodiesel (i.e., total FAMEs) production under the current culturing 
conditions. This could be, in part, due to colonization by a non-lipid-accumulating microbial 
community native to the wastewater (Fig. S2). However, the field-RABR exhibited higher 
biomass productivity (P = ∆gcdw/∆time) and total biomass areal density compared to the lab-
RABR. Hence, the areal concentration of FAME transesterified from the field-RABR biofilms 
yielded similar values as the nitrate deplete WC-2B biofilms (Table G.3). 
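The way a low FAME weight % can still yield a comparable areal FAME concentration is a simple product of weight fraction and areal biomass density. The sketch below illustrates this; only the 5.1% field-RABR FAME value comes from the text, while the biomass densities and the 10% comparison value are hypothetical placeholders, as are the function names.

```python
def areal_concentration(weight_percent, areal_biomass_g_m2):
    """Areal product concentration (g m^-2) from weight % and biomass density."""
    return weight_percent / 100.0 * areal_biomass_g_m2


def biomass_needed_to_match(target_areal_g_m2, weight_percent):
    """Areal biomass density (g m^-2) needed to reach a target areal product."""
    return target_areal_g_m2 / (weight_percent / 100.0)


# Hypothetical comparison case: 10 % FAME on 20 g cdw m^-2 (placeholder values).
reference_fame = areal_concentration(10.0, 20.0)

# Field-RABR FAME content from the text (5.1 % w/w).
field_biomass_needed = biomass_needed_to_match(reference_fame, 5.1)
print(f"{reference_fame:.2f} g FAME m^-2 needs {field_biomass_needed:.1f} g cdw m^-2 at 5.1 %")
```

In this illustration, a biofilm with about half the FAME content needs about twice the areal biomass to reach the same areal FAME concentration.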
Table G.3 Mean FAME %, weight %, and areal concentration from the laboratory- and field-
RABR cultured biofilms. Biomass was directly transesterified to determine total biofuel potential 
from all fatty acid precursor molecules (extractable and non-extractable) (n=3 with one standard 
deviation error, or n = 2 with range reported as error). 
 
The second specific aim of this study was to characterize and compare the biofuel 
potential and (neutral lipid) precursor biomolecule composition in these biofilms. Although the 
current RABR systems are not considered optimized, lipid accumulation in algal biofilms is 
possible and reasonable if the microbial composition is constrained to known lipid producers 
such as the WC-2B isolate used here in the lab-RABR system. Future optimization is needed 
including the investigation of other industrially relevant algal strains such as Botryococcus 
braunii or Chlorella vulgaris. The field-RABR culturing system is a more practical and 
industrially scalable system compared to the lab-RABR. However, the current system is not 
considered viable for biodiesel production since it only accumulated 2.9 ± 1.1% and 5.1 ± 1.0% 
(w/w) precursor molecules and potential FAMEs, respectively. Future optimization and 
experimentation of the field-scale system will require methodologies for enhanced control of the 
microbial community composition to select for better lipid accumulation. It should be noted that 
although biodiesel production via production of fatty acids is low in the field system, it is still a 
viable technique for biomass production from wastewater resources under the current conditions. 
Biofuel production from this system has been previously reported by using the field-RABR 
derived algal biomass for acetone, butanol, and ethanol fermentation by Clostridium 
saccharoperbutylacetonicum.488 
As part of the objective from specific aim 2, nitrate starvation was investigated as a 
potential strategy for inducing lipid accumulation in the lab-scale RABR cultures. Although a 
modest increase in extractable precursors was observed, nitrogen stress as implemented here by a 
60 hr depletion was not found to be as viable for “triggering” lipid accumulation in biofilms 
as previously reported for suspended culture studies.374, 489 This biofilm 
specific result is consistent with another previously reported study which focused on nutrient 
starvation (including nitrate) in cultures composed of fresh water green alga Scenedesmus 
obliquus and marine diatom Nitzschia palea.429 This previous study tested biofilm growth and 
lipid accumulation in algae cultured under relatively low shear in flat plate biofilm photo-
reactors and reported no significant changes in lipid concentration (% dry weight) between 
nitrate replete and deplete conditions. This is in minor contrast to the results from the current 
study which observed an approximate 2-3 % w/w increase after nitrate depletion and was also 
qualitatively confirmed via microscopy analysis (Supplemental Fig. S2 and S3). Additionally, 
the Schnurr et al. study reported significant and near complete biomass sloughing post nitrate 
depletion which was not observed as dramatically in the lab-RABR biofilms within the 60 hr 
nitrate deplete phase. This could be due to the different substratum materials (i.e., glass-plate 
compared to porous cotton cord material) and/or localized shear-stress at the biofilm surfaces. 
The combined results between the current and previously reported study429 indicate that inducing 
lipid accumulation via nutrient starvation may be possible but future culturing optimization is 
needed to evaluate the effects of known parameters associated with lipid accumulation in algal 
biofilms, such as nitrate and/or pH stress or chemical addition.374, 489, 490, 72, 386  
Conclusions  
This manuscript explores critical photosynthetic parameters in conjunction with biofuel 
precursor molecule production in biofilms cultured through the novel RABR system. The lab-
RABR exhibited moderate biofuel capabilities yet requires process optimization. The wastewater 
remediating field-RABR biofilm exhibited higher rates of photosynthesis and respiration 
depending on the position of biofilm formation with respect to ambient sunlight, but is not 
currently a viable biodiesel production platform. This study developed a methodological 
foundation for directly measuring photosynthetic parameters fundamental to the physiology and 
design of efficient photosynthetic energy harvesting platforms and establishes a benchmark for 
the quantitative analysis of phototrophic biofilm technologies.  
Appendix G: Supplementary Data 
Supplementary data associated with this article can be found, in the online version, at 
http://dx.doi.org/10.1016/j.biortech.2014.01.001.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
APPENDIX H 
DISSOLVED INORGANIC CARBON ENHANCED GROWTH, NUTRIENT UPTAKE, AND 
LIPID ACCUMULATION IN WASTEWATER GROWN MICROALGAL BIOFILMS  
  
Manuscript Information 
Maureen Kesaano, Robert D. Gardner, Karen Moll, Ellen Lauchnor, Robin Gerlach, Brent M. 
Peyton, Ronald C. Sims 
Bioresource Technology 
 
Status of Manuscript: 
____ Prepared for submission to a peer-reviewed journal 
____ Officially submitted to a peer-reviewed journal 
____ Accepted by a peer-reviewed journal 
__x_ Published in a peer-reviewed journal 
 
180 
 
 
 
  
Abstract 
Microalgal biofilms grown to evaluate potential nutrient removal options for wastewaters 
and feedstock for biofuels production were studied to determine the influence of bicarbonate 
amendment on their growth, nutrient uptake capacity, and lipid accumulation after nitrogen 
starvation. No significant differences in growth rates, nutrient removal, or lipid accumulation 
were observed in the algal biofilms with or without bicarbonate amendment. The biofilms 
possibly did not experience carbon-limited conditions because of the large reservoir of dissolved 
inorganic carbon in the medium. However, an increase in photosynthetic rates was observed in 
algal biofilms amended with bicarbonate. The influence of bicarbonate on photosynthetic and 
respiration rates was especially noticeable in biofilms that experienced nitrogen stress. Medium 
nitrogen depletion was not a suitable stimulant for lipid production in the algal biofilms and as 
such, focus should be directed towards optimizing growth and biomass productivities to 
compensate for the low lipid yields and increase nutrient uptake.  
 
 
 
 
  
Keywords: Microalgae, biofilms, wastewater, dissolved inorganic carbon, biofuels 
 
 
 
Introduction 
Cultivation of microalgae in wastewater streams has been proposed as a means of 
reducing competition for freshwater sources, as an inexpensive source of nutrients, and as a 
biological wastewater treatment alternative.491, 492 Microalgae can utilize nutrients in wastewater 
for growth to generate considerable amounts of biomass. However, recovery of microalgae from 
the liquid medium is difficult and represents a substantial capital cost in suspended cultivation 
systems;369, 493 consequently, there is a growing interest in attached algal growth platforms. Algal 
biofilm based systems such as the rotating algal biofilm reactor (RABR), algal turf scrubber 
(ATS™), revolving algal bioreactor (RAB), and Algaewheel® have been developed, and algal 
biofilm growth demonstrated in bench and pilot scale operations.428, 469, 494-496 However, there is 
still limited fundamental information on algal biofilm physiological processes and growth 
especially in wastewater remediation.   
Widespread application of algal biofilm-based systems is also limited but can be 
promoted through integration of wastewater treatment with the production of valuable 
bioproducts from the harvested algal biomass. Algal biomass composition (i.e., lipid, 
carbohydrate, and protein content) is influenced by the chemical composition of the medium and 
the environmental growth conditions (e.g., temperature, pH, and light), which subsequently 
determines the by-products that can be synthesized. Conventionally, microalgae grown as 
feedstock for biofuels require a two stage process where biomass accumulation occurs under 
nutrient-rich conditions followed by an environmental challenge to induce secondary byproduct 
accumulation (e.g., tri-acylglycerols as energy storage compounds).497 Nutrient starvation is 
typically employed as an environmental stress to stimulate lipid biosynthesis in microalgae 
cultures.412, 498, 499 However, stimulation of lipid production in algal biofilms as a result of 
nutrient starvation has not been as successful as in suspended cultures.153, 429 
Furthermore, information on the use of other lipid inducing techniques such as chemical 
addition, pH stress, and temperature either independently evaluated or in combination with 
nutrient starvation is limited in algal biofilm studies. For example, addition of bicarbonate salts 
(HCO3-) was reported as an effective trigger for lipid production in nutrient limited suspended 
microalgae cultures.70, 490, 500, 501 The bicarbonate salts not only induce lipid production, but also 
provide a stable and readily available source of inorganic carbon essential for photosynthesis and 
microalgae growth.113, 374, 502 In addition, Glud et al. (1992) observed an increase in 
photosynthetic rates and a simultaneous reduction in respiration rates (17%) in a diatom-
dominated biofilm community amended with bicarbonate.480 
The potential use of bicarbonate in minimizing photorespiration is of particular interest 
in algal biofilms because of the high O2/CO2 ratios due to localized supersaturated oxygen 
concentration from active oxygenic photosynthesis.153, 480 Photorespiration is a competing process 
to carboxylation, where ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) acts as an 
oxygenase, thereby inhibiting carbon dioxide fixation and subsequently reducing photosynthetic 
efficiency. The study presented here evaluated the effects of adding dissolved inorganic carbon 
in the form of 2 mM HCO3- to synthetic wastewater medium to grow algal biofilms in order to:  
(1) Enhance algal biofilm growth, nutrient uptake, and lipid accumulation during nutrient deplete 
culturing  
(2) Increase photosynthetic rates with biofilm depth within the photic zone 
 
Materials and methods 
Microalgal biofilm culturing and sampling 
The chlorophyte isolate Botryococcus sp. strain WC-2B, previously described in 
Bernstein et al. (2014), was cultured in 8 L laboratory scale rotating algal biofilm reactors 
(RABRs) operated at 12 rpm and 25 °C. Each reactor was composed of two plastic cylindrical 
wheels (10 cm diameter) onto which 3/16 inch (diameter) untreated cotton cord was attached as 
the biofilm substratum. Synthetic wastewater was made to simulate typical medium strength 
domestic wastewater for total nitrogen (TN) and total phosphorus (TP) concentrations without a 
carbon source.503 The medium consisted of 60 mg L-1 NH4Cl, 150 mg L-1 NaNO3, 16 mg L-1 
Na2HPO4, 15 mg L-1 K2HPO4, 4 mg L-1 KH2PO4, 75 mg L-1 MgSO4·7H2O, 25 mg L-1 
CaCl2·H2O, and micronutrients (8.82 mg L-1 ZnSO4·7H2O, 1.44 mg L-1 MnCl2·4H2O, 
0.71 mg L-1 MoO3, 1.57 mg L-1 CuSO4·5H2O, 0.49 mg L-1 Co(NO3)2·6H2O and 4.98 mg L-1 FeSO4). 
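The total nitrogen and total phosphorus supplied by this recipe can be estimated from the salt doses and their molar masses. The sketch below is a rough check rather than a measured value: it uses rounded standard molar masses, counts only the five macronutrient salts (the small nitrate contribution of the cobalt micronutrient is ignored), and the variable and function names are illustrative.

```python
N_MASS, P_MASS = 14.007, 30.974  # atomic masses, g mol^-1

# salt: (dose in mg L^-1, molar mass in g mol^-1, N atoms, P atoms)
SALTS = {
    "NH4Cl": (60.0, 53.49, 1, 0),
    "NaNO3": (150.0, 84.99, 1, 0),
    "Na2HPO4": (16.0, 141.96, 0, 1),
    "K2HPO4": (15.0, 174.18, 0, 1),
    "KH2PO4": (4.0, 136.09, 0, 1),
}


def element_load(salts, atom_index, atomic_mass):
    """Sum the elemental concentration (mg L^-1) contributed by each salt."""
    total = 0.0
    for dose, molar_mass, *atoms in salts.values():
        total += dose / molar_mass * atoms[atom_index] * atomic_mass
    return total


tn = element_load(SALTS, 0, N_MASS)  # total nitrogen, mg N L^-1
tp = element_load(SALTS, 1, P_MASS)  # total phosphorus, mg P L^-1
print(f"TN = {tn:.1f} mg N/L, TP = {tp:.1f} mg P/L")
```

The result is on the order of 40 mg N L-1 and 7 mg P L-1, with roughly 60% of the nitrogen supplied as nitrate and the remainder as ammonium.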
The experimental set up consisted of four laboratory RABRs under fluorescent lights 
with a photosynthetically active radiation (PAR) of 227 ± 65 µmol m-2 s-1 on a 14:10 L/D cycle. 
Duplicate reactors were amended with 2 mM HCO3- in the form of NaHCO3 and another 
duplicate set without HCO3- amendment was cultured for comparison. The reactors were 
operated in sequenced batch mode with a 5 day hydraulic retention time (HRT) for a period of 18 
days, after which nitrogen stress was induced for an additional 5 days by replacing all liquid 
medium with synthetic wastewater without a nitrogen source. For each cycle of hydraulic 
retention time, the reactors were drained, cleaned, and filled with fresh medium. Prior to the start 
of the experiment, the medium was inoculated with microalgae and the RABRs operated for 3 
days (seeding period) to allow the microalgae to attach to the rope strands. As shown in Fig. 1, 
after the seeding period, the RABRs with the exception of the substratum (rope strands) were 
covered with a black polyethylene sheet to minimize microalgal growth in the liquid medium. 
Culturing and sampling were performed under non-aseptic conditions (open air).  
Rope samples with attached microalgae were excised for oxygen microsensor 
measurements, microscopy characterization, biomass dry weight measurements, and lipid 
analysis. Biomass cell dry weights (CDW, gcdw m-2) were obtained by removing the biofilm from 
a known length of cord into a pre-weighed aluminum weigh boat using a flat end spatula. The 
biomass was dried at 70 °C for 18 h until the biomass weight was constant. Biomass CDWs were 
calculated by subtracting the weight of the pre-weighed boat from the dry weight of the 
oven-dried boat with biomass and normalizing by 
the total cylindrical surface area for the length of cotton cord substratum excised.  
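The areal dry weight calculation described above can be sketched as follows. This is a minimal illustration, assuming the cord is treated as a smooth cylinder whose lateral area is pi × diameter × length (cut ends ignored); the function names and the example weights are hypothetical and are not measurements from this study.

```python
import math

INCH_TO_M = 0.0254
CORD_DIAMETER_M = (3 / 16) * INCH_TO_M  # 3/16 inch cotton cord, from the methods text


def cord_surface_area_m2(length_m, diameter_m=CORD_DIAMETER_M):
    """Lateral surface area of a cylinder (pi * d * L); cut ends are ignored."""
    return math.pi * diameter_m * length_m


def areal_cdw_g_per_m2(boat_with_biomass_g, empty_boat_g, cord_length_m):
    """Dry biomass (boat with biomass minus empty boat) per unit cord surface area."""
    biomass_g = boat_with_biomass_g - empty_boat_g
    return biomass_g / cord_surface_area_m2(cord_length_m)


# Hypothetical example: 0.10 m of cord, 30 mg of dried biomass.
area = cord_surface_area_m2(0.10)
example_cdw = areal_cdw_g_per_m2(1.2500, 1.2200, 0.10)
print(f"area = {area:.6f} m2, CDW = {example_cdw:.1f} g m-2")
```

With these illustrative inputs the result is on the order of 20 g m-2, the same order of magnitude as the areal densities reported in the results.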
Water quality monitoring 
Nitrate (NO3-), nitrite (NO2-), and orthophosphate (PO43-) concentrations were monitored 
in the bulk medium and measured by ion chromatography (IC) using a Dionex IonPac AS22 
carbonate eluent anion-exchange column set at a flow rate of 1.2 mL min-1. IC data were analyzed 
with Chromeleon 7 Chromatography Data System (CDS) software. Ammonium (NH4+-N) 
concentrations were determined according to the 2-phenylphenol method with a BioTek 
PowerWave XS microplate reader (Vermont, USA) at an absorbance of 660 nm.504 The 
dissolved inorganic carbon (DIC) was measured on 8 mL filtered (0.2 μm pore size filters) 
medium samples using a Skalar FormacsHT/TN TOC/TN analyzer (model CA16, Netherlands) and 
Skalar LAS-160 autosampler. DIC was quantified using peak area correlation against a standard 
curve from a bicarbonate-carbonate mixture (Sigma Aldrich). Culture pH and optical density 
(OD) measurements were taken using a standard laboratory Accumet pH electrode (Fisher 
Scientific) and Genesys 10 UV-Model 10-S spectrophotometer (Thermo Electron Corporation), 
respectively.  
Oxygen microsensor analysis  
Clark-type oxygen microelectrodes (10 µm tip diameter; OX-10 Unisense) and 
specialized computer controlled hardware (Unisense) were used to analyze the reactive transport 
of dissolved oxygen with biofilm depth under steady-state diffusive conditions corresponding to 
light and dark conditions. Photosynthetic rates (coupled with photo-respiration) were estimated 
using the light/dark shift technique.474, 477 The light/dark shift measurements are valid under the 
following assumptions: (1) initial steady state oxygen distribution is achieved before darkening, 
(2) oxygen consumption rates before and after dark incubation are identical, and (3) identical 
diffusive fluxes are maintained during the measurement time at each position. Two-point 
calibrations were performed for oxic conditions (medium saturated with air) and anoxic 
conditions (medium sparged with nitrogen gas). 
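The two-point calibration and the light/dark shift estimate described above can be illustrated with a short sketch. It assumes a linear sensor response and takes the gross photosynthetic rate as the negative of the initial slope of oxygen concentration versus time after darkening. The air-saturated concentration of 260 µM is the approximate value quoted in this appendix; the sensor signals, time points, and function names are hypothetical.

```python
def two_point_calibration(signal_anoxic, signal_air_sat, conc_air_sat_uM=260.0):
    """Return a function mapping raw sensor signal to O2 concentration (uM).

    Assumes a linear response between the anoxic (0 uM) and air-saturated points.
    """
    slope = conc_air_sat_uM / (signal_air_sat - signal_anoxic)

    def to_concentration(signal):
        return slope * (signal - signal_anoxic)

    return to_concentration


def least_squares_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def gross_photosynthesis_uM_per_s(times_s, conc_uM):
    """Light/dark shift: photosynthesis equals the initial rate of O2 decrease after darkening."""
    return -least_squares_slope(times_s, conc_uM)


# Hypothetical readings (arbitrary signal units) at one depth, first 3 s of darkness.
calibrate = two_point_calibration(signal_anoxic=5.0, signal_air_sat=265.0)
times = [0.0, 1.0, 2.0, 3.0]
signals = [405.0, 401.0, 397.0, 393.0]
concentrations = [calibrate(s) for s in signals]
rate = gross_photosynthesis_uM_per_s(times, concentrations)
print(f"volumetric gross photosynthesis = {rate:.2f} uM O2 s-1")
```

Repeating the estimate at each depth step gives a volumetric photosynthesis profile, which can then be integrated over depth to obtain an areal rate.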
Biodiesel analysis 
Biodiesel precursors, i.e., free fatty acids (FFAs), mono-acylglycerols (MAGs), 
di-acylglycerols (DAGs), and tri-acylglycerols (TAGs), were extracted from dried biomass by bead 
beating extraction and the biodiesel potential (total FAMEs) was determined by direct in situ 
transesterification according to protocols published by Lohman et al. (2013). The total FAMEs 
and the fatty acid compositions of these FAMEs were quantified using gas chromatography-mass 
spectrometry (GC-MS; Agilent 6890N and 5973 Network MS). The FFAs, MAGs, DAGs, and 
TAGs were analyzed using gas chromatography flame ionization detection (GC-FID; Agilent 
6890N).  
 
 
Results and discussion 
Microalgae growth rate and yield 
Microalgae successfully attached to the cotton cord and grew as a biofilm for the entire 
study period (Fig. S1). Curve fitting of the growth data for the entire study period showed that 
the first-order growth equation described the data better than the zero-order 
equation, with R2 values of 0.973 and 0.985 for biofilms amended with bicarbonate and those 
without bicarbonate, respectively (supplemental data). The lag phase was minimized by the 3-day 
seeding period. The microalgal biofilms were in exponential growth phase from days 3-10, as 
determined from the linear portion of the log-transformed growth data, and in stationary phase 
after 10 days of growth (Fig. H.1).  
 
Figure H.1 Growth curve from log transformed data showing the exponential phase (day 3-10) 
and stationary phase (day 11-18). Inset: Equations and R2 describing the exponential phase for 
both biofilms with and without bicarbonate amendment. 
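As an illustration of how a specific growth rate is obtained from the linear portion of log-transformed data, the sketch below fits ln(biomass) against time by ordinary least squares and reports the slope as the specific growth rate. The data are synthetic, generated from an exact exponential with an assumed rate of 0.18 day-1 over days 3-10; they are not the measured values behind Fig. H.1, and the function name is hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic exponential-phase data (days 3-10), assumed for illustration only.
days = list(range(3, 11))
biomass = [2.0 * math.exp(0.18 * d) for d in days]  # g m-2, hypothetical

# For first-order growth, ln(X) = ln(X0) + mu * t, so the slope is mu.
mu, ln_x0, r2 = linear_fit(days, [math.log(b) for b in biomass])
doubling_time = math.log(2) / mu
print(f"mu = {mu:.3f} day-1, R2 = {r2:.3f}, doubling time = {doubling_time:.2f} days")
```

Real data would scatter around the fitted line, so R2 would fall below 1, as in the values reported for this study.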
The maximum specific growth rates measured during the exponential phase were 
0.18|0.07 (mean|range) and 0.20|0.07 day-1 for algal biofilms amended with bicarbonate and the 
unamended control, respectively. The maximum areal biomass density measured during the 
stationary phase was 20.95 and 25.98 g m-2 for biofilms with bicarbonate and biofilm samples 
without bicarbonate, respectively. Additionally, the biofilm production rates, calculated as the 
total biomass accumulated per rope surface area divided by the time taken to reach stationary 
phase, were 1.45 and 1.79 g m-2 day-1 for biofilms with bicarbonate and biofilms without 
bicarbonate added, respectively. 
Growth curves for the algal biofilms (attached to rope) and microalgae growth in 
suspension are shown in Fig. H.2A. Microalgae growth in the bulk medium was negligible over 
the study period indicating that covering the reactors with black plastic effectively prevented 
light penetration and minimized growth in suspension. There was no statistical difference 
observed in growth characteristics for algal biofilms amended with bicarbonate and biofilms that 
did not receive bicarbonate (p value of 0.4517 from t test). Although it was hypothesized that the 
addition of bicarbonate would increase the algal biofilm growth, this was not observed. With the 
8 L medium reservoirs, even the unamended algal biofilms were not carbon limited, such that 
bicarbonate addition did not enhance growth in this reactor system. DIC measurements remained 
relatively constant for each 5-day retention time with slight differences observed in the medium 
concentrations, with the exception of the first 3 days (Fig. H.2B). 
Removal of nitrogen and phosphorus from synthetic wastewater using 
algal biofilms 
A basic requirement of wastewater treatment is the removal of nutrients (i.e., nitrogen 
and phosphorus) to acceptable limits prior to discharge. Microalgae based systems promote 
nutrient removal through plant uptake and subsequent harvesting of the nutrient-rich biomass 
from the effluent. In addition, microalgae increase the medium pH via photosynthesis thereby 
promoting volatilization of ammonia and possible 
 
Figure H.2 Growth curves for attached and suspended microalgae (A) and dissolved inorganic 
carbon (DIC) concentrations (B) in laboratory-RABRs amended with bicarbonate and without 
bicarbonate addition. Error bars for algal biofilm yield and DIC measurements represent standard 
deviation (n=4). Error bars for suspended growth represent range (n=2). Vertical dotted lines 
represent the end of each 5 day hydraulic retention time. 
precipitation of phosphate ions.476 It should be noted that all the RABRs were covered in black 
polyethylene, cleaned, and had the bulk medium replaced every 5 days to minimize algal growth 
in the bulk medium, which also minimized the pH increase of the medium resulting from 
photosynthesis. Therefore, at the measured pH of 8.5 ± 0.15 for medium amended with 
bicarbonate and 7.97 ± 0.22 for medium without bicarbonate respectively, nutrient removal was 
attributed to the activity of the biofilms. 
The synthetic wastewater was prepared with ammonium and nitrate salts as the only 
nitrogen sources. Initial concentrations of total nitrogen and phosphorus in the medium were 
approximately 40 mg-N L-1 and 7 mg-P L-1, respectively, giving a molar N:P ratio of 
approximately 13:1. The measured residual total nitrogen concentrations (including NO2--N) 
ranged from 7.95 – 19.66 and 8.20 – 19.72 mg-N L-1 for RABRs with and without bicarbonate 
amendment, respectively.  Similarly, final total phosphorus concentrations ranged from 3.39 – 
3.57 and 3.35 – 3.55 mg-P L-1 for RABRs with and without bicarbonate amendment, 
respectively. The lowest N and P residual concentrations were obtained during the retention time 
cycles corresponding to the exponential growth phase of the biofilms (Fig. H.3). Therefore, as 
expected, nutrient removal from the wastewater was closely linked to algal biofilm growth i.e., 
higher removal efficiencies were obtained during the exponential growth phase of the biofilm 
compared to the onset of the stationary phase. 
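The molar N:P ratio quoted above follows directly from the mass concentrations, and a removal efficiency can be computed the same way from initial and residual values. The sketch below uses the approximate initial concentrations stated in the text (40 mg-N L-1 and 7 mg-P L-1) and standard atomic masses. The total nitrogen removal figures it prints are illustrative arithmetic on the reported residual range, assuming the nominal initial value applied in every cycle; they are not values reported by the study.

```python
N_MOLAR_MASS = 14.007  # g mol-1
P_MOLAR_MASS = 30.974  # g mol-1


def molar_n_to_p(n_mg_per_l, p_mg_per_l):
    """Molar N:P ratio from mass concentrations expressed as N and as P."""
    return (n_mg_per_l / N_MOLAR_MASS) / (p_mg_per_l / P_MOLAR_MASS)


def removal_efficiency_pct(initial, residual):
    """Percent removal relative to the initial concentration."""
    return 100.0 * (initial - residual) / initial


ratio = molar_n_to_p(40.0, 7.0)
print(f"molar N:P = {ratio:.1f}:1")  # close to the 13:1 quoted in the text

# Illustrative only: nominal initial TN with the lowest and highest residual TN
# reported for the bicarbonate-amended RABRs.
tn_best = removal_efficiency_pct(40.0, 7.95)
tn_worst = removal_efficiency_pct(40.0, 19.66)
print(f"TN removal between {tn_worst:.0f}% and {tn_best:.0f}%")
```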
The N and P removal efficiency ranged from 27 - 74% (NO3--N), 89 -100% (NH4+-N), 
and 19 - 41% (PO43--P) during the experiments, with no significant difference observed between 
liquid samples from reactors amended with bicarbonate and those that did not receive additional 
dissolved inorganic carbon. Similarly, for the entire duration residual N and P concentrations 
followed the same trend in cultures amended with bicarbonate and those that did not receive 
bicarbonate (Fig. H.3). Complete uptake of ammonium ions was observed unlike nitrate ions in 
this study, probably due to preferential uptake of ammonia by microalgae compared to nitrate.481 
Microalgal cultures supplied with mixed nitrate and ammonium sources may  
 
Figure H.3 Ammonium, nitrate, nitrite, and phosphate ion concentrations in medium amended 
with bicarbonate and without bicarbonate addition. Error bars represent range (n=2). Vertical 
dotted lines represent the end of each 5 day hydraulic retention time. 
repress NO3--N uptake due to feedback-inhibition, since ammonium is an end product of 
assimilatory nitrate reduction.505 Similar to the algal biofilm growth results, phosphate and 
nitrogen removal rates were not influenced by the addition of bicarbonate to the medium. 
Maximum nutrient removal from wastewater with algal biofilms can be attained via harvesting at 
the end of the exponential growth phase preferably after 8-10 days of growth using this RABR 
system.  
The nitrite concentrations observed in solution were probably a result of incomplete 
nitrification of ammonia since the algal biofilms were grown in a non-aseptic oxygenated 
environment (Fig. H.3). An abiotic control was used to verify that the presence of NO2--N ions 
was due to biological processes (supplemental Table S1). The chemoautotrophic bacteria 
involved in nitrification require a carbon source such as CO2 or HCO3-, therefore the reactor with 
bicarbonate treatment possibly had more favorable initial conditions for the bacteria to grow, 
thus the higher nitrite concentrations observed (Fig. H.3).  However, quasi-steady state 
concentrations of nitrite were eventually attained and the difference ceased to be significant later 
in the experiment.  
Microalgal biofilm photosynthesis and coupled respiration  
Oxygen microprofiles under illumination. Oxygen microprofiles were taken before and 
after N-deprivation was initiated, at 18 and 23 days of RABR operations. Steady state oxygen 
microprofiles for biofilm samples under light showed an initial increase in oxygen concentrations 
(compared to equilibrium with saturated air, ≈260 µM oxygen), which peaked at a depth 
of 200 ± 25 µm from the biofilm surface (biofilm/air interface) for both N-replete and 
N-deprived biofilms (Fig. H.4A and H.4B). Oxygen production in illuminated algal biofilms is a 
result of photosynthesis, and spatial gradients of light are known to affect the rate of oxygenic 
photosynthesis and corresponding oxygen concentrations in algal biofilms (Wieland and Kühl, 
2000). Photosynthetic activity was highest in the upper layers of the biofilm and decreased with 
biofilm depth, possibly due to light attenuation and/or substrate diffusion limitations. Biofilms 
cultured under N-replete conditions had peak  
 
 
Figure H.4 Steady state oxygen microprofiles for illuminated algal biofilms under nitrogen 
replete (A) and nitrogen deprived (B) conditions. Error bars represent standard deviation of 
replicate profiles (n=3); steady state oxygen microprofiles in the dark for algal biofilms under 
nitrogen replete (C) and nitrogen deprived (D) conditions. Error bars represent standard 
deviation of replicate profiles (n=3); and representative photosynthesis profiles for algal biofilms 
under nitrogen replete (E) and nitrogen deprived (F) conditions. Zero depth (surface) is at the 
algal biofilm/air interface. 
oxygen concentrations that were twice that of N-deprived biofilms (Fig. H.4A and H.4B). 
Furthermore, under illumination there were no anoxic zones observed in N-replete biofilms, an 
indication that the oxic zone (oxygen penetration depth) extended into the cotton cord substratum 
(Fig. H.4A). In nitrogen replete systems, the steady state oxygen microprofiles showed no 
significant differences under either light or dark conditions for biofilms with or without 
bicarbonate (Fig. H.4A and H.4C).  
On the contrary, differences in steady state oxygen microprofiles were revealed between 
N-deprived algal biofilms with and without bicarbonate amendment (Fig. H.4B). For example, 
bicarbonate amended biofilms had higher oxygen concentrations compared to biofilms that did 
not receive bicarbonate. This is an indication of higher photosynthetic rates and/or reduced 
oxygen consumption rates due to respiration. Indeed, higher photosynthetic rates and lower areal 
respiration rates (in the light) were calculated for bicarbonate amended biofilm samples under N-
stress (Table H.1). Additionally, anoxic zones were observed in N-deprived algal biofilms and 
the depth of oxygen penetration for the bicarbonate amended biofilms was 1500 µm compared to 
850 µm for biofilms without bicarbonate addition (Table H.1).  
Oxygen microprofiles in the dark. Oxygen is consumed by algal biofilms in the dark as a 
result of respiration. Assuming oxygen diffusivity is constant, the rate at which oxygen decreases 
(slope) is an indication of the consumption rate i.e., a steeper decline in oxygen concentration 
indicates greater consumption and a smaller depth of oxygen penetration can be assumed to 
occur as a result of high heterotrophic activity.506 Steady state oxygen concentrations for biofilms 
in the dark decreased with depth to anoxic conditions for both N-replete and N-deprived biofilms 
(Fig. H.4C and H.4D). Biofilms under N-replete culturing showed a more gradual decline in 
oxygen concentration compared to N- 
 
 
 
 
Table H.1 Measurements of photosynthetic rates, respiration rates, and relevant depth parameters 
for laboratory grown microalgal biofilms with and without bicarbonate amendment. 
Parameter                               Bicarbonate                     No bicarbonate
µmol O2·cm-2·sec-1                      N-replete       N-deprived      N-replete       N-deprived
Gross photosynthesis, Pg                6.27E-04        2.26E-04        3.08E-04        2.02E-04
Net areal rate of biofilm               2.43E-04        9.2E-05         2.21E-04        7.43E-06
photosynthesis, Pn (% Pg)               (38.74%)        (40.76%)        (71.59%)        (3.68%)
Net areal rate of photic zone           2.99E-04        1.36E-04        2.61E-04        5.52E-05
photosynthesis, Pn,phot (% Pg)          (47.72%)        (60.26%)        (84.87%)        (27.33%)
Areal respiration of the biofilm,       3.84E-04        1.34E-04        8.75E-05        1.94E-04
Rlight (% Pg)                           (61.26%)        (59.24%)        (28.4%)         (96.32%)
Areal respiration of the photic zone,   3.28E-04        8.97E-05        4.66E-05        1.47E-04
Rphot (% Pg)                            (52.28%)        (39.74%)        (15.13%)        (72.67%)
Respiration in the dark, Rdark          0.59E-04        1.49E-04        0.74E-04        0.98E-04
Depth of photic zone, Lphot, µm         1000 ± 100      600 ± 100       700 ± 100       600 ± 100
Depth of oxic zone in light, µm         >1950           1500            >1950           850
Depth of oxic zone in the dark, µm      1100 ± 25       650 ± 25        950 ± 25        300 ± 25
 
deprived biofilms, where steeper slopes and shorter oxygen penetration depths were observed. 
This was an indication of greater potential for heterotrophic oxygen consumption in N-deprived 
biofilms compared to N-replete biofilms, an observation that is contrary to what was reported in 
Bernstein et al. (2014).153 The current study provided a longer N-starvation period of 120 h 
compared to 60 h in the study by Bernstein et al. (2014), which may have promoted greater 
heterotrophic activity in the N-deprived biofilms.153  
Both before and after N-deprivation, biofilm samples amended with bicarbonate had 
greater oxygen penetration depths under dark conditions compared to biofilms that did not 
receive bicarbonate. Oxic zones of 1100 ± 25 µm and 950 ± 25 µm in depth were estimated for 
biofilms amended with bicarbonate and without added bicarbonate under N-replete culturing. 
Similarly, oxygen penetration depths of 650 ± 25 µm and 300 ± 25 µm for biofilms amended 
with bicarbonate and without added bicarbonate during N-deprivation were observed (Table 
H.1). This showed that the bicarbonate amended biofilms had lower oxygen consumption in the 
dark compared to the biofilms without bicarbonate amendment for both nutrient conditions.   
Spatial rates of photosynthesis and respiration. The gross photosynthesis profiles were 
generated at a spatial resolution of 100 µm vertical depth using the volumetric photosynthetic 
rates (i.e., the rate of oxygen depletion within 3 seconds of dark incubation) determined from the 
light/dark shift technique. Photosynthesis occurred within a depth of 500 µm from the biofilm 
surface (Fig. H.4E and H.4F). Similarly, increasing rates of areal gross photosynthesis (Pg) 
resulted in higher areal net biofilm photosynthesis (Pn) and photic zone photosynthesis (Pn,phot), 
which corresponded to deeper oxic zones (Table H.1). 
However, photosynthetic rates significantly varied with both nutrient conditions and 
presence/absence of bicarbonate in medium. Biofilm samples under nutrient replete culturing had 
higher photosynthetic rates (Pg, Pn , and Pn,phot) compared to N-deprived algal biofilms indicating 
a greater potential for photo-productivity when nutrient replete (Fig. H.4 and Table H.1). 
Biofilms amended with bicarbonate also had higher photosynthetic rates (Pg, Pn , and Pn,phot ) 
compared to the biofilms that did not receive bicarbonate for both N-replete and deprived 
conditions (Table H.1). The distribution of Pn and Pn,phot as a fraction of the gross photosynthesis 
in the bicarbonate amended biofilms was different from that of biofilms that did not receive 
bicarbonate. Pn and Pn,phot  represented a greater proportion of gross photosynthesis under N-
deprived conditions for bicarbonate amended biofilms, whereas for biofilm samples that did not 
receive bicarbonate the reverse was observed i.e., Pn and Pn,phot  represented a greater proportion 
of gross photosynthesis under nutrient replete conditions.  
Dark respiration and photorespiration are the two basic types of respiration that occur in 
photosynthesizing microalgae. Dark respiration is assumed to be constant and occurs both in the 
light and dark, whereas photorespiration is mostly active in the light and for a few seconds after dark 
incubation.507 The dark respiration term (Rdark) was obtained as the slope of the initial portion of 
the O2 microprofiles (linear part) in the dark. The light respiration terms (Rlight and Rphot) were 
determined as the difference between Pg and Pn, and between Pg and Pn,phot, respectively. Although there was no 
clear trend observed for respiration rates (Rlight and Rphot) across nutrient conditions, addition of 
bicarbonate to the biofilms revealed some differences. For biofilms cultured under N-replete 
conditions, higher areal respiration rates (Rlight and Rphot) were observed in bicarbonate amended 
biofilms compared to biofilms that did not receive bicarbonate (Table H.1). This may have been 
due to the higher photosynthetic rates and subsequent increase in oxygen concentration in the 
biofilms amended with bicarbonate during N-replete culturing (Fig. H.4). For algal biofilms 
cultured under N-deprived conditions, lower Rlight and Rphot were observed with added 
bicarbonate compared to biofilm samples without bicarbonate (Table H.1). This indicated that 
addition of bicarbonate reduced light respiration in N-deprived biofilms possibly due to an 
increased DIC supply.  
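The light respiration terms and the percentages of gross photosynthesis in Table H.1 follow from simple differences and ratios. The sketch below reproduces them for the bicarbonate amended, N-replete column, using the rounded rates printed in the table; the function names are illustrative, and small discrepancies in the percentages are expected because the tabulated rates are themselves rounded.

```python
def light_respiration(gross, net):
    """Respiration in the light as gross minus net photosynthesis."""
    return gross - net


def percent_of_gross(rate, gross):
    """Express an areal rate as a percentage of gross photosynthesis."""
    return 100.0 * rate / gross


# Bicarbonate amended, N-replete values from Table H.1 (umol O2 cm-2 s-1).
pg = 6.27e-4
pn = 2.43e-4
pn_phot = 2.99e-4

r_light = light_respiration(pg, pn)       # whole biofilm
r_phot = light_respiration(pg, pn_phot)   # photic zone only

print(f"Rlight = {r_light:.2e} ({percent_of_gross(r_light, pg):.1f}% of Pg)")
print(f"Rphot  = {r_phot:.2e} ({percent_of_gross(r_phot, pg):.1f}% of Pg)")
print(f"Pn     = {percent_of_gross(pn, pg):.1f}% of Pg")
```

By construction, the net and respiration percentages for a given zone sum to 100% of Pg.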
Dark respiration measurements were greater for N-deprived biofilms indicating a higher 
capacity for heterotrophic (or light independent) respiration. The influence of bicarbonate 
addition on Rdark values varied with nutrient condition. For example, N-replete cultures had 
higher Rdark in biofilms that did not receive bicarbonate, whereas higher Rdark were observed in 
biofilms amended with bicarbonate for N-deprived cultures (Table H.1).   
Biofuel precursor production 
Extractable biofuel precursor molecules (FFAs, MAGs, DAGs and TAGs) and total 
biofuel potential (as FAMEs, i.e. extractable and non-extractable molecules) for each biofilm 
type, both before and after N-starvation, were measured and are presented in Table H.2 and Fig. 
H.5. An increase in total extractable precursor concentrations was observed in the biofilms after 
the 120 h N-starvation period (Table H.2). Stressed microalgae have been reported to accumulate 
TAG as a carbon and energy storage material.508 The sum of extractable precursors increased 
from 5.62% to 7.13% (w/w) for biofilms amended with bicarbonate and from 4.84% to 5.18% (w/w) 
for the biofilms that did not receive bicarbonate (Table H.2). Although the FFA, 
MAG, and DAG concentrations remained relatively constant, approximately twice as much TAG accumulated 
in the biofilms after N-starvation, leading to the overall increase in total biofuel precursor 
molecules (Table H.2). Bicarbonate amended algal biofilms had a higher weight percentage of 
extractable molecules.  
 
 
 
 
 
 
 
Table H.2 Total and percent composition of extractable biofuel precursor weight (%) in 
laboratory grown algal biofilms with and without bicarbonate amendment. 
Precursor molecules,        Nutrient replete                    Nutrient deplete
Weight % (w/w)              Bicarbonate(a)    No bicarbonate(a) Bicarbonate(a)    No bicarbonate(a)
C14 FFA                     1.44|0.01         0.67|0.14         1.07|0.18         1.23|0.15
C16 FFA                     1.11|0.26         1.49|0.03         0.86|0.58         1.45|0.21
C18 FFA                     0.73|0.21         1.53|0.05         0.48|0.35         0.79|0.13
C16 MAG                     0.11|0.05         0.18|0.01         0.11|0.09         0.13|0.03
C18 MAG                     0.11|0.01         0.16|0.01         0.09|0.04         0.13|0.02
C16 DAG                     0.09|0.03         0.10|0.02         0.09|0.05         0.10|0.00
C18 DAG                     0.21|0.06         0.19|0.06         0.19|0.06         0.17|0.01
C16 TAG                     0.49|0.41         0.17|0.08         0.58|0.41         0.35|0.15
C18 TAG                     1.32|1.30         0.36|0.37         3.65|1.51         0.84|0.40
Sum of extractables
Weight % (w/w)              5.62|1.08         4.84|0.71         7.13|0.57         5.18|0.42
Areal concentration (g m-2) 1.03              1.22              1.30              1.30
(a) Mean and range (|) for n=2
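As a consistency check on Table H.2, the sketch below sums the mean weight percentages of the individual precursors in each column and computes the fold change in total TAG (C16 + C18) after N-starvation. The values are the means transcribed from the table; the column keys and variable names are illustrative. Column sums can differ from the reported sums by about 0.01 because the individual means are rounded.

```python
# Mean weight % (w/w) from Table H.2, ordered as:
# (replete + bicarbonate, replete no bicarbonate, deplete + bicarbonate, deplete no bicarbonate)
precursors = {
    "C14 FFA": (1.44, 0.67, 1.07, 1.23),
    "C16 FFA": (1.11, 1.49, 0.86, 1.45),
    "C18 FFA": (0.73, 1.53, 0.48, 0.79),
    "C16 MAG": (0.11, 0.18, 0.11, 0.13),
    "C18 MAG": (0.11, 0.16, 0.09, 0.13),
    "C16 DAG": (0.09, 0.10, 0.09, 0.10),
    "C18 DAG": (0.21, 0.19, 0.19, 0.17),
    "C16 TAG": (0.49, 0.17, 0.58, 0.35),
    "C18 TAG": (1.32, 0.36, 3.65, 0.84),
}
reported_sums = (5.62, 4.84, 7.13, 5.18)

column_sums = [sum(row[i] for row in precursors.values()) for i in range(4)]
tag_totals = [precursors["C16 TAG"][i] + precursors["C18 TAG"][i] for i in range(4)]

# Fold change in total TAG after N-starvation (deplete / replete).
tag_fold_bicarbonate = tag_totals[2] / tag_totals[0]
tag_fold_no_bicarbonate = tag_totals[3] / tag_totals[1]

for computed, reported in zip(column_sums, reported_sums):
    print(f"computed {computed:.2f} vs reported {reported:.2f}")
print(f"TAG fold change: {tag_fold_bicarbonate:.2f} (bicarbonate), "
      f"{tag_fold_no_bicarbonate:.2f} (no bicarbonate)")
```

Both fold changes come out slightly above 2, consistent with the roughly twofold TAG increase noted in the text.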
The total FAME-weight percent and yield for N-replete and N-starved biofilms with or 
without bicarbonate amendment were similar (Fig. H.5A and H.5C). Although the total FAME 
potential ranged from 12 – 20% (w/w) of the biomass (Fig. H.5B), the total extractable lipids 
were less than 10% (w/w) (Table H.2). As previously reported by Bernstein et al. (2014), the 
most notable difference regarding lipid production in the RABR-grown algal biofilms was the 
difference in the total extractable weight percent of lipids between the N-replete and N-deplete 
conditions (Table H.2). Depletion of nitrogen and addition of dissolved inorganic carbon in the 
medium were not effective in stimulating substantial lipid production in the microalgal biofilms. 
Qualitative analysis of lipid profiles using images from CLSM showed the same result: the 
microalgal biofilms showed only a slight increase in lipids after N-starvation (Supplemental Fig. 
S4). Previous studies have attributed the inability of N-depletion in the growth medium to induce 
lipid production in algal biofilms to possible nutrient re-cycling within the biofilms and 
resilience of algal biofilms to environmental stress.153, 376 
 
 
Figure H.5 Total FAMEs and free fatty acid composition of the FAMEs. A: Mean percent 
FAME (w/w), B: percent lipids (w/w), C: areal concentration (g m-2). Error bars represent range 
(n=2). ND and NR represent nitrogen deprived and replete algal biofilms, respectively. 
Conclusions 
For this study, there was no significant difference in algal biofilm growth, nutrient 
removal, and lipid accumulation between algal biofilms amended with bicarbonate and those that 
did not receive bicarbonate. However, an increase in photosynthesis rates was observed in algal 
biofilms amended with bicarbonate. The influence of bicarbonate on photosynthetic and 
respiration rates was especially noticeable in biofilms that experienced nitrogen stress, as 
compared to biofilms in nutrient replete conditions.  
Medium N-depletion may not be a suitable stimulant for lipid production in algal biofilms; 
focusing instead on optimizing growth, nutrient removal rates, and/or biomass productivities may be 
more beneficial.  
Appendix H: Supplementary Data 
Supplementary data associated with this article can be found, in the online version, at 
http://dx.doi.org/10.1016/j.biortech.2014.12.082.