· 1 ·

OBLIGATE PARASITES of CELLS

THE STORY OF VIRUSES begins in 1879 at the Agricultural Experiment Station in Wageningen in the Netherlands. In the mid-part of the nineteenth century, a disease ravaged the tobacco crop. It was so severe in some regions that it “caused the cultivation of tobacco to be given up entirely” (Zaitlin 1998). Adolf Mayer christened the disease tobacco mosaic disease, as it manifested in darkened patches on the leaves of the tobacco plants. Mayer was looking for the cause of the disease when he observed that the juice extracted by grinding up the leaves of a diseased plant could pass on the disease to a healthy plant. He rightly concluded that a transmissible infectious agent was responsible for the disease of the tobacco crop. However, his experimental results did not suggest to him that the agent was anything other than a microbe.
In 1892 in St. Petersburg, Russia, Professor Dimitri Ivanowski demonstrated that the same transmissible agent could pass through a porcelain filter. The filter, invented by Louis Pasteur and Charles Chamberland, was designed to have a pore size that retained bacteria, since it permitted only particles smaller than 0.5–1.0 microns in diameter to pass through. Ivanowski’s results ruled out bacteria but he concluded that the disease-causing agent was most likely a bacterial by-product or a toxin. A few years elapsed before Martinus Willem Beijerinck, a Dutch scientist, refined the concept of the infectious principle. It was certainly not a bacterium. It could not be coaxed to grow in the laboratory in a nutritional medium that typically supported bacterial growth. He proposed that the infectious agent required close association with the metabolism of living plant cells for propagation. The infectious principle evidently depended on them for growth. He described the clear infectious filtrate as contagium vivum fluidum—a “contagious living fluid” (Bos 1999).
At the turn of the century, scientists had no tool other than the Chamberland filter to describe the physical nature of viruses. They were infectious entities small enough to pass through its pores, defined only by their diminutive size. It would take another forty years before tobacco mosaic virus particles themselves would be isolated and described as an “enzyme-like protein,” and later characterized as a nucleoprotein, a particle containing both protein and nucleic acids.
Some twenty years after these first observations of a virus infecting a plant, another tandem effort of scientific discovery revealed viruses that infect prokaryotic cells. The English doctor Frederick Twort was studying the bacterium Staphylococcus because it was a frequent contaminant of cowpox lesions that he collected for use in the preparation of smallpox vaccine. While examining the bacterium in culture, he observed clear patches on the surface of the small bacterial colonies growing on his culture plates. He interpreted them, quite correctly, to be the result of the destruction of bacterial cells and hence a disease of the microorganism. He found that the “disease” could be passed from one colony to another and that the agent passed through a filter, just as Beijerinck had observed for the infectious agent of tobacco mosaic disease. Although Twort believed the disease-causing principle which destroyed bacterial cells was probably an enzyme or toxin, the key properties of a virus were met (Twort 1915).
Perhaps Twort did not recognize the real significance of his observations, but Félix d’Hérelle, a Québecois scientist working at the Pasteur Institute in Paris, soon did. He advanced the discovery of bacteria-infecting viruses one step further. He observed a filterable “antagonistic microbe” that killed Shigella dysenteriae, rendering the bacterial cultures clear. D’Hérelle wrote, “The disappearance of the dysentery bacilli is coincident with the appearance of an invisible microbe ... [it] is an obligate bacteriophage” (D’Hérelle 1917). This was the first use of the term “bacteriophage,” which means “bacteria eating.” He had discovered what we now know to be the group of viruses that make up the vast majority of the virosphere. They are parasites of prokaryotes, the organisms that make up the ancient bacterial and archaeal domains of life.
Although Ivanowski, Beijerinck, and Twort grappled with the nature of the infectious agent (a bacterium, a toxin, or an enzyme), today there is a wealth of biochemical, physical, and molecular descriptions of viruses. A dictionary definition of virus might read: an infective agent that typically consists of a nucleic acid molecule in a protein coat, is too small to be seen by light microscopy, and is able to multiply only within the living cells of a host. This is an apt description of a virus, but it has some shortcomings. The use of the qualifier “typically” is telling. Most viruses do adhere to these principles, but there are notable exceptions. Some viruses get along just fine without a protein coat and some have particle sizes larger than some bacteria (refer to Chapter 8). To formulate an understanding of their fundamental nature, it is worth exploring a more refined and inclusive definition of viruses.

The Virosphere and Its Metagenome

The virosphere is the collective of all viruses in all ecosystems, and in all hosts in the biosphere. Notionally, when we think “virus,” we think of virus particles and their nucleic acid contents. It is the nucleic acids, ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), polymers of ribonucleotides or deoxyribonucleotides respectively, that constitute the essential genetic blueprints of viruses. The genetic code of the nucleic acid that makes up the viral genome contains the information fundamental to its distinct identity. Just as different species of living organisms have different genetic blueprints recorded in their genome sequences, so too do viruses. Today it is possible to visualize different viruses under an electron microscope. This may well reveal particles that are indistinguishable in shape and size, which can belie their differences; their unique identity lies in the information encoded in their genomes, and that information may be distinctly, even radically, different. The true diversity of the virus world can only be realized when their genetic contents, their individual unique bar codes, are catalogued and compared. It is therefore useful to consider the virosphere not simply in terms of the collective of distinct species of viruses but as the collective of their genetic informational content: the viral metagenome.
A metagenome catalogues the collective genomes of all of the organisms that can be recovered from an environmental sample. An “environmental sample” may be a gram of soil, a milliliter of seawater, or an organism, each of which represents a distinct ecosystem. The most inclusive use of the term collects the genomes in the biosphere, and this includes the genomic information of all living organisms and their viruses. The human metagenome captures the collective of genomes associated with the human body and therefore includes not only our own genome sequence, but also those of the organisms making up the microbiota that shares our body space. These symbiotic bacterial and archaeal cells constitute the human microbiome and occupy our external surfaces: our skin, the mucosal epithelia of the gut, the nasal and oral cavities, and our genital tracts. The human virome is the aggregate of viruses that infect both our own body cells and those of our microbial passengers. Their respective gene complements would be considered their metagenomes.
The study of metagenomes has been made possible by major technological advances in molecular biology. It is rooted in our ability to read and interpret the nucleotide sequences of the genetic material of organisms and viruses in a given sample. Prior to this development, the recognition and identification of microorganisms and viruses in a given sample was strictly limited to those that could be grown in culture or directly observed under the microscope. Today the detection of nucleotide sequences in even tiny samples of environmental or biological material can be used as effectively as a fingerprint to identify microbes and viruses.
Over the last decade researchers have used these tools to probe for potential links between the composition of the human microbiome and health and disease. It is estimated that this microbial community is made up of 75 to 200 trillion individual microbes, a number comparable to the 100 trillion cells that make up the human body. An equally astonishing fact is that for each of the trillions of microbial cells there may be tenfold more viruses! This population of viruses, largely bacteriophages (phages for short), is the major contributor to the human virome. The remaining contributors to the human virome are the human viruses, those that infect our own cells. Although still poorly understood, the three-part interplay between the human body, our microbiome, and our virome is increasingly considered central to our health, and very often to our diseases.
Key tools in the exploding field of metagenomics are new generations of DNA sequencing technologies and sophisticated computational tools. Scientists can determine the nucleotide sequence of trace amounts of DNA from multiple organisms in a single sample. It is no longer necessary to culture the organisms separately and isolate the DNA from each organism. Massively parallel DNA sequencing allows complex mixtures of DNAs to be sequenced simultaneously. Together with sophisticated bioinformatic algorithms, the different DNA sequences and their relative abundance in the sample can be determined. Once it became unnecessary to culture organisms to characterize and catalogue their genome sequences, the principal barrier to researching the biology of our microbiome was overcome. In fact, though the vast majority of microbial species that make up the microbiome cannot currently be cultivated outside the body, today massively parallel sequencing can identify which organisms are present and in what abundance in samples of the gut microbiota. A key factor facilitating this analysis is that, without exception, the chromosomes of cellular life-forms encode genes required to build ribosomes. These are the biological machines responsible for interpreting the messenger RNA templates and manufacturing proteins from amino acids. The 16S ribosomal RNA (rRNA) genes of the small subunit of the prokaryote ribosome have been particularly well conserved throughout evolutionary history. Small differences in the sequence of these highly conserved genes allow accurate deduction of phylogenetic relationships between bacterial species. Comparing these unique “fingerprint” rRNA gene sequences with DNA sequences stored in genomic databases, researchers rapidly identify the bacterial or archaeal species in a given sample. The frequency of the particular rRNA gene sequence in the DNA sequence data indicates its relative abundance in the sample.
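The counting logic behind this kind of survey can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the reference "database" and the taxon names are invented, the signature strings are far shorter than a real 16S rRNA gene (roughly 1,500 nucleotides), and reads are matched exactly, whereas real pipelines tolerate small sequence differences.

```python
from collections import Counter

# Hypothetical reference database: made-up 16S signature fragments -> taxon.
# These strings and names are illustrative assumptions, not real sequences.
REFERENCE_16S = {
    "ACGTTGCA": "Taxon A",
    "ACGTTGGA": "Taxon B",
    "TCGATGCA": "Taxon C",
}


def relative_abundance(reads, reference=REFERENCE_16S):
    """Return {taxon: fraction of classified reads}.

    A read is classified only if it exactly matches a reference signature;
    unmatched reads are ignored rather than guessed at.
    """
    counts = Counter(reference[read] for read in reads if read in reference)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Example sample: three reads from Taxon A, one from Taxon C, one unknown.
sample_reads = ["ACGTTGCA"] * 3 + ["TCGATGCA"] + ["GGGGGGGG"]
abundances = relative_abundance(sample_reads)
```

Here the frequency of each signature among the classified reads stands in for the relative abundance of the corresponding organism, which is the idea described above.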
Unfortunately, no such tool exists to assist viral metagenomics. Accordingly, its progress lags behind microbial metagenomics. We cannot classify viruses in a given sample using the same approach used for prokaryotes. Virus genomes have no rRNA genes since they do not encode their own protein synthesis apparatus. Furthermore, virus genomes exhibit an unprecedented and quite remarkable diversity of genes and gene sequences. In fact, there is not a single gene or descendant of a single gene that can be found in all virus genomes; no unique viral fingerprint can be used to deduce their presence in a sample and determine their phylogenetic relationships.
Families of related viruses do, however, share similar replicative strategies and consequently have in common certain types of enzymes or structural proteins that are intrinsic to their respective lifestyles. Such genes have nucleotide sequence similarities that allow deduction of viral lineage relationships. Integrase proteins are examples of viral enzymes possessed by many different viruses. Although they can be quite different and highly divergent in amino acid sequence, integrase-related proteins are found in most viruses that integrate their genome into the host chromosome as part of their life cycle. Equally, many virus families employ capsid proteins. Despite the genetic diversity in the viral world, only three different structural templates for capsid proteins have been observed. It appears that only a limited number of viable solutions to the “problem” of virus capsid construction have evolved. Capsidated viruses all have related capsid proteins patterned on one or another of these three different three-dimensional templates. It is these protein amino acid sequence signatures, together with powerful computational tools, that the viral genomics scientist relies on to divine the origin and relatedness of viral sequences in a sample. It is not an exact science, and is complicated by the fact that only a fraction of viral sequences has been catalogued and recorded in genomic databases. It is also confounded by the rapid evolution of viral genes, as well as by the promiscuity of viral genetic information, frequently exchanged, lost, and gained. It is fair to say then that any assessment of the complexity of a viral metagenome is likely to be an extremely conservative estimate. Our computational methods detect similarity between a sample viral nucleotide sequence and those in existing viral databases. Truly novel sequences or those that may have evolved to have no perceptible similarity to known virus genes cannot be definitively assigned to a virus species.
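A toy version of this database lookup, and of its blind spot, might look as follows. It is a sketch under stated assumptions: it swaps in a simple k-mer overlap score (Jaccard similarity) for the alignment and profile methods that real tools use, and the protein sequences, names, and the 0.3 threshold are all invented for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Shared k-mers divided by total distinct k-mers of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(query, database, threshold=0.3, k=3):
    """Return (name, score) of the most similar database entry.

    If no entry reaches the (hypothetical) threshold, return (None, score):
    the query stays unassigned, like a truly novel viral sequence.
    """
    best_name, best_score = None, 0.0
    for name, seq in database.items():
        score = jaccard(query, seq, k)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Made-up amino acid sequences standing in for catalogued viral proteins.
KNOWN_PROTEINS = {
    "integrase_like": "MKTAYIAKQR",
    "capsid_like": "GSHMLEDPVV",
}

related = best_match("MKTAYIAKQL", KNOWN_PROTEINS)   # one residue differs
novel = best_match("WWWWWWWWWW", KNOWN_PROTEINS)     # nothing in common
```

The second query illustrates the limitation discussed above: a sequence with no perceptible similarity to anything in the database simply goes unassigned, so it is never counted toward the diversity of the sample.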
Today, scientists are exploring the viral metagenomes in a variety of ecosystems. It is no great technical challenge for researchers to enumerate viruses in natural bodies of water. Quantitation of nucleic acids recovered from virus particles, isolated by passing samples of ocean water through a 0.5 micron filter, revealed an astonishing fact: each milliliter of seawater teems with 1 million microbial organisms, but there are 10 to 100 million viruses in the same sample (Bergh et al. 1989). The ocean at the seaside is a solution of virus particles. Conservatively then, it can be estimated that the virosphere is composed of 10^31 individual viruses and that they are the most abundant biological entities on earth, outnumbering the Bacteria and Archaea by a factor of 10 (Brüssow and Hendrix 2002; Suttle 2007; Breitbart and Rohwer 2005). To an alien with the sensorial ability to detect both the microscopic and the macroscopic world, we and the other members of the Eukarya would be lost in the earthly crowd; we are that tiny a minority in the planetary community.
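The arithmetic behind these figures is simple enough to check directly. The short Python sketch below uses only the numbers quoted above; the variable names are illustrative, and the prokaryote total it derives is merely what the quoted factor of 10 implies, not an independent measurement.

```python
# Per-milliliter seawater counts quoted in the text.
MICROBES_PER_ML = 1_000_000
VIRUSES_PER_ML_LOW = 10_000_000
VIRUSES_PER_ML_HIGH = 100_000_000

# Viruses per microbial cell in the same sample: between 10 and 100.
ratio_low = VIRUSES_PER_ML_LOW // MICROBES_PER_ML
ratio_high = VIRUSES_PER_ML_HIGH // MICROBES_PER_ML

# Global estimate quoted in the text: ten to the power 31 viruses,
# outnumbering Bacteria and Archaea by a factor of 10.
TOTAL_VIRUSES = 10**31
CONSERVATIVE_FACTOR = 10
implied_prokaryotes = TOTAL_VIRUSES // CONSERVATIVE_FACTOR

print(f"viruses per microbe in seawater: {ratio_low} to {ratio_high}")
print(f"implied prokaryote count: {implied_prokaryotes:.0e}")
```

The global factor of 10 sits at the low end of the range measured in seawater, which is why the estimate is described as conservative.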

Complexity and “Dark Matter”

The diversity of viral genetic information recovered from environmental samples is, quite simply, astonishing. The field of marine viral metagenomics emerged when researchers identified viruses in seawater using massively parallel DNA sequencing. Since the first decade of this century the field has advanced rapidly. Professor Forest Rohwer, a marine ecologist at San Diego State University in California, is one of the pioneers of the field. He and his collaborators were some of the first scientists to exploit technologies for viral metagenomic analysis. In 2006 they reported one of the most comprehensive global studies of the marine viral metagenome (Angly et al. 2006; Suttle 2007). They collected and analyzed samples from more than sixty sites in four oceanic regions, sequencing virus DNA from the waters of the Gulf of Mexico, coastal western Canada, the Arctic Ocean, and the Sargasso Sea. Their studies cracked open the door, allowing a first view into the inscrutable world of marine viral populations and their ecology. With data from numerous other expeditions, a coherent picture has emerged, revealing that viral populations in the oceans are extremely diverse (Suttle 2007). Among the trillions of virus particles found in 100 liters of seawater, there are many thousands of distinct viral species, each with a discrete genetic blueprint, or genotype. More than a million different genotypes can be found in 1 kilogram of sediment taken from the ocean floor. Most phages are widespread and found around the globe (they are everywhere), but their relative abundances in different locations vary a great deal. Different environmental conditions must therefore have a profound influence on the prevalence of each virus and class of viruses found in different locales (Angly et al. 2006; Breitbart and Rohwer 2005).
Although the oceanic virome is the most intensively studied to date, an increasing wealth of research explores the virome in other ecological niches. Metagenomes of halophilic or thermophilic bacteria and archaea are studied in salt lakes and hot springs. The hypolithic microbial communities on the underside of translucent rocks in the hyperarid Namib Desert offer another opportunity for study (Adriaenssens et al. 2014). Viruses in these collective ecosystems make up the numerical majority in the virosphere. They predominantly infect bacteria and archaea, but interact with them in many different ways. They exploit wide-ranging strategies, which have one thing in common: the singular goal of replicating and perpetuating their genetic information. Their genomes, encoding all the information that dictates their lifestyles, can be made of RNA or of DNA and may take many different forms: single- or double-stranded, linear or circular, or even segmented. They can in some rare instances be shared (when the genetic information in the viral genome is insufficient and the missing genes are provided by a second helper virus genome). Viruses have the potential for rapid evolution. This is possible due to a number of factors, including the huge complexity of virus populations and their short generation times resulting from fast iterative replicative cycles. Here, the genetic complexity to which I refer is a reflection not only of the large number of individual viruses but also of the large diversity in their genetic information content. Another major catalyst of viral evolutionary rates is the promiscuity and the proficiency with which they exchange genetic information, both with each other and with their hosts. Finally, the error-prone nature of viral replication turns out a multitude of inexact copies whose mutations contribute to the genetic diversity of the population.
Viruses ride on swift evolutionary currents, propelling their own evolution and the adaptive evolution of their hosts. The viral metagenome is a veritable smorgasbord of useful genetic functionality. It evolves in service to the success of the viral genome but if assimilated by hosts it can provide new competitive advantage for survival in changing and hostile environments.
Scientists estimate that each second, 10^25 phages initiate infection of a host cell; each of these viruses breaks down into component parts before its genetic blueprint directs the man...