Chapter 1
GENOMES
The genome is the sum total of all the genetic information required to reproduce a particular organism. A genome contains all of the genes that encode the proteins and RNA molecules necessary for building and maintaining an organism, as well as all of the genetic information required to regulate the expression of these genes so that their products appear at the correct time during development and in the correct cell type. All of this genetic information, inherited from one generation to the next, constitutes the genome. Genomes are found both in cellular organisms and in the viruses and bacteriophages that replicate in them.
What all cellular organisms share is the molecule that encodes their genome: the double-stranded form of deoxyribonucleic acid commonly known as duplex DNA, double-stranded DNA, or simply DNA. Genomes of viruses, on the other hand, consist of double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), or double-stranded or single-stranded ribonucleic acid (RNA). Cellular organisms can also harbor episomes, small duplex DNA molecules that replicate independently of the cell’s chromosomes. In bacteria, episomes are also referred to as plasmids. Bacterial plasmids are circular dsDNA molecules that replicate autonomously in their host cell. Their size varies from 1 to over 400 kilobase pairs (kbp), and cells contain from one to hundreds of copies, with a typical copy number of between 1 and 20. The reason that viruses and episomes cannot reproduce themselves outside of their host cell (and are therefore not alive) is that part of the genetic information they require is encoded by the genome of their host cell. One bizarre infectious agent, the prion, depends completely on the DNA genome of the host. Prions are infectious particles composed only of protein that propagate by refolding abnormally into a structure that is able to convert a normal cellular protein into an abnormal cellular protein (the prion).
Genomes need not be contained within a single, huge molecule. In fact, genomes greater than approximately 10 kbp are generally segmented into molecules of smaller size, and one or more segments of a genome may be stored in different places. Among the eukarya, most of the genome is stored within the nucleus, but a relatively tiny portion is stored within the mitochondrion in animals and additionally in the chloroplast in plants. Each of these DNA molecules encodes genes specific for the function of the organelle that harbors it. Since these organelles are responsible for generating ATP, their small DNA molecules (16–2600 kbp) are nevertheless important for the survival of the organism.
DNA does not exist alone in nature; it exists only in association with proteins. DNA in all cellular organisms is organized into a DNA-protein complex called chromatin in which the acidic (i.e. negatively charged at neutral pH) DNA is wrapped around a stable complex of basic (i.e. positively charged at neutral pH) proteins. These basic proteins are called histones in eukarya, histones or histone-like proteins in archaea, and histone-like proteins in bacteria. They bind DNA tightly to form a structure called the nucleosome. Although the composition and organization of chromatin differ between bacterial, archaeal, and eukaryal chromosomes, the principle is the same: compacting DNA into a small space while still providing a dynamic structure that permits variation in the expression of its genes. Many other proteins also bind to chromatin. Some of them bind to the DNA, some to other proteins.
The small genomes of bacteria, archaea, viruses, and episomes are stored in a single piece of chromatin called a chromosome. These chromosomes are usually, but not always, circular. The much larger genomes of eukarya are stored in many individual chromosomes. Eukaryal chromosomes are normally linear and terminated at each end by a unique structure called the telomere. When chromosomes are duplicated during cell division, the two sibling chromosomes are called sister chromatids. The number of copies of each chromosome per cell, referred to as ploidy, varies among bacterial and archaeal species, which range from haploid organisms (containing a single copy of their genome) to polyploid organisms (containing many copies of their chromosome). Single-cell eukaryotes, such as yeast, can exist in both a haploid and a diploid (two copies of the genome) state. Gametes (eggs and sperm) of plants and animals are haploid, but most of their other cells are diploid. Some specialized cells of plants and animals are polyploid, but they are terminally differentiated and do not undergo cell division.
GENOME COMPOSITION
DNA
The modern science of genetics, the study of heredity and variation in living organisms, began with the observations by Gregor Mendel (published in 1866, but largely ignored for the next 35 years) that organisms inherit traits in discrete units. Two concepts emerged from Mendel’s work. The “Law of Segregation” states that when individuals within a species reproduce, each gamete receives only one copy of each hereditary unit. The “Law of Independent Assortment” states that each hereditary unit assorts independently of the others during gamete formation. Mendel’s hereditary units are what we now call genes, and the second law is true only for genes on different chromosomes.
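Mendel's two laws can be illustrated with a minimal Python sketch. It assumes a diploid parent whose genes each lie on a different chromosome, so that the second law applies; the function name `gametes` and the allele labels are hypothetical, chosen only for this illustration.

```python
from itertools import product


def gametes(genotype):
    """Enumerate the gametes of a diploid parent.

    `genotype` is a list of (allele, allele) pairs, one pair per gene.
    Assumption: every gene lies on a different chromosome.
    """
    # Segregation: each gamete takes exactly one allele from each pair.
    # Independent assortment: the choice made for one gene places no
    # constraint on the choice made for another, so every combination
    # of one allele per gene appears in the result.
    return ["".join(combo) for combo in product(*genotype)]


# A parent heterozygous for two genes (hypothetical alleles A/a and B/b).
dihybrid = [("A", "a"), ("B", "b")]
print(gametes(dihybrid))  # ['AB', 'Ab', 'aB', 'ab']
```

Each gamete carries one allele of each gene, and a parent heterozygous for n unlinked genes yields 2^n distinct combinations. Genes on the same chromosome would not follow this simple enumeration.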
The discovery of what genes are and how they are duplicated each time a cell reproduces itself began in 1869 when Friedrich Miescher, a student in the laboratory of Felix Hoppe-Seyler at the University of Tübingen, Germany, discovered DNA. At that time, it was thought that cells consisted largely of protein, but Miescher noted the presence of something in pus cells that “cannot belong among any of the protein substances known hitherto.” Since it was derived entirely from the cell’s nucleus, he named it nuclein. Subsequently, Miescher discovered that nuclein could be obtained from many other cells, and that it was unusual in that it contained phosphorus as well as carbon, oxygen, nitrogen, and hydrogen. Miescher’s work was not published until 1871, because Hoppe-Seyler insisted on first confirming its reproducibility. Miescher died in 1895, long before the significance of his discovery was appreciated. By 1893, another student of Hoppe-Seyler, Albrecht Kossel, had discovered that nuclein contained four nucleic acid bases, for which he received the Nobel Prize in Physiology or Medicine in 1910. Nevertheless, the credit for laying the foundation for determining the structure of nucleic acids belongs to Phoebus Levene. At the beginning of the twentieth century, Levene and his coworkers at the Rockefeller Institute in New York City elucidated the structures of both sugar components of the nucleic acids, established the nature of the nucleosides and nucleotides, and proposed a structure for nucleic acids that consisted of a linear combination of nucleotides with the correct internucleotide linkage. However, the biological significance of DNA remained unrecognized until 1944. The complexity of proteins and polysaccharides had long been established by that time, but the chemistry of nucleic acids suggested that they were simple polymers of a tetranucleotide repeat. It is doubtful that anyone seriously thought that such a simple chemical substance as DNA could encode all the genetic information required to reproduce a living organism.
The first step toward identification of DNA as the genetic material was taken in 1928 with the discovery of genetic transformation of bacteria by the British microbiologist Frederick Griffith. In trying to understand the nature of pneumonia, a leading killer of the day, Griffith observed that if he inoculated mice with a nonvirulent form of Streptococcus pneumoniae together with a heat-killed virulent form of this bacterium, the nonvirulent form readily became virulent. Griffith’s discovery provided a way to identify the chemical nature of genetic information. But the true nature of the genetic material awaited...