CHAPTER 1
| Mechanisms of Mutation |
BERNARD S. STRAUSS Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, USA | | |
INTRODUCTION
Mutation is a sudden, inheritable change in the genome [1]. The change needs to be sudden, that is, the change must be present in one cellular generation but not in the preceding generation. Mutations were originally defined as necessarily involving the gametes, but somatic mutations are now recognized as an important process. The phenomenon of recombination will often couple two sequences to give a molecule with such novel properties that it appears to have been a mutation, but geneticists have not usually classified such events as mutations. As will be seen in the discussion of chromosome aberrations (below) this separation between recombination and mutation is not clear-cut. The change also needs to be transmitted. An alteration in the DNA structure that prevents its replication is an inactivation, but not a mutation. On the other hand there are reversible changes possible in the structure, but not the coding properties of the nucleotides, for example, cytosine or adenine methylation. These changes can be inherited, but since the pairing properties of the nucleotide are not altered they are considered as epigenetic change, notwithstanding the large effect such modifications can have on function. Mutations may involve single nucleotides, in which case we speak of point mutations. A few or many nucleotides may be added or deleted. Whole genes or groups of genes may be deleted, duplicated or moved to other chromosomes, or whole chromosomes may be added or lost as a result of errors in cell division. All such changes may have drastic effects on the life of the organism. On the other hand, many changes in the DNA have no discernible effect on function. As long as the interest of geneticists concentrated on Mendelian traits (or phenotypes), attributable to the action of a single gene, it was relatively simple to distinguish functionally significant mutations. The discovery that humans may differ at approximately one nucleotide in every thousand or, given a genome size of about three billion, at three million possible sites, the single nucleotide polymorphisms or SNPs complicate the attempt to determine which changes are significant [2]. An additional complexity comes from the discoveries that differences in sequence statistically associated with different diseases are often found to be located in gene deserts, regions in which there are no known genes [3].
Although only about 25,000 protein coding genes have been recognized, these account for less than 5% of the total DNA. However, it appears that almost half (50%) of the total DNA is transcribed. At least some of these transcripts have a function. Up to one thousand are processed into small (20ā23 nucleotide) sequences called microRNA (miRNA) with key roles in regulating a variety of cell functions. The key players in this regulation are nucleotides 2 to 8 of the miRNA, the so-called āseed regionā [4, 5].
Insofar as the sequences coding these miRNAs have a physiological effect on the organism, they need to be considered as genes in the same sense as protein coding genes. The DNA giving rise to these miRNAs is subject to mutation by the same mechanisms for base change discussed in the chapter. In addition, āRNA editingā changes the pairing specificity of sequences by conversion of particular adenosines (A) to inosine (I) by deamination. It is conceivable that alterations in the editing system could result in major changes in the function of particular miRNAs. Recently two point mutations in the seed region of one of the miRNA genes, miR-96, have been shown to segregate in a Mendelian way and to be responsible for a form of deafness [6]. It is likely that single base changes in other miRNA genes will be identified with other cases of inherited disease.
The total amount of DNA involved in coding for micro-RNAs is small, even though the initial transcripts before processing are several kilobases long. Even 1000 several kilobase genes would only constitute about 0.1% of a total genome of 3.3 Ć 109 bases. However, we may yet find out that the 45% or so of transcribed DNA not accounted for, but mutable as a result of the processes described above, has some function. We are still at the beginning of our understanding of the structure of the human genome.
POINT MUTATIONS
We know more about the mechanisms by which point mutations occur and of their effects than any of the other changes (Table 1.1). The original association of a single point mutation with a particular disease was the discovery that sickle cell anemia was, in Paulingās words, a āmolecular diseaseā since all of its signs can be traced to a substitution of a glutamic acid by a valine in the hemoglobin molecule. This change was later shown to be due to a change from GAG (glutamic acid) to GTG (valine) [7]. The association of genetic change with a particular disease dates back at least a century to Garrodās discovery of the Mendelian inheritance of alkaptonuria, but Garrod had no idea of the nature of the genetic change. The second reason is methodological. Before the age of PCR and of DNA sequencing, mutations were most easily studied in the bacteria and the bacterial viruses. Ernst Freese and Seymour Benzer first systematized the possible single base changes. These investigators and their co-workers coined the terms ātransitionā to denote the change from one purine to another or of one pyrimidine to another. The four possible transitions are cytosine (C) to thymine (T) and its reverse, and adenine (A) to guanine (G) and its reverse. Freese and Benzer defined ātransversionsā as changes from a purine to a pyrimidine or the reverse. So a change from an A or a G to a C or a T, and the reverse C or T to A or G were defined as transversions. Shortly afterwards, Brenner pointed out that some of the putative transversions were actually additions or deletions of one or two nucleotides [8]. These were later called frameshifts, because of the discovery that the genetic code was formed of triplets, groups of three nucleotides read together to specify a particular amino acid. The addition (or deletion) of any number of nucleotides not divisible by three would result in a change in the reading āframeā of three nucleotides at a time, thereby changing the amino acid composition of all amino acids downstream of the coding change. The details of the genetic code, as elucidated in the 1960s, also indicated that such frameshifts might not only result in major changes in the amino acid composition of a protein but might also produce unexpected termination codons as a result of the shift. Point mutations that resulted in protein terminations were at first termed ānonsenseā mutations as opposed to missense mutations that resulted in the substitution of one amino acid for another. The nonsense mutations did not make āsenseā, i.e. did not specify any amino acid. There are three such codons, now called termination (ter) codons: UAA, UAG, and UGA. Since messenger RNA is the molecule that is actually read by the protein synthesizing machinery, the code is an RNA code with U(racil) substituted for T(hymine). One of the stop codons, UGA, is read as tryptophan in mitochondria and the mitochondrial code includes a few other variations: AGG and AGA are mitochondrial stop codons instead of coding for arginine, and AUA codes for methionine instead of isoleucine. There are 64 possible codons but only 20 (natural) amino acids and the code is degenerate, or redundant, in that several codons can specify the same amino acid. Point mutations within genes that do not change the meaning (amino acid coded) of the codon are termed synonymous or āsilentā as opposed to non-synonymous changes. Although there was some initial confusion about the necessity of punctuation between the triplet codons, it was realized that if reading of the code began at a fixed site, and if the reading āframeā was designed to read three nucleotides at a time, the correct sequence of amino acids would automatically be produced.
This terminology was developed before it was realized that there were large amounts of non-coding DNA. Even though it was (is) used, āframeshiftā has no meaning for mutations within such non-coding regions. A more recent term for small insertions or deletions, regardless of their physiological effect is āindelā. Notwithstanding, or because of the universal nature of the code, there still remain some mysteries. The codon AUG specifies methionine and it also signals āstartā, but not always. UGA specifies stop but some UGA codons specify the 21st amino acid, selenocysteine. The recoding of UGA is determined by the surrounding sequence, a characteristic stem loop located in the 3'-end of mammalian mRNAs.
MUTAGENIC AGENTS
It was first assumed that the fidelity of normal replication stemmed from the stability of the A:T and G:C base pairs resulting from hydrogen bonding. The success of the early workers on the molecular nature of mutations came from their ability to account for the specificity of a variety of mutagenic base analogs and other mutagenic agents. They were able to draw acceptable alternative base pairings resulting from the incorporation of these compounds into DNA, or by their distortion of the replication machinery. For a number of years research on mutational mechanisms consisted largely of formulations of how changes in DNA structure resulting from treatment with, or incorporation of, mutagenic chemicals could change the base pairing properties of the replicating DNA so that mistakes were made. The supposition by Bruce Ames that āCarcinogens are Mutagensā [9] (which did not, it turned out, mean that all carcinogens are mutagens) prompted chemists to study the changes in DNA structure produced by reaction with chemical and physical mutagens. Alkylating agents such as methyl nitrosourea and the chemotherapeutically active mustard gas derivatives were shown to react with individual nucleotides to produce multiple changes. Production of O6 methyl guanine by agents such as methyl nitrosourea or methyl nitronitrosoguanidine was shown to promote mistaken base pairing, making understandable the highly mutagenic characteristics of such compounds. A major development was the discovery that metabolic systems in the host activated ingested compounds making it possible for them to react with DNA. Carcinogenic polycyclic hydrocarbons and aflatoxins are converted to epoxide derivatives with the participation of the cytochrome p450 system. These epoxides react directly with DNA producing mutagenic adducts. The Ames assay, a bacterial test for mutagenic activity, was modified to account for such activation by the incorporation of a liver extract to media on which presumptive mutagens were tested [9].
TABLE 1.1 Definitions
We live in an environment that is essentially 55.6 Molar water. Given the law of mass action, hydrolytic reactions are inevitable. It has been estimated that we lose about 18,000 bases in every 24-hour period as a result of hydrolysis of the glycosidic bond [1]. The abasic sites so created are intrinsically mutagenic and organisms have devised a set of enzymes to survey and repair the damage. However, the most reactive mutagen in our environment is undoubtedly oxygen. Breathing, however unavoidable, is inherently dangerous! The electron transport chain by which ATP is generated results in the generation of reactive oxygen species (ROS) that produce the hydroxyl radical OH. When formed in proximity to DNA this species produces a variety of oxidation products, of which a guanine with a saturated imidazole ring (8-oxoguanine) is the most important. It is so important that enzymes utilizing different tactics are produced to overcome the effects [10]. The first, OGG1, is a glycosylase that removes the damaged base and sets off the base excision repair sequence. 8-oxoguanine in a DNA template strand has an increased probability of pairing with adenine resulting in G:CāT:A transversions. The human enzyme MYH (bacterial mutY homolog) removes As inserted opposite such damaged Gs. Base excision repair then occurs. Most of the time, the correct C will be inserted, but if not, the cycle can be repeated until an apoptic reaction ensues.
Some mutagenic changes, particularly those close to repeated sequences, seem to defy the specificity rules. Studies of the influence of the surrounding sequence on mutation permitted the deduction, supported by evidence with in vitro models, that what at first appeared to be base substitution mutagenesis might actually be the result of slippage errors in DNA replication [11]. Frameshifts (indels) are also likely to occur at regions of repeated sequence and the models account for such changes as a result of misalignments of either template or newly synthesized strand as a result of āslippageā during replication. Dissociation and reassociation of DNA strands occurs repeatedly and when there are repeated sequences, the reassociation may occur so that some of the repeated sequences in the newly synthesized strand āloop outā of the structure making addition of extra bases possible.
SPECIFICITY RULES
Most (early) considerations of mutational specificity and frequency focused on considerations of hydrogen bonding and on the base pairs suggested by Watson and Crick. Occasionally, an alternative pairing, the Hoogsteen base pair, was suggested to account for particular cases of specificity [12]. Notwithstanding the ability of such models to account for the data, it is not at all clear that hydrogen bonding is critical for successful DNA synthesis [13]. A non-polar thymine analog, (2,4 difluorotoluene) unable to form hydrogen-bonded base pairs can be incorporated selectively opposite adenine or an equivalent non-polar analog. The results obtained with such non-polar analogs depend on the polymerase used for the study. For some enzymes shape of the substrate is the determining factor; for others hydrogen bonding remains an important feature. Such studies with different polymerases indicate that the chemical nature of the substrate nucleotide is not the sole determinant of mutagenic specificity.
A favorite subject for study has been an abasic site, a point in the sequence at which a base has been removed without breaking the backbone DNA chain. In the absence of a base to serve as template, one might suppose that if replication were to proceed it would do so with bases added at random. In fact, this is not how it works [14]. Although the abasic site is a barrier to normal synthesis, some polymerases are able to insert bases opposite such sites and to extend DNA chains past the lesion. Bacterial replicative enzymes tend to ins...