Mathematics of Bioinformatics: Theory, Methods, and Applications provides a comprehensive format for connecting and integrating information derived from mathematical methods and applying it to the understanding of biological sequences, structures, and networks. Each chapter is divided into a number of sections based on the bioinformatics topics and related mathematical theory and methods. Each topic of a section comprises the following three parts: an introduction to the biological problems in bioinformatics; a presentation of the mathematical theory and methods relevant to the bioinformatics problems introduced in the first part; and an integrative overview that draws the connections and interfaces between bioinformatics problems/issues and mathematical theory/methods/applications.

By Matthew He, Sergey Petoukhov, Yi Pan, and Albert Y. Zomaya.

Information

Publisher: Wiley-Interscience
Year: 2011
ISBN: 9781118099520
Topic: Computer Science
Subtopic: Programming Algorithms

Bioinformatics and Mathematics

Traditionally, the study of biology has moved from morphology to cytology and then to the atomic and molecular level, from physiology to microscopic regulation, and from phenotype to genotype. Bioinformatics reverses this direction: it begins with research on genes and moves to the molecular sequence, then to molecular conformation, from structure to function, and from systems biology to network biology, and it further investigates the interactions and relationships among genes, proteins, and structures. This new reverse paradigm sets a theoretical starting point for a biological investigation. It sets a new line of investigation with a unifying principle and uses mathematical tools extensively to clarify the ever-changing phenomena of life quantitatively and analytically.

It is well known that there is more to life than the genomic blueprint of each organism. Life functions within the natural laws that we know and those that we do not know. Life is founded on mathematical patterns of the physical world. Genetics exploits and organizes these patterns. Mathematical regularities are exploited by the organic world at every level of form, structure, pattern, behavior, interaction, and evolution. Essentially all knowledge is intrinsically unified and relies on a small number of natural laws. Mathematics helps us understand how monomers become polymers necessary for the assembly of cells. Mathematics can be used to understand life from the molecular to the biosphere levels, including the origin and evolution of organisms, the nature of genomic blueprints, and the universal genetic code as well as ecological relationships.

Mathematics and biological data have a synergistic relationship. Biological information creates interesting problems, mathematical theory and methods provide models for understanding them, and biology validates the mathematical models. A model is a representation of a real system. Real systems are often too complicated to study directly, and observing a real system may change it. A good system model should be simple, yet powerful enough to capture the behavior of the real system. Models are especially useful in bioinformatics. In this chapter we provide an overview of bioinformatics history, genetic code and mathematics, background mathematics for bioinformatics, and the big picture of bioinformatics–informatics.

1.1 INTRODUCTION

Mendel’s Genetic Experiments and Laws of Heredity

The discovery of genetic inheritance by Gregor Mendel in 1865 is often considered the start of bioinformatics history. He carried out cross-fertilization experiments on differently colored varieties of the same species. Mendel’s genetic experiments with pea plants took him eight years (1856–1863). During this time, Mendel grew over 10,000 pea plants, keeping track of progeny number and type. He recorded the data carefully and analyzed it mathematically. Mendel showed that the inheritance of traits could be explained more easily if it was controlled by factors passed down from generation to generation. He concluded that these factors, now called genes, come in pairs and are inherited as distinct units, one from each parent. He also recorded the segregation of parental genes and their appearance in the offspring as dominant or recessive traits. In doing so, he recognized the mathematical patterns of inheritance from one generation to the next. Mendel’s laws of heredity are usually stated as follows:

The law of segregation. A gene pair defines each inherited trait. Parental genes are randomly separated into the sex cells, so that each sex cell contains only one gene of the pair. Offspring therefore inherit one genetic allele from each parent.
The law of independent assortment. Genes for different traits are sorted from one another in such a way that the inheritance of one trait is not dependent on the inheritance of another.
The law of dominance. An organism with alternate forms of a gene will express the form that is dominant.
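
The laws of segregation and dominance can be illustrated with a short enumeration. The sketch below is only an illustration, not Mendel’s own analysis: the allele symbols `A` (dominant) and `a` (recessive) and the helper names `cross` and `phenotype_counts` are assumptions introduced here. It crosses two hybrid parents and counts the offspring, assuming every allele combination is equally likely.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Enumerate offspring genotypes: each parent passes on one allele of its pair."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so the uppercase (dominant) allele comes first and "aA" is counted as "Aa".
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring


def phenotype_counts(offspring):
    """Apply the law of dominance: one dominant allele is enough to show the dominant trait."""
    counts = Counter()
    for genotype, n in offspring.items():
        label = "dominant" if any(c.isupper() for c in genotype) else "recessive"
        counts[label] += n
    return counts


# Hypothetical monohybrid cross of two hybrid (Aa) parents.
f2 = cross("Aa", "Aa")
print(dict(f2))
print(dict(phenotype_counts(f2)))
```

Under these assumptions the genotypes appear in a 1:2:1 ratio and the visible traits in a 3:1 ratio, which is the kind of regular numerical pattern Mendel identified in his counts.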

In 1900, Mendel’s work was rediscovered independently by de Vries, Correns, and Tschermak, each of whom confirmed Mendel’s discoveries. Mendel’s method of research was based on identifying significant variables, isolating their effects, measuring them meticulously, and finally subjecting the resulting data to mathematical analysis. Thus, his work is connected directly to contemporary theories of mathematics, statistics, and physics.

Origin of Species

Charles Darwin published On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (Darwin, 1859). His key idea was that evolution occurs through natural selection acting on characteristics that are inherited rather than acquired and that vary between individual members of a species. Darwin’s landmark theory did not specify the means by which characteristics are inherited; the mechanism of heredity had not been determined at that time.

First Genetic Map

In 1910, after the rediscovery of Mendel’s work, Thomas Hunt Morgan at Columbia University carried out crossing experiments with the fruit fly (Drosophila melanogaster). He showed that the genes responsible for the appearance of a specific phenotype are located on chromosomes. He also found that genes on the same chromosome do not always assort independently. Furthermore, he suggested that the strength of linkage between genes depends on the distance between them on the chromosome. That is, the closer two genes lie to each other on a chromosome, the greater the chance that they will be inherited together. Similarly, the farther apart they are, the greater the chance that they will be separated in the process of crossing over. Two genes are separated when a crossover takes place between them during cell division. Morgan’s experiments also led to Drosophila’s unusual position as, to this day, one of the best-studied organisms and most useful tools in genetic research. In 1911, Alfred Sturtevant, then an undergraduate researcher in Morgan’s laboratory, mapped the locations of the fruit fly genes, creating the first genetic map ever made.
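
The idea that recombination frequency reflects distance along the chromosome can be turned into a small ordering procedure. The sketch below is a simplified illustration, not Sturtevant’s actual procedure or data: the gene names `a`, `b`, `c`, the recombination percentages, and the helper names are all hypothetical. It picks the linear order whose adjacent distances add up to the smallest total.

```python
from itertools import permutations

# Hypothetical pairwise recombination frequencies (percent) for three linked genes.
RECOMBINATION = {("a", "b"): 5.0, ("b", "c"): 12.0, ("a", "c"): 16.0}


def distance(gene1, gene2):
    """Look up the recombination frequency for a pair, in either order."""
    if (gene1, gene2) in RECOMBINATION:
        return RECOMBINATION[(gene1, gene2)]
    return RECOMBINATION[(gene2, gene1)]


def infer_order(genes):
    """Return the gene order that minimizes the sum of adjacent distances."""
    best_order, best_total = None, float("inf")
    for order in permutations(genes):
        if order[0] > order[-1]:
            continue  # skip mirror images: a-b-c and c-b-a describe the same map
        total = sum(distance(x, y) for x, y in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return best_order


print(infer_order(["a", "b", "c"]))
```

In this made-up data set the outer pair shows slightly less recombination (16) than the sum of the two inner intervals (5 + 12), which is typical of real crosses because double crossovers between distant genes go undetected.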

Transposable Genetic Elements

In 1944, Barbara McClintock discovered that genes can move on a chromosome and can jump from one chromosome to another. She studied the inheritance of color and pigment distribution in corn kernels at the Carnegie Institution Department of Genetics in Cold Spring Harbor, New York. In 1983, at age 81, she was awarded the Nobel Prize in Physiology or Medicine. It is believed that transposons may be linked to such genetic disorders as hemophilia, leukemia, and breast cancer, and that they may have played a crucial role in evolution.

DNA Double Helix

In 1953, James Watson and Francis Crick proposed a double-helix model of DNA. DNA is made of three basic components: a sugar, a phosphate group (the acid), and an organic “base.” The base is always one of four: adenine (A), cytosine (C), guanine (G), or thymine (T). These four bases fall into two groups: purines (adenine and guanine) and pyrimidines (thymine and cytosine). In 1950, Erwin Chargaff had found that the amounts of adenine (A) and thymine (T) in DNA are about the same, as are the amounts of guanine (G) and cytosine (C). These relationships later became known as “Chargaff’s rules” and led to much speculation about the three-dimensional structure of DNA. Rosalind Franklin, a British chemist, used X-ray diffraction to capture the first high-quality images of the DNA molecule. Franklin’s colleague Maurice Wilkins showed the pictures to James Watson, an American zoologist, who had been working with Francis Crick, a British biophysicist, on the structure of the DNA molecule. These pictures gave Watson and Crick enough information to propose a double-stranded, helical, complementary, antiparallel model for DNA. Crick, Watson, and Wilkins shared the 1962 Nobel Prize in Physiology or Medicine for the discovery that the DNA molecule has a double-helical structure. Rosalind Franklin, whose images of DNA helped lead to the discovery, died of cancer in 1958 and, under Nobel rules, was not eligible for the prize.

In 1957, Francis Crick and George Gamow worked out the “central dogma,” explaining how DNA functions to make protein. Their sequence hypothesis posited that the DNA sequence specifies the amino acid sequence in a protein. They also suggested that genetic information flows in only one direction, from DNA to messenger RNA to protein, which is the core idea of the central dogma.
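
The complementary, antiparallel pairing of the double-helix model explains why Chargaff’s ratios hold in double-stranded DNA. The following minimal sketch uses an arbitrary example sequence and a hypothetical helper `reverse_complement`; neither comes from the text. It builds the partner strand and counts the bases across both strands.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, read in the opposite (antiparallel) direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


strand = "ATGCGGATTC"  # arbitrary example sequence
partner = reverse_complement(strand)

# Count bases over both strands of the double helix.
counts = Counter(strand + partner)
print(partner)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Because every A on one strand faces a T on the other, and every G faces a C, the equal totals follow directly from the pairing rule. In measured DNA the amounts are only approximately equal, as the text notes.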

Genetic Code (see Appendix A)

The genetic code was finally “cracked” in 1966. Marshall Nirenberg, Heinrich Matthaei, and Severo Ochoa demonstrated that a sequence of three nucleotide bases, a codon or triplet, determines each of the 20 amino acids found in nature. This means that there are 64 possible codons (4³ = 64) for 20 amino acids, and the question was how these 64 codons could code for 20 different amino acids. They formed synthetic messenger ribonucleic acid (mRNA) by mixing the nucleotides of RNA with a special enzyme called polynucleotide phosphorylase, which produced a single-stranded RNA. Nirenberg and Matthaei synthesized poly(U) by reacting only uracil nucleotides with the RNA-synthesizing enzyme, producing –UUUU–. They mixed this poly(U) with the protein-synthesizing machinery of Escherichia coli in vitro and observed the formation of a protein, which turned out to be a polypeptide of phenylalanine. This showed that a triplet of uracil must code for phenylalanine. Philip Leder and Nirenberg found an even better experimental protocol to solve this fundamental problem, and by 1965 the genetic code was solved almost completely. They found that the “extra” codons are merely redundant: some amino acids have one or two codons, one (isoleucine) has three, some have four, and some have six. Three codons (called stop codons) serve as stop signals for the protein-synthesizing machinery.
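
The counting argument above (64 codons for 20 amino acids plus stop signals) can be checked directly. The sketch below builds the standard genetic code from its usual compact one-letter listing, with bases ordered U, C, A, G and `*` marking a stop codon. The names `CODON_TABLE` and `translate` are illustrative choices made here, and `translate` is a simplification that reads from the first base rather than searching for a start codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order (first base varies slowest).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA string three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# How many codons encode each amino acid, and how common each degree of redundancy is.
codons_per_amino_acid = Counter(CODON_TABLE.values())
degeneracy = Counter(n for aa, n in codons_per_amino_acid.items() if aa != "*")

print(len(CODON_TABLE))
print(translate("UUUUUUUUU"))
print(sorted(degeneracy.items()))
```

Translating a poly(U) message yields only `F`, the one-letter symbol for phenylalanine, mirroring the Nirenberg and Matthaei result. The degeneracy tally shows how the 61 coding triplets are shared among the 20 amino acids.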

First Recombinant DNA Molecules

In 1972, Paul Berg of Stanford University created the first recombinant DNA molecules by combining the DNA of two different organisms. He used a restriction enzyme to isolate a gene from a cancer-causing monkey virus. Then he used ligase to join the section of virus DNA with a molecule of DNA from the bacterial virus lambda, creating the first recombinant DNA molecule. He realized the risks of his experiment and terminated it temporarily before the recombinant DNA molecule was added to E. coli, where it would have quickly been reproduced. He proposed a one-year moratorium on recombinant DNA studies while safety issues were addressed. Berg later resumed his studies of recombinant DNA techniques and was awarded the 1980 Nobel Prize in Chemistry. His experiments paved the way for the field of genetic engineering and the modern biotechnology industry.

DNA Sequencing and Database

In early 1974, Frederick Sanger of the UK Medical Research Council was the first to invent DNA-sequencing techniques. During his experiments to determine the amino acid sequence of bovine insulin, he developed the basics of modern sequencing methods. Sanger’s approach involved copying DNA strands in a way that would show the location of the nucleotides in the strands. To apply Sanger’s approach, scientists had to analyze the composite collections of DNA pieces detected from four test tubes, one for each of the bases found in DNA (adenine, cytosine, thymine, guanine). The pieces then had to be arranged in the correct order. This technique was very slow and tedious: it took many years to sequence only a few million letters in a string of DNA. Almost simultaneously, the American scientists Allan Maxam and Walter Gilbert were creating a different method called the cleavage method. The basis for virtually all DNA sequencing was the dideoxy chain-termination reaction developed by Sanger.
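
The four-tube logic can be sketched in a few lines. This is a deliberately idealized model, not a description of the laboratory protocol: it assumes one reaction per base in which copying stops at every position where that base occurs, and that fragment lengths are measured exactly. The function names and the example strand are hypothetical.

```python
def chain_termination_fragments(strand):
    """For each base, list the lengths of copied fragments that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Order all fragments by length and read off which lane each one came from."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


strand = "GATTACA"  # arbitrary example of a newly copied strand
lanes = chain_termination_fragments(strand)
print(lanes)
print(read_gel(lanes))
```

Sorting the pooled fragments by length and noting the tube each came from recovers the sequence one base at a time, which is the “arranging in the correct order” step described above.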

In 1978, David Botstein developed restriction-fragment-length polymorphisms. Individual human beings differ by about one base pair in every 500 nucleotides. The most interesting variations for geneticists are those that are recognized by certain enzymes called restriction enzymes. Each of these enzymes cuts DNA only in the presence of a specific sequence (e.g., GAATTC in the case of the restriction enzyme EcoRI). This sequence is called a restriction site. The enzyme will bypass the site if it has mutated to GACTTC. Thus, when a specific restriction enzyme cuts the DNA of different people, it may produce fragments of different lengths. These DNA fragments can be separated according to size by making them move through a porous gel in an electric field. Since the smaller fragments move more rapidly than the larger ones, their sizes can be determined by examining their positions in the gel. Variations in their lengths are called restriction-fragment-length polymorphisms.
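
The effect of a mutated restriction site on fragment lengths can be shown with a small digest simulation. The sketch below treats a single strand as a plain string. It uses the fact that EcoRI cuts between the G and the first A of GAATTC. The two “individual” sequences and the function name `digest` are invented for illustration.

```python
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts after the leading G: G^AATTC


def digest(dna, site=SITE, cut_offset=CUT_OFFSET):
    """Cut dna at every occurrence of site and return the fragment lengths, smallest first."""
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return sorted(len(fragment) for fragment in fragments)


# Two made-up sequences that differ by one base in the second restriction site.
person_a = "CCGG" + "GAATTC" + "TTAACCGGTT" + "GAATTC" + "AAGG"
person_b = "CCGG" + "GAATTC" + "TTAACCGGTT" + "GACTTC" + "AAGG"

print(digest(person_a))
print(digest(person_b))
```

The single-base change removes one cut, so two shorter fragments from the first sequence are replaced by one longer fragment in the second. On a gel, this appears as bands at different positions for the two individuals.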

In 1980, Kary Mullis invented polymerase chain reaction (PCR), a method fo...

Cover
Series page
Title page
COPYRIGHT PAGE
PREFACE
ABOUT THE AUTHORS
1 Bioinformatics and Mathematics
2 Genetic Codes, Matrices, and Symmetrical Techniques
3 Biological Sequences, Sequence Alignment, and Statistics
4 Structures of DNA and Knot Theory
5 Protein Structures, Geometry, and Topology
6 Biological Networks and Graph Theory
7 Biological Systems, Fractals, and Systems Biology
8 Matrix Genetics, Hadamard Matrices, and Algebraic Biology
9 Bioinformatics, Denotational Mathematics, and Cognitive Informatics
10 Evolutionary Trends and Central Dogma of Informatics
APPENDIX A Bioinformatics Notation and Databases
APPENDIX B Bioinformatics and Genetics Time Line
APPENDIX C Bioinformatics Glossary
Index
Wiley Series on Bioinformatics: Computational Techniques and Engineering