Biologists are stepping up their efforts to understand the biological processes that underlie disease pathways in clinical contexts. This has resulted in a flood of biological and clinical data, ranging from genomic and protein sequences, DNA microarrays, protein interactions and biomedical images to disease pathways and electronic health records. To exploit these data for discovering new knowledge that can be translated into clinical applications, fundamental data analysis difficulties have to be overcome. Practical issues such as handling noisy and incomplete data, processing compute-intensive tasks and integrating various data sources are new challenges faced by biologists in the post-genome era. This book will cover the fundamentals of state-of-the-art data mining techniques designed to handle such challenging data analysis problems, and demonstrate with real applications how biologists and clinical scientists can employ data mining to make meaningful observations and discoveries from a wide array of heterogeneous data, from molecular biology to the pharmaceutical and clinical domains.

Contents:

Sequence Analysis:
- Mining the Sequence Databases for Homology Detection: Application to Recognition of Functions of Trypanosoma brucei brucei Proteins and Drug Targets (G Ramakrishnan, V S Gowri, R Mudgal, N R Chandra and N Srinivasan)
- Identification of Genes and Their Regulatory Regions Based on Multiple Physical and Structural Properties of a DNA Sequence (Xi Yang, Nancy Yu Song and Hong Yan)
- Mining Genomic Sequence Data for Related Sequences Using Pairwise Statistical Significance (Yuhong Zhang and Yunbo Rao)
Biological Network Mining:
- Indexing for Similarity Queries on Biological Networks (Günhan Gülsoy, Md Mahmudul Hasan, Yusuf Kavurucu and Tamer Kahveci)
- Theory and Method of Completion for a Boolean Regulatory Network Using Observed Data (Takeyuki Tamura and Tatsuya Akutsu)
- Mining Frequent Subgraph Patterns for Classifying Biological Data (Saeed Salem)
- On the Integration of Prior Knowledge in the Inference of Regulatory Networks (Catharina Olsen, Benjamin Haibe-Kains, John Quackenbush and Gianluca Bontempi)
Classification, Trend Analysis and 3D Medical Images:
- Classification and Its Application to Drug-Target Prediction (Jian-Ping Mei, Chee-Keong Kwoh, Peng Yang and Xiao-Li Li)
- Characterization and Prediction of Human Protein-Protein Interactions (Yi Xiong, Dan Syzmanski and Daisuke Kihara)
- Trend Analysis (Wen-Chuan Xie, Miao He and Jake Yue Chen)
- Data Acquisition and Preprocessing on Three Dimensional Medical Images (Yuhua Jiao, Liang Chen and Jin Chen)
Text Mining and Its Biomedical Applications:
- Text Mining in Biomedicine and Healthcare (Hong-Jie Dai, Chi-Yang Wu, Richard Tzong-Han Tsai and Wen-Lian Hsu)
- Learning to Rank Biomedical Documents with Only Positive and Unlabeled Examples: A Case Study (Mingzhu Zhu, Yi-Fang Brook Wu, Meghana Samir Vasavada and Jason T L Wang)
- Automated Mining of Disease-Specific Protein Interaction Networks Based on Biomedical Literature (Rajesh Chowdhary, Boris R Jankovic, Rachel V Stankowski, John A C Archer, Xiangliang Zhang, Xin Gao, Vladimir B Bajic)

Readership: Students, professionals, those who perform biological, medical and bioinformatics research.
Key Features:

- Each chapter of this book will include a section introducing a specific class of data mining techniques, written in a tutorial style so that even non-computational readers such as biologists and healthcare researchers can appreciate them
- The book will disseminate impactful research results and best practices of data mining approaches to cross-disciplinary researchers and practitioners from both the data mining disciplines and the life sciences domains. The authors of the book will be well-known data mining experts, bioinformaticians and clinicians
- Each chapter will also provide a detailed description of how to apply the data mining techniques in real-world biological and clinical applications, so that readers can readily appreciate the computational techniques and how they can be used to address their own research issues


Biological Data Mining And Its Applications In Healthcare by Xiaoli Li, See-Kiong Ng, Jason T L Wang is available in PDF and/or ePUB format.

Information

Publisher: WSPC
Year: 2013
ISBN: 9789814551021
Topic: Biological Sciences
Subtopic: Science General
Index: Biological Sciences

1. Sequence Analysis

Chapter 1

Mining the Sequence Databases for Homology Detection: Application to Recognition of Functions of Trypanosoma brucei brucei Proteins and Drug Targets

G. Ramakrishnan¹,², V.S. Gowri²,‡, R. Mudgal¹,², N.R. Chandra³ and N. Srinivasan²

¹Indian Institute of Science Mathematics Initiative,
²Molecular Biophysics Unit,
³Department of Biochemistry, Indian Institute of Science,
Bangalore-560012, India.
‡Present address: School of Life Sciences, Jawaharlal Nehru University,
New Delhi-110067, India

High-throughput sequencing techniques and structural genomics initiatives have produced a deluge of data, and with it a need to leverage these data at large scale. Consequently, computational methods that characterize genes and proteins solely from their sequence information are becoming increasingly important. Over the past decade, the development of sensitive profile-based sequence database search algorithms has improved the quality of structural and functional inferences from protein sequences. This chapter highlights the use of such sensitive approaches in recognizing evolutionarily related proteins when the amino acid sequence similarity is very low. We further demonstrate the use of remote homology detection methods based on sequence database mining in exploring the repertoire of functions and three-dimensional structures of parasitic proteins in Trypanosoma brucei brucei, the causative agent of African sleeping sickness. With an emphasis on various metabolic pathways, sequence-function and structure-function relationships are investigated. By integrating information on parasitic proteins in metabolic pathways with their homology to targets of FDA-approved drugs, attractive drug targets are proposed.

1. Introduction

Over 17 million protein sequences have been deposited in the public databases, and this number has been growing rapidly (http://ncbi.nlm.nih.gov/RefSeq). By contrast, over 80,000 protein structures have been experimentally determined so far. With this rapidly growing disparity between the number of known sequences and the number of experimentally determined structures, the role of computational methods that characterize the function of proteins from their sequences becomes increasingly important.

The intimate relationship between the structure of a protein and its function has led to the view that a reliable prediction of the structure of a protein from its sequence could give useful insights into its function. Many structure-based approaches for function prediction today provide reliable models for a substantial fraction of the protein space¹. The widening gap between the deluge of sequences and the experimental characterization of the respective proteins necessitates the use of sensitive sequence-based as well as structure-based remote homology search techniques in identifying evolutionary relationships.

This chapter starts with a description of remote homology detection methods for proteins, with a focus on the recognition of evolutionarily related proteins when the amino acid sequence similarity is very low. In particular, the power of profile-based sequence database search techniques and of techniques that match Hidden Markov Models (HMMs) of protein domain families will be outlined. We will describe these extremely sensitive sequence search techniques in simple language and in a tutorial tone.

We shall then demonstrate the use of such techniques on the proteins encoded in the genome of Trypanosoma brucei brucei (one of the subspecies of Trypanosoma brucei), which causes African sleeping sickness. African trypanosomes are parasitic protozoa that belong to the class Kinetoplastida. These protozoan parasites are important mainly because of their energy metabolism. The metabolic enzymes of trypanosomes are very different from the host enzymes, as they are localized in specialized organelles called ‘glycosomes’. The life cycle of Trypanosoma brucei involves an insect host, the tsetse fly, and a mammalian host such as humans, cattle and other mammals, depending on the subspecies infecting the host.

By applying remote homology detection methods based on sequence database mining, we will recognize the repertoire of functions and three-dimensional structures of the proteins of the parasite, and then integrate this dataset with information on various metabolic pathways. We will then combine the information on parasitic proteins in metabolic pathways with their homology to the targets of FDA-approved drugs, thereby recognizing attractive drug targets.
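The integration step described above can be pictured as a simple filter over two tables. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the protein and target identifiers, the pathway names, the `Homology` record and the E-value cutoff of 1e-5 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homology:
    """One homology hit between a parasite protein and a known drug target (hypothetical record)."""
    parasite_protein: str
    drug_target: str
    e_value: float


def prioritize_targets(pathway_membership, homologies, max_e_value=1e-5):
    """Keep parasite proteins that sit in a metabolic pathway AND resemble a drug target.

    pathway_membership maps a protein identifier to the list of pathways it belongs to.
    The E-value cutoff is an illustrative assumption, not a value taken from the chapter.
    """
    best_hit = {}
    for hit in homologies:
        if hit.e_value > max_e_value:
            continue  # homology too weak to trust
        if not pathway_membership.get(hit.parasite_protein):
            continue  # protein is not placed in any metabolic pathway
        current = best_hit.get(hit.parasite_protein)
        if current is None or hit.e_value < current.e_value:
            best_hit[hit.parasite_protein] = hit
    # Most significant homology first
    return sorted(best_hit.values(), key=lambda h: h.e_value)


# Hypothetical toy data
pathways = {"Tb_protein_A": ["glycolysis"], "Tb_protein_B": [], "Tb_protein_C": ["purine salvage"]}
hits = [
    Homology("Tb_protein_A", "target_X", 1e-40),
    Homology("Tb_protein_A", "target_Y", 1e-12),
    Homology("Tb_protein_B", "target_X", 1e-30),
    Homology("Tb_protein_C", "target_Z", 0.5),
]
candidates = prioritize_targets(pathways, hits)
```

With this toy input only `Tb_protein_A` survives both filters: `Tb_protein_B` has no pathway assignment and the hit for `Tb_protein_C` is not significant.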

2. Remote homology driven approaches for protein function annotation

Database similarity searches have become a mainstay of bioinformatics, where protein homology detection plays a pivotal role in understanding the evolution of protein structures, functions and interactions. Many of the developments in protein bioinformatics can be traced back to an initial step of homology detection.

Studies on evolution are largely influenced by protein homology detection. Two proteins are said to be homologous if there exists an ancestral protein from which both have evolved; homology in a literal sense means descent from a common ancestor. To assess how likely it is that two proteins have evolved from a common ancestor, a measure of relatedness can be calculated according to a model of evolution; when this measure is high, it supports common ancestry within the framework of that evolutionary model.

When a protein sequence is found to be homologous to a protein of known function, it raises the possibility that the two share functional features. This follows from the fact that functional residues are usually conserved during the course of evolution, and hence evolutionarily related proteins often show high functional similarity.

Efforts have been made for decades to explore closely related homologues for protein function annotation, and to detect remote (distantly related) homologues in order to explore protein sequence/structure space. The progress made in remote homology detection will be described in the following subsections, highlighting the use of extremely sensitive sequence search techniques. Given the scope of this chapter, the description of each technique is limited to a basic understanding of the principles of the algorithm employed. Readers are encouraged to refer to the original publications for details of the mathematical basis of the techniques.

2.1. Sequence-based approaches for remote homology detection

Well-characterized homologues of a protein sequence are usually identified by pairwise sequence comparison, and the most widely used tool for sequence comparison and database searching is BLAST (Basic Local Alignment Search Tool)². The chances of reliably detecting evolutionary relationships become smaller when the sequence identity between the related proteins falls below 30%³. To improve the effectiveness of remote homology detection, sensitive search procedures based on the use of profiles such as Hidden Markov Models (HMMs) and Position Specific Scoring Matrices (PSSMs) were developed. A description of such sensitive methods, followed by the assessment of significant sequence alignments, is presented in the following subsections.
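To make the 30% figure concrete, the sketch below computes percent identity for two rows of an existing pairwise alignment. It is an illustration only: the two aligned sequences are made up, and the choice of denominator (columns where both rows carry a residue) is an assumption, since several definitions of percent identity are in use.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over columns in which both aligned rows carry a residue.

    Assumes row_a and row_b are two rows of the same alignment (equal length).
    Other definitions divide by alignment length or by the shorter sequence length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def below_identity_threshold(row_a, row_b, threshold=30.0):
    """True when the pair falls under the identity level cited in the text."""
    return percent_identity(row_a, row_b) < threshold


# Made-up aligned rows: only 2 of the 10 aligned positions are identical
identity = percent_identity("MKVLAGHTWE", "MRIISGQSFD")
hard_case = below_identity_threshold("MKVLAGHTWE", "MRIISGQSFD")
```

A pair like this one, at 20% identity, is the kind of case where a single pairwise comparison becomes unreliable and profile-based methods are brought in.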

2.1.1. Iterated searches using PSI-BLAST

Position-Specific Iterated (PSI)-BLAST is a protein sequence profile search method that is far more capable of detecting remote homologues⁴ than single query...
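The iterate-and-rebuild idea behind profile searches can be illustrated with a toy sketch. This is not PSI-BLAST itself: it assumes ungapped sequences of equal length, a uniform background distribution, a fixed pseudocount and a raw score threshold, whereas the real program works with gapped local alignments, sequence weighting and E-value cutoffs. All sequences and parameter values below are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(sequences, pseudocount=0.1):
    """Column-wise log-odds scores (a simple PSSM) from equal-length, ungapped sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: an assumption
    profile = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def profile_score(profile, sequence):
    """Sum of per-position log-odds scores for an ungapped sequence."""
    return sum(column[residue] for column, residue in zip(profile, sequence))


def iterated_search(query, database, threshold=7.5, max_iterations=5):
    """Search, rebuild the profile from the hits, and search again until the hits stop changing."""
    hits = []
    for _ in range(max_iterations):
        profile = build_profile([query] + hits)
        new_hits = [seq for seq in database if profile_score(profile, seq) >= threshold]
        if new_hits == hits:
            break  # converged: no change in the set of hits
        hits = new_hits
    return hits


# Made-up toy data: one close, one intermediate, one remote and one unrelated sequence
query = "MKVLAG"
database = ["MKILAG", "MRILSG", "LRIMSG", "WWPPCC"]
single_pass = iterated_search(query, database, max_iterations=1)
converged = iterated_search(query, database)
```

With these toy values a single pass recovers only the closest sequence, while the iterated search also reaches the most divergent one, because the profile has by then absorbed the substitutions seen in the intermediate sequence.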
