Statistical and Machine Learning Approaches for Network Analysis

Name: Statistical and Machine Learning Approaches for Network Analysis
Author: Matthias Dehmer, Subhash C. Basak

About This Book

Explore the multidisciplinary nature of complex networks through machine learning techniques

Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks.

Comprised of chapters written by internationally renowned researchers in the field of interdisciplinary network theory, the book presents current and classical methods to analyze networks statistically. Methods from machine learning, data mining, and information theory are strongly emphasized throughout. Real data sets are used to showcase the discussed methods and topics, which include:

A survey of computational approaches to reconstruct and partition biological networks
An introduction to complex networks—measures, statistical properties, and models
Modeling for evolving biological networks
The structure of an evolving random bipartite graph
Density-based enumeration in structured data
Hyponym extraction employing a weighted graph kernel

Statistical and Machine Learning Approaches for Network Analysis is an excellent supplemental text for graduate-level, cross-disciplinary courses in applied discrete mathematics, bioinformatics, pattern recognition, and computer science. The book is also a valuable reference for researchers and practitioners in the fields of applied discrete mathematics, machine learning, data mining, and biostatistics.

Information

Publisher: Wiley
Year: 2012
ISBN: 9781118346983
Topic: Mathematics
Subtopic: Probability and Statistics

Chapter 1

A Survey of Computational Approaches to Reconstruct and Partition Biological Networks

Lipi Acharya, Thair Judeh, Dongxiao Zhu

“Everything is deeply intertwingled”

Theodor Holm Nelson

1.1 Introduction

The above quote by Theodor Holm Nelson, the pioneer of information technology, states a deep interconnectedness among the myriad topics of this world. Biological systems are no exception: they comprise a complex web of biomolecular interactions and regulatory processes. In particular, the field of computational systems biology aims to arrive at a theory that reveals the complicated interaction patterns in living organisms that give rise to various biological phenomena. Recognizing such patterns can provide insights into biomolecular activities, which pose several challenges to biology and genetics. However, the complexity of biological systems, together with the often insufficient amount of data available to capture these activities, makes it very difficult to reliably infer the underlying network topology and to characterize the patterns underlying these topologies. As a result, two problems that have received considerable attention among researchers are (1) reverse engineering of biological networks from genome-wide measurements and (2) inference of functional units in large biological networks (Fig. 1.1).

Figure 1.1 Approaches addressing two fundamental problems in computational systems biology: (1) reconstruction of biological networks from two complementary forms of data resources, gene expression data and gene sets, and (2) partitioning of large biological networks to extract functional units. Two classes of problems in network partitioning are graph clustering and community detection.

Rapid advances in high-throughput technologies have brought about a revolution in our understanding of biomolecular interaction mechanisms. A reliable inference of these mechanisms depends directly on the measurements used in the inference procedure. High-throughput molecular profiling technologies, such as microarrays and second-generation sequencing, have enabled a systematic study of biomolecular activities by generating an enormous amount of genome-wide measurements, which continue to accumulate in numerous databases. Indeed, simultaneous profiling of the expression levels of tens of thousands of genes allows for large-scale quantitative experiments. This has resulted in substantial interest among researchers in developing novel algorithms that reliably infer the underlying network topology from gene expression data. However, gaining biological insights from large-scale gene expression data is very challenging because of the curse of dimensionality. Correspondingly, a number of computational and experimental methods have been developed to arrange genes into groups or clusters on the basis of a chosen similarity criterion. Thus, an initial characterization of large-scale gene expression data, as well as conclusions derived from biological experiments, results in the identification of several smaller components comprising genes that share similar biological properties. We refer to these components as gene sets. The availability of effective computational and experimental strategies has led to the emergence of gene sets as a completely new form of data for the reverse engineering of gene regulatory relationships. Gene set based approaches have gained more attention for their inherent ability to incorporate higher-order interaction mechanisms, as opposed to approaches based on individual genes.

There has been a sequence of computational efforts addressing the problem of network reconstruction from gene expression data and gene sets. Gaussian graphical models (GGMs) [1–3], probabilistic Boolean networks (PBNs) [4–7], Bayesian networks (BNs) [8, 9], differential equation based models [10, 11], and mutual information networks such as relevance networks (RNs) [12, 13], ARACNE [14], CLR [15], and MRNET [16] are viable approaches that capitalize on gene expression data, whereas the collaborative graph model (cGraph) [17], the frequency method (FM) [18], and network inference from cooccurrences (NICO) [19, 20] are suitable for the reverse engineering of biological networks from gene sets.
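To make the idea of reconstructing a network from expression data concrete, the following is a minimal sketch in the spirit of a relevance network: compute a pairwise similarity between gene expression profiles and keep an edge wherever the similarity passes a threshold. It rests on several assumptions that are not from the chapter: absolute Pearson correlation stands in for mutual information as the similarity score, the threshold of 0.9 is arbitrary, and the gene names and values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def relevance_network(expression, threshold=0.9):
    """Return the set of gene pairs whose |correlation| meets the threshold.

    `expression` maps a gene name to its expression profile across samples.
    Correlation is an assumed stand-in for mutual information here.
    """
    edges = set()
    for gene_a, gene_b in combinations(sorted(expression), 2):
        score = abs(pearson(expression[gene_a], expression[gene_b]))
        if score >= threshold:
            edges.add((gene_a, gene_b))
    return edges


# Hypothetical toy data: one profile per gene, five samples each.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

edges = relevance_network(toy_expression, threshold=0.9)
for gene_a, gene_b in sorted(edges):
    print(gene_a, "--", gene_b)
```

On this toy input the three strongly co-varying genes end up connected to each other and geneD stays isolated. With tens of thousands of genes and few samples, many spurious pairs would pass any fixed threshold, which is one face of the dimensionality problem described earlier.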

After a biological network is reconstructed, it may be too broad or abstract a representation of a particular biological process of interest. For example, a specific signal transduction process activates only a part of the underlying network rather than the entire network, so a finer level of detail is needed. Furthermore, these parts may represent the functional units of a biological network. Thus, partitioning a biological network into different clusters or communities is of paramount importance.

Network partitioning is often associated with several challenges, which make the problem NP-hard [21]. Finding the optimal partitions of a given network is feasible only for small networks. Most algorithms heuristically attempt to find a good partitioning based on some chosen criteria, and algorithms are often suited to a specific problem domain. Two major classes of algorithms in network partitioning find their roots in computer science and sociology, respectively [22]. To avoid confusion, we will refer to the first class as graph clustering algorithms and to the second class as community detection algorithms. For graph clustering algorithms, the relevant applications include very large-scale integration (VLSI) and distributing jobs on a parallel machine. The most famous algorithm in this domain is the Kernighan–Lin algorithm [23], which still finds use as a subroutine in various other algorithms. Other graph clustering algorithms include techniques based on spectral clustering [24]. Originally, community detection algorithms focused on social networks in sociology. They now cover networks of interest to biologists, mathematicians, and physicists. Some popular community detection algorithms include the Girvan–Newman algorithm [25], Newman's eigenvector method [21, 22], the clique percolation algorithm [26], and Infomap [27]. Additional community detection algorithms include methods based on spin models [28, 29], mixture models [30], and label propagation [31].
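The claim that optimal partitions are within reach only for small networks can be illustrated with a short sketch. The code below is not the Kernighan–Lin algorithm or any other method cited above; it is a plain exhaustive search, written for illustration, that tries every balanced split of a tiny graph into two halves and keeps the one cutting the fewest edges. The graph (two triangles joined by a single bridge edge) and all function names are hypothetical.

```python
from itertools import combinations


def cut_size(edges, part):
    """Count the edges that have exactly one endpoint inside `part`."""
    return sum((u in part) != (v in part) for u, v in edges)


def best_bisection(nodes, edges):
    """Exhaustively find a balanced bisection with the minimum cut.

    Returns (one side of the best split, its cut size, number of splits tried).
    Assumes an even number of nodes so that both halves have equal size.
    """
    nodes = sorted(nodes)
    if len(nodes) % 2 != 0:
        raise ValueError("this sketch assumes an even number of nodes")
    half = len(nodes) // 2
    anchor, rest = nodes[0], nodes[1:]
    best_part, best_cut, tried = None, None, 0
    # Pin the first node to one side so each split is examined only once.
    for others in combinations(rest, half - 1):
        part = frozenset((anchor,) + others)
        tried += 1
        current = cut_size(edges, part)
        if best_cut is None or current < best_cut:
            best_part, best_cut = part, current
    return best_part, best_cut, tried


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_nodes = "abcdef"
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

part, cut, tried = best_bisection(toy_nodes, toy_edges)
print(sorted(part), cut, tried)
```

For six nodes the search inspects only 10 splits and recovers the two triangles. The number of balanced splits grows as C(n, n/2)/2, which is already about 6.9 × 10^10 for 40 nodes, so practical methods rely on heuristics that refine a starting partition instead of enumerating all of them.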

Intuitively, reconstruction and partitioning of biological networks appear to be two completely opposite problems, in that the former increases and the latter decreases the dimension of a given structure. In fact, these problems are closely related, and each lays the foundation for the other. For instance, the presence of hypothetical gene regulatory relationships in a reconstructed network provides a motivati...

Cover
Title Page
Copyright
Dedication
Preface
Contributors
Chapter 1: A Survey of Computational Approaches to Reconstruct and Partition Biological Networks
Chapter 2: Introduction to Complex Networks: Measures, Statistical Properties, and Models
Chapter 3: Modeling for Evolving Biological Networks
Chapter 4: Modularity Configurations in Biological Networks with Embedded Dynamics
Chapter 5: Influence of Statistical Estimators on the Large-Scale Causal Inference of Regulatory Networks
Chapter 6: Weighted Spectral Distribution: A Metric for Structural Analysis of Networks
Chapter 7: The Structure of an Evolving Random Bipartite Graph
Chapter 8: Graph Kernels
Chapter 9: Network-Based Information Synergy Analysis for Alzheimer Disease
Chapter 10: Density-Based Set Enumeration in Structured Data
Chapter 11: Hyponym Extraction Employing a Weighted Graph Kernel
Index