Introduction to Bioinformatics with R

A Practical Guide for Biologists

Edward Curry

298 pages
English

About This Book

In biological research, the amount of data available to researchers has increased so much in recent years that it is becoming increasingly difficult to understand the current state of the art without some experience and understanding of data analytics and bioinformatics. Introduction to Bioinformatics with R: A Practical Guide for Biologists leads the reader through the basics of computational analysis of data encountered in modern biological research. With no previous experience of statistics or programming required, readers will develop the ability to plan suitable analyses of biological datasets and to use the R programming environment to perform these analyses. This is achieved through a series of case studies that use R to answer research questions with molecular biology datasets. Broadly applicable statistical methods are explained, including linear and rank-based correlation, distance metrics and hierarchical clustering, hypothesis testing using linear regression, proportional hazards regression for survival data, and principal component analysis. These methods are then applied as appropriate throughout the case studies, illustrating how they can be used to answer research questions.

Key Features:

· Provides a practical course in computational data analysis suitable for students or researchers with no previous exposure to computer programming.

· Describes in detail the theoretical basis for statistical analysis techniques used throughout the textbook, from basic principles.

· Presents walk-throughs of data analysis tasks using R and example datasets. All R commands are presented and explained in order to enable the reader to carry out these tasks themselves.

· Uses outputs from a large range of molecular biology platforms including DNA methylation and genotyping microarrays; RNA-seq, genome sequencing, ChIP-seq and bisulphite sequencing; and high-throughput phenotypic screens.

· Gives worked-out examples geared towards problems encountered in cancer research, which can also be applied across many areas of molecular biology and medical research.

This book has been developed over years of training biological scientists and clinicians to analyse the large datasets available in their cancer research projects. It is appropriate for use as a textbook or as a practical book for biological scientists looking to gain bioinformatics skills.

Information

Publisher: Chapman and Hall/CRC
Year: 2020
ISBN: 9781351015295
Topic: Mathematics
Subtopic: Probability and statistics

1 Introduction

1.1 Why informatics is important for biologists

This is really all about data. In particular, it’s about working with so much data that learning to program computers to perform calculations for us will save a lot of time, and will probably make possible analyses that would otherwise be impossible. In biological research, the amount of data available to researchers has increased so much in recent years that this has been described as a ‘data explosion’ [1].

Much of this biological data is freely available for any researcher to access and use in their own work. Therefore, any biological scientist who learns the skills needed to obtain, preprocess and analyze publicly available datasets gives themselves an advantage when it comes to making the most of their own opportunities.

One consequence of this increase in biological data is that many of the recent paradigms of molecular biology come from computational analysis of large collections of data. When it comes to developing an intuition for what is shown when the results of a computational analysis are presented in a paper, there is no substitute for first-hand experience of using a data analysis method in your own research (of course, a theoretical understanding of the method in question is also important!). Indeed, it is becoming increasingly difficult to understand the current state of the art in biological research without some experience and understanding of computational biology.

In 2014, the UK’s MRC and BBSRC (Medical Research Council and Biotechnology & Biological Sciences Research Council) produced a report on ‘skills vulnerabilities’, identifying important research capabilities lacking in the UK. Both in 2014 and in a 2017 update¹, computational methods for biological research were identified as key weaknesses. In particular, the following points were highlighted:

• Data analytics, especially bioinformatics, appear to be particularly vulnerable.

• Informatics skills are applicable to many areas of both the biosciences and the medical sciences.

• Maths, statistics and computational biology skills are lacking, particularly at the postgraduate and postdoctoral levels, with many respondents reporting difficulties in recruiting adequately skilled researchers at these levels; shortages are not restricted to the UK.

So there is a recognized international shortage of bioinformatics skills, and these skills are increasingly fundamental across all areas of biological research. You were probably already aware of this, given that you’re reading this book, but it will hopefully serve as a motivating reminder that learning the bioinformatics skills taught here will be worth the effort you put in!

1.2 How to use this book

This book grew out of a decade of experience training biologists to empower their own research by making better use of computers. I think there are three key aspects of this training, which are in essence the intended learning outcomes of this book:

1. a theoretical understanding of how a set of computational analysis steps produces a result that yields biological insight

2. the ability to plan a set of analysis steps that, when carried out on a given dataset, will yield biological insight

3. practical experience of enacting those plans on real datasets to produce novel, valuable research results

Reading the chapters of this book should help with the first of these, and also with the second. But the only way to gain the skills to carry out data analysis that produces research results is to do it. There is simply no substitute for practical experience. Furthermore, the more experience you gain carrying out data analysis, the more instinctively you will be able to plan analyses for your own research and to identify the best datasets to work with. Because there is no substitute for practice, this book is designed to give all the practical guidance needed to carry out a set of analysis procedures, covering the procedures that are particularly useful for harnessing different types of biological data.

Because many data analysis methods are not implemented in tools with convenient graphical user interfaces (GUIs), there is no avoiding a bit of coding. While at first this will almost certainly be frustrating to those new to a command-line interface, with time and practice you will find that the automation you can implement empowers you to achieve all sorts of things that would otherwise be impossible (or at least impractical). To help with this process, all the required computer code is provided, in the form of individual commands given to the computer. Each line² of code is followed by a detailed description of every part of every command.

The first chapters of this book introduce R and the Unix command shell, which are indispensable tools for data analysis. This involves learning some of the building blocks for programming computers to perform many tasks in one go, without requiring continued instruction from a human. Many of the methods we use are theoretically simple enough to calculate by hand for a small set of observations, but the beauty of command-line tools is that you can program them to perform huge numbers of repetitive tasks quickly and automatically. One should also not underestimate the importance and power of ‘data wrangling’, which acknowledges that the format in which you obtain data is rarely exactly the format you need in order to perform the analyses you want.

Chapter 4 explains the mathematical theory behind the analysis methods employed throughout this book. To understand the theory, we’ll use the R environment to look at a few practical examples. Generally, I take the philosophy that a solid understanding of a few very versatile methods is the best strategy to enable a great variety of applications with as little effort as possible. A recurring theme of my research supervision is that the simpler the approach used to demonstrate a finding, the better (as long as it’s appropriate): a simpler analysis will be understandable to more people, will therefore have greater impact, and will be less likely to be misinterpreted.

Chapters 5 to 7 use real research examples to build up your practical experience of obtaining and analyzing biological datasets, using the statistical methods described in Chapter 4. The examples use already-processed datasets, so that the focus is on the analysis rather than on data formats. The complexity of the tasks and datasets builds through these chapters, so that by the end of Chapter 7 we are systematically evaluating patterns of variation in hundreds of features, measured on multiple platforms used to characterize different aspects of the same samples.

Finally, the bulk of this book guides you through the specifics of working with different types of biological data. I have included the types I think are most frequently encountered in molecular biology research, although this choice is certainly influenced by my own background in cancer research. The choice of data types also balances how accessible the data are to obtain, preprocess and analyze, so that we get the most out of the least effort.

A word of warning: it is easy to feel isolated in research, and that can be problematic when you find yourself, still new to bioinformatics, acting as the bioinformatics expert for your research group or team. There is an excellent blog post by Mick Watson³ on the problems facing ‘lonely bioinformaticians’. Most importantly, don’t be afraid of looking to others for help.

You can do this! Stick with it, and you should find that you’re able to make more use of the data you generate and the vast accumulation of molecular biology data that is already in the public domain.

Bibliography

[1] V Marx. “The big challenges of big data,” Nature 498:255-260 (2013).

¹ https://mrc.ukri.org/documents/pdf/review-of-vulnerable-skills-and-capabilities/

² Note that this is a line as ...

Table of contents

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Acknowledgements
1 Introduction
2 Introduction to R
3 An Introduction to LINUX for Biological Research
4 Statistical Methods for Data Analysis
5 Analyzing Generic Tabular Numeric Datasets in R
6 Functional Enrichment Analysis
7 Integrating Multiple Datasets in R
8 Analyzing Microarray Data in R
9 Analyzing DNA Methylation Microarray Data in R
10 DNA Analysis with Microarrays
11 Working with Sequencing Data
12 Genomic Sequence Profiling
13 ChIP-seq
14 RNA-seq
15 Bisulphite Sequencing
16 Final Notes
Index