1.1 Overview
In this chapter, we introduce the main perspectives of the book: bioinformatics and computer science. In Section 1.2, we offer a working definition of the term 'bioinformatics', discuss where the discipline came from, and consider the impact of the genome-sequencing revolution. In Section 1.3, we discuss the origins of computer science, and note the emerging challenges relating to how to manage and describe biological data in ways that are computationally tractable. Having set the scene, we reflect briefly on some of the gaps that now confront computer science and bioinformatics.
By the end of the chapter, you should have an appreciation of how the field of bioinformatics evolved; you should also have gained insights into the extent to which its future progress is linked to the advances in data management and knowledge representation that are engaging computer science today.
1.2 Bioinformatics
1.2.1 What is bioinformatics?
Bioinformatics is a term that means different things to different people, with so many possible interpretations (many of them entirely reasonable) that it can sometimes be difficult to know what bioinformatics actually means, and whether it isn't just computational biology by another name. One way of making sense of the bioinformatics landscape is to recognise that it has both service and research components. Its service side primarily involves the routine storage, maintenance and retrieval of biological data. While these may seem like rather humdrum tasks for today's technologies, we'll explore why this is far from being true. By contrast, the research side of bioinformatics largely involves analysis of biological data using a variety of tools and techniques, often in combination, to create complex workflows or pipelines, including components ranging from pattern recognition and statistics, to visualisation and modelling. As we'll see, a particularly important facet of data analysis also concerns the use and development of prediction tools (later, we'll look at some of the ramifications of our heavy reliance on structure- and function-prediction approaches, especially in light of the emergence of high-throughput biology). The union of all of these capabilities into a broad-based, interdisciplinary science, involving theoretical, practical and conceptual tools for the generation, dissemination, representation, analysis and understanding of biological information, sets it apart from computational biology (which, as the name suggests, is perhaps more concerned with the development of mathematical tools for modelling and simulating biological systems).
In this book, we broadly explore issues relating to the computational manipulation and conceptual representation of biological sequences and macromolecular structures. We chose this vantage-point for two reasons: first, as outlined in the Preface, this is our 'home territory', and hence we can discuss many of the challenges from first-hand experience; second, this is where the discipline of bioinformatics has its roots, and it's from these origins that many of its successes, failures and opportunities stem.
1.2.2 The provenance of bioinformatics
The origins of bioinformatics, both as a term and as a scientific discipline, are controversial. The term itself was coined by theoretical biologist Paulien Hogeweg. In the early 1970s, she established the first research group specialising in bioinformatics, at the University of Utrecht (Hogeweg, 1978; Hogeweg and Hesper, 1978). Back then, with her colleague Ben Hesper, she defined the term to mean 'the study of informatic processes in biotic systems' (Hogeweg, 2011). But the term didn't gain popularity in the community for almost another two decades; and, by the time it did, it had taken on a rather different meaning.
In Europe, a turning point seems to have been around 1990, with the organisation of the Bioinformatics in the 90s conference (held in Maastricht in 1991), probably the first conference to include this 'new' term, bioinformatics, in its title. Consider that, during the same period, the National Center for Biotechnology Information (NCBI) had been established in the United States of America (USA) (Benson et al., 1990). But this was a centre for biotechnology information, not a bioinformatics centre, and it was established, at least in part, to provide the nation with a long-term 'biology informatics strategy' (Smith, 1990), not a 'bioinformatics' strategy.
With a new label to describe itself, a new scientific discipline evolved from the growing needs of researchers to access and analyse (primarily biomedical) data, which was beginning to accumulate, seemingly quite rapidly, in different parts of the world. This sudden data accumulation was the result of a number of technological advances that were yielding, at that time, unprecedented quantities of biological sequence information. Hand-in-hand with these developments came the widespread development of the algorithms, and computational tools and resources that were necessary to analyse, manipulate and...