eBook - ePub

Managing Time in Relational Databases

Name: Managing Time in Relational Databases
Author: Tom Johnston,Randall Weis

How to Design, Update and Query Temporal Data

Tom Johnston,

Randall Weis,

512 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Managing Time in Relational Databases

How to Design, Update and Query Temporal Data

Tom Johnston,

Randall Weis,

Book details

Book preview

Table of contents

Citations

About This Book

Managing Time in Relational Databases: How to Design, Update and Query Temporal Data introduces basic concepts that will enable businesses to develop their own framework for managing temporal data. It discusses the management of uni-temporal and bi-temporal data in relational databases, so that they can be seamlessly accessed together with current data; the encapsulation of temporal data structures and processes; ways to implement temporal data management as an enterprise solution; and the internalization of pipeline datasets.

The book is organized into three parts. Part 1 traces the history of temporal data management and presents a taxonomy of bi-temporal data management methods. Part 2 provides an introduction to Asserted Versioning, covering the origins of Asserted Versioning; core concepts of Asserted Versioning; the schema common to all asserted version tables, as well as the various diagrams and notations used in the rest of the book; and how the basic scenario works when the target of that activity is an asserted version table. Part 3 deals with designing, maintaining, and querying asserted version databases. It discusses the design of Asserted Versioning databases; temporal transactions; deferred assertions and other pipeline datasets; Allen relationships; and optimizing Asserted Versioning databases.

Integrates an enterprise-wide viewpoint with a strong conceptual model of temporal data management allowing for realistic implementation of database application development.
Provides a true practical guide to the different possible methods of time-oriented databases with techniques of using existing funtionality to solve real world problems within an enterprise data architecture environment.
Written by IT professionals for IT professionals, this book employs a heavily example-driven approach which reinforces learning by showing the results of puting the techniques discussed into practice.

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Yes, you can access Managing Time in Relational Databases by Tom Johnston,Randall Weis in PDF and/or ePUB format, as well as other popular books in Ciencia de la computación & Bases de datos. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Morgan Kaufmann

Year

2010

ISBN

9780080963372

Topic

Ciencia de la computación

Subtopic

Bases de datos

1. A Brief History of Temporal Data Management

Contents

Excluding Time from Databases: A Ubiquitous Paradigm12

The 1980s13

Historical Databases13

History Tables13

The 1990s14

Temporal Extensions to SQL14

Data Warehouses and Data Marts Come of Age14

The Inmon/Kimball Religious Wars16

The 2000s18

Data Cubes18

Slowly Changing Dimensions19

Real-Time Data Warehouses20

The Future of Databases: Seamless Access to Temporal Data20

Closing In on Seamless Access22

Glossary References24

Temporal data management is not a new development. From the earliest days of business data processing (as it was called back then), transactions were captured in a transaction log and the files and tables those transactions were applied to were periodically backed up. With those backups and logfiles, we could usually recreate what the data looked like at any point in time along either of the temporal dimensions we will be discussing. Indeed, together they contain all the “raw material” needed to support fully bi-temporal data management. As we will see in this chapter, what has changed about temporal data management, over the decades, is accessibility to temporal data. These days, it takes less effort to get to temporal data than it used to, and it takes less time as well. But significant additional progress is possible, and computer scientists are working on it. We are, too. Our work has led to this book, and to the software we are developing to implement its concepts.

We emphasize that the following history is not a history in any strict sense. It is more like our reflections on changes in methods of managing data which we have observed, as IT consultants, over the last quarter-century. It is our attempt to look back on those changes, and impose some order and structure on them. It does not fall short of history, in a strict sense, in attempting to impose order and structure. All historical narrative does that, no matter how purely descriptive it claims to be. Rather, it falls short in its reliance solely on personal reminiscence.

Excluding Time from Databases: A Ubiquitous Paradigm

In one sense, temporal data has been accorded only second-class status since the advent of computers and their use in managing business data. Neither database management systems (DBMSs) and the tables they manage, nor access methods and the files they manage, provide explicit mechanisms and structures to distinguish data about the past, present or future of the things we keep track of. Instead, unless developer-designed data structures and developer-written code is deployed, every object is represented by one and only one row in a table. If the row is there, the corresponding object is represented in our databases; otherwise it is not. If something about a represented object changes, the row is retrieved, updated and rewritten to reflect that change.

This focus on current data is reflected in a basic paradigm that has been used since we began managing data with computers. The paradigm is that of one data structure to represent a type of object or event, containing multiple other data structures, each representing an instance of an object or event of that type. Contained within the latter data structures are additional structures, each representing a property of the instance in question, or a relationship it has to another instance of the same type or (more usually) a different type.

This paradigm has manifested itself in such terminologies as (i) files, records, fields and pointers; (ii) tables, rows, columns and foreign keys; and (iii) classes, objects, slots and links. For the remainder of this book, we will use the table, row, column and foreign key terminology, although the concepts of uni-temporal and bi-temporal data management apply equally well to data managed by directly invoking access methods, to data managed with proprietary software, and to data managed with object-oriented structures and transformations.

The 1980s

Historical Databases

In the 80s, as disk storage costs plummeted, it was inevitable that someone would think to put the most recent backup files onto disk where it would be possible to access them without first restoring them from off-line storage media. After that, the next step was to realize that there was value, not just in having a particular set of backup data remain on-line, but also in having the ability to compare multiple backups of the same data, made at different points in time.

Each backup is a snapshot of a set of data of interest, and just as a movie film makes motion apparent while being itself a series of still images, so too a series of database snapshots can make change apparent while being itself a series of still images. Thus was born the concept of a data warehouse, whose originators were Barry Devlin and Paul Murphy. ¹ This concept introduced temporal data management at the database level (as opposed to the table, row or column levels), since data warehouses are entire databases devoted to historical data.

¹See [1988, Devlin & Murphy]. The full citation may be found in the appendix Bibliographical Essay. The year is the year of publication, and entries in that appendix are organized by year of publication.

History Tables

On an architecturally smaller scale, IT developers were also beginning to design and implement several other ways of managing temporal data. One of them was the use of history tables, and another the use of version tables. In the former case, temporal data management is implemented at the table level in the sense that individual tables are the objects devoted to historical data, usually populated by triggers based on updates to the corresponding current tables. In the latter case, temporal data management is also implemented at the table level, but in this case historical and current data reside in the same table. In some cases, intrepid developers have even attempted to introduce temporal data management at the level of individual columns.

In addition, developers were also beginning to create on-line transaction tables by bringing collections of transactions back from off-line storage media, transactions that originally had been moved onto that media as soon as their current accounting periods were over. The difference between history tables and version tables, on the one hand, and transaction tables on the other hand, is that history and version tables record the state of objects at different times, whereas transaction tables record the events that change the states of those objects and, in particular, the relationships among them.

The 1990s

Temporal Extensions to SQL

By the early 90s, significant computer science research on bi-temporality had been completed. To the extent that word of these developments made its way into the business IT community, bi-temporality was understood as a distinction between logical time and physical time. Logical time, corresponding to what computer scientists called valid time, was generally referred to by IT professionals as effective time. It was understood to be that period of time, denoted by either a single date or by a pair of dates, during which the object represented by a row conformed to the description that row provided. The term “effective time” derives from the fact that for specific and non-overlapping periods of time, each of these rows is in effect as the representative of an object—as the authorized description of what the object is like during that specific period of time. As for physical time, it was understood to be a single date, the physical date on which the bi-temporal data is created.

This view was, in fact, either a misunderstanding of what the computer scientists were saying, or else an independently developed understanding of two kinds of time that were relevant to data. Either way, it fell short of full bi-temporality. For while it acknowledged that one kind of time is a period of time, it believed that the other kind of time is a point in time. With only one temporal extent represented, this was at best a quasi-bi-temporal model of data.

This misunderstanding aside, the computer science work on bi-temporality resulted in a proposal for bi-temporal extensions to the SQL language. The extensions were formulated originally as TSQL, later superceded by TSQL2. This proposal was submitted to the SQL Standards Committee in 1994 by Dr. Rick Snodgrass, but to this date has still not been ratified. Nonetheless, there is much that can be done to support bi-temporal functionality using today's technology, and much to be gained from doing so.

Data Warehouses and Data Marts Come of Age

The second major development in the 90s was that the concept of a data warehouse was proselytized and extended by Bill Inmon [1996, Inmon]. ² As a result of this work, the IT industry began to take note of data warehousing. In its purest form, a data warehouse records history as a series of snapshots of the non-transactional tables in legacy system databases.

²The first edition of Inmon's first book was apparently published in 1991, but we can find no reliable references to it. It seems to us that it was only with the second edition, published in 1996, that the IT community began to take notice of data warehousing.

This reflects the fact that, from the point of view of data warehousing, what is important are persistent objects and what they are like at different points in time, i.e. what states they are in as they pass through time. If we are interested in the changes rather than the states themselves, we can reconstruct the history of those changes by extracting the deltas between successive states.

At about the same time, Ralph Kimball [1996, Kimball] took a complementary approach, describing a method of recording history by means of a collection of transactions. With transactions, the focus changes from objects to the relationships among them, for example from a company and its customers to the account balances which track the relationship between that company and each of those customers. Starting with a base state of a relationship, such as account balances on the first of each year, the metrics of those relationships can be recreated, at any subsequent point in time, by applying statistical functions to the base states and the subsequent transactions, e.g. by adding a chronological series of all purchases, payments and returns by a customer to the beginning balance of her account, until any desired point in time is reached.

As the 90s progressed, a religious war developed between those IT professionals who followed Inmon's data warehouse method of managing history, and those who followed Kimball's data mart method. These disputes generated more heat than light, and they did so because the complementary nature of the warehouse vs. mart approaches was never clearly recognized.

Because this was the major focus of discussions by IT professionals, during the 90s, about how historical data should be recorded and managed, it is worth trying to describe both what the two camps thought was at stake, as well as what was really at stake. We will describe what was really at stake in the next chapter. Here, we will describe what the two camps thought was at stake.

The Inmon/Kimball Religious Wars

The Kimball Perspective

What the Kimball advocates thought was at stake, in the middle to late 90s, was the difference between a cumbersome and a nimble way of providing access to historical data. They thought the issue was an either/or issue, a choice to be made between data warehouses and data marts, with the correct choice being obvious.

It is true that “nimbleness” was a major concern during those years. Data warehouse projects were nearly always long-term, big-budget projects. Like most such projects, they tended to fail at a high rate. Most failures were prob...

Cover Image
Table of Contents
Front matter
Copyright
About the Authors
Preface
Introduction
1. A Brief History of Temporal Data Management
2. A Taxonomy of Bi-Temporal Data Management Methods
Introduction
3. The Origins of Asserted Versioning
4. The Origins of Asserted Versioning
5. The Core Concepts of Asserted Versioning
6. Diagrams and Other Notations
7. The Basic Scenario
Introduction
8. Designing and Generating Asserted Versioning Databases
9. An Introduction to Temporal Transactions
10. Temporal Transactions on Single Tables
11. Temporal Transactions on Multiple Tables
12. Deferred Assertions and Other Pipeline Datasets
13. Re-Presenting Internalized Pipeline Datasets
14. Allen Relationship and Other Queries
15. Optimizing Asserted Versioning Databases
16. Conclusion
Appendix. Bibliographical Essay
The Asserted Versioning Glossary
Index