eBook - ePub

Database Archiving

Name: Database Archiving
Author: Jack E. Olson

How to Keep Lots of Data for a Very Long Time

Jack E. Olson

312 pages
English

About This Book

With the amount of data a business accumulates now doubling every 12 to 18 months, IT professionals need to know how to develop a system for archiving important database data in a way that satisfies regulatory requirements and is both durable and secure. This important and timely new book explains how to meet these challenges without compromising the operation of current systems. It shows how to do all this as part of a standardized archival process that requires modest contributions from team members throughout an organization, rather than the superhuman effort of a dedicated team.

Exhaustively considers the diverse set of issues—legal, technological, and financial—affecting organizations faced with major database archiving requirements
Shows how to design and implement a database archival process that is integral to existing procedures and systems
Explores the role of players at every level of the organization, in terms of the skills they need and the contributions they can make
Presents its ideas from a vendor-neutral perspective that can benefit any organization, regardless of its current technological investments
Provides detailed information on building the business case for all types of archiving projects

Information

Publisher: Morgan Kaufmann
Year: 2010
ISBN: 9780080884424
Topic: Computer Science
Subtopic: Databases

Part 1. Archiving Basics

Chapter 1. Database Archiving Overview

Archiving is the process of preserving and protecting artifacts for future use. These artifacts have lived beyond their useful life and are being kept solely for the purpose of satisfying future historical investigations or curiosities that might or might not occur. An archive is a place where these artifacts are stored for long periods of time. They are retained in case someone wants or needs them in the future, and they are kept in a manner that ensures they can still be used when that time comes.

Archiving has existed in many forms for centuries. For example, the United States government employs a national archivist. The Presidential libraries are archives. Newspapers retain archives of every story they have printed since they began publishing. Museums are archives of interesting objects from the past. Your local police department has an evidence archive. When a collection of items is placed in the cornerstone of a building during construction in anticipation that someone 100 or more years in the future will uncover it (a time capsule), an archive is being created and the creators are acting as archivists.

An archive is created for a specific purpose: to hold specific objects for future reference in the event that someone needs to look at them. The focus is on the objects that are to be included in the archive. Each archive has a specific purpose and stores a specific object type.

The process of archiving follows a common methodology. No matter what you archive, you should go through the same steps. If you leave out one of the steps, you will probably run into problems later on. This generic methodology is discussed in Chapter 3. However, before we go there, it is important to set the scope of this book.

The discussion that follows segregates data archiving into categories that are useful in understanding where this book fits into the broader archiving requirements. It also establishes some basic definitions and concepts that will be used later as we get deeper into the process.

1.1. A Definition of Database Archiving

Database archiving is the act of removing from an operational database selected data objects that are not expected to be referenced again and placing them in an archive data store where they can be accessed if needed. This is a powerful statement for which each part needs to be completely understood.

Data objects

A data object is the unit of information that someone in the future will seek from the archive. It represents a business event or object; it is the actual “thing” you are archiving. An example is banking transactions, such as deposits and withdrawals from customer accounts. The basic unit of an archive is a single transaction. Each transaction is a discrete object that reaches a point in time when it is ready to be archived. The data stored for a transaction would include all the particulars of the event, such as the account number, date, time, and place the transaction occurred; the dollar amount; and possibly more. This information might be wrapped with additional information, such as the account holder's name and address from the account master record, to make it a more complete record.
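
As a rough illustration, a single banking transaction could be modeled as a data object along the following lines. This is only a sketch in Python: the class names (`TransactionArchiveObject`, `AccountHolder`), the field names, and the choice of JSON as the stored form are all assumptions made for the example, not structures defined in this book.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from decimal import Decimal
import json


@dataclass(frozen=True)
class AccountHolder:
    # Context copied from the account master record (hypothetical fields).
    name: str
    address: str


@dataclass(frozen=True)
class TransactionArchiveObject:
    # One archived business event: a single deposit or withdrawal.
    account_number: str
    occurred_at: datetime
    place: str
    amount: Decimal
    holder: AccountHolder  # wrapped-in context that makes the record complete

    def to_json(self) -> str:
        # Serialize to a self-contained form that carries its own field names.
        record = asdict(self)
        record["occurred_at"] = self.occurred_at.isoformat()
        record["amount"] = str(self.amount)
        return json.dumps(record, sort_keys=True)


deposit = TransactionArchiveObject(
    account_number="000123",
    occurred_at=datetime(2009, 3, 14, 10, 30),
    place="Branch 17",
    amount=Decimal("250.00"),
    holder=AccountHolder(name="A. Customer", address="1 Example St"),
)
print(deposit.to_json())
```

The object is frozen because an archived record is not expected to change, and the account holder details travel with the transaction so that the record can be understood without the account master record.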

Another example of a data object might include an entire collection of data, such as production records for an airplane. The goal might be to archive the records for each individual airplane after it has been built and is ready to be sold. You might move this information into the archive one year after the airplane enters service. Although the archive object may have many different types of data, it still represents data for only a single airplane.

Selected data objects

You will have thousands of similar data objects in an operational database at any given point in time. These objects have unique characteristics in regard to their role in your information-processing systems and business. One characteristic is how long the data objects have been in the database. Some might have been there for months; others were created just a second ago. Some might be waiting for some other event to occur that would update them; some might not. Some might have a special status, others not.

In archiving, not all data objects are ready to be moved to the archive at the same time. The user must develop a policy expressed as selection criteria for determining when items are ready to be moved. Sometimes this policy is as simple as “All transactions for which 90 days have passed since the create event occurred.” Other times it is more complicated, such as “Archive all account information for accounts that have been closed for more than 60 days and do not have any outstanding issues such as uncollected amounts.”

The key point is that the database contains a large collection of like data objects. Archiving occurs on a periodic basis—say, once every week. At the time that a specific archive sweep of the database is done, only some of the objects will be ready to be archived. The policy used to select those objects that are ready needs to be defined in data value terms using information contained within or about the objects. The selected objects will probably be scattered all over the database and not contained in a single contiguous storage area or partition.
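
The two sample policies quoted above can be expressed in data value terms roughly as follows. This is a minimal sketch in Python; the record types, the field names, and the `sweep` helper are hypothetical, and "outstanding issues" is reduced here to a single uncollected amount for simplicity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Transaction:
    txn_id: int
    created_on: date


@dataclass
class Account:
    account_id: int
    closed_on: Optional[date]  # None while the account is still open
    uncollected_cents: int     # stands in for "outstanding issues"


def transaction_ready(txn: Transaction, today: date) -> bool:
    # "All transactions for which 90 days have passed since the create event."
    return today - txn.created_on >= timedelta(days=90)


def account_ready(acct: Account, today: date) -> bool:
    # Closed for more than 60 days and nothing left uncollected.
    if acct.closed_on is None:
        return False
    closed_long_enough = today - acct.closed_on > timedelta(days=60)
    return closed_long_enough and acct.uncollected_cents == 0


def sweep(objects, is_ready, today):
    # One periodic archive sweep: keep only objects the policy selects.
    return [obj for obj in objects if is_ready(obj, today)]


today = date(2010, 6, 1)
transactions = [
    Transaction(1, date(2010, 1, 15)),
    Transaction(2, date(2010, 5, 20)),
    Transaction(3, date(2010, 2, 1)),
]
accounts = [
    Account(10, date(2010, 3, 1), 0),
    Account(11, date(2010, 3, 1), 3550),
    Account(12, None, 0),
    Account(13, date(2010, 5, 15), 0),
]

ready_txn_ids = [t.txn_id for t in sweep(transactions, transaction_ready, today)]
ready_acct_ids = [a.account_id for a in sweep(accounts, account_ready, today)]
print(ready_txn_ids)   # [1, 3]
print(ready_acct_ids)  # [10]
```

Each policy looks only at values contained within the object itself, so the same sweep can be rerun every week and will pick up whichever objects have become ready since the last run.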

Removing data objects

The point of archiving is to take objects out of the active, operational environment and place them in a safe place for long-term maintenance and future reference. For database data this means that the data is written to the archive data store and then deleted from the operational database.
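
The write-then-delete ordering can be sketched as follows. This is an illustrative Python example only: two in-memory SQLite databases stand in for the operational database and the archive data store, the table and column names are hypothetical, and the use of `INSERT OR IGNORE` to make a repeated sweep harmless is an assumption of this sketch rather than a technique prescribed here.

```python
import sqlite3

SCHEMA = "(txn_id INTEGER PRIMARY KEY, account_number TEXT, created_on TEXT, amount_cents INTEGER)"


def archive_sweep(operational, archive, cutoff):
    # Select the objects that satisfy the policy (created on or before cutoff).
    rows = operational.execute(
        "SELECT txn_id, account_number, created_on, amount_cents "
        "FROM transactions WHERE created_on <= ?",
        (cutoff,),
    ).fetchall()
    # Step 1: write to the archive and commit before touching the source.
    with archive:
        archive.executemany(
            "INSERT OR IGNORE INTO archived_transactions VALUES (?, ?, ?, ?)",
            rows,
        )
    # Step 2: delete only the rows that were just archived.
    with operational:
        operational.executemany(
            "DELETE FROM transactions WHERE txn_id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)


operational = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE transactions " + SCHEMA)
archive.execute("CREATE TABLE archived_transactions " + SCHEMA)
operational.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "000123", "2010-01-15", 25000),
        (2, "000123", "2010-05-20", -4000),
        (3, "000456", "2010-02-01", 10000),
    ],
)
operational.commit()

moved = archive_sweep(operational, archive, "2010-03-01")
print(moved)  # 2
```

Committing the archive write first means that a failure between the two steps leaves the data in both places rather than in neither; the next sweep would then select the same rows again and complete the delete.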

In some instances, covered in later chapters, data that is moved to the archive is also left behind in the operational database and then deleted later. These are special circumstances; however, even here the archived data is frozen in the archive for long-term storage and considered archived. The copy left behind in the operational database is no longer the “official” record of the object.

Not expected to be referenced again

This is an important point for database archives. You do not want to move data from the operational database too early. The correct time to move it is when the data has a very low probability of being needed again. If there are normal business processes that will need to use this data, the data is probably not ready to be moved. Any subsequent need for the data should be unexpected and a clear exception.

Later chapters will discuss special circumstances in which this rule may be relaxed.

Placing them in an archive data store

The archive data store is distinctly different from the data store used for the operational database. Operational databases are designed and tuned to handle high volumes of data; to sustain high levels of create, update, and delete activity; and to process many random query transactions over the data. The data design, storage methods, and tuning requirements for the archive data store are dramatically different. The archive data store will house much larger volumes of data, receive no update activity, and see only infrequent query access.

In addition, other demands for managing the archive data store influence how and where it is to be stored. For example, the need to use low-cost storage and the need to have backup copies in geographically distributed locations are important considerations. Many of these requirements also dictate different storage design decisions as to how and where to store archived data.

The point is that the archive data store is a separate data store from the operational database and has uniquely different design and data management requirements.

Can be referenced again if needed

Even though the archive contains data that is not expected to be accessed again, it is there precisely so that it can be accessed again if needed. It makes no sense to bother archiving data if it cannot be accessed. Archives exist for legal and business requirements to keep data just in case a need arises.

The process of accessing data from archives is generally very different from accessing data for business processes. The reasons for access are different. The type of access is different. The user doing the accessing is also different. Queries against the archive are generally very simple but may involve large amounts of output data.

It is important that the archive query capabilities be strong enough to satisfy all future requirements and not require the user to restore data to the original systems for access through the original applications.
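
To make the idea concrete, the following sketch queries an archive directly, without restoring anything to the original system. It is a Python illustration under stated assumptions: an in-memory SQLite database stands in for the archive data store, and the table, columns, and `find_transactions` function are hypothetical.

```python
import sqlite3


def find_transactions(archive, account_number, start, end):
    # A simple selective query. Yielding from the cursor streams rows one
    # at a time, which matters when the output is large.
    cursor = archive.execute(
        "SELECT txn_id, created_on, amount_cents FROM archived_transactions "
        "WHERE account_number = ? AND created_on BETWEEN ? AND ? "
        "ORDER BY created_on",
        (account_number, start, end),
    )
    yield from cursor


archive = sqlite3.connect(":memory:")
archive.execute(
    "CREATE TABLE archived_transactions "
    "(txn_id INTEGER PRIMARY KEY, account_number TEXT, "
    "created_on TEXT, amount_cents INTEGER)"
)
archive.executemany(
    "INSERT INTO archived_transactions VALUES (?, ?, ?, ?)",
    [
        (1, "000123", "2010-01-15", 25000),
        (2, "000123", "2010-05-20", -4000),
        (3, "000456", "2010-02-01", 10000),
    ],
)
archive.commit()

results = list(find_transactions(archive, "000123", "2010-01-01", "2010-12-31"))
for txn_id, created_on, amount_cents in results:
    print(txn_id, created_on, amount_cents)
```

The query itself is simple, a lookup by account and date range, which matches the kind of access described above; the design question is whether the archive can answer it on its own.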

It is critical to understand the likely uses that could arise in the future and the shape of the queries that result. Anticipating these factors will impact decisions you make regarding how and where to keep the archive data store.

1.2. Forms of Data Archiving

It is important to understand the difference between data archiving and database archiving. Data archiving refers to a broad category of archiving that includes many subtypes. Database archiving is one of those subtypes.

Data archiving is a term that applies to keeping inactive information collected by an enterprise. Data is collected in many forms: paper documents, emails, computer files, and so on. Most data today is collected and stored electronically by computers. Only some of it is stored in databases using structured database management systems.

Figure 1.1 lists several categories of enterprise data. Almost every enterprise uses all these categories. The list demonstrates that data archiving applies to many diverse types of data. As we shall see, the requirements for storing different types of data and the problems that need to be solved vary from one categor...

Brief Table of Contents
Table of Contents
Copyright
Dedication
Preface
Acknowledgments
Part 1. Archiving Basics
Part 2. Establishing a Database Archiving Project
Part 3. Designing Database Archiving Applications
Part 4. Database Archiving Application Software
Part 5. Administration of the Database Archive
Appendix Final Thoughts
Appendix A. Generic Archiving Checklist
Appendix B. Goals of a Database Archiving System
Appendix C. Job Description of a Database Archive Analyst
Glossary
Bibliography
Index