Chapter 1 A brief history of data warehousing and first-generation data warehouses
In the beginning there were simple mechanisms for holding data. There were punched cards. There were paper tapes. There was core memory that was hand-beaded. In the beginning, storage was very expensive and very limited.
A new day dawned with the introduction and use of magnetic tape. With magnetic tape, it was possible to hold very large volumes of data cheaply. With magnetic tape, there were no major restrictions on the format of the record of data. With magnetic tape, data could be written and rewritten. Magnetic tape represented a great leap forward from early methods of storage.
But magnetic tape did not represent a perfect world. With magnetic tape, data could be accessed only sequentially. It was often said that to access 1% of the data, 100% of the data had to be physically accessed and read. In addition, magnetic tape was not the most stable medium on which to write data. The oxide could fall off or be scratched off a tape, rendering the tape useless.
Disk storage represented another leap forward for data storage. With disk storage, data could be accessed directly. Data could be written and rewritten. And data could be accessed en masse. There were all sorts of virtues that came with disk storage.
DATA BASE MANAGEMENT SYSTEMS
Soon disk storage was accompanied by software called a “DBMS,” or “data base management system.” DBMS software existed for the purpose of managing storage on the disk itself. The DBMS managed such activities as
identifying the proper location of data;
resolving conflicts when two or more units of data were mapped to the same physical location;
allowing data to be deleted;
spanning a physical location when a record of data would not fit in a limited physical space.
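The four duties listed above can be made concrete with a toy model. What follows is a minimal sketch, not how any particular DBMS actually worked: the `ToyDiskStore` class, the 16-byte block size, the byte-sum hash, and the use of linear probing to resolve collisions are all hypothetical choices made for illustration. Real systems of the era used a variety of hashing, indexing, and overflow schemes.

```python
from typing import Optional

BLOCK_SIZE = 16      # hypothetical size of one physical block, in bytes
TOMBSTONE = object() # marker left behind when a record is deleted


class ToyDiskStore:
    """Toy model of a DBMS placing keyed records into a fixed set of slots."""

    def __init__(self, num_slots: int = 8) -> None:
        self.num_slots = num_slots
        # Each slot holds None (never used), TOMBSTONE (deleted),
        # or a (key, blocks) tuple where blocks is a list of byte chunks.
        self.slots: list = [None] * num_slots

    def _home(self, key: str) -> int:
        # Identify the proper location: a simple, deterministic hash of the key.
        return sum(key.encode()) % self.num_slots

    def _blocks(self, data: bytes) -> list[bytes]:
        # Span: a record larger than one block is split across several blocks.
        return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]

    def put(self, key: str, data: bytes) -> int:
        start = self._home(key)
        first_free = None
        for step in range(self.num_slots):
            i = (start + step) % self.num_slots
            slot = self.slots[i]
            if slot is None:
                if first_free is None:
                    first_free = i
                break
            if slot is TOMBSTONE:
                # Remember the first reusable slot, but keep probing for the key.
                if first_free is None:
                    first_free = i
                continue
            if slot[0] == key:
                self.slots[i] = (key, self._blocks(data))  # rewrite in place
                return i
        if first_free is None:
            raise RuntimeError("store is full")
        # Resolve conflicts: if the home slot was taken, this is a later slot.
        self.slots[first_free] = (key, self._blocks(data))
        return first_free

    def get(self, key: str) -> Optional[bytes]:
        start = self._home(key)
        for step in range(self.num_slots):
            i = (start + step) % self.num_slots
            slot = self.slots[i]
            if slot is None:
                return None
            if slot is not TOMBSTONE and slot[0] == key:
                return b"".join(slot[1])
        return None

    def delete(self, key: str) -> bool:
        start = self._home(key)
        for step in range(self.num_slots):
            i = (start + step) % self.num_slots
            slot = self.slots[i]
            if slot is None:
                return False
            if slot is not TOMBSTONE and slot[0] == key:
                # Allow deletion without breaking the probe chain for other keys.
                self.slots[i] = TOMBSTONE
                return True
        return False


if __name__ == "__main__":
    store = ToyDiskStore()
    print(store.put("ab", b"short record"))  # 3: the key's home slot
    print(store.put("ba", b"x" * 40))        # 4: "ba" also hashes to 3, so it probes onward
    print(len(store.slots[4][1]))            # 3: a 40-byte record spans three 16-byte blocks
    store.delete("ab")
    print(store.get("ba") == b"x" * 40)      # True: the tombstone keeps the chain intact
```

Deleted records are marked with a tombstone rather than cleared, because clearing the slot would make any record that had been pushed past it by a collision unreachable.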
Among all the benefits of disk storage, by far the greatest was the ability to locate data quickly. And it was the DBMS that accomplished this very important task.
ONLINE APPLICATIONS
Once data could be accessed directly, using disk storage and a DBMS, there soon grew what came to be known as online applications. Online applications were applications that depended on the computer to access data consistently and quickly. There were many commercial applications of online processing. These included ATM (automated teller machine) processing, bank teller processing, claims processing, airline reservations processing, manufacturing control processing, retail point-of-sale processing, and many, many more. In short, the advent of online systems allowed the organization to advance into the 20th century when it came to servicing the day-to-day needs of the customer. Online applications became so powerful and popular that they soon grew into many interwoven applications.
Figure 1.1 illustrates this early progression of information systems.
In fact, online applications grew so rapidly that in short order the corporation had a great many of them.
And with these applications came a cry from the end user: “I know the data I want is there somewhere, if I could only find it.” It was true. The corporation had a whole roomful of data, but finding it was another story altogether. And even if you found it, there was no guarantee that the data you found was correct. Data was proliferating around the corporation, so that at any one point in time, people were never sure about the accuracy or completeness of the data they had.
PERSONAL COMPUTERS AND 4GL TECHNOLOGY
To placate end users clamoring for access to data, two technologies emerged: personal computer technology and 4GL technology.
Personal computer technology allowed anyone to bring his/her own computer into the corporation and to do his/her own processing at will. Personal computer software such as spreadsheet software appeared. In addition, the owner of the personal computer could store his/her own data on the computer. There was no longer a need for a centralized IT department. The attitude was: if the end users are so angry about us not letting them have their own data, just give them the data.
At about the same time, along came a technology called “4GL,” short for fourth-generation language technology. The idea behind 4GL technology was to make programming and system development so straightforward that anyone could do it. As a result, the end user was freed from the shackles of having to depend on the IT department to feed him/her data from the corporation.
Between the personal computer and 4GL technology, the notion was to emancipate the end user so that the end user could take his/her own destiny into his/her own hands. The theory was that freeing the end user to access his/her own data was what was needed to satisfy the hunger of the end user for data.
And personal computers and 4GL technology soon found their way into the corporation.
But something unexpected happened along the way. While the end users were now free to access data, they discovered that there was a lot more to making good decisions than merely accessing data. The end users found that, even after data had been accessed,
inaccurate data is worse than no data at all, because incorrect data can be very misleading;
incomplete data is not very useful;
data that is not timely is less than desirable;
when there are multiple versions of the same data, relying on the wrong value of data can result in bad decisions;
data without documentation is of questionable value.
It was only after the end users got access to data that they discovered all the underlying problems with the data.
THE SPIDER’S WEB ENVIRONMENT
The result was a big mess. This mess is sometimes affectionately called the “spider’s web” environment. It is called the spider’s web environment because there are so many lines going to so many places that they are reminiscent of a spider’s web.
Figure 1.2 illustrates the evolution of the spider’s web environment in the typical corporate IT environment.
The spider’s web environment grew to be unimaginably complex in many corporate environments. As testimony to its complexity, consider the real diagram of a corp...