eBook - ePub

Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL

Name: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL
Author: Joe Celko

Joe Celko

384 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL

Joe Celko

Book details

Book preview

Table of contents

Citations

About This Book

Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages. The result is overly complex and inefficient code, not to mention lost productivity.This book will change the way you think about the problems you solve with SQL programs.. Focusing on three key table-based techniques, Celko reveals their power through detailed examples and clear explanations. As you master these techniques, you'll find you are able to conceptualize problems as rooted in sets and solvable through declarative programming. Before long, you'll be coding more quickly, writing more efficient code, and applying the full power of SQL

Filled with the insights of one of the world's leading SQL authorities - noted for his knowledge and his ability to teach what he knows
Focuses on auxiliary tables (for computing functions and other values by joins), temporal tables (for temporal queries, historical data, and audit information), and virtual tables (for improved performance)
Presents clear guidance for selecting and correctly applying the right table technique

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Yes, you can access Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL by Joe Celko in PDF and/or ePUB format, as well as other popular books in Informatik & Programmiersprachen. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Morgan Kaufmann

Year

2008

ISBN

9780080557526

Topic

Informatik

Subtopic

Programmiersprachen

SQL Is Declarative,
Not Procedural

IN THE PREFACE I told a short story about FORTRAN programmers who could only solve problems using loops and a LISP programmer who could only solve problems recursively This is not uncommon because we love the tools we know. Let me tell a joke instead of a story: A mathematician, a physicist, and a database programmer were all given a rubber ball and told to find the volume.

The mathematician carefully measured the diameter and either evaluated the volume of sphere formula or used a triple integral if the ball was not perfectly round.

The physicist filled a beaker with water, put the ball in the water, and measured the total displacement. He does not care about the details of the shape of the ball.

The database programmer looked up the model and serial numbers in his rubber ball manufacturer’s on-line database. He does not care about the actual ball. But he has information about the tolerances to which it was made, the expected shape and size, and a bunch of other things that apply to the entire rubber ball production process.

The moral of the story is: The mathematician knows how to compute. The physicist knows how to measure. The database guy knows how to look up data. Each person grabs his tools to solve the problem.

Now change the problem to an inventory of thousands of rubber balls. The mathematician and the physicist are stuck with a lot of manual labor. The database guy does a few downloads and he can produce rubber ball industry standards (assuming that there are such things) and detailed documentation in court with his answers.

1.1 Different Programming Models

Perfecting oneself is as much unlearning as it is learning.
—Edsgar Dijkstra

There are many models of programming. Procedural programming languages use a sequence of procedural steps guided by flow of control statements (WHILE-DO, IF-THEN-ELSE, and BEGIN-END) that change the input data to output data. This was the traditional view of programming, and it is often called the von Neumann Model after John von Neumann, the mathematician who was responsible for it. The same source code runs through the same compiler and generates the same executable module every time. The same program will work exactly the same way every time it is invoked. The keywords in this model are predictable and deterministic. It is also subject to some mathematical analysis because it is deterministic.

There are some variations on the theme. Some languages use different flow control statements. FORTRAN and COBOL allocated all the storage for the data at the start of the program. Later, the Algol family of languages did dynamic storage allocation based on the scope of the data within a block-structured language.

Edsgar Dijkstra (see his archives at www.cs.utexas.edu/users/EWD/) came up with a language that was nondeterministic. Statements, called guarded commands, have a control that either blocks or allows the statement to be executed, but there is no particular order of execution among the open statements. This model was not implemented in a commercial product, but it demonstrated that something we had thought was necessary for programming (determinism) could be dropped.

Functional programming languages are based on solving problems as a series of nested function calls. The concept of higher-order functions to change one function to another is important in these languages. The derivative and integral transforms are mathematical examples of such higher-order functions. One of the goals of such languages is to avoid a side effect in programs so they can be optimized algebraically In particular, once you have an expression that is equal to another (in some sense of equality), they can substitute for each other without affecting the result of the computation.

APL is the most successful functional programming language and had a fad period as a teaching language when Ken Iverson wrote his book A Programming Language in 1962. IBM produced special keyboards that included the obscure mathematical symbols used in APL for their desktop machines. Most of the functional languages never made it out of academia, but some survive in commercial applications today. Erlang is used for concurrent applications; R is a statistical language; Mathematica is a popular symbolic mathematics product; and Kx Systems uses the K language for large-volume financial analysis. More recently the ML and Haskell programming languages have become popular among Linux and UNIX programmers.

Here we dropped another concept that had been regarded as fundamental: There is no flow of control in these languages.

Constraint or constraint logic programming languages are a series of constraints on a problem domain. As you add more constraints, the system figures out which answers are possible and which are not. The most popular such language is PROLOG, which also had an academic fad many years ago when Borland Software (www.borland.com) made a cheap student version available. The website ON-LINE GUIDE TO CONSTRAINT PROGRAMMING by Roman Barták is a good place to start if you are interested in this topic (http://kti.ms.mff.cuni.cz/~bartak/constraints/index.html).

Here we dropped the concept of an algorithm altogether and just provided a problem specification.

Object-oriented (OO) programming is based on the ideas of objects that have both data and behavior in the same module of code. The programming model is a collection of independent cooperating objects instead of a single program invoking functions. An object is capable of receiving messages, processing data, and sending messages to other objects.

The idea is that each object can be maintained and written independently of any particular application and dropped into place where it is needed. Imagine a community of people who do particular jobs. They receive orders from their customers, process them, and return a result.

Many years ago, the INCITS H2 Database Standards Committee (née ANSI X3H2 Database Standards Committee) had a meeting in Rapid City, South Dakota. We had Mount Rushmore and Bjarne Stroustrup as special attractions. Mr. Stroustrup did his slide show with overhead transparencies (yes, this was before PowerPoint was ubiquitous!) about Bell Labs inventing C++ and OO programming, and we got to ask questions.

One of the questions was how we should put OO features into the working model of the next version of the SQL standard, which was known as SQL3 internally. His answer was that Bell Labs, with all their talent, had tried four different approaches to this problem and they came to the conclusion that it should not be done. OO was great for programming but deadly for data.

I have watched people try to force OO models into SQL, and it falls apart in about a year. Every typo becomes a new attribute or class, queries that would have been so easy in a relational model are now multitable monster outer joins, redundancy grows at an exponential rates, constraints are virtually impossible to write so you can kiss data integrity goodbye, and so forth.

With all these programming models, why should we not have different data models?

1.2 Different Data Models

Consider the humble punch card. Punch cards had been used in France to control textile looms since the early 1700s; the method was perfected by Joseph Marie Jacquard in 1801 with his Jacquard loom.

Flash forward to the year 1890, when a man named Herman Hollerith invented a punch card and tabulating machines for that year’s United States Census. His census project was so successful that Mr. Hollerith left the government and started the Tabulating Machine Company in 1896. After a series of mergers and name changes, this company became IBM. You might have heard of it.

Up to the 1970s, the “IBM card” and related machinery was everywhere. The most common card was the IBM 5081, and that part number became the common term for it—even across vendors! The punch card was data processing back then.

The physical characteristics of the card determined how we stored and processed data for decades afterwards. The card was the size of an 1887 United States dollar bill (3.25 inches by 7.375 inches). The reason for that size was simple; when Hollerith worked on the Census, he could get drawers to store the decks of cards from the Department of the Treasury across the street.

The cards had a grid of 80 columns of 12 rows, which could accommodate holes. This was for physical reasons again. But once the 80-column convention was established, it stuck. The early video terminals that replaced the key punch machines used screens with 80 columns of text and 24 or 25 rows—that is, two punch cards high and possibly a line for error messages.

Magnetic tapes started replacing punch cards in the 1970s, but they also mimicked the 80-column convention, although there was no longer any need. Many of the early ANSI tape standards for header records are based on this convention. Legacy systems simply replaced card readers with magnetic tape units for obvious reasons, but new applications continued to be built to this standard, too.

The physical nature of the cards meant that data was written and read from left to right in sequential order. Likewise, the deck of cards was written and read from front to back in sequential order.

A magnetic tape file is also written and read in the same way, but with the added bonus that when you drop a tape on the floor, it does not get scrambled like a deck of cards. The downside of a tape over a deck of cards is that it cannot be rearranged manually on purpose either.

Card and tape files are pretty passive creatures and will take whatever an application program throws at them without much objection. Files are also independent of each other, simply because they are connected to one application program at a time and therefore have no idea what other files look like.

Early disk systems also mimicked this model—physically contiguous storage read in a sequential order, with meaning given to the data by the program reading it.

It was a while before disk systems realized that the read/write heads could be moved to any physical position on the disk. This gave us random access storage. We still have a contiguous storage concept within each field and each record, however.

The Relational Model was a big jump, because it divorced the physical and logical models of data. If you read the specifications for many of the early programming languages, they describe physically contiguous data and storage methods. SQL describes only the behavior of the data without any reference to physical storage methods.

1.2.1 Columns Are Not Fields

A field within a record is defined by the application program that reads it. A column in a row in a table is defined independently of any application by the database schema in DDL. The data types in a column are always scalar and NULL-able.

This is a problem for files. If I mount the wrong tape on a tape drive, say a COBOL file, and read it with a FORTRAN program, it can produce meaningless output. The program simply counts the number of bytes from the start of the tape and slices off so many characters into each field from left to right.

The order of the application program variables in the READ or INPUT statements is important, because the values are read into the program variables in that order. In SQL, columns are referenced only by their n...

Cover Page
Title Page
Copyright Page
The Morgan Kaufmann Series in Data Management Systems
Dedication
Table of Contents
Preface
Chapter 1 - SQL Is Declarative, Not Procedural
Chapter 2 - Hardware, Data Volume, and Maintaining Databases
Chapter 3 - Data Access and Records
Chapter 4 - Lookup Tables
Chapter 5 - Auxiliary Tables
Chapter 6 - Views
Chapter 7 - Virtual Tables
Chapter 8 - Complicated Functions via Tables
Chapter 9 - Temporal Tables
Chapter 10 - Scrubbing Data with Non-1 NF Tables
Chapter 11 - Thinking in SQL
Chapter 12 - Group Characteristics
Chapter 13 - Turning Specifications into Code
Chapter 14 - Using Procedure and Function Calls
Chapter 15 - Numbering Rows
Chapter 16 - Keeping Computed Data
Chapter 17 - Triggers for Constraints
Chapter 18 - Procedural and Data Driven Solutions