Joe Celko's Data, Measurements and Standards in SQL
eBook - ePub

Joe Celko's Data, Measurements and Standards in SQL

  1. 309 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Joe Celko's Data, Measurements and Standards in SQL

Book details
Book preview
Table of contents
Citations

About This Book

Joe Celko has looked deep into the code of SQL programmers and found a consistent and troubling pattern - a frightening lack of consistency between their individual encoding schemes and those of the industries in which they operate. This translates into a series of incompatible databases, each one an island unto itself that is unable to share information with others in an age of internationalization and business interdependence. Such incompatibility severely hinders information flow and the quality of company data.Data, Measurements and Standards in SQL reveals the shift these programmers need to make to overcome this deadlock. By collecting and detailing the diverse standards of myriad industries, and then giving a declaration for the units that can be used in an SQL schema, Celko enables readers to write and implement portable data that can interface to any number of external application systems! This book doesn't limit itself to one subject, but serves as a detailed synopsis of measurement scales and data standards for all industries, thereby giving RDBMS programmers and designers the knowledge and know-how they need to communicate effectively across business boundaries.* Collects and details the diverse data standards of myriad industries under one cover, thereby creating a definitive, one-stop-shopping opportunity for database programmers.* Enables readers to write and implement portable data that can interface to any number external application systems, allowing readers to cross business boundaries and move up the career ladder.* Expert advice from one of the most-read SQL authors in the world who is well known for his ten years of service on the ANSI SQL standards committee and Readers Choice Award winning column in Intelligent Enterprise.

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes, you can access Joe Celko's Data, Measurements and Standards in SQL by Joe Celko in PDF and/or ePUB format, as well as other popular books in Computer Science & Programming Languages. We have over one million books available in our catalogue for you to explore.

Information

Year
2009
ISBN
9780080884455
Part 1. History, Standards, and Designing Data
The first of this book discusses the principles of designing data encoding schemes and some of the history of standardization. This second part will give examples of actual standards used in a variety of industries.
In the first six chapters, I look at the foundations from the view point of a database designer who needs some understanding of the how and why.
I find it odd that database designers are very physical about their data and do not work with many abstractions. They were never taught the theory of scales and measurements. They have only a minimal knowledge of validation, verification, and risk of error as a part of the data.
Check digit algorithms are taught as single “programming tricks” in undergraduate computer science classes rather than a mathematical discipline.
To the best of my knowledge, I am the only person who teaches Data Encoding Schemes in an orderly fashion.
The failure of cowboy coders to use standards leads to problems. The homegrown encoding schemes have to be maintained internally. A standard is maintained for you. It is usually maintained by an organization devoted to that standard and with subject area experts who you could not hire.
The days of isolated databases are long gone. You can exchange data with other organizations or buy it from companies when it is standardized. Would you rather buy census data on magnetic tapes or conduct the census yourself?
You can read data and understand it because you know the units of measure.
Chapter 1. Scales and Measurements
“In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.”
PLA, Vol. 1, Electrical Units of Measurement, 1883-05-03
Before you can put data into a database, you actually need to think about how it will be represented and manipulated. Most programmers have never heard of measurement theory or thought about the best way to represent their data. They either use whatever was there before or invent their own schemes on the fly. Most of the time, the data is put into the database in the units in which it was collected without regard to even a quick validation. It is assumed the input is in an appropriate unit, with appropriate scale and precision. In short, application programmers and users are perfect. This tendency to believe the computer, no matter how absurd the data, is called the “Garbage In, Gospel out” principle in IT folklore.
This unwillingness to do validation and verification is probably the major reason for the lack of data quality.

1.1. Measurement Theory

“Measure all that is measurable and attempt to make measurable that which is not yet so.”
—Galileo (1564–1642)
Measurement theory is a branch of applied mathematics that is useful in data analysis and database design. Measurements are not the same as the attribute being measured. Measurement is not just assigning numbers to things or their attributes so much as it is finding a property in things that can be expressed in numbers or other computable symbols. This structure is the scale used to take the measurement; the numbers or symbols represent units of measure.
Strange as it might seem, measurement theory came from psychology, not mathematics, statistics, or computer science. S. S. Stevens originated the idea of levels of measurement and classification of scales in 1946 for psychology testing. This is more recent than you would have thought. Scales are classified into types by the properties they do or do not have. The properties with which we are concerned are the following.
1. There is a natural origin point on the scale. This is sometimes called a zero, but it does not literally have to be a numeric zero. For example, if the measurement is the distance between objects, the natural zero is zero meters—you cannot get any closer than that. If the measurement is the temperature of objects, the natural zero is absolute zero—nothing can get any colder. However, consider time; it goes from an eternal past into an eternal future, so you cannot find a natural origin for it.
2. Meaningful operations can be performed on the units. It makes sense to add weights together to get a new weight. Adding temperatures has to consider mass. Dates can be subtracted to give a duration in days. However, adding names or shoe sizes together is absurd.
3. There is a natural ordering to the units. It makes sense to speak about events occurring before or after one another in time or a physical object being heavier, longer, or hotter than another object.
But the alphabetical order imposed on a list of names is arbitrary, not natural—a foreign language, with different names for the same objects, would impose another alphabetical ordering. And that assumes the other language even had an alphabet for an ordering; Chinese, for example, does not.
4. There is a natural metric function on the units. A metric function has nothing to do with the “metric system” of measurements, which is more properly called SI, for “SystemĂ© International d’units” in French. Metric functions have the following three properties:
a. The metric between an object and itself is the natural origin of the scale. We can write this in a notation as M(a, a) = 0.
b. The order of the objects in the metric function does not matter. Again in the semimathematical notation, M(a, b) = M(b, a).
c. There is a natural additive function that obeys the rule that M(a, b) + M(b, c) > = M(a, c), which is also known as the triangular inequality.
This notation is meant to be more general than just arithmetic. The “zero” in the first property is the origin of the scale, not just a numeric zero. The third property, defined with a “plus” and a “greater than or equal” sign, is a symbolic way of expressing general ordering relationships. The “greater than or equal” sign refers to a natural ordering on the attribute being measured. The “plus” sign refers to a meaningful operation in regard to that ordering, not just arithmetic addition.
The special case of the third property, where the “greater than or equal to” is always “greater than,” is very desirable to people because it means that they can use numbers for units and do simple arithmetic with the scales. This is called a strong metric property. For example, human perceptions of sound and light intensity follow a cube root law—that is, if you double the intensity of light, the perception of the intensity increases by only 20% (Stevens 1957). The actual formula is “Physical intensity to the 0.3 power equals perceived intensity” in English. Knowing this, designers of stereo equipment use controls that work on a logarithmic scale internally but that show evenly spaced marks on the control panel of the amplifier.
It is possible to have a scale that has any combination of the metric properties. For example, instead of measuring the distance between two places in meters, you can measure it in units of effort. This is the old Chinese system, which had uphill and downhill units of distance, so you could estimate the time required to make a journey on foot.
Does this system of distances have the property that M(a, a) = 0? Yes; it takes no effort to get to where you are already located. Does it have the property that M(a, b) = M(b, a)? No; it takes less effort to go downhill than to go uphill. Does it have the property that M(a, b) + M(b, c) >= M(a, c)? Yes with the direction considered; the amount of effort needed to go directly to a place will always be less than the effort of making another stop along the way.
As you can see, these properties can be more intuitive than mathematical. Obviously, we like the more mathematical side of this model because it fits into a database, but you have to be aware of the intuitive side.

1.1.1. Range, Granularity, and Your Instruments

“The only man who behaves sensibly is my tailor; he takes my measurements anew every time he sees me, while all the rest go on with their old measurements and expect me to fit them.”
—George Bernard Shaw
Range and granularity are properties of the way the measurements are made. Since we have to store data in a database within certain limits, they are very important to a database designer. The type of scales is unrelated to whether you use discrete or continuous variables. While measurements in a database are always discrete due to finite precision, attributes can be conceptually either discrete or continuous regardless of measurement level. Temperature is usually regarded as a continuous attribute, so temperature measurement to the nearest degree Celsius is a ratio-level measurement of a continuous attribute.
B9780123747228000013/gr1.webp is missing
■ Figure 1-1:.
Micrometer (http://www.design-technology.org/micrometer.webp).
However, quantum mechanics holds that the universe is fundamentally discrete, so temperature may actually be a discrete attribute. In ordinal scales for continuous attributes, ties are impossible (or have probability zero). In ordinal scales for discrete attributes, ties are possible. Nominal scales usually apply to discrete attributes. Nominal scales for continuous attributes can be modeled but are rarely used.
Aside from these philosophical considerations, there is the practical aspect of the instrument used for the measurement. A radio telescope, surveyor’s transit, meter stick, and a micrometer are tools that measure distance. Nobody would claim that they are interchangeable. I can use a measuring tape to fit furniture in my house but not to make a mechanical wristwatch or to measure the distance to the moon.
From a purely scientific viewpoint, measurements should be reduced to the least precise instrument’s readings. This means that you can be certain that the final results of calculations can be justified.
From a practical viewpoint, measurements are often adjusted by statistical considerations. This means that final results of calculations will be closer to reality—assuming that the adjustments were valid. This is particularly true for missing data, which we will discuss later.
But for now consider the simple example of a database showing that Joe Celko bought 500 bananas this week. Unless I just started a gorilla ranch, this is absurd and probably ought to be adjusted to five bananas or less. On the other hand, if the Dairy Queen Company orders five bananas this week, this is absurd. They are a corporation that had about 6000 restaurants in the United States, Canada, and 20 foreign countries in 2007, all of which make a lot of banana splits every day.

1.1.2. Range

A scale also has other properties that are of interest to someone building a database. First, scales have a range—what are the highest and lowest values that can appear on the scale? It is possible to have a finite or an infinite limit on either the lower or the upper bound. Overflow ...

Table of contents

  1. Cover image
  2. Table of Contents
  3. Copyright
  4. Introduction
  5. Part 1. History, Standards, and Designing Data
  6. Chapter 1. Scales and Measurements
  7. Chapter 2. Validation
  8. Chapter 3. Data Encoding Schemes
  9. Chapter 4. Scales
  10. Chapter 5. Data with Ignorance
  11. Chapter 6. Keys
  12. Part 2. A Sampling of Standards
  13. Chapter 7. Dates
  14. Chapter 8. Sex Codes
  15. Chapter 9. Ethnicity and Race Codes
  16. Chapter 10. ISO-3166 and Other Country Codes
  17. Chapter 11. Language Codes
  18. Chapter 12. Currency Codes
  19. Chapter 13. National Identification Numbers
  20. Chapter 14. Occupations
  21. Chapter 15. Colors
  22. Chapter 16. Telephone Numbers
  23. Chapter 17. E-Mail Addresses
  24. Chapter 18. Universal Postal Union
  25. Chapter 19. Hierarchical Triangular Mesh
  26. Chapter 20. Shoe Sizes
  27. Chapter 21. International Clothing Sizes
  28. Chapter 22. ICD Codes
  29. Chapter 23. Vehicle Identification Number (VIN)
  30. Chapter 24. Freight Containers
  31. Chapter 25. Credit Card Numbers
  32. Chapter 26. SWIFT and Related Banking Standards
  33. Chapter 27. Data Universal Numbering System
  34. Chapter 28. Global Trade Item Number
  35. Chapter 29. Digital Object Identifier (DOI)
  36. Chapter 30. Audiovisual Media
  37. Chapter 31. ISIN and Related Securities Identifiers
  38. Chapter 32. Temperature Scales
  39. Chapter 33. National Animal Identification System (NAIS)
  40. Chapter 34. ISO 216 Paper Sizes (“A,” “B,” and “C” Series)
  41. Chapter 35. Compass Points
  42. Chapter 36. Unicode
  43. Chapter 37. Driver’s Licenses
  44. Chapter 38. Currency Units and Near Money
  45. Chapter 39. Recipes and Food Preparation
  46. Chapter 40. Portable Document Format (PDF)
  47. Chapter 41. Temporal Data
  48. Chapter 42. Additive Congruential Generators
  49. Chapter 43. Traditional and Metric Typographic Units
  50. Index