eBook - ePub

Querying XML

Name: Querying XML
Author: Jim Melton,Stephen Buxton

XQuery, XPath, and SQL/XML in context

Jim Melton,

Stephen Buxton,

848 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Querying XML

XQuery, XPath, and SQL/XML in context

Jim Melton,

Stephen Buxton,

Book details

Book preview

Table of contents

Citations

About This Book

XML has become the lingua franca for representing business data, for exchanging information between business partners and applications, and for adding structure–and sometimes meaning—to text-based documents. XML offers some special challenges and opportunities in the area of search: querying XML can produce very precise, fine-grained results, if you know how to express and execute those queries.For software developers and systems architects: this book teaches the most useful approaches to querying XML documents and repositories. This book will also help managers and project leaders grasp how "querying XML" fits into the larger context of querying and XML. Querying XML provides a comprehensive background from fundamental concepts (What is XML?) to data models (the Infoset, PSVI, XQuery Data Model), to APIs (querying XML from SQL or Java) and more.* Presents the concepts clearly, and demonstrates them with illustrations and examples; offers a thorough mastery of the subject area in a single book. * Provides comprehensive coverage of XML query languages, and the concepts needed to understand them completely (such as the XQuery Data Model).* Shows how to query XML documents and data using: XPath (the XML Path Language); XQuery, soon to be the new W3C Recommendation for querying XML; XQuery's companion XQueryX; and SQL, featuring the SQL/XML * Includes an extensive set of XQuery, XPath, SQL, Java, and other examples, with links to downloadable code and data samples.

Frequently asked questions

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Yes, you can access Querying XML by Jim Melton,Stephen Buxton in PDF and/or ePUB format, as well as other popular books in Computer Science & Databases. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Morgan Kaufmann

Year

2011

ISBN

9780080540160

Topic

Computer Science

Subtopic

Databases

Index

Computer Science

Part I

XML: Documents and Data

Chapter 1

XML

1.1 Introduction

The title of this book is Querying XML, so we start by introducing XML, describing what we mean by “querying,” and then discussing the special challenges in querying XML.

XML – the Extensible Markup Language – defines a set of rules for adding markup to data. Markup adds structure to data, and gives us a way of talking about the meaning of that data. The family of XML technologies provides a way to standardize the representation of data, so that we can process any data with standard programs, share data across applications, and transfer data from one person or application to another. In this first chapter, we introduce XML by looking at what markup is and what it’s good for. Then we look at a number of different uses for XML – a number of different kinds of XML data. Finally, we give examples of other ways to represent data, and compare them with XML.

1.2 Adding Markup to Data

Let’s take the movies example (Appendix A: The Example) used throughout this book. We have data describing many of our favorite movies. The data includes the title of the movie, the year it was first released, the names of some of the cast members, and other information about the movie. In this section, we look at the data in its raw form, then discuss how that data might be marked up to make it more useful.

1.2.1 Raw Data

We could represent our movie data in raw form, as in Example 1-1.

Example 1-1 movie, Raw Data

Example 1-1 is the raw data for one movie – a single record. In this format, the data doesn’t tell you much about the movie. You can probably spot the title, and, if you are familiar with “An American Werewolf in London,” you may be able to glean some information by means of educated guesswork. But if you wanted to write a program to read this data and do something with it – such as finding the name of the director – you would have to write code specifically for this piece of data (e.g., code that extracts the characters at positions 41 through 44 and 35 through 40 and adds a space in between them). What we need is some way to represent the data so that a program (or person) can process any movie record in the same way.

1.2.2 Separating Fields

A simple way to add some rudimentary structure to this record is to add a comma between each of the data items, or fields.

Example 1-2 movie, Fields Separated by Commas

Example 1-2 is the same movie data represented as a comma-separated list. Notice that, even with this simple mechanism, we had to introduce the “\” (backslash) character to “escape” a comma that was actually part of the data.

There are other ways to distinguish between fields of a record. In the early days of computing, fixed-length fields were common – each field might occupy, say, 8 bytes. This method makes access simple – if you want to access the beginning of the third field, you can go directly to the 17th byte. But fields smaller than 8 bytes take up more space than they need to, and fields longer than 8 bytes require some indication that they are spread across more than one field (such as a continuation marker).

Let’s continue our discussion with the comma-separated list in Example 1-2. You can spot the fields in this record, but there is no way of knowing which fields go together. For example, the fields “Agutter,” “Jenny,” “female,” and “Alex Price” each describe one aspect of a cast member, but it’s not apparent from the comma-separated list that those fields have anything in common. We have a way of delineating fields; now we need some way of grouping fields together.

1.2.3 Grouping Fields Together

Example 1-3 groups fields together. It also introduces a hierarchy of fields and subfields. Fields are separated by one or more commas, and fields that belong together are bounded by “,” at the start and “$,” at the end.

Example 1-3 movie, Grouped Fields

Example 1-3 is shown with some extra white space – each sub-field starts on a new line, and is indented. This is purely for (human) readability.

Now we know that “Agutter, Jenny, female, Alex Price” all belongs together and is all related in some way to “An American Werewolf in London.” And if you want to write a program to extract the director of each movie, given that each movie is formatted in the same way as in Example 1-3, you can write some general code that will parse the movie into first, second, and third fields, extract the contents of the third field, and parse that to get the first and last name of the director.

We are making progress! But Example 1-3 still has some shortcomings. There is no indication of what a field represents, other than its position within the record, which makes it difficult for humans to read. This has two implications – first, the data is vulnerable to error. If you (or the program generating the data) make a mistake and leave out the year of release, it’s not obvious that anything is missing, and a program processing this data may well return “LandisJohn” when asked for the year of release. Second, it makes it difficult to talk about the data. Most of the time, when we want to “talk about” the data, we want to describe some manipulation to a program – i.e., it’s difficult to write a program that says things like “print the second field of the third field of the movie record, then a space, then the first field of the third field of the movie record.” Our next step is to name the fields and subfields.

1.2.4 Naming Fields

If you read Example 1-3, you can probably guess that “An American Werewolf in London” is the title of the movie, and you may even deduce that Jenny Agutter plays the female lead, a character named Alex Price. But who is Peter Guber? And what does “98” mean? What we need is a way to name each field, to m...

Cover image
Title page
Table of Contents
The Morgan Kaufmann Series in Data Management Systems
Copyright
Dedication
Foreword
Preface
Part I: XML: Documents and Data
Part II: Metadata and XML
Part III: Managing and Storing XML for Querying
Part IV: Querying XML
Part V: Querying and The World Wide Web
Appendix A: The Example
Appendix B: Standards Processes
Appendix C: Grammars
Index
About the Authors