Advanced Data Management
eBook - ePub

  1. 374 pages
  2. English

About This Book

Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have intensified the need for sophisticated and flexible data storage and processing solutions.
This book provides comprehensive coverage of the principles of data management developed in the last decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing and querying data according to these models.

Starting off with the topic of database design, it further discusses weaknesses of the relational data model, and then proceeds to convey the basics of graph data, tree-structured XML data, key-value pairs and nested, semi-structured JSON data, columnar and record-oriented data as well as object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication and consistency strategies for data management in distributed databases as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures.

While primarily geared towards students of Master-level courses in Computer Science and related areas, this book may also be of benefit to practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open source technologies currently on the market.

Advanced Data Management by Lena Wiese is available in PDF and/or ePUB format.

Information

Publisher
De Gruyter
Year
2015
ISBN
9783110433074
Edition
1

Part I: Introduction

1 Background

Database systems are fundamental for the information society. Every day, an inestimable amount of data is produced, collected, stored and processed: online shopping, sending emails, using social media, or seeing your physician are just some of the day-to-day activities that involve data management. A properly working database management system is hence crucial for the smooth operation of these activities. In this chapter, we introduce the principles and properties that a database system should fulfill. Database management systems and their components as well as data modeling are the other two basic concepts treated in this chapter.

1.1 Database Properties

As data storage plays such a crucial role in most applications, database systems should guarantee a correct and reliable execution in several use cases. From an abstract perspective, we desire that a database system fulfill the following properties:
Data management. A database system not only stores data; it must also support operations for retrieval of data, searches for data and updates on data. To enable interoperability with external applications, the database system must provide communication interfaces or application programming interfaces for several communication protocols or programming languages. A database system should also support transactions: a transaction is a sequence of operations on data in a database that must not be interrupted. In other words, the database executes operations within a transaction according to the “all or nothing” principle: either all operations succeed to their full extent or none of the operations is executed (and the subsequence of operations that was already executed is undone).
Scalability. The amount of data processed daily with modern information technology is tremendous. Processing these data can only be achieved by distribution of data in a network of database servers and a high level of parallelization. Database systems must flexibly react and adapt to a higher workload.
Heterogeneity. When collecting data or producing data (as output of some program), these data are usually not tailored to being stored in a relational table format. While the data in relational format are called structured and have a fixed schema which prescribes the structure of the data, data often come in different formats. Data that have a more flexible structure than the table format are called semi-structured; these can be tree-like structures (as used in XML documents) or, more generally, graph structures. Furthermore, data can be entirely unstructured (like arbitrary text documents).
Efficiency. The majority of database applications need fast database systems. Online shopping and web searches rely on high-performance search and retrieval operations. Likewise, other database operations like store and update must be executed in a speedy fashion to ensure operability of database applications.
Persistence. The main purpose of a database system is to provide a long-term storage facility for data. Some modern database applications (like data stream processing) just require a kind of selective persistence: only some designated output data have to be stored onto long-term storage devices, whereas the majority of the data is processed in volatile main memory and discarded afterwards.
Reliability. Database systems must prevent data loss. Data stored in the database system should not be distorted unintentionally: data integrity must be maintained by the database system. Storing copies of data on other servers or storage media (a mechanism called physical redundancy or replication) is crucial for data recovery after a failure of a database server.
Consistency. The database system must do its best to ensure that no incorrect or contradictory data persist in the system. This involves the automatic verification of consistency constraints (data dependencies like primary key or foreign key constraints) and the automatic update of distributed data copies (the replicas).
Non-redundancy. While physical redundancy is decisive for the reliability of a database system, duplication of values inside the stored data sets (that is, logical redundancy) should best be avoided. First of all, logical redundancy wastes space on the storage media. Moreover, data sets with logical redundancy are prone to different forms of anomalies that can lead to erroneous or inconsistent data. Normalization is one way to transform data sets into a non-redundant format.
Multi-User Support. Modern database systems must support concurrent accesses by multiple users or applications. Those independent accesses should run in isolation and not interfere with each other so that a user does not notice that other users are accessing the database system at the same time. Another major issue with multi-user support is the need for access control: data of one user should be protected from unwanted accesses by other users. A simple strategy for access control is to only allow users access to certain views on the data sets. A well-defined authentication mechanism is crucial to implement access control.
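The “all or nothing” principle described under Data management can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, not an example from the book: the `account` table, the `transfer` and `balances` helpers, and the sample values are all hypothetical. The sketch relies on the fact that a sqlite3 connection used as a context manager commits when the block succeeds and rolls back when an exception is raised.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts as a single transaction."""
    try:
        # The connection commits if the block succeeds and rolls back
        # if an exception is raised inside it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed on the debit, so the credit that
        # was already executed is undone as well.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

In this sketch a transfer that would overdraw the sender fails on its second statement, and the first statement (the credit to the receiver) is rolled back, so both balances stay as they were before the call.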
A database system should manage large amounts of heterogeneous data in an efficient, persistent, reliable, consistent, non-redundant way for multiple users.
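The automatic verification of consistency constraints mentioned under Consistency can be illustrated as follows. This is a hedged sketch using Python's sqlite3 module; the `customer` and `purchase` tables and the `try_insert` helper are hypothetical names chosen for the example. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id), "
    "item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.commit()


def try_insert(conn, sql, params):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised when a primary key or foreign key constraint is violated.
        return "rejected"


# A purchase for an existing customer satisfies the foreign key constraint.
ok = try_insert(conn, "INSERT INTO purchase VALUES (?, ?, ?)", (1, 1, "book"))
# Customer 99 does not exist, so the foreign key constraint is violated.
dangling = try_insert(conn, "INSERT INTO purchase VALUES (?, ?, ?)", (2, 99, "pen"))
# Customer id 1 is already taken, so the primary key constraint is violated.
duplicate = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (1, "Bob"))
```

The point of the sketch is that the checks run inside the database system itself, so an application cannot store a contradictory row by forgetting to validate it.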
Database systems often do not satisfy all of these requirements, or satisfy them only to a certain extent. When choosing a database system for a specific application, clarifying all mandatory requirements and weighing the pros and cons of the available systems is the first and foremost task.
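As a small illustration of the Non-redundancy property, the following Python sketch splits a flat data set with logical redundancy into two non-redundant ones. It is an informal decomposition under the assumption that each customer has exactly one city; the records and the `normalize` helper are hypothetical and not taken from the book.

```python
# Hypothetical order records in a single flat table: the customer's city is
# repeated in every row (logical redundancy).
flat_orders = [
    {"order_id": 1, "customer": "Ada", "city": "London", "item": "book"},
    {"order_id": 2, "customer": "Ada", "city": "London", "item": "pen"},
    {"order_id": 3, "customer": "Bob", "city": "Paris", "item": "lamp"},
]


def normalize(rows):
    """Split the flat rows into a customer table and an order table."""
    customers = {}
    orders = []
    for row in rows:
        # Store the city once per customer; later rows must agree with it.
        known_city = customers.setdefault(row["customer"], row["city"])
        if known_city != row["city"]:
            # The flat table already contains contradictory values.
            raise ValueError(f"conflicting cities for {row['customer']}")
        orders.append(
            {"order_id": row["order_id"], "customer": row["customer"], "item": row["item"]}
        )
    return customers, orders


customers, orders = normalize(flat_orders)
# After the split, a customer's city is stored exactly once, so changing it
# is a single update and cannot leave stale copies behind.
customers["Ada"] = "Cambridge"
```

In the flat form, changing Ada's city would require updating every one of her rows; missing one of them would produce exactly the kind of update anomaly that logical redundancy makes possible.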
Fig. 1.1. Database management system and interacting components

1.2 Database Components

The software component that is in charge of all database operations is the database management system (DBMS). Several other systems and components interact with the DBMS as shown in Figure 1.1. The DBMS relies on the operating system and the file system of the database server to store the data on disk. The DBMS also relies on the operating system to be able to use the network interfaces for communication with external applications or other database servers.
The low-level file system (or the operating system) has no knowledge of the internal structure or meaning of the stored data; it just handles the stored data as arbitrary records. Hence, the purpose of the database management system is to provide the users with a higher-level interface and more structured data storage and retrieval operations. The DBMS operates on data in the main memory; more precisely, it handles data in a particular portion of the main memory (called page buffer) that is reserved for the DBMS. The typical storage unit on disk is a “block” of data; often this data block is ca...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Dedication
  5. Preface
  6. Overview
  7. Table of Contents
  8. List of Figures
  9. List of Tables
  10. Part I: Introduction
  11. Part II: NOSQL And Non-Relational Databases
  12. Part III: Distributed Data Management
  13. Part IV: Conclusion
  14. Bibliography
  15. Index