Mastering PostgreSQL 13

Build, administer, and maintain database applications efficiently with PostgreSQL 13, 4th Edition

Hans-Jürgen Schönig

About This Book

Explore expert techniques such as advanced indexing and high availability to build scalable, reliable, and fault-tolerant database applications using PostgreSQL 13

Key Features

  • Master advanced PostgreSQL 13 concepts with the help of real-world datasets and examples
  • Leverage PostgreSQL's indexing features to fine-tune the performance of your queries
  • Extend PostgreSQL's functionalities to suit your organization's needs with minimal effort

Book Description

Thanks to its reliability, robustness, and high performance, PostgreSQL has become one of the most advanced open source databases on the market. This updated fourth edition will help you understand PostgreSQL administration and how to build dynamic database solutions for enterprise apps with the latest release of PostgreSQL, including designing both physical and technical aspects of the system architecture with ease.

Starting with an introduction to the new features in PostgreSQL 13, this book will guide you in building efficient and fault-tolerant PostgreSQL apps. You'll explore advanced PostgreSQL features, such as logical replication, database clusters, performance tuning, advanced indexing, monitoring, and user management, to manage and maintain your database. You'll then work with the PostgreSQL optimizer, configure PostgreSQL for high speed, and move from Oracle to PostgreSQL. The book also covers transactions, locking, and indexes, and shows you how to improve performance with query optimization. You'll also focus on how to manage network security and work with backups and replication while exploring useful PostgreSQL extensions that optimize the performance of large databases.

By the end of this PostgreSQL book, you'll be able to get the most out of your database by executing advanced administrative tasks.

What you will learn

  • Get well versed with advanced SQL functions in PostgreSQL 13
  • Get to grips with administrative tasks such as log file management and monitoring
  • Work with stored procedures and manage backup and recovery
  • Employ replication and failover techniques to reduce data loss
  • Perform database migration from Oracle to PostgreSQL with ease
  • Replicate PostgreSQL database systems to create backups and scale your database
  • Manage and improve server security to protect your data
  • Troubleshoot your PostgreSQL instance to find solutions to common and not-so-common problems

Who this book is for

This database administration book is for PostgreSQL developers, database administrators, and professionals who want to implement advanced functionalities and master complex administrative tasks with PostgreSQL 13. Prior experience with PostgreSQL and familiarity with the basics of database administration will help you understand the key concepts covered in the book.

Information

  • Year: 2020
  • ISBN: 9781800565425
  • Edition: 4

Optimizing Queries for Good Performance
In Chapter 5, Log Files and System Statistics, you learned how to read system statistics and how to make use of what PostgreSQL provides. Armed with this knowledge, this chapter is all about good query performance. Since everybody is looking for good query performance, it is important to address this topic in depth.
In this chapter, you will learn about the following topics:
  • Learning what the optimizer does
  • Understanding execution plans
  • Understanding and fixing joins
  • Enabling and disabling optimizer settings
  • Partitioning data
  • Adjusting parameters for good query performance
  • Making use of parallel queries
  • Introducing Just-in-Time (JIT) compilation
By the end of this chapter, you will be able to write better and faster queries. If a query still isn't fast, you will be able to understand why. You will also be able to use the techniques covered here to partition data.

Learning what the optimizer does

Before even attempting to think about query performance, it makes sense to familiarize yourself with what the query optimizer does. A deeper understanding of what is going on under the hood helps you see what the database is really up to.

A practical example – How the query optimizer handles a sample query

To demonstrate how the optimizer works, I have compiled an example that I have used over the years for PostgreSQL training. Let's assume that there are three tables, as follows:
CREATE TABLE a (aid int, ...);   -- 100 million rows
CREATE TABLE b (bid int, ...);   -- 200 million rows
CREATE TABLE c (cid int, ...);   -- 300 million rows
Let's further assume that those tables contain millions, or maybe hundreds of millions, of rows. In addition to that, there are indexes:
CREATE INDEX idx_a ON a (aid);
CREATE INDEX idx_b ON b (bid);
CREATE INDEX idx_c ON c (cid);

Finally, there is a view joining the first two tables together:

CREATE VIEW v AS SELECT *
FROM a, b
WHERE aid = bid;
Let's suppose that the end user wants to run the following query. What will the optimizer do with this query? What choices does the planner have?
SELECT *
FROM v, c
WHERE v.aid = c.cid
AND cid = 4;
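If you want to follow along, EXPLAIN shows the plan the planner finally settles on for this query. This is only a quick sketch on top of the example; it assumes the tables, indexes, and view shown above have been created, loaded with data, and analyzed:
EXPLAIN
SELECT *
FROM v, c
WHERE v.aid = c.cid
AND cid = 4;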
Before looking at the actual optimization process, we will focus on some of the options that the planner has.

Evaluating join options

The planner has a couple of options here, so let's take this opportunity to understand what can go wrong if straightforward approaches are used.
Suppose the planner just steams ahead and calculates the output of the view. What is the best way to join 100 million rows with 200 million rows?
In this section, a couple of (not all) join options will be discussed to show you what PostgreSQL is able to do.

Nested loops

One way to join two tables is to use a nested loop. The principle here is simple. Here is some pseudocode:
for x in table1:
    for y in table2:
        if x.field == y.field:
            issue row
        else:
            keep doing
Nested loops are often used if one of the sides is very small and contains only a limited set of data. In our example, a nested loop would lead to 100 million x 200 million iterations through the code. This is clearly not an option because the runtime would simply explode.
A nested loop is generally O(n²), so it is only efficient if one side of the join is very small. In this example, this is not the case, so a nested loop can be ruled out for calculating the view.
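If you are curious whether the planner would even consider a nested loop for this join, you can temporarily discourage it and compare the estimated costs. This is only an experiment sketch using PostgreSQL's standard enable_nestloop setting; don't leave such settings turned off in production:
SET enable_nestloop = off;   -- the planner now avoids nested loops whenever an alternative exists
EXPLAIN SELECT * FROM a, b WHERE aid = bid;
RESET enable_nestloop;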

Hash joins

The second option is a hash join, which could be applied to solve our little problem. The following listing shows how a hash join works:
Hash join
  Sequential scan table 1
  Sequential scan table 2
Both sides can be hashed and the hash keys could be compared, leaving us with the result of the join. The problem here is that all of the values have to be hashed and stored somewhere.
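How attractive a hash join is depends, among other things, on how much memory the hash is allowed to use. The following sketch simply raises work_mem for the current session before inspecting the plan; 256MB is an arbitrary example value, not a recommendation:
SET work_mem = '256MB';   -- more memory makes hashing large inputs cheaper (less spilling to disk)
EXPLAIN SELECT * FROM a, b WHERE aid = bid;   -- check whether a Hash Join node shows up
RESET work_mem;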

Merge joins

Finally, there is the merge join. The idea here is to use sorted lists to join the results. If both sides of the join are sorted, the system can take the rows from the top, check whether they match, and return them. The main requirement here is that the lists are sorted. Here is a sample plan:
Merge join
  Sort table 1
    Sequential scan table 1
  Sort table 2
    Sequential scan table 2
To join these two tables (table 1 and table 2), data has to be provided in a sorted order. In many cases, PostgreSQL will just sort the data. However, there are other options we can use to provide the join with sorted data. One way is to consult an index, as shown in the following example:
Merge join
  Index scan table 1
  Index scan table 2
One side of the join, or both sides, can use sorted data coming from lower levels of the plan. If the table is accessed directly, an index is the obvious choice for this, but only if the returned result set is significantly smaller than the entire table. Otherwise, we encounter almost double the overhead because we have to read the entire index and then the entire table. If the result set is a large portion of the table, a sequential scan is more efficient, especially if it is being accessed in the primary key order.
The beauty of a merge join is that it can handle a lot of data. The downside is that data has to be sorted or taken from an index at some point.
Sorting is O(n * log(n)). Therefore, sorting 300 million rows to perform the join is not attractive either.
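To see what a merge join plan would look like for our two tables, you can rule out the other strategies for a moment. Again, this is only an experiment with the standard enable_* planner settings, not something to use in production:
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN SELECT * FROM a, b WHERE aid = bid;   -- a Merge Join, fed by sorts or index scans, should remain
RESET enable_hashjoin;
RESET enable_nestloop;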
Note that, since the introduction of PostgreSQL 10.0, all the join options described here are also available in a parallel version. Therefore, the optimizer will not just consider those standard join options, but also evaluate whether it makes sense to perform parallel queries.
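The number of worker processes the planner may attach to a single Gather node is capped by max_parallel_workers_per_gather. The value below is only an example to make parallel plans more likely while experimenting (the default is 2):
SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT * FROM a, b WHERE aid = bid;   -- a large join may now appear as a parallel join underneath a Gather node
RESET max_parallel_workers_per_gather;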

Applying transformations

Obviously, doing the straightforward thing (joining the view first) makes no sense at all: a nested loop would send the execution time through the roof, a hash join has to hash millions of rows, and a merge join has to sort 300 million rows. All three options are clearly unsuitable here. The way out is to apply logical transformations to make the query fast. In this section, you will learn what the planner does to speed up the query. A couple of steps will need to be performed:
  1. Inlining the view: The first transformation that the optimizer does is i...
