PostgreSQL 10 High Performance

About This Book

Leverage the power of PostgreSQL 10 to design, administer and maintain a high-performance database solution

Key Features

  • Obtain optimal PostgreSQL 10 database performance, ranging from initial design to routine maintenance
  • Fine tune the performance of your queries and avoid the common pitfalls that can slow your system down
  • Get tips and tricks for scaling successful database installations and ensuring a highly available PostgreSQL solution

Book Description

PostgreSQL database servers have a common set of problems that they encounter as their usage gets heavier and requirements get more demanding. Peek into the future of your PostgreSQL 10 database's problems today. Know the warning signs to look for and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way: choose the right hardware, tune the operating system and server memory use, optimize queries against the database and CPUs with the right indexes, and monitor every layer, from hardware to queries, using tools from inside and outside PostgreSQL. Also, using monitoring insight, PostgreSQL database applications continuously rework the design and configuration. On reaching the limits of a single server, they break things up; connection pooling, caching, partitioning, replication, and parallel queries can all help handle increasing database workloads.

By the end of this book, you will have all the knowledge you need to design, run, and manage your PostgreSQL solution while ensuring high performance and high availability.

What you will learn

  • Learn best practices for scaling PostgreSQL 10 installations
  • Discover the best hardware for developing high-performance PostgreSQL applications
  • Benchmark your whole system – from hardware to application
  • Learn through real examples how server parameters impact performance
  • Discover PostgreSQL 10 features for partitioning and parallel query
  • Monitor your server, both inside and outside the database
  • Design and implement a good replication system on PostgreSQL 10

Who this book is for

This book is designed for database administrators and PostgreSQL architects who already use, or plan to exploit, the features of PostgreSQL 10 to design and maintain a high-performance PostgreSQL database. A working knowledge of SQL and some experience with PostgreSQL will be helpful in getting the most out of this book.


Information

Author: Enrico Pirozzi
Year: 2018
ISBN: 9781788472456

Query Optimization

For some database administrators and application developers, query optimization is the most important part of database performance tuning. Earlier chapters have wandered some distance from there, because if your fundamental server setup isn't good, no amount of query tuning will help you. But tasks such as selecting good hardware for a database server only come up rarely; figuring out why a query is slow and how to improve it is something you can expect to do all the time. This is particularly true because query plans drift over time, and some aspects of your database will become less efficient. Maintenance to address the latter problem is covered in the next few chapters. How queries execute, and how to improve that execution, is this chapter's topic.
The chapter will cover the following topics:
  • Explanation of the EXPLAIN command
  • Query plan node structure
  • Cost computation
  • Processing nodes
  • Scans
  • Subquery scan and subplan
  • CTE scan
  • Joins
  • Statistics

Sample data sets

To show you queries to optimize, you need data. So far, randomly generated data for a small number of tables has been good enough for that, but that approach has its limits, and they were reached in the previous chapter, Chapter 9, Database Indexing.
Obtaining sample data of a significant size for benchmarking purposes is a never-ending struggle, because the pace of hardware progress means yesteryear's massive test database can be today's trivial workload. A listing that's kept up to date with useful new sources for test databases is at http://wiki.postgresql.org/wiki/Sample_Databases, and some of these are part of a PostgreSQL sample database project, which has several examples that are worth checking out.

Pagila

The pagila database is a particularly interesting sample because it showcases many advanced PostgreSQL features. It relies heavily on stored procedures and even partitions some of its data. The main reason it's not used for the examples here is that it's very small, only a few megabytes in size. The non-free license on the documentation for the original database it's based on (Sakila, a part of MySQL) also makes it problematic to quote from here.

dellstore2

dellstore2 was originally distributed by Dell at http://linux.dell.com/dvdstore/ as part of an effort to create a vendor-neutral comparison test. It includes everything from the database to an e-commerce web application. Its sample store sells DVDs to customers and includes products, orders, inventory, and customer history.
A PostgreSQL port of the database part of the Dell Store application is available from the sample databases project at http://pgfoundry.org/projects/dbsamples/.
While the benchmark can theoretically be targeted at three sizes, only the small (10 MB) version is available for PostgreSQL so far; the data generators for the medium (1 GB) and large (100 GB) stores haven't been implemented for PostgreSQL yet. This makes dellstore2 a modest test database for the examples here, rather than a representative of a serious workload by modern standards.
Obtaining and installing dellstore2 can be very simple:
$ wget http://pgfoundry.org/frs/download.php/543/dellstore2-normal-1.0.tar.gz
$ tar xvfz dellstore2-normal-1.0.tar.gz
$ cd dellstore2-normal-1.0/
$ createdb dellstore2
$ psql -f dellstore2-normal-1.0.sql -d dellstore2
$ psql -d dellstore2 -c "VACUUM VERBOSE ANALYZE"
With indexes and related overhead, the result is actually 21 MB:
$ psql -d dellstore2 -c "SELECT pg_size_pretty(pg_database_size('dellstore2'))"
 pg_size_pretty
----------------
 21 MB
This means that even the tiny default shared_buffers on most systems can hold the whole set of data, but the queries aren't completely trivial. Here are the major tables and indexes:
             table              |  size
--------------------------------+---------
 public.customers               | 3808 kB
 public.orderlines              | 2840 kB
 public.cust_hist               | 2368 kB
 public.products                | 808 kB
 public.orders                  | 800 kB
 public.inventory               | 400 kB

             index              |  size
--------------------------------+---------
 public.ix_orderlines_orderid   | 1336 kB
 public.ix_cust_hist_customerid | 1080 kB
 public.ix_cust_username        | 544 kB
 public.customers_pkey          | 368 kB
 public.ix_order_custid         | 232 kB
 public.orders_pkey             | 232 kB
 public.ix_prod_special         | 192 kB
 public.ix_prod_category        | 192 kB
 public.products_pkey           | 192 kB
 public.inventory_pkey          | 192 kB
The examples in this chapter use the dellstore2 database installed as shown previously, unless otherwise noted.
The structure of this data is easy to understand if you've ever shopped online (a sample query tying the tables together appears after the list):
  • There are a number of products the store sells, each of which fits into a category
  • The store has customers
  • Customers place orders
  • Each order has a number of lines to it, each of which references the product being purchased
  • A customer history is saved listing all the products that customer has ever ordered
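To make those relationships concrete, here is a small query sketch that joins the main tables to show the five customers who have bought the most items. The column names used (customerid, orderid, prod_id, quantity) are assumed from the dellstore2 schema as distributed; check them with \d in psql before relying on this exact form:
SELECT c.customerid,
       count(DISTINCT o.orderid) AS orders,
       sum(ol.quantity)          AS items_bought
FROM customers c
  JOIN orders o      ON o.customerid = c.customerid  -- customers place orders
  JOIN orderlines ol ON ol.orderid   = o.orderid     -- each order has lines
  JOIN products p    ON p.prod_id    = ol.prod_id    -- each line references a product
GROUP BY c.customerid
ORDER BY items_bought DESC
LIMIT 5;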

EXPLAIN basics

If you have a query that's running slowly, the first thing to try is running it with EXPLAIN placed before the statement. This displays what's called a query plan: the list of what's expected to happen when that query is executed. If you instead use EXPLAIN ANALYZE before the statement, you'll get both the estimates describing what the planner expected and what actually happened when the query ran. Note that this form actually executes the statement, just as if you had run it manually. Consider the following statement:
EXPLAIN ANALYZE DELETE FROM t;
This is not only going to show you a query plan for deleting those rows, it's going to delete them; that's the only way to know for sure how long actually executing the plan is going to take. It's much harder to get a consistent setup for comparing the timing of operations that INSERT, UPDATE, or DELETE using EXPLAIN ANALYZE, because the act of collecting the data changes what a subsequent execution of the same statement will do.
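One common workaround, a general PostgreSQL pattern rather than anything specific to this example, is to wrap the statement in a transaction that you then roll back. EXPLAIN ANALYZE still executes the DELETE and reports real timings, but the rows reappear when the transaction aborts:
BEGIN;
EXPLAIN ANALYZE DELETE FROM t;  -- runs the delete and shows actual times
ROLLBACK;                       -- undoes the delete so the test can be repeated
Even then, repeated runs aren't perfectly identical, because executing the statement can leave side effects behind (hint-bit updates, for example) that change the timing of later runs.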

Timing overhead

Assume you're executing a simple query to count all the customers in the database, and want to time how long this takes:
dellstore2=# \timing
Timing is on.
dellstore2=# SELECT count(*) FROM customers;
 count
-------
 20000
Time: 7.994 ms
You may then be curious to know which query plan was used to get this result. The amount of time taken to determine that may shock you:
dellstore2=# EXPLAIN ANALYZE SELECT count(*) FROM customers;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=726.00..726.01 rows=1 width=0) (actual time=68.836..68.838 rows=1 loops=1)
   ->  Seq Scan on customers  (cost=0.00..676.00 rows=20000 width=0) (actual time=0.012..33.609 rows=20000 loops=1)
 Total runtime: 68.934 ms
Time: 69.837 ms
This fairly trivial query was picked because it demonstrates something close to a worst-case scenario, where instrumenting the query causes it to slow dramatically, to almost 10 times as long. Using EXPLAIN ANALYZE is great for getting real times, but you shouldn't assume the exact proportions or timings will be the same when running the query normally.
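Much of that overhead comes from reading the system clock each time a row passes through a plan node. If that instrumentation cost is itself the problem, EXPLAIN has a standard option (available since PostgreSQL 9.2, so nothing specific to this book) that turns off the per-node timers while still executing the query and counting rows:
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM customers;
Only the total execution time is measured in that form, so the clock is read far less often, while the actual row counts remain available to compare against the planner's estimates.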

Hot and cold cache behavior

Returning to the regular version of the query seen previously, it executed in 7.994 milliseconds. This represents hot cache behavior, meaning that the data needed for the query was already in either the database or OS caches. It was left behind in the cache from when the data was loaded in the first place. Whether your cache is hot, or cold (not in the cache), is another thing to be very careful of. If you run a query twice with two different approaches, the second will likely be much faster simply because of caching, regardless of whether the plan was better or worse.
You can look at how long a query against the entire table takes as a way to measure the effective transfer rate for that table. In the hot cache case, it gives you an idea of how quickly data moves between two sections of memory:
SELECT pg_size_pretty(CAST(pg_relation_size('customers') / 7.994 * 1000 as int8)) AS bytes_per_second;
 bytes_per_second
------------------
 465 MB
As this was run on a simple laptop, getting 465 MB/s of rows processed by a query is respectable. Repeatedly running the query takes around the same amount of time each run, which means that the amount cached is staying constant and not impacting the results; in this case, the table is 100% cached.
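As a quick sanity check on that figure: the earlier listing showed public.customers at 3,808 kB, which is about 3.9 million bytes. Dividing by the 7.994 ms runtime and scaling up to a full second gives roughly 488 million bytes per second, which pg_size_pretty (using 1,024-based units) reports as the 465 MB shown.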

Clearing the cache

The exact way to clear all these caches and to get cold cache performance again varies based on the OS. Just stopping the database server isn't enough, because the OS cache can be expected to still have plenty of information cached. On Linux, you can use the drop_caches feature to discard everything it has in its page cache. Here's a complete example of cleaning the data out of memory for this database on Linux:
$ pg_ctl stop
$ sudo su -
# sync
# echo 3 > /proc/sys/vm/drop_caches
# logout
$ pg_ctl start -l $PGLOG
The sync here is to try to flush all data to the disk before we just blow away the caches. This drop_caches feature on Linux is not intended for regular production server use; it is more of a debugging feature that's potentially dangerous.
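If you'd rather not switch to a root shell, the same cache drop can usually be done with a one-liner; the value written controls what is discarded (1 = page cache, 2 = dentries and inodes, 3 = both), as documented by the Linux kernel:
$ sync
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
Either way, remember this only clears the OS cache; stopping the PostgreSQL server is still what empties shared_buffers.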
Rerunning the same benchmark now shows quite different performance:
$ psql -d dellstore2
dellstore2=# \timing
Timing is on.
dellstore2=# SELECT count(*) FROM customers;
 count
-------
 20000
Time: 204.738 ms
dellstore2=# SELECT pg_size_pretty(CAST(pg_relation_size('customers') / 204.738 * 1000 as int8)) AS bytes_per_second;
 bytes_per_second
------------------
 18 MB
Now you're seeing hard drive sequential read speed: 18 MB/s from a laptop hard drive, and this data isn't necessarily even contiguous on disk. It's hard to achieve full drive speed on something this small, though.
Repeating the query now returns to the original speed we know to expect from a hot cache run:
dellstore2=# SELECT count(*) FROM customers;
Time: 8.067 ms
Tests against real data sets need to be very careful to recognize whether their data is already in the cache or not. The usual technique is to run each query three times. If the first is much slower than the...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Dedication
  4. Packt Upsell
  5. Contributors
  6. Preface
  7. PostgreSQL Versions
  8. Database Hardware
  9. Database Hardware Benchmarking
  10. Disk Setup
  11. Memory for Database Caching
  12. Server Configuration Tuning
  13. Routine Maintenance
  14. Database Benchmarking
  15. Database Indexing
  16. Query Optimization
  17. Database Activity and Statistics
  18. Monitoring and Trending
  19. Pooling and Caching
  20. Scaling with Replication
  21. Partitioning Data
  22. Avoiding Common Problems
  23. Other Books You May Enjoy