PostgreSQL 10 High Performance

About This Book

Leverage the power of PostgreSQL 10 to design, administer and maintain a high-performance database solution

Key Features

  • Obtain optimal PostgreSQL 10 database performance, ranging from initial design to routine maintenance
  • Fine tune the performance of your queries and avoid the common pitfalls that can slow your system down
  • Get tips and tricks for scaling successful database installations and ensuring a highly available PostgreSQL solution

Book Description

PostgreSQL database servers have a common set of problems that they encounter as their usage gets heavier and requirements get more demanding. Peek into the future of your PostgreSQL 10 database's problems today. Know the warning signs to look for and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way: choose the right hardware, tune the operating system and server memory use, optimize queries against the database and CPUs with the right indexes, and monitor every layer, from hardware to queries, using tools from inside and outside PostgreSQL. Also, using monitoring insight, PostgreSQL database applications continuously rework the design and configuration. On reaching the limits of a single server, they break things up; connection pooling, caching, partitioning, replication, and parallel queries can all help handle increasing database workloads.

By the end of this book, you will have all the knowledge you need to design, run, and manage your PostgreSQL solution while ensuring high performance and high availability.

What you will learn

  • Learn best practices for scaling PostgreSQL 10 installations
  • Discover the best hardware for developing high-performance PostgreSQL applications
  • Benchmark your whole system – from hardware to application
  • Learn through real examples how server parameters impact performance
  • Discover PostgreSQL 10 features for partitioning and parallel query
  • Monitor your server, both inside and outside the database
  • Design and implement a good replication system on PostgreSQL 10

Who this book is for

This book is designed for database administrators and PostgreSQL architects who already use, or plan to exploit, the features of PostgreSQL 10 to design and maintain a high-performance PostgreSQL database. A working knowledge of SQL and some experience with PostgreSQL will be helpful in getting the most out of this book.


Information

Author: Enrico Pirozzi
Year: 2018
ISBN: 9781788472456

Query Optimization

For some database administrators and application developers, query optimization is the most important part of database performance tuning. Earlier chapters have wandered some distance from there, because if your fundamental server setup isn't good, no amount of query tuning will help you. But tasks such as selecting good hardware for a database server only come up rarely; figuring out why a query is slow and how to improve it is something you can expect to do all the time. This is particularly true because query plans drift over time, and some aspects of your database will become less efficient. Maintenance to address the latter problem is covered in the next few chapters. How queries execute, and how to improve that execution, is this chapter's topic.
The chapter will cover the following topics:
  • Explanation of the EXPLAIN command
  • Query plan node structure
  • Cost computation
  • Processing nodes
  • Scans
  • Subquery scan and subplan
  • CTE scan
  • Joins
  • Statistics

Sample data sets

To show you queries to optimize, you need data. So far, randomly generated data for a small number of tables has been good enough for that, but that approach has its limits, and they were reached in the previous chapter, Chapter 9, Database Indexing.
Obtaining sample data of a significant size for benchmarking purposes is a never-ending struggle, because the pace of hardware progress means yesteryear's massive test database can be today's trivial workload. A listing that's kept up to date with useful new sources for test databases is at http://wiki.postgresql.org/wiki/Sample_Databases, and some of these are part of a PostgreSQL sample database project, which has several examples that are worth checking out.

Pagila

The pagila database is a particularly interesting sample because it showcases many advanced PostgreSQL features. It relies heavily on stored procedures and even partitions some of its data. The main reason it's not used for the examples here is that it's very small, only a few megabytes in size. The non-free license on the documentation for the original database it's based on (Sakila, a part of MySQL) also makes it problematic to quote from here.

dellstore2

dellstore2 was originally distributed by Dell at http://linux.dell.com/dvdstore/ as part of an effort to create a vendor-neutral comparison test. It includes everything from the database to an e-commerce web application. Its sample store sells DVDs to customers and includes products, orders, inventory, and customer history.
A PostgreSQL port of the database part of the Dell Store application is available from the sample databases project at http://pgfoundry.org/projects/dbsamples/.
While the benchmark can theoretically be targeted at three sizes, only the small (10 MB) version is available for PostgreSQL so far; the data generators for the medium (1 GB) and large (100 GB) stores haven't been implemented for PostgreSQL yet. This makes dellstore2 a modest test database for the examples here, rather than a representative of a serious workload by modern standards.
Obtaining and installing dellstore2 can be very simple:
$ wget http://pgfoundry.org/frs/download.php/543/dellstore2-normal-1.0.tar.gz
$ tar xvfz dellstore2-normal-1.0.tar.gz
$ cd dellstore2-normal-1.0/
$ createdb dellstore2
$ psql -f dellstore2-normal-1.0.sql -d dellstore2
$ psql -d dellstore2 -c "VACUUM VERBOSE ANALYZE"
With indexes and related overhead, the result is actually 21 MB:
$ psql -d dellstore2 -c "SELECT pg_size_pretty(pg_database_size('dellstore2'))"
 pg_size_pretty
----------------
 21 MB
This means that even the tiny default shared_buffers on most systems can hold the whole set of data, but the queries aren't completely trivial. Here are the major tables and indexes:
             table              |  size
--------------------------------+---------
 public.customers               | 3808 kB
 public.orderlines              | 2840 kB
 public.cust_hist               | 2368 kB
 public.products                | 808 kB
 public.orders                  | 800 kB
 public.inventory               | 400 kB

             index              |  size
--------------------------------+---------
 public.ix_orderlines_orderid   | 1336 kB
 public.ix_cust_hist_customerid | 1080 kB
 public.ix_cust_username        | 544 kB
 public.customers_pkey          | 368 kB
 public.ix_order_custid         | 232 kB
 public.orders_pkey             | 232 kB
 public.ix_prod_special         | 192 kB
 public.ix_prod_category        | 192 kB
 public.products_pkey           | 192 kB
 public.inventory_pkey          | 192 kB
The examples in this chapter use the dellstore2 database installed as shown previously, unless otherwise noted.
The structure of this data is easy to understand if you've ever shopped online (a sample query tying the tables together appears after the list):
  • There are a number of products the store sells, each of which fits into a category
  • The store has customers
  • Customers place orders
  • Each order has a number of lines to it, each of which references the product being purchased
  • A customer history is saved listing all the products that customer has ever ordered
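To make those relationships concrete, here is a small query sketch that joins the main tables to show the five customers who have bought the most items. The column names used (customerid, orderid, prod_id, quantity) are assumed from the dellstore2 schema as distributed; check them with \d in psql before relying on this exact form:
SELECT c.customerid,
       count(DISTINCT o.orderid) AS orders,
       sum(ol.quantity)          AS items_bought
FROM customers c
  JOIN orders o      ON o.customerid = c.customerid  -- customers place orders
  JOIN orderlines ol ON ol.orderid   = o.orderid     -- each order has lines
  JOIN products p    ON p.prod_id    = ol.prod_id    -- each line references a product
GROUP BY c.customerid
ORDER BY items_bought DESC
LIMIT 5;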

EXPLAIN basics

If you have a query that's running slowly, the first thing to try is running it with EXPLAIN placed before the statement. This displays what's called a query plan: the list of what's expected to happen when that query is executed. If you instead use EXPLAIN ANALYZE before the statement, you'll get both the estimates describing what the planner expected and what actually happened when the query ran. Note that this form actually executes the statement, just as if you had run it manually. Consider the following statement:
EXPLAIN ANALYZE DELETE FROM t;
This is not only going to show you a query plan for deleting those rows, it's going to delete them; that's the only way to know for sure how long actually executing the plan is going to take. It's much harder to get a consistent setup for comparing the timing of operations that INSERT, UPDATE, or DELETE using EXPLAIN ANALYZE, because the act of collecting the data changes what a subsequent execution of the same statement will do.
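One common workaround, a general PostgreSQL pattern rather than anything specific to this example, is to wrap the statement in a transaction that you then roll back. EXPLAIN ANALYZE still executes the DELETE and reports real timings, but the rows reappear when the transaction aborts:
BEGIN;
EXPLAIN ANALYZE DELETE FROM t;  -- runs the delete and shows actual times
ROLLBACK;                       -- undoes the delete so the test can be repeated
Even then, repeated runs aren't perfectly identical, because executing the statement can leave side effects behind (hint-bit updates, for example) that change the timing of later runs.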

Timing overhead

Assume you're executing a simple query to count all the customers in the database, and want to time how long this takes:
dellstore2=# \timing
Timing is on.
dellstore2=# SELECT count(*) FROM customers;
 count
-------
 20000
Time: 7.994 ms
You may then be curious to know which query plan was used to get this result. The amount of time taken to determine that may shock you:
dellstore2=# EXPLAIN ANALYZE SELECT count(*) FROM customers;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=726.00..726.01 rows=1 width=0) (actual time=68.836..68.838 rows=1 loops=1)
   ->  Seq Scan on customers  (cost=0.00..676.00 rows=20000 width=0) (actual time=0.012..33.609 rows=20000 loops=1)
 Total runtime: 68.934 ms
Time: 69.837 ms
This fairly trivial query was picked because it demonstrates something close to a worst-case scenario, where instrumenting the query causes it to slow dramatically, to almost 10 times as long. Using EXPLAIN ANALYZE is great for getting real times, but you shouldn't assume the exact proportions or timings will be the same when running the query normally.
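Much of that overhead comes from reading the system clock each time a row passes through a plan node. If that instrumentation cost is itself the problem, EXPLAIN has a standard option (available since PostgreSQL 9.2, so nothing specific to this book) that turns off the per-node timers while still executing the query and counting rows:
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM customers;
Only the total execution time is measured in that form, so the clock is read far less often, while the actual row counts remain available to compare against the planner's estimates.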

Hot and cold cache behavior

Returning to the regular version of the query seen previously, it executed in 7.994 milliseconds. This represents hot cache behavior, meaning that the data needed for the query was already in either the database or OS caches. It was left behind in the cache from when the data was loaded in the first place. Whether your cache is hot, or cold (not in the cache), is another thing to be very careful of. If you run a query twice with two different approaches, the second will likely be much faster simply because of caching, regardless of whether the plan was better or worse.
You can look at how long a query against the entire table takes as a way to measure the effective transfer rate for that table. In the hot cache case, it gives you an idea of how quickly data moves between two sections of memory:
SELECT pg_size_pretty(CAST(pg_relation_size('customers') / 7.994 * 1000 as int8)) AS bytes_per_second;
 bytes_per_second
------------------
 465 MB
As this was run on a simple laptop, getting 465 MB/s of rows processed by a query is respectable. Repeatedly running the query takes around the same amount of time each run, which means that the amount cached is staying constant and not impacting the results; in this case, the table is 100% cached.
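As a quick sanity check on that figure: the earlier listing showed public.customers at 3,808 kB, which is about 3.9 million bytes. Dividing by the 7.994 ms runtime and scaling up to a full second gives roughly 488 million bytes per second, which pg_size_pretty (using 1,024-based units) reports as the 465 MB shown.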

Clearing the cache

The exact way to clear all these caches and to get cold cache performance again varies based on the OS. Just stopping the database server isn't enough, because the OS cache can be expected to still have plenty of information cached. On Linux, you can use the drop_caches feature to discard everything it has in its page cache. Here's a complete example of cleaning the data out of memory for this database on Linux:
$ pg_ctl stop
$ sudo su -
# sync
# echo 3 > /proc/sys/vm/drop_caches
# logout
$ pg_ctl start -l $PGLOG
The sync here is to try to flush all data to the disk before we just blow away the caches. This drop_caches feature on Linux is not intended for regular production server use; it is more of a debugging feature that's potentially dangerous.
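If you'd rather not switch to a root shell, the same cache drop can usually be done with a one-liner; the value written controls what is discarded (1 = page cache, 2 = dentries and inodes, 3 = both), as documented by the Linux kernel:
$ sync
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
Either way, remember this only clears the OS cache; stopping the PostgreSQL server is still what empties shared_buffers.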
Rerunning the same benchmark now shows quite different performance:
$ psql -d dellstore2
dellstore2=# \timing
Timing is on.
dellstore2=# SELECT count(*) FROM customers;
 count
-------
 20000
Time: 204.738 ms
dellstore2=# SELECT pg_size_pretty(CAST(pg_relation_size('customers') / 204.738 * 1000 as int8)) AS bytes_per_second;
 bytes_per_second
------------------
 18 MB
Now you're seeing hard drive sequential read speed: 18 MB/s from a laptop hard drive, and this data isn't necessarily even contiguous on disk. It's hard to achieve full drive speed on something this small, though.
Repeating the query now returns to the original speed we know to expect from a hot cache run:
dellstore2=# SELECT count(*) FROM customers;
Time: 8.067 ms
Tests against real data sets need to be very careful to recognize whether their data is already in the cache or not. The usual technique is to run each query three times. If the first is much slower than the...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Dedication
  4. Packt Upsell
  5. Contributors
  6. Preface
  7. PostgreSQL Versions
  8. Database Hardware
  9. Database Hardware Benchmarking
  10. Disk Setup
  11. Memory for Database Caching
  12. Server Configuration Tuning
  13. Routine Maintenance
  14. Database Benchmarking
  15. Database Indexing
  16. Query Optimization
  17. Database Activity and Statistics
  18. Monitoring and Trending
  19. Pooling and Caching
  20. Scaling with Replication
  21. Partitioning Data
  22. Avoiding Common Problems
  23. Other Books You May Enjoy