How we maintain health of PostgreSQL database
We have couple of large and heavily updated database tables in PostgreSQL which often suffer from two issues:
- table and index bloat, which means they occupy way more disk space and memory than actually required;
- corrupted indexes, which means query planner can’t generate efficient query execution plans for them and as a result DB perfrmance degrades over time.
To help developers and database administrators deals with these issues, PostgreSQL has a few commands:
Let’s take a quick look at each of these maintenance commands to understand what they do and when to use them.
ANALYZE command collects statistics about specific table columns, entire table, or entire database. The PostgresQL query planner then uses that data to generate efficient execution plans for queries.
ANALYZE users;collects statistics for
ANALYZE VERBOSE users;does exactly the same plus prints progress messages.
ANALYZE users (id, name);collects statistics for
ANALYZE;collects statistics for all table in the current database.
Note that, in PostgreSQL
ANALYZE command doesn’t read or update indexes. It deals only with table/column contents. It also doesn’t block the table other queries still may read from the table while
ANALYZE is running.
You may want to run
ANALYZE in following situations:
- when the contents of a table has changed significantly. For example, when a few percents of records in a table have been added, updated or deleted.
- just before or after adding an index to a table. That may help query planner to generate optimal query plans that will efficient with the new index.
VACUUM comman reclaims storage occupied by dead records in database tables.
VACUUM removes deleted records and previous versions of updated records. The reason those records have to be removed is that they are only marked as unavailable but still occupy disk space, making table scans slower.
VACUUM also has the
ANALYZE option that tells it to analyze the table during vacuuming.
There are two different
VACUUMreorganize live records in a table and makes and allocated storage space avaiable for new records. However, that storage space doesn’t become available to the operating system. This command can be run in parallel with other read and write operations on the table.
VACUUM FULLexclusively locks the table, creates a new table file, copies only live records to this file, then deletes the old table file. The reclaimed storage space is returned to the operating system in this case. This command doesn’t allow any other read and write operations on the vacuumed table.
VACUUM users;vacuums users table.
VACUUM VERBOSE users;vacuums users table and prints progress messages.
VACUUM FULL users;fully vacuums users table. Other queries cannot access users table while vacuuming is running.
VACUUM ANALYZE users;vacuums and analyses users table.
VACUUM FULL VERBOSE ANALYZE users;fully vacuums users table and displays progress messages.
VACUUM;vacuums all the tables in the database the current user has access to.
Even though PostgreSQL can autovacuum tables after a certain percentage of rows gets marked as deleted, some developers and DB admins perfer to run
VACUUM ANALYZE on tables with a lot of read/write operations on a custom schedule. For example, daily, at the least busy time of the day.
You may want to run
VACUUM when the contents of a table has changed significantly: a significant percentage of records in a table has been added, updated or deleted.
REINDEX command rebuilds an existing index or all indexes for a table or entire database. Use this command to fix corrupted and unusable indexes, or when and index get bloated after significant change in the table contents.
REINDEX INDEX idx_users_on_name;reindexes index
REINDEX TABLE users;reindexes
REINDEX VERBOSE TABLE users;reindexes
userstable and prints progress messages.
REINDEX DATABASE db1;reindexes all tables in
In my experience, running
VACUUM ANALYZE and then
REINDEX commands on the largest by the number or rows and/or size tables in the database helps to improve the performance of SQL queries.
After a little playing around I created this SQL statement to generate
REINDEX TABLE CONCURRENTLY statements for all tables, in ascending order of size. Which helps me to validate the result after successful
SELECT 'REINDEX TABLE CONCURRENTLY ' || quote_ident(relname) || ' /*' ||
pg_size_pretty(pg_total_relation_size(C.oid)) || '*/;'
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = 'public'
AND C.relkind = 'r'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;
After performing above things results were pretty much good. Our database srank from 433GB disk usage to 377GB. Also some of our queries becomes faster.