How we maintain health of PostgreSQL database
We have couple of large and heavily updated database tables in PostgreSQL which often suffer from two issues:
- table and index bloat, which means they occupy way more disk space and memory than actually required;
- corrupted indexes, which means query planner can’t generate efficient query execution plans for them and as a result DB perfrmance degrades over time.
To help developers and database administrators deals with these issues, PostgreSQL has a few commands: ANALYZE
, VACUUM
, REINDEX
.
Let’s take a quick look at each of these maintenance commands to understand what they do and when to use them.
ANALYZE
PostgreSQL ANALYZE
command collects statistics about specific table columns, entire table, or entire database. The PostgresQL query planner then uses that data to generate efficient execution plans for queries.
Examples:
ANALYZE users;
collects statistics forusers
table.ANALYZE VERBOSE users;
does exactly the same plus prints progress messages.ANALYZE users (id, name);
collects statistics forid
andname
columns ofusers
table.ANALYZE;
collects statistics for all table in the current database.
Note that, in PostgreSQL ANALYZE
command doesn’t read or update indexes. It deals only with table/column contents. It also doesn’t block the table other queries still may read from the table while ANALYZE
is running.
You may want to run ANALYZE
in following situations:
- when the contents of a table has changed significantly. For example, when a few percents of records in a table have been added, updated or deleted.
- just before or after adding an index to a table. That may help query planner to generate optimal query plans that will efficient with the new index.
VACUUM
PostgreSQL VACUUM
comman reclaims storage occupied by dead records in database tables. VACUUM
removes deleted records and previous versions of updated records. The reason those records have to be removed is that they are only marked as unavailable but still occupy disk space, making table scans slower.
VACUUM
also has the ANALYZE
option that tells it to analyze the table during vacuuming.
There are two different VACUUM
versions.
VACUUM
reorganize live records in a table and makes and allocated storage space avaiable for new records. However, that storage space doesn’t become available to the operating system. This command can be run in parallel with other read and write operations on the table.VACUUM FULL
exclusively locks the table, creates a new table file, copies only live records to this file, then deletes the old table file. The reclaimed storage space is returned to the operating system in this case. This command doesn’t allow any other read and write operations on the vacuumed table.
Examples:
VACUUM users;
vacuums users table.VACUUM VERBOSE users;
vacuums users table and prints progress messages.VACUUM FULL users;
fully vacuums users table. Other queries cannot access users table while vacuuming is running.VACUUM ANALYZE users;
vacuums and analyses users table.VACUUM FULL VERBOSE ANALYZE users;
fully vacuums users table and displays progress messages.VACUUM;
vacuums all the tables in the database the current user has access to.
Even though PostgreSQL can autovacuum tables after a certain percentage of rows gets marked as deleted, some developers and DB admins perfer to run VACUUM ANALYZE
on tables with a lot of read/write operations on a custom schedule. For example, daily, at the least busy time of the day.
You may want to run VACUUM
when the contents of a table has changed significantly: a significant percentage of records in a table has been added, updated or deleted.
REINDEX
PostgreSQL REINDEX
command rebuilds an existing index or all indexes for a table or entire database. Use this command to fix corrupted and unusable indexes, or when and index get bloated after significant change in the table contents.
Examples:
REINDEX INDEX idx_users_on_name;
reindexes indexidx_users_on_name
.REINDEX TABLE users;
reindexesusers
table.REINDEX VERBOSE TABLE users;
reindexesusers
table and prints progress messages.REINDEX DATABASE db1;
reindexes all tables indb1
database.
In my experience, running VACUUM ANALYZE
and then REINDEX
commands on the largest by the number or rows and/or size tables in the database helps to improve the performance of SQL queries.
After a little playing around I created this SQL statement to generate REINDEX TABLE CONCURRENTLY
statements for all tables, in ascending order of size. Which helps me to validate the result after successful REINDEX
.
SELECT 'REINDEX TABLE CONCURRENTLY ' || quote_ident(relname) || ' /*' ||
pg_size_pretty(pg_total_relation_size(C.oid)) || '*/;'
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = 'public'
AND C.relkind = 'r'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;
After performing above things results were pretty much good. Our database srank from 433GB disk usage to 377GB. Also some of our queries becomes faster.