How we maintain health of PostgreSQL database

Nitin goyal
3 min readJun 14, 2021

We have couple of large and heavily updated database tables in PostgreSQL which often suffer from two issues:

  1. table and index bloat, which means they occupy way more disk space and memory than actually required;
  2. corrupted indexes, which means query planner can’t generate efficient query execution plans for them and as a result DB perfrmance degrades over time.

To help developers and database administrators deals with these issues, PostgreSQL has a few commands: ANALYZE, VACUUM, REINDEX.

Let’s take a quick look at each of these maintenance commands to understand what they do and when to use them.

ANALYZE

PostgreSQL ANALYZE command collects statistics about specific table columns, entire table, or entire database. The PostgresQL query planner then uses that data to generate efficient execution plans for queries.

Examples:

  1. ANALYZE users; collects statistics for users table.
  2. ANALYZE VERBOSE users; does exactly the same plus prints progress messages.
  3. ANALYZE users (id, name); collects statistics for id and name columns of users table.
  4. ANALYZE; collects statistics for all table in the current database.

Note that, in PostgreSQL ANALYZE command doesn’t read or update indexes. It deals only with table/column contents. It also doesn’t block the table other queries still may read from the table while ANALYZE is running.

You may want to run ANALYZE in following situations:

  1. when the contents of a table has changed significantly. For example, when a few percents of records in a table have been added, updated or deleted.
  2. just before or after adding an index to a table. That may help query planner to generate optimal query plans that will efficient with the new index.

VACUUM

PostgreSQL VACUUM comman reclaims storage occupied by dead records in database tables. VACUUM removes deleted records and previous versions of updated records. The reason those records have to be removed is that they are only marked as unavailable but still occupy disk space, making table scans slower.

VACUUM also has the ANALYZE option that tells it to analyze the table during vacuuming.

There are two different VACUUM versions.

  1. VACUUM reorganize live records in a table and makes and allocated storage space avaiable for new records. However, that storage space doesn’t become available to the operating system. This command can be run in parallel with other read and write operations on the table.
  2. VACUUM FULL exclusively locks the table, creates a new table file, copies only live records to this file, then deletes the old table file. The reclaimed storage space is returned to the operating system in this case. This command doesn’t allow any other read and write operations on the vacuumed table.

Examples:

  1. VACUUM users; vacuums users table.
  2. VACUUM VERBOSE users; vacuums users table and prints progress messages.
  3. VACUUM FULL users; fully vacuums users table. Other queries cannot access users table while vacuuming is running.
  4. VACUUM ANALYZE users; vacuums and analyses users table.
  5. VACUUM FULL VERBOSE ANALYZE users; fully vacuums users table and displays progress messages.
  6. VACUUM; vacuums all the tables in the database the current user has access to.

Even though PostgreSQL can autovacuum tables after a certain percentage of rows gets marked as deleted, some developers and DB admins perfer to run VACUUM ANALYZE on tables with a lot of read/write operations on a custom schedule. For example, daily, at the least busy time of the day.

You may want to run VACUUM when the contents of a table has changed significantly: a significant percentage of records in a table has been added, updated or deleted.

REINDEX

PostgreSQL REINDEX command rebuilds an existing index or all indexes for a table or entire database. Use this command to fix corrupted and unusable indexes, or when and index get bloated after significant change in the table contents.

Examples:

  1. REINDEX INDEX idx_users_on_name; reindexes index idx_users_on_name.
  2. REINDEX TABLE users; reindexes users table.
  3. REINDEX VERBOSE TABLE users; reindexes users table and prints progress messages.
  4. REINDEX DATABASE db1; reindexes all tables in db1 database.

In my experience, running VACUUM ANALYZE and then REINDEX commands on the largest by the number or rows and/or size tables in the database helps to improve the performance of SQL queries.

After a little playing around I created this SQL statement to generate REINDEX TABLE CONCURRENTLY statements for all tables, in ascending order of size. Which helps me to validate the result after successful REINDEX.

SELECT 'REINDEX TABLE CONCURRENTLY ' || quote_ident(relname) || ' /*' ||
pg_size_pretty(pg_total_relation_size(C.oid)) || '*/;'
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = 'public'
AND C.relkind = 'r'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;

After performing above things results were pretty much good. Our database srank from 433GB disk usage to 377GB. Also some of our queries becomes faster.

--

--

Nitin goyal

working as DevSecOps Engineer at Pixlevide Solutions. Loves to solve everyday problems with Automation.