Cumulative Statistics in PostgreSQL 18

2025-09-29T00:00:00Z

In PostgreSQL 18, the statistics & monitoring subsystem receives a significant overhaul - extended cumulative statistics, new per-backend I/O visibility, the ability for extensions to export / import / adjust statistics, and improvements to GUC controls and snapshot / caching behavior. These changes open new doors for performance analysis, cross‑environment simulation, and tighter integration with extensions. In this article I explore what’s new, what to watch out for, Grand Unified Configuration (GUC) knobs, and how extension authors can leverage the new C API surface.

Introduction & motivation

Statistics (in the broad sense: monitoring counters, I/O metrics, and planner / optimizer estimates) lie at the heart of both performance tuning and internal decision making in PostgreSQL. Transparent, reliable, and manipulable statistics, among other things, allow DBAs to address the efficiency of PostgreSQL directly, as well as enable “extensions” to improve the user experience.

That said, the historic statistics system of PostgreSQL has not been without points of friction. These include limited ability to clear (relations) statistics, metrics with units that don’t always align with user goals, and no C API for using the PostgreSQL Cumulative Stats engine. PostgreSQL 18 addresses these concerns head on.

Below is a summary of the key enhancements.

A warning on stats

While statistics offer incredible value, their collection can take up significant time and resources. PostgreSQL 18 introduces an important consideration: with the expanded range of collectible metrics, the hash table maximum size has been increased. Do keep in mind, especially if you’re designing large-scale systems with table-per-customer architectures, that 1GB ceilings have been shown to be hit with some millions of tables.

What’s new with PostgreSQL 18 and “stats”

Here are the major new or improved features relating to statistics and monitoring. Each item links to the relevant documentation or code where possible.

Generally, pg_stat_io now reports I/O activity in bytes rather than pages, which is more convenient for analysis. Moreover, WAL statistics were moved here from pg_stat_wal, providing a single, comprehensive view.

Upgrades

pg_upgrade is now able to retain optimizer statistics, removing the need to run a full ANALYZE on the databases to get good planning of queries after the upgrade; this is a very welcome update for large databases! Be aware that custom statistics added by an extension along with those created with CREATE STATISTICS won’t be retained.

You will surely want to look at new options in vacuumdb (--missing-stats-only) to, well, analyze only what’s needed.

On a similar note, the --[no-]statistics flag has been added to pg_dump, pg_dumpall, and pg_restore.

Maintenance

It’s now easier to know the maintenance effort on objects with total time spent on VACUUM and ANALYZE operation (and automatic ones) now reported into pg_stat_all_tables and variants.

A new GUC to not forget is track_cost_delay_timing. It collects time spent sleeping (due to delayed operations) for VACUUM and ANALYZE. While very interesting, like other track_io* GUCs, it implies a lot of extra calls to the system clock which on some platforms can lead to a severe performance impact. Always check with tool like pg_test_timing to ensure your system can afford it!

No more questions about checkpointer activity when using pg_stat_checkpointer. The new attribute num_done lets us know the number of completed checkpoints. You can also get what kind of buffers were written with slru_written and buffers_written now only matching shared_buffers: previously log and view were not providing the same counts because there was a SLRU counter in one case and not the other.

Analysis

Want to know more about the I/O handled by the backend (PID)? Call pg_stat_get_backend_io(int) and you’ll get output similar to what the pg_stat_io view provides, for this process (excluding those already). As for the WAL stats for this PID: call pg_stat_get_backend_wal(int).

New attributes parallel_workers_to_launch and parallel_workers_launched were introduced in pg_stat_database. The ratio lets us know if we have enough slots for parallel workers.

Interesting changes on pg_stat_statements: more queries will be grouped under the same identifier. For example, patterns IN (1,2,3, ...) as only first and last constant will be used. A more counter-intuitive change is related to the table name used in a query. Only the name is used, not the schema or relation OID. This last change allows us to track dropped or recreated tables for example, but it will group statistics from unrelated tables if they have just the same name. The way to keep separate statistics for tables with same name is to alias them in the queries (FROM my.table mt, other.table ot)…

Finally, additions to pg_backend_memory_contexts with path (to get parent/child) and type to segregate AllocSet, Generation, Slab and Bump contexts… and what exactly are Slab and Bump? They are not documented; for these you’ll want to read headers of C files here. They exist to optimize memory allocation, reallocation, and reset, depending on expected memory usage. For example, Slab is defined as a «MemoryContext implementation designed for cases where large numbers of equally-sized objects can be allocated and freed efficiently with minimal memory wastage and fragmentation».

Ah, no, a last one, wal_buffers_full was added to pg_stat_statements to allow us to tune for wal_buffers with better insights.

Replication

There are now better insights for conflict management when using logical replication that leverage new attributes in pg_stat_subscription_stats. As reference, this excerpt from the commit entry lists the following attributes that were introduced:

confl_insert_exists:
Number of times a row insertion violated a NOT DEFERRABLE unique
constraint.
confl_update_origin_differs:
Number of times an update was performed on a row that was
previously modified by another origin.
confl_update_exists:
Number of times that the updated value of a row violates a
NOT DEFERRABLE unique constraint.
confl_update_missing:
Number of times that the tuple to be updated is missing.
confl_delete_origin_differs:
Number of times a delete was performed on a row that was
previously modified by another origin.
confl_delete_missing:
Number of times that the tuple to be deleted is missing.

Advanced

There is now a new set of functions to manage relation and attributes stats (relpages, avg_width, and so on). This gives you the freedom to export, import, and adjust stats as you want, so you can replicate planner behavior outside of “production”, maintain patched stats, and so on.

My favorite for extension authors: the new C stats API

One of the most exciting parts is what PostgreSQL 18 opens up for extension authors.

This tiny line at bottom of section E.1.3.9 Modules is what concerns these changes:

Allow extensions to use the server’s cumulative statistics API (Michael Paquier)

Previously statistics manipulation was an internal-only affair; now there is an official, structured API surface you can build on (or wrap).

The commit message is well written, and covers most of the new functionality. A subset of the options is detailed in the documentation. However, you will need to go into source code to know more at this stage; in particular, it’s worth having a look at the injection points extension (provided in core) which uses the new API.

For a deeper dive into how an extension can leverage these new capabilities, soon you will be able to see PACS (PostgreSQL Advanced Cumulative Statistics) on Codeberg - my project that provides a wrapper library and helper utilities around the new PostgreSQL 18 statistics APIs.

In the meantime, the talk I gave at FOSDEM 2025 explores these topics in greater detail.

Data Bene - Tag: 'Statistics'

Cumulative Statistics in PostgreSQL 18