<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Data Bene - Tag: 'Statistics'</title>
  <subtitle>Relational database, open-source and scalable.</subtitle>
  <link href="https://www.data-bene.io/en/blog/tags/statistics.xml" rel="self" type="application/atom+xml" />
  <updated>2025-09-29T00:00:00Z</updated>
  <id>https://www.data-bene.io/en/blog/tags/statistics.xml</id>
    <entry>
      <title>Cumulative Statistics in PostgreSQL 18</title>
      <link href="https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/" />
      <updated>2025-09-29T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/</id>
     <content type="html"><![CDATA[ <p>In <strong>PostgreSQL 18</strong>, the statistics &amp; monitoring subsystem receives a significant overhaul: extended cumulative statistics, new per-backend I/O visibility, the ability for extensions to export / import / adjust statistics, and improvements to GUC controls and snapshot / caching behavior. These changes open new doors for performance analysis, cross‑environment simulation, and tighter integration with extensions. In this article I explore what’s new, what to watch out for, the relevant Grand Unified Configuration (GUC) knobs, and how extension authors can leverage the new C API surface.</p>
<h2 id="introduction-and-motivation"><a class="heading-anchor" href="#introduction-and-motivation">Introduction &amp; motivation</a></h2>
<p>Statistics (in the broad sense: monitoring counters, I/O metrics, and planner / optimizer estimates) lie at the heart of both performance tuning and internal decision making in PostgreSQL. Transparent, reliable, and manipulable statistics, among other things, allow DBAs to address the efficiency of PostgreSQL directly, as well as enable “extensions” to improve the user experience.</p>
<p>That said, the historic statistics system of PostgreSQL has not been without points of friction. These include a limited ability to clear relation statistics, metrics with units that don’t always align with user goals, and no C API for using the PostgreSQL Cumulative Stats engine. PostgreSQL 18 addresses these concerns head on.</p>
<p>Below is a summary of the key enhancements.</p>
<h2 id="a-warning-on-stats"><a class="heading-anchor" href="#a-warning-on-stats">A warning on stats</a></h2>
<p>While statistics offer incredible value, their collection can take up significant time and resources. PostgreSQL 18 introduces an important consideration: with the expanded range of collectible metrics, the hash table maximum size has been increased. Do keep in mind, especially if you’re designing large-scale systems with table-per-customer architectures, that the 1GB ceiling has been shown to be reachable with millions of tables.</p>
<h2 id="whats-new-with-postgresql-18-and-stats"><a class="heading-anchor" href="#whats-new-with-postgresql-18-and-stats">What’s new with PostgreSQL 18 and “stats”</a></h2>
<p>Here are the major new or improved features relating to statistics and monitoring. Each item links to the relevant documentation or code where possible.</p>
<p>Generally, <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW" rel="noopener">pg_stat_io</a> now reports I/O activity in bytes rather than pages, which is more convenient for analysis. Moreover, WAL I/O statistics (writes and syncs) were moved here from <code>pg_stat_wal</code>, providing a single, comprehensive view of I/O.</p>
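<p>As a minimal sketch of what the byte-based columns make possible, the query below summarizes I/O volume per backend type, object, and context. It assumes the PostgreSQL 18 column names <code>read_bytes</code>, <code>write_bytes</code>, and <code>extend_bytes</code>; the filter is only an illustration.</p>
<pre><code>-- Sketch: I/O volume in human-readable units (PostgreSQL 18 column names assumed)
SELECT backend_type,
       object,            -- WAL activity appears with object = 'wal'
       context,
       pg_size_pretty(read_bytes)   AS read_size,
       pg_size_pretty(write_bytes)  AS write_size,
       pg_size_pretty(extend_bytes) AS extend_size
FROM pg_stat_io
WHERE coalesce(read_bytes, 0)
    + coalesce(write_bytes, 0)
    + coalesce(extend_bytes, 0) &gt; 0
ORDER BY backend_type, object, context;
</code></pre>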
<h3 id="upgrades"><a class="heading-anchor" href="#upgrades">Upgrades</a></h3>
<p><a href="https://www.postgresql.org/docs/18/pgupgrade.html" rel="noopener">pg_upgrade</a> is now able to retain optimizer statistics, removing the need to run a full <code>ANALYZE</code> on the databases to get good planning of queries after the upgrade; this is a very welcome update for large databases! Be aware that custom statistics added by an extension along with those created with <a href="https://www.postgresql.org/docs/18/sql-createstatistics.html" rel="noopener">CREATE STATISTICS</a> won’t be retained.</p>
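<p>Since extended statistics are not carried over, one possible way to find the tables that still need an <code>ANALYZE</code> after the upgrade is to list the objects created with <code>CREATE STATISTICS</code>. This is only a sketch based on the <code>pg_statistic_ext</code> catalog, not an official post-upgrade procedure, and the table name in the comment is hypothetical.</p>
<pre><code>-- Sketch: list extended statistics objects and the tables they belong to
SELECT stxnamespace::regnamespace AS stats_schema,
       stxname                    AS stats_name,
       stxrelid::regclass         AS table_name
FROM pg_statistic_ext
ORDER BY stxrelid::regclass::text, stxname;

-- Then, for each table listed (hypothetical name):
-- ANALYZE public.orders;
</code></pre>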
<p>You will surely want to look at the new option in <a href="https://www.postgresql.org/docs/18/app-vacuumdb.html" rel="noopener">vacuumdb</a> (<code>--missing-stats-only</code>) to, well, analyze only what’s needed.</p>
<p>On a similar note, the <code>--[no-]statistics</code> flag has been added to <a href="https://www.postgresql.org/docs/18/app-pgdump.html" rel="noopener">pg_dump</a>, <a href="https://www.postgresql.org/docs/18/app-pgdumpall.html" rel="noopener">pg_dumpall</a>, and <a href="https://www.postgresql.org/docs/18/app-pgrestore.html" rel="noopener">pg_restore</a>.</p>
<h3 id="maintenance"><a class="heading-anchor" href="#maintenance">Maintenance</a></h3>
<p>It’s now easier to gauge the maintenance effort spent on each table: the total time spent in VACUUM and ANALYZE operations (manual and automatic) is now reported in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW" rel="noopener">pg_stat_all_tables</a> and its variants.</p>
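<p>A quick way to see where maintenance time goes could look like the following sketch. It assumes the PostgreSQL 18 column names <code>total_vacuum_time</code>, <code>total_autovacuum_time</code>, <code>total_analyze_time</code>, and <code>total_autoanalyze_time</code> (reported in milliseconds) and uses the <code>pg_stat_user_tables</code> variant.</p>
<pre><code>-- Sketch: the ten tables with the most cumulative vacuum time
SELECT schemaname,
       relname,
       total_vacuum_time,
       total_autovacuum_time,
       total_analyze_time,
       total_autoanalyze_time
FROM pg_stat_user_tables
ORDER BY total_vacuum_time + total_autovacuum_time DESC
LIMIT 10;
</code></pre>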
<p>A new GUC not to forget is <a href="https://www.postgresql.org/docs/18/runtime-config-statistics.html#GUC-TRACK-COST-DELAY-TIMING" rel="noopener">track_cost_delay_timing</a>. It collects the time spent sleeping (due to cost-based delay) for <code>VACUUM</code> and <code>ANALYZE</code>. While very interesting, like <code>track_io_timing</code> and similar timing GUCs, it implies a lot of extra calls to the system clock, which on some platforms can lead to a severe performance impact. Always check with a tool like <a href="https://www.postgresql.org/docs/18/pgtesttiming.html" rel="noopener">pg_test_timing</a> to ensure your system can afford it!</p>
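<p>If <code>pg_test_timing</code> shows the overhead is acceptable, enabling the setting could look like the sketch below. It assumes superuser privileges and that the collected time is exposed in the <code>delay_time</code> column of <code>pg_stat_progress_vacuum</code>, as I understand the PostgreSQL 18 documentation.</p>
<pre><code>-- Sketch: enable the GUC cluster-wide and reload the configuration
ALTER SYSTEM SET track_cost_delay_timing = on;
SELECT pg_reload_conf();
SHOW track_cost_delay_timing;

-- While a VACUUM is running, look at the time spent sleeping (milliseconds)
SELECT pid, relid::regclass AS table_name, phase, delay_time
FROM pg_stat_progress_vacuum;
</code></pre>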
<p>No more questions about checkpointer activity when using <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-CHECKPOINTER-VIEW" rel="noopener">pg_stat_checkpointer</a>. The new attribute <code>num_done</code> lets us know the number of <strong>completed</strong> checkpoints. You can also tell what kind of buffers were written: <code>slru_written</code> counts SLRU buffers, while <code>buffers_written</code> now covers only <code>shared_buffers</code>. Previously the server log and the view did not report the same counts, because SLRU buffers were counted <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=17cc5f666" rel="noopener">in one case and not the other</a>.</p>
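<p>For illustration, the new counters can be read next to the existing ones like this. The sketch assumes the PostgreSQL 18 layout of the view, where <code>num_timed</code> and <code>num_requested</code> sit alongside the new <code>num_done</code>.</p>
<pre><code>-- Sketch: scheduled/requested checkpoints versus completed ones,
-- with shared buffers and SLRU buffers reported separately
SELECT num_timed,
       num_requested,
       num_done,
       buffers_written,
       slru_written
FROM pg_stat_checkpointer;
</code></pre>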
<h3 id="analysis"><a class="heading-anchor" href="#analysis">Analysis</a></h3>
<p>Want to know more about the I/O handled by a given backend (identified by its PID)? Call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-IO" rel="noopener">pg_stat_get_backend_io(int)</a> and you’ll get output similar to what the <code>pg_stat_io</code> view provides, for this process only (auxiliary processes such as the checkpointer or the background writer are excluded, since their activity is already visible in <code>pg_stat_io</code>). As for the WAL stats for this PID: call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-WAL" rel="noopener">pg_stat_get_backend_wal(int)</a>.</p>
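<p>As a small sketch, both functions can be tried on the current session by passing <code>pg_backend_pid()</code>; any other backend PID taken from <code>pg_stat_activity</code> should work the same way.</p>
<pre><code>-- Sketch: I/O statistics of the current backend
SELECT object, context, reads, read_bytes, writes, write_bytes, hits
FROM pg_stat_get_backend_io(pg_backend_pid());

-- Sketch: WAL statistics of the current backend
SELECT *
FROM pg_stat_get_backend_wal(pg_backend_pid());
</code></pre>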
<p>New attributes <code>parallel_workers_to_launch</code> and <code>parallel_workers_launched</code> were introduced in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW" rel="noopener">pg_stat_database</a>. The ratio lets us know if we have enough slots for parallel workers.</p>
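<p>The ratio mentioned above could be computed as in the sketch below. A percentage well under 100 suggests that planned workers often could not be started, but what counts as “too low” is a judgment call for your workload.</p>
<pre><code>-- Sketch: share of planned parallel workers that were actually launched
SELECT datname,
       parallel_workers_to_launch,
       parallel_workers_launched,
       round(100.0 * parallel_workers_launched
             / nullif(parallel_workers_to_launch, 0), 1) AS launched_pct
FROM pg_stat_database
WHERE datname IS NOT NULL;
</code></pre>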
<p>Interesting changes in <a href="https://www.postgresql.org/docs/18/pgstatstatements.html" rel="noopener">pg_stat_statements</a>: more queries will be grouped under the same identifier. For example, queries that differ only in the length of a constant list such as <code>IN (1,2,3, ...)</code> are grouped together, as only the first and last constants are used. A more counter-intuitive change is related to the table name used in a query: only the name is used, not the schema or relation OID. This last change allows us to keep tracking tables that are dropped and recreated, for example, but it will also group statistics from unrelated tables if they have the same name. The way to keep separate statistics for tables with the same name is to alias them in the queries (<code>FROM my.table mt, other.table ot</code>)…</p>
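<p>To make the aliasing workaround concrete, here is a sketch using two hypothetical schemas, <code>app_a</code> and <code>app_b</code>, that each contain an <code>events</code> table. Based on the behavior described above, the first two statements are expected to share one entry, while the aliased ones should be tracked separately.</p>
<pre><code>-- Hypothetical tables: app_a.events and app_b.events

-- Same table name, no alias: expected to be grouped under one identifier
SELECT count(*) FROM app_a.events;
SELECT count(*) FROM app_b.events;

-- Distinct aliases: expected to yield distinct identifiers
SELECT count(*) FROM app_a.events AS a_events;
SELECT count(*) FROM app_b.events AS b_events;

-- Check how the statements were grouped
SELECT queryid, calls, query
FROM pg_stat_statements
WHERE query ILIKE '%events%'
ORDER BY queryid;
</code></pre>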
<p>Finally, <a href="https://www.postgresql.org/docs/18/view-pg-backend-memory-contexts.html" rel="noopener">pg_backend_memory_contexts</a> gains two columns: <code>path</code> (to reconstruct the parent/child hierarchy) and <code>type</code>, which distinguishes <code>AllocSet</code>, <code>Generation</code>, <code>Slab</code> and <code>Bump</code> contexts. And what exactly are <code>Slab</code> and <code>Bump</code>? They are not documented in the manual; for these you’ll want to <a href="https://github.com/postgres/postgres/tree/master/src/backend/utils/mmgr" rel="noopener">read the header comments of the C files here</a>. They exist to optimize memory allocation, reallocation, and reset, depending on expected memory usage. For example, <code>Slab</code> is defined as a “MemoryContext implementation designed for cases where large numbers of equally-sized objects can be allocated and freed efficiently with minimal memory wastage and fragmentation”.</p>
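<p>A short sketch of how the two new columns might be used on the current session: first a breakdown of memory by context type, then the largest contexts with their position in the hierarchy.</p>
<pre><code>-- Sketch: memory usage of the current backend, grouped by context type
SELECT type,
       count(*)                         AS contexts,
       pg_size_pretty(sum(total_bytes)) AS total_size
FROM pg_backend_memory_contexts
GROUP BY type
ORDER BY sum(total_bytes) DESC;

-- Sketch: the largest contexts, with their path in the context tree
SELECT name, type, level, path, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;
</code></pre>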
<p>Ah, no, one last thing: <code>wal_buffers_full</code> was added to <code>pg_stat_statements</code>, giving us better insights when tuning <code>wal_buffers</code>.</p>
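<p>One way to use the new column, sketched here, is to look for the statements that most often found the WAL buffers full. This is a starting point for investigation rather than a tuning rule.</p>
<pre><code>-- Sketch: statements that most often hit full WAL buffers
SELECT queryid,
       calls,
       wal_records,
       wal_bytes,
       wal_buffers_full,
       left(query, 60) AS query_start
FROM pg_stat_statements
ORDER BY wal_buffers_full DESC
LIMIT 10;
</code></pre>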
<h3 id="replication"><a class="heading-anchor" href="#replication">Replication</a></h3>
<p>Logical replication now offers better insights for conflict management, thanks to new attributes in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS" rel="noopener">pg_stat_subscription_stats</a>. For reference, this excerpt from <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6c2b5edec" rel="noopener">the commit entry</a> lists the attributes that were introduced:</p>
<ul class="list">
<li>
<p><code>confl_insert_exists</code>:<br>
Number of times a row insertion violated a NOT DEFERRABLE unique<br>
constraint.</p>
</li>
<li>
<p><code>confl_update_origin_differs</code>:<br>
Number of times an update was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_update_exists</code>:<br>
Number of times that the updated value of a row violates a<br>
NOT DEFERRABLE unique constraint.</p>
</li>
<li>
<p><code>confl_update_missing</code>:<br>
Number of times that the tuple to be updated is missing.</p>
</li>
<li>
<p><code>confl_delete_origin_differs</code>:<br>
Number of times a delete was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_delete_missing</code>:<br>
Number of times that the tuple to be deleted is missing.</p>
</li>
</ul>
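<p>The counters listed above can be read per subscription, as in this minimal sketch (run on the subscriber):</p>
<pre><code>-- Sketch: conflict counters per subscription
SELECT subname,
       confl_insert_exists,
       confl_update_origin_differs,
       confl_update_exists,
       confl_update_missing,
       confl_delete_origin_differs,
       confl_delete_missing
FROM pg_stat_subscription_stats;
</code></pre>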
<h3 id="advanced"><a class="heading-anchor" href="#advanced">Advanced</a></h3>
<p>There is now a <a href="https://www.postgresql.org/docs/18/functions-admin.html#FUNCTIONS-ADMIN-STATSMOD" rel="noopener">new set of functions</a> to manage relation and attribute stats (<code>relpages</code>, <code>avg_width</code>, and so on). This gives you the freedom to export, import, and adjust stats as you want, so you can replicate planner behavior outside of “production”, maintain patched stats, and so on.</p>
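<p>As a hedged sketch of what this looks like in practice, the calls below follow the name/value argument style shown in the PostgreSQL 18 documentation. The table <code>public.orders</code>, its column <code>customer_id</code>, and all the numbers are hypothetical.</p>
<pre><code>-- Sketch: pretend the (hypothetical) table is much larger than it is
SELECT pg_restore_relation_stats(
    'schemaname', 'public',
    'relname',    'orders',
    'relpages',   1000::integer,
    'reltuples',  250000::real);

-- Sketch: adjust column-level statistics for one (hypothetical) column
SELECT pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname',    'orders',
    'attname',    'customer_id',
    'inherited',  false,
    'avg_width',  8::integer,
    'null_frac',  0.0::real);

-- Sketch: remove the relation-level statistics again
SELECT pg_clear_relation_stats('public', 'orders');
</code></pre>
<p>Keep in mind that values set this way are likely to be overwritten by the next <code>ANALYZE</code> or autovacuum run on the table.</p>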
<h3 id="my-favorite-for-extension-authors-the-new-c-stats-api"><a class="heading-anchor" href="#my-favorite-for-extension-authors-the-new-c-stats-api">My favorite for extension authors: the new C stats API</a></h3>
<p>One of the most exciting parts is what PostgreSQL 18 <em>opens up</em> for extension authors.</p>
<p>This tiny line at the bottom of section <a href="https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MODULES" rel="noopener">E.1.3.9 Modules</a> is what announces these changes:</p>
<blockquote>
<p>Allow extensions to use the server’s cumulative statistics API (Michael Paquier)</p>
</blockquote>
<p>Previously statistics manipulation was an internal-only affair; now there is an official, structured API surface you can build on (or wrap).</p>
<p>The <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7949d9594" rel="noopener">commit message</a> is well written, and covers most of the new functionality. A subset of the options is <a href="https://www.postgresql.org/docs/18/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS" rel="noopener">detailed in the documentation</a>. However, you will need to go into the source code to know more at this stage; in particular, it’s worth having a look at the <code>injection_points</code> test module (shipped in the core source tree), which uses the new API.</p>
<p>For a deeper dive into how an extension can leverage these new capabilities, soon you will be able to see <strong>PACS (PostgreSQL Advanced Cumulative Statistics)</strong> on Codeberg - my project that provides a wrapper library and helper utilities around the new PostgreSQL 18 statistics APIs.</p>
<p>In the meantime, the talk I gave at <a href="https://archive.fosdem.org/2025/schedule/event/fosdem-2025-4496-stats-roll-baby-stats-roll-/" rel="noopener">FOSDEM 2025</a> explores these topics in greater detail.</p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
</feed>
