<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Data Bene - Tag: 'Free software'</title>
  <subtitle>Relational database, open-source and scalable.</subtitle>
  <link href="https://www.data-bene.io/en/blog/tags/free-software.xml" rel="self" type="application/atom+xml" />
  <updated>2026-01-22T00:00:00Z</updated>
  <id>https://www.data-bene.io/en/blog/tags/free-software.xml</id>
    <entry>
      <title>CERN PGDay: an annual PostgreSQL event in Geneva, Switzerland</title>
      <link href="https://www.data-bene.io/en/blog/cern-pgday-2026/" />
      <updated>2026-01-22T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/cern-pgday-2026/</id>
     <content type="html"><![CDATA[ <p>If you’re located near Western Switzerland and the Geneva region (or you just want to visit!), you might find it well worth your time to attend <a href="https://www.swisspug.org/cern-pgday-2026.html" rel="noopener">CERN PGDay 2026</a>. It’s an annual gathering (this year occurring on February 6th, 2026) for anyone interested in learning more about PostgreSQL that takes place at CERN, the world’s largest particle physics laboratory.</p>
<p><em>If you find the subject of particle physics interesting, you may want to visit anyway! They offer free access to many activities that run from Tuesday to Sunday; <a href="https://visit.cern/programme" rel="noopener">you can view the full programme here</a>.</em></p>
<p>Here, you’ll be able to attend a single track of seven English-language sessions, with a social gathering afterwards to enjoy CERN while continuing to connect with the rest of the attendees.</p>
<p>This year, there’ll be:</p>
<ol class="list">
<li><strong>A new PostgreSQL backend for CERN Tape Archive scheduling for LHC Run 4</strong> - Konstantina Skovola, CERN</li>
<li><strong>DCS Data Tools - PostgreSQL/TimescaleDB Implementation for ATLAS DCS Time-Series Data</strong> - Dimitrios Matakias, Paris Moschovakos, CERN</li>
<li><strong>Operational hazards of managing PostgreSQL DBs over 100TB</strong> - Teresa Lopes, Adyen</li>
<li><strong>Vacuuming Large Tables: How Recent Postgres Changes Further Enable Mission Critical Workloads</strong> - Robert Treat, AWS</li>
<li><strong>The (very practical) Postgres Sharding Landscape</strong> - Álvaro Hernández, OnGres</li>
<li><strong>The Alchemy of Shared Buffers: Balancing Concurrency and Performance</strong> - Josef Machytka, credativ</li>
<li><strong>When Kafka Met Elephant: A Love Story about Fast Ingestion</strong> - Barbora Linhartova, Jan Suchanek, Baremon</li>
</ol>
<p>The first talk of the day is of particular note…</p>
<blockquote>
<p>The CERN Tape Archive (CTA) stores over one exabyte of scientific data. To orchestrate storage operations (archival) and access operations (retrieval), the CTA Scheduler coordinates concurrent data movements across hundreds of tape servers, relying on a Scheduler Database (Scheduler DB) to manage the metadata of the in-flight requests. The existing objectstore-based design of the CTA Scheduler DB is a complex transactional management system. This talk presents the development of a new PostgreSQL-based backend for the CTA Scheduler as an off-the-shelf solution which simplifies implementation and is expected to significantly reduce future development and operational costs. We describe the implementation of all main CTA workflows and explain how PostgreSQL addresses the limitations of the objectstore-based system, providing the foundation for the tenfold increase in data throughput expected during LHC Run 4.</p>
</blockquote>
<p><em>(<a href="https://indico.cern.ch/event/1504097/contributions/6833857/" rel="noopener">link to talk description</a>)</em></p>
<p>In a world where ever larger amounts of digital information must be stored, learning more about how CERN manages over one exabyte of scientific data is sure to be an interesting experience.</p>
<p>Geneva is home to many international organizations across the public, private, and scientific sectors. If you’d like to explore the topic of PostgreSQL in more depth through engaging in discussion or attending sessions, it’s a fun location to meet and learn. Thinking of coming by? You can <a href="https://indico.cern.ch/event/1504097/registrations/114102/" rel="noopener">register until February 1st</a>.</p>
<p>Last year’s session recordings can be viewed by <a href="https://indico.cern.ch/event/1471762/timetable/#20250117" rel="noopener">visiting the 2025 schedule</a> and selecting the paperclip symbol next to the talk you’re interested in.</p>
<p>Stop by and see us in the catering area; we’re proud to be sponsoring the event again this year and will have a table or booth where you can find us. We’d love to talk about PostgreSQL, open-source innovation and development, and whatever questions you have.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Open Source Experience 2025</title>
      <link href="https://www.data-bene.io/en/blog/open-source-experience-2025/" />
      <updated>2026-01-03T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/open-source-experience-2025/</id>
     <content type="html"><![CDATA[ <p>The 2025 edition of the <a href="https://www.opensource-experience.com/" rel="noopener">Open Source Experience (OSXP)</a> took place on December 10th and 11th under the theme “Open Source, key to Europe’s strategic autonomy.” As you might expect, the focus was entirely on redefining Europe’s digital future as driven by open source innovation across all technologies, including data management, cloud computing, and cybersecurity. Many of this year’s talks centered on the intersection between open source and AI, in line with the technology industry’s broader focus on AI in 2025.</p>
<h2 id="the-event"><a class="heading-anchor" href="#the-event">The event</a></h2>
<p>The event lasted two days, featuring 90 exhibitors, 130 sessions, 150 speakers, and over 4,000 participants – a truly large-scale conference held at the Cité des Sciences et de l’Industrie in Paris.</p>
<p>This venue was smaller than last year’s (Le Palais des Congrès), and because the rooms were quite small (roughly 20 to 30 seats at most), every session we attended was full.</p>
<h2 id="the-talk-format"><a class="heading-anchor" href="#the-talk-format">The talk format</a></h2>
<p>Presentations were given in both English and French. Interestingly, there were no “silent rooms” this year (where headphones are provided to each attendee). Not everyone enjoyed that format last year, but it was a useful one for following two talks, or switching between them depending on the content or questions.</p>
<p>Two of our team members were in attendance and had the opportunity to explore various exhibitors and event rooms spread across three floors. The talks lasted 20 minutes. While too short to delve into details, this format was excellent for discovering new technologies and piquing our interest at a glance.</p>
<h2 id="the-talk-content"><a class="heading-anchor" href="#the-talk-content">The talk content</a></h2>
<p>There were six tracks that talks were categorized by:</p>
<ul class="list">
<li>Economic models and governance for sustainable open strategies</li>
<li>Artificial intelligence and scientific computing for data analysis</li>
<li>Cloud architecture and virtualization for an autonomous future</li>
<li>Development - software innovation in action</li>
<li>Cybersecurity and the software production chain: Open Source as a foundation of trust</li>
<li>Collaborative tools and business applications: regaining digital autonomy</li>
</ul>
<p>We found the topic of open source solutions within the public sector to be the most interesting. In particular, it was easy to see that our reliance as a global society on the big five tech companies (GAFAM: Google, Apple, Facebook, Amazon, and Microsoft) has grown significantly in the past few years. Open source software is a direct means of protecting our collective right to privacy in the digital age, which is exactly why conferences such as this one are so important for discovering OSS alternatives and fostering innovation and further development within this sector.</p>
<h2 id="attendance"><a class="heading-anchor" href="#attendance">Attendance</a></h2>
<p>Attendance was particularly high from the very first day. We were delighted to have the opportunity to interact in person and meet our partners, especially <a href="https://www.ow2.org/" rel="noopener">OW2</a>, which also organizes an annual Open Source event in June. (The call for presentations is open until February 14, 2026 – see the OW2Con’26 call for proposals <a href="https://www.ow2con.org/view/2026/Call_For_Presentations" rel="noopener">here</a>.)</p>
<p>Since the event was entirely focused on open source technologies, we were able to discuss with numerous participants topics such as PostgreSQL support, along with the challenges and organizational impacts for companies wishing to innovate and adopt PostgreSQL, against the backdrop of market demand to break free from proprietary software licensing constraints.</p>
<p>We also had the pleasure of meeting key players in open source hardware innovation, which resonates with our own R&amp;D on RISC-V processors.</p>
<p>Many free software and open source projects were represented. Some examples include <a href="https://nextcloud.com" rel="noopener">Nextcloud</a> (a self-hosted cloud collaboration platform that we personally use for hosting here at Data Bene) and <a href="https://opentalk.eu/en" rel="noopener">OpenTalk</a>, a video-conferencing solution that is GDPR-compliant, operating within German data centers.</p>
<h2 id="closing-thoughts"><a class="heading-anchor" href="#closing-thoughts">Closing thoughts</a></h2>
<p>The event was successful and well-organized. The only thing that would have improved the experience is longer presentations, to explore the various topics in more depth. Beyond discovering new open source projects, the event is also a great opportunity to exchange ideas freely on these topics.</p>
<p>The video replays for 2025 have not yet been published, but past conference recordings can be found on the <a href="https://www.opensource-experience.com/en/video-replays" rel="noopener">official website, here</a>.</p>
<p>Overall, we thoroughly enjoyed the event and hope to attend next year!</p>
 ]]></content>
			<author>
				<name>Grégory Tiram</name>
			</author>
    </entry>
    <entry>
      <title>Did you know? Tables in PostgreSQL are limited to 1,600 columns</title>
      <link href="https://www.data-bene.io/en/blog/did-you-know-tables-in-postgresql-are-limited-to-1600-columns/" />
      <updated>2025-11-13T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/did-you-know-tables-in-postgresql-are-limited-to-1600-columns/</id>
     <content type="html"><![CDATA[ <p><strong>Did you know a table can have no more than 1,600 columns?</strong> This blog article was inspired by a conversation Pierre Ducroquet and I had.</p>
<h2 id="first-the-documentation"><a class="heading-anchor" href="#first-the-documentation">First, the documentation</a></h2>
<p>The PostgreSQL documentation <a href="https://www.postgresql.org/docs/current/limits.html" rel="noopener">Appendix K</a> states a table can have a maximum of 1,600 columns.</p>
<p>This is a <strong>hard-coded limit</strong>, found in the source code at <code>src/include/access/htup_details.h</code>:</p>
<pre class="language-plaintext"><code class="language-plaintext">#define MaxTupleAttributeNumber 1664
#define MaxHeapAttributeNumber	1600</code></pre>
<h2 id="reaching-the-limit-the-expected-way"><a class="heading-anchor" href="#reaching-the-limit-the-expected-way">Reaching the limit the expected way</a></h2>
<p>Let’s fully validate the claim and test accordingly.</p>
<h3 id="playing-with-table-definition"><a class="heading-anchor" href="#playing-with-table-definition">Playing with table definition</a></h3>
<p>Here, we’ll use a simple SQL script with a PL/pgSQL block, because it is easy to adapt while testing.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Classic example</span>

<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1601;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1601(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1601</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1601 ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The typical output is as follows:</p>
<pre class="language-plaintext"><code class="language-plaintext">NOTICE:  table "tint_1601" does not exist, skipping
ERROR:  tables can have at most 1600 columns
CONTEXT:  SQL statement "ALTER TABLE tint_1601 ADD COLUMN i_1601 int;"
PL/pgSQL function inline_code_block line 8 at EXECUTE</code></pre>
<p>So far so good (or at least, all is working as expected).</p>
<p>You might be tempted to replace the <code>int4</code> columns with the smaller <code>int2</code> type to get past 1,600 columns. It will not work: the limit applies to the number of columns, not their total size, and it is hard-coded.</p>
<h3 id="playing-with-table-content"><a class="heading-anchor" href="#playing-with-table-content">Playing with table content</a></h3>
<p>Let’s build a 1,600-column table using the same approach as above.</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1600;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1600(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1600</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1600 ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>Another SQL script can be used to produce a valid 1,600-column tuple:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">DO</span> $$
<span class="token keyword">DECLARE</span>
    s <span class="token keyword">TEXT</span><span class="token punctuation">;</span>
    rows_inserted <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    s :<span class="token operator">=</span> <span class="token function">format</span><span class="token punctuation">(</span>
                 <span class="token string">'INSERT INTO tint_1600 VALUES (1%s);'</span>
               <span class="token punctuation">,</span> <span class="token keyword">repeat</span><span class="token punctuation">(</span> <span class="token string">',1'</span> <span class="token punctuation">,</span> <span class="token number">1599</span> <span class="token punctuation">)</span> 
               <span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> s<span class="token punctuation">;</span>

    GET DIAGNOSTICS rows_inserted <span class="token operator">=</span> ROW_COUNT<span class="token punctuation">;</span>
    RAISE NOTICE <span class="token string">'Rows inserted: %'</span><span class="token punctuation">,</span> rows_inserted<span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">NOTICE:  Rows inserted: 1
DO</code></pre>
<p>Another success with no surprise.</p>
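<p>The fit is easy to sanity-check with back-of-envelope arithmetic (a sketch with assumed sizes, not exact PostgreSQL accounting):</p>

```python
# Rough size estimate for the 1,600-int tuple. Assumptions: a 24-byte
# MAXALIGNed tuple header, 4 bytes per int, and no null bitmap since
# every column is filled.
TUPLE_HEADER = 24
INT_SIZE = 4
N_COLS = 1600
PAGE_TUPLE_LIMIT = 8160   # per-page tuple size limit (assumption for this sketch)

estimated = TUPLE_HEADER + N_COLS * INT_SIZE
print(estimated)  # 6424 -- comfortably under the limit
```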
<h3 id="testing-the-limits"><a class="heading-anchor" href="#testing-the-limits">Testing the limits</a></h3>
<p>Let us continue pushing to the limits.</p>
<p>We now create another 1,600-column table, this time using the <code>char(127)</code> data type.</p>
<p>We reuse our SQL script with some modifications:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Create a table with 1,600 columns: 1 x int + 1599 x char(127)</span>
<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1600;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1600(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1600</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1600 ADD COLUMN c_%s char(127) NOT NULL;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span>

<span class="token comment">-- Insert a tuple - 1 x int + 1599 x char(127)</span>
<span class="token keyword">DO</span> $$
<span class="token keyword">DECLARE</span>
    s <span class="token keyword">TEXT</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    s :<span class="token operator">=</span> <span class="token function">format</span><span class="token punctuation">(</span> 
                 <span class="token string">'INSERT INTO tint_1600 VALUES (1%s);'</span>
               <span class="token punctuation">,</span> <span class="token keyword">repeat</span><span class="token punctuation">(</span> $q$<span class="token punctuation">,</span><span class="token string">'1'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">127</span><span class="token punctuation">)</span>$q$ <span class="token punctuation">,</span> <span class="token number">1599</span> <span class="token punctuation">)</span> 
               <span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> s<span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  row is too big: size 25616, maximum size 8160</code></pre>
<p>As we can see, the table has 1,600 columns, but this time the tuple cannot fit in a single heap page, hence the error “row is too big: size 25616, maximum size 8160”. If you paid attention to the modified script, you noticed the columns are declared <code>NOT NULL</code>, so at table creation time PostgreSQL could, in principle, have proven that no row could ever be inserted.</p>
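<p>Back-of-envelope arithmetic suggests why this insert is hopeless (the sizes below are assumptions: a 24-byte aligned tuple header, a 1-byte short varlena header per inline <code>char(127)</code> value, and 18-byte out-of-line TOAST pointers):</p>

```python
# Rough arithmetic for the failing char(127) insert (assumed sizes,
# not exact PostgreSQL accounting).
TUPLE_HEADER = 24          # HeapTupleHeader, MAXALIGNed (assumption)
INT_SIZE = 4
CHAR127_INLINE = 1 + 127   # short varlena header + blank-padded payload
TOAST_POINTER = 18         # out-of-line TOAST pointer datum (assumption)
N_CHAR_COLS = 1599
PAGE_TUPLE_LIMIT = 8160    # from the error message

uncompressed = TUPLE_HEADER + INT_SIZE + N_CHAR_COLS * CHAR127_INLINE
all_out_of_line = TUPLE_HEADER + INT_SIZE + N_CHAR_COLS * TOAST_POINTER

print(uncompressed)      # 204700 -- far beyond the limit
print(all_out_of_line)   # 28810 -- even fully TOASTed, still beyond the limit
```

<p>The reported 25,616 bytes appears to already reflect TOAST compression of the highly compressible blank-padded values; and since even 1,599 out-of-line pointers alone would exceed the page limit, no storage strategy can make this row fit.</p>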
<h2 id="what-about-joins"><a class="heading-anchor" href="#what-about-joins">What about JOINs?</a></h2>
<p>To keep things simple, let us join the table with itself:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">SELECT</span> a<span class="token punctuation">.</span><span class="token operator">*</span><span class="token punctuation">,</span>b<span class="token punctuation">.</span><span class="token operator">*</span> <span class="token keyword">FROM</span> tint_1600 a<span class="token punctuation">,</span> tint_1600 b<span class="token punctuation">;</span>
ERROR:  target lists can have at most <span class="token number">1664</span> entries</code></pre>
<p>Now the <code>SELECT</code> clause (<code>a.*,b.*</code>) is reaching its own limit (<code>MaxTupleAttributeNumber = 1664</code>).</p>
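<p>The arithmetic behind that error is simple (a quick sketch; the constants come from the <code>htup_details.h</code> snippet above):</p>

```python
# Target-list accounting for the self-join.
MAX_TUPLE_ATTRS = 1664   # MaxTupleAttributeNumber
table_cols = 1600        # columns in tint_1600

print(table_cols + table_cols)        # 3200 -- rejected, exceeds 1664
print(MAX_TUPLE_ATTRS - table_cols)   # 64 -- spare result columns left
```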
<h2 id="reaching-the-column-limit-the-unexpected-way"><a class="heading-anchor" href="#reaching-the-column-limit-the-unexpected-way">Reaching the column limit the unexpected way</a></h2>
<p>Sometimes application changes require schema modifications.<br>
Most of the time, these are table modifications such as adding or dropping columns.</p>
<h3 id="exploring-add-/-drop-column"><a class="heading-anchor" href="#exploring-add-/-drop-column">Exploring <code>ADD</code> / <code>DROP COLUMN</code></a></h3>
<p>Let us see what happens from the SQL side when we add, then drop, a column.</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># CREATE TABLE tadc_1600(i_1 int NOT NULL);</span>

<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tadc_1600 ADD COLUMN i_2 int NOT NULL;</span>

<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc_1600'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>
     
 attname <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">---------+--------+------------+------------+--------------</span>
 i_1     <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 i_2     <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tadc_1600 DROP COLUMN i_2;</span>

<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc_1600'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>

           attname            <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">------------------------------+--------+------------+------------+--------------</span>
 i_1                          <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">2.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>When dropping a column,</p>
<ul class="list">
<li>the name is rewritten to the fixed pattern <code>........pg.dropped.&lt;attnum&gt;........</code>,</li>
<li>the column becomes NULLable,</li>
<li>the column is marked as dropped (<code>attisdropped = true</code>).</li>
</ul>
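<p>The placeholder name follows a fixed pattern; a small sketch (matching the catalog output above) that reproduces it:</p>

```python
# Rebuild the placeholder name PostgreSQL gives a dropped column
# (pattern matching the pg_attribute output above: 8 dots on each side).
def dropped_attname(attnum: int) -> str:
    return "........pg.dropped.%d........" % attnum

print(dropped_attname(2))  # ........pg.dropped.2........
```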
<h3 id="iterating-add-/-drop-column"><a class="heading-anchor" href="#iterating-add-/-drop-column">Iterating ADD / DROP COLUMN</a></h3>
<p>One can wonder if there is a limit to the number of add/drop operations that can be run on a given table.</p>
<p>As usual, let us try:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- ADD / DROP COLUMN example</span>
<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tadc;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tadc(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1601</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tadc ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tadc DROP COLUMN i_%s;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  tables can have at most 1600 columns
CONTEXT:  SQL statement "ALTER TABLE tadc ADD COLUMN i_1601 int;"
PL/pgSQL function inline_code_block line 8 at EXECUTE</code></pre>
<p>Uh oh! We reached the 1,600 limit here as well: dropped columns still consume attribute numbers, and those numbers are never reused.</p>
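<p>A minimal model of the slot accounting (an illustrative sketch built on one assumption: attribute numbers of dropped columns are never reused):</p>

```python
# Why repeated ADD/DROP exhausts the limit: each ADD takes a fresh
# attnum slot, and DROP only marks the slot dropped -- it is not reused.
MAX_HEAP_ATTRS = 1600
natts = 1          # the initial i_1 column
cycles = 0
while natts + 1 <= MAX_HEAP_ATTRS:
    natts += 1     # ADD COLUMN consumes a new attribute number
    cycles += 1    # DROP COLUMN keeps the slot, merely flagged as dropped

print(cycles)  # 1599 successful ADD/DROP cycles before the next ADD fails
```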
<p>Let us explore the catalog after adding and dropping a column 1,599 times:</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>

             attname             <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">---------------------------------+--------+------------+------------+--------------</span>
 i_1                             <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">2.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">3.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">3</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">4.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">4</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">5.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">5</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t

 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">1599.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>   <span class="token number">1599</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">1600.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>   <span class="token number">1600</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
<span class="token punctuation">(</span><span class="token number">1600</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>Well, table <code>tadc</code> has reached 1,600 attribute slots. The dropped columns are still visible in the catalog because <code>ALTER TABLE ... DROP COLUMN</code> only marks a column as dropped: modifications are appended and a full table rewrite is avoided.</p>
<p>At this point, further column add &amp; drop modifications will fail.</p>
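<p>Before anything else, you may want to measure how close a table is to the limit. A quick way (a sketch, reusing the <code>tadc</code> table from above) is to count its <code>pg_attribute</code> entries, dropped columns included:</p>
<pre class="language-sql"><code class="language-sql">-- Attribute slots consumed by the table; the 1,600 maximum
-- (MaxHeapAttributeNumber) is hard-coded in the server sources.
SELECT count(*) AS slots_used,
       count(*) FILTER (WHERE attisdropped) AS dropped,
       1600 - count(*) AS slots_left
  FROM pg_attribute
 WHERE attrelid = 'tadc'::regclass
   AND attnum > 0;</code></pre>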
<p>Is there anything I can do to escape this situation?</p>
<h4 id="the-vacuum-knight-shall-save-the-postgresql-princess-right"><a class="heading-anchor" href="#the-vacuum-knight-shall-save-the-postgresql-princess-right">The VACUUM knight shall save the PostgreSQL princess, right?</a></h4>
<p>The <code>VACUUM</code> command operates at the tuple level, so even running a <code>VACUUM FULL</code> will not change the table structure.</p>
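<p>You can verify this yourself (a sketch, reusing the <code>tadc</code> table from above): even after a full rewrite, the dropped attribute slots remain registered in the catalog.</p>
<pre class="language-sql"><code class="language-sql">VACUUM FULL tadc;

-- The dropped columns are still present in pg_attribute:
SELECT count(*)
  FROM pg_attribute
 WHERE attrelid = 'tadc'::regclass
   AND attisdropped;</code></pre>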
<h4 id="so-the-dragon-ate-the-knight-whats-next"><a class="heading-anchor" href="#so-the-dragon-ate-the-knight-whats-next">So, the dragon ate the knight, what’s next?</a></h4>
<p>This is not an issue with dead tuples but rather an issue with the catalog.<br>
You’ll need to create a new table definition.</p>
<p>Here are some solutions, from simple to complex:</p>
<ol class="list">
<li>
<p>Build a new table (requires service downtime)</p>
<ul class="list">
<li><code>CREATE TABLE ... (LIKE ... INCLUDING ALL)</code></li>
<li><code>COPY</code> data from old to new table</li>
<li>Rename tables</li>
<li>Drop old table</li>
</ul>
</li>
<li>
<p>Leverage logical replication (minimize service downtime)</p>
<ul class="list">
<li><code>CREATE TABLE ... (LIKE ... INCLUDING ALL)</code></li>
<li>Create a local <code>PUBLICATION</code>/<code>SUBSCRIPTION</code> pair</li>
<li>Once data is synchronized, stop/pause application service</li>
<li>Drop subscription</li>
<li>Rename tables</li>
<li>Restart/resume application</li>
<li>Drop old table</li>
</ul>
</li>
</ol>
<h4 id="what-about-foreign-keys"><a class="heading-anchor" href="#what-about-foreign-keys">What about Foreign Keys?</a></h4>
<p>The above solution <em>works</em> fine for simple cases. But real-life tables often<br>
use integrity constraints. Let’s explore a bit using foreign keys.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Foreign key case</span>

<span class="token operator">=</span><span class="token comment"># CREATE TABLE colors (id int, name text );</span>
<span class="token operator">=</span><span class="token comment"># CREATE TABLE objects ( id int, color_id int, name text );</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE colors ADD PRIMARY KEY (id);</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE objects ADD CONSTRAINT fk_color</span>
                       <span class="token keyword">FOREIGN</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span> <span class="token keyword">REFERENCES</span> colors <span class="token punctuation">(</span>id<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO colors </span>
   <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token string">'red'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'green'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">,</span> <span class="token string">'blue'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO objects </span>
   <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token string">'red object'</span><span class="token punctuation">)</span>
         <span class="token punctuation">,</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'green object'</span><span class="token punctuation">)</span>
         <span class="token punctuation">,</span><span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">,</span><span class="token number">3</span><span class="token punctuation">,</span><span class="token string">'blue object'</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>Let’s apply the recipe:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Duplicate table structure (valid columns only)  and copy data</span>
<span class="token operator">=</span><span class="token comment"># CREATE TABLE tmp_colors (LIKE colors INCLUDING ALL);</span>
<span class="token operator">=</span><span class="token comment"># INSERT INTO tmp_colors SELECT * FROM colors;</span>

<span class="token comment">-- Do the DROP/RENAME trick</span>
<span class="token operator">=</span><span class="token comment"># BEGIN;</span>
<span class="token operator">=</span><span class="token comment"># DROP TABLE colors;</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE tmp_colors RENAME TO colors;</span>
<span class="token operator">=</span><span class="token comment"># COMMIT;</span></code></pre>
<p>The <code>DROP TABLE</code> command fails with an error:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  cannot drop table colors because other objects depend on it
DETAIL:  constraint fk_color on table objects depends on table colors
HINT:  Use DROP ... CASCADE to drop the dependent objects too.</code></pre>
<p>As we can see, the recipe has to be changed to include dependent tables as well.</p>
<p>Adding <code>CASCADE</code> will drop FK constraints on dependent tables.</p>
<p>Let’s run a modified version of the recipe:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Do the DROP/RENAME trick</span>
<span class="token operator">=</span><span class="token comment"># BEGIN;</span>

<span class="token operator">=</span><span class="token comment"># DROP TABLE colors CASCADE;  -- DROP related FOREIGN KEY constraints</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tmp_colors RENAME TO colors;</span>

<span class="token comment">-- Recreate FK constraint</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE objects ADD CONSTRAINT fk_color</span>
                       <span class="token keyword">FOREIGN</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span> <span class="token keyword">REFERENCES</span> colors <span class="token punctuation">(</span>id<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># COMMIT;</span></code></pre>
<p>Let’s check that the behaviour is as expected:</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># INSERT INTO objects VALUES (5,5,'ro');</span>
ERROR:  <span class="token keyword">insert</span> <span class="token operator">or</span> <span class="token keyword">update</span> <span class="token keyword">on</span> <span class="token keyword">table</span> <span class="token string">"objects"</span> violates <span class="token keyword">foreign</span> <span class="token keyword">key</span> <span class="token keyword">constraint</span> <span class="token string">"fk_color"</span>
DETAIL:  <span class="token keyword">Key</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span><span class="token operator">=</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">)</span> <span class="token operator">is</span> <span class="token operator">not</span> present <span class="token operator">in</span> <span class="token keyword">table</span> <span class="token string">"colors"</span><span class="token punctuation">.</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO objects VALUES (5,3,'ro');</span>
<span class="token keyword">INSERT</span> <span class="token number">0</span> <span class="token number">1</span></code></pre>
<p>Success!</p>
<p>When integrity constraints are too numerous or too hard to track by hand,<br>
you can use pg_dump/pg_restore to rebuild everything automatically. If service downtime<br>
is an issue, logical replication can achieve the same result with minimal interruption.</p>
<h2 id="best-is-to-avoid-having-to-deal-with-this"><a class="heading-anchor" href="#best-is-to-avoid-having-to-deal-with-this">Best is to avoid having to deal with this</a></h2>
<p>As you can see, dealing with the 1,600-column limit is not something you would<br>
do just for fun (usually). Notably, it can lead to service downtime.</p>
<h2 id="talk-to-us"><a class="heading-anchor" href="#talk-to-us">Talk to us</a></h2>
<p>Do you have other ideas of how to address this situation? Have you run into odd ways of reaching this hard-coded limit? <a href="https://www.data-bene.io/en/#contact" rel="noopener">Contact us</a>! We always love a good discussion about PostgreSQL.</p>
 ]]></content>
			<author>
				<name>Frédéric Delacourt</name>
			</author>
    </entry>
    <entry>
      <title>Cumulative Statistics in PostgreSQL 18</title>
      <link href="https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/" />
      <updated>2025-09-29T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/</id>
     <content type="html"><![CDATA[ <p>In <strong>PostgreSQL 18</strong>, the statistics &amp; monitoring subsystem receives a significant overhaul - extended cumulative statistics, new per-backend I/O visibility, the ability for extensions to export / import / adjust statistics, and improvements to GUC controls and snapshot / caching behavior. These changes open new doors for performance analysis, cross‑environment simulation, and tighter integration with extensions. In this article I explore what’s new, what to watch out for, Grand Unified Configuration (GUC) knobs, and how extension authors can leverage the new C API surface.</p>
<h2 id="introduction-and-motivation"><a class="heading-anchor" href="#introduction-and-motivation">Introduction &amp; motivation</a></h2>
<p>Statistics (in the broad sense: monitoring counters, I/O metrics, and planner / optimizer estimates) lie at the heart of both performance tuning and internal decision making in PostgreSQL. Transparent, reliable, and manipulable statistics, among other things, allow DBAs to address the efficiency of PostgreSQL directly, as well as enable “extensions” to improve the user experience.</p>
<p>That said, the historic statistics system of PostgreSQL has not been without points of friction. These include limited ability to clear (relations) statistics, metrics with units that don’t always align with user goals, and no C API for using the PostgreSQL Cumulative Stats engine. PostgreSQL 18 addresses these concerns head on.</p>
<p>Below is a summary of the key enhancements.</p>
<h2 id="a-warning-on-stats"><a class="heading-anchor" href="#a-warning-on-stats">A warning on stats</a></h2>
<p>While statistics offer incredible value, collecting them can take significant time and resources. PostgreSQL 18 introduces an important consideration: with the expanded range of collectible metrics, the maximum size of the statistics hash table has been increased. Do keep in mind, especially if you’re designing large-scale systems with table-per-customer architectures, that the 1GB ceiling has been shown to be hit with a few million tables.</p>
<h2 id="whats-new-with-postgresql-18-and-stats"><a class="heading-anchor" href="#whats-new-with-postgresql-18-and-stats">What’s new with PostgreSQL 18 and “stats”</a></h2>
<p>Here are the major new or improved features relating to statistics and monitoring. Each item links to the relevant documentation or code where possible.</p>
<p>Generally, <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW" rel="noopener">pg_stat_io</a> now reports I/O activity in bytes rather than pages, which is more convenient for analysis. Moreover, WAL statistics were moved here from <code>pg_stat_wal</code>, providing a single, comprehensive view.</p>
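<p>For instance, a quick look at the biggest writers per backend type might read like this (a sketch against the PostgreSQL 18 view; column names such as <code>read_bytes</code> and <code>write_bytes</code> follow the documentation linked above):</p>
<pre class="language-sql"><code class="language-sql">SELECT backend_type, object, context,
       reads, read_bytes,
       writes, write_bytes
  FROM pg_stat_io
 ORDER BY write_bytes DESC NULLS LAST
 LIMIT 5;</code></pre>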
<h3 id="upgrades"><a class="heading-anchor" href="#upgrades">Upgrades</a></h3>
<p><a href="https://www.postgresql.org/docs/18/pgupgrade.html" rel="noopener">pg_upgrade</a> is now able to retain optimizer statistics, removing the need to run a full <code>ANALYZE</code> on the databases to get good planning of queries after the upgrade; this is a very welcome update for large databases! Be aware that custom statistics added by an extension along with those created with <a href="https://www.postgresql.org/docs/18/sql-createstatistics.html" rel="noopener">CREATE STATISTICS</a> won’t be retained.</p>
<p>You will surely want to look at new options in <a href="https://www.postgresql.org/docs/18/app-vacuumdb.html" rel="noopener">vacuumdb</a> (<code>--missing-stats-only</code>) to, well, analyze only what’s needed.</p>
<p>On a similar note, the <code>--[no-]statistics</code> flag has been added to <a href="https://www.postgresql.org/docs/18/app-pgdump.html" rel="noopener">pg_dump</a>, <a href="https://www.postgresql.org/docs/18/app-pgdumpall.html" rel="noopener">pg_dumpall</a>, and <a href="https://www.postgresql.org/docs/18/app-pgrestore.html" rel="noopener">pg_restore</a>.</p>
<h3 id="maintenance"><a class="heading-anchor" href="#maintenance">Maintenance</a></h3>
<p>It’s now easier to know the maintenance effort on objects with total time spent on VACUUM and ANALYZE operation (and automatic ones) now reported into <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW" rel="noopener">pg_stat_all_tables</a> and variants.</p>
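<p>For example, to spot the tables where the most maintenance time is spent (a sketch; the <code>total_*_time</code> columns, reported in milliseconds, are the PostgreSQL 18 additions):</p>
<pre class="language-sql"><code class="language-sql">SELECT relname,
       total_vacuum_time  + total_autovacuum_time  AS vacuum_ms,
       total_analyze_time + total_autoanalyze_time AS analyze_ms
  FROM pg_stat_all_tables
 ORDER BY 2 DESC
 LIMIT 10;</code></pre>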
<p>A new GUC not to forget is <a href="https://www.postgresql.org/docs/18/runtime-config-statistics.html#GUC-TRACK-COST-DELAY-TIMING" rel="noopener">track_cost_delay_timing</a>. It collects the time spent sleeping (due to cost-based delays) during <code>VACUUM</code> and <code>ANALYZE</code>. While very interesting, like the other <code>track_io*</code> GUCs it implies a lot of extra calls to the system clock, which on some platforms can cause a severe performance impact. Always check with a tool like <a href="https://www.postgresql.org/docs/18/pgtesttiming.html" rel="noopener">pg_test_timing</a> to ensure your system can afford it!</p>
<p>No more questions about checkpointer activity when using <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-CHECKPOINTER-VIEW" rel="noopener">pg_stat_checkpointer</a>. The new attribute <code>num_done</code> reports the number of <strong>completed</strong> checkpoints. You can also see what kind of buffers were written: <code>slru_written</code> is new, and <code>buffers_written</code> now counts only <code>shared_buffers</code>. Previously the log and the view did not report the same numbers because an SLRU counter was included <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=17cc5f666" rel="noopener">in one case and not the other</a>.</p>
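<p>A quick health check could then look like this (a sketch; <code>num_done</code> and <code>slru_written</code> are the PostgreSQL 18 additions):</p>
<pre class="language-sql"><code class="language-sql">SELECT num_timed,        -- checkpoints triggered by checkpoint_timeout
       num_requested,    -- checkpoints requested (e.g. max_wal_size reached)
       num_done,         -- checkpoints actually completed
       buffers_written,  -- shared_buffers pages written
       slru_written      -- SLRU pages written
  FROM pg_stat_checkpointer;</code></pre>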
<h3 id="analysis"><a class="heading-anchor" href="#analysis">Analysis</a></h3>
<p>Want to know more about the I/O handled by a given backend (PID)? Call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-IO" rel="noopener">pg_stat_get_backend_io(int)</a> and you’ll get output similar to what the <code>pg_stat_io</code> view provides, restricted to that single process. As for the WAL stats of that PID: call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-WAL" rel="noopener">pg_stat_get_backend_wal(int)</a>.</p>
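<p>Combined with <code>pg_stat_activity</code>, this makes per-session I/O profiling a one-liner (a sketch; the function is assumed to return rows shaped like the <code>pg_stat_io</code> view):</p>
<pre class="language-sql"><code class="language-sql">SELECT a.pid, a.usename, io.object, io.context,
       io.read_bytes, io.write_bytes
  FROM pg_stat_activity a,
       LATERAL pg_stat_get_backend_io(a.pid) AS io
 WHERE a.backend_type = 'client backend';</code></pre>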
<p>New attributes <code>parallel_workers_to_launch</code> and <code>parallel_workers_launched</code> were introduced in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW" rel="noopener">pg_stat_database</a>. The ratio lets us know if we have enough slots for parallel workers.</p>
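<p>For instance (a sketch; a ratio well below 100% suggests raising <code>max_parallel_workers</code> or revisiting per-query settings):</p>
<pre class="language-sql"><code class="language-sql">SELECT datname,
       parallel_workers_launched,
       parallel_workers_to_launch,
       round(100.0 * parallel_workers_launched
             / NULLIF(parallel_workers_to_launch, 0), 1) AS launched_pct
  FROM pg_stat_database
 WHERE parallel_workers_to_launch > 0;</code></pre>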
<p>Interesting changes land in <a href="https://www.postgresql.org/docs/18/pgstatstatements.html" rel="noopener">pg_stat_statements</a>: more queries will be grouped under the same identifier. For example, <code>IN (1,2,3, ...)</code> lists are collapsed, as only the first and last constants are used to compute the identifier. A more counter-intuitive change concerns the table names used in a query: only the name is used, not the schema or the relation OID. This allows statistics to survive a table being dropped and recreated, for example, but it will also group statistics from unrelated tables if they merely share the same name. The way to keep separate statistics for same-named tables is to alias them in the queries (<code>FROM my.table mt, other.table ot</code>)…</p>
<p>Finally, <a href="https://www.postgresql.org/docs/18/view-pg-backend-memory-contexts.html" rel="noopener">pg_backend_memory_contexts</a> gains <code>path</code> (to follow parent/child relationships) and <code>type</code> to distinguish <code>AllocSet</code>, <code>Generation</code>, <code>Slab</code> and <code>Bump</code> contexts… and what exactly are <code>Slab</code> and <code>Bump</code>? They are not documented; for these you’ll want to <a href="https://github.com/postgres/postgres/tree/master/src/backend/utils/mmgr" rel="noopener">read the headers of the C files here</a>. They exist to optimize memory allocation, reallocation, and reset, depending on the expected memory usage pattern. For example, <code>Slab</code> is defined as a «MemoryContext implementation designed for cases where large numbers of equally-sized objects can be allocated and freed efficiently with minimal memory wastage and fragmentation».</p>
<p>Ah, no, a last one, <code>wal_buffers_full</code> was added to <code>pg_stat_statements</code> to allow us to tune for <code>wal_buffers</code> with better insights.</p>
<h3 id="replication"><a class="heading-anchor" href="#replication">Replication</a></h3>
<p>There are now better insights for conflict management when using logical replication that leverage new attributes in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS" rel="noopener">pg_stat_subscription_stats</a>. As reference, this excerpt from <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6c2b5edec" rel="noopener">the commit entry</a> lists the following attributes that were introduced:</p>
<ul class="list">
<li>
<p><code>confl_insert_exists</code>:<br>
Number of times a row insertion violated a NOT DEFERRABLE unique<br>
constraint.</p>
</li>
<li>
<p><code>confl_update_origin_differs</code>:<br>
Number of times an update was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_update_exists</code>:<br>
Number of times that the updated value of a row violates a<br>
NOT DEFERRABLE unique constraint.</p>
</li>
<li>
<p><code>confl_update_missing</code>:<br>
Number of times that the tuple to be updated is missing.</p>
</li>
<li>
<p><code>confl_delete_origin_differs</code>:<br>
Number of times a delete was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_delete_missing</code>:<br>
Number of times that the tuple to be deleted is missing.</p>
</li>
</ul>
<h3 id="advanced"><a class="heading-anchor" href="#advanced">Advanced</a></h3>
<p>There is now a <a href="https://www.postgresql.org/docs/18/functions-admin.html#FUNCTIONS-ADMIN-STATSMOD" rel="noopener">new set of functions</a> to manage relation and attributes stats (<code>relpages</code>, <code>avg_width</code>, and so on). This gives you the freedom to export, import, and adjust stats as you want, so you can replicate planner behavior outside of “production”, maintain patched stats, and so on.</p>
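<p>For example, to make the planner believe a small development table is production-sized (a sketch; the name/value pair arguments follow the function reference linked above, and a plain <code>ANALYZE</code> restores the real figures):</p>
<pre class="language-sql"><code class="language-sql">-- Pretend 'objects' holds five million rows spread over 100k pages.
SELECT pg_restore_relation_stats(
         'schemaname', 'public',
         'relname',    'objects',
         'relpages',   100000::integer,
         'reltuples',  5000000::real);

-- Inspect plans, then revert to collected statistics:
ANALYZE objects;</code></pre>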
<h3 id="my-favorite-for-extension-authors-the-new-c-stats-api"><a class="heading-anchor" href="#my-favorite-for-extension-authors-the-new-c-stats-api">My favorite for extension authors: the new C stats API</a></h3>
<p>One of the most exciting parts is what PostgreSQL 18 <em>opens up</em> for extension authors.</p>
<p>This tiny line at the bottom of section <a href="https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MODULES" rel="noopener">E.1.3.9 Modules</a> is all the release notes say about these changes:</p>
<blockquote>
<p>Allow extensions to use the server’s cumulative statistics API (Michael Paquier)</p>
</blockquote>
<p>Previously statistics manipulation was an internal-only affair; now there is an official, structured API surface you can build on (or wrap).</p>
<p>The <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7949d9594" rel="noopener">commit message</a> is well written, and covers most of the new functionality. A subset of the options is <a href="https://www.postgresql.org/docs/18/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS" rel="noopener">detailed in the documentation</a>. However, you will need to go into source code to know more at this stage; in particular, it’s worth having a look at the <code>injection points</code> extension (provided in core) which uses the new API.</p>
<p>For a deeper dive into how an extension can leverage these new capabilities, soon you will be able to see <strong>PACS (PostgreSQL Advanced Cumulative Statistics)</strong> on Codeberg - my project that provides a wrapper library and helper utilities around the new PostgreSQL 18 statistics APIs.</p>
<p>In the meantime, the talk I gave at <a href="https://archive.fosdem.org/2025/schedule/event/fosdem-2025-4496-stats-roll-baby-stats-roll-/" rel="noopener">FOSDEM 2025</a> explores these topics in greater detail.</p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
    <entry>
      <title>Most Desired Database Three Years Running: PostgreSQL's Developer Appeal</title>
      <link href="https://www.data-bene.io/en/blog/most-desired-database-three-years-running-postgresqls-developer-appeal/" />
      <updated>2025-08-09T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/most-desired-database-three-years-running-postgresqls-developer-appeal/</id>
     <content type="html"><![CDATA[ <p>PostgreSQL is having more than just a moment—it’s establishing a clear pattern of sustained excellence. For the third consecutive year, this community-driven database has claimed the top spot in the 2025 results for <a href="https://survey.stackoverflow.co/2025/" rel="noopener">Stack Overflow’s Annual Developer Survey</a>, and the results reveal both what developers value today and where the database landscape is heading.</p>
<p>The survey results show that, for the third year in a row, PostgreSQL ranks highest among all database technologies both for developers who want to use it in the next year (47%) and for those who have used it this year and want to continue using it next year (66%).</p>
<h2 id="the-numbers-tell-a-compelling-story"><a class="heading-anchor" href="#the-numbers-tell-a-compelling-story"><strong>The Numbers Tell a Compelling Story</strong></a></h2>
<p>The survey data from over 49,000 developers across 177 countries provides clear evidence of PostgreSQL’s sustained appeal. Since 2023, PostgreSQL has consistently ranked as both the most desired and most admired database technology among developers.</p>
<p>Looking at the specific metrics from the survey visualizations, PostgreSQL leads with 46.5% of developers wanting to work with it in the coming year, while an impressive 65.5% of those who have used it want to continue doing so. These aren’t just impressive numbers—they represent a consistency that’s rare in the rapidly changing technology landscape.</p>
<p>The survey data also reveals an interesting pattern among developers currently using other database technologies. Developers working with MongoDB and Redis show a particularly strong desire to add PostgreSQL to their toolkit next year, seeing the value in adding relational database skills to their repertoire.</p>
<h2 id="the-community-advantage-in-action"><a class="heading-anchor" href="#the-community-advantage-in-action"><strong>The Community Advantage in Action</strong></a></h2>
<p>Why has PostgreSQL achieved this level of sustained success? The answer lies in its community-driven development model. As an open source project, PostgreSQL benefits from collaborative development that is both transparent and responsive to real-world needs.</p>
<p>The PostgreSQL project represents the best of what community-driven development can achieve. With over 400 code contributors across more than 140 supporting companies, the project boasts over 55,000 commits and more than 1.6 million lines of carefully crafted code. This diverse, globally distributed approach to development results in more thorough testing, faster bug fixes, and more innovative features than traditional commercial development models typically achieve.</p>
<p>Major versions are released annually with approximately 180 features per release, complemented by quarterly minor releases that include numerous improvements and fixes. This steady cadence of innovation consistently contributed by individuals all over the world ensures PostgreSQL doesn’t just keep pace with developer needs—it anticipates them. More than that, every individual has the agency to contribute to the project to ensure that anywhere the software is lagging behind, functionality changes to address modern demands.</p>
<h2 id="more-than-just-a-relational-database"><a class="heading-anchor" href="#more-than-just-a-relational-database"><strong>More Than Just a Relational Database</strong></a></h2>
<p>One key factor in PostgreSQL’s broad appeal is that it’s not limited to being just a relational database system. PostgreSQL is object-relational by design, capable of handling diverse data types including JSON/JSONB, XML, Key-Value, geometric, geospatial, native UUID, and time-series data. This versatility explains why developers from NoSQL backgrounds find PostgreSQL attractive—it offers relational reliability while maintaining the flexibility they’re accustomed to.</p>
<p>The extensive support for different data types, combined with ACID (Atomicity, Consistency, Isolation, Durability) characteristics, enables optimized, performant, and reliable data handling regardless of the specific requirements in place. Additionally, PostgreSQL’s huge community-driven extension network builds on its native extensibility, providing solutions for geospatial handling, disaster recovery, high availability infrastructure, monitoring, auditing, and much more.</p>
<h2 id="the-broader-database-landscape"><a class="heading-anchor" href="#the-broader-database-landscape"><strong>The Broader Database Landscape</strong></a></h2>
<p>While PostgreSQL dominates the top positions, the survey reveals a healthy, competitive database ecosystem. The complete rankings show:</p>
<p><strong>Most Desired Databases:</strong></p>
<ul class="list">
<li>PostgreSQL: 46.5%</li>
<li>SQLite: 28.3%</li>
<li>Redis: 23.5%</li>
<li>MySQL: 20.5%</li>
<li>MongoDB: 17.6%</li>
</ul>
<p><strong>Most Admired Databases:</strong></p>
<ul class="list">
<li>PostgreSQL: 65.5%</li>
<li>SQLite: 59%</li>
<li>Redis: 54.9%</li>
<li>MongoDB: 45.7%</li>
<li>MySQL: 43.2%</li>
</ul>
<p>These numbers reflect a diverse ecosystem where different databases serve specific purposes. SQLite’s strong performance highlights the continued importance of lightweight, embedded solutions. Redis maintains its position as a highly regarded specialized database for caching and real-time applications. Traditional databases like MySQL and Microsoft SQL Server continue to hold significant positions, while newer technologies like DuckDB show impressive admiration scores despite lower usage rates.</p>
<h2 id="the-foundation-of-postgresqls-enduring-success"><a class="heading-anchor" href="#the-foundation-of-postgresqls-enduring-success"><strong>The Foundation of PostgreSQL’s Enduring Success</strong></a></h2>
<p>Three consecutive years at the top of developer preferences doesn’t happen by accident. PostgreSQL’s sustained dominance stems from fundamental strengths that continue to serve developers well as technology landscapes shift. The resilience built into PostgreSQL through its community-driven development model means it adapts without losing stability. Its extensibility sets it apart in practical ways—rather than waiting for vendor roadmaps or worrying about feature gaps, developers can build what they need or leverage the extensive ecosystem of community extensions. The open source nature ensures PostgreSQL remains focused on developer needs rather than business models, with bug fixes happening quickly and features developing based on real-world use cases.</p>
<p>After 35 years of active development and three consecutive years as the most desired database technology, PostgreSQL has proven that community-driven open source development can deliver both immediate utility and long-term value. For developers and organizations looking at their database choices, PostgreSQL offers something increasingly rare: a technology that gets better over time without leaving its users behind.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Return of HOW2025</title>
      <link href="https://www.data-bene.io/en/blog/return-of-how2025/" />
      <updated>2025-07-15T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/return-of-how2025/</id>
     <content type="html"><![CDATA[ <p>The Highgo Open World conference, dedicated to the PostgreSQL ecosystem and the IvorySQL project, was held on June 27 and 28. The event was a resounding success: nearly 1,000 attendees on site, up to 8,000 simultaneous connections to the streams, and approximately 25,000 viewers in total.</p>
<p>The program featured 101 technical talks led by 105 speakers. The majority of sessions were in Mandarin, with an English track offered with simultaneous translation. You can view the full program at <a href="https://ivorysql.io/schedule/" rel="noopener">IvorySQL.io</a> and find the replays on Weibo via <a href="https://ivorysql.io/2025/06/27/live-access-june27/" rel="noopener">this link.</a></p>
<p>There was a small group of the international community present, including <a href="https://postgresql.life/post/grant_zhou/" rel="noopener">Grant Zhou</a>, who liaises and collaborates with the PostgreSQL association in China and the rest of the world.</p>
<p>On a technical level, the content was dense and particularly interesting. I particularly noted:</p>
<ul class="list">
<li>
<p>Alena Rybakina’s presentation on the PostgreSQL query planner and strategies for circumventing certain limitations.</p>
</li>
<li>
<p>A clear and concrete focus on Patroni (High Availability) by Alexander Kukushkin and Polina Bungina.</p>
</li>
<li>
<p>Florents Tselai presented two talks applying his principles of simplicity and efficiency: one on using “AI” with PostgreSQL, and one on data management, with Sun Tzu and the 36 Stratagems as a backdrop.</p>
</li>
<li>
<p>Also a very good introduction to Bazel and its use for Monogres (to be officially announced soon) presented energetically by Alvaro Hernandez.</p>
<p>Monogres is a very interesting initiative that should help strengthen control over the software supply chain, a major theme in IT today. And I also saw it as a great opportunity to showcase PostgreSQL variations with features and fixes that aren’t always possible to include in PostgreSQL itself or backport to previous major releases.</p>
</li>
<li>
<p>Michael Meskes had the honor of giving a plenary lecture on a topic that richly deserves it: “From Code to Commerce: Open Source Business Models Today,” a keynote on open-source and free software, applied to the PostgreSQL ecosystem.</p>
</li>
<li>
<p>My colleague Andrea presented the developments and trends of companies moving to IvorySQL and PostgreSQL.</p>
</li>
<li>
<p>For my part, I presented Linux PSI in the PostgreSQL context.</p>
</li>
</ul>
<p>Since everything is recorded, I encourage you to explore and watch the topics that interest you. There were also pre-recorded lectures in English during the event, but I admit that I took advantage of the time during the sessions to interact with participants.</p>
<p>Aside from the conferences, I had the chance to meet several members of the Chinese PostgreSQL community who are very well-known for their involvement in the success of PostgreSQL locally. I also had the opportunity to learn more about Cloudberry, the replacement for Greenplum, thanks to Dianjin Wang!</p>
<hr>
<p>Data Bene also planned a time to meet with the IvorySQL team, based largely in Shandong, the province where Jinan, the host city of the conferences, is located. Ivory is a project in which we are actively involved and which allows companies to move away from Oracle “smoothly.” This is an important topic for our clients and one that occupies a prominent place in our partnership with Highgo: they have been working on this project for several years now, and we want to enable companies everywhere to benefit from it with appropriate support and expertise.</p>
<hr>
<p>The conferences were very well organized and the welcome was wonderful; the “Social Event” at the local “beer garden” was perfectly suited to the heat of Jinan at the end of June!</p>
<p>Given the conference program, I bitterly regretted not understanding anything (there was 1 track in English and 5 in Mandarin)… but it is already being said that next year the Mandarin conferences could perhaps be translated (into English), and the date brought forward to May to take advantage of a milder climate.</p>
<p>There is so much to learn there that I will gladly return.</p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
    <entry>
      <title>A visit to PGConf.DE 2025 and discussion of PostgreSQL within the context of life sciences</title>
      <link href="https://www.data-bene.io/en/blog/a-visit-to-pgconfde-2025-and-discussion-of-postgresql-within-the-context-of-life-sciences/" />
      <updated>2025-06-06T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/a-visit-to-pgconfde-2025-and-discussion-of-postgresql-within-the-context-of-life-sciences/</id>
     <content type="html"><![CDATA[ <p>It’s always a pleasure to attend Postgres events, and <a href="http://PGConf.DE" rel="noopener">PGConf.DE</a> 2025 in Berlin was no different. This year’s event reunited old friendships and offered an open and welcoming environment to form new ones. And, of course, it also boasted numerous exciting talks!</p>
<p>At the conference I had the opportunity to present on Postgres within the context of the life sciences (discussed in the next section). And, altogether, I felt this conference had a nice diversity of talks: a selection that spanned Postgres core, its ecosystem, and beyond.</p>
<p>I’m confident that by the end, most if not all attendees left more enriched in some way relative to when they arrived.</p>
<h2 id="presentations"><a class="heading-anchor" href="#presentations">Presentations</a></h2>
<p>Leading up to this event I had the honor of one of my talks being accepted. The title was “<a href="https://www.postgresql.eu/events/pgconfde2025/schedule/session/6541-postgres-and-life-science-from-cells-to-stars/" rel="noopener">Postgres and Life Science: From Cells to Stars</a>” and it was organized as a meta-analysis / homage to the extensibility of Postgres and its various applications to the natural world.</p>
<p>In order to best tell this story, I walked the audience through the following five topics of increasing scope:</p>
<ul class="list">
<li>Neuronal mapping with a PostGIS-supported GUI
<ul class="list">
<li><a href="https://github.com/catmaid/CATMAID" rel="noopener">CATMAID source code</a></li>
</ul>
</li>
<li>Hydrological examination of rivers with the PgHydro extension
<ul class="list">
<li><a href="https://github.com/pghydro/pghydro" rel="noopener">PgHydro source code</a></li>
</ul>
</li>
<li>Fish biomass meta-analysis leveraging vanilla Postgres
<ul class="list">
<li><a href="https://www.nature.com/articles/s41597-024-04026-0" rel="noopener">Link to peer-reviewed publication</a></li>
</ul>
</li>
<li>COVID-19 dashboard using the Citus extension
<ul class="list">
<li><a href="https://www.citusdata.com/blog/2021/12/11/uk-covid-19-dashboard-built-using-postgres-and-citus/" rel="noopener">Link to blog post</a></li>
</ul>
</li>
<li>Star classification built on forked Postgres and altered extensions
<ul class="list">
<li><a href="https://indico.cern.ch/event/1471762/contributions/6280216/" rel="noopener">Link to presentation at CERN PGDay 2025</a></li>
</ul>
</li>
</ul>
<p>I enjoyed putting together and presenting the talk, and there was nice discussion afterwards. Two points stood out in particular that I felt would be interesting to address here:</p>
<ol class="list">
<li>
<p>What three technologies (tools / workflows) would benefit most greatly, in terms of increased impact or adoptability, if their complexities were significantly reduced / abstracted away?</p>
</li>
<li>
<p>During my talk I made a claim that the brain was ACID compliant. While I was referring mostly to the action potentials of neurons, this was rightfully challenged.</p>
</li>
</ol>
<h3 id="1-identified-tools-/-workflows"><a class="heading-anchor" href="#1-identified-tools-/-workflows">1. Identified Tools / Workflows</a></h3>
<p><em>1. What three technologies (tools / workflows) would benefit most greatly, in terms of increased impact or adoptability, if their complexities were significantly reduced / abstracted away?</em></p>
<h4 id="1-image-vectorization"><a class="heading-anchor" href="#1-image-vectorization">1. Image vectorization</a></h4>
<p>Right out of the gate I thought about magnetic resonance scanner image classification. There’s quite a lot of conversation surrounding this topic within the medical community, and there are plenty of startups in this space as well. My personal opinion is that there is momentum in the direction of accessibility, but there is still a strong separation between developer and end user. While I don’t know the answer at this point, I would look into <a href="https://github.com/pgvector/pgvector" rel="noopener">pgvector</a> and <a href="https://github.com/postgresml/postgresml" rel="noopener">postgresml</a> as a starting point. Because this challenge involves vectors and machine learning, I would consider leveraging an image embedding service to turn the raw MRI output into a format that pgvector might be able to work with.</p>
<h4 id="2-data-management-and-version-control"><a class="heading-anchor" href="#2-data-management-and-version-control">2. Data management and version control</a></h4>
<p>As a former academic, I can speak to the ubiquity of the common spreadsheet (the .csv format being less common, but still utilized). What’s more, files are typically stored in local directories, on a private server, or on shared infrastructure, but in any case in a vanilla folder hierarchy. One can imagine the potential frictions as the conversation scales to include multiple researchers across multiple groups. Factor in naturally high student turnover, paired with an “I like doing it my way” mentality, and one can appreciate the value of standards. While improvements could be approached from a number of different angles, I’d like to focus on data management and version control.</p>
<p>Tidy data and good organizational hygiene are hallmarks of success in any field of study. However, tracking changes is most often, if not exclusively, limited to text documents. While it might be surprising to the reader, “code repository” is not part of the common academic lexicon. Even the term “Linux” evokes an air of “mysterium tremendum et fascinans” (Otto, 1923). With data security top of mind, self-hosted options such as <a href="https://forgejo.org/" rel="noopener">forgejo</a> could potentially benefit life scientists greatly, particularly if there are reservations about storing data online. Instead of juggling multiple file drafts, e.g., “draft-1_final”, “draft_final_final”, etc., tools such as forgejo can help track progress and give researchers more transparency into past changes (leading to easier cross-team collaboration).</p>
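<p><em>To make the idea concrete, here is a toy Python sketch of the core concept behind such tools: snapshotting content under a stable identifier so every draft remains retrievable. This is purely illustrative (real systems like git, which forgejo hosts, add history graphs, diffs, branches, and collaboration), and the file contents and helper names are made up.</em></p>

```python
import hashlib

# Toy content-addressed snapshot store: each version of a file is kept
# under a short hash of its content, so no draft is ever overwritten.
store = {}    # snapshot id -> content
history = []  # snapshot ids in the order they were taken

def snapshot(content: str) -> str:
    """Save one version of a document and return its stable id."""
    sid = hashlib.sha256(content.encode()).hexdigest()[:12]
    store[sid] = content
    history.append(sid)
    return sid

# Two drafts of the same dataset: both stay retrievable by id,
# with no need for "draft-1_final" / "draft_final_final" filenames.
v1 = snapshot("species,count\nsalmon,42\n")
v2 = snapshot("species,count\nsalmon,45\n")  # a later correction
```

<p><em>Looking up <code>store[v1]</code> always returns the original draft, and <code>history</code> records the order in which versions were taken; nothing is ever silently replaced.</em></p>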
<h4 id="3-compliance-and-auditing"><a class="heading-anchor" href="#3-compliance-and-auditing">3. Compliance and auditing</a></h4>
<p>Trust is a central topic in any field of research, and in certain circumstances, auditing (or otherwise some form of proof of work) may take center stage. In this case, Postgres and one of its companion extensions, <a href="https://github.com/pgaudit/pgaudit" rel="noopener">pgaudit</a>, can offer a nice step towards compliance. Because of its breadth of capabilities, Postgres can sometimes be viewed as intimidating and only suitable for large projects. I think a “Postgres for small-scale projects” type of guide could go a long way toward broader adoption.</p>
<h4 id="discovery-and-exposure"><a class="heading-anchor" href="#discovery-and-exposure">Discovery and exposure</a></h4>
<p>At the end of the day, no one will willingly use something unless they know it exists. That’s why discoverability is one of the most fundamental concepts when discussing impact and adoptability. It’s up to the maintainers, contributors, and communities behind these open source tools to share what they’re up to on multiple platforms, as well as different conferences. Honestly, the easiest way to help is to just talk about it and get hands-on.</p>
<h3 id="2-the-brain-and-acid-compliance"><a class="heading-anchor" href="#2-the-brain-and-acid-compliance">2. The Brain and ACID Compliance</a></h3>
<p><em>2. During my talk I made a claim that the brain was ACID compliant. While I was referring mostly to the action potentials of neurons, this was rightfully challenged.</em></p>
<p>This was another exciting conversation in the post-presentation discussion, and while this really warrants its own blog post, I wanted to quickly share my thoughts. Within one of my slides, I made the claim that the brain is ACID compliant, at least in the sense of transactions being all-or-nothing. Neurons, which are a common cell type in the brain, have a characteristic whereby they receive signals which accumulate until a threshold is reached, at which point the neuron sends a signal of its own, or “fires.” This is a gross oversimplification: here’s a quick <a href="https://en.wikipedia.org/wiki/Action_potential" rel="noopener">Wikipedia link</a> for more information.</p>
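<p><em>As a toy illustration of that all-or-nothing behaviour, here is a caricature in Python (an integrate-and-fire sketch; the threshold, leak, and input values are invented for illustration, and this is in no way a biophysical model):</em></p>

```python
def fires(inputs, threshold=1.0, leak=0.1):
    """Accumulate incoming signals; the neuron either reaches the
    threshold and fires fully, or nothing observable happens at all."""
    potential = 0.0
    for signal in inputs:
        # Leak a little charge between inputs, then add the new signal.
        potential = max(0.0, potential - leak) + signal
        if potential >= threshold:
            return True   # all-or-nothing: the "transaction" commits
    return False          # sub-threshold input leaves no partial spike
```

<p><em>Sub-threshold input such as <code>fires([0.2, 0.2, 0.2])</code> yields <code>False</code>, while <code>fires([0.4, 0.4, 0.4])</code> accumulates past the threshold and yields <code>True</code>; there is no in-between, which is the loose analogy to transactional atomicity.</em></p>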
<p>However, astute audience members rightly noted that the brain is complex and has different regions. There is memory loss, and there are activities that can alter function and consciousness. But to what extent do external influences on the brain correspond to those on a database system? If something corrupts a Postgres database, it is no longer ACID compliant, but it was beforehand. All these points are valid and thought-provoking, and I look forward to reflecting on them and writing a more formal response.</p>
<h2 id="concluding-thoughts"><a class="heading-anchor" href="#concluding-thoughts">Concluding Thoughts</a></h2>
<p>To sum things up, this was a great conference. I know I speak for all attendees when I extend a thank you to all involved, whether they be staff, volunteers, speakers, or otherwise.</p>
<h2 id="references"><a class="heading-anchor" href="#references">References</a></h2>
<p>Foote, K. J., Grant, J. W. A., &amp; Biron, P. M. (2024). A global dataset of salmonid biomass in streams. Scientific data, 11(1), 1172. <a href="https://doi.org/10.1038/s41597-024-04026-0" rel="noopener">https://doi.org/10.1038/s41597-024-04026-0</a></p>
<p>Giordano, C., &amp; Hadjibagheri, P. (2021, December 11). UK COVID-19 dashboard built using Postgres and Citus for millions of users. Microsoft TechCommunity Blog. <a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/uk-covid-19-dashboard-built-using-postgres-and-citus-for/ba-p/3039052" rel="noopener">https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/uk-covid-19-dashboard-built-using-postgres-and-citus-for/ba-p/3039052</a></p>
<p>Kazimiers, T., et al. (2021). CATMAID (Collaborative Annotation Toolkit for Massive Amounts of Image Data) [Computer software]. GitHub. <a href="https://github.com/catmaid/CATMAID" rel="noopener">https://github.com/catmaid/CATMAID</a></p>
<p>Krefl, D., &amp; Nienartowicz, K. (2025, January 17). Harnessing Postgres and HPC for petabyte-scale variable star classification in astronomy [Conference presentation]. CERN PGDay 2025, Geneva, Switzerland. <a href="https://indico.cern.ch/event/1336647/contributions/5660229/" rel="noopener">https://indico.cern.ch/event/1336647/contributions/5660229/</a></p>
<p>Otto, R. (1923). The idea of the holy: An inquiry into the non-rational factor in the idea of the divine and its relation to the rational (J. W. Harvey, Trans.). Oxford University Press. (Original work published 1917)</p>
<p>Teixeira, A. de A., &amp; PgHydro Project. (2022). pghydro (Version 6.6) [Computer software]. GitHub. <a href="https://github.com/pghydro/pghydro" rel="noopener">https://github.com/pghydro/pghydro</a></p>
<p>Wikipedia contributors. (2025, May 16). Action potential. Wikipedia, The Free Encyclopedia. <a href="https://en.wikipedia.org/wiki/Action_potential" rel="noopener">https://en.wikipedia.org/wiki/Action_potential</a></p>
 ]]></content>
			<author>
				<name>Evan Stanton</name>
			</author>
    </entry>
    <entry>
      <title>SCaLE 22x: Bringing the Open Source Community to Pasadena</title>
      <link href="https://www.data-bene.io/en/blog/scale-22x-bringing-the-open-source-community-to-pasadena/" />
      <updated>2025-06-02T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/scale-22x-bringing-the-open-source-community-to-pasadena/</id>
     <content type="html"><![CDATA[ <p>The Southern California Linux Expo (SCaLE) 22x, recognized as being North America’s largest community-run open source and free software conference, took place at the Pasadena Convention Center from March 6-9, 2025. <em>When I say community-run, I mean it—no corporate overlords dictating the agenda, just pure open source enthusiasm driving four days of technical discussions and collaboration.</em></p>
<p>This year’s conference focused on the topics of AI, DevOps and cloud-native technologies, open source community engagement, security and compliance, systems and infrastructure, and FOSS @ home (exploring the world of self-hosted applications and cloud services).</p>
<p>The conference drew attendees from around the world to talk about everything open-source, revolving around Linux at the core (of course) while continuing the discussion across topics such as embedded systems &amp; IoT. As always, every space offered that unique blend of cutting-edge tech talk and practical problem-solving that makes SCaLE special.</p>
<h2 id="herding-elephants-postgresql@scale22x"><a class="heading-anchor" href="#herding-elephants-postgresql@scale22x"><strong>Herding Elephants: PostgreSQL@SCaLE22x</strong></a></h2>
<p>PostgreSQL@SCaLE22x ran as a dedicated two-day, two-track event on March 6-7, 2025, recognized under the PostgreSQL Global Development Group community event guidelines. The selection team included Gabrielle Roth, Joe Conway, and Mark Wong, ensuring the quality you’d expect from the PostgreSQL community.</p>
<p>The speaker lineup was impressive: Magnus Hagander, Christophe Pettus, Peter Farkas, Devrim Gündüz, Hamid Akhtar, Henrietta Dombrovskaya, Shaun Thomas, Gülçin Yıldırım Jelínek &amp; Andrew Farries, Nick Meyer, and Jimmy Angelakos. One particularly memorable session was titled “Row-Level Security Sucks. Can We Make It Usable?”—a refreshingly honest take on PostgreSQL’s RLS feature that probably resonated with more than a few database administrators in the audience.</p>
<p>The community “Ask Me Anything” panel was hosted by Stacey Haysler and featured Christophe Pettus, Devrim Gündüz, Jimmy Angelakos, Magnus Hagander, and Mark Wong. These sessions are where the real knowledge transfer happens—no marketing speak, just practitioners talking shop about PostgreSQL internals, performance, best practices, and the future of the database.</p>
<p>Behind the scenes, volunteers Derya Gumustel, Erika Miller, Hamid Akhtar, Jennifer Scheuerell, Mark Wong, and Roberto Mello kept everything running smoothly, with PGUS hosting the booth in the expo hall.</p>
<p>Personally, I had the pleasure of collaborating with Jimmy Angelakos during his <a href="https://vyruss.org/blog/scale-22x-live-streams-row-level-security-sucks.html" rel="noopener">live streaming sessions</a> featuring other guests like Henrietta Dombrovskaya, Mark Wong, Gülçin Yıldırım Jelínek, and even a brief cameo from Devrim Gündüz.</p>
<p><em>One of the topics discussed with Gülçin Yıldırım Jelínek on the podcast is whether or not there’s any community interest in continuing <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>. What do you think? Do you want to see more episodes from this podcast series, expanding discussions on extension and open source development to the rest of the community and beyond? Let us know: <a href="mailto:contact@data-bene.io">contact@data-bene.io</a></em></p>
<h2 id="something-for-everyone"><a class="heading-anchor" href="#something-for-everyone"><strong>Something for Everyone</strong></a></h2>
<p>There were a number of co-located events besides PostgreSQL @ SCaLE, including “SCaLE: The Next Generation (TNG)”, a youth-focused tech event offering interactive activities and presentations for students, and the annual Cybersecurity Capture the Flag (CTF) game presented by Cal Poly FAST and Pacific Hackers.</p>
<p>SCaLE remains an excellent place to network when looking to advance your career in open source. Socializing at the booths is always an excellent way to make connections and find opportunities, of course, but Open Source Career Day also returned in order to offer a dedicated space for professionals and aspiring technologists to become empowered with resources, tools, real-world examples, and engaging content from presentations and workshops.</p>
<p>The fun tradition of holding a Saturday Game Night with food &amp; drinks also continued this year, with Trivia Night (presented by Uncoded) and other fun activities such as inflatable axe throwing, nerf target practice, arts &amp; crafts, a board game room, casino night, &amp; a blocks room for building derby cars, playing pictionary, or building with large blocks.</p>
<h2 id="keep-your-calendar-open-for-scale-23x"><a class="heading-anchor" href="#keep-your-calendar-open-for-scale-23x"><strong>Keep Your Calendar Open for SCaLE 23x</strong></a></h2>
<p>SCaLE has established itself as a consistent presence in Pasadena, and this stability has allowed the conference to build lasting relationships with the local community and venues. Keep an eye out for SCaLE 23x announcements - it promises to be well worth the visit.</p>
<p>For those interested in PostgreSQL@SCaLE specifically, stay tuned to the PostgreSQL mailing lists for announcements about volunteering, speaking opportunities, or other ways to participate in next year’s event. The PostgreSQL track and booth are a consistent source of engaging discussions amongst those in the Postgres community and beyond, reflecting the database’s growing adoption across industries.</p>
<h2 id="the-open-source-gathering-for-one-and-all"><a class="heading-anchor" href="#the-open-source-gathering-for-one-and-all"><strong>The Open Source Gathering for One and All</strong></a></h2>
<p>In a world where many tech conferences feel more like elaborate vendor showcases, SCaLE remains that rare gathering where community comes first, collaboration is genuine, and the technology discussions are driven by practitioners solving real problems. Mark your calendars for SCaLE 23x—this is one conference that consistently delivers on its promise of bringing together open source enthusiasts to actually collaborate and learn.</p>
<p>Wish you hadn’t missed out? You can always check out the <a href="https://www.youtube.com/playlist?list=PLh1QjGnfC2eREVHe8shz8Db7jGJvZerCK" rel="noopener">YouTube playlist of talks</a> that were recorded during the conference to at least benefit from the knowledge contained therein.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Contributing to Open Source</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-contributing-to-open-source/" />
      <updated>2025-03-04T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-contributing-to-open-source/</id>
     <content type="html"><![CDATA[ <p>It’s our sixth episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a collaborative podcast from <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> &amp; <a href="https://xata.io/" rel="noopener">Xata</a> where we discuss everything from PostgreSQL extensions to community contributions. In today’s episode, Sarah Conway &amp; Gülçin Yıldırım Jelínek meet with Andrea Cucciniello on the topic of how companies and individuals can contribute to open source projects, and why they might consider doing so.</p>
<h2 id="episode-6-postgresql-extension-development-the-community-and-beyond"><a class="heading-anchor" href="#episode-6-postgresql-extension-development-the-community-and-beyond">Episode 6: PostgreSQL Extension Development, The Community, &amp; Beyond</a></h2>
<p>How often do companies express interest in open-source contribution? Clearly, by helping out in any way, the open-source project itself sees a benefit. But are there any advantages for the company that is giving back in any way? What are some contribution methods that a company can consider? These are all questions we hear about constantly—so let’s explore some of the answers discussed in this episode in a quick recap.</p>
<h3 id="giving-back-to-open-source-projects-and-communities"><a class="heading-anchor" href="#giving-back-to-open-source-projects-and-communities">Giving back to open source projects &amp; communities</a></h3>
<p>At Data Bene, we have a few customers that are interested in developing features or enhancements for the PostgreSQL ecosystem already.</p>
<p>These companies are interested in addressing bugs and adding new features that complement their use cases and tech stacks across PostgreSQL, Citus Data, and related technologies to accomplish two things:</p>
<ol class="list">
<li>To get the functionality they need built natively into the upstream software and transparently maintained by the greater open-source community, and</li>
<li>To ensure others who have a similar use case are able to leverage these benefits as well.</li>
</ol>
<p>Times change; the only way the upstream software will remain relevant, useful, and beneficial to the global audience using it is if there are global contributions back to it, ensuring it still meets real users’ needs from year to year.</p>
<h3 id="why-support-open-source-projects"><a class="heading-anchor" href="#why-support-open-source-projects">Why support open-source projects?</a></h3>
<p>Vendor lock-in is a huge problem in the software &amp; services industry; giving back to open-source projects ensures that technology that is openly developed can continue to be so. Using FOSS technology means you avoid investing in a company that might close the code or restrict access, giving the end user freedom to continue using and developing essential tools that are part of their tech stack.</p>
<p>This kind of software is also subject to a highly visible development process, meaning it is much harder for privacy invasions, cybersecurity vulnerabilities, and more to be built into the underlying code.</p>
<p>Additionally, open-source software is built by individuals all over the world with a variety of perspectives and backgrounds; this ensures that it is thoroughly tested, with a wide range of features built-in that are <em>actually useful</em> to many end-users. This helps these kinds of projects to be successful for a number of years and continue to be so as long as there is a community willing to support each of them.</p>
<p><em>Case-in-point: PostgreSQL has been around for 35+ years of active development and is still topping developer surveys and charts today for being the most liked, most used, and most popular database solution—worldwide!</em></p>
<h3 id="how-can-companies-best-support-open-source-projects"><a class="heading-anchor" href="#how-can-companies-best-support-open-source-projects">How can companies best support open-source projects?</a></h3>
<p>There are a few key ways to achieve this end-goal:</p>
<ol class="list">
<li><strong>Include code contributions as part of your engineers’ working time.</strong> By allocating developer time to working on upstream code, you ensure that the technology you leverage (to provide support and/or services, to power your product, or for your infrastructure to depend on) sees improved performance, expanded functionality, and faster resolution of issues and bugs.</li>
<li><strong>Consider developing extensions.</strong> Creating and maintaining extensions allows companies to add specialized features or address certain use cases without altering the core codebase. In the case of PostgreSQL in particular, this extensibility allows Postgres to meet the needs of different industries, users, and businesses, with a versatile and strong ecosystem. This kind of modular system lets PostgreSQL evolve without an overcomplicated core, making the project as a whole easier to manage and update.</li>
<li><strong>Sponsor, organize, and participate in events.</strong> As a company, you can elect to uplift or initiate technology conferences, user-groups, workshops, and more to spread awareness and educate the general public about the technology you want to see thrive. Events are an excellent way for users &amp; developers to collaborate, discuss advancements, and share best practices, which leads to a strengthened community and an enhanced product as a result.</li>
</ol>
<h3 id="how-data-bene-contributes"><a class="heading-anchor" href="#how-data-bene-contributes">How Data Bene contributes</a></h3>
<p>Cédric Villemain, Data Bene’s president, has developed <a href="https://codeberg.org/c2main/pgfincore" rel="noopener">pg_fincore</a> and is currently working on <a href="https://codeberg.org/data-bene/statsmgr" rel="noopener">StatsMgr</a>, pg_psi, and other components that are designed to improve Postgres’ statistics capabilities.</p>
<p>Our team is also responsible for a number of contributions across projects like <a href="https://www.citusdata.com/blog/2025/02/06/distribute-postgresql-17-with-citus-13/" rel="noopener">Citus Data</a> and <a href="https://github.com/zammad/zammad/" rel="noopener">Zammad</a>.</p>
<p>We make a point of sponsoring, presenting at, &amp; advocating for PostgreSQL and open-source community conferences and user groups, such as PostgreSQL Europe, pgDay Paris, AlpOSS, Capitole du Libre, &amp; more. Some of our team members have also individually started, or serve on the organizing committees of, events such as the Barcelona &amp; Madrid PostgreSQL User Groups and pgDay Lowlands. The impact of events on the larger project &amp; community cannot be overstated, and it is important to us to do all we can to contribute in this manner.</p>
<p>Finally, we help customers understand how to contribute to PostgreSQL and similar open-source projects. Through training, workshops, and collaboration, we encourage making meaningful contributions that fit their goals and support the greater community.</p>
<p><em>If you’re a developer who is interested in contributing to open-source and/or the PostgreSQL ecosystem, or helping customers with R&amp;D requirements, our team is expanding—visit us at our <a href="https://data-bene.io/en/jobs" rel="noopener">website</a> to see available positions!</em></p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>Thinking about watching the full discussion? Check it out on YouTube:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/BYvXQB9O71U?si=-irKIHXxwiPhBFQP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>We’ve finished our first round of episodes for Postgres Café as of this release! More episodes may or may not be pending… follow us on social media (like <a href="https://www.linkedin.com/company/91744288" rel="noopener">LinkedIn</a> or <a href="https://fosstodon.org/@data_bene" rel="noopener">Mastodon</a>) to stay updated on what’s to come. (Would you like to see more from this podcast series? Let us know!)</p>
<p><a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> or check it out for interviews about open-source tools like <a href="https://codeberg.org/Data-Bene/StatsMgr" rel="noopener">StatsMgr</a>, for efficient statistics management in PostgreSQL; <a href="https://youtu.be/j1R3a0-jg6c" rel="noopener">pgstream</a>, an open-source change data capture (CDC) tool designed specifically for PostgreSQL; &amp; more. PostgreSQL is one of the most extensible databases on the market, with a huge extension ecosystem; learn directly from the experts as you discover some of the options out there.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Deploying distributed PostgreSQL at scale with Citus Data</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-deploying-distributed-postgresql-at-scale-with-citus-data/" />
      <updated>2025-01-29T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-deploying-distributed-postgresql-at-scale-with-citus-data/</id>
     <content type="html"><![CDATA[ <p>It’s time for the fourth episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a podcast from our teams at <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> and <a href="https://xata.io/" rel="noopener">Xata</a> where we discuss PostgreSQL contribution and extension development. In this latest episode, Sarah Conway and Gülçin Yıldırım Jelinek meet with Stéphane Carton to cover <a href="https://github.com/citusdata/citus" rel="noopener">Citus Data</a>, a completely open-source extension from Microsoft that provides a solution for deploying distributed PostgreSQL at scale.</p>
<h2 id="episode-4-citus-data"><a class="heading-anchor" href="#episode-4-citus-data">Episode 4: Citus Data</a></h2>
<p>The Citus database has seen 127 releases since March 24, 2016, when it was first released as open source for public use and contribution. It’s a powerful tool that works natively with PostgreSQL and integrates seamlessly with Postgres tools and extensions. Continue reading for a summary of what we covered in this podcast episode!</p>
<h3 id="addressing-scalability-performance-and-the-management-of-large-datasets"><a class="heading-anchor" href="#addressing-scalability-performance-and-the-management-of-large-datasets">Addressing scalability, performance, and the management of large datasets</a></h3>
<p>So why does Citus Data exist, and what problems does it solve? Let’s delve into this by category.</p>
<h4 id="development"><a class="heading-anchor" href="#development">Development</a></h4>
<p>Citus is designed to solve the distributed data modeling problem: it provides methods to map workloads onto a cluster, such as sharding tables based on primary keys (especially useful for microservices and high-throughput workloads).</p>
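<p>As a sketch of what that looks like in practice (the table and column names below are hypothetical, and the snippet assumes a cluster where the Citus extension is already installed), distributing a table on its key is a single function call:</p>
<pre class="language-sql"><code class="language-sql">-- Hypothetical multi-tenant table; the composite primary key includes
-- the distribution column, as Citus requires for unique constraints.
CREATE TABLE events (
    tenant_id bigint NOT NULL,
    event_id  bigserial,
    payload   jsonb,
    PRIMARY KEY (tenant_id, event_id)
);

-- Shard the table across the cluster, using tenant_id as the shard key
SELECT create_distributed_table('events', 'tenant_id');</code></pre>
<p>From that point on, queries filtering on <code>tenant_id</code> can be routed to a single shard, while analytical queries fan out across workers.</p>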
<h4 id="scalability"><a class="heading-anchor" href="#scalability">Scalability</a></h4>
<p>By distributing data across multiple nodes, Citus enables horizontal scaling of PostgreSQL databases.</p>
<p>This allows developers to combine CPU, memory, storage, and I/O capacity across multiple machines for handling large datasets and high traffic workloads. It’s simple to add more worker nodes to the cluster and rebalance the shards as your data volume grows.</p>
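<p>To sketch that growth path (hostname and port here are hypothetical; these calls are run on the coordinator node), adding a worker and rebalancing looks roughly like this:</p>
<pre class="language-sql"><code class="language-sql">-- Register an additional worker node with the coordinator
SELECT citus_add_node('worker-2.example.com', 5432);

-- Spread existing shards evenly across all workers,
-- including the one just added
SELECT rebalance_table_shards();</code></pre>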
<h4 id="performance"><a class="heading-anchor" href="#performance">Performance</a></h4>
<p>The distributed query engine in Citus is used to maximize efficiency, parallelizing queries and batching execution across multiple worker nodes.</p>
<p>Even when thousands to millions of statements are executed per second, data ingestion stays optimized: Citus finds the right shard placements, connects to the appropriate worker nodes, and performs operations in parallel. All of this ensures high throughput and low latency for real-time data ingestion.</p>
<h4 id="high-availability-and-redundancy"><a class="heading-anchor" href="#high-availability-and-redundancy">High Availability &amp; Redundancy</a></h4>
<p>The distributed data model lets you shard data across multiple nodes and keep redundant copies of tables. This keeps the database resilient and available even when individual nodes crash, maintaining high availability as a result.</p>
<h3 id="contributing-to-citus"><a class="heading-anchor" href="#contributing-to-citus">Contributing to Citus</a></h3>
<p>At Data Bene, our goal is to support the forward momentum of upstream source code through ongoing development and code contributions. Cédric Villemain, among others on our team, continually assesses new features and other improvements that can make a difference for users.</p>
<p>Whether you’re part of a DevOps team looking to build out distributed architecture for your PostgreSQL instances, or an end user such as a business analyst seeking efficient performance when handling vast amounts of data, Citus Data may be the perfect extension for your use case.</p>
<p>If you have specific feature requests or concerns, our team here at Data Bene can help you contribute directly to Citus Data, or can do so on your behalf, ensuring the longevity of the project and its relevance to your work. Learn more about contributing to Citus Data in the official <a href="https://github.com/citusdata/citus/blob/main/CONTRIBUTING.md" rel="noopener">CONTRIBUTING.md</a> file.</p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>Thinking about watching the full discussion? Check it out on YouTube:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/WueRn76nJ9Q?si=ulvzvfcr4Ux17tt0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>More episodes are still being published for Postgres Café! <a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> for more interviews around open-source tools like <a href="https://codeberg.org/Data-Bene/StatsMgr" rel="noopener">StatsMgr</a> for efficient statistics management for PostgreSQL, <a href="https://github.com/xataio/pgzx" rel="noopener">pgzx</a> for the creation of PostgreSQL extensions using Zig, &amp; more. Get ideas from the experts for new extensions to try out and maximize your Postgres deployments.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Expand monitoring capabilities with StatsMgr</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-expand-monitoring-capabilities-with-statsmgr/" />
      <updated>2025-01-07T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-expand-monitoring-capabilities-with-statsmgr/</id>
     <content type="html"><![CDATA[ <p>2025 has begun, and with it we’re excited to release the second episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a blog and video series from our teams over at <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> and <a href="https://xata.io/" rel="noopener">Xata</a> made with the intention of exploring the world of open source and where it meets PostgreSQL’s extensibility. Throughout this series, we discuss different extensions and tools that enhance the developer experience when working with PostgreSQL. In our second episode, we explore a brand new PostgreSQL extension called <a href="https://codeberg.org/data-bene/statsmgr" rel="noopener">StatsMgr</a> that leverages background workers and shared memory to snapshot, manage, and query various statistics for WAL, SLRU, IO, checkpointing, and more.</p>
<h2 id="episode-2-statsmgr"><a class="heading-anchor" href="#episode-2-statsmgr">Episode 2: StatsMgr</a></h2>
<p>In this episode, we introduce the just-released open source extension StatsMgr, created to continuously monitor and track events across PostgreSQL and the underlying system. Here’s a look at what this episode covered:</p>
<h3 id="customized-metrics-processing"><a class="heading-anchor" href="#customized-metrics-processing">Customized metrics processing</a></h3>
<p>Originally the idea was to provide a simplified interface for metrics, while enhancing them with a wide variety of available types. This functionality was then expanded to address problems like:</p>
<ul class="list">
<li><strong>Making statistics available</strong> for collection from external systems, without interruption even when those external systems are down.</li>
<li><strong>Providing an immediate view of PostgreSQL statistics</strong> with historical tracking, including pg_stat views &amp; functions.</li>
<li><strong>Increasing &amp; reducing the number of historical records when needed</strong> with dynamic buffer allocation.</li>
<li><strong>Debugging PostgreSQL instances</strong> with historical analysis and without required restarts.</li>
</ul>
<p>This extension, in turn, is great at handling situations like when…</p>
<ul class="list">
<li><strong>…your monitoring agent is down</strong>; using StatsMgr as a backup allows you to ensure you won’t lose statistics in this event, as events are captured regardless and stored for collection later on by your monitoring agent.</li>
<li><strong>…you have spikes or otherwise unusual behavior on your production system</strong>. This extension allows you to get an overview of activity for useful debugging insights.</li>
</ul>
<h3 id="expansive-and-historical-metrics-collection"><a class="heading-anchor" href="#expansive-and-historical-metrics-collection">Expansive &amp; historical metrics collection</a></h3>
<p>Currently, supported statistics include:</p>
<ul class="list">
<li>WAL</li>
<li>SLRU</li>
<li>BGWriter</li>
<li>Checkpointer</li>
<li>Archiver</li>
<li>IO</li>
</ul>
<p>Each of these is registered with a handler that lets you fetch and manage these statistics, and also is accompanied by shared memory structures for storing historical snapshots.</p>
<p>Some of the next steps for the project will include adding in dynamic statistics such as pg_stat_user_tables, amongst others.</p>
<p>There are still many things to do, from subtle improvements to major new features, so of course there are many opportunities to contribute to the project, whether you’re a newcomer or an advanced PostgreSQL developer. Interested in being a part of the effort? Check out <a href="https://codeberg.org/Data-Bene/StatsMgr/src/branch/main/CONTRIBUTING.md" rel="noopener">CONTRIBUTING.md</a> within the project.</p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>For an in-depth exploration of StatsMgr and its capabilities, watch the full episode here:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/UMzCLFwCPI8?si=-NW4Na4PAiq6qdoY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>We still have much more to come for Postgres Café. <a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> for episodes that feature more open-source tools like <a href="https://pgroll.com/" rel="noopener">pgroll</a> for zero-downtime schema migrations, <a href="https://www.citusdata.com/" rel="noopener">Citus Data</a> for distributed and scalable PostgreSQL as an extension, and more. Watch this space to learn how each tool can make working with Postgres smoother and more efficient.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Strange data type transformations</title>
      <link href="https://www.data-bene.io/en/blog/strange-data-type-transformations/" />
      <updated>2024-12-02T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/strange-data-type-transformations/</id>
     <content type="html"><![CDATA[ <h2 id="when-your-function-argument-types-are-loosely-changed"><a class="heading-anchor" href="#when-your-function-argument-types-are-loosely-changed">When your function argument types are loosely changed</a></h2>
<p>This article results from a code review I did for a customer.</p>
<p>Our customer created a <code>pg_dump --schema-only</code> of the target database to provide me with the plpgsql code and database object structures to review. So far so good.</p>
<p>I started to read the code and then became puzzled. The code looks like this:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> xxx<span class="token punctuation">(</span> p_id <span class="token keyword">character</span><span class="token punctuation">,</span> p_info <span class="token keyword">character</span> <span class="token keyword">varying</span> <span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> <span class="token keyword">integer</span>
<span class="token keyword">LANGUAGE</span> plpgsql
<span class="token keyword">AS</span> $$
<span class="token keyword">DECLARE</span>
<span class="token keyword">BEGIN</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
   <span class="token keyword">INSERT</span> <span class="token keyword">INTO</span> t1
   <span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> t2 <span class="token keyword">WHERE</span> t2<span class="token punctuation">.</span>id <span class="token operator">=</span> p_id<span class="token punctuation">;</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$
<span class="token punctuation">;</span></code></pre>
<p>Maybe you saw nothing wrong with the function. Perhaps knowing the table definition will help:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> t2 <span class="token punctuation">(</span>
   id <span class="token keyword">VARCHAR</span><span class="token punctuation">(</span><span class="token number">130</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
   <span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>id<span class="token punctuation">)</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p><code>t2.id</code> is always 130 characters long (in practice), and there are 400 million tuples. So, as you may have guessed, it seems odd to have <code>p_id CHARACTER</code> matching <code>id VARCHAR(130)</code>. Moreover, CHARACTER is the same as CHAR(1).</p>
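<p>You can check this directly in psql (a quick sketch; the cast to the bare <code>character</code> keyword truncates to one character, while the <code>bpchar</code> alias without a length modifier does not):</p>
<pre class="language-sql"><code class="language-sql">SELECT 'abc'::character AS c, 'abc'::bpchar AS b;
 c |  b
---+-----
 a | abc
(1 row)</code></pre>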
<p>Our customer had not seen any issues with the code for years. Nevertheless, he told me that the function definition he wrote was not like that: it was meant to be <code>p_id CHARACTER(130)</code>, not <code>CHARACTER</code>.</p>
<p>So what went wrong? Let’s test around because it’s fun.</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span> c <span class="token keyword">character</span><span class="token punctuation">,</span> d <span class="token keyword">character</span> <span class="token keyword">varying</span> <span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> void
<span class="token keyword">LANGUAGE</span> plpgsql
<span class="token keyword">AS</span> $$
<span class="token keyword">BEGIN</span>
  RAISE NOTICE <span class="token string">'c=%, d=%'</span><span class="token punctuation">,</span> c<span class="token punctuation">,</span>d<span class="token punctuation">;</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123465789'</span><span class="token punctuation">,</span> <span class="token string">'987654321'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  c<span class="token operator">=</span><span class="token number">123465789</span><span class="token punctuation">,</span> d<span class="token operator">=</span><span class="token number">987654321</span>
 test 
<span class="token comment">------</span>
 
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span></code></pre>
<p>We have an interesting result here: no casting to CHAR(1) has been done. Let’s see more details:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">EXPLAIN</span> <span class="token punctuation">(</span>COSTS <span class="token keyword">OFF</span><span class="token punctuation">,</span><span class="token keyword">ANALYZE</span><span class="token punctuation">,</span>VERBOSE<span class="token punctuation">)</span>
        <span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123465789'</span><span class="token punctuation">,</span> <span class="token string">'987654321'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  c<span class="token operator">=</span><span class="token number">123465789</span><span class="token punctuation">,</span> d<span class="token operator">=</span><span class="token number">987654321</span>
                             QUERY <span class="token keyword">PLAN</span>                              
<span class="token comment">---------------------------------------------------------------------</span>
 Result <span class="token punctuation">(</span>actual <span class="token keyword">time</span><span class="token operator">=</span><span class="token number">0.040</span><span class="token punctuation">.</span><span class="token number">.0</span><span class="token number">.041</span> <span class="token keyword">rows</span><span class="token operator">=</span><span class="token number">1</span> loops<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
   Output: test<span class="token punctuation">(</span><span class="token string">'123465789'</span>::bpchar<span class="token punctuation">,</span> <span class="token string">'987654321'</span>::<span class="token keyword">character</span> <span class="token keyword">varying</span><span class="token punctuation">)</span>
 Planning <span class="token keyword">Time</span>: <span class="token number">0.023</span> ms
 Execution <span class="token keyword">Time</span>: <span class="token number">0.053</span> ms
<span class="token punctuation">(</span><span class="token number">4</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>We can see there was a cast to BPCHAR. As a reminder, BPCHAR is an alias of CHARACTER, and it can represent a string of up to 10,485,760 characters.</p>
<p>Now let’s make another test:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span>c <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>
<span class="token keyword">LANGUAGE</span> <span class="token keyword">sql</span>
<span class="token keyword">AS</span> $$
<span class="token keyword">select</span> c<span class="token punctuation">;</span>
$$<span class="token punctuation">;</span></code></pre>
<p>As you can see, the language changed to SQL, and the argument type and the return type are CHAR(4). How does it execute?</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">SELECT</span> test<span class="token punctuation">(</span><span class="token string">'123456789'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
   test    
<span class="token comment">-----------</span>
 <span class="token number">123456789</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">EXPLAIN</span> VERBOSE <span class="token keyword">SELECT</span> test<span class="token punctuation">(</span><span class="token string">'123456789'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                QUERY <span class="token keyword">PLAN</span>                 
<span class="token comment">-------------------------------------------</span>
 Result  <span class="token punctuation">(</span>cost<span class="token operator">=</span><span class="token number">0.00</span><span class="token punctuation">.</span><span class="token number">.0</span><span class="token number">.01</span> <span class="token keyword">rows</span><span class="token operator">=</span><span class="token number">1</span> width<span class="token operator">=</span><span class="token number">32</span><span class="token punctuation">)</span>
   Output: <span class="token string">'123456789'</span>::bpchar
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>As you can see, even though you expect to process CHAR(4) data, you end up processing strings of arbitrary length instead!</p>
<p>However, do not rush to the PostgreSQL mailing lists to complain yet!</p>
<p>As a matter of fact, this behaviour is not a bug. The <a href="https://www.postgresql.org/docs/current/sql-createfunction.html" rel="noopener">documentation</a> states:</p>
<blockquote>
<p>“The full SQL type syntax is allowed for declaring a function’s arguments and return value. However, parenthesized type modifiers (e.g., the precision field for type numeric) are discarded by CREATE FUNCTION. Thus for example CREATE FUNCTION foo (varchar(10)) … is exactly the same as CREATE FUNCTION foo (varchar) …”</p>
</blockquote>
<p>This explains why CHARACTER(x) became CHARACTER, aliased as BPCHAR. And as we saw, bare BPCHAR is not actually CHAR(1) but more like VARCHAR(10485760). This fully explains the behaviour.</p>
<p>Wait, wait, WAIT! The original intention was to deal with CHAR(4) strings, not strings of arbitrary length.</p>
<p>Isn’t there any hope? No, sorry… (kidding.)</p>
<p>Reading the same documentation page, we see that “argtype” and “rettype” can be base, composite, or domain types, or can reference the type of a table column.</p>
<p>The trick is to create either a composite type or a domain to use as argtype or rettype.</p>
<p>Here are some examples:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Simple-case trick: cast explicitly at the call site</span>
<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'12345789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">-- Domain trick</span>
<span class="token keyword">CREATE</span> DOMAIN c4 <span class="token keyword">AS</span> <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span>param c4<span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> c4
<span class="token keyword">AS</span> $$
<span class="token keyword">BEGIN</span>
  RAISE NOTICE <span class="token string">'param=%'</span><span class="token punctuation">,</span> param<span class="token punctuation">;</span>
  <span class="token keyword">RETURN</span> param<span class="token punctuation">;</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$ <span class="token keyword">LANGUAGE</span> plpgsql<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
ERROR:  <span class="token keyword">value</span> too long <span class="token keyword">for</span> <span class="token keyword">type</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 test 
<span class="token comment">------</span>
 <span class="token number">1234</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::c4<span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 test 
<span class="token comment">------</span>
 <span class="token number">1234</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> pg_typeof<span class="token punctuation">(</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 pg_typeof 
<span class="token comment">-----------</span>
 c4
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span></code></pre>
<p>Now you should be happy with the result.</p>
<p>What? Not yet? Ok here is an additional trick.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Map a table structure</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> qq <span class="token punctuation">(</span> c <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span><span class="token operator">IN</span> c qq<span class="token punctuation">,</span> <span class="token keyword">OUT</span> d qq<span class="token punctuation">)</span>
<span class="token keyword">LANGUAGE</span> <span class="token keyword">sql</span>
<span class="token keyword">AS</span> $$
<span class="token keyword">SELECT</span> c<span class="token punctuation">;</span>
$$<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span><span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
ERROR:  <span class="token keyword">value</span> too long <span class="token keyword">for</span> <span class="token keyword">type</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">from</span> test<span class="token punctuation">(</span><span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'1234'</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
  c   
<span class="token comment">------</span>
 <span class="token number">1234</span></code></pre>
<p>Hmm, OK, but how is this different from the domain trick?</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Easy Type Alteration</span>
<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span> qq <span class="token keyword">ALTER</span> c <span class="token keyword">TYPE</span> <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   c   
<span class="token comment">-------</span>
 <span class="token number">12345</span></code></pre>
<p>Try to ALTER a domain and you will see how (not) easy it is.</p>
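<p>To see why, here is a quick sketch: ALTER DOMAIN lets you change defaults, constraints, the owner and the name, but it has no form to change the underlying base type, so widening a domain means recreating it along with everything that references it.</p>
<pre class="language-sql"><code class="language-sql">CREATE DOMAIN d4 AS char(4);

-- These forms exist:
ALTER DOMAIN d4 SET DEFAULT '0000';
ALTER DOMAIN d4 RENAME TO d4_old;

-- But there is no "ALTER DOMAIN ... TYPE char(5)": the base type
-- is fixed at creation time, so the only route is DROP (CASCADE
-- takes the dependent functions with it) and recreate.</code></pre>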
<p>The table definition trick also gives you some flexibility:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span> qq <span class="token keyword">ADD</span> ee <span class="token keyword">int</span><span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">,</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   test   
<span class="token comment">----------</span>
 <span class="token punctuation">(</span><span class="token number">12345</span><span class="token punctuation">,</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">,</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   c   <span class="token operator">|</span> ee 
<span class="token comment">-------+----</span>
 <span class="token number">12345</span> <span class="token operator">|</span>  <span class="token number">4</span></code></pre>
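<p>The tracking works in the other direction too: drop the column again and the function's signature follows along (a quick sketch, assuming the same session as above):</p>
<pre class="language-sql"><code class="language-sql">-- The composite type qq shrinks back to a single attribute,
-- and the function picks that up on its next call.
ALTER TABLE qq DROP ee;

SELECT * FROM test( ROW('12345') );
   c   
-------
 12345</code></pre>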
<p>We hope you enjoyed this article and that you learnt something new and interesting!</p>
 ]]></content>
			<author>
				<name>Frédéric Delacourt</name>
			</author>
    </entry>
</feed>
