<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Data Bene</title>
  <subtitle>Relational database, open-source and scalable.</subtitle>
  <link href="https://www.data-bene.io/en/blog.xml" rel="self" type="application/atom+xml" />
  <updated>2026-01-27T00:00:00Z</updated>
  <id>https://www.data-bene.io/en/blog.xml</id>
    <entry>
      <title>IvorySQL 5.0+: a game changer for Oracle to PostgreSQL transitions</title>
      <link href="https://www.data-bene.io/en/blog/ivorysql-5/" />
      <updated>2026-01-27T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/ivorysql-5/</id>
     <content type="html"><![CDATA[ <p>IvorySQL: “advanced, fully featured, open source Oracle compatible PostgreSQL with a firm commitment to always remain 100% compatible and a Drop-in replacement of the latest PostgreSQL.” (<a href="https://github.com/IvorySQL/IvorySQL" rel="noopener">GitHub</a>)</p>
<p>Also, it’s our team’s favorite engine to translate both foreign SQL and foreign applications for PostgreSQL! We believe in transitions, not migrations - a gradual shift from the old database engine to the new, specifically to reduce risk and optimize results. Using a project like this makes a huge difference in achieving both of these goals.</p>
<p>It’s been a little while since the latest major version was released, but we’re still excited about what’s new and wanted to share.</p>
<h2 id="whats-new"><a class="heading-anchor" href="#whats-new">What’s new</a></h2>
<p>Version 5.0 (<a href="https://github.com/IvorySQL/IvorySQL/releases/tag/IvorySQL_5.0" rel="noopener">release notes</a>) came out on November 25, 2025, and a minor version has already been released since then (<a href="https://github.com/IvorySQL/IvorySQL/releases/tag/IvorySQL_5.1" rel="noopener">5.1 release notes</a>) on December 18, 2025. These releases are the direct result of a massive effort from the IvorySQL team to deliver a large number of quality feature improvements as well as PostgreSQL 18 compatibility.</p>
<p>IvorySQL 5 brought a lot of necessary changes with it, including:</p>
<ul class="list">
<li><strong>PLiSQL</strong> - a compatible subset of Oracle PL/SQL</li>
<li><strong>Oracle-Compatible Package Support</strong></li>
<li><strong>Oracle-Style Sequence Support</strong></li>
<li>v5.0 enhancements such as <strong>ROWID</strong>, <strong>%TYPE</strong>, <strong>%ROWTYPE</strong>, and <strong>nested subfunctions</strong></li>
</ul>
<p>In particular, we really enjoy a lot of the functionality upgrades.</p>
<p>For example, IvorySQL now has <strong>better NULL handling</strong>: in compatible mode, NULL is treated like an empty string in string operations (matching Oracle behavior) to avoid bugs during migration. So, in situations such as <code>SELECT 'a' || NULL</code>, IvorySQL will now return <code>'a'</code> instead of following PostgreSQL’s behavior of returning <code>NULL</code>.</p>
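<p>A minimal illustration of the difference (a hedged sketch; the IvorySQL result assumes a session running in Oracle-compatible mode, and exact behavior may vary by version):</p>
<pre><code class="language-sql">-- Stock PostgreSQL: concatenation involving NULL yields NULL
SELECT 'a' || NULL;   -- NULL

-- IvorySQL in Oracle-compatible mode: NULL behaves like an empty string
SELECT 'a' || NULL;   -- 'a', matching Oracle
</code></pre>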
<p>One enhancement we really enjoy is that you can now <strong>nest functions and procedures</strong>. Functions can be embedded inside other functions (similar to Oracle packages, but simpler, with only private subprograms). This lets you organize complex logic in one place.</p>
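<p>As a rough sketch of what this looks like (names and tax rate are purely illustrative; the syntax is modeled on Oracle PL/SQL, so check the IvorySQL documentation for the exact grammar):</p>
<pre><code class="language-sql">-- A nested subfunction, private to the enclosing function
CREATE OR REPLACE FUNCTION order_total(p_order_id integer)
RETURN number
AS
  FUNCTION apply_tax(p_amount number) RETURN number
  AS
  BEGIN
    RETURN p_amount * 1.2;  -- illustrative flat tax rate
  END;
BEGIN
  -- apply_tax is visible only inside order_total
  RETURN apply_tax(100);
END;
/
</code></pre>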
<p>Along the same lines, support was also added for <code>DO [ LANGUAGE lang_name ] code [USING IN | OUT | IN OUT, ...]</code>.</p>
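<p>In its simplest form, this looks roughly like the following (a hedged sketch: the language name is an assumption, and the optional <code>USING</code> clause for passing parameters is omitted):</p>
<pre><code class="language-sql">-- An anonymous block executed with an explicitly named language
DO LANGUAGE plisql $$
BEGIN
  NULL;  -- no-op body, just to show the shape of the statement
END;
$$;
</code></pre>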
<h2 id="trying-it-out"><a class="heading-anchor" href="#trying-it-out">Trying it out</a></h2>
<p>Currently, we’re ingesting applications that range from only a few packages, procedures, and functions (around 10-50 objects) to massive data loads that include around 50,000 objects (10,000+ procedures and hundreds of packages).</p>
<p>While there are still many features that will need to be added over time, it considerably reduces the effort required to complete the transition from Oracle to PostgreSQL successfully.</p>
<p>If you’re thinking about testing it for yourself, there are plenty of options to get started (including building it from source, using containers, etc.)… and even a WASM build! This in particular allows IvorySQL to run directly in the browser without a full local installation; it’s very convenient for exploring syntax support and taking a first step into the world of PostgreSQL.</p>
<p>You can deploy the IvorySQL-WASM project locally in just a few steps, following the <a href="https://www.ivorysql.org/blog/ivorysql-wasm/" rel="noopener">blog post IvorySQL published</a> on the subject.</p>
<p>Or, you can <a href="https://trial.ivorysql.org/" rel="noopener">try out the hosted WASM</a> on the IvorySQL website.</p>
<h2 id="whats-next"><a class="heading-anchor" href="#whats-next">What’s next</a></h2>
<p>Hoping to learn more about how to use IvorySQL yourself? Keep an eye out for recorded webinars on their website (<a href="https://www.ivorysql.org/webinars-page" rel="noopener">coming soon!</a>), or <a href="https://www.youtube.com/@ivorysql" rel="noopener">check out recordings</a> from the HOW2025 conference in Jinan, China in the meantime.</p>
<p><a href="https://ivorysql.io/" rel="noopener">The next conference in Jinan</a> is already scheduled, from April 26th to the 28th! (By the way, the <a href="https://sessionize.com/how2026" rel="noopener">Call for Proposals is open</a> until February 27th, 2026.)</p>
<p>Our team is proud to count contributors to the IvorySQL project among its members. Some of them (Cédric Villemain, Yasir Hussain Shah) have been recognized in the release notes for 5.0 and 5.1 - and we have more contributions planned. For the next release, we have some exciting developments in the works, including:</p>
<ul class="list">
<li><code>ENABLE/DISABLE</code> constraint syntax</li>
<li><code>UTL_FILE</code> package addition</li>
<li>Oracle-style <code>CREATE TRIGGER</code> body (without needing to create a function beforehand)</li>
<li>Support for Oracle’s legacy join operator <code>(+)</code></li>
</ul>
<p>We’re looking forward to seeing the results of ongoing community work in the next release as well. Many great features are actively being added; for example, a new contributor to the project, Rophy Tsai, recently added the <code>DBMS_OUTPUT</code> and <code>DBMS_UTILITY</code> packages, which are widely used across Oracle-based SQL codebases. The work was recently merged into IvorySQL’s main branch and will likely be included in the next minor or major release.</p>
<p>For those looking to transition to or explore PostgreSQL without extensive modifications to their existing database logic and applications, IvorySQL (especially v5.x) offers a practical solution. A direct migration to PostgreSQL could, in some cases, require significant changes to both database objects and application code. IvorySQL helps bridge this gap, allowing a smoother adoption of PostgreSQL with greater compatibility.</p>
<h2 id="additional-resources"><a class="heading-anchor" href="#additional-resources">Additional resources</a></h2>
<ul class="list">
<li><a href="https://www.postgresql.org/about/news/ivorysql-50-released-major-oracle-compatibility-expansion-on-postgresql-180-foundation-3180/" rel="noopener">PostgreSQL.org announcement for IvorySQL 5</a></li>
<li><a href="https://github.com/orgs/IvorySQL/projects/19" rel="noopener">IvorySQL 5.0 roadmap on GitHub</a></li>
</ul>
 ]]></content>
			<author>
				<name>Yasir Hussain Shah</name>
			</author>
    </entry>
    <entry>
      <title>CERN PGDay: an annual PostgreSQL event in Geneva, Switzerland</title>
      <link href="https://www.data-bene.io/en/blog/cern-pgday-2026/" />
      <updated>2026-01-22T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/cern-pgday-2026/</id>
     <content type="html"><![CDATA[ <p>If you’re located near Western Switzerland and the Geneva region (or you just want to visit!), you might find it well worth your time to attend <a href="https://www.swisspug.org/cern-pgday-2026.html" rel="noopener">CERN PGDay 2026</a>. It’s an annual gathering (this year occurring on February 6th, 2026) for anyone interested in learning more about PostgreSQL that takes place at CERN, the world’s largest particle physics laboratory.</p>
<p><em>If you find the subject of particle physics interesting, you may want to visit anyway! They offer free access to many activities that run from Tuesday to Sunday; <a href="https://visit.cern/programme" rel="noopener">you can view the full programme here</a>.</em></p>
<p>Here, you’ll be able to attend a single track of seven English-language sessions, with a social gathering afterwards to enjoy CERN while continuing to connect with the rest of the attendees.</p>
<p>This year, there’ll be:</p>
<ol class="list">
<li><strong>A new PostgreSQL backend for CERN Tape Archive scheduling for LHC Run 4</strong> - Konstantina Skovola, CERN</li>
<li><strong>DCS Data Tools - PostgreSQL/TimescaleDB Implementation for ATLAS DCS Time-Series Data</strong> - Dimitrios Matakias, Paris Moschovakos, CERN</li>
<li><strong>Operational hazards of managing PostgreSQL DBs over 100TB</strong> - Teresa Lopes, Adyen</li>
<li><strong>Vacuuming Large Tables: How Recent Postgres Changes Further Enable Mission Critical Workloads</strong> - Robert Treat, AWS</li>
<li><strong>The (very practical) Postgres Sharding Landscape</strong> - Álvaro Hernández, OnGres</li>
<li><strong>The Alchemy of Shared Buffers: Balancing Concurrency and Performance</strong> - Josef Machytka, credativ</li>
<li><strong>When Kafka Met Elephant: A Love Story about Fast Ingestion</strong> - Barbora Linhartova, Jan Suchanek, Baremon</li>
</ol>
<p>The first talk of the day is of particular note…</p>
<blockquote>
<p>The CERN Tape Archive (CTA) stores over one exabyte of scientific data. To orchestrate storage operations (archival) and access operations (retrieval), the CTA Scheduler coordinates concurrent data movements across hundreds of tape servers, relying on a Scheduler Database (Scheduler DB) to manage the metadata of the in-flight requests. The existing objectstore-based design of the CTA Scheduler DB is a complex transactional management system. This talk presents the development of a new PostgreSQL-based backend for the CTA Scheduler as an off-the-shelf solution which simplifies implementation and is expected to significantly reduce future development and operational costs. We describe the implementation of all main CTA workflows and explain how PostgreSQL addresses the limitations of the objectstore-based system, providing the foundation for the tenfold increase in data throughput expected during LHC Run 4.</p>
</blockquote>
<p><em>(<a href="https://indico.cern.ch/event/1504097/contributions/6833857/" rel="noopener">link to talk description</a>)</em></p>
<p>In a world where ever larger amounts of digital information must be stored, learning more about how CERN manages over one exabyte of scientific data is sure to be an interesting experience.</p>
<p>Geneva is home to many international organizations across the public, private, and scientific sectors. If you’d like to explore the topic of PostgreSQL in more depth through engaging in discussion or attending sessions, it’s a fun location to meet and learn. Thinking of coming by? You can <a href="https://indico.cern.ch/event/1504097/registrations/114102/" rel="noopener">register until February 1st</a>.</p>
<p>Last year’s session recordings can be viewed by <a href="https://indico.cern.ch/event/1471762/timetable/#20250117" rel="noopener">visiting the 2025 schedule</a> and selecting the paperclip symbol next to the talk you’re interested in.</p>
<p>Stop by and see us in the catering area; we’re proud to be sponsoring the event again this year and will have a table or booth where you can visit. We’d love to talk about PostgreSQL, open-source innovation and development, &amp; whatever questions you have.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Database optimization and the importance of reducing global digital pollution</title>
      <link href="https://www.data-bene.io/en/blog/database-optimization-and-global-pollution/" />
      <updated>2026-01-19T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/database-optimization-and-global-pollution/</id>
     <content type="html"><![CDATA[ <p>I’ve been thinking back to <a href="https://pgibz.io/" rel="noopener">Postgres Ibiza (PGIBZ)</a> that took place a few weeks ago… And not just because I miss the warmth and sunshine of Ibiza as winter as arrived in Cognac! No, I was remembering how the island was still bearing some signs of the recent floods, even though the islanders were actively working on erasing and repairing the impacts it had caused. The latest news from the Caribbeans and from Asia made me realize that Ibiza had been, relatively speaking, quite lucky.</p>
<p>That particular atmospheric event followed major heatwaves and broken records across the U.S., Europe, and Asia in the past few years. Just this past week, at the time of writing, there has been historic flooding with profound damage in Washington state (U.S.)… more than nine million acre-feet of water dumped on the state over ten days… with the torrential downpour predicted to continue throughout the West Coast for the rest of the year.</p>
<p>I don’t know if the relevance of Ibiza’s floods in particular played a part in the way my short talk at the conference was received, but I am still grateful the topic at hand was not only heard, but listened to - even thought about and discussed. I must admit I was not expecting that.</p>
<p>I am among those who believe that global environmental challenges are about to completely transform our societies on every level as resources become more limited and are redistributed. I am the one asking around whether anyone else wonders “what good will my comfortable salary do me if potatoes can’t grow anymore?”. Maslow clearly illustrated that it is less necessary to be rich than to be fed. Yet, as we all know, the environment is still not priority one, even after all this time and all the lessons we’ve experienced.</p>
<p>So, for me it made sense to discuss the topic at a PostgreSQL conference, because it should simply be discussed everywhere. It is a topic that is relevant to each and every one of us, because it involves the Earth itself that we reside on and take for granted. Yet, was I confident about finding a receptive audience? Nope!</p>
<p>I felt slightly uncomfortable – I usually do as a non-tech person at a technical event… – for taking the audience into a space they were not necessarily there for. I was hoping I would not bore, or bother, or even annoy. But after only the first few slides I saw that people were taking pictures of the data I was sharing, some were nodding, some were seeing those figures for the first time and raised an eyebrow. And when I linked it all with what we do daily and the impact of the digital pollution we produce, instead of facing disinterest or disapproval I got “why not!” looks in the room.</p>
<h2 id="so-what-digital-pollution-do-we-produce"><a class="heading-anchor" href="#so-what-digital-pollution-do-we-produce">So what digital pollution do we produce?</a></h2>
<p>According to Greenpeace, digital pollution refers to all forms of pollution caused by the IT sector: greenhouse gas emissions, chemical contamination, erosion of biodiversity, and production of electronic waste.</p>
<p>Yes, there are some innovative solutions being designed to handle some of this digital pollution. For example, some situations are ideal for creating data centers that can use the heat they produce to warm nearby offices or homes. Amazon’s Tallaght project, for instance, is “the first large-scale district heating network of its kind in Ireland”, helping to “significantly reduce emissions in the area by almost 1,500 tonnes of CO2/year, establishing Tallaght as a leader in local energy action” (<a href="https://www.seai.ie/case-studies/tallaght-district-heating" rel="noopener">source</a>).</p>
<p>However, several trends have combined to produce an explosion of traffic on networks (more than 25% per year) and in data centers (more than 35% per year):</p>
<ul class="list">
<li>growth in the number of users equipped with at least one connected terminal (especially in developing countries)</li>
<li>an increase in the number of connected terminals per individual (from 2.1 in 2015 to 3.3 in 2020 on average worldwide)</li>
<li>an increase in video traffic, coupled with the growing share of HD and UHD quality images</li>
<li>the shift of usage toward on-demand consumption (streaming, VOD, cloud gaming)</li>
</ul>
<p>This growth is occurring at a rate that surpasses energy-efficiency gains in equipment, networks, and data centers. These traffic forecasts are also regularly revised upwards.</p>
<p>Although the number of “standard” devices (excluding connected objects) increased significantly between 2000 and 2015, it has stabilised between 2015 and 2025 because the market is saturated.</p>
<p>Although the mass per user decreases slightly, from 63 kg of CO2 emissions to 58 kg between 2010 and 2025, the total mass (user equipment, networks, data centres) is multiplied by more than 2.5 in 15 years: it increases from 128 million tonnes in 2010 to 317 million tonnes in 2025.</p>
<p>This explains the stress on raw materials, especially “conflict minerals” and other rare earths.</p>
<h2 id="the-key-to-a-greener-world"><a class="heading-anchor" href="#the-key-to-a-greener-world">The key to a greener world</a></h2>
<p>Performance optimization for databases doesn’t just result in lower costs and higher speed. When we improve a query, we save the little bit of extra energy that would have been spent on reads and writes, the extra disk activity that was never necessary. Other responsible practices that cleverly let you better harness the power of technology result not only in a more technically efficient outcome but also - without our even thinking about it - in doing some good for the planet.</p>
<p>For example, some actions we commonly consider for our own good that have an equally beneficial result for our planet include:</p>
<ul class="list">
<li>query optimization</li>
<li>index optimization</li>
<li>conducting performance audits to see bottlenecks and identify problems</li>
<li>selecting less energy-intensive technologies</li>
<li>reducing your data footprint (removing redundant, corrupt, or unnecessary data)</li>
<li>optimizing the storage infrastructure and computing power to reduce energy consumption</li>
</ul>
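<p>As a small, generic PostgreSQL illustration of the first two items (table and column names are hypothetical), checking a query plan and adding a targeted index can turn a full-table scan into an index scan, saving both time and energy:</p>
<pre><code class="language-sql">-- Inspect the plan: a sequential scan over millions of rows wastes I/O and energy
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- A targeted index lets the same query read only the rows it needs,
-- built without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
</code></pre>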
<p>We need to think of digital sobriety, thoughtful coding practices, and considerate energy consumption as the key to greener coding, and to integrate that parameter into the way we work every day and the way we build, manage, and develop database solutions, in order to minimize our environmental footprint as much as possible. These kinds of actions can generate significant cost savings for our own companies and for our customers, which can then be reinvested in R&amp;D projects to increase energy efficiency. It’s a virtuous circle.</p>
<p>A closing thought to this section: AI in particular drives far higher water usage, emissions, and e-waste (training AI models is one of the most resource-intensive computing tasks on the planet!). In a world where it has become common to integrate AI into every corner of applications and day-to-day coding practices, being deliberate in how such technologies are used to ensure they are truly the best solution for the problem at hand has a huge impact. Is AI really the most appropriate solution for that issue you’ve been thinking about? Or are you considering its usage merely because it’s easier to implement than something more technically involved?</p>
<h2 id="back-to-you"><a class="heading-anchor" href="#back-to-you">Back to you</a></h2>
<p>And that’s an interesting twist to consider regarding our day-to-day. We sometimes complain about needing to know there’s meaning to what we do; well here’s some food for thought to fuel your complaints!</p>
<p><strong>A brief note from our Marketing team here at Data Bene</strong>: our team of experienced PostgreSQL engineers can help with improving your own energy efficiency, database optimizations, and thoughtful coding practices that will result in direct benefits to you: speedier database operations, decreased resource usage, lessened costs, and a greener planet. Curious about what we offer? <a href="https://data-bene.io/en/#contact" rel="noopener">Get in touch - we’d love to chat.</a></p>
 ]]></content>
			<author>
				<name>Catherine Bouxin</name>
			</author>
    </entry>
    <entry>
      <title>Open Source Experience 2025</title>
      <link href="https://www.data-bene.io/en/blog/open-source-experience-2025/" />
      <updated>2026-01-03T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/open-source-experience-2025/</id>
     <content type="html"><![CDATA[ <p>The 2025 edition of the <a href="https://www.opensource-experience.com/" rel="noopener">Open Source Experience (OSXP)</a> took place on December 10th and 11th under the theme “Open Source, key to Europe’s strategic autonomy.” As you might expect, the focus was entirely on redefining Europe’s digital future as being driven by open source innovations across all technologies (including data management, cloud computing, and cybersecurity). This year, many of the talks were largely focused on the intersection between open source and AI technologies (in alignment with the focus the technology industry has had on AI in general, in 2025).</p>
<h2 id="the-event"><a class="heading-anchor" href="#the-event">The event</a></h2>
<p>The event lasted two days, featuring 90 exhibitors, 130 sessions, 150 speakers, and over 4,000 participants – a truly large-scale conference held at the Cité des Sciences et de l’Industrie in Paris.</p>
<p>This particular venue was smaller than last year’s (Le Palais des Congrès), and the rooms we went to were all full as a result of being quite small (about 20 to 30 seats at most).</p>
<h2 id="the-talk-format"><a class="heading-anchor" href="#the-talk-format">The talk format</a></h2>
<p>Presentations were given in both English and French. Interestingly, there were no “silent rooms” this year (where headphones are provided to each attendee). Not everyone enjoyed that format last year, but it was a useful one for following two talks, or switching between them depending on the content or questions.</p>
<p>Two of our team members were in attendance and had the opportunity to explore various exhibitors and event rooms spread across three floors. The talks lasted 20 minutes. While too short to delve into details, this format was excellent for discovering new technologies and piquing our interest at a glance.</p>
<h2 id="the-talk-content"><a class="heading-anchor" href="#the-talk-content">The talk content</a></h2>
<p>There were six tracks that talks were categorized by:</p>
<ul class="list">
<li>Economic models and governance for sustainable open strategies</li>
<li>Artificial intelligence and scientific computing for data analysis</li>
<li>Cloud architecture and virtualization for an autonomous future</li>
<li>Development - software innovation in action</li>
<li>Cybersecurity and the software production chain: Open Source as a foundation of trust</li>
<li>Collaborative tools and business applications: regaining digital autonomy</li>
</ul>
<p>We found the topic of open source solutions within the public sector to be the most interesting. In particular, it was easy to see that our reliance as a global society on the big five tech companies (GAFAM: Google, Apple, Facebook, Amazon, and Microsoft) has grown significantly in the past few years. Open source software is a direct solution for protecting our collective right to privacy in the digital age, which is exactly why conferences such as this are so important for discovering OSS alternatives and the innovation that leads to further development within this sector.</p>
<h2 id="attendance"><a class="heading-anchor" href="#attendance">Attendance</a></h2>
<p>Attendance was particularly high from the very first day. We were delighted to have the opportunity to interact in person and meet our partners, especially <a href="https://www.ow2.org/" rel="noopener">OW2</a>, which also organizes an annual Open Source event in June. (The call for presentations is open until February 14, 2026 – see the OW2Con’26 call for proposals <a href="https://www.ow2con.org/view/2026/Call_For_Presentations" rel="noopener">here</a>.)</p>
<p>Since the event was entirely focused on open source technologies, we were able to discuss with numerous participants topics such as PostgreSQL support, along with the challenges and organizational impacts for companies wishing to innovate and adopt PostgreSQL, against the backdrop of market demand to break free from proprietary software licensing constraints.</p>
<p>We also had the pleasure of meeting key players in open source hardware innovation, which resonates with our own R&amp;D on RISC-V processors.</p>
<p>Many free software and open source projects were represented. Some examples include <a href="https://nextcloud.com" rel="noopener">Nextcloud</a> (a self-hosted cloud collaboration platform that we personally use for hosting here at Data Bene) and <a href="https://opentalk.eu/en" rel="noopener">OpenTalk</a>, a video-conferencing solution that is GDPR-compliant, operating within German data centers.</p>
<h2 id="closing-thoughts"><a class="heading-anchor" href="#closing-thoughts">Closing thoughts</a></h2>
<p>The event was successful and well-organized. The only thing that would have improved the experience would have been longer presentations to explore the various topics discussed in more depth. If you want to discover new open source projects, note that this event is also a great opportunity to freely exchange ideas on these topics.</p>
<p>The video replays for 2025 have not yet been published, but past conference recordings can be found on the <a href="https://www.opensource-experience.com/en/video-replays" rel="noopener">official website, here</a>.</p>
<p>Overall, we thoroughly enjoyed the event and hope to attend next year!</p>
 ]]></content>
			<author>
				<name>Grégory Tiram</name>
			</author>
    </entry>
    <entry>
      <title>PGIBZ 2025: An Event for the Postgres Community in Ibiza</title>
      <link href="https://www.data-bene.io/en/blog/postgres-ibiza-2025/" />
      <updated>2025-12-11T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-ibiza-2025/</id>
     <content type="html"><![CDATA[ <p><a href="https://pgibz.io/" rel="noopener">Postgres Ibiza (PGIBZ)</a>: An open source conference designed to bring together people with a love for PostgreSQL in Ibiza, a relaxed place for fresh and innovative discussions. An international event run by the nonprofit PostgreSQL España.</p>
<p>This was the first time that the Data Bene team attended the event, and we’re happy to share that it was a very positive experience.</p>
<h2 id="the-conference"><a class="heading-anchor" href="#the-conference">The Conference</a></h2>
<h3 id="location-and-venue"><a class="heading-anchor" href="#location-and-venue">Location and Venue</a></h3>
<p>As its name suggests, this conference takes place on the Mediterranean island of Ibiza. For those who are less familiar, the Official Tourism Site provides <a href="https://ibiza.travel/en/know-ibiza/where-we-are/" rel="noopener">a nice overview</a> that you may refer to. While it should go without saying, this <a href="https://whc.unesco.org/en/list/417" rel="noopener">UNESCO World Heritage Site</a> has incredible offerings outside of the conference itself. Combined with a potentially long-haul flight, it’s strongly recommended to plan an extra day or two (or more) to explore the island and enjoy the local experience.</p>
<p>The event itself is hosted at the <a href="https://www.palaciocongresosibiza.com/ibiza-congress-center/#main_english" rel="noopener">Palacio de Congresos de Ibiza</a>, a two-story conference center with multiple rooms and catering. Though located outside of the capital city, the venue can be reached by car in about 15 minutes. In addition, the venue is within walking distance of numerous hotels, restaurants, and a beach! We stayed at <a href="https://www.hoteltrestorresibiza.com/" rel="noopener">Hotel Tres Torres</a>, which we can recommend, though admittedly it’s difficult to make a wrong choice, especially after a sprinkle of background research.</p>
<p>Briefly, on the topic of extracurriculars, our go-to recommendation would be to visit the capital city, Ibiza Town. There you will find a surprisingly large amount of shopping, restaurants, and sightseeing. Add to that a walk through the historic neighborhood, and you’ve got quite the memorable experience.</p>
<h3 id="attendance"><a class="heading-anchor" href="#attendance">Attendance</a></h3>
<p>This year’s PGIBZ welcomed about 30-40 attendees from across the globe, including China and the United States. And while there are certainly two sides to the coin, I consider this group size more of a positive. Indeed, during both the presentation Q&amp;A sessions and the coffee breaks, I felt interactions were more engaging overall. This was especially true when meeting and reconnecting with members of both more modest and global enterprise-scale companies.</p>
<h2 id="presentations-from-our-team"><a class="heading-anchor" href="#presentations-from-our-team">Presentations from our team</a></h2>
<h3 id="catherine"><a class="heading-anchor" href="#catherine">Catherine</a></h3>
<p>The first of the Data Bene speakers, and indeed the keynote of the first day, was Catherine Bouxin. Her presentation addressed the environmental impact of technology, and reflected the philosophy that important topics are not always the ones being discussed, and often aren’t considered a concern at all! Indeed, Catherine framed this conversation primarily as a reminder of the responsibility we as PostgreSQL community members share. There have been many exciting inventions of late, some with societal-changing implications, but we cannot ignore the resource intensity that many of them require.</p>
<p>We stand confidently with Catherine’s message about contributing towards a greener future and making thoughtful decisions to this end, and we are excited that this was very well-received by the audience.</p>
<h3 id="cedric"><a class="heading-anchor" href="#cedric">Cedric</a></h3>
<p>On the second day, our founder and CEO, Cédric Villemain, presented the statistics-based highlights of the PostgreSQL 18 major release (<a href="https://www.postgresql.org/docs/release/18.0/" rel="noopener">official notes linked here</a>). (After the event, he created a blog post on the topic of Cumulative Statistics in Postgres 18 - <a href="https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/" rel="noopener">check it out here</a>!) Our team here at Data Bene is a fan of stats and surfacing data, which is what led us to create the PostgreSQL extension <a href="https://codeberg.org/Data-Bene/StatsMgr" rel="noopener">StatsMgr</a>. <em>If you previously missed information on StatsMgr and would like to learn more, <a href="https://www.data-bene.io/en/blog/postgres-cafe-expand-monitoring-capabilities-with-statsmgr/" rel="noopener">check out our previous blog and Postgres Café podcast episode on the topic.</a></em></p>
<h3 id="evan"><a class="heading-anchor" href="#evan">Evan</a></h3>
<p>Directly after Cedric, I had the opportunity to present on developing and maintaining a Rust-based PostgreSQL extension (currently still very early stage) that helps facilitate the transition from Oracle to PostgreSQL. This project, named <code>pgtap_gen</code>, leverages the pre-existing <a href="https://github.com/theory/pgtap" rel="noopener">pgTAP</a> project to automate unit test creation on IvorySQL (100% open-source forked PostgreSQL with Oracle-analogous components). It was very encouraging to hear audience engagement and feedback. What’s more, it feels great to participate in the PostgreSQL ecosystem, even if it’s for proof-of-concept initiatives!</p>
<h3 id="frederic"><a class="heading-anchor" href="#frederic">Frédéric</a></h3>
<p>Towards the end of the second day, Frédéric delivered a talk on transaction isolation levels in PostgreSQL (that you may get the chance to see later on, as he’s submitted the same talk to pgDay Paris 2026 and PGDay CERN 2026!). He delved into each isolation level in Postgres with practical examples, and explained how everything works behind the scenes to define real steps for confidently choosing the right level for the right context. The attendees seemed to find the talk interesting and educational, so we may yet see further content on this and similar topics to help others who missed the opportunity to learn about the subject.</p>
<h2 id="concluding-thoughts"><a class="heading-anchor" href="#concluding-thoughts">Concluding Thoughts</a></h2>
<p>We chose to sponsor and attend PGIBZ 2025 because we believe it is important to further the PostgreSQL community across the world. Above all, this is an event dedicated to advancing Postgres education. Profits from the event are used by the parent nonprofit organization, PostgreSQL España, to invest in translations of the official PostgreSQL documentation into Spanish.</p>
<p>Further breakdowns of each of the talk topics mentioned above are in the works; feel free to follow us on <a href="https://www.linkedin.com/company/91744288/" rel="noopener">LinkedIn</a> and <a href="https://fosstodon.org/@data_bene" rel="noopener">Mastodon</a> to keep an eye out for them when they’re published on our blog.</p>
<p>We very much enjoyed our time at PGIBZ 2025, and are honored to have had the opportunity to present and participate. Looking forward to returning next year!</p>
<h2 id="references"><a class="heading-anchor" href="#references">References</a></h2>
<ul class="list">
<li><a href="https://ibiza.travel/en/know-ibiza/where-we-are/" rel="noopener">https://ibiza.travel/en/know-ibiza/where-we-are/</a></li>
<li><a href="https://pgibz.io/" rel="noopener">https://pgibz.io/</a></li>
<li><a href="https://whc.unesco.org/en/list/417" rel="noopener">https://whc.unesco.org/en/list/417</a></li>
<li><a href="https://www.hoteltrestorresibiza.com/" rel="noopener">https://www.hoteltrestorresibiza.com/</a></li>
</ul>
 ]]></content>
			<author>
				<name>Evan Stanton</name>
			</author>
    </entry>
    <entry>
      <title>How to Transition Easily to PostgreSQL from Oracle</title>
      <link href="https://www.data-bene.io/en/blog/transition-easily-to-postgresql-from-oracle/" />
      <updated>2025-12-10T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/transition-easily-to-postgresql-from-oracle/</id>
     <content type="html"><![CDATA[ <p>If you need to transition easily from Oracle to PostgreSQL without worrying about type conversions or Oracle packages that would require modifications to be PostgreSQL-compatible, IvorySQL is a useful solution!</p>
<h2 id="introduction"><a class="heading-anchor" href="#introduction">Introduction</a></h2>
<p>More and more companies are seeking to free themselves from the costs and constraints of Oracle by migrating to PostgreSQL. But how can this transition be achieved without spending months on it? Here is a proven, fast, and secure method, based on field experience and open-source tools.</p>
<h3 id="start-by-assessing-and-mapping"><a class="heading-anchor" href="#start-by-assessing-and-mapping">Start by assessing and mapping</a></h3>
<p>Before migrating, it is essential to take stock of the situation: we start by listing the Oracle objects, performing a complete extraction (including tables, views, sequences, procedures, functions, and packages).</p>
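<p>As a first sketch, such an inventory can be pulled straight from Oracle’s data dictionary (the schema name <code>APP</code> below is illustrative):</p>
<pre class="language-sql"><code class="language-sql">-- Count Oracle objects per type for the application schema
SELECT object_type, COUNT(*) AS nb
  FROM all_objects
 WHERE owner = 'APP'
 GROUP BY object_type
 ORDER BY object_type;</code></pre>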
<p>It is also important to communicate regularly with the client to refine findings and validate directions, as well as to identify dependencies (which applications connect with which technology, jobs, interfaces, etc.).</p>
<p>Finally, identify the client’s use of Oracle-specific features (constructs such as ROWID, DUAL, the DBMS_XXX packages, etc.) and any other keywords requiring special attention.</p>
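<p>To make the hunt concrete, here are hypothetical snippets of the kinds of constructs to flag (table and column names are illustrative):</p>
<pre class="language-sql"><code class="language-sql">-- DUAL pseudo-table
SELECT SYSDATE FROM DUAL;
-- ROWID pseudo-column
SELECT ROWID, e.* FROM emp e;
-- DBMS_XXX package call
BEGIN
  DBMS_OUTPUT.PUT_LINE('hello');
END;</code></pre>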
<h3 id="choosing-the-right-open-source-tools"><a class="heading-anchor" href="#choosing-the-right-open-source-tools">Choosing the right open source tools</a></h3>
<p>The success of a transition to PostgreSQL – because it is indeed a gradual transition and not a migration – depends as much on strategy as it does on tools.</p>
<p>It is not enough to simply copy objects: you need to understand the subtleties of the Oracle engine and intelligently transpose them into the PostgreSQL universe.</p>
<p>Among the essential tools, Ora2Pg plays a central role. Not only does it generate clear inventory reports, it also automates the conversion of schemas, data and PL/SQL code. It is a valuable accelerator for well-structured projects.</p>
<p>But in some contexts, the reality is more complex. Many customers accumulate significant technical debt, with thousands of lines of procedural code. Attempting to rewrite everything in PL/pgSQL in a short period of time would exhaust teams and compromise the migration itself. Maintaining two code bases, one in PL/SQL and the other in PL/pgSQL, during a complete transition phase is often organisationally untenable.</p>
<p>This is precisely where IvorySQL offers a powerful alternative. This fork of PostgreSQL, enhanced with Oracle compatibility features (PL/SQL, ROWID, DUAL, etc.), significantly reduces the need for immediate rewriting. By combining IvorySQL with targeted documentation and business validation scripts, the transition is easier, more reliable, and above all, more controlled.</p>
<p>Once in production, teams have the time they need to gradually rewrite their procedural and remaining application code in PL/pgSQL, without pressure or functional disruption.</p>
<h3 id="integration-and-tests"><a class="heading-anchor" href="#integration-and-tests">Integration and tests</a></h3>
<p>One of the great advantages of IvorySQL is that it performs part of the integration upstream. Thanks to its native support for PL/SQL, Oracle objects and Oracle-compatible syntax, many elements migrated via Ora2Pg work immediately without any major adaptation.</p>
<p>Of course, this does not eliminate the need for testing. On the contrary, it is essential to validate business behaviour in an IvorySQL pre-production environment. Procedures, functions, packages, triggers, and views must be tested with real-world cases, comparing the results between Oracle and IvorySQL to ensure functional equivalence.</p>
<p>Comparison scripts can be used to verify query responses, transaction effects, or the behaviour of migrated functions. But in many cases, IvorySQL significantly reduces the need for patches or rewrites, speeding up production and securing the transition.</p>
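<p>As a minimal sketch, such a comparison script can run the same fingerprint query on both engines and diff the outputs (the table and column names are illustrative):</p>
<pre class="language-sql"><code class="language-sql">-- Runs unchanged on Oracle and IvorySQL: MOD() exists on both sides
SELECT COUNT(*)         AS row_count,
       SUM(MOD(id, 97)) AS checksum  -- cheap, order-independent fingerprint
  FROM orders;</code></pre>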
<h3 id="document-and-support"><a class="heading-anchor" href="#document-and-support">Document and support</a></h3>
<p>A successful transition is not limited to technical aspects; it also relies on clear documentation and appropriate support. It is essential to write comprehensive test specifications for each complex object and module (procedure, package, trigger, etc.) in order to keep track of technical choices, adaptations made, and points of attention.</p>
<p>Support for client teams is just as important. Targeted workshops are used to share best practices, explain the differences between Oracle and PostgreSQL, and facilitate the adoption of IvorySQL.</p>
<p>Finally, a successful transition to PostgreSQL also depends on support for technical and functional teams. Targeted training sessions enable developers, DBAs, and project managers to learn the specifics of IvorySQL, understand the differences with Oracle, and quickly develop their skills.</p>
<p>This system also relies on structured technical support, which helps resolve issues, secure implementation choices, and facilitate communication between project teams and migration experts.</p>
<h2 id="ivorysql-in-the-real-world"><a class="heading-anchor" href="#ivorysql-in-the-real-world">IvorySQL in the real world</a></h2>
<p>During a recent assignment for a private sector company, we supported the transition from an Oracle database to IvorySQL in a challenging context: significant technical debt, high data volume, and tight deadlines.</p>
<p>Thanks to IvorySQL and our ability to automate this transition, it was completed quickly and almost all objects were integrated without rewriting.</p>
<p>The switch from IvorySQL to PostgreSQL can be carried out smoothly, as long as the application is fully functional and meets business expectations.</p>
<p>The teams were able to focus on business testing, while targeted training and technical support enabled rapid skill development.</p>
<p>Succeeding in this type of project is essential for building confidence and rolling out the operation across an entire fleet of Oracle instances!</p>
<p>Are you considering a transition from Oracle to PostgreSQL? Let’s talk about it.</p>
<p>Contact our team of experts at any time: <a href="mailto:contact@data-bene.io">contact@data-bene.io</a></p>
<p>On November 26, 2025, IvorySQL 5.0 was released - <a href="https://www.postgresql.org/about/news/ivorysql-50-released-major-oracle-compatibility-expansion-on-postgresql-180-foundation-3180/" rel="noopener">view the announcement here</a> for more information about this open-source project.</p>
 ]]></content>
			<author>
				<name>Stéphane Carton</name>
			</author>
    </entry>
    <entry>
      <title>Did you know? Tables in PostgreSQL are limited to 1,600 columns</title>
      <link href="https://www.data-bene.io/en/blog/did-you-know-tables-in-postgresql-are-limited-to-1600-columns/" />
      <updated>2025-11-13T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/did-you-know-tables-in-postgresql-are-limited-to-1600-columns/</id>
     <content type="html"><![CDATA[ <p><strong>Did you know a table can have no more than 1,600 columns?</strong> This blog article was inspired by a conversation Pierre Ducroquet and I had.</p>
<h2 id="first-the-documentation"><a class="heading-anchor" href="#first-the-documentation">First, the documentation</a></h2>
<p>The PostgreSQL documentation <a href="https://www.postgresql.org/docs/current/limits.html" rel="noopener">Appendix K</a> states a table can have a maximum of 1,600 columns.</p>
<p>This is a <strong>hard coded limit</strong> that can be found in the source code at <code>src/include/access/htup_details.h</code>:</p>
<pre class="language-plaintext"><code class="language-plaintext">#define MaxTupleAttributeNumber 1664
#define MaxHeapAttributeNumber	1600</code></pre>
<h2 id="reaching-the-limit-the-expected-way"><a class="heading-anchor" href="#reaching-the-limit-the-expected-way">Reaching the limit the expected way</a></h2>
<p>Let’s fully validate the claim and test accordingly.</p>
<h3 id="playing-with-table-definition"><a class="heading-anchor" href="#playing-with-table-definition">Playing with table definition</a></h3>
<p>Here, we’ll use a simple PL/pgSQL script because it is easy to adapt while testing.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Classic example</span>

<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1601;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1601(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1601</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1601 ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The typical output is as follows:</p>
<pre class="language-plaintext"><code class="language-plaintext">NOTICE:  table "tint_1601" does not exist, skipping
ERROR:  tables can have at most 1600 columns
CONTEXT:  SQL statement "ALTER TABLE tint_1601 ADD COLUMN i_1601 int;"
PL/pgSQL function inline_code_block line 8 at EXECUTE</code></pre>
<p>So far so good (or at least, all is working as expected).</p>
<p>You might have the idea to try replacing the <code>int4</code> type with <code>int2</code> to squeeze in more than 1,600 columns. It will not work: the limit is hard coded and applies to the number of columns, not to their size.</p>
<h3 id="playing-with-table-content"><a class="heading-anchor" href="#playing-with-table-content">Playing with table content</a></h3>
<p>Let’s build a 1,600 column table with a variant of the code above (looping to 1,600 this time).</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1600;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1600(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1600</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1600 ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>Another SQL script can be used to produce a valid 1,600 column tuple:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">DO</span> $$
<span class="token keyword">DECLARE</span>
    s <span class="token keyword">TEXT</span><span class="token punctuation">;</span>
    rows_inserted <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    s :<span class="token operator">=</span> <span class="token function">format</span><span class="token punctuation">(</span>
                 <span class="token string">'INSERT INTO tint_1600 VALUES (1%s);'</span>
               <span class="token punctuation">,</span> <span class="token keyword">repeat</span><span class="token punctuation">(</span> <span class="token string">',1'</span> <span class="token punctuation">,</span> <span class="token number">1599</span> <span class="token punctuation">)</span> 
               <span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> s<span class="token punctuation">;</span>

    GET DIAGNOSTICS rows_inserted <span class="token operator">=</span> ROW_COUNT<span class="token punctuation">;</span>
    RAISE NOTICE <span class="token string">'Rows inserted: %'</span><span class="token punctuation">,</span> rows_inserted<span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">NOTICE:  Rows inserted: 1
DO</code></pre>
<p>Another success with no surprise.</p>
<h3 id="testing-the-limits"><a class="heading-anchor" href="#testing-the-limits">Testing the limits</a></h3>
<p>Let us continue pushing the limits.</p>
<p>We now create another 1,600 column table using the <code>char(127)</code> data type.</p>
<p>We reuse our SQL script with some modifications:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Create a table with 1,600 columns: 1 x int + 1599 x char(127)</span>
<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tint_1600;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tint_1600(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1600</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tint_1600 ADD COLUMN c_%s char(127) NOT NULL;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span>

<span class="token comment">-- Insert a tuple - 1 x int + 1599 x char(127)</span>
<span class="token keyword">DO</span> $$
<span class="token keyword">DECLARE</span>
    s <span class="token keyword">TEXT</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    s :<span class="token operator">=</span> <span class="token function">format</span><span class="token punctuation">(</span> 
                 <span class="token string">'INSERT INTO tint_1600 VALUES (1%s);'</span>
               <span class="token punctuation">,</span> <span class="token keyword">repeat</span><span class="token punctuation">(</span> $q$<span class="token punctuation">,</span><span class="token string">'1'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">127</span><span class="token punctuation">)</span>$q$ <span class="token punctuation">,</span> <span class="token number">1599</span> <span class="token punctuation">)</span> 
               <span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> s<span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  row is too big: size 25616, maximum size 8160</code></pre>
<p>As we can see, the table has 1,600 columns, but this time the tuple cannot fit in a single heap page, which explains the error “row is too big: size 25616, maximum size 8160”. If you paid attention to the modified script, you may have noticed the columns are defined as <code>NOT NULL</code>, so in theory PostgreSQL could have determined at table creation time that no row would ever fit.</p>
<h2 id="what-about-joins"><a class="heading-anchor" href="#what-about-joins">What about JOINs?</a></h2>
<p>To keep things simple, let us self-join:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">SELECT</span> a<span class="token punctuation">.</span><span class="token operator">*</span><span class="token punctuation">,</span>b<span class="token punctuation">.</span><span class="token operator">*</span> <span class="token keyword">FROM</span> tint_1600 a<span class="token punctuation">,</span> tint_1600 b<span class="token punctuation">;</span>
ERROR:  target lists can have at most <span class="token number">1664</span> entries</code></pre>
<p>Now the <code>SELECT</code> clause (<code>a.*,b.*</code>) is reaching its own limit (<code>MaxTupleAttributeNumber = 1664</code>).</p>
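<p>Staying under that cap makes the query valid again; for instance, selecting a single column from one side should work, since 1 + 1,600 = 1,601 entries:</p>
<pre class="language-sql"><code class="language-sql">-- 1,601 target list entries: below the 1,664 cap
SELECT a.i_1, b.* FROM tint_1600 a, tint_1600 b;</code></pre>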
<h2 id="reaching-the-column-limit-the-unexpected-way"><a class="heading-anchor" href="#reaching-the-column-limit-the-unexpected-way">Reaching the column limit the unexpected way</a></h2>
<p>Sometimes you have to modify your application, and that generates schema modifications.<br>
Most of the time, these are table modifications such as adding or dropping columns.</p>
<h3 id="exploring-add-/-drop-column"><a class="heading-anchor" href="#exploring-add-/-drop-column">Exploring <code>ADD</code> / <code>DROP COLUMN</code></a></h3>
<p>Let us see what happens from the SQL side when we add, then drop, a column.</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># CREATE TABLE tadc_1600(i_1 int NOT NULL);</span>

<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tadc_1600 ADD COLUMN i_2 int NOT NULL;</span>

<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc_1600'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>
     
 attname <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">---------+--------+------------+------------+--------------</span>
 i_1     <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 i_2     <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tadc_1600 DROP COLUMN i_2;</span>

<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span>

<span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc_1600'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>

           attname            <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">------------------------------+--------+------------+------------+--------------</span>
 i_1                          <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">2.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>When dropping a column,</p>
<ul class="list">
<li>the name becomes ‘........pg.dropped.N........’, where N is the dropped column’s attnum,</li>
<li>the column becomes NULLable,</li>
<li>the column is marked as dropped.</li>
</ul>
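<p>A shorter way to list those dropped attributes relies on the <code>regclass</code> cast and the <code>attisdropped</code> flag:</p>
<pre class="language-sql"><code class="language-sql">-- List only the dropped columns of the table
SELECT attname, attnum
  FROM pg_attribute
 WHERE attrelid = 'tadc_1600'::regclass
   AND attisdropped;</code></pre>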
<h3 id="iterating-add-/-drop-column"><a class="heading-anchor" href="#iterating-add-/-drop-column">Iterating <code>ADD</code> / <code>DROP COLUMN</code></a></h3>
<p>One can wonder if there is a limit to the number of add/drop operations that can be run on a given table.</p>
<p>As usual, let us try:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- ADD / DROP COLUMN example</span>
<span class="token keyword">DO</span> $$             
<span class="token keyword">DECLARE</span>
    i <span class="token keyword">int</span><span class="token punctuation">;</span>
<span class="token keyword">BEGIN</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'DROP TABLE IF EXISTS tadc;'</span><span class="token punctuation">;</span>
    <span class="token keyword">EXECUTE</span> <span class="token string">'CREATE TABLE tadc(i_1 int);'</span><span class="token punctuation">;</span>
    <span class="token keyword">FOR</span> i <span class="token operator">IN</span> <span class="token number">2.</span><span class="token number">.1601</span> <span class="token keyword">LOOP</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tadc ADD COLUMN i_%s int;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">EXECUTE</span> <span class="token function">format</span><span class="token punctuation">(</span><span class="token string">'ALTER TABLE tadc DROP COLUMN i_%s;'</span><span class="token punctuation">,</span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">END</span> <span class="token keyword">LOOP</span><span class="token punctuation">;</span>
<span class="token keyword">END</span> $$<span class="token punctuation">;</span></code></pre>
<p>The output is:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  tables can have at most 1600 columns
CONTEXT:  SQL statement "ALTER TABLE tadc ADD COLUMN i_1601 int;"
PL/pgSQL function inline_code_block line 8 at EXECUTE</code></pre>
<p>Oh oh! We reached the 1,600 limit here as well.</p>
<p>Let us explore the catalog after having added and dropped a column 1,599 times:</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># SELECT attname,attnum,attstorage,attnotnull,attisdropped </span>
   <span class="token keyword">FROM</span> pg_attribute 
   <span class="token keyword">WHERE</span> attrelid<span class="token operator">=</span><span class="token punctuation">(</span>
                   <span class="token keyword">SELECT</span> oid 
                   <span class="token keyword">FROM</span> pg_class 
                   <span class="token keyword">WHERE</span> relname<span class="token operator">=</span><span class="token string">'tadc'</span>
                   <span class="token punctuation">)</span> 
     <span class="token operator">AND</span> attnum <span class="token operator">></span> <span class="token number">0</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> attnum<span class="token punctuation">;</span>

             attname             <span class="token operator">|</span> attnum <span class="token operator">|</span> attstorage <span class="token operator">|</span> attnotnull <span class="token operator">|</span> attisdropped 
<span class="token comment">---------------------------------+--------+------------+------------+--------------</span>
 i_1                             <span class="token operator">|</span>      <span class="token number">1</span> <span class="token operator">|</span> p          <span class="token operator">|</span> t          <span class="token operator">|</span> f
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">2.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">2</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">3.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">3</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">4.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">4</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">5.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>    <span class="token operator">|</span>      <span class="token number">5</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t

 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">1599.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>   <span class="token number">1599</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
 <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>pg<span class="token punctuation">.</span>dropped<span class="token punctuation">.</span><span class="token number">1600.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token operator">|</span>   <span class="token number">1600</span> <span class="token operator">|</span> p          <span class="token operator">|</span> f          <span class="token operator">|</span> t
<span class="token punctuation">(</span><span class="token number">1600</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>Well, table <code>tadc</code> has 1,600 columns. You can also see that dropped columns still occupy slots in the catalog: column modifications are append-only, which is how PostgreSQL avoids rewriting the table contents.</p>
<p>At this point, further column add &amp; drop modifications will fail.</p>
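<p>For instance, trying to add one more column to our example table <code>tadc</code> fails with an error like:</p>
<pre class="language-sql"><code class="language-sql">=# ALTER TABLE tadc ADD COLUMN c1601 int;
ERROR:  tables can have at most 1600 columns</code></pre>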
<p>Is there anything I can do to escape this situation?</p>
<h4 id="the-vacuum-knight-shall-save-the-postgresql-princess-right"><a class="heading-anchor" href="#the-vacuum-knight-shall-save-the-postgresql-princess-right">The VACUUM knight shall save the PostgreSQL princess, right?</a></h4>
<p>The <code>VACUUM</code> command operates at the tuple level, so even a <code>VACUUM FULL</code> will not change the table structure.</p>
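<p>A quick way to convince yourself (using the <code>tadc</code> example above; the catalog keeps the dropped-column entries):</p>
<pre class="language-sql"><code class="language-sql">=# VACUUM FULL tadc;
=# SELECT count(*)
     FROM pg_attribute
    WHERE attrelid = 'tadc'::regclass
      AND attnum &gt; 0;
-- still reports 1600 attributes, dropped ones included</code></pre>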
<h4 id="so-the-dragon-ate-the-knight-whats-next"><a class="heading-anchor" href="#so-the-dragon-ate-the-knight-whats-next">So, the dragon ate the knight, what’s next?</a></h4>
<p>This is not an issue with dead tuples but rather an issue with the catalog.<br>
You’ll need to create a new table definition.</p>
<p>Here are some solutions, from simple to complex:</p>
<ol class="list">
<li>
<p>Build a new table (requires service downtime)</p>
<ul class="list">
<li><code>CREATE TABLE ... (LIKE ... INCLUDING ALL)</code></li>
<li><code>COPY</code> data from old to new table</li>
<li>Rename tables</li>
<li>Drop old table</li>
</ul>
</li>
<li>
<p>Leverage logical replication (minimize service downtime)</p>
<ul class="list">
<li><code>CREATE TABLE ... (LIKE ... INCLUDING ALL)</code></li>
<li>Create a local <code>PUBLICATION</code>/<code>SUBSCRIPTION</code> pair</li>
<li>Once data is synchronized, stop/pause application service</li>
<li>Drop subscription</li>
<li>Rename tables</li>
<li>Restart/resume application</li>
<li>Drop old table</li>
</ul>
</li>
</ol>
<h4 id="what-about-foreign-keys"><a class="heading-anchor" href="#what-about-foreign-keys">What about Foreign Keys?</a></h4>
<p>The above solution <em>works</em> fine for simple cases. But real-life tables often use integrity constraints. Let’s explore a bit using foreign keys.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Foreign key case</span>

<span class="token operator">=</span><span class="token comment"># CREATE TABLE colors (id int, name text );</span>
<span class="token operator">=</span><span class="token comment"># CREATE TABLE objects ( id int, color_id int, name text );</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE colors ADD PRIMARY KEY (id);</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE objects ADD CONSTRAINT fk_color</span>
                       <span class="token keyword">FOREIGN</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span> <span class="token keyword">REFERENCES</span> colors <span class="token punctuation">(</span>id<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO colors </span>
   <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token string">'red'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'green'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">,</span> <span class="token string">'blue'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO objects </span>
   <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token string">'red object'</span><span class="token punctuation">)</span>
         <span class="token punctuation">,</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'green object'</span><span class="token punctuation">)</span>
         <span class="token punctuation">,</span><span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">,</span><span class="token number">3</span><span class="token punctuation">,</span><span class="token string">'blue object'</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>Let’s apply the recipe:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Duplicate table structure (valid columns only)  and copy data</span>
<span class="token operator">=</span><span class="token comment"># CREATE TABLE tmp_colors (LIKE colors INCLUDING ALL);</span>
<span class="token operator">=</span><span class="token comment"># INSERT INTO tmp_colors SELECT * FROM colors;</span>

<span class="token comment">-- Do the DROP/RENAME trick</span>
<span class="token operator">=</span><span class="token comment"># BEGIN;</span>
<span class="token operator">=</span><span class="token comment"># DROP TABLE colors;</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE tmp_colors RENAME TO colors;</span>
<span class="token operator">=</span><span class="token comment"># COMMIT;</span></code></pre>
<p>The <code>DROP TABLE</code> command raises an error:</p>
<pre class="language-plaintext"><code class="language-plaintext">ERROR:  cannot drop table colors because other objects depend on it
DETAIL:  constraint fk_color on table objects depends on table colors
HINT:  Use DROP ... CASCADE to drop the dependent objects too.</code></pre>
<p>As we can see, the recipe has to be changed to include dependent tables as well.</p>
<p>Adding <code>CASCADE</code> will drop FK constraints on dependent tables.</p>
<p>Let’s run a modified version of the recipe:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Do the DROP/RENAME trick</span>
<span class="token operator">=</span><span class="token comment"># BEGIN;</span>

<span class="token operator">=</span><span class="token comment"># DROP TABLE colors CASCADE;  -- DROP related FOREIGN KEY constraints</span>

<span class="token operator">=</span><span class="token comment"># ALTER TABLE tmp_colors RENAME TO colors;</span>

<span class="token comment">-- Recreate FK constraint</span>
<span class="token operator">=</span><span class="token comment"># ALTER TABLE objects ADD CONSTRAINT fk_color</span>
                       <span class="token keyword">FOREIGN</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span> <span class="token keyword">REFERENCES</span> colors <span class="token punctuation">(</span>id<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token operator">=</span><span class="token comment"># COMMIT;</span></code></pre>
<p>Let’s check that the behaviour is as expected:</p>
<pre class="language-sql"><code class="language-sql"><span class="token operator">=</span><span class="token comment"># INSERT INTO objects VALUES (5,5,'ro');</span>
ERROR:  <span class="token keyword">insert</span> <span class="token operator">or</span> <span class="token keyword">update</span> <span class="token keyword">on</span> <span class="token keyword">table</span> <span class="token string">"objects"</span> violates <span class="token keyword">foreign</span> <span class="token keyword">key</span> <span class="token keyword">constraint</span> <span class="token string">"fk_color"</span>
DETAIL:  <span class="token keyword">Key</span> <span class="token punctuation">(</span>color_id<span class="token punctuation">)</span><span class="token operator">=</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">)</span> <span class="token operator">is</span> <span class="token operator">not</span> present <span class="token operator">in</span> <span class="token keyword">table</span> <span class="token string">"colors"</span><span class="token punctuation">.</span>

<span class="token operator">=</span><span class="token comment"># INSERT INTO objects VALUES (5,3,'ro');</span>
<span class="token keyword">INSERT</span> <span class="token number">0</span> <span class="token number">1</span></code></pre>
<p>Success!</p>
<p>When integrity constraints are too numerous or too difficult to track, you can use pg_dump/pg_restore to rebuild everything automatically. If service downtime is an issue, logical replication can play the same role as pg_dump/pg_restore with minimal interruption.</p>
<h2 id="best-is-to-avoid-having-to-deal-with-this"><a class="heading-anchor" href="#best-is-to-avoid-having-to-deal-with-this">Best is to avoid having to deal with this</a></h2>
<p>As you can see, dealing with the 1,600-column limit is not something you would do just for fun (usually). Notably, it can lead to service downtime.</p>
<h2 id="talk-to-us"><a class="heading-anchor" href="#talk-to-us">Talk to us</a></h2>
<p>Do you have other ideas of how to address this situation? Have you run into odd ways of reaching this hard-coded limit? <a href="https://www.data-bene.io/en/#contact" rel="noopener">Contact us</a>! We always love a good discussion about PostgreSQL.</p>
 ]]></content>
			<author>
				<name>Frédéric Delacourt</name>
			</author>
    </entry>
    <entry>
      <title>Cumulative Statistics in PostgreSQL 18</title>
      <link href="https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/" />
      <updated>2025-09-29T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/cumulative-statistics-in-postgresql-18/</id>
     <content type="html"><![CDATA[ <p>In <strong>PostgreSQL 18</strong>, the statistics &amp; monitoring subsystem receives a significant overhaul - extended cumulative statistics, new per-backend I/O visibility, the ability for extensions to export / import / adjust statistics, and improvements to GUC controls and snapshot / caching behavior. These changes open new doors for performance analysis, cross‑environment simulation, and tighter integration with extensions. In this article I explore what’s new, what to watch out for, Grand Unified Configuration (GUC) knobs, and how extension authors can leverage the new C API surface.</p>
<h2 id="introduction-and-motivation"><a class="heading-anchor" href="#introduction-and-motivation">Introduction &amp; motivation</a></h2>
<p>Statistics (in the broad sense: monitoring counters, I/O metrics, and planner / optimizer estimates) lie at the heart of both performance tuning and internal decision making in PostgreSQL. Transparent, reliable, and manipulable statistics, among other things, allow DBAs to address the efficiency of PostgreSQL directly, as well as enable “extensions” to improve the user experience.</p>
<p>That said, the historic statistics system of PostgreSQL has not been without points of friction. These include a limited ability to clear relation statistics, metrics whose units don’t always align with user goals, and the lack of a C API for the PostgreSQL cumulative statistics engine. PostgreSQL 18 addresses these concerns head on.</p>
<p>Below is a summary of the key enhancements.</p>
<h2 id="a-warning-on-stats"><a class="heading-anchor" href="#a-warning-on-stats">A warning on stats</a></h2>
<p>While statistics offer incredible value, their collection can take up significant time and resources. PostgreSQL 18 introduces an important consideration: with the expanded range of collectible metrics, the maximum size of the statistics hash table has been increased. Do keep in mind, especially if you’re designing large-scale systems with table-per-customer architectures, that the 1GB ceiling has been shown to be hit with a few million tables.</p>
<h2 id="whats-new-with-postgresql-18-and-stats"><a class="heading-anchor" href="#whats-new-with-postgresql-18-and-stats">What’s new with PostgreSQL 18 and “stats”</a></h2>
<p>Here are the major new or improved features relating to statistics and monitoring. Each item links to the relevant documentation or code where possible.</p>
<p>Generally, <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW" rel="noopener">pg_stat_io</a> now reports I/O activity in bytes rather than pages, which is more convenient for analysis. Moreover, WAL statistics were moved here from <code>pg_stat_wal</code>, providing a single, comprehensive view.</p>
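<p>For example, a quick look at WAL I/O through <code>pg_stat_io</code> (the <code>*_bytes</code> columns and the <code>wal</code> object value are as described in the PostgreSQL 18 documentation):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT backend_type, context, reads, read_bytes, writes, write_bytes
     FROM pg_stat_io
    WHERE object = 'wal';</code></pre>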
<h3 id="upgrades"><a class="heading-anchor" href="#upgrades">Upgrades</a></h3>
<p><a href="https://www.postgresql.org/docs/18/pgupgrade.html" rel="noopener">pg_upgrade</a> is now able to retain optimizer statistics, removing the need to run a full <code>ANALYZE</code> on the databases to get good query plans after the upgrade; this is a very welcome update for large databases! Be aware that custom statistics added by an extension, along with those created with <a href="https://www.postgresql.org/docs/18/sql-createstatistics.html" rel="noopener">CREATE STATISTICS</a>, won’t be retained.</p>
<p>You will surely want to look at new options in <a href="https://www.postgresql.org/docs/18/app-vacuumdb.html" rel="noopener">vacuumdb</a> (<code>--missing-stats-only</code>) to, well, analyze only what’s needed.</p>
<p>On a similar note, the <code>--[no-]statistics</code> flag has been added to <a href="https://www.postgresql.org/docs/18/app-pgdump.html" rel="noopener">pg_dump</a>, <a href="https://www.postgresql.org/docs/18/app-pgdumpall.html" rel="noopener">pg_dumpall</a>, and <a href="https://www.postgresql.org/docs/18/app-pgrestore.html" rel="noopener">pg_restore</a>.</p>
<h3 id="maintenance"><a class="heading-anchor" href="#maintenance">Maintenance</a></h3>
<p>It’s now easier to gauge the maintenance effort on objects, with the total time spent on VACUUM and ANALYZE operations (including automatic ones) now reported in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW" rel="noopener">pg_stat_all_tables</a> and its variants.</p>
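<p>For example, to spot the tables where (auto)vacuum spends the most time (column names per the PostgreSQL 18 documentation; times are in milliseconds):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT relname, total_vacuum_time, total_autovacuum_time
     FROM pg_stat_user_tables
    ORDER BY total_autovacuum_time DESC
    LIMIT 5;</code></pre>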
<p>A new GUC not to forget is <a href="https://www.postgresql.org/docs/18/runtime-config-statistics.html#GUC-TRACK-COST-DELAY-TIMING" rel="noopener">track_cost_delay_timing</a>. It collects the time spent sleeping (due to delayed operations) for <code>VACUUM</code> and <code>ANALYZE</code>. While very interesting, like other <code>track_io*</code> GUCs, it implies a lot of extra calls to the system clock, which on some platforms can lead to a severe performance impact. Always check with a tool like <a href="https://www.postgresql.org/docs/18/pgtesttiming.html" rel="noopener">pg_test_timing</a> to ensure your system can afford it!</p>
<p>No more questions about checkpointer activity when using <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-CHECKPOINTER-VIEW" rel="noopener">pg_stat_checkpointer</a>. The new attribute <code>num_done</code> reports the number of <strong>completed</strong> checkpoints. You can also see which kind of buffers were written: <code>slru_written</code> covers SLRU buffers, while <code>buffers_written</code> now only counts <code>shared_buffers</code>. Previously, the log and the view reported different totals because an SLRU counter was included <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=17cc5f666" rel="noopener">in one case and not the other</a>.</p>
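<p>For example, a quick health check on checkpoints (the view has a single row):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT num_timed, num_requested, num_done,
          buffers_written, slru_written
     FROM pg_stat_checkpointer;</code></pre>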
<h3 id="analysis"><a class="heading-anchor" href="#analysis">Analysis</a></h3>
<p>Want to know more about the I/O handled by a given backend (PID)? Call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-IO" rel="noopener">pg_stat_get_backend_io(int)</a> and you’ll get output similar to what the <code>pg_stat_io</code> view provides, restricted to that single process. As for the WAL stats for this PID: call <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#PG-STAT-GET-BACKEND-WAL" rel="noopener">pg_stat_get_backend_wal(int)</a>.</p>
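<p>For example, to look at the I/O of your own session (the function returns the same columns as <code>pg_stat_io</code>):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT object, context, reads, read_bytes, writes, write_bytes
     FROM pg_stat_get_backend_io(pg_backend_pid());</code></pre>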
<p>New attributes <code>parallel_workers_to_launch</code> and <code>parallel_workers_launched</code> were introduced in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW" rel="noopener">pg_stat_database</a>. The ratio lets us know if we have enough slots for parallel workers.</p>
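<p>A sketch of such a ratio query (the percentage expression is my own, not from the documentation):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT datname,
          parallel_workers_to_launch,
          parallel_workers_launched,
          round(100.0 * parallel_workers_launched
                / NULLIF(parallel_workers_to_launch, 0), 1)
            AS launched_pct
     FROM pg_stat_database
    WHERE datname = current_database();</code></pre>
<p>A ratio well below 100% suggests raising <code>max_parallel_workers</code>.</p>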
<p>Interesting changes to <a href="https://www.postgresql.org/docs/18/pgstatstatements.html" rel="noopener">pg_stat_statements</a>: more queries are now grouped under the same identifier. For example, <code>IN (1,2,3, ...)</code> lists are squashed, as only the first and last constants are used. A more counter-intuitive change relates to the table names used in a query: only the name is used, not the schema or the relation OID. This lets statistics survive a table being dropped and recreated, for example, but it also groups statistics from unrelated tables that merely share a name. The way to keep separate statistics for tables with the same name is to alias them in the queries (<code>FROM my.table mt, other.table ot</code>)…</p>
<p>Finally, additions to <a href="https://www.postgresql.org/docs/18/view-pg-backend-memory-contexts.html" rel="noopener">pg_backend_memory_contexts</a> with <code>path</code> (to get parent/child) and <code>type</code> to segregate <code>AllocSet</code>, <code>Generation</code>, <code>Slab</code> and <code>Bump</code> contexts… and what exactly are <code>Slab</code> and <code>Bump</code>? They are not documented; for these you’ll want to <a href="https://github.com/postgres/postgres/tree/master/src/backend/utils/mmgr" rel="noopener">read headers of C files here</a>. They exist to optimize memory allocation, reallocation, and reset, depending on expected memory usage. For example, <code>Slab</code> is defined as a «MemoryContext implementation designed for cases where large numbers of equally-sized objects can be allocated and freed efficiently with minimal memory wastage and fragmentation».</p>
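<p>For example, to find the biggest memory contexts of your backend along with their kind and position in the hierarchy:</p>
<pre class="language-sql"><code class="language-sql">=# SELECT name, type, path, total_bytes
     FROM pg_backend_memory_contexts
    ORDER BY total_bytes DESC
    LIMIT 5;</code></pre>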
<p>Ah, one last one: <code>wal_buffers_full</code> was added to <code>pg_stat_statements</code> to let us tune <code>wal_buffers</code> with better insight.</p>
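<p>For example, to find the queries that most often ran into a full WAL buffer (assuming the <code>pg_stat_statements</code> extension is installed):</p>
<pre class="language-sql"><code class="language-sql">=# SELECT query, calls, wal_buffers_full
     FROM pg_stat_statements
    ORDER BY wal_buffers_full DESC
    LIMIT 5;</code></pre>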
<h3 id="replication"><a class="heading-anchor" href="#replication">Replication</a></h3>
<p>Logical replication now provides better insight into conflict management, thanks to new attributes in <a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS" rel="noopener">pg_stat_subscription_stats</a>. As a reference, this excerpt from <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6c2b5edec" rel="noopener">the commit entry</a> lists the attributes that were introduced:</p>
<ul class="list">
<li>
<p><code>confl_insert_exists</code>:<br>
Number of times a row insertion violated a NOT DEFERRABLE unique<br>
constraint.</p>
</li>
<li>
<p><code>confl_update_origin_differs</code>:<br>
Number of times an update was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_update_exists</code>:<br>
Number of times that the updated value of a row violates a<br>
NOT DEFERRABLE unique constraint.</p>
</li>
<li>
<p><code>confl_update_missing</code>:<br>
Number of times that the tuple to be updated is missing.</p>
</li>
<li>
<p><code>confl_delete_origin_differs</code>:<br>
Number of times a delete was performed on a row that was<br>
previously modified by another origin.</p>
</li>
<li>
<p><code>confl_delete_missing</code>:<br>
Number of times that the tuple to be deleted is missing.</p>
</li>
</ul>
<h3 id="advanced"><a class="heading-anchor" href="#advanced">Advanced</a></h3>
<p>There is now a <a href="https://www.postgresql.org/docs/18/functions-admin.html#FUNCTIONS-ADMIN-STATSMOD" rel="noopener">new set of functions</a> to manage relation and attributes stats (<code>relpages</code>, <code>avg_width</code>, and so on). This gives you the freedom to export, import, and adjust stats as you want, so you can replicate planner behavior outside of “production”, maintain patched stats, and so on.</p>
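<p>As a sketch (the argument names follow the PostgreSQL 18 documentation for <code>pg_restore_relation_stats</code>; double-check them there, and the table name here is illustrative), you could pretend a table is much larger than it is and watch planner behavior change:</p>
<pre class="language-sql"><code class="language-sql">-- Inflate the stats of a hypothetical table
=# SELECT pg_restore_relation_stats(
            'schemaname', 'public',
            'relname',    'my_table',
            'relpages',   100000::integer,
            'reltuples',  10000000::real);

-- Back to reality: recompute real stats
=# ANALYZE public.my_table;</code></pre>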
<h3 id="my-favorite-for-extension-authors-the-new-c-stats-api"><a class="heading-anchor" href="#my-favorite-for-extension-authors-the-new-c-stats-api">My favorite for extension authors: the new C stats API</a></h3>
<p>One of the most exciting parts is what PostgreSQL 18 <em>opens up</em> for extension authors.</p>
<p>This tiny line at the bottom of section <a href="https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MODULES" rel="noopener">E.1.3.9 Modules</a> is what covers these changes:</p>
<blockquote>
<p>Allow extensions to use the server’s cumulative statistics API (Michael Paquier)</p>
</blockquote>
<p>Previously statistics manipulation was an internal-only affair; now there is an official, structured API surface you can build on (or wrap).</p>
<p>The <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7949d9594" rel="noopener">commit message</a> is well written and covers most of the new functionality. A subset of the options is <a href="https://www.postgresql.org/docs/18/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS" rel="noopener">detailed in the documentation</a>. However, you will need to go into the source code to learn more at this stage; in particular, it’s worth having a look at the <code>injection_points</code> extension (provided in core), which uses the new API.</p>
<p>For a deeper dive into how an extension can leverage these new capabilities, soon you will be able to see <strong>PACS (PostgreSQL Advanced Cumulative Statistics)</strong> on Codeberg - my project that provides a wrapper library and helper utilities around the new PostgreSQL 18 statistics APIs.</p>
<p>In the meantime, the talk I gave at <a href="https://archive.fosdem.org/2025/schedule/event/fosdem-2025-4496-stats-roll-baby-stats-roll-/" rel="noopener">FOSDEM 2025</a> explores these topics in greater detail.</p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
    <entry>
      <title>Most Desired Database Three Years Running: PostgreSQL's Developer Appeal</title>
      <link href="https://www.data-bene.io/en/blog/most-desired-database-three-years-running-postgresqls-developer-appeal/" />
      <updated>2025-08-09T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/most-desired-database-three-years-running-postgresqls-developer-appeal/</id>
     <content type="html"><![CDATA[ <p>PostgreSQL is having more than just a moment—it’s establishing a clear pattern of sustained excellence. For the third consecutive year, this community-driven database has claimed the top spot in the 2025 results for <a href="https://survey.stackoverflow.co/2025/" rel="noopener">Stack Overflow’s Annual Developer Survey</a>, and the results reveal both what developers value today and where the database landscape is heading.</p>
<p>The survey results show that, for the third year in a row, PostgreSQL ranks highest among all database technologies both for developers who want to use it in the next year (47%) and for those who used it this year and want to keep using it next year (66%).</p>
<h2 id="the-numbers-tell-a-compelling-story"><a class="heading-anchor" href="#the-numbers-tell-a-compelling-story"><strong>The Numbers Tell a Compelling Story</strong></a></h2>
<p>The survey data from over 49,000 developers across 177 countries provides clear evidence of PostgreSQL’s sustained appeal. Since 2023, PostgreSQL has consistently ranked as both the most desired and most admired database technology among developers.</p>
<p>Looking at the specific metrics from the survey visualizations, PostgreSQL leads with 46.5% of developers wanting to work with it in the coming year, while an impressive 65.5% of those who have used it want to continue doing so. These aren’t just impressive numbers—they represent a consistency that’s rare in the rapidly changing technology landscape.</p>
<p>The survey data also reveals an interesting pattern among developers currently using other database technologies. Developers working with MongoDB and Redis show a particularly strong desire to add PostgreSQL to their toolkit next year, seeing the value in adding relational database skills to their repertoire.</p>
<h2 id="the-community-advantage-in-action"><a class="heading-anchor" href="#the-community-advantage-in-action"><strong>The Community Advantage in Action</strong></a></h2>
<p>Why has PostgreSQL achieved this level of sustained success? The answer lies in its community-driven development model. As an open source project, PostgreSQL benefits from collaborative development that is both transparent and responsive to real-world needs.</p>
<p>The PostgreSQL project represents the best of what community-driven development can achieve. With over 400 code contributors across more than 140 supporting companies, the project boasts over 55,000 commits and more than 1.6 million lines of carefully crafted code. This diverse, globally distributed approach to development results in more thorough testing, faster bug fixes, and more innovative features than traditional commercial development models typically achieve.</p>
<p>Major versions are released annually with approximately 180 features per release, complemented by quarterly minor releases that include numerous improvements and fixes. This steady cadence of innovation, contributed by individuals all over the world, ensures PostgreSQL doesn’t just keep pace with developer needs—it anticipates them. More than that, every individual has the agency to contribute to the project, ensuring that wherever the software lags behind, functionality evolves to address modern demands.</p>
<h2 id="more-than-just-a-relational-database"><a class="heading-anchor" href="#more-than-just-a-relational-database"><strong>More Than Just a Relational Database</strong></a></h2>
<p>One key factor in PostgreSQL’s broad appeal is that it’s not limited to being just a relational database system. PostgreSQL is object-relational by design, capable of handling diverse data types including JSON/JSONB, XML, Key-Value, geometric, geospatial, native UUID, and time-series data. This versatility explains why developers from NoSQL backgrounds find PostgreSQL attractive—it offers relational reliability while maintaining the flexibility they’re accustomed to.</p>
<p>The extensive support for different data types, combined with ACID (Atomicity, Consistency, Isolation, Durability) characteristics, enables optimized, performant, and reliable data handling regardless of the specific requirements in place. Additionally, PostgreSQL’s huge community-driven extension network builds on its native extensibility, providing solutions for geospatial handling, disaster recovery, high availability infrastructure, monitoring, auditing, and much more.</p>
<h2 id="the-broader-database-landscape"><a class="heading-anchor" href="#the-broader-database-landscape"><strong>The Broader Database Landscape</strong></a></h2>
<p>While PostgreSQL dominates the top positions, the survey reveals a healthy, competitive database ecosystem. The complete rankings show:</p>
<p><strong>Most Desired Databases:</strong></p>
<ul class="list">
<li>PostgreSQL: 46.5%</li>
<li>SQLite: 28.3%</li>
<li>Redis: 23.5%</li>
<li>MySQL: 20.5%</li>
<li>MongoDB: 17.6%</li>
</ul>
<p><strong>Most Admired Databases:</strong></p>
<ul class="list">
<li>PostgreSQL: 65.5%</li>
<li>SQLite: 59%</li>
<li>Redis: 54.9%</li>
<li>MongoDB: 45.7%</li>
<li>MySQL: 43.2%</li>
</ul>
<p>These numbers reflect a diverse ecosystem where different databases serve specific purposes. SQLite’s strong performance highlights the continued importance of lightweight, embedded solutions. Redis maintains its position as a highly regarded specialized database for caching and real-time applications. Traditional databases like MySQL and Microsoft SQL Server continue to hold significant positions, while newer technologies like DuckDB show impressive admiration scores despite lower usage rates.</p>
<h2 id="the-foundation-of-postgresqls-enduring-success"><a class="heading-anchor" href="#the-foundation-of-postgresqls-enduring-success"><strong>The Foundation of PostgreSQL’s Enduring Success</strong></a></h2>
<p>Three consecutive years at the top of developer preferences doesn’t happen by accident. PostgreSQL’s sustained dominance stems from fundamental strengths that continue to serve developers well as technology landscapes shift. The resilience built into PostgreSQL through its community-driven development model means it adapts without losing stability. Its extensibility sets it apart in practical ways—rather than waiting for vendor roadmaps or worrying about feature gaps, developers can build what they need or leverage the extensive ecosystem of community extensions. The open source nature ensures PostgreSQL remains focused on developer needs rather than business models, with bug fixes happening quickly and features developing based on real-world use cases.</p>
<p>After 35 years of active development and three consecutive years as the most desired database technology, PostgreSQL has proven that community-driven open source development can deliver both immediate utility and long-term value. For developers and organizations looking at their database choices, PostgreSQL offers something increasingly rare: a technology that gets better over time without leaving its users behind.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Return of HOW2025</title>
      <link href="https://www.data-bene.io/en/blog/return-of-how2025/" />
      <updated>2025-07-15T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/return-of-how2025/</id>
     <content type="html"><![CDATA[ <p>The Highgo Open World conference, dedicated to the PostgreSQL ecosystem and the IvorySQL project, was held on June 27 and 28. The event was a resounding success: nearly 1,000 attendees on site, up to 8,000 simultaneous connections to the streams, and approximately 25,000 viewers in total.</p>
<p>The program featured 101 technical talks led by 105 speakers. The majority of sessions were in Mandarin, with an English track offered with simultaneous translation. You can view the full program at <a href="https://ivorysql.io/schedule/" rel="noopener">IvorySQL.io</a> and find the replays on Weibo via <a href="https://ivorysql.io/2025/06/27/live-access-june27/" rel="noopener">this link.</a></p>
<p>There was a small group of the international community present, including <a href="https://postgresql.life/post/grant_zhou/" rel="noopener">Grant Zhou</a>, who liaises and collaborates with the PostgreSQL association in China and the rest of the world.</p>
<p>On a technical level, the content was dense and particularly interesting. I particularly noted:</p>
<ul class="list">
<li>
<p>Alena Rybakina’s presentation on the PostgreSQL query planner and strategies for circumventing certain limitations.</p>
</li>
<li>
<p>A clear and concrete focus on Patroni (High Availability) by Alexander Kukushkin and Polina Bungina.</p>
</li>
<li>
<p>Florents Tselai presented two topics applying his guiding principles of simplicity and efficiency: using “AI” with PostgreSQL, and data management illustrated with Sun Tzu and the 36 Stratagems.</p>
</li>
<li>
<p>Also a very good introduction to Bazel and its use for Monogres (to be officially announced soon) presented energetically by Alvaro Hernandez.</p>
<p>Monogres is a very interesting initiative that should help strengthen control over the software supply chain, a major theme in IT today. And I also saw it as a great opportunity to showcase PostgreSQL variations with features and fixes that aren’t always possible to include in PostgreSQL itself or backport to previous major releases.</p>
</li>
<li>
<p>Michael Meskes had the honor of giving a plenary lecture on a topic that richly deserves it: “From Code to Commerce: Open Source Business Models Today,” a keynote on open-source and free software, applied to the PostgreSQL ecosystem.</p>
</li>
<li>
<p>My colleague Andrea presented the developments and trends of companies moving to IvorySQL and PostgreSQL.</p>
</li>
<li>
<p>For my part, I presented Linux PSI in the PostgreSQL context.</p>
</li>
</ul>
<p>Since everything is recorded, I encourage you to explore and watch the topics that interest you. There were also pre-recorded lectures in English during the event, but I admit that I took advantage of the time during the sessions to interact with participants.</p>
<p>Aside from the conferences, I had the chance to meet several members of the Chinese PostgreSQL community who are very well known for their involvement in the success of PostgreSQL locally. I also had the opportunity to learn more about Cloudberry, which is replacing Greenplum, thanks to Dianjin Wang!</p>
<hr>
<p>Data Bene also planned a time to meet with the IvorySQL team, based largely in Shandong, the province where Jinan, the host city of the conferences, is located. Ivory is a project in which we are actively involved and which allows companies to move away from Oracle “smoothly.” This is an important topic for our clients and one that occupies a prominent place in our partnership with Highgo: they have been working on this project for several years now, and we want to enable companies everywhere to benefit from it with appropriate support and expertise.</p>
<hr>
<p>The conferences were very well organized and the welcome was wonderful; the “Social Event” at the local “beer garden” was perfectly suited to the heat of Jinan at the end of June!</p>
<p>Given the conference program, I bitterly regretted not understanding anything (there was 1 track in English and 5 in Mandarin)… but it is already being said that next year the Mandarin conferences could perhaps be translated (into English), and the date brought forward to May to take advantage of a milder climate.</p>
<p>There is so much to learn there that I will gladly return.</p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
    <entry>
      <title>A visit to PGConf.DE 2025 and discussion of PostgreSQL within the context of life sciences</title>
      <link href="https://www.data-bene.io/en/blog/a-visit-to-pgconfde-2025-and-discussion-of-postgresql-within-the-context-of-life-sciences/" />
      <updated>2025-06-06T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/a-visit-to-pgconfde-2025-and-discussion-of-postgresql-within-the-context-of-life-sciences/</id>
     <content type="html"><![CDATA[ <p>It’s always a pleasure to attend Postgres events, and <a href="http://PGConf.DE" rel="noopener">PGConf.DE</a> 2025 in Berlin was no different. This year’s event reunited old friendships and offered an open and welcoming environment to form new ones. And, of course, it also boasted numerous exciting talks!</p>
<p>At the conference I had the opportunity to present on Postgres within the context of the life sciences (discussed in the next section). And, altogether, I felt this conference had a nice diversity of talks: a selection that spanned Postgres core, its ecosystem, and beyond.</p>
<p>I’m confident that by the end, most if not all attendees left more enriched in some way relative to when they arrived.</p>
<h2 id="presentations"><a class="heading-anchor" href="#presentations">Presentations</a></h2>
<p>Leading up to this event I had the honor of one of my talks being accepted. The title was “<a href="https://www.postgresql.eu/events/pgconfde2025/schedule/session/6541-postgres-and-life-science-from-cells-to-stars/" rel="noopener">Postgres and Life Science: From Cells to Stars</a>” and it was organized as a meta-analysis / homage to the extensibility of Postgres and its various applications to the natural world.</p>
<p>In order to best tell this story, I walked the audience through the following five topics of increasing scope:</p>
<ul class="list">
<li>Neuronal mapping with a PostGIS-supported GUI
<ul class="list">
<li><a href="https://github.com/catmaid/CATMAID" rel="noopener">CATMAID source code</a></li>
</ul>
</li>
<li>Hydrological examination of rivers with the PgHydro extension
<ul class="list">
<li><a href="https://github.com/pghydro/pghydro" rel="noopener">PgHydro source code</a></li>
</ul>
</li>
<li>Fish biomass meta-analysis leveraging vanilla Postgres
<ul class="list">
<li><a href="https://www.nature.com/articles/s41597-024-04026-0" rel="noopener">Link to peer-reviewed publication</a></li>
</ul>
</li>
<li>COVID-19 dashboard using the Citus extension
<ul class="list">
<li><a href="https://www.citusdata.com/blog/2021/12/11/uk-covid-19-dashboard-built-using-postgres-and-citus/" rel="noopener">Link to blog post</a></li>
</ul>
</li>
<li>Star classification built on forked Postgres and altered extensions
<ul class="list">
<li><a href="https://indico.cern.ch/event/1471762/contributions/6280216/" rel="noopener">Link to presentation at Cern PGDay - 2025</a></li>
</ul>
</li>
</ul>
<p>I enjoyed putting together and presenting the talk, and there was nice discussion afterwards. Two points stood out in particular that I felt would be interesting to address here:</p>
<ol class="list">
<li>
<p>What three technologies (tools / workflows) would benefit most greatly, in terms of increased impact or adoptability, if their complexities were significantly reduced / abstracted away?</p>
</li>
<li>
<p>During my talk I made a claim that the brain was ACID compliant. While I was referring mostly to the action potentials of neurons, this was rightfully challenged.</p>
</li>
</ol>
<h3 id="1-identified-tools-/-workflows"><a class="heading-anchor" href="#1-identified-tools-/-workflows">1. Identified Tools / Workflows</a></h3>
<p><em>1. What three technologies (tools / workflows) would benefit most greatly, in terms of increased impact or adoptability, if their complexities were significantly reduced / abstracted away?</em></p>
<h4 id="1-image-vectorization"><a class="heading-anchor" href="#1-image-vectorization">1. Image vectorization</a></h4>
<p>Right out of the gate I thought about magnetic resonance scanner image classification. There’s quite a lot of conversation surrounding this topic within the medical community and there are plenty of startups in this space as well. My personal opinion is that there is momentum in the direction of accessibility, but there is still a strong separation between developer and end user. While I don’t know the answer at this point, I would look into <a href="https://github.com/pgvector/pgvector" rel="noopener">pgvector</a> and <a href="https://github.com/postgresml/postgresml" rel="noopener">postgresml</a> as a starting point. Since this challenge involves vectors and machine learning, I would consider leveraging an image embedding service to turn the raw MRI output into a format that pgvector might be able to work with.</p>
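<p>As a rough illustration of that direction, here is a toy sketch with invented four-dimensional “embeddings” standing in for the output of an imaging model, ranked by cosine distance the way pgvector’s <code>&lt;=&gt;</code> operator would rank them inside the database. All names and vectors below are made up.</p>

```python
# Toy sketch: nearest-neighbor lookup over made-up MRI embeddings.
# A real pipeline would get vectors from an embedding model and let
# pgvector do the ranking in-database (ORDER BY embedding <=> $1 LIMIT k).
import math

def cosine_distance(a, b):
    """1 - cosine similarity, the metric pgvector exposes as <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Pretend these came from an embedding service run over labeled scans.
reference_scans = {
    "healthy-001": [0.9, 0.1, 0.0, 0.2],
    "lesion-014":  [0.1, 0.8, 0.6, 0.0],
    "healthy-027": [0.8, 0.2, 0.1, 0.3],
}

query = [0.85, 0.15, 0.05, 0.25]  # embedding of a new, unlabeled scan

# Rank reference scans by distance to the query vector.
ranked = sorted(reference_scans,
                key=lambda k: cosine_distance(reference_scans[k], query))
print(ranked[0])  # → healthy-001
```

The point is not the arithmetic but where it runs: with pgvector, the distance computation and the index that accelerates it live next to the data, so the application only ships a query vector.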
<h4 id="2-data-management-and-version-control"><a class="heading-anchor" href="#2-data-management-and-version-control">2. Data management and version control</a></h4>
<p>As a former academic, I can speak to the ubiquity of the common spreadsheet (.csv format being less common, but still utilized). What’s more, files are typically stored in local directories, on private servers, or on shared infrastructure, but in a vanilla folder architecture nevertheless. One can imagine the potential frictions as the conversation scales to include multiple researchers across multiple groups. Factor in a naturally high student turnover, paired with an “I like doing it my way” mentality, and one can appreciate the value of standards. While improvements could be approached from a number of different angles, I’d like to focus on data management and version control.</p>
<p>Tidy data and good organizational hygiene are hallmarks of success in any field of study. However, tracking changes is most often, if not exclusively, limited to text documents. While it might be surprising to the reader, “code repository” is not part of the common academic lexicon. Even the term “Linux” evokes an air of “mysterium tremendum et fascinans” (Otto, 1923). With data security top of mind, self-hosted options such as <a href="https://forgejo.org/" rel="noopener">Forgejo</a> could potentially benefit life scientists greatly - particularly if there are reservations about storing data online. Instead of keeping multiple file drafts, e.g., “draft-1_final”, “draft_final_final”, etc., tools such as Forgejo can help track progress and give researchers more transparency into past changes (leading to easier cross-team collaboration).</p>
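<p>For readers curious what that first step looks like, here is a minimal sketch (directory, file, and identity names are all invented) of putting a spreadsheet under Git version control; this is the same history a self-hosted Forgejo instance would then host and make browsable for a whole lab:</p>

```shell
# Create a repository for a (hypothetical) field-data spreadsheet.
mkdir -p fish-survey
git init -q fish-survey

# A tiny stand-in for the usual lab spreadsheet.
printf 'site,species,biomass_g\nriver-01,salmo_trutta,412\n' \
    > fish-survey/measurements.csv

# Record it, with an inline identity so the example is self-contained.
git -C fish-survey add measurements.csv
git -C fish-survey -c user.name=researcher -c user.email=r@lab.example \
    commit -q -m "Add first field measurements"

# One described, dated commit instead of draft-1_final.csv,
# draft_final_final.csv, and so on.
git -C fish-survey log --oneline
```

Every later edit becomes another commit with an author and a message, which is exactly the transparency into past changes described above.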
<h4 id="3-compliance-and-auditing"><a class="heading-anchor" href="#3-compliance-and-auditing">3. Compliance and auditing</a></h4>
<p>Trust is a central topic in any field of research, and in certain circumstances, auditing (or otherwise some form of proof of work) may take center stage. In this case, Postgres and one of its companion extensions, <a href="https://github.com/pgaudit/pgaudit" rel="noopener">pgaudit</a>, can offer a nice step towards compliance. Because of its capabilities, Postgres can sometimes be viewed as intimidating and only suitable for large projects. I think a “Postgres for small-scale projects” type guide could go a long way toward broader adoption.</p>
<h4 id="discovery-and-exposure"><a class="heading-anchor" href="#discovery-and-exposure">Discovery and exposure</a></h4>
<p>At the end of the day, no one will willingly use something unless they know it exists. That’s why discoverability is one of the most fundamental concepts when discussing impact and adoptability. It’s up to the maintainers, contributors, and communities behind these open source tools to share what they’re up to on multiple platforms, as well as different conferences. Honestly, the easiest way to help is to just talk about it and get hands-on.</p>
<h3 id="2-the-brain-and-acid-compliance"><a class="heading-anchor" href="#2-the-brain-and-acid-compliance">2. The Brain and ACID Compliance</a></h3>
<p><em>2. During my talk I made a claim that the brain was ACID compliant. While I was referring mostly to the action potentials of neurons, this was rightfully challenged.</em></p>
<p>This was another exciting conversation in the post-presentation discussion, and while this really warrants its own blog post, I wanted to quickly share my thoughts. Within one of my slides, I made the claim that the brain is ACID compliant, at least in the sense of transactions being all-or-nothing. Neurons, a common cell type in the brain, receive signals that accumulate until a threshold is reached, at which point the neuron sends a signal of its own, or “fires.” This is a gross oversimplification: here’s a quick <a href="https://en.wikipedia.org/wiki/Action_potential" rel="noopener">Wikipedia link</a> for more information.</p>
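<p>The all-or-nothing behavior is easy to caricature in code. Below is a toy integrate-and-fire neuron (the threshold and signal values are invented, and this is nowhere near a biophysical model):</p>

```python
# Toy integrate-and-fire neuron: inputs accumulate, and the output is
# all-or-nothing -- the (loose) sense in which "atomic" was meant above.
class ToyNeuron:
    def __init__(self, threshold=5.0):
        self.threshold = threshold
        self.potential = 0.0

    def receive(self, signal):
        """Accumulate incoming signals; fire only once the threshold is crossed."""
        self.potential += signal
        if self.potential >= self.threshold:
            self.potential = 0.0   # reset after firing
            return True            # a full action potential -- never a partial one
        return False

neuron = ToyNeuron()
spikes = [neuron.receive(s) for s in [1.0, 2.0, 1.5, 1.0, 3.0]]
print(spikes)  # → [False, False, False, True, False]
```

The fourth input pushes the running sum past the threshold, producing one complete “transaction”; the sub-threshold inputs produce nothing at all, which is the analogy the slide was reaching for.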
<p>However, astute audience members rightly noted that the brain is complex and has different regions. There is memory loss, and there are activities that can alter function and consciousness. Yet to what extent do external influences on the brain correspond to those on a database system? If something corrupts a Postgres database, it is no longer ACID compliant, but it was beforehand. All these points are both valid and interesting, and it will be worth thinking on this further and writing a more formal response.</p>
<h2 id="concluding-thoughts"><a class="heading-anchor" href="#concluding-thoughts">Concluding Thoughts</a></h2>
<p>To sum things up, this was a great conference. I know I speak for all attendees when I extend a thank you to all involved, whether they be staff, volunteers, speakers, or otherwise.</p>
<h2 id="references"><a class="heading-anchor" href="#references">References</a></h2>
<p>Foote, K. J., Grant, J. W. A., &amp; Biron, P. M. (2024). A global dataset of salmonid biomass in streams. Scientific data, 11(1), 1172. <a href="https://doi.org/10.1038/s41597-024-04026-0" rel="noopener">https://doi.org/10.1038/s41597-024-04026-0</a></p>
<p>Giordano, C., &amp; Hadjibagheri, P. (2021, December 11). UK COVID-19 dashboard built using Postgres and Citus for millions of users. Microsoft TechCommunity Blog. <a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/uk-covid-19-dashboard-built-using-postgres-and-citus-for/ba-p/3039052" rel="noopener">https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/uk-covid-19-dashboard-built-using-postgres-and-citus-for/ba-p/3039052</a></p>
<p>Kazimiers, T., et al. (2021). CATMAID (Collaborative Annotation Toolkit for Massive Amounts of Image Data) [Computer software]. GitHub. <a href="https://github.com/catmaid/CATMAID" rel="noopener">https://github.com/catmaid/CATMAID</a></p>
<p>Krefl, D., &amp; Nienartowicz, K. (2025, January 17). Harnessing Postgres and HPC for petabyte-scale variable star classification in astronomy [Conference presentation]. CERN PGDay 2025, Geneva, Switzerland. <a href="https://indico.cern.ch/event/1336647/contributions/5660229/" rel="noopener">https://indico.cern.ch/event/1336647/contributions/5660229/</a></p>
<p>Otto, R. (1923). The idea of the holy: An inquiry into the non-rational factor in the idea of the divine and its relation to the rational (J. W. Harvey, Trans.). Oxford University Press. (Original work published 1917)</p>
<p>Teixeira, A. de A., &amp; PgHydro Project. (2022). pghydro (Version 6.6) [Computer software]. GitHub. <a href="https://github.com/pghydro/pghydro" rel="noopener">https://github.com/pghydro/pghydro</a></p>
<p>Wikipedia contributors. (2025, May 16). Action potential. Wikipedia, The Free Encyclopedia. <a href="https://en.wikipedia.org/wiki/Action_potential" rel="noopener">https://en.wikipedia.org/wiki/Action_potential</a></p>
 ]]></content>
			<author>
				<name>Evan Stanton</name>
			</author>
    </entry>
    <entry>
      <title>SCaLE 22x: Bringing the Open Source Community to Pasadena</title>
      <link href="https://www.data-bene.io/en/blog/scale-22x-bringing-the-open-source-community-to-pasadena/" />
      <updated>2025-06-02T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/scale-22x-bringing-the-open-source-community-to-pasadena/</id>
     <content type="html"><![CDATA[ <p>The Southern California Linux Expo (SCaLE) 22x, recognized as being North America’s largest community-run open source and free software conference, took place at the Pasadena Convention Center from March 6-9, 2025. <em>When I say community-run, I mean it—no corporate overlords dictating the agenda, just pure open source enthusiasm driving four days of technical discussions and collaboration.</em></p>
<p>This year’s conference focused on the topics of AI, DevOps and cloud-native technologies, open source community engagement, security and compliance, systems and infrastructure, and FOSS @ home (exploring the world of self-hosted applications and cloud services).</p>
<p>The conference drew attendees from around the world to talk about everything open-source, revolving around Linux at the core (of course) while extending the discussion across topics such as embedded systems &amp; IoT. As always, every space offered that unique blend of cutting-edge tech talk and practical problem-solving that makes SCaLE special.</p>
<h2 id="herding-elephants-postgresql@scale22x"><a class="heading-anchor" href="#herding-elephants-postgresql@scale22x"><strong>Herding Elephants: PostgreSQL@SCaLE22x</strong></a></h2>
<p>PostgreSQL@SCaLE22x ran as a dedicated two-day, two-track event on March 6-7, 2025, recognized under the PostgreSQL Global Development Group community event guidelines. The selection team included Gabrielle Roth, Joe Conway, and Mark Wong, ensuring the quality you’d expect from the PostgreSQL community.</p>
<p>The speaker lineup was impressive: Magnus Hagander, Christophe Pettus, Peter Farkas, Devrim Gündüz, Hamid Akhtar, Henrietta Dombrovskaya, Shaun Thomas, Gülçin Yıldırım Jelínek &amp; Andrew Farries, Nick Meyer, and Jimmy Angelakos. One particularly memorable session was titled “Row-Level Security Sucks. Can We Make It Usable?”—a refreshingly honest take on PostgreSQL’s RLS feature that probably resonated with more than a few database administrators in the audience.</p>
<p>The community “Ask Me Anything” panel was hosted by Stacey Haysler and featured Christophe Pettus, Devrim Gündüz, Jimmy Angelakos, Magnus Hagander, and Mark Wong. These sessions are where the real knowledge transfer happens—no marketing speak, just practitioners talking shop about PostgreSQL internals, performance, best practices, and the future of the database.</p>
<p>Behind the scenes, volunteers Derya Gumustel, Erika Miller, Hamid Akhtar, Jennifer Scheuerell, Mark Wong, and Roberto Mello kept everything running smoothly, with PGUS hosting the booth in the expo hall.</p>
<p>Personally, I had the pleasure of collaborating with Jimmy Angelakos during his <a href="https://vyruss.org/blog/scale-22x-live-streams-row-level-security-sucks.html" rel="noopener">live streaming sessions</a> featuring other guests like Henrietta Dombrovskaya, Mark Wong, Gülçin Yıldırım Jelínek, and even a brief cameo from Devrim Gündüz.</p>
<p><em>One of the topics discussed with Gülçin Yıldırım Jelínek on the podcast is whether or not there’s any community interest in continuing <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>. What do you think? Do you want to see more episodes from this podcast series, expanding discussions on extension and open source development to the rest of the community and beyond? Let us know: <a href="mailto:contact@data-bene.io">contact@data-bene.io</a></em></p>
<h2 id="something-for-everyone"><a class="heading-anchor" href="#something-for-everyone"><strong>Something for Everyone</strong></a></h2>
<p>There were a number of co-located events besides PostgreSQL @ SCaLE, including “SCaLE: The Next Generation (TNG),” a youth-focused tech event with interactive activities and presentations for students, and the annual Cybersecurity Capture the Flag (CTF) game presented by Cal Poly FAST and Pacific Hackers.</p>
<p>SCaLE remains an excellent place to network when looking to advance your career in open source. Socializing at the booths is always an excellent way to make connections and find opportunities, of course, but Open Source Career Day also returned in order to offer a dedicated space for professionals and aspiring technologists to become empowered with resources, tools, real-world examples, and engaging content from presentations and workshops.</p>
<p>The fun tradition of holding a Saturday Game Night with food &amp; drinks also continued this year, with Trivia Night (presented by Uncoded) and other activities such as inflatable axe throwing, Nerf target practice, arts &amp; crafts, a board game room, casino night, &amp; a blocks room for building derby cars, playing Pictionary, or building with large blocks.</p>
<h2 id="keep-your-calendar-open-for-scale-23x"><a class="heading-anchor" href="#keep-your-calendar-open-for-scale-23x"><strong>Keep Your Calendar Open for SCaLE 23x</strong></a></h2>
<p>SCaLE has established itself as a consistent presence in Pasadena, and this stability has allowed the conference to build lasting relationships with the local community and venues. Keep an eye out for SCaLE 23x announcements - it promises to be well worth the visit.</p>
<p>For those interested in PostgreSQL@SCaLE specifically, stay tuned to the PostgreSQL mailing lists for announcements about volunteering, speaking opportunities, or other ways to participate in next year’s event. The PostgreSQL track and booth are a consistent source of engaging discussions amongst those in the Postgres community and beyond, reflecting the database’s growing adoption across industries.</p>
<h2 id="the-open-source-gathering-for-one-and-all"><a class="heading-anchor" href="#the-open-source-gathering-for-one-and-all"><strong>The Open Source Gathering for One and All</strong></a></h2>
<p>In a world where many tech conferences feel more like elaborate vendor showcases, SCaLE remains that rare gathering where community comes first, collaboration is genuine, and the technology discussions are driven by practitioners solving real problems. Mark your calendars for SCaLE 23x—this is one conference that consistently delivers on its promise of bringing together open source enthusiasts to actually collaborate and learn.</p>
<p>Wish you hadn’t missed out? You can always check out the <a href="https://www.youtube.com/playlist?list=PLh1QjGnfC2eREVHe8shz8Db7jGJvZerCK" rel="noopener">YouTube playlist of talks</a> that were recorded during the conference to at least benefit from the knowledge contained therein.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Once Upon a Time in a Confined Database - PostgreSQL, QRCodes, and the Art of Backup Without a Network</title>
      <link href="https://www.data-bene.io/en/blog/backup-without-a-network/" />
      <updated>2025-04-01T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/backup-without-a-network/</id>
     <content type="html"><![CDATA[ <h2 id="📦-the-fort-knox-of-databases"><a class="heading-anchor" href="#📦-the-fort-knox-of-databases">📦 The Fort Knox of Databases</a></h2>
<p>Once upon a time, in a faraway server room encased in heavy glass and reinforced concrete, lived a PostgreSQL database so confined, so secluded, it could only dream of the cloud.</p>
<p>No network.</p>
<p>No USB.</p>
<p>No writable external storage device.</p>
<p>Just a keyboard, a monitor, and the hum of industrial-grade air filters.</p>
<p>This wasn’t your average air-gapped setup. This was a zero-exfiltration zone, with operational security so tight you’d think it was guarding state secrets—or worse, legacy banking software.</p>
<p>And yet, in this digital oubliette, one innocent challenge remained:<br>
<strong>How do you back up a PostgreSQL database without ever extracting a file?</strong></p>
<hr>
<h2 id="🎥-when-screens-are-your-network"><a class="heading-anchor" href="#🎥-when-screens-are-your-network">🎥 When Screens Are Your Network</a></h2>
<p>Our customer didn’t just want backups—they <em>needed</em> them. The fear wasn’t theft, it was <strong>total failure</strong>: a motherboard dying quietly in its glass sarcophagus, taking the data with it. And if it came to that, chiseling through reinforced architecture wasn’t a viable disaster recovery plan.</p>
<p>We brainstormed everything:</p>
<ul class="list">
<li>OCR of scrolling SQL dumps? Too lossy.</li>
<li>Filming the <code>psql</code> output? Way too verbose.</li>
<li>Printing out hex? Please, we’re not monsters.</li>
</ul>
<p>And then came the epiphany: <strong>QR codes</strong>.</p>
<p>What if we could <code>pg_dump</code> the database…</p>
<p>into QR codes…</p>
<p>on the screen…</p>
<p>captured by a high-speed camera…</p>
<p>then reassembled frame by frame outside the vault?</p>
<p>It was so absurd, it just might work.</p>
<hr>
<h2 id="🧠-hacking-pg_dump-now-with-more-pixels"><a class="heading-anchor" href="#🧠-hacking-pg_dump-now-with-more-pixels">🧠 Hacking <code>pg_dump</code>: Now with More Pixels</a></h2>
<p>PostgreSQL’s beloved <code>pg_dump</code> tool is modular. So we extended it with a custom archiver: <code>--format=qrcode</code>.</p>
<p>Here’s how it works:</p>
<ol class="list">
<li>
<p><strong>QR encoding</strong>: Each chunk of SQL output is encoded into a PNG QR code.</p>
</li>
<li>
<p><strong>Streaming</strong>: Instead of saving to disk, we push the stream of PNGs directly to <code>stdout</code>.</p>
</li>
<li>
<p><strong>Framing</strong>: Our UI lays out multiple QR codes on a single screen using high-DPI output. (We’re talking 2000+ pixels here—room for a whole grid of codes.)</p>
</li>
<li>
<p><strong>Playback</strong>: A dedicated machine with a 1280Hz high-speed camera films the screen, capturing the sequence as a video.</p>
</li>
</ol>
<p>No keyboard macros. No sneaky uploads. Just photons and frames.</p>
<hr>
<h2 id="🔍-reassembly-outside-the-glass"><a class="heading-anchor" href="#🔍-reassembly-outside-the-glass">🔍 Reassembly Outside the Glass</a></h2>
<p>Once the video is extracted from the glass box:</p>
<ul class="list">
<li>
<p>Our parser watches the footage, frame by frame.</p>
</li>
<li>
<p>QR codes are detected and decoded in parallel.</p>
</li>
<li>
<p>Each chunk is sequence-tagged for ordering.</p>
</li>
<li>
<p>The resulting text is reassembled into a proper <code>pg_dump.sql</code>.</p>
</li>
</ul>
<p>And just like that: the database lives again—<strong>fully exported with no digital transfer</strong>.<br>
Only light and lenses.</p>
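<p>Stripped of the screens and cameras, the chunk-tagging and reassembly logic reduces to something like the following sketch. The 8-byte header (4-byte chunk index plus 4-byte total count) and the chunk size are invented for illustration, not the actual archiver format:</p>

```python
# Sketch of the chunk / tag / reassemble pipeline. In the real setup each
# tagged chunk becomes one QR code on screen; here we just round-trip bytes.
import random
import struct

CHUNK_SIZE = 2900  # roughly one high-capacity QR code worth of binary payload

def to_tagged_chunks(data: bytes):
    """Split data into chunks, prefixing each with (index, total) tags."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    total = len(chunks)
    return [struct.pack(">II", i, total) + c for i, c in enumerate(chunks)]

def reassemble(tagged):
    """Order chunks by their sequence tag, as the frame parser does."""
    parsed = []
    for t in tagged:
        index, total = struct.unpack(">II", t[:8])
        parsed.append((index, t[8:]))
    # Refuse to produce a dump with dropped or duplicated frames.
    assert len(parsed) == total and len({i for i, _ in parsed}) == total
    return b"".join(payload for _, payload in sorted(parsed))

dump = b"-- PostgreSQL database dump\n" * 1000  # stand-in for pg_dump output
frames = to_tagged_chunks(dump)
random.shuffle(frames)             # camera frames rarely arrive in order
assert reassemble(frames) == dump  # the dump survives the round trip
```

The sequence tag is what makes the whole scheme tolerant of out-of-order capture, and the total count is what lets the parser prove nothing was lost between screen and lens.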
<hr>
<h2 id="🧩-notes-on-performance-and-fidelity"><a class="heading-anchor" href="#🧩-notes-on-performance-and-fidelity">🧩 Notes on Performance &amp; Fidelity</a></h2>
<ul class="list">
<li>
<p><strong>QR Version</strong>: We used Version 40 QR codes (max capacity) with optimized binary mode for high density.</p>
</li>
<li>
<p><strong>Error Correction</strong>: Level Q for resilience under compression/artifacts.</p>
</li>
<li>
<p><strong>Screen Real Estate</strong>: 25×14 grid of codes per frame on a 1920×1080 pixel monitor—350 chunks per screen, around 1,016KB per frame.</p>
</li>
<li>
<p><strong>Playback Rate</strong>: We achieved ~60 screens/sec = 21,000 chunks/sec, nearly 60MB/sec!</p>
</li>
<li>
<p><strong>Total Export Time</strong>: A full logical backup under 6GB was exported in less than 100 minutes!</p>
</li>
</ul>
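<p>Taking the stated figures at face value (350 chunks per frame, around 1,016KB per frame, 60 frames per second), the throughput claims check out with a little arithmetic:</p>

```python
# Back-of-the-envelope check, using only the figures stated above.
chunks_per_frame = 350
bytes_per_frame = 1016 * 1024   # "around 1,016KB per frame"
frames_per_sec = 60             # "~60 screens/sec"

chunks_per_sec = chunks_per_frame * frames_per_sec
mb_per_sec = bytes_per_frame * frames_per_sec / 1024**2

print(chunks_per_sec)           # 21,000 chunks/sec, as stated
print(round(mb_per_sec, 1))     # just under 60MB/sec, as stated

# At that rate, a 6GB logical dump fits comfortably inside the quoted window:
seconds_for_6gb = 6 * 1024**3 / (bytes_per_frame * frames_per_sec)
print(round(seconds_for_6gb))   # on the order of 100 seconds of playback
```

(The raw playback rate accounts for only a couple of minutes; the rest of the quoted export time goes to everything around it.)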
<hr>
<h2 id="🛡️-why-this-matters"><a class="heading-anchor" href="#🛡️-why-this-matters">🛡️ Why This Matters</a></h2>
<p>This isn’t just a quirky story—it’s a reminder that <strong>PostgreSQL’s flexibility extends even into the absurd</strong>. Air-gapped systems aren’t rare in defense, finance, or critical infrastructure. And when normal tooling fails, PostgreSQL’s pluggable architecture gives you room to innovate, even in the tightest constraints.</p>
<p>We at <strong>Data Bene</strong> live for this kind of challenge. Whether it’s optimizing query plans or designing data exfiltration methods that look like spycraft, we’re here to help you make PostgreSQL dance—even when it’s stuck in a cage.</p>
<hr>
<p><em>Want to try it yourself? Drop us a line—we love weird backups.</em></p>
<p><em>And if you’re thinking of streaming <code>pg_restore</code> into a laser light show, call us. We’re intrigued.</em></p>
 ]]></content>
			<author>
				<name>Cédric Villemain</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Contributing to Open Source</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-contributing-to-open-source/" />
      <updated>2025-03-04T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-contributing-to-open-source/</id>
     <content type="html"><![CDATA[ <p>It’s our sixth episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a collaborative podcast from <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> &amp; <a href="https://xata.io/" rel="noopener">Xata</a> where we discuss everything from PostgreSQL extensions to community contributions. In today’s episode, Sarah Conway &amp; Gülçin Yıldırım Jelinek meet with Andrea Cucciniello on the topic of how companies and individuals can contribute to open source projects, and why they might consider doing so.</p>
<h2 id="episode-6-postgresql-extension-development-the-community-and-beyond"><a class="heading-anchor" href="#episode-6-postgresql-extension-development-the-community-and-beyond">Episode 6: PostgreSQL Extension Development, The Community, &amp; Beyond</a></h2>
<p>How often do companies express interest in open-source contribution? Clearly, by helping out in any way, the open-source project itself sees a benefit. But are there any advantages for the company that is giving back in any way? What are some contribution methods that a company can consider? These are all questions we hear about constantly—so let’s explore some of the answers discussed in this episode in a quick recap.</p>
<h3 id="giving-back-to-open-source-projects-and-communities"><a class="heading-anchor" href="#giving-back-to-open-source-projects-and-communities">Giving back to open source projects &amp; communities</a></h3>
<p>At Data Bene, we already work with several customers who are interested in developing features or enhancements for the PostgreSQL ecosystem.</p>
<p>These companies are interested in addressing bugs and adding new features that complement their use cases and tech stacks across PostgreSQL, Citus Data, and related technologies to accomplish two things:</p>
<ol class="list">
<li>To build functionality they need that is natively built into the upstream software and transparently maintained by the greater open-source community, and</li>
<li>To ensure others who have a similar use case are able to leverage these benefits as well.</li>
</ol>
<p>Times change; the only way the upstream software will remain relevant, useful, and beneficial to its global audience is if there are global contributions back to it, ensuring it still meets real users’ needs from year to year.</p>
<h3 id="why-support-open-source-projects"><a class="heading-anchor" href="#why-support-open-source-projects">Why support open-source projects?</a></h3>
<p>Vendor lock-in is a huge problem in the software &amp; services industry; giving back to open-source projects ensures that technology that is openly developed can continue to be so. Using FOSS technology means you avoid investing in a company that might close the code or restrict access, giving the end user freedom to continue using and developing essential tools that are part of their tech stack.</p>
<p>This kind of software is also subject to a highly visible development process, meaning it is much harder for privacy invasions, cybersecurity vulnerabilities, and more to be built into the underlying code.</p>
<p>Additionally, open-source software is built by individuals all over the world with a variety of perspectives and backgrounds; this ensures that it is thoroughly tested, with a wide range of features built-in that are <em>actually useful</em> to many end-users. This helps these kinds of projects to be successful for a number of years and continue to be so as long as there is a community willing to support each of them.</p>
<p><em>Case-in-point: PostgreSQL has been around for 35+ years of active development and is still topping developer surveys and charts today for being the most liked, most used, and most popular database solution—worldwide!</em></p>
<h3 id="how-can-companies-best-support-open-source-projects"><a class="heading-anchor" href="#how-can-companies-best-support-open-source-projects">How can companies best support open-source projects?</a></h3>
<p>There are a few key ways to achieve this end-goal:</p>
<ol class="list">
<li><strong>Include code contributions as part of your engineers’ working time.</strong> When allocating developer time for working on upstream code, you’re ensuring that the technology that you leverage (to provide support and/or services, to power your product, or for your infrastructure to depend on) experiences improved performance, expanded functionality, resolved issues, or addressed bug fixes.</li>
<li><strong>Consider developing extensions.</strong> Creating and maintaining extensions allows companies to add specialized features or address certain use cases without altering the core codebase. In the case of PostgreSQL in particular, this extensibility allows Postgres to meet the needs of different industries, users, and businesses, with a versatile and strong ecosystem. This kind of modular system lets PostgreSQL evolve without an overcomplicated core, making the project as a whole easier to manage and update.</li>
<li><strong>Sponsor, organize, and participate in events.</strong> As a company, you can elect to uplift or initiate technology conferences, user-groups, workshops, and more to spread awareness and educate the general public about the technology you want to see thrive. Events are an excellent way for users &amp; developers to collaborate, discuss advancements, and share best practices, which leads to a strengthened community and an enhanced product as a result.</li>
</ol>
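<p>For context on point 2, the mechanics are lighter-weight than they may sound: a minimal PostgreSQL extension is just a control file plus a versioned SQL script. The <code>my_ext</code> name and function below are purely illustrative:</p>
<pre class="language-sql"><code class="language-sql">-- my_ext--1.0.sql: the objects installed by CREATE EXTENSION my_ext;
CREATE FUNCTION my_ext_hello() RETURNS text
LANGUAGE sql
AS $$ SELECT 'hello from my_ext'::text $$;

-- The accompanying my_ext.control file (not SQL) declares metadata, e.g.:
--   comment = 'illustrative example extension'
--   default_version = '1.0'
--   relocatable = true</code></pre>
<p>With those two files installed in the server’s extension directory, <code>CREATE EXTENSION my_ext;</code> makes the function available, and versioned upgrade scripts handle updates from there.</p>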
<h3 id="how-data-bene-contributes"><a class="heading-anchor" href="#how-data-bene-contributes">How Data Bene contributes</a></h3>
<p>Cédric Villemain, Data Bene’s president, has developed <a href="https://codeberg.org/c2main/pgfincore" rel="noopener">pg_fincore</a> and is currently working on <a href="https://codeberg.org/data-bene/statsmgr" rel="noopener">StatsMgr</a>, pg_psi, and other components that are designed to improve Postgres’ statistics capabilities.</p>
<p>Our team is also responsible for a number of contributions across projects like <a href="https://www.citusdata.com/blog/2025/02/06/distribute-postgresql-17-with-citus-13/" rel="noopener">Citus Data</a> and <a href="https://github.com/zammad/zammad/" rel="noopener">Zammad</a>.</p>
<p>We make a point of sponsoring, presenting at, &amp; advocating for PostgreSQL or open-source community conferences and user groups, such as PostgreSQL Europe, pgDay Paris, AlpOSS, Capitole du Libre, &amp; more. Some of our team members have also founded, or serve on, the organizing committees of events such as the Barcelona &amp; Madrid PostgreSQL User Groups and pgDay Lowlands. The impact of events on the larger project &amp; community cannot be overstated, and it is important to us to do all we can to contribute in this manner.</p>
<p>Finally, we help customers understand how to contribute to PostgreSQL and similar open-source projects. Through training, workshops, and collaboration, we encourage making meaningful contributions that fit their goals and support the greater community.</p>
<p><em>If you’re a developer who is interested in contributing to open-source and/or the PostgreSQL ecosystem, or helping customers with R&amp;D requirements, our team is expanding—visit us at our <a href="https://data-bene.io/en/jobs" rel="noopener">website</a> to see available positions!</em></p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>Thinking about watching the full discussion? Check it out on YouTube:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/BYvXQB9O71U?si=-irKIHXxwiPhBFQP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>We’ve finished our first round of episodes for Postgres Café as of this release! More episodes may or may not be pending… follow us on social media (like <a href="https://www.linkedin.com/company/91744288" rel="noopener">LinkedIn</a> or <a href="https://fosstodon.org/@data_bene" rel="noopener">Mastodon</a>) to be updated on more to come. (Would you like to see more from this podcast series? Let us know!)</p>
<p><a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> or check it out for interviews about open-source extensions like <a href="https://codeberg.org/Data-Bene/StatsMgr" rel="noopener">StatsMgr</a> for efficient statistics management for PostgreSQL, an open-source change data capture (CDC) tool designed specifically for PostgreSQL called <a href="https://youtu.be/j1R3a0-jg6c" rel="noopener">pgstream</a>, &amp; more. PostgreSQL is one of the most extensible databases on the market with a huge extension ecosystem; learn directly from the experts as you discover some of the options out there.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Deploying distributed PostgreSQL at scale with Citus Data</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-deploying-distributed-postgresql-at-scale-with-citus-data/" />
      <updated>2025-01-29T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-deploying-distributed-postgresql-at-scale-with-citus-data/</id>
     <content type="html"><![CDATA[ <p>It’s time for the fourth episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a podcast from our teams at <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> and <a href="https://xata.io/" rel="noopener">Xata</a> where we discuss PostgreSQL contribution and extension development. In this latest episode, Sarah Conway and Gülçin Yıldırım Jelinek meet with Stéphane Carton to cover <a href="https://github.com/citusdata/citus" rel="noopener">Citus Data</a>, a completely open-source extension from Microsoft that provides a solution for deploying distributed PostgreSQL at scale.</p>
<h2 id="episode-4-citus-data"><a class="heading-anchor" href="#episode-4-citus-data">Episode 4: Citus Data</a></h2>
<p>The Citus database has seen 127 releases since March 24, 2016, when it was first made freely available as open source for public use and contributions. It’s a powerful tool that works natively with PostgreSQL and integrates seamlessly with Postgres tools and extensions. Continue reading for a summary of what we covered in this podcast episode!</p>
<h3 id="addressing-scalability-performance-and-the-management-of-large-datasets"><a class="heading-anchor" href="#addressing-scalability-performance-and-the-management-of-large-datasets">Addressing scalability, performance, and the management of large datasets</a></h3>
<p>So why does Citus Data exist, and what problems does it solve? Let’s delve into this by category.</p>
<h4 id="development"><a class="heading-anchor" href="#development">Development</a></h4>
<p>Citus is designed to solve the distributed data modeling problem: it provides ways to map workloads onto a cluster, such as sharding tables based on primary keys (especially useful for microservices and high-throughput workloads).</p>
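<p>As a sketch of what this looks like in practice (the table and column names here are illustrative, but <code>create_distributed_table</code> is Citus’ documented API):</p>
<pre class="language-sql"><code class="language-sql">-- A table whose primary key starts with the distribution column.
CREATE TABLE events (
    tenant_id bigint NOT NULL,
    event_id  bigint NOT NULL,
    payload   jsonb,
    PRIMARY KEY (tenant_id, event_id)
);

-- Shard the table across the cluster on tenant_id.
SELECT create_distributed_table('events', 'tenant_id');</code></pre>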
<h4 id="scalability"><a class="heading-anchor" href="#scalability">Scalability</a></h4>
<p>By distributing data across multiple nodes, Citus enables horizontal scaling of PostgreSQL databases.</p>
<p>This allows developers to combine CPU, memory, storage, and I/O capacity across multiple machines for handling large datasets and high traffic workloads. It’s simple to add more worker nodes to the cluster and rebalance the shards as your data volume grows.</p>
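<p>Growing the cluster is sketched below; the host name is a placeholder, and on older Citus releases the first function is named <code>master_add_node</code>:</p>
<pre class="language-sql"><code class="language-sql">-- Register an additional worker node with the coordinator.
SELECT citus_add_node('worker-3.example.com', 5432);

-- Spread existing shards evenly across all workers.
SELECT rebalance_table_shards();</code></pre>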
<h4 id="performance"><a class="heading-anchor" href="#performance">Performance</a></h4>
<p>The distributed query engine in Citus is used to maximize efficiency, parallelizing queries and batching execution across multiple worker nodes.</p>
<p>Even when thousands to millions of statements are being executed per second, data ingestion remains optimized: Citus finds the right shard placements, connects to the appropriate worker nodes, and performs operations in parallel. All of this ensures high throughput and low latency for real-time data ingestion.</p>
<h4 id="high-availability-and-redundancy"><a class="heading-anchor" href="#high-availability-and-redundancy">High Availability &amp; Redundancy</a></h4>
<p>The distributed data model also lets you create redundant copies of tables and shard data across multiple nodes, so the database remains resilient and available even when individual nodes crash.</p>
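<p>One building block for this is a reference table, which Citus replicates in full to every node; the <code>countries</code> lookup table below is an illustrative example:</p>
<pre class="language-sql"><code class="language-sql">-- Every node keeps a complete copy of the table, so joins against it
-- stay local and the data survives the loss of any single worker.
SELECT create_reference_table('countries');</code></pre>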
<h3 id="contributing-to-citus"><a class="heading-anchor" href="#contributing-to-citus">Contributing to Citus</a></h3>
<p>At Data Bene, our goal is to support forward momentum of upstream source code through ongoing development and code contributions. Cédric Villemain, among others on our team, constantly assesses new feature additions or other improvements that can make a difference for users.</p>
<p>No matter whether you’re part of a DevOps team that is looking to build out distributed architecture for your PostgreSQL instances, or an end user such as a business analyst that is seeking efficient performance when handling vast amounts of data, Citus Data may be the perfect extension to support your use case.</p>
<p>If you have specific feature requests or concerns, our team at Data Bene can help you contribute directly to Citus Data, or can contribute on your behalf, to ensure the longevity of the project and its relevance to your projects. Learn more about contributing to Citus Data by referencing the official <a href="https://github.com/citusdata/citus/blob/main/CONTRIBUTING.md" rel="noopener">CONTRIBUTING.md</a> file.</p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>Thinking about watching the full discussion? Check it out on YouTube:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/WueRn76nJ9Q?si=ulvzvfcr4Ux17tt0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>More episodes are still being published for Postgres Café! <a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> for more interviews around open-source tools like <a href="https://codeberg.org/Data-Bene/StatsMgr" rel="noopener">StatsMgr</a> for efficient statistics management for PostgreSQL, <a href="https://github.com/xataio/pgzx" rel="noopener">pgzx</a> for the creation of PostgreSQL extensions using Zig, &amp; more. Get ideas from the experts for new extensions to try out and maximize your Postgres deployments.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Postgres Café: Expand monitoring capabilities with StatsMgr</title>
      <link href="https://www.data-bene.io/en/blog/postgres-cafe-expand-monitoring-capabilities-with-statsmgr/" />
      <updated>2025-01-07T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/postgres-cafe-expand-monitoring-capabilities-with-statsmgr/</id>
     <content type="html"><![CDATA[ <p>2025 has begun, and with it we’re excited to release the second episode of <a href="https://www.youtube.com/watch?v=WwaJd2c9whM" rel="noopener">Postgres Café</a>, a blog and video series from our teams over at <a href="https://www.data-bene.io/en/" rel="noopener">Data Bene</a> and <a href="https://xata.io/" rel="noopener">Xata</a> made with the intention of exploring the world of open source and where it meets PostgreSQL’s extensibility. Throughout this series, we discuss different extensions and tools that enhance the developer experience when working with PostgreSQL. In our second episode, we explore a brand new PostgreSQL extension called <a href="https://codeberg.org/data-bene/statsmgr" rel="noopener">StatsMgr</a> that leverages background workers and shared memory to snapshot, manage, and query various statistics for WAL, SLRU, IO, checkpointing, and more.</p>
<h2 id="episode-2-statsmgr"><a class="heading-anchor" href="#episode-2-statsmgr">Episode 2: StatsMgr</a></h2>
<p>In this episode, we introduce the just-released open source extension StatsMgr, created to continuously monitor and track events across PostgreSQL and the underlying system. Here’s a look at what this episode covered:</p>
<h3 id="customized-metrics-processing"><a class="heading-anchor" href="#customized-metrics-processing">Customized metrics processing</a></h3>
<p>Originally the idea was to provide a simplified interface for metrics, while enhancing them with a wide variety of available types. This functionality was then expanded to address problems like:</p>
<ul class="list">
<li><strong>Making statistics available</strong> for collection from external systems, without interruption even when those external systems are down.</li>
<li><strong>Providing an immediate view of PostgreSQL statistics</strong> with historical tracking, including pg_stat views &amp; functions.</li>
<li><strong>Increasing &amp; reducing the number of historical records when needed</strong> with dynamic buffer allocation.</li>
<li><strong>Debugging PostgreSQL instances</strong> with historical analysis and without required restarts.</li>
</ul>
<p>This extension, in turn, is great at handling situations like when…</p>
<ul class="list">
<li><strong>…your monitoring agent is down</strong>; using StatsMgr as a backup allows you to ensure you won’t lose statistics in this event, as events are captured regardless and stored for collection later on by your monitoring agent.</li>
<li><strong>…you have spikes or otherwise unusual behavior on your production system</strong>. This extension allows you to get an overview of activity for useful debugging insights.</li>
</ul>
<h3 id="expansive-and-historical-metrics-collection"><a class="heading-anchor" href="#expansive-and-historical-metrics-collection">Expansive &amp; historical metrics collection</a></h3>
<p>Currently, supported statistics include:</p>
<ul class="list">
<li>WAL</li>
<li>SLRU</li>
<li>BGWriter</li>
<li>Checkpointer</li>
<li>Archiver</li>
<li>IO</li>
</ul>
<p>Each of these is registered with a handler that lets you fetch and manage its statistics, and each is accompanied by shared memory structures for storing historical snapshots.</p>
<p>Some of the next steps for the project will include adding in dynamic statistics such as pg_stat_user_tables, amongst others.</p>
<p>There are still many things to do, from subtle improvements to major new features. So of course there are many opportunities to contribute to the project, whether you’re a newcomer or an advanced PostgreSQL developer. Interested in being a part of the effort? Check out <a href="https://codeberg.org/Data-Bene/StatsMgr/src/branch/main/CONTRIBUTING.md" rel="noopener">CONTRIBUTING.md</a> within the project.</p>
<h3 id="watch-the-full-episode"><a class="heading-anchor" href="#watch-the-full-episode">Watch the full episode</a></h3>
<p>For an in-depth exploration of StatsMgr and its capabilities, watch the full episode here:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/UMzCLFwCPI8?si=-NW4Na4PAiq6qdoY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="stay-tuned-for-more-postgres-tools"><a class="heading-anchor" href="#stay-tuned-for-more-postgres-tools">Stay tuned for more Postgres tools</a></h3>
<p>We still have much more to come for Postgres Café. <a href="https://www.youtube.com/playlist?list=PLf7KS0svgDP_zJmby3RMzzOVO45qLbruA" rel="noopener">Subscribe to the playlist</a> for episodes that feature more open-source tools like <a href="https://pgroll.com/" rel="noopener">pgroll</a> for zero-downtime schema migrations, <a href="https://www.citusdata.com/" rel="noopener">Citus Data</a> for distributed and scalable PostgreSQL as an extension, and more. Watch this space to learn how each tool can make working with Postgres smoother and more efficient.</p>
 ]]></content>
			<author>
				<name>Sarah Conway</name>
			</author>
    </entry>
    <entry>
      <title>Strange data type transformations</title>
      <link href="https://www.data-bene.io/en/blog/strange-data-type-transformations/" />
      <updated>2024-12-02T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/strange-data-type-transformations/</id>
     <content type="html"><![CDATA[ <h2 id="when-your-function-argument-types-are-loosely-changed"><a class="heading-anchor" href="#when-your-function-argument-types-are-loosely-changed">When your function argument types are loosely changed</a></h2>
<p>This article results from a code review I did for a customer.</p>
<p>Our customer created a <code>pg_dump --schema-only</code> of the target database to provide me with the plpgsql code and database object structures to review. So far so good.</p>
<p>I started to read the code and then became puzzled. The code looks like this:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> xxx<span class="token punctuation">(</span> p_id <span class="token keyword">character</span><span class="token punctuation">,</span> p_info <span class="token keyword">character</span> <span class="token keyword">varying</span> <span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> <span class="token keyword">integer</span>
<span class="token keyword">LANGUAGE</span> plpgsql
<span class="token keyword">AS</span> $$
<span class="token keyword">DECLARE</span>
<span class="token keyword">BEGIN</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
   <span class="token keyword">INSERT</span> <span class="token keyword">INTO</span> t1
   <span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> t2 <span class="token keyword">WHERE</span> t2<span class="token punctuation">.</span>id <span class="token operator">=</span> p_id<span class="token punctuation">;</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$
<span class="token punctuation">;</span></code></pre>
<p>Maybe you saw nothing wrong with the function. Perhaps knowing the table definition will help:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> t2 <span class="token punctuation">(</span>
   id <span class="token keyword">VARCHAR</span><span class="token punctuation">(</span><span class="token number">130</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span>
   <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
   <span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token punctuation">(</span>id<span class="token punctuation">)</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p><code>t2.id</code> is always 130 characters long (in practice) and there are 400 million tuples. So, as you may have guessed, it seems odd to have p_id CHARACTER matching id VARCHAR(130). Moreover, CHARACTER is the same as CHAR(1).</p>
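<p>For comparison, in a plain cast the bare CHARACTER type really does behave as CHAR(1) and truncates the value (psql output shown):</p>
<pre class="language-sql"><code class="language-sql">SELECT 'abc'::character;
 bpchar 
--------
 a
(1 row)</code></pre>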
<p>Our customer had not seen any issues with the code for years. Nevertheless, our customer told me that the function definition he wrote was not like that: it was meant to be p_id CHARACTER(130) - not CHARACTER.</p>
<p>So what went wrong? Let’s test around because it’s fun.</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span> c <span class="token keyword">character</span><span class="token punctuation">,</span> d <span class="token keyword">character</span> <span class="token keyword">varying</span> <span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> void
<span class="token keyword">LANGUAGE</span> plpgsql
<span class="token keyword">AS</span> $$
<span class="token keyword">BEGIN</span>
  RAISE NOTICE <span class="token string">'c=%, d=%'</span><span class="token punctuation">,</span> c<span class="token punctuation">,</span>d<span class="token punctuation">;</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123465789'</span><span class="token punctuation">,</span> <span class="token string">'987654321'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  c<span class="token operator">=</span><span class="token number">123465789</span><span class="token punctuation">,</span> d<span class="token operator">=</span><span class="token number">987654321</span>
 test 
<span class="token comment">------</span>
 
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span></code></pre>
<p>We have an interesting result here: no casting to CHAR(1) has been done. Let’s see more details:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">EXPLAIN</span> <span class="token punctuation">(</span>COSTS <span class="token keyword">OFF</span><span class="token punctuation">,</span><span class="token keyword">ANALYZE</span><span class="token punctuation">,</span>VERBOSE<span class="token punctuation">)</span>
        <span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123465789'</span><span class="token punctuation">,</span> <span class="token string">'987654321'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  c<span class="token operator">=</span><span class="token number">123465789</span><span class="token punctuation">,</span> d<span class="token operator">=</span><span class="token number">987654321</span>
                             QUERY <span class="token keyword">PLAN</span>                              
<span class="token comment">---------------------------------------------------------------------</span>
 Result <span class="token punctuation">(</span>actual <span class="token keyword">time</span><span class="token operator">=</span><span class="token number">0.040</span><span class="token punctuation">.</span><span class="token number">.0</span><span class="token number">.041</span> <span class="token keyword">rows</span><span class="token operator">=</span><span class="token number">1</span> loops<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
   Output: test<span class="token punctuation">(</span><span class="token string">'123465789'</span>::bpchar<span class="token punctuation">,</span> <span class="token string">'987654321'</span>::<span class="token keyword">character</span> <span class="token keyword">varying</span><span class="token punctuation">)</span>
 Planning <span class="token keyword">Time</span>: <span class="token number">0.023</span> ms
 Execution <span class="token keyword">Time</span>: <span class="token number">0.053</span> ms
<span class="token punctuation">(</span><span class="token number">4</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>We can see there was a cast to BPCHAR. As a reminder, BPCHAR is an alias of CHARACTER, and it can represent a string of up to 10,485,760 characters.</p>
<p>Now let’s make another test:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span>c <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>
<span class="token keyword">LANGUAGE</span> <span class="token keyword">sql</span>
<span class="token keyword">AS</span> $$
<span class="token keyword">select</span> c<span class="token punctuation">;</span>
$$<span class="token punctuation">;</span></code></pre>
<p>As you can see, the language changed to SQL, and the argument and return types are CHAR(4). How does it execute?</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">SELECT</span> test<span class="token punctuation">(</span><span class="token string">'123456789'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
   test    
<span class="token comment">-----------</span>
 <span class="token number">123456789</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">EXPLAIN</span> VERBOSE <span class="token keyword">SELECT</span> test<span class="token punctuation">(</span><span class="token string">'123456789'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                QUERY <span class="token keyword">PLAN</span>                 
<span class="token comment">-------------------------------------------</span>
 Result  <span class="token punctuation">(</span>cost<span class="token operator">=</span><span class="token number">0.00</span><span class="token punctuation">.</span><span class="token number">.0</span><span class="token number">.01</span> <span class="token keyword">rows</span><span class="token operator">=</span><span class="token number">1</span> width<span class="token operator">=</span><span class="token number">32</span><span class="token punctuation">)</span>
   Output: <span class="token string">'123456789'</span>::bpchar
<span class="token punctuation">(</span><span class="token number">2</span> <span class="token keyword">rows</span><span class="token punctuation">)</span></code></pre>
<p>As you can see, even though you expect to process CHAR(4) data, you end up processing arbitrary-length strings instead!</p>
<p>However, do not rush to the PostgreSQL mailing list to complain YET!</p>
<p>As a matter of fact, this behaviour is not a bug. The <a href="https://www.postgresql.org/docs/current/sql-createfunction.html" rel="noopener">documentation</a> states:</p>
<blockquote>
<p>“The full SQL type syntax is allowed for declaring a function’s arguments and return value. However, parenthesized type modifiers (e.g., the precision field for type numeric) are discarded by CREATE FUNCTION. Thus for example CREATE FUNCTION foo (varchar(10)) … is exactly the same as CREATE FUNCTION foo (varchar) …”</p>
</blockquote>
<p>This explains that CHARACTER(x) became CHARACTER aliased as BPCHAR. And as we saw, BPCHAR is not actually CHAR(1) but more like VARCHAR(10485760). This fully explains the behaviour.</p>
<p>Wait, wait, WAIT! The original intention was to deal with CHAR(4) strings, not strings of arbitrary length.</p>
<p>Isn’t there any hope? No, sorry… (kidding.)</p>
<p>Reading the same documentation page, we see that “argtype” and “rettype” can be base, composite, or domain types, or can reference the type of a table column.</p>
<p>The trick is to create either a composite type or a domain to use as argtype or rettype.</p>
<p>Here are some examples:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Simple trick: cast explicitly at the call site</span>
<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">-- Domain trick</span>
<span class="token keyword">CREATE</span> DOMAIN c4 <span class="token keyword">AS</span> <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span>param c4<span class="token punctuation">)</span>
<span class="token keyword">RETURNS</span> c4
<span class="token keyword">AS</span> $$
<span class="token keyword">BEGIN</span>
  RAISE NOTICE <span class="token string">'param=%'</span><span class="token punctuation">,</span> param<span class="token punctuation">;</span>
  <span class="token keyword">RETURN</span> param<span class="token punctuation">;</span>
<span class="token keyword">END</span><span class="token punctuation">;</span>
$$ <span class="token keyword">LANGUAGE</span> plpgsql<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
ERROR:  <span class="token keyword">value</span> too long <span class="token keyword">for</span> <span class="token keyword">type</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 test 
<span class="token comment">------</span>
 <span class="token number">1234</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::c4<span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 test 
<span class="token comment">------</span>
 <span class="token number">1234</span>
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> pg_typeof<span class="token punctuation">(</span> test<span class="token punctuation">(</span> <span class="token string">'123456789'</span>::<span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
NOTICE:  param<span class="token operator">=</span><span class="token number">1234</span>
 pg_typeof 
<span class="token comment">-----------</span>
 c4
<span class="token punctuation">(</span><span class="token number">1</span> <span class="token keyword">row</span><span class="token punctuation">)</span></code></pre>
<p>Now you should be happy with the result.</p>
<p>What? Not yet? OK, here is an additional trick.</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Map a table structure</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> qq <span class="token punctuation">(</span> c <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">CREATE</span> <span class="token keyword">FUNCTION</span> test<span class="token punctuation">(</span><span class="token operator">IN</span> c qq<span class="token punctuation">,</span> <span class="token keyword">OUT</span> d qq<span class="token punctuation">)</span>
<span class="token keyword">LANGUAGE</span> <span class="token keyword">sql</span>
<span class="token keyword">AS</span> $$
<span class="token keyword">SELECT</span> c<span class="token punctuation">;</span>
$$<span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span><span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
ERROR:  <span class="token keyword">value</span> too long <span class="token keyword">for</span> <span class="token keyword">type</span> <span class="token keyword">character</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">from</span> test<span class="token punctuation">(</span><span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'1234'</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
  c   
<span class="token comment">------</span>
 <span class="token number">1234</span></code></pre>
<p>Hmm, OK, but how is this different from the domain trick?</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- Easy Type Alteration</span>
<span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span> qq <span class="token keyword">ALTER</span> c <span class="token keyword">TYPE</span> <span class="token keyword">char</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   c   
<span class="token comment">-------</span>
 <span class="token number">12345</span></code></pre>
<p>Try to ALTER a domain's base type - you will see how (not) easy it is.</p>
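<p>For illustration, here is roughly what changing the domain’s base type involves. There is no ALTER DOMAIN form that changes the base type, so every dependent object has to be dropped and recreated (a sketch, using the c4 domain and test() function from above):</p>
<pre class="language-sql"><code class="language-sql">-- ALTER DOMAIN can change defaults, constraints or the owner, but not the base type.
-- Going from char(4) to char(5) means rebuilding everything that depends on it:
DROP FUNCTION test(c4);        -- the function blocks dropping the domain
DROP DOMAIN c4;
CREATE DOMAIN c4 AS char(5);
-- ...and then recreate test(param c4) from scratch.</code></pre>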
<p>The table definition trick allows for some flexibility as follows:</p>
<pre class="language-sql"><code class="language-sql"><span class="token keyword">ALTER</span> <span class="token keyword">TABLE</span> qq <span class="token keyword">ADD</span> ee <span class="token keyword">int</span><span class="token punctuation">;</span>

<span class="token keyword">SELECT</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">,</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   test   
<span class="token comment">----------</span>
 <span class="token punctuation">(</span><span class="token number">12345</span><span class="token punctuation">,</span><span class="token number">4</span><span class="token punctuation">)</span>

<span class="token keyword">SELECT</span> <span class="token operator">*</span> <span class="token keyword">FROM</span> test<span class="token punctuation">(</span> <span class="token keyword">ROW</span><span class="token punctuation">(</span><span class="token string">'12345'</span><span class="token punctuation">,</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">)</span><span class="token punctuation">;</span>
   c   <span class="token operator">|</span> ee 
<span class="token comment">-------+----</span>
 <span class="token number">12345</span> <span class="token operator">|</span>  <span class="token number">4</span></code></pre>
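<p>For completeness, the composite-type option mentioned in the documentation behaves just like the table row type. A minimal sketch (the names c4row and test_ct are ours):</p>
<pre class="language-sql"><code class="language-sql">-- A standalone composite type instead of a table
CREATE TYPE c4row AS (c char(4));

CREATE FUNCTION test_ct(param c4row) RETURNS c4row
LANGUAGE sql AS $$ SELECT param $$;

SELECT test_ct(ROW('12345'));  -- ERROR: value too long for type character(4)

-- And it stays easy to evolve, much like the table trick:
ALTER TYPE c4row ALTER ATTRIBUTE c TYPE char(5);</code></pre>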
<p>We hope you enjoyed this article and that you learnt something new and interesting!</p>
 ]]></content>
			<author>
				<name>Frédéric Delacourt</name>
			</author>
    </entry>
    <entry>
      <title>Welcome to the new Data Bene blog</title>
      <link href="https://www.data-bene.io/en/blog/welcome-to-the-new-data-bene-blog/" />
      <updated>2024-10-02T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/welcome-to-the-new-data-bene-blog/</id>
     <content type="html"><![CDATA[ <h2 id="exploring-postgresql-open-source-and-innovation"><a class="heading-anchor" href="#exploring-postgresql-open-source-and-innovation">Exploring PostgreSQL, Open Source, and Innovation</a></h2>
<p>At Data Bene, we’re excited to unveil the latest version of our blog, where we will delve into PostgreSQL, open-source technologies, and the dynamic world of startups. With this update, we’re eager to bring you a curated selection of technical insights, industry news, and success stories to inspire and inform.</p>
<p>As loyal supporters of open-source solutions, we champion PostgreSQL for its ability to drive innovation and efficiency in modern businesses. Through our articles and tutorials, we aim to empower users to unlock its full potential. We will discuss our favourite features as well as provide updates on the latest advancements—topics we believe help in leveraging this powerful relational database management system.</p>
<p>In today’s rapidly evolving tech landscape, open source has become synonymous with collaboration, innovation, and community-driven development. We’re strongly committed to showcasing the transformative potential of open source as a result.</p>
<p>Because of this, in our blog we will explore:</p>
<ol class="list">
<li>open-source tools that are part of the PostgreSQL ecosystem,</li>
<li>how we can improve and support open-source collaboration, and</li>
<li>projects under active development that are shaping the future of open technology while avoiding vendor lock-in.</li>
</ol>
<p>With years of experience working alongside a diverse range of companies, from emerging startups to global enterprises, we’ve seen firsthand how innovative organizations leverage PostgreSQL to drive their success. Our blog will feature not only insights from entrepreneurs and tech leaders sharing their experiences but also in-depth articles from our own PostgreSQL experts. They will discuss their favorite PostgreSQL features and native tools, the challenges they’ve faced, and how they overcame them, providing a unique, technical perspective on overcoming real-world challenges.</p>
<p>We’ll also dive into emerging trends and new technologies that are shaping the future of PostgreSQL, offering a window into how startups and established businesses alike are navigating the fast-evolving landscape of data management. Whether it’s through interviews with founders or analysis of groundbreaking projects, we’ll highlight how PostgreSQL continues to be a key enabler of innovation in today’s tech-driven world.</p>
<p>Finally, at Data Bene, sustainability isn’t just a buzzword—it’s a core value that guides our every endeavor. As we embark on new development initiatives and explore cutting-edge technologies, we’re committed to minimizing our environmental footprint and promoting eco-friendly practices. From optimizing software efficiency to reducing energy consumption, we’re dedicated to building a more sustainable future, one line of code at a time. Consequently, some of the articles we release will highlight these practices and open up the topic of sustainability within PostgreSQL development practices for discussion.</p>
<p>We invite you to join us on this exciting journey of exploration, discovery, and innovation. Whether you’re a PostgreSQL enthusiast, an open-source advocate, a startup aficionado, a long-time developer, or just curious, there’s something for everyone in our blog! So bookmark this page, <a href="https://us.linkedin.com/company/data-bene" rel="noopener">follow us on LinkedIn</a> and keep an eye out for our upcoming newsletter. Anywhere you follow, you’ll be able to stay tuned for regular updates packed with valuable insights and inspiration.</p>
<p>Thank you for being a part of our community. Together, let’s embark on a journey of growth, learning, and shared success.</p>
 ]]></content>
			<author>
				<name>Andrea Cucciniello</name>
			</author>
    </entry>
    <entry>
      <title>Data Bene partner of pgDay Paris 2023</title>
      <link href="https://www.data-bene.io/en/blog/data-bene-partner-of-pgday-paris-2023/" />
      <updated>2023-03-14T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/data-bene-partner-of-pgday-paris-2023/</id>
     <content type="html"><![CDATA[ <p>Like every year, Data Bene is actively taking part in pgDay Paris, which, new this year, runs over two days - an event that has become a must for all PostgreSQL enthusiasts! Data Bene is a “Partner” of the event, which gives you the opportunity to train, share experiences, and discover the latest advances and functionalities of the open-source relational database.<br>
In its desire to contribute, Data Bene is also taking part in the March 22nd day to deliver tutorials (in French):</p>
<p>In the morning, Frédéric Delacourt will present “Everything you always wanted to know about Connection Pools (but were afraid to ask)”, from 9am to 12:15pm.</p>
<p>In the afternoon, it will be Cédric Villemain’s turn with “CREATE EXTENSION, yours preferably”, from 1:45pm to 5pm.</p>
<p>We are delighted to welcome you there. You will find the details of these sessions on the two-day agenda, and you can sign up on the registration page.</p>
<p>We hope to see many of you there!</p>
<p>The Data Bene team.</p>
 ]]></content>
			<author>
				<name>Data Bene</name>
			</author>
    </entry>
    <entry>
      <title>Data Bene contributes to the PostgreSQL@CERN Meetup 2023</title>
      <link href="https://www.data-bene.io/en/blog/data-bene-contributes-to-the-postgresql-cern-meetup-2023/" />
      <updated>2023-01-11T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/data-bene-contributes-to-the-postgresql-cern-meetup-2023/</id>
     <content type="html"><![CDATA[ <p>Data Bene is a co-sponsor, with CERN, of the Meetup organized by the Geneva PostgreSQL User Group on the afternoon of Friday 13 January at CERN in Geneva. Cédric and Hélène will be happy to welcome you there, in person or via the webcast: <a href="https://webcast.web.cern.ch/event/i1221200" rel="noopener">https://webcast.web.cern.ch/event/i1221200</a></p>
<p>We will give a talk on Citus Data: distributed databases without complexes.</p>
 ]]></content>
			<author>
				<name>Data Bene</name>
			</author>
    </entry>
    <entry>
      <title>Data Bene Partner Sponsor of pgDay Paris 2022</title>
      <link href="https://www.data-bene.io/en/blog/data-bene-partner-of-pgday-paris-2022/" />
      <updated>2022-03-23T00:00:00Z</updated>
      <id>https://www.data-bene.io/en/blog/data-bene-partner-of-pgday-paris-2022/</id>
     <content type="html"><![CDATA[ <p>Data Bene will be there all day on March 24th 2022, as a Partner Sponsor of the event, to welcome you to THE community conference in France. We will also give a talk about Partitioning - don’t miss it.</p>
 ]]></content>
			<author>
				<name>Data Bene</name>
			</author>
    </entry>
</feed>
