What Is a Collation, and Why Is My Data Corrupt? – PG Phridays with Shaun Thomas

pgedge_postgres Friday, April 03, 2026

Postgres has relied on the OS to handle text sorting for most of its history. When glibc 2.28 shipped in 2018 with a major Unicode collation overhaul, existing text indexes built under the old rules became potentially invalid, and silently so. No warnings, no errors. Just wrong query results and missed rows.
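To see why a changed sort order leads to missed rows rather than errors, here is a minimal Python sketch. It is a toy model, not Postgres code: `old_key` and `new_key` are hypothetical stand-ins for two collation versions (they are not the real glibc rules), and the "index" is just a sorted list searched by binary search, much as a B-tree lookup trusts its stored order.

```python
# Toy model: an index is a list kept sorted under some collation.
# old_key and new_key are hypothetical stand-ins for two collation versions.

def old_key(s):
    # "Old" rules: plain code-point order, so "-" sorts before letters.
    return s

def new_key(s):
    # "New" rules: hyphens are ignored when comparing.
    return s.replace("-", "")

def index_lookup(index, value, key):
    """Binary search that trusts `index` to be ordered by `key`.

    Assumes sort keys are unique, which holds for the sample data below.
    """
    target = key(value)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value

rows = ["ab", "ad", "a-c"]

# Index built before the "upgrade", under the old rules.
index = sorted(rows, key=old_key)            # ['a-c', 'ab', 'ad']

print(index_lookup(index, "ab", old_key))    # True: order and comparison agree
print(index_lookup(index, "ab", new_key))    # False: the row exists but is missed

# Rebuilding the index under the new rules fixes the lookup.
rebuilt = sorted(rows, key=new_key)          # ['ab', 'a-c', 'ad']
print(index_lookup(rebuilt, "ab", new_key))  # True
```

Nothing in the search raises an error. The lookup simply walks to the wrong position and reports that the row is not there, which is the same silent failure described above.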

Postgres 17 added a builtin locale provider that removes the external dependency entirely:

initdb --locale-provider=builtin --locale=C.UTF-8

This makes sort order stable across OS upgrades. The libc provider (glibc on Linux) is still the default in Postgres 18, so the builtin provider must be specified explicitly when creating a new cluster.

For clusters already running: Postgres 13+ will log a warning when it detects that a collation version has changed. Treat that warning as an instruction to rebuild the affected indexes.
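As a rough sketch of what acting on that warning might look like, the Python helper below only assembles SQL text; it does not connect to a database. The catalog column `pg_collation.collversion`, the function `pg_collation_actual_version()`, and the `REINDEX INDEX` and `ALTER COLLATION ... REFRESH VERSION` statements are standard Postgres. The helper names `quote_ident` and `rebuild_plan`, and the sample index and collation names, are hypothetical. Working out which indexes actually depend on a collation is left to the operator here, and this check only looks at `pg_collation`, so a database's default collation needs to be checked separately.

```python
# Sketch only: builds SQL strings for review, does not execute anything.

# Lists collations whose recorded version differs from what the OS reports now.
MISMATCH_QUERY = """
SELECT collname,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collversion <> pg_collation_actual_version(oid);
"""

def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def rebuild_plan(collation: str, indexes: list[str]) -> list[str]:
    """Return statements that rebuild `indexes`, then record the new version.

    `indexes` is assumed to be the operator's own list of indexes that
    depend on `collation`.
    """
    statements = [f"REINDEX INDEX {quote_ident(ix)};" for ix in indexes]
    # Refresh the recorded version only after the rebuilds, since refreshing
    # merely silences the warning and does not repair anything by itself.
    statements.append(f"ALTER COLLATION {quote_ident(collation)} REFRESH VERSION;")
    return statements

# Hypothetical names, for illustration only.
for statement in rebuild_plan("en_US", ["users_name_idx", "orders_ref_idx"]):
    print(statement)
```

The ordering is the point of the sketch: rebuild first, refresh the recorded version last, so the warning stays visible until the indexes are actually consistent again.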

More details are in this week's PG Phriday blog post from Shaun Thomas: https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt