Engineering · Nov 15, 2024 · 15 min read

Optimizing Postgres for High-Throughput Logs

A deep dive into how we tune Autovacuum and Write-Ahead Logging for our use case.

David Park
Staff Engineer, Infrastructure

ViberzFix processes over 100 million log events per day. When we first started, we used a simple Postgres setup that worked fine. But as we scaled, we hit every Postgres anti-pattern in the book.

The Autovacuum Problem

Postgres uses MVCC (Multi-Version Concurrency Control) for transaction isolation. This means deleted and updated rows aren't immediately removed—they become dead tuples that autovacuum must clean up.

```sql
-- Check dead tuple ratio
SELECT relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric /
             nullif(n_live_tup, 0) * 100, 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY dead_ratio DESC;
```
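When a hot table shows a persistently high dead-tuple ratio, per-table autovacuum settings can make vacuum kick in far sooner than the global defaults allow. A minimal sketch, assuming a hypothetical `log_events` table; the thresholds are illustrative, not our production values:

```sql
-- Vacuum this table once ~1% of its rows are dead (the global default
-- of 20% is far too lax for a high-churn table), and lower the cost
-- delay so each vacuum pass finishes faster.
ALTER TABLE log_events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 2
);
```

These settings are stored on the table itself, so they survive restarts and override `postgresql.conf` for that table only.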

Tuning WAL for Write-Heavy Workloads

Write-Ahead Logging is Postgres's durability mechanism. Every write goes to the WAL before being applied to tables. For high-throughput workloads, the default settings are far too conservative.

  • Increase wal_buffers to 256MB for write-heavy workloads
  • Raise checkpoint_completion_target to 0.9 so checkpoint I/O is spread across most of the checkpoint interval instead of arriving in bursts
  • Set synchronous_commit = off if you can tolerate losing the most recent commits on a crash (this risks a small window of lost transactions, never corruption)
  • Partition tables by time, so expiring old logs becomes a cheap DROP of a partition rather than a mass DELETE that autovacuum must clean up
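The first three settings above can be applied without editing `postgresql.conf` by hand, and the last one is a schema decision. A sketch of both, again using the hypothetical `log_events` table; columns and partition bounds are illustrative:

```sql
-- WAL settings (ALTER SYSTEM requires superuser; wal_buffers only
-- takes effect after a server restart, the other two after a reload).
ALTER SYSTEM SET wal_buffers = '256MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();

-- Declarative range partitioning by event time.
CREATE TABLE log_events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE log_events_2024_11 PARTITION OF log_events
    FOR VALUES FROM ('2024-11-01') TO ('2024-12-01');

-- Retiring a month of logs is then a metadata-only operation:
-- DROP TABLE log_events_2024_10;
```

Dropping a partition generates almost no WAL and leaves no dead tuples behind, which is exactly what a high-throughput log store wants.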

Results

After these optimizations, our write throughput increased by 8x and our p99 latency dropped from 200ms to 15ms. Postgres continues to be our primary datastore, and we're more confident than ever in our choice.