Postgres is excellent. We use it heavily, and we trust it. It stores every link, every workspace, every billing record, every user. But at around the 90-day mark on a growing URL shortener, you run into a wall if you're also storing raw click events there. The wall isn't dramatic — Postgres doesn't fall over, it just gets slower in the specific, frustrating way that aggregate queries on large append-only tables get slower. Index scans stop helping. Autovacuum can't keep up. A dashboard that loaded in 800ms now loads in 4 seconds. You add more indexes. It gets worse.
This post explains why we moved click events to ClickHouse, what the schema looks like, how the ingestion path works, and where the sharp edges are. It's a companion to the redirect latency post and covers the data tier that sits downstream of the redirect — the part that turns the raw fire-and-forget click events into the dashboards on the analytics solutions page.
TL;DR#
- Click events are append-only and aggregate-query-shaped. Columnar storage wins on both counts.
- Postgres's row-per-event overhead, index bloat, and vacuum contention compound past ~30M events on a shared table.
- ClickHouse with ORDER BY (workspace_id, link_id, created_at) returns 90-day per-workspace rollups in under 100ms with no extra indexes.
- Link metadata stays in Postgres (mutable, transactional); click events stay in ClickHouse (append-only, analytical). The split is clean because the two don't need to overlap at write time.
Why Postgres struggles with this workload#
The workload shape is the problem. A click event is:
- Written once and never updated.
- Written at high, bursty rates — 50:1 write-to-read ratio is a rough estimate, and on a campaign launch day the ratio spikes further.
- Queried in aggregate: total clicks per link, geo breakdown for a workspace, device split across a campaign's top 100 links. Almost nobody queries a single click row directly.
Postgres is built for a different set of assumptions. Every row carries a tuple header, transaction ID fields (xmin, xmax), and visibility metadata. For transactional workloads where you update rows in place and need MVCC to handle concurrent reads and writes, that overhead is the price you pay for correctness. For append-only analytical data you never touch again, it's pure waste.
The index problem is worse. Postgres's B-tree indexes work well for selective point lookups — finding the row with link_id = 'xyz'. They help less when you're running GROUP BY link_id, country across 10 million rows for a workspace. The index narrows the scan, but the query still has to pull full row data from heap pages, deserialize it, and aggregate it. At 30M rows the query is slow. At 100M rows it's painful.
Autovacuum makes this worse in practice. High-throughput inserts generate dead tuples (MVCC row versions that are no longer visible to active transactions but haven't been cleaned up yet). Autovacuum reclaims them, but it competes with live queries for I/O. You'll see periodic latency spikes on your Postgres dashboards that correspond to autovacuum running on the events table. The TimescaleDB compression and chunk architecture docs are worth reading for the specifics of how a Postgres extension tries to work around this — the problem is real enough that a dedicated time-series layer on top of Postgres exists to address it.
We ran clicks in Postgres through the first 90 days of the project. A workspace with ~4M click events had a dashboard that took 3.2 seconds to render a 7-day rollup by country. Adding a partial index on (workspace_id, created_at) got it to 1.4 seconds. Good enough at the time; clearly not a path that scales to 100M+ events per month.
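For reference, the Postgres-era shape looked roughly like this (a sketch from memory; the workspace ID is illustrative and the index is shown as a plain composite rather than the exact partial-index predicate we used):
-- Postgres, first 90 days: clicks in an ordinary append-only table
CREATE INDEX clicks_ws_created_idx ON clicks (workspace_id, created_at);

-- The 7-day rollup by country behind the dashboard
SELECT country, count(*) AS clicks
FROM clicks
WHERE workspace_id = 'ws_abc123'   -- illustrative workspace ID
  AND created_at >= now() - interval '7 days'
GROUP BY country
ORDER BY clicks DESC;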
Why ClickHouse fits#
ClickHouse is a columnar database engine built explicitly for analytical workloads. Columnar means the data for each column is stored contiguously on disk rather than interleaved row-by-row. For a query like "sum click count by country for workspace X over the last 30 days", the engine only reads the country, workspace_id, and created_at columns. It never touches user_agent_hash or referrer_host. The I/O reduction on wide tables with many columns is substantial.
The compression ratio is the second property. Columnar data compresses significantly better than row data because repeated values in the same column — the same workspace_id appearing thousands of times, the same set of 60-odd country codes, a handful of os values — compress to almost nothing under ClickHouse's LZ4 and ZSTD codecs. Our click events compress to roughly 12-15% of their raw size on disk. The same data in Postgres sits at 35-45% of raw size with default toast compression. That's not just a storage win — smaller on-disk data means more of it fits in the OS page cache, which means fewer disk reads per query.
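If you want to see the effect on your own table, ClickHouse's system.columns exposes compressed and uncompressed byte counts per column. A quick sketch, not Elido-specific tooling:
-- Per-column compression ratio for the clicks table
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'clicks'
  AND data_compressed_bytes > 0
ORDER BY data_compressed_bytes DESC;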
The third property is the MergeTree engine family. MergeTree tables sort and cluster data by the table's ORDER BY key. Queries that filter or group on the leading columns of that key only read the data parts that overlap with the filter range. This is ClickHouse's equivalent of a clustered index, except it's mandatory (you define it at table creation time) and applies to every query, not just indexed columns.
We use ORDER BY (workspace_id, link_id, created_at). A query for workspace abc123's clicks in the last 7 days reads only the data parts where workspace_id = 'abc123' appears, then within those parts reads only the time-range slice. On 90 days of click data, this typically touches 3-8% of total data parts. The sparse primary index ClickHouse maintains over the ORDER BY columns makes this targeting efficient without requiring a dense B-tree for every query pattern. ReplicatedMergeTree adds HA — two ClickHouse nodes with ZooKeeper-equivalent coordination (ClickHouse Keeper), replicating parts as they're written.
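One way to confirm the pruning on a real query is EXPLAIN with index detail enabled; the output reports how many parts and granules each layer (partition key, primary key) kept versus skipped. Exact output varies by ClickHouse version, and the workspace ID is illustrative:
EXPLAIN indexes = 1
SELECT count()
FROM clicks
WHERE workspace_id = 'ws_abc123'
  AND created_at >= now() - INTERVAL 7 DAY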
The schema we ship#
CREATE TABLE clicks
(
workspace_id LowCardinality(String),
link_id String,
click_id UUID,
created_at DateTime64(3, 'UTC'),
country LowCardinality(FixedString(2)),
device LowCardinality(String),
os LowCardinality(String),
browser LowCardinality(String),
referrer_host String,
utm_source LowCardinality(String),
utm_medium LowCardinality(String),
utm_campaign String,
utm_term String,
utm_content String,
ip_truncated String,
user_agent_hash FixedString(64)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/clicks', '{replica}')
PARTITION BY toYYYYMM(created_at)
ORDER BY (workspace_id, link_id, created_at)
SETTINGS index_granularity = 8192;
A few schema decisions worth calling out.
LowCardinality(String) on workspace_id, country, device, os, browser, utm_source, utm_medium — these columns have bounded cardinality (hundreds of distinct workspace IDs, 250-odd country codes, a handful of device types). LowCardinality activates dictionary encoding, which stores the values as integer references into a dictionary of unique strings. Query speed on these columns improves materially; compression improves further. We learned this from the ClickHouse docs on the LowCardinality data type — the rule of thumb is that any string column with fewer than 10,000 distinct values is a candidate.
PARTITION BY toYYYYMM(created_at) creates a separate data partition per calendar month. Old partitions can be dropped (or moved to slower storage) with a single ALTER TABLE clicks DROP PARTITION '202501' statement, which is effectively instant — no full-table scan. For GDPR data-retention enforcement, this is what matters in practice: expiring clicks that have aged out of the retention window is one drop per monthly partition rather than a slow DELETE ... WHERE created_at < ? mutation across the whole table.
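Retention enforcement then reduces to enumerating the months on disk and dropping the ones that have aged out. A sketch of the shape:
-- Months currently on disk for the clicks table
SELECT DISTINCT partition
FROM system.parts
WHERE table = 'clicks' AND active
ORDER BY partition;

-- Drop a month that has aged out of the retention window (metadata-only, effectively instant)
ALTER TABLE clicks DROP PARTITION '202501';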
ip_truncated stores the last-octet-zeroed IP address (203.0.113.0 not 203.0.113.42). The full IP is never stored. The truncated form is sufficient for geo lookup — which we do at ingest time, before storage — and doesn't constitute personal data under the GDPR Article 4 definition when combined with our data minimisation practices. See solutions/analytics for the full privacy architecture.
The ingestion path#
The redirect handler at each edge POP is fire-and-forget. It appends a click event to a Redpanda topic (clicks.v1) and issues the 302. It does not wait for acknowledgement from ClickHouse. If Redpanda is unavailable at that moment, the click is dropped — the redirect is not. The hierarchy is deliberate: a missed click is recoverable (we can reconstruct approximate counts from access logs); a failed redirect is user-visible.
Redpanda consumer groups provide the delivery guarantee between the edge and the ingester. The click-ingester service runs as a consumer group against the clicks.v1 topic. Partitions are assigned automatically; if an ingester instance fails, its partitions rebalance to surviving instances within the group's heartbeat timeout.
The ingester accumulates events in memory and flushes in batches: 1,000 rows or 250ms, whichever comes first. This is important. ClickHouse's MergeTree engine is optimized for bulk inserts — it writes each insert as a new data part, and lots of tiny inserts generate lots of tiny parts, which triggers aggressive background merging and slows queries during the merge load. The ClickHouse docs recommend inserts of at least 1,000 rows to keep the number of parts manageable. We flush at 1,000 rows / 250ms and have not seen part-count issues in normal operation.
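The symptom to watch if you change the batching parameters is the active part count, which the system tables expose directly. A quick sketch:
-- Active parts per partition; a sustained high count means inserts are too small or too frequent
SELECT partition, count() AS active_parts, sum(rows) AS rows
FROM system.parts
WHERE table = 'clicks' AND active
GROUP BY partition
ORDER BY partition DESC;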
Backpressure propagates through Redpanda. If ClickHouse is slow or restarting, the ingester's consumer lag increases, which we monitor as a leading indicator of data freshness problems. Redpanda's retention configuration (14 days) gives us a meaningful replay window if ClickHouse needs to be rebuilt from scratch.
The architecture is described in more detail at /docs/architecture/click-ingester.
Postgres vs ClickHouse: the split#
Two databases for one service is a cost that needs justification. The justification here is that the two workloads are genuinely incompatible at scale, and the data doesn't overlap in a way that requires synchronous coordination.
Postgres owns link metadata: the links, workspaces, users, billing, rules tables. These are mutable (links get archived, destination URLs change, billing records get updated), transactional (creating a link and minting its slug must be atomic), and read-heavy relative to writes. B-tree indexes and MVCC are exactly what you want.
ClickHouse owns click events: the clicks table. These are append-only (a click, once written, is never updated), write-heavy, and queried in aggregate patterns that columnar storage handles better. There are no cross-database joins at write time — the ingester inserts click events with workspace_id and link_id as denormalized string fields. At query time, the analytics API may look up link metadata from Postgres and click counts from ClickHouse separately, then join in application code. That's one extra network round trip; it's a much better trade than putting the analytical query load on Postgres.
The Citus distributed Postgres documentation describes an alternative path: scaling Postgres itself out with extensions so it can carry analytical and time-series workloads. We evaluated this path and concluded that ClickHouse's query performance and compression ratio are far enough ahead for pure-analytical workloads that the operational overhead of two databases is worth it. If your workload is mixed — some analytics, lots of row-level mutations — the Citus approach makes more sense.
Query patterns we hit hourly#
The dashboard refreshes for each workspace on approximately a 60-second cycle. There are three heavy queries:
Per-link click rollup, last 24 hours. The top of every link list shows a sparkline. This is a GROUP BY link_id, toHour(created_at) over the last 24 hours for the workspace.
Country breakdown for a campaign. A marketing team checking campaign performance wants GROUP BY country, COUNT(*) filtered by utm_campaign over a date range.
Device split across a workspace's top-100 links. The analytics tab shows a device/OS breakdown. This is GROUP BY device, os, COUNT(*) for the workspace's highest-traffic links over the last 30 days.
All three patterns are served well by the ORDER BY (workspace_id, link_id, created_at) clustering. The workspace filter prunes the scan to the right slice of data immediately; the link_id and created_at columns further narrow it for the first two patterns. The device split query scans more columns but still benefits from the workspace-level clustering.
A representative query for the top-100 links by click count over the last 24 hours:
SELECT
link_id,
count() AS clicks_24h
FROM clicks
WHERE
workspace_id = 'ws_abc123'
AND created_at >= now() - INTERVAL 1 DAY
GROUP BY link_id
ORDER BY clicks_24h DESC
LIMIT 100
On a workspace with 12 million click events spanning 90 days, this query returns in approximately 45-80ms depending on the number of data parts in the relevant partition. The same query against a Postgres table of equivalent size runs in 2-4 seconds with the best index configuration we tried.
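The country-breakdown query from the list above has the same shape. A sketch, with the campaign value obviously illustrative:
SELECT
    country,
    count() AS clicks
FROM clicks
WHERE
    workspace_id = 'ws_abc123'
    AND utm_campaign = 'q2-launch'   -- illustrative campaign value
    AND created_at >= now() - INTERVAL 30 DAY
GROUP BY country
ORDER BY clicks DESC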
We do not use ClickHouse materialized views for these patterns. The ORDER BY clustering and partition pruning are sufficient, and materialized views add schema-migration complexity. If query latency ever becomes a problem at higher event volumes, materialized views are the next lever — not additional indexing.
Failure modes we've hit#
Three production incidents are worth documenting.
ClickHouse Keeper blip. ClickHouse Keeper is the ZooKeeper-equivalent coordination layer for ReplicatedMergeTree. On one occasion, a Keeper node restart during a background merge caused the replicated table to pause accepting inserts for approximately 90 seconds while a new leader was elected. The ingester retried with exponential backoff (2s, 4s, 8s, cap 60s) — standard Go retry logic with context.DeadlineExceeded handling. During the outage window, about 1,800 click events backed up in the ingester's in-memory buffer and were flushed successfully once Keeper recovered. The only user-visible effect was a ~2-minute gap in live dashboard data. The fix was ensuring Keeper nodes are spread across separate physical hosts with separate power circuits on the Hetzner network.
Partition imbalance on Redpanda. The clicks.v1 topic uses 12 partitions. After we added a second ingester instance, Redpanda's default balancer assigned 9 partitions to one instance and 3 to the other — the total-partitions-per-consumer strategy and our instance count happened to produce a 3:1 imbalance. This wasn't a data-loss risk, but the over-assigned instance was batching slowly under the load, pushing click processing latency to about 800ms. The fix was a manual partition rebalance through the Redpanda admin API and a switch to round-robin assignment via the rack-aware balancer.
ALTER TABLE on a large table is async. We added a browser_version column in January. ClickHouse's ALTER TABLE ... ADD COLUMN acknowledges immediately and applies the schema change asynchronously over background part merges. For about 20 minutes after the alter, queries that referenced the new column returned NULL for rows in unmutated parts. Nothing broke — we were adding a column, not changing an existing one — but a query that assumed browser_version IS NOT NULL would have returned incorrect results. The lesson: treat ALTER TABLE on large ClickHouse tables as eventually consistent, not synchronous.
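If we were adding the column again, the sequence would look roughly like this. A sketch, not the exact DDL we ran; the column type is an assumption:
-- Metadata-only change: acknowledged immediately, old parts keep returning the default (NULL here)
ALTER TABLE clicks ADD COLUMN browser_version Nullable(String);

-- Optionally force the column to be written into existing parts instead of waiting for background merges
ALTER TABLE clicks MATERIALIZE COLUMN browser_version;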
What we don't do in ClickHouse#
Two things that come up in conversations and that we've deliberately avoided.
Per-event joins to Postgres tables. Some analytics queries naturally want to join click events to link metadata — "show me all clicks for links tagged 'campaign-q2', with their destination URLs". Doing this as a federated query (ClickHouse clicks joined to Postgres links via a ClickHouse JDBC table function) is possible but slow. Instead, we either denormalize relevant fields into the click event at ingest time (like utm_campaign, which is derived from the link's UTM template) or resolve the join in application code: two separate queries, merged in the analytics API layer. This keeps both databases operating inside their performance envelopes.
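The application-level version of that example is roughly two statements, merged in the analytics API layer. The Postgres column names here (destination_url, tags) are assumptions, shown only to illustrate the shape:
-- Postgres: resolve the link set for the tag
SELECT id, destination_url
FROM links
WHERE workspace_id = 'ws_abc123'
  AND 'campaign-q2' = ANY (tags);   -- assumed tags column

-- ClickHouse: aggregate clicks for exactly those links
SELECT link_id, count() AS clicks
FROM clicks
WHERE workspace_id = 'ws_abc123'
  AND link_id IN ('lnk_01', 'lnk_02', 'lnk_03')   -- IDs returned by the Postgres query
GROUP BY link_id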
Per-row updates. ClickHouse supports ALTER TABLE UPDATE for row-level mutations, but it's an async, resource-intensive operation that works against the engine's assumptions. If you find yourself needing to update a click event after the fact — say, to retroactively mark it as a bot click — you've built the architecture wrong. The right pattern is to store a separate click_quality table in ClickHouse that maps click_id to a quality score, computed by the url-scanner service asynchronously, and JOIN them at query time. Reads are cheap; retroactive updates are expensive and should be rare.
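The query-time side of that pattern looks something like the sketch below; click_quality's columns and the 0.5 threshold are illustrative, not the shipped schema:
-- Clicks per link, excluding anything the url-scanner has scored as likely bot traffic
SELECT link_id, count() AS human_clicks
FROM clicks
WHERE workspace_id = 'ws_abc123'
  AND created_at >= now() - INTERVAL 7 DAY
  AND click_id NOT IN (
      SELECT click_id FROM click_quality WHERE score < 0.5   -- assumed score column and threshold
  )
GROUP BY link_id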
For the broader infrastructure picture — how the edge redirect interacts with Redpanda, how click deduplication works, and how the click-ingester service fits into the service topology — see /docs/architecture/click-ingester. The edge-side story (fire-and-forget Redpanda publish, latency budget) is in the redirect p95 post. For what the data enables downstream — geo breakdowns, device splits, UTM attribution dashboards — the solutions/analytics page covers the product surface. If you're building link campaigns against the API and care about which query patterns the SDK exposes, solutions/developers is the relevant entry point.
For teams evaluating Elido as link infrastructure, the architecture choice here — ClickHouse for clicks, Postgres for everything else — is also what allows us to offer no-sampling analytics. The smart links explained post covers how the routing decisions that generate these click events work at the edge.
Marius Voß is DevRel and edge infra at Elido. He helped design the click-ingester pipeline and has spent more time than he'd like debugging ClickHouse Keeper election timing.