
Hitting p95 < 15ms for redirects from FRA, ASH, and SGP

How Elido's edge-redirect path holds a 15ms p95 budget on cache hits across three regions — architecture, cache strategy, real-region measurements

Marius Voß
DevRel · edge infra
World map showing Elido edge POPs in Frankfurt, Ashburn, and Singapore with p95 latency annotations of 12ms, 13ms, and 14ms respectively

A redirect is a synchronous block. The user clicks your short link, their browser stalls, and nothing else happens until the 302 arrives and their next pageload can begin. The redirect is not a background task you can deprioritize. Every millisecond you add here is a millisecond subtracted from the page that actually matters.

That's why we set a hard budget before we wrote the first line of services/edge-redirect: p50 5ms, p95 15ms on a cache hit, measured at the POP, excluding TLS full handshake. Not aspirational. If something pushes us over the line, it gets removed or moved to an async path.

We've been running three production regions — Frankfurt (FRA), Ashburn (ASH), and Singapore (SGP) — for several months now. This post is a full account of how the hot path works, why the numbers look the way they do, and what we got wrong the first time around.

TL;DR

  • The hot path is Go + fasthttp on Hetzner FRA/ASH and OVH SGP, behind Caddy with anycast routing. No synchronous bot scoring, no JS challenge on the redirect path.
  • Two-tier cache: in-process ristretto LRU (L1, ~88% hit rate) backed by Redis Cluster (L1+L2 combined ~99.4%). Origin gRPC to api-core on cold miss only (~0.6% of requests).
  • 90-day p95 by region: FRA 12.1ms, ASH 13.4ms, SGP 14.2ms. Cold-miss p95 is ~22ms, still within the separate 35ms cold-path budget.
  • Cache invalidation on link mutation is Redis pub/sub, sub-second propagation p99. L1 TTL is 60 seconds as a safety net.

Why a 15ms ceiling

Before getting into architecture: why 15ms and not 50ms or 5ms?

The 5ms floor is straightforward — that's roughly what physical network transit costs at the median for a European visitor hitting a Frankfurt POP. You can't undercut physics. The 50ms ceiling is too loose — at 50ms p95, you're adding a noticeable stall before every pageview for a meaningful fraction of your traffic. Research on web performance consistently shows that sub-50ms network delays start becoming perceptible on mobile devices where radio latency compounds with processing time, a point Apple's network-aware programming guidelines make explicitly.

The 15ms number landed from a few concrete constraints. First, redirects compound. If a marketing campaign sends traffic through a shortened link that then redirects to a product page, the redirect latency adds to the TTFB of the landing page. Google's Core Web Vitals use LCP as a primary signal, and a redirect chain that adds 50ms at p95 is measurable. Second, we want enough budget margin to run rule evaluation for smart links inline on the hot path — the routing dimensions (country, device, OS, language, time, referrer) need to execute within the same latency envelope as a plain redirect, or we'd have to strip smart link support from the edge. At 15ms with a ~0.3ms rule evaluation cost, there's room.

The 15ms budget applies to cache-hit traffic. Cold misses are allowed to be slower — the origin gRPC call adds latency — but cold misses by design are rare enough that they don't meaningfully move the p95.

The architecture

Three POPs, each with the same binary: services/edge-redirect, written in Go using fasthttp. fasthttp's server throughput is roughly 8x net/http in the benchmark suite and, more practically for us, its zero-alloc request path keeps GC pauses predictable under sustained load. The standard library's net/http is fine for most services; for a redirect handler that needs to maintain sub-millisecond processing time at high concurrency, avoiding per-request heap allocation is worth the less ergonomic API.
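
For a sense of the shape, here is a minimal, illustrative fasthttp handler — not the production binary; the lookup helper is a stand-in for the cache path described in the next section:

// Minimal sketch of a fasthttp redirect handler (illustrative, not the
// production binary). fasthttp reuses the *RequestCtx across requests, so the
// hot path avoids per-request heap allocation as long as no references escape.
package main

import "github.com/valyala/fasthttp"

func redirectHandler(ctx *fasthttp.RequestCtx) {
	// Path() returns a byte slice owned by the context; strip the leading "/".
	slug := string(ctx.Path()[1:])

	dest, ok := lookup(slug) // stand-in for the L1/L2/origin resolve shown below
	if !ok {
		ctx.SetStatusCode(fasthttp.StatusNotFound)
		return
	}
	// 302 with the Location header set; the browser follows immediately.
	ctx.Redirect(dest, fasthttp.StatusFound)
}

// lookup is a placeholder so the sketch compiles on its own.
func lookup(slug string) (string, bool) {
	return "https://example.com/landing", slug != ""
}

func main() {
	if err := fasthttp.ListenAndServe(":8080", redirectHandler); err != nil {
		panic(err)
	}
}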

Caddy sits in front as the TLS terminator and reverse proxy. On-demand TLS for tenant custom domains (described in detail on the custom domains feature page) provisions certificates on first request. We evaluated HAProxy and nginx as alternatives — both are fast, both have mature anycast deployment patterns, but Caddy's on-demand TLS is the cleanest path to zero-touch certificate lifecycle for an arbitrary number of customer domains, and that matters more to us than squeezing another fraction of a millisecond at the proxy layer.

Anycast routing means that when a visitor hits f.elido.me, s.elido.me, or b.elido.me, the DNS resolves to a shared anycast prefix and the network routes the TCP connection to the nearest POP. There is no application-layer geo-routing logic: the network does the POP selection. Cloudflare's anycast primer is the clearest public explanation of why this matters — the key property is that failover is handled at the BGP layer, not by DNS TTL expiry. If FRA loses connectivity, ASH becomes the shortest path for European traffic within seconds, not minutes. Hetzner's cloud network infrastructure docs cover the underlying routing setup for their FRA and ASH regions.

Importantly: there is no synchronous bot scoring on the hot path. A bot-scoring check that takes 10ms would single-handedly destroy the p95 budget. All traffic-quality signals — anonymizer detection, hosting ASN scoring, click deduplication — run in url-scanner and click-ingester as cold-path async workers. The redirect fires and the click goes on the Redpanda queue; the quality adjudication happens after the fact.

The two-tier cache

The cache is where the budget lives. The logic:

// Simplified cache lookup: L1 → L2 → origin, with singleflight dedup
func (h *RedirectHandler) resolve(ctx *fasthttp.RequestCtx, slug string) (*Link, error) {
    // L1: in-process ristretto LRU — sub-microsecond on hit
    if link, ok := h.l1.Get(slug); ok {
        return link.(*Link), nil
    }

    // L2 + origin share a singleflight group to prevent thundering herd
    // on concurrent cold misses for the same slug
    val, err, _ := h.sf.Do(slug, func() (interface{}, error) {
        // L2: Redis Cluster — single RTT, typically 0.3–0.8ms within POP
        if data, err := h.redis.Get(ctx, cacheKey(slug)).Bytes(); err == nil {
            link, err := unmarshalLink(data)
            if err == nil {
                h.l1.Set(slug, link, linkCost(link))
                return link, nil
            }
        }

        // Origin: gRPC to api-core — cold miss, ~20ms extra
        link, err := h.origin.GetLink(ctx, &pb.GetLinkRequest{Slug: slug})
        if err != nil {
            return nil, err
        }
        payload, _ := marshalLink(link)
        h.redis.Set(ctx, cacheKey(slug), payload, redisTTL)
        h.l1.Set(slug, link, linkCost(link))
        return link, nil
    })
    if err != nil {
        return nil, err
    }
    return val.(*Link), nil
}

L1 is ristretto, Dgraph's admission-controlled LRU cache. The admission controller matters: a naive LRU under a scan workload (a bot hitting thousands of unique slugs) will evict hot entries to make room for cold ones that will never be requested again. Ristretto's TinyLFU-based admission policy resists this — it tracks frequency counters cheaply and refuses to admit an entry that's never been seen before when the cache is under pressure. The net effect is that cache hit rate under adversarial scan traffic stays near the organic hit rate rather than collapsing.

L2 is Redis Cluster. Each POP has its own cluster instance to keep cross-region traffic out of the hot path. FRA and ASH share a separate Redis instance for pub/sub invalidation signals (more on that below); SGP has its own. A single Redis GET within the same datacenter is reliably under 1ms. The combined L1+L2 hit rate sits at approximately 99.4% over the past 90 days — meaning origin calls happen on roughly 1 in 167 requests.

For the solutions/developers use case — teams using the API to mint links at high volume — the practical implication is that a freshly created link will experience one cold miss per POP, then be warm for the duration of its TTL. Links that see no traffic expire out of both caches cleanly without manual eviction.

Where the 15ms goes

The diagram below breaks down the p95 cache-hit budget by phase:

Horizontal stacked bar showing the 15ms p95 cache-hit budget decomposed into TLS resume 2ms, L1 lookup 0.4ms, header build 1ms, network return 9ms, and margin 2.6ms. Illustrative FRA median values.

The dominant segment is network return — roughly 9ms median, meaning the physical distance between the visitor and the POP accounts for about 60% of the budget. We can't compress this. Multi-region deployment is the only lever: adding a POP reduces the median RTT for visitors in that region. The next region on the roadmap targets South Asian traffic, which currently sees around 14ms p95 because Singapore is its nearest POP.

TLS session resumption at 2ms assumes TLS 1.3 0-RTT with a session ticket already in hand. For a first visit from a given device, a full TLS handshake adds roughly 10-15ms on top — that's why the 15ms budget explicitly scopes to cache-hit + resumed-session traffic, which is the vast majority of click traffic in practice. RFC 7234 governs caching semantics for the HTTP layer; notably, 302 responses are not stored by browser caches by default (§4.2.2), which is the correct behaviour for our use case — every redirect request reaches the edge, every redirect gets its own routing decision, no stale destination in the browser cache.

The 2.6ms margin is real operational headroom, not padding. Under Go's GC, occasional stop-the-world pauses on the order of 0.5-1ms are expected even with tuned GOGC settings. Caddy's proxy overhead adds a small fixed cost. The margin keeps us from breaching the budget when these effects compound.

Cache invalidation

Redis pub/sub is the mechanism. When a link is mutated in api-core — destination changed, targeting rules updated, link archived — the mutation handler publishes to a link:invalidate channel with the slug as the payload. Every edge POP subscribes to this channel. On receipt, the subscriber calls l1.Del(slug) and redis.Del(cacheKey(slug)). The next request to that slug repopulates both tiers from origin.
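
The subscriber side is a short loop. The sketch below assumes go-redis; the pubsub field name is illustrative and, per the setup above, points at the shared pub/sub instance rather than the POP-local L2 cluster:

// Sketch of the invalidation subscriber; error handling and resubscription
// on connection loss are omitted. Field names are illustrative.
func (h *RedirectHandler) subscribeInvalidations(ctx context.Context) {
	// h.pubsub is a client for the shared pub/sub Redis instance.
	sub := h.pubsub.Subscribe(ctx, "link:invalidate")
	defer sub.Close()

	// Channel() delivers messages until the subscription or context ends.
	for msg := range sub.Channel() {
		slug := msg.Payload
		h.l1.Del(slug)                   // drop the in-process L1 entry
		h.redis.Del(ctx, cacheKey(slug)) // drop the POP-local L2 entry
	}
}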

The 60-second L1 TTL is the fallback, not the primary mechanism. If the pub/sub subscriber is down — say, a Redis blip or a network partition between the POP and the pub/sub instance — the entry expires from L1 within at most 60 seconds. L2 TTL is set to 300 seconds, so a subscriber outage means up to 5 minutes of potentially stale L2 data, during which the L1 TTL is the only safety net. We alert on pub/sub subscription loss within 30 seconds.

For smart links with time-windowed rules, staleness has a specific implication: if a rule activates at 17:00 and the edge POP's L1 has the previous rule version cached with up to 60 seconds of remaining TTL, traffic between 17:00 and 17:01 may go to the pre-update destination. The pub/sub path eliminates this for the common case; the 60-second TTL catches the edge case. For campaigns where the timing boundary matters precisely, the recommended pattern is to use status=disabled on the old rule, wait one TTL cycle (60 seconds), then activate the new one. We added a polling endpoint at GET /v1/links/{id}/cache-status so pipelines can confirm propagation before proceeding.

Real-region measurements

The following numbers come from demo-workspace data collected over 90 days ending 2026-05-12. They reflect cache-hit traffic only. All timestamps are UTC.

Region                 POP              p50      p95      p99
EU (Frankfurt)         FRA · Hetzner    4.8ms    12.1ms   18.4ms
US East (Ashburn)      ASH · Hetzner    5.2ms    13.4ms   20.1ms
SE Asia (Singapore)    SGP · OVH        5.6ms    14.2ms   22.8ms

FRA is the fastest because the majority of the workload is European, so the median RTT is lower. SGP serves a broader geographic spread — Southeast Asian traffic has lower RTT, while South Asian and East Asian traffic adds to the tail.

The p99 numbers exceed the 15ms budget. That's deliberate. The p95 is the budget, not p99. The p99 is shaped by outlier conditions: cellular handoffs, TCP retransmissions, the occasional Redis latency spike. We monitor p99 but we don't SLA against it. The engineering decision is that p95 captures the experience for "nearly everyone nearly all the time", and optimising the last 1% would require eliminating sources of natural network variability that aren't under our control.

Cold miss p95 is approximately 22ms. This is the floor we can achieve given that origin gRPC adds a same-datacenter round trip (FRA → FRA over private network is approximately 0.3ms) plus the api-core Postgres lookup (typically 1-3ms for a keyed slug lookup). The 22ms figure is measured, not estimated; it's within the budget we allow for cold-miss paths, which is set at 35ms p95.

For teams evaluating multi-region analytics, these latency numbers are available as a Prometheus metric (redirect_duration_seconds with region and cache_tier labels) from the metrics endpoint.
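
Illustratively, the instrumentation for a metric like that looks as follows with the Prometheus Go client — the bucket boundaries and the helper function are assumptions; only the metric name and label names are real:

// Illustrative registration and use of the redirect latency histogram.
var redirectDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "redirect_duration_seconds",
		Help:    "Redirect handling time measured at the POP.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 12), // 1ms .. ~2s
	},
	[]string{"region", "cache_tier"},
)

func init() { prometheus.MustRegister(redirectDuration) }

// On the hot path, after the 302 has been written:
func observeRedirect(region, tier string, start time.Time) {
	redirectDuration.WithLabelValues(region, tier).Observe(time.Since(start).Seconds())
}

Per-region p95 panels can then be derived from the _bucket series with histogram_quantile over a rate window.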

Failure modes we didn't blog about the first time

Thundering herd on key expiry

Before we added singleflight, a slug expiring from both L1 and L2 simultaneously under moderate traffic would generate a burst of concurrent origin gRPC calls — each one doing a Postgres read for the same slug, all returning the same result. Under load testing, this produced spikes in api-core CPU that had nothing to do with link creation volume. The singleflight group collapses concurrent misses for the same slug into a single origin call. The other waiting goroutines block on the group and get the same result when it resolves. The implementation is the standard Go golang.org/x/sync/singleflight package.

We got this wrong in the first prototype. A thundering herd under key expiry is one of those failure modes that doesn't appear in unit tests — it only shows up under realistic concurrency. Adding it to this post because it's a common omission in cache architecture writeups and the fix is genuinely simple.

Redis blip fallback

If a POP loses connectivity to its Redis cluster, the fallback is not an error — the code path degrades to L1-only plus direct origin gRPC on L1 miss. The POP keeps serving. The hit rate drops because L2 is unavailable, so origin call volume spikes, but the redirect path stays functional. The Redis blip path has been exercised twice in production (both were Hetzner maintenance windows). Peak origin call rate during the second incident was approximately 8x baseline for the duration of the blip (~4 minutes). api-core handled it without scaling events.

DNS propagation during POP failover

Anycast failover is BGP-layer — no DNS TTL to wait out, no application-layer health-check timeout in the request path. A POP going offline triggers BGP withdrawal of its route, and traffic shifts to the next-nearest POP within the BGP convergence window (typically 15-90 seconds, depending on how far the withdrawal has to propagate). The relevant operational parameter is our health-check interval: we run TCP health checks every 10 seconds per POP, and a check failure triggers the withdrawal. That means a crashed POP can leave up to 10 seconds of traffic failing before the route is pulled. We've tested this boundary deliberately; the actual impact in the two production incidents was below the check interval.

What we don't do on the hot path

Every item that isn't on the hot path is a deliberate choice, not an omission.

Synchronous click writes. Clicks are fire-and-forget to Redpanda. The redirect handler appends a click event to a Kafka topic (clicks.raw) with the slug, timestamp, truncated IP, and user-agent hash, then responds with the 302. The write is non-blocking. If Redpanda is unavailable, the click is dropped — not the redirect. We've made the conscious trade that click loss under infrastructure failure is acceptable and redirect failure is not. The click-ingester consumer processes the Redpanda topic and writes to ClickHouse. This is why the analytics data for a given click event is available with a short lag (typically under 5 seconds), not instantly.
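
A sketch of that fire-and-forget shape, using franz-go as an example Kafka client — the broker setup and error handling here are illustrative, only the topic name is real:

// Fire-and-forget click publish: Produce enqueues the record and returns
// immediately; the callback runs later, off the request path.
func recordClick(cl *kgo.Client, payload []byte) {
	rec := &kgo.Record{Topic: "clicks.raw", Value: payload}
	cl.Produce(context.Background(), rec, func(_ *kgo.Record, err error) {
		if err != nil {
			// Drop the click, never the redirect — the 302 is already out.
			log.Printf("click event dropped: %v", err)
		}
	})
}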

Inline bot challenges. A bot challenge adds 10-50ms of synchronous work at minimum — JavaScript challenges add a full round trip. We don't do either on the redirect path. The url-scanner service processes traffic-quality signals asynchronously. For solutions/developers teams building link campaigns, this means the redirect is never gated behind a challenge that degrades the experience for legitimate traffic.

Schema validation at redirect time. The destination URL and targeting rules are validated at write time, when the link is created or updated via api-core. By the time a slug lands in the cache, its structure is known-valid. There is no JSON schema validation, no URL-parse step, no rule syntax check at redirect time. The edge binary trusts the cache entry completely. This is only safe because the write path validates before admission to the cache.

The unsexy parts

Three things we don't write enough about, because they're boring to read and important to get right.

Cache size budgets. ristretto is initialised with an explicit cost budget in bytes, not a simple item count. Each cached link is costed by its serialized size, which varies with the number of targeting rules. A link with no rules costs approximately 200 bytes; a link with 6 targeting rules costs closer to 800 bytes. The budget is set to consume at most 10% of the instance's available RAM, leaving headroom for the Go runtime, Caddy, and connection buffers. Getting this wrong causes cache thrashing: a too-small budget evicts entries before the TTL expires, pushing traffic toward L2 and origin.
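
At initialisation time that looks roughly like this — the numbers are illustrative; the real MaxCost follows the 10%-of-RAM rule above:

// Illustrative ristretto setup: MaxCost is a byte budget, not an item count.
func newLinkCache() (*ristretto.Cache, error) {
	return ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // TinyLFU frequency counters, ~10x expected max items
		MaxCost:     3 << 30, // byte budget — in production sized to ~10% of instance RAM
		BufferItems: 64,      // recommended default for the Set buffer
	})
}

// Each entry is admitted with its serialized size as its cost:
//   cache.Set(slug, link, int64(len(payload)))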

GC tuning under load. Go's garbage collector is well-tuned by default, but the default GOGC=100 triggers GC at twice the live heap size. For a redirect handler where the live heap is small but allocation rate is moderate (fasthttp is zero-alloc on the hot path, but there are object allocations for click events and gRPC calls), the GC fires more frequently than necessary. We run GOGC=400 in production. The effect is longer GC cycles but lower frequency — which matters for tail latency. A GC cycle that takes 2ms and happens once every 4 seconds adds a smaller contribution to the p99 than a 1ms cycle every second. We verified this empirically with make bench before setting it in the deployment config.
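
Where setting the environment variable is awkward in a given deployment, the same knob exists programmatically via runtime/debug (equivalent to GOGC=400):

// import "runtime/debug"
func init() {
	debug.SetGCPercent(400)
}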

The make bench discipline. The edge binary has a benchmark suite (go test -bench=. -benchmem ./... from within services/edge-redirect). Every proposed change to the hot path — adding a new header, changing the cache key format, adjusting the rule evaluator — runs through the benchmarks before merge. A change that adds 0.5ms to the p50 benchmark is a change that moves the p95 in production. The benchmark is the gate, not a post-hoc check. We got lax about this once, in a refactor that changed the slug normalisation logic, and shipped a 1.2ms regression that showed up in the region dashboards two days later. The regression was real and the lesson stuck.
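
A sketch of what one of those gate benchmarks looks like — the helper, the Link field, and the slug are stand-ins, but the shape (pre-warm L1, report allocations, fail on error) is the point:

// Illustrative hot-path benchmark; fixture names are not the real test helpers.
func BenchmarkResolveL1Hit(b *testing.B) {
	h := newTestHandler()
	h.l1.Set("bench-slug", &Link{Destination: "https://example.com"}, 1)
	h.l1.Wait() // ristretto applies Sets asynchronously; settle before measuring

	ctx := &fasthttp.RequestCtx{}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := h.resolve(ctx, "bench-slug"); err != nil {
			b.Fatal(err)
		}
	}
}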


The architecture decisions here are documented in more detail at /docs/architecture/edge-redirect. If you're evaluating Elido as a redirect infrastructure layer for a high-volume campaign or a developer platform, the solutions/developers page covers the API surface and SDK options. For a look at what the two-tier cache implies for smart link behaviour — particularly the propagation window for rule changes — the smart links explained post covers that in depth.


Marius Voß is DevRel and edge infra at Elido. He was one of the engineers who shipped the edge-redirect binary from prototype to production and has been staring at its latency dashboards ever since.


