# edge-redirect
edge-redirect is the only Elido service on the request path of a
public redirect. It runs in Hetzner FRA, Hetzner ASH, and OVH SGP
behind Caddy, and answers every `https://elido.me/<slug>` request
with a 302 to the destination URL. Everything else is cold-path.
This page is the design rationale: what’s in the binary, what’s not, and why the latency budget shapes the implementation.
## 1. Latency budget
| Stage | p50 | p95 | Notes |
|---|---|---|---|
| TLS handshake (warm) | 0.5ms | 2ms | Caddy session resumption + TLS 1.3 |
| Anycast DNS + edge ingress | — | — | Bounded by the visitor’s network |
| L1 cache lookup | 0.2ms | 0.5ms | In-process LRU, ristretto |
| L2 cache lookup (on L1 miss) | 1.5ms | 3ms | Redis Cluster, same datacenter |
| Origin fetch (on L2 miss) | 12ms | 30ms | gRPC to api-core, rare |
| Smart-link rule eval | 0.3ms | 1ms | Same process, no extra hop |
| Response write + click publish | 1ms | 3ms | redpanda is fire-and-forget |
| Total cache hit | 5ms | 15ms | Excluding TLS |
The 15ms p95 is hard. Anything that pushes us over — synchronous DB reads, blocking I/O on the click event, regex compilation per request — gets ripped out.
## 2. What’s in the binary
Just three packages of any consequence:
- `internal/cache` — L1 LRU + L2 Redis client, ~600 LOC
- `internal/rules` — smart-link rule evaluator, ~400 LOC
- `internal/redirect` — request handler + click publisher, ~300 LOC
Plus a thin `cmd/edge-redirect` main and the standard library. The
binary is 18MB stripped; cold start is <200ms. Rolling restarts
finish in 30 seconds across the cluster.
## 3. What’s NOT in the binary
By design, on the hot path:
- No SQL — Postgres is in the cold-path origin, never touched on a cache hit.
- No JSON parsing — link metadata is stored in Redis as a packed binary format (msgpack) and decoded with zero allocations.
- No regex compilation per request — user-agent regexes from smart-link rules compile once at link-load time; the compiled matchers live in the cache entry.
- No outbound HTTP — clicks are written to redpanda, not to external pixels. Pixels fan out from cold-path workers later.
- No auth — the redirect endpoint is public by definition. The only thing we do is rate-limit per-IP via Caddy, before the request even hits us.
Everything above either runs in cold-path workers or doesn’t run at all.
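The no-regex-per-request rule above can be sketched like this. All type and field names here are illustrative, not the real `internal/rules` schema: patterns are compiled once when a link is loaded into the cache, and the hot path only ever calls `MatchString`.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule holds a smart-link rule whose user-agent matcher was
// pre-compiled at link-load time, so the hot path never pays
// for regexp.Compile. Names are illustrative.
type rule struct {
	uaMatcher   *regexp.Regexp // compiled once, reused per request
	destination string
}

type cacheEntry struct {
	rules       []rule
	defaultDest string
}

// loadEntry compiles matchers once, when the link enters the cache.
// A bad pattern is rejected at load time, never per request.
func loadEntry(raw []struct{ uaPattern, dest string }, defaultDest string) (*cacheEntry, error) {
	e := &cacheEntry{defaultDest: defaultDest}
	for _, r := range raw {
		m, err := regexp.Compile(r.uaPattern)
		if err != nil {
			return nil, err
		}
		e.rules = append(e.rules, rule{uaMatcher: m, destination: r.dest})
	}
	return e, nil
}

// resolve runs on the hot path: match only, no compilation.
func (e *cacheEntry) resolve(userAgent string) string {
	for _, r := range e.rules {
		if r.uaMatcher.MatchString(userAgent) {
			return r.destination
		}
	}
	return e.defaultDest
}

func main() {
	entry, _ := loadEntry([]struct{ uaPattern, dest string }{
		{`(?i)iphone|android`, "https://example.com/mobile"},
	}, "https://example.com/")
	fmt.Println(entry.resolve("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)"))
	fmt.Println(entry.resolve("curl/8.0"))
}
```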
## 4. Cache architecture
Two layers, both lazy.
L1 — ristretto LRU per process. ~50,000 entries by default,
TTL 60 seconds for rule-bearing links and 5 minutes for plain
ones. The TTL difference matters because rule changes need to
propagate quickly; plain link destinations rarely change.
L2 — Redis Cluster, one cluster per region. Same TTLs, but authoritative. On L1 miss the request fetches from L2 and populates L1 on the way back; on L2 miss it falls through to api-core via gRPC, populates both layers, and serves the redirect.
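A minimal sketch of that lazy fallthrough, with plain maps standing in for ristretto and for the Redis and gRPC clients. The `store` interface and every name here are assumptions for illustration, not the real code:

```go
package main

import (
	"errors"
	"fmt"
)

var errMiss = errors.New("cache miss")

// store is the common shape of L2 (Redis) and the origin (api-core
// over gRPC) in this sketch; the real clients are assumed, not shown.
type store interface {
	Get(slug string) (string, error)
}

type mapStore struct{ m map[string]string }

func (s *mapStore) Get(slug string) (string, error) {
	if dest, ok := s.m[slug]; ok {
		return dest, nil
	}
	return "", errMiss
}

// resolver implements the fallthrough: L1, then L2, then origin,
// populating each layer on the way back.
type resolver struct {
	l1     map[string]string // stands in for the (concurrency-safe) ristretto LRU
	l2     store
	origin store
}

func (r *resolver) Resolve(slug string) (string, error) {
	if dest, ok := r.l1[slug]; ok {
		return dest, nil // L1 hit: the common case
	}
	if dest, err := r.l2.Get(slug); err == nil {
		r.l1[slug] = dest // populate L1 on the way back
		return dest, nil
	}
	dest, err := r.origin.Get(slug) // rare: gRPC to api-core
	if err != nil {
		return "", err
	}
	r.l1[slug] = dest
	// (the real path also writes back to L2 with the same TTL)
	return dest, nil
}

func main() {
	r := &resolver{
		l1:     map[string]string{},
		l2:     &mapStore{m: map[string]string{"abc": "https://example.com/"}},
		origin: &mapStore{m: map[string]string{}},
	}
	dest, _ := r.Resolve("abc") // L2 hit; L1 is now warm
	fmt.Println(dest)
}
```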
Cache invalidation is a publish to a Redis pub/sub channel
(`link:invalidate`); every edge POP subscribes and evicts
matching L1 entries within 1 second.
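The eviction side of that subscription can be sketched with a Go channel standing in for the Redis pub/sub stream; names and types are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// l1Cache stands in for the per-process ristretto LRU. Invalidation
// messages arrive on a channel here; the real service receives them
// from the link:invalidate pub/sub channel.
type l1Cache struct {
	mu sync.Mutex
	m  map[string]string
}

func (c *l1Cache) evict(slug string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, slug)
}

// subscribe drains invalidation messages in the background until the
// channel closes, then signals done.
func (c *l1Cache) subscribe(invalidations <-chan string, done chan<- struct{}) {
	go func() {
		for slug := range invalidations {
			c.evict(slug)
		}
		close(done)
	}()
}

func main() {
	c := &l1Cache{m: map[string]string{"abc": "https://old.example.com/"}}
	inv := make(chan string)
	done := make(chan struct{})
	c.subscribe(inv, done)
	inv <- "abc" // a destination or rule changed upstream
	close(inv)
	<-done
	_, ok := c.m["abc"]
	fmt.Println(ok) // entry evicted; the next request falls through to L2
}
```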
## 5. Click publishing
Clicks are fire-and-forget into redpanda. The publish call is
async from the request handler’s perspective — it adds the
event to a per-process buffer and returns immediately. A separate
goroutine drains the buffer every 100ms or 1000 events, whichever
comes first.
If redpanda is unreachable, the buffer fills up and the goroutine spills the overflow to a local on-disk WAL. When redpanda recovers, the WAL drains. We accept up to 5 minutes of buffered clicks before alerting.
This means no click is ever lost on a single-region redpanda outage, and the redirect path doesn’t slow down.
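A rough sketch of that buffer-and-drain shape, using the same 100ms / 1000-event thresholds. The redpanda producer and the WAL spill are reduced here to a callback and a comment; everything else is an assumption for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// clickBuffer decouples the request handler from the broker: Add
// returns immediately, and a background goroutine flushes every
// flushInterval or once batchSize events have accumulated.
type clickBuffer struct {
	events  chan string
	publish func(batch []string) // stands in for the redpanda producer
}

func newClickBuffer(publish func([]string)) *clickBuffer {
	b := &clickBuffer{events: make(chan string, 10000), publish: publish}
	go b.drain(100*time.Millisecond, 1000)
	return b
}

// Add is what the request handler calls; it never blocks the redirect.
func (b *clickBuffer) Add(event string) {
	select {
	case b.events <- event:
	default:
		// buffer full: the real service spills to the on-disk WAL here
	}
}

func (b *clickBuffer) drain(flushInterval time.Duration, batchSize int) {
	ticker := time.NewTicker(flushInterval)
	defer ticker.Stop()
	batch := make([]string, 0, batchSize)
	flush := func() {
		if len(batch) > 0 {
			b.publish(batch)
			batch = make([]string, 0, batchSize)
		}
	}
	for {
		select {
		case ev, ok := <-b.events:
			if !ok {
				flush()
				return
			}
			batch = append(batch, ev)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	got := make(chan []string, 1)
	b := newClickBuffer(func(batch []string) { got <- batch })
	b.Add(`{"slug":"abc"}`)
	fmt.Println(len(<-got)) // flushed on the next tick
}
```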
## 6. Smart-link rules
The rule engine is part of the same process — there is no separate “rules service” to call. Rule evaluation adds <1ms even when a link has six rules, because the compiled matchers are inline in the cache entry.
First-match wins. A rule matches when all its match conditions
are true (countries, devices, OS, browsers, languages, time
windows, referrer hosts). Rules are evaluated in order set in
the dashboard.
If no rule matches, the link’s `default_destination_url` is used.
The default is required — a smart link never produces a 404.
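First-match-wins with AND-ed conditions can be sketched as follows, showing just two of the condition types with invented field names (the real schema lives in Smart links):

```go
package main

import "fmt"

// request carries the attributes the matcher reads; field names
// here are illustrative, not the real schema.
type request struct {
	country, device string
}

// rule lists match conditions; an empty slice means the condition
// is absent and therefore always true.
type rule struct {
	countries []string
	devices   []string
	dest      string
}

func contains(set []string, v string) bool {
	for _, s := range set {
		if s == v {
			return true
		}
	}
	return false
}

// matches is true only when ALL present conditions hold.
func (r rule) matches(req request) bool {
	if len(r.countries) > 0 && !contains(r.countries, req.country) {
		return false
	}
	if len(r.devices) > 0 && !contains(r.devices, req.device) {
		return false
	}
	return true
}

// resolve walks rules in dashboard order; the first match wins, and
// the required default guarantees a destination (never a 404).
func resolve(rules []rule, defaultDest string, req request) string {
	for _, r := range rules {
		if r.matches(req) {
			return r.dest
		}
	}
	return defaultDest
}

func main() {
	rules := []rule{
		{countries: []string{"DE"}, devices: []string{"mobile"}, dest: "https://example.de/m"},
		{countries: []string{"DE"}, dest: "https://example.de/"},
	}
	fmt.Println(resolve(rules, "https://example.com/", request{country: "DE", device: "desktop"}))
	fmt.Println(resolve(rules, "https://example.com/", request{country: "US", device: "mobile"}))
}
```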
See Smart links for the rule schema and authoring details.
## 7. Deployment topology
Three POPs as of 2026-05:
- FRA (Frankfurt, Hetzner) — primary EU
- ASH (Ashburn, Hetzner) — primary US
- SGP (Singapore, OVH) — primary APAC, Business+ region
Each POP runs 4-12 edge-redirect pods on bare metal (not VMs — the latency budget doesn’t survive virtualization on the network path). Caddy in front handles TLS termination, Brotli, and rate limiting.
Anycast DNS (BGP via Hetzner + OVH) routes visitors to the nearest POP. Failover between POPs is automatic via BGP withdraw; there’s no DNS-based failover.
## 8. What we measure
The metrics that matter:
- `elido_redirect_latency_seconds{cache="l1|l2|origin"}` — split by cache layer because the three populations are entirely different
- `elido_redirect_requests_total{status,cache}` — error rate + cache hit ratio
- `elido_link_cache_size` — keep the L1 hit ratio above 95% on steady-state traffic
- `elido_click_publish_lag_seconds` — should be near-zero; alert at >5s
See Observability for the full Grafana dashboard.
## See also
- Smart links — what the rule engine evaluates
- Observability — metrics and traces for the hot path
- Self-hosting — Helm chart that ships edge-redirect for on-prem deployment