# edge-redirect
edge-redirect is the only Elido service on the request path of a
public redirect. It runs in Hetzner FRA, Hetzner ASH, and OVH SGP
behind Caddy, and answers every `https://elido.me/<slug>` request
with a 302 to the destination URL. Everything else is cold-path.
This page is the design rationale: what’s in the binary, what’s not, and why the latency budget shapes the implementation.
## 1. Latency budget
| Stage | p50 | p95 | Notes |
|---|---|---|---|
| TLS handshake (warm) | 0.5ms | 2ms | Caddy session resumption + TLS 1.3 |
| Anycast DNS + edge ingress | — | — | Bounded by the visitor’s network |
| L1 cache lookup | 0.2ms | 0.5ms | In-process LRU, ristretto |
| L2 cache lookup (on L1 miss) | 1.5ms | 3ms | Redis Cluster, same datacenter |
| Origin fetch (on L2 miss) | 12ms | 30ms | gRPC to api-core, rare |
| Smart-link rule eval | 0.3ms | 1ms | Same process, no extra hop |
| Response write + click publish | 1ms | 3ms | redpanda is fire-and-forget |
| Total cache hit | 5ms | 15ms | Excluding TLS |
The 15ms p95 is hard. Anything that pushes us over — synchronous DB reads, blocking I/O on the click event, regex compilation per request — gets ripped out.
## 2. What’s in the binary
Just three packages of any consequence:
- `internal/cache` — L1 LRU + L2 Redis client, ~600 LOC
- `internal/rules` — smart-link rule evaluator, ~400 LOC
- `internal/redirect` — request handler + click publisher, ~300 LOC
Plus a thin `cmd/edge-redirect` main and the standard library. The
binary is 18MB stripped; cold start is <200ms. Rolling restarts
finish in 30 seconds across the cluster.
## 3. What’s NOT in the binary
By design, on the hot path:
- No SQL — Postgres is in the cold-path origin, never touched on a cache hit.
- No JSON parsing — link metadata is stored in Redis as a packed binary format (msgpack) and decoded with zero allocations.
- No regex compilation per request — user-agent regexes from smart-link rules compile once at link-load time; the compiled matchers live in the cache entry.
- No outbound HTTP — clicks are written to redpanda, not to external pixels. Pixels fan out from cold-path workers later.
- No auth — the redirect endpoint is public by definition. The only thing we do is rate-limit per-IP via Caddy, before the request even hits us.
Everything above either runs in cold-path workers or doesn’t run at all.
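The no-regex-per-request rule above can be sketched like this. All type and field names here are illustrative, not the real `internal/rules` schema: patterns are compiled once when a link is loaded into the cache, and the hot path only ever calls `MatchString`.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule holds a smart-link rule whose user-agent matcher was
// pre-compiled at link-load time, so the hot path never pays
// for regexp.Compile. Names are illustrative.
type rule struct {
	uaMatcher   *regexp.Regexp // compiled once, reused per request
	destination string
}

type cacheEntry struct {
	rules       []rule
	defaultDest string
}

// loadEntry compiles matchers once, when the link enters the cache.
// A bad pattern is rejected at load time, never per request.
func loadEntry(raw []struct{ uaPattern, dest string }, defaultDest string) (*cacheEntry, error) {
	e := &cacheEntry{defaultDest: defaultDest}
	for _, r := range raw {
		m, err := regexp.Compile(r.uaPattern)
		if err != nil {
			return nil, err
		}
		e.rules = append(e.rules, rule{uaMatcher: m, destination: r.dest})
	}
	return e, nil
}

// resolve runs on the hot path: match only, no compilation.
func (e *cacheEntry) resolve(userAgent string) string {
	for _, r := range e.rules {
		if r.uaMatcher.MatchString(userAgent) {
			return r.destination
		}
	}
	return e.defaultDest
}

func main() {
	entry, _ := loadEntry([]struct{ uaPattern, dest string }{
		{`(?i)iphone|android`, "https://example.com/mobile"},
	}, "https://example.com/")
	fmt.Println(entry.resolve("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)"))
	fmt.Println(entry.resolve("curl/8.0"))
}
```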
## 4. Cache architecture
Two layers, both lazy.
L1 — ristretto LRU per process. ~50,000 entries by default,
TTL 60 seconds for rule-bearing links and 5 minutes for plain
ones. The TTL difference matters because rule changes need to
propagate quickly; plain link destinations rarely change.
L2 — Redis Cluster, one cluster per region. Same TTLs, but authoritative. On L1 miss the request fetches from L2 and populates L1 on the way back; on L2 miss it falls through to api-core via gRPC, populates both layers, and serves the redirect.
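A minimal sketch of that lazy fallthrough, with plain maps standing in for ristretto and for the Redis and gRPC clients. The `store` interface and every name here are assumptions for illustration, not the real code:

```go
package main

import (
	"errors"
	"fmt"
)

var errMiss = errors.New("cache miss")

// store is the common shape of L2 (Redis) and the origin (api-core
// over gRPC) in this sketch; the real clients are assumed, not shown.
type store interface {
	Get(slug string) (string, error)
}

type mapStore struct{ m map[string]string }

func (s *mapStore) Get(slug string) (string, error) {
	if dest, ok := s.m[slug]; ok {
		return dest, nil
	}
	return "", errMiss
}

// resolver implements the fallthrough: L1, then L2, then origin,
// populating each layer on the way back.
type resolver struct {
	l1     map[string]string // stands in for the (concurrency-safe) ristretto LRU
	l2     store
	origin store
}

func (r *resolver) Resolve(slug string) (string, error) {
	if dest, ok := r.l1[slug]; ok {
		return dest, nil // L1 hit: the common case
	}
	if dest, err := r.l2.Get(slug); err == nil {
		r.l1[slug] = dest // populate L1 on the way back
		return dest, nil
	}
	dest, err := r.origin.Get(slug) // rare: gRPC to api-core
	if err != nil {
		return "", err
	}
	r.l1[slug] = dest
	// (the real path also writes back to L2 with the same TTL)
	return dest, nil
}

func main() {
	r := &resolver{
		l1:     map[string]string{},
		l2:     &mapStore{m: map[string]string{"abc": "https://example.com/"}},
		origin: &mapStore{m: map[string]string{}},
	}
	dest, _ := r.Resolve("abc") // L2 hit; L1 is now warm
	fmt.Println(dest)
}
```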
Cache invalidation is a publish to a Redis pub/sub channel
(`link:invalidate`); every edge POP subscribes and evicts
matching L1 entries within 1 second.
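The eviction side of that subscription can be sketched with a Go channel standing in for the Redis pub/sub stream; names and types are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// l1Cache stands in for the per-process ristretto LRU. Invalidation
// messages arrive on a channel here; the real service receives them
// from the link:invalidate pub/sub channel.
type l1Cache struct {
	mu sync.Mutex
	m  map[string]string
}

func (c *l1Cache) evict(slug string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, slug)
}

// subscribe drains invalidation messages in the background until the
// channel closes, then signals done.
func (c *l1Cache) subscribe(invalidations <-chan string, done chan<- struct{}) {
	go func() {
		for slug := range invalidations {
			c.evict(slug)
		}
		close(done)
	}()
}

func main() {
	c := &l1Cache{m: map[string]string{"abc": "https://old.example.com/"}}
	inv := make(chan string)
	done := make(chan struct{})
	c.subscribe(inv, done)
	inv <- "abc" // a destination or rule changed upstream
	close(inv)
	<-done
	_, ok := c.m["abc"]
	fmt.Println(ok) // entry evicted; the next request falls through to L2
}
```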
## 5. Click publishing
Clicks are fire-and-forget into redpanda. The publish call is
async from the request handler’s perspective — it adds the
event to a per-process buffer and returns immediately. A separate
goroutine drains the buffer every 100ms or 1000 events, whichever
comes first.
If redpanda is unreachable, the buffer fills up and the goroutine spills the overflow to a local on-disk WAL. When redpanda recovers, the WAL drains. We accept up to 5 minutes of buffered clicks before alerting.
This means no click is ever lost on a single-region redpanda outage, and the redirect path doesn’t slow down.
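A rough sketch of that buffer-and-drain shape, using the same 100ms / 1000-event thresholds. The redpanda producer and the WAL spill are reduced here to a callback and a comment; everything else is an assumption for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// clickBuffer decouples the request handler from the broker: Add
// returns immediately, and a background goroutine flushes every
// flushInterval or once batchSize events have accumulated.
type clickBuffer struct {
	events  chan string
	publish func(batch []string) // stands in for the redpanda producer
}

func newClickBuffer(publish func([]string)) *clickBuffer {
	b := &clickBuffer{events: make(chan string, 10000), publish: publish}
	go b.drain(100*time.Millisecond, 1000)
	return b
}

// Add is what the request handler calls; it never blocks the redirect.
func (b *clickBuffer) Add(event string) {
	select {
	case b.events <- event:
	default:
		// buffer full: the real service spills to the on-disk WAL here
	}
}

func (b *clickBuffer) drain(flushInterval time.Duration, batchSize int) {
	ticker := time.NewTicker(flushInterval)
	defer ticker.Stop()
	batch := make([]string, 0, batchSize)
	flush := func() {
		if len(batch) > 0 {
			b.publish(batch)
			batch = make([]string, 0, batchSize)
		}
	}
	for {
		select {
		case ev, ok := <-b.events:
			if !ok {
				flush()
				return
			}
			batch = append(batch, ev)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	got := make(chan []string, 1)
	b := newClickBuffer(func(batch []string) { got <- batch })
	b.Add(`{"slug":"abc"}`)
	fmt.Println(len(<-got)) // flushed on the next tick
}
```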
## 6. Smart-link rules
The rule engine is part of the same process — there is no separate “rules service” to call. Rule evaluation adds <1ms even when a link has six rules, because the compiled matchers are inline in the cache entry.
First-match wins. A rule matches when all its match conditions
are true (countries, devices, OS, browsers, languages, time
windows, referrer hosts). Rules are evaluated in order set in
the dashboard.
If no rule matches, the link’s `default_destination_url` is used.
The default is required — a smart link never produces a 404.
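First-match-wins with AND-ed conditions can be sketched as follows, showing just two of the condition types with invented field names (the real schema lives in Smart links):

```go
package main

import "fmt"

// request carries the attributes the matcher reads; field names
// here are illustrative, not the real schema.
type request struct {
	country, device string
}

// rule lists match conditions; an empty slice means the condition
// is absent and therefore always true.
type rule struct {
	countries []string
	devices   []string
	dest      string
}

func contains(set []string, v string) bool {
	for _, s := range set {
		if s == v {
			return true
		}
	}
	return false
}

// matches is true only when ALL present conditions hold.
func (r rule) matches(req request) bool {
	if len(r.countries) > 0 && !contains(r.countries, req.country) {
		return false
	}
	if len(r.devices) > 0 && !contains(r.devices, req.device) {
		return false
	}
	return true
}

// resolve walks rules in dashboard order; the first match wins, and
// the required default guarantees a destination (never a 404).
func resolve(rules []rule, defaultDest string, req request) string {
	for _, r := range rules {
		if r.matches(req) {
			return r.dest
		}
	}
	return defaultDest
}

func main() {
	rules := []rule{
		{countries: []string{"DE"}, devices: []string{"mobile"}, dest: "https://example.de/m"},
		{countries: []string{"DE"}, dest: "https://example.de/"},
	}
	fmt.Println(resolve(rules, "https://example.com/", request{country: "DE", device: "desktop"}))
	fmt.Println(resolve(rules, "https://example.com/", request{country: "US", device: "mobile"}))
}
```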
See Smart links for the rule schema and authoring details.
## 7. Deployment topology
Three POPs as of 2026-05:
- FRA (Frankfurt, Hetzner) — primary EU
- ASH (Ashburn, Hetzner) — primary US
- SGP (Singapore, OVH) — primary APAC, Business+ region
Each POP runs 4-12 edge-redirect pods on bare metal (not VMs — the latency budget doesn’t survive virtualization on the network path). Caddy in front handles TLS termination, Brotli, and rate limiting.
Anycast DNS (BGP via Hetzner + OVH) routes visitors to the nearest POP. Failover between POPs is automatic via BGP withdraw; there’s no DNS-based failover.
## 8. What we measure
The metrics that matter:
- `elido_redirect_latency_seconds{cache="l1|l2|origin"}` — split by cache layer because the three populations are entirely different
- `elido_redirect_requests_total{status,cache}` — error rate + cache hit ratio
- `elido_link_cache_size` — keep the L1 hit ratio above 95% on steady-state traffic
- `elido_click_publish_lag_seconds` — should be near-zero; alert at >5s
See Observability for the full Grafana dashboard.
## See also
- Smart links — what the rule engine evaluates
- Observability — metrics and traces for the hot path
- Self-hosting — Helm chart that ships edge-redirect for on-prem deployment