URL Shortener API: Rate Limits, Retries, Idempotency

Three endpoints, an auth header, a JSON body. A URL shortener API is one of the easiest integrations on any backlog, and the quickstart gets you a working short link in a few minutes. What the quickstart skips is everything that happens when the integration runs at volume: the rate limiter pushing back, a transient 503 mid-batch, a job queue that delivers the same message twice. Get those wrong and you get duplicate links, dropped work, and a 429 storm that makes things worse.

This post is the production-hardening companion to the API quickstart. It covers the three mechanics that separate a demo from a reliable integration: rate limits and how to pace against them, which errors to retry and how to back off, and idempotency keys that keep a retry from creating a second link. The examples use Elido's API, but the patterns are the same against any well-built link shortener API. If you treat short links as infrastructure you manage from code, the broader case for that is in short links as Terraform.

Rate Limits: a Token Bucket and Three Headers

Elido meters the API with a token bucket, scoped per workspace. The published sustained rates are 10 requests per second on Free, 100 on Pro, 500 on Business, and a negotiated ceiling on Enterprise. Pro carries a burst capacity of 200, which means a full bucket lets you fire 200 requests at once before the rate settles back to the sustained 100 per second. Most link-creation jobs fit inside the burst and never feel the limit at all.
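To make the burst-versus-sustained distinction concrete, here is a minimal client-side model of a token bucket with continuous refill. It uses the Pro numbers above: capacity 200, refill 100 per second. This illustrates the algorithm; it is not Elido's limiter code. The class name and the explicit `nowMs` clock parameter are our own; the clock parameter lets the behavior be checked deterministically.

```typescript
// Client-side model of a token bucket with continuous refill (illustrative only).
class TokenBucket {
  private readonly capacity: number; // burst size, e.g. 200 on Pro
  private readonly ratePerSec: number; // sustained refill rate, e.g. 100 on Pro
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, ratePerSec: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity; // a bucket starts full, which is what allows the burst
    this.lastMs = nowMs;
  }

  // Credit tokens for the time elapsed since the last call, never above capacity.
  private refill(nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastMs = nowMs;
  }

  // Spend one token if one is available. `false` is where the server would send a 429.
  tryTake(nowMs: number = Date.now()): boolean {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With a full bucket, the first 200 calls at the same instant succeed and the 201st is refused. One second later, exactly 100 more tokens are available.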

You do not have to guess where you stand. Every response carries three headers:

X-RateLimit-Limit - the current per-second ceiling.
X-RateLimit-Remaining - tokens left in the current window.
X-RateLimit-Reset - the Unix timestamp when the bucket refills.

A well-behaved client reads X-RateLimit-Remaining and slows down before it hits zero, rather than sprinting into a wall of 429s and reacting after the fact. Proactive pacing keeps throughput smooth; reactive retrying after every rejection wastes round trips and, if every client retries at the same instant, manufactures a thundering herd.
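A sketch of that proactive pacing follows. It assumes the header semantics described above: X-RateLimit-Reset is a Unix timestamp in seconds, and Remaining counts tokens. Several things are hypothetical choices for illustration: the `pacingDelayMs` and `readNumber` helpers, the `HeaderReader` interface, and the `lowWater` threshold of 5 tokens. Tune the threshold to your concurrency, since several workers may draw from one workspace bucket.

```typescript
// Anything with a `get` works: fetch's Headers, or a Map in tests.
// (Headers lookups are case-insensitive; a Map's are not.)
interface HeaderReader {
  get(name: string): string | null | undefined;
}

// Read a numeric header, treating missing or malformed values as "unknown".
function readNumber(headers: HeaderReader, name: string): number | undefined {
  const raw = headers.get(name);
  if (raw === null || raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// How long to pause before the next request, based on the last response's headers.
function pacingDelayMs(headers: HeaderReader, nowMs: number = Date.now(), lowWater = 5): number {
  const limit = readNumber(headers, "X-RateLimit-Limit");
  const remaining = readNumber(headers, "X-RateLimit-Remaining");
  const resetSec = readNumber(headers, "X-RateLimit-Reset");

  if (remaining === undefined || remaining > lowWater) return 0; // no data, or plenty left
  if (remaining <= 0 && resetSec !== undefined) {
    return Math.max(0, resetSec * 1000 - nowMs); // empty: wait for the advertised refill
  }
  // Running low: space requests at the sustained rate (1000 ms divided by the per-second limit).
  return limit !== undefined && limit > 0 ? Math.ceil(1000 / limit) : 0;
}
```

Pausing for `pacingDelayMs(res.headers)` after each response keeps a worker just under the limit instead of discovering it through 429s.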

A token bucket refilling at the workspace rate while requests draw tokens, with the three rate-limit response headers shown, and a 429 returned once the bucket empties

When you genuinely need to create thousands of links, do not loop the single-create endpoint. POST /v1/links/bulk accepts up to 1000 links in one request and counts as a single unit against the rate limit. One bulk call moves a thousand links for the cost of one token. A thousand single calls burn a thousand tokens. On Pro that empties the 200-token burst and then throttles you to 100 per second for roughly eight more seconds. The bulk path is how the Google Sheets import moves a campaign's worth of links without tripping the limiter.
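Splitting a large job into bulk-sized requests takes only a few lines. The 1000-item cap comes from the text above. The following are assumptions, not Elido's documented request format:

- the `NewLink` fields;
- the `{ links }` request body;
- the injected `send` function, which stands in for a POST to /v1/links/bulk.

```typescript
// Hypothetical shape of one link in a bulk request; the real field names may differ.
interface NewLink {
  destinationUrl: string;
  slug?: string;
}

const BULK_MAX = 1000; // per-request cap stated for POST /v1/links/bulk

// Split an arbitrarily long list into request-sized batches.
function toBulkBatches<T>(items: readonly T[], size: number = BULK_MAX): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send each batch as one bulk call; returns how many calls (and so tokens) were used.
async function createInBulk(
  links: readonly NewLink[],
  send: (body: { links: NewLink[] }) => Promise<void>, // hypothetical transport
): Promise<number> {
  let calls = 0;
  for (const batch of toBulkBatches(links)) {
    await send({ links: batch });
    calls++;
  }
  return calls;
}
```

With this, 2,500 links become three bulk calls: three tokens instead of 2,500.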

A 429 Too Many Requests - the status RFC 6585 reserves for exactly this - comes back with a retry_after value telling you how many seconds to wait. Respect it. That number is the limiter telling you precisely when a token will be available, which is better information than any guess your backoff would produce.
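The post refers to a retry_after value. Assume it is also delivered in the standard Retry-After header. RFC 9110 allows two forms for that header: a number of seconds or an HTTP date. A small parser that handles both might look like this (the function name is hypothetical):

```typescript
// Parse an HTTP Retry-After value into milliseconds to wait, or undefined if unusable.
function parseRetryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value === null || value === undefined) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds form, e.g. "120"
  const at = Date.parse(trimmed); // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  if (Number.isNaN(at)) return undefined; // unparseable: let the caller fall back to backoff
  return Math.max(0, at - nowMs); // a date in the past means "retry now"
}
```

Returning `undefined` rather than 0 for a missing or garbled value keeps "no guidance" distinct from "retry immediately", so the caller can fall back to its own backoff.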

Retries: Which Codes, and How to Back Off

Not every error is worth retrying, and retrying the wrong one is how a small failure becomes an outage. Sort the responses into two piles.

Retry these, because they are transient: 429 (you were too fast), and 500, 502, 503, 504 (a server-side or gateway fault that may clear on its own). Do not retry these, because the same request will fail identically: 400 (the payload is invalid), 401 (the token is missing or wrong), 403 (the token lacks the scope), 404 (the resource is not there or not yours), and 409 (a slug conflict or a stale-version edit). The first pile is "wait and try again." The second is "fix the code or the input." Retrying a 400 in a tight loop just turns a bug into a denial-of-service attack on yourself.
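The two piles translate directly into a lookup. This is a sketch: the `classify` function and its reason strings are illustrative. Treating any unlisted status as non-retryable is a conservative default we chose, not documented Elido behavior.

```typescript
interface RetryDecision {
  retry: boolean;
  reason: string;
}

// "Wait and try again": transient faults that may clear on their own.
const TRANSIENT = new Map<number, string>([
  [429, "rate limited: slow down"],
  [500, "server error"],
  [502, "bad gateway"],
  [503, "service unavailable"],
  [504, "gateway timeout"],
]);

// "Fix the code or the input": the same request will fail the same way.
const PERMANENT = new Map<number, string>([
  [400, "invalid payload"],
  [401, "token missing or wrong"],
  [403, "token lacks the required scope"],
  [404, "resource not found or not in this workspace"],
  [409, "slug conflict or stale-version edit"],
]);

// Intended for non-2xx responses.
function classify(status: number): RetryDecision {
  const transient = TRANSIENT.get(status);
  if (transient !== undefined) return { retry: true, reason: transient };
  const permanent = PERMANENT.get(status);
  if (permanent !== undefined) return { retry: false, reason: permanent };
  // Unlisted codes: do not retry by default (a conservative assumption).
  return { retry: false, reason: `unexpected status ${status}` };
}
```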

For the retryable codes, the algorithm that matters is exponential backoff with jitter. Plain exponential backoff - double the wait each attempt - still synchronizes clients, because every client that failed at the same moment also retries at the same moments. Adding randomness spreads them out. AWS's write-up on exponential backoff and jitter is the canonical reference and shows why the jittered version dramatically cuts contention. A compact version in TypeScript:

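```typescript
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

// Retries `call` on transient statuses with capped exponential backoff and jitter.
// `max` is the number of retries after the first attempt. Network errors (a rejected
// fetch) are not caught here; retrying those is only safe for idempotent requests.
async function withRetry(
  call: () => Promise<Response>,
  max = 5,
): Promise<Response> {
  let attempt = 0;
  while (true) {
    const res = await call();
    if (res.ok || !RETRYABLE.has(res.status) || attempt >= max) return res;

    // Discard the unused body so the underlying connection can be reused.
    await res.body?.cancel();

    // Honor server guidance first. Retry-After (seconds) is a floor, so jitter is
    // added on top of it (up to 250 ms, an arbitrary choice) rather than shortening it.
    // Otherwise back off exponentially with full jitter, capped at 20 s.
    const retryAfter = Number(res.headers.get("retry-after"));
    const wait =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000 + Math.random() * 250
        : Math.random() * Math.min(1000 * 2 ** attempt, 20_000);
    await new Promise((r) => setTimeout(r, wait));
    attempt++;
  }
}
```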

Three things make this safe rather than dangerous. It caps attempts, so a persistent fault fails loudly instead of spinning forever. It honors Retry-After when the server sends it, falling back to computed backoff only when it does not. And it jitters, so a fleet of workers recovering from the same blip does not stampede in lockstep. The official SDKs implement this same policy out of the box - @elido/sdk, elido-python, and the Go client retry exactly the five transient codes with jittered backoff - which is the main reason to reach for an SDK over a hand-rolled HTTP client.

There is one rule that ties retries to the next section: a retry of a create is only safe if the create is idempotent. Otherwise every retry risks a second link.

Idempotency: How to Not Create Duplicate Links

The classic failure looks like this. Your worker sends a create request and the server creates the link, but the success response never makes it back - the connection drops on the return trip. The worker sees a timeout, assumes failure, and retries. Now you have two links for one campaign. At scale, the dashboard fills with /foo, /foo-1, /foo-2, and the duplicates skew every report downstream.

Idempotency keys close that gap. Send an Idempotency-Key header on a mutating request - any string up to 255 characters - and the server stores the response against it. Present the same key again and you get the original response back, status code and body, without the operation running twice. The pattern is the same one Stripe documents for idempotent requests, and it is the standard way to make an unreliable network safe for writes.
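To see why a replay returns the original status and body, here is a toy in-memory model of the server side. It is keyed on (workspace, key), with the 24-hour lifetime described later in this section and the 255-character limit above. It illustrates the pattern; it is not Elido's implementation. A real store would also persist entries and handle two requests with the same key arriving concurrently, which this sketch ignores. Rejecting empty keys is our own assumption.

```typescript
// A toy, in-memory model of server-side idempotency (illustrative, not Elido's code).
interface StoredResponse {
  status: number;
  body: unknown;
}

const DAY_MS = 24 * 60 * 60 * 1000; // the 24-hour window described in this post

class IdempotencyStore {
  private entries = new Map<string, { response: StoredResponse; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number = DAY_MS) {
    this.ttlMs = ttlMs;
  }

  // Runs `operation` at most once per (workspace, key) within the TTL. A replay gets
  // the stored status and body back instead of a second execution.
  execute(
    workspace: string,
    key: string,
    operation: () => StoredResponse,
    nowMs: number = Date.now(),
  ): StoredResponse {
    if (key.length === 0 || key.length > 255) {
      throw new RangeError("Idempotency-Key must be 1 to 255 characters");
    }
    const id = JSON.stringify([workspace, key]); // composite key, no delimiter collisions
    const hit = this.entries.get(id);
    if (hit !== undefined && hit.expiresAt > nowMs) return hit.response; // replay
    const response = operation(); // first request, or the old entry expired
    this.entries.set(id, { response, expiresAt: nowMs + this.ttlMs });
    return response;
  }
}
```

The same key in a second workspace runs the operation again, and so does a replay after the TTL. Both behaviors match the boundaries this section describes next.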

The detail that makes or breaks it is where the key comes from. Do not generate a random key per attempt - that defeats the point, because each retry then looks like a new operation. Derive it from a stable business identifier so the same logical action always produces the same key:

```typescript
const link = await elido.links.create(
  { destinationUrl: order.landingUrl },
  { idempotencyKey: `order-${order.id}-link` },
);
```

Now a retry of the same job carries order-12345-link again, hits the stored response, and returns the link that already exists. Exactly one link per order, no matter how many times the queue redelivers. This is what lets you combine the backoff loop above with creates safely: the retry and the idempotency key are two halves of the same guarantee.
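Putting the two halves together by hand looks roughly like this. The key is derived once, outside the retry loop, so a timed-out attempt and its retry carry the same Idempotency-Key. `createLinkForOrder` and the `Attempt` transport type are hypothetical stand-ins for your HTTP call. The backoff mirrors the `withRetry` loop earlier in the post, with one difference: network errors, where the response was lost, are retried too. That is only safe because the key is stable.

```typescript
// Hypothetical transport: one HTTP attempt that sends `Idempotency-Key: <key>` and
// resolves to the response status, or rejects on a network error or timeout.
type Attempt = (idempotencyKey: string) => Promise<number>;

const TRANSIENT = new Set([429, 500, 502, 503, 504]);
const sleepMs = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function createLinkForOrder(
  orderId: string,
  attempt: Attempt,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = sleepMs,
): Promise<number> {
  // Derived ONCE, outside the loop: every attempt carries the same key.
  const key = `order-${orderId}-link`;
  let failure: unknown = new Error("no attempts made");
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const status = await attempt(key);
      if (!TRANSIENT.has(status)) return status; // success, or an error retrying won't fix
      failure = new Error(`transient status ${status}`);
    } catch (err) {
      // Lost response or timeout: the link may already exist, and the shared key
      // makes the retry return it rather than create a second one.
      failure = err;
    }
    if (i < maxAttempts - 1) {
      await sleep(Math.random() * Math.min(1000 * 2 ** i, 20_000)); // full jitter, 20 s cap
    }
  }
  throw failure;
}
```

Generating the key inside the loop, for example with a fresh random UUID per attempt, would compile just as well. It would also silently turn every retry into a new link.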

An at-least-once job queue firing two create requests with the same idempotency key; the server stores the first response and returns it for the second, so exactly one link is created

Two boundaries to keep in mind. First, the key is scoped per workspace. The same key in two workspaces creates two links, which is correct for a multi-tenant API but surprises teams that assume keys are global. Second, the cache is not forever - on Elido it holds for 24 hours, keyed on (workspace, key). A retry within the window deduplicates. A retry three days later, from a stuck job that finally drained, will create a fresh link. For multi-day batches, do not lean on the key alone. Persist the link ID returned by the first success and look it up before re-issuing. The IETF is standardizing this header in the Idempotency-Key draft. The draft leaves the expiry window for each server to define and document, so check it for every API you integrate rather than assuming 24 hours.
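Here is the persist-and-look-up pattern for long-running batches, sketched with hypothetical interfaces. `LinkIdStore` stands in for your database and `CreateLink` for an idempotent create call. The idempotency key still matters here. It covers a crash between a successful create and the store write, as long as the retry lands inside the 24-hour window.

```typescript
// Durable record of which business IDs already have a link (your database, in practice).
interface LinkIdStore {
  get(businessId: string): string | undefined;
  set(businessId: string, linkId: string): unknown;
}

// Hypothetical create call: sends the idempotency key and resolves to the new link's ID.
type CreateLink = (idempotencyKey: string) => Promise<string>;

// Check the durable record first, so a job that drains days later (past the
// idempotency window) reuses the existing link instead of minting a new one.
async function ensureLink(orderId: string, store: LinkIdStore, create: CreateLink): Promise<string> {
  const existing = store.get(orderId);
  if (existing !== undefined) return existing;
  const linkId = await create(`order-${orderId}-link`); // key still guards in-window retries
  store.set(orderId, linkId); // persist on first success
  return linkId;
}
```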

If you are wiring an API integration today and want it to survive its own retries, start on a free workspace, generate a service-account token, and put an idempotency key on your very first create rather than retrofitting one after the duplicates show up.

Putting It Together

A production-grade create call is the three mechanics stacked. Pace against the rate-limit headers so you rarely hit 429. Wrap the call in jittered backoff that retries only the transient codes and respects Retry-After. Carry an idempotency key derived from a business ID so the retry is safe. With the official SDK, the first two come for free and you supply only the key:

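```typescript
import { Elido, ElidoRateLimitError } from "@elido/sdk";

// App-side types, hypothetical and not part of @elido/sdk: adapt to your own models.
interface Order {
  id: string;
  landingUrl: string;
}

// Hypothetical job-queue error telling the queue to redeliver this job later.
class RetryableJobError extends Error {
  readonly retryAfter?: number;
  constructor(retryAfter?: number) {
    super("rate limited; defer the job and retry later");
    this.retryAfter = retryAfter;
  }
}

const elido = new Elido({ token: process.env.ELIDO_TOKEN! });

export async function shortenForOrder(order: Order) {
  try {
    return await elido.links.create(
      { destinationUrl: order.landingUrl, tags: [`order:${order.id}`] },
      // Same key on every redelivery, so a retry within the 24-hour window reuses the first link.
      { idempotencyKey: `order-${order.id}-link` },
    );
  } catch (err) {
    if (err instanceof ElidoRateLimitError) {
      // SDK already retried with backoff; we are still limited. Defer the job.
      throw new RetryableJobError(err.retryAfter);
    }
    throw err; // everything else: surface it
  }
}
```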

None of this is exotic. It is the same discipline any write-heavy API deserves, applied to links. The reward is an integration that does the right thing under load instead of quietly corrupting your link inventory. For the read side of the same API - pulling click data back out without hammering the limiter - the tradeoffs are in webhooks versus polling for click tracking, and the full endpoint surface lives on the API and SDKs page and the developer solutions overview.

Frequently asked questions

What are the rate limits on a URL shortener API?

On Elido the published limits are 10 requests per second on Free, 100 on Pro with a burst of 200, 500 on Business, and a negotiated limit on Enterprise, all scoped per workspace. Every response carries X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so a client can pace itself before it ever sees a 429. Bulk creates count as one unit against the limit, so the bulk endpoint is the right tool for large jobs.

Which API errors should I retry?

Retry only transient faults: 429 (rate limited), 500, 502, 503, and 504. Use exponential backoff with jitter and a retry cap. Never retry 400, 401, 403, 404, or 409 - those are validation failures, auth problems, or conflicts, and the same request will fail again. Pair retries with an idempotency key so a retried create does not produce a duplicate link.

What is an idempotency key and when do I need one?

An idempotency key is an opaque string you send on a mutating request so the server can recognize a retry and return the original result instead of acting twice. You need one whenever a create or update can be retried - which is always, because networks fail and job queues deliver at least once. Derive the key from a stable business identifier like an order ID so the same logical operation always carries the same key.

How do I avoid creating duplicate short links?

Send an idempotency key derived from a stable business identifier on every create. If the call is retried, the server returns the originally created link rather than minting a second one. For jobs that can retry days later - beyond the idempotency cache window - also store the link ID returned by the first success and look it up before re-issuing the create.

How long does an idempotency key last?

On Elido the idempotency cache lives for 24 hours, keyed on the workspace plus the key. A retry within that window returns the stored response; a retry after it is treated as a fresh request and will create a new resource. For multi-day batch jobs, persist the resulting link ID on first success rather than relying on the key alone.
