The Brief
A fintech startup is launching a public API. Without rate limiting, one rogue client can crater the entire service — and the on-call engineer is you. The CTO has asked for a thread-safe, in-memory token-bucket rate limiter that the gateway can use to throttle requests per API key.
A token bucket has a capacity (the maximum number of tokens it can hold) and a refill rate (tokens added per unit of time). Each request consumes one or more tokens; when the bucket holds too few tokens to cover a request, that request is denied. Tokens refill continuously over time, never exceeding capacity. The limiter must support multiple independent buckets keyed by string, plus a decorator that wraps any function with rate limiting.
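One way the pieces described above might fit together is sketched below in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the names TokenBucket, RateLimiter, RateLimitExceeded, try_consume, allow, and limit are all hypothetical, the refill rate is assumed to be in tokens per second, buckets are assumed to start full, and the injectable clock parameter exists only to make the behaviour testable without sleeping.

```python
import functools
import threading
import time


class RateLimitExceeded(Exception):
    """Raised by the decorator when a call is denied (hypothetical name)."""


class TokenBucket:
    """A single bucket guarded by its own lock."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        if capacity <= 0 or refill_rate <= 0:
            raise ValueError("capacity and refill_rate must be positive")
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)  # assumed unit: tokens per second
        self._clock = clock
        self._tokens = self.capacity  # assumption: a new bucket starts full
        self._last = clock()
        self._lock = threading.Lock()

    def try_consume(self, tokens=1.0):
        """Refill lazily from elapsed time, then take tokens if enough exist."""
        with self._lock:
            now = self._clock()
            elapsed = max(0.0, now - self._last)
            self._last = now
            # Continuous refill, capped at capacity.
            self._tokens = min(self.capacity,
                               self._tokens + elapsed * self.refill_rate)
            if tokens <= self._tokens:
                self._tokens -= tokens
                return True
            return False


class RateLimiter:
    """Independent buckets keyed by string, created on first use."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self._capacity = capacity
        self._refill_rate = refill_rate
        self._clock = clock
        self._buckets = {}
        self._lock = threading.Lock()  # protects the bucket dictionary only

    def allow(self, key, tokens=1.0):
        with self._lock:
            bucket = self._buckets.get(key)
            if bucket is None:
                bucket = TokenBucket(self._capacity, self._refill_rate,
                                     self._clock)
                self._buckets[key] = bucket
        # Consume outside the dictionary lock so keys do not block each other.
        return bucket.try_consume(tokens)

    def limit(self, key):
        """Decorator factory: each call to the wrapped function costs one token."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if not self.allow(key):
                    raise RateLimitExceeded(key)
                return func(*args, **kwargs)
            return wrapper
        return decorator
```

With this sketch, a gateway could build one limiter, for example RateLimiter(capacity=10, refill_rate=5), and call limiter.allow(api_key) for each request, or put @limiter.limit("some-key") on a handler. Refilling lazily inside try_consume avoids a background timer thread, and time.monotonic is used by default so that wall-clock adjustments cannot hand out or remove tokens.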