I've been playing with my Twitter social graph recently, and it occurred to me that the people that I'm friends with form several clusters. I wanted to see if I could come up with some sort of clustering algorithm to identify these clusters. Why? Well for one, it could be of practical use; maybe I can find some good use for it. But, perhaps more than that, I was curious if I could make a clustering algorithm – I've kinda got a thing for reinventing wheels.
I like positing hypotheses that are completely unverified and poorly examined. Why? Because it's easier to play with ideas when you don't have to check your work. 🤣 Here are two somewhat related hypotheses about how evolution has made two very different jiggly things more durable and resistant to distress: your brain and trees.
In the companion post I introduced a problem with naive, window-based rate limiters – they're too forgiving! The user's request count is stored in a key in Redis with a TTL of, say, 15 minutes, and once the key expires, the abusive user can come back and immediately offend again. Effectively the abusive user is using your infrastructure to rate limit their requests.
In this post we'll investigate an alternative approach to windowed rate limiting which keeps a running estimate of each user's request rate and rejects requests for users whose rate is above the prescribed threshold. The focus of this post is on the math behind the approach. For a practical implementation, usage, and motivation for why the math might be worth looking at, please take a look at the companion post.
You've got a problem: a small subset of abusive users are body slamming your API with extremely high request rates. You've added windowed rate limiting, and this reduces the load on your infrastructure, but behavior persists. These naughty users are not attempting to rate-limit their own requests. They fire off as many requests as they can, almost immediately hit HTTP 429 Too Many Requests, and even then don't let up. As soon as a new rate limit window is available, the pattern starts all over again.
In order to curtail this behavior, it would be nice to penalize bad users according to their recent average request rate. That is, if a user responsibly limits their own requests, then they never get a 429. However if the user has the bad habit of constantly exceeding the rate limit, then we stop them from making any more requests – forever ... no new windows and no second chances... that is until they mend their ways and start monitoring their own rate more responsibly. Once their average request rate falls below the prescribed threshold, then their past sins are forgiven, and they may begin anew as a responsible user of our API.
Once upon a time, a long time ago, I was an aerospace engineer. One of the main goals of an aerospace engineer is (duh) to make sure you keep aircraft safely in the air. To do this, you must make sure that the aircraft is stable. Simplistically, stability indicates that the forces acting upon the aircraft tend to correct any perturbation that acts upon the aircraft. For example, if a gust of wind pitches the nose of the aircraft upward, then an unstable aircraft will pitch further upward and soon start back flipping. A stable aircraft, on the other hand, will be naturally forced back towards level flight. I was recently reminded of my aerospace background when reading a blog post by Brian Balfour (et al.) about product feedback loops. His description of product feedback loops is, surprisingly, quite analogous to the aircraft feedback loops we see with aircraft. However, instead of air pressure forcing the aircraft to tip backwards more and more, we see things like social sharing causing a product to gain in more and more popularity. An interesting difference between aircraft dynamics and product dynamics is that, while you want aircraft to dampen out perturbations and return to straight and level flight, you want products to exhibit instability – you want a slight jump in popularity to give rise to more and more popularity and trigger exponential growth. It seems, then, that aerospace engineering may provide interesting tools for understanding just how to trigger exponential product growth. Let's dig in!
A while back I built a PEG (Parsing Expression Grammer) parser in golang. I wasn't blogging at the time, so the idea slipped under the radar. Here's a link to the codebase.
About three and a half years ago I came up with a clever trick for accurately approximating and tightly bounding a cumulative distribution in a rather small data structure. It's high time that I blogged about it! In this post I'll talk about the problem space, my technique, the potential benefits of my approach over other approaches, and ways to improve in the future.
The skip list is one of my favorite data structures.
* It can be used to implement ordered lists or sets.
* It is easy to understand.
* It doesn't require any complex re-balancing like some of the other ordered-list structures.
* It's fast. And all of the operations - insert, search, delete - are O(log n) on average.
It has gnawed on my subconscious for the past 5 years. Even as I wrote Relevant Search it was there at the back of my mind weighing me down - the fundamental problem of search. But only now has the problem taken shape so that I can even begin to describe it. Succinctly, here it is:
Users don't know what the heck they want and couldn't tell you even if they did.