# EARRRL – the Estimated Average Recent Request Rate Limiter

You’ve got a problem: a small subset of abusive users are body slamming your API with extremely high request rates. You’ve added windowed rate limiting, and this reduces the load on your infrastructure, but the behavior persists. These naughty users are not attempting to rate-limit their own requests. They fire off as many requests as they can, almost immediately hit HTTP 429 Too Many Requests, and even then don’t let up. As soon as a new rate limit window is available, the pattern starts all over again.

# EARRRL – the Estimated Average Recent Request Rate Limiter - the Mathy Bits

In the companion post I introduced a problem with naive, window-based rate limiters – they’re too forgiving! The user’s request count is stored in a key in Redis with a TTL of, say, 15 minutes, and once the key expires, the abusive user can come back and immediately offend again. Effectively the abusive user is using your infrastructure to rate limit their requests.
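To make the "too forgiving" behavior concrete, here is a minimal sketch of a naive windowed limiter. It is only an illustration under stated assumptions: an in-memory dict stands in for the Redis key and its TTL, and the class name, the limit of 3 requests, and the 900-second window are all hypothetical.

```python
class FixedWindowLimiter:
    """Naive windowed limiter; a dict stands in for Redis keys with a TTL."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counters = {}  # user -> (count, expires_at)

    def allow(self, user, now):
        count, expires_at = self._counters.get(user, (0, None))
        if expires_at is None or now >= expires_at:
            # Key is missing or its TTL has passed: start a fresh window.
            count, expires_at = 0, now + self.window_seconds
        count += 1
        self._counters[user] = (count, expires_at)
        return count <= self.limit


limiter = FixedWindowLimiter(limit=3, window_seconds=900)

# The abusive user hammers the API: 3 requests pass, the rest are rejected.
first_burst = [limiter.allow("abuser", now=t) for t in range(5)]

# Once the 15-minute key expires, the very same burst is allowed again.
second_burst = [limiter.allow("abuser", now=900 + t) for t in range(5)]
```

Both bursts get the same treatment, which is the forgiveness problem in a nutshell: nothing about the earlier abuse carries over into the next window.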

# Gopar - The Golang Parser that Needs a Better Name

A while back I built a PEG (Parsing Expression Grammar) parser in golang. I wasn’t blogging at the time, so the idea slipped under the radar. Here’s a link to the codebase.

# A Sketch for a new Distribution Sketch

About three and a half years ago I came up with a clever trick for accurately approximating and tightly bounding a cumulative distribution in a rather small data structure. It’s high time that I blogged about it! In this post I’ll talk about the problem space, my technique, the potential benefits of my approach over other approaches, and ways to improve in the future.

# Build your own Skip List

The skip list is one of my favorite data structures.

• It can be used to implement ordered lists or sets.
• It is easy to understand.
• It doesn’t require any complex re-balancing like some of the other ordered-list structures.
• It’s fast. And all of the operations - insert, search, delete - are O(log n) on average.
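
The properties above can be seen in a small sketch. This is not the code from the post, just one common way to write a skip list as an ordered set; the class names, the `MAX_LEVEL` of 16, and the coin-flip probability of 0.5 are assumptions chosen for illustration.

```python
import random


class _Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # forward[i] is the next node at level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Flip a coin until tails; each heads adds one more level.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, value):
        # For each level, the last node whose value is less than `value`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        return update

    def search(self, value):
        candidate = self._find_predecessors(value)[0].forward[0]
        return candidate is not None and candidate.value == value

    def insert(self, value):
        update = self._find_predecessors(value)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.value == value:
            return  # set semantics: ignore duplicates
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(value, level)
        for i in range(level):
            # Splice the new node in at each of its levels; no re-balancing.
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, value):
        update = self._find_predecessors(value)
        target = update[0].forward[0]
        if target is None or target.value != value:
            return False
        for i in range(len(target.forward)):
            update[i].forward[i] = target.forward[i]
        # Drop any levels that are now empty.
        while self._level > 1 and self._head.forward[self._level - 1] is None:
            self._level -= 1
        return True

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.value
            node = node.forward[0]


skip_list = SkipList(seed=1)
for value in [5, 1, 9, 3, 7, 3]:
    skip_list.insert(value)
```

Iterating over `skip_list` walks the bottom level and yields the values in sorted order, while `search`, `insert`, and `delete` all share the same top-down descent in `_find_predecessors`.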

# The Fundamental Problem of Search

It has gnawed on my subconscious for the past 5 years. Even as I wrote Relevant Search it was there at the back of my mind weighing me down - the fundamental problem of search. But only now has the problem taken shape so that I can even begin to describe it. Succinctly, here it is:

# Load Testing Elasticsearch Using Python asyncio and the Slow Log

Over the past couple of days I’ve been reading over Yeray Diaz’s wonderful blog posts on python3 asyncio (AsyncIO for the Working Python Developer and Asyncio Coroutine Patterns: Beyond await) and I decided to see if I could come up with some sort of Elasticsearch load testing framework.

# Are You Here Today?

A notification popped up on my Slack messenger at work; it was from Mary, our office administrator.

# Meg's First Camping Trip

This past weekend I took Baby Meg (3.5yrs old) on her first camping trip. And boy was it memorable. For starters check out our digs:

# Haystack Highlights

On April 10th and 11th OpenSource Connections held their first (annual I hope) Haystack search relevance conference. It was intended to be a small-and-casual, 50-person conference but ended up pulling in roughly 120 people, requiring OSC to scramble to find more space. The end result was one of the best conferences I’ve ever attended. In general, conference speakers have to aim their content at the lowest common denominator so that they don’t lose their audience. At this conference, the lowest common denominator was really high! So there was no need to over-explain the boring introductory topics. Instead the speakers were able to jump into interesting and deep content immediately.

# Will Acuff on Building Relationship and Improving Communities

Today I had a Penny Chat with Will Acuff discussing how organizations can form relationships with communities. Will should know: he and his wife Tiffany founded Corner to Corner, a group that has made huge inroads into helping underprivileged communities in Nashville. The reason I want to learn about this is that my church (New Garden Church) is making a concerted effort right now to better connect to our community. In some ways we are positioned perfectly to do this - our church services are in Dupont Tyler Middle School. However, we have yet to make meaningful relationships with the people in our community outside of our congregation. So we’re looking for help!

# Better Click Tracking for Identifying Statistically High Performers - Part I

Click tracking is a way of boosting documents based upon the historical clickthrough rate that they received when surfaced in search results. Here’s how it works: Let’s say that we’re building click tracking for an online store and we want to boost the documents that are getting the most attention. First you set up logging so that you can count how many times a particular item is clicked. Next you have a process that aggregates the clicks across, say, a week, and you store the value in a click_count field alongside the documents that you are serving from search. Finally, when someone performs a search you boost the results according to the click_count so that items with high clickthrough rates start surfacing higher in search results. But if you think hard, there’s a pretty nasty problem with this approach.
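
The three steps described above (log, aggregate, boost) can be sketched roughly as follows. This is only an illustration of the basic approach: the function names, the additive boost, and the `weight` of 0.1 are assumptions, and a real search engine would apply the boost inside the query rather than in application code.

```python
from collections import Counter


def aggregate_clicks(click_log):
    """Count how many times each item was clicked in the logged period."""
    return Counter(click_log)


def boosted_ranking(base_scores, click_counts, weight=0.1):
    """Add a click_count-based boost to each base relevance score."""
    boosted = {
        doc: score + weight * click_counts.get(doc, 0)
        for doc, score in base_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)


# Hypothetical week of click logs and base relevance scores for one query.
click_log = ["shoe", "shoe", "hat", "shoe", "sock"]
base_scores = {"hat": 1.0, "shoe": 0.9, "sock": 0.8}

click_count = aggregate_clicks(click_log)
ranking = boosted_ranking(base_scores, click_count)
```

Here "shoe" overtakes "hat" purely because of its three clicks, which is the boosting behavior the paragraph describes.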

# Tokenizing Embedding Spaces for Faster, More Relevant Search

Embedding spaces are quite trendy right now in machine learning. With word2vec for example, you can create an embedding for words that is capable of capturing analogy. Given the input “man is to king as woman is to what?”, a word2vec embedding can be used to correctly answer “queen”. (Remarkable, isn’t it?) Embeddings like this can be used for a wide variety of different domains. For example, facial photos can be projected into an embedding space for facial recognition tasks. However, I wonder if embeddings fall short in a domain that I am very near to - search. Consider the facial recognition task: Each face photo is converted into an N-dimensional vector where N is often rather high (hundreds of values). Given a sample photograph of a face, if you want to find all of the photos of that person then you have to search for all the photo vectors near to the sample photo’s vector. But, due to the curse of dimensionality, very high dimensional embedding spaces are not amenable to data structures commonly used for spatial search, such as k-d trees.
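
To show what "search for all the photo vectors near to the sample photo’s vector" means when no spatial index helps, here is a minimal brute-force sketch. The toy 3-dimensional vectors and the photo names are made up for illustration; real face embeddings would have hundreds of dimensions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=2):
    """Return the names of the k vectors closest to the query.

    This scans every stored vector, so the cost grows linearly with
    the size of the collection.
    """
    ranked = sorted(vectors, key=lambda name: euclidean(query, vectors[name]))
    return ranked[:k]


# Hypothetical photo embeddings.
photos = {
    "alice_1": [0.9, 0.1, 0.0],
    "alice_2": [0.8, 0.2, 0.1],
    "bob_1": [0.1, 0.9, 0.3],
}

matches = nearest([0.9, 0.1, 0.05], photos, k=2)
```

The linear scan is the fallback when structures like k-d trees stop paying off, and it is exactly the cost a faster approach would need to beat.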

# Neuroscience Penny Chat with David Simon

As many of my friends know, I’ve picked up neuroscience as a sort of side hobby. (Some people collect stamps, I memorize anatomical structures of the brain.) The last time I blogged about this was regarding my Penny Chat with Stephen Bailey on his work with MRIs. But this week I sat down with one of Stephen’s friends, David Simon, to talk about his research involving electroencephalography, a.k.a. EEG.

# Neuroscience Penny Chat with Stephen Bailey

Last week I took part in a Medical Imaging study at Vanderbilt in Stephen Bailey’s laboratory and lookey at this:

# Find Someone to Steal Your Idea - I Dare You!

A week ago I met with an aspiring entrepreneur who had some interesting ideas regarding a recruitment startup. But during the conversation I got the feeling that he was holding his cards close and I was having a little trouble getting the whole picture. Towards the end of the conversation he confided that he was really vested in his ideas for the startup and that it actually hurt to hear those ideas criticized.

# Poker Talk with a Two-Time World Series of Poker Bracelet Winner

I was lucky enough last week to find myself drinking a beer with Pat Poels, Eventbrite VP of engineering and two-time World Series of Poker bracelet winner. And I was luckier still that he was in the mood to talk about his poker days. I love hearing these stories but I’m always reluctant to ask because I suspect people ask him about “the poker days” all the time.

# Functional Programming Penny Chat

Better late than never for my Penny Chat Review for Bryan Hunter’s FP discussion. Here are some of the things that I picked up:

# Algorithmic Influencer Marketing

I had a great Penny Chat with Kara Fulgum regarding a concept that is very foreign to me, Influencer Marketing. But first off…

# Cowboys and Consultants Don't Need Unit Tests

As a developer, my understanding and respect for software testing has been slow in coming because in my previous work I have been an engineer and a consultant, and in these roles it wasn’t yet obvious how important testing really is. But over the past year I have finally gained an appropriate respect and appreciation for testing, and it’s even improving the way I write code. In this post I will explain where I’ve come from and how far I’ve traveled in my testing practices. I’ll then list out some of the more important principles I’ve picked up along the way.

# Climbing Mount Maslow

In his 1943 paper, A Theory of Human Motivation, Abraham Maslow introduced a simple principle that has had a profound influence in the fields of psychology and sociology. Namely, he introduced the concept of a hierarchy of human needs which he termed Physiological, Safety, Belongingness and Love, Esteem, Self-Actualization and Self-Transcendence. Maslow’s main point here was that it is necessary to first satisfy the basic needs before we can even have the luxury to start worrying about the higher-level concerns. But for me, looking through Maslow’s hierarchy in some detail, it seems that all the cool kids hang out towards the top of that hierarchy. I’ve been there in the past, and am occasionally so fortunate as to touch the top of the hierarchy again. But I think that I (we) can do better than this! So I resolved to try to devise a way to “hack” Maslow’s Hierarchy so as to maximize the time I’m spending near the top.

# Understanding Eigenvector Centrality with the Metaphor of Political Power

If you play around much with graphs, one of the first things that you’ll run into is the idea of network centrality. Centrality is a value associated with a node which represents how important and how central that node is to the network as a whole. There are actually quite a few ways of defining network centrality - here are just a few:

# Greetings

This is the first post of what I hope will be many posts to come. Being the first, I feel that it is important to lay out the themes that I intend to cover and the goals that I expect to achieve. However – having not the slightest idea of what I will do with this blog, or even if I’ll do anything with it at all, you’ll have to be satisfied with the pretentious-sounding introductory material which you are currently reading.