Train Whistles
29 Nov 2022
A quick post tonight – just relating an interesting observation that I made about train whistles.
(Note to reader: I think I wrote this post for myself. From an outside perspective, it’s by far the most boring one I’ve ever written. But it’s math that’s been occupying my mind for a week and from an inside perspective it’s been quite fun. Maybe you’ll find the fun in it that I did.)
I’ve been playing with my Twitter social graph recently, and it occurred to me that the people that I’m friends with form several clusters. I wanted to see if I could come up with some sort of clustering algorithm to identify these clusters. Why? Well, for one, it could be of practical use. But, perhaps more than that, I was curious whether I could make a clustering algorithm – I’ve kinda got a thing for reinventing wheels.
I like positing hypotheses that are completely unverified and poorly examined. Why? Because it’s easier to play with ideas when you don’t have to check your work. 🤣 Here are two somewhat related hypotheses about how evolution has made two very different jiggly things more durable and resistant to distress: your brain and trees.
You’ve got a problem: a small subset of abusive users are body slamming your API with extremely high request rates. You’ve added windowed rate limiting, and this reduces the load on your infrastructure, but the behavior persists. These naughty users are not attempting to rate-limit their own requests. They fire off as many requests as they can, almost immediately hit HTTP 429 Too Many Requests, and even then don’t let up. As soon as a new rate limit window is available, the pattern starts all over again.
In the companion post I introduced a problem with naive, window-based rate limiters – they’re too forgiving! The user’s request count is stored in a key in Redis with a TTL of, say, 15 minutes, and once the key expires, the abusive user can come back and immediately offend again. Effectively the abusive user is using your infrastructure to rate limit their requests.
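To make the "too forgiving" behavior concrete, here is a minimal sketch of a naive fixed-window limiter. It is not the code from the post: an in-memory dict stands in for the Redis key and its TTL, and the class name, the limit of 3, and the injectable clock are all assumptions made for illustration.

```python
import time


class NaiveWindowLimiter:
    """Fixed-window limiter. A dict stands in for Redis keys with a TTL (assumption)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # user -> (request count, expiry time of the "key")

    def allow(self, user):
        now = self.clock()
        count, expires_at = self._counters.get(user, (0, 0.0))
        if now >= expires_at:
            # The "key" has expired: the count resets and all past abuse is forgotten.
            count, expires_at = 0, now + self.window
        count += 1
        self._counters[user] = (count, expires_at)
        return count <= self.limit  # False corresponds to an HTTP 429 response


class FakeClock:
    """Hypothetical test clock so the example does not have to sleep for 15 minutes."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
limiter = NaiveWindowLimiter(limit=3, window_seconds=15 * 60, clock=clock)

# The abusive user hammers the API: three requests pass, the rest are rejected.
first_window = [limiter.allow("abuser") for _ in range(5)]

# As soon as the window expires, the very next request is allowed again.
clock.now = 15 * 60
after_expiry = limiter.allow("abuser")
```

Nothing in this sketch penalizes the requests made while the user was already over the limit, which is the forgiving behavior described above.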
A while back I built a PEG (Parsing Expression Grammar) parser in golang. I wasn’t blogging at the time, so the idea slipped under the radar. Here’s a link to the codebase.
About three and a half years ago I came up with a clever trick for accurately approximating and tightly bounding a cumulative distribution in a rather small data structure. It’s high time that I blogged about it! In this post I’ll talk about the problem space, my technique, the potential benefits of my approach over other approaches, and ways to improve in the future.
The skip list is one of my favorite data structures.
It has gnawed on my subconscious for the past 5 years. Even as I wrote Relevant Search it was there at the back of my mind weighing me down - the fundamental problem of search. But only now has the problem taken shape so that I can even begin to describe it. Succinctly, here it is:
Over the past couple of days I’ve been reading over Yeray Diaz’s wonderful blog posts on python3 asyncio (AsyncIO for the Working Python Developer and Asyncio Coroutine Patterns: Beyond await) and I decided to see if I could come up with some sort of Elasticsearch load testing framework.
A notification popped up on my Slack messenger at work; it was from Mary, our office administrator.
This past weekend I took Baby Meg (3.5 years old) on her first camping trip. And boy was it memorable. For starters, check out our digs:
On April 10th and 11th OpenSource Connections held their first (annual, I hope) Haystack search relevance conference. It was intended to be a small and casual 50-person conference but ended up pulling in roughly 120 people, requiring OSC to scramble to find more space. The end result was one of the best conferences I’ve ever attended. In general, conference speakers have to aim their content at the lowest common denominator so that they don’t lose their audience. At this conference, the lowest common denominator was really high! So there was no need to over-explain the boring introductory topics. Instead the speakers were able to jump into interesting and deep content immediately.
Today I had a Penny Chat with Will Acuff discussing how organizations can form relationships with communities. Will should know; he and his wife Tiffany founded Corner to Corner, a group that has made huge inroads into helping underprivileged communities in Nashville. I want to learn about this because my church (New Garden Church) is making a concerted effort right now to better connect to our community. In some ways we are positioned perfectly to do this - our church services are in Dupont Tyler Middle School. However, we have yet to form meaningful relationships with the people in our community outside of our congregation. So we’re looking for help!
Click tracking is a way of boosting documents based upon the historical clickthrough rate that they received when surfaced in search results. Here’s how it works: Let’s say that we’re building click tracking for an online store and we want to boost the documents that are getting the most attention. First you set up logging so that you can count how many times a particular item is clicked. Next you have a process that aggregates the clicks across, say, a week, and you store the value in a click_count field alongside the documents that you are serving from search. Finally, when someone performs a search you boost the results according to the click_count so that items with high clickthrough rates start surfacing higher in search results. But if you think hard, there’s a pretty nasty problem with this approach.
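The three steps above (log clicks, aggregate them into click_count, boost at query time) can be sketched roughly as follows. The function names, the toy click log, and the particular boost formula are assumptions made for illustration; the post does not specify how the boost is computed, and a real search engine would apply it inside the query rather than in application code.

```python
import math
from collections import Counter


def aggregate_clicks(click_log):
    """Count how many times each item id appears in the (hypothetical) click log."""
    return Counter(click_log)


def boosted_results(hits, click_counts):
    """Re-rank (item_id, relevance) pairs by relevance * (1 + log1p(click_count)).

    The log-damped multiplier is an illustrative choice, not the post's formula.
    """
    def score(hit):
        item_id, relevance = hit
        return relevance * (1.0 + math.log1p(click_counts.get(item_id, 0)))

    return sorted(hits, key=score, reverse=True)


# Step 1 and 2: a week of logged clicks, aggregated into per-item click counts.
click_log = ["shoe", "shoe", "shoe", "hat"]
click_counts = aggregate_clicks(click_log)

# Step 3: at query time, boost the text-relevance scores by click_count.
hits = [("hat", 1.2), ("shoe", 1.0), ("sock", 1.1)]
ranked = boosted_results(hits, click_counts)
```

In this toy data the heavily clicked "shoe" overtakes "hat" even though its base relevance is lower, which is exactly the boosting effect described above.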
Embedding spaces are quite trendy right now in machine learning. With word2vec for example, you can create an embedding for words that is capable of capturing analogy. Given the input “man is to king as woman is to what?”, a word2vec embedding can be used to correctly answer “queen”. (Remarkable, isn’t it?) Embeddings like this can be used in a wide variety of domains. For example, facial photos can be projected into an embedding space for facial recognition tasks. However, I wonder if embeddings fall short in a domain that I am very near to - search. Consider the facial recognition task: Each face photo is converted into an N-dimensional vector where N is often rather high (hundreds of values). Given a sample photograph of a face, if you want to find all of the photos of that person then you have to search for all the photo vectors near to the sample photo’s vector. But, due to the curse of dimensionality, very high dimensional embedding spaces are not amenable to the data structures commonly used for spatial search, such as k-d trees.
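For reference, the search task itself ("find all photo vectors near the sample vector") can be written as a brute-force linear scan. This is only a baseline sketch, not a k-d tree and not anything from the post: the function names, the file names, the radius, and the tiny 3-dimensional vectors are all made up for illustration, whereas real face embeddings would have hundreds of dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def photos_near(sample, photo_vectors, radius):
    """Linear scan: ids of all photos whose vector lies within `radius` of `sample`."""
    return [
        photo_id
        for photo_id, vector in photo_vectors.items()
        if euclidean(sample, vector) <= radius
    ]


# Toy embedding vectors (hypothetical values, 3 dimensions instead of hundreds).
photo_vectors = {
    "a.jpg": [0.0, 0.0, 1.0],
    "b.jpg": [0.1, 0.0, 0.9],
    "c.jpg": [1.0, 1.0, 0.0],
}

matches = photos_near([0.0, 0.0, 1.0], photo_vectors, radius=0.5)
```

The scan compares the sample against every stored vector, so its cost grows linearly with the number of photos; spatial indexes exist to avoid that cost, which is why their poor fit for high-dimensional data matters here.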
As many of my friends know, I’ve picked up neuroscience as a sort of side hobby. (Some people collect stamps; I memorize anatomical structures of the brain.) The last time I blogged about this was regarding my Penny Chat with Stephen Bailey on his work with MRIs. But this week I sat down with one of Stephen’s friends, David Simon, to talk about his research involving electroencephalography, a.k.a. EEG.
Last week I took part in a medical imaging study at Vanderbilt in Stephen Bailey’s laboratory, and lookey at this:
A week ago I met with an aspiring entrepreneur who had some interesting ideas regarding a recruitment startup. But during the conversation I got the feeling that he was holding his cards close, and I was having a little trouble getting the whole picture. Towards the end of the conversation he confided that he was really invested in his ideas for the startup and that it actually hurt to hear those ideas criticized.
I was lucky enough last week to find myself drinking a beer with Pat Poels, Eventbrite VP of engineering and two-time World Series of Poker bracelet winner. And I was luckier still that he was in the mood to talk about his poker days. I love hearing these stories but I’m always reluctant to ask because I suspect people ask him about “the poker days” all the time.
Better late than never for my Penny Chat Review for Bryan Hunter’s FP discussion. Here are some of the things that I picked up:
I had a great Penny Chat with Kara Fulgum regarding a very foreign concept to me, Influence Marketing. But first off…
As a developer, my understanding of and respect for software testing has been slow in coming because in my previous work I have been an engineer and a consultant, and in these roles it wasn’t yet obvious how important testing really is. But over the past year I have finally gained an appropriate respect and appreciation for testing, and it’s even improving the way I write code. In this post I will explain where I’ve come from and how far I’ve traveled in my testing practices. I’ll then list out some of the more important principles I’ve picked up along the way.
In his 1943 paper, A Theory of Human Motivation, Abraham Maslow introduced a simple principle that has had a profound influence in the fields of psychology and sociology. Namely, he introduced the concept of a hierarchy of human needs which he termed Physiological, Safety, Belongingness and Love, Esteem, Self-Actualization and Self-Transcendence. And Maslow’s main point here was that it is necessary to first satisfy the basic needs before we can even have the luxury to start worrying about the higher-level concerns. But for me, looking through Maslow’s hierarchy in some detail, it seems that all the cool kids hang out towards the top of that hierarchy. I’ve been there in the past, and am occasionally so fortunate as to touch the top of the hierarchy again. But I think that I (we) can do better than this! So I resolved to try to devise a way to “hack” Maslow’s Hierarchy so as to maximize the time I’m spending near the top.
If you play around much with graphs, one of the first things that you’ll run into is the idea of network centrality. Centrality is a value associated with a node which represents how important and how central that node is to the network as a whole. There are actually quite a few ways of defining network centrality - here are just a few:
This is the first post of what I hope will be many posts to come. Being the first, I feel that it is important to lay out the themes that I intend to cover and the goals that I expect to achieve. However – having not the slightest idea of what I will do with this blog, or even if I’ll do anything with it at all – you’ll have to be satisfied with the pretentious-sounding introductory material which you are currently reading.