Tool Invocation – Demonstrating the Marvel of GPT's Flexibility

I know it has to be true – The magic of human-level cognition isn’t the result of a brain in which every single piece is perfectly tuned for its very special and isolated task. Instead, there has to be some simple “principle of computation” that is repeated throughout the brain; some component that, when replicated throughout the brain, gives rise to this emergent property we call intelligence. Sure, we have a motor cortex, specialized to control movement. We have a visual cortex, specialized to process information from our eyes. And we have a prefrontal cortex that assimilates all this information together and plans what to do next – “That’s a snake! Don’t step on it!” But there is evidence that the base circuitry that makes up all these modules is actually quite general. At every point on the neocortex, you see basically the same pattern - an interwoven computational “component” composed of 6 layers of stacked neurons. At every portion of our cortex this pattern is, with very little modification, repeated. Besides the similar morphology, there is other evidence that the computational components are general. In several really weird brain rewiring studies, researchers have redirected visual input to the auditory pathway and shown that animals can compensate quite well - effectively “seeing” with the part of their brain that was rightfully intended to hear stuff! (Mice (Lyckman et al., 2001), ferrets (Sur et al., 1988; Roe et al., 1990; Roe et al., 1992; Roe et al., 1993; Sharma et al., 2000), and hamsters (Schneider, 1973; Kalil & Schneider, 1975; Frost, 1982; Frost & Metin, 1985).)

Read More

The Primary Objective of Software Design: Minimizing Total Cognitive Load

Over my half-career in software development, I’ve started to collect some insights (or at least opinions) about how software can be built so that it is easy to maintain, use, and extend. Usually we hear of principles such as modularity, abstraction, loose coupling, and separation of concerns, and each of these is important to strive for. But I’ve found that behind all of these, there is a single, unifying principle – the reduction of cognitive load. In this post I talk about what I’ve come to think of as the primary objective of software design: minimizing total cognitive load of all future users and maintainers of your software.

Read More

Similarity Search for Grouped Content (Teaser)

Vector search has taken the world by storm. The idea is this - cast documents into a vector embedding space via some transformation (e.g. BERT) and then index them into a vector database that is capable of fast approximate nearest neighbors search. Then when the user makes a text query, cast their query into this same embedding space and find the nearest vectors to the query vector and return the corresponding documents.
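Here is a minimal sketch of that index-then-query flow. It rests on some loud assumptions: the `embed` function is a toy bag-of-words stand-in for a real model such as BERT, and the search is a brute-force scan rather than the approximate nearest neighbors index a real vector database would use. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, vocabulary, k=2):
    """Embed the query, score every document, and return the k nearest."""
    query_vec = embed(query, vocabulary)
    scored = [
        (cosine_similarity(query_vec, embed(doc, vocabulary)), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


documents = [
    "the cat sat on the mat",
    "dogs chase the ball",
    "a cat chased a mouse",
]
# The shared "embedding space" here is just the vocabulary of the corpus.
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})

results = search("cat on the mat", documents, vocabulary, k=1)
print(results)
```

Swapping in a real embedding model and an approximate index changes the speed and the quality of the matches, but the shape of the pipeline stays the same.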

Read More

What is Humor?

I was thinking about it… I have no idea what humor is. Can you define it? I certainly can’t.

Read More

What's a "Mind Meld Teaser" Post

I realized today that I don’t write blog posts most of the time. If you were to look at my private notes, for every blog post that I’ve published here, I have probably 10-20 half-baked ideas that would be great to write about… but I just don’t have time to get around to them. Why not? Well, because frankly, writing a blog post is a time risk. If I determine myself to write a post on some Friday afternoon, it’s very possible that I could be signing up for a whole-weekend work task, or even more if it’s really a good post. Some good posts require research and creating proofs. All good posts require lots of time for actually crafting the text and refining it. What’s more, blogging is a rather lonely task. All this preparation I do alone, and that takes me away from time I could be spending around people – which I much prefer.

Read More

Train Whistles

A quick post tonight – just relating an interesting observation that I made about train whistles.

Read More

Playing with a Rational Distribution

(Note to reader: I think I wrote this post for myself. From an outside perspective, it’s by far the most boring one I’ve ever written. But it’s math that’s been occupying my mind for a week and from an inside perspective it’s been quite fun. Maybe you’ll find the fun in it that I did.)

Read More

"Who da Boss" Graph Clustering

I’ve been playing with my Twitter social graph recently, and it occurred to me that the people that I’m friends with form several clusters. I wanted to see if I could come up with some sort of clustering algorithm to identify these clusters. Why? Well for one, it could be of practical use; maybe I can find some good use for it. But, perhaps more than that, I was curious if I could make a clustering algorithm – I’ve kinda got a thing for reinventing wheels.

Read More

Evolution of Jiggly Stuff

I like positing hypotheses that are completely unverified and poorly examined. Why? Because it’s easier to play with ideas when you don’t have to check your work. 🤣 Here are two somewhat related hypotheses about how evolution has made two very different jiggly things more durable and resistant to distress: your brain and trees.

Read More

EARRRL – the Estimated Average Recent Request Rate Limiter

You’ve got a problem: a small subset of abusive users are body slamming your API with extremely high request rates. You’ve added windowed rate limiting, and this reduces the load on your infrastructure, but the behavior persists. These naughty users are not attempting to rate-limit their own requests. They fire off as many requests as they can, almost immediately hit HTTP 429 Too Many Requests, and even then don’t let up. As soon as a new rate limit window is available, the pattern starts all over again.

Read More

EARRRL – the Estimated Average Recent Request Rate Limiter - the Mathy Bits

In the companion post I introduced a problem with naive, window-based rate limiters – they’re too forgiving! The user’s request count is stored in a key in Redis with a TTL of, say, 15 minutes, and once the key expires, the abusive user can come back and immediately offend again. Effectively the abusive user is using your infrastructure to rate limit their requests.
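To make the "too forgiving" behavior concrete, here is a rough sketch of a naive window-based limiter. It assumes an in-memory dict standing in for the Redis key and its TTL, and a caller-supplied clock so the example is deterministic; the class and parameter names are hypothetical, not taken from the post.

```python
class WindowRateLimiter:
    """Naive fixed-window limiter: one counter per user that expires after a TTL."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # user -> (request count, expiry time); stands in for a Redis key with a TTL
        self._counters = {}

    def allow(self, user, now):
        """Count the request and report whether it is within the limit."""
        count, expires_at = self._counters.get(user, (0, now + self.window_seconds))
        if now >= expires_at:
            # The key has "expired": all past behavior is forgotten.
            count, expires_at = 0, now + self.window_seconds
        count += 1
        self._counters[user] = (count, expires_at)
        return count <= self.limit


limiter = WindowRateLimiter(limit=3, window_seconds=900)  # 15-minute window

# The abusive user fires 10 requests in the first 10 seconds.
first_burst = [limiter.allow("abuser", now=t) for t in range(10)]

# The instant the window expires, they are welcomed back with a clean slate.
after_expiry = limiter.allow("abuser", now=900)

print(sum(first_burst), after_expiry)
```

Only the first three requests in the burst get through, yet the seven rejected ones cost the user nothing: once the counter expires, the very next request is allowed as if nothing had happened.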

Read More

Aircraft Control Theory - Applied to Product Growth

exponential distribution and percentile error
Read More

Gopar - The Golang Parser that Needs a Better Name

A while back I built a PEG (Parsing Expression Grammar) parser in golang. I wasn’t blogging at the time, so the idea slipped under the radar. Here’s a link to the codebase.

Read More

A Sketch for a new Distribution Sketch

About three and a half years ago I came up with a clever trick for accurately approximating and tightly bounding a cumulative distribution in a rather small data structure. It’s high time that I blogged about it! In this post I’ll talk about the problem space, my technique, the potential benefits of my approach over other approaches, and ways to improve in the future.

Read More

Build your own Skip List

The skip list is one of my favorite data structures.

  • It can be used to implement ordered lists or sets.
  • It is easy to understand.
  • It doesn’t require any complex re-balancing like some of the other ordered-list structures.
  • It’s fast. And all of the operations - insert, search, delete - are O(log n) on average.
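For a taste of how little machinery is involved, here is a minimal sketch of a skip list used as an ordered set. It is one of many possible layouts, not necessarily the one built in the post: the class names, the cap of 16 levels, and the coin-flip promotion probability of 0.5 are all assumptions made for illustration.

```python
import random


class _Node:
    def __init__(self, value, height):
        self.value = value
        self.forward = [None] * height  # forward[i] is the next node on level i


class SkipList:
    MAX_HEIGHT = 16  # assumed cap on the number of levels

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_HEIGHT)  # sentinel, holds no value
        self._height = 1

    def _random_height(self):
        # Flip a coin: each extra level is half as likely as the one below it.
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def _find_predecessors(self, value):
        # For each level, the last node whose value is strictly less than `value`.
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self._height - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].value < value:
                node = node.forward[level]
            update[level] = node
        return update

    def contains(self, value):
        candidate = self._find_predecessors(value)[0].forward[0]
        return candidate is not None and candidate.value == value

    def insert(self, value):
        update = self._find_predecessors(value)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.value == value:
            return False  # set semantics: ignore duplicates
        height = self._random_height()
        self._height = max(self._height, height)
        node = _Node(value, height)
        for level in range(height):
            # Splice the new node in after its predecessor on each level.
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node
        return True

    def delete(self, value):
        update = self._find_predecessors(value)
        target = update[0].forward[0]
        if target is None or target.value != value:
            return False
        for level in range(len(target.forward)):
            if update[level].forward[level] is target:
                update[level].forward[level] = target.forward[level]
        return True

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.value
            node = node.forward[0]


skip_list = SkipList(seed=42)
for value in [5, 1, 9, 3, 7]:
    skip_list.insert(value)

in_order = list(skip_list)        # values come back sorted
found = skip_list.contains(7)
removed = skip_list.delete(7)
after_delete = list(skip_list)
```

Note that there is no re-balancing step anywhere: the random heights chosen at insert time are what keep the expected search path short.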
Read More

The Fundamental Problem of Search

It has gnawed on my subconscious for the past 5 years. Even as I wrote Relevant Search it was there at the back of my mind weighing me down - the fundamental problem of search. But only now has the problem taken shape so that I can even begin to describe it. Succinctly, here it is:

Read More

Load Testing Elasticsearch Using Python asyncio and the Slow Log

Over the past couple of days I’ve been reading over Yeray Diaz’s wonderful blog posts on python3 asyncio (AsyncIO for the Working Python Developer and Asyncio Coroutine Patterns: Beyond await) and I decided to see if I could come up with some sort of Elasticsearch load testing framework.

Read More

Are You Here Today?

A notification popped up on my Slack messenger at work; it was from Mary, our office administrator.

Read More

Meg's First Camping Trip

This past weekend I took Baby Meg (3.5yrs old) on her first camping trip. And boy was it memorable. For starters check out our digs:

Read More

Haystack Highlights

On April 10th and 11th OpenSource Connections held their first (annual I hope) Haystack search relevance conference. It was intended to be a small-and-casual, 50-person conference but ended up pulling in roughly 120 people, requiring OSC to scramble to find more space. The end result was one of the best conferences I’ve ever attended. In general, conference speakers have to aim their content at the lowest common denominator so that they don’t lose their audience. At this conference, the lowest common denominator was really high! So there was no need to over-explain the boring introductory topics. Instead the speakers were able to jump into interesting and deep content immediately.

Read More

Will Acuff on Building Relationship and Improving Communities

Today I had a Penny Chat with Will Acuff discussing how organizations can form relationships with communities. Will should know: he and his wife Tiffany founded Corner to Corner, a group that made huge inroads into helping underprivileged communities in Nashville. The reason that I want to learn about this is because my church (New Garden Church) is making a concerted effort right now to better connect to our community. In some ways we are positioned perfectly to do this - our church services are in Dupont Tyler Middle School. However, we have yet to make meaningful relationships with the people in our community outside of our congregation. So we’re looking for help!

Read More

Better Click Tracking for Identifying Statistically High Performers - Part I

Click tracking is a way of boosting documents based upon the historical clickthrough rate that they received when surfaced in search results. Here’s how it works: Let’s say that we’re building click tracking for an online store and we want to boost the documents that are getting the most attention. First you set up logging so that you can count how many times a particular item is clicked. Next you have a process that aggregates the clicks across, say, a week, and you store the value in a click_count field alongside the documents that you are serving from search. Finally, when someone performs a search you boost the results according to the click_count so that items with high clickthrough rates start surfacing higher in search results. But if you think hard, there’s a pretty nasty problem with this approach.
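The naive pipeline described above (log, aggregate, boost) might look something like the sketch below. The item names, the base relevance scores, and the particular boost formula (multiplying by 1 + log(1 + click_count)) are all made up for illustration; a real search engine would apply the boost inside the query itself.

```python
import math
from collections import Counter


def aggregate_clicks(click_log):
    """Roll up a week's worth of click events into a click_count per item."""
    return Counter(click_log)


def boosted_results(base_scores, click_counts):
    """Re-rank items by their base relevance score times a click-based boost."""

    def boosted(item_id):
        # Hypothetical boost: grows with clicks, but only logarithmically.
        return base_scores[item_id] * (1.0 + math.log1p(click_counts.get(item_id, 0)))

    return sorted(base_scores, key=boosted, reverse=True)


# Step 1: the click log, one entry per click on an item.
click_log = ["shoes", "hat", "shoes", "shoes", "scarf"]

# Step 2: aggregate into the click_count stored alongside each document.
click_counts = aggregate_clicks(click_log)

# Step 3: at query time, boost the text-relevance scores by click_count.
base_scores = {"hat": 1.2, "shoes": 1.0, "scarf": 1.1}
ranking = boosted_results(base_scores, click_counts)
print(ranking)
```

Here "shoes" has the lowest base score but the most clicks, so it ends up ranked first.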

Read More

Tokenizing Embedding Spaces for Faster, More Relevant Search

Embedding spaces are quite trendy right now in machine learning. With word2vec for example, you can create an embedding for words that is capable of capturing analogy. Given the input “man is to king as woman is to what?”, a word2vec embedding can be used to correctly answer “queen”. (Remarkable isn’t it?) Embeddings like this can be used for a wide variety of different domains. For example, facial photos can be projected into an embedding space for tasks of facial recognition. However, I wonder if embeddings fall short in a domain that I am very near to - search. Consider the facial recognition task: Each face photo is converted into an N-dimensional vector where N is often rather high (hundreds of values). Given a sample photograph of a face, if you want to find all of the photos of that person then you have to search for all the photo vectors near to the sample photo’s vector. But, due to the curse of dimensionality, very high dimensional embedding spaces are not amenable to the data structures commonly used for spatial search, such as k-d trees.

Read More

Neuroscience Penny Chat with David Simon

As many of my friends know, I’ve picked up neuroscience as a sort of side hobby. (Some people collect stamps, I memorize anatomical structures of the brain.) Last time I blogged about this was regarding my Penny Chat with Stephen Bailey on his work with MRIs. But this week I sat down with one of Stephen’s friends, David Simon, to talk about his research involving electroencephalography, a.k.a. EEG.

Read More

Neuroscience Penny Chat with Stephen Bailey

Last week I took part in a Medical Imaging study at Vanderbilt in Stephen Bailey’s laboratory and lookey at this:

Read More

Find Someone to Steal Your Idea - I Dare You!

A week ago I met with an aspiring entrepreneur who had some interesting ideas regarding a recruitment startup. But during the conversation I got the feeling that he was holding his cards close and I was having a little trouble getting the whole picture. Towards the end of the conversation he confided that he was really invested in his ideas for the startup and that it actually hurt to hear those ideas criticized.

Read More

Poker Talk with a Two-Time World Series of Poker Bracelet Winner

I was lucky enough last week to find myself drinking a beer with Pat Poels, Eventbrite VP of engineering and two-time World Series of Poker bracelet winner. And I was luckier still that he was in the mood to talk about his poker days. I love hearing these stories but I’m always reluctant to ask because I suspect people ask him about “the poker days” all the time.

Read More

Functional Programming Penny Chat

Better late than never for my Penny Chat Review for Bryan Hunter’s FP discussion. Here are some of the things that I picked up:

Read More

Algorithmic Influencer Marketing

I had a great Penny Chat with Kara Fulgum regarding a very foreign concept to me, Influencer Marketing. But first off…

Read More

Cowboys and Consultants Don't Need Unit Tests

As a developer, my understanding and respect for software testing has been slow coming because in my previous work I have been an engineer and a consultant, and in these roles it wasn’t yet obvious how important testing really is. But over the past year I have finally gained an appropriate respect and appreciation for testing; and it’s even improving the way I write code. In this post I will explain where I’ve come from and how far I’ve traveled in my testing practices. I’ll then list out some of the more important principles I’ve picked up along the way.

Read More

Climbing Mount Maslow

In his 1943 paper, A Theory of Human Motivation, Abraham Maslow introduced a simple principle that has had a profound influence in the fields of psychology and sociology. Namely, he introduced the concept of a hierarchy of human needs which he termed Physiological, Safety, Belongingness and Love, Esteem, Self-Actualization and Self-Transcendence. And Maslow’s main point here was that it is necessary to first satisfy the basic needs before we can even have the luxury to start worrying about the higher-level concerns. But for me, looking through Maslow’s hierarchy in some detail, it seems that all the cool kids hang out towards the top of that hierarchy. I’ve been there in the past, and am occasionally so fortunate as to touch the top of the hierarchy again. But I think that I (we) can do better than this! So I determined myself to try and devise a way to “hack” Maslow’s Hierarchy so as to maximize the time I’m spending near the top.

Read More

Understanding Eigenvector Centrality with the Metaphor of Political Power

If you play around much with graphs, one of the first things that you’ll run into is the idea of network centrality. Centrality is a value associated with a node which represents how important and how central that node is to the network as a whole. There are actually quite a few ways of defining network centrality - here are just a few:

Read More

Greetings

This is the first post of what I hope will be many posts to come. Being the first, I feel that it is important to lay out the themes that I intend to cover and the goals that I expect to achieve. However – having not the slightest idea of what I will do with this blog, or even if I’ll do anything with it at all, you’ll have to be satisfied with the pretentious sounding introductory material which you are currently reading.

Read More