
2018

A Sketch for a new Distribution Sketch

About three and a half years ago I came up with a clever trick for accurately approximating and tightly bounding a cumulative distribution in a rather small data structure. It's high time that I blogged about it! In this post I'll talk about the problem space, my technique, the potential benefits of my approach over other approaches, and ways to improve in the future.

Build your own Skip List

The skip list is one of my favorite data structures.

* It can be used to implement ordered lists or sets.
* It is easy to understand.
* It doesn't require any complex re-balancing like some of the other ordered-list structures.
* It's fast. All of the operations - insert, search, delete - are O(log n) on average.
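To make those bullet points concrete, here is a minimal skip-list sketch in Python. It's an ordered set of unique, mutually comparable keys. The class name, the `max_level` cap, the promotion probability `p`, and the method names are my own illustrative choices, not any particular library's API. Notice that there's no re-balancing anywhere: each new node just flips coins to decide how tall its tower is.

```python
import random


class _Node:
    """A skip-list node: one key plus one forward pointer per level it occupies."""

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level


class SkipList:
    """Minimal ordered set of unique, mutually comparable keys backed by a skip list."""

    def __init__(self, max_level=16, p=0.5, seed=None):
        self.max_level = max_level          # cap on tower height
        self.p = p                          # probability of promoting a node one more level
        self.level = 1                      # number of levels currently in use
        self.head = _Node(None, max_level)  # sentinel; its key is never compared
        self._rng = random.Random(seed)

    def _random_level(self):
        # Keep flipping a biased coin; each success adds one level to the new tower.
        lvl = 1
        while lvl < self.max_level and self._rng.random() < self.p:
            lvl += 1
        return lvl

    def _predecessors(self, key):
        # For each level, find the last node whose key is strictly less than `key`.
        update = [self.head] * self.max_level
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def search(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            return False  # already present; this sketch keeps keys unique
        lvl = self._random_level()
        # Levels above the old height keep the head as predecessor (update's default).
        self.level = max(self.level, lvl)
        node = _Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):
            update[i].forward[i] = target.forward[i]
        # Drop levels that no longer contain any nodes.
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1
        return True

    def __iter__(self):
        # Level 0 is a plain sorted linked list of every key.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


if __name__ == "__main__":
    sl = SkipList(seed=42)
    for k in [5, 1, 9, 3]:
        sl.insert(k)
    print(list(sl))                    # [1, 3, 5, 9]
    sl.delete(5)
    print(sl.search(5), sl.search(9))  # False True
```

The `seed` parameter only exists to make tower heights reproducible while experimenting; the ordering of keys never depends on it.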

It has gnawed on my subconscious for the past 5 years. Even as I wrote Relevant Search, it was there at the back of my mind, weighing me down: the fundamental problem of search. But only now has the problem taken enough shape that I can even begin to describe it. Succinctly, here it is:

Users don't know what the heck they want and couldn't tell you even if they did.

Are You Here Today?

A notification popped up on my Slack messenger at work; it was from Mary, our office administrator.

Are you here today?

Now Mary has an interesting side career: she's a yoga instructor. But she is not a yoga instructor of the common, everyday, throw-your-leg-over-your-head variety. Rather, she prefers to teach the more ancient and traditional notions of yoga - notions that include the physical practice but also relate to meditation and to philosophical detachment from the more selfish aspects of the ego. Given that, and the fact that her yoga practice had come up in conversation recently, I decided to poke at the question "Are you here today?":

Now that's a rather deep question, don't you think?

Moments later Mary arrived at my desk with a copy of the Bhagavad Gita. Ha! I'd poked fun at her, but she was actually planning to lend me a book that delves into why this really is a deep question.

Meg's First Camping Trip

This past weekend I took Baby Meg (3.5 years old) on her first camping trip. And boy, was it memorable. For starters, check out our digs:


It's not a tent, and it looks an awful lot like an upscale deer blind. But no... as far as Meg is concerned, this is her "Castle in the Sky". I showed it to her in the afternoon, and she was excited to learn that we would be sleeping in the Castle in the Sky that night.

Haystack Highlights

On April 10th and 11th, OpenSource Connections held their first (annual, I hope) Haystack search relevance conference. It was intended to be a small, casual, 50-person conference but ended up pulling in roughly 120 people, requiring OSC to scramble to find more space. The end result was one of the best conferences I've ever attended. In general, conference speakers have to aim their content at the lowest common denominator so that they don't lose their audience. At this conference, the lowest common denominator was really high! So there was no need to over-explain the boring introductory topics. Instead, the speakers were able to jump into interesting and deep content immediately.

Needless to say, I came away with a ton of good information that I'm going to put to work at Eventbrite as soon as possible.

Better Click Tracking for Identifying Statistically High Performers - Part I

Click tracking is a way of boosting documents based upon the historical clickthrough rate they received when surfaced in search results. Here's how it works: let's say that we're building click tracking for an online store and we want to boost the documents that are getting the most attention. First, you set up logging so that you can count how many times a particular item is clicked. Next, you have a process that aggregates the clicks across, say, a week, and you store the value in a click_count field alongside the documents that you are serving from search. Finally, when someone performs a search, you boost the results according to the click_count so that items with high clickthrough rates start surfacing higher in search results. But if you think hard, there's a pretty nasty problem with this approach.
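Here's a minimal Python sketch of that naive pipeline, assuming a click log of `(timestamp, doc_id)` pairs and search results as `(doc_id, relevance_score)` pairs. The function names, the log format, and the boost formula are all hypothetical. In a real deployment the boost would usually be applied inside the search engine at query time (Elasticsearch's `function_score` query is one option) rather than by re-sorting in application code.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def weekly_click_counts(click_log, now):
    """Aggregate raw click events into a per-document click_count.

    click_log: iterable of (timestamp, doc_id) pairs (a hypothetical log format).
    Only clicks in the 7 days up to `now` are counted.
    """
    cutoff = now - timedelta(days=7)
    return Counter(doc_id for ts, doc_id in click_log if cutoff <= ts <= now)


def boost_by_clicks(results, click_count, weight=1.0):
    """Re-rank (doc_id, relevance_score) pairs using the stored click_count.

    The multiplicative log1p boost is an illustrative choice; it dampens
    runaway counts but does nothing about the feedback problem described below.
    """
    def boosted_score(item):
        doc_id, score = item
        return score * (1.0 + weight * math.log1p(click_count.get(doc_id, 0)))

    return sorted(results, key=boosted_score, reverse=True)


if __name__ == "__main__":
    now = datetime(2018, 5, 1)
    log = [
        (now - timedelta(days=1), "shoes"),
        (now - timedelta(days=2), "shoes"),
        (now - timedelta(days=10), "hats"),  # older than a week, so ignored
    ]
    counts = weekly_click_counts(log, now)
    print(counts)  # Counter({'shoes': 2})
    print(boost_by_clicks([("hats", 1.0), ("shoes", 0.8)], counts))
    # [('shoes', 0.8), ('hats', 1.0)]
```

Two recent clicks are enough to push "shoes" past a more relevant "hats" result, which is exactly the kind of lever that can run away with itself.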

(Can you figure it out?)

The problem is feedback. In the context of search results, the first page, and really the first few results, get all the love. Very few users are desperate enough to click through to the second page of results. So click tracking causes a nasty positive feedback dynamic to arise: users are shown a page of results, they click only on those results, and thus those first-page items get an additional boost. This makes it even more likely for these items to show up on the first page of results for other related searches, which exacerbates the problem, and so on. One way of addressing this problem is to track the typical clickthrough rate and then boost a document only according to how much it exceeds that typical clickthrough rate.
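One way to sketch that idea, assuming the "typical clickthrough rate" is measured per rank position, is to compare each document's observed clicks with the clicks it would be expected to get just from where it was shown. This ratio is often called clicks over expected clicks (COEC). It is my illustrative choice here, not necessarily the approach this series ends up using; the impression log of `(doc_id, position, clicked)` tuples and the function names are hypothetical.

```python
from collections import defaultdict


def position_ctr(impressions):
    """Typical clickthrough rate per rank position, pooled across all documents.

    impressions: iterable of (doc_id, position, clicked) tuples (hypothetical format).
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _doc, pos, was_clicked in impressions:
        shown[pos] += 1
        clicked[pos] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in shown}


def clicks_over_expected(impressions):
    """Per-document ratio of observed clicks to clicks expected from position alone.

    A value above 1.0 means the document beat the typical CTR for the slots
    it was shown in; below 1.0 means it underperformed them.
    """
    impressions = list(impressions)
    ctr = position_ctr(impressions)
    observed = defaultdict(int)
    expected = defaultdict(float)
    for doc, pos, was_clicked in impressions:
        observed[doc] += int(was_clicked)
        expected[doc] += ctr[pos]
    # expected == 0 only when every slot the doc occupied got zero clicks overall.
    return {doc: (observed[doc] / expected[doc] if expected[doc] > 0 else 0.0)
            for doc in expected}


if __name__ == "__main__":
    log = [
        ("a", 1, True), ("a", 1, True), ("a", 1, False), ("a", 1, False),
        ("b", 5, True), ("b", 5, False),
        ("c", 5, False), ("c", 5, False),
    ]
    print(position_ctr(log))          # {1: 0.5, 5: 0.25}
    print(clicks_over_expected(log))  # {'a': 1.0, 'b': 2.0, 'c': 0.0}
```

Here "a" collects the most raw clicks but only performs as well as its top slot predicts, while "b" earns twice its expected clicks from position 5. With so little data these ratios are noisy, so a production version would want some smoothing (for example, adding pseudo-counts) before using them as boosts.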

This is the first in a series of blog posts where we will examine how a more sophisticated version of click tracking can be implemented, along with some of the neat off-shoots of this work, such as turning click logs into judgement lists. But first, we start with a very simple example:

Will Acuff on Building Relationships and Improving Communities

Today I had a Penny Chat with Will Acuff discussing how organizations can form relationships with communities. Will should know: he and his wife Tiffany founded Corner to Corner, a group that has made huge inroads in helping underprivileged communities in Nashville. I want to learn about this because my church (New Garden Church) is making a concerted effort right now to better connect with our community. In some ways we are positioned perfectly to do this: our church services are held in Dupont Tyler Middle School. However, we have yet to form meaningful relationships with the people in our community outside our congregation. So we're looking for help!