Better Click Tracking for Identifying Statistically High Performers - Part I

Click tracking is a way of boosting documents based upon the historical clickthrough rate that they received when surfaced in search results. Here’s how it works: Let’s say that we’re building click tracking for an online store and we want to boost the documents that are getting the most attention. First you set up logging so that you can count how times a particular item is clicked. Next you have a process that aggregates the clicks across, say, a week, and you store the value in a click_count field along side the documents that you are serving from search. Finally, when someone performs a search you boost the results according to the click_count so that items with high clickthrough rates start surfacing higher in search results. But if you think hard, there’s a pretty nasty problem with this approach.

Read More

Tokenizing Embedding Spaces for Faster, More Relevant Search

Embedding spaces are quite trendy right now machine learning. With word2vec for example, you can create an embedding for words that is capable of capturing analogy. Given the input “man is to king as woman is to what?”, a word2vec embedding can be used to correctly answer “queen”. (Remarkable isn’t it?) Embeddings like this can be used for a wide variety of different domains. For example, facial photos can be projected into an embedding space and for tasks of facial recognition. However I wonder if embeddings fall short in a domain that I am very near to - search. Consider the facial recognition task: Each face photo is converted into an N-dimensional vector where N is often rather high (hundreds of values). Given a sample photograph of a face, if you want to find all of the photos of that person then you have to search for all the photo vectors near to the sample photo’s vector. But, due to the curse of dimensionality, very high dimensional embedding spaces are not amenable to data structure commonly used for spatial search, such as k-d trees.

Read More

Neuroscience Penny Chat with David Simon

As many of my friends know, I’ve picked up neuroscience as a sort of side hobby. (Some people collect stamps, I memorize anatomical structures of the brain.) Last time I blogged about this was regarding my Penny Chat with Stephen Bailey on his work with MRIs. But this week I sat down with one of Stephen’s friends David Simon to talk about his research involving Electroencephalography a.k.a. EEG.

Read More

Neuroscience Penny Chat with Stephen Bailey

Last week I took part in a Medical Imaging study at Vanderbilt in Stephen Bailey’s laboratory and lookey at this:

Read More

Find Someone to Steal Your Idea - I Dare You!

A week ago I met with an aspiring entrepreneur who had some interesting ideas regarding a recruitement startup. But during the conversation I got the feeling that he was holding his cards close and I was having a little trouble getting the whole picture. Towards the end of the conversation he confided that he was really vested in his ideas for the startup and that it actually hurt to hear those ideas criticized.

Read More