KodeKabuki

Welcome, my name is Harish Mallipeddi. I work for Amazon Web Services (AWS). This blog is mostly a dump of interesting articles that I come across on the web. Topics span across multiple areas including algorithms/datastructures, NoSQL stores, database internals, web-scale challenges, and functional languages.

February 22, 2011 at 2:32pm

Home

http://www.toao.com/posts/finding-similar-items-key-store-minhashing.html →

Article introduces what minhashing is and proves that the probability of 2 sets being similar is actually equal to the probability of their minhashes matching. So you can actually calculate the minhashes of sets and use that to determine if the sets are similar/dissimilar without having to compare each and every element.