Simhash diagram showing how blocks of F and G are permuted so that differing bits are all at the low end

Simhash and solving the Hamming distance problem: explained

Simhashes are a clever means of rapidly finding near-identical documents (or other items) within a large corpus, without having to individually compare every document to every other document. Using simhashes for any sizable corpus involves two parts: generating the simhash itself, and solving the Hamming distance problem. Neither is much use without the other. Unlike minhashes, …
