Simhash and solving the hamming distance problem: explained
Simhashes are a clever means of rapidly finding near-identical documents (or other items) within a large corpus, without having to individually compare every document to every other document. Using simhashes for any sizable corpus involves two parts: generating the simhash itself, and solving the Hamming distance problem. Neither is much use without the other. Unlike minhashes, …
Simhash and solving the hamming distance problem: explained Read More »