Programming

Simhash diagram showing how blocks of F and G are permuted so that differing bits are all at the low end

Simhash and solving the hamming distance problem: explained

Simhashes are a clever means of rapidly finding near-identical documents (or other items) within a large corpus, without having to individually compare every document to every other document. Using simhashes for any sizable corpus involves two parts: generating the simhash itself, and solving the Hamming distance problem. Neither is much use without the other. Unlike minhashes, …

Simhash and solving the hamming distance problem: explained Read More »

Addition-only Bezier cubic splines

It’s not so often you need to code your own bezier curves nowadays. If you do, here’s a fast algorithm which uses fewer arithmetic operations — and additions only. I felt very clever when I discovered this a dozen years ago, but I wasn’t the first.