/* MurmurHash2, by Austin Appleby * Note - This code makes a few assumptions about how your machine behaves - * 1. We can read a 4-byte value from any address without crashing * 2. sizeof(int) == 4 * * And it has a few limitations - * * 1. It will not work incrementally. liyazzi: 也就是key需要一次性完整知道,不适合动态数据流,因为每一次的混合计算都依赖于前面的数据,然后最后的混合依赖于整个数据 * 2. It will not produce the same results on little-endian and big-endian 影响跨平台性 * machines. */ unsigned int dictGenHashFunction(const void *key, int len) { /* 'm' and 'r' are mixing constants generated offline. They're not really 'magic', they just happen to work well. */ uint32_t seed = dict_hash_function_seed; const uint32_t m = 0x5bd1e995; const int r = 24;
/* Initialize the hash to a 'random' value */ uint32_t h = seed ^ len;
/* Mix 4 bytes at a time into the hash */ const unsigned char *data = (const unsigned char *)key;
while(len >= 4) { uint32_t k = *(uint32_t*)data;
k *= m; k ^= k >> r; k *= m;
h *= m; h ^= k;
data += 4; len -= 4; }
/* Handle the last few bytes of the input array */ switch(len) { case 3: h ^= data[2] << 16; case 2: h ^= data[1] << 8; case 1: h ^= data[0]; h *= m; };
/* Do a few final mixes of the hash to ensure the last few * bytes are well-incorporated. */ h ^= h >> 13; h *= m; h ^= h >> 15;
return (unsigned int)h; }
哈希函数中大多混淆操作选用 ^ 的原因:
异或操作能混合数据的各个位的特征,异或操作拥有自反性
消除偏置:避免简单的加法与乘法带来的累计偏置
性能高效:CPU级别运算,位运算通常速度较快
Redis中的使用
在 redis 中,MurmurHash2 被用作默认哈希函数实现,主要用于:
计算哈希表的 key 索引
计算一致性哈希中虚拟节点的位置
选择 MurmurHash2 的原因:
快速:适合 redis 高性能的需求
均匀:减少哈希冲突
简单:实现简单
缺点
MurmurHash2 在某些情况下可能会出现哈希碰撞率略高的问题。
Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.[51] With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.