Hash Functions (continued) - Fowler/Noll/Vo Hash
해쉬 함수를 찾으려 구글링하다가 FNV라는 해쉬를 만들었다
저장소가 2의 승수가 아닐경우 이 함수를 만든 사람이 FNV를 추천한다고함 FNV 는 저자가 만든것임;
아래 그래프는 결과값으로 32비트를 구할 경우에 대한 균일 분포를 나타낸 그래프라고 함
분포는 대략 괜찮은 분포인듯.. 문자열이 비슷하고 하위 부분만(완전 하위는 아니였지만) 좀 바꿔서 해보니 Figure1 코드에선
키값은 그닦 잘 뽑지는 못한다(Figure4코드는?)
Figure1 소스를 32bit라 가정하고 c++로 코드를 바꿔 시간을 측정해보니 대략 100000 번 정도 돌리니
for(int i=0;i<100;++i){
FNV("Fieldstone_DM.tga");
}
경과시간 : 22.018000 (ms) , 220180 (us)
측정 횟수가 늘 수 록 이 이상에선 여기서대략 해보니 100000 이 횟수에 10을 곱한 횟수가 경과시간에서 수치상 10을 곱한시간과
대략 비슷하닥 보면 됨
그리고 100번 돌렸을때는
경과시간 : 0.043000 (ms) , 430 (us)
정도의 시간을 보임
Figure4 에 나오는 코드로 테스트해보니
10000000 번 수행에서 다음처럼
name : fnvs , 경과시간 : 2665.978000 (ms) , 26659780 (us)
name : fnvs1 , 경과시간 : 2126.676000 (ms) , 21266760 (us)
속도가 Figrue1보다도 빠르고 키 값또한 잘 뽑는다
아래는 사이트..
http://home.comcast.net/~bretm/hash/6.html
Hash Functions (continued)
Fowler/Noll/Vo Hash
Hash Functions (continued)
Fowler/Noll/Vo Hash
This hash function is described here. It is a simple multiplicative hash with the addition of a post-processing step called xor-folding to remove some linearity in the lower bits. The FNV authors recommend using xor-folding if the desired hash size is not a power of 2, but they actually mean to use xor-folding when the hash size is not a power of 2 or is less than 32.
There are different initialization and multiplication constants for use with 32-bit, 64-bit, 128-bit hashes, etc., but we'll only examine the 32-bit version of FNV here. In our uniform distribution test we'll use xor-folding for the lower bits since that is the intended use of this hash, but we'll examine the upper bits as-is. Since our avalanche test is a full 32-bit test, we can't use xor-folding there. This will allow us to identify classes of keys for which using xor-folding is a necessity.
The primary appeals of this hash are its simplicity and its speed, but its speed is dependent on whether or not the CPU architecture supports fast integer multiplication. The Jenkins shift-add-xor hashes are faster on CPU architectures without fast multiplication.
Figure 1 shows the listing for the FNV 32-bit hash.
Figure 1
Fowler/Noll/Vo (FNV) 32-bit hash function.
public class FNV : HashFunction
{
int shift;
uint mask;
// hash without xor-folding
public FNV()
{
shift = 0;
mask = 0xFFFFFFFF;
}
// hash with xor-folding
public FNV(int bits)
{
shift = 32 - bits;
mask = (1U << shift) - 1U;
}
public override uint ComputeHash(byte[] data)
{
uint hash = 2166136261;
foreach (byte b in data)
hash = (hash * 16777619) ^ b;
if (shift == 0)
return hash;
return (hash ^ (hash >> shift)) & mask;
}
}
Uniform Distribution Test
The details of this test are described fully on aprevious page. We examine the distribution of numbers derived from lower and upper bits of the hash output in sizes of 1 through 16 bits.
Figure 2 shows the results of this test for the FNV hash. This test indicates that the FNV 32-bit hash with xor-folding produces uniformly distributed values for hash tables that are a power of two, up to at least 214, when the key octets are uniformly distributed, distributed similar to alphabetic text, or sparsely distributed.
Figure 2
χ2 test results for FNV 32-bit hash, with xor-folding for the lower-bit tests.
Uniform keys Text keys Sparse keys
Bits Lower Upper Lower Upper Lower Upper
1 0.777 0.888 0.888 0.888 0.480 0.480
2 0.967 0.326 0.407 0.197 0.513 0.720
3 0.109 0.390 0.498 0.103 0.573 0.016
4 0.548 0.416 0.649 0.210 0.143 0.469
5 0.360 0.606 0.931 0.992 0.665 0.201
6 0.328 0.753 0.584 0.416 0.882 0.361
7 0.436 0.560 0.995 0.302 0.981 0.124
8 0.297 0.729 0.856 0.730 0.472 0.113
9 0.222 0.349 0.629 0.951 0.701 0.769
10 0.731 0.208 0.066 0.646 0.875 0.551
11 0.813 0.356 0.678 0.820 0.519 0.556
12 0.076 0.229 0.521 0.068 0.091 0.474
13 0.780 0.565 0.719 0.090 0.117 0.132
14 0.225 0.813 0.269 0.251 0.855 0.568
15 0.583 0.005 0.699 0.370 0.571 0.218
16 0.004 0.000 0.117 0.012 0.024 0.211
Figure 3
Avalanche behavior of the FNV 32-bit hash.
For two-octet keys:
For four-octet keys:
For 256-octet keys:
The upper bits are not uniformly distributed if you use more than 14 or 15 bits. Because of this, I don't recommend using this hash with hash tables larger than 214 buckets. The results are acceptible up to 216 bits for text keys, so you may be able to use it for that purpose if you carefully test the performance in your particular application.
Avalanche Test
The details of this test are described fully on aprevious page. We examine the diffusion of input bits into different bit positions in the output.
Figure 3 shows the results of this test for the FNV hash. This test indicates that the FNV hash has poor avalanche behavior, as do all simple multiplicative hashes. This means that there will be some classes of keys for which the hash function does not produce uniform output, even for small bucket counts where the χ2 tests succeeded above.
Of particular concern are the two low-order bits. These are always just a simple linear function of the two low-order bits of the keys octets. For example, in the full 32-bit hash value, the low-order bit will always just be a simple XOR of the LSBs of the key octets and the LSB from the intialization constant. Also of concern is that fact that the upper bits of the key octets do not have any influence on the low-order bits of the hash output (without xor-folding). The MSB of each key octet does not diffuse at all into the entire lower 8 bits of the hash value. This indicates that you must follow the xor-folding recommendations for all classes of keys.
Also, the mixing step of this hash is never applied to the last octet of the key. This shows clearly in the avalanche results. The authors offer an alternative form of the hash where the combining step is done before the mixing step, and I recommend that you adopt this alternative if you use FNV. There is no reason to do otherwise, and I'm surprised that the authors do not recommend this by default.
Figure 4
Modified FNV with good avalanche behavior and uniform distribution with larger hash sizes.
public class ModifiedFNV : HashFunction
{
public override uint ComputeHash(byte[] data)
{
const uint p = 16777619;
uint hash = 2166136261;
foreach (byte b in data)
hash = (hash ^ b) * p;
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
return hash;
}
}
Conclusion
I don't recommend using 32-bit FNV as a general-purpose hash. It can produce uniform output, but its suitability needs to be tested for each class of keys for which you intend to use it. It is likely to produce non-uniform output if you have more than 214 or 215 hash buckets. If the CPU architecture does not have fast integer multiplication, use a shift-add-xor hash instead.
If you want to use an FNV-style hash function, I recommend using the modified version listed in Figure 4. This version passes all the uniform distribution tests above and it achieves avalanche for every tested combination of input and output bits (green squares everywhere). No xor-folding step is required.
The only difference between this version and the original version is that the mixing steps occursafter the combining step in each round, and it adds a post-processing step to mix the bits even further. These two changes completely correct the avalanche behavior of the function. As a result, this version of FNV passes all of the χ2 tests above, all the way up to 216 buckets. I haven't tested larger sizes but I suspect it would be OK there as well.
First page, Next page
Home | E-mail: BretMulvey at Hotmail
Copyright © 2007 Bret Mulvey. All Rights Reserved.