The String hash function
In the String class, for example, the hash code h of a string s of length n is calculated as
hash=a[0]*31^n-1+a[1]*31^n-2+........a[n-1]*31^0
or, in code,
int h = 0;
for (int i = 0; i < n; i++)
{
h = 31*h + s.charAt(i);
}
For example the hash code of hello uses the Unicode values of its characters
h e l l o
h e l l o
104 101 108 108 111(ASCII Values) to give the value 99162332=104*31^4+101*31^3+108*31^2+108*31^1+111
benefits of Doing This is that we won't get same hash values for anagram string e.g. hello & olhel, helol etc.
In general the arithmetic operations in such expressions will use 32-bit modular arithmetic ignoring overflow. For example Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
where Integer.MAX_VALUE =2147483647 Integer.MIN_VALUE =-2147483648 Note that, because of wraparound associated with modular arithmetic, the hash code could be negative, or even zero. It happened to be positive in this case because hello is a fairly short string. Thus, for example, the hash code for helloa, which is 31 times the hash code for hello plus 97, would not be , which is outside the range of 32-bit signed integers, but 3074032079-2^32=-1220935217 (which is in the range)
so After All The Basic & Efficient has Function For String Looks Like
int hash(String key)
{ int h = 0;
for (int i = 0; i < key.length; i++)
{
h = 31*h + s.charAt(i); }
if(h>INT_MAX)
return h-INT_MAX // or h % table_size
else if(h<INT_MIN)
return h+INT_MAX;
return h; //else return h
}
Another Hash Function Cloud be if consider string only combination of alpha-bates so base 26
if we simply use the sum or crc type hash function we will get the same hash value thus collision.
see hash function
int hash (String s)
{
int k = 0, m = 26;
for (int i=0; i < s.length(); i++)
k += (int)s.charAt (i);
return ( k%m );
}
it will return same hash value for all anagram string as explained above .
int hash(String s)
{
int k = 0, m = 26;
for (int i=0; i < s.length(); i++)
k += (int)s.charAt (i) + i*(int)s.charAt (i);
return ( k%m );
}
benefits of Doing This is that we won't get same hash values for anagram string e.g. hello &
olhel, helol etc.
Some other information on String Hash Function .
http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier
http://www.jchq.net/certkey/0902certkey.htm
Feel Free to Comment.
In the String class, for example, the hash code h of a string s of length n is calculated as
hash=a[0]*31^n-1+a[1]*31^n-2+........a[n-1]*31^0
or, in code,
int h = 0;
for (int i = 0; i < n; i++)
{
h = 31*h + s.charAt(i);
}
For example the hash code of hello uses the Unicode values of its characters
h e l l o
h e l l o
104 101 108 108 111(ASCII Values) to give the value 99162332=104*31^4+101*31^3+108*31^2+108*31^1+111
benefits of Doing This is that we won't get same hash values for anagram string e.g. hello & olhel, helol etc.
In general the arithmetic operations in such expressions will use 32-bit modular arithmetic ignoring overflow. For example Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
where Integer.MAX_VALUE =2147483647 Integer.MIN_VALUE =-2147483648 Note that, because of wraparound associated with modular arithmetic, the hash code could be negative, or even zero. It happened to be positive in this case because hello is a fairly short string. Thus, for example, the hash code for helloa, which is 31 times the hash code for hello plus 97, would not be , which is outside the range of 32-bit signed integers, but 3074032079-2^32=-1220935217 (which is in the range)
so After All The Basic & Efficient has Function For String Looks Like
int hash(String key)
{ int h = 0;
for (int i = 0; i < key.length; i++)
{
h = 31*h + s.charAt(i); }
if(h>INT_MAX)
return h-INT_MAX // or h % table_size
else if(h<INT_MIN)
return h; //else return h
}
Another Hash Function Cloud be if consider string only combination of alpha-bates so base 26
if we simply use the sum or crc type hash function we will get the same hash value thus collision.
see hash function
int hash (String s)
{
int k = 0, m = 26;
for (int i=0; i < s.length(); i++)
k += (int)s.charAt (i);
return ( k%m );
}
it will return same hash value for all anagram string as explained above .
int hash(String s)
{
int k = 0, m = 26;
for (int i=0; i < s.length(); i++)
k += (int)s.charAt (i) + i*(int)s.charAt (i);
return ( k%m );
}
benefits of Doing This is that we won't get same hash values for anagram string e.g. hello &
olhel, helol etc.
No comments :
Post a Comment