Hashing tutorial?
Does anybody know where to get a good tutorial on how to speed up the LZ77 algorithm using hashing?
Thanks in advance!
February 06, 2000 06:26 PM
I too am curious about this. More generally I''m trying to find out what exactly hashing is/accomplishes.
I gather that it helps to speed up searching or sorting. I''ve looked fairly thoroughly and I can''t really find a good definition + example of hashing. That''s not completely true, I have found short definitions on some computer science sites, but that def''n just made me look up tons of other definitions so in the end I got nowhere.
Hashing--whats the deal???
Thanks.
I gather that it helps to speed up searching or sorting. I''ve looked fairly thoroughly and I can''t really find a good definition + example of hashing. That''s not completely true, I have found short definitions on some computer science sites, but that def''n just made me look up tons of other definitions so in the end I got nowhere.
Hashing--whats the deal???
Thanks.
Hashing is a sorting method and it is also a searching method. Lets say you have a list of 100 names. You need to locate those names rather simply in an array of 100 names. Hashing must use a bounded data structure because of the nature of the hasing function. In order to store these in an efficient manner one runs the name string through an algorithm to generate a numerical representation of the key. This is called the hashing function and can be any math function you can think of. Lets use a function that adds the value of each letter in the name and divides by the number 10 and truncates the result to an integer. The number of the letter is its position in the alphabet starting with A and ending with Z.
For this example we will use three names
D E R E K
4 + 5 + 18+ 5 + 11 = 43/10 = 4.3 = 4
A M Y
1 + 13+ 25 = 39/10 = 3.9 = 3.9 = 3
A I M Y
1 + 9 + 13+ 25 = 48/10 = 4.8 = 4
What the algorithm does is it gives us an offset or an array position where the key is located in the hash array. We would therefore place Derek into cell 4 and Amy into cell 3. When we needed to find Derek we would run the algorithm again and return the cell reference. Simple, efficient and should result in (N) operations for any one lookup plus the cost of the calculation.
With hashing tables such as this one that have an inefficient hashing algorithm you can begin to see a problem. Derek and Aimy hash into the same array cell address. We call this a collision. When a collision occurs most algorithms rely on an insertion sort to find the next free cell to put the value. The insertion sort ends when the key has been placed into the next available sorted cell. We would therefore place Aimy into cell 5. When you search for this our wonderful algorithm becomes (N) operations + the calculation + the cost of a sequential search on keys that collide. Clearly you can see that efficiency of this sort search method degrades faster with each successive insertion after a collision. In the example above cells 3, 4 and 5 would be filled making all hashing into cell 5 also an insertion sort/sequential search due to collision.
The key to making a hashing algorithm work is for it to be complex enough to return all unique values but be computationally more efficient than other sorting algorithms. You will see hashing algorithms used extensively in controlled sets of data such as game command tables. In a game command table you can use preprocessor defines to define a command number and use that number as a hash to an array of function pointers in C or an array of command objects in C++. This assures uniquenes and efficiency of both the table(no collisions) and the hashing function.(there isn't one) You can make the example above slightly more collision proof by making each array cell a linked list or binary tree with respect to collisions. This will ensure that one collision will not cause more collisions as only values that hash to cell 4 will be located in cell 4(or some part of the linked list/binary tree below it) Be careful though as this might end up giving you nothing but complexity in the end and inefficiency in the total cost of the search/sort method.
Hope that helps explain hashing. Any good Algorithms book will be able to explain this further.
Kressilac
ps as for the original question I am not sure. I would have to look it up, but I suspect it is well documented on the internet and in books.
Edited by - kressilac on 2/7/00 12:59:17 AM
For this example we will use three names
D E R E K
4 + 5 + 18+ 5 + 11 = 43/10 = 4.3 = 4
A M Y
1 + 13+ 25 = 39/10 = 3.9 = 3.9 = 3
A I M Y
1 + 9 + 13+ 25 = 48/10 = 4.8 = 4
What the algorithm does is it gives us an offset or an array position where the key is located in the hash array. We would therefore place Derek into cell 4 and Amy into cell 3. When we needed to find Derek we would run the algorithm again and return the cell reference. Simple, efficient and should result in (N) operations for any one lookup plus the cost of the calculation.
With hashing tables such as this one that have an inefficient hashing algorithm you can begin to see a problem. Derek and Aimy hash into the same array cell address. We call this a collision. When a collision occurs most algorithms rely on an insertion sort to find the next free cell to put the value. The insertion sort ends when the key has been placed into the next available sorted cell. We would therefore place Aimy into cell 5. When you search for this our wonderful algorithm becomes (N) operations + the calculation + the cost of a sequential search on keys that collide. Clearly you can see that efficiency of this sort search method degrades faster with each successive insertion after a collision. In the example above cells 3, 4 and 5 would be filled making all hashing into cell 5 also an insertion sort/sequential search due to collision.
The key to making a hashing algorithm work is for it to be complex enough to return all unique values but be computationally more efficient than other sorting algorithms. You will see hashing algorithms used extensively in controlled sets of data such as game command tables. In a game command table you can use preprocessor defines to define a command number and use that number as a hash to an array of function pointers in C or an array of command objects in C++. This assures uniquenes and efficiency of both the table(no collisions) and the hashing function.(there isn't one) You can make the example above slightly more collision proof by making each array cell a linked list or binary tree with respect to collisions. This will ensure that one collision will not cause more collisions as only values that hash to cell 4 will be located in cell 4(or some part of the linked list/binary tree below it) Be careful though as this might end up giving you nothing but complexity in the end and inefficiency in the total cost of the search/sort method.
Hope that helps explain hashing. Any good Algorithms book will be able to explain this further.
Kressilac
ps as for the original question I am not sure. I would have to look it up, but I suspect it is well documented on the internet and in books.
Edited by - kressilac on 2/7/00 12:59:17 AM
Derek Licciardi (Kressilac)Elysian Productions Inc.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement