Thursday, December 03, 2009

Hash Table Sort

For a given input list of size n, make a hash table of size n*2.

Run through the list once to determine its maximum and minimum values.

Create a hash function that puts an item in a slot depending on where it lies in between the minimum and maximum values. For example, if there are 50 slots, the minimum value is 10, and the maximum value is 20, then a value of 12 would go into the 10th slot. (A comparable formula *can* be used for strings.)

After placing all the items, go through the hash table and sort each individual linked list, using the sorting method most efficient for its size (probably always an insertion sort where the input list is randomly distributed). Concatenate all the sorted lists end-to-end.

This algorithm is less particularly advantageous the less unevenly distributed the input list, but we can always mitigate that problem by using an n*n hash table or just a larger hash table to any degree. Even an n*(n/x) hash table where each secondary table is a table of linked lists.

This algorithm could also use a histogram to determine individual slot ranges and secondary table sizes.

Where the range of values is smaller than the length of the list, you don't actually need a hash table of size n*2. It can be of size range*2.

I think this whole idea is just reinventing the bucket sort. Oh well.

No comments: