Wednesday, December 17, 2008

Three sorting algorithm ideas

Binary Sort

A sorting algorithm that sorts in constant amortized time, but which is proportional to the average length, in binary, of the elements being sorted

Let's say we have a function that returns a list of {0, 1} values which is a given element's binary representation. Now we make an array of the size of our input list, which we will use as a binary tree. Each node will be a struct instance. For each element in our list to be sorted, we navigate this tree using the element's binary sequence as the path, creating new branches as necessary. When we get to the end, we insert a reference to the original object into the beginning of a linked list of references to objects belonging at that node (i.e., equivalent elements). Then at the end of our sorting we simply walk the tree and traverse its linked lists to return our sorted list. If we want equivalent objects to appear in the same order in which they appeared in the original list, we can append them to the end of the linked lists instead and have the node store a pointer to the current last linked list item.

This is only the naive version, though.. we don't really have to make the full path for every element. Instead of having a function that returns an object's binary representation, we'll have a function that returns an iterator for it. Then instead of navigating/creating nodes to the end of the path, we navigate until we find a node who simply stores a pointer to an iterator for an element that's somewhere underneath it. Now, since a node can only contain one such iterator pointer, when you come to such a node you must poll values from *both* iterators (your current element's and the node's) and create a path until the values diverge, at which point you create two children and put your respective iterator and element pointers in those. Of course you'd nullify the pointers in the original node. You have a 50/50 chance of this happening on the first poll, a 3/4 chance of it happening by the second poll, etc. When an iterator exhausts itself you delete the iterator object, nullify the iterator pointer and start the linked list of equivalent elements for that node.

Alternatively to the linked lists we could simply have a pointer to the first instance of an equivalent member and a count value that specifies how many times it repeats, but that only works with members that don't have any 'hidden variables' with respect to their binary representations. Plain old strings or numbers would work just fine that way. Sorting person objects by last name, wouldn't.

This algorithm isn't generally suitable for mixed-type collections or any type that uses a special comparison function. Strings, ints, floating points and fixed decimals should work fine. The above is optimized for strings. For a numerical type, all iterators are exhausted at the same branching level (8, 16, 32, or 64), although that doesn't change the algorithm a whole lot -- maybe provides for some minor optimizations. But because comparing two numbers might be less expensive than grabbing a value from a binary iterator, we could forgo the binary representations altogether for numbers. Instead of a node storing a pointer to an iterator and an element, it would simply store a number. That could be a huge improvement. And even if we still used binary representations, most numeric values probably start with lots of 0's, so we could speed up the process by performing a BSF assembly command on it to determine the number of leading zeros, then jump directly to a node, skipping the first portion of the path constituting zeros, based on a table of pointers for numbers of leading zeros up to 8, 16, 32 or 64.

Binary Search Library Sort

Use the Library Sort method ( http://en.wikipedia.org/wiki/Library_sort ), but instead of just iterating through the list to find where an item belongs, do a binary search. Since we're starting with an empty gapped list, our elements in it will always be sorted. A new element will be placed in the first empty binary search location, so our search will never have to worry about traversing gaps.

Binary Search Sort

This is actually not a sorted list; it's a sorted linked list. Take one element and put it in the list. That element becomes the first search point. The next element becomes the next search point either larger than or smaller than the previous element. And so and and so on (except that identical elements exist in the same search point). This method actually seems to be identical to the binary sort method above where we use comparison instead of a binary representation. But we can perhaps improve the algorithm by changing search points. For example, each node can store the average value of the two nodes it's 'between' (its parent and grandparent, and we can also use the max and min of the whole collection for the first two two depths if we want), and if a value comes along that is about to be placed undeneath it, but it's closer to the parent's average value than its parent is, then it swaps itself with its parent node. For strings our average might just have to be a character and its place in the string.

No comments: