Sorting algorithms III – 9 years later

And so I’m back! I haven’t been here for quite some time, and a lot has happened in the meantime. I’ve worked as a Customer Support Manager at Eezeebee (a small software company for fashion B2B’s and B2C’s) for quite some time, but recently I made a career change by becoming something I never thought I’d become (mostly because my father was one): a teacher! I’ve recently begun as an IT teacher at the Hanzehogeschool (University of Applied Sciences). It’s hard work but it’s fun and it’s rewarding. And it brought me back to that old passion o’mine: sorting algorithms 🙂 One of the courses I teach, together with the esteemed Bas Heijne, is Algorithms and Data Structures, wherein sorting is used as an example case for developing efficient algorithms. Reading back on what I wrote 9 years ago, it seems to me that the infamous H Sort is nothing more than Selection Sort (although the keeping-in-memory part could be a small improvement). But… the other day I thought of yet another sorting algorithm: Index Sort, or maybe Hash Sort. It has to do with storing items in an array using their value (or a hash of it) as the index. Theoretically, this could give you a Big O of N (N times O(1)). I’m still developing and testing it using good old Java, and the results seem promising, but I have no final results yet. So, again, stay tuned for more on Sorting!

Sorting algorithms II

As posted a few days ago, I experimented a bit with a new sorting algorithm called H Sort. I built a simple testing framework in Java to see how well it performs compared to traditional Q Sort.

First I steadily increased the number of runs. I noticed that the average time it takes to do a sorting run decreases with the total number of runs. At first this seemed strange because the runs are unconnected, a new array of random numbers generated each run. Then I realized that this effect is probably due to more processing power and cores being allocated to the process when the number of runs increases. The average sorting times of the two algorithms begin to settle at 1,000 runs and become really stable at 100,000 runs.

  • Times: 100 Orderedness: 0% Size: 100 Average time Q: 99,32 micros Average time H: 152,88 micros Factor H vs. Q: 1,54
  • Times: 1000 Orderedness: 0% Size: 100 Average time Q: 9,40 micros Average time H: 37,18 micros Factor H vs. Q: 3,95
  • Times: 10000 Orderedness: 0% Size: 100 Average time Q: 8,92 micros Average time H: 35,15 micros Factor H vs. Q: 3,94
  • Times: 50000 Orderedness: 0% Size: 100 Average time Q: 8,81 micros Average time H: 33,52 micros Factor H vs. Q: 3,81
  • Times: 100000 Orderedness: 0% Size: 100 Average time Q: 8,72 micros Average time H: 32,94 micros Factor H vs. Q: 3,78
  • Times: 500000 Orderedness: 0% Size: 100 Average time Q: 8,87 micros Average time H: 33,52 micros Factor H vs. Q: 3,78
  • Times: 1000000 Orderedness: 0% Size: 100 Average time Q: 8,77 micros Average time H: 33,15 micros Factor H vs. Q: 3,78
Having found the optimal number of runs, I then varied the ‘orderedness’ of the generated sort lists. Orderedness corresponds to the chance that the number at index i is equal to i. If not, it is a random number. So an array with 0% orderedness is a completely random array, whereas one with 100% orderedness goes like 1, 2, 3, …, 100. I read that Q Sort doesn’t perform well on already quite ordered lists. I can verify that, except that when comparing to H Sort that moment is reached only around 98% orderedness, which is an almost completely ordered array. So no great advantage of H Sort so far…
  • Times: 100000 Orderedness: 0% Size: 100 Average time Q: 8,88 micros Average time H: 33,60 micros Factor H vs. Q: 3,78
  • Times: 100000 Orderedness: 10% Size: 100 Average time Q: 8,70 micros Average time H: 33,43 micros Factor H vs. Q: 3,84
  • Times: 100000 Orderedness: 20% Size: 100 Average time Q: 8,50 micros Average time H: 33,31 micros Factor H vs. Q: 3,92
  • Times: 100000 Orderedness: 30% Size: 100 Average time Q: 8,28 micros Average time H: 33,07 micros Factor H vs. Q: 4,00
  • Times: 100000 Orderedness: 40% Size: 100 Average time Q: 8,00 micros Average time H: 32,45 micros Factor H vs. Q: 4,05
  • Times: 100000 Orderedness: 50% Size: 100 Average time Q: 7,76 micros Average time H: 31,75 micros Factor H vs. Q: 4,09
  • Times: 100000 Orderedness: 60% Size: 100 Average time Q: 7,52 micros Average time H: 31,05 micros Factor H vs. Q: 4,13
  • Times: 100000 Orderedness: 70% Size: 100 Average time Q: 7,29 micros Average time H: 30,12 micros Factor H vs. Q: 4,13
  • Times: 100000 Orderedness: 80% Size: 100 Average time Q: 7,06 micros Average time H: 28,45 micros Factor H vs. Q: 4,03
  • Times: 100000 Orderedness: 90% Size: 100 Average time Q: 7,07 micros Average time H: 23,88 micros Factor H vs. Q: 3,38
  • Times: 100000 Orderedness: 95% Size: 100 Average time Q: 7,42 micros Average time H: 16,93 micros Factor H vs. Q: 2,28
  • Times: 100000 Orderedness: 98% Size: 100 Average time Q: 8,15 micros Average time H: 8,85 micros Factor H vs. Q: 1,09
  • Times: 100000 Orderedness: 99% Size: 100 Average time Q: 8,55 micros Average time H: 5,05 micros Factor H vs. Q: 0,59
  • Times: 100000 Orderedness: 100% Size: 100 Average time Q: 9,08 micros Average time H: 0,50 micros Factor H vs. Q: 0,06
Maybe better luck for H Sort when varying the array size? Indeed, H sort performs almost as well as Q Sort on arrays of size… 10. When increasing the array size, Q Sort quickly takes the lead. And then we run into the limitations of my Core i7 2.8 GHz laptop: sorting arrays of 10,000 numbers for 100,000 times took so much time I eventually gave up and killed the process.
  • Times: 100000 Orderedness: 50% Size: 10 Average time Q: 0,48 micros Average time H: 0,51 micros Factor H vs. Q: 1,08
  • Times: 100000 Orderedness: 50% Size: 100 Average time Q: 7,74 micros Average time H: 31,75 micros Factor H vs. Q: 4,10
  • Times: 100000 Orderedness: 50% Size: 500 Average time Q: 58,33 micros Average time H: 359,31 micros Factor H vs. Q: 6,16
  • Times: 100000 Orderedness: 50% Size: 1000 Average time Q: 127,35 micros Average time H: 1323,50 micros Factor H vs. Q: 10,39

All in all we can say that H Sort is not a very good algorithm, which in no useful situation is able to outperform good old Q Sort. It was fun developing and testing it, but I’ll leave it at this. The project can be found on GitHub for those who want to try themselves and maybe improve the algorithm.

Sorting algorithms

Some time ago I suddenly got an idea for a new sorting algorithm. What if you walked through an unsorted list of numbers, doing so until the next number was no longer greater than its predecessor, moved that number to the front and started again? I soon realized that this was a very brute and therefore somewhat inefficient approach. Performance could drastically improve if you don’t move the smaller number to the fron but keep it in memory during your next walk-through and insert it into the right spot when you pass it.

I wanted to test this so I made a Java program to implement my new sorting algorithm, which I dubbed ‘H sort’. Of course I needed a benchmark to test against, and I recalled the famous Q sort algorithm from my freshman year. I dusted off my old Pascal book (never thought I’d ever do that again!) and used it to implement Q sort in Java. Strangely enough Q sort does not seem to be in any standard Java library.

As I’d thought, brute H sort performed tragically badly. Its speed also seems highly dependent on the degree of ‘unsortedness’ of the sort list. Sorting an array of 10 integers ranging from 1 through 100 takes approximately 45 microsecs for Q sort, while brute H sort needs 80 up to 600 micros! Improved H sort takes around 50 micros, so that’s indeed quite acceptable.

Further comparing Q and (improved) H sort:

  • Array size 50: 125 vs. 520 micros
  • Array size 100: 260 vs. 1800 micros
  • Array size 1000: 4,000 vs. 18,000 micros

OK, so it’s quite good but not nearly as good as the old Q sort! Of course I need to test this further, running tests automatically for many times and averaging the results. I also want to vary the degree of ‘sortedness’ of the arrays, using partly-sorted ones for testing. I hope to work on this in my spare time the coming days and I’ll keep you posted! I’ll also look into whether this ‘H sort’ algorithm of mine is indeed new or if someone already came up with this (which seems very likely).