:: String Sort on GPUs ::

Introduction ::

String sorting, or variable-length key sorting, has lagged in performance on the GPU even as fixed-length key sorting has improved dramatically. Radix sort is the fastest sorting method on GPUs. In this work, we present a fast and efficient string sort for the GPU that is built on the available radix sort. Our method sorts strings from left to right in steps, moving only indexes and small prefixes for efficiency. We reduce the number of sort steps by adaptively consuming the maximum number of string bytes that the number of segments in each step allows. Performance is further improved by using Thrust primitives for most steps and by removing singleton segments from consideration. Because over 70% of the string sort time is spent in Thrust primitives, the method delivers high performance and should adapt readily to future GPUs. We achieve a speedup of up to 10x over current GPU methods, especially on large datasets, and we scale to much larger input sizes. We present results on easy and difficult string datasets, classified by their after-sort tie lengths.

Code and Datasets ::

The code is hosted on GitHub [Click Here]

Related Publications ::

1. Aditya Deshpande and P J Narayanan: Can GPUs Sort Strings Efficiently? In: IEEE High Performance Computing (2013) [Best GPU Paper Award]

2. Aditya Deshpande and P J Narayanan: Fast Burrows Wheeler Compression Using All-Cores. In: AsHES, IEEE International Parallel and Distributed Processing Symposium (2015)

Associated People ::

Aditya Deshpande, Prof. P J Narayanan

Last Modified: Wed Jan 2, 2:03:00 IST 2013