Programming Parallel Computers 2020

Exercise SO: sorting

Detailed specification

You need to implement the following function:

void psort(int n, data_t* data)

Here n is the size of the input array data. All input elements are of type data_t, i.e., unsigned long long, i.e., 64-bit unsigned integers.

Correct output

Your function should have the same behavior as:

std::sort(data, data+n);

That is, you need to sort this in place: modify the input array so that it will be in a sorted order. You are free to allocate some additional memory to keep temporary data if needed.

Rules

In your implementation, you are permitted and encouraged to use all single-threaded features of the C++ standard library. In particular, you can freely use the single-threaded std::sort as a subroutine in your code, as well as all other features of the algorithms library.

For multi-threading, you can use the basic primitives of OpenMP, as usual. Please do not hard-code the number of threads; please let OpenMP choose the number of threads, and use functions omp_get_max_threads and omp_get_num_threads if you need to know the number of threads. This way your code should automatically respect e.g. the OMP_NUM_THREADS environment variable. Your code should work reasonably efficiently for any number of threads between 1 and 8, and it should work correctly also for a larger number of threads. For example, your program must not crash if we change OMP_NUM_THREADS from 8 to 9, and your program should not immediately fall back to a single-threaded implementation if we change OMP_NUM_THREADS from 8 to 7. Rounding the number of threads down to the nearest power of 2 is acceptable.

For the GPU exercise, you have to stick to the basic CUDA API. In particular, you are not allowed to use Thrust.

In the merge sort task and quicksort task, it is perfectly fine to resort to the single-threaded std::sort in the base case (regardless of what algorithm the standard library happens to use internally); the key requirement is that the multi-threaded parts are based on the idea of merge sort and quicksort, respectively. For merging and partitioning, you are free to use any strategy, and you are encouraged to use also e.g. vector instructions whenever possible.