You need to implement the following function:
void psort(int n, data_t* data)
n is the size of the input array
data. All input elements are of type
unsigned long long, i.e., 64-bit unsigned integers.
Your function should have the same behavior as:
That is, you need to sort this in place: modify the input array so that it will be in a sorted order. You are free to allocate some additional memory to keep temporary data if needed.
In your implementation, you are permitted and encouraged to use all single-threaded features of the C++ standard library. In particular, you can freely use the single-threaded
std::sort as a subroutine in your code, as well as all other features of the algorithms library.
For multi-threading, you can use the basic primitives of OpenMP, as usual. Please do not hard-code the number of threads; please let OpenMP choose the number of threads, and use functions omp_get_max_threads and omp_get_num_threads if you need to know the number of threads. This way your code should automatically respect e.g. the
OMP_NUM_THREADS environment variable. Your code should work reasonably efficiently for any number of threads between 1 and 8, and it should work correctly also for a larger number of threads. For example, your program must not crash if we change
OMP_NUM_THREADS from 8 to 9, and your program should not immediately fall back to a single-threaded implementation if we change
OMP_NUM_THREADS from 8 to 7. Rounding the number of threads down to the nearest power of 2 is acceptable.
For the GPU exercise, you have to stick to the basic CUDA API. In particular, you are not allowed to use Thrust.
In the merge sort task and quicksort task, it is perfectly fine to resort to the single-threaded
std::sort in the base case (regardless of what algorithm the standard library happens to use internally); the key requirement is that the multi-threaded parts are based on the idea of merge sort and quicksort, respectively. For merging and partitioning, you are free to use any strategy, and you are encouraged to use also e.g. vector instructions whenever possible.