Programming Parallel Computers 2020

Help with debugging

Correctness issues

In general, always try to isolate the problem first! Find the smallest, simplest piece of code that still does something unexpected. Remember that you are not limited to our makefiles and test scripts; you can always write your own unit tests, for example.
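As a rough sketch of what such a home-made unit test could look like, the code below compares an "optimized" routine against a simple reference on many tiny inputs. The names sum_fast, sum_reference, and run_small_tests are hypothetical stand-ins for whatever routine you are debugging; they are not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test: sums the input four elements at a time.
float sum_fast(const std::vector<float>& v) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        for (int k = 0; k < 4; ++k) acc[k] += v[i + k];
    }
    float s = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < v.size(); ++i) s += v[i];  // leftover elements
    return s;
}

// Straightforward reference implementation that is easy to trust.
float sum_reference(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Compare both versions on every small size, including the awkward
// ones (empty input, sizes that are not multiples of four).
bool run_small_tests() {
    for (std::size_t n = 0; n <= 17; ++n) {
        std::vector<float> v(n);
        // Small integer values, so float sums are exact in any order.
        for (std::size_t i = 0; i < n; ++i) v[i] = float(i % 5);
        if (sum_fast(v) != sum_reference(v)) return false;
    }
    return true;
}
```

Small inputs like these make it much easier to find the smallest case that fails, which is usually the best starting point for further debugging.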

Also make sure that you are using Maari-A computers. You do not need to be physically there; you can use ssh.

Strange bugs, segmentation faults, etc.?

First try AddressSanitizer.
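To illustrate the kind of bug AddressSanitizer is good at catching, here is a minimal sketch of an off-by-one read. The function name last_element and the file name in the comment are made up for illustration. The -fsanitize=address flag is how GCC and Clang enable AddressSanitizer; the exact build command for your setup is an assumption here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example; build with AddressSanitizer enabled, e.g.
//   g++ -g -fsanitize=address example.cc -o example
// (example.cc is a placeholder file name).

// Returns the last element of a non-empty vector.
int last_element(const std::vector<int>& v) {
    assert(!v.empty());
    const int* p = v.data();
    // Buggy version: reads one element past the end of the heap buffer.
    // Without AddressSanitizer this often goes unnoticed; with it, the
    // program stops with a heap-buffer-overflow report for this read.
    //   return p[v.size()];
    return p[v.size() - 1];  // correct index of the last element
}
```

The -g flag is there so that the report can refer to source files and line numbers.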

Still unexplained segmentation faults?

It might be a stack overflow. Unfortunately, a stack overflow is typically reported as a segmentation fault. On the classroom computers, the stack size limit is approximately 8 MB. Do not allocate large arrays on the stack. If you need storage for megabytes of data, use the heap.
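The difference between the two kinds of allocation can be sketched as follows. The constant N and the function sum_of_ones are hypothetical names chosen for this example, and the 8 MB figure is the approximate limit mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 4 million floats take 16 MB, which is more than an 8 MB stack limit.
constexpr std::size_t N = 4 * 1000 * 1000;

double sum_of_ones() {
    // On the stack: this would likely crash with a segmentation fault.
    //   float data[N];
    // On the heap: std::vector stores its elements in dynamically
    // allocated memory, so the stack size limit does not apply.
    std::vector<float> data(N, 1.0f);
    assert(data.size() == N);
    double s = 0.0;
    for (float x : data) s += x;
    return s;
}
```

Note that std::vector is fine for plain float data like this, but it is not the right choice for vector types; see the alignment issue discussed next.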

Another possibility is that you are using vector types without proper memory alignment. Whenever you use vector types such as float8_t, remember to allocate dynamic memory with e.g. posix_memalign instead of malloc, new, or std::vector.
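A minimal sketch of aligned allocation is shown below. It assumes that float8_t is defined with the GCC/Clang vector extension as a vector of 8 floats (32 bytes); the helper names float8_alloc and is_aligned are made up for this example, and your own code may already provide something similar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign (POSIX), free

// Assumed definition of the vector type (GCC/Clang vector extension):
// 8 floats = 32 bytes, to be stored at 32-byte-aligned addresses.
typedef float float8_t __attribute__((vector_size(8 * sizeof(float))));

// Allocate n vectors with the alignment that float8_t requires.
// Returns nullptr on failure; release the memory with free().
float8_t* float8_alloc(std::size_t n) {
    void* p = nullptr;
    if (posix_memalign(&p, sizeof(float8_t), sizeof(float8_t) * n) != 0) {
        return nullptr;
    }
    return static_cast<float8_t*>(p);
}

// True if the address is a multiple of the vector size (32 bytes here).
bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(float8_t) == 0;
}
```

A typical use is float8_t* v = float8_alloc(n); followed by free(v); once the data is no longer needed. A check such as assert(is_aligned(v)) is a cheap way to rule out misalignment when you are hunting for a crash.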

Try using a debugger, e.g. GDB, to see precisely where the program crashes.

Random results? Strange results?

You might be reading the wrong parts of memory. Try AddressSanitizer.

You might be reading memory that is not initialized. In C and C++, memory allocation functions typically do not guarantee that memory is initialized with zeros. It is easy to forget to initialize newly allocated memory, and in many cases your program may accidentally work correctly, because newly allocated memory often happens to contain all zeros. To better detect bugs related to uninitialized memory on Linux, try running your program (here mf-test) e.g. as follows:

MALLOC_PERTURB_=191 ./mf-test

With this environment variable setting, malloc and other related functions will fill each byte of newly allocated memory with the value 64 (the bitwise complement of 191). If you interpret such bytes as doubles or floats, you will get reasonable non-zero values that will hopefully reveal bugs related to uninitialized memory more easily. Please note that this is not compatible with AddressSanitizer, so you must first recompile without the debugging options that enable it.
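To get a feeling for what such memory looks like to your program, the sketch below fills a small buffer with the byte value 64 (0x40) by hand and reinterprets it as different types. It does not use MALLOC_PERTURB_ itself; the function name from_perturbed_bytes is hypothetical, and the values in the comments assume IEEE 754 floating point and a 32-bit int.

```cpp
#include <cassert>
#include <cstring>

// Interpret sizeof(T) bytes, each equal to 0x40 (decimal 64), as a T.
template <typename T>
T from_perturbed_bytes() {
    unsigned char bytes[sizeof(T)];
    std::memset(bytes, 0x40, sizeof(T));
    T value;
    std::memcpy(&value, bytes, sizeof(T));  // well-defined type punning
    return value;
}

// With IEEE 754 floats and 32-bit int:
//   from_perturbed_bytes<float>()  is roughly 3.0039
//   from_perturbed_bytes<double>() is roughly 32.502
//   from_perturbed_bytes<int>()    is 0x40404040 = 1077952576
```

If values like these show up in your results, it is a strong hint that some element was read before it was ever written.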

My CUDA code does not seem to work at all?

Check for errors. Wrap all CUDA API calls in error-checking macros, and also check for errors after each kernel launch. For example, you can define a macro like this:

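#include <cstdio>
#include <cstdlib>

// Evaluates a CUDA API call once. On failure, prints the call and the
// CUDA error message to stderr and exits with a failure status.
#define CHECK_CUDA_ERROR(call) do { \
    cudaError_t result_ = (call); \
    if (result_ != cudaSuccess) { \
        std::fprintf(stderr, #call " failed: %s\n", \
                     cudaGetErrorString(result_)); \
        std::exit(EXIT_FAILURE); \
    } \
} while(0)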

And use it like this:

...
CHECK_CUDA_ERROR(cudaMalloc((void**)&x, n));
CHECK_CUDA_ERROR(cudaMalloc((void**)&y, n));
...
kernel<<<dimGrid, dimBlock>>>(params);
CHECK_CUDA_ERROR(cudaGetLastError());
...