In general, always try to isolate the problem first! Figure out the smallest, simplest piece of code that still does something unexpected. Remember that you are not limited to using our makefiles and test scripts; you can always develop e.g. your own unit tests.
Also make sure that you are using the Maari-A computers. You do not need to be physically there; you can use ssh.
First try AddressSanitizer.
It might be a stack overflow. Unfortunately, a stack overflow is typically reported as a segmentation fault. On the classroom computers, the stack size limit is approx. 8 MB. Do not allocate large arrays on the stack; if you need storage for megabytes of data, use the heap.
Another possibility is that you might be using vector types without proper memory alignment. Remember to use e.g. posix_memalign instead of std::vector for dynamic memory allocation whenever you use vector types.
Try using a debugger, e.g. GDB, to see precisely where the program crashes.
You might be reading the wrong parts of memory. Try AddressSanitizer.
You might be reading memory that is not initialized. In C and C++, memory allocation functions typically do not guarantee that memory is initialized with zeros. However, it is easy to forget to initialize newly allocated memory, and in many cases your program may accidentally work correctly, as newly allocated memory often happens to contain all zeros. To better detect bugs related to uninitialized memory on Linux, try running your program (here mf-test) e.g. as follows:

MALLOC_PERTURB_=64 ./mf-test
With this environment variable setting (MALLOC_PERTURB_), malloc and other related functions will fill newly allocated memory with a fixed nonzero junk pattern instead of leaving it untouched. If you interpret such bytes as doubles or floats, you will get distinctive non-zero values that will hopefully reveal bugs related to uninitialized memory more easily. Please note that this is not compatible with AddressSanitizer, so you must first compile without debugging options.
Check for errors. Wrap all CUDA API calls in error-checking macros, and also check for errors after each kernel launch. In your Git repository, the header file common/cudacheck.h contains the definition of the macro CHECK that you can use for this purpose, e.g., as follows:
#include "cudacheck.h"
...
CHECK(cudaMalloc((void**)&x, n));
CHECK(cudaMalloc((void**)&y, n));
...
kernel<<<dimGrid, dimBlock>>>(params);
CHECK(cudaGetLastError());
...
It is also highly recommended to try cuda-memcheck, for example, like this (using the CP exercises as an example):

cuda-memcheck ./cp-test
The cuda-memcheck tool is able to detect many bugs related to memory accesses, a bit like AddressSanitizer in CPU-side code. You can also use the racecheck tool to detect race conditions in shared memory accesses:
cuda-memcheck --tool racecheck ./cp-test