Add cuda-unchecked-kernel-call check
Motivation
Calls to CUDA kernels can yield errors after their invocation. These errors can be retrieved by calling cudaGetLastError(), which also resets CUDA's error state. There is a non-error-resetting version of this function, cudaPeekAtLastError(), but the lint check does not accept it (see below). A limited set of errors can block a kernel from launching, including driver malfunctions, requesting too much shared memory, or using too many threads or blocks. Since those errors can cause unexpected behavior that blocks subsequent computation, they should be caught as close to the launch point as possible. The lint check enforces this by requiring that every kernel launch be immediately followed by an error check.
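As a minimal sketch of the pattern the check enforces (the kernel and its launch configuration here are hypothetical), a launch followed immediately by a cudaGetLastError() call could look like this:

  #include <cstdio>

  __global__ void my_kernel() {}

  void launch() {
    my_kernel<<<1, 1>>>();
    // The check requires this call to come directly after the launch;
    // it also resets CUDA's error state.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
      std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
    }
  }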
Behavior
The cuda-unchecked-kernel-call check verifies that there is a call to cudaGetLastError() directly after each kernel call. To be precise, there may be no side-effecting or branching code between the kernel call and the call to cudaGetLastError(), such as branching due to the ?: operator or due to a call to a function. This is because more complicated behavior is likely to be harder for humans to read and would be significantly slower to check automatically. We want to encourage well-designed, multi-line macros that check for errors, so we explicitly allow macros whose content is
do { /* error check */ } while(false), since this is the recommended way of writing multi-line macros.
The check also accepts the handler it was provided (see Automatic fixes below) as a valid way to handle the error, even if the handler does not comply with the rule above or is a function that cannot be checked easily and quickly. However, calling cudaGetLastError() early in the handler's code is still encouraged, to keep the code readable.
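For illustration, here is a sketch of such a macro, using a made-up name (PyTorch's C10_CUDA_KERNEL_LAUNCH_CHECK, mentioned below, plays the same role):

  #include <cstdio>
  #include <cstdlib>

  // Hypothetical error-check macro; the do { ... } while (false) wrapper makes
  // the multi-line body behave like a single statement, so it composes safely
  // with unbraced if/else.
  #define MY_KERNEL_LAUNCH_CHECK()                                   \
    do {                                                             \
      cudaError_t err__ = cudaGetLastError();                        \
      if (err__ != cudaSuccess) {                                    \
        std::fprintf(stderr, "CUDA launch error: %s\n",              \
                     cudaGetErrorString(err__));                     \
        std::abort();                                                \
      }                                                              \
    } while (false)

  // Usage: the macro call immediately follows the kernel launch.
  //   kernel<<<grid, block>>>(args);
  //   MY_KERNEL_LAUNCH_CHECK();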
Automatic fixes
The lint check can be configured to automatically fix the issue by adding an error-handling macro right after the kernel launch. You can specify the error handler for your project by setting the HandlerName option of cuda-unchecked-kernel-call. Here is an example of how this fix can transform unhandled code from:
void foo(bool b) { if (b) kernel<<<x, y>>>(); }
to
void foo(bool b) { if (b) { kernel<<<x, y>>>(); C10_CUDA_KERNEL_LAUNCH_CHECK(); } }
The specific handler used for this example is taken from PyTorch and its definition can be found here.
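If the check is driven through clang-tidy's usual option mechanism, the handler could presumably be configured along the following lines; treat this as a sketch of the configuration rather than its authoritative syntax:

  # .clang-tidy (sketch)
  Checks: 'cuda-unchecked-kernel-call'
  CheckOptions:
    - key: cuda-unchecked-kernel-call.HandlerName
      value: 'C10_CUDA_KERNEL_LAUNCH_CHECK'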
Known Limitations
Using cudaPeekAtLastError()
cudaPeekAtLastError() can also be used to check for CUDA kernel launch errors. However, there are several reasons why it is not, and most likely will not be, considered a valid way to check for errors after kernel invocations. These all stem from the function's defining property: it does not reset the internal error variable.
- Subsequent kernel calls, even if they don't produce any errors, will seem as if they produced an error, because the error was never reset (see the sketch after this list). This behavior is easy to overlook and can make debugging significantly harder.
- Our linter cannot easily check whether the error was reset before subsequent kernel calls. It might even be impossible to do so due to the error leaking inter-procedurally from functions whose code we can’t access.
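A sketch of the first point, with hypothetical kernel names: a failed launch that is only peeked at makes a later, perfectly fine launch look like it failed too.

  failing_kernel<<<way_too_many_blocks, way_too_many_threads>>>();
  cudaError_t err = cudaPeekAtLastError();  // reports the launch error but does not clear it
  ok_kernel<<<1, 1>>>();                    // launches successfully
  err = cudaPeekAtLastError();              // still returns the old launch error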
Checking for errors that occurred while a kernel was running
Our linter does not check whether errors occurred while a kernel was running; it only enforces that a kernel launched correctly. cudaDeviceSynchronize() and similar API calls can be used to verify that a kernel's computation was successful, but these are blocking calls, so we cannot automatically suggest where they should go.
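For completeness, a sketch (kernel, grid, and block are hypothetical) of catching execution errors manually with a blocking synchronization; where such a call belongs is workload-specific, which is why the check cannot place it automatically:

  my_kernel<<<grid, block>>>();
  cudaError_t launch_err = cudaGetLastError();       // did the launch itself fail?
  cudaError_t run_err    = cudaDeviceSynchronize();  // blocks until the kernel finishes;
                                                     // did its execution fail?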
Parent diffs
This diff relies on D133436, D133725, and D133942 to run properly, so feel free to take a look at those as well.
Just curious -- is it sufficient to just call cudaGetLastError(), or does the checker require using its result, too? If a bare call is enough, then in practice this particular check will not really do anything useful. The tests below look somewhat inconsistent.