This is an archive of the discontinued LLVM Phabricator instance.

Cuda Check for ignored errors after calling a CUDA kernel
Needs ReviewPublic

Authored by barcisz on Sep 15 2022, 10:44 AM.

Details

Summary

Add cuda-unchecked-kernel-call check

Motivation

Calls to CUDA kernels can yield errors after their invocation. These errors can be obtained by calling cudaGetLastError(), which also resets CUDA’s error state. There is a non error-resetting version of this function called cudaPeekAtLastError(), but the lint check does not accept this (see
below). A limited set of errors can block a kernel from launching including driver malfunctions, trying to allocate too much shared memory, or using too many threads or blocks. Since those errors can cause unexpected behavior that blocks subsequent computation, they should be caught as close
to the launch point as possible. The lint check enforces this by requiring that every kernel be immediately followed by an error check.

Behavior

The cuda-unchecked-kernel-call checks whether there is a call to cudaGetLastError() directly after each kernel call. To be precise, there can be no side-effecting or branching code between the kernel call and the call to cudaGetLastError(), such as branching due to the ?: operator or due
to a call to a function. This is because a more complicated behavior is likely to be harder for humans to read and would would be significantly slower to automatically check. We want to encourage well-designed, multi-line macros that check for errors, so we explicitly allow macros whose content is
do { /* error check */ } while(false), since this is the recommended way of making multi-line macros.
The check does also accept the handler it was provided as a valid way to handle the error, even if the handler does not comply with the rule above (or is a function which cannot be easily and quickly checked). However, it is still encouraged to call cudaGetLastError() early in the handler’s code
for the code to be readable.

Automatic fixes

The lint check can be configured to automatically fix the issue by adding an error handling macro right after the kernel launch. You can specify the error handler for your project by setting the HandlerName option for the cuda-unchecked-kernel-call. Here is an example of how this fix can
transform unhandled code from:

void foo(bool b) {
  if (b)
    kernel<<<x, y>>>();
}

to

void foo(bool b) {
  if(b)
    {kernel<<<x, y>>>(); `C10_CUDA_KERNEL_LAUNCH_CHECK`();}
}

The specific handler used for this example is taken from PyTorch and its definition can be found here.

Known Limitations

Using cudaPeekAtLastError()

cudaPeekAtLastError() can also be used to check for CUDA kernel launch errors. However, there are several reasons why this is not and will most likely not be considered as a valid way to check for errors after kernel invocations. This all has to do with the purpose of the function, which is to
not reset the internal error variable:

  • Subsequent kernel calls, even if they don’t produce any errors, will seem as if they produced an error due to the error not being reset. This behavior is easy to overlook and may cause he significant difficulty in debugging.
  • Our linter cannot easily check whether the error was reset before subsequent kernel calls. It might even be impossible to do so due to the error leaking inter-procedurally from functions whose code we can’t access.

Checking for errors that occurred while a kernel was running

Our linter does not check whether errors occurred while a kernel was running. The linter only enforces checks that a kernel launched correctly. cudaDeviceSynchronize() and similar API calls can be used to see that a kernel’s computation was successful, but these are blocking calls, so we are not
able to suggest where they should go automatically.

Parent diffs

This diff relies on D133436, D133725 and D133942 to properly run, so feel free to take a look at those as well

Diff Detail

Event Timeline

barcisz created this revision.Sep 15 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 10:44 AM
barcisz requested review of this revision.Sep 15 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 10:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tra added a subscriber: tra.Sep 15 2022, 12:03 PM

Our linter cannot easily check whether the error was reset

It can not in principle. Many CUDA errors are 'sticky' and can only be cleared by resetting the GPU or exiting the application and the former is virtually never used beyond toy examples (resetting a GPU would clear a lot of state, including memory allocations and restoring it is usually not feasible in practice).
E.g:
https://stackoverflow.com/questions/43659314/how-can-i-reset-the-cuda-error-to-success-with-driver-api-after-a-trap-instructi
https://stackoverflow.com/questions/56329377/reset-cuda-context-after-exception/56330491

The checker has no way to tell whether the returned error is sticky and the stickiness can start at any CUDA runtime call, so, generally speaking, all CUDA API calls must be checked and any of them may be the one producing sticky errors due to preceding calls. At the very minimum, in addition to <<<...>>> kernel launches, user may also launch kernels via cudaLaunchKernel() and, I believe, CUDA runtime itself may launch some helper kernels under the hood, so I would not be surprised to see other sources of errors.

I think ultimately the checker should be generalized to flag all unchecked CUDA runtime calls. The problem is that that is going to be exceedingly noisy in practice as a lot of real code does not bother to check for the errors consistently. Limiting the checks to kernel launches may be a reasonable starting point as it would give us the ability to zero in on the culprit kernel by running the app with "CUDA_LAUNCH_BLOCKING".

clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu
53

Just curious -- is it sufficient to just call cudaGetLastError(); ? Or does the checker require using its result, too? I.e. in practice this particular check will not really do anything useful. The tests below look somewhat inconsistent.

75

WDYM by "is not considered safe" here? How is that different from calling cudaGetLastError() and checking its value?

80–82

What would happen with a single ; as would be seen in the normal user code?

112

Why does this case produce no warning, while a very similar case above does? In both cases result of cudaGetLastError() is assigned to an unused variable within the loop body.

b<<<1, 2>>>();
// CHECK-MESSAGES: :[[@LINE-1]]:{{[0-9]+}}: warning: Possible unchecked error after a kernel launch.
for(;;)
  auto err2 = cudaGetLastError(); // Brackets omitted purposefully, since they create an additional AST node

I think ultimately the checker should be generalized to flag all unchecked CUDA runtime calls. The problem is that that is going to be exceedingly noisy in practice as a lot of real code does not bother to check for the errors consistently. Limiting the checks to kernel launches may be a reasonable starting point as it would give us the ability to zero in on the culprit kernel by running the app with "CUDA_LAUNCH_BLOCKING".

By that do you mean that the way the check is now it is acceptable or that it should be improved to handle intra-procedural analysis? The intention with this check is to work a lot in tandem with the one in D133804, which therefore prevents most such cases. The practice that the check checks for is also commonly used in ML frameworks which heavily rely on CUDA, so not catching such cases might still be helpful for them just for the sake of preserving code consistency and catching such errors (since if there is an error then that means that some part of the code was broken anyways). Thus, the check is optimized for lowering false positives during static checking and for a practice lowering the number of false negatives within the CUDA code.

clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu
53

Technically it does not require the user to actually use the value of cudaGetLastError(), but

  1. If they are calling it then they most likely did not place this call there randomly and are using it to check for the error returned by the kernel
  2. the check being introduced in D133804 can be used to check if the return value has been used, so checking it here as well would have been a duplication
75

As in the check does not do inter-procedural analysis short of adding the handler to AcceptedHandlers, so the check will flag such occurences

80–82

Nothing, it would work just fine; it's rather that all other kernel calls in this test use a single ; so I want to check this case here

112

Because often a macro will wrap its error handling code in a do {...} while(0) loop and that's why we check this case and simmilar ones with CFG analysis

barcisz retitled this revision from git push Cuda Check for ignored errors after calling a CUDA kernel to Cuda Check for ignored errors after calling a CUDA kernel.Sep 15 2022, 3:01 PM
barcisz edited the summary of this revision. (Show Details)Sep 15 2022, 3:04 PM
tra added a comment.Sep 15 2022, 4:07 PM

The intention with this check is to work a lot in tandem with the one in D133804, which therefore prevents most such cases.
Thus, the check is optimized for lowering false positives during static checking and for a practice lowering the number of false negatives within the CUDA code.

SGTM. This patch + D133804 should have everything covered.

clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu
53
  1. If they are calling it then they most likely did not place this call there randomly and are using it to check for the error returned by the kernel

If that's the case, then why kernel launches on lines 45 and 51 are reported as possibly unchecked? Both are followed by the cudaGetLastError() call and are, technically checked, if we're not analyzing the usage of the result of the call.

What am I missing?

75

Hmm.. Using a helper function to check for cuda errors is a fairly common pattern.
Is there a way to annotate such a helper function as checks cudaGetLastError?

80–82

I still do not understand how it all fits together. What does a kernel call, the extra ;, the macro, and the checker code have to do with each other?

Is the idea that the checker should see though the empty statement between the kernel call and the checker macro?
If that's the case I'd make it a bit more prominent. E.g. something like this:

  b<<<1, 2>>>();
  ; /* Make sure that we see through empty expressions in-between the call and the checker. */ ;
CUDA_CHECK_KERNEL();
112

The do/while(0) wrapping part I understand. I'm puzzled why the checker appears to work differently with different loop kinds.
Why a cudaGetLastError() call inside do {} while() is detected and considered as a cuda result check, but the same call within for() {} is not?

barcisz added inline comments.Sep 15 2022, 4:51 PM
clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu
53

The idea is that the call should happen directly after the kernel without any branching (because branching can often make things much harder to understand in case of things like for loop make the error not actually have cudaGetLastError() called after every kernel call

75

There would be an easy way to do that, but it's much more common for projects to have those helper functions project-wide (or at least sub-project wide) which means they can be just specified explicitly for the project in the options for the check (the official documentation for the check will be uploaded tomorrow)

80–82

The reason we're checking for multiple ;s here is that due to macros not being present in the AST they have to be located on the lexer stage, which makes it necessary to search for them based on tokens. The tokens used after the kernel call here (semicolons and a comment) are the only allowed token between the kernel call and the macro, since any other one would indicate another statement being present

112

This is because for(<something>;<something>;<something>) works differently in the CFG analysis. The precise definition for where the cudaGetLastError() is allowed is that it should be the first statement/expression tree/function call after the kernel call and should be in a straight line from it. For example, for(;;) {cudaGetLastError()} would not be similifiable to a single control flow block, and for(;false;) {cudaGetLastError()} has an expression tree evaluated before the call to cudaGetLastError(). Technically , this definition currently only supports wrapping the statement with cudaGetLastError(), but is made more general in case

  • The user uses gotos to achieve a simmilar pattern
  • Such CFG layout can be achieved with other c++ mechanisms
  • Means to achieve such CFG layout with different mechanisms appear in future standards of c++
barcisz updated this revision to Diff 460743.Sep 16 2022, 7:07 AM

documentation for the check

LegalizeAdulthood resigned from this revision.Mar 29 2023, 8:19 AM