CUDA check for ignored errors returned by CUDA API calls
Needs Review · Public

Authored by barcisz on Sep 13 2022, 1:02 PM.

Details

Summary

Add cuda-unchecked-api-call check

Motivation

The cuda_runtime.h header is included by default in all CUDA files. The functions declared there provide an interface between regular C/C++ code and the CUDA driver. Most of them report errors by returning a cudaError_t enum value, and the specific errors each call can return are described in the CUDA documentation. However, it is easy to ignore those errors and to forget that the calls can fail. A failure can have call-related causes, such as an address passed to the call being in the wrong address space (host vs. device), or external causes, such as problems with the CUDA device or the driver. Either way, an ignored failure will affect execution: in the best case it leads to a crash, and in the worst case it leaves the program in an incorrect state.

Behavior

The cuda-unchecked-api-call check reports calls to the CUDA API whose return value is unused. It is similar to the bugprone-unused-return-value check, but more specific, and therefore more likely to be used by the people who need it. It defines a CUDA API function as a function that:

  • returns a value of type cudaError_t
  • is declared in a header whose name ends with cuda_runtime.h (this allows cuda_runtime.h to be replaced by, for example, _cuda_runtime.h in case some Buck setup is configured like that)
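
To make the two criteria concrete, here is a minimal sketch of the call patterns involved. The enum and the two functions are stand-ins declared locally so the snippet compiles without the CUDA SDK; in real code they come from cuda_runtime.h, and only declarations reaching the translation unit through such a header would match the check. The helper names unchecked and checked are hypothetical.

```cpp
// Stand-ins for declarations that normally come from cuda_runtime.h.
// They exist only so this sketch compiles without the CUDA SDK.
enum cudaError_t { cudaSuccess = 0, cudaErrorInvalidValue = 1 };

inline cudaError_t cudaDeviceReset() { return cudaSuccess; }

inline cudaError_t cudaSetDevice(int device) {
  // Simplified behavior: a negative device index is treated as an error.
  return device >= 0 ? cudaSuccess : cudaErrorInvalidValue;
}

// The returned cudaError_t is discarded. This is the pattern the check
// is described as reporting.
inline void unchecked() {
  cudaDeviceReset();
}

// The returned cudaError_t is used, so the call is not an unused-value case.
inline bool checked(int device) {
  const cudaError_t err = cudaSetDevice(device);
  return err == cudaSuccess;
}
```

With these stand-ins, checked(0) returns true and checked(-1) returns false, while unchecked() silently drops whatever the call reported.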

Automatic fixes

The check can be configured to produce a FixItHint that wraps the CUDA API call in a macro or function handler. You specify the error handler for your project by setting the HandlerName option of the cuda-unchecked-api-call check. Here is an example of how this fix transforms unhandled code from:

void foo() {
  cudaDeviceReset();
}

to

void foo() {
  C10_CUDA_CHECK(cudaDeviceReset());
}

The specific handler used for this example is taken from PyTorch and its definition can be found here.
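
For readers without a handler of their own, the following is a rough sketch of what a macro handler of this kind might look like. It is not the PyTorch definition: MY_CUDA_CHECK, fakeCudaCall, and throwsOnFailure are hypothetical names, and the enum is a stand-in for the one in cuda_runtime.h so the snippet compiles without the CUDA SDK.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the enum normally provided by cuda_runtime.h.
enum cudaError_t { cudaSuccess = 0, cudaErrorInvalidValue = 1 };

// Hypothetical handler macro: evaluates the call once and throws if the
// result is anything other than cudaSuccess.
#define MY_CUDA_CHECK(expr)                                              \
  do {                                                                   \
    const cudaError_t my_cuda_check_err = (expr);                        \
    if (my_cuda_check_err != cudaSuccess) {                              \
      throw std::runtime_error(                                          \
          std::string("CUDA call failed: ") + #expr + " (error code " +  \
          std::to_string(static_cast<int>(my_cuda_check_err)) + ")");    \
    }                                                                    \
  } while (0)

// Hypothetical API-like function used only to exercise the macro.
inline cudaError_t fakeCudaCall(bool ok) {
  return ok ? cudaSuccess : cudaErrorInvalidValue;
}

// Returns true if the handler throws for a failing call.
inline bool throwsOnFailure() {
  try {
    MY_CUDA_CHECK(fakeCudaCall(false));
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

Setting HandlerName to the name of such a macro would, per the description above, make the fix wrap each unchecked call in it.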

Limiting the allowed handlers

Since a project may have only a limited set of handlers for the errors returned by CUDA, there is also an option to restrict how the returned error may be handled: set the AcceptedHandlers option to a comma-separated list of the names (which can be scoped) of the allowed error handlers. If HandlerName is set, it is implicitly added to that list. This option also works with dummy macros that pass the error through without doing anything (which may be present in the code for performance reasons).
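
As an illustration, assume AcceptedHandlers were set to "CUDA_NOCHECK,mylib::checkCuda". Both handler names are hypothetical, and the enum and cudaDeviceReset are stand-ins for the cuda_runtime.h declarations so the snippet compiles without the CUDA SDK. Under that assumption, the sketch below shows a scoped function handler, a pass-through dummy macro, and a call that uses the value without going through either.

```cpp
// Stand-ins for declarations that normally come from cuda_runtime.h.
enum cudaError_t { cudaSuccess = 0, cudaErrorInvalidValue = 1 };
inline cudaError_t cudaDeviceReset() { return cudaSuccess; }

// Hypothetical dummy macro: passes the error through and does nothing.
#define CUDA_NOCHECK(expr) (expr)

namespace mylib {
// Hypothetical scoped function handler; reports whether the call succeeded.
inline bool checkCuda(cudaError_t err) { return err == cudaSuccess; }
} // namespace mylib

// Handled through a listed function handler.
inline bool resetChecked() {
  return mylib::checkCuda(cudaDeviceReset());
}

// Handled through a listed pass-through macro.
inline void resetPassThrough() {
  CUDA_NOCHECK(cudaDeviceReset());
}

// The value is used, but not through a listed handler. As the option is
// described, this is the kind of handling it is meant to disallow.
inline void resetOtherwise() {
  const cudaError_t err = cudaDeviceReset();
  (void)err;
}
```

In a .clang-tidy file, check options are conventionally keyed as check name, dot, option name, so the option here would presumably be spelled cuda-unchecked-api-call.AcceptedHandlers.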

Parent diffs

This diff relies on D133801 and D133436 to run properly, so feel free to take a look at those as well.

Event Timeline

barcisz created this revision. · Sep 13 2022, 1:02 PM
Herald added a project: Restricted Project. · Sep 13 2022, 1:02 PM
barcisz requested review of this revision. · Sep 13 2022, 1:02 PM
Herald added a subscriber: cfe-commits.
barcisz edited the summary of this revision. · Sep 13 2022, 1:04 PM
barcisz updated this revision to Diff 459870. · Sep 13 2022, 2:08 PM

Added some explanation comments

tra added inline comments. · Sep 13 2022, 2:49 PM
clang-tools-extra/test/clang-tidy/check_clang_tidy.py
121 ↗(On Diff #459870)

The comment is incorrect. These flags have nothing to do with GPU availability, but rather with the CUDA SDK, which is normally expected to provide the 'standard' set of CUDA headers and the libdevice bitcode.

clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-api-call-function-handler.cu
2

This does not look right.

barcisz updated this revision to Diff 460019. · Sep 14 2022, 3:17 AM

More explanation comments for the check's code

barcisz updated this revision to Diff 460025. · Sep 14 2022, 3:38 AM

Rebase and better comments for cuda-related decisions in the check

barcisz edited the summary of this revision. · Sep 14 2022, 3:49 AM
barcisz edited the summary of this revision.
barcisz updated this revision to Diff 460034. · Sep 14 2022, 4:15 AM

Removed unnecessary cuda-related compilation flags from tests

barcisz added inline comments. · Sep 14 2022, 5:50 AM
clang-tools-extra/test/clang-tidy/check_clang_tidy.py
121 ↗(On Diff #459870)

Yes, by CUDA I meant the CUDA library here, but I'll change it accordingly.

clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-api-call-function-handler.cu
2

Fixed

barcisz updated this revision to Diff 460055. · Sep 14 2022, 5:54 AM

Update base to D133801

barcisz updated this revision to Diff 460184. · Sep 14 2022, 12:35 PM

Documentation and small check-message-related bugfixes

barcisz updated this revision to Diff 460256. · Sep 14 2022, 3:50 PM

Brought back different message prefix for when AcceptedHandlers is set

barcisz updated this revision to Diff 460258. · Sep 14 2022, 4:00 PM

Removed unneeded headers from the check

barcisz updated this revision to Diff 460261. · Sep 14 2022, 4:03 PM

Removed unneeded headers from the check

barcisz updated this revision to Diff 460410. · Sep 15 2022, 7:50 AM

Use header guards instead of pragma