The current context is thread-local state, and in preparation for GPU async execution (on multiple threads) we need to set the context before calling APIs that create resources.
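To make the motivation concrete, here is a minimal standard-C++ model of the problem, not the actual wrapper code. `FakeContext`, `currentContext`, `createStream`, and `createStreamOnWorker` are hypothetical names invented for this sketch; a `thread_local` pointer stands in for the driver's per-thread current context.

```cpp
#include <thread>

// Hypothetical stand-in for a driver context handle (not the CUDA type).
struct FakeContext {
  int device;
};

// Models the per-thread "current context" slot: every thread has its own copy.
thread_local FakeContext *currentContext = nullptr;

// Hypothetical resource-creating call. In this model it only succeeds when
// the calling thread has a current context.
bool createStream() { return currentContext != nullptr; }

// Runs createStream() on a fresh thread, optionally setting the context first.
bool createStreamOnWorker(FakeContext *ctxToSet) {
  bool ok = false;
  std::thread worker([&] {
    if (ctxToSet)
      currentContext = ctxToSet; // set the context before creating resources
    ok = createStream();
  });
  worker.join();
  return ok;
}
```

In this model, a context made current on the main thread is not visible to the worker: `createStreamOnWorker(nullptr)` fails, while passing the context so the worker sets it first succeeds.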
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| mlir/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp | ||
|---|---|---|
| 72 | Should this rather use push/pop in case there is some external (to the gpu dialect) use of the context, too? Like if this runs inside of some other runtime. | |
| mlir/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp | ||
|---|---|---|
| 72 | It certainly could, but it seems a little over-engineered at this stage. But happy to add it if you think it makes sense. | |
| mlir/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp | ||
|---|---|---|
| 72 | CUDA context issues are annoying to debug, so why not avoid creating that issue if we can? I will forget this and then be puzzled :) | |
Hmm, this turned out more complex than I had thought. I had a simple push/pop in mind. If that is not enough, let's keep it at the simple version for now.
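For reference, the "simple push/pop" idea can be sketched as a scoped guard. This is a model under stated assumptions, not the patch itself: a `thread_local` `std::vector` stands in for the driver's context stack (what `cuCtxPushCurrent` and `cuCtxPopCurrent` would manipulate), and `FakeContext`, `contextStack`, `getCurrent`, `ScopedContext`, and `wrapperCall` are hypothetical names.

```cpp
#include <vector>

// Hypothetical stand-in for a driver context handle.
struct FakeContext {
  int id;
};

// Models a per-thread context stack; the top entry is the current context.
thread_local std::vector<FakeContext *> contextStack;

FakeContext *getCurrent() {
  return contextStack.empty() ? nullptr : contextStack.back();
}

// Pushes a context on construction and pops it on destruction, so whatever
// an enclosing runtime had made current is restored afterwards.
class ScopedContext {
public:
  explicit ScopedContext(FakeContext *ctx) { contextStack.push_back(ctx); }
  ~ScopedContext() { contextStack.pop_back(); }
  ScopedContext(const ScopedContext &) = delete;
  ScopedContext &operator=(const ScopedContext &) = delete;
};

// Hypothetical wrapper entry point that needs its own context to be current.
int wrapperCall(FakeContext *own) {
  ScopedContext scope(own);
  return getCurrent()->id; // work here sees `own` as the current context
}
```

The point of the guard is that a caller's context survives the call: if an outer runtime made its own context current before `wrapperCall`, that same context is current again once the call returns.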
| mlir/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp | ||
|---|---|---|
| 37 | Creating it always as before would make this less complex. What is the drawback? | |
| 44 | This might no longer be the current one, if it was just created. | |
| 48 | Why not use cuCtxPopCurrent here? | |
| 61 | Doesn't cuCtxCreate already do this? | |
| mlir/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp | ||
|---|---|---|
| 37 | Setting a specific context allows running on a different device, for example. The use is quite limited, though, because mgpuSetContext() is not thread safe. We will probably need to expose the context per thread, or per function that needs one. I switched it to the primary context, which is the simplest. | |
| 44 | See comment below. | |
| 48 | The CUDA context stack is from early CUDA days. I have not seen anyone using it in years, and the HIP equivalent is marked deprecated. | |
| 61 | cuCtxCreate sets the current context; this restores it so that the c'tor can grab it. It's a bit of back and forth, but there is no call_once-else. | |
| mlir/tools/mlir-rocm-runner/rocm-runtime-wrappers.cpp | ||
|---|---|---|
| 75 | This should say hipCtxGetCurrent; will fix later. | |
Thanks for cleaning this up.
| mlir/tools/mlir-rocm-runner/rocm-runtime-wrappers.cpp | ||
|---|---|---|
| 76 | context -> Context | |