The current context is thread-local state. In preparation for asynchronous GPU execution (on multiple threads), we need to set the context before calling APIs that create resources.
Should this use push/pop instead, in case the context is also used outside the gpu dialect? For example, if this runs inside some other runtime.
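To make the push/pop suggestion concrete, here is a minimal sketch in portable C++, so it runs without a GPU. `Context`, `tlsCurrent`, `currentContext()` and `ScopedContext` are hypothetical stand-ins: `tlsCurrent` models the driver's per-thread current context, and in the real driver API the push and pop would be `cuCtxPushCurrent` and `cuCtxPopCurrent`. The guard restores whatever context was current before (for example, one set by another runtime) instead of overwriting it.

```cpp
#include <cassert>

// Hypothetical stand-in for a CUcontext handle (illustration only).
struct Context {
  int device;
};

// Models the driver's per-thread "current context" slot: each thread sees
// its own value. This is an assumption, not the CUDA implementation.
thread_local Context *tlsCurrent = nullptr;

Context *currentContext() { return tlsCurrent; }

// RAII guard with push/pop semantics: `ctx` is current while the guard
// lives, and whatever was current before (possibly set by another runtime)
// is restored on destruction.
class ScopedContext {
public:
  explicit ScopedContext(Context *ctx) : ctx_(ctx), previous_(tlsCurrent) {
    tlsCurrent = ctx_; // "push"
  }
  ~ScopedContext() {
    // Like a stack pop, guards must unwind in LIFO order.
    assert(tlsCurrent == ctx_ && "contexts must be popped in LIFO order");
    tlsCurrent = previous_; // "pop"
  }
  ScopedContext(const ScopedContext &) = delete;
  ScopedContext &operator=(const ScopedContext &) = delete;

private:
  Context *ctx_;
  Context *previous_;
};
```

Nested guards unwind in reverse order, which is the same LIFO discipline a context stack imposes; an outer caller never notices that the wrappers temporarily switched contexts.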
Hmm, this turned out more complex than I had thought. I had a simple push/pop in mind. If that is not enough, let's keep the simple version for now.
Always creating it, as before, would make this less complex. What is the drawback?
If it was just created, this might no longer be the current one.
Why not use cuCtxPopCurrent here?
Doesn't cuCtxCreate already do this?
Setting a specific context allows, for example, running on a different device. Its use is quite limited, though, because mgpuSetContext() is not thread-safe.
We will probably need to expose the context per thread, or per function that needs one.
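A rough sketch of what a per-thread context could look like, assuming a process-wide default plus a `thread_local` override. All names here (`Context`, `defaultContext`, `setThreadContext`, `getThreadContext`, `launchWithContext`) are hypothetical and not part of the existing wrappers. Unlike a process-wide setting (presumably the reason mgpuSetContext() is not thread-safe), each thread only ever writes its own slot, so there is nothing to race on.

```cpp
#include <cassert>
#include <thread>

// Hypothetical context handle (stand-in for CUcontext).
struct Context {
  int device;
};

// Process-wide default, e.g. a context on device 0 (assumption).
Context defaultContext{0};

// Per-thread override: each thread has its own slot, so setting it on one
// thread cannot affect or race with another thread.
thread_local Context *threadContext = nullptr;

void setThreadContext(Context *ctx) { threadContext = ctx; }

Context *getThreadContext() {
  return threadContext != nullptr ? threadContext : &defaultContext;
}

// Hypothetical async launch: runs `task` on a worker thread that first
// installs `ctx` as its own context. The launching thread is unaffected.
template <typename Task>
std::thread launchWithContext(Context *ctx, Task task) {
  assert(ctx != nullptr && "a worker needs a context to run GPU work");
  return std::thread([ctx, task]() mutable {
    setThreadContext(ctx);
    task();
  });
}
```

The "per function" variant would pass the context as an explicit argument instead of reading it from thread-local state; the trade-off is a wider signature in exchange for no hidden state at all.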
I switched it to the primary context, which is the simplest.
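For the primary context, the real driver call is `cuDevicePrimaryCtxRetain`, which, unlike `cuCtxCreate`, does not make the context current on the calling thread. The sketch below (portable C++ with a hypothetical `Context` and `createContext`) only models the "create once, share everywhere" part: a function-local static, whose initialization C++11 guarantees to be thread-safe and to run exactly once.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical context handle (stand-in for CUcontext).
struct Context {
  int device;
};

// Counts how often the (hypothetical) expensive creation runs.
std::atomic<int> creations{0};

Context *createContext(int device) {
  creations.fetch_add(1);
  return new Context{device}; // intentionally never freed: lives for the process
}

// One shared context, created on first use. Concurrent first callers block
// until the initializer finishes; later callers just read the pointer.
Context *sharedContext() {
  static Context *ctx = createContext(/*device=*/0);
  assert(creations.load() == 1 && "initializer must run exactly once");
  return ctx;
}
```

No thread ever has to create its own context: the first caller pays the cost and everyone else reuses the result. Each thread would still need to make it current before use.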
See comment below.
The CUDA context stack dates from the early days of CUDA. I have not seen anyone use it in years, and the HIP equivalent is marked as deprecated.
cuCtxCreate sets the current context; this restores it so that the c'tor can grab it. It's a bit of a back-and-forth, but there is no call_once-else (a branch that runs only on the calls that skip the one-time initialization).
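One reading of the back-and-forth, sketched with `std::call_once` in portable C++: the once-only lambda creates the context, and because there is no else branch for later calls, making the context current has to happen unconditionally after it. `createContext()` is a hypothetical stand-in that mimics the `cuCtxCreate` side effect of making the new context current on the calling thread only; `tlsCurrent` models that per-thread slot. This illustrates the pattern, not the code under review.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical context handle (stand-in for CUcontext).
struct Context {
  int device;
};

// Models the per-thread current-context slot (assumption).
thread_local Context *tlsCurrent = nullptr;

Context theContext{0};
int creations = 0; // only modified inside call_once, so no data race

// Hypothetical creation call with cuCtxCreate-like behavior: besides
// creating the context, it makes it current on the calling thread only.
Context *createContext() {
  ++creations;
  tlsCurrent = &theContext;
  return &theContext;
}

std::once_flag onceFlag;
Context *sharedContext = nullptr;

// Makes the shared context current on the calling thread.
void makeSharedContextCurrent() {
  // Runs on the first call only; std::call_once has no "else" branch for
  // the calls that skip it...
  std::call_once(onceFlag, [] { sharedContext = createContext(); });
  // ...so what every caller needs runs unconditionally afterwards. On the
  // creating thread this is redundant; on every other thread it is required.
  tlsCurrent = sharedContext;
  assert(tlsCurrent != nullptr);
}
```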