This commit adjusts the CUDA context management in the SerializeToCubin pass.
In particular, it uses the device 0 primary context instead of creating a new
CUDA context on each invocation of SerializeToCubin. This substantially reduces
compile time, especially when an application (such as a JIT compiler) calls
SerializeToCubin repeatedly.
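
The reasoning behind the change can be illustrated with a small toy model. This is only a sketch and not code from the patch: `ToyDriver` and both helper functions are hypothetical, no CUDA calls are made, and the model simply counts how often a context has to be initialised from scratch. It assumes that a private context (as made by `cuCtxCreate`) is initialised on every creation, that the primary context (as obtained by `cuDevicePrimaryCtxRetain`) is reference counted, and that the host application already holds a reference to the primary context while it compiles.

```cpp
#include <cassert>

// Toy stand-in for a GPU driver. All names here are hypothetical.
struct ToyDriver {
  int expensiveInits = 0; // number of from-scratch context initialisations
  int primaryRefs = 0;    // reference count of the shared primary context

  // Models a private context (cuCtxCreate / cuCtxDestroy in the driver API):
  // every creation pays the full initialisation cost.
  void createPrivateContext() { ++expensiveInits; }
  void destroyPrivateContext() {}

  // Models cuDevicePrimaryCtxRetain: initialise only if nobody holds it yet.
  void retainPrimary() {
    if (primaryRefs == 0)
      ++expensiveInits;
    ++primaryRefs;
  }

  // Models cuDevicePrimaryCtxRelease.
  void releasePrimary() {
    assert(primaryRefs > 0);
    --primaryRefs;
  }
};

// Old strategy: a fresh context for each serialization call.
int initsWithPrivateContexts(int calls) {
  ToyDriver driver;
  for (int i = 0; i < calls; ++i) {
    driver.createPrivateContext();
    // ... compile to cubin here ...
    driver.destroyPrivateContext();
  }
  return driver.expensiveInits;
}

// New strategy: retain the shared primary context for each call.
int initsWithPrimaryContext(int calls) {
  ToyDriver driver;
  driver.retainPrimary(); // assumption: the host application's own reference
  for (int i = 0; i < calls; ++i) {
    driver.retainPrimary();
    // ... compile to cubin here ...
    driver.releasePrimary();
  }
  driver.releasePrimary();
  return driver.expensiveInits;
}
```

In this model, 100 calls cost 100 initialisations with private contexts and a single one with the primary context. Real savings depend on the driver and on whether something else keeps the primary context alive between calls.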
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo