Instead of calling cuFuncGetAttribute with
CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK on every kernel invocation,
we can call it on the first invocation only and cache the result in the
KernelInfo struct. The only functional change is that we now expect
cuFuncGetAttribute to succeed; if it fails, the error is propagated
instead of being ignored. Ignoring errors seems like a slippery slope...
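
A minimal sketch of the caching idea, not the actual patch. To keep it compilable without CUDA, the driver call is replaced by a stand-in: FakeResult, FakeFunction, fakeFuncGetMaxThreads, NumDriverQueries, and getMaxThreadsPerBlock are all hypothetical names, and the KernelInfo shown here is a simplified assumption, not the plugin's real struct. In the plugin, the stand-in would correspond to cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.

```cpp
#include <cassert>

// Stand-ins for the CUDA driver API (assumptions, so this builds without CUDA).
enum FakeResult { FAKE_SUCCESS = 0, FAKE_ERROR = 1 };

struct FakeFunction {
  int MaxThreads; // value the fake "driver" reports for this kernel
  bool Broken;    // if true, the fake query fails
};

// Counts how often the fake "driver" is queried.
static int NumDriverQueries = 0;

// Stand-in for the attribute query: reports MaxThreads or fails.
FakeResult fakeFuncGetMaxThreads(int *Out, const FakeFunction &Func) {
  ++NumDriverQueries;
  if (Func.Broken)
    return FAKE_ERROR;
  *Out = Func.MaxThreads;
  return FAKE_SUCCESS;
}

// Simplified, hypothetical KernelInfo holding only what this sketch needs.
struct KernelInfo {
  FakeFunction Func;
  int MaxThreadsPerBlock = -1; // -1 means "not queried yet"
};

// Queries the driver only on the first call for a given kernel and caches
// the result. A failed query is propagated and nothing is cached.
FakeResult getMaxThreadsPerBlock(KernelInfo &Info, int *Out) {
  assert(Out && "output pointer must not be null");
  if (Info.MaxThreadsPerBlock < 0) {
    int Val = 0;
    FakeResult Err = fakeFuncGetMaxThreads(&Val, Info.Func);
    if (Err != FAKE_SUCCESS)
      return Err; // propagate instead of ignoring
    Info.MaxThreadsPerBlock = Val;
  }
  *Out = Info.MaxThreadsPerBlock;
  return FAKE_SUCCESS;
}
```

With this shape, repeated launches of the same kernel reuse the cached value, and a failing query surfaces to the caller rather than silently falling back to a default.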
Repository: rG LLVM Github Monorepo
Do we need to change the front end? This data structure should be generated by the FE, right?