With this change (plus some changes to prevent !invariant.load metadata from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic. This will let the optimizer work on these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.
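As a rough sketch of the idea (not the exact IR clang emits), a load tagged with !invariant.load metadata looks like this; the function name and types here are illustrative:

```llvm
; Hypothetical IR for __ldg(p) modeled as an invariant load from the
; global address space (addrspace(1)), instead of an @llvm.nvvm.ldg.*
; intrinsic. The !invariant.load metadata tells the optimizer the
; memory is never written during the program's execution, so the load
; can be freely reordered, hoisted, or vectorized.
define float @read_ldg(float addrspace(1)* %p) {
  %v = load float, float addrspace(1)* %p, !invariant.load !0
  ret float %v
}

!0 = !{}
```

Because the load is now ordinary IR rather than an opaque intrinsic, passes like the load-store vectorizer can combine adjacent invariant loads without needing target-specific knowledge.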
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
You may want to add a change to make sure explicit invariant loads work within kernels, too.
llvm/test/CodeGen/NVPTX/ldg-invariant.ll
Lines 21–22 (On Diff #67941): You may want to add a test case for an invariant load from a non-global address space.