In the "large stride" heuristic, ignore loads from the constant address
space (as well as the local address space). K$ behavior is very different
from L0$ behavior, so it doesn't make much sense to use the same heuristic
for both.
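As a rough illustration of the change described above, the following stand-alone sketch models a stride check that skips local and constant address-space loads. It is not the code from AMDGPUPerfHintAnalysis.cpp: the `LoadInfo` struct, the `countsAsLargeStride` helper, and the threshold value are assumptions made for this example, and the address-space enum is a local copy rather than the backend's own definition. Only the name `isConstantAddr` comes from the review discussion.

```cpp
#include <cstdint>

// Local copy of the AMDGPU address-space numbering, for this sketch only.
enum AddrSpace : unsigned {
  FLAT_ADDRESS = 0,
  GLOBAL_ADDRESS = 1,
  REGION_ADDRESS = 2,
  LOCAL_ADDRESS = 3,
  CONSTANT_ADDRESS = 4,
  PRIVATE_ADDRESS = 5,
};

// Hypothetical summary of one load: its address space and the byte distance
// from the previously seen access to the same base.
struct LoadInfo {
  unsigned addrSpace;
  std::int64_t strideBytes;
};

// Assumed threshold, chosen for illustration.
constexpr std::int64_t kLargeStrideThreshold = 64;

inline bool isLocalAddr(const LoadInfo &L) {
  return L.addrSpace == LOCAL_ADDRESS;
}

inline bool isConstantAddr(const LoadInfo &L) {
  return L.addrSpace == CONSTANT_ADDRESS;
}

// Returns true if the load should count toward the "large stride" cost.
// Local and constant address-space loads are skipped before the stride is
// even examined.
inline bool countsAsLargeStride(const LoadInfo &L) {
  if (isLocalAddr(L) || isConstantAddr(L))
    return false;
  const std::int64_t magnitude =
      L.strideBytes < 0 ? -L.strideBytes : L.strideBytes;
  return magnitude > kLargeStrideThreshold;
}
```

With these assumed values, a global load with a 256-byte stride counts toward the heuristic, while the same stride on a constant or local load does not. Note that this sketch filters purely on address space, which is exactly the approximation questioned in the review comments.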
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp | |
|---|---|
| 345–346 | This isn't really a good implementation. We use scalar loads in more cases, and the constant address space isn't a guarantee of SMEM loads. |
This LGTM, but it needs a test fix first, of course.
I think the test is OK now that I have rebased on D122804?
LGTM I guess, but isConstantAddr really should be fixed. CONSTANT_ADDRESS isn't sufficient, or even that helpful, for knowing whether this will use scalar loads.