Details
- Reviewers
- arsenm, foad
- Commits
- rG28eb9ed3bb5b: [AMDGPU] Fine tune LDS misaligned access speed
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1486–1488 | What do the numbers mean? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1486–1488 | More or less 'it operates at a speed comparable to an N-bit wide load'. With full alignment, ds128 is slower than ds96, for example. If underaligned, it is comparable in speed to a single dword access, which would then mean 32 < 128, and it is faster to issue a wide load regardless. 1 simply means 'slow, don't do it'. That is, when comparing an aligned load with a wider load that would no longer be aligned, the latter is slower. Essentially it is just a rank; these values are not additive. |
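
To make the "rank, not additive" point concrete, here is a minimal sketch of how such values might be consumed. It is not the code from SIISelLowering.cpp: the names `SpeedRank`, `SlowRank`, `isFasterThan`, and `shouldWiden` are hypothetical, and the widening policy is an assumption used only to illustrate that ranks are compared with each other and never summed.

```cpp
#include <cassert>

// Hypothetical type for the speed value discussed in the thread:
//   N > 1 : roughly as fast as an N-bit wide load
//   1     : slow, do not do it
using SpeedRank = unsigned;

// Hypothetical name for the "slow" rank.
constexpr SpeedRank SlowRank = 1;

// Ranks are ordinal: the only meaningful operation is comparison.
constexpr bool isFasterThan(SpeedRank A, SpeedRank B) { return A > B; }

// Assumed policy for illustration: replace a narrow access with a wider one
// only when the wider access does not rank below the narrow one.
constexpr bool shouldWiden(SpeedRank Narrow, SpeedRank Wide) {
  return !isFasterThan(Narrow, Wide);
}
```

Under this sketch, an aligned access ranked 64 would not be replaced by a wider access ranked 1, while a dword-speed access ranked 32 could be replaced by one ranked 128. Two accesses ranked 32 are not treated as equivalent to one ranked 64, because the values are never added.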
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1486–1488 | This needs to be commented |