Apply load merging to s_load instructions, as is already done for s_buffer_load.
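As a rough illustration of what merging a pair of scalar loads means, here is a minimal standalone sketch. It is not the code in SILoadStoreOptimizer.cpp: the `ScalarLoad` struct, the `tryMerge` helper, and the rule that merged widths must be 2, 4 or 8 dwords are all assumptions made for this example. Alignment and instruction-ordering constraints are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a scalar load with an immediate offset.
// These names are illustrative only; they are not LLVM types.
struct ScalarLoad {
  int baseReg;          // id of the register pair holding the base address
  uint32_t byteOffset;  // immediate byte offset from the base
  uint32_t dwords;      // number of 32-bit dwords loaded
};

// Assumption for this sketch: a merged load must be 2, 4 or 8 dwords wide.
static bool isSupportedWidth(uint32_t dwords) {
  return dwords == 2 || dwords == 4 || dwords == 8;
}

// Try to combine two loads into one wider load. Succeeds only when both
// loads use the same base, cover contiguous bytes, and the combined width
// is one of the supported sizes.
std::optional<ScalarLoad> tryMerge(const ScalarLoad &a, const ScalarLoad &b) {
  assert(a.dwords > 0 && b.dwords > 0);
  if (a.baseReg != b.baseReg)
    return std::nullopt;

  // Order the pair by offset so the merged load starts at the lower one.
  const ScalarLoad &lo = a.byteOffset <= b.byteOffset ? a : b;
  const ScalarLoad &hi = a.byteOffset <= b.byteOffset ? b : a;

  // The higher load must begin exactly where the lower one ends.
  if (lo.byteOffset + lo.dwords * 4 != hi.byteOffset)
    return std::nullopt;

  const uint32_t width = lo.dwords + hi.dwords;
  if (!isSupportedWidth(width))
    return std::nullopt;

  return ScalarLoad{lo.baseReg, lo.byteOffset, width};
}
```

In this model, two one-dword loads from the same base at byte offsets 0 and 4 combine into a single two-dword load at offset 0, while loads with a gap between them or with different bases are left alone.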
Repository: rG LLVM Github Monorepo
Event Timeline
Compilation testing on graphics shader corpus (~10k pipelines) for Navi10.
~15% reduction in s_load_dword instructions.
~1% reduction in s_waitcnt instructions.
~1% increase in s_mov instructions.
~0.75% overall decrease in instructions.
VGPR usage increase in ~400 pipelines (mostly 1-2 VGPRs).
VGPR usage decrease in ~500 pipelines.
Mean VGPR usage change: -0.02 VGPRs.
SGPR usage increase in ~3000 pipelines.
SGPR usage decrease in ~1000 pipelines.
Mean SGPR usage change: 0.62 SGPRs.
No changes in scratch usage.
Occupancy (executable wavefronts based on VGPR count) increased in 234 pipelines, decreased in 125 pipelines.
LGTM, thanks! The SILoadStoreOptimizer.cpp diff is impressively small.
In future perhaps we should also handle the *_SGPR_IMM forms of various SMEM loads.
Inline comment on llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp, lines 2335–2336:
Maybe rename to something like mergeSMEMLoadImmPair?