This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
ClosedPublic

Authored by critson on Jul 28 2022, 7:08 PM.

Details

Summary

Apply merging to s_load as is done for s_buffer_load.
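
As a rough illustration of what merging means here, the sketch below combines two scalar loads that share a base and cover adjacent byte ranges into one wider load. It is not the code from SILoadStoreOptimizer.cpp: the `ScalarLoad` struct and `tryMergePair` function are hypothetical names, the allowed merged widths (2, 4, or 8 dwords) are an assumption, and real legality checks such as aliasing, intervening instructions, and register constraints are ignored.

```cpp
#include <cstdint>

// Hypothetical, simplified model of a scalar load with an immediate offset.
// These names are illustrative and are not the pass's real data structures.
struct ScalarLoad {
  unsigned BaseReg;   // id of the SGPR pair holding the base address
  uint32_t Offset;    // immediate byte offset
  unsigned NumDwords; // number of dwords loaded
};

// Tries to combine two loads into one wider load. Returns true and writes the
// result to Merged when both loads share a base, are contiguous in memory,
// and the combined width is one of the assumed legal widths.
bool tryMergePair(const ScalarLoad &A, const ScalarLoad &B, ScalarLoad &Merged) {
  if (A.BaseReg != B.BaseReg)
    return false;
  // Order the pair by offset so the merged load starts at the lower address.
  const ScalarLoad &Lo = (A.Offset <= B.Offset) ? A : B;
  const ScalarLoad &Hi = (A.Offset <= B.Offset) ? B : A;
  // The higher load must begin exactly where the lower one ends.
  if (Lo.Offset + 4u * Lo.NumDwords != Hi.Offset)
    return false;
  // Assumption: only 2-, 4-, and 8-dword results are accepted.
  unsigned Width = Lo.NumDwords + Hi.NumDwords;
  if (Width != 2 && Width != 4 && Width != 8)
    return false;
  Merged = ScalarLoad{Lo.BaseReg, Lo.Offset, Width};
  return true;
}
```

In this model, two single-dword loads at offsets 0 and 4 from the same base merge into one 2-dword load at offset 0, which corresponds to replacing two s_load_dword instructions with one s_load_dwordx2.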

Event Timeline

critson created this revision. Jul 28 2022, 7:08 PM
Herald added a project: Restricted Project. Jul 28 2022, 7:08 PM
critson requested review of this revision. Jul 28 2022, 7:08 PM

Compilation testing on graphics shader corpus (~10k pipelines) for Navi10.

~15% reduction in s_load_dword instructions.
~1% reduction in s_waitcnt instructions.
~1% increase in s_mov instructions.
~0.75% overall decrease in instructions.

VGPR usage increase in ~400 pipelines (mostly 1-2 VGPRs).
VGPR usage decrease in ~500 pipelines.
Mean VGPR usage change: -0.02 VGPRs.

SGPR usage increase in ~3000 pipelines.
SGPR usage decrease in ~1000 pipelines.
Mean SGPR usage change: +0.62 SGPRs.

No changes in scratch usage.

Occupancy (executable wavefronts based on VGPR count) increased in 234 pipelines, decreased in 125 pipelines.

foad accepted this revision. Jul 29 2022, 7:19 AM

LGTM, thanks! The SILoadStoreOptimizer.cpp diff is impressively small.

In future perhaps we should also handle the *_SGPR_IMM forms of various SMEM loads.

Inline comment on llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp, line 2336:

Maybe rename to something like mergeSMEMLoadImmPair?

This revision is now accepted and ready to land. Jul 29 2022, 7:19 AM
critson marked an inline comment as done. Jul 29 2022, 7:41 PM
This revision was landed with ongoing or failed builds. Jul 29 2022, 7:41 PM
This revision was automatically updated to reflect the committed changes.