This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
ClosedPublic

Authored by kosarev on Sep 13 2022, 9:56 AM.

Diff Detail

Event Timeline

kosarev created this revision.Sep 13 2022, 9:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 9:56 AM
kosarev requested review of this revision.Sep 13 2022, 9:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 9:56 AM
rampitec added inline comments.Sep 13 2022, 1:34 PM
llvm/test/CodeGen/AMDGPU/merge-sbuffer-load.mir
136

Add couple negative tests maybe, one with different sgprs used and one with non-adjacent offsets?

foad added a comment.Sep 14 2022, 2:22 AM

I can't resist pointing out that this patch would be simpler if we never used the _SGPR form on subtargets where the _SGPR_IMM form is available.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
644

Don't you need to list the _SGPR forms here too?

I can't resist pointing out that this patch would be simpler if we never used the _SGPR form on subtargets where the _SGPR_IMM form is available.

It would be even simpler if we did this in the IR to begin with. It's only after codegen you have to worry so much about addressing mode minutia

kosarev updated this revision to Diff 460125.Sep 14 2022, 8:59 AM

Addressed review feedback.

kosarev marked 2 inline comments as done.Sep 14 2022, 9:03 AM
kosarev added inline comments.
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
644

Oh, that's a nice catch, thanks. I've updated hasSameBaseAddress() above to compare the number of address operands so tests could catch this as well.

kosarev marked an inline comment as done.Sep 14 2022, 9:11 AM

I can't resist pointing out that this patch would be simpler if we never used the _SGPR form on subtargets where the _SGPR_IMM form is available.

True, the implementation would be a bit simpler. I don't mind it either way, it's just that a slightly simpler implementation doesn't necessarily work as an obviously sufficient argument when it comes to user-faced things. I guess some wider discussion might help here.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
1388

Also changed this line to what looks to me a more reliable implementation.

This revision is now accepted and ready to land.Sep 14 2022, 1:48 PM
foad accepted this revision.Sep 15 2022, 3:12 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds.Sep 15 2022, 5:49 AM
This revision was automatically updated to reflect the committed changes.