This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix combined MMO in load-store merge
ClosedPublic

Authored by rampitec on Feb 22 2022, 4:39 PM.

Details

Summary

Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining MachineMemOperands of two instructions operands are
sent in the IR order into the combineKnownAdjacentMMOs. At the
moment it picks the first operand and just replaces its offset and
size. This essentially loses alignment information and may generally
result in an incorrect base pointer to be used.

Use a base pointer in memory addresses order instead and only adjust
size.

Diff Detail

Event Timeline

rampitec created this revision.Feb 22 2022, 4:39 PM
rampitec requested review of this revision.Feb 22 2022, 4:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 4:39 PM
Herald added a subscriber: wdng. · View Herald Transcript
foad added inline comments.Feb 23 2022, 1:40 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
667–668

If this comment was true ("A and B are identical except for size and offset") then your patch would not be necessary.

foad added a comment.Feb 23 2022, 3:05 AM

MIRParser is a bit strange, because it parses 'align 8' as 'basealign 8'. So your test case:

%1:vgpr_32 = GLOBAL_LOAD_DWORD %0, 4, 0, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 4, align 8, addrspace 1)
%2:vgpr_32 = GLOBAL_LOAD_DWORD %0, 0, 0, implicit $exec :: (load (s32) from `float addrspace(1)* undef`, align 4, addrspace 1)

really means:

%1:vgpr_32 = GLOBAL_LOAD_DWORD %0, 4, 0, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 4, basealign 8, addrspace 1)
%2:vgpr_32 = GLOBAL_LOAD_DWORD %0, 0, 0, implicit $exec :: (load (s32) from `float addrspace(1)* undef`, basealign 4, addrspace 1)

so the two MMOs have the same base pointer, but different information about how aligned it is, which probably would not happen in real codegen.

If you really want to handle that correctly, I suggest taking the larger of the two basealign values, instead of taking the one with the smaller offset.

Separately I think it would be less confusing if the test case said "basealign" explicitly, at least when specifying an alignment that is "more aligned" than the offset.

Separately maybe we should fix the MIRParser?

foad added a comment.Feb 23 2022, 3:46 AM

Separately maybe we should fix the MIRParser?

D120400.

MIRParser is a bit strange, because it parses 'align 8' as 'basealign 8'. So your test case:

%1:vgpr_32 = GLOBAL_LOAD_DWORD %0, 4, 0, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 4, align 8, addrspace 1)
%2:vgpr_32 = GLOBAL_LOAD_DWORD %0, 0, 0, implicit $exec :: (load (s32) from `float addrspace(1)* undef`, align 4, addrspace 1)

really means:

%1:vgpr_32 = GLOBAL_LOAD_DWORD %0, 4, 0, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 4, basealign 8, addrspace 1)
%2:vgpr_32 = GLOBAL_LOAD_DWORD %0, 0, 0, implicit $exec :: (load (s32) from `float addrspace(1)* undef`, basealign 4, addrspace 1)

so the two MMOs have the same base pointer, but different information about how aligned it is, which probably would not happen in real codegen.

If you really want to handle that correctly, I suggest taking the larger of the two basealign values, instead of taking the one with the smaller offset.

Separately I think it would be less confusing if the test case said "basealign" explicitly, at least when specifying an alignment that is "more aligned" than the offset.

Separately maybe we should fix the MIRParser?

The thing I have written in the test might be mannered, but this is exactly what I meant for the testing purposes: align is 8, basealign is 4, offset to it is 4, and we somehow managed to know the alignment of the value with offset to be 8. I know it will not happen in real life likely.

Regardless of the MIR parser I suppose we need to use a pointer info from the lead operation except for the size, because final operation will have the same base pointer as the lead.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
667–668

In fact I think this comment is generally incorrect. I'd probably better remove it al all. This is the point of the patch, use the MMO of the leading operation, this is why I have changed one of the pointers in the test to float, to spot that.

foad added a comment.Feb 23 2022, 8:55 AM

The thing I have written in the test might be mannered, but this is exactly what I meant for the testing purposes: align is 8, basealign is 4, offset to it is 4,

MachineMemOperand cannot represent that information. It only stores basealign explicitly, and it computes align on demand from basealign and offset. See MachineMemOperand::getAlign().

The thing I have written in the test might be mannered, but this is exactly what I meant for the testing purposes: align is 8, basealign is 4, offset to it is 4,

MachineMemOperand cannot represent that information. It only stores basealign explicitly, and it computes align on demand from basealign and offset. See MachineMemOperand::getAlign().

Hm... Indeed. But then 'align' itself shall not probably accepted by the MIR parser at all? Only the basealign is a real input?

Actually I think what is wrong here is the very attempt to find a proper MMO. It shall just accept an MMO of a lead operation explicitly and fix the size.

rampitec updated this revision to Diff 410875.Feb 23 2022, 11:19 AM
rampitec marked an inline comment as done.

Changed to detect a leading load by CombineInfo.Offset and not by the offset from an MMO.

rampitec updated this revision to Diff 410954.Feb 23 2022, 3:23 PM

Added CombineInfo::operator<().

rampitec updated this revision to Diff 410960.Feb 23 2022, 3:42 PM
foad accepted this revision.Feb 24 2022, 1:21 AM

LGTM, thanks. Choosing the first MMO (in pointer order) makes sense to me, as then we don't have to update the offset or the alignment info, just the size.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
667–668

In fact I think this comment is generally incorrect. I'd probably better remove it al all.

Agreed.

678

Typo "the leading operations's pointer".

This revision is now accepted and ready to land.Feb 24 2022, 1:21 AM
rampitec updated this revision to Diff 411176.Feb 24 2022, 10:29 AM
rampitec marked an inline comment as done.

Fixed typo in the comment.

This revision was landed with ongoing or failed builds.Feb 24 2022, 10:48 AM
This revision was automatically updated to reflect the committed changes.