This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add sanity check that fixes bad shift operation in AMD backend
ClosedPublic

Authored by konradkusiak97 on Jul 12 2023, 1:57 AM.

Details

Reviewers
foad
arsenm
Group Reviewers
Restricted Project
Summary

There is a problem with the SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to UB.

This boolean function decides if two masks can be combined into 1. The idea here is that the bits which are "on" in one mask, don't overlap with the "on" bits of the other. Consider an example (10 bits for simplicity):

Mask 1: 0101101000
Mask 2: 0000000110

Those can be combined into a single mask: 0101101110.

To check if such an operation is possible, the code takes the mask which is greater and counts how many 0s there are, starting from the LSB and stopping at the first 1. Then, it shifts 1u by this number and compares it with the smaller mask. The problem is that when both masks are 0, the counter will find 32 zeroes in the first mask and will try to do a shift by 32 positions which leads to UB.

The fix is a simple sanity check, if the bigger mask is 0 or not.

Diff Detail

Unit TestsFailed

Event Timeline

konradkusiak97 created this revision.Jul 12 2023, 1:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 1:57 AM
konradkusiak97 requested review of this revision.Jul 12 2023, 1:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 1:57 AM
foad added a reviewer: Restricted Project.Jul 12 2023, 2:27 AM

Seems reasonable.

Stray ] in first line of commit message, and the target is called "AMDGPU" not "AMD".

An alternative fix would be to avoid the shift altogether, by checking something like countl_zero(MinMask) + countr_zero(MaxMask) >= 32.

konradkusiak97 retitled this revision from [AMD] ]Add sanity check that fixes bad shift operation in AMD backend to [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend.Jul 12 2023, 2:29 AM
JonChesterfield added inline comments.
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

Should this be if masks are equal?

Not sure sanity check describes the condition, maybe drop the comment

Dropped the comment

konradkusiak97 added inline comments.Jul 12 2023, 3:23 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

Hm, If the masks are equal, the result will be returning false. I think that's the correct behaviour that the author of this function had in mind. That's based on the fact that there is <= and not < sign in if ((1u << AllowedBitsForMin) <= MinMask), so it really checks if the masks overlap - and two equal masks overlap fully.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

If equal masks imply they can't be combined, that should probably be true for the special case of equal masks that are zero.

I haven't looked at the call tree to determine what combining masks means in this context.

foad added inline comments.Jul 12 2023, 3:46 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

Firstly, there is no real use case for dmask=0. The resulting instructions would be invalid. All we have to do is handle it gracefully and not crash.

If equal masks imply they can't be combined, that should probably be true for the special case of equal masks that are zero.

I don't agree. This code is trying to check whether the range of bits set in one mask (from the lowest set bit to the highest including any gaps) overlaps with the range of set bits in the other. By that definition, two equal non-zero masks do overlap but two equal zero masks do not.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

It's called dmasksCanBeCombined though. If equal masks return false, and the zero case doesn't matter other than some arithmetic in the compiler, then masks equal zero returning false seems reasonable. Drive by review, not a blocking comment.

foad added inline comments.Jul 12 2023, 4:18 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

I still disagree. Returning false for all equal masks is no simpler to implement, and less easy to justify.

konradkusiak97 added inline comments.Jul 12 2023, 4:24 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

I'm also more directed towards returning true in the case of both masks being 0. Because you can combine two 0 masks. Whereas two equal, non-zero masks can't be trivially combined because they overlap.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

We already return false for all other equal masks. I'm proposing if (!MaxMask) return false.

I'm running on the heuristic that special cases show up in bug reports and there's no test case in this diff. I don't see zero as special other than we're tripping over it in the compiler.

Usually (x64) shift 32 behaves as if masking the value, so the other UB fix is probably &31 in the right place, which would also return false (I think, haven't checked carefully)

included all local changes

arsenm added inline comments.Jul 15 2023, 9:52 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
877

First I'd say countr_zero is just broken if this is UB.

Also needs testcase

Updated revision

konradkusiak97 added a comment.EditedAug 8 2023, 7:56 AM

I updated the fix to return false for this case. I've been trying to create a testcase for this but I'm still trying to figure out exactly how to write it so it triggers this specific case with 0 mask.

arsenm added a comment.Aug 8 2023, 8:39 AM

I updated the fix to return false for this case. I've been trying to create a testcase for this but I'm still trying to figure out exactly how to write it so it triggers this specific case with 0 mask.

You probably just have to write a MIR test

Created a testcase

Fixed the UB behaviour and included a testcase

arsenm accepted this revision.Aug 9 2023, 10:43 AM
This revision is now accepted and ready to land.Aug 9 2023, 10:43 AM

Thanks @arsenm. As I don't have the commit access, could you land this patch for me? Please use "Konrad Kusiak konrad.kusiak@codeplay" to commit the change.