This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fold out readfirstlane between vgpr to vgpr copies
Needs ReviewPublic

Authored by arsenm on Apr 25 2022, 6:54 AM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Summary

We were handling the SGPR->VGPR->SGPR case. This extends to handle
VGPR->SGPR->VGPR cases if exec wasn't modified between the use and
def. This cleans up a few cases used to assert uniformity if that
turned out to not be helpful.

Diff Detail

Event Timeline

arsenm created this revision.Apr 25 2022, 6:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 6:54 AM
arsenm requested review of this revision.Apr 25 2022, 6:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 6:54 AM
Herald added a subscriber: wdng. · View Herald Transcript
foad added inline comments.Apr 25 2022, 7:27 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case.

I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it.

b-sumner added inline comments.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

We use readfirstlane to "elect" a value from the currently active lines. The argument is likely not uniform, and breaking such code would be problematic.

arsenm added inline comments.Apr 25 2022, 1:11 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

I thought this was wrong at first but don't see where the problem is. If you're reading the value back into a VGPR with the same exec mask at a later point, where is the difference? At the copy to VGPR, you're copying the from the same lane

b-sumner added inline comments.Apr 25 2022, 1:29 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

This use of readfirstlane is broadcasting the value in the elected lane (.e. the first lane) to all other active lanes.

foad added inline comments.Apr 25 2022, 1:32 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

In the original code, every lane gets the same value in %2. If you remove the readfirstlane, they might get different values (if %0 is non-uniform).

arsenm added inline comments.Apr 25 2022, 1:33 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

It's broadcast to a scalar value, but as soon as you have a vector use, it's reduced down to the active lanes again. It's only uniform for scalar uses

arsenm added inline comments.Apr 25 2022, 1:36 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

Oh, I see here. I'm checking the wrong exec def point. I need to check exec at the point the readfirstlane source is defined, not the readfirstlane itself

foad added inline comments.Apr 25 2022, 1:47 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858

Even if exec doesn't change throughout the whole program, it's still semantically wrong to remove a readfirstlane like this, unless you can prove independently that the input to the readfirstlane is uniform.