We were handling the SGPR->VGPR->SGPR case. This extends to handle
VGPR->SGPR->VGPR cases if exec wasn't modified between the use and
def. This cleans up a few cases used to assert uniformity if that
turned out to not be helpful.
Details
- Reviewers
- None
- Group Reviewers
Restricted Project
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case. I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it. |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | We use readfirstlane to "elect" a value from the currently active lines. The argument is likely not uniform, and breaking such code would be problematic. |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | I thought this was wrong at first but don't see where the problem is. If you're reading the value back into a VGPR with the same exec mask at a later point, where is the difference? At the copy to VGPR, you're copying the from the same lane |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | This use of readfirstlane is broadcasting the value in the elected lane (.e. the first lane) to all other active lanes. |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | In the original code, every lane gets the same value in %2. If you remove the readfirstlane, they might get different values (if %0 is non-uniform). |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | It's broadcast to a scalar value, but as soon as you have a vector use, it's reduced down to the active lanes again. It's only uniform for scalar uses |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | Oh, I see here. I'm checking the wrong exec def point. I need to check exec at the point the readfirstlane source is defined, not the readfirstlane itself |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
1858 | Even if exec doesn't change throughout the whole program, it's still semantically wrong to remove a readfirstlane like this, unless you can prove independently that the input to the readfirstlane is uniform. |
This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case.
I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it.