These may be inserted to assert uniformity somewhere.
Diff Detail
Event Timeline
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | I do not think it is correct. If first lane is off you will fold wrong value. In both cases of readfirstlane and readlane you need to check a specific lane, which I am not sure even possible. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | readlane ignores exec. readfirstlane only reads exec to find the first lane, and still returns a result when exec is 0. The source value is a constant, so it doesn't matter what the lane is |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | Readlane ignores exec, but v_mov_b32 does not. So you may have an old value of vgpr instead of immediate. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | It would matter if there was an exec def between the copy and the use, which I want to ban within a block |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | But it is not banned yet, right? Are we sure that only happens within a block? Why block cannot be executed with a zero exec? And then at the very least we need to be sure register is virtual. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | We force execz skip branches around blocks that have readlane/readfirstlane in them. exec = 0 should also be illegal on function entry |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | I thought about it. From the ISA perspective it is illegal if you are reading an inactive lane. This can be legal here on the grounds that register is virtual and we are in SSA. Are both conditions true? |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | SGPR spilling does read/write VGPRs for inactive lanes, but the use of an SSA value for an invalid lane doesn't make sense. I'm not sure if the exec check is strictly necessary, but I've added it. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | I do not think this check really changes anything. Look: you have lane 8 disabled, both at def and use. Then you do readlane for the lane 8. It will pass the check and yield wrong value. So this check is not necessary and does not help too. There are just two conditions which may make it legal: 1) register is virtual 2) we are in SSA. If both a satisfied you can do it, because there can be no other valid def anyway. |
This cannot work as written, at least readlane part. Maybe readfirstlane can, providing skip is always inserted around and def is in the same block. Otherwise it needs to be a virtual register and SSA.
Why do we check for virtual registers all over the place place then, including the few lines above?
Also note, if virtual and SSA, then check for exec modification is not needed because that must be the only def anyway. Otherwise it reads undef that way or another.
Thanks, this is very helpful!
Does this collapse redundant readfirstlanes? I.e., the case where the result of a readfirstlane is immediately fed to another readfirstlane. I think it might fix that -- could you please add a corresponding test?
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | You do need the EXEC mask check either way because the use could be observing a def inside a divergent loop. For the readfirstlane case, that check should be sufficient. For the readlane case, I agree that you need to know the register is virtual and in SSA. But with that knowledge, I believe the code should be good as-is. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | Right. Exec check probably needed, at any rate it doesn't hurt. But readlane case needs either be removed or additional check needed. |
Add test for multiple folds
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | Both of those are known. This pass is only used SSA, and skips physical registers before this point |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | It is SSA, granted. But I do not see that we check for virtual register at all possible foldOperand() invocations, only in one of the cases. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | The search for users begins from a virtual register def. foldInstOperand is never called |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
525 | There are quite a few places where we call foldOperand(), including recursively. Only foldInstOperand() is guarded with the check. It is not immediately obvious none of that calls will end up with a virtual register only. Moreover, we are checking for virtual destination registers in the very same function. It will be safe to add the check here, otherwise I cannot prove to myself it is safe. |
I do not think it is correct. If first lane is off you will fold wrong value. In both cases of readfirstlane and readlane you need to check a specific lane, which I am not sure even possible.