Details
- Reviewers
foad - Commits
- rGc27f8fb96882: [AMDGPU] Remove cndmask from readsExecAsData
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
133 | The function name is confusing, all these instructions access EXEC just as normal vector instructions. One cause of the failure @sebastian-ne showed in D116053 is we are dropping convergent attribute when lowering amdgcn.icmp into V_CMP. I am not sure if there is any other reason. I think may be a right fix for the issue is define convergent version of V_CMP, I am not sure whether it sounds reasonable? Please also add some comments this is just workaround for the issue that we are lowering convergent icmp into normal V_CMP. We should fix it some other way. | |
140 | Did you see a failure caused by v_readfirstlane? if not, please also remove it. |
LGTM. I agree with all of @ruiling's comments but I don't think they should block this patch, which is clearly a step in the right direction.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
133 |
Yes, sounds good to me. There is also a related problem in that the "convergent" attribute in IR prevents an instruction from being sunk into an "if" but it does not prevent it from being hoisted out of an "if". For ballot we need both. I'm not sure if the MachineIR convergent flag does both. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
133 |
MachineIR have prohibited both sinking and hoisting. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
133 | The function name and the whole idea reflects that unlike other VALU where exec just predicates the execution these instructions use exec much like a normal operand which is reflected in the actual numerical answer. I.e., for cmp that is: dst = cmp(src0, src1) & exec For readfirstlane that is: dst = vgpr[cttz(exec)] Marking cmp as convergent might be a better fix because it will still allow to move compares within a block, although it sounds a little orthogonal to the exec use modeling to me. | |
140 | No, because it is convergent. But technically is has the same effect of using exec. |
The function name is confusing, all these instructions access EXEC just as normal vector instructions. One cause of the failure @sebastian-ne showed in D116053 is we are dropping convergent attribute when lowering amdgcn.icmp into V_CMP. I am not sure if there is any other reason. I think may be a right fix for the issue is define convergent version of V_CMP, I am not sure whether it sounds reasonable? Please also add some comments this is just workaround for the issue that we are lowering convergent icmp into normal V_CMP. We should fix it some other way.