Given a select_cc that produces a constant and the bitwise inverse of that constant for a signed greater-than-minus-one comparison (i.e. a check that the value is non-negative), we can produce an xor with an ashr instead, which gives smaller code. The ashr either sets all bits or clears all bits depending on whether the value is negative, and the result is then xor'd with the constant to optionally invert the value.
https://alive2.llvm.org/ce/z/DTFaBZ
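To make the shape of the fold concrete, here is a minimal i32 sketch; the constant 1234 (bitwise inverse -1235) is purely illustrative, and the alive2 link above is the actual correctness proof:

  define i32 @src(i32 %a) {
    %c = icmp sgt i32 %a, -1                 ; true when %a is non-negative
    %s = select i1 %c, i32 1234, i32 -1235
    ret i32 %s
  }

  define i32 @tgt(i32 %a) {
    %sh = ashr i32 %a, 31                    ; 0 if %a >= 0, all ones if %a < 0
    %x = xor i32 %sh, 1234                   ; 1234, or its bitwise inverse -1235
    ret i32 %x
  }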
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22888
Is it really beneficial if this cast is necessary? Should this check that the compared and selected types match (or maybe allow a truncate, if it's free)?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22892
Shouldn't this be done in SimplifySetCC, i.e. for all setcc nodes regardless of whether they are used in a select_cc or not?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22888
Any of the selecti32i64 and selecti8i32 tests will involve sign extending, and all of them look like they produce the same number of instructions or fewer. I've not seen any cases where it made the fold unprofitable (but I may not have tried all possibilities).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22892
Oh yeah! Will do.
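For reference, the selecti8i32 shape mentioned above would be roughly the following sketch, assuming the test names encode the compared type followed by the selected type (the constants are made up); the i8 compare means the ashr result has to be sign extended up to the selected i32 type:

  define i32 @selecti8i32(i8 %a) {
    %c = icmp sgt i8 %a, -1                  ; i8 compare
    %s = select i1 %c, i32 1234, i32 -1235   ; i32 result
    ret i32 %s
  }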
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22888
For AMDGPU, if the compared type was 64 bits and the selected type was 32 bits, this would be worse. The other way around would be beneficial.
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll:31–56
Do you mean one of these two tests? Are some of the instructions more costly than others?
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll:31–56
This case is an improvement. The case I think would regress would use a 64-bit type for the compare, i.e.

  %c = icmp sgt i64 %a, -1
  %s = select i1 %c, i32 C, i32 -C
  ret i32 %s
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll:31–56
Is that not selecti64i32 above?
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll:31–56
Oh yes, that is an improvement.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:3905 (On Diff #370256)
Split this off as its own patch? Handle the not-equal predicate too? We just need some tests like:

  define i1 @src(i32 %input) {
    %sh = ashr i32 %input, 31
    %c = icmp eq i32 %sh, -1
    ret i1 %c
  }

Note that instcombine generalizes this, so we don't care about the shift amount:

  // Canonicalize the shift into an 'and':
  // icmp eq/ne (shr X, ShAmt), C --> icmp eq/ne (and X, HiMask), (C << ShAmt)

I'm not sure if that makes sense here in codegen, but it might be worth a look to see how that changes things on various targets.
llvm/test/CodeGen/PowerPC/testComparesi32ltu.ll:75 (On Diff #370503)
Oh, I thought I had removed all of those. This just regenerates the test; there are no changes in this file now (there were in a previous version).
I've added oneusecmp tests, which show the same number of instructions or fewer for the cases I've tested. Let me know if I should add a one-use check in case other patterns are not better.
Bit hack looks slightly better to me too, so LGTM.
The safe approach would be to include the one-use check in a first commit and then remove it as a follow-up, so that the more aggressive fold lands on its own. That also makes it less likely that we'd have to revert the main patch if only the more aggressive case causes a regression somewhere.
Should there be a hasOneUse check on the condition operand (N0)? Do we need to add a (potentially negative) test?
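For reference, a negative (multi-use) test could look something like the sketch below; the function name and constants are made up. Here the compare has a second user, so folding only the first select would not remove the icmp, which is the situation a hasOneUse check on the condition would guard against:

  define i32 @multiuse_cmp(i32 %a, i32 %b, i32 %d) {
    %c = icmp sgt i32 %a, -1
    ; the compare feeds two selects, so it stays live even if the first select is folded
    %s1 = select i1 %c, i32 1234, i32 -1235
    %s2 = select i1 %c, i32 %b, i32 %d
    %r = add i32 %s1, %s2
    ret i32 %r
  }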