This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
ClosedPublic

Authored by dmgreen on Sep 2 2021, 5:28 AM.

Details

Summary

Given a select_cc producing a constant and a negation of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clear all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

Diff Detail

Event Timeline

dmgreen created this revision.Sep 2 2021, 5:28 AM
dmgreen requested review of this revision.Sep 2 2021, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2021, 5:28 AM
Herald added subscribers: MaskRay, wdng. · View Herald Transcript
arsenm added inline comments.Sep 2 2021, 6:08 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22888

Is it really beneficial if this cast is necessary? Should this check the compared and selected types match (or maybe allow truncate, if it's free)

foad added a subscriber: foad.Sep 2 2021, 6:15 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22892

Shouldn't this be done in SimplifySetCC, i.e. for all setCC regardless of whether they are used in a select_cc or not?

dmgreen added inline comments.Sep 2 2021, 6:49 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22888

Any of the selecti32i64 and selecti8i32 tests will involve sign extending, and all look like they produce less or equal numbers of instructions. I've not seen any cases where it made it unprofitable (but may not have tried all possibilities).

22892

Oh yeah! Will do.

dmgreen updated this revision to Diff 370256.Sep 2 2021, 6:52 AM
dmgreen edited the summary of this revision. (Show Details)

Moved setcc fold to SimplifySetCC

arsenm added inline comments.Sep 2 2021, 7:00 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22888

For AMDGPU if the compared type was 64-bits, and the selected type was 32-bits, this would be worse. The other way around would be beneficial

dmgreen added inline comments.Sep 2 2021, 8:42 AM
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll
31–56

Do you mean one of these two tests? Are some of the instructions more costly than others?

arsenm added inline comments.Sep 2 2021, 9:18 AM
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll
31–56

This case is an improvement. The case I think would regress would swap out a 64-bit type for the compare, i.e.

%c = icmp sgt i64 %a, -1
%s = select i1 %c, i32 C, i32 -C
ret i32 %s
dmgreen added inline comments.Sep 2 2021, 9:20 AM
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll
31–56

Is that not selecti64i32 above?

arsenm added inline comments.Sep 2 2021, 9:23 AM
llvm/test/CodeGen/AMDGPU/select-constant-xor.ll
31–56

oh yes, that is an improvement

spatel added inline comments.Sep 2 2021, 9:28 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3905 ↗(On Diff #370256)

Split this off as its own patch? Handle the not-equal predicate too?

We just need some tests like:

define i1 @src(i32 %input) {
  %sh = ashr i32 %input, 31
  %c = icmp eq i32 %sh, -1
  ret i1 %c
}

Note that instcombine generalizes this, so we don't care about the shift amount:

// Canonicalize the shift into an 'and':
// icmp eq/ne (shr X, ShAmt), C --> icmp eq/ne (and X, HiMask), (C << ShAmt)

I'm not sure if that makes sense here in codegen, but it might be worth a look to see how that changes things on various targets.

dmgreen updated this revision to Diff 370503.Sep 3 2021, 1:04 AM
dmgreen edited the summary of this revision. (Show Details)

Rebase, taking out the setcc combine.

spatel added inline comments.Sep 3 2021, 6:17 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22880

Should there be a hasOneUse check on the condition op (N0)? Need to add a (potentially negative) test?

llvm/test/CodeGen/PowerPC/testComparesi32ltu.ll
75 ↗(On Diff #370503)

What is happening on this test?

dmgreen added inline comments.Sep 3 2021, 7:56 AM
llvm/test/CodeGen/PowerPC/testComparesi32ltu.ll
75 ↗(On Diff #370503)

Oh, I thought I had removed all those. This is just regenerated the test, there are no changes in this file now (there was in a previous version).

dmgreen updated this revision to Diff 370585.Sep 3 2021, 7:57 AM

I've added oneusecmp tests, which show the same number or less instructions for the cases I've tested. let me know if I should add a oneuse check in case other patterns are not better.

spatel accepted this revision.Sep 3 2021, 8:28 AM

I've added oneusecmp tests, which show the same number or less instructions for the cases I've tested. let me know if I should add a oneuse check in case other patterns are not better.

Bit hack looks slightly better to me too, so LGTM.
The safe move would be to put the one-use clause into a first commit, then remove it as a follow-up, so we just get the more aggressive folds on their own. That also makes it less likely that we'd have to revert the main patch if only the more aggressive case causes a regression somewhere.

This revision is now accepted and ready to land.Sep 3 2021, 8:28 AM
This revision was landed with ongoing or failed builds.Sep 5 2021, 8:04 AM
This revision was automatically updated to reflect the committed changes.