This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2
ClosedPublic

Authored by spatel on Oct 27 2021, 10:00 AM.

Details

Summary

If C is a high-bit mask:
(trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?)

This extends the fold added with:
acabad9ff6bf (https://alive2.llvm.org/ce/z/aFr7qV)

We discussed using decomposeBitTestICmp() to generalize this, but that function doesn't line up with the other fold that I was imagining (maybe there's some way to adapt/invert the logic?).

This patch also modifies the code to create the mask constant from the earlier patch in an attempt to make the bit-masking relationships clearer. I can make that an NFC pre-commit to be safer.

Here are Alive2 generalizations the folds:
https://alive2.llvm.org/ce/z/u-ZpC_ (the previous patch)
https://alive2.llvm.org/ce/z/YsuAu2 (ult this patch)

https://alive2.llvm.org/ce/z/ekktQP (ugt low bitmask)
https://alive2.llvm.org/ce/z/pJY9wR (ugt one clear bit)

Diff Detail

Event Timeline

spatel created this revision.Oct 27 2021, 10:00 AM
spatel requested review of this revision.Oct 27 2021, 10:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2021, 10:00 AM
spatel edited the summary of this revision. (Show Details)Oct 27 2021, 10:03 AM
spatel updated this revision to Diff 384507.EditedNov 3 2021, 10:33 AM

Patch updated:
I am including the sibling pair of 'ugt' folds here, so we can try to see if there's a cleaner way to implement this.
decomposeBitTestICmp() assumes the new compare constant is zero, so we could use that as-is for 2 of these, but then we'd still have the other 2 folds here.
I'm not sure if generalizing that helper would be useful for any other potential callers.

spatel updated this revision to Diff 384517.Nov 3 2021, 10:44 AM

Patch updated:
Fixed a code comment that was copy-pasted without being updated to match the actual logic in the code.

lebedev.ri edited the summary of this revision. (Show Details)Nov 3 2021, 10:46 AM

Like i asked in mail, this really seems like something for decomposeBitTestICmp().

spatel added a comment.Nov 3 2021, 1:25 PM

Like i asked in mail, this really seems like something for decomposeBitTestICmp().

I don't disagree, but we have 2 at least problems:

  1. It doesn't cover the pair of folds that result in a non-zero constant for the new icmp.
  2. It catches signbit tests that may currently get folded in the opposite direction (see inline comment), so we'll need to remove a transform.

So I'm leaning towards adding these as the first step. Reversing the existing fold could lead to regressions in IR or codegen, so that seems more risky, and I'll potentially have to chase down regressions to get that to stick.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1897–1899

This is an opposing transform if we use decomposeBitMask to make this patch more general.
This was added with:
dfa3b0954145ec6

I noticed that there are a pair of similar folds for icmp (add X, C1), C just ahead of the fold in D113366. I think we're missing something similar to what I'm proposing here, but I haven't worked out the bitmask relationships.

Ping.
As I mentioned, I'd prefer not to try to add folds and invert an existing fold in one patch. I did add more instcombine tests for a follow-up patch that would use decomposeBitTestICmp.

lebedev.ri added inline comments.Nov 15 2021, 11:49 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4642

I'm guessing we prefer inequality check over relational? https://alive2.llvm.org/ce/z/9mn4Tx

4646

Are these supposed to be in this patch? I'm not seeing alive2 links in the description for them.

spatel edited the summary of this revision. (Show Details)Nov 15 2021, 12:16 PM
This revision is now accepted and ready to land.Nov 15 2021, 12:24 PM
spatel marked an inline comment as done.Nov 15 2021, 12:25 PM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4642

Yes, we should canonicalize to equality on that pattern:
https://alive2.llvm.org/ce/z/MWK88Q

4646

Yes, I put the 'ugt' patterns in for symmetry (but I can commit in pieces if that seems better). The alive2 links were in a subsequent comment, but I just added them to the description.

This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.
fhahn added a subscriber: fhahn.Mar 18 2022, 5:34 AM

It looks like this introduced a regression with -Oz: https://github.com/llvm/llvm-project/issues/53321

It would be great if you could take a look.

Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2022, 5:34 AM
MatzeB added a subscriber: MatzeB.Dec 1 2022, 6:20 PM

It seems this leads to slightly worse x86 codegen. Where we previously had:

%v1 = trunc i64 %v0 to i32
%cmp = icmp slt i32 %v1, 0

resulting in:

testl   %eax, %eax
js      .LBB0_1

instcombine now changes this to:

%v1 = and i64 %v0, 2147483648
%cmp = icmp eq i64 %v1, 0

resulting in

testl   $-2147483648, %eax              # imm = 0x80000000
jne     .LBB0_2

Though I guess we best fix this by adding more x86 patterns...

spatel added a comment.Dec 2 2022, 8:08 AM

It looks like this introduced a regression with -Oz: https://github.com/llvm/llvm-project/issues/53321

It would be great if you could take a look.

Sorry I missed this when it was sent - the code looks fine now, so I closed the bug.

spatel added a comment.Dec 2 2022, 8:12 AM

It seems this leads to slightly worse x86 codegen. Where we previously had:

%v1 = trunc i64 %v0 to i32
%cmp = icmp slt i32 %v1, 0

resulting in:

testl   %eax, %eax
js      .LBB0_1

instcombine now changes this to:

%v1 = and i64 %v0, 2147483648
%cmp = icmp eq i64 %v1, 0

resulting in

testl   $-2147483648, %eax              # imm = 0x80000000
jne     .LBB0_2

Though I guess we best fix this by adding more x86 patterns...

Yes, we can fix this up at some point in codegen/isel. For any mask+cmp of an i64/i32/i16 try to convert to a signbit test of a smaller power-of-2 type to avoid using a constant in the test instruction?

spatel added a comment.Dec 5 2022, 1:14 PM

Yes, we can fix this up at some point in codegen/isel. For any mask+cmp of an i64/i32/i16 try to convert to a signbit test of a smaller power-of-2 type to avoid using a constant in the test instruction?

Proposal to do this target-independently (but using target hooks to limit it):
https://reviews.llvm.org/D139363