This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Optimize and of icmps with power-of-2 and contiguous masks
ClosedPublic

Authored by jmciver on May 16 2022, 12:07 PM.

Details

Diff Detail

Event Timeline

jmciver created this revision.May 16 2022, 12:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2022, 12:07 PM

All six reported failing tests (due to timeouts) are passing locally in my build/test environment.

jmciver updated this revision to Diff 429934.May 16 2022, 9:55 PM

Fix capitzalation of function names containing ICMP

Based on functionality of the optimization I believe foldLogOpOfMaskedICmps to be a reasonable location. However, it maybe better to create a new function and call it from InstCombinerImpl::foldAndOrOfICmps based on the following.

The placement of the call to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes in the function foldLogOpOfMaskedICmps breaks the consistency of the conditional testing based on the conjunction of left and right mask (see variable Mask line 586). It is possible to place calls to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes in two locations in foldLogOpOfMaskedICmps:

  • In function foldLogOpOfMaskedIcmpsAsymmetric called from conditional block Mask == 0 (line 593)
  • In conditional block Mask & (Mask_NotAllZeros | BMask_NotAllOnes (line 655)

I opted against this due to readability.

Thanks in advance for any feedback!

jmciver published this revision for review.May 17 2022, 12:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 12:00 AM
nikic edited the summary of this revision. (Show Details)May 17 2022, 1:20 AM

To reduce performance impact I am thinking that the call to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes should be moved into a new function, foldPowerOf2AndWithLesserContinuous, and then call this function from foldAndOrOfICmps. This will help to keep the purpose of foldLogOpOfMaskedICmps clear and reduce the overhead by not testing for a less probable optimization early.

Please can you add vector test coverage - uniform / non-uniform cases and cases containing undef / poison elements

Thanks @RKSimon! I will work on adding the additional tests.

As per @RKSimon's feedback I have discovered a number of vector deficiencies in my implementation.

From reading https://developers.redhat.com/articles/2022/12/20/how-contribute-llvm# (which is excellent) I have a better understanding of how to structure this patch. I am currently working on further vector test case generation and will post it as a parent to this patch in the next day or so.

jmciver updated this revision to Diff 494079.Feb 1 2023, 3:07 PM

Add support for vectors:

  • C1 supports splat vectors with poison/undef.
  • C2 supports vectors with unique elements.
jmciver updated this revision to Diff 494099.Feb 1 2023, 3:59 PM

Fix no change test case updated in D143044

jmciver updated this revision to Diff 499012.Feb 20 2023, 6:48 PM

Update icmp-logical.ll test names as per recommendation in D143044

goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
502

Its a bit hard to follow the cases, can you add a comment specifying the transformation being done at each non nullptr return.

535

remove empty line.

jmciver updated this revision to Diff 508458.Mar 26 2023, 7:22 PM

Add comments to each non nullptr conditional.

Refactor leading ones and zeros comparison to use equality operator.

jmciver marked an inline comment as done.Mar 26 2023, 7:23 PM
jmciver added inline comments.Mar 26 2023, 7:28 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
502

Thanks for taking the time to review. Let me know if you think the comments help.

jmciver updated this revision to Diff 508483.Mar 26 2023, 10:54 PM

Rebase onto latest main.

jmciver edited the summary of this revision. (Show Details)Mar 27 2023, 8:41 PM

Added Alive2 URLs to patch summary.

goldstein.w.n added inline comments.Apr 25 2023, 5:38 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
484

Can you be explicit about what you are transforming this into (ult).

Also shouldn't it be (icmp (A & B) == 0) & (icmp (A & D) != E)?

489

C is unused AFAICT.

494

As a TODO (or here if you want to) you could also do:

`(icmp (A & B) == B) & (icmp (A & D) == D)` -> `(icmp (A & (B | D) == (B | D)`

https://alive2.llvm.org/ce/z/sFcvJP

543

Can you add some tests with undef elements.

Also if undef is allowed in the non-vec case as well, this check and the scalar one could be done with a lambda.

jmciver updated this revision to Diff 522027.May 14 2023, 3:42 PM
jmciver edited the summary of this revision. (Show Details)

Resolving feedback items identified by @goldstein.w.n

In function foldLogOpOfMaskedICmpsAllZerosBMaskNeqMixedOrContiguous:

  • Improved explicitness of target transformation in function comment
  • Removed unused function parameter
  • Added splat vector test using undef
  • Refactored test for compatibility into a lambda function
jmciver marked 2 inline comments as done.May 14 2023, 5:13 PM
jmciver added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
494
543

I added tests using undef splat vectors.

The conversion to a lambda is much cleaner. I have enabled support for undef/poison support in the scalar case. However, testing of this transformation using scalar undef or poison enables an earlier optimization path:

https://godbolt.org/z/7c87ba45v

The following Alive2 output show the scalar use of undef and poison with this transformation to be correct:

https://alive2.llvm.org/ce/z/9_eQny

jmciver marked 2 inline comments as not done.May 14 2023, 5:15 PM
goldstein.w.n added inline comments.May 16 2023, 9:17 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
494

You're right, for some reason thought it wasn't being handled.

535

nit(style): i -> I

543

Thanks. Just FYI, you can do multiple proofs in the same link by postfixing 'src' with either '\d+' or '_<desc>' i.e:
https://godbolt.org/z/M5eK16Ess

549

I think it would be clearer to organize this as follows:

if(IsVec) {
foreach(B, D, E) {
if(!isReducible) { return nullptr; }
}
}
else {
if(!isReducible) { return nullptr; }
}
return ICmpInst(ULT, A, D);
jmciver updated this revision to Diff 523189.May 17 2023, 3:29 PM

Refactor scalar and vector tests to emit icmp from one location

Prior to this update foldLogOpOfMaskedICmpsAllZerosBMaskNeqMixedOrContiguous
returned icmp expressions from two locations. Acceptance is now handled at the
end of the function resulting in a single location for icmp emission.

Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2023, 3:29 PM
jmciver updated this revision to Diff 523192.May 17 2023, 3:34 PM

Refactor scalar and vector tests to emit icmp from one location

Prior to this update foldLogOpOfMaskedICmpsAllZerosBMaskNeqMixedOrContiguous
returned icmp expressions from two locations. Acceptance is now handled at the
end of the function resulting in a single location for icmp emission.

jmciver marked 2 inline comments as done.May 17 2023, 3:39 PM
jmciver added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
543

Thanks for the Godbolt pointer!

549

I agree that is much cleaner. I have refactored accordingly. I added a comment to the function header stating that parameter B supports masking.

goldstein.w.n added inline comments.May 17 2023, 4:09 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
520

When do we transform to A & X eq/ne Y? I don't see that code.

543

nit: maybe:

} 
// Test scalar type arguments for conversion.    
else if (!isReducible(B, D, E)) {
     return nullptr;
}

Although given the LLVM style of '}' and 'else' on the same line its a bit out of place. No strong feelings so update if you see fit.

llvm/test/Transforms/InstCombine/icmp-logical.ll
2553 ↗(On Diff #523192)

One more thing. Can you put the "fail" keyword somewhere in the tests that you expect no change in. Think it helps make it crystal clear exact what the is meant to happen.

jmciver added inline comments.May 18 2023, 10:51 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
520

That is a great question. The A & X eq/ne Y is emitted by the other transformations in foldLogOpOfMaskedICmps. When I started developing this patch I picked this location as the classifier function getMaskedTypeForICmpPair converted the source to the form:
(icmp ne (A & B), C) | (icmp qe (A & D), E). This is violation of the original intended use foldLogOpOfMaskedICmps at it is targetting ICMPs of the same operator, which fold to (icmp(A & X) ==/!= Y).

My current thinking is to move this fold into one of the other called functions in foldAndOrOfICmps (option 1) or create its own function and then call that from foldAndOrOfICmps (option 2).

Option 1: One possible function is foldIsPowerOf2, but it is specific to ctpop based expressions so I am hesitant to expand functionality.

Option 2: Create a function called foldIsNegativePowerOf2 (open to other names), which will attempt the transform X <u 2^n to (X & ~(2^n-1)) == 0 and then perform the subsequent tests for folding as seen in foldLogOpOfMaskedICmpsAllZerosBMaskNeqMixedOrContiguous. This removes the need for verbose aforementioned function name and provides a concise "does one thing well" function.

I think option 2 is probably the best path. I'll go ahead and implement and see what it looks like. If you have other ideas let me know!

llvm/test/Transforms/InstCombine/icmp-logical.ll
2553 ↗(On Diff #523192)

I updated the names of tests that intentionally result in no change with a "_fail" postfix, see D143044.

jmciver marked an inline comment as not done.May 18 2023, 10:51 AM
jmciver updated this revision to Diff 524062.May 20 2023, 3:08 PM

Add foldPowerOf2AndShiftedMask(...) and call from InstCombinerImpl::foldAndOrOfICmps(...)

Place the folding of ((icmp X u< P) & (icmp(X & M) != M)) or ((icmp X s> -1) &
(icmp(X & M) != M)) into (icmp X u< M) in foldPowerOf2AndShiftedMask(...). This
function attempts the transform of (X <u 2^n) into (X & ~(2^n-1)) == 0 or (icmp
X s> -1) into (icmp (X & SignMask) == 0) and then performs additional checks in
foldNegativePower2AndShiftedMask(...).

This patch delta removes prior implementation work from
foldLogOpOfMaskedICmps(...).

goldstein.w.n added inline comments.May 20 2023, 5:20 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
520

Bah, I had left reply unsent. Agree option 2 is better (and looks like thats what you did so great)

goldstein.w.n accepted this revision.May 20 2023, 5:21 PM

LGTM. I'm not a maintainer so please wait 1/2 days to push so others may take a look.

This revision is now accepted and ready to land.May 20 2023, 5:21 PM

@goldstein.w.n thanks for taking the time to review.

I'll hold for a maintainers approval.

@goldstein.w.n thanks for taking the time to review.

I'll hold for a maintainers approval.

you probably want @nikic, hes the defacto maintainer of instcombine

@nikic this patch is for #54856, which originated from @erikdesjardins.

This refinment is not triggered by building LLVM or anything of the components of the llvm-test-suite.

What are the appropriate next steps to gain InstCombine maintainer sign off?

Thanks in advance for your feedback.

@nikic this patch is for #54856, which originated from @erikdesjardins.

This refinment is not triggered by building LLVM or anything of the components of the llvm-test-suite.

What are the appropriate next steps to gain InstCombine maintainer sign off?

Thanks in advance for your feedback.

you might try @RKSimon.

Also to be fair I do end-to-end reviews (just not a maintainer) and believe this is okay to go out as is. If you don't hear back
soon I don't think it would be an issue to push.
I can run the zorg buildbot tests on this for you if it would help you feel more comfortable about it, or of course you can
wait as you see fit.

@goldstein.w.n thanks for the reply.

I'll wait until Monday and then proceed to land the patch.

Is it customary to get approval for preceding NFC patches? If yes, can you please take a look at D143044?

Thanks,
John

@goldstein.w.n thanks for the reply.

I'll wait until Monday and then proceed to land the patch.

Is it customary to get approval for preceding NFC patches? If yes, can you please take a look at D143044?

No (ish). If you have push access I think its generally fine to push them with the implementation unless someone posts feedback on it.

Thanks,
John

This revision was landed with ongoing or failed builds.Jun 9 2023, 3:10 PM
This revision was automatically updated to reflect the committed changes.