This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add oneuse checks to shr + cmp constant folds.
ClosedPublic

Authored by aemerson on May 4 2023, 6:01 PM.

Details

Summary

This change has virtually no code size regressions on the llvm test suite (+ SPECs)
while having these improvements (measured with -Os on Darwin arm64):

External/S.../CFP2006/450.soplex/450.soplex    214024.00      213920.00     -0.0%
External/S...7speed/641.leela_s/641.leela_s     93412.00       93348.00     -0.1%
External/S...17rate/541.leela_r/541.leela_r     93412.00       93348.00     -0.1%
MultiSourc.../Applications/JM/lencod/lencod    426044.00      425748.00     -0.1%
MultiSourc...rks/mediabench/gsm/toast/toast     20436.00       20416.00     -0.1%
MultiSourc...ench/telecomm-gsm/telecomm-gsm     20436.00       20416.00     -0.1%
MultiSourc...Prolangs-C/assembler/assembler     16172.00       16156.00     -0.1%
MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode     35332.00       35256.00     -0.2%
SingleSour...Adobe-C++/stepanov_abstraction      6904.00        6888.00     -0.2%
External/SPEC/CINT2000/254.gap/254.gap         366060.00      365132.00     -0.3%
MultiSourc...-ProxyApps-C++/PENNANT/PENNANT     79688.00       79484.00     -0.3%
External/S...NT2006/464.h264ref/464.h264ref    352044.00      351132.00     -0.3%
SingleSour...arks/Adobe-C++/functionobjects     15524.00       15480.00     -0.3%
SingleSour...arks/Adobe-C++/stepanov_vector     10728.00       10696.00     -0.3%
SingleSour...ks/Misc-C++/stepanov_container     16900.00       16848.00     -0.3%
MultiSource/Applications/oggenc/oggenc         124184.00      123780.00     -0.3%
SingleSour...tout-C++/Shootout-C++-wordfreq      7060.00        7036.00     -0.3%
MultiSourc...ity-rijndael/security-rijndael      8976.00        8936.00     -0.4%
MultiSource/Benchmarks/McCat/18-imp/imp          9816.00        9772.00     -0.4%
SingleSour...chmarks/Misc-C++/stepanov_v1p2      1772.00        1764.00     -0.5%
MultiSourc...iabench/g721/g721encode/encode      5492.00        5464.00     -0.5%
MultiSourc...rks/McCat/03-testtrie/testtrie      1364.00        1344.00     -1.5%
SingleSour.../execute/GCC-C-execute-pr42833       400.00         364.00     -9.0%

Doing so also prevents a regression described in https://reviews.llvm.org/D143624

Diff Detail

Event Timeline

aemerson created this revision.May 4 2023, 6:01 PM
goldstein.w.n added inline comments.
llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
900

Can you split the tests to a seperate patch so we can see the dif?

aemerson added inline comments.May 4 2023, 7:16 PM
llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
900

You can just look at the test immediately before this one to see what the original codegen is. The difference is just that the code is this case isn’t combined.

Where are the codegen improvements coming from?
Better imm values or something? Personally this feels a bit of a hack at the expense of potential ILP.

Where are the codegen improvements coming from?
Better imm values or something? Personally this feels a bit of a hack at the expense of potential ILP.

I'll look into exactly what changed in the codegen tomorrow. Why does this feel like a hack as opposed to all the other hasOneUse() checks in the optimizers?

Where are the codegen improvements coming from?
Better imm values or something? Personally this feels a bit of a hack at the expense of potential ILP.

I'll look into exactly what changed in the codegen tomorrow. Why does this feel like a hack as opposed to all the other hasOneUse() checks in the optimizers?

The hasOneUse checks are there to prevent creating more instructions. Even with multi-use on ashr this combine
doesn't create more instructions and does shrink the critical path so it passes muster for an InstCombine.

If it is only darwin arm64 that cares about this change, why not handle it in the backend?

Where are the codegen improvements coming from?
Better imm values or something? Personally this feels a bit of a hack at the expense of potential ILP.

I'll look into exactly what changed in the codegen tomorrow. Why does this feel like a hack as opposed to all the other hasOneUse() checks in the optimizers?

The hasOneUse checks are there to prevent creating more instructions. Even with multi-use on ashr this combine
doesn't create more instructions and does shrink the critical path so it passes muster for an InstCombine.

Ok, that's fair.

If it is only darwin arm64 that cares about this change, why not handle it in the backend?

First of all, I didn't say that only Darwin arm64 cares about this, it's just the system that I happened to run the test suite build on. Secondly, these combines aren't easy to undo (at least one of them involves combing two constants into one). If anything, I think these combines make more sense to do at the codegen level than here but I could be wrong.

So looking at two cases, in oggenc there are quite a few occurrences of select patterns being broken by this combine (because the compare immediate is being munged into something not useful). These result in fewer simplifications which will be both larger and slower.

In MiBench/security-rijndael, we have as you say unfriendly immediate to compares being generated which causes us to have to materialize to registers. This is both a performance and code size hit.

dmgreen accepted this revision.May 7 2023, 4:16 PM

I can verify that this fixes the issues, including the other one from https://reviews.llvm.org/D143624#4315468 that looks more difficult to reverse. It may be worth adding some tests like https://godbolt.org/z/cxKrfM4TK. But otherwise it LGTM. Getting more ILP sounds like something the backend should handle, if it is profitable given the constants.

This revision is now accepted and ready to land.May 7 2023, 4:16 PM

I can verify that this fixes the issues, including the other one from https://reviews.llvm.org/D143624#4315468 that looks more difficult to reverse. It may be worth adding some tests like https://godbolt.org/z/cxKrfM4TK. But otherwise it LGTM. Getting more ILP sounds like something the backend should handle, if it is profitable given the constants.

I think the generic statement "more ILP is better than less ILP" is fair for instcombine to make. Choosing which constants are profitable (what this patch is trying to approximate), however, still doesn't make sense to me.

I can verify that this fixes the issues, including the other one from https://reviews.llvm.org/D143624#4315468 that looks more difficult to reverse. It may be worth adding some tests like https://godbolt.org/z/cxKrfM4TK. But otherwise it LGTM. Getting more ILP sounds like something the backend should handle, if it is profitable given the constants.

I think the generic statement "more ILP is better than less ILP" is fair for instcombine to make.

Do you have an alternative suggestion given that we can’t really undo some of these transforms? ILP is not the only consideration that instcombine has, code size (which we can demonstrate is improved by this change) is an equally important goal.

Choosing which constants are profitable (what this patch is trying to approximate), however, still doesn't make sense to me.

We’re not trying to approximate choosing constants. We’re just adding a one use check.

nikic requested changes to this revision.May 8 2023, 12:09 PM

Please add tests as @dmgreen suggested (and also for whatever pattern you saw in oggenc, if it's a different one) and precommit them. See https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests if you are unfamiliar with the precommit workflow.

I'm open to considering this, as ultimately what we care about in InstCombine is what form is most beneficial for further transforms. Blocking this in the multi-use case will lose us some folds that could have made use of the simplified condition, but your data seems to suggest that in practice we lose more by failing to match select patterns. The alternative is to try harder when matching those select patterns.

However, if we want to do this change, you will first have to implement LVI support to effectively undo this when reasoning about condition ranges. That is, LVI needs to know how to determine the range of X given a condition of the form (ashr X, C) pred C2. This works automatically if such comparisons get folded, but requires special support if they don't.

This revision now requires changes to proceed.May 8 2023, 12:09 PM

Please add tests as @dmgreen suggested (and also for whatever pattern you saw in oggenc, if it's a different one) and precommit them. See https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests if you are unfamiliar with the precommit workflow.

I'm open to considering this, as ultimately what we care about in InstCombine is what form is most beneficial for further transforms. Blocking this in the multi-use case will lose us some folds that could have made use of the simplified condition, but your data seems to suggest that in practice we lose more by failing to match select patterns. The alternative is to try harder when matching those select patterns.

However, if we want to do this change, you will first have to implement LVI support to effectively undo this when reasoning about condition ranges. That is, LVI needs to know how to determine the range of X given a condition of the form (ashr X, C) pred C2. This works automatically if such comparisons get folded, but requires special support if they don't.

I'm not familiar with LVI, but I plugged in:

define i1 @ashrsgt_01_00(i4 %x, ptr %p) {
  %s = ashr i4 %x, 1
  %c = icmp sgt i4 %s, 0
  ret i1 %c
}

into opt and even after instcombine folds that to a single icmp, LVI doesn't seem to know anything about %x, just that it's overdefined. I assume you mean something else?

nikic added a comment.May 9 2023, 4:16 AM

@aemerson You need to use the comparison as a branch condition, something like this:

define i1 @test(i4 %x) {
  %s = ashr i4 %x, 1
  %c = icmp sgt i4 %s, 0
  br i1 %c, label %if, label %else
if:
  %c2 = icmp sgt i4 %x, 1
  ret i1 %c2
else:
  ret i1 false
}

CVP will not be able to fold this anymore if the comparison isn't moved to %x in a multi-use scenario (for that matter, InstCombine won't either, which is something it does handle right now).

aemerson updated this revision to Diff 557818.Oct 20 2023, 2:32 PM

Rebase on top of pre-committed tests.

@nikic Now that the LVI patch has went in, is this ok?

nikic accepted this revision.Oct 26 2023, 2:39 AM

LGTM

llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
3–4

Adjust comment.

This revision is now accepted and ready to land.Oct 26 2023, 2:39 AM