This is an archive of the discontinued LLVM Phabricator instance.


Differential D68408

[InstCombine] Negator - sink sinkable negations
ClosedPublic

Authored by lebedev.ri on Oct 3 2019, 10:27 AM.

Details

Reviewers

spatel
nikic
efriedma
xbolva00
vitalybuka
dvyukov

Commits

rG352fef3f11f5: [InstCombine] Negator - sink sinkable negations

Summary

As we have discussed previously (e.g. in D63992 / D64090 / PR42457), the sub instruction
can almost be considered non-canonical. While we do convert sub %x, C -> add %x, -C,
we only sparsely do that for non-constants. But we should.

Here, i propose to interpret sub %x, %y as add (sub 0, %y), %x IFF the negation can be sunk into %y.

This has some potential to cause endless combine loops (either around PHIs, or if there are some opposite transforms).
For the former, there's the -instcombine-negator-max-depth option to mitigate it, should this expose any such issues.
For the latter, if there are still any such opposing folds, we'd need to remove the colliding fold.
In any case, reproducers are welcome!
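
As a hedged illustration of the rewrite proposed in the summary (this is not code from the patch), the following Python sketch models wrapping integer arithmetic and checks exhaustively, for an assumed 8-bit width, that sub %x, %y and add (sub 0, %y), %x produce the same value. The helper names (`wrap`, `add`, `sub`, `sub_as_add_of_neg`) and the width are hypothetical choices made for the example.

```python
BITS = 8                      # assumed width, small enough to check exhaustively
MASK = (1 << BITS) - 1


def wrap(v):
    """Reduce v modulo 2**BITS, the way an iN integer wraps."""
    return v & MASK


def add(a, b):
    return wrap(a + b)


def sub(a, b):
    return wrap(a - b)


def sub_as_add_of_neg(x, y):
    """sub %x, %y rewritten as add (sub 0, %y), %x."""
    return add(sub(0, y), x)


def identity_holds():
    """True if both forms agree for every pair of BITS-wide operands."""
    return all(sub(x, y) == sub_as_add_of_neg(x, y)
               for x in range(1 << BITS)
               for y in range(1 << BITS))
```

The constant case mentioned in the summary, sub %x, C -> add %x, -C, is the same identity with %y fixed to C, so `sub(x, 3) == add(x, wrap(-3))` holds for every x as well.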

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 3 2019, 10:27 AM

Herald added a subscriber: hiraditya. Oct 3 2019, 10:27 AM

lebedev.ri edited the summary of this revision. Oct 3 2019, 10:28 AM

lebedev.ri edited the summary of this revision. Oct 3 2019, 10:33 AM

Actually, sub can be freely negated.

This *is* unusual. Is this too ugly to live?

I'd prefer to actually transform the operation tree at the point we decide it's profitable, to make it clear we can't end up in an infinite loop or something like that. As it is, you're depending on some other transforms happening in a particular order, and it's not clear that will happen consistently. (Yes, it's a little more code, but I think that's okay.)

In D68408#1695357, @efriedma wrote:

This *is* unusual. Is this too ugly to live?

I'd prefer to actually transform the operation tree at the point we decide it's profitable, to make it clear we can't end up in an infinite loop or something like that. As it is, you're depending on some other transforms happening in a particular order, and it's not clear that will happen consistently. (Yes, it's a little more code, but I think that's okay.)

Actually, that's precisely why i believed this may be the best approach - we both avoid lots of code duplication,
and if there are missing folds this *will* shake them loose - if they don't happen it'll "be caught" as a deadlock.

I'm not really sure how a more direct approach would look..

In D68408#1695399, @lebedev.ri wrote:

In D68408#1695357, @efriedma wrote:

This *is* unusual. Is this too ugly to live?

I'd prefer to actually transform the operation tree at the point we decide it's profitable, to make it clear we can't end up in an infinite loop or something like that. As it is, you're depending on some other transforms happening in a particular order, and it's not clear that will happen consistently. (Yes, it's a little more code, but I think that's okay.)

I'm not really sure how a more direct approach would look..

Okay so this is nowhere near polished-enough, but how about this?

Herald added a subscriber: mgorny. Oct 5 2019, 3:02 PM

Give more thought as to whether the new instructions should be inserted or not, and actually succeed in building test-suite.

@efriedma any high-level feedback on this?

Hmm, that's a bit more complicated than I hoped it would be... but I don't see any obvious way to simplify it.

It looks like this speculatively creates new instructions, then erases them on failure?

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
2	InstCombineAddSub.cpp?

In D68408#1708655, @efriedma wrote:

Hmm, that's a bit more complicated than I hoped it would be... but I don't see any obvious way to simplify it.

The one big missing thing is to use an actual worklist instead of recursion, but i'm not sure how to do that yet.

It looks like this speculatively creates new instructions, then erases them on failure?

Yes. More specifically, while speculatively creating them, it inserts them into an internal worklist
and in their proper places in basic blocks. If we succeed in negating the entire tree,
that worklist is then propagated to instcombine itself.
If that doesn't happen, then they are 'trivially' dead and deleted.
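
The speculative scheme described in this reply can be sketched with a toy expression tree. This is a minimal Python model, not the Negator implementation: the `Expr` type, the small rule set, and `MAX_DEPTH` (a stand-in for the -instcombine-negator-max-depth option) are all assumptions made for illustration. New nodes are recorded in a local list while recursing; if the whole tree cannot be negated, that list is discarded instead of being handed on.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_DEPTH = 4  # hypothetical stand-in for -instcombine-negator-max-depth


@dataclass(frozen=True)
class Expr:
    op: str            # one of "const", "var", "add", "sub", "mul"
    args: Tuple = ()


def negate(e: Expr, depth: int, created: List[Expr]) -> Optional[Expr]:
    """Return an expression equal to -e, or None if e is not freely negatible."""
    if depth > MAX_DEPTH:
        return None
    if e.op == "const":
        new = Expr("const", (-e.args[0],))
    elif e.op == "sub":                       # -(a - b) == b - a
        a, b = e.args
        new = Expr("sub", (b, a))
    elif e.op == "add":                       # -(a + b) == (-a) + (-b)
        na = negate(e.args[0], depth + 1, created)
        if na is None:
            return None
        nb = negate(e.args[1], depth + 1, created)
        if nb is None:
            return None
        new = Expr("add", (na, nb))
    elif e.op == "mul":                       # -(a * b) == (-a) * b, else a * (-b)
        a, b = e.args
        mark = len(created)
        na = negate(a, depth + 1, created)
        if na is not None:
            new = Expr("mul", (na, b))
        else:
            del created[mark:]                # drop leftovers of the failed attempt
            nb = negate(b, depth + 1, created)
            if nb is None:
                return None
            new = Expr("mul", (a, nb))
    else:
        return None                           # a plain variable needs a real negation
    created.append(new)
    return new


def try_negate(e: Expr):
    """Negate e speculatively; on failure, discard every node created on the way."""
    created: List[Expr] = []
    result = negate(e, 0, created)
    if result is None:
        created.clear()                       # speculative nodes are dead
    return result, created


def evaluate(e: Expr, env) -> int:
    """Evaluate an expression with unbounded integers, for checking only."""
    if e.op == "const":
        return e.args[0]
    if e.op == "var":
        return env[e.args[0]]
    a, b = (evaluate(arg, env) for arg in e.args)
    return {"add": a + b, "sub": a - b, "mul": a * b}[e.op]
```

In this model only a successful `try_negate` returns a non-empty `created` list, which mirrors the point made in the reply: the caller's worklist sees the new nodes only when the entire tree was negated.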

Thank you for taking a look!
Also handle trunc, fix comments.

If that doesn't happen, then they are 'trivially' dead and deleted.

Deleted where, exactly? If you're expecting the instcombine main loop to delete them, you'll force instcombine into an infinite loop, I think.

In D68408#1710214, @efriedma wrote:

If that doesn't happen, then they are 'trivially' dead and deleted.

Deleted where, exactly? If you're expecting the instcombine main loop to delete them, you'll force instcombine into an infinite loop, I think.

Hmm, they are not added to the instcombine's worklist unless we succeed in negating
the entire tree, and this passes test-suite with no infinite looping.

You are saying that we should instead DCE instructions in *our* worklist if we fail, correct?

Erase newly-created/inserted instruction if negation failed.

Bump

bump

Ping

@efriedma - do you want to continue reviewing?

I've just pointed out a few nits for now.

llvm/lib/Transforms/InstCombine/InstCombineInternal.h
958	typo: attempt
llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
169	it's -> its
183	it's -> its
193	negatible one -> negatible if one
193	it's -> its
244	typo: temporarily
llvm/test/Transforms/InstCombine/sub-of-negatible.ll
159–160	I didn't follow the diffs here - was one of these tests redundant? The code comment didn't match before, but it still doesn't?
271–272	Add these tests with baseline results as pre-commit?
351	it's -> its

In D68408#1767626, @spatel wrote:

@efriedma - do you want to continue reviewing?

In D68408#1767626, @spatel wrote:

I've just pointed out a few nits for now.

Nits addressed.

There are some more patterns that aren't handled here.
The idea is to ideally move them all here.

llvm/test/Transforms/InstCombine/sub-of-negatible.ll
159–160	The test was too complicated, was checking more than the minimal pattern - subtraction can be freely negated by swapping its operands.
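
The claim in this reply, that a subtraction is negated simply by swapping its operands, can be checked directly. The sketch below is a small Python model with an assumed 8-bit width and hypothetical helper names; it compares 0 - (a - b) with b - a under wrapping arithmetic.

```python
BITS = 8                      # assumed width for an exhaustive check
MASK = (1 << BITS) - 1


def neg_of_sub(a, b):
    """0 - (a - b), computed with wrapping iN arithmetic."""
    return (0 - ((a - b) & MASK)) & MASK


def swapped_sub(a, b):
    """b - a, the same value obtained by swapping the operands."""
    return (b - a) & MASK


def swap_is_negation():
    """True if both forms agree for every pair of BITS-wide operands."""
    return all(neg_of_sub(a, b) == swapped_sub(a, b)
               for a in range(1 << BITS)
               for b in range(1 << BITS))
```

No new instruction is needed for this case, which is why it serves as the minimal test pattern.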

Do you have stats on how often this fires? Impact on compile-time?

Remove more specific pattern-matching from InstCombiner::visitSub() simultaneously with adding this more general functionality, so we don't have redundancy (and limit compile-time impact)?

lebedev.ri mentioned this in rGb89ba5f9399a: [NFC][InstCombine] Autogenerate check lines in a few tests. Dec 4 2019, 2:19 PM

NOT READY FOR REVIEW

In D68408#1769213, @spatel wrote:

Remove more specific pattern-matching from InstCombiner::visitSub() simultaneously with adding this more general functionality, so we don't have redundancy (and limit compile-time impact)?

I was hoping to do that in steps, but i guess that's one way to boost those stats :))
I think i have moved everything relevant from InstCombiner::visitSub() now.
Observations:

There's an artificial one-use restriction that needs to go away (when Depth=0 and we are looking at sub 0, %x)
We lose/do not propagate no-wrap flags
InstCombiner::visitAdd() change is needed because of how good we get at sinking negations - else there are two opposite folds. It should be done beforehand, separately.
Same with InstCombiner::foldAddWithConstant() change, not entirely related to the diff.
There are some other regressions.

lebedev.ri mentioned this in D71064: [InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`). Dec 5 2019, 6:21 AM

lebedev.ri mentioned this in rG796fa662f128: [InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to…. Dec 5 2019, 10:24 AM

Rebased, slightly better, but not there yet still.

xbolva00 added a subscriber: xbolva00. Dec 5 2019, 10:55 AM

Still not for review

Some more unreachable code removed from InstCombiner::visitSub()

Maybe you can change title to [WIP] or [NOT READY FOR REVIEW]?

xbolva00 added inline comments. Dec 6 2019, 4:41 AM

llvm/test/Transforms/InstCombine/sub.ll
1183 ↗	(On Diff #232532)	Remove FIXME?

spatel mentioned this in D72007: [InstCombine] try to pull 'not' of select into compare operands. Jan 4 2020, 6:17 AM

nikic mentioned this in D72978: [InstCombine] Combine neg of shl of sub (PR44529). Jan 18 2020, 8:05 AM

nikic mentioned this in rG0b83c5a78fae: [InstCombine] Combine neg of shl of sub (PR44529). Jan 22 2020, 2:04 PM

spatel mentioned this in D77230: [InstCombine] enhance freelyNegateValue() by handling xor. Apr 1 2020, 10:23 AM

spatel mentioned this in rG3d9004879118: [InstCombine] enhance freelyNegateValue() by handling xor. Apr 1 2020, 12:23 PM

spatel mentioned this in rG1008435f3d47: Revert "[InstCombine] do not exclude min/max from icmp with casted operand fold". Apr 2 2020, 7:01 AM

Rebased.

Finally understood the interfering transforms - this fundamentally interferes with D48754.
Yes, i see the PhaseOrdering regressions if we revert that.
Not yet sure how to deal with this, i see 3 options:

don't do such canonicalization (what i've done here)
don't implement Negator
add Abs IR instruction (i think not)
Don't try to sink negation when it's used by abs/nabs pattern

This doesn't hang check-llvm/test-suite.

Harbormaster failed remote builds in B52802: Diff 256768! Apr 11 2020, 7:27 AM

Handle PHIs, too.
Strangely, i distinctly recall seeing some preexisting test
causing an endless cycle, but i'm no longer observing that problem.

Harbormaster failed remote builds in B52817: Diff 256801! Apr 11 2020, 4:00 PM

xbolva00 added inline comments. Apr 11 2020, 4:11 PM

llvm/test/Transforms/PhaseOrdering/min-max-abs-cse.ll
39 ↗	(On Diff #256801)	Regression

And solve potential endless combine cycle by not trying to sink negations
if that instruction has a user that looks like abs/nabs.

I think this is it for now, so review is welcome.

llvm/test/Transforms/PhaseOrdering/min-max-abs-cse.ll
39 ↗	(On Diff #256801)	https://reviews.llvm.org/D68408#1976039

In D68408#1976039, @lebedev.ri wrote:

add Abs IR instruction (i think not)

Is the answer the same for an intrinsic? The more we see these kinds of problems, the more I wish we had created intrinsics for abs/smin/smax/umin/umax long ago. We keep trying to work around this limitation in IR, but it doesn't seem worth it. We have abs() functions in source and an abs node in codegen, so we're creating intermediate ops that don't exist on either side of IR.

Harbormaster failed remote builds in B52854: Diff 256855! Apr 12 2020, 8:32 AM

In D68408#1976790, @spatel wrote:

In D68408#1976039, @lebedev.ri wrote:

add Abs IR instruction (i think not)

Is the answer the same for an intrinsic?

Ack, i meant instruction/intrinsic interchangeably here.

The more we see these kinds of problems, the more I wish we had created intrinsics for abs/smin/smax/umin/umax long ago. We keep trying to work around this limitation in IR, but it doesn't seem worth it. We have abs() functions in source and an abs node in codegen, so we're creating intermediate ops that don't exist on either side of IR.

Note that we similarly don't have saturating multiplication, overflowing/saturating left-shift.

In D68408#1976807, @lebedev.ri wrote:

In D68408#1976790, @spatel wrote:

In D68408#1976039, @lebedev.ri wrote:

add Abs IR instruction (i think not)

Is the answer the same for an intrinsic?

Ack, i meant instruction/intrinsic interchangeably here.

We have grown more accepting of intrinsics (overflow, saturating, funnel, etc.) recently, so I'm not sure if the old arguments against an abs intrinsic still hold. Is there anything in particular about abs() that makes it different?

The more we see these kinds of problems, the more I wish we had created intrinsics for abs/smin/smax/umin/umax long ago. We keep trying to work around this limitation in IR, but it doesn't seem worth it. We have abs() functions in source and an abs node in codegen, so we're creating intermediate ops that don't exist on either side of IR.

Note that we similarly don't have saturating multiplication, overflowing/saturating left-shift.

Yes, there's no clear line that I know of. It's like IR canonicalization - we make the rules up as we go along and adapt the surrounding logic to work with the current format. I think we've overcome the limitation for this patch, so we don't need to gate this patch on a decision, but I think we still have the problem. The clearest sign is that -- within instcombine -- we have bailouts for otherwise accepted canonicalizations only if we "matchSelectPattern()".

In D68408#1977656, @spatel wrote:

Yes, there's no clear line that I know of. It's like IR canonicalization - we make the rules up as we go along and adapt the surrounding logic to work with the current format. I think we've overcome the limitation for this patch, so we don't need to gate this patch on a decision, but I think we still have the problem. The clearest sign is that -- within instcombine -- we have bailouts for otherwise accepted canonicalizations only if we "matchSelectPattern()".

FWIW I agree that we need to reevaluate this decision and at least introduce min/max intrinsics. These are special-cased in too many places for the current treatment as a simple compare and select to still make sense. Especially when we take into account that there seems to be a new min/max related infinite loop every other month, and that the icmp/select representation has ill-defined behavior when it comes to undef. I expect the effort for introducing these intrinsics will be pretty similar to saturating add/sub, and would be happy to chip in...

+1 for new abs/min/max intrinsics

It'd be great if we could separate the new intrinsics discussion from this patch.

In D68408#1977656, @spatel wrote:

In D68408#1976807, @lebedev.ri wrote:

In D68408#1976790, @spatel wrote:

In D68408#1976039, @lebedev.ri wrote:

add Abs IR instruction (i think not)

Is the answer the same for an intrinsic?

Ack, i meant instruction/intrinsic interchangeably here.

We have grown more accepting of intrinsics (overflow, saturating, funnel, etc.) recently, so I'm not sure if the old arguments against an abs intrinsic still hold. Is there anything in particular about abs() that makes it different?

No, i just don't want to condition this patch on introduction of whole new set of intrinsics :]

spatel mentioned this in D74484: [AggressiveInstCombine] Add support for ICmp instr that feeds a select intsr's condition operand.. Apr 14 2020, 6:55 AM

In D68408#1978925, @nikic wrote:

In D68408#1977656, @spatel wrote:

Yes, there's no clear line that I know of. It's like IR canonicalization - we make the rules up as we go along and adapt the surrounding logic to work with the current format. I think we've overcome the limitation for this patch, so we don't need to gate this patch on a decision, but I think we still have the problem. The clearest sign is that -- within instcombine -- we have bailouts for otherwise accepted canonicalizations only if we "matchSelectPattern()".

FWIW I agree that we need to reevaluate this decision and at least introduce min/max intrinsics. These are special-cased in too many places for the current treatment as a simple compare and select to still make sense. Especially when we take into account that there seems to be a new min/max related infinite loop every other month, and that the icmp/select representation has ill-defined behavior when it comes to undef. I expect the effort for introducing these intrinsics will be pretty similar to saturating add/sub, and would be happy to chip in...

Sounds like general agreement for the people on this review (and sorry for going off-topic for this specific patch). I'll post something to llvm-dev after trying to dig up the earlier discussions for those intrinsics.

And looks like we have this month's infinite loop here :) -
https://bugs.llvm.org/show_bug.cgi?id=45539

In D68408#1981910, @spatel wrote:

In D68408#1978925, @nikic wrote:

In D68408#1977656, @spatel wrote:

Yes, there's no clear line that I know of. It's like IR canonicalization - we make the rules up as we go along and adapt the surrounding logic to work with the current format. I think we've overcome the limitation for this patch, so we don't need to gate this patch on a decision, but I think we still have the problem. The clearest sign is that -- within instcombine -- we have bailouts for otherwise accepted canonicalizations only if we "matchSelectPattern()".

FWIW I agree that we need to reevaluate this decision and at least introduce min/max intrinsics. These are special-cased in too many places for the current treatment as a simple compare and select to still make sense. Especially when we take into account that there seems to be a new min/max related infinite loop every other month, and that the icmp/select representation has ill-defined behavior when it comes to undef. I expect the effort for introducing these intrinsics will be pretty similar to saturating add/sub, and would be happy to chip in...

Sounds like general agreement for the people on this review (and sorry for going off-topic for this specific patch). I'll post something to llvm-dev after trying to dig up the earlier discussions for those intrinsics.

Note that i've posted https://github.com/AliveToolkit/alive2/pull/353 with my vision of their modelling.
Notably, i don't think we should have nabs, and i believe abs should be similar to the cttz/ctlz
in the sense that it should take a second param - i1 NSW.
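
One possible reading of the "second param" idea in this comment is sketched below in Python. It is an assumption about the intended semantics, not a quote from the linked proposal: when the flag is set, abs of the minimum signed value is treated as poison (modelled here as `None`); when it is clear, the result wraps back to the minimum value. The function and parameter names are hypothetical.

```python
def abs_int(x, bits, nsw):
    """Model abs on a signed `bits`-wide integer.

    nsw=True  : abs(INT_MIN) is poison, modelled as None (assumed semantics).
    nsw=False : abs(INT_MIN) wraps and stays INT_MIN.
    """
    int_min = -(1 << (bits - 1))
    if x == int_min:
        return None if nsw else int_min
    return -x if x < 0 else x
```

For example, `abs_int(-128, 8, False)` gives -128, while `abs_int(-128, 8, True)` gives the poison marker; every other input behaves the same under both settings.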

In D68408#1978925, @nikic wrote:

In D68408#1977656, @spatel wrote:

Yes, there's no clear line that I know of. It's like IR canonicalization - we make the rules up as we go along and adapt the surrounding logic to work with the current format. I think we've overcome the limitation for this patch, so we don't need to gate this patch on a decision, but I think we still have the problem. The clearest sign is that -- within instcombine -- we have bailouts for otherwise accepted canonicalizations only if we "matchSelectPattern()".

FWIW I agree that we need to reevaluate this decision and at least introduce min/max intrinsics. These are special-cased in too many places for the current treatment as a simple compare and select to still make sense. Especially when we take into account that there seems to be a new min/max related infinite loop every other month, and that the icmp/select representation has ill-defined behavior when it comes to undef. I expect the effort for introducing these intrinsics will be pretty similar to saturating add/sub, and would be happy to chip in...

I think i'd like to try to handle that, since out of the people in this discussion
only i (well, and @xbolva00) haven't dealt with that before; it may be good to spread the knowledge.

Ping. What would it take to get this moving? :)

Compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=f52e0507574b4fd84dc4674536f5dfbab396c0f6&to=0a009b654793dee8e335c053eb043e297071e0d1&stat=instructions

Change does not seem to have cost above noise, apart from a -0.6% improvement on tramp3d-v4. There's a corresponding -0.74% reduction in code-size, so this transform is clearly doing (or enabling) something big there. It might be interesting (but not necessary) to take a quick look at what happens there.

In D68408#1991910, @nikic wrote:

Compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=f52e0507574b4fd84dc4674536f5dfbab396c0f6&to=0a009b654793dee8e335c053eb043e297071e0d1&stat=instructions

I was actually trying to get that info, and i'm not sure what step i'm missing other than pushing the [[ https://github.com/LebedevRI/llvm-project/tree/perf/instcombine-negator | perf/* ]] branch?

Change does not seem to have cost above noise, apart from a -0.6% improvement on tramp3d-v4. There's a corresponding -0.74% reduction in code-size, so this transform is clearly doing (or enabling) something big there. It might be interesting (but not necessary) to take a quick look at what happens there.

Ah, interesting.
I actually expected this to have measurable negative (bad) cost,
so it's a pleasant surprise to see beneficial numbers here :)

In D68408#1991930, @lebedev.ri wrote:

In D68408#1991910, @nikic wrote:

Compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=f52e0507574b4fd84dc4674536f5dfbab396c0f6&to=0a009b654793dee8e335c053eb043e297071e0d1&stat=instructions

I was actually trying to get that info, and i'm not sure what step i'm missing other than pushing the [[ https://github.com/LebedevRI/llvm-project/tree/perf/instcombine-negator | perf/* ]] branch?

I need to manually add your fork as a remote first, so branches get picked up (too many forks to listen to all of them). I've done that now.

In D68408#1991910, @nikic wrote:

Compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=f52e0507574b4fd84dc4674536f5dfbab396c0f6&to=0a009b654793dee8e335c053eb043e297071e0d1&stat=instructions

Change does not seem to have cost above noise, apart from a -0.6% improvement on tramp3d-v4. There's a corresponding -0.74% reduction in code-size, so this transform is clearly doing (or enabling) something big there. It might be interesting (but not necessary) to take a quick look at what happens there.

Yes, it would be good to derive a regression test from that benchmark and/or invent some larger tests that show the greater optimization power of the new code. Unless I missed it, all of the current test diffs show that we do no harm, but if we can show that the added code/complexity buys us something immediately, that makes the benefit clear.

In D68408#1992227, @spatel wrote:

In D68408#1991910, @nikic wrote:

Compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=f52e0507574b4fd84dc4674536f5dfbab396c0f6&to=0a009b654793dee8e335c053eb043e297071e0d1&stat=instructions

Change does not seem to have cost above noise, apart from a -0.6% improvement on tramp3d-v4. There's a corresponding -0.74% reduction in code-size, so this transform is clearly doing (or enabling) something big there.
It might be interesting (but not necessary) to take a quick look at what happens there.

Yes, it would be good to derive a regression test from that benchmark

old.ll.xz (853 KB)

new.ll.xz (846 KB)

The impact there is quite noticeable as per llvm-diff, which almost immediately crashes.
It appears that a lot more inlining happens (functions now-missing in new.ll),
and some more function 'specialization' (functions now-appearing in new.ll).
I'm not sure i can distill/filter that to make any reasonable test case..

In D68408#1992227, @spatel wrote:

and/or invent some larger tests that show the greater optimization power of the new code.
Unless I missed it, all of the current test diffs show that we do no harm, but if we can show that the added code/complexity buys us something immediately, that makes the benefit clear.

The claim i'm making in the patch's description is that we almost consider sub instruction
non-canonical, and we should be trying to fold it away as much as possible.
These test cases show the common patterns that we currently miss, where we can get rid of it.

This approach is what was requested in

In D68408#1695357, @efriedma wrote:

This *is* unusual. Is this too ugly to live?

I'd prefer to actually transform the operation tree at the point we decide it's profitable, to make it clear we can't end up in an infinite loop or something like that. As it is, you're depending on some other transforms happening in a particular order, and it's not clear that will happen consistently. (Yes, it's a little more code, but I think that's okay.)

Given that we have compile-time data now, LGTM.
The implementation goes beyond my normal casual C++ usage (eg, I'd never seen 'zip' before), so if someone else can take a 2nd/final look too, that would be great.

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
134	Remove/update comment.

This revision is now accepted and ready to land. Apr 20 2020, 1:02 PM

nikic added inline comments. Apr 20 2020, 1:51 PM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	Noticed while looking through the tramp3d-v4 diff: This should be behind a one-use check, to avoid duplicating expensive division instructions.

xbolva00 added inline comments. Apr 20 2020, 1:52 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1923	Just wondering if we could use some better name for this lambda than “cleanup”.

lebedev.ri added inline comments. Apr 20 2020, 2:19 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1923	I couldn't come up with a better one, any suggestions? `TryToNarrowDeduceFlags()`? This is where `goto` might make sense, but somehow i don't want to use it..

xbolva00 added inline comments. Apr 20 2020, 2:48 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1923	Your idea is fine I think.

lebedev.ri added inline comments. Apr 20 2020, 2:52 PM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	I was hesitant about this one indeed, it isn't a typo that one-use check has gone away here, because we generally consider only instruction count. @spatel thoughts? But the main, bigger question this touches is: "but what if all the uses would get negated by us? In future, can we somehow sanely model the whole negatible tree, not giving up at non-single-use instructions, but defer that to after we've finished building new tree?"

spatel added inline comments. Apr 21 2020, 5:52 AM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	I missed that logic difference, and I'm not getting a reviewable diff of the attached files with llvm-diff or other apps. Can you create an IR example/regression test for that?

lebedev.ri added inline comments. Apr 21 2020, 6:06 AM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	@spatel To be clear, the "bigger question" is pretty rhetorical, or at least not for this review. The actual question here is whether we should consider `sdiv` special, and even if we can negate it without increasing instruction count, we should only do so if there are no other uses of old `sdiv`.

spatel added inline comments. Apr 21 2020, 6:45 AM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	There is precedent for this kind of special treatment. In other words, not all opcodes are equal in terms of analysis (and secondary concern of codegen), and we will even increase instruction count to avoid some like div/rem (although those transforms are probably currently not safe with respect to poison). If it would be NFC-ish to keep the one-use check, then we should do that. Then, remove the limitation as a follow-up if that can be shown useful?
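
To illustrate what is at stake in this sub-thread, here is a small Python model (an assumption-laden sketch, not the patch's logic). It checks that negating a signed division by a constant can be done by negating the constant, for an assumed 8-bit type, and it counts the divisions left behind when the old `sdiv` still has other users. The exclusion of the divisors 0, 1 and INT_MIN is my own choice for keeping the rewritten division free of overflow, and all names are hypothetical.

```python
BITS = 8                                  # assumed width for an exhaustive check
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def sdiv(a, b):
    """Signed division truncating toward zero; caller avoids b == 0 and overflow."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def negated_sdiv_matches(c):
    """Check -(x sdiv c) == x sdiv (-c) wherever the original division is defined."""
    if c in (0, 1, INT_MIN):              # excluded divisors (see lead-in)
        return False
    for x in range(INT_MIN, INT_MAX + 1):
        if x == INT_MIN and c == -1:
            continue                      # the original division already overflows
        if -sdiv(x, c) != sdiv(x, -c):
            return False
    return True


def divisions_after_negation(other_uses_of_old_sdiv):
    """Divisions remaining once the negated use becomes x sdiv (-c)."""
    return 1 if other_uses_of_old_sdiv == 0 else 2
```

With no other users the old division disappears and the count stays at one; with other users both divisions survive, which is the duplication of an expensive instruction that the one-use check guards against.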

Updated: adjust comments, lambda name, guard sdiv with an artificial one-use-check.

lebedev.ri added inline comments. Apr 21 2020, 10:50 AM

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp
181	Alright, i'll add one-use check just to get this moving :)

Harbormaster failed remote builds in B54127: Diff 259055! Apr 21 2020, 11:21 AM

Closed by commit rG352fef3f11f5: [InstCombine] Negator - sink sinkable negations (authored by lebedev.ri). Apr 21 2020, 12:27 PM

This revision was automatically updated to reflect the committed changes.

Hi,

This change causes a performance regression in tsan, as detected on our LLVM buildbot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/49850/steps/tsan%20analyze/logs/stdio

The script that comes with tsan checks the number of PUSHes, etc. in some of the key tsan functions,
where each extra PUSH causes tsan to be slower.

With this change, the number of PUSHes went from 3 to 4.

Please take a look, this might be a performance regression for a wider set of targets.

Before your change:

read1 tot 484; size 1830; rsp 1; push 3; pop 15; call 2; load 24; store  9; sh  46; mov 106; lea   2; cmp  76

After your change:

read1 tot 515; size 1980; rsp 1; push 4; pop 4; call 2; load 24; store  9; sh  46; mov 113; lea   2; cmp  90
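
For readers comparing the two counter lines by eye, the following Python sketch parses them and reports only the counters that changed. The two strings are copied from the report above; the helper names `parse_stats` and `deltas` are hypothetical and are not part of analyze_libtsan.sh.

```python
BEFORE = ("read1 tot 484; size 1830; rsp 1; push 3; pop 15; call 2; load 24; "
          "store  9; sh  46; mov 106; lea   2; cmp  76")
AFTER = ("read1 tot 515; size 1980; rsp 1; push 4; pop 4; call 2; load 24; "
         "store  9; sh  46; mov 113; lea   2; cmp  90")


def parse_stats(line):
    """Split 'name key value; key value; ...' into (name, {key: value})."""
    name, rest = line.split(None, 1)
    counters = {}
    for field in rest.split(";"):
        key, value = field.split()
        counters[key] = int(value)
    return name, counters


def deltas(before, after):
    """Return {counter: after - before} for every counter that changed."""
    _, old = parse_stats(before)
    _, new = parse_stats(after)
    return {key: new[key] - old[key] for key in old if new[key] != old[key]}
```

Applied to the two lines, `deltas(BEFORE, AFTER)` reports tot +31, size +150, push +1, pop -11, mov +7 and cmp +14; the remaining counters are unchanged.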

Script to reproduce (in the llvm-project root dir, with "build" subdir)

#!/bin/bash

compile() {
  clang -c -O2  compiler-rt/lib/tsan/rtl/tsan_rtl.cpp -I compiler-rt/lib -Wall -std=c++14 -Wno-unused-parameter -O2 -g -DNDEBUG    -m64 -fno-lto -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fno-sanitize=safe-stack -fvisibility=hidden -fno-lto -O3 -gline-tables-only -Wno-gnu -Wno-variadic-macros -Wno-c99-extensions -Wno-non-virtual-dtor -fPIE -fno-rtti -msse3 -Wframe-larger-than=530 -Wglobal-constructors
}

git checkout a13dce1d90cba6c55252dee0a2600eab37ffbc44
(cd build; ninja clang 2> /dev/null)
compile
compiler-rt/lib/tsan/analyze_libtsan.sh tsan_rtl.o | grep read1

git checkout 352fef3f11f5ccb2ddc8e16cecb7302a54721e9f
(cd build; ninja clang 2> /dev/null)
compile
compiler-rt/lib/tsan/analyze_libtsan.sh tsan_rtl.o | grep read1

In D68408#1998112, @kcc wrote:

Hi,

Hi.

This change causes a performance regression in tsan, as detected on our LLVM buildbot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/49850/steps/tsan%20analyze/logs/stdio

Looks like the build was red already: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/49808
That explains why i didn't see the new failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/49809

The script that comes with tsan checks the number of PUSHes, etc. in some of the key tsan functions,
where each extra PUSH causes tsan to be slower.

With this change, the number of PUSHes went from 3 to 4.

Please take a look, this might be a performance regression for a wider set of targets.

Before your change:
read1 tot 484; size 1830; rsp 1; push 3; pop 15; call 2; load 24; store  9; sh  46; mov 106; lea   2; cmp  76
After your change:
read1 tot 515; size 1980; rsp 1; push 4; pop 4; call 2; load 24; store  9; sh  46; mov 113; lea   2; cmp  90

Interesting. Not entirely unexpected; there's always the possibility of an avalanche effect with IR changes.

~~Unhelpful answer: wow how things have regressed since rL342092 / D51985~~

Script to reproduce (in the llvm-project root dir, with "build" subdir)

#!/bin/bash

compile() {
clang -c -O2  compiler-rt/lib/tsan/rtl/tsan_rtl.cpp -I compiler-rt/lib -Wall -std=c++14 -Wno-unused-parameter -O2 -g -DNDEBUG    -m64 -fno-lto -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fno-sanitize=safe-stack -fvisibility=hidden -fno-lto -O3 -gline-tables-only -Wno-gnu -Wno-variadic-macros -Wno-c99-extensions -Wno-non-virtual-dtor -fPIE -fno-rtti -msse3 -Wframe-larger-than=530 -Wglobal-constructors
}

git checkout a13dce1d90cba6c55252dee0a2600eab37ffbc44
(cd build; ninja clang 2> /dev/null)
compile
compiler-rt/lib/tsan/analyze_libtsan.sh tsan_rtl.o | grep read1

git checkout 352fef3f11f5ccb2ddc8e16cecb7302a54721e9f
(cd build; ninja clang 2> /dev/null)
compile
compiler-rt/lib/tsan/analyze_libtsan.sh tsan_rtl.o | grep read1

old.ll (1 MB)

old.o (84 KB)

new.ll (1 MB)

new.o (85 KB)

The llvm-diff report is pretty large; IR instruction-wise, this appears to be a win overall (-639 +609 instructions).
Visually, i think i can spot only two IR patterns that we now fail to fold:

in function _ZN6__tsan10InitializeEPNS_11ThreadStateE:
  in block %if.then88.i:
    >   %22 = xor i64 %xor.i.i.i, -17592186044417
    >   %mul.i.i194.neg.i = add i64 %22, 1
    >   %sub.i = add i64 %mul.i.i194.neg.i, %mul.i.i203.i
    <   %mul.i.i194.i = xor i64 %xor.i.i.i, 17592186044416
    <   %sub.i = sub i64 %mul.i.i203.i, %mul.i.i194.i
  in block %if.then88.1.i:
    >   %35 = xor i64 %xor.i.i.1.i, -17592186044417
    >   %mul.i.i194.neg.1.i = or i64 %mul.i.i203.1.i, 1
    >   %sub.1.i = add i64 %mul.i.i194.neg.1.i, %35
    <   %mul.i.i194.1.i = xor i64 %xor.i.i.1.i, 17592186044416
    <   %sub.1.i = sub i64 %mul.i.i203.1.i, %mul.i.i194.1.i

Filed https://bugs.llvm.org/show_bug.cgi?id=45647
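
For the first pattern above, one quick way to confirm that the old and new IR compute the same value is to model both in wrapping 64-bit arithmetic. The following is only a sketch and not part of the patch: the names `old_form`, `new_form` and `forms_agree` are made up here, and the sample values are arbitrary. It relies on 17592186044416 being 2^44, on -17592186044417 being its bitwise complement, and on -v == ~v + 1.

```cpp
#include <cassert> // For callers that want to assert on the result.
#include <cstdint>

// Old IR: %m = xor i64 %x, 17592186044416 ; %r = sub i64 %y, %m
static uint64_t old_form(uint64_t x, uint64_t y) {
  const uint64_t C = uint64_t{1} << 44; // 17592186044416
  return y - (x ^ C);
}

// New IR: %t = xor i64 %x, -17592186044417 ; %n = add i64 %t, 1 ;
//         %r = add i64 %n, %y
static uint64_t new_form(uint64_t x, uint64_t y) {
  const uint64_t NotC = ~(uint64_t{1} << 44); // -17592186044417 as i64
  return ((x ^ NotC) + 1) + y;
}

// Compares both forms on a handful of arbitrary sample values.
static bool forms_agree() {
  const uint64_t Samples[] = {0, 1, 42, uint64_t{1} << 44, ~uint64_t{0},
                              0x123456789abcdef0ULL};
  for (uint64_t X : Samples)
    for (uint64_t Y : Samples)
      if (old_form(X, Y) != new_form(X, Y))
        return false;
  return true;
}
```

Both forms agree, but the new one takes three instructions where the old one took two, which is consistent with describing it as a pattern that is no longer folded.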

But i believe, i'm supposed to look at the @__tsan_read1 function, right?
Then the relevant diff is:

in function __tsan_read1:
  in block %if.then.i1390.i.i.i:
    >   %sub.neg.i1385.i.i.i = sub nsw i64 %and.i19.i1382.i.i.i, %and.i.i1380.i.i.i
    >   %sub.neg.highbits.i1388.i.i.i = lshr i64 %sub.neg.i1385.i.i.i, %and.i.i.i1387.i.i.i
    >   %cmp7.i1389.i.i.i = icmp ne i64 %sub.neg.highbits.i1388.i.i.i, 0
    <   %sub6.i1387.i.i.i = sub nsw i64 0, %sub.i1383.i.i.i
    <   %sub6.highbits.i1388.i.i.i = lshr i64 %sub6.i1387.i.i.i, %and.i.i.i1386.i.i.i
    <   %cmp7.i1389.i.i.i = icmp ne i64 %sub6.highbits.i1388.i.i.i, 0
        %cmp.i1378.i.i.i = icmp ult i64 %xor.i1422.i.i.i, 1125899906842624
    >   %or.cond.i.i.i = or i1 %cmp.i1378.i.i.i, %cmp7.i1389.i.i.i
    >   br i1 %or.cond.i.i.i, label %do.body226.i.i.i, label %if.end86.i.i.i
    <   %or.cond.i.i.i = or i1 %cmp.i1378.i.i.i, %cmp7.i1389.i.i.i
    <   br i1 %or.cond.i.i.i, label %do.body226.i.i.i, label %if.end86.i.i.i

So we've traded 0 - %sub.i1383.i.i.i for %and.i19.i1382.i.i.i - %and.i.i1380.i.i.i
That's it, [un]fortunately, there is nothing else going on..
But thankfully, that explains the problem well.
Pushed rG5a159ed2a8e5a9a6ced73f78e4c64b01d76d3493.
Thanks.

Revision Contents

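Path                                                      Size
llvm/lib/Transforms/InstCombine/CMakeLists.txt            1 line
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp     6 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h     57 lines
llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp    217 lines
llvm/test/Transforms/InstCombine/mul.ll                   7 lines
llvm/test/Transforms/InstCombine/sub-of-negatible.ll      151 lines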

Diff 223385

llvm/lib/Transforms/InstCombine/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS InstCombineTables.td)			set(LLVM_TARGET_DEFINITIONS InstCombineTables.td)
	tablegen(LLVM InstCombineTables.inc -gen-searchable-tables)			tablegen(LLVM InstCombineTables.inc -gen-searchable-tables)
	add_public_tablegen_target(InstCombineTableGen)			add_public_tablegen_target(InstCombineTableGen)

	add_llvm_library(LLVMInstCombine			add_llvm_library(LLVMInstCombine
	InstructionCombining.cpp			InstructionCombining.cpp
	InstCombineAddSub.cpp			InstCombineAddSub.cpp
	InstCombineAtomicRMW.cpp			InstCombineAtomicRMW.cpp
	InstCombineAndOrXor.cpp			InstCombineAndOrXor.cpp
	InstCombineCalls.cpp			InstCombineCalls.cpp
	InstCombineCasts.cpp			InstCombineCasts.cpp
	InstCombineCompares.cpp			InstCombineCompares.cpp
	InstCombineLoadStoreAlloca.cpp			InstCombineLoadStoreAlloca.cpp
	InstCombineMulDivRem.cpp			InstCombineMulDivRem.cpp
				InstCombineNegator.cpp
	InstCombinePHI.cpp			InstCombinePHI.cpp
	InstCombineSelect.cpp			InstCombineSelect.cpp
	InstCombineShifts.cpp			InstCombineShifts.cpp
	InstCombineSimplifyDemanded.cpp			InstCombineSimplifyDemanded.cpp
	InstCombineVectorOps.cpp			InstCombineVectorOps.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms			${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/InstCombine			${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/InstCombine

	DEPENDS			DEPENDS
	intrinsics_gen			intrinsics_gen
	)			)

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

Show First 20 Lines • Show All 1,894 Lines • ▼ Show 20 Lines	if (match(Op1, m_AShr(m_Value(A), m_APInt(ShAmt))) &&
// --> (A < 0) ? -A : A		// --> (A < 0) ? -A : A
Value *Cmp = Builder.CreateICmpSLT(A, ConstantInt::getNullValue(Ty));		Value *Cmp = Builder.CreateICmpSLT(A, ConstantInt::getNullValue(Ty));
// Copy the nuw/nsw flags from the sub to the negate.		// Copy the nuw/nsw flags from the sub to the negate.
Value *Neg = Builder.CreateNeg(A, "", I.hasNoUnsignedWrap(),		Value *Neg = Builder.CreateNeg(A, "", I.hasNoUnsignedWrap(),
I.hasNoSignedWrap());		I.hasNoSignedWrap());
return SelectInst::Create(Cmp, Neg, A);		return SelectInst::Create(Cmp, Neg, A);
}		}

		// Now that we know we have failed to fold this `sub a, b`, let's try to
		// interpret `sub a, b` as `add a, (sub 0, b)`, and let's try to sink
		// `(sub 0, b)` into b itself.
		if (Value *NegOp1 = Negator::Negate(Op1, *this))
		return BinaryOperator::CreateAdd(NegOp1, Op0);

if (Instruction *Ext = narrowMathIfNoOverflow(I))		if (Instruction *Ext = narrowMathIfNoOverflow(I))
return Ext;		return Ext;

bool Changed = false;		bool Changed = false;
if (!I.hasNoSignedWrap() && willNotOverflowSignedSub(Op0, Op1, I)) {		if (!I.hasNoSignedWrap() && willNotOverflowSignedSub(Op0, Op1, I)) {
Changed = true;		Changed = true;
I.setHasNoSignedWrap(true);		I.setHasNoSignedWrap(true);
}		}
if (!I.hasNoUnsignedWrap() && willNotOverflowUnsignedSub(Op0, Op1, I)) {		if (!I.hasNoUnsignedWrap() && willNotOverflowUnsignedSub(Op0, Op1, I)) {
Changed = true;		Changed = true;
I.setHasNoUnsignedWrap(true);		I.setHasNoUnsignedWrap(true);
}		}

return Changed ? &I : nullptr;		return Changed ? &I : nullptr;
}		}
		xbolva00: Just wondering if we could use some better name for this lambda than “cleanup”.
		lebedev.ri (author): I couldn't come up with a better one, any suggestions? `TryToNarrowDeduceFlags()`? This is where `goto` might make sense, but somehow i don't want to use it..
		xbolva00: Your idea is fine I think.

/// This eliminates floating-point negation in either 'fneg(X)' or		/// This eliminates floating-point negation in either 'fneg(X)' or
/// 'fsub(-0.0, X)' form by combining into a constant operand.		/// 'fsub(-0.0, X)' form by combining into a constant operand.
static Instruction *foldFNegIntoConstant(Instruction &I) {		static Instruction *foldFNegIntoConstant(Instruction &I) {
Value *X;		Value *X;
Constant *C;		Constant *C;

// Fold negation into constant operand. This is limited with one-use because		// Fold negation into constant operand. This is limited with one-use because
▲ Show 20 Lines • Show All 170 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Show First 20 Lines • Show All 946 Lines • ▼ Show 20 Lines	private:
Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);		Value *EvaluateInDifferentType(Value *V, Type *Ty, bool isSigned);

/// Returns a value X such that Val = X * Scale, or null if none.		/// Returns a value X such that Val = X * Scale, or null if none.
///		///
/// If the multiplication is known not to overflow then NoSignedWrap is set.		/// If the multiplication is known not to overflow then NoSignedWrap is set.
Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);		Value *Descale(Value *Val, APInt Scale, bool &NoSignedWrap);
};		};

		namespace {

		// As a default, let's assume that we want to be somewhat aggressive,
		// and attemt to traverse up to 16 layers in attempt to sink negation.
		spatel: typo: attempt
		static constexpr unsigned NegatorDefaultMaxDepth = 16;

		// Let's guesstimate that most often we will negate less than 8 layers of
		// binops (i.e. 2^8 == 256 instructions).
		static constexpr unsigned NegatorMaxNewNodesSSO = 256;

		/// Provides an 'InsertHelper' that calls a user-provided callback, but unlike
		/// the usual IRBuilderCallbackInserter does NOT perform the default insertion.
		class IRBuilderCallbackNoInsert {
		std::function<void(Instruction *)> Callback;

		public:
		IRBuilderCallbackNoInsert(std::function<void(Instruction *)> Callback)
		: Callback(std::move(Callback)) {}

		protected:
		void InsertHelper(Instruction *I, const Twine &Name, BasicBlock *BB,
		BasicBlock::iterator InsertPt) const {
		Callback(I);
		}
		};

		} // namespace

		class Negator final {
		/// Top-to-bottom, def-to-use negated instruction tree we produced.
		SmallVector<Instruction *, NegatorMaxNewNodesSSO> NewInstructions;

		using BuilderTy = IRBuilder<TargetFolder, IRBuilderCallbackNoInsert>;
		BuilderTy Builder;

		Negator(LLVMContext &C, const DataLayout &DL);

		using Result = std::pair<ArrayRef<Instruction *> /*NewInstructions*/,
		Value * /*NegatedRoot*/>;

		LLVM_NODISCARD Value *visit(Value *V, unsigned Depth);

		/// Recurse depth-first and attempt to sink the negation.
		/// FIXME: use worklist?
		LLVM_NODISCARD Optional<Result> run(Value *Root);

		Negator(const Negator &) = delete;
		Negator(Negator &&) = delete;
		Negator &operator=(const Negator &) = delete;
		Negator &operator=(Negator &&) = delete;

		public:
		/// Attempt to negate \p Root. Returns nullptr if negation can't be performed,
		/// otherwise returns negated value.
		LLVM_NODISCARD static Value *Negate(Value *Root, InstCombiner &IC);
		};

} // end namespace llvm		} // end namespace llvm

#undef DEBUG_TYPE		#undef DEBUG_TYPE

#endif // LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H		#endif // LLVM_LIB_TRANSFORMS_INSTCOMBINE_INSTCOMBINEINTERNAL_H

llvm/lib/Transforms/InstCombine/InstCombineNegator.cpp

This file was added.

				//===- InstCombineAddSub.cpp -----------------------------------*- C++ -*-===//
				//
				efriedma: InstCombineAddSub.cpp?
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements propagation of negation into expressions.
				//
				//===----------------------------------------------------------------------===//

				#include "InstCombineInternal.h"
				#include "llvm/ADT/APFloat.h"
				#include "llvm/ADT/APInt.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/InstructionSimplify.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/IR/Constant.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/InstrTypes.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Operator.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/IR/Type.h"
				#include "llvm/IR/Value.h"
				#include "llvm/Support/AlignOf.h"
				#include "llvm/Support/Casting.h"
				#include "llvm/Support/Compiler.h"
				#include "llvm/Support/KnownBits.h"
				#include <cassert>
				#include <deque>
				#include <utility>

				using namespace llvm;

				#define DEBUG_TYPE "instcombine"

				STATISTIC(NegatorTotalNegationsAttempted,
				"Negator: Number of negations attempted to be sinked.");
				STATISTIC(NegatorNumTreesNegated,
				"Negator: Number of negations successfully sinked.");
				STATISTIC(NegatorMaxDepthVisited, "Negator: Maximal traversal depth ever "
				"reached while attempting to sink negation.");
				STATISTIC(NegatorDepthLimitReached, "Negator: How many times the traversal "
				"depth limit was reached during sinking.");
				STATISTIC(NegatorTotalValuesVisited, "Negator: Total number of values visited "
				"during attempts to sink negation.");
				STATISTIC(NegatorNumInstructionsCreatedTotal,
				"Negator: Number of new negated instructions created, total");
				STATISTIC(NegatorNumInstructionsNegatedSuccess,
				"Negator: Number of new negated instructions created in successful "
				"negation sinking attempts");

				static cl::opt<bool>
				NegatorEnabled("instcombine-negator-enabled", cl::init(true),
				cl::desc("Should we attempt to sink negations?"));

				static cl::opt<unsigned>
				NegatorMaxDepth("instcombine-negator-max-depth",
				cl::init(NegatorDefaultMaxDepth),
				cl::desc("What is the maximal lookup depth when trying to "
				"check for viability of negation sinking."));

				Negator::Negator(LLVMContext &C, const DataLayout &DL)
				: Builder(C, TargetFolder(DL),
				IRBuilderCallbackNoInsert([&](Instruction *I) {
				++NegatorNumInstructionsCreatedTotal;
				NewInstructions.push_back(I);
				})) {}

				// FIXME: can this be reworked into a worklist-based algorithm while preserving
				// the depth-first, early bailout traversal?
				LLVM_NODISCARD Value *Negator::visit(Value *V, unsigned Depth) {
				NegatorMaxDepthVisited.updateMax(Depth);
				++NegatorTotalValuesVisited;

				Value *X;

				// -(-(X)) -> X.
				if (match(V, m_Neg(m_Value(X))))
				return X;

				// Integral constants can be freely negated.
				if (match(V, m_AnyIntegralConstant()))
				return ConstantExpr::getNeg(cast<Constant>(V), /*HasNUW=*/false,
				/*HasNSW=*/false);

				// If we have a non-instruction, or it has other uses, then give up.
				if (!isa<Instruction>(V) || !V->hasOneUse())
				return nullptr;

				auto *I = cast<Instruction>(V);

				// Preserve debug info!
				Builder.SetCurrentDebugLocation(I->getDebugLoc());

				// In some cases we can give the answer without further recursion.
				switch (I->getOpcode()) {
				case Instruction::PHI:
				// `phi` is negatible if all the incoming values are negatible. We'd need to
				// ensure that we won't deadloop (pr12338.ll), so let's bother not for now.
				return nullptr;
				case Instruction::Sub:
				// `sub` is always negatible.
				return Builder.CreateSub(I->getOperand(1), I->getOperand(0),
				I->getName() + ".neg");
				default:
				break; // Other instructions require recursive reasoning.
				}

				// Rest of the logic is recursive, so if it's time to give up then it's time.
				if (Depth > NegatorMaxDepth) {
				LLVM_DEBUG(dbgs() << "Negator: reached maximal allowed traversal depth in "
				<< *V << ". Giving up.\n");
				++NegatorDepthLimitReached;
				return nullptr;
				}

				switch (I->getOpcode()) {
				case Instruction::Select: {
				// `select` is negatible if both hands of `select` are negatible.
				Value *NegOp1 = visit(I->getOperand(1), Depth + 1);
				if (!NegOp1) // Early return.
				return nullptr;
				Value *NegOp2 = visit(I->getOperand(2), Depth + 1);
				if (!NegOp2)
				return nullptr;
				// Do preserve the metadata!
				return Builder.CreateSelect(I->getOperand(0), NegOp1, NegOp2,
				I->getName() + ".neg", /*MDFrom=*/I);
				spatel: Remove/update comment.
				}
				case Instruction::Shl: {
				// `shl` is negatible if the first operand is negatible.
				Value *NegOp0 = visit(I->getOperand(0), Depth + 1);
				if (!NegOp0) // Early return.
				return nullptr;
				return Builder.CreateShl(NegOp0, I->getOperand(1), I->getName() + ".neg");
				}
				case Instruction::Add: {
				// `add` is negatible if both of it's operands are negatible.
				Value *NegOp0 = visit(I->getOperand(0), Depth + 1);
				if (!NegOp0) // Early return.
				return nullptr;
				Value *NegOp1 = visit(I->getOperand(1), Depth + 1);
				if (!NegOp1)
				return nullptr;
				return Builder.CreateAdd(NegOp0, NegOp1, I->getName() + ".neg");
				}
				case Instruction::Mul: {
				// `mul` is negatible one of it's operands is negatible.
				Value *NegatedOp, *OtherOp;
				if (Value *NegOp0 = visit(I->getOperand(0), Depth + 1)) {
				NegatedOp = NegOp0;
				OtherOp = I->getOperand(1);
				} else if (Value *NegOp1 = visit(I->getOperand(1), Depth + 1)) {
				NegatedOp = NegOp1;
				OtherOp = I->getOperand(0);
				} else // Can't negate either of them.
				return nullptr;
				return Builder.CreateMul(NegatedOp, OtherOp, I->getName() + ".neg");
				}
				default:
				return nullptr; // Don't know, likely not negatible for free.
				}

				spatel: it's -> its
				llvm_unreachable("Can't get here. We always return from switch.");
				};

				LLVM_NODISCARD Optional<Negator::Result> Negator::run(Value *Root) {
				Value *Negated = visit(Root, /*Depth=*/0);
				if (!Negated)
				return llvm::None;
				return std::make_pair(ArrayRef<Instruction *>(NewInstructions), Negated);
				};

				LLVM_NODISCARD Value *Negator::Negate(Value *Root, InstCombiner &IC) {
				++NegatorTotalNegationsAttempted;
				nikic: Noticed while looking through the tramp3d-v4 diff: This should be behind a one-use check, to avoid duplicating expensive division instructions.
				lebedev.ri (author): I was hesitant about this one indeed, it isn't a typo that one-use check has gone away here, because we generally consider only instruction count. @spatel thoughts? But the main, bigger question this touches is: "but what if all the uses would get negated by us? In future, can we somehow sanely model the whole negatible tree, not giving up at non-single-use instructions, but defer that to after we've finished building new tree?"
				spatel: I missed that logic difference, and I'm not getting a reviewable diff of the attached files with llvm-diff or other apps. Can you create an IR example/regression test for that?
				lebedev.ri (author): @spatel To be clear, the "bigger question" is pretty rhetorical, or at least not for this review. The actual question here is whether we should consider `sdiv` special, and even if we can negate it without increasing instruction count, we should only do so if there are no other uses of old `sdiv`.
				spatel: There is precedence for this kind of special treatment. In other words, not all opcodes are equal in terms of analysis (and secondary concern of codegen), and we will even increase instruction count to avoid some like div/rem (although those transforms are probably currently not safe with respect to poison). If it would be NFC-ish to keep the one-use check, then we should do that. Then, remove the limitation as a follow-up if that can be shown useful?
				lebedev.ri (author): Alright, i'll add one-use check just to get this moving :)
				LLVM_DEBUG(dbgs() << "Negator: attempting to sink negation into " << *Root
				<< "\n");
				spatel: it's -> its

				if (!NegatorEnabled)
				return nullptr;

				Negator N(Root->getContext(), IC.getDataLayout());
				Optional<Result> Res = N.run(Root);
				if (!Res) // Negation failed.
				return nullptr;

				LLVM_DEBUG(dbgs() << "Negator: successfully sunk negation into " << *Root
				spatel: negatible one -> negatible if one
				spatel: it's -> its
				<< "\n NEW: " << *Res->second << "\n");
				++NegatorNumTreesNegated;

				// We must temporairly unset the 'current' DebugLoc of the InstCombine's
				// IRBuilder so that it won't override the DebugLoc's we have already kept
				// from the original instructions.
				InstCombiner::BuilderTy::InsertPointGuard Guard(IC.Builder);
				IC.Builder.SetCurrentDebugLocation(DebugLoc());

				// We must propagate newly-created instructions into the InstCombine's
				// IRBuilder so that they will finally be inserted into the basic block,
				// and into the InstCombine's worklist so it can attempt to combine them.
				LLVM_DEBUG(dbgs() << "Negator: Propagating " << Res->first.size()
				<< " instrs to InstCombine\n");

				// They are in def-use order, so nothing fancy, just insert them in order.
				llvm::for_each(Res->first, [&](Instruction *I) {
				++NegatorNumInstructionsNegatedSuccess;
				IC.Builder.Insert(I);
				});

				// And return the new root.
				return Res->second;
				};
				spatel: typo: temporarily
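
To make the traversal in `Negator::visit()` easier to follow outside of LLVM, here is a self-contained sketch of the same idea on a toy expression tree. Everything in it is an assumption made for illustration: `Expr`, `node`, `leaf`, `negate` and `eval` are hypothetical names, `Uses` is a stand-in for LLVM's use list, and `select` and `phi` are left out. It keeps the order of checks from the patch: double negation first, then constants, then the one-use bailout, then `sub`, then the depth limit and the recursive cases.

```cpp
#include <cassert> // For callers that want to assert on the result.
#include <cstdint>
#include <memory>
#include <utility>

// Toy stand-in for llvm::Value; `Uses` mimics the use count.
struct Expr {
  enum Kind { Const, Var, Sub, Add, Mul, Shl };
  Kind K;
  uint64_t Val = 0; // Constant value, or variable index for Var.
  std::shared_ptr<Expr> L, R;
  unsigned Uses = 1;
};
using ExprPtr = std::shared_ptr<Expr>;

static ExprPtr node(Expr::Kind K, ExprPtr L, ExprPtr R) {
  auto E = std::make_shared<Expr>();
  E->K = K;
  E->L = std::move(L);
  E->R = std::move(R);
  return E;
}

static ExprPtr leaf(Expr::Kind K, uint64_t Val) {
  auto E = std::make_shared<Expr>();
  E->K = K;
  E->Val = Val;
  return E;
}

static constexpr unsigned MaxDepth = 16; // Mirrors NegatorDefaultMaxDepth.

// Returns an expression equal to -E, or nullptr if the negation cannot be
// sunk into E without leaving a standalone negation behind.
static ExprPtr negate(const ExprPtr &E, unsigned Depth = 0) {
  // -(0 - X) -> X, regardless of use count.
  if (E->K == Expr::Sub && E->L->K == Expr::Const && E->L->Val == 0)
    return E->R;
  // Constants are negated directly (wrapping arithmetic).
  if (E->K == Expr::Const)
    return leaf(Expr::Const, 0 - E->Val);
  // Opaque values and multi-use nodes cannot absorb the negation.
  if (E->K == Expr::Var || E->Uses != 1)
    return nullptr;
  // -(A - B) -> B - A needs no recursion.
  if (E->K == Expr::Sub)
    return node(Expr::Sub, E->R, E->L);
  // Everything below recurses, so respect the depth limit.
  if (Depth > MaxDepth)
    return nullptr;
  switch (E->K) {
  case Expr::Shl: // -(A << B) -> (-A) << B
    if (ExprPtr NL = negate(E->L, Depth + 1))
      return node(Expr::Shl, NL, E->R);
    return nullptr;
  case Expr::Add: { // -(A + B) -> (-A) + (-B)
    ExprPtr NL = negate(E->L, Depth + 1);
    if (!NL)
      return nullptr;
    ExprPtr NR = negate(E->R, Depth + 1);
    if (!NR)
      return nullptr;
    return node(Expr::Add, NL, NR);
  }
  case Expr::Mul: // -(A * B) -> (-A) * B, or A * (-B)
    if (ExprPtr NL = negate(E->L, Depth + 1))
      return node(Expr::Mul, NL, E->R);
    if (ExprPtr NR = negate(E->R, Depth + 1))
      return node(Expr::Mul, E->L, NR);
    return nullptr;
  default:
    return nullptr;
  }
}

// Evaluates a tree with wrapping 64-bit arithmetic; Vars holds variable values.
static uint64_t eval(const ExprPtr &E, const uint64_t *Vars) {
  switch (E->K) {
  case Expr::Const: return E->Val;
  case Expr::Var:   return Vars[E->Val];
  case Expr::Sub:   return eval(E->L, Vars) - eval(E->R, Vars);
  case Expr::Add:   return eval(E->L, Vars) + eval(E->R, Vars);
  case Expr::Mul:   return eval(E->L, Vars) * eval(E->R, Vars);
  case Expr::Shl:   return eval(E->L, Vars) << (eval(E->R, Vars) & 63);
  }
  return 0;
}
```

For example, negating `((v0 - v1) << 3) + (v2 * 7)` in this sketch yields `((v1 - v0) << 3) + (v2 * -7)` with the same number of nodes, while negating a bare variable returns nullptr because a separate negation would be needed.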

llvm/test/Transforms/InstCombine/mul.ll

	Show First 20 Lines • Show All 450 Lines • ▼ Show 20 Lines
	;			;
	%neg = sub i32 0, %x			%neg = sub i32 0, %x
	%mul = mul i32 %neg, %y			%mul = mul i32 %neg, %y
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test_mul_canonicalize_op1(i32 %x, i32 %z) {			define i32 @test_mul_canonicalize_op1(i32 %x, i32 %z) {
	; CHECK-LABEL: @test_mul_canonicalize_op1(			; CHECK-LABEL: @test_mul_canonicalize_op1(
	; CHECK-NEXT: [[Y:%.*]] = mul i32 [[Z:%.*]], 3			; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[Z:%.*]], -3
	; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[Y]], [[X:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[TMP1]], [[X:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = sub i32 0, [[TMP1]]			; CHECK-NEXT: ret i32 [[TMP2]]
	; CHECK-NEXT: ret i32 [[MUL]]
	;			;
	%y = mul i32 %z, 3			%y = mul i32 %z, 3
	%neg = sub i32 0, %x			%neg = sub i32 0, %x
	%mul = mul i32 %y, %neg			%mul = mul i32 %y, %neg
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test_mul_canonicalize_nsw(i32 %x, i32 %y) {			define i32 @test_mul_canonicalize_nsw(i32 %x, i32 %y) {
	▲ Show 20 Lines • Show All 127 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/sub-of-negatible.ll

Show All 24 Lines	;
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = sub i8 %x, %t0		%t1 = sub i8 %x, %t0
ret i8 %t1		ret i8 %t1
}		}

; Shift-left can be negated if all uses can be updated		; Shift-left can be negated if all uses can be updated
define i8 @t2(i8 %x, i8 %y) {		define i8 @t2(i8 %x, i8 %y) {
; CHECK-LABEL: @t2(		; CHECK-LABEL: @t2(
; CHECK-NEXT: [[T0:%.*]] = shl i8 -42, [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i8 42, [[Y:%.*]]
; CHECK-NEXT: [[T1:%.*]] = sub i8 [[X:%.*]], [[T0]]		; CHECK-NEXT: [[T1:%.*]] = add i8 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T1]]		; CHECK-NEXT: ret i8 [[T1]]
;		;
%t0 = shl i8 -42, %y		%t0 = shl i8 -42, %y
%t1 = sub i8 %x, %t0		%t1 = sub i8 %x, %t0
ret i8 %t1		ret i8 %t1
}		}
define i8 @n2(i8 %x, i8 %y) {		define i8 @n2(i8 %x, i8 %y) {
; CHECK-LABEL: @n2(		; CHECK-LABEL: @n2(
; CHECK-NEXT: [[T0:%.*]] = shl i8 -42, [[Y:%.*]]		; CHECK-NEXT: [[T0:%.*]] = shl i8 -42, [[Y:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = sub i8 [[X:%.*]], [[T0]]		; CHECK-NEXT: [[T1:%.*]] = sub i8 [[X:%.*]], [[T0]]
; CHECK-NEXT: ret i8 [[T1]]		; CHECK-NEXT: ret i8 [[T1]]
;		;
%t0 = shl i8 -42, %y		%t0 = shl i8 -42, %y
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = sub i8 %x, %t0		%t1 = sub i8 %x, %t0
ret i8 %t1		ret i8 %t1
}		}
define i8 @t3(i8 %x, i8 %y, i8 %z) {		define i8 @t3(i8 %x, i8 %y, i8 %z) {
; CHECK-LABEL: @t3(		; CHECK-LABEL: @t3(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]		; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = shl i8 [[T0]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i8 [[Z]], [[Y:%.*]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = add i8 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T2]]		; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = sub i8 0, %z		%t0 = sub i8 0, %z
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = shl i8 %t0, %y		%t1 = shl i8 %t0, %y
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}
Show All 12 Lines	;
call void @use8(i8 %t1)		call void @use8(i8 %t1)
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}

; Select can be negated if all it's operands can be negated and all the users of select can be updated		; Select can be negated if all it's operands can be negated and all the users of select can be updated
define i8 @t4(i8 %x, i1 %y) {		define i8 @t4(i8 %x, i1 %y) {
; CHECK-LABEL: @t4(		; CHECK-LABEL: @t4(
; CHECK-NEXT: [[T0:%.*]] = select i1 [[Y:%.*]], i8 -42, i8 44		; CHECK-NEXT: [[TMP1:%.*]] = select i1 [[Y:%.*]], i8 42, i8 -44
; CHECK-NEXT: [[T1:%.*]] = sub i8 [[X:%.*]], [[T0]]		; CHECK-NEXT: [[T1:%.*]] = add i8 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T1]]		; CHECK-NEXT: ret i8 [[T1]]
;		;
%t0 = select i1 %y, i8 -42, i8 44		%t0 = select i1 %y, i8 -42, i8 44
%t1 = sub i8 %x, %t0		%t1 = sub i8 %x, %t0
ret i8 %t1		ret i8 %t1
}		}
define i8 @n4(i8 %x, i1 %y) {		define i8 @n4(i8 %x, i1 %y) {
; CHECK-LABEL: @n4(		; CHECK-LABEL: @n4(
Show All 16 Lines	;
%t0 = select i1 %y, i8 -42, i8 %z		%t0 = select i1 %y, i8 -42, i8 %z
%t1 = sub i8 %x, %t0		%t1 = sub i8 %x, %t0
ret i8 %t1		ret i8 %t1
}		}
define i8 @t6(i8 %x, i1 %y, i8 %z) {		define i8 @t6(i8 %x, i1 %y, i8 %z) {
; CHECK-LABEL: @t6(		; CHECK-LABEL: @t6(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]		; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = select i1 [[Y:%.*]], i8 -42, i8 [[T0]]		; CHECK-NEXT: [[TMP1:%.*]] = select i1 [[Y:%.*]], i8 42, i8 [[Z]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = add i8 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T2]]		; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = sub i8 0, %z		%t0 = sub i8 0, %z
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = select i1 %y, i8 -42, i8 %t0		%t1 = select i1 %y, i8 -42, i8 %t0
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}
define i8 @t7(i8 %x, i1 %y, i8 %z) {		define i8 @t7(i8 %x, i1 %y, i8 %z) {
; CHECK-LABEL: @t7(		; CHECK-LABEL: @t7(
; CHECK-NEXT: [[T0:%.*]] = shl i8 1, [[Z:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = shl i8 -1, [[Z:%.*]]
; CHECK-NEXT: [[T1:%.*]] = select i1 [[Y:%.*]], i8 0, i8 [[T0]]		; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[Y:%.*]], i8 0, i8 [[TMP1]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = add i8 [[TMP2]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T2]]		; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = shl i8 1, %z		%t0 = shl i8 1, %z
%t1 = select i1 %y, i8 0, i8 %t0		%t1 = select i1 %y, i8 0, i8 %t0
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}
define i8 @n8(i8 %x, i1 %y, i8 %z) {		define i8 @n8(i8 %x, i1 %y, i8 %z) {
; CHECK-LABEL: @n8(		; CHECK-LABEL: @n8(
; CHECK-NEXT: [[T0:%.*]] = shl i8 1, [[Z:%.*]]		; CHECK-NEXT: [[T0:%.*]] = shl i8 1, [[Z:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = select i1 [[Y:%.*]], i8 0, i8 [[T0]]		; CHECK-NEXT: [[T1:%.*]] = select i1 [[Y:%.*]], i8 0, i8 [[T0]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]
; CHECK-NEXT: ret i8 [[T2]]		; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = shl i8 1, %z		%t0 = shl i8 1, %z
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = select i1 %y, i8 0, i8 %t0		%t1 = select i1 %y, i8 0, i8 %t0
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}

; Subtraction can be negated if the first operand can be negated		; Subtraction can be negated if the first operand can be negated
; x - (y - z) -> x - y + z -> x + (-y) + z		; x - (y - z) -> x - y + z -> x + (z - y)
define i8 @t9(i8 %x, i8 %y, i8 %z) {		define i8 @t9(i8 %x, i8 %y) {
		spatel: I didn't follow the diffs here - was one of these tests redundant? The code comment didn't match before, but it still doesn't?
		lebedev.ri (author): The test was too complicated, was checking more than the minimal pattern - subtraction can be freely negated by swapping its operands.
; CHECK-LABEL: @t9(		; CHECK-LABEL: @t9(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]		; CHECK-NEXT: [[T01:%.*]] = sub i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: ret i8 [[T01]]
; CHECK-NEXT: [[T11:%.*]] = add i8 [[Y:%.*]], [[Z]]
; CHECK-NEXT: [[T2:%.*]] = add i8 [[T11]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = sub i8 0, %z		%t0 = sub i8 %y, %x
call void @use8(i8 %t0)		%t1 = sub i8 0, %t0
%t1 = sub i8 %t0, %y		ret i8 %t1
%t2 = sub i8 %x, %t1
ret i8 %t2
}		}
define i8 @n10(i8 %x, i8 %y, i8 %z) {		define i8 @n10(i8 %x, i8 %y, i8 %z) {
; CHECK-LABEL: @n10(		; CHECK-LABEL: @n10(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]		; CHECK-NEXT: [[T0:%.*]] = sub i8 [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = sub i8 [[T0]], [[Y:%.*]]		; CHECK-NEXT: [[T1:%.*]] = sub i8 0, [[T0]]
; CHECK-NEXT: call void @use8(i8 [[T1]])		; CHECK-NEXT: ret i8 [[T1]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]
; CHECK-NEXT: ret i8 [[T2]]
;
%t0 = sub i8 0, %z
call void @use8(i8 %t0)
%t1 = sub i8 %t0, %y
call void @use8(i8 %t1)
%t2 = sub i8 %x, %t1
ret i8 %t2
}
define i8 @n11(i8 %x, i8 %y, i8 %z) {
; CHECK-LABEL: @n11(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Z:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])
; CHECK-NEXT: [[T1:%.*]] = add i8 [[Y:%.*]], [[Z]]
; CHECK-NEXT: [[T2:%.*]] = sub i8 [[X:%.*]], [[T1]]
; CHECK-NEXT: ret i8 [[T2]]
;		;
%t0 = sub i8 0, %z		%t0 = sub i8 %y, %x
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = sub i8 %y, %t0		%t1 = sub i8 0, %t0
%t2 = sub i8 %x, %t1		ret i8 %t1
ret i8 %t2
}		}

; Addition can be negated if both operands can be negated		; Addition can be negated if both operands can be negated
; x - (y + z) -> x - y - z -> x + ((-y) + (-z)))		; x - (y + z) -> x - y - z -> x + ((-y) + (-z)))
define i8 @t12(i8 %x, i8 %y, i8 %z) {		define i8 @t12(i8 %x, i8 %y, i8 %z) {
; CHECK-LABEL: @t12(		; CHECK-LABEL: @t12(
; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Y:%.*]]		; CHECK-NEXT: [[T0:%.*]] = sub i8 0, [[Y:%.*]]
; CHECK-NEXT: call void @use8(i8 [[T0]])		; CHECK-NEXT: call void @use8(i8 [[T0]])
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines
;		;
%t0 = sub i8 0, %y		%t0 = sub i8 0, %y
call void @use8(i8 %t0)		call void @use8(i8 %t0)
%t1 = mul i8 %t0, %z		%t1 = mul i8 %t0, %z
call void @use8(i8 %t1)		call void @use8(i8 %t1)
%t2 = sub i8 %x, %t1		%t2 = sub i8 %x, %t1
ret i8 %t2		ret i8 %t2
}		}

		; Phi can be negated if all incoming values can be negated
		define i8 @t16(i1 %c, i8 %x) {
		spatel: Add these tests with baseline results as pre-commit?
		; CHECK-LABEL: @t16(
		; CHECK-NEXT: begin:
		; CHECK-NEXT: br i1 [[C:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
		; CHECK: then:
		; CHECK-NEXT: br label [[END:%.*]]
		; CHECK: else:
		; CHECK-NEXT: br label [[END]]
		; CHECK: end:
		; CHECK-NEXT: [[Z:%.*]] = phi i8 [ [[X:%.*]], [[THEN]] ], [ 42, [[ELSE]] ]
		; CHECK-NEXT: ret i8 [[Z]]
		;
		begin:
		br i1 %c, label %then, label %else
		then:
		%y = sub i8 0, %x
		br label %end
		else:
		br label %end
		end:
		%z = phi i8 [ %y, %then], [ -42, %else ]
		%n = sub i8 0, %z
		ret i8 %n
		}
		define i8 @n17(i1 %c, i8 %x) {
		; CHECK-LABEL: @n17(
		; CHECK-NEXT: begin:
		; CHECK-NEXT: br i1 [[C:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
		; CHECK: then:
		; CHECK-NEXT: [[Y:%.*]] = sub i8 0, [[X:%.*]]
		; CHECK-NEXT: br label [[END:%.*]]
		; CHECK: else:
		; CHECK-NEXT: br label [[END]]
		; CHECK: end:
		; CHECK-NEXT: [[Z:%.*]] = phi i8 [ [[Y]], [[THEN]] ], [ -42, [[ELSE]] ]
		; CHECK-NEXT: call void @use8(i8 [[Z]])
		; CHECK-NEXT: [[N:%.*]] = sub i8 0, [[Z]]
		; CHECK-NEXT: ret i8 [[N]]
		;
		begin:
		br i1 %c, label %then, label %else
		then:
		%y = sub i8 0, %x
		br label %end
		else:
		br label %end
		end:
		%z = phi i8 [ %y, %then], [ -42, %else ]
		call void @use8(i8 %z)
		%n = sub i8 0, %z
		ret i8 %n
		}
		define i8 @n19(i1 %c, i8 %x, i8 %y) {
		; CHECK-LABEL: @n19(
		; CHECK-NEXT: begin:
		; CHECK-NEXT: br i1 [[C:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
		; CHECK: then:
		; CHECK-NEXT: [[Z:%.*]] = sub i8 0, [[X:%.*]]
		; CHECK-NEXT: br label [[END:%.*]]
		; CHECK: else:
		; CHECK-NEXT: br label [[END]]
		; CHECK: end:
		; CHECK-NEXT: [[R:%.*]] = phi i8 [ [[Z]], [[THEN]] ], [ [[Y:%.*]], [[ELSE]] ]
		; CHECK-NEXT: [[N:%.*]] = sub i8 0, [[R]]
		; CHECK-NEXT: ret i8 [[N]]
		;
		begin:
		br i1 %c, label %then, label %else
		then:
		%z = sub i8 0, %x
		br label %end
		else:
		br label %end
		end:
		%r = phi i8 [ %z, %then], [ %y, %else ]
		%n = sub i8 0, %r
		ret i8 %n
		}
		spatel: it's -> its
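
The algebraic identities that the tests in sub-of-negatible.ll rely on can be checked exhaustively for `i8`. The sketch below is not from the patch: `neg8` and `identities_hold` are made-up names, shift amounts are restricted to values below the bit width, and C++ unsigned wrapping is assumed to model LLVM's `i8` arithmetic without `nuw`/`nsw` flags.

```cpp
#include <cassert> // For callers that want to assert on the result.
#include <cstdint>

// Wrapping 8-bit negation, matching `sub i8 0, %v`.
static uint8_t neg8(uint8_t V) { return static_cast<uint8_t>(0u - V); }

// Checks, for every pair of i8 values, the identities behind the
// sub, add, mul, shl and select cases.
static bool identities_hold() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      const uint8_t X = static_cast<uint8_t>(A);
      const uint8_t Y = static_cast<uint8_t>(B);
      // sub: -(x - y) == y - x
      if (neg8(static_cast<uint8_t>(X - Y)) != static_cast<uint8_t>(Y - X))
        return false;
      // add: -(x + y) == (-x) + (-y)
      if (neg8(static_cast<uint8_t>(X + Y)) !=
          static_cast<uint8_t>(neg8(X) + neg8(Y)))
        return false;
      // mul: -(x * y) == (-x) * y
      if (neg8(static_cast<uint8_t>(X * Y)) !=
          static_cast<uint8_t>(neg8(X) * Y))
        return false;
      // shl: -(x << s) == (-x) << s, for s below the bit width
      const unsigned S = B & 7;
      if (neg8(static_cast<uint8_t>(X << S)) !=
          static_cast<uint8_t>(neg8(X) << S))
        return false;
      // select: -(c ? x : y) == (c ? -x : -y)
      const bool C = (B & 1) != 0;
      const uint8_t Sel = C ? X : Y;
      const uint8_t NegSel = C ? neg8(X) : neg8(Y);
      if (neg8(Sel) != NegSel)
        return false;
    }
  }
  return true;
}
```

A check like this only covers the value identities; it says nothing about use counts, which is what the `n`-prefixed negative tests exercise.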