Page MenuHomePhabricator

[InstCombine] Process newly inserted instructions in the correct order

Authored by nikic on Jan 25 2020, 2:29 AM.



InstCombine operates on the basic premise that the operands of the currently processed instruction have already been simplified. It achieves this by pushing instructions to the worklist in reverse program order, so that instructions are popped off in program order. The worklist management in the main combining loop also makes sure to uphold this invariant.

However, the same is not true for all the code that is performing manual worklist management. The largest problem (addressed in this patch) are instructions inserted by InstCombine's IRBuilder. These will be pushed onto the worklist in order of insertion (generally matching program order), which means that a) the users of the original instruction will be visited first, as they are pushed later in the main loop and b) the newly inserted instructions will be visited in reverse program order.

This causes a number of problems: First, folds operate on instructions that have not had their operands simplified, which may result in optimizations being missed (ran into this in, which was the original motivation for this patch). Additionally, this increases the amount of folds InstCombine has to perform, both within one iteration, and by increasing the number of total iterations.

This patch addresses the issue by adding a Worklist.AddDeferred() method, which is used for instructions inserted by IRBuilder. These will only be added to the real worklist after the combine finished, and in reverse order, so they will end up processed in program order. I should note that the same should also be done to nearly all other uses of Worklist.Add(), but I'm starting with just this occurrence, which has by far the largest test fallout.

Most of the test changes are due to or other cases where we don't canonicalize something. These are neutral. There are also a couple of regressions that I will highlight in comments...

Diff Detail

Event Timeline

nikic created this revision.Jan 25 2020, 2:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2020, 2:29 AM
nikic marked 2 inline comments as done.Jan 25 2020, 2:43 AM
nikic added inline comments.

Here (a shl nsw 31) sdiv INT_MIN is folded to zext ((a shl nsw 31) == INT_MIN). Previously the zext was visited first and this was folded to and. Now the comparison is visited first, which is folded to a == -1. At that point, the zext cannot be folded to and anymore, because we've lost information.

I don't think we can really do anything about this, unless we want to drop the equality fold, which I don't think we want. The new IR also doesn't look particularly bad, depending on how the result is going to be used.


This issue is tracked at I got the impression that @spatel is planning to work on this at some point, but if desired I can also look into fixing this regression first.

spatel added inline comments.Tue, Jan 28, 6:42 AM

I took a look at that, and I think we need an analysis enhancement first:

nikic updated this revision to Diff 241194.Wed, Jan 29, 9:24 AM

Rebase over D73575.

nikic updated this revision to Diff 241251.Wed, Jan 29, 12:27 PM

Rebase over D73647.

nikic marked an inline comment as done.Wed, Jan 29, 12:31 PM
nikic added inline comments.

D73575 and D73647 have addressed this regression. We now get a marginally better result here.

lebedev.ri accepted this revision.Wed, Jan 29, 12:58 PM

I agree test changes look harmless now.
The internal changes seem good, but please wait for more review.

This revision is now accepted and ready to land.Wed, Jan 29, 12:58 PM
spatel accepted this revision.Wed, Jan 29, 2:02 PM



typo: combined -> combine


Could take this opportunity to correct all of the ill-formatted 'FunctionNames' to 'functionNames'.

This revision was automatically updated to reflect the committed changes.
nikic marked an inline comment as done.
nikic marked an inline comment as done.Thu, Jan 30, 12:48 AM
nikic added inline comments.

I was planning to do a mass rename of the various Add methods, I'll roll the style change into that one.

Hi Nikita,

This patch regresses one of the top-3 hot functions in 444.namd (from SPEC CPU2006). It slows down that function by 23% (from 94 seconds to 116 seconds) and increases its code size by 13% (from 2588 bytes to 2920 bytes). This happens at -Oz on aarch64-linux-gnu. Since this happens at -Oz, code size regression is more interesting than the performance regression.

Would you please check if _ZN20ComputeNonbondedUtil16calc_pair_energyEP9no hits a corner case or something? And whether this can be avoided?

444.namd,[.] _ZN20ComputeNonbondedUtil26calc_pair_energy_full,100,101,1417,1422,4120,4156
444.namd,[.] _ZN20ComputeNonbondedUtil19calc_pair_fullelectEP,101,100,1232,1240,3588,3576
444.namd,[.] _ZN20ComputeNonbondedUtil16calc_pair_energyEP9no,123,113,941,1158,2588,2920

Let me know if you need help reproducing this. I have linux perf profiles, preprocessed source and compiled assembly handy.


@maxim-kuvyrkov: Thanks for the report. Having the preprocessed source would be great. If you have an idea which part of the generated assembly could be related to the slowdown, that would of course be great to know as well.

Regressions from this change are not entirely unexpected, as it may expose missing folds that previously worked "by accident".