This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Switch to using a worklist.
ClosedPublic

Authored by fhahn on Sep 21 2021, 6:57 AM.

Details

Summary

This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

At the moment, this directly uses the instcombine worklist, which
provides the required infrastructure. Some functions related to the
worklist still include "IC" in the debug output. I can adjust that, if
we decide to re-use the worklist.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Diff Detail

Event Timeline

fhahn created this revision.Sep 21 2021, 6:57 AM
fhahn requested review of this revision.Sep 21 2021, 6:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2021, 6:57 AM

Nice!
I'm not sure it is a good idea to just reuse part of another pass like that.
Perhaps InstCombineWorklist should be extracted+promoted to being more reusable?

fhahn updated this revision to Diff 373953.Sep 21 2021, 8:51 AM

Nice!
I'm not sure it is a good idea to just reuse part of another pass like that.
Perhaps InstCombineWorklist should be extracted+promoted to being more reusable?

Yeah, it should probably be moved. I put up D110181 to move it to Utils and rename it to InstructionWorklist.

Hmm, but this still doesn't do instcombine-style whole-function reprocessing ("iteration"), but only reprocesses the ones explicitly added to worklist.
This is intentional?

fhahn added a comment.Sep 21 2021, 9:30 AM

Hmm, but this still doesn't do instcombine-style whole-function reprocessing ("iteration"), but only reprocesses the ones explicitly added to worklist.
This is intentional?

At the moment, the first combining round is done by iterating over the function without the worklist. Only if something gets simplified, the modified instructions are added. Is this what you are referring to? I did not add all instructions to the worklist up front intentionally.

The number of combines is quite limited at the moment and I expect there are a lot of functions where no combine triggers at all. In those cases, adding everything to the worklist first seems a bit of unnecessary overhead.

Hmm, but this still doesn't do instcombine-style whole-function reprocessing ("iteration"), but only reprocesses the ones explicitly added to worklist.
This is intentional?

At the moment, the first combining round is done by iterating over the function without the worklist. Only if something gets simplified, the modified instructions are added. Is this what you are referring to? I did not add all instructions to the worklist up front intentionally.

The number of combines is quite limited at the moment and I expect there are a lot of functions where no combine triggers at all. In those cases, adding everything to the worklist first seems a bit of unnecessary overhead.

Instcombine roughly does the following:

  1. add all instructions into worklist
  2. pop from worklist until it's empty
  3. if made any changes, goto 1.

This essentially does 1. and 2., but not 3.
This may either be an oversight, or an intentional improvement that guarantees that in this pass we don't repeat instcombine's mistake
of relying on 3., but consistently add everything we want reprocessed into worklist.
Thus, i'm asking.

nikic added a subscriber: nikic.Sep 21 2021, 9:57 AM

This may either be an oversight, or an intentional improvement that guarantees that in this pass we don't repeat instcombine's mistake
of relying on 3., but consistently add everything we want reprocessed into worklist.

I'd strongly suggest not to repeat InstCombine's mistakes :) Just the worklist iteration should be enough...

llvm/include/llvm/Transforms/Utils/InstructionWorklist.h
35 ↗(On Diff #373953)

Drop this initializer, as initialized in ctor?

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
108

Just Worklist.pushValue(&Old). The Instruction check is exactly what pushValue() does relative to push().

948

Probably not necessary, as replaceValue() adds the old instruction to the worklist for DCE purposes?

1080

Do we actually need the deferred worklist for VectorCombine (does it make a difference for any test cases)? InstCombine has some pretty complicated worklist management, it might be preferable to forgo it if we don't need it. (No strong opinion though.)

fhahn updated this revision to Diff 373981.Sep 21 2021, 10:31 AM

Remove DbgPrefix, use pushValue, avoid using deferred worklist part.

This may either be an oversight, or an intentional improvement that guarantees that in this pass we don't repeat instcombine's mistake
of relying on 3., but consistently add everything we want reprocessed into worklist.

Thanks for elaborating! I think for now it is not worth re-processing the whole function, because the combines are very limited in scope. It would probably be best to make sure the correct instructions are added to the worklist instead, e.g. note the updates around line 522 which ensure we can clean up now-dead extracts.

fhahn marked 3 inline comments as done.Sep 21 2021, 10:33 AM
fhahn added inline comments.
llvm/include/llvm/Transforms/Utils/InstructionWorklist.h
35 ↗(On Diff #373953)

The DbgPrefix is gone now

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
108

done thanks!

948

I here is a store, so it won't be trivially dead

1080

Not really, I removed the deferred handling .

lebedev.ri accepted this revision.Sep 21 2021, 10:36 AM

Remove DbgPrefix, use pushValue, avoid using deferred worklist part.

This may either be an oversight, or an intentional improvement that guarantees that in this pass we don't repeat instcombine's mistake
of relying on 3., but consistently add everything we want reprocessed into worklist.

Thanks for elaborating! I think for now it is not worth re-processing the whole function, because the combines are very limited in scope. It would probably be best to make sure the correct instructions are added to the worklist instead, e.g. note the updates around line 522 which ensure we can clean up now-dead extracts.

Aha, just checking. Sounds good to me.
Seems fine, but probably wait for @spatel.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
106

We also likely want to revisit users of the value of which we've just reduced the use-count.
But honestly this is quite fragile, because generally speaking we want to revisit all it's uses transitively.
https://bugs.llvm.org/show_bug.cgi?id=47238

This revision is now accepted and ready to land.Sep 21 2021, 10:36 AM

Do you know if the compile-time diff remained the same after the changes from the initial draft? I know it's only a small diff, but I'm actually surprised given how little VectorCombine does. Maybe there's more vector code out there than I realized. :)

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1057

Nit: I'd name this "FoldInst" or "CombineInst" because I equate "Simplify" with "InstSimplify" and so not creating new values.

llvm/test/Transforms/VectorCombine/AArch64/load-extract-insert-store-scalarization.ll
21

Why change this test?

fhahn updated this revision to Diff 374039.Sep 21 2021, 2:45 PM
fhahn marked 3 inline comments as done.

Address comments.

Do you know if the compile-time diff remained the same after the changes from the initial draft? I know it's only a small diff, but I'm actually surprised given how little VectorCombine does. Maybe there's more vector code out there than I realized. :)

I took a closer look and it seems like the majority of the diff was coming from some other subtle restructuring of the code. I updated the patch to be more in inline with the original code. It looks mostly neutral now:

NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

fhahn marked 3 inline comments as done.Sep 21 2021, 2:47 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
106

Do you think it's worth doing this in the current patch? Not sure if it is possible to come up with a test case with the current limited set of combines.

1057

Changed to FoldInst.

llvm/test/Transforms/VectorCombine/AArch64/load-extract-insert-store-scalarization.ll
21

This was an accidental change that sneaked in. Removed now.

fhahn edited the summary of this revision. (Show Details)Sep 21 2021, 2:48 PM
spatel accepted this revision.Sep 21 2021, 2:56 PM

LGTM

lebedev.ri accepted this revision.Sep 21 2021, 11:31 PM
lebedev.ri added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
106

I don't know.
I'm simply saying that by not reiterating over the whole function afterwards,
we are very much likely missing transforms.

fhahn marked 3 inline comments as done.Sep 22 2021, 1:54 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
106

Oh I see. I plan to add verification that re-running VectorCombine does not change the function as a follow-up. That should help surfacing such issues.

lebedev.ri added inline comments.Sep 22 2021, 1:57 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
106

Yep, that would be good, thanks!

This revision was landed with ongoing or failed builds.Sep 22 2021, 1:58 AM
This revision was automatically updated to reflect the committed changes.