This is an archive of the discontinued LLVM Phabricator instance.

[Transforms] Adding a WeakReassociate pass
Needs ReviewPublic

Authored by opaparo on Dec 25 2017, 5:26 AM.

Details

Summary

A discussion that started at D38037 raised the need for a new reassociation pass, in addition to the existing one.

The existing pass takes a tree of a reassociable operator, and transforms it into a "linear tree", e.g. for addition a_0 + (a_1 + (a_2 + ... (a_n-1 + a_n))...). The tree is built such that the latter operands have lower "ranks". The ranking is such that, for example, constants have rank of zero which means they will be at the end of the tree which will cause them to eventually fold.
The new pass implements another kind of reassociation, something of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3), which is not supported by the existing pass.
The new pass also limited in numerous ways (hence "Weak"): it will only reassociate a tree compound of two kinds of leaves by putting all the leaves of one kind ("LeafClass") in one subtree and the leaves from the other kind at the second subtree. It will also never change the topography of the tree (unlike the existing reassociate pass which aggressively dictates a new topography). That means, amongst the rest, that this pass never creates new instructions.

Diff Detail

Repository
rL LLVM

Event Timeline

opaparo created this revision.Dec 25 2017, 5:26 AM
opaparo updated this revision to Diff 128152.Dec 26 2017, 12:42 AM

Now using the constant's value rather than its pointer value for enforcing total order on LeafClasses. This is needed in order to ensure deterministic output of the compiler (I actually got different results when running with different OSs before this change).

davide requested changes to this revision.Dec 26 2017, 3:24 AM
davide added a subscriber: davide.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

This revision now requires changes to proceed.Dec 26 2017, 3:24 AM

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

The new pass is a kind of preparation pass for InstCombine. It complements changes introduced at D39421, as well as other InstCombine transformations. Checkout my lit tests, they illustrate several scenarios.
I believe that this pass will open a gate to other reassociations required by InstCombine transformations, reassociations that can't be expressed in means of the existing Reassociate pass.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

The new pass is a kind of preparation pass for InstCombine. It complements changes introduced at D39421, as well as other InstCombine transformations. Checkout my lit tests, they illustrate several scenarios.
I believe that this pass will open a gate to other reassociations required by InstCombine transformations, reassociations that can't be expressed in means of the existing Reassociate pass.

@davide, does that answer your questions? Do you have any additional concerns?

opaparo updated this revision to Diff 148024.May 22 2018, 9:22 AM

Following this discussion, rebasing the patch to trunk.

lebedev.ri added inline comments.
lib/Transforms/Scalar/WeakReassociate.cpp
136

Consider passing SmallVectorImpl<unsigned> to avoid adding noise in form of size.

139

I think you may want to suffix that one with U https://godbolt.org/g/FDQPax
Or maybe there is some helper function to do that?

155–156

Consider passing SmallVectorImpl<> to avoid adding noise in form of size.

210

You could do std::stack<BinaryOperator *, SmallVector<BinaryOperator *, 4>> to still make use of SSO

240

Do you actually need to specify the type here?

249–250

I *think* this is the only use of i?
wouldn't

for (Value *Operand : PrevDepthNode->operands())

work?

264

Wouldn't just NonLeaves.push_back({}); work?

284

Humm, have you considered [[ https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h | llvm::DenseMap<> ]]?

286

Couldn't this be range-for loop?

lebedev.ri added inline comments.May 22 2018, 10:13 AM
lib/Transforms/Scalar/WeakReassociate.cpp
104

It is hopefully nothing, but what if they compare equal?
I'm basically thinking about the llvm::sort() being intentionally unstable kind of things.

hjyamauchi added inline comments.
lib/Transforms/IPO/PassManagerBuilder.cpp
413

If we add it here, should we also add it in the corresponding location in PassBuilder::buildFunctionSimplificationPipeline (lib/Passes/PassBuilder.cpp) for the new pass manager?

What are benchmarks results with this pass (compile-time/performance)? Do we really need it at all optlevels, or can we only include it at -O3?

opaparo updated this revision to Diff 148823.May 28 2018, 9:15 AM

Several changes in this revision:

  1. The pass now extends PassInfoMixin<> instead of FunctionPass. A wrapper legacy pass that extends FunctionPass was also added.
  2. The pass is now also used in the PassBuilder::buildFunctionSimplificationPipeline flow.
  3. Style:
    • Using SmallVectorImpl instead of SmallVector when possible, in order to avoid explicitly stating its size.
    • Using SmallVectorImpl instead of SmallVector when possible, in order to avoid explicitly stating its size.
    • Other miscellaneous small fixes following code review comments.
opaparo marked 9 inline comments as done.May 28 2018, 9:38 AM

What are benchmarks results with this pass (compile-time/performance)? Do we really need it at all optlevels, or can we only include it at -O3?

I've got some minor compile time regressions on my benchmarks, but nothing significant.
Judging by this ongoing discussion, this pass can potentially become a complimentary pass to InstCombine. If that will be the case, it should be active in all opt levels.

lib/Transforms/Scalar/WeakReassociate.cpp
104

If they compare equal then they are the same LeafClass, and should correspond to the same cell in the map.
I've indeed had some trouble with stable sorting in the past, and that's exactly why the Op1s of the two sides are compared by value and not by pointer (which may lead to different outcomes on different platforms or even consecutive runs on the same machine).

210

Done, but just so I'm sure - what does SSO stand for?

284

I've tried it.
The thing is that llvm::DenseMap does not guarantee the order of the elements, unlike std::map. It of course guarantees that two different key types will be mapped to two different cells in the map, but when iterating over it (as I do later on) you can get any order. When experimenting with it, I got two different iterating orders in two different runs, which resulted in different code in each run.
Technically I could use llvm::sort after I'm done filling the map, but I find it strange to do that instead of using a data structure that already keeps the elements sorted.

Please rebase + more tests

opaparo updated this revision to Diff 163527.Aug 31 2018, 7:14 AM

Rebased and added lit tests

opaparo added a comment.EditedAug 31 2018, 7:16 AM

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

The new pass is a kind of preparation pass for InstCombine. It complements changes introduced at D39421, as well as other InstCombine transformations. Checkout my lit tests, they illustrate several scenarios.
I believe that this pass will open a gate to other reassociations required by InstCombine transformations, reassociations that can't be expressed in means of the existing Reassociate pass.

@davide, does that answer your questions? Do you have any additional concerns?

@davide, unless you have any additional concerns, do you mind removing the "needs revision" mark in order to avoid discouraging others from reviewing this change?

Some remarks:

  1. It currently runs *after* instcombine. Is that intentional? Didn't you say it is a preparation pass?
  2. local_unnamed_addr and #0 in tests isn't needed.
  3. Please run -instnamer on the tests, i.e. s/%/%t/ (name all variables.).
  4. It is not ideal to test anything more than just the -weak-reassociate in the test/Transforms/WeakReassociate/ (i.e. would be best to drop -instcombine from run lines)