This is an archive of the discontinued LLVM Phabricator instance.

Use a data structure better suited for large sets in SimplificationTracker.
ClosedPublic

Authored by tamur on Nov 1 2018, 3:13 PM.

Details

Summary

D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for power pc, we have a case where there are more than 600.000 nodes, and it takes too long to compile.

In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation, however I think it only applies to the nodes already matched in MatchPhiSet method, which I did not touch.

Diff Detail

Repository
rL LLVM

Event Timeline

tamur created this revision.Nov 1 2018, 3:13 PM
bjope added a comment.Nov 5 2018, 10:24 AM

If I remember correctly a problem was that depending on the order iteration at line 3011

while (PhiNodesToMatch.size())

the end result (which PHI node that others are found out to be equivalent to) could be different. Quite annoying when trying to debug something, and when turning on -print-after-all or rebuilding using -g the result is different.

Have you, for example, tested to run llc with and without -print-after-all, or -debug, or some other options that might impact memory layout. And have you checked that we still are debug invariant (always getting the same result regardless if you build the compiler using -g, or a different -O level)?
Another way to test it could be to build two versions of llc. One that inserts at the end of AllPhiNodes and one the inserts at the front (using the SmallSetVector and some hacks to insert from different ends in the vector). Then run a bunch of tests and compare the result.

Btw, when you say 600.000 PHI nodes, is that in the same SimplificationTracker instance?
If 600.000 is some kind of total, what is the average size of AllPhiNodes when MatchPhiSet is called? Perhaps the size N (currently set to 32) of the SmallSetVector can be tuned.

lib/CodeGen/CodeGenPrepare.cpp
2651 ↗(On Diff #172249)

The comment about SetVector is no longer valid with this patch.

tamur updated this revision to Diff 172864.Nov 6 2018, 3:16 PM
tamur marked an inline comment as done.Nov 6 2018, 3:34 PM

Thank you for the explanation. OK, neither SmallPtrSet nor SmallSetVector seems suitable for this specific task, so I created a simple set implementation, that is fast and has the same iteration order with SmallSetVector; so I don't expect any behaviour change. (The problem with SmallSetVector is that it actually removes elements from the underlying vector and shifts all others, which is bad when we have many elements).

I arrived at 600.000 by putting a static variable that peeks the AllPhiNodes.size() and acts as a "maximum value tracker" inside SimplificationTracker::insertNewPhi, so it should correspond to a single activation. The code I am trying to compile is open source (EPL 1.0), if you are curious or want to verify:

LGTM, please wait a bit and if Bjorn does not object, feel free to land.

lib/CodeGen/CodeGenPrepare.cpp
2755 ↗(On Diff #172864)

indent?

bjope requested changes to this revision.Nov 7 2018, 12:27 AM
bjope added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
2713 ↗(On Diff #172864)

Imagine that we insert the value 1,2,3,4,5 into the set, then the NodeList will be [1, 2, 3, 4, 5].
Now we remove "3", then the NodeList is untouched and FirstValidElement still points at the beginning of the NodeList.
If we later inset "3" again, then the NodeList will become [1, 2, 3, 4, 5, 3].
When iterating we now will get the order 1,2,3,4,5 and not the expected 1,2,4,5,3 (assuming that we really expect to get the insertion order).

I'm not sure if the above will ever happen for the specific use case in CodeGenPrepare, but looking at this as a general set implementation it seems faulty.

One solution, seeing this as a set of pointers, could be to now allow nullptr in the set (so add an assert for the in "insert"). And then we should not only move FirstValidElement forward here, we also need to scan the NodeList starting at FirstValidElement and replace any found entry matching Ptr with a nullptr (there should be exactly one such entry since it is a unique set).

2790 ↗(On Diff #172864)

nit: I think it should say "elements are iterated" or "elements will be iterated"

This revision now requires changes to proceed.Nov 7 2018, 12:27 AM

Just a note, there is a specific usage of this data structure.
Phi nodes are inserted in the initial step without any removals and then only removals are used.

So we should not see something like delete-insert.
So while data structure really suffers from what Bjorn wrote but in this specific application of this data structure I do not expect any problems...

bjope added a comment.Nov 7 2018, 12:56 AM

Just a note, there is a specific usage of this data structure.
Phi nodes are inserted in the initial step without any removals and then only removals are used.

So we should not see something like delete-insert.
So while data structure really suffers from what Bjorn wrote but in this specific application of this data structure I do not expect any problems...

Ok. Just worried that someone might find this set implementation nifty, trying to reuse it in some other part of the code or moving it to ADT, without considering this limitation regarding delete-insert.
If the extra cleanup in the NodeList is costly we could at least add some comment explaining this in the description of the set (and/or in the erase method). And maybe we can assert somehow that there are no inserts after erase (at insert the size of the NodeList and the size of the NodeSet must be equal)?
Still, if there is no big performance issue with making the data structure more general (also supporting delete-insert), such a solution could be accepatable as well.

Just a note, there is a specific usage of this data structure.
Phi nodes are inserted in the initial step without any removals and then only removals are used.

So we should not see something like delete-insert.
So while data structure really suffers from what Bjorn wrote but in this specific application of this data structure I do not expect any problems...

Ok. Just worried that someone might find this set implementation nifty, trying to reuse it in some other part of the code or moving it to ADT, without considering this limitation regarding delete-insert.

Completely agreed!

tamur updated this revision to Diff 173219.Nov 8 2018, 1:58 PM

Oh, I had completely misseed the remove-and-add-again-later case, thanks for pointing out. I changed the underlying set from SmallPtrSet to SmallDenseMap to let it point to the valid index at the vector; I think this is the only way to handle removing in constant time (Allowing nullptr does not seem enough). It made the compilation time for my huge case to go up from 96 seconds to maybe 98 seconds, and should be unnoticable in 'normal' cases.

tamur marked 2 inline comments as done.Nov 8 2018, 1:59 PM
tamur updated this revision to Diff 173223.Nov 8 2018, 2:01 PM
tamur marked an inline comment as done.
bjope accepted this revision.Nov 11 2018, 2:52 AM

LGTM now (might wanna wait for @skatkov to take a look at the last changes as well).

This revision is now accepted and ready to land.Nov 11 2018, 2:52 AM
skatkov accepted this revision.Nov 11 2018, 7:35 PM

LGTM as well

tamur added a comment.Nov 12 2018, 1:45 PM

Thank you both for the review. Committing.

This revision was automatically updated to reflect the committed changes.