
Write a new SSAUpdater
Needs ReviewPublic

Authored by dberlin on Jan 19 2017, 7:17 PM.

Details

Summary

This SSAUpdater has a simpler interface and even simpler
implementation than our current one. It is both faster, and IMHO,
simpler to understand. It also can be used for multiple variables at
once, and can be reused without destruction or worry.

It does not require dominance frontiers, dominators, etc, as our
current implementation does.

I originally wrote it for something else, but D28534 reminded me i had
it. It implements the marker algorithm from "Simple and Efficient
Construction of Static Single Assignment Form" at
http://compilers.cs.uni-saarland.de/papers/bbhlmz13cc.pdf. It does not
create completely minimal SSA (so in some cases leaves a few phi
nodes). I could make it do so by implementing the SCC algorithm in
the paper, at some implementation complexity cost.
The marker algorithm is superficially similar[1] to the current SSAUpdater,
but how we place phi nodes is very different, and the API + implementation as
described is, IMHO, easier to understand :)

I also didn't implement the trivial recursive phi simplification (i
would rather just use the SCC algorithm at that point, as it wouldn't
require recursive simplification).

I do not pretend this version is ready to go in by far (it has no
tests, to start :P). But before i spent any more time on it, i
figured i'd get general thoughts on whether people think it is worth
cleaning it up, or if we don't care enough about SSAUpdater as-is.

(Also, fair warning, it worked the last time i tried it, but i haven't
touched it in a while.)

[1] If one stares at the implementation, written in 2009, and then the
paper, written in 2012, one may wonder if some credit is not missing
somewhere.

Event Timeline

dberlin created this revision.Jan 19 2017, 7:17 PM

Note that regardless of whether we want this for regular SSA or not, this will serve as the basis for my generic MemorySSA updater.
Sadly, because MemorySSA is not really part of the IR, and because it deals with may-defs, and not must-defs, we won't be able to share much beyond the idea.
(On the plus side, MemorySSA's semantics actually make generic updating easier and faster)

sanjoy edited edge metadata.Jan 19 2017, 10:34 PM

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

chandlerc edited edge metadata.Jan 20 2017, 12:52 AM

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

When I worked on SROA and SROA had split logic between SSAUpdater and mem2reg, it was definitely confusing and a source of much sorrow... That said, we fixed it by getting rid of SSAUpdater...

But looking at this code and looking at SSAUpdater, I'm at least interested in the simplification. This code doesn't have comments, debug info handling, or the SCC-stuff Danny mentioned, but even if it grows by 3x, my guess is it will still be less than 1/3rd the code of the current SSAUpdater.

I guess the question is can we get the quality up w/o regressing the simplicity. From your description Danny, I'd be interested in the SCC formation bit to get minimal and other properties, and it would need to handle debug info a bit. Not sure if there are any other challenges really.

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

FWIW: It's been a source of sorrow before.
People kind of randomly try to figure out what order to call things in, because it's not clear what the terminology it uses means, nor is it clear what the right sequence necessarily is to get the right result.

As for speed, we wrote an entire separate linear-time phi placement pass (see PromoteMemToReg, and more particularly, IDFCalculator) just because the current SSAUpdater was too slow and not easily understandable.
It's also used for ADCE and MemorySSA.

This particular patch came about because i needed an SSAUpdater workalike for MemorySSA updating, and i figured, what the heck, might as well see if anyone wants me to replace the existing one while i'm at it.
I started with the above, and then modified it for what i needed.

If you stare at this vs the paper vs the existing SSAUpdater, you can clearly see the paper (and this impl) is a slightly more refined version of the renaming algorithm, with a completely different phi insertion scheme shoved on top.
(i'm not sure if they both share some common lineage, because otherwise, that's ... yeah)

Our current SSAUpdater tries to iteratively compute dominators, dominance frontiers, and then phi placement for those. It's actually N^3 in the number of variables, and in some cases, badly so.

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

Do you find it easy to use? Perhaps I just haven't studied it in sufficient detail, but I don't. Moreover, our informal project policy has tended toward, "use this only if there's no other way." I'll admit never having thought about rewriting it, or whether there is a better algorithm, but I think this is certainly an area in which potential improvements should be considered.

davide edited edge metadata.Jan 20 2017, 10:34 AM

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

FWIW: It's been a source of sorrow before.
People kind of randomly try to figure out what order to call things in, because it's not clear what the terminology it uses means, nor is it clear what the right sequence necessarily is to get the right result.

As for speed, we wrote an entire separate linear-time phi placement pass (see PromoteMemToReg, and more particularly, IDFCalculator) just because the current SSAUpdater was too slow and not easily understandable.
It's also used for ADCE and MemorySSA.

This particular patch came about because i needed an SSAUpdater workalike for MemorySSA updating, and i figured, what the heck, might as well see if anyone wants me to replace the existing one while i'm at it.
I started with the above, and then modified it for what i needed.

If you stare at this vs the paper vs the existing SSAUpdater, you can clearly see the paper (and this impl) is a slightly more refined version of the renaming algorithm, with a completely different phi insertion scheme shoved on top.
(i'm not sure if they both share some common lineage, because otherwise, that's ... yeah)

Our current SSAUpdater tries to iteratively compute dominators, dominance frontiers, and then phi placement for those. It's actually N^3 in the number of variables, and in some cases, badly so.

I don't have any strong comments on the API (yet); I would like to try porting a pass myself as an experiment and see how that feels. I personally think removing the dependency on dom info is already a win per se, and the code is definitely much smaller/easier to understand than what we currently have (I don't think it will grow enormously if we include SCC, but I may be wrong).
To sum up, I would like to see this going forward.

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

FWIW: It's been a source of sorrow before.
People kind of randomly try to figure out what order to call things in, because it's not clear what the terminology it uses means, nor is it clear what the right sequence necessarily is to get the right result.

As for speed, we wrote an entire separate linear-time phi placement pass (see PromoteMemToReg, and more particularly, IDFCalculator) just because the current SSAUpdater was too slow and not easily understandable.
It's also used for ADCE and MemorySSA.

This particular patch came about because i needed an SSAUpdater workalike for MemorySSA updating, and i figured, what the heck, might as well see if anyone wants me to replace the existing one while i'm at it.
I started with the above, and then modified it for what i needed.

If you stare at this vs the paper vs the existing SSAUpdater, you can clearly see the paper (and this impl) is a slightly more refined version of the renaming algorithm, with a completely different phi insertion scheme shoved on top.
(i'm not sure if they both share some common lineage, because otherwise, that's ... yeah)

Our current SSAUpdater tries to iteratively compute dominators, dominance frontiers, and then phi placement for those. It's actually N^3 in the number of variables, and in some cases, badly so.

While I don't think speed is the only factor we should consider, I don't think I ever found a case where compile time for SSAUpdater skyrockets, but this doesn't mean there aren't any. Did you find a case where this actually matters, out of curiosity?

I've never really heard people complain about our current SSAUpdater

Okay I guess I gotta get better at listening! :)

Its algorithm was definitely not usable, time-wise, for MemorySSA (i tried it); it was about 2-5x slower, just because of the amount of work it does per thing. MemorySSA is, as i said, a bit weird.

I don't remember the llvmdev history, but my understanding was it was really bad for SROA (I'm sure chandler can say more).

In any case, let me add the SCC post-pass to this (doing the SCC walking as we rename is more complex, and i'm not sure it's worth it speed-wise) so someone can play with it more. We have an SCC iterator, so i think i can make this happen very easily and it should be fast.

My understanding, BTW (not verified), is that with one additional recursive simplification function above, the only difference between this and minimal form should be in irreducible control flow.

But i'll check :)

Updating the patch in a second; i templated the trivial phi removal, fixed some bugs, and started on unit tests enough for others to play with it.

It should be a pretty direct conversion from the old SSAUpdater (AddAvailableValue->writeVariable, GetValueAtEndOfBlock->readVariableAfterDef, GetValueInMiddleOfBlock->readVariableBeforeDef)

A few notes from more testing:

  1. The old updater crashes if you call GetValueAtEndOfBlock for a currently empty block (you have to insert the instruction first). (I can make the new one work on incomplete CFGs, actually, if we want.)
  2. The only case where we generate spurious phis now is irreducible control flow. This will go away once i implement the SCC algorithm.

Something like this:

define i32 @F(i32, i32) {
if:
  br i1 true, label %loopmain, label %loopstart

loopstart:                                        ; preds = %loopmain, %if
  br label %loopmain

loopmain:                                         ; preds = %loopstart, %if
  br i1 true, label %loopstart, label %afterloop

afterloop:                                        ; preds = %loopmain
  ret i32 %0
}

Will become

define i32 @F(i32, i32) {
if:
  br i1 true, label %loopmain, label %loopstart

loopstart:                                        ; preds = %loopmain, %if
  %update1 = phi i32 [ %update, %loopmain ], [ %0, %if ]
  br label %loopmain

loopmain:                                         ; preds = %loopstart, %if
  %update = phi i32 [ %update1, %loopstart ], [ %0, %if ]
  br i1 true, label %loopstart, label %afterloop

afterloop:                                        ; preds = %loopmain
  ret i32 %0
}
  • Add basic unit tests, comment better, add SCC-based minimization.

This should now produce minimal SSA.

MatzeB added a subscriber: MatzeB.Jan 20 2017, 5:49 PM

I am obviously a fan of the algorithm and biased :)

  • Interesting tweak with the visited set (compared to the paper), guess that saves us RAUWs in many cases at the cost of extra visited set lookups.
  • I am surprised to find readVariableBeforeDef() as public API, in my experience outside users of the algorithms do not need it (but maybe you have a good use case). I probably would have named it readVariableAtBlockBegin() or readVariableLiveIn() and the "normal" variant simply readVariable().
lib/Transforms/Utils/SSAUpdaterNew.cpp
60

Shouldn't this be readVariableBeforeDef(Var, BB)?

86

Maybe PredVal->getParent() == BB && isa<PHINode>(PredVal) is slightly more efficient.

With respect to SCC handling: In my experience the benefits of a truly minimal SSA form are very small and may not justify the extra complexity and calculations. At least back then we mostly added that version to the paper to present a complete solution; the default libfirm code didn't use it (but had a pass in the pipeline that would remove extra phis once).

My understanding, BTW (not verified), is that with one additional recursive simplification function above, the only difference between this and minimal form should be in irreducible control flow.

Yep we proved that in the paper. (Of course proving it correct doesn't replace testing :)

As for what's public:
The paper comments that for incremental updates, you want readVariableBeforeDef. This is equivalent to GetValueInMiddleOfBlock, which we make public in SSAUpdater.

The paper's API is actually identical to SSAUpdater's old API circa 2010, with the functions renamed.
The only difference is the more advanced phi placement algorithms.
I think Bob even has most of the marker algorithm right at one point if you go back through the file's history.

I think if we keep going down this route i would change the API.
Users shouldn't generally have to care where in the block they are.
The real problem on this front is loops.

Given:
main:
def(a)
goto loop

loop:
use (a)
def (a)
goto loop or somewhere else

For loop, you need

  1. To call writeVariable of both defs so it's visible to the algorithm, otherwise, it will not get the value over the backedge.
  2. But ask about the version before the def, because that's where the use is.

I think this should all be hidden, and orderedbb or something used to locate the relative position of the use to the defs.

(this is what i'm doing for MemorySSA, but memory ssa is easier because we have a single variable and every may-def creates a new version. Thus, i don't even need the user to tell me the defs, i can just find the closest def, etc)

lib/Transforms/Utils/SSAUpdaterNew.cpp
60

I think you are commenting on an old version of the patch where it had that bug :)

As for what's public:
The paper comments that for incremental updates, you want readVariableBeforeDef. This is equivalent to GetValueInMiddleOfBlock, which we make public in SSAUpdater.

The paper's API is actually identical to SSAUpdater's old API circa 2010, with the functions renamed.
The only difference is the more advanced phi placement algorithms.
I think Bob even has most of the marker algorithm right at one point if you go back through the file's history.

I think if we keep going down this route i would change the API.
Users shouldn't generally have to care where in the block they are.
The real problem on this front is loops.

Given:
main:
def(a)
goto loop

loop:
use (a)
def (a)
goto loop or somewhere else

For loop, you need

  1. To call writeVariable of both defs so it's visible to the algorithm, otherwise, it will not get the value over the backedge.
  2. But ask about the version before the def, because that's where the use is.

Yep, however that breaks down as soon as you have more than one definition in a block, so there isn't a single before/after def anymore.
Just reviewing the libfirm SSA updater, we indeed first introduced all defs to the algorithm (picking the latest one if there are multiple) and then go over all defs/uses in a block from top to bottom.

I think this should all be hidden, and orderedbb or something used to locate the relative position of the use to the defs.

Yep, some notion of ordering is necessary to get multiple defs/uses in a block "right", finding a way to hide it would be cool.

(this is what i'm doing for MemorySSA, but memory ssa is easier because we have a single variable and every may-def creates a new version. Thus, i don't even need the user to tell me the defs, i can just find the closest def, etc)

As for what's public:
The paper comments that for incremental updates, you want readVariableBeforeDef. This is equivalent to GetValueInMiddleOfBlock, which we make public in SSAUpdater.

The paper's API is actually identical to SSAUpdater's old API circa 2010, with the functions renamed.
The only difference is the more advanced phi placement algorithms.
I think Bob even has most of the marker algorithm right at one point if you go back through the file's history.

I think if we keep going down this route i would change the API.
Users shouldn't generally have to care where in the block they are.
The real problem on this front is loops.

Given:
main:
def(a)
goto loop

loop:
use (a)
def (a)
goto loop or somewhere else

For loop, you need

  1. To call writeVariable of both defs so it's visible to the algorithm, otherwise, it will not get the value over the backedge.
  2. But ask about the version before the def, because that's where the use is.

Yep, however that breaks down as soon as you have more than one definition in a block, so there isn't a single before/after def anymore.
Just reviewing the libfirm SSA updater, we indeed first introduced all defs to the algorithm (picking the latest one if there are multiple) and then go over all defs/uses in a block from top to bottom.

There is a better way, IMHO, to do this, if you have the set of all defs/uses at once (which in LLVM you can get easily) and it's fixed: use a variant of traditional renaming.

The traditional SSA renaming algorithm is O(B+I) because it walks all instructions in dominator order by walking the blocks in the dom tree.
The only reason it does this is to push/pop the renaming stack in the right order.
But it's not actually necessary to walk the blocks at all to get the right ordering of the renaming stack.
You can get the dfs completion numbers of the dominator tree node for each def/use's block (and, if it isn't the only def/use in the block, use OrderedBasicBlock or something to get a local numbering; a lot of passes i work on have the local numbering lying around anyway).

Then:

  1. Sort the defs/uses by the dfs completion numbers, then by the local numbers.
  2. For each def/use (the rename stack holds <member, domnodedfsin, domnodedfsout> entries):
     • Sync the rename stack so its top dominates the current def/use, popping until the top's dfsin/dfsout interval contains the def/use's dom node.
     • Push onto the rename stack if you have a def.
     • If the stack is empty and you are staring at a use, it's undef.
     • Otherwise, any use is defined by the top of the stack.

(i'm going to blog this algorithm at some point, because i don't see it really described in any paper, but it's O(def+uses) if you have a local numbering around.)

This is if you've inserted phi nodes using some other algorithm, of course.
You can integrate phi node insertion into this algorithm (for starters, IIRC, you only ever have to insert a phi node if stack.size() > 1) by integrating the paper's algorithm into this one.

I use a variant of this algorithm to do value numbering elimination in O(number of things equal to each other) instead of O(I) like the standard algorithms.
It's also used to do renaming for copy insertion here: https://github.com/dberlin/llvm-gvn-rewrite/blob/newgvn-predicateinfo/lib/Transforms/Utils/PredicateInfo.cpp
(in that case, you can even play the marker algorithm style game and only insert a copy once you see a use that will use it. This avoids having to compute liveness just to avoid useless insertions)

I think this should all be hidden, and orderedbb or something used to locate the relative position of the use to the defs.

Yep, some notion of ordering is necessary to get multiple defs/uses in a block "right", finding a way to hide it would be cool.

I think we can do it.

I read the paper and reviewed the marker algorithm. Some comments inline.
Side note, for didactic purposes: I wasn't familiar with this algorithm before, but I think it's very nice, probably easier to understand than Morgan's book chapter/algorithms on SSA construction.

lib/Transforms/Utils/SSAUpdaterNew.cpp
40–44

I'll be happy if we can add stats about the number of PHIs trivialized, inserted, etc. (so that we can also compare the "basic" marker algorithm and the SCC approach).

78–85

You can point out here (or elsewhere) that the updater can be somehow trivially extended to handle incomplete CFGs.

93–96

As you're describing the single predecessor case above, I would add a comment here saying something like: Multiple predecessors case, collect all the definitions and construct a phi, if necessary

99

dot at the end of comment, here and everywhere else.

130

auto *, here and everywhere else you use dyn_cast<>, dyn_cast_or_null<>, cast<> etc..

148–149

Just fold OperRange inside tryRemoveTrivialPhi, maybe.

174–176

It would be good if we can add a test showing that we trivialize self-referencing phi, if possible.

175–176

Also, I understand that we don't need to call replaceAllUsesWith as there are no non-self uses, but shouldn't we call eraseFromParent here?

188–192

This is not the first time that I find SCCIterator being not-very-general.

bryant added a subscriber: bryant.Jan 24 2017, 1:29 PM
davide added inline comments.Jan 24 2017, 3:51 PM
include/llvm/Transforms/Utils/SSAUpdaterNew.h
1–2

Unformatted, I think.

37

The whole comment needs formatting. The explanation is very good, IMHO.

37–38

extra ///

42–43

s/doesn/does/

88–92

Comments on these member variables, maybe? (as we do in NewGVN).

lib/Transforms/Utils/SSAUpdaterNew.cpp
1–2

Unformatted.

48–56

These functions appear to be trivial, move to the header, maybe?

davide added inline comments.Jan 24 2017, 4:13 PM
lib/Transforms/Utils/SSAUpdaterNew.cpp
197

This seems to be more PhiToDFSNum than Root, unless I'm missing something.

205–207

Also, if we special case DFSNum == 0, probably adding a comment would help the reader (or using const int or a macro).

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

One data point: in https://llvm.org/bugs/show_bug.cgi?id=16756 we have a test case where JumpThreading is taking 30% of the total time of a clang -O3 compilation.
One large issue was that we were spending a lot of time in SSAUpdater (Duncan mentions in this PR that the bottleneck is SSAUpdater).

When I looked at it, it seemed to me that we were constantly rebuilding the DominatorTree inside the SSAUpdater over and over for each and every replacement, while it seemed possible to build one up front and reuse it.

anemet added a subscriber: anemet.Jan 25 2017, 9:48 AM

I've never really heard people complain about our current SSAUpdater (about it being too slow or difficult to use) -- well except in this review :). Is there a specific problem you're trying to address with this?

One data point: in https://llvm.org/bugs/show_bug.cgi?id=16756 we have a test case where JumpThreading is taking 30% of the total time of a clang -O3 compilation.
One large issue was that we were spending a lot of time in SSAUpdater (Duncan mentions in this PR that the bottleneck is SSAUpdater).

When I looked at it, it seemed to me that we were constantly rebuilding the DominatorTree inside the SSAUpdater over and over for each and every replacement, while it seemed possible to build one up front and reuse it.

It also rebuilds the dominance frontiers. The problem with this is that the answer changes depending on the definition blocks you choose. We have a minimal linear-time calculator (not used in the current updater, but used by PromoteMemToReg now), but it's still linear time per variable in the worst case. It does do tricks to minimize the work it does.
Before i started down the path of the current algorithm, i also read all the latest multi-variable phi placement algorithms.

In practice, this seemed the easiest starting point, as the "share work between frontier calculation" algorithms get fairly complicated.

It's great to see some new work in this area. It has been years since I rewrote the SSAUpdater and I've always been worried about the performance. It seems fast enough for the common cases but the worst-case behavior is terrible. Anyway, it has been a long time since I looked at this, but I remember concluding that it was hard to really fix it right without changing the API.

The basic part of the Braun et al. approach looks very similar to what we had prior to my rewrite (r100047 through r101612). Unfortunately it has been a long time and I'm having trouble remembering the details of the problem we ran into. The gist of it was that SSA construction is different from SSA updating because the latter needs to determine if the necessary phis are already present.

We ran into cases where the SSA updater was unable to determine that a loop already had a suitable phi, so it would add a new one, creating phi cycles that grew by one every time the SSA updater was invoked on the code (which can be a large number of times). Perhaps this only came up with irreducible control flow or something, but basically you've got to get minimal SSA all the time to avoid this situation.

At least right now, it should avoid extra phis in all but the irreducible case (unlike the previous attempt, which was, again, quite admirable :P).
There, it will not create them more than once (or shouldn't).

If those turn out to matter, we can use the SCC algorithm to minimize them (and i can build it on the fly, it's just uglier to do scc finding as we go instead of after).

lib/Transforms/Utils/SSAUpdaterNew.cpp
175–176

Yes, if phi is not null, we do need to erase it in that case.
Whoops.

Just to note: I'll get back to this at some point.

joerg just gave me a testcase where, out of the 3 minutes it takes, 2+ minutes are spent forming and verifying LCSSA, all due to the updater speed :(

hiraditya added inline comments.
unittests/Transforms/Utils/SSAUpdaterNew.cpp
2

File name...

@dberlin I just came back to this while I was looking at NewGVN PRE.
I decided to give it a try to see what the API looked like, and I personally didn't find any problems with it (I might have just used it wrong, FWIW).

https://reviews.llvm.org/P7997

In particular, the old API GetValueInMiddleOfBlock has been a source of confusion (at least to me).

With your patch and my patch on top of it, I still saw one test failing:

--- /home/davide/old.ll 2017-05-30 20:56:21.058869000 -0700
+++ /home/davide/new.ll 2017-05-30 20:56:30.450514797 -0700
@@ -316,7 +316,7 @@
   br label %block4

 block4:                                           ; preds = %block3, %block2
-  %D = phi i32 [ 87, %block2 ], [ 97, %block3 ]
+  %D = phi i32 [ 97, %block3 ], [ 87, %block2 ]
   %A = phi i32 [ -1, %block2 ], [ 42, %block3 ]
   br i1 %cmpxy, label %block5, label %exit

@@ -350,7 +350,7 @@
   br label %loop

 loop:                                             ; preds = %loop, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ 0, %loop ]
+  %Y2 = phi i8 [ 0, %loop ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 192, %loop ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()
@@ -372,7 +372,7 @@
   br label %loop

 loop:                                             ; preds = %cont, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ %Y2.pre, %cont ]
+  %Y2 = phi i8 [ %Y2.pre, %cont ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 3, %cont ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()

I guess this is because we don't have a consistent ordering of PHI args. If that's the only problem, I might consider looking at it next (and then keep converting the remaining users).

@dberlin I just came back to this while I was looking at NewGVN PRE.
I decided to give it a try to see what the API looked like, and I personally didn't find any problems with it (I might have just used it wrong, FWIW).

https://reviews.llvm.org/P7997

In particular, the old API GetValueInMiddleOfBlock has been a source of confusion (at least to me).

With your patch and my patch on top of it, I still saw one test failing:

--- /home/davide/old.ll 2017-05-30 20:56:21.058869000 -0700
+++ /home/davide/new.ll 2017-05-30 20:56:30.450514797 -0700
@@ -316,7 +316,7 @@
   br label %block4

 block4:                                           ; preds = %block3, %block2
-  %D = phi i32 [ 87, %block2 ], [ 97, %block3 ]
+  %D = phi i32 [ 97, %block3 ], [ 87, %block2 ]
   %A = phi i32 [ -1, %block2 ], [ 42, %block3 ]
   br i1 %cmpxy, label %block5, label %exit

@@ -350,7 +350,7 @@
   br label %loop

 loop:                                             ; preds = %loop, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ 0, %loop ]
+  %Y2 = phi i8 [ 0, %loop ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 192, %loop ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()
@@ -372,7 +372,7 @@
   br label %loop

 loop:                                             ; preds = %cont, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ %Y2.pre, %cont ]
+  %Y2 = phi i8 [ %Y2.pre, %cont ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 3, %cont ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()

I guess this is because we don't have a consistent ordering of PHI args. If that's the only problem, I might consider looking at it next (and then keep converting the remaining users).

ping on my last comment, in case it got lost in the churn :)

@dberlin I just came back to this while I was looking at NewGVN PRE.
I decided to give it a try to see what the API looked like, and I personally didn't find any problems with it (I might have just used it wrong, FWIW).

https://reviews.llvm.org/P7997

In particular, the old API GetValueInMiddleOfBlock has been a source of confusion (at least to me).

With your patch and my patch on top of it, I still saw one test failing:

--- /home/davide/old.ll 2017-05-30 20:56:21.058869000 -0700
+++ /home/davide/new.ll 2017-05-30 20:56:30.450514797 -0700
@@ -316,7 +316,7 @@
   br label %block4

 block4:                                           ; preds = %block3, %block2
-  %D = phi i32 [ 87, %block2 ], [ 97, %block3 ]
+  %D = phi i32 [ 97, %block3 ], [ 87, %block2 ]
   %A = phi i32 [ -1, %block2 ], [ 42, %block3 ]
   br i1 %cmpxy, label %block5, label %exit

@@ -350,7 +350,7 @@
   br label %loop

 loop:                                             ; preds = %loop, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ 0, %loop ]
+  %Y2 = phi i8 [ 0, %loop ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 192, %loop ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()
@@ -372,7 +372,7 @@
   br label %loop

 loop:                                             ; preds = %cont, %entry
-  %Y2 = phi i8 [ %Y, %entry ], [ %Y2.pre, %cont ]
+  %Y2 = phi i8 [ %Y2.pre, %cont ], [ %Y, %entry ]
   %i = phi i32 [ 4, %entry ], [ 3, %cont ]
   %X2 = getelementptr i8, i8* %p, i32 %i
   %cond = call i1 @cond2()

I guess this is because we don't have a consistent ordering of PHI args. If that's the only problem, I might consider looking at it next (and then keep converting the remaining users).

Yes, it should hit the phi arguments in pred order, roughly, but it won't be perfect.
We should just sort them.

Note that we also will not go looking for existing phis to reuse.
IMHO, doing so (as the current SSAUpdater does) is not a good plan.
Either let CSE/instsimplify/any of the million things that will fix this clean it up,
or we need to extend the lifetime of SSAUpdater objects and hash the arguments of phi nodes in blocks we see.
Right now, what SSAUpdater does is badly N^2.

Anyway, if you want to work on this patch, feel free to submit diffs and take over this revision. I probably won't get back to it for a few months at least.

Thanks for your comment, I'll see what I can do given this shows up in some of our internal workloads.
One thing: I agree we should change the API at some point (and maybe use an O(defs + uses) renamer as we do in PredicateInfo) but, for starters, we should be able to have a drop-in replacement for RewriteUse, i.e. the new algorithm will always pick the correct value?

void SSAUpdater::RewriteUse(Use &U) {
  Instruction *User = cast<Instruction>(U.getUser());

  Value *V;
  if (PHINode *UserPN = dyn_cast<PHINode>(User))
    V = GetValueAtEndOfBlock(UserPN->getIncomingBlock(U));
  else
    V = GetValueInMiddleOfBlock(User->getParent());
[...]
dberlin added a comment.EditedJun 26 2017, 6:19 PM

Thanks for your comment, I'll see what I can do given this shows up in some of our internal workloads.
One thing: I agree we should change the API at some point (and maybe use an O(defs + uses) renamer as we do in PredicateInfo) but, for starters, we should be able to have a drop-in replacement for RewriteUse, i.e. the new algorithm will always pick the correct value?

void SSAUpdater::RewriteUse(Use &U) {
  Instruction *User = cast<Instruction>(U.getUser());

  Value *V;
  if (PHINode *UserPN = dyn_cast<PHINode>(User))
    V = GetValueAtEndOfBlock(UserPN->getIncomingBlock(U));
  else
    V = GetValueInMiddleOfBlock(User->getParent());
[...]

I believe this should be:

void SSAUpdater::RewriteUse(unsigned Var, Use &U) {
  Instruction *User = cast<Instruction>(U.getUser()); 
  BasicBlock *VarBB = User->getParent();
  if (auto *PN = dyn_cast<PHINode>(User))
    VarBB = PN->getIncomingBlock(U);

  U.set(readVariableAfterDef(Var, VarBB));
}

(Note all these APIs, both old and new, assume you are rewriting in program order. It's possible to make it not require that, by storing all the defs in a given block and using ordered instructions to see where the use occurs relative to them.)