This is an archive of the discontinued LLVM Phabricator instance.

Update MergedLoadStoreMotion to use MemorySSA
AbandonedPublic

Authored by dberlin on Mar 28 2015, 5:43 PM.

Details

Summary

This updates MergedLoadStoreMotion to use and preserve MemorySSA.
It depends on D7864

Prior to this, the algorithm for loads was N^2, and for stores, it was N^2R, where N is the number of instructions in
the block, and R is the number of removed stores
(It restarted the reverse walk every time it removed a store due to iterator invalidation).

It is now O(M) (where M is the number of memory instructions in the two blocks) for loads,
and O(max(M,S^2)) for stores (because we have no downwards clobbering API yet, the hash table does not help
us determine memory dependence for our uses).

I have deliberately not changed behavior in terms of what loads/stores it will remove or the compile time
controls, in order to make the change minimal.

(Hopefully phabricator will not screwup a revision that is a branch of a branch,
arc diff --preview showed the right stuff)

Diff Detail

Event Timeline

dberlin retitled this revision from to Update MergedLoadStoreMotion to use MemorySSA.
dberlin updated this object.
dberlin edited the test plan for this revision. (Show Details)
dberlin added reviewers: hfinkel, reames, nlewycky.
dberlin added a subscriber: Unknown Object (MLST).

Delete dead functions
Trigger an update so phabricator sends out this review, since it got eaten during sendgrid issues

Update for MemorySSA API Update

reames edited edge metadata.Apr 9 2015, 9:10 AM

FYI, I'm basically reviewing this as a new algorithm. Trying to match it up with what's there before doesn't seem particularly worth doing given the algorithmic problems you commented on.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
148

Can you use lambdas for these? I'm not entirely sure that works, but if it does, the code would be cleaner.

Also, I'm surprised that a generic instruction hash function doesn't already exist. What does GVN use for this?

159

Is the extra A == B check needed? I would expect that to be inside isSameOperationAs.

361

Not quite clear what you're trying to say with this comment? The code appears to just be updating MemorySSA for the transform performed.

444

I'm a bit confused here. If we found the set of load instructions which share the generation number at the start of a block and we're merging two blocks with a single shared successor, why do we need any further legality checks? Aren't all the loads guaranteed to have the generation number at the end of the common block?

Are you possibly trying to do two generations at once here? Or something else more complex?

549

Are you guaranteed there's not another phi here? Not entirely sure what this is doing. It may be completely correct, but it looks suspicious when compared to instruction insertion into a basic block.

582

Given this is a linear traversal of Pred0, does the reserve really gain you anything here?

591

Neat!

591–592

Ok, I'm again a bit confused by the algorithm. :)

What I expected to see was something along the following:

  • Find store in each block which is a candidate for merge (using memory-def edges from MemoryPhi)
  • Merge any loads in those basic blocks which use the generation if possible.
  • Search downward from each store to find a conflicting memory access or end of block. (This could either by by walking instructions or mem-use edges.)
  • If both reach end of block, merge.
  • Repeat previous steps with newly created MemoryPhi for earlier stores if any.

During building, MemorySSA currently optimizes uses to the nearest
dominating clobbering access for you, done at the suggestion of 2
people. We could make it stop doing this (it's just a flag), but the
tradeoff is most things end up walking more in practice.
Additionally, as things optimize, and you update memssa, you will
often end up with the *same result* anyway. The only way to stop that
is to make it so you are required to use "nearest dominating
memorydef" instead of "dominating memorydef" when updating. We do
this in the API's where we do insertion (addNewMemoryUse), but we
don't check/verify it in replaceMemoryAccess style API's, we only
check standard domination.

The other tradeoff, btw, is that while doing MemoryUse optimization
while building makes this particular algorithm slightly harder, it
makes an (IMHO) much more useful thing a lot easier.

A lot of passes (GVN, memcpyopt, etc) want to know whether they can
replace a load with a load to eliminate something:

1= MemoryDef(liveonentry)
store a
2 = MemoryDef(1)
store b
3 = MemoryDef(2)
store c
4 = MemoryDef(3)
store d
MemoryUse(1)
load a, 4 bytes
load a, 8 bytes
(or different types, or whatever)

By optimizing MemoryUse's during building, we are guaranteed that if
we do getMemoryAccess(load A)->definition(which is store a)->uses, or
we now have all loads that actually use *that* store's value. This
means we can look at the other loads of that store value and see if we
can reuse them.

Doing this without use optimization of uses would be much harder.
you'd need to use a downwards API from the store to get all possible
real uses, which may encompass walking the entire program.

On the other hand, the tradeoff of doing MemoryUse optimization is
that it means to get all the loads in a block, you really have to ask
for all the loads in the block, instead of trying to hope you have
chains that give it to you :)

dberlin edited edge metadata.
  • Update for new insertion API. Clean up comments a bit
  • Update to walk stores

This now uses an immediate use walker rather than the alias ABI to determine downward exposure of stores

reames requested changes to this revision.Oct 8 2015, 10:28 AM
reames edited edge metadata.

Waiting for update or possibly submission. Have lost track.

This revision now requires changes to proceed.Oct 8 2015, 10:28 AM

I have a large update coming to this, was just trying to make a sane update
API (the rest is easy).

dberlin edited edge metadata.

Update to work in progress
This will work for loads, stores coming up.

This also includes the memoryssa update API, which will be split out and committed separately

dberlin edited edge metadata.
  • Add support for stores. Code generated is correct but updating of memoryssa is wrong and will be fixed
  • Add verification for completeness of PHI nodes
  • Add basic PHI creation API
  • Update MemorySSA for store sinking

This version should correctly update memoryssa for store sinking :)
(I checked that it does on the llvm testcases, i'll test it further as i
split out the memoryssa part)

dberlin added inline comments.Jun 15 2016, 12:10 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
148

To answer the original question, i'm checking to see about lambdas.

No generic Instruction hasher exists that covers precisely the set of stuff we care about. GVN does what newgvn does, which is it defines its own expression class, shoves operands in those classes, and calls hash_combine on the whole shebang:

friend hash_code hash_value(const Expression &Value) {
    return hash_combine(
        Value.opcode, Value.type,
        hash_combine_range(Value.varargs.begin(), Value.varargs.end()));
  }

The default hasher will end up hashing the pointer values themselves, which is what gvn wanted in this case.

It is not useful, however, for hashing "parts of operations you care about", or "do two operations look pretty similar".

Past that, i did not change the things included in the hash to ensure consistency of what is optimized with the existing code.

159

It is not, it goes straight into comparing all sorts of fields/data.

gberry edited edge metadata.Jun 16 2016, 11:52 AM

A couple of minor comments I noticed when reading through this for my own edification:

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
617

It seems like you forgot to finish this sentence.

975

typo: "MemoryDefwith"

987

typo: "replaceALlUsesWith"

dberlin marked 3 inline comments as done.Jun 16 2016, 4:20 PM
dberlin edited edge metadata.
  • Update MergedLoadStoreMotion to optionally use MemorySSA
  • Add support for stores. Code generated is correct but updating of memoryssa is wrong
  • Add verification for completeness of PHI nodes
  • Add basic PHI creation API
  • Update MemorySSA for store sinking
  • Fix typos

A few comments in passing.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98

Should we try and unique the name a bit more (e.g., "use-memssa-mlsm") under the assumption that other passes will have a command-line option to opt into using MemorySSA? Alternatively, we could include the option in the PassManagerBuilder and use this single flag to control the use of MemorySSA for all passes during the transition period. Just my 2c.

176–177

Should the DominatorTree be added as preserved?

177

Perhaps I missed something, but why is TargetLibraryInfo required?

542

No need for extra curly brackets.

706

but -> put

721

Pre-increment here?

728

Any reason we don't just increment in the for statement itself?

733

Please add a period.

817

Please add a period.

dberlin marked 10 inline comments as done.Jun 17 2016, 8:26 AM
dberlin added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98

I'll rename it to use-memoryssa-mldst (or whatever the shortname for this pass is).

A single flag would be nice, but would make it much harder to turn memoryssa on and off for each pass, and the point of the flag is to be able to test/debug regressions until we are satisified and get rid of the non-memoryssa version.

(ala what happened with SROA, etc and the ssaupdater)

176–177

This is not necessary, setPreservesCFG will preserve it, and all other passes marked CFG-only

177

It is not, this is a merge error.

721

Not sure what line this is supposed to go with, i can't find a post-increment around :)

728

Fixed. Originally, the updater was modifying the accesslist we were walking on and invalidating the iterator.
It turns out to be non-trivial to change this, but i never moved this loop back to account for this.

dberlin added inline comments.Jun 17 2016, 8:41 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
728

Actually, it can't be done, we still invalidate the current iterator for loads.
I'll document this.

Update for requested changes

mcrosier added inline comments.Jun 17 2016, 8:54 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98

I agree we should have a flag per pass to simplify debugging.

721

Sorry if I wasn't clear. I was suggesting we remove the pre-increment above and do it in the if statement.

++NLoads;
if (NLoads * Size1 ...)

vs

if (++NLoads * Size1 ...)
dberlin marked an inline comment as done.

Rebase for passmanager conversion

It looks like this change still contains all of the MemorySSA changes that were split out into D21463

Properly rebase on top of other patch

Should be fixed now.

Would it make sense to split out the DFS numbering code motion barrier code into a separate change?

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
977

Typo: "explosed".
Also, there is no 'End' parameter as referenced in the comment.

Rebase after dependent revision commited

Also run verification pass with new pass manager for tests.

Any other comments?
If not, because it's off by default (and it looked like there are no real objections), my plan was to commit it.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
721

Oh, sorry. I copied this code from the non-memoryssa version.
(In truth, the code can stand a lot of cleaning up, improvement, and reducing of restrictions)

I'm still digesting this, but here are a couple more nits I noticed.
I don't want to hold up approval from anyone else though.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
757

Isn't MemorySSA only preserved if 'UseMemorySSA' is true?

982

getDFSNumIn/getDFSNumOut are documented as: These are an internal implementation detail, do not call them.
which seems to conflict with this usage.

dberlin added inline comments.Jun 22 2016, 12:54 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
757

Yes, fixed.

982

I will change the docs for them in a separate patch.

The change to make updateDFSNumbers public, etc, was done specifically to enable this usage.
See the thread titlte "[PATCH] Make updateDFSNumbers API public" from last year, and there was a review where i computed them separately, and it was suggested to just make it public.

(and happy to wait if you want to review it)

Drive-by comments about comments (insert "we need to go deeper" meme here)

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
791

Potentially unfinished thought?

797

Did you mean "only move simple (non-atomic, non-volatile)"?

  • Update for review comments on comments
dberlin added inline comments.Jun 22 2016, 1:40 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
982

This is now fixed in the docs for those functions

Gerolf added a subscriber: Gerolf.Jun 22 2016, 7:07 PM
Gerolf added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
297

Loc0,1 should be moved below the conditional. Then they are definitely needed.

346

This is a MSSA update problem. How about
if (MSSA && MSSA->update(HoistCan, HoistedInst))

MSSA->removeMemoryAccess(ElseInst);
542

You could remove the conditional here and simply return from updateMemorySSA when MSSA is null.

617

How about: Visit blocks in DFS order and number instructions. For each block track first hoist barrier and last sink barrier.

745

I think this is the only place where you need UseMemorySSA. All other instances can be handled by if (MSSA).

1032

Shouldn't updates be owned by MSSA rather than each client?

dberlin marked 10 inline comments as done.Jun 22 2016, 11:54 PM
dberlin added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
346

As mentioned, if you look early enough in the thread on this, you can see at one point it was, in fact that API, and i was explicitly asked to make simpler apis instead of these kinds of composed ones :)

The way it is now is also consistent with how we do updates for everything else both SSA related (where you do exactly what you see below), and analysis related, in LLVM.
(IE memdep has you call invalidateCachedPointerInfo and such, dominators requires you set and redirect dominators on your own)

In fact, if memoryssa was part of the IR, there would be no difference between the way this code looks and the rest of the function (which is updating the IR) looks . Because it's a side datastructure the naming is a little different and we haven't built all the same utilities yet.

Additonally, the kind of API you suggest above would have wildly varying complexity and is overkill vs building utilities that handle the common cases, which is what we do elsewhere for ssa/dom/etc updates.

I'm more than happy to commit to building those utilities as we come across common cases.

1032

(replying here since phab is not good at processing email like this):

That would be inconsistent with what we do in the rest of LLVM :)
MemorySSA is an SSA form, and like SSA in LLVM, passes are responsible for keeping SSA up to date. This is also true of most analysis preservation.
You can see that splitCriticalEdge takes a memdep argument, for example, and will update memdep instead of memdep updating itself.
Same with dominators. Passes that want to preserve Dom are responsible for redirecting the dominators where they should go. They can screw it up, too!

Doing general updates (IE "I have no idea what happened, you go find out and fix it") is also expensive, and at least for every pass i've converted so far, completely unnecessary.

I also had more functional updaters (IE replaceAccessWithHoistedAccess) in memoryssa at earlier points, but if you look earlier in the review, you'll see others wanted it to be simple functionality instead.
In retrospect, i agree.
We can build a general updater if we need it, and we can build utilities to handle common updates if we need it. So far, it hasn't been necessary.

At some point, the better question is really "why is memoryssa a side data structure instead of just part of the normal IR". But that's a question for another day.

  • Update for review comments
dberlin added inline comments.Jun 23 2016, 10:19 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
1032

(and, just to also point out, the update algorithm we use is specific to diamonds, and would not work for more general control flow. It also would not work without the set of legality testing, etc, we perform elsewhere in the code )

I'll take a more in depth look into the load and store merge routines also. At a first glance it seems one could just add a few MSSA hooks rather than copy-paste-modify the code base.
I also take a (late :-() look at the core MSSA design. Could you outline the high-level MSSA design in more detail than http://reviews.llvm.org/D7864?Eg. how does it compare/relate to http://www.airs.com/dnovillo/Papers/mem-ssa.pdf? Thanks!

-Gerolf

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
346

Ok, it would probably not be possible foresee all use cases. I certainly agree to the iterative approach you propose. I'm surprised though that a simple replacement of a load has not come up elsewhere.

1032

What got me here is that I expected the update to be simple. Why is here more involved than replacing two MD's with one, possibly removing a MP? From the clients perspective updates like this look too complicated to get right. Maybe there is verifier in place, but in the code I don't even see a single assertion but could help debugging issues should something go wrong. Perhaps we can come up with some utility routines that make simpler to grasp: when two memory or more memory operation get moved, this is the blueprint for updating MSSA and guarantee it stays consistent.

1099

There could be a lambda or function that takes care of the two loops.

Also, ust to give you some idea of general update complexity: If you don't
bound the problem with some common cases, it's simply not possible to make
it faster than current creation (Ie it would be simpler/easier to not
preserve memoryssa in those cases).

For the common cases, assuming you tell us what variables to remove and
what variables were added in what blocks:

  1. If you only removed and replaced variables (Ie literally no code motion

or insertion), it requires the renaming pass only. It could be further
optimized to not redo use chain optimization for memoryuses if you can
guarantee you only touched memoryuses. For memorydefs, if you can guarantee
(and you probably can't) that none of the aa relationships have changed,
you would not have to redo use chain opt there either. You can't guarantee
this because basicaa, et al have walkers with random hard limits in them,
and so you may get better results simply by reducing the length of a chain
of things it had to walk, even if there was in fact, no aliasing either way.

  1. if the cfg has not changed, and you have added/removed variables: if you

added memoryuses, if we stay with pruned ssa, we have to recompute the IDF.
We could minimize the size we calculate by tracking some things related to
where the last uses in the program are relative to the iterated dominance
frontier. It's complicated however.
After recomputing IDF, we also have to redo renaming. For inserting
memoryuses, if we tracked the first/last version of a block (and updated
it through the creation/removal apis), we only have to rename parts of the
dom tree where the first/last version changes.

(IE if we changed first version in a block due to phi node insertion, but
last version is the same, we do not have to rename children of that block.
if we changed last version but not first version, we only have to rename
children but not the current block)
For inserting stores, it's something close to this but a bit more
complicated.

  1. if the cfg has changed, without tracking a lot more under the covers

related to merge/join sets, it's not possible to make it faster than doing
it from scratch.

The TL;DR of all this is that a general update mechanism is very hard to
make faster than for scratch without giving it all kinds of specific info,
and usually figuring out that info is as hard as figuring out the update
algorithms, in my experience.

gberry added a comment.Jul 5 2016, 8:48 AM

Sorry if this isn't the right place, but I have a general question about what it means to preserve MemorySSA, specifically regarding the defining accesses of MemoryUse nodes. Is the idea here that we make a best effort to keep the MemoryUse defining access links optimized (i.e. never pointing to a no-alias def)? Because of the limits of basic-aa, it isn't possible to guarantee this property after any code transformation, even in the limited case here, since the alias results for completely unrelated load/stores may have been affected, right?

Gerolf added inline comments.Jul 5 2016, 8:42 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
809

The code from here could be a function. I prefer a function to fit on my screen.

832

Why do you need this? When the first condition is false the while loop does not execute. But then SafeToLoadUnconditionally is not needed anyway

848

Within a block I thought the DFS numbers are
0
1
2
...
So I would expect loop up(load) < BBHoistBarrier. I'm probably wrong about the DFSs.

850

Load1 is loop invariant and the continue condition can be computed in the header.

862

That comment does not look right. Load0 and load1 just got hoisted and LookupIter->first could still be Load0.

Also, not hoisting above loads that had not been hoisted is too restrictive. With better/cheaper DA this optimization can be more aggressive. Perhaps this could be a FIXME.

dberlin added inline comments.Jul 5 2016, 10:45 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
809

Fixed

832

It's moved here because it's loop-invariant. That's also why the check matches the loop check, to avoid calculating it if the loop won't execute :)
Otherwise, we would calculate it on every loop iteration, when it doesn't change.

848

You are thinking about it backwards i think.
The dfs numbers are indeed as you list them

If the block looks like this:

0
1 - load barrier
2 - load

We can't hoist.

If it's
0
1 - load
2 - load barrier

We can hoist.

So if DFS(load) > DFS(load barrier), that means the load barrier is *above* us in the block (and we come *after* the load barrier), and we can't hoist past it.

850

Fixed

862
  1. Yes, i'll fix.
  2. This code was copied from mergeLoads originally. It is indeed too restrictive (there are a lot of random restrictions we could relax). I'll add a FIXME.

Update for review comments

A few more comments/questions below. I still have a few parts I haven't fully grasped yet, but don't let me hold up approval.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
93

vector doesn't appear to be used?

158

Can't you just compute these instruction indices for the diamond then/else blocks lazily? The same goes for LastSinkBarrier, which wouldn't even need a map in that case.

773

Maybe check Load1->isSimple() here? Maybe factor out all these checks into a function and use it for the Load0 hash table insertion as well?

801

Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.

999

Is there a reason you're using a queue for this work-list instead of the more common SmallVector?

dberlin marked an inline comment as done.Jul 6 2016, 5:33 PM

I'll be honest. I think we are going a bit overboard for something that is off by default and is meant to be an identical algorithm to an existing pass that we know can stand improvements.

While I am all for trying to get code to be as good as possible, i don't think we need to make it completely perfect on the first pass when it's easily subject to incremental improvements :)

IE i think there is some point that is "good enough" to start, particularly for things that we can improve before we flip them to be the default.

For all we know, changes we have to make to account for converting the other passes are going to also make us change code here anyway.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
93

fixed

158

Yes, you can compute them once on-demand for the blocks, but it takes nearly zero time right now. It did not seem worth it for a first pass to build something to do it :)
Remember this is supposed to be an NFC conversion to start, and building the barrier maps already converts what was N^2 before to N.

773

Let us please not do that in this patch.
Doing that would change this algorithm to catch a different set of loads than the non-memoryssa version.
I would like to keep the algorithms identical in what they catch for the moment, so debugging is easy.

Eventually, when the mssa version is the default and the non-mssa version killed, we can improve it to be whatever we want.

801

If that was the case, all the loads in that block are equal (because they also passed isSameOperation, etc) and GVN should have removed them already :)

Note that neither the mssa version or the non-mssa version will handle the case of two identical loads in one side of the diamond, and one load in the other. It will only hoist/merge one of identical ones.

999

We push back and pull front because that exploration order guarantees we terminate at the earliest possible point.

If we popped off the back, it would be an order that would explore lower things before higher things.

Gerolf added inline comments.Jul 6 2016, 8:11 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
832

Ok, I assumed the while loop is on the hot path usually. Then the code checks the (part of the) while condition twice.

848

Hm, it looks alright today :-). Thanks.

891

Footer -> Tail in error message

909

There is similar code in mergeLoads. That could be a utility function that returns false when there is no memory access in at least one of the blocks, e.g.. static bool hasMemoryAccess(BasicBlock *B1, BasicBlock *B2).

931

Should be if (++NStores) to be consistent with your code in mergeLoads. Or change the code for NLoads there.

935

Could be a separate function:

952

Please add a comment about what the equal stores are.

956

Since it is sinking stores would it speed up the code reversing the loop traversal (from second to first)?

981

A comment would be nice why the accesses get cleared here. Or would that fit better in sinkStores?

990

Typo: explosed

1005

What is a good invariant for this loop? Like there shouldn't be more checks than memory references in the code that contains Start? A checked invariant would make me more comfortable with this code.

1021

That looks very wrong. At the minimum this is a convoluted equality test.

1054

I continue struggling with this function. For now I just need some help understanding the comment.
When you say "above" S0, S1 you refer to DFS numbering like Sx is above S0 in this example:

0 Sx
1
2 S0

Correct? This would be consistent with your DFS explanation in a previous explanation.

1056

What is the "top of the diamond"? The head?

1059

What is the "sink location block"? The Tail?

1062

What is "the block"? Perhaps "uses in the tail and/or below" describes it better.

1063

Would it make sense to clearly distinguish phi and memory phi (mphi) nodes?

dberlin abandoned this revision.Jul 6 2016, 9:21 PM
dberlin marked an inline comment as done.
gberry added a comment.Jul 8 2016, 2:00 PM

I'm personally fine with committing this as is since it's is disabled, though I don't feel I have the authority to approve it.

I do have a couple more comments/questions below, mainly to help me gain confidence that my understanding of MemorySSA is correct.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
801

Is that right (the bit about all the loads being equal)?

I'm thinking of a case like the below where all of the loads pass isSameOperation and all have the call as their clobbering access. In this case you'll end up comparing all pairs of loads (assuming none of the loads must alias).

call foo
br %c, %then, %else

then:
load i32, %p1
load i32, %p2
load i32, %p3
...
br %end

else:
load i32, %p4
load i32, %p5
load i32, %p6
...
br %end

end:

1013

Do you even need the domtree info here to check for barriers in a diamond? There are references to hammocks, but as far as I can tell we never actually attempt to optimize hammocks. If that is the case, couldn't this just check for barriers in the same block as Start?

1054

I think there is a bug in the case 1 scenario if there is a pre-existing phi in the bottom of the diamond that references just one of the sunk stores.
For example:

; 1 = MemDef ...
call foo
br %c %then, %else

then:
; 2 = MemDef(1)
store @A
; 3 = MemDef(2)
store @B
br %end

else:
; 4 = MemDef(1)
store @A
br %end

end:
5 = MemPhi(3, 4)

I believe in this case, after updating the end block will look like:

end:
5 = MemPhi(3, 6)
; 6 = MemDef(1)
store @A

Which seems wrong, since the phi is before it's use, but also it seems like having 6's defining access skip over the phi 5 and def 3 could cause trouble, though I'm less sure about the latter.

Let me clarify my last comment, since I hit send too early. This was a bug
i already fixed. Case 1 should also be checking that there are no stores
below. If there are, it should be doing case 2. In essence, the check for
case two should happen before the check for case 1 (since it's not possible
to have stores below without a phi node existing)

This will cause your phi node to get changed to memphi(3, 1), and do the
right thing.

I have a testcase for this.

davide added a subscriber: davide.Dec 26 2016, 7:02 AM

hrrrm, are there any real showstoppers to get this one in? Maybe Dan has no more time time to work on it but this patch already constitutes a strong basis on which we can iterate and it's also disabled by default.
@Gerolf , any opinions?