This is an archive of the discontinued LLVM Phabricator instance.

Update MergedLoadStoreMotion to use MemorySSA
AbandonedPublic

Authored by • dberlin on Mar 28 2015, 5:43 PM.

Details

Reviewers

reames
gberry
nlewycky
hfinkel

Summary

This updates MergedLoadStoreMotion to use and preserve MemorySSA.
It depends on D7864

Prior to this, the algorithm was O(N^2) for loads and O(N^2 R) for stores, where N is the number of instructions in
the block and R is the number of removed stores
(it restarted the reverse walk every time it removed a store, due to iterator invalidation).

It is now O(M) (where M is the number of memory instructions in the two blocks) for loads,
and O(max(M,S^2)) for stores (because we have no downwards clobbering API yet, the hash table does not help
us determine memory dependence for our uses).

I have deliberately not changed behavior in terms of what loads/stores it will remove or the compile time
controls, in order to make the change minimal.

(Hopefully Phabricator will not screw up a revision that is a branch of a branch;
arc diff --preview showed the right stuff.)

Event Timeline

• dberlin updated this revision to Diff 22847. Mar 28 2015, 5:43 PM

• dberlin retitled this revision from to Update MergedLoadStoreMotion to use MemorySSA.

• dberlin updated this object.

• dberlin edited the test plan for this revision.

• dberlin added reviewers: hfinkel, reames, nlewycky.

• dberlin added a parent revision: D7864: This patch introduces MemorySSA, a virtual SSA form for memory. Details on what it looks like are in MemorySSA.h.

• dberlin added a subscriber: Unknown Object (MLST).

Delete dead functions
Trigger an update so phabricator sends out this review, since it got eaten during sendgrid issues

Update for MemorySSA API Update

FYI, I'm basically reviewing this as a new algorithm. Trying to match it up with what's there before doesn't seem particularly worth doing given the algorithmic problems you commented on.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
148	Can you use lambdas for these? I'm not entirely sure that works, but if it does, the code would be cleaner. Also, I'm surprised that a generic instruction hash function doesn't already exist. What does GVN use for this?
159	Is the extra A == B check needed? I would expect that to be inside isSameOperationAs.
362	Not quite clear what you're trying to say with this comment? The code appears to just be updating MemorySSA for the transform performed.
445	I'm a bit confused here. If we found the set of load instructions which share the generation number at the start of a block and we're merging two blocks with a single shared successor, why do we need any further legality checks? Aren't all the loads guaranteed to have the generation number at the end of the common block? Are you possibly trying to do two generations at once here? Or something else more complex?
547	Are you guaranteed there's not another phi here? Not entirely sure what this is doing. It may be completely correct, but it looks suspicious when compared to instruction insertion into a basic block.
580	Given this is a linear traversal of Pred0, does the reserve really gain you anything here?
589	Neat!
589–590	Ok, I'm again a bit confused by the algorithm. :) What I expected to see was something along the following:
1. Find store in each block which is a candidate for merge (using memory-def edges from MemoryPhi).
2. Merge any loads in those basic blocks which use the generation if possible.
3. Search downward from each store to find a conflicting memory access or end of block. (This could either be by walking instructions or mem-use edges.)
4. If both reach end of block, merge.
5. Repeat previous steps with newly created MemoryPhi for earlier stores if any.

During building, MemorySSA currently optimizes uses to the nearest
dominating clobbering access for you, done at the suggestion of 2
people. We could make it stop doing this (it's just a flag), but the
tradeoff is most things end up walking more in practice.
Additionally, as things optimize, and you update memssa, you will
often end up with the *same result* anyway. The only way to stop that
is to make it so you are required to use "nearest dominating
memorydef" instead of "dominating memorydef" when updating. We do
this in the APIs where we do insertion (addNewMemoryUse), but we
don't check/verify it in replaceMemoryAccess-style APIs; we only
check standard domination.

The other tradeoff, btw, is that while doing MemoryUse optimization
while building makes this particular algorithm slightly harder, it
makes an (IMHO) much more useful thing a lot easier.

A lot of passes (GVN, memcpyopt, etc) want to know whether they can
replace a load with a load to eliminate something:

1 = MemoryDef(liveonentry)
store a
2 = MemoryDef(1)
store b
3 = MemoryDef(2)
store c
4 = MemoryDef(3)
store d
MemoryUse(1)
load a, 4 bytes
load a, 8 bytes
(or different types, or whatever)

By optimizing MemoryUses during building, we are guaranteed that if
we do getMemoryAccess(load a)->definition (which is store a)->uses,
we now have all loads that actually use *that* store's value. This
means we can look at the other loads of that store value and see if we
can reuse them.

Doing this without optimization of uses would be much harder.
You'd need to use a downwards API from the store to get all possible
real uses, which may encompass walking the entire program.

On the other hand, the tradeoff of doing MemoryUse optimization is
that it means to get all the loads in a block, you really have to ask
for all the loads in the block, instead of trying to hope you have
chains that give it to you :)
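
The tradeoff described above can be illustrated with a toy model. This is a minimal sketch, not LLVM's MemorySSA: the names `Access`, `ToyMemorySSA`, `addDef`, and `addUse` are hypothetical, and "clobbering" is modelled simply as "writes the same named location", which assumes no aliasing between differently named locations.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a MemorySSA access (hypothetical, not the LLVM class).
struct Access {
  bool IsDef;                 // true for a store (def), false for a load (use)
  std::string Loc;            // named location written or read, e.g. "a"
  Access *Defining;           // defining access; nullptr models liveOnEntry
  std::vector<Access *> Uses; // loads whose defining access is this def

  Access(bool IsDef, std::string Loc, Access *Defining)
      : IsDef(IsDef), Loc(std::move(Loc)), Defining(Defining) {}
};

struct ToyMemorySSA {
  std::vector<std::unique_ptr<Access>> Accesses;
  Access *LastDef = nullptr;

  // Every store is chained to the previous def, as in the listing above.
  Access *addDef(const std::string &Loc) {
    Accesses.push_back(std::make_unique<Access>(true, Loc, LastDef));
    LastDef = Accesses.back().get();
    return LastDef;
  }

  // With Optimize set, walk up the def chain to the nearest def that writes
  // the same location; otherwise attach the load to the nearest def.
  Access *addUse(const std::string &Loc, bool Optimize) {
    assert(!Loc.empty() && "location name must be non-empty");
    Access *D = LastDef;
    if (Optimize)
      while (D && D->Loc != Loc)
        D = D->Defining;
    Accesses.push_back(std::make_unique<Access>(false, Loc, D));
    Access *U = Accesses.back().get();
    if (D)
      D->Uses.push_back(U);
    return U;
  }
};
```

With optimized uses, both loads of `a` in the example hang directly off the def for `store a`, so its `Uses` list is exactly the set of loads that can be compared for reuse. Without optimization, the same loads attach to the def for `store d`, and finding them from `store a` would need a downward walk.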

Update for new insertion API. Clean up comments a bit
Update to walk stores

This now uses an immediate use walker rather than the alias API to determine downward exposure of stores

Waiting for update or possibly submission. Have lost track.

This revision now requires changes to proceed. Oct 8 2015, 10:28 AM

tvvikram added a subscriber: tvvikram. Jan 2 2016, 10:15 AM

mcrosier added a subscriber: mcrosier. Jun 14 2016, 8:57 AM

I have a large update coming to this, was just trying to make a sane update
API (the rest is easy).

mcrosier added a reviewer: gberry. Jun 14 2016, 9:13 AM

Update to work in progress
This will work for loads, stores coming up.

This also includes the memoryssa update API, which will be split out and committed separately

george.burgess.iv added a subscriber: george.burgess.iv. Jun 14 2016, 11:21 AM

Add support for stores. Code generated is correct but updating of memoryssa is wrong and will be fixed

Add verification for completeness of PHI nodes
Add basic PHI creation API
Update MemorySSA for store sinking

This version should correctly update memoryssa for store sinking :)
(I checked that it does on the llvm testcases, i'll test it further as i
split out the memoryssa part)

• dberlin added inline comments. Jun 15 2016, 12:10 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
148	To answer the original question, i'm checking to see about lambdas. No generic Instruction hasher exists that covers precisely the set of stuff we care about. GVN does what newgvn does, which is it defines its own expression class, shoves operands in those classes, and calls hash_combine on the whole shebang:

    friend hash_code hash_value(const Expression &Value) {
      return hash_combine(
          Value.opcode, Value.type,
          hash_combine_range(Value.varargs.begin(), Value.varargs.end()));
    }

The default hasher will end up hashing the pointer values themselves, which is what gvn wanted in this case. It is not useful, however, for hashing "parts of operations you care about", or "do two operations look pretty similar". Past that, i did not change the things included in the hash to ensure consistency of what is optimized with the existing code.
159	It is not, it goes straight into comparing all sorts of fields/data.
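
The two points above (a hash over only the "parts of operations you care about", and an explicit pointer-identity check before the field comparison) can be sketched with standard containers. This is an illustration under stated assumptions, not code from the patch: `Op`, `OpHash`, `OpEq`, `OpSet`, and `findEquivalent` are hypothetical names, and the mixing step stands in for LLVM's `hash_combine`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for an instruction; the fields are illustrative only.
struct Op {
  unsigned Opcode;
  unsigned TypeId;
  unsigned Alignment;
  bool IsVolatile;
};

// Hashes only Opcode and TypeId, a subset of what OpEq compares, so two
// operations that compare equal always produce the same hash.
struct OpHash {
  std::size_t operator()(const Op *A) const {
    std::size_t H = std::hash<unsigned>()(A->Opcode);
    // Simple mixing step (assumption: stands in for hash_combine).
    H ^= std::hash<unsigned>()(A->TypeId) + 0x9e3779b9u + (H << 6) + (H >> 2);
    return H;
  }
};

// Pointer identity is a cheap early exit before the field-by-field comparison.
struct OpEq {
  bool operator()(const Op *A, const Op *B) const {
    return A == B || (A->Opcode == B->Opcode && A->TypeId == B->TypeId &&
                      A->Alignment == B->Alignment &&
                      A->IsVolatile == B->IsVolatile);
  }
};

using OpSet = std::unordered_set<const Op *, OpHash, OpEq>;

// Returns an operation in Set equivalent to Query, or nullptr if none exists.
const Op *findEquivalent(const OpSet &Set, const Op *Query) {
  assert(Query && "query must be non-null");
  auto It = Set.find(Query);
  return It == Set.end() ? nullptr : *It;
}
```

Hashing fewer fields than equality compares only costs extra collisions. Hashing a field that equality ignores would be a correctness bug, because equal operations could then land in different buckets and never be found.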

A couple of minor comments I noticed when reading through this for my own edification:

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
615	It seems like you forgot to finish this sentence.
973	typo: "MemoryDefwith"
985	typo: "replaceALlUsesWith"

Update MergedLoadStoreMotion to optionally use MemorySSA
Add support for stores. Code generated is correct but updating of memoryssa is wrong
Add verification for completeness of PHI nodes
Add basic PHI creation API
Update MemorySSA for store sinking
Fix typos

rebased

• dberlin added a parent revision: D21463: Add MemoryAccess creation and PHI creation APIs to MemorySSA. Jun 17 2016, 1:34 AM

A few comments in passing.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98	Should we try and unique the name a bit more (e.g., "use-memssa-mlsm") under the assumption that other passes will have a command-line option to opt into using MemorySSA? Alternatively, we could include the option in the PassManagerBuilder and use this single flag to control the use of MemorySSA for all passes during the transition period. Just my 2c.
176–177	Should the DominatorTree be added as preserved?
177	Perhaps I missed something, but why is TargetLibraryInfo required?
541	No need for extra curly brackets.
704	but -> put
719	Pre-increment here?
726	Any reason we don't just increment in the for statement itself?
731	Please add a period.
815	Please add a period.

• dberlin marked 10 inline comments as done. Jun 17 2016, 8:26 AM

• dberlin added inline comments.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98	I'll rename it to use-memoryssa-mldst (or whatever the shortname for this pass is). A single flag would be nice, but would make it much harder to turn memoryssa on and off for each pass, and the point of the flag is to be able to test/debug regressions until we are satisfied and get rid of the non-memoryssa version. (ala what happened with SROA, etc and the ssaupdater)
176–177	This is not necessary, setPreservesCFG will preserve it, and all other passes marked CFG-only
177	It is not, this is a merge error.
719	Not sure what line this is supposed to go with, i can't find a post-increment around :)
726	Fixed. Originally, the updater was modifying the accesslist we were walking on and invalidating the iterator. It turns out to be non-trivial to change this, but i never moved this loop back to account for this.

• dberlin added inline comments. Jun 17 2016, 8:41 AM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
726	Actually, it can't be done, we still invalidate the current iterator for loads. I'll document this.
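
The reason the increment cannot move into the for statement can be shown with a small standalone sketch. It uses `std::list` as a stand-in for the access list (an assumption; the patch walks MemorySSA's own list type), and `removeEvens` is a hypothetical name.

```cpp
#include <cassert>
#include <list>

// Removes every even value from the list and returns how many were removed.
// The iterator is advanced *before* the current element may be erased, so the
// loop never increments an iterator that has just been invalidated.
int removeEvens(std::list<int> &Accesses) {
  int Removed = 0;
  for (auto It = Accesses.begin(), End = Accesses.end(); It != End;) {
    auto Cur = It++; // step first; erasing Cur leaves It valid for std::list
    assert(Cur != End && "never erase the end iterator");
    if (*Cur % 2 == 0) {
      Accesses.erase(Cur);
      ++Removed;
    }
  }
  return Removed;
}
```

Writing `for (...; It != End; ++It)` and erasing `It` inside the body would increment an invalidated iterator, which is the situation the comment above describes.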

Update for requested changes

mcrosier added inline comments. Jun 17 2016, 8:54 AM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
98	I agree we should have a flag per pass to simplify debugging.
719	Sorry if I wasn't clear. I was suggesting we remove the pre-increment above and do it in the if statement. ++NLoads; if (NLoads * Size1 ...) vs if (++NLoads * Size1 ...)

Rebase for passmanager conversion

It looks like this change still contains all of the MemorySSA changes that were split out into D21463

Properly rebase on top of other patch

Should be fixed now.

Would it make sense to split out the DFS numbering code motion barrier code into a separate change?

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
975	Typo: "explosed". Also, there is no 'End' parameter as referenced in the comment.

Rebase after dependent revision committed

Also run verification pass with new pass manager for tests.

Any other comments?
If not, because it's off by default (and it looked like there are no real objections), my plan was to commit it.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
719	Oh, sorry. I copied this code from the non-memoryssa version. (In truth, the code can stand a lot of cleaning up, improvement, and reducing of restrictions)

I'm still digesting this, but here are a couple more nits I noticed.
I don't want to hold up approval from anyone else though.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
754	Isn't MemorySSA only preserved if 'UseMemorySSA' is true?
980	getDFSNumIn/getDFSNumOut are documented as: These are an internal implementation detail, do not call them. which seems to conflict with this usage.

• dberlin added inline comments. Jun 22 2016, 12:54 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
754	Yes, fixed.
980	I will change the docs for them in a separate patch. The change to make updateDFSNumbers public, etc, was done specifically to enable this usage. See the thread titled "[PATCH] Make updateDFSNumbers API public" from last year, and there was a review where i computed them separately, and it was suggested to just make it public.

(and happy to wait if you want to review it)

Drive-by comments about comments (insert "we need to go deeper" meme here)

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
789	Potentially unfinished thought?
795	Did you mean "only move simple (non-atomic, non-volatile)"?

Update for review comments on comments

• dberlin added inline comments. Jun 22 2016, 1:40 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
980	This is now fixed in the docs for those functions

Gerolf added a subscriber: Gerolf. Jun 22 2016, 7:07 PM

Gerolf added inline comments.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
308	Loc0,1 should be moved below the conditional. Then they are definitely needed.
347	This is a MSSA update problem. How about if (MSSA && MSSA->update(HoistCan, HoistedInst)) MSSA->removeMemoryAccess(ElseInst);
541	You could remove the conditional here and simply return from updateMemorySSA when MSSA is null.
615	How about: Visit blocks in DFS order and number instructions. For each block track first hoist barrier and last sink barrier.
743	I think this is the only place where you need UseMemorySSA. All other instances can be handled by if (MSSA).
1030	Shouldn't updates be owned by MSSA rather than each client?

• dberlin marked 10 inline comments as done. Jun 22 2016, 11:54 PM

• dberlin added inline comments.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
347	As mentioned, if you look early enough in the thread on this, you can see at one point it was, in fact, that API, and i was explicitly asked to make simpler apis instead of these kinds of composed ones :) The way it is now is also consistent with how we do updates for everything else both SSA related (where you do exactly what you see below), and analysis related, in LLVM. (IE memdep has you call invalidateCachedPointerInfo and such, dominators requires you set and redirect dominators on your own) In fact, if memoryssa was part of the IR, there would be no difference between the way this code looks and the rest of the function (which is updating the IR) looks. Because it's a side datastructure the naming is a little different and we haven't built all the same utilities yet. Additionally, the kind of API you suggest above would have wildly varying complexity and is overkill vs building utilities that handle the common cases, which is what we do elsewhere for ssa/dom/etc updates. I'm more than happy to commit to building those utilities as we come across common cases.
1030	(replying here since phab is not good at processing email like this): That would be inconsistent with what we do in the rest of LLVM :) MemorySSA is an SSA form, and like SSA in LLVM, passes are responsible for keeping SSA up to date. This is also true of most analysis preservation. You can see that splitCriticalEdge takes a memdep argument, for example, and will update memdep instead of memdep updating itself. Same with dominators. Passes that want to preserve Dom are responsible for redirecting the dominators where they should go. They can screw it up, too! Doing general updates (IE "I have no idea what happened, you go find out and fix it") is also expensive, and at least for every pass i've converted so far, completely unnecessary. I also had more functional updaters (IE replaceAccessWithHoistedAccess) in memoryssa at earlier points, but if you look earlier in the review, you'll see others wanted it to be simple functionality instead. In retrospect, i agree. We can build a general updater if we need it, and we can build utilities to handle common updates if we need it. So far, it hasn't been necessary. At some point, the better question is really "why is memoryssa a side data structure instead of just part of the normal IR". But that's a question for another day.

Update for review comments

• dberlin added inline comments. Jun 23 2016, 10:19 AM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
1030	(and, just to also point out, the update algorithm we use is specific to diamonds, and would not work for more general control flow. It also would not work without the set of legality testing, etc, we perform elsewhere in the code )

I'll take a more in depth look into the load and store merge routines also. At a first glance it seems one could just add a few MSSA hooks rather than copy-paste-modify the code base.
I'll also take a (late :-() look at the core MSSA design. Could you outline the high-level MSSA design in more detail than http://reviews.llvm.org/D7864? E.g., how does it compare/relate to http://www.airs.com/dnovillo/Papers/mem-ssa.pdf? Thanks!

-Gerolf

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
347	Ok, it would probably not be possible to foresee all use cases. I certainly agree to the iterative approach you propose. I'm surprised though that a simple replacement of a load has not come up elsewhere.
1030	What got me here is that I expected the update to be simple. Why is it more involved here than replacing two MD's with one, possibly removing a MP? From the client's perspective updates like this look too complicated to get right. Maybe there is a verifier in place, but in the code I don't even see a single assertion that could help debugging issues should something go wrong. Perhaps we can come up with some utility routines that make it simpler to grasp: when two or more memory operations get moved, this is the blueprint for updating MSSA and guaranteeing it stays consistent.
1097	There could be a lambda or function that takes care of the two loops.

Also, just to give you some idea of general update complexity: If you don't
bound the problem with some common cases, it's simply not possible to make
it faster than current creation (Ie it would be simpler/easier to not
preserve memoryssa in those cases).

For the common cases, assuming you tell us what variables to remove and
what variables were added in what blocks:

If you only removed and replaced variables (Ie literally no code motion
or insertion), it requires the renaming pass only. It could be further
optimized to not redo use chain optimization for memoryuses if you can
guarantee you only touched memoryuses. For memorydefs, if you can guarantee
(and you probably can't) that none of the aa relationships have changed,
you would not have to redo use chain opt there either. You can't guarantee
this because basicaa, et al have walkers with random hard limits in them,
and so you may get better results simply by reducing the length of a chain
of things it had to walk, even if there was in fact, no aliasing either way.

if the cfg has not changed, and you have added/removed variables: if you
added memoryuses, if we stay with pruned ssa, we have to recompute the IDF.
We could minimize the size we calculate by tracking some things related to
where the last uses in the program are relative to the iterated dominance
frontier. It's complicated however.
After recomputing IDF, we also have to redo renaming. For inserting
memoryuses, if we tracked the first/last version of a block (and updated
it through the creation/removal apis), we only have to rename parts of the
dom tree where the first/last version changes.

(IE if we changed first version in a block due to phi node insertion, but
last version is the same, we do not have to rename children of that block.
if we changed last version but not first version, we only have to rename
children but not the current block)
For inserting stores, it's something close to this but a bit more
complicated.

if the cfg has changed, without tracking a lot more under the covers
related to merge/join sets, it's not possible to make it faster than doing
it from scratch.

The TL;DR of all this is that a general update mechanism is very hard to
make faster than from scratch without giving it all kinds of specific info,
and usually figuring out that info is as hard as figuring out the update
algorithms, in my experience.

Ping!

Sorry if this isn't the right place, but I have a general question about what it means to preserve MemorySSA, specifically regarding the defining accesses of MemoryUse nodes. Is the idea here that we make a best effort to keep the MemoryUse defining access links optimized (i.e. never pointing to a no-alias def)? Because of the limits of basic-aa, it isn't possible to guarantee this property after any code transformation, even in the limited case here, since the alias results for completely unrelated load/stores may have been affected, right?

Gerolf added inline comments. Jul 5 2016, 8:42 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
807	The code from here could be a function. I prefer a function to fit on my screen.
830	Why do you need this? When the first condition is false the while loop does not execute. But then SafeToLoadUnconditionally is not needed anyway
846	Within a block I thought the DFS numbers are 0 1 2 ... So I would expect loop up(load) < BBHoistBarrier. I'm probably wrong about the DFSs.
848	Load1 is loop invariant and the continue condition can be computed in the header.
860	That comment does not look right. Load0 and load1 just got hoisted and LookupIter->first could still be Load0. Also, not hoisting above loads that had not been hoisted is too restrictive. With better/cheaper DA this optimization can be more aggressive. Perhaps this could be a FIXME.

• dberlin added inline comments. Jul 5 2016, 10:45 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
807	Fixed
830	It's moved here because it's loop-invariant. That's also why the check matches the loop check, to avoid calculating it if the loop won't execute :) Otherwise, we would calculate it on every loop iteration, when it doesn't change.
846	You are thinking about it backwards i think. The dfs numbers are indeed as you list them. If the block looks like this:
    0
    1 - load barrier
    2 - load
We can't hoist. If it's
    0
    1 - load
    2 - load barrier
We can hoist. So if DFS(load) > DFS(load barrier), that means the load barrier is above us in the block (and we come after the load barrier), and we can't hoist past it.
848	Fixed
860	Yes, i'll fix. This code was copied from mergeLoads originally. It is indeed too restrictive (there are a lot of random restrictions we could relax). I'll add a FIXME.
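
The numbering rule from the reply to line 846 can be written down as a small check. This is a sketch under stated assumptions rather than the patch's code: instructions in one block are numbered by position, `Kind`, `firstHoistBarrier`, and `canHoist` are hypothetical names, and the barrier position is recomputed per query for brevity instead of being cached per block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of the instructions inside one block.
enum class Kind { Other, Load, HoistBarrier };

// Returns the position of the first hoist barrier, or Block.size() if the
// block contains none.
std::size_t firstHoistBarrier(const std::vector<Kind> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (Block[I] == Kind::HoistBarrier)
      return I;
  return Block.size();
}

// A load may be hoisted out of its block only if its number is smaller than
// the number of the first barrier, i.e. the barrier is not above it.
bool canHoist(const std::vector<Kind> &Block, std::size_t LoadIdx) {
  assert(LoadIdx < Block.size() && Block[LoadIdx] == Kind::Load);
  return LoadIdx < firstHoistBarrier(Block);
}
```

For the two layouts in the reply, `canHoist` is false when the barrier is at position 1 and the load at position 2, and true when the load is at position 1 and the barrier at position 2.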

Update for review comments

A few more comments/questions below. I still have a few parts I haven't fully grasped yet, but don't let me hold up approval.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
93	vector doesn't appear to be used?
158	Can't you just compute these instruction indices for the diamond then/else blocks lazily? The same goes for LastSinkBarrier, which wouldn't even need a map in that case.
771	Maybe check Load1->isSimple() here? Maybe factor out all these checks into a function and use it for the Load0 hash table insertion as well?
799	Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.
997	Is there a reason you're using a queue for this work-list instead of the more common SmallVector?

I'll be honest. I think we are going a bit overboard for something that is off by default and is meant to be an identical algorithm to an existing pass that we know can stand improvements.

While I am all for trying to get code to be as good as possible, i don't think we need to make it completely perfect on the first pass when it's easily subject to incremental improvements :)

IE i think there is some point that is "good enough" to start, particularly for things that we can improve before we flip them to be the default.

For all we know, changes we have to make to account for converting the other passes are going to also make us change code here anyway.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
93	fixed
158	Yes, you can compute them once on-demand for the blocks, but it takes nearly zero time right now. It did not seem worth it for a first pass to build something to do it :) Remember this is supposed to be an NFC conversion to start, and building the barrier maps already converts what was N^2 before to N.
771	Let us please not do that in this patch. Doing that would change this algorithm to catch a different set of loads than the non-memoryssa version. I would like to keep the algorithms identical in what they catch for the moment, so debugging is easy. Eventually, when the mssa version is the default and the non-mssa version killed, we can improve it to be whatever we want.
799	If that was the case, all the loads in that block are equal (because they also passed isSameOperation, etc) and GVN should have removed them already :) Note that neither the mssa version or the non-mssa version will handle the case of two identical loads in one side of the diamond, and one load in the other. It will only hoist/merge one of identical ones.
997	We push back and pull front because that exploration order guarantees we terminate at the earliest possible point. If we popped off the back, it would be an order that would explore lower things before higher things.
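
The exploration-order argument in the reply to line 997 can be seen on a tiny graph. The sketch below is illustrative only: `Graph` and `fifoOrder` are hypothetical names, and the graph is a plain adjacency list rather than MemorySSA's use lists.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Adjacency list: Succs[N] holds the nodes reachable from N in one step.
using Graph = std::vector<std::vector<int>>;

// Returns the order in which nodes are visited when the worklist is a FIFO
// queue: push to the back, pull from the front.
std::vector<int> fifoOrder(const Graph &Succs, int Start) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Succs.size());
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  std::queue<int> Worklist;
  Worklist.push(Start);
  Seen[Start] = true;
  while (!Worklist.empty()) {
    int N = Worklist.front(); // pull from the front
    Worklist.pop();
    Order.push_back(N);
    for (int S : Succs[N]) {
      if (!Seen[S]) {
        Seen[S] = true;
        Worklist.push(S); // push to the back
      }
    }
  }
  return Order;
}
```

On the graph `{{1, 2}, {3}, {4}, {}, {}}` starting at node 0, the FIFO order is 0, 1, 2, 3, 4: both direct successors are visited before anything further away. Popping from the back of a vector instead would visit 0, 2, 4, 1, 3, reaching node 4 before node 1.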

Gerolf added inline comments. Jul 6 2016, 8:11 PM

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
830	Ok, I assumed the while loop is on the hot path usually. Then the code checks the (part of the) while condition twice.
846	Hm, it looks alright today :-). Thanks.
889	Footer -> Tail in error message
907	There is similar code in mergeLoads. That could be a utility function that returns false when there is no memory access in at least one of the blocks, e.g., static bool hasMemoryAccess(BasicBlock B1, BasicBlock B2).
929	Should be if (++NStores) to be consistent with your code in mergeLoads. Or change the code for NLoads there.
933	Could be a separate function:
950	Please add a comment about what the equal stores are.
954	Since it is sinking stores would it speed up the code reversing the loop traversal (from second to first)?
979	A comment would be nice why the accesses get cleared here. Or would that fit better in sinkStores?
988	Typo: explosed
1003	What is a good invariant for this loop? Like there shouldn't be more checks than memory references in the code that contains Start? A checked invariant would make me more comfortable with this code.
1019	That looks very wrong. At the minimum this is a convoluted equality test.
1052	I continue struggling with this function. For now I just need some help understanding the comment. When you say "above" S0, S1 you refer to DFS numbering like Sx is above S0 in this example:
    0 Sx
    1
    2 S0
Correct? This would be consistent with your DFS explanation in a previous comment.
1054	What is the "top of the diamond"? The head?
1057	What is the "sink location block"? The Tail?
1060	What is "the block"? Perhaps "uses in the tail and/or below" describes it better.
1061	Would it make sense to clearly distinguish phi and memory phi (mphi) nodes?

• dberlin abandoned this revision. Jul 6 2016, 9:21 PM

• dberlin marked an inline comment as done.

I'm personally fine with committing this as is since it is disabled, though I don't feel I have the authority to approve it.

I do have a couple more comments/questions below, mainly to help me gain confidence that my understanding of MemorySSA is correct.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
799	Is that right (the bit about all the loads being equal)? I'm thinking of a case like the below where all of the loads pass isSameOperation and all have the call as their clobbering access. In this case you'll end up comparing all pairs of loads (assuming none of the loads must alias).
    call foo
    br %c, %then, %else
    then:
      load i32, %p1
      load i32, %p2
      load i32, %p3
      ...
      br %end
    else:
      load i32, %p4
      load i32, %p5
      load i32, %p6
      ...
      br %end
    end:
1011	Do you even need the domtree info here to check for barriers in a diamond? There are references to hammocks, but as far as I can tell we never actually attempt to optimize hammocks. If that is the case, couldn't this just check for barriers in the same block as Start?
1052	I think there is a bug in the case 1 scenario if there is a pre-existing phi in the bottom of the diamond that references just one of the sunk stores. For example:
    ; 1 = MemDef ...
    call foo
    br %c %then, %else
    then:
      ; 2 = MemDef(1)
      store @A
      ; 3 = MemDef(2)
      store @B
      br %end
    else:
      ; 4 = MemDef(1)
      store @A
      br %end
    end:
      5 = MemPhi(3, 4)
I believe in this case, after updating the end block will look like:
    end:
      5 = MemPhi(3, 6)
      ; 6 = MemDef(1)
      store @A
Which seems wrong, since the phi is before its use, but also it seems like having 6's defining access skip over the phi 5 and def 3 could cause trouble, though I'm less sure about the latter.

Let me clarify my last comment, since I hit send too early. This was a bug
i already fixed. Case 1 should also be checking that there are no stores
below. If there are, it should be doing case 2. In essence, the check for
case two should happen before the check for case 1 (since it's not possible
to have stores below without a phi node existing)

This will cause your phi node to get changed to memphi(3, 1), and do the
right thing.

I have a testcase for this.

hrrrm, are there any real showstoppers to get this one in? Maybe Dan has no more time to work on it but this patch already constitutes a strong basis on which we can iterate and it's also disabled by default.
@Gerolf , any opinions?

• dberlin mentioned this in D46600: [MergedLoadStoreMotion] Fix a debug invariant bug in mergeStores. May 8 2018, 2:50 PM

Revision Contents

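Path                                                       Size
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp            619 lines
test/Transforms/InstMerge/exceptions.ll                    3 lines
test/Transforms/InstMerge/ld_hoist1.ll                     2 lines
test/Transforms/InstMerge/ld_hoist_st_sink.ll              2 lines
test/Transforms/InstMerge/st_sink_barrier_call.ll          2 lines
test/Transforms/InstMerge/st_sink_bugfix_22613.ll          1 line
test/Transforms/InstMerge/st_sink_no_barrier_call.ll       2 lines
test/Transforms/InstMerge/st_sink_no_barrier_load.ll       4 lines
test/Transforms/InstMerge/st_sink_no_barrier_store.ll      4 lines
test/Transforms/InstMerge/st_sink_two_stores.ll            2 lines
test/Transforms/InstMerge/st_sink_with_barrier.ll          2 lines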

Diff 62825

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp

#include "llvm/Analysis/MemoryDependenceAnalysis.h"		#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"		#include "llvm/Transforms/Utils/MemorySSA.h"
		#include <queue>
		#include <unordered_set>
		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "mldst-motion"		#define DEBUG_TYPE "mldst-motion"
		static cl::opt<bool>
		UseMemorySSA("use-memoryssa-mldst", cl::Hidden,
		cl::desc("Use MemorySSA for MergedLoadStoreMotion"));

		//===----------------------------------------------------------------------===//
		// MergedLoadStoreMotion Pass
		//===----------------------------------------------------------------------===//

		namespace {

		// This hash matches the most common things isSameOperationAs checks. It must
		// always be a subset or equal to isSameOperationAs for everything to function
		// properly.

		const auto StoreInstHash = [](const StoreInst *A) {
		return hash_combine(A->getType(), A->getPointerOperand()->getType(),
		A->getValueOperand()->getType(), A->isVolatile(),
		A->getAlignment(), A->getOrdering(), A->getSynchScope());
		};

		const auto StoreInstEq = [](const StoreInst *A, const StoreInst *B) {
		return (A == B || A->isSameOperationAs(B));
		};

		typedef std::pair<LoadInst *, const MemoryAccess *> LoadPair;

		// This hash has two parts. The first is the instruction, the second, the
		// clobbering MemorySSA access. For the instruction part, this hash matches the
		// most common things isSameOperationAs checks. It must always be a subset or
		// equal to isSameOperationAs for everything to function properly.

		const auto LoadPairHash = [](const LoadPair &LP) {
		const LoadInst *A = LP.first;
		return hash_combine(A->getType(), A->getPointerOperand()->getType(),
		A->isVolatile(), A->getAlignment(), A->getOrdering(),
		A->getSynchScope(), LP.second);
		};
		const auto LoadPairEq = [](const LoadPair &A, const LoadPair &B) {
		return (A.second == B.second &&
		(A.first == B.first || A.first->isSameOperationAs(B.first)));
		};
		}
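Editorial sketch (not part of the patch): the lambda hash/equality pattern above can be tried outside LLVM. This is a minimal illustration that assumes a hypothetical `ToyStore` struct in place of `StoreInst`. The names `ToyStoreHash`, `ToyStoreEq`, `ToyStoreSet`, `makeStoreSet`, and `countCandidates` are invented here. It shows why the functors are passed to the constructor (closure types are not default-constructible before C++20) and why the hash may only read fields that equality also compares.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

namespace {

// Hypothetical stand-in for a store instruction; not an LLVM type.
struct ToyStore {
  int ValueTypeID; // stands in for the stored value's type
  unsigned Align;  // stands in for the alignment
  bool Volatile;   // stands in for the volatile flag
  int Address;     // stands in for the pointer operand (ignored below)
};

// Hash only properties that the equality check also compares, so two
// "equal" stores always hash to the same bucket.
auto ToyStoreHash = [](const ToyStore *S) {
  std::size_t H = std::hash<int>()(S->ValueTypeID);
  H = H * 31 + std::hash<unsigned>()(S->Align);
  H = H * 31 + std::hash<bool>()(S->Volatile);
  return H;
};

// Pointer identity first, then the field-by-field "same operation" check.
// The address is deliberately not compared, mirroring a check that looks at
// the kind of operation rather than its operands.
auto ToyStoreEq = [](const ToyStore *A, const ToyStore *B) {
  return A == B || (A->ValueTypeID == B->ValueTypeID &&
                    A->Align == B->Align && A->Volatile == B->Volatile);
};

using ToyStoreSet =
    std::unordered_multiset<const ToyStore *, decltype(ToyStoreHash),
                            decltype(ToyStoreEq)>;

// The lambdas must be handed to the constructor explicitly.
ToyStoreSet makeStoreSet() { return ToyStoreSet(1, ToyStoreHash, ToyStoreEq); }

// Number of stored entries that look like the same operation as Probe.
std::size_t countCandidates(const ToyStoreSet &Set, const ToyStore *Probe) {
  return Set.count(Probe);
}

} // namespace
```

A caller would still need a separate alias check on the candidates, since entries that compare equal here can refer to different addresses.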

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MergedLoadStoreMotion Pass		// MergedLoadStoreMotion Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
class MergedLoadStoreMotion {		class MergedLoadStoreMotion {
MemoryDependenceResults *MD = nullptr;		MemoryDependenceResults *MD = nullptr;
AliasAnalysis *AA = nullptr;		AliasAnalysis *AA = nullptr;
		MemorySSA *MSSA = nullptr;
// The mergeLoad/Store algorithms could have Size0 * Size1 complexity,		MemorySSAWalker *CachingWalker = nullptr;
		reames: Can you use lambdas for these? I'm not entirely sure that works, but if it does, the code would be cleaner. Also, I'm surprised that a generic instruction hash function doesn't already exist. What does GVN use for this?
		dberlin: To answer the original question, i'm checking to see about lambdas. No generic Instruction hasher exists that covers precisely the set of stuff we care about. GVN does what newgvn does, which is it defines its own expression class, shoves operands in those classes, and calls hash_combine on the whole shebang: friend hash_code hash_value(const Expression &Value) { return hash_combine(Value.opcode, Value.type, hash_combine_range(Value.varargs.begin(), Value.varargs.end())); } The default hasher will end up hashing the pointer values themselves, which is what gvn wanted in this case. It is not useful, however, for hashing "parts of operations you care about", or "do two operations look pretty similar". Past that, i did not change the things included in the hash to ensure consistency of what is optimized with the existing code.
// where Size0 and Size1 are the #instructions on the two sides of		DominatorTree *DT = nullptr;
// the diamond. The constant chosen here is arbitrary. Compiler Time		// The non-MemorySSA versions of mergeLoad/Store algorithms could have Size0 *
		// Size1 complexity, where Size0 and Size1 are the #instructions on the two
		// sides of the diamond. The constant chosen here is arbitrary. Compiler Time
// Control is enforced by the check Size0 * Size1 < MagicCompileTimeControl.		// Control is enforced by the check Size0 * Size1 < MagicCompileTimeControl.
const int MagicCompileTimeControl = 250;		const int MagicCompileTimeControl = 250;
		// We DFS number instructions and avoid hoisting or sinking things past may
		// throw instructions and instructions not guaranteed to transfer execution to
		// successors.
		DenseMap<const Instruction *, unsigned> DFSNumberMap;
		gberry: Can't you just compute these instruction indices for the diamond then/else blocks lazily? The same goes for LastSinkBarrier, which wouldn't even need a map in that case.
		dberlin: Yes, you can compute them once on-demand for the blocks, but it takes nearly zero time right now. It did not seem worth it for a first pass to build something to do it :) Remember this is supposed to be an NFC conversion to start, and building the barrier maps already converts what was N^2 before to N.
		DenseMap<const BasicBlock *, unsigned> FirstHoistBarrier;
		reames: Is the extra A == B check needed? I would expect that to be inside isSameOperationAs.
		dberlin: It is not, it goes straight into comparing all sorts of fields/data.
		DenseMap<const BasicBlock *, unsigned> LastSinkBarrier;
		SmallPtrSet<MemoryAccess *, 8> AccessesToDelete;
		std::unordered_multiset<LoadPair, decltype(LoadPairHash),
		decltype(LoadPairEq)>
		LoadInfo;
		std::unordered_multiset<StoreInst *, decltype(StoreInstHash),
		decltype(StoreInstEq)>
		StoreInfo;

public:		public:
bool run(Function &F, MemoryDependenceResults *MD, AliasAnalysis &AA);		bool run(Function &F, MemoryDependenceResults *MD, AliasAnalysis &AA,
		DominatorTree *DT, MemorySSA *MSSA);
		MergedLoadStoreMotion()
		: LoadInfo(1, LoadPairHash, LoadPairEq),
		StoreInfo(1, StoreInstHash, StoreInstEq) {}

private:		private:
///		///
		mcrosier: Perhaps I missed something, but why is TargetLibraryInfo required?
		dberlin: It is not, this is a merge error.
		mcrosier: Should the DominatorTree be added as preserved?
		dberlin: This is not necessary, setPreservesCFG will preserve it, and all other passes marked CFG-only
/// \brief Remove instruction from parent and update memory dependence		/// \brief Remove instruction from parent and update memory dependence
/// analysis.		/// analysis.
///		///
void removeInstruction(Instruction *Inst);		void removeInstruction(Instruction *Inst);
BasicBlock *getDiamondTail(BasicBlock *BB);		BasicBlock *getDiamondTail(BasicBlock *BB);
bool isDiamondHead(BasicBlock *BB);		bool isDiamondHead(BasicBlock *BB);
// Routines for hoisting loads		// Routines for hoisting loads
bool isLoadHoistBarrierInRange(const Instruction &Start,		bool isLoadHoistBarrierInRange(const Instruction &Start,
const Instruction &End, LoadInst *LI,		const Instruction &End, LoadInst *LI);
bool SafeToLoadUnconditionally);
LoadInst *canHoistFromBlock(BasicBlock *BB, LoadInst *LI);		LoadInst *canHoistFromBlock(BasicBlock *BB, LoadInst *LI);
void hoistInstruction(BasicBlock *BB, Instruction *HoistCand,		void hoistInstruction(BasicBlock *BB, Instruction *HoistCand,
Instruction *ElseInst);		Instruction *ElseInst);
bool isSafeToHoist(Instruction *I) const;		bool isSafeToHoist(Instruction *I) const;
bool hoistLoad(BasicBlock *BB, LoadInst *HoistCand, LoadInst *ElseInst);		bool hoistLoad(BasicBlock *BB, LoadInst *HoistCand, LoadInst *ElseInst);
bool mergeLoads(BasicBlock *BB);		bool mergeLoads(BasicBlock *BB);
		bool mergeLoadsMemorySSA(BasicBlock *BB);
		bool mergeLoadMemorySSA(const MemoryUse *Cand, BasicBlock *HoistBlock);
// Routines for sinking stores		// Routines for sinking stores
StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);		StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);
PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);		PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);
bool isStoreSinkBarrierInRange(const Instruction &Start,		bool isStoreSinkBarrierInRange(const Instruction &Start,
const Instruction &End, MemoryLocation Loc);		const Instruction &End, MemoryLocation Loc);
		bool isStoreSinkBarrierInRangeMemorySSA(MemoryLocation &Loc,
		MemoryAccess *Start);
bool sinkStore(BasicBlock *BB, StoreInst *SinkCand, StoreInst *ElseInst);		bool sinkStore(BasicBlock *BB, StoreInst *SinkCand, StoreInst *ElseInst);
bool mergeStores(BasicBlock *BB);		bool mergeStores(BasicBlock *BB);
		bool mergeStoresMemorySSA(BasicBlock *BB);
		void computeBarriers();
		void updateMemorySSAForSunkStores(BasicBlock *SinkBlock, StoreInst *S0,
		StoreInst *S1, StoreInst *SNew);
};		};

///		///
/// \brief Remove instruction from parent and update memory dependence analysis.		/// \brief Remove instruction from parent and update memory dependence analysis.
///		///
void MergedLoadStoreMotion::removeInstruction(Instruction *Inst) {		void MergedLoadStoreMotion::removeInstruction(Instruction *Inst) {
// Notify the memory dependence analysis.		// Notify the memory dependence analysis.
if (MD) {		if (MD) {
[43 lines not shown]

///		///
/// \brief True when instruction is a hoist barrier for a load		/// \brief True when instruction is a hoist barrier for a load
///		///
/// Whenever an instruction could possibly modify the value		/// Whenever an instruction could possibly modify the value
/// being loaded or protect against the load from happening		/// being loaded or protect against the load from happening
/// it is considered a hoist barrier.		/// it is considered a hoist barrier.
///		///
bool MergedLoadStoreMotion::isLoadHoistBarrierInRange(		bool MergedLoadStoreMotion::isLoadHoistBarrierInRange(const Instruction &Start,
const Instruction &Start, const Instruction &End, LoadInst *LI,		const Instruction &End,
bool SafeToLoadUnconditionally) {		LoadInst *LI) {
if (!SafeToLoadUnconditionally)
for (const Instruction &Inst :
make_range(Start.getIterator(), End.getIterator()))
if (!isGuaranteedToTransferExecutionToSuccessor(&Inst))
return true;
MemoryLocation Loc = MemoryLocation::get(LI);		MemoryLocation Loc = MemoryLocation::get(LI);
return AA->canInstructionRangeModRef(Start, End, Loc, MRI_Mod);		return AA->canInstructionRangeModRef(Start, End, Loc, MRI_Mod);
}		}

///		///
/// \brief Decide if a load can be hoisted		/// \brief Decide if a load can be hoisted
///		///
/// When there is a load in \p BB to the same address as \p LI		/// When there is a load in \p BB to the same address as \p LI
[12 lines not shown]
for (BasicBlock::iterator BBI = BB1->begin(), BBE = BB1->end(); BBI != BBE;
++BBI) {		++BBI) {
Instruction *Inst = &*BBI;		Instruction *Inst = &*BBI;

// Only merge and hoist loads when their result in used only in BB		// Only merge and hoist loads when their result in used only in BB
auto *Load1 = dyn_cast<LoadInst>(Inst);		auto *Load1 = dyn_cast<LoadInst>(Inst);
if (!Load1 || Inst->isUsedOutsideOfBlock(BB1))		if (!Load1 || Inst->isUsedOutsideOfBlock(BB1))
continue;		continue;

		if (!SafeToLoadUnconditionally) {
		// If the first hoist barrier in the block is before the load, we
		// can't hoist.
		unsigned int BB0HoistBarrier = FirstHoistBarrier.lookup(BB0);
		if (BB0HoistBarrier != 0 && DFSNumberMap.lookup(Load0) > BB0HoistBarrier)
		continue;
		unsigned int BB1HoistBarrier = FirstHoistBarrier.lookup(BB1);
		if (BB1HoistBarrier != 0 && DFSNumberMap.lookup(Load1) > BB1HoistBarrier)
		continue;
		}
MemoryLocation Loc0 = MemoryLocation::get(Load0);		MemoryLocation Loc0 = MemoryLocation::get(Load0);
		Gerolf: Loc0,1 should be moved below the conditional. Then they are definitely needed.
MemoryLocation Loc1 = MemoryLocation::get(Load1);		MemoryLocation Loc1 = MemoryLocation::get(Load1);

if (Load0->isSameOperationAs(Load1) && AA->isMustAlias(Loc0, Loc1) &&		if (Load0->isSameOperationAs(Load1) && AA->isMustAlias(Loc0, Loc1) &&
!isLoadHoistBarrierInRange(BB1->front(), *Load1, Load1,		!isLoadHoistBarrierInRange(BB1->front(), *Load1, Load1) &&
SafeToLoadUnconditionally) &&		!isLoadHoistBarrierInRange(BB0->front(), *Load0, Load0)) {
!isLoadHoistBarrierInRange(BB0->front(), *Load0, Load0,
SafeToLoadUnconditionally)) {
return Load1;		return Load1;
}		}
}		}
return nullptr;		return nullptr;
}		}

///		///
/// \brief Merge two equivalent instructions \p HoistCand and \p ElseInst into		/// \brief Merge two equivalent instructions \p HoistCand and \p ElseInst into
[17 lines not shown]
void MergedLoadStoreMotion::hoistInstruction(BasicBlock *BB,
// Prepend point for instruction insert		// Prepend point for instruction insert
Instruction *HoistPt = BB->getTerminator();		Instruction *HoistPt = BB->getTerminator();

// Merged instruction		// Merged instruction
Instruction *HoistedInst = HoistCand->clone();		Instruction *HoistedInst = HoistCand->clone();

// Hoist instruction.		// Hoist instruction.
HoistedInst->insertBefore(HoistPt);		HoistedInst->insertBefore(HoistPt);
		if (MSSA) {
		Gerolf: This is a MSSA update problem. How about if (MSSA && MSSA->update(HoistCand, HoistedInst)) MSSA->removeMemoryAccess(ElseInst);
		dberlin: As mentioned, if you look early enough in the thread on this, you can see at one point it was, in fact, that API, and i was explicitly asked to make simpler apis instead of these kinds of composed ones :) The way it is now is also consistent with how we do updates for everything else both SSA related (where you do exactly what you see below), and analysis related, in LLVM. (IE memdep has you call invalidateCachedPointerInfo and such, dominators requires you set and redirect dominators on your own) In fact, if memoryssa was part of the IR, there would be no difference between the way this code looks and the rest of the function (which is updating the IR) looks. Because it's a side datastructure the naming is a little different and we haven't built all the same utilities yet. Additionally, the kind of API you suggest above would have wildly varying complexity and is overkill vs building utilities that handle the common cases, which is what we do elsewhere for ssa/dom/etc updates. I'm more than happy to commit to building those utilities as we come across common cases.
		Gerolf: Ok, it would probably not be possible to foresee all use cases. I certainly agree to the iterative approach you propose. I'm surprised though that a simple replacement of a load has not come up elsewhere.
		// We also hoist operands of loads using this function, so check to see if
		// this is really a memory access before we try to update MemorySSA for it.
		MemoryAccess *HoistCandAccess = MSSA->getMemoryAccess(HoistCand);
		if (HoistCandAccess) {
		MemoryUseOrDef *MUD = cast<MemoryUseOrDef>(HoistCandAccess);
		// What is happening here is that we are creating the hoisted access
		// and destroying the old accesses.
		MSSA->createMemoryAccessInBB(HoistedInst, MUD->getDefiningAccess(), BB,
		MemorySSA::End);
		MSSA->removeMemoryAccess(HoistCandAccess);
		MSSA->removeMemoryAccess(MSSA->getMemoryAccess(ElseInst));
		}
		}

HoistCand->replaceAllUsesWith(HoistedInst);		HoistCand->replaceAllUsesWith(HoistedInst);
		reames: Not quite clear what you're trying to say with this comment? The code appears to just be updating MemorySSA for the transform performed.
removeInstruction(HoistCand);		removeInstruction(HoistCand);
// Replace the else block instruction.		// Replace the else block instruction.
ElseInst->replaceAllUsesWith(HoistedInst);		ElseInst->replaceAllUsesWith(HoistedInst);
removeInstruction(ElseInst);		removeInstruction(ElseInst);
}		}

///		///
/// \brief Return true if no operand of \p I is defined in I's parent block		/// \brief Return true if no operand of \p I is defined in I's parent block
[56 lines not shown]
for (BasicBlock::iterator BBI = Succ0->begin(), BBE = Succ0->end();

++NLoads;		++NLoads;
if (NLoads * Size1 >= MagicCompileTimeControl)		if (NLoads * Size1 >= MagicCompileTimeControl)
break;		break;
if (LoadInst *L1 = canHoistFromBlock(Succ1, L0)) {		if (LoadInst *L1 = canHoistFromBlock(Succ1, L0)) {
bool Res = hoistLoad(BB, L0, L1);		bool Res = hoistLoad(BB, L0, L1);
MergedLoads |= Res;		MergedLoads |= Res;
// Don't attempt to hoist above loads that had not been hoisted.		// Don't attempt to hoist above loads that had not been hoisted.
		// FIXME: This is very restrictive, and could be fixed.
if (!Res)		if (!Res)
break;		break;
}		}
}		}
return MergedLoads;		return MergedLoads;
}		}

///		///
/// \brief True when instruction is a sink barrier for a store		/// \brief True when instruction is a sink barrier for a store
/// located in Loc		/// located in Loc
		reames: I'm a bit confused here. If we found the set of load instructions which share the generation number at the start of a block and we're merging two blocks with a single shared successor, why do we need any further legality checks? Aren't all the loads guaranteed to have the generation number at the end of the common block? Are you possibly trying to do two generations at once here? Or something else more complex?
///		///
/// Whenever an instruction could possibly read or modify the		/// Whenever an instruction could possibly read or modify the
/// value being stored or protect against the store from		/// value being stored or protect against the store from
/// happening it is considered a sink barrier.		/// happening it is considered a sink barrier.
///		///
bool MergedLoadStoreMotion::isStoreSinkBarrierInRange(const Instruction &Start,		bool MergedLoadStoreMotion::isStoreSinkBarrierInRange(const Instruction &Start,
const Instruction &End,		const Instruction &End,
MemoryLocation Loc) {		MemoryLocation Loc) {
for (const Instruction &Inst :
make_range(Start.getIterator(), End.getIterator()))
if (Inst.mayThrow())
return true;
return AA->canInstructionRangeModRef(Start, End, Loc, MRI_ModRef);		return AA->canInstructionRangeModRef(Start, End, Loc, MRI_ModRef);
}		}

///		///
/// \brief Check if \p BB contains a store to the same address as \p SI		/// \brief Check if \p BB contains a store to the same address as \p SI
///		///
/// \return The store in \p when it is safe to sink. Otherwise return Null.		/// \return The store in \p when it is safe to sink. Otherwise return Null.
///		///
StoreInst *MergedLoadStoreMotion::canSinkFromBlock(BasicBlock *BB1,		StoreInst *MergedLoadStoreMotion::canSinkFromBlock(BasicBlock *BB1,
StoreInst *Store0) {		StoreInst *Store0) {
DEBUG(dbgs() << "can Sink? : "; Store0->dump(); dbgs() << "\n");		DEBUG(dbgs() << "can Sink? : "; Store0->dump(); dbgs() << "\n");
BasicBlock *BB0 = Store0->getParent();		BasicBlock *BB0 = Store0->getParent();
for (Instruction &Inst : reverse(*BB1)) {		for (Instruction &Inst : reverse(*BB1)) {
auto *Store1 = dyn_cast<StoreInst>(&Inst);		auto *Store1 = dyn_cast<StoreInst>(&Inst);
if (!Store1)		if (!Store1)
continue;		continue;

		// If the last sink barrier in the block is after us, we can't sink out
		// of the block.
		unsigned int BB0SinkBarrier = LastSinkBarrier.lookup(BB0);
		if (BB0SinkBarrier != 0 && DFSNumberMap.lookup(Store0) < BB0SinkBarrier)
		continue;
		unsigned int BB1SinkBarrier = LastSinkBarrier.lookup(BB1);
		if (BB1SinkBarrier != 0 && DFSNumberMap.lookup(Store1) < BB1SinkBarrier)
		continue;
MemoryLocation Loc0 = MemoryLocation::get(Store0);		MemoryLocation Loc0 = MemoryLocation::get(Store0);
MemoryLocation Loc1 = MemoryLocation::get(Store1);		MemoryLocation Loc1 = MemoryLocation::get(Store1);

if (AA->isMustAlias(Loc0, Loc1) && Store0->isSameOperationAs(Store1) &&		if (AA->isMustAlias(Loc0, Loc1) && Store0->isSameOperationAs(Store1) &&
!isStoreSinkBarrierInRange(*Store1->getNextNode(), BB1->back(), Loc1) &&		!isStoreSinkBarrierInRange(*Store1->getNextNode(), BB1->back(), Loc1) &&
!isStoreSinkBarrierInRange(*Store0->getNextNode(), BB0->back(), Loc0)) {		!isStoreSinkBarrierInRange(*Store0->getNextNode(), BB0->back(), Loc0)) {
return Store1;		return Store1;
}		}
}		}
return nullptr;		return nullptr;
}		}
[43 lines not shown]
if (A0 && A1 && A0->isIdenticalTo(A1) && A0->hasOneUse() &&
// Create the new store to be inserted at the join point.		// Create the new store to be inserted at the join point.
StoreInst *SNew = cast<StoreInst>(S0->clone());		StoreInst *SNew = cast<StoreInst>(S0->clone());
Instruction *ANew = A0->clone();		Instruction *ANew = A0->clone();
SNew->insertBefore(&*InsertPt);		SNew->insertBefore(&*InsertPt);
ANew->insertBefore(SNew);		ANew->insertBefore(SNew);

assert(S0->getParent() == A0->getParent());		assert(S0->getParent() == A0->getParent());
assert(S1->getParent() == A1->getParent());		assert(S1->getParent() == A1->getParent());
		updateMemorySSAForSunkStores(BB, S0, S1, SNew);
		mcrosier: No need for extra curly brackets.
		Gerolf: You could remove the conditional here and simply return from updateMemorySSA when MSSA is null.

// New PHI operand? Use it.		// New PHI operand? Use it.
if (PHINode *NewPN = getPHIOperand(BB, S0, S1))		if (PHINode *NewPN = getPHIOperand(BB, S0, S1))
SNew->setOperand(0, NewPN);		SNew->setOperand(0, NewPN);
removeInstruction(S0);		removeInstruction(S0);
removeInstruction(S1);		removeInstruction(S1);
		reames: Are you guaranteed there's not another phi here? Not entirely sure what this is doing. It may be completely correct, but it looks suspicious when compared to instruction insertion into a basic block.
A0->replaceAllUsesWith(ANew);		A0->replaceAllUsesWith(ANew);
removeInstruction(A0);		removeInstruction(A0);
A1->replaceAllUsesWith(ANew);		A1->replaceAllUsesWith(ANew);
removeInstruction(A1);		removeInstruction(A1);
return true;		return true;
}		}
return false;		return false;
}		}
[16 lines not shown]
bool MergedLoadStoreMotion::mergeStores(BasicBlock *T) {
BasicBlock *Pred1 = *PI;		BasicBlock *Pred1 = *PI;
++PI;		++PI;
// tail block of a diamond/hammock?		// tail block of a diamond/hammock?
if (Pred0 == Pred1)		if (Pred0 == Pred1)
return false; // No.		return false; // No.
if (PI != E)		if (PI != E)
return false; // No. More than 2 predecessors.		return false; // No. More than 2 predecessors.

// #Instructions in Succ1 for Compile Time Control		// #Instructions in Succ1 for Compile Time Control
		reames: Given this is a linear traversal of Pred0, does the reserve really gain you anything here?
int Size1 = Pred1->size();		int Size1 = Pred1->size();
int NStores = 0;		int NStores = 0;

for (BasicBlock::reverse_iterator RBI = Pred0->rbegin(), RBE = Pred0->rend();		for (BasicBlock::reverse_iterator RBI = Pred0->rbegin(), RBE = Pred0->rend();
RBI != RBE;) {		RBI != RBE;) {

Instruction *I = &*RBI;		Instruction *I = &*RBI;
++RBI;		++RBI;

		reames: Neat!
// Don't sink non-simple (atomic, volatile) stores.		// Don't sink non-simple (atomic, volatile) stores.
		reames: Ok, I'm again a bit confused by the algorithm. :) What I expected to see was something along the following:
		1. Find store in each block which is a candidate for merge (using memory-def edges from MemoryPhi).
		2. Merge any loads in those basic blocks which use the generation if possible.
		3. Search downward from each store to find a conflicting memory access or end of block. (This could either be by walking instructions or mem-use edges.)
		4. If both reach end of block, merge.
		5. Repeat previous steps with newly created MemoryPhi for earlier stores if any.
auto *S0 = dyn_cast<StoreInst>(I);		auto *S0 = dyn_cast<StoreInst>(I);
if (!S0 || !S0->isSimple())		if (!S0 || !S0->isSimple())
continue;		continue;

++NStores;		++NStores;
if (NStores * Size1 >= MagicCompileTimeControl)		if (NStores * Size1 >= MagicCompileTimeControl)
break;		break;
if (StoreInst *S1 = canSinkFromBlock(Pred1, S0)) {		if (StoreInst *S1 = canSinkFromBlock(Pred1, S0)) {
bool Res = sinkStore(T, S0, S1);		bool Res = sinkStore(T, S0, S1);
MergedStores |= Res;		MergedStores |= Res;
// Don't attempt to sink below stores that had to stick around		// Don't attempt to sink below stores that had to stick around
// But after removal of a store and some of its feeding		// But after removal of a store and some of its feeding
// instruction search again from the beginning since the iterator		// instruction search again from the beginning since the iterator
// is likely stale at this point.		// is likely stale at this point.
if (!Res)		if (!Res)
break;		break;
RBI = Pred0->rbegin();		RBI = Pred0->rbegin();
RBE = Pred0->rend();		RBE = Pred0->rend();
DEBUG(dbgs() << "Search again\n"; Instruction *I = &*RBI; I->dump());		DEBUG(dbgs() << "Search again\n"; Instruction *I = &*RBI; I->dump());
}		}
}		}
return MergedStores;		return MergedStores;
}		}

		/// \brief Visit blocks in DFS order and number instructions. For each block
		gberry: It seems like you forgot to finish this sentence.
		Gerolf: How about: Visit blocks in DFS order and number instructions. For each block track first hoist barrier and last sink barrier.
		/// track first hoist barrier and last sink barrier.
		void MergedLoadStoreMotion::computeBarriers() {
		// This is 1 so the default constructed value of 0 can be used to say we
		// didn't find anything.
		unsigned DFSNum = 1;
		for (auto DI = df_begin(DT->getRootNode()), DE = df_end(DT->getRootNode());
		DI != DE; ++DI) {
		BasicBlock *DomBlock = DI->getBlock();
		bool FoundHoistBarrierInBlock = false;
		for (const auto &Inst : *DomBlock) {
		DFSNumberMap[&Inst] = DFSNum;

		if (!FoundHoistBarrierInBlock &&
		!isGuaranteedToTransferExecutionToSuccessor(&Inst)) {
		FirstHoistBarrier[DomBlock] = DFSNum;
		FoundHoistBarrierInBlock = true;
		}
		if (Inst.mayThrow()) {
		LastSinkBarrier[DomBlock] = DFSNum;
		}
		++DFSNum;
		}
		}
		}
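Editorial sketch (not part of the patch): the barrier bookkeeping in computeBarriers can be modelled with standard containers. The sketch below assumes hypothetical `ToyInst` and `ToyBlock` types and a helper `lookupOrZero` that mimics a map lookup returning 0 for missing keys. None of these names come from LLVM. It numbers instructions once, records the first hoist barrier and last sink barrier per block, and then answers each legality query with two lookups instead of rescanning the block.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; not LLVM types.
struct ToyInst {
  bool MayNotTransfer; // not guaranteed to transfer execution to successor
  bool MayThrow;       // may throw
};
using ToyBlock = std::vector<ToyInst>;

struct BarrierInfo {
  std::unordered_map<const ToyInst *, unsigned> Number;
  std::unordered_map<const ToyBlock *, unsigned> FirstHoistBarrier;
  std::unordered_map<const ToyBlock *, unsigned> LastSinkBarrier;
};

// Returns 0 when the key is absent, which here means "no barrier found".
template <class Map, class Key>
unsigned lookupOrZero(const Map &M, const Key &K) {
  auto It = M.find(K);
  return It == M.end() ? 0 : It->second;
}

BarrierInfo computeToyBarriers(const std::vector<const ToyBlock *> &Blocks) {
  BarrierInfo Info;
  unsigned Num = 1; // start at 1 so that 0 can mean "nothing found"
  for (const ToyBlock *BB : Blocks) {
    bool FoundHoistBarrier = false;
    for (const ToyInst &I : *BB) {
      Info.Number[&I] = Num;
      if (!FoundHoistBarrier && I.MayNotTransfer) {
        Info.FirstHoistBarrier[BB] = Num; // only the first one matters
        FoundHoistBarrier = true;
      }
      if (I.MayThrow)
        Info.LastSinkBarrier[BB] = Num; // keep overwriting: last one wins
      ++Num;
    }
  }
  return Info;
}

// A load cannot be hoisted if a hoist barrier precedes it in its block.
bool blockedFromHoisting(const BarrierInfo &Info, const ToyBlock *BB,
                         const ToyInst *Load) {
  unsigned Barrier = lookupOrZero(Info.FirstHoistBarrier, BB);
  return Barrier != 0 && lookupOrZero(Info.Number, Load) > Barrier;
}

// A store cannot be sunk if a sink barrier follows it in its block.
bool blockedFromSinking(const BarrierInfo &Info, const ToyBlock *BB,
                        const ToyInst *Store) {
  unsigned Barrier = lookupOrZero(Info.LastSinkBarrier, BB);
  return Barrier != 0 && lookupOrZero(Info.Number, Store) < Barrier;
}
```

The numbering pass is linear in the number of instructions, which matches the N^2-to-N remark made in the review thread.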

bool MergedLoadStoreMotion::run(Function &F, MemoryDependenceResults *MD,		bool MergedLoadStoreMotion::run(Function &F, MemoryDependenceResults *MD,
AliasAnalysis &AA) {		AliasAnalysis &AA, DominatorTree *DT,
		MemorySSA *MSSA) {
this->MD = MD;		this->MD = MD;
this->AA = &AA;		this->AA = &AA;
		this->MSSA = MSSA;
		this->DT = DT;
		DT->updateDFSNumbers();
		if (MSSA)
		CachingWalker = MSSA->getWalker();

bool Changed = false;		bool Changed = false;
DEBUG(dbgs() << "Instruction Merger\n");		DEBUG(dbgs() << "Instruction Merger\n");
		computeBarriers();

// Merge unconditional branches, allowing PRE to catch more		// Merge unconditional branches, allowing PRE to catch more
// optimization opportunities.		// optimization opportunities.
for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE;) {		for (BasicBlock &BB : F) {
BasicBlock *BB = &*FI++;

// Hoist equivalent loads and sink stores		// Hoist equivalent loads and sink stores
// outside diamonds when possible		// outside diamonds when possible
if (isDiamondHead(BB)) {		if (isDiamondHead(&BB)) {
Changed |= mergeLoads(BB);		if (MSSA) {
Changed |= mergeStores(getDiamondTail(BB));		Changed |= mergeLoadsMemorySSA(&BB);
		Changed |= mergeStoresMemorySSA(getDiamondTail(&BB));
		} else {
		Changed |= mergeLoads(&BB);
		Changed |= mergeStores(getDiamondTail(&BB));
		}
}		}
}		}
return Changed;		return Changed;
}		}

namespace {		namespace {
class MergedLoadStoreMotionLegacyPass : public FunctionPass {		class MergedLoadStoreMotionLegacyPass : public FunctionPass {
public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MergedLoadStoreMotionLegacyPass() : FunctionPass(ID) {		MergedLoadStoreMotionLegacyPass() : FunctionPass(ID) {
initializeMergedLoadStoreMotionLegacyPassPass(		initializeMergedLoadStoreMotionLegacyPassPass(
*PassRegistry::getPassRegistry());		*PassRegistry::getPassRegistry());
}		}

///		///
/// \brief Run the transformation for each function		/// \brief Run the transformation for each function
///		///
bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;
MergedLoadStoreMotion Impl;		MergedLoadStoreMotion Impl;
auto *MDWP = getAnalysisIfAvailable<MemoryDependenceWrapperPass>();		auto *MDWP = getAnalysisIfAvailable<MemoryDependenceWrapperPass>();
		MemorySSA *MSSA = nullptr;
		DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
		if (UseMemorySSA)
		MSSA = &getAnalysis<MemorySSAWrapperPass>().getMSSA();

return Impl.run(F, MDWP ? &MDWP->getMemDep() : nullptr,		return Impl.run(F, MDWP ? &MDWP->getMemDep() : nullptr,
getAnalysis<AAResultsWrapperPass>().getAAResults());		getAnalysis<AAResultsWrapperPass>().getAAResults(), DT,
		MSSA);
}		}

private:		private:
// This transformation requires dominator postdominator info		// This transformation requires dominator postdominator info
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
		mcrosier: but -> put
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
		AU.addRequired<DominatorTreeWrapperPass>();
		if (UseMemorySSA) {
		AU.addRequired<MemorySSAWrapperPass>();
		AU.addPreserved<MemorySSAWrapperPass>();
		}
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addPreserved<MemoryDependenceWrapperPass>();		AU.addPreserved<MemoryDependenceWrapperPass>();
}		}
};		};

char MergedLoadStoreMotionLegacyPass::ID = 0;		char MergedLoadStoreMotionLegacyPass::ID = 0;
} // anonymous namespace		} // anonymous namespace

///		///
		mcrosier: Pre-increment here?
		dberlin (Author): Not sure what line this is supposed to go with, i can't find a post-increment around :)
		mcrosier: Sorry if I wasn't clear. I was suggesting we remove the pre-increment above and do it in the if statement. ++NLoads; if (NLoads * Size1 ...) vs if (++NLoads * Size1 ...)
		dberlin (Author): Oh, sorry. I copied this code from the non-memoryssa version. (In truth, the code can stand a lot of cleaning up, improvement, and reducing of restrictions)
/// \brief createMergedLoadStoreMotionPass - The public interface to this file.		/// \brief createMergedLoadStoreMotionPass - The public interface to this file.
///		///
FunctionPass *llvm::createMergedLoadStoreMotionPass() {		FunctionPass *llvm::createMergedLoadStoreMotionPass() {
return new MergedLoadStoreMotionLegacyPass();		return new MergedLoadStoreMotionLegacyPass();
}		}

INITIALIZE_PASS_BEGIN(MergedLoadStoreMotionLegacyPass, "mldst-motion",		INITIALIZE_PASS_BEGIN(MergedLoadStoreMotionLegacyPass, "mldst-motion",
		mcrosier: Any reason we don't just increment in the for statement itself?
		dberlin (Author): Fixed. Originally, the updater was modifying the accesslist we were walking on and invalidating the iterator. It turns out to be non-trivial to change this, but i never moved this loop back to account for this.
		dberlin (Author): Actually, it can't be done, we still invalidate the current iterator for loads. I'll document this.
"MergedLoadStoreMotion", false, false)		"MergedLoadStoreMotion", false, false)
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
		mcrosier: Please add a period.
INITIALIZE_PASS_END(MergedLoadStoreMotionLegacyPass, "mldst-motion",		INITIALIZE_PASS_END(MergedLoadStoreMotionLegacyPass, "mldst-motion",
"MergedLoadStoreMotion", false, false)		"MergedLoadStoreMotion", false, false)

PreservedAnalyses		PreservedAnalyses
MergedLoadStoreMotionPass::run(Function &F, AnalysisManager<Function> &AM) {		MergedLoadStoreMotionPass::run(Function &F, AnalysisManager<Function> &AM) {
MergedLoadStoreMotion Impl;		MergedLoadStoreMotion Impl;
auto *MD = AM.getCachedResult<MemoryDependenceAnalysis>(F);		auto *MD = AM.getCachedResult<MemoryDependenceAnalysis>(F);
auto &AA = AM.getResult<AAManager>(F);		auto &AA = AM.getResult<AAManager>(F);
if (!Impl.run(F, MD, AA))
		MemorySSA *MSSA = nullptr;
		DominatorTree *DT = &AM.getResult<DominatorTreeAnalysis>(F);
		if (UseMemorySSA)
		Gerolf: I think this is the only place where you need UseMemorySSA. All other instances can be handled by if (MSSA).
		MSSA = &AM.getResult<MemorySSAAnalysis>(F);

		if (!Impl.run(F, MD, AA, DT, MSSA))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// FIXME: This should also 'preserve the CFG'.		// FIXME: This should also 'preserve the CFG'.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
PA.preserve<MemoryDependenceAnalysis>();		PA.preserve<MemoryDependenceAnalysis>();
		PA.preserve<DominatorTreeAnalysis>();
		if (MSSA)
		gberry: Isn't MemorySSA only preserved if 'UseMemorySSA' is true?
		dberlin (Author): Yes, fixed.
		PA.preserve<MemorySSAAnalysis>();
return PA;		return PA;
}		}

		/// This file contained a MemorySSA and non-MemorySSA version of the same
		/// optimization, and to keep it less confusing, the functions specific to the
		/// MemorySSA version are placed below this comment.

		// Try to hoist a given load candidate into HoistBlock by trying to merge it
		// with any equivalent loads.
		bool MergedLoadStoreMotion::mergeLoadMemorySSA(const MemoryUse *Cand,
		BasicBlock *HoistBlock) {
		bool MergedLoads = false;
		BasicBlock *BB = Cand->getBlock();
		LoadInst *Load1 = dyn_cast<LoadInst>(Cand->getMemoryInst());
		// This isUsedOutsideOfBlock is also conservative.
		if (!Load1 || Load1->isUsedOutsideOfBlock(BB))
		gberry: Maybe check Load1->isSimple() here? Maybe factor out all these checks into a function and use it for the Load0 hash table insertion as well?
		dberlin (Author): Let us please not do that in this patch. Doing that would change this algorithm to catch a different set of loads than the non-memoryssa version. I would like to keep the algorithms identical in what they catch for the moment, so debugging is easy. Eventually, when the mssa version is the default and the non-mssa version killed, we can improve it to be whatever we want.
		return false;
		// We know that if the load has the same clobbering access as this one, they
		// must not be killed until the same point. That is, we are guaranteed that
		// all the loads that could possibly be merged must have a common MemoryDef
		// (or MemoryPhi) that they reach. If not, then we can't merge them because
		// something is in the way on one of the branches. If we do find a
		// possible match, the only further checking we need to do is to ensure they
		// are loads of the same pointer. We could simply check the pointer
		// operands, but isMustAlias can do a better job of it.

		auto LookupResult = LoadInfo.equal_range(
		{Load1, CachingWalker->getClobberingMemoryAccess(Load1)});
		bool Res = false;
		auto LookupIter = LookupResult.first;
		bool SafeToLoadUnconditionally =
		(LookupResult.first != LookupResult.second) &&
		isSafeToLoadUnconditionally(Load1->getPointerOperand(),
		Load1->getAlignment(),
		george.burgess.iv: Potentially unfinished thought?
		Load1->getModule()->getDataLayout(),
		/*ScanFrom=*/HoistBlock->getTerminator());
		// If we won't be able to hoist the load out of the block, there is
		// no point in walking all the equivalent loads in the other block.
		unsigned int BB1HoistBarrier = FirstHoistBarrier.lookup(BB);
		if (!SafeToLoadUnconditionally &&
		george.burgess.iv: Did you mean "only move simple (non-atomic, non-volatile)"?
		(BB1HoistBarrier != 0 && DFSNumberMap.lookup(Load1) > BB1HoistBarrier))
		return false;

		while (!Res && LookupIter != LookupResult.second) {
		gberry: Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.
		dberlin (Author): If that was the case, all the loads in that block are equal (because they also passed isSameOperation, etc) and GVN should have removed them already :) Note that neither the mssa version or the non-mssa version will handle the case of two identical loads in one side of the diamond, and one load in the other. It will only hoist/merge one of identical ones.
		gberry: Is that right (the bit about all the loads being equal)? I'm thinking of a case like the below where all of the loads pass isSameOperation and all have the call as their clobbering access. In this case you'll end up comparing all pairs of loads (assuming none of the loads must alias).
		  call foo
		  br %c, %then, %else
		then:
		  load i32, %p1
		  load i32, %p2
		  load i32, %p3
		  ...
		  br %end
		else:
		  load i32, %p4
		  load i32, %p5
		  load i32, %p6
		  ...
		  br %end
		end:
		LoadInst *Load0 = LookupIter->first;
		auto OldIter = LookupIter;
		++LookupIter;
		if (!SafeToLoadUnconditionally) {
		// If the first hoist barrier in the block is before the load, we
		// can't hoist.
		unsigned int BB0HoistBarrier =
		FirstHoistBarrier.lookup(Load0->getParent());
		Gerolf: The code from here could be a function. I prefer a function to fit on my screen.
		dberlin (Author): Fixed
		if (BB0HoistBarrier != 0 && DFSNumberMap.lookup(Load0) > BB0HoistBarrier)
		continue;
		}

		MemoryLocation Loc0 = MemoryLocation::get(Load0);
		MemoryLocation Loc1 = MemoryLocation::get(Load1);
		if (AA->isMustAlias(Loc0, Loc1))
		Res |= hoistLoad(HoistBlock, Load0, Load1);
		mcrosier: Please add a period.
		MergedLoads |= Res;
		if (Res)
		LoadInfo.erase(OldIter);
		}
		return MergedLoads;
		}
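
The hoist-barrier test in mergeLoadMemorySSA above compares a load's position in its block against the position of the block's first barrier, with 0 meaning "no barrier". A minimal standalone sketch of that comparison follows; `BlockInfo` and `canHoistPastBarriers` are hypothetical names used only for illustration and are not part of the patch.

```cpp
// Hypothetical stand-in for the pass's per-block bookkeeping.
struct BlockInfo {
  // Local DFS number of the first hoist barrier in the block.
  // 0 is assumed to mean "this block has no barrier", as in the patch.
  unsigned FirstHoistBarrier = 0;
};

// Returns true when an instruction numbered InstDFS can leave the top of its
// block: either there is no barrier, or the instruction is not after the
// first barrier.
bool canHoistPastBarriers(unsigned InstDFS, const BlockInfo &BI) {
  return BI.FirstHoistBarrier == 0 || InstDFS <= BI.FirstHoistBarrier;
}
```

With a barrier at position 1, a load at position 2 is rejected; with the barrier at position 2, a load at position 1 is accepted. This is the negation of the `BB1HoistBarrier != 0 && DFS(Load1) > BB1HoistBarrier` condition in the code above.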

		///
		/// \brief Try to hoist two loads to same address into diamond header
		///
		/// Starting from a diamond head block, iterate over the loads in one
		/// successor block, put them in the hash table.
		/// Then walk through loads in the successor block, and see if they match any in
		/// the hash table and can be hoisted.
		///
		Gerolf: Why do you need this? When the first condition is false the while loop does not execute. But then SafeToLoadUnconditionally is not needed anyway
		dberlin (Author): It's moved here because it's loop-invariant. That's also why the check matches the loop check, to avoid calculating it if the loop won't execute :) Otherwise, we would calculate it on every loop iteration, when it doesn't change.
		Gerolf: Ok, I assumed the while loop is on the hot path usually. Then the code checks the (part of the) while condition twice.
		bool MergedLoadStoreMotion::mergeLoadsMemorySSA(BasicBlock *BB) {
		bool MergedLoads = false;
		assert(isDiamondHead(BB));
		BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator());
		BasicBlock *Succ0 = BI->getSuccessor(0);
		BasicBlock *Succ1 = BI->getSuccessor(1);
		// #Instructions in Succ1 for Compile Time Control
		int Size1 = Succ1->size();
		int NLoads = 0;

		// Skip if we don't have memory accesses in both blocks
		auto *Succ0Accesses = MSSA->getBlockAccesses(Succ0);
		if (!Succ0Accesses)
		return false;
		auto *Succ1Accesses = MSSA->getBlockAccesses(Succ1);
		if (!Succ1Accesses)
		Gerolf: Within a block I thought the DFS numbers are 0 1 2 ... So I would expect lookup(load) < BBHoistBarrier. I'm probably wrong about the DFSs.
		dberlin (Author): You are thinking about it backwards i think. The dfs numbers are indeed as you list them. If the block looks like this:
		  0
		  1 - load barrier
		  2 - load
		We can't hoist. If it's
		  0
		  1 - load
		  2 - load barrier
		We can hoist. So if DFS(load) > DFS(load barrier), that means the load barrier is above us in the block (and we come after the load barrier), and we can't hoist past it.
		Gerolf: Hm, it looks alright today :-). Thanks.
		return false;

		Gerolf: Load1 is loop invariant and the continue condition can be computed in the header.
		dberlin (Author): Fixed
		// Walk all accesses in first block, put them into a hash table.
		for (auto &Access : *Succ0Accesses) {
		const MemoryUse *MU = dyn_cast<MemoryUse>(&Access);
		if (!MU)
		continue;
		Instruction *I = MU->getMemoryInst();
		// Only move simple non-atomic and non-volatile loads.
		LoadInst *Load0 = dyn_cast<LoadInst>(I);
		// FIXME: The isUsedOutsideofBlock is super-conservative, and used not to
		// lengthen live ranges. There are better ways.
		if (!Load0 || !Load0->isSimple() || Load0->isUsedOutsideOfBlock(Succ0))
		continue;
		Gerolf: That comment does not look right. Load0 and load1 just got hoisted and LookupIter->first could still be Load0. Also, not hoisting above loads that had not been hoisted is too restrictive. With better/cheaper DA this optimization can be more aggressive. Perhaps this could be a FIXME.
		dberlin (Author): 1. Yes, i'll fix. 2. This code was copied from mergeLoads originally. It is indeed too restrictive (there are a lot of random restrictions we could relax). I'll add a FIXME.

		LoadInfo.insert({Load0, CachingWalker->getClobberingMemoryAccess(Load0)});
		if (++NLoads * Size1 >= MagicCompileTimeControl)
		break;
		}
		// Walk all accesses in the second block, see if we can match them against
		// accesses in the first.
		for (auto AI = Succ1Accesses->begin(), AE = Succ1Accesses->end(); AI != AE;) {
		const MemoryUse *MU = dyn_cast<MemoryUse>(&*AI);
		++AI;
		if (!MU)
		continue;
		MergedLoads |= mergeLoadMemorySSA(MU, BB);
		}
		LoadInfo.clear();
		return MergedLoads;
		}
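
The second loop in mergeLoadsMemorySSA advances the iterator before handing the current access to mergeLoadMemorySSA, because (as noted in the review) hoisting can invalidate the iterator for the current element. The same pattern can be sketched on a plain `std::list`; `eraseEvens` is a hypothetical example function, not code from the patch.

```cpp
#include <list>

// Removes every even value from L and returns how many were removed.
// The iterator is advanced before the current element is touched, so erasing
// that element only invalidates an iterator we no longer use.
int eraseEvens(std::list<int> &L) {
  int Removed = 0;
  for (auto It = L.begin(), End = L.end(); It != End;) {
    auto Cur = It;
    ++It; // step first; erasing Cur below would otherwise invalidate It
    if (*Cur % 2 == 0) {
      L.erase(Cur);
      ++Removed;
    }
  }
  return Removed;
}
```

Putting the increment in the `for` header, as suggested in the review, would increment an iterator that the loop body may already have invalidated.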

		///
		/// \brief True when two stores are equivalent and can sink into the footer
		///
		/// Starting from a diamond tail block, place all the stores in one predecessor
		/// in a hash table, and try to match them against stores in the second
		/// predecessor
		///
		bool MergedLoadStoreMotion::mergeStoresMemorySSA(BasicBlock *T) {

		bool MergedStores = false;
		assert(T && "Footer of a diamond cannot be empty");
		Gerolf: Footer -> Tail in error message

		pred_iterator PI = pred_begin(T), E = pred_end(T);
		assert(PI != E);
		BasicBlock *Pred0 = *PI;
		++PI;
		BasicBlock *Pred1 = *PI;
		++PI;
		// tail block of a diamond/hammock?
		if (Pred0 == Pred1)
		return false; // No.
		if (PI != E)
		return false; // No. More than 2 predecessors.

		// #Instructions in Succ1 for Compile Time Control
		int Size1 = Pred1->size();
		int NStores = 0;

		// Skip all this if we don't have any memory accesses to look at.
		Gerolf: There is similar code in mergeLoads. That could be a utility function that returns false when there is no memory access in at least one of the blocks, e.g. static bool hasMemoryAccess(BasicBlock B1, BasicBlock B2).
		auto *Pred0Accesses = MSSA->getBlockAccesses(Pred0);
		if (!Pred0Accesses)
		return false;
		auto *Pred1Accesses = MSSA->getBlockAccesses(Pred1);
		if (!Pred1Accesses)
		return false;

		for (auto AI = Pred0Accesses->rbegin(), AE = Pred0Accesses->rend(); AI != AE;
		++AI) {
		if (const MemoryDef *MD = dyn_cast<MemoryDef>(&*AI)) {
		Instruction *I = MD->getMemoryInst();

		// Only sink simple (non-atomic, non-volatile) stores.
		if (!isa<StoreInst>(I))
		continue;
		StoreInst *S0 = cast<StoreInst>(I);
		if (!S0->isSimple())
		continue;
		StoreInfo.insert(S0);

		++NStores;
		if (NStores * Size1 >= MagicCompileTimeControl)
		Gerolf: Should be if (++NStores) to be consistent with your code in mergeLoads. Or change the code for NLoads there.
		break;
		}
		}

		Gerolf: Could be a separate function:
		for (auto AI = Pred1Accesses->rbegin(), AE = Pred1Accesses->rend(); AI != AE;
		++AI) {
		const MemoryDef *MD = dyn_cast<MemoryDef>(&*AI);
		if (!MD)
		continue;
		Instruction *Inst = MD->getMemoryInst();
		if (!isa<StoreInst>(Inst))
		continue;

		StoreInst *Store1 = cast<StoreInst>(Inst);
		// If we won't be able to sink the store out of the block, there is no point
		// in looking for equal stores.
		unsigned int BB1SinkBarrier = LastSinkBarrier.lookup(Store1->getParent());
		if (BB1SinkBarrier != 0 && DFSNumberMap.lookup(Store1) < BB1SinkBarrier)
		continue;

		auto LookupResult = StoreInfo.equal_range(Store1);
		Gerolf: Please add a comment about what the equal stores are.
		bool Res = false;
		auto LookupIter = LookupResult.first;

		while (!Res && LookupIter != LookupResult.second) {
		Gerolf: Since it is sinking stores would it speed up the code reversing the loop traversal (from second to first)?
		StoreInst *Store0 = *LookupIter;
		auto OldIter = LookupIter;
		++LookupIter;

		// If the last sink barrier in the block is after us, we can't sink out
		// of the block.
		unsigned int BB0SinkBarrier = LastSinkBarrier.lookup(Store0->getParent());
		if (BB0SinkBarrier != 0 && DFSNumberMap.lookup(Store0) < BB0SinkBarrier)
		continue;

		MemoryLocation Loc0 = MemoryLocation::get(Store0);
		MemoryLocation Loc1 = MemoryLocation::get(Store1);
		if (!AA->isMustAlias(Loc0, Loc1) ||
		isStoreSinkBarrierInRangeMemorySSA(Loc0,
		MSSA->getMemoryAccess(Store0)) ||
		isStoreSinkBarrierInRangeMemorySSA(Loc1,
		MSSA->getMemoryAccess(Store1)))
		continue;
		Res = sinkStore(T, Store0, Store1);
		gberry: typo: "MemoryDefwith"
		if (Res)
		StoreInfo.erase(OldIter);
		gberry: Typo: "explosed". Also, there is no 'End' parameter as referenced in the comment.
		MergedStores |= Res;
		}
		}
		for (auto V : AccessesToDelete) {
		Gerolf: A comment would be nice why the accesses get cleared here. Or would that fit better in sinkStores?
		MSSA->removeMemoryAccess(V);
		gberry: getDFSNumIn/getDFSNumOut are documented as: "These are an internal implementation detail, do not call them." which seems to conflict with this usage.
		dberlin (Author): I will change the docs for them in a separate patch. The change to make updateDFSNumbers public, etc, was done specifically to enable this usage. See the thread titled "[PATCH] Make updateDFSNumbers API public" from last year, and there was a review where i computed them separately, and it was suggested to just make it public.
		dberlin (Author): This is now fixed in the docs for those functions
		}
		AccessesToDelete.clear();
		StoreInfo.clear();
		return MergedStores;
		}
		gberry: typo: "replaceALlUsesWith"

		///
		/// \brief True when the store is not downwards explosed to block \p End
		Gerolf: Typo: explosed

		bool MergedLoadStoreMotion::isStoreSinkBarrierInRangeMemorySSA(
		MemoryLocation &Loc, MemoryAccess *Start) {
		DomTreeNode *StartNode = DT->getNode(Start->getBlock());
		std::pair<unsigned, unsigned> StartDFS = {StartNode->getDFSNumIn(),
		StartNode->getDFSNumOut()};

		// Seed with current users that are in the block range.
		std::queue<MemoryAccess *> CurrentUses;
		gberry: Is there a reason you're using a queue for this work-list instead of the more common SmallVector?
		dberlin (Author): We push back and pull front because that exploration order guarantees we terminate at the earliest possible point. If we popped off the back, it would be an order that would explore lower things before higher things.
		SmallPtrSet<const MemoryAccess *, 8> Visited;
		for (auto U : Start->users())
		CurrentUses.emplace(cast<MemoryAccess>(U));

		// Process all the uses and see if they are below end.
		while (!CurrentUses.empty()) {
		Gerolf: What is a good invariant for this loop? Like there shouldn't be more checks than memory references in the code that contains Start? A checked invariant would make me more comfortable with this code.
		MemoryAccess *MA = CurrentUses.front();
		CurrentUses.pop();

		BasicBlock *BB = MA->getBlock();
		DomTreeNode *BBNode = DT->getNode(BB);
		std::pair<unsigned, unsigned> CurrDFS = {BBNode->getDFSNumIn(),
		BBNode->getDFSNumOut()};
		// First see if it's outside the dominator tree range
		gberry: Do you even need the domtree info here to check for barriers in a diamond? There are references to hammocks, but as far as I can tell we never actually attempt to optimize hammocks. If that is the case, couldn't this just check for barriers in the same block as Start?
		// The only way it could affect us is if it's below us in the dominator
		// tree.
		// The start of the final sink point is not below us in a hammock's
		// dominator tree, because in a hammock, the merge block must be a
		// sibling of the split block in the dominator tree.
		// Thus, the things below us in the dominator tree are all things
		// that lead to the sink point.
		if ((CurrDFS.first >= StartDFS.first && CurrDFS.first <= StartDFS.first) &&
		Gerolf: That looks very wrong. At the minimum this is a convoluted equality test.
		CurrDFS.second >= StartDFS.second &&
		CurrDFS.second <= StartDFS.second) {
		// A phi node is not a memory access on its own, and we may have deleted
		// this access already. Because we are walking backwards, we don't perform
		// updates on the fly.
		if (AccessesToDelete.count(MA) || isa<MemoryPhi>(MA))
		continue;
		Instruction *Use = cast<MemoryUseOrDef>(MA)->getMemoryInst();

		// If it really conflicts, we have a barrier.
		if (AA->getModRefInfo(Use, Loc) & MRI_ModRef)
		Gerolf: Shouldn't updates be owned by MSSA rather than each client?
		dberlin (Author): (replying here since phab is not good at processing email like this): That would be inconsistent with what we do in the rest of LLVM :) MemorySSA is an SSA form, and like SSA in LLVM, passes are responsible for keeping SSA up to date. This is also true of most analysis preservation. You can see that splitCriticalEdge takes a memdep argument, for example, and will update memdep instead of memdep updating itself. Same with dominators. Passes that want to preserve Dom are responsible for redirecting the dominators where they should go. They can screw it up, too! Doing general updates (IE "I have no idea what happened, you go find out and fix it") is also expensive, and at least for every pass i've converted so far, completely unnecessary. I also had more functional updaters (IE replaceAccessWithHoistedAccess) in memoryssa at earlier points, but if you look earlier in the review, you'll see others wanted it to be simple functionality instead. In retrospect, i agree. We can build a general updater if we need it, and we can build utilities to handle common updates if we need it. So far, it hasn't been necessary. At some point, the better question is really "why is memoryssa a side data structure instead of just part of the normal IR". But that's a question for another day.
		dberlin (Author): (and, just to also point out, the update algorithm we use is specific to diamonds, and would not work for more general control flow. It also would not work without the set of legality testing, etc, we perform elsewhere in the code )
		Gerolf: What got me here is that I expected the update to be simple. Why is here more involved than replacing two MD's with one, possibly removing a MP? From the clients perspective updates like this look too complicated to get right. Maybe there is verifier in place, but in the code I don't even see a single assertion but could help debugging issues should something go wrong. Perhaps we can come up with some utility routines that make simpler to grasp: when two memory or more memory operation get moved, this is the blueprint for updating MSSA and guarantee it stays consistent.
		return true;

		// If not, add its immediate uses to keep walking
		for (auto U : Start->users()) {
		MemoryAccess *MU = cast<MemoryAccess>(U);
		if (Visited.insert(MU).second)
		CurrentUses.emplace(MU);
		}
		}
		}
		return false;
		}
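
The review discussion about using a queue for the worklist in isStoreSinkBarrierInRangeMemorySSA comes down to visiting nearer users before farther ones, with a visited set to guarantee termination. A minimal standalone sketch of that walk is below. The integer node ids, the `Users` adjacency list, and the `Conflicts` flags are hypothetical stand-ins for MemorySSA users and the alias query, and the sketch assumes the intent is to enqueue the users of the node just popped.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Users[N] lists the nodes that use node N; Conflicts[N] marks node N as a
// barrier. Returns true if any transitive user of Start is a barrier.
bool reachesConflict(const std::vector<std::vector<int>> &Users,
                     const std::vector<bool> &Conflicts, int Start) {
  assert(Users.size() == Conflicts.size());
  std::queue<int> Worklist;
  std::unordered_set<int> Visited;
  for (int U : Users[Start])
    if (Visited.insert(U).second)
      Worklist.push(U);

  while (!Worklist.empty()) {
    int N = Worklist.front(); // FIFO: nearer users are examined first
    Worklist.pop();
    if (Conflicts[N])
      return true; // stop at the first barrier found
    for (int U : Users[N])
      if (Visited.insert(U).second) // each node is enqueued at most once
        Worklist.push(U);
  }
  return false;
}
```

Because every node enters the queue at most once, the loop runs at most once per node even when the user graph contains cycles.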

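The DFS range check in isStoreSinkBarrierInRangeMemorySSA is described in its comments as asking whether an access is "below us in the dominator tree". If that is the intent, the usual way to express it with DFS in/out numbers is interval containment rather than equality. The sketch below uses a hypothetical `DFSRange` struct instead of DomTreeNode and is only an illustration of that containment test, not the patch's code.

```cpp
#include <cassert>

// DFS entry and exit numbers assigned by a walk of the dominator tree.
struct DFSRange {
  unsigned In;
  unsigned Out;
};

// True when the node with range B is the node with range A, or lies below it
// in the tree: B's interval is contained in A's interval.
bool isAtOrBelow(const DFSRange &A, const DFSRange &B) {
  assert(A.In <= A.Out && B.In <= B.Out);
  return B.In >= A.In && B.Out <= A.Out;
}
```

A sibling subtree has a disjoint interval, so it fails the test in both directions.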
		void MergedLoadStoreMotion::updateMemorySSAForSunkStores(
		BasicBlock *SinkLocation, StoreInst *S0, StoreInst *S1, StoreInst *SNew) {
		// There are three basic store sinking cases that we have to handle when
		// updating.
		// Because we are sinking to the bottom of a diamond, we may need to place a
		// phi node depending on what the old memory state of the MemoryDef's were.
		// This breaks down into three cases.
		//
		// 1. There are no stores above S0 or S1 in either block.
		Gerolf: I continue struggling with this function. For now I just need some help understanding the comment. When you say "above" S0, S1 you refer to DFS numbering like Sx is above S0 in this example:
		  0 Sx
		  1
		  2 S0
		Correct? This would be consistent with your DFS explanation in a previous explanation.
		gberry: I think there is a bug in the case 1 scenario if there is a pre-existing phi in the bottom of the diamond that references just one of the sunk stores. For example:
		  ; 1 = MemDef ...
		  call foo
		  br %c %then, %else
		then:
		  ; 2 = MemDef(1)
		  store @A
		  ; 3 = MemDef(2)
		  store @B
		  br %end
		else:
		  ; 4 = MemDef(1)
		  store @A
		  br %end
		end:
		  5 = MemPhi(3, 4)
		I believe in this case, after updating the end block will look like:
		end:
		  5 = MemPhi(3, 6)
		  ; 6 = MemDef(1)
		  store @A
		Which seems wrong, since the phi is before its use, but also it seems like having 6's defining access skip over the phi 5 and def 3 could cause trouble, though I'm less sure about the latter.
		// In this case, no phi node is necessary, both MemoryDefs must have the same
		// defining access, as the top of the diamond can only have one reaching
		Gerolf: What is the "top of the diamond"? The head?
		// MemoryDef/MemoryPhi at the end. After use replacement, we may end up with a
		// phi
		// in the sink location block having all the same operands, which we check
		Gerolf: What is the "sink location block"? The Tail?
		// for.
		//
		// 2. There are stores above either S0 or S1, and uses below the block
		Gerolf: What is "the block"? Perhaps "uses in the tail and/or below" describes it better.
		// In this case, a phi node will already exist in the sink location, but may
		Gerolf: Would it make sense to clearly distinguish phi and memory phi (mphi) nodes?
		// need to be updated.
		// If the phi node argument is either S0 or S1's MemoryDef, we replace it with
		// S0/S1's defining access.
		//
		// 3. There are stores above either S0 or S1, and no uses below the block
		// In this case, no phi node will exist, but we need one to merge the
		// MemoryDefs in order to produce the defining access for the MemoryDef we
		// will place in the sink location block.
		//
		// After handling the above, we create a new MemoryDef with either the phi
		// node, or the correct version, in the sink location block.

		if (!MSSA)
		  return;

		MemoryDef *MD0 = cast<MemoryDef>(MSSA->getMemoryAccess(S0));
		MemoryDef *MD1 = cast<MemoryDef>(MSSA->getMemoryAccess(S1));

		bool MayCreateDegeneratePhi = false;
		MemoryAccess *NewDefiningAccess;
		if (MD0->getDefiningAccess() == MD1->getDefiningAccess()) {
		  // Case 1
		  NewDefiningAccess = MD0->getDefiningAccess();
		  // This may create a phi node that has the same versions for everything
		  // after replaceAllUsesWith is done below.
		  MayCreateDegeneratePhi = true;
		} else if (MemoryAccess *MA = MSSA->getMemoryAccess(SinkLocation)) {
		  // Case 2
		  MemoryPhi *MP = cast<MemoryPhi>(MA);
		  for (Use &Op : MP->incoming_values()) {
		    if (Op.get() == MD0)
		      Op.set(MD0->getDefiningAccess());
		    else if (Op.get() == MD1)
		      Op.set(MD1->getDefiningAccess());
		  }
		  NewDefiningAccess = MP;
		} else {
		Gerolf: There could be a lambda or function that takes care of the two loops.
		  // Case 3
		  MemoryPhi *MP = MSSA->createMemoryPhi(SinkLocation);
		  // Being a diamond, the only incoming predecessors should be the blocks for
		  // the two stores.
		  MP->addIncoming(MD0->getDefiningAccess(), MD0->getBlock());
		  MP->addIncoming(MD1->getDefiningAccess(), MD1->getBlock());
		  NewDefiningAccess = MP;
		}
		// We may have stores that do not conflict with us, after us in the same
		// block.
		// Since we've already proved they do not conflict with us, the correct
		// thing to do is to reset their defining access to our defining access.

		BasicBlock *MD0Block = MD0->getBlock();
		for (auto UI = MD0->use_begin(), UE = MD0->use_end(); UI != UE;) {
		  // Unfortunately, the iterators get invalidated when we reset the use
		  Use *U = &*UI;
		  ++UI;
		  if (cast<MemoryAccess>(U->getUser())->getBlock() == MD0Block)
		    U->set(MD0->getDefiningAccess());
		}

		BasicBlock *MD1Block = MD1->getBlock();
		for (auto UI = MD1->use_begin(), UE = MD1->use_end(); UI != UE;) {
		  Use *U = &*UI;
		  ++UI;
		  if (cast<MemoryAccess>(U->getUser())->getBlock() == MD1Block)
		    U->set(MD1->getDefiningAccess());
		}
		// Create the new MemoryDef
		MemoryAccess *SNewMA = MSSA->createMemoryAccessInBB(
		    SNew, NewDefiningAccess, SinkLocation, MemorySSA::Beginning);
		MD1->replaceAllUsesWith(SNewMA);
		MD0->replaceAllUsesWith(SNewMA);
		MSSA->removeMemoryAccess(MSSA->getMemoryAccess(S0));
		// ilist's reverse iterator invalidation semantics basically mean we need to
		// wait until the loop in our caller is dead before we kill these
		AccessesToDelete.insert(MSSA->getMemoryAccess(S1));
		if (MayCreateDegeneratePhi) {
		  if (MemoryAccess *MA = MSSA->getMemoryAccess(SinkLocation)) {
		    MemoryPhi *MP = cast<MemoryPhi>(MA);
		    MemoryAccess *SamePhiOp = nullptr;
		    for (auto &Op : MP->incoming_values()) {
		      if (!SamePhiOp)
		        SamePhiOp = cast<MemoryAccess>(Op.get());
		      else if (SamePhiOp != Op.get()) {
		        SamePhiOp = nullptr;
		        break;
		      }
		    }
		    // If this has some value, it means all operands of the phi are now the
		    // same
		    if (SamePhiOp) {
		      MP->replaceAllUsesWith(SamePhiOp);
		      MSSA->removeMemoryAccess(MP);
		    }
		  }
		}
		}
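
To make the example in gberry's inline comment concrete, here is a minimal toy replay of the case 1 update on that diamond. It does not use the LLVM classes: `ToyAccess`, `ReplayResult`, `replaceUses`, and `replayCase1` are hypothetical names invented for this sketch, and the update order (same-block use rewriting, new def at the start of the sink block, then the two replaceAllUsesWith calls) is assumed from the listing above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a MemorySSA access (hypothetical; not the LLVM classes).
struct ToyAccess {
  int Id;
  int Block;                     // 0 = head, 1 = then, 2 = else, 3 = end
  std::vector<ToyAccess *> Ops;  // one operand for a def, several for a phi
};

struct ReplayResult {
  std::vector<int> PhiOps;  // ids of the phi's operands after the update
  int NewDefOperand;        // id of the access that defines the new def
};

// Rewrite every operand equal to From into To across all listed accesses.
void replaceUses(std::vector<ToyAccess *> &All, ToyAccess *From,
                 ToyAccess *To) {
  for (ToyAccess *A : All)
    std::replace(A->Ops.begin(), A->Ops.end(), From, To);
}

// Replays the assumed case 1 update on the diamond from the review comment.
ReplayResult replayCase1() {
  ToyAccess D1{1, 0, {}};          // call foo
  ToyAccess D2{2, 1, {&D1}};       // then: store @A (MD0)
  ToyAccess D3{3, 1, {&D2}};       // then: store @B
  ToyAccess D4{4, 2, {&D1}};       // else: store @A (MD1)
  ToyAccess P5{5, 3, {&D3, &D4}};  // end: MemPhi(3, 4)
  std::vector<ToyAccess *> All{&D1, &D2, &D3, &D4, &P5};

  assert(D2.Ops[0] == D4.Ops[0] && "case 1: same defining access");

  // Same-block users of each sunk store are re-pointed at its defining access.
  for (ToyAccess *MD : {&D2, &D4}) {
    ToyAccess *Def = MD->Ops[0];
    for (ToyAccess *A : All)
      if (A != MD && A->Block == MD->Block)
        std::replace(A->Ops.begin(), A->Ops.end(), MD, Def);
  }

  // New def 6 goes at the start of 'end', defined by the shared access 1.
  ToyAccess D6{6, 3, {&D1}};
  replaceUses(All, &D4, &D6);
  replaceUses(All, &D2, &D6);

  std::vector<int> Ids;
  for (ToyAccess *Op : P5.Ops)
    Ids.push_back(Op->Id);
  return {Ids, D6.Ops[0]->Id};
}
```

Under these assumptions the phi ends up with operands 3 and 6 while 6 is still defined by 1, which matches the end block gberry describes: the phi refers to a def that sits after it in the same block, and that def bypasses both the phi and def 3.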

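Gerolf's inline suggestion of one function for the two use-rewriting loops could look roughly like the sketch below. It is a toy model, not a change to the patch: `ToyNode`, `setDef`, and `bypassSameBlockUsers` are hypothetical names, and the user list is a plain vector rather than an LLVM use list. It walks a snapshot of the users, which is a different way of avoiding the iterator invalidation that the listing handles by advancing the iterator before resetting the use.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a MemorySSA access (hypothetical; not the LLVM classes).
struct ToyNode {
  int Block = 0;                 // id of the containing basic block
  ToyNode *Def = nullptr;        // defining access
  std::vector<ToyNode *> Users;  // accesses whose Def is this node
};

// Point User at NewDef and keep both user lists in sync.
void setDef(ToyNode &User, ToyNode *NewDef) {
  if (User.Def) {
    std::vector<ToyNode *> &Old = User.Def->Users;
    Old.erase(std::remove(Old.begin(), Old.end(), &User), Old.end());
  }
  User.Def = NewDef;
  if (NewDef)
    NewDef->Users.push_back(&User);
}

// One helper instead of two near-identical loops: every user of MD that
// lives in MD's own block is re-pointed at MD's defining access.
void bypassSameBlockUsers(ToyNode &MD) {
  assert(MD.Def && "the store being sunk must have a defining access");
  // Walk a snapshot, because setDef mutates MD.Users during the walk.
  std::vector<ToyNode *> Snapshot = MD.Users;
  for (ToyNode *U : Snapshot)
    if (U->Block == MD.Block)
      setDef(*U, MD.Def);
}
```

With such a helper, the two loops would reduce to one call per sunk store, for example `bypassSameBlockUsers(MD0)` followed by `bypassSameBlockUsers(MD1)` in this toy model.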
test/Transforms/InstMerge/exceptions.ll

	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
	; RUN: opt -aa-pipeline=basic-aa -passes='require<memdep>',mldst-motion \
	; RUN: -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"

	@r = common global i32 0, align 4
	@s = common global i32 0, align 4

	; CHECK-LABEL: define void @test1(
	define void @test1(i1 %cmp, i32* noalias %p) {
	[... 49 lines not shown ...]

test/Transforms/InstMerge/ld_hoist1.ll

	; Test load hoist
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-pc_linux"

	; Function Attrs: nounwind uwtable
	define float* @foo(i32* noalias nocapture readonly %in, float* noalias %out, i32 %size, i32* nocapture readonly %trigger) {
	entry:
	  %cmp11 = icmp eq i32 %size, 0
	  br i1 %cmp11, label %for.end, label %for.body.lr.ph
	[... 54 lines not shown ...]

test/Transforms/InstMerge/ld_hoist_st_sink.ll

	; Tests to make sure that loads and stores in a diamond get merged
	; Loads are hoisted into the header. Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i64, %struct.node*, %struct.node*, %struct.node*, i64, %struct.arc*, i64, i64, i64 }
	%struct.arc = type { i64, i64, i64 }

	define i64 @foo(%struct.node* nocapture readonly %r) nounwind {
	entry:
	  %node.0.in16 = getelementptr inbounds %struct.node, %struct.node* %r, i64 0, i32 2
	[... 73 lines not shown ...]

test/Transforms/InstMerge/st_sink_barrier_call.ll

	; Test to make sure that a function call that needs to be a barrier to sinking stores is indeed a barrier.
	; Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	declare i32 @foo(i32 %x)

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	[... 32 lines not shown ...]

test/Transforms/InstMerge/st_sink_bugfix_22613.ll

	; ModuleID = 'bug.c'
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"

	; RUN: opt -O2 -S < %s | FileCheck %s
+	; RUN: opt -O2 -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s

	; CHECK-LABEL: main
	; CHECK: if.end
	; CHECK: store
	; CHECK: memset
	; CHECK: if.then
	; CHECK: store
	; CHECK: memset
	[... 93 lines not shown ...]

test/Transforms/InstMerge/st_sink_no_barrier_call.ll

	; Test to make sure that stores in a diamond get merged with a non barrier function call after the store instruction
	; Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	declare i32 @foo(i32 %x) #0

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	[... 34 lines not shown ...]

test/Transforms/InstMerge/st_sink_no_barrier_load.ll

	; Test to make sure that stores in a diamond get merged with a non barrier load after the store instruction
	; Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	entry:
	  %node.0.in16 = getelementptr inbounds %struct.node, %struct.node* %r, i64 0, i32 2
	  %node.017 = load %struct.node*, %struct.node** %node.0.in16, align 8
	[... 31 lines not shown ...]

test/Transforms/InstMerge/st_sink_no_barrier_store.ll

	; Test to make sure that stores in a diamond get merged with a non barrier store after the store instruction to be sunk
	; Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	entry:
	  %node.0.in16 = getelementptr inbounds %struct.node, %struct.node* %r, i64 0, i32 2
	  %node.017 = load %struct.node*, %struct.node** %node.0.in16, align 8
	[... 30 lines not shown ...]

test/Transforms/InstMerge/st_sink_two_stores.ll

	; Test to make sure that stores in a diamond get merged
	; Stores sunks into the footer.
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	entry:
	  %node.0.in16 = getelementptr inbounds %struct.node, %struct.node* %r, i64 0, i32 2
	[... 36 lines not shown ...]

test/Transforms/InstMerge/st_sink_with_barrier.ll

	; Test to make sure that load from the same address as a store and appears after the store prevents the store from being sunk
	; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -basicaa -use-memoryssa-mldst -mldst-motion -S < %s | FileCheck %s
+	; RUN: opt -aa-pipeline=basic-aa -use-memoryssa-mldst -passes='mldst-motion,verify<memoryssa>' -S < %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	%struct.node = type { i32, %struct.node*, %struct.node*, %struct.node*, i32, i32, i32, i32 }

	; Function Attrs: nounwind uwtable
	define void @sink_store(%struct.node* nocapture %r, i32 %index) {
	entry:
	  %node.0.in16 = getelementptr inbounds %struct.node, %struct.node* %r, i64 0, i32 2
	[... 32 lines not shown ...]

Revision Contents

Diff 62825

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp

test/Transforms/InstMerge/exceptions.ll

test/Transforms/InstMerge/ld_hoist1.ll

test/Transforms/InstMerge/ld_hoist_st_sink.ll

test/Transforms/InstMerge/st_sink_barrier_call.ll

test/Transforms/InstMerge/st_sink_bugfix_22613.ll

test/Transforms/InstMerge/st_sink_no_barrier_call.ll

test/Transforms/InstMerge/st_sink_no_barrier_load.ll

test/Transforms/InstMerge/st_sink_no_barrier_store.ll

test/Transforms/InstMerge/st_sink_two_stores.ll

test/Transforms/InstMerge/st_sink_with_barrier.ll
