This is an archive of the discontinued LLVM Phabricator instance.

Update MergedLoadStoreMotion to use MemorySSA
AbandonedPublic

Authored by dberlin on Mar 28 2015, 5:43 PM.

Details

Summary

This updates MergedLoadStoreMotion to use and preserve MemorySSA.
It depends on D7864

Prior to this, the algorithm was O(N^2) for loads and O(N^2 * R) for stores, where N is the number of
instructions in the block and R is the number of removed stores
(it restarted the reverse walk every time it removed a store, due to iterator invalidation).

It is now O(M) for loads (where M is the number of memory instructions in the two blocks),
and O(max(M, S^2)) for stores (because we have no downwards clobbering API yet, the hash table does not help
us determine memory dependence for our uses).

I have deliberately not changed behavior in terms of what loads/stores it will remove or the compile time
controls, in order to make the change minimal.
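The hash-table pairing that makes the load case O(M) can be illustrated with a small standalone sketch (all names here are hypothetical stand-ins, not the patch's actual code): loads from one diamond arm are keyed by their operation shape plus clobbering memory access, then loads from the other arm probe the table in a single pass.

```cpp
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a load: just the fields that decide whether two
// loads are merge candidates -- same operation "shape" and same clobbering
// memory access (per MemorySSA).
struct Load {
  std::string Type;   // loaded type, e.g. "i32"
  unsigned ClobberID; // ID of the clobbering (defining) memory access
  unsigned PtrID;     // value ID of the pointer operand
};

using Key = std::tuple<std::string, unsigned, unsigned>;

struct KeyHash {
  size_t operator()(const Key &K) const {
    // Combine the fields, analogous in spirit to llvm::hash_combine.
    size_t H = std::hash<std::string>()(std::get<0>(K));
    H = H * 31u + std::get<1>(K);
    H = H * 31u + std::get<2>(K);
    return H;
  }
};

// One pass over each arm of the diamond: O(M) in the number of memory
// instructions, instead of comparing every load in one block against
// every load in the other.
std::vector<std::pair<int, int>>
pairMergeableLoads(const std::vector<Load> &Arm0,
                   const std::vector<Load> &Arm1) {
  std::unordered_map<Key, int, KeyHash> Table;
  for (int I = 0; I != (int)Arm0.size(); ++I)
    Table.emplace(
        std::make_tuple(Arm0[I].Type, Arm0[I].ClobberID, Arm0[I].PtrID), I);
  std::vector<std::pair<int, int>> Pairs;
  for (int J = 0; J != (int)Arm1.size(); ++J) {
    auto It = Table.find(
        std::make_tuple(Arm1[J].Type, Arm1[J].ClobberID, Arm1[J].PtrID));
    if (It != Table.end())
      Pairs.emplace_back(It->second, J); // candidates for hoisting to the head
  }
  return Pairs;
}
```

The stores stay O(S^2) in the patch because, without a downwards clobbering API, a matching hash key alone cannot prove downward memory dependence.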

(Hopefully Phabricator will not screw up a revision that is a branch of a branch;
arc diff --preview showed the right stuff.)

Diff Detail

Event Timeline

dberlin retitled this revision from to Update MergedLoadStoreMotion to use MemorySSA.
dberlin updated this object.
dberlin edited the test plan for this revision. (Show Details)
dberlin added reviewers: hfinkel, reames, nlewycky.
dberlin added a subscriber: Unknown Object (MLST).

Delete dead functions
Trigger an update so phabricator sends out this review, since it got eaten during sendgrid issues

Update for MemorySSA API Update

reames edited edge metadata. Apr 9 2015, 9:10 AM

FYI, I'm basically reviewing this as a new algorithm. Trying to match it up with what's there before doesn't seem particularly worth doing given the algorithmic problems you commented on.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
106

Can you use lambdas for these? I'm not entirely sure that works, but if it does, the code would be cleaner.

Also, I'm surprised that a generic instruction hash function doesn't already exist. What does GVN use for this?

117

Is the extra A == B check needed? I would expect that to be inside isSameOperationAs.

391

Not quite clear what you're trying to say with this comment? The code appears to just be updating MemorySSA for the transform performed.

582

I'm a bit confused here. If we found the set of load instructions which share the generation number at the start of a block and we're merging two blocks with a single shared successor, why do we need any further legality checks? Aren't all the loads guaranteed to have the generation number at the end of the common block?

Are you possibly trying to do two generations at once here? Or something else more complex?

685

Are you guaranteed there's not another phi here? Not entirely sure what this is doing. It may be completely correct, but it looks suspicious when compared to instruction insertion into a basic block.

718

Given this is a linear traversal of Pred0, does the reserve really gain you anything here?

727

Neat!

727–728

Ok, I'm again a bit confused by the algorithm. :)

What I expected to see was something along the following:

  • Find store in each block which is a candidate for merge (using memory-def edges from MemoryPhi)
  • Merge any loads in those basic blocks which use the generation if possible.
  • Search downward from each store to find a conflicting memory access or end of block. (This could be done either by walking instructions or by walking mem-use edges.)
  • If both reach end of block, merge.
  • Repeat previous steps with newly created MemoryPhi for earlier stores if any.

During building, MemorySSA currently optimizes uses to the nearest
dominating clobbering access for you (done at the suggestion of two
people). We could make it stop doing this (it's just a flag), but the
tradeoff is that most things end up walking more in practice.
Additionally, as things optimize and you update MemorySSA, you will
often end up with the *same result* anyway. The only way to stop that
is to require "nearest dominating MemoryDef" instead of "dominating
MemoryDef" when updating. We do this in the APIs where we do insertion
(addNewMemoryUse), but we don't check/verify it in
replaceMemoryAccess-style APIs; we only check standard domination.

The other tradeoff, btw, is that while doing MemoryUse optimization
while building makes this particular algorithm slightly harder, it
makes an (IMHO) much more useful thing a lot easier.

A lot of passes (GVN, memcpyopt, etc) want to know whether they can
replace a load with a load to eliminate something:

1 = MemoryDef(liveOnEntry)
store a
2 = MemoryDef(1)
store b
3 = MemoryDef(2)
store c
4 = MemoryDef(3)
store d
MemoryUse(1)
load a, 4 bytes
MemoryUse(1)
load a, 8 bytes
(or different types, or whatever)

By optimizing MemoryUses during building, we are guaranteed that if
we do getMemoryAccess(load a) -> definition (which is store a) -> uses,
we now have all loads that actually use *that* store's value. This
means we can look at the other loads of that store's value and see if
we can reuse them.
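A toy illustration of that payoff (these structs are illustrative stand-ins, not MemorySSA's real classes): because each load's defining access was optimized to its true clobber at build time, finding every other load that sees the same store's value is just a walk of one def's use list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a MemoryDef whose uses were optimized at build time:
// every use hanging off this def actually reads the value this store
// produced (no-alias defs in between were skipped over).
struct MemDef {
  std::string StoredPtr;   // what the store wrote to, e.g. "a"
  std::vector<int> UseIDs; // loads whose optimized defining access is this def
};

// All other loads that see the same stored value as LoadID: candidates
// for load-load forwarding, found without walking the whole program.
std::vector<int> forwardingCandidates(const std::vector<MemDef> &Defs,
                                      size_t DefID, int LoadID) {
  std::vector<int> Others;
  for (int U : Defs[DefID].UseIDs)
    if (U != LoadID)
      Others.push_back(U);
  return Others;
}
```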

Doing this without use optimization would be much harder: you'd need
a downwards API from the store to get all possible real uses, which
may encompass walking the entire program.

On the other hand, the tradeoff of doing MemoryUse optimization is
that it means to get all the loads in a block, you really have to ask
for all the loads in the block, instead of trying to hope you have
chains that give it to you :)

dberlin edited edge metadata.
  • Update for new insertion API. Clean up comments a bit
  • Update to walk stores

This now uses an immediate-use walker rather than the alias API to determine downward exposure of stores

reames requested changes to this revision. Oct 8 2015, 10:28 AM
reames edited edge metadata.

Waiting for update or possibly submission. Have lost track.

This revision now requires changes to proceed. Oct 8 2015, 10:28 AM

I have a large update coming to this, was just trying to make a sane update
API (the rest is easy).

dberlin edited edge metadata.

Update to work in progress
This will work for loads, stores coming up.

This also includes the memoryssa update API, which will be split out and committed separately

dberlin edited edge metadata.
  • Add support for stores. Code generated is correct but updating of memoryssa is wrong and will be fixed
  • Add verification for completeness of PHI nodes
  • Add basic PHI creation API
  • Update MemorySSA for store sinking

This version should correctly update memoryssa for store sinking :)
(I checked that it does on the LLVM testcases; I'll test it further as I
split out the MemorySSA part)

dberlin added inline comments. Jun 15 2016, 12:10 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
106

To answer the original question, I'm checking to see about lambdas.

No generic Instruction hasher exists that covers precisely the set of stuff we care about. GVN does what NewGVN does: it defines its own expression class, shoves operands into those classes, and calls hash_combine on the whole shebang:

friend hash_code hash_value(const Expression &Value) {
  return hash_combine(
      Value.opcode, Value.type,
      hash_combine_range(Value.varargs.begin(), Value.varargs.end()));
}

The default hasher will end up hashing the pointer values themselves, which is what GVN wanted in this case.

It is not useful, however, for hashing "parts of operations you care about" or "do two operations look pretty similar".

Past that, I did not change the things included in the hash, to keep what is optimized consistent with the existing code.
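For reference, here is a standalone analog of that scheme, with a simple mixing function standing in for llvm::hash_combine (the Expression layout below is a sketch, not GVN's actual class):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of a GVN-style expression: opcode, type, and operand value numbers.
struct Expression {
  unsigned Opcode;
  unsigned TypeID;
  std::vector<unsigned> VarArgs; // operand value numbers
};

// Poor man's hash_combine: fold one value into a running seed.
inline size_t combine(size_t Seed, size_t V) {
  return Seed ^ (V + 0x9e3779b97f4a7c15ULL + (Seed << 6) + (Seed >> 2));
}

// Equivalent in spirit to
// hash_combine(opcode, type, hash_combine_range(varargs...)).
size_t hashExpression(const Expression &E) {
  size_t H = combine(std::hash<unsigned>()(E.Opcode),
                     std::hash<unsigned>()(E.TypeID));
  for (unsigned A : E.VarArgs)
    H = combine(H, std::hash<unsigned>()(A));
  return H;
}
```

Structurally identical expressions hash equally, so they land in the same bucket; pointer identity of the underlying instructions never enters the hash.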

117

It is not; it goes straight into comparing all sorts of fields/data.

gberry edited edge metadata. Jun 16 2016, 11:52 AM

A couple of minor comments I noticed when reading through this for my own edification:

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
753

It seems like you forgot to finish this sentence.

1113

typo: "MemoryDefwith"

1125

typo: "replaceALlUsesWith"

dberlin marked 3 inline comments as done. Jun 16 2016, 4:20 PM
dberlin edited edge metadata.
  • Update MergedLoadStoreMotion to optionally use MemorySSA
  • Add support for stores. Code generated is correct but updating of memoryssa is wrong
  • Add verification for completeness of PHI nodes
  • Add basic PHI creation API
  • Update MemorySSA for store sinking
  • Fix typos

A few comments in passing.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
97

Should we try and unique the name a bit more (e.g., "use-memssa-mlsm") under the assumption that other passes will have a command-line option to opt into using MemorySSA? Alternatively, we could include the option in the PassManagerBuilder and use this single flag to control the use of MemorySSA for all passes during the transition period. Just my 2c.

180

Perhaps I missed something, but why is TargetLibraryInfo required?

186

Should the DominatorTree be added as preserved?

680

No need for extra curly brackets.

844

but -> put

859

Pre-increment here?

866

Any reason we don't just increment in the for statement itself?

871

Please add a period.

955

Please add a period.

dberlin marked 10 inline comments as done. Jun 17 2016, 8:26 AM
dberlin added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
97

I'll rename it to use-memoryssa-mldst (or whatever the shortname for this pass is).

A single flag would be nice, but it would make it much harder to turn MemorySSA on and off for each pass, and the point of the flag is to be able to test/debug regressions until we are satisfied and get rid of the non-MemorySSA version.

(ala what happened with SROA, etc and the ssaupdater)

180

It is not, this is a merge error.

186

This is not necessary; setPreservesCFG will preserve it, along with all other analyses marked CFG-only.

859

Not sure what line this is supposed to go with; I can't find a post-increment around :)

866

Fixed. Originally, the updater was modifying the access list we were walking and invalidating the iterator.
It turns out to be non-trivial to change this, but I never moved this loop back to account for that.

dberlin added inline comments. Jun 17 2016, 8:41 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
866

Actually, it can't be done; we still invalidate the current iterator for loads.
I'll document this.

Update for requested changes

mcrosier added inline comments. Jun 17 2016, 8:54 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
97

I agree we should have a flag per pass to simplify debugging.

859

Sorry if I wasn't clear. I was suggesting we remove the pre-increment above and do it in the if statement.

++NLoads;
if (NLoads * Size1 ...)

vs

if (++NLoads * Size1 ...)
dberlin marked an inline comment as done.

Rebase for passmanager conversion

It looks like this change still contains all of the MemorySSA changes that were split out into D21463

Properly rebase on top of other patch

Should be fixed now.

Would it make sense to split out the DFS numbering code motion barrier code into a separate change?

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
1030

Typo: "explosed".
Also, there is no 'End' parameter as referenced in the comment.

Rebase after dependent revision committed

Also run verification pass with new pass manager for tests.

Any other comments?
If not, because it's off by default (and it looked like there are no real objections), my plan was to commit it.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
859

Oh, sorry. I copied this code from the non-memoryssa version.
(In truth, the code can stand a lot of cleaning up, improvement, and reducing of restrictions)

I'm still digesting this, but here are a couple more nits I noticed.
I don't want to hold up approval from anyone else though.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
814

Isn't MemorySSA only preserved if 'UseMemorySSA' is true?

1035

getDFSNumIn/getDFSNumOut are documented as: These are an internal implementation detail, do not call them.
which seems to conflict with this usage.

dberlin added inline comments. Jun 22 2016, 12:54 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
814

Yes, fixed.

1035

I will change the docs for them in a separate patch.

The change to make updateDFSNumbers public, etc., was done specifically to enable this usage.
See the thread titled "[PATCH] Make updateDFSNumbers API public" from last year; there was also a review where I computed them separately, and it was suggested to just make it public.

(and happy to wait if you want to review it)

Drive-by comments about comments (insert "we need to go deeper" meme here)

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
844

Potentially unfinished thought?

850

Did you mean "only move simple (non-atomic, non-volatile)"?

  • Update for review comments on comments
dberlin added inline comments. Jun 22 2016, 1:40 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
1035

This is now fixed in the docs for those functions

Gerolf added a subscriber: Gerolf. Jun 22 2016, 7:07 PM
Gerolf added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
327

Loc0/Loc1 should be moved below the conditional; then they are only computed when they are definitely needed.

376

This is an MSSA update problem. How about
if (MSSA && MSSA->update(HoistCan, HoistedInst))

MSSA->removeMemoryAccess(ElseInst);
680

You could remove the conditional here and simply return from updateMemorySSA when MSSA is null.

753

How about: Visit blocks in DFS order and number instructions. For each block track first hoist barrier and last sink barrier.

815

I think this is the only place where you need UseMemorySSA. All other instances can be handled by if (MSSA).

1085

Shouldn't updates be owned by MSSA rather than each client?

dberlin marked 10 inline comments as done. Jun 22 2016, 11:54 PM
dberlin added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
376

As mentioned, if you look early enough in the thread on this, you can see that at one point it was, in fact, that API, and I was explicitly asked to make simpler APIs instead of these kinds of composed ones :)

The way it is now is also consistent with how we do updates for everything else in LLVM, both SSA-related (where you do exactly what you see below) and analysis-related.
(I.e., memdep has you call invalidateCachedPointerInfo and such, and dominators requires you to set and redirect dominators on your own.)

In fact, if MemorySSA were part of the IR, there would be no difference between the way this code looks and the way the rest of the function (which is updating the IR) looks. Because it's a side data structure, the naming is a little different and we haven't built all the same utilities yet.

Additionally, the kind of API you suggest above would have wildly varying complexity and is overkill vs. building utilities that handle the common cases, which is what we do elsewhere for SSA/dom/etc. updates.

I'm more than happy to commit to building those utilities as we come across common cases.

1085

(replying here since phab is not good at processing email like this):

That would be inconsistent with what we do in the rest of LLVM :)
MemorySSA is an SSA form, and like SSA in LLVM, passes are responsible for keeping SSA up to date. This is also true of most analysis preservation.
You can see that splitCriticalEdge takes a memdep argument, for example, and will update memdep instead of memdep updating itself.
Same with dominators. Passes that want to preserve Dom are responsible for redirecting the dominators where they should go. They can screw it up, too!

Doing general updates (i.e., "I have no idea what happened; you go find out and fix it") is also expensive and, at least for every pass I've converted so far, completely unnecessary.

I also had more functional updaters (i.e., replaceAccessWithHoistedAccess) in MemorySSA at earlier points, but if you look earlier in the review, you'll see others wanted simple functionality instead.
In retrospect, I agree.
We can build a general updater if we need it, and we can build utilities to handle common updates if we need them. So far, it hasn't been necessary.

At some point, the better question is really "why is memoryssa a side data structure instead of just part of the normal IR". But that's a question for another day.

  • Update for review comments
dberlin added inline comments. Jun 23 2016, 10:19 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
1085

(And, just to point out: the update algorithm we use is specific to diamonds and would not work for more general control flow. It also would not work without the legality testing, etc., that we perform elsewhere in the code.)

I'll take a more in-depth look into the load and store merge routines as well. At first glance it seems one could just add a few MSSA hooks rather than copy-paste-modify the code base.
I'm also taking a (late :-() look at the core MSSA design. Could you outline the high-level MSSA design in more detail than http://reviews.llvm.org/D7864? E.g., how does it compare/relate to http://www.airs.com/dnovillo/Papers/mem-ssa.pdf? Thanks!

-Gerolf

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
376

Ok, it would probably not be possible to foresee all use cases. I certainly agree with the iterative approach you propose. I'm surprised, though, that a simple replacement of a load has not come up elsewhere.

1085

What got me here is that I expected the update to be simple. Why is it more involved here than replacing two MemoryDefs with one and possibly removing a MemoryPhi? From the client's perspective, updates like this look too complicated to get right. Maybe there is a verifier in place, but in the code I don't see a single assertion that could help debugging should something go wrong. Perhaps we can come up with some utility routines that make it simpler to grasp: when two or more memory operations get moved, this is the blueprint for updating MSSA and guaranteeing it stays consistent.

1152

There could be a lambda or function that takes care of the two loops.

Also, just to give you some idea of general update complexity: if you don't
bound the problem to some common cases, it's simply not possible to make
it faster than current creation (i.e., it would be simpler/easier to not
preserve MemorySSA in those cases).

For the common cases, assuming you tell us which variables were removed and
which variables were added in which blocks:

  1. If you only removed and replaced variables (i.e., literally no code
motion or insertion), it requires the renaming pass only. It could be
further optimized to not redo use-chain optimization for MemoryUses if you
can guarantee you only touched MemoryUses. For MemoryDefs, if you could
guarantee (and you probably can't) that none of the AA relationships have
changed, you would not have to redo use-chain optimization there either.
You can't guarantee this because BasicAA et al. have walkers with hard
limits in them, so you may get better results simply by reducing the length
of a chain of things they had to walk, even if there was, in fact, no
aliasing either way.

  2. If the CFG has not changed and you have added/removed variables: if
you added MemoryUses, and we stay with pruned SSA, we have to recompute the
IDF. We could minimize the size we calculate by tracking some things
related to where the last uses in the program are relative to the iterated
dominance frontier; it's complicated, however.
After recomputing the IDF, we also have to redo renaming. For inserted
MemoryUses, if we tracked the first/last version of each block (and updated
it through the creation/removal APIs), we would only have to rename the
parts of the dom tree where the first/last version changes.

(I.e., if we changed the first version in a block due to phi node
insertion, but the last version is the same, we do not have to rename
children of that block; if we changed the last version but not the first
version, we only have to rename children but not the current block.)
For inserting stores, it's something close to this but a bit more
complicated.

  3. If the CFG has changed, then without tracking a lot more under the
covers related to merge/join sets, it's not possible to make it faster than
doing it from scratch.

The TL;DR of all this is that a general update mechanism is very hard to
make faster than from-scratch construction without giving it all kinds of
specific info, and usually figuring out that info is as hard as figuring
out the update algorithms, in my experience.

gberry added a comment. Jul 5 2016, 8:48 AM

Sorry if this isn't the right place, but I have a general question about what it means to preserve MemorySSA, specifically regarding the defining accesses of MemoryUse nodes. Is the idea here that we make a best effort to keep the MemoryUse defining access links optimized (i.e. never pointing to a no-alias def)? Because of the limits of basic-aa, it isn't possible to guarantee this property after any code transformation, even in the limited case here, since the alias results for completely unrelated load/stores may have been affected, right?

Gerolf added inline comments. Jul 5 2016, 8:42 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
862

The code from here on could be a function. I prefer a function that fits on my screen.

885

Why do you need this? When the first condition is false the while loop does not execute. But then SafeToLoadUnconditionally is not needed anyway

901

Within a block I thought the DFS numbers are
0
1
2
...
So I would expect lookup(load) < BBHoistBarrier. I'm probably wrong about the DFS numbers.

903

Load1 is loop invariant and the continue condition can be computed in the header.

915

That comment does not look right. Load0 and load1 just got hoisted and LookupIter->first could still be Load0.

Also, not hoisting above loads that had not been hoisted is too restrictive. With better/cheaper DA this optimization can be more aggressive. Perhaps this could be a FIXME.

dberlin added inline comments. Jul 5 2016, 10:45 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
862

Fixed

885

It's moved here because it's loop-invariant. That's also why the check matches the loop check, to avoid calculating it if the loop won't execute :)
Otherwise, we would calculate it on every loop iteration, when it doesn't change.

901

You are thinking about it backwards, I think.
The DFS numbers are indeed as you list them.

If the block looks like this:

0
1 - load barrier
2 - load

We can't hoist.

If it's
0
1 - load
2 - load barrier

We can hoist.

So if DFS(load) > DFS(load barrier), that means the load barrier is *above* us in the block (and we come *after* the load barrier), and we can't hoist past it.
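Reduced to code, the rule above is a single comparison over per-block DFS numbers (function and parameter names here are illustrative, not the patch's):

```cpp
// Smaller DFS number == earlier in the block. A load may be hoisted out of
// the block only if it appears before the first hoist barrier, i.e. only if
// DFS(load) < DFS(first hoist barrier). If DFS(load) > DFS(barrier), the
// barrier is *above* the load and we cannot hoist past it.
bool canHoist(unsigned LoadDFS, unsigned FirstHoistBarrierDFS) {
  return LoadDFS < FirstHoistBarrierDFS;
}
```

This matches both examples: load at 2 with barrier at 1 cannot hoist; load at 1 with barrier at 2 can.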

903

Fixed

915
  1. Yes, I'll fix.
  2. This code was copied from mergeLoads originally. It is indeed too restrictive (there are a lot of random restrictions we could relax). I'll add a FIXME.

Update for review comments

A few more comments/questions below. I still have a few parts I haven't fully grasped yet, but don't let me hold up approval.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
92

vector doesn't appear to be used?

108

Can't you just compute these instruction indices for the diamond then/else blocks lazily? The same goes for LastSinkBarrier, which wouldn't even need a map in that case.

826

Maybe check Load1->isSimple() here? Maybe factor out all these checks into a function and use it for the Load0 hash table insertion as well?

854

Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.

1052

Is there a reason you're using a queue for this work-list instead of the more common SmallVector?

dberlin marked an inline comment as done. Jul 6 2016, 5:33 PM

I'll be honest: I think we are going a bit overboard for something that is off by default and is meant to be an identical algorithm to an existing pass that we know can stand improvement.

While I am all for trying to get code to be as good as possible, I don't think we need to make it completely perfect on the first pass when it's easily subject to incremental improvements :)

I.e., I think there is some point that is "good enough" to start, particularly for things that we can improve before we flip them on as the default.

For all we know, changes we have to make to account for converting the other passes are going to make us change code here anyway.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
92

fixed

108

Yes, you can compute them once on demand for the blocks, but it takes nearly zero time right now, so it did not seem worth building something to do that for a first pass :)
Remember, this is supposed to be an NFC conversion to start, and building the barrier maps already converts what was O(N^2) before to O(N).

826

Let us please not do that in this patch.
Doing that would change this algorithm to catch a different set of loads than the non-memoryssa version.
I would like to keep the algorithms identical in what they catch for the moment, so debugging is easy.

Eventually, when the mssa version is the default and the non-mssa version killed, we can improve it to be whatever we want.

854

If that were the case, all the loads in that block would be equal (because they also passed isSameOperation, etc.), and GVN should have removed them already :)

Note that neither the MSSA version nor the non-MSSA version will handle the case of two identical loads on one side of the diamond and one load on the other. It will only hoist/merge one of the identical ones.

1052

We push back and pull front because that exploration order guarantees we terminate at the earliest possible point.

If we popped off the back, it would be an order that would explore lower things before higher things.
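In other words, it's the FIFO/LIFO distinction; a generic self-contained sketch (std::deque and the graph shape here are illustrative, not necessarily what the patch uses):

```cpp
#include <deque>
#include <vector>

// push_back + pop_front visits a toy successor graph in breadth-first
// (level) order, so nearer nodes are examined before deeper ones and a
// downward walk can stop at the earliest conflicting access. Popping off
// the back instead would turn this into a depth-first order that explores
// lower things before higher things.
std::vector<int> bfsOrder(const std::vector<std::vector<int>> &Succ,
                          int Start) {
  std::vector<int> Order;
  std::vector<bool> Seen(Succ.size(), false);
  std::deque<int> Work;
  Work.push_back(Start);
  Seen[Start] = true;
  while (!Work.empty()) {
    int N = Work.front();
    Work.pop_front(); // FIFO: earliest-discovered node first
    Order.push_back(N);
    for (int S : Succ[N])
      if (!Seen[S]) {
        Seen[S] = true;
        Work.push_back(S);
      }
  }
  return Order;
}
```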

Gerolf added inline comments. Jul 6 2016, 8:11 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
885

Ok, I assumed the while loop is usually on the hot path. Then the code checks (part of) the while condition twice.

901

Hm, it looks alright today :-). Thanks.

944

Footer -> Tail in error message

962

There is similar code in mergeLoads. That could be a utility function that returns false when there is no memory access in at least one of the blocks, e.g. static bool hasMemoryAccess(BasicBlock *B1, BasicBlock *B2).

984

Should be if (++NStores) to be consistent with your code in mergeLoads. Or change the code for NLoads there.

988

Could be a separate function.

1005

Please add a comment about what the equal stores are.

1009

Since it is sinking stores, would reversing the loop traversal (from second to first) speed up the code?

1034

A comment would be nice why the accesses get cleared here. Or would that fit better in sinkStores?

1043

Typo: explosed

1058

What is a good invariant for this loop? Like there shouldn't be more checks than memory references in the code that contains Start? A checked invariant would make me more comfortable with this code.

1074

That looks very wrong. At the minimum this is a convoluted equality test.

1107

I continue struggling with this function. For now I just need some help understanding the comment.
When you say "above" S0, S1 you refer to DFS numbering like Sx is above S0 in this example:

0 Sx
1
2 S0

Correct? This would be consistent with your DFS explanation in a previous comment.

1109

What is the "top of the diamond"? The head?

1112

What is the "sink location block"? The Tail?

1115

What is "the block"? Perhaps "uses in the tail and/or below" describes it better.

1116

Would it make sense to clearly distinguish phi and memory phi (mphi) nodes?

dberlin abandoned this revision. Jul 6 2016, 9:21 PM
dberlin marked an inline comment as done.
gberry added a comment. Jul 8 2016, 2:00 PM

I'm personally fine with committing this as is, since it is disabled, though I don't feel I have the authority to approve it.

I do have a couple more comments/questions below, mainly to help me gain confidence that my understanding of MemorySSA is correct.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
854

Is that right (the bit about all the loads being equal)?

I'm thinking of a case like the below where all of the loads pass isSameOperation and all have the call as their clobbering access. In this case you'll end up comparing all pairs of loads (assuming none of the loads must alias).

call foo
br %c, %then, %else

then:
load i32, %p1
load i32, %p2
load i32, %p3
...
br %end

else:
load i32, %p4
load i32, %p5
load i32, %p6
...
br %end

end:

1066

Do you even need the domtree info here to check for barriers in a diamond? There are references to hammocks, but as far as I can tell we never actually attempt to optimize hammocks. If that is the case, couldn't this just check for barriers in the same block as Start?

1107

I think there is a bug in the case 1 scenario if there is a pre-existing phi in the bottom of the diamond that references just one of the sunk stores.
For example:

; 1 = MemDef ...
call foo
br %c %then, %else

then:
; 2 = MemDef(1)
store @A
; 3 = MemDef(2)
store @B
br %end

else:
; 4 = MemDef(1)
store @A
br %end

end:
5 = MemPhi(3, 4)

I believe in this case, after updating the end block will look like:

end:
5 = MemPhi(3, 6)
; 6 = MemDef(1)
store @A

Which seems wrong, since the phi is before its use; but it also seems like having 6's defining access skip over the phi 5 and def 3 could cause trouble, though I'm less sure about the latter.

Let me clarify my last comment, since I hit send too early. This was a bug
I already fixed. Case 1 should also be checking that there are no stores
below; if there are, it should be doing case 2. In essence, the check for
case 2 should happen before the check for case 1 (since it's not possible
to have stores below without a phi node existing).

This will cause your phi node to get changed to memphi(3, 1) and do the
right thing.

I have a testcase for this.

davide added a subscriber: davide. Dec 26 2016, 7:02 AM

Hrrrm, are there any real showstoppers to getting this one in? Maybe Dan has no more time to work on it, but this patch already constitutes a strong basis on which we can iterate, and it's also disabled by default.
@Gerolf, any opinions?