Okay, i'll bite
Jun 13 2018
I'm confused ... Is there a reason to not just go fix/enable gvn-hoist?
Jun 12 2018
So, procedure wise, you should address all the review comments made, and if you have, and it was accepted afterwards, that is fine.
(IE you don't need everyone who has ever made a comment to accept it).
Jun 4 2018
So, the goal would be to reduce the maintenance burden and significantly more complex logic you are starting to introduce here about what can be propagated where.
If it doesn't bring benefits, i'd like to understand why.
Jun 1 2018
If we want to start expanding this, i would suggest just using predicateinfo, which wouldn't require worklisting/etc to achieve this.
May 28 2018
So, just curious if we can get rid of the extra parameters you've added.
If you bootstrap with combineMetadata and patchReplacement asserting that the replacement dominates, does the assertion trigger?
If not, we can get rid of the parameters
If so, i'd love to see an example where so it can be analyzed.
May 24 2018
This assertion must hold for all replacements independent of this patch, because it would be invalid SSA otherwise.
I don't think the assertion is worth it given that.
The only case that is not true is when you are replacing into a phi node in a successor block (and there it is legal because it dominates that edge, even if it does not dominate the phi node).
Your assertion also would not catch this case.
This makes me wonder if there should be a real class hierarchy here, but i think this is something we can worry about in the future because it will be a larger change.
May 22 2018
If it turns out to be way too ugly. let's circle back and figure out what
we want to do.
May 21 2018
Sure, you will never find a single flag to control all of this, my goal was to be able to have flags at all to control the various issues.
Right, this is why we haven't done anything yet past taht - i ran out of time to start driving through a real plan :)
One of the reasons i moved Simplify* to take a SimplifyQuery is so that things like this are possible to add without a tremendous amount of work in Simplify*.
Davide beat me to it.
This is, AFAIK, right but your assumption is wrong.
So, actually, this *is* a variant the same issue we have elsewhere with Simplify* (and there are a few newgvn bugs open about it). These actually *are* congruent, we just don't want it simplifying this hard.
Yeah, some of this metadata travels backwards in time (IE if you can prove they are equivalent, the metadata must apply) and some of it is not.
Thanks for looking at this:
IMHO, We should be storing and hashing/comparing the relevant info in the loadexpression (IE store a bit that it's nonnull)
I really do not want to end up with complex business logic in the equals function, that's a thing the callers should be worrying about.
May 18 2018
I may have sowed some confusion here by not updating commit messages as the code changed.
The code here in CVP currently:
- use getBestSimplifyQuery to get the best SQ available (but not require a DT in the NPM)
- use the SQ in SQ.DT->dominates (causing the problem) and SimplifyInstruction(.., SQ)
Yeah, this is a bad idea :)
May 17 2018
So, IMHO, the swapping should not be *determined* as part of the equality function, and that is your underlying problem.
This feels like a workaround instead of fixing that problem.
Simplify handling of the false edge of branch predicates. We can handle this the same way as the true edge with an inverted predicate. This is both more general and simpler than the previous code.
May 16 2018
Ugh, yes, i was focused on the error issuing part and not the hey you have to create real phi nodes part.
Oh looks like Eli beat me by a few minutes :)
- If this is common enough, should we just have a unique_predecessors iterator?
So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.
May 15 2018
Thanks for doing this!
May 14 2018
I think the numbers are good enough to be worth it perf wise.
Did you post compile time numbers?
In a perfect world we should update LICM to preserve all CFG analysis.
For now, this at least makes it correct.
May 11 2018
Looks good to me. You may want to make a unit test for this.
It should only require three blocks.
May 10 2018
This looks pretty obviously right, i'd commit it.
I can verify this is correct :)
May 9 2018
We created an abstraction for the ordered basic block caching that works and is used in a few passes.
May 8 2018
May 7 2018
The solution in IDF should just be changing domtreenodepair to a tuple, with the last item being domtreenode->getDFSNumIn()
Just to make sure I understand you correctly: we need to change pair<DomTreeNode *, unsigned> to tuple<DomTreeNode *, unsigned, unsigned> and use the second and third elements as keys for the priority queue (i.e. we will need to implement a compare function for it, which will look at RootLevel first and at DfsNumber second if RootLevels are the same). Is it what you meant?
IDF does not guarantee that returned basic blocks are sorted in any
May 5 2018
During MemorySSA development, i'm pretty positive we tried some similar types of reverse maps, and while it did relatively okay on "llvm + clang testsuite", it fell down pretty hard in the real world from memory usage.
May 4 2018
A few notes:
- Because there is no single root here reaching all nodes, you can't guarantee the ordering will give a single iteration order for any sort order of the list.
- We are assuming no cycles (you could find sccs here to deal with the cycles, it's the same cost as finding RPO because they are done the same way)
- Off the top of my head, my intuition is that if the list is in dominator tree order (which is an RPO except for siblings) to start, the RPO generated by DFS of the operand tree should avoid the problem mentioned in 1, because it would initially order it the same way it would be ordered in the presence of a single root.
Is it guaranteed that a PHI node, without uses, being inserted in PHIsToRemove only can be used by later created PHI nodes
PHI-nodes are only inserted when we call SSAUpdater.RewriteUse. I did not expected that their use lists change afterwards, so frankly, I'm surprised that inserted PHIs get more uses on later iterations - that's why I'm asking for a simpler test that could help to understand how and why this happens.
Then it might be enough to iterate in reverse insertion order...
Why do we need to reverse the order?
As seen in my earlier post we remove the following PHI nodes when using my patch:%value.lcssa2 = phi i16 [ %value3.lcssa, %exit1.pred ], [ %value, %gazonk ] %value3.lcssa = phi i16 [ %value3, %foo ] %value3.lcssa4 = phi i16 [ undef, %unreachable1 ], [ %value3, %bar ] %value.lcssa.lcssa = phi i16 [ %value3.lcssa4, %exit2.pred ], [ %value.lcssa, %qqq ]
My idea was that if we iterated in reverse order we would remove the use of %value3.lcssa4 before trying to remove the PHI node defining that value.
But as seen above we wouldn't be able to remove the PHI node defining %value3.lcssa instead (as it is used by the first PHI node in the list).
So it will not help by changing the iteration order in case we desire to remove all those four PHI nodes.
Is there an intermediate IR dump you can send me of where you end up with
phis using phis that are useless?
May 3 2018
So, please don't let me stand in the way of reasonable fixes. I only get
~20 minutes a day to look at llvm stuff these days unless i'm on vacation.
Maybe some day it'll be better again, but i suspect it'll be a while ;)
May 2 2018
This looks right.
There is an optimal order to visit and revisit, and if you switch them, you will definitely take an extra N iterations per instruction :)
I'm still not sure this is really as general as it could be, but I guess it's okay.
I think what this patch really wants to ask/do is: "Does this binop simplify with the incoming value of the phi to 1 of the binop operands? If yes, substitute that value into the phi."
If you look at it that way, then it should be some add-on to the existing foldOpIntoPhi() - and that's probably a smaller and more general patch.
That substitution analysis seems to fall into the gray area -- or not gray depending on your viewpoint :) -- of whether this belongs in (New)GVN or instcombine. (cc @fhahn).
May 1 2018
I don't understand at all why this is better/simpler
The assertion is checking the a value is not live-in to the entry block unless it is a global. That assert is correct. From your test case, it looks like you've actually found a place where we recurse on a non-local search we shouldn't have. (i.e. a local add with two constant operands should never trigger the backwards walk.)
Apr 30 2018
Not really intentionally, it just grew this way :-)
Apr 26 2018
Hmm. I don't see anything wrong with the code, but how often will this trigger? instcombine will remove the PHI nodes in your testcase, so this is only a problem if you have a pass which creates PHI cycles like this. I guess GVN does in some cases? Can we avoid doing that?
Yes, this comes from GVN where it temporarily introduces phi nodes like this that are later removed, but which in the meantime prevent PRE from happening because of this inaccurate alias analysis. I'd previously gone for the approach of fixing this in GVN in D44160, but the conclusion there was to do it in alias analysis.
Yeah, i'd really like to understand the value of this before make this code more complex and handle more cases.
Really, as we've discussed many times, BasicAA's complex phi/cycle handling should probably be moved out to a caching forward pass that properly resolves cycles, instead of a lazy backwards pass that tries depth limiting to handle cycles. This was waiting on the new pass manager work, AFAIK (because you couldn't cache it otherwise)
Now that this is done (again, AFAIK), one way to start doing that would be to make PhiAA here, have it handle the cycles you are talking about (it's trivial to compute phi-induced cycles and it's not going to affect compile time in any meaningful way even if you invalidate it a lot. There is code to do it already if you want it, it's <100 lines) and do caching, and fall back to BasicAA.
Then we can slowly move functionality into that, from BasicAA.
If I understand you correctly, you're suggesting adding a new alias analysis pass which just handles phis (but by analyising them forwards, instead of backwards as BasicAA does), yes?
Apr 25 2018
Yeah, i'd really like to understand the value of this before make this code more complex and handle more cases.
Apr 21 2018
Trying to handle the invalid code we allow in unreachable blocks had so far
been a generally losing proposition. AFAIK we've always eventually given up
before getting any pass actually bug free. Can we just stop asking aa
Apr 20 2018
Okay. There is definitely at least 1 cause of value numbers changing after
the main loop completes that is not phi-of-ops related
(it's SCC related)
Apr 19 2018
So, just to put this on the review: the current code on this review definitely not the right fix compared to the patch i sent :)
I think we should commit that split/cleanup (not sure if you did it as part of the last patch.
It sounds like you had time to bootstrap/test it. If so, please feel free to commit it
Apr 18 2018
Sorry for the delay. I've mostly convinced myself this isn't wrong, so let's go with it for now.
Apr 12 2018
GVN should already be able to get this, and if it can't, improve it's intrinsic handling (Either in VNCoercion's memory transfer intrinsic handling, or ...).
This C case is actually not legal (it's been discussed before many times).
You cannot take the address of .a field, add an offset, and get to another field legally.
So i wouldn't express it in these terms, since in that language, this is clearly illegal.
Apr 11 2018
oh phabricator email and it's inability to be consistent about putting emails in comments.
Read the thread on llvm-commits and it'll make more sense :)
Apr 10 2018
(and i have patches to start down this path if anyone is ever interested)
Apr 9 2018
whoops, hit send early.
That said, if you find this actually generates gains (papers on it basically say it should be a few percent improvement in found constants), i don't have any intrinsic reason not to do it just make sure we get the cost vs gain tradeoff right.
So, predicateinfo is one of the fastest passes (or was) that we had. I'm not horribly worried about speed (i could make it faster, too).
It was faster than anything else we do that isn't lazy (IE it's faster than computing dominators, etc).
It wasn't even really measurable except on hundreds of thousands or millions of blocks.
I'm fine with this, we can always improve it more later.
Apr 5 2018
"I know, I also wish we would get rid of MemDep asap, but as long as it's there, it would be good to avoid weird miscompilations that causes calls to vtable."
It's kind of hard for me to know what to do with this.
The cache invalidation is still very very broken with or without this patch.
So this patch is not wrong but everything it does probably needs to be rewritten anyway
Apr 4 2018
This looks right to me (IE if it's wrong, i'm wrong about what will work, it looks implemented correctly)
Apr 3 2018
- Updating D45238: MemCpyOpt does not preserve MemDep, unforunately. See PR 36965, 36964, 36940, 36944, 36943, and 36941
Mar 28 2018
I'm ... confused.
Sorry, again, i don't think this is a thing memdep should be doing.
If GVN can't handle this sanely, that's on GVN.
So i'm not going t approve this.
Mar 26 2018
This isn't correct in the face of atomics and volatiles, and IMHO, MemoryDependence should not be performing this optimization.
If you want, teach GVN to requery MemDep in these cases.
Mar 23 2018
(I do wonder if we shouldn't add a unit test to the MemorySSAUpdater unit tests that just has 100 nested if or something that will take forever with exponential time but be fine with N^2 time, but otherwise, ...)
Mar 22 2018
(a quick scan of things named Map in LLVM doesn't find any other obvious cases of this).
Mar 21 2018
(there is a longer report version that describes the marker variation in detail that is escaping me at the moment. There is also code that implements it in clang).
Mar 20 2018
(but otherwise, if we discover the paper is wrong/etc, this change looks reasnable)
In particular, they should have the same time bounds as us, if both algorithms operate on blocks that each contains 1 instruction.
If the exponential factor comes from repeated walking, they should be just as exponential
Thus, my assumption is we screwed up.
So, staring at https://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf, i don't see how we can get exponential time, and thus wonder if i screwed this up somehow.
I can see we walk blocks to find the defs and they don't have to (because of how they use it), but that should only add a factor of O(N).
Mar 19 2018
+1. Real time updating in the presence of how JT operates right now is too
The lazy analysis it uses require DT be updated at all times, forever
(instead of say, once an iteration).
There is no way to update the domtree 100000 times a pass and make it fast
What we should be targeting is O(10)-O(100) updates.
Mar 16 2018
Yeah, that is why on the other thread i suggested it probably makes sense
to just change how AA is doing it rather than all simplification users
https://reviews.llvm.org/D28934 has a version of the SCC finder you can
borrow if you want (TarjanSCC class)
What you've implemented (or trying to) is really the SCC algorithm from from "Simple and Efficient
Construction of Static Single Assignment Form" at
FWIW: It's also possible to just fix AA without InstSimplify just by
replacing what it uses with something more powerful.
That would avoid disturbing the other users.
Mar 12 2018
In particular, i feel this way because GVN should already get your case
The fact that it doesn't is an issue with AA.
Working around it like this only fixes GVN's ability to optimize this.
There will be plenty of cases something *else* could eliminate code like
that will not be able to
Fixing AA fixes it for everyone (and doesn't require pass-specific
Yeah, my review window didn't update for some reason till just now.
So this looks like a really weird optimization to avoid AA being broken.
As you say, the code it inserts will be fixed up later (and you are taking an O(1) function and making it O(N).
We should just fix AA.
I'm a little worried about the duplicated code here. I feel like you are having to patch way too many points in the same function, and it will be easy to miss one in the future if someone changes the logicc slightly.
Mar 8 2018
It's definitely possible to make the SSAUpdater rewrite faster in bulk insertion. I just never bothered because it wasn't used that way :)
You will never make it as fast as exploiting the problem structure, but it could be made a *lot* faster.
Yeah, this is a generally tricky case to get.
It requires a value based PRE that is going to do a phi of ops transform,
because this testcase is really:
Mar 6 2018
This transform is a variation of some of the other phi of ops vs op of phi transforms we already do, just applied to a very limited case of expressions that would not normally be fully available, but where you've decide the cost is low.
It is also likely to screw up loop induction variables :)
At a minimum, the patch needs bootstrap and testing, i didn't test it except on this case and test/Transforms/NewGVN.