- User Since: Jul 13 2012, 1:07 PM (305 w, 5 d)
This makes me wonder if there should be a real class hierarchy here, but i think this is something we can worry about in the future.
Tue, May 22
If it turns out to be way too ugly, let's circle back and figure out what we want to do.
Mon, May 21
Sure, you will never find a single flag to control all of this, my goal was to be able to have flags at all to control the various issues.
Right, this is why we haven't done anything yet past that - i ran out of time to start driving through a real plan :)
One of the reasons i moved Simplify* to take a SimplifyQuery is so that things like this are possible to add without a tremendous amount of work in Simplify*.
Davide beat me to it.
This is, AFAIK, right but your assumption is wrong.
So, actually, this *is* a variant of the same issue we have elsewhere with Simplify* (and there are a few newgvn bugs open about it). These actually *are* congruent, we just don't want it simplifying this hard.
Yeah, some of this metadata travels backwards in time (IE if you can prove they are equivalent, the metadata must apply) and some of it is not.
Thanks for looking at this:
IMHO, we should be storing and hashing/comparing the relevant info in the LoadExpression (IE store a bit that it's nonnull).
I really do not want to end up with complex business logic in the equals function, that's a thing the callers should be worrying about.
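To sketch what i mean (purely illustrative, with hypothetical names - not the real NewGVN LoadExpression): the metadata-derived fact lives as a bit on the expression itself, so it participates in hashing and equality like any other field, and equals stays a dumb comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical, simplified expression: the nonnull fact is stored as a bit,
// so two loads that differ only in that bit hash and compare as different,
// with no special-case logic in the equals function.
struct LoadExpr {
  const void *Pointer; // stand-in for the load's pointer operand
  bool IsNonNull;      // derived from !nonnull metadata, stored explicitly

  bool operator==(const LoadExpr &O) const {
    return Pointer == O.Pointer && IsNonNull == O.IsNonNull;
  }
  std::size_t hash() const {
    // Fold the flag into the hash so the hash agrees with equality.
    return std::hash<const void *>()(Pointer) * 2 + (IsNonNull ? 1 : 0);
  }
};
```

The point is that the caller decides, at expression-creation time, whether the bit is set; equality never has to reason about metadata.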
Fri, May 18
Yeah, this is a bad idea :)
Thu, May 17
So, IMHO, the swapping should not be *determined* as part of the equality function, and that is your underlying problem.
This feels like a workaround instead of fixing that problem.
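A sketch of the alternative (hypothetical types, not the actual patch): do the swap once, when the expression is built, so equality stays a plain field-by-field compare and never has to try both operand orders itself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical commutative expression: the builder canonicalizes operand
// order up front, so operator== never has to consider swapped operands.
struct CommutativeExpr {
  int LHS, RHS; // stand-ins for the value numbers of the two operands

  CommutativeExpr(int A, int B) : LHS(A), RHS(B) {
    if (RHS < LHS)
      std::swap(LHS, RHS); // canonical order decided at construction time
  }
  bool operator==(const CommutativeExpr &O) const {
    return LHS == O.LHS && RHS == O.RHS;
  }
};
```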
Simplify handling of the false edge of branch predicates. We can handle this the same way as the true edge with an inverted predicate. This is both more general and simpler than the previous code.
Wed, May 16
Ugh, yes, i was focused on the error issuing part and not the "hey, you have to create real phi nodes" part.
Oh looks like Eli beat me by a few minutes :)
- If this is common enough, should we just have a unique_predecessors iterator?
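For illustration, a minimal sketch of what such a helper could look like (hypothetical names and a stand-in Block type - the real thing would wrap pred_begin/pred_end): yield each predecessor once, in first-seen order.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Stand-in for a basic block with a (possibly repeated) predecessor list.
struct Block {
  std::vector<Block *> Preds;
};

// Hypothetical unique_predecessors: dedupe while preserving first-seen order.
std::vector<Block *> uniquePredecessors(const Block &BB) {
  std::vector<Block *> Result;
  std::set<Block *> Seen;
  for (Block *P : BB.Preds)
    if (Seen.insert(P).second) // first time we see this predecessor
      Result.push_back(P);
  return Result;
}
```

A real iterator version would dedupe lazily instead of materializing a vector, but the semantics would be the same.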
So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.
Tue, May 15
Thanks for doing this!
Mon, May 14
I think the numbers are good enough to be worth it perf wise.
Did you post compile time numbers?
In a perfect world we should update LICM to preserve all CFG analysis.
For now, this at least makes it correct.
Fri, May 11
Looks good to me. You may want to make a unit test for this.
It should only require three blocks.
Thu, May 10
This looks pretty obviously right, i'd commit it.
I can verify this is correct :)
Wed, May 9
We created an abstraction for the ordered basic block caching that works and is used in a few passes.
Tue, May 8
Mon, May 7
IDF does not guarantee that returned basic blocks are sorted in any particular order.
Sat, May 5
During MemorySSA development, i'm pretty positive we tried some similar types of reverse maps, and while it did relatively okay on "llvm + clang testsuite", it fell down pretty hard in the real world from memory usage.
Fri, May 4
A few notes:
- Because there is no single root here reaching all nodes, you can't guarantee the ordering will give a single iteration order for any sort order of the list.
- We are assuming no cycles (you could find sccs here to deal with the cycles, it's the same cost as finding RPO because they are done the same way)
- Off the top of my head, my intuition is that if the list is in dominator tree order (which is an RPO except for siblings) to start, the RPO generated by DFS of the operand tree should avoid the problem mentioned in 1, because it would initially order it the same way it would be ordered in the presence of a single root.
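To make the multi-root point concrete, here is a small sketch (hypothetical Node type, assuming no cycles per the note above) of an RPO built by DFS over an operand graph: the resulting order depends on the seed order of the root list, which is exactly why starting from dominator-tree order matters.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Stand-in for an instruction with operand edges.
struct Node {
  std::vector<Node *> Operands;
};

static void postOrder(Node *N, std::set<Node *> &Visited,
                      std::vector<Node *> &Out) {
  if (!Visited.insert(N).second)
    return; // already numbered via an earlier root
  for (Node *Op : N->Operands)
    postOrder(Op, Visited, Out);
  Out.push_back(N);
}

// RPO over a forest of roots: because there is no single root reaching
// everything, different seed orders of Roots can yield different RPOs.
std::vector<Node *> rpo(const std::vector<Node *> &Roots) {
  std::set<Node *> Visited;
  std::vector<Node *> Order;
  for (Node *R : Roots)
    postOrder(R, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```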
Is there an intermediate IR dump you can send me of where you end up with phis using phis that are useless?
Thu, May 3
So, please don't let me stand in the way of reasonable fixes. I only get ~20 minutes a day to look at llvm stuff these days unless i'm on vacation. Maybe some day it'll be better again, but i suspect it'll be a while ;)
Wed, May 2
This looks right.
There is an optimal order to visit and revisit, and if you switch them, you will definitely take an extra N iterations per instruction :)
Tue, May 1
I don't understand at all why this is better/simpler
Mon, Apr 30
Not really intentionally, it just grew this way :-)
Thu, Apr 26
Wed, Apr 25
Yeah, i'd really like to understand the value of this before making this code more complex and handling more cases.
Apr 21 2018
Trying to handle the invalid code we allow in unreachable blocks has so far been a generally losing proposition. AFAIK we've always eventually given up before getting any pass actually bug free. Can we just stop asking AA
Apr 20 2018
Okay. There is definitely at least 1 cause of value numbers changing after the main loop completes that is not phi-of-ops related (it's SCC related).
Apr 19 2018
So, just to put this on the review: the current code on this review is definitely not the right fix compared to the patch i sent :)
I think we should commit that split/cleanup (not sure if you did it as part of the last patch).
It sounds like you had time to bootstrap/test it. If so, please feel free to commit it
Apr 18 2018
Sorry for the delay. I've mostly convinced myself this isn't wrong, so let's go with it for now.
Apr 12 2018
GVN should already be able to get this, and if it can't, improve its intrinsic handling (either in VNCoercion's memory transfer intrinsic handling, or ...).
This C case is actually not legal (it's been discussed before many times).
You cannot take the address of the .a field, add an offset, and get to another field legally.
So i wouldn't express it in these terms, since in that language, this is clearly illegal.
Apr 11 2018
oh, phabricator email and its inability to be consistent about putting emails in comments.
Read the thread on llvm-commits and it'll make more sense :)
Apr 10 2018
(and i have patches to start down this path if anyone is ever interested)
Apr 9 2018
whoops, hit send early.
That said, if you find this actually generates gains (papers on it basically say it should be a few percent improvement in found constants), i don't have any intrinsic reason not to do it; just make sure we get the cost vs gain tradeoff right.
So, predicateinfo is one of the fastest passes (or was) that we had. I'm not horribly worried about speed (i could make it faster, too).
It was faster than anything else we do that isn't lazy (IE it's faster than computing dominators, etc).
It wasn't even really measurable except on hundreds of thousands or millions of blocks.
I'm fine with this, we can always improve it more later.
Apr 5 2018
"I know, I also wish we would get rid of MemDep asap, but as long as it's there, it would be good to avoid weird miscompilations that causes calls to vtable."
It's kind of hard for me to know what to do with this.
The cache invalidation is still very very broken with or without this patch.
So this patch is not wrong, but everything it does probably needs to be rewritten anyway.
Apr 4 2018
This looks right to me (IE if it's wrong, i'm wrong about what will work, it looks implemented correctly)
Apr 3 2018
- Updating D45238: MemCpyOpt does not preserve MemDep, unfortunately. See PR 36965, 36964, 36940, 36944, 36943, and 36941
Mar 28 2018
I'm ... confused.
Sorry, again, i don't think this is a thing memdep should be doing.
If GVN can't handle this sanely, that's on GVN.
So i'm not going to approve this.
Mar 26 2018
This isn't correct in the face of atomics and volatiles, and IMHO, MemoryDependence should not be performing this optimization.
If you want, teach GVN to requery MemDep in these cases.
Mar 23 2018
(I do wonder if we shouldn't add a unit test to the MemorySSAUpdater unit tests that just has 100 nested ifs or something that will take forever with exponential time but be fine with N^2 time, but otherwise, ...)
Mar 22 2018
(a quick scan of things named Map in LLVM doesn't find any other obvious cases of this).
Mar 21 2018
(there is a longer report version that describes the marker variation in detail that is escaping me at the moment. There is also code that implements it in clang).
Mar 20 2018
(but otherwise, if we discover the paper is wrong/etc, this change looks reasonable)
In particular, they should have the same time bounds as us, if both algorithms operate on blocks that each contains 1 instruction.
If the exponential factor comes from repeated walking, they should be just as exponential
Thus, my assumption is we screwed up.
So, staring at https://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf, i don't see how we can get exponential time, and thus wonder if i screwed this up somehow.
I can see we walk blocks to find the defs and they don't have to (because of how they use it), but that should only add a factor of O(N).
Mar 19 2018
+1. Real time updating in the presence of how JT operates right now is too
The lazy analyses it uses require DT to be updated at all times, forever (instead of, say, once per iteration).
There is no way to update the domtree 100000 times a pass and make it fast.
What we should be targeting is O(10)-O(100) updates.
Mar 16 2018
Yeah, that is why on the other thread i suggested it probably makes sense to just change how AA is doing it rather than all simplification users.
https://reviews.llvm.org/D28934 has a version of the SCC finder you can borrow if you want (TarjanSCC class).
What you've implemented (or are trying to) is really the SCC algorithm from "Simple and Efficient Construction of Static Single Assignment Form".
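For reference, a minimal sketch of Tarjan's SCC algorithm (this is not the TarjanSCC class from D28934 - integer node ids and a plain adjacency list stand in for real values/instructions):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Classic Tarjan: one DFS, each node gets a discovery index and a lowlink;
// when lowlink == index the node is the root of an SCC and the stack is
// popped down to it.
struct SCCFinder {
  const std::vector<std::vector<int>> &Adj;
  std::vector<int> Index, Low, Stack;
  std::vector<bool> OnStack;
  std::vector<std::vector<int>> SCCs;
  int Counter = 0;

  SCCFinder(const std::vector<std::vector<int>> &A)
      : Adj(A), Index(A.size(), -1), Low(A.size(), -1), OnStack(A.size()) {
    for (int V = 0; V < (int)Adj.size(); ++V)
      if (Index[V] == -1)
        visit(V);
  }

  void visit(int V) {
    Index[V] = Low[V] = Counter++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Adj[V]) {
      if (Index[W] == -1) { // tree edge: recurse, then propagate lowlink
        visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) { // back/cross edge into the current SCC
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] == Index[V]) { // V is the root of an SCC
      std::vector<int> SCC;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        SCC.push_back(W);
      } while (W != V);
      SCCs.push_back(SCC);
    }
  }
};
```

SCCs come out in reverse topological order of the condensation, which is the property the paper's on-the-fly use relies on.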
FWIW: It's also possible to fix AA without InstSimplify, just by replacing what it uses with something more powerful.
That would avoid disturbing the other users.
Mar 12 2018
In particular, i feel this way because GVN should already get your case.
The fact that it doesn't is an issue with AA.
Working around it like this only fixes GVN's ability to optimize this.
There will be plenty of cases where something *else* could eliminate code like that but will not be able to.
Fixing AA fixes it for everyone (and doesn't require pass-specific workarounds).
Yeah, my review window didn't update for some reason till just now.
So this looks like a really weird optimization to avoid AA being broken.
As you say, the code it inserts will be fixed up later (and you are taking an O(1) function and making it O(N)).
We should just fix AA.
I'm a little worried about the duplicated code here. I feel like you are having to patch way too many points in the same function, and it will be easy to miss one in the future if someone changes the logic slightly.
Mar 8 2018
It's definitely possible to make the SSAUpdater rewrite faster in bulk insertion. I just never bothered because it wasn't used that way :)
You will never make it as fast as exploiting the problem structure, but it could be made a *lot* faster.
Yeah, this is a generally tricky case to get.
It requires a value based PRE that is going to do a phi of ops transform, because this testcase is really:
Mar 6 2018
This transform is a variation of some of the other phi of ops vs op of phi transforms we already do, just applied to a very limited case of expressions that would not normally be fully available, but where you've decided the cost is low.
It is also likely to screw up loop induction variables :)
At a minimum, the patch needs bootstrap and testing, i didn't test it except on this case and test/Transforms/NewGVN.
Mar 2 2018
There is also a case 4 here i forgot, which is get unlucky, find something (where all blocks have the same number of preds) and probably generate wrong code :)
So, we are definitely "doing it wrong".
Feb 28 2018
Looks good to me, thanks so much for the updates!
I don't understand how this can happen, since the value phi should be equivalent to an op phi in the same block.
I need to look at this more.
Feb 26 2018
Can you just use update_test_checks to generate the CHECK lines? We are trying to use it for all the newgvn tests.
This is definitely correct.
Note that this also means we are likely leaving around TempToBlock mappings for ops because we don't remove them in removePhiOfOps.
(But that would require reference counting ATM, i believe, because they may be reused).
This shouldn't hurt anything, i believe.
Feb 25 2018
Feb 21 2018
Thanks, that is very helpful. It's much more likely that Dominator info is up to date than loop info, but it's not worth getting into here. The only thing I request is that you update the comments to suggest this probably only makes sense to use if loopinfo is computed already, and please add your description of how it works, with the example, to the comments.
Feb 20 2018
Hey, can you help me understand the algorithm?
I don't have a good feel for the invariants you expect going in.
It just says "it works on anything given loop info".
IE what info do you expect the loopinfo to provide vs the topo ordering?