- User Since
- Jul 13 2012, 1:07 PM (266 w, 4 d)
I think this is mostly reasonable at this point.
If you clean up the naming so it matches the style guide (IE you have a lot of things ending in T for no reason, etc), i'll approve it.
These copyright notices definitely should not be here :)
Mon, Aug 21
I don't think there is any ideal here. In the example you give, it reduces the number of bitcasts. In some of the tests you've changed, it adds more.
Thu, Aug 17
This is the same as what i said on D36682 :)
TL;DR This statement is slightyl different, but also has no effect.
It is also a statement that is more or less required by statute/regulation.
The statements are, AFAIK, best viewed as statements of provenance, and are fairly normal for government released works (some are even required by statute)
They do not change or affect the license in any way.
Tue, Aug 15
Mon, Aug 14
Thanks for working on this!
Sat, Aug 12
A few things, just so they end up here:
The NTSCD folks, unfortunately, seem to prove all of the algorithms for weak order dependence are equivalent to those that require with a unique end node. See, http://people.cs.ksu.edu/~tamtoft/Papers/Amt+al:WODvsCD/short.pdf . This means, i believe, outside of the NTSCD algorithms, we won't have a lot of luck with infinite loops.
Fri, Aug 11
Thanks for reviewing this.
This looks reasonable at a glance, note it is likely to be subtly broken in some cases until D35381 goes in.
I've added a few coments, i'll take a deeper look in a little bit.
Thu, Aug 10
The debug counter infrastructure also supports skipping some number of calls at the beginning as well, but that feels like it would generate very odd behavior with InstCombine.
Wed, Aug 9
BTW, if you wish to convince yourself what i said about two identical PDTs not generating the same PDF, here is an example:
I think where i come down on this is: disabled by default , and modify the existing test arguments to enable it, so that you don't have to xfail or modify a ton of tests :)
(then when it's fixed we can just remove the test arguments again).
But i'm also not opposed to doing the test updating.
Same as anything else - lack of time for someone ;)
If you start with a CFG where you require every node go to exit, you don't need to do anything special just when computing postdom, because you just added the previously virtual edges to the actual CFG instead :)
Very right, this phrase is very common (and basically the indication that they did not think about incomplete CFGs at all).
This is why none of the above papers directly applies.
This reasoning is not quite correct:
FIrst, remember that with this patch, we generate a different PDT for your example and mine. That's in fact, part of the point :)
Tue, Aug 8
Mon, Aug 7
(Copying this comment because it ended up on the wrong review thread)
Note: There is also a rewritten DSE that uses PostDom as well: https://reviews.llvm.org/D29624
Thu, Aug 3
EarlyCSE should be able to do this as well, and PredicateInfo will build a form that makes this trivial for anything to do it, in linear time, without any limits necessary.
(By numbers, predicateinfo was the fastest analysis we had that walked the IR at the time it was committed, so ...)
Wed, Aug 2
While I currently do not have any benchmark where optimizing infinite loops is useful (is there one I could have a look?), let's working under the assumption that it actually is. Chandler and Daniel raised the point that just the fact that the PDT contains all nodes makes user code simpler, which is another argument for this kind of modeling. In general, both points together make a good motivation to include all nodes in the PDT. I agree, assuming we understand and are happy with the resulting implications.
This looks fine to me.
I think it may make sense to try to add a few simple cfg drawings to show what machinations are really happening here, but if others undersrtand them just from the text descriptions, awesome.
NewGVN currently has a limitation that davide is fixing around recursive translation, so it won't get it quite yet.
Note also that the usual transform it performs is to remove the redundancy, not to just sink (which was the original usecase for the transform).
If you mainly care about the sinking, it may require a bit of thought. It's also worth noting our dividing line for instcombine was one expressed to me as "local transforms", and this is most certainly not :)
(IE if this is a local transform, i can express PRE in InstCombine, and have it do global PRE).
Tue, Aug 1
Please please don't :)
Doing it the way this codepath does it can be exponential time/space
NewGVN will do the same thing, and in fact, catch all possible cases that can exist here in polynomial time.
Mon, Jul 31
So, this is a subset of an already approved patch, so LGTM.
Thu, Jul 27
I'm going to accept this.
But please give Tobias until early next week to comment, given his past concerns.
(IMHO, if he wants to suggest we do something different, at this point, I believe the onus is on him to implement such a thing and show it works)
Wed, Jul 26
FYI: IDFCalculator can compute the PostDominanceFrontier in linear time.
Tue, Jul 25
Jul 21 2017
FWIW: This is and always has been false accounting in most passes as you know.
Most passes that could use MemorySSA already ask about every load and store in the function.
The only reason people think it's faster to not use MemorySSA, or to avoid the upfront cost, is because our time-passes doesn't account function time to lazy utilities/analysis properly.
If it did, AA/etc time would look *much* larger than it does now, because in reality, it *is* much larger than it seems.
Jul 20 2017
NewGVN, and other things will already get such cases.
I have a bug open for LVI to use predicateinfo, which would make it sensitive to this, as well as make it much faster, but that is a big amount of work.
Jul 18 2017
Jul 17 2017
SSA Updater exists to perform SSA Updates for complicated cases (I also have a patch out to rewrite it, because it's super-slow. The algorithm i gave you is a variant of that patch. See https://reviews.llvm.org/D28934).
(Sorry, this is a work in progress, as you can see from the OldResult stuff that i'm using for verification)
I need this patch for implementation of PR26223.
Note: I don't claim i believe the answers to be 100% correct from a theory perspective, they just match what dominators gives :)
Analyzing more, i'm confused, see the last paragraph.
Jul 14 2017
Yeah, i forgot to say, i'm fine if you just want to fix this as you have already, but if you do, you should leave a note saying it's not optimal or something.
(and feel free to commit once you've update it for all that)
You may want to not treat these as memory at all, as Vedant suggested on list.
Jul 13 2017
This looks right, but can you give an explicit paper reference in the comments of deleteedge?
For the cases that are always true, can you just use PostDominatorTree instead?
You should submit the machinecombiner change separately, but otherwise okay.
Jul 12 2017
Jul 11 2017
I think this is fine for now (the test case actually tests the patch), especially if we are going to move it to use the incremental updater.
It tends to be hard to formulate certain types of testcases right now due to the preconditions of loop-rotate
At some point we should make the DFS iterators more flexible, but i don't think we need to design that for this
Jul 9 2017
Jul 5 2017
the depth_first here should be post_order, i forgot depth_first is preorder
Actually, thinking about this more, i'm confused.
Yeah, this is a very clear case where we should make removeUnreachableBlocks use the new incremental API, and declare victory.
I'm fine with this for the moment, but, as you know, i think we gotta stop thinking "bandaids" :)
Jun 30 2017
I added some simple explanation and references in r306912 and wordsmithed it a bit in r306916.
Please let me know if you think they are sufficient :)
Jun 29 2017
Actually, that won't entirely work without the DFS numbers.
You can shortcut the false cases using the level numbers (IE it's not possible for Inst to have level number 5, an early exit have level number 3, and you dominate it), but you still need to know what part of the dom tree you are in.
Without a reason to do this specifically, it seems a waste of time
Jun 26 2017
Given we haven't found anything fuzzing wise, i think it's fine to land this, but let's watch out for performance regressions and be ready to revert if we need to.
So, just to note: It's generally nonsense to try to calculate the root list prior to DFS (and a source of bugs in the current postdominator tree). By nonsense i mean "impossible".
You need to see what's reachable to know what ends up needing to be connected to a virtual root.
Jun 25 2017
Jun 23 2017
Fine once namespace is renamed
This is fine once the properties are documented.
Can you also change IteratedDominanceFrontier to use this level info instead of computing it on it's own?
Jun 21 2017
I would just name the namespace seminca or something
detail is a random generic name :)
The formatting looks a little odd.
I'm not sure we would name the namespace detail (i'll comment in the patch that adds it)
Otherwise, looks fine.
You should document what the parent and sibling properties are.