- User Since
- Jul 13 2012, 1:07 PM (282 w, 5 d)
Wed, Nov 29
Mon, Nov 27
So, what Sanjoy has pointed out is that canonicalization like this to select, breaks the ability to remove redundant loads and stores and do PRE, compared to the equivalent control flow version, in general.
Sun, Nov 26
First, to bikeshed this horribly:
Tue, Nov 14
Nov 10 2017
Yeah, I filed a bug about this when we started the first review, its good to have a real testcasr
Nov 9 2017
Errr, outside of an irrelevant store, it generates literally the same output and jump threading result as: https://reviews.llvm.org/rL317768
Nov 8 2017
That's fine too.
Can you please fix LoadPRE as well?
ConstructSSALoadSet needs to be patching instructions it materializes if they come from an existing value.
In load PRE, it is replacing the value of a load with a phi of other values.
Your patch has convinced me that GVN is probably generally broken here, but just happens to not value number or create new phi nodes except for PRE, so it doesn't generate a lot of instances of this issue.
So yeah, i'm fine with the patch in general
Nov 6 2017
I think it would be a lot easier to understand your example if you could produce IR that, when run through -gvn -enable-pre -jump-threading, produces the wrong result without this patch.
TL; DR I'm .. confused.
Oct 31 2017
I'll repeat what i said on the bug
FWIW: Given the choice, i would much rather fix the underlying problem then patch various passes to work around some small subset of it (here, load of selected addresses)
This is underway right now, actually.
Oct 29 2017
Oct 27 2017
This seems very similar to what n-ary reassociate wants to do, I'd like to understand why we shouldn't do it there?
(where it seems to fit perfectly)
Oct 26 2017
Oct 25 2017
Oct 23 2017
This seems reasonable to me,
I'd just add a comment to explain why it's a small set vector (so nobody breaks this future)
Oct 22 2017
So, in the original test and code it's hard to tell, but yes, if it's trying to undo the propagation of constants into phi nodes, it's already going to be fighting, because everything else is trying to do that transform specifically :)
GVN/NewGVN/et al will undo this transform, and propagate constants. In fact, i expect pretty much any pass that can, will.
Oct 20 2017
I'll add a unit test.
- Update for review comments
Oct 18 2017
I haven't heard any better approaches yet, but i'll give a week before committing this.
Oct 17 2017
"Even if we mark for deletion, we should update the map immediately to be able to hoist across the marked instruction. Marking alone does not solve the problem, it will still need the same code that also updates the map. If marking is only needed for uniformity of GVN code, this can be done separately from fixing the actual bug."
The instructions will be deleted and the map updated as soon as we return from this function up the call stack.
It will not prevent further anything, because
- the hoisting has already happened (and all the functions in this stack are going to return and then the next thing that will happen is we will delete the dead instructions)
- The marked instruction could not have blocked hoisting anyway, or else it would not have been safe to PRE.
Both of the deletions are in PRE, after it has been successful. If either the inserted or the deleted instructions were implicit control flow, PRE should have been unsafe. It's probably worth noting this in a comment
Oct 16 2017
Oct 13 2017
I attached a patch for this to PR 34937.
I don't have time ATM to perform testing and shepherding of it through review, however.
This is use after free.
When the first implicit control flow instruction for a bb gets erased, we now have wrong info.
Oct 12 2017
Oct 11 2017
I do not plan to address these at the moment, because the order is consistent in the pass and doesn't affect correctness or what gets optimized.
If someone else wants to try to make everything reverse order clean, they are welcome to, it's not a priority for me.
Yes, this is known. The original "fix" this reverts is clearly broken. There is a discussion on the original revert thread about this and ways to fix it properly.
Oct 10 2017
This actually sounds like a great idea, IMHO.
Let's try it and see how well it works in practice.
It certainly would address my current concerns.
Okay if you address the remaining comments.
Suggesting a slightly different fix.
I suspect at least one of your bugs is that the param state lattice doesn't get updated when the value state lattice changes.
Oct 9 2017
I'd reuse the type we already have to express alias results.
While you don't care about partial alias, it makes all code have the same meaning for the same thing :)
(I think this needs a little thought)
So, i can't find what the suggestion was, but these are fairly specific to value lattices and don't belong being used by most other things.
There are lattices where they are correct, and lattices where they make no sense.
It's also kinda conservative.
thanks for working through this :)
Sounds reasonable to start with.
Also, if you used sets, you could just use set.swap in the constructor (or move constructors) to avoid the extra copying.
I'm a bit confused why you use a vector instead of std::set or DenseSet. It looks like either would be better in every way?
Oct 7 2017
Preds will in fact be thousands sometimes with switch statements. If you don't want to fix it, let me know and I'll fix the sort on Monday
Oct 6 2017
Oct 5 2017
Also this needs a test
We sort siblings by RPO in newgvn. If you want to guarantee parent before child, however, just sort by the dominator tree dfs in and out numbers.
Oct 4 2017
(Gonna let jakub comment on what should work/not work here)
This looks right to me, since people were trying to do it anyway :)
Oct 3 2017
Ah, the joy of not having users to break.
Yes, i expect you will have to move a bunch if not all of these to the header.
Oct 2 2017
FYI: I'm about to rename getOrInitValueState to getValueState.
That is what SCCP calls it, and IMHO, we should be consistent.
It's worth noting: You can make propagation much faster by keeping a separate worklist for overdefined , and prioritizing it over the normal one.
This will move things to overdefined as quickly as possible.
(Because anything meet overdefined is just about always overdefined).
I see this review has been hanging around a while, so i'll take a stab at reviewing it.
Sep 29 2017
Sep 28 2017
"Your suggestions above both involve adding functionality under a flag/option. That alone makes them different situations from just landing the current patch, in terms of seeing the work all the way through into production.
So, if the work is done (I think you would have to change some getDefiningAccess calls to getClobberingAccess), and it works, i'd prefer for us to just do that and declare victory.
If it's *not* done, okay, fine, that sucks but not gonna hold up the world here.
- non-local calls in GVN are the cause of pretty much all of it's massive slowdown.
Sep 27 2017
Is literally the only purpose to handle infinite loops?
If so, please tighten up the description a bit.
(Right now "indicate that the loop shouldn't be optimized away even if it's an infinite
loop with no other side effects." would cover unreachable loops, etc).
(repeating comments i forgot to add to review)
This definitely needs an RFC.
Sep 26 2017
FWIW: I think this is a good approach to start with (and is small enough that if we decide to go another way, it doesn't hurt anything).
Sep 22 2017
So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles (IE nested phi cycles)
Sep 15 2017
We've already discovered a slew of problem from over-optimizing things in simplifycfg, which is also being used as a canonicalization pass.
Either it needs a set of flags controlling what it does, or it needs to not be trying to do "performance improvement" and "canonicalization" as a single pass.
See, e.g, https://bugs.llvm.org/show_bug.cgi?id=34603
Please don't update it manually.
Pretty much every manual update LLVM does has been proven incorrect over time.
Please just express the added/removed edges if you have to.
If you are cutting out a region, this should not be difficult.
Sep 13 2017
The assembly is clearly worse, the IR is clearly better.
The IR has removed all computation except a mask of the input.
(Note, in the dump i gave, it has not run DCE, so there's a bunch of dead phis).
In essence, it has turned the entire function into a nested if statement based on the bar & x values, where all branches are storing a constant value.
(or selects, or however you want to canonicalize this)
So GCC-7 definitely eliminates all computation in your benchmark at -O3 except for one and, and we should be able to do the same with a newgvn PRE.
It won't do it at O2 because it requires partial antic, which gcc only does at O3.
A couple things:
Sep 12 2017
Sep 11 2017
Eli, i presume an updated version with isGuaranteed... should fix your testcases from 21041, right?
If so, Max, can you add them to this patch?
Sep 8 2017
(and using isGuarante* would handle guard since guard is marked as throwing)
I'll admit to not staring at the cost model too hard (I rarely have useful input on that kind of target specific thing), but it looked at a glance like you trying to calculate which might constant fold as well.
If not, or it's not part of the goal, awesome.
Sure, this will work, but it requires clearing the guard set between iterations :)
Given how much faster either solution is than the original option, seems fine to me. I doubt it will ever be a real performance issue.