- User Since
- Jul 13 2012, 1:07 PM (274 w, 3 d)
Fri, Oct 13
I attached a patch for this to PR 34937.
I don't have time ATM to perform testing and shepherding of it through review, however.
This is use after free.
When the first implicit control flow instruction for a bb gets erased, we now have wrong info.
Thu, Oct 12
Wed, Oct 11
I do not plan to address these at the moment, because the order is consistent in the pass and doesn't affect correctness or what gets optimized.
If someone else wants to try to make everything reverse order clean, they are welcome to, it's not a priority for me.
Yes, this is known. The original "fix" this reverts is clearly broken. There is a discussion on the original revert thread about this and ways to fix it properly.
Tue, Oct 10
This actually sounds like a great idea, IMHO.
Let's try it and see how well it works in practice.
It certainly would address my current concerns.
Okay if you address the remaining comments.
Suggesting a slightly different fix.
I suspect at least one of your bugs is that the param state lattice doesn't get updated when the value state lattice changes.
Mon, Oct 9
I'd reuse the type we already have to express alias results.
While you don't care about partial alias, it makes all code have the same meaning for the same thing :)
(I think this needs a little thought)
So, i can't find what the suggestion was, but these are fairly specific to value lattices and don't belong being used by most other things.
There are lattices where they are correct, and lattices where they make no sense.
It's also kinda conservative.
thanks for working through this :)
Sounds reasonable to start with.
Also, if you used sets, you could just use set.swap in the constructor (or move constructors) to avoid the extra copying.
I'm a bit confused why you use a vector instead of std::set or DenseSet. It looks like either would be better in every way?
Sat, Oct 7
Preds will in fact be thousands sometimes with switch statements. If you don't want to fix it, let me know and I'll fix the sort on Monday
Fri, Oct 6
Thu, Oct 5
Also this needs a test
We sort siblings by RPO in newgvn. If you want to guarantee parent before child, however, just sort by the dominator tree dfs in and out numbers.
Wed, Oct 4
(Gonna let jakub comment on what should work/not work here)
This looks right to me, since people were trying to do it anyway :)
Tue, Oct 3
Ah, the joy of not having users to break.
Yes, i expect you will have to move a bunch if not all of these to the header.
Mon, Oct 2
FYI: I'm about to rename getOrInitValueState to getValueState.
That is what SCCP calls it, and IMHO, we should be consistent.
It's worth noting: You can make propagation much faster by keeping a separate worklist for overdefined , and prioritizing it over the normal one.
This will move things to overdefined as quickly as possible.
(Because anything meet overdefined is just about always overdefined).
I see this review has been hanging around a while, so i'll take a stab at reviewing it.
Fri, Sep 29
Thu, Sep 28
"Your suggestions above both involve adding functionality under a flag/option. That alone makes them different situations from just landing the current patch, in terms of seeing the work all the way through into production.
So, if the work is done (I think you would have to change some getDefiningAccess calls to getClobberingAccess), and it works, i'd prefer for us to just do that and declare victory.
If it's *not* done, okay, fine, that sucks but not gonna hold up the world here.
- non-local calls in GVN are the cause of pretty much all of it's massive slowdown.
Wed, Sep 27
Is literally the only purpose to handle infinite loops?
If so, please tighten up the description a bit.
(Right now "indicate that the loop shouldn't be optimized away even if it's an infinite
loop with no other side effects." would cover unreachable loops, etc).
(repeating comments i forgot to add to review)
This definitely needs an RFC.
Tue, Sep 26
FWIW: I think this is a good approach to start with (and is small enough that if we decide to go another way, it doesn't hurt anything).
Fri, Sep 22
So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles.
Sep 15 2017
We've already discovered a slew of problem from over-optimizing things in simplifycfg, which is also being used as a canonicalization pass.
Either it needs a set of flags controlling what it does, or it needs to not be trying to do "performance improvement" and "canonicalization" as a single pass.
See, e.g, https://bugs.llvm.org/show_bug.cgi?id=34603
Please don't update it manually.
Pretty much every manual update LLVM does has been proven incorrect over time.
Please just express the added/removed edges if you have to.
If you are cutting out a region, this should not be difficult.
Sep 13 2017
The assembly is clearly worse, the IR is clearly better.
The IR has removed all computation except a mask of the input.
(Note, in the dump i gave, it has not run DCE, so there's a bunch of dead phis).
In essence, it has turned the entire function into a nested if statement based on the bar & x values, where all branches are storing a constant value.
(or selects, or however you want to canonicalize this)
So GCC-7 definitely eliminates all computation in your benchmark at -O3 except for one and, and we should be able to do the same with a newgvn PRE.
It won't do it at O2 because it requires partial antic, which gcc only does at O3.
A couple things:
Sep 12 2017
Sep 11 2017
Eli, i presume an updated version with isGuaranteed... should fix your testcases from 21041, right?
If so, Max, can you add them to this patch?
Sep 8 2017
(and using isGuarante* would handle guard since guard is marked as throwing)
I'll admit to not staring at the cost model too hard (I rarely have useful input on that kind of target specific thing), but it looked at a glance like you trying to calculate which might constant fold as well.
If not, or it's not part of the goal, awesome.
Sure, this will work, but it requires clearing the guard set between iterations :)
Given how much faster either solution is than the original option, seems fine to me. I doubt it will ever be a real performance issue.
Sep 7 2017
Other random note:
NewGVN can do this value-wise, not just lexically.
So in some future when NewGVN is the default, you may just want to make this part of one of the things it does after the analysis (probably after elimination too).
The eliminator will already have eliminated all cases where doing the above eliminates redundancies without costing anything (IE the part where you see how it simplifies)
We could also, for zero cost, track the things where it it would have just been code motion (IE one operand constant, one operand not), and where we didn't do it because it required speculation of multiple operands.
I feel unclear on what you see the safety conditions of this transform are. I believe yours are off a little :)
One reason we don't split critical edges by default in most passes in LLVM is that edges coming out of switches that lead to the same block are pretty much always critical.
Doing so by default for large switches leads to *huge* code blowup.
Of course. Just in case you try to use the code, i'm pretty sure if (OrderedInstructions->dominates(*Iter, guard))
should be if (OrderedInstructions->dominates(guard, *Iter))
Sep 6 2017
Sure, it's a function argument in a few other places.
The other places, you should be using auto :)
I marked a bunch of them.
Right now, for each load, you go through the loadbb, and see if you hit a guard.
This information does not change for each load. At most it changes as GVN removes things (and even then, no).
The number of guards is smaller than the number of blocks.
Sep 5 2017
- This seems wrong for assumes (I don't know enough about guards).
It is not incorrect to hoist through an assume, nor should doing so generate wrong code.
Assumes are essentially metadata, and documented as such.
Sep 1 2017
(I suspect most folks are waiting for chandler to say whether he feels this is good enough at this point or not)
- NewGVN: Cleanup comments
This has now passed all testing i can throw at it, so i'm going to commit it in a day or two, but please feel free to do post-commit review :)
- Update patch for correctness
- Add test
- NewGVN: Add debug message when we can't find a phi of ops operand but looked
- NewGVN: Revert now-unnecessary changes
Aug 31 2017
(At some point, now that we could use post-dominance, you may want to see if it helps. For starters, anything if control dependent on the entry block is guaranteed to execute)
So, LGTM at this point.
Have you performance tested it while being on by default?
Usually, passes that are off by default are undergoing work, with the expectation they will be on by default at some opt level at some point.
Aug 30 2017
Overall, i'm unclear on why you can't compute which ones will actually be sunk, and then split the preds.
Also, the examples does not even require splitting preds to effect the transform.
Aug 29 2017
Aug 28 2017
Aug 27 2017
This also fixes PR 33432, i forgot to add it to the log :)
With the latest update it should be correct now.
- Update patch for correctness
- Add test
Unfortunately, i can prove we have to give up rather than try to constant fold in this case.
Aug 25 2017
(In particular, i believe it may be correct to only care about phi nodes in the same block as the one we are trying to translate through, which would fix 33461 back to the better codegen)
The 33461 change is an example where i believe what we were doing is safe, but i want to see if we still find correctness bugs first :)