dberlin (Daniel Berlin)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 13 2012, 1:07 PM (282 w, 5 d)

Recent Activity

Wed, Nov 29

dberlin added inline comments to D40480: MemorySSA backed Dead Store Elimination. .
Wed, Nov 29, 1:11 AM

Mon, Nov 27

dberlin added a comment to D40304: [InstCombine] PR35354: Convert load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast &V1, load bitcast &V2).

So, what Sanjoy has pointed out is that canonicalization like this to select, breaks the ability to remove redundant loads and stores and do PRE, compared to the equivalent control flow version, in general.

Mon, Nov 27, 12:35 PM

Sun, Nov 26

dberlin added a comment to D40427: [ADT] Introduce Disjoint Set Union structure.

Two things.
First, to bikeshed this horribly:

Sun, Nov 26, 11:45 PM

Tue, Nov 14

dberlin accepted D39630: [PredicateInfo] Stable sort ValueDFS to remove non-deterministic ordering.
Tue, Nov 14, 10:17 AM

Nov 10 2017

dberlin added a comment to D39835: [GVN PRE] Clear nsw/nuw for original values in LoadPRE.

Yeah, I filed a bug about this when we started the first review, its good to have a real testcasr

Nov 10 2017, 5:56 PM

Nov 9 2017

dberlin added a comment to D39835: [GVN PRE] Clear nsw/nuw for original values in LoadPRE.

Sorry, I didn't really look at the jump-threading testcase in the other patch. (In general, it's bad practice to write testcases which involve multiple passes because it confuses the issue you're actually trying to fix.) We don't need to strip the nsw in pre-jt-add.ll from the other patch, and we don't need to strip nsw for this testcase.

The PRE transform in the testcase in this patch can be split into two parts: one, we clone the load and move it into the predecessors, and two, we eliminate the fully redundant loads. Cloning the load is obviously safe: it's the same operation. Eliminating the fully redundant load is also safe: storing the value to memory and loading it does not change it. Stores do not erase poison; you just get poisoned memory.

Nov 9 2017, 8:15 PM
dberlin added a comment to D39835: [GVN PRE] Clear nsw/nuw for original values in LoadPRE.

Errr, outside of an irrelevant store, it generates literally the same output and jump threading result as: https://reviews.llvm.org/rL317768

Nov 9 2017, 1:28 PM
dberlin added inline comments to D39781: [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load.
Nov 9 2017, 10:34 AM

Nov 8 2017

dberlin accepted D39637: [GVN PRE] Patch the source for Phi node in PRE.

That's fine too.

Nov 8 2017, 9:01 PM
dberlin added a comment to D39637: [GVN PRE] Patch the source for Phi node in PRE.

Can you please fix LoadPRE as well?
ConstructSSALoadSet needs to be patching instructions it materializes if they come from an existing value.

Nov 8 2017, 8:59 PM
dberlin added a comment to D39637: [GVN PRE] Patch the source for Phi node in PRE.

In load PRE, it is replacing the value of a load with a phi of other values.

Nov 8 2017, 3:40 PM
dberlin added a comment to D39637: [GVN PRE] Patch the source for Phi node in PRE.

Your patch has convinced me that GVN is probably generally broken here, but just happens to not value number or create new phi nodes except for PRE, so it doesn't generate a lot of instances of this issue.
So yeah, i'm fine with the patch in general

Nov 8 2017, 1:51 PM

Nov 6 2017

dberlin added a comment to D39637: [GVN PRE] Patch the source for Phi node in PRE.

I think it would be a lot easier to understand your example if you could produce IR that, when run through -gvn -enable-pre -jump-threading, produces the wrong result without this patch.

Nov 6 2017, 9:07 PM
dberlin added a comment to D39637: [GVN PRE] Patch the source for Phi node in PRE.

TL; DR I'm .. confused.

Nov 6 2017, 8:36 PM

Oct 31 2017

dberlin added a comment to D39467: PR34603 Fix by Early-CSE extension.

I'll repeat what i said on the bug
FWIW: Given the choice, i would much rather fix the underlying problem then patch various passes to work around some small subset of it (here, load of selected addresses)
This is underway right now, actually.

Oct 31 2017, 12:28 PM

Oct 29 2017

dberlin accepted D39410: [GVNHoist] Fix non-deterministic sort order of PHIs for identical instructions.
Oct 29 2017, 10:36 PM

Oct 27 2017

dberlin added a comment to D39340: Modifying reassociate for improved CSE.

This seems very similar to what n-ary reassociate wants to do, I'd like to understand why we shouldn't do it there?
(where it seems to fit perfectly)

Oct 27 2017, 12:07 PM

Oct 26 2017

dberlin added inline comments to D38944: [GVN] Handle removal of first implicit CF instruction correctly.
Oct 26 2017, 11:01 AM

Oct 25 2017

dberlin added inline comments to D38862: Add must alias info to ModRefInfo..
Oct 25 2017, 7:44 PM

Oct 23 2017

dberlin accepted D39025: [GVNSink] Fix failing GVNSink tests in the reverse iteration bot.
Oct 23 2017, 11:15 AM
dberlin added a comment to D39025: [GVNSink] Fix failing GVNSink tests in the reverse iteration bot.

This seems reasonable to me,
I'd just add a comment to explain why it's a small set vector (so nobody breaks this future)

Oct 23 2017, 11:15 AM

Oct 22 2017

dberlin added a comment to D39011: [SimplifyCFG] try harder to forward switch condition to phi (PR34471).

So, in the original test and code it's hard to tell, but yes, if it's trying to undo the propagation of constants into phi nodes, it's already going to be fighting, because everything else is trying to do that transform specifically :)

Oct 22 2017, 11:04 AM
dberlin added a comment to D39011: [SimplifyCFG] try harder to forward switch condition to phi (PR34471).

Hey Sanjay,
GVN/NewGVN/et al will undo this transform, and propagate constants. In fact, i expect pretty much any pass that can, will.

Oct 22 2017, 10:06 AM

Oct 20 2017

dberlin added a comment to D38806: DepthFirstIterator.h: Use C++11 features to call a completed method onthe set type, instead of requiring that one exists..

I'll add a unit test.

Oct 20 2017, 12:01 PM
dberlin added inline comments to D38806: DepthFirstIterator.h: Use C++11 features to call a completed method onthe set type, instead of requiring that one exists..
Oct 20 2017, 11:22 AM
dberlin updated the diff for D38806: DepthFirstIterator.h: Use C++11 features to call a completed method onthe set type, instead of requiring that one exists..
  • Update for review comments
Oct 20 2017, 11:22 AM

Oct 18 2017

dberlin added a comment to D38806: DepthFirstIterator.h: Use C++11 features to call a completed method onthe set type, instead of requiring that one exists..

I haven't heard any better approaches yet, but i'll give a week before committing this.

Oct 18 2017, 8:03 PM

Oct 17 2017

dberlin added a comment to D38944: [GVN] Handle removal of first implicit CF instruction correctly.

"Even if we mark for deletion, we should update the map immediately to be able to hoist across the marked instruction. Marking alone does not solve the problem, it will still need the same code that also updates the map. If marking is only needed for uniformity of GVN code, this can be done separately from fixing the actual bug."
The instructions will be deleted and the map updated as soon as we return from this function up the call stack.
It will not prevent further anything, because

  1. the hoisting has already happened (and all the functions in this stack are going to return and then the next thing that will happen is we will delete the dead instructions)
  2. The marked instruction could not have blocked hoisting anyway, or else it would not have been safe to PRE.
Oct 17 2017, 9:25 AM
dberlin added a comment to D38944: [GVN] Handle removal of first implicit CF instruction correctly.

Both of the deletions are in PRE, after it has been successful. If either the inserted or the deleted instructions were implicit control flow, PRE should have been unsafe. It's probably worth noting this in a comment

Oct 17 2017, 8:00 AM

Oct 16 2017

dberlin added inline comments to D38944: [GVN] Handle removal of first implicit CF instruction correctly.
Oct 16 2017, 11:10 PM
dberlin added inline comments to D38944: [GVN] Handle removal of first implicit CF instruction correctly.
Oct 16 2017, 7:37 AM

Oct 13 2017

dberlin accepted D37353: [SparsePropagation] Enable interprocedural analysis.

Implement the previously discussed approach using generic LatticeKeys. A few things to note about this version of the patch:

First, there's a requirement that LatticeKeys are able to be the key_type of whatever mapping structure the generic solver uses internally (i.e., DenseMap). If a client uses a non-trivial type for LatticeKey, it will have to provide a specialization of DenseMapInfo. But we already have specializations for pointers and PointerIntPairs, which I think probably covers most potential clients. I've added a comment at the top of AbstractLatticeFunction mentioning this.

Oct 13 2017, 7:59 PM
dberlin added a comment to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.

I attached a patch for this to PR 34937.
I don't have time ATM to perform testing and shepherding of it through review, however.

Oct 13 2017, 5:48 PM
dberlin added a comment to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.

This is use after free.
When the first implicit control flow instruction for a bb gets erased, we now have wrong info.

Oct 13 2017, 5:38 PM

Oct 12 2017

dberlin added inline comments to D38856: [IPSCCP] Remove calls without side effects.
Oct 12 2017, 10:12 PM

Oct 11 2017

dberlin added a comment to rL313701: Revert "[GVNSink] Remove dependency on SmallPtrSet iteration order.".

I do not plan to address these at the moment, because the order is consistent in the pass and doesn't affect correctness or what gets optimized.
If someone else wants to try to make everything reverse order clean, they are welcome to, it's not a priority for me.

Oct 11 2017, 11:26 AM
dberlin added a comment to rL313701: Revert "[GVNSink] Remove dependency on SmallPtrSet iteration order.".

Yes, this is known. The original "fix" this reverts is clearly broken. There is a discussion on the original revert thread about this and ways to fix it properly.

Oct 11 2017, 11:19 AM
dberlin created D38806: DepthFirstIterator.h: Use C++11 features to call a completed method onthe set type, instead of requiring that one exists..
Oct 11 2017, 9:42 AM

Oct 10 2017

dberlin added a comment to D37353: [SparsePropagation] Enable interprocedural analysis.

This actually sounds like a great idea, IMHO.
Let's try it and see how well it works in practice.
It certainly would address my current concerns.

Oct 10 2017, 10:40 PM
dberlin accepted D37638: [IPSCCP] Move common functions to IPOUtils (NFC).
Oct 10 2017, 10:38 PM
dberlin accepted D38765: Don't replace constants with constants in GVN.

Okay if you address the remaining comments.
Thanks!

Oct 10 2017, 8:58 PM
dberlin added inline comments to D38765: Don't replace constants with constants in GVN.
Oct 10 2017, 8:56 PM
dberlin added a comment to D38765: Don't replace constants with constants in GVN.

Suggesting a slightly different fix.

Oct 10 2017, 6:00 PM
dberlin added a comment to D36656: [SCCP] Propagate integer range info for parameters in IPSCCP..

I suspect at least one of your bugs is that the param state lattice doesn't get updated when the value state lattice changes.

Oct 10 2017, 10:51 AM

Oct 9 2017

dberlin added a comment to D38569: Expose must/may alias info in MemorySSA..

I'd reuse the type we already have to express alias results.
While you don't care about partial alias, it makes all code have the same meaning for the same thing :)

Oct 9 2017, 7:04 PM
dberlin added a comment to D37353: [SparsePropagation] Enable interprocedural analysis.

(I think this needs a little thought)

Oct 9 2017, 1:30 PM
dberlin added inline comments to D37353: [SparsePropagation] Enable interprocedural analysis.
Oct 9 2017, 1:30 PM
dberlin added a comment to D37638: [IPSCCP] Move common functions to IPOUtils (NFC).

So, i can't find what the suggestion was, but these are fairly specific to value lattices and don't belong being used by most other things.
There are lattices where they are correct, and lattices where they make no sense.
It's also kinda conservative.

Oct 9 2017, 1:27 PM
dberlin accepted D37355: Add CalledValuePropagation pass.

thanks for working through this :)

Oct 9 2017, 11:26 AM
dberlin accepted D36656: [SCCP] Propagate integer range info for parameters in IPSCCP..

Sounds reasonable to start with.

Oct 9 2017, 11:10 AM
dberlin added inline comments to D36656: [SCCP] Propagate integer range info for parameters in IPSCCP..
Oct 9 2017, 7:25 AM
dberlin added a comment to D37355: Add CalledValuePropagation pass.

Also, if you used sets, you could just use set.swap in the constructor (or move constructors) to avoid the extra copying.

Oct 9 2017, 6:51 AM
dberlin added a comment to D37355: Add CalledValuePropagation pass.

I'm a bit confused why you use a vector instead of std::set or DenseSet. It looks like either would be better in every way?

Oct 9 2017, 6:50 AM

Oct 7 2017

dberlin added a comment to D38586: SLPVectorizer.cpp: Ensure SLPVectorizer can visit each block by dominated order.

Preds will in fact be thousands sometimes with switch statements. If you don't want to fix it, let me know and I'll fix the sort on Monday

Oct 7 2017, 7:37 AM

Oct 6 2017

dberlin added inline comments to D38569: Expose must/may alias info in MemorySSA..
Oct 6 2017, 7:56 AM

Oct 5 2017

dberlin added a comment to D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable..

I've taken some time to go through the code in NewGVN that computes similar things based on the suggestion from Danny. These now do *very* similar things in terms of walk. The differences I've seen are:

  1. While the safety checks are both preorder, my code uses DFS instead of BFS. The reason is that we walk up the stack to mark more nodes as unsafe when we discover an unsafe node.
Oct 5 2017, 6:34 PM
dberlin requested changes to D38586: SLPVectorizer.cpp: Ensure SLPVectorizer can visit each block by dominated order.

Also this needs a test

Oct 5 2017, 3:07 PM
dberlin added a comment to D38586: SLPVectorizer.cpp: Ensure SLPVectorizer can visit each block by dominated order.

We sort siblings by RPO in newgvn. If you want to guarantee parent before child, however, just sort by the dominator tree dfs in and out numbers.

Oct 5 2017, 3:01 PM

Oct 4 2017

dberlin added inline comments to D38569: Expose must/may alias info in MemorySSA..
Oct 4 2017, 6:30 PM
dberlin added a comment to D37343: [CGP] Merge empty case blocks if no extra moves are added..

(Gonna let jakub comment on what should work/not work here)

Oct 4 2017, 4:20 PM
dberlin accepted D38561: [SparsePropagation] Move member definitions to header (NFC).
Oct 4 2017, 2:06 PM
dberlin added a comment to D38561: [SparsePropagation] Move member definitions to header (NFC).

LGTM

Oct 4 2017, 2:05 PM
dberlin accepted D38541: [Dominators] Take fast path when applying <=1 updates.
Oct 4 2017, 10:09 AM
dberlin added a comment to D38541: [Dominators] Take fast path when applying <=1 updates.

This looks right to me, since people were trying to do it anyway :)

Oct 4 2017, 10:09 AM

Oct 3 2017

dberlin added a comment to D38476: Template the sparse propagation solver instead of using void pointers.

Ah, the joy of not having users to break.
Yes, i expect you will have to move a bunch if not all of these to the header.

Oct 3 2017, 1:09 PM

Oct 2 2017

dberlin added a comment to D37353: [SparsePropagation] Enable interprocedural analysis.

FYI: I'm about to rename getOrInitValueState to getValueState.
That is what SCCP calls it, and IMHO, we should be consistent.

Oct 2 2017, 4:08 PM
dberlin created D38476: Template the sparse propagation solver instead of using void pointers.
Oct 2 2017, 1:26 PM
dberlin added a comment to D37353: [SparsePropagation] Enable interprocedural analysis.

It's worth noting: You can make propagation much faster by keeping a separate worklist for overdefined , and prioritizing it over the normal one.
This will move things to overdefined as quickly as possible.
(Because anything meet overdefined is just about always overdefined).

Oct 2 2017, 11:38 AM
dberlin added a comment to D37355: Add CalledValuePropagation pass.

I see this review has been hanging around a while, so i'll take a stab at reviewing it.

Oct 2 2017, 11:19 AM
dberlin accepted D38331: [Dominators] Add DFS number verification.
Oct 2 2017, 10:51 AM
dberlin added inline comments to D38331: [Dominators] Add DFS number verification.
Oct 2 2017, 8:47 AM
dberlin added inline comments to D38331: [Dominators] Add DFS number verification.
Oct 2 2017, 8:40 AM

Sep 29 2017

dberlin added inline comments to D37343: [CGP] Merge empty case blocks if no extra moves are added..
Sep 29 2017, 1:14 PM
dberlin added a comment to D38336: Add an @llvm.sideeffect intrinsic.

This update revises the documentation to clarify that the intrinsic doesn't inherently prevent loops from being optimized away.

If it supports C++, ADCE needs to preserve calls to a library I/O functions, volatile operations, thread synchronizations, and atomic operations, and calls to functions which may perform any of these. Whatever it does to handle those should handle llvm.sideeffect as well.

This seems pretty incongruous to me with your answers to my earlier questions, so i feel like i'm missing something.

The basic idea of llvm.sideeffect is to just be a thing that gets treated like it has side effects. It's just like a volatile store through a non-escaped memory address.

I may have been unclear when you asked

Do you expect it to keep provably finite loops that have it alive.
IE a loop where we can prove the backedge is not taken?

and I said

No; a backedge can still be folded away in the normal ways. The llvm.sideeffect call would remain, no longer needed, but not invalid. If it were important, we could add code to remove it in that case, but the patch here doesn't have that.

I interpreted your question to be just about loops where we can prove the backedge is not taken. llvm.sideeffect does keep finite loops alive under the same conditions that other operations with side effects would.

In the future, we may want to optimize away redundant llvm.sideeffect calls, and an advanced form of this could allow optimizing away provably-finite loops containing llvm.sideeffect, however, the current patch doesn't depend on this.

Sep 29 2017, 8:33 AM

Sep 28 2017

dberlin added a comment to D38336: Add an @llvm.sideeffect intrinsic.

One reason i ask is because ADCE/etc wouldn't even see it. they won't process the block in such a case.
Hence, if the intrinsic relies on always being looked at and then deciding to keep everything containing it, i'd be opposed (since that adds O(N) walks to everything :P).
Right now, it doesn't sound like that's the case.

If it supports C++, ADCE needs to preserve calls to a library I/O functions, volatile operations, thread synchronizations, and atomic operations, and calls to functions which may perform any of these. Whatever it does to handle those should handle llvm.sideeffect as well.

Sep 28 2017, 10:37 PM
dberlin added a comment to D38374: [memcpyopt] Memcpy-memcpy dependence isn't detected across basic blocks.

"Your suggestions above both involve adding functionality under a flag/option. That alone makes them different situations from just landing the current patch, in terms of seeing the work all the way through into production.
"

Sep 28 2017, 10:20 PM
dberlin added a comment to D38374: [memcpyopt] Memcpy-memcpy dependence isn't detected across basic blocks.

So, if the work is done (I think you would have to change some getDefiningAccess calls to getClobberingAccess), and it works, i'd prefer for us to just do that and declare victory.
If it's *not* done, okay, fine, that sucks but not gonna hold up the world here.

Sep 28 2017, 6:41 PM
dberlin added a comment to D38374: [memcpyopt] Memcpy-memcpy dependence isn't detected across basic blocks.
  1. non-local calls in GVN are the cause of pretty much all of it's massive slowdown.

    Have you measured the effect this has on larger memory-using testcases?

I've measured compile times on some C++ and Rust testcases, and it doesn't appear to significantly slow down -O2. memcpyopt remains under 1% of the time, according to -time-passes.

Sep 28 2017, 2:36 PM
dberlin added a comment to D38374: [memcpyopt] Memcpy-memcpy dependence isn't detected across basic blocks.
  1. non-local calls in GVN are the cause of pretty much all of it's massive slowdown.
Sep 28 2017, 12:00 PM
dberlin accepted D37528: [JumpThreading] Preserve DT and LVI across the pass..
Sep 28 2017, 8:10 AM

Sep 27 2017

dberlin added a comment to D38336: Add an @llvm.sideeffect intrinsic.

Is literally the only purpose to handle infinite loops?
If so, please tighten up the description a bit.
(Right now "indicate that the loop shouldn't be optimized away even if it's an infinite
loop with no other side effects." would cover unreachable loops, etc).

Another potential use was mentioned here.

Sep 27 2017, 7:44 PM
dberlin added a comment to D38336: Add an @llvm.sideeffect intrinsic.

Is literally the only purpose to handle infinite loops?
If so, please tighten up the description a bit.
(Right now "indicate that the loop shouldn't be optimized away even if it's an infinite
loop with no other side effects." would cover unreachable loops, etc).

Sep 27 2017, 4:56 PM
dberlin added a comment to D37575: [BasicBlock] add new function removeEdge().

(repeating comments i forgot to add to review)
This definitely needs an RFC.

Sep 27 2017, 8:45 AM

Sep 26 2017

dberlin added a comment to D36656: [SCCP] Propagate integer range info for parameters in IPSCCP..

FWIW: I think this is a good approach to start with (and is small enough that if we decide to go another way, it doesn't hurt anything).

Sep 26 2017, 3:35 PM

Sep 22 2017

dberlin added a comment to D38097: [IVUsers] Changes to make IVUsers's results robust to instruction and uselist ordering.

So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles (IE nested phi cycles)

Sep 22 2017, 8:58 AM

Sep 15 2017

dberlin added inline comments to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
Sep 15 2017, 9:08 PM
dberlin added a comment to D37940: Peel off the dominant case in switch statement.

We've already discovered a slew of problem from over-optimizing things in simplifycfg, which is also being used as a canonicalization pass.
Either it needs a set of flags controlling what it does, or it needs to not be trying to do "performance improvement" and "canonicalization" as a single pass.
See, e.g, https://bugs.llvm.org/show_bug.cgi?id=34603

Sep 15 2017, 4:35 PM
dberlin added a comment to D37902: [CodeExtractor] Fix multiple bugs under certain shape of extracted region.

Please don't update it manually.
Pretty much every manual update LLVM does has been proven incorrect over time.
Please just express the added/removed edges if you have to.
If you are cutting out a region, this should not be difficult.

Sep 15 2017, 9:31 AM

Sep 13 2017

dberlin added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

The assembly is clearly worse, the IR is clearly better.
The IR has removed all computation except a mask of the input.
(Note, in the dump i gave, it has not run DCE, so there's a bunch of dead phis).
In essence, it has turned the entire function into a nested if statement based on the bar & x values, where all branches are storing a constant value.
(or selects, or however you want to canonicalize this)

Sep 13 2017, 6:00 PM
dberlin added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

So GCC-7 definitely eliminates all computation in your benchmark at -O3 except for one and, and we should be able to do the same with a newgvn PRE.
It won't do it at O2 because it requires partial antic, which gcc only does at O3.

Sep 13 2017, 5:04 PM
dberlin added a comment to D37762: [InstCombine] Remove single use restriction from InstCombine's explicit sinking code..

A couple things:

Sep 13 2017, 3:17 PM
dberlin added inline comments to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
Sep 13 2017, 7:39 AM

Sep 12 2017

dberlin added inline comments to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
Sep 12 2017, 10:09 PM

Sep 11 2017

dberlin added inline comments to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
Sep 11 2017, 7:53 PM
dberlin added a comment to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.

Eli, i presume an updated version with isGuaranteed... should fix your testcases from 21041, right?
If so, Max, can you add them to this patch?

Sep 11 2017, 12:57 PM

Sep 8 2017

dberlin added a comment to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.

(and using isGuarante* would handle guard since guard is marked as throwing)

Sep 8 2017, 12:56 PM
dberlin added inline comments to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
Sep 8 2017, 12:55 PM
dberlin added a comment to D37467: Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable..

I'll admit to not staring at the cost model too hard (I rarely have useful input on that kind of target specific thing), but it looked at a glance like you trying to calculate which might constant fold as well.
If not, or it's not part of the goal, awesome.

Sep 8 2017, 8:57 AM
dberlin added a comment to D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.

Actually we don't even need OrderedInstructions to do that. It's enough to maintain a set of blocks where we have ever seen the guards. We iterate through blocks in reverse postorder which guarantees us that if a block has one predecessor it will be visited before the block. We also traverse block instructions in direct order. Using this, we just mark a block as having a guard when we meet one and know for sure that:

  1. if the current block is marked, there is a guard above the current instruction, and we cannot hoist;
  2. if the only predecessor (recursively) is marked, it also blocks us from hoisting.

    I've updated the patch using this solution. Please let me know if it makes sense to you.

Sure, this will work, but it requires clearing the guard set between iterations :)
Given how much faster either solution is than the original option, seems fine to me. I doubt it will ever be a real performance issue.

Sep 8 2017, 1:36 AM
dberlin added a comment to D37463: Fix miscompile in LoopSink pass.

I need to read what C++ specification says about this particular issue, but basically LLVM is not only used to compile C++. This situation can be illegal in other languages (again, need to dig more through specifications). My proposal is to add an option that prohibits this transform and set it to false by default, with abitily to turn it off for languages where it is prohibited.

Sep 8 2017, 12:51 AM