This is an archive of the discontinued LLVM Phabricator instance.

Eliminate PHI (int typed) which is only used by inttoptr
ClosedPublic

Authored by davidxl on Sep 13 2017, 4:17 PM.

Details

Summary

SCEV (and AA) has trouble understanding inttoptr/ptrtoint instructions and thus produces conservative analysis results, which blocks transformations such as loop vectorizer. See https://reviews.llvm.org/D37419 for background.

Instead of teaching SCEV to see through those bitcast operations, this patch improves instcombiner to eliminate those operations when possible.

Diff Detail

Repository
rL LLVM

Event Timeline

davidxl created this revision.Sep 13 2017, 4:17 PM
hfinkel edited edge metadata.Sep 14 2017, 8:24 PM

In the spirit of PR34548, do we need to make sure that the results of the inttoptrs are all dereferenced?

Is the code you're seeing like this coming from SROA?

lib/Transforms/InstCombine/InstCombinePHI.cpp
42 ↗(On Diff #115133)

Please make all phi/PHI in this comment have consistent capitalization.

72 ↗(On Diff #115133)

I don't understand why you're looking for a load here specifically. It's a good example, because it shows that you might need to look for something here other than a ptrtoint, but that seems to motivate taking a general input (not specifically looking for this load as you do below).

It does not look like related SROA -- the conversion already exists via stack memory store and load in IR. It looks like related libstdc++'s tuple implementation, though I did not have a small repro -- whether it is from sroa is also irrelevant -- user source can always produce IR like this which needs to be cleaned up or understood (by analysis).

Regarding PR34548, since I can not reproduce it, so I did not really get what the issue is. Can you elaborate why checking deref is needed (which is of course easy to do).

lib/Transforms/InstCombine/InstCombinePHI.cpp
72 ↗(On Diff #115133)

Checking for loads because instcombine can fold the inserted inttoptr away. Same for PHI defs. For other instructions, we are basically just hoisting inttoptr above to the PHI operand, which is unclear it is a win.

davidxl marked an inline comment as done.Sep 15 2017, 2:25 PM

This patch basically just sinks ptrtoint operation from phi arguments to the phi output, but does not do any ptrtoint(inttoptr) folding itself (relies on existing rules to do that), so it should be safe to do by itself.

lib/Transforms/InstCombine/InstCombinePHI.cpp
72 ↗(On Diff #115133)

Also general input is already handled first -- by looking in forward direction if there is an existing inttoptr that can be reused.

bb1:
%arg_int64 = <general operation>
...
%ptr_val = inttoptr i64 %arg_int64 to float*

...
bbm:

%int64_phi = PHI ([%arg_int64, %bb1], [...])
 %ptr_val = inttoptr i64 %int64_phi to float*

>

%ptr_val = PHI([%ptr_val, %bb1], [...])
davidxl updated this revision to Diff 115494.Sep 15 2017, 2:26 PM

Address review comments (fixing lower case in comments) by Hal.

wmi added inline comments.Sep 15 2017, 3:01 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
101–103 ↗(On Diff #115133)

Is this check always false? IntToPtr->getOperand(0)'s type is a pointer type, its type size should be equal to DL.getPointerSizeInBits?

157–158 ↗(On Diff #115133)

There are four case in which Arg will be added to AvailablePtrVals: 1. Arg comes from ptrtoint 2. Arg is used by inttoptr dominating the incoming bb. 3. Arg is a phi. 4. Arg is a load.
Seems it is still possible for Arg not to fall into the cases above, like Arg is just an int parameter. Then the assertion may break?

160 ↗(On Diff #115133)

Use EI = BB->getFirstNonPHI() to only iterate phi instructions?

173 ↗(On Diff #115133)

Don't have to break then return below. We can just return here.

davidxl added inline comments.Sep 15 2017, 3:55 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
101–103 ↗(On Diff #115133)

IntToPtr's operand 0's type is integer type not pointer type.

157–158 ↗(On Diff #115133)

The argument case requires ArgIntPtr to be nonull , so it will by pass the rest of the loop (line 127 above).

160 ↗(On Diff #115133)

Ok.

davidxl updated this revision to Diff 115521.Sep 15 2017, 4:37 PM

Address Wei's review feedback

hfinkel added inline comments.Sep 15 2017, 7:47 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
72 ↗(On Diff #115133)

Checking for loads because instcombine can fold the inserted inttoptr away. Same for PHI defs. For other instructions, we are basically just hoisting inttoptr above to the PHI operand, which is unclear it is a win.

Seems like a win to me. It will make AA more powerful in the loop. Besides, we can have a separate combine for

inttoptr(load(bitcast(pp))) => load(pp)

(assuming we can prove what's necessary to make AA on these pointers safe)

nlopes added a subscriber: nlopes.Sep 18 2017, 6:17 AM

This transformation looks correct to me and goes in the right direction (remove unneeded int2ptr/ptr2int, since they block many optimizations). The patch can be made a bit more general later.
Before the patch goes in, please add a few negative tests; you only have positive ones. Thanks!

lib/Transforms/InstCombine/InstCombinePHI.cpp
143 ↗(On Diff #115521)

I don't think this constraint is needed.

146 ↗(On Diff #115521)

idem.

This transformation looks correct to me and goes in the right direction (remove unneeded int2ptr/ptr2int, since they block many optimizations). The patch can be made a bit more general later.
Before the patch goes in, please add a few negative tests; you only have positive ones. Thanks!

Actually I'm less sure now if it's correct. Please give me 1 more day to review this properly.

Ok, so after another scan through the patch and a discussion with Gil, I must say the transformation is not fully correct.

The main criteria that allows folding load(int2ptr(ptr2int(x))) -> load(x) is that x must be dereferenceable. It might not be because ptr2int(x) might have been replaced with e.g. ptr2int(y+n), a valid, but out-of-bounds pointer.
For example, in intptr2.ll, it removes ptr2int(b), where b is an argument. This is not correct because we don't know whether b is dereferenceable or not. There are two ways to go around this problem: use the dereferenceable(n) parameter tag, or abuse the noinline function tag. If after the last round of inlining and all of IPAs, the pass manager gave indication to optimizations that no further IP stuff will happen, then we can assume whatever the best case about arguments instead of having to assume the worst case since we don't know where it might be inlined.

Also, the value of the integer value on the back-edge is not checked how it's computed. It needs to check that it's dereferenceable as well. A way to do so is to ensure it comes from a GEP of the PHI value, for example.

For the load case (when the value of initial edge of the PHI is a load), I'm still unsure and I'll study that problem further. Anyway, the other cases need a bit of non-trivial work and discussion.

regehr added a subscriber: regehr.Sep 19 2017, 9:34 AM

As I have mentioned, this patch itself does *not* fold any ptrtoint/inttoptr. It simply moves the intptr across the phi node. The folding you see with the test cases is done by existing optimizations, so I am not sure what the objection is about.

As I have mentioned, this patch itself does *not* fold any ptrtoint/inttoptr. It simply moves the intptr across the phi node. The folding you see with the test cases is done by existing optimizations, so I am not sure what the objection is about.

Ok, to be precise the patch does the following:
v = phi(ptr2int(p), ptr2int(q))
=>
v = bitcast(phi(p, q) to int)

This transformation by itself is not correct in all cases. Ptr2int is not a NOP.
Also, it makes me really uneasy the fact that the test cases provided take advantage of a broken optimization. Since it's not possible to test the new transformation in isolation, all I see is an end-to-end incorrect transformation (even if the proposed transformation was fully correct, which is not).

davidxl updated this revision to Diff 116069.Sep 20 2017, 2:14 PM

Addressed review feedback.

In this version, more strict check is added such that int phis that feeds inttoptr whose result is actually used by load/store/getelemenptr operations are considered from this instruction combining. This is strictly not necessary for this particular transformation, but there is a concern that some latent bug may be exposed if not done.

Also added a negative test case.

nlopes added a comment.EditedSep 21 2017, 8:35 AM

Ok, mea culpa, I thought CreateBitOrPointerCast() would simply create a bitcast.
Then, what we have here is:

v = phi(ptr2int(p), ptr2int(q))
 =>
v = ptr2int(phi(p, q))

This is mostly correct. And even if it's not totally correct (at least it's not in our model), it's easy to fix it later. (JFYI: in our model you would need to check there's no int2ptr between ptr2int and the phi).
Anyway, I'm not going to oppose any further against this patch. We don't have a finalized model ourselves, so no one can currently say whether it's correct or not.

I need an explicit LGTM for the patch. thanks.

liuz added a subscriber: liuz.Sep 21 2017, 11:04 AM
wmi added inline comments.Sep 27 2017, 10:12 AM
lib/Transforms/InstCombine/InstCombinePHI.cpp
72 ↗(On Diff #115133)

It seems like a win to me too. By hoisting inttoptr above, no inttoptr/ptrtoint is inside of the loop blocking phi nodes from being recognized as induction variables so loop vectorizer/indvar simplify/lsr will benefit from it.

wmi accepted this revision.Sep 27 2017, 12:05 PM
wmi added inline comments.
lib/Transforms/InstCombine/InstCombinePHI.cpp
72 ↗(On Diff #115133)

Sorry, because of phabricator I missed a part of the discussion about the LoopInfo issue and the plan to make it more general in the next step. It sgtm.

This revision is now accepted and ready to land.Sep 27 2017, 12:05 PM
This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Feb 28 2020, 11:24 AM
rnk added inline comments.
llvm/trunk/lib/Transforms/InstCombine/InstCombinePHI.cpp
244

I believe this line of code caused a clang crash with Windows EH:
https://llvm.org/pr45033

getFirstInsertionPt returns an iterator which may be an end iterator, which cannot be cast to Instruction*. This case arises for catchswitch blocks, which have no valid insertion point.

I will put together a fix, but I figured I should alert folks to this corner case to avoid such problems in the future.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 11:24 AM
Herald added a subscriber: javed.absar. · View Herald Transcript