This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Allow sinking when foldable in loop
ClosedPublic

Authored by junbuml on Aug 23 2017, 12:08 PM.

Details

Summary

Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.

Diff Detail

Event Timeline

junbuml created this revision.Aug 23 2017, 12:08 PM
bmakam added a subscriber: bmakam.Aug 25 2017, 8:10 AM

ContainFolderableUsersInLoop -> ContainFoldableUsersInLoop?

bmakam added inline comments.Aug 25 2017, 8:12 AM
lib/Transforms/Scalar/LICM.cpp
724

Do we need to check I->getParent() == UserI->getParent()? We already check if CurLoop->contains(UserI) right?

junbuml added inline comments.Aug 25 2017, 8:50 AM
lib/Transforms/Scalar/LICM.cpp
724

Even in the same loop, if they are in different blocks, the current ISel may not fold the GEP into the load.

junbuml updated this revision to Diff 113115.Aug 29 2017, 10:40 AM

Fixed Balaram's comment.

hfinkel added inline comments.Sep 2 2017, 1:24 PM
lib/Transforms/Scalar/LICM.cpp
721

This isn't a very good FIXME because it doesn't explain what you might fix about it. Are there other things for which we might check?

725

This can just be:

return TTI->getUserCost(GEP) == TargetTransformInfo::TCC_Free;

(this code here to call getGEPCost seems to duplicate the implementation logic of getUserCost)

On that note, you might not even have to restrict this to GEPs used by Loads, but rather, you could allow all zero-cost instructions.

770

The test for !ContainFoldableUsersInLoop limits us to looking for only one foldable user within the loop. Why?

hfinkel added inline comments.Sep 2 2017, 1:51 PM
lib/Transforms/Scalar/LICM.cpp
725

(Also, I think that using all 'Free' instructions has the benefit of being correlated with the unrolling cost of the loop - these are the instructions that won't increase the unrolling cost of the loop).

junbuml updated this revision to Diff 114859.Sep 12 2017, 10:08 AM
junbuml marked 2 inline comments as done.

Addressed Hal's comments.

lib/Transforms/Scalar/LICM.cpp
725

Use getUercost() directly, instead of getGEPCost().

It might be possible to allow this for all other zero-cost instructions. However, I'm not perfectly sure if this is good or needed for all other free instructions. For example, I'm not clear if sinking a free trunc is needed? However, in GEP case, by sinking a GEP, we can decouple the users of the GEP: one in loop and one in outside of the loop so that the one in loop will be folded in isel if they are in the same block.

I think we need extensive tests before opening up this for all other free instruction,s and isolating this for GEP as a first step would make review process easy.

725

I don't think Free from getUserCost guarantee that the instruction is folded away always. So, I specifically check for a GEP which could be a legal addressing mode and it's used in a load / store in the same block, expecting the isel fold them into its users.

hfinkel added inline comments.Sep 13 2017, 2:50 AM
lib/Transforms/Scalar/LICM.cpp
725

I don't think Free from getUserCost guarantee that the instruction is folded away always.

Yes, it should be. Free means free. If not, then there's something wrong with the cost model that we should fix.

It might be possible to allow this for all other zero-cost instructions. However, I'm not perfectly sure if this is good or needed for all other free instructions. For example, I'm not clear if sinking a free trunc is needed? However, in GEP case, by sinking a GEP, we can decouple the users of the GEP: one in loop and one in outside of the loop so that the one in loop will be folded in isel if they are in the same block.

I don't understand how the advantages, or disadvantages, of doing this for a free truncate are different from a free GEP. In both cases, we decouple things inside the loop from outside the loop allowing the folding to take place later.

I think we need extensive tests before opening up this for all other free instruction,s and isolating this for GEP as a first step would make review process easy.

Then please run tests (I assume you ran tests for this, as proposed, too). This makes the review harder. LICM is part of our canonicalization process, and we need to have an understandable canonical form. The more that this turns into a patchwork of heuristics, the harder it is to figure out what our canonical form is. "We always decouple free instructions" is easy to explain. We sometimes decouple GEPs if they happen to be used in certain ways is harder.

Considering that LICM is a part of the canonicalization process, it totally make sense to handle all free instructions, instead of being limited in GEPs used by Loads. However, I still don't think free from getUserCost() guarantee nop. It's possible that getUserCost() returns FREE for a GEP, but the GEP cannot be folded into its user. For example, if an user of the GEP is a call instruction taking the GEP as parameter, then the GEP may not be folded in isel. ISel may also not fold a free GEP into a load if the GEP is not in the same block.
Is FREE from getUserCost() supposed to guarantee nop or expect it to be nop?

With my current implementation (check GEPs used by load in the same block), I didn't any clear performance gain in my test for spec2000/2006/2017. However, when this change applied together with my another LICM patch (D37163), observed +4% performance gain in spec2006/xalancbmk.

efriedma edited edge metadata.Sep 13 2017, 2:28 PM

It's possible that getUserCost() returns FREE for a GEP, but the GEP cannot be folded into its user.

That seems like a bug. I mean, it currently does happen (because that's how getGEPCost is implemented), but in principle getUserCost should only return TCC_Free for operations which are actually free.

That seems like a bug. I mean, it currently does happen (because that's how getGEPCost is implemented), but in principle getUserCost should only return TCC_Free for operations which are actually free.

If we have to guarantee a real free for TCC_FREE, this must be unexpected behaviors. I will fix this first, and we can cover all free instructions here.

junbuml updated this revision to Diff 117752.Oct 4 2017, 3:49 PM

Instead of being restricted on GEP, allow this for all zero-cost instructions. Please take a look and let me know any comment.

hfinkel added inline comments.Oct 4 2017, 6:04 PM
include/llvm/Analysis/TargetTransformInfoImpl.h
782 ↗(On Diff #117752)

I see no reason to bifurcate the API by adding the Users parameter only to getGEPCost. getGEPCost may be the only case where we currently check users, but there's no reason that might not change in the future. Regardless, there's no reason to expose this distinction to users of the API. Please add the Users parameter to getUserCost here, and then pass it through to getGEPCost below. Then, in LICM, you can just call getUserCost for everything.

junbuml updated this revision to Diff 118707.Oct 11 2017, 3:04 PM

Use only getUserCost in LICM by adding an overload of getUserCode taking Users as its parameter.

hfinkel added inline comments.Oct 12 2017, 2:04 PM
include/llvm/Analysis/TargetTransformInfoImpl.h
826 ↗(On Diff #118707)

Can you add an overload of getUserCost that does not take a users array and composes the list and then you don't need to change this?

lib/Target/ARM/ARMTargetTransformInfo.cpp
614 ↗(On Diff #118707)

And then also don't need this modification?

junbuml added inline comments.Oct 13 2017, 10:41 AM
include/llvm/Analysis/TargetTransformInfoImpl.h
826 ↗(On Diff #118707)

Before this change, we had the implementation of getUserCost(User*, ArrayRef<const Value *>) in TargetTransformInfoImpl.h. So, it was okay to call getUserCost(User, Operands) here.

In this change I moved the implementation of getUserCost(User*, ArrayRef<const Value *>) to TargetTransformInfo.h as a helper function. Now, we have implementation of getUserCost(User*, ArrayRef<const Value *>, ArrayRef<const User *>), so I make this change.

Do you think we need to move implementation of getUserCost(User*, ArrayRef<const Value *>) in TargetTransformInfoImpl.h? Then, we might need to override each of getUserCost() for targets overriding this function (e.g., X86TargetTransformInfo and HexagonTargetTransformInfo).

junbuml added inline comments.Oct 16 2017, 10:40 AM
lib/Transforms/Scalar/LICM.cpp
729

As r314923 was reverted, we cannot simply use getUserCost for all instructions because it optimistically assume that a GEP will fold into addressing mode regardless of its users. I don't think we can rely on this optimistic assumption in here. To handle GEPs properly in this change, we can check GEP's users here directly, or we can add a function in TTI to see if an instruction is really foldable.

junbuml updated this revision to Diff 119698.Oct 20 2017, 1:27 PM

As r314923 was reverted, we cannot rely on getUserCost in the foldability check for GEPs. For now, I checked GEP's users in LICM directly. Please let me know if you prefer other way around.

hfinkel accepted this revision.Dec 11 2017, 8:22 PM

As r314923 was reverted, we cannot rely on getUserCost in the foldability check for GEPs. For now, I checked GEP's users in LICM directly. Please let me know if you prefer other way around.

I apologize for the delay. Yea, let's go ahead this way for now.

This revision is now accepted and ready to land.Dec 11 2017, 8:22 PM
junbuml updated this revision to Diff 126761.Dec 13 2017, 7:49 AM

Thanks for revisiting this.
Made some changes like the way we iterate users in sink(), while rebasing it .

Thanks for revisiting this.
Made some changes like the way we iterate users in sink(), while rebasing it .

LGTM

This revision was automatically updated to reflect the committed changes.