Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Details
Diff Detail
Event Timeline
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
708 | Do we need to check I->getParent() == UserI->getParent()? We already check if CurLoop->contains(UserI) right? |
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
708 | Even in the same loop, if they are in different blocks, the current ISel may not fold the GEP into the load. |
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
705 | This isn't a very good FIXME because it doesn't explain what you might fix about it. Are there other things for which we might check? | |
709 | This can just be: return TTI->getUserCost(GEP) == TargetTransformInfo::TCC_Free; (this code here to call getGEPCost seems to duplicate the implementation logic of getUserCost) On that note, you might not even have to restrict this to GEPs used by Loads, but rather, you could allow all zero-cost instructions. | |
768 | The test for !ContainFoldableUsersInLoop limits us to looking for only one foldable user within the loop. Why? |
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
709 | (Also, I think that using all 'Free' instructions has the benefit of being correlated with the unrolling cost of the loop - these are the instructions that won't increase the unrolling cost of the loop). |
Addressed Hal's comments.
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
709 | Use getUercost() directly, instead of getGEPCost(). It might be possible to allow this for all other zero-cost instructions. However, I'm not perfectly sure if this is good or needed for all other free instructions. For example, I'm not clear if sinking a free trunc is needed? However, in GEP case, by sinking a GEP, we can decouple the users of the GEP: one in loop and one in outside of the loop so that the one in loop will be folded in isel if they are in the same block. I think we need extensive tests before opening up this for all other free instruction,s and isolating this for GEP as a first step would make review process easy. | |
709 | I don't think Free from getUserCost guarantee that the instruction is folded away always. So, I specifically check for a GEP which could be a legal addressing mode and it's used in a load / store in the same block, expecting the isel fold them into its users. |
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
709 |
Yes, it should be. Free means free. If not, then there's something wrong with the cost model that we should fix.
I don't understand how the advantages, or disadvantages, of doing this for a free truncate are different from a free GEP. In both cases, we decouple things inside the loop from outside the loop allowing the folding to take place later.
Then please run tests (I assume you ran tests for this, as proposed, too). This makes the review harder. LICM is part of our canonicalization process, and we need to have an understandable canonical form. The more that this turns into a patchwork of heuristics, the harder it is to figure out what our canonical form is. "We always decouple free instructions" is easy to explain. We sometimes decouple GEPs if they happen to be used in certain ways is harder. |
Considering that LICM is a part of the canonicalization process, it totally make sense to handle all free instructions, instead of being limited in GEPs used by Loads. However, I still don't think free from getUserCost() guarantee nop. It's possible that getUserCost() returns FREE for a GEP, but the GEP cannot be folded into its user. For example, if an user of the GEP is a call instruction taking the GEP as parameter, then the GEP may not be folded in isel. ISel may also not fold a free GEP into a load if the GEP is not in the same block.
Is FREE from getUserCost() supposed to guarantee nop or expect it to be nop?
With my current implementation (check GEPs used by load in the same block), I didn't any clear performance gain in my test for spec2000/2006/2017. However, when this change applied together with my another LICM patch (D37163), observed +4% performance gain in spec2006/xalancbmk.
It's possible that getUserCost() returns FREE for a GEP, but the GEP cannot be folded into its user.
That seems like a bug. I mean, it currently does happen (because that's how getGEPCost is implemented), but in principle getUserCost should only return TCC_Free for operations which are actually free.
That seems like a bug. I mean, it currently does happen (because that's how getGEPCost is implemented), but in principle getUserCost should only return TCC_Free for operations which are actually free.
If we have to guarantee a real free for TCC_FREE, this must be unexpected behaviors. I will fix this first, and we can cover all free instructions here.
Instead of being restricted on GEP, allow this for all zero-cost instructions. Please take a look and let me know any comment.
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
781–782 | I see no reason to bifurcate the API by adding the Users parameter only to getGEPCost. getGEPCost may be the only case where we currently check users, but there's no reason that might not change in the future. Regardless, there's no reason to expose this distinction to users of the API. Please add the Users parameter to getUserCost here, and then pass it through to getGEPCost below. Then, in LICM, you can just call getUserCost for everything. |
Use only getUserCost in LICM by adding an overload of getUserCode taking Users as its parameter.
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
826 | Before this change, we had the implementation of getUserCost(User*, ArrayRef<const Value *>) in TargetTransformInfoImpl.h. So, it was okay to call getUserCost(User, Operands) here. In this change I moved the implementation of getUserCost(User*, ArrayRef<const Value *>) to TargetTransformInfo.h as a helper function. Now, we have implementation of getUserCost(User*, ArrayRef<const Value *>, ArrayRef<const User *>), so I make this change. Do you think we need to move implementation of getUserCost(User*, ArrayRef<const Value *>) in TargetTransformInfoImpl.h? Then, we might need to override each of getUserCost() for targets overriding this function (e.g., X86TargetTransformInfo and HexagonTargetTransformInfo). |
lib/Transforms/Scalar/LICM.cpp | ||
---|---|---|
713 | As r314923 was reverted, we cannot simply use getUserCost for all instructions because it optimistically assume that a GEP will fold into addressing mode regardless of its users. I don't think we can rely on this optimistic assumption in here. To handle GEPs properly in this change, we can check GEP's users here directly, or we can add a function in TTI to see if an instruction is really foldable. |
As r314923 was reverted, we cannot rely on getUserCost in the foldability check for GEPs. For now, I checked GEP's users in LICM directly. Please let me know if you prefer other way around.
Thanks for revisiting this.
Made some changes like the way we iterate users in sink(), while rebasing it .
I see no reason to bifurcate the API by adding the Users parameter only to getGEPCost. getGEPCost may be the only case where we currently check users, but there's no reason that might not change in the future. Regardless, there's no reason to expose this distinction to users of the API. Please add the Users parameter to getUserCost here, and then pass it through to getGEPCost below. Then, in LICM, you can just call getUserCost for everything.