I'm not sure an API this general is the most relevant. I would have expected something more tightly tied to the loop itself instead of the region of code which contains the loop. Thinking about how transforms might want to use this, I suspect a much tighter definition would be useful.
Consider a transform which figures out a condition C under which loop L is a noop. Modifying the guard (as defined here) would not be valid, but a guard which controls the preheader directly w/no side effects possible inside the guarded region, but outside the loop would be exactly what was needed. Not sure what other transforms need off the top of my head, but I think that needs to be explored.
Tue, Jul 16
Mon, Jul 15
- Rename InvalidTripCount to UnknownTripCount as per review comments.
Address the following review comments:
- Create a function template to report successful and missed loop fusion opportunities.
- Cleanup debug metadata in the test cases.
- Address other minor comments (pre-increment of stats, function name, missing asserts).
Thu, Jun 27
Wed, Jun 26
Jun 3 2019
This landed in https://reviews.llvm.org/rL340770.
May 30 2019
I agree with this change, conceptually. Have you done any performance measurements to see what the impact is?
May 23 2019
May 22 2019
Aside from a few minor comments, this LGTM.
May 17 2019
I discussed this with @Whitney and I think at this point it makes sense to remove the getGuard and getPotentialGuard methods as there still seems to be some discussions about the right approach for that.
Aside from that, I think everything else looks pretty good, aside from some specific style-related comments.
May 15 2019
I think this looks straightforward, as long as people agree to have a separate CreateConstrained* version of these functions. I'm not qualified to weigh in on that as I don't know Clang at all and can't comment about the tradeoffs (although I think they have been well articulated in the discussion). From what I can see, that is the only thing blocking this from getting approved (unless there is something else I missed while reading the discussion).
A few minor comments about commoning asserts and using dyn_cast instead of cast<>.
Aside from that, I think this looks good. That said, I'm by no means an expert in this area so don't feel I'm qualified to give a final approval to commit.
May 14 2019
Aside from two minor comments, I think this looks fine.
However, I don't think I'm qualified to give the final approval for this to land as I'm just starting to learn the background here.
May 13 2019
Apr 30 2019
Apr 24 2019
Apr 23 2019
Apr 17 2019
Apr 16 2019
All comments have been addressed. I will be committing this soon.
Apr 11 2019
This landed in https://reviews.llvm.org/rL340795.
Mar 26 2019
The only outstanding issue was from @Meinersbur regarding the constructor for FusionCandidate. The concern was keeping the logic in the constructor, vs moving it out into a createFusionCandidate method to wrap the constructor and add all the logic to that method. I don't have a preference one way or the other, but will happily make the change if people feel strongly about it.
Mar 25 2019
Address review comments as follows:
- Strengthen dependence checks between any value used in Loop 1 to ensure it is not defined in loop 0.
- Do not add a tree update to insert an edge between FC0.Latch and FC1.Header when FC0.ExitingBlock is the same as FC0.Latch. This edge has already been inserted into the tree updates earlier and will cause an assertion to fail while performing the updates.
- Rename ExitingBlocks to ExitingBlock since we are only concerned with loops with a single exiting block.
Mar 22 2019
Mar 1 2019
Addressed most of the review comments.
Addressed some more comments. I'll upload a new patch momentarily.
Feb 12 2019
I've addressed almost all of the comments.
I will work on an update to the problem that dmgreen provided an example for and post a new version of the patch when that is done.
In the meantime, if my understanding of the comments is incorrect, someone please correct me :)
Jan 18 2019
You probably realize this, but the link https://llvm.org/LICENSE.txt currently doesn't work ;)
Are you planning on including a copy of the LICENSE.txt file in the source tree as well?
Jan 17 2019
Addressed comments from dmgreen.
I've addressed most of these comments (except the ones I have some questions about). I will be refreshing the patch momentarily.
Dec 20 2018
Addressed all but two of the review comments.
The only outstanding comments are the use of std::set (which I tried to address with comments) and the logic inside the FusionCandidate constructor (which I'm waiting for feedback on the review).
Responded to two of the comments. Most of the others will be addressed in the next revision, which I should hopefully be uploading soon.
Dec 18 2018
Aug 27 2018
Aug 20 2018
Actually can you explain the removal of llvm/test/tools/dsymutil/PowerPC/sibling.test?
I personally can't think of a reason why not to do this. But I think sending an RFC to llvm-dev is in order here.
I'm not overly familiar with this. I think @sfertile is more familiar with this, so I've added him as a reviewer.
Jul 24 2018
Created a new function (isSafeForIPRA) to consistently check whether IPRA can be performed. This check is used in RegUsageInfoCollector as well as isSafeForNoCSROpt. I also went back to the original checks in isSafeForNoCSROpt to check for LocalLinkage.
Jul 17 2018
Jul 5 2018
Thanks @vivekvpandya for getting back to me.
I'll work on updating the patch now. I think we're on the same page with the potential problems.
Essentially, there seems to be some assumptions that when EnableIIPRA is set, it will work properly and add additional information that can be used. Now, with the early exit due to isDefinitionExact, that is not necessarily the case. So, I'm trying to put a check in to validate that IPRA was successful before using the information. This way, if we extend the checks in IPRA in the future, we don't need to continually replicate them in other places that rely on IPRA being successful.
Jun 27 2018
@vivekvpandya You're right - the change will allow noCSROpt to run on functions with external linkage, which is not correct.
I think what we really want here is the isSafeForNoCSROpt to check for the existing conditions (i.e., prior to this patch, when it looks for local linkage) and also whether IPRA was successful for this function. I added the isDefinitionExact() call, to try and replicate the necessary conditions for IPRA, but now I'm wondering if it would be better to query the PRUI and make sure it contains a valid RegMask for the given function. Does this seem reasonable to you?
Jun 25 2018
Jun 6 2018
Added the isDefinitionExact check to isSafeForNoCSROpt method.
Updated the test case to use -print-regusage to check for exact register clobbers computed by IPRA.
Change looks good to me. But please fix the new test case as suggested by @nemanjai
In this new test do you see improvement in register allocation due to IPRA? If test-case does not show any benefit due to IPRA then it is not suitable test for this patch.
Also I hope this patch is also tested with other architecture which have LIT based test-cases for IPRA. It is very unlikely that such test-case will have function with linkage type which return true for isInterposableLinkage() but if such case is there then this change may break such tests.
Apr 4 2018
Oct 24 2017
Oct 23 2017
Aug 22 2017
Did @nemanjai comment about the vector extracts get addressed?
I would change the summary of this review to reflect the fact we are moving this pass back to the PPC target.
Aside from that, this LGTM.
Aside from some minor changes to the comments, this LGTM.
Jul 10 2017
LGTM. Just some minor updates in the comments.
Jun 13 2017
Jun 12 2017
Jun 5 2017
I'm OK with this patch and then a follow-up patch to enable the new behaviour by default.
@arsenm Are you OK with this approach or do you prefer a different approach?
This looks OK to me, but I don't have enough knowledge of SROA to accept this.
If I understand this correctly, you're trying to use the tryBitfieldInsert before the tryBitPermutation function because you can get cleaner code out of some cases of tryBitfieldInsert. However, you only want to use the tryBitfieldInsert on some cases, not on all. Is this correct?
If so, I think it makes sense to refactor the tryBitfieldInsert to focus on the specific cases you want to try and identify and then call the existing tryBitPermutation and the remaining tryBitfieldInsert afterwards. There may be some code duplication as a result (hopefully that can be minimal) but the logic will be easier to understand.
May 25 2017
Aside from a spelling mistake, LGTM.
May 16 2017
Thanks for adding the additional test case!
May 12 2017
May 4 2017
Aside from two minor comments, this LGTM.
Please make sure to update the release notes when this gets committed.
Please add the PR that this problem is addressing to the summary.