Add an extension point allowing late loop optimization passes.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Actually, an extension point would not work, since extensions need to be added from outside of the PassManagerBuilder. There would have to be another way to collect them and add them to the pipeline.
This is the simplest way to insert the pass. This is very close to what we actually do. We'd need something like this, which would allow inserting target-specific passes from within the builder.
If we wanted to allow targets to add target-specific extensions, we could add a function to TargetMachine which takes a PassManagerBuilder, and make clang call it. And it's easy enough to add extension points if we need them.
That said, I'm not sure that's really relevant for your particular use-case; the transformation presented here is basically target-independent. The only issue is that there isn't a target-independent equivalent to hexagon_M4_pmpyw (but equivalent instructions do exist on many targets; x86 has PCLMULQDQ, ARM has VMULL.P64, etc.).
That's actually a good point!
That said, I'm not sure that's really relevant for your particular use-case; the transformation presented here is basically target-independent. The only issue is that there isn't a target-independent equivalent to hexagon_M4_pmpyw (but equivalent instructions do exist on many targets; x86 has PCLMULQDQ, ARM has VMULL.P64, etc.).
True, but there is other stuff in that file as well that only applies to Hexagon. Plus posting of this patch was motivated by a post on llvm-dev asking about this exact thing (i.e. target-specific loop idioms).
I assume you just posted the patch as a straw man to go with the llvm-dev discussion or is it something you intend to push forward?
include/llvm/Transforms/IPO/PassManagerBuilder.h | ||
---|---|---|
119 ↗ | (On Diff #84383) | Does not seems something we want in the PMBuilder. |
Both. We have a need for this and there is now someone else interested in this kind of capability. Hal wanted to see a use case, and I think this fits the bill.
include/llvm/Transforms/IPO/PassManagerBuilder.h | ||
---|---|---|
119 ↗ | (On Diff #84383) | I plan to go forward with Eli's suggestion. This was put here just to get this to compile and pass tests. |
Implement Eli's suggestion: add a TM hook to insert target-specific loop idiom recognition passes.
lib/Transforms/IPO/PassManagerBuilder.cpp | ||
---|---|---|
320 ↗ | (On Diff #84557) | This extension point could get its own separate patch. It seems very targeted, if the only point is to have TargetSpecific loop idiom recognition, then why aren't some TTI implemented and used in LoopIdiomPass? |
lib/Transforms/IPO/PassManagerBuilder.cpp | ||
---|---|---|
320 ↗ | (On Diff #84557) | I thought about suggesting this, but didn't because: a. It is not clear what from the existing loop idiom recognition pass could be reused What do you think? |
lib/Transforms/IPO/PassManagerBuilder.cpp | ||
---|---|---|
320 ↗ | (On Diff #84557) | These are good points!
Main point here: the insertion point in the pipeline :)
No sure what you mean by "awkward" with a TTI needing analysis, but that would definitely be a problem if the TTI would want an analysis that is not generic or not currently available in LoopIdiom. The reason I think discussing this extension point in its own revision is because I think we usually have "generic" and "coarse grain" extension points (do we have another case of specific hook like that?). |
include/llvm/Transforms/IPO/PassManagerBuilder.h | ||
---|---|---|
106 ↗ | (On Diff #84557) | Side note, but somehow related to the "too specific" aspect I mentioned, I'd call it EP_BeforeLoopIdiom and I'd describe it: /// This extension point allows adding passes before loop-idiom /// recognition passes. It can be used for example by a target to /// add a specific loop-idiom recognition passes. |
lib/Transforms/IPO/PassManagerBuilder.cpp | ||
---|---|---|
320 ↗ | (On Diff #84557) |
That's an understandable concern, but I think that this particular case is warranted, since it is not uncommon for targets to have their optimized versions of certain loop-based operations. It can be either via target-specific runtime libraries, or directly via hardware instructions. |
include/llvm/Transforms/IPO/PassManagerBuilder.h | ||
---|---|---|
106 ↗ | (On Diff #84557) | That could certainly be done. |
Extracted the implementation of the new insertion point, with the previous comments reflected.
Sorry I didn't notice this waiting on me, and thanks for the gentle prod Krzysztof! My two cents inline.
lib/Transforms/IPO/PassManagerBuilder.cpp | ||
---|---|---|
320 ↗ | (On Diff #84557) | I think this extension point makes sense, and I wouldn't even tie it to the loop idiom recognition but instead to the position in the loop pipeline. Specifically, I'd add a "late loop canonicalization and simplification" extension point. I would put it after the normal loop idiom but before loop deletion. I think documenting it as "the last thing in the loop pipeline before deletion" makes sense. You can talk about the fact that this is where you should extend the pipeline with loop transformations that tend to *remove loops entirely*. You should also make it really clear that these need to be *loop passes* and are run in a pipeline across each loop in the nest. And ideally get some assert failure in the extension mechanism that enforces this. Having this accidentally add a function pass will really dramatically change the loop pipeline by partitioning deletion from the rest. And as Mehdi said, I would land this as its own patch, and then to the TargetMachine patch separately. And just to clarify, I would be very concerned about a TTI callback for the reasons Krzysztof mentioned... Especially the needing another analysis. I just think it'll get awkward fast and injecting passes is pretty clean and testable. |
tools/opt/opt.cpp | ||
292–301 ↗ | (On Diff #84578) | If we go this way, I worry we're just bugging for a third... What about instead you call a method on TargetMachine with the builder that can add any extensions it wants. Then all of this gets sunk into the TM. You can subsume both early-as-possible and the late-loop extensions into this one API. Does that make sense to others? |
I think we have reached a consensus here. Could someone approve if there are no further concerns?