Page MenuHomePhabricator

Add loop pass insertion point EP_LateLoopOptimizations
ClosedPublic

Authored by kparzysz on Jan 13 2017, 1:03 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 84357.Jan 13 2017, 1:03 PM
kparzysz retitled this revision from to Hexagon-specific loop idiom recognition.
kparzysz updated this object.
kparzysz added a reviewer: hfinkel.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added subscribers: mehdi_amini, llvm-commits.

Actually, an extension point would not work, since extensions need to be added from outside of the PassManagerBuilder. There would have to be another way to collect them and add them to the pipeline.

kparzysz updated this revision to Diff 84383.Jan 13 2017, 2:19 PM

This is the simplest way to insert the pass. This is very close to what we actually do. We'd need something like this, which would allow inserting target-specific passes from within the builder.

If we wanted to allow targets to add target-specific extensions, we could add a function to TargetMachine which takes a PassManagerBuilder, and make clang call it. And it's easy enough to add extension points if we need them.

That said, I'm not sure that's really relevant for your particular use-case; the transformation presented here is basically target-independent. The only issue is that there isn't a target-independent equivalent to hexagon_M4_pmpyw (but equivalent instructions do exist on many targets; x86 has PCLMULQDQ, ARM has VMULL.P64, etc.).

If we wanted to allow targets to add target-specific extensions, we could add a function to TargetMachine which takes a PassManagerBuilder, and make clang call it. And it's easy enough to add extension points if we need them.

That's actually a good point!

That said, I'm not sure that's really relevant for your particular use-case; the transformation presented here is basically target-independent. The only issue is that there isn't a target-independent equivalent to hexagon_M4_pmpyw (but equivalent instructions do exist on many targets; x86 has PCLMULQDQ, ARM has VMULL.P64, etc.).

True, but there is other stuff in that file as well that only applies to Hexagon. Plus posting of this patch was motivated by a post on llvm-dev asking about this exact thing (i.e. target-specific loop idioms).

I assume you just posted the patch as a straw man to go with the llvm-dev discussion or is it something you intend to push forward?

include/llvm/Transforms/IPO/PassManagerBuilder.h
119

Does not seems something we want in the PMBuilder.

I assume you just posted the patch as a straw man to go with the llvm-dev discussion or is it something you intend to push forward?

Both. We have a need for this and there is now someone else interested in this kind of capability. Hal wanted to see a use case, and I think this fits the bill.

kparzysz added inline comments.Jan 14 2017, 3:27 PM
include/llvm/Transforms/IPO/PassManagerBuilder.h
119

I plan to go forward with Eli's suggestion. This was put here just to get this to compile and pass tests.

kparzysz updated this revision to Diff 84557.Jan 16 2017, 6:56 AM
kparzysz added a reviewer: eli.friedman.

Implement Eli's suggestion: add a TM hook to insert target-specific loop idiom recognition passes.

mehdi_amini added inline comments.Jan 16 2017, 9:43 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
320

This extension point could get its own separate patch.

It seems very targeted, if the only point is to have TargetSpecific loop idiom recognition, then why aren't some TTI implemented and used in LoopIdiomPass?

hfinkel added inline comments.Jan 16 2017, 9:49 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
320

I thought about suggesting this, but didn't because:

a. It is not clear what from the existing loop idiom recognition pass could be reused
b. The target-specific logic might want access to other analyses, and that would be awkward if it is just a TTI callback.

What do you think?

mehdi_amini added inline comments.Jan 16 2017, 10:09 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
320

These are good points!

It is not clear what from the existing loop idiom recognition pass could be reused

Main point here: the insertion point in the pipeline :)

The target-specific logic might want access to other analyses, and that would be awkward if it is just a TTI callback.

No sure what you mean by "awkward" with a TTI needing analysis, but that would definitely be a problem if the TTI would want an analysis that is not generic or not currently available in LoopIdiom.

The reason I think discussing this extension point in its own revision is because I think we usually have "generic" and "coarse grain" extension points (do we have another case of specific hook like that?).
I'm cautious about it because adding such hooks makes it harder to evolve/maintain the pipeline (ultimately what when we'll have a hook before/after every single pass?).

mehdi_amini added inline comments.Jan 16 2017, 10:18 AM
include/llvm/Transforms/IPO/PassManagerBuilder.h
108

Side note, but somehow related to the "too specific" aspect I mentioned, I'd call it EP_BeforeLoopIdiom and I'd describe it:

/// This extension point allows adding passes before loop-idiom 
/// recognition passes. It can be used for example  by a target to
/// add a specific loop-idiom recognition passes.
kparzysz added inline comments.Jan 16 2017, 10:20 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
320

I'm cautious about it because adding such hooks makes it harder to evolve/maintain the pipeline (ultimately what when we'll have a hook before/after every single pass?).

That's an understandable concern, but I think that this particular case is warranted, since it is not uncommon for targets to have their optimized versions of certain loop-based operations. It can be either via target-specific runtime libraries, or directly via hardware instructions.
Since this insertion point is shared with that of the target-independent loop idiom recognizer, the implementations would also share the same ways of handling the code. That would make it easier to move code from a target-specific version to the target-independent pass, or conversely, to take a target-independent code and adopt it for the needs of a given target.

kparzysz added inline comments.Jan 16 2017, 10:22 AM
include/llvm/Transforms/IPO/PassManagerBuilder.h
108

That could certainly be done.

kparzysz updated this revision to Diff 84578.Jan 16 2017, 10:53 AM
kparzysz retitled this revision from Hexagon-specific loop idiom recognition to Add pass insertion point EP_BeforeLoopIdiom.
kparzysz updated this object.
kparzysz added a reviewer: mehdi_amini.

Extracted the implementation of the new insertion point, with the previous comments reflected.

chandlerc edited edge metadata.Jan 20 2017, 1:08 AM

Sorry I didn't notice this waiting on me, and thanks for the gentle prod Krzysztof! My two cents inline.

lib/Transforms/IPO/PassManagerBuilder.cpp
320

I think this extension point makes sense, and I wouldn't even tie it to the loop idiom recognition but instead to the position in the loop pipeline.

Specifically, I'd add a "late loop canonicalization and simplification" extension point. I would put it after the normal loop idiom but before loop deletion. I think documenting it as "the last thing in the loop pipeline before deletion" makes sense. You can talk about the fact that this is where you should extend the pipeline with loop transformations that tend to *remove loops entirely*.

You should also make it really clear that these need to be *loop passes* and are run in a pipeline across each loop in the nest. And ideally get some assert failure in the extension mechanism that enforces this. Having this accidentally add a function pass will really dramatically change the loop pipeline by partitioning deletion from the rest.

And as Mehdi said, I would land this as its own patch, and then to the TargetMachine patch separately.

And just to clarify, I would be very concerned about a TTI callback for the reasons Krzysztof mentioned... Especially the needing another analysis. I just think it'll get awkward fast and injecting passes is pretty clean and testable.

tools/opt/opt.cpp
294–299

If we go this way, I worry we're just bugging for a third...

What about instead you call a method on TargetMachine with the builder that can add any extensions it wants. Then all of this gets sunk into the TM. You can subsume both early-as-possible and the late-loop extensions into this one API. Does that make sense to others?

kparzysz added inline comments.Jan 20 2017, 6:41 AM
tools/opt/opt.cpp
294–299

It makes sense to me. There is already a review for doing that (by someone else): D28336.

kparzysz updated this revision to Diff 85160.Jan 20 2017, 11:00 AM
kparzysz retitled this revision from Add pass insertion point EP_BeforeLoopIdiom to Add loop pass insertion point EP_LateLoopOptimizations.
kparzysz edited the summary of this revision. (Show Details)

Isolated the extension point alone in this patch.

I think we have reached a consensus here. Could someone approve if there are no further concerns?

hfinkel accepted this revision.Jan 25 2017, 7:25 AM

I think we have reached a consensus here. Could someone approve if there are no further concerns?

LGTM.

This revision is now accepted and ready to land.Jan 25 2017, 7:25 AM
This revision was automatically updated to reflect the committed changes.