This is an archive of the discontinued LLVM Phabricator instance.

Fix IR/Analysis layering issue in OptBisect
ClosedPublic

Authored by rtrieu on Feb 19 2019, 2:08 PM.

Details

Summary

OptBisect, a class in IR, depends on classes in Analysis. Analysis depends on IR, which creates a dependency loop. This patch attempts a minimal change to fix the issue. OptPassGate will have an isEnabled function and shouldRunPass only called if true is returned. shouldRunPass is simplified down to one function instead of several overloaded ones.

Diff Detail

Repository
rL LLVM

Event Timeline

rtrieu created this revision.Feb 19 2019, 2:08 PM
ormris added a subscriber: ormris.Feb 19 2019, 2:16 PM

The need for generic way of getting textual description for IRUnit is obvious both in legacy and new pass manager.
However in this particular approach I dont like how "const void*" is used to erase specific type of IRUnit.

In new pass manager there was a similar task of erasing the type of IRUnit for pass-instrumentation purposes.
And there we decided to wrap the pointer into llvm::Any.

I'm not sure llvm::Any is the best approach here, but it sounds better than absolute anarchy of const void*.

The need for generic way of getting textual description for IRUnit is obvious both in legacy and new pass manager.
However in this particular approach I dont like how "const void*" is used to erase specific type of IRUnit.

In new pass manager there was a similar task of erasing the type of IRUnit for pass-instrumentation purposes.
And there we decided to wrap the pointer into llvm::Any.

I'm not sure llvm::Any is the best approach here, but it sounds better than absolute anarchy of const void*.

Fair.

Alternatively - how would you feel about callers passing in the strings directly?

return !BisectEnabled || checkPass(P->getPassName(), P->getDescription(U));

The Pass and Unit are only used to retrieve the Pass Name and Description (passing the Unit straight back to the Pass that just passed it in) - so why not pass in those (Pass Name and Description) instead?

I guess the way it is now, those two strings are not computed if Bisect is disabled, so there's some efficiency gain - any sense whether that's significant/important?

Alternatively - how would you feel about callers passing in the strings directly?

return !BisectEnabled || checkPass(P->getPassName(), P->getDescription(U));

The Pass and Unit are only used to retrieve the Pass Name and Description (passing the Unit straight back to the Pass that just passed it in) - so why not pass in those (Pass Name and Description) instead?

You mean e.g. changing

!F.getContext().getOptPassGate().shouldRunPass(this, &F))

into

!F.getContext().getOptPassGate().shouldRunPass(this->getName(), F.getDescription()))

Yep, that would resolve the layering issue w/o requiring opaque pointers.

Btw, in new-pm's pass instrumentation we do pass Pass Name instead of a pointer to pass exactly because of this layering issue.
However, note, that there we still pass IR unit itself (wrapped into llvm::Any) instead of "description"
because instrumentations framework is generic and instrumentation functionality is not predefined.
And yet, specific instrumentations like PrintIR do need IRUnit's description and right now every instrumentation does it on its own.

If we could implement a generic way of querying IRUnit description (w/o introducing a layering issue), it would
be beneficial both for legacy and new pm implementations.

I guess the way it is now, those two strings are not computed if Bisect is disabled, so there's some efficiency gain - any sense whether that's significant/important?

I dont believe once-per-pass-invocation name query is anything essential we need to worry about.

Alternatively - how would you feel about callers passing in the strings directly?

return !BisectEnabled || checkPass(P->getPassName(), P->getDescription(U));

The Pass and Unit are only used to retrieve the Pass Name and Description (passing the Unit straight back to the Pass that just passed it in) - so why not pass in those (Pass Name and Description) instead?

You mean e.g. changing

!F.getContext().getOptPassGate().shouldRunPass(this, &F))

into

!F.getContext().getOptPassGate().shouldRunPass(this->getName(), F.getDescription()))

Yep, that would resolve the layering issue w/o requiring opaque pointers.

Btw, in new-pm's pass instrumentation we do pass Pass Name instead of a pointer to pass exactly because of this layering issue.
However, note, that there we still pass IR unit itself (wrapped into llvm::Any) instead of "description"
because instrumentations framework is generic and instrumentation functionality is not predefined.
And yet, specific instrumentations like PrintIR do need IRUnit's description and right now every instrumentation does it on its own.

If we could implement a generic way of querying IRUnit description (w/o introducing a layering issue), it would
be beneficial both for legacy and new pm implementations.

OK - so sounds like you have a slight preference for keeping the description callback (though that requires also passing in the pass, not just the pass name) & using llvm::Any?

Or keeping the description API (maybe removing its virtuality until that's needed in the new PM version?) but having passes call it locally themselves, rather than being called back from OptBisect?

Keep in mind that for the new PM we're talking about an API that's much more powerful, because it's designed for arbitrary hooks.

OptBisect doesn't need that necessarily. I'm in favor of passing the strings instead.

Keep in mind that for the new PM we're talking about an API that's much more powerful, because it's designed for arbitrary hooks.
OptBisect doesn't need that necessarily. I'm in favor of passing the strings instead.

Yep, thats true.
Was just pointing out that new PM might benefit from generalization, if anybody decides to do it ;)

I'm fine with passing the strings.

Last time, this was brought up, passing strings was deemed too expensive since it would cause a string to be created every time:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/122032.html

There's a difference between getting the pass name and the IR unit name. Pass names are hard-coded string literals which can be cheaply referred to by a StringRef. IR units need to create a new string with the description (except Loop and Region, which don't have interesting descriptions yet.)

rtrieu updated this revision to Diff 187723.Feb 20 2019, 8:54 PM

Change void pointers to llvm::Any. Update some functions to fix const issues.

Last time, this was brought up, passing strings was deemed too expensive since it would cause a string to be created every time:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/122032.html

There's a difference between getting the pass name and the IR unit name. Pass names are hard-coded string literals which can be cheaply referred to by a StringRef. IR units need to create a new string with the description (except Loop and Region, which don't have interesting descriptions yet.)

Fair enough - thanks for the reference/thread!

Makes sense to me (& yeah, doing this for every Function, and the complicated CGSCC description, etc - at least for Function/Module/BasicBlock passing as a Twine might suffice to delay some/enough of the computation, but CGSCC is a bit too verbose for that).

The only other option would be passing a functor (either std::function, or making "shouldRunPass" a function template) which would avoid the need for the virtuality, Any usage, etc. *shrug*

I don't feel too strongly from the available options - but hope someone with a closer connection/appreciation for these classes might have an opinion about whether adding the extra virtual function to Pass is problematic (philosophically or practically) or not.

For the record, I'm not sure about the previous decision that passing strings is too expensive. On the one hand, bisection is almost always disabled and is disabled in 100% of cases where anyone cares about compile time (though that may not be true of all opt gate use cases). On the other hand, even for a relatively large file with many functions and loops this is going to be on the order of 200,000 calls to this interface. Sure, building 200,000 strings isn't cheap, but in a compilation that's running 200,000 optimization passes it's minuscule.

I'm OK with either solution, and the latest revision looks good to me. I just don't think we need to go to any great lengths just to avoid building a string before running a pass. If we're really concerned about the string cost, an OptPassGate::isEnabled() call would likely accomplish the same thing.

I don't feel too strongly from the available options - but hope someone with a closer connection/appreciation for these classes might have an opinion about whether adding the extra virtual function to Pass is problematic (philosophically or practically) or not.

And here will I chime yet again with connection to NewPM (where, say, Passes do not have virtual functions).

... Looking at how this starts using Any wrapper I feel more and more motivated to create an IRUnit set of interfaces
that provide a generic handling for wrapping/unwrapping/description generation etc of IRUnits.
Doing it via virtual functions of Passes seems to be inelegant at best.

rtrieu updated this revision to Diff 188499.Feb 26 2019, 9:14 PM
rtrieu edited the summary of this revision. (Show Details)

Removed the virtual function from Pass. Use OptPassGate::isEnabled to gate the construction of the strings on the caller side.

fedor.sergeev accepted this revision.Feb 27 2019, 3:50 AM

I do like the final result!
See my comment, but its good to go even w/o my suggestion.

include/llvm/IR/OptBisect.h
29 ↗(On Diff #188499)

So, since we went over changing the interface to use StringRef for description, can we move one more step further
in that direction and start passing StringRef for Pass?
That would make this interface fully ready for new pass manager.

I can do it later on myself, but doing it here would lessen the churn by doing one change in interface, not two of them.

(it will not add any more overhead since pass name is always available as a string literal)

This revision is now accepted and ready to land.Feb 27 2019, 3:50 AM
philip.pfaffe added inline comments.Feb 27 2019, 4:30 AM
include/llvm/IR/OptBisect.h
29 ↗(On Diff #188499)

I concur with @andrew.w.kaylor that the performance hit from building the IRDescription will probably be fine. Nevertheless, it should be much cheaper to pass a Twine here instead! Additionally there should be a comment here explaining IRDescription.

dblaikie added inline comments.Feb 27 2019, 11:17 AM
include/llvm/IR/OptBisect.h
29 ↗(On Diff #188499)

I'd probably still suggest doing it as two changes - independent changes in independent commits and all that. But I don't feel especially strongly about it/this isn't code I have much ownership of/investment in.

As for Twine - not sure that would be relevant here. The most expensive ones don't benefit from Twine anyway (only those where "getDescription" could be removed, and the implementation written inline in the call to shouldRunPass could be Twine-able) and now that the condition where the string isn't needed is hoisted out before the call - once we're at the point of producing the description we know we're going to need it in a string, so there's nothing to be gained by delaying through a Twine, I think?

Sounds like this is good with a comment on the IRDescription. I'll commit this for now and leave the suggestions for follow-ups. There's two follow-ups if any is interested in perusing:

  1. Change shouldRunPass to take a StringRef instead of Pass pointer.

@fedor.sergeev says this will work better with the new PM, however this will also take away the ability of OptPassGate to use information in the Pass to decide to run the pass. Notably, LegacyPassManagerTest.cpp uses it to only run on Module IR units.

  1. Change IRDescription from StringRef to Twine.

This looks possible for all the getDescription functions and may reduce a bit of time from skipping string construction. All the strings returned from getDescription are created from a mix of string literals and StringRef's, which should handle. Then shouldRunPass and checkPass would also need to have their second parameters changed to Twine. Since we are already gating the string construction on isEnabled, this will provide benefit only in a few cases.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2019, 8:00 PM