This is an archive of the discontinued LLVM Phabricator instance.

[PM][FunctionAttrs] add NoUnwind attribute inference to PostOrderFunctionAttrs pass
ClosedPublic

Authored by fedor.sergeev on Mar 13 2018, 4:11 AM.

Details

Summary

This was motivated by absence of PrunEH functionality in new PM.
It was decided that a proper way to do PruneEH is to add NoUnwind inference
into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top.

This change generalizes attribute handling implemented for (a removal of)
Convergent attribute, by introducing a generic builder-like class

AttributeInferer

It registers all the attribute inference requests, storing per-attribute
predicates into a vector, and then goes through an SCC Node, scanning all
the instructions for not breaking attribute assumptions.

The main idea is that as soon all the instructions from all the functions
of SCC Node conform to attribute assumptions then we are free to infer
the attribute as set for all the functions of SCC Node.

It handles two distinct cases of attributes:

  • those that might break due to derefinement of the function code

    for these attributes we are allowed to apply inference only if all the functions are "exact definitions". Example - NoUnwind.
  • those that do not care about derefinement

    for these attributes we are allowed to apply inference as soon as we see any function definition. Example - removal of Convergent attribute.

Also in this commit:

  • Converted all the FunctionAttrs tests to use FileCheck and added new-PM invocations to them
  • FunctionAttrs/convergent.ll test demonstrates a difference in behavior between new and old PM implementations. Marked with FIXME.
  • PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg combo as intended
  • some of "other" tests were updated since function-attrs now infers 'nounwind' even for old PM pipeline
  • -disable-nounwind-inference hidden option added as a possible workaround for a supposedly rare case when nounwind being inferred by default presents a problem

Diff Detail

Event Timeline

fedor.sergeev created this revision.Mar 13 2018, 4:11 AM
fedor.sergeev edited the summary of this revision. (Show Details)Mar 13 2018, 4:12 AM

cleaning up some tests, still a couple FIXMEs in tests

all the tests updated, one FIXME left on convergent.ll test as intended

fedor.sergeev edited the summary of this revision. (Show Details)Mar 13 2018, 10:10 AM

Hi, thanks for the patch.

I like the overall approach of making this pass generic. But I have some high-level comments on the patch:

  • Is it possible to explain in the code itself everything a user needs to know to understand the code? For example, I could not find where we define "derefinable", or derefinement. More broadly, it is not easy for me to follow the overall structure of this code just by reading the code.
  • It's a nit, but I would prefer to write all comments in proper English, so starting with capital letters, ending with periods (if they are complete sentences), using apostrophes in contractions (e.g. cant -> can't), etc. I think this will make it easier for others to read.
  • I'm somewhat confused by our notion of some instructions being "invalid". Perhaps that should have a different name.
  • I'm a bit concerned about the overhead of calling multiple virtual functions (std::function) for every instruction in the module, especially since our loop structure (which is the correct one!) does not have one easily-predicted virtual function call target, but instead iterates between call targets. I know we go through a lot of pain elsewhere in LLVM to avoid this. Since the list of functors is static, I wonder if we shouldn't metaprogram this. Using llvm::integer_sequence would make it not *too* bad.

    But maybe that's a premature optimization; I definitely don't want to do it if there's no benefit. Perhaps @chandlerc can comment on this.

Sorry for delay sending these commetns -- i had them typed up and didn't mash send. =[[[

Also, to address (briefly) what Justin mentioned: I think getting cache locality of single visit of instructions is likely to be more valuable than the overhead of the indirect calls. Could be wrong, but we can also look to see if there is a problem.

lib/Transforms/IPO/FunctionAttrs.cpp
1048–1049

This should be a (more expansive I suspect) doxygen comment on the type.

1050–1055

The tuple here should almost certainly be a little struct to simplify the code. At that point, I don't know that you need the type aliases.

1060–1066

I think this should be named more along the lines of registering something as it doesn't actually do inference here.

Currently, the doxygen comment doesn't add much value. Instead, I would suggest that the doxygen comment should explain what the semantics of these predicates are, how they are used, and mention what is different about the derefinement.

1081

This routine is only going to be run once over an SCC. I think that suggests a much simpler implementation.

As we process the SCC nodes and the instructions within each, we can use a remove_if (or better erase_if) pattern so that as predicates fail, we simple remove the struct w/ the callbacks for that attribute.

Then at the end, we run all the callbacks that remain.

This should allow you to not have index based walks or the bit vectors.

Also, to address (briefly) what Justin mentioned: I think getting cache locality of single visit of instructions is likely to be more valuable than the overhead of the indirect calls. Could be wrong, but we can also look to see if there is a problem.

Totally agree, but I think I wasn't clear, that's a different comparison from the one I was trying to make. The question I was trying ask is, given the current loop structure (which I think you're right, is the right structure), is the overhead of indirect calls vs direct (possibly inlined) calls (generated via metaprogramming) significant?

Also, to address (briefly) what Justin mentioned: I think getting cache locality of single visit of instructions is likely to be more valuable than the overhead of the indirect calls. Could be wrong, but we can also look to see if there is a problem.

Totally agree, but I think I wasn't clear, that's a different comparison from the one I was trying to make. The question I was trying ask is, given the current loop structure (which I think you're right, is the right structure), is the overhead of indirect calls vs direct (possibly inlined) calls (generated via metaprogramming) significant?

Probably not? But maybe?

That said, generating direct calls here seems really, really hard... Maybe you have an idea I don't see...

The only way I see to make them not indirect, we would need to have heterogeneous collection of callbacks for the various attributes and iterate across it heterogeneously.... Essentially, we'd need to take each collection of callbacks in the constructor so we can deduce all of their types and the number of them in one go, build a tuple of them, and then have the 'run' thing actually use tuple-iteration (maybe w/ a variadic function template) instead of a loop. The complexity of that code just didn't seem worth it w/o a benchmark...

generating direct calls here seems really, really hard... Maybe you have an idea I don't see...

It's...not awful. https://godbolt.org/g/D3KMQH

#include <array>
#include <vector>
#include <utility>

struct Instruction;

struct FooDescriptor {
  static bool IsGood(Instruction* I) { return true; }
};

struct BarDescriptor {
  static bool IsGood(Instruction* I) { return false; }
};

template <typename... Descriptors>
struct AttrInferer {
    void Foo(std::vector<Instruction*> Instrs) {
        for (Instruction* Instr : Instrs) {
            bool IsGood[] = {
                [&] { return Descriptors::IsGood(Instr); }()...
            };
            for (int i = 0; i < sizeof...(Descriptors); ++i) {
                
            }
        }
    }
};

void Test() {
    AttrInferer<FooDescriptor, BarDescriptor> a;
    a.Foo({});
}

Anyway if you think the virtual functions aren't a big deal, that's good enough for me.

Thanks everybody for good comments, will see into them.
Will put virtual functions issue at the very end of the queue.

  • Is it possible to explain in the code itself everything a user needs to know to understand the code? For example, I could not find where we define "derefinable", or derefinement.

I'm not sure about proper usage of this term either. It came from my discussions with Chandler, but if it sounds confusing ....
Anyway, I tried to explain the term in my commit message.

More broadly, it is not easy for me to follow the overall structure of this code just by reading the code.

You mean overall structure of AttributeInferer::run?

  • It's a nit, but I would prefer to write all comments in proper English, so starting with capital letters, ending with periods

Yeah, it is my biggest common mistake with the style :-/

fedor.sergeev added inline comments.Mar 14 2018, 4:13 PM
lib/Transforms/IPO/FunctionAttrs.cpp
1081

I can use remove/erase for the purpose of "Valid" attributes tracking.
However I still need a couple of deducible boolean facts separately tracked for each attribute -"ScanHere" and "Ready".
I can try adding those into Predicates structure and clean/set them accordingly.

Lets see how it looks like in implementation suggested by Justin (which, btw, does have an index walk ;) ).

fedor.sergeev edited the summary of this revision. (Show Details)

fixing comments, adding struct instead of tuple.
Addressed pretty much everything except of bitvectors and virtual functions,
just to have some ready base for comparison with other solutions.

fedor.sergeev marked 3 inline comments as done.Mar 15 2018, 3:02 AM

I like it.

I have a suggestions on the comments, but that's...pretty normal for me. :)

I think that if we don't go with the variadic templates thing -- which is probably a premature optimization, I trust Chandler here -- then going with Chandler's suggestion and getting rid of most or all of the bitsets would be a good simplification.

lib/Transforms/IPO/FunctionAttrs.cpp
1056

Can we expand upon what we mean by an "exact" definition?

1059

Should we say this function can be null (and if so, all functions need to be scanned)?

1069

Perhaps a different name for this vector, now that the struct has a name.

1116

This is kind of confusing right above the auto &AP = line, because it's not really describing that line...

Perhaps we could have a comment on the loop explaining at a higher level what we're doing?

1120

s/Perhaps,/Perhaps/

1122

This comment is probably obvious from the line above now.

1123
else if (...) {
}

?

1126

can't

1195

We should indicate somewhere (on the struct definition?) that if a function is skipped, it's skipped from both scanning *and* from having the attribute set on it. That was surprising to me.

1200

Perhaps instead of introducing this new "invalid" terminology, we should call it something related to one of the names in InferenceDescriptor?

1210

Perhaps this function would be clearer as:

if (I.mayThrow()) return false;
if (...) {
  if (SCCNodes.count(Callee)) {
     // I is a may-throw call to a function inside our SCC.  This doesn't invalidate our working assumption that the SCC is no-throw; we have to scan that other function.
    return false;
  }
}
// This instruction may throw, and it's not a call to another function in our SCC.  It therefore invalidates our non-throwing assumption.
return true;
1241

Suggest /*RequiresExactDefinition=*/false this way it's not ambiguous whether we're saying that false means it does require an exact definition, or that it doesn't.

1256

s/Note, that/Note that/

1271

Please re-clang-format

addressing most of Justin's comments on comments ;)

fedor.sergeev marked 11 inline comments as done.Mar 19 2018, 4:33 AM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1056

I'm not sure how can I do that w/o duplicating a huge comment for GlobalValue::isDefinitionExact.

1059

SkipFunc is not intended to be null. Comment added below to clarify the intended use.

1271

Err... clang-format behaves badly on this :(
It formats two calls of registerAttrInference differently.
So I took one format as is and hand-edited another invocation correspondingly.
Will repeat the same before integration as well.

fedor.sergeev marked an inline comment as done.Mar 19 2018, 4:50 AM

Btw, I tried Justin's version of "simplified" meta-programmed callbacks... and fully functional version of it
(one that performs early bailouts on ScanAttrHere/ValidAttrs) starts approaching Chandler's suggested tuple/tuple-iteration by complexity.
The problem with bailouts is that you cant just fill the whole array with calculated values of all the predicates,
or else you need to extend all your predicates with shortcut-ting semantics.

getting rid of bitvectors

making InstrBreaksNonThrowing a bit more readable, introducing -disable-nounwind-inference option just in case

fedor.sergeev edited the summary of this revision. (Show Details)Mar 20 2018, 3:46 AM
fedor.sergeev edited the summary of this revision. (Show Details)
fedor.sergeev marked 3 inline comments as done.

just to clarify - I believe this is ready to go :)

jlebar added inline comments.Mar 21 2018, 7:55 AM
lib/Transforms/IPO/FunctionAttrs.cpp
1056

Could we just have a pointer to GlobalValue::isDefinitionExact? As in

If true, only "exact" definitions can be used to infer this attribute.  (See GlobalValue::isDefinitionExact.)
1072

If we're going to mix both "properties" of the attribute and "state" of the algorithm in the same class, can we at least have a comment indicating that this is what we're doing, and all variables below this point are mutable state used by the algorithm?

1073

Capital

1073

s/still hold valid/are still valid/

1080

Capital

1082

I don't know how Chandler feels about it, but personally I'm not wild about the approach of sticking state into this struct. It uses the descriptor class for two quite different purposes, and perhaps worse, it's really easy to have a bug where we forget to reset ScanCurrent at the top of the loop...

I *think* Chandler's suggestion was, instead of tracking the state with a bitset, track it by removing elements from a list (or adding them to a list of "attributes to use when scanning this function").

It looks like the difficulty you face here is that the list you want to remove elements from is also the list you're iterating over? I'd be OK with pretty much any of the obvious solutions to that -- remove and adjust the loop index, use an std::list and remove in place, remove after the loop completes...

Dunno if Chandler has feels about this too.

1271

So I took one format as is and hand-edited another invocation correspondingly.

That's fine with me, so long as we don't expect that future modifications to this part of the file will preserve the format. People should be able to clang-format their changes and expect it's "good enough".

fedor.sergeev marked an inline comment as done.Mar 21 2018, 2:14 PM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1082

Well, I dont like merging descriptor with state either.
In my original implementation I had those in separate vectors, which forced me to use indices as the way to link descriptors and states.

Chandler's suggestion indeed was to remove elements.
However it does not play well with the need to have as many early exits as possible (which was a property of implementation before my changes).
Namely, there is a single meaningful vector here, a vector of descriptors, and while I can use removal from it to signify a single binary fact (e.g. Valid), there are more facts to track. Should I introduce copies of the main vector? How would it be different from having bit-vectors?

I really do not see an ideal solution here.

I can introduce a vector< pair<descriptor,state> >, which is rather clumsy.
Or I can introduce a more complicated interface to the descriptor class, with read/write accessors as needed,
and service functions that do initialization of mutables as needed.

1271

Most recent version is clang-formatted. This particular problem was solved by reordering arguments of InferenceDescriptor constructor.

fedor.sergeev marked an inline comment as done.Mar 21 2018, 2:16 PM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1080

I wonder if capitalizing "true" which is a boolean value and not an English word here is fine?

jlebar added inline comments.Mar 21 2018, 2:33 PM
lib/Transforms/IPO/FunctionAttrs.cpp
1080

I think of the boolean as *being* true, which in C++ might mean it has value true, in Python might mean it has value True, in PHP might mean it has value "True" :)... That is, I think you can say "this Python variable is true" with a lower-case "t", and that's just fine.

But that's quite a nit, whatever you like is fine. :)

1082

Should I introduce copies of the main vector?

That's what I was thinking.

How would it be different from having bit-vectors?

I think the notion was that you'd iterate over the vector that's being modified, so there's less indirection in the code.

In pseudocode:

ScannedSomeFunction = false;

SCCDescriptors = InterfaceDescriptors;
for (F : SCCNodes) {
  FnDescriptors = []
  for (Desc : SCCDescriptors) {
    if (F->isDeclaration() || (ID.RequiresExactDefinition && !F->hasExactDefinition())) {
      SCCDescriptors.Remove(Desc);
    } else if (!Desc.SkipFunction(F)) {
      FnDescriptors.Append(F);
    }
  }

  for (I : instructions(F)) {
    if (FnDescriptors.empty())
      break;

    for (Desc : FnDescriptors) {
      if (Desc.InstrBreaksAttribute(I)) {
        FnDescriptors.Remove(Desc);
	SCCDescriptors.Remove(Desc);
      }
    }
  }

  for (Desc : SCCDescriptors) {
    if (Desc.SkipFunction(F))
      continue;
    Desc.SetAttribute(F)
  }
}

I don't have SuccessfullyScanned here, but I *think* it's actually not necessary, because SkipFunction is the only reason we wouldn't scan something, and if we skipped it, we also won't set the attr on it.

fedor.sergeev added inline comments.Mar 21 2018, 2:37 PM
lib/Transforms/IPO/FunctionAttrs.cpp
1082

Ahem... that should work, indeed. Lemme try it.

fedor.sergeev edited the summary of this revision. (Show Details)

minor cleanup first

trying with erase_if
fedor.sergeev marked 2 inline comments as done.Mar 22 2018, 4:08 AM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1082

Done more or less along the lines of your suggestion, Justin (applying Chandler's advice about using erase_if).
The only catch is that I had to add AttrKind to the descriptor to be able to match descriptors in different vectors
(to implement your : FnDescriptors.Remove + SCCDescriptors.Remove).

I like the algorithm! Just comments on comments, then I think I'm happy. Dunno if @chandlerc wants to have a look or not...

lib/Transforms/IPO/FunctionAttrs.cpp
1054

Remove newline?

1057

Update comment?

1076

s/)/.)/

1103

Remove newline?

1106

... "for each of them. We'll remove attributes which aren't valid for the SCC from InferInSCC."

1116

This comment may be more confusing than not at this point.

1118

s/meeting/visiting/ (idom, sorry)

1119

has an unsuitable definition

1120

Nit, remove outer parens?

1126

Suggest moving the comment up one line and saying something like

For each attribute still in InferInSCC that doesn't skip F, check that the instructions in F are valid.
1136

Remove newline?

1140

Perhaps this can get a "because" -- it's kind of the crux of the whole algorithm, and it's quite hidden here.

1149

If you move this check to the top of the loop, then you don't need the check before the loop.

addressing Justin's comments

fedor.sergeev marked 22 inline comments as done.Mar 22 2018, 3:51 PM

more comments cleanup

fedor.sergeev marked 5 inline comments as done.Mar 22 2018, 3:55 PM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1149

Perhaps its a bit of over-optimization, but I dont want to touch instructions(F) (enter the instructions loop) just to determine that we need to break out.
I dont feel that having these two checks is detrimental to readability.

Hey, I can almost sign off on this, but two of the comments are not clear. I'm happy to provide suggestions if you like.

lib/Transforms/IPO/FunctionAttrs.cpp
1126

Sorry, I think the reworded comment is still unclear.

If you're willing to tell me what you don't like about the suggestion I made earlier, perhaps we can come up with something we both like.

1141

This comment needs to be reworded, sorry.

minor comments cleanup

fedor.sergeev marked an inline comment as not done.Mar 23 2018, 11:19 AM
fedor.sergeev added inline comments.
lib/Transforms/IPO/FunctionAttrs.cpp
1126

Mostly I didnt like "instructions are valid" part, which is not what we are checking for.
If you dont like my last variant - please, suggest your own that talks about attribute, not instruction.

jlebar accepted this revision.Mar 23 2018, 12:50 PM

\o/

lib/Transforms/IPO/FunctionAttrs.cpp
1126

What you came up with looks great to me.

Thank you for being patient with me -- I know I'm not an easy reviewer.

1140

lgtm, thanks again for being patient with me.

This revision is now accepted and ready to land.Mar 23 2018, 12:50 PM

Thanks for following up till the very end of it! :)
Your feeling of language is a bit over my capabilities, but I do like to learn here.

minor semantical update - calculation of return value corrected for AttributeInferer::run,
returning true only when real changes are detected.
Fixes regression introduced by most recent algorithmic update.

This revision was automatically updated to reflect the committed changes.