Page MenuHomePhabricator

[IR] convert warn-stack-size from module flag to fn attr
ClosedPublic

Authored by nickdesaulniers on Jun 15 2021, 5:18 PM.

Details

Summary

Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Diff Detail

Event Timeline

nickdesaulniers requested review of this revision.Jun 15 2021, 5:18 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 15 2021, 5:18 PM

what're the rules about module level attributes like this? For instance: Does this affect/hinder inlining (if the attributes don't match between the caller and callee)? Other situations that might matter?

llvm/lib/CodeGen/PrologEpilogInserter.cpp
282

I guess the 0 value here is the default value if the value can't be parsed as an integer? Is that desirable? I guess maybe we should ignore it (use UINT_MAX here instead, maybe) and fail in the verifier.

But I guess if we fail in the verifier, then it doesn't really matter/shouldn't be tested what the behavior is here when presented with invalid IR.

(but this is a divergence from the module flag handling, which looks like it does silently ignore non-numeric values, by using UINT_MAX)

what're the rules about module level attributes like this? For instance: Does this affect/hinder inlining (if the attributes don't match between the caller and callee)? Other situations that might matter?

I think you meant s/module level/function level/? That's a good question, one I had to think about a little bit. Here's my thoughts on the behavior this should exhibit, please let me know if you agree.

When using -Wframe-larger-than=<threshold> per TU, the developer wants to be alerted if any stack frame exceeds a threshold. The Linux kernel's use case is that the kernel's stack is limited to (usually) two pages (ulimit -s; typically 8KiB, but different architectures do support non-4KiB page sizes), so functions using more than 1KiB of stack are usually indicative of large objects being stack allocated that should have been heap allocated.

Currently, in C (and C with GNU extensions), there is no way to describe to the compiler function-grain specific values for -Wframe-larger-than=; rather than fine grain per function control, we only have coarse grain TU control.

So in the general case (non-LTO), we can only perform inline substitution at call sites visible from callers' definitions. Because there's no GNU C function attribute to change the current value of -Wframe-larger-than=, it's not possible for that value to differ between caller and callee. But with LTO; shit gets weird.

Suddenly now with LTO, we have cross TU (cross Module) visibility into call sites, so we can inline across TU/Module boundaries. Thus we can have an IR intermediary object with a call site where the caller's value of -Wframe-larger-than= differs from the callees! So the question is what should happen in such a case?

The extremely conservative approach which we have done in the past for certain mismatched function attributes is to simply not perform inline substitution, if we have no other options. This adds overhead to the inliner to check the XOR of attribute lists of the caller and callee for each call site.

But I *think* (and am open to sugguestions) that we should:

  1. permit inline substitution
  2. the caller's value of "warn-stack-size"= IR Fn Attr wins

I think this is ok because: if caller is defined in TU1 with -Wframe-larger-than= distinct from callee defined in TU2 with a different value of -Wframe-larger-than=, then we don't care what callee's value was. callee may even be DCE'd if it's inlined into a lone call site. I'd expect in such cases that callee's value was larger than caller's, in which case callee should be attributed no_inline for LTO if the tighter threshold for caller now warns. If callee's value was smaller than callers and we performed inline substitution, I think that's also perfectly fine, caller should not become "more strict."

Generally in the Linux kernel, we see a common value of -Wframe-larger-than= throughout most of the TUs, with only a few generally having a larger value to relax constraints a little. (Also, those relaxations are questionable, given the intent of -Wframe-larger-than= use in the kernel in the first place).

Let me add such a test to encode that intention; though I don't know yet what's involved/possible to implement. Let's see.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
282

IIUC, the first parameter to getAsInteger is the Radix, not the default value on failure to parse. But it does return true on error, so I should check that here.

I also should add a verifier check for this new function attribute. While the "string key equals string value" attributes are quite flexible, it would be good to have some rigidity in requiring the string value to be parseable as an unsigned int.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
282

Oh, I should use base 10 as the radix, otherwise it will try to parse hex and binary literals.

what're the rules about module level attributes like this? For instance: Does this affect/hinder inlining (if the attributes don't match between the caller and callee)? Other situations that might matter?

I think you meant s/module level/function level/? That's a good question, one I had to think about a little bit. Here's my thoughts on the behavior this should exhibit, please let me know if you agree.

When using -Wframe-larger-than=<threshold> per TU, the developer wants to be alerted if any stack frame exceeds a threshold. The Linux kernel's use case is that the kernel's stack is limited to (usually) two pages (ulimit -s; typically 8KiB, but different architectures do support non-4KiB page sizes), so functions using more than 1KiB of stack are usually indicative of large objects being stack allocated that should have been heap allocated.

Currently, in C (and C with GNU extensions), there is no way to describe to the compiler function-grain specific values for -Wframe-larger-than=; rather than fine grain per function control, we only have coarse grain TU control.

So in the general case (non-LTO), we can only perform inline substitution at call sites visible from callers' definitions. Because there's no GNU C function attribute to change the current value of -Wframe-larger-than=, it's not possible for that value to differ between caller and callee. But with LTO; shit gets weird.

Suddenly now with LTO, we have cross TU (cross Module) visibility into call sites, so we can inline across TU/Module boundaries. Thus we can have an IR intermediary object with a call site where the caller's value of -Wframe-larger-than= differs from the callees! So the question is what should happen in such a case?

The extremely conservative approach which we have done in the past for certain mismatched function attributes is to simply not perform inline substitution, if we have no other options. This adds overhead to the inliner to check the XOR of attribute lists of the caller and callee for each call site.

But I *think* (and am open to sugguestions) that we should:

  1. permit inline substitution
  2. the caller's value of "warn-stack-size"= IR Fn Attr wins

I think this is ok because: if caller is defined in TU1 with -Wframe-larger-than= distinct from callee defined in TU2 with a different value of -Wframe-larger-than=, then we don't care what callee's value was. callee may even be DCE'd if it's inlined into a lone call site. I'd expect in such cases that callee's value was larger than caller's, in which case callee should be attributed no_inline for LTO if the tighter threshold for caller now warns. If callee's value was smaller than callers and we performed inline substitution, I think that's also perfectly fine, caller should not become "more strict."

Generally in the Linux kernel, we see a common value of -Wframe-larger-than= throughout most of the TUs, with only a few generally having a larger value to relax constraints a little. (Also, those relaxations are questionable, given the intent of -Wframe-larger-than= use in the kernel in the first place).

Let me add such a test to encode that intention; though I don't know yet what's involved/possible to implement. Let's see.

Sure, that all sounds pretty reasonable to me - mostly I was curious what the existing/default behavior is (if we do nothing other than what's already in this patch, how does the inliner handle different values/mismatched presence of warn-stack-size attributes, for instance) - to check that whatever it does seems reasonable/acceptable.

nickdesaulniers marked an inline comment as done.
  • verifier checks, inliner test, use base 10 radix
dblaikie accepted this revision.Jun 18 2021, 12:02 PM

Sounds OK to me.

Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?

llvm/test/Transforms/Inline/warn-stack-size.ll
1 ↗(On Diff #352841)

Nice to see the test - though I probably wouldn't bother adding this test if this behavior already falls out of more general support in the inliner and the way it already handles attributes - the general behavior is likely already tested elsewhere? (though it'd be good to confirm that either in tests and/or the inliner code itself)

my original question was to confirm that the inliner already had accounted for this situation in a way that was desirable & it looks like/sounds like it is.

This revision is now accepted and ready to land.Jun 18 2021, 12:02 PM
  • fix lint, add linker test

Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?

I've added an llvm-link test for this. I'm not sure it adds any signal though here; I think the answer to such a question is "don't do that."

llvm/test/Transforms/Inline/warn-stack-size.ll
1 ↗(On Diff #352841)

AttributeFuncs::areInlineCompatible seems to define the disallow-list for mismatched function attributes.

AttributeFuncs::mergeAttributesForInlining() seems to be the merging strategy for certain function attributes.

I agree that this test just confirms that the implicit default merge strategy is used. I guess it would fail if someone unintentionally changed that, but I don't mind removing this test either. WDYT?

  • git add the Linker test!

Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?

I've added an llvm-link test for this. I'm not sure it adds any signal though here; I think the answer to such a question is "don't do that."

I don't think it's as easy as "don't do that". Unless someone passes exactly the same flags to every compilation (which they won't, that's why this is being implemented as a function attribute) then it'll be really easy for an inline function in a header (say, std::vector<int>::size - something easy for two unrelated translation units to use) in two different translation units each with a different warn-stack-size flag and so to get somewhat arbitrary behavior about how that function is warned on.

For instance: maybe the function doesn't get a warning because a copy with a higher warn-stack-size value is picked, until the translation unit using that higher value is refactored and starts using std::list instead of std::vector... and now some other TU's std::vector is picked, with a lower warn-stack-size value and breaks the build (assuming -Werror)...

llvm/test/Transforms/Inline/warn-stack-size.ll
1 ↗(On Diff #352841)

generally I wouldn't add a test like this, or the LTO one - I'd just confirm that their features are separately tested and the behavior that's desired for how I want to use the feature (eg: I wouldn't test that the add instruction lowers to some machine code in my optimization - I'd confirm the add instruction has the desired semantics for whatever transformation I want to perform)

MaskRay accepted this revision.Jun 21 2021, 1:51 PM
MaskRay added a subscriber: MaskRay.

Agree that a function attribute is more appropriate considering the LTO behavior.

llvm/docs/LangRef.rst
2052

In for once, is for redundant?

2054

Abbreviated as a non-negative integer

module attr

module flag metadata (or module flag) is more appropriate.

nickdesaulniers retitled this revision from [IR] convert warn-stack-size from module attr to fn attr to [IR] convert warn-stack-size from module flag to fn attr.Jun 21 2021, 2:26 PM
nickdesaulniers marked 2 inline comments as done.
nickdesaulniers edited the summary of this revision. (Show Details)
  • fix langref, delete inline and linker tests
nickdesaulniers marked 2 inline comments as done.Jun 21 2021, 2:49 PM

Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?

I've added an llvm-link test for this. I'm not sure it adds any signal though here; I think the answer to such a question is "don't do that."

I don't think it's as easy as "don't do that". Unless someone passes exactly the same flags to every compilation (which they won't, that's why this is being implemented as a function attribute) then it'll be really easy for an inline function in a header (say, std::vector<int>::size - something easy for two unrelated translation units to use) in two different translation units each with a different warn-stack-size flag and so to get somewhat arbitrary behavior about how that function is warned on.

Ah, ok that's which language construct can produce linkonce_odr. Fair point.

For instance: maybe the function doesn't get a warning because a copy with a higher warn-stack-size value is picked, until the translation unit using that higher value is refactored and starts using std::list instead of std::vector... and now some other TU's std::vector is picked, with a lower warn-stack-size value and breaks the build (assuming -Werror)...

This seems like a general problem perhaps with linking IR? Perhaps IRLinker::copyFunctionProto or IRLinker::mapAttributeTypes should try to do something here, though I'm not sure yet which policy would be preferred?

As a heads up, this will conflict with D104667.

dblaikie accepted this revision.Jun 21 2021, 2:55 PM

Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?

I've added an llvm-link test for this. I'm not sure it adds any signal though here; I think the answer to such a question is "don't do that."

I don't think it's as easy as "don't do that". Unless someone passes exactly the same flags to every compilation (which they won't, that's why this is being implemented as a function attribute) then it'll be really easy for an inline function in a header (say, std::vector<int>::size - something easy for two unrelated translation units to use) in two different translation units each with a different warn-stack-size flag and so to get somewhat arbitrary behavior about how that function is warned on.

Ah, ok that's which language construct can produce linkonce_odr. Fair point.

For instance: maybe the function doesn't get a warning because a copy with a higher warn-stack-size value is picked, until the translation unit using that higher value is refactored and starts using std::list instead of std::vector... and now some other TU's std::vector is picked, with a lower warn-stack-size value and breaks the build (assuming -Werror)...

This seems like a general problem perhaps with linking IR? Perhaps IRLinker::copyFunctionProto or IRLinker::mapAttributeTypes should try to do something here, though I'm not sure yet which policy would be preferred?

I don't know that there's a good answer (in more extreme cases - like different optimization levels or CPU features, at least at Google our answer has ended up being "compile the whole program with the right CPU features, because there's no great way to support good optimizations while respecting CPU features on a per-function basis"), basically - so this was more a "heads up, this is going to be possibly unavoidably messy/unreliable on the edges".

Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).

(why I think there's no solution to this: any rule (highest wins, lowest wins, mismatch fails to compile) will create surprising/problematic effects, eg: you have one TU with the function in it and some value for the attribute - a new TU could introduce a copy of the function with a higher or lower value - and whichever choice of policy would then cause problems for one case or the other case. (either enforcing a stronger warning level on code that didn't ask for it, or slackening the warning level for code that thought it was protected by the warning))

I don't know that there's a good answer (in more extreme cases - like different optimization levels or CPU features, at least at Google our answer has ended up being "compile the whole program with the right CPU features, because there's no great way to support good optimizations while respecting CPU features on a per-function basis"), basically - so this was more a "heads up, this is going to be possibly unavoidably messy/unreliable on the edges".

Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).

Sure. Let me land this, since we (Google & ClangBuiltLinux) have some tests and builds failing due to https://reviews.llvm.org/D103928. I will then send a follow up for us to iterate on in regards to documenting this more.

(why I think there's no solution to this: any rule (highest wins, lowest wins, mismatch fails to compile) will create surprising/problematic effects, eg: you have one TU with the function in it and some value for the attribute - a new TU could introduce a copy of the function with a higher or lower value - and whichever choice of policy would then cause problems for one case or the other case. (either enforcing a stronger warning level on code that didn't ask for it, or slackening the warning level for code that thought it was protected by the warning))

I agree. I don't think even having a function attribute in C for -Wframe-larger-than would resolve such policy issues either.

I don't know that there's a good answer (in more extreme cases - like different optimization levels or CPU features, at least at Google our answer has ended up being "compile the whole program with the right CPU features, because there's no great way to support good optimizations while respecting CPU features on a per-function basis"), basically - so this was more a "heads up, this is going to be possibly unavoidably messy/unreliable on the edges".

Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).

Sure. Let me land this, since we (Google & ClangBuiltLinux) have some tests and builds failing due to https://reviews.llvm.org/D103928. I will then send a follow up for us to iterate on in regards to documenting this more.

(why I think there's no solution to this: any rule (highest wins, lowest wins, mismatch fails to compile) will create surprising/problematic effects, eg: you have one TU with the function in it and some value for the attribute - a new TU could introduce a copy of the function with a higher or lower value - and whichever choice of policy would then cause problems for one case or the other case. (either enforcing a stronger warning level on code that didn't ask for it, or slackening the warning level for code that thought it was protected by the warning))

I agree. I don't think even having a function attribute in C for -Wframe-larger-than would resolve such policy issues either.

At least then we could probably say it's an ODR violation (the two function definitions would be not the same if the user wrote the attribute differently for two definitions of the inline function in two different translation units) to have the function declared with different values for the attribute within the same program (so you could still compile two different files (that include a common header with a common function with the attribute specified there) with different values for the command line flag - because the function would get a consistent attribute value for the warning) - and then the linker could actually reject it on mismatch. But with the attribute currently coming from the command line, that's not feasible.

This revision was landed with ongoing or failed builds.Jun 21 2021, 3:16 PM
This revision was automatically updated to reflect the committed changes.

Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).

I would think https://clang.llvm.org/docs/DiagnosticsReference.html#wframe-larger-than would be an appropriate place to document this for -Wframe-larger-than=, but it seems this whole page is generated via TableGen. It's not clear to me how we could insert such a note.

Langref changes: https://reviews.llvm.org/D104736.

At least then we could probably say it's an ODR violation (the two function definitions would be not the same if the user wrote the attribute differently for two definitions of the inline function in two different translation units) to have the function declared with different values for the attribute within the same program (so you could still compile two different files (that include a common header with a common function with the attribute specified there) with different values for the command line flag - because the function would get a consistent attribute value for the warning) - and then the linker could actually reject it on mismatch. But with the attribute currently coming from the command line, that's not feasible.

Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).

I would think https://clang.llvm.org/docs/DiagnosticsReference.html#wframe-larger-than would be an appropriate place to document this for -Wframe-larger-than=, but it seems this whole page is generated via TableGen. It's not clear to me how we could insert such a note.

Yeah, I don't think we have a way to add more verbose/custom documentation for diagnostics. (@aaron.ballman might have some ideas)

Langref changes: https://reviews.llvm.org/D104736.

Thanks!

I would think https://clang.llvm.org/docs/DiagnosticsReference.html#wframe-larger-than would be an appropriate place to document this for -Wframe-larger-than=, but it seems this whole page is generated via TableGen. It's not clear to me how we could insert such a note.

Yeah, I don't think we have a way to add more verbose/custom documentation for diagnostics. (@aaron.ballman might have some ideas)

I'm not aware of any way to do that today -- it would require more machinery for generating the diagnostic documentation, which is made harder by this particular diagnostic being a backend one that's not written out from tablegen.

I would think https://clang.llvm.org/docs/DiagnosticsReference.html#wframe-larger-than would be an appropriate place to document this for -Wframe-larger-than=, but it seems this whole page is generated via TableGen. It's not clear to me how we could insert such a note.

Yeah, I don't think we have a way to add more verbose/custom documentation for diagnostics. (@aaron.ballman might have some ideas)

I'm not aware of any way to do that today -- it would require more machinery for generating the diagnostic documentation, which is made harder by this particular diagnostic being a backend one that's not written out from tablegen.

Fair enough - thanks for the context!