Otherwise, this causes issues when building with LTO for object files
that use different values.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
what're the rules about module level attributes like this? For instance: Does this affect/hinder inlining (if the attributes don't match between the caller and callee)? Other situations that might matter?
llvm/lib/CodeGen/PrologEpilogInserter.cpp | ||
---|---|---|
282 | I guess the 0 value here is the default value if the value can't be parsed as an integer? Is that desirable? I guess maybe we should ignore it (use UINT_MAX here instead, maybe) and fail in the verifier. But I guess if we fail in the verifier, then it doesn't really matter/shouldn't be tested what the behavior is here when presented with invalid IR. (but this is a divergence from the module flag handling, which looks like it does silently ignore non-numeric values, by using UINT_MAX) |
I think you meant s/module level/function level/? That's a good question, one I had to think about a little bit. Here's my thoughts on the behavior this should exhibit, please let me know if you agree.
When using -Wframe-larger-than=<threshold> per TU, the developer wants to be alerted if any stack frame exceeds a threshold. The Linux kernel's use case is that the kernel's stack is limited to (usually) two pages (ulimit -s; typically 8KiB, but different architectures do support non-4KiB page sizes), so functions using more than 1KiB of stack are usually indicative of large objects being stack allocated that should have been heap allocated.
Currently, in C (and C with GNU extensions), there is no way to describe to the compiler function-grain specific values for -Wframe-larger-than=; rather than fine grain per function control, we only have coarse grain TU control.
So in the general case (non-LTO), we can only perform inline substitution at call sites visible from callers' definitions. Because there's no GNU C function attribute to change the current value of -Wframe-larger-than=, it's not possible for that value to differ between caller and callee. But with LTO; shit gets weird.
Suddenly now with LTO, we have cross TU (cross Module) visibility into call sites, so we can inline across TU/Module boundaries. Thus we can have an IR intermediary object with a call site where the caller's value of -Wframe-larger-than= differs from the callees! So the question is what should happen in such a case?
The extremely conservative approach which we have done in the past for certain mismatched function attributes is to simply not perform inline substitution, if we have no other options. This adds overhead to the inliner to check the XOR of attribute lists of the caller and callee for each call site.
But I *think* (and am open to sugguestions) that we should:
- permit inline substitution
- the caller's value of "warn-stack-size"= IR Fn Attr wins
I think this is ok because: if caller is defined in TU1 with -Wframe-larger-than= distinct from callee defined in TU2 with a different value of -Wframe-larger-than=, then we don't care what callee's value was. callee may even be DCE'd if it's inlined into a lone call site. I'd expect in such cases that callee's value was larger than caller's, in which case callee should be attributed no_inline for LTO if the tighter threshold for caller now warns. If callee's value was smaller than callers and we performed inline substitution, I think that's also perfectly fine, caller should not become "more strict."
Generally in the Linux kernel, we see a common value of -Wframe-larger-than= throughout most of the TUs, with only a few generally having a larger value to relax constraints a little. (Also, those relaxations are questionable, given the intent of -Wframe-larger-than= use in the kernel in the first place).
Let me add such a test to encode that intention; though I don't know yet what's involved/possible to implement. Let's see.
llvm/lib/CodeGen/PrologEpilogInserter.cpp | ||
---|---|---|
282 | IIUC, the first parameter to getAsInteger is the Radix, not the default value on failure to parse. But it does return true on error, so I should check that here. I also should add a verifier check for this new function attribute. While the "string key equals string value" attributes are quite flexible, it would be good to have some rigidity in requiring the string value to be parseable as an unsigned int. |
llvm/lib/CodeGen/PrologEpilogInserter.cpp | ||
---|---|---|
282 | Oh, I should use base 10 as the radix, otherwise it will try to parse hex and binary literals. |
Sure, that all sounds pretty reasonable to me - mostly I was curious what the existing/default behavior is (if we do nothing other than what's already in this patch, how does the inliner handle different values/mismatched presence of warn-stack-size attributes, for instance) - to check that whatever it does seems reasonable/acceptable.
Sounds OK to me.
Another thing you might want to check is linkonce_odr functions - I guess you'll get an arbitrary choice between two linkonce_odr functions under LTO where they have different warn-stack-size? Maybe there's a way/place to merge and always pick the lower or upper value if there's one you think would be more right?
llvm/test/Transforms/Inline/warn-stack-size.ll | ||
---|---|---|
1 | Nice to see the test - though I probably wouldn't bother adding this test if this behavior already falls out of more general support in the inliner and the way it already handles attributes - the general behavior is likely already tested elsewhere? (though it'd be good to confirm that either in tests and/or the inliner code itself) my original question was to confirm that the inliner already had accounted for this situation in a way that was desirable & it looks like/sounds like it is. |
I've added an llvm-link test for this. I'm not sure it adds any signal though here; I think the answer to such a question is "don't do that."
llvm/test/Transforms/Inline/warn-stack-size.ll | ||
---|---|---|
1 | AttributeFuncs::areInlineCompatible seems to define the disallow-list for mismatched function attributes. AttributeFuncs::mergeAttributesForInlining() seems to be the merging strategy for certain function attributes. I agree that this test just confirms that the implicit default merge strategy is used. I guess it would fail if someone unintentionally changed that, but I don't mind removing this test either. WDYT? |
I don't think it's as easy as "don't do that". Unless someone passes exactly the same flags to every compilation (which they won't, that's why this is being implemented as a function attribute) then it'll be really easy for an inline function in a header (say, std::vector<int>::size - something easy for two unrelated translation units to use) in two different translation units each with a different warn-stack-size flag and so to get somewhat arbitrary behavior about how that function is warned on.
For instance: maybe the function doesn't get a warning because a copy with a higher warn-stack-size value is picked, until the translation unit using that higher value is refactored and starts using std::list instead of std::vector... and now some other TU's std::vector is picked, with a lower warn-stack-size value and breaks the build (assuming -Werror)...
llvm/test/Transforms/Inline/warn-stack-size.ll | ||
---|---|---|
1 | generally I wouldn't add a test like this, or the LTO one - I'd just confirm that their features are separately tested and the behavior that's desired for how I want to use the feature (eg: I wouldn't test that the add instruction lowers to some machine code in my optimization - I'd confirm the add instruction has the desired semantics for whatever transformation I want to perform) |
Ah, ok that's which language construct can produce linkonce_odr. Fair point.
For instance: maybe the function doesn't get a warning because a copy with a higher warn-stack-size value is picked, until the translation unit using that higher value is refactored and starts using std::list instead of std::vector... and now some other TU's std::vector is picked, with a lower warn-stack-size value and breaks the build (assuming -Werror)...
This seems like a general problem perhaps with linking IR? Perhaps IRLinker::copyFunctionProto or IRLinker::mapAttributeTypes should try to do something here, though I'm not sure yet which policy would be preferred?
I don't know that there's a good answer (in more extreme cases - like different optimization levels or CPU features, at least at Google our answer has ended up being "compile the whole program with the right CPU features, because there's no great way to support good optimizations while respecting CPU features on a per-function basis"), basically - so this was more a "heads up, this is going to be possibly unavoidably messy/unreliable on the edges".
Probably worth at least writing up the risk/instability in the docs for the warning (in clang) and attribute (in llvm). (don't mind if that's in this patch or a follow-up).
(why I think there's no solution to this: any rule (highest wins, lowest wins, mismatch fails to compile) will create surprising/problematic effects, eg: you have one TU with the function in it and some value for the attribute - a new TU could introduce a copy of the function with a higher or lower value - and whichever choice of policy would then cause problems for one case or the other case. (either enforcing a stronger warning level on code that didn't ask for it, or slackening the warning level for code that thought it was protected by the warning))
Sure. Let me land this, since we (Google & ClangBuiltLinux) have some tests and builds failing due to https://reviews.llvm.org/D103928. I will then send a follow up for us to iterate on in regards to documenting this more.
(why I think there's no solution to this: any rule (highest wins, lowest wins, mismatch fails to compile) will create surprising/problematic effects, eg: you have one TU with the function in it and some value for the attribute - a new TU could introduce a copy of the function with a higher or lower value - and whichever choice of policy would then cause problems for one case or the other case. (either enforcing a stronger warning level on code that didn't ask for it, or slackening the warning level for code that thought it was protected by the warning))
I agree. I don't think even having a function attribute in C for -Wframe-larger-than would resolve such policy issues either.
At least then we could probably say it's an ODR violation (the two function definitions would be not the same if the user wrote the attribute differently for two definitions of the inline function in two different translation units) to have the function declared with different values for the attribute within the same program (so you could still compile two different files (that include a common header with a common function with the attribute specified there) with different values for the command line flag - because the function would get a consistent attribute value for the warning) - and then the linker could actually reject it on mismatch. But with the attribute currently coming from the command line, that's not feasible.
I would think https://clang.llvm.org/docs/DiagnosticsReference.html#wframe-larger-than would be an appropriate place to document this for -Wframe-larger-than=, but it seems this whole page is generated via TableGen. It's not clear to me how we could insert such a note.
Langref changes: https://reviews.llvm.org/D104736.
At least then we could probably say it's an ODR violation (the two function definitions would be not the same if the user wrote the attribute differently for two definitions of the inline function in two different translation units) to have the function declared with different values for the attribute within the same program (so you could still compile two different files (that include a common header with a common function with the attribute specified there) with different values for the command line flag - because the function would get a consistent attribute value for the warning) - and then the linker could actually reject it on mismatch. But with the attribute currently coming from the command line, that's not feasible.
Yeah, I don't think we have a way to add more verbose/custom documentation for diagnostics. (@aaron.ballman might have some ideas)
Langref changes: https://reviews.llvm.org/D104736.
Thanks!
I'm not aware of any way to do that today -- it would require more machinery for generating the diagnostic documentation, which is made harder by this particular diagnostic being a backend one that's not written out from tablegen.
In for once, is for redundant?