This is an archive of the discontinued LLVM Phabricator instance.

[Stack Protection] Add diagnostic information for why stack protection was applied to a function
ClosedPublic

Authored by jhenderson on Jan 23 2017, 6:10 AM.

Details

Summary

Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied.

Diff Detail

Repository
rL LLVM

Event Timeline

jhenderson created this revision.Jan 23 2017, 6:10 AM
anemet edited edge metadata.

I think that you want to only add the new DiagnosticKind to the IR layer. The class itself should reside in CodeGen. You should then be able to keep the diagnostic and the code producing it closer. see what I did in D29004.

You also have seemingly unrelated test changes in this patch. Which reminds me; you need to add a test.

lib/IR/DiagnosticInfo.cpp
343–344 ↗(On Diff #85368)

This will produce a pretty cryptic message.

anemet requested changes to this revision.Jan 23 2017, 8:46 AM
This revision now requires changes to proceed.Jan 23 2017, 8:46 AM
jhenderson updated this revision to Diff 85409.Jan 23 2017, 9:11 AM
jhenderson edited edge metadata.

Thanks for spotting the test file. I did not mean for that file to be in there (the perils of making local changes...).

I have moved the class into StackProtector.h as suggested. I've also converted the enum into a string when printing. I'm uploading these changes now, and will add tests to it next.

Thanks for working on this!

lib/CodeGen/StackProtector.cpp
485 ↗(On Diff #85409)

Nit: Can we use a StringRef here instead?

jhenderson updated this revision to Diff 85566.EditedJan 24 2017, 4:46 AM

I have switched to using StringRef as suggested and added a test to check that the diagnostic is issued in each different case.

I also made several small changes based on comments in D29027: removed unused Unknown reason; added an end marker to the enum that can be used by the clang-side diagnostics when deciding if stack protection was applied due to a function attribute or command-line switch (i.e. -fstack-protector-all); finally, added to alloca case that it could be a variable length array that caused stack protection to be applied (this is indistinguishable at this point from an explicit use of allloca).

anemet accepted this revision.Feb 7 2017, 9:12 AM

I can't really judge the SSP part but the overall approach looks good to me. Maybe wait a day before committing.

This revision is now accepted and ready to land.Feb 7 2017, 9:12 AM
jhenderson marked an inline comment as done.Feb 7 2017, 9:22 AM
This revision was automatically updated to reflect the committed changes.
davidb reopened this revision.Feb 9 2017, 8:48 AM

The current implementation will result in a remark always being emitted for llc, so we need to make a change to prevent that.

This revision is now accepted and ready to land.Feb 9 2017, 8:48 AM
davidb requested changes to this revision.Feb 9 2017, 8:49 AM

remarks need to be suppressed by default.

This revision now requires changes to proceed.Feb 9 2017, 8:49 AM

It turns out that by default, when no handlers are provided, all diagnostic messages are produced, regardless of their severity. Optimization remarks are a special case that have explicit handling. I can think of five different ways to fix this for my case:

  1. Make DiagnosticInfoSSP derive from DiagnosticInfoOptimizationBase, and then provide a new switch to explicitly enable it.
  2. Remove DiagnosticInfoSSP entirely and use emitOptimizationRemark. The remarks would then be emitted by specifying something like -pass-remarks=stack-protector. I haven't investigated this, but this might make the clang-side changes unnecessary.
  3. Move the isEnabled() virtual function of DiagnosticInfoOptimizationBase up the inheritance hierarchy so that a greater range of subclasses can use the relevant behaviour. Potentially, move it all the way to the top level DiagnosticInfo class and default it to always be enabled, allowing subclasses to override it as desired.
  4. Disable all diagnostics with remark severity by default, either via the implementation in 3) or by querying the severity in "isDiagnosticEnabled" in LLVMContext.cpp, and explicitly enable them only as requested. This would then need to be paired up with a new option to enable all remarks, or at least a switch to enable the new diagnostic itself. This would be similar to what clang does.
  5. Always emitting this diagnostic is actually fine and so the failing tests need fixing.

Stack Protection is a pass, but it isn't really an optimization (in fact the motivation behind the change is to enable users to analyse where it has been applied and find ways to suppress it), so using something with Optimization in the name to emit it doesn't feel right to me. To me, 3) or 4) seem like the overall correct approach. As far as I can see, the only DiagnosticInfo users with DS_Remark severity all go through the DiagnosticInfoOptimizationBase class, so there wouldn't be any unintentional side effects of disabling remarks by default, since these are disabled already.

I guess the question is, should remarks be disabled by default? What do people think?

jhenderson updated this revision to Diff 89356.Feb 22 2017, 6:45 AM
jhenderson edited edge metadata.
jhenderson edited the summary of this revision. (Show Details)

After staring at this for a while, and discussing terminology, I have come to the conclusion that the best way to report these remarks is not via a new diagnostic class, but rather through simply using emitOptimizationRemark. This updated diff simplifies things substantially from the previous version and also gives us clang support for free via -Rpass=stack-protector.

After staring at this for a while, and discussing terminology, I have come to the conclusion that the best way to report these remarks is not via a new diagnostic class, but rather through simply using emitOptimizationRemark. This updated diff simplifies things substantially from the previous version and also gives us clang support for free via -Rpass=stack-protector.

I was going to propose the same thing but I am pretty biased toward opt-remarks so I decided against it. I am glad you came to the same conclusion independently ;).

One thing that you should use is the OptimizationRemarkEmitter facility instead of using the C API. This allows exporting opt remarks into a YAML file with -fsave-optimization-record. Then you can visualize with opt-viewer like this:

https://androm3da.github.io/optviewer-demo/output_analysis/cpython/._Modules_hashtable.c.html#L493

Looks like LoopInfo/DomintorTree is not available in SSP so it's probably more efficient for now to instantiate ORE on the fly rather than using it as analysis pass. Once something like D30128 is implemented for IR-level ORE, I can come back to this and make it use the analysis pass. The same approach is used by the loop passes currently, see LegacyLICMPass::runOnLoop for an example.

jhenderson updated this revision to Diff 89491.Feb 23 2017, 3:36 AM

Thanks, I've updated accordingly. I also added a test case to check that the remark is not emitted if the switch isn't specified.

I had to constify an argument to one of the OptimizationRemarkEmitter::emitOptimizationRemark overloads, since it was missing. I'm happy to add const to the others as appropriate as well, but that's probably a different change.

anemet requested changes to this revision.Feb 23 2017, 4:33 PM
anemet added inline comments.
lib/CodeGen/StackProtector.cpp
226 ↗(On Diff #89491)

Please add a comment that we're constructing ORE on the fly rather than using through the analysis pass to avoid building DominatorTree and LoopInfo which is not available this late in the IR pipeline.

231–233 ↗(On Diff #89491)

OK, clearly we had too many APIs driving this functionality, and not too surprisingly, you didn't pick the right ones. So I went ahead and cleaned up the APIs (e.g. r296019, r296037).

This is not the ctor you want to use (and after my changes, you can't). Please use one of the two that are available now.

254 ↗(On Diff #89491)

Same here, I removed this legacy interface. Please use ORE.emit().

This revision now requires changes to proceed.Feb 23 2017, 4:33 PM
jhenderson updated this revision to Diff 89655.Feb 24 2017, 6:04 AM
jhenderson edited edge metadata.
jhenderson marked 3 inline comments as done.

Addressed comments.

I'm a little unsure about the comment I've added, as I am not familiar with those aspects of the compiler, being new to the scene. Can you confirm that it looks ok? I noticed in particular that there is some DominatorTree information being passed around elsewhere in the stack protector code, but I don't know if this is relevant at all.

Also, I'm not a particular fan of what I've had to do to emit the "function attribute or command-line switch" remark. In the event that there are no basic blocks, the remark is not emitted, but the rest of the stack protection code is still run (although I don't think it does much - there's perhaps a separate opportunity to bail out early, but I don't want to risk changing behaviour in this change). That particular version of the remark feels like a function-level remark, whereas from what I can see, the two constructors only really support basic block-level and instruction-level remarks. Would it make sense to have a third constructor in OptimizationRemark that takes a Function instead? It seems to be easy to add, but I don't know if it fits the wider architecture, from your point of view.

anemet requested changes to this revision.Feb 24 2017, 9:56 AM
anemet added subscribers: ab, bogner.

Addressed comments.

I'm a little unsure about the comment I've added, as I am not familiar with those aspects of the compiler, being new to the scene. Can you confirm that it looks ok? I noticed in particular that there is some DominatorTree information being passed around elsewhere in the stack protector code, but I don't know if this is relevant at all.

Looks good. Yes, DT is used *if* it's available and currently it's not available by the nature of how the passes are ordered.

Also, I'm not a particular fan of what I've had to do to emit the "function attribute or command-line switch" remark. In the event that there are no basic blocks, the remark is not emitted, but the rest of the stack protection code is still run (although I don't think it does much - there's perhaps a separate opportunity to bail out early, but I don't want to risk changing behaviour in this change). That particular version of the remark feels like a function-level remark, whereas from what I can see, the two constructors only really support basic block-level and instruction-level remarks. Would it make sense to have a third constructor in OptimizationRemark that takes a Function instead? It seems to be easy to add, but I don't know if it fits the wider architecture, from your point of view.

Yes it's a good idea to add it. We should probably assert in there that !F.empty().

lib/CodeGen/StackProtector.cpp
239 ↗(On Diff #89655)

Convention for this is camel-case: StackProtectorReason. I think it's documented in the comments. If not feel free to add it.

242 ↗(On Diff #89655)

We don't run function passes on declarations which I think is the only reason that this can be empty (i.e. F.isDeclaration()).

245 ↗(On Diff #89655)

You can just << these one by one:

<< ReasonStub << "a func... ";

259–261 ↗(On Diff #89655)

Since you create the remark unconditionally you may as well pipe the the reason into it.

This revision now requires changes to proceed.Feb 24 2017, 9:56 AM
jhenderson updated this revision to Diff 89856.Feb 27 2017, 3:09 AM
jhenderson edited edge metadata.
jhenderson marked 3 inline comments as done.

Addressed review comments.

jhenderson marked an inline comment as done.Feb 27 2017, 3:10 AM
jhenderson added inline comments.
lib/CodeGen/StackProtector.cpp
239 ↗(On Diff #89655)

Done. Turns out it was commented in one place, but not in the place I used. I've added it to every instance.

245 ↗(On Diff #89655)

Done. I originally used this because the operator<< is not overloaded for Twine arguments, so I assumed it would be better to concatenate them together first.

anemet accepted this revision.Feb 27 2017, 2:37 PM

LGTM with these changes.

lib/CodeGen/StackProtector.cpp
238 ↗(On Diff #89856)

Start with upper case.

240 ↗(On Diff #89856)

You want different remark names for these. That helps calculating statistics on them without parsing the text.

lib/IR/DiagnosticInfo.cpp
239 ↗(On Diff #89856)

Convention is to start function names with a verb, i.e. getFirstFunctionBlock.

test/CodeGen/X86/stack-protector-remarks.ll
3 ↗(On Diff #89856)

This should be two lines down. (CHECK-NOTs are matches only within the partition set by the neighboring CHECKs.)

jhenderson marked 5 inline comments as done.Feb 28 2017, 1:26 AM

Thanks for the help. I've made the changes locally. I'll get a colleague with commit access to commit this later today hopefully.

I suspect that I might not be the only one to misunderstand CHECK-NOT!

jhenderson updated this revision to Diff 90005.Feb 28 2017, 3:28 AM

Addressed review comments prior to committing.

This revision was automatically updated to reflect the committed changes.

So this change went in, but the PPC build bots failed because the test does not specify a target triple and llc caused the alloca commands to get discarded due to running different passes, before the stack protector pass was run. This meant that the test failed, as the remarks were never emitted. We fixed that by specifying the X86 target triple, but a colleague pointed out that this test isn't really X86 specific, so should be moved to the generic CodeGen folder. They also suggested using opt to run just the stack protector pass, to allow the test to be target independent. Unfortunately, due to the stack protector pass requiring a target to be run and opt not following the approach of treating unknown or unspecified targets as the current machine (unlike llc), attempting to run the pass causes a crash in opt, unless a target triple is specified (which defeats the purpose of using opt anyway). Indeed, I cannot see a way of modifying anything to get this pass to work with opt, short of changing opt's behaviour when no triple is specified to explicitly calculate one.

@anemet - do you have any thoughts on how to make the testing more generic?

anemet added a comment.Mar 9 2017, 9:20 AM

So this change went in, but the PPC build bots failed because the test does not specify a target triple and llc caused the alloca commands to get discarded due to running different passes, before the stack protector pass was run. This meant that the test failed, as the remarks were never emitted. We fixed that by specifying the X86 target triple, but a colleague pointed out that this test isn't really X86 specific, so should be moved to the generic CodeGen folder. They also suggested using opt to run just the stack protector pass, to allow the test to be target independent. Unfortunately, due to the stack protector pass requiring a target to be run and opt not following the approach of treating unknown or unspecified targets as the current machine (unlike llc), attempting to run the pass causes a crash in opt, unless a target triple is specified (which defeats the purpose of using opt anyway). Indeed, I cannot see a way of modifying anything to get this pass to work with opt, short of changing opt's behaviour when no triple is specified to explicitly calculate one.

Were these public bots?

@anemet - do you have any thoughts on how to make the testing more generic?

I think that we should just pin it to a target. This is not testing the basic functionality but whether we provide diagnostics about it.

Were these public bots?

Yes - it was some of the PPC bots, although I don't know which ones off the top of my head; possibly all of them.

@anemet - do you have any thoughts on how to make the testing more generic?

I think that we should just pin it to a target. This is not testing the basic functionality but whether we provide diagnostics about it.

Ok, that's what we ended up doing (we put a REQUIRES: X86 in the tests along with relevant target triple).