This is an archive of the discontinued LLVM Phabricator instance.

[SanitizierCoverage] Avoid splitting critical edges when destination is a basic block containing unreachable
ClosedPublic

Authored by craig.topper on Feb 8 2019, 3:26 PM.

Details

Summary

This patch adds a new option to SplitAllCriticalEdges and uses it to avoid splitting critical edges when the destination basic block ends with unreachable. Otherwise if we split the critical edge, sanitizer coverage will instrument the new block that gets inserted for the split. But since this block itself shouldn't be reachable this is pointless. These basic blocks will stick around and generate assembly, but they don't end in sane control flow and might get placed at the end of the function. This makes it look like one function has code that flows into the next function.

The test case included here doesn't exist in the repo yet, but I made the patch relative to it to show the diff from this change.

This showed up while compiling the linux kernel with clang. The kernel has a tool called objtool that detected the code that appeared to flow from one function to the next. https://github.com/ClangBuiltLinux/linux/issues/351#issuecomment-461698884

Diff Detail

Event Timeline

craig.topper created this revision.Feb 8 2019, 3:26 PM

But since this block itself shouldn't be reachable this is pointless.

Your testcase shows an empty unreachable block, but it's also possible to have a block that ends with an unreachable, but still has reachable code, like a call to exit().

But since this block itself shouldn't be reachable this is pointless.

Your testcase shows an empty unreachable block, but it's also possible to have a block that ends with an unreachable, but still has reachable code, like a call to exit().

Should this code from SanitizerCoverage.cpp be checking that the first non-debug instruction is an unreachable instead of the terminator?

static bool shouldInstrumentBlock(const Function &F, const BasicBlock *BB,
                                  const DominatorTree *DT,
                                  const PostDominatorTree *PDT,
                                  const SanitizerCoverageOptions &Options) {
  // Don't insert coverage for unreachable blocks: we will never call
  // __sanitizer_cov() for them, so counting them in
  // NumberOfInstrumentedBlocks() might complicate calculation of code coverage
  // percentage. Also, unreachable instructions frequently have no debug
  // locations.
  if (isa<UnreachableInst>(BB->getTerminator()))
    return false;
rnk added a comment.Feb 11 2019, 10:49 AM

Should this code from SanitizerCoverage.cpp be checking that the first non-debug instruction is an unreachable instead of the terminator?

static bool shouldInstrumentBlock(const Function &F, const BasicBlock *BB,
                                  const DominatorTree *DT,
                                  const PostDominatorTree *PDT,
                                  const SanitizerCoverageOptions &Options) {
  // Don't insert coverage for unreachable blocks: we will never call
  // __sanitizer_cov() for them, so counting them in
  // NumberOfInstrumentedBlocks() might complicate calculation of code coverage
  // percentage. Also, unreachable instructions frequently have no debug
  // locations.
  if (isa<UnreachableInst>(BB->getTerminator()))
    return false;

Here's how I interpret the comment. I don't believe the "we will never call __sanitizer_cov for them". Maybe I just don't understand when __sanitizer_cov would be called, so that could be my lack of understanding. But, I think the second portion indicates that this is *intended* to avoid instrumenting fatal error patterns typically produced by macros like assert. Sanitizer coverage mostly exists to guide fuzzers, and I think, in the context of a codebase that does not use exceptions, i.e. most existing users (no value judgement), it's uninteresting to explore paths that lead to rejecting inputs with a fatal error. Of course, it's very interesting to explore longjmp and throw, which also create blocks that end in unreachable.

I think we'll need guidance from @kcc to know what needs to be done.

rnk added a subscriber: vitalybuka.Feb 26 2019, 2:17 PM

@vitalybuka @morehouse, any thoughts on this comment:
https://github.com/llvm/llvm-project/blob/29ac3a5b822ba8c097a3ae78d983cdb94da43dd4/llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp#L457

I don't think it is correct. I think sanitizer coverage should instrument blocks that end in unreachable, because they are in fact reachable, sometimes asserts and CHECKs fail, or exceptions are thrown.

I don't know all the details of when we might have an UnreachableInst, but I think in general instrumenting blocks that end in unreachable is unhelpful for fuzzing, since we are about to crash anyway (and will therefore be saving the current input whether we "count" the new coverage or not).

Maybe there are cases where UnreachablInst is actually reachable and we should consider instrumenting those with counters, but that is beyond my LLVM knowledge.

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.

This explanation looks reasonable to me.

rnk added a comment.Feb 27 2019, 11:12 AM

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.

This explanation looks reasonable to me.

A block ending in unreachable does not necessarily crash, there are two very interesting cases where it doesn't:

  1. C++ throw
  2. longjmp

Maybe longjmp doesn't matter because you will get new coverage after returning to setjmp, but you don't have coverage for the many different ways of jumping to the same setjmp block.

kcc added a comment.Feb 27 2019, 1:16 PM

Reid has a good point, and it equally applies to the current code, which doesn't instrument unreachable blocks.

E.g. here:

int foo(int *a) {
  if (a)
    return 666;
  throw 42;
}

if the throw happens we don't get any coverage signal from it because the throw block is not instrumented.
This might mean a minor loss of signal for coverage, or a major loss of signal for other users of SanitizerCoverage.

Here we will get the coverage today, but IIUC not with this patch:

int foo(int *a) {
  if (a)
    *a = 666;
  throw 42;
}

Today, we split a critical edge that leads to throw, and instrument the new BB.

So, apparently, checking for isa<UnreachableInst>(DestBB->getTerminator()) is not the right way to check if the block entry is unreachable.

Thoughts?

rnk added a comment.Feb 27 2019, 2:51 PM
In D57982#1412610, @kcc wrote:

So, apparently, checking for isa<UnreachableInst>(DestBB->getTerminator()) is not the right way to check if the block entry is unreachable.

Thoughts?

Here is an idea for the code change: https://reviews.llvm.org/D58740

So, if I understand right, once we do that, we don't need this customization point for SplitAllCriticalEdges?

I think we still need some kind of critical edge splitting change. The issue I was seeing is that we forced all critical edges to be split and then put coverage instrumentation in the block we created for the split. That block doesn't have an unreachable instruction in it. But the only successor of that block does. That coverage instrumentation from the split block got emitted into the final binary, but there was no code after it before the next function started. I don't think https://reviews.llvm.org/D58740 changes that.

Only handle avoid splitting for blocks that start with an unreachable instead of ending with one.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 11:47 PM
morehouse added inline comments.Mar 5 2019, 2:34 PM
test/Instrumentation/SanitizerCoverage/unreachable-critedge.ll
42

Can we simplify these checks so the test won't break by some unrelated compiler change?

Simplify test checks.

rnk accepted this revision.Mar 12 2019, 11:08 AM

Looks good, I see why this is needed now.

This revision is now accepted and ready to land.Mar 12 2019, 11:08 AM
This revision was automatically updated to reflect the committed changes.