This is an archive of the discontinued LLVM Phabricator instance.


Differential D57982

[SanitizerCoverage] Avoid splitting critical edges when destination is a basic block containing unreachable
ClosedPublic

Authored by craig.topper on Feb 8 2019, 3:26 PM.


Details

Reviewers

yln
kcc
morehouse
rnk

Commits

rG03e93f514a54: [SanitizerCoverage] Avoid splitting critical edges when destination is a basic…
rL355947: [SanitizerCoverage] Avoid splitting critical edges when destination is a basic…

Summary

This patch adds a new option to SplitAllCriticalEdges and uses it to avoid splitting critical edges when the destination basic block ends with unreachable. Otherwise, if we split the critical edge, sanitizer coverage will instrument the new block that gets inserted for the split. Since this block itself shouldn't be reachable, that instrumentation is pointless. These basic blocks stick around and generate assembly, but they don't end in sane control flow and might get placed at the end of the function, which makes it look like one function has code that flows into the next function.
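To make the idea concrete, here is a minimal sketch of critical-edge splitting on a toy CFG, assuming an edge is critical when its source has several successors and its destination has several predecessors. It does not use the LLVM API: ToyBlock, ToyCFG, makeExampleCFG, splitAllCriticalEdges and the IsUnreachableDest flag are hypothetical names for illustration, and how a destination comes to be flagged is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG, not the LLVM API: each block lists its successors by index.
struct ToyBlock {
  std::vector<std::size_t> Succs;
  // Hypothetical flag: marks a destination that the option should skip.
  bool IsUnreachableDest = false;
};

struct ToyCFG {
  std::vector<ToyBlock> Blocks;

  // Counts incoming edges of BB (parallel edges are counted separately).
  std::size_t numPreds(std::size_t BB) const {
    std::size_t N = 0;
    for (const ToyBlock &B : Blocks)
      for (std::size_t S : B.Succs)
        if (S == BB)
          ++N;
    return N;
  }

  // Critical edge: source has several successors, destination has several
  // predecessors.
  bool isCritical(std::size_t From, std::size_t To) const {
    assert(From < Blocks.size() && To < Blocks.size());
    return Blocks[From].Succs.size() > 1 && numPreds(To) > 1;
  }
};

// Routes every critical edge through a fresh block and returns how many
// blocks were inserted. With IgnoreUnreachableDests set, edges into flagged
// destinations are left alone.
std::size_t splitAllCriticalEdges(ToyCFG &G, bool IgnoreUnreachableDests) {
  std::size_t NumSplit = 0;
  const std::size_t OrigSize = G.Blocks.size();
  for (std::size_t From = 0; From < OrigSize; ++From) {
    for (std::size_t I = 0; I < G.Blocks[From].Succs.size(); ++I) {
      const std::size_t To = G.Blocks[From].Succs[I];
      if (!G.isCritical(From, To))
        continue;
      if (IgnoreUnreachableDests && G.Blocks[To].IsUnreachableDest)
        continue;
      ToyBlock NewBB;
      NewBB.Succs.push_back(To);
      G.Blocks.push_back(NewBB);
      G.Blocks[From].Succs[I] = G.Blocks.size() - 1;
      ++NumSplit;
    }
  }
  return NumSplit;
}

// Two switch-like blocks (0 and 1) that share block 7 as their default
// destination; blocks 2..6 are plain exits.
ToyCFG makeExampleCFG() {
  ToyCFG G;
  G.Blocks.resize(8);
  G.Blocks[0].Succs = {7, 2, 3, 1};
  G.Blocks[1].Succs = {7, 4, 5, 6};
  G.Blocks[7].IsUnreachableDest = true;
  return G;
}
```

In this toy, splitting without the option inserts two extra blocks in front of block 7, and each of them would be a candidate for coverage instrumentation. With the option set, no block is inserted.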

The test case included here doesn't exist in the repo yet, but I made the patch relative to it to show the diff from this change.

This showed up while compiling the Linux kernel with clang. The kernel has a tool called objtool, which detected code that appeared to flow from one function into the next: https://github.com/ClangBuiltLinux/linux/issues/351#issuecomment-461698884

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Feb 8 2019, 3:26 PM

nickdesaulniers added a subscriber: nickdesaulniers. Feb 8 2019, 3:31 PM

But since this block itself shouldn't be reachable this is pointless.

Your testcase shows an empty unreachable block, but it's also possible to have a block that ends with an unreachable, but still has reachable code, like a call to exit().

In D57982#1391319, @efriedma wrote:

But since this block itself shouldn't be reachable this is pointless.

Your testcase shows an empty unreachable block, but it's also possible to have a block that ends with an unreachable, but still has reachable code, like a call to exit().

Should this code from SanitizerCoverage.cpp be checking that the first non-debug instruction is an unreachable instead of the terminator?

static bool shouldInstrumentBlock(const Function &F, const BasicBlock *BB,
                                  const DominatorTree *DT,
                                  const PostDominatorTree *PDT,
                                  const SanitizerCoverageOptions &Options) {
  // Don't insert coverage for unreachable blocks: we will never call
  // __sanitizer_cov() for them, so counting them in
  // NumberOfInstrumentedBlocks() might complicate calculation of code coverage
  // percentage. Also, unreachable instructions frequently have no debug
  // locations.
  if (isa<UnreachableInst>(BB->getTerminator()))
    return false;

In D57982#1391344, @craig.topper wrote:

Should this code from SanitizerCoverage.cpp be checking that the first non-debug instruction is an unreachable instead of the terminator?

static bool shouldInstrumentBlock(const Function &F, const BasicBlock *BB,
                                  const DominatorTree *DT,
                                  const PostDominatorTree *PDT,
                                  const SanitizerCoverageOptions &Options) {
  // Don't insert coverage for unreachable blocks: we will never call
  // __sanitizer_cov() for them, so counting them in
  // NumberOfInstrumentedBlocks() might complicate calculation of code coverage
  // percentage. Also, unreachable instructions frequently have no debug
  // locations.
  if (isa<UnreachableInst>(BB->getTerminator()))
    return false;

Here's how I interpret the comment. I don't believe the "we will never call __sanitizer_cov for them" part, though that may just be my lack of understanding of when __sanitizer_cov would be called. But I think the second portion indicates that this is *intended* to avoid instrumenting fatal error patterns typically produced by macros like assert. Sanitizer coverage mostly exists to guide fuzzers, and I think, in the context of a codebase that does not use exceptions, i.e. most existing users (no value judgement), it's uninteresting to explore paths that lead to rejecting inputs with a fatal error. Of course, it's very interesting to explore longjmp and throw, which also create blocks that end in unreachable.

I think we'll need guidance from @kcc to know what needs to be done.

Ping

Herald added a subscriber: jdoerfert. Feb 26 2019, 1:51 PM

@vitalybuka @morehouse, any thoughts on this comment:
https://github.com/llvm/llvm-project/blob/29ac3a5b822ba8c097a3ae78d983cdb94da43dd4/llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp#L457

I don't think it is correct. I think sanitizer coverage should instrument blocks that end in unreachable, because they are in fact reachable: sometimes asserts and CHECKs fail, or exceptions are thrown.

I don't know all the details of when we might have an UnreachableInst, but I think in general instrumenting blocks that end in unreachable is unhelpful for fuzzing, since we are about to crash anyway (and will therefore be saving the current input whether we "count" the new coverage or not).

Maybe there are cases where UnreachableInst is actually reachable and we should consider instrumenting those with counters, but that is beyond my LLVM knowledge.

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf
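A small illustration of this shape, written for this page and not taken from the godbolt link: the names fail, check and SideEffects are hypothetical. The branch body below does real work and then calls a [[noreturn]] function; a compiler would typically place an unreachable right after that call, so the block ends in unreachable even though its beginning executes.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counter standing in for "reachable code" in the block.
int SideEffects = 0;

// Declared noreturn: control never comes back to the caller through a
// normal return. Here it leaves by throwing.
[[noreturn]] void fail() {
  throw std::runtime_error("input rejected");
}

int check(int X) {
  if (X < 0) {
    ++SideEffects; // Executes whenever the branch is taken.
    fail();        // The compiler may put an unreachable after this call.
  }
  return X;
}
```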

In D57982#1412282, @craig.topper wrote:

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.
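The save rule described here can be sketched as follows. This is only an illustration of the argument, not code from libFuzzer or SanitizerCoverage; RunResult and shouldSaveInput are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Hypothetical outcome of running one input under a coverage-guided fuzzer.
struct RunResult {
  std::set<int> BlocksHit; // IDs of instrumented blocks the input reached.
  bool Crashed = false;
};

// Returns true if the input would be saved: it either crashed or reached an
// instrumented block that had not been seen before. Seen is updated with
// every block the input hit.
bool shouldSaveInput(const RunResult &R, std::set<int> &Seen) {
  bool NewCoverage = false;
  for (int BB : R.BlocksHit)
    if (Seen.insert(BB).second)
      NewCoverage = true;
  return R.Crashed || NewCoverage;
}
```

Under this rule a crashing input is saved even when it adds no new block, which is the reason given for not instrumenting a block that always crashes. The rule says nothing about blocks that end in unreachable without crashing.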

In D57982#1412322, @morehouse wrote:

In D57982#1412282, @craig.topper wrote:

A block can end in unreachable, but still have reachable code at the beginning. If the block calls a function that is known not to return, the next instruction after the call will be UnreachableInst. For example https://godbolt.org/z/6AGtOf

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.

This explanation looks reasonable to me.

In D57982#1412373, @vitalybuka wrote:

In D57982#1412322, @morehouse wrote:

Sure, but that block is generally still not useful to instrument (for fuzzing). When fuzzing, we save all inputs that either (1) increase coverage as measured by SanitizerCoverage, or (2) crash. So if case 2 happens every time we touch a block that ends in unreachable, there's no point in instrumenting it so that case 1 happens too.

This explanation looks reasonable to me.

A block ending in unreachable does not necessarily crash. There are two very interesting cases where it doesn't:

- C++ throw
- longjmp

Maybe longjmp doesn't matter because you will get new coverage after returning to setjmp, but you don't have coverage for the many different ways of jumping to the same setjmp block.

Reid has a good point, and it equally applies to the current code, which doesn't instrument unreachable blocks.

E.g. here:

int foo(int *a) {
  if (a)
    return 666;
  throw 42;
}

If the throw happens, we don't get any coverage signal from it because the throw block is not instrumented.
This might mean a minor loss of signal for coverage, or a major loss of signal for other users of SanitizerCoverage.

Here we will get the coverage today, but IIUC not with this patch:

int foo(int *a) {
  if (a)
    *a = 666;
  throw 42;
}

Today, we split a critical edge that leads to throw, and instrument the new BB.

So, apparently, checking for isa<UnreachableInst>(DestBB->getTerminator()) is not the right way to check if the block entry is unreachable.

Thoughts?

In D57982#1412610, @kcc wrote:

So, apparently, checking for isa<UnreachableInst>(DestBB->getTerminator()) is not the right way to check if the block entry is unreachable.

Thoughts?

Here is an idea for the code change: https://reviews.llvm.org/D58740

So, if I understand right, once we do that, we don't need this customization point for SplitAllCriticalEdges?

I think we still need some kind of critical edge splitting change. The issue I was seeing is that we forced all critical edges to be split and then put coverage instrumentation in the block we created for the split. That block doesn't have an unreachable instruction in it. But the only successor of that block does. That coverage instrumentation from the split block got emitted into the final binary, but there was no code after it before the next function started. I don't think https://reviews.llvm.org/D58740 changes that.

Only avoid splitting edges to blocks that start with an unreachable, instead of blocks that end with one.
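The difference between the two checks can be sketched on a toy instruction list. This is not the LLVM API: Op, Block, endsWithUnreachable and startsWithUnreachable are hypothetical names, and skipping Phi and Dbg entries only loosely mirrors what getFirstNonPHIOrDbgOrLifetime does in the final diff.

```cpp
#include <cassert>
#include <vector>

// Toy instruction kinds; a block is just a sequence of them.
enum class Op { Phi, Dbg, Call, Unreachable, Ret };
using Block = std::vector<Op>;

// Terminator check: true for any block whose last instruction is
// unreachable, including blocks that do real work first.
bool endsWithUnreachable(const Block &BB) {
  return !BB.empty() && BB.back() == Op::Unreachable;
}

// Entry check: true only if the first instruction that is not a phi or a
// debug entry is unreachable, i.e. the block does nothing at all.
bool startsWithUnreachable(const Block &BB) {
  for (Op I : BB) {
    if (I == Op::Phi || I == Op::Dbg)
      continue;
    return I == Op::Unreachable;
  }
  return false;
}
```

A block that calls a throwing or noreturn function and then hits unreachable passes the first check but not the second, so edges into it would still be split and instrumented.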

Herald added a project: Restricted Project. Mar 4 2019, 11:47 PM

Harbormaster completed remote builds in B28793: Diff 189282. Mar 4 2019, 11:48 PM

morehouse added inline comments. Mar 5 2019, 2:34 PM

test/Instrumentation/SanitizerCoverage/unreachable-critedge.ll
Line 42 (on Diff #189282): Can we simplify these checks so the test won't be broken by some unrelated compiler change?

Simplify test checks.

Looks good, I see why this is needed now.

This revision is now accepted and ready to land. Mar 12 2019, 11:08 AM

LGTM

Closed by commit rL355947: [SanitizerCoverage] Avoid splitting critical edges when destination is a basic… (authored by ctopper). Mar 12 2019, 11:21 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                        Size
llvm/trunk/include/llvm/Transforms/Utils/BasicBlockUtils.h                  6 lines
llvm/trunk/lib/Transforms/Instrumentation/SanitizerCoverage.cpp             2 lines
llvm/trunk/lib/Transforms/Utils/BreakCriticalEdges.cpp                      4 lines
llvm/trunk/test/Instrumentation/SanitizerCoverage/unreachable-critedge.ll   46 lines

Diff 190301

llvm/trunk/include/llvm/Transforms/Utils/BasicBlockUtils.h

 ...
 struct CriticalEdgeSplittingOptions {
   DominatorTree *DT;
   PostDominatorTree *PDT;
   LoopInfo *LI;
   MemorySSAUpdater *MSSAU;
   bool MergeIdenticalEdges = false;
   bool KeepOneInputPHIs = false;
   bool PreserveLCSSA = false;
+  bool IgnoreUnreachableDests = false;
 
   CriticalEdgeSplittingOptions(DominatorTree *DT = nullptr,
                                LoopInfo *LI = nullptr,
                                MemorySSAUpdater *MSSAU = nullptr,
                                PostDominatorTree *PDT = nullptr)
       : DT(DT), PDT(PDT), LI(LI), MSSAU(MSSAU) {}
 
   CriticalEdgeSplittingOptions &setMergeIdenticalEdges() {
     MergeIdenticalEdges = true;
     return *this;
   }
 
   CriticalEdgeSplittingOptions &setKeepOneInputPHIs() {
     KeepOneInputPHIs = true;
     return *this;
   }
 
   CriticalEdgeSplittingOptions &setPreserveLCSSA() {
     PreserveLCSSA = true;
     return *this;
   }
+
+  CriticalEdgeSplittingOptions &setIgnoreUnreachableDests() {
+    IgnoreUnreachableDests = true;
+    return *this;
+  }
 };
 
 /// If this edge is a critical edge, insert a new node to split the critical
 /// edge. This will update the analyses passed in through the option struct.
 /// This returns the new block if the edge was split, null otherwise.
 ///
 /// If MergeIdenticalEdges in the options struct is true (not the default),
 /// all edges from TI to the specified successor will be merged into the same
 ...

llvm/trunk/lib/Transforms/Instrumentation/SanitizerCoverage.cpp

 ...
   if (isa<UnreachableInst>(F.getEntryBlock().getTerminator()))
     return false;
   // Don't instrument functions using SEH for now. Splitting basic blocks like
   // we do for coverage breaks WinEHPrepare.
   // FIXME: Remove this when SEH no longer uses landingpad pattern matching.
   if (F.hasPersonalityFn() &&
       isAsynchronousEHPersonality(classifyEHPersonality(F.getPersonalityFn())))
     return false;
   if (Options.CoverageType >= SanitizerCoverageOptions::SCK_Edge)
-    SplitAllCriticalEdges(F);
+    SplitAllCriticalEdges(F, CriticalEdgeSplittingOptions().setIgnoreUnreachableDests());
   SmallVector<Instruction *, 8> IndirCalls;
   SmallVector<BasicBlock *, 16> BlocksToInstrument;
   SmallVector<Instruction *, 8> CmpTraceTargets;
   SmallVector<Instruction *, 8> SwitchTraceTargets;
   SmallVector<BinaryOperator *, 8> DivTraceTargets;
   SmallVector<GetElementPtrInst *, 8> GepTraceTargets;
 
   const DominatorTree *DT =
 ...

llvm/trunk/lib/Transforms/Utils/BreakCriticalEdges.cpp

 ...
 llvm::SplitCriticalEdge(Instruction *TI, unsigned SuccNum,
 ...
   // Splitting the critical edge to a pad block is non-trivial. Don't do
   // it in this generic function.
   if (DestBB->isEHPad()) return nullptr;
 
   // Don't split the non-fallthrough edge from a callbr.
   if (isa<CallBrInst>(TI) && SuccNum > 0)
     return nullptr;
 
+  if (Options.IgnoreUnreachableDests &&
+      isa<UnreachableInst>(DestBB->getFirstNonPHIOrDbgOrLifetime()))
+    return nullptr;
+
   // Create a new basic block, linking it into the CFG.
   BasicBlock *NewBB = BasicBlock::Create(TI->getContext(),
                       TIBB->getName() + "." + DestBB->getName() + "_crit_edge");
   // Create our unconditional branch.
   BranchInst *NewBI = BranchInst::Create(DestBB, NewBB);
   NewBI->setDebugLoc(TI->getDebugLoc());
 
   // Branch to the new block, breaking the edge.
 ...

llvm/trunk/test/Instrumentation/SanitizerCoverage/unreachable-critedge.ll

+; RUN: opt < %s -S -sancov -sanitizer-coverage-level=3 | FileCheck %s
+
+; The critical edges to unreachable_bb should not be split.
+define i32 @foo(i32 %c, i32 %d) {
+; CHECK-LABEL: @foo(
+; CHECK: switch i32 [[C:%.*]], label [[UNREACHABLE_BB:%.*]] [
+; CHECK-NEXT: i32 0, label %exit0
+; CHECK-NEXT: i32 1, label %exit1
+; CHECK-NEXT: i32 2, label %cont
+; CHECK-NEXT: ]
+; CHECK: cont:
+; CHECK: switch i32 [[D:%.*]], label [[UNREACHABLE_BB]] [
+; CHECK-NEXT: i32 0, label %exit2
+; CHECK-NEXT: i32 1, label %exit3
+; CHECK-NEXT: i32 2, label %exit4
+; CHECK-NEXT: ]
+; CHECK: unreachable_bb:
+; CHECK-NEXT: unreachable
+;
+  switch i32 %c, label %unreachable_bb [i32 0, label %exit0
+                                        i32 1, label %exit1
+                                        i32 2, label %cont]
+
+cont:
+  switch i32 %d, label %unreachable_bb [i32 0, label %exit2
+                                        i32 1, label %exit3
+                                        i32 2, label %exit4]
+
+exit0:
+  ret i32 0
+
+exit1:
+  ret i32 1
+
+exit2:
+  ret i32 2
+
+exit3:
+  ret i32 3
+
+exit4:
+  ret i32 4
+
+unreachable_bb:
+  unreachable
+}