This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Target/NVPTX/CMakeLists.txt
- llvm/lib/Target/NVPTX/NVPTX.h
- llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
- llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
- llvm/test/CodeGen/NVPTX/unreachable.ll

Differential D152789

NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG.
Closed · Public

Authored by maleadt on Jun 13 2023, 2:06 AM.


Details

Reviewers

jdoerfert
tra
arsenm
jlebar

Commits

rG1ee4d880e876: NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the…

Summary

PTX does not have a notion of unreachable, which results in emitted basic
blocks having an edge to the next block:

block1:
  call @does_not_return();
  // unreachable
block2:
  // ptxas will create a CFG edge from block1 to block2

This may result in significant changes to the control flow graph, e.g., when
LLVM moves unreachable blocks to the end of the function. That's a problem
in the context of divergent control flow, as ptxas uses the CFG to determine
divergent regions, and some instructions must not be executed divergently.

For example, bar.sync is not allowed to be executed divergently on Pascal
or earlier. If we start with the following:

entry:
  // start of divergent region
  @%p0 bra cont;
  @%p1 bra unlikely;
  ...
  bra.uni cont;
unlikely:
  ...
  // unreachable
cont:
  // end of divergent region
  bar.sync 0;
  bra.uni exit;
exit:
  ret;

it is transformed by the branch-folder and block-placement passes to:

entry:
  // start of divergent region
  @%p0 bra cont;
  @%p1 bra unlikely;
  ...
  bra.uni cont;
cont:
  bar.sync 0;
  bra.uni exit;
unlikely:
  ...
  // unreachable
exit:
  // end of divergent region
  ret;

After moving the unlikely block to the end of the function, it has an edge
to the exit block, which widens the divergent region and makes the bar.sync
instruction happen divergently. That causes incorrect results, which we have been
running into for years with Julia code (which emits trap + unreachable pairs
all over the place).

To work around this, add an exit instruction before every unreachable,
as ptxas understands that exit terminates the CFG. Note that trap is not
equivalent, and only future versions of ptxas will model it like exit.
Another alternative would be to emit a branch to the block itself, but exit
seems to me like a cleaner way to represent unreachable.

Also note that this may not be sufficient: the block with unreachable control
flow may be branched to from different divergent regions (e.g., after block
merging), in which case ptxas could still reconstruct a CFG where those
divergent regions are merged (I haven't confirmed this, but also haven't
encountered this pattern in the wild yet):

entry:
  // start of divergent region 1
  @%p0 bra cont1;
  @%p1 bra unlikely;
  bra.uni cont1;
cont1:
  // intended end of divergent region 1
  bar.sync 0;
  // start of divergent region 2
  @%p2 bra cont2;
  @%p3 bra unlikely;
  bra.uni cont2;
cont2:
  // intended end of divergent region 2
  bra.uni exit;
unlikely:
  ...
  exit;
exit:
  // possible end of merged divergent region?

I originally tried to avoid the above by cloning paths towards unreachable and
splitting the outgoing edges, but that quickly became too complicated. I propose
we go with the simple solution first, also because modern GPUs (Volta and newer)
with more flexible hardware thread schedulers don't even suffer from this issue.

Finally, although I expect this to fix most of
https://bugs.llvm.org/show_bug.cgi?id=27738, I do still encounter
miscompilations with Julia's unreachable-heavy code when targeting these
older GPUs using an older ptxas version (specifically, from CUDA 11.4 or
below). This is likely due to related bugs in ptxas which have been fixed
since, as I have filed several reproducers with NVIDIA over the past couple of
years. I'm not inclined to work around those issues here, and will instead
recommend that our users upgrade to CUDA 11.5+ when using these GPUs.

Also see:

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

maleadt created this revision. Jun 13 2023, 2:06 AM

Herald added a project: Restricted Project. Jun 13 2023, 2:06 AM

Herald added subscribers: mattd, gchakrabarti, asavonic, hiraditya.

maleadt requested review of this revision. Jun 13 2023, 2:06 AM

Herald added a project: Restricted Project. Jun 13 2023, 2:06 AM

Herald added subscribers: llvm-commits, wdng, jholewinski.

Harbormaster completed remote builds in B238429: Diff 530824. Jun 13 2023, 3:37 AM

vchuravy added a project: Restricted Project. Jun 13 2023, 10:11 AM

vchuravy added a subscriber: vchuravy.

It's unfortunate that we don't seem to have a way to lower unreachable via normal lowering mechanisms. AFAICT we don't even have a selection DAG node type for it.

This revision is now accepted and ready to land. Jun 13 2023, 10:46 AM

Why do we need the structurizer pass as well?

maleadt mentioned this in D153127: Don't add the structurizer pass. Jun 16 2023, 5:20 AM

Don't use the structurizer pass.

Harbormaster completed remote builds in B239385: Diff 532107. Jun 16 2023, 5:22 AM

In D152789#4418519, @jdoerfert wrote:

Why do we need the structurizer pass as well?

We don't, I accidentally added that to this PR.

I removed it, but I'm not sure if I did the right arc incantation (arc diff --update D152789) to push that change, as it's the only one the UI now shows.

Push the entire diff again.

Harbormaster completed remote builds in B239426: Diff 532157. Jun 16 2023, 8:46 AM

LG, thanks

I don't have commit access; can somebody land this for me? Or does this require more review?

In D152789#4434907, @maleadt wrote:

I don't have commit access; can somebody land this for me? Or does this require more review?

I can land it. How do you want me to attribute the patch? Should it be Tim Besard <tim@juliacomputing.com> or something else?

Tim Besard <tim@juliahub.com> would be better, thanks.

This revision was landed with ongoing or failed builds. Jun 21 2023, 11:41 AM

Closed by commit rG1ee4d880e876: NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the… (authored by maleadt, committed by tra).

This revision was automatically updated to reflect the committed changes.

tra added a commit: rG1ee4d880e876: NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the….

GitHub <noreply@github.com> mentioned this in rG5b7a7ec5a210: [NVPTX] Fix code generation for `trap-unreachable`. (#67478). Sep 30 2023, 10:59 PM

Revision Contents

Path                                              Size
llvm/lib/Target/NVPTX/CMakeLists.txt              1 line
llvm/lib/Target/NVPTX/NVPTX.h                     1 line
llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp   126 lines
llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp      4 lines
llvm/test/CodeGen/NVPTX/unreachable.ll            23 lines

Diff 533353

llvm/lib/Target/NVPTX/CMakeLists.txt

@@ 19 lines not shown @@ set(NVPTXCodeGen_sources
   NVPTXGenericToNVVM.cpp
   NVPTXISelDAGToDAG.cpp
   NVPTXISelLowering.cpp
   NVPTXImageOptimizer.cpp
   NVPTXInstrInfo.cpp
   NVPTXLowerAggrCopies.cpp
   NVPTXLowerArgs.cpp
   NVPTXLowerAlloca.cpp
+  NVPTXLowerUnreachable.cpp
   NVPTXPeephole.cpp
   NVPTXMCExpr.cpp
   NVPTXPrologEpilogPass.cpp
   NVPTXRegisterInfo.cpp
   NVPTXReplaceImageHandles.cpp
   NVPTXSubtarget.cpp
   NVPTXTargetMachine.cpp
   NVPTXTargetTransformInfo.cpp
@@ 34 lines not shown @@

llvm/lib/Target/NVPTX/NVPTX.h

@@ 41 lines not shown @@
 ModulePass *createNVPTXCtorDtorLoweringLegacyPass();
 FunctionPass *createNVVMIntrRangePass(unsigned int SmVersion);
 FunctionPass *createNVVMReflectPass(unsigned int SmVersion);
 MachineFunctionPass *createNVPTXPrologEpilogPass();
 MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
 FunctionPass *createNVPTXLowerArgsPass();
 FunctionPass *createNVPTXLowerAllocaPass();
+FunctionPass *createNVPTXLowerUnreachablePass();
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();

 struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
   NVVMIntrRangePass();
   NVVMIntrRangePass(unsigned SmVersion) : SmVersion(SmVersion) {}
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

@@ 139 lines not shown @@

llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp

This file was added.

//===-- NVPTXLowerUnreachable.cpp - Lower unreachables to exit =====--===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// PTX does not have a notion of `unreachable`, which results in emitted basic
// blocks having an edge to the next block:
//
// block1:
//   call @does_not_return();
//   // unreachable
// block2:
//   // ptxas will create a CFG edge from block1 to block2
//
// This may result in significant changes to the control flow graph, e.g., when
// LLVM moves unreachable blocks to the end of the function. That's a problem
// in the context of divergent control flow, as `ptxas` uses the CFG to
// determine divergent regions, and some intructions may not be executed
// divergently.
//
// For example, `bar.sync` is not allowed to be executed divergently on Pascal
// or earlier. If we start with the following:
//
// entry:
//   // start of divergent region
//   @%p0 bra cont;
//   @%p1 bra unlikely;
//   ...
//   bra.uni cont;
// unlikely:
//   ...
//   // unreachable
// cont:
//   // end of divergent region
//   bar.sync 0;
//   bra.uni exit;
// exit:
//   ret;
//
// it is transformed by the branch-folder and block-placement passes to:
//
// entry:
//   // start of divergent region
//   @%p0 bra cont;
//   @%p1 bra unlikely;
//   ...
//   bra.uni cont;
// cont:
//   bar.sync 0;
//   bra.uni exit;
// unlikely:
//   ...
//   // unreachable
// exit:
//   // end of divergent region
//   ret;
//
// After moving the `unlikely` block to the end of the function, it has an edge
// to the `exit` block, which widens the divergent region and makes the
// `bar.sync` instruction happen divergently.
//
// To work around this, we add an `exit` instruction before every `unreachable`,
// as `ptxas` understands that exit terminates the CFG. Note that `trap` is not
// equivalent, and only future versions of `ptxas` will model it like `exit`.
//
//===----------------------------------------------------------------------===//

#include "NVPTX.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Type.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace llvm {
void initializeNVPTXLowerUnreachablePass(PassRegistry &);
}

namespace {
class NVPTXLowerUnreachable : public FunctionPass {
  bool runOnFunction(Function &F) override;

public:
  static char ID; // Pass identification, replacement for typeid
  NVPTXLowerUnreachable() : FunctionPass(ID) {}
  StringRef getPassName() const override {
    return "add an exit instruction before every unreachable";
  }
};
} // namespace

char NVPTXLowerUnreachable::ID = 1;

INITIALIZE_PASS(NVPTXLowerUnreachable, "nvptx-lower-unreachable",
                "Lower Unreachable", false, false)

// =============================================================================
// Main function for this pass.
// =============================================================================
bool NVPTXLowerUnreachable::runOnFunction(Function &F) {
  if (skipFunction(F))
    return false;

  LLVMContext &C = F.getContext();
  FunctionType *ExitFTy = FunctionType::get(Type::getVoidTy(C), false);
  InlineAsm *Exit = InlineAsm::get(ExitFTy, "exit;", "", true);

  bool Changed = false;
  for (auto &BB : F)
    for (auto &I : BB) {
      if (auto unreachableInst = dyn_cast<UnreachableInst>(&I)) {
        Changed = true;
        CallInst::Create(ExitFTy, Exit, "", unreachableInst);
      }
    }
  return Changed;
}

FunctionPass *llvm::createNVPTXLowerUnreachablePass() {
  return new NVPTXLowerUnreachable();
}

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

@@ 66 lines not shown @@
 void initializeGenericToNVVMLegacyPassPass(PassRegistry &);
 void initializeNVPTXAllocaHoistingPass(PassRegistry &);
 void initializeNVPTXAssignValidGlobalNamesPass(PassRegistry&);
 void initializeNVPTXAtomicLowerPass(PassRegistry &);
 void initializeNVPTXCtorDtorLoweringLegacyPass(PassRegistry &);
 void initializeNVPTXLowerAggrCopiesPass(PassRegistry &);
 void initializeNVPTXLowerAllocaPass(PassRegistry &);
+void initializeNVPTXLowerUnreachablePass(PassRegistry &);
 void initializeNVPTXCtorDtorLoweringLegacyPass(PassRegistry &);
 void initializeNVPTXLowerArgsPass(PassRegistry &);
 void initializeNVPTXProxyRegErasurePass(PassRegistry &);
 void initializeNVVMIntrRangePass(PassRegistry &);
 void initializeNVVMReflectPass(PassRegistry &);
 void initializeNVPTXAAWrapperPassPass(PassRegistry &);
 void initializeNVPTXExternalAAWrapperPass(PassRegistry &);

@@ 10 lines not shown @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNVPTXTarget() {
   initializeNVVMReflectPass(PR);
   initializeNVVMIntrRangePass(PR);
   initializeGenericToNVVMLegacyPassPass(PR);
   initializeNVPTXAllocaHoistingPass(PR);
   initializeNVPTXAssignValidGlobalNamesPass(PR);
   initializeNVPTXAtomicLowerPass(PR);
   initializeNVPTXLowerArgsPass(PR);
   initializeNVPTXLowerAllocaPass(PR);
+  initializeNVPTXLowerUnreachablePass(PR);
   initializeNVPTXCtorDtorLoweringLegacyPass(PR);
   initializeNVPTXLowerAggrCopiesPass(PR);
   initializeNVPTXProxyRegErasurePass(PR);
   initializeNVPTXDAGToDAGISelPass(PR);
   initializeNVPTXAAWrapperPassPass(PR);
   initializeNVPTXExternalAAWrapperPass(PR);
 }

@@ 286 lines not shown @@ void NVPTXPassConfig::addIRPasses() {
   //
   // but EarlyCSE can do neither of them.
   if (getOptLevel() != CodeGenOpt::None) {
     addEarlyCSEOrGVNPass();
     if (!DisableLoadStoreVectorizer)
       addPass(createLoadStoreVectorizerPass());
     addPass(createSROAPass());
   }
+
+  addPass(createNVPTXLowerUnreachablePass());
 }

 bool NVPTXPassConfig::addInstSelector() {
   const NVPTXSubtarget &ST = *getTM<NVPTXTargetMachine>().getSubtargetImpl();

   addPass(createLowerAggrCopies());
   addPass(createAllocaHoisting());
   addPass(createNVPTXISelDag(getNVPTXTargetMachine(), getOptLevel()));
@@ 92 lines not shown @@

llvm/test/CodeGen/NVPTX/unreachable.ll

This file was added.

; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}

; CHECK: .extern .func throw
declare void @throw() #0

; CHECK: .entry kernel_func
define void @kernel_func() {
; CHECK: call.uni
; CHECK: throw,
  call void @throw()
; CHECK: exit
  unreachable
}

attributes #0 = { noreturn }

!nvvm.annotations = !{!1}

!1 = !{ptr @kernel_func, !"kernel", i32 1}