This is an archive of the discontinued LLVM Phabricator instance.

Differential D26947

[NVPTX] Structurize the NVPTX CFG.
Changes Planned · Public

Authored by jlebar on Nov 21 2016, 4:30 PM.

Details

Reviewers

tra

Summary

This fixes the failures in PR27738, our longstanding incorrect codegen
bug. It appears that our hypothesis that we were generating control
flow that's too complicated for ptxas to analyze correctly was right.

Diff Detail

Build Status

Buildable 1472
Build 1472: arc lint + arc unit

Event Timeline

jlebar updated this revision to Diff 78801. Nov 21 2016, 4:30 PM

jlebar retitled this revision to [NVPTX] Structurize the NVPTX CFG.

jlebar updated this object.

jlebar added a reviewer: tra.

jlebar added subscribers: andrewcorrigan, llvm-commits, hfinkel.

Herald added a subscriber: jholewinski. Nov 21 2016, 4:30 PM

jlebar updated this object. Nov 21 2016, 4:32 PM

jlebar updated this object.

jlebar added subscribers: jingyue, majnemer.

@majnemer points out that the structurizer has known bugs, e.g. PR25378, PR31076, PR27488.

Guess we may need to fix those before turning this on. Bummer.

Bummer, indeed. But we do see the light at the end of this particular tunnel. :-)

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

In D26947#602031, @majnemer wrote:

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

I tried and it didn't make any difference in my testcases.

If we have a test where it does make a difference, I am absolutely onboard with turning it on without getting the structurizer.

In D26947#602034, @jlebar wrote:

In D26947#602031, @majnemer wrote:

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

I tried and it didn't make any difference in my testcases.

If we have a test where it does make a difference, I am absolutely onboard with turning it on without getting the structurizer.

Here is one:

$ cat bt.ll

define void @f(i32 %x, i32 %n, i32* %p) {
entry:
  %tmp29 = lshr i32 %x, %n
  %tmp3 = and i32 %tmp29, 1
  %tmp4 = icmp eq i32 %tmp3, 0
  br i1 %tmp4, label %bb, label %UnifiedReturnBlock

bb:
  store volatile i32 0, i32* %p
  ret void

UnifiedReturnBlock:
  ret void
}

$ diff <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=true) <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=false)
31a32
> 	ret;
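
The one-line difference above is consistent with tail merging folding the two identical `ret` tails into one. As far as I understand the LLVM code generator, the branch-folding pass tail-merges only when the target does not require a structured CFG, so setRequiresStructuredCFG(true) would be expected to have the same effect on this input as -enable-tail-merge=false. The sketch below is a self-contained toy model of that gating, not LLVM code: `ToyTarget`, `Block`, `tailMergeAllowed`, `emit`, and `testCase` are hypothetical names, the instruction strings are placeholders rather than real PTX, and the "merge" handles only the two-return shape of this test case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a target machine; NOT an LLVM class.
struct ToyTarget {
  bool RequireStructuredCFG = false;
  void setRequiresStructuredCFG(bool V) { RequireStructuredCFG = V; }
  bool requiresStructuredCFG() const { return RequireStructuredCFG; }
};

// A basic block: a label plus placeholder instruction strings.
struct Block {
  std::string Label;
  std::vector<std::string> Insts;
};

// Assumed gating rule: tail merging runs only when the user enabled it
// AND the target does not require a structured CFG.
bool tailMergeAllowed(const ToyTarget &TM, bool EnableTailMerge) {
  return EnableTailMerge && !TM.requiresStructuredCFG();
}

// Emit blocks in layout order. When merging is allowed and a block ends
// with the single instruction that makes up the whole next block, drop
// the duplicate and fall through instead.
std::vector<std::string> emit(std::vector<Block> Blocks, bool Merge) {
  std::vector<std::string> Out;
  for (std::size_t I = 0; I < Blocks.size(); ++I) {
    Block &B = Blocks[I];
    if (Merge && I + 1 < Blocks.size()) {
      const Block &Next = Blocks[I + 1];
      if (Next.Insts.size() == 1 && !B.Insts.empty() &&
          B.Insts.back() == Next.Insts[0])
        B.Insts.pop_back();
    }
    Out.push_back(B.Label + ":");
    for (const std::string &Inst : B.Insts)
      Out.push_back("  " + Inst);
  }
  return Out;
}

// The CFG shape of the test case above, with placeholder instructions.
std::vector<Block> testCase() {
  return {{"entry", {"branch-if-bit-set UnifiedReturnBlock"}},
          {"bb", {"store", "ret"}},
          {"UnifiedReturnBlock", {"ret"}}};
}
```

Calling `emit(testCase(), tailMergeAllowed(TM, true))` on a default `ToyTarget` yields one fewer line than the same call after `TM.setRequiresStructuredCFG(true)`, mirroring the single extra `ret;` in the diff.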

Here is one:

Capturing IRC conversation: I meant, a difference in the correctness of ptxas's generated code.

I am definitely willing to hear arguments that this is too conservative, it's just my instinct not to change stuff without a failing testcase.

Waiting for a more correct structurizer before we turn this on.

vchuravy added a subscriber: vchuravy. Oct 27 2017, 9:43 AM

Revision Contents

Path	Size
llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp	13 lines

Diff 78801

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

[105 lines not shown; context: NVPTXTargetMachine::NVPTXTargetMachine(const Target &T, const Triple &TT,]
                                        CodeGenOpt::Level OL, bool is64bit)
     // The pic relocation model is used regardless of what the client has
     // specified, as it is the only relocation model currently supported.
     : LLVMTargetMachine(T, computeDataLayout(is64bit), TT, CPU, FS, Options,
                         Reloc::PIC_, CM, OL),
       is64bit(is64bit),
       TLOF(make_unique<NVPTXTargetObjectFile>()),
       Subtarget(TT, CPU, FS, *this) {
+  // NVPTX does not strictly require a structured CFG. However, without a
+  // structured CFG we can create control flow that is too complicated for ptxas
+  // to analyze, resulting in subtly incorrect generated code, e.g. PR 27738.
+  setRequiresStructuredCFG(true);
+
   if (TT.getOS() == Triple::NVCL)
     drvInterface = NVPTX::NVCL;
   else
     drvInterface = NVPTX::CUDA;
   initAsmInfo();
 }

 NVPTXTargetMachine::~NVPTXTargetMachine() {}
[144 lines not shown; context: void NVPTXPassConfig::addIRPasses() {]
   // and
   //
   //   %0 = shl nsw %a, 2
   //   %1 = shl %a, 2
   //
   // but EarlyCSE can do neither of them.
   if (getOptLevel() != CodeGenOpt::None)
     addEarlyCSEOrGVNPass();
+
+  // NVPTX does not strictly require a structured CFG. However, without a
+  // structured CFG we can emit control flow that is too complicated for ptxas
+  // to analyze, resulting in subtly incorrect generated code, e.g. PR 27738.
+  //
+  // SkipUniformRegions == true lets us emit unstructured control flow where we
+  // can prove that it is uniform across all the threads in a warp.
+  addPass(createStructurizeCFGPass(/* SkipUniformRegions = */ true));
 }

 bool NVPTXPassConfig::addInstSelector() {
   const NVPTXSubtarget &ST = *getTM<NVPTXTargetMachine>().getSubtargetImpl();

   addPass(createLowerAggrCopies());
   addPass(createAllocaHoisting());
   addPass(createNVPTXISelDag(getNVPTXTargetMachine(), getOptLevel()));
[90 lines not shown]
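
The SkipUniformRegions comment in the second hunk rests on the idea that a branch can safely stay unstructured when every thread in a warp takes the same direction. The real pass has to prove this statically at compile time. The sketch below only illustrates the definition by inspecting one warp's branch predicate at run time. It is a toy under stated assumptions: `WarpSize`, `WarpPredicate`, `isUniformBranch`, and `shouldStructurize` are hypothetical names, not LLVM APIs, and a warp size of 32 is assumed.

```cpp
#include <array>
#include <cstddef>

// Assumed warp width; hypothetical constant for this illustration.
constexpr std::size_t WarpSize = 32;

// One branch condition value per thread in the warp.
using WarpPredicate = std::array<bool, WarpSize>;

// A branch is uniform when all threads agree on the condition.
bool isUniformBranch(const WarpPredicate &P) {
  for (bool B : P)
    if (B != P[0])
      return false;
  return true;
}

// Toy decision rule: with SkipUniformRegions set, leave uniform branches
// alone and structurize only the divergent ones.
bool shouldStructurize(const WarpPredicate &P, bool SkipUniformRegions) {
  return !(SkipUniformRegions && isUniformBranch(P));
}
```

With a predicate that is false for every thread, `shouldStructurize(P, true)` returns false. Flipping a single thread's entry makes the branch divergent, and the same call returns true.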