This is an archive of the discontinued LLVM Phabricator instance.

Differential D96166

[NVPTX][NewPM] Temporarily disable NVPTX passes in new PM pipeline
ClosedPublic

Authored by aeubanks on Feb 5 2021, 11:16 AM.

Details

Reviewers

echristo
rupprecht

Commits

rG526c0955c08b: [NVPTX][NewPM] Temporarily disable NVPTX passes in new PM pipeline

Summary

These passes are causing numerical discrepancies after being added to
the pipeline. Disable while investigating.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	380 ms	x64 debian > libarcher.races::task-dependency.c
	250 ms	x64 debian > libarcher.races::task-taskgroup-unrelated.c
	360 ms	x64 debian > libarcher.races::task-taskwait-nested.c
	300 ms	x64 debian > libarcher.races::task-two.c
	370 ms	x64 debian > libarcher.task::task-barrier.c
	(13 tests failed in total)

Event Timeline

aeubanks created this revision. Feb 5 2021, 11:16 AM

Herald added subscribers: hiraditya, jholewinski. Feb 5 2021, 11:16 AM

aeubanks requested review of this revision. Feb 5 2021, 11:16 AM

Herald added a project: Restricted Project. Feb 5 2021, 11:16 AM

Herald added a subscriber: llvm-commits.

aeubanks added reviewers: echristo, rupprecht. Feb 5 2021, 11:16 AM

Verified, thanks!

This revision is now accepted and ready to land. Feb 5 2021, 11:25 AM

This revision was landed with ongoing or failed builds. Feb 5 2021, 11:31 AM

Closed by commit rG526c0955c08b: [NVPTX][NewPM] Temporarily disable NVPTX passes in new PM pipeline (authored by aeubanks).

This revision was automatically updated to reflect the committed changes.

aeubanks added a commit: rG526c0955c08b: [NVPTX][NewPM] Temporarily disable NVPTX passes in new PM pipeline.

aeubanks added a subscriber: tra. Feb 5 2021, 12:13 PM

Harbormaster completed remote builds in B88116: Diff 321835. Feb 5 2021, 12:37 PM

tra added inline comments. Feb 8 2021, 9:46 AM

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
232–233	If we didn't run `NVVMReflectPass` before with the new PM and didn't fail, I'm very surprised. This pass is necessary for using NVIDIA's `libdevice` bitcode. Without the pass we'd probably see compiler complaining about unresolved reference to `__nvvm_reflect` function. `NVVMIntrRangePass` applies known range values to some CUDA functions which may allow compiler to optimize a bit better. Can be skipped w/o too much impact. I don't know anything about `createModuleToFunctionPassAdaptor`.
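
To illustrate why known ranges can help the optimizer, here is a minimal, self-contained sketch in plain C++ (it is not LLVM code and does not reproduce `NVVMIntrRangePass`). `KnownRange`, `Fold`, and `foldLessThan` are hypothetical names introduced for illustration, and the sketch assumes a non-empty half-open range.

```cpp
#include <cstdint>

// Hypothetical model of a value range: the half-open interval [lo, hi).
// Assumes lo < hi, i.e. the range is non-empty.
struct KnownRange {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Result of trying to decide a comparison from range information alone.
enum class Fold { AlwaysTrue, AlwaysFalse, Unknown };

// Try to decide `value < bound` for every value inside the range.
Fold foldLessThan(const KnownRange &r, std::uint64_t bound) {
  if (r.hi <= bound)
    return Fold::AlwaysTrue;   // every value in [lo, hi) is below bound
  if (r.lo >= bound)
    return Fold::AlwaysFalse;  // no value in [lo, hi) is below bound
  return Fold::Unknown;        // the answer depends on the runtime value
}
```

If, say, a thread index were assumed to lie in [0, 1024) (an example bound, not a figure taken from this revision), `foldLessThan(KnownRange{0, 1024}, 2048)` would yield `Fold::AlwaysTrue`, so a guard such as `tid < 2048` could be dropped. Without range information the same query stays `Unknown`, which is the kind of missed optimization the comment describes.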

aeubanks mentioned this in D96291: [NVPTX][NewPM] Re-enable NVVMReflectPass. Feb 8 2021, 1:53 PM

aeubanks added inline comments. Feb 8 2021, 1:54 PM

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
232–233	https://reviews.llvm.org/D96291 to re-enable `NVVMReflectPass`, looks like only `NVVMIntrRangePass` is the issue (which makes sense). `createModuleToFunctionPassAdaptor()` is just pass manager infra stuff (changes a function pass to a module pass).
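
The adaptor idea mentioned in this reply can be sketched with toy types. This is a simplified model in plain C++, not the LLVM implementation: `ToyFunction`, `ToyModule`, `ToyFunctionPass`, `ToyModulePass`, and `makeModuleToFunctionAdaptor` are hypothetical stand-ins, and the real `createModuleToFunctionPassAdaptor()` also deals with analysis managers, which are omitted here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for a function and a module (hypothetical, not LLVM types).
struct ToyFunction {
  std::string name;
  int visits;  // counts how many times a pass has run on this function
};

struct ToyModule {
  std::vector<ToyFunction> functions;
};

using ToyFunctionPass = std::function<void(ToyFunction &)>;
using ToyModulePass = std::function<void(ToyModule &)>;

// Wrap a function-level pass so that it can be scheduled wherever a
// module-level pass is expected: the wrapper runs the inner pass once on
// every function contained in the module.
ToyModulePass makeModuleToFunctionAdaptor(ToyFunctionPass fp) {
  return [fp](ToyModule &m) {
    for (ToyFunction &f : m.functions)
      fp(f);
  };
}
```

In the disabled callback, the pipeline-start extension point hands over a `ModulePassManager`, while the two NVPTX passes are added to a `FunctionPassManager`; the adaptor is what bridges the two levels.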

aeubanks mentioned this in rGe84a4650eb7e: [NVPTX][NewPM] Re-enable NVVMReflectPass. Feb 8 2021, 1:58 PM

Revision Contents

	Path	Size
	llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp	19 lines
	llvm/test/CodeGen/NVPTX/nvvm-reflect-arch.ll	5 lines

Diff 321835

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

[216 preceding lines not shown; hunk context: PB.registerPipelineParsingCallback(]
         }
         if (PassName == "nvvm-intr-range") {
           PM.addPass(NVVMIntrRangePass());
           return true;
         }
         return false;
       });
 
-  PB.registerPipelineStartEPCallback(
-      [this, DebugPassManager](ModulePassManager &PM,
-                               PassBuilder::OptimizationLevel Level) {
-        FunctionPassManager FPM(DebugPassManager);
-        FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
-        FPM.addPass(NVVMIntrRangePass(Subtarget.getSmVersion()));
-        PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
-      });
+  // FIXME: these passes are causing numerical discrepancies, investigate and
+  // re-enable.
+  // PB.registerPipelineStartEPCallback(
+  //     [this, DebugPassManager](ModulePassManager &PM,
+  //                              PassBuilder::OptimizationLevel Level) {
+  //       FunctionPassManager FPM(DebugPassManager);
+  //       FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
+  //       FPM.addPass(NVVMIntrRangePass(Subtarget.getSmVersion()));
+  //       PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
+  //     });
 }
 
 TargetTransformInfo
 NVPTXTargetMachine::getTargetTransformInfo(const Function &F) {
   return TargetTransformInfo(NVPTXTTIImpl(this, F));
 }
 
 void NVPTXPassConfig::addEarlyCSEOrGVNPass() {
[187 following lines not shown]

llvm/test/CodeGen/NVPTX/nvvm-reflect-arch.ll

 ; Libdevice in recent CUDA versions relies on __CUDA_ARCH reflecting GPU type.
 ; Verify that __nvvm_reflect() is replaced with an appropriate value.
 ;
-; RUN: opt %s -S -nvvm-reflect -O2 -mtriple=nvptx64 \
+; FIXME: fix pass and re-enable under new PM
+; RUN: opt %s -S -nvvm-reflect -O2 -enable-new-pm=0 -mtriple=nvptx64 \
 ; RUN:   | FileCheck %s --check-prefixes=COMMON,SM20
-; RUN: opt %s -S -nvvm-reflect -O2 -mtriple=nvptx64 -mcpu=sm_35 \
+; RUN: opt %s -S -nvvm-reflect -O2 -enable-new-pm=0 -mtriple=nvptx64 -mcpu=sm_35 \
 ; RUN:   | FileCheck %s --check-prefixes=COMMON,SM35
 
 @"$str" = private addrspace(1) constant [12 x i8] c"__CUDA_ARCH\00"
 
 declare i32 @__nvvm_reflect(i8*)
 
 ; COMMON-LABEL: @foo
 define i32 @foo(float %a, float %b) {
 ; COMMON-NOT: call i32 @__nvvm_reflect
   %reflect = call i32 @__nvvm_reflect(i8* addrspacecast (i8 addrspace(1)* getelementptr inbounds ([12 x i8], [12 x i8] addrspace(1)* @"$str", i32 0, i32 0) to i8*))
 ; SM20: ret i32 200
 ; SM35: ret i32 350
   ret i32 %reflect
 }
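
The CHECK lines in this test expect `__nvvm_reflect("__CUDA_ARCH")` to fold to 200 for the default target and to 350 for `-mcpu=sm_35`, which is consistent with the SM version multiplied by 10. A minimal sketch of that mapping in plain C++ follows. It is not the pass itself: `parseSmVersion` and `reflectCudaArch` are hypothetical helpers, and the times-10 rule is inferred from the two expected values in the test.

```cpp
#include <string>

// Hypothetical helper: extract NN from a CPU name of the form "sm_NN".
// Returns 0 when the name does not have that shape.
unsigned parseSmVersion(const std::string &cpu) {
  const std::string prefix = "sm_";
  if (cpu.size() <= prefix.size() ||
      cpu.compare(0, prefix.size(), prefix) != 0)
    return 0;
  unsigned version = 0;
  for (std::string::size_type i = prefix.size(); i < cpu.size(); ++i) {
    const char c = cpu[i];
    if (c < '0' || c > '9')
      return 0;  // reject anything that is not a plain decimal number
    version = version * 10 + static_cast<unsigned>(c - '0');
  }
  return version;
}

// Value the test expects in place of __nvvm_reflect("__CUDA_ARCH"):
// 20 -> 200 (SM20 prefix), 35 -> 350 (SM35 prefix).
unsigned reflectCudaArch(unsigned smVersion) {
  return smVersion * 10;
}
```

The first RUN line passes no `-mcpu` yet is checked against the `SM20` prefix, which suggests the default subtarget at this revision corresponds to `sm_20`.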