This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Enable AtomicExpandPass for NVPTX
Closed, Public

Authored by tianshilei1992 on May 15 2022, 11:23 AM.

Details

Summary

This patch enables AtomicExpandPass for NVPTX.

Depends on D125652.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. May 15 2022, 11:23 AM
tianshilei1992 requested review of this revision. May 15 2022, 11:23 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
257 (On Diff #429550)

Some targets, such as NVPTX, support atomic load and store of floating-point values. We don't need to convert them to integer.
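
For illustration, a minimal LLVM IR sketch (hypothetical, not taken from the patch's test file) of the kind of floating-point atomic load and store in question, which AtomicExpandPass would otherwise bitcast to integer:

    ; Atomic FP load/store that a target like NVPTX can keep as-is.
    define float @atomic_load_f32(float* %p) {
      %v = load atomic float, float* %p seq_cst, align 4
      ret float %v
    }

    define void @atomic_store_f64(double* %p, double %v) {
      store atomic double %v, double* %p seq_cst, align 8
      ret void
    }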

tianshilei1992 edited the summary of this revision. May 15 2022, 6:09 PM
tra added inline comments. May 16 2022, 11:23 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5159

According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-atom

64-bit atom.{and,or,xor,min,max} require sm_32 or higher.

This must be conditional on the GPU variant we're compiling for, similar to how we handle f64 above.
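
A hypothetical IR example of such an operation (the function name is illustrative): on sm_32 and newer this can lower directly to PTX atom.and.b64, while older GPUs would need a cmpxchg-loop expansion instead:

    ; Needs PTX atom.and.b64, available only on sm_32 and newer.
    define i64 @atomic_and_i64(i64* %p, i64 %v) {
      %old = atomicrmw and i64* %p, i64 %v seq_cst
      ret i64 %old
    }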

llvm/test/CodeGen/NVPTX/atomic-expand.ll
7

Your code also adds support for atomics on integers; those should be verified, too.

In general we do want to have tests for all types native to PTX.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5159

Thanks for pointing it out. Will do.

Actually, I have a question. What's the minimum SM version we currently support? I found in one of your patches (https://reviews.llvm.org/D24943) the function hasAtomAddF32, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

tra added a comment. May 16 2022, 1:06 PM

What's the minimum SM version we currently support? I found in one of your patches (https://reviews.llvm.org/D24943) the function hasAtomAddF32, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

SM20 is the current minimum. Note that NVIDIA has already stopped supporting pre-SM35 GPUs, and we should probably consider removing support for sm_2x in LLVM, too.

tianshilei1992 marked an inline comment as done. May 16 2022, 2:07 PM
tra added inline comments. May 16 2022, 4:17 PM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

Can we handle types other than 32 and 64-bit? If yes, we need tests at least for some of them (i8/i16/f16/i128). If not, this should probably be llvm_unreachable().

llvm/test/CodeGen/NVPTX/atomic-expand.ll
1 (On Diff #429842)

Nit: you can specify multiple labels as one option: --check-prefixes=A,B,C

I'd also rename the labels.

CHECK->COMMON or ALL -- makes the intent clear.

CHECK-SM30 -> SM30 -- reduces unnecessary clutter and makes it a bit easier to spot the differences between SM30 and SM60.
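
A sketch of what the suggested RUN lines might look like (the exact flags and prefix names are assumptions, not taken from the patch):

    ; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 | FileCheck %s --check-prefixes=COMMON,SM30
    ; RUN: llc < %s -march=nvptx64 -mcpu=sm_60 | FileCheck %s --check-prefixes=COMMON,SM60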

tianshilei1992 added inline comments. May 17 2022, 9:17 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

This is quite interesting. I don't think CUDA can generate that kind of code, but OpenMP can. However, for other types, like i16, atomic expansion just converts the operation to an AtomicCmpSwap of type i16, and then isel crashes. For now, I guess the best way is to use llvm_unreachable for the other cases. In the long term, we may want to support expansion of types with different bit widths, but I don't have a clear idea how that would work. Otherwise, we would have to tell the front end that some types are not supported, which kind of breaks the assumption that the front end can emit target-agnostic code.
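
A minimal sketch of the problematic case (hypothetical IR, not the actual test): AtomicExpandPass rewrites the i16 atomicrmw as a loop around a same-width compare-and-swap, and it was that cmpxchg i16 that NVPTX isel could not select:

    define i16 @atomic_add_i16(i16* %p, i16 %v) {
      ; Expanded by AtomicExpandPass into a loop around
      ;   cmpxchg i16* %p, i16 %loaded, i16 %new seq_cst seq_cst
      ; which isel then failed on.
      %old = atomicrmw add i16* %p, i16 %v seq_cst
      ret i16 %old
    }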

tra added inline comments. May 17 2022, 10:51 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

I don't think CUDA can generate that kind of code,

Why do you believe that's the case? I would assume that if a C++ compilation can, so can CUDA, though we may be somewhat lucky in that we currently don't have the appropriate GPU-side overloads for std::atomic functions. We do want to make it work, so it's likely to get fixed sooner or later.

It may still be a good idea to add these extra test cases to the test, too, possibly commented out with a comment explaining what's going on.

tianshilei1992 marked 2 inline comments as done. May 17 2022, 7:58 PM
tianshilei1992 added inline comments.
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

I added tests for i8 and i16. Other types are not covered because, based on https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#type-system, they are not supported at the IR level.

tra added inline comments. May 18 2022, 11:19 AM
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5160

they are not supported [by NVVM] at the IR level.

NVVM != LLVM, so it's not a particularly strong reason.

LLVM does have limited support for i128 and we should cover it, even if the answer is "it does not work (yet?)".

setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.
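
Roughly, setMinCmpXchgSizeInBits(32), a TargetLoweringBase hook set in the target's constructor, tells AtomicExpandPass to widen sub-word compare-and-swap, so a case like the following hypothetical IR never reaches isel as an i8 operation:

    define i8 @cmpxchg_i8(i8* %p, i8 %e, i8 %n) {
      ; With a 32-bit minimum cmpxchg width, this is expanded into a
      ; shift/mask loop over the containing aligned i32 word, so only a
      ; native 32-bit cmpxchg is ever emitted.
      %pair = cmpxchg i8* %p, i8 %e, i8 %n seq_cst seq_cst
      %old = extractvalue { i8, i1 } %pair, 0
      ret i8 %old
    }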

tianshilei1992 marked an inline comment as done.

add i128

tianshilei1992 marked an inline comment as done. May 20 2022, 11:43 AM
tra accepted this revision. May 20 2022, 12:00 PM
This revision is now accepted and ready to land. May 20 2022, 12:00 PM

use setMinCmpXchgSizeInBits(32)

setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.

Yes, that works. Thanks for the info.

This revision was automatically updated to reflect the committed changes.