Details

Reviewers

jdoerfert
tra

Commits

rGecf5b780538e: [NVPTX] Enable AtomicExpandPass for NVPTX

Summary

This patch enables AtomicExpandPass for NVPTX.

Depends on D125652.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tianshilei1992 created this revision. May 15 2022, 11:23 AM

Herald added a project: Restricted Project. May 15 2022, 11:23 AM

Herald added subscribers: mattd, gchakrabarti, asavonic and 2 others.

tianshilei1992 requested review of this revision. May 15 2022, 11:23 AM

Herald added a project: Restricted Project. May 15 2022, 11:23 AM

Herald added a subscriber: llvm-commits.

tianshilei1992 added inline comments. May 15 2022, 11:25 AM

llvm/lib/CodeGen/AtomicExpandPass.cpp
257 ↗	(On Diff #429550)	Some targets, such as NVPTX, support atomic loads and stores of floating-point variables, so we don't need to convert them to integers.

tianshilei1992 mentioned this in D125512: [NVPTX] Enable atomic expansion of atomicrmw for NVPTX. May 15 2022, 11:33 AM

tianshilei1992 added a child revision: D125512: [NVPTX] Enable atomic expansion of atomicrmw for NVPTX.

Harbormaster completed remote builds in B164528: Diff 429550. May 15 2022, 12:20 PM

rebase

tianshilei1992 added a parent revision: D125652: [LLVM] Add a check if should cast atomic operations to integer type. May 15 2022, 6:05 PM

tianshilei1992 edited the summary of this revision.

add test

tianshilei1992 edited the summary of this revision. May 15 2022, 6:09 PM

Harbormaster completed remote builds in B164557: Diff 429584. May 15 2022, 7:02 PM

tra added inline comments. May 16 2022, 11:23 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5161	According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-atom `64-bit atom.{and,or,xor,min,max} require sm_32 or higher.` This must be conditional on the GPU variant we're compiling for, similar to how we handle f64 above.
llvm/test/CodeGen/NVPTX/atomic-expand.ll.ll
7 ↗	(On Diff #429584)	Your code also adds support for atomics on integers; those should be verified, too. In general we do want to have tests for all types native to PTX.
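
For illustration, the subtarget gating requested in the comment above could be sketched roughly as follows. `SubtargetSketch` is a hypothetical stand-in, not the real `NVPTXSubtarget`. The predicate names follow the ones that appear in the final diff, the sm_32 threshold comes from the PTX documentation quoted in the comment, and the sm_60 threshold for `atom.add.f64` is an assumption that is consistent with the SM30/SM60 split in the test file.

```cpp
// Hypothetical stand-in for the NVPTX subtarget; not the real NVPTXSubtarget.
struct SubtargetSketch {
  unsigned SmVersion; // e.g. 30 for sm_30, 60 for sm_60

  // 64-bit atom.{and,or,xor} require sm_32 or higher (PTX docs quoted above).
  constexpr bool hasAtomBitwise64() const { return SmVersion >= 32; }

  // 64-bit atom.{min,max} fall under the same sm_32 requirement.
  constexpr bool hasAtomMinMax64() const { return SmVersion >= 32; }

  // Assumption: atom.add.f64 needs sm_60, matching the SM30/SM60 test split.
  constexpr bool hasAtomAddF64() const { return SmVersion >= 60; }
};

// Compile-time examples of how the predicates answer for two GPU variants.
static_assert(!SubtargetSketch{30}.hasAtomBitwise64(),
              "sm_30 has no native 64-bit bitwise atomics");
static_assert(SubtargetSketch{60}.hasAtomAddF64(),
              "sm_60 has native atom.add.f64");
```

Keeping each capability behind its own named predicate means the lowering code asks about a feature rather than comparing SM numbers inline.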

tianshilei1992 added inline comments. May 16 2022, 11:59 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5161	Thanks for pointing it out. Will do. I have a question, actually. What's the minimum SM version we currently support? I found in one of your patches (https://reviews.llvm.org/D24943) that it had the function `hasAtomAddF32`, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

> What's the minimum SM version we currently support? I found in one of your patches (https://reviews.llvm.org/D24943) that it had the function `hasAtomAddF32`, but now I can't find it. Was it removed because we support a minimum SM version, like SM20?

SM20 is the current minimum. Note that NVIDIA has already stopped supporting pre-SM35 GPUs and we should probably start considering removing support for sm_2x in LLVM, too.

fix comments

tianshilei1992 marked an inline comment as done. May 16 2022, 2:07 PM

Harbormaster completed remote builds in B164744: Diff 429842. May 16 2022, 3:05 PM

tra added inline comments. May 16 2022, 4:17 PM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5162	Can we handle types other than 32 and 64-bit? If yes, we need tests at least for some of them (i8/i16/f16/i128). If not, this should probably be `llvm_unreachable()`.
llvm/test/CodeGen/NVPTX/atomic-expand.ll
1 ↗	(On Diff #429842)	Nit: You can specify multiple labels as one option, `--check-prefixes=A,B,C`. I'd also rename the labels. `CHECK`->`COMMON` or `ALL` -- makes the intent clear. `CHECK-SM30` -> `SM30` -- reduces unnecessary clutter and makes it a bit easier to spot the differences between SM30 and SM60.

tianshilei1992 added inline comments. May 17 2022, 9:17 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5162	This is quite interesting. I don't think CUDA can generate that kind of code, but OpenMP can. However, for other types, like `i16`, atomic expansion just converts the operation to an `AtomicCmpSwap` of type `i16`, and then isel crashes. For now, I guess the best way is to use `llvm_unreachable` for the other cases. In the long term, we may want to support the expansion of types with a different bitwidth, but I don't have a clear idea of how that would work. Otherwise, we will have to tell the front end that some types are not supported, which kind of breaks the assumption that the front end can emit target-agnostic code.

tra added inline comments. May 17 2022, 10:51 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5162	> I don't think CUDA can generate that kind of code,

Why do you believe that's the case? I would assume that if a C++ compilation can, so would CUDA, though we may be sort of lucky and may currently not have appropriate GPU-side overloads for `std::atomic` functions. We do want to make it work, so it's likely to get fixed sooner or later. It may still be a good idea to add these extra test cases to the test, too, possibly commented out with a comment explaining what's going on.

fix comments

tianshilei1992 marked 2 inline comments as done. May 17 2022, 7:58 PM

tianshilei1992 added inline comments.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5162	I added tests for `i8` and `i16`. Other types are not covered because, based on https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#type-system, they are not supported at the IR level.

Harbormaster completed remote builds in B165012: Diff 430225. May 17 2022, 8:43 PM

tra added inline comments. May 18 2022, 11:19 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
5162	> they are not supported [by NVVM] at the IR level.

NVVM != LLVM, so it's not a particularly strong reason. LLVM does have limited support for i128 and we should cover it, even if the answer is "it does not work (yet?)".

setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.
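
As a rough illustration of why a 32-bit minimum cmpxchg size is enough for `i8`/`i16`, a narrow atomic can be emulated with a compare-and-swap on the aligned 32-bit word that contains it. The sketch below does this for a byte exchange in portable C++. The helper name `exchangeByteViaWordCas` and the byte-index convention are hypothetical, and this is not the code AtomicExpandPass emits; relaxed ordering is used to mirror the `monotonic` ordering in the tests.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically exchange one byte inside a 32-bit word using
// only a 32-bit compare-and-swap. ByteIdx 0 selects the least significant byte.
inline std::uint8_t exchangeByteViaWordCas(std::atomic<std::uint32_t> &Word,
                                           unsigned ByteIdx,
                                           std::uint8_t NewVal) {
  const unsigned Shift = ByteIdx * 8;
  const std::uint32_t Mask = std::uint32_t{0xFF} << Shift;

  std::uint32_t Old = Word.load(std::memory_order_relaxed);
  std::uint32_t Desired;
  do {
    // Preserve the three neighbouring bytes and splice in the new byte.
    Desired = (Old & ~Mask) | (std::uint32_t{NewVal} << Shift);
    // On failure, compare_exchange_weak refreshes Old with the current word,
    // so the next iteration recomputes Desired from up-to-date neighbours.
  } while (!Word.compare_exchange_weak(Old, Desired,
                                       std::memory_order_relaxed));

  // Return the byte that was stored before the exchange.
  return static_cast<std::uint8_t>((Old & Mask) >> Shift);
}
```

The same masking idea extends to `i16` by using a 16-bit mask and a shift of 0 or 16.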

add i128

tianshilei1992 marked an inline comment as done. May 20 2022, 11:43 AM

tra accepted this revision. May 20 2022, 12:00 PM

This revision is now accepted and ready to land. May 20 2022, 12:00 PM

Harbormaster completed remote builds in B165555: Diff 431016. May 20 2022, 12:18 PM

use setMinCmpXchgSizeInBits(32)

In D125639#3523733, @efriedma wrote:

> setMinCmpXchgSizeInBits(32) should fix the i8/i16 testcases, I think.

Yes, that works. Thanks for the info.

fix return

fix tests

Harbormaster completed remote builds in B165570: Diff 431034. May 20 2022, 1:56 PM

Closed by commit rGecf5b780538e: [NVPTX] Enable AtomicExpandPass for NVPTX (authored by tianshilei1992). May 20 2022, 2:25 PM

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rGecf5b780538e: [NVPTX] Enable AtomicExpandPass for NVPTX.

Diff 431055

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

public:
// ...
bool enableAggressiveFMAFusion(EVT VT) const override { return true; }

// The default is to transform llvm.ctlz(x, false) (where false indicates that
// x == 0 is not undefined behavior) into a branch that checks whether x is 0
// and avoids calling ctlz in that case. We have a dedicated ctlz
// instruction, so we say that ctlz is cheap to speculate.
bool isCheapToSpeculateCtlz() const override { return true; }

		AtomicExpansionKind shouldCastAtomicLoadInIR(LoadInst *LI) const override {
		return AtomicExpansionKind::None;
		}

		AtomicExpansionKind shouldCastAtomicStoreInIR(StoreInst *SI) const override {
		return AtomicExpansionKind::None;
		}

		AtomicExpansionKind
		shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override;

private:
const NVPTXSubtarget &STI; // cache the subtarget here
SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;

SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;

// ...

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
// ...
}

// No FEXP2, FLOG2. The PTX ex2 and log2 functions are always approximate.
// No FPOW or FREM in PTX.

// Now deduce the information based on the above mentioned
// actions
computeRegisterProperties(STI.getRegisterInfo());

		setMinCmpXchgSizeInBits(32);
}

const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
switch ((NVPTXISD::NodeType)Opcode) {
case NVPTXISD::FIRST_NUMBER:
break;
case NVPTXISD::CALL:
return "NVPTXISD::CALL";
// ...
case ISD::LOAD:
ReplaceLoadVector(N, DAG, Results);
return;
case ISD::INTRINSIC_W_CHAIN:
ReplaceINTRINSIC_W_CHAIN(N, DAG, Results);
return;
}
}

		NVPTXTargetLowering::AtomicExpansionKind
		NVPTXTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
		Type *Ty = AI->getValOperand()->getType();

		if (AI->isFloatingPointOperation()) {
		if (AI->getOperation() == AtomicRMWInst::BinOp::FAdd) {
		if (Ty->isFloatTy())
		return AtomicExpansionKind::None;
		if (Ty->isDoubleTy() && STI.hasAtomAddF64())
		return AtomicExpansionKind::None;
		}
		return AtomicExpansionKind::CmpXChg;
		}

		assert(Ty->isIntegerTy() && "Ty should be integer at this point");
		auto ITy = cast<llvm::IntegerType>(Ty);

		switch (AI->getOperation()) {
		default:
		return AtomicExpansionKind::CmpXChg;
		case AtomicRMWInst::BinOp::And:
		case AtomicRMWInst::BinOp::Or:
		case AtomicRMWInst::BinOp::Xor:
		case AtomicRMWInst::BinOp::Xchg:
		switch (ITy->getBitWidth()) {
		case 8:
		case 16:
		return AtomicExpansionKind::CmpXChg;
		case 32:
		return AtomicExpansionKind::None;
		case 64:
		if (STI.hasAtomBitwise64())
		return AtomicExpansionKind::None;
		return AtomicExpansionKind::CmpXChg;
		default:
		llvm_unreachable("unsupported width encountered");
		}
		case AtomicRMWInst::BinOp::Add:
		case AtomicRMWInst::BinOp::Sub:
		case AtomicRMWInst::BinOp::Max:
		case AtomicRMWInst::BinOp::Min:
		case AtomicRMWInst::BinOp::UMax:
		case AtomicRMWInst::BinOp::UMin:
		switch (ITy->getBitWidth()) {
		case 8:
		case 16:
		return AtomicExpansionKind::CmpXChg;
		case 32:
		return AtomicExpansionKind::None;
		case 64:
		if (STI.hasAtomMinMax64())
		return AtomicExpansionKind::None;
		return AtomicExpansionKind::CmpXChg;
		default:
		llvm_unreachable("unsupported width encountered");
		}
		}

		return AtomicExpansionKind::CmpXChg;
		}

// Pin NVPTXTargetObjectFile's vtables to this file.
NVPTXTargetObjectFile::~NVPTXTargetObjectFile() = default;

MCSection *NVPTXTargetObjectFile::SelectSectionForGlobal(
const GlobalObject *GO, SectionKind Kind, const TargetMachine &TM) const {
return getDataSection();
}

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

void NVPTXPassConfig::addIRPasses() {
// ...
// NVPTXLowerArgs is required for correctness and should be run right
// before the address space inference passes.
addPass(createNVPTXLowerArgsPass(&getNVPTXTargetMachine()));
if (getOptLevel() != CodeGenOpt::None) {
addAddressSpaceInferencePasses();
addStraightLineScalarOptimizationPasses();
}

		addPass(createAtomicExpandPass());

// === LSR and other generic IR passes ===
TargetPassConfig::addIRPasses();
// EarlyCSE is not always strong enough to clean up what LSR produces. For
// example, GVN can combine
//
// %0 = add %a, %b
// %1 = add %b, %a
//
// ...

llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll

This file was added.

				; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 | FileCheck %s --check-prefixes=ALL,SM30
				; RUN: llc < %s -march=nvptx64 -mcpu=sm_60 | FileCheck %s --check-prefixes=ALL,SM60
				; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_30 | %ptxas-verify %}
				; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_60 | %ptxas-verify %}

				; CHECK-LABEL: fadd_double
				define void @fadd_double(ptr %0, double %1) {
				entry:
				; SM30: atom.cas.b64
				; SM60: atom.add.f64
				%2 = atomicrmw fadd ptr %0, double %1 monotonic, align 8
				ret void
				}

				; CHECK-LABEL: fadd_float
				define void @fadd_float(ptr %0, float %1) {
				entry:
				; ALL: atom.add.f32
				%2 = atomicrmw fadd ptr %0, float %1 monotonic, align 4
				ret void
				}

				; CHECK-LABEL: bitwise_i32
				define void @bitwise_i32(ptr %0, i32 %1) {
				entry:
				; ALL: atom.and.b32
				%2 = atomicrmw and ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.or.b32
				%3 = atomicrmw or ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.xor.b32
				%4 = atomicrmw xor ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.exch.b32
				%5 = atomicrmw xchg ptr %0, i32 %1 monotonic, align 4
				ret void
				}

				; CHECK-LABEL: bitwise_i64
				define void @bitwise_i64(ptr %0, i64 %1) {
				entry:
				; SM30: atom.cas.b64
				; SM60: atom.and.b64
				%2 = atomicrmw and ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.or.b64
				%3 = atomicrmw or ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.xor.b64
				%4 = atomicrmw xor ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.exch.b64
				%5 = atomicrmw xchg ptr %0, i64 %1 monotonic, align 8
				ret void
				}

				; CHECK-LABEL: minmax_i32
				define void @minmax_i32(ptr %0, i32 %1) {
				entry:
				; ALL: atom.min.s32
				%2 = atomicrmw min ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.max.s32
				%3 = atomicrmw max ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.min.u32
				%4 = atomicrmw umin ptr %0, i32 %1 monotonic, align 4
				; ALL: atom.max.u32
				%5 = atomicrmw umax ptr %0, i32 %1 monotonic, align 4
				ret void
				}

				; CHECK-LABEL: minmax_i64
				define void @minmax_i64(ptr %0, i64 %1) {
				entry:
				; SM30: atom.cas.b64
				; SM60: atom.min.s64
				%2 = atomicrmw min ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.max.s64
				%3 = atomicrmw max ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.min.u64
				%4 = atomicrmw umin ptr %0, i64 %1 monotonic, align 8
				; SM30: atom.cas.b64
				; SM60: atom.max.u64
				%5 = atomicrmw umax ptr %0, i64 %1 monotonic, align 8
				ret void
				}

				; CHECK-LABEL: bitwise_i8
				define void @bitwise_i8(ptr %0, i8 %1) {
				entry:
				; ALL: atom.and.b32
				%2 = atomicrmw and ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.or.b32
				%3 = atomicrmw or ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.xor.b32
				%4 = atomicrmw xor ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.cas.b32
				%5 = atomicrmw xchg ptr %0, i8 %1 monotonic, align 1
				ret void
				}

				; CHECK-LABEL: minmax_i8
				define void @minmax_i8(ptr %0, i8 %1) {
				entry:
				; ALL: atom.cas.b32
				%2 = atomicrmw min ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.cas.b32
				%3 = atomicrmw max ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.cas.b32
				%4 = atomicrmw umin ptr %0, i8 %1 monotonic, align 1
				; ALL: atom.cas.b32
				%5 = atomicrmw umax ptr %0, i8 %1 monotonic, align 1
				ret void
				}

				; CHECK-LABEL: bitwise_i16
				define void @bitwise_i16(ptr %0, i16 %1) {
				entry:
				; ALL: atom.and.b32
				%2 = atomicrmw and ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.or.b32
				%3 = atomicrmw or ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.xor.b32
				%4 = atomicrmw xor ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.cas.b32
				%5 = atomicrmw xchg ptr %0, i16 %1 monotonic, align 2
				ret void
				}

				; CHECK-LABEL: minmax_i16
				define void @minmax_i16(ptr %0, i16 %1) {
				entry:
				; ALL: atom.cas.b32
				%2 = atomicrmw min ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.cas.b32
				%3 = atomicrmw max ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.cas.b32
				%4 = atomicrmw umin ptr %0, i16 %1 monotonic, align 2
				; ALL: atom.cas.b32
				%5 = atomicrmw umax ptr %0, i16 %1 monotonic, align 2
				ret void
				}

				; TODO: We might still want to test other types, such as i128. Currently the
				; backend doesn't support them. Atomic expand only supports expansion to cas of
				; the same bitwidth, which means even after expansion, the back end still
				; doesn't support the instruction. Here we still put the tests. Remove the
				; comment once we have proper support, either from atomic expand or backend.

				; define void @bitwise_i128(ptr %0, i128 %1) {
				; entry:
				; %2 = atomicrmw and ptr %0, i128 %1 monotonic, align 16
				; %3 = atomicrmw or ptr %0, i128 %1 monotonic, align 16
				; %4 = atomicrmw xor ptr %0, i128 %1 monotonic, align 16
				; %5 = atomicrmw xchg ptr %0, i128 %1 monotonic, align 16
				; ret void
				; }

				; define void @minmax_i128(ptr %0, i128 %1) {
				; entry:
				; %2 = atomicrmw min ptr %0, i128 %1 monotonic, align 16
				; %3 = atomicrmw max ptr %0, i128 %1 monotonic, align 16
				; %4 = atomicrmw umin ptr %0, i128 %1 monotonic, align 16
				; %5 = atomicrmw umax ptr %0, i128 %1 monotonic, align 16
				; ret void
				; }
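
The `fadd_double` test above expects `atom.cas.b64` on sm_30, that is, a floating-point add carried out through a 64-bit compare-and-swap loop. A minimal host-side sketch of that pattern in portable C++ is shown below. The helper names `toBits`, `fromBits`, and `fetchAddDoubleViaCas` are hypothetical, and the code only illustrates the general CAS-loop technique, not the exact IR that AtomicExpandPass generates.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern and back (hypothetical helpers).
inline std::uint64_t toBits(double D) {
  std::uint64_t B;
  std::memcpy(&B, &D, sizeof B);
  return B;
}

inline double fromBits(std::uint64_t B) {
  double D;
  std::memcpy(&D, &B, sizeof D);
  return D;
}

// Atomically add Addend to the double stored in Cell using only a 64-bit
// integer compare-and-swap; returns the value held before the add.
inline double fetchAddDoubleViaCas(std::atomic<std::uint64_t> &Cell,
                                   double Addend) {
  std::uint64_t OldBits = Cell.load(std::memory_order_relaxed);
  for (;;) {
    const double Old = fromBits(OldBits);
    const std::uint64_t NewBits = toBits(Old + Addend);
    // On failure OldBits is refreshed with the current contents and the
    // addition is retried against the new value.
    if (Cell.compare_exchange_weak(OldBits, NewBits,
                                   std::memory_order_relaxed))
      return Old;
  }
}
```

The loop compares bit patterns rather than floating-point values, which is why the exchange is done on the integer representation.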

This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Enable AtomicExpandPass for NVPTX
ClosedPublic
