This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Handle ldg created from sign-/zero-extended load
ClosedPublic

Authored by jholewinski on Mar 10 2016, 10:55 AM.

Details

Reviewers

jlebar
jingyue

Commits

rGc79979299aea: [NVPTX] Handle ldg created from sign-/zero-extended load
rL265389: [NVPTX] Handle ldg created from sign-/zero-extended load

Diff Detail

Repository: rL LLVM

Event Timeline

jholewinski updated this revision to Diff 50310. · Mar 10 2016, 10:55 AM

jholewinski retitled this revision from to [NVPTX] Handle ldg created from sign-/zero-extended load.

jholewinski updated this object.

jholewinski added a reviewer: jingyue.

Herald added a subscriber: jholewinski. · Mar 10 2016, 10:55 AM

+jlebar@

I am OOO and may be unable to review this until next week.

I'm happy to review this, but I need to understand the surrounding code better first, so it may take me a day or two. These two comments are as far as I got before I realized that. :)

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
1326 ↗	(On Diff #50310)	Do you think this warrants a comment? In particular I'm confused why we do this unconditionally, even for non-vector loads.
test/CodeGen/NVPTX/bug26185.ll
1 ↗	(On Diff #50310)	It might be helpful to have a comment here explaining what we're checking. The bug has some explanation, but I think it's not entirely helpful to someone approaching this file without any context.

(I also wonder if this is related to http://reviews.llvm.org/D17872 .)

Thanks for the comments. I'll try to get a new version of this up soon. As for http://reviews.llvm.org/D17872, it seems unlikely to be related. This bug is very specific to the selection of LDG, though it may be generally related insofar as both are due to i8 handling. I really wish we had a better way of handling that.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
1326 ↗	(On Diff #50310)	Which part? The setting of NodeVT, or the following loop? The whole point is just to create the result type for the load instruction. For scalars, this will be (iN, Other); for vectors, it will be (iN, ..., Other). Do you have a suggestion for a cleaner way of writing this? I agree that a comment is warranted.
test/CodeGen/NVPTX/bug26185.ll
1 ↗	(On Diff #50310)	Sure; sounds good.
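
To make the "(iN, Other)" versus "(iN, ..., Other)" description above concrete, here is a minimal standalone sketch of the result-type list. It does not use LLVM headers: `SimpleVT`, `promoteElt`, and `buildResultVTs` are hypothetical names introduced only for illustration, standing in for `EVT`, the i8-to-i16 promotion, and the `InstVTs` loop in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::EVT / MVT; only the cases needed here.
enum class SimpleVT { i8, i16, i32, i64, f32, Other };

// Mirrors the promotion rule in the patch: i8 results are held in 16-bit
// registers, every other element type is kept as-is.
SimpleVT promoteElt(SimpleVT elt) {
  return elt == SimpleVT::i8 ? SimpleVT::i16 : elt;
}

// Builds the result list for a load of numElts elements: one (promoted)
// value type per element, followed by Other for the chain.
std::vector<SimpleVT> buildResultVTs(SimpleVT elt, unsigned numElts) {
  assert(numElts >= 1 && "a load produces at least one value");
  std::vector<SimpleVT> vts(numElts, promoteElt(elt));
  vts.push_back(SimpleVT::Other);
  return vts;
}
```

Under these assumptions a scalar i8 load yields (i16, Other) and a two-element i8 load yields (i16, i16, Other), which is why the promotion is applied to scalars and vectors alike.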

Okay, I understand this! Seems...I don't want to say "sane". "Called for"?

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
1326 ↗	(On Diff #50310)	Yeah, the mechanics of the vector we're building up here are clear to me. But it's not clear why we always convert i8s to i16s in said vector. See below, though, I think this is an ISA detail I was missing.
2057 ↗	(On Diff #50310)	Maybe it would be worth explaining somewhere that the whole reason we're doing this is because there are no i8 registers? That is probably obvious to you, but certainly wasn't to me! Once I got that, this all started to make sense -- we load into an i16, but the top bits are undef, so we clear them and then do the zero extension.
2110 ↗	(On Diff #50310)	Not sure what this is supposed to stand for, and "Opr" doesn't appear anywhere else in llvm.
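
The point about loading into an i16 whose top bits are undefined can be sketched with plain integers. This is a rough model, not the backend code: the function names are hypothetical, and the assumption is that an 8-bit source type on `cvt` reads only the low byte of the 16-bit register. The 16-bit variant is included only for contrast; whether the upper byte actually holds garbage in a given lowering is outside this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Model of cvt.u32.u8 reading a 16-bit register: keep the low byte only,
// then zero-extend. The upper byte of the register is ignored.
std::uint32_t cvtU32U8(std::uint16_t reg) { return reg & 0xFFu; }

// Model of cvt.s32.s8: keep the low byte only, then sign-extend from bit 7.
std::int32_t cvtS32S8(std::uint16_t reg) {
  const std::int32_t low = reg & 0xFF;
  return (low ^ 0x80) - 0x80;
}

// Model of cvt.u32.u16: reads all 16 bits, so whatever sits in the upper
// byte becomes part of the result.
std::uint32_t cvtU32U16(std::uint16_t reg) { return reg; }

// Example register: the loaded byte is 0x80, the upper byte is garbage.
void selfCheck() {
  const std::uint16_t reg = 0xAB80;
  assert(cvtU32U8(reg) == 0x80u);    // zero-extended low byte
  assert(cvtS32S8(reg) == -128);     // sign-extended low byte
  assert(cvtU32U16(reg) == 0xAB80u); // upper byte leaks through
}
```

In this model the 8-bit source types give the same answer regardless of the upper byte, which is the property the patch relies on when it converts from the promoted i16 result.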

This revision is now accepted and ready to land. · Mar 14 2016, 6:42 PM

Should we land this? It will fix PR26185.

Closed by commit rL265389: [NVPTX] Handle ldg created from sign-/zero-extended load (authored by jholewinski). · Apr 5 2016, 5:43 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp	85 lines
llvm/trunk/lib/Target/NVPTX/NVPTXInstrInfo.td	10 lines
llvm/trunk/test/CodeGen/NVPTX/bug26185.ll	57 lines

Diff 52677

llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp

[1,280 lines not shown]

SDNode *NVPTXDAGToDAGISel::SelectLDGLDU(SDNode *N) {		SDNode *NVPTXDAGToDAGISel::SelectLDGLDU(SDNode *N) {

SDValue Chain = N->getOperand(0);		SDValue Chain = N->getOperand(0);
SDValue Op1;		SDValue Op1;
MemSDNode *Mem;		MemSDNode *Mem;
bool IsLDG = true;		bool IsLDG = true;

// If this is an LDG intrinsic, the address is the third operand. Its its an		// If this is an LDG intrinsic, the address is the third operand. If its an
// LDG/LDU SD node (from custom vector handling), then its the second operand		// LDG/LDU SD node (from custom vector handling), then its the second operand
if (N->getOpcode() == ISD::INTRINSIC_W_CHAIN) {		if (N->getOpcode() == ISD::INTRINSIC_W_CHAIN) {
Op1 = N->getOperand(2);		Op1 = N->getOperand(2);
Mem = cast<MemIntrinsicSDNode>(N);		Mem = cast<MemIntrinsicSDNode>(N);
unsigned IID = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();		unsigned IID = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
switch (IID) {		switch (IID) {
default:		default:
return NULL;		return NULL;
[14 lines not shown]
}		}

unsigned Opcode;		unsigned Opcode;
SDLoc DL(N);		SDLoc DL(N);
SDNode *LD;		SDNode *LD;
SDValue Base, Offset, Addr;		SDValue Base, Offset, Addr;

EVT EltVT = Mem->getMemoryVT();		EVT EltVT = Mem->getMemoryVT();
		unsigned NumElts = 1;
if (EltVT.isVector()) {		if (EltVT.isVector()) {
		NumElts = EltVT.getVectorNumElements();
EltVT = EltVT.getVectorElementType();		EltVT = EltVT.getVectorElementType();
}		}

		// Build the "promoted" result VTList for the load. If we are really loading
		// i8s, then the return type will be promoted to i16 since we do not expose
		// 8-bit registers in NVPTX.
		EVT NodeVT = (EltVT == MVT::i8) ? MVT::i16 : EltVT;
		SmallVector<EVT, 5> InstVTs;
		for (unsigned i = 0; i != NumElts; ++i) {
		InstVTs.push_back(NodeVT);
		}
		InstVTs.push_back(MVT::Other);
		SDVTList InstVTList = CurDAG->getVTList(InstVTs);

if (SelectDirectAddr(Op1, Addr)) {		if (SelectDirectAddr(Op1, Addr)) {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
return nullptr;		return nullptr;
case ISD::INTRINSIC_W_CHAIN:		case ISD::INTRINSIC_W_CHAIN:
if (IsLDG) {		if (IsLDG) {
switch (EltVT.getSimpleVT().SimpleTy) {		switch (EltVT.getSimpleVT().SimpleTy) {
default:		default:
[124 lines not shown; context: case NVPTXISD::LDUV4:]
case MVT::f32:		case MVT::f32:
Opcode = NVPTX::INT_PTX_LDU_G_v4f32_ELE_avar;		Opcode = NVPTX::INT_PTX_LDU_G_v4f32_ELE_avar;
break;		break;
}		}
break;		break;
}		}

SDValue Ops[] = { Addr, Chain };		SDValue Ops[] = { Addr, Chain };
LD = CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Ops);		LD = CurDAG->getMachineNode(Opcode, DL, InstVTList, Ops);
} else if (TM.is64Bit() ? SelectADDRri64(Op1.getNode(), Op1, Base, Offset)		} else if (TM.is64Bit() ? SelectADDRri64(Op1.getNode(), Op1, Base, Offset)
: SelectADDRri(Op1.getNode(), Op1, Base, Offset)) {		: SelectADDRri(Op1.getNode(), Op1, Base, Offset)) {
if (TM.is64Bit()) {		if (TM.is64Bit()) {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
return nullptr;		return nullptr;
case ISD::LOAD:		case ISD::LOAD:
case ISD::INTRINSIC_W_CHAIN:		case ISD::INTRINSIC_W_CHAIN:
[272 lines not shown; context: if (TM.is64Bit()) {]
break;		break;
}		}
break;		break;
}		}
}		}

SDValue Ops[] = { Base, Offset, Chain };		SDValue Ops[] = { Base, Offset, Chain };

LD = CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Ops);		LD = CurDAG->getMachineNode(Opcode, DL, InstVTList, Ops);
} else {		} else {
if (TM.is64Bit()) {		if (TM.is64Bit()) {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
return nullptr;		return nullptr;
case ISD::LOAD:		case ISD::LOAD:
case ISD::INTRINSIC_W_CHAIN:		case ISD::INTRINSIC_W_CHAIN:
if (IsLDG) {		if (IsLDG) {
[270 lines not shown; context: if (TM.is64Bit()) {]
Opcode = NVPTX::INT_PTX_LDU_G_v4f32_ELE_areg32;		Opcode = NVPTX::INT_PTX_LDU_G_v4f32_ELE_areg32;
break;		break;
}		}
break;		break;
}		}
}		}

SDValue Ops[] = { Op1, Chain };		SDValue Ops[] = { Op1, Chain };
LD = CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Ops);		LD = CurDAG->getMachineNode(Opcode, DL, InstVTList, Ops);
}		}

MachineSDNode::mmo_iterator MemRefs0 = MF->allocateMemRefsArray(1);		MachineSDNode::mmo_iterator MemRefs0 = MF->allocateMemRefsArray(1);
MemRefs0[0] = Mem->getMemOperand();		MemRefs0[0] = Mem->getMemOperand();
cast<MachineSDNode>(LD)->setMemRefs(MemRefs0, MemRefs0 + 1);		cast<MachineSDNode>(LD)->setMemRefs(MemRefs0, MemRefs0 + 1);

		// For automatic generation of LDG (through SelectLoad[Vector], not the
		// intrinsics), we may have an extending load like:
		//
		// i32,ch = load<LD1[%data1(addrspace=1)], zext from i8> t0, t7, undef:i64
		//
		// Since we load an i8 value, the matching logic above will have selected an
		// LDG instruction that reads i8 and stores it in an i16 register (NVPTX does
		// not expose 8-bit registers):
		//
		// i16,ch = INT_PTX_LDG_GLOBAL_i8areg64 t7, t0
		//
		// To get the correct type in this case, truncate back to i8 and then extend
		// to the original load type.
		EVT OrigType = N->getValueType(0);
		LoadSDNode *LDSD = dyn_cast<LoadSDNode>(N);
		if (LDSD && EltVT == MVT::i8 && OrigType.getScalarSizeInBits() >= 32) {
		unsigned CvtOpc = 0;

		switch (LDSD->getExtensionType()) {
		default:
		llvm_unreachable("An extension is required for i8 loads");
		break;
		case ISD::SEXTLOAD:
		switch (OrigType.getSimpleVT().SimpleTy) {
		default:
		llvm_unreachable("Unhandled integer load type");
		break;
		case MVT::i32:
		CvtOpc = NVPTX::CVT_s32_s8;
		break;
		case MVT::i64:
		CvtOpc = NVPTX::CVT_s64_s8;
		break;
		}
		break;
		case ISD::EXTLOAD:
		case ISD::ZEXTLOAD:
		switch (OrigType.getSimpleVT().SimpleTy) {
		default:
		llvm_unreachable("Unhandled integer load type");
		break;
		case MVT::i32:
		CvtOpc = NVPTX::CVT_u32_u8;
		break;
		case MVT::i64:
		CvtOpc = NVPTX::CVT_u64_u8;
		break;
		}
		break;
		}

		// For each output value, truncate to i8 (since the upper 8 bits are
		// undefined) and then extend to the desired type.
		for (unsigned i = 0; i != NumElts; ++i) {
		SDValue Res(LD, i);
		SDValue OrigVal(N, i);

		SDNode *CvtNode =
		CurDAG->getMachineNode(CvtOpc, DL, OrigType, Res,
		CurDAG->getTargetConstant(NVPTX::PTXCvtMode::NONE, DL, MVT::i32));
		ReplaceUses(OrigVal, SDValue(CvtNode, 0));
		}
		}

return LD;		return LD;
}		}

SDNode *NVPTXDAGToDAGISel::SelectStore(SDNode *N) {		SDNode *NVPTXDAGToDAGISel::SelectStore(SDNode *N) {
SDLoc dl(N);		SDLoc dl(N);
StoreSDNode *ST = cast<StoreSDNode>(N);		StoreSDNode *ST = cast<StoreSDNode>(N);
EVT StoreVT = ST->getMemoryVT();		EVT StoreVT = ST->getMemoryVT();
SDNode *NVPTXST = nullptr;		SDNode *NVPTXST = nullptr;
[3,070 lines not shown]
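
The conversion-opcode switch in the hunk above amounts to a small lookup from (extension kind, result width) to a `cvt` mnemonic. The following standalone sketch restates that mapping as strings; `ExtKind` and `cvtMnemonic` are hypothetical names, and the real code selects `NVPTX::CVT_*` opcodes rather than building text. As in the patch, an any-extending load is treated like a zero-extending one, and only 32- and 64-bit results are handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the load extension kinds used in the patch.
enum class ExtKind { NonExt, Ext, SExt, ZExt };

// Returns the PTX mnemonic (without a conversion-mode suffix) that converts
// an i8 value to a destBits-wide integer.
std::string cvtMnemonic(ExtKind ext, unsigned destBits) {
  assert(ext != ExtKind::NonExt && "an extension is required for i8 loads");
  assert((destBits == 32 || destBits == 64) && "unhandled integer load type");
  // Sign extension uses the signed types; Ext and ZExt share the unsigned ones.
  const std::string ty = (ext == ExtKind::SExt) ? "s" : "u";
  return "cvt." + ty + std::to_string(destBits) + "." + ty + "8";
}
```

For example, `cvtMnemonic(ExtKind::ZExt, 32)` produces `cvt.u32.u8` and `cvtMnemonic(ExtKind::SExt, 32)` produces `cvt.s32.s8`, matching the strings the scalar tests in bug26185.ll check for.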

llvm/trunk/lib/Target/NVPTX/NVPTXInstrInfo.td

	[313 lines not shown]
	// Type Conversion			// Type Conversion
	//-----------------------------------			//-----------------------------------

	let hasSideEffects = 0 in {			let hasSideEffects = 0 in {
	// Generate a cvt to the given type from all possible types. Each instance			// Generate a cvt to the given type from all possible types. Each instance
	// takes a CvtMode immediate that defines the conversion mode to use. It can			// takes a CvtMode immediate that defines the conversion mode to use. It can
	// be CvtNONE to omit a conversion mode.			// be CvtNONE to omit a conversion mode.
	multiclass CVT_FROM_ALL<string FromName, RegisterClass RC> {			multiclass CVT_FROM_ALL<string FromName, RegisterClass RC> {
				def _s8 :
				NVPTXInst<(outs RC:$dst),
				(ins Int16Regs:$src, CvtMode:$mode),
				!strconcat("cvt${mode:base}${mode:ftz}${mode:sat}.",
				FromName, ".s8\t$dst, $src;"), []>;
				def _u8 :
				NVPTXInst<(outs RC:$dst),
				(ins Int16Regs:$src, CvtMode:$mode),
				!strconcat("cvt${mode:base}${mode:ftz}${mode:sat}.",
				FromName, ".u8\t$dst, $src;"), []>;
	def _s16 :			def _s16 :
	NVPTXInst<(outs RC:$dst),			NVPTXInst<(outs RC:$dst),
	(ins Int16Regs:$src, CvtMode:$mode),			(ins Int16Regs:$src, CvtMode:$mode),
	!strconcat("cvt${mode:base}${mode:ftz}${mode:sat}.",			!strconcat("cvt${mode:base}${mode:ftz}${mode:sat}.",
	FromName, ".s16\t$dst, $src;"), []>;			FromName, ".s16\t$dst, $src;"), []>;
	def _u16 :			def _u16 :
	NVPTXInst<(outs RC:$dst),			NVPTXInst<(outs RC:$dst),
	(ins Int16Regs:$src, CvtMode:$mode),			(ins Int16Regs:$src, CvtMode:$mode),
	[2,390 lines not shown]

llvm/trunk/test/CodeGen/NVPTX/bug26185.ll

				; RUN: llc < %s -march=nvptx -mcpu=sm_35 | FileCheck %s

				; Verify that we correctly emit code for i8 ldg/ldu. We do not expose 8-bit
				; registers in the backend, so these loads need special handling.

				target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
				target triple = "nvptx64-unknown-unknown"

				; CHECK-LABEL: ex_zext
				define void @ex_zext(i8* noalias readonly %data, i32* %res) {
				entry:
				; CHECK: ld.global.nc.u8
				%val = load i8, i8* %data
				; CHECK: cvt.u32.u8
				%valext = zext i8 %val to i32
				store i32 %valext, i32* %res
				ret void
				}

				; CHECK-LABEL: ex_sext
				define void @ex_sext(i8* noalias readonly %data, i32* %res) {
				entry:
				; CHECK: ld.global.nc.u8
				%val = load i8, i8* %data
				; CHECK: cvt.s32.s8
				%valext = sext i8 %val to i32
				store i32 %valext, i32* %res
				ret void
				}

				; CHECK-LABEL: ex_zext_v2
				define void @ex_zext_v2(<2 x i8>* noalias readonly %data, <2 x i32>* %res) {
				entry:
				; CHECK: ld.global.nc.v2.u8
				%val = load <2 x i8>, <2 x i8>* %data
				; CHECK: cvt.u32.u16
				%valext = zext <2 x i8> %val to <2 x i32>
				store <2 x i32> %valext, <2 x i32>* %res
				ret void
				}

				; CHECK-LABEL: ex_sext_v2
				define void @ex_sext_v2(<2 x i8>* noalias readonly %data, <2 x i32>* %res) {
				entry:
				; CHECK: ld.global.nc.v2.u8
				%val = load <2 x i8>, <2 x i8>* %data
				; CHECK: cvt.s32.s8
				%valext = sext <2 x i8> %val to <2 x i32>
				store <2 x i32> %valext, <2 x i32>* %res
				ret void
				}

				!nvvm.annotations = !{!0,!1,!2,!3}
				!0 = !{void (i8*, i32*)* @ex_zext, !"kernel", i32 1}
				!1 = !{void (i8*, i32*)* @ex_sext, !"kernel", i32 1}
				!2 = !{void (<2 x i8>*, <2 x i32>*)* @ex_zext_v2, !"kernel", i32 1}
				!3 = !{void (<2 x i8>*, <2 x i32>*)* @ex_sext_v2, !"kernel", i32 1}