This is an archive of the discontinued LLVM Phabricator instance.

Differential D12093

[NVPTX] Support register copy from i16 to i32 register types
Needs Revision · Public

Authored by sfantao on Aug 17 2015, 4:28 PM.

Details

Reviewers

jholewinski
jingyue

Summary

In some cases register copies between i16 and i32 are created, but there is no physical register copy implementation for them. This usually happens each time a non-coherent load is emitted for 1- or 2-byte int types. This patch adds support for that.

It is possible this would have to be fixed before the copies are emitted, given that we get things like:

ld.global.nc.u8 %rs7, [%rd81];
cvt.u32.u16 %r56, %rs7;

which means we have an implicit cast from u8 to u16, which suggests that something may be wrong. Nevertheless, for the test cases I have access to, this fix works just fine.
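As a rough illustration of the two-instruction sequence above, here is a standalone C++ model of what happens to the value. It is not LLVM or PTX code: the function names are hypothetical, and it assumes that `ld.global.nc.u8` zero-extends the loaded byte into the 16-bit destination register and that `cvt.u32.u16` zero-extends again.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of `ld.global.nc.u8 %rs, [addr]`: the loaded byte is
// assumed to be zero-extended into a 16-bit register.
inline std::uint16_t model_ld_u8_into_rs(const std::uint8_t *addr) {
  assert(addr != nullptr && "model requires a valid address");
  return static_cast<std::uint16_t>(*addr);
}

// Hypothetical model of `cvt.u32.u16 %r, %rs`: zero-extend 16 -> 32 bits.
inline std::uint32_t model_cvt_u32_u16(std::uint16_t rs) {
  return static_cast<std::uint32_t>(rs);
}

// Both steps composed; under the assumptions above this matches a direct
// `zext i8 ... to i32` of the loaded byte.
inline std::uint32_t model_load_zext(const std::uint8_t *addr) {
  return model_cvt_u32_u16(model_ld_u8_into_rs(addr));
}
```

Under these assumptions the value itself is preserved, so the open question in this review is about where the widening is represented in the compiler, not about the numeric result.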

Event Timeline

sfantao updated this revision to Diff 32353. Aug 17 2015, 4:28 PM

sfantao retitled this revision from to [NVPTX] Support register copy from i16 to i32 register types.

sfantao updated this object.

sfantao added reviewers: jholewinski, jingyue.

sfantao added a subscriber: llvm-commits.

Herald added a subscriber: jholewinski. Aug 17 2015, 4:28 PM

jingyue commandeered this revision. Aug 17 2015, 5:31 PM

jingyue edited reviewers, added: sfantao; removed: jingyue.

jingyue added inline comments.

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40	Out of curiosity, which code pattern leads to a 16-to-32 copy? Traditionally, COPY copies between registers of the same type. Later, I added int-to-float and float-to-int because bitcast does that.
40–51	The error message here needs to be updated.
test/CodeGen/NVPTX/reg-copy-int.ll
15	I'd reduce this test case. The current one is too large to demonstrate what you're really testing.

jholewinski added inline comments. Aug 17 2015, 5:39 PM

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40	As far as I know, a register copy from i16 to i32 should not occur. Trying to emit such a copy is usually an indication of something wrong earlier on. Can you please explain precisely the case you're hitting?

Thanks for the review. I am sending a new diff with a more concise regression test. See comments inlined.

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40	The machine instructions that are getting produced and that lead to that physical register copy are:

%vreg159<def> = INT_PTX_LDG_GLOBAL_i8areg64 %vreg204; mem:LD1[%lsr.iv810(addrspace=1)](tbaa=!3) Int16Regs:%vreg159 Int64Regs:%vreg204
%vreg161<def> = COPY %vreg159; Int32Regs:%vreg161 Int16Regs:%vreg159

and the IR that generates this is any load from a readonly function argument with no aliases.
39–40	I traced the problem down to SelectLDGLDU. INT_PTX_LDG_GLOBAL_i8areg64 is generated in there and loads an i8 to an i16 register.
40–51	Done!
test/CodeGen/NVPTX/reg-copy-int.ll
15	I wasn't very happy with the regression test either... I was having a hard time replicating the problem with a small example and just realized that this issue only happens for readonly function arguments. The new regression test is more concise.

Add more concise regression test.

jholewinski added inline comments. Aug 18 2015, 11:51 AM

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40	Right, we don't define i8 registers, so a load of an i8 has to be put in an i16 register. We still shouldn't have a cross-class register copy here, though.

jingyue added inline comments. Aug 26 2015, 2:52 PM

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40	An even simpler test case is

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-unknown-unknown"

define void @ex(i8* noalias readonly %data, i32* %res) {
entry:
  %val = load i8, i8* %data
  %valext = zext i8 %val to i32
  store i32 %valext, i32* %res
  ret void
}

!nvvm.annotations = !{!0}
!0 = !{void (i8*, i32*)* @ex, !"kernel", i32 1}

`COPY` is generated when `store`ing a `zext`ed load. Stepping back, is it right to emit an SDNode that returns int16reg when its corresponding IR instruction returns i8? Do ISel and other machine-level passes assume some sort of consistency between IR types and SDNode types?

This is an issue with LDG handling. canLowerToLDG() is returning true for an i8 load zero-extended to i32, but SelectLDGLDU cannot handle this case.

I propose we fix SelectLDGLDU() [or the call-site in SelectLoad()] instead of working around this issue.
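One way to picture the proposal is the toy planner below. It is only a sketch under stated assumptions, not the actual `SelectLDGLDU()` code or the fix that was eventually adopted: `planLdgLoad` and its string output are hypothetical, and it assumes the selector would load into the narrowest available register (16 bits at minimum, since no i8 registers are defined) and then emit an explicit conversion instead of leaving a cross-class `COPY` behind.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planner for an unsigned LDG load.
//   memBits:    width of the value in memory (8, 16, 32, or 64)
//   resultBits: width the user of the load expects (16, 32, or 64)
// Returns a textual list of the instructions the selector would emit.
inline std::vector<std::string> planLdgLoad(unsigned memBits,
                                            unsigned resultBits) {
  assert(resultBits >= 16 && "no registers narrower than 16 bits are modeled");
  assert(memBits <= resultBits && "only plain or extending loads are modeled");

  // An 8-bit load has to land in a 16-bit register.
  const unsigned regBits = memBits < 16 ? 16 : memBits;

  std::vector<std::string> plan;
  plan.push_back("ld.global.nc.u" + std::to_string(memBits) + " -> r" +
                 std::to_string(regBits));

  // Widen explicitly during selection, so that no cross-class COPY is
  // needed later on.
  if (regBits != resultBits)
    plan.push_back("cvt.u" + std::to_string(resultBits) + ".u" +
                   std::to_string(regBits));
  return plan;
}
```

For the case discussed here, `planLdgLoad(8, 32)` yields a load into a 16-bit register followed by `cvt.u32.u16`, which is the same PTX the patch produces, but decided at selection time rather than in `copyPhysReg`.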

This revision now requires changes to proceed. Mar 9 2016, 10:22 AM

Revision Contents

Path	Size
lib/Target/NVPTX/NVPTXInstrInfo.cpp	14 lines
test/CodeGen/NVPTX/reg-copy-int.ll	21 lines

Diff 32367

lib/Target/NVPTX/NVPTXInstrInfo.cpp

	Show All 30 Lines
 NVPTXInstrInfo::NVPTXInstrInfo() : NVPTXGenInstrInfo(), RegInfo() {}

 void NVPTXInstrInfo::copyPhysReg(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator I, DebugLoc DL,
     unsigned DestReg, unsigned SrcReg, bool KillSrc) const {
   const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
   const TargetRegisterClass *DestRC = MRI.getRegClass(DestReg);
   const TargetRegisterClass *SrcRC = MRI.getRegClass(SrcReg);

-  if (DestRC->getSize() != SrcRC->getSize())
-    report_fatal_error("Copy one register into another with a different width");
+  if (DestRC->getSize() != SrcRC->getSize()) {
+    // If the sizes differ it may be possible we are copying a i16 to a i32
+    // register.
+    if (DestRC == &NVPTX::Int32RegsRegClass &&
+        SrcRC == &NVPTX::Int16RegsRegClass) {
+      BuildMI(MBB, I, DL, get(NVPTX::CVT_u32_u16), DestReg)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .addImm(0);
+      return;
+    }
+    report_fatal_error("Invalid register copy - only 2-byte to 4-byte integer or same bitwidth copies are allowed");
+  }

   unsigned Op;
   if (DestRC == &NVPTX::Int1RegsRegClass) {
     Op = NVPTX::IMOV1rr;
   } else if (DestRC == &NVPTX::Int16RegsRegClass) {
     Op = NVPTX::IMOV16rr;
   } else if (DestRC == &NVPTX::Int32RegsRegClass) {
     Op = (SrcRC == &NVPTX::Int32RegsRegClass ? NVPTX::IMOV32rr
	▲ Show 20 Lines • Show All 223 Lines • Show Last 20 Lines
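The decision made by the patched width check in `copyPhysReg` can be summarized with the small standalone model below. It is a sketch, not LLVM code: `RegClass`, `widthInBits`, and `selectCopy` are hypothetical stand-ins, only three integer classes are modeled, and every same-width copy is collapsed into a generic `"mov"` because the rest of the opcode selection is not shown in this diff.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for three of the NVPTX integer register classes.
enum class RegClass { Int16, Int32, Int64 };

inline unsigned widthInBits(RegClass rc) {
  switch (rc) {
  case RegClass::Int16: return 16;
  case RegClass::Int32: return 32;
  case RegClass::Int64: return 64;
  }
  assert(false && "unhandled register class");
  return 0;
}

// Returns the kind of instruction the modeled copy would use, or
// std::nullopt where the patch would call report_fatal_error.
inline std::optional<std::string> selectCopy(RegClass dest, RegClass src) {
  if (widthInBits(dest) != widthInBits(src)) {
    // The single cross-width case the patch accepts: 16-bit -> 32-bit.
    if (dest == RegClass::Int32 && src == RegClass::Int16)
      return std::string("cvt.u32.u16");
    return std::nullopt;
  }
  // Same-width copies keep going through the existing move selection.
  return std::string("mov");
}
```

In this model `selectCopy(RegClass::Int32, RegClass::Int16)` is accepted, while the reverse direction and any other width mismatch still end in the fatal-error path, which is the asymmetry the reviewers question.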

test/CodeGen/NVPTX/reg-copy-int.ll

This file was added.

; RUN: llc < %s -O3 -march=nvptx64 -mcpu=sm_35 | FileCheck %s

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-unknown-unknown"

; CHECK-LABEL: ex(
define void @ex(i8 addrspace(1)* noalias readonly dereferenceable(1) %data, i8 addrspace(1)* noalias dereferenceable(1) %res, i8 %op) {
entry:
  %opext = zext i8 %op to i32
; CHECK: ld.global.nc.u8 {{.*}}[[r1:%.+]], [%r{{.+}}];
; CHECK: cvt.u32.u16 {{.*}}%r{{.+}}, [[r1]];
  %val = load i8, i8 addrspace(1)* %data, align 1
  %valext = zext i8 %val to i32
  %resval = add nuw nsw i32 %valext, %opext
  %restrunc = trunc i32 %resval to i8
  store i8 %restrunc, i8 addrspace(1)* %res
  ret void
}

!nvvm.annotations = !{!0}
!0 = !{void (i8 addrspace(1)*, i8 addrspace(1)*, i8)* @ex, !"kernel", i32 1}