This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Support register copy from i16 to i32 register types
Needs RevisionPublic

Authored by sfantao on Aug 17 2015, 4:28 PM.

Details

Summary

In some cases register copies between i16 and i32 are created, but there is no physical register copy implementation for them. This usually happens each time a non-coherent load is emitted for 1- or 2-byte int types. This patch adds support for that.

It is possible this would have to be fixed before the copies are emitted, given that we get things like:

ld.global.nc.u8 %rs7, [%rd81]
cvt.u32.u16 %r56, %rs7;

which means we have an implicit cast from u8 to u16, which suggest that something may be wrong. Nevertheless, for the testcases I have access to, this fix works just fine.

Diff Detail

Event Timeline

sfantao updated this revision to Diff 32353.Aug 17 2015, 4:28 PM
sfantao retitled this revision from to [NVPTX] Support register copy from i16 to i32 register types.
sfantao updated this object.
sfantao added reviewers: jholewinski, jingyue.
sfantao added a subscriber: llvm-commits.
jingyue commandeered this revision.Aug 17 2015, 5:31 PM
jingyue edited reviewers, added: sfantao; removed: jingyue.
jingyue added inline comments.
lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40

Out of curiosity, which code pattern leads to 16-to-32 copy? Normally, COPY traditionally copies the same type. Later, I added int-to-float and float-to-int because bitcast does that.

40–51

The error message here needs to be updated.

test/CodeGen/NVPTX/reg-copy-int.ll
15

I'd reduce this test case. The current one is too large to demonstrate what you're really testing.

jholewinski added inline comments.Aug 17 2015, 5:39 PM
lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40

As far as I know, a register copy from i16 to i32 should not occur. Trying to emit such a copy is usually an indication of something wrong earlier on. Can you please explain precisely the case you're hitting?

sfantao commandeered this revision.Aug 17 2015, 7:19 PM
sfantao edited reviewers, added: jingyue; removed: sfantao.

Thanks for the review. I am sending a new diff with a more concise regression test. See comments inlined.

lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40

The machine instructions that are getting produced that lead to that physical register copy are:

%vreg159<def> = INT_PTX_LDG_GLOBAL_i8areg64 %vreg204; mem:LD1[%lsr.iv810(addrspace=1)](tbaa=!3) Int16Regs:%vreg159 Int64Regs:%vreg204
%vreg161<def> = COPY %vreg159; Int32Regs:%vreg161 Int16Regs:%vreg159

and the IR that generates this is any load of readonly function argument with no aliases.

39–40

I traced the problem down to SelectLDGLDU. INT_PTX_LDG_GLOBAL_i8areg64 is generated in there and loads an i8 to an i16 register.

40–51

Done!

test/CodeGen/NVPTX/reg-copy-int.ll
15

I'm was very happy with the regression test either... I was having an hard time replicating the problem with a small example and just realized that this issue only happens for readonly function arguments.

The new regression is more concise.

sfantao updated this revision to Diff 32367.Aug 17 2015, 7:19 PM

Add more concise regression test.

jholewinski added inline comments.Aug 18 2015, 11:51 AM
lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40

Right, we don't define i8 registers, so a load of an i8 has to be put in an i16 register. We still shouldn't have a cross-class register copy here, though.

jingyue added inline comments.Aug 26 2015, 2:52 PM
lib/Target/NVPTX/NVPTXInstrInfo.cpp
39–40

An even simpler test case is

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-unknown-unknown"

define void @ex(i8* noalias readonly %data, i32* %res) {
entry:
  %val = load i8, i8* %data
  %valext = zext i8 %val to i32
  store i32 %valext, i32* %res
  ret void
}

!nvvm.annotations = !{!0}
!0 = !{void (i8*, i32*)* @ex, !"kernel", i32 1}

COPY is generated when emitting storeing an zexted load.

Back up one step, is it right to emit an SDNode that returns int16reg when its corresponding IR instruction returns i8? Do ISel and other machine level passes assume some sort of consistency between IR types and SDNode types?

jholewinski requested changes to this revision.Mar 9 2016, 10:22 AM
jholewinski edited edge metadata.

This is an issue with LDG handling. canLowerToLDG() is returning true for an i8 load zero-extended to i32, but SelectLDGLDU cannot handle this case.

I propose we fix SelectLDGLDU() [or the call-site in SelectLoad()] instead of working around this issue.

This revision now requires changes to proceed.Mar 9 2016, 10:22 AM