This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Expand vector trunc stores from i16 to i8
Closed, Public

Authored by rampitec on Apr 7 2020, 4:44 PM.

Details

Reviewers

arsenm
kzhuravl

Commits

rGf96810ff346d: [AMDGPU] Expand vector trunc stores from i16 to i8

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Apr 7 2020, 4:44 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. Apr 7 2020, 4:44 PM

Can merge with the existing store tests

In D77693#1968323, @arsenm wrote:

Can merge with the existing store tests

Which file do you prefer?

In fact, this has annoyed me for a long time: we have many tests with zillions of functions. When I need to debug, I first have to find which function failed, then extract it.

I understand that we do not want a billion files, but a bare count of failures no longer tells us much, and a four- or even five-digit line number within a test is usually discouraging.

I know it is not only me; after all, that is why we have all those update* scripts. Then we update a huge test, and I refuse to believe everybody really looks at all the changes. We are saving on forks, but I suppose we are losing test quality.

Some of the older tests, like load and store, are not consistent enough. Ideally we would have tests for all of the types of loads for every address space (possibly generated).
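As a rough illustration of the "possibly generated" idea, the sketch below emits one small IR load test per (type, address space) pair. Everything in it is an assumption made for illustration: the `TestType` struct, the `emitLoadTest` and `emitAllLoadTests` helpers, the `load_<type>_as<N>` naming scheme, and the `GCN` check prefix are hypothetical and not part of this revision. It uses the typed-pointer IR syntax seen in the test added here and only emits a label check; the remaining check lines would presumably be filled in by one of the update scripts mentioned above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one type under test.
struct TestType {
  std::string ir;   // IR spelling, e.g. "<4 x i16>"
  std::string tag;  // fragment used in the function name, e.g. "v4i16"
};

// Emit one load test for the given type and address space. The value is
// loaded from address space `addrSpace` and stored to address space 1.
// The naming scheme and the GCN prefix are illustrative choices.
std::string emitLoadTest(const TestType &ty, unsigned addrSpace) {
  assert(!ty.ir.empty() && !ty.tag.empty());
  const std::string as = std::to_string(addrSpace);
  const std::string name = "load_" + ty.tag + "_as" + as;
  std::ostringstream os;
  os << "; GCN-LABEL: {{^}}" << name << ":\n"
     << "define amdgpu_kernel void @" << name << "(" << ty.ir
     << " addrspace(" << as << ")* %in, " << ty.ir
     << " addrspace(1)* %out) {\n"
     << "entry:\n"
     << "  %val = load " << ty.ir << ", " << ty.ir << " addrspace(" << as
     << ")* %in\n"
     << "  store " << ty.ir << " %val, " << ty.ir << " addrspace(1)* %out\n"
     << "  ret void\n"
     << "}\n";
  return os.str();
}

// Emit the full cross product, grouped by address space, with a blank
// line after each function.
std::string emitAllLoadTests(const std::vector<TestType> &types,
                             const std::vector<unsigned> &addrSpaces) {
  std::string out;
  for (unsigned as : addrSpaces)
    for (const TestType &ty : types)
      out += emitLoadTest(ty, as) + "\n";
  return out;
}
```

Calling `emitAllLoadTests` with, say, `i16` and `<2 x i16>` over address spaces 1 and 3 yields four functions that can be written to a single `.ll` file or split into one file per address space, which would also keep individual test files short.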

This revision is now accepted and ready to land. Apr 7 2020, 7:44 PM

Closed by commit rGf96810ff346d: [AMDGPU] Expand vector trunc stores from i16 to i8 (authored by rampitec). Apr 7 2020, 9:48 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 7 2020, 9:48 PM

Revision Contents

Path                                                     Size
llvm/lib/Target/AMDGPU/SIISelLowering.cpp                5 lines
llvm/test/CodeGen/AMDGPU/trunc-store-vec-i16-to-i8.ll    60 lines

Diff 255898

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

SITargetLowering::SITargetLowering(const TargetMachine &TM,
  ...
  setTruncStoreAction(MVT::v8i32, MVT::v8i16, Expand);
  setTruncStoreAction(MVT::v16i32, MVT::v16i16, Expand);
  setTruncStoreAction(MVT::v32i32, MVT::v32i16, Expand);
  setTruncStoreAction(MVT::v2i32, MVT::v2i8, Expand);
  setTruncStoreAction(MVT::v4i32, MVT::v4i8, Expand);
  setTruncStoreAction(MVT::v8i32, MVT::v8i8, Expand);
  setTruncStoreAction(MVT::v16i32, MVT::v16i8, Expand);
  setTruncStoreAction(MVT::v32i32, MVT::v32i8, Expand);
+ setTruncStoreAction(MVT::v2i16, MVT::v2i8, Expand);
+ setTruncStoreAction(MVT::v4i16, MVT::v4i8, Expand);
+ setTruncStoreAction(MVT::v8i16, MVT::v8i8, Expand);
+ setTruncStoreAction(MVT::v16i16, MVT::v16i8, Expand);
+ setTruncStoreAction(MVT::v32i16, MVT::v32i8, Expand);

  setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);
  setOperationAction(ISD::GlobalAddress, MVT::i64, Custom);

  setOperationAction(ISD::SELECT, MVT::i1, Promote);
  setOperationAction(ISD::SELECT, MVT::i64, Custom);
  setOperationAction(ISD::SELECT, MVT::f64, Promote);
  AddPromotedToType(ISD::SELECT, MVT::f64, MVT::i64);
  ...
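The five added `setTruncStoreAction` calls follow a regular pattern: one entry per vector width N in {2, 4, 8, 16, 32}, marking the truncating store from vNi16 to vNi8 as Expand. The sketch below mirrors that pattern with a toy table. It is not LLVM's data structure or API: `ToyTruncStoreTable`, `Action`, and `markI16ToI8Expand` are hypothetical names, types are plain strings, and the default of `Legal` for unlisted pairs is an assumption of the toy.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a legalization action; only the two values used here.
enum class Action { Legal, Expand };

// Toy table keyed by (value type, memory type) names. This is NOT LLVM's
// implementation; unlisted pairs default to Legal by assumption.
class ToyTruncStoreTable {
public:
  void set(const std::string &valVT, const std::string &memVT, Action a) {
    assert(valVT != memVT && "a truncating store needs two different types");
    actions_[{valVT, memVT}] = a;
  }

  Action get(const std::string &valVT, const std::string &memVT) const {
    auto it = actions_.find({valVT, memVT});
    return it == actions_.end() ? Action::Legal : it->second;
  }

private:
  std::map<std::pair<std::string, std::string>, Action> actions_;
};

// Mirror the five added lines: vNi16 -> vNi8 for N in {2, 4, 8, 16, 32}.
void markI16ToI8Expand(ToyTruncStoreTable &table) {
  for (unsigned n : {2u, 4u, 8u, 16u, 32u}) {
    const std::string prefix = "v" + std::to_string(n);
    table.set(prefix + "i16", prefix + "i8", Action::Expand);
  }
}
```

After `markI16ToI8Expand(table)`, a lookup such as `table.get("v4i16", "v4i8")` returns `Action::Expand`, while a pair the toy never registered still reports `Action::Legal`.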

llvm/test/CodeGen/AMDGPU/trunc-store-vec-i16-to-i8.ll

This file was added.

; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; GCN-LABEL: {{^}}short_char:
; GCN: global_store_byte v
define protected amdgpu_kernel void @short_char(i8 addrspace(1)* %out) {
entry:
  %tmp = load i16, i16 addrspace(1)* undef
  %tmp1 = trunc i16 %tmp to i8
  store i8 %tmp1, i8 addrspace(1)* %out
  ret void
}

; GCN-LABEL: {{^}}short2_char4:
; GCN: global_store_dword v
define protected amdgpu_kernel void @short2_char4(<4 x i8> addrspace(1)* %out) {
entry:
  %tmp = load <2 x i16>, <2 x i16> addrspace(1)* undef, align 4
  %vecinit = shufflevector <2 x i16> %tmp, <2 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %vecinit2 = shufflevector <4 x i16> %vecinit, <4 x i16> <i16 undef, i16 undef, i16 0, i16 0>, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
  %tmp1 = trunc <4 x i16> %vecinit2 to <4 x i8>
  store <4 x i8> %tmp1, <4 x i8> addrspace(1)* %out, align 4
  ret void
}

; GCN-LABEL: {{^}}short4_char8:
; GCN: global_store_dwordx2 v
define protected amdgpu_kernel void @short4_char8(<8 x i8> addrspace(1)* %out) {
entry:
  %tmp = load <4 x i16>, <4 x i16> addrspace(1)* undef, align 8
  %vecinit = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
  %vecinit2 = shufflevector <8 x i16> %vecinit, <8 x i16> <i16 undef, i16 undef, i16 undef, i16 undef, i16 0, i16 0, i16 0, i16 0>, <8 x i32> <i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7>
  %tmp1 = trunc <8 x i16> %vecinit2 to <8 x i8>
  store <8 x i8> %tmp1, <8 x i8> addrspace(1)* %out, align 8
  ret void
}

; GCN-LABEL: {{^}}short8_char16:
; GCN: global_store_dwordx4 v
define protected amdgpu_kernel void @short8_char16(<16 x i8> addrspace(1)* %out) {
entry:
  %tmp = load <8 x i16>, <8 x i16> addrspace(1)* undef, align 16
  %vecinit = shufflevector <8 x i16> %tmp, <8 x i16> undef, <16 x i32> <i32 0, i32 1, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %vecinit2 = shufflevector <16 x i16> %vecinit, <16 x i16> <i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>, <16 x i32> <i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7>
  %tmp1 = trunc <16 x i16> %vecinit2 to <16 x i8>
  store <16 x i8> %tmp1, <16 x i8> addrspace(1)* %out, align 16
  ret void
}

; GCN-LABEL: {{^}}short16_char32:
; GCN: global_store_dwordx4 v
; GCN: global_store_dwordx4 v
define protected amdgpu_kernel void @short16_char32(<32 x i8> addrspace(1)* %out) {
entry:
  %tmp = load <16 x i16>, <16 x i16> addrspace(1)* undef, align 32
  %vecinit = shufflevector <16 x i16> %tmp, <16 x i16> undef, <32 x i32> <i32 0, i32 1, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %vecinit2 = shufflevector <32 x i16> %vecinit, <32 x i16> <i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 undef, i16 0, i16 1, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 undef, i16 undef, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>, <32 x i32> <i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7, i32 0, i32 1, i32 6, i32 7>
  %tmp1 = trunc <32 x i16> %vecinit2 to <32 x i8>
  store <32 x i8> %tmp1, <32 x i8> addrspace(1)* %out, align 32
  ret void
}