This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Support for v3i32/v3f32
ClosedPublic

Authored by tpr on Mar 4 2019, 7:12 AM.

Details

Summary

Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9

Diff Detail

Event Timeline

tpr created this revision.Mar 4 2019, 7:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 7:12 AM
arsenm added a comment.Mar 4 2019, 7:40 AM

Needs a test for the 96-bit spills. An -O0 test case with a value live across a block should work

lib/Target/AMDGPU/SIISelLowering.cpp
6644

I think the widening case is missing an alignment check, but that's a pre-existing problem that should be fixed separately

lib/Target/AMDGPU/SIRegisterInfo.td
471

Yes, it needs to be allocatable.

The allocation priority of the SGPRs is supposed to be lower than VGPRs, but here it's the same as the VGPR. You might need to shift all of the values around to fix this

What about flat and segmented memory operations?

lib/Target/AMDGPU/SIInstrInfo.cpp
577

Does that mean for a v3 we will have 3 s_mov_b32 instead of s_mov_b64 + s_mov_b32? That is suboptimal.

What happens in asm parser/disasm? I thing these still create bogus v4 operands?

What happens in asm parser/disasm? I thing these still create bogus v4 operands?

Those should already be using the correct register classes. There were never issues after the DAG with these

tpr added a comment.Mar 5 2019, 5:35 AM

What about flat and segmented memory operations?

Flat dwordx3 is supported, as shown by some test changes. What is segmented memory?

arsenm added inline comments.Mar 5 2019, 8:23 AM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1433

You can just hardcode i32

lib/Target/AMDGPU/SIISelLowering.cpp
334–335

Why do these need to be changed?

1332

In the backend you can just hardened the index type to i32

In D58902#1418275, @tpr wrote:

What about flat and segmented memory operations?

Flat dwordx3 is supported, as shown by some test changes. What is segmented memory?

Segmented flat, like global_load.

tpr marked 2 inline comments as done.Mar 5 2019, 10:39 AM
tpr added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
577

Yes.

I can't find any lit test or graphics shader that attempts to copy a vec3 of sgprs here. Any idea how to provoke it so I can try it?

arsenm added inline comments.Mar 5 2019, 10:41 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
577

A v3 phi, and probably at -O0

tpr added a comment.Mar 6 2019, 3:58 AM
In D58902#1418275, @tpr wrote:

What about flat and segmented memory operations?

Flat dwordx3 is supported, as shown by some test changes. What is segmented memory?

Segmented flat, like global_load.

Yes, global dwordx3 is supported, as shown in test local-global-i32.ll (although the test does not check that it gets global rather than flat in the case that it should get global).

tpr marked 5 inline comments as done.Mar 6 2019, 10:57 AM
tpr added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
334–335

Need the first pair to stop "Cannot select: t60: v3i32 = insert_subvector undef:v3i32, t55, Constant:i32<0>" on ds_read2_superreg.ll.

Need the second pair to stop "Cannot select: t129: v4i32 = insert_subvector undef:v4i32, t229, Constant:i32<0>" on setcc.ll.

lib/Target/AMDGPU/SIInstrInfo.cpp
577

I managed to provoke it, but fixing it looks less than trivial as we need to somehow allow for a vec3 having a sub0_sub1 and a sub2. And it looks pretty rare anyway. So I won't fix it here. Is that ok?

arsenm added inline comments.Mar 6 2019, 11:08 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
577

Yes

tpr marked an inline comment as done.Mar 6 2019, 11:49 AM

Needs a test for the 96-bit spills. An -O0 test case with a value live across a block should work

OK, I've got a change for this I will add in the next update. (Also the same for vec5 that I will add in the next update of that review.)

tpr updated this revision to Diff 189564.Mar 6 2019, 12:45 PM

V2: Fixed broken v3f32 select.

Fixed reg class priorities.
Updated a fragile test.
Addressed review comments on DL::getIdxType.
Fixed vec3 sgpr inline asm constraint.
Added vec3 sgpr spill test.
tpr added a comment.Mar 6 2019, 12:47 PM

Actually I only have a vec3 spilling test for sgprs. I didn't see any multi-dword vgpr spill tests.

In D58902#1420345, @tpr wrote:

Actually I only have a vec3 spilling test for sgprs. I didn't see any multi-dword vgpr spill tests.

That can't be right. We should have some somewhere. We should have a new one at least for the 3 register case

tpr updated this revision to Diff 190085.Mar 11 2019, 7:12 AM

V3: Added multi-dword vgpr spill test, inc vec3 and vec5.

tpr updated this revision to Diff 190393.Mar 13 2019, 3:45 AM

V4: Fixed vec3 in sign_extend test.

I'm not sure how you're getting away without updating AMDGPUCallingConv.td. We have to maintain a list of what to do for each legal type in each convention

tpr updated this revision to Diff 190985.Mar 16 2019, 11:55 AM

V5: Added vec3 to AMDGPUCallingConvention.td. Fixed call-return-types test for dwordx3 buffer/flat.
(I will add vec5 to AMDGPUCallingConvention.td in the vec5 change.)

arsenm accepted this revision.Mar 18 2019, 1:27 PM

LGTM

This revision is now accepted and ready to land.Mar 18 2019, 1:27 PM
This revision was automatically updated to reflect the committed changes.

Hi,
This change breaks SI, at least with RADV.
Here's how to reproduce: ./deqp-vk --deqp-case=dEQP-VK.glsl.builtin.function.integer.bitfieldreverse.ivec3_highp_tess_control

Can you investigate please?
Thanks!

tpr added a comment.Apr 1 2019, 2:08 AM

Hi Samuel

Oh dear, sorry about that.

I'm not in a position to immediately run RADV, so I'll try manual code inspection first. Could you please email over (tpr.llvm@botech.co.uk) the .ll going in to the amdgcn backend, and the buggy .s you get out?

Thanks.

tpr added a comment.Apr 9 2019, 5:56 AM

Hopefully fixed by D60457.