This is an archive of the discontinued LLVM Phabricator instance.

SILoadStoreOptimizer: add support for GFX10 image instructions
ClosedPublic

Authored by foad on Jun 11 2020, 11:15 AM.

Details

Summary

GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.

Diff Detail

Event Timeline

foad created this revision.Jun 11 2020, 11:15 AM
foad marked 2 inline comments as done.Jun 11 2020, 11:19 AM

This isn't ready to push yet but I'd like to get some feedback on it -- see comments inline.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
116

Is this really true? I see opcodes like IMAGE_SAMPLE_D_CL_O_V4_V16 mentioned in some generated tables, but I'm not sure what the V16 there really means or whether these will ever occur in practice.

llvm/test/CodeGen/AMDGPU/merge-image-load.mir
2 ↗(On Diff #270187)

These tests pass with -mcpu=gfx1010 but I don't think they're really testing anything useful. Do I need to regenerate them using gfx10-specific image sample opcodes or something?

rampitec added inline comments.Jun 11 2020, 11:50 AM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
116

I think that's because we do not have all register classes, so it uses VReg_512 for 10, 11, or 12 dword address for example. I believe a longest one has 12 vaddrs, like this:

image_sample_c_d_cl_o v[16:19], [v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19], s[20:27], s[100:103] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x00]
0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x14

You can check it in MIMGInstructions.td where MIMG_Sampler_AddrSizes is defined.

arsenm added inline comments.Jun 11 2020, 3:05 PM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
116

I think 12 is what I remember was the maximum. I was confused by the missing register classes. We probably should add the missing ones now that globalisel at least would handle them (I had to go back and break the code to round to the instruction constraints)

rampitec added inline comments.Jun 11 2020, 3:28 PM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
116

It will probably require not only register classes, but value types as well.

Mostly LGTM, with some nitpicks.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
108–113

false instead of 0 for bools.

But how about keeping RegisterEnum and making this two unsigned chars, one for the RegisterEnum flags and one for the NumVAddrs?

116

I remember the value types being the part where I got stuck when I tried to do this a very, very long time ago. And yes, 12 is the maximum number of vaddrs.

430–431

Remove the struct

foad marked an inline comment as done.Jun 12 2020, 7:21 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
108–113

I think using separate fields is better than bit twiddling. The storage size of AddressRegs doesn't really matter because it isn't stored anywhere, only used to return info from getRegs. In retrospect I shouldn't have bothered squashing NumVAddrs into a single byte.

foad updated this revision to Diff 270394.Jun 12 2020, 7:26 AM

Address review comments.

foad marked 6 inline comments as done.Jun 12 2020, 7:26 AM
piotr added inline comments.Jul 1 2020, 12:50 AM
llvm/test/CodeGen/AMDGPU/merge-image-load.mir
2 ↗(On Diff #270187)

Yes, it would be good if you could add tests for image instructions with NSA.

foad updated this revision to Diff 274808.Jul 1 2020, 7:46 AM

Check dlc bit for gfx10 MIMG instructions. Add tests.

foad marked an inline comment as done.Jul 1 2020, 7:47 AM
piotr added a comment.Jul 2 2020, 12:12 AM

Excellent test coverage, thank you for doing that.

piotr added inline comments.Jul 2 2020, 12:20 AM
llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir
13

s/ =C/ = C/

foad updated this revision to Diff 275023.Jul 2 2020, 2:29 AM

Fix formatting in MIR tests.

nhaehnle accepted this revision.Jul 8 2020, 9:32 AM

LGTM

This revision is now accepted and ready to land.Jul 8 2020, 9:32 AM
This revision was automatically updated to reflect the committed changes.