This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix lowering a16 image intrinsics
ClosedPublic

Authored by sebastian-ne on Jan 27 2020, 7:51 AM.

Details

Summary

scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers.

Event Timeline

sebastian-ne created this revision.Jan 27 2020, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2020, 7:51 AM
arsenm added inline comments.Jan 27 2020, 12:21 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
3

if this is an encoding test, it should use -show-mc-encoding?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
21

These tests changes seem independent, but I think having the separate scalars has more value for testing the packing code actually works

sebastian-ne added inline comments.Jan 27 2020, 11:57 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
3

Maybe the name was chosen uncarefully, this test should ensure that all components are actually converted and packed into one register.

Before this patch, a16 is not working because the y component just gets ignored, the compiler only ensures that the x component is present.
Still, all other a16 tests are passing. So I tried to introduce this test as a regression test.
It should make sure that actually all components are "encoded" into the packed coordinates.

I guess renaming it to llvm.amdgcn.image.a16.pack.ll would make more sense?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
21

Well, before this patch the packing did not work but the tests passed ;)
To test the packing, I added the new test.

nhaehnle said the amdgpu_ps calling convention is not build to handle f16 arguments, so it is not clear if they are packed or not?
The llvm.amdgcn.image.a16.dim.ll test, which takes i16 instead of halfs, also uses packed arguments. It groups all arguments into <2 x i16>s.

sebastian-ne added inline comments.Jan 28 2020, 12:00 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
21

I forgot to add, some testcases later down used e.g. v[2:9] as arguments, using the packed arguments ensures that always v[0:…] is used so the test should not be influenced by other optimizations or changes of the compiler.

rtaylor added inline comments.Jan 28 2020, 6:36 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
3

Should we be checking the encoding for all existing tests instead?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
5

What is the point of combining the halves into vectors? I think this makes the test code unnecessarily complex and less readable.

rtaylor added inline comments.Jan 28 2020, 6:40 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5446–5448

Couldn't this just be a SCALAR_TO_VECTOR to get the 0th element and create the vector? I think Nicolai mentioned this before?

arsenm added inline comments.Jan 28 2020, 1:24 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
21

f16 arguments work. They are not packed

nhaehnle added inline comments.Jan 29 2020, 1:24 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5446–5448

Yes, please change this.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
3

Yes, rename the file to avoid confusion.

Yes, we should add encoding tests, but that should happen in a separate patch that actually fixes the encoding for gfx1010 :)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
5

Ideally, this test should be changed into one that uses the update_llc_test_checks script; with that, having the inputs packed like this will highlight some inefficiencies in the codegen that we should fix.

Address comments

  • Use scalar_to_vector instead of two vector inserts
  • Rename packing test
  • Convert tests to use update script
rtaylor added inline comments.Jan 29 2020, 6:02 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
3

I'm not sure what this file is doing that the current tests don't do?

; GCN-LABEL: {{^}}sample_2d:
; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16{{$}}
define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
main_body:

%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v

}

5–16

Do you need all this checking or can you just check the image_sample instruction? If so, it would be cleaner and easier to read and easier to tell what you are checking with just the GFX9: image_sample...

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
5

Nicolai, can you be more specific, what inefficiencies?

sebastian-ne added inline comments.Jan 29 2020, 6:23 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll
5–16

Testing the image_sample instruction is not enough.
In the previous tests, only the image_sample was tested and the tests passed. However, a16 was not working because packing the f16 arguments into the registers did work only partially.
So the previous tests were incomplete.
This is exactly the part that this test fills. The significant part of this test is not the image_sample, but the conversion and packing of all arguments into the registers.
It tests the part that went wrong before this patch.

Is the intent of this test clear? I can try to explain it more detailed.

With my updated patch, the other tests check more details, so we could also remove this small test. Any opinions on this?

Remove superfluous test

sebastian-ne edited the summary of this revision. (Show Details)Jan 29 2020, 6:41 AM
This revision is now accepted and ready to land.Jan 30 2020, 1:27 AM

I changed the way the two f16 values get packed.
This way seems to be a lot better for optimization and results in less instructions sometimes.

sebastian-ne requested review of this revision.Feb 4 2020, 3:13 AM

Yes, that does look better, thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Feb 5 2020, 2:03 AM
This revision was automatically updated to reflect the committed changes.