scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 45002 Build 46602: arc lint + arc unit
Event Timeline
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
---|---|---|
3 | if this is an encoding test, it should use -show-mc-encoding? | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
21 | These tests changes seem independent, but I think having the separate scalars has more value for testing the packing code actually works |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
---|---|---|
3 | Maybe the name was chosen uncarefully, this test should ensure that all components are actually converted and packed into one register. Before this patch, a16 is not working because the y component just gets ignored, the compiler only ensures that the x component is present. I guess renaming it to llvm.amdgcn.image.a16.pack.ll would make more sense? | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
21 | Well, before this patch the packing did not work but the tests passed ;) nhaehnle said the amdgpu_ps calling convention is not build to handle f16 arguments, so it is not clear if they are packed or not? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
---|---|---|
21 | I forgot to add, some testcases later down used e.g. v[2:9] as arguments, using the packed arguments ensures that always v[0:…] is used so the test should not be influenced by other optimizations or changes of the compiler. |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
---|---|---|
3 | Should we be checking the encoding for all existing tests instead? | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
5 | What is the point of combining the halves into vectors? I think this makes the test code unnecessarily complex and less readable. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5446–5448 | Couldn't this just be a SCALAR_TO_VECTOR to get the 0th element and create the vector? I think Nicolai mentioned this before? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
---|---|---|
21 | f16 arguments work. They are not packed |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5446–5448 | Yes, please change this. | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
3 | Yes, rename the file to avoid confusion. Yes, we should add encoding tests, but that should happen in a separate patch that actually fixes the encoding for gfx1010 :) | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
5 | Ideally, this test should be changed into one that uses the update_llc_test_checks script; with that, having the inputs packed like this will highlight some inefficiencies in the codegen that we should fix. |
Address comments
- Use scalar_to_vector instead of two vector inserts
- Rename packing test
- Convert tests to use update script
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
---|---|---|
3 | I'm not sure what this file is doing that the current tests don't do? ; GCN-LABEL: {{^}}sample_2d: %v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0) ret <4 x float> %v } | |
5–16 | Do you need all this checking or can you just check the image_sample instruction? If so, it would be cleaner and easier to read and easier to tell what you are checking with just the GFX9: image_sample... | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll | ||
5 | Nicolai, can you be more specific, what inefficiencies? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll | ||
---|---|---|
5–16 | Testing the image_sample instruction is not enough. Is the intent of this test clear? I can try to explain it more detailed. With my updated patch, the other tests check more details, so we could also remove this small test. Any opinions on this? |
I changed the way the two f16 values get packed.
This way seems to be a lot better for optimization and results in less instructions sometimes.
Couldn't this just be a SCALAR_TO_VECTOR to get the 0th element and create the vector? I think Nicolai mentioned this before?