Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32
Details
- Reviewers
foad arsenm - Commits
- rG46adccc5cc10: [AMDGPU] Improve Codegen for build_vector
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2284–2286 | This isn't a simple bitpacking, this has FP output effects like flushing |
It appears that this instruction is harder to select than expected on integers.
One suggested way in offline discussions was to use isCanonicalized on the inputs.
With these modifications the instruction will be matched if and only if the inputs have been flushed/quieted already (for example by some floating point operations before).
Let me know it that seems reasonable or if I'm missing something.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
2702–2704 ↗ | (On Diff #332745) | We have some stripBitcast and other helpers to simplify this |
Please fix the test first, and then we can iterate on the code changes required to select v_pack_b32_f16.
As a general comment I was hoping that you could still implement this using patterns in SIInstructions.td, but using a PatFrag like HasOneUseUnaryOp to call into some C++ code just to check the isCanonicalized condition. The advantage of this is that it should work for GlobalISel as well as SelectionDAG (as long as you set GISelPredicateCode in the PatFrag).
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
2690–2691 ↗ | (On Diff #332745) | "bitcast" not "bitconvert". |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
9624 | If we're going to define isCanonicalized on integer types (which seems reasonable to me) then how about changing this to just: case ISD::BITCAST: return isCanonicalized(DAG, Src, MaxDepth - 1); ... and moving the "hack round the mess" code into case ISD::TRUNCATE? | |
llvm/test/CodeGen/AMDGPU/v_pack.ll | ||
3 | Maybe add a second RUN line with "llc -global-isel ..."? | |
31–36 | I think you are testing the wrong thing here. The result of "fadd float" is canonicalized, but if you extract the low 16 bits and bitcast it to half, then you get a meaningless value that is not canonicalized. Instead you should use two "fadd half" instructions and insert the results into a <2 x half>, with no bitcasting. |
Rebased to it in Tablegen, it should be cleaner now. I removed the too complex pattern with the conversions from i16 to f16.
Instead this v_pack_b32_f16 instruction will be used only for f16 when we are sure the two inputs have already been flushed if needed.
I tried to do it for global isel, but there are several problems with the legalization of build_vector in this version. build_vector seems to be expanded into g_build_vector_trunc or other variations depending on the test. I'm not sure of what needs to be done in gisel, so I let a TODO for later.
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | ||
---|---|---|
204–205 | In MIR, instructions have dst operands followed by src operands. You probably need to check operand 1 and operand 2 here. |
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | ||
---|---|---|
188 | This frag matches: a binary operator whose inputs are both canonicalized. I think would be cleaner to have a frag (maybe a PatLeaf?) that matches just: a node that is canonicalized. Then instead of a pattern like "is_canonicalized<build_vector> src0, src1" you would write "build_vector (is_canonicalized src0), (is_canonicalized src1)". Unfortunately my TableGen skills are not great, so I don't know exactly how to implement this. |
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | ||
---|---|---|
188 | I tried to do it as you suggested. My tableGen skills are not great so maybe there is the way but I couldn't find it. From my understanding:
|
This frag matches: a binary operator whose inputs are both canonicalized.
I think would be cleaner to have a frag (maybe a PatLeaf?) that matches just: a node that is canonicalized. Then instead of a pattern like "is_canonicalized<build_vector> src0, src1" you would write "build_vector (is_canonicalized src0), (is_canonicalized src1)". Unfortunately my TableGen skills are not great, so I don't know exactly how to implement this.