This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable fneg and fabs divergence-driven instruction selection.
ClosedPublic

Authored by alex-t on Nov 19 2021, 9:31 AM.

Details

Summary

We currently have a set of patterns that select ISD::FNEG and ISD::FABS to bitwise operations. These patterns need to be predicated so that the VALU or SALU variant of the bitwise operation is selected according to the SDNode divergence bit.
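As a rough illustration of the idea in the summary, the sketch below models fneg and fabs as bitwise operations on the IEEE-754 binary32 encoding and picks a scalar or vector opcode from a divergence flag. It is a toy model in Python, not the TableGen patterns from this revision: `pick_opcode` and its tables are hypothetical, and although the opcode names are real AMDGPU instructions, the exact mapping used by the patch is an assumption here.

```python
import struct

SIGN_MASK_F32 = 0x80000000  # sign bit of an IEEE-754 binary32 value
ABS_MASK_F32 = 0x7FFFFFFF   # every bit except the sign bit


def f32_bits(x: float) -> int:
    """Reinterpret a float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def fneg_bits(b: int) -> int:
    """fneg flips the sign bit: xor with the sign mask."""
    return b ^ SIGN_MASK_F32


def fabs_bits(b: int) -> int:
    """fabs clears the sign bit: and with the inverted sign mask."""
    return b & ABS_MASK_F32


def pick_opcode(op: str, divergent: bool) -> str:
    """Hypothetical helper: choose the SALU or VALU bitwise opcode.

    A uniform (non-divergent) node can stay on the scalar unit;
    a divergent node needs the vector variant.
    """
    scalar = {"fneg": "S_XOR_B32", "fabs": "S_AND_B32"}
    vector = {"fneg": "V_XOR_B32_e64", "fabs": "V_AND_B32_e64"}
    return (vector if divergent else scalar)[op]
```

For example, `bits_f32(fneg_bits(f32_bits(1.5)))` evaluates to `-1.5`, and `pick_opcode("fneg", divergent=False)` returns the scalar opcode name.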

Event Timeline

alex-t created this revision.Nov 19 2021, 9:31 AM
alex-t requested review of this revision.Nov 19 2021, 9:31 AM
Herald added a project: Restricted Project.Nov 19 2021, 9:31 AM
Herald added a subscriber: wdng.
alex-t retitled this revision from [AMDGPU] Enable fneg and fabs divergence-deriven instruction selection. to [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..Nov 19 2021, 9:35 AM
alex-t edited the summary of this revision.
rampitec added inline comments.Nov 19 2021, 10:41 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
1509

I think all of that should use VOP3 forms. That way you will avoid the copy from SGPR to VGPR that appears in your test.

Please also reformat the summary description.

alex-t added inline comments.Nov 22 2021, 7:22 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
1509

The VOP2 form allows an SGPR as the first operand, and the patterns are written to take advantage of this.
The COPY from SGPR to VGPR that you mentioned results from type legalization and subsequent combining. The test does not specify a subtarget, so for subtargets that have no fp16,

t32: f16,ch = load ...
    t33: f16 = fneg t32

is legalized to

      t42: i32 = and t40, Constant:i32<65535>
    t38: f32 = fp16_to_fp t42
  t39: f32 = fneg t38
t45: i32 = fp_to_fp16 t39

The latter, in turn, gets combined to

t47: i32,ch = load ...
    t49: i32 = xor t47, Constant:i32<32768>

The problem here is that whatever operand order we use in the combiner, SelectionDAG::getNode canonicalizes it so that the constant ends up on the RHS. So we always get

xor t47, Constant:i32<32768>
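The canonicalization described here can be pictured with a toy node factory. This is only a sketch of the behaviour, written in Python: `get_node`, `Node`, `Constant`, and the `COMMUTATIVE` set are hypothetical stand-ins and do not reflect the real SelectionDAG::getNode implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Constant:
    value: int


@dataclass(frozen=True)
class Node:
    opcode: str
    operands: Tuple[object, object]


# Toy list of commutative opcodes; the real set is an assumption here.
COMMUTATIVE = {"xor", "and", "or", "add", "mul"}


def get_node(opcode: str, lhs: object, rhs: object) -> Node:
    """Build a binary node, moving a lone constant operand to the RHS."""
    if (
        opcode in COMMUTATIVE
        and isinstance(lhs, Constant)
        and not isinstance(rhs, Constant)
    ):
        lhs, rhs = rhs, lhs
    return Node(opcode, (lhs, rhs))
```

In this model, `get_node("xor", Constant(32768), "t47")` and `get_node("xor", "t47", Constant(32768))` produce the same node, so the operand order chosen by the caller is lost.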

For fp16-capable subtargets the explicit pattern is used and there is no SGPR-to-VGPR COPY. For now, I am going to update the test to check fp16 with the gfx900 subtarget.
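The combine mentioned in this comment (the fp16_to_fp, fneg, fp_to_fp16 chain collapsing into an xor with 32768) can be checked numerically. The following Python sketch emulates both forms on the low 16 bits of the loaded value; the function names are hypothetical, and NaN inputs are left out because their payload handling through a host float round trip is not something this sketch tries to model.

```python
import struct


def f16_bits_to_float(bits: int) -> float:
    """Decode an IEEE-754 binary16 bit pattern (models fp16_to_fp)."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def float_to_f16_bits(x: float) -> int:
    """Encode a float as a binary16 bit pattern (models fp_to_fp16)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def legalized_fneg(loaded: int) -> int:
    """Follow the legalized DAG: mask, extend, negate, truncate."""
    t42 = loaded & 65535          # and t40, 65535
    t38 = f16_bits_to_float(t42)  # fp16_to_fp
    t39 = -t38                    # fneg
    return float_to_f16_bits(t39) # fp_to_fp16


def combined_fneg(loaded: int) -> int:
    """Follow the combined DAG: flip bit 15, the f16 sign bit."""
    return loaded ^ 32768
```

For a non-NaN input such as `0x3C00` (1.0 in binary16), both functions agree on the low 16 bits of the result.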

alex-t updated this revision to Diff 388920.Nov 22 2021, 7:49 AM

Test changed to enable fp16 patterns for an fp16-capable subtarget.

rampitec added inline comments.Nov 22 2021, 11:09 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
1509

I see. We prefer to use the VOP3 form anyway to allow more potential operand variants. It will be shrunk later if possible. But thanks for updating the test.

alex-t updated this revision to Diff 389006.Nov 22 2021, 12:50 PM

VOP2 forms changed to VOP3. Tests updated.

alex-t edited the summary of this revision.Nov 22 2021, 12:50 PM

GlobalISel tests were updated to make them really auto-generatable.
update_mir_test_checks.py doesn't work if the prefixes in different RUN lines are the same.

This revision is now accepted and ready to land.Nov 22 2021, 12:57 PM