This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Mark additional VOP3 as commutable
ClosedPublic

Authored by Joe_Nash on Mar 25 2021, 1:33 PM.

Details

Summary

Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of more instructions.

Diff Detail

Event Timeline

Joe_Nash created this revision.Mar 25 2021, 1:33 PM
Joe_Nash requested review of this revision.Mar 25 2021, 1:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2021, 1:33 PM

It is unclear to me how to force a commute for testing purposes, but if anyone has an idea I can put that in. Vulkan short testlist passes fine with this patch. I will also test a compute psdb.

This revision is now accepted and ready to land.Mar 25 2021, 1:44 PM

It is unclear to me how to force a commute for testing purposes, but if anyone has an idea I can put that in. Vulkan short testlist passes fine with this patch. I will also test a compute psdb.

You could add a MIR test with two identical operations, other than commuting the operations and machine-cse should combine them

foad requested changes to this revision.Mar 26 2021, 2:02 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/VOP3Instructions.td
644

Subtraction is not commutative!

648–649

add_lshl is (s0 + s1) << s2 which is commutative.

This revision now requires changes to proceed.Mar 26 2021, 2:02 AM
rampitec requested changes to this revision.Mar 26 2021, 2:28 PM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/VOP3Instructions.td
644

It is commutative since there is subrev, but it needs to define Commutable_REV<> which it does not.

This revision now requires changes to proceed.Mar 26 2021, 2:28 PM
Joe_Nash updated this revision to Diff 333635.Mar 26 2021, 2:39 PM

remove sub as commutative and add add_lshl.
Its pretty bizarre to me that no testing cared if sub was commutative.

Joe_Nash marked 3 inline comments as done.Mar 26 2021, 2:40 PM
Joe_Nash added inline comments.
llvm/lib/Target/AMDGPU/VOP3Instructions.td
644

Going to leave it as not commutative for now, since no VOP3 handle Commutable_REV<> at the moment

remove sub as commutative and add add_lshl.
Its pretty bizarre to me that no testing cared if sub was commutative.

Check SIInstrInfo::commuteOpcode(). If there is no Commutable_REV<> it just will not be commuted.
But I also now have question how does that work if these opcodes do not define it?

llvm/lib/Target/AMDGPU/VOP3Instructions.td
644

Actually there is no subrev for VOP3 opcode too.

rampitec accepted this revision.Mar 26 2021, 2:48 PM

remove sub as commutative and add add_lshl.
Its pretty bizarre to me that no testing cared if sub was commutative.

Check SIInstrInfo::commuteOpcode(). If there is no Commutable_REV<> it just will not be commuted.
But I also now have question how does that work if these opcodes do not define it?

Ah, I see. If there is no Commutable_Rev it will just return itself.

Can you please also check that we either do not commute VOP3 with non-default op_sel or commute op_sel as well?

I suppose the same consideration applies to any modifiers too.

Joe_Nash updated this revision to Diff 333888.Mar 29 2021, 8:50 AM
Joe_Nash marked an inline comment as done.

update test using add_lshl

Joe_Nash updated this revision to Diff 333927.Mar 29 2021, 10:44 AM

Confirmed that modifier operands are appropriately swapped if the
instruction is commuted, including opsel modifiers. Added at tests
for it

foad accepted this revision.Mar 29 2021, 11:20 AM
This revision is now accepted and ready to land.Mar 29 2021, 11:20 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 29 2021, 11:51 AM

This seems to break tests:
http://45.33.8.238/linux/42925/step_12.txt
http://lab.llvm.org:8011/#/builders/52/builds/5816

PTAL, and please revert for now if it takes a while to fix.

Reverted since I don't fully understand the issue. It passed all tests on my system, but then upon rebasing top of tree it didn't. So I suspect the commutes may not be deterministic/ well constrained.

rampitec added a comment.EditedMar 30 2021, 12:41 AM

Reverted since I don't fully understand the issue. It passed all tests on my system, but then upon rebasing top of tree it didn't. So I suspect the commutes may not be deterministic/ well constrained.

I have a bad suspicion here: that is not OK to just commute source modifiers for opsel. A DST_OP_SEL shares bits with $src0_modifiers, so it needs to be transferred to the new src0 modifiers and masked in src1. I suspect it does not happen.

foad added inline comments.Mar 30 2021, 2:08 AM
llvm/lib/Target/AMDGPU/VOP3Instructions.td
371

You have to be super careful with fp min/max/med because the NaN handling is not commutative. You could commute them with suitable "nnan" or other IEEE-related flags, but it's probably not worth it. So I would suggest dropping them.

632

Likewise, drop f16 min/max/med.

There seems to be yet another problem with commute, not related to this patch directly though. DPP can never be commuted since DPP operand is always src0. I do not see this handling anywhere.

Reverted since I don't fully understand the issue. It passed all tests on my system, but then upon rebasing top of tree it didn't. So I suspect the commutes may not be deterministic/ well constrained.

I have a bad suspicion here: that is not OK to just commute source modifiers for opsel. A DST_OP_SEL shares bits with $src0_modifiers, so it needs to be transferred to the new src0 modifiers and masked in src1. I suspect it does not happen.

Ok I see the issue. In general, how can we tell apart src0 with opsel=1 and dst with opsel=1?
To avoid issues with commute, I would say don't commute vop3 with 16bit operands. However, V_MAD_I16 and V_MAD_I16_gfx9 do already have isCommute =1. Is this perhaps a bug?

Reverted since I don't fully understand the issue. It passed all tests on my system, but then upon rebasing top of tree it didn't. So I suspect the commutes may not be deterministic/ well constrained.

I have a bad suspicion here: that is not OK to just commute source modifiers for opsel. A DST_OP_SEL shares bits with $src0_modifiers, so it needs to be transferred to the new src0 modifiers and masked in src1. I suspect it does not happen.

Ok I see the issue. In general, how can we tell apart src0 with opsel=1 and dst with opsel=1?

I think the fix is not too difficult. If the opcode is opsel it has op_sel operand. Then on commute we need to transfer DST_OP_SEL bit from src0_modifiers to the new src0_modifiers and reset it in the new src1_modifiers. The rest should be OK.
The src0_modifiers has DST_OP_SEL for vdst bit and OP_SEL_0/OP_SEL_1 for the src0 itself.
You can always verify what you are getting if use llc -start-before instead of llc -run-pass. That will print the asm so you could see resulting op_sel easier. I have to admit I wish to have a more readable mir for that too.

To avoid issues with commute, I would say don't commute vop3 with 16bit operands. However, V_MAD_I16 and V_MAD_I16_gfx9 do already have isCommute =1. Is this perhaps a bug?

These are commutable. The bug is that we don't properly handle it.