Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of more instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It is unclear to me how to force a commute for testing purposes, but if anyone has an idea I can put that in. Vulkan short testlist passes fine with this patch. I will also test a compute psdb.
You could add a MIR test with two identical operations, other than commuting the operations and machine-cse should combine them
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
644 | It is commutative since there is subrev, but it needs to define Commutable_REV<> which it does not. |
remove sub as commutative and add add_lshl.
Its pretty bizarre to me that no testing cared if sub was commutative.
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
644 | Going to leave it as not commutative for now, since no VOP3 handle Commutable_REV<> at the moment |
Check SIInstrInfo::commuteOpcode(). If there is no Commutable_REV<> it just will not be commuted.
But I also now have question how does that work if these opcodes do not define it?
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
644 | Actually there is no subrev for VOP3 opcode too. |
Can you please also check that we either do not commute VOP3 with non-default op_sel or commute op_sel as well?
I suppose the same consideration applies to any modifiers too.
Confirmed that modifier operands are appropriately swapped if the
instruction is commuted, including opsel modifiers. Added at tests
for it
This seems to break tests:
http://45.33.8.238/linux/42925/step_12.txt
http://lab.llvm.org:8011/#/builders/52/builds/5816
PTAL, and please revert for now if it takes a while to fix.
Reverted since I don't fully understand the issue. It passed all tests on my system, but then upon rebasing top of tree it didn't. So I suspect the commutes may not be deterministic/ well constrained.
I have a bad suspicion here: that is not OK to just commute source modifiers for opsel. A DST_OP_SEL shares bits with $src0_modifiers, so it needs to be transferred to the new src0 modifiers and masked in src1. I suspect it does not happen.
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
371 | You have to be super careful with fp min/max/med because the NaN handling is not commutative. You could commute them with suitable "nnan" or other IEEE-related flags, but it's probably not worth it. So I would suggest dropping them. | |
632 | Likewise, drop f16 min/max/med. |
There seems to be yet another problem with commute, not related to this patch directly though. DPP can never be commuted since DPP operand is always src0. I do not see this handling anywhere.
Ok I see the issue. In general, how can we tell apart src0 with opsel=1 and dst with opsel=1?
To avoid issues with commute, I would say don't commute vop3 with 16bit operands. However, V_MAD_I16 and V_MAD_I16_gfx9 do already have isCommute =1. Is this perhaps a bug?
I think the fix is not too difficult. If the opcode is opsel it has op_sel operand. Then on commute we need to transfer DST_OP_SEL bit from src0_modifiers to the new src0_modifiers and reset it in the new src1_modifiers. The rest should be OK.
The src0_modifiers has DST_OP_SEL for vdst bit and OP_SEL_0/OP_SEL_1 for the src0 itself.
You can always verify what you are getting if use llc -start-before instead of llc -run-pass. That will print the asm so you could see resulting op_sel easier. I have to admit I wish to have a more readable mir for that too.
To avoid issues with commute, I would say don't commute vop3 with 16bit operands. However, V_MAD_I16 and V_MAD_I16_gfx9 do already have isCommute =1. Is this perhaps a bug?
These are commutable. The bug is that we don't properly handle it.
You have to be super careful with fp min/max/med because the NaN handling is not commutative. You could commute them with suitable "nnan" or other IEEE-related flags, but it's probably not worth it. So I would suggest dropping them.