Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
|---|---|---|
| 409–410 | The add(a, select(... pattern has specific requirements for the result of inactive lanes that matches AArch64mla_m1. Did you mean to move it into AArch64mla_p? | |
| 411–414 | As above I think the sub(a, select(... pattern should remain assigned to AArch64mls_m1. | |
| 494 | Please delete redundant whitespace. | |
| llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
|---|---|---|
| 409–410 | The current semantics of add(a, select(...) ) is resulting in The patch, even after moving this pattern to AArch64mla_p, should generate the pseudo with same predicate semantics (i.e. p0/m), right? I assume that thats the meaning of FalseLanesUndef. Correct me if wrong. If this understanding is incorrect,
| |
| llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
|---|---|---|
| 409–410 |
The output is only valid because of how the operation has been register allocated. Remember the point of this patch is that MLA_ZPZZZ can emit an MLA or MAD based on what the register allocator chooses to do. You can see the bug the current implementation will introduce by running the following through llc: `
define <vscale x 16 x i8> @good(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %mask) {
%mul = mul nsw <vscale x 16 x i8> %b, %c
%sel = select <vscale x 16 x i1> %mask, <vscale x 16 x i8> %mul, <vscale x 16 x i8> zeroinitializer
%add = add <vscale x 16 x i8> %a, %sel
ret <vscale x 16 x i8> %add
}
define <vscale x 16 x i8> @bad(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %mask) {
%mul = mul nsw <vscale x 16 x i8> %a, %b
%sel = select <vscale x 16 x i1> %mask, <vscale x 16 x i8> %mul, <vscale x 16 x i8> zeroinitializer
%add = add <vscale x 16 x i8> %c, %sel
ret <vscale x 16 x i8> %add
}
good:
mla z0.b, p0/m, z1.b, z2.b
ret
bad:
mad z0.b, p0/m, z1.b, z2.b
retBoth functions expect the inactive lanes to return the original value of the addend. However the second example will incorrectly set the result of inactive lanes to the original value of the multiplicand. Essential both sets of IR are synonymous to the AArch64mla_m1 operation.
FalseLanesUndef means the inactive lanes have no defined value but as explained above the add(a, select(...) ) pattern sets the inactive lanes to a defined value, namely added + 0.
Based on the above I think option 3 is the only correct answer. | |
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer element1 = UInt(Elem[operand1, e, esize]);
integer element2 = UInt(Elem[operand2, e, esize]);
integer product = element1 * element2;
if sub_op then
Elem[result, e, esize] = Elem[operand3, e, esize] - product;
else
Elem[result, e, esize] = Elem[operand3, e, esize] + product;
else
Elem[result, e, esize] = Elem[**operand1**, e, esize];Ah ! Didnt notice the semantics highlighted above. I was thinking that to be operand3. Thanks for pointing out. Will make appropriate changes and upload the patch
The add(a, select(... pattern has specific requirements for the result of inactive lanes that matches AArch64mla_m1. Did you mean to move it into AArch64mla_p?