llvm/test/CodeGen/AArch64/sve-fma.ll | ||
---|---|---|
7–12 | Looking at https://developer.arm.com/documentation/ddi0602/2023-06/SVE-Instructions/FMLA--indexed---Floating-point-fused-multiply-add-by-indexed-elements--Zda---Zda---Zn---Zm-indexed--- I think you've misunderstood how the indexed instructions operate. The indexed FMLA instruction does not multiply all elements of Zn by Zm[0]; rather, it multiplies the elements within each 128-bit chunk of Zn by the element of Zm at the given index within that same 128-bit chunk. Taking a 256-bit SVE implementation, an element type of f32 and an index of 1, the operation is: Za[0] += Zn[0]*Zm[1]; Za[1] += Zn[1]*Zm[1]; Za[2] += Zn[2]*Zm[1]; Za[3] += Zn[3]*Zm[1]; Za[4] += Zn[4]*Zm[5]; Za[5] += Zn[5]*Zm[5]; Za[6] += Zn[6]*Zm[5]; Za[7] += Zn[7]*Zm[5]. This means that for these tests to be functionally the same after the transformation, an explicit splat is required, and thus there'd be little point in using the indexed instruction. |
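To make the difference concrete, here is a minimal Python sketch of the per-128-bit-chunk behaviour described above, compared against a whole-vector splat of one lane. The function names `fmla_indexed` and `fmla_splat` are hypothetical and exist only for this illustration; the model works on plain Python lists and does not reproduce the single rounding of a real fused multiply-add.

```python
def fmla_indexed(za, zn, zm, index, elt_bits=32):
    """Illustrative model of SVE FMLA (indexed).

    Within each 128-bit chunk, every element of zn is multiplied by the
    element of zm at position `index` inside that same chunk.
    Note: a + n * m rounds twice here, unlike a true fused multiply-add.
    """
    elts_per_chunk = 128 // elt_bits
    assert len(za) == len(zn) == len(zm)
    assert len(za) % elts_per_chunk == 0
    assert 0 <= index < elts_per_chunk
    result = []
    for i in range(len(za)):
        chunk_base = (i // elts_per_chunk) * elts_per_chunk
        result.append(za[i] + zn[i] * zm[chunk_base + index])
    return result


def fmla_splat(za, zn, zm, lane):
    """Model of what the tests compute: every element uses zm[lane]."""
    return [a + n * zm[lane] for a, n in zip(za, zn)]


# 256-bit vector of f32: 8 elements, two 128-bit chunks.
za = [0.0] * 8
zn = [1.0] * 8
zm = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]

indexed = fmla_indexed(za, zn, zm, index=1)
splat = fmla_splat(za, zn, zm, lane=1)
print(indexed)  # lower chunk uses zm[1], upper chunk uses zm[5]
print(splat)    # every element uses zm[1]
```

With these inputs the two results agree only in the lower 128-bit chunk, so the indexed form matches the splat semantics only when the vector length is exactly 128 bits or when Zm already holds the same value at that index in every chunk.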