This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add patterns to support sve indexed FMLA/FMLS
Needs Review · Public

Authored by lizhijin on Jul 5 2023, 6:39 PM.

Event Timeline

lizhijin created this revision. Jul 5 2023, 6:39 PM
Herald added a project: Restricted Project. Jul 5 2023, 6:39 PM
lizhijin requested review of this revision. Jul 5 2023, 6:39 PM
lizhijin retitled this revision from "[AArch64] Add patterns to support indexed FMLA/FMLS" to "[AArch64][SVE] Add patterns to support sve indexed FMLA/FMLS". Jul 6 2023, 11:19 PM
paulwalker-arm added inline comments. Jul 7 2023, 5:17 AM
llvm/test/CodeGen/AArch64/sve-fma.ll (lines 7–12)

Looking at https://developer.arm.com/documentation/ddi0602/2023-06/SVE-Instructions/FMLA--indexed---Floating-point-fused-multiply-add-by-indexed-elements--Zda---Zda---Zn---Zm-indexed--- I think you've misunderstood how the indexed instructions operate.

The indexed FMLA instruction does not multiply all elements of Zn by Zm[0]; rather, it multiplies the elements within each 128-bit chunk of Zn by the Zm element that the index selects within that same 128-bit chunk. Taking a 256-bit SVE implementation, an element type of f32 and an index of 1, the operation is:

Za[0] += Zn[0]*Zm[1];
Za[1] += Zn[1]*Zm[1];
Za[2] += Zn[2]*Zm[1];
Za[3] += Zn[3]*Zm[1];
Za[4] += Zn[4]*Zm[5];
Za[5] += Zn[5]*Zm[5];
Za[6] += Zn[6]*Zm[5];
Za[7] += Zn[7]*Zm[5];

This means that, for these tests to be functionally the same after the transformation, an explicit splat is required, and thus there'd be little point in using the indexed instruction.
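
The per-chunk behaviour described above can be sketched as a small Python model. This is an illustration only, not LLVM or Arm code: the function names `fmla_indexed` and `fmla_splat_lane` are hypothetical, vectors are modelled as plain lists of floats, and the multiply-add is not fused, so rounding is not modelled. It contrasts the indexed form with what a splat of a single Zm lane across the whole vector would compute.

```python
def fmla_indexed(za, zn, zm, index, elt_bits=32):
    """Hypothetical model of indexed FMLA: each 128-bit segment of zn is
    multiplied by the zm element at `index` within that same segment."""
    elts_per_segment = 128 // elt_bits
    if not 0 <= index < elts_per_segment:
        raise ValueError("index must select an element inside a 128-bit segment")
    if not (len(za) == len(zn) == len(zm)) or len(za) % elts_per_segment:
        raise ValueError("vectors must be equal length and a multiple of 128 bits")
    result = []
    for i in range(len(za)):
        # First element of the 128-bit segment that element i belongs to.
        segment_base = (i // elts_per_segment) * elts_per_segment
        result.append(za[i] + zn[i] * zm[segment_base + index])
    return result


def fmla_splat_lane(za, zn, zm, lane):
    """Hypothetical model of the other reading: every element of zn is
    multiplied by the single element zm[lane] (a whole-vector splat)."""
    return [a + n * zm[lane] for a, n in zip(za, zn)]


# 256-bit vector of f32: 8 elements, two 128-bit segments.
za = [0.0] * 8
zn = [1.0] * 8
zm = [float(10 + i) for i in range(8)]

indexed = fmla_indexed(za, zn, zm, index=1)  # uses zm[1] and zm[5]
splat = fmla_splat_lane(za, zn, zm, lane=1)  # uses zm[1] everywhere
print(indexed)
print(splat)
```

In this model the two functions agree only when the vector is exactly 128 bits long (a single segment), or when Zm already holds the same value at the selected index in every segment, which is what an explicit splat would provide.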

Matt added a subscriber: Matt. Jul 10 2023, 8:40 AM