This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] FMLA/FMLS patterns improvement.
ClosedPublic

Authored by ilinpv on Apr 15 2020, 4:04 PM.

Details

Summary

FMLA/FMLS f16 indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removed redundant v2f32 vector_extract indexed pattern since
Instruction Selection is able to match v4f32 instead.

Diff Detail

Event Timeline

ilinpv created this revision.Apr 15 2020, 4:04 PM
dmgreen added inline comments.Apr 15 2020, 11:09 PM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055

Should we have equal patterns to those below for f32 as well? So using DUP, D vector (4xf16) and possibly from a vector_extract too.

ilinpv updated this revision to Diff 258337.Apr 17 2020, 8:35 AM
ilinpv marked an inline comment as done.
ilinpv edited the summary of this revision. (Show Details)

More patterns added.

ilinpv added inline comments.Apr 17 2020, 9:25 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055

I'm worried about performance impact of change fmadd/sub -> fmla/ls in last pattern case.

dmgreen added inline comments.Apr 18 2020, 6:15 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055

What performance impact are you worried about?

8077

Do you mean the v4f16 variant of this pattern?

ilinpv marked an inline comment as not done.Apr 18 2020, 7:20 AM
ilinpv added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055

I mean, can fmla/ls take more cycles that fmadd/sub, is it any performance improvement of such replacement?

8077

This pattern exactly replaces fmadd/sub to fmla/ls, so it is questionable weather or not this pattern is useful.
v4f16 vector_extract variant has no any test cases at all.

ilinpv updated this revision to Diff 258865.Apr 20 2020, 5:18 PM

Patterns corrected, vector_extract tests added.

ilinpv marked 2 inline comments as done.Apr 20 2020, 5:19 PM
ilinpv updated this revision to Diff 259008.Apr 21 2020, 8:27 AM
ilinpv edited the summary of this revision. (Show Details)

v2f32 pattern removed, test added.

dmgreen accepted this revision.Apr 21 2020, 8:45 AM

LGTM. Thanks

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8094

I was a little surprised when you said we could remove these, but it looks like the vector_extract (v2f32) is always converted to a vector_extract (v4f32 insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run into a problem, we can always add it back in).

This revision is now accepted and ready to land.Apr 21 2020, 8:45 AM
ab added a subscriber: ab.Apr 21 2020, 8:07 PM
ab added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058

Should this be V128_lo? I don't think this is encodable for Rm in V16-V31 (same in the other indexed f16 variants I think)

ilinpv marked 2 inline comments as done.Apr 22 2020, 6:17 AM
ilinpv added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058

Yep, I double checked encoding, you are right. Thank you very much for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be

Patterns corrected to comply with encoding 4eca1c06a4a9183fcf7bb230d894617caf3cf3be

ab added inline comments.Apr 22 2020, 3:03 PM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058

Thanks Pavel! I think this applies to the AArch64dup variants too, which does entail adding FPR16Op_lo and FPR16_lo I imagine, and maybe a couple more

ilinpv marked an inline comment as done.Apr 23 2020, 3:48 PM
ilinpv added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058

Oops. Thanks again, fix landed cc457672e628846c20e92c6e0a82896f0d6db031