This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Prefer to fold dup into fmul/fma as opposed to ld1r
Closed · Public

Authored by dmgreen on Mar 2 2023, 1:46 PM.

Details

Summary

There is an existing fold that creates LD1DUPpost from a dup(load) whose address can be post-incremented. If the dup is used by a "by element" operation such as fmul or fma, it can be slightly better to fold the dup into the fmul instead, which produces slightly faster code.

ld1r { v1.4s }, [x0], #4
fmul v0.4s, v1.4s, v0.4s

vs

ldr s1, [x0], #4
fmul v0.4s, v0.4s, v1.s[0]
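Both sequences compute the same values and advance the pointer by the same amount; only the place where the broadcast happens differs. The following minimal Python sketch models the two sequences at the lane level to show that the fold preserves semantics. It is a hypothetical model, not LLVM code: the function names are illustrative, a register is a list of four `.s` lanes, and a "pointer" is an index into a list of floats, so `#4` bytes corresponds to one element.

```python
import struct

# Hypothetical lane-level model of the two AArch64 sequences above.
# A 128-bit register is a list of four .s lanes; memory is a list of floats,
# and a "pointer" is an index into it, so "#4" bytes == +1 element.

def f32(x: float) -> float:
    """Round a Python float to IEEE single precision (one .s lane)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ld1r_post(mem, ptr):
    """ld1r { vD.4s }, [xN], #4: load one float, replicate it into all lanes, post-increment."""
    x = f32(mem[ptr])
    return [x, x, x, x], ptr + 1

def ldr_s_post(mem, ptr):
    """ldr sD, [xN], #4: load one float into lane 0 (upper lanes zeroed), post-increment."""
    return [f32(mem[ptr]), 0.0, 0.0, 0.0], ptr + 1

def fmul_vec(a, b):
    """fmul vD.4s, vN.4s, vM.4s: lane-wise multiply.

    The double product of two singles is exact, so rounding it once to
    single precision gives the correctly rounded single-precision result.
    """
    return [f32(x * y) for x, y in zip(a, b)]

def fmul_elem(a, m, idx):
    """fmul vD.4s, vN.4s, vM.s[idx]: multiply every lane of vN by lane idx of vM."""
    return [f32(x * m[idx]) for x in a]

mem = [2.5, -1.0]
v0 = [1.0, 2.0, 3.0, 4.0]

# ld1r { v1.4s }, [x0], #4 ; fmul v0.4s, v1.4s, v0.4s
v1, p1 = ld1r_post(mem, 0)
r1 = fmul_vec(v1, v0)

# ldr s1, [x0], #4 ; fmul v0.4s, v0.4s, v1.s[0]
s1, p2 = ldr_s_post(mem, 0)
r2 = fmul_elem(v0, s1, 0)

print(r1)                  # [2.5, 5.0, 7.5, 10.0]
print(r1 == r2, p1 == p2)  # True True
```

The model shows only that the two forms are interchangeable. It does not capture the performance difference that motivates the patch, which depends on how a given core handles ld1r compared with a plain scalar load.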

This could also be done for integer operations such as smull/umull, so long as the load/dup gets correctly combined into the mul operation. Currently this only operates on floating-point types.
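This patch does not implement the integer case. As an illustration only, here is a hypothetical lane-level Python sketch of the equivalence that case would rely on: a duplicated, loaded halfword feeding a vector smull gives the same result as a scalar load feeding smull by element. The function names are illustrative. Register lanes hold raw 16-bit patterns that are sign-extended before multiplying. The product of two signed 16-bit values always fits in a signed 32-bit lane, so the model does not need to handle wraparound. umull would be the same with zero extension.

```python
# Hypothetical model of the integer analogue (not implemented by this patch):
# dup of a loaded halfword feeding smull, versus smull by element.

def sext16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def smull_vec(a, b):
    """smull vD.4s, vN.4h, vM.4h: signed 16x16 -> 32-bit multiply, lane by lane."""
    return [sext16(x) * sext16(y) for x, y in zip(a, b)]

def smull_elem(a, m, idx):
    """smull vD.4s, vN.4h, vM.h[idx]: signed widening multiply of every lane by lane idx."""
    return [sext16(x) * sext16(m[idx]) for x in a]

loaded = 0xFFFE                      # the halfword -2, as raw bits from memory
v1 = [3, 0xFFFC, 0x7FFF, 0x8000]     # lanes 3, -4, 32767, -32768

# ld1r { v2.4h }, [x0], #2 ; smull v0.4s, v1.4h, v2.4h
r_dup = smull_vec(v1, [loaded] * 4)

# ldr h2, [x0], #2 ; smull v0.4s, v1.4h, v2.h[0]
r_elem = smull_elem(v1, [loaded, 0, 0, 0], 0)

print(r_dup)            # [-6, 8, -65534, 65536]
print(r_dup == r_elem)  # True
```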

Event Timeline

dmgreen created this revision. Mar 2 2023, 1:46 PM
Herald added a project: Restricted Project. Mar 2 2023, 1:46 PM
dmgreen requested review of this revision. Mar 2 2023, 1:46 PM
Herald added a project: Restricted Project. Mar 2 2023, 1:46 PM
samtebbs accepted this revision. Mar 3 2023, 6:05 AM
This revision is now accepted and ready to land. Mar 3 2023, 6:05 AM
This revision was landed with ongoing or failed builds. Mar 7 2023, 1:24 PM
This revision was automatically updated to reflect the committed changes.