This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use first op of FADDPv* instead of implicit def.
ClosedPublic

Authored by fhahn on Mar 1 2022, 2:48 AM.

Details

Summary

This patch updates the FADDPv* patterns that only use the lower half of
the result register. For those patterns, the second operand does not
matter because its results won't be used.

Instead of introducing new implicit defs for those operands, just use
the first operand. The problem with using new implicit defs is that
register allocation can introduce unnecessary dependencies by using a
different register than the first operand.

For motivating cases, see the changes in the fadd_reduction_*_in_loop
cases. Without this change, the first faddp in the loop has an
unnecessary additional dependency through v0, which is also used for
a cross-iteration reduction.

This can noticeable impact performance. For slightly bigger loops,
this change can improve performance by 15%.

Diff Detail

Event Timeline

fhahn created this revision.Mar 1 2022, 2:48 AM
fhahn requested review of this revision.Mar 1 2022, 2:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 2:48 AM
t.p.northover accepted this revision.Mar 1 2022, 2:56 AM

Looks like a sensible change to me, but could you commit it with a small comment explaining? I think I remember puzzling over these patterns a little when someone asked about them, and with this change the actual dataflow we care about is even less apparent.

This revision is now accepted and ready to land.Mar 1 2022, 2:56 AM
fhahn updated this revision to Diff 412031.Mar 1 2022, 3:18 AM

Looks like a sensible change to me, but could you commit it with a small comment explaining? I think I remember puzzling over these patterns a little when someone asked about them, and with this change the actual dataflow we care about is even less apparent.

Thanks Tim!

I added a comment in the latest update. It would be great if you could take a quick look to double-check if it is helpful.

sdesmalen accepted this revision.Mar 3 2022, 12:52 AM

I've seen this in other places as well where introducing explicit implicit defs leads to less efficient register allocation, so I can see this is an improvement.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 12:52 AM
This revision was landed with ongoing or failed builds.Mar 3 2022, 5:32 AM
This revision was automatically updated to reflect the committed changes.