This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP
ClosedPublic

Authored by DavidTruby on Jan 27 2022, 9:41 AM.

Details

Summary

This patch modifies the FCOPYSIGN lowering to go through the BSP
pseudo-instruction. This allows the same lowering code for NEON,
SVE and SVE2.

As part of this, lowering for BSP for SVE and SVE2 is also added.

For SVE and NEON this patch is NFC.

Diff Detail

Event Timeline

DavidTruby created this revision.Jan 27 2022, 9:41 AM
DavidTruby requested review of this revision.Jan 27 2022, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 9:41 AM

Add missing SVE2 tests

Fix operand ordering mistake

DavidTruby planned changes to this revision.Jan 31 2022, 3:28 AM

There's currently a bug in the output for scalars here that I am investigating

xgupta added a subscriber: xgupta.Jan 31 2022, 5:07 AM

Fix operand ordering in all cases

Apologies to reviewers for the confusion above, I believe this patch should now be correct in all the given cases.

I also fear the intent here may be a bit vague so I should clarify: what I'm trying to do is unify the lowering of FCOPYSIGN for scalar, NEON, SVE and SVE2 cases by lowering it to the AArch64ISD::BSP pseudo-instruction, which should have correct lowering for each architecture.
There didn't as yet exist SVE or SVE2 lowering for BSP, so that is added here as well.
As such, there should be no change to the code generation for SVE or NEON, other than possibly a few instruction orders changing. For SVE2 we should use the bsl instruction there that SVE does not have.

DavidTruby updated this revision to Diff 404534.EditedJan 31 2022, 7:55 AM

Fix SVE2 lowering. BSL argument ordering is different for SVE2 and NEON, my mistake!

Matt added a subscriber: Matt.Jan 31 2022, 8:06 AM
efriedma added inline comments.Jan 31 2022, 3:45 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7325

Do we want to preserve this comment somewhere?

7384

The special case here feels kind of weird, but I guess it's existing code.

llvm/test/CodeGen/AArch64/vector-fcopysign.ll
303

It looks like the generated code here got worse; is there something wrong with the way we lower AArch64ISD::BSP?

DavidTruby updated this revision to Diff 404907.Feb 1 2022, 6:23 AM

Reorder the operands and flip bit mask.
This changes the mask to select all the bits other than the sign from the first
operand and then only the sign from the second, making this match the operand order for std::copysign which should be more beneficial.

DavidTruby added inline comments.Feb 1 2022, 6:36 AM
llvm/test/CodeGen/AArch64/vector-fcopysign.ll
303

This looks like an oddity of register allocation to me. I'm not sure why the bsl instruction doesn't just use v3, in which case the extra mov wouldn't be necessary. By the time we get to AArch64ISD::BSP lowering register allocation has already happened though. I'm not sure what the solution to that is.

DavidTruby added inline comments.Feb 1 2022, 6:38 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7384

I tried without this special case, oddly that introduces quite a few extra instructions which is why I left the existing code in here.

peterwaller-arm added inline comments.Feb 1 2022, 7:07 AM
llvm/test/CodeGen/AArch64/arm64-fcopysign.ll
27

Heads up, this looks like an unintended semantic change.

DavidTruby added inline comments.Feb 1 2022, 7:12 AM
llvm/test/CodeGen/AArch64/arm64-fcopysign.ll
27

This is caused by the odd code mentioned above that handles specially for f64, which I forgot to change when changing the mask generation. However since that change I don't think the special handling is actually necessary anymore. I'm just verifying that and will have a fix up soon.

DavidTruby updated this revision to Diff 404928.Feb 1 2022, 7:16 AM

Remove special f64 handling which was erroneous and no longer necessary after previous change.

DavidTruby updated this revision to Diff 404969.Feb 1 2022, 9:23 AM

Rebase on main

DavidTruby updated this revision to Diff 405302.Feb 2 2022, 9:09 AM

Add back special f64 handling for NEON to avoid going through integer register

peterwaller-arm accepted this revision.Feb 3 2022, 6:14 AM

LGTM. I see the one additional instruction in the NOFP16 case which is an artefact of how the BSP node is defined, which is not changed in this patch. We've discussed it and attempted to remove it, but failed for now.

This revision is now accepted and ready to land.Feb 3 2022, 6:14 AM
This revision was landed with ongoing or failed builds.Feb 7 2022, 6:35 AM
This revision was automatically updated to reflect the committed changes.