This is an archive of the discontinued LLVM Phabricator instance.

[llvm][AArch64][SVE] Fold literals into math instructions
ClosedPublic

Authored by DavidTruby on Mar 22 2021, 6:15 AM.

Details

Summary

SVE has predicated literal forms of some instructions for specific
literals, which currently are generated correctly when using ACLE
but not when those instructions are generated directly.

This adds the patterns to generate those instructions when
generating from standard LLVM IR instructions.

Co-authored-by: Sander De Smalen <sander.desmalen@arm.com>
Co-authored-by: Cullen Rhodes <cullen.rhodes@arm.com>

Diff Detail

Event Timeline

DavidTruby created this revision.Mar 22 2021, 6:15 AM
DavidTruby requested review of this revision.Mar 22 2021, 6:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2021, 6:15 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2254

Hi @DavidTruby, this is quite a lot of very similar looking patterns to maintain here. Is there a possible way to condense them somehow, i.e. the A and B patterns are duplicates, so you could invoke a multiclass for A, then a multiclass for B?

Added extra tests for fmax, fmaxnm, fmin, fminnm

Refactored patterns into helper multiclass

DavidTruby added inline comments.Mar 29 2021, 7:28 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2254

@david-arm hopefully this helps a bit, it at least means that each pattern is only mentioned once so only need changing once if they have to be changed in future

joechrisellis added inline comments.Apr 6 2021, 3:28 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
1209–1219

nit: really small thing but the naming here is inconsistent with fpimm0. I would expect fpimm1 and fpimm2. I suppose the difficulty here is shoehoring fpimm_half into this naming convention. If you can think of a sensible way of doing this, I would prefer we change this. Otherwise, ignore me. :-)

llvm/test/CodeGen/AArch64/sve-fp-immediates-merging.ll
2

General test suggestion -- prefer:

target triple = "aarch64-unknown-linux-gnu"

...

attributes #0 = { "target-features"="+sve" }

... and tag your functions with #0. I personally find this to be a bit more reliable when you're running llc on this file. 😄

Also might be worth adding:

; RUN: ... 2>%t ...
; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t
3

Please can we add:

; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
; WARN-NOT: warning
DavidTruby updated this revision to Diff 335785.Apr 7 2021, 5:47 AM

Moved the zeroing predicates behind the UseExperimentalZeroingPseudos feature
Added additional tests for one of the patterns
Addressed misc reivew comments

paulwalker-arm added inline comments.Apr 19 2021, 3:37 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
411–418

These will need to be split in two because the the UNDEF forms are safe to use by default but the ZERO forms must be protected by UseExperimentalZeroingPseudos. You can see how we've done this for FADD_ZPZZ where there exists two multiclasses, sve_fp_bin_pred_hfd and sve_fp_2op_p_zds_zeroing_hsd.

2254

My immediate thought is whether these (or at least some of these) be rolled into the instruction/pseudo multiclasses. By this I mean can sve_fp_2op_i_p_zds be extended with extra parameters so the necessary intrinsic patterns can be added.

Likewise for the multiclasses used to create the UNDEF and ZERO psuedos, can the multiclasses themselves contain the IR patterns plus the vselect variants.

llvm/test/CodeGen/AArch64/sve-fp-immediates-merging.ll
3

Just a note to say this and the above RUN: ... 2>%t request are no longer necessary because llc will now error in these circumstances.

6–7

Minor point but attributes are normally placed after the function definitions as that where the IR printer would put them.

15–16

Please use update_llc_test_checks.py to generate the CHECK lines.

Matt added a subscriber: Matt.May 4 2021, 9:46 AM
DavidTruby updated this revision to Diff 343020.May 5 2021, 6:01 AM

Separated zeroing forms and regenreated check lines

This comment was removed by DavidTruby.

Fixed accidentally overwriting this with a different patch...

DavidTruby edited the summary of this revision. (Show Details)Jun 29 2021, 5:26 AM
DavidTruby edited the summary of this revision. (Show Details)

Merge multiclasses into SVEInstrFormats.td

@paulwalker-arm sorry for the long lag on this; is this new update more like what you wanted to see?

Would be good to be more consistent with the existing classes in here, for example the classes with patterns in are mostly near the top of the file with specific naming SVE_ style.

llvm/lib/Target/AArch64/SVEInstrFormats.td
8157

This doesn't really need to be a separate mutliclass as it is only instantiated once, similar with the other classes.

DavidTruby updated this revision to Diff 377223.Oct 5 2021, 7:01 AM

Merge additional multiclasses

paulwalker-arm added inline comments.Oct 6 2021, 5:02 AM
llvm/lib/Target/AArch64/SVEInstrFormats.td
1730–1733

Assuming my other comments are valid I believe we won't need a multiclass and can then follow the same idiom as with class SVE_2_Op_Pred... (i.e. add new classes in the SVE pattern match helpers section towards the start of this file.

1734–1737

This pattern seems like a better fit for the sve_fp_2op_i_p_zds multiclass where the relevant instructions are defined. Is that possible?

1738–1741

Is this pattern necessary? I'm thinking that even for the IR ops you can just pass the predicate through and thus we can reuse the first pattern. Doing this means we'll also take advantage of the immediate forms when lowering VLS operations.

1745

There's a bit of clash with naming here. In the past _zx implied zeroing where as we now typically attach element types info for the undef case (i.e. _hfd in this instance) and then zeroing_hfd for the zeroing case.

1769

Perhaps you don't need this parameter. Typically you can reference NAME which is the string after the defm when used. So for example you have:

defm FADD_ZPZI    : sve_fp_2op_i_p_zds_zx_zeroing

so NAME will be FADD_ZPZI and thus you can then do:

defm : sve_intrinsic_compact_fp_immediates_zeroing<NAME # "_ZERO_H",

This suggestion likely applies to sve_intrinsic_compact_fp_immediates also, especially if the second pattern can be removed.

Further cleanup of patterns

Hopefully this is going broadly in the direction you wanted @paulwalker-arm I just had a couple of questions about your comments

llvm/lib/Target/AArch64/SVEInstrFormats.td
1738–1741

I'm not sure I understand fully; I've managed to reuse this pattern for both the op and ir_op but it definitely seems to still be necessary to have the ir_op version there. I might be wrong about what you meant though?

1745

Hopefully the new names are more what you're expecting? Let me know if not!

Further refactor of patterns

Structurally this is looking great. There is just a few missing patterns after which I think you're done.

llvm/lib/Target/AArch64/SVEInstrFormats.td
443

Does the inst line need the PPR_3b: and ZPR: parts? I ask because the next class you've added seems to work without them.

1734–1739

For this class, which deals the the normal IR nodes (i.e. not intrinsics), can you add entries for all the legal floating point types (i.e. add entries for nxv2f16, nxv4f16 & nxv2f32) along with suitable tests.

llvm/test/CodeGen/AArch64/sve-fp-immediates-merging.ll
23–34

Sorry to be a pain but can you move the intrinsic tests into a separate file. This is what we've done for some of the other ones, for example there's sve-intrinsics-int-arith.ll and then sve-intrinsics-int-arith-imm.ll. So can you move them into sve-intrinsics-fp-arith-imm.ll leaving this file to just contain the stock IR tests.

Add extra patterns and tests for additional legal types

DavidTruby marked 3 inline comments as done.Oct 14 2021, 6:46 AM
paulwalker-arm accepted this revision.Oct 14 2021, 7:42 AM
This revision is now accepted and ready to land.Oct 14 2021, 7:42 AM
Allen added a subscriber: Allen.Mar 17 2022, 6:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 6:19 AM