This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] CodeGen for Armv8.8/9.3 MOPS
AbandonedPublic

Authored by tyb0807 on Jan 15 2022, 12:56 PM.

Details

Summary

This implements codegen for Armv8.8/9.3 Memory Operations extension
(MOPS). Any memcpy/memset/memmov intrinsics will be always be emitted
as a series of three instructions which perform the operation.

In addition, this introduces a new ACLE intrinsic for memset tagged (see
https://github.com/ARM-software/acle/pull/38).

void *__builtin_arm_mops_memset_tag(void *, int, size_t)

A corresponding LLVM intrinsic is introduced:

i8* llvm.aarch64.mops.memset.tag(i8*, i8, i64)

The types match llvm.memset but the return type is not void.

  1. SelectionDAG:
    • New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS.
    • A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS().
  1. GlobalIsel:
    • AArch64LegalizerInfo will now consider the following generic opcodes if +mops is available, instead of legalising by expanding them to libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET The s8 value of memset is legalised to s64 to match the pseudos.
    • AArch64O0PreLegalizerCombinerInfo will not combine any of the generic opcodes. This means that small or zero sized memory operations will not be optimised out if +mops is present.
    • AArch64InstructionSelector will select the above as new pseudo instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are each expanded in the usual place to a series of three instructions (e.g. SETP/SETM/SETE) which must be emitted together. To avoid the scheduler moving unrelated instructions between parts of MOPS sequences, the sequences are placed inside Machine Instruction Bundles, making sure that late scheduler passes handle it as a single unit.
    • Furthermore, this adds the following additions to the LegalizerInfo API:

      + Add a 3-type version of customForCartesianProduct.

      + Expose immIdx() so that immediates can be marked checked. This is required for G_MEMCPY etc which have an immediate operand $tailcall, and debug builds of LLVM check that the immediates have all been handled by the legalizer.

Patch by Tomas Matheson and Lucas Prates.

Diff Detail

Event Timeline

tyb0807 created this revision.Jan 15 2022, 12:56 PM
tyb0807 requested review of this revision.Jan 15 2022, 12:56 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 15 2022, 12:56 PM

This looks like a large patch. It may be better split up into a base/isel part, a clang part and a global isel part (which may require different reviewers).

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
836

It is probably better to expand the instruction later than to create a bundle.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
940

Are you sure we should _always_ be using them? I would expect a ldr+str to be better than a mov #4; cpyfp; cpyfm; cpyfe, for example.

11875

Can this safely set memVT to the size of a single element? That would need to be multiplied by the size of the memory being set, and if this is being used for aliasing info will be too small.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
8350

These need to be marked as 96bit pseudos. They should also clobber NZCV (as should the real instructions above, but that doesn't look like it was ever changed after D116157)

llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp
28

LLVM doesn't need brackets around a single statement.

48

llvm_unreachable is far more common than llvm_unreachable_internal. It can be moved into the default case too, then it doesn't need a return after it.

53–54

There is commented out code here.

llvm/test/CodeGen/AArch64/aarch64-mops-mte.ll
3

You likely don't need the -O2 and -O0 for llc run commands.

Following Dave's comments above, this patch is now split to 4 different patches
https://reviews.llvm.org/D117753
https://reviews.llvm.org/D117757
https://reviews.llvm.org/D117763
https://reviews.llvm.org/D117764
and may now be abandoned.

tyb0807 abandoned this revision.Jan 20 2022, 2:48 AM
tyb0807 marked 8 inline comments as done.Jan 20 2022, 3:10 AM
tyb0807 added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
836

Please see https://reviews.llvm.org/D117763, the instruction is now expanded as late as possible in the pipeline (i.e. during code emission)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
940

Please see https://reviews.llvm.org/D117764, we are now conservative on when to use MOPS instructions instead of ldr+str

11875

Please see https://reviews.llvm.org/D117764, the size is set to unknown.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
8350

Changes to make real instructions NZCV clobbering here https://reviews.llvm.org/D117757

Changes to make pseudos NZCV clobbering (and specifying their size) here https://reviews.llvm.org/D117764

llvm/test/CodeGen/AArch64/aarch64-mops-mte.ll
3

Agreed, we can do this without. However, this will change the generated code (thus the strings that these tests are looking for), hence I'm leaving those options. Please lmk if this is _really_ unwanted and I'll try to fix these tests