Page MenuHomePhabricator

[AArch64] CodeGen for Armv8.8/9.3 MOPS

Authored by tyb0807 on Jan 15 2022, 12:56 PM.



This implements codegen for Armv8.8/9.3 Memory Operations extension
(MOPS). Any memcpy/memset/memmov intrinsics will be always be emitted
as a series of three instructions which perform the operation.

In addition, this introduces a new ACLE intrinsic for memset tagged (see

void *__builtin_arm_mops_memset_tag(void *, int, size_t)

A corresponding LLVM intrinsic is introduced:

i8* llvm.aarch64.mops.memset.tag(i8*, i8, i64)

The types match llvm.memset but the return type is not void.

  1. SelectionDAG:
    • New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS.
    • A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS().
  1. GlobalIsel:
    • AArch64LegalizerInfo will now consider the following generic opcodes if +mops is available, instead of legalising by expanding them to libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET The s8 value of memset is legalised to s64 to match the pseudos.
    • AArch64O0PreLegalizerCombinerInfo will not combine any of the generic opcodes. This means that small or zero sized memory operations will not be optimised out if +mops is present.
    • AArch64InstructionSelector will select the above as new pseudo instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are each expanded in the usual place to a series of three instructions (e.g. SETP/SETM/SETE) which must be emitted together. To avoid the scheduler moving unrelated instructions between parts of MOPS sequences, the sequences are placed inside Machine Instruction Bundles, making sure that late scheduler passes handle it as a single unit.
    • Furthermore, this adds the following additions to the LegalizerInfo API:

      + Add a 3-type version of customForCartesianProduct.

      + Expose immIdx() so that immediates can be marked checked. This is required for G_MEMCPY etc which have an immediate operand $tailcall, and debug builds of LLVM check that the immediates have all been handled by the legalizer.

Patch by Tomas Matheson and Lucas Prates.

Diff Detail

Unit TestsFailed

2,520 msx64 debian > SanitizerCommon-asan-x86_64-Linux.Linux::decorate_proc_maps.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=address -m64 -funwind-tables -ldl -g /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/decorate_proc_maps.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/sanitizer_common/asan-x86_64-Linux/Linux/Output/decorate_proc_maps.cpp.tmp
1,070 msx64 debian > SanitizerCommon-tsan-x86_64-Linux.Linux::decorate_proc_maps.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=thread -m64 -funwind-tables -ldl -g /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/decorate_proc_maps.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/sanitizer_common/tsan-x86_64-Linux/Linux/Output/decorate_proc_maps.cpp.tmp

Event Timeline

tyb0807 created this revision.Jan 15 2022, 12:56 PM
tyb0807 requested review of this revision.Jan 15 2022, 12:56 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 15 2022, 12:56 PM

This looks like a large patch. It may be better split up into a base/isel part, a clang part and a global isel part (which may require different reviewers).


It is probably better to expand the instruction later than to create a bundle.


Are you sure we should _always_ be using them? I would expect a ldr+str to be better than a mov #4; cpyfp; cpyfm; cpyfe, for example.


Can this safely set memVT to the size of a single element? That would need to be multiplied by the size of the memory being set, and if this is being used for aliasing info will be too small.


These need to be marked as 96bit pseudos. They should also clobber NZCV (as should the real instructions above, but that doesn't look like it was ever changed after D116157)


LLVM doesn't need brackets around a single statement.


llvm_unreachable is far more common than llvm_unreachable_internal. It can be moved into the default case too, then it doesn't need a return after it.


There is commented out code here.


You likely don't need the -O2 and -O0 for llc run commands.

Following Dave's comments above, this patch is now split to 4 different patches
and may now be abandoned.

tyb0807 abandoned this revision.Jan 20 2022, 2:48 AM
tyb0807 marked 8 inline comments as done.Jan 20 2022, 3:10 AM
tyb0807 added inline comments.

Please see, the instruction is now expanded as late as possible in the pipeline (i.e. during code emission)


Please see, we are now conservative on when to use MOPS instructions instead of ldr+str


Please see, the size is set to unknown.


Changes to make real instructions NZCV clobbering here

Changes to make pseudos NZCV clobbering (and specifying their size) here


Agreed, we can do this without. However, this will change the generated code (thus the strings that these tests are looking for), hence I'm leaving those options. Please lmk if this is _really_ unwanted and I'll try to fix these tests