Page MenuHomePhabricator

[AArch64][SVE] Add a pass for SVE intrinsic optimisations

Authored by kmclaughlin on Mar 12 2020, 10:12 AM.



Creates the SVEIntrinsicOpts pass. In this patch, the pass tries
to remove unnecessary reinterpret intrinsics which convert to
and from svbool_t (llvm.aarch64.sve.convert.[to|from].svbool)

For example, the reinterprets below are redundant:

%1 = call <vscale x 16 x i1><vscale x 4 x i1> %a)
%2 = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)

The pass also looks for ptest intrinsics and phi instructions where
the operands are being needlessly converted to and from svbool_t.

Diff Detail

Event Timeline

kmclaughlin created this revision.Mar 12 2020, 10:12 AM
Herald added a project: Restricted Project. · View Herald Transcript
efriedma added inline comments.Mar 12 2020, 12:43 PM

unused bool?


processPhiNode(IntrinsicInst *I)?


Please use getArgOperand() to get the arguments of calls.


You might want to check whether the module actually declares any of the SVE intrinsics before you iterate over the whole function.

Cheers for working on this @kmclaughlin ! Have you considered adding an interface for the new PM? You could check this for reference: (also implements a FunctionPass).


In AArch64TargetMachine.cpp it's:

static cl::opt<bool> EnableSVEIntrinsicOpts(
    "aarch64-sve-intrinsic-opts", cl::Hidden,
    cl::desc("Enable SVE intrinsic opts"),

so it probably make sense to use see-intrinsic-opts here as well?


Given where this method is called, this is guaranteed to be always non-null. Perhaps assert instead?


Isn't it guaranteed that RequiredType == Reinterpret->getArgOperand(0)->getType() is always true? I.e., PN and the incoming values have identical type.


[nit] Perhaps Ctx instead of C1?


i -> I


I'd be tempted to rewrite this to use early exit (

auto *Y = isReinterpretToSVBool(I->getArgOperand(0));
if (nullptr == Y)
  return false;

// The rest of the function
  1. Could this be a for-range loop instead?
  1. This loop seems to be a perfect candidate for make_early_inc_range (, e.g.
for (Instruction &I : make_early_inc_range(BB))
 Changed |= optimizeIntrinsic(&I);

AFAIK, this iterates over BBs so I should be replaced with BB.

Also, perhaps for (auto *BB : RPOT) { instead?


What's %1 and %2? Is it worth adding the calls that generated them in the expected output?

Have you considered adding an interface for the new PM?

Please ignore that comment ^^^. I incorrectly assumed that the backend has also been ported to the new PM, but that's not the case. In particular, llc will only use the legacy PM anyway:

Apologies for the confusion!

kmclaughlin marked 13 inline comments as done.
  • Changed this from a function pass to a module pass & now check if any of the relevant SVE intrinsics are declared first before iterating over functions
  • Added more checks on expected output in the tests

Thanks for reviewing this, @efriedma & @andwar!




The incoming values to PN will all have the same type, but this is making sure that the reinterprets are all converting from the same type (there is a test for this in sve-intrinsic-opts-reinterpret.ll called reinterpret_reductions_1, where the arguments to are a mix of nxv2i1 and nxv4i1)


Changed this to use make_early_inc_range as suggested


Thanks for the suggestion - I changed this to a module pass so that we can check if any of the SVE intrinsics we are interested in are declared first.


I think that would make sense. I've added %1 and %2 to the expected output and added more checks to the other tests here.

Thanks for the updates @kmclaughlin ! Would you mind adding a comment to clearly mark the negative tests? E.g. for @reinterpret_reductions_1? Maybe also a comment why a particular case is not optimised? You've already done that for some tests, but not all of them.


I don't see any documentation for the pass that's being added here (apart from the commit msg). Perhaps it's worth expanding this comment?


I think that you could simplify things a bit by using SmallPtrSet instead:

With a set you can avoid explicit checks like this:

std::find(Functions.begin(), Functions.end(), Inst->getFunction()) == Functions.end()

With SmallPtrSet you can write less code :)


Could you decorate this for loop with a comment explaining what kind of data is generated here and why? Ta!

kmclaughlin marked 3 inline comments as done.

Use SmallPtrSet instead of SmallVector for storing functions found by runOnModule
Add more comments to clarify the purpose of the pass and some of the negative reinterpret tests

efriedma added inline comments.Mar 24 2020, 10:56 AM

Iterating over a SmallPtrSet is non-deterministic.

In this context, probably SetVector is the right data structure.

kmclaughlin marked an inline comment as done.

Use SmallSetVector for the list of functions gathered by runOnModule to preserve the order of iteration

efriedma accepted this revision.Apr 9 2020, 1:37 PM


This revision is now accepted and ready to land.Apr 9 2020, 1:37 PM
This revision was automatically updated to reflect the committed changes.