This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add predicate reinterpret intrinsics
ClosedPublic

Authored by c-rhodes on Feb 12 2020, 1:49 AM.

Details

Summary

Implements the following intrinsics:

  • llvm.aarch64.sve.convert.to.svbool
  • llvm.aarch64.sve.convert.from.svbool

These convert the ACLE svbool_t type (<vscale x 16 x i1>) to and from the
other predicate types: <vscale x 8 x i1>, <vscale x 4 x i1> and <vscale x 2 x i1>.
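
For reference, a minimal IR sketch of how the intrinsics might be used (the overload suffixes assume the usual type-name mangling, and the function name is illustrative only):

  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
  declare <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1>)

  ; Widen a .s predicate to the generic svbool_t width, then narrow it back.
  define <vscale x 4 x i1> @round_trip_s(<vscale x 4 x i1> %p) {
    %wide = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %p)
    %narrow = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %wide)
    ret <vscale x 4 x i1> %narrow
  }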

Diff Detail

Event Timeline

c-rhodes created this revision.Feb 12 2020, 1:49 AM
Herald added a project: Restricted Project.Feb 12 2020, 1:49 AM
sdesmalen accepted this revision.Feb 12 2020, 6:12 AM

Simple patch that implements the conversions to/from svbool_t; looks good to me!

llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
44

nit: the naming is a bit inconsistent here; maybe rename the tests to use _from_b, _from_h, _from_s, _from_d and _to_b, _to_h, _to_s, _to_d instead of b2b, b2h, h2b, etc.

This revision is now accepted and ready to land.Feb 12 2020, 6:12 AM

When you convert from <vscale x 2 x i1> to <vscale x 16 x i1> using sve.convert.to.svbool, what values do the other bits contain? Zero? Poison? Something else?

When converting from <vscale x 2 x i1> to svbool_t, the other lanes are defined as zero.

As you probably spotted, this implementation does not actually guarantee that those lanes are zeroed. At the moment though, this is not really a problem: there are no optimizations on these intrinsics that rely on this definition, and the instructions themselves produce predicates where those lanes are implicitly zeroed. If we don't fold the zeroing operation into an instruction that already zeroes the lanes implicitly there is a performance penalty, so we should write a peephole pass to remove the unnecessary instructions.
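
(For concreteness, a minimal IR sketch of the case under discussion; the overload suffix and function name are assumed for illustration, not taken from the patch:)

  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1>)

  define <vscale x 16 x i1> @widen_d(<vscale x 2 x i1> %p) {
    ; Under the definition discussed here, only every eighth bit of %r (the
    ; bits corresponding to the .d lanes of %p) may be set; all other bits
    ; must read as zero, whether or not the lowering enforces that.
    %r = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1> %p)
    ret <vscale x 16 x i1> %r
  }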

We can either choose to:

  • make the implementation explicitly zero the other lanes now, so it matches the definition of the intrinsic, or
  • leave the lowering as it is for now and only add the zeroing (plus a peephole to fold it away again) once optimizations start to rely on the definition.

Without having properly discussed this and deciding on which way to go, my LGTM was probably a bit premature. Sorry for that!

I'd prefer correctness first, optimizations later. If we miss the optimization, it's a giant red flag to anyone reading the assembly. If we miss correctness, some very unlucky person might waste days trying to track down a miscompile.

At the moment though, this is not really a problem

I'm not sure what you mean by this; you can trivially break the defined semantics by just writing convert.to.svbool(convert.from.svbool(ptrue)). Do you mean the way the C intrinsics are lowered gives some protection?
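
Written out, the pattern referred to above might look like the sketch below (the intrinsic overload suffixes and the ptrue pattern immediate, assumed here to be the all-lanes pattern, are not taken from the patch):

  declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32)
  declare <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1>)
  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1>)

  define <vscale x 16 x i1> @break_semantics() {
    ; All 16 predicate bits set (31 is assumed to be the all-lanes pattern).
    %all = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
    ; Narrow to a .d predicate, then widen again. Under the stated definition
    ; the non-.d bits of %r must be zero, so %r is not equal to %all; a
    ; lowering that treats the two conversions as a no-op pair gets this wrong.
    %d = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %all)
    %r = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv2i1(<vscale x 2 x i1> %d)
    ret <vscale x 16 x i1> %r
  }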

sdesmalen requested changes to this revision.Feb 18 2020, 3:07 PM

Do you mean the way the C intrinsics are lowered gives some protection?

Correct, the intrinsics are intended to be created by the Clang ACLE implementation, which will reinterpret-cast the predicate results to svbool_t. Without optimisations that introduce a pattern like the one you describe, nothing breaks in practice. Of course, once we start adding such optimisations, that changes things.

I'm happy with the decision to focus on correctness first, though; it probably shouldn't be too hard to fold away the explicit zeroing.

@c-rhodes can you update this patch to explicitly zero the other lanes?

This revision now requires changes to proceed.Feb 18 2020, 3:07 PM
c-rhodes updated this revision to Diff 245400.Feb 19 2020, 7:12 AM
  • Explicitly zero lanes when converting to svbool_t.
efriedma added inline comments.Feb 19 2020, 9:55 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3055

If the conversion isn't changing the type, do you need a REINTERPRET_CAST at all?

llvm/test/CodeGen/AArch64/sve-intrinsics-reinterpret.ll
33

Are you sure the explicit zeroing is working correctly? It looks like reinterpret_bool_h2b and reinterpret_bool_s2b are generating the same code.
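
(A sketch of the two cases in question, with simplified, assumed function names rather than the actual test functions: widening a .h predicate only needs to clear every other bit, while widening a .s predicate must clear three out of every four bits, so the two functions should not lower to identical code once the zeroing is explicit:)

  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1>)
  declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)

  ; Only the .h-lane bits of the result may be set.
  define <vscale x 16 x i1> @h2b(<vscale x 8 x i1> %p) {
    %r = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %p)
    ret <vscale x 16 x i1> %r
  }

  ; Only the .s-lane bits of the result may be set.
  define <vscale x 16 x i1> @s2b(<vscale x 4 x i1> %p) {
    %r = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %p)
    ret <vscale x 16 x i1> %r
  }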

c-rhodes updated this revision to Diff 245828.Feb 21 2020, 5:42 AM
  • Fix zeroing.
  • Update tests.
c-rhodes marked 3 inline comments as done.Feb 21 2020, 5:45 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3055

No, the reinterpret isn't necessary. I was expecting it to be DAG-combined by performNVCASTCombine, but realised there's a case missing for the reinterpret ISD node where that combine is invoked. I've updated it so the operand is returned if it's a cast to the same type.

Looks fine other than the change to LowerSPLAT_VECTOR.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7478

We should not be calling getMachineNode before Select(). If we need a new AArch64ISD opcode, please add one.

I'm not sure how this is relevant to the rest of the patch, in any case.

c-rhodes updated this revision to Diff 246163.Feb 24 2020, 2:28 AM
c-rhodes marked an inline comment as done.

Remove change to lower SPLAT_VECTOR of i1s to pfalse and update test.

c-rhodes marked an inline comment as done.Feb 24 2020, 2:31 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7478

It was necessary for how the test was written downstream, which used zeroinitializer, but you're right that it's not really relevant here. I've removed it.

sdesmalen added inline comments.Feb 24 2020, 2:50 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3060

Can you just use OutVT instead of using a REINTERPRET_CAST?

sdesmalen added inline comments.Feb 24 2020, 5:38 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3060

Sorry, ignore that comment; it of course needs the REINTERPRET_CAST because the AND operation is done on nxv16i1.

sdesmalen accepted this revision.Feb 24 2020, 1:08 PM

Same here, LGTM!

This revision is now accepted and ready to land.Feb 24 2020, 1:08 PM
This revision was automatically updated to reflect the committed changes.