This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Identify SAD pattern
ClosedPublic

Authored by mivnay on Oct 2 2020, 9:04 AM.

Details

Summary

(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))) to ZERO_EXTEND((UABD a, b))

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Diff Detail

Event Timeline

mivnay created this revision.Oct 2 2020, 9:04 AM
mivnay requested review of this revision.Oct 2 2020, 9:04 AM

So this converts v16i32 abs(sub(zext(v16i8), zext(v16i8)))) to v16i32 zext(abd(v16i8, v16i8))? And doesn't start with a trunk? That sounds like it should work OK.

Can you add other types and sign extends? And more tests. And a AArch64ISD node (they are not very complex to add :) )

mivnay added a comment.Oct 6 2020, 6:55 AM

So this converts v16i32 abs(sub(zext(v16i8), zext(v16i8)))) to v16i32 zext(abd(v16i8, v16i8))? And doesn't start with a trunk? That sounds like it should work OK.

Yes.

Can you add other types and sign extends? And more tests. And a AArch64ISD node (they are not very complex to add :) )

Few patterns already exists in .td files for uabdl, etc. I tried to add the ISD node for UABD (posted it in https://reviews.llvm.org/D88897) but there are unintended consequences (ISel is generating different patterns). Can I take a look at it later (when other pending patches are done)? About the sign extend variants, I am not sure whether it is legal to do this transformation.

So this converts v16i32 abs(sub(zext(v16i8), zext(v16i8)))) to v16i32 zext(abd(v16i8, v16i8))? And doesn't start with a trunk? That sounds like it should work OK.

Yes.

Can you add other types and sign extends? And more tests. And a AArch64ISD node (they are not very complex to add :) )

Few patterns already exists in .td files for uabdl, etc. I tried to add the ISD node for UABD (posted it in https://reviews.llvm.org/D88897) but there are unintended consequences (ISel is generating different patterns). Can I take a look at it later (when other pending patches are done)? About the sign extend variants, I am not sure whether it is legal to do this transformation.

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

Can D88897 be folder back into here now?

mivnay added a comment.Oct 6 2020, 9:49 AM

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

I am worried about the semantics of sabd instruction especially about v16i32 (sub(sext(v16i8), sext(v16i8))) to v16i32 sext(sub(v16i8, v16i8)) part. Can this be looked at later once my other patches are through?

Can D88897 be folder back into here now?

There seems to be more issues for adding SABD ISD node. Since it is a separate patch now, can this one move forward?

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

I am worried about the semantics of sabd instruction especially about v16i32 (sub(sext(v16i8), sext(v16i8))) to v16i32 sext(sub(v16i8, v16i8)) part. Can this be looked at later once my other patches are through?

The return type of the abd is positive unsigned, so I think if turning it into v16i32 zext(sabd(v16i8, v16i8)) should be valid? But it's worth testing that, I may be mistaken..

mivnay updated this revision to Diff 296632.Oct 7 2020, 3:21 AM
mivnay added a comment.Oct 7 2020, 3:26 AM

What part of sadb are you worried about? I thought they could be treated the same, given you are extended from enough extra bits. But I may be mistaken, they can be somewhat difficult.

I am worried about the semantics of sabd instruction especially about v16i32 (sub(sext(v16i8), sext(v16i8))) to v16i32 sext(sub(v16i8, v16i8)) part. Can this be looked at later once my other patches are through?

The return type of the abd is positive unsigned, so I think if turning it into v16i32 zext(sabd(v16i8, v16i8)) should be valid? But it's worth testing that, I may be mistaken..

I am not sure about the validity either and need to test it. Can that be added as enhancement later and go ahead with the current pattern?

Does this still only handle v16i8?

RKSimon added a subscriber: RKSimon.Oct 8 2020, 3:48 AM
RKSimon added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11067 ↗(On Diff #296632)

Any chance we could move some of this pattern matching into SelectionDAG? I'd love to get rid of some of the almost identical code from X86ISelLowering.

mivnay added a comment.Oct 8 2020, 6:59 AM

Does this still only handle v16i8?

Currently the patterns written are for v16i8 to v16i16 and v16i8 to v16i32 types. I am not sure what are the other types needed to support along with this as there are at least 24 more patterns if we consider signed and floating points excluding the vabdl ones which are already in td files (Reference : https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vabd).

There are 12 type pair possible for unsigned types alone:

  1. v8i8 to i16,i32,i64 vector types with same number of elements
  2. v16i8 to i16,i32,i64 vector types
  3. v4i16 to i32,i64 vector types
  4. v8i16 to i32, i64 vector types
  5. v2i32 to v2i64 vector type
  6. v4i32 to v4i64 vector type

I have handled the current patterns to solve the bug mentioned in the ticket. Which other types do you have in mind?

mivnay marked an inline comment as done.Oct 8 2020, 7:12 AM
mivnay added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11067 ↗(On Diff #296632)

I have only handled one of the ABD patterns here. There are more in td files and in other places, which would need a bigger refactoring. I think this is out of purview for this patch and can be filed as an enhancement.

mivnay marked an inline comment as done.Oct 8 2020, 10:52 PM

Currently the patterns written are for v16i8 to v16i16 and v16i8 to v16i32 types. I am not sure what are the other types needed to support along with this as there are at least 24 more patterns if we consider signed and floating points excluding the vabdl ones which are already in td files (Reference : https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vabd).

There are 12 type pair possible for unsigned types alone:

  1. v8i8 to i16,i32,i64 vector types with same number of elements
  2. v16i8 to i16,i32,i64 vector types
  3. v4i16 to i32,i64 vector types
  4. v8i16 to i32, i64 vector types
  5. v2i32 to v2i64 vector type
  6. v4i32 to v4i64 vector type

I have handled the current patterns to solve the bug mentioned in the ticket. Which other types do you have in mind?

If we are adding UABD/SABD pattern matching I would expect us to generalize it to support for all the types that instruction supports. You can ignore UABDL though, I agree that is a separate issue (and seems to be handled elsewhere).

The "to" type isn't very important - it just needs to be big enough to make the results equivalent. I think in this case that's only 1 or 2 bits, and it could even not be a power 2 (although that usually isn't very important to support). The "from" type would be i8, i16 or i32, handling either 64 or 128bit vectors. I think if we effectively ignore the "to" type (because it gets handled by the ZERO_EXTEND), what you have here is already pretty close.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11051 ↗(On Diff #296632)

You might want to simplify this and keep it in a single performABSCombine function.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
225 ↗(On Diff #296632)

Can you move this past SRHADD? SHADD and SRHADD deserve to live next to each other.

llvm/test/CodeGen/AArch64/arm64-vabs.ll
165 ↗(On Diff #296632)

The tests needn't do this load if you pass the value as a function argument directly.

mivnay updated this revision to Diff 297174.Oct 9 2020, 3:35 AM
mivnay retitled this revision from [AArch64] Identify SAD pattern for v16i8 type to [AArch64] Identify SAD pattern.
mivnay marked 3 inline comments as done.

Thanks. https://alive2.llvm.org/ce/z/R7TNGK suggests this should be fine for sign extends too, which would be great to add.

@RKSimon As I mentioned in https://reviews.llvm.org/D88897#2316214 I'd like to make this target independent, If you don't mind waiting for me to port this over I am happy to do that later.

lib/Target/AArch64/AArch64ISelLowering.cpp
11085–11086

Can you change this to checking for sign extend or zero extend, then produce AArch64ISD::UABD or AArch64ISD::SABD as needed.

I believe that the final ISD::ZERO_EXTEND would remain as a zext for UABD or SABD.

mivnay updated this revision to Diff 297495.Oct 11 2020, 9:34 PM

Adding support for SABD patterns..

mivnay marked an inline comment as done.Oct 11 2020, 9:34 PM
dmgreen accepted this revision.Oct 13 2020, 12:21 AM

Thank you very much. With one little nitpick, LGTM.

lib/Target/AArch64/AArch64ISelLowering.cpp
11092

size -> Size

This revision is now accepted and ready to land.Oct 13 2020, 12:21 AM
This revision was landed with ongoing or failed builds.Oct 13 2020, 3:24 AM
This revision was automatically updated to reflect the committed changes.