This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add more intrinsics in 'isZeroingInactiveLanes'.
ClosedPublic

Authored by dewen on Jan 11 2023, 1:20 AM.

Details

Summary

The REINTERPRET_CAST operation generates redundant and and ptrue instructions. For some instructions, this is redundant, because its inactive lanes are zeroed by construction.

for example. codegen before:

facgt p2.d, p0/z, z4.d, z1.d
ptrue p1.d
and p1.b, p2/z, p2.b, p1.b

After:

facgt p1.d, p0/z, z4.d, z1.d

ref: https://reviews.llvm.org/D129851

Diff Detail

Event Timeline

dewen created this revision.Jan 11 2023, 1:20 AM
dewen requested review of this revision.Jan 11 2023, 1:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 1:20 AM
Matt added a subscriber: Matt.Jan 11 2023, 3:42 PM
dewen added inline comments.Jan 11 2023, 7:17 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
293

I recently looked at some applications and found that other intrinsics can do similar optimizations.

dewen edited the summary of this revision. (Show Details)Jan 11 2023, 11:04 PM
dewen added inline comments.Jan 11 2023, 11:35 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
293

@paulwalker-arm @sdesmalen @efriedma @georges @david-arm Can you review this patch for me?

@hassnaa-arm this patch seems to solve the same issue as you're trying to solve in D141397.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
306

Predicate select is slightly different, since there is only a full-vector predicate select (operating on svbool_t), so there are no inactive lanes. It might be worth to point that out in a comment.

Can you please add a test for this case? it seems to be missing.

llvm/test/CodeGen/AArch64/sve2-intrinsics-reinterpret.ll
385

nit: odd indentation

dewen added inline comments.Jan 12 2023, 1:16 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
306

Okay, thank you. @sdesmalen

dewen updated this revision to Diff 488615.Jan 12 2023, 5:17 AM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
306

I have updated. Please review. @sdesmalen

dewen added inline comments.Jan 12 2023, 5:21 AM
llvm/test/CodeGen/AArch64/sve2-intrinsics-reinterpret.ll
385

done

dewen marked an inline comment as done.Jan 12 2023, 11:13 PM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
306

hi, @sdesmalen, thanks for you review. Could you please review it again?

dewen marked an inline comment as done.Jan 13 2023, 1:21 AM
sdesmalen added inline comments.Jan 16 2023, 3:19 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
264–265

This comment only applies to the intrinsic, not to the ISD node.

266–267

We can't make this generalisation to ISD::VSELECT because it doesn't necessarily zero the other lanes, e.g.

define <vscale x 4 x i1> @foo(<vscale x 4 x i1> %pg, <vscale x 4 x i1> %p0, <vscale x 4 x i1> %p1) {
  %1 = select <vscale x 4 x i1> %pg, <vscale x 4 x i1> %p0, <vscale x 4 x i1> %p1
  ret <vscale x 4 x i1> %1
}

Is lowered to:

foo:                                    // @foo
      sel     p0.b, p0, p1.b, p2.b
      ret

If p1.b and p2.b have non-zero values in their inactive lanes, then the result p0.b will also have non-zero values in the inactive lanes.

dewen marked an inline comment as done.Jan 16 2023, 4:58 AM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
266–267

@sdesmalen, thank you, You're right. I didn't think about it all before. Sorry.

dewen updated this revision to Diff 489511.Jan 16 2023, 5:09 AM
dewen marked an inline comment as done.Jan 16 2023, 5:12 AM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
266–267

I've deleted this part, thank you again.@sdesmalen

sdesmalen accepted this revision.Jan 16 2023, 9:22 AM
This revision is now accepted and ready to land.Jan 16 2023, 9:22 AM
paulwalker-arm added inline comments.Jan 16 2023, 5:55 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
252

Just to add context to my comments below. Here "inactive lanes" means those lanes that are not visible within the DAG. By which I mean the MVTs for SVE's predicate vector types represent a subset of the underlying bits within a predicate register. This function is asking whether the specified operation is known to zero the bits that are not part of this subset. (e.g. nxv2i1 represents every eighth bit of the predicate register 0,8,16... and this function returns true if the operation is guaranteed to zero bits 1-7,9-15...)

302–305

Does trn and uzp zero inactive lanes? My reading of the pseudo code is that all lanes of the result come from one of the inputs. For example when trn1 is used with a .d element type each "lane" is actually a block of the 8 predicate bits that sit within it and they are copied as a block without modification.

Actually I think we've likely got an upstream bug here with how we're lowering the svbool builtins of these instructions to these intrinsics because the builtins fully define the entire predicate register, which when fixed should mean this function will not need to worry about them.

Either way, if you agree I think it's best to remove their entries/tests from this patch.

306–312

I don't believe the logical instructions zero inactive lanes either. You can see this by looking at the tests you've added. The instructions are always "byte" based and so if the input general predicate contains garbage then so will the result. In this instance this is not a problem because from the ACLE's point of view we only support these intrinsics for <vscale x 16 x i1>[1] types and when used with such types they'll be no reinterprets to introduce the unnecessary and and so I think these entries should also be removed.

[1] I know the code generation works for other predicate types, but that's just a fluke based on some stale isel patterns (i.e. they'll never be generated via the clang builtins) that should probably be removed.

dewen added inline comments.Jan 16 2023, 6:48 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
302–305

I agree with you.@paulwalker-arm

306–312

I don't understand this part, but I think these logic instructions are going to put Inactive elements in the destination predicate register are set to zero. @paulwalker-arm

for example:






dewen added inline comments.Jan 16 2023, 11:36 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
302–305

hi, @paulwalker-arm, thanks for you review. I've reverted my previous patch. Sorry, I didn't see your comment when I submitted it.

Allen added a subscriber: Allen.Jan 16 2023, 11:41 PM
dewen marked an inline comment as done.Jan 16 2023, 11:48 PM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
306–312

Thank you @paulwalker-arm. I understand my mistake now. I'll update the patch.

dewen updated this revision to Diff 489718.Jan 17 2023, 12:34 AM
dewen marked an inline comment as done.Jan 17 2023, 3:51 AM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
302–305

I've updated it, can I ask you to review it again? Thank you.@paulwalker-arm

paulwalker-arm accepted this revision.Jan 17 2023, 3:57 AM

Thanks @dewen for the fixes. This looks good to me now. Do you fancy taking a look at fixing the incorrect lowering for the predicate trn, uzp and zip builtins (I suspect we need dedicated intrinsics where the element type is part of the name rather than relying on the operand types) as a separate patch? or would you rather somebody else fix it?

This revision was landed with ongoing or failed builds.Jan 17 2023, 7:11 PM
This revision was automatically updated to reflect the committed changes.
dewen added a comment.Jan 17 2023, 7:15 PM

Thanks @dewen for the fixes. This looks good to me now. Do you fancy taking a look at fixing the incorrect lowering for the predicate trn, uzp and zip builtins (I suspect we need dedicated intrinsics where the element type is part of the name rather than relying on the operand types) as a separate patch? or would you rather somebody else fix it?

Hi, thank you@paulwalker-arm. I'm interested, but I'm about to start the holiday. Happy New Year.

Thanks @dewen for the fixes. This looks good to me now. Do you fancy taking a look at fixing the incorrect lowering for the predicate trn, uzp and zip builtins (I suspect we need dedicated intrinsics where the element type is part of the name rather than relying on the operand types) as a separate patch? or would you rather somebody else fix it?

Hi, thank you@paulwalker-arm. I'm interested, but I'm about to start the holiday. Happy New Year.

Ok, I'll bring the fix in house then. Enjoy your holiday.