This is an archive of the discontinued LLVM Phabricator instance.

Add intrinsic to count trailing zero elements in a vector
ClosedPublic

Authored by kmclaughlin on Aug 31 2023, 7:24 AM.

Details

Summary

This patch introduces an experimental intrinsic for counting the
trailing zero elements in a vector. The intrinsic has generic expansion
in SelectionDAGBuilder, and for AArch64 there is a pattern which matches
to brkb & cntp instructions where SVE is enabled.

The intrinsic has a second operand to indicate whether it returns a valid
result if the first argument is all zero, similar to the existing cttz intrinsic.

These changes have been split out from D158291.

Diff Detail

Event Timeline

kmclaughlin created this revision.Aug 31 2023, 7:24 AM
kmclaughlin requested review of this revision.Aug 31 2023, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2023, 7:24 AM

What's the use case for the poison behavior? If it wasn't for X86's weird bsr/bsf instruction behaviour would we have this on the regular llvm.cttz/ctlz intrinsics?

craig.topper added inline comments.Aug 31 2023, 8:09 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5344

Why AnyExt?

Matt added a subscriber: Matt.Aug 31 2023, 9:28 AM
  • Replaced getAnyExtOrTrunc in LowerINTRINSIC_WO_CHAIN with getZExtOrTrunc
kmclaughlin marked an inline comment as done.Aug 31 2023, 11:53 AM

What's the use case for the poison behavior? If it wasn't for X86's weird bsr/bsf instruction behaviour would we have this on the regular llvm.cttz/ctlz intrinsics?

Hi @craig.topper, it seemed sensible to add the poison behaviour to be consistent with the cttz intrinsic. Some targets seemed to require it, so I thought it made sense to return undef at least in the generic lowering if the result is 0.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5344

This should be getZExtOrTrunc

kmclaughlin marked an inline comment as done.Sep 1 2023, 1:19 AM

Hi all,
I just wanted to leave a note to say that I will be away for a few weeks, but I will be replying to any questions and review comments on my return at the beginning of October.
Thanks!

The specific rational for mirroring cttz's second operand, as well as just being consistent, is that it can reduce the range of the result sufficiently to lead to better code generation. I'll use SVE as an example, although knowing that for SVE we don't rely on the common expansion. The largest supported SVE vector register is 256 bytes long. This means cttz_elt has a return value of 0 <= ret >= 256. For this scenario the common expansion will require 16-bit element types. However, for cases when an all zero input results in poison the range is reduced by 1 meaning 8-bit element types can be used.

The RISC-V instruction that we'd want to use for this is vfirst.m that returns -1 if no bits in a mask are set and the bit position of the first set bit otherwise. Determining that no bits are set is simply checking if the result is less than 0. This the same instruction we use for reduce.or on mask vectors.

Which sounds like mirroring cttz's second operand is useful then because you can omit the need for the extra compare?

craig.topper added a comment.EditedSep 1 2023, 10:04 AM

Which sounds like mirroring cttz's second operand is useful then because you can omit the need for the extra compare?

Yes. I think I'm missing when the bit will be set. Does D158291 intend to set the zero is poison bit? I see this patch doesn't use ARM specific instructions when zero is poison is set which confused me.

llvm/include/llvm/IR/Intrinsics.td
2172

why i32 instead of i1 like the existing cttz intrinsic?

Yes I believe D158291 will set the zero is poison bit because the transformation already has to handle the zero case for its quick loop path. At which point I'd expect this patch to be updated accordingly so the SVE code generation remains unaffected.

The semantics of this new intrinsic should be described in LangRef.

And tests are missing: non-AArch64 tests to test generic lowering, and also AArch64 without SVE.

  • Added a description of @llvm.experimental.cttz.elts to LangRef.
  • Split the AArch64 test file so that some tests are not using SVE.
  • Added tests for other targets to test generic lowering.
  • Added the new tests & patterns from D158291 for this intrinsic to this patch.
kmclaughlin added inline comments.Oct 23 2023, 8:33 AM
llvm/include/llvm/IR/Intrinsics.td
2172

Hi @craig.topper, I tried adding this argument as an i1, but ran into problems when trying to build the patterns in AArch64SVEInstrInfo.td if I tried to match this type. I tried to find some similar intrinsics but didn't find any with patterns that match an i1, which is why I chose to use an i32 with timm32_0_1.

nikic added a subscriber: nikic.Oct 23 2023, 11:39 AM
nikic added inline comments.
llvm/include/llvm/IR/Intrinsics.td
2172

Can't comment on how to do this in tablegen, but as this is a target-independent intrinsic, we'd definitely want the argument to be i1 and not i32.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7549

I don't think this bit makes sense. The zero-is-poison flag *allows* you to return poison if this input is zero, but you don't have to return it. Setting this kind of flag should never result in more complex code, as you are generating here.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5349

I don't get what guarantees this assertion (or why it would make sense to assert this at all).

kmclaughlin marked 3 inline comments as done.
  • Changed the second argument of the @llvm.experimental.cttz.elts intrinsic to an i1.
  • Added an AArch64ISD node, CTTZ_ELTS, which is used in when lowering the intrinsic for AArch64 if shouldExpandCttzElements returned true.
  • Changed the generic lowering to no longer check if the result is 0 and return undef if zero-is-poison is set.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7549

Thanks @nikic, I misunderstood what the zero-is-poison flag was used for. I've removed the getSelectCC here and instead only use the flag above when computing the smallest vector element type that can be chosen for the expansion.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5349

Removed - there was nothing to guarantee that the flag would be zero here and we would want to lower the intrinsic for both possible values.

nikic added inline comments.Oct 26 2023, 1:16 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
338

If you need an ISD opcode anyway, maybe make it a generic one right away? Having a generic ISD opcode and expanding it during legalization is generally preferred over doing an expansion in SDAGBuilder, as this allows for more optimal legalizations if the operation is supported only for some types.

kmclaughlin added inline comments.Oct 26 2023, 8:13 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
338

Adding a generic opcode for this would introduce a lot of additional complexity with legalisation, and splitting/promoting the operand would likely result in poor CodeGen. It seemed better to avoid this by expanding the intrinsic in SelectionDAGBuilder as we have for others such as get_active_lane_mask, at least until other targets really need full legalisation support.

I few comments but otherwise this patch looks good. I agree with not making CTTZ_ELTS common, it's really just a target specific convenience and given there's no during code generation use case I don't think it's worth the cost of having to implement all the plumbing required for a common node. We can cross that bridge if/when the operation proves its worth.

llvm/docs/LangRef.rst
18359–18360

Is this last part true? The cases not directly handled by the target are converted to common DAG nodes so there's nothing here that should be functionally broken for other targets?

18389

I think you can drop the 'then'?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7525

Perhaps just !cast<ConstantInt>(I.getOperand(1))->isZero()?

7550

For the case where I.getType()->getScalarSizeInBits() is a non-power-of-two we're going to round up to the next power-of-two bit-width and so I think it's possible for RetTy to be smaller than NewEltTy? If true this should use getZExtOrTrunc().

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1795

Can this be hasSVEorSME()?

This function could be written as

return !Subtarget->hasSVE() || VT != MVT::nxv16i1;
5351–5352

Up to you but you likely don't need this given getZExtOrTrunc will do the right thing (i.e. nothing) for a nop cast.

kmclaughlin marked 6 inline comments as done.
  • Changed shouldExpandCttzElements to query hasSVEorSME() and added a RUN line to AArch64/intrinsic-cttz-elts-sve.ll with -mattr=+sme
  • Removed unnecessary getZExtOrTrunc from lowering of intrinsic in AArch64ISelLowering.cpp
  • Added a test where the return type of the intrinsic is not a power of two
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1795

We can use hasSVEorSME() here, I've changed this and added a RUN line to AArch64/intrinsic-cttz-elts-sve.ll

paulwalker-arm accepted this revision.Oct 27 2023, 9:08 AM
This revision is now accepted and ready to land.Oct 27 2023, 9:08 AM
kmclaughlin closed this revision.Nov 1 2023, 10:41 AM

Closed by commit rG3b786f2c7608

Thanks all for reviewing this patch!