This is an archive of the discontinued LLVM Phabricator instance.

Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)
ClosedPublic

Authored by kazu on Jan 15 2023, 10:40 AM.

Details

Summary

This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros. ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.

ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb866ab1243366d2e238b9961895612d. While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons. The x86 machines around that time required a conditional
branch to implement countLeadingZero<uint32_t> that returns the 32 on
zero:

test    edi, edi
je      .LBB0_2
bsr     eax, edi
xor     eax, 31

.LBB1_2:

mov     eax, 32

That is, we can remove the conditional branch if we don't care about
the behavior on zero.

IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.

I think it's time to retire ZeroBehavior as its utility is very
limited. If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.

Diff Detail

Event Timeline

kazu created this revision.Jan 15 2023, 10:40 AM
kazu requested review of this revision.Jan 15 2023, 10:40 AM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJan 15 2023, 10:40 AM
craig.topper added inline comments.
llvm/include/llvm/Support/MathExtras.h
228

Note, x86 does not have an efficient instruction for find first set with zero returning -1. It will require a cmov to handle zero.

It would be nice to have comments reflecting the new behavior in the case of 0 / max value.

foad added a comment.Jan 17 2023, 3:26 AM

It would be nice to have comments reflecting the new behavior in the case of 0 / max value.

+1

kazu updated this revision to Diff 490348.Jan 18 2023, 6:22 PM

Updated the patch to leave findFirstSet and findLastSet untouched.

Added a comment about the behavior on 0/max input.

kazu edited the summary of this revision. (Show Details)Jan 18 2023, 6:23 PM
kazu marked an inline comment as done.

It would be nice to have comments reflecting the new behavior in the case of 0 / max value.

I've added comments to all four functions -- count{Leading,Trailing}{Zeros,Ones}.

llvm/include/llvm/Support/MathExtras.h
228

The new iteration of the patch leaves findFirstSet and findLastSet untouched.

kazu retitled this revision from Remove ZeroBehavior of countLeadingZeros and the like (NFC) to Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC).Jan 18 2023, 6:27 PM
kazu marked an inline comment as done.

Please take a look. Thanks!

barannikov88 accepted this revision.Jan 18 2023, 6:46 PM

LGTM

llvm/include/llvm/Support/MathExtras.h
228

(optional) ZB could be made a template argument and constexpr-ifed.

This revision is now accepted and ready to land.Jan 18 2023, 6:46 PM
arsenm added a comment.EditedJan 18 2023, 6:55 PM

If you care about compilation speed, you should build LLVM with an appropriate -march= to take advantage of lzcnt and tzcnt.

I think this is bad reasoning, nobody really uses -march

jrtc27 added inline comments.Jan 18 2023, 7:02 PM
llvm/include/llvm/Support/MathExtras.h
225

A bunch of functions still have this documentation, but these are now the only possible values, so this seems redundant

ZB_Max is the strange mode that should be dropped, perhaps also ZB_Undefined.

If you care about compilation speed, you should build LLVM with an appropriate -march= to take advantage of lzcnt and tzcnt.

I think this is bad reasoning, nobody really uses -march

I agree. The reason should be clarified that the lzcnt performance here really doesn't matter.
Note: https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter lzcnt/tzcnt have false dependencies on older (pre-Skylake) Intel processors. But this doesn't really matter for LLVM, at least the minor issue does not justify keeping the weird mode ZB_Undefined.

ZB_Max is the strange mode that should be dropped, perhaps also ZB_Undefined.

If you care about compilation speed, you should build LLVM with an appropriate -march= to take advantage of lzcnt and tzcnt.

I think this is bad reasoning, nobody really uses -march

I agree. The reason should be clarified that the lzcnt performance here really doesn't matter.
Note: https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter lzcnt/tzcnt have false dependencies on older (pre-Skylake) Intel processors. But this doesn't really matter for LLVM, at least the minor issue does not justify keeping the weird mode ZB_Undefined.

I'm not sure the false dependency issue is relevant here. Without -march, compilers will emit BSR/BSF and possibly a cmov to handle the zero behavior. Both BSR/BSF always have a false dependency because the behavior for 0 input is to keep the old value of the output register. Knowing we can use lzcnt/tzcnt is always better than BSR/BSF.

If we know zero doesn't matter, without march, gcc and recent clang will always emit BSF using the encoding for TZCNT. On CPUs without TZCNT this encoding is treated as BSF. This allows us to use TZCNT at runtime if the CPU is modern even without -march being passed. We can't do the same for BSF/LZCNT because their outputs are inverted from each other.

kazu updated this revision to Diff 490363.Jan 18 2023, 7:41 PM

I've adjusted comments for findFirstSet and findLastSet to reflect the
fact that ZeroBehavior now takes only two values -- ZB_Undefined and
ZB_Max.

kazu marked an inline comment as done.Jan 18 2023, 7:44 PM
kazu added inline comments.
llvm/include/llvm/Support/MathExtras.h
225

I've adjusted the redundant comments on two functions - findFirstSet and findLastSet.

This revision was landed with ongoing or failed builds.Jan 18 2023, 7:58 PM
This revision was automatically updated to reflect the committed changes.
kazu marked 2 inline comments as done.

@kazu Thanks for dealing with this!

I'd like to build on this and create llvm variants of the C++20 <bit> countl_zero/countr_zero/countl_one/countr_one template functions similar to what I did for popcount in D132407 (and have MathExtras.h reuse it) - I just wanted to ensure you hadn't already started anything similar?

kazu added a comment.Jan 19 2023, 8:56 AM

@kazu Thanks for dealing with this!

I'd like to build on this and create llvm variants of the C++20 <bit> countl_zero/countr_zero/countl_one/countr_one template functions similar to what I did for popcount in D132407 (and have MathExtras.h reuse it) - I just wanted to ensure you hadn't already started anything similar?

I just uploaded:

https://reviews.llvm.org/D142078