This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add llvm::byteswap to bit.h
ClosedPublic

Authored by kazu on Jan 20 2023, 10:47 PM.

Details

Summary

This patch adds C++23-style byteswap to bit.h.

The implementation and tests are largely taken from
llvm/include/llvm/Support/SwapByteOrder.h and
llvm/unittests/Support/SwapByteOrderTest.cpp, respectively.

Diff Detail

Event Timeline

kazu created this revision.Jan 20 2023, 10:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2023, 10:47 PM
kazu requested review of this revision.Jan 20 2023, 10:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2023, 10:47 PM
kazu edited the summary of this revision. (Show Details)Jan 20 2023, 10:51 PM
kazu added reviewers: RKSimon, dblaikie, MaskRay.
RKSimon added inline comments.Jan 21 2023, 3:16 AM
llvm/unittests/ADT/BitTest.cpp
126

Why not just merge these into a single TEST(BitTest, ByteSwap) ?

kazu updated this revision to Diff 491078.Jan 21 2023, 8:15 AM

I've grouped small tests in this iteration.

kazu marked an inline comment as done.Jan 21 2023, 8:18 AM

Please take a look. Thanks!

llvm/unittests/ADT/BitTest.cpp
126

I've grouped small ones. I've left the ones involving loops untouched as they are a little big.

Are you planning to update SwapByteOrder.h to use this?

llvm/include/llvm/ADT/bit.h
62

Looking at SwapByteOrder.h suggests this might need #include <cstdlib>?

70

use has_builtin ?

kazu marked an inline comment as done.Jan 21 2023, 10:41 AM

Are you planning to update SwapByteOrder.h to use this?

Yes, and users of getSwappedBytes for integer types.

llvm/include/llvm/ADT/bit.h
70

Shall we stop using the intrinsics and just stick to the ANDs and ORs? This way, we don't have to worry about the availability of intrinsics, the availability of has_builtin itself, whether we are using ICC, etc. (I couldn't find out why ICC is excluded here.) According to compiler explorer, gcc-4.5.3 and clang-10.0.0 generate bswap for the following code snippet:

#include <stdint.h>

uint32_t bs32(uint32_t v) { 
    uint32_t Byte0 = v & 0x000000FF;
    uint32_t Byte1 = v & 0x0000FF00;
    uint32_t Byte2 = v & 0x00FF0000;
    uint32_t Byte3 = v & 0xFF000000;
    return (Byte0 << 24) | (Byte1 << 8) | (Byte2 >> 8) | (Byte3 >> 24);
}

uint64_t bs64(uint64_t v) {
    uint64_t Hi = bs32(v);
    uint32_t Lo = bs32(v >> 32);
    return (Hi << 32) | Lo;
}

clang prior to 10.0.0 does not generate 64-bit bswap, but we don't attempt to use 64-bit bswap intrinsic in this patch or SwapByteOrder.h anyway.

Note that I mention gcc-4.5.3 above, but we require gcc-7.1.0.

kazu updated this revision to Diff 491107.Jan 21 2023, 4:04 PM

In this iteration, I've removed all the dependence on intrinsics
except for the 64-bit case on MSVC, which does not recognize the
idiom.

I'm also including stdlib.h for _byteswap_uint64.

kazu updated this revision to Diff 491108.Jan 21 2023, 4:26 PM

Guarded #include <stdlib.h> with defined(_MSC_VER) && !defined(_DEBUG).

kazu marked 2 inline comments as done.Jan 21 2023, 4:28 PM

Please take a look. Thanks!

llvm/include/llvm/ADT/bit.h
62

I've added

#if defined(_MSC_VER) && !defined(_DEBUG)
#include <stdlib.h>  // for _byteswap_uint64                                                                           
#endif
70

I've removed the dependence on GCC-style __builtin_xxx, so I don't use __has_builtin,

Both gcc and clang generate good code.

barannikov88 added inline comments.
llvm/include/llvm/ADT/bit.h
70

According to compiler explorer, gcc-4.5.3 and clang-10.0.0 generate bswap for the following code snippet

They may generate the intrinsic for this sample code, but they also may fail to do so when this function is inlined.
I'd prefer seeing the intrinsic.

kazu updated this revision to Diff 491131.Jan 21 2023, 10:51 PM
kazu marked an inline comment as done.

I've restored calls to intrinsics in this iteration.

I am now using __has_builtin to check for the availability of
intrinsics.

kazu marked an inline comment as done.Jan 21 2023, 10:51 PM

Please take a look. Thanks!

llvm/include/llvm/ADT/bit.h
70

OK. I've restored the calls to intrinsics.

barannikov88 accepted this revision.Jan 21 2023, 11:59 PM

LGTM, thanks.
Please wait for @RKSimon acknowledgement.

This revision is now accepted and ready to land.Jan 21 2023, 11:59 PM
RKSimon added inline comments.Jan 22 2023, 4:39 AM
llvm/include/llvm/ADT/bit.h
70

Shouldn't this be __has_builtin(__builtin_bswap32) ?

84

Shouldn't this be __has_builtin(__builtin_bswap64) ?

RKSimon added inline comments.Jan 22 2023, 4:40 AM
llvm/include/llvm/ADT/bit.h
24

<cstdlib>

kazu updated this revision to Diff 491175.Jan 22 2023, 9:28 AM
kazu marked an inline comment as done.

I've switched to ctsdlib from stdlib.h.

I've fixed the names inside __has_builtin.

kazu marked 3 inline comments as done.Jan 22 2023, 9:29 AM
This revision was landed with ongoing or failed builds.Jan 22 2023, 9:29 AM
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.Jan 23 2023, 8:54 AM
llvm/include/llvm/ADT/bit.h
70

According to compiler explorer, gcc-4.5.3 and clang-10.0.0 generate bswap for the following code snippet

They may generate the intrinsic for this sample code, but they also may fail to do so when this function is inlined.

I'd prefer seeing the intrinsic.

Are there examples of this failing? If not, I'd prefer not to maintain #ifdef'd codepaths that can diverge/introduce build failures that are harder to test (because they don't show up for some people, etc).

Generally we rely on "sufficiently" well-performing compilers (specifically/especially sometimes as aggressively as "clang with LTO") to get good performance out of LLVM, so if Clang gets this right, that's probably enough justification to remove the special case.

barannikov88 added inline comments.Jan 23 2023, 10:22 AM
llvm/include/llvm/ADT/bit.h
70

Are there examples of this failing?

I didn't do actual testing, but this comment suggests that MSVC fails to recognize 64-bit version even when the function is not inlined.

Generally we rely on "sufficiently" well-performing compilers (specifically/especially sometimes as aggressively as "clang with LTO") to get good performance out of LLVM, so if Clang gets this right, that's probably enough justification to remove the special case.

It is possible that expanded version may even perform better than the "builtin" version in some cases. My judgement was only relying on the fact that there are these builtins, so there must have been a good reason for introducing them.

If not, I'd prefer not to maintain #ifdef'd codepaths that can diverge/introduce build failures that are harder to test (because they don't show up for some people, etc).

It is hard to disagree with it. I guess this particular change could rely on compiler optimizations (compared to e.g. popcount below), and that's why I only expressed my personal preference, not a requirement. Should I have marked it as "(optional)" or something like that?

dblaikie added inline comments.Jan 23 2023, 1:39 PM
llvm/include/llvm/ADT/bit.h
70

Are there examples of this failing?

I didn't do actual testing, but this comment suggests that MSVC fails to recognize 64-bit version even when the function is not inlined.

Yeah, I'd be willing to have slightly worse perf on MSVC really - now that we have clang-cl, a bootstrap would recover the performance, I guess.

Generally we rely on "sufficiently" well-performing compilers (specifically/especially sometimes as aggressively as "clang with LTO") to get good performance out of LLVM, so if Clang gets this right, that's probably enough justification to remove the special case.

It is possible that expanded version may even perform better than the "builtin" version in some cases. My judgement was only relying on the fact that there are these builtins, so there must have been a good reason for introducing them.

There's probably loads of history there - perhaps some folks need it for correctness, maybe in the past compilers couldn't optimize it as well as they can now.

If not, I'd prefer not to maintain #ifdef'd codepaths that can diverge/introduce build failures that are harder to test (because they don't show up for some people, etc).

It is hard to disagree with it. I guess this particular change could rely on compiler optimizations (compared to e.g. popcount below), and that's why I only expressed my personal preference, not a requirement. Should I have marked it as "(optional)" or something like that?

(I think even popcount can be pretty well identified/optimized by clang these days?)

Eh, no worries - not enough I'd bother writing a patch now/arguing for changes later. Would've if this wasn't already committed, but just a thing/note for next time sort of thing.