This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Use __libcpp_clz for a tighter __log2i function
ClosedPublic

Authored by hans on May 19 2022, 2:49 AM.

Details

Summary

While looking at D113413 I noticed that __log2i could perhaps be improved to be both slightly smaller and faster.

I'm not familiar with how to run benchmarks for this if there are any, but https://godbolt.org/z/48KT3z5Tj looks like an improvement to me.

Diff Detail

Event Timeline

hans created this revision.May 19 2022, 2:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 2:49 AM
hans requested review of this revision.May 19 2022, 2:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 2:49 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
ayzhao added a subscriber: ayzhao.May 19 2022, 9:46 AM
nilayvaish requested changes to this revision.May 21 2022, 12:31 PM

Can you take a look at the failed test?

libcxx/include/__algorithm/sort.h
506

I am wondering whether CHAR_BIT is preferred over using CHAR_BIT (available from climit header file). I don't know what the convention is for writing library code.

511–517

Instead of retaining this, should we just have a static_assert() on the size of the type being less or equal to that of unsigned long long?

This revision now requires changes to proceed.May 21 2022, 12:31 PM
philnik requested changes to this revision.May 21 2022, 12:38 PM
philnik added a subscriber: philnik.

The CI failue look unrelated, but I can't recall any problems with the AArch64 CI. Just rebase it and try again. I don't think there are any benchmarks, but it would be nice if you could provide some in libcxx/benchmarks. I'd also like to see benchmark results before approving.

libcxx/include/__algorithm/sort.h
506

I think CHAR_BIT is the preferred way. At least that's what I'm using.

511–517

We support __int128, so this needs to stay.

philnik retitled this revision from Use __libcpp_clz for a tighter __log2i function to [libc++] Use __libcpp_clz for a tighter __log2i function.May 21 2022, 12:40 PM
hans marked 4 inline comments as done.May 24 2022, 7:27 AM

Here are some benchmark numbers (requires D126297). I ran build.release/projects/libcxx/benchmarks/sort.libcxx.out --benchmark_filter=BM_Sort_uint32_QuickSortAdversary*.

Before my patch:

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1            4.51 ns         4.51 ns    154664960
BM_Sort_uint32_QuickSortAdversary_4            1.97 ns         1.97 ns    357040128
BM_Sort_uint32_QuickSortAdversary_16          0.941 ns        0.941 ns    742129664
BM_Sort_uint32_QuickSortAdversary_64           11.0 ns         11.0 ns     63176704
BM_Sort_uint32_QuickSortAdversary_256          18.1 ns         18.1 ns     38535168
BM_Sort_uint32_QuickSortAdversary_1024         33.1 ns         33.1 ns     21233664
BM_Sort_uint32_QuickSortAdversary_16384        49.4 ns         49.4 ns     14155776
BM_Sort_uint32_QuickSortAdversary_262144       59.9 ns         59.9 ns     11796480

After my patch:

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1            4.81 ns         4.81 ns    145227776
BM_Sort_uint32_QuickSortAdversary_4            1.82 ns         1.82 ns    379322368
BM_Sort_uint32_QuickSortAdversary_16          0.849 ns        0.849 ns    820772864
BM_Sort_uint32_QuickSortAdversary_64           11.6 ns         11.6 ns     60293120
BM_Sort_uint32_QuickSortAdversary_256          19.5 ns         19.5 ns     36175872
BM_Sort_uint32_QuickSortAdversary_1024         36.0 ns         36.0 ns     19660800
BM_Sort_uint32_QuickSortAdversary_16384        53.2 ns         53.2 ns     13369344
BM_Sort_uint32_QuickSortAdversary_262144       63.7 ns         63.7 ns     11272192

Looks like some numbers are up and some are down. I'd say this is perf neutral.

But it does have a small effect on size. The text of sort.libcxx.out goes from 509310 to 508542 bytes, so 768 bytes saved.

I also tried this on a "release" build of Chrome (without ThinLTO or PGO, so doesn't exactly match what we ship), and the text went from 259288968 to 259282312 bytes, so 6.5 KB saved.

libcxx/include/__algorithm/sort.h
506

It seems both are used roughly as often.

$ git grep -In [^_]CHAR_BIT -- libcxx/include | wc -l
25
$ git grep -In __CHAR_BIT__ -- libcxx/include | wc -l
22

Given that __CHAR_BIT__ avoids potentially pushing another header include on every user of <sort>, I'd prefer to stay with it.

I'd also like to see a clean CI run.

libcxx/include/__algorithm/sort.h
506

I think that's manageable. Preprocessing <climits> produces 104 LOC while preprocessing <algorithm> produces 32888 LOC. That's not even a drop in the ocean. And climits is already included anyways. Using CHAR_BIT just makes the code a bit nicer to read. IMO we should actually replace the uses of __CHAR_BIT__. If you are worried about compile times you should use -fmodules.

hans updated this revision to Diff 431705.May 24 2022, 9:58 AM
hans marked an inline comment as done.

Rebase to kick the CI.

hans added inline comments.May 25 2022, 7:08 AM
libcxx/include/__algorithm/sort.h
506

I suppose if <climits> is already included, it doesn't matter in this instance. I'll leave it to someone else to decide whether CHAR_BIT or __CHAR_BIT__ should be preferred in libc++ in general.

If you are worried about compile times you should use -fmodules.

If only things were that simple :-)

hans updated this revision to Diff 431973.May 25 2022, 7:08 AM

Switch to CHAR_BIT from <climits>.

Please take another look.

nilayvaish accepted this revision.May 25 2022, 9:03 PM
hans added a comment.May 27 2022, 8:57 AM

@philnik: Does it look good to you to?

philnik accepted this revision.May 27 2022, 9:03 AM

Yes.

This revision is now accepted and ready to land.May 27 2022, 9:03 AM
This revision was automatically updated to reflect the committed changes.