This is an archive of the discontinued LLVM Phabricator instance.

[llvm][ADT] Implement `BitVector::make`
AcceptedPublic

Authored by jansvoboda11 on Jan 12 2022, 7:02 AM.

Details

Summary

LLVM Programmer’s Manual strongly discourages the use of std::vector<bool> and suggests llvm::BitVector as a possible replacement.

Currently, there's no easy way to create llvm::BitVector with statically known bit pattern. Absence of convenient functionality like this may sway people to use std::vector<bool> instead.

This patch adds new std::initializer_list-based named constructor that can serve this purpose.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Jan 12 2022, 7:02 AM
jansvoboda11 requested review of this revision.Jan 12 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2022, 7:02 AM
jansvoboda11 edited the summary of this revision. (Show Details)Jan 12 2022, 7:03 AM
dexonsmith added a subscriber: dblaikie.

It might be awkward to add this new ambiguity. I wonder if we really want to copy that from std::vector<bool>, which doesn't have a great API for bitsets anyway. @dblaikie, any thoughts?

I'm curious which current users of std::vector<bool> (if any) need this. An alternative would be to add a static function with a good name:

class BitVector {
  static BitVector getBits(std::initializer_list<bool> Values);
};
// later:
BitVector BV = BitVector::getBits({false, true});

Or maybe getFromInitializerList() would be more clear.

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
419–422 ↗(On Diff #399318)

I assume this (and other changes here) are because there's a new constructor ambiguity?

It might be awkward to add this new ambiguity. I wonder if we really want to copy that from std::vector<bool>, which doesn't have a great API for bitsets anyway. @dblaikie, any thoughts?

Re: the ambiguity: We could pre-empt that in a separate commit by making the ctor "explicit" - honestly I think we shouldn't be using {} to call user-defined ctors that aren't representative of/"Feel like" conversions. (wouldn't mind that as a broad style statement for LLVM and/or in the form of "mark multi-arg ctors explicit unless they're intentionally converting/structural constructors".

What aspect of "not greatness" API design do you mean - having both the (unsigned bool) and init list ctors at the same time? I think with the former being marked as explicit it's /probably/ OK to me? () does "interesting" things, {} represents the value in some sense.

I wouldn't object to a named ctor either, though. (for either case - though I sort of feel like the existing (unsigned, bool) is the one that could use a name moreso than the new one (if the new one is only via init list, at least - if you could spell the new one as "SV(true, false, true, true, ...)" yeah, maybe having a name would be good))

jansvoboda11 marked an inline comment as done.Jan 13 2022, 2:54 AM

It might be awkward to add this new ambiguity. I wonder if we really want to copy that from std::vector<bool>, which doesn't have a great API for bitsets anyway. @dblaikie, any thoughts?

Since people are used to such constructors from the standard library and ADT as well (e.g. SmallVector), I think it would be good to maintain consistency.

Though I'm okay with static factory function... I like BitVector::make({true, false, true}).

I'm curious which current users of std::vector<bool> (if any) need this.

From a quick search, I can see two users at this moment: clang/unittests/Lex/HeaderSearchTest.cpp and flang/unittests/Runtime/Reduction.cpp. Yes, those are "only" unit tests, but I can see myself choosing std::vector<bool> over llvm::BitVector just because it's much easier to match/compare in tests. The last thing I want to do in unit tests is to fight with the API of some container.

Re: the ambiguity: We could pre-empt that in a separate commit by making the ctor "explicit" - honestly I think we shouldn't be using {} to call user-defined ctors that aren't representative of/"Feel like" conversions. (wouldn't mind that as a broad style statement for LLVM and/or in the form of "mark multi-arg ctors explicit unless they're intentionally converting/structural constructors".

Maybe I'm misunderstanding you, but the existing constructor already is marked as explicit and that doesn't make brace initialization invalid. Did you have something else in mind?

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
419–422 ↗(On Diff #399318)

That's correct. Without these changes, the new constructor would be called instead, implicitly converting the unsigned argument to bool.

It might be awkward to add this new ambiguity. I wonder if we really want to copy that from std::vector<bool>, which doesn't have a great API for bitsets anyway. @dblaikie, any thoughts?

Since people are used to such constructors from the standard library and ADT as well (e.g. SmallVector), I think it would be good to maintain consistency.

My general point is that for a bitset data structure, std::bitset and boost::dynamic_bitset are more compelling data structures for matching APIs than is std::vector<bool>. (BTW, I agree supporting unit tests well is important; wasn't intending to argue otherwise.)

What aspect of "not greatness" API design do you mean - having both the (unsigned bool) and init list ctors at the same time? I think with the former being marked as explicit it's /probably/ OK to me? () does "interesting" things, {} represents the value in some sense.

Yeah, it's the ambiguity on {} that seemed awkward to me; and maybe it's not that bad after all. As long as the interesting behaviour on () is not going to change the meaning of existing code then probably it's okay as-is.

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
419–422 ↗(On Diff #399318)

Does this simpler change also work?

Re: the ambiguity: We could pre-empt that in a separate commit by making the ctor "explicit" - honestly I think we shouldn't be using {} to call user-defined ctors that aren't representative of/"Feel like" conversions. (wouldn't mind that as a broad style statement for LLVM and/or in the form of "mark multi-arg ctors explicit unless they're intentionally converting/structural constructors".

Maybe I'm misunderstanding you, but the existing constructor already is marked as explicit and that doesn't make brace initialization invalid. Did you have something else in mind?

Right right - nevermind me there. Not sure what I was thinking. Thanks for the correction.

I don't have super strong feelings here - but yeah, does seem a bit close. What're the other APIs like by comparison that @dexonsmith has in mind/is suggesting as alternatives? Maybe a quick summary of how similar operations are spelled with different APIs might be good to consider what way we should go.

What're the other APIs like by comparison that @dexonsmith has in mind/is suggesting as alternatives? Maybe a quick summary of how similar operations are spelled with different APIs might be good to consider what way we should go.

I didn't find any data structures in LLVM/Clang with static member functions that take std::initializer_list and create new instance.


Both std::bitset and boost::dynamic_bitset have constructors that take std::string:

std::bitset<8> x(std::string("01010101"))

Even if we used StringRef instead, I don't like that: it requires extra input validation.


Additionally, std::bitset supports unsigned constructor:

std::bitset<8> x(0b01010101)

That conflicts with our existing (unsigned s, bool t = false) constructor though.


Another alternative would be a free factory function, similar to make_pair, make_optional, makeIntrusiveRefCnt etc.:

auto x = makeBitVector({true, false, true, false, true})

I'm starting to think the most sensible approach would be either llvm::BitVector::make({true, false, true}) or llvm::makeBitVector({true, false, true}).
While I was able to resolve the constructor ambiguity upstream, downstream projects that call llvm::BitVector{unsigned, bool} might get into trouble (i.e. hard to track down bugs) when they update to newer LLVM versions.

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
419–422 ↗(On Diff #399318)

No, this is a default member initializer, which can only be brace or equals initializer.

What're the other APIs like by comparison that @dexonsmith has in mind/is suggesting as alternatives? Maybe a quick summary of how similar operations are spelled with different APIs might be good to consider what way we should go.

I didn't find any data structures in LLVM/Clang with static member functions that take std::initializer_list and create new instance.


Both std::bitset and boost::dynamic_bitset have constructors that take std::string:

std::bitset<8> x(std::string("01010101"))

Even if we used StringRef instead, I don't like that: it requires extra input validation.


Additionally, std::bitset supports unsigned constructor:

std::bitset<8> x(0b01010101)

That conflicts with our existing (unsigned s, bool t = false) constructor though.


Another alternative would be a free factory function, similar to make_pair, make_optional, makeIntrusiveRefCnt etc.:

auto x = makeBitVector({true, false, true, false, true})

I'm starting to think the most sensible approach would be either llvm::BitVector::make({true, false, true}) or llvm::makeBitVector({true, false, true}).
While I was able to resolve the constructor ambiguity upstream, downstream projects that call llvm::BitVector{unsigned, bool} might get into trouble (i.e. hard to track down bugs) when they update to newer LLVM versions.

Yeah, thanks for the summary!

Yeah, seems there's enough existing users of the (unsigned, bool) API that without making it invalid it'd be hard to restructure things to get in a better state, so probably a "named constructor"/factory function would be the best thing at the moment.

Thanks for doing the legwork; SGTM too.

Use named constructor instead

jansvoboda11 retitled this revision from [llvm][ADT] Implement `BitVector(std::initializer_list)` to [llvm][ADT] Implement `BitVector::make`.Jan 21 2022, 4:32 AM
jansvoboda11 edited the summary of this revision. (Show Details)
dblaikie accepted this revision.Jan 21 2022, 9:34 AM

Looks like this addresses @dexonsmith's concerns & sounds good to me.

This revision is now accepted and ready to land.Jan 21 2022, 9:34 AM
dexonsmith accepted this revision.Jan 21 2022, 3:15 PM

LGTM too.