This is an archive of the discontinued LLVM Phabricator instance.

[BitVector] Add << and >> operators
ClosedPublic

Authored by zturner on Apr 19 2017, 2:31 PM.

Details

Summary

BitVector does not currently provide any operators for shifting left and right by a specified amount. This patch adds them.

The logic here is surprisingly headache-inducing, but I believe I got it correct.

Diff Detail

Event Timeline

zturner created this revision.Apr 19 2017, 2:31 PM
craig.topper added inline comments.
llvm/include/llvm/ADT/BitVector.h
504

I didn't look closely but why would there be non-zero values on a right shift that need to be cleared?

zturner added inline comments.Apr 19 2017, 2:50 PM
llvm/include/llvm/ADT/BitVector.h
504

I think I can get rid of that, thanks for noticing.

chandlerc added inline comments.Apr 19 2017, 3:38 PM
llvm/include/llvm/ADT/BitVector.h
472

Instead of this, test for 0 and do the word shift and return early? That will let the complex case be less indented.

524

Same as above.

655–661

Reality is worse than this. Using memmove operates on bytes, but the words might be in bigendian and thus the bytes *within* the words ordered differently. =[

I think you should either write a for loop that does the move a word at a time (and thus is obviously endian agnostic) or you'll need to do some endian dance here.. =[

zturner added inline comments.Apr 19 2017, 3:43 PM
llvm/include/llvm/ADT/BitVector.h
655–661

My brain already hurts from this, so I give the odds of me being completely wrong about 100%, but does this matter?

Although memmove moves bytes, the function itself moves a number of bytes that is always a multiple of the word size, and the move happens at a word-aligned boundary, so we should be guaranteed that we're just picking up a chunk of words in whatever endianness they may already be in, and moving that same chunk somewhere else in the same endianness.

shouldn't that work?

craig.topper added inline comments.Apr 19 2017, 3:45 PM
llvm/include/llvm/ADT/BitVector.h
655–661

Aren't we always moving a multiple of words with the memmove which should hide the endianess issue? Richard Smith just added a memmove to the equivalent shift code in APInt so I really hope memmove isn't an issue here.

chandlerc added inline comments.Apr 19 2017, 3:51 PM
llvm/include/llvm/ADT/BitVector.h
655–661

Sorry, I got confused by the predicate above that talks about BitDistance and thought this was the *byte* distance not *word* distance despite the number of times it says "word".

Maybe just an explicit comment here reminding the reader that this is safe because we move in increments of words.

zturner updated this revision to Diff 95843.Apr 19 2017, 4:00 PM

Deleted some miscellaneous functions in MathExtras.h that I ended up not using, and added some comments to the implementation to clarify a few minor points.

zturner updated this revision to Diff 95847.Apr 19 2017, 4:04 PM

Reduced indentation level a little bit.

chandlerc accepted this revision.Apr 19 2017, 5:51 PM

Generally this looks pretty awesome. Thanks for all of this. Minor nitpicks inline. I'd let Craig have a look as well just in case I'm not seeing anything, but good to submit if he's happy as well.

llvm/include/llvm/ADT/BitVector.h
499

Nit picky detail: I would use 'i' for an integer and 'I' for an iterator.

556

Why is this needed here again? We have to have put zeros into the low bits here as well...

This revision is now accepted and ready to land.Apr 19 2017, 5:51 PM
zturner added inline comments.Apr 20 2017, 9:01 AM
llvm/include/llvm/ADT/BitVector.h
556

Right, but it's actually the high order bits that we're concerned about. If your BitVector size is not a multiple of the word size, then the unused bits come from the most siginificant (i.e. leftmost) bits of the most significant word. And shifting left will write non-zero bits into those. Concrete example:

Suppose you've got this BitVector:

31--------------------------------0
XXXXXXXX XXXXXXXX XXXX0011 00001100

Where the X's represent unused bits. So here we've got a word size of 4 bytes, and a BitVector size of 12.

In memory this is going to be represented as

Bits[0] = 0b00000000000000000000001100001100

because the unused bits will be initialized to 0.

Now suppose we shift left by 10. You're going to end up with 0b00000000000011000011 | 000000000000 where I've used an | to indicate where the unused bits are supposed to be. Everything to the left of the | are unused bits and need to be cleared.

This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Apr 20 2017, 9:37 AM
hans added inline comments.
llvm/trunk/include/llvm/ADT/BitVector.h
478 ↗(On Diff #95967)

This sounds like exactly what APInt:tcShiftRight does. Would it be possible to reuse that code?

In general, it seems like a lot of what BitVector does is also implemented in APInt. Craig, what do you think?

BitVector's word size is dependent on target as it is using "unsigned long" not uint32_t/uint64_t. tcShiftRight is hardcoded to uint64_t though we could probably make it a template.