Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one
"active byte", i.e. all active bits are contained in boundaries
of a single byte of x.
https://alive2.llvm.org/ce/z/nvbbU5
https://alive2.llvm.org/ce/z/KiiL3J
Differential D117680
[InstCombine] Simplify bswap -> shift chfast on Jan 19 2022, 7:22 AM. Authored by
Details Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one https://alive2.llvm.org/ce/z/nvbbU5
Diff Detail
Event Timeline
Comment Actions This is one of several related patterns where we mask/shift a single byte from an end of the input: define i32 @bs_shl32(i32 %0) { %2 = shl i32 %0, 24 %3 = call i32 @llvm.bswap.i32(i32 %2) ret i32 %3 } define i32 @bs_lshr32(i32 %0) { %2 = lshr i32 %0, 24 %3 = call i32 @llvm.bswap.i32(i32 %2) ret i32 %3 } define i32 @bs_and32_000000ff(i32 %x) { %m = and i32 %x, 255 %b = call i32 @llvm.bswap.i32(i32 %m) ret i32 %b } define i32 @bs_and32_ff000000(i32 %x) { %m = and i32 %x, -16777216 %b = call i32 @llvm.bswap.i32(i32 %m) ret i32 %b } It might help to see a larger, motivating example in source or IR, so we know what a potentially more general solution would look like. Comment Actions Can we do computeKnownBits().countMaxActiveBits() <= 8 -> replace with shl. If (BitWidth - computeKnownBits().countMaxTrailingZeros()) <= 8 -> replace with lshr? If the input to the bswap happens to be a shift in the other direction, the new shift should be combined with it by existing combines to form an And. Comment Actions Hi @spatel, My motivation is rather boring. I have a generic data -> integer loader. The data is in big-endian order. For i32 example the data can be from 1 to 4 bytes in size. Here is the generic template which handles all 4 cases: The load<1> has unnecessary bswap, there zext(*data) should be enough. I can implement other proposed matches on IR and CodeGen levels if you think this is good idea. Comment Actions If we're after a more general mechanism, it might be possible to extend llvm::recognizeBSwapOrBitReverseIdiom() to recognise shift/mask patterns as well - the inner collectBitParts already handles bswap etc.
Comment Actions Thanks! I guessed there was more than just this one potential optimization, but that makes it clearer. Comment Actions That wouldn't help with the (bswap (and)) though would it? But my computeKnownBits suggestion would allow the bswap to be reduced to a shift. Comment Actions Right - knownbits would give us more flexibility on the single byte cases, but it wouldn't do anything for the patterns that deal with >1 byte? There's some overlap, but these could be independent patches. I was hoping that pushing the shift to the end could trigger a narrowing transform on the bswap, but we miss that too: define i32 @_Z4loadILj3EEjPKh(i8* noundef %0) { %2 = bitcast i8* %0 to i24* %3 = load i24, i24* %2, align 1 %4 = zext i24 %3 to i32 %5 = call i32 @llvm.bswap.i32(i32 %4) ; can we do anything with this? %6 = lshr exact i32 %5, 8 ret i32 %6 } define i32 @_Z4loadILj2EEjPKh(i8* noundef %0) { %2 = bitcast i8* %0 to i16* %3 = load i16, i16* %2, align 1 %4 = zext i16 %3 to i32 %5 = call i32 @llvm.bswap.i32(i32 %4) ; bswap.i16 %6 = lshr exact i32 %5, 16 ret i32 %6 } Comment Actions [InstCombine] Simplify bswap -> shift Simplify bswap(x) to shl(x) or lshr(x) if x has at most 8 (byte-size) Comment Actions I did this, except I used BitWidth - computeKnownBits().countMinTrailingZeros() witch I believe is the correct one.
Comment Actions Implemented extended variant which handles "active byte" at any position, suggested by @craig.topper. Comment Actions @spatel, can you also review this? Should I land 2 commits: one adding tests, second with the implementation and test updates? Comment Actions I think the commit message needs to be modified after the improved change. It's now longer low/high active bits. Comment Actions LGTM
Yes, the patch to add tests should be labeled "NFC". Just in case this patch is reverted, the tests can remain in tree.
Comment Actions I have noticed this post about undef/poison: https://llvm.discourse.group/t/evolution-of-undef-and-poison-over-time/5917/2. |
Can this go further.
Something like
Adapted from the SimplifyDemandedBits for bswap like https://reviews.llvm.org/D117508