This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
2/5
InstCombineCalls.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
3/7
bswap-fold.ll

Differential D117680

[InstCombine] Simplify bswap -> shift
ClosedPublic

Authored by chfast on Jan 19 2022, 7:22 AM.

Download Raw Diff

Details

Reviewers

RKSimon
spatel
craig.topper
lebedev.ri

Commits

rG1d7604fdcebd: [InstCombine] Simplify bswap -> shift

Summary

Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one
"active byte", i.e. all active bits are contained in boundaries
of a single byte of x.

https://alive2.llvm.org/ce/z/nvbbU5
https://alive2.llvm.org/ce/z/KiiL3J

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

chfast created this revision.Jan 19 2022, 7:22 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptJan 19 2022, 7:22 AM

chfast requested review of this revision.Jan 19 2022, 7:22 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2022, 7:22 AM

chfast added reviewers: RKSimon, spatel, xbolva00, reames.Jan 19 2022, 7:44 AM

chfast added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1242	To my understanding, LLVM assumes 8 bit bytes. I was not able to find any constant as the replacement for `8`.

This is one of several related patterns where we mask/shift a single byte from an end of the input:
https://alive2.llvm.org/ce/z/dtHMVr

define i32 @bs_shl32(i32 %0) {
  %2 = shl i32 %0, 24
  %3 = call i32 @llvm.bswap.i32(i32 %2)
  ret i32 %3
}

define i32 @bs_lshr32(i32 %0) {
  %2 = lshr i32 %0, 24
  %3 = call i32 @llvm.bswap.i32(i32 %2)
  ret i32 %3
}

define i32 @bs_and32_000000ff(i32 %x) {
  %m = and i32 %x, 255
  %b = call i32 @llvm.bswap.i32(i32 %m)
  ret i32 %b
}

define i32 @bs_and32_ff000000(i32 %x) {
  %m = and i32 %x, -16777216
  %b = call i32 @llvm.bswap.i32(i32 %m)
  ret i32 %b
}

It might help to see a larger, motivating example in source or IR, so we know what a potentially more general solution would look like.
The backend also misses these patterns, so we may want to mimic whatever we do in IR for codegen. I recently added a similar fold with D117508.

In D117680#3255028, @spatel wrote:
This is one of several related patterns where we mask/shift a single byte from an end of the input:
https://alive2.llvm.org/ce/z/dtHMVr
define i32 @bs_shl32(i32 %0) {
  %2 = shl i32 %0, 24
  %3 = call i32 @llvm.bswap.i32(i32 %2)
  ret i32 %3
}

define i32 @bs_lshr32(i32 %0) {
  %2 = lshr i32 %0, 24
  %3 = call i32 @llvm.bswap.i32(i32 %2)
  ret i32 %3
}

define i32 @bs_and32_000000ff(i32 %x) {
  %m = and i32 %x, 255
  %b = call i32 @llvm.bswap.i32(i32 %m)
  ret i32 %b
}

define i32 @bs_and32_ff000000(i32 %x) {
  %m = and i32 %x, -16777216
  %b = call i32 @llvm.bswap.i32(i32 %m)
  ret i32 %b
}
It might help to see a larger, motivating example in source or IR, so we know what a potentially more general solution would look like.
The backend also misses these patterns, so we may want to mimic whatever we do in IR for codegen. I recently added a similar fold with D117508.

Can we do computeKnownBits().countMaxActiveBits() <= 8 -> replace with shl. If (BitWidth - computeKnownBits().countMaxTrailingZeros()) <= 8 -> replace with lshr? If the input to the bswap happens to be a shift in the other direction, the new shift should be combined with it by existing combines to form an And.

Hi @spatel,

My motivation is rather boring. I have a generic data -> integer loader. The data is in big-endian order. For i32 example the data can be from 1 to 4 bytes in size. Here is the generic template which handles all 4 cases:
https://godbolt.org/z/dPcGjvb94

The load<1> has unnecessary bswap, there zext(*data) should be enough.
There may be some optimization opportunities in load<3> but I did not investigated this in detail.

I can implement other proposed matches on IR and CodeGen levels if you think this is good idea.

Harbormaster completed remote builds in B144292: Diff 401232.Jan 19 2022, 9:16 AM

If we're after a more general mechanism, it might be possible to extend llvm::recognizeBSwapOrBitReverseIdiom() to recognise shift/mask patterns as well - the inner collectBitParts already handles bswap etc.

llvm/test/Transforms/InstCombine/bswap-fold.ll
386	We don't gain much from duplicating tests for different bitwidths like this - one scalar and one vector is typically enough, possibly with some basic additional multiuse and vector shift-amounts-with-undef tests as well

In D117680#3255184, @chfast wrote:

My motivation is rather boring. I have a generic data -> integer loader. The data is in big-endian order. For i32 example the data can be from 1 to 4 bytes in size. Here is the generic template which handles all 4 cases:
https://godbolt.org/z/dPcGjvb94

Thanks! I guessed there was more than just this one potential optimization, but that makes it clearer.
Pushing a logical shift after the bswap (and reversing direction) might get us most of what we need:
https://alive2.llvm.org/ce/z/2zveR6
That should allow the existing demanded bits fold to trigger in the simplest cases if I'm seeing it correctly.

In D117680#3255573, @spatel wrote:

In D117680#3255184, @chfast wrote:

My motivation is rather boring. I have a generic data -> integer loader. The data is in big-endian order. For i32 example the data can be from 1 to 4 bytes in size. Here is the generic template which handles all 4 cases:
https://godbolt.org/z/dPcGjvb94

Thanks! I guessed there was more than just this one potential optimization, but that makes it clearer.
Pushing a logical shift after the bswap (and reversing direction) might get us most of what we need:
https://alive2.llvm.org/ce/z/2zveR6
That should allow the existing demanded bits fold to trigger in the simplest cases if I'm seeing it correctly.

That wouldn't help with the (bswap (and)) though would it? But my computeKnownBits suggestion would allow the bswap to be reduced to a shift.

In D117680#3255612, @craig.topper wrote:

In D117680#3255573, @spatel wrote:

Pushing a logical shift after the bswap (and reversing direction) might get us most of what we need:
https://alive2.llvm.org/ce/z/2zveR6
That should allow the existing demanded bits fold to trigger in the simplest cases if I'm seeing it correctly.

That wouldn't help with the (bswap (and)) though would it? But my computeKnownBits suggestion would allow the bswap to be reduced to a shift.

Right - knownbits would give us more flexibility on the single byte cases, but it wouldn't do anything for the patterns that deal with >1 byte? There's some overlap, but these could be independent patches.

I was hoping that pushing the shift to the end could trigger a narrowing transform on the bswap, but we miss that too:

define i32 @_Z4loadILj3EEjPKh(i8* noundef %0) {
  %2 = bitcast i8* %0 to i24*
  %3 = load i24, i24* %2, align 1
  %4 = zext i24 %3 to i32
  %5 = call i32 @llvm.bswap.i32(i32 %4) ; can we do anything with this?
  %6 = lshr exact i32 %5, 8
  ret i32 %6
}

define i32 @_Z4loadILj2EEjPKh(i8* noundef %0) {
  %2 = bitcast i8* %0 to i16*
  %3 = load i16, i16* %2, align 1
  %4 = zext i16 %3 to i32
  %5 = call i32 @llvm.bswap.i32(i32 %4) ; bswap.i16
  %6 = lshr exact i32 %5, 16
  ret i32 %6
}

[InstCombine] Simplify bswap -> shift

Simplify bswap(x) to shl(x) or lshr(x) if x has at most 8 (byte-size)
low/high active bits.

chfast retitled this revision from [InstCombine] Fold bswap(shl(x, C)) -> and(x, 255) to [InstCombine] Simplify bswap -> shift.Jan 20 2022, 7:08 AM

chfast edited the summary of this revision. (Show Details)

In D117680#3255155, @craig.topper wrote:

Can we do computeKnownBits().countMaxActiveBits() <= 8 -> replace with shl. If (BitWidth - computeKnownBits().countMaxTrailingZeros()) <= 8 -> replace with lshr?

I did this, except I used BitWidth - computeKnownBits().countMinTrailingZeros() witch I believe is the correct one.

llvm/test/Transforms/InstCombine/bswap-fold.ll
421	Any `undef` shift index or `and` argument prevents this optimization. Is this expected?
442	Here we replace bswap with `lshr i64 %2, 56`, but it is further expanded to shl i64 %0, 1 and i64 %3, 254

Harbormaster completed remote builds in B144576: Diff 401631.Jan 20 2022, 8:40 AM

craig.topper added inline comments.Jan 20 2022, 9:09 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

1221

Can this go further.

Something like

unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8);
unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8);

if (BitWidth - TZ - LZ == 8) {
  if ((BitWidth - LZ - 8) > TZ)
    ShiftRight by ((BitWidth - LZ - 8)  - TZ)
  else
    ShiftLeft by (TZ - (BitWidth - LZ - 8))
}

Adapted from the SimplifyDemandedBits for bswap like https://reviews.llvm.org/D117508

craig.topper added inline comments.Jan 20 2022, 9:21 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

1221

Oops that should be

unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8);
unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8);

if (BitWidth - TZ - LZ == 8) {
  if ((BitWidth - TZ - 8) > TZ)
    ShiftRight by ((BitWidth - TZ - 8)  - TZ)
  else
    ShiftLeft by (TZ - (BitWidth - TZ - 8))
}

spatel mentioned this in rG2d031ec5e53f: [InstCombine] add one-use check to opposite shift folds.Jan 20 2022, 10:49 AM

spatel added inline comments.Jan 20 2022, 10:53 AM

llvm/test/Transforms/InstCombine/bswap-fold.ll
442	I added one-use checks with: 2d031ec5e53f ...so this shouldn't have an extra instruction now. Please pre-commit the tests with the baseline CHECKs, so we just show diffs in this patch.

craig.topper added inline comments.Jan 20 2022, 10:53 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1221	Or even simpler. unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8); if (BitWidth - TZ - LZ == 8) { if (LZ > TZ) ShiftRight by (LZ - TZ) else ShiftLeft by (TZ - LZ) } Wish I could amend or delete my previous comments.
llvm/test/Transforms/InstCombine/bswap-fold.ll
421	I didn't think of that, but it is correct for computeKnownBits. The undef doesn't allow computeKnownBits to determine anything. I don't think I would worry too much about it. @spatel or @lebedev.ri what do you think?

chfast added inline comments.Jan 20 2022, 12:11 PM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1221	Yes, I ended up with the same simplification on paper. Although I think you still have the shift kinds swapped.
llvm/test/Transforms/InstCombine/bswap-fold.ll
442	I added one-use checks with: 2d031ec5e53f ...so this shouldn't have an extra instruction now. Nice Please pre-commit the tests with the baseline CHECKs, so we just show diffs in this patch. Will do.

Implemented extended variant which handles "active byte" at any position, suggested by @craig.topper.

Harbormaster completed remote builds in B144681: Diff 401773.Jan 20 2022, 1:52 PM

LGTM

This revision is now accepted and ready to land.Jan 20 2022, 1:59 PM

chfast removed reviewers: xbolva00, reames, lebedev.ri.Jan 20 2022, 2:05 PM

@spatel, can you also review this?

Should I land 2 commits: one adding tests, second with the implementation and test updates?

https://alive2.llvm.org/ce/z/nvbbU5
https://alive2.llvm.org/ce/z/KiiL3J

In D117680#3259685, @chfast wrote:

Should I land 2 commits: one adding tests, second with the implementation and test updates?

Yes.

I think the commit message needs to be modified after the improved change. It's now longer low/high active bits.

chfast edited the summary of this revision. (Show Details)Jan 20 2022, 2:23 PM

In D117680#3259685, @chfast wrote:

@spatel, can you also review this?

LGTM

Should I land 2 commits: one adding tests, second with the implementation and test updates?

Yes, the patch to add tests should be labeled "NFC". Just in case this patch is reverted, the tests can remain in tree.

llvm/test/Transforms/InstCombine/bswap-fold.ll
421	I doubt that vector bswap occurs enough to worry about, and the even rarer case with undefs shouldn't limit this patch.

Trigger rebuild.

This revision was landed with ongoing or failed builds.Jan 20 2022, 4:26 PM

Closed by commit rG1d7604fdcebd: [InstCombine] Simplify bswap -> shift (authored by chfast). · Explain Why

This revision was automatically updated to reflect the committed changes.

chfast added a commit: rG1d7604fdcebd: [InstCombine] Simplify bswap -> shift.

Harbormaster completed remote builds in B144704: Diff 401807.Jan 20 2022, 5:10 PM

I have noticed this post about undef/poison: https://llvm.discourse.group/t/evolution-of-undef-and-poison-over-time/5917/2.
Should I have added tests with poison instead of undef?

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstCombineCalls.cpp

15 lines

test/

Transforms/

InstCombine/

bswap-fold.ll

54 lines

Diff 401773

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 1,209 Lines • ▼ Show 20 Lines	if (Instruction *NewMinMax = factorizeMinMaxTree(II))
return NewMinMax;		return NewMinMax;

break;		break;
}		}
case Intrinsic::bswap: {		case Intrinsic::bswap: {
Value *IIOperand = II->getArgOperand(0);		Value *IIOperand = II->getArgOperand(0);
Value *X = nullptr;		Value *X = nullptr;

		KnownBits Known = computeKnownBits(IIOperand, 0, II);
		uint64_t LZ = alignDown(Known.countMinLeadingZeros(), 8);
		uint64_t TZ = alignDown(Known.countMinTrailingZeros(), 8);

		craig.topperUnsubmitted Not Done Reply Inline Actions Can this go further. Something like unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8); if (BitWidth - TZ - LZ == 8) { if ((BitWidth - LZ - 8) > TZ) ShiftRight by ((BitWidth - LZ - 8) - TZ) else ShiftLeft by (TZ - (BitWidth - LZ - 8)) } Adapted from the SimplifyDemandedBits for bswap like https://reviews.llvm.org/D117508 craig.topper: Can this go further. Something like ``` unsigned TZ = alignDown(Known.countMinTrailingZeros()…
		craig.topperUnsubmitted Not Done Reply Inline Actions Oops that should be unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8); if (BitWidth - TZ - LZ == 8) { if ((BitWidth - TZ - 8) > TZ) ShiftRight by ((BitWidth - TZ - 8) - TZ) else ShiftLeft by (TZ - (BitWidth - TZ - 8)) } craig.topper: Oops that should be ``` unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ…
		craig.topperUnsubmitted Not Done Reply Inline Actions Or even simpler. unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ = alignDown(Known.countMinLeadingZeros(), 8); if (BitWidth - TZ - LZ == 8) { if (LZ > TZ) ShiftRight by (LZ - TZ) else ShiftLeft by (TZ - LZ) } Wish I could amend or delete my previous comments. craig.topper: Or even simpler. ``` unsigned TZ = alignDown(Known.countMinTrailingZeros(), 8); unsigned LZ =…
		chfastAuthorUnsubmitted Done Reply Inline Actions Yes, I ended up with the same simplification on paper. Although I think you still have the shift kinds swapped. chfast: Yes, I ended up with the same simplification on paper. Although I think you still have the…
		// bswap(x) -> shift(x) if x has exactly one "active byte"
		if (Known.getBitWidth() - LZ - TZ == 8) {
		assert(LZ != TZ && "active byte cannot be in the middle");
		if (LZ > TZ) // -> shl(x) if the "active byte" is in the low part of x
		return BinaryOperator::CreateNUWShl(
		IIOperand, ConstantInt::get(IIOperand->getType(), LZ - TZ));
		// -> lshr(x) if the "active byte" is in the high part of x
		return BinaryOperator::CreateExactLShr(
		IIOperand, ConstantInt::get(IIOperand->getType(), TZ - LZ));
		}

// bswap(trunc(bswap(x))) -> trunc(lshr(x, c))		// bswap(trunc(bswap(x))) -> trunc(lshr(x, c))
if (match(IIOperand, m_Trunc(m_BSwap(m_Value(X))))) {		if (match(IIOperand, m_Trunc(m_BSwap(m_Value(X))))) {
unsigned C = X->getType()->getScalarSizeInBits() -		unsigned C = X->getType()->getScalarSizeInBits() -
IIOperand->getType()->getScalarSizeInBits();		IIOperand->getType()->getScalarSizeInBits();
Value *CV = ConstantInt::get(X->getType(), C);		Value *CV = ConstantInt::get(X->getType(), C);
Value *V = Builder.CreateLShr(X, CV);		Value *V = Builder.CreateLShr(X, CV);
return new TruncInst(V, IIOperand->getType());		return new TruncInst(V, IIOperand->getType());
}		}
break;		break;
}		}
		chfastAuthorUnsubmitted Done Reply Inline Actions To my understanding, LLVM assumes 8 bit bytes. I was not able to find any constant as the replacement for `8`. chfast: To my understanding, LLVM assumes 8 bit bytes. I was not able to find any constant as the…
case Intrinsic::masked_load:		case Intrinsic::masked_load:
if (Value SimplifiedMaskedOp = simplifyMaskedLoad(II))		if (Value SimplifiedMaskedOp = simplifyMaskedLoad(II))
return replaceInstUsesWith(CI, SimplifiedMaskedOp);		return replaceInstUsesWith(CI, SimplifiedMaskedOp);
break;		break;
case Intrinsic::masked_store:		case Intrinsic::masked_store:
return simplifyMaskedStore(*II);		return simplifyMaskedStore(*II);
case Intrinsic::masked_gather:		case Intrinsic::masked_gather:
return simplifyMaskedGather(*II);		return simplifyMaskedGather(*II);
▲ Show 20 Lines • Show All 2,096 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/bswap-fold.ll

Show First 20 Lines • Show All 352 Lines • ▼ Show 20 Lines	;
%t2 = and i64 %t1, 1000000001		%t2 = and i64 %t1, 1000000001
%t3 = mul i64 %t2, %t1 ; to increase use count of the bswap		%t3 = mul i64 %t2, %t1 ; to increase use count of the bswap
ret i64 %t3		ret i64 %t3
}		}


define i64 @bs_active_high8(i64 %0) {		define i64 @bs_active_high8(i64 %0) {
; CHECK-LABEL: @bs_active_high8(		; CHECK-LABEL: @bs_active_high8(
; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 56		; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 255
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: ret i64 [[TMP2]]
; CHECK-NEXT: ret i64 [[TMP3]]
;		;
%2 = shl i64 %0, 56		%2 = shl i64 %0, 56
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
ret i64 %3		ret i64 %3
}		}

define i32 @bs_active_high7(i32 %0) {		define i32 @bs_active_high7(i32 %0) {
; CHECK-LABEL: @bs_active_high7(		; CHECK-LABEL: @bs_active_high7(
; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP0:%.]], -33554432		; CHECK-NEXT: [[TMP2:%.]] = lshr i32 [[TMP0:%.]], 24
; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.bswap.i32(i32 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 254
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = and i32 %0, -33554432 ; 0xfe000000		%2 = and i32 %0, -33554432 ; 0xfe000000
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define <2 x i64> @bs_active_high4(<2 x i64> %0) {		define <2 x i64> @bs_active_high4(<2 x i64> %0) {
; CHECK-LABEL: @bs_active_high4(		; CHECK-LABEL: @bs_active_high4(
; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 60, i64 60>		; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 4, i64 4>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 240, i64 240>
; CHECK-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;		;
%2 = shl <2 x i64> %0, <i64 60, i64 60>		%2 = shl <2 x i64> %0, <i64 60, i64 60>
		RKSimonUnsubmitted Not Done Reply Inline Actions We don't gain much from duplicating tests for different bitwidths like this - one scalar and one vector is typically enough, possibly with some basic additional multiuse and vector shift-amounts-with-undef tests as well RKSimon: We don't gain much from duplicating tests for different bitwidths like this - one scalar and…
%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)		%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)
ret <2 x i64> %3		ret <2 x i64> %3
}		}

define <2 x i64> @bs_active_high_different(<2 x i64> %0) {		define <2 x i64> @bs_active_high_different(<2 x i64> %0) {
; CHECK-LABEL: @bs_active_high_different(		; CHECK-LABEL: @bs_active_high_different(
; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 56, i64 57>		; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 56, i64 57>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = lshr exact <2 x i64> [[TMP2]], <i64 56, i64 56>
; CHECK-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;		;
%2 = shl <2 x i64> %0, <i64 56, i64 57>		%2 = shl <2 x i64> %0, <i64 56, i64 57>
%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)		%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)
ret <2 x i64> %3		ret <2 x i64> %3
}		}

; negative test		; negative test
Show All 10 Lines

; negative test		; negative test
define <2 x i64> @bs_active_high_undef(<2 x i64> %0) {		define <2 x i64> @bs_active_high_undef(<2 x i64> %0) {
; CHECK-LABEL: @bs_active_high_undef(		; CHECK-LABEL: @bs_active_high_undef(
; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 56, i64 undef>		; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i64> [[TMP0:%.]], <i64 56, i64 undef>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> [[TMP2]])
; CHECK-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;		;
%2 = shl <2 x i64> %0, <i64 56, i64 undef>		%2 = shl <2 x i64> %0, <i64 56, i64 undef>
		chfastAuthorUnsubmitted Done Reply Inline Actions Any `undef` shift index or `and` argument prevents this optimization. Is this expected? chfast: Any `undef` shift index or `and` argument prevents this optimization. Is this expected?
		craig.topperUnsubmitted Not Done Reply Inline Actions I didn't think of that, but it is correct for computeKnownBits. The undef doesn't allow computeKnownBits to determine anything. I don't think I would worry too much about it. @spatel or @lebedev.ri what do you think? craig.topper: I didn't think of that, but it is correct for computeKnownBits. The undef doesn't allow…
		spatelUnsubmitted Not Done Reply Inline Actions I doubt that vector bswap occurs enough to worry about, and the even rarer case with undefs shouldn't limit this patch. spatel: I doubt that vector bswap occurs enough to worry about, and the even rarer case with undefs…
%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)		%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)
ret <2 x i64> %3		ret <2 x i64> %3
}		}

define i64 @bs_active_high8_multiuse(i64 %0) {		define i64 @bs_active_high8_multiuse(i64 %0) {
; CHECK-LABEL: @bs_active_high8_multiuse(		; CHECK-LABEL: @bs_active_high8_multiuse(
; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 56		; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 56
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP0]], 255
; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
; CHECK-NEXT: ret i64 [[TMP4]]		; CHECK-NEXT: ret i64 [[TMP4]]
;		;
%2 = shl i64 %0, 56		%2 = shl i64 %0, 56
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
%4 = mul i64 %2, %3 ; increase use of shl and bswap		%4 = mul i64 %2, %3 ; increase use of shl and bswap
ret i64 %4		ret i64 %4
}		}

define i64 @bs_active_high7_multiuse(i64 %0) {		define i64 @bs_active_high7_multiuse(i64 %0) {
; CHECK-LABEL: @bs_active_high7_multiuse(		; CHECK-LABEL: @bs_active_high7_multiuse(
; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 57		; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 57
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = lshr exact i64 [[TMP2]], 56
		chfastAuthorUnsubmitted Done Reply Inline Actions Here we replace bswap with `lshr i64 %2, 56`, but it is further expanded to shl i64 %0, 1 and i64 %3, 254 chfast: Here we replace bswap with `lshr i64 %2, 56`, but it is further expanded to ``` shl i64 %0, 1…
		spatelUnsubmitted Not Done Reply Inline Actions I added one-use checks with: 2d031ec5e53f ...so this shouldn't have an extra instruction now. Please pre-commit the tests with the baseline CHECKs, so we just show diffs in this patch. spatel: I added one-use checks with: 2d031ec5e53f ...so this shouldn't have an extra instruction now.
		chfastAuthorUnsubmitted Done Reply Inline Actions I added one-use checks with: 2d031ec5e53f ...so this shouldn't have an extra instruction now. Nice Please pre-commit the tests with the baseline CHECKs, so we just show diffs in this patch. Will do. chfast: > I added one-use checks with: > 2d031ec5e53f > ...so this shouldn't have an extra instruction…
; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
; CHECK-NEXT: ret i64 [[TMP4]]		; CHECK-NEXT: ret i64 [[TMP4]]
;		;
%2 = shl i64 %0, 57		%2 = shl i64 %0, 57
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
%4 = mul i64 %2, %3 ; increase use of shl and bswap		%4 = mul i64 %2, %3 ; increase use of shl and bswap
ret i64 %4		ret i64 %4
}		}

define i64 @bs_active_byte_6h(i64 %0) {		define i64 @bs_active_byte_6h(i64 %0) {
; CHECK-LABEL: @bs_active_byte_6h(		; CHECK-LABEL: @bs_active_byte_6h(
; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 280375465082880		; CHECK-NEXT: [[TMP2:%.]] = lshr i64 [[TMP0:%.]], 24
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP2]], 16711680
; CHECK-NEXT: ret i64 [[TMP3]]		; CHECK-NEXT: ret i64 [[TMP3]]
;		;
%2 = and i64 %0, 280375465082880 ; 0xff00'00000000		%2 = and i64 %0, 280375465082880 ; 0xff00'00000000
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
ret i64 %3		ret i64 %3
}		}

define i32 @bs_active_byte_3h(i32 %0) {		define i32 @bs_active_byte_3h(i32 %0) {
; CHECK-LABEL: @bs_active_byte_3h(		; CHECK-LABEL: @bs_active_byte_3h(
; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP0:%.]], 393216		; CHECK-NEXT: [[TMP2:%.]] = lshr i32 [[TMP0:%.]], 8
; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.bswap.i32(i32 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 1536
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = and i32 %0, 393216 ; 0x0006'0000		%2 = and i32 %0, 393216 ; 0x0006'0000
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define <2 x i32> @bs_active_byte_3h_v2(<2 x i32> %0) {		define <2 x i32> @bs_active_byte_3h_v2(<2 x i32> %0) {
; CHECK-LABEL: @bs_active_byte_3h_v2(		; CHECK-LABEL: @bs_active_byte_3h_v2(
; CHECK-NEXT: [[TMP2:%.]] = and <2 x i32> [[TMP0:%.]], <i32 8388608, i32 65536>		; CHECK-NEXT: [[TMP2:%.]] = and <2 x i32> [[TMP0:%.]], <i32 8388608, i32 65536>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = lshr exact <2 x i32> [[TMP2]], <i32 8, i32 8>
; CHECK-NEXT: ret <2 x i32> [[TMP3]]		; CHECK-NEXT: ret <2 x i32> [[TMP3]]
;		;
%2 = and <2 x i32> %0, <i32 8388608, i32 65536> ; 0x0080'0000, 0x0001'0000		%2 = and <2 x i32> %0, <i32 8388608, i32 65536> ; 0x0080'0000, 0x0001'0000
%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)		%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)
ret <2 x i32> %3		ret <2 x i32> %3
}		}

; negative test		; negative test
define i64 @bs_active_byte_78h(i64 %0) {		define i64 @bs_active_byte_78h(i64 %0) {
; CHECK-LABEL: @bs_active_byte_78h(		; CHECK-LABEL: @bs_active_byte_78h(
; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 108086391056891904		; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 108086391056891904
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])
; CHECK-NEXT: ret i64 [[TMP3]]		; CHECK-NEXT: ret i64 [[TMP3]]
;		;
%2 = and i64 %0, 108086391056891904 ; 0x01800000'00000000		%2 = and i64 %0, 108086391056891904 ; 0x01800000'00000000
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
ret i64 %3		ret i64 %3
}		}


define i16 @bs_active_low1(i16 %0) {		define i16 @bs_active_low1(i16 %0) {
; CHECK-LABEL: @bs_active_low1(		; CHECK-LABEL: @bs_active_low1(
; CHECK-NEXT: [[TMP2:%.]] = lshr i16 [[TMP0:%.]], 15		; CHECK-NEXT: [[TMP2:%.]] = lshr i16 [[TMP0:%.]], 7
; CHECK-NEXT: [[TMP3:%.*]] = call i16 @llvm.bswap.i16(i16 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i16 [[TMP2]], 256
; CHECK-NEXT: ret i16 [[TMP3]]		; CHECK-NEXT: ret i16 [[TMP3]]
;		;
%2 = lshr i16 %0, 15		%2 = lshr i16 %0, 15
%3 = call i16 @llvm.bswap.i16(i16 %2)		%3 = call i16 @llvm.bswap.i16(i16 %2)
ret i16 %3		ret i16 %3
}		}

define <2 x i32> @bs_active_low8(<2 x i32> %0) {		define <2 x i32> @bs_active_low8(<2 x i32> %0) {
; CHECK-LABEL: @bs_active_low8(		; CHECK-LABEL: @bs_active_low8(
; CHECK-NEXT: [[TMP2:%.]] = and <2 x i32> [[TMP0:%.]], <i32 255, i32 255>		; CHECK-NEXT: [[TMP2:%.]] = shl <2 x i32> [[TMP0:%.]], <i32 24, i32 24>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> [[TMP2]])		; CHECK-NEXT: ret <2 x i32> [[TMP2]]
; CHECK-NEXT: ret <2 x i32> [[TMP3]]
;		;
%2 = and <2 x i32> %0, <i32 255, i32 255>		%2 = and <2 x i32> %0, <i32 255, i32 255>
%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)		%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)
ret <2 x i32> %3		ret <2 x i32> %3
}		}

define <2 x i32> @bs_active_low_different(<2 x i32> %0) {		define <2 x i32> @bs_active_low_different(<2 x i32> %0) {
; CHECK-LABEL: @bs_active_low_different(		; CHECK-LABEL: @bs_active_low_different(
; CHECK-NEXT: [[TMP2:%.]] = and <2 x i32> [[TMP0:%.]], <i32 2, i32 128>		; CHECK-NEXT: [[TMP2:%.]] = and <2 x i32> [[TMP0:%.]], <i32 2, i32 128>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = shl nuw <2 x i32> [[TMP2]], <i32 24, i32 24>
; CHECK-NEXT: ret <2 x i32> [[TMP3]]		; CHECK-NEXT: ret <2 x i32> [[TMP3]]
;		;
%2 = and <2 x i32> %0, <i32 2, i32 128>		%2 = and <2 x i32> %0, <i32 2, i32 128>
%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)		%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)
ret <2 x i32> %3		ret <2 x i32> %3
}		}

; negative test		; negative test
Show All 18 Lines	;
%2 = and <2 x i32> %0, <i32 255, i32 undef>		%2 = and <2 x i32> %0, <i32 255, i32 undef>
%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)		%3 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %2)
ret <2 x i32> %3		ret <2 x i32> %3
}		}

define i64 @bs_active_low8_multiuse(i64 %0) {		define i64 @bs_active_low8_multiuse(i64 %0) {
; CHECK-LABEL: @bs_active_low8_multiuse(		; CHECK-LABEL: @bs_active_low8_multiuse(
; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 255		; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 255
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 56
; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
; CHECK-NEXT: ret i64 [[TMP4]]		; CHECK-NEXT: ret i64 [[TMP4]]
;		;
%2 = and i64 %0, 255		%2 = and i64 %0, 255
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
%4 = mul i64 %2, %3 ; increase use of and and bswap		%4 = mul i64 %2, %3 ; increase use of and and bswap
ret i64 %4		ret i64 %4
}		}

define i64 @bs_active_low7_multiuse(i64 %0) {		define i64 @bs_active_low7_multiuse(i64 %0) {
; CHECK-LABEL: @bs_active_low7_multiuse(		; CHECK-LABEL: @bs_active_low7_multiuse(
; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 127		; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 127
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 56
; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
; CHECK-NEXT: ret i64 [[TMP4]]		; CHECK-NEXT: ret i64 [[TMP4]]
;		;
%2 = and i64 %0, 127		%2 = and i64 %0, 127
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
%4 = mul i64 %2, %3 ; increase use of and and bswap		%4 = mul i64 %2, %3 ; increase use of and and bswap
ret i64 %4		ret i64 %4
}		}

define i64 @bs_active_byte_4l(i64 %0) {		define i64 @bs_active_byte_4l(i64 %0) {
; CHECK-LABEL: @bs_active_byte_4l(		; CHECK-LABEL: @bs_active_byte_4l(
; CHECK-NEXT: [[TMP2:%.]] = and i64 [[TMP0:%.]], 1140850688		; CHECK-NEXT: [[TMP2:%.]] = shl i64 [[TMP0:%.]], 8
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.bswap.i64(i64 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP2]], 292057776128
; CHECK-NEXT: ret i64 [[TMP3]]		; CHECK-NEXT: ret i64 [[TMP3]]
;		;
%2 = and i64 %0, 1140850688 ; 0x44000000		%2 = and i64 %0, 1140850688 ; 0x44000000
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
ret i64 %3		ret i64 %3
}		}

define i32 @bs_active_byte_2l(i32 %0) {		define i32 @bs_active_byte_2l(i32 %0) {
; CHECK-LABEL: @bs_active_byte_2l(		; CHECK-LABEL: @bs_active_byte_2l(
; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP0:%.]], 65280		; CHECK-NEXT: [[TMP2:%.]] = shl i32 [[TMP0:%.]], 8
; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.bswap.i32(i32 [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 16711680
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = and i32 %0, 65280 ; 0xff00		%2 = and i32 %0, 65280 ; 0xff00
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define <2 x i64> @bs_active_byte_2l_v2(<2 x i64> %0) {		define <2 x i64> @bs_active_byte_2l_v2(<2 x i64> %0) {
; CHECK-LABEL: @bs_active_byte_2l_v2(		; CHECK-LABEL: @bs_active_byte_2l_v2(
; CHECK-NEXT: [[TMP2:%.]] = and <2 x i64> [[TMP0:%.]], <i64 256, i64 65280>		; CHECK-NEXT: [[TMP2:%.]] = and <2 x i64> [[TMP0:%.]], <i64 256, i64 65280>
; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw <2 x i64> [[TMP2]], <i64 40, i64 40>
; CHECK-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;		;
%2 = and <2 x i64> %0, <i64 256, i64 65280> ; 0x0100, 0xff00		%2 = and <2 x i64> %0, <i64 256, i64 65280> ; 0x0100, 0xff00
%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)		%3 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> %2)
ret <2 x i64> %3		ret <2 x i64> %3
}		}

; negative test		; negative test
Show All 18 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify bswap -> shiftClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 401773

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/test/Transforms/InstCombine/bswap-fold.ll

[InstCombine] Simplify bswap -> shift
ClosedPublic