
Details

Reviewers

craig.topper
frasercrmck
asb
luismarques
benshi001
RKSimon
lebedev.ri

Summary

Combine
t2 = bswap t1; t3 = srl t2, x; bswap t3
to shl t1, x, provided x % 8 == 0.
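The Summary's fold can be spot-checked outside LLVM. The following is a minimal standalone C++ sketch, not the patch's code: it models i32 with uint32_t and a hand-written bswap32 helper (lhs, rhs, and foldHolds are illustrative names). It samples only a handful of values, so it illustrates the identity rather than proving it.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (models ISD::BSWAP on i32).
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Left-hand side of the proposed fold: bswap(srl(bswap(t), x)).
uint32_t lhs(uint32_t t, unsigned x) { return bswap32(bswap32(t) >> x); }

// Right-hand side of the proposed fold: shl(t, x).
uint32_t rhs(uint32_t t, unsigned x) { return t << x; }

// True if lhs == rhs for every sample value at shift amount x.
// x must be < 32; larger shifts are poison in IR and are not modeled here.
bool foldHolds(unsigned x) {
  const uint32_t samples[] = {0u,          1u,          0x80u,      0x12345678u,
                              0xDEADBEEFu, 0xFFFFFFFFu, 0x00F000F0u};
  for (uint32_t t : samples)
    if (lhs(t, x) != rhs(t, x))
      return false;
  return true;
}
```

For instance, lhs(1, 4) is 0x1000 while rhs(1, 4) is 0x10, which is why the fold needs the x % 8 == 0 condition.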

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Chenbing.Zheng created this revision. Feb 28 2022, 12:57 AM

Herald added subscribers: VincentWu, luke957, achieveartificialintelligence and 24 others. Feb 28 2022, 12:57 AM

Chenbing.Zheng requested review of this revision. Feb 28 2022, 12:57 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD and 2 others. Feb 28 2022, 12:57 AM

Is this a target-independent optimization that's worth doing in general DAGCombine?

In D120648#3348422, @frasercrmck wrote:

Is this a target-independent optimization that's worth doing in general DAGCombine?

+1 This looks like a useful generic fold to me.

liaolucy added a subscriber: liaolucy. Feb 28 2022, 1:43 AM

Harbormaster completed remote builds in B151719: Diff 411749. Feb 28 2022, 2:04 AM

address comments

Herald added a subscriber: ecnelises. Feb 28 2022, 4:39 AM

I think you need to confirm that the shift amount is a multiple of 8? https://alive2.llvm.org/ce/z/5DdJRS

Could you please post the legality reasoning, not just state what the fold is?

https://alive2.llvm.org/ce/z/gRNCw5
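One way to state the legality reasoning asked for here is at byte granularity. The derivation below is an editorial sketch, not taken from the review: write B_i(v) for byte i of an n-byte value v (byte 0 least significant) and let the shift amount be 8k with 0 <= k < n.

```latex
\begin{aligned}
B_i(\operatorname{bswap} v) &= B_{n-1-i}(v) \\
B_i(\operatorname{srl}(v, 8k)) &= \begin{cases} B_{i+k}(v) & \text{if } i + k \le n-1 \\ 0 & \text{otherwise} \end{cases} \\
B_i(\operatorname{shl}(v, 8k)) &= \begin{cases} B_{i-k}(v) & \text{if } i \ge k \\ 0 & \text{otherwise} \end{cases} \\[4pt]
B_i\bigl(\operatorname{bswap}(\operatorname{srl}(\operatorname{bswap} t, 8k))\bigr)
  &= B_{n-1-i}\bigl(\operatorname{srl}(\operatorname{bswap} t, 8k)\bigr) \\
  &= \begin{cases} B_{n-1-i+k}(\operatorname{bswap} t) & \text{if } i \ge k \\ 0 & \text{otherwise} \end{cases} \\
  &= \begin{cases} B_{i-k}(t) & \text{if } i \ge k \\ 0 & \text{otherwise} \end{cases}
   \;=\; B_i(\operatorname{shl}(t, 8k))
\end{aligned}
```

If the shift amount is not a multiple of 8, bits move across byte boundaries while bswap only permutes whole bytes, so the byte-wise argument above no longer applies and the identity does not hold in general.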

This revision now requires changes to proceed. Feb 28 2022, 4:57 AM

Harbormaster completed remote builds in B151739: Diff 411788. Feb 28 2022, 5:28 AM

add the condition x % 8 == 0, where x is the shift amount

In D120648#3348738, @lebedev.ri wrote:

Could you please post the legality reasoning, not just state what the fold is?

https://alive2.llvm.org/ce/z/gRNCw5

In D120648#3348733, @RKSimon wrote:

I think you need to confirm that the shift amount is a multiple of 8? https://alive2.llvm.org/ce/z/5DdJRS

Thanks, I agree with you

Harbormaster completed remote builds in B151747: Diff 411796. Feb 28 2022, 6:34 AM

I was looking at solving the same test cases by doing

(bitreverse (srl X, C)) -> (shl (bitreverse X), C)

I've posted my patch here D120667
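The alternative fold moves a shift across bitreverse instead of bswap. A minimal standalone C++ sketch (helper names bitreverse32 and bitreverseShiftSwaps are illustrative, not LLVM APIs) that checks the identity (bitreverse (srl X, C)) == (shl (bitreverse X), C) on i32:

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit bit reversal (models ISD::BITREVERSE on i32).
uint32_t bitreverse32(uint32_t v) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);  // move the current LSB of v into r
    v >>= 1;
  }
  return r;
}

// Checks (bitreverse (srl X, C)) == (shl (bitreverse X), C) for one X and C.
// C must be < 32; wider shifts are poison in IR and are not modeled here.
bool bitreverseShiftSwaps(uint32_t x, unsigned c) {
  return bitreverse32(x >> c) == (bitreverse32(x) << c);
}
```

Because bitreverse mirrors individual bits, the identity itself needs no multiple-of-8 restriction, in contrast to the bswap version.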

craig.topper mentioned this in D120667: [DAGCombine][RISCV] Fold (bitreverse (srl X, C)) -> (shl (bitreverse X), C) if X is a bswap. Feb 28 2022, 10:05 AM

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

In D120648#3349296, @RKSimon wrote:

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

Fair enough, but in that case we need a different set of test cases. All of the examples here could legally be a single brev8 (reverse bits within each byte) instruction. The shifts are artifacts of type legalization and shouldn't be there. No amount of DAG combines after type legalization can get to the optimal codegen. The only way to do it is to combine (bitreverse (bswap X)) to brev8 pre-type legalization and then type legalize brev8 with any_extend.
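The claim that bitreverse(bswap X) is a single brev8, and that an i16 brev8 can be legalized with any_extend, can be checked with a small standalone C++ sketch. It assumes brev8 reverses the bits inside each byte and leaves byte order unchanged, as described above; all helper names (bswap32, bitreverse32, brev8, viaBrev8AnyExt, and so on) are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (models ISD::BSWAP on i32).
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Portable 32-bit bit reversal (models ISD::BITREVERSE on i32).
uint32_t bitreverse32(uint32_t v) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

// Models brev8: reverse the bits inside each byte; byte order is unchanged.
uint32_t brev8(uint32_t v) {
  uint32_t r = 0;
  for (int byte = 0; byte < 4; ++byte) {
    uint32_t b = (v >> (8 * byte)) & 0xFFu;
    uint32_t rb = 0;
    for (int i = 0; i < 8; ++i)
      rb |= ((b >> i) & 1u) << (7 - i);
    r |= rb << (8 * byte);
  }
  return r;
}

// i16 reference operations, i.e. what the IR in the i16 tests computes.
uint16_t bswap16(uint16_t v) {
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}
uint16_t bitreverse16(uint16_t v) {
  return static_cast<uint16_t>(bitreverse32(v) >> 16);
}

// i16 result computed as brev8 on an any-extended 32-bit register: whatever
// sits in the upper 16 bits, the low 16 bits of brev8 are the right answer.
uint16_t viaBrev8AnyExt(uint16_t v, uint16_t garbageHigh) {
  uint32_t wide = (static_cast<uint32_t>(garbageHigh) << 16) | v;
  return static_cast<uint16_t>(brev8(wide) & 0xFFFFu);
}
```

Because brev8 never moves bits between bytes, the undefined upper half of an any-extended register cannot leak into the low 16 bits, so no extra shifts are needed.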

We'd probably do better with a smaller match/fold. This is really just recognizing that you can move a logical shift before/after a swap by reversing the direction. We don't get this in IR either if I'm seeing it correctly:
https://alive2.llvm.org/ce/z/2UmMSu
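The smaller fold described here, moving a logical shift across a byte swap by reversing its direction, can be sketched in standalone C++ (helper names are illustrative; shift amounts are assumed to be multiples of 8 and below 32):

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (models ISD::BSWAP on i32).
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// bswap(shl(x, c)) == srl(bswap(x), c): a left shift before the swap
// becomes a right shift after it (c a multiple of 8, c < 32).
bool shlBecomesSrl(uint32_t x, unsigned c) {
  return bswap32(x << c) == (bswap32(x) >> c);
}

// bswap(srl(x, c)) == shl(bswap(x), c): the mirror-image rule.
bool srlBecomesShl(uint32_t x, unsigned c) {
  return bswap32(x >> c) == (bswap32(x) << c);
}
```

Both directions fail for shift amounts that are not byte multiples, e.g. shlBecomesSrl(1, 4) is false.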

In D120648#3349311, @craig.topper wrote:

In D120648#3349296, @RKSimon wrote:

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

Fair enough, but in that case we need a different set of test cases. All of the examples here could legally be a single brev8 (reverse bits within each byte) instruction. The shifts are artifacts of type legalization and shouldn't be there. No amount of DAG combines after type legalization can get to the optimal codegen. The only way to do it is to combine (bitreverse (bswap X)) to brev8 pre-type legalization and then type legalize brev8 with any_extend.

I agree that for these cases in bswap-bitreverse.ll, combining (bitreverse (bswap X)) gives better results. So should we keep this combine? Maybe it can be used elsewhere.

In D120648#3349335, @spatel wrote:

We'd probably do better with a smaller match/fold. This is really just recognizing that you can move a logical shift before/after a swap by reversing the direction. We don't get this in IR either if I'm seeing it correctly:
https://alive2.llvm.org/ce/z/2UmMSu

Ping

Herald added a project: Restricted Project. Mar 10 2022, 5:40 PM

In D120648#3374242, @Chenbing.Zheng wrote:

Ping

We need test cases that are targeted specifically at the change being made. If we add a RISCV DAGCombine for (bitreverse (bswap X)) -> brev8 pre-type legalization, then this code change will become untested.

add tests

Herald added a subscriber: pengfei. Mar 10 2022, 7:04 PM

Harbormaster completed remote builds in B153702: Diff 414560. Mar 10 2022, 8:07 PM

fix test

Harbormaster completed remote builds in B153724: Diff 414591. Mar 10 2022, 11:53 PM

Chenbing.Zheng mentioned this in D121448: [DAGCombine] fold (bitreverse(srl (bitreverse c), x)) -> (shl c, x). Mar 11 2022, 1:00 AM

Please can you precommit the additional tests and then rebase this patch so it shows the changes?

Chenbing.Zheng mentioned this in D121504: [DAGCombine] add tests for bswap-shift optimization. Mar 11 2022, 6:53 PM

rebase onto precommitted tests (D121504)

Harbormaster completed remote builds in B153875: Diff 414786. Mar 11 2022, 7:11 PM

spatel mentioned this in rG83413bb617aa: [x86] reduce indentation; NFC. Mar 16 2022, 10:39 AM

Can you rebase again? I think the changes to bswap-bitreverse.ll have been fixed a different way and no longer apply.

rebase main

Harbormaster completed remote builds in B154776: Diff 416083. Mar 17 2022, 12:32 AM

lebedev.ri added inline comments. Mar 17 2022, 12:44 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9673
9675–9676	I'm guessing it isn't worth it to instead check with the knownbits that low 3 bits are zeros?

address comments

Chenbing.Zheng marked an inline comment as done. Mar 17 2022, 1:37 AM

Chenbing.Zheng added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9675–9676	It seems that x % 8 == 0 is simple and clear.

Harbormaster completed remote builds in B154782: Diff 416091. Mar 17 2022, 1:37 AM

lebedev.ri added inline comments. Mar 17 2022, 1:38 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9675–9676	It seems that x % 8 == 0 is simple and clear.

Sure, but does it handle variable shift amounts?

RKSimon added inline comments. Mar 17 2022, 8:05 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9675–9676	It would also add vector support, which probably isn't that relevant but could maybe turn up.
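The variable-shift-amount question can be illustrated without SelectionDAG. In the C++ sketch below (not LLVM code; variableShiftAmount is a hypothetical stand-in for a non-constant amount computed as (y & 3) << 3), the amount is never a constant, so a dyn_cast<ConstantSDNode> check would not fire, yet its low 3 bits are always zero and the fold stays valid for every y. For an unsigned amount, x % 8 == 0 and (x & 7) == 0 are the same test, which is what a known-bits query would establish.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (models ISD::BSWAP on i32).
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Hypothetical variable shift amount: (y & 3) << 3 is always 0, 8, 16 or 24,
// so its low 3 bits are known to be zero even though it is not a constant.
unsigned variableShiftAmount(uint32_t y) { return (y & 3u) << 3; }

// For unsigned amt, "amt % 8 == 0" is the same as "low 3 bits are zero".
bool lowThreeBitsZero(unsigned amt) { return (amt & 7u) == 0; }

// The Summary's fold, evaluated with a variable shift amount.
bool foldHoldsForVariableAmount(uint32_t t, uint32_t y) {
  unsigned x = variableShiftAmount(y);
  return bswap32(bswap32(t) >> x) == (t << x);
}
```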

New patch title sounds confusing.

I didn't see a reply to my earlier suggestion - is there a problem with a more general pattern match (independent of the question of using knownbits on the shift amount)?

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8e383ce85cb7..498d2f51bbd5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9745,6 +9745,16 @@ SDValue DAGCombiner::visitBSWAP(SDNode *N) {
     }
   }
 
+  if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
+      N0.hasOneUse()) {
+    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
+    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {
+      SDValue NewSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
+      unsigned InverseShift = N0.getOpcode() == ISD::SHL ? ISD::SRL : ISD::SHL;
+      return DAG.getNode(InverseShift, DL, VT, NewSwap, N0.getOperand(1));
+    }
+  }
+
   return SDValue();
 }

craig.topper added inline comments. Mar 17 2022, 9:32 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9676	Where did we verify that calling getZExtValue() wouldn't assert if the shift was enormously large? It would definitely be UB, but we can't guarantee it was folded yet.

spatel mentioned this in D122010: [InstCombine] try to canonicalize logical shift after bswap. Mar 18 2022, 8:23 AM

In D120648#3389382, @spatel wrote:

I didn't see a reply to my earlier suggestion - is there a problem with a more general pattern match (independent of the question of using knownbits on the shift amount)?

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8e383ce85cb7..498d2f51bbd5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9745,6 +9745,16 @@ SDValue DAGCombiner::visitBSWAP(SDNode *N) {
     }
   }
 
+  if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
+      N0.hasOneUse()) {
+    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
+    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {
+      SDValue NewSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
+      unsigned InverseShift = N0.getOpcode() == ISD::SHL ? ISD::SRL : ISD::SHL;
+      return DAG.getNode(InverseShift, DL, VT, NewSwap, N0.getOperand(1));
+    }
+  }
+
   return SDValue();
 }

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looked at in isolation, this transformation doesn't optimize anything; any benefit depends on the surrounding instructions. Why don't we do a targeted optimization instead?

In D120648#3393774, @Chenbing.Zheng wrote:

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looked at in isolation, this transformation doesn't optimize anything; any benefit depends on the surrounding instructions. Why don't we do a targeted optimization instead?

We want to have the more flexible/robust pattern-matching because it will handle cases that may not be obvious/visible yet. By making the minimal transform, we try to ensure that the code will be in a "canonical" form that other transforms can then act on. (That is also why I propose adding the transform in IR in D122010.)

In this case, the key optimization is that a bswap-of-bswap (or bitreverse-of-bitreverse) cancels itself out. That optimization already exists, so we don't need to reimplement it here and other places. Just try to get the code into a form that will allow the existing code to apply. This is a fundamental feature of iterative combining. It may seem less efficient, but it's actually more efficient because we don't need to duplicate as much combiner code to get the same level of consistent optimization.

For example, we do not have this test in D121504 (but I think it should be included):

define i32 @test_bswap_shli_8_bswap_i32(i32 %a) nounwind {
    %1 = call i32 @llvm.bswap.i32(i32 %a)
    %2 = shl i32 %1, 8
    %3 = call i32 @llvm.bswap.i32(i32 %2)
    ret i32 %3
}

We could adjust this patch to also deal with the opposite direction shift, but then it will need more redundant pattern-matching code (especially if we need to duplicate it again for bitreverse).

I didn't step through it, but that might be why this patch fails to make some "rev16" optimizations for ARM code that we will get with the transform that I suggested.
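The iterative-combining argument can be illustrated with a toy model. The C++ sketch below is illustrative only (Node, mk, eval, combineBswap, and combine are stand-ins, not SelectionDAG's API): it knows just two bswap rules, the existing bswap(bswap x) -> x fold and the proposed shift canonicalization, and re-runs combining on each replacement. With only those rules it rewrites both the Summary's pattern and the test_bswap_shli_8_bswap_i32 pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression node standing in for an SDNode (illustrative only).
struct Node {
  enum Kind { Leaf, Bswap, Shl, Srl };
  Kind kind = Leaf;
  std::shared_ptr<Node> op;  // operand of Bswap/Shl/Srl; null for Leaf
  unsigned amt = 0;          // constant shift amount for Shl/Srl
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(Node::Kind k, NodePtr op = nullptr, unsigned amt = 0) {
  auto n = std::make_shared<Node>();
  n->kind = k;
  n->op = std::move(op);
  n->amt = amt;
  return n;
}

uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Reference i32 semantics, used to check that rewrites preserve values.
uint32_t eval(const NodePtr &n, uint32_t x) {
  switch (n->kind) {
  case Node::Leaf:  return x;
  case Node::Bswap: return bswap32(eval(n->op, x));
  case Node::Shl:   return eval(n->op, x) << n->amt;
  case Node::Srl:   return eval(n->op, x) >> n->amt;
  }
  return 0;
}

// The only two bswap rules in this model:
//   bswap(bswap x)       -> x                    (existing fold)
//   bswap(shl/srl x, 8k) -> srl/shl(bswap x, 8k) (proposed canonicalization)
NodePtr combineBswap(const NodePtr &n) {
  const NodePtr &op = n->op;
  if (op->kind == Node::Bswap)
    return op->op;
  if ((op->kind == Node::Shl || op->kind == Node::Srl) && op->amt < 32 &&
      op->amt % 8 == 0) {
    Node::Kind inverse = op->kind == Node::Shl ? Node::Srl : Node::Shl;
    return mk(inverse, mk(Node::Bswap, op->op), op->amt);
  }
  return nullptr;  // no rule applies
}

// Combine operands first, then this node; re-run on any replacement so the
// output of one rule can feed the other (iterative combining).
NodePtr combine(NodePtr n) {
  if (n->op)
    n = mk(n->kind, combine(n->op), n->amt);
  if (n->kind == Node::Bswap)
    if (NodePtr r = combineBswap(n))
      return combine(r);
  return n;
}
```

Neither rule knows about the three-node pattern, yet bswap(srl(bswap t, 8)) ends up as shl(t, 8), because moving the outer bswap inward exposes a bswap-of-bswap that the existing fold removes.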

spatel mentioned this in rG60820e53ec9d: [InstCombine] try to canonicalize logical shift after bswap. Mar 22 2022, 6:11 AM

In D120648#3395069, @spatel wrote:
In D120648#3393774, @Chenbing.Zheng wrote:

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looked at in isolation, this transformation doesn't optimize anything; any benefit depends on the surrounding instructions. Why don't we do a targeted optimization instead?

We want to have the more flexible/robust pattern-matching because it will handle cases that may not be obvious/visible yet. By making the minimal transform, we try to ensure that the code will be in a "canonical" form that other transforms can then act on. (That is also why I propose adding the transform in IR in D122010.)

In this case, the key optimization is that a bswap-of-bswap (or bitreverse-of-bitreverse) cancels itself out. That optimization already exists, so we don't need to reimplement it here and other places. Just try to get the code into a form that will allow the existing code to apply. This is a fundamental feature of iterative combining. It may seem less efficient, but it's actually more efficient because we don't need to duplicate as much combiner code to get the same level of consistent optimization.

For example, we do not have this test in D121504 (but I think it should be included):
define i32 @test_bswap_shli_8_bswap_i32(i32 %a) nounwind {
    %1 = call i32 @llvm.bswap.i32(i32 %a)
    %2 = shl i32 %1, 8
    %3 = call i32 @llvm.bswap.i32(i32 %2)
    ret i32 %3
}
We could adjust this patch to also deal with the opposite direction shift, but then it will need more redundant pattern-matching code (especially if we need to duplicate it again for bitreverse).

I didn't step through it, but that might be why this patch fails to make some "rev16" optimizations for ARM code that we will get with the transform that I suggested.

Sorry for my delayed reply. I think what you said makes sense. I will abandon this patch, and I have updated D121504 according to your suggestion. Would you mind reviewing it and creating your patch to solve this?
There is a similar problem with bitreverse-shift, and I have added tests in D121507.

Herald added a subscriber: StephenFan. Mar 28 2022, 1:25 AM

spatel mentioned this in D122655: [SDAG] try to canonicalize logical shift after bswap. Mar 29 2022, 7:00 AM

In D120648#3410788, @Chenbing.Zheng wrote:

Sorry for my delayed reply. I think what you said makes sense. I will abandon this patch, and I have updated D121504 according to your suggestion. Would you mind reviewing it and creating your patch to solve this?

Thanks for committing the extra tests. I posted D122655 to transform those. We can generalize it to handle bitreverse as a follow-up patch if that is needed.

spatel mentioned this in rGe18cc5277fd8: [SDAG] try to canonicalize logical shift after bswap. Mar 30 2022, 6:30 AM

Diff 411796

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


SDValue DAGCombiner::visitBSWAP(SDNode *N) {
  // fold (bswap c1) -> c2
  if (DAG.isConstantIntBuildVectorOrConstantInt(N0))
    return DAG.getNode(ISD::BSWAP, DL, VT, N0);
  // fold (bswap (bswap x)) -> x
  if (N0.getOpcode() == ISD::BSWAP)
    return N0.getOperand(0);

  // fold (bswap(srl (bswap c), x)) -> (shl c, x)
  if (N0->getOpcode() == ISD::SRL && N0.hasOneUse()) {
    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {
      SDValue BSwap = N0->getOperand(0);
      if (BSwap->getOpcode() == ISD::BSWAP && BSwap.hasOneUse())
        return DAG.getNode(ISD::SHL, DL, VT, BSwap->getOperand(0),
                           N0->getOperand(1));
    }
  }

  // Canonicalize bswap(bitreverse(x)) -> bitreverse(bswap(x)). If bitreverse
  // isn't supported, it will be expanded to bswap followed by a manual reversal
  // of bits in each byte. By placing bswaps before bitreverse, we can remove
  // the two bswaps if the bitreverse gets expanded.
  if (N0.getOpcode() == ISD::BITREVERSE && N0.hasOneUse()) {
    SDValue BSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
    return DAG.getNode(ISD::BITREVERSE, DL, VT, BSwap);
  }

  // fold (bswap shl(x,c)) -> (zext(bswap(trunc(shl(x,sub(c,bw/2))))))


    return DAG.getNode(ISD::BITREVERSE, DL, VT, BSwap);
  }

- // fold (bswap(srl (bswap c), x)) -> (shl c, x)
+ // fold (bswap(srl (bswap c), 8*x)) -> (shl c, 8*x)
  if (N0->getOpcode() == ISD::SRL && N0.hasOneUse()) {

  // iff x >= bw/2 (i.e. lower half is known zero)
  unsigned BW = VT.getScalarSizeInBits();
  if (BW >= 32 && N0.getOpcode() == ISD::SHL && N0.hasOneUse()) {


    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
    EVT HalfVT = EVT::getIntegerVT(*DAG.getContext(), BW / 2);
    if (ShAmt && ShAmt->getAPIntValue().ult(BW) &&
        ShAmt->getZExtValue() >= (BW / 2) &&
        (ShAmt->getZExtValue() % 16) == 0 && TLI.isTypeLegal(HalfVT) &&
        TLI.isTruncateFree(VT, HalfVT) &&
        (!LegalOperations || hasOperation(ISD::BSWAP, HalfVT))) {
      SDValue Res = N0.getOperand(0);


llvm/test/CodeGen/RISCV/bswap-bitreverse.ll

	...
	; RV64ZBB-NEXT: and a1, a1, a2			; RV64ZBB-NEXT: and a1, a1, a2
	; RV64ZBB-NEXT: and a0, a0, a2			; RV64ZBB-NEXT: and a0, a0, a2
	; RV64ZBB-NEXT: slli a0, a0, 1			; RV64ZBB-NEXT: slli a0, a0, 1
	; RV64ZBB-NEXT: or a0, a1, a0			; RV64ZBB-NEXT: or a0, a1, a0
	; RV64ZBB-NEXT: ret			; RV64ZBB-NEXT: ret
	;			;
	; RV32ZBKB-LABEL: test_bswap_bitreverse_i16:			; RV32ZBKB-LABEL: test_bswap_bitreverse_i16:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: rev8 a0, a0			; RV32ZBKB-NEXT: slli a0, a0, 16
	; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: rev8 a0, a0
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: srli a0, a0, 16			; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bswap_bitreverse_i16:			; RV64ZBKB-LABEL: test_bswap_bitreverse_i16:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 48
	; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 48			; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i16 @llvm.bswap.i16(i16 %a)			%tmp = call i16 @llvm.bswap.i16(i16 %a)
	%tmp2 = call i16 @llvm.bitreverse.i16(i16 %tmp)			%tmp2 = call i16 @llvm.bitreverse.i16(i16 %tmp)
	ret i16 %tmp2			ret i16 %tmp2
	}			}

	...
	;			;
	; RV32ZBKB-LABEL: test_bswap_bitreverse_i32:			; RV32ZBKB-LABEL: test_bswap_bitreverse_i32:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bswap_bitreverse_i32:			; RV64ZBKB-LABEL: test_bswap_bitreverse_i32:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 32
	; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 32			; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i32 @llvm.bswap.i32(i32 %a)			%tmp = call i32 @llvm.bswap.i32(i32 %a)
	%tmp2 = call i32 @llvm.bitreverse.i32(i32 %tmp)			%tmp2 = call i32 @llvm.bitreverse.i32(i32 %tmp)
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	...
	; RV64ZBB-NEXT: and a1, a1, a2			; RV64ZBB-NEXT: and a1, a1, a2
	; RV64ZBB-NEXT: and a0, a0, a2			; RV64ZBB-NEXT: and a0, a0, a2
	; RV64ZBB-NEXT: slli a0, a0, 1			; RV64ZBB-NEXT: slli a0, a0, 1
	; RV64ZBB-NEXT: or a0, a1, a0			; RV64ZBB-NEXT: or a0, a1, a0
	; RV64ZBB-NEXT: ret			; RV64ZBB-NEXT: ret
	;			;
	; RV32ZBKB-LABEL: test_bitreverse_bswap_i16:			; RV32ZBKB-LABEL: test_bitreverse_bswap_i16:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: rev8 a0, a0			; RV32ZBKB-NEXT: slli a0, a0, 16
	; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: rev8 a0, a0
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: srli a0, a0, 16			; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bitreverse_bswap_i16:			; RV64ZBKB-LABEL: test_bitreverse_bswap_i16:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 48
	; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 48			; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i16 @llvm.bitreverse.i16(i16 %a)			%tmp = call i16 @llvm.bitreverse.i16(i16 %a)
	%tmp2 = call i16 @llvm.bswap.i16(i16 %tmp)			%tmp2 = call i16 @llvm.bswap.i16(i16 %tmp)
	ret i16 %tmp2			ret i16 %tmp2
	}			}

	...
	;			;
	; RV32ZBKB-LABEL: test_bitreverse_bswap_i32:			; RV32ZBKB-LABEL: test_bitreverse_bswap_i32:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bitreverse_bswap_i32:			; RV64ZBKB-LABEL: test_bitreverse_bswap_i32:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 32
	; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 32			; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i32 @llvm.bitreverse.i32(i32 %a)			%tmp = call i32 @llvm.bitreverse.i32(i32 %a)
	%tmp2 = call i32 @llvm.bswap.i32(i32 %tmp)			%tmp2 = call i32 @llvm.bswap.i32(i32 %tmp)
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	...

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] fold (bswap(srl (bswap c), 8x)) -> (shl c, 8x)
Abandoned · Public

