This is an archive of the discontinued LLVM Phabricator instance.

In D120648#3348738, @lebedev.ri wrote:

Could you please post the legality reasoning, not just state what the fold is?

https://alive2.llvm.org/ce/z/gRNCw5

In D120648#3348733, @RKSimon wrote:

I think you need to confirm that the shift amount is a multiple of 8? https://alive2.llvm.org/ce/z/5DdJRS

Thanks, I agree with you
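To make the multiple-of-8 requirement concrete, here is a minimal standalone sketch for 32-bit values. It is not LLVM code; `bswap32` and `foldHolds` are hypothetical helper names introduced only for this illustration, and the checks cover a few sample inputs rather than proving the fold.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (hypothetical helper, not an LLVM API).
constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// True if bswap(srl(bswap(X), C)) == shl(X, C) for this X and C (C < 32).
constexpr bool foldHolds(uint32_t X, unsigned C) {
  return bswap32(bswap32(X) >> C) == (X << C);
}

// Whole-byte shift amounts commute with the byte swap.
static_assert(foldHolds(0x11223344u, 8), "shift by 8 should fold");
static_assert(foldHolds(0x11223344u, 16), "shift by 16 should fold");
// A sub-byte shift moves bits across byte boundaries, so the fold breaks.
static_assert(!foldHolds(0x11223344u, 4), "shift by 4 should not fold");
```

The sample values only illustrate the counterexample; the Alive2 links in the thread are the actual legality evidence.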

Harbormaster completed remote builds in B151747: Diff 411796. · Feb 28 2022, 6:34 AM

I was looking at solving the same test cases by doing

(bitreverse (srl X, C)) -> (shl (bitreverse X), C)

I've posted my patch here D120667

craig.topper mentioned this in D120667: [DAGCombine][RISCV] Fold (bitreverse (srl X, C)) -> (shl (bitreverse X), C) if X is a bswap. · Feb 28 2022, 10:05 AM

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

In D120648#3349296, @RKSimon wrote:

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

Fair enough, but in that case we need a different set of test cases. All of the examples here could legally be a single brev8(reverse bits within each byte) instruction. The shifts are artifacts of type legalization and shouldn't be there. No amount of DAG combines after type legalization can get to the optimal codegen. The only way to do it is to combine (bitreverse (bswap X)) to brev8 pre-type legalization and then type legalize brev8 with any_extend.
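The claim that (bitreverse (bswap X)) is a per-byte bit reversal can be checked with a small standalone sketch. The helpers below (`bswap32`, `bitreverse32`, `brev8`) are hypothetical models written for this illustration, assuming a 32-bit value; they are not the LLVM intrinsics or the RISC-V instruction definitions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (hypothetical helper).
constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// Reverse all 32 bits, one bit at a time.
constexpr uint32_t bitreverse32(uint32_t X) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I)
    R |= ((X >> I) & 1u) << (31 - I);
  return R;
}

// Model of brev8: reverse the bits inside each byte, keep byte order.
constexpr uint32_t brev8(uint32_t X) {
  uint32_t R = 0;
  for (int Byte = 0; Byte < 4; ++Byte)
    for (int Bit = 0; Bit < 8; ++Bit)
      R |= ((X >> (8 * Byte + Bit)) & 1u) << (8 * Byte + 7 - Bit);
  return R;
}

// Both orders of composition collapse to the per-byte reversal.
static_assert(bitreverse32(bswap32(0x01020304u)) == brev8(0x01020304u), "");
static_assert(bswap32(bitreverse32(0x01020304u)) == brev8(0x01020304u), "");
```

This is why the shifts left behind by type legalization are pure overhead in these tests: the whole i16 or i32 pattern is expressible as one per-byte operation.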

We'd probably do better with a smaller match/fold. This is really just recognizing that you can move a logical shift before/after a swap by reversing the direction. We don't get this in IR either if I'm seeing it correctly:
https://alive2.llvm.org/ce/z/2UmMSu
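The smaller fold described here, moving a logical shift across a single swap by reversing its direction, can be sketched as follows. This is a standalone illustration with hypothetical helper names (`bswap32`, `holdsForAllByteShifts`), assuming 32-bit values and whole-byte shift amounts.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap (hypothetical helper).
constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// Checks, for every whole-byte shift amount below 32:
//   bswap(shl(X, C)) == srl(bswap(X), C)
//   bswap(srl(X, C)) == shl(bswap(X), C)
constexpr bool holdsForAllByteShifts(uint32_t X) {
  for (unsigned C = 0; C < 32; C += 8) {
    if (bswap32(X << C) != (bswap32(X) >> C))
      return false;
    if (bswap32(X >> C) != (bswap32(X) << C))
      return false;
  }
  return true;
}

static_assert(holdsForAllByteShifts(0x11223344u), "");
```

No second bswap is needed for this identity, which is what makes it usable as a canonicalization step.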

In D120648#3349311, @craig.topper wrote:

In D120648#3349296, @RKSimon wrote:

TBH, this pattern is probably more likely to occur in general code than anything with a bitreverse

Fair enough, but in that case we need a different set of test cases. All of the examples here could legally be a single brev8(reverse bits within each byte) instruction. The shifts are artifacts of type legalization and shouldn't be there. No amount of DAG combines after type legalization can get to the optimal codegen. The only way to do it is to combine (bitreverse (bswap X)) to brev8 pre-type legalization and then type legalize brev8 with any_extend.

I agree that for these cases in bswap-bitreverse.ll, combining (bitreverse (bswap X)) enables more optimization. So should we keep this combine? Maybe it can be used elsewhere.

In D120648#3349335, @spatel wrote:

We'd probably do better with a smaller match/fold. This is really just recognizing that you can move a logical shift before/after a swap by reversing the direction. We don't get this in IR either if I'm seeing it correctly:
https://alive2.llvm.org/ce/z/2UmMSu

Ping

Herald added a project: Restricted Project. · Mar 10 2022, 5:40 PM

In D120648#3374242, @Chenbing.Zheng wrote:

Ping

We need test cases that are targeted specifically at the change being made. If we add a RISCV DAGCombine for (bitreverse (bswap X)) -> brev8 pre-type legalization, then this code change will become untested.

add tests

Herald added a subscriber: pengfei. · Mar 10 2022, 7:04 PM

Harbormaster completed remote builds in B153702: Diff 414560. · Mar 10 2022, 8:07 PM

fix test

Harbormaster completed remote builds in B153724: Diff 414591. · Mar 10 2022, 11:53 PM

Chenbing.Zheng mentioned this in D121448: [DAGCombine] fold (bitreverse(srl (bitreverse c), x)) -> (shl c, x). · Mar 11 2022, 1:00 AM

Please can you precommit the additional tests and then rebase this patch so it shows the changes?

Chenbing.Zheng mentioned this in D121504: [DAGCombine] add tests for bswap-shift optimization. · Mar 11 2022, 6:53 PM

rebase precommit tests D121504

Harbormaster completed remote builds in B153875: Diff 414786. · Mar 11 2022, 7:11 PM

spatel mentioned this in rG83413bb617aa: [x86] reduce indentation; NFC. · Mar 16 2022, 10:39 AM

Can you rebase again? I think the changes to bswap-bitreverse.ll have been fixed a different way and no longer apply.

rebase main

Harbormaster completed remote builds in B154776: Diff 416083. · Mar 17 2022, 12:32 AM

lebedev.ri added inline comments. · Mar 17 2022, 12:44 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9710
9712–9713	I'm guessing it isn't worth it to instead check with the knownbits that low 3 bits are zeros?

address comments

Chenbing.Zheng marked an inline comment as done. · Mar 17 2022, 1:37 AM

Chenbing.Zheng added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9712–9713	It seems that x % 8 == 0 is simple and clear.

Harbormaster completed remote builds in B154782: Diff 416091. · Mar 17 2022, 1:37 AM

lebedev.ri added inline comments. · Mar 17 2022, 1:38 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9712–9713	> It seems that x % 8 == 0 is simple and clear.

Sure, but does it handle variable shift amounts?

RKSimon added inline comments. · Mar 17 2022, 8:05 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9712–9713	It would also add vector support, which probably isn't that relevant but could maybe turn up.
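The known-bits alternative raised in this inline thread can be sketched with a toy model. `KnownBitsModel` and the helpers below are hypothetical and only loosely mirror the idea of tracking known-zero and known-one masks; they are not LLVM's KnownBits class. The point is that a non-constant shift amount such as `y << 3` still qualifies, which a constant-only `% 8` test cannot see.

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits record: a set bit in Zero means "this bit is known to be 0",
// a set bit in One means "this bit is known to be 1". Hypothetical type.
struct KnownBitsModel {
  uint64_t Zero;
  uint64_t One;
};

// A constant has every bit known.
constexpr KnownBitsModel knownConstant(uint64_t V) { return {~V, V}; }

// A completely unknown value has no bits known.
constexpr KnownBitsModel knownNothing() { return {0, 0}; }

// Left shift by a constant C (assumed < 64): the low C bits become known zero.
constexpr KnownBitsModel knownShl(KnownBitsModel K, unsigned C) {
  return {(K.Zero << C) | ((uint64_t(1) << C) - 1), K.One << C};
}

// The shift amount is a whole number of bytes if its low 3 bits are known zero.
constexpr bool isWholeByteShift(KnownBitsModel Amt) {
  return (Amt.Zero & 7) == 7;
}

static_assert(isWholeByteShift(knownConstant(16)), "constant multiple of 8");
static_assert(!isWholeByteShift(knownConstant(12)), "constant, not a multiple");
static_assert(isWholeByteShift(knownShl(knownNothing(), 3)), "variable y << 3");
static_assert(!isWholeByteShift(knownNothing()), "nothing known");
```

The same low-bits test would apply per element for vector shift amounts, which is the extra coverage RKSimon mentions.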

New patch title sounds confusing.

I didn't see a reply to my earlier suggestion - is there a problem with a more general pattern match (independent of the question of using knownbits on the shift amount)?

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8e383ce85cb7..498d2f51bbd5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9745,6 +9745,16 @@ SDValue DAGCombiner::visitBSWAP(SDNode *N) {
     }
   }
 
+  if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
+      N0.hasOneUse()) {
+    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
+    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {
+      SDValue NewSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
+      unsigned InverseShift = N0.getOpcode() == ISD::SHL ? ISD::SRL : ISD::SHL;
+      return DAG.getNode(InverseShift, DL, VT, NewSwap, N0.getOperand(1));
+    }
+  }
+
   return SDValue();
 }

craig.topper added inline comments. · Mar 17 2022, 9:32 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9713	Where did we verify that calling getZExtValue() wouldn't assert if the shift was enormously large? It would definitely be UB, but we can't guarantee it was folded yet.
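The concern is one of ordering: range-check the wide constant before narrowing it to 64 bits. The neighbouring fold in the same function does this with `ShAmt->getAPIntValue().ult(BW)` ahead of `getZExtValue()`. A minimal standalone model of that ordering, using a hypothetical 128-bit `WideConstant` type in place of the real constant node, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit unsigned constant, standing in for a shift amount
// that may be wider than 64 bits.
struct WideConstant {
  uint64_t Hi;
  uint64_t Lo;
};

// Returns true only if the amount is in range for BitWidth and is a whole
// number of bytes. The range checks come first, so the value is never
// treated as a 64-bit number unless it actually fits.
constexpr bool isSafeByteShift(WideConstant Amt, unsigned BitWidth) {
  if (Amt.Hi != 0)
    return false; // does not fit in 64 bits; narrowing would be invalid
  if (Amt.Lo >= BitWidth)
    return false; // out-of-range shift; leave it for other folds
  return Amt.Lo % 8 == 0;
}

static_assert(isSafeByteShift(WideConstant{0, 16}, 32), "in range, byte multiple");
static_assert(!isSafeByteShift(WideConstant{1, 16}, 32), "too wide to narrow");
static_assert(!isSafeByteShift(WideConstant{0, 32}, 32), "out of range");
static_assert(!isSafeByteShift(WideConstant{0, 12}, 32), "not a byte multiple");
```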

spatel mentioned this in D122010: [InstCombine] try to canonicalize logical shift after bswap. · Mar 18 2022, 8:23 AM

In D120648#3389382, @spatel wrote:

I didn't see a reply to my earlier suggestion - is there a problem with a more general pattern match (independent of the question of using knownbits on the shift amount)?

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8e383ce85cb7..498d2f51bbd5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9745,6 +9745,16 @@ SDValue DAGCombiner::visitBSWAP(SDNode *N) {
     }
   }
 
+  if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
+      N0.hasOneUse()) {
+    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
+    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {
+      SDValue NewSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
+      unsigned InverseShift = N0.getOpcode() == ISD::SHL ? ISD::SRL : ISD::SHL;
+      return DAG.getNode(InverseShift, DL, VT, NewSwap, N0.getOperand(1));
+    }
+  }
+
   return SDValue();
 }

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looking at this transformation alone, it doesn't perform any optimization; any benefit depends on the instructions before and after it. Why don't we do the precise optimization directly?

In D120648#3393774, @Chenbing.Zheng wrote:

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looking at this transformation alone, it doesn't perform any optimization; any benefit depends on the instructions before and after it. Why don't we do the precise optimization directly?

We want to have the more flexible/robust pattern-matching because it will handle cases that may not be obvious/visible yet. By making the minimal transform, we try to ensure that the code will be in a "canonical" form that other transforms can then act on. (That is also why I propose adding the transform in IR in D122010.)

In this case, the key optimization is that a bswap-of-bswap (or bitreverse-of-bitreverse) cancels itself out. That optimization already exists, so we don't need to reimplement it here and other places. Just try to get the code into a form that will allow the existing code to apply. This is a fundamental feature of iterative combining. It may seem less efficient, but it's actually more efficient because we don't need to duplicate as much combiner code to get the same level of consistent optimization.

For example, we do not have this test in D121504 (but I think it should be included):

define i32 @test_bswap_shli_8_bswap_i32(i32 %a) nounwind {
    %1 = call i32 @llvm.bswap.i32(i32 %a)
    %2 = shl i32 %1, 8
    %3 = call i32 @llvm.bswap.i32(i32 %2)
    ret i32 %3
}

We could adjust this patch to also deal with the opposite direction shift, but then it will need more redundant pattern-matching code (especially if we need to duplicate it again for bitreverse).

I didn't step through it, but that might be why this patch fails to make some "rev16" optimizations for ARM code that we will get with the transform that I suggested.
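The argument about iterative combining can be illustrated with a toy rewriter. Everything below is a hypothetical sketch (the `Node` type, `combine`, and `combineToFixpoint` are not LLVM code): one rule cancels bswap-of-bswap, one rule moves a whole-byte shift across a bswap, and repeating the pass lets the two rules cooperate on the `test_bswap_shli_8_bswap_i32` shape without a dedicated three-node pattern.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node; far simpler than a real DAG node.
struct Node;
using NodePtr = std::shared_ptr<Node>;
enum class Op { Var, Bswap, Shl, Srl };
struct Node {
  Op Kind;
  NodePtr Operand;  // null for Var
  unsigned Amount;  // shift amount in bits; unused for Var and Bswap
  std::string Name; // only used for Var
};

NodePtr var(std::string N) {
  return std::make_shared<Node>(Node{Op::Var, nullptr, 0, std::move(N)});
}
NodePtr bswap(NodePtr X) {
  return std::make_shared<Node>(Node{Op::Bswap, std::move(X), 0, ""});
}
NodePtr shift(Op K, NodePtr X, unsigned C) {
  return std::make_shared<Node>(Node{K, std::move(X), C, ""});
}

// One bottom-up rewrite pass; sets Changed when a rule fires.
NodePtr combine(const NodePtr &N, bool &Changed) {
  if (N->Kind == Op::Var)
    return N;
  NodePtr X = combine(N->Operand, Changed);
  if (N->Kind == Op::Bswap) {
    // Cancellation rule: bswap(bswap(x)) -> x.
    if (X->Kind == Op::Bswap) {
      Changed = true;
      return X->Operand;
    }
    // Canonicalization: bswap(shift(x, 8k)) -> opposite-shift(bswap(x), 8k).
    if ((X->Kind == Op::Shl || X->Kind == Op::Srl) && X->Amount % 8 == 0) {
      Changed = true;
      Op Inverse = X->Kind == Op::Shl ? Op::Srl : Op::Shl;
      return shift(Inverse, bswap(X->Operand), X->Amount);
    }
    return X == N->Operand ? N : bswap(X);
  }
  return X == N->Operand ? N : shift(N->Kind, X, N->Amount);
}

// Re-run the pass until no rule fires.
NodePtr combineToFixpoint(NodePtr N) {
  for (bool Changed = true; Changed;) {
    Changed = false;
    N = combine(N, Changed);
  }
  return N;
}

std::string print(const NodePtr &N) {
  switch (N->Kind) {
  case Op::Var:
    return N->Name;
  case Op::Bswap:
    return "bswap(" + print(N->Operand) + ")";
  case Op::Shl:
    return "shl(" + print(N->Operand) + ", " + std::to_string(N->Amount) + ")";
  case Op::Srl:
    return "srl(" + print(N->Operand) + ", " + std::to_string(N->Amount) + ")";
  }
  return "";
}
```

For the input `bswap(shl(bswap(a), 8))`, the first pass produces `srl(bswap(bswap(a)), 8)` and the second cancels the swaps, leaving `srl(a, 8)`. Neither rule knows about the other, which is the design point being argued in the comment above.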

spatel mentioned this in rG60820e53ec9d: [InstCombine] try to canonicalize logical shift after bswap. · Mar 22 2022, 6:11 AM

In D120648#3395069, @spatel wrote:
In D120648#3393774, @Chenbing.Zheng wrote:

I think this is a correct transform: it changes bswap(shl x) -> srl (bswap x) or bswap(srl x) -> shl (bswap x), and it may provide more opportunities for optimization.
But looking at this transformation alone, it doesn't perform any optimization; any benefit depends on the instructions before and after it. Why don't we do the precise optimization directly?

We want to have the more flexible/robust pattern-matching because it will handle cases that may not be obvious/visible yet. By making the minimal transform, we try to ensure that the code will be in a "canonical" form that other transforms can then act on. (That is also why I propose adding the transform in IR in D122010.)

In this case, the key optimization is that a bswap-of-bswap (or bitreverse-of-bitreverse) cancels itself out. That optimization already exists, so we don't need to reimplement it here and other places. Just try to get the code into a form that will allow the existing code to apply. This is a fundamental feature of iterative combining. It may seem less efficient, but it's actually more efficient because we don't need to duplicate as much combiner code to get the same level of consistent optimization.

For example, we do not have this test in D121504 (but I think it should be included):
define i32 @test_bswap_shli_8_bswap_i32(i32 %a) nounwind {
    %1 = call i32 @llvm.bswap.i32(i32 %a)
    %2 = shl i32 %1, 8
    %3 = call i32 @llvm.bswap.i32(i32 %2)
    ret i32 %3
}
We could adjust this patch to also deal with the opposite direction shift, but then it will need more redundant pattern-matching code (especially if we need to duplicate it again for bitreverse).

I didn't step through it, but that might be why this patch fails to make some "rev16" optimizations for ARM code that we will get with the transform that I suggested.

Sorry for my delayed reply. I think what you said makes sense. I will abandon this patch, and I have updated D121504 according to your suggestion. Would you mind reviewing it and creating your own patch to solve this?
There is a similar problem with bitreverse-shift, and I added tests in D121507.

Herald added a subscriber: StephenFan. · Mar 28 2022, 1:25 AM

spatel mentioned this in D122655: [SDAG] try to canonicalize logical shift after bswap. · Mar 29 2022, 7:00 AM

In D120648#3410788, @Chenbing.Zheng wrote:

Sorry for my delayed reply. I think what you said makes sense. I will abandon this patch, and I have updated D121504 according to your suggestion. Would you mind reviewing it and creating your own patch to solve this?

Thanks for committing the extra tests. I posted D122655 to transform those. We can generalize it to handle bitreverse as a follow-up patch if that is needed.

spatel mentioned this in rGe18cc5277fd8: [SDAG] try to canonicalize logical shift after bswap. · Mar 30 2022, 6:30 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	11 lines
llvm/test/CodeGen/RISCV/bswap-bitreverse.ll	24 lines
llvm/test/CodeGen/RISCV/bswap-srli-bswap.ll	173 lines
llvm/test/CodeGen/X86/combine-bswap.ll	9 lines

Diff 414786

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


SDValue DAGCombiner::visitBSWAP(SDNode *N) {
  // isn't supported, it will be expanded to bswap followed by a manual reversal
  // of bits in each byte. By placing bswaps before bitreverse, we can remove
  // the two bswaps if the bitreverse gets expanded.
  if (N0.getOpcode() == ISD::BITREVERSE && N0.hasOneUse()) {
    SDValue BSwap = DAG.getNode(ISD::BSWAP, DL, VT, N0.getOperand(0));
    return DAG.getNode(ISD::BITREVERSE, DL, VT, BSwap);
  }

  // fold (bswap(srl (bswap c), x)) -> (shl c, x)

lebedev.ri (Done):

    return DAG.getNode(ISD::BITREVERSE, DL, VT, BSwap);
  }
- // fold (bswap(srl (bswap c), x)) -> (shl c, x)
+ // fold (bswap(srl (bswap c), 8*x)) -> (shl c, 8*x)
  if (N0->getOpcode() == ISD::SRL && N0.hasOneUse()) {

  if (N0->getOpcode() == ISD::SRL && N0.hasOneUse()) {
    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
    if (ShAmt && ShAmt->getZExtValue() % 8 == 0) {

lebedev.ri (Not Done): I'm guessing it isn't worth it to instead check with the knownbits that low 3 bits are zeros?

Chenbing.Zheng (Author, Done): It seems that x % 8 == 0 is simple and clear.

lebedev.ri (Not Done):
> It seems that x % 8 == 0 is simple and clear.

Sure, but does it handle variable shift amounts?

RKSimon (Not Done): It would also add vector support, which probably isn't that relevant but could maybe turn up.

craig.topper (Not Done): Where did we verify that calling getZExtValue() wouldn't assert if the shift was enormously large? It would definitely be UB, but we can't guarantee it was folded yet.

      SDValue BSwap = N0->getOperand(0);
      if (BSwap->getOpcode() == ISD::BSWAP && BSwap.hasOneUse())
        return DAG.getNode(ISD::SHL, DL, VT, BSwap->getOperand(0),
                           N0->getOperand(1));
    }

  // fold (bswap shl(x,c)) -> (zext(bswap(trunc(shl(x,sub(c,bw/2))))))
  // iff x >= bw/2 (i.e. lower half is known zero)
  unsigned BW = VT.getScalarSizeInBits();
  if (BW >= 32 && N0.getOpcode() == ISD::SHL && N0.hasOneUse()) {
    auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1));
    EVT HalfVT = EVT::getIntegerVT(*DAG.getContext(), BW / 2);
    if (ShAmt && ShAmt->getAPIntValue().ult(BW) &&
        ShAmt->getZExtValue() >= (BW / 2) &&

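The half-width fold named in the comment above, (bswap shl(x,c)) -> (zext(bswap(trunc(shl(x,sub(c,bw/2)))))), can be checked numerically for bw = 32. The sketch below is standalone and uses hypothetical helpers (`bswap16`, `bswap32`, `narrowSwap`, `halfWidthFoldHolds`); it reads the "x >= bw/2" condition in the comment as a condition on the shift amount c, which is an interpretation based on the surrounding code.

```cpp
#include <cassert>
#include <cstdint>

// Portable byte swaps (hypothetical helpers).
constexpr uint16_t bswap16(uint16_t X) {
  return static_cast<uint16_t>((X >> 8) | (X << 8));
}
constexpr uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// zext(bswap(trunc(shl(X, C - 16)))) for a 32-bit X and 16 <= C < 32.
constexpr uint32_t narrowSwap(uint32_t X, unsigned C) {
  return bswap16(static_cast<uint16_t>(X << (C - 16)));
}

// When C >= 16 the low half of (X << C) is zero, so the 32-bit swap only
// has to swap the two bytes of the high half and place them in the low half.
constexpr bool halfWidthFoldHolds(uint32_t X) {
  for (unsigned C = 16; C < 32; ++C)
    if (bswap32(X << C) != narrowSwap(X, C))
      return false;
  return true;
}

static_assert(halfWidthFoldHolds(0x11223344u), "");
```

Unlike the swap-shift-swap fold under review, this identity does not need the shift amount to be a multiple of 8, only at least half the bit width.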

llvm/test/CodeGen/RISCV/bswap-bitreverse.ll

	; RV64ZBB-NEXT: and a1, a1, a2			; RV64ZBB-NEXT: and a1, a1, a2
	; RV64ZBB-NEXT: and a0, a0, a2			; RV64ZBB-NEXT: and a0, a0, a2
	; RV64ZBB-NEXT: slli a0, a0, 1			; RV64ZBB-NEXT: slli a0, a0, 1
	; RV64ZBB-NEXT: or a0, a1, a0			; RV64ZBB-NEXT: or a0, a1, a0
	; RV64ZBB-NEXT: ret			; RV64ZBB-NEXT: ret
	;			;
	; RV32ZBKB-LABEL: test_bswap_bitreverse_i16:			; RV32ZBKB-LABEL: test_bswap_bitreverse_i16:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: rev8 a0, a0			; RV32ZBKB-NEXT: slli a0, a0, 16
	; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: rev8 a0, a0
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: srli a0, a0, 16			; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bswap_bitreverse_i16:			; RV64ZBKB-LABEL: test_bswap_bitreverse_i16:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 48
	; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 48			; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i16 @llvm.bswap.i16(i16 %a)			%tmp = call i16 @llvm.bswap.i16(i16 %a)
	%tmp2 = call i16 @llvm.bitreverse.i16(i16 %tmp)			%tmp2 = call i16 @llvm.bitreverse.i16(i16 %tmp)
	ret i16 %tmp2			ret i16 %tmp2
	}			}

	;			;
	; RV32ZBKB-LABEL: test_bswap_bitreverse_i32:			; RV32ZBKB-LABEL: test_bswap_bitreverse_i32:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bswap_bitreverse_i32:			; RV64ZBKB-LABEL: test_bswap_bitreverse_i32:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 32
	; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 32			; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i32 @llvm.bswap.i32(i32 %a)			%tmp = call i32 @llvm.bswap.i32(i32 %a)
	%tmp2 = call i32 @llvm.bitreverse.i32(i32 %tmp)			%tmp2 = call i32 @llvm.bitreverse.i32(i32 %tmp)
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	; RV64ZBB-NEXT: and a1, a1, a2			; RV64ZBB-NEXT: and a1, a1, a2
	; RV64ZBB-NEXT: and a0, a0, a2			; RV64ZBB-NEXT: and a0, a0, a2
	; RV64ZBB-NEXT: slli a0, a0, 1			; RV64ZBB-NEXT: slli a0, a0, 1
	; RV64ZBB-NEXT: or a0, a1, a0			; RV64ZBB-NEXT: or a0, a1, a0
	; RV64ZBB-NEXT: ret			; RV64ZBB-NEXT: ret
	;			;
	; RV32ZBKB-LABEL: test_bitreverse_bswap_i16:			; RV32ZBKB-LABEL: test_bitreverse_bswap_i16:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: rev8 a0, a0			; RV32ZBKB-NEXT: slli a0, a0, 16
	; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: rev8 a0, a0
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: srli a0, a0, 16			; RV32ZBKB-NEXT: srli a0, a0, 16
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bitreverse_bswap_i16:			; RV64ZBKB-LABEL: test_bitreverse_bswap_i16:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 48
	; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 48			; RV64ZBKB-NEXT: srli a0, a0, 48
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i16 @llvm.bitreverse.i16(i16 %a)			%tmp = call i16 @llvm.bitreverse.i16(i16 %a)
	%tmp2 = call i16 @llvm.bswap.i16(i16 %tmp)			%tmp2 = call i16 @llvm.bswap.i16(i16 %tmp)
	ret i16 %tmp2			ret i16 %tmp2
	}			}

	;			;
	; RV32ZBKB-LABEL: test_bitreverse_bswap_i32:			; RV32ZBKB-LABEL: test_bitreverse_bswap_i32:
	; RV32ZBKB: # %bb.0:			; RV32ZBKB: # %bb.0:
	; RV32ZBKB-NEXT: brev8 a0, a0			; RV32ZBKB-NEXT: brev8 a0, a0
	; RV32ZBKB-NEXT: ret			; RV32ZBKB-NEXT: ret
	;			;
	; RV64ZBKB-LABEL: test_bitreverse_bswap_i32:			; RV64ZBKB-LABEL: test_bitreverse_bswap_i32:
	; RV64ZBKB: # %bb.0:			; RV64ZBKB: # %bb.0:
	; RV64ZBKB-NEXT: rev8 a0, a0			; RV64ZBKB-NEXT: slli a0, a0, 32
	; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: rev8 a0, a0
	; RV64ZBKB-NEXT: brev8 a0, a0			; RV64ZBKB-NEXT: brev8 a0, a0
	; RV64ZBKB-NEXT: srli a0, a0, 32			; RV64ZBKB-NEXT: srli a0, a0, 32
	; RV64ZBKB-NEXT: ret			; RV64ZBKB-NEXT: ret
	%tmp = call i32 @llvm.bitreverse.i32(i32 %a)			%tmp = call i32 @llvm.bitreverse.i32(i32 %a)
	%tmp2 = call i32 @llvm.bswap.i32(i32 %tmp)			%tmp2 = call i32 @llvm.bswap.i32(i32 %tmp)
	ret i32 %tmp2			ret i32 %tmp2
	}			}


llvm/test/CodeGen/RISCV/bswap-srli-bswap.ll

; RV64ZB-NEXT: ret
%2 = lshr i16 %1, 7		%2 = lshr i16 %1, 7
%3 = call i16 @llvm.bswap.i16(i16 %2)		%3 = call i16 @llvm.bswap.i16(i16 %2)
ret i16 %3		ret i16 %3
}		}

define i16 @test_bswap_srli_8_bswap_i16(i16 %a) nounwind {		define i16 @test_bswap_srli_8_bswap_i16(i16 %a) nounwind {
; RV32I-LABEL: test_bswap_srli_8_bswap_i16:		; RV32I-LABEL: test_bswap_srli_8_bswap_i16:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: andi a0, a0, 255
; RV32I-NEXT: slli a0, a0, 8		; RV32I-NEXT: slli a0, a0, 8
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_bswap_srli_8_bswap_i16:		; RV64I-LABEL: test_bswap_srli_8_bswap_i16:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: andi a0, a0, 255
; RV64I-NEXT: slli a0, a0, 8		; RV64I-NEXT: slli a0, a0, 8
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32ZB-LABEL: test_bswap_srli_8_bswap_i16:		; RV32ZB-LABEL: test_bswap_srli_8_bswap_i16:
; RV32ZB: # %bb.0:		; RV32ZB: # %bb.0:
; RV32ZB-NEXT: andi a0, a0, 255		; RV32ZB-NEXT: slli a0, a0, 8
; RV32ZB-NEXT: rev8 a0, a0
; RV32ZB-NEXT: srli a0, a0, 16
; RV32ZB-NEXT: ret		; RV32ZB-NEXT: ret
;		;
; RV64ZB-LABEL: test_bswap_srli_8_bswap_i16:		; RV64ZB-LABEL: test_bswap_srli_8_bswap_i16:
; RV64ZB: # %bb.0:		; RV64ZB: # %bb.0:
; RV64ZB-NEXT: andi a0, a0, 255		; RV64ZB-NEXT: slli a0, a0, 8
; RV64ZB-NEXT: rev8 a0, a0
; RV64ZB-NEXT: srli a0, a0, 48
; RV64ZB-NEXT: ret		; RV64ZB-NEXT: ret
%1 = call i16 @llvm.bswap.i16(i16 %a)		%1 = call i16 @llvm.bswap.i16(i16 %a)
%2 = lshr i16 %1, 8		%2 = lshr i16 %1, 8
%3 = call i16 @llvm.bswap.i16(i16 %2)		%3 = call i16 @llvm.bswap.i16(i16 %2)
ret i16 %3		ret i16 %3
}		}

define i32 @test_bswap_srli_8_bswap_i32(i32 %a) nounwind {		define i32 @test_bswap_srli_8_bswap_i32(i32 %a) nounwind {
; RV32I-LABEL: test_bswap_srli_8_bswap_i32:		; RV32I-LABEL: test_bswap_srli_8_bswap_i32:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: srli a1, a0, 8		; RV32I-NEXT: slli a0, a0, 8
; RV32I-NEXT: lui a2, 16
; RV32I-NEXT: addi a2, a2, -256
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: srli a2, a0, 24
; RV32I-NEXT: or a1, a1, a2
; RV32I-NEXT: slli a2, a0, 8
; RV32I-NEXT: lui a3, 4080
; RV32I-NEXT: and a2, a2, a3
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: or a0, a0, a2
; RV32I-NEXT: or a0, a0, a1
; RV32I-NEXT: srli a1, a0, 8
; RV32I-NEXT: slli a1, a1, 24
; RV32I-NEXT: and a2, a0, a3
; RV32I-NEXT: or a1, a1, a2
; RV32I-NEXT: srli a0, a0, 16
; RV32I-NEXT: andi a0, a0, -256
; RV32I-NEXT: or a0, a0, a1
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_bswap_srli_8_bswap_i32:		; RV64I-LABEL: test_bswap_srli_8_bswap_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: srliw a1, a0, 8		; RV64I-NEXT: slliw a0, a0, 8
; RV64I-NEXT: lui a2, 16
; RV64I-NEXT: addiw a2, a2, -256
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: srliw a2, a0, 24
; RV64I-NEXT: or a1, a1, a2
; RV64I-NEXT: slli a2, a0, 8
; RV64I-NEXT: lui a3, 4080
; RV64I-NEXT: and a2, a2, a3
; RV64I-NEXT: slliw a0, a0, 24
; RV64I-NEXT: or a0, a0, a2
; RV64I-NEXT: or a0, a0, a1
; RV64I-NEXT: slli a1, a0, 32
; RV64I-NEXT: srli a1, a1, 32
; RV64I-NEXT: srli a1, a1, 8
; RV64I-NEXT: and a2, a0, a3
; RV64I-NEXT: slliw a1, a1, 24
; RV64I-NEXT: or a1, a1, a2
; RV64I-NEXT: srliw a0, a0, 16
; RV64I-NEXT: andi a0, a0, -256
; RV64I-NEXT: or a0, a0, a1
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32ZB-LABEL: test_bswap_srli_8_bswap_i32:		; RV32ZB-LABEL: test_bswap_srli_8_bswap_i32:
; RV32ZB: # %bb.0:		; RV32ZB: # %bb.0:
; RV32ZB-NEXT: rev8 a0, a0		; RV32ZB-NEXT: slli a0, a0, 8
; RV32ZB-NEXT: srli a0, a0, 8
; RV32ZB-NEXT: rev8 a0, a0
; RV32ZB-NEXT: ret		; RV32ZB-NEXT: ret
;		;
; RV64ZB-LABEL: test_bswap_srli_8_bswap_i32:		; RV64ZB-LABEL: test_bswap_srli_8_bswap_i32:
; RV64ZB: # %bb.0:		; RV64ZB: # %bb.0:
; RV64ZB-NEXT: rev8 a0, a0		; RV64ZB-NEXT: slliw a0, a0, 8
; RV64ZB-NEXT: srli a0, a0, 40
; RV64ZB-NEXT: rev8 a0, a0
; RV64ZB-NEXT: srli a0, a0, 32
; RV64ZB-NEXT: ret		; RV64ZB-NEXT: ret
%1 = call i32 @llvm.bswap.i32(i32 %a)		%1 = call i32 @llvm.bswap.i32(i32 %a)
%2 = lshr i32 %1, 8		%2 = lshr i32 %1, 8
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define i32 @test_bswap_srli_16_bswap_i32(i32 %a) nounwind {		define i32 @test_bswap_srli_16_bswap_i32(i32 %a) nounwind {
; RV32I-LABEL: test_bswap_srli_16_bswap_i32:		; RV32I-LABEL: test_bswap_srli_16_bswap_i32:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: srli a1, a0, 8
; RV32I-NEXT: lui a2, 16
; RV32I-NEXT: addi a2, a2, -256
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: srli a2, a0, 24
; RV32I-NEXT: or a1, a1, a2
; RV32I-NEXT: slli a2, a0, 8
; RV32I-NEXT: lui a3, 4080
; RV32I-NEXT: and a2, a2, a3
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: or a0, a0, a2
; RV32I-NEXT: or a0, a0, a1
; RV32I-NEXT: srli a1, a0, 16
; RV32I-NEXT: slli a1, a1, 24
; RV32I-NEXT: srli a0, a0, 24
; RV32I-NEXT: slli a0, a0, 16		; RV32I-NEXT: slli a0, a0, 16
; RV32I-NEXT: or a0, a1, a0
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_bswap_srli_16_bswap_i32:		; RV64I-LABEL: test_bswap_srli_16_bswap_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: srliw a1, a0, 8		; RV64I-NEXT: slliw a0, a0, 16
; RV64I-NEXT: lui a2, 16
; RV64I-NEXT: addiw a2, a2, -256
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: srliw a2, a0, 24
; RV64I-NEXT: or a1, a1, a2
; RV64I-NEXT: slli a2, a0, 8
; RV64I-NEXT: lui a3, 4080
; RV64I-NEXT: and a2, a2, a3
; RV64I-NEXT: slli a0, a0, 24
; RV64I-NEXT: or a0, a0, a2
; RV64I-NEXT: or a0, a0, a1
; RV64I-NEXT: slli a0, a0, 32
; RV64I-NEXT: srli a0, a0, 32
; RV64I-NEXT: srli a1, a0, 16
; RV64I-NEXT: srliw a0, a0, 24
; RV64I-NEXT: slli a0, a0, 16
; RV64I-NEXT: slliw a1, a1, 24
; RV64I-NEXT: or a0, a1, a0
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32ZB-LABEL: test_bswap_srli_16_bswap_i32:		; RV32ZB-LABEL: test_bswap_srli_16_bswap_i32:
; RV32ZB: # %bb.0:		; RV32ZB: # %bb.0:
; RV32ZB-NEXT: rev8 a0, a0		; RV32ZB-NEXT: slli a0, a0, 16
; RV32ZB-NEXT: srli a0, a0, 16
; RV32ZB-NEXT: rev8 a0, a0
; RV32ZB-NEXT: ret		; RV32ZB-NEXT: ret
;		;
; RV64ZB-LABEL: test_bswap_srli_16_bswap_i32:		; RV64ZB-LABEL: test_bswap_srli_16_bswap_i32:
; RV64ZB: # %bb.0:		; RV64ZB: # %bb.0:
; RV64ZB-NEXT: rev8 a0, a0		; RV64ZB-NEXT: slliw a0, a0, 16
; RV64ZB-NEXT: srli a0, a0, 48
; RV64ZB-NEXT: rev8 a0, a0
; RV64ZB-NEXT: srli a0, a0, 32
; RV64ZB-NEXT: ret		; RV64ZB-NEXT: ret
%1 = call i32 @llvm.bswap.i32(i32 %a)		%1 = call i32 @llvm.bswap.i32(i32 %a)
%2 = lshr i32 %1, 16		%2 = lshr i32 %1, 16
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define i32 @test_bswap_srli_24_bswap_i32(i32 %a) nounwind {		define i32 @test_bswap_srli_24_bswap_i32(i32 %a) nounwind {
; RV32I-LABEL: test_bswap_srli_24_bswap_i32:		; RV32I-LABEL: test_bswap_srli_24_bswap_i32:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: slli a0, a0, 24		; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_bswap_srli_24_bswap_i32:		; RV64I-LABEL: test_bswap_srli_24_bswap_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: slliw a0, a0, 24		; RV64I-NEXT: slliw a0, a0, 24
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32ZB-LABEL: test_bswap_srli_24_bswap_i32:		; RV32ZB-LABEL: test_bswap_srli_24_bswap_i32:
; RV32ZB: # %bb.0:		; RV32ZB: # %bb.0:
; RV32ZB-NEXT: andi a0, a0, 255		; RV32ZB-NEXT: slli a0, a0, 24
; RV32ZB-NEXT: rev8 a0, a0
; RV32ZB-NEXT: ret		; RV32ZB-NEXT: ret
;		;
; RV64ZB-LABEL: test_bswap_srli_24_bswap_i32:		; RV64ZB-LABEL: test_bswap_srli_24_bswap_i32:
; RV64ZB: # %bb.0:		; RV64ZB: # %bb.0:
; RV64ZB-NEXT: andi a0, a0, 255		; RV64ZB-NEXT: slliw a0, a0, 24
; RV64ZB-NEXT: rev8 a0, a0
; RV64ZB-NEXT: srli a0, a0, 32
; RV64ZB-NEXT: ret		; RV64ZB-NEXT: ret
%1 = call i32 @llvm.bswap.i32(i32 %a)		%1 = call i32 @llvm.bswap.i32(i32 %a)
%2 = lshr i32 %1, 24		%2 = lshr i32 %1, 24
%3 = call i32 @llvm.bswap.i32(i32 %2)		%3 = call i32 @llvm.bswap.i32(i32 %2)
ret i32 %3		ret i32 %3
}		}

define i64 @test_bswap_srli_48_bswap_i64(i64 %a) nounwind {		define i64 @test_bswap_srli_48_bswap_i64(i64 %a) nounwind {
; RV32I-LABEL: test_bswap_srli_48_bswap_i64:		; RV32I-LABEL: test_bswap_srli_48_bswap_i64:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: srli a1, a0, 8		; RV32I-NEXT: slli a1, a0, 16
; RV32I-NEXT: lui a2, 16
; RV32I-NEXT: addi a2, a2, -256
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: srli a2, a0, 24
; RV32I-NEXT: or a1, a1, a2
; RV32I-NEXT: slli a2, a0, 8
; RV32I-NEXT: lui a3, 4080
; RV32I-NEXT: and a2, a2, a3
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: or a0, a0, a2
; RV32I-NEXT: or a0, a0, a1
; RV32I-NEXT: srli a1, a0, 16
; RV32I-NEXT: slli a1, a1, 24
; RV32I-NEXT: srli a0, a0, 24
; RV32I-NEXT: slli a0, a0, 16
; RV32I-NEXT: or a1, a1, a0
; RV32I-NEXT: li a0, 0		; RV32I-NEXT: li a0, 0
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_bswap_srli_48_bswap_i64:		; RV64I-LABEL: test_bswap_srli_48_bswap_i64:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: srli a1, a0, 24
; RV64I-NEXT: lui a2, 4080
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: srli a2, a0, 8
; RV64I-NEXT: li a3, 255
; RV64I-NEXT: slli a4, a3, 24
; RV64I-NEXT: and a2, a2, a4
; RV64I-NEXT: or a1, a2, a1
; RV64I-NEXT: srli a2, a0, 40
; RV64I-NEXT: lui a4, 16
; RV64I-NEXT: addiw a4, a4, -256
; RV64I-NEXT: and a2, a2, a4
; RV64I-NEXT: srli a4, a0, 56
; RV64I-NEXT: or a2, a2, a4
; RV64I-NEXT: or a1, a1, a2
; RV64I-NEXT: slli a2, a0, 24
; RV64I-NEXT: slli a4, a3, 40
; RV64I-NEXT: and a2, a2, a4
; RV64I-NEXT: srliw a4, a0, 24
; RV64I-NEXT: slli a4, a4, 32
; RV64I-NEXT: or a2, a2, a4
; RV64I-NEXT: slli a4, a0, 40
; RV64I-NEXT: slli a3, a3, 48
; RV64I-NEXT: and a3, a4, a3
; RV64I-NEXT: slli a0, a0, 56
; RV64I-NEXT: or a0, a0, a3
; RV64I-NEXT: or a0, a0, a2
; RV64I-NEXT: or a0, a0, a1
; RV64I-NEXT: srli a1, a0, 48
; RV64I-NEXT: slli a1, a1, 56
; RV64I-NEXT: srli a0, a0, 56
; RV64I-NEXT: slli a0, a0, 48		; RV64I-NEXT: slli a0, a0, 48
; RV64I-NEXT: or a0, a1, a0
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32ZB-LABEL: test_bswap_srli_48_bswap_i64:		; RV32ZB-LABEL: test_bswap_srli_48_bswap_i64:
; RV32ZB: # %bb.0:		; RV32ZB: # %bb.0:
; RV32ZB-NEXT: rev8 a0, a0		; RV32ZB-NEXT: slli a1, a0, 16
; RV32ZB-NEXT: srli a0, a0, 16
; RV32ZB-NEXT: rev8 a1, a0
; RV32ZB-NEXT: li a0, 0		; RV32ZB-NEXT: li a0, 0
; RV32ZB-NEXT: ret		; RV32ZB-NEXT: ret
;		;
; RV64ZB-LABEL: test_bswap_srli_48_bswap_i64:		; RV64ZB-LABEL: test_bswap_srli_48_bswap_i64:
; RV64ZB: # %bb.0:		; RV64ZB: # %bb.0:
; RV64ZB-NEXT: rev8 a0, a0		; RV64ZB-NEXT: slli a0, a0, 48
; RV64ZB-NEXT: srli a0, a0, 48
; RV64ZB-NEXT: rev8 a0, a0
; RV64ZB-NEXT: ret		; RV64ZB-NEXT: ret
%1 = call i64 @llvm.bswap.i64(i64 %a)		%1 = call i64 @llvm.bswap.i64(i64 %a)
%2 = lshr i64 %1, 48		%2 = lshr i64 %1, 48
%3 = call i64 @llvm.bswap.i64(i64 %2)		%3 = call i64 @llvm.bswap.i64(i64 %2)
ret i64 %3		ret i64 %3
}		}
No newline at end of file

llvm/test/CodeGen/X86/combine-bswap.ll

	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%b = call i32 @llvm.bswap.i32(i32 %a0)			%b = call i32 @llvm.bswap.i32(i32 %a0)
	%c = call i32 @llvm.bswap.i32(i32 %b)			%c = call i32 @llvm.bswap.i32(i32 %b)
	ret i32 %c			ret i32 %c
	}			}

	; TODO: fold (bswap(srl (bswap c), x)) -> (shl c, x)
	define i32 @test_bswap_srli_8_bswap_i32(i32 %a) nounwind {			define i32 @test_bswap_srli_8_bswap_i32(i32 %a) nounwind {
	; X86-LABEL: test_bswap_srli_8_bswap_i32:			; X86-LABEL: test_bswap_srli_8_bswap_i32:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: bswapl %eax			; X86-NEXT: shll $8, %eax
	; X86-NEXT: shrl $8, %eax
	; X86-NEXT: bswapl %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test_bswap_srli_8_bswap_i32:			; X64-LABEL: test_bswap_srli_8_bswap_i32:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: bswapl %eax			; X64-NEXT: shll $8, %eax
	; X64-NEXT: shrl $8, %eax
	; X64-NEXT: bswapl %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%1 = call i32 @llvm.bswap.i32(i32 %a)			%1 = call i32 @llvm.bswap.i32(i32 %a)
	%2 = lshr i32 %1, 8			%2 = lshr i32 %1, 8
	%3 = call i32 @llvm.bswap.i32(i32 %2)			%3 = call i32 @llvm.bswap.i32(i32 %2)
	ret i32 %3			ret i32 %3
	}			}

	define i32 @test_demandedbits_bswap(i32 %a0) nounwind {			define i32 @test_demandedbits_bswap(i32 %a0) nounwind {


[DAGCombine] fold (bswap(srl (bswap c), 8*x)) -> (shl c, 8*x) · Abandoned · Public
