This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG][RISCV][X86][AArch64][AMDGPU][PowerPC] Improve SimplifyDemandedBits for SHL with NUW/NSW flags.
Needs Revision · Public

Authored by liaolucy on Dec 25 2022, 7:59 PM.

Details

Summary

If the shift has the NUW/NSW flags, then it does demand the high bits of its operand.

Diff Detail

Unit Tests: Failed

Event Timeline

liaolucy created this revision.Dec 25 2022, 7:59 PM
liaolucy requested review of this revision.Dec 25 2022, 7:59 PM

Is this a case missing from DAGCombiner::BackwardsPropagateMask?

> Is this a case missing from DAGCombiner::BackwardsPropagateMask?

In fact, I would prefer the following modification:

diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index dbf318a85e9e..76aa3c6d7623 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1761,6 +1761,8 @@ bool TargetLowering::SimplifyDemandedBits(
       }
 
       APInt InDemandedMask = DemandedBits.lshr(ShAmt);
+      if (Op.getNode()->getFlags().hasNoUnsignedWrap())
+        InDemandedMask.setHighBits(ShAmt);
       if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                                Depth + 1))
         return true;

But this modification is not friendly for X86.

If you have other suggestions, please let me know.

What prevents SimplifyDemandedBits from turning this back into an aextload and causing an infinite loop?

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
10313 ↗(On Diff #485264)

Need to check that the SHL is the only user of the load, or this will duplicate the load.

10317 ↗(On Diff #485264)

You need to replace the chain output of the original load with the chain output of the new load.
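
For illustration only, a minimal sketch of what that looks like when a combine rebuilds an aextload as a zextload (the variable names and surrounding combine here are assumed, not the code in this diff):

  // Build the replacement zextload from the original load's operands.
  LoadSDNode *Ld = cast<LoadSDNode>(N0);
  SDValue NewLoad =
      DAG.getExtLoad(ISD::ZEXTLOAD, SDLoc(Ld), Ld->getValueType(0),
                     Ld->getChain(), Ld->getBasePtr(), Ld->getMemoryVT(),
                     Ld->getMemOperand());
  // Rewire the old load's chain result (value #1) to the new load's chain
  // so dependent memory operations keep their ordering.
  DAG.ReplaceAllUsesOfValueWith(SDValue(Ld, 1), NewLoad.getValue(1));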

craig.topper requested changes to this revision.Dec 25 2022, 10:31 PM
This revision now requires changes to proceed.Dec 25 2022, 10:31 PM

@spatel @RKSimon @lebedev.ri is it a bug that we are neither demanding the upper bits for nuw/nsw nor removing the poison flags if SimplifyDemandedBits returns true?

> @spatel @RKSimon @lebedev.ri is it a bug that we are neither demanding the upper bits for nuw/nsw nor removing the poison flags if SimplifyDemandedBits returns true?

FMF are pretty rare in DAG, so this could be a simple oversight. It does seem like we should drop poison flags in that case, yes.

liaolucy updated this revision to Diff 485325.Dec 26 2022, 5:43 PM
liaolucy retitled this revision from [RISCV] Add DAG combine to fold (shl nuw (aextload), C) -> (shl nuw (zextload), C). to [SelectionDAG][RISCV][X86][AArch64][AMDGPU][PowerPC] Improve SimplifyDemandedBits for SHL with NUW/NSW flags..
liaolucy edited the summary of this revision. (Show Details)

Improve SimplifyDemandedBits for SHL with NUW/NSW flags.

lebedev.ri requested changes to this revision.Dec 26 2022, 5:53 PM

I think it shouldn't demand them. The only reason it "needs" them is to satisfy its poison-generating flags. I think it would be better to keep not demanding them, and drop the poison-generating flags. For example, consider:

%y = and %x, 15
%z = shl nuw %y, 4
 =>
%z = shl %x, 4

Now if you demand the high bits (and only because of nuw!), you suddenly can't look past the and.

This revision now requires changes to proceed.Dec 26 2022, 5:53 PM

You do have a point; in fact I just want to optimize RISCV. I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support, can you help explain why?

> You do have a point; in fact I just want to optimize RISCV.

Having re-read the whole thread, it is not obvious to me what original problem this is trying to solve.

> I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support, can you help explain why?

Back-end optimizations, including this one, mainly deal with optimization opportunities that arise during instruction lowering/legalization, and with other sequences that are not well exploitable in generic IR. OTOH, the middle end is where optimizations should happen in general.
It's likely the InstCombineSimplifyDemanded code has the same bug.

It's a bug if we are both propagating flags and not accounting for them - we've definitely caught cases like that in IR.

I agree that it seems like the better choice would be to ignore/drop flags (the alternative is that flags could penalize optimization). But it's also possible that changing it could cause missed folds because we lost information by dropping flags. We probably just need to experiment and see what falls out from it.
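
For reference, a minimal sketch of the flag-dropping alternative being discussed, applied to the SHL case of TargetLowering::SimplifyDemandedBits quoted earlier (illustrative only, not part of this patch):

  APInt InDemandedMask = DemandedBits.lshr(ShAmt);
  if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                           Depth + 1)) {
    // The simplified operand may no longer satisfy NUW/NSW, so drop the
    // flags instead of demanding the shifted-out high bits.
    SDNodeFlags Flags = Op.getNode()->getFlags();
    Flags.setNoUnsignedWrap(false);
    Flags.setNoSignedWrap(false);
    Op.getNode()->setFlags(Flags);
    return true;
  }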

arsenm added a subscriber: arsenm.Dec 27 2022, 9:01 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/shl.ll
492

This is worse

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll
48

This is obviously better, but it's also at -O0.

I'm not sure what this is trying to fix originally, but I think the test changes clearly show that we should be going in the opposite direction.

llvm/test/CodeGen/AArch64/arm64-shifted-sext.ll
199

This seems to be a regression

llvm/test/CodeGen/AMDGPU/shl.ll
492

This seems to be a regression

llvm/test/CodeGen/RISCV/rv64i-complex-float.ll
23

This seems to be a regression

llvm/test/CodeGen/X86/parity.ll
404

This seems to be a massive regression

533

This seems to be a massive regression

657

This seems to be a massive regression

llvm/test/CodeGen/X86/setcc.ll
76

This seems to be a regression

llvm/test/CodeGen/X86/split-store.ll
176

This seems to be a regression

197

This seems to be a regression

217

This seems to be a regression

llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
151

This seems to be a regression

> I'm not sure what this is trying to fix originally, but I think the test changes clearly show that we should be going in the opposite direction.

I'll try to fill in the original problem. See the test in aext-to-zext.ll

We initially have this DAG:

Initial selection DAG: %bb.0 'read:entry'                                        
SelectionDAG has 19 nodes:                                                       
  t0: ch,glue = EntryToken                                                       
  t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
  t3: i64 = Constant<0>                                                          
  t11: i16 = Constant<8>                                                         
              t8: i64 = add nuw t2, Constant:i64<1>                              
            t9: i8,ch = load<(load (s8) from %ir.arrayidx1)> t0, t8, undef:i64   
          t10: i16 = zero_extend t9                                              
        t13: i16 = shl nuw t10, Constant:i64<8>                                  
          t5: i8,ch = load<(load (s8) from %ir.adr)> t0, t2, undef:i64           
        t6: i16 = zero_extend t5                                                 
      t14: i16 = or t13, t6                                                      
    t15: i64 = zero_extend t14                                                   
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15                            
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1

Part way through the first DAGCombine, t10 zero_extend is replaced by an any_extend because the extended bits aren't demanded. The users are a shl by 8 and a zero_extend that reads bits 63:16 of the shl.

The extends get combined with the loads, leaving this DAG:

Optimized lowered selection DAG: %bb.0 'read:entry'                              
SelectionDAG has 15 nodes:                                                       
  t0: ch,glue = EntryToken                                                       
  t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
            t8: i64 = add nuw t2, Constant:i64<1>                                
          t20: i16,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
        t13: i16 = shl nuw t20, Constant:i64<8>                                  
        t21: i16,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
      t14: i16 = or t13, t21                                                     
    t15: i64 = zero_extend t14                                                   
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15                            
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1

We type-legalize and optimize to this:

SelectionDAG has 16 nodes:                                                       
  t0: ch,glue = EntryToken                                                       
  t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
            t8: i64 = add nuw t2, Constant:i64<1>                                
          t22: i64,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
        t23: i64 = shl nuw t22, Constant:i64<8>                                  
        t24: i64,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
      t25: i64 = or t23, t24                                                     
    t27: i64 = and t25, Constant:i64<65535>                                      
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t27                            
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1

The t27 'and' would be unnecessary if we had two zextloads instead of a zextload and an aextload.

This patch happens to prevent the original zero_extend from becoming an any_extend. That solves the problem, but I don't think it's robust.