This is an archive of the discontinued LLVM Phabricator instance.

[bpf] zero extension is required in BPF implementation so remove <<=32 >>=32
ClosedPublic

Authored by jrfastab on Feb 4 2020, 11:36 AM.

Details

Summary
The current pattern matching for zext results in the following code snippet
being produced,

  w1 = w0
  r1 <<= 32
  r1 >>= 32

Because BPF implementations require zero extension on 32-bit loads, this not
only adds a few unneeded instructions but also makes it harder for the
verifier to track the r1 register bounds. For example, in the verifier trace
below the R2 offset is unknown at the end of the snippet. If the bounds were
tracked correctly, w1 would have the same bounds as r8. R8's smax is less
than the U32 max value, so a zero-extending load should keep the same value.
Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to an off=0, as seen
in R7, should create a max offset of 800. However, at the end of the snippet
the R2 max offset is 0xffffFFFF.

  R0=inv(id=0,smax_value=800)
  R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
  R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9=inv800 R10=fp0 fp-8=mmmm????
 58: (1c) w9 -= w8
 59: (bc) w1 = w8
 60: (67) r1 <<= 32
 61: (77) r1 >>= 32
 62: (bf) r2 = r7
 63: (0f) r2 += r1
 64: (bf) r1 = r6
 65: (bc) w3 = w9
 66: (b7) r4 = 0
 67: (85) call bpf_get_stack#67
  R0=inv(id=0,smax_value=800)
  R1_w=ctx(id=0,off=0,imm=0)
  R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
  R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R10=fp0 fp-8=mmmm????

After this patch the R1 bounds are no longer smashed by the <<=32 >>=32 shifts and we
get the correct bound on R2, umax_value=800.

Further it reduces 3 insns to 1.

Diff Detail

Event Timeline

jrfastab created this revision. Feb 4 2020, 11:36 AM

We can't commit this IMO until we resolve the selftests in the kernel-side verifier; this introduces some failures. At the moment it is causing some new failures in existing test code.

> We can't commit this IMO until we resolve the selftests in the kernel-side verifier; this introduces some failures. At the moment it is causing some new failures in existing test code.

What kind of version compatibility is there between the kernel verifier and LLVM/clang? Does the latest kernel RC only work with LLVM git?

You should add a test.

@jrfastab I added an isTruncateFree() callback in the BPF IR lowering (ISel) phase. See https://reviews.llvm.org/D74101. This should help with the case where packet data/data_end is truncated with a 32-bit MOV.
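
For reference, isTruncateFree() is a TargetLowering hook that tells instruction selection that truncating a wider integer type to a narrower one costs nothing on the target. Below is a minimal sketch of such an override, modeled on how other backends implement the hook; it is illustrative only, the actual BPF change is the one in D74101.

// BPFISelLowering.cpp (sketch); the matching declaration in
// BPFISelLowering.h would carry 'override'.
bool BPFTargetLowering::isTruncateFree(Type *Ty1, Type *Ty2) const {
  // Truncating a wider integer to a narrower one is free on BPF: the
  // 32-bit subregister (wN) of a 64-bit register (rN) can be used directly.
  if (!Ty1->isIntegerTy() || !Ty2->isIntegerTy())
    return false;
  return Ty1->getIntegerBitWidth() > Ty2->getIntegerBitWidth();
}

With a hook like this in place, ISel can favor the 32-bit subregister for truncation instead of emitting extra instructions.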

@yonghong-song @ast Now that we have proper alu32 fixes on the kernel side, how about we add this? We've been running with it for a couple of months now.

ast added a comment. Apr 8 2020, 8:50 AM

> @yonghong-song @ast Now that we have proper alu32 fixes on the kernel side, how about we add this? We've been running with it for a couple of months now.

Yeah, let's land it, but please add a test first.

@jrfastab Agree with Alexei; please add a test. Thanks.

jrfastab updated this revision to Diff 265774. May 22 2020, 10:56 AM
jrfastab edited the summary of this revision.

Added zext test.

jrfastab updated this revision to Diff 265775. May 22 2020, 11:03 AM

Forgot to commit the int -> unsigned int change in the test function needed to get the expected zext instead of a sext; pushing now.

Thanks! The change looks good. There are 5 more tests that need adjustment with this optimization. Could you fix them as well?

Failing Tests (5):
  LLVM :: CodeGen/BPF/32-bit-subreg-cond-select.ll
  LLVM :: CodeGen/BPF/32-bit-subreg-peephole-phi-1.ll
  LLVM :: CodeGen/BPF/32-bit-subreg-peephole-phi-2.ll
  LLVM :: CodeGen/BPF/32-bit-subreg-peephole-phi-3.ll
  LLVM :: CodeGen/BPF/32-bit-subreg-peephole.ll
jrfastab added a comment (edited). May 25 2020, 9:22 AM

@yonghong-song I can update the tests, but I'm not completely following the CHECKs in 32-bit-subreg-peephole.ll.

; long long select_u(unsigned a, unsigned b, long long c, long long d)
; {
;   if (a > b)
;     return c;
;   else
;     return d;
; }

with corresponding checks,

; Function Attrs: norecurse nounwind readnone
define dso_local i64 @select_u(i32 %a, i32 %b, i64 %c, i64 %d) local_unnamed_addr #0 {
; CHECK-LABEL: select_u:
entry:
  %cmp = icmp ugt i32 %a, %b
  %c.d = select i1 %cmp, i64 %c, i64 %d
; CHECK: r{{[0-9]+}} <<= 32
; CHECK-NEXT: r{{[0-9]+}} >>= 32
; CHECK: if r{{[0-9]+}} {{<|>}} r{{[0-9]+}} goto
  ret i64 %c.d
}

A couple of questions, if you happen to know the answers it could save some time on my side. First, why does this not use a 32-bit compare? %a is 32 bits and %b is 32 bits, so it seems a JMP32 instruction should work fine. Here is the assembly generated,

# %bb.0:                                # %entry
        r0 = r3
        r1 <<= 32
        r1 >>= 32
        r2 <<= 32
        r2 >>= 32
        if r1 > r2 goto LBB0_2
# %bb.1:                                # %entry
        r0 = r4
LBB0_2:                                 # %entry
        exit
.Lfunc_end0:

Next, with the patch I created here we drop the zext, but it still does not use the alu32 ops. Here is the asm,

# %bb.0:                                # %entry
        r0 = r3
        if r1 > r2 goto LBB0_2
# %bb.1:                                # %entry
        r0 = r4
LBB0_2:                                 # %entry
        exit

This is probably not correct, because how do we "know" the upper bits are zero? It seems we still need to zero them. Looking at the passes before peephole optimization,

# *** IR Dump After BPF PreEmit Checking ***:
# Machine code for function select_u: NoPHIs, TracksLiveness, NoVRegs
Function Live Ins: $w1, $w2, $r3, $r4

bb.0.entry:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $r3, $r4, $w1, $w2
  $r0 = MOV_rr $r3
  $r1 = MOV_32_64 killed $w1
  $r2 = MOV_32_64 killed $w2
  JUGT_rr killed $r1, killed $r2, %bb.2

bb.1.entry:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)
  liveins: $r4
  $r0 = MOV_rr killed $r4

bb.2.entry:
; predecessors: %bb.0, %bb.1
  liveins: $r0
  RET implicit $r0

# End machine code for function select_u.

And then after optimization,

# *** IR Dump After BPF PreEmit Peephole Optimization ***:
# Machine code for function select_u: NoPHIs, TracksLiveness, NoVRegs
Function Live Ins: $w1, $w2, $r3, $r4

bb.0.entry:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $r3, $r4, $w1, $w2
  $r0 = MOV_rr $r3
  JUGT_rr killed $r1, killed $r2, %bb.2

bb.1.entry:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)
  liveins: $r4
  $r0 = MOV_rr killed $r4

bb.2.entry:
; predecessors: %bb.0, %bb.1
  liveins: $r0
  RET implicit $r0

# End machine code for function select_u.

It seems something went wrong here. I'll look into it tomorrow, but it's not what I expected in two respects: first, I expected a jmp32 op, and second, that last optimization seems wrong.

jrfastab added a comment (edited). May 25 2020, 9:25 AM

To partially answer one of the questions: it seems jmp32 is only used with -mcpu=v3, so that explains its absence above, but it still seems like -mcpu=v2 -mattr=+alu32 needs to be fixed to work with my patch.

> It seems something went wrong here. I'll look into it tomorrow, but it's not what I expected in two respects: first, I expected a jmp32 op, and second, that last optimization seems wrong.

I checked the code. I agree with you that the optimization doesn't seem right.
In BPFMIPeephole.cpp, we have

// Eliminate identical move:
//
//   MOV rA, rA
//
// This is particularly possible to happen when sub-register support
// enabled. The special type cast insn MOV_32_64 involves different
// register class on src (i32) and dst (i64), RA could generate useless
// instruction due to this.
unsigned Opcode = MI.getOpcode();
if (Opcode == BPF::MOV_32_64 ||
    Opcode == BPF::MOV_rr || Opcode == BPF::MOV_rr_32) {
  Register dst = MI.getOperand(0).getReg();
  Register src = MI.getOperand(1).getReg();

  if (Opcode == BPF::MOV_32_64)
    dst = TRI->getSubReg(dst, BPF::sub_32);

  if (dst != src)
    continue;

  ToErase = &MI;
  RedundantMovElemNum++;
  Eliminated = true;
}

MOV_32_64 is actually problematic here: it is not a simple register copy, it has a side effect.
The same goes for MOV_rr_32; it also has the side effect of zeroing out the top bits. We cannot
simply remove them.

What we need to do is further trace the def/use chain; unless the 32-bit reg already has its
upper bits zeroed, we should not do the optimization.

Will work on a fix ASAP. Thanks for spotting the problem!
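
To illustrate the kind of def/use check that would be needed, here is a purely hypothetical sketch for the pre-emit peephole. The helper name and exact condition are illustrative only; the fix actually folded into this patch simply stops eliminating these opcodes, as the diff in the next comment shows.

// Hypothetical helper for BPFMIPreEmitPeephole (sketch only): before erasing
// a MOV_32_64 whose destination sub-register equals its source, check that
// the 32-bit source was last defined by an instruction that already zeroed
// the upper 32 bits of the full register.
static bool upperBitsAlreadyZero(const MachineInstr &MI, Register Src32,
                                 const TargetRegisterInfo *TRI) {
  const MachineBasicBlock &MBB = *MI.getParent();
  const MachineInstr *LastDef = nullptr;
  for (const MachineInstr &I : MBB) {
    if (&I == &MI)
      break;                        // only consider instructions above MI
    if (I.modifiesRegister(Src32, TRI))
      LastDef = &I;                 // remember the most recent def of Src32
  }
  if (!LastDef)
    return false;                   // no def in this block: stay conservative
  unsigned Op = LastDef->getOpcode();
  // On BPF, both of these already clear the upper 32 bits, so a following
  // MOV_32_64 of the same register would be a no-op.
  return Op == BPF::MOV_32_64 || Op == BPF::MOV_rr_32;
}

Only when such a check returns true would it be safe to erase the copy; otherwise the MOV_32_64 is doing real work (zeroing the top bits) and must stay.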

yonghong-song added a comment (edited). May 26 2020, 12:19 AM

I think the following are the steps we should take:

  1. We need to fold the following change into this commit:
diff --git a/llvm/lib/Target/BPF/BPFMIPeephole.cpp b/llvm/lib/Target/BPF/BPFMIPeephole.cpp 
index a2ceade6680..fe955fad042 100644 
--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp 
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp 
@@ -301,19 +301,16 @@ bool BPFMIPreEmitPeephole::eliminateRedundantMov(void) { 
       // 
       //   MOV rA, rA 
       // 
-      // This is particularly possible to happen when sub-register support 
-      // enabled. The special type cast insn MOV_32_64 involves different 
-      // register class on src (i32) and dst (i64), RA could generate useless 
-      // instruction due to this. 
+      // Note that we cannot remove 
+      //   MOV_32_64  rA, wA 
+      //   MOV_rr_32  wA, wA 
+      // as these two instructions having side effects, zeroing out 
+      // top 32 bits of rA. 
       unsigned Opcode = MI.getOpcode(); 
-      if (Opcode == BPF::MOV_32_64 || 
-          Opcode == BPF::MOV_rr || Opcode == BPF::MOV_rr_32) { 
+      if (Opcode == BPF::MOV_rr) { 
         Register dst = MI.getOperand(0).getReg(); 
         Register src = MI.getOperand(1).getReg(); 
  
-        if (Opcode == BPF::MOV_32_64) 
-          dst = TRI->getSubReg(dst, BPF::sub_32); 
- 
         if (dst != src) 
           continue;

We did not have this problem before since MOV_32_64 was always generated together with shifts, and that was taken care of by the SSA peephole optimization.

But since this patch may generate MOV_32_64 without shifts, we need to make this change.

@jrfastab could you take care of this?

  2. Optionally, we need to enhance the SSA peephole optimization for the case of MOV_32_64 without shifts. This is only one insn since there are no shifts. We can do it later if we do not want to do it in this patch.

@jrfastab you can take a look at BPFMIPeephole.cpp to see whether you want to do the optimization now or delay it until later.

@yonghong-song I can merge the fix with this patch, but why do we eliminate MOV_rr? I'm trying to see where/why this case would happen; I'm not seeing other backends with similar logic. Could we just remove the entire block for all cases?

> @yonghong-song I can merge the fix with this patch, but why do we eliminate MOV_rr? I'm trying to see where/why this case would happen; I'm not seeing other backends with similar logic. Could we just remove the entire block for all cases?

No particular reason. Just want to be cautious. I agree that MOV_rr is really unlikely to be generated by the compiler, but with 32-bit subregisters I'm not 100% sure. Could you check with the kernel selftest BPF programs? If any program hits MOV_rr, I suggest keeping it. Otherwise, we can remove it.

jrfastab updated this revision to Diff 266368. May 26 2020, 4:09 PM

Update BPFMIPeephole.cpp to remove the incorrect MOV_32_64 and MOV_rr_32 instruction elimination,
which should not be done because these insns have side effects. I left the MOV_rr case in
place for now but may remove it later, after doing some more digging to see whether this case
ever pops up in our code.

Also updated the tests to catch the zext now using MOV_* instructions. I converted the CHECK <<=32 >>=32
cases to CHECK-NOT to ensure we no longer generate these lines.

yonghong-song accepted this revision. May 26 2020, 5:48 PM

LGTM. Thanks! @jrfastab Do you have "git push" permission? If yes, you can directly push the change to trunk. Otherwise, I can help push the patch in. Let me know.

This revision is now accepted and ready to land. May 26 2020, 5:48 PM

@yonghong-song No, I don't have "git push" permission. I guess I could request it if it would be helpful, but for now it might be best if you could push for me. Thanks.

Sounds good. I will git push the patch tomorrow.

This revision was automatically updated to reflect the committed changes.