This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add a new pass to fold extension into load instructions in previous BB
Needs Review · Public

Authored by Carrot on May 24 2018, 2:16 PM.


Details

Reviewers

craig.topper
spatel
chandlerc
RKSimon

Summary

Compiling the attached new test case with current LLVM generates:

movzwl  a, %esi          // %esi already contains zero extended value
calll   v1  
jmp     .LBB0_3

.LBB0_2:

movzwl  a+2, %esi    // %esi already contains zero extended value
calll   v2

.LBB0_3:

movzwl  %si, %eax    // another zext, we should avoid it.
cmpl    $4, %eax
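For orientation, a source-level function of roughly the following shape leads to the control flow in the listing above. This is a hypothetical reconstruction from the IR in test/CodeGen/X86/fold-xbb-ext-load.ll, not the author's original source. The stub bodies and call counters are assumptions added only so the sketch can run on its own; in the test, a, v1, v2 and v3 are external declarations.

```cpp
#include <cstdint>

// Hypothetical reconstruction; names mirror the IR in the test file.
uint16_t a[10];

// Stubs that record how often each callee ran (assumption, for testing only).
int calls_v1 = 0, calls_v2 = 0, calls_v3 = 0;
void v1() { ++calls_v1; }
void v2() { ++calls_v2; }
void v3() { ++calls_v3; }

void foo(int cond) {
  uint16_t x;
  if (cond) {  // if.then: 16-bit load of a[0]
    x = a[0];
    v1();
  } else {     // if.else: 16-bit load of a[1]
    x = a[1];
    v2();
  }
  // if.end: the 16-bit value arrives through a PHI and is compared here.
  if (x > 3)
    v3();
}
```

The two loads sit in different predecessor blocks of the compare, which is why a fold restricted to a single basic block cannot remove the extension.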

In the source code, all related values are 16 bits wide. During lowering, X86TargetLowering::EmitCmp intentionally creates a zero extension before the cmp in order to avoid a 16-bit immediate, which is slow on modern x86. Later, X86FixupBWInsts.cpp changes the 16-bit loads into extending loads, which makes the extension before the cmp redundant.

The extension created by X86TargetLowering::EmitCmp is beneficial, so we need to fold it into the preceding load instructions to get the best performance. The function optimizeLoadInstr() can only move a load into its single user in the same BB. It can fold a simple ext(load) pair, but it cannot fold across BBs and it cannot fold an ext into a load, so it cannot be used for this purpose.

So I wrote this new pass, X86FoldXBBExtLoad.cpp, to fold 16-bit extensions into the preceding load instructions.
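The legality check behind the fold can be sketched outside LLVM as a worklist walk over SSA definitions: starting from the source register of the extension, follow PHI inputs and accept only if every leaf definition is a 16-bit load. The types below (Op, Def, DefMap) are toy stand-ins invented for this sketch and are not LLVM APIs.

```cpp
#include <map>
#include <set>
#include <vector>

// Toy stand-ins for machine IR; none of these types exist in LLVM.
enum class Op { Load16, Phi, Other };

struct Def {
  Op op;
  std::vector<int> inputs;  // incoming registers, used by PHIs only
};

// Maps each virtual register to its single SSA definition.
using DefMap = std::map<int, Def>;

// Returns true when every definition reachable from `src` through PHIs is a
// 16-bit load. On success `regs` holds all registers that must be widened.
bool collectFoldableRegs(const DefMap &defs, int src, std::set<int> &regs) {
  std::vector<int> work{src};
  regs.clear();
  regs.insert(src);
  while (!work.empty()) {
    int reg = work.back();
    work.pop_back();
    auto it = defs.find(reg);
    if (it == defs.end())
      return false;  // no known definition, e.g. a physical register
    switch (it->second.op) {
    case Op::Load16:
      break;  // leaf: can become an extending load
    case Op::Phi:
      for (int in : it->second.inputs)
        if (regs.insert(in).second)  // visit each register only once
          work.push_back(in);
      break;
    case Op::Other:
      return false;  // the extension cannot be folded into this definition
    }
  }
  return true;
}
```

Tracking visited registers in a set keeps the walk finite even when PHIs form a cycle, as they do for loop-carried values.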

Diff Detail

Event Timeline

Carrot created this revision. May 24 2018, 2:16 PM

Herald added a subscriber: mgorny. May 24 2018, 2:16 PM

craig.topper added reviewers: RKSimon, spatel, chandlerc. May 24 2018, 2:22 PM

We have an MI combine pass IIRC -- why not put this there? (Genuine question, haven't looked at these enough to know if that doesn't work...)

test/CodeGen/X86/fold-xbb-ext-load.ll
9–16	Can you craft your test case to be reasonable to read with the automatically generated CHECK lines? That always makes maintenance easier if the result is readable... not sure if these will be. You might need to break this apart into separate functions to make it read better.

In D47346#1111861, @chandlerc wrote:

We have an MI combine pass IIRC -- why not put this there? (Genuine question, haven't looked at these enough to know if that doesn't work...)

The patterns handled by MachineCombiner are usually simple code sequences within a single basic block, and the new code sequence is inserted at the root instruction. The pattern handled by this file may span several basic blocks, and new instructions are inserted at multiple positions in preceding basic blocks. So it can't be handled by MachineCombiner.

RKSimon added inline comments. Jun 7 2018, 2:12 AM

lib/Target/X86/X86FoldXBBExtLoad.cpp
2	Fix filename and brief
test/CodeGen/X86/fold-xbb-ext-load.ll
9–16	This doesn't look like it's auto-generated - can you use update_llc_test_checks.py?

xbolva00 added a subscriber: xbolva00. Jun 7 2018, 2:21 AM

Carrot updated this revision to Diff 150417. Jun 7 2018, 3:31 PM

Carrot marked 2 inline comments as done.

This looks nice. Any more reviews?

Some stats from benchmarks maybe? :)

Still looking at this, @Carrot ?

RKSimon resigned from this revision. Feb 7 2021, 6:24 AM

Herald added subscribers: nikic, pengfei. Feb 7 2021, 6:24 AM

Revision Contents

Path	Size
lib/Target/X86/CMakeLists.txt	1 line
lib/Target/X86/X86.h	3 lines
lib/Target/X86/X86FoldXBBExtLoad.cpp	204 lines
lib/Target/X86/X86TargetMachine.cpp	2 lines
test/CodeGen/X86/O3-pipeline.ll	1 line
test/CodeGen/X86/bmi-intrinsics-fast-isel.ll	6 lines
test/CodeGen/X86/fold-xbb-ext-load.ll	58 lines
test/CodeGen/X86/jump_sign.ll	1 line

Diff 150417

lib/Target/X86/CMakeLists.txt

@@ ... @@ set(sources
   X86ExpandPseudo.cpp
   X86FastISel.cpp
   X86FixupBWInsts.cpp
   X86FixupLEAs.cpp
   X86AvoidStoreForwardingBlocks.cpp
   X86FixupSetCC.cpp
   X86FlagsCopyLowering.cpp
   X86FloatingPoint.cpp
+  X86FoldXBBExtLoad.cpp
   X86FrameLowering.cpp
   X86InstructionSelector.cpp
   X86ISelDAGToDAG.cpp
   X86ISelLowering.cpp
   X86IndirectBranchTracking.cpp
   X86InterleavedAccess.cpp
   X86InstrFMA3Info.cpp
   X86InstrInfo.cpp

lib/Target/X86/X86.h

@@ ... @@
 FunctionPass *createX86RetpolineThunksPass();

 InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM,
                                                   X86Subtarget &,
                                                   X86RegisterBankInfo &);

 void initializeEvexToVexInstPassPass(PassRegistry &);

+FunctionPass *createX86FoldXBBExtLoad();
+
+void initializeFoldXBBExtLoadPassPass(PassRegistry &);
 } // End llvm namespace

 #endif

lib/Target/X86/X86FoldXBBExtLoad.cpp

//===-- X86FoldXBBExtLoad.cpp - Fold cross BB ext/load instructions --------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
/// \file
/// This file defines the pass that finds 16-bit sign/zero extensions that can
/// be folded into previous loads, possibly across multiple basic blocks. This
/// problem is created by function X86TargetLowering::EmitCmp: in order to avoid
/// 16-bit immediates, this function intentionally creates an extension. If the
/// value comes from memory, then the extension can be folded into the load.
///
//===----------------------------------------------------------------------===//

#include "X86.h"
#include "X86InstrInfo.h"
#include "X86Subtarget.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
using namespace llvm;

#define FOLDXBBEXTLOAD_DESC "X86 Cross BB ZExt/SExt Load Folding"
#define FOLDXBBEXTLOAD_NAME "x86-fold-xbb-ext-load"

#define DEBUG_TYPE FOLDXBBEXTLOAD_NAME

// Option to allow this optimization pass to have fine-grained control.
static cl::opt<bool>
    FoldXBBExtLoad("fold-xbb-ext-load",
                   cl::desc("Fold cross basic block sext/zext load instructions"),
                   cl::init(true), cl::Hidden);

namespace {
class FoldXBBExtLoadPass : public MachineFunctionPass {
  // This function finds the foldable instruction pattern and do the
  // transformation if possible.
  // MI is a 16b to 32b extension instruction.
  bool tryFoldInst(MachineInstr *MI);

public:
  static char ID;

  StringRef getPassName() const override { return FOLDXBBEXTLOAD_DESC; }

  FoldXBBExtLoadPass() : MachineFunctionPass(ID) {
    initializeFoldXBBExtLoadPassPass(*PassRegistry::getPassRegistry());
  }

  bool runOnMachineFunction(MachineFunction &MF) override;

  MachineFunctionProperties getRequiredProperties() const override {
    return MachineFunctionProperties().set(
        MachineFunctionProperties::Property::IsSSA);
  }

private:
  MachineFunction *MF;
  MachineRegisterInfo *MRI;
};

char FoldXBBExtLoadPass::ID = 0;
}

INITIALIZE_PASS(FoldXBBExtLoadPass, FOLDXBBEXTLOAD_NAME, FOLDXBBEXTLOAD_DESC,
                false, false)

FunctionPass *llvm::createX86FoldXBBExtLoad() {
  return new FoldXBBExtLoadPass();
}

bool FoldXBBExtLoadPass::runOnMachineFunction(MachineFunction &MF) {
  if (!FoldXBBExtLoad || skipFunction(MF.getFunction()))
    return false;

  this->MF = &MF;
  MRI = &MF.getRegInfo();

  // To be deleted extension instructions.
  SmallVector<MachineInstr *, 4> MIDelete;

  LLVM_DEBUG(dbgs() << "Start X86FoldXBBExtLoad\n";);

  for (auto &MBB : MF)
    for (auto I = MBB.rbegin(); I != MBB.rend(); ++I) {
      MachineInstr *MI = &*I;
      if (MI->getOpcode() == X86::MOVZX32rr16 ||
          MI->getOpcode() == X86::MOVSX32rr16)
        if (tryFoldInst(MI))
          MIDelete.push_back(MI);
    }

  while (!MIDelete.empty()) {
    MachineInstr *MI = MIDelete.pop_back_val();
    MachineBasicBlock *MBB = MI->getParent();
    MBB->erase(MI);
  }

  LLVM_DEBUG(dbgs() << "End X86FoldXBBExtLoad\n";);

  return true;
}

// This function actually does the pattern matching and transformation.
// It starts from ExtMI, which is a 16b to 32b extension instruction.
bool FoldXBBExtLoadPass::tryFoldInst(MachineInstr *ExtMI) {
  // All involved 16b virtual registers.
  SmallSetVector<unsigned, 4> AllRegs;
  // Work list.
  SmallVector<unsigned, 4> RegList;

  if (TargetRegisterInfo::isPhysicalRegister(ExtMI->getOperand(0).getReg()))
    return false;

  unsigned Reg = ExtMI->getOperand(1).getReg();
  RegList.push_back(Reg);
  AllRegs.insert(Reg);

  // Check if all defs of Reg is 16bit load.
  while (!RegList.empty()) {
    auto RegNo = RegList.pop_back_val();
    if (TargetRegisterInfo::isPhysicalRegister(RegNo))
      return false;

    auto *MI = MRI->getVRegDef(RegNo);
    switch (MI->getOpcode()) {
    case X86::PHI:
      for (unsigned i = 1, e = MI->getNumOperands(); i != e; i += 2)
        if (MI->getOperand(i).isReg()) {
          auto NewReg = MI->getOperand(i).getReg();
          if (AllRegs.insert(NewReg))
            RegList.push_back(NewReg);
        }
      break;

    case X86::MOV16rm:
      break;

    default:
      // Ext can't be folded into other instructions.
      return false;
    }
  }

  // Now we can change all 16b load instructions to load and extension.
  const X86InstrInfo *TII = MF->getSubtarget<X86Subtarget>().getInstrInfo();
  const TargetLowering *TLI =
      MF->getSubtarget<X86Subtarget>().getTargetLowering();
  const TargetRegisterClass *NewClass = TLI->getRegClassFor(MVT::i32);
  for (auto RegNo : AllRegs) {
    // Change the register class to 32bit.
    MRI->setRegClass(RegNo, NewClass);

    // If the def instruction is 16bit load, change it to ext load.
    // If the def is PHI, do nothing, change the register class is enough.
    auto *DefMI = MRI->getVRegDef(RegNo);
    if (DefMI->getOpcode() == X86::MOV16rm) {
      unsigned NewOpcode = (ExtMI->getOpcode() == X86::MOVZX32rr16) ?
                           X86::MOVZX32rm16 : X86::MOVSX32rm16;
      MachineInstrBuilder MIB =
          BuildMI(*MF, DefMI->getDebugLoc(), TII->get(NewOpcode), RegNo);

      unsigned NumArgs = DefMI->getNumOperands();
      for (unsigned i = 1; i < NumArgs; ++i)
        MIB.add(DefMI->getOperand(i));
      MIB->setMemRefs(DefMI->memoperands_begin(), DefMI->memoperands_end());

      MachineBasicBlock *MBB = DefMI->getParent();
      MBB->insert(DefMI, MIB);
      MBB->erase(DefMI);
    }

    // Change register uses to subreg.
    for (auto &MO : MRI->use_operands(RegNo)) {
      auto *MI = MO.getParent();
      // Do nothing for PHI instructions.
      if (MI->getOpcode() == X86::PHI &&
          AllRegs.count(MI->getOperand(0).getReg()))
        continue;

      // Change other uses to sub reg.
      MO.setSubReg(X86::sub_16bit);
    }
  }

  // Now we can safely replace users of ExtMI to its src register.
  SmallVector<MachineOperand *, 4> MOUsers;
  unsigned OldReg = ExtMI->getOperand(0).getReg();
  unsigned NewReg = ExtMI->getOperand(1).getReg();
  for (auto &ExtMO : MRI->use_operands(OldReg))
    MOUsers.push_back(&ExtMO);
  while (!MOUsers.empty()) {
    MachineOperand *MO = MOUsers.pop_back_val();
    MO->setReg(NewReg);
  }
  // We don't know which live range ends first, so clear kill flags of both
  // registers.
  MRI->clearKillFlags(OldReg);
  MRI->clearKillFlags(NewReg);

  return true;
}
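
The "change other uses to sub reg" step in tryFoldInst depends on one arithmetic fact: whichever extension the load performs, the low 16 bits of the widened register still equal the value that was in memory, so remaining 16-bit users can read the 16-bit sub-register. The standalone check below models that with plain integers. The function names are invented for this sketch and only mimic the opcodes named in the patch.

```cpp
#include <cstdint>

// Models MOVZX32rm16: a 16-bit load widened with zero extension.
uint32_t zextLoad(uint16_t mem) { return mem; }

// Models MOVSX32rm16: a 16-bit load widened with sign extension.
uint32_t sextLoad(uint16_t mem) {
  return static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int16_t>(mem)));
}

// Models reading the 16-bit sub-register of a 32-bit register.
uint16_t low16(uint32_t reg) { return static_cast<uint16_t>(reg); }

// Checks every 16-bit value: a user that still reads 16 bits sees the
// original memory value after either kind of widening.
bool low16PreservedForAllValues() {
  for (uint32_t v = 0; v <= 0xFFFF; ++v) {
    uint16_t mem = static_cast<uint16_t>(v);
    if (low16(zextLoad(mem)) != mem || low16(sextLoad(mem)) != mem)
      return false;
  }
  return true;
}
```

The upper 16 bits differ between the two extensions, which is why the pass picks the new load opcode from the opcode of the extension being removed.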

lib/Target/X86/X86TargetMachine.cpp

@@ ... @@ extern "C" void LLVMInitializeX86Target() {
   initializeFixupLEAPassPass(PR);
   initializeShadowCallStackPass(PR);
   initializeX86CallFrameOptimizationPass(PR);
   initializeX86CmovConverterPassPass(PR);
   initializeX86ExecutionDomainFixPass(PR);
   initializeX86DomainReassignmentPass(PR);
   initializeX86AvoidSFBPassPass(PR);
   initializeX86FlagsCopyLoweringPassPass(PR);
+  initializeFoldXBBExtLoadPassPass(PR);
 }

 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
   if (TT.isOSBinFormatMachO()) {
     if (TT.getArch() == Triple::x86_64)
       return llvm::make_unique<X86_64MachoTargetObjectFile>();
     return llvm::make_unique<TargetLoweringObjectFileMachO>();
   }
@@ ... @@ void X86PassConfig::addPreRegAlloc() {
   }

   addPass(createX86FlagsCopyLoweringPass());
   addPass(createX86WinAllocaExpander());
 }
 void X86PassConfig::addMachineSSAOptimization() {
   addPass(createX86DomainReassignmentPass());
   TargetPassConfig::addMachineSSAOptimization();
+  addPass(createX86FoldXBBExtLoad());
 }

 void X86PassConfig::addPostRegAlloc() {
   addPass(createX86FloatingPointStackifierPass());
 }

 void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }

test/CodeGen/X86/O3-pipeline.ll

@@ ... @@
 ; CHECK-NEXT: Machine Natural Loop Construction
 ; CHECK-NEXT: Early Machine Loop Invariant Code Motion
 ; CHECK-NEXT: Machine Common Subexpression Elimination
 ; CHECK-NEXT: MachinePostDominator Tree Construction
 ; CHECK-NEXT: Machine Block Frequency Analysis
 ; CHECK-NEXT: Machine code sinking
 ; CHECK-NEXT: Peephole Optimizations
 ; CHECK-NEXT: Remove dead machine instructions
+; CHECK-NEXT: X86 Cross BB ZExt/SExt Load Folding
 ; CHECK-NEXT: Live Range Shrink
 ; CHECK-NEXT: X86 Fixup SetCC
 ; CHECK-NEXT: X86 LEA Optimize
 ; CHECK-NEXT: X86 Optimize Call Frame
 ; CHECK-NEXT: X86 Avoid Store Forwarding Block
 ; CHECK-NEXT: MachineDominator Tree Construction
 ; CHECK-NEXT: X86 EFLAGS copy lowering
 ; CHECK-NEXT: X86 WinAlloca Expander

test/CodeGen/X86/bmi-intrinsics-fast-isel.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -fast-isel-sink-local-values < %s -fast-isel -mtriple=i686-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefix=X32
 ; RUN: llc -fast-isel-sink-local-values < %s -fast-isel -mtriple=x86_64-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefix=X64

 ; NOTE: This should use IR equivalent to what is generated by clang/test/CodeGen/bmi-builtins.c

 ;
 ; AMD Intrinsics
 ;

 define i16 @test__tzcnt_u16(i16 %a0) {
 ; X32-LABEL: test__tzcnt_u16:
 ; X32: # %bb.0:
 ; X32-NEXT: movzwl {{[0-9]+}}(%esp), %eax
-; X32-NEXT: movzwl %ax, %ecx
-; X32-NEXT: cmpl $0, %ecx
+; X32-NEXT: cmpl $0, %eax
 ; X32-NEXT: jne .LBB0_1
 ; X32-NEXT: # %bb.2:
 ; X32-NEXT: movw $16, %ax
 ; X32-NEXT: retl
 ; X32-NEXT: .LBB0_1:
 ; X32-NEXT: tzcntw %ax, %ax
 ; X32-NEXT: retl
 ;
@@ ... @@
 ;
 ; Intel intrinsics
 ;

 define i16 @test_tzcnt_u16(i16 %a0) {
 ; X32-LABEL: test_tzcnt_u16:
 ; X32: # %bb.0:
 ; X32-NEXT: movzwl {{[0-9]+}}(%esp), %eax
-; X32-NEXT: movzwl %ax, %ecx
-; X32-NEXT: cmpl $0, %ecx
+; X32-NEXT: cmpl $0, %eax
 ; X32-NEXT: jne .LBB7_1
 ; X32-NEXT: # %bb.2:
 ; X32-NEXT: movw $16, %ax
 ; X32-NEXT: retl
 ; X32-NEXT: .LBB7_1:
 ; X32-NEXT: tzcntw %ax, %ax
 ; X32-NEXT: retl
 ;

test/CodeGen/X86/fold-xbb-ext-load.ll

; RUN: llc < %s -mtriple=i686-unknown-unknown -tail-dup-placement=false | FileCheck %s

declare void @v1()
declare void @v2()
declare void @v3()
@a = external global [10 x i16]

define void @foo(i32 %cond) {
; CHECK-LABEL: foo:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: pushl %esi
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: .cfi_offset %esi, -8
; CHECK-NEXT: cmpl $0, {{[0-9]+}}(%esp)
; CHECK-NEXT: je .LBB0_2
; CHECK-NEXT: # %bb.1: # %if.then
; CHECK-NEXT: movzwl a, %esi
; CHECK-NEXT: calll v1
; CHECK-NEXT: jmp .LBB0_3
; CHECK-NEXT: .LBB0_2: # %if.else
; CHECK-NEXT: movzwl a+2, %esi
; CHECK-NEXT: calll v2
; CHECK-NEXT: .LBB0_3: # %if.end
; CHECK-NEXT: cmpl $4, %esi
; CHECK-NEXT: jb .LBB0_5
; CHECK-NEXT: # %bb.4: # %if.then1
; CHECK-NEXT: calll v3
; CHECK-NEXT: .LBB0_5: # %if.end2
; CHECK-NEXT: popl %esi
; CHECK-NEXT: .cfi_def_cfa_offset 4
; CHECK-NEXT: retl
entry:
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %if.else, label %if.then

if.then:
  %0 = load i16, i16* getelementptr ([10 x i16], [10 x i16]* @a, i64 0, i64 0)
  call void @v1()
  br label %if.end

if.else:
  %1 = load i16, i16* getelementptr ([10 x i16], [10 x i16]* @a, i64 0, i64 1)
  call void @v2()
  br label %if.end

if.end:
  %2 = phi i16 [ %0, %if.then ], [ %1, %if.else ]
  %cmp = icmp ugt i16 %2, 3
  br i1 %cmp, label %if.then1, label %if.end2

if.then1:
  call void @v3()
  br label %if.end2

if.end2:
  ret void
}

test/CodeGen/X86/jump_sign.ll

@@ ... @@
 ; CHECK-NEXT: testb %al, %al
 ; CHECK-NEXT: jne .LBB12_5
 ; CHECK-NEXT: # %bb.3: # %sw.bb
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: testb %al, %al
 ; CHECK-NEXT: jne .LBB12_8
 ; CHECK-NEXT: # %bb.4: # %if.end29
 ; CHECK-NEXT: movzwl (%eax), %eax
-; CHECK-NEXT: movzwl %ax, %eax
 ; CHECK-NEXT: imull $52429, %eax, %ecx # imm = 0xCCCD
 ; CHECK-NEXT: shrl $19, %ecx
 ; CHECK-NEXT: addl %ecx, %ecx
 ; CHECK-NEXT: leal (%ecx,%ecx,4), %ecx
 ; CHECK-NEXT: cmpw %cx, %ax
 ; CHECK-NEXT: jne .LBB12_5
 ; CHECK-NEXT: .LBB12_8: # %if.then44
 ; CHECK-NEXT: xorl %eax, %eax
