This is an archive of the discontinued LLVM Phabricator instance.


Differential D52570

[X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're on compiling for a CPU with single uop BEXTR
ClosedPublic

Authored by craig.topper on Sep 26 2018, 1:14 PM.


Details

Reviewers

RKSimon
spatel
lebedev.ri
andreadb

Commits

rG1709829fede3: [X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're on…
rL343399: [X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're on…

Summary

This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction.
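As a rough illustration of the control immediate mentioned above: BEXTR reads the start bit from bits 7:0 of its control operand and the field length from bits 15:8. The sketch below models that in plain C++. `bextrModel` and `makeBextrControl` are hypothetical names used only for illustration (they are not LLVM functions), and the sketch assumes C2 is a mask of contiguous low bits.

```cpp
#include <cassert>
#include <cstdint>

// Software model of BEXTR: control bits 7:0 = start bit, bits 15:8 = length.
uint64_t bextrModel(uint64_t src, uint32_t control) {
  unsigned start = control & 0xFF;
  unsigned len = (control >> 8) & 0xFF;
  if (start >= 64 || len == 0)
    return 0;
  uint64_t shifted = src >> start;
  return len >= 64 ? shifted : shifted & ((uint64_t{1} << len) - 1);
}

// Control word for (x >> shift) & mask.
// Assumption: mask == 2^len - 1 (contiguous low bits).
uint32_t makeBextrControl(unsigned shift, uint64_t mask) {
  assert(shift < 64 && "shift amount out of range");
  assert((mask & (mask + 1)) == 0 && "mask must be contiguous low bits");
  unsigned len = 0;
  for (; mask & 1; mask >>= 1)
    ++len;
  return shift | (len << 8);
}
```

With a shift of 4 and a mask of 4095 this yields 0xC04 (3076), the immediate that appears in the bextr64b and bextr32b tests of this patch; a shift of 2 with the 33-bit mask yields 0x2102 (8450), as in bextr64d.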

The BMI BEXTR instruction is 2 uops on Intel CPUs. On SKL it looks like it's one port 0/6 uop and one port 1/5 uop, despite what Agner's tables say. I know one of the uops is a regular shift uop, so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution-wise than the shift+and, which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into a register is an additional 0/1/5/6 uop.

For now I've limited this transform to AMD CPUs, which have a single uop BEXTR. It may also make sense if we can fold a load, if the and immediate is larger than 32 bits and can't be encoded as a sign-extended 32-bit value, or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at, it doesn't look like load folding or large immediates were occurring, so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case.
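The uop arithmetic in the summary can be tallied with a small sketch. This is only a back-of-the-envelope count under the numbers quoted above (2 uops for shift+and, 1 uop for the move immediate, 1 or 2 uops for BEXTR); it ignores ports, latency, and code size, and the names are hypothetical.

```cpp
#include <cassert>

// Uop counts taken from the summary; everything else is an assumption.
constexpr unsigned kShiftAndUops = 2; // one shift uop + one and uop
constexpr unsigned kMovImmUops = 1;   // materializing the BEXTR control

// Total uops for "mov imm + BEXTR", optionally with the mov hoisted away.
constexpr unsigned bextrSequenceUops(unsigned bextrUops, bool immHoisted) {
  return bextrUops + (immHoisted ? 0 : kMovImmUops);
}

// True when the BEXTR sequence issues no more uops than shift+and.
constexpr bool bextrNoWorse(unsigned bextrUops, bool immHoisted) {
  return bextrSequenceUops(bextrUops, immHoisted) <= kShiftAndUops;
}
```

Under this count, a 2-uop BEXTR plus the move immediate costs 3 uops against 2 for shift+and, while a single-uop BEXTR ties at 2.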

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Sep 26 2018, 1:14 PM

The code that led to PR38938 saw a perf improvement by moving to BEXTR on btver2 - both from the load-folding and hoisting the control constant out of the loop.

I'd really like to avoid yet another fast/slow feature flag but there might not be a better way right now if you want to get this in - @andreadb any thoughts?

https://godbolt.org/z/zdktz_

Forgot to mention that BEXTR breaks the two addressness of the shift+and pattern to avoid a copy which can also be beneficial. Unfortunately isel can't generally determine that a copy will be needed.

I agree, I don't really want to add a new fast/slow flag either.

Can we come up with a slightly better logic rather than just haveTBM()?
Else, i'm not sure how to fix D52293 / PR38938.
Can we perhaps also allow cases when we are shifting a load?

lib/Target/X86/X86ISelDAGToDAG.cpp
2598 ↗	(On Diff #167180)	I think we are also missing a check that NVT is not i64 if we are in 32-bit mode.

In D52570#1247140, @craig.topper wrote:

Forgot to mention that BEXTR breaks the two addressness of the shift+and pattern to avoid a copy which can also be beneficial. Unfortunately isel can't generally determine that a copy will be needed.

I agree, I don't really want to add a new fast/slow flag either.

As Simon already wrote, BEXTR is very fast on AMD.
The throughput reported by llvm-mca matches what I see with perf.

What if at ISel stage we just select BEXTR, and then we have a later (peephole) pass that expands it based on the subtarget and properties of the machine instruction?

Basically, for BEXTR we could do something similar to what is done for LEA. We could have a "fixup" pass (or custom patterns matched by the MachineCombiner) to expand BEXTR when the shift-and sequence is more convenient. This would still require a subtarget target hook.

If the decision only depends on the subtarget, and (maybe) properties of the instructions, then we could have tablegen automatically expand a TargetSubtarget hook for us.

Something like this (just an example...):

Index: lib/Target/X86/X86SchedPredicates.td
===================================================================
--- lib/Target/X86/X86SchedPredicates.td        (revision 343198)
+++ lib/Target/X86/X86SchedPredicates.td        (working copy)
@@ -54,3 +54,5 @@
// X86GenInstrInfo.
def IsThreeOperandsLEAFn :
    TIIPredicate<"isThreeOperandsLEA", IsThreeOperandsLEABody>;
+
+def ShouldExpandBEXTRDecl : STIPredicateDecl<"shouldExpandBEXTR", TruePred, /* overrides */ 0, /* expandForMC */ 0>;
Index: lib/Target/X86/X86ScheduleBtVer2.td
===================================================================
--- lib/Target/X86/X86ScheduleBtVer2.td (revision 343198)
+++ lib/Target/X86/X86ScheduleBtVer2.td (working copy)
@@ -772,4 +772,9 @@
  ], ZeroIdiomPredicate>
]>;

+def : STIPredicate<
+  ShouldExpandBEXTRDecl,
+  [ InstructionEquivalenceClass<[BEXTR32rr, BEXTR64rr], FalsePred> ]
+>;
+
} // SchedModel

..that generates this:

bool X86GenSubtargetInfo::shouldExpandBEXTR(const MachineInstr *MI) const {
  unsigned ProcessorID = getSchedModel().getProcessorID();
  switch(MI->getOpcode()) {
  default:
    break;
  case X86::BEXTR32rr:
  case X86::BEXTR64rr:
    if (ProcessorID == 4) {
      return false;
    }
    break;
  }

  return true;
} // X86GenSubtargetInfo::shouldExpandBEXTR

As I wrote. This is just an idea...

Not sure if that helps.
-Andrea
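To make the expansion idea above concrete: a later pass that rewrites a BEXTR with a known constant control back into shift+and would need to recover the shift amount and mask from the control word. A minimal sketch of that decoding follows; `expandBextrControl` and `ShiftAndMask` are hypothetical illustrations, not existing LLVM code, and the sketch only covers 64-bit operands with a constant control.

```cpp
#include <cassert>
#include <cstdint>

// Shift amount and mask equivalent to a constant BEXTR control word.
struct ShiftAndMask {
  unsigned Shift;
  uint64_t Mask;
};

// Decode control bits 7:0 (start) and 15:8 (length) into (x >> Shift) & Mask.
// Assumption: the start bit is below 64, so the shift is well defined.
ShiftAndMask expandBextrControl(uint32_t control) {
  unsigned start = control & 0xFF;
  unsigned len = (control >> 8) & 0xFF;
  assert(start < 64 && "start bit out of range for a 64-bit shift");
  uint64_t mask = len >= 64 ? ~uint64_t{0} : (uint64_t{1} << len) - 1;
  return {start, mask};
}
```

For the control value 0xC04 used in the tests of this patch, this gives a shift of 4 and a mask of 0xFFF.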

The time for X86InstrInfo::getMachineCombinerPatterns might be upon us.....

In D52570#1247523, @lebedev.ri wrote:

Can we come up with a slightly better logic rather than just haveTBM()?
Else, i'm not sure how to fix D52293 / PR38938.
Can we perhaps also allow cases when we are shifting a load?

Folding a load isn't clearly a perf win here either. It doesn't look like load-op fusion occurs on Intel CPUs, so BEXTR with a load sends 3 uops out of the frontend. That's the same number of uops as load+shift+and. If I've added up the sizes correctly, I think the move immediate + BEXTR with load is one byte longer than load+shift+and, at least for 32-bit. 64-bit would incur a REX prefix on the shift, but we'd probably still have a 32-bit and, so that's probably the same size.

lib/Target/X86/X86ISelDAGToDAG.cpp
2598 ↗	(On Diff #167180)	We're so far past type legalization here that i64 isn't possible on a 32-bit target.

In the short term I think a fast feature flag (for bdver/btver/znver amd targets) is the way to go - this will solve your immediate regression and give us time to get a proper scheduler model driven solution in place (I have my eye on BEXTR, SHLD/SHRD and LEA feature flags to be replaced by this).

In D52570#1248811, @RKSimon wrote:

In the short term I think a fast feature flag (for bdver/btver/znver amd targets) is the way to go

Hm, but those *are* all the AMD targets that support BMI.
Perhaps this can be simplified down to isAMD() ?

In D52570#1248819, @lebedev.ri wrote:

In D52570#1248811, @RKSimon wrote:

In the short term I think a fast feature flag (for bdver/btver/znver amd targets) is the way to go

Hm, but those *are* all the AMD targets that support BMI.
Perhaps this can be simplified down to isAMD() ?

No - it is never a good idea to map a uarch feature/bug onto a model/vendor predicate. It usually becomes an inaccurate description at some point because companies and chips change. We've experienced this problem in x86 already with "isAtom()".

In D52570#1248819, @lebedev.ri wrote:

In D52570#1248811, @RKSimon wrote:

In the short term I think a fast feature flag (for bdver/btver/znver amd targets) is the way to go

Hm, but those *are* all the AMD targets that support BMI.
Perhaps this can be simplified down to isAMD() ?

Please don't - we need to be getting rid of X86ProcFamilyEnum not adding yet more complications to it - handling all those enums is even worse than feature flags.

Add feature flag and put it on AMD CPUs.
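A minimal standalone model of the resulting gate, mirroring the condition added to matchBEXTRFromAnd in this revision. `SubtargetModel` and `mayMatchBEXTRFromAnd` are stand-ins invented for this sketch; the real check lives in X86DAGToDAGISel and queries X86Subtarget.

```cpp
#include <cassert>

// Stand-in for the three subtarget queries the patch uses.
struct SubtargetModel {
  bool HasBMI;
  bool HasTBM;
  bool HasFastBEXTR;
};

// TBM can encode the control as an immediate (BEXTRI), so it always passes.
// BMI BEXTR only passes when the CPU is tagged with the fast-bextr feature.
bool mayMatchBEXTRFromAnd(const SubtargetModel &ST) {
  if (!ST.HasTBM && !(ST.HasBMI && ST.HasFastBEXTR))
    return false;
  return true;
}
```

Note that the fast-bextr feature alone does not enable the transform: without BMI or TBM the gate still rejects, which is why the NOBMI run lines in extract-bits.ll can carry +fast-bextr without changing their output.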

I removed some of the unused check prefixes from bmi.ll so I didn't have to add even more variations of them.

So the target feature was the way forward, great!

test/CodeGen/X86/extract-bits.ll
3–11 ↗	(On Diff #167546)	Can you please add `+fast-bextr` to all these?

<pedantic>This phab's title and summary don't match the contents of the patch any more

craig.topper retitled this revision from [X86] Don't generate BMI BEXTR from X86DAGToDAGISel::matchBEXTRFromAnd to [X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're on compiling for a CPU with single uop BEXTR. Sep 29 2018, 9:17 AM

craig.topper edited the summary of this revision.

Add +fast-bextr to extract-bits.ll

Harbormaster completed remote builds in B23278: Diff 167612. Sep 29 2018, 9:48 AM

Looks good.

LGTM with one minor cleanup of the CHECKS in bmi.ll

test/CodeGen/X86/bmi.ll
447 ↗	(On Diff #167612)	cleanup?

This revision is now accepted and ready to land. Sep 29 2018, 10:07 AM

Closed by commit rL343399: [X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're on… (authored by ctopper). Sep 29 2018, 8:04 PM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D52426: [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering. Sep 30 2018, 3:16 AM

lebedev.ri mentioned this in D52293: [TLI][X86][AArch64] Generalize isDesirableToCommuteWithShift() hook and enable for X86. Sep 30 2018, 6:14 AM

Revision Contents

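Path                                             Size
llvm/trunk/lib/Target/X86/X86.td                 9 lines
llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp    9 lines
llvm/trunk/lib/Target/X86/X86Subtarget.h         4 lines
llvm/trunk/test/CodeGen/X86/bmi-x86_64.ll        64 lines
llvm/trunk/test/CodeGen/X86/bmi.ll               85 lines
llvm/trunk/test/CodeGen/X86/extract-bits.ll      20 lines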

Diff 167633

llvm/trunk/lib/Target/X86/X86.td

@@ (394 lines hidden) @@ : SubtargetFeature<
   "feature.", [FeatureRetpolineIndirectCalls]>;

 // Direct Move instructions.
 def FeatureMOVDIRI : SubtargetFeature<"movdiri", "HasMOVDIRI", "true",
   "Support movdiri instruction">;
 def FeatureMOVDIR64B : SubtargetFeature<"movdir64b", "HasMOVDIR64B", "true",
   "Support movdir64b instruction">;

+def FeatureFastBEXTR : SubtargetFeature<"fast-bextr", "HasFastBEXTR", "true",
+  "Indicates that the BEXTR instruction is implemented as a single uop "
+  "with good throughput.">;
+
 //===----------------------------------------------------------------------===//
 // Register File Description
 //===----------------------------------------------------------------------===//

 include "X86RegisterInfo.td"
 include "X86RegisterBanks.td"

 //===----------------------------------------------------------------------===//
@@ (571 lines hidden) @@ def : ProcessorModel<"btver2", BtVer2Model, [
   FeatureLZCNT,
   FeatureFastLZCNT,
   FeaturePOPCNT,
   FeatureXSAVE,
   FeatureXSAVEOPT,
   FeatureSlowSHLD,
   FeatureLAHFSAHF,
   FeatureFast15ByteNOP,
+  FeatureFastBEXTR,
   FeatureFastPartialYMMorZMMWrite
 ]>;

 // Bulldozer
 def : Proc<"bdver1", [
   FeatureX87,
   FeatureCMOV,
   FeatureXOP,
@@ (39 lines hidden) @@ def : Proc<"bdver2", [
   FeatureXSAVE,
   FeatureBMI,
   FeatureTBM,
   FeatureLWP,
   FeatureFMA,
   FeatureSlowSHLD,
   FeatureLAHFSAHF,
   FeatureFast11ByteNOP,
+  FeatureFastBEXTR,
   FeatureMacroFusion
 ]>;

 // Steamroller
 def : Proc<"bdver3", [
   FeatureX87,
   FeatureCMOV,
   FeatureXOP,
@@ (16 lines hidden) @@ def : Proc<"bdver3", [
   FeatureTBM,
   FeatureLWP,
   FeatureFMA,
   FeatureXSAVEOPT,
   FeatureSlowSHLD,
   FeatureFSGSBase,
   FeatureLAHFSAHF,
   FeatureFast11ByteNOP,
+  FeatureFastBEXTR,
   FeatureMacroFusion
 ]>;

 // Excavator
 def : Proc<"bdver4", [
   FeatureX87,
   FeatureCMOV,
   FeatureMMX,
@@ (15 lines hidden) @@ def : Proc<"bdver4", [
   FeatureBMI2,
   FeatureTBM,
   FeatureLWP,
   FeatureFMA,
   FeatureXSAVEOPT,
   FeatureSlowSHLD,
   FeatureFSGSBase,
   FeatureLAHFSAHF,
+  FeatureFastBEXTR,
   FeatureFast11ByteNOP,
   FeatureMWAITX,
   FeatureMacroFusion
 ]>;

 // Znver1
 def: ProcessorModel<"znver1", Znver1Model, [
   FeatureADX,
@@ (9 lines hidden) @@ def: ProcessorModel<"znver1", Znver1Model, [
   FeatureF16C,
   FeatureFMA,
   FeatureFSGSBase,
   FeatureFXSR,
   FeatureNOPL,
   FeatureFastLZCNT,
   FeatureLAHFSAHF,
   FeatureLZCNT,
+  FeatureFastBEXTR,
   FeatureFast15ByteNOP,
   FeatureMacroFusion,
   FeatureMMX,
   FeatureMOVBE,
   FeatureMWAITX,
   FeaturePCLMUL,
   FeaturePOPCNT,
   FeaturePRFCHW,
@@ (107 lines hidden) @@

llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp

@@ (2,584 lines hidden) @@
 // See if this is an (X >> C1) & C2 that we can match to BEXTR/BEXTRI.
 bool X86DAGToDAGISel::matchBEXTRFromAnd(SDNode *Node) {
   MVT NVT = Node->getSimpleValueType(0);
   SDLoc dl(Node);

   SDValue N0 = Node->getOperand(0);
   SDValue N1 = Node->getOperand(1);

-  if (!Subtarget->hasBMI() && !Subtarget->hasTBM())
+  // If we have TBM we can use an immediate for the control. If we have BMI
+  // we should only do this if the BEXTR instruction is implemented well.
+  // Otherwise moving the control into a register makes this more costly.
+  // TODO: Maybe load folding, greater than 32-bit masks, or a guarantee of LICM
+  // hoisting the move immediate would make it worthwhile with a less optimal
+  // BEXTR?
+  if (!Subtarget->hasTBM() &&
+      !(Subtarget->hasBMI() && Subtarget->hasFastBEXTR()))
     return false;

   // Must have a shift right.
   if (N0->getOpcode() != ISD::SRL && N0->getOpcode() != ISD::SRA)
     return false;

   // Shift can't have additional users.
   if (!N0->hasOneUse())
@@ (955 lines hidden) @@

llvm/trunk/lib/Target/X86/X86Subtarget.h

@@ (379 lines hidden) @@ protected:
   bool HasRDPID = false;

   /// Processor supports WaitPKG instructions
   bool HasWAITPKG = false;

   /// Processor supports PCONFIG instruction
   bool HasPCONFIG = false;

+  /// Processor has a single uop BEXTR implementation.
+  bool HasFastBEXTR = false;
+
   /// Use a retpoline thunk rather than indirect calls to block speculative
   /// execution.
   bool UseRetpolineIndirectCalls = false;

   /// Use a retpoline thunk or remove any indirect branch to block speculative
   /// execution.
   bool UseRetpolineIndirectBranches = false;

@@ (228 lines hidden) @@ public:
   bool hasFastPartialYMMorZMMWrite() const {
     return HasFastPartialYMMorZMMWrite;
   }
   bool hasFastGather() const { return HasFastGather; }
   bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
   bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
   bool hasFastLZCNT() const { return HasFastLZCNT; }
   bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }
+  bool hasFastBEXTR() const { return HasFastBEXTR; }
   bool hasMacroFusion() const { return HasMacroFusion; }
   bool hasERMSB() const { return HasERMSB; }
   bool hasSlowDivide32() const { return HasSlowDivide32; }
   bool hasSlowDivide64() const { return HasSlowDivide64; }
   bool padShortFunctions() const { return PadShortFunctions; }
   bool slowTwoMemOps() const { return SlowTwoMemOps; }
   bool LEAusesAG() const { return LEAUsesAG; }
   bool slowLEA() const { return SlowLEA; }
@@ (202 lines hidden) @@

llvm/trunk/test/CodeGen/X86/bmi-x86_64.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,BMI1
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,BMI2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,BEXTR-SLOW,BMI1,BMI1-SLOW
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,BEXTR-SLOW,BMI2,BMI2-SLOW
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+fast-bextr | FileCheck %s --check-prefixes=CHECK,BEXTR-FAST,BMI1,BMI1-FAST
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+bmi2,+fast-bextr | FileCheck %s --check-prefixes=CHECK,BEXTR-FAST,BMI2,BMI2-FAST

 declare i64 @llvm.x86.bmi.bextr.64(i64, i64)

 define i64 @bextr64(i64 %x, i64 %y) {
 ; CHECK-LABEL: bextr64:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: bextrq %rsi, %rdi, %rax
 ; CHECK-NEXT: retq
   %tmp = tail call i64 @llvm.x86.bmi.bextr.64(i64 %x, i64 %y)
   ret i64 %tmp
 }

 define i64 @bextr64b(i64 %x) uwtable ssp {
-; CHECK-LABEL: bextr64b:
-; CHECK: # %bb.0:
-; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
-; CHECK-NEXT: bextrl %eax, %edi, %eax
-; CHECK-NEXT: retq
+; BEXTR-SLOW-LABEL: bextr64b:
+; BEXTR-SLOW: # %bb.0:
+; BEXTR-SLOW-NEXT: movq %rdi, %rax
+; BEXTR-SLOW-NEXT: shrl $4, %eax
+; BEXTR-SLOW-NEXT: andl $4095, %eax # imm = 0xFFF
+; BEXTR-SLOW-NEXT: retq
+;
+; BEXTR-FAST-LABEL: bextr64b:
+; BEXTR-FAST: # %bb.0:
+; BEXTR-FAST-NEXT: movl $3076, %eax # imm = 0xC04
+; BEXTR-FAST-NEXT: bextrl %eax, %edi, %eax
+; BEXTR-FAST-NEXT: retq
   %1 = lshr i64 %x, 4
   %2 = and i64 %1, 4095
   ret i64 %2
 }

 ; Make sure we still use the AH subreg trick to extract 15:8
 define i64 @bextr64_subreg(i64 %x) uwtable ssp {
 ; CHECK-LABEL: bextr64_subreg:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movq %rdi, %rax
 ; CHECK-NEXT: movzbl %ah, %eax
 ; CHECK-NEXT: retq
   %1 = lshr i64 %x, 8
   %2 = and i64 %1, 255
   ret i64 %2
 }

 define i64 @bextr64b_load(i64* %x) {
-; CHECK-LABEL: bextr64b_load:
-; CHECK: # %bb.0:
-; CHECK-NEXT: movl $3076, %eax # imm = 0xC04
-; CHECK-NEXT: bextrl %eax, (%rdi), %eax
-; CHECK-NEXT: retq
+; BEXTR-SLOW-LABEL: bextr64b_load:
+; BEXTR-SLOW: # %bb.0:
+; BEXTR-SLOW-NEXT: movl (%rdi), %eax
+; BEXTR-SLOW-NEXT: shrl $4, %eax
+; BEXTR-SLOW-NEXT: andl $4095, %eax # imm = 0xFFF
+; BEXTR-SLOW-NEXT: retq
+;
+; BEXTR-FAST-LABEL: bextr64b_load:
+; BEXTR-FAST: # %bb.0:
+; BEXTR-FAST-NEXT: movl $3076, %eax # imm = 0xC04
+; BEXTR-FAST-NEXT: bextrl %eax, (%rdi), %eax
+; BEXTR-FAST-NEXT: retq
   %1 = load i64, i64* %x, align 8
   %2 = lshr i64 %1, 4
   %3 = and i64 %2, 4095
   ret i64 %3
 }

 ; PR34042
 define i64 @bextr64c(i64 %x, i32 %y) {
 ; CHECK-LABEL: bextr64c:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: # kill: def $esi killed $esi def $rsi
 ; CHECK-NEXT: bextrq %rsi, %rdi, %rax
 ; CHECK-NEXT: retq
   %tmp0 = sext i32 %y to i64
   %tmp1 = tail call i64 @llvm.x86.bmi.bextr.64(i64 %x, i64 %tmp0)
   ret i64 %tmp1
 }

 define i64 @bextr64d(i64 %a) {
-; CHECK-LABEL: bextr64d:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movl $8450, %eax # imm = 0x2102
-; CHECK-NEXT: bextrq %rax, %rdi, %rax
-; CHECK-NEXT: retq
+; BMI1-SLOW-LABEL: bextr64d:
+; BMI1-SLOW: # %bb.0: # %entry
+; BMI1-SLOW-NEXT: shrq $2, %rdi
+; BMI1-SLOW-NEXT: movl $8448, %eax # imm = 0x2100
+; BMI1-SLOW-NEXT: bextrq %rax, %rdi, %rax
+; BMI1-SLOW-NEXT: retq
+;
+; BMI2-SLOW-LABEL: bextr64d:
+; BMI2-SLOW: # %bb.0: # %entry
+; BMI2-SLOW-NEXT: shrq $2, %rdi
+; BMI2-SLOW-NEXT: movb $33, %al
+; BMI2-SLOW-NEXT: bzhiq %rax, %rdi, %rax
+; BMI2-SLOW-NEXT: retq
+;
+; BEXTR-FAST-LABEL: bextr64d:
+; BEXTR-FAST: # %bb.0: # %entry
+; BEXTR-FAST-NEXT: movl $8450, %eax # imm = 0x2102
+; BEXTR-FAST-NEXT: bextrq %rax, %rdi, %rax
+; BEXTR-FAST-NEXT: retq
 entry:
   %shr = lshr i64 %a, 2
   %and = and i64 %shr, 8589934591
   ret i64 %and
 }

 define i64 @non_bextr64(i64 %x) {
 ; CHECK-LABEL: non_bextr64:
@@ (10 lines hidden) @@

llvm/trunk/test/CodeGen/X86/bmi.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1
-; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,X86,BMI2,X86-BMI2
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,X64,BMI2,X64-BMI2
+; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,X86,X86-SLOW-BEXTR
+; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,X86,X86-SLOW-BEXTR
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi | FileCheck %s --check-prefixes=CHECK,X64,X64-SLOW-BEXTR
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+bmi2 | FileCheck %s --check-prefixes=CHECK,X64,X64-SLOW-BEXTR
+; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi,+fast-bextr | FileCheck %s --check-prefixes=CHECK,X86,X86-FAST-BEXTR
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi,+fast-bextr | FileCheck %s --check-prefixes=CHECK,X64,X64-FAST-BEXTR

 define i32 @andn32(i32 %x, i32 %y) {
 ; X86-LABEL: andn32:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: andnl {{[0-9]+}}(%esp), %eax, %eax
 ; X86-NEXT: retl
 ;
@@ (323 lines hidden) @@
 ; X64-NEXT: bextrl %esi, (%rdi), %eax
 ; X64-NEXT: retq
   %x1 = load i32, i32* %x
   %tmp = tail call i32 @llvm.x86.bmi.bextr.32(i32 %x1, i32 %y)
   ret i32 %tmp
 }

 define i32 @bextr32b(i32 %x) uwtable ssp {
-; X86-LABEL: bextr32b:
-; X86: # %bb.0:
-; X86-NEXT: movl $3076, %eax # imm = 0xC04
-; X86-NEXT: bextrl %eax, {{[0-9]+}}(%esp), %eax
-; X86-NEXT: retl
-;
-; X64-LABEL: bextr32b:
-; X64: # %bb.0:
-; X64-NEXT: movl $3076, %eax # imm = 0xC04
-; X64-NEXT: bextrl %eax, %edi, %eax
-; X64-NEXT: retq
+; X86-SLOW-BEXTR-LABEL: bextr32b:
+; X86-SLOW-BEXTR: # %bb.0:
+; X86-SLOW-BEXTR-NEXT: movl {{[0-9]+}}(%esp), %eax
+; X86-SLOW-BEXTR-NEXT: shrl $4, %eax
+; X86-SLOW-BEXTR-NEXT: andl $4095, %eax # imm = 0xFFF
+; X86-SLOW-BEXTR-NEXT: retl
+;
+; X64-SLOW-BEXTR-LABEL: bextr32b:
+; X64-SLOW-BEXTR: # %bb.0:
+; X64-SLOW-BEXTR-NEXT: movl %edi, %eax
+; X64-SLOW-BEXTR-NEXT: shrl $4, %eax
+; X64-SLOW-BEXTR-NEXT: andl $4095, %eax # imm = 0xFFF
+; X64-SLOW-BEXTR-NEXT: retq
+;
+; X86-FAST-BEXTR-LABEL: bextr32b:
+; X86-FAST-BEXTR: # %bb.0:
+; X86-FAST-BEXTR-NEXT: movl $3076, %eax # imm = 0xC04
+; X86-FAST-BEXTR-NEXT: bextrl %eax, {{[0-9]+}}(%esp), %eax
+; X86-FAST-BEXTR-NEXT: retl
+;
+; X64-FAST-BEXTR-LABEL: bextr32b:
+; X64-FAST-BEXTR: # %bb.0:
+; X64-FAST-BEXTR-NEXT: movl $3076, %eax # imm = 0xC04
+; X64-FAST-BEXTR-NEXT: bextrl %eax, %edi, %eax
+; X64-FAST-BEXTR-NEXT: retq
   %1 = lshr i32 %x, 4
   %2 = and i32 %1, 4095
   ret i32 %2
 }

 ; Make sure we still use AH subreg trick to extract 15:8
 define i32 @bextr32_subreg(i32 %x) uwtable ssp {
 ; X86-LABEL: bextr32_subreg:
 ; X86: # %bb.0:
 ; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: bextr32_subreg:
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
 ; X64-NEXT: movzbl %ah, %eax
 ; X64-NEXT: retq
   %1 = lshr i32 %x, 8
   %2 = and i32 %1, 255
   ret i32 %2
 }

 define i32 @bextr32b_load(i32* %x) uwtable ssp {
-; X86-LABEL: bextr32b_load:
-; X86: # %bb.0:
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl $3076, %ecx # imm = 0xC04
-; X86-NEXT: bextrl %ecx, (%eax), %eax
-; X86-NEXT: retl
-;
-; X64-LABEL: bextr32b_load:
-; X64: # %bb.0:
-; X64-NEXT: movl $3076, %eax # imm = 0xC04
-; X64-NEXT: bextrl %eax, (%rdi), %eax
-; X64-NEXT: retq
+; X86-SLOW-BEXTR-LABEL: bextr32b_load:
+; X86-SLOW-BEXTR: # %bb.0:
+; X86-SLOW-BEXTR-NEXT: movl {{[0-9]+}}(%esp), %eax
+; X86-SLOW-BEXTR-NEXT: movl (%eax), %eax
+; X86-SLOW-BEXTR-NEXT: shrl $4, %eax
+; X86-SLOW-BEXTR-NEXT: andl $4095, %eax # imm = 0xFFF
+; X86-SLOW-BEXTR-NEXT: retl
+;
+; X64-SLOW-BEXTR-LABEL: bextr32b_load:
+; X64-SLOW-BEXTR: # %bb.0:
+; X64-SLOW-BEXTR-NEXT: movl (%rdi), %eax
+; X64-SLOW-BEXTR-NEXT: shrl $4, %eax
+; X64-SLOW-BEXTR-NEXT: andl $4095, %eax # imm = 0xFFF
+; X64-SLOW-BEXTR-NEXT: retq
+;
+; X86-FAST-BEXTR-LABEL: bextr32b_load:
+; X86-FAST-BEXTR: # %bb.0:
+; X86-FAST-BEXTR-NEXT: movl {{[0-9]+}}(%esp), %eax
+; X86-FAST-BEXTR-NEXT: movl $3076, %ecx # imm = 0xC04
+; X86-FAST-BEXTR-NEXT: bextrl %ecx, (%eax), %eax
+; X86-FAST-BEXTR-NEXT: retl
+;
+; X64-FAST-BEXTR-LABEL: bextr32b_load:
+; X64-FAST-BEXTR: # %bb.0:
+; X64-FAST-BEXTR-NEXT: movl $3076, %eax # imm = 0xC04
+; X64-FAST-BEXTR-NEXT: bextrl %eax, (%rdi), %eax
+; X64-FAST-BEXTR-NEXT: retq
   %1 = load i32, i32* %x
   %2 = lshr i32 %1, 4
   %3 = and i32 %2, 4095
   ret i32 %3
 }

 ; PR34042
 define i32 @bextr32c(i32 %x, i16 zeroext %y) {
@@ (254 lines hidden) @@

llvm/trunk/test/CodeGen/X86/extract-bits.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=-bmi,-tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X86,NOBMI,X86-NOBMI
-; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,-tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1NOTBM,X86-BMI1NOTBM
-; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,+tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1TBM,X86-BMI1TBM
-; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,+tbm,+bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1BMI2,X86-BMI1BMI2,BMI1TBM,X86-BMI1TBM,BMI1TBMBMI2,X86-BMI1TBMBMI2
-; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,-tbm,+bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1BMI2,X86-BMI1BMI2,BMI1NOTBMBMI2,X86-BMI1NOTBMBMI2
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=-bmi,-tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,NOBMI,X64-NOBMI
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,-tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1NOTBM,X64-BMI1NOTBM
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,+tbm,-bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1TBM,X64-BMI1TBM
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,+tbm,+bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1BMI2,X64-BMI1BMI2,BMI1TBM,X64-BMI1TBM,BMI1TBMBMI2,X64-BMI1TBMBMI2
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,-tbm,+bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1BMI2,X64-BMI1BMI2,BMI1NOTBMBMI2,X64-BMI1NOTBMBMI2
+; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=-bmi,-tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X86,NOBMI,X86-NOBMI
+; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,-tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1NOTBM,X86-BMI1NOTBM
+; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,+tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1TBM,X86-BMI1TBM
+; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,+tbm,+bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1BMI2,X86-BMI1BMI2,BMI1TBM,X86-BMI1TBM,BMI1TBMBMI2,X86-BMI1TBMBMI2
+; RUN: llc -mtriple=i686-unknown-linux-gnu -mattr=+bmi,-tbm,+bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X86,BMI1,X86-BMI1,BMI1BMI2,X86-BMI1BMI2,BMI1NOTBMBMI2,X86-BMI1NOTBMBMI2
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=-bmi,-tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X64,NOBMI,X64-NOBMI
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,-tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1NOTBM,X64-BMI1NOTBM
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,+tbm,-bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1TBM,X64-BMI1TBM
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,+tbm,+bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1BMI2,X64-BMI1BMI2,BMI1TBM,X64-BMI1TBM,BMI1TBMBMI2,X64-BMI1TBMBMI2
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi,-tbm,+bmi2,+fast-bextr < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI1,X64-BMI1,BMI1BMI2,X64-BMI1BMI2,BMI1NOTBMBMI2,X64-BMI1NOTBMBMI2

 ; Please keep in sync with test/CodeGen/AArch64/extract-bits.ll

 ; https://bugs.llvm.org/show_bug.cgi?id=36419
 ; https://bugs.llvm.org/show_bug.cgi?id=37603
 ; https://bugs.llvm.org/show_bug.cgi?id=37610

 ; Patterns:
@@ (5,739 lines hidden) @@