This is an archive of the discontinued LLVM Phabricator instance.

[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
ClosedPublic

Authored by RKSimon on Aug 23 2022, 6:31 PM.

Details

Summary

This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions, by setting a bit above the source type's MSB so the promoted operand is never zero (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the source operand is known to be non-zero (e.g. due to the bit set during promotion) we can remove the CMOV that handles a zero source.
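
Roughly, in C terms (an illustrative sketch of the trick rather than the lowering itself; cttz16_promoted is just a made-up name):

#include <cstdint>

// Widen the i16 value to i32 and set bit 16 so the CTTZ/BSF input is never zero;
// a zero i16 input then naturally produces 16.
unsigned cttz16_promoted(uint16_t x) {
  return __builtin_ctz(static_cast<uint32_t>(x) | 0x10000u);
}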

Although BSF isn't very fast, most CPUs from the last 20 years handle it reasonably well, apart from some annoying pass-through EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards the assumption that it will most likely execute as a TZCNT instruction on any semi-modern CPU.

Diff Detail

Event Timeline

RKSimon created this revision. Aug 23 2022, 6:31 PM
Herald added a project: Restricted Project. Aug 23 2022, 6:31 PM
RKSimon requested review of this revision. Aug 23 2022, 6:31 PM
RKSimon added inline comments. Aug 23 2022, 6:32 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
696

@foad @arsenm Is the Type arg enough for the target to make a better decision on when to speculate cttz/ctlz?

pengfei added inline comments. Aug 23 2022, 9:53 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
5824

promote

RKSimon updated this revision to Diff 455122. Aug 24 2022, 3:02 AM

Fix capitalization typo

pengfei accepted this revision. Aug 24 2022, 4:27 AM

LGTM.

This revision is now accepted and ready to land. Aug 24 2022, 4:27 AM
foad added inline comments. Aug 24 2022, 4:27 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
696

I don't think we need to do any more work here. The hook should always return true for AMDGPU and you can delete the FIXME comment.

This revision was landed with ongoing or failed builds. Aug 24 2022, 9:28 AM
This revision was automatically updated to reflect the committed changes.

Looks good to me. Some ideas for further minor improvements below. (I guess I should be filing these as missed-optimization bug reports now that this commit is in?)

If we only need a 16-bit output, we can potentially tzcnt %di, %ax, at the cost of a false dependency merging into AX. Again probably only sensible at -Os or -Oz. And of course that requires that we know tzcnt will run as tzcnt, not bsf. tzcnt itself produces the operand size (16 here) for a zero input, which is the behaviour we're emulating without it.

But tzcnt already has a false output dependency on Intel CPUs from Sandybridge through Broadwell (fixed in Skylake for tzcnt/lzcnt, but not popcnt until later).

One way to avoid that (and still emulate tzcnt by setting a high bit):

leal    65536(%rdi), %eax
rep bsf  %eax, %eax

If EDI holds a 16-bit function arg that we require to be zero-extended to 32 bits, that's equivalent to OR. If it was sign-extended to 32 bits, the top half might be all-ones, in which case this would clear it to zero. But the high half being non-zero would imply that the sign bit was set in the low half, so that's actually fine. We only need bit 16 set when the low half is zero, and this does achieve that goal.

(Assuming our caller respects clang's unofficial extension to the x86-64 SysV calling convention where narrow args are extended to 32 bits, which GCC follows but ICC doesn't. If the high half is arbitrary garbage when the low half is zero, no constant we add can guarantee a non-zero value, let alone a cttz result of exactly 16. So LEA can't work in that case.)
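
A throwaway self-check of that add-vs-or argument (not part of the patch, just brute-forcing every 16-bit value under both extension assumptions):

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t v = 0; v <= 0xFFFF; ++v) {
    uint32_t zext = v;                                               // caller zero-extended the arg
    uint32_t sext = static_cast<uint32_t>(static_cast<int16_t>(v));  // caller sign-extended the arg
    int want = __builtin_ctz(zext | 0x10000u);                       // the OR-based emulation
    assert(__builtin_ctz(zext + 0x10000u) == want);                  // no carry into bit 16, so add == or
    assert(__builtin_ctz(sext + 0x10000u) == want);                  // high-half wrap only happens when bit 15 is already set
  }
}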

llvm/test/CodeGen/X86/clz.ll
553

This would be even better as

movzwl  4(%esp), %eax
orl     $65536, %eax

Only 2 back-end uops (instead of 1 mov-immediate + 1 micro-fused load+or). Pretty much equivalent for the front-end and ROB, but takes one fewer entry in the RS until these execute. The mov-immediate is independent so it can exec early, but still takes a cycle on an execution unit. And then we're left with load and OR uops, just like with my asm.
The one advantage is that the mov-immediate could possibly retire from the ROB while still stalled waiting for the load, but that seems pretty unlikely. (Cache miss on stack memory in a function with stack args, or at the end of a long dependency chain involving storing the arg).

That movzx + or-immediate asm is what we get from C++ source that manually sets a bit, __builtin_ctz(x|0x10000)
https://godbolt.org/z/d63WM5s4c (clang nightly after this revision landed).

We also get it from std::countr_zero(*p); or from a reference. So it seems this mov-immediate thing only happens with by-value stack args, and isn't a concern for the general case of other addressing modes; it's always a simple ESP- or EBP-relative addressing mode, where we won't have un-lamination on Sandy/Ivy Bridge. The movzx load also of course avoids reading outside the pointed-to 16-bit object, which could cross into an unmapped page (or suffer a slowdown from a cache-line split).

Anyway, what we do when dereferencing a pointer, as for std::countr_zero(*p); is slightly more efficient than what we do for a by-value stack arg.
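
For reference, source along these lines reproduces both cases (my reconstruction of the comparison, not the exact godbolt input; function names are made up):

#include <bit>
#include <cstdint>

// By-value i16 stack arg (32-bit target): per the above, this currently gets the
// mov-immediate + or-from-memory form.
int ctz_arg(uint16_t x) { return std::countr_zero(x); }

// Dereferenced pointer: gets the nicer movzx-load + or-immediate + rep bsf form.
int ctz_mem(const uint16_t *p) { return std::countr_zero(*p); }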

But it seems this is not specific to CTTZ lowering; we get mov-immediate / or mem,reg from return x|0x1000;

(vs. 64-bit mode correctly choosing mov %edi, %eax / or $65536, %eax, which puts the mov on the critical path on CPUs without mov-elimination, like Ice Lake with updated microcode :(. But it saves code size. In the 32-bit case there's no latency advantage: a load has to happen as part of the critical path either way, and OR register has the same latency as OR-immediate.)

BTW, with -Oz at least, we should be using 4-byte bts $16, %eax instead of 5-byte or $65536, %eax (or 6-byte for other registers, plus a REX if needed). On Intel CPUs it's still only 1 uop, although it can run on fewer execution ports (p06 in SKL/ICL, only p1 in Alder Lake P-cores). Appropriate at least for -Os -mtune=intel (or any specific Sandybridge-family) if we want to be that fine-grained about different instruction selection, probably even -O2 -mtune=intel. Although maybe not, since Alder Lake P-cores dropped the throughput to 1, competing with imul and tzcnt/lzcnt/popcnt for that port. So maybe only for a -march before that? But people normally expect -march=haswell to be good on later Intel, and it's a pretty small savings.

On AMD CPUs, bts $imm, %reg is 2 uops, so only appropriate for -Oz and maybe -Os.

560

BTW, with -Oz at least, we should be using 4-byte bts $16, %edi instead of 6-byte or $65536, %edi (or 5-byte for EAX).

On Intel CPUs bts $i8, %reg is still only 1 uop, although it can run on fewer execution ports than or (p06, the shift ports, in SKL/ICL; only p1 in Alder Lake P-cores). Appropriate at least for -Os -mtune=intel (or any specific Sandybridge-family) if we want to be that fine-grained about different instruction selection.
Perhaps even -O2 -mtune=intel. Although maybe not, since Alder Lake P-cores dropped the throughput to 1, competing with imul and tzcnt/lzcnt/popcnt for that port. (BTS still has 1 cycle latency, unlike most integer uops that can only run on port 1. Alder Lake E-cores run as 1 uop with 1 cycle latency to the integer output, 2 cycle latency to the CF output.)

So maybe only for -O2 with a -march before Alder Lake? But people normally expect -march=haswell to be good on later Intel, and it's a pretty small savings, just 2 bytes. OTOH the downside on Alder Lake might also be pretty small, unless it's used in a loop with a port 1 bottleneck.

On AMD CPUs, bts $imm, %reg is 2 uops, so only appropriate for -Oz and maybe -Os.