This is an archive of the discontinued LLVM Phabricator instance.


Differential D29385

Clzero intrinsic and its addition under znver1
Closed · Public

Authored by GGanesh on Feb 1 2017, 2:38 AM.


Details

Reviewers

RKSimon
craig.topper

Commits

rG50f3d1452c2a: [X86] Clzero intrinsic and its addition under znver1
rL294558: [X86] Clzero intrinsic and its addition under znver1

Summary

This patch does the following.

- Adds an intrinsic, int_x86_clzero, which corresponds to __builtin_ia32_clzero.
- Detects the clzero feature from CPUID (function 8000_0008h: checks whether EBX[0] = 1).
- Adds the clzero feature to the znver1 processor definition.
- Adds a custom inserter in X86ISelLowering.
- Adds a test case for the intrinsic.
- Adds the clzero instruction to the assembler tests.
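As a rough illustration of how user code reaches the new intrinsic, the sketch below calls __builtin_ia32_clzero when the compiler defines __CLZERO__ (current clang and GCC do so when the clzero feature is enabled, for example via -mclzero or -march=znver1) and otherwise falls back to memset. The helper name zero_cache_line, the 64-byte line size, and the assumption that CLZERO clears the whole line containing the given address are illustrative assumptions, not part of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-byte cache lines (typical for znver1; not queried here). */
#define LINE_SIZE 64u

/* Hypothetical helper: zero the cache line that contains p. */
static void zero_cache_line(void *p) {
#if defined(__CLZERO__)
  /* With this patch, lowers to: lea <addr>, %rax (or %eax) ; clzero */
  __builtin_ia32_clzero(p);
#else
  /* Portable fallback: clear the same LINE_SIZE-aligned line with memset. */
  uintptr_t line = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
  memset((void *)line, 0, LINE_SIZE);
#endif
}
```

The fallback keeps the same "whole containing line" behaviour so callers see identical results whether or not the instruction is available.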

Diff Detail

Repository: rL LLVM

Event Timeline

GGanesh created this revision. Feb 1 2017, 2:38 AM

Herald added a subscriber: igorb. Feb 1 2017, 2:38 AM

GGanesh added a parent revision: D29386: Clzero flag addition and inclusion under znver1. Feb 1 2017, 2:43 AM

test/CodeGen/X86/clzero.ll, line 36 (on Diff #86597): Please check on i686 target triples as well as x86_64.

test/MC/X86/x86-64.s, line 1508 (on Diff #86597): Should we test for the clzero %rax / clzero %eax aliases as well?

RKSimon added a subscriber: llvm-commits. Feb 1 2017, 9:42 AM

Updated to address the review comments.

One minor comment, but no other comments from me. @craig.topper?

test/MC/X86/x86-32.s, line 450 (on Diff #87385): Add a clzero-only test as well:

// CHECK: clzero
// CHECK: encoding: [0x0f,0x01,0xfc]
clzero

Updated the test file x86-32.s with a clzero-only test.

andreadb added a subscriber: andreadb. Feb 7 2017, 8:53 AM

RKSimon added inline comments. Feb 7 2017, 8:59 AM

lib/Target/X86/X86InstrInfo.td, line 2461 (on Diff #87438): Given that clzero writes to memory, shouldn't it have a mayStore = 1 attribute? I'm not sure if WriteSystem alone creates the necessary barrier. This needs testing: check that a separate load+instruction and instruction+store don't get folded across the clzero.

I have a (probably dumb) question: should CLZERO be treated as a memory read/write barrier for the purpose of scheduling? Is it okay to reorder CLZERO based on the EAX/RAX register dependency only?

I am asking this because it looks to me that we don't model the 'flush/zero' behavior of cache line operations on x86. From what I can see (please correct me if I am wrong), nothing in the code suggests that CLZERO (or even CLFLUSH) is treated as a memory barrier for scheduling purposes. Is that behavior intended (i.e. do we care about it)? Am I reading those tablegen definitions incorrectly?

Ah, I see that Simon asked a similar question :-)

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

In D29385#670375, @craig.topper wrote:

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

In which case, we just need a test to check for it.

I think it is okay even if we don't set the mayStore attribute.
I wrote a simple test to check the following:

- Scheduling based on the instruction attributes
- Side-effect handling

As Craig Topper says, it falls back to the UnmodeledSideEffects case by default.

Test:

#include "xmmintrin.h"
int a[2];
int b[2];
int c,d, *p;
int main ()
{
  b[0] += a[0];
  __builtin_ia32_clzero (p) ;
  d += c;
  b[1] += a[0];
  return 0;
}

Without the intrinsic, the scheduler places the two loads (of a and c) together in the first cycle. With the intrinsic, the memory operations stay in their original (safe) order around clzero, and a is even reloaded after it.
Without the clzero intrinsic: $>clang -O3 ClzeroFoldTest.c -S -march=znver1

# BB#0:                                 # %entry
        movl    a(%rip), %eax
        movl    c(%rip), %ecx
        addl    %eax, b(%rip)
        addl    %eax, b+4(%rip)
        addl    %ecx, d(%rip)
        xorl    %eax, %eax
        retq

With the clzero intrinsic: $>clang -O3 ClzeroFoldTest.c -S -march=znver1

# BB#0:
        movl    a(%rip), %eax
        addl    %eax, b(%rip)
        movq    p(%rip), %rax
        leaq    (%rax), %rax
        clzero
        movl    c(%rip), %eax
        addl    %eax, d(%rip)
        movl    a(%rip), %eax
        addl    %eax, b+4(%rip)
        xorl    %eax, %eax
        retq

In D29385#670375, @craig.topper wrote:

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

Yes, that would work.
An instruction with unmodeled side effects is basically treated as a scheduling barrier. So we are safe.
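To make the barrier argument concrete, here is a deliberately simplified, hypothetical model of the rule being discussed: two instructions may be swapped only if neither has unmodeled side effects and they are not a store/load or store/store pair. This is not LLVM's ScheduleDAGInstrs logic (which also tracks register dependencies, alias information and more); the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-instruction flags (loosely after MCID flags). */
struct inst_flags {
  bool may_load;
  bool may_store;
  bool unmodeled_side_effects;
};

/* Toy rule: may a and b be reordered? */
static bool can_reorder(struct inst_flags a, struct inst_flags b) {
  /* An instruction with unmodeled side effects acts as a scheduling barrier. */
  if (a.unmodeled_side_effects || b.unmodeled_side_effects)
    return false;
  /* With no alias information, keep any pair involving a store in order. */
  if ((a.may_store && (b.may_load || b.may_store)) ||
      (b.may_store && a.may_load))
    return false;
  return true;
}
```

Under this model a CLZEROr that sets neither mayLoad nor mayStore still cannot move across the surrounding loads and stores, because the default unmodeled-side-effects flag alone blocks the swap.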

@craig.topper, is it fair to say that flag UnmodeledSideEffects is conservatively implicitly set because the tablegen definition for CLZEROr does not provide a primary instruction pattern, and flag hasSideEffects_Unset is true (i.e. hasSideEffects is not explicitly mentioned)?

@craig.topper, if you are okay with it, can you please commit the changes on my behalf?

LGTM

This revision is now accepted and ready to land. Feb 8 2017, 8:07 PM

I'll commit shortly.

Closed by commit rL294558: [X86] Clzero intrinsic and its addition under znver1 (authored by ctopper). Feb 8 2017, 8:39 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

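Path                                              Size
llvm/trunk/include/llvm/IR/IntrinsicsX86.td       7 lines
llvm/trunk/lib/Support/Host.cpp                   4 lines
llvm/trunk/lib/Target/X86/X86.td                  3 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp     25 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.td         16 lines
llvm/trunk/lib/Target/X86/X86Schedule.td          1 line
llvm/trunk/lib/Target/X86/X86Subtarget.h          4 lines
llvm/trunk/lib/Target/X86/X86Subtarget.cpp        1 line
llvm/trunk/test/CodeGen/X86/clzero.ll             23 lines
llvm/trunk/test/MC/Disassembler/X86/x86-32.txt    3 lines
llvm/trunk/test/MC/X86/x86-32.s                   8 lines
llvm/trunk/test/MC/X86/x86-64.s                   8 lines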

Diff 87756

llvm/trunk/include/llvm/IR/IntrinsicsX86.td


@@ ... @@
 let TargetPrefix = "x86" in {
   def int_x86_monitorx
       : GCCBuiltin<"__builtin_ia32_monitorx">,
         Intrinsic<[], [ llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty ], []>;
   def int_x86_mwaitx
       : GCCBuiltin<"__builtin_ia32_mwaitx">,
         Intrinsic<[], [ llvm_i32_ty, llvm_i32_ty, llvm_i32_ty ], []>;
 }
+
+//===----------------------------------------------------------------------===//
+// Cache-line zero
+let TargetPrefix = "x86" in {
+  def int_x86_clzero : GCCBuiltin<"__builtin_ia32_clzero">,
+                       Intrinsic<[], [llvm_ptr_ty], []>;
+}

llvm/trunk/lib/Support/Host.cpp

@@ ... @@ bool sys::getHostCPUFeatures(StringMap<bool> &Features) {
   Features["lzcnt"] = HasExtLeaf1 && ((ECX >> 5) & 1);
   Features["sse4a"] = HasExtLeaf1 && ((ECX >> 6) & 1);
   Features["prfchw"] = HasExtLeaf1 && ((ECX >> 8) & 1);
   Features["xop"] = HasExtLeaf1 && ((ECX >> 11) & 1) && HasAVXSave;
   Features["fma4"] = HasExtLeaf1 && ((ECX >> 16) & 1) && HasAVXSave;
   Features["tbm"] = HasExtLeaf1 && ((ECX >> 21) & 1);
   Features["mwaitx"] = HasExtLeaf1 && ((ECX >> 29) & 1);
 
+  bool HasExtLeaf8 = MaxExtLevel >= 0x80000008 &&
+                     !getX86CpuIDAndInfoEx(0x80000008,0x0, &EAX, &EBX, &ECX, &EDX);
+  Features["clzero"] = HasExtLeaf8 && ((EBX >> 0) & 1);
+
   bool HasLeaf7 =
       MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX);
 
   // AVX2 is only supported if we have the OS save support from AVX.
   Features["avx2"] = HasAVXSave && HasLeaf7 && ((EBX >> 5) & 1);
 
   Features["fsgsbase"] = HasLeaf7 && ((EBX >> 0) & 1);
   Features["sgx"] = HasLeaf7 && ((EBX >> 2) & 1);
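The Host.cpp hunk reads CPUID function 8000_0008h and tests EBX bit 0. A standalone equivalent can be sketched with the cpuid.h header shipped by GCC and clang: __get_cpuid returns 0 when the requested leaf is above the highest supported extended leaf, which plays the role of the MaxExtLevel >= 0x80000008 guard. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CLZERO support is reported in CPUID Fn8000_0008, EBX bit 0. */
static bool clzero_bit_set(uint32_t ebx) { return ((ebx >> 0) & 1u) != 0; }

/* Hypothetical host query mirroring Features["clzero"] in Host.cpp. */
static bool host_has_clzero(void) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* Returns 0 if leaf 0x80000008 exceeds the max extended leaf. */
  if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
    return false;
  return clzero_bit_set(ebx);
#else
  return false; /* Not an x86 host. */
#endif
}
```

Only the bit test is deterministic; the result of host_has_clzero naturally depends on the machine it runs on.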

llvm/trunk/lib/Target/X86/X86.td

@@ ... @@
 def FeaturePRFCHW : SubtargetFeature<"prfchw", "HasPRFCHW", "true",
     "Support PRFCHW instructions">;
 def FeatureRDSEED : SubtargetFeature<"rdseed", "HasRDSEED", "true",
     "Support RDSEED instruction">;
 def FeatureLAHFSAHF : SubtargetFeature<"sahf", "HasLAHFSAHF", "true",
     "Support LAHF and SAHF instructions">;
 def FeatureMWAITX : SubtargetFeature<"mwaitx", "HasMWAITX", "true",
     "Enable MONITORX/MWAITX timer functionality">;
+def FeatureCLZERO : SubtargetFeature<"clzero", "HasCLZERO", "true",
+    "Enable Cache Line Zero">;
 def FeatureMPX : SubtargetFeature<"mpx", "HasMPX", "true",
     "Support MPX instructions">;
 def FeatureLEAForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
     "Use LEA for adjusting the stack pointer">;
 def FeatureSlowDivide32 : SubtargetFeature<"idivl-to-divb",
     "HasSlowDivide32", "true",
     "Use 8-bit divide for positive values less than 256">;
 def FeatureSlowDivide64 : SubtargetFeature<"idivq-to-divl",
@@ ... @@
 // Zen
 def: ProcessorModel<"znver1", BtVer2Model, [
   FeatureADX,
   FeatureAES,
   FeatureAVX2,
   FeatureBMI,
   FeatureBMI2,
   FeatureCLFLUSHOPT,
+  FeatureCLZERO,
   FeatureCMPXCHG16B,
   FeatureF16C,
   FeatureFMA,
   FeatureFSGSBase,
   FeatureFXSR,
   FeatureFastLZCNT,
   FeatureLAHFSAHF,
   FeatureLZCNT,

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ static MachineBasicBlock *emitMonitor(MachineInstr &MI, MachineBasicBlock *BB,
 
   // The instruction doesn't actually take any operands though.
   BuildMI(*BB, MI, dl, TII->get(Opc));
 
   MI.eraseFromParent(); // The pseudo is gone now.
   return BB;
 }
 
+static MachineBasicBlock *emitClzero(MachineInstr *MI, MachineBasicBlock *BB,
+                                     const X86Subtarget &Subtarget) {
+  DebugLoc dl = MI->getDebugLoc();
+  const TargetInstrInfo *TII = Subtarget.getInstrInfo();
+  // Address into RAX/EAX
+  unsigned MemOpc = Subtarget.is64Bit() ? X86::LEA64r : X86::LEA32r;
+  unsigned MemReg = Subtarget.is64Bit() ? X86::RAX : X86::EAX;
+  MachineInstrBuilder MIB = BuildMI(*BB, MI, dl, TII->get(MemOpc), MemReg);
+  for (int i = 0; i < X86::AddrNumOperands; ++i)
+    MIB.add(MI->getOperand(i));
+
+  // The instruction doesn't actually take any operands though.
+  BuildMI(*BB, MI, dl, TII->get(X86::CLZEROr));
+
+  MI->eraseFromParent(); // The pseudo is gone now.
+  return BB;
+}
+
 MachineBasicBlock *
 X86TargetLowering::EmitVAARG64WithCustomInserter(MachineInstr &MI,
                                                  MachineBasicBlock *MBB) const {
   // Emit va_arg instruction on X86-64.
 
   // Operands to this pseudo-instruction:
   // 0 ) Output : destination address (reg)
   // 1-5) Input : va_list address (addr, i64mem)
@@ ... @@
   assert(Subtarget.hasSSE42() &&
          "Target must have SSE4.2 or AVX features enabled");
   return emitPCMPSTRI(MI, BB, Subtarget.getInstrInfo());
 
   // Thread synchronization.
   case X86::MONITOR:
     return emitMonitor(MI, BB, Subtarget, X86::MONITORrrr);
   case X86::MONITORX:
     return emitMonitor(MI, BB, Subtarget, X86::MONITORXrrr);
 
+  // Cache line zero
+  case X86::CLZERO:
+    return emitClzero(&MI, BB, Subtarget);
+
   // PKU feature
   case X86::WRPKRU:
     return emitWRPKRU(MI, BB, Subtarget);
   case X86::RDPKRU:
     return emitRDPKRU(MI, BB, Subtarget);
   // xbegin
   case X86::XBEGIN:
     return emitXBegin(MI, BB, Subtarget.getInstrInfo());
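The custom inserter expands the CLZERO pseudo into an LEA that materialises the address in RAX (64-bit) or EAX (32-bit), followed by the operand-less CLZEROr. The toy function below only formats that two-instruction sequence as AT&T text for a plain base register, to show how the register choice follows the mode; the function name and the single-base-register simplification are assumptions for illustration, not LLVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: text of the sequence emitClzero produces for (%base). */
static int format_clzero_lowering(bool is64bit, const char *base,
                                  char *out, size_t out_size) {
  /* Mirrors Subtarget.is64Bit() ? LEA64r/RAX : LEA32r/EAX. */
  const char *lea = is64bit ? "leaq" : "leal";
  const char *dst = is64bit ? "%rax" : "%eax";
  return snprintf(out, out_size, "%s (%s), %s\nclzero\n", lea, base, dst);
}
```

The two expected strings correspond to the X64 and X32 check lines in clzero.ll.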

llvm/trunk/lib/Target/X86/X86InstrInfo.td

@@ ... @@
 def HasTSX : Predicate<"Subtarget->hasRTM() || Subtarget->hasHLE()">;
 def HasADX : Predicate<"Subtarget->hasADX()">;
 def HasSHA : Predicate<"Subtarget->hasSHA()">;
 def HasPRFCHW : Predicate<"Subtarget->hasPRFCHW()">;
 def HasRDSEED : Predicate<"Subtarget->hasRDSEED()">;
 def HasPrefetchW : Predicate<"Subtarget->hasPRFCHW()">;
 def HasLAHFSAHF : Predicate<"Subtarget->hasLAHFSAHF()">;
 def HasMWAITX : Predicate<"Subtarget->hasMWAITX()">;
+def HasCLZERO : Predicate<"Subtarget->hasCLZERO()">;
 def FPStackf32 : Predicate<"!Subtarget->hasSSE1()">;
 def FPStackf64 : Predicate<"!Subtarget->hasSSE2()">;
 def HasMPX : Predicate<"Subtarget->hasMPX()">;
 def HasCLFLUSHOPT : Predicate<"Subtarget->hasCLFLUSHOPT()">;
 def HasCmpxchg16b: Predicate<"Subtarget->hasCmpxchg16b()">;
 def Not64BitMode : Predicate<"!Subtarget->is64Bit()">,
     AssemblerPredicate<"!Mode64Bit", "Not 64-bit mode">;
 def In64BitMode : Predicate<"Subtarget->is64Bit()">,
@@ ... @@
 def : InstAlias<"monitorx\t{%eax, %ecx, %edx|edx, ecx, eax}", (MONITORXrrr)>,
       Requires<[ Not64BitMode ]>;
 def : InstAlias<"monitorx\t{%rax, %rcx, %rdx|rdx, rcx, rax}", (MONITORXrrr)>,
       Requires<[ In64BitMode ]>;
 
 //===----------------------------------------------------------------------===//
 // CLZERO Instruction
 //
+let SchedRW = [WriteSystem] in {
 let Uses = [EAX] in
-def CLZEROr : I<0x01, MRM_FC, (outs), (ins), "clzero", []>, TB;
+def CLZEROr : I<0x01, MRM_FC, (outs), (ins), "clzero", [], IIC_SSE_CLZERO>,
+              TB, Requires<[HasCLZERO]>;
+
+let usesCustomInserter = 1 in {
+def CLZERO : PseudoI<(outs), (ins i32mem:$src1),
+                     [(int_x86_clzero addr:$src1)]>, Requires<[HasCLZERO]>;
+}
+} // SchedRW
+
+def : InstAlias<"clzero\t{%eax|eax}", (CLZEROr)>, Requires<[Not64BitMode]>;
+def : InstAlias<"clzero\t{%rax|rax}", (CLZEROr)>, Requires<[In64BitMode]>;
 
 //===----------------------------------------------------------------------===//
 // Pattern fragments to auto generate TBM instructions.
 //===----------------------------------------------------------------------===//
 
 let Predicates = [HasTBM] in {
   def : Pat<(X86bextr GR32:$src1, (i32 imm:$src2)),
             (BEXTRI32ri GR32:$src1, imm:$src2)>;
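The two InstAlias records let the assembler accept an explicit register operand, but only the one matching the address size of the current mode (Not64BitMode for %eax, In64BitMode for %rax); every accepted spelling encodes to 0F 01 FC. A hypothetical checker capturing that rule as defined in this patch (it says nothing about how the real AsmParser is structured):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: would "clzero <operand>" be accepted in this mode? */
static bool clzero_operand_ok(const char *operand, bool mode64) {
  if (operand == NULL || operand[0] == '\0')
    return true;        /* bare "clzero" is valid in either mode */
  if (strcmp(operand, "%eax") == 0)
    return !mode64;     /* alias guarded by Requires<[Not64BitMode]> */
  if (strcmp(operand, "%rax") == 0)
    return mode64;      /* alias guarded by Requires<[In64BitMode]> */
  return false;         /* no other operand form is defined */
}
```

This is the behaviour the new x86-32.s and x86-64.s cases exercise: clzero %eax only in the 32-bit test and clzero %rax only in the 64-bit one.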

llvm/trunk/lib/Target/X86/X86Schedule.td

@@ ... @@
 def IIC_SSE_PMADD : InstrItinClass;
 def IIC_SSE_PMULHRSW : InstrItinClass;
 def IIC_SSE_PALIGNRR : InstrItinClass;
 def IIC_SSE_PALIGNRM : InstrItinClass;
 def IIC_SSE_MWAIT : InstrItinClass;
 def IIC_SSE_MONITOR : InstrItinClass;
 def IIC_SSE_MWAITX : InstrItinClass;
 def IIC_SSE_MONITORX : InstrItinClass;
+def IIC_SSE_CLZERO : InstrItinClass;
 
 def IIC_SSE_PREFETCH : InstrItinClass;
 def IIC_SSE_PAUSE : InstrItinClass;
 def IIC_SSE_LFENCE : InstrItinClass;
 def IIC_SSE_MFENCE : InstrItinClass;
 def IIC_SSE_SFENCE : InstrItinClass;
 def IIC_SSE_LDMXCSR : InstrItinClass;
 def IIC_SSE_STMXCSR : InstrItinClass;

llvm/trunk/lib/Target/X86/X86Subtarget.h

@@ ... @@ protected:
   bool HasRDSEED;
 
   /// Processor has LAHF/SAHF instructions.
   bool HasLAHFSAHF;
 
   /// Processor has MONITORX/MWAITX instructions.
   bool HasMWAITX;
 
+  /// Processor has Cache Line Zero instruction
+  bool HasCLZERO;
+
   /// Processor has Prefetch with intent to Write instruction
   bool HasPFPREFETCHWT1;
 
   /// True if BT (bit test) of memory instructions are slow.
   bool IsBTMemSlow;
 
   /// True if SHLD instructions are slow.
   bool IsSHLDSlow;
@@ ... @@ public:
   bool hasRTM() const { return HasRTM; }
   bool hasHLE() const { return HasHLE; }
   bool hasADX() const { return HasADX; }
   bool hasSHA() const { return HasSHA; }
   bool hasPRFCHW() const { return HasPRFCHW; }
   bool hasRDSEED() const { return HasRDSEED; }
   bool hasLAHFSAHF() const { return HasLAHFSAHF; }
   bool hasMWAITX() const { return HasMWAITX; }
+  bool hasCLZERO() const { return HasCLZERO; }
   bool isBTMemSlow() const { return IsBTMemSlow; }
   bool isSHLDSlow() const { return IsSHLDSlow; }
   bool isPMULLDSlow() const { return IsPMULLDSlow; }
   bool isUnalignedMem16Slow() const { return IsUAMem16Slow; }
   bool isUnalignedMem32Slow() const { return IsUAMem32Slow; }
   bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }
   bool hasCmpxchg16b() const { return HasCmpxchg16b; }
   bool useLeaForSP() const { return UseLeaForSP; }

llvm/trunk/lib/Target/X86/X86Subtarget.cpp

@@ ... @@ void X86Subtarget::initializeEnvironment() {
   HasVLX = false;
   HasADX = false;
   HasPKU = false;
   HasSHA = false;
   HasPRFCHW = false;
   HasRDSEED = false;
   HasLAHFSAHF = false;
   HasMWAITX = false;
+  HasCLZERO = false;
   HasMPX = false;
   IsBTMemSlow = false;
   IsPMULLDSlow = false;
   IsSHLDSlow = false;
   IsUAMem16Slow = false;
   IsUAMem32Slow = false;
   HasSSEUnalignedMem = false;
   HasCmpxchg16b = false;

llvm/trunk/test/CodeGen/X86/clzero.ll

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux -mattr=+clzero | FileCheck %s --check-prefix=X64
+; RUN: llc < %s -mtriple=i386-pc-linux -mattr=+clzero | FileCheck %s --check-prefix=X32
+
+define void @foo(i8* %p) #0 {
+; X64-LABEL: foo:
+; X64:       # BB#0: # %entry
+; X64-NEXT:    leaq (%rdi), %rax
+; X64-NEXT:    clzero
+; X64-NEXT:    retq
+;
+; X32-LABEL: foo:
+; X32:       # BB#0: # %entry
+; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X32-NEXT:    leal (%eax), %eax
+; X32-NEXT:    clzero
+; X32-NEXT:    retl
+entry:
+  tail call void @llvm.x86.clzero(i8* %p) #1
+  ret void
+}
+
+declare void @llvm.x86.clzero(i8*) #1

llvm/trunk/test/MC/Disassembler/X86/x86-32.txt

@@ ... @@
 0x0f 0x01 0xdd
 
 # CHECK: skinit
 0x0f 0x01 0xde
 
 # CHECK: invlpga
 0x0f 0x01 0xdf
 
+# CHECK: clzero
+0x0f,0x01,0xfc
+
 # CHECK: movl $0, -4(%ebp)
 0xc7 0x45 0xfc 0x00 0x00 0x00 0x00
 
 # CHECK: movl %cr0, %ecx
 0x0f 0x20 0xc1
 
 # CHECK: leal 4(%esp), %ecx
 0x8d 0x4c 0x24 0x04
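CLZEROr is declared as opcode 0x01 in the 0F (TB) map with MRM_FC, i.e. the ModRM byte is the fixed value 0xFC. That is why the disassembler test feeds 0x0f,0x01,0xfc, while neighbours such as mwaitx (0x0f 0x01 0xfb) share the same first two bytes. A minimal, hypothetical matcher for that byte pattern (it ignores prefixes and is not how LLVM's table-driven decoder works):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: does the byte sequence start with the CLZERO encoding? */
static bool is_clzero(const uint8_t *bytes, size_t len) {
  /* 0F escape (TB), opcode 0x01, fixed ModRM 0xFC (MRM_FC). */
  static const uint8_t enc[3] = {0x0f, 0x01, 0xfc};
  if (len < sizeof enc)
    return false;
  for (size_t i = 0; i < sizeof enc; ++i)
    if (bytes[i] != enc[i])
      return false;
  return true;
}
```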

llvm/trunk/test/MC/X86/x86-32.s

@@ ... @@
 // CHECK: movl %dr6, %eax
 // CHECK: encoding: [0x0f,0x21,0xf0]
 movl %dr6,%eax
 
 // CHECK: movl %dr7, %eax
 // CHECK: encoding: [0x0f,0x21,0xf8]
 movl %dr7,%eax
 
+// CHECK: clzero
+// CHECK: encoding: [0x0f,0x01,0xfc]
+clzero
+
+// CHECK: clzero
+// CHECK: encoding: [0x0f,0x01,0xfc]
+clzero %eax
+
 // radr://8017522
 // CHECK: wait
 // CHECK: encoding: [0x9b]
 fwait
 
 // rdar://7873482
 // CHECK: [0x65,0xa1,0x7c,0x00,0x00,0x00]
 movl %gs:124, %eax

llvm/trunk/test/MC/X86/x86-64.s

@@ ... @@
 // CHECK: mwaitx
 // CHECK: encoding: [0x0f,0x01,0xfb]
 mwaitx
 
 // CHECK: mwaitx
 // CHECK: encoding: [0x0f,0x01,0xfb]
 mwaitx %rax, %rcx, %rbx
 
+// CHECK: clzero
+// CHECK: encoding: [0x0f,0x01,0xfc]
+clzero
+
+// CHECK: clzero
+// CHECK: encoding: [0x0f,0x01,0xfc]
+clzero %rax
+
 // CHECK: movl %r15d, (%r15,%r15)
 // CHECK: encoding: [0x47,0x89,0x3c,0x3f]
 movl %r15d, (%r15,%r15)
