This is an archive of the discontinued LLVM Phabricator instance.

AMD family 17h (znver1) enablement
ClosedPublic

Authored by GGanesh on Dec 21 2016, 3:03 AM.

Details

Reviewers

RKSimon
craig.topper

Commits

rGd55b83128bc1: AMD family 17h (znver1) enablement
rL291543: AMD family 17h (znver1) enablement

Summary

This patch enables the following:

AMD family 17h architecture, selected with the "znver1" CPU name (-march, -mcpu).
The ISAs supported by the "znver1" architecture.
Detection of "znver1" when -march=native is used, by checking the ADX ISA bit from cpuid.
The CLZERO feature, plus the builtin __builtin_ia32_clzero for the clzero instruction.
FMA4 and XOP are disabled, as they were dropped from amdfam17.
For the time being, znver1 uses the btver2 scheduler model.
The test file is updated to check this flag.
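The -march=native detection described above can be sketched as a small standalone function. This is only an illustration of the shape of the logic in the Host.cpp hunk of this revision, not the LLVM code itself: `kFeatureADX`, `featuresFromLeaf7Ebx` and `amdCpuName` are hypothetical names introduced here, and the feature numbering is an assumption. The one external fact relied on is that CPUID leaf 7 (sub-leaf 0) reports ADX in EBX bit 19.

```cpp
#include <cstdint>
#include <string>

// Hypothetical feature bit for this sketch; Host.cpp keeps its own
// ProcessorFeatures enum with different numbering.
constexpr std::uint32_t kFeatureADX = 1u << 0;

// CPUID.(EAX=7,ECX=0):EBX bit 19 reports ADX support.
constexpr std::uint32_t featuresFromLeaf7Ebx(std::uint32_t ebx) {
  return ((ebx >> 19) & 1u) ? kFeatureADX : 0u;
}

// Family 23 (17h) with ADX maps to "znver1". Without ADX the patch falls
// back to the btver1 subtype, which the name lookup reports as "btver1".
inline std::string amdCpuName(unsigned family, std::uint32_t features) {
  if (family != 23)
    return "generic"; // other families are out of scope for this sketch
  return (features & kFeatureADX) ? "znver1" : "btver1";
}
```

Calling `amdCpuName(23, featuresFromLeaf7Ebx(ebx))` with the EBX value read from leaf 7 would give the name that -march=native resolves to under these assumptions.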

This item is linked to clang review item https://reviews.llvm.org/D28018

Diff Detail

Repository: rL LLVM

Event Timeline

GGanesh updated this revision to Diff 82213. Dec 21 2016, 3:03 AM

GGanesh retitled this revision to AMD family 17h (znver1) enablement.

GGanesh updated this object.

GGanesh added a reviewer: craig.topper.

GGanesh set the repository for this revision to rL LLVM.

GGanesh added a subscriber: llvm-commits.

GGanesh updated this object. Dec 21 2016, 3:08 AM

ashutosh.nema added a subscriber: ashutosh.nema. Dec 21 2016, 3:09 AM

This needs tests for the clzero intrinsic (behind a -mattr=+clzero arg for 32-bit and 64-bit targets, as well as -mcpu=znver1). In fact, it probably makes sense to split the clzero support out into a separate preliminary patch and then add the -mcpu=znver1 test as part of this one.

lib/Target/X86/X86.td
804 ↗	(On Diff #82213)	Only one use so far, so it is probably best to just declare it as: def : ProcessorModel<"znver1", BtVer2Model, [
Also, add a TODO comment for a znver1 scheduler model.

vprasad added a subscriber: vprasad. Dec 21 2016, 5:20 AM

craig.topper added inline comments. Dec 21 2016, 8:57 AM

lib/Target/X86/X86InstrInfo.td
2456 ↗	(On Diff #82213)	The custom inserter needs to be implemented in X86ISelLowering.cpp. As Simon said, clzero should be split out of this patch.

RKSimon added inline comments. Dec 21 2016, 12:25 PM

lib/Target/X86/X86.td
799 ↗	(On Diff #82213)	GCC seems to think znver1 has MOVBE support - who is right?

I am preparing a version of this patch that does not include the clzero feature.
I will submit clzero as a separate patch.

The clzero intrinsic handling and feature addition will be handled in a separate patch.
Added movbe and sse4a to the ISA list of znver1.

A few minor comments, but I don't have any major concerns.

Add znver1 tests to llvm\test\CodeGen\X86\slow-unaligned-mem.ll

lib/Target/X86/X86.td
764 ↗	(On Diff #83567)	Are we happy with alphabetical ordering of the feature bits? We don't seem to be consistent about this across targets.
767 ↗	(On Diff #83567)	Remove FeatureAVX - it will be implicitly included as FeatureAVX2 is set.
777 ↗	(On Diff #83567)	Add znver1 to llvm\test\CodeGen\X86\lzcnt-zext-cmp.ll
791 ↗	(On Diff #83567)	Remove FeatureSSSE3 - it will be implicitly included as FeatureAVX2 is set.
792 ↗	(On Diff #83567)	Add znver1 to llvm\test\CodeGen\X86\x86-64-double-shifts-var.ll as you're testing for slow SHLD
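The two "implicitly included" comments above rely on feature implication: listing one feature pulls in everything it depends on, so listing the dependencies again is redundant. A minimal sketch of that expansion follows. The table is a deliberately reduced, hypothetical stand-in for the relationships declared in X86.td (which remains the authority), and `impliedBy` and `expandFeatures` are names made up for this example.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, reduced implication table for illustration only.
inline const std::map<std::string, std::vector<std::string>> &impliedBy() {
  static const std::map<std::string, std::vector<std::string>> Table = {
      {"avx2", {"avx"}},      {"avx", {"sse4.2"}},  {"sse4.2", {"sse4.1"}},
      {"sse4.1", {"ssse3"}},  {"ssse3", {"sse3"}},  {"sse3", {"sse2"}},
      {"sse2", {"sse"}},
  };
  return Table;
}

// Expand a listed feature set to everything it transitively implies.
inline std::set<std::string>
expandFeatures(const std::vector<std::string> &Listed) {
  std::set<std::string> Enabled;
  std::vector<std::string> Work(Listed);
  while (!Work.empty()) {
    std::string F = Work.back();
    Work.pop_back();
    if (!Enabled.insert(F).second)
      continue; // already expanded
    auto It = impliedBy().find(F);
    if (It != impliedBy().end())
      for (const std::string &Dep : It->second)
        Work.push_back(Dep);
  }
  return Enabled;
}
```

With this table, expanding just `{"avx2"}` already yields `avx` and `ssse3`, which is why the explicit FeatureAVX and FeatureSSSE3 entries could be dropped from the znver1 list.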

RKSimon mentioned this in D28018: AMD family 17h (znver1) enablement. Jan 8 2017, 9:23 AM

Added znver1 to the following tests:
a. LZCNT
b. Slow SHLD
c. Slow unaligned memory

Added a TODO noting that a znver1 scheduler model is due.

LGTM - @craig.topper any additional comments?

LGTM

This revision is now accepted and ready to land. Jan 9 2017, 7:24 PM

Closed by commit rL291543: AMD family 17h (znver1) enablement (authored by ctopper). Jan 9 2017, 10:12 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
llvm/trunk/lib/Support/Host.cpp                            19 lines
llvm/trunk/lib/Target/X86/X86.td                           36 lines
llvm/trunk/test/CodeGen/X86/cpus.ll                        1 line
llvm/trunk/test/CodeGen/X86/lzcnt-zext-cmp.ll              2 lines
llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll          1 line
llvm/trunk/test/CodeGen/X86/x86-64-double-shifts-var.ll    1 line

Diff 83778

llvm/trunk/lib/Support/Host.cpp

@@ 105 lines hidden @@ enum ProcessorTypes {
   INTEL_X86_64,
   INTEL_NOCONA,
   INTEL_PRESCOTT,
   AMD_i486,
   AMDPENTIUM,
   AMDATHLON,
   AMDFAM14H,
   AMDFAM16H,
+  AMDFAM17H,
   CPU_TYPE_MAX
 };

 enum ProcessorSubtypes {
   INTEL_COREI7_NEHALEM = 1,
   INTEL_COREI7_WESTMERE,
   INTEL_COREI7_SANDYBRIDGE,
   AMDFAM10H_BARCELONA,
@@ 22 lines hidden @@ enum ProcessorSubtypes {
   AMDATHLON_K8SSE3,
   AMDATHLON_OPTERON,
   AMDATHLON_FX,
   AMDATHLON_64,
   AMD_BTVER1,
   AMD_BTVER2,
   AMDFAM15H_BDVER3,
   AMDFAM15H_BDVER4,
+  AMDFAM17H_ZNVER1,
   CPU_SUBTYPE_MAX
 };

 enum ProcessorFeatures {
   FEATURE_CMOV = 0,
   FEATURE_MMX,
   FEATURE_POPCNT,
   FEATURE_SSE,
@@ 577 lines hidden @@ case 22:
     *Type = AMDFAM16H;
     if (!(Features &
           (1 << FEATURE_AVX))) { // If no AVX support provide a sane fallback.
       *Subtype = AMD_BTVER1;
       break; // "btver1";
     }
     *Subtype = AMD_BTVER2;
     break; // "btver2"
+  case 23:
+    *Type = AMDFAM17H;
+    if (Features & (1 << FEATURE_ADX)) {
+      *Subtype = AMDFAM17H_ZNVER1;
+      break; // "znver1"
+    }
+    *Subtype = AMD_BTVER1;
+    break;
   default:
     break; // "generic"
   }
 }

 static unsigned getAvailableFeatures(unsigned int ECX, unsigned int EDX,
                                      unsigned MaxLeaf) {
   unsigned Features = 0;
@@ 192 lines hidden @@ case AMDFAM16H:
       switch (Subtype) {
       case AMD_BTVER1:
         return "btver1";
       case AMD_BTVER2:
         return "btver2";
       default:
         return "amdfam16";
       }
+    case AMDFAM17H:
+      switch (Subtype) {
+      case AMD_BTVER1:
+        return "btver1";
+      case AMDFAM17H_ZNVER1:
+        return "znver1";
+      default:
+        return "amdfam17";
+      }
     default:
       return "generic";
     }
   }
   return "generic";
 }

 #elif defined(__APPLE__) && (defined(__ppc__) || defined(__powerpc__))
@@ 507 lines hidden @@
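The new `case 23:` corresponds to family 17h because the family number reported by CPUID leaf 1 is the base family plus the extended family when the base family is 0xF, and 0xF + 0x8 = 0x17 = 23. The sketch below shows that decoding in isolation. The function names are hypothetical and the EAX values in the usage note are illustrative inputs chosen for this example, not values taken from the patch.

```cpp
#include <cstdint>

// CPUID.1:EAX bits 11:8 hold the base family.
constexpr unsigned baseFamily(std::uint32_t eax) { return (eax >> 8) & 0xFu; }

// CPUID.1:EAX bits 27:20 hold the extended family.
constexpr unsigned extFamily(std::uint32_t eax) { return (eax >> 20) & 0xFFu; }

// The extended family is only added when the base family saturates at 0xF.
constexpr unsigned displayFamily(std::uint32_t eax) {
  return baseFamily(eax) == 0xFu ? baseFamily(eax) + extFamily(eax)
                                 : baseFamily(eax);
}
```

For an input such as 0x00800F11 this yields 23 (family 17h), and for 0x00700F01 it yields 22 (family 16h), matching the two `case` labels visible in the hunk.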

llvm/trunk/lib/Target/X86/X86.td

@@ 754 lines hidden @@ def : Proc<"bdver4", [
   FeatureFMA,
   FeatureXSAVEOPT,
   FeatureSlowSHLD,
   FeatureFSGSBase,
   FeatureLAHFSAHF,
   FeatureMWAITX
 ]>;

+// TODO: The scheduler model falls to BTVER2 model.
+// The znver1 model has to be put in place.
+// Zen
+def: ProcessorModel<"znver1", BtVer2Model, [
+  FeatureADX,
+  FeatureAES,
+  FeatureAVX2,
+  FeatureBMI,
+  FeatureBMI2,
+  FeatureCLFLUSHOPT,
+  FeatureCMPXCHG16B,
+  FeatureF16C,
+  FeatureFMA,
+  FeatureFSGSBase,
+  FeatureFXSR,
+  FeatureFastLZCNT,
+  FeatureLAHFSAHF,
+  FeatureLZCNT,
+  FeatureMMX,
+  FeatureMOVBE,
+  FeatureMWAITX,
+  FeaturePCLMUL,
+  FeaturePOPCNT,
+  FeaturePRFCHW,
+  FeatureRDRAND,
+  FeatureRDSEED,
+  FeatureSHA,
+  FeatureSMAP,
+  FeatureSSE4A,
+  FeatureSlowSHLD,
+  FeatureX87,
+  FeatureXSAVE,
+  FeatureXSAVEC,
+  FeatureXSAVEOPT,
+  FeatureXSAVES]>;
+
 def : Proc<"geode", [FeatureX87, FeatureSlowUAMem16, Feature3DNowA]>;

 def : Proc<"winchip-c6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;
 def : Proc<"winchip2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
 def : Proc<"c3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
 def : Proc<"c3-2", [FeatureX87, FeatureSlowUAMem16, FeatureMMX,
                     FeatureSSE1, FeatureFXSR]>;

@@ 86 lines hidden @@

llvm/trunk/test/CodeGen/X86/cpus.ll

@@ 27 lines hidden @@
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=amdfam10 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=barcelona 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=bdver1 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=bdver2 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=bdver3 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=bdver4 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=btver1 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
 ; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=btver2 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
+; RUN: llc < %s -o /dev/null -mtriple=x86_64-unknown-unknown -mcpu=znver1 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty

llvm/trunk/test/CodeGen/X86/lzcnt-zext-cmp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; Test patterns which generates lzcnt instructions.
 ; Eg: zext(or(setcc(cmp), setcc(cmp))) -> shr(or(lzcnt, lzcnt))
 ; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=btver2 | FileCheck %s
 ; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=btver2 -mattr=-fast-lzcnt | FileCheck --check-prefix=NOFASTLZCNT %s
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=znver1 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=znver1 -mattr=-fast-lzcnt | FileCheck --check-prefix=NOFASTLZCNT %s

 ; Test one 32-bit input, output is 32-bit, no transformations expected.
 define i32 @test_zext_cmp0(i32 %a) {
 ; CHECK-LABEL: test_zext_cmp0:
 ; CHECK: # BB#0: # %entry
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: testl %edi, %edi
 ; CHECK-NEXT: sete %al
@@ 328 lines hidden @@
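The pattern named in the test header, zext(or(setcc(cmp), setcc(cmp))) -> shr(or(lzcnt, lzcnt)), can be checked by hand for 32-bit inputs, assuming both compares are equality tests against zero. LZCNT returns 32 only for a zero input, and 32 is the only possible result with bit 5 set, so OR-ing two counts and shifting right by 5 gives 1 exactly when either input is zero. The sketch below uses a portable loop as a stand-in for the instruction; all function names are hypothetical.

```cpp
#include <cstdint>

// Portable stand-in for LZCNT: number of leading zero bits, 32 for x == 0.
inline unsigned lzcnt32(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Compare-based form: 1 if either input is zero, else 0.
inline unsigned eitherZeroSetcc(std::uint32_t a, std::uint32_t b) {
  return static_cast<unsigned>(a == 0) | static_cast<unsigned>(b == 0);
}

// LZCNT-based form: bit 5 of the OR is set only if some count equals 32.
inline unsigned eitherZeroLzcnt(std::uint32_t a, std::uint32_t b) {
  return (lzcnt32(a) | lzcnt32(b)) >> 5;
}
```

Both functions return the same value for every pair of inputs, which is what makes the rewrite attractive on CPUs where LZCNT is fast, such as those with FeatureFastLZCNT set.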

llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll

@@ 40 lines hidden @@
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=amdfam10 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=barcelona 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=btver1 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=btver2 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=bdver1 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=bdver2 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=bdver3 2>&1 | FileCheck %s --check-prefix=FAST
 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=bdver4 2>&1 | FileCheck %s --check-prefix=FAST
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=znver1 2>&1 | FileCheck %s --check-prefix=FAST

 ; Other chips with slow unaligned memory accesses

 ; RUN: llc < %s -mtriple=i386-unknown-unknown -mcpu=c3-2 2>&1 | FileCheck %s --check-prefix=SLOW

 ; Verify that the slow/fast unaligned memory attribute is set correctly for each CPU model.
 ; Slow chips use 4-byte stores. Fast chips with SSE or later use something other than 4-byte stores.
 ; Chips that don't have SSE use 4-byte stores either way, so they're not tested.
@@ 39 lines hidden @@

llvm/trunk/test/CodeGen/X86/x86-64-double-shifts-var.ll

@@ 11 lines hidden @@
 ; RUN: llc < %s -march=x86-64 -mcpu=athlon64-sse3 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=amdfam10 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=btver1 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=btver2 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=bdver1 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=bdver2 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=bdver3 | FileCheck %s
 ; RUN: llc < %s -march=x86-64 -mcpu=bdver4 | FileCheck %s
+; RUN: llc < %s -march=x86-64 -mcpu=znver1 | FileCheck %s

 ; Verify that for the X86_64 processors that are known to have poor latency
 ; double precision shift instructions we do not generate 'shld' or 'shrd'
 ; instructions.

 ;uint64_t lshift(uint64_t a, uint64_t b, int c)
 ;{
 ; return (a << c) | (b >> (64-c));
@@ 32 lines hidden @@
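The C function quoted in the test comments is the source-level shape of a double-precision left shift: the high bits come from `a` and the vacated low bits are filled from the top of `b`. A compilable sketch follows. The precondition on `c` is an assumption added here, because `b >> 64` would be undefined behaviour in C++ when `c` is 0.

```cpp
#include <cassert>
#include <cstdint>

// Shift a left by c and fill the low c bits from the top c bits of b.
// Assumes 0 < c < 64; other counts are rejected instead of being guessed at.
inline std::uint64_t lshift(std::uint64_t a, std::uint64_t b, int c) {
  assert(c > 0 && c < 64 && "shift count must be in 1..63");
  return (a << c) | (b >> (64 - c));
}
```

On targets with FeatureSlowSHLD, such as znver1 in this revision, the test expects this expression to be lowered to separate shifts and an OR rather than a single `shld`.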
