This is an archive of the discontinued LLVM Phabricator instance.

Differential D75567

[x86] Enable bypassing 64-bit division on generic x86-64
Closed · Public

Authored by atdt on Mar 3 2020, 1:54 PM.

Details

Reviewers

echristo
craig.topper
RKSimon

Commits

rGf0903de1aa7a: [x86] Enable bypassing 64-bit division on generic x86-64

Summary

Bypassing 64-bit division is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, because 64-bit division is so slow on these cores. AMD cores can do this in hardware (they use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target.
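
As a rough illustration of what the bypass does, the C sketch below mirrors the X64 check lines in this revision's test diff: OR the two operands, shift right by 32, and take a 32-bit unsigned divide when the result is zero. The function name `div64_bypass` is hypothetical and the code is not taken from LLVM; the real transformation is applied by the compiler to the generated code, not written at source level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of LLVM): a source-level sketch of the
 * code shape that the slow-division bypass produces for a signed
 * 64-bit divide. */
int64_t div64_bypass(int64_t a, int64_t b) {
    assert(b != 0); /* division by zero is undefined in C */

    /* If the upper 32 bits of both operands are zero, both values are
     * non-negative and fit in 32 bits, so an unsigned 32-bit divide
     * gives the same quotient (the divl path). */
    if ((((uint64_t)a | (uint64_t)b) >> 32) == 0) {
        return (int64_t)((uint32_t)a / (uint32_t)b);
    }

    /* Otherwise fall back to the full 64-bit signed divide (idivq). */
    return a / b;
}
```

The added OR, shift, and branch are the cost paid on every division; that overhead is presumably the downside the summary weighs against the gain on cores with slow 64-bit dividers.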

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

atdt created this revision. Mar 3 2020, 1:54 PM

Herald added a project: Restricted Project. Mar 3 2020, 1:54 PM

Herald added subscribers: llvm-commits, hiraditya.

RKSimon added reviewers: craig.topper, RKSimon. Mar 3 2020, 2:45 PM

Harbormaster completed remote builds in B47969: Diff 248023. Mar 3 2020, 3:02 PM

Test cases?

@craig.topper Any objections?

In D75567#1959215, @RKSimon wrote:

Test cases?

I'm not sure what kind of test case to add. The codegen tests for bypassing slow division use -mattr=[+-]idivq-to-divl to explicitly control this feature, rather than assume it is enabled or disabled for a particular target.

I don't have any objections

In D75567#1959903, @atdt wrote:

In D75567#1959215, @RKSimon wrote:

Test cases?

I'm not sure what kind of test case to add. The codegen tests for bypassing slow division use -mattr=[+-]idivq-to-divl to explicitly control this feature, rather than assume it is enabled or disabled for a particular target.

In which case we probably just need to add an additional -mcpu=x86-64 test to bypass-slow-division-tune.ll?

RKSimon mentioned this in rG2bcbf1319e9c: [X86] Add generic cpu target for the slow division tests. Apr 15 2020, 12:05 PM

Update unit test for bypassing 64-bit division on generic x86-64

LGTM

This revision is now accepted and ready to land. Apr 24 2020, 3:11 PM

Harbormaster completed remote builds in B54630: Diff 260005. Apr 24 2020, 3:44 PM

LGTM - cheers

Thanks! I can't commit this myself.

Closed by commit rGf0903de1aa7a: [x86] Enable bypassing 64-bit division on generic x86-64 (authored by RKSimon). Apr 29 2020, 9:06 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/lib/Target/X86/X86.td                            1 line
llvm/test/CodeGen/X86/bypass-slow-division-tune.ll    22 lines

Diff 260937

llvm/lib/Target/X86/X86.td

[...]
  def : ProcessorModel<"x86-64", SandyBridgeModel, [
    FeatureCMPXCHG8B,
    FeatureCMOV,
    FeatureMMX,
    FeatureSSE2,
    FeatureFXSR,
    FeatureNOPL,
    Feature64Bit,
    FeatureSlow3OpsLEA,
+   FeatureSlowDivide64,
    FeatureSlowIncDec,
    FeatureMacroFusion,
    FeatureInsertVZEROUPPER
  ]>;

  //===----------------------------------------------------------------------===//
  // Calling Conventions
  //===----------------------------------------------------------------------===//
[...]

llvm/test/CodeGen/X86/bypass-slow-division-tune.ll

[...]
  ; ATOM-NEXT: xorl %edx, %edx
  ; ATOM-NEXT: divl %esi
  ; ATOM-NEXT: # kill: def $eax killed $eax def $rax
  ; ATOM-NEXT: retq
  ;
  ; X64-LABEL: div64:
  ; X64: # %bb.0: # %entry
  ; X64-NEXT: movq %rdi, %rax
+ ; X64-NEXT: movq %rdi, %rcx
+ ; X64-NEXT: orq %rsi, %rcx
+ ; X64-NEXT: shrq $32, %rcx
+ ; X64-NEXT: je .LBB1_1
+ ; X64-NEXT: # %bb.2:
  ; X64-NEXT: cqto
  ; X64-NEXT: idivq %rsi
  ; X64-NEXT: retq
+ ; X64-NEXT: .LBB1_1:
+ ; X64-NEXT: # kill: def $eax killed $eax killed $rax
+ ; X64-NEXT: xorl %edx, %edx
+ ; X64-NEXT: divl %esi
+ ; X64-NEXT: # kill: def $eax killed $eax def $rax
+ ; X64-NEXT: retq
  ;
  ; SLM-LABEL: div64:
  ; SLM: # %bb.0: # %entry
  ; SLM-NEXT: movq %rdi, %rcx
  ; SLM-NEXT: movq %rdi, %rax
  ; SLM-NEXT: orq %rsi, %rcx
  ; SLM-NEXT: shrq $32, %rcx
  ; SLM-NEXT: je .LBB1_1
[...]
  ; ATOM-NEXT: xorl %edx, %edx
  ; ATOM-NEXT: divl %esi
  ; ATOM-NEXT: # kill: def $eax killed $eax def $rax
  ; ATOM-NEXT: retq
  ;
  ; X64-LABEL: div64_hugews:
  ; X64: # %bb.0:
  ; X64-NEXT: movq %rdi, %rax
+ ; X64-NEXT: movq %rdi, %rcx
+ ; X64-NEXT: orq %rsi, %rcx
+ ; X64-NEXT: shrq $32, %rcx
+ ; X64-NEXT: je .LBB4_1
+ ; X64-NEXT: # %bb.2:
  ; X64-NEXT: cqto
  ; X64-NEXT: idivq %rsi
  ; X64-NEXT: retq
+ ; X64-NEXT: .LBB4_1:
+ ; X64-NEXT: # kill: def $eax killed $eax killed $rax
+ ; X64-NEXT: xorl %edx, %edx
+ ; X64-NEXT: divl %esi
+ ; X64-NEXT: # kill: def $eax killed $eax def $rax
+ ; X64-NEXT: retq
  ;
  ; SLM-LABEL: div64_hugews:
  ; SLM: # %bb.0:
  ; SLM-NEXT: movq %rdi, %rcx
  ; SLM-NEXT: movq %rdi, %rax
  ; SLM-NEXT: orq %rsi, %rcx
  ; SLM-NEXT: shrq $32, %rcx
  ; SLM-NEXT: je .LBB4_1
[...]