This is an archive of the discontinued LLVM Phabricator instance.

Differential D89341

[X86] Encode global symbol address in small code model
Closed · Public

Authored by weiwang on Oct 13 2020, 1:52 PM.


Details

Reviewers

craig.topper

Commits

rGd602e79a81ad: [X86] Encode global address in small code model

Summary

In the small code model, the program and its symbols must be linked in the lower 2 GB of the address space. In that case, try encoding a global symbol's address as an immediate even when its range is unknown.
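Why the lower 2 GB matters can be checked with plain integer arithmetic. A 64-bit instruction such as cmpq $imm32, %reg sign-extends its 32-bit immediate, so an address can be encoded only if it survives that round trip, and every address in the lower 2 GB does. The sketch below is illustrative only. The helper name fitsInSExtImm32 is hypothetical and is not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true if Addr can be written as a
// 32-bit immediate that the CPU sign-extends back to the same 64-bit value,
// as happens for the imm32 operand of instructions like cmpq or addq.
constexpr bool fitsInSExtImm32(uint64_t Addr) {
  const int64_t Value = static_cast<int64_t>(Addr);
  return Value >= INT32_MIN && Value <= INT32_MAX;
}

// The last byte of the lower 2 GB fits...
static_assert(fitsInSExtImm32(0x7fffffffULL), "top of lower 2 GB fits");
// ...but 0x80000000 as an imm32 would sign-extend to 0xffffffff80000000,
// so an address at or above 2 GB cannot be encoded this way.
static_assert(!fitsInSExtImm32(0x80000000ULL), "2 GB and above does not fit");
```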

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
650 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

weiwang created this revision. Oct 13 2020, 1:52 PM

Herald added a project: Restricted Project. Oct 13 2020, 1:52 PM

Herald added subscribers: llvm-commits, wenlei, hiraditya.

weiwang requested review of this revision. Oct 13 2020, 1:52 PM

weiwang edited the summary of this revision. Oct 13 2020, 1:57 PM

Herald added a subscriber: pengfei. Oct 13 2020, 1:57 PM

weiwang added a reviewer: hoyFB. Oct 13 2020, 1:57 PM

zhuhan0 added a subscriber: zhuhan0. Oct 13 2020, 2:22 PM

Harbormaster completed remote builds in B74973: Diff 297951. Oct 13 2020, 2:23 PM

weiwang edited reviewers, added: craig.topper; removed: hoyFB. Oct 13 2020, 2:27 PM

Is this something we can detect in X86DAGToDAGISel::isSExtAbsoluteSymbolRef? We already have a pattern for X86sub_flag that calls that. Same for a bunch of other instructions.

Would be good to share benchmark/performance numbers you have for this change too. Thanks.

Thanks for the comment, Craig.

isSExtAbsoluteSymbolRef does seem to check for the width of immediate. If the immediate can be encoded directly, the node should be replaced with one of the SUB64ri* nodes. I think the X86Wrapper node can be replaced with a corresponding imm node if conditions are met, then the matching can proceed.

In D89341#2328984, @wenlei wrote:

Would be good to share benchmark/performance numbers you have for this change too. Thanks.

Sure. This change addresses a codegen difference between Clang and GCC that we saw in internal workloads: GCC seems to prefer encoding the address as an immediate operand of cmp in the small code model. With this change, we saw a 1% performance improvement on average across multiple workloads.

In D89341#2328996, @weiwang wrote:

Thanks for the comment, Craig.

isSExtAbsoluteSymbolRef does seem to check for the width of immediate. If the immediate can be encoded directly, the node should be replaced with one of the SUB64ri* nodes. I think the X86Wrapper node can be replaced with a corresponding imm node if conditions are met, then the matching can proceed.

I was thinking that if we don't have range information from getAbsoluteSymbolRange, but the Width passed to isSExtAbsoluteSymbolRef is 32 and the code model is small we could return true?

In D89341#2331338, @craig.topper wrote:

I was thinking that if we don't have range information from getAbsoluteSymbolRange, but the Width passed to isSExtAbsoluteSymbolRef is 32 and the code model is small we could return true?

Yes, I think that is possible. A similar check is also done in X86DAGToDAGISel::selectMOV64Imm32.
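The proposal can be modeled outside of LLVM in a few lines. This is a sketch under stated assumptions, not LLVM code: CodeModel, SignedRange, and canEncodeSymbolAsImm are hypothetical stand-ins for TM.getCodeModel(), the ConstantRange returned by getAbsoluteSymbolRange(), and isSExtAbsoluteSymbolRef. When range information is present, the bounds test mirrors the existing comparison; only when it is missing does the code model decide.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for LLVM types, for illustration only.
enum class CodeModel { Small, Medium, Large };

// Assumed: inclusive signed bounds, like ConstantRange's signed min/max.
struct SignedRange {
  int64_t Min;
  int64_t Max;
};

// Models the decision proposed for isSExtAbsoluteSymbolRef.
// Range is the symbol's absolute range if known, otherwise std::nullopt.
// Assumes Width < 63 so the shift below does not overflow int64_t.
bool canEncodeSymbolAsImm(unsigned Width, std::optional<SignedRange> Range,
                          CodeModel CM) {
  if (!Range)
    // No range information: rely on the small code model's placement rule.
    return Width == 32 && CM == CodeModel::Small;
  // Same comparison as sge(-1ull << Width) / slt(1ull << Width).
  const int64_t Bound = int64_t{1} << Width;
  return Range->Min >= -Bound && Range->Max < Bound;
}
```

Note that a known range takes precedence: the code model is consulted only as a fallback, which keeps the existing behavior for symbols that carry absolute-symbol range information.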

Sorry for the long delay; I got held up by another task.

Update:

- Modified the change as suggested.
- Fixed several test cases affected by the change.
- Added a new test case.

Harbormaster completed remote builds in B76244: Diff 300397. Oct 23 2020, 2:25 PM

LGTM

This revision is now accepted and ready to land. Oct 24 2020, 10:33 PM

weiwang retitled this revision from [X86] Encode global symbol address in sub if possible to [X86] Encode global symbol address in small code model. Oct 26 2020, 9:50 AM

weiwang edited the summary of this revision.

This revision was landed with ongoing or failed builds. Oct 26 2020, 11:14 PM

Closed by commit rGd602e79a81ad: [X86] Encode global address in small code model (authored by weiwang).

This revision was automatically updated to reflect the committed changes.

weiwang added a commit: rGd602e79a81ad: [X86] Encode global address in small code model.

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp	7 lines
llvm/test/CodeGen/X86/br-fold.ll	4 lines
llvm/test/CodeGen/X86/critical-edge-split-2.ll	12 lines
llvm/test/CodeGen/X86/pr33290.ll	15 lines
llvm/test/CodeGen/X86/relocimm-small-model.ll	25 lines

Diff 300397

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

	@@ ... @@ bool X86DAGToDAGISel::isSExtAbsoluteSymbolRef(unsigned Width, SDNode *N) const {
	   if (N->getOpcode() != X86ISD::Wrapper)
	     return false;

	   auto *GA = dyn_cast<GlobalAddressSDNode>(N->getOperand(0));
	   if (!GA)
	     return false;

	   Optional<ConstantRange> CR = GA->getGlobal()->getAbsoluteSymbolRange();
	-  return CR && CR->getSignedMin().sge(-1ull << Width) &&
	+  if (!CR)
	+    return Width == 32 && TM.getCodeModel() == CodeModel::Small;
	+
	+  return CR->getSignedMin().sge(-1ull << Width) &&
	          CR->getSignedMax().slt(1ull << Width);
	 }

	 static X86::CondCode getCondFromNode(SDNode *N) {
	   assert(N->isMachineOpcode() && "Unexpected node");
	   X86::CondCode CC = X86::COND_INVALID;
	   unsigned Opc = N->getMachineOpcode();
	   if (Opc == X86::JCC_1)
	@@ ... @@ bool X86DAGToDAGISel::foldLoadStoreIntoMemOperand(SDNode *Node) {
	     LLVM_FALLTHROUGH;
	   case X86ISD::ADD:
	     // Try to match inc/dec.
	     if (!Subtarget->slowIncDec() || CurDAG->shouldOptForSize()) {
	       bool IsOne = isOneConstant(StoredVal.getOperand(1));
	       bool IsNegOne = isAllOnesConstant(StoredVal.getOperand(1));
	       // ADD/SUB with 1/-1 and carry flag isn't used can use inc/dec.
	       if ((IsOne || IsNegOne) && hasNoCarryFlagUses(StoredVal.getValue(1))) {
	         unsigned NewOpc =
	           ((Opc == X86ISD::ADD) == IsOne)
	               ? SelectOpcode(X86::INC64m, X86::INC32m, X86::INC16m, X86::INC8m)
	               : SelectOpcode(X86::DEC64m, X86::DEC32m, X86::DEC16m, X86::DEC8m);
	         const SDValue Ops[] = {Base, Scale, Index, Disp, Segment, InputChain};
	         Result = CurDAG->getMachineNode(NewOpc, SDLoc(Node), MVT::i32,
	                                         MVT::Other, Ops);
	         break;
	       }

Lint (pre-merge checks, clang-format) on the NewOpc assignment above: please reformat the code:

	-        unsigned NewOpc =
	-          ((Opc == X86ISD::ADD) == IsOne)
	-              ? SelectOpcode(X86::INC64m, X86::INC32m, X86::INC16m, X86::INC8m)
	-              : SelectOpcode(X86::DEC64m, X86::DEC32m, X86::DEC16m, X86::DEC8m);
	+        unsigned NewOpc = ((Opc == X86ISD::ADD) == IsOne)
	+                              ? SelectOpcode(X86::INC64m, X86::INC32m,
	+                                             X86::INC16m, X86::INC8m)
	+                              : SelectOpcode(X86::DEC64m, X86::DEC32m,
	+                                             X86::DEC16m, X86::DEC8m);

llvm/test/CodeGen/X86/br-fold.ll

	 ; RUN: llc -mtriple=x86_64-apple-darwin < %s | FileCheck -check-prefix=X64_DARWIN %s
	 ; RUN: llc -mtriple=x86_64-pc-linux < %s | FileCheck -check-prefix=X64_LINUX %s
	 ; RUN: llc -mtriple=x86_64-pc-windows < %s | FileCheck -check-prefix=X64_WINDOWS %s
	 ; RUN: llc -mtriple=x86_64-pc-windows-gnu < %s | FileCheck -check-prefix=X64_WINDOWS_GNU %s
	 ; RUN: llc -mtriple=x86_64-scei-ps4 < %s | FileCheck -check-prefix=PS4 %s

	 ; X64_DARWIN: orq
	 ; X64_DARWIN-NEXT: ud2

	-; X64_LINUX: orq %rax, %rcx
	+; X64_LINUX: orq $_ZN11xercesc_2_56XMLUni16fgNotationStringE, %rax
	 ; X64_LINUX-NEXT: jne
	 ; X64_LINUX-NEXT: %bb8.i329

	 ; X64_WINDOWS: orq %rax, %rcx
	 ; X64_WINDOWS-NEXT: jne

	 ; X64_WINDOWS_GNU: movq .refptr._ZN11xercesc_2_513SchemaSymbols21fgURI_SCHEMAFORSCHEMAE(%rip), %rax
	 ; X64_WINDOWS_GNU: orq .refptr._ZN11xercesc_2_56XMLUni16fgNotationStringE(%rip), %rax
	 ; X64_WINDOWS_GNU-NEXT: jne

	-; PS4: orq %rax, %rcx
	+; PS4: orq $_ZN11xercesc_2_56XMLUni16fgNotationStringE, %rax
	 ; PS4-NEXT: ud2

	 @_ZN11xercesc_2_513SchemaSymbols21fgURI_SCHEMAFORSCHEMAE = external constant [33 x i16], align 32 ; <[33 x i16]*> [#uses=1]
	 @_ZN11xercesc_2_56XMLUni16fgNotationStringE = external constant [9 x i16], align 16 ; <[9 x i16]*> [#uses=1]

	 define fastcc void @foo() {
	 entry:
	   br i1 icmp eq (i64 or (i64 ptrtoint ([33 x i16]* @_ZN11xercesc_2_513SchemaSymbols21fgURI_SCHEMAFORSCHEMAE to i64),
	 ... (9 lines not shown)

llvm/test/CodeGen/X86/critical-edge-split-2.ll

	 ... (9 lines not shown)
	 ; PR8642
	 define i16 @test1(i1 zeroext %C, i8** nocapture %argv) nounwind ssp {
	 ; CHECK-LABEL: test1:
	 ; CHECK: # %bb.0: # %entry
	 ; CHECK-NEXT: movw $1, %ax
	 ; CHECK-NEXT: testl %edi, %edi
	 ; CHECK-NEXT: jne .LBB0_2
	 ; CHECK-NEXT: # %bb.1: # %cond.false.i
	-; CHECK-NEXT: movl $g_4, %eax
	-; CHECK-NEXT: movl $g_2+4, %ecx
	-; CHECK-NEXT: xorl %esi, %esi
	-; CHECK-NEXT: cmpq %rax, %rcx
	-; CHECK-NEXT: sete %sil
	+; CHECK-NEXT: movl $g_2+4, %eax
	+; CHECK-NEXT: xorl %ecx, %ecx
	+; CHECK-NEXT: cmpq $g_4, %rax
	+; CHECK-NEXT: sete %cl
	 ; CHECK-NEXT: movl $1, %eax
	 ; CHECK-NEXT: xorl %edx, %edx
	-; CHECK-NEXT: divl %esi
	+; CHECK-NEXT: divl %ecx
	 ; CHECK-NEXT: movl %edx, %eax
	 ; CHECK-NEXT: .LBB0_2: # %cond.end.i
	 ; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
	 ; CHECK-NEXT: retq

	 entry:
	   br i1 %C, label %cond.end.i, label %cond.false.i

	 cond.false.i:
	   br label %cond.end.i

	 cond.end.i:
	   %call1 = phi i16 [ trunc (i32 srem (i32 1, i32 zext (i1 icmp eq (%1* bitcast (i8* getelementptr inbounds (%0, %0* @g_2, i64 0, i32 1, i32 0) to %1), %1 @g_4) to i32)) to i16), %cond.false.i ], [ 1, %entry ]
	   ret i16 %call1
	 }

llvm/test/CodeGen/X86/pr33290.ll

	 ... (16 lines not shown)
	 ; X86-NEXT: movb $0, c
	 ; X86-NEXT: leal a+2(%ecx), %ecx
	 ; X86-NEXT: movl %ecx, (%eax)
	 ; X86-NEXT: jmp .LBB0_1
	 ;
	 ; X64-LABEL: e:
	 ; X64: # %bb.0: # %entry
	 ; X64-NEXT: movq {{.*}}(%rip), %rax
	-; X64-NEXT: movl $a, %esi
	 ; X64-NEXT: .p2align 4, 0x90
	 ; X64-NEXT: .LBB0_1: # %for.cond
	 ; X64-NEXT: # =>This Inner Loop Header: Depth=1
	-; X64-NEXT: movzbl {{.*}}(%rip), %edx
	-; X64-NEXT: addq %rsi, %rdx
	-; X64-NEXT: setb %cl
	-; X64-NEXT: addq $2, %rdx
	-; X64-NEXT: adcb $0, %cl
	-; X64-NEXT: movb %cl, {{.*}}(%rip)
	-; X64-NEXT: movl %edx, (%rax)
	+; X64-NEXT: movzbl {{.*}}(%rip), %ecx
	+; X64-NEXT: addq $a, %rcx
	+; X64-NEXT: setb %dl
	+; X64-NEXT: addq $2, %rcx
	+; X64-NEXT: adcb $0, %dl
	+; X64-NEXT: movb %dl, {{.*}}(%rip)
	+; X64-NEXT: movl %ecx, (%rax)
	 ; X64-NEXT: jmp .LBB0_1
	 entry:
	   %0 = load i32, i32* @b, align 8
	   br label %for.cond

	 for.cond:
	   %1 = load i8, i8* @c, align 1
	   %conv = zext i8 %1 to i128
	   %add = add nuw nsw i128 %conv, add (i128 ptrtoint (i32* @a to i128), i128 2)
	   %2 = lshr i128 %add, 64
	   %conv1 = trunc i128 %2 to i8
	   store i8 %conv1, i8* @c, align 1
	   %conv2 = trunc i128 %add to i32
	   store i32 %conv2, i32* %0, align 4
	   br label %for.cond
	 }

llvm/test/CodeGen/X86/relocimm-small-model.ll

This file was added.

	; RUN: llc < %s | FileCheck %s --check-prefix=CHECK-SMALL
	; RUN: llc --code-model=medium < %s | FileCheck %s --check-prefix=CHECK-MEDIUM

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"

	@a = external dso_local global i32, align 4

	declare void @f()

	define void @foo(i64 %b) {
	; CHECK-MEDIUM: cmpq %rax, %rdi
	; CHECK-SMALL: cmpq $a, %rdi
	entry:
	  %cmp = icmp eq i64 %b, ptrtoint (i32* @a to i64)
	  br i1 %cmp, label %if.then, label %if.end

	if.then: ; preds = %entry
	  tail call void @f()
	  br label %if.end

	if.end: ; preds = %if.then, %entry
	  ret void
	}
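For readers who think in C++ rather than IR, the new test roughly corresponds to the source below. This is a hypothetical reconstruction, not taken from the review. In the test, a and f are external declarations; here they are defined, together with an extra CallCount counter, only so the file links and its behavior can be observed. Assuming a non-PIC x86-64 Linux build in the default small code model, the comparison in foo is the kind of code that CHECK-SMALL expects to lower to cmpq $a, %rdi.

```cpp
#include <cassert>
#include <cstdint>

// Defined here for illustration; relocimm-small-model.ll only declares them.
int a;
int CallCount = 0;  // hypothetical: records calls so the behavior is observable
void f() { ++CallCount; }

// Mirrors @foo: call f() only when b equals the address of a.
void foo(int64_t b) {
  if (b == reinterpret_cast<int64_t>(&a))
    f();
}
```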