This is an archive of the discontinued LLVM Phabricator instance.

Differential D97208

[X86] Always use rip-relative addressing on 64-bit when rematerializing all zeros/ones registers using a folded load.
ClosedPublic

Authored by craig.topper on Feb 22 2021, 11:01 AM.

Details

Reviewers

RKSimon
MaskRay
rnk
spatel
pengfei

Commits

rG54bacaf31127: [X86] Always use rip-relative addressing on 64-bit when rematerializing all…

Summary

Previously we only used RIP-relative addressing when PIC was enabled.
But we know we're in the small or kernel code model here, so we should
always be able to use RIP-relative addressing, which gives a smaller
encoding.

None of our lit tests are affected by this for some reason. Maybe the
update_test_checks.py is too aggressive with regular expressions?
I'll work on a proper test before committing, but I wanted to make
sure I wasn't missing anything that would prevent us from making this
change.

Here's a godbolt link that demonstrates the current codegen https://godbolt.org/z/j3158o
Note in the non-PIC version the load from .LCPI0_0 doesn't use
RIP-relative addressing, but if you change the constant in the
source from 0.0 to 1.0 it will become RIP-relative.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Feb 22 2021, 11:01 AM

Herald added a subscriber: hiraditya. Feb 22 2021, 11:01 AM

craig.topper requested review of this revision. Feb 22 2021, 11:01 AM

Herald added a project: Restricted Project. Feb 22 2021, 11:01 AM

Harbormaster completed remote builds in B90248: Diff 325495. Feb 22 2021, 12:00 PM

LG.

% myllvm-mc -show-encoding a.s
        .text
.LCPI0_0:
        .quad   0
        ucomisd .LCPI0_0, %xmm0                 # encoding: [0x66,0x0f,0x2e,0x04,0x25,A,A,A,A]
                                        #   fixup A - offset: 5, value: .LCPI0_0, kind: reloc_signed_4byte
        ucomisd .LCPI0_0(%rip), %xmm0           # encoding: [0x66,0x0f,0x2e,0x05,A,A,A,A]
                                        #   fixup A - offset: 4, value: .LCPI0_0-4, kind: reloc_riprel_4byte
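
The size difference can be read straight off the two encodings above. As a small sketch (the helper name `encoding_length` is made up for illustration; the encoding strings are copied from the llvm-mc output), counting the bytes shows the RIP-relative form is one byte shorter, because the absolute form needs an extra SIB byte (0x25) after the ModRM byte:

```python
def encoding_length(encoding: str) -> int:
    """Count the bytes in an llvm-mc style encoding list such as "[0x66,A,A]"."""
    # Each comma-separated entry is one byte; "A" entries are fixup placeholders.
    return len(encoding.strip("[]").split(","))


absolute = "[0x66,0x0f,0x2e,0x04,0x25,A,A,A,A]"  # ucomisd .LCPI0_0, %xmm0
rip_relative = "[0x66,0x0f,0x2e,0x05,A,A,A,A]"   # ucomisd .LCPI0_0(%rip), %xmm0

print(encoding_length(absolute), encoding_length(rip_relative))  # prints "9 8"
```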

None of our lit tests are affected by this for some reason. Maybe the update_llc_test_checks.py is too aggressive with regular expressions?

Confirmed.

; CHECK-NEXT: ucomisd {{\.LCPI.*}}, %xmm0

.* hides the (%rip) difference.
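
As a rough sketch of why, Python's `re` module can stand in for FileCheck's regex matching (an assumption: FileCheck uses its own regex engine, but patterns this simple should behave the same way). The helper `check_matches` is hypothetical. The greedy pattern accepts both addressing forms, while a pattern that spells out the label, such as `\.LCPI[0-9]+_[0-9]+`, accepts only the form it was written for:

```python
import re


def check_matches(pattern: str, line: str) -> bool:
    """Rough model of a CHECK line: the pattern must match somewhere in the line."""
    return re.search(pattern, line) is not None


absolute = "ucomisd .LCPI0_0, %xmm0"
rip_relative = "ucomisd .LCPI0_0(%rip), %xmm0"

# Model of: ucomisd {{\.LCPI.*}}, %xmm0
greedy = r"ucomisd \.LCPI.*, %xmm0"
# Model of: ucomisd {{\.LCPI[0-9]+_[0-9]+}}, %xmm0
strict = r"ucomisd \.LCPI[0-9]+_[0-9]+, %xmm0"

print(check_matches(greedy, absolute), check_matches(greedy, rip_relative))
print(check_matches(strict, absolute), check_matches(strict, rip_relative))
```

With the greedy pattern, `.*` swallows the `(%rip)` suffix, so a test written against the absolute form keeps passing after the codegen change.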

FYI I raised https://bugs.llvm.org/show_bug.cgi?id=45091 about better exposing pointer offsets on x86 as we've missed some regressions in the past from this but nothing has happened on it

It's not true that .* hides the (%rip) difference. I looked at the code in llvm/utils/UpdateTestChecks/asm.py; the script can distinguish the RIP-relative form from the plain address, e.g.

cd llvm/utils/
python
>>> from UpdateTestChecks import asm
>>> class T():
...   x86_scrub_rip = True
...
>>> asm.scrub_asm_x86('ucomisd .LCPI0_0(%rip), %xmm0', T)
'ucomisd {{.*}}(%rip), %xmm0'
>>> asm.scrub_asm_x86('ucomisd .LCPI0_0, %xmm0', T)
'ucomisd {{\\.LCPI.*}}, %xmm0'

So update_llc_test_checks.py isn't so aggressive that it eliminates the difference.
I also checked the test llvm/test/CodeGen/X86/vec_reassociate.ll, in which we can see the differences between X86 and X64.
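
The scrubbing behavior in that session can be approximated with two substitutions. This is only a sketch: the two regular expressions and the name `scrub_line` are assumptions chosen to reproduce the outputs shown above, and the real patterns in asm.py may differ.

```python
import re

# Assumed patterns, modeled on the outputs shown in the session above;
# the real regular expressions in asm.py may differ.
RIP_OPERAND_RE = re.compile(r"[.\w]+\(%rip\)")
CONSTANT_POOL_RE = re.compile(r"\.LCPI[0-9]+_[0-9]+")


def scrub_line(line: str, scrub_rip: bool = True) -> str:
    """Replace symbol names with FileCheck patterns, keeping any (%rip) suffix."""
    if scrub_rip:
        # Hide the symbol but keep the literal "(%rip)" so the addressing
        # mode stays visible in the generated check line.
        line = RIP_OPERAND_RE.sub(lambda m: "{{.*}}(%rip)", line)
    # Any constant-pool label still present is replaced by a greedy pattern.
    line = CONSTANT_POOL_RE.sub(lambda m: r"{{\.LCPI.*}}", line)
    return line


print(scrub_line("ucomisd .LCPI0_0(%rip), %xmm0"))
print(scrub_line("ucomisd .LCPI0_0, %xmm0"))
```

The two generated lines differ, but the second one ends in a greedy `.*`, so it would still match RIP-relative output when used as a check.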

In D97208#2581319, @RKSimon wrote:

FYI I raised https://bugs.llvm.org/show_bug.cgi?id=45091 about better exposing pointer offsets on x86 as we've missed some regressions in the past from this but nothing has happened on it

It seems the script also handles these cases by

# Detect stack spills and reloads and hide their exact offset and whether
# they used the stack pointer or frame pointer.
asm = SCRUB_X86_SPILL_RELOAD_RE.sub(r'{{[-0-9]+}}(%\1{{[sb]}}p)\2', asm)
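
A minimal sketch of what that substitution does to a single line follows. The regular expression is an assumed definition (the real SCRUB_X86_SPILL_RELOAD_RE in asm.py may differ), `scrub_spill_reload` is a hypothetical wrapper, and the offset -8 is made up for illustration; only the replacement string is taken from the snippet above.

```python
import re

# Assumed definition; the real SCRUB_X86_SPILL_RELOAD_RE in asm.py may differ.
SPILL_RELOAD_RE = re.compile(
    r"-?\d+\(%([er])[sb]p\)(.*(?:Spill|Reload))$", flags=re.M
)


def scrub_spill_reload(asm: str) -> str:
    """Hide the stack offset and the sp/bp choice on spill and reload lines."""
    return SPILL_RELOAD_RE.sub(r"{{[-0-9]+}}(%\1{{[sb]}}p)\2", asm)


# The offset -8 is made up for illustration.
print(scrub_spill_reload("pmuludq -8(%rsp), %mm0 # 8-byte Folded Reload"))
# An ordinary stack access without a Spill/Reload comment is left alone.
print(scrub_spill_reload("movq -8(%rsp), %rax"))
```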

Oh, I misunderstood @MaskRay's comments. It's true that {{\.LCPI.*}} hides the difference, but we can regenerate the tests with the script.

Maybe we can add an extra option like

# Only scrub the offset when the option to keep it is not set.
if not getattr(args, 'x86_keep_offset', False):
  asm = SCRUB_X86_SPILL_RELOAD_RE.sub(r'{{[-0-9]+}}(%\1{{[sb]}}p)\2', asm)

update_llc_test_checks.py already has several '--x86_scrub_*' command-line switches that cover some of this.
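
For illustration, paired on/off switches of this kind are commonly declared with argparse as below. This is a sketch, not the script's actual code: the flag names follow the `--no_x86_scrub_rip` spelling that appears in the UTC_ARGS note of the updated tests, while `build_parser` and the default value are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a paired on/off switch (defaults are assumptions)."""
    parser = argparse.ArgumentParser()
    # Both switches write to the same destination, so the last one given wins.
    parser.add_argument("--x86_scrub_rip", action="store_true", default=False)
    parser.add_argument(
        "--no_x86_scrub_rip", action="store_false", dest="x86_scrub_rip"
    )
    return parser


parser = build_parser()
print(parser.parse_args(["--x86_scrub_rip"]).x86_scrub_rip)
print(parser.parse_args(["--no_x86_scrub_rip"]).x86_scrub_rip)
```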

What do we want to do with this patch?

It appears to be stuck on our test check patterns hiding too much address math - is that something we should move away from by default? I proposed something similar on https://bugs.llvm.org/show_bug.cgi?id=45091 - would it help if we got that dealt with first?

craig.topper mentioned this in D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests. Mar 27 2021, 4:47 PM

Rebased on top of D99460, where I weakened the regular expressions, so we get test failures now.

craig.topper added a parent revision: D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests. Mar 27 2021, 4:58 PM

Harbormaster completed remote builds in B95988: Diff 333702. Mar 27 2021, 5:44 PM

craig.topper mentioned this in rG0248e2407166: [X86][update_llc_test_checks] Use a less greedy regular expression for…. Mar 28 2021, 11:40 AM

LGTM

This revision is now accepted and ready to land. Mar 29 2021, 2:33 AM

This revision was landed with ongoing or failed builds. Mar 29 2021, 10:06 AM

Closed by commit rG54bacaf31127: [X86] Always use rip-relative addressing on 64-bit when rematerializing all… (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG54bacaf31127: [X86] Always use rip-relative addressing on 64-bit when rematerializing all….

Revision Contents

Path                                     Size
llvm/lib/Target/X86/X86InstrInfo.cpp     19 lines
llvm/test/CodeGen/X86/avx-cmp.ll         4 lines
llvm/test/CodeGen/X86/mmx-fold-zero.ll   4 lines

Diff 333920

llvm/lib/Target/X86/X86InstrInfo.cpp

@@ case X86::AVX512_FsFLD0F128: {
     // Medium and large mode can't fold loads this way.
     if (MF.getTarget().getCodeModel() != CodeModel::Small &&
         MF.getTarget().getCodeModel() != CodeModel::Kernel)
       return nullptr;

     // x86-32 PIC requires a PIC base register for constant pools.
     unsigned PICBase = 0;
-    if (MF.getTarget().isPositionIndependent()) {
-      if (Subtarget.is64Bit())
-        PICBase = X86::RIP;
-      else
-        // FIXME: PICBase = getGlobalBaseReg(&MF);
-        // This doesn't work for several reasons.
-        // 1. GlobalBaseReg may have been spilled.
-        // 2. It may not be live at MI.
-        return nullptr;
+    // Since we're using Small or Kernel code model, we can always use
+    // RIP-relative addressing for a smaller encoding.
+    if (Subtarget.is64Bit()) {
+      PICBase = X86::RIP;
+    } else if (MF.getTarget().isPositionIndependent()) {
+      // FIXME: PICBase = getGlobalBaseReg(&MF);
+      // This doesn't work for several reasons.
+      // 1. GlobalBaseReg may have been spilled.
+      // 2. It may not be live at MI.
+      return nullptr;
     }

     // Create a constant-pool entry.
     MachineConstantPool &MCP = *MF.getConstantPool();
     Type *Ty;
     unsigned Opc = LoadMI.getOpcode();
     if (Opc == X86::FsFLD0SS || Opc == X86::AVX512_FsFLD0SS)
       Ty = Type::getFloatTy(MF.getFunction().getContext());
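
The effect of the change on the choice of base register can be summarized in a small model. This is a sketch in Python rather than the C++ from the patch: the function name `pic_base`, its parameters, and the string return values are all made up for illustration, and it assumes the code model check at the top of the hunk has already passed.

```python
from typing import Optional


def pic_base(is_64bit: bool, is_pic: bool, always_rip_on_64bit: bool) -> Optional[str]:
    """Return "RIP", "" (no base register), or None when the load cannot be folded."""
    if always_rip_on_64bit:
        # New behavior: 64-bit always uses RIP; 32-bit PIC still bails out.
        if is_64bit:
            return "RIP"
        if is_pic:
            return None
        return ""
    # Old behavior: RIP is used only when PIC is enabled.
    if is_pic:
        return "RIP" if is_64bit else None
    return ""


# 64-bit, non-PIC is the only case whose result changes.
print(repr(pic_base(True, False, always_rip_on_64bit=False)))
print(repr(pic_base(True, False, always_rip_on_64bit=True)))
```

In this model the 32-bit results and the 64-bit PIC result are identical before and after the patch.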

llvm/test/CodeGen/X86/avx-cmp.ll

-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_rip
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s

 define <8 x i32> @cmp00(<8 x float> %a, <8 x float> %b) nounwind {
 ; CHECK-LABEL: cmp00:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
 ; CHECK-NEXT: retq
   %bincmp = fcmp olt <8 x float> %a, %b
@@
 ; CHECK-NEXT: # %bb.3: # %for.cond5
 ; CHECK-NEXT: # in Loop: Header=BB2_2 Depth=1
 ; CHECK-NEXT: testb %bpl, %bpl
 ; CHECK-NEXT: jne .LBB2_2
 ; CHECK-NEXT: # %bb.4: # %for.body33.preheader
 ; CHECK-NEXT: # in Loop: Header=BB2_2 Depth=1
 ; CHECK-NEXT: vmovsd (%rsp), %xmm0 # 8-byte Reload
 ; CHECK-NEXT: # xmm0 = mem[0],zero
-; CHECK-NEXT: vucomisd {{\.LCPI[0-9]+_[0-9]+}}, %xmm0
+; CHECK-NEXT: vucomisd {{\.LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT: jne .LBB2_5
 ; CHECK-NEXT: jnp .LBB2_2
 ; CHECK-NEXT: .LBB2_5: # %if.then
 ; CHECK-NEXT: # in Loop: Header=BB2_2 Depth=1
 ; CHECK-NEXT: callq scale@PLT
 ; CHECK-NEXT: jmp .LBB2_2
 ; CHECK-NEXT: .LBB2_6: # %for.end52
 ; CHECK-NEXT: addq $8, %rsp

llvm/test/CodeGen/X86/mmx-fold-zero.ll

-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --no_x86_scrub_rip
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+mmx,+sse2 | FileCheck %s --check-prefix=X86
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+mmx,+sse2 | FileCheck %s --check-prefix=X64

 define double @mmx_zero(double, double, double, double) nounwind {
 ; X86-LABEL: mmx_zero:
 ; X86: # %bb.0:
 ; X86-NEXT: pushl %ebp
 ; X86-NEXT: movl %esp, %ebp
@@
 ; X64-NEXT: movq %mm1, %mm7
 ; X64-NEXT: pmuludq %mm5, %mm7
 ; X64-NEXT: paddw %mm4, %mm7
 ; X64-NEXT: paddw %mm7, %mm5
 ; X64-NEXT: paddw %mm5, %mm2
 ; X64-NEXT: paddw %mm2, %mm0
 ; X64-NEXT: paddw %mm6, %mm0
 ; X64-NEXT: pmuludq %mm3, %mm0
-; X64-NEXT: paddw {{\.LCPI[0-9]+_[0-9]+}}, %mm0
+; X64-NEXT: paddw {{\.LCPI[0-9]+_[0-9]+}}(%rip), %mm0
 ; X64-NEXT: paddw %mm1, %mm0
 ; X64-NEXT: pmuludq %mm7, %mm0
 ; X64-NEXT: pmuludq {{[-0-9]+}}(%r{{[sb]}}p), %mm0 # 8-byte Folded Reload
 ; X64-NEXT: paddw %mm5, %mm0
 ; X64-NEXT: paddw %mm2, %mm0
 ; X64-NEXT: movq2dq %mm0, %xmm0
 ; X64-NEXT: retq
   %5 = bitcast double %0 to x86_mmx