Differential D51502
[X86] Fix register resizings for inline assembly register operands.
Authored by niravd on Aug 30 2018, 12:14 PM.
Details

In the case of a 64-bit value being assigned to a named 32-bit register in 32-bit mode, match GCC's translation of the assignment to a register pair. Fixes PR38730.
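For orientation, here is a minimal sketch (not part of the patch; the program is illustrative only) of how a 64-bit value maps onto a 32-bit register pair. The concrete pairing, low half in %edx and high half in %ecx when the variable is bound to %edx, is the GCC behavior the patch matches, as the review discussion below works out from the generated code.

  #include <stdio.h>

  /* Minimal sketch, not from the patch: shows how a 64-bit value is
     split across a pair of 32-bit registers in 32-bit mode.  When the
     variable is bound to asm("edx"), GCC places the low half in %edx
     and the high half in the paired register %ecx; this patch makes
     clang choose the same pair. */
  int main(void)
  {
      unsigned long long x = 0x0011223344556677ULL;
      unsigned int lo = (unsigned int)x;          /* low half  -> %edx: 0x44556677 */
      unsigned int hi = (unsigned int)(x >> 32);  /* high half -> %ecx: 0x00112233 */
      printf("low  = %#010x\n", lo);
      printf("high = %#010x\n", hi);
      return 0;
  }

The two printed halves match the immediates visible in the disassembly of bar in the comments below.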
Event Timeline
Comment Actions

I tested and can verify that this patch fixes the crash in the compiler. I think we chose a different second register in this case than GCC does; maybe that's ok?

Comment Actions

If someone is doing this, and intentionally ignoring gcc's warning, I think we have to assume they're depending on gcc's register choices. So we should either reject or match gcc.

Comment Actions

Testing this patch locally, I think the register choice is indeed important. Consider the following test case:

  long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
  }

  long long bar () {
    return 0x0011223344556677;
  }

GCC produces the following disassembly:

  00000000 <foo>:
     0:  e8 fc ff ff ff        call   1 <foo+0x1>
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret
     a:  8d b6 00 00 00 00     lea    0x0(%esi),%esi

  00000010 <bar>:
    10:  b8 77 66 55 44        mov    $0x44556677,%eax
    15:  ba 33 22 11 00        mov    $0x112233,%edx
    1a:  c3                    ret

Clang + this patch produces the following disassembly (with my annotations; read bar first, then foo):

  00000000 <foo>:
     0:  56                    push   %esi
     1:  e8 0a 00 00 00        call   10 <bar>
         // are we clobbering one of the two return registers here?
     6:  89 d0                 mov    %edx,%eax
     8:  89 f2                 mov    %esi,%edx
     a:  5e                    pop    %esi
     b:  c3                    ret
     c:  0f 1f 40 00           nopl   0x0(%eax)

  00000010 <bar>:
         // ok, so 64b return values are passed in %eax, then %edx for -m32.
    10:  b8 77 66 55 44        mov    $0x44556677,%eax
    15:  ba 33 22 11 00        mov    $0x112233,%edx
    1a:  c3                    ret

Comment Actions

Match GCC's register assignment behavior. This causes some minor test case reordering from the introduced register classes. Interestingly, unfold-masked-merge-vector-variablemask.ll has slightly fewer spills. I'm working on a follow-up patch to add warnings for these cases, but this should be okay to go in now.

Comment Actions

The behavior of this version of the patch is not quite right just yet, in terms of matching GCC's curious behavior. Consider my test case from the comment above again:

  // cc -O3 -m32 -c -o edx.o
  long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
  }

  long long bar () {
    return 0x0011223344556677;
  }

GCC produces the following disassembly:

  ; objdump -Dr edx.o
  00000000 <foo>:
     0:  e8 fc ff ff ff        call   1 <foo+0x1>
                               1: R_386_PC32  bar
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret

while Clang plus this version of the patch produces:

  00000000 <foo>:
     0:  53                    push   %ebx
     1:  56                    push   %esi
     2:  e8 09 00 00 00        call   10 <bar>
     7:  89 d8                 mov    %ebx,%eax
     9:  89 f2                 mov    %esi,%edx
     b:  5e                    pop    %esi
     c:  5b                    pop    %ebx
     d:  c3                    ret

It seems that for 32b x86, when returning a 64b value, the lower 32b are returned in %eax and the upper 32b in %edx, which both compilers get correct here. What differs is the source registers of the output argument to the inline asm block. To see why this is critical, note that the crashing test case from bug report 36378 comes from the Linux kernel's __get_user_8 function, which is defined in arch/x86/lib/getuser.S; the comment in that file describes the calling convention it uses.
So in order to support this unspecified calling convention, it's critical that we read from %ecx for the upper 32b and %edx for the lower 32b following the call. In GCC's disassembly you can see this in the source (left) operands of the movs after the call.

Comment Actions

Huh. It looks like I committed a partial change: the uploaded patch has only some of my change that removed the breaks from the switch. The EAX, EDX, and ECX cases should also have been returns (or at least kept their breaks). Corrected patch here.

Comment Actions

There we go: clang now even produces one less relocation than gcc, but otherwise the same instructions for my test case:

  00000000 <foo>:
     0:  e8 0b 00 00 00        call   10 <bar>
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret

The disassembly of individual translation units from the Linux kernel also looks correct, though clang has other issues compiling a working 32b x86 Linux kernel, so I can't test this completely. But based on the smaller cases, this code looks correct. Thanks Nirav!
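To tie the pieces above together, here is a hedged sketch of the kind of caller that depends on this register-pair choice; load_u64_pair is a hypothetical helper standing in for a __get_user_8-style routine, and the clobber list is illustrative rather than taken from the kernel.

  /* Hedged sketch, not the kernel's actual code: a caller of a
     __get_user_8-style routine on 32-bit x86 (-m32).  load_u64_pair is
     a hypothetical assembly helper that returns the low 32 bits in
     %edx and the high 32 bits in %ecx (and may clobber %eax and the
     flags), rather than using the usual %edx:%eax return convention. */
  unsigned long long get_u64(void)
  {
      /* Binding the 64-bit output to %edx only works if the compiler
         widens the constraint to the %edx:%ecx pair, as GCC does and
         as this patch makes clang do. */
      register unsigned long long val asm("edx");
      asm("call load_u64_pair" : "=r"(val) : : "eax", "memory", "cc");
      /* The compiler then lowers "return val" as
             mov %edx,%eax    (low half)
             mov %ecx,%edx    (high half)
         reading the high half from %ecx, exactly as the convention
         requires (compare the GCC disassembly above). */
      return val;
  }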
Comment Actions

Isn't this fix about inline assembly? Why do we see all the scheduling/regalloc changes here?

Comment Actions

Yes, this is a fix to match GCC's register assignment for a 64-bit value in a named 32-bit register. I haven't looked too deeply into the details, but the register allocation order changes with the register classes this patch introduces, which is where the scheduling/regalloc differences in the tests come from.