This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix register resizings for inline assembly register operands.
Closed, Public

Authored by niravd on Aug 30 2018, 12:14 PM.

Details

Summary

When a 64-bit value is assigned to a named 32-bit register in 32-bit mode, match GCC's translation of the assignment to a register pair.

Fixes PR38730.
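
For reference, the construct at issue looks like this (a minimal sketch, essentially the reviewers' test case discussed below):

long long foo(void) {
    register long long x asm("edx"); /* 64-bit value, named 32-bit register */
    asm("call bar" : "=r"(x));       /* GCC warns, then uses the pair:      */
    return x;                        /* %edx = low half, %ecx = high half   */
}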

Diff Detail

Event Timeline

niravd created this revision. Aug 30 2018, 12:14 PM
efriedma added inline comments.
llvm/test/CodeGen/X86/pr38730.ll
22

Shouldn't this be an error? I can't see how it's possible to put a 64-bit value into edx.

srhines added inline comments.
llvm/test/CodeGen/X86/pr38730.ll
22

GCC issues a warning for this, but then uses two registers (one being the specified edx).

nickdesaulniers added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
41122

trailing whitespace

I tested and can verify that this patch fixes the crash in the compiler. I think we chose a different second register in this case than GCC does; maybe that's ok?

If someone is doing this, and intentionally ignoring gcc's warning, I think we have to assume they're depending on gcc's register choices. So we should either reject or match gcc.

nickdesaulniers added a comment. Edited Aug 30 2018, 3:28 PM

Testing this patch locally, I think the register choice is indeed important. Consider the following test case:

long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
}

long long bar () {
    return 0x0011223344556677;
}

GCC produces the following disassembly:

00000000 <foo>:
   0:	e8 fc ff ff ff       	call   1 <foo+0x1>
   5:	89 d0                	mov    %edx,%eax
   7:	89 ca                	mov    %ecx,%edx
   9:	c3                   	ret    
   a:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi

00000010 <bar>:
  10:	b8 77 66 55 44       	mov    $0x44556677,%eax
  15:	ba 33 22 11 00       	mov    $0x112233,%edx
  1a:	c3                   	ret

Clang+this patch produces the following disassembly (with my annotations, read bar first, then foo):

00000000 <foo>:
   0:	56                   	push   %esi
   1:	e8 0a 00 00 00       	call   10 <bar>
  // are we clobbering one of the two return registers here?
   6:	89 d0                	mov    %edx,%eax
   8:	89 f2                	mov    %esi,%edx
   a:	5e                   	pop    %esi
   b:	c3                   	ret    
   c:	0f 1f 40 00          	nopl   0x0(%eax)

00000010 <bar>:
  // ok so 64b return values are returned in %eax (low half) and %edx (high half) for -m32.
  10:	b8 77 66 55 44       	mov    $0x44556677,%eax
  15:	ba 33 22 11 00       	mov    $0x112233,%edx
  1a:	c3                   	ret
niravd updated this revision to Diff 164670. Sep 10 2018, 7:46 AM
niravd edited the summary of this revision.

Match GCC's register assignment behavior. This causes some minor test case reordering due to the introduced register classes. Interestingly, unfold-masked-merge-vector-variablemask.ll has slightly fewer spills.

I'm working on a follow-up patch to add warnings for these cases, but this should be okay to go in now.

nickdesaulniers added a comment. Edited Sep 13 2018, 9:57 AM

This version of the patch is not quite right yet in terms of matching GCC's curious behavior.

Consider my test case from the above comment again:

// cc -O3 -m32 -c -o edx.o
long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
}

long long bar () {
    return 0x0011223344556677;
}

gcc produces the following disassembly:

; objdump -Dr edx.o
00000000 <foo>:
   0:   e8 fc ff ff ff          call   1 <foo+0x1>
                        1: R_386_PC32   bar
   5:   89 d0                   mov    %edx,%eax
   7:   89 ca                   mov    %ecx,%edx
   9:   c3                      ret

while Clang plus this version of this patch produces:

00000000 <foo>:
   0:   53                      push   %ebx
   1:   56                      push   %esi
   2:   e8 09 00 00 00          call   10 <bar>
   7:   89 d8                   mov    %ebx,%eax
   9:   89 f2                   mov    %esi,%edx
   b:   5e                      pop    %esi
   c:   5b                      pop    %ebx
   d:   c3                      ret

It seems that for 32b x86, when returning 64b values, the lower 32b are returned in %eax and the upper 32b in %edx, which both compilers get right here. What differs is the source registers of the output operand of the inline asm block.

To see why this is critical, the crashing test case for the bug report in 36378 is from the Linux kernel's __get_user_8 function, which is defined in arch/x86/lib/getuser.S. Specifically, the comment:

Outputs: ...
%[r|e]dx contains zero-extended value
%ecx contains the high half for 32-bit __get_user_8

So in order to support this unspecified calling convention, it's critical that we read from %ecx for the upper 32b and from %edx for the lower 32b following the call. In GCC's disassembly, you can see this in the source operands of the movs (the left operand in AT&T syntax) after the call.
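
For illustration, a minimal sketch of the kernel-style pattern this has to support (the constraints and clobbers here are illustrative, not the kernel's exact code):

/* The callee follows the getuser.S contract quoted above: low 32b in
 * %edx, high 32b in %ecx. Binding the 64-bit output to "edx" must
 * therefore select the edx:ecx pair, matching GCC. */
static inline long long get_user_8(const void *ptr) {
    register long long val asm("edx");  /* must bind the edx:ecx pair */
    asm volatile("call __get_user_8"
                 : "=r"(val)
                 : "a"(ptr)             /* address passed in %eax */
                 : "memory", "cc");
    return val;
}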

niravd updated this revision to Diff 165332. Sep 13 2018, 10:49 AM

Huh. It looks like I committed a partial patch. The uploaded diff has only some of my change removing the breaks from the switch: the EAX, EDX, and ECX cases should also have been returns (or at least have had breaks). Corrected patch here.

nickdesaulniers accepted this revision. Sep 13 2018, 12:43 PM

There we go: clang now even produces one fewer relocation than gcc, but otherwise the same instructions for my test case:

00000000 <foo>:
   0:   e8 0b 00 00 00          call   10 <bar>
   5:   89 d0                   mov    %edx,%eax
   7:   89 ca                   mov    %ecx,%edx
   9:   c3                      ret

The disassembly of individual translation units from the Linux kernel also looks correct, though clang has other issues compiling a working 32b x86 Linux kernel, so I can't test this end to end. But based on the smaller cases, this code looks correct.

Thanks Nirav!

llvm/test/CodeGen/X86/physreg-pairs.ll
4 (On Diff #165332)

(Suggested typo fixes: "GCC's", "values to".)

6 (On Diff #165332)

I find this sentence slightly hard to follow. Would you mind rewording it to something along the lines of:

To match GCC's behavior in assigning 64-bit values to a 32-bit
register, we bind the pair (the given register, the following
register) from the sequence
EAX, EDX, ECX, EBX, ESI, EDI, EBP, ESP,
to the value. There is no wrapping from the sequence, so this
will fail given ESP.
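
To make the suggested rule concrete, a hypothetical C sketch of the pairing (the actual implementation is in X86ISelLowering.cpp and is structured differently):

#include <stddef.h>
#include <string.h>

/* GCC's pairing order for a 64-bit value named into a 32-bit register. */
static const char *const Seq[] = {"EAX", "EDX", "ECX", "EBX",
                                  "ESI", "EDI", "EBP", "ESP"};

/* Return the register holding the high half when a 64-bit value is
 * bound to Reg, or NULL when there is no following register to pair
 * with (the sequence does not wrap, so ESP fails). */
static const char *highHalfFor(const char *Reg) {
    for (size_t I = 0; I + 1 < sizeof(Seq) / sizeof(Seq[0]); ++I)
        if (strcmp(Seq[I], Reg) == 0)
            return Seq[I + 1];
    return NULL;
}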

This revision is now accepted and ready to land. Sep 13 2018, 12:43 PM
xbolva00 added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
41225

if (Size == 64

llvm/lib/Target/X86/X86ISelLowering.cpp
41225

space between // and Model?

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB. Sep 17 2018, 11:41 AM

Isn't this fix about inline assembly? Why do we see all the scheduling/regalloc changes here?

Yes, this is a fix to match GCC's register assignment for a 64-bit value in 32-bit mode to a pair of registers.

I haven't looked too deeply into the details, but register allocation order is sensitive to the register classes (I think by way of register pressure), which is why the additional register classes are able to change our selection bias between 32-bit and 8-bit registers.