Differential D51502
[X86] Fix register resizings for inline assembly register operands.
Authored by niravd on Aug 30 2018, 12:14 PM.
Details

In the case of a 64-bit value being assigned to a named 32-bit register in 32-bit mode, match GCC's translation of the assignment to a register pair. Fixes PR38730.
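For orientation, here is a minimal sketch (not part of the patch; the program is illustrative only) of how a 64-bit value maps onto a 32-bit register pair. The concrete pairing, low half in %edx and high half in %ecx when the variable is bound to %edx, is the GCC behavior the patch matches, as the review discussion below works out from the generated code.

  #include <stdio.h>

  /* Minimal sketch, not from the patch: shows how a 64-bit value is
     split across a pair of 32-bit registers in 32-bit mode.  When the
     variable is bound to asm("edx"), GCC places the low half in %edx
     and the high half in the paired register %ecx; this patch makes
     clang choose the same pair. */
  int main(void)
  {
      unsigned long long x = 0x0011223344556677ULL;
      unsigned int lo = (unsigned int)x;          /* low half  -> %edx: 0x44556677 */
      unsigned int hi = (unsigned int)(x >> 32);  /* high half -> %ecx: 0x00112233 */
      printf("low  = %#010x\n", lo);
      printf("high = %#010x\n", hi);
      return 0;
  }

The two printed halves match the immediates visible in the disassembly of bar in the comments below.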
Event Timeline
Comment Actions

I tested and can verify that this patch fixes the crash in the compiler. I think we chose a different second register in this case than GCC does; maybe that's ok?

Comment Actions

If someone is doing this, and intentionally ignoring gcc's warning, I think we have to assume they're depending on gcc's register choices. So we should either reject or match gcc.

Comment Actions

Testing this patch locally, I think the register choice is indeed important. Consider the following test case:

  long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
  }

  long long bar () {
    return 0x0011223344556677;
  }

GCC produces the following disassembly:

  00000000 <foo>:
     0:  e8 fc ff ff ff        call   1 <foo+0x1>
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret
     a:  8d b6 00 00 00 00     lea    0x0(%esi),%esi

  00000010 <bar>:
    10:  b8 77 66 55 44        mov    $0x44556677,%eax
    15:  ba 33 22 11 00        mov    $0x112233,%edx
    1a:  c3                    ret

Clang + this patch produces the following disassembly (with my annotations; read bar first, then foo):

  00000000 <foo>:
     0:  56                    push   %esi
     1:  e8 0a 00 00 00        call   10 <bar>
         // are we clobbering one of the two return registers here?
     6:  89 d0                 mov    %edx,%eax
     8:  89 f2                 mov    %esi,%edx
     a:  5e                    pop    %esi
     b:  c3                    ret
     c:  0f 1f 40 00           nopl   0x0(%eax)

  00000010 <bar>:
         // ok, so 64b return values are passed in %eax, then %edx for -m32.
    10:  b8 77 66 55 44        mov    $0x44556677,%eax
    15:  ba 33 22 11 00        mov    $0x112233,%edx
    1a:  c3                    ret

Comment Actions

Match GCC's register assignment behavior. This causes some minor test case reordering from the introduced register classes. Interestingly, unfold-masked-merge-vector-variablemask.ll has slightly fewer spills. I'm working on a follow-up patch to add warnings for these cases, but this should be okay to go in now.

Comment Actions

The behavior of this version of the patch is not quite right just yet, in terms of matching GCC's curious behavior. Consider my test case from the comment above again:

  // cc -O3 -m32 -c -o edx.o
  long long foo () {
    register long long x asm("edx");
    asm("call bar": "=r"(x));
    return x;
  }

  long long bar () {
    return 0x0011223344556677;
  }

GCC produces the following disassembly:

  ; objdump -Dr edx.o
  00000000 <foo>:
     0:  e8 fc ff ff ff        call   1 <foo+0x1>
                               1: R_386_PC32  bar
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret

while Clang plus this version of the patch produces:

  00000000 <foo>:
     0:  53                    push   %ebx
     1:  56                    push   %esi
     2:  e8 09 00 00 00        call   10 <bar>
     7:  89 d8                 mov    %ebx,%eax
     9:  89 f2                 mov    %esi,%edx
     b:  5e                    pop    %esi
     c:  5b                    pop    %ebx
     d:  c3                    ret

It seems that for 32b x86, when returning a 64b value, the lower 32b are returned in %eax and the upper 32b in %edx, which both compilers get correct here. What differs is the source registers of the output argument to the inline asm block. To see why this is critical, note that the crashing test case from bug report 36378 comes from the Linux kernel's __get_user_8 function, which is defined in arch/x86/lib/getuser.S; the comment in that file describes the calling convention it uses.
So in order to support this unspecified calling convention, it's critical that we read from %ecx for the upper 32b and %edx for the lower 32b following the call. In GCC's disassembly you can see this in the source (left) operands of the movs after the call.

Comment Actions

Huh. It looks like I committed a partial change: the uploaded patch has only some of my change that removed the breaks from the switch. The EAX, EDX, and ECX cases should also have been returns (or at least kept their breaks). Corrected patch here.

Comment Actions

There we go: clang now even produces one less relocation than gcc, but otherwise the same instructions for my test case:

  00000000 <foo>:
     0:  e8 0b 00 00 00        call   10 <bar>
     5:  89 d0                 mov    %edx,%eax
     7:  89 ca                 mov    %ecx,%edx
     9:  c3                    ret

The disassembly of individual translation units from the Linux kernel also looks correct, though clang has other issues compiling a working 32b x86 Linux kernel, so I can't test this completely. But based on the smaller cases, this code looks correct. Thanks Nirav!
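To tie the pieces above together, here is a hedged sketch of the kind of caller that depends on this register-pair choice; load_u64_pair is a hypothetical helper standing in for a __get_user_8-style routine, and the clobber list is illustrative rather than taken from the kernel.

  /* Hedged sketch, not the kernel's actual code: a caller of a
     __get_user_8-style routine on 32-bit x86 (-m32).  load_u64_pair is
     a hypothetical assembly helper that returns the low 32 bits in
     %edx and the high 32 bits in %ecx (and may clobber %eax and the
     flags), rather than using the usual %edx:%eax return convention. */
  unsigned long long get_u64(void)
  {
      /* Binding the 64-bit output to %edx only works if the compiler
         widens the constraint to the %edx:%ecx pair, as GCC does and
         as this patch makes clang do. */
      register unsigned long long val asm("edx");
      asm("call load_u64_pair" : "=r"(val) : : "eax", "memory", "cc");
      /* The compiler then lowers "return val" as
             mov %edx,%eax    (low half)
             mov %ecx,%edx    (high half)
         reading the high half from %ecx, exactly as the convention
         requires (compare the GCC disassembly above). */
      return val;
  }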
Comment Actions

Isn't this fix about inline assembly? Why do we see all the scheduling/regalloc changes here?

Comment Actions

Yes, this is a fix to match GCC's register assignment for a 64-bit value in a named 32-bit register. I haven't looked too deeply into the details, but the register allocation order changes with the register classes this patch introduces, which is where the scheduling/regalloc differences in the tests come from.