This fixes https://bugs.llvm.org/show_bug.cgi?id=39525
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
While that seems like a place to change, I would have thought that the code just a little farther down would have caught this....
const GlobalValue *GV = GA->getGlobal(); // If we require an extra load to get this address, as in PIC mode, we // can't accept it. if (isGlobalStubReference(Subtarget.classifyGlobalReference(GV))) return;
but I was remembering my older out-of-tree backend where I have made changes. An older classifyGlobalReference was for both functions and data. Today, we have both classifyGlobalReference and classifyGlobalFunctionReference. Today's classifyGlobalReference then ends up in TargetMachine.cpp/shouldAssumeDSOLocal which then sends you off into classifyLocalReference. With no GOTPCREL, isGlobalStubReference() returns false.
I wonder if it needs to check GV to see if it is function with a dyn_cast_or_null<function> and call classifyGlobalFunctionReference instead?
Thanks for having look at this. I had also looked at that part, but the problem here has to do with local linkage functions which do not use GOTOFF relocations so classifyLocalReference() just returns MO_NO_FLAG. That's the reason why the if (isGlobalStubReference(Subtarget.classifyGlobalReference(GV))) does not catch this. However, these local linkage functions still need to be accessed in a RIP-relative way in the PIC mode and thus should not be used as immediate values.
I think the logic in the code is that (1) return the function for any PIC mode addresses, and if not (2) assuming non-PIC mode, see if we still cannot use this address as an immediate value, . So I think it makes sense to fix this at (1) before going down to (2).
I will appreciate any comments.
Yes I see. Thanks for double checking.
I still wonder if the call to classifyGlobalReference shouldn't be a call to classifyGlobalFunctionReference if the GV is a function, but that seems like a separate question (which we don't need to answer at the moment).
I don't think this is the correct fix, and it would need a test.
This is the code in question:
#include <stdio.h> int offset = 0; static void foo2() { printf("foo2 is called\n"); } void foo() { __asm__ volatile("movq %0,%%gs:(%1)" : : "ir"((void *)&foo2), "r"(offset)); printf("foo is called\n"); }
Between the two alternative constraints, i and r, LLVM chooses i, which, if you don't look inside the assembly, seems reasonable. Then you end up generating, essentially, "MOV 64-bit immediate to GS offset 0".
I think the correct fix is probably somewhere in the inline assembly specific codepath, not this highly general one. We absolutely want to fold displacements into RIP relative address modes.
@rnk Thanks for looking into this.
I'm not sure I understand:
The change I made is inside the function void X86TargetLowering::LowerAsmOperandForConstraint(, and under case 'i': { which specifically handles operands of the inline assembly constraint i. Inside this routine DOES have the PIC checking routine in order not to emit immediates in the case of PIC modes. There are already checks for other PIC styles but a check for RIPRel is missing. So I still believe this is the right place to fix this problem.
I will be happy to have any comment and will soon add the test code.
I see, that sounds reasonable then. Sorry, the diff is uploaded without -U9999 context, so I assumed it was in one of a more general routine.
Thanks for looking into this @rnk. I am adding a unit test for this patch. Let me know if the test looks okay.
It looks like this was actually intentionally removed back in rL107727 to fix https://llvm.org/pr4752. This change breaks the test that was added for it, llvm/test/CodeGen/X86/2010-07-06-asm-RIP.ll. I guess, since this is GCC inline asm, the question is, how does GCC handle the 'i' constraint? I suspect that this new behavior may be closer to GCC, but it'll require more analysis.
Here is the original test case from the PR:
#include <stdio.h> int offset = 0; static void foo2() { printf("foo2 is called\n"); } void foo() { __asm__ volatile("movq %0,%%gs:(%1)" : : "ir"((void *)&foo2), "r"(offset)); printf("foo is called\n"); }
If you compile with GCC, you will see that it generates the same assembly that LLVM does:
foo: .LFB15: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl offset(%rip), %eax #APP # 21 "t.c" 1 movq $foo2,%gs:(%eax) # 0 "" 2 #NO_APP leaq .LC1(%rip), %rdi call puts@PLT addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc
So, I think LLVM is already correct here. You can confirm on compiler explorer: https://gcc.godbolt.org/z/rOHCb6
So with 'i' it should actually work as a PC-relative mode. Gotcha, that totally makes sense. Thanks for confirming this @rnk.