This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Support emitting GOT relocations for function calls
ClosedPublic

Authored by scott.linder on Jan 29 2019, 2:14 PM.

Details

Summary

Currently we make this decision based on addrspace, with the rationale being that Private (and so Generic/Flat) can't meaningfully support relocations. Until recently functions could not have a non-default (i.e. non-Generic/Flat) addrspace at all, meaning we never emitted GOT relocations for functions.

The approach taken in this patch is to explicitly support GOT for functions, regardless of addrspace. We might instead consider whether addrspace is the correct way to make the decision in the first place. In the long term we may also wish to change the default addrspace of our functions.

Diff Detail

Event Timeline

scott.linder created this revision.Jan 29 2019, 2:14 PM
arsenm added inline comments.Jan 29 2019, 2:18 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3488–3492

Looking for a specific load opcode looks concerning. Where is this coming from? I would think we need a pseudo at this point

scott.linder added inline comments.Jan 29 2019, 3:06 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3488–3492

SITargetLowering::LowerGlobalAddress adds the load after creating the SI_PC_ADD_REL_OFFSET pseudo. Would it be better to add a new pseudo for just the load, or create a pseudo to include the existing PC relative calculation and the load, i.e. SI_LOAD_PC_ADD_REL_OFFSET? I don't think the load needs to be bundled to enforce an ordering or anything.

arsenm added inline comments.Jan 29 2019, 3:16 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3488–3492

I'm worried about selecting a different opcode, and the possibility of some transform somewhere obscuring the address.

I think I looked at this before, and other targets use different call opcodes based on the address type. I think I did it this way because unlike other targets, we don't embed the call address in the actual call instruction. I think using the second pseudo would be closer to this, though I still don't like relying on getVRegDef here.

scott.linder marked an inline comment as done.Jan 30 2019, 1:47 PM
scott.linder added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
3488–3492

I started to implement another pseudo, but as you mention getVRegDef doesn't seem right here. Is there a reason why the Global isn't carried through all of ISel, rather than being "recovered" later?

I've tried to implement carrying a copy of the function global through the DAG from the beginning, rather than recovering it after legalizing the GlobalAddress. I don't know if this is a reasonable approach, but I figured I would post something and see what you think.

arsenm added inline comments.Jan 31 2019, 7:51 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3496

What happened to TCRETURN? I'm pretty sure we need this

test/CodeGen/AMDGPU/byval-frame-setup.ll
17 ↗(On Diff #184603)

Can you commit the test changes adding hidden first?

arsenm added inline comments.Jan 31 2019, 7:54 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3998–4001

Can you add a comment here

scott.linder marked 3 inline comments as done.

Address feedback

lib/Target/AMDGPU/SIISelLowering.cpp
3496

After removing the need to insert the original global here there was nothing to do for TCRETURN in the custom inserter, so I just deleted the _ISEL variant and used tablegen to legalize ISD::TC_RETURN directly to SI_TCRETURN.

arsenm added inline comments.Feb 1 2019, 11:49 AM
test/CodeGen/AMDGPU/function-call-relocs.ll
49

Can you add a case with a tail call that needs the GOT just to be sure?

Add test for tail call through GOT

arsenm accepted this revision.Feb 1 2019, 2:01 PM

LGTM

This revision is now accepted and ready to land.Feb 1 2019, 2:01 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2019, 11:59 AM