Page MenuHomePhabricator

[llvm][IR] Add dso_local_equivalent Constant
ClosedPublic

Authored by leonardchan on Apr 1 2020, 3:35 PM.

Details

Summary

The dso_local_equivalent constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the @plt modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from dso_local where use of a function or variable is resolved to a symbol in the same linkage unit.

In this patch:

  • Addition of dso_local_equivalent and handling it
  • Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of dso_local_equivalent for relative references

This is useful for the Relative VTables C++ ABI which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, dso_local_equivalent can be used as a generic wrapper for the function to still allow for this static offset calculation to be done.

See RFC for more details.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
pcc added inline comments.Oct 1 2020, 11:22 AM
llvm/docs/LangRef.rst
3814

I would make this specific to functions. I don't think this can be made to work with globals at least with existing ELF linkers because I don't think you can get the copy relocation behavior without the symbol's value being overridden by the linkage unit with the copy relocation, and it doesn't make sense to have more than one linkage unit override the address of the same global.

rjmccall added inline comments.Oct 1 2020, 3:37 PM
llvm/docs/LangRef.rst
3814

Oh, does ELF's copy relocation necessarily steal the symbol so that it resolves into the copying linkage unit? I thought it might be separable so that you could just get a local copy of the data without changing symbol resolution. That would be a much more generically useful feature.

pcc added inline comments.Oct 1 2020, 3:46 PM
llvm/docs/LangRef.rst
3814

I think the behaviors may be separable at the binary level (i.e. you could hand-craft an executable that only copies but doesn't override), but at the linker level the copying and overriding happen at the same time when the linker sees a non-GOT-generating relocation pointing to the symbol.

leonardchan added inline comments.Oct 1 2020, 7:08 PM
llvm/docs/LangRef.rst
3814

I'm fine with applying this to just functions for now since that was what I was initially planning to support. If possible, support for variables could be rolled out in the future. It looks like, off the bat, we could just use a GOT entry if we know the symbol is dso_local, but I think we can already achieve this by having a hidden global that points to your symbol.

Took me some time to research copy relocations, so correct me if I'm wrong, but should it be R_*_GLOB_DATA that overrides the symbol address during linking, whereas the copy relocation (R_*_COPY) just does the copying as @rjmccall suggests? I think it's R_*_GLOB_DATA that the GOT uses for getting addresses(?).

pcc added inline comments.Oct 1 2020, 7:22 PM
llvm/docs/LangRef.rst
3814

The R_*_COPY is normally generated by the linker and only present in the executable or shared object. It isn't present in an object file. The symbol address override is implemented by a symbol table entry, not by a relocation.

I'm not sure what would happen if you had an R_*_COPY in an object file without a definition of the symbol. You would need to make sure that it's passed through to the binary correctly and that dynamic loaders have the expected behavior (because normally dynamic loaders expect the symbol that's being copied to also be defined locally). The size of the symbol would also need to be known at compile time (as opposed to at link time in the normal case where the linker is inserting the relocation) in order to allocate enough space for the copy.

MaskRay added a subscriber: MaskRay.Oct 1 2020, 7:55 PM
MaskRay added inline comments.
llvm/docs/LangRef.rst
3814

I agree with pcc that the variable case should be dropped.

R_*_COPY could be used for shared objects in some ld.so (e.g. musl, which just skips the first definition; but not in glibc, which special cases that a COPY skips an executable definition). Generally it does not make sense for a shared object.

R_*_COPY on an executable copying its own symbol does not work -> the behavior is like RTLD_NEXT.

rjmccall added inline comments.Oct 1 2020, 10:55 PM
llvm/docs/LangRef.rst
3814

Is that avoidable? What happens if you ask for a copy relocation in an object file, or is that not expressible?

3814

I'm fine with just supporting functions given the absence of a target that can support this for variables.

3814

I thought I deleted this comment, but apparently not; please ignore it.

Update to only support functions since we don't have a fully flushed out way of supporting global variables.

leonardchan edited the summary of this revision. (Show Details)Oct 15 2020, 12:03 PM

ping, any more comments for this?

phosek accepted this revision.Oct 23 2020, 10:45 AM

LGTM but someone else should take a look as well.

llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
37 ↗(On Diff #295936)

Propagate

This revision is now accepted and ready to land.Oct 23 2020, 10:45 AM
leonardchan marked an inline comment as done.

*ping* for other comments. I'll probably commit this at the end of the week unless others have more to add.

MaskRay requested changes to this revision.Oct 27 2020, 9:19 PM

llc -filetype=obj doesn't. Please fix it. It is a bit unfortunate but the ELFObjectWriter generally has more restrictions than the assembler writer.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2344
llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
24 ↗(On Diff #300376)
81 ↗(On Diff #300376)
83 ↗(On Diff #300376)

Shouldn't protected visibility use GV as well?

Is isDSOLocal implied by other properties?

llvm/test/CodeGen/X86/dso_local_equivalent.ll
39

_hidden_func{{$}}

44

ditto

49

ditto

54

ditto

72

ifunc stub may seem strange. Can you add a comment?

This revision now requires changes to proceed.Oct 27 2020, 9:19 PM
leonardchan marked 8 inline comments as done.

llc -filetype=obj doesn't. Please fix it. It is a bit unfortunate but the ELFObjectWriter generally has more restrictions than the assembler writer.

Ah, didn't see this. Yeah I was using this incorrectly in some of my tests. Fixed.

llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
83 ↗(On Diff #300376)

Yeah, it seems that there's actually a function that covers this already: GlobalValue::isImplicitDSOLocal.

llvm/test/CodeGen/X86/dso_local_equivalent.ll
72

Added. This is here mostly to ensure correctness since the ifunc may not be dso_local so we'd still need a stub to ensure that.

@MaskRay any more comments?

Updated some renamed variables that I didn't check prior.

Apologies for my late reply. A couple of comments inlined

See RFC for more details.

Neither the RFC nor the patch mentions the immediate use case. It is possible that the merit is buries in a deep message but for a few feature the description itself should justify its value (dblaikie mentioned that:

We should do this because I need it" (we shouldn't be implementing features for especially niche use cases/if they don't generalize) isn't always a compelling motivation but "we should do this because someone might need it" isn't either (we shouldn't be implementing features that have no users).

Please add the use cases.

llvm/include/llvm-c/Core.h
2171 ↗(On Diff #304667)

This is not defined. If you don't add a C implementation, please drop the declaration here and above

llvm/include/llvm/IR/Constants.h
892

The comment says "function" then "variable". Is "variable" a typo?

llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
20 ↗(On Diff #304667)
51 ↗(On Diff #304667)

ELF has comdat but does not need this pass.
Mach-O needs this pass but does not have comdat.

I assume that this code is untested.

llvm/test/CodeGen/X86/dso_local_equivalent.ll
5

This comment appears to be unrelated to the main purpose of the test. A file-level comment is needed to explain why darwin is picked (it does not support the lowering)

leonardchan marked 4 inline comments as done.

Apologies for my late reply. A couple of comments inlined

See RFC for more details.

Neither the RFC nor the patch mentions the immediate use case. It is possible that the merit is buries in a deep message but for a few feature the description itself should justify its value (dblaikie mentioned that:

We should do this because I need it" (we shouldn't be implementing features for especially niche use cases/if they don't generalize) isn't always a compelling motivation but "we should do this because someone might need it" isn't either (we shouldn't be implementing features that have no users).

Please add the use cases.

No worries. Thanks for the reviews! I added the use case in the LangRef. Essentially the immediate use case is for using static relocations in vtables under the new relative vtables ABI. This constant gives us a generic way to essentially "request" a PLT relocation from Clang without needing to explicitly say in the frontend "these targets support PLT entries".

llvm/include/llvm/IR/Constants.h
892

Yup, it should still be function. Fixed.

llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
51 ↗(On Diff #304667)

Added some cases for COFF where this pass is needed and comdats are supported.

leonardchan edited the summary of this revision. (Show Details)Nov 12 2020, 11:50 AM
leonardchan edited the summary of this revision. (Show Details)Nov 12 2020, 12:00 PM
MaskRay added inline comments.Nov 12 2020, 6:58 PM
llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
20 ↗(On Diff #304667)

Please use llvm::foo to define functions/variables in .cpp files

9 ↗(On Diff #304920)

Transform uses of dso_local_equivalent to a function stub for targets which do not support lowering dso_local_equivalent to MCExpr.

30 ↗(On Diff #304920)

Nit: Strings in assert usually do not need to end with a period

36 ↗(On Diff #304920)

I think the stub should be InternalLinkage (if you want to retain a symbol) or PrivateLinkage (if you don't want a symbol). You would not need hidden visibility below.

If you use a hidden function, and two translation units have dso_local declarations of the same function, the produced stubs will cause a duplicate definition error at link time.

llvm/test/CodeGen/X86/dso_local_equivalent.ll
11

-o - -> -o /dev/null

129

Use one single NO-PLT-NOT: .stub

145

Use one single NO-PLT-COMDAT-NOT: .stub

leonardchan marked 6 inline comments as done.
leonardchan added inline comments.
llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
36 ↗(On Diff #304920)

Hmm. I'm guessing the link error would only occur though if comdats weren't supported? In that case I think perhaps the stubs should be internal/private to avoid multiple definitions.

MaskRay added inline comments.Nov 14 2020, 10:34 AM
llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
36 ↗(On Diff #304920)

For stubs for declarations (not defined), there are some pitfalls. On ELF, if in a comdat, a hidden symbol works better because it can be merged with symbols in other translations; if not in a comdat, private/internal linkage has to be used to avoid linker issues. Such a comdat concept probably does not work on COFF. Mach-O does not have comdat.

The pass serves for binary formats which I think are not your main concern, am I right? :) If you cannot find people willing to review the Mach-O/COFF part, I recommend that you drop this pass from this patch because it is not needed by ELF (and your use case will condition the support on ELF anyway)

leonardchan added inline comments.Nov 16 2020, 10:21 AM
llvm/lib/CodeGen/DSOLocalEquivalentLowering.cpp
36 ↗(On Diff #304920)

Ah, ok I'll drop this pass then. The immediate use case for us is only ELF formats. I wasn't sure to what extent this should be supported for other binary formats, but I suppose those could be added down the line.

MaskRay accepted this revision.Nov 16 2020, 9:23 PM
MaskRay added inline comments.
llvm/include/llvm-c/Core.h
266 ↗(On Diff #305563)

Adding a member in the middle breaks prebuilt binaries linking against llvm-c. How about just dropping the change?

This revision is now accepted and ready to land.Nov 16 2020, 9:23 PM
leonardchan marked an inline comment as done.
leonardchan added inline comments.
llvm/include/llvm-c/Core.h
266 ↗(On Diff #305563)

Done. The only concern is that there is no generated value for this in the Value::ValueTy enum which would've led to some errors, so I added it manually with a comment explaining why it's there.

Welp, it turns out that if I went with the approach of adding the constant manually everywhere the macro was used, I'd be adding a lot of special cases everywhere. The only case I needed to cover was in Core which depends on Value.def:

/usr/local/google/home/leonardchan/llvm-monorepo/llvm-project-1/llvm/lib/IR/Core.cpp:832:12: note: expanded from macro 'HANDLE_VALUE'
    return LLVM##Name##ValueKind;
           ^
<scratch space>:239:1: note: expanded from here
LLVMDSOLocalEquivalentValueKind

The only way I could think of preventing the constant from being used here while still being used in other places HANDLE_VALUE/CONSTANT are used is a "banlist" approach via more macros in Value.def. Open to other ideas if it doesn't look that good.

This revision was landed with ongoing or failed builds.Nov 19 2020, 10:27 AM
This revision was automatically updated to reflect the committed changes.