This is an archive of the discontinued LLVM Phabricator instance.

Add !associated metadata.
ClosedPublic

Authored by eugenis on Jan 24 2017, 4:55 PM.

Details

Summary

This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.

!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.

Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do

Diff Detail

Repository
rL LLVM

Event Timeline

eugenis created this revision.Jan 24 2017, 4:55 PM
mehdi_amini requested changes to this revision.Jan 24 2017, 5:02 PM

Can you clarify on the use case and why llvm.compiler.used isn't enough?

This revision now requires changes to proceed.Jan 24 2017, 5:02 PM
mehdi_amini added inline comments.Jan 24 2017, 5:04 PM
test/ThinLTO/X86/lazyload_metadata.ll
19 ↗(On Diff #85658)

I don't understand where this is coming from?

See https://reviews.llvm.org/D28481 for the previous attempt at this.

In short, ASan emits "metadata" globals for each original global, and needs them to be alive as long as the corresponding original global is. We can not put a reference to the "metadata" global into the original global, because that would kill .bss (among other things).

The only part of this CL that we really need is the one that emits SHF_ASSOCIATED for "metadata" globals. The GlobalDCE part mirrors the linker handling of SHF_ASSOCIATED. As an alternative, we could add all "metadata" globals to llvm.compiler.used, but that would suppress some optimizations.

See this discussion of SHF_ASSOCIATED:
https://groups.google.com/d/msg/generic-abi/_CbBM6T6WeM/LZnqx1KZAQAJ

test/ThinLTO/X86/lazyload_metadata.ll
19 ↗(On Diff #85658)

I assume, because of the extra fixed metadata kind.

Can you articulate the reasons not to you llvm.compiler_used?

This sounds _a lot_ like COMDATs, could you use those?

I mean the specific optimizations that won't happen that motivate not using llvm.compiler.used.

This sounds _a lot_ like COMDATs, could you use those?

Been there, see the generic-abi ml discussion.

I mean the specific optimizations that won't happen that motivate not using llvm.compiler.used.

I'm not sure. Perhaps, link time devirtualization? Without this, ASan would have to add every single global to llvm.compiler.used, which would make globaldce not do anything at all. If anything, it is bad for the object file size.

The real reason is w/o the GlobalDCE change !associated has strange semantics - it protects a global from linker GC, but only if said global survives compiler GC. I don't see how that is useful.

This sounds _a lot_ like COMDATs, could you use those?

Been there, see the generic-abi ml discussion.

I can't find it, could you please link it?

I mean the specific optimizations that won't happen that motivate not using llvm.compiler.used.

I'm not sure. Perhaps, link time devirtualization? Without this, ASan would have to add every single global to llvm.compiler.used, which would make globaldce not do anything at all.

It'll still operate on every functions. And this is what happens on Darwin today.

If anything, it is bad for the object file size.

Linker will garbage collect.

The real reason is w/o the GlobalDCE change !associated has strange semantics - it protects a global from linker GC, but only if said global survives compiler GC. I don't see how that is useful.

I'm not convinced at all by the !associated at this point, in particular I haven't seen anything it does which is different from what is done on Darwin.

I mean the specific optimizations that won't happen that motivate not using llvm.compiler.used.

I'm not sure. Perhaps, link time devirtualization? Without this, ASan would have to add every single global to llvm.compiler.used, which would make globaldce not do anything at all.

It'll still operate on every functions. And this is what happens on Darwin today.

Yes, and it works fine. But on Linux we can do better.

Isn't it possible that GC-ing of virtual tables would kill class methods, which can in turn kill more virtual tables and allow devirtualizing of more call sites?

If anything, it is bad for the object file size.

Linker will garbage collect.

Object file size matters, too.

The real reason is w/o the GlobalDCE change !associated has strange semantics - it protects a global from linker GC, but only if said global survives compiler GC. I don't see how that is useful.

I'm not convinced at all by the !associated at this point, in particular I haven't seen anything it does which is different from what is done on Darwin.

Darwin is not a problem here. ELF is.
On Darwin, ASan uses the "live_support" linker hack:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1680
SHF_ASSOCIATED is the proposed ELF implementation of roughly the same concept. We need a way to represent it in the IR.

I mean the specific optimizations that won't happen that motivate not using llvm.compiler.used.

I'm not sure. Perhaps, link time devirtualization? Without this, ASan would have to add every single global to llvm.compiler.used, which would make globaldce not do anything at all.

It'll still operate on every functions. And this is what happens on Darwin today.

Yes, and it works fine. But on Linux we can do better.

I'm missing what is platform specific in this?

Isn't it possible that GC-ing of virtual tables would kill class methods, which can in turn kill more virtual tables and allow devirtualizing of more call sites?

Possible, we'll have to consider the benefit of "advanced optimizations" in the context of ASAN builds though, and balanced this with the cost.

If anything, it is bad for the object file size.

Linker will garbage collect.

Object file size matters, too.

I'll ask to see the numbers before moving forward with this motivation.

The real reason is w/o the GlobalDCE change !associated has strange semantics - it protects a global from linker GC, but only if said global survives compiler GC. I don't see how that is useful.

I'm not convinced at all by the !associated at this point, in particular I haven't seen anything it does which is different from what is done on Darwin.

Darwin is not a problem here. ELF is.
On Darwin, ASan uses the "live_support" linker hack:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1680

Sure, I know about this: https://github.com/llvm-mirror/llvm/commit/090f75dc839cbe1d94c8445bf8fabd0f19309f25

SHF_ASSOCIATED is the proposed ELF implementation of roughly the same concept. We need a way to represent it in the IR.

Sure, I get that, but what is fundamentally different between the Darwin live_support section and SHF_ASSOCIATED?
This is likely the main piece of information that I'm missing right now.

I also don't think that adding metadata that are "correctness related" is a good thing in general.

I also don't think that adding metadata that are "correctness related" is a good thing in general.

And what I mean by that is that a "good" scheme should allow to ignore the metadata and the metadata can enable more optimization for the passes that understands it,

I also don't think that adding metadata that are "correctness related" is a good thing in general.

Yeah. This could be a global flag. But we already have !type metadata which is correctness related.

OK. We can start with !associated that does not affect DCE. Its only effect would be setting SHF_ASSOCIATED on the ELF section. ASan would add all "metadata" globals to llvm.compiler.used.

eugenis updated this revision to Diff 88834.Feb 16 2017, 5:55 PM
eugenis edited edge metadata.
eugenis edited the summary of this revision. (Show Details)

I've removed all global-dce changes and, according to D29590, added an argument to the metadata that gives the target of SHF_LINK_ORDER.
PTAL.

Can this metadata be freely dropped? Clarify this in LangRef please.

docs/LangRef.rst
5110 ↗(On Diff #88834)

80 cols.

eugenis updated this revision to Diff 88955.Feb 17 2017, 2:26 PM

Can this metadata be freely dropped? Clarify this in LangRef please.

It can not. Added a note in LangRef.

mehdi_amini added inline comments.Feb 17 2017, 2:42 PM
docs/LangRef.rst
5112 ↗(On Diff #88955)

I don't think this is an OK property of metadata.
Have you look for another representation that would not have this constraint?

eugenis added inline comments.Feb 17 2017, 2:45 PM
docs/LangRef.rst
5112 ↗(On Diff #88955)

It does not seem to be that bad. After all, !type and !absolute_symbol already have that property, they just don't document it.

eugenis updated this revision to Diff 88967.Feb 17 2017, 2:58 PM
eugenis added inline comments.
docs/LangRef.rst
5112 ↗(On Diff #88955)

How about now?
Under this list of conditions the metadata can be discarded, which will make the global non-discardable.

mehdi_amini added inline comments.Feb 17 2017, 3:00 PM
docs/LangRef.rst
5112 ↗(On Diff #88955)

From LangRef:

A transformation is required to drop any metadata attachment that it does not know or know it can’t preserve. Currently there is an exception for metadata attachment to globals for !type and !absolute_symbol which can’t be unconditionally dropped unless the global is itself deleted.
eugenis added inline comments.Feb 17 2017, 3:02 PM
docs/LangRef.rst
5112 ↗(On Diff #88955)

Ah thanks, I was expecting to find this in the individual metadata type's sections.

mehdi_amini added inline comments.Feb 17 2017, 3:06 PM
docs/LangRef.rst
5117 ↗(On Diff #88967)

From the sentence this list is just "guidelines", correct? Is it still "valid" to use the associated MD without a section? In different comdats?

The IR part (LangRef) seems OK to me.
The ELF part is out of my scope, if rafael or pcc already checked it, that's fine.

docs/LangRef.rst
5113 ↗(On Diff #88967)

colon?
Also, since it seems to be advisory and not mandatory I wouldn't use "must" but "may".

eugenis updated this revision to Diff 88974.Feb 17 2017, 3:34 PM
eugenis marked an inline comment as done.
pcc added inline comments.Feb 17 2017, 3:50 PM
include/llvm/MC/MCSectionELF.h
95 ↗(On Diff #88974)

I don't quite follow why you need to change this from storing a section to a symbol.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
220 ↗(On Diff #88974)

What if GO is null? (This could happen if we RAUW the GlobalObject with some other constant.)

I think the right behavior in that case would be to return null from this function.

eugenis updated this revision to Diff 88984.Feb 17 2017, 4:02 PM
eugenis marked an inline comment as done.
eugenis added inline comments.
include/llvm/MC/MCSectionELF.h
95 ↗(On Diff #88974)

Because we need the symbol name when printing textual assembly.

pcc added inline comments.Feb 17 2017, 4:22 PM
include/llvm/MC/MCSectionELF.h
95 ↗(On Diff #88974)

I see. Could you separate the printer fix out into a separate change please?

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
370 ↗(On Diff #88984)

Remove this line.

lib/MC/MCSectionELF.cpp
157 ↗(On Diff #88984)

Would AssociatedSymbol ever be null?

eugenis added inline comments.Feb 17 2017, 5:10 PM
include/llvm/MC/MCSectionELF.h
95 ↗(On Diff #88974)

See D30129

pcc edited edge metadata.Feb 17 2017, 6:25 PM

Can you please refresh this change?

eugenis updated this revision to Diff 89004.Feb 17 2017, 6:39 PM
eugenis marked 2 inline comments as done.
pcc added inline comments.Feb 17 2017, 6:45 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
243–244 ↗(On Diff #89003)

I think you need to make sure AssociatedSymbol is non-null before setting the flag. Please also add a negative test for a non-GlobalObject constant.

366–367 ↗(On Diff #89003)

Same here.

eugenis updated this revision to Diff 89007.Feb 17 2017, 7:01 PM
eugenis marked an inline comment as done.
pcc accepted this revision.Feb 17 2017, 7:09 PM

LGTM

This revision was automatically updated to reflect the committed changes.