This is an archive of the discontinued LLVM Phabricator instance.

Alternative way of representing comdats
ClosedPublic

Authored by majnemer on Jun 17 2014, 8:09 AM.

Details

Summary

This is a variation on http://reviews.llvm.org/D4094

I think it is at feature parity. In particular, lib/Linker has not been updated and passes have not been audited to not modify the set of symbols in a comdat.

What it does is change the representation to not be a Value. It is intended as a way of making a discussion about these treadeoffs more concrete.

Diff Detail

Repository
rL LLVM

Event Timeline

rafael updated this revision to Diff 10503.Jun 17 2014, 8:09 AM
rafael retitled this revision from to Alternative way of representing comdats.
rafael updated this object.
rafael edited the test plan for this revision. (Show Details)
rafael added reviewers: nicholas, rnk, majnemer.
rafael added a subscriber: Unknown Object (MLST).
majnemer commandeered this revision.Jun 17 2014, 10:32 AM
majnemer edited reviewers, added: rafael; removed: majnemer.

It seems like we have rough consensus that COMDATs should behave more like Types than Values. Commandeering this differential.

rnk edited edge metadata.Jun 17 2014, 11:26 AM

+1, I don't think we need to support RAUW on comdats.

majnemer updated this revision to Diff 10525.Jun 17 2014, 9:54 PM
majnemer updated this object.
majnemer edited edge metadata.
  • Address some review feedback.
majnemer updated this revision to Diff 10531.Jun 18 2014, 12:21 AM
  • Update GlobalOpt and GlobalDCE for Comdats.
majnemer updated this revision to Diff 10593.Jun 18 2014, 4:10 PM
  • Implement changes for lib/Linker
rafael edited edge metadata.Jun 19 2014, 8:30 PM

This also needs tests for the errors. In particular

  • comdat that is only a forward ref
  • redefinition of a comdat.
  • linker errors

And one more thing, when dropping a comdat we should drop it from the comdat symbol table.

rafael added inline comments.Jun 19 2014, 8:30 PM
docs/LangRef.rst
732 ↗(On Diff #10593)

Other set of objects/sections with that key.

"the linker" in here is a bit ambiguous. Are we talking about the system linker or lib/Linker. Since we produce relocatable objects, it is probably better to state this in terms of what we do: put the global objects in a section with that comdat.

734 ↗(On Diff #10593)

Move this after the part the defines what the selection kind is maybe?

761 ↗(On Diff #10593)

This paragraph is now redundant.

763 ↗(On Diff #10593)

There are still some things should probably be mentioned:

  • We split "IR level sections" into multiple sections in the final .o since in object files you can't have a section that is part in one comdat and part in another.
  • Explain how we map mulitple sections with the same comdat to using associative comdts on COFF. Otherwise someone is sure to look at the list and think associative is missing :-)
  • Alias are placed in the same comdat the aliasee computes to, if any.
include/llvm/ADT/UniqueVector.h
24 ↗(On Diff #10593)

Cool, I had no idea we had this :-)

include/llvm/IR/GlobalValue.h
115 ↗(On Diff #10593)

This looks odd. You probably want the const version to forward to the non-const:

return const_cast<GlobalValue*>(this)->getComdat();

include/llvm/Support/GCOV.h
369 ↗(On Diff #10593)

unrelated change?

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
252 ↗(On Diff #10593)

Nice cleanup. You can commit this now if you want.

844 ↗(On Diff #10593)

Reminds me: we should probably reject GlobalObject with common linkage and an explicit comdat in the verifier.

lib/Linker/LinkModules.cpp
542 ↗(On Diff #10593)

Not sure I follow. Is this assuming that the comdat name is related to the name of a global? That may be a common use but is not in any way required by the semantics.

558 ↗(On Diff #10593)

I assume this logic matches what the COFF linker does? Can you extract it to a helper like

bool computeResultingSeelectionKind(Comdat::SelectionKind &Result, Comdat::SelectionKind &Src, Comdat::SelectionKind &Dst) ...

and put a comment saying where the merging logic comes from.

583 ↗(On Diff #10593)

This is fairly arbitrary for Newest. We should probably error or actually check the bitcode timestamp, no?

595 ↗(On Diff #10593)

"auto *" instead of "auto" is probably better since it makes it obvious it is a pointer.

626 ↗(On Diff #10593)

Not just ExactMatch.

865 ↗(On Diff #10593)

BTW, do we produce as error if we have

module1:

@a = ... comdat $c1, largest

module2

@b = ... comdat $c1, newest

that is, we have conflicting comdat selection kinds, but the comdats never show up in the corresponding global objects.

majnemer updated this revision to Diff 10795.Jun 24 2014, 10:31 AM
majnemer edited edge metadata.
  • Address some review feedback.
  • Update GlobalOpt and GlobalDCE for Comdats.
  • Implement changes for lib/Linker
  • Address the latest round of review feedback
docs/LangRef.rst
732 ↗(On Diff #10593)

The objects aren't in the same section. In COFF, they are in different sections which are associated.

734 ↗(On Diff #10593)

Done.

761 ↗(On Diff #10593)

Done.

763 ↗(On Diff #10593)

I did the last two, I don't quite understand the first one.

include/llvm/IR/GlobalValue.h
115 ↗(On Diff #10593)

Done.

include/llvm/Support/GCOV.h
369 ↗(On Diff #10593)

Done.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
252 ↗(On Diff #10593)

Done in r211601.

844 ↗(On Diff #10593)

Done.

lib/Linker/LinkModules.cpp
542 ↗(On Diff #10593)

The COMDAT name is related to the name of the global when the selection kind isn't any.

558 ↗(On Diff #10593)

Done.

583 ↗(On Diff #10593)

Done.

595 ↗(On Diff #10593)

Done.

626 ↗(On Diff #10593)

Done.

865 ↗(On Diff #10593)

There is code for this:

return emitError("Linking COMDATs named '" + ComdatName +
                 "': invalid selection kinds!");

almost there I think :-)

docs/LangRef.rst
778 ↗(On Diff #10795)

OK, I get it in what way the comdat name is significant. The linker will pick the section whose comdat symbol is the largest, not the largest section. Can you think of a way to document this a bit more explicitly? If not, don't worry. I assume this should be clear to anyone familiar with COFF.

It does suggest that for any selection other than any we should require a real symbol to be present. For example, we should error on

$bar = comdat largest
@foo = global i32 42, comdat $bar

saying "no matching symbol for $bar, cannot decide which one is largest".

781 ↗(On Diff #10795)

What I meant about IR sections being split into multiple object sections is that in, for example,

$foo = comdat any
$bar = comdat any
@g1 = global i32 42, section "sec", comdat $foo
@g2 = global i32 42, section "sec", comdat $bar

there is only one section "sec", but in the object file we end up with two sections named sec:

[ 6] sec               PROGBITS        0000000000000000 000050 000004 00 WAG  0   0  4
[ 7] sec               PROGBITS        0000000000000000 000054 000004 00 WAG  0   0  4

One last thing that we might want to document is that CodeGen can create comdats even if no comdat is explicit in the IR. For example, given

@bar = weak global i32 42

we create a comdat and put bar in it. This is something we might want to change now that we have proper comdat support, but for now it is better to just document it.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
262 ↗(On Diff #10795)

I think you can avoid the duplicated error reporting with a static helper:

const Comdat *getELFComdat(const GlobalValue &GV)

it returns null if there is no comdat and reports a fatal error it there is one with a seleciton other than any.

584 ↗(On Diff #10795)

similar to the ELF helper.

753 ↗(On Diff #10795)

It looks like for COFF we miscompile

$foo = comdat any
@bar = global i32 42, comdat $foo
@foo = global i32 42

Is that even possible to represent in COFF? Since "foo" is a comdat key, the symbol "foo" has to be in the section that has that comdat.I guess in an object file you could have two symbols named foo, but that would be hard to represent in assembly.or even at the IR for selections other than any.

Can we just error in codegen when trying to output that for COFF? It does work for ELF, so it would probably be an interesting testcase to have.

796 ↗(On Diff #10795)

This (along with the other non-any issues) can probably be handled earlier in the Verifier since we know that ELF only supports any.

lib/Linker/LinkModules.cpp
549 ↗(On Diff #10795)

Is the alias handling correct? They don't exist in the object file (they are just another symbol). So I assume that given

$foo = comdat largest
@foo = alias  getelementptr([2 x i32]* @bar, i32 0, i32 1)
@bar = global [2 x i32] zeroinitializer, comdat $foo

The linker will end up seeing that foo has a size of 32 bits.

Can we just error in here for now? Something along the lines of "cannot compute size of this alias, so cannot select one comdat". We can extend it afterwards for cases where we are able to compute the size, like simple geps.

majnemer updated this revision to Diff 10903.Jun 26 2014, 2:12 PM
  • Address some review feedback.
  • Update GlobalOpt and GlobalDCE for Comdats.
  • Implement changes for lib/Linker
  • Address the latest round of review feedback
  • Address latest round of feedback.
docs/LangRef.rst
778 ↗(On Diff #10795)

Added a verifier check.

781 ↗(On Diff #10795)

Done.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
262 ↗(On Diff #10795)

Done.

584 ↗(On Diff #10795)

Done.

753 ↗(On Diff #10795)

This is possible to represent, we get this "correct":

SECTION HEADER #2
   .data name
       0 physical address
       0 virtual address
       4 size of raw data
      B4 file pointer to raw data (000000B4 to 000000B7)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
C0300040 flags
         Initialized Data
         4 byte align
         Read Write

SECTION HEADER #4
   .data name
       0 physical address
       0 virtual address
       4 size of raw data
      B8 file pointer to raw data (000000B8 to 000000BB)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
C0301040 flags
         Initialized Data
         COMDAT; sym= _bar
         4 byte align
         Read Write

006 00000000 SECT4  notype       Static       | .data
    Section length    4, selection    5 (pick associative Section 0x2)
009 00000000 SECT4  notype       External     | _bar
00A 00000000 SECT2  notype       External     | _foo
796 ↗(On Diff #10795)

We cannot do this because we don't know if we are targeting COFF vs ELF in the Verifier.

lib/Linker/LinkModules.cpp
549 ↗(On Diff #10795)

Done.

Rafael Ávila de Espíndola wrote:

This (along with the other non-any issues) can probably be handled earlier in the Verifier since we know that ELF only supports any.

We cannot do this because we don't know if we are targeting COFF vs ELF in the Verifier.

We know that ELF only supports any, so we can (and should) assume COFF
for the other selectors.

Cheers,
Rafael

majnemer updated this revision to Diff 10904.Jun 26 2014, 2:41 PM
  • Require that Comdat group leaders be in their Comdat group for COFF.

Please add

$foo = comdat any
@bar = global i32 42, comdat $foo
@foo = global i32 42

as an ELF test.

docs/LangRef.rst
746 ↗(On Diff #10904)

The sections or the comdat symbols btw? What does link.exe do?

751 ↗(On Diff #10904)

This is undocumented and it seems unused. How about removing it for now?

756 ↗(On Diff #10904)

The sections or the comdat symbol must be the same size?

763 ↗(On Diff #10904)

"selected if the COMDAT's key is the largest."

i.e., we have to drop the "section", no?

785 ↗(On Diff #10904)

This is still missing:

What I meant about IR sections being split into multiple object sections is that in, for example,

$foo = comdat any
$bar = comdat any
@g1 = global i32 42, section "sec", comdat $foo
@g2 = global i32 42, section "sec", comdat $bar

there is only one section "sec", but in the object file we end up with two sections named sec:

[ 6] sec               PROGBITS        0000000000000000 000050 000004 00 WAG  0   0  4
[ 7] sec               PROGBITS        0000000000000000 000054 000004 00 WAG  0   0  4
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
796 ↗(On Diff #10904)

needs test

800 ↗(On Diff #10904)

needs test.

lib/IR/Verifier.cpp
601 ↗(On Diff #10904)

nit:

C.getSelectionKind() != omdat::Any

lib/Linker/LinkModules.cpp
599 ↗(On Diff #10904)

needs test.

test/MC/COFF/comdat.ll
1 ↗(On Diff #10904)

This should probably be in codegen, no MC.

test/MC/ELF/comdat.ll
1 ↗(On Diff #10904)

CodeGen.

test/MC/MachO/comdat.ll
1 ↗(On Diff #10904)

CodeGen

majnemer updated this revision to Diff 10905.Jun 26 2014, 3:17 PM
  • Remove the "newest" selection kind
majnemer updated this revision to Diff 10908.Jun 26 2014, 3:59 PM
  • Address review comments.

I've added a test for ELF.

docs/LangRef.rst
746 ↗(On Diff #10904)

Sections, COFF doesn't know how big a symbol is.

751 ↗(On Diff #10904)

Done.

756 ↗(On Diff #10904)

Sections, COFF doesn't know how big a symbol is.

763 ↗(On Diff #10904)

Your understanding is correct.

785 ↗(On Diff #10904)

Done.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
796 ↗(On Diff #10904)

Done.

800 ↗(On Diff #10904)

Done.

lib/IR/Verifier.cpp
601 ↗(On Diff #10904)

Done.

lib/Linker/LinkModules.cpp
599 ↗(On Diff #10904)

Done.

test/MC/COFF/comdat.ll
1 ↗(On Diff #10904)

Done.

test/MC/ELF/comdat.ll
1 ↗(On Diff #10904)

Done.

test/MC/MachO/comdat.ll
1 ↗(On Diff #10904)

Done.

Rafael Ávila de Espíndola wrote:

The sections or the comdat symbols btw? What does link.exe do?

Sections, COFF doesn't know how big a symbol is.

OK, I must still be missing something. If the linker doesn't know how
big a symbol is, how is largest implemented? In the current
documentation we have

  • largest: largest COMDAT key, i.e., the symbol.
  • samesize: The size of the section is the important one.

Cheers,
Rafael

majnemer updated this revision to Diff 10913.Jun 26 2014, 11:57 PM
  • Fix associative sections on COFF
  • Add a test showcasing how MS RTTI would look
majnemer updated this revision to Diff 10915.Jun 27 2014, 12:50 AM
  • Make sure that COMDATs groups whose key are an alias work

There is just one more issue that I think we need to handle in the linker and we are good:

It is very important that we don't change the set of symbol in a comdat. That means that if we have code like

@foo = global i32 42, comdat $bar
@zed = alias ...

and we cannot compute the base of zed, we should produce an error in the linker since we don't know if it is the same comdat as foo or not. So far all the aliases have been computable, so this is really just to make sure we error instead of miscompile if someone writes a fancier one.

LGTM with the doc clarification and test improvements, but please split the assembly changes in its own commit.

docs/LangRef.rst
733 ↗(On Diff #10915)

You should probably expand the "The global object may not have local linkage." in its own paragraph. It should probably go after the note on how this is implemented in COFF. Then we can say something like

Given how this is implemented on COFF, we know there is only one GlobalObject per section. That global object, or an alias based on it, is required to have the same name as the comdat. That is used to find the object and compute the section size during linking.

Since the comdat name has to be stable, the global value that shares its name must not have local linkage since those can be renamed.

include/llvm/MC/MCContext.h
164 ↗(On Diff #10915)

The idea is that it should be possible to create two different sections with

.section foo, "dr", largest, bar
.section foo, "dr", same_size, bar

correct? That is fine, but it can be tested with assembly and should probably be an independent first patch. This part LGTM with the above test checking that we create two sections.

lib/Linker/LinkModules.cpp
557 ↗(On Diff #10915)

needs test.

564 ↗(On Diff #10915)

needs test.

test/CodeGen/X86/coff-comdat.ll
55 ↗(On Diff #10915)

Thin can/should be private, right?

test/Linker/Inputs/comdat5.ll
12 ↗(On Diff #10915)

this can/should be private.

rafael accepted this revision.Jun 27 2014, 10:05 AM
rafael edited edge metadata.
This revision is now accepted and ready to land.Jun 27 2014, 10:05 AM
majnemer closed this revision.Jun 27 2014, 11:28 AM
majnemer updated this revision to Diff 10940.

Closed by commit rL211920 (authored by @majnemer).