This is an archive of the discontinued LLVM Phabricator instance.

IR: Add COMDATs to the IR
AbandonedPublic

Authored by majnemer on Jun 10 2014, 1:17 PM.

Details

Summary

This new IR facility allows us to represent the object-file semantic of
a COMDAT group.

COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.

This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.

Diff Detail

Event Timeline

majnemer updated this revision to Diff 10299.Jun 10 2014, 1:17 PM
majnemer retitled this revision from to IR: Add COMDATs to the IR.
majnemer updated this object.
majnemer added reviewers: rnk, rafael, nicholas.
majnemer added a subscriber: Unknown Object (MLST).
rafael edited edge metadata.Jun 11 2014, 7:48 AM

This patch doesn't update lib/Linker. This means that LTO can miscompiple this, no? In particular, I think it will be wrong given something like

$foo = comdat any
@a = weak global i32, 0, comdat $foo
@b = private global i32 0, comdat $foo

in two files. It will keep only one @a, but will duplicate @b create a comdat with an extra element.

Updating lib/Linker should be fairly simple for 'any'. It should just avoid linking any a symbol if it is in a comdat that is already present in the destination. It is not clear what it should do for other selection kind or mismatched selection kinds (error?)

docs/LangRef.rst
664

The repeated "if the target has the necessary supports" is a bit redundant. Just talk about the target differences in the comdat section.

685

How do alias fit in this design?

We should probably not allow alias to cross comdats, so how about adding something like:

Aliases are placed on the same comdat as the aliassee expression computes to.

Somewhat related: The llvm codegen will currently create comdats on its own. For example, a simple weak global gets output on a comdat. Do you intend to change that? In any case, it is probably a good thing to document.

758

We only have 3 file formats. It is probably OK to say that ELF only supports 'any' :-)

771

An important difference between COFF and ELF is that in COFF only one section can be in a COMDAT. In ELF, multiple sections can be in the same comdat.

I noticed that you left associative out of the above list. Is the intention to write code that look like ELF (multiple sections in on COMDAT) but at codegen we pick one to actually be in that comdat and make others associative with it?

It is probably a good idea to mention this, since the documentation gives the impression that adding COMDAT to a global makes sure that it will be in that COMDAT.

What are we doing about linkage? For COMDATs to work, we have to make sure their contents are not modified. For example, if a COMDAT has two linkonce_odr symbols, dropping only one can cause a llink failure.

One way would be to say that isDiscardableIfUnused symbols are not allowed, but we do want to allow private. Going this way would require spiting the "can be dropped" notion out of linkage. For example, linkonce would become "weak dropable" and a plain private would have to be kept. We do want to allow entire COMDATs to be dropped, so the comdat itself would probably need the dropable bit too.

A simpler but potentially brittle option is to just say that

  • A symbol in a comdat cannot be dropped individually
  • A comdat can be dropped if every symbol in it can be dropped.

And audit pretty much every call to isDiscardableIfUnused.

include/llvm/IR/Comdat.h
32

Why do you need it to be a Value? This says that a Comdat has a type, which seems wrong.

46

section?

51

What is this needed for?

include/llvm/IR/Module.h
599

Is this need? I looks a bit strange say that comdat is "part of a module". They are really a composite property of the globals.

lib/AsmParser/LLParser.cpp
539

Does clang-format agrees with having the break in the same line?

1161

Don't repeat the name in the comment. Functions start with a lowecase letter.

lib/IR/AsmWriter.cpp
1315

This keeps unused comdats. It is probably better to keep track of which comdats are needed and output only those. This is similar to how an unused type is dropped.

lib/IR/Globals.cpp
115

Why not reject instead? This probably indicates a bug in the calling code, no?

majnemer added inline comments.Jun 12 2014, 10:24 AM
docs/LangRef.rst
664

OK, will do.

758

Sure.

771

Consider:

$v = comdat any
@v = global i32 0, comdat $v
@m = global i32 0, comdat $v

In the ELF case, @v will be selected as the COMDAT key. The .data.v and .data.m sections will both be in the COMDAT group.

In the COFF case, @v's .data section will be the primary COMDAT section. @m's .data section will be associated with it.

Because we have this behavior, we do not need an explicit associative COMDAT selection kind.

I'll work on fixing the discard problem.

include/llvm/IR/Comdat.h
32

It is a Value so that it can participate in things like use lists.

As an example why this is useful:
If we want to drop a symbol in a COMDAT group, we need to prove we don't need any of the symbols in that group. Iterating over users of the COMDAT is a straightforward way of doing this.

I could recreate this functionality just for COMDATs but it seemed like the right way to go.

51

There are places, like ConvertValIDToValue, where we get a Comdat from a Value.

include/llvm/IR/Module.h
599

A Comdat value has to live in *some* symbol table. Considering the tight relationship between Comdat value names and GlobalValue names, it makes sense for the symbol table to be at Module scope.

lib/AsmParser/LLParser.cpp
539

I matched the style in ParseFnAttributeValuePairs, I can reformat this to be in clang-format style if you wish.

1161

I'll switch to \brief.

With regard to the naming style, I followed the Golden Rule

lib/IR/AsmWriter.cpp
1315

Sure, I can check to see if the Comdat has an empty use list.

lib/IR/Globals.cpp
115

It's annoying to avoid this from the AsmParser. I'll add a verifier check.

An important difference between COFF and ELF is that in COFF only one section can be in a COMDAT. In ELF, multiple sections can be in the same comdat.

I noticed that you left associative out of the above list. Is the intention to write code that look like ELF (multiple sections in on COMDAT) but at codegen we pick one to actually be in that comdat and make others associative with it?

It is probably a good idea to mention this, since the documentation gives the impression that adding COMDAT to a global makes sure that it will be in that COMDAT.

What are we doing about linkage? For COMDATs to work, we have to make sure their contents are not modified. For example, if a COMDAT has two linkonce_odr symbols, dropping only one can cause a llink failure.

One way would be to say that isDiscardableIfUnused symbols are not allowed, but we do want to allow private. Going this way would require spiting the "can be dropped" notion out of linkage. For example, linkonce would become "weak dropable" and a plain private would have to be kept. We do want to allow entire COMDATs to be dropped, so the comdat itself would probably need the dropable bit too.

A simpler but potentially brittle option is to just say that

  • A symbol in a comdat cannot be dropped individually
  • A comdat can be dropped if every symbol in it can be dropped.

And audit pretty much every call to isDiscardableIfUnused.

Consider:

$v = comdat any
@v = global i32 0, comdat $v
@m = global i32 0, comdat $v

In the ELF case, @v will be selected as the COMDAT key. The .data.v and .data.m sections will both be in the COMDAT group.

In the COFF case, @v's .data section will be the primary COMDAT section. @m's .data section will be associated with it.

Because we have this behavior, we do not need an explicit associative COMDAT selection kind.

This in nice. It is probably a good idea to document it.


Rafael Ávila de Espíndola wrote:

Why do you need it to be a Value? This says that a Comdat has a type, which seems wrong.

It is a Value so that it can participate in things like use lists.

As an example why this is useful:
If we want to drop a symbol in a COMDAT group, we need to prove we don't need any of the symbols in that group. Iterating over users of the COMDAT is a straightforward way of doing this.

I could recreate this functionality just for COMDATs but it seemed like the right way to go.

We had problems in the past with a Class being too generic
(GlobalAlias being the most recent one), so I would suggest making it
specific.All you need to is add a "Use*" to the comdat class, no?

Comment at: include/llvm/IR/Comdat.h:51
@@ +50,3 @@
+ /// Methods for support type inquiry through isa, cast, and dyn_cast:
+ static inline bool classof(const Value *V) {

+ return V->getValueID() == Value::ComdatVal;

Rafael Ávila de Espíndola wrote:

What is this needed for?

There are places, like ConvertValIDToValue, where we get a Comdat from a Value.

Well, that seems like a problem to cover another. It shouldn't be
parsed as a Value :-(

Comment at: include/llvm/IR/Module.h:599
@@ +598,3 @@
+ comdat_iterator comdat_begin() { return ComdatList.begin(); }
+ const_comdat_iterator comdat_begin() const { return ComdatList.begin(); }

+ comdat_iterator comdat_end () { return ComdatList.end(); }

Rafael Ávila de Espíndola wrote:

Is this need? I looks a bit strange say that comdat is "part of a module". They are really a composite property of the globals.

A Comdat value has to live in *some* symbol table. Considering the tight relationship between Comdat value names and GlobalValue names, it makes sense for the symbol table to be at Module scope.

Sure. It just feels a bit more like constants or types. They have to
live somewhere, but it is more of an implementation detail where they
live. Conceptually a module has GlobalValues, triple, datalayout and
information associated with the GlobalValues (metadata, comdat,
types).

Comment at: lib/AsmParser/LLParser.cpp:539
@@ +538,3 @@
+ case lltok::kw_exactmatch: SK = Comdat::ExactMatch; break;
+ case lltok::kw_largest: SK = Comdat::Largest; break;

+ case lltok::kw_newest: SK = Comdat::Newest; break;

Rafael Ávila de Espíndola wrote:

Does clang-format agrees with having the break in the same line?

I matched the style in ParseFnAttributeValuePairs, I can reformat this to be in clang-format style if you wish.

clang-format please :-)

Comment at: lib/AsmParser/LLParser.cpp:1161
@@ +1160,3 @@
+
+/// GetComdatVal - Get a Comdat with the specified name, creating a forward

+/// reference record if needed.

Rafael Ávila de Espíndola wrote:

Don't repeat the name in the comment. Functions start with a lowecase letter.

I'll switch to \brief.

With regard to the naming style, I followed the Golden Rule

The annoying thing is that parts of the code base end up never being
fixed. What about fixing the old code in a preparation commit?

Comment at: lib/IR/Globals.cpp:115
@@ +114,3 @@
+ if (isa<GlobalVariable>(this) || isa<GlobalAlias>(this)) {
+ // Comdats on declarations are semantically meaningless, ignore them.

+ if (isDeclaration())

Rafael Ávila de Espíndola wrote:

Why not reject instead? This probably indicates a bug in the calling code, no?

It's annoying to avoid this from the AsmParser. I'll add a verifier check.

You probably need both. We want the verifier to be sure someone using
the c++ api gets notified of the error. You need a parser check to
make sure llvm-as can provide a better error message.

Cheers,
Rafael

Rafael Ávila de Espíndola wrote:

Why do you need it to be a Value? This says that a Comdat has a type, which seems wrong.

It is a Value so that it can participate in things like use lists.

As an example why this is useful:
If we want to drop a symbol in a COMDAT group, we need to prove we don't need any of the symbols in that group. Iterating over users of the COMDAT is a straightforward way of doing this.

I could recreate this functionality just for COMDATs but it seemed like the right way to go.

We had problems in the past with a Class being too generic
(GlobalAlias being the most recent one), so I would suggest making it
specific.All you need to is add a "Use*" to the comdat class, no?

Now that I think of it, a comdat having Uses is probably undesirable
in itself. Some of the potential issues that come to mind:

  • RAUW should not change what is in a comdat.
  • Logic for checking if something has only one use should also not be

affected by comdats.

  • Other code walking all uses probably be surprised to find a comdat.

It looks comdats should keep track of what GVs are in them with some
other mechanism. The simple option of just building this information
when needed might be sufficient. I think the only cases where we would
need it is

  • Some optimization to drop unused comdats if all their members can be dropped.
  • The linker.

Both of these will be looking at the entire module, so they should be
able to build the set of global values in each comdat as needed.

Cheers,
Rafael

majnemer abandoned this revision.Jul 6 2014, 2:14 PM

This was superseded by D4178 which landed as rL211920.