Page MenuHomePhabricator

[Driver][Bundler] Improve bundling of object files.
ClosedPublic

Authored by ABataev on Aug 6 2019, 11:33 AM.

Details

Summary

Previously, object files were bundled using partial linking. It resulted
in the following structure of the bundled objects:

clang-offload-bundle
__CLANG_OFFLOAD_BUNDLE__<target>
<target_code>
<host_code>

But when we tried to unbundle object files, it worked correctly only for
the target objects. The host object remain bundled. It produced a lot of
junk sections in the host object files and in some cases may caused
incorrect linking.

Patch improves bundling of the object files. After this patch the
bundled object looks like this:

clang-offload-bundle
__CLANG_OFFLOAD_BUNDLE__<target>
<target_code>
__CLANG_OFFLOAD_BUNDLE__<host>
<host_code>

With this structure we are able to unbundle the host object files too so
that fter unbundling they are the same as were before.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision.Aug 6 2019, 11:33 AM
Herald added a project: Restricted Project. · View Herald Transcript

Can you detail what "incorrect linking" means? AFAIK the additional sections were just bloating the executable, but how do they affect correctness?

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Can you detail what "incorrect linking" means? AFAIK the additional sections were just bloating the executable, but how do they affect correctness?

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

It might produce a lot of "junk" sections and I saw that in some cases it may affect the code linking. There is a report from Itaru Kitayama and I found out that this oartial linking leads to the incorrectly working application. Seems to me, there is a problem with partial linking in LD in some cases.

Additional note. Seems to me, it has something to do with the partial linking. According to ld documentation, it is recommended to use -Ur option for partial linking of the C++ object files to resolve global constructors.

-Ur
For anything other than C++ programs, this option is equivalent to '-r': it generates relocatable output--i.e., an output file that can in turn serve as input to ld. When linking C++ programs, `-Ur' does resolve references to constructors, unlike '-r'. It does not work to use '-Ur' on files that were themselves linked with `-Ur'; once the constructor table has been built, it cannot be added to. Use `-Ur' only for the last partial link, and '-r' for the others.

The problem I saw is exactly connected with the global constructors, which are not called after partial linking.
Seems to me, we partially link objects files for C++ with -Ur option but we cannot say if this the last time we perform partial linking or not (we may try to bundle/unbundle objects several times, especially when we'll try to support linking with libraries). Better not to use partial linking in bundler.

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

tools/clang-offload-bundler/ClangOffloadBundler.cpp
373 ↗(On Diff #213670)

Please update if needed.

534 ↗(On Diff #213670)

Please update if needed.

541–549 ↗(On Diff #213670)

I think we should revert this change and just bundle the host object file as we do for all other targets.

572 ↗(On Diff #213670)

Please update if needed.

Hahnfeld added inline comments.Aug 13 2019, 11:41 AM
tools/clang-offload-bundler/ClangOffloadBundler.cpp
541–549 ↗(On Diff #213670)

To be clear: I agree that bundling + unbundling should result in the exact same object file, so the other changes seem good, just having the host object file easily accessible should be preserved.

ABataev marked an inline comment as done.Aug 13 2019, 11:42 AM

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling. Plus, technically, we do not unbundle the original object file, so I would not call this unbundling at all.

tools/clang-offload-bundler/ClangOffloadBundler.cpp
541–549 ↗(On Diff #213670)

We just cannot use partial linking, it does not work for C++.

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

Plus, technically, we do not unbundle the original object file, so I would not call this unbundling at all.

Well, after this patch unbundling is strictly required to do anything with the host object.

tools/clang-offload-bundler/ClangOffloadBundler.cpp
541–549 ↗(On Diff #213670)

I'm only proposing to use partial linking such that external tools have easy access to the host object. I'm fine with storing another copy of the original host object into a section and fetch it from there during unbundling.

ABataev marked an inline comment as done.Aug 13 2019, 12:00 PM

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

I don't have a small reproducer, unfortunately, only the big one.
Here is the message from the user:

Hi,
I am revisiting this possible compiler bug in Clang 8.0.0.

In the source code I am developing, there's a global static variable, nest::sli_neuron::recordablesMap_ put in the BSS section and it is expected to be fully initialized by the time nest::sli_neuron::sli_neuron() gets called, however in a gdb session:

(gdb) p nest::sli_neuron::recordablesMap_
Python Exception <type 'exceptions.AttributeError'> 'gdb.Type' object has no attribute 'name':
$1 = {<std::map<Name, double (nest::sli_neuron::*)() const, std::less<Name>, std::allocator<std::pair<Name const, double (nest::sli_neuron::*)() const> > >> = std::map with 0 elements, _vptr$RecordablesMap = 0x0}

this doesn't happen when -fopenmp-targets is _not_ used, is it not trivial to come up a reproducer, thus I am sending a work-in-progress report hoping someone will shed some light on this.

Thanks,
Itaru.

Another one:

Hi,
I am seeing a link error shown below:

`.text.startup' referenced in section `.init_array.0' of /tmp/event_delivery_manager-02f392.o: defined in discarded section `.text.startup[_ZN4nest18DataSecondaryEventIdNS_16GapJunctionEventEE18supported_syn_ids_E]' of /tmp/event_delivery_manager-02f392.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I am not sure how to tackle this as the part is referenced isn't what I am
working on. I am using the latest trunk Clang 10 at this moment.

Steps to reproduce:

Steps to produce:
$ git clone https://https://github.com/nest/nest-simulator/
$ mkdir /tmp/build/nest-clang-offload/
$ cd /tmp/build/nest-clang-offload/
$ cmake -DCMAKE_TOOLCHAIN_FILE=Platform/JURON_Clang -DCMAKE_INSTALL_PREFIX=/path/to/opt/nest-clang -Dwith-gsl=ON -Dwith-openmp=ON -Dwith-mpi=OFF -Dwith-python=OFF -Dwith-offload=ON /path/to/nest-simulator/

Plus, technically, we do not unbundle the original object file, so I would not call this unbundling at all.

Well, after this patch unbundling is strictly required to do anything with the host object.

tools/clang-offload-bundler/ClangOffloadBundler.cpp
541–549 ↗(On Diff #213670)

Ahh, I see. I will try to do this. I will pack all the device objects files + host object file into a resulting bundle in the saparate sections + partial link the original host code. Will check if this works.

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

I don't have a small reproducer, unfortunately, only the big one.
Here is the message from the user:

Hi,
I am revisiting this possible compiler bug in Clang 8.0.0.

In the source code I am developing, there's a global static variable, nest::sli_neuron::recordablesMap_ put in the BSS section and it is expected to be fully initialized by the time nest::sli_neuron::sli_neuron() gets called, however in a gdb session:

(gdb) p nest::sli_neuron::recordablesMap_
Python Exception <type 'exceptions.AttributeError'> 'gdb.Type' object has no attribute 'name':
$1 = {<std::map<Name, double (nest::sli_neuron::*)() const, std::less<Name>, std::allocator<std::pair<Name const, double (nest::sli_neuron::*)() const> > >> = std::map with 0 elements, _vptr$RecordablesMap = 0x0}

this doesn't happen when -fopenmp-targets is _not_ used, is it not trivial to come up a reproducer, thus I am sending a work-in-progress report hoping someone will shed some light on this.

Thanks,
Itaru.

Another one:

Hi,
I am seeing a link error shown below:

`.text.startup' referenced in section `.init_array.0' of /tmp/event_delivery_manager-02f392.o: defined in discarded section `.text.startup[_ZN4nest18DataSecondaryEventIdNS_16GapJunctionEventEE18supported_syn_ids_E]' of /tmp/event_delivery_manager-02f392.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I am not sure how to tackle this as the part is referenced isn't what I am
working on. I am using the latest trunk Clang 10 at this moment.

Steps to reproduce:

Steps to produce:
$ git clone https://https://github.com/nest/nest-simulator/
$ mkdir /tmp/build/nest-clang-offload/
$ cd /tmp/build/nest-clang-offload/
$ cmake -DCMAKE_TOOLCHAIN_FILE=Platform/JURON_Clang -DCMAKE_INSTALL_PREFIX=/path/to/opt/nest-clang -Dwith-gsl=ON -Dwith-openmp=ON -Dwith-mpi=OFF -Dwith-python=OFF -Dwith-offload=ON /path/to/nest-simulator/

I've seen these reports. What has eventually led to this patch, ie where do constructors come into play?

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

I don't have a small reproducer, unfortunately, only the big one.
Here is the message from the user:

Hi,
I am revisiting this possible compiler bug in Clang 8.0.0.

In the source code I am developing, there's a global static variable, nest::sli_neuron::recordablesMap_ put in the BSS section and it is expected to be fully initialized by the time nest::sli_neuron::sli_neuron() gets called, however in a gdb session:

(gdb) p nest::sli_neuron::recordablesMap_
Python Exception <type 'exceptions.AttributeError'> 'gdb.Type' object has no attribute 'name':
$1 = {<std::map<Name, double (nest::sli_neuron::*)() const, std::less<Name>, std::allocator<std::pair<Name const, double (nest::sli_neuron::*)() const> > >> = std::map with 0 elements, _vptr$RecordablesMap = 0x0}

this doesn't happen when -fopenmp-targets is _not_ used, is it not trivial to come up a reproducer, thus I am sending a work-in-progress report hoping someone will shed some light on this.

Thanks,
Itaru.

Another one:

Hi,
I am seeing a link error shown below:

`.text.startup' referenced in section `.init_array.0' of /tmp/event_delivery_manager-02f392.o: defined in discarded section `.text.startup[_ZN4nest18DataSecondaryEventIdNS_16GapJunctionEventEE18supported_syn_ids_E]' of /tmp/event_delivery_manager-02f392.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I am not sure how to tackle this as the part is referenced isn't what I am
working on. I am using the latest trunk Clang 10 at this moment.

Steps to reproduce:

Steps to produce:
$ git clone https://https://github.com/nest/nest-simulator/
$ mkdir /tmp/build/nest-clang-offload/
$ cd /tmp/build/nest-clang-offload/
$ cmake -DCMAKE_TOOLCHAIN_FILE=Platform/JURON_Clang -DCMAKE_INSTALL_PREFIX=/path/to/opt/nest-clang -Dwith-gsl=ON -Dwith-openmp=ON -Dwith-mpi=OFF -Dwith-python=OFF -Dwith-offload=ON /path/to/nest-simulator/

I've seen these reports. What has eventually led to this patch, ie where do constructors come into play?

When disabled partial linking and linked the original object file instead of the "unbundled" one manually, the global variable is initialized properly. It is a global variable which must call the constructor upon program start. With partial linking, this constructor is not called and the variable remains zeroinitialized.

Okay, so I wasn't happy with the current explanations because I don't like "fixing" an issue without understanding the problem. Here's a small reproducer:

$  cat main.cpp 
#include "test.h"

int main(int argc, char *argv[]) {
  m[1] = 2;
  return 0;
}
$ cat test.h 
#include <map>
#include <vector>

template <typename T>
struct B {
  static std::vector<int> v;

  virtual void f() {
    v.push_back(1);
  }
};

struct C : public B<int> {
  C() { }
};

template <typename T>
std::vector<int> B<T>::v;

extern std::map<int, int> m;
$ cat test.cpp 
#include "test.h"

std::map<int, int> m;

Compiling with clang++ -c main.cpp, clang++ -c test.cpp, clang++ main.o test.o -o main works fine and the resulting executable doesn't crash.
Now if instead partially linking test.o like ld -r test.o -o test.o.ld-r and then linking the executable with clang++ main.o test.o.ld-r -o main.ld-r outputs a binary that crashes at execution because the constructor of std::map<int> m is not called.

Digging in the object files reveals the reason:

$ readelf -S test.o | grep -n1 .init_array
281-       0000000000000008  0000000000000000 WAG       0     0     8
282:  [138] .init_array       INIT_ARRAY       0000000000000000  000009d0
283-       0000000000000008  0000000000000000 WAG       0     0     8
284:  [139] .rela.init_array  RELA             0000000000000000  00002460
285-       0000000000000018  0000000000000018   G      150   138     8
286:  [140] .init_array       INIT_ARRAY       0000000000000000  000009d8
287-       0000000000000008  0000000000000000  WA       0     0     8
288:  [141] .rela.init_array  RELA             0000000000000000  00002478
289-       0000000000000018  0000000000000018          150   140     8
$ readelf -S test.o.ld-r | grep -n1 .init_array
279-       0000000000000078  0000000000000000   A       0     0     4
280:  [137] .init_array       INIT_ARRAY       0000000000000000  00001270
281-       0000000000000010  0000000000000008 WAG       0     0     8
282:  [138] .rela.init_array  RELA             0000000000000000  00003b10
283-       0000000000000030  0000000000000018  IG      147   137     8

I haven't further looked into the contents of the sections, but I'd guess that the first .init_array in test.o contains the constructor for template <typename T> std::vector<int> B<T>::v. Because it might need to be merged with other TUs (in this case, it's also present in main.o) it is marked with a group flag (G). The second .init_array in test.o is probably the constructor for std::map<int, int> m and not marked with a group, but ld -r merges these two sections into one .init_array, now with a group. So when linking with main.o which also contains a constructor for template <typename T> std::vector<int> B<T>::v, the linker drops the call to the constructor of std::map<int, int> m:

$ readelf -S main | grep -n1 .init_array
43-       0000000000000160  0000000000000000   A       0     0     4
44:  [19] .init_array       INIT_ARRAY       0000000000006d98  00005d98
45-       0000000000000018  0000000000000008  WA       0     0     8
$ readelf -S main.ld-r | grep -n1 .init_array
43-       0000000000000160  0000000000000000   A       0     0     4
44:  [19] .init_array       INIT_ARRAY       0000000000006da0  00005da0
45-       0000000000000010  0000000000000008  WA       0     0     8

(note the difference in size of the two .init_array sections!)

Further notes:

  • Obviously, linking in a different order like clang++ test.o.ld-r main.o -o main.ld-r2 also results in a working executable, but that's not really a solution.
  • Calling ld -Ur doesn't change anything:
$ ld -Ur test.o -o test.o.ld-Ur
$ md5sum test.o.ld-*
a4d5cead3209ef191d5c05de63e398de  test.o.ld-r
a4d5cead3209ef191d5c05de63e398de  test.o.ld-Ur

Long story short: This very much looks like a bug in ld when using partial linking. So the best thing that Clang can do to (kind of) workaround this problem is ensuring that bundling + unbundling results in the bitwise-same host object file. However, I think we should still use partial linking for easy access to the host object file, even if we don't extract it from there (other tools using it from there have to blame ld -r, not clang-offload-bundler, that the partially linked object file doesn't correctly call global initializers).

ABataev updated this revision to Diff 215214.Aug 14 2019, 1:06 PM

Reworked to keep partial linking to make original host object available for analysis without unbundling.

Hahnfeld requested changes to this revision.Aug 15 2019, 6:54 AM

The code changes look good to me, but the test doesn't pass on x86. We've faced the same problem when clang-offload-bundler was initially committed and the current testing is the best we were able to do.

test/Driver/clang-offload-bundler.c
223–227 ↗(On Diff #215214)

This still holds: I can't partially link PPC object files on x86_64. Please revert the test changes to not actually perform the bundling.

tools/clang-offload-bundler/ClangOffloadBundler.cpp
373–375 ↗(On Diff #215214)

This isn't true anymore.

377 ↗(On Diff #215214)

I know this has been wrong before, but can you please fix to we just copy without use?

This revision now requires changes to proceed.Aug 15 2019, 6:54 AM
ABataev updated this revision to Diff 215399.Aug 15 2019, 7:42 AM

Fixed comments.

The code changes look good to me, but the test doesn't pass on x86. We've faced the same problem when clang-offload-bundler was initially committed and the current testing is the best we were able to do.

I reworked the test to make it more portable, try to test it on x86

Please submit the test changes unrelated to the code changes in a separate patch!

Hahnfeld accepted this revision.Aug 15 2019, 9:53 AM

LG, thanks for the changes.

This revision is now accepted and ready to land.Aug 15 2019, 9:53 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2019, 10:17 AM