This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Remove -Wl,-fini=__kmp_internal_end_fini
ClosedPublic

Authored by aaronpuchert on Nov 6 2019, 5:38 PM.

Details

Summary

The termination function duplicated the functionality of the
attribute((destructor))-annotated function kmp_internal_end_fini,
and we have no indication that this doesn't work.

The function might cause issues with link-time optimization turned on:
until very recently, none of the usual linkers was reporting functions
named in -Wl,-fini as used to the LTO plugin, so it might be dropped.
If the function is dropped, -Wl,-fini=__kmp_internal_end_fini doesn't
do what we want: with ld.bfd and lld it drops the FINI attribute from
.dynamic and with gold we get FINI = 0x0, which leads to a crash on
cleanup. This can be reproduced by building with

-DLLVM_ENABLE_PROJECTS="clang;openmp" \
-DLLVM_ENABLE_LTO=Thin \
-DLLVM_USE_LINKER=gold

The issue in lld has been fixed in f95273f75aa, but gold remains without
fix so far.

Fixes PR43927.

Event Timeline

aaronpuchert created this revision.Nov 6 2019, 5:38 PM
aaronpuchert edited the summary of this revision. (Show Details)Nov 6 2019, 5:39 PM
aaronpuchert edited the summary of this revision. (Show Details)Nov 6 2019, 5:53 PM

Belatedly filed bug 43927. Feel free to comment there if you think this isn't the proper fix.

Looks acceptable to me. As well as removing the kmp_internal_end_fini symbol at all, as it duplicates the functionality of kmp_internal_end_dtor which has attribute "destructor". I am OK with either solution, would be good to hear others opinion.

Does anybody knows a platform where "attribute((destructor))" does not work? If yes then we may need both destructors in place.

As well as removing the __kmp_internal_end_fini symbol at all, as it duplicates the functionality of __kmp_internal_end_dtor which has attribute "destructor". I am OK with either solution, would be good to hear others opinion.

That should solve the problem, too. The destructor doesn't seem to be optimized out.

Does anybody knows a platform where __attribute__((destructor)) does not work? If yes then we may need both destructors in place.

That's the interesting question. The comment saying that it doesn't work is from 2009, and might not reflect the current situation.

// 2009-09-08 (lev): It looks the destructor does not work. In simple test cases
// destructors work perfectly, but in real libomp.so I have no evidence it is
// ever called. However, -fini linker option in makefile.mk works fine.

Unfortunately it doesn't tell us where this has been observed. Since it came with rOMP191506, I would guess it was written by an Intel employee, but nobody can be expected to remember the details of this after 10 years.

Let's say we remove __kmp_internal_end_fini and rely on __kmp_internal_end_dtor, can we write a test case ensuring that __kmp_internal_end_atexit is called? That calls into __kmp_task_team_wait, so maybe we can exit while some tasks are still running, and get a different observable result depending on whether we wait or not?

aaronpuchert planned changes to this revision.Nov 12 2019, 6:18 PM

I will try removing __kmp_internal_end_fini with a test that __kmp_internal_end_atexit is still called.

I will try removing __kmp_internal_end_fini with a test that __kmp_internal_end_atexit is still called.

I don't know a reliable way to test that dtor was called. But was able to write a linux test which fails 99% if dtor was not called:

#include <dlfcn.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#define LIBNAME "libomp.so"
int volatile flag = 0;
int(*get_max_threads)(void) = NULL; // omp_get_max_threads
void *foo(void* arg)
{
  printf("enter foreign thread\n");
  fflush(0);
  if (get_max_threads)
  {
    int nt = get_max_threads();
    flag = 1;
    printf("sleeping foreign thread\n");
    fflush(0);
    usleep(100000);
    printf("exiting foreign thread\n");
    fflush(0);
  } else {
    printf("error loading runtime library\n");
    exit(1);
  }
  pthread_exit(0);
}
int main(int argc, char** argv )
{
  pthread_t t;
  int rc;
  char *error;
  void *h = dlopen(LIBNAME, RTLD_LAZY);
  if (h == NULL) {
    printf("dlopen failed with error %s\n", dlerror());
    exit(1);
  }
  get_max_threads = (int(*)(void))dlsym(h, "omp_get_max_threads");
  if ((error = dlerror()) != NULL) {
    fprintf(stderr, "%s\n", error);
    exit(2);
  }
  rc = get_max_threads();
  printf("initial thread registered for %d threads\n", rc);
  fflush(0);
  rc = pthread_create(&t, NULL, foo, NULL);
  if (rc) {
    printf("Could not create thread!\n");
    exit(-1);
  }
  // let foreign thread register itself with the runtime library
  // (to prevent conflict of initialization and shutdown)
  while(flag == 0) {
    usleep(1);
  }
  // try to cleanup
  dlclose( h );
  // foreign thread should crash if library closed w/o shutdown
  pthread_join(t, NULL);
  printf("passed\n");
  fflush(0);
  return 0;
}

Tried the test with regular library and with one with empty destructors:

$ clang -pthread test.c -ldl

The library with fake (empty) destructors:
$ ./a.out
initial thread registered for 88 threads
enter foreign thread
sleeping foreign thread
exiting foreign thread
Segmentation fault (core dumped)

The library with real destructor(s):
$ ./a.out
initial thread registered for 88 threads
enter foreign thread
sleeping foreign thread
exiting foreign thread
passed

Not sure though how to properly run such test in LIT (all library tests so far used -fopenmp, while this one needs to avoid it).

I don't know a reliable way to test that dtor was called. But was able to write a linux test which fails 99% if dtor was not called:

Write to a file in the destructor, check the file contents after execution?

aaronpuchert marked an inline comment as done.Nov 13 2019, 7:23 PM

foreign thread should crash if library closed w/o shutdown

Why exactly do we expect this crash?

Not sure though how to properly run such test in LIT (all library tests so far used -fopenmp, while this one needs to avoid it).

Is the problem with -fopenmp that the library is unloaded too late for any meaningful side effects? Why is the destructor still necessary with -fopenmp? Maybe we can observe memory leaks with the address sanitizer otherwise?

Or maybe we could utilize the tracing? That would be a bit unconventional, and we would need to exclude Release builds, but it would be pretty reliable.

A non-empty destructor will be called, modulo bugs and targets that haven't implemented ctors/dtors. It seems a safe bet that an architecture which can support shared objects and openmp will be able to run destructors. There's probably an existing test for them.

The function could be retained by marking it attribute(("used")), which may be necessary when statically linking this library anyway.

There's probably an existing test for them.

I can't say if there is a test case specifically for this, but indeed there are around 40 failing tests when I remove both termination functions. So I guess that means it's somehow tested already. Now I feel a bit stupid for not trying this before.

The function could be retained by marking it __attribute__(("used")),

Just to be clear, ThinLTO wasn't optimizing out the destructor, just the -Wl,-fini function. But maybe I'm misunderstanding here.

which may be necessary when statically linking this library anyway.

Why might it be necessary? I thought the destructor attribute is preserved in the object file, and on Unices static libraries are just collections of object files. Then the destructor should end up in the application binary, and should be registered there. Or do you mean that an application using libomp is completely statically linked, and doesn't even have a .dynamic section?

Remove __kmp_internal_end_fini instead of exporting it: it shouldn't be needed.

aaronpuchert retitled this revision from [libomptarget] Export __kmp_internal_end_fini to fix [Thin]LTO build to [OpenMP] Remove -Wl,-fini=__kmp_internal_end_fini.Nov 14 2019, 7:37 PM
aaronpuchert edited the summary of this revision. (Show Details)

The function I was suggesting marking used is the one passed to the linker as -fini, not the one marked destructor.

The function I was suggesting marking used is the one passed to the linker as -fini, not the one marked destructor.

Ok, so you were saying that we should be able to remove it, but alternatively we could also mark it as used. That makes sense.

This revision is now accepted and ready to land.Nov 18 2019, 8:25 AM

The current change (without the racy test) looks good to me.

This is presumably a user visible ABI break. As far as I know that's OK for the OpenMP runtime. Anyone have more information on the policy here?

The current change (without the racy test) looks good to me.

Yes, we don't seem to need an additional test case.

This is presumably a user visible ABI break.

Can you explain what makes this is an ABI break? I don't think that applications linking with libomp.so need to be recompiled/relinked. The only thing that changed is that the dynamic linker won't run __kmp_internal_end_fini anymore. It wasn't exported anyway. Whether the library is compiled with -fini has (to my knowledge) no effect on executable files linked with the library.

Can you explain what makes this is an ABI break? I don't think that applications linking with libomp.so need to be recompiled/relinked. The only thing that changed is that the dynamic linker won't run __kmp_internal_end_fini anymore. It wasn't exported anyway. Whether the library is compiled with -fini has (to my knowledge) no effect on executable files linked with the library.

In so far as a function which used to be present in the shared library now isn't. I'd missed that it was not exported. All good, thanks.

This revision was automatically updated to reflect the committed changes.