This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Parse SHT_GNU_verneed and respect versioned undefined symbols in shared objects
ClosedPublic

Authored by MaskRay on May 16 2020, 12:43 AM.

Details

Summary

An undefined symbol in a shared object can be versioned, like f@v1.
We currently insert f as an Undefined into the symbol table, but we
should insert f@v1 instead.

The string v1 is inferred from SHT_GNU_versym and SHT_GNU_verneed.
This patch implements the functionality.

Failing to do this can cause two issues:

  • If a versioned symbol referenced by a shared object is defined in the executable, we will fail to export it.
  • If a versioned symbol referenced by a shared object in another object file, --no-allow-shlib-undefined may spuriously report an "undefined reference to " error. See https://bugs.llvm.org/show_bug.cgi?id=44842 (Linking -lfftw3 -lm on Arch Linux can cause undefined reference to __log_finite)

Diff Detail

Event Timeline

MaskRay created this revision.May 16 2020, 12:43 AM
grimar added inline comments.May 18 2020, 2:41 AM
lld/ELF/InputFiles.cpp
1217

in llvm-readobj?

1219

I have a feeling that this method does too much here.

In llvm-readelf we have the following method:

template <class ELFT>
Expected<std::vector<VerNeed>>
ELFDumper<ELFT>::getVersionDependencies(const Elf_Shdr *Sec) const;

One of possible solutions could be to refactor it and move the logic to ELFFile<ELFT> so that
it could be reused from LLD.

Or, if not, then the code here could be simplified a bit, see my comments below.

1226

I think you can use sec->sh_size?

const uint8_t *secEnd = obj.base() + sec->sh_size;

It should be enough to check that we are not going past the end of the section.

1231

>= I think.

1235

What about introducing a helper lambda:

auto checkBuf = [](const uint8_t *Buf, const Twine& Name) {
  if (B < secEnd && uintptr_t(verneedBuf) % sizeof(uint32_t) == 0)
    return true;
  error(... invalid " + Name + " at ... )
  return false;
}

It should simplify the code a bit I guess.

1253

I am not sure such precise diagnostic is needed for LLD. It could be just: "error parsing SHT_GNU_verneed: broken vna_name" or just "error parsing SHT_GNU_verneed" for all errors here. So you'll be able to use a new helper lambda too.

I think it is imprortant that llvm-readelf/llvm-readobj is able to say what is wrong with the object, but probably no need to overcomplicate the code in LLD to report the details. Alternatively, getVersionDependencies code can be moved to ELFFile<ELFT> and reused, as I've mentioned earlier. With that LLD would report detailed errors, but do not do the parsing on its side.

lld/ELF/InputFiles.h
367

It should probably be protected/private.

Just thinking if it would be possible to resolve this without having to read the verneed section of shared objects. I've not checked to see if this would make a simpler implementation though.

If a versioned symbol referenced by a shared object is defined in the executable, we will fail to export it.
If a versioned symbol referenced by a shared object in another object file, --no-allow-shlib-undefined may spuriously report an "undefined reference to " error. See https://bugs.llvm.org/show_bug.cgi?id=44842 (Linking -lfftw3 -lm on Arch Linux can cause undefined reference to __log_finite)

Reading through https://www.akkadia.org/drepper/symbol-versioning I'm not sure if a versioned symbol being defined by the application is in the spirit of the specification. As I understand it when linking an undefined reference the static linker only binds against the default version, this is then recorded as the version needed. I'm trying to work out if there is ever a case where we could create a shared object that could depend on a versioned definition in an executable as we wouldn't have the symbols from the executable when we create the shared library to record the version needed. I guess matching versioned definitions in the application could be used to preempt/intercept a version defined by a shared library, but that is about all I can think of. Do other linker's and dynamic loaders support this use case?

For a versioned symbol referenced by a shared object, that is defined in another shared object in the link, for the purposes of no-allow-shlib-undefined it could be possible with some loss of accuracy, to resolve it as defined if there were any versioned symbol definitions. This would miss the case where the versions don't match, but this would be diagnosed by the Dynamic Loader, however it wouldn't error for a reference to log_finite from a shared library to log_finite@GLIBC_2.15

MaskRay updated this revision to Diff 264659.May 18 2020, 9:33 AM
MaskRay marked 6 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Address comments

MaskRay added inline comments.May 18 2020, 9:36 AM
lld/ELF/InputFiles.cpp
1231

> makes an unlikely but valid situation (vn_cnt==0) work.

1235

The similar diagnostics occur twice. For checkBuf, we will need:

if (!checkBuf(...))
  return false;

The end result will be longer.

1253

I'll keep the parsing logic in LLD but simplify the diagnostics for now.

Just thinking if it would be possible to resolve this without having to read the verneed section of shared objects. I've not checked to see if this would make a simpler implementation though.

Thanks for raising the point. We indeed should think whether there are simpler ways to achieve something. See below, I think we need to have parseVerneed.

If a versioned symbol referenced by a shared object is defined in the executable, we will fail to export it.
If a versioned symbol referenced by a shared object in another object file, --no-allow-shlib-undefined may spuriously report an "undefined reference to " error. See https://bugs.llvm.org/show_bug.cgi?id=44842 (Linking -lfftw3 -lm on Arch Linux can cause undefined reference to __log_finite)

Reading through https://www.akkadia.org/drepper/symbol-versioning I'm not sure if a versioned symbol being defined by the application is in the spirit of the specification. As I understand it when linking an undefined reference the static linker only binds against the default version, this is then recorded as the version needed. I'm trying to work out if there is ever a case where we could create a shared object that could depend on a versioned definition in an executable as we wouldn't have the symbols from the executable when we create the shared library to record the version needed. I guess matching versioned definitions in the application could be used to preempt/intercept a version defined by a shared library, but that is about all I can think of. Do other linker's and dynamic loaders support this use case?

GNU ld exports the preempted symbol and respects -y. We don't currently have the diagnostics (resolveShared does not check Symbol::traced):

echo 'v1 { f; };' > a.ver
echo '.globl f_v1; .symver f_v1,f@v1; f_v1: g:' | as - -o a.o
ld.bfd -shared --version-script a.ver a.o -o a.so
ld.bfd --version-script a.ver a.o a.so -o a.so -y f@v1
# ld.bfd: a.o: definition of f@v1
# ld.bfd: a.so: reference to f@v1   ## we don't have this line
# ld.bfd: warning: cannot find entry symbol _start; defaulting to 0000000000401000

I have checked that the approach is similar to gold's.

For a versioned symbol referenced by a shared object, that is defined in another shared object in the link, for the purposes of no-allow-shlib-undefined it could be possible with some loss of accuracy, to resolve it as defined if there were any versioned symbol definitions. This would miss the case where the versions don't match, but this would be diagnosed by the Dynamic Loader, however it wouldn't error for a reference to log_finite from a shared library to log_finite@GLIBC_2.15

As the patch shows, the SHT_GNU_verneed parsing complexity is surmountable and we can get three things: (1) proper preemption semantics (arguably preemption may not be the desire of the application but the behavior matches GNU ld and gold) (2) proper --no-allow-shlib-undefined (3) making -y __log_finite@GLIBC_2.15 work in a separate change. So I am in favor of doing it.

MaskRay updated this revision to Diff 264671.May 18 2020, 10:06 AM

Rename verneed-shared-undefined.s to verneed-shared.s

MaskRay updated this revision to Diff 264937.May 19 2020, 8:57 AM

Rebase after D80143

grimar added inline comments.May 20 2020, 1:51 AM
lld/ELF/InputFiles.cpp
1217

It was not addressed. It doesn't read right to me:
"...but do not implement sophisticated error checking in llvm-readobj because..."

I guess you have meant "checking in LLD" or "checking like in llvm-readobj".

1225

This looks a bit cleaner probably:

ArrayRef<uint8_t> data =
    CHECK(obj.template getSectionContentsAsArray<uint8_t>(sec), this);
const uint8_t *verneedBuf = data.begin();

And then you can just inline bufEnd as data.end() I think. See below.

1228

sh_offset for ELF64 is packed<uint>, which is std::conditional_t<Is64, uint64_t, uint32_t>, i.e. it is uint64_t,
so I think size_t is not a right thing here.

1235

I also do not think it is important to use error() for invalid objects. In many places
LLD calls fatal(). Then you can simplify the code. E.g (I've not checked it):

  ArrayRef<uint8_t> data =
      CHECK(obj.template getSectionContentsAsArray<uint8_t>(sec), this);
  const uint8_t *verneedBuf = data.begin();

  auto checkBuf = [&](uint8_t *buf, size_t size, size_t align) {
    if (buf + size > data.end() || uintptr_t(buf) % align != 0)
      fatal(toString(this) + ": unable to parse SHT_GNU_verneed");
    return buf;
  };

  for (unsigned i = 0; i != sec->sh_info; ++i) {
    auto *vn = reinterpret_cast<const typename ELFT::Verneed *>(
        checkBuf(verneedBuf, sizeof(typename ELFT::Verneed), sizeof(uint32_t)));
    const uint8_t *vernauxBuf = verneedBuf + vn->vn_aux;
    for (unsigned j = 0; j != vn->vn_cnt; ++j) {
      auto *aux = reinterpret_cast<const typename ELFT::Vernaux *>(checkBuf(
          vernauxBuf, sizeof(typename ELFT::Vernaux), sizeof(uint32_t)));
      uint16_t version = aux->vna_other & VERSYM_VERSION;
      if (version >= verneeds.size())
        verneeds.resize(version + 1);
      if (aux->vna_name >= this->stringTable.size())
        fatal(toString(this) + ": unable to parse SHT_GNU_verneed");
      verneeds[version] = aux->vna_name;
      vernauxBuf += aux->vna_next;
    }
    verneedBuf += vn->vn_next;
  }
  return verneeds;
}
lld/ELF/Symbols.cpp
717 ↗(On Diff #264937)

unrelated

MaskRay updated this revision to Diff 265259.May 20 2020, 8:20 AM
MaskRay marked 7 inline comments as done.

Address comments

lld/ELF/InputFiles.cpp
1235

Changed error to fatal. This saves 6 lines. A helper checkBuf may still be a bit unnecessary I think. I do want to give a bit more contex (Vernaux parsing error instead of Verneed parsing error).

grimar added inline comments.May 21 2020, 2:23 AM
lld/ELF/InputFiles.cpp
1225

You've missed this one?
Using of getSectionContentsAsArray has benefits:

It would check the sh_offset/sh_size early and might provide a better error message:

ELFFile<ELFT>::getSectionContentsAsArray(const Elf_Shdr *Sec) const {
...
  uintX_t Offset = Sec->sh_offset;
  uintX_t Size = Sec->sh_size;

...
  if (std::numeric_limits<uintX_t>::max() - Offset < Size)
    return createError("section " + getSecIndexForError(this, Sec) +
                       " has a sh_offset (0x" + Twine::utohexstr(Offset) +
                       ") + sh_size (0x" + Twine::utohexstr(Size) +
                       ") that cannot be represented");
  if (Offset + Size > Buf.size())
    return createError("section " + getSecIndexForError(this, Sec) +
                       " has a sh_offset (0x" + Twine::utohexstr(Offset) +
                       ") + sh_size (0x" + Twine::utohexstr(Size) +
                       ") that is greater than the file size (0x" +
                       Twine::utohexstr(Buf.size()) + ")");

It also hides the calculations done here like obj.base() + sec->sh_offset and
obj.base() + std::min<uint64_t>(sec->sh_offset + sec->sh_size, obj.getBufSize());,
what is good.

grimar added inline comments.May 21 2020, 4:47 AM
lld/ELF/InputFiles.cpp
1225

Ah, and you can actually getSectionContents:

template <class ELFT>
Expected<ArrayRef<uint8_t>>
ELFFile<ELFT>::getSectionContents(const Elf_Shdr *Sec) const {
  return getSectionContentsAsArray<uint8_t>(Sec);
}
MaskRay updated this revision to Diff 265560.May 21 2020, 11:42 AM
MaskRay marked 4 inline comments as done.

Use llvm::object::ELFFile::getSectionContents

grimar added inline comments.May 22 2020, 1:08 AM
lld/ELF/InputFiles.cpp
1226

This doesn't look right to me:

  1. obj.getSectionContents(sec) returns Expected<ArrayRef<uint8_t>> and you do not handle an error,

the code will fail with LLVM_ENABLE_ABI_BREAKING_CHECKS. Do we have a test for this case?

  1. Not sure why you return no entries. This invocation might fail when sh_size or sh_offset is broken.

Shouldn't we call fatal() too?

MaskRay updated this revision to Diff 265763.May 22 2020, 9:54 AM
MaskRay marked 3 inline comments as done.

Address comments

lld/ELF/InputFiles.cpp
1226

Sorry, missed this. The first test (ShOffset: 0xFFFFFFFF) tested this scenario. It will indeed break LLVM_ENABLE_ABI_BREAKING_CHECKS` builds (which I can't test now because of some unrelated llvm-tblgen problems)

grimar accepted this revision.May 23 2020, 7:14 AM

LGTM, thanks!

lld/ELF/InputFiles.cpp
1226

nit: you probably might want to use CHECK as it often used in LLD:
(looks shorter, and I think will provide exactly the same error output)

ArrayRef<uint8_t> data = CHECK(obj.getSectionContents(sec), this);

This revision is now accepted and ready to land.May 23 2020, 7:14 AM
MaskRay updated this revision to Diff 265870.May 23 2020, 9:19 AM
MaskRay marked 2 inline comments as done.

Use CHECK(...)

This revision was automatically updated to reflect the committed changes.