This is an archive of the discontinued LLVM Phabricator instance.

[elfabi] Add support for reading DT_SONAME from binaries

Authored by amontanez on Dec 12 2018, 4:46 PM.



This change gives the llvm-elfabi tool the ability to read DT_SONAME from a binary ELF file into an ELFStub.


  • DynamicEntries struct for storing dynamic entries that are relevant to elfabi.
  • terminatedSubstr() retrieves a null-terminated substring from a StringRef.
  • appendToError() appends a string to an error, allowing more specific error messages.

Diff Detail


Event Timeline

amontanez created this revision.Dec 12 2018, 4:46 PM
amontanez updated this revision to Diff 177978.Dec 12 2018, 5:05 PM

Small code formatting fix.

jakehehrlich added inline comments.Dec 18 2018, 5:14 PM
19 ↗(On Diff #177978)

So yaml2obj actually does support dynamic symbols but I'm not sure that it supports .dynamic. grep for "DynamicSymbols" in .test and .yaml files in the code base. There should be a yaml2obj test that uses this feature. You might find proper support for .dynamic as well.

80 ↗(On Diff #177978)

I'd prefer to only do one loop over the dynamic entries; right now you do a seperate loop for each feature (here just .dynstr and the soname, but later other things as well). It's not a huge deal as there's a pretty small upper bound on how long these can be. A standard strategy is to traverse it once and put everything in an array indexed by the tag.

56 ↗(On Diff #177978)

All these except readELFFile should be static and in ELFObjHandler.cpp and not in the header.


  • There is no longer one function to populate each ELFStub member.
  • The .dynamic table is only scanned once.
amontanez marked 4 inline comments as done.Dec 20 2018, 10:59 AM
amontanez added inline comments.
19 ↗(On Diff #177978)

Unfortunately, the only mention of the .dynamic section in yaml2obj is when sh_entsize of SHT_DYNAMIC is populated. If I recall correctly, using DynamicSymbols causes .dynstr to be overwritten.

amontanez marked an inline comment as done.

Small change to make functions static.

jakehehrlich added inline comments.Dec 20 2018, 12:20 PM
68–75 ↗(On Diff #179108)

If they're never supposed to be missing, does it make sense for them to be Optional?

111 ↗(On Diff #179108)

nit: Don't fret over this too much but It would be nice if there were a way to ensure that when looking for the null terminator this didn't fall off the end of the string table. As is if the string table happens to be erroneous and not contain a null terminator, this will segfault.

amontanez marked 2 inline comments as done.


  • Mandatory DynamicEntries members are now uint64_t instead of Optional<uint64_t>, now include more checks for correctness.
  • Bound checking when creating SoName string from .dynstr table.
amontanez updated this revision to Diff 180171.Jan 3 2019, 4:09 PM

Addressed feedback in D55852: strings that overrun the bounds of a string table (due to absent null terminator or incorrect string table size) are reported as errors in terminatedSubstr().

jakehehrlich accepted this revision.Jan 3 2019, 4:29 PM

LGTM. However make sure to at least CC some people outside, if not add them as reviewers on everything.

This revision is now accepted and ready to land.Jan 3 2019, 4:29 PM
jakehehrlich requested changes to this revision.Jan 3 2019, 5:04 PM

As discussed offline I realized I missed something.

35–37 ↗(On Diff #180171)

Also this has no size, it should cover the whole PT_DYNAMIC and the string table. There isn't a way to start the offset in zero in yaml2elf (my fault, sorry) so you'll have to start it somewhere else anyhow to make the vaddr and offset consistent anyway.

140 ↗(On Diff #180171)

Ah wait, never-mind, I want this to work a bit differently. We discussed this offline but for others benifit I'll mention it here. This should work for cases when the vaddr which corresponds to ElfFile->base() is not zero (e.g. the common case for ET_EXEC and the almost never but sometimes case for ET_DYN). To do this you have to loop though the program headers, find the PT_LOAD such that the vaddr of the string table is in range of the vaddr to vaddr+filesz of the program header, and then use the offset of the program header and the difference between the vaddr of the program header and the string table to calculate the correct offset from base of the elf file. That's kind of a tricky subtle issue. You can test this by creating a binary using yaml2obj and making the vaddr of the program header that contains a dynamic string table non-zero.

This revision now requires changes to proceed.Jan 3 2019, 5:04 PM
amontanez updated this revision to Diff 180305.Jan 4 2019, 12:56 PM
amontanez marked 2 inline comments as done.

The offset of .dynstr is now calculated using the DT_STRTAB address along with the program headers.

jakehehrlich accepted this revision.Jan 4 2019, 2:27 PM

Super informative error messages!

two small things to fix before you land.

130 ↗(On Diff #180305)

This hole Range thing is super confusing to me. Just have Addr and Size and just demand the user supplies Size.

135 ↗(On Diff #180305)

You can just do Addr >= Entry.p_vaddr && Addr + Size < Entry.p_vaddr + Entry.file_sz

This revision is now accepted and ready to land.Jan 4 2019, 2:27 PM
jhenderson added inline comments.Jan 7 2019, 3:44 AM
16–19 ↗(On Diff #180305)

I would whole-heartedly support extending yaml2obj to allow for dynamic tags to be specified in a more readable format, possibly similar to how relocations or symbols are written?

21 ↗(On Diff #180305)

Do you actually need a .dynsym for this and the other tests?

27 ↗(On Diff #180305)

Can you just replace this with an explicit size?

33 ↗(On Diff #180305)

I'm not sure I understand the purpose of this comment... I think it's fairly obvious what the content is!

46 ↗(On Diff #180305)

Perhaps worth specifying the string offset in this error message?

1 ↗(On Diff #180305)

Which address is the "bad vaddr"? I assume it's the DT_STRTAB, in which case, I'd prefer the test to be called "binary-read-bad-dt-strtab" or similar.

16 ↗(On Diff #180305)

You could actually just have DT_SONAME 0, and a single null byte in the .dynstr to simplify this test slightly. Same goes for other error cases.

38 ↗(On Diff #180305)

Should this list include .dynamic too?

In general, I'm noticing some inconsistencies in the tests in what is in which segment. Could you try to unify them as much as possible, assuming that the differences aren't important?

49 ↗(On Diff #180305)

Nit: unnecessary blank line here.

34–35 ↗(On Diff #180305)

This is confusing. There's a '\0' on the end of the contents according to the section. Perhaps it should be removed?

34 ↗(On Diff #180305)

To simplify things slightly, remove the "not\0bar\0" content. We only really need one thing before and one thing after the soname string.

52 ↗(On Diff #180305)

Add a {{$}} at the end of this string, to make sure it doesn't have anything after it (this would match somelib.sofoo for example).

1 ↗(On Diff #180305)

I'm not sure we need this case, unless you want to explicitly check the validity of the .dynstr with llvm-elfabi. If it is zero-sized, the check for whether DT_SONAME is in it will achieve this.

47 ↗(On Diff #180305)

Why not? There's nothing requiring the first loadable segment to contain the elf header and program header table. A zero address is therefore possible for .dynstr.

Also, all these errors should be "address" not "offset", since that's what they refer to.

1 ↗(On Diff #180305)

This would be better named "insert-soname". A separate test for what to do if there is already a SONAME is also needed.

42 ↗(On Diff #180305)

This and the following function don't need to be in the anonymous namespace. I also don't think the DynamicEntries struct needs to be either.

74–75 ↗(On Diff #180305)

I'm not sure I understand what "binary" is referring to here.

79 ↗(On Diff #180305)

I don't think you need to specify "binary" here. Theoretically, this function would work just as well if the table was created in memory rather than read from a file.

95 ↗(On Diff #180305)

As noted in the test, this isn't true.

100 ↗(On Diff #180305)

As noted in the test, this probably isn't a necessary check.

130 ↗(On Diff #180305)

I feel like this has been done before in LLVM. At least llvm-objcopy (?) has something very similar. Do we need to reinvent the wheel?

@jakehehrlich - can you remember exactly where this was?

amontanez updated this revision to Diff 180587.Jan 7 2019, 4:50 PM
amontanez marked 20 inline comments as done.

Cleaned up tests and made them significantly more consistent.

amontanez marked an inline comment as done.Jan 7 2019, 5:02 PM
amontanez added inline comments.
16–19 ↗(On Diff #180305)

I agree, after working on theses tests for a while I think something to assist populating .dynamic is in order. Keeping the content and comments consistent becomes rather tedious. I'll get to that asap so it can precede this patch.

135 ↗(On Diff #180305)

I've split it this way so there's a distinction between

  • an address not covered by any PT_LOAD
  • an address range partially covered by a PT_LOAD

While it's not super useful for the typical binary, it's helpful for correctly designing tests.

Basically looks fine to me, with one outstanding issue, namely that I feel like the address to offset code is a reinvention of something either in llvm-objcopy or llvm-objdump somewhere, but I don't remember where unfortunately. I'd suggest checking how llvm-objdump (or possibly llvm-readobj) looks up information relating to dynamic symbols etc.

46–58 ↗(On Diff #180587)

You could probably simplify both of these to just check the SoName field. Same goes for other tests.

Also add the {{$}} here and above.

156 ↗(On Diff #180587)

Nit: Shouldn't this static be before the return value?

amontanez marked 2 inline comments as done.

Replaced addrAsOffset() with ELFFile<ELFT>::toMappedAddr(), updated tests.

Small change from (const char *)*DynStrPtr to (const char *)DynStrPtr.get() for clarity.

jhenderson accepted this revision.Jan 15 2019, 2:05 AM

LGTM, aside from one small comment.

151 ↗(On Diff #181596)

I was under the impression that C++ style casts were preferred, but I could be wrong. I also don't see a good reason to change it from what was there before (i.e. reinterpret_cast).

amontanez marked an inline comment as done.
amontanez edited the summary of this revision. (Show Details)

using reinterpret_cast now. Since it looks like it might take a bit of time to land support for dynamic entries in yaml2obj, do you feel it would be fine to land this patch now and update the tests later?

jhenderson accepted this revision.Jan 16 2019, 1:42 AM

I think it's fine to land this as is.

This revision was automatically updated to reflect the committed changes.