This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Add dynamic section printing to private-headers option
ClosedPublic

Authored by paulsemel on Jul 6 2018, 3:19 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

paulsemel created this revision.Jul 6 2018, 3:19 AM

Sorry, forgot to add the test. I will do it after lunch !

paulsemel updated this revision to Diff 154393.Jul 6 2018, 5:46 AM

Added a test

I noticed that llvm-readobj already implements a lot of stuff that would be helpful for this - specifically, it already prevents a mechanism for getting the dynamic tags. I don't think it makes sense to reinvent the wheel, so maybe you should consider finding a way to share that code between the two. It would probably need to be in the llvmObject library.

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

As with other tests, I'd rather not use a pre-canned binary here, especially one that isn't directly associated with this test. Please use yaml2obj.

5

Assuming yaml2obj is able to do this, I'd just have one of each tag we need to print (perhaps 2 for NEEDED, so that we show our string table look ups work properly). If it isn't, I'd look at extending yaml2obj in preference to messing about trying to get a binary with everything.

We also need to have one of each tag we support printing of, plus one tag that isn't interpreted at this point. In particular, we need both DT_REL and DT_RELA-style tags printing, but there are plenty of other untested tags too. I don't know if yaml2obj will allow all of these in a single file, so you might need multiple test cases.

tools/llvm-objdump/ELFDump.cpp
30

What happens if you have a malformed segment with no DT_NULL tag?

You should probably use the program header's p_filesz field as an additional check to avoid overrunning the end of the array. If we hit p_filesz and not DT_NULL, we should probably warn. Assuming you can get yaml2obj to create such a section, I'd also expect a test case for that.

35

I'd be a little bit more specific and say "DT_STRTAB entry..."

43

See my earlier comment about DT_NULL not being present.

44–45

I'm not sure we should do this upfront. I think it should be on an as-needed basis. I also think that there's something in the llvm support library that turns an integer into a hex string - see utohexstr etc.

46

You can leave this unitialised, or better yet use a StringRef.

118

I don't like the fact that this "print" function has side effects of looking up the dynamic segment. I'd rather have an extra loop in printDynamicSection do the lookup (which should be simple, because we only care about one kind of segment).

I'm also not sure that printProgramHeaders should also print the contents of the dynamic segment. I think that feels like a separate operation (i.e. I could imagine a case where you want to just print the dynamic segment, but without the rest of the program headers). I think that implies that the two should be separate unrelated functions, perhaps with a single third function printPrivateHeaders or similar that prints both.

145

DT_LOOS shouldn't be printed like this. It's a boundary marker, so would actually be some platform specific value (e.g. DT_GNU_MY_MAGIC_TAG).

Similar comments go for DT_HIGHOS, DT_LOPROC, and DT_HIGHPROC, although I see you haven't got them in the list.

147

What made you pick just this set of platform specific values?

I'd actually suggest that you should have printing for all dynamic tags listed in llvm/BinaryFormat/DynamicTags.def.

159

This is not a reliable way of looking up the base address of an ELF, because PT_PHDR is optional. I'm actually not really sure what the best way is (you could use the PT_LOAD segments as well), but fortune has it that it's unnecessary, since the dynamic segment itself has an address: you can use that to calculate the relative offset between the segment start and the dynamic string table, and use that relative offset to find the string table section. I think the following calculation would work, though you should probably check it:

StrtabOffset = DynamicSegment->p_offset - (DynamicSegment->p_vaddr -
                  StrtabTag->d_un.d_val)
169

Why 18? (Please add a comment).

174

What about if there's an error? You should not just ignore it...

paulsemel updated this revision to Diff 154945.Jul 11 2018, 1:30 AM
paulsemel marked 5 inline comments as done.

Added Jame's suggestions. Refactor code.

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

Ok.. I gave it a try. The problem is not about building the dynamic section (though it might lead to confusing tests), but more about all the needed parts we need to have to get a relevant test (Program headers, string tables etc..)
What do you think ?

tools/llvm-objdump/ELFDump.cpp
30

Ok.. I finally added a helper function in libObject, I think it is the right way to do it.
Moreover, DT_NULL is needed by the ELF specification (it must end the Dynamic section), so let's issue an error.

147

This was just to initiate this feature, I didn't really know which one to handle. But I will just take every tags in DynamicTags.def

147

Ok I'll answer here for this comment and the one above. I decided to take get all the non architecture specific tags for the moment.

159

Your method is not working on my testing binary. The address given in the dynamic section are relative to PHDR, so I do not understand how you want to do do it an other way.
I'll let it like this until I find another solution, but this one is not working :/

174

Well, I think we should just consume the error and fallback to the hex print (that's what being done here basically - I just need to consume the error)

(moving this comment out of line, because the comments got completely out of sync with the code):
Regarding the VirtBase calculation, yeah, you're right, there was a mistake on my part - I made an incorrect assumption regarding program layout.

But you definitely cannot rely on PT_PHDR, because it isn't always present.

There might be a better way than this, but I think one way we can rely on, is to loop through the program headers until finding one whose address range (i.e. p_vaddr to p_vaddr + p_memsz) includes the value of DT_STRTAB, then subtract that segment's p_vaddr from the DT_STRTAB value (giving us the relative offset within the segment of the section), before finally adding the p_offset value.

Aside: since you are now touching the libObject code, you might want to see if there are any people you should add to the reviewers from that area.

include/llvm/Object/ELF.h
451

I think this should use the dynamic program header, rather than the sections, because a runnable program doesn't need section headers (but it does not program headers). It'll be faster, because there are generally fewer program headers than sections too. Maybe it could fall back to using the section headers if there is no PT_DYNAMIC segment.

Are the functions in this file unit-tested at all? If so, we should add them. If not, we still might want to add some, but it would depend on how straightforward it is to do.

472

I'd rephrase this as "invalid empty dynamic section" (or segment if using the program headers), or similar. I also have a personal preference to be explicit with checks (i.e. do numEntry == 0), though I'm not too fussed if you prefer it this way.

478

I wonder if we should cache this result somehow, so that the majority of this function is only run once per file?

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

Does yaml2obj not allow you to pass in arbitrary values to the dynamic tags then?

Regardless, that complexity is there already, it's just hiding away in a pre-canned binary, which will make it harder to work out what to do with this test, if that binary is ever updated and/or this test ever starts to fail. I'd rather we be explicit, and not tie ourselves to that binary, if at all possible.

tools/llvm-objdump/ELFDump.cpp
21

I'd be surprised if this doesn't already exist in the LLVM support library somewhere...

30

Normally, I'd agree with you, but objdump's job is not to manipulate things, only to give information about things. That means that we should try to give as much information as possible, especially as in this case, a missing DT_NULL does not prevent us doing so. Emitting an error will presumably stop us doing any further dumps the user has requested. So I think a warning would be more appropriate.

89

Why? Also, I think this comment should be a bit more verbose, so that it is clearer. Depending on the reason, perhaps something like:

"We use "-21" in order to match GNU objdump's output. <explain reason briefly>"

(also, don't forget the trailing full stop).

94

Why not make the second one PRIx32, and then get rid of the cast when formatting later?

101

No need for this else following the continue.

jhenderson added inline comments.Jul 11 2018, 2:34 AM
tools/llvm-objdump/ELFDump.cpp
51

Similar to my earlier comments, maybe it would be better to warn for this case, and fall back to just printing the hex value, like you suggested.

This probably goes for all errors that result in us being unable to load the dynamic string table. Also, where possible, we should test each of those errors.

59

This and others should be "report_error" available in llvm-objdump.h, not report_fatal_error. They also just take an Error, so you don't have to mess about with errorToErrorCode.

145

Does your new method address this comment?

147

I think having the non-architecture-specific tags only is sensible for the first instance. A later change can add support for them (we should only decode them when using the appropriate machine type).

paulsemel marked 3 inline comments as done.Jul 16 2018, 2:04 AM

Little follow-up. I'm pretty sure it would require changes to yaml2obj to get this test written in yaml format.
Why not creating my own binary for the moment, and wait for someone (me if I have time ?) to implement this feature ?

include/llvm/Object/ELF.h
478

It's up to you. For the moment, I've just mimicked the behavior of program_headers and sections :)

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

Ok, I'll try to be more verbose on this. Here is what I would like to have in yaml2obj :

Dynamic:
        DT_NEEDED            "libc.so"
        DT_WHATEVER       0x12345678

And here is what we have :

Sections:
  - Name:            .dynamic
    Type:            SHT_DYNAMIC
    Flags:           [ SHF_ALLOC, SHF_WRITE ]
    Content:         0x0000001012002030000400

And for each reference in the dynamic section, we need to trick to build a correct binary. (thus in our case, we need to build a correct program_headers, strtab etc..). Really, it'll just get a big mess and loss of time.

This feature might definitely be added in yaml2obj. I admit that I don't really feel like building that a complicated yaml file :)

tools/llvm-objdump/ELFDump.cpp
21

Tried to grep but was there. Do you know where this kind of macros are in LLVM ?

30

Is it even possible to issue warnings from libObject ? (as now this code has moved to the libobj)

59

Well, I can't access file name here, I just mimicked the behavior above, I think this is also the reason they did this no ?
I'm probably missing something

89

Is there a rational reason apart from "We want to be GNU command line compliant" ? Not sure we might try to explain smth.

94

Seems that a static assert prevent us from doing it. See what's being done below for program header.

145

Yes, to have the markers, you need to define another macro

paulsemel updated this revision to Diff 155631.Jul 16 2018, 2:04 AM

Added Jame's suggestions

Little follow-up. I'm pretty sure it would require changes to yaml2obj to get this test written in yaml format.
Why not creating my own binary for the moment, and wait for someone (me if I have time ?) to implement this feature ?

I think this would be a reasonable way forward.

I'm still concerned by the amount of duplication between llvm-readobj's handling of this and llvm-objdump's. My impression is that only the printing and error handling sections should be distinct. The methods for looking up values etc should be in libObject.

I'm out of time now, so haven't looked at everything yet, but hopefully this will help you continue moving.

include/llvm/Object/ELF.h
459

Too much auto. Please use the explicit type here.

474

Similar to above, please use the explicit type here.

476

You can use getSectionContentsAsArray here, I think.

478

Okay, sure. I guess it's unlikely a program will need to do this multiple times, and it could be the thing to do the caching if its important. So no need to do it, I think.

488

What about if DynSecSize % sizeof(Elf_Dyn) is non-zero? That should probably be an error too.

491

I would call this "Last" or similar, since End implies one past the end in iterator terminology. That being said, it might just be easier to do the makeArrayRef before this and use the methods on the ArrayRef instead to query the last element (specifically back).

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

Okay, those are all fair comments. I'm happy for a pre-canned binary, but I'd prefer it if it is dedicated to this feature.

tools/llvm-objdump/ELFDump.cpp
21

Well, I'd expect this to be an STL extension, since it's in C++17 (see https://en.cppreference.com/w/cpp/iterator/size), and I see that there is a size method in STLExtras.h. Try taking a look at that. I'm not sure it'll quite work, as I think it is implementing version 1 from the reference page only, but maybe you could try putting up a separate review for adding version 2 support?

30

I'd expect library code to emit "Error" or "Expected" returns, and then the consumer to decide what to do with them. You can see a similar thing I did earlier this year for the DWARF debug line parser, which now returns Expected/Error instances, and the consumers often convert these into warnings (see rL331971).

51

Why not just propagate the Error back up the stack? You are already returning an Expected.

59

I'm not sure which one this lined up with before, but it looks to me that you have two options in most cases, 1) return Error/Expected up the stack, so that the call site handles it, or 2) Pass the file name into the function.

paulsemel marked 6 inline comments as done and an inline comment as not done.Jul 16 2018, 1:18 PM
paulsemel added inline comments.
include/llvm/Object/ELF.h
493

I don't think we need to issue an error here actually

493

My bad.. old comment

tools/llvm-objdump/ELFDump.cpp
30

Hmm.. I don't have any real arguments against this, but I don't really like doing this..
Isn't it another way to do this ?

paulsemel updated this revision to Diff 155745.Jul 16 2018, 1:19 PM

Added Jame's suggestions.

jhenderson added inline comments.Jul 17 2018, 2:31 AM
include/llvm/ADT/STLExtras.h
1041–1045 ↗(On Diff #155745)

Be careful that this doesn't get submitted with everything else...

include/llvm/Object/ELF.h
451

I don't think you've addressed this comment:

Are the functions in this file unit-tested at all? If so, we should add them. If not, we still might want to add some, but it would depend on how straightforward it is to do.

test/tools/llvm-objdump/private-headers-dynamic-section.test
28

Could you also add SHT_RELR cases here, please?

tools/llvm-objdump/ELFDump.cpp
21

Can you remove this now?

30

Well, we need to report the error back somehow from the library, and we don't want the library to do the printing, because then different consumers can't customise according to their needs. However, we're not actually in a library here, so this is all a bit moot!

44

o -> O (or probably better just Elf or similar, since it's not an object file if its got a dynamic section).

50–51

Is this a mistake here? Because it doesn't look like you meant to write this...

68

This should be report_error, like below.

72–74

I don't think you've addressed my earlier comments about not using VirtBase, derived from a PT_PHDR?

83–89

This sort of area is a good example of why I think we should be reusing code from llvm-readobj. Please investigate this as a priority. In particular, I note that llvm-readobj does not have to loop through the whole set of tags to get the right string.

paulsemel updated this revision to Diff 155879.Jul 17 2018, 7:14 AM
paulsemel marked 5 inline comments as done.

Alright.. Again, there was a lot of changes I wanted to do, so sorry if I messed smth @jhenderson.
I eventually moved some code from llvm-readobj to libObject, and I will refactor llvm-readobj similar code in another PR !

paulsemel added inline comments.Jul 17 2018, 7:17 AM
include/llvm/ADT/STLExtras.h
1041–1045 ↗(On Diff #155745)

Sure, I will wait for the other patch to be accepted, and then I'll rebase

include/llvm/Object/ELF.h
451

No there is basically two features unit tested in the libObject. Do you really want me to do one ?

test/tools/llvm-objdump/private-headers-dynamic-section.test
28

Are you talking about PLT REL entries ? I've added some anyway !

tools/llvm-objdump/ELFDump.cpp
72–74

I have eventually changed the way I'm getting the Dynamic String Table, to reuse at most as possible the libObject.

83–89

Well, I would just need to replace this to a switch actually

Hmm... I don't like using the dynamic symbol table to lookup the dynamic string table. There is a reason for a specific DT_STRTAB tag. We should use that. Also, this latest version won't work if the section headers have been stripped, but we should be able to dump the dynamic table in this instance, by looking at the DT_* tags in the PT_DYNAMIC segment (if present). Try adding a test for this case (i.e. section headers stripped), and you'll see an error, I suspect... Also, please make sure you have a test that has no section headers or PT_DYNAMIC segment.

Alright.. Again, there was a lot of changes I wanted to do, so sorry if I messed smth @jhenderson.
I eventually moved some code from llvm-readobj to libObject, and I will refactor llvm-readobj similar code in another PR !

Could you do this separate PR first, please? Otherwise, we end up with (temporarily) duplicate code, which might be an issue if somebody modifies llvm-readobj in the meantime.

include/llvm/ADT/STLExtras.h
1041–1045 ↗(On Diff #155745)

Actually, since it's been pointed out, you might as well use array_lengthof in this patch, to remove any dependency.

include/llvm/Object/ELF.h
451

Is it possible to easily construct an ELFFile in memory? If not, I suppose not, as we might as well use the lit tests to achieve this.

471

It occurs to me that this half of the code is currently untested. Could you add a test that shows that if a dynamic segment doesn't exist, we look for and use the .dynamic section.

You should also add a test that shows that if neither exist, we just don't print anything.

test/tools/llvm-objdump/private-headers-dynamic-section.test
28

No, I was referring to SHT_RELR, added recently to LLD (rL336594) and llvm-readobj (rL335922). See also the proposal at https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg. The important bit is that there are some additional DT_* tags for these sections: DT_RELR, DT_RELRENT, and DT_RELRSZ, and I'd like it if these were tested too.

tools/llvm-objdump/ELFDump.cpp
44

Um... This hasn't been done?

paulsemel updated this revision to Diff 156622.Jul 20 2018, 3:33 PM
paulsemel marked 4 inline comments as done.

Added Jame's suggestions.
I managed to do one of the two real test cases with only yaml2obj. The first one definitely needs too much things to be a yaml2obj test.

Ok, I think the way I'm now getting the dynamic string table is good !

Hmm no.. I'm pretty sure it is a bad idea to do a PR that is completely relying on this one. I just duplicated the code, so I don't see any problem even if someone modifies this area in llvm-readobj. I will just refactor no worries :)

tools/llvm-objdump/ELFDump.cpp
44

Was for the function above, but I should have done this one too in the meantime !

Hmm no.. I'm pretty sure it is a bad idea to do a PR that is completely relying on this one. I just duplicated the code, so I don't see any problem even if someone modifies this area in llvm-readobj. I will just refactor no worries :)

Okay. Please make sure in the commit comment to explain that this is a copy of the llvm-readobj code, and that a subsequent refactor is still to come.

include/llvm/Object/ELF.h
418–419

This strikes me as a rather large function to have in a header file. What do you think about moving the body into ELF.cpp source file? It looks like there is precedence for this already. I also noticed that the source provides a "STRINGIFY_ENUM_CASE" macro that does what you want.

451

Again, this should be moved to the source file.

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

One more test suggestion: use llvm-objcopy to remove the section headers for this test input and then run the same dump. It should produce the same output.

test/tools/llvm-objdump/private-headers-no-dynamic-segment.test
14 ↗(On Diff #156622)

Now that you've done this, feel free to keep it, but I think you can simplify it down a bit and not test all the tags here.

test/tools/llvm-objdump/private-headers-no-dynamic.test
18 ↗(On Diff #156622)

Maybe worth doing a CHECK-NOT: DT_ or similar to show that the tool isn't trying to dump garbage in this cae.

tools/llvm-objdump/ELFDump.cpp
34–37

I don't think we want this loop as the first choice. Making it the fallback would match the behaviour earlier of dynamicEntries() (i.e. use program headers first, and only fallback on sections if that doesn't work.

44

I just realised that you're reinventing the wheel again here. llvm-readobj has the same function to do the same sort of thing. Take a look at toMappedAddr in parseDynamicTable in the ElfDumper class. It probably is worth sharing this again, so as to not repeat the code, and risk getting it slightly wrong.

paulsemel updated this revision to Diff 156804.Jul 23 2018, 9:18 AM
paulsemel marked 4 inline comments as done.

Added Jame's suggestions

I haven't copied the whole thing, most of the code is from me actually.. Why can't I just land this one and refactor the other one right after ?
I'm not sure this is that much a big deal to do this way, is it ?

Anyway, this patch is for the moment relying on an other patch I made for yaml2obj, so I'll wait until it is accepted !

NB: fun fact, it seems, that we are doing better in finding the dynamic section than GNU objdump.

include/llvm/Object/ELF.h
418–419

No, I can't, because I don't want to output "DT_NEEDED", but "NEEDED"...

test/tools/llvm-objdump/private-headers-no-dynamic.test
18 ↗(On Diff #156622)

I though about it, but there is no pattern I can check.. (as I am not printing the DT_ thing)

tools/llvm-objdump/ELFDump.cpp
44

I move this toMappedAddr function to libObj, as I see a lot of use cases for it :)

jhenderson added inline comments.Jul 23 2018, 9:40 AM
include/llvm/Object/ELF.h
451

You've marked this as done but dynamicEntries is still in the header file...

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

Sorry, I guess I wasn't clear. I was expecting these to be in the same test file. So the run commands would look something like:

# RUN: llvm-objdump -p %p/Inputs/private-headers-x86_64.elf | FileCheck %s
# RUN: llvm-objcopy <insert appropriate arguments here> %p/Inputs/private-headers-x86_64.elf %t-stripped.elf
# RUN: llvm-objdump -p %t-stripped.elf | FileCheck %s

This means that you can show that the output is identical with and without the section headers.

test/tools/llvm-objdump/private-headers-no-dynamic.test
18 ↗(On Diff #156622)

This is the last bit of the output right? If so, you can do:

CHECK-NOT: {{.}} or similar.

tools/llvm-objdump/ELFDump.cpp
44

Yes, I agree with this.

paulsemel marked 2 inline comments as done.

Added Jame's suggestions.

include/llvm/Object/ELF.h
451

Oops..

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

You can't just strip the section header entry and not the whole section with llvm-objcopy (which make sense). I had to craft the binary by hand.. So this is not really possible

jhenderson added inline comments.Jul 24 2018, 2:31 AM
include/llvm/Object/ELF.h
451

How about moving this into the source file too!?

test/tools/llvm-objdump/private-headers-dynamic-section.test
2

You can strip all section headers. Use llvm-objcopy --strip-sections to do it. This will preserve the section contents, but remove the section headers (and the section header string table, if I'm not mistaken).

paulsemel updated this revision to Diff 156992.Jul 24 2018, 3:05 AM
This revision is now accepted and ready to land.Jul 24 2018, 3:12 AM
This revision was automatically updated to reflect the committed changes.
Via Web