This is an archive of the discontinued LLVM Phabricator instance.

[ObjectYAML][yaml2obj][ELF] Add basic support for dynamic entries
Changes PlannedPublic

Authored by amontanez on Jan 10 2019, 4:56 PM.

Details

Summary

This change introduces basic support for adding .dynamic entries to an ELF binary produced with yaml2obj.
Added:

  • YAML traits for .dynamic section contents.
  • d_tag can be populated by using the names of the tags.
  • yaml2obj writes .dynamic entries.
  • Some dynamic entries are validated for correctness.
  • Some dynamic entries can be automatically inferred.
  • Values can be numbers, strings (for things like DT_SONAME and DT_NEEDED, for example), or section names that are converted to addresses.

Diff Detail

Repository
rL LLVM

Event Timeline

amontanez created this revision.Jan 10 2019, 4:56 PM
ruiu added a comment.Jan 10 2019, 5:13 PM

I wonder if you really want to distinguish d_val from d_ptr. Here is the definition of the Elf_Dyn struct:

struct Elf32_Dyn {
  Elf32_Sword d_tag; // Type of dynamic table entry.
  union {
    Elf32_Word d_val; // Integer value of entry.
    Elf32_Addr d_ptr; // Pointer value of entry.
  } d_un;
};

d_val and d_ptr are exclusive and are just aliases to each other. As far as I know, people don't usually try to distinguish them; instead just use d_val.

You could distinguish them and use either d_val and d_ptr depending on a d_tag value, but if you actually to do that, you have to explicitly specify which field, d_val or d_ptr, you want to use for *every* d_tag value. That's perhaps isn't worth it.

So, if I were you, I'd simply use d_val for all d_tag values and don't use d_ptr. In other words, I'd define DynamicEntry as follows:

struct DynamicEntry {
  ELF_DT Tag;
  int64 Val;
};

instead of this:

struct DynamicEntry {
  ELF_DT Tag;
  Optional<uint64_t> Val;
  Optional<llvm::yaml::Hex64> Ptr;
};
amontanez updated this revision to Diff 181192.Jan 10 2019, 5:13 PM

I was thinking the same while writing this. The only argument I have is that this allows obj2yaml to print values in decimal and pointers in hexadecimal. That's a rather weak argument though. I'm totally fine with simplifying it down to just using d_val.

On the hex versus decimal front, I think hex is the better way to go.

llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Somewhat moot if you follow @ruiu's suggestion (which I support), but this should be an int64_t.

llvm/lib/ObjectYAML/ELFYAML.cpp
60

Where did you get this list from? It looks like you might have missed some that LLVM supports (e.g. the DT_RELR* tags). I'd recommend taking the list from the list in BinaryFormat/DynamicTags.def.

95–98

I'm not sure these 4 really should be accepted, since they don't actually represent true dynamic tag values, but are rather markers.

That being said, I see that the equivalents in SHT_* are, so I'm okay with them being left in.

llvm/test/tools/yaml2obj/dynamic-entry-missing-val.yaml
14

Is it possible to give more context to this message, e.g. Dynamic entry "DT_FLAGS"... or something similar that allows easier identification?

llvm/test/tools/yaml2obj/dynamic-section.yaml
1

I think that there was nothing wrong with the original test. I'd like to be able to write dynamic sections with custom headers and contents, as well as with custom tag lists. You should have both tests, in my opinion.

On that note, bonus points if you can do a section definition of .dynamic (e.g. with custom sh_link field set), but with dynamic tags using this kind of format. I think relocation sections work like that?

15

It would be good if these sort of things could be specified both with arbitrary hex values and with references to things, e.g. sections, so that they can be specified like:

DynamicEntries:
  - Tag: DT_STRTAB
    Pointer: .dynstr
llvm/tools/yaml2obj/yaml2elf.cpp
388

Copy/paste error?

389–391

Why not create an explicit DT_NULL tag, rather than calling zero()?

740–741

I'm unconvinced by this line here, unless you're going to auto-populate the dynamic section with tags such as DT_SYMTAB in this case.

amontanez updated this revision to Diff 181412.Jan 11 2019, 6:44 PM
amontanez marked 8 inline comments as done.
amontanez edited the summary of this revision. (Show Details)

I changed a lot and included the majority of what was originally going to be in a second follow-up patch.

Added:

  • Some dynamic entries are validated for correctness.
  • Some dynamic entries can be automatically inferred.
  • Values can be numbers, strings (for things like DT_SONAME and DT_NEEDED, for example), or section names that are converted to addresses.

Changed:

  • initSymtabSectionHeader() is split into initSymtabSectionHeader() and finalizeSymtabSection(). This allows .dynamic strings to be placed in .dynstr alongside the strings from .dynsym (since DotDynstr.finalize() is now separated out).

Removed:

  • SHT_DYNAMIC no longer supports the "Content" tag for sections that have raw content. This breaks tools/llvm-objdump/private-headers-no-dynamic-segment.test
  • zero(SHeader) removed from beginning of finalizeSymtabSection() and initStrtabSectionHeader(). This prevented sh_addr from being populated when symbol tables were specified.
amontanez marked 2 inline comments as done.Jan 14 2019, 9:30 AM
amontanez added inline comments.
llvm/tools/yaml2obj/yaml2elf.cpp
740–741

That was the original goal, it just wasn't included in the patch. I've adjusted the implicit section names for now since that functionality isn't added yet.

ruiu added inline comments.Jan 14 2019, 11:04 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Hmm, why is this of type StringRef? Isn't this uint64_t?

llvm/tools/yaml2obj/yaml2elf.cpp
377

Perhaps you should inverse the condition and continue early, to reduce the indentation depth.

384

Stray break? (You don't need this because this is at the end of the last case, and if you really want it it is not indented correctly.)

419–421

At first I thought that yamlError(...) creates a new object and you were returning it, but that's actually not the case. You are just returning false. Why don't you write it in two lines?

printError(YamlEntry, "value and linked string table address are inconsistent");
return false;

If you define a helper lambda function inside this function like this

auto Report = [&](StringRef Msg) {
  yaml::Output Yout(errs());
  Yout << YamlObj;
  WithColor::error() << Msg << "\n";
};

then you can write like this:

Report("value and linked string table address are inconsistent");
return false;
480

I don't think you need to work that hard to minimize a scope of a variable.

amontanez marked 2 inline comments as done.Jan 14 2019, 12:01 PM
amontanez added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Making Value a StringRef gives more flexibility for how the value of a dynamic entry can be populated. I've listed a few examples below that illustrate how this is currently used.

Using a section name rather than manually entering the address of a section:

Tag: DT_STRTAB
Value: .dynstr

Using a string that will be added to .dynstr:

Tag: DT_SONAME
Value: libsomething.so

Or, the simple case of using a numeric value:

Tag: DT_STRSZ
Value:  0xa0
ruiu added inline comments.Jan 14 2019, 1:03 PM
llvm/include/llvm/ObjectYAML/ELFYAML.h
94
Tag: DT_STRTAB
Value: .dynstr

What if two or more sections have the same name ".dynstr"? In ELF, at least in spirit, section names are not significant.

I feel that in order to discuss the final file format of the text-based ELF in YAML, code review is not the best place. Also, I think that we shouldn't discuss one YAML tag at a time. I'd rather want to take a look at the final form of the text-based ELF file so that I can get a big picture. Do you have such example? It doesn't have to be a formal document or anything, but just an example or two of a text-based ELF file should suffice.

amontanez marked an inline comment as done.Jan 14 2019, 2:40 PM
amontanez added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

What if two or more sections have the same name ".dynstr"? In ELF, at least in spirit, section names are not significant.

I completely agree with your sentiment here. yaml2elf requires section names to be unique (see buildSectionIndex()), and a large portion of the tool's design depends on this. I know this isn't the case for regular ELF files, but it's a design choice this tool makes and a non-trivial amount of yaml2elf would have to be re-written to change that.

I'll draft up a few examples and add them to the main comment thread of this patch so we can focus on that discussion.

The goal of this change is to introduce a new type of YAML section that allows the population of .dynamic entries by providing a list of tag and value pairs. These entries are interpreted (and potentially validated) before being written to the .dynamic section.

The simplest way to satisfy this requirement is for all dynamic entry values to be numeric values. However, this poses a few problems. The first problem I encountered is that if dynamic symbols are specified, the contents of .dynstr can't be manually specified or modified. This inherently prevents entries like DT_SONAME, DT_NEEDED, DT_RPATH, and DT_RUNPATH from being specified alongside dynamic symbols unless everything (all contents of .dynstr and .dynsym) is specified manually through raw contents (i.e. hex strings). The second problem is that in a given YAML file, multiple references to the same thing might not remain consistent. I had an issue where manually entering .dynamic contents via hex would often result in the annotations for the hex being inconsistent with the hex itself. A good example of this is maintaining a correct DT_STRSZ as the contents of a string table are updated.

To address these issues, this patch introduces three ways to input a value for a dynamic entry. All of these cases are illustrated later with an example.

  • For strings that belong in .dynstr, the string itself can be used as the value for an entry.
  • A section name can be used in place of an address. In this case, the value of the dynamic entry is the sh_addr of the specified section.
  • A value can be specified using hexadecimal or decimal (or other bases supported by StringRef::to_integer()).

When writing the .dynamic section entries, yaml2elf can validate the correctness of some tags. For example, DT_STRTAB's value must be consistent with the address specified in the section header of the linked section (in pseudocode, StrtabEnt.Val == SHdrs[Dynamic.sh_link].sh_addr). If dynamic entries have strict enough constraints, they can also be inferred and automatically added (whether or not this should happen and how to control this from the user's side is up for discussion).

Here's an example to illustrate some of this.

!ELF
FileHeader:
  Class:           ELFCLASS64
  Data:            ELFDATA2LSB
  Type:            ET_DYN
  Machine:         EM_X86_64
Sections:
  - Name: .dynsym
    Type: SHT_DYNSYM
    Address: 0x1000
  - Name: .data
    Type: SHT_PROGBITS
    Flags: [ SHF_ALLOC, SHF_WRITE ]
  - Name: .dynamic
    Type: SHT_DYNAMIC
    Entries:
      - Tag: DT_SONAME
        Value: libsomething.so
      - Tag: DT_SYMTAB
        Value: .dynsym
      - Tag: DT_SYMENT
        Value: 0x18
DynamicSymbols:
  Global:
    - Name: foo
      Type: STT_FUNC
      Section: .data
    - Name: bar
      Type: STT_OBJECT
      Section: .data

The final section is of type SHT_DYNAMIC, and the Entries: key illustrates the proposed addition. Walking through the three dynamic entries,

  1. DT_SONAME: The value of this entry is a string that will be inserted into the dynamic string table (.dynstr) alongside the symbol names specified in DynamicSymbols. This is possible due to the nature of .dynstr being represented as a StringTableBuilder, and that .dynamic is linked to .dynstr by default. If the .dynamic section had been linked to a section other than .dynstr, the value of this entry would have to be a number (the offset of the string in the linked string table) rather than a string.
  1. DT_SYMTAB: This tag doesn't have any unique treatment or validation specified, but it illustrates the option of using the name of a section rather than the address. This resolves to 0x1000 since .dynsym is declared with an address of 0x1000. It would have been equally valid to make this entry have a value of 0x1000, but this means that changes to .dynsym's address don't need to be updated in the dynamic entry. It's also worth noting that in the case of DT_SYMTAB it wouldn't be difficult to infer this, meaning it would be possible to later add this entry if it was missing.
  1. DT_SYMENT: This tag also doesn't have any unique treatment or validation specified. Since 0x18 isn't a section name, it is treated as a number. This entry could also easily be automatically inferred.

I understand that automatically adding .dynamic entries that aren't explicitly specified could easily be undesirable, but it also simplifies the process of generating an ELF binary with all the required dynamic entries. Currently I have one proposed "solution" for this: if a DT_NULL entry is specified at the end of a set of dynamic entries, no additional dynamic entries will be generated. This isn't currently implemented, but does give the user control over the behavior. This behavior would need to be very clearly documented. I'm not dead-set on having entries automatically added to the binary even if absent from the explicit list of dynamic entries, though it's something I'd certainly take advantage of when creating tests for llvm-elfabi. I'm very much open to discussion on this.

Hopefully this better explains where the goals of this change, where it currently stands, and why it was designed in this way.

amontanez updated this revision to Diff 181673.EditedJan 14 2019, 5:12 PM
amontanez edited the summary of this revision. (Show Details)

Quick changes to address some of Rui's comments, and introduced support for the older raw "Content" tag so tools/llvm-objdump/private-headers-no-dynamic-segment.test doesn't fail anymore.

I'll try not to make any other changes until we have more discussion about what we'd like this patch to look like at a macro level.

Edit: Oof, forgot to delete the old printYamlError() function. I've fixed it and will include the change in the next update.

amontanez edited the summary of this revision. (Show Details)Jan 14 2019, 5:12 PM

I agree that the design maybe needs a bit more discussion. Perhaps the message you posted in the comment thread should be posted as an RFC on the llvm-dev mailing list?

ruiu added a comment.Jan 17 2019, 11:08 AM

Oh, sorry, I was thinking that this patch is for llvm-tapi, but this is really for yaml2obj. I'm sorry, perhaps my previous comments might be pointless...

I'd like to ask first what is the motivation of this change. As far as I know, yaml2obj is used for generating test binary files in the LLVM lit tests. Do you have a need to generate an ELF file with a .dynamic section for testing? If so, could you explain why?

There were quite a few tests that required the .dynamic section in D55629. D55852 tests also have a number of .dynamic entries as well. Without the ability to write .dynamic entries in yaml2obj, these will continue to contain a good portion of hexidecimal section contents with annotations. This addition to yaml2obj is going to be rather low-priority for me, but I feel it's the best way to make llvm-elfabi tests more legible. It also will make creation of new tests significantly easier.

See also D56791. I needed it to test llvm-readobj's printing of dynamic relocations, which is driven via the dynamic section. I ended up hand-crafting the contents, but it was a bit messy, and definitely unreadable.

grimar added a subscriber: grimar.Feb 5 2019, 12:30 AM

There were quite a few tests that required the .dynamic section in D55629. D55852 tests also have a number of .dynamic entries as well. Without the ability to write .dynamic entries in yaml2obj, these will continue to contain a good portion of hexidecimal section contents with annotations. This addition to yaml2obj is going to be rather low-priority for me, but I feel it's the best way to make llvm-elfabi tests more legible. It also will make creation of new tests significantly easier.

Hi Armando! Sorry, I did not notice this patch earlier. I have a simple solution implemented in D57691 which teaches the tools to parse and dump the .dynamic section and its dynamic tags.
It does not try to do anything else rather than dumping and parsing the existent entries, though that seems to be enough for the purposes of improving the tests.

Would you mind if I try to land it? It seems to me that features of this patch like "dynamic entries are validated for correctness.", "dynamic entries can be automatically inferred." and
others are something more complicated and can/should be implemented on top.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2019, 12:30 AM

This change as-is does a lot, which was what sparked the desire for a RFC. My plan was to trim down this patch once the RFC reached a general consensus so we could have the basic functionality of tag/value pairs and build on that. Further additions (like automatically adding required entries) would be addressed in subsequent changes. If you have anything you'd like to add to the RFC, feel free. I'm fine with you working to land D57691 first if that is high-priority. Otherwise, I can reduce this patch to reflect the general functionality of yours while using some nuances that will prepare for the future features.

Otherwise, I can reduce this patch to reflect the general functionality of yours while using some nuances that will prepare for the future features.

I think we should start from little, so yes, that would be great.
Any other opinions?

As nobody has really fed back on the RFC, I think we should just go with the consensus among the (very limited) set of developers who have responded to that and on here. I'd go ahead and respin this patch to a minimal set and/or progress @grimar's in a similar manner. It might be nice to keep this patch in some form as a sort of target to aim for with the sum of the patches going forwards, i.e. rather than overwriting the diff here, create a new patch with the small subset.

amontanez planned changes to this revision.Feb 8 2019, 2:00 PM

See D57975 for the new patch introducing support for writing .dynamic entries. It doesn't include any changes to obj2yaml, or seek to update existing tests. I'm marking this patch as "planned changes" to reduce confusion as to which should be reviewed.

silvas resigned from this revision.Mar 25 2020, 6:35 PM
grimar removed a subscriber: grimar.Mar 26 2020, 2:05 AM