Page MenuHomePhabricator

[ObjectYAML][yaml2obj][ELF] Add basic support for dynamic entries
Needs ReviewPublic

Authored by amontanez on Thu, Jan 10, 4:56 PM.

Details

Summary

This change introduces basic support for adding .dynamic entries to an ELF binary produced with yaml2obj.
Added:

  • YAML traits for .dynamic section contents.
  • d_tag can be populated by using the names of the tags.
  • yaml2obj writes .dynamic entries.
  • Some dynamic entries are validated for correctness.
  • Some dynamic entries can be automatically inferred.
  • Values can be numbers, strings (for things like DT_SONAME and DT_NEEDED, for example), or section names that are converted to addresses.

Diff Detail

Repository
rL LLVM

Event Timeline

amontanez created this revision.Thu, Jan 10, 4:56 PM
ruiu added a comment.Thu, Jan 10, 5:13 PM

I wonder if you really want to distinguish d_val from d_ptr. Here is the definition of the Elf_Dyn struct:

struct Elf32_Dyn {
  Elf32_Sword d_tag; // Type of dynamic table entry.
  union {
    Elf32_Word d_val; // Integer value of entry.
    Elf32_Addr d_ptr; // Pointer value of entry.
  } d_un;
};

d_val and d_ptr are exclusive and are just aliases to each other. As far as I know, people don't usually try to distinguish them; instead just use d_val.

You could distinguish them and use either d_val and d_ptr depending on a d_tag value, but if you actually to do that, you have to explicitly specify which field, d_val or d_ptr, you want to use for *every* d_tag value. That's perhaps isn't worth it.

So, if I were you, I'd simply use d_val for all d_tag values and don't use d_ptr. In other words, I'd define DynamicEntry as follows:

struct DynamicEntry {
  ELF_DT Tag;
  int64 Val;
};

instead of this:

struct DynamicEntry {
  ELF_DT Tag;
  Optional<uint64_t> Val;
  Optional<llvm::yaml::Hex64> Ptr;
};
amontanez updated this revision to Diff 181192.Thu, Jan 10, 5:13 PM

I was thinking the same while writing this. The only argument I have is that this allows obj2yaml to print values in decimal and pointers in hexadecimal. That's a rather weak argument though. I'm totally fine with simplifying it down to just using d_val.

On the hex versus decimal front, I think hex is the better way to go.

llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Somewhat moot if you follow @ruiu's suggestion (which I support), but this should be an int64_t.

llvm/lib/ObjectYAML/ELFYAML.cpp
60

Where did you get this list from? It looks like you might have missed some that LLVM supports (e.g. the DT_RELR* tags). I'd recommend taking the list from the list in BinaryFormat/DynamicTags.def.

95–98

I'm not sure these 4 really should be accepted, since they don't actually represent true dynamic tag values, but are rather markers.

That being said, I see that the equivalents in SHT_* are, so I'm okay with them being left in.

llvm/test/tools/yaml2obj/dynamic-entry-missing-val.yaml
14 ↗(On Diff #181192)

Is it possible to give more context to this message, e.g. Dynamic entry "DT_FLAGS"... or something similar that allows easier identification?

llvm/test/tools/yaml2obj/dynamic-section.yaml
1 ↗(On Diff #181192)

I think that there was nothing wrong with the original test. I'd like to be able to write dynamic sections with custom headers and contents, as well as with custom tag lists. You should have both tests, in my opinion.

On that note, bonus points if you can do a section definition of .dynamic (e.g. with custom sh_link field set), but with dynamic tags using this kind of format. I think relocation sections work like that?

15 ↗(On Diff #181192)

It would be good if these sort of things could be specified both with arbitrary hex values and with references to things, e.g. sections, so that they can be specified like:

DynamicEntries:
  - Tag: DT_STRTAB
    Pointer: .dynstr
llvm/tools/yaml2obj/yaml2elf.cpp
428

Copy/paste error?

429–431

Why not create an explicit DT_NULL tag, rather than calling zero()?

906–907

I'm unconvinced by this line here, unless you're going to auto-populate the dynamic section with tags such as DT_SYMTAB in this case.

amontanez updated this revision to Diff 181412.Fri, Jan 11, 6:44 PM
amontanez marked 8 inline comments as done.
amontanez edited the summary of this revision. (Show Details)

I changed a lot and included the majority of what was originally going to be in a second follow-up patch.

Added:

  • Some dynamic entries are validated for correctness.
  • Some dynamic entries can be automatically inferred.
  • Values can be numbers, strings (for things like DT_SONAME and DT_NEEDED, for example), or section names that are converted to addresses.

Changed:

  • initSymtabSectionHeader() is split into initSymtabSectionHeader() and finalizeSymtabSection(). This allows .dynamic strings to be placed in .dynstr alongside the strings from .dynsym (since DotDynstr.finalize() is now separated out).

Removed:

  • SHT_DYNAMIC no longer supports the "Content" tag for sections that have raw content. This breaks tools/llvm-objdump/private-headers-no-dynamic-segment.test
  • zero(SHeader) removed from beginning of finalizeSymtabSection() and initStrtabSectionHeader(). This prevented sh_addr from being populated when symbol tables were specified.
amontanez marked 2 inline comments as done.Mon, Jan 14, 9:30 AM
amontanez added inline comments.
llvm/tools/yaml2obj/yaml2elf.cpp
906–907

That was the original goal, it just wasn't included in the patch. I've adjusted the implicit section names for now since that functionality isn't added yet.

ruiu added inline comments.Mon, Jan 14, 11:04 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Hmm, why is this of type StringRef? Isn't this uint64_t?

llvm/tools/yaml2obj/yaml2elf.cpp
417

Perhaps you should inverse the condition and continue early, to reduce the indentation depth.

424

Stray break? (You don't need this because this is at the end of the last case, and if you really want it it is not indented correctly.)

459–461

At first I thought that yamlError(...) creates a new object and you were returning it, but that's actually not the case. You are just returning false. Why don't you write it in two lines?

printError(YamlEntry, "value and linked string table address are inconsistent");
return false;

If you define a helper lambda function inside this function like this

auto Report = [&](StringRef Msg) {
  yaml::Output Yout(errs());
  Yout << YamlObj;
  WithColor::error() << Msg << "\n";
};

then you can write like this:

Report("value and linked string table address are inconsistent");
return false;
520

I don't think you need to work that hard to minimize a scope of a variable.

amontanez marked 2 inline comments as done.Mon, Jan 14, 12:01 PM
amontanez added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

Making Value a StringRef gives more flexibility for how the value of a dynamic entry can be populated. I've listed a few examples below that illustrate how this is currently used.

Using a section name rather than manually entering the address of a section:

Tag: DT_STRTAB
Value: .dynstr

Using a string that will be added to .dynstr:

Tag: DT_SONAME
Value: libsomething.so

Or, the simple case of using a numeric value:

Tag: DT_STRSZ
Value:  0xa0
ruiu added inline comments.Mon, Jan 14, 1:03 PM
llvm/include/llvm/ObjectYAML/ELFYAML.h
94
Tag: DT_STRTAB
Value: .dynstr

What if two or more sections have the same name ".dynstr"? In ELF, at least in spirit, section names are not significant.

I feel that in order to discuss the final file format of the text-based ELF in YAML, code review is not the best place. Also, I think that we shouldn't discuss one YAML tag at a time. I'd rather want to take a look at the final form of the text-based ELF file so that I can get a big picture. Do you have such example? It doesn't have to be a formal document or anything, but just an example or two of a text-based ELF file should suffice.

amontanez marked an inline comment as done.Mon, Jan 14, 2:40 PM
amontanez added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
94

What if two or more sections have the same name ".dynstr"? In ELF, at least in spirit, section names are not significant.

I completely agree with your sentiment here. yaml2elf requires section names to be unique (see buildSectionIndex()), and a large portion of the tool's design depends on this. I know this isn't the case for regular ELF files, but it's a design choice this tool makes and a non-trivial amount of yaml2elf would have to be re-written to change that.

I'll draft up a few examples and add them to the main comment thread of this patch so we can focus on that discussion.

The goal of this change is to introduce a new type of YAML section that allows the population of .dynamic entries by providing a list of tag and value pairs. These entries are interpreted (and potentially validated) before being written to the .dynamic section.

The simplest way to satisfy this requirement is for all dynamic entry values to be numeric values. However, this poses a few problems. The first problem I encountered is that if dynamic symbols are specified, the contents of .dynstr can't be manually specified or modified. This inherently prevents entries like DT_SONAME, DT_NEEDED, DT_RPATH, and DT_RUNPATH from being specified alongside dynamic symbols unless everything (all contents of .dynstr and .dynsym) is specified manually through raw contents (i.e. hex strings). The second problem is that in a given YAML file, multiple references to the same thing might not remain consistent. I had an issue where manually entering .dynamic contents via hex would often result in the annotations for the hex being inconsistent with the hex itself. A good example of this is maintaining a correct DT_STRSZ as the contents of a string table are updated.

To address these issues, this patch introduces three ways to input a value for a dynamic entry. All of these cases are illustrated later with an example.

  • For strings that belong in .dynstr, the string itself can be used as the value for an entry.
  • A section name can be used in place of an address. In this case, the value of the dynamic entry is the sh_addr of the specified section.
  • A value can be specified using hexadecimal or decimal (or other bases supported by StringRef::to_integer()).

When writing the .dynamic section entries, yaml2elf can validate the correctness of some tags. For example, DT_STRTAB's value must be consistent with the address specified in the section header of the linked section (in pseudocode, StrtabEnt.Val == SHdrs[Dynamic.sh_link].sh_addr). If dynamic entries have strict enough constraints, they can also be inferred and automatically added (whether or not this should happen and how to control this from the user's side is up for discussion).

Here's an example to illustrate some of this.

!ELF
FileHeader:
  Class:           ELFCLASS64
  Data:            ELFDATA2LSB
  Type:            ET_DYN
  Machine:         EM_X86_64
Sections:
  - Name: .dynsym
    Type: SHT_DYNSYM
    Address: 0x1000
  - Name: .data
    Type: SHT_PROGBITS
    Flags: [ SHF_ALLOC, SHF_WRITE ]
  - Name: .dynamic
    Type: SHT_DYNAMIC
    Entries:
      - Tag: DT_SONAME
        Value: libsomething.so
      - Tag: DT_SYMTAB
        Value: .dynsym
      - Tag: DT_SYMENT
        Value: 0x18
DynamicSymbols:
  Global:
    - Name: foo
      Type: STT_FUNC
      Section: .data
    - Name: bar
      Type: STT_OBJECT
      Section: .data

The final section is of type SHT_DYNAMIC, and the Entries: key illustrates the proposed addition. Walking through the three dynamic entries,

  1. DT_SONAME: The value of this entry is a string that will be inserted into the dynamic string table (.dynstr) alongside the symbol names specified in DynamicSymbols. This is possible due to the nature of .dynstr being represented as a StringTableBuilder, and that .dynamic is linked to .dynstr by default. If the .dynamic section had been linked to a section other than .dynstr, the value of this entry would have to be a number (the offset of the string in the linked string table) rather than a string.
  1. DT_SYMTAB: This tag doesn't have any unique treatment or validation specified, but it illustrates the option of using the name of a section rather than the address. This resolves to 0x1000 since .dynsym is declared with an address of 0x1000. It would have been equally valid to make this entry have a value of 0x1000, but this means that changes to .dynsym's address don't need to be updated in the dynamic entry. It's also worth noting that in the case of DT_SYMTAB it wouldn't be difficult to infer this, meaning it would be possible to later add this entry if it was missing.
  1. DT_SYMENT: This tag also doesn't have any unique treatment or validation specified. Since 0x18 isn't a section name, it is treated as a number. This entry could also easily be automatically inferred.

I understand that automatically adding .dynamic entries that aren't explicitly specified could easily be undesirable, but it also simplifies the process of generating an ELF binary with all the required dynamic entries. Currently I have one proposed "solution" for this: if a DT_NULL entry is specified at the end of a set of dynamic entries, no additional dynamic entries will be generated. This isn't currently implemented, but does give the user control over the behavior. This behavior would need to be very clearly documented. I'm not dead-set on having entries automatically added to the binary even if absent from the explicit list of dynamic entries, though it's something I'd certainly take advantage of when creating tests for llvm-elfabi. I'm very much open to discussion on this.

Hopefully this better explains where the goals of this change, where it currently stands, and why it was designed in this way.

amontanez updated this revision to Diff 181673.EditedMon, Jan 14, 5:12 PM
amontanez edited the summary of this revision. (Show Details)

Quick changes to address some of Rui's comments, and introduced support for the older raw "Content" tag so tools/llvm-objdump/private-headers-no-dynamic-segment.test doesn't fail anymore.

I'll try not to make any other changes until we have more discussion about what we'd like this patch to look like at a macro level.

Edit: Oof, forgot to delete the old printYamlError() function. I've fixed it and will include the change in the next update.

amontanez edited the summary of this revision. (Show Details)Mon, Jan 14, 5:12 PM

I agree that the design maybe needs a bit more discussion. Perhaps the message you posted in the comment thread should be posted as an RFC on the llvm-dev mailing list?