This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj][ELF] Add support for symbol indexes greater than SHN_LORESERVE
ClosedPublic

Authored by jakehehrlich on Sep 1 2017, 12:32 PM.

Details

Summary

Right now Symbols must be either undefined or defined in a specific section. Some symbols have section indexes like SHN_ABS however. This change adds support for outputting symbols that have such section indexes.

Diff Detail

Repository
rL LLVM

Event Timeline

jakehehrlich created this revision.

Uploaded slightly wrong diff. Fixed.

mcgrathr requested changes to this revision.Sep 1 2017, 12:39 PM
mcgrathr added inline comments.
lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

SHN_XINDEX is a special animal unlike other special cases. It means the real shndx value is elsewhere. See SHN_XINDEX handling elsewhere in the code base (or the ELF spec).

SHN_UNDEF, SHN_ABS, and SHN_COMMON are standard special cases. Those just go through as they are.
(Note that SHN_UNDEF is not in the [SHN_LORESERVE,SHN_HIRESERVE] range like all the others here are.)

SHN_{LO,HI}* are not actual values, but markers for the ranges used for machine-specific and OS-specific values.
It doesn't make sense to match them.

I think you need to handle each such value specially, because the correct way to treat it will depend on the specific machine/OS-specific definition.
All the ones I've seen are basically variants of SHN_COMMON and are handled the same way (just pass it through). But I think the only safe thing is to go case by case for handling specific ones we understand, and error out if you encounter one we don't know how to handle.

This revision now requires changes to proceed.Sep 1 2017, 12:39 PM
jakehehrlich added inline comments.Sep 1 2017, 12:57 PM
lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

This is just code that translates strings of t he form " SHN_* " to the appropriate value for use anywhere in yaml2elf. It doesn't handle what happens in these cases any further than that. I think there's an argument that SHN_{LO,HI}* shouldn't be here since you would never want to use those names when building an ELF object. The rest of the file however includes such values however (see SHT_LOOS and SHT_LOPROC in the equivliant code for section types) so I followed suit.

In general ELF_SHN should be the type of values with names SHN_* in this code base; even the ones which are not logically values you would want to use in a yaml file to produce an ELF.

mcgrathr added inline comments.Sep 1 2017, 1:05 PM
lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

Understood. I think you still need to handle SHN_XINDEX specially though, because it has a structural meaning.
It probably makes more sense for yaml files to list large literal section numbers and have yaml2obj do the SHN_XINDEX encoding for them automagically. But maybe yaml2elf is meant to test that encoding logic too, in which case you'd have to spell out SHN_XINDEX in symtab entries and then spell out the SHT_SYMTAB_SHNDX section with its contents explicitly.

You should certainly handle all the reserved-range SHN_* names in include/llvm/BinaryFormat/ELF.h (there are more that binutils groks).

jakehehrlich added inline comments.Sep 1 2017, 2:00 PM
lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

I think sean would likely say that yaml2obj should handle large indexes automagically, following the logic that it should be hard to produce invalid files. That being the case I don't think I want to support large indexes in this patch. Perhaps I should throw an error if the user creates a large SHN_XINDEX? As is you can't produce a valid file that uses large section indexes because there's no support for SHT_SYMTAB_SHNDX (I would have to add that).

mcgrathr added inline comments.Sep 1 2017, 2:05 PM
lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

It sounds fine to have this change only add support for the known SHN_* names. Perhaps it should recognize SHN_XINDEX as a name, but specifically refuse to write it out, since you should never use it explicitly if yaml2obj should not easily produce invalid ELF files. You should also make sure that it refuses numeric section numbers >= SHN_LORESERVE. A separate change to produce SHT_SYMTAB_SHNDX automagically for large number section numbers makes sense.

jakehehrlich edited edge metadata.
  1. Added support for the rest of the SHN_* values in ELF.h
  2. An error should now be thrown if the user tries to use SHN_XINDEX
  1. Added check to make sure the users uses "Section: <section name>" to specify section indexes and not "Index: <value>"

Needed to check that Symbol.Index was set first before comparing it.

mcgrathr accepted this revision.Sep 2 2017, 1:20 AM

Not being expert in the yaml2obj code base, this looks good to me with the caveats below.

include/llvm/ObjectYAML/ELFYAML.h
53 ↗(On Diff #113610)

This is actually a 16-bit field in ElfNN_Sym.

lib/ObjectYAML/ELFYAML.cpp
743 ↗(On Diff #113610)

It looks like llvm::Optional has both operator* and operator==/operator< et al that essentially forward the operator to the value that operator* would yield (in the has-value case, anyway--the overall semantics of the comparison operators on Optional are a bit weird, and I think there's some implicit copy construction in there too), so either Symbol.Index or *Symbol.Index works here, but you should be consistent, and *Symbol.Index looks more natural next to the Symbol.Index->bool test (and I suspect is the only pattern with sensible semantics when dealing with Optional<T> where T is not a trivial POD type like this one).

This revision is now accepted and ready to land.Sep 2 2017, 1:20 AM

I agree that it should be *Symbol.Index. This was a mistake on my part.

  1. Fixed not dereferencing the optional value
  2. Changes indexes to be 16-bit instead of 32-bit

Ideally I'd like to have Sean or Simon look at this first before landing.

silvas accepted this revision.Sep 7 2017, 12:10 PM

Sorry for the delay.

One suggestion for paring down the recognized SHN_* list and how I'm personally conceptualizing my thought process for what to include there.

This LGTM.

lib/ObjectYAML/ELFYAML.cpp
491 ↗(On Diff #113576)

I agree with jake that the SHN_XINDEX is purely an artifact of the file encoding, so we probably don't want to allow the user to explicitly provide it (seems too error-prone)
It seems like the only purpose for supporting SHN_XINDEX explicitly specified in the YAML file would be to test our SHN_XINDEX handling, which seems like it is better done with a different sort of test (probably just checked in binaries since it's a very narrow and specific thing to the binary format; literally just a historical "oops, we should have made this field have more bits" workaround). Since yaml2obj is intended as a testing tool, its inputs should be pretty small and so we shouldn't be needing SHN_XINDEX.

It might be interesting to add an error check/diagnostic in the case of too many sections in the input yaml file, but for now just omitting SHN_XINDEX from the list of possibilities seems fine.

There probably isn't much harm in keeping in the SHN_{LO,HI}* values around, though I assume the common usage of them would be something like SHN_LORESERVE + <N> or similar so a hex value would likely still be needed (or adding support for a semantic constant like has been done for the hexagon ones). So I'd probably lean towards removing the SHN_{LO,HI}* values since it would make the test case difficult to read (for example, of somebody used SHN_LORESERVE instead of SHN_HEXAGON_SCOMMON which has the same numerical value). On the other hand, that breaks the simplicity of "what should we list in ScalarEnumerationTraits<ELFYAML::ELF_SHN>::enumeration", which is currently just every SHN_* it seems. So if we omit some we should definitely have a comment explaining the rationale for what values we allow/omit.

For now, let's just follow a YAGNI principle and just support SHN_ABS since it seems that that's all you really need (and a comment to that effect in ScalarEnumerationTraits<ELFYAML::ELF_SHN>::enumeration). SHN_COMMON is a likely eventual need too. But SHN_UNDEF probably isn't since we handle undef symbols already by omitting the section (so in a sense it is a default value, though conceivably we could allow it to be explicitly specified).

This revision was automatically updated to reflect the committed changes.