Page MenuHomePhabricator

[llvm-elfabi] Support ELF file that lacks .gnu.hash section
Needs ReviewPublic

Authored by haowei on Dec 15 2020, 8:33 PM.

Details

Summary

Before this change, when reading ELF file, elfabi determines number of entries in .dynsym by reading the .gnu.hash section. This change makes elfabi read section headers directly. This change allows elfabi works on ELF files which do not have .gnu.hash sections.

Diff Detail

Event Timeline

haowei created this revision.Dec 15 2020, 8:33 PM
haowei requested review of this revision.Dec 15 2020, 8:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2020, 8:33 PM
pcc added a subscriber: pcc.Dec 15 2020, 8:57 PM
pcc added inline comments.
llvm/lib/InterfaceStub/ELFObjHandler.cpp
449

This code won't find the section header if section headers have been stripped. You'll need to bring back the code that you deleted and extend it to infer the number of symbols from the .hash section. In general you can expect a binary to have at least one of .hash or .gnu.hash if it has a symbol table because there isn't enough information to interpret the dynamic symbol table without a hash table and without reading the section headers.

s/.gnu_hash/.gnu.hash/

grimar added inline comments.Dec 16 2020, 12:31 AM
llvm/lib/InterfaceStub/ELFObjHandler.cpp
466

FTR, I think we should move this somewhere to libObject. This will be useful for llvm-readelf/obj which is currently supports inferring symbol table size from .hash, but not from `.gnu.hash.

What others said - there are six cases we should care about:

  1. Section headers are stripped and the ELF has no hash/gnu_hash dynamic tag - this can be treated as an error, I think since the ELF gABI requires a hash table.
  2. Section headers are stripped, but there is a DT_HASH
  3. Section headers are stripped, but there is a DT_GNU_HASH
  4. Section headers present but there is no .hash or .gnu.hash section - this might be treated as an error, I think since the ELF gABI requires a hash table.
  5. Section headers present with .hash section
  6. Section headers present with .gnu.hash section

I think all of the tools should handle all of these situations (llvm-readelf, llvm-objdump, llvm-elfabi, ...) so it definitely seems like this should be part of the Object library.

haowei updated this revision to Diff 312627.Thu, Dec 17, 3:26 PM
haowei retitled this revision from [llvm-elfabi] Support ELF file that lacks .gnu_hash section to [llvm-elfabi] Support ELF file that lacks .gnu.hash section.
haowei edited the summary of this revision. (Show Details)

What others said - there are six cases we should care about:

  1. Section headers are stripped and the ELF has no hash/gnu_hash dynamic tag - this can be treated as an error, I think since the ELF gABI requires a hash table.
  2. Section headers are stripped, but there is a DT_HASH
  3. Section headers are stripped, but there is a DT_GNU_HASH
  4. Section headers present but there is no .hash or .gnu.hash section - this might be treated as an error, I think since the ELF gABI requires a hash table.
  5. Section headers present with .hash section
  6. Section headers present with .gnu.hash section

I think all of the tools should handle all of these situations (llvm-readelf, llvm-objdump, llvm-elfabi, ...) so it definitely seems like this should be part of the Object library.

So actually, the ELF stub generated by elfabi does not have DT_HASH or DT_GNU_HASH. We have test it they are not required in dynamic linking. It does have section headers. So in the new diff, I moved this function into llvm/include/llvm/Object/ELF.h, and it fallback to parse .gnu.hash and .hash when section headers are not available. It returns 0 if none is found and return an error only if parsing error happens.

What would be best way to test this piece of code? Shall I put a few ELF files in the test directory with section headers, .hash, .gnu.hash stripped and run elfabi on them?

llvm/lib/InterfaceStub/ELFObjHandler.cpp
449

I see. The reason we want to change is that we found a so file with empty dynsym and .gnu.hash caused assertion error in

llvm-project/llvm/include/llvm/Object/ELFTypes.h:542: ArrayRef<llvm::object::Elf_GnuHash_Impl::Elf_Word> llvm::object::Elf_GnuHash_Impl<llvm::object::ELFType<llvm::support::little, true> >::values(unsigned int) const [ELFT = llvm::object::ELFType<llvm::support::little, true>]: Assertion `DynamicSymCount >= symndx' failed.

Since elfabi is primarily used during linking, it is expected the input will have section headers so that why I made the change. In the new diff, I put this function into ELF.h and make it read section headers first and use .gnu.hash and .hash as backup.

466

I see. In the new diff, I put it into llvm/include/llvm/Object/ELF.h, is it a good place for it?

haowei added a comment.Tue, Jan 5, 3:43 PM

friendly ping.

friendly ping.

Sorry for the delay. I and others have been off work for a couple of weeks or so over Christmas. I'll try to get to this sometime in the next couple of days or so.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

llvm/include/llvm/Object/ELF.h
539

By my reading of this code, there is no need for separate LastSymIdx and BucketVal variables. Can you get rid of one?

543

Seems like this might enter an infinite loop, if there's a corrupt section without a 0 value at the end of the chain.

561

This should probably return an error if Sec.sh_size % Sec.sh_entsize != 0. If an earlier piece of code makes it impossible to hit that case, then it probably still deserves an assertion at least.

563

This looks inverted to me?

Also add a blank line after each of the if statemetns and for loops, to better breka this up into logical chunks.

582

Although it's not strictly needed, I'd add a break here too, to ensure there's no bug risk if this switch were to ever be extended.

grimar added inline comments.Mon, Jan 11, 3:42 AM
llvm/include/llvm/Object/ELF.h
563

You can use SectionsOrError->empty() for such checks.

573

You don't need curly bracers around a single line.

576

Don't use auto, because the type is probably not obvious?

585

Just if (ElfGnuHash), the same below.

haowei updated this revision to Diff 315996.Mon, Jan 11, 10:26 PM
haowei marked 8 inline comments as done.

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

llvm/include/llvm/Object/ELF.h
543

You are right, but what would be the best way to prevent that? I thought about to determine the max bucket capacity by calculating the differences between indexes in the bucket. However, it is not guaranteed that hashes will evenly distributed in the buckets. It also won't work if this is only one bucket.

561

I make it throw an error when the mod is not 0. Added brace since the return stmt takes 3 lines.

563

Thanks for pointing out. I made a mistake here. Changed to empty() in latest diff.

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

What do you mean it seems upset? The review here looks fine to me.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.

llvm/include/llvm/Object/ELF.h
543

I know @grimar spent some time fixing loops in hash table code like this in llvm-readelf. Maybe he can provide some guidance there. If nothing else, we can at least make sure the hash table doesn't run off the end of the file.

grimar added inline comments.Tue, Jan 12, 12:52 AM
llvm/include/llvm/Object/ELF.h
543

My understanding is that this loop might either run past the end of the section or past the end of the file. It seems that saying "infinite loop" is not really correct then: it should crash sooner ol later, or return a wrong result.

There are 2 cases:

  1. When we have a section header table, we can get the size of the hash table from there.
  2. When we don't have a section header table, we have no information about the hash table size and only can check that we don't go past the end of the file I believe.

Perhaps, getDynSymtabSizeFromGnuHash could also accept one more argument, like void *End or something alike,
which will either point past the end of the section or past the end of the file. The code could check that no overrun happens then.

May be it worth just to pass both Optional<uint64_t> SecSize and void *BufEnd for better diagnostics.

haowei updated this revision to Diff 316566.Wed, Jan 13, 9:59 PM

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

What do you mean it seems upset? The review here looks fine to me.

If I compare Diff 2 and Diff 3 in Phabricator, the result looks weird as rebase changes were included.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.

Sorry, I misunderstood your suggestion. I thought you suggest to use llvm-readelf to verify the dynsym sizes.

I added a test to use llvm-elfabi to read elf files under 3 different conditions:

  • ELF contains section headers, .gnu.hash and .hash sections.
  • ELF contains .gnu.hash and .hash sections, section headers are stripped.
  • ELF contains .hash sections, section headers are stripped.

Please let me know if these tests are sufficient.

llvm/include/llvm/Object/ELF.h
543

I pass BufEnd into getDynSymtabSizeFromGnuHash function. If section headers are stripped, there seems to be no way to know the size limit of DynSym if the hash tables are corrupted. I think file's BufEnd might be the only choice.

grimar added inline comments.Wed, Jan 13, 11:57 PM
llvm/include/llvm/Object/ELF.h
543

Shoudn't getDynSymtabSizeFromGnuHash report an error when attempts to read past the BufEnd?

haowei updated this revision to Diff 316579.Thu, Jan 14, 12:10 AM
haowei marked an inline comment as done.
haowei added inline comments.
llvm/include/llvm/Object/ELF.h
543

Changed in latest diff.

Is it important that the .dynsym is used directly when .hash/.gnu.hash is available? At the moment, your test doesn't really show whether it is used or whether you go straight to the dynamic tags for details, I believe - to do that, you'd need to have one given bogus values for the size, and show that this is not sued. Additionally, you have no testing for your error messages, nor do you have testing that shows that the .gnu.hash is used above the .hash section.

llvm/include/llvm/Object/ELF.h
541–542

LLVM usually uses preincrement (i.e. ++LastSymIdx etc).

547

I think this message would be better off being:

"no terminator found for GNU hash section before buffer end"

I don't think saying "malformed ELF" adds anything to the message. Also, don't use ".gnu.hash" unless you know that is definitely the section's name (it could be other names too technically). Ideally use the section name itself if it is easily available. Otherwise, just refer to it in general terms as in the example.

566

Similar to above, but also I'd include the sh_size and sh_entsize values in this message, e.g. like the following pseudo code.

SecName + " section has sh_size (" + Sec.sh_size + ") % sh_entsize (" + Sec.sh_size + ") that is not 0"

602

I don't think you need this cast?

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
2
3
4–5

Alternatively, refer to them by their dynamic tags:

##  * Section headers are stripped but there is a DT_GNU_HASH dynamic tag.
##  * Section headers are stripped but there is a DT_HASH dynamic tag.

I think I prefer this form.

11

You don't need to call llvm-strip - yaml2obj can emit an object without the section header table, as pointed out in my earlier comment:

For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

40–42
141–142

By using yaml2obj macros, you can avoid having two nearly-identical YAML inputs. See the -D option in yaml2obj. This will also work with the "section headers present/not present options".

170–176

I think this should be above the YAML input.

haowei marked an inline comment as done.Thu, Jan 14, 4:59 PM

Is it important that the .dynsym is used directly when .hash/.gnu.hash is available? At the moment, your test doesn't really show whether it is used or whether you go straight to the dynamic tags for details, I believe - to do that, you'd need to have one given bogus values for the size, and show that this is not sued. Additionally, you have no testing for your error messages, nor do you have testing that shows that the .gnu.hash is used above the .hash section.

I tried a few ways to generate corrupted the ELF file with yaml2obj but the results are not quite good. For the case to test the error message when the terminator in DT_GNU_HASH is not available, the loop at L540 in ELF.h actually ended before It hit BufEnd. The reason is that .strtab and .shstrtab are placed at the end of elf file automatically by yaml2obj and there will almost always be cases to make (*It) & 1 != 0, preventing the code from reach the error message. I think the only way would be placing .gnu.hash at the end of the ELF, and remove the terminator, but that seems to be not possible with yaml2obj. For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

llvm/include/llvm/Object/ELF.h
602

Without it, clang complains:

ELF.h:605:12: error: no matching function for call to 'getDynSymtabSizeFromGnuHash'

If I recall correctly, bytes_end() returns "unsigned char*"

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
4–5

I will modify this in the next diff.

11

If I add

SectionHeaderTable:
  Sections:  []

it will report errors:

yaml2obj: error: section '.text' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.data' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynsym' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.hash' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.gnu.hash' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynstr' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynamic' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.strtab' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.shstrtab' should be present in the 'Sections' or 'Excluded' lists

If I try to use:

SectionHeaderTable:
  NoHeaders: true

Similar errors will be raised:

yaml2obj: error: excluded section referenced: '.text'  by symbol 'foo'
yaml2obj: error: excluded section referenced: '.data'  by symbol 'bar'

That's why I used llvm-strip instead of using this new yaml2obj feature.

141–142

The structure of DT_GNU_HASH is quite different from DT_HASH in yaml2obj if you compare L32 to L38 and L39-L51 . That's why I choose to put 2 elf YAML inputs instead of using marco. If you feel keeping these 2 sections in place and only reference one in the .dynamic section with macros is an acceptable approach, I can do that. Since I need to add a test case that section headers are available without any hash tables, I still have to put more than 1 YAML input in this test file.

grimar added a comment.EditedThu, Jan 14, 11:37 PM

The reason is that .strtab and .shstrtab are placed at the end of elf file automatically by yaml2obj and there will almost always be cases to make (*It) & 1 != 0, preventing the code from reach the error message. I think the only way would be placing .gnu.hash at the end of the ELF, and remove the terminator, but that seems to be not possible with yaml2obj. For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

It is not a problem for yaml2obj to make the .gnu.hash to be the last section. Yes, it adds implicit sections like .strtab and .shstrtab at the end automatically, but you can explicitly specify them in the "Sections" at any position you need. E.g. before the .gnu.hash.
See broken-dynamic-reloc.test from llvm-readelf test suite:

## Place all implicit sections here to make the .dynsym section to be the
## last in the file. This makes the task of validating offsets a bit easier.
  - Name: .dynstr
    Type: SHT_STRTAB
  - Name: .strtab
    Type: SHT_STRTAB
  - Name: .shstrtab
    Type: SHT_STRTAB
  - Name:    .dynsym
    Type:    SHT_DYNSYM
    Flags:   [ SHF_ALLOC ]
    Address: 0x300
    Offset:  0x300

For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

Can you elaborate what the problem is? There are many examples in both yaml2obj and llvm-readelf test suites that are using the EntSize key to set the sh_entsize.

grimar added inline comments.Thu, Jan 14, 11:51 PM
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
11

Using of Sections: [] is not what you want, it doesn't remove the section headers. It can be used to create a
section headers table with a null section only:

SectionHeaderTable:
  Sections: []
  Excluded:
    - Name: .strtab
    - Name: .shstrtab

Though, you don`t actually need it, as you can write just the following to achieve the same:

SectionHeaderTable:
  Excluded:
    - Name: .strtab
    - Name: .shstrtab

So, to exclude the section header table you need to use NoHeaders: true.
The errors you see:

yaml2obj: error: excluded section referenced: '.text'  by symbol 'foo'
yaml2obj: error: excluded section referenced: '.data'  by symbol 'bar'

are because your dynamic symbols specify sections:

DynamicSymbols:
  - Name:    foo
    Type:    STT_FUNC
    Section: .text
    Value:   0x100
    Binding: 1
  - Name:    bar
    Type:    STT_OBJECT
    Section: .data
    Value:   0x200
    Binding: 1

Can't you just remove Section: .text and Section: .data?

jhenderson added inline comments.Fri, Jan 15, 1:34 AM
llvm/include/llvm/Object/ELF.h
602

Actually, let's just change getDynSymtabSizeFromGnuHash to take a`uint8_t` for BufEnd. I think it's quite common in this code to work with bytes in that form.

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
141–142

I'm just suggesting using macros in the dynamic table. You can also use macros to avoid need for any extra YAML, I think. Just change the DT_* value to something arbitrary/omit individual sections from the section header table using the Exclude option of the SectionHeaderTable key etc depending on the specific example.

haowei updated this revision to Diff 317134.Fri, Jan 15, 6:03 PM
haowei updated this revision to Diff 317139.Fri, Jan 15, 6:29 PM
haowei marked 2 inline comments as done.

For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

Can you elaborate what the problem is? There are many examples in both yaml2obj and llvm-readelf test suites that are using the EntSize key to set the sh_entsize.

I thought EntSize was automatically generated and could not be modified. Thanks for the note, it solved my issue.

llvm/include/llvm/Object/ELF.h
541–542

Changed to preincrement

566

The actual SecName is a bit complicated to get and when this error happens it also means the ELF file is probably corrupted, so I put SHT_DYNSYM here instead. Let me know if raw_string_ostream is not the best approach to build string here.

602

Do you mean uint8_t *? Actually the problem is I forgot the const modifier. After adding this, it no longer need casting.

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
11

Removing Section: .text and Section: .data will causing llvm-elfabi's output indicate the symbols are undefined (which is true). I make it pointing something else.

I think using the llvm-strip here is better. I need to do a lot of macro hacks to make this test to use a single ELF yaml input, while I can use a single one with llvm-strip. It looks cleaner.

141–142

I see. I use macros in .dynamic and this approach only requires a single YAML input.