Before this change, when reading ELF file, elfabi determines number of entries in .dynsym by reading the .gnu.hash section. This change makes elfabi read section headers directly. This change allows elfabi works on ELF files which do not have .gnu.hash sections.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
449 | This code won't find the section header if section headers have been stripped. You'll need to bring back the code that you deleted and extend it to infer the number of symbols from the .hash section. In general you can expect a binary to have at least one of .hash or .gnu.hash if it has a symbol table because there isn't enough information to interpret the dynamic symbol table without a hash table and without reading the section headers. |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
466 | FTR, I think we should move this somewhere to libObject. This will be useful for llvm-readelf/obj which is currently supports inferring symbol table size from .hash, but not from `.gnu.hash. |
What others said - there are six cases we should care about:
- Section headers are stripped and the ELF has no hash/gnu_hash dynamic tag - this can be treated as an error, I think since the ELF gABI requires a hash table.
- Section headers are stripped, but there is a DT_HASH
- Section headers are stripped, but there is a DT_GNU_HASH
- Section headers present but there is no .hash or .gnu.hash section - this might be treated as an error, I think since the ELF gABI requires a hash table.
- Section headers present with .hash section
- Section headers present with .gnu.hash section
I think all of the tools should handle all of these situations (llvm-readelf, llvm-objdump, llvm-elfabi, ...) so it definitely seems like this should be part of the Object library.
So actually, the ELF stub generated by elfabi does not have DT_HASH or DT_GNU_HASH. We have test it they are not required in dynamic linking. It does have section headers. So in the new diff, I moved this function into llvm/include/llvm/Object/ELF.h, and it fallback to parse .gnu.hash and .hash when section headers are not available. It returns 0 if none is found and return an error only if parsing error happens.
What would be best way to test this piece of code? Shall I put a few ELF files in the test directory with section headers, .hash, .gnu.hash stripped and run elfabi on them?
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
449 | I see. The reason we want to change is that we found a so file with empty dynsym and .gnu.hash caused assertion error in llvm-project/llvm/include/llvm/Object/ELFTypes.h:542: ArrayRef<llvm::object::Elf_GnuHash_Impl::Elf_Word> llvm::object::Elf_GnuHash_Impl<llvm::object::ELFType<llvm::support::little, true> >::values(unsigned int) const [ELFT = llvm::object::ELFType<llvm::support::little, true>]: Assertion `DynamicSymCount >= symndx' failed. Since elfabi is primarily used during linking, it is expected the input will have section headers so that why I made the change. In the new diff, I put this function into ELF.h and make it read section headers first and use .gnu.hash and .hash as backup. | |
466 | I see. In the new diff, I put it into llvm/include/llvm/Object/ELF.h, is it a good place for it? |
Sorry for the delay. I and others have been off work for a couple of weeks or so over Christmas. I'll try to get to this sometime in the next couple of days or so.
The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
671 | By my reading of this code, there is no need for separate LastSymIdx and BucketVal variables. Can you get rid of one? |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | Seems like this might enter an infinite loop, if there's a corrupt section without a 0 value at the end of the chain. | |
693 | This should probably return an error if Sec.sh_size % Sec.sh_entsize != 0. If an earlier piece of code makes it impossible to hit that case, then it probably still deserves an assertion at least. | |
695 | This looks inverted to me? Also add a blank line after each of the if statemetns and for loops, to better breka this up into logical chunks. | |
714 | Although it's not strictly needed, I'd add a break here too, to ensure there's no bug risk if this switch were to ever be extended. |
I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.
Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | You are right, but what would be the best way to prevent that? I thought about to determine the max bucket capacity by calculating the differences between indexes in the bucket. However, it is not guaranteed that hashes will evenly distributed in the buckets. It also won't work if this is only one bucket. | |
693 | I make it throw an error when the mod is not 0. Added brace since the return stmt takes 3 lines. | |
695 | Thanks for pointing out. I made a mistake here. Changed to empty() in latest diff. |
What do you mean it seems upset? The review here looks fine to me.
Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.
I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | I know @grimar spent some time fixing loops in hash table code like this in llvm-readelf. Maybe he can provide some guidance there. If nothing else, we can at least make sure the hash table doesn't run off the end of the file. |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | My understanding is that this loop might either run past the end of the section or past the end of the file. It seems that saying "infinite loop" is not really correct then: it should crash sooner ol later, or return a wrong result. There are 2 cases:
Perhaps, getDynSymtabSizeFromGnuHash could also accept one more argument, like void *End or something alike, May be it worth just to pass both Optional<uint64_t> SecSize and void *BufEnd for better diagnostics. |
If I compare Diff 2 and Diff 3 in Phabricator, the result looks weird as rebase changes were included.
Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.
I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.
Sorry, I misunderstood your suggestion. I thought you suggest to use llvm-readelf to verify the dynsym sizes.
I added a test to use llvm-elfabi to read elf files under 3 different conditions:
- ELF contains section headers, .gnu.hash and .hash sections.
- ELF contains .gnu.hash and .hash sections, section headers are stripped.
- ELF contains .hash sections, section headers are stripped.
Please let me know if these tests are sufficient.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | I pass BufEnd into getDynSymtabSizeFromGnuHash function. If section headers are stripped, there seems to be no way to know the size limit of DynSym if the hash tables are corrupted. I think file's BufEnd might be the only choice. |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | Shoudn't getDynSymtabSizeFromGnuHash report an error when attempts to read past the BufEnd? |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
675 | Changed in latest diff. |
Is it important that the .dynsym is used directly when .hash/.gnu.hash is available? At the moment, your test doesn't really show whether it is used or whether you go straight to the dynamic tags for details, I believe - to do that, you'd need to have one given bogus values for the size, and show that this is not sued. Additionally, you have no testing for your error messages, nor do you have testing that shows that the .gnu.hash is used above the .hash section.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
673–674 | LLVM usually uses preincrement (i.e. ++LastSymIdx etc). | |
679 | I think this message would be better off being: "no terminator found for GNU hash section before buffer end" I don't think saying "malformed ELF" adds anything to the message. Also, don't use ".gnu.hash" unless you know that is definitely the section's name (it could be other names too technically). Ideally use the section name itself if it is easily available. Otherwise, just refer to it in general terms as in the example. | |
698 | Similar to above, but also I'd include the sh_size and sh_entsize values in this message, e.g. like the following pseudo code. SecName + " section has sh_size (" + Sec.sh_size + ") % sh_entsize (" + Sec.sh_size + ") that is not 0" | |
734 | I don't think you need this cast? | |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
2 | ||
3 | ||
4–5 | Alternatively, refer to them by their dynamic tags: ## * Section headers are stripped but there is a DT_GNU_HASH dynamic tag. ## * Section headers are stripped but there is a DT_HASH dynamic tag. I think I prefer this form. | |
11 | You don't need to call llvm-strip - yaml2obj can emit an object without the section header table, as pointed out in my earlier comment: | |
40–42 | ||
141–142 | By using yaml2obj macros, you can avoid having two nearly-identical YAML inputs. See the -D option in yaml2obj. This will also work with the "section headers present/not present options". | |
170–176 | I think this should be above the YAML input. |
I tried a few ways to generate corrupted the ELF file with yaml2obj but the results are not quite good. For the case to test the error message when the terminator in DT_GNU_HASH is not available, the loop at L540 in ELF.h actually ended before It hit BufEnd. The reason is that .strtab and .shstrtab are placed at the end of elf file automatically by yaml2obj and there will almost always be cases to make (*It) & 1 != 0, preventing the code from reach the error message. I think the only way would be placing .gnu.hash at the end of the ELF, and remove the terminator, but that seems to be not possible with yaml2obj. For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
734 | Without it, clang complains: ELF.h:605:12: error: no matching function for call to 'getDynSymtabSizeFromGnuHash' If I recall correctly, bytes_end() returns "unsigned char*" | |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
4–5 | I will modify this in the next diff. | |
11 | If I add SectionHeaderTable: Sections: [] it will report errors: yaml2obj: error: section '.text' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.data' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.dynsym' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.hash' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.gnu.hash' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.dynstr' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.dynamic' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.strtab' should be present in the 'Sections' or 'Excluded' lists yaml2obj: error: section '.shstrtab' should be present in the 'Sections' or 'Excluded' lists If I try to use: SectionHeaderTable: NoHeaders: true Similar errors will be raised: yaml2obj: error: excluded section referenced: '.text' by symbol 'foo' yaml2obj: error: excluded section referenced: '.data' by symbol 'bar' That's why I used llvm-strip instead of using this new yaml2obj feature. | |
141–142 | The structure of DT_GNU_HASH is quite different from DT_HASH in yaml2obj if you compare L32 to L38 and L39-L51 . That's why I choose to put 2 elf YAML inputs instead of using marco. If you feel keeping these 2 sections in place and only reference one in the .dynamic section with macros is an acceptable approach, I can do that. Since I need to add a test case that section headers are available without any hash tables, I still have to put more than 1 YAML input in this test file. |
It is not a problem for yaml2obj to make the .gnu.hash to be the last section. Yes, it adds implicit sections like .strtab and .shstrtab at the end automatically, but you can explicitly specify them in the "Sections" at any position you need. E.g. before the .gnu.hash.
See broken-dynamic-reloc.test from llvm-readelf test suite:
## Place all implicit sections here to make the .dynsym section to be the ## last in the file. This makes the task of validating offsets a bit easier. - Name: .dynstr Type: SHT_STRTAB - Name: .strtab Type: SHT_STRTAB - Name: .shstrtab Type: SHT_STRTAB - Name: .dynsym Type: SHT_DYNSYM Flags: [ SHF_ALLOC ] Address: 0x300 Offset: 0x300
Can you elaborate what the problem is? There are many examples in both yaml2obj and llvm-readelf test suites that are using the EntSize key to set the sh_entsize.
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
---|---|---|
11 | Using of Sections: [] is not what you want, it doesn't remove the section headers. It can be used to create a SectionHeaderTable: Sections: [] Excluded: - Name: .strtab - Name: .shstrtab Though, you don`t actually need it, as you can write just the following to achieve the same: SectionHeaderTable: Excluded: - Name: .strtab - Name: .shstrtab So, to exclude the section header table you need to use NoHeaders: true. yaml2obj: error: excluded section referenced: '.text' by symbol 'foo' yaml2obj: error: excluded section referenced: '.data' by symbol 'bar' are because your dynamic symbols specify sections: DynamicSymbols: - Name: foo Type: STT_FUNC Section: .text Value: 0x100 Binding: 1 - Name: bar Type: STT_OBJECT Section: .data Value: 0x200 Binding: 1 Can't you just remove Section: .text and Section: .data? |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
734 | Actually, let's just change getDynSymtabSizeFromGnuHash to take a`uint8_t` for BufEnd. I think it's quite common in this code to work with bytes in that form. | |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
141–142 | I'm just suggesting using macros in the dynamic table. You can also use macros to avoid need for any extra YAML, I think. Just change the DT_* value to something arbitrary/omit individual sections from the section header table using the Exclude option of the SectionHeaderTable key etc depending on the specific example. |
I thought EntSize was automatically generated and could not be modified. Thanks for the note, it solved my issue.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
673–674 | Changed to preincrement | |
698 | The actual SecName is a bit complicated to get and when this error happens it also means the ELF file is probably corrupted, so I put SHT_DYNSYM here instead. Let me know if raw_string_ostream is not the best approach to build string here. | |
734 | Do you mean uint8_t *? Actually the problem is I forgot the const modifier. After adding this, it no longer need casting. | |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
11 | Removing Section: .text and Section: .data will causing llvm-elfabi's output indicate the symbols are undefined (which is true). I make it pointing something else. I think using the llvm-strip here is better. I need to do a lot of macro hacks to make this test to use a single ELF yaml input, while I can use a single one with llvm-strip. It looks cleaner. | |
141–142 | I see. I use macros in .dynamic and this approach only requires a single YAML input. |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
661 | Don't forget to clang-format this and other code again with your latest changes. | |
698 | You can use Twine instead, I think: return createStringError(object_error::parse_failed, "SHT_DYNSYM section has sh_size (" + Twine(Sec.sh_size) + ") % sh_entsize (" + Sec.sh_entsize + ") that is not 0"); You might need to do .str() on the end of the message too. | |
734 | Yeah, I meant uint8_t *, but adding the const is probably enough. | |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
11 | I'm not going to argue this strongly, but it's worth noting that llvm-objcopy and llvm-strip operations can have minor side effects, whereas with yaml2obj you have more precise control over what the output actually is. |
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test | ||
---|---|---|
11 | Changed to yaml2obj solution. |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
699–701 | Twine + Twine produces a new Twine, and IIRC string literals get converted to Twines to allow this to happen. Twine is intended for efficient string concatenation, so you concatenate all the Twine instances, and then convert it to a string using .str() as demonstrated. |
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
699–701 | Actually createStringError can take a const Twine & parameter directly, so I removed str() since it is not necessary. Does it look good? |
I don't think I've got any more comments. Please wait for others too though.
llvm/include/llvm/Object/ELF.h | ||
---|---|---|
699–701 | Yes, that looks fine. |
@MaskRay Do you have further comments on this change? Can I get an LGTM from you? Thanks.
The out-of-bounds read issue (by comparing the last two diffs) does not have a test. I am ok if @grimar or @jhenderson thinks this is fine.
nvm. I think the issue is pre-existing so it is probably fine. But it may deserve a test in a separate patch.
If there's missing test coverage, we should add it, though I'm also happy if it's a separate change, if the issue was preexisting.
I plan to land this patch since the issue raised is a preexisting one and I prefer to address it in a separate change.
What would be the best way to use the unit test to cover the out of bound issue (between Diff 9 and 10)? It looks like even with the small bug in Diff 9, the current unit test didn't raise any memory errors even though the pointer It was accessing the Buf.bytes_end().
I'm not 100% sure, but I think if you were to have an input where the hash table was the very last thing in the ELF (i.e. after all sections, and, if present, the section header table too), you might (sometimes) see a crash. Being a memory access issue, it's possible you won't see it every run, but I'd somewhat expect a sanitizer to complain at some point. You could also have a test case where it isn't the last thing in the input and the very next byte is something readable, but of a value where it does something funky. You probably also need to have your input corrupted so that the chain doesn't end before the end of the section.
@haowei Please re-test after rebasing. The SectionHeaderTable syntax has changed. I have fixed read-elf-dynsym.test
Don't forget to clang-format this and other code again with your latest changes.