This is an archive of the discontinued LLVM Phabricator instance.

[llvm-elfabi] Support ELF file that lacks .gnu.hash section
ClosedPublic

Authored by haowei on Dec 15 2020, 8:33 PM.

Details

Summary

Before this change, when reading ELF file, elfabi determines number of entries in .dynsym by reading the .gnu.hash section. This change makes elfabi read section headers directly. This change allows elfabi works on ELF files which do not have .gnu.hash sections.

Diff Detail

Event Timeline

haowei created this revision.Dec 15 2020, 8:33 PM
haowei requested review of this revision.Dec 15 2020, 8:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2020, 8:33 PM
pcc added a subscriber: pcc.Dec 15 2020, 8:57 PM
pcc added inline comments.
llvm/lib/InterfaceStub/ELFObjHandler.cpp
453

This code won't find the section header if section headers have been stripped. You'll need to bring back the code that you deleted and extend it to infer the number of symbols from the .hash section. In general you can expect a binary to have at least one of .hash or .gnu.hash if it has a symbol table because there isn't enough information to interpret the dynamic symbol table without a hash table and without reading the section headers.

s/.gnu_hash/.gnu.hash/

grimar added inline comments.Dec 16 2020, 12:31 AM
llvm/lib/InterfaceStub/ELFObjHandler.cpp
465

FTR, I think we should move this somewhere to libObject. This will be useful for llvm-readelf/obj which is currently supports inferring symbol table size from .hash, but not from `.gnu.hash.

What others said - there are six cases we should care about:

  1. Section headers are stripped and the ELF has no hash/gnu_hash dynamic tag - this can be treated as an error, I think since the ELF gABI requires a hash table.
  2. Section headers are stripped, but there is a DT_HASH
  3. Section headers are stripped, but there is a DT_GNU_HASH
  4. Section headers present but there is no .hash or .gnu.hash section - this might be treated as an error, I think since the ELF gABI requires a hash table.
  5. Section headers present with .hash section
  6. Section headers present with .gnu.hash section

I think all of the tools should handle all of these situations (llvm-readelf, llvm-objdump, llvm-elfabi, ...) so it definitely seems like this should be part of the Object library.

haowei updated this revision to Diff 312627.Dec 17 2020, 3:26 PM
haowei retitled this revision from [llvm-elfabi] Support ELF file that lacks .gnu_hash section to [llvm-elfabi] Support ELF file that lacks .gnu.hash section.
haowei edited the summary of this revision. (Show Details)

What others said - there are six cases we should care about:

  1. Section headers are stripped and the ELF has no hash/gnu_hash dynamic tag - this can be treated as an error, I think since the ELF gABI requires a hash table.
  2. Section headers are stripped, but there is a DT_HASH
  3. Section headers are stripped, but there is a DT_GNU_HASH
  4. Section headers present but there is no .hash or .gnu.hash section - this might be treated as an error, I think since the ELF gABI requires a hash table.
  5. Section headers present with .hash section
  6. Section headers present with .gnu.hash section

I think all of the tools should handle all of these situations (llvm-readelf, llvm-objdump, llvm-elfabi, ...) so it definitely seems like this should be part of the Object library.

So actually, the ELF stub generated by elfabi does not have DT_HASH or DT_GNU_HASH. We have test it they are not required in dynamic linking. It does have section headers. So in the new diff, I moved this function into llvm/include/llvm/Object/ELF.h, and it fallback to parse .gnu.hash and .hash when section headers are not available. It returns 0 if none is found and return an error only if parsing error happens.

What would be best way to test this piece of code? Shall I put a few ELF files in the test directory with section headers, .hash, .gnu.hash stripped and run elfabi on them?

llvm/lib/InterfaceStub/ELFObjHandler.cpp
453

I see. The reason we want to change is that we found a so file with empty dynsym and .gnu.hash caused assertion error in

llvm-project/llvm/include/llvm/Object/ELFTypes.h:542: ArrayRef<llvm::object::Elf_GnuHash_Impl::Elf_Word> llvm::object::Elf_GnuHash_Impl<llvm::object::ELFType<llvm::support::little, true> >::values(unsigned int) const [ELFT = llvm::object::ELFType<llvm::support::little, true>]: Assertion `DynamicSymCount >= symndx' failed.

Since elfabi is primarily used during linking, it is expected the input will have section headers so that why I made the change. In the new diff, I put this function into ELF.h and make it read section headers first and use .gnu.hash and .hash as backup.

465

I see. In the new diff, I put it into llvm/include/llvm/Object/ELF.h, is it a good place for it?

haowei added a comment.Jan 5 2021, 3:43 PM

friendly ping.

friendly ping.

Sorry for the delay. I and others have been off work for a couple of weeks or so over Christmas. I'll try to get to this sometime in the next couple of days or so.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

llvm/include/llvm/Object/ELF.h
562 ↗(On Diff #312627)

By my reading of this code, there is no need for separate LastSymIdx and BucketVal variables. Can you get rid of one?

jhenderson added inline comments.Jan 8 2021, 12:52 AM
llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

Seems like this might enter an infinite loop, if there's a corrupt section without a 0 value at the end of the chain.

584 ↗(On Diff #312627)

This should probably return an error if Sec.sh_size % Sec.sh_entsize != 0. If an earlier piece of code makes it impossible to hit that case, then it probably still deserves an assertion at least.

586 ↗(On Diff #312627)

This looks inverted to me?

Also add a blank line after each of the if statemetns and for loops, to better breka this up into logical chunks.

605 ↗(On Diff #312627)

Although it's not strictly needed, I'd add a break here too, to ensure there's no bug risk if this switch were to ever be extended.

grimar added inline comments.Jan 11 2021, 3:42 AM
llvm/include/llvm/Object/ELF.h
586 ↗(On Diff #312627)

You can use SectionsOrError->empty() for such checks.

596 ↗(On Diff #312627)

You don't need curly bracers around a single line.

599 ↗(On Diff #312627)

Don't use auto, because the type is probably not obvious?

608 ↗(On Diff #312627)

Just if (ElfGnuHash), the same below.

haowei updated this revision to Diff 315996.Jan 11 2021, 10:26 PM
haowei marked 8 inline comments as done.

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

You are right, but what would be the best way to prevent that? I thought about to determine the max bucket capacity by calculating the differences between indexes in the bucket. However, it is not guaranteed that hashes will evenly distributed in the buckets. It also won't work if this is only one bucket.

584 ↗(On Diff #312627)

I make it throw an error when the mod is not 0. Added brace since the return stmt takes 3 lines.

586 ↗(On Diff #312627)

Thanks for pointing out. I made a mistake here. Changed to empty() in latest diff.

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

What do you mean it seems upset? The review here looks fine to me.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.

llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

I know @grimar spent some time fixing loops in hash table code like this in llvm-readelf. Maybe he can provide some guidance there. If nothing else, we can at least make sure the hash table doesn't run off the end of the file.

grimar added inline comments.Jan 12 2021, 12:52 AM
llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

My understanding is that this loop might either run past the end of the section or past the end of the file. It seems that saying "infinite loop" is not really correct then: it should crash sooner ol later, or return a wrong result.

There are 2 cases:

  1. When we have a section header table, we can get the size of the hash table from there.
  2. When we don't have a section header table, we have no information about the hash table size and only can check that we don't go past the end of the file I believe.

Perhaps, getDynSymtabSizeFromGnuHash could also accept one more argument, like void *End or something alike,
which will either point past the end of the section or past the end of the file. The code could check that no overrun happens then.

May be it worth just to pass both Optional<uint64_t> SecSize and void *BufEnd for better diagnostics.

haowei updated this revision to Diff 316566.Jan 13 2021, 9:59 PM

I rebased the code, it seems upset the Phabricator since it does not distinguish between rebase changes and actual changes.

What do you mean it seems upset? The review here looks fine to me.

If I compare Diff 2 and Diff 3 in Phabricator, the result looks weird as rebase changes were included.

The testing for this is going to be somewhat similar to various llvm-readobj examples, which use yaml2obj to generate an ELF object with the required properties. For example, llvm\test\tools\llvm-readobj\ELF\dyn-symbols-size-from-hash-table.test contains testing for the llvm-readobj handling of dynamic symbols when there are no section headers. There are actually further improvements that can be made beyond that. For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

Do you mean I should also modify llvm-readobj so it use my getDynSymtabSize function to determine .dynsym size? I assume it currently has its own way to determine the number of dynamic symbols.

I'm not sure why you think that is what I mean. You should write a test for the tool you are modifying (llvm-elfabi, if I'm not mistaken), which uses an input object with the specific properties you care about. The test would make the input object using yaml2obj, just as the existing llvm-readobj tests that exercise similar functionality do. You can make the new llvm-elfabi test better than the existing llvm-readobj one, because further improvements have been made to yaml2obj since the llvm-readobj test was written. You don't need to modify llvm-readobj to do that.

Sorry, I misunderstood your suggestion. I thought you suggest to use llvm-readelf to verify the dynsym sizes.

I added a test to use llvm-elfabi to read elf files under 3 different conditions:

  • ELF contains section headers, .gnu.hash and .hash sections.
  • ELF contains .gnu.hash and .hash sections, section headers are stripped.
  • ELF contains .hash sections, section headers are stripped.

Please let me know if these tests are sufficient.

llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

I pass BufEnd into getDynSymtabSizeFromGnuHash function. If section headers are stripped, there seems to be no way to know the size limit of DynSym if the hash tables are corrupted. I think file's BufEnd might be the only choice.

grimar added inline comments.Jan 13 2021, 11:57 PM
llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

Shoudn't getDynSymtabSizeFromGnuHash report an error when attempts to read past the BufEnd?

haowei updated this revision to Diff 316579.Jan 14 2021, 12:10 AM
haowei marked an inline comment as done.
haowei added inline comments.
llvm/include/llvm/Object/ELF.h
566 ↗(On Diff #312627)

Changed in latest diff.

Is it important that the .dynsym is used directly when .hash/.gnu.hash is available? At the moment, your test doesn't really show whether it is used or whether you go straight to the dynamic tags for details, I believe - to do that, you'd need to have one given bogus values for the size, and show that this is not sued. Additionally, you have no testing for your error messages, nor do you have testing that shows that the .gnu.hash is used above the .hash section.

llvm/include/llvm/Object/ELF.h
541–542 ↗(On Diff #316579)

LLVM usually uses preincrement (i.e. ++LastSymIdx etc).

547 ↗(On Diff #316579)

I think this message would be better off being:

"no terminator found for GNU hash section before buffer end"

I don't think saying "malformed ELF" adds anything to the message. Also, don't use ".gnu.hash" unless you know that is definitely the section's name (it could be other names too technically). Ideally use the section name itself if it is easily available. Otherwise, just refer to it in general terms as in the example.

566 ↗(On Diff #316579)

Similar to above, but also I'd include the sh_size and sh_entsize values in this message, e.g. like the following pseudo code.

SecName + " section has sh_size (" + Sec.sh_size + ") % sh_entsize (" + Sec.sh_size + ") that is not 0"

602 ↗(On Diff #316579)

I don't think you need this cast?

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
1 ↗(On Diff #316579)
2 ↗(On Diff #316579)
3–4 ↗(On Diff #316579)

Alternatively, refer to them by their dynamic tags:

##  * Section headers are stripped but there is a DT_GNU_HASH dynamic tag.
##  * Section headers are stripped but there is a DT_HASH dynamic tag.

I think I prefer this form.

10 ↗(On Diff #316579)

You don't need to call llvm-strip - yaml2obj can emit an object without the section header table, as pointed out in my earlier comment:

For example, yaml2obj has recently gained the ability to not emit the section header table, rather than needing to use llvm-strip to remove it. See llvm\test\tools\yaml2obj\ELF\section-headers.yaml for an example.

39–41 ↗(On Diff #316579)
140–141 ↗(On Diff #316579)

By using yaml2obj macros, you can avoid having two nearly-identical YAML inputs. See the -D option in yaml2obj. This will also work with the "section headers present/not present options".

169–175 ↗(On Diff #316579)

I think this should be above the YAML input.

haowei marked an inline comment as done.Jan 14 2021, 4:59 PM

Is it important that the .dynsym is used directly when .hash/.gnu.hash is available? At the moment, your test doesn't really show whether it is used or whether you go straight to the dynamic tags for details, I believe - to do that, you'd need to have one given bogus values for the size, and show that this is not sued. Additionally, you have no testing for your error messages, nor do you have testing that shows that the .gnu.hash is used above the .hash section.

I tried a few ways to generate corrupted the ELF file with yaml2obj but the results are not quite good. For the case to test the error message when the terminator in DT_GNU_HASH is not available, the loop at L540 in ELF.h actually ended before It hit BufEnd. The reason is that .strtab and .shstrtab are placed at the end of elf file automatically by yaml2obj and there will almost always be cases to make (*It) & 1 != 0, preventing the code from reach the error message. I think the only way would be placing .gnu.hash at the end of the ELF, and remove the terminator, but that seems to be not possible with yaml2obj. For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

llvm/include/llvm/Object/ELF.h
602 ↗(On Diff #316579)

Without it, clang complains:

ELF.h:605:12: error: no matching function for call to 'getDynSymtabSizeFromGnuHash'

If I recall correctly, bytes_end() returns "unsigned char*"

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
3–4 ↗(On Diff #316579)

I will modify this in the next diff.

10 ↗(On Diff #316579)

If I add

SectionHeaderTable:
  Sections:  []

it will report errors:

yaml2obj: error: section '.text' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.data' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynsym' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.hash' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.gnu.hash' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynstr' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.dynamic' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.strtab' should be present in the 'Sections' or 'Excluded' lists
yaml2obj: error: section '.shstrtab' should be present in the 'Sections' or 'Excluded' lists

If I try to use:

SectionHeaderTable:
  NoHeaders: true

Similar errors will be raised:

yaml2obj: error: excluded section referenced: '.text'  by symbol 'foo'
yaml2obj: error: excluded section referenced: '.data'  by symbol 'bar'

That's why I used llvm-strip instead of using this new yaml2obj feature.

140–141 ↗(On Diff #316579)

The structure of DT_GNU_HASH is quite different from DT_HASH in yaml2obj if you compare L32 to L38 and L39-L51 . That's why I choose to put 2 elf YAML inputs instead of using marco. If you feel keeping these 2 sections in place and only reference one in the .dynamic section with macros is an acceptable approach, I can do that. Since I need to add a test case that section headers are available without any hash tables, I still have to put more than 1 YAML input in this test file.

grimar added a comment.EditedJan 14 2021, 11:37 PM

The reason is that .strtab and .shstrtab are placed at the end of elf file automatically by yaml2obj and there will almost always be cases to make (*It) & 1 != 0, preventing the code from reach the error message. I think the only way would be placing .gnu.hash at the end of the ELF, and remove the terminator, but that seems to be not possible with yaml2obj. For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

It is not a problem for yaml2obj to make the .gnu.hash to be the last section. Yes, it adds implicit sections like .strtab and .shstrtab at the end automatically, but you can explicitly specify them in the "Sections" at any position you need. E.g. before the .gnu.hash.
See broken-dynamic-reloc.test from llvm-readelf test suite:

## Place all implicit sections here to make the .dynsym section to be the
## last in the file. This makes the task of validating offsets a bit easier.
  - Name: .dynstr
    Type: SHT_STRTAB
  - Name: .strtab
    Type: SHT_STRTAB
  - Name: .shstrtab
    Type: SHT_STRTAB
  - Name:    .dynsym
    Type:    SHT_DYNSYM
    Flags:   [ SHF_ALLOC ]
    Address: 0x300
    Offset:  0x300

For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

Can you elaborate what the problem is? There are many examples in both yaml2obj and llvm-readelf test suites that are using the EntSize key to set the sh_entsize.

grimar added inline comments.Jan 14 2021, 11:51 PM
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
10 ↗(On Diff #316579)

Using of Sections: [] is not what you want, it doesn't remove the section headers. It can be used to create a
section headers table with a null section only:

SectionHeaderTable:
  Sections: []
  Excluded:
    - Name: .strtab
    - Name: .shstrtab

Though, you don`t actually need it, as you can write just the following to achieve the same:

SectionHeaderTable:
  Excluded:
    - Name: .strtab
    - Name: .shstrtab

So, to exclude the section header table you need to use NoHeaders: true.
The errors you see:

yaml2obj: error: excluded section referenced: '.text'  by symbol 'foo'
yaml2obj: error: excluded section referenced: '.data'  by symbol 'bar'

are because your dynamic symbols specify sections:

DynamicSymbols:
  - Name:    foo
    Type:    STT_FUNC
    Section: .text
    Value:   0x100
    Binding: 1
  - Name:    bar
    Type:    STT_OBJECT
    Section: .data
    Value:   0x200
    Binding: 1

Can't you just remove Section: .text and Section: .data?

jhenderson added inline comments.Jan 15 2021, 1:34 AM
llvm/include/llvm/Object/ELF.h
602 ↗(On Diff #316579)

Actually, let's just change getDynSymtabSizeFromGnuHash to take a`uint8_t` for BufEnd. I think it's quite common in this code to work with bytes in that form.

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
140–141 ↗(On Diff #316579)

I'm just suggesting using macros in the dynamic table. You can also use macros to avoid need for any extra YAML, I think. Just change the DT_* value to something arbitrary/omit individual sections from the section header table using the Exclude option of the SectionHeaderTable key etc depending on the specific example.

haowei updated this revision to Diff 317134.Jan 15 2021, 6:03 PM
haowei updated this revision to Diff 317139.Jan 15 2021, 6:29 PM
haowei marked 2 inline comments as done.

For the case to test error message when sh_size % sh_entsize is not 0, I didn't find any good examples that I can manipulate the sh_entsize through yaml2obj. Do you have good suggestions to write unit tests for these cases?

Can you elaborate what the problem is? There are many examples in both yaml2obj and llvm-readelf test suites that are using the EntSize key to set the sh_entsize.

I thought EntSize was automatically generated and could not be modified. Thanks for the note, it solved my issue.

llvm/include/llvm/Object/ELF.h
541–542 ↗(On Diff #316579)

Changed to preincrement

566 ↗(On Diff #316579)

The actual SecName is a bit complicated to get and when this error happens it also means the ELF file is probably corrupted, so I put SHT_DYNSYM here instead. Let me know if raw_string_ostream is not the best approach to build string here.

602 ↗(On Diff #316579)

Do you mean uint8_t *? Actually the problem is I forgot the const modifier. After adding this, it no longer need casting.

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
10 ↗(On Diff #316579)

Removing Section: .text and Section: .data will causing llvm-elfabi's output indicate the symbols are undefined (which is true). I make it pointing something else.

I think using the llvm-strip here is better. I need to do a lot of macro hacks to make this test to use a single ELF yaml input, while I can use a single one with llvm-strip. It looks cleaner.

140–141 ↗(On Diff #316579)

I see. I use macros in .dynamic and this approach only requires a single YAML input.

jhenderson added inline comments.Jan 18 2021, 12:31 AM
llvm/include/llvm/Object/ELF.h
529 ↗(On Diff #317139)

Don't forget to clang-format this and other code again with your latest changes.

566 ↗(On Diff #316579)

You can use Twine instead, I think:

return createStringError(object_error::parse_failed, "SHT_DYNSYM section has sh_size ("
  + Twine(Sec.sh_size) + ") % sh_entsize (" + Sec.sh_entsize + ") that is not 0");

You might need to do .str() on the end of the message too.

602 ↗(On Diff #316579)

Yeah, I meant uint8_t *, but adding the const is probably enough.

llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
10 ↗(On Diff #316579)

I'm not going to argue this strongly, but it's worth noting that llvm-objcopy and llvm-strip operations can have minor side effects, whereas with yaml2obj you have more precise control over what the output actually is.

haowei updated this revision to Diff 317738.Jan 19 2021, 6:09 PM
haowei marked 3 inline comments as done.
haowei added inline comments.
llvm/test/tools/llvm-elfabi/read-elf-dynsym.test
10 ↗(On Diff #316579)

Changed to yaml2obj solution.

jhenderson added inline comments.Jan 20 2021, 12:44 AM
llvm/include/llvm/Object/ELF.h
567–569 ↗(On Diff #317738)

Twine + Twine produces a new Twine, and IIRC string literals get converted to Twines to allow this to happen. Twine is intended for efficient string concatenation, so you concatenate all the Twine instances, and then convert it to a string using .str() as demonstrated.

haowei updated this revision to Diff 317803.Jan 20 2021, 1:21 AM
haowei added inline comments.
llvm/include/llvm/Object/ELF.h
567–569 ↗(On Diff #317738)

Actually createStringError can take a const Twine & parameter directly, so I removed str() since it is not necessary. Does it look good?

jhenderson accepted this revision.Jan 20 2021, 6:39 AM

I don't think I've got any more comments. Please wait for others too though.

llvm/include/llvm/Object/ELF.h
567–569 ↗(On Diff #317738)

Yes, that looks fine.

This revision is now accepted and ready to land.Jan 20 2021, 6:39 AM
MaskRay requested changes to this revision.Jan 20 2021, 10:35 AM
MaskRay added inline comments.
llvm/include/llvm/Object/ELF.h
541 ↗(On Diff #317803)

This looks wrong. It < BufEnd

545 ↗(On Diff #317803)

It >= BufEnd

This revision now requires changes to proceed.Jan 20 2021, 10:35 AM
haowei updated this revision to Diff 317953.Jan 20 2021, 11:25 AM
haowei marked 2 inline comments as done.

@MaskRay Do you have further comments on this change? Can I get an LGTM from you? Thanks.

@MaskRay Do you have further comments on this change? Can I get an LGTM from you? Thanks.

The out-of-bounds read issue (by comparing the last two diffs) does not have a test. I am ok if @grimar or @jhenderson thinks this is fine.

MaskRay accepted this revision.Jan 22 2021, 10:58 AM

nvm. I think the issue is pre-existing so it is probably fine. But it may deserve a test in a separate patch.

This revision is now accepted and ready to land.Jan 22 2021, 10:58 AM

If there's missing test coverage, we should add it, though I'm also happy if it's a separate change, if the issue was preexisting.

I plan to land this patch since the issue raised is a preexisting one and I prefer to address it in a separate change.

What would be the best way to use the unit test to cover the out of bound issue (between Diff 9 and 10)? It looks like even with the small bug in Diff 9, the current unit test didn't raise any memory errors even though the pointer It was accessing the Buf.bytes_end().

I plan to land this patch since the issue raised is a preexisting one and I prefer to address it in a separate change.

What would be the best way to use the unit test to cover the out of bound issue (between Diff 9 and 10)? It looks like even with the small bug in Diff 9, the current unit test didn't raise any memory errors even though the pointer It was accessing the Buf.bytes_end().

I'm not 100% sure, but I think if you were to have an input where the hash table was the very last thing in the ELF (i.e. after all sections, and, if present, the section header table too), you might (sometimes) see a crash. Being a memory access issue, it's possible you won't see it every run, but I'd somewhat expect a sanitizer to complain at some point. You could also have a test case where it isn't the last thing in the input and the very next byte is something readable, but of a value where it does something funky. You probably also need to have your input corrupted so that the chain doesn't end before the end of the section.

@haowei Please re-test after rebasing. The SectionHeaderTable syntax has changed. I have fixed read-elf-dynsym.test

@haowei Please re-test after rebasing. The SectionHeaderTable syntax has changed. I have fixed read-elf-dynsym.test

Thanks for fixing that. Sorry about the mess up.