This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Create name for fake sections
ClosedPublic

Authored by namhyung on Aug 5 2022, 1:36 PM.

Details

Summary

It doesn't have a section header string table so add a vector to have
the strings and create name based on the program header type and the
index.

Diff Detail

Event Timeline

namhyung created this revision.Aug 5 2022, 1:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 1:36 PM
namhyung requested review of this revision.Aug 5 2022, 1:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 1:36 PM

Two thoughts:

  1. Have you looked into using StringTableBuilder at all? I think you should be able to use that instead of hand-rolling your own code to achieve what looks to be the same thing.
  2. I'm not convinced by the naming scheme for the fake section names. Are you following any existing precedent for it? If not, I think we need it to be more obviously different to regular section names, as it doesn't seem unreasonable for people to have custom section names like "load0" for their sections. My immediate thought was something like PT_LOAD#0 for e.g. the PT_LOAD at index 0.

@MaskRay, any thoughts on the naming scheme?

  1. Have you looked into using StringTableBuilder at all?

Nope, will take a look!

  1. Are you following any existing precedent for it?

Yes, it follows binutils objdump.

Does GNU objdump use load0?

% objdump -h llvm/test/Object/Inputs/no-sections.elf-x86-64 

llvm/test/Object/Inputs/no-sections.elf-x86-64:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn

I think PT_LOAD#0 looks good to me, but I don't object to matching GNU objdump for the commonly used PT_LOAD.
There is a small issue that we have to invent names for PT_* values, for values that GNU objdump doesn't hard code.
This is probably an area whether 100% compatibility doesn't matter.

llvm/include/llvm/Object/ELF.h
188

Use SmallString<0> FakeSectionStrings.

Prefer SmallString because std::string has a small string optimization which isn't useful for this case.

Not sure what happened with the input data. But at least it worked with my test case and linux perf kcore image.

$ bin/yaml2obj ../llvm/test/tools/llvm-objdump/X86/disassemble-no-section.test -o nosec.elf

$ objdump -h nosec.elf

nosec.elf:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 load0         0000000a  ffffffff00000000  ffffffff00000000  00001000  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

$ objdump -d nosec.elf

nosec.elf:     file format elf64-x86-64


Disassembly of section load0:

ffffffff00000000 <load0>:
ffffffff00000000:	55                   	push   %rbp
ffffffff00000001:	48 89 e5             	mov    %rsp,%rbp
ffffffff00000004:	0f 1f 40 00          	nopl   0x0(%rax)
ffffffff00000008:	5d                   	pop    %rbp
ffffffff00000009:	c3                   	ret    
namhyung added inline comments.Aug 9 2022, 2:26 PM
llvm/include/llvm/Object/ELF.h
188

But I think it'd be better to use StringTableBuilder as @jhenderson said.

Does GNU objdump use load0?

% objdump -h llvm/test/Object/Inputs/no-sections.elf-x86-64 

llvm/test/Object/Inputs/no-sections.elf-x86-64:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn

I think PT_LOAD#0 looks good to me, but I don't object to matching GNU objdump for the commonly used PT_LOAD.
There is a small issue that we have to invent names for PT_* values, for values that GNU objdump doesn't hard code.
This is probably an area whether 100% compatibility doesn't matter.

I'm not sure we need to worry about inventing names for other PT_* values at this time, because fake sections are only created for executable PT_LOAD, if I'm not mistaken. That being said, I agree that it's unlikely it matters whether this is 100% compatible.

namhyung updated this revision to Diff 451643.Aug 10 2022, 2:53 PM

Updates

  • use StringTableBuilder and SmallString
  • fix bug in the p_type check
namhyung marked an inline comment as done.Aug 10 2022, 2:56 PM
namhyung added inline comments.
llvm/include/llvm/Object/ELF.h
783–784

I've found a bug in this code - it should check equality.

I was just experimenting with an admittedly old version of GNU objdump on my machine and that just tells me that there are no sections in the file and doesn't do any disassembly. It's possible it's just because it's that old version (I don't have a more recent version available currently). If this is available in GNU objdump, could you post the version it was added to, please?

That said, I think there is benefit from this regardless of the parity with GNU.

llvm/include/llvm/Object/ELF.h
780

This probably wants to be a size_t since it's an index.

781

Not sure you need the llvm:: prefixes?

793
llvm/test/Object/objdump-no-sectionheaders.test
0–59

I'd strongly support replacing the pre-canned ELF with a YAML input in this test. At the moment, I have no idea if the behaviour is correct without digging out a binary inspection tool to look at the ELF object.

Some key characteristics I think that need testing:

  1. That non-PT_LOAD segments are ignored (but still count towards the index), even if they have PF_X.
  2. That non-PF_X segments are ignored (but still count towards the index).
  3. That multiple PT_LOAD wiht PF_X segments can be displayed.

To achieve these points, I think it would be sufficient to have an ELF with the following program headers:

  1. PT_PHDR (or other PT_* type) with PF_X
  2. PT_LOAD with PF_X
  3. PT_LOAD without PF_X
  4. PT_LOAD with PF_X

If this is available in GNU objdump, could you post the version it was added to, please?

I don't know when it's added, it's hard to find.. But my version is 2.38.

namhyung@youngsil:llvm-project$ objdump --version
GNU objdump (GNU Binutils for Debian) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
llvm/include/llvm/Object/ELF.h
780

Ok.

781

It seems not. Will remove.

llvm/test/Object/objdump-no-sectionheaders.test
0–59

Sounds good!

MaskRay added inline comments.Aug 13 2022, 6:44 PM
llvm/include/llvm/Object/ELF.h
188

I think StringTableBuilder just adds complexity/overhead in this case. StringTableBuilder cannot deduplicate any string here (and the code does not need to optimize for size).

655

// There is no section name string table. Return FakeSectionStrings which is non-empty if we have created fake sections.

656

Just return FakeSectionStrings; and delete return ""

783–784

The convention does not place parens around !=

jhenderson added inline comments.Aug 15 2022, 12:28 AM
llvm/include/llvm/Object/ELF.h
188

I disagree that it adds complexity. It seems more intuitive to do StrTab.add(someString) than std::copy(someString.begin(), someString.end(), std::back_inserter(StrTab)); followed by a null terminator addition. I do agree that the optimizations are unnecessary. Maybe there's room for a "simple" string builder class that does none of the deduplication/optimization?

namhyung added inline comments.Aug 15 2022, 10:18 PM
llvm/include/llvm/Object/ELF.h
188

I agree with @jhenderson. The API in the StringTableBuilder is simpler and more intuitive to me. Also finalizeInOrder() is even not trying to optimize it.

783–784

Yeah, but I thought it'd good to be symmetric to the RHS.

namhyung updated this revision to Diff 452899.Aug 15 2022, 11:26 PM

Updated.

  • remove unnecessary llvm:: prefix
  • convert disassemble-no-section.test to use yaml2obj

Note that the test fails due to a mismatch on the first section name.
It should be PT_LOAD#1 but it shows PT_LOAD#3. I don't know why. :(

jhenderson added inline comments.Aug 16 2022, 12:34 AM
llvm/include/llvm/Object/ELF.h
655

@namhyung, I believe @MaskRay was suggesting using the comment he's posted instead of "no section string table", as an improvement. Please update.

783–784

Having the "symmetry" has the potential to actually cause confusion, since the LHS isn't negated unlike the RHS.

llvm/test/Object/objdump-no-sectionheaders.test
2
22

I think I see a potential problem here, which could explain the problem you're seeing: you haven't specified any contents or explicit offsets of any of these program headers, which may mean that even though they have addresses specified, they'll start at the same offset within the file. This might result in something bad such as only the one section appearing in the output, although I don't know the llvm-objdump code well enough to be certain.

namhyung added inline comments.Aug 16 2022, 11:50 PM
llvm/include/llvm/Object/ELF.h
655

Oh, I overlooked that, sorry. Will update.

llvm/test/Object/objdump-no-sectionheaders.test
2

ok

22

Will check with contents and offsets.

namhyung updated this revision to Diff 453210.Aug 17 2022, 12:23 AM

v3 updates. But the test still fails.

  • add content and offset to phdr in the test
  • update comments for fake section strings
  • remove unnecessary paranthesis
jhenderson added inline comments.Aug 17 2022, 12:58 AM
llvm/test/Object/objdump-no-sectionheaders.test
2

I misread this sentence. Here's a better improvement.

22

The Offset will physically impact the size of the file on disk, meaning this generated YAML file will be huge. Instead, use offset values around 0x100 - 0x200, which are significantly smaller (and adjust the addresses to line up too, I recommend) (you can't use 0 because of the ELF header and Program Header table). Also, the sizes of the individual blobs of data can be only a couple of bytes for the test purposes to be satisfied.

namhyung updated this revision to Diff 454387.Aug 21 2022, 11:10 PM

v4 updates. But the test still fails..

  • update comment in the test
  • use smaller offsets for the test input

Have you tried using a debugger to investigate the test failure?

I suspect the StringTableBuilder does something wrong, or my understanding is incorrect. I checked with a debugger that the sh_name offset set correctly, but the first string was "PT_LOAD#3" instead of "PT_LOAD#1". Investigating...

Sorry for the delay. I think I found the reason - it passed a stack-allocate string to the string table builder which keeps the cached string ref. So each section name pointed to the same place in stack and got overwritten. Will add a vector to hold the local strings.

namhyung updated this revision to Diff 457813.Sep 3 2022, 1:51 PM

v5 update. Now it passes the test!

  • use a vector for local section name strings
MaskRay accepted this revision.Sep 4 2022, 7:44 PM

LGTM.

llvm/include/llvm/Object/ELF.h
782

for (auto [Idx, Phdr] : llvm::enumerate(*PhdrsOrErr))

This revision is now accepted and ready to land.Sep 4 2022, 7:44 PM
jhenderson requested changes to this revision.Sep 5 2022, 12:44 AM
jhenderson added inline comments.
llvm/include/llvm/Object/ELF.h
794

I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet). Could you not just populate SecNames with the result of the "PT_LOAD#" + std::to_string(Idx) expression directly?

If the StringTableBuilder contains StringRefs to the strings, using a std::vector will potentially invalidate them, if it needs to increase its capacity, right? We probably want a different strategy here. Probably simplest is simply to give the vector an initial capacity equal to the total number of program headers. Given that there are rarely that many, this shouldn't have any significant impact on performance.

This revision now requires changes to proceed.Sep 5 2022, 12:44 AM
MaskRay added inline comments.Sep 5 2022, 2:19 PM
llvm/include/llvm/Object/ELF.h
794

Thanks for catching this. It is a code smell but it actually works: "PT_LOAD#" + std::to_string(Idx) returns a std::string. It works as the the lifetime of the temporary object std::string applies to the full-expression.

The following should be better:

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

// An alternative is ("PT_LOAD#" + Twine(Idx)).toVector(Name); but a local `SmallString` variable is needed
SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test failing again.

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test failing again.

Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test failing again.

Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

I think we should switch to what I originally suggested, drop StringTableBuilder and use a string with \0 terminators:)

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test failing again.

Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

I think we should switch to what I originally suggested, drop StringTableBuilder and use a string with \0 terminators:)

I'm actually inclined to agree - the "simplification" is turning out to be more complicated than I expected.

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test failing again.

Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

I think we should switch to what I originally suggested, drop StringTableBuilder and use a string with \0 terminators:)

I'm actually inclined to agree - the "simplification" is turning out to be more complicated than I expected.

Let me try one more time before going back. I think this failure came from the std::string implementation. From a quick glance it seems to store small strings inside the object, not in the external allocation. Then a later move due to vector resize will invalidate earlier references. But SmallString seems to store the data always in the external allocation so it'd be fine after move. Actually it passes the test when I change it like below

-  SmallVector<std::string, 0> SecNames;
+  SmallVector<SmallString<10>, 0> SecNames;

Let me know if I missed something.

I don't think you can rely on the behaviour of what happens to the strings when vectors are resized - I don't believe there's anything in the C++ standard or the existing type's definition that will guarantee the behaviour will remain the same. Also, your explanation sounds reversed to me, although I haven't looked at the implementation: I'd expecting SmallString to have a stack-allocated array for storing strings when the size is small, but std::string to always use the heap. In fact, on some implementations, I wouldn't be surprised if std::string also uses stack-allocation for short strings (I recall running into a problem very similar to yours along the same lines recently due to this sort of thing). Either way, it cannot be relied upon to keep your references valid.

namhyung added inline comments.Sep 7 2022, 12:37 AM
llvm/include/llvm/Object/ELF.h
794

I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet).

I don't think so. The StringRef was needed because the constructor of SmallString takes it only. I wanted to pass std::string directly but it's not supported IIUC. After it creates a SmallString, I don't use the StringRef for a temp string anymore. Instead it uses the SmallString data in the vector, that's why I passed SecNames.back(). I believe it's safe to refer the data even after vector is resized.

I don't think you can rely on the behaviour of what happens to the strings when vectors are resized - I don't believe there's anything in the C++ standard or the existing type's definition that will guarantee the behaviour will remain the same.

Fair enough. Let's go back to use a string directly. :)

namhyung updated this revision to Diff 458380.Sep 7 2022, 1:47 AM

v6 update.

  • do not use StringTableBuilder
jhenderson added inline comments.Sep 7 2022, 2:10 AM
llvm/include/llvm/Object/ELF.h
783–784

Do you actually need this?

namhyung added inline comments.Sep 7 2022, 7:35 AM
llvm/include/llvm/Object/ELF.h
783–784

Yep, Idx is a part of the section name.

jhenderson added inline comments.Sep 7 2022, 8:11 AM
llvm/include/llvm/Object/ELF.h
783–784

I wasn't referring to MaskRay's suggestion. I was referring to the line that's actually highlighted with this comment (i.e. line 778 in this diff).

namhyung added inline comments.Sep 7 2022, 10:10 AM
llvm/include/llvm/Object/ELF.h
783–784

Oh, I see. Maybe it's not needed but I think it's a good practice to maintain offset 0 as an empty string. Do you want me to remove it?

MaskRay added inline comments.Sep 7 2022, 1:45 PM
llvm/include/llvm/Object/ELF.h
783–784

FakeSectionStrings += '\0' or use push_back.

llvm/test/Object/objdump-no-sectionheaders.test
1–2

Nit: "This test " can be omitted. Use an imperative sentence.

jhenderson accepted this revision.Sep 8 2022, 12:03 AM

LGTM, with @MaskRay's nits addressed.

llvm/include/llvm/Object/ELF.h
783–784

Having come back to this a day later, I think it's wise to keep it, as we don't want to risk there being code that assumes sh_name == '\0' means no name.

llvm/test/Object/objdump-no-sectionheaders.test
1–2

Concretely: "Check llvm-objdump -h ..."

This revision is now accepted and ready to land.Sep 8 2022, 12:03 AM
namhyung updated this revision to Diff 458671.Sep 8 2022, 12:31 AM

v7 update.

  • use SmallString::operator+=()
  • update comment in the test

Thank you both for the review! Please commit with 'Namhyung Kim <namhyung@google.com>'.

This revision was landed with ongoing or failed builds.Sep 9 2022, 4:27 AM
This revision was automatically updated to reflect the committed changes.

Thank you both for the review! Please commit with 'Namhyung Kim <namhyung@google.com>'.

All submitted. Please keep an eye out for any bot failures.