This is an archive of the discontinued LLVM Phabricator instance.


Differential D131290

[llvm-objdump] Create name for fake sections
ClosedPublic

Authored by namhyung on Aug 5 2022, 1:36 PM.

Details

Reviewers

Bigcheese
MaskRay
jhenderson

Commits

rG43efb5e445fd: [llvm-objdump] Create name for fake sections

Summary

It doesn't have a section header string table so add a vector to have
the strings and create name based on the program header type and the
index.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

namhyung created this revision. Aug 5 2022, 1:36 PM

Herald added a project: Restricted Project. Aug 5 2022, 1:36 PM

Herald added subscribers: StephenFan, rupprecht.

namhyung requested review of this revision. Aug 5 2022, 1:36 PM

Herald added a subscriber: llvm-commits. Aug 5 2022, 1:36 PM

Harbormaster completed remote builds in B179583: Diff 450378. Aug 5 2022, 2:21 PM

Two thoughts:

Have you looked into using StringTableBuilder at all? I think you should be able to use that instead of hand-rolling your own code to achieve what looks to be the same thing.
I'm not convinced by the naming scheme for the fake section names. Are you following any existing precedent for it? If not, I think we need it to be more obviously different to regular section names, as it doesn't seem unreasonable for people to have custom section names like "load0" for their sections. My immediate thought was something like PT_LOAD#0 for e.g. the PT_LOAD at index 0.

@MaskRay, any thoughts on the naming scheme?

Have you looked into using StringTableBuilder at all?

Nope, will take a look!

Are you following any existing precedent for it?

Yes, it follows binutils objdump.

Does GNU objdump use load0?

% objdump -h llvm/test/Object/Inputs/no-sections.elf-x86-64 

llvm/test/Object/Inputs/no-sections.elf-x86-64:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn

I think PT_LOAD#0 looks good to me, but I don't object to matching GNU objdump for the commonly used PT_LOAD.
There is a small issue that we have to invent names for PT_* values, for values that GNU objdump doesn't hard code.
This is probably an area where 100% compatibility doesn't matter.

llvm/include/llvm/Object/ELF.h
186	Use `SmallString<0> FakeSectionStrings`. Prefer `SmallString` because `std::string` has a small string optimization which isn't useful for this case.

Not sure what happened with the input data. But at least it worked with my test case and linux perf kcore image.

$ bin/yaml2obj ../llvm/test/tools/llvm-objdump/X86/disassemble-no-section.test -o nosec.elf

$ objdump -h nosec.elf

nosec.elf:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 load0         0000000a  ffffffff00000000  ffffffff00000000  00001000  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

$ objdump -d nosec.elf

nosec.elf:     file format elf64-x86-64


Disassembly of section load0:

ffffffff00000000 <load0>:
ffffffff00000000:	55                   	push   %rbp
ffffffff00000001:	48 89 e5             	mov    %rsp,%rbp
ffffffff00000004:	0f 1f 40 00          	nopl   0x0(%rax)
ffffffff00000008:	5d                   	pop    %rbp
ffffffff00000009:	c3                   	ret

namhyung added inline comments. Aug 9 2022, 2:26 PM

llvm/include/llvm/Object/ELF.h
186	But I think it'd be better to use StringTableBuilder as @jhenderson said.

In D131290#3710905, @MaskRay wrote:
Does GNU objdump use load0?
% objdump -h llvm/test/Object/Inputs/no-sections.elf-x86-64 

llvm/test/Object/Inputs/no-sections.elf-x86-64:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
I think PT_LOAD#0 looks good to me, but I don't object to matching GNU objdump for the commonly used PT_LOAD.
There is a small issue that we have to invent names for PT_* values, for values that GNU objdump doesn't hard code.
This is probably an area where 100% compatibility doesn't matter.

I'm not sure we need to worry about inventing names for other PT_* values at this time, because fake sections are only created for executable PT_LOAD, if I'm not mistaken. That being said, I agree that it's unlikely it matters whether this is 100% compatible.

Updates

use StringTableBuilder and SmallString
fix bug in the p_type check

namhyung marked an inline comment as done. Aug 10 2022, 2:56 PM

namhyung added inline comments.

llvm/include/llvm/Object/ELF.h
778–780	I've found a bug in this code - it should check equality.
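
To illustrate the bug noted above: `p_type` is an enumerated value rather than a bit mask, so `p_type & PT_LOAD` is also non-zero for any other type whose low bit is set. The following is a minimal sketch, not the LLVM code; the `Phdr` struct and both function names are hypothetical stand-ins, and the constants are the generic ELF ABI values (PT_LOAD = 1, PT_INTERP = 3, PT_PHDR = 6, PF_X = 1).

```cpp
#include <cstdint>

// ELF constants as defined by the generic ABI.
constexpr uint32_t PT_LOAD = 1;
constexpr uint32_t PT_INTERP = 3;
constexpr uint32_t PT_PHDR = 6;
constexpr uint32_t PF_X = 1;

// Hypothetical stand-in for a program header; only the fields used here.
struct Phdr {
  uint32_t p_type;
  uint32_t p_flags;
};

// Buggy check: treats p_type as if it were a bit mask.
constexpr bool isExecLoadBuggy(const Phdr &P) {
  return (P.p_type & PT_LOAD) && (P.p_flags & PF_X);
}

// Fixed check: compares p_type for equality.
constexpr bool isExecLoad(const Phdr &P) {
  return P.p_type == PT_LOAD && (P.p_flags & PF_X);
}

// PT_INTERP (3) has the low bit set, so the buggy check accepts it.
static_assert(isExecLoadBuggy(Phdr{PT_INTERP, PF_X}), "false positive");
static_assert(!isExecLoad(Phdr{PT_INTERP, PF_X}), "rejected after the fix");
static_assert(isExecLoad(Phdr{PT_LOAD, PF_X}), "executable PT_LOAD accepted");
```

The buggy form only happens to reject types with an even value, such as PT_PHDR.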

Harbormaster completed remote builds in B180530: Diff 451643. Aug 10 2022, 4:54 PM

I was just experimenting with an admittedly old version of GNU objdump on my machine and that just tells me that there are no sections in the file and doesn't do any disassembly. It's possible it's just because it's that old version (I don't have a more recent version available currently). If this is available in GNU objdump, could you post the version it was added to, please?

That said, I think there is benefit from this regardless of the parity with GNU.

llvm/include/llvm/Object/ELF.h
778	This probably wants to be a size_t since it's an index.
779	Not sure you need the `llvm::` prefixes?
788
llvm/test/Object/objdump-no-sectionheaders.test
0–59	I'd strongly support replacing the pre-canned ELF with a YAML input in this test. At the moment, I have no idea if the behaviour is correct without digging out a binary inspection tool to look at the ELF object. Some key characteristics I think need testing:
  That non-PT_LOAD segments are ignored (but still count towards the index), even if they have PF_X.
  That non-PF_X segments are ignored (but still count towards the index).
  That multiple PT_LOAD with PF_X segments can be displayed.
To achieve these points, I think it would be sufficient to have an ELF with the following program headers:
  PT_PHDR (or other PT_* type) with PF_X
  PT_LOAD with PF_X
  PT_LOAD without PF_X
  PT_LOAD with PF_X
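
The selection rule described in this comment can be sketched as follows. This is only an illustration under assumptions: `Phdr`, `fakeSectionNames`, and `exampleHeaders` are hypothetical names, and the `PT_LOAD#<index>` spelling is the scheme proposed earlier in the review. The index is the position in the full program header table, so skipped headers still advance it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// ELF constants as defined by the generic ABI.
constexpr uint32_t PT_LOAD = 1;
constexpr uint32_t PT_PHDR = 6;
constexpr uint32_t PF_X = 1;
constexpr uint32_t PF_R = 4;

// Hypothetical stand-in for a program header; only the fields used here.
struct Phdr {
  uint32_t p_type;
  uint32_t p_flags;
};

// Returns one name per executable PT_LOAD segment. Idx counts every program
// header, including the ones that are skipped.
std::vector<std::string> fakeSectionNames(const std::vector<Phdr> &Phdrs) {
  std::vector<std::string> Names;
  for (size_t Idx = 0; Idx < Phdrs.size(); ++Idx) {
    const Phdr &P = Phdrs[Idx];
    if (P.p_type != PT_LOAD || !(P.p_flags & PF_X))
      continue;
    Names.push_back("PT_LOAD#" + std::to_string(Idx));
  }
  return Names;
}

// The four-header layout suggested in the comment.
std::vector<Phdr> exampleHeaders() {
  return {{PT_PHDR, PF_X}, {PT_LOAD, PF_X}, {PT_LOAD, PF_R}, {PT_LOAD, PF_X}};
}
```

For the four-header layout, only the headers at index 1 and index 3 pass both checks.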

If this is available in GNU objdump, could you post the version it was added to, please?

I don't know when it was added; it's hard to find. But my version is 2.38.

namhyung@youngsil:llvm-project$ objdump --version
GNU objdump (GNU Binutils for Debian) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.

llvm/include/llvm/Object/ELF.h
778	Ok.
779	It seems not. Will remove.
llvm/test/Object/objdump-no-sectionheaders.test
0–59	Sounds good!

MaskRay added inline comments. Aug 13 2022, 6:44 PM

llvm/include/llvm/Object/ELF.h
186	I think `StringTableBuilder` just adds complexity/overhead in this case. `StringTableBuilder` cannot deduplicate any string here (and the code does not need to optimize for size).
653	`// There is no section name string table. Return FakeSectionStrings which is non-empty if we have created fake sections.`
654	Just return `FakeSectionStrings;` and delete `return ""`
778–780	The convention does not place parens around `!=`

jhenderson added inline comments. Aug 15 2022, 12:28 AM

llvm/include/llvm/Object/ELF.h
186	I disagree that it adds complexity. It seems more intuitive to do `StrTab.add(someString)` than `std::copy(someString.begin(), someString.end(), std::back_inserter(StrTab));` followed by a null terminator addition. I do agree that the optimizations are unnecessary. Maybe there's room for a "simple" string builder class that does none of the deduplication/optimization?
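
As a rough idea of what such a "simple" string builder might look like, here is a sketch using only the standard library. `SimpleStringTable` is a hypothetical class, not an existing LLVM type; it does no deduplication or tail merging, and it keeps an empty string at offset 0 as in a regular ELF string table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical minimal string table: append-only, no deduplication.
class SimpleStringTable {
public:
  // Offset 0 holds an empty string.
  SimpleStringTable() : Data(1, '\0') {}

  // Appends Name plus a NUL terminator and returns the offset of Name.
  size_t add(std::string_view Name) {
    size_t Offset = Data.size();
    Data.append(Name);
    Data.push_back('\0');
    return Offset;
  }

  // Returns the NUL-terminated string starting at Offset. The view is only
  // valid until the next call to add(); the offset itself stays valid.
  std::string_view get(size_t Offset) const {
    assert(Offset < Data.size() && "offset out of range");
    return std::string_view(Data.c_str() + Offset);
  }

  const std::string &data() const { return Data; }

private:
  std::string Data;
};
```

Because callers keep offsets rather than pointers, growth of the underlying buffer does not invalidate anything they hold.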

namhyung added inline comments. Aug 15 2022, 10:18 PM

llvm/include/llvm/Object/ELF.h
186	I agree with @jhenderson. The API in the StringTableBuilder is simpler and more intuitive to me. Also, `finalizeInOrder()` does not even try to optimize it.
778–780	Yeah, but I thought it'd be good to be symmetric with the RHS.

Updated.

remove unnecessary llvm:: prefix
convert disassemble-no-section.test to use yaml2obj

Note that the test fails due to a mismatch on the first section name.
It should be PT_LOAD#1 but it shows PT_LOAD#3. I don't know why. :(

jhenderson added inline comments. Aug 16 2022, 12:34 AM

llvm/include/llvm/Object/ELF.h
653	@namhyung, I believe @MaskRay was suggesting using the comment he's posted instead of "no section string table", as an improvement. Please update.
778–780	Having the "symmetry" has the potential to actually cause confusion, since the LHS isn't negated unlike the RHS.
llvm/test/Object/objdump-no-sectionheaders.test
2
22	I think I see a potential problem here, which could explain the problem you're seeing: you haven't specified any contents or explicit offsets of any of these program headers, which may mean that even though they have addresses specified, they'll start at the same offset within the file. This might result in something bad such as only the one section appearing in the output, although I don't know the llvm-objdump code well enough to be certain.

Harbormaster completed remote builds in B181448: Diff 452899. Aug 16 2022, 12:34 AM

namhyung added inline comments. Aug 16 2022, 11:50 PM

llvm/include/llvm/Object/ELF.h
653	Oh, I overlooked that, sorry. Will update.
llvm/test/Object/objdump-no-sectionheaders.test
2	ok
22	Will check with contents and offsets.

v3 updates. But the test still fails.

add content and offset to phdr in the test
update comments for fake section strings
remove unnecessary parentheses

jhenderson added inline comments. Aug 17 2022, 12:58 AM

llvm/test/Object/objdump-no-sectionheaders.test
2	I misread this sentence. Here's a better improvement.
22	The `Offset` will physically impact the size of the file on disk, meaning this generated YAML file will be huge. Instead, use offset values around 0x100 - 0x200, which are significantly smaller (and adjust the addresses to line up too, I recommend) (you can't use 0 because of the ELF header and Program Header table). Also, the sizes of the individual blobs of data can be only a couple of bytes for the test purposes to be satisfied.

Harbormaster completed remote builds in B181690: Diff 453210. Aug 17 2022, 1:32 AM

v4 updates. But the test still fails..

update comment in the test
use smaller offsets for the test input

Harbormaster completed remote builds in B182509: Diff 454387. Aug 21 2022, 11:44 PM

Have you tried using a debugger to investigate the test failure?

I suspect the StringTableBuilder does something wrong, or my understanding is incorrect. I checked with a debugger that the sh_name offset was set correctly, but the first string was "PT_LOAD#3" instead of "PT_LOAD#1". Investigating...

Sorry for the delay. I think I found the reason: it passed a stack-allocated string to the string table builder, which keeps the cached string ref. So each section name pointed to the same place on the stack and got overwritten. Will add a vector to hold the local strings.

v5 update. Now it passes the test!

use a vector for local section name strings

Harbormaster completed remote builds in B184972: Diff 457813. Sep 3 2022, 3:38 PM

LGTM.

llvm/include/llvm/Object/ELF.h
777–778	`for (auto [Idx, Phdr] : llvm::enumerate(*PhdrsOrErr))`

This revision is now accepted and ready to land. Sep 4 2022, 7:44 PM

jhenderson requested changes to this revision. Sep 5 2022, 12:44 AM

jhenderson added inline comments.

llvm/include/llvm/Object/ELF.h
789	I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet). Could you not just populate `SecNames` with the result of the `"PT_LOAD#" + std::to_string(Idx)` expression directly? If the StringTableBuilder contains StringRefs to the strings, using a std::vector will potentially invalidate them, if it needs to increase its capacity, right? We probably want a different strategy here. Probably simplest is simply to give the vector an initial capacity equal to the total number of program headers. Given that there are rarely that many, this shouldn't have any significant impact on performance.
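
The initial-capacity idea can be sketched with standard containers. This assumes the names are held in a `std::vector<std::string>`; `buildNames` and its parameters are hypothetical. After `reserve(NumPhdrs)`, up to `NumPhdrs` calls to `push_back` do not reallocate, so views into earlier elements stay valid.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Fills Storage with one name per index and returns views into it.
// Storage owns the characters; the views are valid only while Storage is
// neither destroyed nor reallocated.
std::vector<std::string_view> buildNames(size_t NumPhdrs,
                                         std::vector<std::string> &Storage) {
  Storage.clear();
  // Reserving up front means the push_back calls below never exceed the
  // capacity, so references to earlier elements are not invalidated.
  Storage.reserve(NumPhdrs);
  std::vector<std::string_view> Views;
  Views.reserve(NumPhdrs);
  for (size_t Idx = 0; Idx < NumPhdrs; ++Idx) {
    Storage.push_back("PT_LOAD#" + std::to_string(Idx));
    Views.push_back(Storage.back());
  }
  return Views;
}
```

Without the `reserve` call the same loop could reallocate `Storage` partway through and leave earlier views dangling.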

This revision now requires changes to proceed. Sep 5 2022, 12:44 AM

MaskRay added inline comments. Sep 5 2022, 2:19 PM

llvm/include/llvm/Object/ELF.h
789	Thanks for catching this. It is a code smell but it actually works: `"PT_LOAD#" + std::to_string(Idx)` returns a `std::string`. It works as the lifetime of the temporary `std::string` object extends to the end of the full-expression. The following should be better:

SmallVector<std::string, 0> SecNames;
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());
// An alternative is ("PT_LOAD#" + Twine(Idx)).toVector(Name); but a local `SmallString` variable is needed
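
The lifetime point can be shown with standard types. This is a sketch with hypothetical function names (`addName`, `makeNames`), using `std::string_view` in place of `StringRef`: a view of a temporary is safe to read within the same full-expression, but not to keep.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Copies the characters behind View into owning storage.
void addName(std::vector<std::string> &Names, std::string_view View) {
  Names.emplace_back(View);
}

std::vector<std::string> makeNames(size_t Count) {
  std::vector<std::string> Names;
  for (size_t Idx = 0; Idx < Count; ++Idx) {
    // The std::string temporary produced by operator+ lives until the end
    // of this full-expression, so the view is valid while addName copies it.
    addName(Names, "PT_LOAD#" + std::to_string(Idx));
    // By contrast, keeping the view itself, e.g.
    //   std::string_view Bad = "PT_LOAD#" + std::to_string(Idx);
    // would leave Bad dangling as soon as that statement ends.
  }
  return Names;
}
```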

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

This makes the test fail again.

In D131290#3771385, @namhyung wrote:
SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());
This makes the test fail again.

Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

In D131290#3771459, @jhenderson wrote:
Please see the second half of my previous comment re. vector resizing (can apply to SmallVector too potentially). I would expect that to be a problem.

I think we should switch to what I originally suggested, drop StringTableBuilder and use a string with \0 terminators:)

In D131290#3771470, @MaskRay wrote:
I think we should switch to what I originally suggested, drop StringTableBuilder and use a string with \0 terminators:)

I'm actually inclined to agree - the "simplification" is turning out to be more complicated than I expected.

In D131290#3771526, @jhenderson wrote:
I'm actually inclined to agree - the "simplification" is turning out to be more complicated than I expected.

Let me try one more time before going back. I think this failure came from the std::string implementation. From a quick glance it seems to store small strings inside the object, not in the external allocation. Then a later move due to vector resize will invalidate earlier references. But SmallString seems to store the data always in the external allocation so it'd be fine after move. Actually it passes the test when I change it like below

-  SmallVector<std::string, 0> SecNames;
+  SmallVector<SmallString<10>, 0> SecNames;

Let me know if I missed something.

I don't think you can rely on the behaviour of what happens to the strings when vectors are resized - I don't believe there's anything in the C++ standard or the existing type's definition that will guarantee the behaviour will remain the same. Also, your explanation sounds reversed to me, although I haven't looked at the implementation: I'd expect SmallString to have a stack-allocated array for storing strings when the size is small, but std::string to always use the heap. In fact, on some implementations, I wouldn't be surprised if std::string also uses stack-allocation for short strings (I recall running into a problem very similar to yours along the same lines recently due to this sort of thing). Either way, it cannot be relied upon to keep your references valid.
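
Whether a given `std::string` keeps its characters inside the object can be probed with a small helper. This is a sketch only: `storesInline` is a hypothetical name, and the result for short strings is an implementation detail (common standard libraries store them inline, but the standard does not require it). When a string that stores its characters inline is moved during a vector reallocation, the characters move with the object, so pointers to the old location dangle.

```cpp
#include <cstdint>
#include <string>

// Returns true if S keeps its characters inside the std::string object
// itself rather than in a separate allocation.
bool storesInline(const std::string &S) {
  auto Obj = reinterpret_cast<std::uintptr_t>(&S);
  auto Chars = reinterpret_cast<std::uintptr_t>(S.data());
  return Chars >= Obj && Chars < Obj + sizeof(std::string);
}
```

A call such as `storesInline(std::string("PT_LOAD#1"))` reports what the local implementation does for a name of that length; a string much longer than `sizeof(std::string)` cannot fit inline.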

namhyung added inline comments. Sep 7 2022, 12:37 AM

llvm/include/llvm/Object/ELF.h
789	> I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet).

I don't think so. The StringRef was needed because the SmallString constructor only takes a StringRef. I wanted to pass a std::string directly, but that's not supported IIUC. After it creates a SmallString, I don't use the StringRef to the temporary string anymore. Instead it uses the SmallString data in the vector; that's why I passed `SecNames.back()`. I believe it's safe to refer to the data even after the vector is resized.

In D131290#3773833, @jhenderson wrote:

I don't think you can rely on the behaviour of what happens to the strings when vectors are resized - I don't believe there's anything in the C++ standard or the existing type's definition that will guarantee the behaviour will remain the same.

Fair enough. Let's go back to use a string directly. :)

v6 update.

do not use StringTableBuilder

jhenderson added inline comments. Sep 7 2022, 2:10 AM

llvm/include/llvm/Object/ELF.h
778	Do you actually need this?

Harbormaster completed remote builds in B185367: Diff 458380. Sep 7 2022, 2:52 AM

namhyung added inline comments. Sep 7 2022, 7:35 AM

llvm/include/llvm/Object/ELF.h
778	Yep, Idx is a part of the section name.

jhenderson added inline comments. Sep 7 2022, 8:11 AM

llvm/include/llvm/Object/ELF.h
778	I wasn't referring to MaskRay's suggestion. I was referring to the line that's actually highlighted with this comment (i.e. line 778 in this diff).

namhyung added inline comments. Sep 7 2022, 10:10 AM

llvm/include/llvm/Object/ELF.h
778	Oh, I see. Maybe it's not needed but I think it's a good practice to maintain offset 0 as an empty string. Do you want me to remove it?

MaskRay added inline comments. Sep 7 2022, 1:45 PM

llvm/include/llvm/Object/ELF.h
778	`FakeSectionStrings += '\0'` or use push_back.
llvm/test/Object/objdump-no-sectionheaders.test
1	Nit: "This test " can be omitted. Use an imperative sentence.

LGTM, with @MaskRay's nits addressed.

llvm/include/llvm/Object/ELF.h
778	Having come back to this a day later, I think it's wise to keep it, as we don't want to risk there being code that assumes `sh_name == '\0'` means no name.
llvm/test/Object/objdump-no-sectionheaders.test
1	Concretely: "Check llvm-objdump -h ..."

This revision is now accepted and ready to land. Sep 8 2022, 12:03 AM

v7 update.

use SmallString::operator+=()
update comment in the test

Thank you both for the review! Please commit with 'Namhyung Kim <namhyung@google.com>'.

Harbormaster completed remote builds in B185576: Diff 458671. Sep 8 2022, 1:27 AM

This revision was landed with ongoing or failed builds. Sep 9 2022, 4:27 AM

Closed by commit rG43efb5e445fd: [llvm-objdump] Create name for fake sections (authored by namhyung, committed by jhenderson).

This revision was automatically updated to reflect the committed changes.

jhenderson added a commit: rG43efb5e445fd: [llvm-objdump] Create name for fake sections.

In D131290#3776478, @namhyung wrote:

Thank you both for the review! Please commit with 'Namhyung Kim <namhyung@google.com>'.

All submitted. Please keep an eye out for any bot failures.

Revision Contents

Path                                                            Size
llvm/include/llvm/Object/ELF.h                                  18 lines
llvm/test/Object/objdump-no-sectionheaders.test                 65 lines
llvm/test/tools/llvm-objdump/X86/disassemble-no-section.test    8 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp                        6 lines

Diff 458380

llvm/include/llvm/Object/ELF.h

//===- ELF.h - ELF object file implementation -------------------*- C++ -*-===// //===- ELF.h - ELF object file implementation -------------------*- C++ -*-===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// //

// This file declares the ELFFile template class. // This file declares the ELFFile template class.

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#ifndef LLVM_OBJECT_ELF_H #ifndef LLVM_OBJECT_ELF_H

#define LLVM_OBJECT_ELF_H #define LLVM_OBJECT_ELF_H

#include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/ArrayRef.h"

#include "llvm/ADT/SmallString.h"

#include "llvm/ADT/SmallVector.h" #include "llvm/ADT/SmallVector.h"

#include "llvm/ADT/StringRef.h" #include "llvm/ADT/StringRef.h"

#include "llvm/BinaryFormat/ELF.h" #include "llvm/BinaryFormat/ELF.h"

#include "llvm/Object/ELFTypes.h" #include "llvm/Object/ELFTypes.h"

#include "llvm/Object/Error.h" #include "llvm/Object/Error.h"

#include "llvm/Support/Endian.h" #include "llvm/Support/Endian.h"

#include "llvm/Support/Error.h" #include "llvm/Support/Error.h"

#include <cassert> #include <cassert>

▲ Show 20 Lines • Show All 152 Lines • ▼ Show 20 Lines public:

const uint8_t *base() const { return Buf.bytes_begin(); } const uint8_t *base() const { return Buf.bytes_begin(); }

const uint8_t *end() const { return base() + getBufSize(); } const uint8_t *end() const { return base() + getBufSize(); }

size_t getBufSize() const { return Buf.size(); } size_t getBufSize() const { return Buf.size(); }

private: private:

StringRef Buf; StringRef Buf;

std::vector<Elf_Shdr> FakeSections; std::vector<Elf_Shdr> FakeSections;

SmallString<0> FakeSectionStrings;

MaskRayUnsubmitted

Done

Use SmallString<0> FakeSectionStrings.

Prefer SmallString because std::string has a small string optimization which isn't useful for this case.

MaskRay: Use `SmallString<0> FakeSectionStrings`. Prefer `SmallString` because `std::string` has a…

namhyungAuthorUnsubmitted

Done

But I think it'd be better to use StringTableBuilder as @jhenderson said.

namhyung: But I think it'd be better to use StringTableBuilder as @jhenderson said.

MaskRayUnsubmitted

Not Done

I think StringTableBuilder just adds complexity/overhead in this case. StringTableBuilder cannot deduplicate any string here (and the code does not need to optimize for size).

MaskRay: I think `StringTableBuilder` just adds complexity/overhead in this case. `StringTableBuilder`…

jhendersonUnsubmitted

Not Done

I disagree that it adds complexity. It seems more intuitive to do StrTab.add(someString) than std::copy(someString.begin(), someString.end(), std::back_inserter(StrTab)); followed by a null terminator addition. I do agree that the optimizations are unnecessary. Maybe there's room for a "simple" string builder class that does none of the deduplication/optimization?

jhenderson: I disagree that it adds complexity. It seems more intuitive to do `StrTab.add(someString)` than…

namhyungAuthorUnsubmitted

Not Done

I agree with @jhenderson. The API in the StringTableBuilder is simpler and more intuitive to me. Also finalizeInOrder() is even not trying to optimize it.

namhyung: I agree with @jhenderson. The API in the StringTableBuilder is simpler and more intuitive to…

ELFFile(StringRef Object); ELFFile(StringRef Object);

public: public:

const Elf_Ehdr &getHeader() const { const Elf_Ehdr &getHeader() const {

return *reinterpret_cast<const Elf_Ehdr *>(base()); return *reinterpret_cast<const Elf_Ehdr *>(base());

} }

▲ Show 20 Lines • Show All 450 Lines • ▼ Show 20 Lines if (Index == ELF::SHN_XINDEX) {

// header at index 0. // header at index 0.

if (Sections.empty()) if (Sections.empty())

return createError( return createError(

"e_shstrndx == SHN_XINDEX, but the section header table is empty"); "e_shstrndx == SHN_XINDEX, but the section header table is empty");

Index = Sections[0].sh_link; Index = Sections[0].sh_link;

} }

if (!Index) // no section string table. // There is no section name string table. Return FakeSectionStrings which

MaskRayUnsubmitted

Not Done

// There is no section name string table. Return FakeSectionStrings which is non-empty if we have created fake sections.

MaskRay: `// There is no section name string table. Return FakeSectionStrings which is non-empty if we…

jhendersonUnsubmitted

Not Done

@namhyung, I believe @MaskRay was suggesting using the comment he's posted instead of "no section string table", as an improvement. Please update.

jhenderson: @namhyung, I believe @MaskRay was suggesting using the comment he's posted instead of "no…

namhyungAuthorUnsubmitted

Done

Oh, I overlooked that, sorry. Will update.

namhyung: Oh, I overlooked that, sorry. Will update.

return ""; // is non-empty if we have created fake sections.

MaskRayUnsubmitted

Not Done

Just return FakeSectionStrings; and delete return ""

MaskRay: Just return `FakeSectionStrings;` and delete `return ""`

if (!Index)

return FakeSectionStrings;

if (Index >= Sections.size()) if (Index >= Sections.size())

return createError("section header string table index " + Twine(Index) + return createError("section header string table index " + Twine(Index) +

" does not exist"); " does not exist");

return getStringTable(Sections[Index], WarnHandler); return getStringTable(Sections[Index], WarnHandler);

} }

/// This function finds the number of dynamic symbols using a GNU hash table. /// This function finds the number of dynamic symbols using a GNU hash table.

/// ///

▲ Show 20 Lines • Show All 103 Lines • ▼ Show 20 Lines

/// disassemble objects without a section header table (e.g. ET_CORE objects /// disassemble objects without a section header table (e.g. ET_CORE objects

/// analyzed by linux perf or ET_EXEC with llvm-strip --strip-sections). /// analyzed by linux perf or ET_EXEC with llvm-strip --strip-sections).

template <class ELFT> void ELFFile<ELFT>::createFakeSections() { template <class ELFT> void ELFFile<ELFT>::createFakeSections() {

if (!FakeSections.empty()) if (!FakeSections.empty())

return; return;

auto PhdrsOrErr = program_headers(); auto PhdrsOrErr = program_headers();

if (!PhdrsOrErr) if (!PhdrsOrErr)

return; return;

for (auto Phdr : *PhdrsOrErr) { FakeSectionStrings.append(StringRef("\0", 1));

jhendersonUnsubmitted

Not Done

This probably wants to be a size_t since it's an index.

jhenderson: This probably wants to be a size_t since it's an index.

namhyungAuthorUnsubmitted

Done

Ok.

namhyung: Ok.

MaskRayUnsubmitted

Not Done

for (auto [Idx, Phdr] : llvm::enumerate(*PhdrsOrErr))

MaskRay: `for (auto [Idx, Phdr] : llvm::enumerate(*PhdrsOrErr))`

jhendersonUnsubmitted

Not Done

Do you actually need this?

jhenderson: Do you actually need this?

namhyungAuthorUnsubmitted

Done

Yep, Idx is a part of the section name.

namhyung: Yep, Idx is a part of the section name.

jhendersonUnsubmitted

Not Done

I wasn't referring to MaskRay's suggestion. I was referring to the line that's actually highlighted with this comment (i.e. line 778 in this diff).

jhenderson: I wasn't referring to MaskRay's suggestion. I was referring to the line that's actually…

namhyungAuthorUnsubmitted

Done

Oh, I see. Maybe it's not needed but I think it's a good practice to maintain offset 0 as an empty string. Do you want me to remove it?

namhyung: Oh, I see. Maybe it's not needed but I think it's a good practice to maintain offset 0 as an…

jhendersonUnsubmitted

Not Done

Having come back to this a day later, I think it's wise to keep it, as we don't want to risk there being code that assumes sh_name == '\0' means no name.

jhenderson: Having come back to this a day later, I think it's wise to keep it, as we don't want to risk…

MaskRayUnsubmitted

Not Done

FakeSectionStrings += '\0' or use push_back.

MaskRay: `FakeSectionStrings += '\0'` or use push_back.

if (!(Phdr.p_type & ELF::PT_LOAD) || !(Phdr.p_flags & ELF::PF_X)) for (auto [Idx, Phdr] : llvm::enumerate(*PhdrsOrErr)) {

jhendersonUnsubmitted

Not Done

Not sure you need the llvm:: prefixes?

jhenderson: Not sure you need the `llvm::` prefixes?

namhyungAuthorUnsubmitted

Done

It seems not. Will remove.

namhyung: It seems not. Will remove.

if (Phdr.p_type != ELF::PT_LOAD || !(Phdr.p_flags & ELF::PF_X))

namhyungAuthorUnsubmitted

Done

I've found a bug in this code - it should check equality.

namhyung: I've found a bug in this code - it should check equality.

MaskRayUnsubmitted

Not Done

The convention does not place parens around !=

MaskRay: The convention does not place parens around `!=`

namhyungAuthorUnsubmitted

Done

Yeah, but I thought it'd good to be symmetric to the RHS.

namhyung: Yeah, but I thought it'd good to be symmetric to the RHS.

jhendersonUnsubmitted

Not Done

Having the "symmetry" has the potential to actually cause confusion, since the LHS isn't negated unlike the RHS.

jhenderson: Having the "symmetry" has the potential to actually cause confusion, since the LHS isn't…

continue; continue;

Elf_Shdr FakeShdr = {}; Elf_Shdr FakeShdr = {};

FakeShdr.sh_type = ELF::SHT_PROGBITS; FakeShdr.sh_type = ELF::SHT_PROGBITS;

FakeShdr.sh_flags = ELF::SHF_ALLOC | ELF::SHF_EXECINSTR; FakeShdr.sh_flags = ELF::SHF_ALLOC | ELF::SHF_EXECINSTR;

FakeShdr.sh_addr = Phdr.p_vaddr; FakeShdr.sh_addr = Phdr.p_vaddr;

FakeShdr.sh_size = Phdr.p_memsz; FakeShdr.sh_size = Phdr.p_memsz;

FakeShdr.sh_offset = Phdr.p_offset; FakeShdr.sh_offset = Phdr.p_offset;

// Create a section name based on the p_type and index.

jhendersonUnsubmitted

Not Done

FakeShdr.sh_offset = Phdr.p_offset;

- // Create a section name based on the p_type and index

+ // Create a section name based on the p_type and index.

FakeShdr.sh_name = StrTab.add("PT_LOAD#" + std::to_string(Idx));

jhenderson:

FakeShdr.sh_name = FakeSectionStrings.size();

jhendersonUnsubmitted

Not Done

I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet). Could you not just populate SecNames with the result of the "PT_LOAD#" + std::to_string(Idx) expression directly?

If the StringTableBuilder contains StringRefs to the strings, using a std::vector will potentially invalidate them, if it needs to increase its capacity, right? We probably want a different strategy here. Probably simplest is simply to give the vector an initial capacity equal to the total number of program headers. Given that there are rarely that many, this shouldn't have any significant impact on performance.


MaskRay

Not Done

Thanks for catching this. It is a code smell, but it actually works: "PT_LOAD#" + std::to_string(Idx) returns a std::string, and the lifetime of that temporary std::string extends to the end of the full-expression.

The following should be better:

SmallVector<std::string, 0> SecNames; 
...
SecNames.push_back(("PT_LOAD#" + Twine(Idx)).str());

// An alternative is ("PT_LOAD#" + Twine(Idx)).toVector(Name); but a local `SmallString` variable is needed
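The lifetime rule mentioned above can be illustrated with standard library types only. This is a sketch under assumptions: `std::string_view` stands in for `StringRef`, `std::string` for `SmallString`, and both function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Works: the temporary std::string lives until the end of the
// full-expression, so the view is still valid while the std::string
// constructor copies from it.
std::string copyWithinFullExpression(unsigned Idx) {
  std::string Owned(std::string_view("PT_LOAD#" + std::to_string(Idx)));
  assert(Owned.rfind("PT_LOAD#", 0) == 0 && "copy kept the prefix");
  return Owned;
}

// Preferred shape (mirrors the SecNames suggestion): build the owned string
// and store it directly, with no intermediate view at all.
void appendName(std::vector<std::string> &SecNames, unsigned Idx) {
  SecNames.push_back("PT_LOAD#" + std::to_string(Idx));
}

// Unsafe pattern, shown only as a comment because using it is undefined
// behaviour:
//   std::string_view Dangling = "PT_LOAD#" + std::to_string(Idx);
//   use(Dangling); // the temporary was destroyed at the previous semicolon
```

The first function is the "code smell that works" case; the second avoids the question entirely because no non-owning view ever refers to a temporary.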


namhyung (Author)

Done

> I'm pretty sure this isn't going to work as intended in many cases: you create a StringRef to a std::string temporary, and stick that in SecNames. It's likely that the StringRef will be pointing at invalid memory immediately after its construction (it's likely just working because nothing has scribbled over the data in memory yet).

I don't think so. The StringRef was needed because the SmallString constructor only takes a StringRef. I wanted to pass the std::string directly, but that's not supported IIUC. Once the SmallString has been created, I no longer use the StringRef to the temporary string. Instead, the code uses the SmallString data in the vector, which is why I passed SecNames.back(). I believe it's safe to refer to that data even after the vector is resized.


FakeSectionStrings.append(StringRef(("PT_LOAD#" + Twine(Idx)).str()));

FakeSectionStrings.append(StringRef("\0", 1));

FakeSections.push_back(FakeShdr);
}
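The final version stores each name in one flat buffer and records the name's starting offset in sh_name. A minimal sketch of that layout follows, assuming a plain `std::string` as the buffer; `FakeStringTable`, `add`, and `get` are hypothetical names and not part of `ELFFile`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for the FakeSectionStrings buffer in the diff.
struct FakeStringTable {
  std::string Data;

  // Appends Name plus a terminating NUL and returns the offset at which the
  // name starts; that offset is what would be stored in sh_name.
  std::uint32_t add(std::string_view Name) {
    auto Offset = static_cast<std::uint32_t>(Data.size());
    Data.append(Name);
    Data.push_back('\0');
    return Offset;
  }

  // Reads the NUL-terminated name that starts at Offset. The returned view
  // is only valid until the table is modified again.
  std::string_view get(std::uint32_t Offset) const {
    assert(Offset < Data.size() && "offset outside the string table");
    return std::string_view(Data.c_str() + Offset);
  }
};
```

Because section headers hold offsets rather than pointers, growing the buffer never invalidates a previously recorded sh_name, which sidesteps the dangling-reference concern raised earlier in the review.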

template <class ELFT>
Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
  const uintX_t SectionTableOffset = getHeader().e_shoff;
  if (SectionTableOffset == 0) {


llvm/test/Object/objdump-no-sectionheaders.test

- ; RUN: llvm-objdump -h %p/Inputs/no-sections.elf-x86-64 \
+ ## This test checks llvm-objdump -h can handle ELF files without section info.

MaskRay

Not Done

Nit: "This test " can be omitted. Use an imperative sentence.


jhenderson

Not Done

Concretely: "Check llvm-objdump -h ..."


- ; RUN: | FileCheck %s
+ ## Only PT_LOAD segments with the PF_X flag will be displayed as fake sections.

jhenderson

Not Done

## This test checks llvm-objdump -h can handle ELF files without section info.

- ## Only PT_LOAD program with PF_X flag will be display corresponding fake sections.

+ ## Only PT_LOAD program with PF_X flag will display corresponding fake sections.

# RUN: yaml2obj %s -o %t


namhyung (Author)

Done

namhyung: ok

jhenderson

Not Done

## This test checks llvm-objdump -h can handle ELF files without section info.

- ## Only PT_LOAD program with PF_X flag will display corresponding fake sections.

+ ## Only PT_LOAD segments with the PF_X flag will be displayed as fake sections.

# RUN: yaml2obj %s -o %t

I misread this sentence. Here's a better improvement.


- ; CHECK: Sections:
- ; CHECK-NEXT: Idx Name Size VMA Type
- ; CHECK-NEXT: 0 000006ec 0000000000400000 TEXT
- ; CHECK-NEXT: 1 00000000 0000000000000000 TEXT
- ; CHECK-NOT: {{.}}
+ # RUN: yaml2obj %s -o %t
+ # RUN: llvm-objdump -h %t | FileCheck %s
+
+ # CHECK: Sections:
+ # CHECK-NEXT: Idx Name Size VMA Type
+ # CHECK-NEXT: 0 PT_LOAD#1 00000100 0000000000400000 TEXT
+ # CHECK-NEXT: 1 PT_LOAD#3 00000200 0000000000600400 TEXT
+ # CHECK-NOT: {{.}}

!ELF
FileHeader:
  Class: ELFCLASS64
  Data: ELFDATA2LSB
  Type: ET_CORE
  Machine: EM_X86_64
Sections:
  - Type: SectionHeaderTable
    NoHeaders: true
  - Type: Fill

jhenderson

Not Done

I think I see a potential problem here, which could explain the problem you're seeing: you haven't specified any contents or explicit offsets of any of these program headers, which may mean that even though they have addresses specified, they'll start at the same offset within the file. This might result in something bad such as only the one section appearing in the output, although I don't know the llvm-objdump code well enough to be certain.


namhyung (Author)

Done

Will check with contents and offsets.


jhenderson

Not Done

The Offset will physically impact the size of the file on disk, meaning this generated YAML file will be huge. Instead, use offset values around 0x100 - 0x200, which are significantly smaller (and adjust the addresses to line up too, I recommend) (you can't use 0 because of the ELF header and Program Header table). Also, the sizes of the individual blobs of data can be only a couple of bytes for the test purposes to be satisfied.


    Name: code1
    Pattern: "cc"
    Size: 0x100
    Offset: 0x200
  - Type: Fill
    Name: data1
    Pattern: "aa55"
    Size: 0x100
    Offset: 0x300
  - Type: Fill
    Name: code2
    Pattern: "ff"
    Size: 0x200
    Offset: 0x400
ProgramHeaders:
  - Type: PT_PHDR
    Flags: [ PF_X ]
    VAddr: 0x400000
    MemSize: 0x100
  - Type: PT_LOAD
    Flags: [ PF_X ]
    VAddr: 0x400000
    MemSize: 0x100
    FirstSec: code1
    LastSec: code1
  - Type: PT_LOAD
    Flags: [ PF_R ]
    VAddr: 0x500300
    MemSize: 0x100
    FirstSec: data1
    LastSec: data1
  - Type: PT_LOAD
    Flags: [ PF_R, PF_X ]
    VAddr: 0x600400
    MemSize: 0x200
    FirstSec: code2
    LastSec: code2

jhenderson

Not Done

I'd strongly support replacing the pre-canned ELF with a YAML input in this test. At the moment, I have no idea if the behaviour is correct without digging out a binary inspection tool to look at the ELF object.

Some key characteristics I think that need testing:

That non-PT_LOAD segments are ignored (but still count towards the index), even if they have PF_X.
That non-PF_X segments are ignored (but still count towards the index).
That multiple PT_LOAD with PF_X segments can be displayed.

To achieve these points, I think it would be sufficient to have an ELF with the following program headers:

PT_PHDR (or other PT_* type) with PF_X
PT_LOAD with PF_X
PT_LOAD without PF_X
PT_LOAD with PF_X
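The selection and numbering rule that these four program headers exercise can be sketched as follows. This is an illustration under assumptions, not the code in ELF.h: `Phdr` and `fakeSectionNames` are hypothetical, while the constant values are the ones defined by the ELF specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Constants with the values defined by the ELF specification.
constexpr std::uint32_t PT_LOAD = 1;
constexpr std::uint32_t PT_PHDR = 6;
constexpr std::uint32_t PF_X = 1;
constexpr std::uint32_t PF_R = 4;

// Hypothetical, simplified program header: only the fields the filter needs.
struct Phdr {
  std::uint32_t Type;
  std::uint32_t Flags;
};

// Names one fake section per executable PT_LOAD segment. Idx advances for
// every program header, including skipped ones, so the suffix is the
// segment's position in the program header table.
std::vector<std::string> fakeSectionNames(const std::vector<Phdr> &Phdrs) {
  std::vector<std::string> Names;
  std::size_t Idx = 0;
  for (const Phdr &P : Phdrs) {
    if (P.Type == PT_LOAD && (P.Flags & PF_X))
      Names.push_back("PT_LOAD#" + std::to_string(Idx));
    ++Idx;
  }
  assert(Names.size() <= Phdrs.size());
  return Names;
}
```

Feeding it the four headers listed above (PT_PHDR with PF_X, PT_LOAD with PF_X, PT_LOAD with PF_R only, PT_LOAD with PF_R and PF_X) yields the names PT_LOAD#1 and PT_LOAD#3, matching the CHECK lines in the updated test.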


namhyung (Author)

Not Done

Sounds good!


llvm/test/tools/llvm-objdump/X86/disassemble-no-section.test

  ## This test checks -d disassembles an ELF file without section headers.
  ## Such files include kcore files extracted by linux perf tools, or
  ## executables with section headers stripped by e.g.
  ## llvm-strip --strip-sections.

  # RUN: yaml2obj %s -o %t
  # RUN: llvm-objdump -d %t | FileCheck %s

- # CHECK: Disassembly of section :
+ # CHECK: Disassembly of section PT_LOAD#0:
  # CHECK-EMPTY:
- # CHECK-NEXT: <>:
+ # CHECK-NEXT: <PT_LOAD#0>:
  # CHECK-NEXT: 55 pushq %rbp
  # CHECK-NEXT: 48 89 e5 movq %rsp, %rbp
  # CHECK-NEXT: 0f 1f 40 00 nopl (%rax)
  # CHECK-NEXT: 5d popq %rbp
  # CHECK-NEXT: c3 retq

  ## Check disassembly with an address range.
  # RUN: llvm-objdump -d --start-address=0xffffffff00000000 \
  # RUN: --stop-address=0xffffffff00000004 %t 2>&1 | \
  # RUN: FileCheck %s --check-prefix RANGE

  # RANGE: no section overlaps the range
  # RANGE-EMPTY:
- # RANGE-NEXT: Disassembly of section :
+ # RANGE-NEXT: Disassembly of section PT_LOAD#0:
  # RANGE-EMPTY:
- # RANGE-NEXT: <>:
+ # RANGE-NEXT: <PT_LOAD#0>:
  # RANGE-NEXT: 55 pushq %rbp
  # RANGE-NEXT: 48 89 e5 movq %rsp, %rbp
  # RANGE-EMPTY:

  !ELF
  FileHeader:
  Class: ELFCLASS64
  Data: ELFDATA2LSB

llvm/tools/llvm-objdump/llvm-objdump.cpp

  static size_t getMaxSectionNameWidth(const ObjectFile &Obj) {
    for (const SectionRef &Section : ToolSectionFilter(Obj)) {
      StringRef Name = unwrapOrError(Section.getName(), Obj.getFileName());
      MaxWidth = std::max(MaxWidth, Name.size());
    }
    return MaxWidth;
  }

  void objdump::printSectionHeaders(ObjectFile &Obj) {
+   if (Obj.isELF() && Obj.sections().empty())
+     createFakeELFSections(Obj);
+
    size_t NameWidth = getMaxSectionNameWidth(Obj);
    size_t AddressWidth = 2 * Obj.getBytesInAddress();
    bool HasLMAColumn = shouldDisplayLMA(Obj);
    outs() << "\nSections:\n";
    if (HasLMAColumn)
      outs() << "Idx " << left_justify("Name", NameWidth) << " Size "
             << left_justify("VMA", AddressWidth) << " "
             << left_justify("LMA", AddressWidth) << " Type\n";
    else
      outs() << "Idx " << left_justify("Name", NameWidth) << " Size "
             << left_justify("VMA", AddressWidth) << " Type\n";

-   if (Obj.isELF() && Obj.sections().empty())
-     createFakeELFSections(Obj);
-
    uint64_t Idx;
    for (const SectionRef &Section : ToolSectionFilter(Obj, &Idx)) {
      StringRef Name = unwrapOrError(Section.getName(), Obj.getFileName());
      uint64_t VMA = Section.getAddress();
      if (shouldAdjustVA(Section))
        VMA += AdjustVMA;

      uint64_t Size = Section.getSize();


Diff 458380
