This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readelf] Support dumping the BB address map section with --bb-addr-map.
ClosedPublic

Authored by rahmanl on Jan 26 2021, 11:55 PM.

Details

Summary

This patch lets llvm-readelf dump the content of the BB address map
section in the following format:

Function {
  At: <address>
  BB entries [
    {
      Offset:   <offset>
      Size:     <size>
      Metadata: <metadata>
    },
    ...
  ]
}
...

Diff Detail

Event Timeline

rahmanl created this revision.Jan 26 2021, 11:55 PM
rahmanl published this revision for review.Jan 28 2021, 2:16 AM
rahmanl added reviewers: grimar, MaskRay.
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2021, 2:16 AM
grimar added inline comments.Jan 28 2021, 2:56 AM
llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
38

I'd separate the YAML from the check lines with an empty line.
The same below.

Also I'd separate LLVM checks from GNU.

61

This needs a comment to describe what is exactly wrong with the section.
There are multiple possible issues that might occur, but you test only one of them.

I'd suggest to create a valid section and use ShSize key with a macro to truncate it
by different ways. This will allow to easily test each of cases

uint32_t Offset = Data.getULEB128(Cur);
uint32_t Size = Data.getULEB128(Cur);
uint32_t Metadata = Data.getULEB128(Cur);

I.e. you can just reuse the first YAML:

Sections:
  - Name:    .llvm_bb_addr_map
    Type:    SHT_LLVM_BB_ADDR_MAP
    ShSize: [[SIZE=<none>]] <--- just add this line and use the power of macros
    Entries:
.....
63

I am not sure that 0x{{([[:xdigit:]]{8})}} is usefull? I'd just check the offset.
Also I guess you can reuse this check line for other broken cases and have a [[OFFSET]] macro instead.

...  | FileCheck %s -DFILE=%t2.o --check-prefix=INVALID -DOFFSET=0x....
# INVALID: warning: '[[FILE]]': unable to dump the SHT_LLVM_BB_ADDR_MAP section: unable to decode LEB128 at offset [[OFFSET]]: malformed uleb128, extends past end
66

It is slightly misaligned vertically.

72

Usually we reduce the number of spaces to minimal.

- Name: .llvm_bb_addr_map
  Type: SHT_LLVM_BB_ADDR_MAP
  Size: 8

(The same applies for the first YAML).

llvm/tools/llvm-readobj/ELFDumper.cpp
6545

It would me more consistent to use the describe helper to build the message. E.g. see:

return createError("invalid section linked to " + describe(Obj, Sec) +
                   ": " + toString(SymtabOrErr.takeError()));

You need to add this option to the llvm-readobj and llvm-readelf command guides under llvm/docs/CommandGuide.

Ideally, I think you should add gtest unit tests for the libObject code. The lit test would then just (primarily) test the llvm-readobj handling of the potential return values of the parsing code.

llvm/include/llvm/Object/ELF.h
281

I'm not sure why some functions use _ instead of camelCase, but it's probably best you use the latter. Maybe put it after the notes functions so that we don't switch back and forth between styles too.

llvm/include/llvm/Object/ELFTypes.h
797

Is there a specification to point to somewhere for this section? If so, what case style does it use? If there isn't, we should probably use LLVM style, though I'm not sure.

llvm/lib/Object/ELF.cpp
379

getULEB128 can return uint64_t, so I'd use uint64_t here and below - it defers truncation of large values to as late as possible, which I think is a good thing. Optionally, also consider an error check that the values aren't greater than max UINT32. A ULEB128 has no defined max size, so if the input format is ULEB128, users might theoretically expect to allow more than 32-bits worth of data. Alternatively, consider using uin64_t for the struct member sizes. Values outside that range will be picked up by the ULEB parser and reported, so we don't need to do more then.

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
2

--bb-addr-map is the canonical option, so use that terminology in the comments rather than --elf-...

61

See also my out-of-line comment, but I think it would make more sense for the testing of the different ways things can go wrong to be in the libObject unit tests if possible, with just a single test in llvm-readobj lit tests showing that a warning is reported if there's a problem.

llvm/tools/llvm-readobj/llvm-readobj.cpp
373–374

Any particular reason for the elf alias? The alias is there for some older options (cg-profile specifically) because they were initially added as elf-..., so we keep the alias for backwards compatibility. If there isn't a reason, I'd drop it.

rahmanl updated this revision to Diff 320071.Jan 29 2021, 1:13 AM
rahmanl marked 6 inline comments as done.

Test the 32-bit case.
Address @grimar's comments.

Thanks for the comments @grimar

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
61

Thanks for the suggestion. Adopted.
I am not sure if I want to exercise more invalid cases. Of course I can truncate the content by different sizes, but I don't think it will add a lot of value.

63

Thanks. Used [[OFFSET]] even though it's only used for one test case.

72

I removed this YAML since I am reusing the first YAML for the invalid case.

rahmanl added inline comments.Jan 31 2021, 3:34 PM
llvm/lib/Object/ELF.cpp
379

Thanks. Based on your suggestion, I plan to read these values as uint64_t and add cast to uint32_t (And add proper error checking). But I noticed that my elf2yaml code had the same problem (reading ULEB-encoded values as uint32_t). So I appreciate if you could please review D95767 before coming back to this patch.

rahmanl planned changes to this revision.Jan 31 2021, 3:34 PM
rahmanl updated this revision to Diff 320666.Feb 1 2021, 7:54 PM
rahmanl marked 5 inline comments as done.
  • Decode ULEB128 as uint64_t and then cast to uint32_t. Report error if over the limit.
llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
61

Thanks James. Allow me get to this (and the cmd documentation) after I get the logic right.
Would you mind elaborate on what specifically needs to be checked in the libObject unit test (Also, is there a precedent/example I can follow)?

Note to self: I've not yet reviewed the testing in depth. Need to do that still.

Would you mind elaborate on what specifically needs to be checked in the libObject unit test (Also, is there a precedent/example I can follow)?

The existing libObject unit tests are patchy at best, unfortunately, so finding a clear-cut precedent is a little tricky. Probably the easiest thing to do would be to follow things like the InvalidSymbolTest in ELFObjectFileTest.cpp to generate an ELFFile instance from YAML written as a string in the code. With a bit of care, you could probably dynamically build up the YAML string from common components to avoid a verbose string literal in each test case. Once you have an ELFFile instance, you can then just call decodeBBAddrMap on that. The aim of this test would be to test the code that is in the library directly, without worring about whatever llvm-readobj does. In particular, you can test all the error paths there, using things like EXPECT_THAT_EXPECTED(..., FailedWithMessage(...));. In the lit test, you could then treat the function as a black box and test only one way of getting the error (to show that the error is handled properly), plus that the dumpign code works as expected in a valid case. Does that make sense?

llvm/include/llvm/Object/ELFTypes.h
797

Nit: these comments probably all want trailing full stops.

llvm/lib/Object/ELF.cpp
620

Lambda case style is the same as variable case style (i.e. UpperCamelCase). I guess, because they're more like pointers to functions than functions themselves?

627

Coding standard for error messages is no trailing full stop (for consistency with most existing messages).

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
61

See out-of-line comment (moved out-of-line due to length).

rahmanl updated this revision to Diff 321248.Feb 3 2021, 3:33 PM
  • Move error path tests under libObject unittests.
rahmanl marked 4 inline comments as done.Feb 3 2021, 3:41 PM

Thanks @jhenderson. Your explanation was very helpful. Would you please advise on whether I should keep the error paths in the lit tests, as is?

rahmanl updated this revision to Diff 321260.Feb 3 2021, 4:39 PM
rahmanl marked an inline comment as done.
  • Add command line guides for llvm-readobj and llvm-readelf.

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

The other question - looking at the textual output, could it go the next step and produce the association bbaddress - name? My thinking is that the user of the output would find that more valuable (quick scripting, for instance)

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

I see. You're right. The output is similar to json. This is produced by ScopedPrinter. The relevant sources do not point out similarity with JSON at all. Maybe @jhenderson can answer this.

The other question - looking at the textual output, could it go the next step and produce the association bbaddress - name? My thinking is that the user of the output would find that more valuable (quick scripting, for instance)

This is a good point. Currently, the bb-address-map is only generated using -fbasic-block-sections=labels which also renumbers the basic block in ascending. So we can say that the MBB number associated with the i'th BB entry is i.

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

I see. You're right. The output is similar to json. This is produced by ScopedPrinter. The relevant sources do not point out similarity with JSON at all. Maybe @jhenderson can answer this.

The LLVM-style output format for llvm-readobj is not intended to be JSON, although it is somewhat similar. Primarily, the output is intended for ease of testing within the LLVM lit tests. See in particular the second of the following two commits (the first contains some general rules for the output format).

https://github.com/llvm/llvm-project/commit/9cad53cfeceed2ef75aa17f0553edd27afd2e950
https://github.com/llvm/llvm-project/commit/d7e7003e8b2d4632b72533e0d05ad5558e5bdcbc#diff-53e1a2a2a247ad1a4cac5242c17c0aa36eefdec70b4369fb31f911aaf89aad2c

I don't think it's practical to make the general output JSON-like due to the amount of turbulence this would cause. If there's a particularly strong desire to do this, it should be discussed on llvm-dev in the first instance, I think, and secondly, I think would be best as a third output format (e.g. --elf-output-style=json).

llvm/docs/CommandGuide/llvm-readelf.rst
37

Spell out "BB". I assume it is "basic block"? Also, as this is only implemented for LLVM output style, I'd mention that in the documentation. Also, you say "section(s)", implying there might be more than one. However, the code only supports dumping a single section per object file. Either your code or your documentation needs updating!

llvm/docs/CommandGuide/llvm-readobj.rst
153

Same as above.

llvm/include/llvm/Object/ELFTypes.h
800
llvm/lib/Object/ELF.cpp
643–644

I might be mistaken, but won't this cause an assertion due to Cur not being checked and reported if it is in a bad state? I think it could happen if reading NumBlocks produces too large a number, and then a later value is truncated (for example).

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
18

But actually, you only need one of this and the above test case that checks the error behaviour in a lit test - the libObject unit test is sufficient for the other failure path(s). In fact, you should probably change the comment of the kept error case, simply to say something like "Check that a malformed section can be handled."

llvm/unittests/Object/ELFObjectFileTest.cpp
515

I'm not sure this line is really needed?

529
rahmanl updated this revision to Diff 322557.Feb 9 2021, 5:53 PM
rahmanl marked 7 inline comments as done.
  • Support dumping multiple BB address map sections.
  • Address other @jhenderson's comments.
rahmanl updated this revision to Diff 322562.Feb 9 2021, 6:13 PM
  • Support dumping multiple BB address map sections.
  • Add a unit test to check that we don't generate an unhandled error.
  • Keep only one error path in the lit test.
rahmanl added inline comments.Feb 9 2021, 6:14 PM
llvm/docs/CommandGuide/llvm-readelf.rst
37

You're right. I am changing the code to support dumping multiple BB address map sections.

llvm/lib/Object/ELF.cpp
643–644

Thanks for the scrutiny. I think this won't happen because ReadULEB128AsUInt32 always checks ULEBSizeErr and return zero if it's in the error state. However, if we ever read with Data.getULEB128(Cur) we may run into the unhandled error problem.
I added a unit test case to exercise the case that you mentioned, but at this point, I am not sure if the benefit of restricting to uint32_t outweighs the readability cost of this code.

rahmanl updated this revision to Diff 322564.Feb 9 2021, 6:28 PM
  • Add a dummy SHT_PROGBITS section in the test for further test coverage.
jhenderson added inline comments.Feb 10 2021, 1:35 AM
llvm/lib/Object/ELF.cpp
637

If the ULEB for NumBlocks happened to be 0xffffffffffffffff, I think NumBlocks will be 0xffffffff, which will mean you'll add that many (0, 0, 0) entries to BBEntries. I suspect this is best avoided due to the time and memory cost before you eventually get the reported error.

Adding an extra check to ensure ULEBSizeErr isn't bad here is probably necessary.

643–644

I see, yes. I'd prefer to keep the checks, but if you think it's really making it hard to read, I'm not opposed to reverting to a version without them.

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
60

Would it make sense to show that the second section is still dumped in this case?

87

Why ShSize? Do you mean Size to give it a non-zero actual size on disk? (ShSize overrides only the value in the header - it doesn't impact the section placement).

llvm/tools/llvm-readobj/ELFDumper.cpp
553

Here and below, it looks like clang-tidy is unhappy. You probably need to change the base virtual function?

rahmanl updated this revision to Diff 323505.Feb 12 2021, 5:05 PM
rahmanl marked 5 inline comments as done.
  • Handle @jhenderson's comments.
  • Add a test for error handling when NumBlocks is out-of-range.

Thanks for the comments @jhenderson

llvm/lib/Object/ELF.cpp
637

Thanks. You're right.
I am fixing this in two ways:

  1. Changing ReadULEB128AsUInt32 to return zero when ULEBSizeErr is set.
  2. Checking ULEBSizeErr in the inner loop to prevent from further iteration when the error is set (similar to the cursor error).

After this change, the unit test failed with "Success values must still be checked prior to being destroyed" which made me realize that the error checking order may leave the Cursor error unchecked. So I reversed the checks after the loop to fix it.

I also added a test which exercises the case that you mentioned. Had to encode ULEB128 since our YAML format does not take in NumBlocks as input.

643–644

Thanks. I am going to keep the uint32_t restriction. It's not looking bad.

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
87

You're right. Fixed. (I now understand why we use ShSize for truncating the section).

rahmanl updated this revision to Diff 323512.Feb 12 2021, 7:45 PM
  • Remove errs() lines.
jhenderson added inline comments.Feb 16 2021, 1:26 AM
llvm/lib/Object/ELF.cpp
624
644

clang-format is complaining.

Had to encode ULEB128 since our YAML format does not take in NumBlocks as input.

Sounds like this should be added to the YAML input support, so that you can write a custom NumBlocks value that doesn't match the real number of blocks. We do similar things for ELF sections for example, where you can override the implicit value with a value of your own choice which gets written to the field.

652–659

This is probably fine, but feels a little fragile. How about using joinErrors to combine the two errors into a single one that can be reported, as suggested in the inline edit? That also avoids any risk of problems in the event the code changes such that they could both be in a bad state. joinErrors of an Error::success() and a non-success Error is safe and equivalent to just returning the actual Error, if I remember correctly.

llvm/unittests/Object/ELFObjectFileTest.cpp
583

Consider instead making the suggested yaml2obj change first, and then using that for this test case.

rahmanl updated this revision to Diff 324167.Feb 16 2021, 7:00 PM
rahmanl marked 2 inline comments as done.
  • Use joinErrors to join ULEBSizeErr and the cursor error.
rahmanl added inline comments.Feb 16 2021, 7:01 PM
llvm/lib/Object/ELF.cpp
644

Thanks for the suggestion. Please review D96831.

rahmanl planned changes to this revision.Feb 16 2021, 10:10 PM
jhenderson added inline comments.Feb 17 2021, 12:51 AM
llvm/lib/Object/ELF.cpp
653

Let's use the variable name explicitly.

rahmanl updated this revision to Diff 324458.Feb 17 2021, 4:09 PM
  • Use the NumBlocks field in the unit test, instead of encoding by hand.
  • Change the type of 'NumBlocks' to uint64_t so we can write the invalid unit test.
rahmanl updated this revision to Diff 324461.Feb 17 2021, 4:13 PM
rahmanl marked 3 inline comments as done.
  • Use Cur instead of cursor in the comment.
rahmanl updated this revision to Diff 324477.Feb 17 2021, 5:38 PM
  • Decode the Metadata field consistently with the encoding in AsmPrinter::getBBAddrMapMetadata.
  • Decode the Metadata field consistently with the encoding in AsmPrinter::getBBAddrMapMetadata.

@jhenderson and @shenhan I decided to decode the Metadata field right here so the user wouldn't need to worry about it.

I'd like to do another review pass of the testing before signing off on this, but generally, this is looking good.

llvm/include/llvm/Object/ELFTypes.h
803

Typo/incomplete sentence?

llvm/unittests/Object/ELFObjectFileTest.cpp
583

I would suggest either "Check for proper error ..." or "Check proper error ..."

rahmanl updated this revision to Diff 325647.Feb 22 2021, 7:17 PM
rahmanl marked 2 inline comments as done.

Fix typos.

I'd like to do another review pass of the testing before signing off on this, but generally, this is looking good.

No worries. Please take your time.

Just a few small suggestions, otherwise this looks good, thanks!

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
4

Maybe : or . to end this and the Check 32 bit line.

9

I'm not sure "x32" is really a thing. I think you probably just want to call it "64-bit" and "32-bit".

llvm/unittests/Object/ELFObjectFileTest.cpp
533

It might make sense to do proper edge-case testing here, with values of 0xffffffff and 0x100000000.

rahmanl updated this revision to Diff 326628.Feb 26 2021, 1:43 AM
rahmanl marked 3 inline comments as done.
  • Typos and edge-cases.
llvm/unittests/Object/ELFObjectFileTest.cpp
533

Good idea. Done here and below.

@jhenderson Is this good now? Thanks for the review.

jhenderson accepted this revision.Mar 5 2021, 12:42 AM

Apologies for the delay - I've missed most of this week due to illness. LGTM.

This revision is now accepted and ready to land.Mar 5 2021, 12:42 AM

Apologies for the delay - I've missed most of this week due to illness. LGTM.

Thank you for the review. Hope everything is going well for you now.