Page MenuHomePhabricator

[llvm-readelf] Support dumping the BB address map section with --bb-addr-map.
ClosedPublic

Authored by rahmanl on Jan 26 2021, 11:55 PM.

Details

Summary

This patch lets llvm-readelf dump the content of the BB address map
section in the following format:

Function {
  At: <address>
  BB entries [
    {
      Offset:   <offset>
      Size:     <size>
      Metadata: <metadata>
    },
    ...
  ]
}
...

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rahmanl marked 4 inline comments as done.Feb 3 2021, 3:41 PM

Thanks @jhenderson. Your explanation was very helpful. Would you please advise on whether I should keep the error paths in the lit tests, as is?

rahmanl updated this revision to Diff 321260.Feb 3 2021, 4:39 PM
rahmanl marked an inline comment as done.
  • Add command line guides for llvm-readobj and llvm-readelf.

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

The other question - looking at the textual output, could it go the next step and produce the association bbaddress - name? My thinking is that the user of the output would find that more valuable (quick scripting, for instance)

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

I see. You're right. The output is similar to json. This is produced by ScopedPrinter. The relevant sources do not point out similarity with JSON at all. Maybe @jhenderson can answer this.

The other question - looking at the textual output, could it go the next step and produce the association bbaddress - name? My thinking is that the user of the output would find that more valuable (quick scripting, for instance)

This is a good point. Currently, the bb-address-map is only generated using -fbasic-block-sections=labels which also renumbers the basic block in ascending. So we can say that the MBB number associated with the i'th BB entry is i.

Can the output be consumed programmatically? (i.e. is it for example valid yaml or json)

Yes, but it's not yaml or json. The output is std::vector<Elf_BBAddrMap> and it is accessible via the library call: ELFFile::decodeBBAddrMap(Elf_Shdr&) (However, the user needs to find the BB address map section).

I meant the textual output. It's very close to json - the benefit of it actually being json is that one could script from llvm-readobj output.

I see. You're right. The output is similar to json. This is produced by ScopedPrinter. The relevant sources do not point out similarity with JSON at all. Maybe @jhenderson can answer this.

The LLVM-style output format for llvm-readobj is not intended to be JSON, although it is somewhat similar. Primarily, the output is intended for ease of testing within the LLVM lit tests. See in particular the second of the following two commits (the first contains some general rules for the output format).

https://github.com/llvm/llvm-project/commit/9cad53cfeceed2ef75aa17f0553edd27afd2e950
https://github.com/llvm/llvm-project/commit/d7e7003e8b2d4632b72533e0d05ad5558e5bdcbc#diff-53e1a2a2a247ad1a4cac5242c17c0aa36eefdec70b4369fb31f911aaf89aad2c

I don't think it's practical to make the general output JSON-like due to the amount of turbulence this would cause. If there's a particularly strong desire to do this, it should be discussed on llvm-dev in the first instance, I think, and secondly, I think would be best as a third output format (e.g. --elf-output-style=json).

llvm/docs/CommandGuide/llvm-readelf.rst
37

Spell out "BB". I assume it is "basic block"? Also, as this is only implemented for LLVM output style, I'd mention that in the documentation. Also, you say "section(s)", implying there might be more than one. However, the code only supports dumping a single section per object file. Either your code or your documentation needs updating!

llvm/docs/CommandGuide/llvm-readobj.rst
153

Same as above.

llvm/include/llvm/Object/ELFTypes.h
800
llvm/lib/Object/ELF.cpp
650–651

I might be mistaken, but won't this cause an assertion due to Cur not being checked and reported if it is in a bad state? I think it could happen if reading NumBlocks produces too large a number, and then a later value is truncated (for example).

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
18

But actually, you only need one of this and the above test case that checks the error behaviour in a lit test - the libObject unit test is sufficient for the other failure path(s). In fact, you should probably change the comment of the kept error case, simply to say something like "Check that a malformed section can be handled."

llvm/unittests/Object/ELFObjectFileTest.cpp
515

I'm not sure this line is really needed?

529
rahmanl updated this revision to Diff 322557.Feb 9 2021, 5:53 PM
rahmanl marked 7 inline comments as done.
  • Support dumping multiple BB address map sections.
  • Address other @jhenderson's comments.
rahmanl updated this revision to Diff 322562.Feb 9 2021, 6:13 PM
  • Support dumping multiple BB address map sections.
  • Add a unit test to check that we don't generate an unhandled error.
  • Keep only one error path in the lit test.
rahmanl added inline comments.Feb 9 2021, 6:14 PM
llvm/docs/CommandGuide/llvm-readelf.rst
37

You're right. I am changing the code to support dumping multiple BB address map sections.

llvm/lib/Object/ELF.cpp
650–651

Thanks for the scrutiny. I think this won't happen because ReadULEB128AsUInt32 always checks ULEBSizeErr and return zero if it's in the error state. However, if we ever read with Data.getULEB128(Cur) we may run into the unhandled error problem.
I added a unit test case to exercise the case that you mentioned, but at this point, I am not sure if the benefit of restricting to uint32_t outweighs the readability cost of this code.

rahmanl updated this revision to Diff 322564.Feb 9 2021, 6:28 PM
  • Add a dummy SHT_PROGBITS section in the test for further test coverage.
jhenderson added inline comments.Feb 10 2021, 1:35 AM
llvm/lib/Object/ELF.cpp
644

If the ULEB for NumBlocks happened to be 0xffffffffffffffff, I think NumBlocks will be 0xffffffff, which will mean you'll add that many (0, 0, 0) entries to BBEntries. I suspect this is best avoided due to the time and memory cost before you eventually get the reported error.

Adding an extra check to ensure ULEBSizeErr isn't bad here is probably necessary.

650–651

I see, yes. I'd prefer to keep the checks, but if you think it's really making it hard to read, I'm not opposed to reverting to a version without them.

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
60

Would it make sense to show that the second section is still dumped in this case?

87

Why ShSize? Do you mean Size to give it a non-zero actual size on disk? (ShSize overrides only the value in the header - it doesn't impact the section placement).

llvm/tools/llvm-readobj/ELFDumper.cpp
553

Here and below, it looks like clang-tidy is unhappy. You probably need to change the base virtual function?

rahmanl updated this revision to Diff 323505.Feb 12 2021, 5:05 PM
rahmanl marked 5 inline comments as done.
  • Handle @jhenderson's comments.
  • Add a test for error handling when NumBlocks is out-of-range.

Thanks for the comments @jhenderson

llvm/lib/Object/ELF.cpp
644

Thanks. You're right.
I am fixing this in two ways:

  1. Changing ReadULEB128AsUInt32 to return zero when ULEBSizeErr is set.
  2. Checking ULEBSizeErr in the inner loop to prevent from further iteration when the error is set (similar to the cursor error).

After this change, the unit test failed with "Success values must still be checked prior to being destroyed" which made me realize that the error checking order may leave the Cursor error unchecked. So I reversed the checks after the loop to fix it.

I also added a test which exercises the case that you mentioned. Had to encode ULEB128 since our YAML format does not take in NumBlocks as input.

650–651

Thanks. I am going to keep the uint32_t restriction. It's not looking bad.

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
87

You're right. Fixed. (I now understand why we use ShSize for truncating the section).

rahmanl updated this revision to Diff 323512.Feb 12 2021, 7:45 PM
  • Remove errs() lines.
jhenderson added inline comments.Feb 16 2021, 1:26 AM
llvm/lib/Object/ELF.cpp
631
651

clang-format is complaining.

Had to encode ULEB128 since our YAML format does not take in NumBlocks as input.

Sounds like this should be added to the YAML input support, so that you can write a custom NumBlocks value that doesn't match the real number of blocks. We do similar things for ELF sections for example, where you can override the implicit value with a value of your own choice which gets written to the field.

659–666

This is probably fine, but feels a little fragile. How about using joinErrors to combine the two errors into a single one that can be reported, as suggested in the inline edit? That also avoids any risk of problems in the event the code changes such that they could both be in a bad state. joinErrors of an Error::success() and a non-success Error is safe and equivalent to just returning the actual Error, if I remember correctly.

llvm/unittests/Object/ELFObjectFileTest.cpp
583

Consider instead making the suggested yaml2obj change first, and then using that for this test case.

rahmanl updated this revision to Diff 324167.Feb 16 2021, 7:00 PM
rahmanl marked 2 inline comments as done.
  • Use joinErrors to join ULEBSizeErr and the cursor error.
rahmanl added inline comments.Feb 16 2021, 7:01 PM
llvm/lib/Object/ELF.cpp
651

Thanks for the suggestion. Please review D96831.

rahmanl planned changes to this revision.Feb 16 2021, 10:10 PM
jhenderson added inline comments.Feb 17 2021, 12:51 AM
llvm/lib/Object/ELF.cpp
660

Let's use the variable name explicitly.

rahmanl updated this revision to Diff 324458.Feb 17 2021, 4:09 PM
  • Use the NumBlocks field in the unit test, instead of encoding by hand.
  • Change the type of 'NumBlocks' to uint64_t so we can write the invalid unit test.
rahmanl updated this revision to Diff 324461.Feb 17 2021, 4:13 PM
rahmanl marked 3 inline comments as done.
  • Use Cur instead of cursor in the comment.
rahmanl updated this revision to Diff 324477.Feb 17 2021, 5:38 PM
  • Decode the Metadata field consistently with the encoding in AsmPrinter::getBBAddrMapMetadata.
  • Decode the Metadata field consistently with the encoding in AsmPrinter::getBBAddrMapMetadata.

@jhenderson and @shenhan I decided to decode the Metadata field right here so the user wouldn't need to worry about it.

I'd like to do another review pass of the testing before signing off on this, but generally, this is looking good.

llvm/include/llvm/Object/ELFTypes.h
803

Typo/incomplete sentence?

llvm/unittests/Object/ELFObjectFileTest.cpp
583

I would suggest either "Check for proper error ..." or "Check proper error ..."

rahmanl updated this revision to Diff 325647.Feb 22 2021, 7:17 PM
rahmanl marked 2 inline comments as done.

Fix typos.

I'd like to do another review pass of the testing before signing off on this, but generally, this is looking good.

No worries. Please take your time.

Just a few small suggestions, otherwise this looks good, thanks!

llvm/test/tools/llvm-readobj/ELF/bb-addr-map.test
4

Maybe : or . to end this and the Check 32 bit line.

9

I'm not sure "x32" is really a thing. I think you probably just want to call it "64-bit" and "32-bit".

llvm/unittests/Object/ELFObjectFileTest.cpp
533

It might make sense to do proper edge-case testing here, with values of 0xffffffff and 0x100000000.

rahmanl updated this revision to Diff 326628.Feb 26 2021, 1:43 AM
rahmanl marked 3 inline comments as done.
  • Typos and edge-cases.
llvm/unittests/Object/ELFObjectFileTest.cpp
533

Good idea. Done here and below.

@jhenderson Is this good now? Thanks for the review.

jhenderson accepted this revision.Mar 5 2021, 12:42 AM

Apologies for the delay - I've missed most of this week due to illness. LGTM.

This revision is now accepted and ready to land.Mar 5 2021, 12:42 AM

Apologies for the delay - I've missed most of this week due to illness. LGTM.

Thank you for the review. Hope everything is going well for you now.