This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Add support for dumping embedded offloading data
ClosedPublic

Authored by jhuber6 on Jun 2 2022, 11:39 AM.

Details

Summary

In Clang/LLVM we are moving towards a new binary format to store many
embedded object files to create a fatbinary. This patch adds support for
dumping these embedded images in the llvm-objdump tool. This will
allow users to query information about what is stored inside the binary.
This has very similar functionality to the cuobjdump tool for thoe familiar
with the Nvidia utilities. The proposed use is as follows:

$ clang input.c -fopenmp --offload-arch=sm_70 --offload-arch=sm_52 -c
$ llvm-objdump -O input.o

input.o:        file format elf64-x86-64

OFFLOADIND IMAGE [0]:
kind            cubin
arch            sm_52
triple          nvptx64-nvidia-cuda
producer        openmp

OFFLOADIND IMAGE [1]:
kind            cubin
arch            sm_70
triple          nvptx64-nvidia-cuda
producer        openmp

This will be expanded further once we start embedding more information
into these offloading images. Right now we are planning on adding
flags and entries for debug level, optimization, LTO usage, target
features, among others.

This patch only supports printing these sections, later we will want to
support dumping files the user may be interested in via another flag. I
am unsure if this should go here in llvm-objdump or llvm-objcopy.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@jhenderson Is this patch good to land as well?

Sorry for the delay - I've had a lot on my plate.

It seems like most (all?) of my last round of inline comments haven't been addressed?

llvm/test/tools/llvm-objdump/Offloading/Inputs/binary.yaml
31 ↗(On Diff #437900)

Nit: Looks like one too many blank lines at EOF.

llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #437900)

In line 3, you run yaml2obj to create the .bin file again. I suggest you simply rename this %t to %t.bin and then delete the later yaml2obj invocation.

It'd probably be useful to have a brief comment at the start of this individual test case, e.g. "Show can dump offloading binaries directly." and an equivalent one at the start of the wrapped-in-ELF case, e.g. "Show can dump offloading binaries embedded in ELF.".

4 ↗(On Diff #437900)

Nit: Perhaps worth renaming %t to %t.elf for this test case to help make it clearer what this code is doing in contrast to the previous case.

Also, I'd add a single blank line (followed by the suggested comment above) between the lines to do with the raw binary and ELF cases.

16 ↗(On Diff #437900)

Nit: by adding this suggested spacing, your output then all lines line up, which I think improves test readability.

llvm/test/tools/llvm-objdump/Offloading/failure.test
17 ↗(On Diff #437900)

This doesn't tell us whether the message is a warning or error, so please include the "warning:" or "error:" prefix in the check.

Additionally, the error message as checked doesn't give sufficient context as to what has gone wrong. For example, it doesn't mention the input file or that it is the offloading code that has failed. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages. A couple of options include modifying the underlying code to give better error messages, or to "catch" the error and rewrap it with some additional context in the message. llvm-readobj has a number of examples of where this is done that I know of, for example (llvm-objdump may do too).

Finally, in dumping tools, especially in newer code, we try to avoid having hard errors. The reason for this is because it is often useful to be able to see the reainder of the output from other files and/or options, whereas reporting an error is usually a hard end to the program (NB: I haven't double-checked the llvm-objdump behaviour to confirm that the error reporting ends the program, so apologies if this is a bit of misdirection).

I made the second and third of these points in previous inline comments, but they don't seem to have been addressed.

llvm/tools/llvm-objdump/OffloadDump.cpp
21

Why is this a macro rather than just a static const char *?

35–36

With the changes made to yaml2obj, can we now test this default case?

61–63

Could I clarify when you said making a change in the future for the name -> type bit that you mean in a future patch?

72

Nit: SectionRef is designed to be lightweight and copyable (like llvm::StringRef) so there's no particular need to use const & here.

79

You should exercise this error path by adding a SHOffset key to your ELF YAML with an invalid value.

87

Not addressed yet.

jhuber6 updated this revision to Diff 441019.Jun 29 2022, 8:00 AM
jhuber6 marked 9 inline comments as done.

Addressing comments, adding test for warnings.

llvm/test/tools/llvm-objdump/Offloading/failure.test
17 ↗(On Diff #437900)

ReportError does indeed exit the program. There are a lot of other examples of llvm-objdump exiting on malformed input. I think it's reasonable to exit if the user requested --offloading and it's malformed to just exit. I will specify the checks and try to improve the message however.

llvm/tools/llvm-objdump/OffloadDump.cpp
35–36

It's already being tested now. The binary.yaml file has the fourth entry with the None type.

jhenderson added inline comments.Jun 30 2022, 1:00 AM
llvm/test/tools/llvm-objdump/Offloading/Inputs/malformed.yaml
1 ↗(On Diff #441019)

Is this file just meant to be the YAML? It's in an Inputs directory, so the RUN and CHECK lines aren't going to do anything...

llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #437900)

It'd probably be useful to have a brief comment at the start of this individual test case, e.g. "Show can dump offloading binaries directly." and an equivalent one at the start of the wrapped-in-ELF case, e.g. "Show can dump offloading binaries embedded in ELF.".

Looks like this bit hasn't been addressed?

4 ↗(On Diff #437900)

Again, the second half of this comment hasn't been addressed.

llvm/test/tools/llvm-objdump/Offloading/failure.test
17 ↗(On Diff #437900)

A lot of those exits are from older code, but the general preference is to move away from the exit-immediately-on-malformed. Imagine the case where you have 4 different offloading binaries you wish to dump (e.g. llvm-objdump --offloading 1.bin 2.bin 3.bin 4.bin), and all 4 of these are malformed for different reasons. You'd end up getting an error on 1.bin and not knowing that the other 3 need fixing too, meaning you'd have to keep rerunning your code until you'd eventually flushed out all of the errors.

Similarly, imagine the binary was wrapped in an ELF, and you wanted to dump other parts of the ELF too. You'd end up only getting as far as the offloading dump before erroring, and not getting any other information you wanted.

17 ↗(On Diff #441019)

Unrelated to my other comments, but why is this string in single quotes?

llvm/test/tools/llvm-objdump/Offloading/warning.test
6 ↗(On Diff #441019)

Is it worth checking that the good binary was dumped successfully?

14 ↗(On Diff #441019)

You can use FileCheck's -D option to check the exact filename:

# RUN: ... | FileCheck -DFILE=%t.elf ...

# CHECK: warning: '[[FILE]]': ...
15 ↗(On Diff #441019)

Nit: too many blank lines at EOF.

llvm/tools/llvm-objdump/OffloadDump.cpp
79

Marked as done but I don't see it?

84

I believe you'll end up with an assertion under the ABI breaking checks config (I think that's the name of it anyway), as you don't actually use the error within BinaryOrErr. I think it would still be good to include the message as reported by the underlying code, but wrapped in the additional context you've added. Also, would be good to include the input file name. Rough idea (uncertain on the exact invocation needed to get the string, as I'm too lazy to look it up right now!):

reportError("while extracting offloading files from \"" + O->getFileName() + "\": " + toString(BinaryOrErr));
91–93

Same comment as above.

103–107

Ditto.

JonChesterfield accepted this revision.Jun 30 2022, 4:52 AM
JonChesterfield added a subscriber: JonChesterfield.

14 revisions seems excessive here, we're into the region of polishing something that's already fine. Let's leave the remaining nits to be fixed in passing in later patches. I want to use this to debug something asap and would like to avoid spinning a local branch with this and my own wip stuff.

jhenderson requested changes to this revision.Jun 30 2022, 4:57 AM

At least one of my comments highlights a thing that will cause tests to fail on bots under some configurations, so this isn't ready to land as-is, even if we were to defer the nits.

This revision now requires changes to proceed.Jun 30 2022, 4:57 AM
jhuber6 updated this revision to Diff 441365.Jun 30 2022, 5:36 AM

Addressing comments. I think what we need is some new function like reportErrorNoExit, otherwise all the file paths handling the printing themselves would be ugly.

Addressing comments. I think what we need is some new function like reportErrorNoExit, otherwise all the file paths handling the printing themselves would be ugly.

This is definitely something I think the tool could benefit from, with a final check at the end of the program to ensure the right exit code is produced, if this function has ever been called. LLD has a similar process for non-fatal errors (it sets a variable that is checked periodically to ensure it is safe to continue). That being said, as this impacts llvm-objdump more widely, I would be happy for it to be deferred to a later patch, so that other code paths could be updated at the same time.

There are a high number of inline comments that I haven't seen be addressed. Most of them are fairly minor, but the volume of them combined with the fact that I've had no response to them makes me concerned that letting the patch land without them means they'll never be addressed. Please could you go through them and either address them or explain why it should either a) be deferred or b) not done at all.

llvm/test/tools/llvm-objdump/Offloading/failure.test
17 ↗(On Diff #441019)

To be clear, I don't think the explanation needs to be in quotes of any variety - only the file name.

Addressing comments. I think what we need is some new function like reportErrorNoExit, otherwise all the file paths handling the printing themselves would be ugly.

This is definitely something I think the tool could benefit from, with a final check at the end of the program to ensure the right exit code is produced, if this function has ever been called. LLD has a similar process for non-fatal errors (it sets a variable that is checked periodically to ensure it is safe to continue). That being said, as this impacts llvm-objdump more widely, I would be happy for it to be deferred to a later patch, so that other code paths could be updated at the same time.

There are a high number of inline comments that I haven't seen be addressed. Most of them are fairly minor, but the volume of them combined with the fact that I've had no response to them makes me concerned that letting the patch land without them means they'll never be addressed. Please could you go through them and either address them or explain why it should either a) be deferred or b) not done at all.

The only comments I'm aware of that I haven't addressed is early exiting on failure and a check on failing to get the section. I think if we truly want to avoid early exits we should put it in a separate patch, I'm not doing anything that the rest of llvm-objdump doesn't already do. I do not think it's necessary to add an error check if we fail to get a section, this is tested already in many places and I see no point to add a completely redundant test to the LLVM test-suite. I can remove the quotation marks prior to landing if you want. If you have any further objections let me know, otherwise I would greatly appreciate being able to move forward with this.

I'm absolutely OK with the high volume of minor comments going unaddressed, especially since they're being added incrementally over a prolonged period of time. I'm also OK with James doing some post commit cleanup to make things better conform to his ideal.

The only comments I'm aware of that I haven't addressed is early exiting on failure and a check on failing to get the section. I think if we truly want to avoid early exits we should put it in a separate patch, I'm not doing anything that the rest of llvm-objdump doesn't already do. I do not think it's necessary to add an error check if we fail to get a section, this is tested already in many places and I see no point to add a completely redundant test to the LLVM test-suite. I can remove the quotation marks prior to landing if you want. If you have any further objections let me know, otherwise I would greatly appreciate being able to move forward with this.

I've gone through and highlighted all the inline comments that still need addressing/answering. I can live with the early exiting in this patch, but still think you need the test I've requested (see inline for explanation).

I'm still not really comfortable with relying on section name to identify the offloading binaries, as it could potentially require tens or even hundreds of thousands of string comparisons under some cases, plus it's not really in the spirit of the ELF file format*. It's also worth noting that for other platforms, the naming schemes are different - for example in Mach-O files, section names are usually __some_section_name instead of .some_section_name, so you'll still have to have at least some platform-specific code in order to retrieve the section. Most (all?) new sections that have any dumping support are distinguished by type for ELF, although curiously, dumping support has only been added for them in llvm-readobj rather than llvm-objdump, so it's not a clear-cut point.

(* The ELF gABI states about the sh_type field: "This member categorizes the section's contents and semantics." and then defines the SHT_PROGBITS type (which I assume the current offloading binaries are) as "The section holds information defined by the program, whose format and meaning are determined solely by the program." Given that the offloading format is not specific to the program, it seems incorrect to say that it is SHT_PROGBITS.)

llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #437900)

Not addressed.

4 ↗(On Diff #437900)

Not addressed.

llvm/test/tools/llvm-objdump/Offloading/warning.test
6 ↗(On Diff #441019)

Not addressed.

llvm/tools/llvm-objdump/OffloadDump.cpp
61–63

Not answered.

79

The reason this needs addressing, is because this specific code path in llvm-objdump is otherwise untested. The test is needed to show that llvm-objdump under these specific circumstances properly handles errors if Contents is in an error state. This is actually important because if there is no such test, a change to the reportError whereby Contents isn't used in the message, could result in unchecked errors, but without a test case, these would only manifest under real usage, rather than under testing like they should be.

87

Not addressed.

Do you consider these blocking issues beyond fixing the typos and adding comments? I'm getting very tired of needing to constantly update this patch that others have already signed off on and I'm beginning to feel like this is a fruitless endeavor. If you have no intention of ever letting this through let me know and I'll abandon it so we don't waste any more of our time.

llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #437900)

Sure, I'll add a comment.

4 ↗(On Diff #437900)

Sure I'll add a comment.

llvm/test/tools/llvm-objdump/Offloading/warning.test
6 ↗(On Diff #441019)

No, this is the same test checks as the other file and it bloats the test. We already know that it will print the good one.

llvm/tools/llvm-objdump/OffloadDump.cpp
61–63

Yes, I was saying we could do this in a future patch as I didn't think it was a blocking issue for the functionality of this patch. The main reason I did this is just because it was the easiest common solution between extracting these from LLVM-IR and an ELF, that is the section's string will be the same. I was just planning on adding it to the list of existing ones in ELF.h, but as this worked overall I thought it was sufficient to land this patch.

79

There's already usage of getContents() like this in llvm-objdump, I don't see why this is a special case. If someone changed the reportError function to not report errors it should show up somewhere. The point of this patch is not about checking if the ELF works, and we know from similar usage in llvm-objdump that this pattern reports errors if it's malformed. I can add a test if you really want me t, but I fail to see the point even with your hypothetical situation.

87

I'll fix the typo.

Do you consider these blocking issues beyond fixing the typos and adding comments?

The missing test case is the issue that I care about most.

I'm getting very tired of needing to constantly update this patch that others have already signed off on and I'm beginning to feel like this is a fruitless endeavor. If you have no intention of ever letting this through let me know and I'll abandon it so we don't waste any more of our time.

With all due respect to the others who have signed off on this patch already, they aren't routine contributors to this area of the LLVM code-base, so don't know the norms, expected code quality etc of it. Please understand that I am just trying to keep the quality as high as possible. This often means that things have to go through several iterations until they are right. Please also be aware that this patch is not the only patch I am reviewing and I have to balance my regular responsibilities alongside these reviews too, so I don't have time to do a back-and-forth on the patch multiple times a day. I'm certainly not blocking this for the sake of doing so, especially given the time I've spent reviewing the patch.

I'm absolutely OK with the high volume of minor comments going unaddressed, especially since they're being added incrementally over a prolonged period of time.

Most of these comments are as a result of changes in each iteration, where the later iteration either doesn't fully resolve the point I had raised, or introduced other issues. It takes time to get things right. I'd also point out that some inline comments were left without responses for several iterations of the patch, which is part of the reason for this taking this long.

I'm also OK with James doing some post commit cleanup to make things better conform to his ideal.

This isn't how reviews work. It's not my responsibility to address issues introduced with a patch that someone else has just landed. If you think it is, please raise this on the LLVM forums and see what others have to say.

llvm/test/tools/llvm-objdump/Offloading/warning.test
6 ↗(On Diff #441019)

My thinking was that it would be useful, to show that good binaries are dumped despite the warning. The question is really, do you consider it a guarantee that all good binaries will be dumped even if a later one is bad, and if not, will users be concerned if the behaviour ever changed (by accident or otherwise) to check all the binaries are good before dumping any of them?

llvm/tools/llvm-objdump/OffloadDump.cpp
61–63

Okay.

79

Imagine if the code were changed to the following by somebody:

if (!Contents)
  reportError("failed to get offloading section contents", O->getFileName());

There would be no test failure under any situation, because there is no test. You would get a crash though if someone were to try to use llvm-objdump with the enhanced error checks enabled, and ran into a malformed binary.

This isn't really a contrived example either: in an earlier revision of this patch, there was a similar situation, where the error was thrown away without getting its message, so I don't think it's unreasonable to assume that it could occur in a later revision of this code.

I'm not asking for testing that errors are reported when getting malformed contents, I'm asking for testing that the error returned by the lower-level function is handled by this higher-level one.

jhuber6 updated this revision to Diff 441681.Jul 1 2022, 6:41 AM

Addressing comments adding test

jhenderson accepted this revision.Jul 1 2022, 6:52 AM

Looks good now, with a couple of nits to be addressed prior to being committed, and a slight change to a test.

llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #441681)

Nit: in newer tests in the binary tools at least, we use ## for test comments to help them stand out from the RUN and CHECK directives. Same throughout the new comments.

5 ↗(On Diff #441681)

Ditto. Also typo: "insode" -> "inside".

llvm/test/tools/llvm-objdump/Offloading/warning.test
7–8 ↗(On Diff #441681)

These should be a single invocation and just check that the warning appears in the right place in the output with respect to the regular output. Sorry if that wasn't clear from my earlier comments.

If you want, you can also abbreviate your other checks to e.g. just the OFFLOADING IMAGE line - as you rightly point out, we test that the dumping works properly elsewhere.

This revision is now accepted and ready to land.Jul 1 2022, 6:52 AM
jhuber6 updated this revision to Diff 441683.Jul 1 2022, 6:59 AM

Thanks, addressing final nits.

MaskRay added inline comments.Jul 1 2022, 10:08 AM
llvm/tools/llvm-objdump/OffloadDump.h
19

Use const reference if non-null

MaskRay accepted this revision.Jul 1 2022, 10:12 AM
MaskRay added inline comments.
llvm/test/tools/llvm-objdump/Offloading/content-failure.test
1 ↗(On Diff #441683)
llvm/tools/llvm-objdump/OffloadDump.cpp
21

The variable is writable.

static is unneeded for const variables (internal linkage by default).

24

Use const reference if non-null

43

Append [ to OFFLOAD IMAGE

62

delete unneeded blank line

71

Use const reference if non-null

99

Use const reference if non-null

jhuber6 added inline comments.Jul 1 2022, 10:47 AM
llvm/tools/llvm-objdump/OffloadDump.cpp
43

Could you elaborate on this? do you want it formatted like OFFLOADING IMAGE[0]?

MaskRay added inline comments.Jul 1 2022, 10:51 AM
llvm/tools/llvm-objdump/OffloadDump.cpp
43

outs() << "\nOFFLOADING IMAGE [" << Index << "]:\n"

MaskRay added inline comments.Jul 1 2022, 10:52 AM
llvm/test/tools/llvm-objdump/Offloading/binary.test
1 ↗(On Diff #441683)

to see conveys no information and should be deleted.

ditto elsewhere

llvm/test/tools/llvm-objdump/Offloading/warning.test
1 ↗(On Diff #441683)
jhuber6 updated this revision to Diff 441747.Jul 1 2022, 11:09 AM

Addressing nits.

MaskRay added a comment.EditedJul 1 2022, 11:23 AM

I'm absolutely OK with the high volume of minor comments going unaddressed, especially since they're being added incrementally over a prolonged period of time. I'm also OK with James doing some post commit cleanup to make things better conform to his ideal.

FWIW I don't think this is fine. If a minor comment is due to existing code, it's fine; otherwise it's not. I have only read a few comments and I think they are all relevant.
(Thanks to @jhenderson who has done a great job ensuring the tests are clean, readable, and maintainable.)
I do not think this new feature has rights to deviate from the usual standard.

There are many comments which have been addressed but haven't been marked "done" (tip: you can click "done" before arc diff and these comments will automatically be marked done).
They can be distracting to reviewers as they might think the patch is still in a not-ready state.

This revision was landed with ongoing or failed builds.Jul 1 2022, 6:13 PM
This revision was automatically updated to reflect the committed changes.