This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dis] Add an option `print-thinlto-index-only` in llvm-dis to print index in LLVM assembly.
ClosedPublic

Authored by mingmingl on Jan 13 2022, 12:52 PM.

Details

Reviewers
tejohnson
Summary

Context:

  • In the ThinLTO minimized code [1], ThinLTOBitcodeWriter has a different format for GlobalVar [2] that's not yet supported by BitcodeReader yet. So add a flag to tell llvm-dis to parse thin-lto index only.
  • From the iterative discussion and development of this patch, there isn't a clean, automatic way to choose a bitcode reader (that's as simple as adding a flag llvm-dis). An alternative option is to modify BitcodeReader to support the GlobalVar format, but that's going to be more involved change.

[1] e.g., bitcode generated by opt --thin-link-bitcode-file=<minimized-bitcode> or clang -Xclang -fthin-link-bitcode=<minimized-bitcode>
[2] https://github.com/llvm/llvm-project/blob/b6a93967d9c11e79802b5e75cec1584d6c8aa472/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp#L4741

Diff Detail

Event Timeline

mingmingl created this revision.Jan 13 2022, 12:52 PM
mingmingl requested review of this revision.Jan 13 2022, 12:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2022, 12:52 PM
mingmingl added inline comments.Jan 13 2022, 2:38 PM
llvm/tools/llvm-dis/llvm-dis.cpp
206–207

(Just a message before fixing the failed test)

In pre-merge test, llvm-dis parsing behavior changed for a bitcode that's invalid itself; the behavior changed from parse error to crash (test expects llvm-dis to return parse error while now llvm-dis crashes).

After a quick test, this is caused by the change of read order here;

I'll see how to solve this without changing test expectation from not to crash (i.e., llvm-dis still reports error rather than crash upon invalid bitcode).

The relevant checker line in invalid.test is /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-dis -disable-output /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Bitcode/Inputs/invalid-pointer-element-type.bc

mingmingl added inline comments.Jan 13 2022, 4:10 PM
llvm/tools/llvm-dis/llvm-dis.cpp
206–207

To add more context on getLazyModule and getLTOInfo and regarding how a bitcode is read

a) `getLazyModule` will invoke `BitcodeReader::parseModule`, which parses each subblock and exit-on-err when invalid pointer is read [1]; this is also where `invalid record` is returned when non-index-reader is used to parse minimized lto code.

b) `getLTOInfo` [2] has the equivalent loop to iterate blocks and subblocks, but skips uninteresting subblocks. As a result the invalid block `TYPE_BLOCK_ID_NEW` is read in a skipping fashion (while it's detected if parsed), so the read action right after `TYPE_BLOCK_ID_NEW` is in a sense undefined and triggered `invalid abbrev id` [3] in this case.

This means, llvm-dis gives more clear error log by parsing first and tell which section is not correct; if llvm-dis tries to read LTO info before parsing, and reading LTO info doesn't parse, the error log is less helpful than error log from parsing.

On the other hand, if a wrong reader is used during parsing, llvm-dis gives false alarms (e.g., llvm-dis cannot read minimized thinlto bitcode now)

From this perspective, it seems the right for llvm-dis to first parse and detect error; two options at the top of my head to solve this

-In `getLazyModule`, use the read-and-fail approach (try common reader first, fallback to the other reader, and returns error if the fallback doesn't work); the index-flag in module block is probably not necessary in this option to solve `llvm-dis` issue.
 
- Add another flavor of `getLTOInfo` that does parsing; but the problem of choosing a right reader still exists.

Maybe I should book a chat to discuss this further, but this patch is not in a hurry and not in the critical path anyway.

btw verified the loop to read module block is the same between getLTOInfo and getLazyModule implementation by printing names out (getLazyModule returns error on the 3rd entry when parsing it, getLTOInfo skips 3rd entry and fails on the 4th one).

[1] https://github.com/llvm/llvm-project/blob/d2cc6c2d0c2f8a6e272110416a3fd579ed5a3ac1/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L3657
[2] https://github.com/llvm/llvm-project/blob/d2cc6c2d0c2f8a6e272110416a3fd579ed5a3ac1/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L6982
[3] https://github.com/llvm/llvm-project/blob/6a1f50b84ae8f8a8087fcdbe5f27dae8c76878f1/llvm/include/llvm/Bitstream/BitstreamReader.h#L528

tejohnson added inline comments.Jan 14 2022, 4:12 PM
llvm/tools/llvm-dis/llvm-dis.cpp
206–207

Hmm, I see the issue. It is unfortunate to lose the more specific error message. Let me think about this some more.

mingmingl updated this revision to Diff 401052.Jan 18 2022, 5:10 PM
mingmingl retitled this revision from [Bitcode] [ThinLTO] Add a new bitcode module record for THINLTO_INDEX_FLAG to [llvm-dis] Add an option `dump-thinlto-index-only` in llvm-dis to read ThinLTO minimized code only..
mingmingl edited the summary of this revision. (Show Details)

Add a flag in llvm-dis to print index, and it's up to command runner to decide if the input is a fit (e.g., a ThinLTO bitcode).

Add a regression test for it.

tejohnson added inline comments.Jan 18 2022, 5:56 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
3269

FYI looks like FUNCTION, ALIAS and IFUNC have the same issue. I'm not sure if there is a good common place to leave a comment about handling all of those?

llvm/test/Bitcode/thinlto-index-disassembled-by-llvm-dis.ll
15

Check that the option also works on %t.o

llvm/tools/llvm-dis/llvm-dis.cpp
79

"dump the index as a bitcode file" Do you mean dump/print the LLVM assembly? Suggest replacing "dump" with "print" in the option also.

mingmingl marked 2 inline comments as done.

Revise based on feedback.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
3269

Good point!

Moved this to switch statements of BitcodeReader::parseModule, mentioned other simplified records, and added a github pointer to ThinLinkBitcodeWriter::writeSimplifiedModuleInfo (in case the method itself was renamed in the future).

Remove outdated verbose comment in BitcodeReader::parseGlobalVarRecord since there is a comment to cover more types of records.

mingmingl retitled this revision from [llvm-dis] Add an option `dump-thinlto-index-only` in llvm-dis to read ThinLTO minimized code only. to [llvm-dis] Add an option `print-thinlto-index-only` in llvm-dis to read ThinLTO minimized code only..Jan 19 2022, 10:21 AM
tejohnson added inline comments.Jan 19 2022, 1:43 PM
llvm/test/Bitcode/thinlto-index-disassembled-by-llvm-dis.ll
16

If you change the suffix part to a wildcard then I believe the same checking lines could be shared by both llvm-dis invocations (e.g. remove FULL below and use DIS). Or just stop matching after thinlto-index-disassembled-by-llvm-dis.ll.tmp instead of using a wildcard.

llvm/tools/llvm-dis/llvm-dis.cpp
79

I would remove "as a bitcode file" since we aren't printing as bitcode (bitcode is the encoded input format), or use my initial suggestion to replace this with "LLVM assembly".

mingmingl updated this revision to Diff 401405.Jan 19 2022, 2:23 PM
mingmingl retitled this revision from [llvm-dis] Add an option `print-thinlto-index-only` in llvm-dis to read ThinLTO minimized code only. to [llvm-dis] Add an option `print-thinlto-index-only` in llvm-dis to print index in LLVM assembly..

Revise based on feedback.

llvm/test/Bitcode/thinlto-index-disassembled-by-llvm-dis.ll
16

Done by stop matching after thinlto-index-disassembled-by-llvm-dis.ll.tmp

(Was initially thinking of keeping the suffix explicit but there is no ambiguity in the context anyway, so sharing is reasonable to me).

mingmingl marked an inline comment as done.Jan 19 2022, 2:24 PM
mingmingl added inline comments.
llvm/tools/llvm-dis/llvm-dis.cpp
79

thanks for pointing this out! Update to LLVM assembly now.

This revision is now accepted and ready to land.Jan 19 2022, 3:49 PM
mingmingl marked an inline comment as done.Jan 19 2022, 6:07 PM

pre-merge tests all passed. Going to submit the patch.

mingmingl closed this revision.Jan 19 2022, 6:21 PM

This is submitted in https://github.com/llvm/llvm-project/commit/e95ad93e6ef874ad208028c18b4cb4890320f25c

It seems that revision link isn't appended in commit message (maybe this time I didn't use arc), but mention this revision in https://github.com/llvm/llvm-project/commit/e95ad93e6ef874ad208028c18b4cb4890320f25c#commitcomment-64209086 for future reference.