This is an archive of the discontinued LLVM Phabricator instance.

[bcanalyzer] Recognize more stream types
ClosedPublic

Authored by modocache on Jan 11 2018, 9:33 PM.

Details

Summary

llvm-bcanalyzer prints out the stream type of the file it is
analyzing. If the file begins with the LLVM IR magic number, it reports
a stream type of "LLVM IR". However, any other bitstream format is
reported as "unknown".

Add some checks for two other common bitstream formats: Clang AST
files, which begin with 'CPCH', and Clang serialized diagnostics, which
begin with 'DIAG'.

Test Plan: check-llvm

Diff Detail

Repository
rL LLVM

Event Timeline

modocache created this revision.Jan 11 2018, 9:33 PM
davide requested changes to this revision.Jan 12 2018, 7:28 AM
davide added a subscriber: davide.

I don't mind the direction but this needs a test.

This revision now requires changes to proceed.Jan 12 2018, 7:28 AM
modocache added a comment.EditedJan 12 2018, 7:36 AM

@davide I mentioned this in the "Test Plan" above, but I'm not sure how to test this without relying upon Clang. I do have a diff up for review that adds tests for this to the Clang repository, though: https://reviews.llvm.org/D41980. Maybe I'm missing something, though -- do you have any tips on how to test this entirely in the LLVM repo, so that I can include tests in this diff?

Have you considered checking in the final artifacts (bitcode), maybe with a short description on how to craft them in case we need to regenerate?

aprantl added inline comments.Jan 12 2018, 8:54 AM
tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
799 ↗(On Diff #129578)

Half-serious: How many implementations of this check do we have now (including clang)? It might be nice to we could add a checkMagic(stdd::array<char, 4>) function to BitStreamReader.

Very smart idea, @davide! I'll do that.

tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
799 ↗(On Diff #129578)

Oh, great point!

I think @davide's suggestion to add a test for this is a good one. What do you think of me landing this with a test first, and then in another commit I can consolidate the logic?

modocache updated this revision to Diff 129642.Jan 12 2018, 9:21 AM

Add tests directly to the LLVM test suite. Thanks, @davide!

Friendly ping! Is this looking OK?

The two binaries are almost .5M large. Is that the smallest we can do, or would two 4-byte files with just the magic number be sufficient as well?

aprantl added inline comments.Jan 18 2018, 8:06 AM
tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
799 ↗(On Diff #129578)

Could you add a helper function for comparing a magic number read from a bitstream?

bruno added a subscriber: bruno.Mar 23 2018, 3:54 PM
bruno added inline comments.
test/Bitcode/stream-types.c
6 ↗(On Diff #129642)

I see tests in clang using llvm-bcanalyzer, maybe you can also add tests there when this is done?

george.karpenkov resigned from this revision.Mar 26 2018, 1:35 PM
modocache updated this revision to Diff 143457.Apr 21 2018, 9:50 AM
modocache edited the summary of this revision. (Show Details)

Sorry for letting this diff languish for so long. I updated the tests to be simple files with just the bytes necessary to be picked up as a specific stream type. I also moved the magic number reading over to a static helper function. I considered adding this function as a member of llvm::BirstreamReader, but chose not to for now, because doing so (as far as I can tell) would mean encoding Clang's magic numbers into LLVM's machinery. Let me know if you have different thoughts. Thanks!

JDevlieghere accepted this revision.Apr 21 2018, 1:38 PM
JDevlieghere added a subscriber: JDevlieghere.

Would you mind adding one more test that ensure the tool doesn’t crash with incomplete input, e.g. a file containing just CP or DI? Other than that this LGTM.

modocache updated this revision to Diff 143462.Apr 21 2018, 4:34 PM

Thanks, @JDevlieghere! I added two tests like the ones you described. Thankfully it doesn't crash because llvm-bcanalyzer verifies the bitcode invariant that input sizes be a multiple of 4.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 21 2018, 4:56 PM
This revision was automatically updated to reflect the committed changes.

Strange, the revision appeared to have been "Accepted" by @JDevlieghere. I wonder why the message says it wasn't accepted...? Apologies in advance if I did something incorrectly when landing this.

Thanks, @JDevlieghere! I added two tests like the ones you described. Thankfully it doesn't crash because llvm-bcanalyzer verifies the bitcode invariant that input sizes be a multiple of 4.

Thanks, Brian!

Strange, the revision appeared to have been "Accepted" by @JDevlieghere. I wonder why the message says it wasn't accepted...? Apologies in advance if I did something incorrectly when landing this.

If someone request changes, they have to explicitly accept for Phabricator to consider the whole diff as accepted. I think the UI could do a better job at showing this, because I've seen this confuse people on several occasions. Anyway, I'm sure there's nothing to worry about here since you had already addressed @davide's comments.