This is an archive of the discontinued LLVM Phabricator instance.

Add support for JSON output style to llvm-symbolizer
ClosedPublic

Authored by aorlov on Feb 17 2021, 10:58 AM.

Details

Summary

This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aorlov marked 2 inline comments as done.Mar 18 2021, 4:09 AM
aorlov added inline comments.
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
118–119

I'm using the exact type that symbolizeFrame() returns.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
147

Please note JSON allows 53-bits numbers. So we print huge numbers as strings "0x...".

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
158

We need the error code for the clear logic while error handling. The error message may be localized, etc.
DIPrinter must only print the data but do not handle errors.
It is impossible to get the error code from the Error without handling.
Initially I have used ErrorInfoBase instead to pass the error information to DIPrinter.
Now DICommon contains ErrorCode (it can be used as "success" flag too).
The error message is stored in DICommon<std::string>.Result only in case of error.

aorlov updated this revision to Diff 331515.Mar 18 2021, 4:14 AM
aorlov marked an inline comment as done.
aorlov updated this revision to Diff 331542.Mar 18 2021, 6:33 AM

The refactoring is done.
Thanks for reviewing!

To separate concerns, it would be a good idea to put the DIPrinter refactoring in a separate prerequisite patch, that the JSON implementation patch depends on.

I'll probably have loads more comments once that is done, but it'll be easier to sift through them once it is.

llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
32

This isn't a good type name. What is a "Common"? Would it make more sense to pass the common data as a separate argument to the non-common Result?

53

Perhaps printError is good, because the type signature in its current form doesn't make it clear that this is for error reporting.

56

We should probably have DIPrinterLLVM and DIPrinterGNU. The two could largely share functionality under the hood, but I think it is a cleaner interface than one implementation of DIPrinter controlling all its output style, and the other having to look at another variable to choose between styles.

121

Make sure to run clang-format on all your new code.

140–141

Use StringRef, not const std::string &.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
191

Delete this. We don't use arbitrary comment markers to divide up files in LLVM coding style.

195

You seem to have ignored my earlier comments re. printing everything?

207

Please review the separate DIPrinter refactoring patch here https://reviews.llvm.org/D98994

Thanks for the other patch. I'll take a look at it shortly. Could you rebase this patch on that one, please, so that it's easier to see what the adding of a new output style will entail, once the refactoring has been done?

Also, if you add "Depends on DXXXXXX" (where XXXXXX is the number from the patch URL) to this patch description, it will automatically do some Phabricator magic to allow people reviewing this to see that there are other prerequisite patches.

aorlov updated this revision to Diff 333102.Mar 24 2021, 1:08 PM
aorlov edited the summary of this revision. (Show Details)
aorlov updated this revision to Diff 335282.Apr 5 2021, 9:50 AM
aorlov edited the summary of this revision. (Show Details)
aorlov updated this revision to Diff 335455.Apr 6 2021, 3:22 AM

Ping.

Note the build status is red because of a problem not related to this patch.

aorlov updated this revision to Diff 336228.Apr 8 2021, 2:12 PM
aorlov updated this revision to Diff 336260.Apr 8 2021, 4:24 PM
aorlov updated this revision to Diff 336271.Apr 8 2021, 4:48 PM
aorlov updated this revision to Diff 336539.Apr 9 2021, 12:06 PM

One more ping.

Apologies for disappearing @aorlov - I was off work for over two weeks, and only started back today. I'm working through a big backlog, and will hopefully get to look at this patch in the next day or two.

I've not looked at the testing yet. Here are comments on the code to keep you going though.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
213

static

215

static

220

I suspect from a user's perspective they won't expect to see an Error attribute if there's no error.

221

What is ErrorCode supposed to represent? How will a user benefit from it beyond the information provided in the message?

239

This is intended to be machine readable. "Noise" in the output isn't an issue. In fact, as previously mentioned, not having these and other attributes is actively harmful to the user experience, as it makes it harder to write a parser that consumes this JSON.

In the case where you have a BadString output, I'd just print an empty string. Example:

{ "Source" : "", "FunctionName" : "", ... , "Line" : 0, "Column" : 0, "Discriminator" : 0 }
261

What's the point of the new line? Same goes elsewhere where you are adding new lines. If the output is intended to be machine readable, there is no need for the new lines.

312

Same as above - just print everything.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
156–157

I missed this in the other review. We don't need to make an ErrorInfo here at all. Just pass the InputString in instead. That will simplify both the call-site and the printInvalidCommand function.

354–357

I would make it always a list, not just for multiple input addresses. Also, shouldn't this be using the json stream arrayBegin/End methods?

360

There's no need for this outer if.

361–364

How about:

StringRef Sep = "";
for (StringRef Address : InputAddresses) {
  outs() << Sep;
  Sep = ",";
  symbolizeInput(Args, AdjustVMA, IsAddr2Line, Style, Address, Symbolizer,
              *Printer);
}

This may be moot however, if you are using the JSON stream to use the proper JSON array methods.

aorlov updated this revision to Diff 339053.Apr 20 2021, 4:53 PM
aorlov marked 5 inline comments as done.Apr 20 2021, 5:00 PM

I do not think we should always include everything and anything into JSON. There is nothing wrong with skipping parameters with unknown values, not applicable data and such.

For example DILocal contains Optional<int64_t> FrameOffset. In JS it would be declared as FrameOffset?: number; and handled natively.

If FrameOffset is not specified we cannot print out the value, any number is a valid offset, and a non-number string would confuse the parses much more than just the optional field which is skipped when N/A.

Besides, our customers are happy with the proposed JSON and do not have any problem parsing optional data.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
220

Error: {Code: 0} – is a standard way to say success. And it is a bad idea to omit Error if it should be checked first.

221

The ErrorCode is more important than a message when automating the error handling. The error message may be localized, depend on OS, etc.

239

Ok, but I still omit empty Error Message and FrameOffset.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
361–364

Did you forget about different commas in GNU/LLVM output style? I do not want any functional change in that area as a part of this patch.
I have added groupBegin() and groupEnd() to DIPrinter interface and moved all logic to JSONPrinter implementation.

jhenderson added inline comments.Apr 21 2021, 1:28 AM
llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
220

Right, but basically the behaviour of checking whether "Error" exists is no more complicated in most languages than checking whether it has value zero.

I would kind of think there would be two possible JSON objects produced per query (possibly three if we handle invalid commands differently to other errors for JSON output):

{"Error":"some message/code/whatever"}
{"Source":"some path",...}

Where the response had an error, it is unlikely the other parameters can be relied upon in any meaningful way, so it's probably better to omit them than potentially cause confusion.
The pseudo-python logic for this might look something like:

response = json.load(output)
if 'Error' in response:
  handleError(response)
else:
  handleNormalResponse(response)
221

The error codes contained in llvm::Error and llvm::Expected are quite often somewhat arbitrary, and inconsistent. It won't be possible to safely rely on these codes for any useful automated processing. Furthermore, some Errors could contain inconvertibleErrorCode which will cause a problem if these end up back here.

What's the use-case for handling different error kinds differently? I'm not saying there isn't a motivation for that, I'm just trying to understand how you plan to use it. If you have no such plan, it doesn't make sense to add additional logic to distinguish errors by code as well as message.

239

That seems reasonable, thanks for the explanation.

255

I didn't think of this earlier, but it makes a lot of sense to have the pretty printing form be more human readable, with indentation and new lines as appropriate. Thanks!

271

Did you consider having a single json::OStream as a member of the JSONPrinter class, so that it only needs constructing (with the Pretty check) in one place?

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
361–364

Thanks, yes, I'd forgotten this was the higher-level printer area.

"group" doesn't obviously mean anything to me. Could you consider renaming it to something like "listBegin" etc? I think that more clearly indicates what you're doing.

aorlov updated this revision to Diff 339258.Apr 21 2021, 8:50 AM
aorlov marked an inline comment as done.Apr 21 2021, 8:57 AM
aorlov added inline comments.
llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
271

Note there is no way to reset json::OStream context, so we cannot reuse the same instance to print 2 or more objects while processing stdin.
I have moved json::OStream to printJSON(json::Value).

aorlov updated this revision to Diff 339311.Apr 21 2021, 10:38 AM

I disagree and still think the ErrorCode is needed for the proper error handling, even if it is arbitrary and inconsistent, which is the usual case everywhere anyway.
I see your point, though, and did what you suggested to unblock this patch and get it committed.

I've reviewed the test cases more thoroughly today. I think you need test cases where the addresses are specified on the command-line rather than via stdin, because there's a behaviour difference (objects are in list or not as the case may be).

llvm/docs/CommandGuide/llvm-symbolizer.rst
256

Perhaps worth briefly mentioning the behaviour changes for the following cases:

  1. Stdin versus addresses on command-line (i.e. separate individual objects versus list of objects)
  2. --pretty-print versus no --pretty-print. Optionally, this could be under the --pretty-print option discussion.

I don't think you need concrete examples of the differences. Just a one/two sentence description.

279

Maybe use the --pretty-print option to make this example a bit more readable?

llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
126

I guess it might be helpful to know what this list is of (e.g. ObjectList or whatever).

llvm/test/tools/llvm-symbolizer/output-style-json-code.test
4–10

My understanding is that the --no-inlines option has no impact on the output here, so I think you can drop one of these cases?

12

Rather than use {{.*}}/no-file.exe here and similar situations for paths below, I'd recommend using FileCheck's -D option to ensure the correct path is printed, i.e:

# RUN:   FileCheck %s ... -DFILE=%p/no-file.exe

# NO-FILE: ... "ModuleName":"[[FILE]]" ...
16

Maybe rather than NOT-FOUND-1 and NOT-FOUND-2, it would be more self-descriptive using NOT-FOUND-NOINLINES and NOT-FOUND-INLINES or something like that?

18

Just thinking about usability - what is the canonical way for users to know that their address hasn't been found in JSON output? It might be worth documenting this somewhere.

26

Consider a comment for this case, introducing the following set of test cases. In particular, it's important to note that this is using a stdin argument. You could also move all the references to "no-inlines" to one place. I.e. something like "this test case is testing stdin input, with the --no-inlines option" would do the trick.

Same sort of thing goes below.

29

Nit: missing full stop.

30

I find the "Address":"0x0" bit here somewhat confusing, given this is an error case. I'd consider omitting it entirely. Alternatively, simply print what the input address was specified as (in this case "Address":"some text").

33

It would be nice to have one or more test cases with a non-zero discriminator value. Also, where the "Source" parameter is non-empty.

You can simplify {{/|\\\\}} to this: {{[\\/]}}. I think it's slightly more readable due to avoiding the quadruple backslash. Same goes elsewhere below.

50

I guess the obvious question is: why do we have a difference in behaviour surrounding the FunctionName attribute? The motivation for different symbolizer/addr2line output is because llvm-addr2line needs to be compatible with GNU addr2line, but that principle doesn't apply for JSON output (which AFAIK is not a GNU addr2line supported feature).

54

Nit: missing full stop.

llvm/test/tools/llvm-symbolizer/output-style-json-data.test
33

I'd make one of these addresses non-zero, as that will show that the "Address" and "Start" parameters are not just always 0.

llvm/test/tools/llvm-symbolizer/output-style-json-frame.test
30

I think you need a test case with a non-empty "TagOffset".

42

It looks like me like this hasn't been addressed?

aorlov updated this revision to Diff 339790.Apr 22 2021, 3:02 PM
aorlov marked 11 inline comments as done.Apr 22 2021, 3:13 PM
aorlov added inline comments.
llvm/test/tools/llvm-symbolizer/output-style-json-code.test
12

Unfortunately -D would not work in this case, as JSON has own rules for escaping special symbols in paths.

16

It is not applicable anymore.

18

Yes. This is the Symbolize library design issue. As far as I can tell there is no reliable way to get a distinct result for the symbol not found case.
For now I just keep it transparent by serializing whatever the library returns, as addressing the library issue is out of the scope of this patch.

33

I have added the test for a non-zero discriminator. Note the address is hardcoded. It is more correct to use /Inputs/discrim.inp via stdin, but I just copied the address from discriminator.test.
It seems currently we have no binaries in llvm/test/tools/llvm-symbolizer/Inputs with a valid Source info.

You can simplify

No, it does not work because JSON has own rules for escaping special symbols in paths.

aorlov added inline comments.Apr 22 2021, 3:13 PM
llvm/test/tools/llvm-symbolizer/output-style-json-code.test
50

This behavior is a part of llvm-symbolizer and does not depend on the output styles and does not belong to the printer (look at decideHowToPrintFunctions()).
Not sure if I understand what you are saying. Changing the behavior is out of the scope of this patch.
I agree that it does not make much sense in testing that, but one of the reviewers requested these tests.

llvm/test/tools/llvm-symbolizer/output-style-json-frame.test
42

Intention was to break a dependency on DWARF generation in clang. But I have changed it build from the C source to keep it simple for reading.

aorlov updated this revision to Diff 339949.Apr 23 2021, 2:01 AM

Hi @aorlov,

The community norm is for a week between updates and a ping/between consecutive pings. I haven't had a chance to come back to this just yet due to my workload. I hope to get to it in the next day or two.

Out of time for today. Will come back to this another time.

llvm/docs/CommandGuide/llvm-symbolizer.rst
256–258

Make this all one paragraph, and reflow to 80 character limit. I'd actually rephrase it slightly too:

"If addresses are supplied via stdin, the output JSON will be a series of individual objects. Otherwise, all results will be contained in a single array."

279

Sorry, use -p, not --pretty-print (I just realised that's what's used in the previous example).

291–293

I'd rephrase as in the inline comment (please make sure to reflow if necessary).

llvm/test/tools/llvm-symbolizer/output-style-json-code.test
4–5

Please reflow this comment to 80-character limit.

7
12–13

As before, do we need both --no-inlines and --inlines cases?

33

It seems currently we have no binaries in llvm/test/tools/llvm-symbolizer/Inputs with a valid Source info.

Consider generating one at test time using assembly or yaml2obj.

You can simplify

No, it does not work because JSON has own rules for escaping special symbols in paths.

Okay - I missed the double backslash.

50

Ah, I think we've hit on a fundamental question regarding JSON output - what should the options that change what information is printed do to JSON output? I'm thinking here specifically both --functions and --addresses, but it may apply to others too.

aorlov updated this revision to Diff 340882.Apr 27 2021, 9:40 AM
aorlov marked 7 inline comments as done.Apr 27 2021, 9:49 AM
aorlov added inline comments.
llvm/test/tools/llvm-symbolizer/output-style-json-code.test
7

But note the library name is symbolize, not symbolizer.

33

I have added output-style-json-code-source.c.

aorlov updated this revision to Diff 340908.Apr 27 2021, 10:49 AM
aorlov marked an inline comment as done.

Used --print-source-context-lines to control Source printout in JSON.

aorlov updated this revision to Diff 341070.Apr 27 2021, 9:50 PM
aorlov updated this revision to Diff 342279.May 2 2021, 2:34 PM
jhenderson added inline comments.May 5 2021, 2:04 AM
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
99–100

Is this clang-formatted properly? Slightly surprised there's no space after override.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
217–218

ErrorCode is unused now. Also, no need for the const on ErrorMsg.

219

Usually we use StringRef.str() to convert to a std::string. Same applies below in the error message line.

234–235

Don't abbeviate these names unnecessarily like this. Array and FrameCount would both be acceptable, for example. In general, avoid single letter variable names except for loop counters. Same goes throughout this patch. See https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly for details:

We cannot stress enough how important it is to use descriptive names. Pick names that match the semantics and role of the underlying entities, within reason. Avoid abbreviations unless they are well known.

I think you can inline N into the initialising part of the for loop? I don't see it being used outside the loop.

238

Object maybe?

248

I think adding source context can be a later patch. Let's try to avoid adding more than we have to in this one patch.

llvm/test/tools/llvm-symbolizer/output-style-json-code-source.c
3 ↗(On Diff #342279)

This won't work. clang isn't available in llvm-symbolizer tests, as the latter are part of the LLVM layer. Clang depends on the LLVM layer, but not vice-versa. You might be able to use llvm-mc -g to compile some assembly to be able to test this though.

llvm/test/tools/llvm-symbolizer/output-style-json-code.test
4
7

Fair point. How about simply ## Show how library errors are reported in the output.

llvm/test/tools/llvm-symbolizer/output-style-json-data.test
6

Same comment as the code test.

28

Same comment as above.

llvm/test/tools/llvm-symbolizer/output-style-json-frame.c
24 ↗(On Diff #342279)

As above. This won't work. You should use llvm-mc to build from assembly, as in other test cases.

aorlov updated this revision to Diff 343189.May 5 2021, 2:15 PM
aorlov marked 7 inline comments as done.May 5 2021, 2:31 PM

I think adding source context can be a later patch. Let's try to avoid adding more than we have to in this one patch.

Nice.
Note you asked for a new test for Source a week ago.
Ok, I have removed Source from the JSON output and removed the new test for Source.

I have restored output-style-json-frame.test instead of output-style-json-frame.c
Note output-style-json-frame.test is huge because it is machine generated and you asked for a non zero and non empty TagOffset.
I don't like an idea to clean up or optimize it anyway.

llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
99–100

It is generated by clang-format and validated by clang-tidy.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
238

It is not necessary anymore because Source has been removed.

dblaikie added inline comments.May 5 2021, 2:33 PM
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
99–100

The extraneous trailing semicolon is probably confusing clang-format, so maybe remove those and see how it gets formatted?

aorlov updated this revision to Diff 343427.May 6 2021, 8:56 AM
aorlov marked an inline comment as done.
jhenderson added inline comments.May 7 2021, 12:38 AM
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
99–100

Ah, good spot!

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
218

There's still an unfortunate single-letter variable name here. I'd suggest Json or Object for the variable name.

227

Same comment as above. How about InliningInfo?

235

LI is a similar abbreviation which doesn't clearly indicate its name. What's wrong with LineInfo?

247

J -> Json or Object

260

Object or Json.

271

Local

272

Maybe use FrameObject here to disambiguate from the toJson retiurn below.

283

Json or RequestJson or similar.

302

Json or Object

llvm/test/tools/llvm-symbolizer/output-style-json-code.test
47

If I follow this correctly, this test case is showing that llvm-addr2line with -f results in the function name being printed? Assuming that's correct, I think you need to show that llvm-addr2line without -f results in the function name not being included in the JSON output too.

llvm/test/tools/llvm-symbolizer/output-style-json-frame.test
39

Please include the version of clang explicitly here, not just in the ident string below. The reason is that someone might come along and trim off the superfluous parts of the below assembly to minimise the test case, but it would still be helpful to know how to generate the unmodified part. I would also use a final release version of clang, so that a future user can easily get the exact version of clang well into the future.

aorlov updated this revision to Diff 343649.May 7 2021, 5:14 AM
aorlov marked 11 inline comments as done.May 7 2021, 5:18 AM
dblaikie added inline comments.May 7 2021, 12:55 PM
llvm/test/tools/llvm-symbolizer/output-style-json-frame.test
39

This seems like a fairly non-trivial build - might be worth an explanation about what's interesting about this case that's not exercised by simpler cases (such as ones without sanitizers, maybe also x86 plain (not a necessity, but curious if it's something ARM specific here), etc)

aorlov updated this revision to Diff 343923.May 9 2021, 11:42 AM

I have replaced output-style-json-frame.test with output-style-json-frame.ll, which is much friendlier but still contains non-zero TagOffset.

jhenderson added inline comments.May 10 2021, 12:25 AM
llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll
1 ↗(On Diff #343923)

Does this need a REQUIRES: aarch64-registered-target or equivalent?

20 ↗(On Diff #343923)

Perhaps highlight that you are testing both 0 and non-zero frame offsets with a comment.

25 ↗(On Diff #343923)

This code could probably do with some comments highlighting what the key elements are, so that future changes don't lose them.

aorlov updated this revision to Diff 344128.May 10 2021, 11:33 AM
aorlov added inline comments.May 10 2021, 11:35 AM
llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll
1 ↗(On Diff #343923)

It does not require any ARM target.
Note this test is passed on x64 Windows and x64 Debian.

20 ↗(On Diff #343923)

I have updated output-style-json-frame.ll.
Now it contains 0, non-zero and empty (missing) TagOffset to cover all possible cases.
I added a comment too.

25 ↗(On Diff #343923)

There are no any key elements. I just declared 3 variables with different type, size and all possible TagOffset per your requests.
I personally think it is overkill, and does not really belong to the JSON patch, as there is nothing JSON specific there.
It seems a test for the symbolize library and the symbolizer itself.

aorlov updated this revision to Diff 344183.May 10 2021, 1:29 PM

I have added the aarch64 requirements, and a few comments in that test for target offsets. Hope this should address it.

jhenderson accepted this revision.May 11 2021, 12:48 AM

LGTM, with one nit.

llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll
1 ↗(On Diff #344183)

Nit: similar to the ## in the other tests, use ;; for comments in this test to help the comments stand out from lit and FileCheck directives.

This revision is now accepted and ready to land.May 11 2021, 12:48 AM
This revision was landed with ongoing or failed builds.May 11 2021, 2:11 AM
This revision was automatically updated to reflect the committed changes.

Please make another commit to fix the unnecessary introduction of semi-colons in the places I've highlighted.

llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
39–40

Why have these superfluous semi-colons reappeared?

99–100

Ditto.

MaskRay added inline comments.May 11 2021, 12:36 PM
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
40

virtual ~DIPrinter(){}; => virtual ~DIPrinter() {}

Does clang-format complain on virtual ~DIPrinter(){}; ?

dblaikie added inline comments.May 11 2021, 1:04 PM
llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h
40

clang-format doesn't have warnings/errors really - it does it's best to guess at what's going on and format it.

The extra ';' confuse clang-format and so it formats poorly.

@jhenderson also asked @aorlov to remove the unnecessary semicolons earlier:

Please make another commit to fix the unnecessary introduction of semi-colons in the places I've highlighted.

@aorlov - could you take a look at this & ensure the semicolons have been removed?

Please make another commit to fix the unnecessary introduction of semi-colons in the places I've highlighted.

Done.
Sorry for that. Something went wrong when switching branches.