This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 112–113 | I'm using the exact type that symbolizeFrame() returns. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
| 148 | Please note JSON allows 53-bits numbers. So we print huge numbers as strings "0x...". | |
| llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
| 163 | We need the error code for the clear logic while error handling. The error message may be localized, etc. | |
To separate concerns, it would be a good idea to put the DIPrinter refactoring in a separate prerequisite patch, that the JSON implementation patch depends on.
I'll probably have loads more comments once that is done, but it'll be easier to sift through them once it is.
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 31 | This isn't a good type name. What is a "Common"? Would it make more sense to pass the common data as a separate argument to the non-common Result? | |
| 52 | Perhaps printError is good, because the type signature in its current form doesn't make it clear that this is for error reporting. | |
| 55 | We should probably have DIPrinterLLVM and DIPrinterGNU. The two could largely share functionality under the hood, but I think it is a cleaner interface than one implementation of DIPrinter controlling all its output style, and the other having to look at another variable to choose between styles. | |
| 115 | Make sure to run clang-format on all your new code. | |
| 134–135 | Use StringRef, not const std::string &. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
| 192 | Delete this. We don't use arbitrary comment markers to divide up files in LLVM coding style. | |
| 196 | You seem to have ignored my earlier comments re. printing everything? | |
| 208 | ||
Please review the separate DIPrinter refactoring patch here https://reviews.llvm.org/D98994
Thanks for the other patch. I'll take a look at it shortly. Could you rebase this patch on that one, please, so that it's easier to see what the adding of a new output style will entail, once the refactoring has been done?
Also, if you add "Depends on DXXXXXX" (where XXXXXX is the number from the patch URL) to this patch description, it will automatically do some Phabricator magic to allow people reviewing this to see that there are other prerequisite patches.
Apologies for disappearing @aorlov - I was off work for over two weeks, and only started back today. I'm working through a big backlog, and will hopefully get to look at this patch in the next day or two.
I've not looked at the testing yet. Here are comments on the code to keep you going though.
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
|---|---|---|
| 214 | static | |
| 216 | static | |
| 221 | I suspect from a user's perspective they won't expect to see an Error attribute if there's no error. | |
| 222 | What is ErrorCode supposed to represent? How will a user benefit from it beyond the information provided in the message? | |
| 240 | This is intended to be machine readable. "Noise" in the output isn't an issue. In fact, as previously mentioned, not having these and other attributes is actively harmful to the user experience, as it makes it harder to write a parser that consumes this JSON. In the case where you have a BadString output, I'd just print an empty string. Example: { "Source" : "", "FunctionName" : "", ... , "Line" : 0, "Column" : 0, "Discriminator" : 0 } | |
| 262 | What's the point of the new line? Same goes elsewhere where you are adding new lines. If the output is intended to be machine readable, there is no need for the new lines. | |
| 313 | Same as above - just print everything. | |
| llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
| 161–164 | I missed this in the other review. We don't need to make an ErrorInfo here at all. Just pass the InputString in instead. That will simplify both the call-site and the printInvalidCommand function. | |
| 360–363 | I would make it always a list, not just for multiple input addresses. Also, shouldn't this be using the json stream arrayBegin/End methods? | |
| 366 | There's no need for this outer if. | |
| 367–370 | How about: StringRef Sep = "";
for (StringRef Address : InputAddresses) {
  outs() << Sep;
  Sep = ",";
  symbolizeInput(Args, AdjustVMA, IsAddr2Line, Style, Address, Symbolizer,
              *Printer);
}This may be moot however, if you are using the JSON stream to use the proper JSON array methods. | |
I do not think we should always include everything and anything into JSON. There is nothing wrong with skipping parameters with unknown values, not applicable data and such.
For example DILocal contains Optional<int64_t> FrameOffset. In JS it would be declared as FrameOffset?: number; and handled natively.
If FrameOffset is not specified we cannot print out the value, any number is a valid offset, and a non-number string would confuse the parses much more than just the optional field which is skipped when N/A.
Besides, our customers are happy with the proposed JSON and do not have any problem parsing optional data.
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
|---|---|---|
| 221 | Error: {Code: 0} – is a standard way to say success. And it is a bad idea to omit Error if it should be checked first. | |
| 222 | The ErrorCode is more important than a message when automating the error handling. The error message may be localized, depend on OS, etc. | |
| 240 | Ok, but I still omit empty Error Message and FrameOffset. | |
| llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
| 367–370 | Did you forget about different commas in GNU/LLVM output style? I do not want any functional change in that area as a part of this patch. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
|---|---|---|
| 221 | Right, but basically the behaviour of checking whether "Error" exists is no more complicated in most languages than checking whether it has value zero. I would kind of think there would be two possible JSON objects produced per query (possibly three if we handle invalid commands differently to other errors for JSON output): {"Error":"some message/code/whatever"}
{"Source":"some path",...}Where the response had an error, it is unlikely the other parameters can be relied upon in any meaningful way, so it's probably better to omit them than potentially cause confusion. response = json.load(output) if 'Error' in response: handleError(response) else: handleNormalResponse(response) | |
| 222 | The error codes contained in llvm::Error and llvm::Expected are quite often somewhat arbitrary, and inconsistent. It won't be possible to safely rely on these codes for any useful automated processing. Furthermore, some Errors could contain inconvertibleErrorCode which will cause a problem if these end up back here. What's the use-case for handling different error kinds differently? I'm not saying there isn't a motivation for that, I'm just trying to understand how you plan to use it. If you have no such plan, it doesn't make sense to add additional logic to distinguish errors by code as well as message. | |
| 240 | That seems reasonable, thanks for the explanation. | |
| 256 | I didn't think of this earlier, but it makes a lot of sense to have the pretty printing form be more human readable, with indentation and new lines as appropriate. Thanks! | |
| 272 | Did you consider having a single json::OStream as a member of the JSONPrinter class, so that it only needs constructing (with the Pretty check) in one place? | |
| llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
| 367–370 | Thanks, yes, I'd forgotten this was the higher-level printer area. "group" doesn't obviously mean anything to me. Could you consider renaming it to something like "listBegin" etc? I think that more clearly indicates what you're doing. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
|---|---|---|
| 272 | Note there is no way to reset json::OStream context, so we cannot reuse the same instance to print 2 or more objects while processing stdin. | |
I disagree and still think the ErrorCode is needed for the proper error handling, even if it is arbitrary and inconsistent, which is the usual case everywhere anyway. 
I see your point, though, and did what you suggested to unblock this patch and get it committed.
I've reviewed the test cases more thoroughly today. I think you need test cases where the addresses are specified on the command-line rather than via stdin, because there's a behaviour difference (objects are in list or not as the case may be).
| llvm/docs/CommandGuide/llvm-symbolizer.rst | ||
|---|---|---|
| 256 | Perhaps worth briefly mentioning the behaviour changes for the following cases: 
 I don't think you need concrete examples of the differences. Just a one/two sentence description. | |
| 279 | Maybe use the --pretty-print option to make this example a bit more readable? | |
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
| 120 | I guess it might be helpful to know what this list is of (e.g. ObjectList or whatever). | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
| 4–10 | My understanding is that the --no-inlines option has no impact on the output here, so I think you can drop one of these cases? | |
| 12 | Rather than use {{.*}}/no-file.exe here and similar situations for paths below, I'd recommend using FileCheck's -D option to ensure the correct path is printed, i.e: # RUN: FileCheck %s ... -DFILE=%p/no-file.exe # NO-FILE: ... "ModuleName":"[[FILE]]" ... | |
| 16 | Maybe rather than NOT-FOUND-1 and NOT-FOUND-2, it would be more self-descriptive using NOT-FOUND-NOINLINES and NOT-FOUND-INLINES or something like that? | |
| 18 | Just thinking about usability - what is the canonical way for users to know that their address hasn't been found in JSON output? It might be worth documenting this somewhere. | |
| 26 | Consider a comment for this case, introducing the following set of test cases. In particular, it's important to note that this is using a stdin argument. You could also move all the references to "no-inlines" to one place. I.e. something like "this test case is testing stdin input, with the --no-inlines option" would do the trick. Same sort of thing goes below. | |
| 29 | Nit: missing full stop. | |
| 30 | I find the "Address":"0x0" bit here somewhat confusing, given this is an error case. I'd consider omitting it entirely. Alternatively, simply print what the input address was specified as (in this case "Address":"some text"). | |
| 33 | It would be nice to have one or more test cases with a non-zero discriminator value. Also, where the "Source" parameter is non-empty. You can simplify {{/|\\\\}} to this: {{[\\/]}}. I think it's slightly more readable due to avoiding the quadruple backslash. Same goes elsewhere below. | |
| 50 | I guess the obvious question is: why do we have a difference in behaviour surrounding the FunctionName attribute? The motivation for different symbolizer/addr2line output is because llvm-addr2line needs to be compatible with GNU addr2line, but that principle doesn't apply for JSON output (which AFAIK is not a GNU addr2line supported feature). | |
| 54 | Nit: missing full stop. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-data.test | ||
| 33 | I'd make one of these addresses non-zero, as that will show that the "Address" and "Start" parameters are not just always 0. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.test | ||
| 30 | I think you need a test case with a non-empty "TagOffset". | |
| 42 | It looks like me like this hasn't been addressed? | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
|---|---|---|
| 12 | Unfortunately -D would not work in this case, as JSON has own rules for escaping special symbols in paths. | |
| 16 | It is not applicable anymore. | |
| 18 | Yes. This is the Symbolize library design issue. As far as I can tell there is no reliable way to get a distinct result for the symbol not found case.  | |
| 33 | I have added the test for a non-zero discriminator. Note the address is hardcoded. It is more correct to use /Inputs/discrim.inp via stdin, but I just copied the address from discriminator.test. 
 No, it does not work because JSON has own rules for escaping special symbols in paths. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
|---|---|---|
| 50 | This behavior is a part of llvm-symbolizer and does not depend on the output styles and does not belong to the printer (look at decideHowToPrintFunctions()).  | |
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.test | ||
| 42 | Intention was to break a dependency on DWARF generation in clang. But I have changed it build from the C source to keep it simple for reading. | |
Hi @aorlov,
The community norm is for a week between updates and a ping/between consecutive pings. I haven't had a chance to come back to this just yet due to my workload. I hope to get to it in the next day or two.
Out of time for today. Will come back to this another time.
| llvm/docs/CommandGuide/llvm-symbolizer.rst | ||
|---|---|---|
| 256–258 | Make this all one paragraph, and reflow to 80 character limit. I'd actually rephrase it slightly too: "If addresses are supplied via stdin, the output JSON will be a series of individual objects. Otherwise, all results will be contained in a single array." | |
| 279 | Sorry, use -p, not --pretty-print (I just realised that's what's used in the previous example). | |
| 294–296 | I'd rephrase as in the inline comment (please make sure to reflow if necessary). | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
| 4–5 | Please reflow this comment to 80-character limit. | |
| 7 | ||
| 12–13 | As before, do we need both --no-inlines and --inlines cases? | |
| 33 | 
 Consider generating one at test time using assembly or yaml2obj. 
 Okay - I missed the double backslash. | |
| 50 | Ah, I think we've hit on a fundamental question regarding JSON output - what should the options that change what information is printed do to JSON output? I'm thinking here specifically both --functions and --addresses, but it may apply to others too. | |
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 96–97 | Is this clang-formatted properly? Slightly surprised there's no space after override. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
| 218–219 | ErrorCode is unused now. Also, no need for the const on ErrorMsg. | |
| 220 | Usually we use StringRef.str() to convert to a std::string. Same applies below in the error message line. | |
| 235–236 | Don't abbeviate these names unnecessarily like this. Array and FrameCount would both be acceptable, for example. In general, avoid single letter variable names except for loop counters. Same goes throughout this patch. See https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly for details: 
 I think you can inline N into the initialising part of the for loop? I don't see it being used outside the loop. | |
| 239 | Object maybe? | |
| 249 | I think adding source context can be a later patch. Let's try to avoid adding more than we have to in this one patch. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code-source.c | ||
| 3 ↗ | (On Diff #342279) | This won't work. clang isn't available in llvm-symbolizer tests, as the latter are part of the LLVM layer. Clang depends on the LLVM layer, but not vice-versa. You might be able to use llvm-mc -g to compile some assembly to be able to test this though. | 
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
| 4 | ||
| 7 | Fair point. How about simply ## Show how library errors are reported in the output. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-data.test | ||
| 6 | Same comment as the code test. | |
| 28 | Same comment as above. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.c | ||
| 24 ↗ | (On Diff #342279) | As above. This won't work. You should use llvm-mc to build from assembly, as in other test cases. | 
I think adding source context can be a later patch. Let's try to avoid adding more than we have to in this one patch.
Nice.
Note you asked for a new test for Source a week ago.
Ok, I have removed Source from the JSON output and removed the new test for Source.
I have restored output-style-json-frame.test instead of output-style-json-frame.c
Note output-style-json-frame.test is huge because it is machine generated and you asked for a non zero and non empty TagOffset.
I don't like an idea to clean up or optimize it anyway.
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 96–97 | It is generated by clang-format and validated by clang-tidy. | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
| 239 | It is not necessary anymore because Source has been removed. | |
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 96–97 | The extraneous trailing semicolon is probably confusing clang-format, so maybe remove those and see how it gets formatted? | |
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 96–97 | Ah, good spot! | |
| llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp | ||
| 219 | There's still an unfortunate single-letter variable name here. I'd suggest Json or Object for the variable name. | |
| 228 | Same comment as above. How about InliningInfo? | |
| 236 | LI is a similar abbreviation which doesn't clearly indicate its name. What's wrong with LineInfo? | |
| 248 | J -> Json or Object | |
| 261 | Object or Json. | |
| 272 | Local | |
| 273 | Maybe use FrameObject here to disambiguate from the toJson retiurn below. | |
| 284 | Json or RequestJson or similar. | |
| 303 | Json or Object | |
| llvm/test/tools/llvm-symbolizer/output-style-json-code.test | ||
| 47 | If I follow this correctly, this test case is showing that llvm-addr2line with -f results in the function name being printed? Assuming that's correct, I think you need to show that llvm-addr2line without -f results in the function name not being included in the JSON output too. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.test | ||
| 39 | Please include the version of clang explicitly here, not just in the ident string below. The reason is that someone might come along and trim off the superfluous parts of the below assembly to minimise the test case, but it would still be helpful to know how to generate the unmodified part. I would also use a final release version of clang, so that a future user can easily get the exact version of clang well into the future. | |
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.test | ||
|---|---|---|
| 39 | This seems like a fairly non-trivial build - might be worth an explanation about what's interesting about this case that's not exercised by simpler cases (such as ones without sanitizers, maybe also x86 plain (not a necessity, but curious if it's something ARM specific here), etc) | |
I have replaced output-style-json-frame.test with output-style-json-frame.ll, which is much friendlier but still contains non-zero TagOffset.
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #343923) | Does this need a REQUIRES: aarch64-registered-target or equivalent? | 
| 20 ↗ | (On Diff #343923) | Perhaps highlight that you are testing both 0 and non-zero frame offsets with a comment. | 
| 25 ↗ | (On Diff #343923) | This code could probably do with some comments highlighting what the key elements are, so that future changes don't lose them. | 
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #343923) | It does not require any ARM target.  | 
| 20 ↗ | (On Diff #343923) | I have updated output-style-json-frame.ll. | 
| 25 ↗ | (On Diff #343923) | There are no any key elements. I just declared 3 variables with different type, size and all possible TagOffset per your requests. | 
I have added the aarch64 requirements, and a few comments in that test for target offsets. Hope this should address it.
LGTM, with one nit.
| llvm/test/tools/llvm-symbolizer/output-style-json-frame.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #344183) | Nit: similar to the ## in the other tests, use ;; for comments in this test to help the comments stand out from lit and FileCheck directives. | 
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 39 | virtual ~DIPrinter(){}; => virtual ~DIPrinter() {} Does clang-format complain on virtual ~DIPrinter(){}; ? | |
| llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h | ||
|---|---|---|
| 39 | clang-format doesn't have warnings/errors really - it does it's best to guess at what's going on and format it. The extra ';' confuse clang-format and so it formats poorly. @jhenderson also asked @aorlov to remove the unnecessary semicolons earlier: 
 @aorlov - could you take a look at this & ensure the semicolons have been removed? | |