Page MenuHomePhabricator

Add support for DWARF embedded source to llvm-symbolizer

Authored by aorlov on May 12 2021, 12:41 PM.



This patch adds DWARF embedded source printout to llvm-symbolizer.

Diff Detail

Event Timeline

aorlov created this revision.May 12 2021, 12:41 PM
aorlov requested review of this revision.May 12 2021, 12:41 PM
dblaikie added inline comments.May 13 2021, 2:35 PM

Could you pull this (& any other refactoring you think makes the semantic-changing patch more isolated/focussed) into a preliminary commit?


I take it this is trimming the trailing lines that won't be needed, while leaving the leading lines so the algorithm in printContext can still retrieve the lines in a source-agnostic way (so it can retrieve them in the same way from embedded source as non-embedded source)

It seems unfortunate to have split the pruning logic in two places like this - perhaps a generic function that takes the line iterator & just the desired context (pruning both leading and trailing lines) - then use that function here, and have the "Source" attribute contain the fully pruned text.

Then in printContext if the Source value is provided, use that literally - otherwise open the file and prune its leading/trailing using the same function?

(also that refactoring - pulling out such a reusable pruning function - should be done as a separate commit)

aorlov updated this revision to Diff 345650.May 15 2021, 1:44 PM
aorlov marked 2 inline comments as done.
dblaikie added inline comments.May 15 2021, 1:59 PM

It's not valid to dereference an end iterator like this.

What is this code intended to address?


This looks rather similar to PlainPrinterBase::printContext - perhaps it could be refactored out into a common function?

aorlov updated this revision to Diff 345676.May 16 2021, 12:27 AM
aorlov marked an inline comment as done.May 16 2021, 12:35 AM
aorlov added inline comments.

Note llvm::line_iterator is badly designed.
It is checks *Pos == '\0' in many places instead of comparing Pos with Buffer.end().
It requires '\0' at the end of buffer which is not necessary actually.
Look at an assert on llvm/lib/Support/LineIterator.cpp, line 47.


Correct. I copied the Source context to JSON as is in the first implementation. Then I decided to format it too. I have moved the code to the common function format().

aorlov marked an inline comment as done.May 16 2021, 12:45 AM
aorlov added inline comments.

Note the current implementation of line_iterator can be used with the full source, but not with the pruned source if it didn't coincide with the end of the source file.

dblaikie added inline comments.May 16 2021, 10:42 AM

This still isn't valid - there's no guarantee that end is dereferenceable. (using [] doesn't make the code any safer/different - it's still a dereference)


Either the code will need to unconditionally copy the StringRef into a std::string (in which case, maybe it'd be simpler if the SourceCode structure stored the data in a std::string in the first place? then it wouldn't need the MemoryBuffer member either, since it wouldn't be holding live pointers to the memory mapped file) - or the line_iterator would need to be fixed or a different tool used - which, maybe that's the thing to do? If the original code that extracted the relevant lines already used the line iterator (so features like blank line skipping, etc, have already been handled) - maybe this code could do something simpler, like splitting on newlines, rather than using the full line iterator?

aorlov added inline comments.May 16 2021, 11:31 AM

I just used the same checking as llvm/lib/Support/LineIterator.cpp, line 47:

assert(Buffer.getBufferEnd()[0] == '\0');

Any suggestions?


Note currently line_iterator does not skip blank lines.
I will prepare an own implementation without llvm::line_iterator.

aorlov updated this revision to Diff 345722.May 16 2021, 12:37 PM
aorlov marked an inline comment as done.May 16 2021, 12:39 PM

I've got no specific comments on the source code, just one small nit in the test. I'll leave @dblaikie to continue the rest of the review.


It would probably be a good idea to run the exact same check for GNU output (or llvm-addr2line), just to cover our bases. That way, if the behaviour ever diverges in the future, there's a clear test to update.

aorlov updated this revision to Diff 345880.May 17 2021, 7:45 AM
aorlov marked an inline comment as done.
jhenderson added inline comments.May 18 2021, 12:50 AM

I think "output" is a more common term to use and easier to understand.


No need to repeat "source context" in this text - it's implied by the top-level comment.


This is simpler and more traditional for the input arguments. Also CODE is the default and not needed.


There's only one small difference between this and the LLVM style, if I'm not mistaken. You could avoid duplciation by doing something like:

; RUN: ... | FileCheck ... --check-prefixes=COMMON,LLVM;
; RUN: ... | FileCheck ... --check-prefixes=COMMON,GNU

;      COMMON:foo
;   LLVM-NEXT:/source.c:3:13
;    GNU-NEXT:/source.c:3
; COMMON-NEXT:2  : // Line 2
; COMMON-NEXT:3 >: void foo() {}
; COMMON-NEXT:4  : // Line 4
aorlov updated this revision to Diff 346297.May 18 2021, 4:20 PM
aorlov marked 8 inline comments as done.

Test looks good to me. I'm leaving the source for @dblaikie.

Sure, seems OK. Thanks!

jhenderson accepted this revision.May 20 2021, 12:10 AM

Formally: LGTM!

This revision is now accepted and ready to land.May 20 2021, 12:10 AM
This revision was automatically updated to reflect the committed changes.