This is an archive of the discontinued LLVM Phabricator instance.

Support inline functions symbolization in Addr2Line symbolizer.
ClosedPublic

Authored by m.ostapenko on Aug 19 2015, 8:33 AM.

Details

Summary

Right now, Addr2Line symbolizer in asan_symbolize.py doesn't support inline functions symbolization. This might be a useful feature for using ASan on embedded systems.

Test results:

$ cat test.c

#include <sanitizer/common_interface_defs.h>

static inline void FooBarBaz() {

__sanitizer_print_stack_trace();

}

int main() {

FooBarBaz();
return 0;

}

$ clang test.c -fsanitize=address -g -O2 -o test.x && ./test.x &> /tmp/test.log
$ ./projects/compiler-rt/lib/asan/scripts/asan_symbolize.py -l /tmp/test.log

#0 0x42095e in __sanitizer_print_stack_trace _asan_rtl_
#1 0x4cec07 in FooBarBaz /home/max/build/llvm/asan/test.c:4
#2 0x4cec07 in main /home/max/build/llvm/asan/test.c:8
#3 0x7f89f0891ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287

Diff Detail

Repository
rL LLVM

Event Timeline

m.ostapenko retitled this revision from to Support inline functions symbolization in Addr2Line symbolizer..
m.ostapenko updated this object.
m.ostapenko added reviewers: samsonov, glider.
m.ostapenko set the repository for this revision to rL LLVM.
m.ostapenko added a subscriber: ygribov.
samsonov edited edge metadata.Aug 19 2015, 9:57 AM

It's kind of sad you have to launch a subprocess for every line in a stack trace. Is there no way you could make "addr2line -i" read addresses from stdin, and understand how many frames in output describe every input address?

Is there no way you could make "addr2line -i" read addresses from stdin,
and understand how many frames in output describe every input address?

Unfortunately addr2line is too primitive for this and there is no way to know number of frames in advance. Perhaps we could make inlining optional?

Crazy suggestion: whenever you need to symbolize a PC, feed this PC followed by 0xbadbadbad to addr2line. It will print just two lines (??, ??:0) for the latter, which means all the rest describe the symbolized location of PC (possibly, inlined).

Hm, this should work. Max?

We probably also need to update Addr2LineProcess and LLVMSymbolizer so that output for offline/online symbolizer is consistent.

Crazy suggestion: whenever you need to symbolize a PC, feed this PC followed by 0xbadbadbad to addr2line. It will print just two lines (??, ??:0) for the latter, which means all the rest describe the symbolized location of PC (possibly, inlined).

Hm, this should work. Max?

Yes, this works fine in script.

We probably also need to update Addr2LineProcess and LLVMSymbolizer so that output for offline/online symbolizer is consistent.

We don't touch LLVMSymbolizer in asan_symbolize.py, so no need to modify it in library.

m.ostapenko edited edge metadata.

Updating following approach, suggested by Alexey. No regressions on x86_64-unknown-linux-gnu.

$ ASAN_OPTIONS=allow_addr2line=1 ./test.x 
  #0 0x41f7de in __sanitizer_print_stack_trace /home/max/src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:38
  #1 0x4cec17 in FooBarBaz /tmp/test.c:4
  #2 0x4cec17 in main /tmp/test.c:8
  #3 0x7fe58102cec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
  #4 0x417ac5 in _start (/tmp/test.x+0x417ac5)
samsonov added inline comments.Aug 21 2015, 10:17 AM
lib/asan/scripts/asan_symbolize.py
159 ↗(On Diff #32830)

Make this a named constant. I also wonder if different value would work better (smth. that will definitely be outside of .text section).

163 ↗(On Diff #32830)

Why can't this be while-true loop?

171 ↗(On Diff #32830)

Should this be a list of pairs

[('??', '??:?')]

? Or just declare lines before try-catch block and call .append here

176 ↗(On Diff #32830)
return ['%s in %s %s' % (addr, frame[0], fix_filename(frame[1])) for frame in lines]
lib/sanitizer_common/sanitizer_symbolizer_internal.h
78 ↗(On Diff #32830)

The semantics of this function is very confusing. If GetEffectiveLenOfOutput is 0, it means that we should continue to fetch the output, which makes little sense.

Also, this is incorrect. Suppose you have passed

0x400400
0xbadbadbad

to addr2line, it had printed the following to stdout:

foo
/tmp/foo.cc:42
??
??:?

Suppose the first call to ReadFromFile in ReadFromSymbolizer has fecthed:

foo
/tmp/foo.cc:42
??

In your implementation, GetEffectiveLenOfOutput would return non-zero, you will break from the loop and fail to consume the last line.

You should fix it at the higher level - consume the output of addr2line as you do now, but change the way you handle this buffer: either teach ParseSymbolizePCOutput to drop the last couple of lines, or (better) trim the buffer in Addr2LineSymbolizer before calling that function.

m.ostapenko removed rL LLVM as the repository for this revision.

Updating the patch according to last review. Don't break existing interfaces now.

ygribov added inline comments.Aug 24 2015, 12:20 PM
lib/asan/scripts/asan_symbolize.py
138 ↗(On Diff #32970)

Perhaps make short version for 32-bit targets? Same for C++ code below.

163 ↗(On Diff #32970)

Perhaps add a guard here to avoid endless loop?

167 ↗(On Diff #32970)

Shouldn't you eat following ??:? as well? Also what if symbolization fails for the first (non-terminator) address? I guess you'll skip response for terminator and thus break following symbolizations.

171 ↗(On Diff #32970)

This would be more pythonic:

% (addr, line, fix_filename(file)) for (line, file) in lines
lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #32970)

Please use size_t. Also it makes more sense to call this terminator_len.

202 ↗(On Diff #32970)

Hm, shouldn't this always be false if length == terminator_size?

206 ↗(On Diff #32970)

The two internal_memcmps are duplicating each other. Can you compute the second one and cache result in a var with descriptive name (say is_terminated)?

219 ↗(On Diff #32970)

You don't need 9, neither here, nor in definition.

232 ↗(On Diff #32970)

Shouldn't this be non-const?

263 ↗(On Diff #32970)

-1 should be a define somewhere. Also splitting termination logic across Addr2LinePool and Addr2LineProcess looks really weird. Can we move it all to one class?

272 ↗(On Diff #32970)

I'm sure we can avoid code dup.

ygribov added inline comments.Aug 25 2015, 3:45 AM
lib/asan/scripts/asan_symbolize.py
138 ↗(On Diff #32970)

Actually -1 seems to work for both Binutils and Elfutils.

m.ostapenko added inline comments.Aug 25 2015, 4:03 AM
lib/asan/scripts/asan_symbolize.py
163 ↗(On Diff #32970)

I did this in my previous revision, but Alexey wrote this in previous comment:

Why can't this be while-true loop?

So, I'm a bit confused, what approach is preferable here.

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
219 ↗(On Diff #32970)

Hm, without 9 I got this:

/home/max/src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc:198:41: error: invalid application of 'sizeof' to an incomplete type 'const char []'
    const size_t terminator_len = sizeof(output_terminator_) - 1;
                                        ^~~~~~~~~~~~~~~~~~~~
1 error generated.
ygribov added inline comments.Aug 25 2015, 4:53 AM
lib/asan/scripts/asan_symbolize.py
163 ↗(On Diff #32970)

Yeah, I'm a bit worried about corner-cases causing our primitive parser to hang but that may be an overkill.

Updating according to last Yura's nits.

samsonov added inline comments.Aug 25 2015, 10:09 AM
lib/asan/scripts/asan_symbolize.py
172 ↗(On Diff #33082)

s/line/function

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
199 ↗(On Diff #33082)

s/red/read

202 ↗(On Diff #33082)

Please fix variable names in the comment - there are no OUTPUT_TERMINATOR_ and TERMINATOR_LEN

204 ↗(On Diff #33082)

Should this be <= ?

209 ↗(On Diff #33082)

class members should end with _

Use of static class variable here is really weird - it adds implicit restriction that you will *always* call ReachedEndOfOutput, followed by TrimOutputBuffer with the same buffer, and that there will be no several instances of this class. I think the latter is false even now.

229 ↗(On Diff #33082)

I don't like hardcoded 9 either. You can just move this constant into the body of ReachedEndOfOutput

static const char kOutputTerminator[] = "??\n??:0\n";
m.ostapenko added inline comments.Aug 25 2015, 10:49 AM
lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
202 ↗(On Diff #33082)

Ah, sorry, these are GCC habits.

209 ↗(On Diff #33082)

Ok, I can accumulate effective_length_ in object's member, but I'll need to remove "const" keyword from ReachedEndOfOutput method. Is this OK?

Updating according to last review. Since effective_length_ is an object's member now, ReachedEndOfOutput is no longer const.

ygribov added inline comments.Aug 26 2015, 10:06 AM
lib/asan/scripts/asan_symbolize.py
167 ↗(On Diff #33196)

I'd rather assert than file_name is correct so that we fail fast on parse error. Also why startswith instead of =?

lib/sanitizer_common/sanitizer_symbolizer_internal.h
78 ↗(On Diff #33196)

Rather than cache effective length in ReachedEndOfOutput and introduce implicit constraint on methods, can we recalculate effective_length inside TrimOutputBuffer? This would also allow us to keep const.

91 ↗(On Diff #33196)

Why const uptr?

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #33196)

I wonder if there's a way to avoid magic const.

272–273 ↗(On Diff #33196)

Note that terminator logic is still split across two classes.

281 ↗(On Diff #33196)

I don't think that output_terminator but rather "dummy_address" or something like this. BTW note that statics don't take underscore.

m.ostapenko added inline comments.Aug 26 2015, 11:27 AM
lib/sanitizer_common/sanitizer_symbolizer_internal.h
91 ↗(On Diff #33196)

Sorry, size_t is undefined here.

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #33196)

We can use strlen, but this will lead to recalculation on each functon call. Or maybe use "??\n??:0\n" as it to compute kTerminatorLen.

281 ↗(On Diff #33196)

AFAIK, they do:

class members should end with _

Or maybe I missed something?

samsonov added inline comments.Aug 26 2015, 1:30 PM
lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #33196)

Consider just making output_terminator_ a function-static - it's not used anywhere else in this class anyway.

m.ostapenko added inline comments.Aug 26 2015, 2:14 PM
lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #33196)

Yes, this is an option, but if follow approach Yura suggested (do not accumulate effective_len, but recompute it in TrimOutputBuffer each time), we'll need output_terminator_ in TrimOutputBuffer either to find meaningful buffer length.

ygribov added inline comments.Aug 26 2015, 2:19 PM
lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
198 ↗(On Diff #33196)

This seems to work at least GCC 4.9:

struct XXX {
  static constexpr const char s[] = "??\n??:0\n";
  size_t f() {
    return sizeof(s);
  }
};

Updating according to last review. To completely remove termination logic from Addr2LinePool, I needed to introduce new AdjustCommandIfNeeded in SymbolizerProcess interface. Also, ReachedEndOfOutput is outline now to be able compute sizeof(output_terminator_).

Sorry if I'm being overly nitpicky, but I don't understand why you've moved this logic away from Addr2LinePool - I actually preferred when the subclass was responsible for providing correct command to SendCommand and parsing its result accordingly...

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
226 ↗(On Diff #33322)

This is wrong - how could you know if the buffer for command has sufficient size?

Sorry if I'm being overly nitpicky, but I don't understand why you've moved this logic away from Addr2LinePool - I actually preferred when the subclass was responsible for providing correct command to SendCommand and parsing its result accordingly...

Ok, I'm moving command modifying logic back to Addr2LinePool. Also, should I move TrimOutputBuffer logic to Addr2LinePool as well? I'm sorry, I'm a bit confused about this stuff.

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
226 ↗(On Diff #33322)

Hm, I increased command buffer size from 32 to 64, isn't that enough to hold offset + dummy address?

static const uptr kBufferSize = 64;

And the patch itself.

ygribov added inline comments.Aug 28 2015, 5:13 AM
lib/asan/scripts/asan_symbolize.py
171–172 ↗(On Diff #33409)

Or rather '??:0'?

Alexey, Yura, could you please comment this?

I still don't like the suggested approach. Having TrimOutputBuffer in the parent class is confusing - it's unclear when one needs to use it, and it's added for one specific implementation detail of addr2line. To make that more generic, you can make ReadFromSymbolizer virtual, and make Addr2LineProcess::ReadFromSymbolizer just call the implementation of the base class, and afterwards adjust it (search for output_terminator_ and put a terminating '\0').

lib/asan/scripts/asan_symbolize.py
171–172 ↗(On Diff #33409)

Please address

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
277 ↗(On Diff #33409)

While you're here, rename it to buffer, as it's a local variable.

Updating according to last Alexey's review. Does this look better now?

Yes, IMO it looks better. Only a few minor comments.

lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cc
209 ↗(On Diff #34349)

You shouldn't really do anything if ReadFromSymbolizer fails.

if (!ReadFromSymbolizer())
  return false;
//...
return true;
216 ↗(On Diff #34349)

You probably don't need a dedicated function for this now.

218 ↗(On Diff #34349)

"in case given offset is invalid"

Addressing last Alexey's comments.

samsonov accepted this revision.Sep 11 2015, 5:06 PM
samsonov edited edge metadata.

LGTM. Thank you for doing this, and sorry for the long review!

This revision is now accepted and ready to land.Sep 11 2015, 5:06 PM

Great! Alexey, Yura, thanks for your help!

I've rechecked the patch once again on ARM, AARCH64 and X86_64 targets and everything looks good.
Alexey, could you please land this?

This revision was automatically updated to reflect the committed changes.