This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] Avoid truncating Symbolizer output
ClosedPublic

Authored by paulkirth on May 27 2022, 5:03 PM.

Details

Summary

Repalce the fixed buffer in SymbolizerProcess with InternalScopedString,
and simply append to it when reading data.

Fixes #55460

Diff Detail

Event Timeline

paulkirth created this revision.May 27 2022, 5:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 5:03 PM
Herald added a subscriber: dberris. · View Herald Transcript
paulkirth requested review of this revision.May 27 2022, 5:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 5:03 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

@vitalybuka is this change congruent w/ what you were suggesting in D126102?

Thanks.
LGTM with few nits

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_internal.h
96

getBuff don't need to be virtual

96

please make getBuff -> GetBuff
make it protected and return Vector&

117

String is used to do formated append()

For this task InternalMmapVector<char> looks more convenient.

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
515–543

we don't need another buffer
I didn't compile, but this is idea:

bool SymbolizerProcess::ReadFromSymbolizer() {
  buffer_.clear();
  bool ret = true;
  do {
    uptr just_read = 0;
    constexpr uptr max_length = 256;
    uptr size_before = buffer_.size();
    buffer_.resize(size_before + max_length);
    bool success = ReadFromFile(input_fd_, buffer_.data(),
                                max_length, &just_read);

    if (!success)
      just_read = 0;
    buffer_.resize(size_before + just_read);

    // We can't read 0 bytes, as we don't expect external symbolizer to close
    // its stdout.
    if (just_read == 0) {
      Report("WARNING: Can't read from symbolizer at fd %d\n", input_fd_);
      ret = false;
      break;
    }
  }  while (!ReachedEndOfOutput(buffer_.data(), buffer_.length()));
  buffer_.append('\0');
  return ret;
}
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp
237

Let's make it nicer, so buffer size and trailing zero position match. Something like:

GetBuff().resize(garbage - start);
GetBuff().push_back('\0');

paulkirth updated this revision to Diff 434638.Jun 6 2022, 3:59 PM

Address comments.

Replace InternalScopedString w/ InternalMmapVector.
Use PageSize as the minimum size to read in, since we dont allocate on the
stack anymore.
Adopt suggested code changes when reading in data.

paulkirth marked 4 inline comments as done.Jun 6 2022, 4:02 PM
paulkirth added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
515–543

This was a good suggestion after moving to the vector.

paulkirth marked 2 inline comments as done.Jun 6 2022, 4:02 PM
leonardchan added inline comments.Jun 6 2022, 4:21 PM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
524

nit: I think you could just use and assign to ret here instead of having success

compiler-rt/test/sanitizer_common/TestCases/symbolize_stack.cpp
29

This might be a little extra work, but could you rework the checks such that we somewhat find all 10 frames in the recursive call to this function. IIRC, this test could unexpectedly pass if it just happen to have one symbolized frame that have 5 vectors in that frame. Since the behavior we expect with this fix is that we should have the correct symbols for each frame, I think a more verbose test would be 10 checks (one for each frame) and they have the appropriate number of vectors in them. They don't have to be the fully expanded symbol name, but just enough that we capture all the vectors, so they would look something like:

CHECK: {{#[0-9]+.*vector<.*vector.*<vector.* ... >.*>.*>}}  // Or however number of `vector`s are expected
vitalybuka added inline comments.Jun 6 2022, 5:09 PM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
518

GetPageSizeCached?
but better simple constant like 256, to avoid relying on Vector details
Vector will start with PageSize

You can do the following to use entire buffer:
buffer_.resize(size_before + max_length);
buffer_.resize(buffer_.capacity());

525
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp
244

why do you need reinterpret cast?
both are "const char*"
also resize needs number of elements, not address diff, so if char was 2 bytes (garbage - buff.data()) still stands, when reinterpret is incorrect.

paulkirth updated this revision to Diff 434857.Jun 7 2022, 9:55 AM

Address comments

  • use the original buffer size(1024) as a fixed size increment when increasing buffer size.
  • make full capacity availible when reading in data.
  • replace success with ret.
  • refactor pointer arithmetic.
paulkirth updated this revision to Diff 434944.Jun 7 2022, 2:01 PM

Update test condition per discussion with @leonardchan

  • match the beggining of the line, some number of vectors, and the file location
paulkirth marked an inline comment as done.Jun 7 2022, 2:02 PM
paulkirth marked 3 inline comments as done.
vitalybuka accepted this revision.Jun 7 2022, 2:47 PM
This revision is now accepted and ready to land.Jun 7 2022, 2:47 PM
leonardchan accepted this revision.Jun 7 2022, 2:48 PM
This revision was automatically updated to reflect the committed changes.