Page MenuHomePhabricator

Add FileWriter to GSYM and encode/decode functions to AddressRange and AddressRanges
ClosedPublic

Authored by clayborg on Jun 26 2019, 8:58 AM.

Details

Summary

The full GSYM patch started with: https://reviews.llvm.org/D53379

This patch add the ability to encode data using the new llvm::gsym::FileWriter class.

FileWriter is a simplified binary data writer class that doesn't require targets, target definitions, architectures, or require any other optional compile time libraries to be enabled via the build process. This class needs the ability to seek to different spots in the binary data that it produces to fix up offsets and sizes in GSYM data. It currently uses std::ostream over llvm::raw_ostream because llvm::raw_ostream doesn't support seeking which is required when encoding and decoding GSYM data.

AddressRange objects are encoded and decoded to be relative to a base address. This will be the FunctionInfo's start address if the AddressRange is directly contained in a FunctionInfo, or a base address of the containing parent AddressRange or AddressRanges. This allows address ranges to be efficiently encoded using ULEB128 encodings as we encode the offset and size of each range instead of full addresses. This also makes encoded addresses easy to relocate as we just need to relocate one base address.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aprantl added inline comments.Jun 26 2019, 4:57 PM
include/llvm/DebugInfo/GSYM/FileWriter.h
34 ↗(On Diff #206738)

This whole interface needs Doxygen comments.

35 ↗(On Diff #206738)

What are the bool return values indicating? If it's error handling — is there a better way of doing this?

lib/DebugInfo/GSYM/FileWriter.cpp
12 ↗(On Diff #206738)

#include <cassert>

clayborg marked 3 inline comments as done.Jun 27 2019, 9:59 AM
clayborg added inline comments.
include/llvm/DebugInfo/GSYM/FileWriter.h
34 ↗(On Diff #206738)

Agreed, but I am not going to do it before we agree on the API.

I just didn't know if this class was going to be thrown out in favor of the MC layer. I couldn't figure out how to use it for raw files so I made this class, but I figured someone with more llvm expertise might let me know how to do it or how to do it another way.

I figured FileWriter would need changes so I just want to iterate on this and get to the "lgtm once we have doxygen comments", then I will add full docs. Is that ok?

35 ↗(On Diff #206738)

bool for success. We can remove this and make it void, or match what the MC layer does. Do we want to just switch all return values to "void" an dhave the user periodically check in this the FileWriter class and ask if it is good? Error checking on each write seems like overkill, so I tried to keep it simple with a bool for now. Let me know what you think.

43 ↗(On Diff #206738)

I can just switch this to StringRef and have the user be on the hook for not passing in a StringRef with embedded NULL characters. We can always write the entire string and always add a NULL. Is that ok?

clayborg updated this revision to Diff 206886.Jun 27 2019, 10:00 AM

Switch FileWriter::writeCStr to use llvm::StringRef. Addressed other review comments.

clayborg marked an inline comment as done.Jun 27 2019, 10:02 AM

JF is currently working on making BitStream error-safe over in https://reviews.llvm.org/D63518. Perhaps there are some patterns that are worth copying.

Most of my concerns have already been noted by other reviewers, in particular with regards to error handling. Just a small nit inline.

lib/DebugInfo/GSYM/FileWriter.cpp
39 ↗(On Diff #206886)

You're already using static_cast etc, so let's use reinterpret_cast here? Applies to the rest of the review as well.

JF is currently working on making BitStream error-safe over in https://reviews.llvm.org/D63518. Perhaps there are some patterns that are worth copying.

Any pattern in particular? I would rather not have to try and take an error after every write. The ASMPrinter returns void. Do we even need a return value? We could store any errors internally in a contained lldb::Error that must be checked before the object is destructed?

clayborg updated this revision to Diff 206912.Jun 27 2019, 1:12 PM

Use reinterpret_cast instead of C casts.

The important part is that the error must be checked (which isn't enforceable through bool returns). Whether it's after each function call or later, you probably have a better idea about.

clayborg updated this revision to Diff 206952.Jun 27 2019, 3:22 PM

Added llvm::Error as a member variable of FileWriter. Each method that does anything with the stream will check if there is an error and return if the FileWriter has encountered an error. Added:

llvm::Error FileWriter::takeError();

To allow clients to check errors after a series of method calls on FileWriter objects. The error must be checked or taken before the FileWriter object destructs.

clayborg updated this revision to Diff 207104.Jun 28 2019, 10:57 AM

Update this patch to compile after r364634.

This all LGTM now. Please wait long enough for others to get their say but overall I'm happy now.

include/llvm/DebugInfo/GSYM/FileWriter.h
67 ↗(On Diff #207104)

Personal preference, can be this be a conversion to bool instead?

aprantl requested changes to this revision.Jun 28 2019, 4:32 PM

We need to come up with a better error handling strategy before this can be landed.

lib/DebugInfo/GSYM/FileWriter.cpp
68 ↗(On Diff #207104)

I'm sure this can be improved. Currently this function checks for errors, but then silently discards them and returns void.

This revision now requires changes to proceed.Jun 28 2019, 4:32 PM
aprantl added a reviewer: vsk.Jun 28 2019, 4:32 PM

It currently uses std::ostream over llvm::raw_ostream because llvm::raw_ostream doesn't support seeking which is required when encoding and decoding GSYM data.

There is raw_fd_ostream::seek. You just need to make sure raw_fd_ostream::supportsSeeking() returns true when you open it.

#include <iostream> is Forbidden.

lib/DebugInfo/GSYM/FileWriter.cpp
95 ↗(On Diff #207104)

I think calling hasError() in every write function is less ideal. Errors can be checked after a batch write in the call sites.

97 ↗(On Diff #207104)

Write on empty data just works. No need to check emptiness.

131 ↗(On Diff #207104)

If Align is a power of 2, you can use (Offset+Align-1) & -Align.

136 ↗(On Diff #207104)

If you use raw_fd_ostream, there is raw_ostream::write_zeros.

MaskRay requested changes to this revision.Jun 28 2019, 6:39 PM
MaskRay added a reviewer: MaskRay.
MaskRay removed a subscriber: MaskRay.
clayborg marked 3 inline comments as done.Jul 1 2019, 7:15 AM
clayborg added inline comments.
lib/DebugInfo/GSYM/FileWriter.cpp
95 ↗(On Diff #207104)

So the question really becomes do we want the first error to stick, or just let the stream get into a bad state and then report some random error later. Error checking is quite cheap (a pointer check) so I don't think the overhead is too much. I would much rather know an accurate first error over just getting some "failed to write" error later when something else caused the error in the first place. Thoughts?

97 ↗(On Diff #207104)

The main worry was passing nullptr for the data.

131 ↗(On Diff #207104)

It isn't required to be a power of 2 currently.

It currently uses std::ostream over llvm::raw_ostream because llvm::raw_ostream doesn't support seeking which is required when encoding and decoding GSYM data.

There is raw_fd_ostream::seek. You just need to make sure raw_fd_ostream::supportsSeeking() returns true when you open it.

The main issue is being able to write to a memory buffer so we don't need to create a temp file on disk when creating sections or doing unit tests. The only raw_ostream that supports seeking is the fd class, none of the other do and the docs for raw_ostream state that seek isn't supported. What we would need is a few virtual methods on raw_ostream so that any raw_ostream can support seeking, but I am not sure people would be thrilled by this. For unit tests I was wanting to use raw_string_ostream, but again, no seek is available on the raw_ostream base class.

#include <iostream> is Forbidden.

It is forbidden in header files only.

clayborg updated this revision to Diff 207303.Jul 1 2019, 7:30 AM

Don't #include <iostream>

So what is the verdict on FileWriter? Do we want real error handling like is currently coded, or just be able to check sometime later and find out that something went wrong? If we just want to check that something is wrong, we don't need to maintain a llvm::Error, we can just check the stream at any point and return a generic error that means nothing. If this is what we do, I am not sure I see the point of adding the error handling. The current solution calls hasError() which checks if the llvm::Error contains a ErrorInfoBase pointer so the call is pretty efficient. And the first thing to cause the error will stay in the error object.

One vote against going with raw_ostream based streams is there is no error handling in the base class. Also no seek support in the base class. raw_fd_stream has std::error_code type handling, and then requires us to use only file descriptor based streams (no memory based streams for unit tests or quickly making GSYMs in memory).

Comments?

MaskRay added a comment.EditedJul 2 2019, 8:05 PM

So what is the verdict on FileWriter? Do we want real error handling like is currently coded, or just be able to check sometime later and find out that something went wrong? If we just want to check that something is wrong, we don't need to maintain a llvm::Error, we can just check the stream at any point and return a generic error that means nothing. If this is what we do, I am not sure I see the point of adding the error handling. The current solution calls hasError() which checks if the llvm::Error contains a ErrorInfoBase pointer so the call is pretty efficient. And the first thing to cause the error will stay in the error object. Without the error checking, FileWriter seems to just forward IO operations to the underlying std::ostream. I wonder if it would be easier to review the other components of GSYM by inlining those FileWriter use sites.

Others may have different opinions but my feeling is that FileWriter does lots of duplicate work that has been done in raw_ostream. It tries to catch errors but I don't see how errors are handled. It may become clearer when other components of GSYM are checked in. Before that, without concrete use cases, it may be difficult to suggest what FileWriter should do.

One vote against going with raw_ostream based streams is there is no error handling in the base class. Also no seek support in the base class. raw_fd_stream has std::error_code type handling, and then requires us to use only file descriptor based streams (no memory based streams for unit tests or quickly making GSYMs in memory).

Comments?

So the problem is that raw_ostream doesn't support seek. raw_fd_ostream supports seek but a fd-based stream is not desired for GSYM. I think we can ask people who have worked on raw_ostream what is the best way going forward.

If it cannot be decidedly shortly, I wonder if it will be easier to move forward with GSYM by inlining the FileWriter calls into other components. You can add the abstraction layer back later when the exceptations of the error checking become clearer after more components are checked in.

So what is the verdict on FileWriter? Do we want real error handling like is currently coded, or just be able to check sometime later and find out that something went wrong? If we just want to check that something is wrong, we don't need to maintain a llvm::Error, we can just check the stream at any point and return a generic error that means nothing. If this is what we do, I am not sure I see the point of adding the error handling. The current solution calls hasError() which checks if the llvm::Error contains a ErrorInfoBase pointer so the call is pretty efficient. And the first thing to cause the error will stay in the error object. Without the error checking, FileWriter seems to just forward IO operations to the underlying std::ostream. I wonder if it would be easier to review the other components of GSYM by inlining those FileWriter use sites.

Others may have different opinions but my feeling is that FileWriter does lots of duplicate work that has been done in raw_ostream. It tries to catch errors but I don't see how errors are handled. It may become clearer when other components of GSYM are checked in. Before that, without concrete use cases, it may be difficult to suggest what FileWriter should do.

We have existing cases right now. We are writing binary data in either byte order to a stream. Not sure what more we need here? If we end up writing to a an ASMPrinter later, FileWriter will be the abstraction layer so that GSYM code doesn't need to know the details of wether we are writing to a section in a file, or just a blob of data for a stand alone file.

One vote against going with raw_ostream based streams is there is no error handling in the base class. Also no seek support in the base class. raw_fd_stream has std::error_code type handling, and then requires us to use only file descriptor based streams (no memory based streams for unit tests or quickly making GSYMs in memory).

Comments?

So the problem is that raw_ostream doesn't support seek. raw_fd_ostream supports seek but a fd-based stream is not desired for GSYM. I think we can ask people who have worked on raw_ostream what is the best way going forward.

If it cannot be decidedly shortly, I wonder if it will be easier to move forward with GSYM by inlining the FileWriter calls into other components. You can add the abstraction layer back later when the exceptations of the error checking become clearer after more components are checked in.

Not sure what would get inlined. We need a way to write integers of varying sizes and byte orders into some data. raw_ostream doesn't support this at all right? What would get inlined? We need a way to encode and decode the objects into a stream and we need to be able to do this to a raw buffer and later to a ASMPrinter. I would rather keep FileWriter here as that layer. std::ostream is working fine for now, and raw_ostream seems to have been made really simple on purpose and doesn't seem to be the right layer for this as raw_ostream is made for logging text.

I am happy to do what is needed. Anyone have any comments on my last comments? We just need a solid direction that everyone thinks we should follow.

Ping for comments

I need some feedback here. Is the general consensus that we need to modify raw_ostream to support seeking or continue with the current FileWriter?

I think teaching more raw_ostreams about seeking is probably right. I started to propose a solution but it boiled down to make our own abstractions over existing ostreams and what amounts to raw_string_ostream. I think raw_string_ostream and raw_os_ostream should both support seeking

Made a patch to support seeking in llvm::raw_ostream: https://reviews.llvm.org/D65096

clayborg updated this revision to Diff 211190.Jul 22 2019, 2:36 PM

Changes:

  • switch FileWriter over to use llvm::raw_pwrite_stream to avoid needing to use std::ostream
  • Remove error handlings from FileWrite since neither llvm::raw_pwrite_stream nor llvm::raw_ostream has any
  • Remove the write unsigned integer function that took a byte size to simplify logic
  • Use SmallString +raw_svector_ostream in places that used to use std::ostringstream
clayborg updated this revision to Diff 211212.Jul 22 2019, 3:17 PM
  • Remove comments that mention errors in GSYMTest.cpp
  • Remove header includes that were no longer needed

Friendly ping. Would love to get GSYM moving if possible.

Hi Greg, sorry about the delay. I have a few small nits about documentation and one question about error handling in the decode method, but otherwise this looks good to me.

include/llvm/DebugInfo/GSYM/FileWriter.h
25 ↗(On Diff #211212)

It seems like now the FileWriter is mostly concerned with the byte order. I'd update this comment saying that it's a wrapper around raw_pwrite_stream with convenience methods to deal with endianness.

include/llvm/DebugInfo/GSYM/Range.h
60 ↗(On Diff #211212)

Can you add an @{ and @} around this to group the comment? (http://www.doxygen.nl/manual/grouping.html)

97 ↗(On Diff #211212)

Can you add an @{ and @} around this to group the comment?

lib/DebugInfo/GSYM/Range.cpp
71 ↗(On Diff #211212)

Should we check that the new offsets are valid?

clayborg marked 2 inline comments as done.Tue, Jul 30, 10:39 AM
clayborg added inline comments.
include/llvm/DebugInfo/GSYM/FileWriter.h
25 ↗(On Diff #211212)

Eventually this class will probably encapsulate a ASMPrinter when we need to write the file to a section. So I am not sure how much was want to change this description as it still fits. raw_pwrite_stream is what we are using right now, but it might not be in the future. So I believe this comment still is true. What do you think?

lib/DebugInfo/GSYM/Range.cpp
71 ↗(On Diff #211212)

There in an assertion in DataExtractor::getULEB128() that will validate the offset is valid when assertions are enable, though I am not a fan of that since it can crash the program if you have a truncated GSYM file.

It is hard to pre-validate if there is enough data for a ULEB128 number in the stream at offset "Offset" since it can be one byte, or many bytes, and without parsing the data first, we can't know the length it will need and it seems like DataExtractor::getULEB128 should handle this gracefully instead of crashing with an assert, or worse yet, just checking that the initial offset is valid, an happily walk off the end of the array like it does.

The DataExtractor::getULEB128 looks like:

uint64_t DataExtractor::getULEB128(uint32_t *offset_ptr) const {
  assert(*offset_ptr <= Data.size());

  const char *error;
  unsigned bytes_read;
  uint64_t result = decodeULEB128(
      reinterpret_cast<const uint8_t *>(Data.data() + *offset_ptr), &bytes_read,
      reinterpret_cast<const uint8_t *>(Data.data() + Data.size()), &error);
  if (error)
    return 0;
  *offset_ptr += bytes_read;
  return result;
}

I would rather see no assert:

uint64_t DataExtractor::getULEB128(uint32_t *offset_ptr) const {
  if (*offset_ptr >= Data.size());
    return 0;

We could check if "Offset" is a valid offset before each call to Data.getULEB128(). Thoughts?

clayborg updated this revision to Diff 212380.Tue, Jul 30, 10:41 AM

Added and around grouped comments as suggested. Waiting for responses on previous comments about changing FileWriter description and calling DataExtractor::getULEB128().

clayborg marked 2 inline comments as done.Tue, Jul 30, 10:42 AM

anyone have any time to look at this again? Would love to get things moving.

friendly ping

Can anyone take a look please? Another week, no comments.

aprantl added inline comments.Mon, Aug 19, 8:59 AM
include/llvm/DebugInfo/GSYM/FileWriter.h
34 ↗(On Diff #206738)

Looks like the comments are still missing.

43 ↗(On Diff #206738)

Ideally, then the method should be called writeNullTerminated or something to that end.

clayborg updated this revision to Diff 215939.Mon, Aug 19, 10:08 AM
  • added full FileWriter doxygen comments
  • renamed writeCStr to writeNullTerminated
  • update all DataExtractor offsets to be uint64_t to match recent DataExtractor changes
aprantl added inline comments.Mon, Aug 19, 12:26 PM
include/llvm/DebugInfo/GSYM/FileWriter.h
37 ↗(On Diff #215939)

/// Write a ...

40 ↗(On Diff #215939)

/// \param Value

88 ↗(On Diff #215939)

additional?

42 ↗(On Diff #206692)

ditto...

clayborg updated this revision to Diff 215973.Mon, Aug 19, 1:27 PM
clayborg marked an inline comment as done.
  • remove function name from doxygen comments
  • change @param to \param
  • fix typos
clayborg marked 4 inline comments as done.Mon, Aug 19, 1:27 PM
clayborg added inline comments.
include/llvm/DebugInfo/GSYM/FileWriter.h
40 ↗(On Diff #215939)

Yikes, there is a large mix of both @param and \param in the code...

Let's ask the underlying question: Which of the above use-cases does a GSYM writer need, and why isn't the StringRef case sufficient?

Couldn't tell if I needed to still comment on this, but here goes. The function:

void FileWriter::writeData(llvm::ArrayRef<uint8_t> Data)

will write exactly the bytes that are in Data. The function:

void FileWriter::writeNullTerminated(llvm::StringRef Str);

Will write out a the contents of the Str as a NULL terminated C string where the NULL termination doesn't need to be part of the string. Using StringRef for data means we usually have to use static cast:

writer.writeNullTerminated(StringRef(static_cast<const char *>(...), ...);

So GSYM uses the writeNullTerminated for standard C strings in the string table which include the NULL termination, and uses writeData for raw data when writing out the swapped values.

aprantl accepted this revision.Mon, Aug 19, 3:24 PM

All of my comments have been addressed.

aprantl added inline comments.Mon, Aug 19, 3:26 PM
include/llvm/DebugInfo/GSYM/FileWriter.h
40 ↗(On Diff #215939)

\param is the preferred spelling in LLVM code (I thought I had replaced all remaining @'s with backslashes a while ago).

MaskRay accepted this revision.Wed, Aug 21, 7:12 AM
MaskRay added inline comments.
include/llvm/DebugInfo/GSYM/FileWriter.h
5 ↗(On Diff #215973)

The license notice is wrong.

lib/DebugInfo/GSYM/FileWriter.cpp
6 ↗(On Diff #215973)

The licence notice is wrong.

This revision is now accepted and ready to land.Wed, Aug 21, 7:12 AM
MaskRay added inline comments.Wed, Aug 21, 9:19 AM
include/llvm/DebugInfo/GSYM/FileWriter.h
18 ↗(On Diff #215973)

sys/types.h ?

This revision was automatically updated to reflect the committed changes.