This is an archive of the discontinued LLVM Phabricator instance.

Modify DataEncoder to be able to encode data in an object owned buffer.
ClosedPublic

Authored by clayborg on Dec 3 2021, 12:40 PM.

Details

Summary

DataEncoder was previously made to modify data within an existing buffer. As the code progressed, new clients started using DataEncoder to create binary data. In these cases the use of this class was possibly, but only if you knew exactly how large your buffer would be ahead of time. This patchs adds the ability for DataEncoder to own a buffer that can be dynamically resized as data is appended to the buffer.

Change in this patch:

  • Allow a DataEncoder object to be created that owns a DataBufferHeap object that can dynamically grow as data is appended
  • Add new methods that start with "Append" to append data to the buffer and grow it as needed
  • Adds full testing of the API to assure modifications don't regress any functionality
  • Has two constructors: one that uses caller owned data and one that creates an object with object owned data
  • "Append" methods only work if the object owns it own data
  • Removes the ability to specify a shared memory buffer as no one was using this functionality. This allows us to switch to a case where the object owns its own data in a DataBufferHeap that can be resized as data is added

"Put" methods work on both caller and object owned data.
"Append" methods work on only object owned data where we can grow the buffer. These methods will return false if called on a DataEncoder object that has caller owned data.

The main reason for these modifications is to be able to use the DateEncoder objects instead of llvm::gsym::FileWriter in https://reviews.llvm.org/D113789. This patch wants to add the ability to create symbol table caching to LLDB and the code needs to build binary caches and save them to disk.

Diff Detail

Event Timeline

clayborg created this revision.Dec 3 2021, 12:40 PM
clayborg requested review of this revision.Dec 3 2021, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2021, 12:40 PM
clayborg updated this revision to Diff 391768.Dec 3 2021, 3:59 PM

Update headerdoc comments and enable clang-format.

clayborg updated this revision to Diff 391772.Dec 3 2021, 4:16 PM

Fixed "DataEncoder::Append*" methods so they return false correctly when used on caller owned buffer objects.

labath added a comment.Dec 6 2021, 3:09 AM

It seems to me that the DataEncoder class is a lot more complicated than what is necessary for our existing use cases. All of them involve creating a new buffer (either empty, or with some pre-existing content), modifying it, and passing it on. A lot of the DataEncoder complexity (IsDynamic, UpdateMemberVariables, ...) stems from the fact that it wants to support writing into unowned buffers, functionality that we don't even use.

If we removed that, we could make this code a lot simpler. See inline comments for one way to achieve that.

lldb/include/lldb/Utility/DataEncoder.h
70–73

Take const void*data here and make a copy for internal use inside the constructor.

225–246

delete

262–263

delete, along with the member variables it updates.

lldb/source/Expression/DWARFExpression.cpp
462–463

delete, and create data encoder from m_data.GetDataStart() directly

It seems to me that the DataEncoder class is a lot more complicated than what is necessary for our existing use cases. All of them involve creating a new buffer (either empty, or with some pre-existing content), modifying it, and passing it on. A lot of the DataEncoder complexity (IsDynamic, UpdateMemberVariables, ...) stems from the fact that it wants to support writing into unowned buffers, functionality that we don't even use.

I disagree here. We weren't using, but now we are using this functionality. If you look at the ELF test that I modified, not that it is using the new heap based buffer. Even though it is very easy to calculate a buffer size in advance for this, we shouldn't have to.

Another reason I believe we need to dynamically grow the buffer is: if I am about the encode an entire symbol table into a memory buffer, how would I determine the size of the buffer I need in advance? I shouldn't have to worry about this and it would make the DataEncoder class hard to use because of this. We are going to want to encode many more things for caching using DataEncoder in the future and we shouldn't have to pre-determine how much memory is needed in advance, or allocate way too much just in case, just to be able to encode the data. How large will our symbol table name index tables be? That would be hard to determine in advance. Same with indexing of debug info, which is something that I want to add soon after.

We currently need DataEncoder for two cases:

  • fixing up data that isn't relocated in pre-determined buffers for variable addresses (DW_OP_addr from .o files in DWARF without dSYM cases for Apple)
  • creating a new stream of data where we don't know the size in advance

In the latter case, we might also want to first append a bunch of data, and then use the Put methods to do some fixups on data we dynamically appended to fixup offsets to data that was created later in the buffer.

So I strongly believe that we do need the functionality that I added to DataEncoder.

If we removed that, we could make this code a lot simpler. See inline comments for one way to achieve that.

Let me know if my comments above helped explain the APIs I added. I would like to avoid having to make two calls to a encode function to pre-determine the size of the buffers that are needed.

clayborg added inline comments.Dec 6 2021, 11:02 AM
lldb/source/Expression/DWARFExpression.cpp
462–463

That would crash the program. The data here is coming from the DWARF from a mmap'ed read only file. We can't modify the data, and thus this is why we make a copy in the first place: so we can modify the DWARF data and fixup the address info.

I don't think we actually disagree here. I'm aware of your use case (let's call it "dynamic mode") for appending to a buffer. I agree it's useful and I don't want to change that. What I want to remove is the other mode (non-dynamic). Presently, this is the only mode, but we're not actually making a good use of it. All of the existing use cases can be implemented with a "dynamic" buffer. And the code would be much simpler since there is only one mode to support -- one in which the data encoder owns the buffer it is writing to and can resize it at will.

lldb/source/Expression/DWARFExpression.cpp
462–463

With the current implementation yes. But this was meant to be done in conjuction with the changes to the DataEncoder constructor. The idea is that the constructor itself would copy the data into the owned buffer such that it can be freely modified or appended to. (In here we wouldn't make use of the append functionality, but that doesn't matter.)

I don't think we actually disagree here. I'm aware of your use case (let's call it "dynamic mode") for appending to a buffer. I agree it's useful and I don't want to change that. What I want to remove is the other mode (non-dynamic). Presently, this is the only mode, but we're not actually making a good use of it. All of the existing use cases can be implemented with a "dynamic" buffer. And the code would be much simpler since there is only one mode to support -- one in which the data encoder owns the buffer it is writing to and can resize it at will.

Ahh! Gotcha. That I can do. Thanks for the feedback, I will post an update shortly!

clayborg updated this revision to Diff 392177.Dec 6 2021, 2:09 PM

Updated to make DataEncoder always copy any data and own the buffer.

clayborg marked 3 inline comments as done.Dec 6 2021, 2:09 PM
clayborg updated this revision to Diff 392187.Dec 6 2021, 2:33 PM

Update the headerdoc and fixed some of the comments that were out of date or left over from the inital copy from DataExtractor.

clayborg updated this revision to Diff 392191.Dec 6 2021, 2:40 PM

More documentation fixed for DataEncoder.

labath accepted this revision.Dec 7 2021, 12:07 AM

This is what I hand in mind. Thanks.

This revision is now accepted and ready to land.Dec 7 2021, 12:07 AM

Fixed a buildbot issue for linux with:

commit cfe5d768be95ae0a62cf430e56e7762cebf81a65 (HEAD -> main, origin/main, origin/HEAD)
Author: Greg Clayton <gclayton@fb.com>
Date: Tue Dec 7 12:03:45 2021 -0800

Fix buildbot after https://reviews.llvm.org/D115073.

Another fix for buildbots. Don't use llvm::ArrayRef to refer to data, store in std::vector<>. In release builds this will fail.

commit 220854a47bdc0c281dbaafb54411ec65e76212b3 (HEAD -> main, origin/main, origin/HEAD)
Author: Greg Clayton <gclayton@fb.com>
Date: Tue Dec 7 12:19:00 2021 -0800

Fix buildbots after https://reviews.llvm.org/D115073.