Page MenuHomePhabricator

Delete DataBufferMemoryMap

Authored by zturner on Feb 16 2017, 2:29 PM.



This depends on D30010 going in first, but assuming that's successful, this patch updates LLDB to use LLVM's memory mapping instead of DataBufferMemoryMap. Since this also makes DataBufferMemoryMap obsolete, I went ahead and nuked it from orbit while I was at it.

Diff Detail


Event Timeline

zturner created this revision.Feb 16 2017, 2:29 PM
labath edited edge metadata.Feb 22 2017, 6:50 AM

I think you are still waiting to get the llvm changes sorted out, but this side of it looks fine to me (modulo a couple of nits).

414 ↗(On Diff #88786)

It looks like this make-a-MemoryBuffer-and-shove-it-in-a-DataBufferLLVM functionality could be handled by a utility function, as it is used in a number of places.

54 ↗(On Diff #88786)

It looks like this variable is not necessary anymore.

zturner updated this revision to Diff 89453.Feb 22 2017, 5:43 PM

Updated with suggestions from labath@

I've tried this out on linux. I got it working only after making the following modifications:

15 ↗(On Diff #89453)

add #include "llvm/Support/FileSystem.h"

38 ↗(On Diff #89453)

Change this to uint64_t Size so that passing -1 will correctly mean "auto-detect the file size".

43 ↗(On Diff #89453)

This makes pretty much everything fail. Most of the code base has a reference to a non-const DataBuffer, which then just segfaults after calling this. I think you'll have to return a const_cast of the buffer here for now.

zturner added inline comments.Feb 23 2017, 11:12 AM
43 ↗(On Diff #89453)

That's too bad, although I guess this problem will go away in the future if I can replace DataBuffer with either llvm::MemoryBuffer or llvm::BinaryStream.

zturner updated this revision to Diff 89542.Feb 23 2017, 11:53 AM
clayborg requested changes to this revision.Feb 23 2017, 12:17 PM

In general code around crashes, we don't want to introduce any crashes in LLDB (no llvm_unreachable, asserts ok if code still functions without them). See inlined comments.

28 ↗(On Diff #89453)

Should this suggest "DataBufferLLVM" instead of directly using "llvm::MemoryBuffer::getFile()"?

22 ↗(On Diff #89453)

Someone can construct this with an empty unique pointer. We will then crash if someone calls GetBytes().

24–25 ↗(On Diff #89453)

If you move this to DataBufferLLVM.cpp you can avoid including "llvm/Support/MemoryBuffer.h" and just include "ClangForward.h" and make sure "llvm::MemoryBuffer is forward declared. If you do, you will need to declare the destructor and have the implementation in the .cpp file as well.

42 ↗(On Diff #89453)

We could also split this into two calls:

uint8_t *GetWriteableBytes()
  if (Buffer->isWriteable())
    return reinterpret_cast<const uint8_t *>(Buffer->getBufferStart());
    return nullptr;

const uint8_t *GetReadableBytes() const
    return reinterpret_cast<const uint8_t *>(Buffer->getBufferStart());

This would then allow clients to not get the non const version by the sheer fact that their DataBufferLLVM is not const.

43 ↗(On Diff #89453)

So one key benefit of having people share used of a DataBufferSP is that they can all reference the same data and the original buffer is ref counted by that shared pointer. You might do:

DataBufferSP data_sp = (load data for entire file);
DataExtractor extractor(data_sp, ....); 
// Above extractor now has shared reference to data_sp
DataExtractor extractor2(extractor, 0x1000, 0x2000);
// Above extractor2 now has a shared reference to data_sp as well, but it only points
// at a subset of the data starting at offset 0x1000, and being 0x2000 bytes in size
return extractor2;

Note that if we return extractor2, it has a strong reference to the data and will keep the data alive as "data_sp" and "extractor" will be destroyed, the backing data won't. Any future switch will need to keep this in mind and can't be making copies of the data in the process. This happens quite a bit in LLDB as you mmap the entire contents of a universal file that contains N mach-o files, and might hand out a DataExtractor for the big endian PowerPC slice, and one for the little endian x86_64 slice. This is the primary reason for DataExtractor being able to have a shared reference to the backing data. Keep that in mind when trying to replace with something else.

43 ↗(On Diff #89453)

Also please don't use llvm_unreachable since that can crash the debugger. You can put in an assert, but the code must function without the assert without crashing.

48 ↗(On Diff #89453)

This will crash if someone constructs with empty unique_ptr. Please don't put an assert or llvm_unreachable in either. Please test the pointer. Move to .cpp if you end up moving CreateFromPath and not including "llvm/Support/MemoryBuffer.h"

52 ↗(On Diff #89453)

Test Buffer before using it and move to .cpp if you end up moving CreateFromPath and end up not including "llvm/Support/MemoryBuffer.h"

61 ↗(On Diff #89453)

Add newline

40 ↗(On Diff #89453)

Remove, This is being included by DataBufferLLVM.h and no local uses of MemoryBuffer are in this file.

36 ↗(On Diff #89453)

Remove, This is being included by DataBufferLLVM.h and no local uses of MemoryBuffer are in this file.

30 ↗(On Diff #89453)

Why is this needed?

This revision now requires changes to proceed.Feb 23 2017, 12:17 PM
zturner updated this revision to Diff 89547.Feb 23 2017, 12:39 PM
zturner edited edge metadata.

Updated with suggestions from clayborg@. It seems wasteful to me check the pointer on every single call to read when we can check it once on creation. So I've added an assert in the constructor and documented with doxygen comments that the class should not be constructed with a non-null MemoryBuffer. i.e. it's the caller's responsibility to ensure the pointer is not null.

Note that when creating one from a path using the supplied static functions, this is not an issue since it will never create one with a null pointer anyway.

I would prefer to avoid superfluous null pointer checks, so hopefully this is sufficient.

I would still vote to check Buffer for NULL. GetByteSize() and GetBytes() are usually accessed one time so there won't be a performance issue. If anyone wanted to actually use a DataBufferLLVM as a member variable, they would need to use a std::unique_ptr to one with the current designed because it can't be default constructed.

20–21 ↗(On Diff #89547)

This doesn't stop things from crashing if for some reason someone does do this and asserts are off. The DataBuffer classes are there for keeping the data alive. The main class that uses the DataBuffer classes is DataExtractor and that grabs the data pointer on time and stores it into the DataExtractor class. So I vote for letting this be empty and checking the unique_ptr as GetBytes is usually accessed one time.

52 ↗(On Diff #89547)

I would still vote to check Buffer for NULL. As I said above, GetByteSize() and GetBytes() are usually accessed one time.

56 ↗(On Diff #89547)

I would still vote to check Buffer for NULL. As I said above, GetByteSize() and GetBytes() are usually accessed one time.

The main concern I have with adding null checks is that I think it actually *increases* your risk of crashing. All of a sudden you've introduced an entirely new branch / code-path that is completely untested. Worse, it only occurs in a situation where you've explicitly told the user not to do something, and they accidentally did it anyway.

As a general rule, you just can't protect against that, and you shouldn't try to because there are an infinite number of ways it could occur in practice. It's the halting problem. If you tell the user of an API to check a pointer for null before using the API, then you can make a ton of assumptions inside the API that greatly simplify things and increase test coverage. Once you start propagating this kind of thing up through the API, tons of error checking and error handling code (which is all likely untested, and nobody understands how it can ever happen or if it's even possible to happen) just goes away because there is just no way for those errors to happen anymore. (In fact, in many cases maybe the errors already can't happen, but there is so much propagating of null checks through various levels of the API that you can't really reason about whether it does or doesn't).

If an API documents that it never returns NULL, for example, then you don't need to check the return value for null. Just like if an API documents that it doesn't accept null, then you do need to check the parameter you pass in.

BTW, not sure if this interests you, but C++ people have been working on the C++ Core Guidelines, some of which will eventually make it into the standard. One of those things is a class called not_null<T> (link here) which restricts a pointer to be non null. That would solve a lot of problems like this at the type system level, and the conversion will be much easier / less painful / less churn if we already know which pointers we can make assumptions about.

I am not sure if this is a voting situation, but I agree with what Zachary said above.

Since we're already speaking about tests, it looks like the new DataBufferLLVM class could use a unit test or two, just so we get in the habit of writing those.

This revision was automatically updated to reflect the committed changes.