This is an archive of the discontinued LLVM Phabricator instance.

Start adding the supporting code to perform out-of-process allocator enumeration.
ClosedPublic

Authored by delcypher on Nov 1 2018, 6:16 AM.

Details

Summary

This patch introduces the local portion (LocalObjectView) of the
ObjectView abstraction and demonstrates its use by modifying the
secondary allocator for out-of-process enumeration.

The ObjectView abstraction is an interface that tries to abstract away
the differences performing in-process and out-of-process operations so
that in-process and out-of-process enumeration can share the same
implementation.

This patch refers to but does not add the RemoteObjectView
implementation of the ObjectView abstraction which will be added
in later patches.

This patch is based on https://reviews.llvm.org/D50330 which provides
a complete implementation for all allocators but has been abandoned
due to difficulty of reviewing the code.

rdar://problem/45284065

Diff Detail

Repository
rL LLVM

Event Timeline

delcypher created this revision.Nov 1 2018, 6:16 AM
Herald added a subscriber: Restricted Project. · View Herald TranscriptNov 1 2018, 6:16 AM
delcypher updated this revision to Diff 172114.Nov 1 2018, 6:22 AM

Drop now unused type_traits header.

kcc added a comment.Nov 1 2018, 11:59 AM

OMG.

I sympathize with your problem, but the solution makes the code unreadable and unmaintainable.
The point of having templates is to avoid code like

if (ObjectView::IsLocal()) {

Also, if the change requires replacing

auto t = chunks_[i];

with the code below I can not accept such a change.

Header **header_ptr_addr = &(chunks_[i]);
// Copy header pointer into this process
typename ObjectView::template SmallAllocTy<Header *> header_ptr_view(
    header_ptr_addr);
Header *t = *(header_ptr_view.GetLocalAddress());

Is there a reasonable way to replace it with this?

auto t = Load(&chunks_[i]);

If not, you may need to find completely different way, e.g. not changing the allocator code at all and instead doing the introspection in an independent part of code.
Whenever the allocator data layout changes, you will have to change that code too, but with proper tests you will at least know when you need to do it.

In D53975#1284314, @kcc wrote:

OMG.

I sympathize with your problem, but the solution makes the code unreadable and unmaintainable.
The point of having templates is to avoid code like

if (ObjectView::IsLocal()) {

That's not what the ObjectView (the template parameter) type does. The ObjectView abstraction let's its clients look at an object regardless of whether it exists in the current process or a remote process. The abstraction does not abstract away operations on the object.
It is the Object's job (in this case the allocator) to know how to perform operations on itself, whether that be in-process or out-of-process. This implies that the allocator has to handle performing operations on itself and this includes out-of-process operations. Trying to push
the knowledge of the peculiarities of a particular allocator into ObjectView type does not make sense from a software engineering perspective.

However there are things we could do to make this more aesthetically pleasing. For example we could make the EnsureSortedChunks() be a no-op when the allocator is instantiated with the ObjectView type for out-of-process enumeration.
Fundamentally however, the code path of in-process and out-of-process enumeration are similar but there are some divergences that have to be handled somewhere which is why you are seeing this ObjectView::IsLocal() guard.

Also, if the change requires replacing

auto t = chunks_[i];

with the code below I can not accept such a change.

Header **header_ptr_addr = &(chunks_[i]);
// Copy header pointer into this process
typename ObjectView::template SmallAllocTy<Header *> header_ptr_view(
    header_ptr_addr);
Header *t = *(header_ptr_view.GetLocalAddress());

Is there a reasonable way to replace it with this?

auto t = Load(&chunks_[i]);

Yes, but with a caveat. The type of t will not be Header* it will be LocalObjectView<Header*> for the in-process instantiation and RemoteObjectView<Header*, StackStorage> for the out-of-process.
The reason for this is that for the out-of-process case we need to manage the memory (allocate, copy into, and deallocate) for the object that is copied from a remote process. This is what the RemoteObjectView
object (not included in this patch) is doing.

However what we can do it provide an implicit conversion (like we do with llvm::cl::opt<int> today) so that can be treated as the type it represents (Header* in this case). Personally I'd prefer not to do the implicit conversion operation
because I prefer the programmer to be aware of the type they are working with (i.e. an ObjectView<Header*> and not a Header*) but I can see the appeal of the implicit conversion.

I'll make the change and let you take a look.

kcc added a comment.Nov 1 2018, 3:19 PM

have you considered implementing this fully outside of the sanitizer allocator?
It might end up being much cleaner.

delcypher updated this revision to Diff 172259.Nov 1 2018, 3:55 PM
  • Simplify the way we make new instances of ObjectView<ObjectTy> by introducing a MakeView(...) function
  • Add an implicit conversion operator to LocalObjectView<ObjectTy> so it can be implicitly converted to ObjectTy&. This allows us to simplify the code a bit.
In D53975#1284724, @kcc wrote:

have you considered implementing this fully outside of the sanitizer allocator?

Yes, but it would effectively be a downstream change which means longterm maintenance burden on us. We want to avoid that (the need for me or Dan to rush a fix whenever the allocator changes slightly upstream).

delcypher updated this revision to Diff 172268.Nov 1 2018, 4:24 PM

Unguard consistency checks by rewriting them by using ObjectView interface.

@kcc Please take another look at the patch. I have tried to address some of the issues you raised. I've removed one of the ObjectView::IsLocal() guards but the one around EnsureSortedChunks() remains.

There are various technical reasons why don't want to do this on the out-of-process path which I can explain if you really want. However could you explain something to me?

Why is EnsureSortedChunks() even called at all? I see nothing in the iteration code that would actually trigger a sort.
During the call to ForEachChunk(...) the allocator is also supposed to be locked so other threads won't be able to trigger a sort either. This suggests to me that the call to EnsureSortedChunks() is unnecessary, even on the in-process path. If we remove that then this ObjectView::IsLocal() guard issue just goes away.

delcypher updated this revision to Diff 172333.Nov 2 2018, 3:30 AM
  • Guard second check because it only holds when EnsureSortedChunks() has been called.
  • Minor tweaks.

Why is EnsureSortedChunks() even called at all? I see nothing in the iteration code that would actually trigger a sort.
During the call to ForEachChunk(...) the allocator is also supposed to be locked so other threads won't be able to trigger a sort either. This suggests to me that the call to EnsureSortedChunks() is unnecessary, even on the in-process path. If we remove that then this ObjectView::IsLocal() guard issue just goes away.

Okay I figured it out. It's the callback that could trigger a sort. The current callback implementations call things like GetUserBegin() which can eventually trigger a call in the allocator to GetBlockBeginFastLocked() which will try to sort the chunks.

@kcc

In D53975#1284724, @kcc wrote:

have you considered implementing this fully outside of the sanitizer allocator?

Yes, but it would effectively be a downstream change which means longterm maintenance burden on us. We want to avoid that (the need for me or Dan to rush a fix whenever the allocator changes slightly upstream).

Just to add this. The patch you looked at during LLVMDev was effectively "implementing this fully outside of the sanitizer allocator". That patch had out-of-process variants of the important allocator methods and declared them elsewhere so they were only compiled for Darwin. You rightly rejected that patch (because it duplicated a bunch of allocator code) and requested we look at unifying the implementations. This patch is my best attempt at unifying the implementations. I personally think the approach in this patch is much better and I'm glad you pushed us to investigate implementing out-of-process enumeration in this way.

So we've seen both approaches and so we need to decide which has the better trade off for maintainability.

For what it's worth this allocator is the only allocator where the issue of having use ObjectView::IsLocal() guards actually comes up. None of the other allocators try to modify themselves during chunk enumeration. So ObjectView::IsLocal() guards won't appear anywhere else.

delcypher updated this revision to Diff 172349.Nov 2 2018, 6:44 AM

Remove two const attributes that breaks the RemoteObjectView implementation (not up for review yet).

@kcc Any more thoughts on this?

kcc added a comment.Nov 5 2018, 6:09 PM

no new thoughts, sorry. The patch continues to horrify me.

Can you please be more specific about what exactly are you objecting against? We really want the code for the out-of-process enumeration to be part of compiler-rt, otherwise we'll be getting downstream failures forever. And in compiler-rt, the code belongs to the allocator.

We might be able to improve on this further, if you explain what specifically you don't like: The IsLocal checks are needed because the code modifies memory. We could change the function to have another bool argument that would specify whether modifications are allowed or not.

krytarowski added a subscriber: krytarowski.EditedNov 5 2018, 6:23 PM

Is this out of process only to enumarate? Is there an option to support this through debugging interfaces and analysis of a program? I mean to inspect asanitized program from an external debugger(-like) program.

Is this out of process only to enumarate? Is there an option to support this through debugging interfaces and analysis of a program? I mean to inspect asanitized program from an external debugger(-like) program.

The goal of this patch is to exactly allow an external debugger-like program to analyze the heap. There's existing macOS tools (heap, leaks, malloc_history) that use the interface that we're trying to implement.

Interesting, are these tools open-source? I would be interested to port them to NetBSD.

lib/sanitizer_common/sanitizer_allocator_secondary.h
284 ↗(On Diff #172349)

I think writing
auto t = ObjectView::MakeView(&chunks_[i])
is considerably more clear.
@kcc Would that be better with you?

We can also do an implicit constructor and write

ObjectView t = &chunks_[i]

290 ↗(On Diff #172349)

Cast seems to be the wrong operator here, why not * ?
Then we could write

CHECK_EQ(*t, *ObjectView::MakeView(&chunks_[i]))

delcypher added inline comments.Nov 6 2018, 1:41 PM
lib/sanitizer_common/sanitizer_allocator_secondary.h
290 ↗(On Diff #172349)

I'm basically just using static_cast<Header*>() to trigger to the conversion operator to ObjectTy&.

I could also provide a operator*(). I'm just not convinced it's clearer.

We could also provide operator==. So that the check reads as CHECK_EQ(t, ObjectView::MakeView(&chunks_[i])).

The problem there though is that template instantiation will fail if the ObjectTy in LocalObjectView doesn't have an == operator. We could use std::enable_if to conditionally declare the operator only if ObjectTy has an == operator. The trouble with that is because C++ doesn't have concepts yet, actually writing this would be extremely messy.

delcypher updated this revision to Diff 173054.Nov 7 2018, 3:20 PM

Simplify design even further at the expense of memory management.

delcypher updated this revision to Diff 173071.Nov 7 2018, 4:10 PM

Remove clang-format whitespace change.

They're not open source. The interface to them is at https://opensource.apple.com/source/Libc/Libc-498/include/malloc/malloc.h.

It looks like mallinfo from GNU or similar and it looks like closely tied to Darwin internals, so it does not seem to be reusable by others.

@kcc @kubamracek & I discussed this change some more and we've produced an even simpler patch at the expense being able to do good memory management in the out-of-process case.
Please take a look and let us know what you think.

kcc added a comment.Nov 8 2018, 6:36 PM

(sorry, I have been distracted).
The first impression is that this seems manageable. Thank you for bearing with me.

Maybe, as a bike-shedding exercise, try to find a better name for ObjectMap

They're not open source. The interface to them is at https://opensource.apple.com/source/Libc/Libc-498/include/malloc/malloc.h.

It looks like mallinfo from GNU or similar and it looks like closely tied to Darwin internals, so it does not seem to be reusable by others.

The complete implementation will be darwin specific. However the changes we will be making to the allocators themselves will be platform agnostic so it will be possible to build tools on top of this.

@kcc

Maybe, as a bike-shedding exercise, try to find a better name for ObjectMap

Given that the ObjectMap abstraction doesn't necessarily deal with single objects having Object in the name isn't very helpful.
All the abstraction now does it map pointers from one address space into another while transparently handling the copying.

Based on those observations here are some other name ideas

  • PointerMap
  • AddressSpaceView (possibly shortened to AddrSpaceView). This name suggests renaming the Map(...) method to Load(...).
  • VMView. This name suggests renaming the Map(...) method to Load(...).

Other suggestions are welcome.

kcc added a comment.Nov 9 2018, 12:38 PM

AddressSpaceView::Load sounds great.
You can go further and rename Load to operator(), then declare a member

AddressSpaceView Load;

such that the code would look like

Load(chunks[i])

but that's up to you.

Please add comments and tests.
Are you planing to add the rest of the code to this change, or commit it separately? (either works)

In D53975#1293500, @kcc wrote:

AddressSpaceView::Load sounds great.
You can go further and rename Load to operator(), then declare a member

AddressSpaceView Load;

such that the code would look like

Load(chunks[i])

but that's up to you.

If we make AddressSpaceView an instance member because it will change the data layout of the allocator (the out-of-process version of AddressSpaceView will not be zero sized, unlike the in-process version). This is something we cannot allow. However, we could make it a static member I suppose which won't affect data layout. I need to use this AddressSpaceView in other places outside of a class and I'm a little worried how it will look there. I'd rather stick with the Load method for now and apply your suggested transformation in later commits if it turns out to be necessary.

Please add comments and tests.
Are you planing to add the rest of the code to this change, or commit it separately? (either works)

There are many parts to this change which I'm going to have to land as separate commits. Here's my rough plan

  1. Put up a change for each allocator change required for ForEachChunk(). At this point we won't introduce the out-of-process version of AddressSpaceView and will rely on the existing allocator unit tests. If there are missing allocator unit tests I will add them. This might seem odd choice but the out-of-process version of AddressSpaceView simply isn't ready yet (the requirements have changed which my old implementation cannot meet). However I know what the allocator changes will need to be so I can have those sitting in the review pipeline while I implement the out-of-process version of AddressSpaceView.
  1. Start a review sketching a design for how to implement LSanChunkView and ASanChunkView out-of-process. This is another major part of the work because being able to do ForEachChunk out-of-process isn't useful on its own. We need a way to know what the state of each enumerated chunk is.
  1. Put up for review an out-of-process of AddressSpaceView. This will include unit tests to make sure the allocator can be instantiated with this other type and that enumeration still works.
  1. Put up for review the Darwin specific code that will use the new out-of-process interfaces. This will include a Darwin only "system test" that exercises all the newly introduced code.

I'm going to tweak this change a bit so that it is a change for the Secondary allocator. Separate changes will follow.

kcc added a comment.Nov 12 2018, 4:12 PM

I'm puzzled.
Why would you make AddressSpaceView non-zero-sized given that Load is a static member function?

I would like to see a test that uses a non-default implementation of AddressSpaceView.

In D53975#1296359, @kcc wrote:

I'm puzzled.
Why would you make AddressSpaceView non-zero-sized given that Load is a static member function?

Sorry. I was thinking of the old implementation and not the new implementation (I shouldn't reply when I'm really tired). Yes, you're right all implementations of AddressSpaceView will be zero-sized.

I would like to see a test that uses a non-default implementation of AddressSpaceView.

Yes I will be landing tests in two forms.

  • Unit tests that check on all platforms that it's possible to instantiate the allocator and use it with RemoteAddressSpaceView (these won't actually launch a separate process).
  • A lit test that that acts like one of Darwin's memory analysis tools by enumerating all the allocations of another process that is using ASan.
In D53975#1296359, @kcc wrote:

I'm puzzled.
Why would you make AddressSpaceView non-zero-sized given that Load is a static member function?

Sorry. I was thinking of the old implementation and not the new implementation (I shouldn't reply when I'm really tired). Yes, you're right all implementations of AddressSpaceView will be zero-sized.

Okay I just checked. Unfortunately your idea to have an instance of AddressSpaceView as a member and add operator() isn't going to work. It seems C++ does not allow zero sized classes. Even a class with zero members have to be at least 1 byte in size apparently.
This implies that adding AddressSpaceView as a member could change the data layout which is a deal breaker for us. So for now we'll have to do without the syntactic sugar for now. I'll upload a new patch soon.

Just in case you were curious here's an example that shows empty classes don't have zero size.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

using uptr = uintptr_t;

struct LocalAddressSpaceView {
  template <typename T>
  static T *Load(T *target_address, uptr num_elements = 1) {
    return target_address;
  }

  LocalAddressSpaceView() {
    static_assert(sizeof(LocalAddressSpaceView) == 1, "Must be zero sized");
  }
};

struct Foo {
  LocalAddressSpaceView a;
  LocalAddressSpaceView b;
};

int main() {
  Foo foo;
  printf("Foo: 0x%" PRIx64 "\n", reinterpret_cast<uintptr_t>(&foo));
  // Note &(foo.a) != &(foo.b);
  printf("Foo.a: 0x%" PRIx64 "\n", reinterpret_cast<uintptr_t>(&(foo.a)));
  printf("Foo.b: 0x%" PRIx64 "\n", reinterpret_cast<uintptr_t>(&(foo.b)));
  return 0;
}
delcypher updated this revision to Diff 173940.Nov 13 2018, 2:44 PM
  • s/ObjectView/AddressSpaceView/.
  • More comments.
  • Rewrite commit description.
  • Move LocalAddressSpaceView into its own header file for consumption by other parts of the code in later patches.

@kcc is this version of the patch good enough to land? As noted earlier.

  • We will land unit tests for each allocator once we land RemoteAddressSpaceView.
  • Changes to the other allocators will come as separate patches.
kcc accepted this revision.Nov 14 2018, 11:08 AM

LGTM
You may still replace AddressSpaceView::Load with Load by implementing a one-line function inside this class. Up to you.

This revision is now accepted and ready to land.Nov 14 2018, 11:08 AM
In D53975#1298813, @kcc wrote:

LGTM
You may still replace AddressSpaceView::Load with Load by implementing a one-line function inside this class. Up to you.

Thanks. I'd rather not do this right now. I'm worried that as I re-implement all my work using the new interface, that I will discover somewhere minor tweaks are required. Once all the out-of-process enumeration pieces have landed I'll be happy to a one-line function to make the syntax shorter.

This revision was automatically updated to reflect the committed changes.