This is an archive of the discontinued LLVM Phabricator instance.

Introduce `RemoteAddressSpaceView` and `VMReadContext`.
Needs RevisionPublic

Authored by delcypher on Jan 2 2019, 10:35 AM.

Details

Summary

RemoteAddressSpaceView is the remote memory counter part to
LocalAddressSpaceView that implements the AddressSpaceView
interface.

The RemoteAddressSpaceView implementation is a simple wrapper around a
VMReadContext object. A VMReadContext object implements the platform
specific code to

  • Read memory from another process.
  • Cache those reads and manage the cache lifetime.
  • Manage the writability of the cached memory.

Two implementations are provided:

  • A Darwin only implementation that allows in-process and out-of-process

reads. The reads are cached by a Darwin specific framework
(CoreSymbolication) and thus the VMReadContext implementation does not
implement a cache.

  • A non-Darwin implementation that only allows in-process reads

(basically a no-op). This exists only for testing purposes.

Tests for this functionality will be added in later patches.

rdar://problem/45284065

Event Timeline

delcypher created this revision.Jan 2 2019, 10:35 AM
Herald added subscribers: Restricted Project, mgorny, kubamracek. · View Herald TranscriptJan 2 2019, 10:35 AM
delcypher updated this revision to Diff 179889.Jan 2 2019, 10:40 AM

Fix some comments.

Would it make sense to also extend this to Linux using process_vm_readv/process_vm_writev?

Would it make sense to also extend this to Linux using process_vm_readv/process_vm_writev?

Perhaps, but certainly not in this patch because I consider that out of scope. On Darwin we already have tools that will be using the new functionality, on Linux the tools don't exist yet. So implementing VMReadContext for Darwin is much more useful at this time.
The interface should be generic enough that we could add Linux support in the future.

vitalybuka added inline comments.Jan 8 2019, 4:10 PM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
28

Why all these stuff needs to be static?

delcypher added inline comments.Jan 9 2019, 1:22 AM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
28

@vitalybuka Because the allocators don't receive an instance of RemoteAddressSpaceView, they receive the type as a template parameter (e.g. SizeClassAllocator64). This design choice means that everything in an implementation of the AddressSpaceView interface (i.e. LocalAddressSpaceView and RemoteAddressSpaceView) needs to be static.

To implement RemoteAddressSpaceView we have to store some state. All of this state and the read operations are encapsulated in the VMReadContext object. We cannot store any of this state inside the allocators because that could change the data-layout.

This design is not thread-safe of course but for allocator enumeration on Darwin will only ever be doing that in a single thread anyway.

vitalybuka added inline comments.Jan 9 2019, 12:07 PM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
28

Can you add some CHECKs validate this single thread assumption?

lib/sanitizer_common/sanitizer_vm_read_context.h
28

could you please init these inline?

vitalybuka added inline comments.Jan 9 2019, 1:11 PM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
26

struct -> class

69

maybe nested: ScopedRemoteAddressSpaceViewContext -> RemoteAddressSpaceView::ScopedContest
and make RemoteAddressSpaceView::set_context private

69

class and
private:

VMReadContext vmrc_;
lib/sanitizer_common/sanitizer_vm_read_context.h
25

Just define "typedef int ProcessHandle;" and remove process_vm_read_handle_t
On other platforms it does not matter, and we will add more specific type if needed later.

32

historically sanitizer common uses Google C++ naming convention so members need to have _ suffux.

51

Please remove all stuff you are not using in upcoming patches.
When can extend API when needed

74

If you can calculate it from target_process please remove member and simplify state

94

As I understand we do not expect a lot of context callers
so let remove default "writable = true" and make calls more verbose

lib/sanitizer_common/sanitizer_vm_read_context_common.cc
20 ↗(On Diff #179889)

Remove it if you don't use it.
I'd expected ScopedContext is the why to reset it.

lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
43

Just remove it. You'll add it when needed

delcypher marked 9 inline comments as done.Jan 11 2019, 11:00 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_remote_address_space_view.h
26

LocalAddressSpaceView is a struct. Making RemoteAddressSpaceView be a class seems inconsistent. Why do you want to make this change?

28

Can you add some CHECKs validate this single thread assumption?

@vitalybuka

I don't see an easy way to do this because this code will be used in a context where there is no ThreadRegistry and there is no interception on pthread_create and other thread functions. I also couldn't see a non-asan specific (e.g. I found AsanThread* GetCurrentThread()) API to use because this code will be common to all sanitizers.

There are also examples of existing code that don't check the single thread assumption. E.g. from lib/asan/asan_thread.cc

ThreadRegistry &asanThreadRegistry() {
  static bool initialized;
  // Don't worry about thread_safety - this should be called when there is
  // a single thread.
...

I'm not against adding some sort of check to validate the single thread assumption but I need more guidance on how to do this using the available sanitizer internal APIs.

69

maybe nested: ScopedRemoteAddressSpaceViewContext -> RemoteAddressSpaceView::ScopedContest

and make RemoteAddressSpaceView::set_context private

That's a great idea. I'll do that.

lib/sanitizer_common/sanitizer_vm_read_context.h
25

Apart from that fact that it needs to be unsigned on Darwin. I'm not super comfortable doing that because the declaration of __sanitizer::process_vm_read_handle_t in sanitizer_internal_defs.h allows to detect if the system declaration of task_t disagrees with the Sanitizer runtimes type. This is because on Darwin we include system headers that do define the task_t type.

28

Yes. I'll do that.

32

I'll try to fix that.

51

I'll have a look through my out-of-tree implementation and see what I can remove.

94

Sure.

lib/sanitizer_common/sanitizer_vm_read_context_common.cc
20 ↗(On Diff #179889)

Reset() gets called if the read call fails. Although I've just realised that's actually inconsistent with the API comments. I'll fix that.

delcypher marked an inline comment as done.Jan 11 2019, 12:23 PM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
43

By "it" I presume you mean that writable argument?

But if I do that, how far up the stack should I go? Taken to the extreme that means reverting what was landed in https://reviews.llvm.org/D54879 . I really don't want to do that because the distinction between writable and read-only access is likely important.

delcypher updated this revision to Diff 181566.Jan 14 2019, 8:49 AM

Lots of clean-up.

delcypher marked 6 inline comments as done and 2 inline comments as done.Jan 14 2019, 8:52 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
74

We cannot calculate it. It needs to be part of the sate.

delcypher updated this revision to Diff 182057.Jan 16 2019, 8:13 AM
delcypher marked an inline comment as done.
  • Remote fields/methods in VMReadContext that won't be used.
  • Remove default writable=true argument.
delcypher marked 5 inline comments as done.Jan 16 2019, 8:15 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
51

I've removed several fields/methods from this patch that weren't being used in my internal implementation.

delcypher marked an inline comment as done.Jan 16 2019, 8:16 AM

@vitalybuka I've fixed what I can and I've left questions about things that aren't straightforward to fix or that I don't agree with. When you have time could you take another look?

vitalybuka added inline comments.Jan 16 2019, 1:37 PM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
26

Historically sanitizers follow google c++ style and class is more appropriate here
https://google.github.io/styleguide/cppguide.html#Structs_vs._Classes

as well for LocalAddressSpaceView

28

what about saving GetTid() on the first call of set_context()
then check that saved id matches GetTid() for all subsequent calls to
set_context()
get_context()
Load*()

I assume this is not going to be main thread?

68

same for class

lib/sanitizer_common/sanitizer_vm_read_context.h
31

please init target_process_ to something target_process_ as well

vitalybuka added inline comments.Jan 16 2019, 1:37 PM
lib/sanitizer_common/sanitizer_vm_read_context.h
18

empty live between define and include

25

then just define process_vm_read_handle_t as task_t in sanitizer_internal_defs for Darwin?

43

why copying is not acceptable?

74

We cannot calculate it. It needs to be part of the sate.

I read this as:

bool IsLocal() const {
  return config_ == nullptr || mach_task_self() == target_process_;
}
lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
29

can you remove "config" argument and just initialize config_ to darwin specific stuff here?

30

if we keep it please remove {} for consistency

43

It still looks like better to split Read and ReadWriteble when we have different implementation for them.
So If you have/expect such patch quite soon, then fine, let's keep it. If this is just expectation that some-when in undefined future we will likely need to spit them, then I'd prefer we split when it's actually necessary.

51

why do we need config_?
from here looks like you expect a particular function
could you call it directly here?

delcypher marked 6 inline comments as done.Jan 18 2019, 5:26 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
31

Not doing inline initialisation here is deliberate. The type is platform specific so we might not be able to initialise it using an int. This is why the platform specific constructors do the initialisation.

Technically we could initialise to 0 now because task_t happens to be a typedef for unsigned on Darwin but this might not work for other platforms in the future.

43

Because a VMReadContext object is supposed to manage the lifetime of memory it copies from another. The implementation might hold data structures to manage this memory and so copying a VMReadContext object is almost certainly a mistake.

For the Darwin implementation VMReadContext delegates this task to a Darwin specific framework that guarantees the lifetime of the copied memory will be longer than its client (VMReadContext). So technically nothing would go wrong right now if the VMReadContext copy constructor was called. However calling the VMReadContext is almost certainly a mistake and so it should be disallowed.

74
bool IsLocal() const {
 return config_ == nullptr || mach_task_self() == target_process_;
}

Oh I see. I thought you where talking about GetTargetProcess(). However in this latest revision IsLocal() has actually been removed because my internal implementation never called it.

I can probably bring it back though.

lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
29

No. On Darwin config is a function pointer but on other platforms it can be anything (that's why it has type void*) that's needed to get access to the target process's memory. The config argument must be there to provide a consistent and flexible interface on all platforms.

There is no special function we can call from inside VMReadContext to get the function pointer. The client creating the VMReadContext on Darwin knows the value of the pointer and must pass it in.

30

Ok. I'll fix this.

51

No we cannot. It must be passed in via the VMReadContext constructor.

delcypher updated this revision to Diff 182510.Jan 18 2019, 7:21 AM

Minor tweaks.

delcypher marked 6 inline comments as done.Jan 18 2019, 7:24 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
18

Fixed.

32

Fixed.

74

I've brought the implementation back and removed the is_local_ field.

lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
30

Done.

delcypher marked 2 inline comments as done.Jan 18 2019, 7:30 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
25

I tried doing this in my internal implementation. It breaks compilation because sanitizer_internal_defs gets included very early and task_t at that point is not defined. Let's just keep it.

delcypher marked an inline comment as done.Jan 18 2019, 7:49 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
43

Using the read-only/writable information will land much later so I'm going to remove passing this information to VMReadContext.

delcypher updated this revision to Diff 182527.Jan 18 2019, 8:37 AM
  • Check thread id
  • Use atomic variables
delcypher marked an inline comment as done.Jan 18 2019, 8:38 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_remote_address_space_view.h
28

Okay I've added the thread check. I don't know if this code will be called on the main thread or not. Presumably that's irrelevant here?

@vitalybuka I've done another pass over the patch and fixed everything I reasonably could. Please make another pass when you have time.

vitalybuka added inline comments.Jan 18 2019, 1:18 PM
lib/sanitizer_common/sanitizer_remote_address_space_view.h
25

includes are not ordered

33

Why did you obfuscate context type?
There is not point in atomic if you are going to use single thread
maybe OK for owning_tid_ but we will crash immediately if this does not match

51

probably you should check if owning_tid_ is already set that they match

58

Please avoid reinterpret_cast, just return VMReadContext* as member

lib/sanitizer_common/sanitizer_vm_read_context.h
31

Maybe platform specific invalid_vm_read_handle near process_vm_read_handle_t?
also process_vm_read_handle_t is too specific, why not just some task_id?

43

"operator=" ?

48

(void*) -> reinterpret_cast<>

lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
51

Instead of void* can you make it function pointer?
With type similar to memory_reader_t
If you make memory_reader_t return bool and handle error there, the only platform specific part of VMReadContext will be the callback.

also config_ -> native_reader_;

Rebase on master and adopt new license headers.

delcypher updated this revision to Diff 182930.Jan 22 2019, 8:39 AM
delcypher marked 4 inline comments as done.
  • Revert back to plain VMReadContext*
  • Check owning_tid_ in set_context()
  • Delete more methods from VMReadContext.
delcypher marked 2 inline comments as done.Jan 22 2019, 8:41 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_remote_address_space_view.h
25

Oops. Good catch. Fixed.

33

Originally I did that because I was going to make this a DCHECK and I didn't want it to be possible for the pointer to change non-atomically. You're right though. Now that the CHECK happens in all builds I should just use VMContext*.

I've fixed this now.

51

That would require that I guarantee that owning_tid is 0 initially (i.e. before RemoteAddressSpaceView is ever used). I think this might already be the case because atomic_uint64_t RemoteAddressSpaceView::owning_tid is a global but I wasn't 100% sure. I wasn't sure how I was supposed to write an initializer for the global in sanitizer_remote_address_space_view.cc so I just left it out.

If we can confidently assume that owning_tid_ is zero initialized then I'll add the check.

I've gone ahead and made the fix. Seems to work.

lib/sanitizer_common/sanitizer_vm_read_context.h
43

That would also be bad. I've deleted that and also the move variants.

delcypher updated this revision to Diff 182932.Jan 22 2019, 8:44 AM
  • Remove C-style cast.
delcypher marked 2 inline comments as done.Jan 22 2019, 8:45 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context.h
48

Fixed.

delcypher marked 2 inline comments as done.Jan 22 2019, 9:22 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
51

Instead of void* can you make it function pointer?
With type similar to memory_reader_t
If you make memory_reader_t return bool and handle error there, the only platform specific part of VMReadContext will be the callback.

I could but I'm strongly against that because.

  • A function pointer isn't great. It can't hold any state. Even in Darwin's case I'd have to write a wrapper function that implements the callback interface. That function needs to know the address of the actual function to call. So the callback would need to be able to receive arbitrary data (i.e. passing a void*) so I could pass that address.
  • Having a function pointer encourages implementations to actually live outside of VMReadContext. This is the opposite of what VMReadContext() tries to do -- Hide as much platform specific behaviour as possible inside VMReadContext. It is a Darwin specific implementation detail that we have to pass a function pointer to VMReadContext, other platforms should not be forced to do that in-order to write their implementations. They should be able to pass whatever makes sense for their platform.
  • On other platforms the interface may need to work differently. More parameters might need to be passed to the reader. In this case the void* would be a pointer to a struct containing the extra data needed by the platform.
  • It ties the design of VMReadContext very closely to Darwin's memory_reader_t.

Besides, I could probably achieve the non-platform specific error handling simplify by refactoring Read(...) into the platform agnostic part Read(...) and the platform specific part ReadImpl(...).

I think we could clean up the interface in a better way. Instead of a void* argument to the constructor it could be

VMReadContextConfig* which would be

class VMReadContextConfig {
   public:
    __sanitizer::process_vm_read_handle_t target_process;
};

Then other platforms could sub-class this and add their own platform specific fields. For Darwin we'd add a function pointer field so that the Darwin specific function pointer can be passed.

That way

  • At most one argument needs to be passed VMReadContext
  • It allows for platform specific implementations in a way that's slightly more type safe than void*.
  • We can pass arbitrary data to be passed to VMReadContext.

    We don't have RTTI or LLVM's dyn_cast so we would have to statically cast inside each implementation to the expected VMReadContextConfig sub-class.
vitalybuka added inline comments.Jan 22 2019, 12:41 PM
lib/sanitizer_common/CMakeLists.txt
45

Please return )

lib/sanitizer_common/sanitizer_remote_address_space_view.h
56

So data race is possible when one thread is releasing context and another is setting.
For simplicity I think you should never reset owning_tid_
As soon as one thread used it, no other allowed to interact with that.

Otherwise we need to design proper synchronization which is inconvenient with all these statics. E.g. as soon someone called Load and saved returned pointer
we will need to make sure that caller release the pointers before letting other threads to interact with context.

And maybe it's not bad idea. But you have option to go with "never reset owning_tid_" now and switch to Load/Unload with mutexes later, when needed.

57

Nothing prevents one thread to create ScopedContext and another thread to do LoadWritable. We need to check owning_tid_ in Load and LoadWritable

102

: vmrc() is not needed

lib/sanitizer_common/sanitizer_vm_read_context.h
31

Maybe platform specific invalid_vm_read_handle near process_vm_read_handle_t?
also process_vm_read_handle_t is too specific, why not just some task_id?

I still think you should remove process_vm_read_handle_t from sanitizer_internal_defs.h and just define it in the Context. u64 everywhere, like for tid_t, is good enough for me.

lib/sanitizer_common/sanitizer_vm_read_context_mac.cc
56
bool VMReadContext::IsLocal() const {
  return config_ == nullptr || mach_task_self() == target_process_;
}
delcypher marked 2 inline comments as done.Jan 23 2019, 10:04 AM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_remote_address_space_view.h
56

So data race is possible when one thread is releasing context and another is setting.
For simplicity I think you should never reset owning_tid_
As soon as one thread used it, no other allowed to interact with that.

I'm fine with that. Currently RemoteAddressSpaceView should only be used within a single-thread.

I'll update the patch as you suggest.

57

Did I miss something? Won't the CHECK_EQ(atomic_load_relaxed(&owning_tid_), current_tid) fire inside get_context() fire due to LoadWritable() or Load() eventually triggering a call to get_context()?

  • Add extra return in CMakeLists.txt.
  • Only allow for owning_tid_ to be set once.
  • Clean up IsLocal() darwin implementation.
delcypher marked 2 inline comments as done and an inline comment as not done.Jan 23 2019, 10:38 AM

Add VMReadContextConfig to simplify usage of VMReadContext.

delcypher marked an inline comment as done.Jan 23 2019, 12:25 PM
delcypher added inline comments.
lib/sanitizer_common/sanitizer_vm_read_context_config.h
1 ↗(On Diff #183152)

@vitalybuka I had a go at adding VMReadContextConfig (see the header file below). The advantage of this is I'm able to pull the definition of task_t out of sanitizer_internal_defs.h and make the constructor of VMReadContextConfig a little nicer by not having it take a void*.

Fix lint issues.

Fix comments so the VMReadContext::Read() actually complies
in the case we are doing local reads.

vitalybuka requested changes to this revision.Apr 24 2019, 11:14 AM

Removing stale patches from the "Ready to Review" list.

This revision now requires changes to proceed.Apr 24 2019, 11:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 24 2019, 11:14 AM
yln resigned from this revision.Jul 22 2021, 4:12 PM