This is an archive of the discontinued LLVM Phabricator instance.

[Darwin] Teach `AtosSymbolizerProcess` to work on a copy of the environment.
AbandonedPublic

Authored by delcypher on Apr 7 2020, 5:13 PM.

Details

Reviewers
kubamracek
yln
Summary

This refactor is a pre-requisite to fixing rdar://problem/58789439. In
future patches we will modify this copy to pass a different environment
to atos. Due to working on a copy of the existing environment it is
expected that this patch will produce no functional change in most
cases.

The storage size for the copy of the environment is set statically so we
avoid doing a heap allocation at symbolization time where we might be
unable to allocate heap memory (e.g. when an out-of-memory situation has
been hit). Note that although AtosSymbolizerProcess is itself heap
allocated it is done so during early init where we should have enough
available heap memory.

In the case that the current process has more environment variables than
can be stored (currently 511 + 1 for null terminator) then a warning is
produced and the copy of the environment is truncated. A test case is
included for this scenario which should hopefully be rare.

Diff Detail

Event Timeline

delcypher created this revision.Apr 7 2020, 5:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2020, 5:13 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
yln accepted this revision.Apr 8 2020, 9:27 AM

LGTM with nits

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
54

Can be moved inside private section of class: it's a private implementation detail of the class.

59

I wouldn't do this (initialize members). My argument is *not* performance or that it is unnecessary, but rather that it makes it seem as if AtosSymbolizerProcess is fully usable after being constructed. Good C++ objects are fully initialized/usable after construction. AtosSymbolizerProcess is not like that. It's very special: you have to call its methods in a specific order for it to work. Stray of that path and it won't work correctly. And we can't fix that, not even by initializing its members with good (empty) defaults.

124

Should this also be setup in StartSymbolizerSubprocess for consistency? Or the other way around: should pid_str_ be populated in GetArgV? But then we would probably introduce an unwanted order dependency between GetEnvP and GetArgV. (These are just questions, not suggestions.)

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-large-env.cpp
2

I think this is unnecessary

23

I think the most important aspect to test would be to check that the started symbolizer process has all the env vars that we would expect it to have. I thought about how to test this but don't have a good idea, so I am happy to move forward without it.

This revision is now accepted and ready to land.Apr 8 2020, 9:27 AM
delcypher marked 4 inline comments as done.Apr 8 2020, 12:28 PM
delcypher added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
54

Seems reasonable. I thought there was some issue with C++ wanting this to be constexpr when I tried this last time but it doesn't seem to be happening now so I'll make this change.

59

The point of this initialization is so that if methods are some how called in the wrong order we get deterministic behaviour. If we just leave garbage in this array and pass this to posix_spawn by accident then we'd likely get unpredictable crashes. Technically this initialization is unnecessary so if you feel very strongly about this then I'll remove it. Do you still want this removed?

124

Should what also be setup in StartSymbolizerSubprocess?

Also please bear in mind the subsequent patch (https://reviews.llvm.org/D77706) where the semantics of CopyEnv change to allocate space for a new environment variable.
Although looking at this.. all of this could actually be moved in StartSymbolizerSubprocess that would actually be way cleaner because we can keep the branching logic for SANITIZER_IOSSIM contained to a single method.

I'll refactor this.

compiler-rt/test/sanitizer_common/TestCases/Darwin/symbolizer-large-env.cpp
2

This was originally here because I was hitting some weird issue where atos was reporting stale line locations. I'll remove it.

delcypher updated this revision to Diff 256116.Apr 8 2020, 2:38 PM

Address first round of feedback.

yln accepted this revision.Apr 8 2020, 2:49 PM

LGTM

kubamracek added inline comments.Apr 8 2020, 5:34 PM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
95

Can this be moved into sanitizer_mac.cpp or other non-symbolizer file? Make it an actual useful helper that takes the destination buffer as argument, its size, returns true/false.

kubamracek added inline comments.Apr 8 2020, 5:40 PM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
115

Can you briefly describe how tricky would it be to not have a hardcoded limit? Could an InternalMmapVector be used here instead?

117

Is it actually correct to not do a "deep copy" and only copy the pointers? I mean, who "owns" these strings? Are they guaranteed to live forever?

delcypher marked 3 inline comments as done.Apr 8 2020, 7:27 PM
delcypher added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
95

No. The semantics of this function will change in the next patch that tie it to the needs of this class. To avoid discussing this in two places let's discuss the change of semantics in https://reviews.llvm.org/D77706.

115

It's not tricky but I don't think it's a good idea. We should avoid trying to do allocation during symbolication because we could be in an out-of-memory situation. The only way to avoid doing allocations at symbolization time is to do it earlier. This patch allocates at the time the symbolizer class is created (because the statically declared array is an instance member) which is during early process init where we should have plenty of memory (if we don't then we have bigger problems).

117

Darwin's Libc setenv implementation owns these strings. When users call setenv(...) it makes copies of them which it keeps track of. There are technically some circumstances where this is unsafe.

  • setenv() is called on an existing environment variable in the environment array. In some cases Libc will call realloc() on the pointer to the existing string. In the right circumstances this could potentially leave a dangling pointer in our copy of the environment array.
  • unsetenv() is called on an existing environment variable in the environment array. This will cause the string to be free'd. Again this could leave a dangling pointer in our copy of the environment array.

So no. They are not guaranteed to live for ever.

Possible solutions:

  • Do nothing.
  • Implement interceptors to prevent environment changes being made while symbolizer process is being launched. Technically this will never be sufficient because there are APIs to directly access the environment array directly.

Deep copying the strings on its own is actually not a good solution because it suffers exactly from the same problem we have already. If the environment gets changed while we're in the middle of copying the strings we could dereference free'd memory.

yln added inline comments.Apr 9 2020, 8:25 AM
compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_mac.cpp
117

I am assuming that posix_spawn makes a deep copy of the passed-in environment when it starts the new process. Dan, do you know if this is indeed the case?

If so, I think we should ignore these issues for the sake of simplicity because the windows for things to go wrong (time between StartSymbolizerSubprocess() and posix_spawn) is small.

delcypher updated this revision to Diff 257092.Apr 13 2020, 1:23 PM

Use new EnvArray type.