This is an archive of the discontinued LLVM Phabricator instance.

[Sanitizers][Atos] Remove null-ing of atos process pointer
ClosedPublic

Authored by thetruestblue on Apr 6 2023, 10:57 AM.

Details

Summary

Currently, when we send an address to atos to be symbolized, it is
expected that atos returns with more than it was sent, i.e. symbol
information for that address. In the case where only the address is
returned, we currently null the pointer to the atos process. Typically,
for modules where no symbolication is expected, we do not send the
address to atos.

However, in new simulators there is an early call that atos does not
return any symbol information for. And in this case, because we have
gotten rid of the pointer to the process, no subsequent frames are
symbolicated, even tho atos is still working/running.

This patch removes the nulling of the pointer to the process. This
allows subsequent calls to atos even after an unexpected result.
It also now Reports what has happened and the address this occurred.

This will improve symbolication in cases where we get an unepxected
result, and will make it easier to diagnose atos if it is not
symbolicating as expected.
File a radar about the change of behavior 107621524

rdar://107169715

Diff Detail

Event Timeline

thetruestblue created this revision.Apr 6 2023, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2023, 10:57 AM
Herald added a subscriber: Enna1. · View Herald Transcript
thetruestblue requested review of this revision.Apr 6 2023, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2023, 10:57 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
yln accepted this revision.EditedApr 6 2023, 1:58 PM

Thanks Blue!

This all makes sense to me; great description in the commit message :)

2 quick questions:

However, in new simulators there is an early call that atos does not return any symbol information for

  1. Do we know why this is / what changed?
  1. Previously we completely stopped trying to symbolicate after the first failure. Now we will always try for new requests. Can you think of any scenarios where this might be a problem, i.e., some interaction with a retry where we would now spend much more time, or maybe even hang on report generation?
This revision is now accepted and ready to land.Apr 6 2023, 1:58 PM

However, in new simulators there is an early call that atos does not return any symbol information for

  1. Do we know why this is / what changed?
  1. Previously we completely stopped trying to symbolicate after the first failure. Now we will always try for new requests. Can you think of any scenarios where this might be a problem, i.e., some interaction with a retry where we would now spend much more time, or maybe even hang on report generation?
  1. Not yet. I have filed a radar about the issue tho. My suspicion is maybe sandbox changes? It is not Sanitizer-related. Meaning: atos still fails to find symbol information for the address for a process than does not have address sanitizer enabled. Only the single symbol seems to be affected.
  1. I'd love to extend this question to others. This is what I understand: I think there could be a case where atos could hang. But in that case I think we would get a warning about not being able to read from symbolizer, and it would not hit this particular code path. Meaning that case is covered elsewhere. This depends on actually getting a response from atos. In all other cases now, tho, we never set process pointer to null. So even if we *try* to read from atos but can't for some reason, currently we just return false for the symbolication for that frame, but will continue to try symbolizing other frames. Even in the worse case where normally we would stop trying to symbolicate the time "cost" should still be the same as if the symbolizing had *behaved correctly*.
thetruestblue edited the summary of this revision. (Show Details)Apr 6 2023, 2:24 PM
yln added a comment.Apr 6 2023, 2:35 PM

[...] This depends on actually getting a response from atos. [...] Even in the worse case where normally we would stop trying to symbolicate the time "cost" should still be the same as if the symbolizing had *behaved correctly*.

Makes sense, thanks for explaining!

One small nit that I just thought of:
If it's easy to do, then let's move the test under sanitizer_common/TestCases/Darwin, since sanitizer_common is where the code change is as well.

thetruestblue added a comment.EditedApr 6 2023, 5:53 PM

One small nit that I just thought of:
If it's easy to do, then let's move the test under sanitizer_common/TestCases/Darwin, since sanitizer_common is where the code change is as well.

It is simple. The reason I put it here is because sanitizer common tests are only tested on macOS. Whereas asan is tested on iOS and iOS Simulator which is where we saw the issue. But I agree, sanitizer common makes more sense based on where this code lives.

@thetruestblue It looks like the new test in this change fails https://green.lab.llvm.org/green/job/clang-stage1-RA/33833/

I think ubsan doesnt try to symbolicate the invalid address. I will get a fix up today.