This is an archive of the discontinued LLVM Phabricator instance.

Suppress SIGSEGV on Android when the program will recover
Needs ReviewPublic

Authored by clayborg on Jun 14 2018, 9:51 AM.

Details

Reviewers
labath
Summary

SIGSEGV signals are sent to Android processes when a NULL dereference happens in Java code that is being run as native code in dex, odex and oat files. In this patch I modified the Platforms to be allowed to modify a stop reason for a thread. The PlatformAndroid will watch for SIGSEGV stop reasons and see if the frame that caused the SIGSEGV comes from a dex, odex or oat file, and if so, it will suppress the stop reason. Many IDEs are manually figuring this out and learning to skip the signal so users don't see it. Even when IDEs do this, the IDE might end up showing a SIGSEGV as a valid stop reason for a thread. Also, a SIGSEGV thread might be selected when another thread hits a breakpoint. So we suppress these signals to avoid spurious thread changes and improve android debugging.

Diff Detail

Event Timeline

clayborg created this revision.Jun 14 2018, 9:51 AM

If you can get the address of the bad access from the signal, you could also check that it was 0x0 and only suppress the SIGSEGV if it is?

Also, do you want to put in a setting to turn this behavior off? If the code in any of the files of this type were to crash for some other reason than a Java NULL dereference, you'd have no way to use lldb to debug the issue. lldb will just auto-continue and then either the program will terminate because the SIGSEGV handler doesn't handle this signal or the SIGSEGV handler will crash or whatever... That will be too late to be useful.

But I don't know anything about how these .oat and .dex/.odex files are constructed, and maybe they can't crash for any other reason than emulating a Java NULL access, in which case this looks fine.

source/Plugins/Platform/Android/PlatformAndroid.cpp
424

Given that this is a pretty big behavior change, I would exactly match on the three extensions rather than use endswith, so it only affects the file types you care about.

Thank you for implementing this. We've had code like this in android studio for a long time, but it's definitely better doing it in lldb instead.

I'll give some more background so that Jim and anyone else looking at this can understand what's going on. These files (I forgot what's the difference between them) contain the compiled version of Java code. As java is a "safe" language, the code should not generate any signals (modulo compiler bugs) that the runtime can't handle. The SEGV is just the implementation's way of avoiding null checks everywhere -- on the assumption that NullPointerExceptions are rare, its faster to not generate them and clean up the occasional mess in the signal handler.
For that reason, an android app (*) will always have a SEGV handler. This handler will be invoked even for non-bening SEGVs (so simply resuming from a SEGV will never crash the app immediately). For signals the runtime can't handle it will invoke a special art_sigsegv_fault function, which the debugger can hook to catch "real" segfaults. Unfortunately, in the past this mechanism was unreliable (the function could end up inlined), so we have to do this dance instead. Once android versions with the fixed "fault" function become more prevalent, we can skip this and just automatically reinject all SIGSEGVs. This is particularly important as each new version of android finds new creative ways to "optimize" things via SIGSEGVs, and going things this way means constantly playing catch-up.

So much for background. I think Jim's suggestion on having all of this this controllable by a setting makes sense, and it would be consistent with how we handle other kinds of "magical" under-the-hood stops (target.process.stop-on-sharedlibrary-events). I'm not sure how much use would it get, but I can imagine it being useful for debugging the segv handling code itself. I'm a bit sad that we now have two plugins with the OverrideStopInfo functionality, but I can't think of any better way of arranging things right now.

(*) This means "real" GUI apps. command line executables will not have the android runtime inside them, nor the special segv handler, but that means they will not contain any "dex" files either.

source/Plugins/Platform/Android/PlatformAndroid.cpp
399

I'm not sure if SEGV is one of them, but numbers of some signals vary between architectures. You should be able to get the real value via process->GetUnixSignals()

424

Agreed, matching on the exact extension looks safer and more obvious. The most llvm-y way of writing that would be StringSwitch<bool>(ext.GetStringRef()).Cases("dex", "oat", "odex", true).Default(false)

Thanks for the explanation!

Jim

Another thing that slightly bugs me about this patch is now we have the Architecture with special purpose code to modify the stop reason, and the Platform ditto. I wonder if it wouldn't be better to have a way to register interest in modifying stop infos, and then let the target & architecture sign up for that. That way the next time somebody else needs to do this we won't have to add more special purpose code to Thread.cpp.