This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Use crash stack for fuchsia.
AcceptedPublic

Authored by charco on Feb 10 2021, 2:10 PM.

Details

Summary

This commit adds a temporary stack to handle libfuzzer crashes in
Fuchsia.

Crashes in fuchsia are handled via exception channels: an exception
handler thread waits for an exception, and when one happens, it will try
to "resurrect" the crashed thread by writing the registers onto the
stack and changing the pc to a crash trampoline, which then calls
libfuzzer's static crash handler.

If the crashed thread has an invalid stack, writing the registers onto
the stack will fail. The end result is that the fuzzer would hang and
the error would be reported as a time out.

To solve it, we set up a temporary stack of a few pages so the crash
handler can run. This crash handler will end up the application, so we
are not expected to go back to normal.

Related to that change, now that we only have one crash stack, we can't
allow other threads to use the same crash stack, and we aren't supposed
to let multiple threads call the crash handler concurrently anyways, so
now we only let the first thread to crash to go through.

Finally, there's also a change in the asan code for fuchsia, to check
whether we are in the default stack or in a different one. Upon calls to
noreturn functions, asan unpoisons the thread stacks, so we check
whether the current stack is the default one, and if so, do everything
as normal. Otherwise, we unpoison the whole default stack, as well as
the current stack page (we don't know how big the current stack is).

Diff Detail

Event Timeline

charco requested review of this revision.Feb 10 2021, 2:10 PM
charco created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2021, 2:10 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
charco updated this revision to Diff 322822.Feb 10 2021, 2:13 PM

clang-format

vitalybuka added inline comments.
compiler-rt/lib/asan/asan_fuchsia.cpp
74

local_var unused?

and better just use __builtin_frame_address()

82

why exactly GetPageSize?

charco updated this revision to Diff 324825.Feb 18 2021, 5:39 PM
charco marked an inline comment as done.

Fix cfi directives for the crash trampoline.

eep added inline comments.Feb 18 2021, 7:53 PM
compiler-rt/lib/fuzzer/FuzzerUtilFuchsia.cpp
204

Do we need to ensure this is aligned on ARM?

aarongreen added inline comments.Feb 26 2021, 12:08 PM
compiler-rt/lib/asan/asan_fuchsia.cpp
65–87

Does this affect more than libFuzzer? If so, it probably belongs in its own change, with the libFuzzer changes depending on it.

compiler-rt/lib/fuzzer/FuzzerFuchsiaCrashTrampoline.S
1 ↗(On Diff #324825)

See my note on the next file. I think this should be FuzzerUnwindFuchsia.S.

compiler-rt/lib/fuzzer/FuzzerFuchsiaRegisters.h
1 ↗(On Diff #324825)

I don't love this file, as it seems to be squashing together several orthogonal concerns (os, arch, elf/dwarf).

I didn't love how it was before (all hidden in FuzzerUtilFuchsia) but at least it was mostly abstracted away.

I don't love the idea of splitting everything up into separate files, as it leads to quite a few files added for what's by and large a corner case (e.g. FuzzerDwarf.h, FuzzerDwarfX86_64.h, FuzzerDwarfAarch64.h, FuzzerUnwindFuchsia.h, FuzzerUnwindFuchsiax86_64.h, FuzzerUnwindFuchsiaAarch64.h).

At minimum, let's at least name this consistently with the rest of libFuzzer, i.e. make this FuzzerUnwindFuchsia.h.

compiler-rt/lib/fuzzer/FuzzerUtilFuchsia.cpp
42

Pretty sure you mean the zx_thread_state_general_regs_t struct, but it would be good to say so.

42

Also, these look like they can be moved to FuzzerFuchsiaRegisters.h (or FuzzerUnwindFuchsia.h), where they would have more context (but may need a guard like #ifdef __cplusplus... I'm not sure about their inclusion in the .S file).

52

Any particular reason for this size?

125–143

Tackling a few suggestions with one comment:

  • The constructor is redundant with the default value for Handle.
  • An explicit constructor might be nice.
  • If you define a move constructor, its good practice to define a move assignment operator as well, so all future changes work as expected, e.g. unhandled_exceptions.back() = std::move(replacement);

Taken together, I think this is close enough to add unique_ptr-style semantics and put all the logic into get/release/reset methods, much like zx::object in Fuchsia's //zircon/system/ulib/zx/include/zx/object.h.

183–185

This needs a test, but since we're cross compiling and not self-hosted, it probably needs to be on the Fuchsia side. I've opened https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=71106.

Alternately, you may be able to add some general, OS-agnostic unit tests for the concurrent and double fault cases. We'll still need to address the shortcomings of testing on Fuchsia, though, so the above bug would still be valid.

204

So close to extracting all the arch bits from this file... I wonder if it's worth abstracting this too.

213

Hmm. I feel it might be better to invoke one of the error callbacks directly in this case rather than relying on the std::at_exit handler to land us in Fuzzer::ExitCallback. If we're really concerned with the process being hosed, Fuzzer::DeathCallback might be the best option, as it skips trying to print the stack, but you'd need to call _Exit yourself.

aarongreen added inline comments.Apr 8 2021, 4:04 PM
compiler-rt/lib/fuzzer/FuzzerFuchsiaCrashTrampoline.S
121 ↗(On Diff #324825)

Can this just be pc, as with rip above?

charco updated this revision to Diff 341787.Apr 30 2021, 12:07 AM
charco marked 2 inline comments as done.

Code review comments.

charco marked an inline comment as done and an inline comment as not done.Apr 30 2021, 12:17 AM

Thanks for the review. I addressed all the comments. PTAL when you have some time!

compiler-rt/lib/asan/asan_fuchsia.cpp
65–87

libfuzzer is the only one using a different stack than the default, that I am aware of. But this would affect any program that uses asan and a non-default stack.

82

If we are not in the default stack, we have no way of knowing the size of the current stack. In that case, the best thing we can do is to unpoison the current stack page (as it is the one that we are using). An alternative would be to do nothing.

compiler-rt/lib/fuzzer/FuzzerFuchsiaCrashTrampoline.S
1 ↗(On Diff #324825)

FuzzerUnwindFuchsia is a great name for this file. Thanks for the suggestion!

121 ↗(On Diff #324825)

I am not sure I follow what you mean here. Above we use the rdi register to hold the address of were we want to jump. Here we use x0.

compiler-rt/lib/fuzzer/FuzzerFuchsiaRegisters.h
1 ↗(On Diff #324825)

This is consistent to how they handle this stuff in libunwind (one file with ifdefs per arch). It is true that the DWARF Reg Numbers are not specific for fuchsia, but we are the only ones who care about them.

compiler-rt/lib/fuzzer/FuzzerUtilFuchsia.cpp
52

Nope. I just needed to pick a size. This is only used for "printing the backtrace and exiting", so 16KiB seemed more than enough.

125–143

Thanks! The code looks better now.
I decided to leave Handle public because we need its address to be used in the system calls down below. (See for example the usage of Channel.Handle).

183–185

Ack.

204

Technically yes. It just works because sizeof() the struct is already multiple of 16. I added back the rounding formula, but I think a static check would also do, as we don't expect that struct to change.

204

Hm. We will need the arch ifdefs one way or the other. I moved most of this code into a SetupCrashTrampoline function, at least that way we constrain the ifdef to a small function.

213

We had agreed to keep it as is, right? An alternative would be to somehow generate a crash artifact and then let next handler handle the crash (finishing the program)

charco updated this revision to Diff 341789.Apr 30 2021, 12:29 AM
charco marked an inline comment as done.

Comments on SetupTrampoline

charco updated this revision to Diff 341790.Apr 30 2021, 12:39 AM
  • Uncomment static assert
charco updated this revision to Diff 341971.Apr 30 2021, 10:35 AM
  • Enforce alignment & fix array type.
aarongreen accepted this revision.May 18 2021, 2:09 PM
This revision is now accepted and ready to land.May 18 2021, 2:09 PM

The stated goal is very sensible. But this change is doing too many things at once to get there. Please separate the refactorings into smaller changes that come with some discussion of why that refactoring is needed.

compiler-rt/lib/asan/asan_fuchsia.cpp
65–87

I would prefer to see this as a separate change and with more comments in the code right here explaining the rationale for what it's doing. It's a subtle change in semantics that needs it own discussion separate from the libfuzzer context.

compiler-rt/lib/fuzzer/FuzzerUnwindFuchsia.S
33–34

That's not at all what I think the CFA ought to be. That's something like what the CFA would have been if the faulting instruction has instead been a normal call. That's not what the contract about the CFA is. The CFA is about *your* frame, not your caller's frame. For unwinding purposes, the CFA is whatever you say it is in the CFI. Some ABIs like the generic aarch64 ABI says that the CFA should match the SP on entry to the function. The ABI says that but it's not really part of the unwinding contract, it's just something the ABI says.

Anyway, even to match the letter of the aarch64 ABI, the CFA for the trampoline has nothing whatsoever to do with the interrupted state. The CFA for the trampoline can be the SP on entry *to the trampoline*. Since the protocol of the trampoline as a "function" is actually private between that code and the setup code, you can construe that however you like, e.g. either at the top of the trap frame or the bottom.

What I'd do is make it as if the trampoline were function called by the normal convention, so its SP on entry is below the trap frame. This means the trap frame itself is kept at the stack's alignment requirement (16). On x86 this means an extra stub return address word is pushed first to correctly misalign the SP on entry as per the ABI. This is the CFA definition that plain .cfi_startproc gives you (in the CIE). So if that's the trap frame, it's your CFA. And then all you need is a .cfi_offset R, offsetof(genregs, R) for each one, plus .cfi_signal_frame, and you're unwound.

This way nicely works the same whether you've pushed the trap frame below the interrupted stack or you've started a fresh signal stack.

compiler-rt/lib/fuzzer/FuzzerUnwindFuchsia.h
46

If you're going to take this approach then you need to pepper this file with static_assert(offsetof(...) == REG_...); to validate each and every constant here. This is the big downside of putting the assembly into a .S file instead of using inline asm as before.

You haven't given any rationale for the change to define the assembly in a .S file instead of using the previous method.
That alone is a big refactoring that ought to be a change of its own before the semantic changes. It needs to be explained why that's a better way to organize the code than what we were doing before, which IMHO is substantially less fragile for ongoing maintenance.

compiler-rt/lib/fuzzer/FuzzerUtilFuchsia.cpp
78

If it's important to make this size fixed at compile time, it should be at least 64k.
But it would be better to choose at at runtime with zx_system_get_page_size() * 4 or suchlike, along with some comments giving rationale for the size chosen.

79

Is it important that this be statically allocated? Why not dynamically allocate it during setup?

The alignment requirement is a machine ABI detail and should have a symbolic constant and comment about that, though it happens to be 16 on machines currently supported.

111

Reusing the crashing thread's shadow call stack is no safer than reusing its machine stack.
The same is true of the safe-stack unsafe stack, which is only used by default on x86 but is fully supported by the Fuchsia libc ABI on both machines.

I think you need to have separate unsafe stack (everywhere) and shadow call stack (aarch64 only) for the crash handler too.

126

This seems like a good standalone refactoring with a simple goal: make the type move-friendly (and add the reset method, though since it's not used elsewhere that doesn't seem to have been a goal).
Make that its own separate change, with commentary explaining its goal. I only surmised what the motivation for this change must have been. A log message or code comment should explain that.

charco added a comment.Jul 5 2021, 2:59 PM

Thanks for the review, Roland.

I will be OOO next week, but will address the comments when I get back.