This is an archive of the discontinued LLVM Phabricator instance.

Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
Needs ReviewPublic

Authored by varunkumare99 on Jul 10 2023, 5:56 PM.

Details

Summary

adds stack probing instruction sequence for dynamic stack allocations, VLA's and constant arrays to protect against stack clash attacks.

Depending on the size of the stack allocation, various probing sequences are generated:

  • straight line sequence of subtracts and stores
  • A loop allocating and probing one page size per iteration, plus a single probe to deal with the remainder.
  • A loop which moves the SP down to the target value held in a register, used when the allocation size is not known at compile-time

reference: https://reviews.llvm.org/D96004

Diff Detail

Event Timeline

varunkumare99 created this revision.Jul 10 2023, 5:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 5:56 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
varunkumare99 requested review of this revision.Jul 10 2023, 5:56 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 10 2023, 5:56 PM
efriedma added inline comments.Jul 10 2023, 10:21 PM
clang/lib/Driver/ToolChains/Clang.cpp
3460

Why 1024?

llvm/lib/Target/ARM/ARMFrameLowering.cpp
780

Is this relevant? Are the other unmodified uses of emitSPUpdate() relevant?

979

Can we try to unify the Windows/non-Windows codepaths to some extent? Having two independent codepaths each trying to decide when probing is necessary seems likely to lead to bugs.

llvm/test/CodeGen/ARM/stackProbing_arm.ll
19

Going straight from the function entry to here, you've decremented the stack by a total of 6276 bytes; this can jump over a 4kb guard page.

21

From a security perspective, scattering pointers to the stack onto the stack is maybe not the best idea.

95

Again, this is too many bytes. (At least, ignoring the stores before the loop, which don't appear to be intentionally inserted checks.)

llvm/test/CodeGen/ARM/stackProbing_thumb.ll
12

This is a very fancy way of setting a register to zero.

Have you done any testing that the generated code actually works?

tnfchris added inline comments.Jul 26 2023, 10:27 AM
clang/lib/Driver/ToolChains/Clang.cpp
3460

1024 was experimentally determined by Arm and is part of the ABI for stack clash (which has not yet been published). It was determined by examining a large number of programs and looking at the function stack usages. 1024 covers 80-90% of programs such that we can minimize the number of probes required in the average cases.

efriedma added inline comments.Jul 26 2023, 10:58 AM
clang/lib/Driver/ToolChains/Clang.cpp
3460

There are actually multiple numbers involved here, no? One is the spacing of probes, i.e. if allocating a large amount of stack, how many times you need to probe; this is basically the page size of the target. the other is how much unprobed space a function is allowed to allocate before calling another function. Referring to the the AArch64 patch, -mstack-probe-size is the former, the hardcoded "1024" is the latter.

tnfchris added inline comments.Jul 26 2023, 12:12 PM
clang/lib/Driver/ToolChains/Clang.cpp
3460

I hadn't looked at the patch in detail yet, I thought this was the probing offset. But you're right, what I thought of was StackClashCallerGuard, if stack-probe-size indeed the guard size itself, then yeah this would be wrong. It seems incorrect to allow it smaller than the page size.