This is an archive of the discontinued LLVM Phabricator instance.

Fix interaction between stack alignment and inline-asm stack clash protection
ClosedPublic

Authored by serge-sans-paille on Jul 23 2020, 8:12 AM.

Details

Summary

As reported in https://github.com/rust-lang/rust/issues/70143 alignment is not taken into account when doing the probing. Fix that by adjusting the first probe if the stack align is small, or by extending the dynamic probing if the alignment is large.

Diff Detail

Event Timeline

lkail added inline comments.Jul 24 2020, 4:16 AM
llvm/lib/Target/X86/X86FrameLowering.cpp
672

When Align > StackProbeSize, which should be rare, is there any chance that the first probe is performed on the old frame?

serge-sans-paille marked an inline comment as done.Jul 24 2020, 5:00 AM
serge-sans-paille added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
672

That's a concern I have. In that particular code that's ok, because we check Align <= StackProbeSize and we just adjust the first probe. In the other case what we're basically doing is

rsp += rem(rsp, align)
rsp -= align
rsp += StackProbeSize
*rsp = 0

if rsp is already aligned, we end-up doing align/StackProbeSize useless probing

Can you add tests that show what this is intended to look like for alignments greater than the probe step size?

Right now AFAICS you're just adding the alignment, which doesn't make sense to me. In the Rust code I examined, alignment was achieved by ANDing an alignment mask, which essentially rounds down to the next aligned address. That's a dynamically-sized address range that needs to be part of the probing.

Added test case for large stack align

cuviper added inline comments.Aug 10 2020, 4:49 PM
llvm/test/CodeGen/X86/stack-clash-small-large-align.ll
42

There's an immediate race after andq $-65536, %rsp -- if we get a signal here, the stack pointer could be in a potentially bad place and start clobbering stuff.

Then adding the full alignment puts us into arbitrary stack memory, or even could go past the top of the stack. If we start writing probes from there, who knows what memory we're clobbering. (akin to @lkail's concern)

Consider if the stack is almost aligned to begin with, something like 0x1230010. The and will only adjust a small distance to 0x1230000, and then the add makes it 0x1240000. The first probe will be a page below that, 0x123E000, but that's still way out of our frame, not memory we should be writing.

Actually, that plays just as badly if the stack is perfectly aligned to begin with.

The other extreme was the original concern, perhaps with an incoming stack like 0x123FFE0. Then it will again and to 0x1230000, add to 0x1240000, and start probing at 0x123E000, which is OK in this case.

cuviper added inline comments.Aug 10 2020, 4:54 PM
llvm/test/CodeGen/X86/stack-clash-small-large-align.ll
42

Oops, those start probing at 0x123F000, sorry for the bad math. The point stands though.

Take into account @cuviper review, split handling of alignment between three cases:

  1. small stack, small align → adjust first stack increase
  2. large stack, small align → perform a first stack increase + touch before the loop
  3. large align → perform a loop over the range between original stack pointer and aligned stack pointer (in that case, there's still a small window for the attack)

I still need to improve the third case...

Perform masking and probing in the same location to nicely handle all combination of small/large alloc and small/large align

lkail accepted this revision.Sep 20 2020, 7:01 PM

LGTM. This solution probes the gap between [stackptr & mask, stackptr] when align is big.

This revision is now accepted and ready to land.Sep 20 2020, 7:01 PM
efriedma added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
1097

I don't think this condition is right.

Say MaxAlign == StackProbeSize. Then an "and" can allocate up to StackProbeSize-4 bytes. So any subsequent stack allocation can jump over a guard page. (This is an extreme example. Really, it doesn't matter what the alignment is; it's just harder to cause a practical issue if the alignment is small.)

In general, we can't skip a probe for a stack allocation. We can only merge the probes for adjacent stack allocations. Say, for example, we realign the stack then allocate "Offset" bytes of aligned memory. We can get away with considering both allocations as a single "allocation" if MaxAlign+Offset <= StackProbeSize. But that method of proof works if you analyze them together. If you analyze each allocation independently, you can't prove the safety, so the realignment needs its own probe.

lkail added inline comments.Sep 21 2020, 9:34 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
1097

Good catch. Looks PPC64's implementation also has the same issue. I'll post a patch to fix this issue for PPC64.

Enforce an extra invariant: after alignment adjustment, there is less than PAGE_SIZE unprobed, which make sit possible to decouple it from stack allocation probing. I think this adresses @efriedma comment (?)

I think this adresses @efriedma comment (?)

Maybe? I'd like to make sure we have test coverage for:

  • the case where alignment+offset is less than the probe size
  • the case where both alignment and offset are less than the probe size, but align+offset is greater than the probe size

(In stack-clash-small-alloc-medium-align.ll, it looks like alignment+offset is exactly equal to the probe size?)

the case where alignment+offset is less than the probe size

For this case, it would also be good to have a version with a subsequent dynamic and/or callframe allocation that pushes the total stack allocated over the probe size, to show the interaction works correctly.

Added more test case

the case where both alignment and offset are less than the probe size, but align+offset is greater than the probe size

I don't think any of the tests cover this? Or am I missing something obvious?

Added missing test case, and label existing ones so that their purpose is clear

Tested clang bootstrapping with -fstack-clash-protection on, works like a charm \o/

efriedma accepted this revision.Sep 25 2020, 2:17 PM

Looks like it's working correctly now. LGTM