As reported in https://github.com/rust-lang/rust/issues/70143 alignment is not taken into account when doing the probing. Fix that by adjusting the first probe if the stack align is small, or by extending the dynamic probing if the alignment is large.
Details
Diff Detail
Event Timeline
llvm/lib/Target/X86/X86FrameLowering.cpp | ||
---|---|---|
672 | When Align > StackProbeSize, which should be rare, is there any chance that the first probe is performed on the old frame? |
llvm/lib/Target/X86/X86FrameLowering.cpp | ||
---|---|---|
672 | That's a concern I have. In that particular code that's ok, because we check Align <= StackProbeSize and we just adjust the first probe. In the other case what we're basically doing is rsp += rem(rsp, align) rsp -= align rsp += StackProbeSize *rsp = 0 if rsp is already aligned, we end-up doing align/StackProbeSize useless probing |
Can you add tests that show what this is intended to look like for alignments greater than the probe step size?
Right now AFAICS you're just adding the alignment, which doesn't make sense to me. In the Rust code I examined, alignment was achieved by ANDing an alignment mask, which essentially rounds down to the next aligned address. That's a dynamically-sized address range that needs to be part of the probing.
llvm/test/CodeGen/X86/stack-clash-small-large-align.ll | ||
---|---|---|
42 | There's an immediate race after andq $-65536, %rsp -- if we get a signal here, the stack pointer could be in a potentially bad place and start clobbering stuff. Then adding the full alignment puts us into arbitrary stack memory, or even could go past the top of the stack. If we start writing probes from there, who knows what memory we're clobbering. (akin to @lkail's concern) Consider if the stack is almost aligned to begin with, something like 0x1230010. The and will only adjust a small distance to 0x1230000, and then the add makes it 0x1240000. The first probe will be a page below that, 0x123E000, but that's still way out of our frame, not memory we should be writing. Actually, that plays just as badly if the stack is perfectly aligned to begin with. The other extreme was the original concern, perhaps with an incoming stack like 0x123FFE0. Then it will again and to 0x1230000, add to 0x1240000, and start probing at 0x123E000, which is OK in this case. |
llvm/test/CodeGen/X86/stack-clash-small-large-align.ll | ||
---|---|---|
42 | Oops, those start probing at 0x123F000, sorry for the bad math. The point stands though. |
Take into account @cuviper review, split handling of alignment between three cases:
- small stack, small align → adjust first stack increase
- large stack, small align → perform a first stack increase + touch before the loop
- large align → perform a loop over the range between original stack pointer and aligned stack pointer (in that case, there's still a small window for the attack)
I still need to improve the third case...
Perform masking and probing in the same location to nicely handle all combination of small/large alloc and small/large align
LGTM. This solution probes the gap between [stackptr & mask, stackptr] when align is big.
llvm/lib/Target/X86/X86FrameLowering.cpp | ||
---|---|---|
1097 | I don't think this condition is right. Say MaxAlign == StackProbeSize. Then an "and" can allocate up to StackProbeSize-4 bytes. So any subsequent stack allocation can jump over a guard page. (This is an extreme example. Really, it doesn't matter what the alignment is; it's just harder to cause a practical issue if the alignment is small.) In general, we can't skip a probe for a stack allocation. We can only merge the probes for adjacent stack allocations. Say, for example, we realign the stack then allocate "Offset" bytes of aligned memory. We can get away with considering both allocations as a single "allocation" if MaxAlign+Offset <= StackProbeSize. But that method of proof works if you analyze them together. If you analyze each allocation independently, you can't prove the safety, so the realignment needs its own probe. |
llvm/lib/Target/X86/X86FrameLowering.cpp | ||
---|---|---|
1097 | Good catch. Looks PPC64's implementation also has the same issue. I'll post a patch to fix this issue for PPC64. |
Enforce an extra invariant: after alignment adjustment, there is less than PAGE_SIZE unprobed, which make sit possible to decouple it from stack allocation probing. I think this adresses @efriedma comment (?)
I think this adresses @efriedma comment (?)
Maybe? I'd like to make sure we have test coverage for:
- the case where alignment+offset is less than the probe size
- the case where both alignment and offset are less than the probe size, but align+offset is greater than the probe size
(In stack-clash-small-alloc-medium-align.ll, it looks like alignment+offset is exactly equal to the probe size?)
the case where alignment+offset is less than the probe size
For this case, it would also be good to have a version with a subsequent dynamic and/or callframe allocation that pushes the total stack allocated over the probe size, to show the interaction works correctly.
the case where both alignment and offset are less than the probe size, but align+offset is greater than the probe size
I don't think any of the tests cover this? Or am I missing something obvious?
When Align > StackProbeSize, which should be rare, is there any chance that the first probe is performed on the old frame?