This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fix handling of stack protection with SVE
ClosedPublic

Authored by john.brawn on Oct 12 2021, 4:39 AM.

Details

Summary

Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack:

  • Fix an assertion failure in the StackProtector pass caused by trying to call getTypeAllocSize on a scalable vector.
  • When stack protection is enabled, move the stack protector location to the top of the SVE locals, so that any overflow in them (or in the other locals below them) will be detected.

Fixes PR51795.
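
For reference, a rough illustration (not taken from PR51795; the function names here are made up) of the kind of source that can hit this: an SVE vector value that may be spilled to the stack across a call, compiled with stack protection enabled (e.g. -fstack-protector-all -march=armv8-a+sve). Whether the vector actually ends up on the stack depends on the compiler.

#include <arm_sve.h>

void use(float *);

void f(float *data) {
  // The vector is live across the call, so it may be spilled to the stack.
  svfloat32_t v = svld1_f32(svptrue_b32(), data);
  use(data);
  svst1_f32(svptrue_b32(), data, v);
}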

Diff Detail

Event Timeline

john.brawn created this revision.Oct 12 2021, 4:39 AM
john.brawn requested review of this revision.Oct 12 2021, 4:39 AM
Herald added a project: Restricted Project.Oct 12 2021, 4:39 AM
Matt added a subscriber: Matt.Oct 13 2021, 2:57 PM

Thanks for working on this @john.brawn. I've left some suggestions on your patch.

llvm/lib/CodeGen/StackProtector.cpp
360–361

I'd suggest changing the interface HasAddressTaken to take a TypeSize.
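
For illustration, a minimal sketch of why that helps, using the comparison helpers from llvm/Support/TypeSize.h (the helper name below is a placeholder, not the patch code):

#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Placeholder helper, not StackProtector::HasAddressTaken itself.
static bool accessMayOverflow(TypeSize AccessSize, TypeSize AllocSize) {
  // The TypeSize::isKnownXY predicates return false when the relationship is
  // unknown (e.g. a fixed-size access into a scalable allocation), so the
  // negated form conservatively treats "unknown" as a possible overflow.
  return !TypeSize::isKnownGE(AllocSize, AccessSize);
}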

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
110

In your patch you've changed the frame-layout by swapping the areas of fixed-width locals and SVE locals and always requiring/reserving x19 as a base pointer. An alternative idea is to allocate vscale x 16 bytes (i.e. size of one SVE vector) for the stack protector slot, and have that as the first SVE slot after the SVE callee saves:

+-----------------------+
|     Callee Saves      |
+ - - - - - - - - - - - +
|  Frame record (LR/FP) |
+ - - - - - - - - - - - + <- FP
| Neon/SVE callee saves |
+ - - - - - - - - - - - +
|    Stack protector    |
| . . . . . . . . . . . | <- FP - sizeof(Neon/SVE callee saves) - 1*sizeof(SVE vector)
|                       |
| SVE locals/spill-slots|
|                       |
+ - - - - - - - - - - - + 
|                       | 
|        Locals         | 
|                       | 
+-----------------------+ <- SP/BP

This way, we'd trade a bit of redundant stack space, but we'd avoid having to always reserve x19. We also have the benefit of keeping the same layout (reducing the possibility of bugs being introduced by a new layout that is only enabled through a compile flag). Like with your current patch, we can still access SVE locals directly at <pointer> + <ScalableOffset>, where <pointer> in this case will be FP instead of BP.

I'm not too familiar with the exact requirements for the position of the stack protector slot, but my preliminary understanding is that it must live somewhere below the frame-record and above the locals. The above suggestion would satisfy that requirement.

Can you think of any reason why this might not be feasible?
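
For illustration, a rough sketch (the helper name is made up; this is not the committed code) of how such a slot could be created with the usual MachineFrameInfo APIs:

#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/TargetFrameLowering.h" // TargetStackID
#include "llvm/Support/Alignment.h"

// 16 bytes tagged with the ScalableVector stack ID corresponds to one SVE
// granule, i.e. vscale x 16 bytes at runtime.
static int createSVEStackProtectorSlot(llvm::MachineFrameInfo &MFI) {
  int FI = MFI.CreateStackObject(/*Size=*/16, llvm::Align(16),
                                 /*isSpillSlot=*/false);
  MFI.setStackID(FI, llvm::TargetStackID::ScalableVector);
  MFI.setStackProtectorIndex(FI);
  return FI;
}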

1121

Could this use hasStackProtectorIndex instead?

john.brawn added inline comments.Oct 26 2021, 6:49 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
110

I hadn't thought about doing it this way, but it looks like it should be feasible and would definitely simplify things. I'll give it a go and see if I can get it working.

john.brawn edited the summary of this revision. (Show Details)

Moved the stack protector to above the SVE locals instead of moving the SVE locals below the stack protector, and also made use of TypeSize.

Hi @john.brawn, thanks for updating your patch, this approach seems like a good improvement! Just left a few more comments.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
969

Do we need some way to ensure that the LocalStackSlotAllocationPass doesn't pre-allocate the stack protector value if we know it will be allocated to a different StackID?

978

nit: unnecessary curly braces

llvm/lib/CodeGen/StackProtector.cpp
173–174

This should use TypeSize::isKnownGT(TypeSize::getFixed(MemLoc->Size.getValue()), AllocSize).

I doubt it would ever return true when AllocSize is scalable though, because from what I can see in the code, MemLoc.hasValue() will return false if it accesses a scalable size, because MemoryLocation will not be precise.

215–219

If the gep is indexing into a scalable vector type, then OffsetSize can't be a fixed size, can it?

344

unnecessary change?

360

Same here; perhaps just move this into an NFC patch and submit it separately.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2958

I think this patch also needs some code in determineSVEStackObjectOffsets to ensure that the stack protector value is ordered before all other objects when allocating their offsets. (This is probably best tested with an MIR test.)

llvm/test/CodeGen/AArch64/stack-guard-sve.ll
17

nit: To simplify the test, you could also write:

store <vscale x 4 x float> zeroinitializer, <vscale x 4 x float>* %x, align 16
john.brawn added inline comments.Nov 11 2021, 9:23 AM
llvm/lib/CodeGen/PrologEpilogInserter.cpp
969

I'll add an assertion to LocalStackSlotAllocationPass to check for that.
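
For illustration, roughly the kind of condition such an assertion would check (the helper name is made up; this is not the committed code):

#include <cassert>
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/TargetFrameLowering.h"

// A frame index that is the stack protector and lives on a non-default stack
// should never be pre-allocated by LocalStackSlotAllocation.
static void checkProtectorNotPreAllocated(const llvm::MachineFrameInfo &MFI,
                                          int FrameIndex) {
  assert(!(MFI.getStackProtectorIndex() == FrameIndex &&
           MFI.getStackID(FrameIndex) != llvm::TargetStackID::Default) &&
         "Stack protector on a non-default stack should not be pre-allocated");
  (void)MFI;
  (void)FrameIndex;
}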

978

That's done for consistency with the preceding if (which has braces because of the nested if/else).

llvm/lib/CodeGen/StackProtector.cpp
215–219

Indexing into a scalable vector depends only on the size of the element type, which is known.

344

A relic of the previous version of this patch, I'll remove.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2958

Will do.

Looking into the interaction with LocalStackSlotAllocation, it turned out things were more complicated than I thought, as that pass runs before we've decided to put the stack protector in the SVE stack area. I've gone with moving that decision much earlier (instead of attempting to undo what LocalStackSlotAllocation had done).

The getelementptr interaction in HasAddressTaken also wasn't right. Instead of doing "isKnownLT" we have to do "!isKnownGE", as these functions return false in the "don't know" case. We also have to be more careful about the subtraction, as you can't subtract a fixed value from a scalable value.
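
To illustrate the two points above, a rough sketch (placeholder names, not the patch verbatim):

#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Check a constant GEP offset against a possibly scalable allocation size.
static bool gepNeedsProtector(TypeSize AllocSize, uint64_t ConstOffsetBytes,
                              TypeSize &RemainingAllocSize) {
  TypeSize OffsetSize = TypeSize::Fixed(ConstOffsetBytes);
  // isKnownGE returns false in the "don't know" case, so the negated form
  // conservatively requests a protector when the comparison is inconclusive.
  if (!TypeSize::isKnownGE(AllocSize, OffsetSize))
    return true;
  // A fixed offset can't be subtracted from a scalable TypeSize directly, so
  // subtract from the known-minimum size instead (conservative if scalable).
  RemainingAllocSize =
      TypeSize::Fixed(AllocSize.getKnownMinValue() - ConstOffsetBytes);
  return false;
}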

Thanks for the update, the patch looks structurally good to me. Just left a few nits and a request for an extra test.

llvm/lib/CodeGen/StackProtector.cpp
173–174

nit: TypeSize::isKnownLT ?

218–219

nit: This can just be:

TypeSize NewAllocSize = TypeSize::Fixed(AllocSize.getKnownMinValue()) - OffsetSize;

Can you also add a test for this case? I guess that would require something like:

%ptr = alloca <vscale x 4 x i32>
%ptr.bc = bitcast <vscale x 4 x i32>* %ptr to i32*
%gep = getelementptr i32, i32* %ptr.bc, i64 2 ; no stack protector needed
                                              ; (2*sizeof(i32)) < (vscale x 4 x sizeof(i32))
store i32 %something, i32* %gep
%gep2 = getelementptr i32, i32* %gep, i64 2   ; now a stack protector is needed because
                                              ; it exceeds `sizeof(4 x i32) - 2 x sizeof(i32)`
store i32 %something, i32* %gep2
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18431

Is it worth adding a FIXME saying that it would be nice to have this in a new interface, something like getStackIDForStackProtectorIndex, rather than doing it in finalizeLowering?
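
For illustration only, what such an interface might look like; neither this class nor the hook exists in LLVM, and it is not part of this patch:

#include "llvm/CodeGen/TargetFrameLowering.h"

// Hypothetical sketch: a target would override this to place the stack
// protector object in a non-default stack area (e.g. the scalable-vector
// area for AArch64/SVE), instead of retagging it in finalizeLowering.
struct HypotheticalFrameLoweringHooks {
  virtual ~HypotheticalFrameLoweringHooks() = default;
  virtual llvm::TargetStackID::Value getStackIDForStackProtectorIndex() const {
    return llvm::TargetStackID::Default;
  }
};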

llvm/test/CodeGen/AArch64/stack-guard-reassign-sve.mir
21

nit: It would be better to reorder these objects so that the StackGuardSlot is not the first object, so that you can test it still gets allocated at the top of the frame. (because the algorithm processes the objects in order of id)

john.brawn marked 3 inline comments as done.Dec 13 2021, 6:40 AM
john.brawn added inline comments.
llvm/lib/CodeGen/StackProtector.cpp
173–174

It needs to be a negated comparison because the TypeSize::isKnownXY functions return false for "I don't know" and in those cases we should assume the access is larger than AllocSize.

Adjusted based on review comments.

sdesmalen accepted this revision.Dec 14 2021, 1:43 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Dec 14 2021, 1:43 AM
This revision was landed with ongoing or failed builds.Dec 14 2021, 3:31 AM
This revision was automatically updated to reflect the committed changes.