This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. <vscale x 16 x i8>) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM. Will give other reviewers some time to review though...
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
2554–2582 | Nit: It doesn't actually sort them, right? Or did I miss it? I'm assuming that we'll want to eventually order SVE stack objects by size. |
llvm/include/llvm/CodeGen/TargetFrameLowering.h | ||
---|---|---|
369 | We don't want to encourage people to key backend behavior off of the IR type of an alloca. Please change the API so it only passes down whether the type has a variable size. | |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
2575 | Assertions should only be used to check conditions we know are false. I don't see any code that would actually guarantee this, given arbitrary IR as input. Not sure what layer is best to address this. I mean, I guess we could honor the request by turning the alloca into a dynamic alloca during isel. |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
2554–2582 | I think we need to do something here if it's possible to have objects that aren't "16 x vscale" bytes long, to ensure natural alignment. I'm fine leaving that as a FIXME for now, though. |
llvm/include/llvm/CodeGen/TargetFrameLowering.h | ||
---|---|---|
369 | Thanks, I didn't know that. I'll change the API! | |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
2554–2582 |
Yeah, I thought that made sense but realised that PEI does not seem to sort the objects either, probably because that would undo the ordering as calculated by StackSlotColoring, which orders the slots by # uses (i.e. more uses, closer to SP). I do have a separate patch that reorders the slots based on the availability of the FP, which I can share as a follow-up to this patch. | |
2554–2582 |
Objects that aren't 16 x vscale bytes long (but have an alignment < 16bytes) are supported by this code [1]. If you're suggesting that we should reorder the objects based on natural alignment for performance reasons, we can indeed sort the ObjectsToAllocate list to group all the predicates (vscale x 2 bytes) together, and all the data vectors (vscale x 16 bytes) together (but leave their order otherwise intact to benefit from the order defined by StackSlotColoring). I can fix that in a separate patch and add a FIXME for now. [1] All objects in the SVE area on the stack have the same vscale scaling. That means that if we have two objects, e.g. one of vscale x 16 bytes (data vector) and one of vscale x 2 bytes (predicate), this will be aligned to (vscale *) 16 bytes by allocating vscale x 32 bytes by the call to alignTo below on line 2578. | |
2575 | Without having thought about it too deeply, that indeed seems like a feasible way to support it. For now, are you happy for me to leave the assert here (plus an extra FIXME comment) to leave the case unsupported until we fix this in a later patch? |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
2554–2582 | Oh, I see; vscale just multiplies all the stack offsets, so anything that's aligned before vscale is applied is still aligned afterwards. Sorry, I got confused somehow. Sure, we can put off sorting for now. | |
2575 | Please at least change it to report_fatal_error. I'm okay putting off fixing it. |
- Changed getStackIDForType(const Type *T) -> getStackIDForScalableVectors()
- Changed assert into report_fatal_error for >16 byte alignment of SVE stack objects and added a FIXME.
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
2554–2582 | No worries, I understand this is quite confusing. Thanks! |
Thanks for the review @efriedma and @cameron.mcinally!
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
2501–2502 | Before I committed the patch, I moved the initialization of Min and Max to the start of the function. |
We don't want to encourage people to key backend behavior off of the IR type of an alloca. Please change the API so it only passes down whether the type has a variable size.