This is an archive of the discontinued LLVM Phabricator instance.

[AArc64] Add support for insert/extract for nxv1i1 types.
ClosedPublic

Authored by sdesmalen on Jul 1 2022, 2:37 AM.

Details

Summary

This patch adds patterns and tests for subvector insert/extract
intrinsics to/from all legal predicate types.

Diff Detail

Event Timeline

sdesmalen created this revision.Jul 1 2022, 2:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2022, 2:37 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
sdesmalen requested review of this revision.Jul 1 2022, 2:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2022, 2:37 AM
david-arm added inline comments.Jul 4 2022, 7:11 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1542

nit: whitespace change

llvm/test/CodeGen/AArch64/sve-insert-vector.ll
711

Is there value in adding a simple test that inserts <vscale x 1 x i1> into a <vscale x 1 x i1> vector? I assume that standard DAG combines will treat this as a simple copy?

992

I don't think this is a problem with your patch, but the spill and fill in this output looks unnecessary. We have enough registers to support this without spilling I think?

Also, something weird seems to be happening with the offset for the spill/fill, i.e. "#7, mul vl". I assume that translates to offset = 7 x vscale x 2? It seems to fit into the stack space we've allocated, but I wonder if this is just pure luck?

david-arm added inline comments.Jul 4 2022, 7:13 AM
llvm/test/CodeGen/AArch64/sve-insert-vector.ll
992

Ah, perhaps we're actually just storing it into the top of the temporary stack space we've allocated, i.e. the top part of a "vscale x 16" byte object.

sdesmalen marked 3 inline comments as done.Jul 4 2022, 7:39 AM
sdesmalen added inline comments.
llvm/test/CodeGen/AArch64/sve-insert-vector.ll
711

I think there's little value in that because it becomes a copy straight away when building the DAG.

992

This happens because p4-p15 are callee-saved, and it needs p4 as a scratch register in this function.

Ah, perhaps we're actually just storing it into the top of the temporary stack space we've allocated, i.e. the top part of a "vscale x 16" byte object.

Correct. The space it allocates is aligned to <vscale x 16 x i8>, so that's the smallest space that gets allocated, but then it only stores a <vscale x 16 x i1>.

sdesmalen marked 2 inline comments as done.Jul 4 2022, 8:21 AM
sdesmalen added inline comments.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1542

That was actually intentional, to cluster all the two-stage unpacks together.

david-arm accepted this revision.Jul 4 2022, 8:31 AM

LGTM!

This revision is now accepted and ready to land.Jul 4 2022, 8:31 AM
kmclaughlin accepted this revision.Jul 4 2022, 8:48 AM
This revision was landed with ongoing or failed builds.Jul 4 2022, 8:55 AM
This revision was automatically updated to reflect the committed changes.
sdesmalen marked an inline comment as done.