
[AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.
Needs Review · Public

Authored by sdesmalen on Dec 9 2019, 10:04 AM.

Details

Summary

This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).

Event Timeline

sdesmalen created this revision. Dec 9 2019, 10:04 AM

For every place you're adding if (scalable) return false;, I'd like to see a comment explaining why we're bailing out.

llvm/include/llvm/CodeGen/TargetLowering.h
1255 ↗(On Diff #232884)

While you're here, indentation?

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6882

Should we have a helper for this pattern?

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1362

This is sort of weird for a method named "SelectAddrModeFrameIndexSVE"; should it not just fail?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9332

Is this necessary?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2241

This seems sort of confusing. "Scale" here is implicitly multiplied by vl, and there isn't any way for the caller to tell except by checking the opcode.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1221

IsLE? Are we supposed to do something different on big-endian targets?

1234

Should we always use PTRUE_B, even for non-byte element sizes, to encourage CSE?

Should we prefer to use ldr/str where legal, to take advantage of the larger immediate offset?

sdesmalen updated this revision to Diff 233449. Dec 11 2019, 2:28 PM
sdesmalen marked 8 inline comments as done.
  • Added convenience function MemoryLocation::getSizeOrUnknown
  • Removed isLE predicate from predicate store patterns.
  • Added comments explaining why we bail out of a function when the type is a scalable vector.
  • Addressed other suggestions to clean up code.
sdesmalen added inline comments. Dec 11 2019, 2:29 PM
llvm/include/llvm/CodeGen/TargetLowering.h
1255 ↗(On Diff #232884)

Good spot!

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6882

Yes, that would be useful. I've added MemoryLocation::getSizeOrUnknown(const TypeSize &)
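The idea behind such a helper can be sketched as follows. This is a minimal illustration, not the code from the patch: `TypeSizeSketch`, `UnknownSizeSketch` and `getSizeOrUnknownSketch` are hypothetical stand-ins so that the example compiles without any LLVM headers. The only thing taken from the thread is the pattern itself: a scalable size maps to "unknown", a fixed size is passed through.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a TypeSize-like value: a minimum size plus a flag
// saying whether the real size is that minimum multiplied by a runtime vscale.
struct TypeSizeSketch {
  uint64_t MinSize;
  bool Scalable;

  bool isScalable() const { return Scalable; }

  // Only meaningful for fixed-width sizes. Asserting here models the kind of
  // assertion failure the patch summary mentions.
  uint64_t getFixedSize() const {
    assert(!Scalable && "fixed size requested for a scalable type");
    return MinSize;
  }
};

// Assumed sentinel meaning "size is not known at compile time".
constexpr uint64_t UnknownSizeSketch = std::numeric_limits<uint64_t>::max();

// Sketch of the helper pattern: callers get a conservative "unknown" answer
// for scalable vectors instead of tripping the assertion above.
inline uint64_t getSizeOrUnknownSketch(const TypeSizeSketch &T) {
  return T.isScalable() ? UnknownSizeSketch : T.getFixedSize();
}
```

With a helper of this shape, each call site that only needs a conservative size can make one call instead of repeating the scalable check by hand.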

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1362

Agreed, that should not have been there. Fixed.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9332

No, good catch!

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2241

I'm not sure if this is an actual issue in practice though. Are you suggesting to make Scale a TypeSize instead of an unsigned?

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1221

No, that was a misunderstanding on my part. I've removed this now.

1234

> Should we always use PTRUE_B, even for non-byte element sizes, to encourage CSE?

Our experience is that vectorized loops have most predicates CSEd anyway. For a loop that operates on two lanes, often a predicate is already available and there is no need to introduce an extra ptrue_b. If a loop using floats is vectorized with VF=2, we don't want operations on <vscale x 2 x float> to use ptrue.b because that would enable operations on all (vscale x) 4 lanes, which may not be valid.
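The lane-count argument above can be illustrated with a deliberately simplified model of an SVE predicate: one predicate bit per vector byte, where an operation on N-byte elements only looks at the lowest bit of each N-byte group. The function names `ptrueSketch` and `activeLanesSketch` are hypothetical, and the model covers a single 128-bit granule (vscale = 1) by default. It is a sketch of the reasoning, not of the backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model: one predicate bit per vector byte. A "ptrue" for
// ElemBytes-wide elements sets the lowest bit of every ElemBytes-wide group.
std::vector<bool> ptrueSketch(std::size_t ElemBytes,
                              std::size_t VectorBytes = 16) {
  assert(ElemBytes > 0 && "element size must be non-zero");
  std::vector<bool> Pred(VectorBytes, false);
  for (std::size_t I = 0; I < VectorBytes; I += ElemBytes)
    Pred[I] = true;
  return Pred;
}

// An operation on OpElemBytes-wide elements treats lane K as active when the
// predicate bit at byte offset K * OpElemBytes is set.
std::size_t activeLanesSketch(const std::vector<bool> &Pred,
                              std::size_t OpElemBytes) {
  assert(OpElemBytes > 0 && "element size must be non-zero");
  std::size_t Active = 0;
  for (std::size_t I = 0; I < Pred.size(); I += OpElemBytes)
    if (Pred[I])
      ++Active;
  return Active;
}
```

In this model, a 32-bit float operation governed by `ptrueSketch(1)` (the ptrue.b case) sees four active lanes per 128 bits, while the same operation governed by `ptrueSketch(8)` (the ptrue.d case) sees two, which matches the `<vscale x 2 x float>` concern raised in the reply.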

> Should we prefer to use ldr/str where legal, to take advantage of the larger immediate offset?

That would not be endian-safe, hence the preference for ST1 (note that the order is dictated by the AAPCS when passing vectors by reference). Saving to and restoring from the stack like this is pretty rare. Normal spills and fills will indeed use the STR/LDR instructions, and normal vector load/store instructions that are not storing to a local will likely use other addressing modes such as reg+reg.

efriedma added inline comments. Dec 11 2019, 5:42 PM
llvm/lib/Analysis/Loads.cpp
144

"how many bytes are dereferenced".

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1362

I'm not sure how you're proving that "N" is a FrameIndexSDNode here?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2241

Yes, that would force the callers to explicitly handle scalable types. It looks like some of them don't.
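A rough sketch of why a tagged scale forces callers to deal with scalable types, under the assumption that the scale is carried as a (minimum bytes, scalable) pair. `ScaleSketch`, `getFixedByteOffsetSketch` and `getByteOffsetSketch` are hypothetical names for illustration and are not the interface in AArch64InstrInfo.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged scale: with a plain unsigned, a caller cannot tell
// whether the value is in bytes or in multiples of the vector length.
struct ScaleSketch {
  unsigned MinBytes;
  bool Scalable;
};

// Callers that need a compile-time byte offset must check the result, so a
// scalable scale cannot be silently treated as a fixed number of bytes.
bool getFixedByteOffsetSketch(ScaleSketch Scale, int64_t Imm,
                              int64_t &OffsetBytes) {
  if (Scale.Scalable)
    return false; // Real offset depends on the runtime vector length.
  OffsetBytes = Imm * static_cast<int64_t>(Scale.MinBytes);
  return true;
}

// Callers that do know the runtime multiplier pass it explicitly.
int64_t getByteOffsetSketch(ScaleSketch Scale, int64_t Imm, unsigned VScale) {
  assert(VScale > 0 && "vscale must be at least 1");
  int64_t Multiplier = Scale.Scalable ? static_cast<int64_t>(VScale) : 1;
  return Imm * static_cast<int64_t>(Scale.MinBytes) * Multiplier;
}
```

The design point is that the "multiplied by vl" property travels with the value, so it no longer has to be inferred from the opcode.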

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1234

Okay, that makes sense. For the CSE thing, we could maybe add an optimization pass after isel if it's necessary.

sdesmalen updated this revision to Diff 237670. Mon, Jan 13, 7:14 AM
sdesmalen marked 11 inline comments as done.
  • Code in SelectAddrModeFrameIndexSVE now checks if index is a FrameIndexSDNode (rather than assume it is one).
  • Fixed whitespace and updated comment.
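The "check rather than assume" change in the first bullet follows a common pattern, sketched here with standard C++ only. The node classes and `selectFrameIndexSketch` are hypothetical stand-ins; LLVM code would typically use its own `dyn_cast` machinery on SelectionDAG nodes rather than `dynamic_cast`.

```cpp
// Hypothetical stand-ins for a small node hierarchy.
struct NodeSketch {
  virtual ~NodeSketch() = default;
};

struct FrameIndexNodeSketch : NodeSketch {
  int Index;
  explicit FrameIndexNodeSketch(int I) : Index(I) {}
};

struct OtherNodeSketch : NodeSketch {};

// Returns true and sets FI only when N really is a frame-index node.
// Any other node makes the selection fail instead of being misread.
bool selectFrameIndexSketch(const NodeSketch &N, int &FI) {
  const auto *FINode = dynamic_cast<const FrameIndexNodeSketch *>(&N);
  if (!FINode)
    return false;
  FI = FINode->Index;
  return true;
}
```

Failing the match leaves the caller free to try another addressing mode, which is the behaviour the reviewer asked for in a method whose name promises a frame index.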
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2241

Given that this change propagates through the rest of the code-base, I will do this in a separate patch.

sdesmalen updated this revision to Diff 237674. Mon, Jan 13, 7:19 AM

added context to the patch

sdesmalen marked 2 inline comments as done. Wed, Jan 15, 3:46 AM
sdesmalen added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2241

I've implemented this change in D72758.