
[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible

Authored by joechrisellis on Mar 3 2021, 7:10 AM.



This commit adds a DAGCombine optimization that folds sign- or zero-extended
vector offsets into gather loads where possible.

As an example, the following code:

#include <arm_sve.h>

svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
  return svld1sw_gather_s64offset_u64(
    pred, base, svextw_s64_x(pred, offsets)
  );
}

would previously lower to the following assembly:

sxtw	z0.d, p0/m, z0.d
ld1sw	{ z0.d }, p0/z, [x0, z0.d]

but now lowers to:

ld1sw   { z0.d }, p0/z, [x0, z0.d, sxtw]

Diff Detail

Event Timeline

joechrisellis created this revision. Mar 3 2021, 7:10 AM
joechrisellis requested review of this revision. Mar 3 2021, 7:10 AM
Herald added a project: Restricted Project. Mar 3 2021, 7:10 AM
david-arm added a subscriber: kmclaughlin.

Adding @kmclaughlin as reviewer since she did a lot of the gather/scatter lowering work.

david-arm added inline comments. Mar 4 2021, 12:40 AM

Do we not need to check the type we're extending from here? For example, what if we're extending from a vector of i8s, which isn't tested below?

peterwaller-arm added inline comments. Mar 4 2021, 1:58 AM

This comment (and below) is merely a re-expression of the code on the next line, where a message indicating the intent may be more useful -- something like "Fold sign/zero extensions of vector offsets into GLD1 nodes where possible".

joechrisellis marked 2 inline comments as done.

Address comments.

  • @david-arm: make sure that we only fold in {u,s}xtws, not just any sign-/zero-extension.
  • @peterwaller-arm: comment clarifications.
joechrisellis added inline comments. Mar 4 2021, 7:00 AM

Great spot -- thank you. Turns out we can only fold in {u,s}xtws (w being the key part). :)

kmclaughlin added inline comments. Mar 4 2021, 11:16 AM

nit: I think it might make the code below a bit clearer if these variables were named similarly to the comments accompanying them, e.g. Chain, Pred, Base, etc.


Can we remove some of the indentation below by returning SDValue() here if the offset isn't extended instead?


There were some helper functions added for LowerMGATHER which I think you might be able to use here to get the right gather opcode (getGatherVecOpcode & getSignExtendedGatherOpcode).
As an example:

  getGatherVecOpcode(false /*Scaled*/, OffsetIsSext, true /*NeedsExtend*/);
  getGatherVecOpcode(true /*Scaled*/, OffsetIsSext, true /*NeedsExtend*/);
  NewOpc = getSignExtendedGatherOpcode(NewOp);
joechrisellis marked 3 inline comments as done.

Address review comments.

  • @kmclaughlin:
    • better variable names.
    • use helper functions to determine new opcode for gather load.

I've also made the function a little more general, since we might want to extend this function to capture other optimisation patterns in the future.

joechrisellis added inline comments. Mar 5 2021, 6:14 AM

I would prefer to leave this as-is, because this was an intentional decision. It's possible that we might want to extend this function with a new optimisation pattern. It'd be easier to extend the function in its current state than it would be with the early return. 🙂


Thanks, that's cleaned things up a lot. 🙂

david-arm added inline comments. Mar 8 2021, 1:26 AM

Are these opcodes in a range, i.e. could you assert something like:

assert((Opc >= Op1 && Opc <= Op2) || (Opc >= Op3 && Opc <= Op4))

It might make the assert look a bit nicer perhaps?


In fact, you could move the above assert here instead and have something like:

assert(Scaled || Signed || SExt || ZExt)



I wonder if you maybe want to make the combine function simpler for now in terms of asserts and checking by only passing in the opcodes that you actually intend to combine? If later on you want to add more then it's easy enough to update. This is just a suggestion though.

joechrisellis marked an inline comment as done.

Address comments.

  • @david-arm: clean up assertion by using ranges instead of explicitly listing all allowed opcodes.

Much nicer -- thank you!

I've split it into two ranges -- one for unsigned gather loads, and one for signed gather loads.

assert(((Opc >= AArch64ISD::GLD1_MERGE_ZERO && // unsigned gather loads
         Opc <= AArch64ISD::GLD1_IMM_MERGE_ZERO) ||
        (Opc >= AArch64ISD::GLD1S_MERGE_ZERO && // signed gather loads
         Opc <= AArch64ISD::GLD1S_IMM_MERGE_ZERO)) &&
       "Invalid opcode.");

These ranges are actually adjacent in the enum at the moment but I guess that isn't always guaranteed to be the case.


I am not sure that this will work because it is possible to have an unscaled, unsigned, and unextended opcode -- e.g. AArch64ISD::GLD1_MERGE_ZERO. Although this point is definitely valid if we only consider the optimisation patterns that we're performing at the moment.


I am happy to leave it as-is for now, as long as this isn't a blocker? I think it's nice to be able to add new patterns in for the different ISD nodes without too much wrangling. Just my opinion though -- override if you want! 😄

Politely pinging this. 🙂

kmclaughlin accepted this revision. Mar 16 2021, 5:11 AM

LGTM! I think it's probably worth checking if we can do this for scatter stores at some point as well.


nit: if SExt & ZExt aren't used anywhere else, could you remove them and just set Extended based on the opcode? i.e.

const bool Extended = Opc == AArch64ISD::GLD1_SXTW_MERGE_ZERO ||
                      Opc == AArch64ISD::GLD1_SXTW_SCALED_MERGE_ZERO ||
                      Opc == AArch64ISD::GLD1_UXTW_MERGE_ZERO ||
                      Opc == AArch64ISD::GLD1_UXTW_SCALED_MERGE_ZERO;
This revision is now accepted and ready to land. Mar 16 2021, 5:11 AM

Address code review comments.

This revision was landed with ongoing or failed builds. Mar 16 2021, 8:10 AM
This revision was automatically updated to reflect the committed changes.