
[POC][LoopVectorizer] Vectorize a simple loop with a scalable VF.
Abandoned · Public

Authored by sdesmalen on Oct 28 2020, 2:09 PM.

Details

Reviewers
None
Summary

This patch is part of a proof of concept for vectorising a loop using
scalable vectors. It is shared for reference only; there is no
expectation that it will land in its current form.

  • Removed a bunch of asserts that were previously added to prevent vectorization for scalable VFs.
  • Steps are scaled by vscale, a runtime value.
  • Temporarily bypassed the cost-model, so that cost-model support for scalable VFs can be implemented separately.

This vectorizes:

void loop(int N, double *a, double *b) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;
  }
}
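
As a rough illustration of the "steps are scaled by vscale" point, the vectorized form of the loop above can be emulated in plain C. This is only a sketch under stated assumptions: `loop_scalable_sketch` and its explicit `vscale` parameter are hypothetical names introduced here, the inner lane loop stands in for a single scalable vector operation, and the scalar remainder loop is an assumption about how leftover iterations are handled rather than something taken from the patch.

```c
#include <assert.h>

/* Hypothetical emulation of the loop vectorized with VF = <vscale x 4>.
 * On real hardware vscale is a runtime constant; here it is a parameter
 * so that the effect of the scaled step is visible. */
void loop_scalable_sketch(int n, double *a, const double *b, int vscale) {
    assert(vscale >= 1);
    int vf = 4 * vscale;                 /* elements per vector iteration */
    int i = 0;

    /* Vector body: the induction step is vf, i.e. scaled by vscale. */
    for (; i + vf <= n; i += vf) {
        /* Stands in for one scalable vector load, add and store. */
        for (int lane = 0; lane < vf; lane++)
            a[i + lane] = b[i + lane] + 1.0;
    }

    /* Assumed scalar remainder for the last n % vf elements. */
    for (; i < n; i++)
        a[i] = b[i] + 1.0;
}
```

The result does not depend on `vscale`; only the number of elements processed per iteration of the vector body changes.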

Event Timeline

sdesmalen created this revision. Oct 28 2020, 2:09 PM
Herald added a project: Restricted Project. Oct 28 2020, 2:09 PM
sdesmalen requested review of this revision. Oct 28 2020, 2:09 PM
khchen added a subscriber: khchen. Oct 28 2020, 5:43 PM

This isn't as many changes as I was expecting. I had expected to need a lot of legality changes to make sure scalable vectorization was going to be correct too.

dancgr added a subscriber: dancgr. Nov 3 2020, 9:56 AM

This POC has been split up into separate patches: D91059, D91060 and D91077.

This isn't as many changes as I was expecting. I had expected to need a lot of legality changes to make sure scalable vectorization was going to be correct too.

Sorry, I missed this comment earlier.

Indeed, fewer changes are required than you might expect to get some initial auto-vectorization working for scalable vectors. Some of the mechanisms can already cope with scalable vectors, such as reductions, for which we introduced the intrinsics a couple of years ago because they can't be handled in a scalar reduction loop.

For legality, the most critical part that needs changing is the selection of the VF when there is a data dependence. For scalable vectors, the maximum vector width must take vscale into account, using a bound on vscale that is conservative enough for the vectorizer to guarantee that a loop with a dependence distance of N bytes can be vectorized safely. In the absence of extra user-provided information about the min/max vector width of a scalable vector, we can rely on the architectural maximum vector length for AArch64 SVE/SVE2.
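
To make the VF selection concrete, here is a minimal sketch of the arithmetic, not the code from the patch. `max_safe_scalable_vf` is a hypothetical helper, not an LLVM API. It assumes the dependence distance is known in bytes, that VFs are powers of two, and that the only bound on vscale is the SVE architectural maximum of 2048 bits (vectors are multiples of 128 bits, so vscale is at most 16).

```c
#include <assert.h>

/* SVE vector lengths are multiples of 128 bits, up to 2048 bits. */
enum { SVE_MAX_VSCALE = 2048 / 128 };

/* Hypothetical helper: largest power-of-two K such that <vscale x K>
 * never spans more than dist_bytes, for any vscale <= max_vscale.
 * Returns 0 when no scalable VF is safe. */
unsigned max_safe_scalable_vf(unsigned dist_bytes, unsigned elem_bytes,
                              unsigned max_vscale) {
    assert(elem_bytes > 0 && max_vscale > 0);

    /* Largest safe fixed-width VF, in elements. */
    unsigned max_elems = dist_bytes / elem_bytes;

    /* Divide by the worst-case vscale to stay safe on every implementation. */
    unsigned k = max_elems / max_vscale;

    /* Round down to a power of two by clearing the low set bits. */
    while (k & (k - 1))
        k &= k - 1;
    return k;
}
```

For example, with a dependence distance of 512 bytes on `double` elements, `max_safe_scalable_vf(512, 8, SVE_MAX_VSCALE)` returns 4, so `<vscale x 4>` would be the widest safe choice under these assumptions. A tighter user-provided maximum for vscale would allow a larger K.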

sdesmalen abandoned this revision. Dec 1 2020, 4:36 AM

This patch has been superseded by D91077.