Page MenuHomePhabricator

[LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.

Authored by sdesmalen on Nov 9 2020, 6:53 AM.


  • Steps are scaled by vscale, a runtime value.
  • Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately.

This vectorizes:

void loop(int N, double *a, double *b) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;

Diff Detail

Event Timeline

sdesmalen created this revision.Nov 9 2020, 6:53 AM
Herald added a project: Restricted Project. · View Herald Transcript
sdesmalen requested review of this revision.Nov 9 2020, 6:53 AM
sdesmalen updated this revision to Diff 304167.Nov 10 2020, 6:29 AM

Fixed broken assert and made buildScalarSteps generate the right step-by-vscale code if the type is floating point.

cameron.mcinally added inline comments.

Nit: Yank spelling of scalarize?


Same here.

sdesmalen updated this revision to Diff 304814.Nov 12 2020, 6:09 AM
sdesmalen marked an inline comment as done.



You're right, I'm mixing up American and British spelling in this patch, thanks for spotting!

dmgreen added inline comments.Nov 16 2020, 12:00 AM

This could do with a quick clang-format.


Can you add a comment saying this has to potentially construct a scalable step. And that otherwise it should be folded to a constant?


Perhaps add FIXME:


Is it worth just not automatically interleaving for the moment? It might simplify things until the cost model is doing better, at least.


Is there a better way to print this?

sdesmalen updated this revision to Diff 305509.Nov 16 2020, 7:38 AM
sdesmalen marked 5 inline comments as done.
  • Addressed comments.
  • Rebased after parent revision D88962 was updated.

From a functional perspective, interleaving should be no different for scalable VFs as they are for fixed-width VFs. That means that if we keep interleaving enabled for scalable vectors, we'll be testing more functionality and code-paths in the vectorizer. We're still a way off from cost-modelling this properly, so until then I'd like to suggest we keep this enabled by default mostly for testing purposes. Does that make sense?


Good point. I believe this can just print VF since ElementCount defines operator<< that adds "vscale x ".

dmgreen added inline comments.Nov 23 2020, 2:51 AM

I was hoping that if we remove the need for this, we could remove the need for the "if scalable return 1" in the cost model. That way we get the testing of the cost model, which seems like a larger issue to test to me.

Can we remove that cost model change already? Or are there too many paths that do not work yet? The simple loop you have in the commit message might work already.

sdesmalen updated this revision to Diff 307097.Nov 23 2020, 9:11 AM

Removed early return from getInstructionCost, as this shortcut wasn't actually necessary for the test.

sdesmalen marked an inline comment as done.Nov 23 2020, 9:11 AM
sdesmalen added inline comments.

It seems I can actually leave this in and still remove the "if scalable return 1" (at least for the unit test in this patch).

dmgreen accepted this revision.Nov 24 2020, 4:52 AM

Nice. Thanks. That alleviates my objections to the interleaving count.

This LGTM. I think it may depend on some other patches? If not perhaps leave this open for a day or two in case other have comments.

This revision is now accepted and ready to land.Nov 24 2020, 4:52 AM
sdesmalen updated this revision to Diff 310169.Dec 8 2020, 6:17 AM
sdesmalen marked an inline comment as done.

Rebased patch after D90687 and D88962 landed. This required a small change to metadata-width.ll, because the original test case tries to vectorize an induction variable, which this prototype doesn't yet support.