This is an archive of the discontinued LLVM Phabricator instance.

[LV] Enable vectorization of multiple exit loops w/computable exit counts
ClosedPublic

Authored by reames on Jul 12 2021, 7:56 AM.

Details

Summary

This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular,, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Diff Detail

Event Timeline

reames created this revision.Jul 12 2021, 7:56 AM
reames requested review of this revision.Jul 12 2021, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2021, 7:56 AM
anna added a comment.Jul 14 2021, 12:41 PM

LGTM. Thanks for getting this finally for LV :) Perhaps wait for other reviewers a day or so?

Didn't see any target specific LV tests failing in the list of "failed tests" above. so looks like we're fine.

fhahn accepted this revision.Jul 15 2021, 7:08 AM

LGTM, thanks Philip!

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1133

IIUC the update logic for phis has been generalised already.

This revision is now accepted and ready to land.Jul 15 2021, 7:08 AM
This revision was landed with ongoing or failed builds.Jul 15 2021, 8:53 AM
This revision was automatically updated to reflect the committed changes.

@reames This commit has broken our 2 stage A64FX SVE bot. (sorry for the late report, took me a while to bisect it)

https://lab.llvm.org/buildbot/#/builders/176/builds/88

This bot builds stage 1 with -mcpu=a64fx then uses that to build stage 2 with flags -mcpu=a64fx -mllvm -aarch64-sve-vector-bits-min=512 and doing so leads to this error from clang-tblgen:

[3689/7201] Building AbstractBasicReader.inc...
FAILED: tools/clang/include/clang/AST/AbstractBasicReader.inc 
cd /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2 && /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/bin/clang-tblgen -gen-clang-basic-reader -I /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/tools/clang/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/include /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST/PropertiesBase.td --write-if-changed -o tools/clang/include/clang/AST/AbstractBasicReader.inc -d tools/clang/include/clang/AST/AbstractBasicReader.inc.d
/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST/PropertiesBase.td:362:3: error: creation code for Array doesn't refer to property "totalLength"
  def : Creator<[{
  ^

I haven't seen this on any other bot, the code itself seems fine (though first time reading it myself) and if I remove -mllvm -aarch64-sve-vector-bits-min=512 the build works. So I assume what's happening is something is being vectorised incorrectly, or doing so is uncovering a different issue. It could be particular to SVE.

Obviously "build all of llvm and some of clang" isn't a great reproducer so I'm going to see if I can emit some vectorisation info with the names/locations of the functions that were previously not vectorised and now are.

Any other ideas welcome.

Any other ideas welcome.

@DavidSpickett - I took a skim through the SVE code to see if anything obvious fell out, but I didn't spot anything. Unfortunately, you've got a combination of an architecture I'm unfamiliar with, and an IR feature I'm unfamiliar with. I don't think I'll be able to help you much without a reproducer and some context.

A couple ideas for you:

  1. If you use smaller min bit widths, do you still see the problem?
  2. Does setting that flag change the number of loops which get vectorizer? If not, it must change *how* they get vectorized.
  3. Is there a known bug with scalable vectorization and epilogue loops? This change causes a lot more epilogue loops to be generated.
  4. If you change the control knob for preferring fixed vs scalable with SVE registers, does the symptom change?

Does setting that flag change the number of loops which get vectorizer? If not, it must change *how* they get vectorized.

I found a single loop in StringRef::find was vectorised with this patch applied. I've opened https://bugs.llvm.org/show_bug.cgi?id=51182 to track me investigating that.

That doesn't tell us if it's an SVE specific issue so I will look at the IR produced then at the SVE assembly itself, with some of the steps you suggested.

Oh and I'll update the bot to workaround this issue until we can find the cause.