This is an archive of the discontinued LLVM Phabricator instance.

[LV] Tail-folding with runtime memory checks
ClosedPublic

Authored by SjoerdMeijer on Aug 27 2019, 7:41 AM.

Details

Summary

The loop vectorizer is running in an assert when it tries to fold the tail and
has to emit runtime memory disambiguation checks. This happens as soon as
you try to do something like:

#pragma clang loop predicate(enable)
for (int i=0; i < N; i++)
  A[i] = B[i] + C[i];

where A, B, and C are integer pointers, not annotated with e.g. the restrict
type qualifier.

However, it looks like the logic to deal with this is all in place, so this
simply removes the assert.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Aug 27 2019, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2019, 7:41 AM

I wrote some manual tests, and looked at some IR, and the produced code looked okay to me.
I will continue testing a bit more until you point out to me that I missed something here and did something stupid :-)

Ayal added a comment.Aug 27 2019, 8:33 AM

The original intent was to make sure that under OptForSize only a single (vector) loop will be produced. I.e., w/o a scalar loop serving either as scalar leftover iterations or as runtime guard bailout. There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead perhaps?).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2699 ↗(On Diff #217401)

also here

Hi,

Thanks for taking a look at this, and for clarifying this:

The original intent was to make sure that under OptForSize only a single (vector) loop will be produced. I.e., w/o a scalar loop serving either as scalar leftover iterations or as runtime guard bailout.

and for pointing this out:

There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead perhaps?).

Shall we address this separately in a different patch? I am looking at this now, and feel that I have embarked on a little SCEV adventure, and that this is a separate issue.

Ayal added a comment.Aug 28 2019, 1:49 AM

Shall we address this separately in a different patch? I am looking at this now, and feel that I have embarked on a little SCEV adventure, and that this is a separate issue.

Sure. Can also add a TODO/FIXME for now.

SjoerdMeijer edited the summary of this revision. (Show Details)

Thanks, have added a FIXME, which I will address asap.

I have addressed the other assert in D66932.

Ayal added a comment.Aug 29 2019, 3:28 AM

I have addressed the other assert in D66932.

OK, great, then the FIXME is probably redundant.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2732 ↗(On Diff #217583)

Replace the assert so that it checks hasOptSize, instead of removing it?

llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll
13 ↗(On Diff #217583)

Better avoid hard coded values such as %12.
E.g., see what utils/update_test_checks.py makes of it.

Comments addressed.

SjoerdMeijer marked an inline comment as done.Aug 29 2019, 5:30 AM
SjoerdMeijer added inline comments.
llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll
13 ↗(On Diff #217583)

Thanks for spotting, copy-paste mistake.
I think checking the masked loads/stores and the labels is good enough (and conciser), but if you think it is better I can run the script.

Ayal accepted this revision.Sep 3 2019, 12:37 AM
This revision is now accepted and ready to land.Sep 3 2019, 12:37 AM

Many thanks for reviewing!

This revision was automatically updated to reflect the committed changes.

Hi,

I've hit the new assertion when compiling for my out of tree target when using -Osize in combination with

#pragma clang loop vectorize(enable)

on a loop.

What is supposed to prevent us from triggering the assertion in a case like that?

Hi,

I've hit the new assertion when compiling for my out of tree target when using -Osize in combination with

#pragma clang loop vectorize(enable)

on a loop.

What is supposed to prevent us from triggering the assertion in a case like that?

Can be reproduced on trunk with

clang -mllvm -disable-llvm-optzns -S -Xclang -emit-llvm-bc -Os lala.c -o ./lala.bc
opt -disable-basicaa -O1 -S -o - lala.bc

Hi, sorry about this.

Just looking at the .C file, I was surprised that it tries to emit runtime checks because restrict is. But running opt with disable-basicaa I can imagine it will try to do that yes; that doesn't really surprise me. And so when I ran clang -Os -fvectorize I am not running in to this assert.

Triggering this assert with opt shows its usefulness of this assert I think. But if we can trigger this from a user-facing tool clang then I think we do have a problem. Do you have a clang command to trigger this?

Why would it be ok to give the assert when we run

clang
opt

?
You mean that clang's output in that case is broken? In that case I'd expect a verifier in opt to catch the fault, not that opt would crash with an assertion.

Anyway, I get the crash also with

clang -mllvm -disable-basicaa -S -Os lala.c -o -
SjoerdMeijer added a comment.EditedSep 4 2019, 6:01 AM

I was just trying to say that triggering an assert by some option combination in a non-user facing tool is probably not that difficult. And I was also trying to say that the assert was kind of serving it purpose, because I was expecting basicaa to run. But probably that assumption is wrong?

But let's forget this, because I agree that asserting is not really a graceful way of dealing with this, and there's probably room for improvement. Just checking what you are expecting for this case: do you expect this loop to be vectorised? I would guess that because of disabling basicaa, runtime checks are required and there's code growth, which is undesired when optimising for size, so no vectorization.

Normally we do use basicaa, the crash was found in "fuzz" tests where we compile the code with random flags.

So I'm not really sure what the vectorizer should really do in that case, apart from not crashing :)

Okay, thanks for reporting, I will look into this.