This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced
ClosedPublic

Authored by hfinkel on Apr 25 2016, 4:28 PM.

Details

Summary

I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
#pragma clang loop vectorize(assume_safety)
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we still don't vectorize this loop. Even with "assume_safety", the check that we don't if-convert conditionally-executed loads (to protect against data-dependent deferenceability) was not elided. We should vectorize this.

The change here seems straightforward. One subtlety: As implemented, it will still prefer to use a masked-load instrinsic (given target support) over the speculated load. The choice here seems architecture specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy.

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 54939.Apr 25 2016, 4:28 PM
hfinkel retitled this revision from to [LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced.
hfinkel updated this object.
hfinkel added a subscriber: llvm-commits.

Actually, checking:

Hints->getForce() == LoopVectorizeHints::FK_Enabled

will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

TheLoop->isAnnotatedParallel()

to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

On this, the loop is not fully annotated by the time it hits the vectorizer. SimplyCFG is dropping the metadata on one of the loads. I'll need to fixup the test case to be fully annotated if/when a check for TheLoop->isAnnotatedParallel() is added.

anemet edited edge metadata.Apr 25 2016, 5:18 PM

Actually, checking:

Hints->getForce() == LoopVectorizeHints::FK_Enabled

will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

I was about point this out and then ask you this :-) :

TheLoop->isAnnotatedParallel()

to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety so I guess we can just extend it to imply if-convertable loads as well, no?

Actually, checking:

Hints->getForce() == LoopVectorizeHints::FK_Enabled

will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

I was about point this out and then ask you this :-) :

TheLoop->isAnnotatedParallel()

to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety so I guess we can just extend it to imply if-convertable loads as well, no?

I'm happy to do this (we can always differentiate late if there's a use case for that). I'll update the patch.

Would be good to also document this in LangRef.rst.

hfinkel updated this revision to Diff 54958.Apr 25 2016, 6:12 PM
hfinkel edited edge metadata.

It is the llvm.mem.parallel_loop_access that really implies the if-conversion safety. Check that. Note the semantic addition in the LangRef.

anemet accepted this revision.Apr 25 2016, 6:15 PM
anemet edited edge metadata.

LGTM. You may be able to drop the llvm.loop.vectorize.enable MD now.

This revision is now accepted and ready to land.Apr 25 2016, 6:15 PM
This revision was automatically updated to reflect the committed changes.