This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Respect pragma unroll when loop contains convergent instructions
Needs RevisionPublic

Authored by yaxunl on Feb 21 2018, 2:33 PM.

Details

Summary

Currently loop unroll is conservative about loops containing convergent instructions.
It does not allow remainder for such loops, which essentially disables unroll count
requested by pragma and results in fully unrolled loop in many cases.

As such a user may specify pragma unroll 32 but instead gets the loop unrolled 512
and results in extremely long compilation time.

For some target, e.g. AMDGPU, the remainder does not cause extra divergence and
should be allowed.

This patch introduces AllowRemainderForConvergentLoop in
TargetTransformInfo::UnrollingPreferences and allows each target to specify
whether unrolling convergent loop with remainder is allowed. By default it is
false therefore no functional change for other targets.

This patch fixes shmembench-ocl compilation time issue on amdpu.

Diff Detail

Event Timeline

yaxunl created this revision.Feb 21 2018, 2:33 PM
This revision is now accepted and ready to land.Feb 21 2018, 4:06 PM
efriedma added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
426

I don't like sticking this here.

From your description, it sounds like it's a *correctness* property of the target, whether or not certain transforms which duplicate convergent operations are allowed. In that case, it's not really about unrolling at all; it could apply to other transforms which clone code. So at the very least, this should be a separate hook, with a clear explanation of exactly which transforms this allows.

yaxunl added inline comments.Feb 22 2018, 7:33 AM
include/llvm/Analysis/TargetTransformInfo.h
426

For this specific transform (adding remainder for unrolling loop) we know that it will not cause extra divergence on amdgcn target. However this is something not easily applied to other cases. So far I do not see how it can be applied to other transformations.

efriedma added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
426

The allowed transform can't be specifically "remainder loops produced by lib/Transforms/Utils/LoopUnrollRuntime.cpp in LLVM r324285"; there must be some set of similar transforms which are allowed (whether or not they're currently implemented in the current LLVM codebase).

test/Transforms/LoopUnroll/convergent.ll
100

I'm not sure this testcase really demonstrates what you want it to demonstrate... a trip count of 4 is divisible by an unroll count of 2, so you don't need a remainder loop anyway.

108

Are you sure this metadata is correct? It would be nice to have a separate test without the "convergent" marking to show it's making a difference.

The allowed transform can't be specifically "remainder loops produced by lib/Transforms/Utils/LoopUnrollRuntime.cpp in LLVM r324285"; there must be some set of similar transforms which are allowed (whether or not they're currently implemented in the current LLVM codebase).

Agree.

If we phrase this in terms of a specific set of transformations that are/aren't allowed, we may even be able to say that a remainder loop containing convergent functions is in fact safe on all platforms. I'm not sure, need to think about it...

nhaehnle requested changes to this revision.Feb 22 2018, 3:56 PM
nhaehnle added inline comments.
test/Transforms/LoopUnroll/convergent.ll
100

Agreed. I have to say, it looks to me like the loop unroll is simply overly conservative here, and should stick with the requested 2x unroll on all targets, despite the convergent function call.

There really shouldn't be a difference between AMDGPU and NVPTX at this point.

This revision now requires changes to proceed.Feb 22 2018, 3:56 PM
yaxunl added inline comments.Feb 27 2018, 9:39 AM
test/Transforms/LoopUnroll/convergent.ll
100

It seems there is a separate but related bug in loop unroll: basically if AllowRemainder is disabled, pragma unroll count is not respected even though there is no remainder. I created another patch for that issue https://reviews.llvm.org/D43826