Page MenuHomePhabricator

Remove the user-count threshold when analyzing read attributes
ClosedPublic

Authored by wengxt on Jul 17 2015, 1:37 PM.

Details

Summary

This threshold limited FunctionAttrs ability to prove arguments to be read-only.
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.

Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.

Patched by Xuetian Weng.

[1] https://github.com/vetter/shoc

Diff Detail

Event Timeline

wengxt updated this revision to Diff 30032.Jul 17 2015, 1:37 PM
wengxt retitled this revision from to Replace hardcoded threshold with an option when analyze read attributes.
wengxt updated this object.
wengxt added reviewers: nlewycky, jingyue.
wengxt added a subscriber: llvm-commits.
jingyue accepted this revision.Jul 17 2015, 2:09 PM
jingyue edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jul 17 2015, 2:09 PM
wengxt updated this revision to Diff 30039.Jul 17 2015, 2:32 PM
wengxt edited edge metadata.

update test plan info

Xuetian Weng wrote:

wengxt created this revision.
wengxt added reviewers: nlewycky, jingyue.
wengxt added a subscriber: llvm-commits.

In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.

This patch changes the hardcoded threshold into an option to enable us to
change the threshold at runtime.

Question one, what does this have to do with ld.global.nc? It looks like
you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you
have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

Question two, can you think of a way to design determinePointerReadAttrs
so that it doesn't need a use count but doesn't have a bad worst case
performance problem? For that matter, why isn't this code already linear
in the number of instructions?

Nick

[1] https://github.com/vetter/shoc

http://reviews.llvm.org/D11311

Files:

lib/Transforms/IPO/FunctionAttrs.cpp
test/Transforms/FunctionAttrs/readattrs-threshold.ll

Index: test/Transforms/FunctionAttrs/readattrs-threshold.ll
===================================================================

    • /dev/null +++ test/Transforms/FunctionAttrs/readattrs-threshold.ll @@ -0,0 +1,13 @@ +; RUN: opt< %s -functionattrs -S | FileCheck %s --check-prefix=CHECK +; RUN: opt< %s -functionattrs -readattrs-analysis-uses-threshold=1 -S | FileCheck %s --check-prefix=CHECK-NO-READATTR + +; %p has two uses +; CHECK: define i32 @test(i32* nocapture readonly %p) +; CHECK-NO-READATTR: define i32 @test(i32* nocapture %p) +define i32 @test(i32* %p) { + %1 = load i32, i32* %p + %2 = getelementptr i32, i32* %p, i32 1 + %3 = load i32, i32* %2 + %4 = add i32 %1, %3 + ret i32 %4 +} Index: lib/Transforms/IPO/FunctionAttrs.cpp ===================================================================
    • lib/Transforms/IPO/FunctionAttrs.cpp +++ lib/Transforms/IPO/FunctionAttrs.cpp @@ -44,6 +44,12 @@ STATISTIC(NumNoAlias, "Number of function returns marked noalias"); STATISTIC(NumAnnotated, "Number of attributes added to library functions");

      +static cl::opt<int> ReadAttrsAnalysisUsesThreshold( + "readattrs-analysis-uses-threshold", cl::Hidden, cl::init(20), + cl::ZeroOrMore, + cl::desc("Control the maximum number of uses to analyze for " + "readonly/readnone attribute (default = 20)")); + namespace { struct FunctionAttrs : public CallGraphSCCPass { static char ID; Pass identification, replacement for typeid @@ -425,7 +431,7 @@ We don't need to track IsWritten. If A is written to, return immediately.

      for (Use&U : A->uses()) {
  • if (Count++>= 20) + if (Count++>= ReadAttrsAnalysisUsesThreshold) return Attribute::None;

    Visited.insert(&U);

llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

Question one, what does this have to do with ld.global.nc? It looks like
you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you
have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

In our case, we intend to transform LoadInst into ld.global.nc during Instruction
selection, not the other direction. It's only safe to use ld.global.nc for LoadInst
if the pointer being accessed is readonly (and noalias, which is irreverent to this
patch).

Question two, can you think of a way to design determinePointerReadAttrs
so that it doesn't need a use count but doesn't have a bad worst case
performance problem? For that matter, why isn't this code already linear
in the number of instructions?

Yeah, I also think this is linear in the number of instructions, and I don't think
there can be multiple call on this function that make it quadratic or something.
The complexity should be O(num_of_arguments * num_of_instructions)
Maybe simply remove this threshold is good enough. Is there any other reason
to introduce this threshold?

Nick

[1] https://github.com/vetter/shoc

http://reviews.llvm.org/D11311

Files:

lib/Transforms/IPO/FunctionAttrs.cpp
test/Transforms/FunctionAttrs/readattrs-threshold.ll

Index: test/Transforms/FunctionAttrs/readattrs-threshold.ll
===================================================================

    • /dev/null +++ test/Transforms/FunctionAttrs/readattrs-threshold.ll @@ -0,0 +1,13 @@ +; RUN: opt< %s -functionattrs -S | FileCheck %s --check-prefix=CHECK +; RUN: opt< %s -functionattrs -readattrs-analysis-uses-threshold=1 -S | FileCheck %s --check-prefix=CHECK-NO-READATTR + +; %p has two uses +; CHECK: define i32 @test(i32* nocapture readonly %p) +; CHECK-NO-READATTR: define i32 @test(i32* nocapture %p) +define i32 @test(i32* %p) { + %1 = load i32, i32* %p + %2 = getelementptr i32, i32* %p, i32 1 + %3 = load i32, i32* %2 + %4 = add i32 %1, %3 + ret i32 %4 +} Index: lib/Transforms/IPO/FunctionAttrs.cpp ===================================================================
    • lib/Transforms/IPO/FunctionAttrs.cpp +++ lib/Transforms/IPO/FunctionAttrs.cpp @@ -44,6 +44,12 @@ STATISTIC(NumNoAlias, "Number of function returns marked noalias"); STATISTIC(NumAnnotated, "Number of attributes added to library functions");

      +static cl::opt<int> ReadAttrsAnalysisUsesThreshold( + "readattrs-analysis-uses-threshold", cl::Hidden, cl::init(20), + cl::ZeroOrMore, + cl::desc("Control the maximum number of uses to analyze for " + "readonly/readnone attribute (default = 20)")); + namespace { struct FunctionAttrs : public CallGraphSCCPass { static char ID; Pass identification, replacement for typeid @@ -425,7 +431,7 @@ We don't need to track IsWritten. If A is written to, return immediately.

      for (Use&U : A->uses()) {
  • if (Count++>= 20) + if (Count++>= ReadAttrsAnalysisUsesThreshold) return Attribute::None;

    Visited.insert(&U);

llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

Xuetian Weng wrote:

wengxt added a comment.

Question one, what does this have to do with ld.global.nc? It looks like

you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you

have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

In our case, we intend to transform LoadInst into ld.global.nc during Instruction
selection, not the other direction. It's only safe to use ld.global.nc for LoadInst
if the pointer being accessed is readonly (and noalias, which is irreverent to this
patch).

I realize that, but my question stands. Is there an optimization that
should be happening to these ld.global.nc before we get here?

Question two, can you think of a way to design determinePointerReadAttrs

so that it doesn't need a use count but doesn't have a bad worst case
performance problem? For that matter, why isn't this code already linear
in the number of instructions?

Yeah, I also think this is linear in the number of instructions, and I don't think
there can be multiple call on this function that make it quadratic or something.
The complexity should be O(num_of_arguments * num_of_instructions)
Maybe simply remove this threshold is good enough. Is there any other reason
to introduce this threshold?

Nope, I can't think of any reason. I think this threshold is safe to remove!

Nick

[1] https://github.com/vetter/shoc

http://reviews.llvm.org/D11311

Files:

lib/Transforms/IPO/FunctionAttrs.cpp
test/Transforms/FunctionAttrs/readattrs-threshold.ll

Index: test/Transforms/FunctionAttrs/readattrs-threshold.ll

===================================================================
  • /dev/null +++ test/Transforms/FunctionAttrs/readattrs-threshold.ll @@ -0,0 +1,13 @@ +; RUN: opt< %s -functionattrs -S | FileCheck %s --check-prefix=CHECK +; RUN: opt< %s -functionattrs -readattrs-analysis-uses-threshold=1 -S | FileCheck %s --check-prefix=CHECK-NO-READATTR + +; %p has two uses +; CHECK: define i32 @test(i32* nocapture readonly %p) +; CHECK-NO-READATTR: define i32 @test(i32* nocapture %p) +define i32 @test(i32* %p) { + %1 = load i32, i32* %p + %2 = getelementptr i32, i32* %p, i32 1 + %3 = load i32, i32* %2 + %4 = add i32 %1, %3 + ret i32 %4 +} Index: lib/Transforms/IPO/FunctionAttrs.cpp ===================================================================
  • lib/Transforms/IPO/FunctionAttrs.cpp +++ lib/Transforms/IPO/FunctionAttrs.cpp @@ -44,6 +44,12 @@ STATISTIC(NumNoAlias, "Number of function returns marked noalias"); STATISTIC(NumAnnotated, "Number of attributes added to library functions");
+static cl::opt<int>   ReadAttrsAnalysisUsesThreshold( +    "readattrs-analysis-uses-threshold", cl::Hidden, cl::init(20), +    cl::ZeroOrMore, +    cl::desc("Control the maximum number of uses to analyze for " +             "readonly/readnone attribute (default = 20)")); + namespace { struct FunctionAttrs : public CallGraphSCCPass { static char ID; // Pass identification, replacement for typeid @@ -425,7 +431,7 @@ // We don't need to track IsWritten. If A is written to, return immediately.
for (Use&U : A->uses()) {
  • if (Count++>= 20) + if (Count++>= ReadAttrsAnalysisUsesThreshold) return Attribute::None;
Visited.insert(&U);

llvm-commits mailing list

llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

http://reviews.llvm.org/D11311

Xuetian Weng wrote:

wengxt added a comment.

Question one, what does this have to do with ld.global.nc? It looks like
you've got a load instruction (wrapped in an intrinsic) and it in turn
has many uses. The fact it's ld.global.nc doesn't seem to be relevant?

In theory, if you replaced the ld.global.nc with a LoadInst, would you
have the same problem? Or is there some other optimization that occurs
on those uses of a LoadInst that doesn't occur on the ld.global.nc?

In our case, we intend to transform LoadInst into ld.global.nc during Instruction
selection, not the other direction. It's only safe to use ld.global.nc for LoadInst
if the pointer being accessed is readonly (and noalias, which is irreverent to this
patch).

I realize that, but my question stands. Is there an optimization that
should be happening to these ld.global.nc before we get here?

My commit message might be confusing. I don't think this change has anything to do with ld.global.nc. The LoadInst to ld.global.nc optimization is just one of the examples which can benefit from this change. And we happened to notice that when we are working on our cuda compiler.

wengxt updated this revision to Diff 30532.Jul 23 2015, 3:46 PM

remove the threshold

wengxt retitled this revision from Replace hardcoded threshold with an option when analyze read attributes to Replace hardcoded threshold when analyze read attributes.Jul 23 2015, 3:47 PM
wengxt retitled this revision from Replace hardcoded threshold when analyze read attributes to Replace the hardcoded threshold when analyze read attributes.
nicholas accepted this revision.Jul 24 2015, 11:51 AM
nicholas added a reviewer: nicholas.

LGTM

jingyue retitled this revision from Replace the hardcoded threshold when analyze read attributes to Remove the user-count threshold when analyzing read attributes.Jul 24 2015, 12:05 PM
jingyue updated this object.
jingyue edited edge metadata.
jingyue closed this revision.Jul 24 2015, 12:06 PM

This patch caused an unused warning/error, which was fixed in r243145. Thanks dnovillo@!