
[ValueTracking] Consider the bounds of PTX special registers
Abandoned (Public)

Authored by jingyue on Jun 15 2014, 5:17 PM.

Details

Summary

Some PTX special registers are bounded per the CUDA programming guide.
Leveraging the bounds of these special registers can lead to more precise
value analysis.
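
As a rough illustration of why a bound helps value analysis, here is a minimal standalone sketch that is not taken from the patch. If a 32-bit register is known to lie in [0, upperBound), every bit above the highest bit of upperBound - 1 is known to be zero. The helper name knownZeroMask is hypothetical. The bound of 1024 for the x thread index is an assumption based on the limits the CUDA programming guide lists for compute capability 2.x and later.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not part of the patch).
// Returns a mask of the bits that are guaranteed to be zero for any value v
// with 0 <= v < upperBound. An upperBound of 0 is treated as "no information"
// and yields an empty mask.
inline uint32_t knownZeroMask(uint32_t upperBound) {
  uint32_t maxValue = upperBound - 1; // wraps to 0xFFFFFFFF when upperBound == 0
  unsigned bits = 0;                  // number of significant bits in maxValue
  while (maxValue != 0) {
    ++bits;
    maxValue >>= 1;
  }
  if (bits >= 32)
    return 0u; // every bit may be set, so nothing is known
  return ~((uint32_t{1} << bits) - 1);
}
```

With an assumed bound of 1024, the mask covers bits 10 through 31, so an expression such as `tid & 0xFFFFFC00` could in principle be folded to zero.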

Add two new tests in test/Transforms/InstCombine/intrinsics.ll

Depends on D4144

Event Timeline

jingyue updated this revision to Diff 10434. Jun 15 2014, 5:17 PM
jingyue retitled this revision to [ValueTracking] Consider the bounds of PTX special registers.
jingyue updated this object.
jingyue edited the test plan for this revision.
jingyue added reviewers: eliben, jholewinski, meheff.
jingyue added a subscriber: Unknown Object (MLST).
eliben accepted this revision. Jun 16 2014, 8:36 AM
eliben edited edge metadata.

LGTM

This revision is now accepted and ready to land. Jun 16 2014, 8:36 AM
meheff added inline comments. Jun 16 2014, 9:51 AM
lib/Analysis/ValueTracking.cpp
759

spelling nit: levaraging

778

These values seem to be CUDA version specific. Is there any way of guarding these? And how will we know to update these when CUDA N+1 is supported with different values?

Mark, I agree with your concern. I just found out we can use -target-cpu to pass the compute capability (e.g., sm_35) to the clang frontend. I'll send out another diff. Thanks!

meheff edited edge metadata. Jun 16 2014, 12:08 PM

I should also mention that I encountered some long compilation times which are superlinear with the unroll count when experimenting with the pragma loop limit. With the current limit (32K) on a simple loop the compilation time is ~7s. Doubling the limit results in a compilation time of ~50s. It seems to be beneath llvm::UnrollLoop -> FoldBlockIntoPredecessor -> llvm::ScalarEvolution::forgetLoop.

----- Original Message -----

From: "Mark Heffernan" <meheff@google.com>
To: jingyue@google.com, "justin holewinski" <justin.holewinski@gmail.com>, eliben@google.com, meheff@google.com
Cc: llvm-commits@cs.uiuc.edu
Sent: Monday, June 16, 2014 2:08:29 PM
Subject: Re: [PATCH] [ValueTracking] Consider the bounds of PTX special registers

I should also mention that I encountered some long compilation times
which are superlinear with the unroll count when experimenting with
the pragma loop limit. With the current limit (32K) on a simple
loop the compilation time is ~7s. Doubling the limit results in a
compilation time of ~50s. It seems to be beneath llvm::UnrollLoop
-> FoldBlockIntoPredecessor -> llvm::ScalarEvolution::forgetLoop.

Can you please file a PR to track this issue?

Thanks,
Hal

http://reviews.llvm.org/D4150



jingyue abandoned this revision. Jun 19 2014, 10:03 AM

Because the ranges of PTX special registers depend on the subtarget (-target-cpu), we will have clang attach range metadata to these intrinsics and have the optimizer pick up this metadata. The second part is committed in r211281 (D4187). Will work on the first part.
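
To make the subtarget dependence concrete, here is a hedged standalone sketch of how a frontend might choose the half-open range [lo, hi) for the x thread index from a -target-cpu string. It does not reflect the actual clang change. The names Range and tidXRange are hypothetical. The limits of 512 for sm_1x and 1024 for sm_20 and later are assumptions based on the CUDA programming guide's compute-capability tables, and should be checked against the guide for any newer target.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: a half-open range [lo, hi), mirroring how LLVM
// range metadata is interpreted. `known` is false when no bound applies.
struct Range {
  bool known;
  uint32_t lo;
  uint32_t hi;
};

// Hypothetical helper, for illustration only. Maps a -target-cpu string such
// as "sm_35" to the assumed range of the x thread index.
inline Range tidXRange(const std::string &targetCpu) {
  const Range unknown = {false, 0u, 0u};
  if (targetCpu.size() <= 3 || targetCpu.compare(0, 3, "sm_") != 0)
    return unknown;
  unsigned version = 0;
  for (std::size_t i = 3; i < targetCpu.size(); ++i) {
    char c = targetCpu[i];
    if (c < '0' || c > '9')
      return unknown; // not a plain numeric suffix, so stay conservative
    version = version * 10 + static_cast<unsigned>(c - '0');
  }
  // Assumed limits: 512 threads in x for compute capability 1.x, 1024 after.
  return {true, 0u, version < 20 ? 512u : 1024u};
}
```

Returning "unknown" for unrecognized targets keeps the sketch conservative: with no metadata attached, the optimizer simply assumes the full 32-bit range.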