Some PTX special registers are bounded per the CUDA programming guide.
Leveraging the bounds of these special registers can lead to more precise
value analysis.
Add two new tests in test/Transforms/InstCombine/intrinsics.ll.
Depends on D4144
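
To illustrate why a bound helps value analysis, here is a minimal sketch, not the code in this patch. It assumes a 32-bit register whose value lies in [0, upperBound), and the helper names knownZeroMaskFromUpperBound and andIsRedundant are hypothetical. The bound of 1024 used in the usage note is the block x-dimension limit for compute capability 2.x and later, taken as an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): given 0 <= v < upperBound for a
// 32-bit value v, return a mask of the bits of v that must be zero.
uint32_t knownZeroMaskFromUpperBound(uint32_t upperBound) {
  assert(upperBound != 0 && "empty range");
  uint32_t possiblyOne = upperBound - 1;  // largest value v can take
  // Smear the highest set bit downward so every lower bit is also marked
  // as possibly one.
  possiblyOne |= possiblyOne >> 1;
  possiblyOne |= possiblyOne >> 2;
  possiblyOne |= possiblyOne >> 4;
  possiblyOne |= possiblyOne >> 8;
  possiblyOne |= possiblyOne >> 16;
  return ~possiblyOne;
}

// `v & mask` equals `v` when every bit cleared by `mask` is already known
// to be zero in v.
bool andIsRedundant(uint32_t knownZero, uint32_t mask) {
  return (~mask & ~knownZero) == 0;
}
```

With an upper bound of 1024, the top 22 bits are known zero, so an `and` of the register with 1023 could be folded away, while an `and` with 511 could not.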
Differential D4150
[ValueTracking] Consider the bounds of PTX special registers
Authored by jingyue on Jun 15 2014, 5:17 PM.
Mark, I agree with your concern. I just found out we can use -target-cpu to pass the compute capability (e.g., sm_35) to the clang frontend. I'll send out another diff. Thanks!

I should also mention that I encountered some long compilation times, superlinear in the unroll count, when experimenting with the pragma loop limit. With the current limit (32K) on a simple loop, the compilation time is ~7s. Doubling the limit results in a compilation time of ~50s. The time seems to be spent beneath llvm::UnrollLoop -> FoldBlockIntoPredecessor -> llvm::ScalarEvolution::forgetLoop.

Can you please file a PR to track this issue? Thanks.

Because the ranges of PTX special registers depend on the subtarget (-target-cpu), we will have clang attach range metadata to these intrinsics and have the optimizer pick up this metadata. The second part is committed in r211281 (D4187). Will work on the first part.
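
The subtarget dependence can be sketched as follows. This is an illustration only, not clang's implementation: the struct HalfOpenRange and the function tidXRange are hypothetical names, and the limits (512 threads in the x dimension before sm_20, 1024 from sm_20 on) are assumptions based on the CUDA programming guide's compute capability tables.

```cpp
#include <cassert>
#include <cstdint>

// Half-open interval [lo, hi), the same shape as LLVM's !range metadata.
struct HalfOpenRange {
  uint32_t lo;
  uint32_t hi;
};

// Hypothetical helper: choose the range of threadIdx.x (PTX %tid.x) from
// the number in an sm_XY target name, e.g. 35 for sm_35.
HalfOpenRange tidXRange(unsigned smVersion) {
  assert(smVersion >= 10 && "expected an sm_XY version number");
  // Assumed limits: 512 before sm_20, 1024 from sm_20 on.
  uint32_t maxBlockDimX = (smVersion < 20) ? 512u : 1024u;
  return HalfOpenRange{0u, maxBlockDimX};
}
```

A frontend that knows the target could emit such a pair as range metadata on the intrinsic call, which keeps the optimizer itself independent of the subtarget.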
spelling nit: levaraging