Some PTX special registers are bounded per the CUDA programming guide.
Leveraging the bounds of these special registers can lead to more precise
Add two new tests in test/Transforms/InstCombine/intrinsics.ll
Depends on D4144
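To illustrate why a known bound on a special register is useful, here is a minimal sketch of range-based comparison folding. It is a toy model, not the code from this patch. The names `Range`, `Fold`, `foldULT`, and `kTidX` are hypothetical, and the bound `[0, 1024)` for `threadIdx.x` is an assumption based on the maximum x-dimension of a thread block listed in the CUDA programming guide for compute capability 2.0 and later.

```cpp
#include <cassert>
#include <cstdint>

// Half-open unsigned range [lo, hi): a toy stand-in for range information
// attached to a value. Hypothetical type, not an LLVM class.
struct Range {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Assumption: threadIdx.x (PTX %tid.x) always lies in [0, 1024).
constexpr Range kTidX{0, 1024};

enum class Fold { True, False, Unknown };

// Tries to decide the unsigned comparison `value < bound` for every value
// in r. Returns Fold::Unknown when the range straddles the bound.
Fold foldULT(Range r, std::uint32_t bound) {
  assert(r.lo < r.hi && "range must be non-empty");
  if (r.hi <= bound) return Fold::True;   // every value is below bound
  if (r.lo >= bound) return Fold::False;  // no value is below bound
  return Fold::Unknown;
}
```

Under these assumptions, `foldULT(kTidX, 1024)` yields `Fold::True`, so a guard such as `tid < 1024` could be removed, while `foldULT(kTidX, 512)` stays `Fold::Unknown` and the comparison must be kept.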
Authored by jingyue on Jun 15 2014, 5:17 PM.
I should also mention that, while experimenting with the pragma loop limit, I encountered long compilation times that grow superlinearly with the unroll count. With the current limit (32K), compiling a simple loop takes ~7s; doubling the limit raises that to ~50s. The time appears to be spent under llvm::UnrollLoop -> FoldBlockIntoPredecessor -> llvm::ScalarEvolution::forgetLoop.
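As a rough way to quantify "superlinear", the two timings above can be turned into an estimated growth exponent, assuming compile time scales as count^k. This is only a sketch from two approximate data points. `growthExponent` is a hypothetical helper, and 32K is taken to mean 32768.

```cpp
#include <cmath>

// Estimates k in time ~ count^k from two (count, seconds) measurements.
// Hypothetical helper for illustration; not part of LLVM.
double growthExponent(double count1, double secs1,
                      double count2, double secs2) {
  return std::log(secs2 / secs1) / std::log(count2 / count1);
}
```

With the figures quoted above, `growthExponent(32768, 7, 65536, 50)` comes out at roughly 2.8. That is closer to cubic than quadratic growth, though two measurements are too few to pin down the exact complexity.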
Can you please file a PR to track this issue?