This is for getPointerDependencyFrom in AMDGPUAnnotateUniformValues.
With the default scan limit of 100 instructions per block, the compile time
of a special kernel drops from more than 2 hours to 7 minutes.
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp | ||
---|---|---|
109 | I don't see how this changes anything. Isn't this what the default null case already does? `unsigned DefaultLimit = getDefaultBlockScanLimit(); if (!Limit) Limit = &DefaultLimit;` |
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp | ||
---|---|---|
109 | Oh, the change is not even correct, because of the odd way Limit is used: `--*Limit; if (!*Limit) return MemDepResult::getUnknown();` This usually causes trouble unless you rely on the default nullptr: the caller's counter is decremented in place, so you have to reset Limit before each call. By that reasoning, the use in DeadStoreElimination.cpp is not correct either. |
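
The pitfall described in this comment can be reproduced with a minimal, self-contained sketch. This is a simplified model and not the LLVM implementation: `scanBlock` and the three `scanTwice...` helpers are hypothetical names, the local `getDefaultBlockScanLimit` is a stand-in that returns the default of 100 mentioned in the summary, and only the defaulting and decrement logic quoted in the comments is modeled.

```cpp
#include <cassert>

// Stand-in for the real default; 100 is the value quoted in the summary.
static unsigned getDefaultBlockScanLimit() { return 100; }

// Hypothetical model of a block scan with an optional in/out budget.
// Returns true if all NumInsts instructions were scanned, false if the
// budget ran out (the real code returns MemDepResult::getUnknown()).
static bool scanBlock(unsigned NumInsts, unsigned *Limit = nullptr) {
  unsigned DefaultLimit = getDefaultBlockScanLimit();
  if (!Limit)
    Limit = &DefaultLimit; // nullptr: a fresh budget on every call
  for (unsigned I = 0; I < NumInsts; ++I) {
    --*Limit; // decrements the caller's variable when one is passed
    if (!*Limit)
      return false;
  }
  return true;
}

// Default argument: each call starts from the full budget.
static bool scanTwiceWithDefault() {
  return scanBlock(60) && scanBlock(60);
}

// Explicit pointer without a reset: the second call inherits what is
// left over from the first (40 here) and gives up early.
static bool scanTwiceSharingLimit() {
  unsigned Limit = getDefaultBlockScanLimit();
  bool First = scanBlock(60, &Limit);
  bool Second = scanBlock(60, &Limit);
  return First && Second;
}

// Explicit pointer, reset before each call: behaves like the default.
static bool scanTwiceResettingLimit() {
  unsigned Limit = getDefaultBlockScanLimit();
  bool First = scanBlock(60, &Limit);
  Limit = getDefaultBlockScanLimit();
  bool Second = scanBlock(60, &Limit);
  return First && Second;
}
```

In this model, `scanTwiceWithDefault()` and `scanTwiceResettingLimit()` both complete, while `scanTwiceSharingLimit()` fails on its second scan because the counter was never restored. That is the reason an explicit Limit has to be set again before every call.
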
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp | ||
---|---|---|
109 | Seems like you should fix that instead of working around it here, or make it a mandatory argument? |
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp | ||
---|---|---|
109 | We should fix the DeadStoreElimination case anyway, and DeadStoreElimination seems to be the only caller that passes an explicit argument. However I am using |
https://reviews.llvm.org/D84873 addresses the same issue. So this is no longer needed.