This patch provides an alternative to the DPP-based implementation of scan computations.
The alternative implementation iterates over all active lanes of the wavefront
using llvm.cttz and performs the following steps for each lane:
- Read the value that needs to be atomically incremented using the llvm.amdgcn.readlane intrinsic.
- Accumulate the result.
- Update the scan result using the llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel.
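
The steps above can be sketched as a host-side C++ simulation. This is not the code in the patch: it models a wavefront as an array of 64 lane values plus a 64-bit active-lane mask, and every name in it (kWaveSize, countTrailingZeros, iterativeScan, ScanResult) is hypothetical. It also assumes that the reduction is an integer add and that the per-lane result is an exclusive prefix; the patch may differ on both points.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 64;  // assumption: wave64

// Host-side stand-in for llvm.cttz on the active-lane mask.
// Precondition: mask != 0.
inline int countTrailingZeros(uint64_t mask) {
  int n = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

struct ScanResult {
  std::array<uint32_t, kWaveSize> exclusive;  // per-lane scan value
  uint32_t total;                             // reduction over active lanes
};

// Visits active lanes from lowest to highest index.
ScanResult iterativeScan(const std::array<uint32_t, kWaveSize>& laneValues,
                         uint64_t activeMask) {
  ScanResult result{};  // inactive lanes keep the value 0
  uint32_t accumulator = 0;
  while (activeMask != 0) {
    int lane = countTrailingZeros(activeMask);
    assert(lane < kWaveSize);

    uint32_t value = laneValues[lane];     // models llvm.amdgcn.readlane
    result.exclusive[lane] = accumulator;  // models llvm.amdgcn.writelane
    accumulator += value;                  // accumulate the result

    activeMask &= activeMask - 1;          // clear the lane just processed
  }
  result.total = accumulator;
  return result;
}
```

The loop runs once per active lane, so its cost grows with the number of active lanes rather than with a fixed number of DPP steps. The write to result.exclusive corresponds to the optional writelane step and can be dropped when only the total is needed.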
Just "UseDpp"?