This patch implements CFG structurization based on the paper "Taming Control Divergence in GPUs through Control Flow Linearization" by Jayvant Anantpur and Govindarajan R.
It works somewhat differently from the algorithm in the paper. The main difference is that it is region based and performs the transformation backward instead of forward within a region.
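For readers unfamiliar with the term, the following is a minimal, self-contained sketch of what control flow linearization means in general. It is not the algorithm in this patch and not the paper's algorithm; the names `Lanes`, `runBranching`, and `runLinearized`, the four-lane width, and the even/odd condition are all hypothetical and chosen only for illustration. The divergent if/else is replaced by both blocks laid out in sequence, each guarded by a per-lane mask.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: four "lanes" executing the same code in lockstep.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<int, kLanes>;

// Divergent form: each lane takes its own side of the branch.
Lanes runBranching(const Lanes &in) {
  Lanes out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (in[i] % 2 == 0)
      out[i] = in[i] * 2; // "then" block
    else
      out[i] = in[i] + 1; // "else" block
  }
  return out;
}

// Linearized form: the "then" and "else" blocks run one after the other
// for all lanes, and a per-lane mask decides which result is committed.
Lanes runLinearized(const Lanes &in) {
  Lanes out{};
  std::array<bool, kLanes> mask{};

  // Evaluate the branch condition once per lane.
  for (std::size_t i = 0; i < kLanes; ++i)
    mask[i] = (in[i] % 2 == 0);

  // "then" block, active only where the mask is set.
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      out[i] = in[i] * 2;

  // "else" block, active only where the mask is clear.
  for (std::size_t i = 0; i < kLanes; ++i)
    if (!mask[i])
      out[i] = in[i] + 1;

  return out;
}
```

Both functions compute the same result for any input. The linearized form simply has no divergent branch between the two blocks, which is the property a structurizer is after.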
The purpose of the new pass is to perform structurization much later in the compiler so that it does not inhibit optimization. In the AMDGPU backend it is added right before register allocation. The transform is implemented as a target-independent pass with a few new callbacks to the target. Hopefully it will be useful to others at some point.
A hidden flag is added to select between the early (old) and the late (new) CFG structurizer. Some AMDGPU lit tests currently fail when run with the new structurizer; we are working on fixing them.
Adding commented-out code.