The current way of computing the RegMask in RegUsageInfoCollector is very slow on architectures with certain register configurations.
That is particularly true on architectures with many registers that heavily alias one another. An example of this in the LLVM source is the AMDGPU backend, which has almost 3500 registers and where the VGPR_512, VGPR_256, VGPR_* registers are composed of consecutive instances of VGPR_32 registers. All these classes end up aliasing one another.
The current algorithm iterates over all the registers and calls isPhysRegModified() for each one.
isPhysRegModified() itself scans all the aliasing registers and returns true if any of them is defined.
In a backend like AMDGPU, where most registers alias 30-40 other registers, this results in a very large number of iterations and is very slow.
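To make the cost concrete, here is a minimal sketch of the per-register scan described above. It is a toy model, not the LLVM code: ToyRegInfo, toyIsPhysRegModified and computeClobberedOld are hypothetical names, registers are plain indices, and each alias list is assumed to include the register itself. The Visits counter exists only to show how much work is done.

```cpp
#include <cstddef>
#include <vector>

// Toy model (assumption, not LLVM's API): registers are indices 0..N-1.
// Aliases[R] lists every register overlapping R, including R itself.
// DirectlyDefined[R] is true when some instruction defines R.
struct ToyRegInfo {
  std::vector<std::vector<unsigned>> Aliases;
  std::vector<bool> DirectlyDefined;
};

// Hypothetical stand-in for the behaviour attributed to isPhysRegModified():
// scan the aliases of Reg and stop at the first one that is defined.
bool toyIsPhysRegModified(const ToyRegInfo &RI, unsigned Reg,
                          std::size_t &Visits) {
  for (unsigned Alias : RI.Aliases[Reg]) {
    ++Visits; // count every alias inspected
    if (RI.DirectlyDefined[Alias])
      return true;
  }
  return false;
}

// Old approach: query every register, whether or not anything near it
// is defined. An untouched register pays for a full scan of its aliases.
std::vector<bool> computeClobberedOld(const ToyRegInfo &RI,
                                      std::size_t &Visits) {
  std::vector<bool> Clobbered(RI.Aliases.size(), false);
  for (unsigned Reg = 0; Reg < RI.Aliases.size(); ++Reg)
    Clobbered[Reg] = toyIsPhysRegModified(RI, Reg, Visits);
  return Clobbered;
}
```

In this model the work grows with the total number of registers times the length of their alias lists, and registers that are never modified are the most expensive ones because their scan never exits early.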
Instead of using isPhysRegModified(), the algorithm proposed in this patch checks whether each register is defined and, if it is, iterates over its aliased registers and marks them as defined as well.
I believe this should have the same effect as the previous algorithm, but in this case we iterate over the aliased registers only if the register is defined.
Because most of the time only a subset of the whole set of LLVM registers is actually defined directly by an instruction in a function, this results in a much lower number of iterations on average.
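The proposed direction can be sketched in the same spirit. This is a toy model under stated assumptions, not the code in the patch: ToyRegInfo and computeClobberedNew are hypothetical names, registers are plain indices, each alias list includes the register itself, and aliasing is assumed to be symmetric (if A aliases B, then B aliases A). The Visits counter is only there to show the amount of work.

```cpp
#include <cstddef>
#include <vector>

// Toy model (assumption, not LLVM's API): registers are indices 0..N-1.
// Aliases[R] lists every register overlapping R, including R itself.
// DirectlyDefined[R] is true when some instruction defines R.
struct ToyRegInfo {
  std::vector<std::vector<unsigned>> Aliases;
  std::vector<bool> DirectlyDefined;
};

// Proposed approach: start from the registers that are directly defined
// and propagate the "clobbered" bit outwards to their aliases.
std::vector<bool> computeClobberedNew(const ToyRegInfo &RI,
                                      std::size_t &Visits) {
  std::vector<bool> Clobbered(RI.Aliases.size(), false);
  for (unsigned Reg = 0; Reg < RI.Aliases.size(); ++Reg) {
    if (!RI.DirectlyDefined[Reg])
      continue; // untouched registers cost a single check
    for (unsigned Alias : RI.Aliases[Reg]) {
      ++Visits; // count every alias marked
      Clobbered[Alias] = true;
    }
  }
  return Clobbered;
}
```

With symmetric aliasing, a register ends up marked exactly when it or one of its aliases is directly defined, which is the same condition the per-register query tests. The alias lists are walked only for directly defined registers, so the work scales with how many registers the function actually defines rather than with the size of the register file.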
Please no auto when the type isn't immediately visible.