[AMDGPU] Fix DPP combiner
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
- dpp move with uses and old reg initializer should be in the same BB.
- bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
- Added add, subrev, and, or instructions to the old folding function.
- Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Differential revision: https://reviews.llvm.org/D55444