This patch makes instruction fusion more aggressive by:
- adding artificial edges that make the successors of FirstSU also depend on SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps;
- updating PostGenericScheduler::tryCandidate to keep clustered nodes together, as GenericScheduler::tryCandidate already does.
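The effect of the artificial edges can be pictured with a toy model. This sketch is not LLVM code: ToyDAG, reaches, addArtificialEdge, fusePair, and schedule are hypothetical names, and the real mutation works on llvm::SUnit with SDep::Artificial edges. It assumes that copying FirstSU's successor edges onto SecondSU, the way clusterNeighboringMemOps does for memory operations, is what stops other consumers of FirstSU from being scheduled between the pair.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy model, not LLVM's ScheduleDAG: dag[i] lists the nodes
// that may only be scheduled after node i. The graph is assumed acyclic.
using ToyDAG = std::vector<std::vector<int>>;

// True if `to` is reachable from `from` by following successor edges.
bool reaches(const ToyDAG &dag, int from, int to) {
  if (from == to) return true;
  for (int s : dag[from])
    if (reaches(dag, s, to)) return true;
  return false;
}

// Add the edge from -> to unless it already exists or would create a cycle
// (the toy analogue of checking whether an edge can be added).
bool addArtificialEdge(ToyDAG &dag, int from, int to) {
  if (std::find(dag[from].begin(), dag[from].end(), to) != dag[from].end())
    return false;
  if (reaches(dag, to, from)) return false;  // would create a cycle
  dag[from].push_back(to);
  return true;
}

// Make every other successor of `first` also wait for `second`, so nothing
// that depends on `first` can be scheduled between the fused pair.
void fusePair(ToyDAG &dag, int first, int second) {
  assert(first != second);
  addArtificialEdge(dag, first, second);
  const std::vector<int> firstSuccs = dag[first];  // copy: dag is mutated below
  for (int s : firstSuccs)
    if (s != second) addArtificialEdge(dag, second, s);
}

// Greedy top-down list scheduler: always picks the ready node with the
// lowest index, standing in for an arbitrary heuristic tie-break.
std::vector<int> schedule(const ToyDAG &dag) {
  const int n = static_cast<int>(dag.size());
  std::vector<int> preds(n, 0), order;
  for (const auto &succs : dag)
    for (int s : succs) ++preds[s];
  std::vector<bool> done(n, false);
  while (static_cast<int>(order.size()) < n) {
    int pick = -1;
    for (int i = 0; i < n && pick < 0; ++i)
      if (!done[i] && preds[i] == 0) pick = i;
    if (pick < 0) break;  // cycle: nothing is ready, give up
    done[pick] = true;
    order.push_back(pick);
    for (int s : dag[pick]) --preds[s];
  }
  return order;
}
```

With node 0 as AESE, node 2 as the AESMC that consumes it, and node 1 as another user of the AESE result (dag = {{1, 2}, {}, {}}), this scheduler emits 0, 1, 2 before fusion and 0, 2, 1 after fusePair(dag, 0, 2), so the pair stays adjacent.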
This change increases the number of fused AES instruction pairs on
Cortex-A57 and Cortex-A72. It does not change the generated code at all
for most benchmarks and general code, but we have seen improvements in
kernels that use AESE/AESMC and AESD/AESIMC pairs.
This changes core PostRA scheduler logic, so it may affect other targets. On the other hand, GenericScheduler already tries to keep clustered nodes together when comparing candidates, and this check was missing from PostGenericScheduler, perhaps for no good reason.
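The candidate-selection change can be sketched the same way. In this hypothetical model, Candidate, tryCandidate, pickNode, and the height score are illustrative and are not LLVM API. Because the post-RA scheduler works top-down, the only cluster partner to check is the next cluster successor of the last scheduled node; GenericScheduler makes the analogous comparison using getNextClusterSucc() or getNextClusterPred() depending on direction. The toy places the cluster check ahead of a single stand-in heuristic and does not model where it sits among PostGenericScheduler's real checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified candidate: a node id plus a score standing in for
// the heuristics the post-RA scheduler would otherwise apply.
struct Candidate {
  int id;
  int height;  // higher means more urgent in this toy model
};

// Returns true if `tryCand` should replace `cand`. The cluster check runs
// before the height heuristic, so the node paired with the previously
// scheduled one wins even when another node looks more urgent.
bool tryCandidate(const Candidate &cand, const Candidate &tryCand,
                  int nextClusterSucc) {
  const bool tryIsCluster = tryCand.id == nextClusterSucc;
  const bool candIsCluster = cand.id == nextClusterSucc;
  if (tryIsCluster != candIsCluster) return tryIsCluster;
  return tryCand.height > cand.height;  // ties keep the current candidate
}

// Pick the best ready node; nextClusterSucc is -1 when the last scheduled
// node is not part of a cluster.
int pickNode(const std::vector<Candidate> &ready, int nextClusterSucc) {
  assert(!ready.empty());
  Candidate best = ready.front();
  for (std::size_t i = 1; i < ready.size(); ++i)
    if (tryCandidate(best, ready[i], nextClusterSucc)) best = ready[i];
  return best.id;
}
```

For ready nodes {5, height 10} and {7, height 3}, pickNode returns 5 when no cluster is pending and 7 when node 7 is the next cluster successor. In this model, any target that creates clusters sees the cluster tie-break override the heuristics after it, which is the kind of cross-target effect the concern above refers to.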