Perform store clustering just like load clustering. This change adds
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
it adds enableClusterStores() to TargetInstrInfo.h. This is enabled only on AArch64
for now.
This change also adds support for unscaled stores, which were not handled in getMemOpBaseRegImmOfs().
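For readers unfamiliar with clustering, here is a minimal standalone sketch of the general idea, not the code in this patch: sort the memory operations by base register and offset, then link neighbors that share a base so the scheduler can keep them adjacent. MemOp, clusterNeighboringOps, and MaxClusterSize are hypothetical names used only for illustration, and the size limit stands in for whatever heuristic the target applies.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a store (or load) addressed as BaseReg + Offset.
struct MemOp {
  unsigned Id;      // position in the original instruction order
  unsigned BaseReg; // base register number
  int64_t Offset;   // byte offset; may be negative for unscaled forms
};

// Returns pairs of Ids that the scheduler should try to keep adjacent.
// MaxClusterSize is an assumed limit on how many ops form one cluster.
std::vector<std::pair<unsigned, unsigned>>
clusterNeighboringOps(std::vector<MemOp> Ops, unsigned MaxClusterSize = 2) {
  // Order by base register first, then by offset within the same base.
  std::sort(Ops.begin(), Ops.end(), [](const MemOp &A, const MemOp &B) {
    if (A.BaseReg != B.BaseReg)
      return A.BaseReg < B.BaseReg;
    return A.Offset < B.Offset;
  });

  std::vector<std::pair<unsigned, unsigned>> Edges;
  unsigned ClusterLength = 1;
  for (std::size_t I = 0; I + 1 < Ops.size(); ++I) {
    bool SameBase = Ops[I].BaseReg == Ops[I + 1].BaseReg;
    if (SameBase && ClusterLength < MaxClusterSize) {
      // Link the two neighbors and grow the current cluster.
      Edges.emplace_back(Ops[I].Id, Ops[I + 1].Id);
      ++ClusterLength;
    } else {
      // Different base or cluster full: start a new cluster.
      ClusterLength = 1;
    }
  }
  return Edges;
}
```

Because the sort key does not distinguish loads from stores, the same routine would serve either kind of operation once the base register and offset are known.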
Would it make sense to simply rename "xxxClusterLoads" to "xxxClusterMemOps" instead of adding an extra interface? The target can still control whether loads or stores are clustered in getMemOpBaseRegImmOfs().
The same is true for a lot of the following code, which would no longer need to differentiate between loads and stores.
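To make the suggestion concrete, here is a rough standalone sketch of how a single hook could let the target decide. It is only an illustration under simplifying assumptions: Instr and TargetHooks are hypothetical stand-ins for MachineInstr and TargetInstrInfo, and the getMemOpBaseRegImmOfs() signature is reduced to the parts that matter here.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a machine instruction.
struct Instr {
  bool IsLoad;
  bool IsStore;
  unsigned BaseReg;
  int64_t Offset;
};

// Hypothetical stand-in for the target interface. Returning false means
// "do not cluster this instruction", so no separate load/store switch is needed.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool getMemOpBaseRegImmOfs(const Instr &MI, unsigned &BaseReg,
                                     int64_t &Offset) const {
    return false; // default: the target clusters nothing
  }
};

// A target that only wants loads clustered rejects stores in the hook.
struct LoadsOnlyTarget : TargetHooks {
  bool getMemOpBaseRegImmOfs(const Instr &MI, unsigned &BaseReg,
                             int64_t &Offset) const override {
    if (!MI.IsLoad)
      return false;
    BaseReg = MI.BaseReg;
    Offset = MI.Offset;
    return true;
  }
};

// A target that wants both accepts either kind of memory operation.
struct LoadsAndStoresTarget : TargetHooks {
  bool getMemOpBaseRegImmOfs(const Instr &MI, unsigned &BaseReg,
                             int64_t &Offset) const override {
    if (!MI.IsLoad && !MI.IsStore)
      return false;
    BaseReg = MI.BaseReg;
    Offset = MI.Offset;
    return true;
  }
};
```

With this shape, the clustering code would just call the hook and skip any instruction for which it returns false, without checking whether the instruction is a load or a store.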