I am planning to revert this change. This works with tests like test/CodeGen/AArch64/copy-zero-reg.ll. However, if there are multiple branches, this patch degrades in performance due to large number of mov instructions in each fall through. Here's an example of test case, where it degrades in performance.
Jun 20 2018
May 14 2018
May 4 2018
Apr 3 2018
I think we should ask backend if we want to gang up loads and stores, and if we do want to gang up loads and stores, then how many should we be ganging up together. Ganging up lots of loads may result in high register pressure.
Nov 14 2017
The load/store opt pass is already pretty expensive in terms of compile-time. Did you see any compile-time regressions in your testing? Also, what performance results have you collected?