Alignment information on loads and stores is mostly used late in the
pipeline. Since the module pass runs early and the CGSCC pass runs late,
we wait for the latter to derive it. This mainly avoids duplicated work
in case the derivation fails.
This is a compile-time reduction patch, and it is unclear whether we want
to introduce such switches, though I feel we might need more of them.
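A minimal sketch of the gating idea, written as self-contained C++ rather
than the actual Attributor code. The names `PassKind`, `UseKind`, and
`shouldSeedAlignment` are hypothetical and only model the decision of
whether to derive alignment for a use, depending on which pass is running.

```cpp
// Hypothetical model of where the Attributor runs in the pipeline.
enum class PassKind { Module, CGSCC };

// Hypothetical classification of a pointer use that could get alignment.
enum class UseKind { LoadOrStorePointer, Other };

// Sketch (assumption, not the real implementation): load/store alignment
// is consumed late, so the early module pass skips it and leaves the
// derivation to the later CGSCC pass. Other uses are unaffected.
bool shouldSeedAlignment(PassKind Pass, UseKind Use) {
  if (Use == UseKind::LoadOrStorePointer)
    return Pass == PassKind::CGSCC;
  return true;
}
```

With this switch, `shouldSeedAlignment(PassKind::Module,
UseKind::LoadOrStorePointer)` is false while the CGSCC variant is true, so
the load/store alignment work is done only once.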