When MaximizeVectorBandwidth is enabled, we can end up making widening decisions (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decide to fold the tail, for example when the trip count is very low: loads that should get masked are then costed with standard memory operation costs instead.
This now still uses the EmulatedMaskMemRefHack costs (a bit unfortunately), but the old costs without this change were 1, which led to overly optimistic vectorization.
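The ordering problem described above can be illustrated with a toy model. This is only a sketch, not the LoopVectorize code: ToyCostModel, its members, and the cost formula (4 * VF for a masked load) are hypothetical. The only value taken from the description is the plain cost of 1. It shows a cached cost going stale when the tail-folding decision is made after the first query.

```cpp
#include <cassert>
#include <map>

// Toy model, NOT LLVM code: every name and the masked cost are hypothetical.
struct ToyCostModel {
  bool FoldTailByMasking = false;
  std::map<int, unsigned> CachedLoadCost; // cached decision, keyed by VF

  // Masked loads are modelled as more expensive than plain ones.
  unsigned computeLoadCost(int VF) const {
    assert(VF > 0 && "vectorization factor must be positive");
    return FoldTailByMasking ? 4u * static_cast<unsigned>(VF) : 1u;
  }

  // The first query caches its result, standing in for a cost-based
  // widening decision that is made once and then reused.
  unsigned getLoadCost(int VF) {
    auto It = CachedLoadCost.find(VF);
    if (It != CachedLoadCost.end())
      return It->second;
    unsigned Cost = computeLoadCost(VF);
    CachedLoadCost[VF] = Cost;
    return Cost;
  }
};

// Cost is queried (e.g. by a register-usage estimate) before tail folding
// is decided, so the later query returns the stale, unmasked cost.
unsigned costDecidedTooEarly(int VF) {
  ToyCostModel CM;
  CM.getLoadCost(VF);
  CM.FoldTailByMasking = true;
  return CM.getLoadCost(VF);
}

// Tail folding is decided first, so the cached cost reflects masking.
unsigned costDecidedInOrder(int VF) {
  ToyCostModel CM;
  CM.FoldTailByMasking = true;
  return CM.getLoadCost(VF);
}
```

In this model the early query pins the cost at the unmasked value, which is the same shape of error as the over-optimistic cost of 1 mentioned above.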
This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set.
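As a rough sketch of what "always honouring the option if it is set" could mean, assuming the option is modelled as an optional value that is empty when not passed on the command line. The function name and parameters are hypothetical and do not reflect the actual LLVM interfaces.

```cpp
#include <optional>

// Hypothetical helper, not LLVM's API: an explicitly set option wins,
// otherwise the target's preference is used.
bool shouldMaximizeBandwidth(std::optional<bool> CommandLineValue,
                             bool TargetPrefersMaximizing) {
  if (CommandLineValue.has_value())
    return *CommandLineValue;     // option set explicitly: honour it
  return TargetPrefersMaximizing; // option unset: defer to the target
}
```

Checking the explicit value first makes the behaviour easy to force from a test, independent of what the target reports.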
nit: check TTI.shouldMaximizeVectorBandwidth() first, like in the original code?