For RVV, we rely heavily on constant pools for fixed-length vector constants. The default lowering for constant pool entries aligns each entry to its type's ABI alignment. For a vector, this is usually the size of the type in question. Since this alignment isn't actually exposed in the ABI (right?!), it produces a lot of extra padding that serves no purpose.
This change reduces the alignment used to the alignment of the vector's element type. This closely matches the reasoning in the allowsMisalignedMemoryAccesses routine (and we assert that the two are in sync). Note that our instruction choice doesn't change; only the alignment of the constant pool entry does.
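As a rough illustration of where the padding comes from, here is a simplified layout model in Python (not LLVM code). It assumes entries are emitted back to back in the given order, that a vector's ABI alignment is its size rounded up to a power of two, and that element alignment equals element size. The helper names are hypothetical.

```python
# Simplified model of laying out fixed-length vector constants back to back.
# Assumptions (not LLVM code): entries are emitted in the given order, the
# "ABI" alignment of a vector is its size rounded up to a power of two, and
# the element alignment equals the element size.

def align_to(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align

def next_pow2(x: int) -> int:
    """Smallest power of two >= x (requires x >= 1)."""
    return 1 << (x - 1).bit_length()

def total_padding(pool, use_elt_align: bool) -> int:
    """pool is a list of (num_elts, elt_bytes); returns padding bytes inserted."""
    offset = padding = 0
    for num_elts, elt_bytes in pool:
        size = num_elts * elt_bytes
        align = elt_bytes if use_elt_align else next_pow2(size)
        start = align_to(offset, align)
        padding += start - offset
        offset = start + size
    return padding

# <3 x i8>, <4 x i32>, <3 x i16>, <2 x i64>
pool = [(3, 1), (4, 4), (3, 2), (2, 8)]
print(total_padding(pool, use_elt_align=False))  # 23
print(total_padding(pool, use_elt_align=True))   # 7
```

Under these assumptions, this mixed pool needs 23 bytes of padding with size-based alignment but only 7 with element alignment; real numbers depend on the target's data layout and on the order in which entries are actually emitted.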
The performance effects here may be a bit complicated, but I think (hope?) they should be generally positive. Potential downsides include:
- Placing data immediately after the end of the previous function. This may confuse an instruction decoder that fetches in chunks and tries to decode the data as instructions.
- Changing the working set of the following function. By removing alignment padding, we may either decrease or increase the size of this set. Note that there are actually two working sets to consider, d-cache and i-cache, and each can change independently.
Note that the downsides above already apply to non-vector data (since it is naturally less aligned). If we have a processor that has problems with the items above, we should probably be trying to mitigate the general issues rather than getting lucky due to vector constant pools. :)
Note: I'd originally tried a more target-independent approach, but found that reducing the alignment a) caused massive test diffs and b) exposed what appeared to be a number of missing folds on x86. Hence the target-specific hook approach taken here.
Messing with the alignment of constants in .rodata.cst16 is going to have nearly zero effect: the data is emitted in 16-byte chunks, so there's no padding anyway. (Not sure if this is what you meant to test.)
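To make the .rodata.cst16 point concrete: if every entry in a section is exactly 16 bytes and entries are packed back to back from offset 0, each one starts at a multiple of 16, which already satisfies any alignment of 16 or less. Lowering the alignment therefore removes no padding. This is a simplified model (the helper name is hypothetical, not LLVM code).

```python
# Simplified model (assumption): a .rodata.cst16-style section holding only
# 16-byte entries, packed back to back starting at offset 0.

def padding_for_entries(sizes, align):
    """Return the padding bytes needed to place each entry at the given alignment."""
    offset = padding = 0
    for size in sizes:
        start = (offset + align - 1) // align * align  # round up to alignment
        padding += start - offset
        offset = start + size
    return padding

cst16 = [16] * 8  # eight 16-byte constants
for align in (16, 8, 4, 1):
    print(align, padding_for_entries(cst16, align))  # 0 padding for every alignment
```

By contrast, mixing sizes (for example a 4-byte entry followed by a 16-byte entry at 16-byte alignment) is where reduced alignment can actually save bytes.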