This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Reduce alignment of vector constant pool entries
AbandonedPublic

Authored by reames on Jul 18 2023, 9:07 AM.

Details

Summary

For RVV, we are very reliant on constant pools for fixed length constants. The default lowering for constant pool aligns the entry to the ABI alignment. For a vector, this is usually the size of the type in question. As this isn't actually exposed in the ABI (right?!), the resulting alignment creates a bunch of extra padding with no value.

This change reduces the alignment used to be the vector element alignment. This closely matches the reasoning in the allowsMisalignedMemoryAccesses routine (and we assert they're in sync.) Note that our instruction choice doesn't change; only the alignment of the constant pool entry.

Performance effects here may be a bit complicated, but I think (hope?) it should be generally positive. Potential downsides include:

  • Placing data immediately after end of the previous function. This may confuse instruction decode which is fetching in chunks, and tries to decode the data as instructions.
  • Changing the working set of the following function. By removing alignment, we may either decrease or increase the size of this set. Note that we actually have two working sets to consider - d-cache and i-cache. Each can change independently.

Note that the downsides above already apply to non-vector data (since they are naturally less aligned). If we have a processor which has problems with the above items, we should probably be trying to mitigate the general issues as opposed to getting lucky due to vector constant pools. :)

Note: I'd originally tried to do something here which was more target independent, but I found that a) reducing alignment caused massive test diffs, and b) exposed what appeared to be a number of missing folds on x86. Thus the target specific hook approach taken here.

Diff Detail

Event Timeline

reames created this revision.Jul 18 2023, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 9:07 AM
reames requested review of this revision.Jul 18 2023, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 9:07 AM

That remind me this patch: D112531, and back to psABI part, yes, we don't have too much word on fixed length vector, one point I am hesitate here is P-extension (but it never ratified and no clear timeline.) use GPR to hold fixed-length vector, then that will require align to XLEN.

Note: I'd originally tried to do something here which was more target independent, but I found that a) reducing alignment caused massive test diffs, and b) exposed what appeared to be a number of missing folds on x86. Thus the target specific hook approach taken here.

SSE arithmetic ops can only fold aligned loads. Is that what you were seeing?

efriedma added inline comments.
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-constant-pool-alignment.ll
5

Messing with the alignment of constants in .rodata.cst16 is going to have nearly zero effect; the data is getting emitted in 16-byte chunks, so there's no padding anyway. (Not sure if this is what you meant to test.)

reames abandoned this revision.Jul 21 2023, 3:32 PM

Abandoning as the benefit here appears to be marginal at best due to a misunderstanding pointed out by @efriedma's comment. (My inline response expands a bit on his point.)

That remind me this patch: D112531, and back to psABI part, yes, we don't have too much word on fixed length vector, one point I am hesitate here is P-extension (but it never ratified and no clear timeline.) use GPR to hold fixed-length vector, then that will require align to XLEN.

P can be safely ignored at this point; we can worry about it when we add toolchain support. If ever.

Note: I'd originally tried to do something here which was more target independent, but I found that a) reducing alignment caused massive test diffs, and b) exposed what appeared to be a number of missing folds on x86. Thus the target specific hook approach taken here.

SSE arithmetic ops can only fold aligned loads. Is that what you were seeing?

Yep, that looks like what I was seeing.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-constant-pool-alignment.ll
5

I went and dug through this carefully.

As you point out, the data fragments do end up being concatenated together into the .rodata section, and the alignment of the entries within the section doesn't end up mattering much. We group 16b chunks into .rodata.cst16, and 32b chunks into .rodata.cst16. These are then combined to form the final .rodata section, and thus the internal alignment doesn't create any additional padding within that section. This change does end up reducing the alignment requirement of the .rodata section as a whole (from 32 to 8 in this example), but that's a pretty marginal gain.

When I wrote the patch, I was thinking the data fragments were in the .text section between the functions. This is not true.