Set default preferred alignment for MemIntrinsic like memcpy according to arch32 or arch64,
it will improve performance.
e.g. dhrystone with "-O2" boosts performance by 50% on arch RV32.
Differential D134168
[RISCV] Make preferred alignment of PointerArgs for MemIntrinsic JojoR on Sep 19 2022, 1:11 AM. Authored by
Details
Set default preferred alignment for MemIntrinsic like memcpy according to arch32 or arch64, e.g. dhrystone with "-O2" boosts performance by 50% on arch RV32.
Diff Detail
Unit Tests
Event TimelineComment Actions Please use like git diff -U999999 to make context available.(see https://llvm.org/docs/DeveloperPolicy.html#making-and-submitting-a-patch) Comment Actions Please upload patches using -U99999 as indicated here https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
Comment Actions
What are arch32 and arch64?
On what implementation? Does this affect actually-useful benchmarks, not just dhrystone? I would assume so, but it'd be more useful to get numbers for meaningful benchmarks rather than ones people should've long since abandoned.
Comment Actions I think WORD_SIZE as preferred alignment is basic behavior to improve performance (use lw/sw instead of lb/sb), and dhrystone gets good performance benefit from this patch, Comment Actions I'm not saying it's a bad idea, I'm just saying that the 50% isn't a very useful figure to quote as it's unlikely to be representative of real-world performance, so some less awful benchmarks would be more meaningful and helpful to quote. Comment Actions The description doesn't clearly describe the effect of the patch. My understanding from reading the user of this function is that the alignment of allocas and global variables used by memcpy are increased in CodeGenPrepare. This results in less memory operations. In the case of 32-bit dhrystone, it looks like we have an explicit call to the memcpy library function. I guess by aligning the pointers we allow the source and dest to both be word_size aligned so we can use the full word copy loop? In my local testing for riscv64 this didn't seem to affect dhrystone performance.
Comment Actions It looks like this hook is only used by codegenprepare, so the test should be in llvm/test/Transforms/CodeGenPrepare and check that the alignment of the IR variable was changed rather than indirectly testing it via assembly code. I'd also add a version with an alloca. Additionally, I don't think this should be specific to RISC-V, I would assume that all architectures that don't have unaligned accesses (and even those with slower accesses) would benefit from aligning global string constants that are used in memcpy operations. I see that right now this hook is only used by ARM targets (D7908). Can this be changed to call allowsMisalignedMemoryAccesses() and adjusting the alignment to the size of pointers? |
This seems arbitrary. I'd expect at least XLEN / 8 here?