Set default preferred alignment for MemIntrinsic like memcpy according to arch32 or arch64,
it will improve performance.
e.g. dhrystone with "-O2" boosts performance by 50% on arch RV32.
JojoR on Sep 19 2022, 1:11 AM.Authored by
Please upload patches using -U99999 as indicated here https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
What are arch32 and arch64?
On what implementation? Does this affect actually-useful benchmarks, not just dhrystone? I would assume so, but it'd be more useful to get numbers for meaningful benchmarks rather than ones people should've long since abandoned.
I think WORD_SIZE as preferred alignment is basic behavior to improve performance (use lw/sw instead of lb/sb), and dhrystone gets good performance benefit from this patch,
I'm not saying it's a bad idea, I'm just saying that the 50% isn't a very useful figure to quote as it's unlikely to be representative of real-world performance, so some less awful benchmarks would be more meaningful and helpful to quote.
The description doesn't clearly describe the effect of the patch. My understanding from reading the user of this function is that the alignment of allocas and global variables used by memcpy are increased in CodeGenPrepare. This results in less memory operations. In the case of 32-bit dhrystone, it looks like we have an explicit call to the memcpy library function. I guess by aligning the pointers we allow the source and dest to both be word_size aligned so we can use the full word copy loop?
In my local testing for riscv64 this didn't seem to affect dhrystone performance.
It looks like this hook is only used by codegenprepare, so the test should be in llvm/test/Transforms/CodeGenPrepare and check that the alignment of the IR variable was changed rather than indirectly testing it via assembly code. I'd also add a version with an alloca.
Additionally, I don't think this should be specific to RISC-V, I would assume that all architectures that don't have unaligned accesses (and even those with slower accesses) would benefit from aligning global string constants that are used in memcpy operations. I see that right now this hook is only used by ARM targets (D7908).
Can this be changed to call allowsMisalignedMemoryAccesses() and adjusting the alignment to the size of pointers?