Aligning functions yields small performance gains on embedded cores, more so in code that makes numerous calls to small functions. As with aligning loops, if a function fits within a single cache line, the overhead of fetching further instructions is limited.
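To illustrate the cache-line argument, here is a minimal sketch that counts how many cache lines a function's code touches before and after its start address is aligned. The helper names (`linesTouched`, `alignUp`) and the 32-byte line size are assumptions made for illustration; they are not part of the patch or of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: number of cache lines touched by `size` bytes of code
// starting at address `addr`. `lineSize` is assumed to be non-zero.
constexpr std::uint64_t linesTouched(std::uint64_t addr, std::uint64_t size,
                                     std::uint64_t lineSize) {
  return size == 0 ? 0 : (addr + size - 1) / lineSize - addr / lineSize + 1;
}

// Hypothetical helper: round `addr` up to the next multiple of `align`.
// `align` is assumed to be a power of two.
constexpr std::uint64_t alignUp(std::uint64_t addr, std::uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}

// A 24-byte function placed at 0x101C straddles two 32-byte lines...
static_assert(linesTouched(0x101C, 24, 32) == 2, "unaligned: two lines");
// ...but fits in one line once its start is aligned to 32 bytes.
static_assert(linesTouched(alignUp(0x101C, 32), 24, 32) == 1,
              "aligned: one line");
```

The trade-off is the padding inserted before the function (4 bytes in this example), which is why the gain tends to be small and depends on the core.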
Repository: rG LLVM Github Monorepo
Event Timeline
Can you improve the summary to explain why this is being done? It's for the same reasons that we align loops.
Should this be done for all CPUs? I can see how that would make sense, but as far as I understand you are really only aiming for M-class devices. And in the past we haven't aligned loops for v6m devices (or for some of the higher-end v7m devices).
llvm/test/CodeGen/ARM/preferred-function-alignment.ll | |
---|---|
1 | It might be better to deliberately make this an Arm CPU (as opposed to Thumb) rather than generic. I believe that is what this is testing. |
I've set the function alignment to the same value as the loop alignment, since in my testing the results were "best" when the two are equal.
Can you improve the summary to explain why this is being done? It's for the same reasons that we align loops.
Words seem to be failing me today. Hopefully the new summary makes sense.
I agree it makes sense to use the same alignments, especially for Cortex-M CPUs. Can you update the LoopAlignment? Maybe call it "CodeAlignment" now? I'm not sure whether changing the name is better or not. The documentation can be changed to: "/// What alignment is preferred for loop bodies <and functions>, in log2(bytes)."
Otherwise LGTM. Thanks
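As a rough sketch of the suggestion above, a single log2(bytes) value can feed both the loop and the function alignment. The type and function names here (`SubtargetSketch`, `PrefCodeLogAlignment`, `prefLoopAlignment`, `prefFunctionAlignment`) are hypothetical and standalone; this is not the patch's code or LLVM's actual API.

```cpp
#include <cstdint>

// Hypothetical stand-in for a subtarget description (not LLVM's ARMSubtarget).
struct SubtargetSketch {
  /// What alignment is preferred for loop bodies and functions, in log2(bytes).
  unsigned PrefCodeLogAlignment;

  constexpr explicit SubtargetSketch(unsigned LogAlign)
      : PrefCodeLogAlignment(LogAlign) {}
};

// Convert a log2(bytes) alignment into a byte count.
constexpr std::uint64_t alignmentInBytes(unsigned LogAlign) {
  return std::uint64_t{1} << LogAlign;
}

// Both queries read the same field, so the two alignments cannot diverge.
constexpr std::uint64_t prefLoopAlignment(const SubtargetSketch &ST) {
  return alignmentInBytes(ST.PrefCodeLogAlignment);
}

constexpr std::uint64_t prefFunctionAlignment(const SubtargetSketch &ST) {
  return alignmentInBytes(ST.PrefCodeLogAlignment);
}

// A log2 value of 2 means 4-byte alignment for both loops and functions.
static_assert(prefLoopAlignment(SubtargetSketch(2)) == 4, "loops: 4 bytes");
static_assert(prefFunctionAlignment(SubtargetSketch(2)) == 4,
              "functions: 4 bytes");
```

Keeping one field makes the "same value for both" property structural rather than a convention, at the cost of not being able to tune the two independently later.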