Page MenuHomePhabricator

[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
Needs ReviewPublic

Authored by arichardson on Sep 20 2022, 6:23 AM.

Details

Summary

This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.

TODO: It should also improve performance for other benchmarks, would be
good to get some numbers.

Diff Detail

Event Timeline

arichardson created this revision.Sep 20 2022, 6:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2022, 6:23 AM
arichardson requested review of this revision.Sep 20 2022, 6:23 AM
arichardson added inline comments.
llvm/test/Transforms/CodeGenPrepare/RISCV/adjust-memintrin-alignment.ll
3

Not sure if there is a way to get the default target datalayout from opt, so I've hardcoded the relevant bits here.

efriedma added inline comments.Sep 20 2022, 2:05 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

The question is what size load/store ops would we prefer to use for the memcpy, and whether those ops require alignment. Using the alignment of a pointer seems arbitrary; we aren't loading or storing a pointer.

I'd prefer not to call getPointerAlignment() here if we can avoid it; the caller already does the math to figure out the current alignment and the increase. shouldUpdatePointerArgAlignment just needs to know what alignment it wants, not whether the call currently satisfies that alignment.

llvm/lib/Target/ARM/ARMISelLowering.cpp
1927

You want to make this more aggressive by default? Maybe... but we probably want different heuristics for small copies. (For example, aligning a 3-byte copy to 8 bytes makes no sense; we can't take advantage of alignment greater than 2 bytes.)

JojoR added inline comments.Sep 20 2022, 8:24 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

As @efriedma said, we can set default PrefAlign according to PointerSize,
but we should return final PrefAlign from backend du to backend's requirement,
different ISAs maybe have different alignment :)

Address feedback

arichardson added inline comments.Sep 28 2022, 4:08 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

I agree that this is a backend-specific choice. I would assume loading an aligned pointer is an efficient operation on most/all targets, and for the ones where this is not true, they can override shouldUpdatePointerArgAlignment().

I believe this change should match now what you did for RISC-V: MinSize==XLEN PrefAlign==XLEN.

llvm/lib/Target/ARM/ARMISelLowering.cpp
1927

I've reverted this part of the diff and added test to show we don't adjust 3/7 byte objects

arichardson added inline comments.Sep 28 2022, 4:11 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
966

@jrtc27 we may want to adjust these values for CHERI to not require 16-byte alignment&size, but I think even without a 8-byte fallback this should be a (minor) net win.

efriedma added inline comments.Sep 28 2022, 10:55 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

It looks like the argument "Arg" is now unused?

Fix unused argument

efriedma added inline comments.Oct 4 2022, 12:31 PM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

Still here? Did you mean to upload a different change?

arichardson added inline comments.Oct 4 2022, 12:36 PM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

I was using it for allowsMisalignedMemoryAccesses() alignment, but dropped it in the previous diff.
I think having the current argument that is being processed could be useful for overrides (since they could make decisions based on the current alignment).

I've added the use back now, but am also happy to drop it.

1933

Arg is used in the call to allowsMisalignedMemoryAccesses(getPointerMemTy() to determine if it's already a fast operation.
I can drop it if you prefer, I just thought it is potentially useful to avoid additional aligning.

efriedma added inline comments.Oct 7 2022, 11:47 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

I'm not sure how calling getPointerAlignment() avoids additional alignment, assuming PrefAlign is correct. shouldAlignPointerArgs is supposed to return the minimum "fast" alignment for the given call. If the actual alignment is already greater than or equal to that, the caller does nothing, even if shouldAlignPointerArgs returns true.

(If there's some alignment between 1 and PrefAlign that the target considers "fast", I guess you could run into an issue, but that implies PrefAlign is wrong.)

llvm/lib/CodeGen/TargetLoweringBase.cpp
967

Maybe TargetTransformInfo::getRegisterBitWidth() is a better default than getPointerPrefAlignment()? I guess that's the same thing for most targets, but it probably makes the intent a bit more clear...

I agree there isn't any default that's going to be correct for all targets.

arichardson planned changes to this revision.Oct 7 2022, 12:08 PM
arichardson added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
1933

Fair enough, I'll drop the argument and update the patch to avoid looking at the current alignment.

llvm/lib/CodeGen/TargetLoweringBase.cpp
967

I was not aware of that function. Thanks for pointing it out that does indeed sounds like the best default.

Address review feedback

efriedma added inline comments.Wed, Nov 16, 4:07 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

Did you mean to address this?

Address missed feedback - adds two more affected tests due to pointer/scalar register size differences