This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Limit inlining string functions with strict alignment
AbandonedPublic

Authored by evandro on May 24 2018, 4:39 PM.

Details

Summary

With strict alignment is required, limit excessive number of byte loads and stores when inlining memset() and memcpy().

This will prevent the test case below from failing with D45098.

Diff Detail

Repository
rL LLVM

Event Timeline

evandro created this revision.May 24 2018, 4:39 PM

If I'm following correctly, the problem here is that we don't have paired load/store operations for byte operations, so the threshold is roughly double what it should be. StrictAlign isn't really a good proxy for that; it doesn't have anything to do with whether a particular memcpy will lower to paired operations. Ideally, we should probably come up with some callback to give a bonus to paired operations rather than penalize all operations in StrictAlign mode.

That said, just checking StrictAlign might be a good enough approximation for now; most code isn't built with strict alignment. Needs an comment explaining what you're doing, though.

Given you're just checking the subtarget, I don't think you need to make getMaxStoresPerMemcpy virtual; you can just modify AArch64TargetLowering::AArch64TargetLowering.

evandro added a comment.EditedMay 25 2018, 9:15 AM

If I'm following correctly, the problem here is that we don't have paired load/store operations for byte operations, so the threshold is roughly double what it should be. StrictAlign isn't really a good proxy for that; it doesn't have anything to do with whether a particular memcpy will lower to paired operations. Ideally, we should probably come up with some callback to give a bonus to paired operations rather than penalize all operations in StrictAlign mode.

The problem is less the lack of paired byte wide loads and stores than having to use byte wide loads and stores.

That said, just checking StrictAlign might be a good enough approximation for now; most code isn't built with strict alignment. Needs an comment explaining what you're doing, though.

OK

Given you're just checking the subtarget, I don't think you need to make getMaxStoresPerMemcpy virtual; you can just modify AArch64TargetLowering::AArch64TargetLowering.

The sub-target doesn't exist in the base class though.

The AArch64Subtarget is explicitly passed as a parameter to AArch64TargetLowering::AArch64TargetLowering.

The AArch64Subtarget is explicitly passed as a parameter to AArch64TargetLowering::AArch64TargetLowering.

But getMaxStoresPerMemcpy() is called in SelectionDAG as TargetLowering::getMaxStoresPerMemcpy(). So, unless it's virtual, the latter, instead of AArch64TargetLowering::getMaxStoresPerMemcpy(), will be called.

I meant that you can change the value of MaxStoresPerMemcpy in AArch64TargetLowering::AArch64TargetLowering, instead of overriding getMaxStoresPerMemcpy.

I meant that you can change the value of MaxStoresPerMemcpy in AArch64TargetLowering::AArch64TargetLowering, instead of overriding getMaxStoresPerMemcpy.

If you mean decreasing the number of MaxStoresPerMemcpy, then that would restrict the inlining of memcpy() without -mattr=+strict-align though.

MaxStoresPerMemcpy = STI.requiresStrictAlign() ? 4 : 16;?

MaxStoresPerMemcpy = STI.requiresStrictAlign() ? 4 : 16;?

Oy, you mean in the constructor? Got it!

evandro abandoned this revision.May 25 2018, 2:14 PM

There's no need to make these methods virtual and override them in the sub target. Rather, modify the limits when the sub target is initialized, as @eli.friedman suggested.