With strict alignment is required, limit excessive number of byte loads and stores when inlining memset() and memcpy().
This will prevent the test case below from failing with D45098.
Differential D47349
[AArch64] Limit inlining string functions with strict alignment evandro on May 24 2018, 4:39 PM. Authored by
Details
With strict alignment is required, limit excessive number of byte loads and stores when inlining memset() and memcpy(). This will prevent the test case below from failing with D45098.
Diff Detail
Event TimelineComment Actions If I'm following correctly, the problem here is that we don't have paired load/store operations for byte operations, so the threshold is roughly double what it should be. StrictAlign isn't really a good proxy for that; it doesn't have anything to do with whether a particular memcpy will lower to paired operations. Ideally, we should probably come up with some callback to give a bonus to paired operations rather than penalize all operations in StrictAlign mode. That said, just checking StrictAlign might be a good enough approximation for now; most code isn't built with strict alignment. Needs an comment explaining what you're doing, though. Given you're just checking the subtarget, I don't think you need to make getMaxStoresPerMemcpy virtual; you can just modify AArch64TargetLowering::AArch64TargetLowering. Comment Actions The problem is less the lack of paired byte wide loads and stores than having to use byte wide loads and stores.
OK
The sub-target doesn't exist in the base class though. Comment Actions The AArch64Subtarget is explicitly passed as a parameter to AArch64TargetLowering::AArch64TargetLowering. Comment Actions But getMaxStoresPerMemcpy() is called in SelectionDAG as TargetLowering::getMaxStoresPerMemcpy(). So, unless it's virtual, the latter, instead of AArch64TargetLowering::getMaxStoresPerMemcpy(), will be called. Comment Actions I meant that you can change the value of MaxStoresPerMemcpy in AArch64TargetLowering::AArch64TargetLowering, instead of overriding getMaxStoresPerMemcpy. Comment Actions If you mean decreasing the number of MaxStoresPerMemcpy, then that would restrict the inlining of memcpy() without -mattr=+strict-align though. Comment Actions There's no need to make these methods virtual and override them in the sub target. Rather, modify the limits when the sub target is initialized, as @eli.friedman suggested. |