This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering] improve the default expansion of uaddsat/usubsat
ClosedPublic

Authored by spatel on Mar 6 2019, 6:10 PM.

Details

Summary

This is an alternative to D59006 that achieves identical results for x86, and also makes an improvement for AArch64. The logic is pushing the limits of target-independence, but this is what it takes to not induce any regressions for x86, and there's no harm to AArch...so better all-around?

Normally, we'd try to improve the generic combines underlying the sub-optimal output that we see in the test diffs, but that did not look easy/possible for the cases I looked at. For example, the AArch bic/bic appears to be missed because one of those is a generic 'not' op, but the other is already a BIC node.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Mar 6 2019, 6:10 PM
spatel updated this revision to Diff 189871.Mar 8 2019, 9:31 AM

This is really 2 independent changes:

  1. The changes to min/max expansion are what helps x86.
  2. The changes to UADDO/USUBO expansion are what helps AArch.

The 1st part may be over-the-line, but the 2nd is non-controversial AFAICT, so let's just try that hunk for now?

(It's really just the UADDO side that affects the AArch tests, but keep both changes for symmetry.)

The AArch-specific improvements in D59187 / D59174 seem like good patches, but even without those test diffs, do we still want this general optimization, so other targets are not similarly impacted? (We don't have these tests for any other in-tree targets AFAIK.)

@spatel D59174 is not AArch64 specific and covers what this patch does, but only for the case where VSELECT is expanded. An alternative would be to add a combine for vselect zero_or_neg_one, neg_one, x -> zero_or_neg_one | x and vselect zero_or_neg_one, 0, x -> ~zero_or_neg_one & x, though it's arguable whether the latter is an improvement in general.

Also okay with what this patch does, but the other test changes from D59174 show that the pattern can also occur in other cases, so we might want to handle it more generally.

@spatel D59174 is not AArch64 specific and covers what this patch does, but only for the case where VSELECT is expanded. An alternative would be to add a combine for vselect zero_or_neg_one, neg_one, x -> zero_or_neg_one | x and vselect zero_or_neg_one, 0, x -> ~zero_or_neg_one & x, though it's arguable whether the latter is an improvement in general.

Ah, sorry - I missed the x86 diff there the first time I looked. So yes, the benefit of that patch seems clear.

And yes, the pattern with 'and-not' isn't clearly a win unless the target has that op as a single instruction, but it seems common enough that we would choose that as the optimized form here in the legalizer. So I'd go with all of these patches to try to make sure that we get optimized code for the intrinsics.

lebedev.ri accepted this revision.Mar 17 2019, 2:18 AM

Looks good to me.

This revision is now accepted and ready to land.Mar 17 2019, 2:18 AM
This revision was automatically updated to reflect the committed changes.