Page MenuHomePhabricator

[DAGCombiner] allow truncation of binops after legalization if desirable
Needs ReviewPublic

Authored by spatel on Feb 21 2019, 10:58 AM.

Details

Summary

This is an alternative to D58210 that achieves similar results for SystemZ. I'm proposing to extend the general (trunc (binop X, Y)) transform by using the existing TLI hook isTypeDesirableForOp() if we are post-legalization. This allows eliminating the similar fold in distributeTruncateThroughAnd() that is oddly constrained by starting the pattern match only from shift/rotate.

About the test diffs:

  1. AMDGPU: 'widen-smrd-loads' is an improvement and 'idot' diffs are regressions?
  2. PowerPC: neutral
  3. SystemZ: improvements or neutral
  4. x86: mostly neutral, improvements with 'shld/shrd', and regressions for 'vector-sext'.

I did look at the x86 'vector-sext' regressions, and the seemingly unnecessary 'movzbl' are being inserted by an isel pattern because:

// anyext. Define these to do an explicit zero-extend to
// avoid partial-register updates.

So that conflicts with the x86 setting that says 8-bit ops are desirable. Ideally, we would defer partial-reg optimizations to a later pass (and I know we already do this for some cases, so maybe that just needs to be adjusted a bit).

Diff Detail

Event Timeline

spatel created this revision.Feb 21 2019, 10:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2019, 10:58 AM

IIRC the idot patterns are looking for very specific patterns and avoiding commuting to save compile time, so maybe those just need to be updated?

nikic added a subscriber: nikic.Feb 24 2019, 6:33 AM
jonpa added a comment.Feb 25 2019, 9:13 AM

I tried this patch on SystemZ / SPEC, and as before this seems to have a relatively very minor impact on the number of files changed (7), and on the performance (seemingly unaffected).

I think the SystemZ test looks good, but I leave the final approval to Uli as usual.

I tried this patch on SystemZ / SPEC, and as before this seems to have a relatively very minor impact on the number of files changed (7), and on the performance (seemingly unaffected).

I think the SystemZ test looks good, but I leave the final approval to Uli as usual.

Ah, the SystemZ test changes certainly LGTM, and would be a welcome improvement ...

Ping.

AMDGPU comment/update on the idot tests?

Not sure what we want to do about x86 (D58703), but it's probably not important enough to hold up the general improvement?

IIRC the idot patterns are looking for very specific patterns and avoiding commuting to save compile time, so maybe those just need to be updated?

They can be updated, but with the patch these tests produced considerably more code.

llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll
237

In fact this is regression as well. A scalar operation (s_) is preferable over vector (v_).

please can you rebase this?

spatel updated this revision to Diff 218393.Mon, Sep 2, 2:48 PM

Patch updated:
x86 vector sext is no longer a problem; AMDGPU appears to still show regressions.