This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] allow truncation of binops after legalization if desirable
AbandonedPublic

Authored by spatel on Feb 21 2019, 10:58 AM.

Details

Summary

This is an alternative to D58210 that achieves similar results for SystemZ. I'm proposing to extend the general (trunc (binop X, Y)) transform by using the existing TLI hook isTypeDesirableForOp() if we are post-legalization. This allows eliminating the similar fold in distributeTruncateThroughAnd() that is oddly constrained by starting the pattern match only from shift/rotate.

About the test diffs:

  1. AMDGPU: 'widen-smrd-loads' is an improvement and 'idot' diffs are regressions?
  2. PowerPC: neutral
  3. SystemZ: improvements or neutral
  4. x86: mostly neutral, improvements with 'shld/shrd', and regressions for 'vector-sext'.

I did look at the x86 'vector-sext' regressions, and the seemingly unnecessary 'movzbl' are being inserted by an isel pattern because:

// anyext. Define these to do an explicit zero-extend to
// avoid partial-register updates.

So that conflicts with the x86 setting that says 8-bit ops are desirable. Ideally, we would defer partial-reg optimizations to a later pass (and I know we already do this for some cases, so maybe that just needs to be adjusted a bit).

Diff Detail

Event Timeline

spatel created this revision.Feb 21 2019, 10:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2019, 10:58 AM

IIRC the idot patterns are looking for very specific patterns and avoiding commuting to save compile time, so maybe those just need to be updated?

nikic added a subscriber: nikic.Feb 24 2019, 6:33 AM
jonpa added a comment.Feb 25 2019, 9:13 AM

I tried this patch on SystemZ / SPEC, and as before this seems to have a relatively very minor impact on the number of files changed (7), and on the performance (seemingly unaffected).

I think the SystemZ test looks good, but I leave the final approval to Uli as usual.

I tried this patch on SystemZ / SPEC, and as before this seems to have a relatively very minor impact on the number of files changed (7), and on the performance (seemingly unaffected).

I think the SystemZ test looks good, but I leave the final approval to Uli as usual.

Ah, the SystemZ test changes certainly LGTM, and would be a welcome improvement ...

Ping.

AMDGPU comment/update on the idot tests?

Not sure what we want to do about x86 (D58703), but it's probably not important enough to hold up the general improvement?

IIRC the idot patterns are looking for very specific patterns and avoiding commuting to save compile time, so maybe those just need to be updated?

They can be updated, but with the patch these tests produced considerably more code.

llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll
237

In fact this is regression as well. A scalar operation (s_) is preferable over vector (v_).

please can you rebase this?

spatel updated this revision to Diff 218393.Sep 2 2019, 2:48 PM

Patch updated:
x86 vector sext is no longer a problem; AMDGPU appears to still show regressions.

arsenm added inline comments.Feb 13 2020, 4:57 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10842

Due to the lack of register bank awareness, the AMDGPU answer depends on whether the source node is divergent. This needs to pass the node itself, not just the opcode and type to answer properly

@spatel Is this necessary any more?

@spatel Is this necessary any more?

The SystemZ code hasn't changed, so the code is still not ideal for that target. I'm not sure what we can do about the remaining regressions. I'll rebase, so we at least have an updated view of test diffs

spatel updated this revision to Diff 333935.Mar 29 2021, 11:07 AM

Rebased (regenerated test diffs).

Re x86 - do we really want to narrow all the way to bytes?
I was under impression that wasn't a good idea.

Re x86 - do we really want to narrow all the way to bytes?
I was under impression that wasn't a good idea.

It's probably not a good idea in general, but the diffs I scanned here are all feeding into 8-bit op/store, so should be neutral.
This patch is trying to remove code and help SystemZ without hurting anything else, but maybe it's not worth the churn. We could probably add a TLI hook instead.

RKSimon added inline comments.Mar 31 2021, 2:52 AM
llvm/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll
207

This file needs (manually) regenerating so the codegen diffs are more obvious

Matt added a subscriber: Matt.Apr 13 2021, 8:07 AM
RKSimon requested changes to this revision.Apr 15 2021, 3:36 AM

Needs rebasing again and cgp-bitfield-extract.ll cleaning up to show the diffs

This revision now requires changes to proceed.Apr 15 2021, 3:36 AM

Is this still relevant?

Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2022, 1:49 PM
spatel abandoned this revision.Sep 28 2022, 2:30 PM

Is this still relevant?

Sorry - I lost track of this patch. It was trying to help SystemZ, but it got stuck on the regressions for AMDGPU, and I don't think there was a quick fix. I don't have immediate plans to work on this, so I'll abandon. Could try a SystemZ-specific transform instead?