This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering] Add SimplifyMultipleUseDemandedBits
ClosedPublic

Authored by RKSimon on Jun 13 2019, 10:02 AM.

Details

Summary

This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

  • AMDGPU unsigned dot product codegen retains an AND mask that it previously removed (but otherwise the dotproduct codegen is a lot better)
  • X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG

Both these cases look like they can be fixed up afterwards but I'd like to get code owner's feedback first.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jun 13 2019, 10:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2019, 10:02 AM
nikic added inline comments.Jun 15 2019, 3:41 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
664 ↗(On Diff #204567)

Why are there explicit SimplifyMultipleUseDemandedBits() calls for individual opcodes, rather than a general call in the !hasOneUse branch here? This is what we do in InstCombine.

RKSimon marked an inline comment as done.Jun 15 2019, 4:01 AM
RKSimon added inline comments.
lib/CodeGen/SelectionDAG/TargetLowering.cpp
664 ↗(On Diff #204567)

Because we need to regenerate a single op - afaict we'd have to refactor the entire SimplifyDemandedBits / TargetLoweringOpt system to enable us to replace individual args of a single node.

I'm ok with the X86 changes if we can fix them after.

@arsenm @nhaehnle Please can you take a look at the AMDGPU changes? @craig.topper is happy with the X86 changes (and temp regression) and I'm keen to get this + a couple of followups in before the release branch.

The AMDGPU changes seem fine to me overall.

@craig.topper Are OK with me committing this? I can then work on the remaining regressions

@craig.topper Are OK with me committing this? I can then work on the remaining regressions

Yes

This revision was not accepted when it landed; it landed in state Needs Review.Jul 23 2019, 5:42 AM
This revision was automatically updated to reflect the committed changes.