This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Move AVG combine to SimplifyDemandBits
ClosedPublic

Authored by dmgreen on Feb 6 2022, 1:34 AM.

Details

Summary

Pulled out of D106237, this moves the matching of AVGFloor and AVGCeil into a place where demanded bits are available, so that it can detect more cases and enable more folds. It changes the transform to start from a shift, not from a truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming to ext(hadd(A, B)).
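
As an illustration of why the pattern above is safe to narrow, the following standalone C++ sketch compares the widened form with a narrow halving add for 8-bit values. It is not code from the patch: the function names are hypothetical, the 8-bit/16-bit widths are chosen only for the example, and the signed variants assume that right-shifting a negative integer is an arithmetic shift (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// Widened form, unsigned: lshr(add(zext(a), zext(b)), 1), computed in 16 bits.
inline uint16_t wideAvgFloorU(uint8_t a, uint8_t b) {
  return static_cast<uint16_t>(
      (static_cast<uint16_t>(a) + static_cast<uint16_t>(b)) >> 1);
}

// Narrow form, unsigned: a halving add that never needs the ninth bit.
// Uses the identity a + b == 2 * (a & b) + (a ^ b).
inline uint8_t haddU8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}

// Widened form, signed: ashr(add(sext(a), sext(b)), 1), computed in 16 bits.
// Assumes >> on a negative value is an arithmetic shift.
inline int16_t wideAvgFloorS(int8_t a, int8_t b) {
  return static_cast<int16_t>(
      (static_cast<int16_t>(a) + static_cast<int16_t>(b)) >> 1);
}

// Narrow form, signed: same identity, with an arithmetic shift.
inline int8_t haddS8(int8_t a, int8_t b) {
  return static_cast<int8_t>((a & b) + ((a ^ b) >> 1));
}

// Exhaustively checks that extending the narrow result matches the wide form.
inline bool formsAgree() {
  for (int a = -128; a < 128; ++a) {
    for (int b = -128; b < 128; ++b) {
      const uint8_t ua = static_cast<uint8_t>(a);
      const uint8_t ub = static_cast<uint8_t>(b);
      const int8_t sa = static_cast<int8_t>(a);
      const int8_t sb = static_cast<int8_t>(b);
      if (wideAvgFloorU(ua, ub) != static_cast<uint16_t>(haddU8(ua, ub)))
        return false;
      if (wideAvgFloorS(sa, sb) != static_cast<int16_t>(haddS8(sa, sb)))
        return false;
    }
  }
  return true;
}
```

Calling formsAgree() walks all 65536 input pairs for both signedness variants, which is the same kind of equivalence the alive2 links in this summary establish for the DAG nodes.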

For signed values, because only the bottom bits are demanded, LLVM will transform the above to use an lshr too, as opposed to an ashr. To correctly detect the hadd we need to know the demanded bits so that we can turn it back. Depending on whether the shift is signed (ashr) or logical (lshr), and whether the extensions are signed or unsigned, we can create different nodes.
If the shift is signed:
    Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd.
    Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd.
If the shift is unsigned:
    Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd.
    Needs 1 demanded bit zero and >= 2 sign bits. https://alive2.llvm.org/ce/z/hvPGxX and https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd.
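
The four conditions above can be sketched as a small decision function. This is a simplified illustration only, not the code in TargetLowering.cpp: the names pickAvgKind, ShiftKind and AvgKind are hypothetical, signBits and zeroBits stand for the minimum number of known sign bits and known leading zero bits across both add operands, and "1 demanded bit zero" is read here as the top bit of the shift result not being demanded. The zero-bit case is tested first as an assumption of this sketch, because known leading zeros also count as sign bits and the unsigned case would otherwise never be selected.

```cpp
#include <optional>

enum class ShiftKind { Ashr, Lshr };
enum class AvgKind { Signed, Unsigned };

// Hypothetical helper mirroring the conditions listed in the summary.
// Returns std::nullopt when none of the listed conditions holds.
inline std::optional<AvgKind> pickAvgKind(ShiftKind shift, unsigned signBits,
                                          unsigned zeroBits,
                                          bool topBitDemanded) {
  if (shift == ShiftKind::Ashr) {
    // Signed shift: >= 2 zero bits gives the unsigned node,
    // >= 2 sign bits gives the signed node.
    if (zeroBits >= 2)
      return AvgKind::Unsigned;
    if (signBits >= 2)
      return AvgKind::Signed;
    return std::nullopt;
  }
  // Logical shift: >= 1 zero bit gives the unsigned node.
  if (zeroBits >= 1)
    return AvgKind::Unsigned;
  // Otherwise the signed node is only usable if the top bit of the
  // result is not demanded and there are >= 2 sign bits.
  if (!topBitDemanded && signBits >= 2)
    return AvgKind::Signed;
  return std::nullopt;
}
```

For example, pickAvgKind(ShiftKind::Lshr, 2, 0, false) selects the signed node, while the same query with topBitDemanded set to true returns no match, which is the case where knowing the demanded bits makes the difference.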

Event Timeline

dmgreen created this revision. Feb 6 2022, 1:34 AM
dmgreen requested review of this revision. Feb 6 2022, 1:34 AM
Herald added a project: Restricted Project. Feb 6 2022, 1:34 AM
RKSimon added inline comments. Feb 6 2022, 3:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
914

DemandedElts?

dmgreen updated this revision to Diff 406284. Feb 6 2022, 2:23 PM

Now passes through demanded elts.

You've made this a blocker to some other patches - is it necessary or just nice to have?

llvm/test/CodeGen/AArch64/arm64-vhadd.ll
1197

regression?

You've made this a blocker to some other patches - is it necessary or just nice to have?

No, I don't think so. As with the comment in https://reviews.llvm.org/D119073#3308941, it should be OK in the other order; this is just the order I happened to have it in locally, so the tests might be different. This gives better codegen, so I would like to get it in.

llvm/test/CodeGen/AArch64/arm64-vhadd.ll
1197

The usra is generally a more expensive operation that gets split into a shift and an add on the CPUs I looked at. It will be larger for code size, but shouldn't be slower.

I left the example in because it shows some of the potential disadvantages of using sign/known bits. It's better in general, but there may be places with extra extends.

dmgreen updated this revision to Diff 407986. Feb 11 2022, 12:27 PM

Rebase, now that the base patches are in.

RKSimon accepted this revision. Feb 13 2022, 2:55 AM

LGTM

This revision is now accepted and ready to land. Feb 13 2022, 2:55 AM
This revision was landed with ongoing or failed builds. Feb 15 2022, 2:17 AM
This revision was automatically updated to reflect the committed changes.