This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Add vector demanded elements support to computeKnownBits
ClosedPublic

Authored by RKSimon on Oct 17 2016, 12:16 PM.

Details

Summary

Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

I could add DemandedElts support to computeKnownBitsForTargetNode in this patch if required (this could be useful for x86 target shuffles...).

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 74885.Oct 17 2016, 12:16 PM
RKSimon retitled this revision from to [DAGCombiner] Add vector demanded elements support to computeKnownBits.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.

Maybe it would make more sense to do this in the IR version of computeKnownBits first? We should prefer to perform optimizations on IR where possible.

I think this approach makes sense; we don't want to track each individual bit of a very wide vector.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2101

Need to check that SubIdx is less than the number of elements in the vector.

2549

You need to check whether ConstEltNo is less than the number of elements in the vector.

RKSimon updated this revision to Diff 74961.Oct 18 2016, 3:12 AM

Added the asserts suggested by Eli.

Improved VECTOR_SHUFFLE demanded elts handling - as long as the shuffle mask element isn't UNDEF then we can use the input demanded elts. This might be too pessimistic and we should be able to ignore UNDEFs as long as we don't need the element.

I'll investigate the InstCombine version - I don't know as much about the value tracking code so it might take a little longer.

efriedma added inline comments.Oct 18 2016, 11:24 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2550

Are your sure it isn't possible to trigger this assertion? I think DAGCombine will delete an extractelement with an out-of-range index, but you can't depend on that happening before this code runs.

RKSimon added inline comments.Oct 18 2016, 3:54 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2550

OK thanks - I misunderstood your request for a check - I'll fix it.

zvi added a subscriber: zvi.Oct 18 2016, 11:06 PM
RKSimon updated this revision to Diff 75123.Oct 19 2016, 4:17 AM

Improved index checking for EXTRACT_SUBVECTOR and EXTRACT_VECTOR_ELT.

Use APInt::[] bit getter instead of doing it manually.

Use APInt::setBit bit setter instead of doing it manually.

zvi added a comment.Oct 23 2016, 5:59 AM

I'll investigate the InstCombine version - I don't know as much about the value tracking code so it might take a little longer.

At the moment, instcombine can't handle cases such as:

define <4 x i32> @all_zeros_1(i32 %a) {
  %u = insertelement <4 x i32> <i32 undef, i32 0, i32 0, i32 0>, i32 %a, i32 0
  %l = and <4 x i32> %u, <i32 0, i32 -1, i32 -1, i32 -1>
  ret <4 x i32> %l
}

or

define <4 x i32> @redundant_and(i32 %a) {
  %u = insertelement <4 x i32> <i32 undef, i32 0, i32 0, i32 0>, i32 %a, i32 0
  %l = and <4 x i32> %u, <i32 -1, i32 0, i32 0, i32 0>
  ret <4 x i32> %l
}

It would be nice if both instcombine and dagcombine shared the same DemandedBits infrastructure, though i don't have a plan to achieve this.

In D25691#577325, @zvi wrote:

It would be nice if both instcombine and dagcombine shared the same DemandedBits infrastructure, though i don't have a plan to achieve this.

Agreed, we should be trying to add similar features were possible. After this patch and we've matured this a little further on the DAGCombiner side implementing an InstCombine equivalent should be quite a bit quicker.

bjope added a subscriber: bjope.Oct 24 2016, 12:50 PM
bjope added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2057

Shouldn't we remove this TODO now? Or is that a different idea about the possibility to fetch a set of individual known bits for each element in one request?
If originating value is a vector, then the caller could call computeKnownBits for one demanded element at a time. Doing it for all elements in one go will be compilcated since the recursion could be different for different elements.

2530

I guess this comment needs to be revised, or maybe removed.
And at least the TODO should be removed.

2554

I suggest that you explicitly show that a decision was made to not track a specific element here. And maybe more importantly show that we can not use the original DemandedElts in the recursive call by selecting a new bit mask.
This could be done by making the recursion like this:

computeKnownBits(InVec, KnownZero, KnownOne, APInt::getAllOnesValue(VecVT.getVectorNumElements()), Depth + 1);
2561

Any reason for not providing the DemandedElts in this recursive call?

bjope added inline comments.Oct 24 2016, 1:03 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2561

Sorry, I just realised that BSWAP wasn't the only operation not providing DemandedElts in the recursive call.
So I have no special interest in BSWAP. But I guess that there is a general TODO to pass DemandedElts in many more recursive calls.

RKSimon added inline comments.Oct 24 2016, 1:19 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2057

What that TODO is talking about is returning separate KnownZero and KnownOne values for every vector element and not combining them down to the 'shared' bits. Something that could end up requiring a significant refactor of this function and every caller.

This TODO was written when I still thought that would be necessary, but I haven't been able to make a strong enough case for that approach yet and just using the shared bits of demanded elts might be satisfactory for the cases I have so far.

2530

No problem.

2554

This is what the computeKnownBits without the DemandedElts argument now does. I'll tweak the comments to make this clearer.

2561

As I mentioned in the summary, I've only added support for a few opcodes so far (the ones that have proven straightforward to test). The hope is that others can be added as they are required. I'd prefer to just add a TODO to the top of the function and to every outstanding opcode, unless you think it necessary?

RKSimon updated this revision to Diff 75641.Oct 24 2016, 1:27 PM

Updated based on Bjorn's comments

bjope added a comment.Oct 25 2016, 1:41 AM

No further comments from me.

But I agree with earlier comments that it could be nice if the computeKnownBits (and probably also computeNumSignBits) in both InstCombine and SelectionDAG could share same infrastructure. I guess that sometimes we want to do similar updates to those methods, and that is probably easier if they at least are kept fairly in sync about how they work and what operations they can handle.
But changing everything in one commit is not so wise either. So this commit looks like a good first step when it comes to adding support for "demended elts".

bjope accepted this revision.Oct 27 2016, 6:59 AM
bjope added a reviewer: bjope.
bjope edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 27 2016, 7:01 AM
This revision was automatically updated to reflect the committed changes.