Allow to reduce redundant shift masks
OR = x1 | x2, where
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
The x2 operation can be seen as
x2 = (x >> 8) & (0xAB00 >> 8)
x2 = (x & 0xAB00) >> 8
And finally reduced to
x2 = x1 >> 8
dnsampaio on Jul 12 2018, 5:44 AM.Authored by
There are a very large number of changes, so older changes are hidden. Show Older Changes
Fixed execution costs. See below.
Sorry, I thought that it being an entire function it wouldn't matter. Lesson learned.
The problem with this canonicalization process is that, it is not certainly reducing code-size. It might be profitable in certain cases, and problematic in others. I won't take place in the decision if the normalization should be changed, as I do not have a strong opinion about it.
About using the value uses in InstCombine. Well, it wouldn't be the 1st transformation to do it. The idea to start matching all the way from the bottom, like from the "or" operation and match both sides of it could be a solution. But is much less generic. It would require to be called in all binary operation as starting point as to cover the same cases of this one.
So what do you think is the best?
Use InstCombine and start all the way from the bottom? Should it be added to all binary operations or do I leave it just in the 'or' operation?
Or leave as it is, apart from the corrections just described?
All of your test cases are rooted at an or, so it makes sense to search up from there. Why not start with searching just from or (and xors?) and then add the search from more operators in later patches?
Did most fixes. Just don't how to capture non-leaf nodes of the pattern being matched. Using other match operations would actually be more complicated than just passing the operands as arguments to the new function, now that I already know they are AND operations due the function call placement.
Moved to a separated function. Placed function call after knowing more about the operands.
As we are replacing all the uses of the DeadAnd, it is not required to create and replace the visiting Or operation. Replace all uses does the job already.
Because the pattern that we are matching is larger than it needs to be (as the comment in the test file clearly shows - there is no 'or' in the minimal pattern). This problem of trying to make everything fit in instcombine has been discussed several times on llvm-dev in the last ~year. Eg:
Did you look at (new)-GVN to see if it fits in there?
If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.
So this started life in the DAGCombiner and issues around the implementation were raised and that it would be useful to have earlier in the pipeline. But it seems that it hasn't really be thought, or discussion, about how this would fit well in the existing passes... I think DAG combine has always been the right place for this because we're trying to reuse values - something that DAGs are good for. In DAGCombiner::visitANDLike, we already handle ANDs with SRL operands and the motivating example can be addressed with very little effort:
+ uint32_t ShiftedMask = CAnd->getZExtValue() << ShiftBits; + SDNode *Res = + DAG.UpdateNodeOperands(N, N0->getOperand(0), DAG.getConstant(ShiftedMask, DL, VT)); + if (Res != N) + return DAG.getNode(ISD::SRL, DL, VT, SDValue(Res, 0), N0->getOperand(1)); + else + return SDValue(DAG.UpdateNodeOperands(N, N0, SDValue(CAnd, 0)), 0);
I don't know the cost of calling UpdateNodeOperands, potentially twice, but I feel the simplicity suggests that the transform is most suited to the DAG.
I agree that it won't handle all cases. But one will need to come with a more generic thinking as to create a new pass that handles this. Something like an abstract known bits, that tells that two values hold the same bits coming from a given instruction, or some simplification by demanded bits from the same values. It is feasible, but it is not my intention to do it so now.
I must confess that I did not quite understand all the work-flow of newGVN, but from what I did see, it mostly wouldn't fit. It seems to behave like InstCombine, expecting to replace the current instruction being visited. And it would require to create one value as to detect if there is a leader of that value and then reuse it. It is not that complicated, but quite awkward IMO.
Why? The @mulUses in the test shows the case where it does not require %x2 to have a single user. The shift operation dominates the and being replaced.
And I agree with @samparker, our motivating example is quite simple, is the @foo in our tests. I just intended to made it more generic so it would act in more shift operations, not to over complicate it. I acknowledge that doing the transformation in the IR could be good, but the DAG fits much simpler. So I really see no problems in having it both places (as in D48278). Either way, it would be nice to come to a definitive solution.
As it can be seen from the disscussion here, while the problem this is trying to solve is real,