This is an archive of the discontinued LLVM Phabricator instance.

Make cltz and cttz zero undef when the operand cannot be zero in InstCombine
ClosedPublic

Authored by deadalnix on Aug 3 2016, 12:59 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

deadalnix updated this revision to Diff 66700.Aug 3 2016, 12:59 PM
deadalnix retitled this revision from to Make cltz and cttz zero undef when the operand cannot be zero in InstCombine.
deadalnix updated this object.
deadalnix added reviewers: majnemer, spatel.
deadalnix added a subscriber: llvm-commits.
majnemer edited edge metadata.Aug 3 2016, 1:43 PM

Please split this patch into two changes.

spatel edited edge metadata.Aug 3 2016, 2:15 PM

For reference, the ctpop side of this patch is one of the missing folds noted in PR28668:
https://llvm.org/bugs/show_bug.cgi?id=28668

deadalnix updated this revision to Diff 66724.Aug 3 2016, 2:27 PM
deadalnix edited edge metadata.

Split popcount change in D23139

deadalnix updated this object.Aug 3 2016, 2:27 PM
spatel added a comment.Aug 4 2016, 4:26 PM

Just curious, is there a fold or codegen that doesn't work because of the undef?

lib/Transforms/InstCombine/InstCombineCalls.cpp
1403 ↗(On Diff #66724)

I'm not sure where we draw the line on 'auto', but I think the general preference is to use the actual type (Value *) in cases like this. At the least, this should be 'auto *' according to the coding standards.

1410 ↗(On Diff #66724)

Use 'Op0' since we have it now.

1416–1417 ↗(On Diff #66724)

I don't see anything about 'goto' in the coding standard doc, but I was trained to feel nervous anytime I see one. Make this a static helper or lambda instead?

1421–1438 ↗(On Diff #66724)

How about refactoring this whole case into a helper function since it's nearly identical to the cttz case? That would remove the question about the 'goto' too.

test/Transforms/InstCombine/intrinsics.ll
383 ↗(On Diff #66724)

You can remove the label to reuce the test and the CHECKs.

spatel added inline comments.Aug 4 2016, 4:33 PM
test/Transforms/InstCombine/intrinsics.ll
383 ↗(On Diff #66724)

typo:
"reuce" -> "reduce"

@spatel , the undef version lead to much better codegen on X86 at least, because bsf/bsr instruction are undefined when the operand is 0.

deadalnix updated this revision to Diff 66906.Aug 4 2016, 10:26 PM

Address @spatel comments.

spatel added inline comments.Aug 5 2016, 7:16 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66906)

Can't this function simplify to one line of code?

II->setArgOperand(1, ConstantInt::getTrue(II->getContext()));

Ie, you don't even need to check the current value of the undef operand. If it's already true, setting it again does nothing.

1414–1452 ↗(On Diff #66906)

I was hoping that you could refactor this whole thing, so there's no code duplication.

On 2nd look, I see that the existing fold is actually in the wrong place. Since we're able to fold to a constant, I think it belongs in InstSimplify. Can you fix that or add a 'FIXME' comment?

I'd still prefer that you refactor this ahead of the functional change. If my first comment about makeCttzCtlzZeroUndef() being one line of code is correct, then it's still just one helper function for this whole thing.

Let me know if I'm not seeing this correctly.

test/Transforms/InstCombine/intrinsics.ll
406 ↗(On Diff #66906)

Remove unnecessary label.

deadalnix added inline comments.Aug 5 2016, 10:36 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66906)

No, you want to patch this only once, not every time, or it'll never terminate.

deadalnix added inline comments.Aug 5 2016, 10:39 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1414–1452 ↗(On Diff #66906)

The bitmask manipulation previous this, while following a similar pattern, are different for cttz and ctlz.

This is NOT constant folding in any way. It is making cttz and/or ctlz undefined for 0 when possible as it lead to better codegen, at least on X86. No the function cannot be simplified to a one liner.

deadalnix updated this revision to Diff 66989.Aug 5 2016, 12:04 PM

Remove entry label.

spatel added inline comments.Aug 5 2016, 1:32 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1414–1452 ↗(On Diff #66989)

I've drafted what I tried to suggest here as D23223. Let me know if that makes sense.

deadalnix added inline comments.Aug 5 2016, 1:48 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1414–1452 ↗(On Diff #66989)

I don't think it really makes the code better. There are more branches and names become ambiguous as they can represent 2 things. Also, this is kind of out of the scope of this diff.

spatel added inline comments.Aug 5 2016, 2:05 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66989)

OK, then:

if (match(II->getArgOperand(1), m_Zero()))
  II->setArgOperand(1, ConstantInt::getTrue(II->getContext()));
deadalnix added inline comments.Aug 5 2016, 2:51 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66989)

I'd rather match everything that is NOT true, just in case.

spatel added inline comments.Aug 5 2016, 4:00 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66989)

Sorry - I don't see the difference between the 'm_Zero' version and this?

Note that I checked in the refactoring in rL277883. We have too many LLVM bugs due to code duplication, and this function may continue to grow over time.

deadalnix added inline comments.Aug 8 2016, 12:53 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1240–1253 ↗(On Diff #66989)

Alright, I'll rebase on top of it.

Not all value are true or false, so matching what is NOT true is not the same as matching false.

deadalnix updated this revision to Diff 67132.Aug 8 2016, 1:36 AM

rebase on top of @spatel changes

spatel added inline comments.Aug 8 2016, 7:19 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1283–1296 ↗(On Diff #67132)

Sorry for still not seeing it...can you add test case to show the difference?

AFAICT, the undef param of these intrinsics must be an i1 true/false constant?

case Intrinsic::ctlz:  // llvm.ctlz
case Intrinsic::cttz:  // llvm.cttz
  Assert(isa<ConstantInt>(CS.getArgOperand(1)),
         "is_zero_undef argument of bit counting intrinsics must be a "
         "constant int",
         CS);
deadalnix added inline comments.Aug 8 2016, 8:19 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1283–1296 ↗(On Diff #67132)

I has no idea there was an assertion to make sure of this. I'd still go for the logic "if it is anything but true, make it true" rather than it "if it false, make it true".

I has no idea there was an assertion to make sure of this. I'd still go for the logic "if it is anything but true, make it true" rather than it "if it false, make it true".

I think that's fair - how about this:

if (KnownOne != 0 || isKnownNonZero(Op0, IC.getDataLayout()))
  if (!match(II.getArgOperand(1), m_One()))
    II.setArgOperand(1, ConstantInt::getTrue(II.getContext()));

But I think we need a comment to explain the transform and motivation. Something like

// If the input to cttz/ctlz is known to be non-zero, 
// then change the 'ZeroIsUndef' parameter to 'true'
// because we know the zero behavior can't affect the result.
deadalnix updated this revision to Diff 67463.Aug 9 2016, 10:26 PM

Use match and add comment that explain the transformation.

majnemer added inline comments.Aug 9 2016, 11:37 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1143 ↗(On Diff #67463)

I'd use IRBuilder's getTrue here.

deadalnix updated this revision to Diff 67479.Aug 10 2016, 1:08 AM

Use getTrue

spatel added inline comments.Aug 10 2016, 1:26 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1141 ↗(On Diff #67479)

Shouldn't this return the updated instruction to signal to the callers that a change was made? If we do that, then we don't need to explicitly add to the worklist either, right?

deadalnix added inline comments.Aug 11 2016, 2:23 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1141 ↗(On Diff #67479)

No as we don't want to go though replacing the instruction by itself and we especially don't want to replace false by true in the program.

No as we don't want to go though replacing the instruction by itself and we especially don't want to replace false by true in the program.

I don't understand the 2nd comment. For the first, I'm not suggesting that you replace the instruction by itself.
Looking around InstCombine, the common pattern I see is:

I.setOperand(1, X);
return &I;

This adheres to the 'visit' function comments:

// Visitation implementation - Implement instruction combining for different
// instruction types.  The semantics are as follows:
// Return Value:
//    null        - No change was made
//     I          - Change was made, I is still valid, I may be dead though
//   otherwise    - Change was made, replace I with returned instruction

I think we'd have to change the helper function signature to return an Instruction* for this to work, so I'll defer to @majnemer on whether the code here is an acceptable substitute. I have no further comments for this patch.

deadalnix updated this revision to Diff 67974.Aug 14 2016, 2:37 AM

Return the instruction instead of putting it in the worklist manually.

spatel added inline comments.Aug 15 2016, 8:11 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1473–1475 ↗(On Diff #67974)

In the case where no change is made, we want to continue on in this function:

if (Instruction *I = foldCttzCtlz(*II, *this))
  return I;
break;
deadalnix updated this revision to Diff 68348.Aug 17 2016, 7:34 AM

Continue if no change is made.

spatel accepted this revision.Aug 17 2016, 7:50 AM
spatel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Aug 17 2016, 7:50 AM
This revision was automatically updated to reflect the committed changes.