This is an archive of the discontinued LLVM Phabricator instance.

Differential D103788

[InstCombine] Eliminate casts to optimize ctlz operation
ClosedPublic

Authored by datta.nagraj on Jun 6 2021, 11:53 PM.

Details

Reviewers

RKSimon
xbolva00
spatel
lebedev.ri

Commits

rGad0085d3381a: [InstCombine] Eliminate casts to optimize ctlz operation

Summary

If a ctlz operation is performed on a wider type and its result is then
truncated back, this can be optimized by performing the ctlz on the
narrower type and adding the bit-width difference to its result, which
produces the same output:

https://alive2.llvm.org/ce/z/8uup9M

The original problem is shown in
https://llvm.org/PR50173
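
The identity described in the summary can be sketched in plain C++ for the i16/i32 case. This is only an illustration, not code from the patch: the helper names `ctlzBits`, `wideCtlzThenTrunc`, `narrowCtlzPlusDiff` and `sequencesAgree` are made up here, and ctlz is modelled with a zero input returning the bit width.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of `value` viewed as a `width`-bit integer
// (1 <= width <= 64). Returns `width` when the value is zero.
unsigned ctlzBits(uint64_t value, unsigned width) {
  unsigned count = 0;
  for (unsigned bit = width; bit-- > 0;) {
    if ((value >> bit) & 1)
      break;
    ++count;
  }
  return count;
}

// Original sequence: zext i16 -> i32, ctlz on i32, trunc -> i16.
uint16_t wideCtlzThenTrunc(uint16_t x) {
  uint32_t z = x;                   // zext
  uint32_t p = ctlzBits(z, 32);     // ctlz on the wide type
  return static_cast<uint16_t>(p);  // trunc
}

// Rewritten sequence: ctlz on i16, then add the width difference (32 - 16).
uint16_t narrowCtlzPlusDiff(uint16_t x) {
  return static_cast<uint16_t>(ctlzBits(x, 16) + (32 - 16));
}

// Compare both sequences over all 2^16 possible inputs.
bool sequencesAgree() {
  for (uint32_t v = 0; v <= 0xFFFF; ++v) {
    uint16_t x = static_cast<uint16_t>(v);
    if (wideCtlzThenTrunc(x) != narrowCtlzPlusDiff(x))
      return false;
  }
  return true;
}
```

Calling `sequencesAgree()` checks only this one pair of widths; the Alive2 link above is the actual evidence cited in the review.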

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

datta.nagraj created this revision. Jun 6 2021, 11:53 PM

Herald added a subscriber: hiraditya. Jun 6 2021, 11:53 PM

datta.nagraj requested review of this revision. Jun 6 2021, 11:53 PM

Herald added a project: Restricted Project. Jun 6 2021, 11:53 PM

Herald added a subscriber: llvm-commits.

datta.nagraj edited reviewers, added: RKSimon, xbolva00; removed: simon, david. Jun 6 2021, 11:55 PM

RKSimon added reviewers: lebedev.ri, spatel. Jun 7 2021, 1:17 AM

RKSimon added inline comments.

llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
11	Drop the ;
13	please can you use descriptive test names - @src0 etc. don't lend themselves to searching (or when variant tests get added in the middle of the file...). @trunc_ctlz_zext_i32 etc. would be better
71	We need vector type coverage as well - at least fixed vectors; scalable vectors as well would be a bonus:
	declare <2 x i32> @llvm.ctlz.v2i32 (<2 x i32>, i1)
	declare <vscale x 2 x i32> @llvm.ctlz.nxv2i32 (<vscale x 2 x i32>, i1)

RKSimon added inline comments. Jun 7 2021, 1:18 AM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
954	We will need to add multiuse tests as well to check that the m_OneUse is working correctly

Harbormaster completed remote builds in B107911: Diff 350181. Jun 7 2021, 1:22 AM

datta.nagraj updated this revision to Diff 350236. Jun 7 2021, 4:04 AM

Updated the unit tests

Added more tests as per review comments.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
954	Done. Added 3 tests at the end which have multiple uses.
llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
71	Done. Added tests with vector and scalable vectors.

datta.nagraj marked 2 inline comments as done. Jun 7 2021, 4:09 AM

Harbormaster completed remote builds in B107952: Diff 350236. Jun 7 2021, 4:37 AM

@RKSimon I have addressed the review comments Sir, please have a look.

spatel added inline comments. Jun 7 2021, 11:55 AM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
959	We usually prefer to do something like: Value *NarrowCtlz = Builder.CreateIntrinsic(...); return BinaryOperator::CreateAdd(NarrowCtlz, WidthDiff); The instcombine caller function then handles the replace uses and transfers the name of the existing value to the new value, so you don't have to do that explicitly. You should see a cosmetic (but not functional) difference if you regenerate the CHECK lines in the test files with that change.

Remove the manual replaceAllUses and let LLVM handle it

datta.nagraj marked an inline comment as done. Jun 7 2021, 10:07 PM

datta.nagraj added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
959	Done. Actually, I had referred to the visitAdd function to see how to match intrinsic patterns, and there they were using replaceInstUsesWith, but the approach you suggested looks cleaner. Made the changes.

datta.nagraj marked an inline comment as done. Jun 7 2021, 10:08 PM

Harbormaster completed remote builds in B108128: Diff 350489. Jun 7 2021, 10:23 PM

Clang Format

Harbormaster completed remote builds in B108136: Diff 350498. Jun 7 2021, 11:10 PM

spatel added inline comments. Jun 8 2021, 10:41 AM

llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
53	There are a lot of tests here that don't provide much extra coverage. These are only changing the types to confirm that we get the add constant correct? I think we can verify that with 2 tests of varying types (including a vector type), but we can do better by using a weird type (for example 3 x i17). We can also get more value by varying the 2nd parameter (make it `true` half of the time?).
215	Typically, we make the extra use minimal and more plausible by adding a `call void @use(i32 %p)` instruction. Also, what happens if the zext has an extra use?

vdsered added a subscriber: vdsered. Jun 8 2021, 7:28 PM

Updated unit test and added condition for multiple use check for zext

@spatel Addressed review comments.

llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
53	Removed redundant tests. (Yes, they were checking for the add constant) Added weird type tests. Varied the 2nd parameter as true for half of them.
215	Made use of the function call. That's a nice idea, didn't think of that. Added condition check for extra use of zext and added a test case for that as well.

datta.nagraj marked 2 inline comments as done. Jun 8 2021, 10:19 PM

Harbormaster completed remote builds in B108336: Diff 350779. Jun 8 2021, 10:54 PM

spatel mentioned this in rG9eef6e39816a: [InstCombine] add tests for casts-around-ctlz; NFC. Jun 9 2021, 8:24 AM

spatel added inline comments. Jun 9 2021, 8:41 AM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
955	I don't think we need this one-use check on the zext. Usually, we say that if we are not increasing the instruction count, but we are reducing the sequence of computation, it's ok to do the transform. This will be easier to see if we commit the tests with the baseline CHECKs, so I pushed those to main: 9eef6e39816a. Please rebase and regenerate the CHECK lines (and see if it's ok to remove the 2nd one-use check).

I think this is being approached from the wrong angle.
You currently transform trunc(ctlz(zext(A))) --> add(ctlz(A), bitwidth(zext(A)) - bitwidth(A)),
but why does that trunc matter?
Wouldn't it make more sense to view this as ctlz(zext(A)) --> add(ctlz(A), bitwidth(zext(A)) - bitwidth(A))?

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
954	Are you not missing a check that types of `A` and `Trunc` match?

This revision now requires changes to proceed. Jun 9 2021, 9:31 AM

@lebedev.ri In this opt: ctlz(zext(A)) --> add(ctlz(A), bitwidth(zext(A)) - bitwidth(A))

I think we need to look at the trunc because the ctlz on the LHS operates on the wider datatype while the one on the RHS operates on the narrower datatype.
If we pass the output of the RHS to the trunc, that would be wrong, as the add and the trunc would then have the same type. So we do need to remove the trunc, don't we?

Am I missing something here? Please suggest.

Right, it of course should be https://alive2.llvm.org/ce/z/wuRBBs

Note that this only works if the scalar bit width of original un-extended %x is at least 6: https://alive2.llvm.org/ce/z/5ThC64
so i think we actually want to do https://alive2.llvm.org/ce/z/ZtZm4w

@lebedev.ri Sir, but if we don't take trunc into consideration, then how are we optimizing this?
Earlier it was 2 insts (zext - ctlz), and now it's 3 insts (ctlz - add - zext). I agree that the ctlz and add are done on a lower datatype.
If we consider trunc as well, then we can convert 3 insts (zext - ctlz - trunc) to 2 insts (ctlz - add).
Shouldn't we do this opt only when the trunc is present?

In general - yes, in instcombine we should not increase the instruction count,
however as it was already hinted by @spatel, this seems like a rare edge case
where we should be okay with that. (unless @spatel disagrees?)

This could go either way, but I'd prefer that we have this more conventional (don't increase instructions) transform as a first step. If there's evidence that we benefit from the more general (no trunc required) transform, then we can always follow-up.
Definitely agree that we need to check for matching source and destination types to make this patch sound (and add a negative test to verify that).

Check if size of zext src and trunc dst are same and above 5 bits

datta.nagraj marked 2 inline comments as done. Jun 10 2021, 8:12 AM

datta.nagraj added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
954	Added the check and added a negative test for that as well.
955	Rebased on top of your change. Yes, seems that we don't need the second check. The test looks fine even without the second check.

datta.nagraj marked 2 inline comments as done. Jun 10 2021, 8:12 AM

Added a check to see that the source and destination types match. Added the negative test as well to verify that.
Also added a check to do this opt only for widths greater than 5, as @lebedev.ri Sir has pointed out that this won't work for bitwidths below 6 and would require an additional zext.

Sorry to ask this here, but this is my first time here and I have a basic question: who is supposed to mark the review comments as done, the developer or the reviewer? (I am really confused about this.)

Harbormaster completed remote builds in B108630: Diff 351181. Jun 10 2021, 8:42 AM

If we look at the sequence with trunc, then preconditions are different.
I'm not sure if we have *some* problematic combination of src bitwidth and bitwidth increase,
but the hard cut-off that is currently there is not correct.

I am currently not sure of what to do here Sir, also I have no clue as to why some unit tests are failing, because they don't even contain trunc inst, and don't even hit my changes.

It seems like we could go wrong if the narrow (destination) type can't hold at least the bitwidth of the wide type -- that's what we saw in the Alive2 example of the transform without the trunc -- but I haven't come up with an example where it fails when we have the trunc:
https://alive2.llvm.org/ce/z/89_wJb

I am currently not sure of what to do here Sir, also I have no clue as to why some unit tests are failing, because they don't even contain trunc inst, and don't even hit my changes.

I don't understand this comment. What is failing?

Add a check to see that the source and dst size are same.

Removed the hard cut off check. Please review Sir.

Harbormaster completed remote builds in B108744: Diff 351340. Jun 10 2021, 9:27 PM

The build had failed with that last commit, it might have been due to some other commit in main. The build is passing now after I did rebase.

The hard check was definitely not right, but have you proven that a log2-of-bitwidth check is not necessary? The original Alive allowed for that kind of proof based on value type widths.
If we can't prove it, then I would include that check in this patch to be safe.
I don't know what the motivating source code looks like. It seems unlikely that we would ever truncate to a type smaller than log2 of the wide type in real code, but we can't rule it out, so it still deserves a regression test.

Hi @spatel Sir, doesn't https://alive2.llvm.org/ce/z/89_wJb prove that this works even for a type smaller than log2 of the wide type? I can add the same test to the unit tests, but I am in doubt whether to add it as a check, since the example here shows that it works for a bitwidth difference of more than log2 as well. I tested with sizes 63 and 2 as well, and that works too. Please suggest whether we need to add the log2 check, since these examples work fine.

Add a test with bitwidth difference of more than log2 between zext src and trunc dst

That test proves that the transform is correct for that pair of types exactly. It does not prove that the transform is correct for all pairs of types. If you can prove *why* truncating the addition constant will always work, then we can proceed without the additional check. If not, please add the extra check and mark the test with a 'TODO' comment that references what we have discussed here.

I'd rather be safe than cause a miscompile on some pair of types that we did not think of (and then potentially have the whole patch reverted). :)
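
The distinction between checking one pair of types and checking many can be illustrated with a small brute-force sketch in plain C++. This is not part of the patch and is not the proof asked for above: it only samples small width pairs, and the names `leadingZerosN`, `truncTo`, `transformHolds` and `allPairsHold` are hypothetical. It models ctlz with a zero input returning the bit width, and wraps both the add and its constant to the narrow width.

```cpp
#include <cassert>
#include <cstdint>

// Leading zeros of `value` viewed as a `width`-bit integer (1 <= width <= 64).
unsigned leadingZerosN(uint64_t value, unsigned width) {
  unsigned count = 0;
  for (unsigned bit = width; bit-- > 0;) {
    if ((value >> bit) & 1)
      break;
    ++count;
  }
  return count;
}

// Keep only the low `width` bits, modelling trunc to iN (width < 64).
uint64_t truncTo(uint64_t value, unsigned width) {
  return value & ((uint64_t{1} << width) - 1);
}

// For every value of a `narrow`-bit input, compare
//   trunc(ctlz_wide(zext(x)))  with  ctlz_narrow(x) + (wide - narrow),
// where the add and its constant are both wrapped to `narrow` bits.
bool transformHolds(unsigned narrow, unsigned wide) {
  uint64_t diff = truncTo(wide - narrow, narrow);
  for (uint64_t x = 0; x < (uint64_t{1} << narrow); ++x) {
    uint64_t before = truncTo(leadingZerosN(x, wide), narrow);
    uint64_t after = truncTo(leadingZerosN(x, narrow) + diff, narrow);
    if (before != after)
      return false;
  }
  return true;
}

// Check every pair with 1 <= narrow < wide <= maxWide.
bool allPairsHold(unsigned maxWide) {
  for (unsigned wide = 2; wide <= maxWide; ++wide)
    for (unsigned narrow = 1; narrow < wide; ++narrow)
      if (!transformHolds(narrow, wide))
        return false;
  return true;
}
```

For example, `allPairsHold(12)` covers every pair up to 12 bits, and `transformHolds(2, 63)` covers the 63/2 case mentioned earlier. The cost grows as 2^narrow, so this kind of sampling is practical only for small narrow widths.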

Harbormaster completed remote builds in B108817: Diff 351451. Jun 11 2021, 8:36 AM

Do the opt only if the bitwidth of the zext source is greater than log2 of the bitwidth of the zext destination

Done. Agree to all your points Sir. :)

datta.nagraj added inline comments. Jun 12 2021, 1:25 AM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
957	I am not sure if the 32 should be hardcoded here, or there is some other way for creating the APInt out of SrcWidth. Please guide here sir. @spatel

Harbormaster completed remote builds in B108960: Diff 351644. Jun 12 2021, 2:02 AM

spatel added inline comments. Jun 13 2021, 4:59 AM

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
957	You don't need to create an APInt; see "Log2_32()" in MathExtras.h.
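
As a rough illustration of the guard this suggestion leads to (`AWidth == DestWidth && AWidth > Log2_32(SrcWidth)` in the final diff), here is a standalone sketch. `log2Floor32` is a stand-in written for this example and is assumed to match `llvm::Log2_32` only for non-zero inputs; `shouldNarrowCtlz` is a hypothetical helper name, not a function in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::Log2_32: floor(log2(value)), valid here for value > 0.
unsigned log2Floor32(uint32_t value) {
  unsigned result = 0;
  while (value >>= 1)
    ++result;
  return result;
}

// Hypothetical helper mirroring the condition in the patch:
//   aWidth    - scalar width of the zext source A
//   destWidth - scalar width of the trunc destination
//   srcWidth  - scalar width of the wide type fed to ctlz
bool shouldNarrowCtlz(unsigned aWidth, unsigned destWidth, unsigned srcWidth) {
  return aWidth == destWidth && aWidth > log2Floor32(srcWidth);
}
```

With these definitions, i16 through i32 passes the guard (16 > 5), while i3 through i34 does not (3 is not greater than 5), and a zext source of i8 with a trunc destination of i16 is rejected because the types differ.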

Use Log2_32 function instead of APInt

datta.nagraj marked an inline comment as done. Jun 13 2021, 6:31 AM

datta.nagraj added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
957	Done.

datta.nagraj marked an inline comment as done. Jun 13 2021, 6:31 AM

Harbormaster completed remote builds in B109013: Diff 351716. Jun 13 2021, 7:03 AM

LGTM, but let's see if there are any more comments or concerns.

spatel added inline comments. Jun 13 2021, 4:14 PM

llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll
71	This comment should be removed or updated. The transform was enabled for this case.

Update comment for multiple use of zext test case

datta.nagraj marked an inline comment as done. Jun 13 2021, 7:01 PM

Harbormaster completed remote builds in B109041: Diff 351763. Jun 13 2021, 7:36 PM

@lebedev.ri @RKSimon - any more comments?

ping

Seems ok to push this. @datta.nagraj - do you have commit access?

@spatel I am not sure Sir. "arc land" gives me the below message, looks like the review needs to be in accepted state before committing:

 <!> 1 REVISION(S) ARE NOT ACCEPTED 
You are landing 1 revision(s) which are not in state "Accepted", indicating
that they have not been accepted by reviewers. Normally, you should land
changes only once they have been accepted. These revisions are in the wrong
state:

  *   D103788 [InstCombine] Eliminate casts to optimize ctlz operation
        Status: Needs Review

 >>>  Land 1 revision(s) in the wrong state? [y/N/?] N
 ---  User aborted the workflow.

If you don't know if you have commit access, then you probably do not have it. :)
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

For the patch state, ping @lebedev.ri to see if this is ok now.

lebedev.ri resigned from this revision. Jun 22 2021, 11:01 AM

This revision is now accepted and ready to land. Jun 22 2021, 11:01 AM

Yes Sir, I don't have commit access:

remote: Permission to llvm/llvm-project.git denied to dattanagraj.

@spatel Can you please commit it for me Sir. The patch is now in accepted state.

Closed by commit rGad0085d3381a: [InstCombine] Eliminate casts to optimize ctlz operation (authored by datta.nagraj, committed by spatel). Jun 23 2021, 8:19 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGad0085d3381a: [InstCombine] Eliminate casts to optimize ctlz operation.

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp	13 lines
llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll	58 lines

Diff 353991

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp

@@ ... @@ if (match(Src, m_OneUse(m_c_Or(m_LShr(m_Value(X), m_Constant(C)),
       Constant *One = ConstantInt::get(SrcTy, APInt(SrcWidth, 1));
       Constant *MaskC = ConstantExpr::getShl(One, C);
       MaskC = ConstantExpr::getOr(MaskC, One);
       Value *And = Builder.CreateAnd(X, MaskC);
       return new ICmpInst(ICmpInst::ICMP_NE, And, Zero);
     }
   }
 
-  Value *A;
+  Value *A, *B;
   Constant *C;
   if (match(Src, m_LShr(m_SExt(m_Value(A)), m_Constant(C)))) {
     unsigned AWidth = A->getType()->getScalarSizeInBits();
     unsigned MaxShiftAmt = SrcWidth - std::max(DestWidth, AWidth);
     auto *OldSh = cast<Instruction>(Src);
     bool IsExact = OldSh->isExact();
 
     // If the shift is small enough, all zero bits created by the shift are
@@ ... @@ if (SrcWidth % DestWidth == 0) {
 
       auto *BitCastTo =
           VectorType::get(DestTy, BitCastNumElts, VecElts.isScalable());
       Value *BitCast = Builder.CreateBitCast(VecOp, BitCastTo);
       return ExtractElementInst::Create(BitCast, Builder.getInt32(NewIdx));
     }
   }
 
+  // trunc (ctlz_i32(zext(A), B) --> add(ctlz_i16(A, B), C)
+  if (match(Src, m_OneUse(m_Intrinsic<Intrinsic::ctlz>(m_ZExt(m_Value(A)),
+                                                       m_Value(B))))) {
+    unsigned AWidth = A->getType()->getScalarSizeInBits();
+    if (AWidth == DestWidth && AWidth > Log2_32(SrcWidth)) {
+      Value *WidthDiff = ConstantInt::get(A->getType(), SrcWidth - AWidth);
+      Value *NarrowCtlz =
+          Builder.CreateIntrinsic(Intrinsic::ctlz, {Trunc.getType()}, {A, B});
+      return BinaryOperator::CreateAdd(NarrowCtlz, WidthDiff);
+    }
+  }
   return nullptr;
 }
 
 /// Transform (zext icmp) to bitwise / integer operations in order to
 /// eliminate it. If DoTransform is false, just test whether the given
 /// (zext icmp) can be transformed.
 Instruction *InstCombinerImpl::transformZExtICmp(ICmpInst *Cmp, ZExtInst &Zext,
                                                  bool DoTransform) {

llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S \| FileCheck %s			; RUN: opt < %s -instcombine -S \| FileCheck %s

				declare i3 @llvm.ctlz.i3 (i3 , i1)
	declare i32 @llvm.ctlz.i32 (i32, i1)			declare i32 @llvm.ctlz.i32 (i32, i1)
				declare i34 @llvm.ctlz.i34 (i34, i1)
	declare <2 x i33> @llvm.ctlz.v2i33 (<2 x i33>, i1)			declare <2 x i33> @llvm.ctlz.v2i33 (<2 x i33>, i1)
	declare <2 x i32> @llvm.ctlz.v2i32 (<2 x i32>, i1)			declare <2 x i32> @llvm.ctlz.v2i32 (<2 x i32>, i1)
	declare <vscale x 2 x i64> @llvm.ctlz.nxv2i64 (<vscale x 2 x i64>, i1)			declare <vscale x 2 x i64> @llvm.ctlz.nxv2i64 (<vscale x 2 x i64>, i1)
	declare <vscale x 2 x i63> @llvm.ctlz.nxv2i63 (<vscale x 2 x i63>, i1)			declare <vscale x 2 x i63> @llvm.ctlz.nxv2i63 (<vscale x 2 x i63>, i1)
	declare void @use(<2 x i32>)			declare void @use(<2 x i32>)
				RKSimonUnsubmitted Done Reply Inline Actions Drop the ; RKSimon: Drop the ;
	declare void @use1(<vscale x 2 x i63>)			declare void @use1(<vscale x 2 x i63>)

				RKSimonUnsubmitted Done Reply Inline Actions please can you use descriptive test names - @src0 etc. don't give themselves to searching (or when variant tests get added in the middle of the file...). @trunc_ctlz_zext_i32 etc. would be better RKSimon: please can you use descriptive test names - @src0 etc. don't give themselves to searching (or…
	define i16 @trunc_ctlz_zext_i16_i32(i16 %x) {			define i16 @trunc_ctlz_zext_i16_i32(i16 %x) {
	; CHECK-LABEL: @trunc_ctlz_zext_i16_i32(			; CHECK-LABEL: @trunc_ctlz_zext_i16_i32(
	; CHECK-NEXT: [[Z:%.]] = zext i16 [[X:%.]] to i32			; CHECK-NEXT: [[TMP1:%.]] = call i16 @llvm.ctlz.i16(i16 [[X:%.]], i1 false), !range [[RNG0:![0-9]+]]
	; CHECK-NEXT: [[P:%.*]] = call i32 @llvm.ctlz.i32(i32 [[Z]], i1 false), !range [[RNG0:![0-9]+]]			; CHECK-NEXT: [[ZZ:%.*]] = add nuw nsw i16 [[TMP1]], 16
	; CHECK-NEXT: [[ZZ:%.*]] = trunc i32 [[P]] to i16
	; CHECK-NEXT: ret i16 [[ZZ]]			; CHECK-NEXT: ret i16 [[ZZ]]
	;			;
	%z = zext i16 %x to i32			%z = zext i16 %x to i32
	%p = call i32 @llvm.ctlz.i32(i32 %z, i1 false)			%p = call i32 @llvm.ctlz.i32(i32 %z, i1 false)
	%zz = trunc i32 %p to i16			%zz = trunc i32 %p to i16
	ret i16 %zz			ret i16 %zz
	}			}

	; Fixed vector case			; Fixed vector case

	define <2 x i8> @trunc_ctlz_zext_v2i8_v2i33(<2 x i8> %x) {			define <2 x i8> @trunc_ctlz_zext_v2i8_v2i33(<2 x i8> %x) {
	; CHECK-LABEL: @trunc_ctlz_zext_v2i8_v2i33(			; CHECK-LABEL: @trunc_ctlz_zext_v2i8_v2i33(
	; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i33>			; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i8> @llvm.ctlz.v2i8(<2 x i8> [[X:%.*]], i1 true)
	; CHECK-NEXT: [[P:%.*]] = call <2 x i33> @llvm.ctlz.v2i33(<2 x i33> [[Z]], i1 true)			; CHECK-NEXT: [[ZZ:%.*]] = add nuw nsw <2 x i8> [[TMP1]], <i8 25, i8 25>
	; CHECK-NEXT: [[ZZ:%.*]] = trunc <2 x i33> [[P]] to <2 x i8>
	; CHECK-NEXT: ret <2 x i8> [[ZZ]]			; CHECK-NEXT: ret <2 x i8> [[ZZ]]
	;			;
	%z = zext <2 x i8> %x to <2 x i33>			%z = zext <2 x i8> %x to <2 x i33>
	%p = call <2 x i33> @llvm.ctlz.v2i33(<2 x i33> %z, i1 true)			%p = call <2 x i33> @llvm.ctlz.v2i33(<2 x i33> %z, i1 true)
	%zz = trunc <2 x i33> %p to <2 x i8>			%zz = trunc <2 x i33> %p to <2 x i8>
	ret <2 x i8> %zz			ret <2 x i8> %zz
	}			}

	; Scalable vector case			; Scalable vector case

	define <vscale x 2 x i16> @trunc_ctlz_zext_nxv2i16_nxv2i64(<vscale x 2 x i16> %x) {			define <vscale x 2 x i16> @trunc_ctlz_zext_nxv2i16_nxv2i64(<vscale x 2 x i16> %x) {
	; CHECK-LABEL: @trunc_ctlz_zext_nxv2i16_nxv2i64(			; CHECK-LABEL: @trunc_ctlz_zext_nxv2i16_nxv2i64(
	; CHECK-NEXT: [[Z:%.*]] = zext <vscale x 2 x i16> [[X:%.*]] to <vscale x 2 x i64>			; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i16> @llvm.ctlz.nxv2i16(<vscale x 2 x i16> [[X:%.*]], i1 false)
	; CHECK-NEXT: [[P:%.*]] = call <vscale x 2 x i64> @llvm.ctlz.nxv2i64(<vscale x 2 x i64> [[Z]], i1 false)			; CHECK-NEXT: [[ZZ:%.*]] = add nuw nsw <vscale x 2 x i16> [[TMP1]], shufflevector (<vscale x 2 x i16> insertelement (<vscale x 2 x i16> undef, i16 48, i32 0), <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer)
	; CHECK-NEXT: [[ZZ:%.*]] = trunc <vscale x 2 x i64> [[P]] to <vscale x 2 x i16>
	; CHECK-NEXT: ret <vscale x 2 x i16> [[ZZ]]			; CHECK-NEXT: ret <vscale x 2 x i16> [[ZZ]]
	;			;
	%z = zext <vscale x 2 x i16> %x to <vscale x 2 x i64>			%z = zext <vscale x 2 x i16> %x to <vscale x 2 x i64>
	%p = call <vscale x 2 x i64> @llvm.ctlz.nxv2i64(<vscale x 2 x i64> %z, i1 false)			%p = call <vscale x 2 x i64> @llvm.ctlz.nxv2i64(<vscale x 2 x i64> %z, i1 false)
	%zz = trunc <vscale x 2 x i64> %p to <vscale x 2 x i16>			%zz = trunc <vscale x 2 x i64> %p to <vscale x 2 x i16>
	ret <vscale x 2 x i16> %zz			ret <vscale x 2 x i16> %zz
	}			}

				spatel: There are a lot of tests here that don't provide much extra coverage. These are only changing the types to confirm that we get the add constant correct? I think we can verify that with 2 tests of varying types (including a vector type), but we can do better by using a weird type (for example 3 x i17). We can also get more value by varying the 2nd parameter (make it `true` half of the time?).
				datta.nagraj: Removed redundant tests. (Yes, they were checking for the add constant) Added weird type tests. Varied the 2nd parameter as true for half of them.
				; Multiple uses of ctlz for which the opt is disabled

	define <2 x i17> @trunc_ctlz_zext_v2i17_v2i32_multiple_uses(<2 x i17> %x) {			define <2 x i17> @trunc_ctlz_zext_v2i17_v2i32_multiple_uses(<2 x i17> %x) {
	; CHECK-LABEL: @trunc_ctlz_zext_v2i17_v2i32_multiple_uses(			; CHECK-LABEL: @trunc_ctlz_zext_v2i17_v2i32_multiple_uses(
	; CHECK-NEXT: [[Z:%.*]] = zext <2 x i17> [[X:%.*]] to <2 x i32>			; CHECK-NEXT: [[Z:%.*]] = zext <2 x i17> [[X:%.*]] to <2 x i32>
	; CHECK-NEXT: [[P:%.*]] = call <2 x i32> @llvm.ctlz.v2i32(<2 x i32> [[Z]], i1 false)			; CHECK-NEXT: [[P:%.*]] = call <2 x i32> @llvm.ctlz.v2i32(<2 x i32> [[Z]], i1 false)
	; CHECK-NEXT: [[ZZ:%.*]] = trunc <2 x i32> [[P]] to <2 x i17>			; CHECK-NEXT: [[ZZ:%.*]] = trunc <2 x i32> [[P]] to <2 x i17>
	; CHECK-NEXT: call void @use(<2 x i32> [[P]])			; CHECK-NEXT: call void @use(<2 x i32> [[P]])
	; CHECK-NEXT: ret <2 x i17> [[ZZ]]			; CHECK-NEXT: ret <2 x i17> [[ZZ]]
	;			;
	%z = zext <2 x i17> %x to <2 x i32>			%z = zext <2 x i17> %x to <2 x i32>
	%p = call <2 x i32> @llvm.ctlz.v2i32(<2 x i32> %z, i1 false)			%p = call <2 x i32> @llvm.ctlz.v2i32(<2 x i32> %z, i1 false)
	%zz = trunc <2 x i32> %p to <2 x i17>			%zz = trunc <2 x i32> %p to <2 x i17>
	call void @use(<2 x i32> %p)			call void @use(<2 x i32> %p)
	ret <2 x i17> %zz			ret <2 x i17> %zz
	}			}

				; Multiple uses of zext
				RKSimon: We need vector type coverage as well - at least fixed vectors, scalable vectors as well would be a bonus declare <2 x i32> @llvm.ctlz.v2i32 (<2 x i32>, i1) declare <vscale x 2 x i32> @llvm.ctlz.v2i32 (<vscale x 2 x i32>, i1)
				datta.nagraj: Done. Added tests with vector and scalable vectors.
				spatel: This comment should be removed or updated. The transform was enabled for this case.

	define <vscale x 2 x i16> @trunc_ctlz_zext_nxv2i16_nxv2i63_multiple_uses(<vscale x 2 x i16> %x) {			define <vscale x 2 x i16> @trunc_ctlz_zext_nxv2i16_nxv2i63_multiple_uses(<vscale x 2 x i16> %x) {
	; CHECK-LABEL: @trunc_ctlz_zext_nxv2i16_nxv2i63_multiple_uses(			; CHECK-LABEL: @trunc_ctlz_zext_nxv2i16_nxv2i63_multiple_uses(
	; CHECK-NEXT: [[Z:%.*]] = zext <vscale x 2 x i16> [[X:%.*]] to <vscale x 2 x i63>			; CHECK-NEXT: [[Z:%.*]] = zext <vscale x 2 x i16> [[X:%.*]] to <vscale x 2 x i63>
	; CHECK-NEXT: [[P:%.*]] = call <vscale x 2 x i63> @llvm.ctlz.nxv2i63(<vscale x 2 x i63> [[Z]], i1 true)			; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i16> @llvm.ctlz.nxv2i16(<vscale x 2 x i16> [[X]], i1 true)
	; CHECK-NEXT: [[ZZ:%.*]] = trunc <vscale x 2 x i63> [[P]] to <vscale x 2 x i16>			; CHECK-NEXT: [[ZZ:%.*]] = add nuw nsw <vscale x 2 x i16> [[TMP1]], shufflevector (<vscale x 2 x i16> insertelement (<vscale x 2 x i16> undef, i16 47, i32 0), <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer)
	; CHECK-NEXT: call void @use1(<vscale x 2 x i63> [[Z]])			; CHECK-NEXT: call void @use1(<vscale x 2 x i63> [[Z]])
	; CHECK-NEXT: ret <vscale x 2 x i16> [[ZZ]]			; CHECK-NEXT: ret <vscale x 2 x i16> [[ZZ]]
	;			;
	%z = zext <vscale x 2 x i16> %x to <vscale x 2 x i63>			%z = zext <vscale x 2 x i16> %x to <vscale x 2 x i63>
	%p = call <vscale x 2 x i63> @llvm.ctlz.nxv2i63(<vscale x 2 x i63> %z, i1 true)			%p = call <vscale x 2 x i63> @llvm.ctlz.nxv2i63(<vscale x 2 x i63> %z, i1 true)
	%zz = trunc <vscale x 2 x i63> %p to <vscale x 2 x i16>			%zz = trunc <vscale x 2 x i63> %p to <vscale x 2 x i16>
	call void @use1(<vscale x 2 x i63> %z)			call void @use1(<vscale x 2 x i63> %z)
	ret <vscale x 2 x i16> %zz			ret <vscale x 2 x i16> %zz
	}			}

				; Negative case where types of x and zz don't match

				define i16 @trunc_ctlz_zext_i10_i32(i10 %x) {
				; CHECK-LABEL: @trunc_ctlz_zext_i10_i32(
				; CHECK-NEXT: [[Z:%.*]] = zext i10 [[X:%.*]] to i32
				; CHECK-NEXT: [[P:%.*]] = call i32 @llvm.ctlz.i32(i32 [[Z]], i1 false), !range [[RNG1:![0-9]+]]
				; CHECK-NEXT: [[ZZ:%.*]] = trunc i32 [[P]] to i16
				; CHECK-NEXT: ret i16 [[ZZ]]
				;
				%z = zext i10 %x to i32
				%p = call i32 @llvm.ctlz.i32(i32 %z, i1 false)
				%zz = trunc i32 %p to i16
				ret i16 %zz
				}

				; Test width difference of more than log2 between x and t
				; TODO: Enable the opt for this case if it is proved that the
				; opt works for all combinations of bitwidth of zext src and dst.
				; Refer : https://reviews.llvm.org/D103788

				define i3 @trunc_ctlz_zext_i3_i34(i3 %x) {
				; CHECK-LABEL: @trunc_ctlz_zext_i3_i34(
				; CHECK-NEXT: [[Z:%.*]] = zext i3 [[X:%.*]] to i34
				; CHECK-NEXT: [[P:%.*]] = call i34 @llvm.ctlz.i34(i34 [[Z]], i1 false), !range [[RNG2:![0-9]+]]
				; CHECK-NEXT: [[T:%.*]] = trunc i34 [[P]] to i3
				; CHECK-NEXT: ret i3 [[T]]
				;
				%z = zext i3 %x to i34
				%p = call i34 @llvm.ctlz.i34(i34 %z, i1 false)
				%t = trunc i34 %p to i3
				ret i3 %t
				}
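
The TODO above leaves open whether the rewrite holds for every combination of zext source and destination widths. The sketch below is a small brute-force experiment along those lines, not a proof and not what the patch implements. It is written in Python, the names `forms_agree` and `unsigned_add_wraps` are hypothetical, and it models only the case where the second ctlz argument is `false`. It compares the two forms modulo 2^narrow and separately reports whether the narrow `add` could wrap, which would rule out attaching `nuw`.

```python
def ctlz(value: int, width: int) -> int:
    """Leading-zero count of `value` as a `width`-bit unsigned integer.

    Returns `width` for zero, modelling ctlz with the second argument false.
    """
    return width - value.bit_length()


def forms_agree(narrow: int, wide: int) -> bool:
    """True if trunc(ctlz_wide(zext x)) equals the wrapping narrow add for all x."""
    mask = (1 << narrow) - 1
    for x in range(1 << narrow):
        wide_result = ctlz(x, wide) & mask
        narrow_result = (ctlz(x, narrow) + (wide - narrow)) & mask
        if wide_result != narrow_result:
            return False
    return True


def unsigned_add_wraps(narrow: int, wide: int) -> bool:
    """True if ctlz_narrow(x) + (wide - narrow) can exceed the narrow type's range.

    The largest possible sum is `wide` (reached at x == 0).
    """
    return wide > (1 << narrow) - 1


if __name__ == "__main__":
    print(forms_agree(3, 34), unsigned_add_wraps(3, 34))
    print(forms_agree(16, 32), unsigned_add_wraps(16, 32))
```

Under these assumptions the i3/i34 pair agrees only through wraparound: the width difference 31 does not fit in 3 bits. The i16/i32 pair agrees with no wrap. The experiment says nothing about the behaviour when the second argument is `true`.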
				spatel: Typically, we make the extra use minimal and more plausible by adding a `call void @use(i32 %p)` instruction. Also, what happens if the zext has an extra use?
				datta.nagraj: Made use of the function call. That's a nice idea, didn't think of that. Added condition check for extra use of zext and added a test case for that as well.

Diff 353991