This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.
ClosedPublic

Authored by craig.topper on Oct 21 2021, 1:58 PM.

Details

Summary

Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
end with larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Diff Detail

Event Timeline

craig.topper created this revision.Oct 21 2021, 1:58 PM
craig.topper requested review of this revision.Oct 21 2021, 1:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2021, 1:58 PM
Herald added a subscriber: MaskRay. · View Herald Transcript
craig.topper added inline comments.Oct 21 2021, 2:03 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
615

We should maybe expand parity as well to avoid an unneeded zero extend that SimplifyDemandedBits can't get rid of right now. Probably due to multiple uses. But we need to move ExpandPARITY from LegalizeDAG to TargetLowering first.

RKSimon accepted this revision.Oct 22 2021, 3:58 AM

LGTM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
615

Add a TODO comment

This revision is now accepted and ready to land.Oct 22 2021, 3:58 AM
This revision was landed with ongoing or failed builds.Oct 22 2021, 9:10 AM
This revision was automatically updated to reflect the committed changes.
thopre added a subscriber: thopre.Nov 17 2021, 7:59 AM

Hi @craig.topper ,

This caused a regression for us for i16 cttz with is_undef_zero true because cttz is expanded to a sequence using ctpop which loses the fact that any output is ok for the 0 case.

t17: i16 = sub t9, Constant:i16<1>
t19: i16 = xor t9, Constant:i16<-1>
t20: i16 = and t19, t17
t21: i16 = ctpop t20

So when this gets promoted, a mask is inserted to not have the result of ctpop change when t9 is 0.

t15: i32 = extract_vector_elt t2, Constant:i32<0>
t25: i32 = xor t15, Constant:i32<-1>
t23: i32 = sub t15, Constant:i32<1>
t26: i32 = and t25, t23
t28: i32 = and t26, Constant:i32<65535>
t29: i32 = ctpop t28

Prior to this patch, we'd get:

t19: i32 = sub t15, Constant:i32<1>
t21: i32 = xor t15, Constant:i32<-1>
t22: i32 = and t21, t19
t23: i32 = ctpop t22

I thought about inserting some llvm.assume saying the value is non zero when expanding but those work at IR level. Any pointer on how to solve this? Best regards.

Hi @craig.topper ,

This caused a regression for us for i16 cttz with is_undef_zero true because cttz is expanded to a sequence using ctpop which loses the fact that any output is ok for the 0 case.

t17: i16 = sub t9, Constant:i16<1>
t19: i16 = xor t9, Constant:i16<-1>
t20: i16 = and t19, t17
t21: i16 = ctpop t20

So when this gets promoted, a mask is inserted to not have the result of ctpop change when t9 is 0.

t15: i32 = extract_vector_elt t2, Constant:i32<0>
t25: i32 = xor t15, Constant:i32<-1>
t23: i32 = sub t15, Constant:i32<1>
t26: i32 = and t25, t23
t28: i32 = and t26, Constant:i32<65535>
t29: i32 = ctpop t28

Prior to this patch, we'd get:

t19: i32 = sub t15, Constant:i32<1>
t21: i32 = xor t15, Constant:i32<-1>
t22: i32 = and t21, t19
t23: i32 = ctpop t22

I thought about inserting some llvm.assume saying the value is non zero when expanding but those work at IR level. Any pointer on how to solve this? Best regards.

I take it i32 ctpop is legal for your target or the i16 ctpop would have gotten expanded too?

Hi @craig.topper ,

This caused a regression for us for i16 cttz with is_undef_zero true because cttz is expanded to a sequence using ctpop which loses the fact that any output is ok for the 0 case.

t17: i16 = sub t9, Constant:i16<1>
t19: i16 = xor t9, Constant:i16<-1>
t20: i16 = and t19, t17
t21: i16 = ctpop t20

So when this gets promoted, a mask is inserted to not have the result of ctpop change when t9 is 0.

t15: i32 = extract_vector_elt t2, Constant:i32<0>
t25: i32 = xor t15, Constant:i32<-1>
t23: i32 = sub t15, Constant:i32<1>
t26: i32 = and t25, t23
t28: i32 = and t26, Constant:i32<65535>
t29: i32 = ctpop t28

Prior to this patch, we'd get:

t19: i32 = sub t15, Constant:i32<1>
t21: i32 = xor t15, Constant:i32<-1>
t22: i32 = and t21, t19
t23: i32 = ctpop t22

I thought about inserting some llvm.assume saying the value is non zero when expanding but those work at IR level. Any pointer on how to solve this? Best regards.

I take it i32 ctpop is legal for your target or the i16 ctpop would have gotten expanded too?

Correct.

Hi @craig.topper ,

This caused a regression for us for i16 cttz with is_undef_zero true because cttz is expanded to a sequence using ctpop which loses the fact that any output is ok for the 0 case.

t17: i16 = sub t9, Constant:i16<1>
t19: i16 = xor t9, Constant:i16<-1>
t20: i16 = and t19, t17
t21: i16 = ctpop t20

So when this gets promoted, a mask is inserted to not have the result of ctpop change when t9 is 0.

t15: i32 = extract_vector_elt t2, Constant:i32<0>
t25: i32 = xor t15, Constant:i32<-1>
t23: i32 = sub t15, Constant:i32<1>
t26: i32 = and t25, t23
t28: i32 = and t26, Constant:i32<65535>
t29: i32 = ctpop t28

Prior to this patch, we'd get:

t19: i32 = sub t15, Constant:i32<1>
t21: i32 = xor t15, Constant:i32<-1>
t22: i32 = and t21, t19
t23: i32 = ctpop t22

I thought about inserting some llvm.assume saying the value is non zero when expanding but those work at IR level. Any pointer on how to solve this? Best regards.

I take it i32 ctpop is legal for your target or the i16 ctpop would have gotten expanded too?

Correct.

I just pushed d78fdf111dda26307e88d16f5f5d1411f3bd7e61 can you let me know if that fixes the issue?

Hi @craig.topper ,

This caused a regression for us for i16 cttz with is_undef_zero true because cttz is expanded to a sequence using ctpop which loses the fact that any output is ok for the 0 case.

t17: i16 = sub t9, Constant:i16<1>
t19: i16 = xor t9, Constant:i16<-1>
t20: i16 = and t19, t17
t21: i16 = ctpop t20

So when this gets promoted, a mask is inserted to not have the result of ctpop change when t9 is 0.

t15: i32 = extract_vector_elt t2, Constant:i32<0>
t25: i32 = xor t15, Constant:i32<-1>
t23: i32 = sub t15, Constant:i32<1>
t26: i32 = and t25, t23
t28: i32 = and t26, Constant:i32<65535>
t29: i32 = ctpop t28

Prior to this patch, we'd get:

t19: i32 = sub t15, Constant:i32<1>
t21: i32 = xor t15, Constant:i32<-1>
t22: i32 = and t21, t19
t23: i32 = ctpop t22

I thought about inserting some llvm.assume saying the value is non zero when expanding but those work at IR level. Any pointer on how to solve this? Best regards.

I take it i32 ctpop is legal for your target or the i16 ctpop would have gotten expanded too?

Correct.

I just pushed d78fdf111dda26307e88d16f5f5d1411f3bd7e61 can you let me know if that fixes the issue?

It does yes, codegen back to where it was. Thanks a lot!