This is an archive of the discontinued LLVM Phabricator instance.

[X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0)
ClosedPublic

Authored by RKSimon on Apr 14 2021, 9:10 AM.

Details

Summary

If we are truncating from an i32 source before comparing the result against zero, then see if we can directly compare the source value against zero.

If the upper (truncated) bits are known to be zero, then we can compare the source value directly, hopefully increasing the chances of folding the compare into the EFLAGS result of the source's operation.
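
The condition described above can be illustrated outside of LLVM. This is a minimal standalone sketch and not the code from this patch: `KnownBits32`, its `KnownZero` mask, and `canCompareSourceDirectly` are hypothetical names that stand in for the result of a known-bits query on the 32-bit source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for known-bits information: a set bit in KnownZero
// means the corresponding bit of the 32-bit source is proven to be zero.
struct KnownBits32 {
  uint32_t KnownZero;
};

// Returns true if comparing the full 32-bit source against zero gives the
// same answer as comparing its truncation to DstBits against zero, i.e. if
// every bit discarded by the truncate is known to be zero.
bool canCompareSourceDirectly(KnownBits32 Known, unsigned DstBits) {
  assert(DstBits > 0 && DstBits < 32 && "expected a narrowing truncate");
  uint32_t UpperMask = ~uint32_t(0) << DstBits; // bits removed by the truncate
  return (Known.KnownZero & UpperMask) == UpperMask;
}
```

For an i32 to i16 truncate, the query succeeds only when all of bits 16..31 are known zero; a single unknown upper bit is enough to block the fold.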

Fixes PR49028.

Diff Detail

Event Timeline

RKSimon created this revision. Apr 14 2021, 9:10 AM
RKSimon requested review of this revision. Apr 14 2021, 9:10 AM
Herald added a project: Restricted Project. Apr 14 2021, 9:10 AM
pengfei added inline comments. Apr 14 2021, 11:02 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
48835

Do we create an i64 cmp if TruncSrcVT = i64? I think it's less efficient than the truncated CMP.

48843

It's a bit confusing. Maybe it's better to either use

SDValue Trunc = Op;
Op = Op.getOperand(0);
...
if (!Trunc.hasOneUse() || !Op.hasOneUse())

or say the source and dest of the truncate must have a single use.
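
The shape of the suggested check can be shown with a toy model. This sketch does not use the LLVM API: `ToyValue` is a hypothetical stand-in for a DAG value, and only the naming (a separate `Trunc` and its source) and the two single-use tests mirror the suggestion above.

```cpp
#include <cassert>

// Toy stand-in for a DAG value; this is NOT llvm::SDValue.
struct ToyValue {
  unsigned NumUses = 0;
  const ToyValue *Operand0 = nullptr;
  bool hasOneUse() const { return NumUses == 1; }
};

// Keep the truncate in `Trunc` and its source in `Src`, then require that
// each of them has exactly one use before attempting the fold.
bool truncAndSourceHaveOneUse(const ToyValue &Trunc) {
  const ToyValue *Src = Trunc.Operand0;
  if (!Src)
    return false; // no source operand to look through
  return Trunc.hasOneUse() && Src->hasOneUse();
}
```

Naming the two values separately makes it clear which node each use check applies to.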

llvm/test/CodeGen/X86/and-with-overflow.ll
66

Why is this test still kept?

RKSimon added inline comments. Apr 15 2021, 4:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48835

It would. If we don't, then we could still have cases where we fail to fold the compare into an i64 op's EFLAGS result. But most use cases of this patch are for promoted ops, which will have just been promoted from i8/i16 to i32, so I'm happy to restrict it to i32 as well.

48843

Yes, that'd be better - will update it

llvm/test/CodeGen/X86/and-with-overflow.ll
66

Because i686 hasn't made use of the zeroext %0 tag, it's become:

i32,ch = load<(load 2 from %fixed-stack.1, align 4), anyext from i16> t0, FrameIndex:i32<-1>, undef:i32

so the known-bits analysis doesn't know that the upper 16 bits should be zero.
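
A small standalone sketch of why the extension kind matters here. The helper names are hypothetical, and the `Garbage` parameter is an assumption used to model the unspecified upper half that an any-extending load may produce.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extending load: the upper 16 bits are defined to be zero.
uint32_t loadZExt16(uint16_t Mem) { return Mem; }

// Any-extending load: the upper 16 bits are unspecified. `Garbage` is a
// hypothetical parameter that models whatever happens to end up there.
uint32_t loadAnyExt16(uint16_t Mem, uint16_t Garbage) {
  return (static_cast<uint32_t>(Garbage) << 16) | Mem;
}

// 16-bit compare against zero (what a testw checks) versus a 32-bit compare.
bool isZero16(uint32_t X) { return static_cast<uint16_t>(X) == 0; }
bool isZero32(uint32_t X) { return X == 0; }
```

With the zero-extended form the two compares always agree, so the wider one could be used. With the any-extended form they can disagree, so the 16-bit compare has to stay.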

RKSimon updated this revision to Diff 337717. Apr 15 2021, 4:35 AM
RKSimon edited the summary of this revision.

Updated based on feedback from @pengfei

pengfei accepted this revision. Apr 15 2021, 5:24 AM

LGTM.

llvm/test/CodeGen/X86/and-with-overflow.ll
66

I see, sounds like an optimization. By the way, PR49028 can generate a movzwl under i686.
Anyway, the testw is necessary here in these circumstances.

This revision is now accepted and ready to land. Apr 15 2021, 5:24 AM
This revision was landed with ongoing or failed builds. Apr 15 2021, 5:56 AM
This revision was automatically updated to reflect the committed changes.