This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Remove one-use limitation from X-Y==0 fold
ClosedPublic

Authored by nikic on Feb 22 2022, 9:04 AM.

Details

Summary

This one-use limitation is artificial, we do not increase instruction count if we fold it with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold.

I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine.

Diff Detail

Event Timeline

nikic created this revision.Feb 22 2022, 9:04 AM
nikic requested review of this revision.Feb 22 2022, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 9:04 AM
spatel accepted this revision.Feb 22 2022, 10:09 AM

LGTM.

I think previous changes like this have exposed some shortcomings in IR analysis or possibly LSR, so I'd watch for fallout there too. No way to know for certain until we try. :)

This revision is now accepted and ready to land.Feb 22 2022, 10:09 AM
This revision was landed with ongoing or failed builds.Feb 23 2022, 12:37 AM
This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Feb 23 2022, 11:04 AM

We noticed that this change pushed a BPF probe program over the Linux kernel's BPF program size limit. I think that's fine, we don't make guarantees about the size of the final program, but it does indicate that your change may have increased code size more generally. I haven't done extensive performance testing, but I wouldn't be surprised if we find something during that process.

It looks like this also causes a 7% code-size regression on tramp3d-v4: https://llvm-compile-time-tracker.com/compare.php?from=79353f940cf441e69f32af0a78a48baee89e8517&to=65dc78d63ee2eb20fbed54401091f08a685ef8c1&stat=size-text

So yeah, this clearly has some negative interactions that need to be addressed.

I just looked into the code-size regression for tramp3d-v4, and what happens is that we're performing additional (desirable) jump-threading in many places, and in particular in @_ZN6EngineILi3Ed10MultiPatchI7GridTag6RemoteI5BrickEEED2Ev a redundant branch is eliminated entirely and the function shrinks by just enough to be eligible for inlining at hot call sites -- of which there are a lot, or at least LLVM thinks so.

As such, at least for the tramp3d-v4 case, it looks like this change is having a positive effect and just ends up crossing an inlining threshold.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 7:43 AM