This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] try multi-use demanded bits folds for 'add'
ClosedPublic

Authored by spatel on Sep 13 2022, 10:09 AM.

Details

Summary

This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh

This mimics transforms that we already do on the single-use path.

Originally, this patch did not include the last part to form a constant, but that can be removed independently to reduce risk. It's not clear what the effect of either change will be when viewed end-to-end.

See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Diff Detail

Event Timeline

spatel created this revision.Sep 13 2022, 10:09 AM
spatel requested review of this revision.Sep 13 2022, 10:09 AM

Nice - have you looked if DAG SimplifyMultipleUseDemandedBits would benefit?

Nice - have you looked if DAG SimplifyMultipleUseDemandedBits would benefit?

I saw that we don't have this in codegen either, but I didn't try to adapt it yet. We can do the same thing for Op1 (RHS) of a sub, so that's another potential follow-up.

Nice - have you looked if DAG SimplifyMultipleUseDemandedBits would benefit?

I saw that we don't have this in codegen either, but I didn't try to adapt it yet. We can do the same thing for Op1 (RHS) of a sub, so that's another potential follow-up.

D133799 shows the DAG version...looks like a slog to make that worthwhile.

The 2nd part avoids computing known bits for every other multi-use add (giving the small improvement in compile-time). That also results in the test diffs that show large unsigned constants becoming small negative numbers. That should make analysis easier and codegen better in most cases.

I think it might make sense to separate these two changes if we can do so reasonably cheaply. Rather then calling computeKnownBits on the add, which will unnecessarily repeat the two recursive calls, can we call KnownBits::computeForAddSub() on the computed results? I expect that would be about compile-time neutral, while retaining existing behavior.

The 2nd part avoids computing known bits for every other multi-use add (giving the small improvement in compile-time). That also results in the test diffs that show large unsigned constants becoming small negative numbers. That should make analysis easier and codegen better in most cases.

I think it might make sense to separate these two changes if we can do so reasonably cheaply. Rather then calling computeKnownBits on the add, which will unnecessarily repeat the two recursive calls, can we call KnownBits::computeForAddSub() on the computed results? I expect that would be about compile-time neutral, while retaining existing behavior.

Yes - good suggestion and your guess about timing looks correct. I drafted the changes as independent patches and pushed them up to llvm-compile-time-tracker as perf/add-demand2:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

The timing diffs are nearly invisible in this set, so we can mostly judge the patches on the test diffs, and if it saves a bit of time too, that's a nice bonus. The sibling change in the codegen patch shows that the overall results may be unpredictable, but we can probably deal with any fallout with backend fixes.

spatel updated this revision to Diff 460039.Sep 14 2022, 5:07 AM
spatel edited the summary of this revision. (Show Details)
This revision is now accepted and ready to land.Sep 14 2022, 5:30 AM
This revision was landed with ongoing or failed builds.Sep 14 2022, 6:39 AM
This revision was automatically updated to reflect the committed changes.