This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify shift-by-sext to shift-by-zext
ClosedPublic

Authored by lebedev.ri on Sep 26 2019, 2:32 PM.

Details

Summary

This is valid for any sext bitwidth pair:

Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!

(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)

Main motivation is the C++ semantics:

int shl(int a, char b) {
    return a << b;
}

ends as

%3 = sext i8 %1 to i32
%4 = shl i32 %0, %3

https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.

Diff Detail

Event Timeline

lebedev.ri created this revision.Sep 26 2019, 2:32 PM

Revisited tests.

We can quite easily have more than one shift by the same sext.
I'm not quite sure where the fix for that should be.
Here i'm proposing to do that in instcombine, by checking
that all uses of such sext are shifts.
This is obviously O(N^2) iff the sext is also used by a non-shift.
Thoughts?

There are a handful of existing transforms in InstCombine that look at users rather than operands, so it's not unprecedented. But I'm not sure if there's another/better way.
One possibility would be to ignore the usual hasOneUse() constraint because we're ok with the trade-off (better analysis despite a potential extra instruction).
You could add the basic fold (with m_OneUse()) as a preliminary and non-controversial step? That would reduce risk and make it clear that whatever extra code that we use is specifically for the multi-use case.
cc @efriedma in case he has advice.

There are a handful of existing transforms in InstCombine that look at users rather than operands, so it's not unprecedented. But I'm not sure if there's another/better way.

The other obvious alternative i know of is putting the non-single-use handling into AggressiveInstCombine.

One possibility would be to ignore the usual hasOneUse() constraint because we're ok with the trade-off (better analysis despite a potential extra instruction).

Hmm, or that. We are sure no other pass will ever do the opposite transform and replace that zext with sext?

You could add the basic fold (with m_OneUse()) as a preliminary and non-controversial step? That would reduce risk and make it clear that whatever extra code that we use is specifically for the multi-use case.

Will split the patch up.

cc @efriedma in case he has advice.

I will fully understand if the greedy variants aren't good for instcombine proper,
i can put the current-ish vaariant into aggressiveinstcombine; there it won't have O(N^2) problem, too.
It's just that the AIC IIRC only runs in -O3, not even -O2?

Back to least greedy version of the patch, this should raise no concerns whatsoever.
The greedy version can be added later in another review.

i can put the current-ish vaariant into aggressiveinstcombine; there it won't have O(N^2) problem, too.
It's just that the AIC IIRC only runs in -O3, not even -O2?

That's correct, but it was limited to -O3 only because that was the easy, non-controversial path forward. If someone is willing to quantify the perf vs. compile-time trade-off for AIC to show it's worth it, then it could be moved to -O2.

spatel accepted this revision.Sep 27 2019, 10:53 AM

LGTM - it would be best to send the multi-use example to llvm-dev, so we can get some consensus on what's best.

This revision is now accepted and ready to land.Sep 27 2019, 10:53 AM
This revision was automatically updated to reflect the committed changes.