This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...))
ClosedPublic

Authored by aaboud on Aug 14 2017, 7:10 AM.

Details

Summary

Added support for the following case:

define i16 @test6_ashr_mul(i8 %X, i8 %Y) {
; CHECK-LABEL: @test6_ashr_mul(
; CHECK-NEXT: [[A:%.*]] = sext i8 %X to i16
; CHECK-NEXT: [[B:%.*]] = sext i8 %Y to i16
; CHECK-NEXT: [[C:%.*]] = mul nsw i16 [[A]], [[B]]
; CHECK-NEXT: [[D:%.*]] = ashr i16 %C, 15
; CHECK-NEXT: ret i16 %D

%A = sext i8 %X to i32
%B = sext i8 %Y to i32
%C = mul i32 %A, %B
%D = ashr i32 %C, 15
%E = trunc i32 %D to i16
ret i16 %E

}

So far, InstCombine handled only LShr instruction and tried to replace AShr instruction with the LShr when applicable.
However, there are cases where LShr has no advantage over AShr, especially when the higher bits are not known to be zero, but are known to be equal to the sign bit.

Note: There is more to do to catch all the cases where AShr can be optimized, but we end up having LShr instead, e.g., we could replace LShr with AShr when applicable. However, the current implementation of InstCombine will probably lead into infinite loop if we try implement both transform direction (i.e., LShr->AShr and AShr->LShr).

Diff Detail

Event Timeline

aaboud created this revision.Aug 14 2017, 7:10 AM
spatel added inline comments.Aug 14 2017, 8:47 AM
lib/Transforms/InstCombine/InstCombineShifts.cpp
562–565 ↗(On Diff #110960)

Use 'm_Shr' to reduce the code. This change is independent of anything else, so I think it should go in first.
http://rise4fun.com/Alive/lBi
Here's a test you can use for that patch:

define i8 @shishi(i8 %x) {
  %a = ashr i8 %x, 6
  %b = shl i8 %a, 6
  %extra_use_of_a = mul i8 %a, 5
  %r = sdiv i8 %extra_use_of_a, %b
  ret i8 %r
}
aaboud updated this revision to Diff 111171.Aug 15 2017, 8:19 AM

Split the patch by moving: (X >>s C) << C --> X & (-1 << C) to D36743.

spatel edited edge metadata.Aug 15 2017, 11:55 AM

I think all of the non-multiply tests are simplified after rL310942. Can you double-check that? If so, you can remove those tests from this patch. Feel free to add them as an independent NFC commit if they cover some case that isn't covered already.

It may also be possible to simplify the code in this patch now, but I didn't check that.

aaboud updated this revision to Diff 111247.Aug 15 2017, 1:51 PM

Updated patch to top of trunk and removed NFC tests (will commit them separately).

Can we have the ashr fix as a separate patch before the mul patch? If yes, make tests for just that diff and split this up?

lib/Transforms/InstCombine/InstCombineCasts.cpp
378

Over in D36763, we're going to (or maybe already have) made the other shift cases splat-vector-friendly. Can you do the same here and add a vector test?

lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
468

Why change this in this patch? Seem like that might be better as just 'unsigned', but either way, I think it's an independent diff.

aaboud marked an inline comment as done.Aug 16 2017, 12:50 AM

Can we have the ashr fix as a separate patch before the mul patch? If yes, make tests for just that diff and split this up?

I uploaded a separate patch that handles ashr including vector type handling (D36784).

lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
468

I change it to uint32_t to be consistent with line 507 (the case for AShr).
Sure, we can commit this separately.
Do we need a review for this?
Do you want to take care of it? or I should just commit it now?

spatel added inline comments.Aug 16 2017, 9:01 AM
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
468

Ah, I see the LangRef says we're allowed to go up to 24 bits of width:
http://llvm.org/docs/LangRef.html#integer-type

So I guess it's ok to specify 32-bit here...or just change them all to 'unsigned'? Either way seems fine to me. Feel free to commit separately.

aaboud updated this revision to Diff 111512.Aug 17 2017, 7:18 AM

After committing the other parts, this patch was reduced to handle only the multiply in ComputeNumSignBitsImpl.

spatel accepted this revision.Aug 18 2017, 9:25 AM

Thanks for breaking this up - LGTM.
See inline for a couple of nits.

lib/Analysis/ValueTracking.cpp
2200

can at -> can be at

2201–2203

It's poor to have variables named 'Tmp'. These can be locals now, so let's give these meaningful names - SignBitsOp0/SignBitsOp1 or something like that.

This revision is now accepted and ready to land.Aug 18 2017, 9:25 AM
This revision was automatically updated to reflect the committed changes.
aaboud marked 2 inline comments as done.