Page MenuHomePhabricator

Fix generic shift expansion when shift amount is 0
ClosedPublic

Authored by loladiro on Aug 19 2014, 6:22 PM.

Details

Summary

This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.

This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.

Diff Detail

Repository
rL LLVM

Event Timeline

loladiro updated this revision to Diff 12687.Aug 19 2014, 6:22 PM
loladiro retitled this revision from to Fix generic shift expansion when shift amount is 0.
loladiro updated this object.
loladiro edited the test plan for this revision. (Show Details)
loladiro set the repository for this revision to rL LLVM.
loladiro added a subscriber: Unknown Object (MLST).

Bump. Any thoughts on whether this is the correct way to do this or a different approach would be better?

chfast added a subscriber: chfast.Oct 2 2014, 8:21 AM

Do you know why shift by 0 produces wrong results?

Yes, see the linked bug report:

If the shift amount is 0 `shrdl	%cl, %esi, %ebx` acts as `or` (since it's doing shift mod 32), but LLVM seems to assume that it's shifting it out to the left.

You are right, that fix works for my case too.

In original code, e.g. when lowering i128 to 2 x i64, and shift amount is 0, one of the words needs to be zerod with shift by 64. That will not work as X86 architecture would execute it as shift by (64 mod 64 = 0).

I have an idea for improvement: if shift amount is zero there is no need to compute anything more. The select that takes isZero into account can be placed up front.

Can I be an reviewer for that change?

chfast accepted this revision.Mar 3 2015, 10:28 AM
chfast edited edge metadata.
This revision is now accepted and ready to land.Mar 3 2015, 10:28 AM
loladiro added a subscriber: resistor.

I'd like to get approval from the SelectionDAG code owner, which seems to be @resistor?

sanjoy added a subscriber: sanjoy.Mar 7 2015, 7:23 PM

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

Can you explain more?

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

Can you explain more?

For a left shift, I think HiResult can be HiInput << ShiftAmt | BEXTR LoInput, Start = (WordSize - ShiftAmt), Len = ShiftAmt. This will do the right thing for ShiftAmt = 0.

I think many x86 CPUs do not support BEXTR so either this will have be a target dependent thing, or have to be a pattern that the DAG combiner will fold into a BEXTR.

In any case, this is very minor.

For a left shift, I think HiResult can be HiInput << ShiftAmt | BEXTR LoInput, Start = (WordSize - ShiftAmt), Len = ShiftAmt. This will do the right thing for ShiftAmt = 0.

I think many x86 CPUs do not support BEXTR so either this will have be a target dependent thing, or have to be a pattern that the DAG combiner will fold into a BEXTR.

In any case, this is very minor.

BEXTR would be useful only if shift amount is constant. I think this case is opposite.

resistor accepted this revision.Apr 1 2015, 9:50 AM
resistor edited edge metadata.

LGTM.

As the patch is accepted, can I commit it?

Yes, but I'd really like to include the test you wrote, which is why I haven't committed it yet.

chfast added a comment.EditedApr 20 2015, 9:53 AM

I asked about that test on LLVMdev today. There are opinions that it should check assembly generated instead of runtime results. I think we can check for "select" node in the assembly but the test will be implementation and target specific. What do you think? I can send you a test if you agree.

I think it's fine having this as a codegen test that checks that the generated code on, say X86 is correct.

This revision was automatically updated to reflect the committed changes.