This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs
ClosedPublic

Authored by uabelho on Dec 3 2019, 12:13 AM.

Details

Summary

This fixes PR44135.

The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.

Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts

Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:

t0: ch = EntryToken
      t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
    t17: v4i32 = bitcast t2
  t23: i32 = extract_vector_elt t17, Constant:i32<3>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.

The program program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with

t23: i32 = extract_vector_elt t17, Constant:i32<3>

as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vectore.

If we would instead extract the element at index 2 we would get the wanted
values.

With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us

Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:

t0: ch = EntryToken
      t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
    t23: v4i32 = bitcast t2
  t27: i32 = extract_vector_elt t23, Constant:i32<2>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

So now we get

t27: i32 = extract_vector_elt t23, Constant:i32<2>

which is what we want.

Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the exptected
place in the vector.

Diff Detail

Event Timeline

uabelho created this revision.Dec 3 2019, 12:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2019, 12:13 AM

Originally triggered this problem for our out-of-tree target but I think it's exposed on ARM as well.

The problem (though an instcombine version of it) was discussed briefly here:
http://lists.llvm.org/pipermail/llvm-dev/2019-November/137297.html

I'm not very familiar with ARM though and I'm not very happy with the CHECKS, especially not for the int_to_vec
case, so any suggestions there from someone who knows ARM would be good.

uabelho added a subscriber: bjope.Dec 3 2019, 12:20 AM
spatel added inline comments.Dec 5 2019, 11:16 AM
llvm/test/CodeGen/ARM/legalize-bitcast.ll
1

The bug is still visible if you let this RUN all the way to asm output, right?
If so, I think that would be easier to read, especially if we can auto-generate the CHECK lines using "utils/update_llc_test_checks.py".
Either way, I think it's better to pre-commit this test to trunk showing the wrong output. That way, this patch will show the diff that creates the correct code. (And if for some reason this patch needs to be reverted, it will be clear that we have reverted to code that creates a miscompile.)

uabelho updated this revision to Diff 232498.Dec 5 2019, 11:15 PM

Added pre-commit of the testcase, changed testcase to run all the way to ASM output so we can use utils/update_llc_test_checks.py.

uabelho marked an inline comment as done.Dec 5 2019, 11:19 PM

Thanks for the suggestion @spatel. Updated the testcase.

llvm/test/CodeGen/ARM/legalize-bitcast.ll
1

I think so. At least we get diffs in the ASM output without/with the fixes applied.
I'm really not very good at reading ARM assembler though so it's not at all obvious to me what the resulting code really do.

Adding ARM asm experts to verify that the code is correct...

efriedma accepted this revision.Dec 6 2019, 12:07 PM
efriedma marked 3 inline comments as done.

LGTM. (Left some notes on the testcase if anyone else wants to follow.)

llvm/test/CodeGen/ARM/legalize-bitcast.ll
21

We split the load into two parts, so we don't read past the end. For the first 64 bits, we load it as <8 x i8>, then bitcast it to <4 x i16> (using vrev). For the last 32 bits, we load it as i32, then swap the high and low halves. Then we concatenate the two, and spill to a stack slot as a native <8 x i16>.

25

The vrev32.16 essentially bitcasts the <8 x i16> to <4 x i32>. Then we extract the third element of the vector. This corresponds to the scalar load in the first block, i.e. the last 4 bytes of the load. Looks correct.

47

So the value we want to return here is the first two bytes, if the i80 were stored in memory. On a big-endian target, that's the two most significant bytes. With LLVM's default calling convention, those bytes end up in the low bits of r0. (This took me a little while to figure out; it would probably be easier to understand if the i80 value was loaded from memory, instead of passed in registers.)

The sequence here is that we shift r0 left 16 bits, move it to d16, move that to d18, "shift" the value right 16 bits using vrev, them move it back to r0. That seems correct.

This revision is now accepted and ready to land.Dec 6 2019, 12:07 PM

LGTM. (Left some notes on the testcase if anyone else wants to follow.)

Thanks for the analysis @eli.friedman!

If I change the int_to_vec test so it loads from memory instead of passing the i80 as a parameter I get

define i16 @int_to_vec() {
; CHECK-LABEL: int_to_vec:
; CHECK:       @ %bb.0:
; CHECK-NEXT:    movw r0, :lower16:i80_p
; CHECK-NEXT:    movt r0, :upper16:i80_p
; CHECK-NEXT:    ldr r1, [r0]
; CHECK-NEXT:    ldr r2, [r0, #4]
; CHECK-NEXT:    ldrh r0, [r0, #8]
; CHECK-NEXT:    orr r0, r0, r2, lsl #16
; CHECK-NEXT:    lsl r3, r1, #16
; CHECK-NEXT:    orr r2, r3, r2, lsr #16
; CHECK-NEXT:    lsr r1, r1, #16
; CHECK-NEXT:    b .LBB1_1
; CHECK-NEXT:  .LBB1_1: @ %bb.1
; CHECK-NEXT:    vmov.i32 q8, #0x0
; CHECK-NEXT:    vrev32.16 q8, q8
; CHECK-NEXT:    @ kill: def $d16 killed $d16 killed $q8
; CHECK-NEXT:    vmov.u16 r0, d16[0]
; CHECK-NEXT:    bx lr
  %i80 = load i80, i80* @i80_p, align 1
  br label %bb.1

bb.1:
  %vec = bitcast i80 %i80 to <5 x i16>
  %e0 = extractelement <5 x i16> %vec, i32 0
  ret i16 %e0
}

without the fix and

define i16 @int_to_vec() {
; CHECK-LABEL: int_to_vec:
; CHECK:       @ %bb.0:
; CHECK-NEXT:    sub sp, sp, #8
; CHECK-NEXT:    movw r0, :lower16:i80_p
; CHECK-NEXT:    movt r0, :upper16:i80_p
; CHECK-NEXT:    ldr r1, [r0]
; CHECK-NEXT:    ldr r2, [r0, #4]
; CHECK-NEXT:    ldrh r0, [r0, #8]
; CHECK-NEXT:    orr r0, r0, r2, lsl #16
; CHECK-NEXT:    lsl r3, r1, #16
; CHECK-NEXT:    orr r2, r3, r2, lsr #16
; CHECK-NEXT:    lsr r1, r1, #16
; CHECK-NEXT:    str r2, [sp, #4] @ 4-byte Spill
; CHECK-NEXT:    str r1, [sp] @ 4-byte Spill
; CHECK-NEXT:    b .LBB1_1
; CHECK-NEXT:  .LBB1_1: @ %bb.1
; CHECK-NEXT:    ldr r0, [sp] @ 4-byte Reload
; CHECK-NEXT:    lsl r1, r0, #16
; CHECK-NEXT:    ldr r2, [sp, #4] @ 4-byte Reload
; CHECK-NEXT:    orr r1, r1, r2, lsr #16
; CHECK-NEXT:    @ implicit-def: $d16
; CHECK-NEXT:    vmov.32 d16[0], r1
; CHECK-NEXT:    @ implicit-def: $q9
; CHECK-NEXT:    vmov.f64 d18, d16
; CHECK-NEXT:    vrev32.16 q9, q9
; CHECK-NEXT:    @ kill: def $d18 killed $d18 killed $q9
; CHECK-NEXT:    vmov.u16 r0, d18[0]
; CHECK-NEXT:    add sp, sp, #8
; CHECK-NEXT:    bx lr
  %i80 = load i80, i80* @i80_p, align 1
  br label %bb.1

bb.1:
  %vec = bitcast i80 %i80 to <5 x i16>
  %e0 = extractelement <5 x i16> %vec, i32 0
  ret i16 %e0
}

with, so at least the fix makes a difference in that case too.
If you prefer that I can update the testcase.

Otherwise I'll just commit as is now.

Thanks!

The load version is a lot more instructions, so it's probably not worth it... the current patch should be fine as-is.

The load version is a lot more instructions, so it's probably not worth it... the current patch should be fine as-is.

Ok, Thanks!

This revision was automatically updated to reflect the committed changes.