This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Generate VDUP(Const) from constant buildvectors
Closed, Public

Authored by dmgreen on Jun 7 2021, 6:20 AM.

Details

Summary

If we cannot otherwise use a VMOVimm/VMOVFPimm/VMVNimm, fall back to producing a VDUP(const) instead of a constant pool load. This at least reduces code size, and it can allow the VDUP to be folded into other instructions.

Event Timeline

dmgreen created this revision. Jun 7 2021, 6:20 AM
dmgreen requested review of this revision. Jun 7 2021, 6:20 AM
Herald added a project: Restricted Project. Jun 7 2021, 6:20 AM

Looks like an obviously good thing, and I only have one nitpick.

llvm/lib/Target/ARM/ARMISelLowering.cpp, line 7648

You've used VECTOR_REG_CAST where other branches of this code have BITCAST.

As far as I can see, either one will work provided the constant is constructed correctly (e.g. if you wanted to make a v16i8 containing 1,2,3,4,1,2,3,4,... then you might have to vdup 0x01020304 or 0x04030201, depending on which cast you wanted to use afterwards). But I don't see any big-endian test to demonstrate it picking the right one. Unless I've missed one, could you add it?
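To make the two candidate constants concrete, here is a minimal sketch in plain Python (not LLVM code; the helper name `pack_lanes` is hypothetical). It packs the four repeating byte lanes into a 32-bit value with lane 0 in either the low-order or the high-order byte, which is the only difference between the two values mentioned above.

```python
def pack_lanes(lanes, lane0_low=True):
    """Pack 8-bit lanes into one integer.

    lane0_low=True places lane 0 in the least significant byte;
    lane0_low=False places lane 0 in the most significant byte.
    """
    ordered = list(lanes) if lane0_low else list(reversed(lanes))
    value = 0
    for i, lane in enumerate(ordered):
        value |= (lane & 0xFF) << (8 * i)
    return value


lane0_in_low_byte = pack_lanes([1, 2, 3, 4], lane0_low=True)
lane0_in_high_byte = pack_lanes([1, 2, 3, 4], lane0_low=False)
print(hex(lane0_in_low_byte), hex(lane0_in_high_byte))
```

Which of the two is the right value to vdup depends on how the cast that follows interprets the lanes, which is what a big-endian test would pin down.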

dmgreen updated this revision to Diff 350514. Jun 8 2021, 12:42 AM

Added two new test cases: mov_int8_1234, which builds the i8 vector <1,2,3,4,1,2,3,4,...> as you suggested, and mov_int32_16908546, whose value is 0x01020102 and which is VDUP'd as an i16.
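As a quick check of the arithmetic behind the second test name, the sketch below (plain Python, with illustrative variable names) confirms that 16908546 is 0x01020102 and that its two 16-bit halves are identical, which is why a splat of the i32 value can equally be expressed as a splat of a single i16 value.

```python
value = 16908546

# Split the 32-bit value into its two 16-bit halves.
low16 = value & 0xFFFF
high16 = (value >> 16) & 0xFFFF

# If both halves match, splatting the i32 is the same bit pattern
# as splatting that one i16 across twice as many lanes.
can_splat_as_i16 = (low16 == high16)

print(hex(value), hex(low16), hex(high16), can_splat_as_i16)
```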

simon_tatham accepted this revision. Jun 8 2021, 2:53 AM
simon_tatham added inline comments.
llvm/test/CodeGen/Thumb2/mve-vmovimm.ll, lines 37–39

I think this output is right, but it confused me completely for a while and I had to try it in emulation to convince myself!

In the middle of a larger function, I think that if you wanted to make this 1,2,3,4,1,2,3,4,... vector and then immediately apply another v16i8 operation to it, you would vdup the same 32-bit constant 0x04030201 regardless of endianness, because the logical 'lane 0' of the vector always occupies the low-order bits.

And the reason why the output is different between LE and BE in this context is that the vdup is immediately followed by a function return, which in BE requires an extra vrev due to the vector register PCS. And that function-return vrev has been folded into the constant, which is why it's the other way round here.

So, I think this is the right output, but it might benefit from a comment in case the next reader gets as confused as I did!
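The folding argument above can be modelled with a small sketch in plain Python. This is an illustration only, not the backend's logic: the helper names are hypothetical, and the return-time vrev is modelled simply as a reversal of byte lanes within fixed-size groups. For this particular repeating pattern, the group size (4, 8, or 16 bytes) does not change the result, so the sketch does not depend on which vrev width the PCS actually uses.

```python
def splat32_bytes(const, num_bytes=16):
    """Byte lanes of a vector filled with a 32-bit constant,
    with lane 0 taken from the low-order byte."""
    word = [(const >> (8 * i)) & 0xFF for i in range(4)]
    return word * (num_bytes // 4)


def reverse_within(lanes, group):
    """Reverse byte lanes within each group of `group` bytes
    (a simplified model of a vrev-style reversal)."""
    out = []
    for start in range(0, len(lanes), group):
        out.extend(reversed(lanes[start:start + group]))
    return out


# Logical vector 1,2,3,4,1,2,3,4,... comes from splatting 0x04030201.
logical = splat32_bytes(0x04030201)

# Applying the modelled reversal afterwards gives the same lanes as
# splatting the byte-swapped constant directly, i.e. the reversal
# can be folded into the constant.
folded = splat32_bytes(0x01020304)
matches = [reverse_within(logical, g) == folded for g in (4, 8, 16)]
print(logical[:4], folded[:4], matches)
```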

This revision is now accepted and ready to land. Jun 8 2021, 2:53 AM
This revision was landed with ongoing or failed builds. Jun 8 2021, 12:52 PM
This revision was automatically updated to reflect the committed changes.