This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Convert VDUPLANE to VDUP under MVE
Closed Public

Authored by dmgreen on May 7 2020, 3:31 PM.

Details

Summary

Unlike Neon, MVE has no instruction for duplicating from a vector lane, so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This patch performs that conversion earlier, as a DAG combine, so that other folds can apply.

It converts to a VDUP(EXTRACT). On FP16 this is then folded to a VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a single move_from_reg instead.
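
As a rough illustration of the rewrite described above, here is a self-contained toy model. It is only a sketch: `Node`, `Op`, `makeNode`, `combineVDupLane` and `print` are hypothetical names invented for this example and are not LLVM's SelectionDAG API, and the f16 path is collapsed into a single step rather than going through the separate folds the patch uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for DAG nodes. These are NOT LLVM classes (assumption for illustration).
enum class Op { Vector, VDUPLANE, VDUP, EXTRACT, VGETLANEu };

struct Node {
  Op op;
  std::shared_ptr<Node> src; // single operand; null for a leaf Vector
  unsigned lane;             // lane index for VDUPLANE / EXTRACT / VGETLANEu
  bool isF16;                // true when the element type is half precision
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(Op op, NodePtr src = nullptr, unsigned lane = 0,
                 bool isF16 = false) {
  assert(op == Op::Vector || src); // every non-leaf node needs an operand
  return std::make_shared<Node>(Node{op, std::move(src), lane, isF16});
}

// Under MVE, rewrite VDUPLANE(v, lane) to VDUP(EXTRACT(v, lane)).
// For f16 the scalar is modelled directly as VGETLANEu(v, lane).
// Without MVE (i.e. Neon) the node is returned unchanged.
NodePtr combineVDupLane(const NodePtr &n, bool hasMVE) {
  if (n->op != Op::VDUPLANE || !hasMVE)
    return n;
  Op scalarOp = n->isF16 ? Op::VGETLANEu : Op::EXTRACT;
  NodePtr scalar = makeNode(scalarOp, n->src, n->lane, n->isF16);
  return makeNode(Op::VDUP, scalar, 0, n->isF16);
}

// Render a node tree as text so the rewrite is easy to inspect.
std::string print(const NodePtr &n) {
  switch (n->op) {
  case Op::Vector:
    return "v";
  case Op::VDUP:
    return "VDUP(" + print(n->src) + ")";
  case Op::VDUPLANE:
    return "VDUPLANE(" + print(n->src) + ", " + std::to_string(n->lane) + ")";
  case Op::EXTRACT:
    return "EXTRACT(" + print(n->src) + ", " + std::to_string(n->lane) + ")";
  case Op::VGETLANEu:
    return "VGETLANEu(" + print(n->src) + ", " + std::to_string(n->lane) + ")";
  }
  return "?";
}
```

In this model, calling `combineVDupLane` on a `VDUPLANE` of lane 3 with `hasMVE` set yields a `VDUP` whose operand is an `EXTRACT` of lane 3, while the same call with `hasMVE` unset returns the original node.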

Event Timeline

dmgreen created this revision. May 7 2020, 3:31 PM

Some of the code differences here make me suspect we're missing combines for VDUPLANE. But that's not really something you need to concern yourself with here, I guess.

If you never want VDUPLANE, it doesn't seem like there's much point to generating it in the first place; I guess you want to continue supporting it just to make it easier to share code between NEON and MVE?

llvm/lib/Target/ARM/ARMISelLowering.cpp
13858

I guess if you didn't have a special case for f16 here, you could still eventually get to the same place, but it would take some extra steps?

dmgreen marked an inline comment as done. May 8 2020, 12:45 AM

> If you never want VDUPLANE, it doesn't seem like there's much point to generating it in the first place; I guess you want to continue supporting it just to make it easier to share code between NEON and MVE?

Yep. They can be generated in a few different places, and although it would be possible to stop them being created, that complicates the logic. I agree it's strange on its own to create a node only to convert it into something else, but if it keeps the buildvector/vectorshuffle code simpler and helps it be shared between Neon and MVE, I think this is probably simpler overall.

llvm/lib/Target/ARM/ARMISelLowering.cpp
13858

I was originally thinking this would need to look at the demanded bits of the VMOVrh, which complicates things, but yeah, it's simpler than that. With VGETLANEu we can add a fold easily enough and still get the top lanes correct. I can change things around to do it that way.

dmgreen updated this revision to Diff 262828. May 8 2020, 12:56 AM

Now with an extra VMOVrh(extract(..)) -> VGETLANEu fold.
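
The shape of that fold can be sketched with a small pattern-matching function over a toy expression tree. This is an assumption-laden illustration only: `Expr`, `Kind`, `mk`, `foldVMOVrh` and `show` are hypothetical helpers made up for this example, not the code added in the diff or any LLVM API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree. Hypothetical types, not LLVM's SDNode/SDValue.
enum class Kind { Vec, Extract, VMOVrh, VGETLANEu };

struct Expr {
  Kind kind;
  std::shared_ptr<Expr> operand; // null only for the leaf Vec
  unsigned lane;                 // meaningful for Extract and VGETLANEu
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr mk(Kind k, ExprPtr operand = nullptr, unsigned lane = 0) {
  assert(k == Kind::Vec || operand); // non-leaf nodes must have an operand
  return std::make_shared<Expr>(Expr{k, std::move(operand), lane});
}

// Fold VMOVrh(extract(v, lane)) into VGETLANEu(v, lane).
// Any other expression is returned unchanged.
ExprPtr foldVMOVrh(const ExprPtr &e) {
  if (e->kind == Kind::VMOVrh && e->operand->kind == Kind::Extract) {
    const ExprPtr &ext = e->operand;
    return mk(Kind::VGETLANEu, ext->operand, ext->lane);
  }
  return e;
}

// Render an expression as text for inspection.
std::string show(const ExprPtr &e) {
  switch (e->kind) {
  case Kind::Vec:
    return "v";
  case Kind::Extract:
    return "extract(" + show(e->operand) + ", " + std::to_string(e->lane) + ")";
  case Kind::VMOVrh:
    return "VMOVrh(" + show(e->operand) + ")";
  case Kind::VGETLANEu:
    return "VGETLANEu(" + show(e->operand) + ", " + std::to_string(e->lane) + ")";
  }
  return "?";
}
```

The fold only fires when the operand of the `VMOVrh` is an extract; a `VMOVrh` of anything else is left as it was.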

dmgreen edited the summary of this revision. May 8 2020, 12:57 AM
This revision is now accepted and ready to land. May 8 2020, 10:29 AM
This revision was automatically updated to reflect the committed changes.