This is an archive of the discontinued LLVM Phabricator instance.

[Thumb2][MVE] Recognise shuffle truncation patterns suitable for ARMISD::MVETRUNC
ClosedPublic

Authored by RKSimon on Jan 15 2023, 6:14 AM.

Details

Summary

I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MVE interleaved shuffles due to poor handling of 'truncation' style shuffle masks (0,2,4,6,...).

This patch attempts to use the ARMISD::MVETRUNC node to handle these cases, based off existing code in LowerTruncate.

It handles both (0,2,4,6,...) and (1,3,5,7,....) 'top' style patterns (assuming no endian problems). I shift down the 'top' patterns - a basic search of ARM docs suggests MVE has some top/bottom truncation/narrowing instructions but I don't seem to be able to get them to be used.

NOTE: I don't know anything about MVE so this might be completely wrong :-(

Diff Detail

Event Timeline

RKSimon created this revision.Jan 15 2023, 6:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2023, 6:14 AM
RKSimon requested review of this revision.Jan 15 2023, 6:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2023, 6:14 AM

Thanks for looking at this. I was considering the changes in mve-vst3.ll from D127115 to be unimportant, the tests were just added for completeness. This looks like a nice improvement on it's own though. It does involve going via the stack which can be quite expensive, but should hopefully be cheaper overall than all the lane moves.

I think the bitcasts will be causing BE to be incorrect. Can you change them to VECTOR_REG_CAST, so that they are not dependent on the endianness?

llvm/lib/Target/ARM/ARMISelLowering.cpp
8876

Can you use VECTOR_REG_CAST instead of bitcast.

Just to be clear - its not actually to the fix the mve-vst3.ll change in the current D127115 patch - its some additional regressions we see when I make a change to fix a number of build_vector regressions - there's so much yak shaving associated with D127115 :)

RKSimon updated this revision to Diff 489498.Jan 16 2023, 4:12 AM

Use VECTOR_REG_CAST instead of BITCAST

RKSimon added inline comments.Jan 16 2023, 6:12 AM
llvm/test/CodeGen/Thumb2/mve-vld4.ll
288

I think if we supported trunc from wider elements (in this case vXi64 to vXi16) we'd be able to catch all of these - but the patch currently only handles truncation from double-width element types.

dmgreen accepted this revision.Jan 16 2023, 7:00 AM

Thanks. LGTM

This revision is now accepted and ready to land.Jan 16 2023, 7:00 AM
This revision was landed with ongoing or failed builds.Jan 16 2023, 10:00 AM
This revision was automatically updated to reflect the committed changes.