This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Extra MVE VADDV reduction patterns
Closed · Public

Authored by dmgreen on Feb 7 2020, 6:26 AM.

Details

Summary

We already use the VADDV vector reduction instruction when the input elements and the output have the same type. The MVE instruction, however, sums into an i32, so if we are summing a v16i8 into an i32 we can still use the same instruction. In terms of IR, this looks like a sext of a legal type (v16i8) to a very illegal type (v16i32), followed by a vecreduce.add of that into the result. This means we have to catch the pattern early in a DAG combine, producing a target VADDVs/u node, where the signedness is now important.
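To illustrate why the signedness becomes important once the extend is folded into the reduction, here is a minimal scalar sketch in C++. It is not code from the patch: `Bytes16`, `reduce_add_sext` and `reduce_add_zext` are hypothetical names, and the functions only model the arithmetic of extending sixteen i8 lanes to i32 and summing them.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model, not an LLVM or ACLE API.
// Sixteen raw 8-bit lanes, standing in for a v16i8 value.
using Bytes16 = std::array<std::uint8_t, 16>;

// Models: sign-extend each i8 lane to i32, then add-reduce into an i32.
std::int32_t reduce_add_sext(const Bytes16 &lanes) {
    std::int32_t sum = 0;
    for (std::uint8_t raw : lanes) {
        // Reinterpret the byte as signed before widening.
        sum += static_cast<std::int8_t>(raw);
    }
    return sum;
}

// Models: zero-extend each i8 lane to i32, then add-reduce into an i32.
std::uint32_t reduce_add_zext(const Bytes16 &lanes) {
    std::uint32_t sum = 0;
    for (std::uint8_t raw : lanes) {
        sum += raw;
    }
    return sum;
}
```

With every lane set to 0xFF, the signed model sums sixteen copies of -1 and the unsigned model sums sixteen copies of 255, so the same input bits give different i32 results. That is the distinction the separate signed and unsigned nodes have to preserve.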

This is the first part, handling VADDV and VADDVA. There are also the VADDVL/VADDVLA instructions, which are interesting because they sum into a 64-bit value, and VMLAV and VMLALV, which are interesting because they also multiply two values. The code may look a little odd in places as a result.
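The other reduction forms mentioned here can be sketched the same way. The functions below are hypothetical scalar models, not code from the patch. The lane types are chosen only for illustration and are an assumption, not a statement of which element sizes each instruction accepts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar models, not LLVM or ACLE APIs.

// Accumulating form (in the spirit of VADDVA): add the lane sum to an
// existing 32-bit accumulator. Unsigned arithmetic keeps wrap-around defined.
std::uint32_t reduce_add_acc(std::uint32_t acc,
                             const std::array<std::uint8_t, 16> &lanes) {
    for (std::uint8_t lane : lanes) {
        acc += lane;
    }
    return acc;
}

// Long form (in the spirit of VADDVL): sum the lanes into a 64-bit result.
std::int64_t reduce_add_long(const std::array<std::int32_t, 4> &lanes) {
    std::int64_t sum = 0;
    for (std::int32_t lane : lanes) {
        sum += lane;
    }
    return sum;
}

// Multiplying long form (in the spirit of VMLALV): multiply matching lanes
// of two vectors, then sum the products into a 64-bit result.
std::int64_t reduce_mul_add_long(const std::array<std::int16_t, 8> &a,
                                 const std::array<std::int16_t, 8> &b) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return sum;
}
```

The 64-bit variants matter because the result can exceed what an i32 holds: four lanes of 2147483647 sum to 8589934588 in the long model.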

This is something that I've had sitting around on my computer for a while. On its own it will probably not do very much, as the vectorizer will not produce this IR. Improving that will be a more complicated job than just these patterns, though.

Event Timeline

dmgreen created this revision. Feb 7 2020, 6:26 AM
Herald added a project: Restricted Project. Feb 7 2020, 6:26 AM
simon_tatham accepted this revision. Feb 10 2020, 3:14 AM

LGTM. Even if it's never used by the vectorizer, this will surely be useful when we get to that part of the intrinsics API.

This revision is now accepted and ready to land. Feb 10 2020, 3:14 AM
This revision was automatically updated to reflect the committed changes.