This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Detect bitcasted splat patterns
Needs Review · Public

Authored by luke on Aug 24 2023, 7:48 AM.

Details

Summary

On RV32, where i64 is not a legal scalar type, a common issue is that i64 splats of fixed-length vectors are legalized
to (bitcast v2i64 (build_vector <i32 x, i32 y, i32 x, i32 y>)), where x and y are the two i32 halves of the splatted value.
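To make the legalized shape concrete, the sketch below checks whether a list of i32 lanes repeats as <x, y, x, y, ...> and, if so, reassembles the i64 value that the v2i64 bitcast would see. It is a standalone simplification, not LLVM code: `detectBitcastI64Splat` is a hypothetical name, the operands are modeled as known constants (in the DAG, x and y are arbitrary values), and it relies on RISC-V being little-endian, so the even-indexed i32 is the low half of each i64 lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API). `ops` models the i32 operands of a
// build_vector that is bitcast to a vector of i64. Returns the i64 value if
// every i64 lane is identical, i.e. the operands repeat as <x, y, x, y, ...>.
std::optional<uint64_t> detectBitcastI64Splat(const std::vector<uint32_t>& ops) {
  // Need at least one full i64 lane and a whole number of i64 lanes.
  if (ops.size() < 2 || ops.size() % 2 != 0)
    return std::nullopt;
  // Each operand must equal the operand in the same half of lane 0.
  for (std::size_t i = 2; i < ops.size(); ++i)
    if (ops[i] != ops[i % 2])
      return std::nullopt;
  // Little-endian: the even operand is the low half, the odd one the high half.
  return (static_cast<uint64_t>(ops[1]) << 32) | ops[0];
}
```

For example, `{0xDEADBEEF, 0x1, 0xDEADBEEF, 0x1}` is recognized as a splat of the i64 value 0x1DEADBEEF, whereas `{1, 2, 1, 3}` is rejected because the second lane's high half differs.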

These are then lowered in RISCVISelLowering to something like:

(insert_subvector (bitcast (extract_subvector (vmv_v_x_vl))))

RV64 doesn't have this problem, since the insert_subvector and extract_subvector pairs are usually combined away. On RV32, however, the bitcast introduced for SEW=64 sits between the insert_subvector and the extract_subvector and prevents this combine.

This patch handles this case by peeking through the inserts, bitcasts and
extracts to detect the "hidden" splats.
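The peek-through idea, and why the RV64 cleanup does not apply on RV32, can be illustrated with a toy model. Everything here is hypothetical: `Kind`, `Node`, `make`, `foldInsertOfExtract` and `findHiddenSplat` are invented for illustration and are not LLVM's SelectionDAG API or the code in this patch. The sketch also assumes the innermost vmv_v_x_vl already splats at SEW=64, and it ignores subvector indices and the passthru/VL operands.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-ins for the DAG nodes named in the summary. Hypothetical types,
// NOT LLVM's SDNode/SDValue; indices and passthru/VL operands are omitted.
enum class Kind { InsertSubvector, ExtractSubvector, Bitcast, VmvVXVL };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Kind kind;
  NodePtr src;       // wrapped vector operand (unused for VmvVXVL)
  unsigned sew = 0;  // element width in bits, only meaningful for VmvVXVL
};

NodePtr make(Kind k, NodePtr src, unsigned sew = 0) {
  return std::make_shared<Node>(Node{k, std::move(src), sew});
}

// Models the cleanup that usually succeeds on RV64:
// insert_subvector(extract_subvector(X)) -> X, but only when the two
// nodes are directly adjacent.
NodePtr foldInsertOfExtract(const NodePtr& n) {
  if (n && n->kind == Kind::InsertSubvector && n->src &&
      n->src->kind == Kind::ExtractSubvector)
    return n->src->src;
  return nullptr;
}

// Peek through inserts, bitcasts and extracts to find a "hidden" splat.
// The splat is only reported if its SEW matches the element width the
// caller wants, since a bitcast changes how the lanes are interpreted.
NodePtr findHiddenSplat(NodePtr n, unsigned wantSEW) {
  while (n && (n->kind == Kind::InsertSubvector ||
               n->kind == Kind::Bitcast ||
               n->kind == Kind::ExtractSubvector))
    n = n->src;
  if (n && n->kind == Kind::VmvVXVL && n->sew == wantSEW)
    return n;
  return nullptr;
}
```

On the RV32 shape (insert_subvector (bitcast (extract_subvector splat))), `foldInsertOfExtract` returns null because the bitcast separates the pair, while `findHiddenSplat(n, 64)` still returns the splat. The SEW check is a guard added in this sketch: peeking through a bitcast is only safe when the splat's element width matches the width being asked for.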

Diff Detail

Unit Tests: Failed

Event Timeline

luke created this revision. Aug 24 2023, 7:48 AM
Herald added a project: Restricted Project. Aug 24 2023, 7:48 AM
luke requested review of this revision. Aug 24 2023, 7:48 AM
Herald added a project: Restricted Project. Aug 24 2023, 7:49 AM

Should we DAGCombine some of this away? Or custom lower fixed vector bitcasts into scalable bitcasts?

luke added a comment. Aug 25 2023, 4:07 AM


I tried a DAGCombine of (bitcast (build_vector)) to emit splat_vector_parts, but it's a bit awkward and results in more materialisations on the stack, because it doesn't get to take advantage of the build_vector lowering optimisations.

Will try combining/lowering it to vmv_v_x_vl directly.

The reviewer who asked the question above replied:

I was asking more about the inserts and extracts. Shouldn't we end up with a scalable vector bitcast instead of a fixed vector bitcast sandwiched between inserts and extracts?