Page MenuHomePhabricator

[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
Needs ReviewPublic

Authored by RKSimon on Wed, Jun 26, 5:26 AM.

Details

Summary

This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))), this allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine (note none of the regressions are due to removing this).

We see a number of regressions which is why I went for making it x86 in the first place:

AArch64's poor handling of shuffle(bitcast(),bitcast()) cases - AArch64ISD::DUPLANE cases in particular as well as "extract high" patterns.

The merge-store.ll change is interesting - AArch64TargetLowering::allowsMisalignedMemoryAccesses permits 'fast' unaligned v2i64 stores in all cases which is being exposed by removing the bitcast.

Most likely we need to address these issues before this patch can proceed - I'm happy to help if someone can suggest what is the most critical to fix first.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Wed, Jun 26, 5:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Jun 26, 5:26 AM

It looks like the arm64-neon-2velem.ll regressions are a shuffle lowering issue, yes; we're creating a DUPLANE where the operand is an extract_subvector, and it doesn't simplify.

The failure to match fcvtl2 probably is just a matter of fixing the pattern in the .td file: it explicitly checks for an operation where the operand is an extract_subvector.

Not sure what the right resolution is for allowsMisalignedMemoryAccesses; we currently only use FeatureSlowMisaligned128Store for certain Exynos targets. Probably not too important.