This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store
ClosedPublic

Authored by craig.topper on Jul 31 2019, 1:55 PM.

Details

Summary

We have custom code that bypasses the normal promoting type legalization for vector types narrower than 128 bits, such as v4i8, so that we can emit pavgb, paddusb, and psubusb, since we don't have the equivalent instruction on a larger element type like v4i32. If such an operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, it is normally decomposed into shuffles and a non-truncating store. Later combines then remove the any_extend_vector_inreg and the shuffle, leaving just the store. On avx512, truncstore is legal, so we don't decompose it, and we had no combines to clean it up.

This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32-bit and 16-bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting the combine to the case where -x86-experimental-vector-widening-legalization is false. When we're widening, we're not likely to create this any_extend_vector_inreg+truncstore combination, so we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk.
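
To illustrate why the fold is safe, here is a minimal standalone sketch of the 32-bit case. It is not the code from this patch and does not use any LLVM API. The types V16i8 and V4i32 and the functions anyExtendVectorInreg, truncStore, extractAndStore32 and foldIsEquivalent are hypothetical names invented for this model. It assumes a v16i8 source that is any-extended in-register to v4i32 and then truncating-stored as v4i8, and it checks that this writes the same four bytes as extracting the low 32-bit element and storing it directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model types, not LLVM types.
using V16i8 = std::array<std::uint8_t, 16>;
using V4i32 = std::array<std::uint32_t, 4>;

// Model of any_extend_vector_inreg v16i8 -> v4i32: the low four bytes
// become the low byte of each 32-bit lane. The upper 24 bits of each lane
// are unspecified, so the model fills them with arbitrary garbage.
V4i32 anyExtendVectorInreg(const V16i8 &in) {
  V4i32 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = 0xABCDEF00u | in[i];
  return out;
}

// Model of a truncating store v4i32 -> v4i8: keep the low byte of each lane.
void truncStore(const V4i32 &v, std::uint8_t *dst) {
  for (std::size_t i = 0; i < v.size(); ++i)
    dst[i] = static_cast<std::uint8_t>(v[i]);
}

// Model of the folded form: view the source as 32-bit elements, extract
// element 0, and store it with an ordinary 32-bit store.
void extractAndStore32(const V16i8 &in, std::uint8_t *dst) {
  std::uint32_t elt0;
  std::memcpy(&elt0, in.data(), sizeof(elt0));
  std::memcpy(dst, &elt0, sizeof(elt0));
}

// Returns true when both sequences write identical bytes to memory.
bool foldIsEquivalent(const V16i8 &in) {
  std::uint8_t viaTruncStore[4];
  std::uint8_t viaExtract[4];
  truncStore(anyExtendVectorInreg(in), viaTruncStore);
  extractAndStore32(in, viaExtract);
  return std::memcmp(viaTruncStore, viaExtract, 4) == 0;
}
```

The garbage upper bits in the model stand in for the undefined bits of an any-extend: the truncating store discards them, which is why only the low bytes of the original vector matter.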

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Jul 31 2019, 1:55 PM
Herald added a project: Restricted Project. Jul 31 2019, 1:55 PM
RKSimon accepted this revision. Jul 31 2019, 2:52 PM

LGTM, even if it's just temporary until widening-legalization is flipped

This revision is now accepted and ready to land. Jul 31 2019, 2:52 PM
This revision was automatically updated to reflect the committed changes.