This is an archive of the discontinued LLVM Phabricator instance.

Stop the DAG combiner from combining vector stores greater than preferred vector width...
ClosedPublic

Authored by echristo on May 3 2019, 8:16 PM.

Details

Summary

Originally we said that -mpreferred-vector-width was only going to stop the vectorizer and some of code generation, but here's another spot if we want to make sure we don't canonicalize a memcpy/memmmove and then lower it to the widest vector type.

Original testcase:

void Copy256(const char* src, char* dst) {
  char tmp[32];
  for (int i = 0; i < 32; ++i) tmp[i] = src[i];
  for (int i = 0; i < 32; ++i) dst[i] = tmp[i];
}

which is pretty boring, but shows the problem:

vmovups ymm0, ymmword ptr [rdi]
vmovups ymmword ptr [rsi], ymm0
vzeroupper
ret

while the option says that this doesn't necessarily mean no vector code, I think this is a fairly reasonable place to stop some optimization.

Thoughts?

Diff Detail

Event Timeline

echristo created this revision.May 3 2019, 8:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2019, 8:16 PM

Arguably this could use some more comments and I'll add those as well.

hfinkel added a subscriber: hfinkel.May 4 2019, 1:24 AM

Seems reasonable to me. Can have have both a 128- and 256-bit test case?

Seems reasonable to me too.

echristo updated this revision to Diff 198395.May 6 2019, 10:15 PM
echristo added a reviewer: hfinkel.

Update comments. Have testcase work for multiple sizes and multiple preferred vector sizes.

hfinkel accepted this revision.May 6 2019, 10:53 PM

Update comments. Have testcase work for multiple sizes and multiple preferred vector sizes.

Thanks. LGTM.

This revision is now accepted and ready to land.May 6 2019, 10:53 PM
echristo closed this revision.May 7 2019, 12:30 PM

This happened here:

echristo@athyra ~/r/llvm-project> git llvm push
Pushing 1 commit:

96aa9dda693 Make sure that the DAG combiner doesn't merge stores that we explicitly asked not be greater than preferred vector width for the vectorizer. Test for both 128 and 256 with a skylake architecture.

Sending llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
Adding llvm/trunk/test/CodeGen/X86/vector-width-store-merge.ll
Transmitting file data ..done
Committing transaction...
Committed revision 360183.
Committed 96aa9dda693 to svn.

I'm bad at remembering to add it.