Page MenuHomePhabricator

[x86] split 256-bit store of concatenated vectors
ClosedPublic

Authored by spatel on May 27 2019, 2:57 PM.

Details

Summary

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.May 27 2019, 2:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2019, 2:57 PM
spatel marked 2 inline comments as done.May 27 2019, 3:04 PM
spatel added inline comments.
llvm/test/CodeGen/X86/oddsubvector.ll
119–126 ↗(On Diff #201584)

This seems like a failure of load combining? Even so, the split code has less uops than before even if the instruction count increased.

llvm/test/CodeGen/X86/vector-gep.ll
211 ↗(On Diff #201584)

We're obviously spilling here, but I'm not sure what is happening underneath or if this is an important test for perf rather than just correctness/crashing.

Can we ever happen to get volatile/atomic stores here?

Can we ever happen to get volatile/atomic stores here?

Good question - I didn't think about those. We have an ISD::ATOMIC_STORE node type, so that means we can rule out atomics? The existing split transform intended for SandyBridge doesn't appear to check for volatile either, so we might have an existing bug. I'll see if there's any test coverage for these cases.

Can we ever happen to get volatile/atomic stores here?

Good question - I didn't think about those. We have an ISD::ATOMIC_STORE node type, so that means we can rule out atomics? The existing split transform intended for SandyBridge doesn't appear to check for volatile either, so we might have an existing bug. I'll see if there's any test coverage for these cases.

I'm not too familiar with these modifiers, but atomic vector store looks forbidden:
"atomic store operand must have integer, pointer, or floating point type!"

We do have what appears to be an existing bug for volatile:
rL361785

This revision is now accepted and ready to land.May 27 2019, 7:41 PM
RKSimon added inline comments.May 28 2019, 2:54 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
21088 ↗(On Diff #201584)

Not sure if its any use but I created the collectConcatOps helper to do something similar

spatel marked an inline comment as done.May 28 2019, 6:49 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
21088 ↗(On Diff #201584)

I'm not sure if the insert_subvector pattern appears here, but it's definitely worth a look. I'll push this as-is to make sure it doesn't break anything, then look at using collectConcatOps as an improvement.

This revision was automatically updated to reflect the committed changes.
spatel reopened this revision.May 28 2019, 10:51 AM

Reopening - there must be some infinite loop potential even during lowering because this bot is failing on multiple test-suite tests:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

This revision is now accepted and ready to land.May 28 2019, 10:51 AM
spatel planned changes to this revision.May 28 2019, 10:51 AM
spatel accepted this revision.Jun 4 2019, 9:05 AM

Limited store merging for x86:
rL362507
...so I'll try to push patch again (no changes).

This revision was not accepted when it landed; it landed in state Changes Planned.Jun 4 2019, 9:39 AM
Closed by commit rL362524: [x86] split 256-bit store of concatenated vectors (authored by spatel, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.Jun 5 2019, 9:41 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
21088 ↗(On Diff #201584)

rL362620 - use collectConcatOps().