This is an archive of the discontinued LLVM Phabricator instance.

[X86] Make EltsFromConsecutiveLoads code path work only on vectors it expects
ClosedPublic

Authored by mkuper on Dec 9 2014, 1:21 AM.

Details

Summary

EltsFromConsecutiveLoads was apparently only ever called for 128-bit vectors, and assumed this implicitly.
r223518 started calling it for AVX-sized vectors, causing the code path that had this assumption to crash.

This makes the assumption explicit by adding a check.

Diff Detail

Event Timeline

mkuper updated this revision to Diff 17074.Dec 9 2014, 1:21 AM
mkuper retitled this revision from to [X86] Make EltsFromConsecutiveLoads code path work only on vectors it expects.
mkuper updated this object.
mkuper edited the test plan for this revision. (Show Details)
mkuper added a reviewer: spatel.
mkuper added a subscriber: Unknown Object (MLST).
spatel edited edge metadata.Dec 9 2014, 9:37 AM

Hi Michael,
Thanks for notifying me about this bug. While your patch will prevent the crash, it will also prevent vector loads for i8 / i16, right? Is it possible to fix the condition to avoid the crash while still generating a vector load for all data types?

For reference - while stepping through this path, I noticed another bug:
http://llvm.org/bugs/show_bug.cgi?id=21790

For your testcase with i64, it looks like we'll end up with the optimal codegen, but only because we'll crack the 256-bit vector in half and then the two i64 loads will be treated like a full vector load the next time we call EltsFromConsecutiveLoads().

Can you add your testcase to test/CodeGen/X86/vec_loadsingles.ll (name change is probably in order at this point), so we have all of the related testcases in one file? See inline comments for the testcase itself.

test/CodeGen/X86/elts-from-loads-256.ll
1

Use FileCheck for expected output.

7–9

Remove unnecessary test elements: addrspace, attributes.

23

Add newline

Thanks for the review, Sanjay.

Right about the test-case, it was bugpoint-reduced, and I didn't clean it up enough, thanks for pointing it out.
Will fix and put it in the right place.

Regarding vector loads of i8 / i16 - I believe that didn't work to begin with.
The existing code fires only when NumElems == 4, and creates a ResNode of type v2i64. So for the bitcast in the end to be valid, the element size must be 32 bit. Or did I miss something?

Regarding the PR - you're right, it looks like unneeded domain crossing. I'd rather resolve the crash first, though.

mkuper updated this revision to Diff 17089.Dec 9 2014, 12:47 PM
mkuper edited edge metadata.

Test-case updated.

spatel accepted this revision.Dec 9 2014, 1:34 PM
spatel edited edge metadata.
In D6579#5, @mkuper wrote:

Regarding vector loads of i8 / i16 - I believe that didn't work to begin with.
The existing code fires only when NumElems == 4, and creates a ResNode of type v2i64. So for the bitcast in the end to be valid, the element size must be 32 bit. Or did I miss something?

Ah, ok. Please add a 'TODO' comment there that we should handle other types and/or file a PR. I don't think there's any reason to limit the optimization to just 32-bit elements.

This revision is now accepted and ready to land.Dec 9 2014, 1:34 PM
mkuper closed this revision.Dec 10 2014, 12:47 AM
mkuper updated this revision to Diff 17115.

Closed by commit rL223922 (authored by @mkuper).