This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad
ClosedPublic

Authored by LemonBoy on Apr 29 2020, 8:48 AM.

Details

Summary

The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.

The technique employed by ExpandLoad is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.

Diff Detail

Event Timeline

LemonBoy created this revision.Apr 29 2020, 8:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2020, 8:48 AM
craig.topper added inline comments.Apr 30 2020, 10:32 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6633

This comment doesn't completely make sense. The memory VT is still based on NumSrcBits so it isn't loading the padding bits.

6649

Why do we need to AND before the TRUNCATE? Isn't the AND masking to the same number of bits as the truncate?

LemonBoy marked 2 inline comments as done.Apr 30 2020, 10:51 AM
LemonBoy added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6633

Indeed, I only wanted to highlight the fact the extra bits aren't masked away when legalizing the odd-sized loads. I'll reword the comment.

6649

I found that sometimes the generated pattern would be something like (anyext DstEltVT (truncate SrcEltVT (load LoadVT)) so when the optimizer kicks in and folds the truncate and anyext only the load remains.

craig.topper added inline comments.Apr 30 2020, 11:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6649

Why is that bad?

LemonBoy marked an inline comment as done.Apr 30 2020, 1:21 PM
LemonBoy added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6649

The extra bits are never masked away. If you bypass the AND and check out the generated code for test2 in X86/pr15267.ll you can see the packed 4i1 is only shifted and never and-ed so the generated vector has garbage in upper bits.

craig.topper added inline comments.Apr 30 2020, 1:46 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6649

Its ok that they are garbage. Type legalization inserted the anyextend which allowed them to be garbage. Type legalization in the caller that consumes the vector result is responsible for clearing them if it needs to.

Its ok that they are garbage. Type legalization inserted the anyextend which allowed them to be garbage. Type legalization in the caller that consumes the vector result is responsible for clearing them if it needs to.

I see, the hidden contract with the caller is the piece I was missing.
Nonetheless leaving the AND seems to help some backends produce slightly tighter code, updating the test suite with that change shows a net increase of ~200 lines. I have no strong opinion about the redundant AND, I can easily update the patch and completely drop it.

craig.topper accepted this revision.Apr 30 2020, 5:12 PM

I think I'm fine with this as is. I'm not sure I care a whole lot about how the performance of code that accesses weird non-byte sized memory types like this. There's no way to generate these loads from C as far as I know. If these cases are really important then there are probably much clever ways we can handle them. For example, the vXi1 case would be better served by using something like what's done in combineToExtendBoolVectorInReg from X86ISelLowering.cpp

This revision is now accepted and ready to land.Apr 30 2020, 5:12 PM

I forgot you were going to reword that comment.

LemonBoy updated this revision to Diff 261508.May 1 2020, 11:32 AM

Amend a comment.

I forgot you were going to reword that comment.

Can you please commit the patch if the updated comment is fine with you?

This revision was automatically updated to reflect the committed changes.
RKSimon added a subscriber: RKSimon.May 5 2020, 1:55 AM

@LemonBoy Please can you take a look at this fuzz regression, it looks like it appeared due to this patch: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22005

@LemonBoy Please can you take a look at this fuzz regression, it looks like it appeared due to this patch: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22005

Sure thing, I already have a patch ready.