This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [CodeGen] Fixed vector halving bug for masked load
ClosedPublic

Authored by aartbik on Apr 21 2020, 8:35 PM.

Details

Summary

Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should yields
should eventually yield V=8/5. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Diff Detail

Event Timeline

aartbik created this revision.Apr 21 2020, 8:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2020, 8:35 PM

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9468

EVTs should be passed by value.

aartbik marked an inline comment as done.Apr 21 2020, 10:42 PM

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

But do you agree this solution goes in the right direction?
Do you have any suggestions for the v17i32 case?
Happy to test that tomorrow and think about it some more.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9468

I mimicked the prototypes of related methods in this file? Are those wrong too?
Happy to change, just checking what best practices are here?

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

But do you agree this solution goes in the right direction?
Do you have any suggestions for the v17i32 case?
Happy to test that tomorrow and think about it some more.

I wonder if we should fix this in WidenVecRes_MLOAD by doing something closer to WidenVecRes_LOAD. Finding a legal type and slicing up the vector as we widen it.

aartbik updated this revision to Diff 259456.Apr 22 2020, 7:31 PM

do not generate zero storage size masked loads

PTAL

You had a very sharp eye that v17f32 would cause a split with an empty hi masked load eventually.
It is somewhat harmless since the load is not used and thus removed from the DAG, but this revision
is even cleaner by not generating the hi masked load at all!

aartbik updated this revision to Diff 259457.Apr 22 2020, 7:36 PM

removed left behind debug code

craig.topper added inline comments.Apr 22 2020, 7:39 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9495

Stray 'k' got added to this line.

aartbik updated this revision to Diff 259461.Apr 22 2020, 8:03 PM

fixed some stray diffs

aartbik marked an inline comment as done.Apr 22 2020, 8:03 PM
dmgreen added inline comments.Apr 22 2020, 10:55 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1596

Could we return PassThruHi directly?

llvm/test/CodeGen/X86/pr45563-2.ll
2

Please use update_llc_test_checks on tests this size

aartbik marked 3 inline comments as done.Apr 22 2020, 11:38 PM
aartbik added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1596

I experimented with different values, this seemed the least code change, since it folds directly in the DAG.getNode below, and is removed right after that. Any thing else would require more changes to deal with the 1 vs 2 result values.

llvm/test/CodeGen/X86/pr45563-2.ll
2

Ah, I saw some auto generated messages, but was not aware of this tool. But, forgive my noobness, what will this auto-generate? I am really just counting the number of masked moves that result, I don't want to pin down the full assembly?

craig.topper added inline comments.Apr 23 2020, 12:11 AM
llvm/test/CodeGen/X86/pr45563-2.ll
2

It will generate checks for the full assembly, but the X86 maintainers are used to that. We'll just rerun the script and look at the diff if it fails for some change in the future.

aartbik updated this revision to Diff 259647.Apr 23 2020, 11:31 AM
aartbik marked 4 inline comments as done.

used update_llc_test_checks (cool!)

aartbik added inline comments.Apr 23 2020, 11:49 AM
llvm/test/CodeGen/X86/pr45563-2.ll
2

Nice utility! Having the full assembly makes it a bit sensitive to changes, but I can also see that the X86 maintainers want to be aware of even the slightest of changes and with this utility that becomes a lot simpler. Cool!

This revision is now accepted and ready to land.Apr 23 2020, 2:21 PM
This revision was automatically updated to reflect the committed changes.