Page MenuHomePhabricator

[CodeGen] Fixed de-optimization of legalize subvector extract
ClosedPublic

Authored by tpr on Apr 9 2019, 5:49 AM.

Details

Summary

The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64

Diff Detail

Repository
rL LLVM

Event Timeline

tpr created this revision.Apr 9 2019, 5:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2019, 5:49 AM

This fixes the SI regression with RADV.
Thanks a lot Tim.

efriedma added inline comments.Apr 9 2019, 12:35 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4066 ↗(On Diff #194299)

So if I'm following this correctly, this takes a cast like <12 x i8> -> <3 x i32>, and turns it into <16 x i8> -> <4 x i32>? That makes sense, but please add a comment describing it.

Can you reduce the test further?

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

I wouldn’t trust this to check this, a generated check would be better

tpr updated this revision to Diff 195527.Apr 17 2019, 3:05 AM

V2: Addressed review comments.

tpr updated this revision to Diff 195528.Apr 17 2019, 3:11 AM

V3: Further reduced test case.

tpr marked 3 inline comments as done.Apr 17 2019, 3:12 AM
tpr added inline comments.
test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

Not really sure what you're suggesting, but I hope this is better.

ping?
SI is still broken without this patch.

nhaehnle added inline comments.Tue, May 7, 1:27 AM
test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

Maybe you can use util/update_llc_test_checks.py?

tpr marked 2 inline comments as done.Wed, May 8, 2:31 AM

Hi Samuel. Sorry for the delay; I kind of lost track of this change.

Question for Nicolai below.

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

You mean have a check line for each line of IR output in the function? Do you think that would be better than the negative check for storing to stack?

nhaehnle added inline comments.Mon, May 13, 5:19 AM
test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

Yes, I do think so. Having the auto-generated assertions means that we catch other things going wrong, and it's easy enough to update them for benign changes.

I realize that you actually need update_mir_test_checks in this case due to the -stop-after, and the script is sensitive to the fact that there's no space between the < and the %s.

tpr updated this revision to Diff 199446.Tue, May 14, 8:04 AM

V4: update_mir_test_checks the test.

tpr marked 2 inline comments as done.Tue, May 14, 8:08 AM
tpr added inline comments.
test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗(On Diff #194299)

Thanks Nicolai. Now done.

tpr marked an inline comment as done.Wed, May 15, 10:35 AM
tpr added a comment.Thu, May 16, 10:19 AM

Is someone now able to approve this? Eli?

efriedma accepted this revision.Thu, May 16, 12:16 PM

LGTM, assuming the review comments about the testcase are resolved.

This revision is now accepted and ready to land.Thu, May 16, 12:16 PM
This revision was automatically updated to reflect the committed changes.