Details

Reviewers

hakzsam
arsenm
efriedma

Commits

rZORG3fec75d017ad: [CodeGen] Fixed de-optimization of legalize subvector extract
rG3fec75d017ad: [CodeGen] Fixed de-optimization of legalize subvector extract
rGe3cbdaf1b5e7: [CodeGen] Fixed de-optimization of legalize subvector extract
rL360942: [CodeGen] Fixed de-optimization of legalize subvector extract

Summary

The recent introduction of v3i32 and similar types as MVTs, and their use
in AMDGPU 3-dword memory instructions, caused a de-optimization for code
that performs such a load and then bitcasts the result via a vector of i8:
v12i8 is not an MVT, so the bitcast is legalized by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
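
To illustrate why the widened bitcast described in the summary is safe, here is a small Python model (not LLVM code). It treats a vector as a list of lanes and assumes little-endian lane layout. The helper names `bitcast_i8_to_i32` and `widened_bitcast` are hypothetical, and the padding lanes are set to 0 here, whereas in the real DAG they would be undefined.

```python
import struct


def bitcast_i8_to_i32(byte_lanes):
    """Reinterpret a list of i8 lanes as i32 lanes (little-endian assumed)."""
    if len(byte_lanes) % 4 != 0:
        raise ValueError("total width must be a multiple of 32 bits")
    raw = bytes(byte_lanes)
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))


def widened_bitcast(byte_lanes, widened_count, result_count):
    """Model of: widen the i8 vector, bitcast, then extract_subvector at 0."""
    # Widen the operand; the extra lanes are placeholders (0 in this model).
    padded = list(byte_lanes) + [0] * (widened_count - len(byte_lanes))
    wide = bitcast_i8_to_i32(padded)
    # Keep only the low result_count lanes, as extract_subvector would.
    return wide[:result_count]


v12i8 = list(range(1, 13))
direct = bitcast_i8_to_i32(v12i8)              # v12i8 -> v3i32
via_widening = widened_bitcast(v12i8, 16, 3)   # v16i8 -> v4i32 -> v3i32
assert direct == via_widening
```

Because the padding only affects the lane that is discarded, both routes give the same three i32 lanes, which is why the value never needs a round trip through memory.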

Diff Detail

Repository: rL LLVM

Event Timeline

tpr created this revision. Apr 9 2019, 5:49 AM

Herald added a project: Restricted Project. Apr 9 2019, 5:49 AM

Herald added subscribers: llvm-commits, nhaehnle, jvesely.

Harbormaster completed remote builds in B30227: Diff 194299. Apr 9 2019, 5:50 AM

tpr mentioned this in D58902: [AMDGPU] Support for v3i32/v3f32. Apr 9 2019, 5:56 AM

tpr added reviewers: hakzsam, arsenm, efriedma. Apr 9 2019, 6:10 AM

Herald added a subscriber: wdng. Apr 9 2019, 6:10 AM

This fixes the SI regression with RADV.
Thanks a lot Tim.

efriedma added inline comments. Apr 9 2019, 12:35 PM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4066 ↗	(On Diff #194299)	So if I'm following this correctly, this takes a cast like `<12 x i8>` -> `<3 x i32>`, and turns it into `<16 x i8>` -> `<4 x i32>`? That makes sense, but please add a comment describing it.
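
The type arithmetic behind that example can be sketched as follows. This is a hypothetical Python helper, not the code in LegalizeVectorTypes.cpp; the widened source lane count is passed in by the caller rather than derived from any target's legal types.

```python
def widened_bitcast_types(src_count, src_bits, dst_bits, widened_src_count):
    """Return (dst_count, widened_dst_count) for a vector-to-vector bitcast.

    A bitcast preserves total bit width, so both the original and the
    widened source must divide evenly into destination lanes.
    """
    total = src_count * src_bits
    widened_total = widened_src_count * src_bits
    if total % dst_bits or widened_total % dst_bits:
        raise ValueError("bit width is not a multiple of the result lane size")
    return total // dst_bits, widened_total // dst_bits


# <12 x i8> -> <3 x i32> becomes <16 x i8> -> <4 x i32>.
dst_count, widened_dst_count = widened_bitcast_types(12, 8, 32, 16)
assert (dst_count, widened_dst_count) == (3, 4)
```

The original result is then the first `dst_count` lanes of the widened result.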

Can you reduce the test further?

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	I wouldn’t trust this to check this, a generated check would be better

V2: Addressed review comments.

Harbormaster completed remote builds in B30674: Diff 195527. Apr 17 2019, 3:05 AM

V3: Further reduced test case.

Harbormaster completed remote builds in B30675: Diff 195528. Apr 17 2019, 3:12 AM

tpr marked 3 inline comments as done. Apr 17 2019, 3:12 AM

tpr added inline comments.

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	Not really sure what you're suggesting, but I hope this is better.

ping?
SI is still broken without this patch.

nhaehnle added inline comments. May 7 2019, 1:27 AM

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	Maybe you can use `utils/update_llc_test_checks.py`?

nhaehnle mentioned this in D61553: AMDGPU: Fix ds_{read,write}2_b64 on SI/gfx6. May 7 2019, 4:27 AM

Hi Samuel. Sorry for the delay; I kind of lost track of this change.

Question for Nicolai below.

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	You mean have a check line for each line of IR output in the function? Do you think that would be better than the negative check for storing to stack?

nhaehnle added inline comments. May 13 2019, 5:19 AM

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	Yes, I do think so. Having the auto-generated assertions means that we catch other things going wrong, and it's easy enough to update them for benign changes. I realize that you actually need update_mir_test_checks in this case due to the -stop-after, and the script is sensitive to the fact that there's no space between the `<` and the `%s`.

V4: update_mir_test_checks the test.

Harbormaster completed remote builds in B31881: Diff 199446. May 14 2019, 8:06 AM

tpr marked 2 inline comments as done. May 14 2019, 8:08 AM

tpr added inline comments.

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
4 ↗	(On Diff #194299)	Thanks Nicolai. Now done.

tpr marked an inline comment as done. May 15 2019, 10:35 AM

Is someone now able to approve this? Eli?

LGTM, assuming the review comments about the testcase are resolved.

This revision is now accepted and ready to land. May 16 2019, 12:16 PM

Closed by commit rL360942: [CodeGen] Fixed de-optimization of legalize subvector extract (authored by tpr). May 16 2019, 2:46 PM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fixed de-optimization of legalize subvector extract
Closed, Public

Revision Contents

Diff 199909

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

llvm/trunk/test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll
