This is an archive of the discontinued LLVM Phabricator instance.

DAG: Fold extract_vector_elt (scalar_to_vector), K to undef
ClosedPublic

Authored by RKSimon on Sep 3 2018, 5:43 AM.

Details

Summary

This was unconditionally folding this to the source operand,
even if the access was out of bounds. Use undef instead of
the extract is not the first element.

This helps with some cases where 3-vectors are legalized
and avoids processing the 4th component.

Avoids regressions in a future commit.

Diff Detail

Event Timeline

arsenm created this revision.Sep 3 2018, 5:43 AM

Avoids regressions in a future commit.

Do you have the patch available to show the test case?

arsenm added a comment.Sep 9 2018, 6:54 PM

Avoids regressions in a future commit.

Do you have the patch available to show the test case?

I think this was for D51736, but I somehow ended up avoiding needing this to avoid regressions.

Avoids regressions in a future commit.

Do you have the patch available to show the test case?

I think this was for D51736, but I somehow ended up avoiding needing this to avoid regressions.

So would you prefer to abandon this or try to add a different test?

I think this should be done, but I have a hard time coming up with AMDGPU test cases that actually use SCALAR_TO_VECTOR

The patch has rotted and needs rewriting, but putting in an assert causes hits in these tests:

Failing Tests (16):
    LLVM :: CodeGen/AMDGPU/copy-illegal-type.ll
    LLVM :: CodeGen/AMDGPU/cvt_f32_ubyte.ll
    LLVM :: CodeGen/AMDGPU/flat_atomics_i64.ll
    LLVM :: CodeGen/AMDGPU/global_atomics_i64.ll
    LLVM :: CodeGen/AMDGPU/load-constant-i1.ll
    LLVM :: CodeGen/AMDGPU/load-constant-i16.ll
    LLVM :: CodeGen/AMDGPU/load-global-i1.ll
    LLVM :: CodeGen/AMDGPU/load-global-i16.ll
    LLVM :: CodeGen/AMDGPU/load-local-i1.ll
    LLVM :: CodeGen/AMDGPU/load-local-i16.ll
    LLVM :: CodeGen/AMDGPU/max.i16.ll
    LLVM :: CodeGen/AMDGPU/min.ll
    LLVM :: CodeGen/AMDGPU/select.f16.ll
    LLVM :: CodeGen/AMDGPU/selectcc.ll
    LLVM :: CodeGen/AMDGPU/v_madak_f16.ll
    LLVM :: CodeGen/X86/vector-intrinsics.ll

If possible please can you use one of these to show a codegen change? I tried the x86 test but didn't find anything.

TBH, I think this patch can be accepted anyhow as we know the codepath is being tested, its just annoying that it doesn't show up in test codegen....

@arsenm Are you still looking at this? I haven't been able to make a test case yet but I'm expecting PR41097 to require this as well.

@arsenm Are you still looking at this? I haven't been able to make a test case yet but I'm expecting PR41097 to require this as well.

No, I don’t have time for this now

RKSimon commandeered this revision.Mar 17 2019, 1:54 PM
RKSimon updated this revision to Diff 191043.
RKSimon edited reviewers, added: arsenm; removed: RKSimon.

Finally got a test change, and its an increase in instructions....

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2019, 1:55 PM

I think this is a regression because at some point the high element unnecessarily goes from undef to 0:
Combining: t158: v2i16 = extract_subvector t188, Constant:i32<2>
... into: t212: v2i16 = BUILD_VECTOR t134, Constant:i16<0>

This was undef originally

RKSimon updated this revision to Diff 221327.Sep 23 2019, 7:33 AM

rebase - still need to fix the extract subvector regression

@arsenm The subvector regression is still there but the actual instruction count has reduced

arsenm accepted this revision.Jan 20 2020, 2:04 PM

LGTM

llvm/test/CodeGen/AMDGPU/max.i16.ll
163–164

I'm pretty sure these zeros are dead

173–175

I think this for some reason just got luckier on scheduling

This revision is now accepted and ready to land.Jan 20 2020, 2:04 PM
This revision was automatically updated to reflect the committed changes.