This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] avoid vector extract/insert around binop
ClosedPublic

Authored by spatel on Oct 25 2022, 1:37 PM.

Details

Summary

scalar-to-vector (binop (extractelt V, Idx), C) --> shuffle (bo V, C'), {Idx, -1, -1...}

We generally try to avoid ad-hoc vectorization in SDAG, but the motivating case from issue #39482
escapes our normal vectorization folds in IR. It seems like it should always be a win to transform this pattern in cases where we have the same vector type for input and output and the target supports the vector operation because we avoid transfers from vector to scalar and back.

In the x86 shift examples, we create the scalar-to-vector node during legalization. I'm not sure if there's a more general way to create the pattern for testing. (If so, I could add tests for other targets.)

Diff Detail

Event Timeline

spatel created this revision.Oct 25 2022, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 1:37 PM
spatel requested review of this revision.Oct 25 2022, 1:37 PM
RKSimon added inline comments.Oct 25 2022, 1:59 PM
llvm/test/CodeGen/X86/urem-vector-lkk.ll
149

can the fold be expanded to work with insert_vector_elt as well?

spatel added inline comments.Oct 25 2022, 2:32 PM
llvm/test/CodeGen/X86/urem-vector-lkk.ll
149

Yes - but I expect it'll be a closer call whether things are always profitable in those cases.

On this particular example, I think we'd need to change something else or have an x86-specific solution because we convert to target-specific ops during legalization and before there's a chance to clean up intermediate casts:

        t90: i32 = X86ISD::PEXTRW t2, TargetConstant:i8<1>
      t91: i16 = truncate t90
    t76: i16 = and t91, Constant:i16<31>
  t102: i32 = any_extend t76
t103: v8i16 = X86ISD::PINSRW t82, t102, TargetConstant:i8<1>
RKSimon added inline comments.Oct 26 2022, 3:02 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23548

Do we need a isShuffleMaskLegal check?

dmgreen added inline comments.Oct 26 2022, 3:30 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23548

Oh that might be a good idea, some instruction sets like MVE do not have many shuffles. It will depend on the type, it may just be a lane move which is cheap, but may require going through grp regs. I'm not sure there is a lot in it either way though, it might be slow in either case for architectures like that.

RKSimon added inline comments.Oct 26 2022, 4:00 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23548

I guess we should always allow idx == 0 as that's effectively what (vt scalar_to_vector(extract_vector_elt(vt x,0))) is

spatel marked 3 inline comments as done.Oct 26 2022, 5:55 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23548

I'll add the legality check. If a target says {0,-1, -1...} is not legal, that seems like a bug for that target since that's an identity (no-op) mask.

In the x86 shift tests that I stepped through, we always shuffle the demanded element down to element 0 before the extract, so we won't actually exercise that extra predicate, but I'll try to find some other case where it happens.

spatel updated this revision to Diff 470797.Oct 26 2022, 6:15 AM
spatel marked an inline comment as done.

Patch updated:

  1. Added isShuffleMaskLegal() check.
  2. Added one-use check (this might be eased as a follow-up, but needs more tests - I added one test to verify, so it's currently showing no diff).
  3. Added TODO comments for enhancements.
RKSimon added inline comments.Oct 26 2022, 6:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23518

is anything stopping a vectorized integer division with undef elements occurring apart from the fact that almost no target has such an instruction?

spatel planned changes to this revision.Oct 26 2022, 7:06 AM
spatel marked an inline comment as done.
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23518

Good point - I don't know how to test it, but we can't do this for opcodes that are not speculatable (because of immediate UB).

A similar problem came up here:
17dcbd816547
...so I should've remembered that! :)

spatel updated this revision to Diff 470822.Oct 26 2022, 8:35 AM
spatel marked an inline comment as done.

Updated to use isSafeToSpeculativelyExecute() (added with 3aec0211187b98).
That avoids executing div/rem on unknown/undef vector elements.

RKSimon accepted this revision.Oct 26 2022, 8:37 AM

LGTM - cheers!

This revision is now accepted and ready to land.Oct 26 2022, 8:37 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Oct 27 2022, 11:55 PM

@spatel, after this commit, we started seeing an assertion failure being hit in two of our internal tests. I have filed this as issue #58661, can you take a look?

thakis added a subscriber: thakis.Oct 28 2022, 7:00 AM

@spatel, after this commit, we started seeing an assertion failure being hit in two of our internal tests. I have filed this as issue #58661, can you take a look?

Thanks for filing this! We hit the same problem, and the fix fixes it for us too.

It would've been nice if this had been reverted once the problem was found – then the probably would've gone away 7 hours earlier and I wouldn't have had to spend time to track it down as well :) If things take a while to fix, I think it's good to revert first and then fix asynchronously, to save the duplicate work of several people tracking down their compile failures to the same commit, only to discover that the problem is already known.

(Thanks for the fix too, of course!)

@spatel, after this commit, we started seeing an assertion failure being hit in two of our internal tests. I have filed this as issue #58661, can you take a look?

Thanks for filing this! We hit the same problem, and the fix fixes it for us too.

It would've been nice if this had been reverted once the problem was found – then the probably would've gone away 7 hours earlier and I wouldn't have had to spend time to track it down as well :) If things take a while to fix, I think it's good to revert first and then fix asynchronously, to save the duplicate work of several people tracking down their compile failures to the same commit, only to discover that the problem is already known.

I agree - if there's a clear connection between a patch and a crash, then please do revert. This was actually the best I could have done to post a fix given the timing; it was patched in about an hour after waking up and seeing the bug report. :)
Sorry about the bug, but thanks to all for the reports and reduced tests.